In Notes 0, we introduced the notion of a measure space , which includes as a special case the notion of a probability space. By selecting one such probability space as a sample space, one obtains a model for random events and random variables, with random events being modeled by measurable sets in , and random variables taking values in a measurable space being modeled by measurable functions . We then defined some basic operations on these random events and variables:

- Given events , we defined the conjunction , the disjunction , and the complement . For countable families of events, we similarly defined and . We also defined the empty event and the sure event , and what it meant for two events to be equal.
- Given random variables in ranges respectively, and a measurable function , we defined the random variable in range . (As the special case of this, every deterministic element of was also a random variable taking values in .) Given a relation , we similarly defined the event . Conversely, given an event , we defined the indicator random variable . Finally, we defined what it meant for two random variables to be equal.
- Given an event , we defined its probability .

These operations obey various axioms; for instance, the boolean operations on events obey the axioms of a Boolean algebra, and the probability function obeys the Kolmogorov axioms. However, we will not focus on the axiomatic approach to probability theory here, instead basing the foundations of probability theory on the sample space models as discussed in Notes 0. (But see this previous post for a treatment of one such axiomatic approach.)

It turns out that almost all of the other operations on random events and variables we need can be constructed in terms of the above basic operations. In particular, this allows one to safely *extend* the sample space in probability theory whenever needed, provided one uses an extension that respects the above basic operations; this is an important operation when one needs to add new sources of randomness to an existing system of events and random variables, or to couple together two separate such systems into a joint system that extends both of the original systems. We gave a simple example of such an extension in the previous notes, but now we give a more formal definition:

Definition 1. Suppose that we are using a probability space $\Omega = (\Omega, \mathcal{F}, \mathbf{P})$ as the model for a collection of events and random variables. An *extension* of this probability space is a probability space $\Omega' = (\Omega', \mathcal{F}', \mathbf{P}')$, together with a measurable map $\pi: \Omega' \to \Omega$ (sometimes called the *factor map*) which is probability-preserving in the sense that

$$\mathbf{P}'(\pi^{-1}(E)) = \mathbf{P}(E) \qquad (1)$$

for all $E \in \mathcal{F}$. (Caution: this does *not* imply that $\mathbf{P}(\pi(F)) = \mathbf{P}'(F)$ for all $F \in \mathcal{F}'$. Why not?)

An event which is modeled by a measurable subset $E$ in the sample space $\Omega$ will be modeled by the measurable set $\pi^{-1}(E)$ in the extended sample space $\Omega'$. Similarly, a random variable taking values in some range $R$ that is modeled by a measurable function $X: \Omega \to R$ in $\Omega$ will be modeled instead by the measurable function $X \circ \pi$ in $\Omega'$. We also allow the extension to model additional events and random variables that were not modeled by the original sample space (indeed, this is one of the main reasons why we perform extensions in probability in the first place).

Thus, for instance, the sample space in Example 3 of the previous post is an extension of the sample space in that example, with the factor map given by the first coordinate projection . One can verify that all of the basic operations on events and random variables listed above are unaffected by the above extension (with one caveat, see remark below). For instance, the conjunction of two events can be defined via the original model by the formula

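$$(E \wedge F)_\Omega := E_\Omega \cap F_\Omega$$

or via the extension $\Omega'$ via the formula

$$(E \wedge F)_{\Omega'} := E_{\Omega'} \cap F_{\Omega'}.$$

Here $E_\Omega, F_\Omega$ denote the models of the events in the original space, $E_{\Omega'} = \pi^{-1}(E_\Omega)$ and $F_{\Omega'} = \pi^{-1}(F_\Omega)$ denote their models in the extension, and $\pi$ is the factor map. The two definitions are consistent with each other, thanks to the obvious set-theoretic identity

$$\pi^{-1}(E_\Omega \cap F_\Omega) = \pi^{-1}(E_\Omega) \cap \pi^{-1}(F_\Omega).$$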

Similarly, the assumption (1) is precisely what is needed to ensure that the probability of an event remains unchanged when one replaces a sample space model with an extension. We leave the verification of preservation of the other basic operations described above under extension as exercises to the reader.
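The consistency check above can be carried out by hand on a small finite model. The following is a minimal sketch only: the choice of a fair die roll as the original space, a pair of rolls as the extension, and the helper names `factor_map`, `pull_back` and `P` are all assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

# Original model (assumed for illustration): one fair die roll.
omega = [1, 2, 3, 4, 5, 6]
prob = {w: Fraction(1, 6) for w in omega}

# Hypothetical extension: the same roll together with a second, independent roll.
omega_ext = list(product(omega, omega))
prob_ext = {w: Fraction(1, 36) for w in omega_ext}

def factor_map(w_ext):
    """First coordinate projection from the extension to the original space."""
    return w_ext[0]

def pull_back(event):
    """Model of an event in the extension: its preimage under the factor map."""
    return {w for w in omega_ext if factor_map(w) in event}

def P(event, weights):
    """Probability of a set of sample points under the given point masses."""
    return sum((weights[w] for w in event), Fraction(0))

even = {2, 4, 6}
at_least_four = {4, 5, 6}

# Conjunction commutes with passing to the extension.
assert pull_back(even & at_least_four) == pull_back(even) & pull_back(at_least_four)
# Probability is preserved by the extension.
assert P(pull_back(even), prob_ext) == P(even, prob)
# The cardinality of the model is not preserved.
print(len(even), len(pull_back(even)))
```

The last line shows that the number of sample points modeling an event changes under extension, even though its probability does not.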

Remark 2There is one minor exception to this general rule if we do not impose the additional requirement that the factor map is surjective. Namely, for non-surjective , it can become possible that two events are unequal in the original sample space model, but become equal in the extension (and similarly for random variables), although the converse never happens (events that are equal in the original sample space always remain equal in the extension). For instance, let be the discrete probability space with and , and let be the discrete probability space with , and non-surjective factor map defined by . Then the event modeled by in is distinct from the empty event when viewed in , but becomes equal to that event when viewed in . Thus we see that extending the sample space by a non-surjective factor map can identify previously distinct events together (though of course, being probability preserving, this can only happen if those two events were already almost surely equal anyway). This turns out to be fairly harmless though; while it is nice to know if two given events are equal, or if they differ by a non-null event, it is almost never useful to know that two events are unequal if they are already almost surely equal. Alternatively, one can add the additional requirement of surjectivity in the definition of an extension, which is also a fairly harmless constraint to impose (this is what I chose to do in this previous set of notes).

Roughly speaking, one can define probability theory as the study of those properties of random events and random variables that are model-independent in the sense that they are preserved by extensions. For instance, the cardinality of the model of an event is *not* a concept within the scope of probability theory, as it is not preserved by extensions: continuing Example 3 from Notes 0, the event that a die roll is even is modeled by a set of cardinality in the original sample space model , but by a set of cardinality in the extension. Thus it does not make sense in the context of probability theory to refer to the “cardinality of an event “.

On the other hand, the supremum of a collection of random variables in the extended real line is a valid probabilistic concept. This can be seen by manually verifying that this operation is preserved under extension of the sample space, but one can also see this by defining the supremum in terms of existing basic operations. Indeed, note from Exercise 24 of Notes 0 that a random variable in the extended real line is completely specified by the threshold events for ; in particular, two such random variables are equal if and only if the events and are surely equal for all . From the identity

$$\Big(\sup_n X_n \le t\Big) = \bigwedge_{n=1}^\infty (X_n \le t)$$

we thus see that one can completely specify $\sup_n X_n$ in terms of the $X_n$ using only the basic operations provided in the above list (and in particular using the countable conjunction $\bigwedge_{n=1}^\infty$). Of course, the same considerations hold if one replaces the supremum by the infimum, limit superior, limit inferior, or (if it exists) the limit.

In this set of notes, we will define some further important operations on scalar random variables, in particular the *expectation* of these variables. In the sample space models, expectation corresponds to the notion of integration on a measure space. As we will need to use both expectation and integration in this course, we will thus begin by quickly reviewing the basics of integration on a measure space, although we will then translate the key results of this theory into probabilistic language.

As the finer details of the Lebesgue integral construction are not the core focus of this probability course, some of the details of this construction will be left to exercises. See also Chapter 1 of Durrett, or these previous blog notes, for a more detailed treatment.

** — 1. Integration on measure spaces — **

Let $(\Omega, \mathcal{F}, \mu)$ be a measure space, and let $f$ be a measurable function on $\Omega$, taking values either in the reals $\mathbf{R}$, the non-negative extended reals $[0,+\infty]$, the extended reals $[-\infty,+\infty]$, or the complex numbers $\mathbf{C}$. We would like to define the integral

$$\int_\Omega f\,d\mu \qquad (2)$$

of $f$ on $\Omega$. (One could make the integration variable explicit, e.g. by writing $\int_\Omega f(\omega)\,d\mu(\omega)$, but we will usually not do so here.) When integrating a reasonably nice function (e.g. a continuous function) on a reasonably nice domain (e.g. a box in $\mathbf{R}^n$), the Riemann integral that one learns about in undergraduate calculus classes suffices for this task; however, for the purposes of probability theory, we need the much more general notion of a Lebesgue integral in order to properly define (2) for the spaces and functions we will need to study.

Not every measurable function can be integrated by the Lebesgue integral. There are two key classes of functions for which the integral exists and is well behaved:

- *Unsigned* measurable functions $f: \Omega \to [0,+\infty]$, that take values in the non-negative extended reals $[0,+\infty]$; and
- *Absolutely integrable* functions $f: \Omega \to \mathbf{R}$ or $f: \Omega \to \mathbf{C}$, which are scalar measurable functions whose absolute value has a finite integral: $\int_\Omega |f|\,d\mu < \infty$. (Sometimes we also allow absolutely integrable functions to attain an infinite value, so long as they only do so on a set of measure zero.)

One could in principle extend the Lebesgue integral to slightly more general classes of functions, e.g. to sums of absolutely integrable functions and unsigned functions. However, the above two classes already suffice for most applications (and as a general rule of thumb, it is dangerous to apply the Lebesgue integral to functions that are not unsigned or absolutely integrable, unless you really know what you are doing).

We will construct the Lebesgue integral in the following four stages. First, we will define the Lebesgue integral just for unsigned simple functions – unsigned measurable functions that take on only finitely many values. Then, by a limiting procedure, we extend the Lebesgue integral to unsigned functions. After that, by decomposing a real absolutely integrable function into unsigned components, we extend the integral to real absolutely integrable functions. Finally, by taking real and imaginary parts, we extend to complex absolutely integrable functions. (This is not the only order in which one could perform this construction; for instance, in Durrett, one first constructs integration of bounded functions on finite measure support before passing to arbitrary unsigned functions.)

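First consider an unsigned simple function $f: \Omega \to [0,+\infty]$, thus $f$ is measurable and takes only finitely many values. Then we can express $f$ as a finite linear combination (in $[0,+\infty]$) of indicator functions. Indeed, if we enumerate the values that $f$ takes as $a_1,\dots,a_n \in [0,+\infty]$ (avoiding repetitions) and set $E_i := \{\omega \in \Omega: f(\omega) = a_i\}$ for $i=1,\dots,n$, then it is clear that

$$f = \sum_{i=1}^n a_i 1_{E_i}.$$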

(It should be noted at this point that the operations of addition and multiplication on are defined by setting for all , and for all *positive* , but that is defined to equal . To put it another way, multiplication is defined to be continuous from below, rather than from above: . One can verify that the commutative, associative, and distributive laws continue to hold on , but we caution that the cancellation laws do *not* hold when is involved.)

Conversely, given any coefficients (not necessarily distinct) and measurable sets in (not necessarily disjoint), the sum is an unsigned simple function.

A single simple function can be decomposed in multiple ways as a linear combination of unsigned simple functions. For instance, on the real line , the function can also be written as or as . However, there is an invariant of all these decompositions:

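Exercise 3. Suppose that an unsigned simple function $f$ has two representations as the linear combination of indicator functions:

$$f = \sum_{i=1}^{k} a_i 1_{E_i} = \sum_{j=1}^{k'} a'_j 1_{E'_j},$$

where $k, k'$ are nonnegative integers, $a_1,\dots,a_k,a'_1,\dots,a'_{k'}$ lie in $[0,+\infty]$, and $E_1,\dots,E_k,E'_1,\dots,E'_{k'}$ are measurable sets. Show that

$$\sum_{i=1}^{k} a_i \mu(E_i) = \sum_{j=1}^{k'} a'_j \mu(E'_j).$$

(Hint: first handle the special case where the $E'_j$ are all disjoint and non-empty, and each of the $E_i$ is expressible as the union of some subcollection of the $E'_j$. Then handle the general case by considering the atoms of the finite boolean algebra generated by the $E_i$ and the $E'_j$.)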

We capture this invariant by introducing the *simple integral* of an unsigned simple function by the formula

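$$\mathrm{Simp}\int_\Omega f\,d\mu := \sum_{i=1}^k a_i \mu(E_i)$$

whenever $f$ admits a decomposition $f = \sum_{i=1}^k a_i 1_{E_i}$. The above exercise is then precisely the assertion that the simple integral is well-defined as an element of $[0,+\infty]$.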
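To see the invariance concretely, one can compute the sum of coefficient times measure for two different decompositions of the same function. This is a small sketch under assumed data: the three-atom measure space, its masses, and the names `mu`, `simple_integral` and `evaluate` are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

# A toy finite measure space (an assumption for illustration): three atoms
# with the given masses; every subset is measurable.
mass = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(5)}

def mu(E):
    """Measure of a set of atoms."""
    return sum((mass[x] for x in E), Fraction(0))

def simple_integral(decomposition):
    """Sum of a_i * mu(E_i) over a decomposition [(a_i, E_i), ...]."""
    return sum((a * mu(E) for a, E in decomposition), Fraction(0))

def evaluate(decomposition, x):
    """Value at x of the function sum_i a_i * 1_{E_i}."""
    return sum((a for a, E in decomposition if x in E), Fraction(0))

# Two decompositions of the same function (value 2 on a, value 1 on b and c).
d1 = [(Fraction(2), {"a"}), (Fraction(1), {"b", "c"})]
d2 = [(Fraction(1), {"a"}), (Fraction(1), {"a", "b", "c"})]

# They agree pointwise, and they give the same simple integral.
assert all(evaluate(d1, x) == evaluate(d2, x) for x in mass)
assert simple_integral(d1) == simple_integral(d2)
print(simple_integral(d1))
```

The sets in `d2` overlap while those in `d1` are disjoint, which is exactly the situation the hint in the exercise reduces to atoms.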

Exercise 4Let be unsigned simple functions, and let .

- (i) (Linearity) Show that
and

- (ii) Show that if and are equal almost everywhere, then
- (iii) Show that , with equality if and only if is zero almost everywhere.
- (iv) (Monotonicity) If almost everywhere, show that .
- (v) (Markov inequality) Show that for any .

Now we extend from unsigned simple functions to more general unsigned functions. If is an unsigned measurable function, we define the *unsigned integral* as

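$$\int_\Omega f\,d\mu := \sup_{0 \le g \le f;\ g \text{ simple}} \mathrm{Simp}\int_\Omega g\,d\mu \qquad (3)$$

where the supremum is over all unsigned simple functions $g$ such that $0 \le g(\omega) \le f(\omega)$ for all $\omega \in \Omega$.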

Many of the properties of the simple integral carry over to the unsigned integral easily:

Exercise 5Let be unsigned functions, and let .

- (i) (Superadditivity) Show that
and

- (ii) Show that if and are equal almost everywhere, then
- (iii) Show that , with equality if and only if is zero almost everywhere.
- (iv) (Monotonicity) If almost everywhere, show that .
- (v) (Markov inequality) Show that for any . In particular, if , then is finite almost everywhere.
- (vi) (Compatibility with simple integral) If is simple, show that .
- (vii) (Compatibility with measure) For any measurable set , show that .

Exercise 6If is a discrete probability space (with the associated probability measure ), and is a function, show that(Note that the condition in the definition of a discrete probability space is not required to prove this identity.)

The observant reader will notice that the linearity property of simple functions has been weakened to superadditivity. This can be traced back to a breakdown of symmetry in the definition (3); the unsigned integral of $f$ is defined via approximation from below by simple functions, but not from above. Indeed the opposite claim

$$\int_\Omega f\,d\mu = \inf_{h \ge f;\ h \text{ simple}} \mathrm{Simp}\int_\Omega h\,d\mu \qquad (4)$$

can fail. For a counterexample, take to be the discrete probability space with probabilities , and let be the function . By Exercise 6 we have . On the other hand, any simple function with must equal on a set of positive measure (why?) and so the right-hand side of (4) can be infinite. However, one can get around this difficulty under some further assumptions on , and thus recover full linearity for the unsigned integral:

Exercise 7 (Linearity of the unsigned integral)Let be a measure space.

- (i) Let be an unsigned measurable function which is both bounded (i.e., there is a finite such that for all ) and has finite measure support (i.e., there is a measurable set with such that for all ). Show that (4) holds for this function .
- (ii) Establish the additivity property
whenever are unsigned measurable functions that are bounded with finite measure support.

- (iii) Show that
as whenever is unsigned measurable.

- (iv) Using (iii), extend (ii) to the case where are unsigned measurable functions with finite measure support, but are not necessarily bounded.
- (v) Show that
as whenever is unsigned measurable.

- (vi) Using (iii) and (v), show that (ii) holds for any unsigned measurable (which are not necessarily bounded or of finite measure support).

Next, we apply the integral to absolutely integrable functions. We call a scalar function or *absolutely integrable* if it is measurable and the unsigned integral is finite. A real-valued absolutely integrable function can be expressed as the difference of two *unsigned* absolutely integrable functions ; indeed, one can check that the choice and work for this. Conversely, any difference of unsigned absolutely integrable functions is absolutely integrable (this follows from the triangle inequality ). A single absolutely integrable function may be written as a difference of unsigned absolutely integrable functions in more than one way, for instance we might have

for unsigned absolutely integrable functions . But when this happens, we can rearrange to obtain

and thus by linearity of the unsigned integral

By the absolute integrability of , all the integrals are finite, so we may rearrange this identity as

This allows us to define the Lebesgue integral of a real-valued absolutely integrable function to be the expression

for any given decomposition of as the difference of two unsigned absolutely integrable functions. Note that if is both unsigned and absolutely integrable, then the unsigned integral and the Lebesgue integral of agree (as can be seen by using the decomposition ), and so there is no ambiguity in using the same notation to denote both integrals. (By the same token, we may now drop the modifier from the simple integral of a simple unsigned , which we may now also denote by .)

The Lebesgue integral also enjoys good linearity properties:

Exercise 8Let be real-valued absolutely integrable functions, and let .

- (i) (Linearity) Show that and are also real-valued absolutely integrable functions, with
and

(For the second relation, one may wish to first treat the special cases and .)

- (ii) Show that if and are equal almost everywhere, then
- (iii) Show that , with equality if and only if is zero almost everywhere.
- (iv) (Monotonicity) If almost everywhere, show that .
- (v) (Markov inequality) Show that for any .

Because of part (iii) of the above exercise, we can extend the Lebesgue integral to real-valued absolutely integrable functions that are only defined and real-valued almost everywhere, rather than everywhere. In particular, we can apply the Lebesgue integral to functions that are sometimes infinite, so long as they are only infinite on a set of measure zero, and the function is absolutely integrable everywhere else.

Finally, we extend to complex-valued functions. If is absolutely integrable, observe that the real and imaginary parts are also absolutely integrable (because ). We then define the (complex) Lebesgue integral of in terms of the real Lebesgue integral by the formula

Clearly, if is real-valued and absolutely integrable, then the real Lebesgue integral and the complex Lebesgue integral of coincide, so it does not create ambiguity to use the same symbol for both concepts. It is routine to extend the linearity properties of the real Lebesgue integral to its complex counterpart:

Exercise 9Let be complex-valued absolutely integrable functions, and let .

- (i) (Linearity) Show that and are also complex-valued absolutely integrable functions, with
and

(For the second relation, one may wish to first treat the special cases and .)

- (ii) Show that if and are equal almost everywhere, then
- (iii) Show that , with equality if and only if is zero almost everywhere.
- (iv) (Markov inequality) Show that for any .

We record a simple, but incredibly fundamental, inequality concerning the Lebesgue integral:

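Lemma 10 (Triangle inequality). If $f: \Omega \to \mathbf{C}$ is a complex-valued absolutely integrable function, then

$$\left|\int_\Omega f\,d\mu\right| \le \int_\Omega |f|\,d\mu.$$

*Proof:* We have

$$\mathrm{Re} \int_\Omega f\,d\mu = \int_\Omega \mathrm{Re}\,f\,d\mu \le \int_\Omega |f|\,d\mu.$$

This looks weaker than what we want to prove, but we can “amplify” this inequality to the full strength triangle inequality as follows. Replacing $f$ by $e^{i\theta} f$ for any real $\theta$, we have

$$\mathrm{Re}\left(e^{i\theta}\int_\Omega f\,d\mu\right) \le \int_\Omega |f|\,d\mu.$$

Since we can choose the phase $e^{i\theta}$ to make the expression $e^{i\theta}\int_\Omega f\,d\mu$ equal to $\left|\int_\Omega f\,d\mu\right|$, the claim follows.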
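The amplification step can be checked numerically on a finite example. The sketch below is illustrative only: the three-atom measure space, the masses and the values of `f` are assumed data, and floating-point arithmetic is used, so the equality is only tested up to a small tolerance.

```python
import cmath

# Toy discrete measure space (assumed): three atoms with these masses,
# and the values of a complex-valued function f on them.
mass = [0.5, 1.5, 2.0]
f = [1 + 2j, -3 + 0.5j, 0.25 - 1j]

integral = sum(m * z for m, z in zip(mass, f))
integral_abs = sum(m * abs(z) for m, z in zip(mass, f))

# Weak inequality: the real part of the integral is at most the integral of |f|.
assert integral.real <= integral_abs

# Amplification: rotate f by the phase that makes the integral real and non-negative.
theta = -cmath.phase(integral)
rotated = sum(m * cmath.exp(1j * theta) * z for m, z in zip(mass, f))

# After rotation, the real part of the integral is the modulus of the original one.
assert abs(rotated.real - abs(integral)) < 1e-12
assert abs(integral) <= integral_abs
```

Rotating by a phase leaves the integral of the absolute value unchanged, which is why the weak inequality applied to the rotated function yields the full one.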

Finally, we observe that the Lebesgue integral extends the Riemann integral, which is particularly useful when it comes to actually *computing* some of these integrals:

Exercise 11If is a Riemann integrable function on a compact interval , show that is also absolutely integrable, and that the Lebesgue integral (with Lebesgue measure restricted to ) coincides with the Riemann integral . Similarly if is Riemann integrable on a box .

** — 2. Expectation of random variables — **

We now translate the above notions of integration on measure spaces to the probabilistic setting.

A random variable $X$ taking values in the unsigned extended real line $[0,+\infty]$ is said to be *simple* if it takes on at most finitely many values. Equivalently, $X$ can be expressed as a finite unsigned linear combination

$$X = \sum_{i=1}^n a_i 1_{E_i}$$

of indicator random variables, where $a_1,\dots,a_n \in [0,+\infty]$ are unsigned and $E_1,\dots,E_n$ are events. We then define the *simple expectation* $\mathrm{Simp}\,\mathbf{E} X$ of $X$ to be the quantity

$$\mathrm{Simp}\,\mathbf{E} X := \sum_{i=1}^n a_i \mathbf{P}(E_i),$$

and one checks that this definition is independent of the choice of decomposition of $X$ into indicator random variables. Observe that if we model the random variable $X$ using a probability space $\Omega$, then the simple expectation of $X$ is precisely the simple integral of the corresponding unsigned simple function $X_\Omega$.

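Next, given an arbitrary unsigned random variable $X$ taking values in $[0,+\infty]$, modeled by a probability space $\Omega$, one defines its (unsigned) *expectation* $\mathbf{E} X$ as

$$\mathbf{E} X := \sup_{0 \le Y \le X} \mathrm{Simp}\,\mathbf{E} Y,$$

where $\mathrm{Simp}\,\mathbf{E} Y$ denotes the simple expectation of $Y$, and $Y$ ranges over all simple unsigned random variables modeled by the same space $\Omega$ such that $0 \le Y \le X$ is surely true. There is a subtle issue that this definition might in principle depend on the choice of $\Omega$ (because this affects the pool of available simple unsigned random variables $Y$), but this is not the case: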

Exercise 12Let be a random variable taking values in , and let be a simple unsigned random variable such that is surely true. Show that there exists a function taking on finitely many values such that is surely true. Conclude in particular that the above definition of expectation does not depend on the choice of model .

The expectation extends the simple expectation (thus for all simple unsigned ), and in terms of a probability space model , the expectation is precisely the unsigned integral of . The expectation of a random variable is also often referred to as the *mean*, particularly in applications connected to statistics. In some literature is also called the *expected value* of , but this is a somewhat misleading term as often one expects to deviate above or below .

A scalar random variable $X$ is said to be *absolutely integrable* if $\mathbf{E}|X| < \infty$, thus for instance any bounded random variable is absolutely integrable. If $X$ is real-valued and absolutely integrable, we define its expectation by the formula

$$\mathbf{E} X := \mathbf{E} X_1 - \mathbf{E} X_2,$$

where $X = X_1 - X_2$ is any representation of $X$ as the difference of unsigned absolutely integrable random variables $X_1, X_2$; one can check that this definition does not depend on the choice of representation and is thus well-defined. For complex-valued absolutely integrable $X$, we then define

$$\mathbf{E} X := \mathbf{E}\,\mathrm{Re}\,X + i\,\mathbf{E}\,\mathrm{Im}\,X.$$

In all of these cases, the expectation of is equal to the integral of the representation of in any probability space model; in the case that is given by a discrete probability model, one can check that this definition of expectation agrees with the one given in Notes 0. Using the former fact, we can translate the properties of integration already established to the probabilistic setting:

Proposition 13

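- (i) (Unsigned linearity) If $X, Y$ are unsigned random variables, and $c \in [0,+\infty]$ is a deterministic unsigned quantity, then $\mathbf{E}(X+Y) = \mathbf{E}X + \mathbf{E}Y$ and $\mathbf{E}(cX) = c\,\mathbf{E}X$. (Note that these identities hold even when $X, Y$ are not absolutely integrable.)
- (ii) (Complex linearity) If $X, Y$ are absolutely integrable random variables, and $c$ is a deterministic complex quantity, then $X+Y$ and $cX$ are also absolutely integrable, with $\mathbf{E}(X+Y) = \mathbf{E}X + \mathbf{E}Y$ and $\mathbf{E}(cX) = c\,\mathbf{E}X$.
- (iii) (Compatibility with probability) If $E$ is an event, then $\mathbf{E} 1_E = \mathbf{P}(E)$. In particular, $\mathbf{E} 1 = 1$.
- (iv) (Almost sure equivalence) If $X, Y$ are unsigned (resp. absolutely integrable) and $X = Y$ almost surely, then $\mathbf{E}X = \mathbf{E}Y$.
- (v) If $X$ is unsigned or absolutely integrable, then $\mathbf{E}|X| \ge 0$, with equality if and only if $X = 0$ almost surely.
- (vi) (Monotonicity) If $X, Y$ are unsigned or real-valued absolutely integrable, and $X \le Y$ almost surely, then $\mathbf{E}X \le \mathbf{E}Y$.
- (vii) (Markov inequality) If $X$ is unsigned or absolutely integrable, then $\mathbf{P}(|X| \ge \lambda) \le \frac{1}{\lambda}\mathbf{E}|X|$ for any deterministic $\lambda > 0$.
- (viii) (Triangle inequality) If $X$ is absolutely integrable, then $|\mathbf{E}X| \le \mathbf{E}|X|$.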

As before, we can use part (iv) to define expectation of scalar random variables that are only defined and finite almost surely, rather than surely.
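Several parts of the proposition can be sanity-checked on a discrete model with exact arithmetic. The following is only a sketch: the fair die roll and the helper names `expectation` and `probability` are assumptions introduced for illustration, and a finite check of this kind is not a proof.

```python
from fractions import Fraction

# Assumed model: a fair die roll, with X the number shown.
outcomes = {w: Fraction(1, 6) for w in range(1, 7)}

def expectation(g):
    """Expectation of the random variable w -> g(w) in the discrete model."""
    return sum((p * g(w) for w, p in outcomes.items()), Fraction(0))

def probability(pred):
    """Probability of an event, computed as the expectation of its indicator."""
    return expectation(lambda w: Fraction(1) if pred(w) else Fraction(0))

def X(w):
    return Fraction(w)

mean = expectation(X)

# Markov inequality: P(|X| >= lam) <= E|X| / lam for each tested lam > 0.
for lam in [1, 2, 3, 4, 5, 6, 10]:
    lhs = probability(lambda w: abs(X(w)) >= lam)
    rhs = expectation(lambda w: abs(X(w))) / lam
    assert lhs <= rhs

# Unsigned linearity: E(X + 2X) = E X + 2 E X.
assert expectation(lambda w: X(w) + 2 * X(w)) == mean + 2 * mean
```

Here `probability` is defined through the indicator, so compatibility with probability holds by construction; the Markov and linearity checks are the informative ones.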

From Exercise 12, the notion of expectation is automatically probabilistic in the same sense. Because of this, we will be easily able to manipulate expectations of random variables without having to explicitly mention an underlying probability space , and so one will now see such spaces fade from view starting from this point in the course.

** — 3. Exchanging limits with integrals or expectations — **

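When performing analysis on measure spaces, it is important to know if one can interchange a limit with an integral:

$$\lim_{n\to\infty}\int_\Omega f_n\,d\mu \stackrel{?}{=} \int_\Omega \lim_{n\to\infty} f_n\,d\mu.$$

Similarly, in probability theory, we often wish to interchange a limit with an expectation:

$$\lim_{n\to\infty}\mathbf{E} X_n \stackrel{?}{=} \mathbf{E}\lim_{n\to\infty} X_n.$$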

Of course, one needs the integrands or random variables to be either unsigned or absolutely integrable, and the limits to be well-defined to have any hope of doing this. Naively, one could hope that limits and integrals could always be exchanged when the expressions involved are well-defined, but this is unfortunately not the case. In the case of integration on, say, the real line using Lebesgue measure , we already see four key examples:

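- (Moving bump example) Take $f_n := 1_{[n,n+1]}$. Then $\lim_{n\to\infty}\int_{\mathbf{R}} f_n(x)\,dx = 1$, but $\int_{\mathbf{R}} \lim_{n\to\infty} f_n(x)\,dx = 0$.
- (Spreading bump example) Take $f_n := \frac{1}{n} 1_{[0,n]}$. Then $\lim_{n\to\infty}\int_{\mathbf{R}} f_n(x)\,dx = 1$, but $\int_{\mathbf{R}} \lim_{n\to\infty} f_n(x)\,dx = 0$.
- (Concentrating bump example) Take $f_n := n 1_{[\frac{1}{n},\frac{2}{n}]}$. Then $\lim_{n\to\infty}\int_{\mathbf{R}} f_n(x)\,dx = 1$, but $\int_{\mathbf{R}} \lim_{n\to\infty} f_n(x)\,dx = 0$.
- (Receding infinity example) Take $f_n := 1_{[n,\infty)}$. Then $\lim_{n\to\infty}\int_{\mathbf{R}} f_n(x)\,dx = \infty$, but $\int_{\mathbf{R}} \lim_{n\to\infty} f_n(x)\,dx = 0$.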

In all these examples, the limit of the integral exceeds the integral of the limit; by replacing with in the first three examples (which involve absolutely integrable functions) one can also build examples where the limit of the integral is less than the integral of the limit. Most of these examples rely on the infinite measure of the real line and thus do not directly have probabilistic analogues, but the concentrating bump example involves functions that are all supported on the unit interval and thus also poses a problem in the probabilistic setting.
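The behaviour of the first three bump examples can be tabulated exactly. This is a sketch under stated assumptions: the bumps are taken to be the standard step functions (an indicator of $[n,n+1]$, a height $1/n$ bump on $[0,n]$, and a height $n$ bump on $[1/n, 2/n]$), and the tuple representation and helper names are hypothetical. The receding infinity example is omitted because its support is unbounded.

```python
from fractions import Fraction

# Each f_n is stored as (height, left, right), meaning height * 1_[left, right].
# These specific formulas are an assumption of this sketch.
def moving(n):
    return (Fraction(1), Fraction(n), Fraction(n + 1))

def spreading(n):
    return (Fraction(1, n), Fraction(0), Fraction(n))

def concentrating(n):
    return (Fraction(n), Fraction(1, n), Fraction(2, n))

def integral(bump):
    """Integral of a step function: height times the length of its interval."""
    height, left, right = bump
    return height * (right - left)

def value(bump, x):
    """Value of the step function at the point x."""
    height, left, right = bump
    return height if left <= x <= right else Fraction(0)

x = Fraction(1, 3)  # an arbitrary fixed point
for family in (moving, spreading, concentrating):
    # The integral equals 1 for every n tested ...
    assert all(integral(family(n)) == 1 for n in range(1, 200))
    # ... while the values at the fixed point x tend to zero.
    print(family.__name__, [float(value(family(n), x)) for n in (1, 10, 100, 1000)])
```

In each family the mass stays equal to one but escapes (horizontally, by spreading out, or by concentrating at a point), so the pointwise limit carries none of it.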

Nevertheless, there are three important cases in which we can relate the limit (or, in the case of Fatou’s lemma, the limit inferior) of the integral to the integral of the limit (or limit inferior). Informally, they are:

- (Fatou’s lemma) For *unsigned* $f_n$, the integral of the limit inferior cannot exceed the limit inferior of the integral. “Limits (or more precisely, limits inferior) can destroy (unsigned) mass, but cannot create it.”
- (Monotone convergence theorem) For unsigned *monotone increasing* $f_n$, the limit of the integral equals the integral of the limit.
- (Dominated convergence theorem) For $f_n$ that are uniformly *dominated* by an absolutely integrable function, the limit of the integral equals the integral of the limit.

These three results then have analogues for convergence of random variables. We will also mention a fourth useful tool in that setting, which allows one to exchange limits and expectations when one controls a higher moment. There are a few more such general results allowing limits to be exchanged with integrals or expectations, but my advice would be to work out such exchanges by hand rather than blindly cite (possibly incorrectly) an additional convergence theorem beyond the four mentioned above, as this is safer and will help strengthen one’s intuition on the situation.

We now state and prove these results more explicitly.

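Lemma 14 (Fatou’s lemma). Let $(\Omega, \mathcal{F}, \mu)$ be a measure space, and let $f_1, f_2, \dots: \Omega \to [0,+\infty]$ be a sequence of unsigned measurable functions. Then

$$\int_\Omega \liminf_{n\to\infty} f_n\,d\mu \le \liminf_{n\to\infty}\int_\Omega f_n\,d\mu.$$

An equivalent form of this lemma is that if one has

$$\int_\Omega f_n\,d\mu \le M$$

for some $M$ and all sufficiently large $n$, then one has

$$\int_\Omega \liminf_{n\to\infty} f_n\,d\mu \le M$$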

as well. That is to say, if the original unsigned functions eventually have “mass” less than or equal to , then the limit (inferior) also has “mass” less than or equal to . The limit may have substantially less mass, as the four examples above show, but it can never have *more* mass (asymptotically) than the functions that comprise the limit. Of course, one can replace limit inferior by limit in the left or right hand side if one knows that the relevant limit actually exists (but one cannot replace limit inferior by limit superior if one does not already have convergence, see Example 16 below). On the other hand, it is essential that the are unsigned for Fatou’s lemma to work, as can be seen by negating one of the first three key examples mentioned above.

*Proof:* By definition of the unsigned integral, it suffices to show that

whenever is an unsigned simple function with . At present, is allowed to take the infinite , but it suffices to establish this claim for that only take finite values, since the claim then follows for possibly infinite-valued by applying the claim with replaced by and then letting go to infinity.

Multiplying by , it thus suffices to show that

for any and any unsigned as above.

We can write as the sum for some strictly positive finite and disjoint ; we allow the and the measures to be infinite. On each , we have . Thus, if we define

then the increase to as : . By continuity from below (Exercise 23 of Notes 0), we thus have

as . Since

we conclude upon integration that

and thus on taking limit inferior

But the right-hand side is , and the claim follows.

Of course, Fatou’s lemma may be phrased probabilistically:

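Lemma 15 (Fatou’s lemma for random variables). Let $X_1, X_2, \dots$ be a sequence of unsigned random variables. Then

$$\mathbf{E}\liminf_{n\to\infty} X_n \le \liminf_{n\to\infty}\mathbf{E} X_n.$$

As a corollary, if $X_1, X_2, \dots$ are unsigned and converge almost surely to a random variable $X$, then

$$\mathbf{E} X \le \liminf_{n\to\infty}\mathbf{E} X_n.$$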

Example 16We now give an example to show that limit inferior cannot be replaced with limit superior in Fatou’s lemma. Let be drawn uniformly at random from , and for each , let be the binary digit of , thus when has odd integer part, and otherwise. (There is some ambiguity with the binary expansion when is a terminating binary decimal, but this event almost surely does not occur and can thus be safely ignored.) One has for all (why?). It is then easy to see that is almost surely (which is consistent with Fatou’s lemma) but is almost surely (so Fatou’s lemma fails if one replaces limit inferior with limit superior).

Next, we establish the monotone convergence theorem.

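Theorem 17 (Monotone convergence theorem). Let $(\Omega, \mathcal{F}, \mu)$ be a measure space, and let $f_1, f_2, \dots: \Omega \to [0,+\infty]$ be a sequence of unsigned measurable functions which is monotone increasing, thus $f_n(\omega) \le f_{n+1}(\omega)$ for all $n$ and $\omega \in \Omega$. Then

$$\lim_{n\to\infty}\int_\Omega f_n\,d\mu = \int_\Omega \lim_{n\to\infty} f_n\,d\mu.$$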

Note that the limits exist on both sides because monotone sequences always have limits. Indeed the limit in either side is equal to the supremum. The receding infinity example shows that it is important that the functions here are monotone increasing rather than monotone decreasing. We also observe that it is enough for the to be increasing almost everywhere rather than everywhere, since one can then modify the on a set of measure zero to be increasing everywhere, which does not affect the integrals on either side of this theorem.

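*Proof:* From Fatou’s lemma we already have

$$\int_\Omega \lim_{n\to\infty} f_n\,d\mu \le \lim_{n\to\infty}\int_\Omega f_n\,d\mu.$$

On the other hand, from monotonicity we see that

$$\int_\Omega f_m\,d\mu \le \int_\Omega \lim_{n\to\infty} f_n\,d\mu$$

for any natural number $m$, and on taking limits as $m \to \infty$ we obtain the claim.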

Note that continuity from below for measures (Exercise 23.3 of Notes 0) can be viewed as the special case of the monotone convergence theorem when the functions are all indicator functions.

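An important corollary of the monotone convergence theorem is that one can freely interchange infinite sums with integrals for unsigned functions, that is to say

$$\int_\Omega \sum_{n=1}^\infty f_n\,d\mu = \sum_{n=1}^\infty \int_\Omega f_n\,d\mu$$

for any unsigned $f_1, f_2, \dots: \Omega \to [0,+\infty]$ (not necessarily monotone). Indeed, to see this one simply applies the monotone convergence theorem to the partial sums $F_N := \sum_{n=1}^N f_n$.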

We of course can translate this into the probabilistic context:

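Theorem 18 (Monotone convergence theorem for random variables). Let $0 \le X_1 \le X_2 \le \dots$ be a monotone non-decreasing sequence of unsigned random variables. Then

$$\lim_{n\to\infty}\mathbf{E} X_n = \mathbf{E}\lim_{n\to\infty} X_n.$$

Similarly, for any unsigned random variables $X_1, X_2, \dots$, we have

$$\mathbf{E}\sum_{n=1}^\infty X_n = \sum_{n=1}^\infty \mathbf{E} X_n.$$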

Again, it is sufficient for the to be non-decreasing almost surely. We note a basic but important corollary of this theorem, namely the (first) Borel-Cantelli lemma:

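Lemma 19 (Borel-Cantelli lemma). Let $E_1, E_2, \dots$ be a sequence of events with $\sum_{n=1}^\infty \mathbf{P}(E_n) < \infty$. Then almost surely, at most finitely many of the events $E_n$ hold; that is to say, one has $\sum_{n=1}^\infty 1_{E_n} < \infty$ almost surely.

*Proof:* From the monotone convergence theorem, we have

$$\mathbf{E}\sum_{n=1}^\infty 1_{E_n} = \sum_{n=1}^\infty \mathbf{P}(E_n) < \infty.$$

By Markov’s inequality, this implies that $\sum_{n=1}^\infty 1_{E_n}$ is almost surely finite, as required.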
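The two ingredients of this proof (expectation of the count equals the sum of the probabilities, then Markov's inequality on the count) can be checked exactly on a finite truncation. This is a sketch only: the events $E_n = (U < 2^{-n})$, the dyadic grid standing in for a uniform variable $U$, and the truncation to $m$ events are all assumptions made so that everything is a finite exact computation.

```python
from fractions import Fraction

m = 10                      # truncation level (an assumption of this sketch)
grid = range(2 ** m)        # sample points k, standing for U = k / 2**m
p = Fraction(1, 2 ** m)     # each sample point has the same probability

def holds(n, k):
    """Event E_n: U < 2**-n, where U = k / 2**m."""
    return k < 2 ** (m - n)

def count(k):
    """Number of the events E_1, ..., E_m that hold at the sample point k."""
    return sum(1 for n in range(1, m + 1) if holds(n, k))

# Expectation of the count versus the sum of the individual probabilities.
expected_count = sum(p * count(k) for k in grid)
sum_of_probs = sum(p * sum(1 for k in grid if holds(n, k)) for n in range(1, m + 1))
assert expected_count == sum_of_probs

# Markov inequality applied to the count: P(count >= t) <= E[count] / t.
for t in range(1, m + 1):
    tail = sum(p for k in grid if count(k) >= t)
    assert tail <= expected_count / t
```

In the genuinely infinite setting the first identity is where the monotone convergence theorem is needed; the finite truncation only illustrates the bookkeeping.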

As the above proof shows, the Borel-Cantelli lemma is almost a triviality if one has the machinery of expectation (or integration); but it is remarkably hard to prove the lemma without that machinery, and it is an instructive exercise to attempt to do so.

We will develop a partial converse to the above lemma (the “second” Borel-Cantelli lemma) in a subsequent set of notes. For now, we give a crude converse in which we assume not only that the $\mathbf{P}(E_n)$ sum to infinity, but that they are in fact uniformly bounded from below:

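Exercise 20 Let $E_1, E_2, \dots$ be a sequence of events with $\inf_n {\bf P}(E_n) > 0$. Show that with positive probability, an infinite number of the $E_n$ hold; that is to say, ${\bf P}( \sum_{n=1}^\infty 1_{E_n} = \infty ) > 0$. (Hint: if ${\bf P}(E_n) \geq \delta > 0$ for all $n$, establish the lower bound ${\bf P}( \sum_{n \leq N} 1_{E_n} \geq \delta N/2 ) \geq \delta/2$ for all $N$. Alternatively, one can apply Fatou’s lemma to the random variables $1_{\overline{E_n}}$.)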

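Exercise 21 Let $p_1, p_2, \dots \in [0,1]$ be a sequence such that $\sum_{n=1}^\infty p_n = +\infty$. Show that there exists a sequence of events $E_1, E_2, \dots$ modeled by some probability space $\Omega$, such that ${\bf P}(E_n) = p_n$ for all $n$, and such that almost surely infinitely many of the $E_n$ occur. Thus we see that the hypothesis $\sum_{n=1}^\infty {\bf P}(E_n) < \infty$ in the Borel-Cantelli lemma cannot be relaxed.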

Finally, we give the dominated convergence theorem.

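Theorem 22 (Dominated convergence theorem) Let $(\Omega, {\mathcal F}, \mu)$ be a measure space, and let $f_1, f_2, \dots: \Omega \to {\bf C}$ be measurable functions which converge pointwise to some limit. Suppose that there is an unsigned absolutely integrable function $g: \Omega \to [0,+\infty]$ which *dominates* the $f_n$ in the sense that $|f_n(\omega)| \leq g(\omega)$ for all $n$ and all $\omega \in \Omega$. Then

$$\int_\Omega \lim_{n \to \infty} f_n\ d\mu = \lim_{n \to \infty} \int_\Omega f_n\ d\mu.$$

In particular, the limit on the right-hand side exists.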

Again, it will suffice for $g$ to dominate each $f_n$ almost everywhere rather than everywhere, as one can upgrade this to everywhere domination by modifying each $f_n$ on a set of measure zero. Similarly, pointwise convergence can be replaced with pointwise convergence almost everywhere. The domination of each $f_n$ by a single function $g$ implies that the integrals $\int_\Omega |f_n|\ d\mu$ are uniformly bounded in $n$, but this latter condition is not sufficient by itself to guarantee interchangeability of the limit and integral, as can be seen by the first three examples given at the start of this section.

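*Proof:* By splitting into real and imaginary parts, we may assume without loss of generality that the $f_n$ are real-valued. As $g$ is absolutely integrable, it is finite almost everywhere; after modification on a set of measure zero we may assume it is finite everywhere. Let $f$ denote the pointwise limit of the $f_n$. From Fatou’s lemma applied to the unsigned functions $g - f_n$ and $g + f_n$, we have

$$\int_\Omega (g - f)\ d\mu \leq \liminf_{n \to \infty} \int_\Omega (g - f_n)\ d\mu$$

and

$$\int_\Omega (g + f)\ d\mu \leq \liminf_{n \to \infty} \int_\Omega (g + f_n)\ d\mu.$$

Rearranging this (taking crucial advantage of the finite nature of the $\int_\Omega g\ d\mu$, and hence $\int_\Omega f\ d\mu$ and $\int_\Omega f_n\ d\mu$), we conclude that

$$\limsup_{n \to \infty} \int_\Omega f_n\ d\mu \leq \int_\Omega f\ d\mu \leq \liminf_{n \to \infty} \int_\Omega f_n\ d\mu$$

and the claim follows.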

Remark 23 Amusingly, one can use the dominated convergence theorem to give an (extremely indirect) proof of the divergence of the harmonic series $\sum_{n=1}^\infty \frac{1}{n}$. For, if that series was convergent, then the function $\sum_{n=1}^\infty \frac{1}{n} 1_{[n-1,n]}$ would be absolutely integrable, and the spreading bump example described above would contradict the dominated convergence theorem. (Expert challenge: see if you can deconstruct the above argument enough to lower bound the rate of divergence of the harmonic series $\sum_{n=1}^\infty \frac{1}{n}$.)

We again translate the above theorem to the probabilistic context:

Theorem 24 (Dominated convergence theorem for random variables) Let $X_1, X_2, \dots$ be scalar random variables which converge almost surely to a limit $X$. Suppose there is an unsigned absolutely integrable random variable $Y$ such that $|X_n| \leq Y$ almost surely for each $n$. Then

$$\lim_{n \to \infty} {\bf E} X_n = {\bf E} X.$$

As a corollary of the dominated convergence theorem for random variables we have the *bounded convergence theorem*: if $X_1, X_2, \dots$ are scalar random variables that converge almost surely to a limit $X$, and are almost surely bounded in magnitude by a uniform constant $M$, then we have

$$\lim_{n \to \infty} {\bf E} X_n = {\bf E} X.$$

(In Durrett, the bounded convergence theorem is proven first, and then used to establish Fatou’s lemma and the dominated and monotone convergence theorems. The order in which one establishes these results – which are all closely related to each other – is largely a matter of personal taste.) A closely related corollary (which can also be established directly) is that if $X_1, X_2, \dots$ are scalar absolutely integrable random variables that converge *uniformly* to $X$ (thus, for each $\varepsilon > 0$ there is $N$ such that $|X_n - X| \leq \varepsilon$ is surely true for all $n \geq N$), then ${\bf E} X_n$ converges to ${\bf E} X$.

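A further corollary of the dominated convergence theorem is that one has the identity

$${\bf E} \sum_{n=1}^\infty X_n = \sum_{n=1}^\infty {\bf E} X_n$$

whenever $X_1, X_2, \dots$ are scalar random variables with $\sum_{n=1}^\infty |X_n|$ absolutely integrable (or equivalently, that $\sum_{n=1}^\infty {\bf E} |X_n|$ is finite).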

Another useful variant of the dominated convergence theorem is

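Theorem 25 (Convergence for random variables with bounded moment) Let $X_1, X_2, \dots$ be scalar random variables which converge almost surely to a limit $X$. Suppose there is $\varepsilon > 0$ and $M > 0$ such that ${\bf E} |X_n|^{1+\varepsilon} \leq M$ for all $n$. Then

$$\lim_{n \to \infty} {\bf E} X_n = {\bf E} X.$$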

This theorem fails for $\varepsilon = 0$, as the concentrating bump example shows. The case $\varepsilon = 1$ (that is to say, bounded second moment ${\bf E} |X_n|^2$) is already quite useful. The intuition here is that concentrating bumps are in some sense the *only* obstruction to interchanging limits and expectations, and these can be eliminated by hypotheses such as a bounded higher moment hypothesis or a domination hypothesis.
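The role of the moment hypothesis can be made concrete with a short computation. The sketch below assumes the concentrating bump is modeled as $X_n := n 1_{U < 1/n}$ with $U$ uniform on $[0,1]$ (an illustrative choice), and compares it with a hypothetical "tamed" bump $Y_n := \sqrt{n} 1_{U < 1/n}$; both converge almost surely to zero. The helper `bump_moment` is not from the notes.

```python
def bump_moment(height: float, width: float, power: float) -> float:
    """E|X|^power for X = height * 1_{U < width}, with U uniform on [0, 1]."""
    return (height ** power) * width

ns = [10, 100, 1000, 10000]

# Concentrating bump X_n = n * 1_{U < 1/n}: E X_n stays at 1 although X_n -> 0 a.s.
mean_bump = [bump_moment(n, 1 / n, 1) for n in ns]
second_moment_bump = [bump_moment(n, 1 / n, 2) for n in ns]   # unbounded, grows like n

# Tamed bump Y_n = sqrt(n) * 1_{U < 1/n}: second moment stays bounded.
mean_tamed = [bump_moment(n ** 0.5, 1 / n, 1) for n in ns]    # n^(-1/2), tends to 0
second_moment_tamed = [bump_moment(n ** 0.5, 1 / n, 2) for n in ns]

for n, a, b, c, d in zip(ns, mean_bump, second_moment_bump, mean_tamed, second_moment_tamed):
    print(f"n={n}: E X_n={a:.3f}, E X_n^2={b:.1f}, E Y_n={c:.4f}, E Y_n^2={d:.3f}")
```

With a bounded second moment (the case $\varepsilon = 1$) the expectations do converge to the expectation of the limit, while for the unbounded family they do not.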

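*Proof:* By taking real and imaginary parts we may assume that the $X_n$ (and hence $X$) are real-valued. For any natural number $m$, let $X_n^{(m)} := \max(\min(X_n, m), -m)$ denote the truncation of $X_n$ to the interval $[-m,m]$, and similarly define $X^{(m)} := \max(\min(X, m), -m)$. Then $X_n^{(m)}$ converges pointwise to $X^{(m)}$, and hence by the bounded convergence theorem

$$\lim_{n \to \infty} {\bf E} X_n^{(m)} = {\bf E} X^{(m)}.$$

On the other hand, we have

$$|X_n - X_n^{(m)}| \leq m^{-\varepsilon} |X_n|^{1+\varepsilon}$$

(why?) and thus on taking expectations and using the triangle inequality

$${\bf E} X_n = {\bf E} X_n^{(m)} + O( m^{-\varepsilon} M )$$

where we are using the asymptotic notation $O(X)$ to denote a quantity bounded in magnitude by $CX$ for an absolute constant $C$. Also, from Fatou’s lemma we have

$${\bf E} |X|^{1+\varepsilon} \leq M$$

so we similarly have

$${\bf E} X = {\bf E} X^{(m)} + O( m^{-\varepsilon} M ).$$

Putting all this together, we see that

$$\liminf_{n \to \infty} {\bf E} X_n, \limsup_{n \to \infty} {\bf E} X_n = {\bf E} X + O( m^{-\varepsilon} M ).$$

Sending $m \to \infty$, we obtain the claim.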

Remark 26 The essential point about the condition ${\bf E} |X_n|^{1+\varepsilon} \leq M$ was that the function $x \mapsto x^{1+\varepsilon}$ grew faster than linearly as $x \to \infty$. One could accomplish the same result with any other function with this property, e.g. a hypothesis such as ${\bf E} |X_n| \log |X_n| \leq M$ would also suffice. The most natural general condition to impose here is that of uniform integrability, which encompasses the hypotheses already mentioned, but we will not focus on this condition here.

Exercise 27 (Scheffé’s lemma) Let $X_1, X_2, \dots$ be a sequence of absolutely integrable scalar random variables that converge almost surely to another absolutely integrable scalar random variable $X$. Suppose also that ${\bf E} |X_n|$ converges to ${\bf E} |X|$ as $n \to \infty$. Show that ${\bf E} |X_n - X|$ converges to zero as $n \to \infty$. (Hint: there are several ways to prove this result, known as Scheffé’s lemma. One is to split $X_n$ into two components $X_n = X_{n,1} + X_{n,2}$, such that $X_{n,1}$ is dominated by $|X|$ but converges almost surely to $X$, and $X_{n,2}$ is such that $|X_n| = |X_{n,1}| + |X_{n,2}|$. Then apply the dominated convergence theorem.)

** — 4. The distribution of a random variable — **

We have seen that the expectation of a random variable is a special case of the more general notion of Lebesgue integration on a measure space. There is however another way to think of expectation as a special case of integration, which is particularly convenient for *computing* expectations. We first need the following definition.

Definition 28 Let $X$ be a random variable taking values in a measurable space $R = (R, {\mathcal B})$. The distribution of $X$ (also known as the *law* of $X$) is the probability measure $\mu_X$ on $R$ defined by the formula

$$\mu_X(S) := {\bf P}( X \in S )$$

for all measurable sets $S \in {\mathcal B}$; one easily sees from the Kolmogorov axioms that this is indeed a probability measure.

In the language of measure theory, the distribution $\mu_X$ on $R$ is the push-forward of the probability measure ${\bf P}$ on the sample space $\Omega$ by the model $X_\Omega: \Omega \to R$ of $X$ on that sample space.

Example 29 If $X$ only takes on at most countably many values (and if every point in $R$ is measurable), then the distribution $\mu_X$ is the discrete measure that assigns each point $a$ in the range of $X$ a measure of ${\bf P}(X = a)$.

Example 30 If $X$ is a real random variable with cumulative distribution function $F$, then $\mu_X$ is the Lebesgue-Stieltjes measure $dF$ associated to $F$. For instance, if $X$ is drawn uniformly at random from $[0,1]$, then $\mu_X$ is Lebesgue measure restricted to $[0,1]$. In particular, two scalar variables are equal in distribution if and only if they have the same cumulative distribution function.

Example 31 If $X$ and $Y$ are the results of two separate rolls of a fair die (as in Example 3 of Notes 0), then $X$ and $Y$ are equal in distribution, but are not equal as random variables.

Remark 32 In the converse direction, given a probability measure $\mu$ on a measurable space $(R, {\mathcal B})$, one can always build a probability space model and a random variable $X$ represented by that model whose distribution is $\mu$. Indeed, one can perform the “tautological” construction of defining the probability space model to be $\Omega := (R, {\mathcal B}, \mu)$, and $X: \Omega \to R$ to be the identity function $X(\omega) := \omega$, and then one easily checks that $\mu_X = \mu$. Compare with Corollaries 26 and 29 of Notes 0. Furthermore, one can view this tautological model as a “base” model for random variables of distribution $\mu$ as follows. Suppose one has a random variable $X$ of distribution $\mu$ which is modeled by some other probability space $\Omega' = (\Omega', {\mathcal F}', {\bf P}')$, thus $X_{\Omega'}: \Omega' \to R$ is a measurable function such that

$$\mu(S) = {\bf P}'( \{ \omega' \in \Omega': X_{\Omega'}(\omega') \in S \} )$$

for all $S \in {\mathcal B}$. Then one can view the probability space $\Omega'$ as an extension of the tautological probability space $\Omega$ using $X_{\Omega'}$ as the factor map.

We say that two random variables $X, Y$ are *equal in distribution*, and write $X \equiv Y$, if they have the same law: $\mu_X = \mu_Y$, that is to say ${\bf P}(X \in S) = {\bf P}(Y \in S)$ for any measurable set $S$ in the range. This definition makes sense even when $X, Y$ are defined on different sample spaces. Roughly speaking, the distribution captures the “size” and “shape” of the random variable, but not its “location” or how it relates to other random variables. We also say that $Y$ is a *copy* of $X$ if they are equal in distribution. For instance, the two dice rolls in Example 3 of Notes 0 are copies of each other.
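The distinction between equality in distribution and equality as random variables can be checked by brute force on the two-dice example. The sketch below assumes the sample space of ordered pairs of faces with the uniform probability measure; the helper `law` is a hypothetical name for the map sending a random variable to its distribution.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs of die faces, each with probability 1/36.
omega = list(product(range(1, 7), repeat=2))
prob = Fraction(1, len(omega))

def law(variable):
    """Distribution of `variable`: maps each value a to P(variable = a)."""
    dist = Counter()
    for w in omega:
        dist[variable(w)] += prob
    return dict(dist)

X = lambda w: w[0]  # first roll
Y = lambda w: w[1]  # second roll

same_law = law(X) == law(Y)                             # equal in distribution
prob_equal = sum(prob for w in omega if X(w) == Y(w))   # P(X = Y)

print("same law:", same_law)
print("P(X = Y):", prob_equal)
```

The two rolls have identical laws, yet they agree only on a fraction of the sample space, so they are copies of each other without being equal.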

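Theorem 33 (Change of variables formula) Let $X$ be a random variable taking values in a measurable space $R = (R, {\mathcal B})$. Let $f: R \to {\bf R}$ or $f: R \to {\bf C}$ be a measurable scalar function (giving ${\bf R}$ or ${\bf C}$ the Borel $\sigma$-algebra of course) such that either $f \geq 0$, or that ${\bf E} |f(X)| < \infty$. Then

$${\bf E} f(X) = \int_R f(x)\ d\mu_X(x).$$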

Thus for instance, if $X$ is a real random variable, then

$${\bf E} |X| = \int_{\bf R} |x|\ d\mu_X(x)$$

and more generally

$${\bf E} |X|^p = \int_{\bf R} |x|^p\ d\mu_X(x)$$

for all $0 < p < \infty$; furthermore, if $X$ is unsigned or absolutely integrable, one has

$${\bf E} X = \int_{\bf R} x\ d\mu_X(x).$$

The point here is that the integration is not over some unspecified sample space $\Omega$, but over a very explicit domain, namely the reals; we have “changed variables” to integrate over ${\bf R}$ instead of over $\Omega$, with the distribution $\mu_X$ representing the “Jacobian” factor that typically shows up in such change of variables formulae.

If $X$ is a scalar variable that only takes on at most countably many values $x_1, x_2, \dots$, the change of variables formula tells us that

$${\bf E} X = \sum_i x_i {\bf P}(X = x_i)$$

if $X$ is unsigned or absolutely integrable.
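In the discrete setting the change of variables formula is just a regrouping of a finite sum, which the following sketch verifies in exact arithmetic. It assumes (for illustration only) that $S$ is the sum of two fair dice and that $f(s) = s^2$; the names `mu_S`, `lhs` and `rhs` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space of two dice rolls with the uniform probability measure.
omega = list(product(range(1, 7), repeat=2))
prob = Fraction(1, 36)

S = lambda w: w[0] + w[1]   # the random variable: sum of the two rolls
f = lambda s: s * s         # the test function

# Left-hand side: integrate f(S) over the sample space.
lhs = sum(f(S(w)) * prob for w in omega)

# Right-hand side: integrate f against the distribution mu_S on {2, ..., 12}.
mu_S = {}
for w in omega:
    mu_S[S(w)] = mu_S.get(S(w), Fraction(0)) + prob
rhs = sum(f(s) * mass for s, mass in mu_S.items())

print("E f(S) over the sample space:", lhs)
print("integral of f against mu_S:  ", rhs)
```

The second computation never refers to the sample space, only to the eleven atoms of the distribution, which is the practical advantage of the formula.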

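*Proof:* First suppose that $f$ is unsigned and only takes on a finite number $a_1, \dots, a_n$ of values. Then

$$f(X) = \sum_{i=1}^n a_i 1_{f(X) = a_i}$$

and hence

$${\bf E} f(X) = \sum_{i=1}^n a_i {\bf P}( f(X) = a_i ) = \sum_{i=1}^n a_i \mu_X( \{ x \in R: f(x) = a_i \} ) = \int_R f(x)\ d\mu_X(x)$$

as required.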

Next, suppose that $f$ is unsigned but can take on infinitely many values. We can express $f$ as the monotone increasing limit of functions $f_n$ that only take a finite number of values; for instance we can define $f_n$ to be $f$ rounded down to the largest multiple of $2^{-n}$ that does not exceed either $n$ or $f$. By the preceding computation, we have

$${\bf E} f_n(X) = \int_R f_n(x)\ d\mu_X(x)$$

and on taking limits as $n \to \infty$ using the monotone convergence theorem we obtain the claim in this case.

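Now suppose that $f$ is real-valued with ${\bf E} |f(X)| < \infty$. We write $f = f_1 - f_2$ where $f_1 := \max(f, 0)$ and $f_2 := \max(-f, 0)$, then we have ${\bf E} f_1(X), {\bf E} f_2(X) < \infty$ and

$${\bf E} f_i(X) = \int_R f_i(x)\ d\mu_X(x)$$

for $i = 1, 2$. Subtracting these two identities, we obtain the claim.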

Finally, the case of complex-valued $f$ with ${\bf E} |f(X)| < \infty$ follows from the real-valued case by taking real and imaginary parts.

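Example 34 Let $X$ have the uniform distribution on $[0,1]$, then

$${\bf E} f(X) = \int_0^1 f(x)\ dx$$

for any Riemann integrable $f$; thus for instance

$${\bf E} X^p = \frac{1}{p+1}$$

for any $0 < p < \infty$.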

Remark 35 An alternate way to prove the change of variables formula is to observe that the formula is obviously true when one uses the tautological model $(R, {\mathcal B}, \mu_X)$ for $X$, and then the claim follows from the model-independence of expectation and the observation from Remark 32 that any other model for $X$ is an extension of the tautological model.

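Exercise 36 Let $f: {\bf R} \to [0,+\infty]$ be a measurable function with $\int_{\bf R} f(x)\ dx = 1$. If one defines $m_f(E)$ for any Borel subset $E$ of ${\bf R}$ by the formula

$$m_f(E) := \int_E f(x)\ dx,$$

show that $m_f$ is a probability measure on ${\bf R}$ with Stieltjes measure function $F(t) = \int_{-\infty}^t f(x)\ dx$. If $X$ is a real random variable with probability distribution $m_f$ (in which case we call $X$ a random variable with an absolutely continuous distribution, and $f$ the probability density function (PDF) of $X$), show that

$${\bf E} G(X) = \int_{\bf R} G(x) f(x)\ dx$$

when either $G: {\bf R} \to [0,+\infty]$ is an unsigned measurable function, or $G: {\bf R} \to {\bf C}$ is measurable with $G(X)$ absolutely integrable (or equivalently, that $\int_{\bf R} |G(x)| f(x)\ dx < \infty$).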

Exercise 37 Let $X$ be a real random variable with the probability density function $\frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ of the standard normal distribution. Establish the *Stein identity*

$${\bf E} X F(X) = {\bf E} F'(X)$$

whenever $F: {\bf R} \to {\bf R}$ is a continuously differentiable function with $F$ and $F'$ both of polynomial growth (i.e., there exist constants $C, A > 0$ such that $|F(x)|, |F'(x)| \leq C (1+|x|)^A$ for all $x \in {\bf R}$). There is a robust converse to this identity which underpins the basis of Stein’s method, discussed in this previous blog post. Use this identity recursively to establish the identities

$${\bf E} X^k = 0$$

when $k$ is an odd natural number and

$${\bf E} X^k = (k-1)(k-3) \cdots 3 \cdot 1$$

when $k$ is an even natural number. (This quantity is also known as the double factorial $(k-1)!!$ of $k-1$.)
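Before attempting the exercise, one can check the claimed moments numerically. The following sketch approximates ${\bf E} X^k$ by a midpoint rule against the standard normal density; the integration window `half_width`, the step count, and the function names are illustrative assumptions, and the check is of course no substitute for the proof via the Stein identity.

```python
import math

def gaussian_moment(k: int, half_width: float = 12.0, steps: int = 20_000) -> float:
    """Approximate E X^k for standard normal X by the midpoint rule on [-half_width, half_width]."""
    h = 2 * half_width / steps
    total = 0.0
    for i in range(steps):
        x = -half_width + (i + 0.5) * h          # midpoint of the i-th cell
        total += x**k * math.exp(-x * x / 2)
    return total * h / math.sqrt(2 * math.pi)

def double_factorial(m: int) -> int:
    """m!! = m (m-2) (m-4) ..., with the empty product equal to 1."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result

for k in range(1, 7):
    predicted = 0 if k % 2 else double_factorial(k - 1)
    print(k, round(gaussian_moment(k), 6), predicted)
```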

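Exercise 38 Let $X$ be a real random variable with cumulative distribution function $F_X(x) := {\bf P}(X \leq x)$. Show that

$${\bf E} e^{tX} = \int_{\bf R} (1 - F_X(x)) t e^{tx}\ dx$$

for all $t > 0$. If $X$ is nonnegative, show that

$${\bf E} X^p = \int_0^\infty (1 - F_X(x)) p x^{p-1}\ dx$$

for all $p > 0$.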

** — 5. Some basic inequalities — **

The change of variables formula allows us, in principle at least, to compute the expectation ${\bf E} X$ of a scalar random variable $X$ as an integral. In very simple situations, for instance when $X$ has one of the standard distributions (e.g. uniform, Gaussian, binomial, etc.), this allows us to compute such expectations exactly. However, once one gets to more complicated situations, one usually does not expect to be able to evaluate the required integrals in closed form. In such situations, it is often more useful to have some general *inequalities* concerning expectation, rather than *identities*.

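We therefore record here for future reference some basic inequalities concerning expectation that we will need in the sequel. We have already seen the triangle inequality

$$|{\bf E} X| \leq {\bf E} |X|$$

for absolutely integrable $X$, and the Markov inequality

$${\bf P}( |X| \geq t ) \leq \frac{1}{t} {\bf E} |X|$$

for arbitrary scalar $X$ and $t > 0$ (note the inequality is trivial if $X$ is not absolutely integrable). Applying the triangle inequality to the difference $X - Y$ of two absolutely integrable random variables $X, Y$, we obtain the variant

$$|{\bf E} X - {\bf E} Y| \leq {\bf E} |X - Y|.$$

Thus, for instance, if $X_n$ is a sequence of absolutely integrable scalar random variables which converges in $L^1$ to another absolutely integrable random variable $X$, in the sense that ${\bf E} |X_n - X| \to 0$ as $n \to \infty$, then ${\bf E} X_n \to {\bf E} X$ as $n \to \infty$.

Similarly, applying the Markov inequality to the quantity $|X - {\bf E} X|^2$ we obtain the important Chebyshev inequality

$${\bf P}( |X - {\bf E} X| \geq t ) \leq \frac{{\bf Var}(X)}{t^2}$$

for absolutely integrable $X$ and $t > 0$, where the variance ${\bf Var}(X)$ of $X$ is defined as

$${\bf Var}(X) := {\bf E} |X - {\bf E} X|^2.$$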

Next, we record

Lemma 39 (Jensen’s inequality) If $f: {\bf R} \to {\bf R}$ is a convex function, and $X$ is a real random variable with $X$ and $f(X)$ both absolutely integrable, then

$$f({\bf E} X) \leq {\bf E} f(X).$$

*Proof:* Let $x_0$ be a real number. Being convex, the graph of $f$ must be supported by some line at $(x_0, f(x_0))$, that is to say there exists a slope $c$ (depending on $x_0$) such that $f(x) \geq f(x_0) + c(x - x_0)$ for all $x \in {\bf R}$. (If $f$ is differentiable at $x_0$, one can take $c$ to be the derivative of $f$ at $x_0$, but one always has a supporting line even in the non-differentiable case.) In particular

$$f(X) \geq f(x_0) + c(X - x_0).$$

Taking expectations and using linearity of expectation, we conclude

$${\bf E} f(X) \geq f(x_0) + c({\bf E} X - x_0)$$

and the claim follows from setting $x_0 := {\bf E} X$.
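The Markov, Chebyshev and Jensen inequalities recorded so far can be verified exactly on a small example. The sketch below assumes, purely for illustration, that $X$ is a fair six-sided die, and uses the convex function $f(x) = x^2$ for Jensen; the helpers `expect` and `prob` are hypothetical names.

```python
from fractions import Fraction

# Hypothetical example: X is a fair six-sided die, so each value has mass 1/6.
values = range(1, 7)
mass = Fraction(1, 6)

def expect(g):
    """E g(X), computed as a finite sum against the distribution of X."""
    return sum(g(x) * mass for x in values)

def prob(event):
    """P(event(X)) for a predicate `event` on the values of X."""
    return sum(mass for x in values if event(x))

mean = expect(lambda x: x)
variance = expect(lambda x: (x - mean) ** 2)

t = 2
markov_ok = prob(lambda x: abs(x) >= 5) <= expect(abs) / 5
chebyshev_ok = prob(lambda x: abs(x - mean) >= t) <= variance / t**2
jensen_ok = mean**2 <= expect(lambda x: x**2)   # f(x) = x^2 is convex

print("mean:", mean, "variance:", variance)
print("Markov:", markov_ok, "Chebyshev:", chebyshev_ok, "Jensen:", jensen_ok)
```

None of the three bounds is close to sharp here, which is typical: these inequalities trade precision for generality.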

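Exercise 40 (Complex Jensen inequality) Let $f: {\bf C} \to {\bf R}$ be a convex function (thus $f((1-t) z + t w) \leq (1-t) f(z) + t f(w)$ for all complex $z, w$ and all $0 \leq t \leq 1$), and let $X$ be a complex random variable with $X$ and $f(X)$ both absolutely integrable. Show that

$$f({\bf E} X) \leq {\bf E} f(X).$$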

Note that the triangle inequality is the special case of Jensen’s inequality (or the complex Jensen’s inequality, if $X$ is complex-valued) corresponding to the convex function $x \mapsto |x|$ on ${\bf R}$ (or $z \mapsto |z|$ on ${\bf C}$). Another useful example is

$$|{\bf E} X|^2 \leq {\bf E} |X|^2.$$

Applying Jensen’s inequality to the convex function $f(x) := e^x$ and the random variable $\log X$ for some $X > 0$, we obtain the arithmetic mean-geometric mean inequality

$$\exp( {\bf E} \log X ) \leq {\bf E} X$$

assuming that $X$ and $\log X$ are absolutely integrable.

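As a related application of convexity, observe from the convexity of the function $x \mapsto e^x$ that

$$e^{tx + (1-t)y} \leq t e^x + (1-t) e^y$$

for any $0 \leq t \leq 1$ and $x, y \in {\bf R}$. This implies in particular Young’s inequality

$$|X| |Y| \leq \frac{1}{p} |X|^p + \frac{1}{q} |Y|^q$$

for any scalar $X, Y$ and any exponents $1 < p, q < \infty$ with $\frac{1}{p} + \frac{1}{q} = 1$; note that this inequality is also trivially true if one or both of $|X|, |Y|$ are infinite. Taking expectations, we conclude that

$${\bf E} |X| |Y| \leq \frac{1}{p} {\bf E} |X|^p + \frac{1}{q} {\bf E} |Y|^q$$

if $X, Y$ are scalar random variables and $1 < p, q < \infty$ are deterministic exponents with $\frac{1}{p} + \frac{1}{q} = 1$. In particular, if $|X|^p, |Y|^q$ are absolutely integrable, then so is $XY$, and

$$|{\bf E} XY| \leq \frac{1}{p} {\bf E} |X|^p + \frac{1}{q} {\bf E} |Y|^q.$$

We can amplify this inequality as follows. Multiplying $X$ by some $0 < \lambda < \infty$ and dividing $Y$ by the same $\lambda$, we conclude that

$$|{\bf E} XY| \leq \frac{\lambda^p}{p} {\bf E} |X|^p + \frac{\lambda^{-q}}{q} {\bf E} |Y|^q;$$

optimising the right-hand side in $\lambda$, we obtain (after some algebra, and after disposing of some edge cases when $X$ or $Y$ is almost surely zero) the important Hölder inequality

$$|{\bf E} XY| \leq \|X\|_p \|Y\|_q$$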

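where we use the notation

$$\|X\|_p := ({\bf E} |X|^p)^{1/p}$$

for $0 < p < \infty$. Using the convention

$$\|X\|_\infty := \inf \{ M: |X| \leq M \hbox{ almost surely} \}$$

(thus $\|X\|_\infty$ is the essential supremum of $|X|$), we also see from the triangle inequality that the Hölder inequality applies in the boundary case when one of $p, q$ is allowed to be $\infty$ (so that the other is equal to $1$):

$$|{\bf E} XY| \leq \|X\|_1 \|Y\|_\infty.$$

The case $p = q = 2$ is the important Cauchy-Schwarz inequality

$$|{\bf E} XY| \leq \|X\|_2 \|Y\|_2,$$

valid whenever $X, Y$ are *square-integrable* in the sense that ${\bf E} |X|^2, {\bf E} |Y|^2$ are finite.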
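As a quick numerical illustration of Hölder’s inequality (with Cauchy-Schwarz as the case $p = 2$), here is a sketch on a finite probability space. It assumes a uniform measure on 50 sample points and arbitrary seeded random values for $X$ and $Y$; the helper `lp_norm` and the dictionary `holder_gaps` are hypothetical names.

```python
import random

def lp_norm(values, p):
    """||X||_p = (E|X|^p)^(1/p) for X uniform over the list `values`."""
    return (sum(abs(v) ** p for v in values) / len(values)) ** (1 / p)

rng = random.Random(1)  # fixed seed for reproducibility
xs = [rng.uniform(-3, 3) for _ in range(50)]
ys = [rng.uniform(-3, 3) for _ in range(50)]

# E[XY] on the uniform 50-point sample space.
mean_xy = sum(x * y for x, y in zip(xs, ys)) / len(xs)

# Gap ||X||_p ||Y||_q - |E XY| for several conjugate pairs (p, q).
holder_gaps = {}
for p in (1.5, 2.0, 3.0, 10.0):
    q = p / (p - 1)  # conjugate exponent: 1/p + 1/q = 1
    holder_gaps[p] = lp_norm(xs, p) * lp_norm(ys, q) - abs(mean_xy)

for p, gap in holder_gaps.items():
    print(f"p={p}: Hoelder gap = {gap:.4f}")
```

Every gap is non-negative, and because the underlying measure is a probability measure the norms `lp_norm(xs, p)` also increase with `p`.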

Exercise 41 Show that the expressions $\|X\|_p$ are non-decreasing in $p$ for $p \in (0,+\infty]$. In particular, if $\|X\|_p$ is finite for some $p$, then it is automatically finite for all smaller values of $p$.

Exercise 42 For any square-integrable $X$, show that

$${\bf P}( X \neq 0 ) \geq \frac{({\bf E} |X|)^2}{{\bf E} |X|^2}.$$

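Exercise 43 If $1 < p < \infty$ and $X, Y$ are scalar random variables with $\|X\|_p, \|Y\|_p < \infty$, use Hölder’s inequality to establish that

$${\bf E} |X| |X+Y|^{p-1} \leq \|X\|_p \|X+Y\|_p^{p-1}$$

and

$${\bf E} |Y| |X+Y|^{p-1} \leq \|Y\|_p \|X+Y\|_p^{p-1}$$

and then conclude the Minkowski inequality

$$\|X+Y\|_p \leq \|X\|_p + \|Y\|_p.$$

Show that this inequality is also valid at the endpoint cases $p = 1$ and $p = \infty$.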

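Exercise 44 If $X$ is non-negative and square-integrable, and $0 \leq \theta \leq 1$, establish the Paley-Zygmund inequality

$${\bf P}( X > \theta {\bf E} X ) \geq (1-\theta)^2 \frac{({\bf E} X)^2}{{\bf E} |X|^2}.$$

(Hint: use the Cauchy-Schwarz inequality to upper bound ${\bf E} X 1_{X > \theta {\bf E} X}$ in terms of ${\bf E} |X|^2$ and ${\bf P}( X > \theta {\bf E} X )$.)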

Exercise 45 Let $X$ be a non-negative random variable that is almost surely bounded but not identically zero. Show that

$${\bf P}( X \geq \tfrac{1}{2} {\bf E} X ) \geq \frac{1}{2} \frac{{\bf E} X}{\|X\|_\infty}.$$

## 54 comments


3 October, 2015 at 7:47 pm

pauljung

Great notes so far. You have a stray * appearing as f*(omega) in the first bullet of Exercise 7. In the same exercise, second bullet, I think you want equality instead of greater or equal. In Proposition 12 (i), do you not need that X and Y are integrable?

[Corrected, thanks – T.]

3 October, 2015 at 8:02 pm

pauljung

By the way, I like your statement about Probability Theory being the study of random events which are model-independent. M. Loeve’s Vol. I (pg 173) has a similar take on this view.

3 October, 2015 at 9:46 pm

Anonymous

Great notes! One small correction: in the proof of theorem 16, you should have LHS >= RHS in the second displayed equation.

[Corrected, thanks – T.]

3 October, 2015 at 11:48 pm

PerryZhao

Reblogged this on 木秀于林.

3 October, 2015 at 11:56 pm

pauljung

Ignore my comment about Proposition 12, I didn’t realize unsigned there meant nonnegative.

4 October, 2015 at 4:12 am

Anonymous

Professor – it looks like your informal heuristic for Fatou is backwards.

[Looks like it is in the right direction to me – could you elaborate? -T.]

4 October, 2015 at 9:32 am

Anonymous

Sure – in your informal heuristic bullet list you state:

“For unsigned {f_n}, the limit inferior of the integral cannot exceed the integral of the limit inferior”

but the inequality in Fatou actually goes the other way.

[Got it now. Corrected, thanks -T.]

4 October, 2015 at 4:37 am

David Gonzales

Exercise 9(iv) talks about monotonicity of complex functions $f$ and $g$, which isn’t defined so should probably be changed.

[Oops, that should have been deleted, thanks – T.]

4 October, 2015 at 6:55 am

Jennifer

Thanks for providing these great notes for self-study! Do you by any chance also share pdf versions of them?

[One should be able to print to PDF (with headers and sidebar removed) from the “Print” feature on your browser. -T]

5 October, 2015 at 6:44 pm

obryant

The very first bullet has a typo: wrong notation for the sure event. Thanks for the notes, and also for the pre-work that surely went into creating them.

[Actually, I am using $\overline{\emptyset}$ (the complement of the empty event) to denote the sure event. -T.]

7 October, 2015 at 6:33 am

Not A Music Expert

What kind of music do you listen to?

7 October, 2015 at 9:52 am

John Mangual

I am going to go out on a limb here and say a lot of these complications arise from the non-compactness of ? Markov’s inequality is essentially the pigeonhole principle. Can the same be said of Chebyshev inequality? Or even Hölder inequality?

12 October, 2015 at 11:41 am

Ryan McNeive

Thanks (as always) for these great notes!

I wanted to ask about a typo. In the proof of Fatou’s lemma, on the RHS of the last three equations, surely the sum should be from 1 to n rather than 1 to N? Unless I am very confused.

[One can use the symbol in place of here (thus replacing , by respectively) if desired. I chose not to do so as this makes the definition of a bit confusing unless one also changes the symbol appearing there to some other symbol. -T.]

[Added, Oct 14: oh, I see the problem now, I had used n for two unrelated things. Fixed now, thanks – T.]

12 October, 2015 at 12:34 pm

275A, Notes 2: Product measures and independence | What's new

[…] the previous set of notes, we constructed the measure-theoretic notion of the Lebesgue integral, and used this to set up the […]

14 October, 2015 at 11:30 am

Sam

Above exercise 3, isn’t there a mistake in $1 \times 1_{[0,2)} + 1 \times 1_{[1,3)}$? Perhaps it should be $1 \times 1_{[0,1)} + 1 \times 1_{[0,3)}$?

[Corrected, thanks – T.]

14 October, 2015 at 12:07 pm

Sam

I think there is still a typo in the second indicator function: instead of .

[Corrected, thanks -T.]

17 October, 2015 at 1:22 pm

Anonymous

Should the second centered equation in Theorem 23, be ? The 2 comes from considering the events such that and for these ,

17 October, 2015 at 7:13 pm

Terence Tao

The factor of 2 is unnecessary; one has the pointwise bound when (checking the cases and separately) and otherwise.

18 October, 2015 at 4:51 am

L.

In the proof of Fatou’s lemma, why do we need the “modification”? There is at least one point that I am not sure: since the are allowed to be , we should have on (instead of strict inequality in the note). But we know that on , . Now if we define , then we have , which is equal to . The are still increasing to .

18 October, 2015 at 10:51 am

Terence Tao

Ah, I did not treat the case when some of the were infinite; I’ve fixed the proof to address this.

One needs strict inequality in the bound to ensure that for sufficiently large n. The bound does not ensure that for even a single choice of (for instance, the could be increasing and converge to in the limit).

19 October, 2015 at 10:58 am

L.

Oh, I see the point. Thanks very much for your comments.

2 November, 2015 at 7:06 pm

275A, Notes 4: The central limit theorem | What's new

[…] this and the Paley-Zygmund inequality (Exercise 39 of Notes 1) we also get some lower bound for of the […]

20 November, 2015 at 12:41 pm

Anonymous

Is Chebyshev inequality (8) optimal in the sense that if the first two moments of a random variable are given with a threshold , then (8) can be made arbitrarily close to equality by an appropriate selection of a probability distribution for (depending of course on the given expectation and variance of and the given threshold ) ?

20 November, 2015 at 1:50 pm

Terence Tao

Yes, this can be seen by experimenting with Bernoulli type random variables (e.g. ones which attain a threshold with some probability , with probability , and with probability , for various choices of parameters ).

20 November, 2015 at 5:54 pm

Anonymous

Thank you! Let me add some comments:

1. If has mean and variance , without loss of generality we may assume that and if the standard deviation and the threshold are given, consider the two cases:

(i) : In this case the RHS of (8) is greater than 1, so (8) is trivial

but not(!) optimal.

(ii) : In this case (using your suggestion) choose and denote (which is less than 1) . If attain with probability , with probability , and with probability , then has zero mean and (from this choice of ) variance – as required. Moreover, (8) will be .

Hence (by choosing arbitrarily close to ) Chebyshev inequality (8) can be made arbitrarily close to equality (i.e. (8) is optimal in this case).

2. Interestingly, this idea can be applied also for the one-sided Chebyshev inequality (Chebyshev-Cantelli inequality):

Where and .

Choose and define

.

If attain with probability and

with probability , it follows that

and

– as required.

Moreover, the one-sided Chebyshev inequality will be

Hence (by choosing arbitrarily close to ) the one-sided Chebyshev inequality is also optimal.

21 November, 2015 at 4:51 pm

Anonymous

It is interesting to observe that for case (i) above (for which Chebyshev inequality (8) is non-optimal), the optimal bound on the LHS of (8) is the trivial bound 1. To see that, we use a simplified version of the proof of case (ii) above, in which the random variable attain with probability and with probability . This gives (as required) and as the probability in the LHS of (8).

25 November, 2015 at 6:40 am

Anonymous

Hi Terry, there’s a small typo at the end of the proof of the dominated convergence theorem (Theorem 21). The final equation is missing integral signs.

[Corrected, thanks – T.]

28 November, 2015 at 3:26 pm

Anonymous

Just before Exercise 3, there is “can also be written as {1 \times 1_{[0,1)} + 1 \times 1_{[1,3)}}”, but the last subscript is probably meant to be

[Corrected, thanks -T.]

28 November, 2015 at 4:21 pm

Anonymous

‘real-valuked’ to ‘real-valued’ (it’s somewhere in the middle)

[Corrected, thanks -T.]

28 November, 2015 at 4:24 pm

Anonymous

Forgotten subscript:

\displaystyle {\bf E} f_i(X) = \int{\bf R} f_i(x)\ d\mu_X(x)

should be:

\displaystyle {\bf E} f_i(X) = \int_{\bf R} f_i(x)\ d\mu_X(x)

(just after valuked)

[Corrected, thanks -T.]

30 December, 2015 at 9:53 am

Anonymous

Below Remark 31, “… for any measurable set {R} in the range…” should read “… any measurable set {S}..”

[Corrected, thanks – T.]

6 January, 2016 at 5:12 am

Anonymous

Just a small typo, I guess. In the proof of Theorem 32, when the general case is considered and the functions f_n are considered, there is a missing expectation before letting n go to infinity. Should be

.

[Corrected, thanks – T.]

6 January, 2016 at 7:48 am

Anonymous

In exercise 37 I get the integrals with instead of just in both cases. Maybe there is a typo here.

[Corrected, thanks – T.]

4 August, 2016 at 5:01 pm

Sebastien Zany

Thanks for these notes!

Is entropy not a probabilistic concept? If not, then how should we think about it?

6 August, 2016 at 12:50 pm

Terence Tao

There are several different notions of entropy used in mathematics, not all of which are directly tied to probability. But the Shannon entropy of a discrete random variable $X$ is a probabilistic concept, since it can be expressed in terms of probabilities as $\sum_x {\bf P}(X = x) \log \frac{1}{{\bf P}(X = x)}$ (with the convention that $0 \log \frac{1}{0} = 0$).

15 November, 2016 at 6:16 am

Anonymous

I’m a little confused about the summary at the very beginning of this note. In Notes 0, a measure space was introduced to *model* some randomness (an “abstract” probability space). And this abstract probability space is a triple which satisfies Kolmogorov’s three axioms. The difference between the concrete model and the abstract probability space that models is that

(1) is a measure space where is a (concrete) -algebra *on* ;

(2) while in , it does not have to be a measure space and the “event space” is an abstract -algebra and hence does not have to be a collection of subsets of .

Shouldn’t we have first so that we can talk about we models using a model ? Why do you define the “conjunction”, “disjunction” and the “probability” of an event *after* a model is given? (Why is this not a circular argument?) Shouldn’t all these be defined first?

Are you in fact abstracting things out from a concrete model to get an abstract probability space , like we can get an abstract real space out of a concrete real space , and one can actually define an abstract probability space without given any model in the first place?

15 November, 2016 at 10:10 am

Terence Tao

Yes, if one wants to, one can define abstract probability spaces first without reference to concrete measure spaces and only then talk about their representations by concrete measure spaces. This is discussed in Section 4 of Notes 0. But from a pedagogical point of view this is undesirable as this delays the point in the course where one actually does probability, rather than foundations of probability. This is one reason why most texts compress the foundations by working entirely with concrete probability spaces and not abstract ones (somewhat similarly to how in a first undergraduate linear algebra class, vectors are often _defined_ to be rows or columns of numbers, rather than as elements of an abstract vector space, in order to get on with the linear algebra rather than the foundations of linear algebra).

For the purposes of actually doing probability, the only relevant features of the concept of an abstract probability space are that (a) it can be modeled by at least one concrete probability space, and (b) the basic probabilistic notions (e.g. boolean operations, probability of an event, whether two events are equal etc.) are independent of the model. The existence of such a concept can be guaranteed by the formal definition of an abstract probability space as an abstract sigma algebra equipped with an abstract probability measure (or by the other alternative definitions of this concept in Section 4 of Notes 0). But one could also proceed by defining concrete probability spaces first, and defining an abstraction of that space to be anything isomorphic to the events of that space together with their probabilities and boolean operations; this would suffice for most practical purposes, and is basically the approach taken in Notes 0 before Section 4.

Incidentally, while the 1933 text of Kolmogorov does use a concrete sigma algebra (or, in the language of that era, a field of sets) to model the event space, it is clear from his writing that he would have used an abstraction of this if it was available at the time (e.g. on page 1 of your linked translation he writes “what the elements of this set represent is of no importance”). One reason for this is that the correct axiomatisation of an abstract sigma algebra was not identified and justified until the work of Loomis and Sikorski in 1947, which was many years after the original text of Kolmogorov.

15 November, 2016 at 6:31 am

Anonymous

Right after Definition 27, I think you mean “the distribution on is the push-forward of the probability measure ”.

[Corrected, thanks – T.]

17 February, 2017 at 9:57 pm

254A, Notes 2: The central limit theorem | What's new

[…] this and the Paley-Zygmund inequality (Exercise 42 of Notes 1) we also get some lower bound for of the […]

29 November, 2017 at 7:30 am

haonanz

Hello Prof. Tao, may I ask if you could provide a counter example where two unsigned functions that the integral is strictly superadditive? Based on the construction, I will assume the counter example would be either function to be unbounded value or unbounded support, but I just could not come up with an explicit one. Thanks,

6 December, 2017 at 12:00 pm

Terence Tao

There is no such counterexample for measurable functions; see Exercise 7(vi). In the nonmeasurable case, a simple example would be given when is a two-element space with the trivial algebra and the obvious probability measure with and .

22 January, 2019 at 5:18 am

Weibo Shu

Dear Prof. Tao:

I have some questions about remark 25, you say any function which satisfies that is ok to substitute the condition in theorem 24. But I am a little skeptical, since I can only get by this condition (here we can adjust m to make N arbitrary big). Without , I can’t amplify to . consequently, the inequality can’t become . Hence I can't deduce the uniform integrability.

is there anything wrong with my statement?

22 January, 2019 at 8:28 am

Terence Tao

[Regarding LaTeX formatting: wordpress interprets text between the < and > signs as HTML rather than LaTeX. For a workaround, see the discussion in https://terrytao.wordpress.com/about/ . I have repaired the issues in this particular comment.]

If , then for any there exists such that $g(|X|)/|X| \geq N$ whenever , which implies that . (Here of course should be understood to take values in the positive reals.)

23 January, 2019 at 6:28 pm

Weibo Shu

Thanks for your reply. I know is positive when , but I don’t know whether it’s positive on the whole positive real line. However, I want to amplify to for the purpose of proving {} is uniform integrable. That’s because is uniform bounded (that’s the condition of theorem 24) but is not necessarily uniform bounded.

what I want to do is ‘‘ , but without is non-negative on the positive real line, the second can’t be hold.

An example is , though it’s non-negative when , it’s negative when less than 1.

24 January, 2019 at 8:02 pm

Terence Tao

is bounded if and only if is bounded, for any bounded . So, as long as is bounded from below by , one can make nonnegative without difficulty.

27 January, 2019 at 2:02 am

Weibo Shu

So, I think we at least need the condition g is bounded from below by a constant C. Since there is a function which is not bounded from below but still satisfies that is bounded.

24 January, 2019 at 7:14 am

Weibo Shu: Hi Prof. Tao, I have some confusion about Exercise 37.

(a) Should the condition be ‘‘ rather than ‘‘ ?

(b) In question (1), I can rewrite the right part as ‘ since is the Radon-Nikodym derivative of . But I can’t further rewrite it as : . That’s because F_X(x) is not necessarily absolutely continuous, so the first equality cannot hold. The same issue arises in question (2). So my question is: do we need the extra condition that F_X(x) is absolutely continuous? If not, how does one prove the final equality?

24 January, 2019 at 8:06 pm

Terence Tao: The typo is now corrected.

Integration by parts for the Riemann-Stieltjes integral does not require absolute continuity; bounded variation is sufficient.

3 April, 2019 at 1:54 am

Anonymous: Dear Prof. Tao, I am confused about the last paragraph of Section 2.

I think invariance of expectation isn’t trivial, since the concrete probability space is defined first and the abstract space afterwards. Let (Omega, F, P) be the original probability space and (Omega’, F’, P’) an extension of it.

Then the pulled-back sigma-field on Omega’ can be coarser than F’, so there can be simple random variables modeled by Omega’ which cannot be modeled by Omega.

Thus extending the original probability space makes the abstract probability space bigger, so it is unclear to me where to take the supremum when calculating the expectation.

6 April, 2019 at 3:55 pm

Terence Tao: Ooh, that is a subtle point I had not noticed. The invariance is indeed slightly non-trivial (though not terribly difficult once one knows what to do to resolve it) and I have added an exercise to address it.

5 June, 2021 at 8:01 am

Anonymous: Hello Professor, Exercise 41 seems to be inconsistent with the definition of Lp norms for vectors. In particular, $\|v\|_{1} \geq \|v\|_{\infty}$, while here $\|X\|_p$ is non-decreasing for increasing p. Should we think of these as two different types of Lp norms?

7 June, 2021 at 2:01 pm

Terence Tao: $L^p$ norms are non-decreasing in $p$ when the measure has total mass at most one (which is the case in this probabilistic setting), but are non-increasing in $p$ when all non-empty sets have measure at least one (which is the case of norms for vectors). For more general measure spaces one does not have monotonicity, but still has log convexity in from Hölder’s inequality.
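The two monotonicity regimes can be checked numerically on a small example. The following is a minimal sketch, not part of the original notes: the helper `lp_norm`, the sample values, and the two weightings (uniform probability weights versus counting measure) are all illustrative assumptions.

```python
import math

def lp_norm(values, weights, p):
    """Weighted L^p norm (sum_i w_i |v_i|^p)^(1/p) on a finite set.

    For p = math.inf, returns the maximum of |v_i| over points of positive weight.
    (Hypothetical helper for illustration only.)
    """
    if p == math.inf:
        return max(abs(v) for v, w in zip(values, weights) if w > 0)
    return sum(w * abs(v) ** p for v, w in zip(values, weights)) ** (1.0 / p)

values = [3.0, -1.0, 2.0, 0.5]
exponents = [1, 2, 4, math.inf]

# Probability measure: four points of mass 1/4 each (total mass one).
prob_weights = [0.25] * len(values)
prob_norms = [lp_norm(values, prob_weights, p) for p in exponents]

# Counting measure: every point has mass one (the usual norms for vectors).
count_weights = [1.0] * len(values)
count_norms = [lp_norm(values, count_weights, p) for p in exponents]

print("probability measure:", prob_norms)   # non-decreasing in p
print("counting measure:   ", count_norms)  # non-increasing in p
```

With the same underlying values, only the choice of measure changes, and the direction of monotonicity in p flips accordingly.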

7 June, 2021 at 9:22 am

Anonymous: In the third to last equation for the proof of Fatou’s lemma, the upper limit of the sum should be k and not N.

[Corrected, thanks – T.]

12 June, 2021 at 8:33 am

Anonymous: The second equation of Exercise 43 should be $\|Y\|_p$, not $\|X\|_p$.

[Corrected, thanks – T.]