In the previous set of notes, we constructed the measure-theoretic notion of the Lebesgue integral, and used this to set up the probabilistic notion of expectation on a rigorous footing. In this set of notes, we will similarly construct the measure-theoretic concept of a product measure (restricting to the case of probability measures to avoid unnecessary technicalities), and use this to set up the probabilistic notion of independence on a rigorous footing. (To quote Durrett: “measure theory ends and probability theory begins with the definition of independence.”) We will be able to take virtually any collection of random variables (or probability distributions) and couple them together to be independent via the product measure construction, though for infinite products there is the slight technicality (a requirement of the Kolmogorov extension theorem) that the random variables need to range in standard Borel spaces. This is not the only way to couple together such random variables, but it is the simplest and the easiest to compute with in practice, as we shall see in the next few sets of notes.

** — 1. Product measures — **

It is intuitively obvious that Lebesgue measure $m_{d_1+d_2}$ on ${\bf R}^{d_1+d_2}$ ought to be related to the Lebesgue measures $m_{d_1}$ on ${\bf R}^{d_1}$ and $m_{d_2}$ on ${\bf R}^{d_2}$ by the relationship

$$ m_{d_1+d_2}( E_1 \times E_2 ) = m_{d_1}(E_1) \times m_{d_2}(E_2) \qquad (1)$$

for any Borel sets $E_1 \subset {\bf R}^{d_1}$ and $E_2 \subset {\bf R}^{d_2}$. This is in fact true (see Exercise 4 below), and is part of a more general phenomenon, which we phrase here in the case of probability measures:

Theorem 1 (Product of two probability spaces) Let $(\Omega_1, {\mathcal F}_1, \mu_1)$ and $(\Omega_2, {\mathcal F}_2, \mu_2)$ be probability spaces. Then there is a unique probability measure $\mu_1 \times \mu_2$ on $(\Omega_1 \times \Omega_2, {\mathcal F}_1 \times {\mathcal F}_2)$ with the property that

$$ \mu_1 \times \mu_2( E_1 \times E_2 ) = \mu_1(E_1) \mu_2(E_2) \qquad (2)$$

for all $E_1 \in {\mathcal F}_1$ and $E_2 \in {\mathcal F}_2$. Furthermore, we have the following two facts:

- (Tonelli theorem) If $f: \Omega_1 \times \Omega_2 \to [0,+\infty]$ is measurable, then for each $\omega_1 \in \Omega_1$, the function $\omega_2 \mapsto f(\omega_1,\omega_2)$ is measurable on $\Omega_2$, and the function $\omega_1 \mapsto \int_{\Omega_2} f(\omega_1,\omega_2)\ d\mu_2(\omega_2)$ is measurable on $\Omega_1$. Similarly, for each $\omega_2 \in \Omega_2$, the function $\omega_1 \mapsto f(\omega_1,\omega_2)$ is measurable on $\Omega_1$ and $\omega_2 \mapsto \int_{\Omega_1} f(\omega_1,\omega_2)\ d\mu_1(\omega_1)$ is measurable on $\Omega_2$. Finally, we have

$$ \int_{\Omega_1 \times \Omega_2} f\ d\mu_1 \times \mu_2 = \int_{\Omega_1} \left( \int_{\Omega_2} f(\omega_1,\omega_2)\ d\mu_2(\omega_2) \right) d\mu_1(\omega_1) = \int_{\Omega_2} \left( \int_{\Omega_1} f(\omega_1,\omega_2)\ d\mu_1(\omega_1) \right) d\mu_2(\omega_2). \qquad (3)$$

- (Fubini theorem) If $f: \Omega_1 \times \Omega_2 \to {\bf C}$ is absolutely integrable, then for $\mu_1$-almost every $\omega_1 \in \Omega_1$, the function $\omega_2 \mapsto f(\omega_1,\omega_2)$ is absolutely integrable on $\Omega_2$, and the function $\omega_1 \mapsto \int_{\Omega_2} f(\omega_1,\omega_2)\ d\mu_2(\omega_2)$ is absolutely integrable on $\Omega_1$. Similarly, for $\mu_2$-almost every $\omega_2 \in \Omega_2$, the function $\omega_1 \mapsto f(\omega_1,\omega_2)$ is absolutely integrable on $\Omega_1$ and $\omega_2 \mapsto \int_{\Omega_1} f(\omega_1,\omega_2)\ d\mu_1(\omega_1)$ is absolutely integrable on $\Omega_2$. Finally, the identity (3) continues to hold.

The Fubini and Tonelli theorems are often used together (so much so that one may refer to them as a single theorem, the Fubini-Tonelli theorem, often also just referred to as *Fubini’s theorem* in the literature). For instance, given an absolutely integrable function $f: \Omega_1 \to {\bf C}$ and an absolutely integrable function $g: \Omega_2 \to {\bf C}$, the Tonelli theorem tells us that the tensor product $f \otimes g: \Omega_1 \times \Omega_2 \to {\bf C}$ defined by

$$ f \otimes g( \omega_1, \omega_2 ) := f(\omega_1) g(\omega_2)$$

for $\omega_1 \in \Omega_1$, $\omega_2 \in \Omega_2$, is absolutely integrable and one has the factorisation

$$ \int_{\Omega_1 \times \Omega_2} f \otimes g\ d\mu_1 \times \mu_2 = \left( \int_{\Omega_1} f\ d\mu_1 \right) \left( \int_{\Omega_2} g\ d\mu_2 \right).$$
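As a quick numerical sanity check of this factorisation, one can compare a two-dimensional Riemann sum against the product of one-dimensional Riemann sums; the integrands below are arbitrary illustrative choices, not ones fixed by the text.

```python
# Sanity check of the factorisation ∫∫ f⊗g d(μ1×μ2) = (∫f dμ1)(∫g dμ2)
# on [0,1]^2 with uniform (product) measure.  f and g are arbitrary choices.
import math

def f(x): return x * x           # ∫_0^1 f = 1/3
def g(y): return math.exp(y)     # ∫_0^1 g = e - 1

N = 400
pts = [(i + 0.5) / N for i in range(N)]   # midpoint-rule nodes on [0,1]

int_f = sum(f(x) for x in pts) / N
int_g = sum(g(y) for y in pts) / N
int_fg = sum(f(x) * g(y) for x in pts for y in pts) / N ** 2  # 2-d Riemann sum

assert abs(int_fg - int_f * int_g) < 1e-9                # sums factor exactly
assert abs(int_fg - (1.0 / 3.0) * (math.e - 1)) < 1e-4   # matches true value
```

Note that the double Riemann sum factors into the product of the two single sums algebraically, so the first assertion holds up to rounding; the second checks convergence to the true product integral.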

Our proof of Theorem 1 will be based on the monotone class lemma, which allows one to conveniently generate a $\sigma$-algebra from a Boolean algebra. (In Durrett, the closely related $\pi$-$\lambda$ theorem of Dynkin is used in place of the monotone class lemma.) Define a *monotone class* in a set $\Omega$ to be a collection ${\mathcal F}$ of subsets of $\Omega$ with the following two closure properties:

- If $E_1 \subset E_2 \subset \dots$ are a countable increasing sequence of sets in ${\mathcal F}$, then $\bigcup_{n=1}^\infty E_n \in {\mathcal F}$.
- If $E_1 \supset E_2 \supset \dots$ are a countable decreasing sequence of sets in ${\mathcal F}$, then $\bigcap_{n=1}^\infty E_n \in {\mathcal F}$.

Thus for instance any $\sigma$-algebra is a monotone class, but not conversely. Nevertheless, there is a key way in which monotone classes “behave like” $\sigma$-algebras:

Lemma 2 (Monotone class lemma) Let ${\mathcal A}$ be a Boolean algebra on a set $\Omega$. Then the $\sigma$-algebra $\langle {\mathcal A} \rangle$ generated by ${\mathcal A}$ is the smallest monotone class that contains ${\mathcal A}$.

*Proof:* Let ${\mathcal F}$ be the intersection of all the monotone classes that contain ${\mathcal A}$. Since $\langle {\mathcal A} \rangle$ is clearly one such class, ${\mathcal F}$ is a subset of $\langle {\mathcal A} \rangle$. Our task is then to show that ${\mathcal F}$ contains $\langle {\mathcal A} \rangle$.

It is also clear that ${\mathcal F}$ is a monotone class that contains ${\mathcal A}$. By replacing all the elements of ${\mathcal F}$ with their complements, we see that ${\mathcal F}$ is necessarily closed under complements.

For any $E \in {\mathcal A}$, consider the collection ${\mathcal C}_E$ of all sets $F \in {\mathcal F}$ such that $F \cap E$, $F \setminus E$, $E \setminus F$, and $\Omega \setminus (E \cup F)$ all lie in ${\mathcal F}$. It is clear that ${\mathcal C}_E$ contains ${\mathcal A}$; since ${\mathcal F}$ is a monotone class, we see that ${\mathcal C}_E$ is also. By definition of ${\mathcal F}$, we conclude that ${\mathcal C}_E = {\mathcal F}$ for all $E \in {\mathcal A}$.

Next, let ${\mathcal C}$ be the collection of all $E \in {\mathcal F}$ such that $F \cap E$, $F \setminus E$, $E \setminus F$, and $\Omega \setminus (E \cup F)$ all lie in ${\mathcal F}$ for all $F \in {\mathcal F}$. By the previous discussion, we see that ${\mathcal C}$ contains ${\mathcal A}$. One also easily verifies that ${\mathcal C}$ is a monotone class. By definition of ${\mathcal F}$, we conclude that ${\mathcal C} = {\mathcal F}$. Since ${\mathcal F}$ is also closed under complements, this implies that ${\mathcal F}$ is closed with respect to finite unions. Since ${\mathcal F}$ also contains ${\mathcal A}$, which contains $\emptyset$ and $\Omega$, we conclude that ${\mathcal F}$ is a Boolean algebra. Since ${\mathcal F}$ is also closed under increasing countable unions, we conclude that it is closed under arbitrary countable unions, and is thus a $\sigma$-algebra. As it contains ${\mathcal A}$, it must also contain $\langle {\mathcal A} \rangle$. $\Box$

We now begin the proof of Theorem 1. We begin with the uniqueness claim. Suppose that we have two measures $\mu, \mu'$ on ${\mathcal F}_1 \times {\mathcal F}_2$ that are product measures of $\mu_1$ and $\mu_2$ in the sense that

$$ \mu( E_1 \times E_2 ) = \mu'( E_1 \times E_2 ) = \mu_1(E_1) \mu_2(E_2) \qquad (4)$$

for all $E_1 \in {\mathcal F}_1$ and $E_2 \in {\mathcal F}_2$. If we then set ${\mathcal F}$ to be the collection of all $E \in {\mathcal F}_1 \times {\mathcal F}_2$ such that $\mu(E) = \mu'(E)$, then ${\mathcal F}$ contains all sets of the form $E_1 \times E_2$ with $E_1 \in {\mathcal F}_1$ and $E_2 \in {\mathcal F}_2$. In fact ${\mathcal F}$ contains the collection ${\mathcal A}$ of all sets that are “elementary” in the sense that they are of the form $\bigcup_{i=1}^n E_{1,i} \times E_{2,i}$ for finite $n$ and $E_{1,i} \in {\mathcal F}_1$, $E_{2,i} \in {\mathcal F}_2$ for $i = 1,\dots,n$, since such sets can be easily decomposed into a finite union of *disjoint* products $E_1 \times E_2$, at which point the claim follows from (4) and finite additivity. But ${\mathcal A}$ is a Boolean algebra that generates ${\mathcal F}_1 \times {\mathcal F}_2$ as a $\sigma$-algebra, and from continuity from above and below we see that ${\mathcal F}$ is a monotone class. By the monotone class lemma, we conclude that ${\mathcal F}$ is all of ${\mathcal F}_1 \times {\mathcal F}_2$, and hence $\mu = \mu'$. This gives uniqueness. Now we prove existence. We first claim that for any measurable set $E \in {\mathcal F}_1 \times {\mathcal F}_2$, the slices $E_{\omega_1} := \{ \omega_2 \in \Omega_2: (\omega_1, \omega_2) \in E \}$ are measurable in ${\mathcal F}_2$ for all $\omega_1 \in \Omega_1$. Indeed, the claim is obvious for sets $E$ that are “elementary” in the sense that they belong to the Boolean algebra ${\mathcal A}$ defined previously, and the collection of all sets obeying the claim is a monotone class, so the claim follows from the monotone class lemma. A similar argument (relying on monotone or dominated convergence) shows that the function

$$ \omega_1 \mapsto \mu_2( E_{\omega_1} )$$

is measurable in $\omega_1$ for all $E \in {\mathcal F}_1 \times {\mathcal F}_2$. Thus, for any $E \in {\mathcal F}_1 \times {\mathcal F}_2$, we can define the quantity $\mu_1 \times \mu_2(E)$ by

$$ \mu_1 \times \mu_2(E) := \int_{\Omega_1} \mu_2( E_{\omega_1} )\ d\mu_1(\omega_1).$$

A routine application of the monotone convergence theorem verifies that $\mu_1 \times \mu_2$ is a countably additive measure; one easily checks that (2) holds for all $E_1 \in {\mathcal F}_1$, $E_2 \in {\mathcal F}_2$, and in particular $\mu_1 \times \mu_2$ is a probability measure.

By construction, we see that the identity

$$ \int_{\Omega_1 \times \Omega_2} f\ d\mu_1 \times \mu_2 = \int_{\Omega_1} \left( \int_{\Omega_2} f(\omega_1,\omega_2)\ d\mu_2(\omega_2) \right) d\mu_1(\omega_1)$$

holds (with all functions integrated being measurable) whenever $f = 1_E$ is an indicator function with $E \in {\mathcal F}_1 \times {\mathcal F}_2$. By linearity of integration, the same identity holds (again with all functions measurable) when $f$ is an unsigned simple function. Since any unsigned measurable function $f$ can be expressed as the monotone non-decreasing limit of unsigned simple functions $f_n$ (for instance, one can take $f_n$ to be $f$ rounded down to the largest multiple of $2^{-n}$ that does not exceed $\min(f, 2^n)$), the above identity also holds for unsigned measurable $f$ by the monotone convergence theorem. Applying this fact to the absolute value $|f|$ of an absolutely integrable function $f: \Omega_1 \times \Omega_2 \to {\bf C}$, we conclude for such functions that

$$ \int_{\Omega_1} \left( \int_{\Omega_2} |f(\omega_1,\omega_2)|\ d\mu_2(\omega_2) \right) d\mu_1(\omega_1) < \infty,$$

which by Markov’s inequality implies that

$$ \int_{\Omega_2} |f(\omega_1,\omega_2)|\ d\mu_2(\omega_2) < \infty$$

for $\mu_1$-almost every $\omega_1 \in \Omega_1$. In other words, the function $\omega_2 \mapsto f(\omega_1,\omega_2)$ is absolutely integrable on $\Omega_2$ for $\mu_1$-almost every $\omega_1$. By monotonicity we conclude that

$$ \int_{\Omega_1} \left| \int_{\Omega_2} f(\omega_1,\omega_2)\ d\mu_2(\omega_2) \right| d\mu_1(\omega_1) \leq \int_{\Omega_1} \left( \int_{\Omega_2} |f(\omega_1,\omega_2)|\ d\mu_2(\omega_2) \right) d\mu_1(\omega_1) < \infty,$$

and hence the function $\omega_1 \mapsto \int_{\Omega_2} f(\omega_1,\omega_2)\ d\mu_2(\omega_2)$ is absolutely integrable on $\Omega_1$. Hence it makes sense to ask whether the identity

$$ \int_{\Omega_1 \times \Omega_2} f\ d\mu_1 \times \mu_2 = \int_{\Omega_1} \left( \int_{\Omega_2} f(\omega_1,\omega_2)\ d\mu_2(\omega_2) \right) d\mu_1(\omega_1)$$

holds for absolutely integrable $f$, as both sides are well-defined. We have already established this claim when $f$ is unsigned and absolutely integrable; by subtraction this implies the claim for real-valued absolutely integrable $f$, and by taking real and imaginary parts we obtain the claim for complex-valued absolutely integrable $f$.
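The rounding-down construction used above can be illustrated numerically; the sketch below uses the arbitrary choice $f(x) = \sqrt{x}$ on $[0,1]$, with Riemann sums standing in for the integrals, and exhibits the monotone convergence of the integrals of the simple approximants.

```python
# Monotone approximation by simple functions: round f down to the largest
# multiple of 2^-n not exceeding min(f, 2^n).  The integrals of the
# approximants increase to the integral of f (monotone convergence theorem).
import math

def f(x): return math.sqrt(x)

def simple_approx(x, n):
    return min(math.floor(f(x) * 2 ** n) / 2 ** n, 2 ** n)

N = 100000
pts = [(i + 0.5) / N for i in range(N)]   # midpoint nodes on [0,1]
exact = 2.0 / 3.0                         # ∫_0^1 sqrt(x) dx

prev = -1.0
for n in range(1, 12):
    approx = sum(simple_approx(x, n) for x in pts) / N
    assert approx >= prev - 1e-12   # integrals are non-decreasing in n
    prev = approx

assert abs(prev - exact) < 1e-3     # and they converge to ∫ f
```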

We may reverse the roles of $\Omega_1$ and $\Omega_2$, and define a product measure instead by the formula

$$ \mu_1 \times \mu_2(E) := \int_{\Omega_2} \mu_1( E^{\omega_2} )\ d\mu_2(\omega_2), \qquad E^{\omega_2} := \{ \omega_1 \in \Omega_1: (\omega_1,\omega_2) \in E \}.$$

By the previously proved uniqueness of product measure, we see that this defines the same product measure as previously. Repeating the previous arguments, we obtain all the above claims with the roles of $\Omega_1$ and $\Omega_2$ reversed. This gives all the claims required for Theorem 1. $\Box$

One can extend the product construction easily to finite products:

Exercise 3 (Finite products) Show that for any finite collection $(\Omega_i, {\mathcal F}_i, \mu_i)$, $i = 1, \dots, k$ of probability spaces, there exists a unique probability measure $\mu_1 \times \dots \times \mu_k$ on $(\Omega_1 \times \dots \times \Omega_k, {\mathcal F}_1 \times \dots \times {\mathcal F}_k)$ such that

$$ \mu_1 \times \dots \times \mu_k( E_1 \times \dots \times E_k ) = \mu_1(E_1) \dots \mu_k(E_k)$$

whenever $E_i \in {\mathcal F}_i$ for $i = 1, \dots, k$. Furthermore, show that

$$ \mu_1 \times \dots \times \mu_k = \left( \prod_{i \in A_1} \mu_i \right) \times \dots \times \left( \prod_{i \in A_m} \mu_i \right)$$

for any partition $\{1,\dots,k\} = A_1 \cup \dots \cup A_m$ into disjoint non-empty sets (after making the obvious identification between $\Omega_1 \times \dots \times \Omega_k$ and $(\prod_{i \in A_1} \Omega_i) \times \dots \times (\prod_{i \in A_m} \Omega_i)$). Thus for instance one has the associativity property

$$ \mu_1 \times \mu_2 \times \mu_3 = (\mu_1 \times \mu_2) \times \mu_3 = \mu_1 \times (\mu_2 \times \mu_3)$$

for any probability spaces $(\Omega_i, {\mathcal F}_i, \mu_i)$ for $i = 1, 2, 3$.

By writing a finite product $\Omega_1 \times \dots \times \Omega_k$ as a product of pairs of probability spaces in many different ways, one can obtain higher-dimensional analogues of the Fubini and Tonelli theorems; we leave the precise statement of such a theorem to the interested reader.

It is important to be aware that the Fubini theorem identity

$$ \int_{\Omega_1} \left( \int_{\Omega_2} f(\omega_1,\omega_2)\ d\mu_2(\omega_2) \right) d\mu_1(\omega_1) = \int_{\Omega_2} \left( \int_{\Omega_1} f(\omega_1,\omega_2)\ d\mu_1(\omega_1) \right) d\mu_2(\omega_2) \qquad (5)$$

for measurable functions $f$ that are not unsigned is usually only justified when $f$ is absolutely integrable on $\Omega_1 \times \Omega_2$, or equivalently (by the Tonelli theorem) when the function $\omega_1 \mapsto \int_{\Omega_2} |f(\omega_1,\omega_2)|\ d\mu_2(\omega_2)$ is absolutely integrable on $\Omega_1$ (or when $\omega_2 \mapsto \int_{\Omega_1} |f(\omega_1,\omega_2)|\ d\mu_1(\omega_1)$ is absolutely integrable on $\Omega_2$). Without this joint absolute integrability (and without any unsigned property on $f$), the identity (5) can fail even if both sides are well-defined. For instance, let $\Omega_1 = \Omega_2$ be the unit interval $[0,1]$, let $\mu_1 = \mu_2$ be the uniform probability measure on this interval, and set

$$ f(x,y) := \frac{x^2 - y^2}{(x^2 + y^2)^2}.$$

One can check that both sides of (5) are well-defined, but that the left-hand side is $\pi/4$ and the right-hand side is $-\pi/4$. Of course, this function is neither unsigned nor jointly absolutely integrable, so this counterexample does not violate either of the Fubini or Tonelli theorems. Thus one should take care to only interchange integrals when the integrands are known to be either unsigned or jointly absolutely integrable, or if one has another way to rigorously justify the exchange of integrals.
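This failure can be seen numerically. The sketch below uses the standard counterexample $f(x,y) = (x^2-y^2)/(x^2+y^2)^2$ on $[0,1]^2$ (which may or may not be the precise example intended above); it exploits the closed-form inner integral and the antisymmetry $f(y,x) = -f(x,y)$.

```python
# Iterated integrals of f(x,y) = (x^2 - y^2)/(x^2 + y^2)^2 on [0,1]^2, a
# standard example of a non-absolutely-integrable function whose iterated
# integrals exist but disagree.  The inner integral is exact, since
# ∂/∂y [ y/(x^2+y^2) ] = f(x,y), giving ∫_0^1 f(x,y) dy = 1/(1+x^2).
import math

def inner_in_y(x):                 # ∫_0^1 f(x,y) dy, in closed form
    return 1.0 / (1.0 + x * x)

N = 50000
pts = [(i + 0.5) / N for i in range(N)]

lhs = sum(inner_in_y(x) for x in pts) / N   # ∫ dx ∫ dy f  →  π/4
rhs = -lhs                                  # f(y,x) = -f(x,y), so the other
                                            # iteration order gives  -π/4
assert abs(lhs - math.pi / 4) < 1e-6
assert abs(rhs + math.pi / 4) < 1e-6
```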

The above theory extends from probability spaces to finite measure spaces, and more generally to measure spaces that are $\sigma$-finite, that is to say they are expressible as the countable union of sets of finite measure. (With a bit of care, some portions of product measure theory are even extendible to non-sigma-finite settings, though I urge caution in applying these results blindly in that case.) We will not give the details of these generalisations here, but content ourselves with one example:

Exercise 4 Establish the relationship (1) between Lebesgue measures for all Borel sets $E_1 \subset {\bf R}^{d_1}$ and $E_2 \subset {\bf R}^{d_2}$. (Hint: ${\bf R}^d$ can be viewed as the disjoint union of a countable sequence of sets of finite measure.)

Remark 5 When doing real analysis (as opposed to probability), it is convenient to complete the Borel $\sigma$-algebra on spaces such as ${\bf R}^d$, to form the larger *Lebesgue $\sigma$-algebra*, defined as the collection of all subsets $E$ of ${\bf R}^d$ that differ from a Borel set $F$ in ${\bf R}^d$ by a sub-null set, in the sense that $E \Delta F \subset N$ for some Borel subset $N$ of ${\bf R}^d$ of zero Lebesgue measure. There are analogues of the Fubini and Tonelli theorems for such complete $\sigma$-algebras; see these previous lecture notes for details. However one should be cautioned that the product of the Lebesgue $\sigma$-algebras on ${\bf R}^{d_1}$ and ${\bf R}^{d_2}$ is *not* the Lebesgue $\sigma$-algebra on ${\bf R}^{d_1+d_2}$, but is instead an intermediate $\sigma$-algebra between the Borel and Lebesgue $\sigma$-algebras on ${\bf R}^{d_1+d_2}$, which causes some additional small complications. For instance, if $f: {\bf R}^{d_1} \times {\bf R}^{d_2} \to {\bf C}$ is Lebesgue measurable, then the functions $x_2 \mapsto f(x_1, x_2)$ can only be shown to be Lebesgue measurable on ${\bf R}^{d_2}$ for *almost every* $x_1$, rather than for *all* $x_1$. We will not dwell on these subtleties further here, as we will rarely have any need to complete the $\sigma$-algebras used in probability theory.

It is also important in probability theory applications to form the product of an *infinite* number of probability spaces $(\Omega_\alpha, {\mathcal F}_\alpha, \mu_\alpha)$ for $\alpha \in A$, where the index set $A$ can be infinite or even uncountable. Recall from Notes 0 that the product $\sigma$-algebra ${\mathcal F}_A := \prod_{\alpha \in A} {\mathcal F}_\alpha$ on $\Omega_A := \prod_{\alpha \in A} \Omega_\alpha$ is defined to be the $\sigma$-algebra generated by the sets $\pi_\alpha^{-1}(E_\alpha)$ for $\alpha \in A$ and $E_\alpha \in {\mathcal F}_\alpha$, where $\pi_\alpha: \Omega_A \to \Omega_\alpha$ is the usual coordinate projection. Equivalently, if we define an *elementary set* to be a subset of $\Omega_A$ of the form $\pi_B^{-1}(E_B)$, where $B$ is a finite subset of $A$, $\pi_B: \Omega_A \to \Omega_B := \prod_{\alpha \in B} \Omega_\alpha$ is the obvious projection map, and $E_B$ is a measurable set in ${\mathcal F}_B := \prod_{\alpha \in B} {\mathcal F}_\alpha$, then ${\mathcal F}_A$ can be defined as the $\sigma$-algebra generated by the collection ${\mathcal A}$ of elementary sets. (Elementary sets are the measure-theoretic analogue of cylinder sets in point set topology.) For future reference we note the useful fact that ${\mathcal A}$ is a Boolean algebra.

We define a *product measure* $\mu_A = \prod_{\alpha \in A} \mu_\alpha$ to be a probability measure on the measurable space $(\Omega_A, {\mathcal F}_A)$ which extends all of the finite products in the sense that

$$ \mu_A( \pi_B^{-1}( E_B ) ) = \mu_B( E_B )$$

for all finite subsets $B$ of $A$ and all $E_B$ in ${\mathcal F}_B$, where $\mu_B := \prod_{\alpha \in B} \mu_\alpha$ is the finite product measure from Exercise 3. If this product measure exists, it is unique:

Exercise 6 Show that for any collection $(\Omega_\alpha, {\mathcal F}_\alpha, \mu_\alpha)$ of probability spaces for $\alpha \in A$, there is at most one product measure $\mu_A$. (Hint: adapt the uniqueness argument in Theorem 1 that used the monotone class lemma.)

Exercise 7 Let $\mu_1, \dots, \mu_d$ be probability measures on ${\bf R}$, and let $F_1, \dots, F_d$ be their Stieltjes measure functions. Show that $\mu_1 \times \dots \times \mu_d$ is the unique probability measure on ${\bf R}^d$ whose Stieltjes measure function is the tensor product $(t_1,\dots,t_d) \mapsto F_1(t_1) \dots F_d(t_d)$ of $F_1, \dots, F_d$.

In the case of finite $A$, the finite product constructed in Exercise 3 is clearly the unique product measure. But for infinite $A$, the construction of the product measure is a more nontrivial issue. We can generalise the problem as follows:

Problem 8 (Extension problem) Let $(\Omega_\alpha, {\mathcal F}_\alpha)_{\alpha \in A}$ be a collection of measurable spaces. For each finite $B \subset A$, let $\mu_B$ be a probability measure on $(\Omega_B, {\mathcal F}_B)$ obeying the compatibility condition

$$ \mu_{B'}( E_{B'} ) = \mu_B( \pi_{B \to B'}^{-1}( E_{B'} ) ) \qquad (6)$$

for all finite $B' \subset B \subset A$ and all $E_{B'} \in {\mathcal F}_{B'}$, where $\pi_{B \to B'}: \Omega_B \to \Omega_{B'}$ is the obvious restriction. Can one then define a probability measure $\mu_A$ on $(\Omega_A, {\mathcal F}_A)$ such that

$$ \mu_A( \pi_B^{-1}( E_B ) ) = \mu_B( E_B ) \qquad (7)$$

for all finite $B \subset A$ and all $E_B \in {\mathcal F}_B$?

Note that the compatibility condition (6) is clearly necessary if one is to find a measure $\mu_A$ obeying (7).

Again, one has uniqueness:

Exercise 9 Show that for any $A$ and any family of probability measures $\mu_B$ for finite $B \subset A$ as in the above extension problem, there is at most one probability measure $\mu_A$ with the stated properties.

The extension problem is trivial for finite $A$, but for infinite $A$ there are unfortunately examples where the probability measure $\mu_A$ fails to exist. However, there is one key case in which we can build the extension, thanks to the Kolmogorov extension theorem. Call a measurable space $(\Omega, {\mathcal F})$ *standard Borel* if it is isomorphic as a measurable space to a Borel subset of the unit interval $[0,1]$ with the Borel $\sigma$-algebra, that is to say there is a bijection $\phi$ from $\Omega$ to a Borel subset $S$ of $[0,1]$ such that $\phi$ and $\phi^{-1}$ are both measurable. (In Durrett, such spaces are called *nice spaces*.) Note that one can easily replace $[0,1]$ here by other standard spaces such as ${\bf R}$ if desired, since these spaces are isomorphic as measurable spaces (why?).

Theorem 10 (Kolmogorov extension theorem) Let the situation be as in Problem 8. If all the measurable spaces $(\Omega_\alpha, {\mathcal F}_\alpha)$ are standard Borel, then there exists a probability measure $\mu_A$ solving the extension problem (which is then unique, thanks to Exercise 9).

The proof of this theorem is lengthy and is deferred to the next (optional) section. Specialising to the product case, we conclude

Corollary 11 Let $(\Omega_\alpha, {\mathcal F}_\alpha, \mu_\alpha)_{\alpha \in A}$ be a collection of probability spaces with each $(\Omega_\alpha, {\mathcal F}_\alpha)$ standard Borel. Then there exists a product measure $\mu_A = \prod_{\alpha \in A} \mu_\alpha$ (which is then unique, thanks to Exercise 6).

Of course, to use this theorem we would like to have a large supply of standard Borel spaces. Here is one tool that often suffices:

Lemma 12 Let $X$ be a complete separable metric space, and let $E$ be a Borel subset of $X$. Then $E$ (with the Borel $\sigma$-algebra) is standard Borel.

*Proof:* Let us call two topological spaces *Borel isomorphic* if their corresponding Borel structures are isomorphic as measurable spaces. Using the binary expansion, we see that $[0,1]$ is Borel isomorphic to $\{0,1\}^{\bf N}$ (the countable number of points that have two binary expansions can be easily permuted to obtain a genuine isomorphism). Similarly, $[0,1]^{\bf N}$ is Borel isomorphic to $(\{0,1\}^{\bf N})^{\bf N}$. Since ${\bf N} \times {\bf N}$ is in bijection with ${\bf N}$, we conclude that $[0,1]$ is Borel isomorphic to $[0,1]^{\bf N}$. Thus it will suffice to show that every complete separable metric space $X$ is Borel isomorphic to a Borel subset of $[0,1]^{\bf N}$. But if we let $x_1, x_2, \dots$ be a countable dense subset in $X$, the map

$$ \iota: x \mapsto \left( \min( d(x, x_n), 1 ) \right)_{n \in {\bf N}}$$

can easily be seen to be a homeomorphism between $X$ and a subset of $[0,1]^{\bf N}$, which is completely metrisable and hence Borel (in fact it is a $G_\delta$ set, the countable intersection of open sets; why?). The claim follows. $\Box$
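The binary-expansion trick driving this proof can be sketched in code: interleaving binary digits identifies (away from the null set of doubly-expandable points, and at finite precision) the square $[0,1]^2$ with $[0,1]$. The precision and test points below are arbitrary dyadic choices.

```python
# Sketch of the Borel isomorphism idea: interleave the binary digits of a
# pair (x, y) in [0,1]^2 to get a single point of [0,1], and deinterleave
# to recover the pair.  Works exactly for dyadic rationals at this precision.
BITS = 26  # digits kept per coordinate (illustrative choice)

def to_bits(x, n):
    out = []
    for _ in range(n):
        x *= 2
        b = int(x)
        out.append(b)
        x -= b
    return out

def from_bits(bits):
    return sum(b / 2 ** (i + 1) for i, b in enumerate(bits))

def interleave(x, y):
    bx, by = to_bits(x, BITS), to_bits(y, BITS)
    z = []
    for a, b in zip(bx, by):
        z += [a, b]
    return from_bits(z)

def deinterleave(z):
    bz = to_bits(z, 2 * BITS)
    return from_bits(bz[0::2]), from_bits(bz[1::2])

x, y = 733 / 1024, 335 / 1024      # dyadic rationals: exact round trip
assert deinterleave(interleave(x, y)) == (x, y)
```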

Exercise 13 (Kolmogorov extension theorem, alternate form) For each natural number $n$, let $\mu_n$ be a probability measure on ${\bf R}^n$ with the property that

$$ \mu_{n+1}( B \times {\bf R} ) = \mu_n( B )$$

for all $n$ and any box $B$ in ${\bf R}^n$, where we identify ${\bf R}^n \times {\bf R}$ with ${\bf R}^{n+1}$ in the usual manner. Show that there exists a unique probability measure $\mu$ on ${\bf R}^{\bf N}$ (with the product $\sigma$-algebra, or equivalently the Borel $\sigma$-algebra on the product topology) such that

$$ \mu( \{ (x_m)_{m \in {\bf N}} \in {\bf R}^{\bf N}: (x_1, \dots, x_n) \in B \} ) = \mu_n( B )$$

for all $n$ and Borel sets $B \subset {\bf R}^n$.

** — 2. Proof of the Kolmogorov extension theorem (optional) — **

We now prove Theorem 10. By the definition of a standard Borel space, we may assume without loss of generality that each $\Omega_\alpha$ is a Borel subset of $[0,1]$ with the Borel $\sigma$-algebra, and then by extending each $\Omega_\alpha$ to $[0,1]$ (assigning no mass to the complement $[0,1] \setminus \Omega_\alpha$) we may in fact assume without loss of generality that each $\Omega_\alpha$ is simply $[0,1]$ with the Borel $\sigma$-algebra. Thus each $\mu_B$ for finite $B \subset A$ is a probability measure on the cube $[0,1]^B$.

We will exploit the regularity properties of such measures:

Exercise 14 Let $B$ be a finite set, and let $\mu$ be a probability measure on $[0,1]^B$ (with the Borel $\sigma$-algebra). For any Borel set $E$ in $[0,1]^B$, establish the *inner regularity* property

$$ \mu(E) = \sup \{ \mu(K): K \subset E, K \hbox{ compact} \}$$

and the *outer regularity* property

$$ \mu(E) = \inf \{ \mu(U): U \supset E, U \hbox{ open} \}.$$

(Hint: use the monotone class lemma.)

Another way of stating the above exercise is that finite Borel measures on the cube are automatically Radon measures. In fact there is nothing particularly special about the unit cube here; the claim holds for any compact separable metric space. Radon measures are often used in real analysis (see e.g. these lecture notes) but we will not develop their theory further here.

Observe that one can define the *elementary measure* $\mu_0(E)$ of any elementary set $E = \pi_B^{-1}(E_B)$ in $[0,1]^A$ by defining

$$ \mu_0( \pi_B^{-1}( E_B ) ) := \mu_B( E_B )$$

for any finite $B \subset A$ and any Borel $E_B \subset [0,1]^B$. This definition is well-defined thanks to the compatibility hypothesis (6). From the finite additivity of the $\mu_B$ it is easy to see that $\mu_0$ is a *finitely* additive probability measure on the Boolean algebra ${\mathcal A}$ of elementary sets.

We would like to extend to a *countably* additive probability measure on . The standard approach to do this is via the Carathéodory extension theorem in measure theory (or the closely related Hahn-Kolmogorov theorem); this approach is presented in these previous lecture notes, and a similar approach is taken in Durrett. Here, we will try to avoid developing the Carathéodory extension theorem, and instead take a more direct approach similar to the direct construction of Lebesgue measure, given for instance in these previous lecture notes.

Given any subset $E$ of $[0,1]^A$ (not necessarily Borel), we define its outer measure $\mu^*(E)$ to be the quantity

$$ \mu^*(E) := \inf \left\{ \sum_{n=1}^\infty \mu_0( E_n ) : (E_n)_{n=1}^\infty \hbox{ an open elementary cover of } E \right\},$$

where we say that $(E_n)_{n=1}^\infty$ is an *open elementary cover* of $E$ if each $E_n$ is an open elementary set (an elementary set $\pi_B^{-1}(E_B)$ with $E_B$ open), and $E \subset \bigcup_{n=1}^\infty E_n$. Some properties of this outer measure are easily established:

Exercise 15

- (i) Show that $\mu^*(\emptyset) = 0$.
- (ii) (Monotonicity) Show that if $E \subset F \subset [0,1]^A$ then $\mu^*(E) \leq \mu^*(F)$.
- (iii) (Countable subadditivity) For any countable sequence $E_1, E_2, \dots$ of subsets of $[0,1]^A$, show that $\mu^*( \bigcup_{n=1}^\infty E_n ) \leq \sum_{n=1}^\infty \mu^*( E_n )$. In particular (from part (i)) we have the finite subadditivity $\mu^*( E \cup F ) \leq \mu^*(E) + \mu^*(F)$ for all $E, F \subset [0,1]^A$.
- (iv) (Elementary sets) If $E$ is an elementary set, show that $\mu^*(E) = \mu_0(E)$. (Hint: first establish the claim when $E$ is compact, relying heavily on the regularity properties of the $\mu_B$ provided by Exercise 14, then extend to the general case by further heavy reliance on regularity.) In particular, we have $\mu^*([0,1]^A) = 1$.
- (v) (Approximation) Show that if $E \in {\mathcal F}_A$, then for any $\varepsilon > 0$ there exists an elementary set $F$ such that $\mu^*( E \Delta F ) \leq \varepsilon$. (Hint: use the monotone class lemma. When dealing with an increasing sequence of measurable sets obeying the required property, approximate these sets by an increasing sequence of elementary sets, and use the finite additivity of elementary measure and the fact that bounded monotone sequences converge.)

From part (v) of the above exercise, we see that every $E \in {\mathcal F}_A$ can be viewed as a “limit” of a sequence $F_n$ of elementary sets such that $\mu^*( E \Delta F_n ) \leq 1/n$ (say). From parts (iii), (iv) we see that the sequence $(\mu_0(F_n))_{n=1}^\infty$ is a Cauchy sequence and thus converges to a limit, which we denote $\mu_A(E)$; one can check from further application of (iii), (iv) that this quantity does not depend on the specific choice of the approximating sequence $F_n$. (Indeed, from subadditivity we see that $|\mu_0(F_n) - \mu_0(F'_n)| \leq \mu^*( F_n \Delta F'_n )$ for any two such sequences $F_n, F'_n$.) From definition we see that $\mu_A$ extends $\mu_0$ (thus $\mu_A(E) = \mu_0(E)$ for any elementary set $E$), and from the above exercise one checks that $\mu_A$ is countably additive. Thus $\mu_A$ is a probability measure with the desired properties, and the proof of the Kolmogorov extension theorem is complete.

** — 3. Independence — **

Using the notion of product measure, we can now quickly define the notion of independence:

Definition 16 A collection $(X_\alpha)_{\alpha \in A}$ of random variables (each of which takes values in some measurable space $(R_\alpha, {\mathcal B}_\alpha)$) is said to be *jointly independent* if the distribution of the tuple $(X_\alpha)_{\alpha \in A}$ is the product of the distributions of the $X_\alpha$. Or equivalently (after expanding all the definitions), we have

$$ {\bf P}( X_\alpha \in E_\alpha \hbox{ for all } \alpha \in B ) = \prod_{\alpha \in B} {\bf P}( X_\alpha \in E_\alpha )$$

for all finite $B \subset A$ and all measurable subsets $E_\alpha$ of $R_\alpha$ for $\alpha \in B$. We say that two random variables $X, Y$ are *independent* (or that $X$ is independent of $Y$) if the pair $(X, Y)$ is jointly independent.

It is worth reiterating that unless otherwise specified, all random variables under consideration are being modeled by a single probability space. The notion of independence between random variables does not make sense if the random variables are only being modeled by separate probability spaces; they have to be coupled together into a single probability space before independence becomes a meaningful notion.

Independence is a non-trivial notion only when one has two or more random variables; by chasing through the definitions we see that any collection of zero or one variables is automatically jointly independent.

Example 17 If we let $(X, Y)$ be drawn uniformly from a product $E_1 \times E_2$ of two Borel sets $E_1, E_2$ in ${\bf R}$ of positive finite Lebesgue measure, then $X$ and $Y$ are independent. However, if $(X, Y)$ is drawn uniformly from another shape (e.g. a parallelogram), then one usually does not expect to have independence.

As a special case of the above definition, a finite family $X_1, \dots, X_k$ of random variables, with each $X_i$ taking values in $R_i$, is jointly independent if one has

$$ {\bf P}( X_i \in E_i \hbox{ for all } 1 \leq i \leq k ) = \prod_{i=1}^k {\bf P}( X_i \in E_i )$$

for all measurable $E_i$ in $R_i$ for $1 \leq i \leq k$.

Suppose that $(X_\alpha)_{\alpha \in A}$ is a family of jointly independent random variables, with each $X_\alpha$ taking values in $R_\alpha$. From Exercise 3 we see that

$$ {\bf P}( X_{B_i} \in E_i \hbox{ for all } 1 \leq i \leq k ) = \prod_{i=1}^k {\bf P}( X_{B_i} \in E_i )$$

whenever $B_1, \dots, B_k$ are disjoint finite subsets of $A$, $X_{B_i} := (X_\alpha)_{\alpha \in B_i}$ is the corresponding tuple, and $E_i$ is a measurable subset of $R_{B_i} := \prod_{\alpha \in B_i} R_\alpha$. In particular, we see that the tuples $X_{B_1}, \dots, X_{B_k}$ are also jointly independent. This implies in turn that $f_1(X_{B_1}), \dots, f_k(X_{B_k})$ are jointly independent for any measurable functions $f_i: R_{B_i} \to S_i$. Thus, for instance, if $X, Y$ are jointly independent random variables taking values in $R_1, R_2$ respectively, then $f(X)$ and $g(Y)$ are independent for any measurable $f: R_1 \to S_1$ and $g: R_2 \to S_2$. In particular, if two scalar random variables $X, Y$ are jointly independent of a third random variable $Z$ (i.e. the triple $(X, Y, Z)$ are jointly independent), then combinations such as $X + Y$ or $XY$ are also independent of $Z$.

We remark that there is a quantitative version of the above facts used in information theory, known as the data processing inequality, but this is beyond the scope of this course.

If $X$ and $Y$ are independent scalar random variables, then from the Fubini and Tonelli theorems we see that

$$ {\bf E}( X Y ) = ({\bf E} X) ({\bf E} Y) \qquad (8)$$

if $X$ and $Y$ are either both unsigned, or both absolutely integrable. We caution however that the converse is not true: just because two random variables happen to obey (8) does not necessarily mean that they are independent; instead, we say merely that they are *uncorrelated*, which is a weaker statement.
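A minimal discrete illustration of the gap between (8) and independence (the particular variables below are an arbitrary illustrative choice), computed exactly over a three-point sample space:

```python
# Uncorrelated but not independent: X uniform on {-1, 0, 1}, Y = 1 if X = 0.
# Exact computation over the finite sample space using rational arithmetic.
from fractions import Fraction

omega = [-1, 0, 1]                  # three equally likely outcomes
p = Fraction(1, 3)

def E(h):                           # expectation over the sample space
    return sum(p * h(w) for w in omega)

X = lambda w: w
Y = lambda w: 1 if w == 0 else 0

cov = E(lambda w: X(w) * Y(w)) - E(X) * E(Y)
assert cov == 0                     # identity (8) holds: uncorrelated

# ...yet P(X=0, Y=1) = 1/3 while P(X=0) P(Y=1) = 1/9, so not independent.
p_joint = sum(p for w in omega if X(w) == 0 and Y(w) == 1)
p_prod = sum(p for w in omega if X(w) == 0) * sum(p for w in omega if Y(w) == 1)
assert p_joint == Fraction(1, 3) and p_prod == Fraction(1, 9)
```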

More generally, if $X$ and $Y$ are independent random variables taking values in ranges $R_1, R_2$ respectively, then

$$ {\bf E}( f(X) g(Y) ) = ({\bf E} f(X)) ({\bf E} g(Y))$$

for any scalar functions $f, g$ on $R_1, R_2$ respectively, provided that $f(X)$ and $g(Y)$ are either both unsigned, or both absolutely integrable. *This* is the property of $X$ and $Y$ which is equivalent to independence (as can be seen by specialising to indicator functions $f = 1_{E_1}$, $g = 1_{E_2}$ that take values in $\{0,1\}$): thus for instance independence of two unsigned random variables $X, Y$ entails not only (8), but also ${\bf E}( X^2 Y ) = ({\bf E} X^2)({\bf E} Y)$, ${\bf E}( X Y^2 ) = ({\bf E} X)({\bf E} Y^2)$, etc. Similarly when discussing the joint independence of larger numbers of random variables. It is this ability to easily decouple expectations of independent random variables that makes independent variables particularly easy to compute with in probability.
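This decoupling can be illustrated by Monte Carlo simulation; the distributions and test functions below are arbitrary choices, and the tolerance is generous relative to the sampling error.

```python
# Monte Carlo illustration of decoupling: for independent X, Y, expectations
# of products of functions of X and of Y factor into products of expectations.
import random
random.seed(0)

n = 200000
xs = [random.uniform(0, 1) for _ in range(n)]       # X ~ Uniform[0,1]
ys = [random.expovariate(1.0) for _ in range(n)]    # Y ~ Exp(1), independent

def mean(v): return sum(v) / len(v)

pairs = [
    (lambda x: x,     lambda y: y),      # E[XY]    vs E[X] E[Y]
    (lambda x: x * x, lambda y: y),      # E[X^2 Y] vs E[X^2] E[Y]
    (lambda x: x,     lambda y: y * y),  # E[X Y^2] vs E[X] E[Y^2]
]
for f, g in pairs:
    lhs = mean([f(x) * g(y) for x, y in zip(xs, ys)])
    rhs = mean([f(x) for x in xs]) * mean([g(y) for y in ys])
    assert abs(lhs - rhs) < 0.05   # equal up to Monte Carlo fluctuation
```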

Exercise 18 Show that a random variable $X$ is independent of itself (i.e. $X$ and $X$ are independent) if and only if $X$ is almost surely equal to a constant.

Exercise 19 Show that a constant (deterministic) random variable is independent of any other random variable.

Exercise 20 Let $X_1, \dots, X_k$ be discrete random variables (i.e. they take values in at most countable spaces $R_1, \dots, R_k$ equipped with the discrete sigma-algebra). Show that $X_1, \dots, X_k$ are jointly independent if and only if one has

$$ {\bf P}( X_1 = x_1, \dots, X_k = x_k ) = \prod_{i=1}^k {\bf P}( X_i = x_i )$$

for all $x_1 \in R_1, \dots, x_k \in R_k$.

Exercise 21 Let $X_1, \dots, X_k$ be real scalar random variables. Show that $X_1, \dots, X_k$ are jointly independent if and only if one has

$$ {\bf P}( X_1 \leq t_1, \dots, X_k \leq t_k ) = \prod_{i=1}^k {\bf P}( X_i \leq t_i )$$

for all $t_1, \dots, t_k \in {\bf R}$.

The following exercise demonstrates that probabilistic independence is analogous to linear independence:

Exercise 22 Let $V$ be a finite-dimensional vector space over a finite field $F$, and let $X$ be a random variable drawn uniformly at random from $V$. Let $B: V \times V \to F$ be a non-degenerate bilinear form on $V$, and let $v_1, \dots, v_k$ be non-zero vectors in $V$. Show that the random variables $B(X, v_1), \dots, B(X, v_k)$ are jointly independent if and only if the vectors $v_1, \dots, v_k$ are linearly independent.

Exercise 23 Give an example of three random variables $X_1, X_2, X_3$ which are *pairwise independent* (that is, any two of $X_1, X_2, X_3$ are independent of each other), but not *jointly independent*. (Hint: one can use the preceding exercise.)

Another analogy is with orthogonality:

Exercise 24 Let $X = (X_1, \dots, X_n)$ be a random variable taking values in ${\bf R}^n$ with the Gaussian distribution, in the sense that

$$ {\bf P}( X \in E ) = \frac{1}{(2\pi)^{n/2}} \int_E e^{-\|x\|^2/2}\ dx$$

(where $\|x\|$ denotes the Euclidean norm on ${\bf R}^n$), and let $v_1, \dots, v_k$ be vectors in ${\bf R}^n$. Show that the random variables $X \cdot v_1, \dots, X \cdot v_k$ (with $\cdot$ denoting the Euclidean inner product) are jointly independent if and only if the $v_1, \dots, v_k$ are pairwise orthogonal.

We say that a family $(E_\alpha)_{\alpha \in A}$ of events are *jointly independent* if their indicator random variables $(1_{E_\alpha})_{\alpha \in A}$ are jointly independent. Undoing the definitions, this is equivalent to requiring that

$$ {\bf P}\left( \bigwedge_{\alpha \in B} E_\alpha \wedge \bigwedge_{\beta \in C} \overline{E_\beta} \right) = \prod_{\alpha \in B} {\bf P}( E_\alpha ) \prod_{\beta \in C} ( 1 - {\bf P}( E_\beta ) )$$

for all disjoint finite subsets $B, C$ of $A$. This condition is complicated, but simplifies in the case of just two events:

Exercise 25

- (i) Show that two events $E, F$ are independent if and only if ${\bf P}( E \wedge F ) = {\bf P}(E) {\bf P}(F)$.
- (ii) If $E_1, \dots, E_k$ are events, show that the condition ${\bf P}( E_1 \wedge \dots \wedge E_k ) = {\bf P}(E_1) \dots {\bf P}(E_k)$ is necessary, but not sufficient, to ensure that $E_1, \dots, E_k$ are jointly independent.
- (iii) Give an example of three events $E_1, E_2, E_3$ that are pairwise independent, but not jointly independent.

Because of the product measure construction, it is easy to insert independent sources of randomness into an existing randomness model by extending that model, thus giving a more useful version of Corollaries 27 and 31 of Notes 0:

Proposition 26 Suppose one has a collection of events and random variables modeled by some probability space $(\Omega, {\mathcal F}, {\bf P})$, and let $\mu$ be a probability measure on a measurable space $(R, {\mathcal B})$. Then there exists an extension $(\Omega', {\mathcal F}', {\bf P}')$ of the probability space $(\Omega, {\mathcal F}, {\bf P})$, and a random variable $Y$ modeled by $(\Omega', {\mathcal F}', {\bf P}')$ taking values in $R$, such that $Y$ has distribution $\mu$ and is independent of $X$ for all random variables $X$ that were previously modeled by $(\Omega, {\mathcal F}, {\bf P})$.

More generally, given a finite collection $\mu_1, \dots, \mu_k$ of probability measures on measurable spaces $(R_1, {\mathcal B}_1), \dots, (R_k, {\mathcal B}_k)$, there exists an extension $(\Omega', {\mathcal F}', {\bf P}')$ of $(\Omega, {\mathcal F}, {\bf P})$ and random variables $Y_1, \dots, Y_k$ modeled by $(\Omega', {\mathcal F}', {\bf P}')$ taking values in $R_1, \dots, R_k$ respectively, such that each $Y_i$ has distribution $\mu_i$ and $Y_1, \dots, Y_k, X$ are jointly independent for any random variable $X$ that was previously modeled by $(\Omega, {\mathcal F}, {\bf P})$.

If the measurable spaces involved are all standard Borel spaces, then one can also take the collection of measures $\mu_\alpha$ to be infinite (even if the index set is uncountable).

*Proof:* For the first part, we define the extension $(\Omega', {\mathcal F}', {\bf P}')$ to be the product of $(\Omega, {\mathcal F}, {\bf P})$ with the probability space $(R, {\mathcal B}, \mu)$, with factor map $\pi: \Omega \times R \to \Omega$ defined by $\pi(\omega, r) := \omega$, and with $Y$ modeled by the coordinate map $(\omega, r) \mapsto r$. It is then routine to verify all the claimed properties. The other parts of the proposition are proven similarly, using Corollary 11 for the final part. $\Box$

Using this proposition, for instance, one can start with a given random variable $X$ and create an *independent copy* $X'$ of that variable, which has the same distribution as $X$ but is independent of $X$, by extending the probability model. Indeed one can create any finite number of independent copies, or even an infinite number so long as $X$ takes values in a standard Borel space (in particular, one can do this if $X$ is a scalar random variable). A finite or infinite sequence of random variables that are jointly independent and all have the same distribution is said to be an independent and identically distributed (or *iid* for short) sequence of random variables. The above proposition allows us to easily generate such sequences by extending the sample space as necessary.

Exercise 27 Let $X_1, X_2, X_3, \dots$ be random variables that are independent and identically distributed copies of the Bernoulli random variable with expectation $1/2$, that is to say the $X_n$ are jointly independent with ${\bf P}( X_n = 0 ) = {\bf P}( X_n = 1 ) = 1/2$ for all $n$.

- (i) Show that the random variable $\sum_{n=1}^\infty 2^{-n} X_n$ is uniformly distributed on the unit interval $[0,1]$.
- (ii) Show that the random variable $\sum_{n=1}^\infty 2 \times 3^{-n} X_n$ has the distribution of Cantor measure (constructed for instance in Example 1.2.4 of Durrett).

Note that part (i) of this exercise provides a means to construct Lebesgue measure on the unit interval (although, when one unpacks the construction, it is actually not too different from the standard construction, as given for instance in this previous set of notes).
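Part (i) of the exercise can also be probed by simulation: summing independent fair bits against powers of $1/2$ and comparing the empirical distribution with the uniform one. The bit depth, sample size, and tolerance below are arbitrary choices.

```python
# Simulation for Exercise 27(i): X = Σ 2^-n X_n with iid fair bits X_n should
# be uniform on [0,1].  We compare the empirical CDF with the uniform CDF.
import random
random.seed(1)

def sample(bits=32):
    return sum(random.getrandbits(1) / 2 ** n for n in range(1, bits + 1))

n = 100000
xs = sorted(sample() for _ in range(n))

# Kolmogorov-Smirnov-type statistic against the Uniform[0,1] CDF F(t) = t.
ks = max(max(abs((i + 1) / n - x), abs(i / n - x)) for i, x in enumerate(xs))
assert ks < 0.01   # fluctuations are of order 1/sqrt(n); 0.01 is a loose bound
```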

Given two square integrable real random variables $X, Y$, the covariance between the two is defined by the formula

$$ {\bf Cov}( X, Y ) := {\bf E}( (X - {\bf E} X) (Y - {\bf E} Y) ).$$

The covariance is well-defined thanks to the Cauchy-Schwarz inequality, and it is not difficult to see that one has the alternate formula

$$ {\bf Cov}( X, Y ) = {\bf E}( X Y ) - ({\bf E} X) ({\bf E} Y)$$

for the covariance. Note that the variance is a special case of the covariance: ${\bf Var}(X) = {\bf Cov}( X, X )$.
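The equivalence of the two covariance formulas can be checked exactly on a small finite sample space; the joint distribution below is an arbitrary illustrative choice.

```python
# The two covariance formulas agree: E[(X - EX)(Y - EY)] = E[XY] - (EX)(EY).
# Checked exactly on a finite joint distribution with rational probabilities.
from fractions import Fraction

# A joint distribution of (X, Y): {(x, y): probability}.  Arbitrary example.
dist = {(0, 1): Fraction(1, 4), (1, 1): Fraction(1, 2), (2, 0): Fraction(1, 4)}

def E(h):
    return sum(p * h(x, y) for (x, y), p in dist.items())

EX, EY = E(lambda x, y: x), E(lambda x, y: y)
cov1 = E(lambda x, y: (x - EX) * (y - EY))   # centred-product formula
cov2 = E(lambda x, y: x * y) - EX * EY       # alternate formula
assert cov1 == cov2
```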

From construction we see that if $X, Y$ are independent square integrable random variables, then the covariance ${\bf Cov}(X, Y)$ vanishes. The converse is not true:

Exercise 28 Give an example of two square-integrable real random variables $X, Y$ which have vanishing covariance ${\bf Cov}(X, Y) = 0$, but are not independent.

However, there is one key case in which the converse does hold, namely that of gaussian random vectors.

Exercise 29 A random vector $X = (X_1, \dots, X_n)$ taking values in ${\bf R}^n$ is said to be a *gaussian random vector* if there exists $\mu \in {\bf R}^n$ and an $n \times n$ positive definite real symmetric matrix $\Sigma$ such that

$$ {\bf P}( X \in E ) = \frac{1}{(2\pi)^{n/2} (\det \Sigma)^{1/2}} \int_E e^{-\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu)}\ dx$$

for all Borel sets $E \subset {\bf R}^n$ (where we identify elements of ${\bf R}^n$ with column vectors). The distribution of $X$ is called a *multivariate normal distribution*.

- (i) If $X$ is a gaussian random vector with the indicated parameters $\mu, \Sigma$, show that ${\bf E} X_i = \mu_i$ and ${\bf Cov}( X_i, X_j ) = \Sigma_{ij}$ for $1 \leq i, j \leq n$. In particular ${\bf Var}( X_i ) = \Sigma_{ii}$. Thus we see that the parameters of a gaussian random vector can be recovered from the mean and covariances.
- (ii) If $X$ is a gaussian random vector and $1 \leq i < j \leq n$, show that $X_i$ and $X_j$ are independent if and only if the covariance ${\bf Cov}( X_i, X_j )$ vanishes. Furthermore, show that $X_1, \dots, X_n$ are jointly independent if and only if all the covariances ${\bf Cov}( X_i, X_j )$ for $1 \leq i < j \leq n$ vanish. In particular, for gaussian random vectors, joint independence is equivalent to pairwise independence. (Contrast this with Exercise 23.)
- (iii) Give an example of two real random variables $X, Y$, each of which is gaussian, and for which ${\bf Cov}( X, Y ) = 0$, but such that $X$ and $Y$ are not independent. (Hint: take $Y$ to be the product of $X$ with a random sign.) Why does this not contradict (ii)?

We have discussed independence of random variables, and independence of events. It is also possible to define a notion of independence of *$\sigma$-algebras*. More precisely, define a *$\sigma$-algebra of events* to be a collection ${\mathcal F}$ of events that contains the empty event, is closed under Boolean operations (in particular, under complements $E \mapsto \overline{E}$) and under countable conjunctions and countable disjunctions. Each such $\sigma$-algebra of events, when using a probability space model $(\Omega, {\mathcal B}, {\bf P})$, is modeled by a $\sigma$-algebra of measurable sets in ${\mathcal B}$, which behaves under extension of the probability space in the obvious pullback fashion.

A random variable $X$ taking values in some range $R$ is said to be *measurable* with respect to a $\sigma$-algebra ${\mathcal F}$ of events if the event $X \in E$ lies in ${\mathcal F}$ for every measurable subset $E$ of $R$; in terms of a probabilistic model $(\Omega, {\mathcal B}, {\bf P})$, $X$ is measurable with respect to ${\mathcal F}$ if and only if the function modeling $X$ is measurable with respect to the $\sigma$-algebra of measurable sets that models ${\mathcal F}$. Note that every random variable $X$ generates a $\sigma$-algebra $\sigma(X)$ of events, defined to be the collection of all events of the form $X \in E$ for $E$ a measurable subset of $R$; this is the smallest $\sigma$-algebra with respect to which $X$ is measurable. More generally, given any collection $(X_\alpha)_{\alpha \in A}$ of random variables, one can define the $\sigma$-algebra $\sigma( (X_\alpha)_{\alpha \in A} )$ to be the smallest $\sigma$-algebra of events with respect to which all of the $X_\alpha$ are measurable; in terms of a model, this is the $\sigma$-algebra generated by the events $X_\alpha \in E_\alpha$, where $\alpha$ ranges over $A$ and $E_\alpha$ ranges over the measurable subsets of the range $R_\alpha$ of $X_\alpha$. Similarly, any collection $(E_\alpha)_{\alpha \in A}$ of events generates a $\sigma$-algebra of events $\sigma( (E_\alpha)_{\alpha \in A} )$, defined as the smallest $\sigma$-algebra of events that contains all of the $E_\alpha$; with respect to a model, it is generated by the measurable sets modeling the $E_\alpha$.
Definition 30 A collection $({\mathcal F}_\alpha)_{\alpha \in A}$ of $\sigma$-algebras of events is said to be *jointly independent* if, whenever $X_\alpha$ is a random variable measurable with respect to ${\mathcal F}_\alpha$ for each $\alpha \in A$, the tuple $(X_\alpha)_{\alpha \in A}$ is jointly independent. Equivalently, $({\mathcal F}_\alpha)_{\alpha \in A}$ is jointly independent if and only if one has

$$ {\bf P}\left( \bigwedge_{\alpha \in B} E_\alpha \right) = \prod_{\alpha \in B} {\bf P}( E_\alpha )$$

whenever $B$ is a finite subset of $A$ and $E_\alpha \in {\mathcal F}_\alpha$ for all $\alpha \in B$ (why is this equivalent?).

Thus, for instance, ${\mathcal F}_1$ and ${\mathcal F}_2$ are independent $\sigma$-algebras of events if and only if one has

$$ {\bf P}( E_1 \wedge E_2 ) = {\bf P}( E_1 ) {\bf P}( E_2 )$$

for all $E_1 \in {\mathcal F}_1$ and $E_2 \in {\mathcal F}_2$, that is to say that all the events in ${\mathcal F}_1$ are independent of all the events in ${\mathcal F}_2$.

The above notion generalises the notion of independence for random variables:

Exercise 31 If $(X_\alpha)_{\alpha \in A}$ are a collection of random variables, show that the $(X_\alpha)_{\alpha \in A}$ are jointly independent random variables if and only if the $(\sigma(X_\alpha))_{\alpha \in A}$ are jointly independent $\sigma$-algebras.

Exercise 32 Let $X_1, X_2, X_3, \dots$ be a sequence of random variables. Show that $X_1, X_2, X_3, \dots$ are jointly independent if and only if $\sigma( X_{n+1} )$ is independent of $\sigma( X_1, \dots, X_n )$ for all natural numbers $n$.

Suppose one has a sequence $X_1, X_2, X_3, \dots$ of random variables (such a sequence can be referred to as a discrete stochastic process). For each natural number $n$, we can define the $\sigma$-algebra ${\mathcal F}_{\geq n} := \sigma( X_n, X_{n+1}, X_{n+2}, \dots )$, the smallest $\sigma$-algebra that makes all of the $X_m$ for $m \geq n$ measurable; for instance, this $\sigma$-algebra contains any event that is definable in terms of measurable relations of finitely many of the $X_m$, $m \geq n$, together with countable boolean operations on such events. These $\sigma$-algebras are clearly decreasing in $n$. We can define the tail $\sigma$-algebra ${\mathcal T} := \bigcap_{n=1}^\infty {\mathcal F}_{\geq n}$ to be the intersection of all these $\sigma$-algebras, that is to say ${\mathcal T}$ consists of those events which lie in ${\mathcal F}_{\geq n}$ for every $n$. For instance, if the $X_n$ are scalar random variables that converge almost surely to a limit $X$, then we see that $X$ (after modification on a null set) is measurable with respect to the tail $\sigma$-algebra ${\mathcal T}$.

We have the remarkable Kolmogorov 0-1 law that says that the tail $\sigma$-algebra of a sequence of *independent* random variables is essentially trivial:

Theorem 33 (Kolmogorov zero-one law) Let $X_1, X_2, X_3, \dots$ be a sequence of *jointly independent* random variables. Then every event in the tail $\sigma$-algebra $\mathcal{T}$ has probability equal to either $0$ or $1$.

As a corollary of the zero-one law, note that any real scalar tail random variable $X$ (that is, one measurable with respect to $\mathcal{T}$) will be almost surely constant (because, for each rational $q$, the event $(X \leq q)$ is either almost surely true or almost surely false). Similarly for tail random variables taking values in $[-\infty,+\infty]$ or $\mathbf{R}^n$.

Example 34 Let $X_1, X_2, X_3, \dots$ be a sequence of jointly independent random variables taking values in $[-\infty,+\infty]$ (not necessarily identically distributed). The random variable $\limsup_{n \rightarrow \infty} X_n$ is measurable in the tail $\sigma$-algebra, and hence must be almost surely constant; thus there exists $c \in [-\infty,+\infty]$ such that $\limsup_{n \rightarrow \infty} X_n = c$ almost surely. Similarly there exists $c' \in [-\infty,+\infty]$ such that $\liminf_{n \rightarrow \infty} X_n = c'$ almost surely. Thus, either we have $c = c'$ and the $X_n$ converge almost surely to a deterministic limit, or $c \neq c'$ and the $X_n$ almost surely do not converge. What cannot happen is (for instance) that $X_n$ converges with probability $1/2$ and diverges with probability $1/2$; the zero-one law forces the only available probabilities of tail events to be zero or one.
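Example 34's dichotomy can be illustrated numerically. The sketch below is a hypothetical illustration (the variable names and parameters are mine): taking the $X_n$ to be iid uniform on $[0,1]$, so that $\limsup_n X_n = 1$ almost surely, every independent sample path's running maximum approaches the same deterministic value $1$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Several independent sample paths of X_1, X_2, ... iid uniform on [0, 1].
# The tail random variable limsup_n X_n is almost surely constant; here the
# constant is 1, since every level below 1 is exceeded infinitely often.
n_paths, n_steps = 5, 10_000
paths = rng.random((n_paths, n_steps))

running_max = np.maximum.accumulate(paths, axis=1)
for final in running_max[:, -1]:
    print(f"running max after {n_steps} steps: {final:.5f}")
```

Each printed value sits just below $1$, consistent with the limsup being the deterministic constant $1$ rather than a genuinely random quantity.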

*Proof:* Since $X_1, X_2, X_3, \dots$ are jointly independent, the $\sigma$-algebra $\sigma( X_{n+1}, X_{n+2}, \dots )$ is independent of $\sigma( X_1, \dots, X_n )$ for any $n$. In particular, the tail $\sigma$-algebra $\mathcal{T}$ (which is contained in $\sigma( X_{n+1}, X_{n+2}, \dots )$) is independent of $\sigma( X_1, \dots, X_n )$. Since the $\sigma$-algebra $\sigma( X_1, X_2, \dots )$ is generated by the $\sigma( X_1, \dots, X_n )$ for $n = 1, 2, \dots$, a simple application of the monotone class lemma then shows that $\mathcal{T}$ is also independent of $\sigma( X_1, X_2, \dots )$. But $\sigma( X_1, X_2, \dots )$ contains $\mathcal{T}$, hence $\mathcal{T}$ is independent of itself. But the only events that are independent of themselves have probability $0$ or $1$, and the claim follows.
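The final step of the proof can be spelled out as a one-line computation (a standard observation, recorded here for completeness): if an event $A$ is independent of itself, then

```latex
\mathbf{P}(A) = \mathbf{P}(A \wedge A) = \mathbf{P}(A)\,\mathbf{P}(A) = \mathbf{P}(A)^2,
```

and the only solutions of $p = p^2$ in $[0,1]$ are $p = 0$ and $p = 1$.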

Note that the zero-one law gives no guidance as to which of the two probabilities actually occurs for a given tail event. This usually cannot be determined from such "soft" tools as the zero-one law; instead one often has to work with more "hard" estimates, in particular explicit inequalities for the probabilities of various events that approximate the given tail event. On the other hand, the proof technique used to prove the Kolmogorov zero-one law is quite general, and is often adapted to prove other zero-one laws in the probability literature.

The zero-one law suggests that many asymptotic statistics of random variables will almost surely have deterministic values. We will see specific examples of this in the next few notes, when we discuss the law of large numbers and the central limit theorem.

## 67 comments


12 October, 2015 at 5:06 pm

Colin Rust: I think Exercise 22 is wrong as stated. It does not suffice for the vectors to be linearly independent. For example, with the standard (dot) inner product on $\mathbf{R}^2$, $X_1$ and $X_1+X_2$ are not independent even though the underlying vectors are linearly independent.

12 October, 2015 at 5:49 pm

Terence Tao: Actually, I don't see the problem here. $X_1$ and $X_2$ are drawn uniformly and independently at random from a finite field $F$, and the shear $(x_1, x_2) \mapsto (x_1, x_1 + x_2)$ is a bijection on the plane $F^2$, so the pair $(X_1, X_1 + X_2)$ should still be independent (in the probabilistic sense).

13 October, 2015 at 4:31 pm

Colin Rust: You're right, sorry about that! (I was confused about the statement of the exercise, as was probably already clear from my referring to $\mathbf{R}$ rather than a finite field.)

This result is surprising to me. Normally, I think of linear independence of two vectors (generically true) as being a much weaker condition than the independence of two random variables (which I think of as generically false). Of course, this is a very special case since the distributions are uniform, so maybe it’s not so surprising.

15 October, 2015 at 8:41 am

Venky: Colin, doesn't this depend on whether you think of the two variables as already coming from a joint distribution, in which case they are unlikely to be independent? Otherwise, you can just take the product and assert they are independent.

17 October, 2015 at 7:49 am

Colin Rust: That's a fair point; I was thinking of two variables coming from a joint distribution.

12 October, 2015 at 7:44 pm

Chao Huang: Hi Prof. Tao,

Do you have any intuitive explanation or example for the conclusion of Exercise 23 above, i.e. pairwise independent random variables are not necessarily jointly independent? Thanks!

12 October, 2015 at 8:31 pm

Terence Tao: Not every dependency between a triplet X,Y,Z of variables is caused by dependencies between pairs. For instance, it could be that X,Y,Z are random variables with X+Y+Z=0, thus creating a dependency in the triplet (X,Y,Z) that prevents joint independence, but such that there is no dependency between just X and Y, or between X and Z, or between Y and Z. (This can be made rigorous if X,Y,Z take values in a finite field, as per Exercise 22. The linear algebra analogue would be a triplet of vectors u,v,w which are pairwise linearly independent, but not jointly linearly independent.)
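The X+Y+Z=0 example above can be verified by direct enumeration. The following is a small sketch in Python (the names are mine) over the two-element field: $X, Y$ are independent uniform bits and $Z := X + Y \bmod 2$.

```python
from itertools import product

# Uniform distribution on the four outcomes of (X, Y), with Z = X + Y mod 2,
# so that X + Y + Z = 0 over the field with two elements.
outcomes = [(x, y, (x + y) % 2) for x, y in product((0, 1), repeat=2)]

def prob(event):
    # Probability of an event under the uniform measure on `outcomes`.
    return sum(1 for w in outcomes if event(w)) / len(outcomes)

# Pairwise independence: P(W_i = a, W_j = b) = P(W_i = a) P(W_j = b) for each pair.
pairwise = all(
    prob(lambda w: w[i] == a and w[j] == b)
    == prob(lambda w: w[i] == a) * prob(lambda w: w[j] == b)
    for i, j in ((0, 1), (0, 2), (1, 2))
    for a in (0, 1)
    for b in (0, 1)
)

# Joint independence fails: the outcome (0, 0, 1) has probability 0, not 1/8.
joint = prob(lambda w: w == (0, 0, 1)) == (
    prob(lambda w: w[0] == 0) * prob(lambda w: w[1] == 0) * prob(lambda w: w[2] == 1)
)

print(pairwise, joint)  # True False
```

All three pairs satisfy the product rule exactly, while the triple fails it: $Z$ is a deterministic function of $(X,Y)$.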

13 October, 2015 at 4:55 pm

arch1: It's interesting that the example given seems at odds with my naive intuition. If X+Y+Z=0 is the only constraint, and in a given instance we know only that X (say) takes on a particularly *large* value, this would appear to *increase* the probability that Z (say) takes on an especially *small* value.

13 October, 2015 at 9:52 pm

Terence Tao: This intuition is valid for random variables in ordered fields such as $\mathbf{R}$, in which notions such as "small" and "large" make sense. However, finite fields are not ordered.

13 October, 2015 at 12:44 am

Anonymous: In exercise 4, "the bound" may be deleted.

[Corrected, thanks -T.]

13 October, 2015 at 4:07 pm

victor: Is Exercise 13 another way of saying that any (discrete time) stochastic process on the reals is actually a Markov process on a different (much larger) state space?

13 October, 2015 at 4:32 pm

Terence Tao: Not really, because it only produces a model in which all of the individual variables are independent, so if one starts with a discrete stochastic process which has some coupling between its terms, Exercise 13 does not produce a model of that process, though it does give each individual term of the process the correct distribution. There should however be other constructions that can model an essentially arbitrary non-independent process as deterministic functions of some jointly independent (and in particular, Markovian) process, modeled by some enormous probability space, though I don't know how useful such constructions would be.

13 October, 2015 at 5:22 pm

Jack Hamill: Hello Dr. Tao,

My name is Jack Hamill. I am a 12 year old boy from Australia. We have an event at school coming up about eminent people. I have picked you for this event because I found your mathematical discoveries very important and eminent to the mathematics world.

For this event, all boys dress up as their eminent people.

Because of this, I need to know all the facts that you know.

Can you please explain all your knowledge for a kid? (even though your knowledge is infinite. 😀)

Thank you in advance,

Jack Hamill.

14 October, 2015 at 1:22 pm

Anonymous: Hello Prof. Tao,

We usually suppose that we have an i.i.d. collection of random variables.

But in practice, how can we check the i.i.d. assumption for a given data set? Namely, what are some possible statistical tests we could apply to our data to determine whether it is indeed i.i.d. or not?

Thanks,

14 October, 2015 at 4:16 pm

Terence Tao: I'm not an expert in statistics, so I don't know the most advanced tests for things like this, but there are certainly tests such as the Pearson chi-squared test which can be used to test for independence and for whether a random variable has a given distribution. So one could for instance use all but one of the data variables to obtain an empirical distribution that one could then test the remaining variable against. One can also compute empirical correlations, though these only capture pairwise independence rather than joint independence.
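The chi-squared independence test mentioned above can be sketched from scratch. In the hypothetical helper below, `chi2_independence` is my own name and the equiprobable-binning scheme is my own choice, not from the notes: each sample is binned, a contingency table is formed, and the Pearson statistic is compared (informally) against a $\chi^2$ distribution with $(\text{bins}-1)^2$ degrees of freedom.

```python
import numpy as np

def chi2_independence(x, y, bins=4):
    """Pearson chi-squared statistic for independence of two samples,
    after binning each sample into `bins` roughly equiprobable cells."""
    qx = np.quantile(x, np.linspace(0, 1, bins + 1))
    qy = np.quantile(y, np.linspace(0, 1, bins + 1))
    ix = np.clip(np.searchsorted(qx, x, side="right") - 1, 0, bins - 1)
    iy = np.clip(np.searchsorted(qy, y, side="right") - 1, 0, bins - 1)
    observed = np.zeros((bins, bins))
    np.add.at(observed, (ix, iy), 1)  # contingency table of bin counts
    expected = observed.sum(1, keepdims=True) * observed.sum(0, keepdims=True) / observed.sum()
    return ((observed - expected) ** 2 / expected).sum()  # df = (bins - 1) ** 2

rng = np.random.default_rng(1)
x = rng.normal(size=5000)
y_indep = rng.normal(size=5000)          # independent of x
y_dep = x + 0.5 * rng.normal(size=5000)  # strongly coupled to x

print(chi2_independence(x, y_indep))  # moderate, consistent with chi^2 on 9 df
print(chi2_independence(x, y_dep))    # huge: independence clearly rejected
```

For the independent pair the statistic is of the same order as its degrees of freedom; for the coupled pair it is orders of magnitude larger.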

14 October, 2015 at 4:40 pm

Colin Rust: Zero correlation is a necessary but not sufficient condition for pairwise independence. (For the very special case where the two variables are marginals of a bivariate normal, it does suffice. But even assuming that each is normally distributed is not sufficient.)
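This point admits a quick numerical illustration (my own sketch, not from the thread): take $X$ standard normal and $Y = SX$ with $S$ an independent random sign. Then $X$ and $Y$ are each $N(0,1)$ and uncorrelated, yet $|Y| = |X|$, so they are far from independent (and $(X,Y)$ is not bivariate normal).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

x = rng.normal(size=n)
s = rng.choice([-1.0, 1.0], size=n)  # random sign, independent of x
y = s * x                            # y is also N(0,1); Cov(x, y) = E[s] E[x^2] = 0

print(np.corrcoef(x, y)[0, 1])        # near 0: uncorrelated
print(np.corrcoef(x**2, y**2)[0, 1])  # exactly 1, since y**2 == x**2: highly dependent
```

The first correlation is statistical noise around zero, while the squares are perfectly correlated, exposing the dependence.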

14 October, 2015 at 5:28 pm

Anonymous: Do you have any advice for maths in general, including high school math? Also, do you have any tips to make it easier to solve word problems and do algebraic equations? Really, advice for all of algebra would be helpful.

15 October, 2015 at 8:07 pm

Anonymous: Dear Dr. Tao,

After the Erdos problem, all of us (200 people) wish that someday all magazines will announce that "Terence Tao has just solved the Riemann hypothesis". No one in the world can do this problem except you. This is fate. A human's life is very short. You should be urgent.

16 October, 2015 at 10:20 am

Venky: The definition of $\pi_j$ (right after Remark 5) has a small typo: the projection goes to $\sigma_j$, not $\pi_j$. Also, the Notes 0 product measure has a box topology flavor, but elementary sets have a product topology flavor. Do you think you can remark on how we should think of measures and topologies?

[Corrected, thanks. There are certainly many analogies between point-set topology and measure theory, but I don't know of many precise connections beyond the level of mere analogy (though there are some curious results of this flavour, e.g. Erdos-Sierpinski duality). -T]

17 October, 2015 at 7:54 am

Colin Rust: I think a way to think about this difference is that sigma-algebras are closed under countable intersection whereas topologies are only closed under finite intersection (by "topology" here I mean the set of open sets). The product structure (topology or measure) is the minimal one such that the projections are well-behaved (respectively, continuous or measurable).

17 October, 2015 at 11:33 am

Venky: Thanks Colin. I now realize that $p < 1$ can only be meaningfully multiplied by itself a finite number of times.

16 October, 2015 at 12:40 pm

Nicolás Sanhueza-Matamala: A small typo: the first paragraph says "infintie" instead of "infinite".

[Corrected, thanks – T.]

17 October, 2015 at 8:10 am

Venky: Small typo: in the definition of standard Borel after Exercise 9, $f$ doesn't biject from $E$ to $[0,1]$.

[Corrected, thanks – T.]

18 October, 2015 at 2:09 am

KM: This is a trivial remark, but it may complement the last paragraph: it may occasionally be possible to combine the zero-one law with a symmetry argument to actually determine a probability. For instance, if the $X_n$ are independent identically distributed with Ber(1/2) distribution, then the series $\sum_{n=1}^{\infty} X_n/n$ is divergent with probability $1$. With a simple coupling argument you can replace $1/2$ by any parameter $p \ge 1/2$. Of course, this gives no clue about the situation $p < 1/2$.
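The example above can be made concrete with a quick simulation (a sketch under my own choice of parameters): the partial sums of $\sum_n X_n/n$ with $X_n \sim \mathrm{Ber}(1/2)$ grow like $\frac{1}{2}\log N$, so divergence, which the zero-one law says must have probability $0$ or $1$, visibly has probability $1$.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 100_000

# One sample path of the partial sums of sum_n X_n / n, X_n iid Bernoulli(1/2).
# Convergence of the series is a tail event, so its probability is 0 or 1;
# since E[sum_{n <= N} X_n / n] = H_N / 2 grows like (1/2) log N, it diverges a.s.
x = rng.integers(0, 2, size=N)
partial = np.cumsum(x / np.arange(1, N + 1))

print(partial[-1], 0.5 * np.log(N))  # the final partial sum tracks (1/2) log N
```

The partial sums are nondecreasing and keep pace with the logarithmic growth rate, in line with almost sure divergence.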

18 October, 2015 at 4:48 am

Venky: Small typo: below equation (4), v(E) = v'(E), not v(E'). This is a great proof. I saw the connection between the continuity and extension of measure and monotone classes.

[Corrected, thanks – T.]

19 October, 2015 at 10:40 pm

leaner: Dear Tao, I am a university student and I want to ask you a question. I like math. When I learn by myself, I read the text carefully and try to prove every theorem. Then I'll do some practice, and read papers if I can. I think our class may be a little easy. I feel quite confused about whether I should listen to the lecturer. Can you give me some advice if you are available?

22 October, 2015 at 7:33 am

John Mangual: What was Kolmogorov's issue in building his extension theorem? It seems quite laborious.

My guess: without too much difficulty we can assign measures or "probability" to many subsets of $\mathbf{R}$ or even $\mathbf{R}^n$. However, in order to talk about repeated experiments or the Law of Large Numbers we need to define measures on an infinite product of copies of $\mathbf{R}$. Even an infinite product of copies of $\{0,1\}$ can have unmeasurable subsets.

22 October, 2015 at 7:57 am

John Mangual: Any collection of experiments I can think of is not independent. There will always be trace amounts of interdependence.

22 October, 2015 at 9:56 am

Terence Tao: This is certainly true (and particularly relevant, for instance, when applying probabilistic models to mathematical finance). However, thanks to the mathematical phenomenon of universality, many of the results obtained from perfectly independent models (e.g. the central limit theorem) often also hold for more realistic models in which there is a weak amount of coupling. The perfectly independent case thus serves as an excellent toy model for more general models, as it is by far the best understood type of coupling. (The situation is somewhat analogous to linear PDE and nonlinear PDE; almost any real-world PDE is going to have some nonlinearity in it, but the linear case is an incredibly important toy model that must be understood first if one is to have any hope to get a hold of the nonlinear models.)

22 October, 2015 at 10:29 am

Terence Tao: Measure theory has a lot of subtleties (e.g. Banach-Tarski paradoxes, inability to exchange limits, integrals, or sums with each other, interpreting divergent or non-measurable integrals or sums, or distinctions between different modes of convergence) that require care to address correctly, and so many of the foundational results in measure theory (such as the Kolmogorov extension theorem) are rather technical to prove. However, it is relatively easy to *apply* these theorems once they are proven (as long as one actually verifies whatever technical hypotheses are needed for the theorem to hold, of course).

23 October, 2015 at 8:06 am

275A, Notes 3: The weak and strong law of large numbers | What's new[…] the law of large numbers with what one can obtain from the Kolmogorov zero-one law, discussed in Notes 2. Observe that if the are real-valued, then the limit superior and are tail random variables in […]

27 October, 2015 at 2:42 pm

Anonymous: In Exercise 29, do we have any additional assumptions on the matrix? It seems that the result of part (i) would need the matrix to also be symmetric.

[I was using the convention that positive definite matrices are understood to be real symmetric (or at least Hermitian), but I reworded the exercise to clarify this point. – T.]

2 November, 2015 at 7:06 pm

275A, Notes 4: The central limit theorem | What's new[…] be iid copies of ; by extending the probability space used to model (using Proposition 26 from Notes 2), we can model the and by a common model, in such a way that the combined collection of random […]

3 November, 2015 at 10:46 am

Anonymous: There seems to be a typo in Exercise 14 in the definition of outer regularity: U should contain E and not the other way around.

[Corrected, thanks – T.]

6 November, 2015 at 8:06 am

Anonymous: Dear Terence Tao, in the paragraph above Theorem 33, I don't understand the statement "this $\sigma$-algebra contains any event that is definable in terms of measurable relations of finitely many of the $X_m$, together with countable Boolean operations on such events". Why can't we directly say that an event in this $\sigma$-algebra is definable in terms of, or involves, infinitely many of the $X_m$? Or do you mean that such an event can always be generated by countable operations on events which only involve finitely many of the $X_m$? Thanks very much.

6 November, 2015 at 9:34 am

Terence Tao: The $\sigma$-algebra does indeed consist of all events that can be described as measurable combinations of the $X_m$ for $m \geq n$, which by the definition of the product $\sigma$-algebra, is the same thing as countable Boolean combinations of measurable combinations of finitely many of the $X_m$ for $m \geq n$. (The countable Boolean combinations can be rather complicated though; most of the time one can get away with fairly simple combinations such as countable unions of countable intersections or vice versa, but in principle one could consider combinations that involve iterating the countable intersection and union operations along an arbitrary countable ordinal.)

So it doesn't really matter which way one views the $\sigma$-algebra, but I prefer thinking more concretely in terms of countable Boolean combinations of events that depend on finitely many variables, as this makes it explicit why random variables such as $\limsup_{m \rightarrow \infty} X_m$ would be measurable in this $\sigma$-algebra. (On the other hand, asking whether a set of indices belongs to a given non-principal ultrafilter will not, in general, be measurable in this $\sigma$-algebra, even though it does only depend on the $X_m$ for $m \geq n$, because it usually cannot be expressed as such a countable combination of events depending only on finitely many variables, and so one does not get measurability in the product $\sigma$-algebra.)

6 November, 2015 at 10:14 am

Rex: Would you happen to know a natural example where Kolmogorov extension fails?

Or, for that matter, a natural example of a non-Borel space which arises as a sample space in practice?

8 November, 2015 at 12:26 pm

Rex: Dear Terry,

So is there no simple example of this?

8 November, 2015 at 1:50 pm

Terence Tao: I don't know of any "natural" example. The construction of Andersen and Jessen (see e.g. http://www.sdu.dk/media/bibpdf/Bind%2020-29/Bind/mfm-25-4.pdf ) uses a variant of the Banach-Tarski paradox to build the counterexample, so it is arguably rather "unnatural".

6 November, 2015 at 11:05 pm

Rex: Typo: "We have the remarkable that says that the tail"

7 November, 2015 at 6:43 am

Anonymous: This typo is in the second line above Theorem 33.

[Corrected, thanks – T.]

7 November, 2015 at 10:36 am

Rex: Now it seems that some formula in Example 34 is not parsing.

[Corrected, thanks – T.]

28 November, 2015 at 7:35 am

XiangYu: Dear Professor Tao

In paragraph below exercise 15, since for every . Can we say that is actually equal to ?

[Yes; I've updated the text accordingly. -T.]

7 January, 2016 at 11:14 am

Anonymous: Extremely minor typo: $X$ in lieu of $\Omega$ in the proof of Lemma 2.

[Corrected, thanks – T.]

13 March, 2016 at 1:08 pm

Anonymous: The following definition is taught in the undergraduate stochastic process course:

Given a probability space $(\Omega, \mathcal{F}, \mathbf{P})$ and a measurable space $(S, \Sigma)$, an $S$-valued stochastic process is a collection of $S$-valued random variables on $\Omega$, indexed by a totally ordered set $T$ ("time"). That is, a stochastic process $X$ is a collection

$$ \{ X_t : t \in T \} $$

where each $X_t$ is an $S$-valued random variable on $\Omega$. The space $S$ is then called the state space of the process.

It is said that one needs the Kolmogorov existence theorem to construct such stochastic processes. I don't understand why one needs to do so. Why is knowing the distribution of each $X_t$ not enough to give a stochastic process $X$? Isn't $X$ just a family of random variables?

13 March, 2016 at 3:22 pm

Terence Tao: Knowing the distribution of each $X_t$ is not enough information to determine the distribution of the joint variable $(X_t)_{t \in T}$; for instance, the $X_t$ could be independent, or they could be coupled together. (As discussed in Notes 0, knowing the distribution of each $X_t$ lets one model each $X_t$ by its own probability space $\Omega_t$, but the point is that one has to couple all these random variables together to live on a common probability space $\Omega$. If one wants to specify the joint distributions of any finite number of the $X_t$ (e.g. if one wishes the $X_t$ to all be jointly independent), one needs the Kolmogorov extension theorem in order to construct such a common probability space for all of the $X_t$.)
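The point that marginal distributions do not determine the joint distribution can already be seen with two bits (an illustrative sketch of my own, not tied to the notes' notation): the independent coupling and the "identical copy" coupling of two uniform bits have the same marginals but different joint laws.

```python
from itertools import product

# Two couplings of a pair of uniform bits with identical marginals.
independent = [(x, y) for x, y in product((0, 1), repeat=2)]  # uniform on 4 outcomes
identical = [(x, x) for x in (0, 1)]                          # uniform on 2 outcomes

def marginal(outcomes, coord, value):
    # P(coordinate `coord` equals `value`) under the uniform measure on `outcomes`.
    return sum(1 for w in outcomes if w[coord] == value) / len(outcomes)

def joint(outcomes, a, b):
    return sum(1 for w in outcomes if w == (a, b)) / len(outcomes)

# Same one-dimensional laws...
same_marginals = all(
    marginal(independent, i, v) == marginal(identical, i, v)
    for i in (0, 1) for v in (0, 1)
)

# ...but different joint laws: P(X=0, Y=1) is 1/4 under one coupling, 0 under the other.
print(same_marginals)                                    # True
print(joint(independent, 0, 1), joint(identical, 0, 1))  # 0.25 0.0
```

Both couplings give each coordinate the uniform distribution on $\{0,1\}$, yet the two joint distributions disagree on the event $(X,Y)=(0,1)$.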

13 March, 2016 at 5:42 pm

Anonymous: I'm not sure if I have understood all your points.

As a special case: if it is already assumed that all the random variables $X_t$, $t \in T$, live in a common probability space, then can one claim that $\{ X_t : t \in T \}$ is a stochastic process (as in the definition given in the previous comment) without using the Kolmogorov extension theorem?

I see that "Knowing the distribution of each $X_t$ is not enough information to determine the distribution of the joint variable $X$". Does this suggest that the definition of stochastic processes given in the previous comment is not complete? All I can see from the definition of a stochastic process is that it is just an indexed "set" of random variables which live in a common probability space, and no consideration (even of the existence) of a "joint distribution" for $X$ is found in the definition. It seems from your comment that $X$ is also a "random variable" (using the notations in that definition).

13 March, 2016 at 10:33 pm

Terence Tao: An ordered tuple $(X_t)_{t \in T}$ of random variables (modeled using a common probability space $\Omega$), with each $X_t$ taking values in some range $R_t$, can be viewed as a single random variable (modeled using the same probability space) taking values in the product space $\prod_{t \in T} R_t$, using the canonical identification of a measurable map into the product space with the tuple of its coordinate maps (in the category of measurable spaces). So, yes, one can in principle describe a stochastic process by writing down a sample space, and then using that sample space to model a tuple of random variables $(X_t)_{t \in T}$.

But in practice, one doesn't want to specify the sample space explicitly (see Notes 0 for more discussion of this point); one would prefer to work in a "coordinate-free" fashion, and only describe those aspects of the stochastic process that are independent of the choice of sample space model, such as the individual distributions of the $X_t$, or the joint distribution of any finite number of the $X_t$. The Kolmogorov extension theorem tells us that the latter gives us all the information we need to describe the entire process up to change of sample space model (modulo the technical restriction to standard Borel spaces).

15 May, 2016 at 5:48 am

Anonymous: You use $\times$ for the product of two $\sigma$-algebras while some people use $\otimes$. After all, this is just a matter of notation preference. On the other hand, the product of $\sigma$-algebras is not a Cartesian product, for which the notation $\times$ is usually used. Is it some sort of "tensor product"?

15 May, 2016 at 7:54 am

Terence Tao: More or less, yes. More precisely, the category of measurable spaces is a monoidal category (which abstracts the concept of having a tensor product), with the product operation being the tensor product. The Cartesian product can similarly be thought of as the tensor product in the monoidal category of sets. (Actually, in both cases the tensor product is also just the categorical product.)

9 October, 2016 at 6:32 pm

Anonymous: I have never seen the concept of a $\pi$-system used in real analysis. Why is it useful in probability theory? Can one say that everything that can be done with $\pi$-systems can essentially be done with "monotone classes"?

10 October, 2016 at 8:39 am

Terence Tao: My understanding is that the monotone class theorem and the Dynkin pi-lambda theorem are basically equivalent, in that it is not difficult to use one to deduce the other, though I have not worked through the details myself (I have always used the monotone class theorem when I had to understand some tricky sigma-algebra). But there may be some minor technical advantages to using the pi-lambda formalism in probability theory (perhaps when working with continuous martingales?). The consensus seems to be though that the differences are small, and the choice of which to use is largely a matter of personal taste.

9 October, 2016 at 6:59 pm

Anonymous: Was it Halmos who first gave the proof of the monotone class theorem?

19 October, 2016 at 12:09 pm

Anonymous: Given a sequence $\mu_1, \mu_2, \dots$ of probability measures on $\mathbf{R}$, can one construct a sequence of independent random variables $X_1, X_2, \dots$ such that each $X_n$ has distribution $\mu_n$?

[Yes; see Proposition 26. -T.]

14 November, 2016 at 5:25 am

Anonymous: I found all the versions of the proof of the monotone class lemma that I had read very difficult to remember, until I read the following one from Billingsley's Probability and Measure.

Replace the generating Boolean algebra with $\mathcal{F}_0$ in the notes above.

Let $\mathcal{M}$ be the minimal monotone class over $\mathcal{F}_0$. It suffices to show that $\sigma(\mathcal{F}_0) \subset \mathcal{M}$.

It suffices to show that $\mathcal{M}$ is a $\sigma$-field. But since a monotone field is a $\sigma$-field, it suffices to show that $\mathcal{M}$ is a field.

The essential goal now is to show that $\mathcal{M}$ is closed under complementation and finite union.

Instead of doing the proof directly, define

$$ \mathcal{G}_1 := \{ A : A \cup B,\ A \setminus B,\ B \setminus A \in \mathcal{M} \hbox{ for all } B \in \mathcal{F}_0 \} $$

and

$$ \mathcal{G}_2 := \{ A : A \cup B,\ A \setminus B,\ B \setminus A \in \mathcal{M} \hbox{ for all } B \in \mathcal{M} \}. $$

It suffices to show that $\mathcal{M} \subset \mathcal{G}_2$.

All we need to do now is to show that both $\mathcal{G}_1$ and $\mathcal{G}_2$ are monotone classes containing $\mathcal{F}_0$.

I'm quite curious how you internalize a tricky proof like the one you gave in your notes.

14 November, 2016 at 5:26 am

Anonymous: In Problem 8, would you elaborate on why (6) is "clearly necessary"?

[Apply (7) to a set of the form . -T.]

14 November, 2016 at 5:38 am

Anonymous: I can do Exercise 23 by considering the set and .

How can I do it with the hint you gave? Would you elaborate?

16 November, 2016 at 5:39 pm

Anonymous: Looking at the definition of independence, I'm not able to give an answer to the following question:

Given any real number $c$, is there a sequence of independent random variables such that they are all equal to $c$ surely?

Since they are all equal to the same constant, must they be dependent?

16 November, 2016 at 9:25 pm

Terence Tao: Deterministic variables are simultaneously independent of each other, and dependent on each other; see also Exercises 18 and 19.

27 December, 2016 at 2:09 am

Tim: I have a question: if $X$ and $Y$ are normal $(0,1)$ random variables with correlation coefficient $r$, what is the distribution of the tensor product of $X$ and $Y$? Thanks.

25 January, 2017 at 9:21 am

Anonymous: In the Wikipedia article (https://en.wikipedia.org/wiki/Kolmogorov_extension_theorem), the Kolmogorov extension theorem says that if a given system of measures satisfies two "consistency conditions", then there exists a stochastic process having these finite-dimensional distributions. How different is this from Theorem 10? Or are they equivalent? The consistency conditions in Wikipedia become part of the conclusions of Theorem 10 in this note. Does the "standard Borel" condition correspond to something in the Wikipedia article?

25 January, 2017 at 9:27 am

Anonymous: Sorry for this stupid question. The version in this note gives a more general form of the extension theorem.

17 February, 2017 at 9:57 pm

254A, Notes 2: The central limit theorem | What's new[…] be iid copies of ; by extending the probability space used to model (using Proposition 26 from Notes 2), we can model the and by a common model, in such a way that the combined collection of random […]

19 June, 2017 at 11:23 pm

Doubtful determinism: This may be a naive question, but I can't see why the uniqueness claim in Theorem 1 necessitates the more complicated proof that was given, using various structural properties of the sets that satisfy (4). Why is it not, instead, possible to give the following simpler proof, which uses the fact that the measures of product sets are already completely determined by (4)?

Proof: For every product set $E_1 \times E_2$, the right-hand side of (4) is a fixed number, which implies that if $\mu$ and $\mu'$ both satisfy (4), then $\mu(E_1 \times E_2) = \mu'(E_1 \times E_2)$. Since this holds for all such sets, the claim is proved.

22 June, 2017 at 5:36 pm

Terence Tao: Not every measurable set in $X_1 \times X_2$ is of the form $E_1 \times E_2$ (consider for instance the union of two disjoint rectangles in the plane). The sets $E_1 \times E_2$ generate the $\sigma$-algebra $\mathcal{B}_1 \times \mathcal{B}_2$ (as noted in Exercise 14 of Notes 0), but are not the only elements of that $\sigma$-algebra.

9 November, 2017 at 8:53 pm

haonanz: Hello Professor Tao, I think I am having some confusion regarding the fundamentals. My confusion can be illustrated by the following example. Assume a coin-flipping experiment that follows a Bernoulli trial with probability $p$ of heads (denoted as 1) and probability $1-p$ of tails (denoted as 0). The experiment is performed independently $n$ times, with corresponding sequence of random variables $X_1, \dots, X_n$. I am interested in the conditional probability of $X_n$ given $X_1, \dots, X_{n-1}$. By independence, we know that this is just the unconditional distribution of $X_n$. However, we can infer $p$ based on the realization of $X_1, \dots, X_{n-1}$, which indirectly affects the distribution of $X_n$ through $p$. In summary, my point is that given a sequence of i.i.d. random variables $X_1, \dots, X_n$, the random variables are independent, but there is some 'dependency' through the underlying distribution $P$. Right now, I am thinking that maybe the best way to resolve this is to look at an extended probability space which includes the distribution; then $X_1, \dots, X_n$ is rather interpreted as conditionally independent given the distribution. I wonder if this is the right way to look at this. Thanks so much.

9 November, 2017 at 9:53 pm

Terence Tao: It depends on the status of $p$ in your probabilistic model. If $p$ is a deterministic quantity, then the $X_1, \dots, X_n$ are jointly independent. Inferring $p$ from some subset of this data does not actually give any additional information, since $p$ was not random to begin with. If instead $p$ is itself a random variable, then $(X_1, \dots, X_n)$ is distributed according to a mixture of iid variables, rather than being iid, and so as you say the $X_i$ are now only conditionally independent relative to $p$, but not jointly independent, since knowledge of $X_1, \dots, X_{n-1}$ does influence one's posterior distribution of $p$, which in turn influences $X_n$.