275A, Notes 2: Product measures and independence

12 October, 2015 in 275A - probability theory, math.CA, math.PR | Tags: independence, product measure | by Terence Tao

In the previous set of notes, we constructed the measure-theoretic notion of the Lebesgue integral, and used this to set up the probabilistic notion of expectation on a rigorous footing. In this set of notes, we will similarly construct the measure-theoretic concept of a product measure (restricting to the case of probability measures to avoid unnecessary technicalities), and use this to set up the probabilistic notion of independence on a rigorous footing. (To quote Durrett: “measure theory ends and probability theory begins with the definition of independence.”) We will be able to take virtually any collection of random variables (or probability distributions) and couple them together to be independent via the product measure construction, though for infinite products there is the slight technicality (a requirement of the Kolmogorov extension theorem) that the random variables need to range in standard Borel spaces. This is not the only way to couple together such random variables, but it is the simplest and the easiest to compute with in practice, as we shall see in the next few sets of notes.

— 1. Product measures —

It is intuitively obvious that Lebesgue measure ${m^2}$ on ${{\bf R}^2}$ ought to be related to Lebesgue measure ${m}$ on ${{\bf R}}$ by the relationship

$\displaystyle m^2( E_1 \times E_2 ) = m(E_1) m(E_2) \ \ \ \ \ (1)$

for any Borel sets ${E_1, E_2 \subset {\bf R}}$ . This is in fact true (see Exercise 5 below), and is part of a more general phenomenon, which we phrase here in the case of probability measures:

Theorem 1 (Product of two probability spaces) Let ${(\Omega_1, {\mathcal F}_1, \mu_1)}$ and ${(\Omega_2, {\mathcal F}_2, \mu_2)}$ be probability spaces. Then there is a unique probability measure ${\mu_1 \times \mu_2}$ on ${(\Omega_1 \times \Omega_2, {\mathcal F}_1 \times {\mathcal F}_2)}$ with the property that

$\displaystyle (\mu_1 \times \mu_2)( E_1 \times E_2 ) = \mu_1(E_1) \mu_2(E_2) \ \ \ \ \ (2)$

for all ${E_1 \in {\mathcal F}_1, E_2 \in {\mathcal F}_2}$ . Furthermore, we have the following two facts:

(Tonelli theorem) If ${f: \Omega_1 \times \Omega_2 \rightarrow [0,+\infty]}$ is measurable, then for each ${\omega_1 \in \Omega_1}$ , the function ${\omega_2 \mapsto f(\omega_1,\omega_2)}$ is measurable on ${\Omega_2}$ , and the function ${\omega_1 \mapsto \int_{\Omega_2} f(\omega_1,\omega_2)\ d\mu_2(\omega_2)}$ is measurable on ${\Omega_1}$ . Similarly, for each ${\omega_2 \in \Omega_2}$ , the function ${\omega_1 \mapsto f(\omega_1,\omega_2)}$ is measurable on ${\Omega_1}$ and ${\omega_2 \mapsto \int_{\Omega_1} f(\omega_1,\omega_2)\ d\mu_1(\omega_1)}$ is measurable on ${\Omega_2}$ . Finally, we have
$\displaystyle \int_{\Omega_1 \times \Omega_2} f\ d(\mu_1 \times \mu_2) = \int_{\Omega_1} (\int_{\Omega_2} f(\omega_1,\omega_2)\ d\mu_2(\omega_2)) d\mu_1(\omega_1)$

$\displaystyle = \int_{\Omega_2} (\int_{\Omega_1} f(\omega_1,\omega_2)\ d\mu_1(\omega_1)) d\mu_2(\omega_2).$

(Fubini theorem) If ${f: \Omega_1 \times \Omega_2 \rightarrow {\bf C}}$ is absolutely integrable, then for ${\mu_1}$ -almost every ${\omega_1 \in \Omega_1}$ , the function ${\omega_2 \mapsto f(\omega_1,\omega_2)}$ is absolutely integrable on ${\Omega_2}$ , and the function ${\omega_1 \mapsto \int_{\Omega_2} f(\omega_1,\omega_2)\ d\mu_2(\omega_2)}$ is absolutely integrable on ${\Omega_1}$ . Similarly, for ${\mu_2}$ -almost every ${\omega_2 \in \Omega_2}$ , the function ${\omega_1 \mapsto f(\omega_1,\omega_2)}$ is absolutely integrable on ${\Omega_1}$ and ${\omega_2 \mapsto \int_{\Omega_1} f(\omega_1,\omega_2)\ d\mu_1(\omega_1)}$ is absolutely integrable on ${\Omega_2}$ . Finally, we have
$\displaystyle \int_{\Omega_1 \times \Omega_2} f\ d(\mu_1 \times \mu_2) = \int_{\Omega_1} (\int_{\Omega_2} f(\omega_1,\omega_2)\ d\mu_2(\omega_2)) d\mu_1(\omega_1)$

$\displaystyle = \int_{\Omega_2} (\int_{\Omega_1} f(\omega_1,\omega_2)\ d\mu_1(\omega_1)) d\mu_2(\omega_2).$

The Fubini and Tonelli theorems are often used together (so much so that one may refer to them as a single theorem, the Fubini-Tonelli theorem, often also just referred to as Fubini’s theorem in the literature). For instance, given an absolutely integrable function ${f_1: \Omega_1 \rightarrow {\bf C}}$ and an absolutely integrable function ${f_2: \Omega_2 \rightarrow {\bf C}}$ , the Tonelli theorem tells us that the tensor product ${f_1 \otimes f_2: \Omega_1 \times \Omega_2 \rightarrow {\bf C}}$ defined by

$\displaystyle (f_1 \otimes f_2)(\omega_1, \omega_2) := f_1(\omega_1) f_2(\omega_2)$

for ${\omega_1 \in \Omega_1,\omega_2 \in \Omega_2}$ , is absolutely integrable, and the Fubini theorem then gives the factorisation

$\displaystyle \int_{\Omega_1 \times \Omega_2} f_1 \otimes f_2\ d(\mu_1 \times \mu_2) = (\int_{\Omega_1} f_1\ d\mu_1) (\int_{\Omega_2} f_2\ d\mu_2). \ \ \ \ \ (3)$
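On finite probability spaces every integral is a finite weighted sum, so the product property (2), the interchange of integrals, and the factorisation (3) can all be checked by direct computation. The following is a minimal sketch under that assumption; the outcome sets, the weights, the test functions and the helper names `product_measure` and `integrate` are made up for illustration and are not part of the notes.

```python
from fractions import Fraction
from itertools import product

# Two small finite probability spaces, stored as {outcome: probability}.
# The outcomes and weights are hypothetical.
mu1 = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
mu2 = {0: Fraction(1, 4), 1: Fraction(3, 4)}


def product_measure(mu, nu):
    """Point masses of mu x nu on the Cartesian product, following (2)."""
    return {(x, y): mu[x] * nu[y] for x, y in product(mu, nu)}


def integrate(f, mu):
    """Integral of f against a finite measure: a weighted sum."""
    return sum(f(x) * p for x, p in mu.items())


mu12 = product_measure(mu1, mu2)

# Hypothetical signed test functions f1 on Omega_1 and f2 on Omega_2.
f1 = {"a": Fraction(2), "b": Fraction(-1), "c": Fraction(5)}.__getitem__
f2 = {0: Fraction(3), 1: Fraction(-2)}.__getitem__

# Integral of the tensor product against the product measure.
joint = integrate(lambda w: f1(w[0]) * f2(w[1]), mu12)

# The two iterated integrals appearing in the Fubini theorem.
iterated_12 = integrate(lambda x: integrate(lambda y: f1(x) * f2(y), mu2), mu1)
iterated_21 = integrate(lambda y: integrate(lambda x: f1(x) * f2(y), mu1), mu2)

# Right-hand side of (3).
factored = integrate(f1, mu1) * integrate(f2, mu2)

print(joint, iterated_12, iterated_21, factored)
```

Exact rational arithmetic (`Fraction`) is used so that the four quantities can be compared with `==` rather than up to rounding error.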

Our proof of Theorem 1 will be based on the monotone class lemma that allows one to conveniently generate a ${\sigma}$ -algebra from a Boolean algebra. (In Durrett, the closely related ${\pi-\lambda}$ theorem is used in place of the monotone class lemma.) Define a monotone class in a set ${\Omega}$ to be a collection ${{\mathcal F}}$ of subsets of ${\Omega}$ with the following two closure properties:

If ${E_1 \subset E_2 \subset \ldots}$ is a countable increasing sequence of sets in ${{\mathcal F}}$ , then ${\bigcup_{n=1}^\infty E_n \in {\mathcal F}}$ .
If ${E_1 \supset E_2 \supset \ldots}$ is a countable decreasing sequence of sets in ${{\mathcal F}}$ , then ${\bigcap_{n=1}^\infty E_n \in {\mathcal F}}$ .

Thus for instance any ${\sigma}$ -algebra is a monotone class, but not conversely. Nevertheless, there is a key way in which monotone classes “behave like” ${\sigma}$ -algebras:

Lemma 2 (Monotone class lemma) Let ${{\mathcal A}}$ be a Boolean algebra on ${\Omega}$ . Then the ${\sigma}$ -algebra ${\langle {\mathcal A} \rangle}$ generated by ${{\mathcal A}}$ is the smallest monotone class that contains ${{\mathcal A}}$ .

Proof: Let ${{\mathcal F}}$ be the intersection of all the monotone classes that contain ${{\mathcal A}}$ . Since ${\langle {\mathcal A} \rangle}$ is clearly one such class, ${{\mathcal F}}$ is a subset of ${\langle {\mathcal A} \rangle}$ . Our task is then to show that ${{\mathcal F}}$ contains ${\langle {\mathcal A} \rangle}$ .
It is also clear that ${{\mathcal F}}$ is a monotone class that contains ${{\mathcal A}}$ . By replacing all the elements of ${{\mathcal F}}$ with their complements, we see that ${{\mathcal F}}$ is necessarily closed under complements.
For any ${E \in {\mathcal A}}$ , consider the set ${{\mathcal C}_E}$ of all sets ${F \in {\mathcal F}}$ such that ${F \backslash E}$ , ${E \backslash F}$ , ${F \cap E}$ , and ${\Omega \backslash (E \cup F)}$ all lie in ${{\mathcal F}}$ . It is clear that ${{\mathcal C}_E}$ contains ${{\mathcal A}}$ ; since ${{\mathcal F}}$ is a monotone class, we see that ${{\mathcal C}_E}$ is also. By definition of ${{\mathcal F}}$ , we conclude that ${{\mathcal C}_E = {\mathcal F}}$ for all ${E \in {\mathcal A}}$ .
Next, let ${{\mathcal D}}$ be the set of all ${E \in {\mathcal F}}$ such that ${F \backslash E}$ , ${E \backslash F}$ , ${F \cap E}$ , and ${\Omega \backslash (E \cup F)}$ all lie in ${{\mathcal F}}$ for all ${F \in {\mathcal F}}$ . By the previous discussion, we see that ${{\mathcal D}}$ contains ${{\mathcal A}}$ . One also easily verifies that ${{\mathcal D}}$ is a monotone class. By definition of ${{\mathcal F}}$ , we conclude that ${{\mathcal D} = {\mathcal F}}$ . Since ${{\mathcal F}}$ is also closed under complements, this implies that ${{\mathcal F}}$ is closed with respect to finite unions. Since this class also contains ${{\mathcal A}}$ , which contains ${\emptyset}$ , we conclude that ${{\mathcal F}}$ is a Boolean algebra. Since ${{\mathcal F}}$ is also closed under increasing countable unions, we conclude that it is closed under arbitrary countable unions, and is thus a ${\sigma}$ -algebra. As it contains ${{\mathcal A}}$ , it must also contain ${\langle {\mathcal A} \rangle}$ . $\Box$
We now begin the proof of Theorem 1. We begin with the uniqueness claim. Suppose that we have two measures ${\nu, \nu'}$ on ${\Omega_1 \times \Omega_2}$ that are product measures of ${\mu_1}$ and ${\mu_2}$ in the sense that

$\displaystyle \nu(E_1 \times E_2) = \nu'(E_1 \times E_2) = \mu_1(E_1) \mu_2(E_2) \ \ \ \ \ (4)$

for all ${E_1 \in {\mathcal F}_1}$ and ${E_2 \in {\mathcal F}_2}$ . If we then set ${{\mathcal F}}$ to be the collection of all ${E \in {\mathcal F}_1 \times {\mathcal F}_2}$ such that ${\nu(E) = \nu'(E)}$ , then ${{\mathcal F}}$ contains all sets of the form ${E_1 \times E_2}$ with ${E_1 \in {\mathcal F}_1}$ and ${E_2 \in {\mathcal F}_2}$ . In fact ${{\mathcal F}}$ contains the collection ${{\mathcal A}}$ of all sets that are “elementary” in the sense that they are of the form ${\bigcup_{i=1}^n E_{1,i} \times E_{2,i}}$ for finite ${n}$ and ${E_{1,i} \in {\mathcal F}_1, E_{2,i} \in {\mathcal F}_2}$ for ${i=1,\dots,n}$ , since such sets can be easily decomposed into a finite union of disjoint products ${E'_{1,i} \times E'_{2,i}}$ , at which point the claim follows from (4) and finite additivity. But ${{\mathcal A}}$ is a Boolean algebra that generates ${{\mathcal F}_1 \times {\mathcal F}_2}$ as a ${\sigma}$ -algebra, and from continuity from above and below we see that ${{\mathcal F}}$ is a monotone class. By the monotone class lemma, we conclude that ${{\mathcal F}}$ is all of ${{\mathcal F}_1 \times {\mathcal F}_2}$ , and hence ${\nu=\nu'}$ . This gives uniqueness.
Now we prove existence. We first claim that for any measurable set ${E \in {\mathcal F}_1 \times {\mathcal F}_2}$ and any ${\omega_1 \in \Omega_1}$ , the slices ${E_{\omega_1} := \{ \omega_2 \in \Omega_2: (\omega_1, \omega_2) \in E\}}$ are measurable in ${{\mathcal F}_2}$ . Indeed, the claim is obvious for sets ${E}$ that are “elementary” in the sense that they belong to the Boolean algebra ${{\mathcal A}}$ defined previously, and the collection of all sets ${E}$ for which the claim holds is a monotone class, so the claim follows from the monotone class lemma. A similar argument (relying on monotone or dominated convergence) shows that the function

$\displaystyle \omega_1 \mapsto \mu_2(E_{\omega_1}) = \int_{\Omega_2} 1_E( \omega_1, \omega_2)\ d\mu_2(\omega_2)$

is measurable in ${\Omega_1}$ for all ${E \in {\mathcal F}_1 \times {\mathcal F}_2}$ . Thus, for any ${E \in {\mathcal F}_1 \times {\mathcal F}_2}$ , we can define the quantity ${(\mu_1 \times \mu_2)(E)}$ by

$\displaystyle (\mu_1 \times \mu_2)(E) := \int_{\Omega_1} \mu_2(E_{\omega_1})\ d\mu_1(\omega_1)$

$\displaystyle = \int_{\Omega_1}(\int_{\Omega_2} 1_E( \omega_1, \omega_2)\ d\mu_2(\omega_2))\ d\mu_1(\omega_1).$

A routine application of the monotone convergence theorem verifies that ${\mu_1 \times \mu_2}$ is a countably additive measure; one easily checks that (2) holds for all ${E_1 \in {\mathcal F}_1, E_2 \in {\mathcal F}_2}$ , and in particular ${\mu_1 \times \mu_2}$ is a probability measure.
By construction, we see that the identity

$\displaystyle \int_{\Omega_1 \times \Omega_2} f\ d(\mu_1 \times \mu_2) = \int_{\Omega_1} (\int_{\Omega_2} f(\omega_1,\omega_2)\ d\mu_2(\omega_2))\ d\mu_1(\omega_1)$

holds (with all functions integrated being measurable) whenever ${f}$ is an indicator function ${f=1_E}$ with ${E \in {\mathcal F}_1 \times {\mathcal F}_2}$ . By linearity of integration, the same identity holds (again with all functions measurable) when ${f: \Omega_1 \times \Omega_2 \rightarrow [0,+\infty]}$ is an unsigned simple function. Since any unsigned measurable function ${f}$ can be expressed as the monotone non-decreasing limit of unsigned simple functions ${f_n}$ (for instance, one can take ${f_n}$ to be the largest multiple of ${2^{-n}}$ that is at most ${\min(f,n)}$ ), the above identity also holds for unsigned measurable ${f}$ by the monotone convergence theorem. Applying this fact to the absolute value ${|f|}$ of an absolutely integrable function ${f: \Omega_1 \times \Omega_2 \rightarrow {\bf C}}$ , we conclude for such functions that

$\displaystyle \int_{\Omega_1} (\int_{\Omega_2} |f|(\omega_1,\omega_2)\ d\mu_2(\omega_2))\ d\mu_1(\omega_1) < \infty$

which by Markov’s inequality implies that

$\displaystyle \int_{\Omega_2} |f|(\omega_1,\omega_2)\ d\mu_2(\omega_2) < \infty$

for ${\mu_1}$ -almost every ${\omega_1 \in \Omega_1}$ . In other words, the function ${\omega_2 \mapsto f(\omega_1,\omega_2)}$ is absolutely integrable on ${\Omega_2}$ for ${\mu_1}$ -almost every ${\omega_1 \in \Omega_1}$ . By monotonicity we conclude that

$\displaystyle \int_{\Omega_1} |\int_{\Omega_2} f(\omega_1,\omega_2)\ d\mu_2(\omega_2)|\ d\mu_1(\omega_1) < \infty$

and hence the function ${\omega_1 \mapsto \int_{\Omega_2} f(\omega_1,\omega_2)\ d\mu_2(\omega_2)}$ is absolutely integrable. Hence it makes sense to ask whether the identity

$\displaystyle \int_{\Omega_1 \times \Omega_2} f\ d(\mu_1 \times \mu_2) = \int_{\Omega_1} \int_{\Omega_2} f(\omega_1,\omega_2)\ d\mu_2(\omega_2)\ d\mu_1(\omega_1)$

holds for absolutely integrable ${f}$ , as both sides are well-defined. We have already established this claim when ${f}$ is unsigned and absolutely integrable; by subtraction this implies the claim for real-valued absolutely integrable ${f}$ , and by taking real and imaginary parts we obtain the claim for complex-valued absolutely integrable ${f}$ .
We may reverse the roles of ${\Omega_1}$ and ${\Omega_2}$ , and define ${\mu_1 \times \mu_2}$ instead by the formula

$\displaystyle (\mu_1 \times \mu_2)(E) = \int_{\Omega_2}(\int_{\Omega_1} 1_E( \omega_1, \omega_2)\ d\mu_1(\omega_1))\ d\mu_2(\omega_2).$

By the previously proved uniqueness of product measure, we see that this defines the same product measure ${\mu_1 \times \mu_2}$ as previously. Repeating the previous arguments we obtain all the above claims with the roles of ${\Omega_1}$ and ${\Omega_2}$ reversed. This gives all the claims required for Theorem 1.
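The dyadic rounding used in the proof to approximate an unsigned measurable function by simple functions can be made concrete. The sketch below assumes the pointwise choice of ${f_n}$ as the largest multiple of ${2^{-n}}$ not exceeding ${\min(f,n)}$ , and evaluates it at a few sample values of ${f}$ (including ${+\infty}$ ); the helper name `dyadic_lower` and the sample values are hypothetical.

```python
import math
from fractions import Fraction


def dyadic_lower(value, n):
    """Largest multiple of 2**-n that is at most min(value, n).

    `value` is a non-negative Fraction or math.inf.
    """
    if value == math.inf or value >= n:
        # n itself is a multiple of 2**-n, so the cap is attained exactly.
        return Fraction(n)
    return Fraction(math.floor(value * 2**n), 2**n)


# Sample values of f at a few points (made up for illustration).
samples = [Fraction(0), Fraction(1, 3), Fraction(22, 7), math.inf]

# approximations[v] lists f_1, ..., f_8 evaluated at a point where f equals v.
approximations = {v: [dyadic_lower(v, n) for n in range(1, 9)] for v in samples}

for v, values in approximations.items():
    print(v, [str(a) for a in values])
```

Each list is non-decreasing in ${n}$ and bounded by the sample value, which is the monotonicity needed to apply the monotone convergence theorem; each ${f_n}$ takes only finitely many values, so it is a simple function.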
One can extend the product construction easily to finite products:

Exercise 3 (Finite products) Show that for any finite collection ${(\Omega_i, {\mathcal F}_i, \mu_i)_{i \in A}}$ of probability spaces, there exists a unique probability measure ${\prod_{i \in A} \mu_i}$ on ${(\prod_{i \in A} \Omega_i, \prod_{i \in A} {\mathcal F}_i)}$ such that

$\displaystyle (\prod_{i \in A}\mu_i)(\prod_{i \in A} E_i) = \prod_{i \in A} \mu_i(E_i)$

whenever ${E_i \in {\mathcal F}_i}$ for ${i \in A}$ . Furthermore, show that

$\displaystyle \prod_{i \in A}\mu_i = (\prod_{i \in A_1}\mu_i) \times (\prod_{i \in A_2}\mu_i)$

for any partition ${A = A_1 \uplus A_2}$ (after making the obvious identification between ${\prod_{i \in A} \Omega_i}$ and ${(\prod_{i \in A_1} \Omega_i) \times (\prod_{i \in A_2} \Omega_i)}$ ). Thus for instance one has the associativity property

$\displaystyle \mu_1 \times \mu_2 \times \mu_3 = (\mu_1 \times \mu_2) \times \mu_3 = \mu_1 \times (\mu_2 \times \mu_3)$

for any probability spaces ${(\Omega_i, {\mathcal F}_i, \mu_i)}$ for ${i=1,2,3}$ .
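For finite probability spaces the associativity property can be verified by brute force, which may help when working through Exercise 3. This is only a sanity check on made-up data, not a proof; the three measures and the helpers `product_measure`, `flatten_left` and `flatten_right` are hypothetical, and the flattening functions implement the "obvious identification" between nested and flat tuples.

```python
from fractions import Fraction
from itertools import product


def product_measure(*measures):
    """Point masses of the product of finitely many finite probability measures.

    Each measure is a dict {outcome: probability}; outcomes of the product
    are tuples with one coordinate per factor.
    """
    result = {}
    for outcomes in product(*measures):
        mass = Fraction(1)
        for mu, w in zip(measures, outcomes):
            mass *= mu[w]
        result[outcomes] = mass
    return result


def flatten_left(nu):
    """Identify ((w1, w2), w3) with (w1, w2, w3)."""
    return {(*pair, w3): p for (pair, w3), p in nu.items()}


def flatten_right(nu):
    """Identify (w1, (w2, w3)) with (w1, w2, w3)."""
    return {(w1, *pair): p for (w1, pair), p in nu.items()}


# Three hypothetical probability measures on small finite sets.
mu1 = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
mu2 = {0: Fraction(1, 3), 1: Fraction(2, 3)}
mu3 = {"x": Fraction(1, 5), "y": Fraction(4, 5)}

triple = product_measure(mu1, mu2, mu3)
left = flatten_left(product_measure(product_measure(mu1, mu2), mu3))
right = flatten_right(product_measure(mu1, product_measure(mu2, mu3)))

print(triple == left == right)
```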

Exercise 4 Let ${\mu_1,\dots,\mu_n}$ be probability measures on ${{\bf R}}$ , and let ${F_1,\dots,F_n: {\bf R} \rightarrow [0,1]}$ be their Stieltjes measure functions. Show that ${\mu_1 \times \dots \times \mu_n}$ is the unique probability measure ${\mu}$ on ${{\bf R}^n}$ whose Stieltjes measure function ${(t_1,\dots,t_n) \mapsto \mu( (-\infty,t_1] \times \dots \times (-\infty,t_n])}$ is the tensor product ${(t_1,\dots,t_n) \mapsto F_1(t_1) \dots F_n(t_n)}$ of ${F_1,\dots,F_n}$ .

By writing ${\prod_{i \in A} \mu_i}$ as products of pairs of probability spaces in many different ways, one can obtain a higher-dimensional analogue of the Fubini and Tonelli theorems; we leave the precise statement of such a theorem to the interested reader.
It is important to be aware that the Fubini theorem identity

$\displaystyle \int_{\Omega_1} (\int_{\Omega_2} f(\omega_1,\omega_2)\ d\mu_2(\omega_2)) d\mu_1(\omega_1) = \int_{\Omega_2} (\int_{\Omega_1} f(\omega_1,\omega_2)\ d\mu_1(\omega_1)) d\mu_2(\omega_2) \ \ \ \ \ (5)$

for measurable functions ${f: \Omega_1 \times \Omega_2 \rightarrow {\bf C}}$ that are not unsigned is usually only justified when ${f}$ is absolutely integrable on ${\Omega_1 \times \Omega_2}$ , or equivalently (by the Tonelli theorem) when the function ${\omega_1 \mapsto \int_{\Omega_2} |f(\omega_1,\omega_2)|\ d\mu_2(\omega_2)}$ is absolutely integrable on ${\Omega_1}$ (or when ${\omega_2 \mapsto \int_{\Omega_1} |f(\omega_1,\omega_2)|\ d\mu_1(\omega_1)}$ is absolutely integrable on ${\Omega_2}$ ). Without this joint absolute integrability (and without any unsigned property on ${f}$ ), the identity (5) can fail even if both sides are well-defined. For instance, let ${\Omega_1 = \Omega_2}$ be the unit interval ${[0,1]}$ , and let ${\mu_1 = \mu_2}$ be the uniform probability measure on this interval, and set

$\displaystyle f(\omega_1,\omega_2) := \sum_{n=1}^\infty 2^n 1_{[2^{-n}, 2^{-n+1})}(\omega_1)$

$\displaystyle \times (2^n 1_{[2^{-n}, 2^{-n+1})}(\omega_2) - 2^{n+1} 1_{[2^{-n-1}, 2^{-n})}(\omega_2)).$

One can check that both sides of (5) are well-defined, but that the left-hand side is ${0}$ and the right-hand side is ${1}$ . Of course, this function is neither unsigned nor jointly absolutely integrable, so this counterexample does not violate either of the Fubini or Tonelli theorems. Thus one should take care to only interchange integrals when the integrands are known to be either unsigned or jointly absolutely integrable, or if one has another way to rigorously justify the exchange of integrals.
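The two iterated integrals in this counterexample can be computed exactly by bookkeeping over dyadic blocks. The function ${f}$ is constant on each block ${[2^{-n},2^{-n+1}) \times [2^{-m},2^{-m+1})}$ and vanishes unless ${m=n}$ or ${m=n+1}$ , so every inner integral is a finite sum. The sketch below follows this reduction; the helper `block_mass` and the cutoff `N` are ours, and the rows and columns beyond the cutoff contribute zero by the same computation.

```python
from fractions import Fraction


def block_mass(n, m):
    """Integral of f over the block [2^-n, 2^-n+1) x [2^-m, 2^-m+1).

    f is constant on the block, so this is (value of f) * (area of block).
    """
    if m == n:
        value = Fraction(4) ** n        # 2^n * 2^n
    elif m == n + 1:
        value = -2 * Fraction(4) ** n   # 2^n * (-2^(n+1))
    else:
        value = Fraction(0)
    return value * Fraction(1, 2 ** (n + m))


N = 20  # number of dyadic blocks examined in each variable

# Integrate in omega_2 first: for omega_1 in block n only m = n, n + 1 matter.
row_sums = [sum(block_mass(n, m) for m in range(1, N + 2)) for n in range(1, N + 1)]

# Integrate in omega_1 first: for omega_2 in block m only n = m - 1, m matter.
col_sums = [sum(block_mass(n, m) for n in range(1, N + 1)) for m in range(1, N + 1)]

lhs = sum(row_sums)  # left-hand side of (5)
rhs = sum(col_sums)  # right-hand side of (5)

# Total mass of |f| over the blocks examined; it grows linearly in N.
abs_mass = sum(abs(block_mass(n, m)) for n in range(1, N + 1) for m in range(1, N + 2))

print(lhs, rhs, abs_mass)
```

Every row sums to zero, while every column except the first sums to zero; the first column has a single block of mass ${1}$ . The quantity `abs_mass` grows without bound as `N` increases, reflecting the failure of joint absolute integrability.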
The above theory extends from probability spaces to finite measure spaces, and more generally to measure spaces that are ${\sigma}$ -finite, that is to say they are expressible as the countable union of sets of finite measure. (With a bit of care, some portions of product measure theory are even extendible to non-sigma-finite settings, though I urge caution in applying these results blindly in that case.) We will not give the details of these generalisations here, but content ourselves with one example:

Exercise 5 Establish (1) for all Borel sets ${E_1,E_2 \subset {\bf R}}$ . (Hint: ${{\bf R}}$ can be viewed as the disjoint union of a countable sequence of sets of measure ${1}$ .)

Remark 6 When doing real analysis (as opposed to probability), it is convenient to complete the Borel ${\sigma}$ -algebra ${{\mathcal B}[{\bf R}^n]}$ on spaces such as ${{\bf R}^n}$ , to form the larger Lebesgue ${\sigma}$ -algebra ${{\mathcal L}[{\bf R}^n]}$ , defined as the collection of all subsets ${E}$ in ${{\bf R}^n}$ that differ from a Borel set ${F}$ in ${{\bf R}^n}$ by a sub-null set, in the sense that ${E \Delta F \subset G}$ for some Borel subset ${G}$ of ${{\bf R}^n}$ of zero Lebesgue measure. There are analogues of the Fubini and Tonelli theorems for such complete ${\sigma}$ -algebras; see these previous lecture notes for details. However one should be cautioned that the product ${{\mathcal L}[{\bf R}^{n_1}] \times {\mathcal L}[{\bf R}^{n_2}]}$ of Lebesgue ${\sigma}$ -algebras is not the Lebesgue ${\sigma}$ -algebra ${{\mathcal L}[{\bf R}^{n_1+n_2}]}$ , but is instead an intermediate ${\sigma}$ -algebra between ${{\mathcal B}[{\bf R}^{n_1+n_2}]}$ and ${{\mathcal L}[{\bf R}^{n_1+n_2}]}$ , which causes some additional small complications. For instance, if ${f: {\bf R}^{n_1+n_2} \rightarrow {\bf C}}$ is Lebesgue measurable, then the functions ${x_2 \mapsto f(x_1,x_2)}$ can only be found to be Lebesgue measurable on ${{\bf R}^{n_2}}$ for almost every ${x_1 \in {\bf R}^{n_1}}$ , rather than for all ${x_1 \in {\bf R}^{n_1}}$ . We will not dwell on these subtleties further here, as we will rarely have any need to complete the ${\sigma}$ -algebras used in probability theory.

It is also important in probability theory applications to form the product of an infinite number of probability spaces ${(\Omega_i, {\mathcal F}_i, \mu_i)}$ for ${i \in A}$ , where ${A}$ can be infinite or even uncountable. Recall from Notes 0 that the product ${\sigma}$ -algebra ${{\mathcal F}_A := \prod_{i \in A} {\mathcal F}_i}$ on ${\Omega_A := \prod_{i \in A} \Omega_i}$ is defined to be the ${\sigma}$ -algebra generated by the sets ${\pi_j^{-1}(E_j)}$ for ${j \in A}$ and ${E_j \in {\mathcal F}_j}$ , where ${\pi_j: \Omega_A \rightarrow \Omega_j}$ is the usual coordinate projection. Equivalently, if we define an elementary set to be a subset of ${\Omega_A}$ of the form ${\pi_B^{-1}(E_B)}$ , where ${B}$ is a finite subset of ${A}$ , ${\pi_B: \Omega_A \rightarrow \Omega_B}$ is the obvious projection map to ${\Omega_B := \prod_{i \in B} \Omega_i}$ , and ${E_B}$ is a measurable set in ${{\mathcal F}_B := \prod_{i \in B} {\mathcal F}_i}$ , then ${{\mathcal F}_A}$ can be defined as the ${\sigma}$ -algebra generated by the collection ${{\mathcal A}}$ of elementary sets. (Elementary sets are the measure-theoretic analogue of cylinder sets in point set topology.) For future reference we note the useful fact that ${{\mathcal A}}$ is a Boolean algebra.
We define a product measure ${\mu_A = \prod_{i \in A} \mu_i}$ to be a probability measure on the measurable space ${(\Omega_A, {\mathcal F}_A)}$ which extends all of the finite products in the sense that

$\displaystyle \mu_A( \pi_B^{-1}(E_B) ) = \mu_B(E_B)$

for all finite subsets ${B}$ of ${A}$ and all ${E_B}$ in ${{\mathcal F}_B}$ , where ${\mu_B := \prod_{i \in B} \mu_i}$ . If this product measure exists, it is unique:

Exercise 7 Show that for any collection of probability spaces ${(\Omega_i, {\mathcal F}_i, \mu_i)}$ for ${i \in A}$ , there is at most one product measure ${\mu_A}$ . (Hint: adapt the uniqueness argument in Theorem 1 that used the monotone class lemma.)

In the case of finite ${A}$ , the finite product constructed in Exercise 3 is clearly the unique product. But for infinite ${A}$ , the construction of product measure is a more nontrivial issue. We can generalise the problem as follows:

Problem 8 (Extension problem) Let ${(\Omega_i, {\mathcal F}_i)_{i \in A}}$ be a collection of measurable spaces. For each finite ${B \subset A}$ , let ${\mu_B}$ be a probability measure on ${(\Omega_B, {\mathcal F}_B)}$ obeying the compatibility condition

$\displaystyle \mu_B( \pi_{B \rightarrow C}^{-1}(E_C) ) = \mu_C(E_C) \ \ \ \ \ (6)$

for all finite ${C \subset B \subset A}$ and ${E_C \in {\mathcal F}_C}$ , where ${\pi_{B \rightarrow C}: \Omega_B \rightarrow \Omega_C}$ is the obvious restriction. Can one then define a probability measure ${\mu_A}$ on ${(\Omega_A, {\mathcal F}_A)}$ such that

$\displaystyle \mu_A( \pi_{B}^{-1}(E_B) ) = \mu_B(E_B) \ \ \ \ \ (7)$

for all finite ${B \subset A}$ and ${E_B \in {\mathcal F}_B}$ ?

Note that the compatibility condition (6) is clearly necessary if one is to find a measure ${\mu_A}$ obeying (7).
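To see what the compatibility condition (6) asks for in the simplest setting, one can take every ${\Omega_i}$ to be the two-point space ${\{0,1\}}$ and let ${\mu_B}$ be the finite product of coin-flip measures. The left-hand side of (6) is then the pushforward of ${\mu_B}$ under the restriction map, which for finite spaces is a marginalisation. The sketch below checks (6) on point masses for one choice of ${B}$ and all ${C \subset B}$ ; the biases `p` and the helper names `product_family` and `pushforward` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product


def product_family(p, B):
    """mu_B for coin flips: coordinate i equals 1 with probability p[i].

    B is a tuple of indices; outcomes are tuples of 0/1 indexed like B.
    """
    mu = {}
    for w in product((0, 1), repeat=len(B)):
        mass = Fraction(1)
        for i, bit in zip(B, w):
            mass *= p[i] if bit == 1 else 1 - p[i]
        mu[w] = mass
    return mu


def pushforward(mu_B, B, C):
    """Push mu_B forward under the restriction map Omega_B -> Omega_C.

    The mass assigned to a point w_C is mu_B of its preimage, which is
    the left-hand side of (6) for the singleton E_C = {w_C}.
    """
    positions = [B.index(i) for i in C]
    mu = {}
    for w, mass in mu_B.items():
        w_C = tuple(w[k] for k in positions)
        mu[w_C] = mu.get(w_C, Fraction(0)) + mass
    return mu


# Hypothetical biases for four coordinates.
p = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(3, 4), 3: Fraction(1, 5)}
B = (0, 1, 2, 3)
mu_B = product_family(p, B)

# Check (6) on point masses for every subset C of B.
consistent = all(
    pushforward(mu_B, B, C) == product_family(p, C)
    for r in range(len(B) + 1)
    for C in combinations(B, r)
)
print(consistent)
```

Since all the spaces here are finite, agreement on singletons implies agreement on all sets ${E_C}$ by finite additivity.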
Again, one has uniqueness:

Exercise 9 Show that for any ${(\Omega_i, {\mathcal F}_i)_{i \in A}}$ and ${\mu_B}$ for finite ${B \subset A}$ as in the above extension problem, there is at most one probability measure ${\mu_A}$ with the stated properties.

The extension problem is trivial for finite ${A}$ , but for infinite ${A}$ there are unfortunately examples where the probability measure ${\mu_A}$ fails to exist. However, there is one key case in which we can build the extension, thanks to the Kolmogorov extension theorem. Call a measurable space ${(\Omega,{\mathcal F})}$ standard Borel if it is isomorphic as a measurable space to a Borel subset of the unit interval ${[0,1]}$ with the Borel ${\sigma}$ -algebra, that is to say there is a bijection ${f}$ from ${\Omega}$ to a Borel subset ${E}$ of ${[0,1]}$ such that ${f: \Omega \rightarrow E}$ and ${f^{-1}: E \rightarrow \Omega}$ are both measurable. (In Durrett, such spaces are called nice spaces.) Note that one can easily replace ${[0,1]}$ by other standard spaces such as ${{\bf R}}$ if desired, since these spaces are isomorphic as measurable spaces (why?).

Theorem 10 (Kolmogorov extension theorem) Let the situation be as in Problem 8. If all the measurable spaces ${(\Omega_i,{\mathcal F}_i)}$ are standard Borel, then there exists a probability measure ${\mu_A}$ solving the extension problem (which is then unique, thanks to Exercise 9).

The proof of this theorem is lengthy and is deferred to the next (optional) section. Specialising to the product case, we conclude

Corollary 11 Let ${(\Omega_i, {\mathcal F}_i, \mu_i)_{i \in A}}$ be a collection of probability spaces with ${(\Omega_i, {\mathcal F}_i)}$ standard Borel. Then there exists a product measure ${\prod_{i \in A} \mu_i}$ (which is then unique, thanks to Exercise 7).

Of course, to use this theorem we would like to have a large supply of standard Borel spaces. Here is one tool that often suffices:

Lemma 12 Let ${(X,d)}$ be a complete separable metric space, and let ${\Omega}$ be a Borel subset of ${X}$ . Then ${\Omega}$ (with the Borel ${\sigma}$ -algebra) is standard Borel.

Proof: Let us call two topological spaces Borel isomorphic if their corresponding Borel structures are isomorphic as measurable spaces. Using the binary expansion, we see that ${[0,1]}$ is Borel isomorphic to ${\{0,1\}^{\bf N}}$ (the countable number of points that have two binary expansions can be easily permuted to obtain a genuine isomorphism). Similarly ${[0,1]^{\bf N}}$ is Borel isomorphic to ${\{0,1\}^{{\bf N} \times {\bf N}}}$ . Since ${{\bf N} \times {\bf N}}$ is in bijection with ${{\bf N}}$ , we conclude that ${[0,1]^{\bf N}}$ is Borel isomorphic to ${[0,1]}$ . Thus it will suffice to show that every complete separable metric space ${(X,d)}$ is Borel isomorphic to a Borel subset of ${[0,1]^{\bf N}}$ . But if we let ${q_1,q_2,\dots}$ be a countable dense subset in ${X}$ , the map

$\displaystyle x \mapsto (\frac{d(x,q_i)}{1+d(x,q_i)})_{i \in {\bf N}}$

can easily be seen to be a Borel isomorphism between ${X}$ and a Borel subset of ${[0,1]^{\bf N}}$ (note the image is the closure of the points ${(\frac{d(q_j,q_i)}{1+d(q_j,q_i)})_{i \in {\bf N}}}$ in the uniform norm). The claim follows. $\Box$
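The proof only uses the existence of a bijection between ${{\bf N} \times {\bf N}}$ and ${{\bf N}}$ , which lets one relabel a doubly indexed array of binary digits as a single sequence. One explicit choice, which is our assumption and is not specified in the notes, is the Cantor pairing function; the sketch below indexes ${{\bf N}}$ from ${0}$ , and the names `pair` and `unpair` are hypothetical.

```python
from math import isqrt


def pair(i, j):
    """Cantor pairing: a bijection from N x N to N, with N = {0, 1, 2, ...}."""
    return (i + j) * (i + j + 1) // 2 + j


def unpair(k):
    """Inverse of `pair`."""
    s = (isqrt(8 * k + 1) - 1) // 2  # s = i + j, the index of the anti-diagonal
    j = k - s * (s + 1) // 2
    return s - j, j


def flatten(digits, length):
    """First `length` terms of the single sequence obtained from a double array.

    `digits(i, j)` is the j-th binary digit of the i-th coordinate.
    """
    return [digits(*unpair(k)) for k in range(length)]


# Hypothetical double array of bits: digit j of coordinate i is (i + j) mod 2.
print(flatten(lambda i, j: (i + j) % 2, 10))
```

Any other bijection would serve equally well; the point is only that the relabelling maps cylinder sets to cylinder sets, and so preserves the product ${\sigma}$ -algebra.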

Exercise 13 (Kolmogorov extension theorem, alternate form) For each natural number ${n}$ , let ${\mu_n}$ be a probability measure on ${{\bf R}^n}$ with the property that

$\displaystyle \mu_{n+1}( B \times {\bf R} ) = \mu_n(B)$

for ${n \geq 1}$ and any box ${B = [a_1,b_1] \times \dots \times [a_n,b_n]}$ in ${{\bf R}^n}$ , where we identify ${{\bf R}^n \times {\bf R}}$ with ${{\bf R}^{n+1}}$ in the usual manner. Show that there exists a unique probability measure ${\mu_{\bf N}}$ on ${{\bf R}^{\bf N}}$ (with the product ${\sigma}$ -algebra, or equivalently the Borel ${\sigma}$ -algebra on the product topology) such that

$\displaystyle \mu_{\bf N}( \{ (\omega_i)_{i \in {\bf N}}: (\omega_1,\dots,\omega_n) \in E \} ) = \mu_n(E)$

for all ${n \geq 1}$ and Borel sets ${E \subset {\bf R}^n}$ .

— 2. Proof of the Kolmogorov extension theorem (optional) —

We now prove Theorem 10. By the definition of a standard Borel space, we may assume without loss of generality that each ${\Omega_i}$ is a Borel subset of ${[0,1]}$ with the Borel ${\sigma}$ -algebra, and then by extending each ${\Omega_i}$ to ${[0,1]}$ we may in fact assume without loss of generality that each ${\Omega_i}$ is simply ${[0,1]}$ with the Borel ${\sigma}$ -algebra. Thus each ${\mu_B}$ for finite ${B \subset A}$ is a probability measure on the cube ${[0,1]^B}$ .
We will exploit the regularity properties of such measures:

Exercise 14 Let ${B}$ be a finite set, and let ${\mu_B}$ be a probability measure on ${[0,1]^B}$ (with the Borel ${\sigma}$ -algebra). For any Borel set ${E}$ in ${[0,1]^B}$ , establish the inner regularity property

$\displaystyle \mu_B(E) = \sup_{K \subset E, K \hbox{ compact}} \mu_B(K)$

and the outer regularity property

$\displaystyle \mu_B(E) = \inf_{U \supset E, U \hbox{ open}} \mu_B(U).$

Hint: use the monotone class lemma.

Another way of stating the above exercise is that finite Borel measures on the cube are automatically Radon measures. In fact there is nothing particularly special about the unit cube ${[0,1]^B}$ here; the claim holds for any compact separable metric space. Radon measures are often used in real analysis (see e.g. these lecture notes) but we will not develop their theory further here.
Observe that one can define the elementary measure ${\mu_0(E)}$ of any elementary set ${E = \pi_B^{-1}(E_B)}$ in ${[0,1]^A}$ by defining

$\displaystyle \mu_0( \pi_B^{-1}(E_B) ) := \mu_B( E_B )$

for any finite ${B \subset A}$ and any Borel ${E_B \subset [0,1]^B}$ . This quantity is well-defined thanks to the compatibility hypothesis (6). From the finite additivity of the ${\mu_B}$ it is easy to see that ${\mu_0}$ is a finitely additive probability measure on the Boolean algebra ${{\mathcal A}}$ of elementary sets.
We would like to extend ${\mu_0}$ to a countably additive probability measure on ${{\mathcal F}_A}$ . The standard approach to do this is via the Carathéodory extension theorem in measure theory (or the closely related Hahn-Kolmogorov theorem); this approach is presented in these previous lecture notes, and a similar approach is taken in Durrett. Here, we will try to avoid developing the Carathéodory extension theorem, and instead take a more direct approach similar to the direct construction of Lebesgue measure, given for instance in these previous lecture notes.
Given any subset ${E \subset [0,1]^A}$ (not necessarily Borel), we define its outer measure ${\mu^*(E)}$ to be the quantity

$\displaystyle \mu^*(E) := \inf \{ \sum_{i=1}^\infty \mu_0(E_i): E_i \hbox{ open elementary cover of } E \},$

where we say that ${E_1,E_2,\dots}$ is an open elementary cover of ${E}$ if each ${E_i}$ is an open elementary set, and ${E \subset \bigcup_{i=1}^\infty E_i}$ . Some properties of this outer measure are easily established:

Exercise 15

(i) Show that ${\mu^*(\emptyset) = 0}$ .

(ii) (Monotonicity) Show that if ${E \subset F \subset [0,1]^A}$ then ${\mu^*(E) \leq \mu^*(F)}$ .

(iii) (Countable subadditivity) For any countable sequence ${E_1,E_2,\dots}$ of subsets of ${[0,1]^A}$ , show that ${\mu^*( \bigcup_{i=1}^\infty E_i) \leq \sum_{i=1}^\infty \mu^*(E_i)}$ . In particular (from part (i)) we have the finite subadditivity ${\mu^*(E \cup F) \leq \mu^*(E) + \mu^*(F)}$ for all ${E,F \subset [0,1]^A}$ .

(iv) (Elementary sets) If ${E}$ is an elementary set, show that ${\mu^*(E) = \mu_0(E)}$ . (Hint: first establish the claim when the elementary set ${E}$ is also compact, relying heavily on the regularity properties of the ${\mu_B}$ provided by Exercise 14, then extend to the general case by further heavy reliance on regularity.) In particular, we have ${\mu^*([0,1]^A) = 1}$ .

(v) (Approximation) Show that if ${E \in {\mathcal F}_A}$ , then for any ${\varepsilon > 0}$ there exists an elementary set ${E_\varepsilon}$ such that ${\mu^*(E \Delta E_\varepsilon) < \varepsilon}$ . (Hint: use the monotone class lemma. When dealing with an increasing sequence of measurable sets ${E_n}$ obeying the required property, approximate these sets by an increasing sequence of elementary sets ${E'_n}$ , and use the finite additivity of elementary measure and the fact that bounded monotone sequences converge.)

From part (v) of the above exercise, we see that every ${E \in {\mathcal F}_A}$ can be viewed as a “limit” of a sequence ${E_n}$ of elementary sets such that ${\mu^*(E \Delta E_{n}) < 1/n}$ . From parts (iii), (iv) we see that the sequence ${\mu_0(E_n)}$ is a Cauchy sequence and thus converges to a limit, which we denote ${\mu(E)}$ ; one can check from further application of (iii), (iv) that this quantity does not depend on the specific choice of ${E_n}$ . (Indeed, from subadditivity we see that ${\mu(E) = \mu^*(E)}$ .) From the definition we see that ${\mu}$ extends ${\mu_0}$ (thus ${\mu(E) = \mu_0(E)}$ for any elementary set ${E}$ ), and from the above exercise one checks that ${\mu}$ is countably additive. Thus ${\mu}$ is a probability measure with the desired properties, and the proof of the Kolmogorov extension theorem is complete.

— 3. Independence —

Using the notion of product measure, we can now quickly define the notion of independence:

Definition 16 A collection ${(X_i)_{i \in A}}$ of random variables ${X_i}$ (each of which takes values in some measurable space ${R_i}$ ) is said to be jointly independent if the distribution of ${(X_i)_{i \in A}}$ is the product of the distributions of the ${X_i}$ . Or equivalently (after expanding all the definitions), we have

$\displaystyle {\bf P}( \bigwedge_{i \in B} (X_i \in S_i) ) = \prod_{i \in B} {\bf P}(X_i \in S_i)$

for all finite ${B \subset A}$ and all measurable subsets ${S_i}$ of ${R_i}$ . We say that two random variables ${X,Y}$ are independent (or that ${X}$ is independent of ${Y}$ ) if the pair ${(X,Y)}$ is jointly independent.

It is worth reiterating that unless otherwise specified, all random variables under consideration are being modeled by a single probability space. The notion of independence between random variables does not make sense if the random variables are only being modeled by separate probability spaces; they have to be coupled together into a single probability space before independence becomes a meaningful notion.
Independence is a non-trivial notion only when one has two or more random variables; by chasing through the definitions we see that any collection of zero or one variables is automatically jointly independent.

Example 17 If we let ${(X,Y)}$ be drawn uniformly from a product ${E \times F}$ of two Borel sets ${E,F}$ in ${{\bf R}}$ of positive finite Lebesgue measure, then ${X}$ and ${Y}$ are independent. However, if ${(X,Y)}$ is drawn uniformly from another shape (e.g. a parallelogram), then one usually does not expect to have independence.
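
As a rough illustration of Example 17, here is a minimal sketch of a discrete analogue, in which Lebesgue measure is replaced by counting measure on finite sets of lattice points. The discretisation, the particular point sets and the helper names `uniform_pmf`, `marginals`, `is_independent` are assumptions made for this illustration and are not part of the notes; checking singletons suffices here because the variables are discrete (compare Exercise 20).

```python
from fractions import Fraction
from itertools import product

def uniform_pmf(points):
    """Uniform probability mass function on a finite set of (x, y) points."""
    points = list(points)
    weight = Fraction(1, len(points))
    return {p: weight for p in points}

def marginals(pmf):
    """Return the marginal distributions of X and Y as dictionaries."""
    px, py = {}, {}
    for (x, y), w in pmf.items():
        px[x] = px.get(x, 0) + w
        py[y] = py.get(y, 0) + w
    return px, py

def is_independent(pmf):
    """Check P(X=x, Y=y) = P(X=x) P(Y=y) for every pair (x, y)."""
    px, py = marginals(pmf)
    return all(pmf.get((x, y), 0) == px[x] * py[y]
               for x, y in product(px, py))

# Product set E x F with E = {0,1,2} and F = {0,1,2,3}.
rectangle = uniform_pmf(product(range(3), range(4)))
# A sheared shape: y ranges over {x, x+1}, so knowing x constrains y.
parallelogram = uniform_pmf((x, y) for x in range(3) for y in (x, x + 1))

print(is_independent(rectangle))      # True
print(is_independent(parallelogram))  # False
```

Exact rational arithmetic (`Fraction`) is used so that the product identity is tested with no rounding error.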

As a special case of the above definition, a finite family ${X_1,\dots,X_n}$ of random variables taking values in ${R_1,\dots,R_n}$ is jointly independent if one has

$\displaystyle {\bf P}( \bigwedge_{i=1}^n (X_i\in S_i) ) = \prod_{i=1}^n {\bf P}( X_i \in S_i )$

for all measurable ${S_i}$ in ${R_i}$ for ${i=1,\dots,n}$ .
Suppose that ${(X_i)_{i \in A}}$ is a family of independent random variables, with each ${X_i}$ taking values in ${R_i}$ . From Exercise 3 we see that

$\displaystyle {\bf P}( \bigwedge_{j=1}^J (X_{A_j} \in S_j) ) = \prod_{j=1}^J {\bf P}( X_{A_j} \in S_j )$

whenever ${A_1,\dots,A_J}$ are disjoint finite subsets of ${A}$ , ${X_{A_j}}$ is the tuple ${(X_i)_{i \in A_j}}$ , and ${S_j}$ is a measurable subset of ${\prod_{i \in A_j} R_i}$ . In particular, we see that the tuples ${X_{A_1},\dots,X_{A_J}}$ are also jointly independent. This implies in turn that ${F_1(X_{A_1}),\dots,F_J(X_{A_J})}$ are jointly independent for any measurable functions ${F_j: \prod_{i \in A_j} R_i \rightarrow Y_j}$ . Thus, for instance, if ${X_1,X_2,X_3,X_4}$ are jointly independent random variables taking values in ${R_1,R_2,R_3,R_4}$ respectively, then ${F(X_1,X_2)}$ and ${G(X_3)}$ are independent for any measurable ${F: R_1 \times R_2 \rightarrow Y}$ and ${G: R_3 \rightarrow Y'}$ . In particular, if two scalar random variables ${X,Y}$ are jointly independent of a third random variable ${Z}$ (i.e. the triple ${X,Y,Z}$ are jointly independent), then combinations such as ${X+Y}$ or ${XY}$ are also independent of ${Z}$ .
We remark that there is a quantitative version of the above facts used in information theory, known as the data processing inequality, but this is beyond the scope of this course.
If ${X}$ and ${Y}$ are independent scalar random variables, then from the Fubini and Tonelli theorems we see that

$\displaystyle {\bf E} XY = ({\bf E} X) ({\bf E} Y) \ \ \ \ \ (8)$

if ${X}$ and ${Y}$ are either both unsigned, or both absolutely integrable. We caution however that the converse is not true: just because two random variables ${X,Y}$ happen to obey (8) does not necessarily mean that they are independent; instead, we say merely that they are uncorrelated, which is a weaker statement.
More generally, if ${X}$ and ${Y}$ are random variables taking values in ranges ${R, R'}$ respectively, then

$\displaystyle {\bf E} F(X) G(Y) = ({\bf E} F(X)) ({\bf E} G(Y))$

for any scalar functions ${F,G}$ on ${R,R'}$ respectively, provided that ${F(X)}$ and ${G(Y)}$ are either both unsigned, or both absolutely integrable. This is the property of ${X}$ and ${Y}$ which is equivalent to independence (as can be seen by specialising to those ${F,G}$ that take values in ${\{0,1\}}$ ): thus for instance independence of two unsigned random variables ${X,Y}$ entails not only (8), but ${{\bf E} X^2 Y = ({\bf E} X^2) ({\bf E} Y)}$ , ${{\bf E} e^X e^Y = ({\bf E} e^X) ({\bf E} e^Y)}$ , etc. Similar statements hold when discussing the joint independence of larger numbers of random variables. It is this ability to easily decouple expectations of independent random variables that makes independent variables particularly easy to compute with in probability.
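
The decoupling identity above can be checked exactly in a small finite case. The following is a minimal sketch, assuming ${X}$ and ${Y}$ are two independent fair six-sided dice and taking ${F(x) = x^2}$ , ${G(y) = 2^y}$ ; the dice model, these choices of ${F,G}$ and the helper names are assumptions made for this illustration only.

```python
from fractions import Fraction
from itertools import product

die = {k: Fraction(1, 6) for k in range(1, 7)}  # distribution of one fair die

# Joint distribution of an independent pair (X, Y): the product measure.
joint = {(x, y): die[x] * die[y] for x, y in product(die, die)}

def expect_joint(h):
    """E h(X, Y) under the product distribution."""
    return sum(w * h(x, y) for (x, y), w in joint.items())

def expect_single(f):
    """E f(X) for a single fair die."""
    return sum(w * f(x) for x, w in die.items())

F = lambda x: x * x      # F(X) = X^2
G = lambda y: 2 ** y     # G(Y) = 2^Y

lhs = expect_joint(lambda x, y: F(x) * G(y))
rhs = expect_single(F) * expect_single(G)
print(lhs == rhs)  # True

# Contrast: with the fully dependent choice Y = X, the identity (8) fails.
e_xx = expect_single(lambda x: x * x)           # E(X * X) = 91/6
e_x_squared = expect_single(lambda x: x) ** 2   # (E X)^2 = 49/4
print(e_xx == e_x_squared)  # False
```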

Exercise 18 Show that a random variable ${X}$ taking values in a locally compact, ${\sigma}$ -compact metric space is independent of itself (i.e. ${X}$ and ${X}$ are independent) if and only if ${X}$ is almost surely equal to a constant.

Exercise 19 Show that a constant (deterministic) random variable is independent of any other random variable.

Exercise 20 Let ${X_1,\dots,X_n}$ be discrete random variables (i.e. they take values in at most countable spaces ${R_1,\dots,R_n}$ equipped with the discrete sigma-algebra). Show that ${X_1,\dots,X_n}$ are jointly independent if and only if one has

$\displaystyle {\bf P}( \bigwedge_{i=1}^n (X_i=x_i) ) = \prod_{i=1}^n {\bf P}(X_i = x_i)$

for all ${x_1 \in R_1,\dots, x_n \in R_n}$ .

Exercise 21 Let ${X_1,\dots,X_n}$ be real scalar random variables. Show that ${X_1,\dots,X_n}$ are jointly independent if and only if one has

$\displaystyle {\bf P}( \bigwedge_{i=1}^n (X_i \leq t_i) ) = \prod_{i=1}^n {\bf P}( X_i \leq t_i )$

for all ${t_1,\dots,t_n \in {\bf R}}$ .

The following exercise demonstrates that probabilistic independence is analogous to linear independence:

Exercise 22 Let ${V}$ be a finite-dimensional vector space over a finite field ${F}$ , and let ${X}$ be a random variable drawn uniformly at random from ${V}$ . Let ${\langle, \rangle: V \times V \rightarrow F}$ be a non-degenerate bilinear form on ${V}$ , and let ${v_1,\dots,v_n}$ be non-zero vectors in ${V}$ . Show that the random variables ${\langle X, v_1 \rangle, \dots, \langle X, v_n \rangle}$ are jointly independent if and only if the vectors ${v_1,\dots,v_n}$ are linearly independent.

Exercise 23 Give an example of three random variables ${X,Y,Z}$ which are pairwise independent (that is, any two of ${X,Y,Z}$ are independent of each other), but not jointly independent. (Hint: one can use the preceding exercise.)

Another analogy is with orthogonality:

Exercise 24 Let ${X}$ be a random variable taking values in ${{\bf R}^n}$ with the Gaussian distribution, in the sense that

$\displaystyle \mathop{\bf P}( X \in S ) = \int_S \frac{1}{(2\pi)^{n/2}} e^{-|x|^2/2}\ dx$

(where ${|x|}$ denotes the Euclidean norm on ${{\bf R}^n}$ ), and let ${v_1,\dots,v_m}$ be vectors in ${{\bf R}^n}$ . Show that the random variables ${X \cdot v_1, \dots, X \cdot v_m}$ (with ${\cdot}$ denoting the Euclidean inner product) are jointly independent if and only if the ${v_1,\dots,v_m}$ are pairwise orthogonal.

We say that a family of events ${(E_i)_{i \in A}}$ are jointly independent if their indicator random variables ${(1_{E_i})_{i \in A}}$ are jointly independent. Undoing the definitions, this is equivalent to requiring that

$\displaystyle {\bf P}( \bigwedge_{i \in A_1} E_{i} \wedge \bigwedge_{j \in A_2} \overline{E_{j}}) = \prod_{i \in A_1} {\bf P}(E_i) \prod_{j \in A_2} {\bf P}(\overline{E_j})$

for all disjoint finite subsets ${A_1, A_2}$ of ${A}$ . This condition is complicated, but simplifies in the case of just two events:

Exercise 25

(i) Show that two events ${E,F}$ are independent if and only if ${{\bf P}(E \wedge F) = {\bf P}(E) {\bf P}(F)}$ .

(ii) If ${E,F,G}$ are events, show that the condition ${{\bf P}(E \wedge F \wedge G) = {\bf P}(E) {\bf P}(F) {\bf P}(G)}$ is necessary, but not sufficient, to ensure that ${E,F,G}$ are jointly independent.

(iii) Give an example of three events ${E,F,G}$ that are pairwise independent, but not jointly independent.

Because of the product measure construction, it is easy to insert independent sources of randomness into an existing randomness model by extending that model, thus giving a more useful version of Corollaries 27 and 31 of Notes 0:

Proposition 26 Suppose one has a collection of events and random variables modeled by some probability space ${\Omega}$ , and let ${\nu}$ be a probability measure on a measurable space ${R = (R,{\mathcal B})}$ . Then there exists an extension ${\Omega'}$ of the probability space ${\Omega}$ , and a random variable ${X}$ modeled by ${\Omega'}$ taking values in ${R}$ , such that ${X}$ has distribution ${\nu}$ and is independent of all random variables that were previously modeled by ${\Omega}$ .
More generally, given a finite collection ${(\nu_i)_{i \in A}}$ of probability measures on measurable spaces ${R_i}$ , there exists an extension ${\Omega'}$ of ${\Omega}$ and random variables ${X_i}$ modeled by ${\Omega'}$ taking values in ${R_i}$ for each ${i \in A}$ , such that each ${X_i}$ has distribution ${\nu_i}$ and ${(X_i)_{i \in A}}$ and ${Y}$ are jointly independent for any random variable ${Y}$ that was previously modeled by ${\Omega}$ .
If the ${R_i}$ are all standard Borel spaces, then one can also take ${A}$ to be infinite (even if ${A}$ is uncountable).

Proof: For the first part, we define the extension ${\Omega'}$ to be the product of ${\Omega}$ with the probability space ${(R,{\mathcal B},\nu)}$ , with factor map ${\pi: \Omega \times R \rightarrow \Omega}$ defined by ${\pi(\omega,x) := \omega}$ , and with ${X}$ modeled by ${X_\Omega(\omega,x) := x}$ . It is then routine to verify all the claimed properties. The other parts of the proposition are proven similarly, using Proposition 11 for the final part. $\Box$
Using this proposition, for instance, one can start with a given random variable ${X}$ and create an independent copy ${Y}$ of that variable, which has the same distribution as ${X}$ but is independent of ${X}$ , by extending the probability model. Indeed one can create any finite number of independent copies, or even an infinite number if ${X}$ takes values in a standard Borel space (in particular, one can do this if ${X}$ is a scalar random variable). A finite or infinite sequence ${X_1, X_2, \dots}$ of random variables that are jointly independent and all have the same distribution is said to be an independent and identically distributed (or iid for short) sequence of random variables. The above proposition allows us to easily generate such sequences by extending the sample space as necessary.
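
The product construction used in the proof of Proposition 26 can be sketched concretely when both spaces are finite. In the sketch below, the three-point space `omega`, the measure `nu` on ${\{0,1\}}$ , and the helper names `extend` and `prob` are all hypothetical choices made for illustration; the general measure-theoretic case is of course not captured by finite dictionaries.

```python
from fractions import Fraction

def extend(omega, nu):
    """Product of a finite probability space omega with (R, nu).

    Both arguments are dicts mapping outcomes to probabilities. The
    extended space has outcomes (w, x) with probability omega[w] * nu[x].
    """
    return {(w, x): pw * px for w, pw in omega.items() for x, px in nu.items()}

def prob(space, event):
    """Probability of the set of outcomes satisfying the predicate `event`."""
    return sum(p for outcome, p in space.items() if event(outcome))

# Hypothetical original model: a biased three-point space.
omega = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
# Distribution nu of the new variable X on R = {0, 1}.
nu = {0: Fraction(1, 4), 1: Fraction(3, 4)}

omega_ext = extend(omega, nu)

# An old (boolean) variable, lifted through the factor map pi(w, x) = w.
Y = lambda outcome: outcome[0] in ("a", "b")
# The new variable X(w, x) = x.
X = lambda outcome: outcome[1]

# X has distribution nu, and X is independent of the lifted variable Y.
print(prob(omega_ext, lambda o: X(o) == 1))  # 3/4
joint = prob(omega_ext, lambda o: Y(o) and X(o) == 1)
print(joint == prob(omega_ext, Y) * prob(omega_ext, lambda o: X(o) == 1))  # True
```

Applying `extend` repeatedly with the same `nu` gives, in this finite setting, a finite iid sequence of copies.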

Exercise 27 Let ${\epsilon_1, \epsilon_2, \dots \in \{0,1\}}$ be random variables that are independent and identically distributed copies of the Bernoulli random variable with expectation ${1/2}$ , that is to say the ${\epsilon_1,\epsilon_2,\dots}$ are jointly independent with ${{\bf P}( \epsilon_i = 1 ) = {\bf P}(\epsilon_i = 0 ) = 1/2}$ for all ${i}$ .

(i) Show that the random variable ${\sum_{n=1}^\infty 2^{-n} \epsilon_n}$ is uniformly distributed on the unit interval ${[0,1]}$ .

(ii) Show that the random variable ${\sum_{n=1}^\infty 2 \times 3^{-n} \epsilon_n}$ has the distribution of Cantor measure (constructed for instance in Example 1.2.4 of Durrett).

Note that part (i) of this exercise provides a means to construct Lebesgue measure on the unit interval ${[0,1]}$ (although, when one unpacks the construction, it is actually not too different from the standard construction, as given for instance in this previous set of notes).
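
To get a feel for part (i) without solving it, one can look at the truncated sums ${\sum_{n=1}^N 2^{-n} \epsilon_n}$ . The sketch below enumerates all ${2^N}$ equally likely bit strings and confirms that, for ${N=4}$ , the truncated sum is uniform on the dyadic grid ${\{k/16: 0 \leq k < 16\}}$ . The truncation level and the function name `truncated_sum_distribution` are assumptions made for this illustration, and passing to the limit ${N \rightarrow \infty}$ still requires an argument.

```python
from fractions import Fraction
from itertools import product

def truncated_sum_distribution(N):
    """Distribution of sum_{n=1}^N 2^{-n} eps_n over all 2^N equally likely bit strings."""
    dist = {}
    for bits in product((0, 1), repeat=N):
        value = sum(Fraction(b, 2 ** n) for n, b in enumerate(bits, start=1))
        dist[value] = dist.get(value, 0) + Fraction(1, 2 ** N)
    return dist

dist = truncated_sum_distribution(4)
# Each dyadic rational k/16 with 0 <= k < 16 is hit by exactly one bit string.
print(sorted(dist) == [Fraction(k, 16) for k in range(16)])  # True
print(set(dist.values()) == {Fraction(1, 16)})               # True
```
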
Given two square integrable real random variables ${X, Y}$ , the covariance ${\hbox{Cov}(X,Y)}$ between the two is defined by the formula

$\displaystyle \hbox{Cov}(X,Y) := {\bf E}( (X - {\bf E} X) (Y - {\bf E} Y) ).$

The covariance is well-defined thanks to the Cauchy-Schwarz inequality, and it is not difficult to see that one has the alternate formula

$\displaystyle \hbox{Cov}(X,Y) = {\bf E}(X Y) - ({\bf E} X) ({\bf E} Y)$

for the covariance. Note that the variance is a special case of the covariance: ${\hbox{Var}(X) = \hbox{Cov}(X,X)}$ .
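
For completeness, the alternate formula can be seen by expanding the product and using linearity of expectation, treating ${{\bf E} X}$ and ${{\bf E} Y}$ as constants:

$\displaystyle {\bf E}( (X - {\bf E} X) (Y - {\bf E} Y) ) = {\bf E}(XY) - ({\bf E} X)({\bf E} Y) - ({\bf E} X)({\bf E} Y) + ({\bf E} X)({\bf E} Y) = {\bf E}(XY) - ({\bf E} X)({\bf E} Y).$
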
From construction we see that if ${X,Y}$ are independent square integrable variables, then the covariance ${\hbox{Cov}(X,Y)}$ vanishes. The converse is not true:

Exercise 28 Give an example of two square-integrable real random variables ${X,Y}$ which have vanishing covariance ${\hbox{Cov}(X,Y)}$ , but are not independent.

However, there is one key case in which the converse does hold, namely that of gaussian random vectors.

Exercise 29 A random vector ${(X_1,\dots,X_n)}$ taking values in ${{\bf R}^n}$ is said to be a gaussian random vector if there exists ${\mu = (\mu_1,\dots,\mu_n) \in {\bf R}^n}$ and an ${n \times n}$ positive definite real symmetric matrix ${\Sigma := (\sigma_{ij})_{1 \leq i,j \leq n}}$ such that

$\displaystyle \mathop{\bf P}( (X_1,\dots,X_n) \in S) = \frac{1}{(2\pi)^{n/2} (\det \Sigma)^{1/2}} \int_S e^{-\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu)}\ dx$

for all Borel sets ${S \subset {\bf R}^n}$ (where we identify elements of ${{\bf R}^n}$ with column vectors). The distribution of ${(X_1,\dots,X_n)}$ is called a multivariate normal distribution.

(i) If ${(X_1,\dots,X_n)}$ is a gaussian random vector with the indicated parameters ${\mu, \Sigma}$ , show that ${{\bf E} X_i = \mu_i}$ and ${\hbox{Cov}(X_i,X_j) = \sigma_{ij}}$ for ${1 \leq i,j \leq n}$ . In particular ${\hbox{Var}(X_i) = \sigma_{ii}}$ . Thus we see that the parameters of a gaussian random variable can be recovered from the mean and covariances.

(ii) If ${(X_1,\dots,X_n)}$ is a gaussian random vector and ${1 \leq i, j \leq n}$ , show that ${X_i}$ and ${X_j}$ are independent if and only if the covariance ${\hbox{Cov}(X_i,X_j)}$ vanishes. Furthermore, show that ${(X_1,\dots,X_n)}$ are jointly independent if and only if all the covariances ${\hbox{Cov}(X_i,X_j)}$ for ${1 \leq i < j \leq n}$ vanish. In particular, for gaussian random vectors, joint independence is equivalent to pairwise independence. (Contrast this with Exercise 23.)

(iii) Give an example of two real random variables ${X,Y}$ , each of which is gaussian, and for which ${\hbox{Cov}(X,Y)=0}$ , but such that ${X}$ and ${Y}$ are not independent. (Hint: take ${Y}$ to be the product of ${X}$ with a random sign.) Why does this not contradict (ii)?

We have discussed independence of random variables, and independence of events. It is also possible to define a notion of independence of ${\sigma}$ -algebras. More precisely, define a ${\sigma}$ -algebra of events to be a collection ${{\mathcal F}}$ of events that contains the empty event, is closed under Boolean operations (in particular, under complements ${E \mapsto \overline{E}}$ ) and under countable conjunctions and countable disjunctions. Each such ${\sigma}$ -algebra, when using a probability space model ${\Omega}$ , is modeled by a ${\sigma}$ -algebra ${{\mathcal F}_\Omega}$ of measurable sets in ${\Omega}$ , which behaves under an extension ${\pi: \Omega' \rightarrow \Omega}$ in the obvious pullback fashion:

$\displaystyle {\mathcal F}_{\Omega'} = \{ \pi^{-1}(E): E \in {\mathcal F}_\Omega \}.$

A random variable ${X}$ taking values in some range ${R}$ is said to be measurable with respect to a ${\sigma}$ -algebra of events ${{\mathcal F}}$ if the event ${X \in S}$ lies in ${{\mathcal F}}$ for every measurable subset ${S}$ of ${R}$ ; in terms of a probabilistic model ${\Omega}$ , ${X}$ is measurable with respect to ${{\mathcal F}}$ if and only if ${X_\Omega}$ is measurable with respect to ${{\mathcal F}_\Omega}$ . Note that every random variable ${X}$ generates a ${\sigma}$ -algebra ${\sigma(X)}$ of events, defined to be the collection of all events of the form ${X \in S}$ for ${S}$ a measurable subset of ${R}$ ; this is the smallest ${\sigma}$ -algebra with respect to which ${X}$ is measurable. More generally, given any collection ${(X_i)_{i \in A}}$ of random variables, one can define the ${\sigma}$ -algebra ${\sigma( (X_i)_{i \in A} )}$ to be the smallest ${\sigma}$ -algebra of events with respect to which all of the ${X_i}$ are measurable; in terms of a model ${\Omega}$ , we have

$\displaystyle \sigma( (X_i)_{i \in A} )_\Omega = \langle \{ (X_i)_\Omega^{-1}(S_i): i \in A, S_i \in {\mathcal B}_i \} \rangle$

where ${(R_i,{\mathcal B}_i)}$ is the range of ${X_i}$ . Similarly, any collection ${(E_i)_{i \in A}}$ of events generates a ${\sigma}$ -algebra of events ${\sigma(( E_i)_{i \in A})}$ , defined as the smallest ${\sigma}$ -algebra of events that contains all of the ${E_i}$ ; with respect to a model ${\Omega}$ , one has

$\displaystyle \sigma( (E_i)_{i \in A} )_\Omega = \langle \{ (E_i)_\Omega: i \in A \} \rangle.$

Definition 30 A collection ${({\mathcal F}_i)_{i \in A}}$ of ${\sigma}$ -algebras of events are said to be jointly independent if, whenever ${X_i}$ is a random variable measurable with respect to ${{\mathcal F}_i}$ for ${i \in A}$ , the tuple ${(X_i)_{i \in A}}$ is jointly independent. Equivalently, ${({\mathcal F}_i)_{i \in A}}$ is jointly independent if and only if one has

$\displaystyle {\bf P}( \bigcap_{i \in B} E_i ) = \prod_{i \in B} {\bf P}( E_i )$

whenever ${B}$ is a finite subset of ${A}$ and ${E_i \in {\mathcal F}_i}$ for ${i \in B}$ (why is this equivalent?).

Thus, for instance, ${{\mathcal F}_1}$ and ${{\mathcal F}_2}$ are independent ${\sigma}$ -algebras of events if and only if one has

$\displaystyle {\bf P}( E_1 \wedge E_2 ) = {\bf P}(E_1) {\bf P}(E_2)$

for all ${E_1 \in {\mathcal F}_1}$ and ${E_2 \in {\mathcal F}_2}$ , that is to say that all the events in ${{\mathcal F}_1}$ are independent of all the events in ${{\mathcal F}_2}$ .
The above notion generalises the notion of independence for random variables:

Exercise 31 If ${(X_i)_{i \in A}}$ are a collection of random variables, show that ${(X_i)_{i \in A}}$ are jointly independent random variables if and only if ${(\sigma(X_i))_{i \in A}}$ are jointly independent ${\sigma}$ -algebras.

Exercise 32 Let ${X_1,X_2,\dots}$ be a sequence of random variables. Show that ${(X_n)_{n=1}^\infty}$ are jointly independent if and only if ${\sigma(X_{n+1})}$ is independent of ${\sigma(X_1,\dots,X_n)}$ for all natural numbers ${n}$ .

Suppose one has a sequence ${X_1, X_2, \dots}$ of random variables (such a sequence can be referred to as a discrete stochastic process). For each natural number ${n}$ , we can define the ${\sigma}$ -algebra ${\sigma( X_i: i >n )}$ as the smallest ${\sigma}$ -algebra that makes all of the ${X_i}$ for ${i >n}$ measurable; for instance, this ${\sigma}$ -algebra contains any event that is definable in terms of measurable relations of finitely many of the ${X_{n+1}, X_{n+2}, \dots}$ , together with countable boolean operations on such events. These ${\sigma}$ -algebras are clearly decreasing in ${n}$ . We can define the tail ${\sigma}$ -algebra ${{\mathcal T}}$ to be the intersection of all these ${\sigma}$ -algebras, that is to say ${{\mathcal T}}$ consists of those events which lie in ${\sigma(X_i: i > n)}$ for every ${n}$ . For instance, if the ${X_i}$ are scalar random variables that converge almost surely to a limit ${X}$ , then we see that (after modification on a null set) ${X}$ is measurable with respect to the tail ${\sigma}$ -algebra ${{\mathcal T}}$ .
We have the remarkable Kolmogorov 0-1 law that says that the tail ${\sigma}$ -algebra of a sequence of independent random variables is essentially trivial:

Theorem 33 (Kolmogorov zero-one law) Let ${X_1,X_2,\dots}$ be a sequence of jointly independent random variables. Then every event ${E}$ in the tail ${\sigma}$ -algebra ${{\mathcal T}}$ has probability equal to either ${0}$ or ${1}$ .

As a corollary of the zero-one law, note that any real scalar tail random variable ${Y}$ will be almost surely constant (because, for each rational ${t}$ , the event ${Y \geq t}$ is either almost surely true or almost surely false). Similarly for tail random variables taking values in ${{\bf C}}$ or ${[-\infty,+\infty]}$ .

Example 34 Let ${X_1,X_2,\dots}$ be a sequence of jointly independent random variables in ${[-\infty,+\infty]}$ (not necessarily identically distributed). The random variable ${\limsup_{n \rightarrow \infty} X_n}$ is measurable with respect to the tail ${\sigma}$ -algebra, and hence must be almost surely constant, thus there exists ${c_+ \in [-\infty,\infty]}$ such that ${\limsup_{n \rightarrow \infty} X_n = c_+}$ almost surely. Similarly there exists ${c_- \in [-\infty,\infty]}$ such that ${\liminf_{n \rightarrow \infty} X_n = c_-}$ almost surely. Thus, either we have ${c_-=c_+}$ and the ${X_n}$ converge almost surely to a deterministic limit, or ${c_- \neq c_+}$ and the ${X_n}$ almost surely do not converge. What cannot happen is (for instance) that ${X_n}$ converges with probability ${1/2}$ , and diverges with probability ${1/2}$ ; the zero-one law forces the only available probabilities of tail events to be zero or one.

Proof: Since ${X_1,X_2,\dots}$ are jointly independent, the ${\sigma}$ -algebra ${\sigma( X_i: i > n )}$ is independent of ${\sigma(X_1,\dots,X_n)}$ for any ${n}$ . In particular, ${{\mathcal T}}$ is independent of ${\sigma(X_1,\dots,X_n)}$ . Since the ${\sigma}$ -algebra ${\sigma(X_i: i \geq 1)}$ is generated by the ${\sigma(X_1,\dots,X_n)}$ for ${n=1,2,3,\dots}$ , a simple application of the monotone class lemma then shows that ${{\mathcal T}}$ is also independent of ${\sigma(X_i: i \geq 1)}$ . But ${\sigma(X_i: i \geq 1)}$ contains ${{\mathcal T}}$ , hence ${{\mathcal T}}$ is independent of itself. But the only events ${E}$ that are independent of themselves have probability ${0}$ or ${1}$ , and the claim follows. $\Box$
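
To spell out the final step of the proof above: if an event ${E}$ is independent of itself, then applying the defining identity ${{\bf P}(E_1 \wedge E_2) = {\bf P}(E_1) {\bf P}(E_2)}$ with ${E_1 = E_2 = E}$ gives

$\displaystyle {\bf P}(E) = {\bf P}(E \wedge E) = {\bf P}(E)^2,$

and the only real numbers ${p}$ obeying ${p = p^2}$ are ${p=0}$ and ${p=1}$ .
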
Note that the zero-one law gives no guidance as to which of the two probabilities ${0, 1}$ actually occurs for a given tail event. This usually cannot be determined from such “soft” tools as the zero-one law; instead one often has to work with more “hard” estimates, in particular explicit inequalities for the probabilities of various events that approximate the given tail event. On the other hand, the proof technique used to prove the Kolmogorov zero-one law is quite general, and is often adapted to prove other zero-one laws in the probability literature.
The zero-one law suggests that many asymptotic statistics of random variables will almost surely have deterministic values. We will see specific examples of this in the next few notes, when we discuss the law of large numbers and the central limit theorem.

142 comments

28 September, 2023 at 1:06 am

Han Bing

Also, may I ask why is the compatibility condition (6) in the extension problem necessary if one is to find a measure ${\mu_A}$ obeying (7)?

[(7) follows from two applications of (6) since $\pi_B^{-1}( \pi_{B \rightarrow C}^{-1}(E_C) ) = \pi_C^{-1}(E_C)$ . -T.]

28 September, 2023 at 1:37 am

Han Bing

Finally, I have a question regarding the definition of product measure following Remark 5: $\mu_A( \pi_B^{-1}(E_B) ) = \mu_B(E_B)$ , for it seems that the set $\pi_B^{-1}(E_B)$ in $\Omega_A$ is not unique.

[The pre-image $\pi_B^{-1}(E_B) := \{ \omega: \pi_B(\omega) \in E_B \}$ is uniquely determined by $\pi_B$ and $E_B$ . -T.]

29 September, 2023 at 4:08 pm

Han Bing

Thank you Prof Tao, see it now.

30 September, 2023 at 6:50 pm

Anonymous

Dear Prof Tao:
How does the solution of Exercise 9 differ from that of Exercise 7? It seems that in both cases one directly shows that the collection $\mathcal{F}$ of sets on which $\mu_A$ and $\mu'_A$ agree is a monotone class containing the elementary sets, and hence is the entire $\mathcal{F}_A$ by the monotone class lemma. But then where is the compatibility condition on $\mu_B$ used?

30 September, 2023 at 6:59 pm

Terence Tao

The compatibility condition is not, strictly speaking, needed for the uniqueness aspect for Problem 8, but it is needed for existence.

30 September, 2023 at 10:02 pm

Han Bing

So we can use the same argument for both Exercise 7 and Exercise 9?

4 October, 2023 at 2:55 pm

Han Bing

Dear prof Tao:
In the proof of Lemma 12, how do we show that the map $x \mapsto (\frac{d(x,q_i)}{1+d(x,q_i)})_{i \in {\bf N}}$ is measurable?

4 October, 2023 at 3:32 pm

Han Bing

And do we assume the fact that the binary expansion of points in $[0, 1]$ is essentially unique to be known?

5 October, 2023 at 11:04 am

Terence Tao

Yes, broadly speaking I would permit anything covered in a standard undergraduate mathematics curriculum to be applied to these sorts of arguments.

5 October, 2023 at 11:03 am

Terence Tao

One can first show that each individual function $x \mapsto \frac{d(x,q_i)}{1+d(x,q_i)}$ is measurable.

6 October, 2023 at 3:52 pm

Han Bing

Thank you professor.

6 October, 2023 at 4:45 pm

Han Bing

Dear professor Tao:
In the second display of Exercise 13 (the alternate Kolmogorov extension), $\mu_\infty$ should be $\mu_{\mathbf N}$ . And do we expect to do this Exercise after reading the next section on the proof of the original extension theorem?

[Corrected, thanks. One can establish Exercise 13 using the Kolmogorov Extension Theorem (Theorem 10) as a “black box”; reading the proof may provide additional intuition but is not, strictly speaking, necessary. -T]

11 October, 2023 at 5:11 pm

Han Bing

Thank you professor. Following your hint, one first shows that $\mu_n$ obeys the compatibility condition of Theorem 10 by showing that $\displaystyle \mu_n(\pi^{-1}_{\mathbf{R}^n \to \mathbf{R}^m}(B)) = \mu_m(B)$ for any box $B$ in $\mathbf{R}^m$ and any $m \leq n$ ; is this what you mean by “black box”?

[Yes – T.]

16 October, 2023 at 3:05 am

Anonymous

Dear professor Tao:
For Exercise 14, if we let $\mathcal{G}$ be the collection of Borel sets in the unit cube that can be approximated from inside by compact $K$ and from outside by open $U$ such that $\mu_B(U \setminus K) < \varepsilon$ for any $\varepsilon > 0$ , then we want $\mathcal{G}$ to be the Borel $\sigma$ -algebra of the cube; can you give some further hint on how the monotone class lemma is being used to obtain this conclusion?

21 October, 2023 at 3:22 am

Anonymous

Hello professor Tao: Regarding the hint on part 4 of Exercise 15, I struggle to justify the fact that compact sets are elementary, given a compact $K \subset [0,1]^A$ , how can we express it as $\pi^{-1}_B(E_B)$ ?

[Not all compact sets are elementary; but the intention of the hint is to first work with sets that are both compact and elementary. I will reword the hint to reduce confusion. -T]

26 October, 2023 at 5:59 pm

Anonymous

Thank you for clarification professor Tao. Can you provide some further hint on how to establish the claim for elementary sets? If $E = \pi_B^{-1}(E_B)$ is compact, then so is $E_B$ , but I struggled to see how this helps.

30 October, 2023 at 2:28 am

Anonymous

After one shows the result for compact elementary sets, then one generalizes to all elementary sets by showing that outer measure is inner regular on elementary sets. May I ask if this is what the hint is suggesting?

1 November, 2023 at 4:57 pm

Terence Tao

That is what the second part of the hint is indicating. For the first part, you may find some of the techniques in https://terrytao.wordpress.com/2010/10/21/245a-problem-solving-strategies/ to be helpful.

3 November, 2023 at 4:33 am

Anonymous

Thank you so much for clarification, dear prof Tao. For any elementary set $E \subset [0,1]^A$ , we show that $\displaystyle \mu^*(E) = \inf_{V \supset E, \text{V open elementary}} \mu_0(V)$ by noting that LHS $\leq$ RHS by definition of outer measure, and LHS $\geq$ RHS by outer regularity of $\mu_B$ . From this can’t one derive the result for all elementary $E$ rather than $E$ compact elementary?

9 November, 2023 at 4:36 am

Anonymous

Dear professor Tao:
For Exercise 14 if one let $G$ be the collection of Borel sets of $[0,1]^B$ on which $\mu_B$ is regular, can’t we show that $G$ is a $\sigma$ -algebra and thus $G = \mathcal{B}[[0,1]^B]$ ? Why is the monotone class lemma necessary for the solution?

[Perhaps this can be done directly, but I don’t think it is completely obvious to show that $G$ is a $\sigma$ -algebra just from the definitions. -T]

9 November, 2023 at 11:27 pm

Anonymous

Set $\displaystyle G = \{E \in \mathcal{B}[[0,1]^B]: \forall \varepsilon > 0, \exists K \subset E \subset U\ \text{with}\ \mu_B(U \setminus K) < \varepsilon\}$ , $K$ compact and $U$ open. We can verify that $G$ is a Boolean algebra. Let $E_1, E_2, \ldots \in G$ , with $K_i \subset E_i \subset U_i$ be such that $\mu_B(U_i \setminus K_i) < \varepsilon / 2^i$ for all $i$ , we see that $\mu_B(\bigcup_{i=1}^\infty U_i \setminus \bigcup_{i=1}^\infty K_i) = \mu_B(U \setminus K) < \varepsilon$ . From this and continuity from below we obtain $\displaystyle \mu_B(U \setminus K) = \lim_{n \to \infty} \mu_B(U \setminus \bigcup_{i=1}^n K_i) < \varepsilon$ , so there exists $n > 0$ such that $K' = \bigcup_{i=1}^n K_i$ , $K' \subset \bigcup_{i=1}^\infty E_i \subset U$ with $\mu_B(U \setminus K') < \varepsilon$ , $K'$ compact and $U$ open. That is, $\bigcup_{i=1}^\infty E_i \in G$ . Hence $G$ is a $\sigma$ -algebra. Note that $G$ contains all open sets $U$ . For if $U \subset [0,1]^B$ is open, then clearly it can be approximated from above by itself. By expressing $U$ as the union of a sequence of closed boxes $\bigcup_{n=1}^\infty B_n$ , and taking $K_n = \bigcup_{i=1}^n B_i$ so the sequence is increasing, we see again by continuity from below that $U$ can be approximated from below by compact sets as well. As the Borel $\sigma$ -algebra is generated by the open sets, we get the desired result.

May I ask why the hint prefers the approach by the monotone class lemma? Or is this solution flawed?

25 November, 2023 at 7:58 pm

Anonymous

Dear professor Tao: As for the last part of Exercise 15, can you provide a bit of a hint on how to construct the increasing sequence $E'_n$ and how the finite additivity of the elementary measure is used to control $\mu^*(\bigcup_n E_n \Delta \bigcup_{m=1}^N E'_m)$ ?

27 November, 2023 at 1:36 pm

Anonymous

I meant to ask how should we construct the approximating sequence and how the hint on the finite additivity of elementary measure is used subsequently.

2 December, 2023 at 5:14 pm

Anonymous

Dear professor Tao: May I ask why do we also need an increasing sequence of elementary sets $E'_n$ in the last part of Exercise 15? For an increasing sequence of $E_n$ obeying the required property, and $E = \bigcup_n E_n$ , don’t we have by triangle inequality and the fact that bounded monotone sequences converge, that $\mu^*(E \Delta E'_N) < \mu^*(E \Delta E_N) + \mu^*(E_N \Delta E'_N) < 2 \varepsilon$ for large enough $N$ ?

[This argument works also. -T]

3 December, 2023 at 3:07 pm

Anonymous

Thank you so much professor Tao. Yet I realized to control the term $\mu^*(E \Delta E_N) = \mu^*(E \setminus E_N)$ , one needs to establish the superadditivity of $\mu^*$ too, can you provide a bit more explanation on your original hint on using an increasing sequence of $E'_n$ and the finite additivity of elementary measure?

4 December, 2023 at 8:00 am

Terence Tao

If one selects $E'_n$ to be elementary increasing with $\mu^*(E_n \Delta E'_n) \leq 2^{-n}$ , then $\mu^*(E'_{n+1} \backslash E'_n) = \mu_0(E'_{n+1} \backslash E'_n)$ is summable (because the bounded monotone sequence $\mu_0(E'_n)$ converges), and from this and countable subadditivity (and the triangle inequality) one can show that $\mu(E \Delta E'_n)$ goes to zero as $n \to \infty$ .

4 December, 2023 at 2:17 pm

Anonymous

Thank you so much professor Tao!

4 December, 2023 at 8:40 pm

Anonymous

For the approximating sequence $E'_n$ , and $E' = \bigcup_n E'_n$ , I use the increasing sequence $F'_n = \bigcup_{m=1}^n E'_m$ , then the triangle inequality finally gives $\mu^*(E \Delta F'_N) \leq \mu^*(E \Delta E') + \mu^*(E' \backslash F'_N) < 2\varepsilon$ for some big $N$ . However I struggled to construct the initial approximating sequence $E'_n$ to be increasing.

5 December, 2023 at 12:57 am

Terence Tao

Actually it will be easier (and already enough) to show that it is increasing up to an error that goes to zero quickly as $n \to \infty$ (e.g., up to an error of $O(2^{-n})$ ).

6 December, 2023 at 6:14 pm

Anonymous

Dear professor Tao:
A minor typo: A redundant word “from” in Example 17. Also, following the second display after Example 17, the domain for the measurable functions $F_j$ should be ${\prod_{i \in A_j} R_i}$ instead of ${\prod_{i \in A_j} X_i}$ .

11 December, 2023 at 9:42 am

qshuyu

Dear professor Tao:
May I ask why we fail to have independence in Example 17 if ${(X,Y)}$ is drawn uniformly from, say, a parallelogram?

12 December, 2023 at 7:38 am

Terence Tao

If $(X,Y)$ is drawn from a general shape (rather than a rectangle), then in general, knowing the value of $X$ will give some information about $Y$ and vice versa. This is for instance the observation that powers Berkson’s paradox.
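The failure of independence on a non-rectangular shape can be seen numerically. The following is a minimal sketch, assuming a hypothetical parallelogram $\{0 \leq y \leq 1,\ y \leq x \leq y+1\}$ and hypothetical test sets $S = [0, 0.5)$ and $T = (0.5, 1]$ (none of these choices come from the notes): on this shape, knowing $Y > 0.5$ forces $X > 0.5$, so the joint probability vanishes while the product of the marginal probabilities does not.

```python
import random

def sample_parallelogram(rng):
    """Draw (x, y) uniformly from the parallelogram {0 <= y <= 1, y <= x <= y + 1}."""
    y = rng.random()
    x = y + rng.random()  # shear of the unit square; the shear has Jacobian 1
    return x, y

def estimate(n=100_000, seed=0):
    """Monte Carlo estimates of P(X in S, Y in T) and P(X in S) * P(Y in T)."""
    rng = random.Random(seed)
    in_s = in_t = in_both = 0
    for _ in range(n):
        x, y = sample_parallelogram(rng)
        s = x < 0.5  # event {X in S}, with S = [0, 0.5) (illustrative choice)
        t = y > 0.5  # event {Y in T}, with T = (0.5, 1] (illustrative choice)
        in_s += s
        in_t += t
        in_both += s and t
    return in_both / n, (in_s / n) * (in_t / n)

joint, product = estimate()
print(joint, product)
```

Here the first number is exactly zero (since $x \geq y$ on every sample), while the second is close to $\frac{1}{8} \cdot \frac{1}{2}$, so the two sides of the independence identity disagree.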

11 December, 2023 at 11:57 am

qshuyu

Also, can you elaborate more on your comment above Exercise 18 where you state “This is the property of ${X}$ and ${Y}$ which is equivalent to independence (as can be seen by specialising to those ${F,G}$ that take values in ${\{0,1\}}$)”?

12 December, 2023 at 7:40 am

Terence Tao

If for instance $F = 1_S$ and $G = 1_T$ then ${\bf E} F(X) G(Y) = {\bf P} (X \in S \wedge Y \in T)$ and ${\bf E} F(X) {\bf E} G(Y) = {\bf P}(X \in S) {\bf P}(Y \in T)$ .
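The specialisation to indicator functions can be checked by exact enumeration on a small example. This is a sketch only, assuming a hypothetical pair of independent fair dice $X, Y$ and hypothetical sets $S, T$; the helper `expect` is an illustrative name, not something from the notes.

```python
from fractions import Fraction
from itertools import product

# Hypothetical model: two fair dice on the product space {1..6} x {1..6},
# each outcome carrying probability 1/36 (the product measure).
outcomes = list(product(range(1, 7), repeat=2))
weight = Fraction(1, len(outcomes))

S = {2, 4, 6}  # illustrative choice: X is even
T = {5, 6}     # illustrative choice: Y is at least 5

def F(x):
    """Indicator function 1_S."""
    return 1 if x in S else 0

def G(y):
    """Indicator function 1_T."""
    return 1 if y in T else 0

def expect(h):
    """Exact expectation of h(X, Y) over the finite sample space."""
    return sum(weight * h(x, y) for x, y in outcomes)

lhs = expect(lambda x, y: F(x) * G(y))                        # E F(X) G(Y)
joint = expect(lambda x, y: 1 if (x in S and y in T) else 0)  # P(X in S and Y in T)
rhs = expect(lambda x, y: F(x)) * expect(lambda x, y: G(y))   # E F(X) E G(Y)
print(lhs, joint, rhs)
```

With indicators, the identity ${\bf E} F(X) G(Y) = {\bf E} F(X) {\bf E} G(Y)$ is literally the statement ${\bf P}(X \in S \wedge Y \in T) = {\bf P}(X \in S) {\bf P}(Y \in T)$, which is what the three printed values confirm in this example.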

12 December, 2023 at 9:09 am

qshuyu

Thank you so much.

13 December, 2023 at 12:51 am

Anonymous

Dear professor Tao:
For Exercise 18, by the fact that every $\sigma$ -compact space is Lindelöf, we can find a point $y$ in the range of $X$ such that ${\bf P}(X \in B(y, \varepsilon)) = 1$ for arbitrarily small $\varepsilon$ . Yet I’m not sure how to get from this to ${\bf P}(X = y) = 1$ . Does this involve the unused condition of local compactness?

13 December, 2023 at 10:59 am

Anonymous

It seems that once we get this, continuity from above alone implies that ${\bf P}(X = y) = 1$ . May I ask if I am missing some technical detail regarding the local compactness part of the condition?

16 December, 2023 at 8:25 am

Terence Tao

One can probably relax the topological conditions here somewhat (e.g., a common relaxation is to assume one is on a Polish space, rather than a locally compact, sigma-compact space), but for these notes I did not seek the absolutely minimal regularity hypotheses required to make the assertions valid, as these sorts of technicalities are not really the focus of this probability theory course.

15 December, 2023 at 1:25 am

Anonymous

Dear professor Tao:
Concerning Exercise 21, I was trying to induct on boxes of the form $(-\infty, t_i]$ to extend the independence property to the Borel algebra ${\mathcal B}[{\bf R}^n]$ , but ran into issues in showing that this property is closed under complement and countable union. Is this a valid approach?

15 December, 2023 at 1:26 am

Anonymous

I mean boxes of the form $\prod_{i=1}^n (-\infty, t_i]$ .

16 December, 2023 at 8:38 am

Terence Tao

Try working as an intermediate step with half-open rectangular boxes (which can be expressed as finite boolean combinations of boxes), and then with half-open elementary sets (finite boolean combinations of those boxes).
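The first step of this hint, in two dimensions, is the inclusion-exclusion formula expressing a half-open box through the boxes $(-\infty,s] \times (-\infty,t]$. The sketch below assumes a hypothetical pair of discrete random variables whose joint law is a product law; the names `cdf`, `box_prob` and `interval_prob` are illustrative only.

```python
from fractions import Fraction

# Hypothetical marginals: X uniform on {0, 1, 2}, Y uniform on {0, 1}.
px = {0: Fraction(1, 3), 1: Fraction(1, 3), 2: Fraction(1, 3)}
py = {0: Fraction(1, 2), 1: Fraction(1, 2)}
# Joint law assumed to be the product law, so that the factorisation
# on the boxes (-inf, s] x (-inf, t] holds by construction.
joint = {(x, y): px[x] * py[y] for x in px for y in py}

def cdf(s, t):
    """P(X <= s, Y <= t), the probability of the box (-inf, s] x (-inf, t]."""
    return sum(p for (x, y), p in joint.items() if x <= s and y <= t)

def box_prob(a, b, c, d):
    """P(a < X <= b, c < Y <= d), written as a boolean combination of such boxes."""
    return cdf(b, d) - cdf(a, d) - cdf(b, c) + cdf(a, c)

def interval_prob(p, a, b):
    """P(a < V <= b) for a scalar variable V with law p."""
    return sum(q for v, q in p.items() if a < v <= b)

lhs = box_prob(0, 2, -1, 0)                               # P(0 < X <= 2, -1 < Y <= 0)
rhs = interval_prob(px, 0, 2) * interval_prob(py, -1, 0)  # product of the marginals
print(lhs, rhs)
```

Since `box_prob` only uses values of the joint distribution function, factorisation on the boxes $(-\infty,s] \times (-\infty,t]$ passes to half-open boxes; finite disjoint unions of these then give the half-open elementary sets.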

18 December, 2023 at 12:12 am

Anonymous

Dear professor Tao:
Thank you so much for replying. Let $\mu$ be the law of $(X_i)_{i=1}^n$ and $v$ the product measure. If $C$ is the collection of Borel subsets $S$ such that $\mu(S) = v(S)$ , then $C$ can be shown to be a monotone class. Are you suggesting we show that $C$ contains a Boolean algebra containing the half-open elementary sets, and use the monotone class theorem?

19 December, 2023 at 10:39 am

Anonymous

Dear professor Tao:
A small typo: In Exercise 22, the random variables ${\langle X, v_1 \rangle, \dots, \langle X, v_n \rangle}$ should be ${\langle X_1, v_1 \rangle, \dots, \langle X_n, v_n \rangle}$

19 December, 2023 at 2:53 pm

Anonymous

(I apologize for my misunderstanding, nothing is wrong here.)

20 December, 2023 at 4:53 pm

Anonymous

Dear professor Tao:
Can you give a bit of a hint on how linear independence can imply joint independence in Exercise 22?

23 December, 2023 at 5:26 pm

Anonymous

Without loss of generality, set $V = F^r$ with basis $\{e_1, \dots, e_r\}$ , and $|F| = q$ . The event $\bigwedge_{i=1}^n (\langle X,v_i \rangle = t_i)$ with $t_i \in F$ corresponds to the system of linear equations $Bx = t$ , where $B$ is the $n \times r$ matrix with rows $(\langle e_k,v_i \rangle)_{1 \leq k \leq r}, 1 \leq i \leq n$ , and $t = (t_1, \dots, t_n)^t$ . If $s$ is the dimension of the span of $v_1, \dots, v_n$ , then rank( $B$ ) is exactly $s$ since it has $s$ independent rows. Hence the dimension of the solution space equals $r - s$ , and ${\bf P}(\bigwedge_{i=1}^n (\langle X,v_i \rangle = t_i)) = q^{r-s}/q^r = 1/q^s$ . Similarly ${\bf P}(\langle X,v_i \rangle = t_i) = 1/q$ , so by Exercise 20 the random variables are jointly independent iff $s = n$ , i.e. iff the vectors are linearly independent. I don’t see how the non-degeneracy assumption is used here. Is this argument flawed?

23 December, 2023 at 7:50 pm

Anonymous

I think I see where we need the non-degeneracy assumption. It ensures that we don’t get $0$ as row vectors of $B$ (since none of the $v_i = 0$ ), only then can we safely deduce that $B$ has $s$ linearly independent rows. Is this correct, professor Tao?
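The claimed equivalence can be sanity-checked by brute force on small cases. This is a minimal sketch under stated assumptions: $F = {\bf Z}/p{\bf Z}$ for a prime $p$, $V = F^r$, $X$ uniform on $V$, and the standard dot product as the non-degenerate bilinear form. The function name `jointly_independent` and the test vectors are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def jointly_independent(vectors, p, r):
    """Check whether <X, v_1>, ..., <X, v_n> are jointly independent for X
    uniform on (Z/pZ)^r, using the dot product mod p as the bilinear form."""
    n_points = p ** r
    joint = Counter()
    for x in product(range(p), repeat=r):
        values = tuple(sum(a * b for a, b in zip(x, v)) % p for v in vectors)
        joint[values] += 1
    # Marginal counts of each <X, v_i>.
    marginals = []
    for i in range(len(vectors)):
        m = Counter()
        for values, count in joint.items():
            m[values[i]] += count
        marginals.append(m)
    # Compare the joint law with the product of the marginals at every t.
    for t in product(range(p), repeat=len(vectors)):
        lhs = Fraction(joint[t], n_points)
        rhs = Fraction(1)
        for i, ti in enumerate(t):
            rhs *= Fraction(marginals[i][ti], n_points)
        if lhs != rhs:
            return False
    return True

indep = jointly_independent([(1, 0), (0, 1)], p=3, r=2)        # linearly independent
parallel = jointly_independent([(1, 0), (2, 0)], p=3, r=2)     # v_2 = 2 v_1
dependent = jointly_independent([(1, 1, 0), (0, 1, 1), (1, 2, 1)], p=3, r=3)  # v_3 = v_1 + v_2
print(indep, parallel, dependent)
```

In these examples joint independence holds exactly for the linearly independent family and fails for the two dependent ones, consistent with the rank computation in the comment above.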

29 December, 2023 at 11:02 am

Anonymous

Dear professor Tao:
Can you provide a little more direction on Exercise 24 (orthogonality and independence)?

[First try the model case where $v_1,\dots,v_m$ are the standard coordinate bases $e_1,\dots,e_m$ . You can set $m=n=2$ if you wish to make things a little more concrete. -T]

2 January, 2024 at 1:09 am

Anonymous

Thank you for replying professor. I find it hard though to use the fact that $X$ is a Gaussian random variable. Are we supposed to show this by definition?

2 January, 2024 at 12:53 pm

Terence Tao

Yes; for instance in the model case $n=2, v_1 = e_1, v_2 = e_2$ , one can compute the distributions of $X \cdot v_1, X \cdot v_2$ by direct computation (performing one-dimensional Gaussian integrals), and show that the product of these distributions agrees with the distribution of the joint random variable $X = (X \cdot v_1, X \cdot v_2)$ .

3 January, 2024 at 8:45 am

Anonymous

Let $f_i(\omega) = X(\omega) \cdot v_i$ for all $1 \leq i \leq m$ , then $\displaystyle \mu_{(X \cdot v_i)_{1 \leq i \leq m}} (\prod_{i=1}^m S_i) = \int_{\bigcap_{i=1}^m f_i^{-1}(S_i)} \frac{1}{(2\pi)^{n/2}} e^{-|x|^2/2}\ dx$ . By direct computation, are you suggesting one show that the integral splits over the intersection if and only if we have orthogonality, professor Tao?

7 January, 2024 at 10:39 pm

Anonymous

Dear professor Tao: For this Exercise is it required to use the rotational invariance of the standard Gaussian distribution, and the linear change of variables formula?

[Yes – T.]

5 January, 2024 at 1:04 am

Anonymous

Dear professor Tao:
I believe we need the scalar r.v $X$ and $Y$ to be independent in the display tagged $(8)$ , because later it is said that “we caution however that the converse is not true…”

[Corrected, thanks – T.]

11 January, 2024 at 9:30 pm

Anonymous

Dear professor Tao;
In the proof of proposition 26, ${X}$ should be modeled by ${X_{\Omega'}(\omega,x) := x}$ .

22 January, 2024 at 9:34 pm

Anonymous

Dear professor Tao:
I have a conceptual misunderstanding of Exercise 29. Let $D = Q^{-1} \Sigma Q$ be the eigenvalue decomposition of $\Sigma$ , where $Q$ is orthogonal, $D$ is diagonal with its $j^{th}$ diagonal entry $\lambda_j$ the eigenvalue of $\Sigma$ corresponding to the $j^{th}$ column of $Q$ . Then by consecutive linear change of variables, one seems to get $\displaystyle {\bf P}(X \in S) = c\int_{S} e^{-\frac{1}{2} (w-\mu)^TD^{-1}(w-\mu)}\ dw$ , where $c = \frac{1}{(2\pi)^{n/2}(\det{\Sigma})^{1/2}}$ . In particular, the joint distribution splits into product of the marginal distribution of the components, and we get independence automatically, which is obviously incorrect, where did it possibly go wrong?

23 January, 2024 at 12:37 pm

Anonymous

Dear professor Tao:
Actually I see the blunder in the derivation now. But may I still ask how should one get the marginal distribution of each component $X_i$ to calculate ${\bf E}(X_i)$ when the joint pdf does not readily split into product of the marginal pdfs?

26 January, 2024 at 11:33 am

Anonymous

Dear professor Tao:
Regarding Exercise 29, if $\Sigma = QDQ^T$ is the eigenvalue decomposition of $\Sigma$ , and $Y \sim N(0_{{\bf R}^n}, D)$ , then $QY + \mu$ is a copy of $X$ and the fact that ${\bf E}X_i = \mu_i$ follows by linearity.

This approach gives no information, however, on the marginal distributions of the components $X_i$ . Is there a way to calculate the marginal distributions without using the linear transformation theorem for the multivariate normal distribution, whose proof relies in turn on generating functions or characteristic functions, which have not yet been fully developed at this point in the notes?

26 January, 2024 at 8:08 pm

Anonymous

Finally, by change of variables $\hbox{Cov}(X_i, X_j) = \int_{{\bf R}^n} (x_i - \mu_i)(x_j - \mu_j)\ d\mu_X(x)$ , but I have some difficulty showing that this evaluates to $\sigma_{ij}$ . Can you provide some hint on this, professor Tao?
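One possible route to the covariance, using the representation $X = QY + \mu$ from the earlier comment rather than the integral directly, is bilinearity: since the components $Y_k$ are independent with variances $D_{kk}$, one gets $\hbox{Cov}(X_i, X_j) = \sum_k Q_{ik} Q_{jk} D_{kk} = (QDQ^T)_{ij}$. The following is a minimal numerical sketch of that identity, assuming a hypothetical $2 \times 2$ matrix $\Sigma$ whose eigendecomposition is explicit; it does not verify the one-dimensional Gaussian variance computation itself.

```python
import math

# Hypothetical covariance matrix, chosen so the eigendecomposition is explicit.
Sigma = [[2.0, 1.0], [1.0, 2.0]]
c = 1.0 / math.sqrt(2.0)
Q = [[c, -c], [c, c]]  # orthogonal; columns are unit eigenvectors of Sigma
D = [3.0, 1.0]         # matching eigenvalues, i.e. the variances of Y_1, Y_2

def cov_of_QY(i, j):
    """Cov(X_i, X_j) for X = Q Y + mu, using bilinearity of covariance and
    Cov(Y_k, Y_l) = D[k] if k == l else 0 (independent components)."""
    return sum(Q[i][k] * Q[j][k] * D[k] for k in range(len(D)))

recovered = [[cov_of_QY(i, j) for j in range(2)] for i in range(2)]
print(recovered)
```

Up to floating-point rounding, `recovered` agrees entrywise with `Sigma`, which is the statement $\hbox{Cov}(X_i, X_j) = \sigma_{ij}$ in this example.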

27 January, 2024 at 11:44 pm

Anonymous

Dear professor Tao: In the second part of Exercise 29, we should have the covariances vanish for all off-diagonal entries rather than just the entries above the diagonal.

30 January, 2024 at 9:10 pm

Anonymous

Dear professor Tao:
In the proof of the Kolmogorov zero-one law, the monotone class lemma is used to show that $\mathcal{T}$ is independent of $\sigma(X_i: i \geq 1)$ . May I ask what is the larger class, containing the generators $\sigma(X_1, \dots, X_n)$ and independent of $\mathcal{T}$ , that allows us to use the said lemma? Why can’t we use induction in this case?

[One applies the monotone class lemma to the collection of events that are independent of $\mathcal{T}$ , which is easily seen to be a monotone class containing each of the $\sigma(X_1, \dots, X_n)$ , and hence $\sigma(X_i: i \geq 1)$ . Note that the latter $\sigma$ -algebra is not simply the union of the preceding ones, but is instead the $\sigma$ -algebra (or monotone class) generated by that union. -T]

142 comments
