Thus far, we have only focused on measure and integration theory in the context of Euclidean spaces {{\bf R}^d}. Now, we will work in a more abstract and general setting, in which the Euclidean space {{\bf R}^d} is replaced by a more general space {X}.

It turns out that in order to properly define measure and integration on a general space {X}, it is not enough to just specify the set {X}. One also needs to specify two additional pieces of data:

  1. A collection {{\mathcal B}} of subsets of {X} that one is allowed to measure; and
  2. The measure {\mu(E) \in [0,+\infty]} one assigns to each measurable set {E \in {\mathcal B}}.

For instance, Lebesgue measure theory covers the case when {X} is a Euclidean space {{\bf R}^d}, {{\mathcal B}} is the collection {{\mathcal B} = {\mathcal L}[{\bf R}^d]} of all Lebesgue measurable subsets of {{\bf R}^d}, and {\mu(E)} is the Lebesgue measure {\mu(E)=m(E)} of {E}.

The collection {{\mathcal B}} has to obey a number of axioms (e.g. being closed with respect to countable unions) that make it a {\sigma}-algebra, which is a stronger variant of the more well-known concept of a Boolean algebra. Similarly, the measure {\mu} has to obey a number of axioms (most notably, a countable additivity axiom) in order to obtain a measure and integration theory comparable to the Lebesgue theory on Euclidean spaces. When all these axioms are satisfied, the triple {(X, {\mathcal B}, \mu)} is known as a measure space. These play much the same role in abstract measure theory that metric spaces or topological spaces play in abstract point-set topology, or that vector spaces play in abstract linear algebra.

On any measure space, one can set up the unsigned and absolutely convergent integrals in almost exactly the same way as was done in the previous notes for the Lebesgue integral on Euclidean spaces, although the approximation theorems are largely unavailable at this level of generality due to the lack of such concepts as “elementary set” or “continuous function” for an abstract measure space. On the other hand, one does have the fundamental convergence theorems for the subject, namely Fatou’s lemma, the monotone convergence theorem and the dominated convergence theorem, and we present these results here.

One question that will not be addressed much in this current set of notes is how one actually constructs interesting examples of measures. We will discuss this issue more in later notes (although one of the most powerful tools for such constructions, namely the Riesz representation theorem, will not be covered until 245B).

— 1. Boolean algebras —

We begin by recalling the concept of a Boolean algebra.

Definition 1 (Boolean algebras) Let {X} be a set. A (concrete) Boolean algebra on {X} is a collection {{\mathcal B}} of subsets of {X} which obeys the following properties:

  • (Empty set) {\emptyset \in {\mathcal B}}.
  • (Complement) If {E \in {\mathcal B}}, then the complement {E^c := X \backslash E} also lies in {{\mathcal B}}.
  • (Finite unions) If {E, F \in {\mathcal B}}, then {E \cup F \in {\mathcal B}}.

We sometimes say that {E} is {{\mathcal B}}-measurable, or measurable with respect to {{\mathcal B}}, if {E \in {\mathcal B}}.

Given two Boolean algebras {{\mathcal B}, {\mathcal B}'} on {X}, we say that {{\mathcal B}'} is finer than, or a refinement of, {{\mathcal B}}, or that {{\mathcal B}} is coarser than, a sub-algebra of, or a coarsening of {{\mathcal B}'}, if {{\mathcal B} \subset {\mathcal B}'}.

We have chosen a “minimalist” definition of a Boolean algebra, in which one is only assumed to be closed under two of the basic Boolean operations, namely complement and finite union. However, by using the laws of Boolean algebra (such as de Morgan’s laws), it is easy to see that a Boolean algebra is also closed under other Boolean algebra operations such as intersection {E \cap F}, set difference {E \backslash F}, and symmetric difference {E \Delta F}. So one could have placed these additional closure properties inside the definition of a Boolean algebra without any loss of generality. However, when we are verifying that a given collection {{\mathcal B}} of sets is indeed a Boolean algebra, it is convenient to have as minimal a set of axioms as possible. (This point is discussed further in this Math Overflow comment of mine.)
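On a finite set, the axioms of Definition 1 can be checked by brute force. The following is a minimal Python sketch, not part of the original notes; the helper names `is_boolean_algebra` and `closed_under_derived_ops` are hypothetical. It tests the three minimalist axioms and then confirms, on one small example, that closure under intersection, set difference and symmetric difference comes for free.

```python
def is_boolean_algebra(X, B):
    """Check the three axioms of Definition 1 for a collection B of subsets of a finite set X."""
    X = frozenset(X)
    B = {frozenset(E) for E in B}
    if frozenset() not in B:                      # (Empty set)
        return False
    if any(X - E not in B for E in B):            # (Complement)
        return False
    return all(E | F in B for E in B for F in B)  # (Finite unions)

def closed_under_derived_ops(B):
    """Check closure under intersection, set difference and symmetric difference."""
    B = {frozenset(E) for E in B}
    return all(
        (E & F in B) and (E - F in B) and (E ^ F in B)
        for E in B for F in B
    )

X = {1, 2, 3, 4}
B = [set(), {1, 2}, {3, 4}, {1, 2, 3, 4}]   # an algebra with two atoms {1,2} and {3,4}
not_B = [set(), {1}, {1, 2, 3, 4}]          # not an algebra: the complement {2,3,4} is missing

print(is_boolean_algebra(X, B), closed_under_derived_ops(B))
print(is_boolean_algebra(X, not_B))
```

A check of this kind only inspects one finite example, of course; the general statement is the de Morgan argument described above.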

Remark 1 One can also consider abstract Boolean algebras {{\mathcal B}}, which do not necessarily live in an ambient domain {X}, but for which one has a collection of abstract Boolean operations such as meet {\wedge} and join {\vee} instead of the concrete operations of intersection {\cap} and union {\cup}. We will not take this abstract perspective here, but see this blog post of mine for some further discussion of the relationship between concrete and abstract Boolean algebras, which is codified by Stone’s theorem.

Example 1 (Trivial and discrete algebra) Given any set {X}, the coarsest Boolean algebra is the trivial algebra {\{ \emptyset, X \}}, in which the only measurable sets are the empty set and the whole set. The finest Boolean algebra is the discrete algebra {2^X := \{ E: E \subset X\}}, in which every set is measurable. All other Boolean algebras are intermediate between these two extremes: finer than the trivial algebra, but coarser than the discrete one.

Exercise 1 (Elementary algebra) Let {\overline{{\mathcal E}[{\bf R}^d]}} be the collection of those sets {E \subset {\bf R}^d} that are either elementary sets, or co-elementary sets (i.e. the complement of an elementary set). Show that {\overline{{\mathcal E}[{\bf R}^d]}} is a Boolean algebra. We will call this algebra the elementary Boolean algebra of {{\bf R}^d}.

Example 2 (Jordan algebra) Let {\overline{{\mathcal J}[{\bf R}^d]}} be the collection of subsets of {{\bf R}^d} that are either Jordan measurable or co-Jordan measurable (i.e. the complement of a Jordan measurable set). Then {\overline{{\mathcal J}[{\bf R}^d]}} is a Boolean algebra that is finer than the elementary algebra. We refer to this algebra as the Jordan algebra on {{\bf R}^d} (but caution that there is a completely different concept of a Jordan algebra in mathematics).

Example 3 (Lebesgue algebra) Let {{\mathcal L}[{\bf R}^d]} be the collection of Lebesgue measurable subsets of {{\bf R}^d}. Then {{\mathcal L}[{\bf R}^d]} is a Boolean algebra that is finer than the Jordan algebra; we refer to this as the Lebesgue algebra on {{\bf R}^d}.

Example 4 (Null algebra) Let {{\mathcal N}({\bf R}^d)} be the collection of subsets of {{\bf R}^d} that are either Lebesgue null sets or Lebesgue co-null sets (the complement of null sets). Then {{\mathcal N}({\bf R}^d)} is a Boolean algebra that is coarser than the Lebesgue algebra; we refer to it as the null algebra on {{\bf R}^d}.

Exercise 2 (Restriction) Let {{\mathcal B}} be a Boolean algebra on a set {X}, and let {Y} be a subset of {X} (not necessarily {{\mathcal B}}-measurable). Show that the restriction {{\mathcal B}\downharpoonright_Y := \{ E \cap Y: E \in {\mathcal B} \}} of {{\mathcal B}} to {Y} is a Boolean algebra on {Y}. If {Y} is {{\mathcal B}}-measurable, show that

\displaystyle  {\mathcal B}\downharpoonright_Y = {\mathcal B} \cap 2^Y = \{ E \subset Y: E \in {\mathcal B} \}.

Example 5 (Atomic algebra) Let {X} be partitioned into a union {X = \bigcup_{\alpha \in I} A_\alpha} of disjoint sets {A_\alpha}, which we refer to as atoms. Then this partition generates a Boolean algebra {{\mathcal A}( (A_\alpha)_{\alpha \in I} )}, defined as the collection of all the sets {E} of the form {E = \bigcup_{\alpha \in J} A_\alpha} for some {J \subset I}, i.e. {{\mathcal A}( (A_\alpha)_{\alpha \in I} )} is the collection of all sets that can be represented as the union of some (possibly empty) collection of atoms. This is easily verified to be a Boolean algebra, and we refer to it as the atomic algebra with atoms {(A_\alpha)_{\alpha \in I}}. The trivial algebra corresponds to the trivial partition {X = X} into a single atom; at the other extreme, the discrete algebra corresponds to the discrete partition {X = \bigcup_{x \in X} \{x\}} into singleton atoms. More generally, note that finer (resp. coarser) partitions lead to finer (resp. coarser) atomic algebras. In this definition, we permit some of the atoms in the partition to be empty; but it is clear that empty atoms have no impact on the final atomic algebra, and so without loss of generality one can delete all empty atoms and assume that all atoms are non-empty if one wishes.
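For a finite partition, the atomic algebra can be enumerated directly. The sketch below is an illustration only (the function name `atomic_algebra` is hypothetical): it forms the union over every index set {J}, including the empty one, and compares a coarse partition with a finer one.

```python
from itertools import chain, combinations

def atomic_algebra(atoms):
    """Return all unions of sub-collections of the given disjoint atoms (a finite partition)."""
    atoms = [frozenset(A) for A in atoms]
    index_sets = chain.from_iterable(
        combinations(range(len(atoms)), r) for r in range(len(atoms) + 1)
    )
    # The empty index set J contributes the empty union, i.e. the empty set.
    return {frozenset().union(*(atoms[j] for j in J)) for J in index_sets}

coarse = atomic_algebra([{1, 2}, {3, 4}])    # 2 atoms
fine = atomic_algebra([{1}, {2}, {3, 4}])    # 3 atoms, a finer partition of the same set

print(len(coarse), len(fine))   # 2**2 and 2**3 sets respectively
print(coarse <= fine)           # the finer partition gives a finer algebra
```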

Example 6 (Dyadic algebras) Let {n} be an integer. The dyadic algebra {{\mathcal D}_n({\bf R}^d)} at scale {2^{-n}} in {{\bf R}^d} is defined to be the atomic algebra generated by the half-open dyadic cubes

\displaystyle  [\frac{i_1}{2^n}, \frac{i_1+1}{2^n}) \times \ldots \times [\frac{i_d}{2^n}, \frac{i_d+1}{2^n})

of side length {2^{-n}}, where {i_1,\ldots,i_d} range over the integers (see Exercise 14 of the prologue). These are Boolean algebras which are increasing in {n}: {{\mathcal D}_{n+1} \supset {\mathcal D}_{n}}. Draw a diagram to indicate how these algebras sit in relation to the elementary, Jordan, Lebesgue, null, discrete, and trivial algebras.

Remark 2 The dyadic algebras are analogous to the finite resolution one has on modern computer monitors, which subdivide space into square pixels. A low resolution monitor (in which each pixel has a large size) can only resolve a very small set of “blocky” images, as opposed to the larger class of images that can be resolved by a finer resolution monitor.
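The nesting {{\mathcal D}_{n+1} \supset {\mathcal D}_{n}} comes from the fact that each dyadic cube splits into dyadic cubes of half the side length. The following sketch illustrates this in a simplified setting that is an assumption of the example, not of the notes: dimension {d=1}, only the atoms inside {[0,1)}, and only scales {n \geq 0}. Intervals are stored as pairs of exact fractions, and the function names are hypothetical.

```python
from fractions import Fraction

def dyadic_atoms(n):
    """Atoms [i/2^n, (i+1)/2^n) of the scale-2^{-n} dyadic algebra inside [0, 1), for n >= 0."""
    return [(Fraction(i, 2 ** n), Fraction(i + 1, 2 ** n)) for i in range(2 ** n)]

def splits_into_finer_atoms(n):
    """Check that every atom at scale n is the union of two adjacent atoms at scale n + 1."""
    finer = set(dyadic_atoms(n + 1))
    for left, right in dyadic_atoms(n):
        mid = (left + right) / 2
        if (left, mid) not in finer or (mid, right) not in finer:
            return False
    return True

print(all(splits_into_finer_atoms(n) for n in range(6)))
```

Since every atom at scale {n} is a union of atoms at scale {n+1}, every union of scale-{n} atoms is also a union of scale-{n+1} atoms, which is the inclusion of algebras.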

Exercise 3 Show that the non-empty atoms of an atomic algebra are determined up to relabeling. More precisely, show that if {X = \bigcup_{\alpha \in I} A_\alpha = \bigcup_{\alpha' \in I'} A'_{\alpha'}} are two partitions of {X} into non-empty atoms {A_\alpha}, {A'_{\alpha'}}, then {{\mathcal A}( (A_\alpha)_{\alpha \in I} ) = {\mathcal A}( (A'_{\alpha'})_{\alpha' \in I'} )} if and only if there exists a bijection {\phi: I \rightarrow I'} such that {A'_{\phi(\alpha)} = A_\alpha} for all {\alpha \in I}.

While many Boolean algebras are atomic, many are not, as the following two exercises indicate.

Exercise 4 Show that every finite Boolean algebra is an atomic algebra. (A Boolean algebra {{\mathcal B}} is finite if its cardinality is finite, i.e. there are only finitely many measurable sets.) Conclude that every finite Boolean algebra has a cardinality of the form {2^{n}} for some natural number {n}. From this exercise and Exercise 3 we see that there is a one-to-one correspondence between finite Boolean algebras on {X} and finite partitions of {X} into non-empty sets (up to relabeling).

Exercise 5 Show that the elementary, Jordan, Lebesgue, and null algebras are not atomic algebras. (Hint: argue by contradiction. If these algebras were atomic, what must the atoms be?)

Now we describe some further ways to generate Boolean algebras.

Exercise 6 (Intersection of algebras) Let {({\mathcal B}_\alpha)_{\alpha \in I}} be a family of Boolean algebras on a set {X}, indexed by a (possibly infinite or uncountable) label set {I}. Show that the intersection {\bigwedge_{\alpha \in I} {\mathcal B}_\alpha := \bigcap_{\alpha \in I} {\mathcal B}_\alpha} of these algebras is still a Boolean algebra, and is the finest Boolean algebra that is coarser than all of the {{\mathcal B}_\alpha}. (If {I} is empty, we adopt the convention that {\bigwedge_{\alpha \in I} {\mathcal B}_\alpha} is the discrete algebra.)

Definition 2 (Generation of algebras) Let {{\mathcal F}} be any family of sets in {X}. We define {\langle {\mathcal F} \rangle_{\hbox{bool}}} to be the intersection of all the Boolean algebras that contain {{\mathcal F}}, which is again a Boolean algebra by Exercise 6. Equivalently, {\langle {\mathcal F} \rangle_{\hbox{bool}}} is the coarsest Boolean algebra that contains {{\mathcal F}}. We say that {\langle {\mathcal F} \rangle_{\hbox{bool}}} is the Boolean algebra generated by {{\mathcal F}}.

Example 7 {{\mathcal F}} is a Boolean algebra if and only if {\langle {\mathcal F} \rangle_{\hbox{bool}} = {\mathcal F}}; thus each Boolean algebra is generated by itself.

Exercise 7 Show that the elementary algebra {\overline{{\mathcal E}[{\bf R}^d]}} is generated by the collection of boxes in {{\bf R}^d}.

Exercise 8 Let {n} be a natural number. Show that if {{\mathcal F}} is a finite collection of {n} sets, then {\langle {\mathcal F} \rangle_{\hbox{bool}}} is a finite Boolean algebra of cardinality at most {2^{2^n}} (in particular, finite families of sets generate finite algebras). Give an example to show that this bound is best possible. (Hint: for the latter, it may be convenient to use a discrete ambient space such as the discrete cube {X = \{0,1\}^n}.)

The Boolean algebra {\langle {\mathcal F} \rangle_{\hbox{bool}}} can be described explicitly in terms of {{\mathcal F}} as follows:

Exercise 9 (Recursive description of a generated Boolean algebra) Let {{\mathcal F}} be a collection of sets in a set {X}. Define the sets {{\mathcal F}_0, {\mathcal F}_1, {\mathcal F}_2, \ldots} recursively as follows:

  • {{\mathcal F}_0 := {\mathcal F}}.
  • For each {n \geq 1}, we define {{\mathcal F}_n} to be the collection of all sets that are either the union of a finite number of sets in {{\mathcal F}_{n-1}} (including the empty union {\emptyset}), or the complement of such a union.

Show that {\langle {\mathcal F} \rangle_{\hbox{bool}} = \bigcup_{n=0}^\infty {\mathcal F}_n}.
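On a finite ambient set, the recursion of Exercise 9 stabilises after finitely many steps and can be run directly. The sketch below is a hedged illustration, not a proof of the exercise: the name `generate_boolean_algebra` is hypothetical, and each step takes only pairwise unions (plus the empty union), so the intermediate stages need not coincide with the {{\mathcal F}_n} above, although the collection at which the loop stops is closed under complements and unions and contains {{\mathcal F}}. The example family is the pair of coordinate sets in the discrete cube {\{0,1\}^2} suggested by the hint to Exercise 8.

```python
from itertools import product

def generate_boolean_algebra(X, F):
    """Repeatedly add the empty set, pairwise unions and complements until nothing new appears."""
    X = frozenset(X)
    current = {frozenset(E) for E in F}
    while True:
        unions = {frozenset()} | current | {E | G for E in current for G in current}
        nxt = unions | {X - U for U in unions}
        if nxt == current:          # fixed point: closed under union and complement
            return current
        current = nxt

n = 2
X = set(product([0, 1], repeat=n))                   # the discrete cube {0,1}^n
F = [{x for x in X if x[i] == 1} for i in range(n)]  # the n coordinate sets
algebra = generate_boolean_algebra(X, F)

print(len(algebra) == 2 ** (2 ** n))
```

The loop terminates because the collection only grows and is bounded by the {2^{|X|}} subsets of {X}.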

— 2. {\sigma}-algebras and measurable spaces —

In order to obtain a measure and integration theory that can cope well with limits, the finite union axiom of a Boolean algebra is insufficient, and must be improved to a countable union axiom:

Definition 3 (Sigma algebras) Let {X} be a set. A {\sigma}-algebra on {X} is a collection {{\mathcal B}} of subsets of {X} which obeys the following properties:

  • (Empty set) {\emptyset \in {\mathcal B}}.
  • (Complement) If {E \in {\mathcal B}}, then the complement {E^c := X \backslash E} also lies in {{\mathcal B}}.
  • (Countable unions) If {E_1, E_2, \ldots \in {\mathcal B}}, then {\bigcup_{n=1}^\infty E_n \in {\mathcal B}}.

We refer to the pair {(X, {\mathcal B})} of a set {X} together with a {\sigma}-algebra on that set as a measurable space.

Remark 3 The prefix {\sigma} usually denotes “countable union”. See also the concepts of a {\sigma}-compact topological space, or a {\sigma}-finite measure space, or {F_\sigma} set for other instances of this prefix.

From de Morgan’s law (which is just as valid for infinite unions and intersections as it is for finite ones), we see that {\sigma}-algebras are closed under countable intersections as well as countable unions.

By padding a finite union into a countable union by using the empty set, we see that every {\sigma}-algebra is automatically a Boolean algebra. Thus, we automatically inherit the notion of being measurable with respect to a {\sigma}-algebra, or of one {\sigma}-algebra being coarser or finer than another.

Exercise 10 Show that all atomic algebras are {\sigma}-algebras. In particular, the discrete algebra and trivial algebra are {\sigma}-algebras, as are the finite algebras and the dyadic algebras on Euclidean spaces.

Exercise 11 Show that the Lebesgue and null algebras are {\sigma}-algebras, but the elementary and Jordan algebras are not.

Exercise 12 Show that any restriction {{\mathcal B}\downharpoonright_Y} of a {\sigma}-algebra {{\mathcal B}} to a subspace {Y} of {X} (as defined in Exercise 2) is again a {\sigma}-algebra on the subspace {Y}.

There is an exact analogue of Exercise 6:

Exercise 13 (Intersection of {\sigma}-algebras) Show that the intersection {\bigwedge_{\alpha \in I} {\mathcal B}_\alpha := \bigcap_{\alpha \in I} {\mathcal B}_\alpha} of an arbitrary (and possibly infinite or uncountable) number of {\sigma}-algebras {{\mathcal B}_\alpha} is again a {\sigma}-algebra, and is the finest {\sigma}-algebra that is coarser than all of the {{\mathcal B}_\alpha}.

Similarly, we have a notion of generation:

Definition 4 (Generation of {\sigma}-algebras) Let {{\mathcal F}} be any family of sets in {X}. We define {\langle {\mathcal F} \rangle} to be the intersection of all the {\sigma}-algebras that contain {{\mathcal F}}, which is again a {\sigma}-algebra by Exercise 13. Equivalently, {\langle {\mathcal F} \rangle} is the coarsest {\sigma}-algebra that contains {{\mathcal F}}. We say that {\langle {\mathcal F} \rangle} is the {\sigma}-algebra generated by {{\mathcal F}}.

Since every {\sigma}-algebra is a Boolean algebra, we have the trivial inclusion

\displaystyle  \langle {\mathcal F} \rangle_{\hbox{bool}} \subset \langle {\mathcal F} \rangle.

However, equality need not hold; it holds if and only if {\langle {\mathcal F} \rangle_{\hbox{bool}}} is a {\sigma}-algebra. For instance, if {{\mathcal F}} is the collection of all boxes in {{\bf R}^d}, then {\langle {\mathcal F} \rangle_{\hbox{bool}}} is the elementary algebra (Exercise 7), but {\langle {\mathcal F} \rangle} cannot equal this algebra, as it is not a {\sigma}-algebra.

Remark 4 From the definitions, it is clear that we have the following principle, somewhat analogous to the principle of mathematical induction: if {{\mathcal F}} is a family of sets in {X}, and {P(E)} is a property of sets {E \subset X} which obeys the following axioms:

  • {P(\emptyset)} is true.
  • {P(E)} is true for all {E \in {\mathcal F}}.
  • If {P(E)} is true for some {E \subset X}, then {P( X \backslash E )} is true also.
  • If {E_1,E_2,\ldots \subset X} are such that {P(E_n)} is true for all {n}, then {P(\bigcup_{n=1}^\infty E_n)} is true also.

Then one can conclude that {P(E)} is true for all {E \in \langle {\mathcal F} \rangle}. Indeed, the set of all {E} for which {P(E)} holds is a {\sigma}-algebra that contains {{\mathcal F}}, whence the claim. This principle is particularly useful for establishing properties of Borel measurable sets (see below).

We now turn to an important example of a {\sigma}-algebra:

Definition 5 (Borel {\sigma}-algebra) Let {X} be a metric space, or more generally a topological space. The Borel {\sigma}-algebra {{\mathcal B}[X]} of {X} is defined to be the {\sigma}-algebra generated by the open subsets of {X}. Elements of {{\mathcal B}[X]} will be called Borel measurable.

Thus, for instance, the Borel {\sigma}-algebra contains the open sets, the closed sets (which are complements of open sets), the countable unions of closed sets (i.e. {F_\sigma} sets), the countable intersections of open sets (i.e. {G_\delta} sets), the countable intersections of {F_\sigma} sets, and so forth.

In {{\bf R}^d}, every open set is Lebesgue measurable, and so we see that the Borel {\sigma}-algebra is coarser than the Lebesgue {\sigma}-algebra. We will shortly see, though, that the two {\sigma}-algebras are not equal.

We defined the Borel {\sigma}-algebra to be generated by the open sets. However, it is also generated by several other collections of sets:

Exercise 14 Show that the Borel {\sigma}-algebra {{\mathcal B}[{\bf R}^d]} of a Euclidean space is generated by any of the following collections of sets:

  1. The open subsets of {{\bf R}^d}.
  2. The closed subsets of {{\bf R}^d}.
  3. The compact subsets of {{\bf R}^d}.
  4. The open balls of {{\bf R}^d}.
  5. The boxes in {{\bf R}^d}.
  6. The elementary sets in {{\bf R}^d}.

(Hint: To show that two families {{\mathcal F}, {\mathcal F}'} of sets generate the same {\sigma}-algebra, it suffices to show that every {\sigma}-algebra that contains {{\mathcal F}}, contains {{\mathcal F}'} also, and conversely.)

There is an analogue of Exercise 9, which illustrates the extent to which a generated {\sigma}-algebra is “larger” than the analogous generated Boolean algebra:

Exercise 15 (Recursive description of a generated {\sigma}-algebra) (This exercise requires familiarity with the theory of ordinals, which is reviewed here. Recall that we are assuming the axiom of choice throughout this course.) Let {{\mathcal F}} be a collection of sets in a set {X}, and let {\omega_1} be the first uncountable ordinal. Define the sets {{\mathcal F}_\alpha} for every countable ordinal {\alpha \in \omega_1} via transfinite induction as follows:

  • {{\mathcal F}_0 := {\mathcal F}}.
  • For each countable successor ordinal {\alpha = \beta+1}, we define {{\mathcal F}_\alpha} to be the collection of all sets that are either the union of an at most countable number of sets in {{\mathcal F}_\beta} (including the empty union {\emptyset}), or the complement of such a union.
  • For each countable limit ordinal {\alpha = \sup_{\beta < \alpha} \beta}, we define {{\mathcal F}_\alpha := \bigcup_{\beta < \alpha} {\mathcal F}_\beta}.

Show that {\langle {\mathcal F} \rangle = \bigcup_{\alpha \in \omega_1} {\mathcal F}_\alpha}.

Remark 5 The first uncountable ordinal {\omega_1} will make several further cameo appearances throughout this course, for instance by generating counterexamples to various plausible statements in point-set topology. In the case when {{\mathcal F}} is the collection of open sets in a topological space, so that {\langle {\mathcal F} \rangle} is the Borel {\sigma}-algebra, the sets {{\mathcal F}_\alpha} are essentially the Borel hierarchy (which starts at the open and closed sets, then moves on to the {F_\sigma} and {G_\delta} sets, and so forth); these play an important role in descriptive set theory.

Exercise 16 (This exercise requires familiarity with the theory of cardinals.) Let {{\mathcal F}} be an infinite family of subsets of {X} of cardinality {\kappa} (thus {\kappa} is an infinite cardinal). Show that {\langle {\mathcal F} \rangle} has cardinality at most {\kappa^{\aleph_0}}. (Hint: use Exercise 15.) In particular, show that the Borel {\sigma}-algebra {{\mathcal B}[{\bf R}^d]} has cardinality at most {c := 2^{\aleph_0}}.

Conclude that there exist Jordan measurable (and hence Lebesgue measurable) subsets of {{\bf R}^d} which are not Borel measurable. (Hint: How many subsets of the Cantor set are there?) Use this to place the Borel {\sigma}-algebra on the diagram that you drew for Example 6.

Remark 6 Despite this demonstration that not all Lebesgue measurable subsets are Borel measurable, it is remarkably difficult (though not impossible) to exhibit a specific set that is not Borel measurable. Indeed, a large majority of the explicitly constructible sets that one actually encounters in practice tend to be Borel measurable, and one can view the property of Borel measurability intuitively as a kind of “constructibility” property. (Indeed, as a very crude first approximation, one can view the Borel measurable sets as those sets of “countable descriptive complexity”; in contrast, sets of finite descriptive complexity tend to be Jordan measurable, assuming they are bounded, of course.)

Exercise 17 Let {E, F} be Borel measurable subsets of {{\bf R}^{d_1}, {\bf R}^{d_2}} respectively. Show that {E \times F} is a Borel measurable subset of {{\bf R}^{d_1+d_2}}. (Hint: first establish this in the case when {F} is a box, by using Remark 4. To obtain the general case, apply Remark 4 yet again.)

The above exercise has a partial converse:

Exercise 18 Let {E} be a Borel measurable subset of {{\bf R}^{d_1+d_2}}.

  1. Show that for any {x_1 \in {\bf R}^{d_1}}, the slice {\{ x_2 \in {\bf R}^{d_2}: (x_1,x_2) \in E \}} is a Borel measurable subset of {{\bf R}^{d_2}}. Similarly, show that for every {x_2 \in {\bf R}^{d_2}}, the slice {\{ x_1 \in {\bf R}^{d_1}: (x_1,x_2) \in E \}} is a Borel measurable subset of {{\bf R}^{d_1}}.
  2. Give a counterexample to show that this claim is not true if “Borel” is replaced with “Lebesgue” throughout. (Hint: the Cartesian product of any set with a point is a null set, even if the first set was not measurable.)

Exercise 19 Show that the Lebesgue {\sigma}-algebra on {{\bf R}^d} is generated by the union of the Borel {\sigma}-algebra and the null {\sigma}-algebra.

— 3. Countably additive measures and measure spaces —

Having set out the concepts of a {\sigma}-algebra and a measurable space, we now endow these structures with a measure.

We begin with the finitely additive theory, although this theory is too weak for our purposes and will soon be supplanted by the countably additive theory.

Definition 6 (Finitely additive measure) Let {{\mathcal B}} be a Boolean algebra on a space {X}. An (unsigned) finitely additive measure {\mu} on {{\mathcal B}} is a map {\mu: {\mathcal B} \rightarrow [0,+\infty]} that obeys the following axioms:

  1. (Empty set) {\mu(\emptyset) = 0}.
  2. (Finite additivity) Whenever {E, F \in {\mathcal B}} are disjoint, then {\mu(E \cup F) = \mu(E) + \mu(F)}.

Remark 7 The empty set axiom is needed in order to rule out the degenerate situation in which every set (including the empty set) has infinite measure.

Example 8 Lebesgue measure {m} is a finitely additive measure on the Lebesgue {\sigma}-algebra, and hence on all sub-algebras (such as the null algebra, the Jordan algebra, or the elementary algebra). In particular, Jordan measure and elementary measure are finitely additive (adopting the convention that co-Jordan measurable sets have infinite Jordan measure, and co-elementary sets have infinite elementary measure).

On the other hand, as we saw in previous notes, Lebesgue outer measure is not finitely additive on the discrete algebra, and Jordan outer measure is not finitely additive on the Lebesgue algebra.

Example 9 (Dirac measure) Let {x \in X} and {{\mathcal B}} be an arbitrary Boolean algebra on {X}. Then the Dirac measure {\delta_x} at {x}, defined by setting {\delta_x(E) := 1_E(x)}, is finitely additive.

Example 10 (Zero measure) The zero measure {0: E \mapsto 0} is a finitely additive measure on any Boolean algebra.

Example 11 (Linear combinations of measures) If {{\mathcal B}} is a Boolean algebra on {X}, and {\mu, \nu: {\mathcal B} \rightarrow [0,+\infty]} are finitely additive measures on {{\mathcal B}}, then {\mu+\nu: E \mapsto \mu(E) + \nu(E)} is also a finitely additive measure, as is {c\mu: E \mapsto c \times \mu(E)} for any {c \in [0,+\infty]}. Thus, for instance, the sum of Lebesgue measure and a Dirac measure is also a finitely additive measure on the Lebesgue algebra (or on any of its sub-algebras).

Example 12 (Restriction of a measure) If {{\mathcal B}} is a Boolean algebra on {X}, {\mu: {\mathcal B} \rightarrow [0,+\infty]} is a finitely additive measure, and {Y} is a {{\mathcal B}}-measurable subset of {X}, then the restriction {\mu\downharpoonright_Y: {\mathcal B}\downharpoonright_Y \rightarrow [0,+\infty]} of {\mu} to {Y}, defined by setting {\mu\downharpoonright_Y(E) := \mu(E)} whenever {E \in {\mathcal B}\downharpoonright_Y} (i.e. if {E \in {\mathcal B}} and {E \subset Y}), is also a finitely additive measure.

Example 13 (Counting measure) If {{\mathcal B}} is a Boolean algebra on {X}, then the function {\#: {\mathcal B} \rightarrow [0,+\infty]} defined by setting {\#(E)} to be the cardinality of {E} if {E} is finite, and {\#(E) := +\infty} if {E} is infinite, is a finitely additive measure, known as counting measure.
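The examples above can be checked mechanically on a small discrete algebra. The following is a minimal sketch, assuming a three-point space in which every set is finite (so counting measure never takes the value {+\infty}); the helper names are hypothetical. It verifies the two axioms of Definition 6 for a Dirac measure, for counting measure, and for a linear combination of the two as in Example 11.

```python
from itertools import chain, combinations

def powerset(X):
    """The discrete algebra 2^X of a finite set X."""
    xs = list(X)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(xs, r) for r in range(len(xs) + 1))]

def dirac(x):
    """Dirac measure at x: E -> 1_E(x)."""
    return lambda E: 1 if x in E else 0

def counting(E):
    """Counting measure, restricted here to finite sets."""
    return len(E)

def is_finitely_additive(mu, B):
    """Check mu(empty set) = 0 and mu(E u F) = mu(E) + mu(F) for all disjoint E, F in B."""
    if mu(frozenset()) != 0:
        return False
    return all(mu(E | F) == mu(E) + mu(F)
               for E in B for F in B if not (E & F))

X = {1, 2, 3}
B = powerset(X)
combo = lambda E: counting(E) + 2 * dirac(3)(E)   # counting measure plus twice a Dirac mass

print(is_finitely_additive(dirac(2), B),
      is_finitely_additive(counting, B),
      is_finitely_additive(combo, B))
print(is_finitely_additive(lambda E: 1, B))       # constant 1 violates the empty set axiom
```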

As with our definition of Boolean algebras and {\sigma}-algebras, we adopted a “minimalist” definition so that the axioms are easy to verify. But they imply several further useful properties:

Exercise 20 Let {\mu: {\mathcal B} \rightarrow [0,+\infty]} be a finitely additive measure on a Boolean algebra {{\mathcal B}}. Establish the following facts:

  1. (Monotonicity) If {E, F} are {{\mathcal B}}-measurable and {E \subset F}, then {\mu(E) \leq \mu(F)}.
  2. (Finite additivity) If {k} is a natural number, and {E_1,\ldots,E_k} are {{\mathcal B}}-measurable and disjoint, then {\mu(E_1\cup \ldots \cup E_k) = \mu(E_1) + \ldots + \mu(E_k)}.
  3. (Finite sub-additivity) If {k} is a natural number, and {E_1,\ldots,E_k} are {{\mathcal B}}-measurable, then {\mu(E_1\cup \ldots \cup E_k) \leq \mu(E_1) + \ldots + \mu(E_k)}.
  4. (Inclusion-exclusion for two sets) If {E, F} are {{\mathcal B}}-measurable, then {\mu(E \cup F) + \mu(E \cap F) = \mu(E) + \mu(F)}.

(Caution: remember that the cancellation law {a+c=b+c \implies a=b} does not hold in {[0,+\infty]} if {c} is infinite, and so the use of cancellation (or subtraction) should be avoided if possible.)
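The caution about cancellation can be seen concretely. In the sketch below, the extended half-line {[0,+\infty]} is modelled by Python floats with `math.inf`, which is an assumption of the illustration; the measure `mu` (counting measure plus an infinite point mass at {0}) is likewise a made-up example. The additive form of inclusion-exclusion survives an infinite intersection, while the subtractive form does not.

```python
import math

INF = math.inf

# Cancellation fails when the common term is infinite: a + c == b + c does not force a == b.
a, b, c = 1.0, 2.0, INF
print(a + c == b + c, a == b)

def mu(E):
    """Counting measure plus an infinite point mass at 0 (a hypothetical example measure)."""
    return INF if 0 in E else float(len(E))

E, F = {0, 1, 2}, {0, 2, 3}

# Additive form of inclusion-exclusion: both sides are +infinity, so the identity holds.
print(mu(E | F) + mu(E & F) == mu(E) + mu(F))

# Subtractive form: inf - inf is undefined, and evaluates to nan in floating point.
print(math.isnan(mu(E) + mu(F) - mu(E & F)))
```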

One can characterise measures completely for any finite algebra:

Exercise 21 Let {{\mathcal B}} be a finite Boolean algebra, generated by a finite family {A_1,\ldots,A_k} of non-empty atoms. Show that for every finitely additive measure {\mu} on {{\mathcal B}} there exist {c_1,\ldots,c_k \in [0,+\infty]} such that

\displaystyle  \mu(E) = \sum_{1 \leq j \leq k: A_j \subset E} c_j.

Equivalently, if {x_j} is a point in {A_j} for each {1 \leq j \leq k}, then

\displaystyle  \mu = \sum_{j=1}^k c_j \delta_{x_j}.

Furthermore, show that the {c_1,\ldots,c_k} are uniquely determined by {\mu}.
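The two formulas in Exercise 21 can be compared numerically on a small example. This is a sketch under stated assumptions rather than a solution: the atoms, the weights {c_j} and the choice of representatives {x_j} are all made up for illustration, and {+\infty} is modelled by `float("inf")`.

```python
from itertools import chain, combinations

atoms = [frozenset({1, 2}), frozenset({3}), frozenset({4, 5, 6})]
weights = [0.5, 2.0, float("inf")]     # c_1, c_2, c_3 in [0, +infinity]
points = [min(A) for A in atoms]       # one representative x_j in each atom A_j

def mu_atoms(E):
    """mu(E) as the sum of c_j over the atoms A_j contained in E."""
    return sum(c for A, c in zip(atoms, weights) if A <= E)

def mu_dirac(E):
    """mu(E) as the weighted sum of Dirac measures at the representatives x_j."""
    return sum(c for x, c in zip(points, weights) if x in E)

# Enumerate the finite algebra: all unions of sub-collections of atoms.
algebra = [frozenset().union(*J) for J in chain.from_iterable(
    combinations(atoms, r) for r in range(len(atoms) + 1))]

print(all(mu_atoms(E) == mu_dirac(E) for E in algebra))
print([mu_atoms(A) for A in atoms] == weights)   # each weight is read off as c_j = mu(A_j)
```

The two expressions agree on sets in the algebra; on a set that cuts through an atom they can differ, which is why the comparison is restricted to `algebra`.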

This is about the limit of what one can say about finitely additive measures at this level of generality. We now specialise to the countably additive measures on {\sigma}-algebras.

Definition 7 (Countably additive measure) Let {(X,{\mathcal B})} be a measurable space. An (unsigned) countably additive measure {\mu} on {{\mathcal B}}, or measure for short, is a map {\mu: {\mathcal B} \rightarrow [0,+\infty]} that obeys the following axioms:

  1. (Empty set) {\mu(\emptyset) = 0}.
  2. (Countable additivity) Whenever {E_1,E_2,\ldots \in {\mathcal B}} are a countable sequence of disjoint measurable sets, then {\mu(\bigcup_{n=1}^\infty E_n) = \sum_{n=1}^\infty \mu(E_n)}.

A triplet {(X,{\mathcal B}, \mu)}, where {(X,{\mathcal B})} is a measurable space and {\mu: {\mathcal B} \rightarrow [0,+\infty]} is a countably additive measure, is known as a measure space.

Note the distinction between a measure space and a measurable space. The latter has the capability to be equipped with a measure, but the former is actually equipped with a measure.

Example 14 Lebesgue measure is a countably additive measure on the Lebesgue {\sigma}-algebra, and hence on every sub-{\sigma}-algebra (such as the Borel {\sigma}-algebra).

Example 15 The Dirac measures from Example 9 are countably additive, as is counting measure.

Example 16 Any restriction of a countably additive measure to a measurable subspace is again countably additive.

Exercise 22 (Countable combinations of measures) Let {(X,{\mathcal B})} be a measurable space.

  1. If {\mu} is a countably additive measure on {{\mathcal B}}, and {c \in [0,+\infty]}, then {c\mu} is also countably additive.
  2. If {\mu_1,\mu_2,\ldots} are a sequence of countably additive measures on {{\mathcal B}}, then the sum {\sum_{n=1}^\infty \mu_n: E \mapsto \sum_{n=1}^\infty \mu_n(E)} is also a countably additive measure.

Note that countably additive measures are necessarily finitely additive (by padding out a finite union into a countable union using the empty set), and so countably additive measures inherit all the properties of finitely additive measures, such as monotonicity and finite subadditivity. But one also has additional properties:

Exercise 23 Let {(X,{\mathcal B},\mu)} be a measure space.

  1. (Countable subadditivity) If {E_1, E_2, \ldots} are {{\mathcal B}}-measurable, then {\mu(\bigcup_{n=1}^\infty E_n) \leq \sum_{n=1}^\infty \mu( E_n )}.
  2. (Upwards monotone convergence) If {E_1 \subset E_2 \subset \ldots} are {{\mathcal B}}-measurable, then

    \displaystyle \mu(\bigcup_{n=1}^\infty E_n) = \lim_{n \rightarrow \infty} \mu( E_n ) = \sup_{n} \mu( E_n ).

  3. (Downwards monotone convergence) If {E_1 \supset E_2 \supset \ldots} are {{\mathcal B}}-measurable, and {\mu(E_n)<\infty} for at least one {n}, then

    \displaystyle \mu(\bigcap_{n=1}^\infty E_n) = \lim_{n \rightarrow \infty} \mu( E_n ) = \inf_{n} \mu( E_n ).

Show that the downward monotone convergence claim can fail if the hypothesis that {\mu(E_n)<\infty} for at least one {n} is dropped. (Hint: copy the argument used for Exercise 10 in Notes 1.)

Exercise 24 (Dominated convergence for sets) Let {(X,{\mathcal B},\mu)} be a measure space. Let {E_1,E_2,\ldots} be a sequence of {{\mathcal B}}-measurable sets that converge to another set {E}, in the sense that {1_{E_n}} converges pointwise to {1_E}.

  1. Show that {E} is also {{\mathcal B}}-measurable.
  2. If there exists a {{\mathcal B}}-measurable set {F} of finite measure (i.e. {\mu(F) < \infty}) that contains all of the {E_n}, show that {\lim_{n \rightarrow\infty} \mu(E_n) = \mu(E)}. (Hint: Apply downward monotonicity to the sets {\bigcup_{n>N} (E_n \Delta E)}.)
  3. Show that the previous part of this exercise can fail if the hypothesis that all the {E_n} are contained in a set of finite measure is omitted.

Exercise 25 Let {X} be an at most countable set with the discrete {\sigma}-algebra. Show that every measure {\mu} on this measurable space can be uniquely represented in the form

\displaystyle  \mu = \sum_{x \in X} c_x \delta_x

for some {c_x \in [0,+\infty]}, thus

\displaystyle  \mu(E) = \sum_{x \in E} c_x

for all {E \subset X}. (This claim fails in the uncountable case, although showing this is slightly tricky.)
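To make the representation in Exercise 25 concrete, here is a minimal Python sketch on a three-element set. The names `weights` and `measure` and the specific values are illustrative assumptions, not notation from these notes; the only point is that the measure is determined by the numbers {c_x = \mu(\{x\})}.

```python
from fractions import Fraction

# Hypothetical weights c_x = mu({x}) on the finite set X = {"a", "b", "c"}.
weights = {"a": Fraction(1, 2), "b": Fraction(0), "c": Fraction(3)}

def measure(E):
    """mu(E) = sum of c_x over x in E, for E a subset of the keys of `weights`."""
    return sum((weights[x] for x in E), Fraction(0))

# Additivity on two disjoint sets, and recovery of c_x from a singleton.
E, F = {"a"}, {"b", "c"}
total = measure(E | F)
```

Evaluating `measure` on singletons recovers the weights, which is the uniqueness half of the exercise in this toy setting.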

A null set of a measure space {(X, {\mathcal B}, \mu)} is defined to be a {{\mathcal B}}-measurable set of measure zero. A sub-null set is any subset of a null set. A measure space is said to be complete if every sub-null set is a null set. Thus, for instance, the Lebesgue measure space {({\bf R}^d, {\mathcal L}[{\bf R}^d], m)} is complete, but the Borel measure space {({\bf R}^d, {\mathcal B}[{\bf R}^d], m)} is not (as can be seen from the solution to Exercise 16).

Completeness is a convenient property to have in some cases, particularly when dealing with properties that hold almost everywhere. Fortunately, it is fairly easy to modify any measure space to be complete:

Exercise 26 (Completion) Let {(X, {\mathcal B}, \mu)} be a measure space. Show that there exists a unique refinement {(X, \overline{{\mathcal B}}, \overline{\mu})}, known as the completion of {(X, {\mathcal B}, \mu)}, which is the coarsest refinement of {(X, {\mathcal B}, \mu)} that is complete. Furthermore, show that {\overline{{\mathcal B}}} consists precisely of those sets that differ from a {{\mathcal B}}-measurable set by a {{\mathcal B}}-subnull set.

Exercise 27 Show that the Lebesgue measure space {({\bf R}^d, {\mathcal L}[{\bf R}^d], m)} is the completion of the Borel measure space {({\bf R}^d, {\mathcal B}[{\bf R}^d], m)}.

— 4. Measurable functions, and integration on a measure space —

Now we are ready to define integration on measure spaces. We first need the notion of a measurable function, which is analogous to that of a continuous function in topology. Recall that a function {f: X \rightarrow Y} between two topological spaces {X, Y} is continuous if the inverse image {f^{-1}(U)} of any open set is open. In a similar spirit, we have

Definition 8 Let {(X,{\mathcal B})} be a measurable space, and let {f: X \rightarrow [0,+\infty]} or {f: X \rightarrow {\bf C}} be an unsigned or complex-valued function. We say that {f} is measurable if {f^{-1}(U)} is {{\mathcal B}}-measurable for every open subset {U} of {[0,+\infty]} or {{\bf C}}.

From Lemma 7 of Notes 2, we see that this generalises the notion of a Lebesgue measurable function.

Exercise 28 Let {(X,{\mathcal B})} be a measurable space.

  1. Show that a function {f: X \rightarrow [0,+\infty]} is measurable if and only if the level sets {\{ x \in X: f(x) > \lambda \}} are {{\mathcal B}}-measurable.
  2. Show that an indicator function {1_E} of a set {E \subset X} is measurable if and only if {E} itself is {{\mathcal B}}-measurable.
  3. Show that a function {f: X \rightarrow [0,+\infty]} or {f: X \rightarrow {\bf C}} is measurable if and only if {f^{-1}(E)} is {{\mathcal B}}-measurable for every Borel-measurable subset {E} of {[0,+\infty]} or {{\bf C}}.
  4. Show that a function {f: X \rightarrow {\bf C}} is measurable if and only if its real and imaginary parts are measurable.
  5. Show that a function {f: X \rightarrow {\bf R}} is measurable if and only if the magnitudes {f_+ := \max(f,0)}, {f_- := \max(-f,0)} of its positive and negative parts are measurable.
  6. If {f_n: X \rightarrow [0,+\infty]} are a sequence of measurable functions that converge pointwise to a limit {f: X \rightarrow [0,+\infty]}, then show that {f} is also measurable. Obtain the same claim if {[0,+\infty]} is replaced by {{\bf C}}.
  7. If {f: X \rightarrow [0,+\infty]} is measurable and {\phi: [0,+\infty] \rightarrow [0,+\infty]} is continuous, show that {\phi \circ f} is measurable. Obtain the same claim if {[0,+\infty]} is replaced by {{\bf C}}.
  8. Show that the sum or product of two measurable functions in {[0,+\infty]} or {{\bf C}} is still measurable.

Remark 8 One can also view measurable functions in a more category theoretic fashion. Define a measurable morphism or measurable map {f} from one measurable space {(X,{\mathcal B})} to another {(Y, {\mathcal C})} to be a function {f: X \rightarrow Y} with the property that {f^{-1}(E)} is {{\mathcal B}}-measurable for every {{\mathcal C}}-measurable set {E}. Then a measurable function {f: X \rightarrow [0,+\infty]} or {f: X \rightarrow {\bf C}} is the same thing as a measurable morphism from {X} to {[0,+\infty]} or {{\bf C}}, where the latter is equipped with the Borel {\sigma}-algebra. Also, one {\sigma}-algebra {{\mathcal B}} on a space {X} is coarser than another {{\mathcal B}'} precisely when the identity map {\hbox{id}_X: X \rightarrow X} is a measurable morphism from {(X,{\mathcal B}')} to {(X,{\mathcal B})}. The main purpose of adopting this viewpoint is that it is obvious that the composition of measurable morphisms is again a measurable morphism. This is important in those fields of mathematics, such as ergodic theory, in which one frequently wishes to compose measurable transformations (and in particular, to compose a transformation {T: (X,{\mathcal B}) \rightarrow (X,{\mathcal B})} with itself repeatedly); but it will not play a major role in this course.

Measurable functions are particularly easy to describe on atomic spaces:

Exercise 29 Let {(X,{\mathcal B})} be a measurable space that is atomic, thus {{\mathcal B} = {\mathcal A}((A_\alpha)_{\alpha \in I})} for some partition {X = \bigcup_{\alpha \in I} A_\alpha} of {X} into disjoint non-empty atoms. Show that a function {f: X \rightarrow [0,+\infty]} or {f: X \rightarrow {\bf C}} is measurable if and only if it is constant on each atom, or equivalently if one has a representation of the form

\displaystyle  f = \sum_{\alpha \in I} c_\alpha 1_{A_\alpha}

for some constants {c_\alpha} in {[0,+\infty]} or in {{\bf C}} as appropriate. Furthermore, the {c_\alpha} are uniquely determined by {f}.

Exercise 30 (Egorov’s theorem) Let {(X,{\mathcal B},\mu)} be a finite measure space (so {\mu(X) < \infty}), and let {f_n: X \rightarrow {\bf C}} be a sequence of measurable functions that converge pointwise almost everywhere to a limit {f: X \rightarrow {\bf C}}, and let {\epsilon > 0}. Show that there exists a measurable set {E} of measure at most {\epsilon} such that {f_n} converges uniformly to {f} outside of {E}. Give an example to show that the claim can fail when the measure {\mu} is not finite.

In Notes 2 we defined first a simple integral, then an unsigned integral, and then finally an absolutely convergent integral. We perform the same three stages here. We begin with the simple integral in the case when the {\sigma}-algebra is finite:

Definition 9 (Simple integral) Let {(X,{\mathcal B},\mu)} be a measure space with {{\mathcal B}} finite. By Exercise 4, {X} is partitioned into a finite number of atoms {A_1,\ldots,A_n}. If {f: X \rightarrow [0,+\infty]} is measurable, then by Exercise 29 it has a unique representation of the form

\displaystyle  f = \sum_{i=1}^n c_i 1_{A_i}

for some {c_1,\ldots,c_n \in [0,+\infty]}. We then define the simple integral {\hbox{Simp} \int_X f\ d\mu} of {f} by the formula

\displaystyle  \hbox{Simp} \int_X f\ d\mu := \sum_{i=1}^n c_i \mu(A_i).

Note that, thanks to Exercise 3, the precise decomposition into atoms does not affect the definition of the simple integral.
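The formula in Definition 9 can be tried out on a toy example. The following Python sketch assumes finite coefficients and finite atom measures (so no conventions about infinite values are needed); the function name `simple_integral` and the data are hypothetical. As a side check, it splits one atom into two pieces on which {f} takes the same value and confirms that the value of the sum does not change.

```python
from fractions import Fraction

def simple_integral(coeffs, atom_measures):
    """Return sum_i c_i * mu(A_i) for finite coefficients and finite atom measures."""
    return sum((c * m for c, m in zip(coeffs, atom_measures)), Fraction(0))

# Hypothetical data: atoms A_1, A_2 with mu(A_1) = 2, mu(A_2) = 3,
# and f = 5 * 1_{A_1} + 1 * 1_{A_2}.
coarse = simple_integral([Fraction(5), Fraction(1)], [Fraction(2), Fraction(3)])

# Split A_1 into two pieces of measure 1/2 and 3/2; f is still 5 on both.
fine = simple_integral(
    [Fraction(5), Fraction(5), Fraction(1)],
    [Fraction(1, 2), Fraction(3, 2), Fraction(3)],
)
```

Both `coarse` and `fine` evaluate to 13 here, since {5 \times 2 = 5 \times \frac{1}{2} + 5 \times \frac{3}{2}}.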

One could also define a simple integral for absolutely convergent complex-valued functions on a measurable space with a finite {\sigma}-algebra, but we will not need to do so here.

With this definition, it is clear that one has the monotonicity property

\displaystyle  \hbox{Simp} \int_X f\ d\mu \leq \hbox{Simp} \int_X g\ d\mu

whenever {f \leq g} are unsigned measurable, as well as the linearity properties

\displaystyle  \hbox{Simp} \int_X f+g\ d\mu = \hbox{Simp} \int_X f\ d\mu + \hbox{Simp} \int_X g\ d\mu

and

\displaystyle  \hbox{Simp} \int_X cf\ d\mu = c \times \hbox{Simp} \int_X f\ d\mu

for unsigned measurable {f,g} and {c \in [0,+\infty]}. We also make the following important technical observation:

Exercise 31 (Simple integral unaffected by refinements) Let {(X,{\mathcal B},\mu)} be a measure space, and let {(X,{\mathcal B}',\mu')} be a refinement of {(X,{\mathcal B},\mu)}, which means that {{\mathcal B}'} contains {{\mathcal B}} and {\mu': {\mathcal B}' \rightarrow [0,+\infty]} agrees with {\mu: {\mathcal B} \rightarrow [0,+\infty]} on {{\mathcal B}}. Suppose that both {{\mathcal B}, {\mathcal B}'} are finite, and let {f: X \rightarrow [0,+\infty]} be {{\mathcal B}}-measurable. Show that

\displaystyle  \hbox{Simp} \int_X f\ d\mu = \hbox{Simp} \int_X f\ d\mu'.

This allows one to extend the simple integral to simple functions:

Definition 10 (Integral of simple functions) An (unsigned) simple function {f: X \rightarrow [0,+\infty]} on a measurable space {(X,{\mathcal B})} is a measurable function that takes on finitely many values {a_1,\ldots,a_k}. Note that such a function is then automatically measurable with respect to at least one finite sub-{\sigma}-algebra {{\mathcal B}'} of {{\mathcal B}}, namely the {\sigma}-algebra {{\mathcal B}'} generated by the preimages {f^{-1}(\{a_1\}),\ldots,f^{-1}(\{a_k\})} of {a_1,\ldots,a_k}. We then define the simple integral {\hbox{Simp} \int_X f\ d\mu} by the formula

\displaystyle \hbox{Simp} \int_X f\ d\mu := \hbox{Simp} \int_X f\ d\mu\downharpoonright_{{\mathcal B}'},

where {\mu\downharpoonright_{{\mathcal B}'}: {\mathcal B}' \rightarrow [0,+\infty]} is the restriction of {\mu: {\mathcal B} \rightarrow [0,+\infty]} to {{\mathcal B}'}.

Note that there could be multiple finite {\sigma}-algebras with respect to which {f} is measurable, but Exercise 31 guarantees that all such extensions will give the same simple integral. Indeed, if {f} were measurable with respect to two separate finite sub-{\sigma}-algebras {{\mathcal B}'} and {{\mathcal B}''} of {{\mathcal B}}, then it would also be measurable with respect to their common refinement {{\mathcal B}' \vee {\mathcal B}'' := \langle {\mathcal B}' \cup {\mathcal B}'' \rangle}, which is also finite (by Exercise 8), and then by Exercise 31, {\int_X f\ d\mu\downharpoonright_{{\mathcal B}'}} and {\int_X f\ d\mu\downharpoonright_{{\mathcal B}''}} are both equal to {\int_X f\ d\mu\downharpoonright_{{\mathcal B}' \vee {\mathcal B}''}}, and hence equal to each other.

From this we can deduce the following properties of the simple integral. As with the Lebesgue theory, we say that a property {P(x)} of an element {x \in X} of a measure space {(X, {\mathcal B}, \mu)} holds {\mu}-almost everywhere if it holds outside of a sub-null set.

Exercise 32 (Basic properties of the simple integral) Let {(X, {\mathcal B}, \mu)} be a measure space, and let {f, g: X \rightarrow [0,+\infty]} be simple functions.

  1. (Monotonicity) If {f \leq g} pointwise, then {\hbox{Simp} \int_X f\ d\mu \leq \hbox{Simp} \int_X g\ d\mu}.
  2. (Compatibility with measure) For every {{\mathcal B}}-measurable set {E}, we have {\hbox{Simp} \int_X 1_E\ d\mu = \mu(E)}.
  3. (Homogeneity) For every {c \in [0,+\infty]}, one has {\hbox{Simp} \int_X cf\ d\mu = c \times \hbox{Simp} \int_X f\ d\mu}.
  4. (Finite additivity) {\hbox{Simp} \int_X (f+g)\ d\mu = \hbox{Simp} \int_X f\ d\mu + \hbox{Simp} \int_X g\ d\mu}.
  5. (Insensitivity to refinement) If {(X, {\mathcal B}', \mu')} is a refinement of {(X, {\mathcal B}, \mu)} (as defined in Exercise 31), then {\hbox{Simp} \int_X f\ d\mu = \hbox{Simp} \int_X f\ d\mu'}.
  6. (Almost everywhere equivalence) If {f(x)=g(x)} for {\mu}-almost every {x \in X}, then {\hbox{Simp} \int_X f\ d\mu = \hbox{Simp} \int_X g\ d\mu}.
  7. (Finiteness) {\hbox{Simp} \int_X f\ d\mu < \infty} if and only if {f} is finite almost everywhere, and is supported on a set of finite measure.
  8. (Vanishing) {\hbox{Simp} \int_X f\ d\mu = 0} if and only if {f} is zero almost everywhere.

Exercise 33 (Inclusion-exclusion principle) Let {(X, {\mathcal B}, \mu)} be a measure space, and let {A_1, \ldots, A_n} be {{\mathcal B}}-measurable sets of finite measure. Show that

\displaystyle  \mu( \bigcup_{i=1}^n A_i ) = \sum_{J \subset \{1,\ldots,n\}: J \neq \emptyset} (-1)^{|J|-1} \mu( \bigcap_{i \in J} A_i ).

(Hint: Compute {\hbox{Simp} \int_X (1 - \prod_{i=1}^n (1-1_{A_i}))\ d\mu} in two different ways.)
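The inclusion-exclusion identity of Exercise 33 can be sanity-checked numerically on a finite set with point masses. This is only a sketch under assumed data: the dictionary `mass`, the helper `mu`, and the three sets in `sets` are hypothetical choices, and a numerical check is of course not a proof.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical point masses on X = {0, ..., 5}; mu(E) is the sum of masses in E.
mass = {x: Fraction(x + 1, 2) for x in range(6)}

def mu(E):
    return sum((mass[x] for x in E), Fraction(0))

sets = [{0, 1, 2}, {2, 3}, {1, 3, 4}]

# Left-hand side: measure of the union.
lhs = mu(set().union(*sets))

# Right-hand side: alternating sum over non-empty index sets J.
rhs = Fraction(0)
for k in range(1, len(sets) + 1):
    for J in combinations(range(len(sets)), k):
        inter = set.intersection(*(sets[i] for i in J))
        rhs += (-1) ** (k - 1) * mu(inter)
```

For this data both sides equal {15/2}: the single sets contribute {12}, the pairwise intersections subtract {9/2}, and the triple intersection is empty.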

Remark 9 The simple integral could also be defined on finitely additive measure spaces, rather than countably additive ones, and all the above properties would still apply. However, on a finitely additive measure space one would have difficulty extending the integral beyond simple functions, as we will now do.

From the simple integral, we can now define the unsigned integral, similarly to what was done for the unsigned Lebesgue integral in Notes 2:

Definition 11 Let {(X, {\mathcal B}, \mu)} be a measure space, and let {f: X \rightarrow [0,+\infty]} be measurable. Then we define the unsigned integral {\int_X f\ d\mu} of {f} by the formula

\displaystyle  \int_X f\ d\mu := \sup_{0 \leq g \leq f; g \hbox{ simple}} \hbox{Simp} \int_X g\ d\mu. \ \ \ \ \ (1)

Clearly, this definition generalises the corresponding definition in Definition 10 of Notes 2. Indeed, if {f: {\bf R}^d \rightarrow [0,+\infty]} is Lebesgue measurable, then {\int_{{\bf R}^d} f(x)\ dx = \int_{{\bf R}^d} f\ dm}.

We record some easy properties of this integral:

Exercise 34 (Easy properties of the unsigned integral) Let {(X, {\mathcal B}, \mu)} be a measure space, and let {f,g: X \rightarrow [0,+\infty]} be measurable.

  1. (Almost everywhere equivalence) If {f = g} {\mu}-almost everywhere, then {\int_X f\ d\mu = \int_X g\ d\mu}.
  2. (Monotonicity) If {f \leq g} {\mu}-almost everywhere, then {\int_X f\ d\mu \leq \int_X g\ d\mu}.
  3. (Homogeneity) We have {\int_X cf\ d\mu = c \int_X f\ d\mu} for every {c \in [0,+\infty]}.
  4. (Superadditivity) We have {\int_X (f+g)\ d\mu \geq \int_X f\ d\mu + \int_X g\ d\mu}.
  5. (Compatibility with the simple integral) If {f} is simple, then {\int_X f\ d\mu = \hbox{Simp} \int_X f\ d\mu}.
  6. (Markov’s inequality) For any {0 < \lambda < \infty}, one has

    \displaystyle  \mu( \{ x \in X: f(x) \geq \lambda \} ) \leq \frac{1}{\lambda} \int_X f\ d\mu.

    In particular, if {\int_X f\ d\mu < \infty}, then the sets {\{ x \in X: f(x) \geq \lambda \}} have finite measure for each {\lambda > 0}.

  7. (Finiteness) If {\int_X f\ d\mu < \infty}, then {f(x)} is finite for {\mu}-almost every {x}.
  8. (Vanishing) If {\int_X f\ d\mu = 0}, then {f(x)} is zero for {\mu}-almost every {x}.
  9. (Vertical truncation) We have {\lim_{n \rightarrow \infty} \int_X \min(f,n)\ d\mu = \int_X f\ d\mu}.
  10. (Horizontal truncation) If {E_1 \subset E_2 \subset \ldots} is an increasing sequence of {{\mathcal B}}-measurable sets, then

    \displaystyle  \lim_{n \rightarrow \infty} \int_X f 1_{E_n}\ d\mu = \int_X f 1_{\bigcup_{n=1}^\infty E_n}\ d\mu.

  11. (Restriction) If {Y} is a measurable subset of {X}, then {\int_X f 1_Y\ d\mu = \int_Y f\downharpoonright_Y d\mu\downharpoonright_Y}, where {f\downharpoonright_Y: Y \rightarrow [0,+\infty]} is the restriction of {f: X \rightarrow [0,+\infty]} to {Y}, and the restriction {\mu\downharpoonright_Y} was defined in Example 12. We will often abbreviate {\int_Y f\downharpoonright_Y d\mu\downharpoonright_Y} (by slight abuse of notation) as {\int_Y f\ d\mu}.
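As an illustration of Markov's inequality from the list above, the following Python sketch checks it on a four-point space with point masses. It assumes that on such a finite discrete space the unsigned integral reduces to the weighted sum {\sum_x f(x) \mu(\{x\})} (which follows from compatibility with the simple integral); the data `mass` and `f` and the helper `superlevel_measure` are hypothetical.

```python
from fractions import Fraction

# Hypothetical finite measure space: point masses mu({x}) and an unsigned f.
mass = {0: Fraction(1), 1: Fraction(2), 2: Fraction(1, 2), 3: Fraction(4)}
f = {0: Fraction(0), 1: Fraction(3), 2: Fraction(10), 3: Fraction(1, 4)}

# Assumed reduction: the integral of f is the weighted sum over points.
integral = sum(f[x] * mass[x] for x in mass)

def superlevel_measure(lam):
    """mu({x : f(x) >= lam})."""
    return sum((mass[x] for x in mass if f[x] >= lam), Fraction(0))

# Triples (lambda, mu({f >= lambda}), integral / lambda) for a few thresholds.
checks = [(lam, superlevel_measure(lam), integral / lam)
          for lam in (Fraction(1, 4), Fraction(1), Fraction(3), Fraction(10))]
```

In each triple of `checks` the second entry is at most the third, as Markov's inequality predicts.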

As before, one of the key properties of this integral is its additivity:

Theorem 12 Let {(X, {\mathcal B}, \mu)} be a measure space, and let {f,g: X \rightarrow [0,+\infty]} be measurable. Then

\displaystyle  \int_X (f+g)\ d\mu = \int_X f\ d\mu + \int_X g\ d\mu.

Proof: In view of super-additivity, it suffices to establish the sub-additivity property

\displaystyle  \int_X (f+g)\ d\mu \leq \int_X f\ d\mu + \int_X g\ d\mu

We establish this in stages. We first deal with the case when {\mu} is a finite measure (which means that {\mu(X) < \infty}) and {f, g} are bounded. Pick an {\epsilon > 0}, and let {f_\epsilon} be {f} rounded down to the nearest integer multiple of {\epsilon}, and {f^\epsilon} be {f} rounded up to the nearest integer multiple. Clearly, we have the pointwise bounds

\displaystyle  f_\epsilon(x) \leq f(x) \leq f^\epsilon(x)

and

\displaystyle  f^\epsilon(x)-f_\epsilon(x) \leq \epsilon.

Since {f} is bounded, {f_\epsilon} and {f^\epsilon} are simple. Similarly define {g_\epsilon, g^\epsilon}. We then have the pointwise bound

\displaystyle f+g \leq f^\epsilon + g^\epsilon \leq f_\epsilon + g_\epsilon + 2\epsilon,

hence by Exercise 34 and the properties of the simple integral,

\displaystyle  \int_X f+g\ d\mu \leq \int_X f_\epsilon + g_\epsilon + 2\epsilon\ d\mu

\displaystyle  = \hbox{Simp} \int_X f_\epsilon + g_\epsilon + 2\epsilon\ d\mu

\displaystyle  = \hbox{Simp} \int_X f_\epsilon\ d\mu + \hbox{Simp} \int_X g_\epsilon\ d\mu + 2 \epsilon \mu(X).

From (1) we conclude that

\displaystyle  \int_X f+g\ d\mu \leq \int_X f\ d\mu + \int_X g\ d\mu + 2 \epsilon \mu(X).

Letting {\epsilon \rightarrow 0} and using the assumption that {\mu(X)} is finite, we obtain the claim.

Now we continue to assume that {\mu} is a finite measure, but now do not assume that {f, g} are bounded. Then for any natural number {n}, we can use the previous case to deduce that

\displaystyle  \int_X \min(f,n) + \min(g,n)\ d\mu \leq \int_X \min(f,n)\ d\mu + \int_X \min(g,n)\ d\mu.

Since {\min(f+g,n) \leq \min(f,n) + \min(g,n)}, we conclude that

\displaystyle  \int_X \min(f+g,n)\ d\mu \leq \int_X \min(f,n)\ d\mu + \int_X \min(g,n)\ d\mu.

Taking limits as {n \rightarrow \infty} using vertical truncation, we obtain the claim.

Finally, we no longer assume that {\mu} is of finite measure, and also do not require {f, g} to be bounded. If either {\int_X f\ d\mu} or {\int_X g\ d\mu} is infinite, then by monotonicity, {\int_X f+g\ d\mu} is infinite as well, and the claim follows; so we may assume that {\int_X f\ d\mu} and {\int_X g\ d\mu} are both finite. By Markov’s inequality, we conclude that for each natural number {n}, the set {E_n := \{ x \in X: f(x) > \frac{1}{n} \} \cup \{ x \in X: g(x) > \frac{1}{n} \}} has finite measure. These sets are increasing in {n}, and {f, g, f+g} are supported on {\bigcup_{n=1}^\infty E_n}, and so by horizontal truncation

\displaystyle  \int_X (f+g)\ d\mu = \lim_{n \rightarrow \infty} \int_X (f+g) 1_{E_n}\ d\mu.

From the previous case, we have

\displaystyle  \int_X (f+g) 1_{E_n}\ d\mu \leq \int_X f 1_{E_n}\ d\mu + \int_X g 1_{E_n}\ d\mu.

Letting {n \rightarrow \infty} and using horizontal truncation we obtain the claim. \Box
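The rounding step in the first stage of the proof above can be made explicit. The sketch below takes {f_\epsilon = \epsilon \lfloor f/\epsilon \rfloor} and {f^\epsilon = \epsilon \lceil f/\epsilon \rceil} as one natural reading of "rounded down" and "rounded up" (this formula is an assumption, as are the helper names), and checks the two pointwise bounds on a few sample values.

```python
from fractions import Fraction
import math

def round_down(value, eps):
    """Largest integer multiple of eps that is <= value."""
    return eps * math.floor(value / eps)

def round_up(value, eps):
    """Smallest integer multiple of eps that is >= value."""
    return eps * math.ceil(value / eps)

eps = Fraction(1, 10)
samples = [Fraction(0), Fraction(1, 3), Fraction(7, 10), Fraction(22, 7)]

# Triples (f_eps, f, f^eps) at each sample value of f.
bounds = [(round_down(v, eps), v, round_up(v, eps)) for v in samples]
```

Since a bounded {f} takes values in a bounded interval, only finitely many multiples of {\epsilon} can occur, which is why the rounded functions are simple.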

Exercise 35 (Linearity in {\mu}) Let {(X, {\mathcal B}, \mu)} be a measure space, and let {f: X \rightarrow [0,+\infty]} be measurable.

  1. Show that {\int_X f\ d(c\mu) = c \times \int_X f\ d\mu} for every {c \in [0,+\infty]}.
  2. If {\mu_1, \mu_2, \ldots} are a sequence of measures on {{\mathcal B}}, show that

    \displaystyle  \int_X f\ d\sum_{n=1}^\infty \mu_n = \sum_{n=1}^\infty \int_X f\ d\mu_n.

Exercise 36 (Change of variables formula) Let {(X, {\mathcal B}, \mu)} be a measure space, and let {\phi: X \rightarrow Y} be a measurable morphism (as defined in Remark 8) from {(X, {\mathcal B})} to another measurable space {(Y, {\mathcal C})}. Define the pushforward {\phi_* \mu: {\mathcal C} \rightarrow [0,+\infty]} of {\mu} by {\phi} by the formula {\phi_* \mu(E) := \mu(\phi^{-1}(E))}.

  1. Show that {\phi_* \mu} is a measure on {{\mathcal C}}, so that {(Y, {\mathcal C}, \phi_* \mu)} is a measure space.
  2. If {f: Y \rightarrow [0,+\infty]} is measurable, show that {\int_Y f\ d\phi_* \mu = \int_X (f \circ \phi)\ d\mu}.

(Hint: the quickest proof here is via the monotone convergence theorem below, but it is also possible to prove the exercise without this theorem.)
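On finite sets with point masses, the pushforward and the change of variables formula of Exercise 36 reduce to finite sums, which the following Python sketch computes. The map `phi`, the masses `mass_X`, and the function `f` are hypothetical data, and the integrals are assumed to reduce to weighted sums in this discrete setting.

```python
from fractions import Fraction

mass_X = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6), 4: Fraction(2)}
phi = {1: "u", 2: "v", 3: "u", 4: "w"}   # hypothetical map phi: X -> Y
f = {"u": Fraction(3), "v": Fraction(5), "w": Fraction(1, 2)}  # unsigned f on Y

# Pushforward point masses: (phi_* mu)({y}) = mu(phi^{-1}({y})).
mass_Y = {}
for x, m in mass_X.items():
    mass_Y[phi[x]] = mass_Y.get(phi[x], Fraction(0)) + m

lhs = sum(f[y] * mass_Y[y] for y in mass_Y)        # integral of f d(phi_* mu)
rhs = sum(f[phi[x]] * mass_X[x] for x in mass_X)   # integral of (f o phi) d(mu)
```

Both sums equal {14/3} here; the two expressions differ only in whether the points of {X} are grouped by their image under {\phi} before or after multiplying by {f}.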

Exercise 37 Let {T: {\bf R}^d \rightarrow {\bf R}^d} be an invertible linear transformation, and let {m} be Lebesgue measure on {{\bf R}^d}. Show that {T_* m = \frac{1}{|\det T|} m}, where the pushforward {T_* m} of {m} was defined in Exercise 36.

Exercise 38 (Sums as integrals) Let {X} be an arbitrary set (with the discrete {\sigma}-algebra), let {\#} be counting measure (see Exercise 13), and let {f: X \rightarrow [0,+\infty]} be an arbitrary unsigned function. Show that {f} is measurable with

\displaystyle  \int_X f\ d\# = \sum_{x \in X} f(x).

Once one has the unsigned integral, one can define the absolutely convergent integral exactly as in the Lebesgue case:

Definition 13 (Absolutely convergent integral) Let {(X,{\mathcal B},\mu)} be a measure space. A measurable function {f: X \rightarrow {\bf C}} is said to be absolutely integrable if the unsigned integral

\displaystyle  \|f\|_{L^1(X,{\mathcal B},\mu)} := \int_{X} |f|\ d\mu

is finite, and we use {L^1(X,{\mathcal B},\mu)}, {L^1(X)}, or {L^1(\mu)} to denote the space of absolutely integrable functions. If {f} is real-valued and absolutely integrable, we define the integral {\int_X f\ d\mu} by the formula

\displaystyle  \int_X f\ d\mu := \int_X f_+\ d\mu - \int_X f_-\ d\mu

where {f_+ := \max(f,0)}, {f_- := \max(-f,0)} are the magnitudes of the positive and negative components of {f}. If {f} is complex-valued and absolutely integrable, we define the integral {\int_X f\ d\mu} by the formula

\displaystyle  \int_X f\ d\mu := \int_X \hbox{Re} f\ d\mu + i \int_X \hbox{Im} f\ d\mu

where the two integrals on the right are interpreted as real-valued integrals. It is easy to see that the unsigned, real-valued, and complex-valued integrals defined in this manner are compatible on their common domains of definition.

Clearly, this definition generalises the corresponding definition in Definition 13 of Notes 2.

We record some of the key facts about the absolutely convergent integral:

Exercise 39 Let {(X,{\mathcal B},\mu)} be a measure space.

  1. Show that {L^1(X,{\mathcal B},\mu)} is a complex vector space.
  2. Show that the integration map {f \mapsto \int_X f\ d\mu} is a complex-linear map from {L^1(X,{\mathcal B},\mu)} to {{\bf C}}.
  3. Establish the triangle inequality {\|f+g\|_{L^1(\mu)} \leq \|f\|_{L^1(\mu)} + \|g\|_{L^1(\mu)}} and the homogeneity property {\|cf\|_{L^1(\mu)} = |c| \|f\|_{L^1(\mu)}} for all {f,g \in L^1(X,{\mathcal B},\mu)} and {c \in {\bf C}}.
  4. Show that if {f, g \in L^1(X,{\mathcal B},\mu)} are such that {f(x)=g(x)} for {\mu}-almost every {x \in X}, then {\int_X f\ d\mu = \int_X g\ d\mu}.
  5. If {f \in L^1(X,{\mathcal B},\mu)}, and {(X,{\mathcal B}',\mu')} is a refinement of {(X, {\mathcal B}, \mu)}, then {f \in L^1(X,{\mathcal B}', \mu')}, and {\int_X f\ d\mu' = \int_X f\ d\mu}. (Hint: it is easy to get one inequality. To get the other inequality, first work in the case when {f} is both bounded and has finite measure support (i.e. is both vertically and horizontally truncated).)
  6. Show that if {f \in L^1(X,{\mathcal B},\mu)}, then {\|f\|_{L^1(\mu)}=0} if and only if {f} is zero {\mu}-almost everywhere.
  7. If {Y \subset X} is {{\mathcal B}}-measurable and {f \in L^1(X,{\mathcal B},\mu)}, then {f\downharpoonright_Y \in L^1(Y, {\mathcal B}\downharpoonright_Y, \mu\downharpoonright_Y)} and {\int_Y f\downharpoonright_Y\ d\mu\downharpoonright_Y = \int_X f 1_Y\ d\mu}. As before, by abuse of notation we write {\int_Y f\ d\mu} for {\int_Y f\downharpoonright_Y\ d\mu\downharpoonright_Y}.

— 5. The convergence theorems —

Let {(X,{\mathcal B},\mu)} be a measure space, and let {f_1, f_2, \ldots: X \rightarrow [0,+\infty]} be a sequence of measurable functions. Suppose that as {n \rightarrow \infty}, {f_n(x)} converges pointwise either everywhere, or {\mu}-almost everywhere, to a measurable limit {f}. A basic question in the subject is to determine the conditions under which such pointwise convergence would imply convergence of the integral:

\displaystyle  \int_X f_n\ d\mu \stackrel{?}{\rightarrow} \int_X f\ d\mu.

To put it another way: when can we ensure that one can interchange integrals and limits,

\displaystyle  \lim_{n \rightarrow \infty} \int_X f_n\ d\mu \stackrel{?}{=} \int_X \lim_{n \rightarrow \infty} f_n\ d\mu?

There are certainly some cases in which one can safely do this:

Exercise 40 (Uniform convergence on a finite measure space) Suppose that {(X,{\mathcal B},\mu)} is a finite measure space (so {\mu(X)<\infty}), and {f_n: X \rightarrow [0,+\infty]} (resp. {f_n: X \rightarrow {\bf C}}) are a sequence of unsigned measurable functions (resp. absolutely integrable functions) that converge uniformly to a limit {f}. Show that {\int_X f_n\ d\mu} converges to {\int_X f\ d\mu}.

However, there are also cases in which one cannot interchange limits and integrals, even when the {f_n} are unsigned. We give the three classic examples, all of “moving bump” type, though the way in which the bump moves varies from example to example:

Example 17 (Escape to horizontal infinity) Let {X} be the real line with Lebesgue measure, and let {f_n := 1_{[n,n+1]}}. Then {f_n} converges pointwise to {f := 0}, but {\int_{\bf R} f_n(x)\ dx = 1} does not converge to {\int_{\bf R} f(x)\ dx = 0}. Somehow, all the mass in the {f_n} has escaped by moving off to infinity in a horizontal direction, leaving none behind for the pointwise limit {f}.

Example 18 (Escape to width infinity) Let {X} be the real line with Lebesgue measure, and let {f_n := \frac{1}{n} 1_{[0,n]}}. Then {f_n} now converges uniformly to {f := 0}, but {\int_{\bf R} f_n(x)\ dx = 1} still does not converge to {\int_{\bf R} f(x)\ dx = 0}. Exercise 40 would prevent this from happening if all the {f_n} were supported in a single set of finite measure, but the increasingly wide nature of the support of the {f_n} prevents this from happening.

Example 19 (Escape to vertical infinity) Let {X} be the unit interval {[0,1]} with Lebesgue measure (restricted from {{\bf R}}), and let {f_n := n 1_{[\frac{1}{n},\frac{2}{n}]}}. Now, we have finite measure, and {f_n} converges pointwise to {f := 0}, but no uniform convergence. And again, {\int_{[0,1]} f_n(x)\ dx=1} is not converging to {\int_{[0,1]} f(x)\ dx = 0}. This time, the mass has escaped vertically, through the increasingly large values of {f_n}.
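The three moving bump examples can be tabulated exactly, since each {f_n} is a constant multiple of the indicator of an interval. The Python sketch below is only an illustration: the encoding of a bump as a triple `(height, left, right)` and the function names are assumptions made for this example, and the integral is computed as height times length.

```python
from fractions import Fraction

def bump(kind, n):
    """Return (height, left, right) for f_n = height * 1_[left, right]."""
    n = Fraction(n)
    if kind == "horizontal":   # f_n = 1_[n, n+1]
        return Fraction(1), n, n + 1
    if kind == "width":        # f_n = (1/n) 1_[0, n]
        return 1 / n, Fraction(0), n
    if kind == "vertical":     # f_n = n 1_[1/n, 2/n]; lies inside [0,1] once n >= 2
        return n, 1 / n, 2 / n
    raise ValueError(kind)

def integral(kind, n):
    """Integral of f_n over the real line: height times interval length."""
    h, a, b = bump(kind, n)
    return h * (b - a)

def value(kind, n, x):
    """Pointwise value f_n(x)."""
    h, a, b = bump(kind, n)
    return h if a <= x <= b else Fraction(0)
```

For instance, `integral(kind, n)` equals 1 for every `kind` and every `n >= 2`, while `value(kind, 100, Fraction(1, 2))` is `0`, `1/100`, and `0` for the horizontal, width, and vertical bumps respectively.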

Remark 10 From the perspective of time-frequency analysis (or perhaps more accurately, space-frequency analysis), these three escapes are analogous (though not quite identical) to escape to spatial infinity, escape to zero frequency, and escape to infinite frequency respectively, thus describing the three different ways in which phase space fails to be compact (if one excises the zero frequency as being singular).

However, once one shuts down these avenues of escape to infinity, it turns out that one can recover convergence of the integral. There are two major ways to accomplish this. One is to enforce monotonicity, which prevents each {f_n} from abandoning the location where the mass of the preceding {f_1,\ldots,f_{n-1}} was concentrated and which thus shuts down the above three escape scenarios. More precisely, we have the monotone convergence theorem:

Theorem 14 (Monotone convergence theorem) Let {(X,{\mathcal B},\mu)} be a measure space, and let {0 \leq f_1 \leq f_2 \leq \ldots} be a monotone non-decreasing sequence of unsigned measurable functions on {X}. Then we have

\displaystyle  \lim_{n \rightarrow \infty} \int_X f_n\ d\mu = \int_X \lim_{n \rightarrow \infty} f_n\ d\mu.

Note that in the special case when each {f_n} is an indicator function {f_n = 1_{E_n}}, this theorem collapses to the upwards monotone convergence property (Exercise 23.2). Conversely, the upwards monotone convergence property will play a key role in the proof of this theorem.

Proof: Write {f := \lim_{n \rightarrow \infty} f_n = \sup_{n} f_n}, then {f: X \rightarrow [0,+\infty]} is measurable. Since the {f_n} are non-decreasing to {f}, we see from monotonicity that {\int_X f_n\ d\mu} are non-decreasing and bounded above by {\int_X f\ d\mu}, which gives the bound

\displaystyle  \lim_{n \rightarrow \infty} \int_X f_n\ d\mu \leq \int_X f\ d\mu.

It remains to establish the reverse inequality

\displaystyle  \int_X f\ d\mu \leq \lim_{n \rightarrow \infty} \int_X f_n\ d\mu.

By definition, it suffices to show that

\displaystyle  \int_X g\ d\mu \leq \lim_{n \rightarrow \infty} \int_X f_n\ d\mu.

whenever {g} is a simple function that is bounded pointwise by {f}. By vertical truncation we may assume without loss of generality that {g} is also finite everywhere; we can then write

\displaystyle  g = \sum_{i=1}^k c_i 1_{A_i}

for some {0 \leq c_i < \infty} and some disjoint {{\mathcal B}}-measurable sets {A_1,\ldots,A_k}, thus

\displaystyle  \int_X g\ d\mu = \sum_{i=1}^k c_i \mu(A_i).

Let {0 < \epsilon < 1} be arbitrary. Then we have

\displaystyle  f(x) = \sup_n f_n(x) > (1-\epsilon) c_i

for all {x \in A_i}. Thus, if we define the sets

\displaystyle  A_{i,n} :=\{ x \in A_i: f_n(x) > (1-\epsilon) c_i \}

then the {A_{i,n}} increase to {A_i} and are measurable. By upwards monotonicity of measure, we conclude that

\displaystyle  \lim_{n \rightarrow \infty} \mu(A_{i,n}) = \mu(A_i).

On the other hand, observe the pointwise bound

\displaystyle  f_n \geq \sum_{i=1}^k (1-\epsilon) c_i 1_{A_{i,n}}

for any {n}; integrating this, we obtain

\displaystyle  \int_X f_n\ d\mu \geq (1-\epsilon) \sum_{i=1}^k c_i \mu(A_{i,n}).

Taking limits as {n \rightarrow \infty}, we obtain

\displaystyle  \lim_{n \rightarrow \infty} \int_X f_n\ d\mu \geq (1-\epsilon) \sum_{i=1}^k c_i \mu(A_i);

sending {\epsilon \rightarrow 0} we then obtain the claim. \Box

Remark 11 It is easy to see that the result still holds if the monotonicity {f_n \leq f_{n+1}} only holds almost everywhere rather than everywhere.

This has a number of important corollaries. Firstly, we can generalise (part of) Tonelli’s theorem for exchanging sums (see Theorem 2 of Notes 1):

Corollary 15 (Tonelli’s theorem for sums and integrals) Let {(X,{\mathcal B},\mu)} be a measure space, and let {f_1,f_2,\ldots: X \rightarrow [0,+\infty]} be a sequence of unsigned measurable functions. Then one has

\displaystyle  \int_X \sum_{n=1}^\infty f_n\ d\mu = \sum_{n=1}^\infty \int_X f_n\ d\mu.

Proof: Apply the monotone convergence theorem to the partial sums {F_N := \sum_{n=1}^N f_n}. \Box

Exercise 41 Give an example to show that this corollary can fail if the {f_n} are assumed to be absolutely integrable rather than unsigned measurable, even if the sum {\sum_{n=1}^\infty f_n(x)} is absolutely convergent for each {x}. (Hint: think about the three escapes to infinity.)

Exercise 42 (Borel-Cantelli lemma) Let {(X, {\mathcal B},\mu)} be a measure space, and let {E_1, E_2, E_3, \ldots} be a sequence of {{\mathcal B}}-measurable sets such that {\sum_{n=1}^\infty \mu(E_n) < \infty}. Show that almost every {x \in X} is contained in at most finitely many of the {E_n} (i.e. {\{ n \in {\bf N}: x \in E_n \}} is finite for almost every {x \in X}). (Hint: Apply Tonelli’s theorem to the indicator functions {1_{E_n}}.)

Exercise 43

  1. Give an alternate proof of the Borel-Cantelli lemma (Exercise 42) that does not go through any of the convergence theorems, but instead exploits the more basic properties of measure from Exercise 23.
  2. Give a counterexample that shows that the Borel-Cantelli lemma can fail if the condition {\sum_{n=1}^\infty \mu(E_n) < \infty} is relaxed to {\lim_{n \rightarrow \infty} \mu(E_n) = 0}.

Secondly, when one does not have monotonicity, one can at least obtain an important inequality, known as Fatou’s lemma:

Corollary 16 (Fatou’s lemma) Let {(X,{\mathcal B},\mu)} be a measure space, and let {f_1,f_2,\ldots: X \rightarrow [0,+\infty]} be a sequence of unsigned measurable functions. Then

\displaystyle  \int_X \liminf_{n \rightarrow \infty} f_n\ d\mu \leq \liminf_{n \rightarrow \infty} \int_X f_n\ d\mu.

Proof: Write {F_N := \inf_{n \geq N} f_n} for each {N}. Then the {F_N} are measurable and non-decreasing, and hence by the monotone convergence theorem

\displaystyle  \int_X \sup_{N>0} F_N\ d\mu = \sup_{N>0} \int_X F_N\ d\mu.

By definition of lim inf, we have {\sup_{N>0} F_N = \liminf_{n \rightarrow \infty} f_n}. By monotonicity, we have {\int_X F_N\ d\mu \leq \int_X f_n\ d\mu} for all {n \geq N}, and thus

\displaystyle  \int_X F_N\ d\mu \leq \inf_{n \geq N} \int_X f_n\ d\mu.

Hence we have

\displaystyle  \int_X \liminf_{n \rightarrow \infty} f_n\ d\mu \leq \sup_{N>0} \inf_{n \geq N} \int_X f_n\ d\mu.

The claim then follows by another appeal to the definition of lim inf. \Box

Remark 12 Informally, Fatou’s lemma tells us that when taking the pointwise limit of unsigned functions {f_n}, the mass {\int_X f_n\ d\mu} can be destroyed in the limit (as was the case in the three key moving bump examples), but it cannot be created in the limit. Of course the unsigned hypothesis is necessary here (consider for instance multiplying any of the moving bump examples by {-1}). While this lemma was stated only for pointwise limits, the same general principle (that mass can be destroyed, but not created, by the process of taking limits) tends to hold for other “weak” notions of convergence. We will see some instances of this in 245B.
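
To see the strict inequality in Fatou’s lemma concretely, here is a small sketch using a bump that escapes to vertical infinity. The specific formula {f_n := n 1_{[1/n,2/n]}} on {{\bf R}} with Lebesgue measure is an assumption made for this illustration. Each {f_n} has mass exactly {1}, yet at every fixed point {x} the values {f_n(x)} are eventually zero, so the left-hand side of Fatou’s lemma is {0} while the right-hand side is {1}.

```python
from fractions import Fraction

def f(n, x):
    """f_n(x) = n on [1/n, 2/n] and 0 elsewhere (assumed moving bump)."""
    return n if Fraction(1, n) <= x <= Fraction(2, n) else 0

def integral_f(n):
    """Lebesgue integral of f_n, computed exactly as height n times width 1/n."""
    return n * (Fraction(2, n) - Fraction(1, n))

def eventually_zero(x, n_max=500):
    """Check that f_n(x) = 0 for every n with 2/x < n <= n_max (x > 0)."""
    return all(f(n, x) == 0 for n in range(1, n_max + 1) if n > 2 / x)

sample_points = [Fraction(1, 100), Fraction(1, 7), Fraction(1, 2), Fraction(3)]
masses = [integral_f(n) for n in range(1, 501)]
print(set(masses), [eventually_zero(x) for x in sample_points])
```

The check is only run at finitely many sample points and indices, so it illustrates rather than proves the pointwise convergence; the exact argument is that {f_n(x) = 0} as soon as {2/n < x}.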

Finally, we give the other major way to shut down loss of mass via escape to infinity, which is to dominate all of the functions involved by an absolutely convergent one. This result is known as the dominated convergence theorem:

Theorem 17 (Dominated convergence theorem) Let {(X,{\mathcal B},\mu)} be a measure space, and let {f_1,f_2,\ldots: X \rightarrow {\bf C}} be a sequence of measurable functions that converge pointwise {\mu}-almost everywhere to a measurable limit {f: X \rightarrow {\bf C}}. Suppose that there is an unsigned absolutely integrable function {G: X \rightarrow [0,+\infty]} such that {|f_n|} are pointwise {\mu}-almost everywhere bounded by {G} for each {n}. Then we have

\displaystyle  \lim_{n \rightarrow \infty} \int_X f_n\ d\mu = \int_X f\ d\mu.

From the moving bump examples we see that this statement fails if there is no absolutely integrable dominating function {G}. The reader is encouraged to see why, in each of the moving bump examples, no such dominating function exists, without appealing to the above theorem. Note also that when each of the {f_n} is an indicator function {f_n = 1_{E_n}}, the dominated convergence theorem collapses to Exercise 24.

Proof: By modifying {f_n, f} on a null set, we may assume without loss of generality that the {f_n} converge to {f} pointwise everywhere rather than {\mu}-almost everywhere, and similarly we can assume that the {|f_n|} are bounded by {G} pointwise everywhere rather than {\mu}-almost everywhere.

By taking real and imaginary parts we may assume without loss of generality that {f_n, f} are real, thus {-G \leq f_n \leq G} pointwise. Of course, this implies that {-G \leq f \leq G} pointwise also.

If we apply Fatou’s lemma to the unsigned functions {f_n+G}, we see that

\displaystyle  \int_X f+G\ d\mu \leq \liminf_{n \rightarrow \infty} \int_X f_n + G\ d\mu,

which on subtracting the finite quantity {\int_X G\ d\mu} gives

\displaystyle  \int_X f\ d\mu \leq \liminf_{n \rightarrow \infty} \int_X f_n\ d\mu.

Similarly, if we apply that lemma to the unsigned functions {G-f_n}, we obtain

\displaystyle  \int_X G-f\ d\mu \leq \liminf_{n \rightarrow \infty} \int_X G - f_n\ d\mu;

negating this inequality and then cancelling {\int_X G\ d\mu} again we conclude that

\displaystyle  \limsup_{n \rightarrow \infty} \int_X f_n\ d\mu \leq \int_X f\ d\mu.

The claim then follows by combining these inequalities. \Box
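
As a simple sanity check of the theorem (a sketch under assumptions chosen here, not an example from the notes), take {X = [0,1]} with Lebesgue measure and {f_n(x) := x^n}. These functions are dominated by the absolutely integrable function {G := 1} and converge to zero for every {x \in [0,1)}, hence almost everywhere, so the theorem predicts {\int_X f_n\ d\mu \rightarrow 0}; indeed {\int_0^1 x^n\ dx = 1/(n+1)}. The code approximates the integrals by a midpoint rule, which is a numerical stand-in for the Lebesgue integral.

```python
def midpoint_integral(g, a=0.0, b=1.0, cells=20_000):
    """Midpoint-rule approximation to the integral of g over [a, b]."""
    h = (b - a) / cells
    return h * sum(g(a + (k + 0.5) * h) for k in range(cells))

def f(n):
    """f_n(x) = x**n, dominated on [0, 1] by the integrable function G = 1."""
    return lambda x: x ** n

# Compare the numerical integrals with the exact values 1/(n+1).
integrals = {n: midpoint_integral(f(n)) for n in (1, 5, 25, 125)}
for n, value in integrals.items():
    print(n, value, 1 / (n + 1))
```

The printed values decrease towards {0}, which is the integral of the almost everywhere limit {f = 0}.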

Remark 13 We deduced the dominated convergence theorem from Fatou’s lemma, and Fatou’s lemma from the monotone convergence theorem. However, one can obtain these theorems in a different order, depending on one’s taste, as they are so closely related. For instance, in Stein-Shakarchi, the logic is somewhat different; one first obtains the slightly simpler bounded convergence theorem, which is the dominated convergence theorem under the assumption that the functions are uniformly bounded and all supported on a single set of finite measure, and then uses that to deduce Fatou’s lemma, which in turn is used to deduce the monotone convergence theorem; and then the horizontal and vertical truncation properties are used to extend the bounded convergence theorem to the dominated convergence theorem. It is instructive to view a couple of different derivations of these key theorems to get more of an intuitive understanding as to how they work.

Exercise 44 Under the hypotheses of the dominated convergence theorem, establish also that {\|f_n - f \|_{L^1} \rightarrow 0} as {n \rightarrow \infty}.

Exercise 45 (Almost dominated convergence) Let {(X,{\mathcal B},\mu)} be a measure space, and let {f_1,f_2,\ldots: X \rightarrow {\bf C}} be a sequence of measurable functions that converge pointwise {\mu}-almost everywhere to a measurable limit {f: X \rightarrow {\bf C}}. Suppose that there are unsigned absolutely integrable functions {G, g_1, g_2, \ldots: X \rightarrow [0,+\infty]} such that the {|f_n|} are pointwise {\mu}-almost everywhere bounded by {G + g_n}, and that {\int_X g_n\ d\mu \rightarrow 0} as {n \rightarrow \infty}. Show that

\displaystyle  \lim_{n \rightarrow \infty} \int_X f_n\ d\mu = \int_X f\ d\mu.

Exercise 46 (Defect version of Fatou’s lemma) Let {(X,{\mathcal B},\mu)} be a measure space, and let {f_1,f_2,\ldots: X \rightarrow [0,+\infty]} be a sequence of unsigned absolutely integrable functions that converges pointwise to an absolutely integrable limit {f}. Show that

\displaystyle  \int_X f_n\ d\mu - \int_X f\ d\mu - \|f-f_n\|_{L^1(\mu)} \rightarrow 0

as {n \rightarrow \infty}. (Hint: Apply the dominated convergence theorem to {\min(f_n,f)}.) Informally, this tells us that the gap between the left and right hand sides of Fatou’s lemma can be measured by the quantity {\|f-f_n\|_{L^1(\mu)}}.

Exercise 47 Let {(X, {\mathcal B}, \mu)} be a measure space, and let {g: X \rightarrow [0,+\infty]} be measurable. Show that the function {\mu_g: {\mathcal B} \rightarrow [0,+\infty]} defined by the formula

\displaystyle  \mu_g(E) := \int_X 1_E g\ d\mu = \int_E g\ d\mu

is a measure. (We will study such measures in greater detail in 245B.)

The monotone convergence theorem is, in some sense, a defining property of the unsigned integral, as the following exercise illustrates.

Exercise 48 (Characterisation of the unsigned integral) Let {(X,{\mathcal B})} be a measurable space, and let {I: f \mapsto I(f)} be a map from the space {{\mathcal U}(X,{\mathcal B})} of unsigned measurable functions {f: X \rightarrow [0,+\infty]} to {[0,+\infty]} that obeys the following axioms:

  1. (Homogeneity) For every {f \in {\mathcal U}(X,{\mathcal B})} and {c \in [0,+\infty]}, one has {I(cf) = cI(f)}.
  2. (Finite additivity) For every {f, g \in {\mathcal U}(X,{\mathcal B})}, one has {I(f+g)=I(f)+I(g)}.
  3. (Monotone convergence) If {0 \leq f_1 \leq f_2 \leq \ldots} is a non-decreasing sequence of unsigned measurable functions, then {I( \lim_{n \rightarrow \infty} f_n ) = \lim_{n \rightarrow \infty} I(f_n)}.

Then there exists a unique measure {\mu} on {(X,{\mathcal B})} such that {I(f) = \int_X f\ d\mu} for all {f \in {\mathcal U}(X,{\mathcal B})}. Furthermore, {\mu} is given by the formula {\mu(E) := I(1_E)} for all {{\mathcal B}}-measurable sets {E}.

Exercise 49 Let {(X, {\mathcal B}, \mu)} be a finite measure space (i.e. {\mu(X) < \infty}), and let {f: X \rightarrow {\bf R}} be a bounded function. Suppose that {\mu} is complete, which means that every sub-null set is a null set. Suppose that the upper integral

\displaystyle  \overline{\int}_X f\ d\mu := \inf_{g \geq f; g \hbox{ simple}} \int_X g\ d\mu

and lower integral

\displaystyle  \underline{\int}_X f\ d\mu := \sup_{h \leq f; h \hbox{ simple}} \int_X h\ d\mu

agree. Show that {f} is measurable. (This is a converse to Exercise 11 of Notes 2.)

We will continue to see the monotone convergence theorem, Fatou’s lemma, and the dominated convergence theorem make an appearance throughout the rest of this course sequence.

— 6. Probability spaces (optional) —

We now pause to isolate a special type of measure space, namely a probability space. As the name suggests, these spaces are of fundamental importance in the foundations of probability, although it should be emphasised that probability theory should not be viewed as the study of probability spaces, as these are merely models for the true objects of study of that theory, namely the behaviour of random events and random variables. (See this post for further discussion of this point.) This course will not be focused on applications to probability theory, but other courses (such as the Math 275 sequence at UCLA) will certainly be taking several results from measure theory (e.g. the Borel-Cantelli lemma, Exercise 42) and transferring them to a probabilistic context in order to apply them to problems of interest in probability theory.

Definition 18 (Probability space) A probability space is a measure space {(\Omega, {\mathcal F}, {\bf P})} of total measure {1}: {{\bf P}(\Omega) = 1}. The measure {{\bf P}} is known as a probability measure.

Note the change of notation: whereas measure spaces are traditionally denoted by symbols such as {(X, {\mathcal B}, \mu)}, probability spaces are traditionally denoted by symbols such as {(\Omega, {\mathcal F}, {\bf P})}. Of course, such notational changes have no impact on the underlying mathematical formalism, but they reflect the different cultures of measure theory and probability theory. In particular, the various components {\Omega}, {{\mathcal F}}, {{\bf P}} carry the following interpretations in probability theory, that are absent in other applications of measure theory:

  • The space {\Omega} is known as the sample space, and is interpreted as the set of all possible states {\omega \in \Omega} that a random system could be in.
  • The {\sigma}-algebra {{\mathcal F}} is known as the event space, and is interpreted as the set of all possible events {E \in {\mathcal F}} that one can measure.
  • The measure {{\bf P}(E)} of an event is known as the probability of that event.

The various axioms of a probability space then formalise the foundational axioms of probability, as set out by Kolmogorov.

Example 20 (Normalised measure) Given any measure space {(X,{\mathcal B},\mu)} with {0 < \mu(X) < +\infty}, the space {(X, {\mathcal B}, \frac{1}{\mu(X)} \mu)} is a probability space. For instance, if {\Omega} is a non-empty finite set with the discrete {\sigma}-algebra {2^\Omega} and the counting measure {\#}, then the normalised counting measure {\frac{1}{\# \Omega} \#} is a probability measure (known as the (discrete) uniform probability measure on {\Omega}), and {(\Omega, 2^\Omega, \frac{1}{\# \Omega} \#)} is a probability space. In probability theory, this probability space models the act of drawing an element of the discrete set {\Omega} uniformly at random.

Similarly, if {\Omega \subset {\bf R}^d} is a Lebesgue measurable set of positive finite Lebesgue measure, {0 < m(\Omega) < \infty}, then {(\Omega, {\mathcal L}[{\bf R}^d]\downharpoonright_\Omega, \frac{1}{m(\Omega)} m\downharpoonright_\Omega)} is a probability space. The probability measure {\frac{1}{m(\Omega)} m\downharpoonright_\Omega} is known as the (continuous) uniform probability measure on {\Omega}. In probability theory, this probability space models the act of drawing an element of the continuous set {\Omega} uniformly at random.

Example 21 (Discrete and continuous probability measures) If {\Omega} is a (possibly infinite) non-empty set with the discrete {\sigma}-algebra {2^\Omega}, and if {(p_\omega)_{\omega \in \Omega}} is a collection of real numbers in {[0,1]} with {\sum_{\omega \in \Omega} p_\omega = 1}, then the measure {{\bf P}} defined by {{\bf P} := \sum_{\omega \in \Omega} p_\omega \delta_\omega}, or in other words

\displaystyle  {\bf P}(E) := \sum_{\omega \in E} p_\omega,

is indeed a probability measure, and {(\Omega, 2^\Omega, {\bf P})} is a probability space. The function {\omega \mapsto p_\omega} is known as the (discrete) probability distribution of the state variable {\omega}.
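
The discrete construction can be sketched in a few lines of code. The three-point sample space and the weights below are hypothetical, chosen only so that they sum to {1}; exact rational arithmetic is used so that the identities hold without rounding error.

```python
from fractions import Fraction

# Hypothetical weights p_omega on a three-point sample space; they sum to 1.
p = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}

def prob(event):
    """P(E) = sum of p_omega over omega in E."""
    return sum((p[w] for w in event), Fraction(0))

omega = set(p)
E, F = {"a"}, {"b", "c"}   # two disjoint events
print(prob(omega), prob(E), prob(F), prob(E | F))
```

Here {{\bf P}(\Omega) = 1} and {{\bf P}(E \cup F) = {\bf P}(E) + {\bf P}(F)} for the disjoint events {E, F}, which is finite additivity in this small case; countable additivity on an infinite {\Omega} is of course not something a finite computation can verify.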

Similarly, if {\Omega} is a Lebesgue measurable subset of {{\bf R}^d} of positive (and possibly infinite) measure, and {f: \Omega \rightarrow [0,+\infty]} is a Lebesgue measurable function on {\Omega} (where of course we restrict the Lebesgue measure space on {{\bf R}^d} to {\Omega} in the usual fashion) with {\int_\Omega f(x)\ dx = 1}, then {(\Omega, {\mathcal L}[{\bf R}^d]\downharpoonright_\Omega, {\bf P})} is a probability space, where {{\bf P} := m_f} is the measure

\displaystyle  {\bf P}(E) := \int_\Omega 1_E(x) f(x)\ dx = \int_E f(x)\ dx.

The function {f} is known as the (continuous) probability density of the state variable {\omega}. (This density is not quite unique, since one can modify it on a set of probability zero, but it is well-defined up to this ambiguity. We will return to this point in 245B.)
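
For a continuous analogue, here is a rough numerical sketch with the hypothetical density {f(x) := 2x} on {\Omega = [0,1]}, which satisfies {\int_\Omega f(x)\ dx = 1}. The midpoint rule stands in for the Lebesgue integral, and for an interval {[a,b] \subset [0,1]} the exact value is {{\bf P}([a,b]) = b^2 - a^2}.

```python
def density(x):
    """Hypothetical probability density f(x) = 2x on Omega = [0, 1]."""
    return 2.0 * x

def prob_interval(a, b, cells=10_000):
    """Midpoint-rule approximation to P([a, b]) = integral of f over [a, b]."""
    h = (b - a) / cells
    return h * sum(density(a + (k + 0.5) * h) for k in range(cells))

total = prob_interval(0.0, 1.0)
left, right = prob_interval(0.0, 0.5), prob_interval(0.5, 1.0)
print(total, left, right)
```

Up to floating-point error, the total mass is {1} and the two halves of the interval receive probabilities {1/4} and {3/4}, so this measure is far from uniform even though {\Omega} is the same set as in the uniform example.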

Exercise 50 (No translation-invariant random integer) Show that there is no probability measure {{\bf P}} on the integers {{\bf Z}} with the discrete {\sigma}-algebra {2^{\bf Z}} with the translation-invariance property {{\bf P}(E+n)={\bf P}(E)} for every event {E \in 2^{\bf Z}} and every integer {n}.

Exercise 51 (No translation-invariant random real) Show that there is no probability measure {{\bf P}} on the reals {{\bf R}} with the Lebesgue {\sigma}-algebra {{\mathcal L}[{\bf R}]} with the translation-invariance property {{\bf P}(E+x)={\bf P}(E)} for every event {E \in {\mathcal L}[{\bf R}]} and every real {x}.

Many concepts in measure theory are of importance in probability theory, although the terminology is changed to reflect the different perspective on the subject. For instance, the notion of a property holding almost everywhere is now replaced with that of a property holding almost surely. A measurable function is now referred to as a random variable and is often denoted by symbols such as {X}, and the integral of that function on the probability space (if the random variable is unsigned or absolutely integrable) is known as the expectation of that random variable, and is denoted {{\bf E}(X)}. Thus, for instance, the Borel-Cantelli lemma (Exercise 42) now reads as follows: given any sequence {E_1,E_2,E_3,\ldots} of events such that {\sum_{n=1}^\infty {\bf P}(E_n) < \infty}, it is almost surely true that at most finitely many of these events hold.
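
To make the dictionary between integrals and expectations concrete, here is a minimal sketch on the sample space of a fair six-sided die (an illustrative choice made here) with the uniform probability measure. On a finite space the integral defining {{\bf E}(X)} reduces to a weighted sum.

```python
from fractions import Fraction

omega = range(1, 7)                      # sample space of a fair die (illustrative)
P = {w: Fraction(1, 6) for w in omega}   # uniform probability measure

def expectation(X):
    """E(X) = integral of X against P, which is a finite sum on this space."""
    return sum(X(w) * P[w] for w in omega)

face_value = lambda w: w                     # random variable: the number rolled
is_even = lambda w: 1 if w % 2 == 0 else 0   # indicator of the event "roll is even"
print(expectation(face_value), expectation(is_even))  # prints: 7/2 1/2
```

The second value shows that the expectation of an indicator function {1_E} is the probability {{\bf P}(E)} of the event {E}.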

In later notes, when we develop the machinery of product measures and other tools to construct measures, we will see some more interesting examples of probability spaces, which would correspond in probability theory to random processes that are generated by an infinite number of independent random sources.

The following exercise will be moved to a more suitable location in the published version of the notes, but is here currently so as not to disrupt the exercise numbering.

Exercise 52 (Approximation by an algebra) Let {{\mathcal A}} be a Boolean algebra on {X}, and let {\mu} be a measure on {\langle {\mathcal A} \rangle}.

  1. If {\mu(X) < \infty}, show that for every {E \in \langle {\mathcal A} \rangle} and {\epsilon > 0} there exists {F \in {\mathcal A}} such that {\mu( E \Delta F ) < \epsilon}.
  2. More generally, if {X = \bigcup_{n=1}^\infty A_n} for some {A_1, A_2, \ldots \in {\mathcal A}} with {\mu(A_n) < \infty} for all {n}, {E \in \langle {\mathcal A} \rangle} has finite measure, and {\epsilon > 0}, show that there exists {F \in {\mathcal A}} such that {\mu( E \Delta F ) < \epsilon}.