
Notational convention: As in Notes 2, I will colour a statement red in this post if it assumes the axiom of choice.  We will, of course, rely on every other axiom of Zermelo-Fraenkel set theory here (and in the rest of the course).  $\diamond$

In this course we will often need to iterate some sort of operation “infinitely many times” (e.g. to create an infinite basis by choosing one basis element at a time).  In order to do this rigorously, we will rely on Zorn’s lemma:

Zorn’s Lemma. Let $(X, \leq)$ be a non-empty partially ordered set, with the property that every chain (i.e. a totally ordered set) in X has an upper bound.  Then X contains a maximal element (i.e. an element with no larger element).

Indeed, we have used this lemma several times already in previous notes.  Given the other standard axioms of set theory, this lemma is logically equivalent to

Axiom of choice. Let X be a set, and let ${\mathcal F}$ be a collection of non-empty subsets of X.  Then there exists a choice function $f: {\mathcal F} \to X$, i.e. a function such that $f(A) \in A$ for all $A \in {\mathcal F}$.

One implication is easy:

Proof of axiom of choice using Zorn’s lemma. Define a partial choice function to be a pair $({\mathcal F}', f')$, where ${\mathcal F}'$ is a subset of ${\mathcal F}$ and $f': {\mathcal F}' \to X$ is a choice function for ${\mathcal F'}$.  We can partially order the collection of partial choice functions by writing $({\mathcal F}', f') \leq ({\mathcal F}'', f'')$ if ${\mathcal F}' \subset {\mathcal F}''$ and $f''$ extends $f'$.  The collection of partial choice functions is non-empty (since it contains the pair $(\emptyset, ())$ consisting of the empty set and the empty function), and it is easy to see that any chain of partial choice functions has an upper bound (formed by gluing all the partial choices together).  Hence, by Zorn’s lemma, there is a maximal partial choice function $({\mathcal F}_*, f_*)$.  But the domain ${\mathcal F}_*$ of this function must be all of ${\mathcal F}$, since otherwise one could enlarge ${\mathcal F}_*$ by a single set A and extend $f_*$ to A by choosing a single element of A.  (One does not need the axiom of choice to make a single choice, or finitely many choices; it is only when making infinitely many choices that the axiom becomes necessary.)  The claim follows. $\Box$
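
The gluing step in this proof can be written out explicitly. What follows is only a sketch of one way to do it, not part of the original argument; the index set $I$ and the names ${\mathcal F}_\infty$, $f_\infty$ are notation introduced here for illustration.

```latex
% Upper bound for a non-empty chain (\mathcal{F}_\alpha, f_\alpha)_{\alpha \in I}
% of partial choice functions.  (I, \mathcal{F}_\infty, f_\infty are illustrative names.)
\begin{align*}
  \mathcal{F}_\infty &:= \bigcup_{\alpha \in I} \mathcal{F}_\alpha, \\
  f_\infty(A) &:= f_\alpha(A)
    \quad \text{for any } \alpha \in I \text{ with } A \in \mathcal{F}_\alpha .
\end{align*}
% Well-definedness: if A \in \mathcal{F}_\alpha \cap \mathcal{F}_\beta, then, since the
% family is a chain, one of f_\alpha, f_\beta extends the other, so
\[
  f_\alpha(A) = f_\beta(A) \in A .
\]
% Hence f_\infty is a choice function on \mathcal{F}_\infty extending every f_\alpha, i.e.
\[
  (\mathcal{F}_\alpha, f_\alpha) \leq (\mathcal{F}_\infty, f_\infty)
  \quad \text{for all } \alpha \in I .
\]
```

Note that no choices are made in this construction: the value $f_\infty(A)$ is forced by the chain, which is why the upper bound exists without any appeal to the axiom of choice.  (The empty chain is bounded above by the empty partial choice function.)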

In the rest of these notes I would like to supply the reverse implication, using the machinery of well-ordered sets.  Instead of giving the shortest or slickest proof of Zorn’s lemma here, I would like to take the opportunity to place the lemma in the context of several related topics, such as ordinals and transfinite induction, noting that much of this material is in fact independent of the axiom of choice.  The material here is standard, but for the purposes of this course one may simply take Zorn’s lemma as a “black box” and not worry about the proof, so this material is optional.

When studying a mathematical space X (e.g. a vector space, a topological space, a manifold, a group, an algebraic variety etc.), there are two fundamentally basic ways to try to understand the space:

1. By looking at subobjects in X, or more generally maps $f: Y \to X$ from some other space Y into X.  For instance, a point in a space X can be viewed as a map from $pt$ to X; a curve in a space X could be thought of as a map from ${}[0,1]$ to X; a group G can be studied via its subgroups K, and so forth.
2. By looking at objects on X, or more precisely maps $f: X \to Y$ from X into some other space Y.  For instance, one can study a topological space X via the real- or complex-valued continuous functions $f \in C(X)$ on X; one can study a group G via its quotient groups $\pi: G \to G/H$; one can study an algebraic variety V by studying the polynomials on V (and in particular, the ideal of polynomials that vanish identically on V); and so forth.

(There are also more sophisticated ways to study an object via its maps, e.g. by studying extensions, joinings, splittings, universal lifts, etc.  The general study of objects via the maps between them is formalised abstractly in modern mathematics as category theory, and is also closely related to homological algebra.)

A remarkable phenomenon in many areas of mathematics is that of (contravariant) duality: that the maps into and out of one type of mathematical object X can be naturally associated to the maps out of and into a dual object $X^*$ (note the reversal of arrows here!).  In some cases, the dual object $X^*$ looks quite different from the original object X.  (For instance, in Stone duality, discussed in Notes 4, X would be a Boolean algebra (or some other partially ordered set) and $X^*$ would be a compact totally disconnected Hausdorff space (or some other topological space).)   In other cases, most notably with Hilbert spaces as discussed in Notes 5, the dual object $X^*$ is essentially identical to X itself.

In these notes we discuss a third important case of duality, namely duality of normed vector spaces, which is of an intermediate nature to the previous two examples: the dual $X^*$ of a normed vector space turns out to be another normed vector space, but generally one which is not equivalent to X itself (except in the important special case when X is a Hilbert space, as mentioned above).  On the other hand, the double dual $(X^*)^*$ turns out to be closely related to X, and in several (but not all) important cases, is essentially identical to X.  One of the most important uses of dual spaces in functional analysis is that it allows one to define the transpose $T^*: Y^* \to X^*$ of a continuous linear operator $T: X \to Y$.
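
For orientation, the transpose is usually defined by composition.  The display below is the standard definition, stated here as an assumption about the convention the notes adopt; the symbol $\|T\|_{op}$ for the operator norm is notation introduced for this sketch.

```latex
% Transpose of a continuous linear operator T : X \to Y between normed vector spaces.
\[
  T^* \lambda := \lambda \circ T ,
  \qquad \text{i.e.} \qquad
  (T^*\lambda)(x) = \lambda(Tx)
  \quad \text{for } \lambda \in Y^*, \ x \in X .
\]
% T^*\lambda is again a continuous linear functional, since
\[
  |(T^*\lambda)(x)| = |\lambda(Tx)|
  \leq \|\lambda\|_{Y^*} \, \|T\|_{op} \, \|x\|_X ,
  \qquad \text{so} \qquad
  \|T^*\lambda\|_{X^*} \leq \|T\|_{op} \, \|\lambda\|_{Y^*} .
\]
```

This makes the reversal of arrows concrete: a map from X to Y induces a map from functionals on Y to functionals on X.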

A fundamental tool in understanding duality of normed vector spaces will be the Hahn-Banach theorem, which is an indispensable tool for exploring the dual of a vector space.  (Indeed, without this theorem, it is not clear at all that the dual of a non-trivial normed vector space is non-trivial!)  Thus, we shall study this theorem in detail in these notes concurrently with our discussion of duality.

Below the fold, I am giving some sample questions for the 245B midterm next week.  These are drawn from my previous 245A and 245B exams (with some modifications), and the solutions can be found by searching my previous web pages for those courses.  (The homework assignments are, of course, another good source of practice problems.)  Note that the actual midterm questions are likely to be somewhat shorter than the ones provided here (this is particularly the case for those questions with multiple parts).  More info on the midterm can be found at the class web page, of course.

(These questions are of course intended more for my students than for my regular blog readers; but anyone is welcome to comment if they wish.)

In the next few lectures, we will be studying four major classes of function spaces. In decreasing order of generality, these classes are the topological vector spaces, the normed vector spaces, the Banach spaces, and the Hilbert spaces. In order to motivate the discussion of the more general classes of spaces, we will first focus on the most special class – that of (real and complex) Hilbert spaces. These spaces can be viewed as generalisations of (real and complex) Euclidean spaces such as ${\Bbb R}^n$ and ${\Bbb C}^n$ to infinite-dimensional settings, and indeed much of one’s Euclidean geometry intuition concerning lengths, angles, orthogonality, subspaces, etc. will transfer readily to arbitrary Hilbert spaces; in contrast, this intuition is not always accurate in the more general vector spaces mentioned above. In addition to Euclidean spaces, another fundamental example of Hilbert spaces comes from the Lebesgue spaces $L^2(X,{\mathcal X},\mu)$ of a measure space $(X,{\mathcal X},\mu)$. (There are of course many other Hilbert spaces of importance in complex analysis, harmonic analysis, and PDE, such as Hardy spaces ${\mathcal H}^2$, Sobolev spaces $H^s = W^{s,2}$, and the space $HS$ of Hilbert-Schmidt operators, but we will not discuss those spaces much in this course.  Complex Hilbert spaces also play a fundamental role in the foundations of quantum mechanics, being the natural space to hold all the possible states of a quantum system (possibly after projectivising the Hilbert space), but we will not discuss this subject here.)

Hilbert spaces are the natural abstract framework in which to study two important (and closely related) concepts: orthogonality and unitarity, allowing us to generalise familiar concepts and facts from Euclidean geometry such as the Cartesian coordinate system, rotations and reflections, and the Pythagorean theorem to Hilbert spaces. (For instance, the Fourier transform is a unitary transformation and can thus be viewed as a kind of generalised rotation.) Furthermore, the Hodge duality on Euclidean spaces has a partial analogue for Hilbert spaces, namely the Riesz representation theorem for Hilbert spaces, which makes the theory of duality and adjoints for Hilbert spaces especially simple (when compared with the more subtle theory of duality for, say, Banach spaces). Much later (next quarter, in fact), we will see that this duality allows us to extend the spectral theorem for self-adjoint matrices to that of self-adjoint operators on a Hilbert space.

These notes are only the most basic introduction to the theory of Hilbert spaces.  In particular, the theory of linear transformations between two Hilbert spaces, which is perhaps the most important aspect of the subject, is not covered much at all here (but I hope to discuss it further in future lectures.)

A (concrete) Boolean algebra is a pair $(X, {\mathcal B})$, where X is a set, and ${\mathcal B}$ is a collection of subsets of X which contains the empty set $\emptyset$, and which is closed under unions $A, B \mapsto A \cup B$, intersections $A, B \mapsto A \cap B$, and complements $A \mapsto A^c := X \backslash A$. The subset relation $\subset$ also gives a relation on ${\mathcal B}$. Because ${\mathcal B}$ is concretely represented as subsets of a space X, these relations automatically obey various axioms; in particular, for any $A,B,C \in {\mathcal B}$, we have:

1. $\subset$ is a partial ordering on ${\mathcal B}$, and A and B have join $A \cup B$ and meet $A \cap B$.
2. We have the distributive laws $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$ and $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$.
3. $\emptyset$ is the minimal element of the partial ordering $\subset$, and $\emptyset^c$ is the maximal element.
4. $A \cap A^c = \emptyset$ and $A \cup A^c = \emptyset^c$.

(More succinctly: ${\mathcal B}$ is a lattice which is distributive, bounded, and complemented.)

We can then define an abstract Boolean algebra ${\mathcal B} = ({\mathcal B}, \emptyset, \cdot^c, \cup, \cap, \subset)$ to be an abstract set ${\mathcal B}$ with the specified objects, operations, and relations that obey the axioms 1-4. [Of course, some of these operations are redundant; for instance, intersection can be defined in terms of complement and union by de Morgan’s laws. In the literature, different authors select different initial operations and axioms when defining an abstract Boolean algebra, but they are all easily seen to be equivalent to each other. To emphasise the abstract nature of these algebras, the symbols $\emptyset, \cdot^c, \cup, \cap, \subset$ are often replaced with other symbols such as $0, \overline{\cdot}, \vee, \wedge, <$.]
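
To illustrate the redundancy just mentioned, here is a short sketch of how intersection and the order relation can be recovered from complement and union alone.  The verification is written for the concrete case, where membership of a point $x \in X$ can be tested directly; in the abstract setting one would take these formulas as definitions instead.

```latex
% Intersection and inclusion expressed through complement and union (de Morgan).
\[
  A \cap B = (A^c \cup B^c)^c ,
  \qquad
  A \subset B \iff A \cup B = B .
\]
% Concrete check of the first identity, for a point x of X:
\begin{align*}
  x \in (A^c \cup B^c)^c
  &\iff x \notin A^c \ \text{and} \ x \notin B^c \\
  &\iff x \in A \ \text{and} \ x \in B \\
  &\iff x \in A \cap B .
\end{align*}
```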

Clearly, every concrete Boolean algebra is an abstract Boolean algebra. In the converse direction, we have Stone’s representation theorem (see below), which asserts (among other things) that every abstract Boolean algebra is isomorphic to a concrete one (and even constructs this concrete representation of the abstract Boolean algebra canonically). So, up to (abstract) isomorphism, there is really no difference between a concrete Boolean algebra and an abstract one.

Now let us turn from Boolean algebras to $\sigma$-algebras.

A concrete $\sigma$-algebra (also known as a measurable space) is a pair $(X,{\mathcal B})$, where X is a set, and ${\mathcal B}$ is a collection of subsets of X which contains $\emptyset$ and is closed under countable unions, countable intersections, and complements; thus every concrete $\sigma$-algebra is a concrete Boolean algebra, but not conversely. As before, concrete $\sigma$-algebras come equipped with the structures $\emptyset, \cdot^c, \cup, \cap, \subset$ which obey axioms 1-4, but they also come with the operations of countable union $(A_n)_{n=1}^\infty \mapsto \bigcup_{n=1}^\infty A_n$ and countable intersection $(A_n)_{n=1}^\infty \mapsto \bigcap_{n=1}^\infty A_n$, which obey an additional axiom:

5. Any countable family $A_1, A_2, \ldots$ of elements of ${\mathcal B}$ has supremum $\bigcup_{n=1}^\infty A_n$ and infimum $\bigcap_{n=1}^\infty A_n$.
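
A standard example (not taken from the notes themselves) of a concrete Boolean algebra that fails to be a concrete $\sigma$-algebra is the finite-cofinite algebra on the natural numbers; the sketch below uses ${\Bbb N} = \{1,2,3,\ldots\}$ and the sets $A_n$ purely for illustration.

```latex
% The finite-cofinite algebra on the natural numbers.
\[
  X := \mathbb{N}, \qquad
  \mathcal{B} := \{ A \subset \mathbb{N} : A \text{ is finite or } \mathbb{N} \backslash A \text{ is finite} \} .
\]
% \mathcal{B} contains \emptyset and is closed under complements, finite unions and
% finite intersections, so (X, \mathcal{B}) is a concrete Boolean algebra.  However,
\[
  A_n := \{ 2n \} \in \mathcal{B} \ \text{for every } n,
  \qquad \text{but} \qquad
  \bigcup_{n=1}^\infty A_n = \{ 2, 4, 6, \ldots \} \notin \mathcal{B} ,
\]
% since the even numbers are neither finite nor cofinite.
```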

As with Boolean algebras, one can now define an abstract $\sigma$-algebra to be a set ${\mathcal B} = ({\mathcal B}, \emptyset, \cdot^c, \cup, \cap, \subset, \bigcup_{n=1}^\infty, \bigcap_{n=1}^\infty )$ with the indicated objects, operations, and relations, which obeys axioms 1-5. Again, every concrete $\sigma$-algebra is an abstract one; but is it still true that every abstract $\sigma$-algebra is representable as a concrete one?

The answer turns out to be no, but the obstruction can be described precisely (namely, one needs to quotient out an ideal of “null sets” from the concrete $\sigma$-algebra), and there is a satisfactory representation theorem, namely the Loomis-Sikorski representation theorem (see below). As a corollary of this representation theorem, one can also represent abstract measure spaces $({\mathcal B},\mu)$ (also known as measure algebras) by concrete measure spaces, $(X, {\mathcal B}, \mu)$, after quotienting out by null sets.

In the rest of this post, I will state and prove these representation theorems. They are not actually used directly in the rest of the course (and they will also require some results that we haven’t proven yet, most notably Tychonoff’s theorem), and so these notes are optional reading; but these theorems do help explain why it is “safe” to focus attention primarily on concrete $\sigma$-algebras and measure spaces when doing measure theory, since the abstract analogues of these mathematical concepts are largely equivalent to their concrete counterparts. (The situation is quite different for non-commutative measure theories, such as quantum probability, in which there is basically no good representation theorem available to equate the abstract with the classically concrete, but I will not discuss these theories here.)

Now that we have reviewed the foundations of measure theory, let us now put it to work to set up the basic theory of one of the fundamental families of function spaces in analysis, namely the $L^p$ spaces (also known as Lebesgue spaces). These spaces serve as important model examples for the general theory of topological and normed vector spaces, which we will discuss a little bit in this lecture and then in much greater detail in later lectures. (See also my previous blog post on function spaces.)

Just as scalar quantities live in the space of real or complex numbers, and vector quantities live in vector spaces, functions $f: X \to {\Bbb C}$ (or other objects closely related to functions, such as measures) live in function spaces. Like other spaces in mathematics (e.g. vector spaces, metric spaces, topological spaces, etc.) a function space $V$ is not just a mere set of objects (in this case, the objects are functions), but also comes with various important structures that allow one to do some useful operations inside the space, and from one space to another. For example, function spaces tend to have several (though usually not all) of the following types of structures, which are usually related to each other by various compatibility conditions:

1. Vector space structure. One can often add two functions $f, g$ in a function space $V$, and expect to get another function $f+g$ in that space $V$; similarly, one can multiply a function $f$ in $V$ by a scalar $c$ and get another function $cf$ in $V$. Usually, these operations obey the axioms of a vector space, though it is important to caution that the dimension of a function space is typically infinite. (In some cases, the space of scalars is a more complicated ring than the real or complex field, in which case we need the notion of a module rather than a vector space, but we will not use this more general notion in this course.) Virtually all of the function spaces we shall encounter in this course will be vector spaces. Because the field of scalars is real or complex, vector spaces also come with the notion of convexity, which turns out to be crucial in many aspects of analysis. As a consequence (and in marked contrast to algebra or number theory), much of the theory in real analysis does not seem to extend to other fields of scalars (in particular, real analysis fails spectacularly in the finite characteristic setting).
2. Algebra structure. Sometimes (though not always), we also wish to multiply two functions $f$, $g$ in $V$ and get another function $fg$ in $V$; when combined with the vector space structure and assuming some compatibility conditions (e.g. the distributive law), this makes $V$ an algebra. This multiplication operation is often just pointwise multiplication, but there are other important multiplication operations on function spaces too, such as convolution. (One sometimes sees other algebraic structures than multiplication appear in function spaces, most notably derivations, but again we will not encounter those in this course. Another common algebraic operation for function spaces is conjugation or adjoint, leading to the notion of a *-algebra.)
3. Norm structure. We often want to distinguish “large” functions in $V$ from “small” ones, especially in analysis, in which “small” terms in an expression are routinely discarded or deemed to be acceptable errors. One way to do this is to assign a magnitude or norm $\|f\|_V$ to each function that measures its size. Unlike the situation with scalars, where there is basically a single notion of magnitude, functions have a wide variety of useful notions of size, each measuring a different aspect (or combination of aspects) of the function, such as height, width, oscillation, regularity, decay, and so forth. Typically, each such norm gives rise to a separate function space (although sometimes it is useful to consider a single function space with multiple norms on it). We usually require the norm to be compatible with the vector space structure (and algebra structure, if present), for instance by demanding that the triangle inequality hold.
4. Metric structure. We also want to tell whether two functions f, g in a function space V are “near together” or “far apart”. A typical way to do this is to impose a metric $d: V \times V \to {\Bbb R}^+$ on the space $V$. If both a norm $\| \|_V$ and a vector space structure are available, there is an obvious way to do this: define the distance between two functions $f, g$ in $V$ to be $d( f, g ) := \|f-g\|_V$. (This will be the only type of metric on function spaces encountered in this course. But there are some nonlinear function spaces of importance in nonlinear analysis (e.g. spaces of maps from one manifold to another) which have no vector space structure or norm, but still have a metric.) It is often important to know if the vector space is complete with respect to the given metric; this allows one to take limits of Cauchy sequences, and (with a norm and vector space structure) sum absolutely convergent series, as well as use some useful results from point set topology such as the Baire category theorem. All of these operations are of course vital in analysis. [Compactness would be an even better property than completeness to have, but function spaces unfortunately tend to be non-compact in various rather nasty ways, although there are useful partial substitutes for compactness that are available, see e.g. this blog post of mine.]
5. Topological structure. It is often important to know when a sequence (or, occasionally, nets) of functions $f_n$ in $V$ “converges” in some sense to a limit $f$ (which, hopefully, is still in $V$); there are often many distinct modes of convergence (e.g. pointwise convergence, uniform convergence, etc.) that one wishes to carefully distinguish from each other. Also, in order to apply various powerful topological theorems (or to justify various formal operations involving limits, suprema, etc.), it is important to know when certain subsets of $V$ enjoy key topological properties (most notably compactness and connectedness), and to know which operations on $V$ are continuous. For all of this, one needs a topology on $V$. If one already has a metric, then one of course has a topology generated by the open balls of that metric; but there are many important topologies on function spaces in analysis that do not arise from metrics. We also often require the topology to be compatible with the other structures on the function space; for instance, we usually require the vector space operations of addition and scalar multiplication to be continuous. In some cases, the topology on $V$ extends to some natural superspace $W$ of more general functions that contain $V$; in such cases, it is often important to know whether $V$ is closed in $W$, so that limits of sequences in $V$ stay in $V$.
6. Functional structures. Since numbers are easier to understand and deal with than functions, it is not surprising that we often study functions f in a function space V by first applying some functional $\lambda: V \to {\Bbb C}$ to V to identify some key numerical quantity $\lambda(f)$ associated to f. Norms $f \mapsto \|f\|_V$ are of course one important example of a functional; integration $f \mapsto \int_X f\ d\mu$ provides another; and evaluation $f \mapsto f(x)$ at a point x provides a third important class. (Note, though, that while evaluation is the fundamental feature of a function in set theory, it is often a quite minor operation in analysis; indeed, in many function spaces, evaluation is not even defined at all, for instance because the functions in the space are only defined almost everywhere!) An inner product $\langle,\rangle$ on $V$ (see below) also provides a large family $f \mapsto \langle f, g \rangle$ of useful functionals. It is of particular interest to study functionals that are compatible with the vector space structure (i.e. are linear) and with the topological structure (i.e. are continuous); this will give rise to the important notion of duality on function spaces.
7. Inner product structure. One often would like to pair a function f in a function space V with another object g (which is often, though not always, another function in the same function space V) and obtain a number $\langle f, g \rangle$, that typically measures the amount of “interaction” or “correlation” between f and g. Typical examples include inner products arising from integration, such as $\langle f, g\rangle := \int_X f \overline{g}\ d\mu$; integration itself can also be viewed as a pairing, $\langle f, \mu \rangle := \int_X f\ d\mu$. Of course, we usually require such inner products to be compatible with the other structures present on the space (e.g., to be compatible with the vector space structure, we usually require the inner product to be bilinear or sesquilinear). Inner products, when available, are incredibly useful in understanding the metric and norm geometry of a space, due to such fundamental facts as the Cauchy-Schwarz inequality and the parallelogram law. They also give rise to the important notion of orthogonality between functions.
8. Group actions. We often expect our function spaces to enjoy various symmetries; we might wish to rotate, reflect, translate, modulate, or dilate our functions and expect to preserve most of the structure of the space when doing so. In modern mathematics, symmetries are usually encoded by group actions (or actions of other group-like objects, such as semigroups or groupoids; one also often upgrades groups to more structured objects such as Lie groups). As usual, we typically require the group action to preserve the other structures present on the space, e.g. one often restricts attention to group actions that are linear (to preserve the vector space structure), continuous (to preserve topological structure), unitary (to preserve inner product structure), isometric (to preserve metric structure), and so forth. Besides giving us useful symmetries to spend, the presence of such group actions allows one to apply the powerful techniques of representation theory, Fourier analysis, and ergodic theory. However, as this is a foundational real analysis class, we will not discuss these important topics much here (and in fact will not deal with group actions much at all).
9. Order structure. In some cases, we want to utilise the notion of a function f being “non-negative”, or “dominating” another function g. One might also want to take the “max” or “supremum” of two or more functions in a function space V, or split a function into “positive” and “negative” components. Such order structures interact with the other structures on a space in many useful ways (e.g. via the Stone-Weierstrass theorem). Much like convexity, order structure is specific to the real line and is another reason why much of real analysis breaks down over other fields. (The complex plane is of course an extension of the real line and so is able to exploit the order structure of that line, usually by treating the real and imaginary components separately.)
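
As a small illustration of the norm, metric and topological structures in the list above, the following sketch compares several norms on a single sequence of functions.  The choice of the functions $f_n(x) := x^n$ on $[0,1]$ with Lebesgue measure is an example introduced here, and the $L^p$ and supremum norms are used with their standard definitions.

```latex
% One sequence of functions, measured by several different norms.
\[
  f_n(x) := x^n \quad \text{on } [0,1], \qquad n = 1, 2, 3, \ldots
\]
\begin{align*}
  \| f_n \|_{L^\infty} &= \sup_{0 \leq x \leq 1} x^n = 1 , \\
  \| f_n \|_{L^1} &= \int_0^1 x^n \, dx = \frac{1}{n+1} , \\
  \| f_n \|_{L^p} &= \left( \int_0^1 x^{np} \, dx \right)^{1/p}
                   = \left( \frac{1}{np+1} \right)^{1/p}
  \qquad (1 \leq p < \infty) .
\end{align*}
% With the metric d(f,g) := \|f-g\|_V, the sequence f_n converges to 0 in L^p for
% every finite p, but not in the supremum norm, where d(f_n, 0) = 1 for all n.
```

Thus the same sequence is “small” in one norm and “large” in another: the supremum norm measures height, while the $L^1$ norm also takes width into account.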

There are of course many ways to combine various flavours of these structures together, and there are entire subfields of mathematics that are devoted to studying particularly common and useful categories of such combinations (e.g. topological vector spaces, normed vector spaces, Banach spaces, Banach algebras, von Neumann algebras, $C^*$-algebras, Fréchet spaces, Hilbert spaces, group algebras, etc.). The study of these sorts of spaces is known collectively as functional analysis. We will study some (but certainly not all) of these combinations in an abstract and general setting later in this course, but to begin with we will focus on the $L^p$ spaces, which are very good model examples for many of the above general classes of spaces, and also of importance in many applications of analysis (such as probability or PDE).

Notational convention: In this post only, I will colour a statement red if it assumes the axiom of choice. (For the rest of the course, the axiom of choice will be implicitly assumed throughout.) $\diamond$

The famous Banach-Tarski paradox asserts that one can take the unit ball in three dimensions, divide it up into finitely many pieces, and then translate and rotate each piece so that their union is now two disjoint unit balls.  As a consequence of this paradox, it is not possible to create a finitely additive measure on ${\Bbb R}^3$ that is both translation and rotation invariant, which can measure every subset of ${\Bbb R}^3$, and which gives the unit ball a non-zero measure. This paradox helps explain why Lebesgue measure (which is countably additive and both translation and rotation invariant, and gives the unit ball a non-zero measure) cannot measure every set, instead being restricted to measuring sets that are Lebesgue measurable.
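
The deduction from the paradox to the non-existence of such a measure is a one-line computation, sketched below.  The pieces $E_i$, the rigid motions $g_i$ and the second ball $B'$ are names introduced here for illustration, and the sketch assumes that the measure of the unit ball is finite as well as non-zero (otherwise counting measure would be a counterexample).

```latex
% Banach-Tarski: B = E_1 \sqcup \dots \sqcup E_k, and there are rigid motions g_1, ..., g_k
% with g_1 E_1 \sqcup \dots \sqcup g_k E_k = B \sqcup B', where B' is a translate of B
% disjoint from B.  Suppose \mu is finitely additive, invariant under rigid motions,
% defined on all subsets of \mathbb{R}^3, and 0 < \mu(B) < \infty.  Then
\begin{align*}
  \mu(B) &= \sum_{i=1}^k \mu(E_i)
          = \sum_{i=1}^k \mu(g_i E_i)
          = \mu(B \sqcup B') \\
         &= \mu(B) + \mu(B')
          = 2 \mu(B) ,
\end{align*}
% which is impossible for a finite, non-zero value of \mu(B).
```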

On the other hand, it is not possible to replicate the Banach-Tarski paradox in one or two dimensions; the unit interval in ${\Bbb R}$ or unit disk in ${\Bbb R}^2$ cannot be rearranged into two unit intervals or two unit disks using only finitely many pieces, translations, and rotations, and indeed there do exist non-trivial finitely additive measures on these spaces. However, it is possible to obtain a Banach-Tarski type paradox in one or two dimensions using countably many such pieces; this rules out the possibility of extending Lebesgue measure to a countably additive translation invariant measure on all subsets of ${\Bbb R}$ (or any higher-dimensional space).

In these notes I would like to establish all of the above results, and tie them in with some important concepts and tools in modern group theory, most notably amenability and the ping-pong lemma.  This material is not required for the rest of the course, but nevertheless has some independent interest.

For these notes, $X = (X, {\mathcal X})$ is a fixed measurable space. We shall often omit the $\sigma$-algebra ${\mathcal X}$, and simply refer to elements of ${\mathcal X}$ as measurable sets. Unless otherwise indicated, all subsets of X appearing below are restricted to be measurable, and all functions on X appearing below are also restricted to be measurable.

We let ${\mathcal M}_+(X)$ denote the space of measures on X, i.e. functions $\mu: {\mathcal X} \to [0,+\infty]$ which are countably additive and send $\emptyset$ to 0. For reasons that will be clearer later, we shall refer to such measures as unsigned measures. In this section we investigate the structure of this space, together with the closely related spaces of signed measures and finite measures.

Suppose that we have already constructed one unsigned measure $m \in {\mathcal M}_+(X)$ on X (e.g. think of X as the real line with the Borel $\sigma$-algebra, and let m be Lebesgue measure). Then we can obtain many further unsigned measures on X by multiplying m by a function $f: X \to [0,+\infty]$, to obtain a new unsigned measure $m_f$, defined by the formula

$m_f(E) := \int_X 1_E f\ dm$. (1)

If $f = 1_A$ is an indicator function, we write $m\downharpoonright_A$ for $m_{1_A}$, and refer to this measure as the restriction of m to A.

Exercise 1. Show (using the monotone convergence theorem) that $m_f$ is indeed an unsigned measure, and for any $g: X \to [0,+\infty]$, we have ${}\int_X g\ dm_f = \int_X gf\ dm$. We will express this relationship symbolically as

$dm_f = f dm$.$\diamond$ (2)

Exercise 2. Let m be $\sigma$-finite. Given two functions $f, g: X \to [0,+\infty]$, show that $m_f = m_g$ if and only if $f(x) = g(x)$ for m-almost every x. (Hint: as usual, first do the case when m is finite. The key point is that if f and g are not equal m-almost everywhere, then either f>g on a set of positive measure, or f<g on a set of positive measure.) Give an example to show that this uniqueness statement can fail if m is not $\sigma$-finite. (Hint: take a very simple example, e.g. let X consist of just one point.) $\diamond$

In view of Exercises 1 and 2, let us temporarily call a measure $\mu$ differentiable with respect to m if $d\mu = f dm$ (i.e. $\mu = m_f$) for some $f: X \to [0,+\infty]$, and call f the Radon-Nikodym derivative of $\mu$ with respect to m, writing

$\displaystyle f = \frac{d\mu}{dm}$; (3)

by Exercise 2, we see if $m$ is $\sigma$-finite that this derivative is defined up to m-almost everywhere equivalence.

Exercise 3. (Relationship between Radon-Nikodym derivative and classical derivative) Let m be Lebesgue measure on ${}[0,+\infty)$, and let $\mu$ be an unsigned measure that is differentiable with respect to m. If $\mu$ has a continuous Radon-Nikodym derivative $\frac{d\mu}{dm}$, show that the function $x \mapsto \mu( [0,x])$ is differentiable, and $\frac{d}{dx} \mu([0,x]) = \frac{d\mu}{dm}(x)$ for all x. $\diamond$
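
A single concrete instance of Exercise 3 may help fix ideas (it is not a solution to the exercise).  The density $f(t) := e^{-t}$ is an example chosen here for illustration, with $\mu := m_f$ defined by multiplying Lebesgue measure m on ${}[0,+\infty)$ by f.

```latex
% Worked instance: d\mu = e^{-t} dm on [0,+\infty), m = Lebesgue measure.
\[
  \mu([0,x]) = \int_{[0,+\infty)} 1_{[0,x]}(t) \, e^{-t} \, dm(t)
             = \int_0^x e^{-t} \, dt
             = 1 - e^{-x} ,
\]
\[
  \frac{d}{dx} \mu([0,x]) = \frac{d}{dx} \left( 1 - e^{-x} \right) = e^{-x}
  = \frac{d\mu}{dm}(x) .
\]
```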

Exercise 4. Let X be at most countable. Show that every measure on X is differentiable with respect to counting measure $\#$. $\diamond$

If every measure were differentiable with respect to m (as is the case in Exercise 4), then we would have completely described the space of measures of X in terms of the non-negative functions of X (modulo m-almost everywhere equivalence). Unfortunately, not every measure is differentiable with respect to every other: for instance, if x is a point in X, then the only measures that are differentiable with respect to the Dirac measure $\delta_x$ are the scalar multiples of that measure. We will explore the precise obstruction that prevents all measures from being differentiable, culminating in the Radon-Nikodym-Lebesgue theorem that gives a satisfactory understanding of the situation in the $\sigma$-finite case (which is the case of interest for most applications).
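
The claim about the Dirac measure can be checked directly from the definition of $m_f$ as the measure $E \mapsto \int_X 1_E f\ dm$, taking $m = \delta_x$.  The sketch below uses the usual convention $0 \cdot (+\infty) = 0$ and the fact that integrating a function against $\delta_x$ evaluates it at x.

```latex
% Measures differentiable with respect to the Dirac measure \delta_x.
\[
  (\delta_x)_f(E) = \int_X 1_E \, f \, d\delta_x
                  = 1_E(x) \, f(x)
                  = f(x) \, \delta_x(E)
  \qquad \text{for every measurable } E ,
\]
% so that
\[
  (\delta_x)_f = c \, \delta_x
  \qquad \text{with} \qquad c := f(x) \in [0,+\infty] .
\]
```

In particular, only the single value $f(x)$ matters, consistent with the derivative being determined only up to $\delta_x$-almost everywhere equivalence.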

In order to establish this theorem, it will be important to first study some other basic operations on measures, notably the ability to subtract one measure from another. This will necessitate the study of signed measures, to which we now turn.

[The material here is largely based on Folland’s text, except for the last section.]

In this supplemental note to the previous lecture notes, I would like to give an alternate proof of a (weak form of the) Carathéodory extension theorem.  This argument is restricted to the $\sigma$-finite case, and does not extend the measure to quite as large a $\sigma$-algebra as is provided by the standard proof of this theorem, but I find it conceptually clearer (in particular, hewing quite closely to Littlewood’s principles, and the general Lebesgue philosophy of treating sets of small measure as negligible), and suffices for many standard applications of this theorem, in particular the construction of Lebesgue measure.

Let us first state the precise statement of the theorem:

Theorem 1. (Weak Carathéodory extension theorem)  Let ${\mathcal A}$ be a Boolean algebra of subsets of a set X, and let $\mu: {\mathcal A} \to [0,+\infty]$ be a function obeying the following three properties:

1. $\mu(\emptyset) = 0$.
2. (Pre-countable additivity) If $A_1,A_2,\ldots \in {\mathcal A}$ are disjoint and such that $\bigcup_{n=1}^\infty A_n$ also lies in ${\mathcal A}$, then $\mu(\bigcup_{n=1}^\infty A_n) = \sum_{n=1}^\infty \mu(A_n)$.
3. ($\sigma$-finiteness) X can be covered by at most countably many sets in ${\mathcal A}$, each of which has finite $\mu$-measure.

Let ${\mathcal X}$ be the $\sigma$-algebra generated by ${\mathcal A}$.  Then $\mu$ can be uniquely extended to a countably additive measure on ${\mathcal X}$.

We will refer to sets in ${\mathcal A}$ as elementary sets and sets in ${\mathcal X}$ as measurable sets. A typical example is when X=[0,1] and ${\mathcal A}$ is the collection of all sets that are unions of finitely many intervals; in this case, ${\mathcal X}$ consists of the Borel-measurable sets.

In these notes we quickly review the basics of abstract measure theory and integration theory, which was covered in the previous course but will of course be relied upon in the current course.  This is only a brief summary of the material; of course, one should consult a real analysis text for the full details of the theory.