You are currently browsing the category archive for the ‘245A – Real analysis’ category.

My graduate text on measure theory (based on these lecture notes) is now published by the AMS as part of the Graduate Studies in Mathematics series.  (See also my own blog page for this book, which among other things contains a draft copy of the book in PDF format.)

In this course so far, we have focused primarily on one specific example of a countably additive measure, namely Lebesgue measure. This measure was constructed from a more primitive concept of Lebesgue outer measure, which in turn was constructed from the even more primitive concept of elementary measure.
It turns out that both of these constructions can be abstracted. In this set of notes, we will give the Carathéodory lemma, which constructs a countably additive measure from any abstract outer measure; this generalises the construction of Lebesgue measure from Lebesgue outer measure. One can in turn construct outer measures from another concept known as a pre-measure, of which elementary measure is a typical example.
With these tools, one can start constructing many more measures, such as Lebesgue-Stieltjes measures, product measures, and Hausdorff measures. With a little more effort, one can also establish the Kolmogorov extension theorem, which allows one to construct a variety of measures on infinite-dimensional spaces, and is of particular importance in the foundations of probability theory, as it allows one to set up probability spaces associated to both discrete and continuous random processes, even if they have infinite length.
The most important result about product measure, beyond the fact that it exists, is that one can use it to evaluate iterated integrals, and to interchange their order, provided that the integrand is either unsigned or absolutely integrable. This fact is known as the Fubini-Tonelli theorem, and is an absolutely indispensable tool for computing integrals, and for deducing higher-dimensional results from lower-dimensional ones.
We remark that these notes omit a very important way to construct measures, namely the Riesz representation theorem, but we will defer discussion of this theorem to 245B.
This is the final set of notes in this sequence. If time permits, the course will then begin covering the 245B notes, starting with the material on signed measures and the Radon-Nikodym-Lebesgue theorem.

This is going to be a somewhat experimental post. In class, I mentioned that when solving the type of homework problems encountered in a graduate real analysis course, there are really only about a dozen or so basic tricks and techniques that are used over and over again. But I had not thought to actually try to make these tricks explicit, so I am going to try to compile a list of some of these techniques here. But this list is going to be far from exhaustive; perhaps if other recent students of real analysis would like to share their own methods, then I encourage you to do so in the comments (even – or especially – if the techniques are somewhat vague and general in nature).

(See also the Tricki for some general mathematical problem solving tips.  Once this page matures somewhat, I might migrate it to the Tricki.)

Note: the tricks occur here in no particular order, reflecting the stream-of-consciousness way in which they were arrived at.  Indeed, this list will be extended on occasion whenever I find another trick that can be added to this list.


Let {[a,b]} be a compact interval of positive length (thus {-\infty < a < b < +\infty}). Recall that a function {F: [a,b] \rightarrow {\bf R}} is said to be differentiable at a point {x \in [a,b]} if the limit

\displaystyle F'(x) := \lim_{y \rightarrow x; y \in [a,b] \backslash \{x\}} \frac{F(y)-F(x)}{y-x} \ \ \ \ \ (1)

exists. In that case, we call {F'(x)} the strong derivative, classical derivative, or just derivative for short, of {F} at {x}. We say that {F} is everywhere differentiable, or differentiable for short, if it is differentiable at all points {x \in [a,b]}, and differentiable almost everywhere if it is differentiable at almost every point {x \in [a,b]}. If {F} is differentiable everywhere and its derivative {F'} is continuous, then we say that {F} is continuously differentiable.

Remark 1 Much later in this sequence, when we cover the theory of distributions, we will see the notion of a weak derivative or distributional derivative, which can be applied to a much rougher class of functions and is in many ways more suitable than the classical derivative for doing “Lebesgue” type analysis (i.e. analysis centred around the Lebesgue integral, and in particular allowing functions to be uncontrolled, infinite, or even undefined on sets of measure zero). However, for now we will stick with the classical approach to differentiation.

Exercise 2 If {F: [a,b] \rightarrow {\bf R}} is everywhere differentiable, show that {F} is continuous and {F'} is measurable. If {F} is almost everywhere differentiable, show that the (almost everywhere defined) function {F'} is measurable (i.e. it is equal to an everywhere defined measurable function on {[a,b]} outside of a null set), but give an example to demonstrate that {F} need not be continuous.

Exercise 3 Give an example of a function {F: [a,b] \rightarrow {\bf R}} which is everywhere differentiable, but not continuously differentiable. (Hint: choose an {F} that vanishes quickly at some point, say at the origin {0}, but which also oscillates rapidly near that point.)

In single-variable calculus, the operations of integration and differentiation are connected by a number of basic theorems, starting with Rolle’s theorem.

Theorem 4 (Rolle’s theorem) Let {[a,b]} be a compact interval of positive length, and let {F: [a,b] \rightarrow {\bf R}} be a differentiable function such that {F(a)=F(b)}. Then there exists {x \in (a,b)} such that {F'(x)=0}.

Proof: By subtracting a constant from {F} (which does not affect differentiability or the derivative) we may assume that {F(a)=F(b)=0}. If {F} is identically zero then the claim is trivial, so assume that {F} is non-zero somewhere. By replacing {F} with {-F} if necessary, we may assume that {F} is positive somewhere, thus {\sup_{x \in [a,b]} F(x) > 0}. On the other hand, as {F} is continuous and {[a,b]} is compact, {F} must attain its maximum somewhere, thus there exists {x \in [a,b]} such that {F(x) \geq F(y)} for all {y \in [a,b]}. Then {F(x)} must be positive and so {x} cannot equal either {a} or {b}, and thus must lie in the interior. From the right limit of (1) we see that {F'(x) \leq 0}, while from the left limit we have {F'(x) \geq 0}. Thus {F'(x)=0} and the claim follows. \Box

Remark 5 Observe that the same proof also works if {F} is only differentiable in the interior {(a,b)} of the interval {[a,b]}, so long as it is continuous all the way up to the boundary of {[a,b]}.

Exercise 6 Give an example to show that Rolle’s theorem can fail if {F} is merely assumed to be almost everywhere differentiable, even if one adds the additional hypothesis that {F} is continuous. This example illustrates that everywhere differentiability is a significantly stronger property than almost everywhere differentiability. We will see further evidence of this fact later in these notes; there are many theorems that assert in their conclusion that a function is almost everywhere differentiable, but few that manage to conclude everywhere differentiability.

Remark 7 It is important to note that Rolle’s theorem only works in the real scalar case when {F} is real-valued, as it relies heavily on the least upper bound property for the domain {{\bf R}}. If, for instance, we consider complex-valued scalar functions {F: [a,b] \rightarrow {\bf C}}, then the theorem can fail; for instance, the function {F: [0,1] \rightarrow {\bf C}} defined by {F(x) := e^{2\pi i x} - 1} vanishes at both endpoints and is differentiable, but its derivative {F'(x) = 2\pi i e^{2\pi i x}} is never zero. (Rolle’s theorem does imply that the real and imaginary parts of the derivative {F'} both vanish somewhere, but the problem is that they don’t simultaneously vanish at the same point.) Similar remarks apply to functions taking values in a finite-dimensional vector space, such as {{\bf R}^n}.
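As a quick numerical sanity check of the counterexample in this remark (a sketch only, not a proof), one can sample {F} and {F'} on a grid in {[0,1]}. The grid size and the helper names `F` and `F_prime` are choices made here for illustration.

```python
import cmath
import math

def F(x: float) -> complex:
    """The counterexample F(x) = e^{2 pi i x} - 1 from the remark."""
    return cmath.exp(2j * math.pi * x) - 1

def F_prime(x: float) -> complex:
    """Its derivative F'(x) = 2 pi i e^{2 pi i x}."""
    return 2j * math.pi * cmath.exp(2j * math.pi * x)

# Sample grid on [0, 1]; the size 1000 is an arbitrary illustrative choice.
grid = [k / 1000 for k in range(1001)]

# F vanishes at both endpoints (up to floating-point rounding) ...
endpoint_values = (abs(F(0.0)), abs(F(1.0)))

# ... but |F'(x)| equals 2 pi at every sample point, so F' never vanishes there.
min_abs_derivative = min(abs(F_prime(x)) for x in grid)
```

The real part of {F'} vanishes at {x = 1/2} and the imaginary part at {x = 1/4}, which is consistent with the parenthetical comment: the two components vanish, but not at a common point.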

One can easily amplify Rolle’s theorem to the mean value theorem:

Corollary 8 (Mean value theorem) Let {[a,b]} be a compact interval of positive length, and let {F: [a,b] \rightarrow {\bf R}} be a differentiable function. Then there exists {x \in (a,b)} such that {F'(x)=\frac{F(b)-F(a)}{b-a}}.

Proof: Apply Rolle’s theorem to the function {x \mapsto F(x) - \frac{F(b)-F(a)}{b-a} (x-a)}. \Box
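To spell out why Rolle’s theorem applies, it may help to name the auxiliary function; the symbol {H} below is introduced here only for this verification and does not appear in the original proof.

```latex
\begin{align*}
H(x)  &:= F(x) - \frac{F(b)-F(a)}{b-a}\,(x-a), \\
H(a)  &= F(a), \\
H(b)  &= F(b) - \bigl(F(b)-F(a)\bigr) = F(a), \\
H'(x) &= F'(x) - \frac{F(b)-F(a)}{b-a}.
\end{align*}
```

Thus {H} is differentiable with {H(a)=H(b)}, and any point {x \in (a,b)} with {H'(x)=0} satisfies the conclusion of the mean value theorem.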

Remark 9 As Rolle’s theorem is only applicable to real scalar-valued functions, the more general mean value theorem is also only applicable to such functions.

Exercise 10 (Uniqueness of antiderivatives up to constants) Let {[a,b]} be a compact interval of positive length, and let {F: [a,b] \rightarrow {\bf R}} and {G: [a,b] \rightarrow {\bf R}} be differentiable functions. Show that {F'(x)=G'(x)} for every {x \in [a,b]} if and only if {F(x)=G(x)+C} for some constant {C \in {\bf R}} and all {x \in [a,b]}.

We can use the mean value theorem to deduce one of the fundamental theorems of calculus:

Theorem 11 (Second fundamental theorem of calculus) Let {F: [a,b] \rightarrow {\bf R}} be a differentiable function, such that {F'} is Riemann integrable. Then the Riemann integral {\int_a^b F'(x)\ dx} of {F'} is equal to {F(b) - F(a)}. In particular, we have {\int_a^b F'(x)\ dx = F(b)-F(a)} whenever {F} is continuously differentiable.

Proof: Let {\varepsilon > 0}. By the definition of Riemann integrability, there exists a finite partition {a = t_0 < t_1 < \ldots < t_k = b} such that

\displaystyle |\sum_{j=1}^k F'(t^*_j) (t_j - t_{j-1}) - \int_a^b F'(x)\ dx| \leq \varepsilon

for every choice of {t^*_j \in [t_{j-1},t_j]}.
Fix this partition. From the mean value theorem, for each {1 \leq j \leq k} one can find {t^*_j \in [t_{j-1},t_j]} such that

\displaystyle F'(t^*_j) (t_j - t_{j-1}) = F(t_j) - F(t_{j-1})

and thus by telescoping series

\displaystyle |(F(b)-F(a)) - \int_a^b F'(x)\ dx| \leq \varepsilon.

Since {\varepsilon > 0} was arbitrary, the claim follows. \Box

Remark 12 Even though the mean value theorem only holds for real scalar functions, the fundamental theorem of calculus holds for complex or vector-valued functions, as one can simply apply that theorem to each component of that function separately.

Of course, we also have the other half of the fundamental theorem of calculus:

Theorem 13 (First fundamental theorem of calculus) Let {[a,b]} be a compact interval of positive length. Let {f: [a,b] \rightarrow {\bf C}} be a continuous function, and let {F: [a,b] \rightarrow {\bf C}} be the indefinite integral {F(x) := \int_a^x f(t)\ dt}. Then {F} is differentiable on {[a,b]}, with derivative {F'(x) = f(x)} for all {x \in [a,b]}. In particular, {F} is continuously differentiable.

Proof: It suffices to show that

\displaystyle \lim_{h \rightarrow 0^+} \frac{F(x+h)-F(x)}{h} = f(x)

for all {x \in [a,b)}, and

\displaystyle \lim_{h \rightarrow 0^-} \frac{F(x+h)-F(x)}{h} = f(x)

for all {x \in (a,b]}. After a change of variables, we can write

\displaystyle \frac{F(x+h)-F(x)}{h} = \int_0^1 f(x+ht)\ dt

for any {x \in [a,b)} and any sufficiently small {h>0}, or any {x \in (a,b]} and any sufficiently small {h<0}. As {f} is continuous, the function {t \mapsto f(x+ht)} converges uniformly to {f(x)} on {[0,1]} as {h \rightarrow 0} (keeping {x} fixed). As the interval {[0,1]} is bounded, {\int_0^1 f(x+ht)\ dt} thus converges to {\int_0^1 f(x)\ dt = f(x)}, and the claim follows. \Box

Corollary 14 (Differentiation theorem for continuous functions) Let {f: [a,b] \rightarrow {\bf C}} be a continuous function on a compact interval. Then we have

\displaystyle \lim_{h \rightarrow 0^+} \frac{1}{h} \int_{[x,x+h]} f(t)\ dt = f(x)

for all {x \in [a,b)},

\displaystyle \lim_{h \rightarrow 0^+} \frac{1}{h} \int_{[x-h,x]} f(t)\ dt = f(x)

for all {x \in (a,b]}, and thus

\displaystyle \lim_{h \rightarrow 0^+} \frac{1}{2h} \int_{[x-h,x+h]} f(t)\ dt = f(x)

for all {x \in (a,b)}.
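The following is a small numerical illustration of this corollary, under assumptions chosen here for convenience: the test function is {f = \cos}, the point is {x = 0.3}, and the integrals are approximated by a midpoint rule. The helper name `average` is hypothetical.

```python
import math

def average(f, lo, hi, steps=10_000):
    """Midpoint-rule approximation of (1/(hi-lo)) * integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    total = sum(f(lo + (i + 0.5) * width) for i in range(steps))
    return total / steps

f = math.cos          # an illustrative continuous function
x = 0.3               # an illustrative interior point
hs = [10.0 ** (-k) for k in range(1, 5)]   # h = 0.1, 0.01, 0.001, 0.0001

# Errors of the one-sided averages over [x, x+h] and the centred averages over [x-h, x+h].
right_errors = [abs(average(f, x, x + h) - f(x)) for h in hs]
centred_errors = [abs(average(f, x - h, x + h) - f(x)) for h in hs]
```

In this smooth example the one-sided errors shrink roughly linearly in {h}, while the centred errors shrink faster; the corollary itself only asserts convergence, not a rate.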

In these notes we explore the question of the extent to which these theorems continue to hold when the differentiability or integrability conditions on the various functions {F, F', f} are relaxed. Among the results proven in these notes are

  • The Lebesgue differentiation theorem, which roughly speaking asserts that Corollary 14 continues to hold for almost every {x} if {f} is merely absolutely integrable, rather than continuous;
  • A number of differentiation theorems, which assert for instance that monotone, Lipschitz, or bounded variation functions in one dimension are almost everywhere differentiable; and
  • The second fundamental theorem of calculus for absolutely continuous functions.

The material here is loosely based on Chapter 3 of Stein-Shakarchi.

The following question came up in my 245A class today:

Is it possible to express a non-closed interval in the real line, such as [0,1), as a countable union of disjoint closed intervals?

I was not able to answer the question immediately, but by the end of the class some of the students had come up with an answer.  It is actually a nice little test of one’s basic knowledge of real analysis, so I am posing it here as well for anyone else who is interested.  Below the fold is the answer to the question (whited out; one has to highlight the text in order to read it).


If one has a sequence {x_1, x_2, x_3, \ldots \in {\bf R}} of real numbers {x_n}, it is unambiguous what it means for that sequence to converge to a limit {x \in {\bf R}}: it means that for every {\varepsilon > 0}, there exists an {N} such that {|x_n-x| \leq \varepsilon} for all {n > N}. Similarly for a sequence {z_1, z_2, z_3, \ldots \in {\bf C}} of complex numbers {z_n} converging to a limit {z \in {\bf C}}.

More generally, if one has a sequence {v_1, v_2, v_3, \ldots} of {d}-dimensional vectors {v_n} in a real vector space {{\bf R}^d} or complex vector space {{\bf C}^d}, it is also unambiguous what it means for that sequence to converge to a limit {v \in {\bf R}^d} or {v \in {\bf C}^d}; it means that for every {\varepsilon > 0}, there exists an {N} such that {\|v_n-v\| \leq \varepsilon} for all {n \geq N}. Here, the norm {\|v\|} of a vector {v = (v^{(1)},\ldots,v^{(d)})} can be chosen to be the Euclidean norm {\|v\|_2 := (\sum_{j=1}^d |v^{(j)}|^2)^{1/2}}, the supremum norm {\|v\|_\infty := \sup_{1 \leq j \leq d} |v^{(j)}|}, or any number of other norms, but for the purposes of convergence, these norms are all equivalent; a sequence of vectors converges in the Euclidean norm if and only if it converges in the supremum norm, and similarly for any other two norms on the finite-dimensional space {{\bf R}^d} or {{\bf C}^d}.

If however one has a sequence {f_1, f_2, f_3, \ldots} of functions {f_n: X \rightarrow {\bf R}} or {f_n: X \rightarrow {\bf C}} on a common domain {X}, and a putative limit {f: X \rightarrow {\bf R}} or {f: X \rightarrow {\bf C}}, there can now be many different ways in which the sequence {f_n} may or may not converge to the limit {f}. (One could also consider convergence of functions {f_n: X_n \rightarrow {\bf C}} on different domains {X_n}, but we will not discuss this issue at all here.) This is in contrast with the situation with scalars {x_n} or {z_n} (which corresponds to the case when {X} is a single point) or vectors {v_n} (which corresponds to the case when {X} is a finite set such as {\{1,\ldots,d\}}). Once {X} becomes infinite, the functions {f_n} acquire an infinite number of degrees of freedom, and this allows them to approach {f} in any number of inequivalent ways.

What different types of convergence are there? As an undergraduate, one learns of the following two basic modes of convergence:

  1. We say that {f_n} converges to {f} pointwise if, for every {x \in X}, {f_n(x)} converges to {f(x)}. In other words, for every {\varepsilon > 0} and {x \in X}, there exists {N} (that depends on both {\varepsilon} and {x}) such that {|f_n(x)-f(x)| \leq \varepsilon} whenever {n \geq N}.
  2. We say that {f_n} converges to {f} uniformly if, for every {\varepsilon > 0}, there exists {N} such that for every {n \geq N}, {|f_n(x) - f(x)| \leq \varepsilon} for every {x \in X}. The difference between uniform convergence and pointwise convergence is that with the former, the time {N} at which {f_n(x)} must be permanently {\varepsilon}-close to {f(x)} is not permitted to depend on {x}, but must instead be chosen uniformly in {x}.

Uniform convergence implies pointwise convergence, but not conversely. A typical example: the functions {f_n: {\bf R} \rightarrow {\bf R}} defined by {f_n(x) := x/n} converge pointwise to the zero function {f(x) := 0}, but not uniformly.
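For the record, here is the short computation behind this example, written out as a sketch. The first line shows that the threshold {N} can be chosen for each fixed {x} but must grow with {|x|}; the second shows that no single {N} can work for all {x} at once.

```latex
\begin{align*}
|f_n(x) - 0| &= \frac{|x|}{n} \leq \varepsilon
  \quad \text{whenever } n \geq \frac{|x|}{\varepsilon}, \\
\sup_{x \in {\bf R}} |f_n(x) - 0| &\geq |f_n(n)| = 1
  \quad \text{for every } n \geq 1.
\end{align*}
```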

However, pointwise and uniform convergence are only two of the dozens of modes of convergence that are of importance in analysis. We will not attempt to exhaustively enumerate these modes here (but see this Wikipedia page, and see also these 245B notes on strong and weak convergence). We will, however, discuss some of the modes of convergence that arise from measure theory, when the domain {X} is equipped with the structure of a measure space {(X, {\mathcal B}, \mu)}, and the functions {f_n} (and their limit {f}) are measurable with respect to this space. In this context, we have some additional modes of convergence:

  1. We say that {f_n} converges to {f} pointwise almost everywhere if, for ({\mu}-)almost every {x \in X}, {f_n(x)} converges to {f(x)}.
  2. We say that {f_n} converges to {f} uniformly almost everywhere, essentially uniformly, or in {L^\infty} norm if, for every {\varepsilon > 0}, there exists {N} such that for every {n \geq N}, {|f_n(x) - f(x)| \leq \varepsilon} for {\mu}-almost every {x \in X}.
  3. We say that {f_n} converges to {f} almost uniformly if, for every {\varepsilon > 0}, there exists an exceptional set {E \in {\mathcal B}} of measure {\mu(E) \leq \varepsilon} such that {f_n} converges uniformly to {f} on the complement of {E}.
  4. We say that {f_n} converges to {f} in {L^1} norm if the quantity {\|f_n-f\|_{L^1(\mu)} = \int_X |f_n(x)-f(x)|\ d\mu} converges to {0} as {n \rightarrow \infty}.
  5. We say that {f_n} converges to {f} in measure if, for every {\varepsilon > 0}, the measures {\mu( \{ x \in X: |f_n(x) - f(x)| \geq \varepsilon \} )} converge to zero as {n \rightarrow \infty}.

Observe that each of these five modes of convergence is unaffected if one modifies {f_n} or {f} on a set of measure zero. In contrast, the pointwise and uniform modes of convergence can be affected if one modifies {f_n} or {f} even on a single point.

Remark 1 In the context of probability theory, in which {f_n} and {f} are interpreted as random variables, convergence in {L^1} norm is often referred to as convergence in mean, pointwise convergence almost everywhere is often referred to as almost sure convergence, and convergence in measure is often referred to as convergence in probability.

Exercise 2 (Linearity of convergence) Let {(X, {\mathcal B}, \mu)} be a measure space, let {f_n, g_n: X \rightarrow {\bf C}} be sequences of measurable functions, and let {f, g: X \rightarrow {\bf C}} be measurable functions.

  1. Show that {f_n} converges to {f} along one of the above seven modes of convergence if and only if {|f_n-f|} converges to {0} along the same mode.
  2. If {f_n} converges to {f} along one of the above seven modes of convergence, and {g_n} converges to {g} along the same mode, show that {f_n+g_n} converges to {f+g} along the same mode, and that {cf_n} converges to {cf} along the same mode for any {c \in {\bf C}}.
  3. (Squeeze test) If {f_n} converges to {0} along one of the above seven modes, and {|g_n| \leq f_n} pointwise for each {n}, show that {g_n} converges to {0} along the same mode.

We have some easy implications between modes:

Exercise 3 (Easy implications) Let {(X, {\mathcal B}, \mu)} be a measure space, and let {f_n: X \rightarrow {\bf C}} and {f: X \rightarrow {\bf C}} be measurable functions.

  1. If {f_n} converges to {f} uniformly, then {f_n} converges to {f} pointwise.
  2. If {f_n} converges to {f} uniformly, then {f_n} converges to {f} in {L^\infty} norm. Conversely, if {f_n} converges to {f} in {L^\infty} norm, then {f_n} converges to {f} uniformly outside of a null set (i.e. there exists a null set {E} such that the restriction {f_n\downharpoonright_{X \backslash E}} of {f_n} to the complement of {E} converges uniformly to the restriction {f\downharpoonright_{X \backslash E}} of {f}).
  3. If {f_n} converges to {f} in {L^\infty} norm, then {f_n} converges to {f} almost uniformly.
  4. If {f_n} converges to {f} almost uniformly, then {f_n} converges to {f} pointwise almost everywhere.
  5. If {f_n} converges to {f} pointwise, then {f_n} converges to {f} pointwise almost everywhere.
  6. If {f_n} converges to {f} in {L^1} norm, then {f_n} converges to {f} in measure.
  7. If {f_n} converges to {f} almost uniformly, then {f_n} converges to {f} in measure.

The reader is encouraged to draw a diagram that summarises the logical implications between the seven modes of convergence that the above exercise describes.

We give four key examples that distinguish between these modes, in the case when {X} is the real line {{\bf R}} with Lebesgue measure. The first three of these examples were already introduced in the previous set of notes.

Example 4 (Escape to horizontal infinity) Let {f_n := 1_{[n,n+1]}}. Then {f_n} converges to zero pointwise (and thus, pointwise almost everywhere), but not uniformly, in {L^\infty} norm, almost uniformly, in {L^1} norm, or in measure.

Example 5 (Escape to width infinity) Let {f_n := \frac{1}{n} 1_{[0,n]}}. Then {f_n} converges to zero uniformly (and thus, pointwise, pointwise almost everywhere, in {L^\infty} norm, almost uniformly, and in measure), but not in {L^1} norm.

Example 6 (Escape to vertical infinity) Let {f_n := n 1_{[\frac{1}{n}, \frac{2}{n}]}}. Then {f_n} converges to zero pointwise (and thus, pointwise almost everywhere) and almost uniformly (and hence in measure), but not uniformly, in {L^\infty} norm, or in {L^1} norm.
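The three “escape to infinity” examples above can be compared side by side with exact arithmetic. The sketch below is only an illustration: the class name `Bump` and the helper names `horizontal`, `width` and `vertical` are hypothetical labels chosen here, and each function is modelled as a single multiple of an indicator function of an interval of positive length.

```python
from fractions import Fraction
from typing import NamedTuple

class Bump(NamedTuple):
    """The function height * 1_[left, right] on the real line (with left < right)."""
    height: Fraction
    left: Fraction
    right: Fraction

    def l1_norm(self) -> Fraction:
        """Integral of |f| with respect to Lebesgue measure."""
        return abs(self.height) * (self.right - self.left)

    def sup_norm(self) -> Fraction:
        """Supremum (and also essential supremum) of |f|."""
        return abs(self.height)

    def measure_where_at_least(self, eps: Fraction) -> Fraction:
        """Lebesgue measure of {x : |f(x)| >= eps}, for eps > 0."""
        return self.right - self.left if abs(self.height) >= eps else Fraction(0)

def horizontal(n: int) -> Bump:   # Example 4: 1_[n, n+1]
    return Bump(Fraction(1), Fraction(n), Fraction(n + 1))

def width(n: int) -> Bump:        # Example 5: (1/n) 1_[0, n]
    return Bump(Fraction(1, n), Fraction(0), Fraction(n))

def vertical(n: int) -> Bump:     # Example 6: n 1_[1/n, 2/n]
    return Bump(Fraction(n), Fraction(1, n), Fraction(2, n))
```

For instance, `width(n).sup_norm()` is {1/n} while `width(n).l1_norm()` stays equal to {1}, matching the claim that Example 5 converges uniformly but not in {L^1} norm; and `vertical(n).measure_where_at_least(Fraction(1, 2))` is {1/n}, consistent with convergence in measure in Example 6.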

Example 7 (Typewriter sequence) Let {f_n} be defined by the formula

\displaystyle  f_n := 1_{[\frac{n-2^k}{2^k}, \frac{n-2^k+1}{2^k}]}

whenever {k \geq 0} and {2^k \leq n < 2^{k+1}}. This is a sequence of indicator functions of intervals of decreasing length, marching across the unit interval {[0,1]} over and over again. Then {f_n} converges to zero in measure and in {L^1} norm, but not pointwise almost everywhere (and hence also not pointwise, not almost uniformly, nor in {L^\infty} norm, nor uniformly).
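To make the typewriter sequence concrete, here is a minimal sketch that computes the interval attached to each {n} and evaluates {f_n} at a point. The function names `typewriter_interval`, `f` and `l1_norm` are hypothetical, and the sample point {x = 1/3} is an arbitrary non-dyadic choice.

```python
from fractions import Fraction

def typewriter_interval(n: int):
    """Endpoints of the interval whose indicator function is f_n, for n >= 1."""
    k = n.bit_length() - 1              # the unique k >= 0 with 2^k <= n < 2^(k+1)
    left = Fraction(n - 2**k, 2**k)
    return left, left + Fraction(1, 2**k)

def f(n: int, x: Fraction) -> int:
    """Value of f_n at the point x."""
    left, right = typewriter_interval(n)
    return 1 if left <= x <= right else 0

def l1_norm(n: int) -> Fraction:
    """L^1 norm of f_n, i.e. the length 2^(-k) of its interval."""
    left, right = typewriter_interval(n)
    return right - left

x = Fraction(1, 3)
# In each block 2^k <= n < 2^(k+1), count how many f_n equal 1 at x.
hits_per_block = [sum(f(n, x) for n in range(2**k, 2**(k + 1))) for k in range(10)]
```

Every entry of `hits_per_block` is at least {1}, so {f_n(x) = 1} for infinitely many {n}; since most {f_n(x)} in each later block are {0}, the values {f_n(x)} do not converge, even though `l1_norm(n)` tends to zero.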

Remark 8 The {L^\infty} norm {\|f\|_{L^\infty(\mu)}} of a measurable function {f: X \rightarrow {\bf C}} is defined to be the infimum of all the quantities {M \in [0,+\infty]} that are essential upper bounds for {f} in the sense that {|f(x)| \leq M} for almost every {x}. Then {f_n} converges to {f} in {L^\infty} norm if and only if {\|f_n-f\|_{L^\infty(\mu)} \rightarrow 0} as {n \rightarrow \infty}. The {L^\infty} and {L^1} norms are part of the larger family of {L^p} norms, which we will study in more detail in 245B.

One particular advantage of {L^1} convergence is that, in the case when the {f_n} are absolutely integrable, it implies convergence of the integrals,

\displaystyle  \int_X f_n\ d\mu \rightarrow \int_X f\ d\mu,

as one sees from the triangle inequality. Unfortunately, none of the other modes of convergence automatically imply this convergence of the integral, as the above examples show.
The purpose of these notes is to compare these modes of convergence with each other. Unfortunately, the relationship between these modes is not particularly simple; unlike the situation with pointwise and uniform convergence, one cannot simply rank these modes in a linear order from strongest to weakest. This is ultimately because the different modes react in different ways to the three “escape to infinity” scenarios described above, as well as to the “typewriter” behaviour when a single set is “overwritten” many times. On the other hand, if one imposes some additional assumptions to shut down one or more of these escape to infinity scenarios, such as a finite measure hypothesis {\mu(X) < \infty} or a uniform integrability hypothesis, then one can obtain some additional implications between the different modes.


Thus far, we have only focused on measure and integration theory in the context of Euclidean spaces {{\bf R}^d}. Now, we will work in a more abstract and general setting, in which the Euclidean space {{\bf R}^d} is replaced by a more general space {X}.

It turns out that in order to properly define measure and integration on a general space {X}, it is not enough to just specify the set {X}. One also needs to specify two additional pieces of data:

  1. A collection {{\mathcal B}} of subsets of {X} that one is allowed to measure; and
  2. The measure {\mu(E) \in [0,+\infty]} one assigns to each measurable set {E \in {\mathcal B}}.

For instance, Lebesgue measure theory covers the case when {X} is a Euclidean space {{\bf R}^d}, {{\mathcal B}} is the collection {{\mathcal B} = {\mathcal L}[{\bf R}^d]} of all Lebesgue measurable subsets of {{\bf R}^d}, and {\mu(E)} is the Lebesgue measure {\mu(E)=m(E)} of {E}.

The collection {{\mathcal B}} has to obey a number of axioms (e.g. being closed with respect to countable unions) that make it a {\sigma}-algebra, which is a stronger variant of the more well-known concept of a boolean algebra. Similarly, the measure {\mu} has to obey a number of axioms (most notably, a countable additivity axiom) in order to obtain a measure and integration theory comparable to the Lebesgue theory on Euclidean spaces. When all these axioms are satisfied, the triple {(X, {\mathcal B}, \mu)} is known as a measure space. These play much the same role in abstract measure theory that metric spaces or topological spaces play in abstract point-set topology, or that vector spaces play in abstract linear algebra.

On any measure space, one can set up the unsigned and absolutely convergent integrals in almost exactly the same way as was done in the previous notes for the Lebesgue integral on Euclidean spaces, although the approximation theorems are largely unavailable at this level of generality due to the lack of such concepts as “elementary set” or “continuous function” for an abstract measure space. On the other hand, one does have the fundamental convergence theorems for the subject, namely Fatou’s lemma, the monotone convergence theorem and the dominated convergence theorem, and we present these results here.

One question that will not be addressed much in this current set of notes is how one actually constructs interesting examples of measures. We will discuss this issue more in later notes (although one of the most powerful tools for such constructions, namely the Riesz representation theorem, will not be covered until 245B).


In the previous notes, we defined the Lebesgue measure {m(E)} of a Lebesgue measurable set {E \subset {\bf R}^d}, and set out the basic properties of this measure. In this set of notes, we use Lebesgue measure to define the Lebesgue integral

\displaystyle \int_{{\bf R}^d} f(x)\ dx

of functions {f: {\bf R}^d \rightarrow {\bf C} \cup \{\infty\}}. Just as not every set can be measured by Lebesgue measure, not every function can be integrated by the Lebesgue integral; the function will need to be Lebesgue measurable. Furthermore, the function will either need to be unsigned (taking values in {[0,+\infty]}), or absolutely integrable.

To motivate the Lebesgue integral, let us first briefly review two simpler integration concepts. The first is that of an infinite summation

\displaystyle \sum_{n=1}^\infty c_n

of a sequence of numbers {c_n}, which can be viewed as a discrete analogue of the Lebesgue integral. Actually, there are two overlapping, but different, notions of summation that we wish to recall here. The first is that of the unsigned infinite sum, when the {c_n} lie in the extended non-negative real axis {[0,+\infty]}. In this case, the infinite sum can be defined as the limit of the partial sums

\displaystyle \sum_{n=1}^\infty c_n = \lim_{N \rightarrow \infty} \sum_{n=1}^N c_n \ \ \ \ \ (1)

or equivalently as a supremum of arbitrary finite partial sums:

\displaystyle \sum_{n=1}^\infty c_n = \sup_{A \subset {\bf N}, A \hbox{ finite}} \sum_{n \in A} c_n. \ \ \ \ \ (2)

The unsigned infinite sum {\sum_{n=1}^\infty c_n} always exists, but its value may be infinite, even when each term is individually finite (consider e.g. {\sum_{n=1}^\infty 1}).

The second notion of a summation is the absolutely summable infinite sum, in which the {c_n} lie in the complex plane {{\bf C}} and obey the absolute summability condition

\displaystyle \sum_{n=1}^\infty |c_n| < \infty,

where the left-hand side is of course an unsigned infinite sum. When this occurs, one can show that the partial sums {\sum_{n=1}^N c_n} converge to a limit, and we can then define the infinite sum by the same formula (1) as in the unsigned case, though now the sum takes values in {{\bf C}} rather than {[0,+\infty]}. The absolute summability condition confers a number of useful properties that are not obeyed by sums that are merely conditionally convergent; most notably, the value of an absolutely convergent sum is unchanged if one rearranges the terms in the series in an arbitrary fashion. Note also that the absolutely summable infinite sums can be defined in terms of the unsigned infinite sums by taking advantage of the formulae

\displaystyle \sum_{n=1}^\infty c_n = (\sum_{n=1}^\infty \hbox{Re}(c_n)) + i (\sum_{n=1}^\infty \hbox{Im}(c_n))

for complex absolutely summable {c_n}, and

\displaystyle \sum_{n=1}^\infty c_n = \sum_{n=1}^\infty c_n^+ - \sum_{n=1}^\infty c_n^-

for real absolutely summable {c_n}, where {c_n^+ := \max(c_n,0)} and {c_n^- := \max(-c_n,0)} are the (magnitudes of the) positive and negative parts of {c_n}.
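As a small illustration of the decomposition into positive and negative parts, consider the hypothetical example {c_n := (-1/2)^n}, which is real and absolutely summable with sum {-1/3}. The sketch below truncates all sums at a level {N} chosen here; it is a floating-point check, not a proof.

```python
def c(n: int) -> float:
    """Hypothetical example sequence c_n = (-1/2)^n, which is absolutely summable."""
    return (-0.5) ** n

def positive_part(t: float) -> float:
    """c^+ := max(c, 0)."""
    return max(t, 0.0)

def negative_part(t: float) -> float:
    """c^- := max(-c, 0)."""
    return max(-t, 0.0)

N = 200  # truncation level; the neglected tail is at most 2^(-200) in absolute value

direct = sum(c(n) for n in range(1, N + 1))                 # approximates the signed sum
plus = sum(positive_part(c(n)) for n in range(1, N + 1))    # unsigned sum of the c_n^+
minus = sum(negative_part(c(n)) for n in range(1, N + 1))   # unsigned sum of the c_n^-
```

Here `plus` is close to {1/3} and `minus` is close to {2/3}, and their difference agrees with `direct`, which is close to {-1/3}.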

In an analogous spirit, we will first define an unsigned Lebesgue integral {\int_{{\bf R}^d} f(x)\ dx} of (measurable) unsigned functions {f: {\bf R}^d \rightarrow [0,+\infty]}, and then use that to define the absolutely convergent Lebesgue integral {\int_{{\bf R}^d} f(x)\ dx} of absolutely integrable functions {f: {\bf R}^d \rightarrow {\bf C} \cup \{\infty\}}. (In contrast to absolutely summable series, which cannot have any infinite terms, absolutely integrable functions will be allowed to occasionally become infinite. However, as we will see, this can only happen on a set of Lebesgue measure zero.)

To define the unsigned Lebesgue integral, we now turn to another more basic notion of integration, namely the Riemann integral {\int_a^b f(x)\ dx} of a Riemann integrable function {f: [a,b] \rightarrow {\bf R}}. Recall from the prologue that this integral is equal to the lower Darboux integral

\displaystyle \int_a^b f(x)\ dx = \underline{\int_a^b} f(x)\ dx := \sup_{g \leq f; g \hbox{ piecewise constant}} \hbox{p.c.} \int_a^b g(x)\ dx.

(It is also equal to the upper Darboux integral; but much as the theory of Lebesgue measure is easiest to define by relying solely on outer measure and not on inner measure, the theory of the unsigned Lebesgue integral is easiest to define by relying solely on lower integrals rather than upper ones; the upper integral is somewhat problematic when dealing with “improper” integrals of functions that are unbounded or are supported on sets of infinite measure.) Compare this formula also with (2). The integral {\hbox{p.c.} \int_a^b g(x)\ dx} is a piecewise constant integral, formed by breaking up the piecewise constant function {g} into a finite linear combination of indicator functions of intervals, and then measuring the length of each interval.
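To see the lower Darboux integral in action, here is a minimal sketch for the illustrative choice {f(x) = x^2} on {[0,1]} (an assumption made here, not an example from the notes). For each {m}, the piecewise constant minorant {g} takes the value {(j/m)^2} on {[j/m, (j+1)/m)}, and its piecewise constant integral is computed exactly; the function name `lower_sum` is hypothetical.

```python
from fractions import Fraction

def lower_sum(m: int) -> Fraction:
    """p.c. integral of the minorant of x^2 that equals (j/m)^2 on [j/m, (j+1)/m).

    Since x^2 is increasing on [0, 1], the value at the left endpoint of each
    piece is the infimum of x^2 on that piece, so this g satisfies g <= f.
    """
    return sum(Fraction(j, m) ** 2 * Fraction(1, m) for j in range(m))

# Lower sums for a few partition sizes; they increase towards the integral 1/3.
samples = {m: lower_sum(m) for m in (1, 10, 100, 1000)}
```

The gap {1/3 - \hbox{lower\_sum}(m)} works out to {(3m-1)/(6m^2)}, which tends to zero, so the supremum over minorants recovers the familiar value {1/3} in this example.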

It turns out that virtually the same definition allows us to define a lower Lebesgue integral {\underline{\int_{{\bf R}^d}} f(x)\ dx} of any unsigned function {f: {\bf R}^d \rightarrow [0,+\infty]}, simply by replacing intervals with the more general class of Lebesgue measurable sets (and thus replacing piecewise constant functions with the more general class of simple functions). If the function is Lebesgue measurable (a concept that we will define presently), then we refer to the lower Lebesgue integral simply as the Lebesgue integral. As we shall see, it obeys all the basic properties one expects of an integral, such as monotonicity and additivity; in subsequent notes we will also see that it behaves quite well with respect to limits, by establishing the two basic convergence theorems of the unsigned Lebesgue integral, namely Fatou’s lemma and the monotone convergence theorem.

Once we have the theory of the unsigned Lebesgue integral, we will then be able to define the absolutely convergent Lebesgue integral, similarly to how the absolutely convergent infinite sum can be defined using the unsigned infinite sum. This integral also obeys all the basic properties one expects, such as linearity and compatibility with the more classical Riemann integral; in subsequent notes we will see that it also obeys a fundamentally important convergence theorem, the dominated convergence theorem. This convergence theorem makes the Lebesgue integral (and its abstract generalisations to other measure spaces than {{\bf R}^d}) particularly suitable for analysis, as well as allied fields that rely heavily on limits of functions, such as PDE, probability, and ergodic theory.

Remark 1 This is not the only route to setting up the unsigned and absolutely convergent Lebesgue integrals. Stein-Shakarchi, for instance, proceeds slightly differently, beginning with the unsigned integral but then making an auxiliary stop at integration of functions that are bounded and are supported on a set of finite measure, before going to the absolutely convergent Lebesgue integral. Another approach (which will not be discussed here) is to take the metric completion of the Riemann integral with respect to the {L^1} metric.

The Lebesgue integral and Lebesgue measure can be viewed as completions of the Riemann integral and Jordan measure respectively. This means three things. Firstly, the Lebesgue theory extends the Riemann theory: every Jordan measurable set is Lebesgue measurable, and every Riemann integrable function is Lebesgue measurable, with the measures and integrals from the two theories being compatible. Conversely, the Lebesgue theory can be approximated by the Riemann theory; as we saw in the previous notes, every Lebesgue measurable set can be approximated (in various senses) by simpler sets, such as open sets or elementary sets, and in a similar fashion, Lebesgue measurable functions can be approximated by nicer functions, such as Riemann integrable or continuous functions. Finally, the Lebesgue theory is complete in various ways; we will formalise this properly only in the next quarter when we study {L^p} spaces, but the convergence theorems mentioned above already hint at this completeness. A related fact, known as Egorov’s theorem, asserts that a pointwise converging sequence of functions can be approximated as a (locally) uniformly converging sequence of functions. The facts listed here are manifestations of Littlewood’s three principles of real analysis, which capture much of the essence of the Lebesgue theory.

In the prologue for this course, we recalled the classical theory of Jordan measure on Euclidean spaces {{\bf R}^d}. This theory proceeded in the following stages:

  1. First, one defined the notion of a box {B} and its volume {|B|}.
  2. Using this, one defined the notion of an elementary set {E} (a finite union of boxes), and defined the elementary measure {m(E)} of such sets.
  3. From this, one defined the inner and outer Jordan measures {m_{*,(J)}(E), m^{*,(J)}(E)} of an arbitrary bounded set {E \subset {\bf R}^d}. If those measures match, we say that {E} is Jordan measurable, and call {m(E) = m_{*,(J)}(E) = m^{*,(J)}(E)} the Jordan measure of {E}.

As long as one is lucky enough to only have to deal with Jordan measurable sets, the theory of Jordan measure works well enough. However, as noted previously, not all sets are Jordan measurable, even if one restricts attention to bounded sets. In fact, we shall see later in these notes that there even exist bounded open sets, or compact sets, which are not Jordan measurable, so the Jordan theory does not cover many classes of sets of interest. Another class that it fails to cover is countable unions or intersections of sets that are already known to be measurable:

Exercise 1 Show that the countable union {\bigcup_{n=1}^\infty E_n} or countable intersection {\bigcap_{n=1}^\infty E_n} of Jordan measurable sets {E_1, E_2, \ldots \subset {\bf R}} need not be Jordan measurable, even when bounded.

This creates problems with Riemann integrability (which, as we saw in the preceding notes, was closely related to Jordan measure) and pointwise limits:

Exercise 2 Give an example of a sequence of uniformly bounded, Riemann integrable functions {f_n: [0,1] \rightarrow {\bf R}} for {n=1,2,\ldots} that converge pointwise to a bounded function {f: [0,1] \rightarrow {\bf R}} that is not Riemann integrable. What happens if we replace pointwise convergence with uniform convergence?

These issues can be rectified by using a more powerful notion of measure than Jordan measure, namely Lebesgue measure. To define this measure, we first tinker with the notion of the Jordan outer measure

\displaystyle  m^{*,(J)}(E) := \inf_{B \supset E; B \hbox{ elementary}} m(B)

of a set {E \subset {\bf R}^d} (we adopt the convention that {m^{*,(J)}(E) = +\infty} if {E} is unbounded, thus {m^{*,(J)}} now takes values in the extended non-negative reals {[0,+\infty]}, whose properties we will briefly review below). Observe from the finite additivity and subadditivity of elementary measure that we can also write the Jordan outer measure as

\displaystyle  m^{*,(J)}(E) = \inf_{B_1 \cup \ldots \cup B_k \supset E; B_1,\ldots, B_k \hbox{ boxes}} |B_1| + \ldots + |B_k|,

i.e. the Jordan outer measure is the infimal cost required to cover {E} by a finite union of boxes. (The natural number {k} is allowed to vary freely in the above infimum.) We now modify this by replacing the finite union of boxes by a countable union of boxes, leading to the Lebesgue outer measure {m^*(E)} of {E}:

\displaystyle  m^*(E) := \inf_{\bigcup_{n=1}^\infty B_n \supset E; B_1, B_2, \ldots \hbox{ boxes}} \sum_{n=1}^\infty |B_n|,

thus the Lebesgue outer measure is the infimal cost required to cover {E} by a countable union of boxes. Note that the countable sum {\sum_{n=1}^\infty |B_n|} may be infinite, and so the Lebesgue outer measure {m^*(E)} could well equal {+\infty}.
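The subadditivity of elementary measure invoked above can be illustrated in one dimension with a short Python sketch: the elementary measure of a finite union of intervals never exceeds the total cost {|B_1| + \ldots + |B_k|} of the cover, with equality when the intervals do not overlap. The representation of boxes as `(left, right)` pairs and the function names are hypothetical conveniences, not notation from these notes.

```python
def elementary_measure(intervals):
    """Total length of a finite union of intervals (left, right) on the
    real line, computed by merging overlapping intervals."""
    total = 0
    current_left = current_right = None
    for left, right in sorted(intervals):
        if current_right is None or left > current_right:
            # Disjoint from the running interval: bank its length, start anew.
            if current_right is not None:
                total += current_right - current_left
            current_left, current_right = left, right
        else:
            # Overlapping or touching: extend the running interval.
            current_right = max(current_right, right)
    if current_right is not None:
        total += current_right - current_left
    return total

def covering_cost(intervals):
    """The cost |B_1| + ... + |B_k| of a cover, counting overlaps repeatedly."""
    return sum(right - left for left, right in intervals)

boxes = [(0, 2), (1, 3), (5, 6)]
print(elementary_measure(boxes), covering_cost(boxes))
```

Here the union is {[0,3] \cup [5,6]}, so the overlap between the first two intervals is counted once by the elementary measure but twice by the covering cost.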

(Caution: the Lebesgue outer measure {m^*(E)} is sometimes denoted {m_*(E)}; this is for instance the case in Stein-Shakarchi.)

Clearly, we always have {m^*(E) \leq m^{*,(J)}(E)} (since we can always pad out a finite union of boxes into an infinite union by adding an infinite number of empty boxes). But {m^*(E)} can be a lot smaller:

Example 1 Let {E = \{ x_1, x_2, x_3, \ldots \} \subset {\bf R}^d} be a countable set. We know that the Jordan outer measure of {E} can be quite large; for instance, in one dimension, {m^{*,(J)}({\bf Q})} is infinite, and {m^{*,(J)}({\bf Q} \cap [-R,R]) = m^{*,(J)}([-R,R]) = 2R} since {{\bf Q} \cap [-R,R]} has {[-R,R]} as its closure (see Exercise 18 of the prologue). On the other hand, all countable sets {E} have Lebesgue outer measure zero. Indeed, one simply covers {E} by the degenerate boxes {\{x_1\}, \{x_2\}, \ldots} of sidelength and volume zero.

Alternatively, if one does not like degenerate boxes, one can cover each {x_n} by a cube {B_n} of sidelength {\epsilon/2^n} (say) for some arbitrary {\epsilon > 0}, leading to a total cost of {\sum_{n=1}^\infty (\epsilon/2^n)^d}, which converges to {C_d \epsilon^d} for some constant {C_d} depending only on {d} (in fact {C_d = 1/(2^d-1)}, by the geometric series formula). As {\epsilon} can be arbitrarily small, we see that the Lebesgue outer measure must be zero. We will refer to this type of trick as the {\epsilon/2^n} trick; it will be used many further times in this course.
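The arithmetic behind the {\epsilon/2^n} trick can be checked with a small Python sketch, which computes the cost of covering the first {N} points by cubes of sidelength {\epsilon/2^n} and compares it with the limiting value {\epsilon^d/(2^d-1)} given by the geometric series formula. The function names and the sample values of {\epsilon}, {d} and {N} are illustrative assumptions.

```python
from fractions import Fraction

def cover_cost(epsilon, d, num_points):
    """Total volume of cubes of sidelength epsilon / 2^n, n = 1..num_points,
    in dimension d."""
    return sum((epsilon / 2 ** n) ** d for n in range(1, num_points + 1))

def limiting_cost(epsilon, d):
    """Sum of the full geometric series: epsilon^d / (2^d - 1)."""
    return epsilon ** d / (2 ** d - 1)

# In every dimension the cost stays below the limiting value, which
# itself tends to zero as epsilon does.
for d in (1, 2, 3):
    eps = Fraction(1, 10)
    print(d, float(cover_cost(eps, d, 50)), float(limiting_cost(eps, d)))
```

Since the partial sums are bounded by {\epsilon^d/(2^d-1)} uniformly in {N}, the same bound holds for the full countable cover.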

From this example we see in particular that a set may be unbounded while still having Lebesgue outer measure zero, in contrast to Jordan outer measure.

As we shall see later in this course, Lebesgue outer measure (also known as Lebesgue exterior measure) is a special case of a more general concept known as an outer measure.

In analogy with the Jordan theory, we would also like to define a concept of “Lebesgue inner measure” to complement that of outer measure. Here, there is an asymmetry (which ultimately arises from the fact that elementary measure is subadditive rather than superadditive): one does not gain any increase in power in the Jordan inner measure by replacing finite unions of boxes with countable ones. But one can get a sort of Lebesgue inner measure by taking complements; see Exercise 18. This leads to one possible definition for Lebesgue measurability, namely the Carathéodory criterion for Lebesgue measurability, see Exercise 17. However, this is not the most intuitive formulation of this concept to work with, and we will instead use a different (but logically equivalent) definition of Lebesgue measurability. The starting point is the observation (see Exercise 5 of the prologue) that Jordan measurable sets can be efficiently contained in elementary sets, with an error that has small Jordan outer measure. In a similar vein, we will define Lebesgue measurable sets to be sets that can be efficiently contained in open sets, with an error that has small Lebesgue outer measure:

Definition 1 (Lebesgue measurability) A set {E \subset {\bf R}^d} is said to be Lebesgue measurable if, for every {\epsilon > 0}, there exists an open set {U \subset {\bf R}^d} containing {E} such that {m^*(U \backslash E) \leq \epsilon}. If {E} is Lebesgue measurable, we refer to {m(E) := m^*(E)} as the Lebesgue measure of {E} (note that this quantity may be equal to {+\infty}). We also write {m(E)} as {m^d(E)} when we wish to emphasise the dimension {d}.

(The intuition that measurable sets are almost open is also known as Littlewood’s first principle; this principle is a triviality with our current choice of definitions, though less so if one uses other, equivalent, definitions of Lebesgue measurability.)

As we shall see later, Lebesgue measure extends Jordan measure, in the sense that every Jordan measurable set is Lebesgue measurable, and the Lebesgue measure and Jordan measure of a Jordan measurable set are always equal. We will also see a few other equivalent descriptions of the concept of Lebesgue measurability.

In the notes below we will establish the basic properties of Lebesgue measure. Broadly speaking, this concept obeys all the intuitive properties one would ask of measure, so long as one restricts attention to countable operations rather than uncountable ones, and as long as one restricts attention to Lebesgue measurable sets. The latter is not a serious restriction in practice, as almost every set one actually encounters in analysis will be measurable (the main exceptions being some pathological sets that are constructed using the axiom of choice). In the next set of notes we will use Lebesgue measure to set up the Lebesgue integral, which extends the Riemann integral in the same way that Lebesgue measure extends Jordan measure; and the many pleasant properties of Lebesgue measure will be reflected in analogous pleasant properties of the Lebesgue integral (most notably the convergence theorems).

We will treat all dimensions {d=1,2,\ldots} equally here, but for the purposes of drawing pictures, we recommend to the reader that one sets {d} equal to {2}. However, for this topic at least, no additional mathematical difficulties will be encountered in the higher-dimensional case (though of course there are significant visual difficulties once {d} exceeds {3}).

The material here is based on Sections 1.1-1.3 of the Stein-Shakarchi text, though it is arranged somewhat differently.

One of the most fundamental concepts in Euclidean geometry is that of the measure {m(E)} of a solid body {E} in one or more dimensions. In one, two, and three dimensions, we refer to this measure as the length, area, or volume of {E} respectively. In the classical approach to geometry, the measure of a body was often computed by partitioning that body into finitely many components, moving around each component by a rigid motion (e.g. a translation or rotation), and then reassembling those components to form a simpler body which presumably has the same area. One could also obtain lower and upper bounds on the measure of a body by computing the measure of some inscribed or circumscribed body; this ancient idea goes all the way back to the work of Archimedes at least. Such arguments can be justified by an appeal to geometric intuition, or simply by postulating the existence of a measure {m(E)} that can be assigned to all solid bodies {E}, and which obeys a collection of geometrically reasonable axioms. One can also justify the concept of measure on “physical” or “reductionistic” grounds, viewing the measure of a macroscopic body as the sum of the measures of its microscopic components.

With the advent of analytic geometry, however, Euclidean geometry became reinterpreted as the study of Cartesian products {{\bf R}^d} of the real line {{\bf R}}. Using this analytic foundation rather than the classical geometrical one, it was no longer intuitively obvious how to define the measure {m(E)} of a general subset {E} of {{\bf R}^d}; we will refer to this (somewhat vaguely defined) problem of writing down the “correct” definition of measure as the problem of measure. (One can also pose the problem of measure on other domains than Euclidean space, such as a Riemannian manifold, but we will focus on the Euclidean case here for simplicity.)

To see why this problem exists at all, let us try to formalise some of the intuition for measure discussed earlier. The physical intuition of defining the measure of a body {E} to be the sum of the measure of its component “atoms” runs into an immediate problem: a typical solid body would consist of an infinite (and uncountable) number of points, each of which has a measure of zero; and the product {\infty \cdot 0} is indeterminate. To make matters worse, two bodies that have exactly the same number of points need not have the same measure. For instance, in one dimension, the intervals {A := [0,1]} and {B := [0,2]} are in one-to-one correspondence (using the bijection {x \mapsto 2x} from {A} to {B}), but of course {B} is twice as long as {A}. So one can disassemble {A} into an uncountable number of points and reassemble them to form a set of twice the length.

Of course, one can point to the infinite (and uncountable) number of components in this disassembly as being the cause of this breakdown of intuition, and restrict attention to just finite partitions. But one still runs into trouble here for a number of reasons, the most striking of which is the Banach-Tarski paradox, which shows that the unit ball {B := \{ (x,y,z) \in {\bf R}^3: x^2+y^2+z^2 \leq 1 \}} in three dimensions can be disassembled into a finite number of pieces (in fact, just five pieces suffice), which can then be reassembled (after translating and rotating each of the pieces) to form two disjoint copies of the ball {B}. (The paradox only works in three dimensions and higher, for reasons having to do with the property of amenability; see this blog post for further discussion of this interesting topic, which is unfortunately too much of a digression from the current subject.)

Here, the problem is that the pieces used in this decomposition are highly pathological in nature; among other things, their construction requires use of the axiom of choice. (This is in fact necessary; there are models of set theory without the axiom of choice in which the Banach-Tarski paradox does not occur, thanks to a famous theorem of Solovay.) Such pathological sets almost never come up in practical applications of mathematics. Because of this, the standard solution to the problem of measure has been to abandon the goal of measuring every subset {E} of {{\bf R}^d}, and instead to settle for only measuring a certain subclass of “non-pathological” subsets of {{\bf R}^d}, which are then referred to as the measurable sets. The problem of measure then divides into several subproblems:

  1. What does it mean for a subset {E} of {{\bf R}^d} to be measurable?
  2. If a set {E} is measurable, how does one define its measure?
  3. What nice properties or axioms does measure (or the concept of measurability) obey?
  4. Are “ordinary” sets such as cubes, balls, polyhedra, etc. measurable?
  5. Does the measure of an “ordinary” set equal the “naive geometric measure” of such sets? (e.g. is the measure of an {a \times b} rectangle equal to {ab}?)

These questions are somewhat open-ended in formulation, and there is no unique answer to them; in particular, one can expand the class of measurable sets at the expense of losing one or more nice properties of measure in the process (e.g. finite or countable additivity, translation invariance, or rotation invariance). However, there are two basic answers which, between them, suffice for most applications. The first is the concept of Jordan measure of a Jordan measurable set, which is a concept closely related to that of the Riemann integral (or Darboux integral). This concept is elementary enough to be systematically studied in an undergraduate analysis course, and suffices for measuring most of the “ordinary” sets (e.g. the area under the graph of a continuous function) in many branches of mathematics. However, when one turns to the type of sets that arise in analysis, and in particular those sets that arise as limits (in various senses) of other sets, it turns out that the Jordan concept of measurability is not quite adequate, and must be extended to the more general notion of Lebesgue measurability, with the corresponding notion of Lebesgue measure that extends Jordan measure. With the Lebesgue theory (which can be viewed as a completion of the Jordan-Darboux-Riemann theory), one keeps almost all of the desirable properties of Jordan measure, but with the crucial additional property that many features of the Lebesgue theory are preserved under limits (as exemplified in the fundamental convergence theorems of the Lebesgue theory, such as the monotone convergence theorem and the dominated convergence theorem, which do not hold in the Jordan-Darboux-Riemann setting). As such, they are particularly well suited for applications in analysis, where limits of functions or sets arise all the time. 
(There are other ways to extend Jordan measure and the Riemann integral, but the Lebesgue approach handles limits better than the other alternatives, and so has become the standard approach in analysis.)

In the rest of the course, we will formally define Lebesgue measure and the Lebesgue integral, as well as the more general concept of an abstract measure space and the associated integration operation. In the rest of this post, we will discuss the more elementary concepts of Jordan measure and the Riemann integral. This material will eventually be superseded by the more powerful theory to be treated in the main body of the course; but it will serve as motivation for that later material, as well as providing some continuity with the treatment of measure and integration in undergraduate analysis courses.
