Now that we have reviewed the foundations of measure theory, let us put it to work to set up the basic theory of one of the fundamental families of function spaces in analysis, namely the L^p spaces (also known as Lebesgue spaces). These spaces serve as important model examples for the general theory of topological and normed vector spaces, which we will discuss a little bit in this lecture and then in much greater detail in later lectures. (See also my previous blog post on function spaces.)

Just as scalar quantities live in the space of real or complex numbers, and vector quantities live in vector spaces, functions f: X \to {\Bbb C} (or other objects closely related to functions, such as measures) live in function spaces. Like other spaces in mathematics (e.g. vector spaces, metric spaces, topological spaces, etc.) a function space V is not just a mere set of objects (in this case, the objects are functions), but also comes with various important structures that allow one to do some useful operations inside these spaces, and from one space to another. For example, function spaces tend to have several (though usually not all) of the following types of structures, which are usually related to each other by various compatibility conditions:

  1. Vector space structure. One can often add two functions f, g in a function space V, and expect to get another function f+g in that space V; similarly, one can multiply a function f in V by a scalar c and get another function cf in V. Usually, these operations obey the axioms of a vector space, though it is important to caution that the dimension of a function space is typically infinite. (In some cases, the space of scalars is a more complicated ring than the real or complex field, in which case we need the notion of a module rather than a vector space, but we will not use this more general notion in this course.) Virtually all of the function spaces we shall encounter in this course will be vector spaces. Because the field of scalars is real or complex, vector spaces also come with the notion of convexity, which turns out to be crucial in many aspects of analysis. As a consequence (and in marked contrast to algebra or number theory), much of the theory in real analysis does not seem to extend to other fields of scalars (in particular, real analysis fails spectacularly in the finite characteristic setting).
  2. Algebra structure. Sometimes (though not always), we also wish to multiply two functions f, g in V and get another function fg in V; when combined with the vector space structure and assuming some compatibility conditions (e.g. the distributive law), this makes V an algebra. This multiplication operation is often just pointwise multiplication, but there are other important multiplication operations on function spaces too, such as convolution. (One sometimes sees other algebraic structures than multiplication appear in function spaces, most notably derivations, but again we will not encounter those in this course. Another common algebraic operation for function spaces is conjugation or adjoint, leading to the notion of a *-algebra.)
  3. Norm structure. We often want to distinguish “large” functions in V from “small” ones, especially in analysis, in which “small” terms in an expression are routinely discarded or deemed to be acceptable errors. One way to do this is to assign a magnitude or norm \|f\|_V to each function that measures its size. Unlike the situation with scalars, where there is basically a single notion of magnitude, functions have a wide variety of useful notions of size, each measuring a different aspect (or combination of aspects) of the function, such as height, width, oscillation, regularity, decay, and so forth. Typically, each such norm gives rise to a separate function space (although sometimes it is useful to consider a single function space with multiple norms on it). We usually require the norm to be compatible with the vector space structure (and algebra structure, if present), for instance by demanding that the triangle inequality hold.
  4. Metric structure. We also want to tell whether two functions f, g in a function space V are “near together” or “far apart”. A typical way to do this is to impose a metric d: V \times V \to {\Bbb R}^+ on the space V. If both a norm \| \|_V and a vector space structure are available, there is an obvious way to do this: define the distance between two functions f, g in V to be d( f, g ) := \|f-g\|_V. (This will be the only type of metric on function spaces encountered in this course. But there are some nonlinear function spaces of importance in nonlinear analysis (e.g. spaces of maps from one manifold to another) which have no vector space structure or norm, but still have a metric.) It is often important to know if the vector space is complete with respect to the given metric; this allows one to take limits of Cauchy sequences, and (with a norm and vector space structure) sum absolutely convergent series, as well as use some useful results from point set topology such as the Baire category theorem. All of these operations are of course vital in analysis. [Compactness would be an even better property than completeness to have, but function spaces unfortunately tend to be non-compact in various rather nasty ways, although there are useful partial substitutes for compactness that are available, see e.g. this blog post of mine.]
  5. Topological structure. It is often important to know when a sequence (or, occasionally, nets) of functions f_n in V “converges” in some sense to a limit f (which, hopefully, is still in V); there are often many distinct modes of convergence (e.g. pointwise convergence, uniform convergence, etc.) that one wishes to carefully distinguish from each other. Also, in order to apply various powerful topological theorems (or to justify various formal operations involving limits, suprema, etc.), it is important to know when certain subsets of V enjoy key topological properties (most notably compactness and connectedness), and to know which operations on V are continuous. For all of this, one needs a topology on V. If one already has a metric, then one of course has a topology generated by the open balls of that metric; but there are many important topologies on function spaces in analysis that do not arise from metrics. We also often require the topology to be compatible with the other structures on the function space; for instance, we usually require the vector space operations of addition and scalar multiplication to be continuous. In some cases, the topology on V extends to some natural superspace W of more general functions that contains V; in such cases, it is often important to know whether V is closed in W, so that limits of sequences in V stay in V.
  6. Functional structures. Since numbers are easier to understand and deal with than functions, it is not surprising that we often study functions f in a function space V by first applying some functional \lambda: V \to {\Bbb C} to V to identify some key numerical quantity \lambda(f) associated to f. Norms f \mapsto \|f\|_V are of course one important example of a functional; integration f \mapsto \int_X f\ d\mu provides another; and evaluation f \mapsto f(x) at a point x provides a third important class. (Note, though, that while evaluation is the fundamental feature of a function in set theory, it is often a quite minor operation in analysis; indeed, in many function spaces, evaluation is not even defined at all, for instance because the functions in the space are only defined almost everywhere!) An inner product \langle,\rangle on V (see below) also provides a large family f \mapsto \langle f, g \rangle of useful functionals. It is of particular interest to study functionals that are compatible with the vector space structure (i.e. are linear) and with the topological structure (i.e. are continuous); this will give rise to the important notion of duality on function spaces.
  7. Inner product structure. One often would like to pair a function f in a function space V with another object g (which is often, though not always, another function in the same function space V) and obtain a number \langle f, g \rangle, that typically measures the amount of “interaction” or “correlation” between f and g. Typical examples include inner products arising from integration, such as \langle f, g\rangle := \int_X f \overline{g}\ d\mu; integration itself can also be viewed as a pairing, \langle f, \mu \rangle := \int_X f\ d\mu. Of course, we usually require such inner products to be compatible with the other structures present on the space (e.g., to be compatible with the vector space structure, we usually require the inner product to be bilinear or sesquilinear). Inner products, when available, are incredibly useful in understanding the metric and norm geometry of a space, due to such fundamental facts as the Cauchy-Schwarz inequality and the parallelogram law. They also give rise to the important notion of orthogonality between functions.
  8. Group actions. We often expect our function spaces to enjoy various symmetries; we might wish to rotate, reflect, translate, modulate, or dilate our functions and expect to preserve most of the structure of the space when doing so. In modern mathematics, symmetries are usually encoded by group actions (or actions of other group-like objects, such as semigroups or groupoids; one also often upgrades groups to more structured objects such as Lie groups). As usual, we typically require the group action to preserve the other structures present on the space, e.g. one often restricts attention to group actions that are linear (to preserve the vector space structure), continuous (to preserve topological structure), unitary (to preserve inner product structure), isometric (to preserve metric structure), and so forth. Besides giving us useful symmetries to spend, the presence of such group actions allows one to apply the powerful techniques of representation theory, Fourier analysis, and ergodic theory. However, as this is a foundational real analysis class, we will not discuss these important topics much here (and in fact will not deal with group actions much at all).
  9. Order structure. In some cases, we want to utilise the notion of a function f being “non-negative”, or “dominating” another function g. One might also want to take the “max” or “supremum” of two or more functions in a function space V, or split a function into “positive” and “negative” components. Such order structures interact with the other structures on a space in many useful ways (e.g. via the Stone-Weierstrass theorem). Much like convexity, order structure is specific to the real line and is another reason why much of real analysis breaks down over other fields. (The complex plane is of course an extension of the real line and so is able to exploit the order structure of that line, usually by treating the real and imaginary components separately.)

There are of course many ways to combine various flavours of these structures together, and there are entire subfields of mathematics that are devoted to studying particularly common and useful categories of such combinations (e.g. topological vector spaces, normed vector spaces, Banach spaces, Banach algebras, von Neumann algebras, C^* algebras, Frechet spaces, Hilbert spaces, group algebras, etc.). The study of these sorts of spaces is known collectively as functional analysis. We will study some (but certainly not all) of these combinations in an abstract and general setting later in this course, but to begin with we will focus on the L^p spaces, which are very good model examples for many of the above general classes of spaces, and also of importance in many applications of analysis (such as probability or PDE).

L^p spaces –

In this post, (X, {\mathcal X}, \mu) will be a fixed measure space; notions such as “measurable”, “measure”, “almost everywhere”, etc. will always be with respect to this space, unless otherwise specified. Similarly, unless otherwise specified, all subsets of X mentioned are restricted to be measurable, as are all scalar functions on X.

For the sake of concreteness, we shall select the field of scalars to be the complex numbers {\Bbb C}. The theory of real Lebesgue spaces is virtually identical to that of complex Lebesgue spaces, and the former can largely be deduced from the latter as a special case.

We already have the notion of an absolutely integrable function on X, which is a function f: X \to {\Bbb C} such that \int_X |f|\ d\mu is finite. More generally, given any exponent 0 < p < \infty, we can define a p^{th}-power integrable function to be a function f: X \to {\Bbb C} such that \int_X |f|^p\ d\mu is finite. (Besides p=1, the case of most interest is the case of square-integrable functions, when p=2. We will also extend this notion later to p=\infty, which is also an important special case.)

Remark 1. One can also extend these notions to functions that take values in the extended complex plane {\Bbb C} \cup \{\infty\}, but one easily observes that p^{th} power integrable functions must be finite almost everywhere, and so there is essentially no increase in generality afforded by extending the range in this manner. \diamond

Following the “Lebesgue philosophy” that one should ignore whatever is going on on a set of measure zero, let us declare two measurable functions to be equivalent if they agree almost everywhere. This is easily checked to be an equivalence relation, which does not affect the property of being p^{th}-power integrable. Thus, we can define the Lebesgue space L^p(X,{\mathcal X},\mu) to be the space of p^{th}-power integrable functions, quotiented out by this equivalence relation. Thus, strictly speaking, a typical element of L^p(X,{\mathcal X},\mu) is not actually a specific function f, but is instead an equivalence class [f], consisting of all functions equivalent to a single function f. However, we shall abuse notation and speak loosely of a function f “belonging” to L^p(X,{\mathcal X},\mu), where it is understood that f is only defined up to equivalence, or more imprecisely is “defined almost everywhere”. For the purposes of integration, this equivalence is quite harmless, but this convention does mean that we can no longer evaluate a function f in L^p(X,{\mathcal X},\mu) at a single point x if that point x has zero measure. It takes a little bit of getting used to the idea of a function that cannot actually be evaluated at any specific point, but with some practice you will find that it will not cause any significant conceptual difficulty. [One could also take a more abstract view, dispensing with the set X altogether and defining the Lebesgue space L^p({\mathcal X},\mu) on abstract measure spaces ({\mathcal X},\mu), but we will not do so here.  Another way to think about elements of L^p is that they are functions which are "unreliable" on an unknown set of measure zero, but remain "reliable" almost everywhere.]

Exercise 0. If (X,{\mathcal X},\mu) is a measure space, and \overline{\mathcal X} is the completion of {\mathcal X}, show that the spaces L^p(X, {\mathcal X},\mu) and L^p(X, \overline{\mathcal X}, \mu) are isomorphic using the obvious candidate for the isomorphism.  Because of this, when dealing with L^p spaces, we will usually not be too concerned with whether the underlying measure space is complete. \diamond

Remark 2. Depending on which of the three structures X, {\mathcal X}, \mu of the measure space one wishes to emphasise, the space L^p(X,{\mathcal X},\mu) is often abbreviated L^p(X), L^p({\mathcal X}), L^p(X,\mu), or even just L^p. Since for this discussion the measure space (X,{\mathcal X},\mu) will be fixed, we shall usually use the L^p abbreviation in this post. When the space X is discrete (i.e. {\mathcal X}=2^X) and \mu is counting measure, then L^p(X,{\mathcal X},\mu) is usually abbreviated \ell^p(X) or just \ell^p (and the almost everywhere equivalence relation trivialises and can thus be completely ignored). \diamond

At present, the Lebesgue spaces L^p are just sets. We now begin to place several of the structures mentioned in the introduction to upgrade these sets to richer spaces.

We begin with vector space structure. Fix 0 < p < \infty, and let f, g \in L^p be two p^{th}-power integrable functions. From the crude pointwise (or more precisely, “pointwise almost everywhere”) inequality

|f(x) + g(x)|^p \leq (2 \max(|f(x)|,|g(x)|))^p

= 2^p \max( |f(x)|^p, |g(x)|^p ) \leq 2^p (|f(x)|^p + |g(x)|^p) (1)

we see that the sum of two p^{th}-power integrable functions is also p^{th}-power integrable. It is also easy to see that any scalar multiple of a p^{th}-power integrable function is also p^{th}-power integrable. These operations respect almost everywhere equivalence, and so L^p becomes a (complex) vector space.

Next, we set up the norm structure. If f \in L^p, we define the L^p norm \|f\|_{L^p} of f to be the number

\|f\|_{L^p} := (\int_X |f|^p\ d\mu)^{1/p}; (2)

this is a finite non-negative number by definition of L^p; in particular, we have the identity

\|f^r \|_{L^p} = \|f\|_{L^{pr}}^r (3)

for all 0 < p,r < \infty.
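To make the definition (2) and the identity (3) concrete, here is a minimal numerical sketch on a finite measure space, where the integral reduces to a weighted sum. The helper name `lp_norm` and the sample values and point masses are illustrative assumptions and are not part of these notes; the identity is tested in the form \| |f|^r \|_{L^p} = \|f\|_{L^{pr}}^r.

```python
import math

def lp_norm(values, weights, p):
    """L^p (quasi-)norm (2) of a function on a finite measure space.

    values[i] is f(x_i) (real or complex) and weights[i] = mu({x_i}) >= 0,
    so the integral of |f|^p becomes a finite weighted sum.
    Assumes 0 < p < infinity.
    """
    if not (0 < p < math.inf):
        raise ValueError("p must satisfy 0 < p < infinity")
    integral = sum(w * abs(v) ** p for v, w in zip(values, weights))
    return integral ** (1.0 / p)

# Hypothetical sample data: a complex-valued function on a three-point space.
f = [3.0, -1.0 + 2.0j, 0.5]
mu = [0.5, 2.0, 1.0]

# Both sides of identity (3) for one choice of exponents.
p, r = 1.5, 2.0
lhs = lp_norm([abs(v) ** r for v in f], mu, p)
rhs = lp_norm(f, mu, p * r) ** r
```

Here `lhs` and `rhs` should agree up to floating-point rounding.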

The L^p norm has the following three basic properties:

Lemma 1. Let 0 < p < \infty and f, g \in L^p.

  1. (Non-degeneracy) \|f\|_{L^p} = 0 if and only if f = 0.
  2. (Homogeneity) \| cf \|_{L^p} = |c| \|f\|_{L^p} for all complex numbers c.
  3. ((Quasi-)triangle inequality) We have \|f+g\|_{L^p} \leq C ( \|f\|_{L^p} + \|g\|_{L^p} ) for some constant C depending on p. If p \geq 1, then we can take C=1 (this fact is also known as Minkowski’s inequality).

Proof. The claims 1, 2 are obvious. (Note how important it is that we equate functions that vanish almost everywhere in order to get 1.) The quasi-triangle inequality follows from a variant of the estimates in (1) and is left as an exercise. For the triangle inequality, we have to be more efficient than the crude estimate (1). By the non-degeneracy property we may take \|f\|_{L^p} and \|g\|_{L^p} to be non-zero. Using the homogeneity, we can normalise \|f\|_{L^p}+\|g\|_{L^p} to equal 1, thus (by homogeneity again) we can write f = (1-\theta) F and g = \theta G for some 0 < \theta < 1 and F, G \in L^p with \|F\|_{L^p}=\|G\|_{L^p}=1. Our task is now to show that

\int_X | (1-\theta) F(x) + \theta G(x)|^p\ d\mu \leq 1. (4)

But observe that for 1 \leq p < \infty, the function x \mapsto |x|^p is convex on {\Bbb C}, and in particular that

|(1-\theta) F(x) + \theta G(x)|^p \leq (1-\theta) |F(x)|^p + \theta |G(x)|^p. (5)

(If one wishes, one can use the complex triangle inequality to first reduce to the case when F, G are non-negative, in which case one only needs convexity on {}[0,+\infty) rather than all of {\Bbb C}.) The claim (4) then follows from (5) and the normalisations of F, G. \Box
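For readers who want the last step written out, the following display is one way to spell out how (4) follows from (5), and how (4) gives the triangle inequality under the normalisation \|f\|_{L^p}+\|g\|_{L^p}=1. It only uses monotonicity and linearity of the integral.

```latex
\begin{align*}
\int_X |(1-\theta) F + \theta G|^p\ d\mu
  &\leq (1-\theta) \int_X |F|^p\ d\mu + \theta \int_X |G|^p\ d\mu \\
  &= (1-\theta) \|F\|_{L^p}^p + \theta \|G\|_{L^p}^p
   = (1-\theta) + \theta = 1, \\
\text{hence}\quad \|f+g\|_{L^p}
  &= \Big( \int_X |(1-\theta) F + \theta G|^p\ d\mu \Big)^{1/p}
   \leq 1 = \|f\|_{L^p} + \|g\|_{L^p}.
\end{align*}
```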

Exercise 1. Let 0 < p \leq 1 and f, g \in L^p.

  1. Establish the variant \|f+g\|_{L^p}^p \leq \|f\|_{L^p}^p + \|g\|_{L^p}^p of the triangle inequality.
  2. If furthermore f and g are non-negative (almost everywhere), establish also the reverse triangle inequality \|f+g\|_{L^p} \geq \|f\|_{L^p} + \|g\|_{L^p}.
  3. Show that the best constant C in the quasi-triangle inequality is 2^{\frac{1}{p}-1}. In particular, the triangle inequality is false for p < 1.
  4. Now suppose instead that 1 < p < \infty or 0 < p < 1. If f, g \in L^p are nonnegative and such that \|f+g\|_{L^p} = \|f\|_{L^p} + \|g\|_{L^p}, show that one of the functions f, g is a non-negative scalar multiple of the other (up to equivalence, of course). What happens when p=1? \diamond
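Before attempting the exercise, it may help to see the failure of the triangle inequality for p < 1 in a single concrete case. The sketch below works on a two-point space with counting measure and takes f, g to be indicators of the two points; the helper name `lp_quasinorm` and the choice p = 1/2 are assumptions made purely for illustration. It exhibits one data point and is not a proof of any part of the exercise.

```python
def lp_quasinorm(values, p):
    """(sum |v|^p)^(1/p): the l^p (quasi-)norm for counting measure."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)

p = 0.5
f = [1.0, 0.0]                       # indicator of the first point
g = [0.0, 1.0]                       # indicator of the second point
h = [a + b for a, b in zip(f, g)]    # f + g

norm_sum = lp_quasinorm(h, p)                          # 2^(1/p) = 4
sum_norms = lp_quasinorm(f, p) + lp_quasinorm(g, p)    # 1 + 1 = 2
ratio = norm_sum / sum_norms                           # 2^(1/p - 1) = 2
```

In this example \|f+g\|_{L^p} exceeds \|f\|_{L^p} + \|g\|_{L^p} by exactly the factor 2^{\frac{1}{p}-1}.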

A vector space V with a function \| \|: V \to [0,+\infty) obeying the non-degeneracy, homogeneity, and (quasi-)triangle inequality is known as a (quasi-)normed vector space, and the function f \mapsto \|f\| is then known as a (quasi-)norm; thus L^p is a normed vector space for 1 \leq p < \infty but only a quasi-normed vector space for 0 < p < 1. A function \| \|: V \to [0,+\infty) obeying the homogeneity and triangle inequality, but not necessarily the non-degeneracy property, is known as a seminorm; thus for instance the L^p norms for 1 \leq p < \infty would have been seminorms if we did not equate functions that agreed almost everywhere. (Conversely, given a seminormed vector space (V,\|\|), one can convert it into a normed vector space by quotienting out the subspace \{ f \in V: \|f\| = 0 \}; we leave the details as an exercise for the reader.)

Exercise 2. Let \| \|: V \to [0,+\infty) be a function on a vector space which obeys the non-degeneracy and homogeneity properties.  Show that \| \| is a norm if and only if the closed unit ball \{ x: \|x\| \leq 1 \} is convex; show that the same equivalence also holds for the open unit ball. This emphasises the geometric nature of the triangle inequality.  \diamond

Exercise 3. If f \in L^p for some 0 < p < \infty, show that the support \{ x \in X: f(x) \neq 0 \} of f (which is defined only up to sets of measure zero) is a \sigma-finite set. (Because of this, we can often reduce from the non-\sigma-finite case to the \sigma-finite case in many, though not all, questions concerning L^p spaces.) \diamond

We now are able to define L^p norms and spaces in the limit p=\infty. We say that a function f: X \to {\Bbb C} is essentially bounded if there exists an M such that {|f(x)| \leq M} for almost every x, and define \|f\|_{L^\infty} to be the least M that serves as such a bound. We let L^\infty denote the space of essentially bounded functions, quotiented out by equivalence, and given the norm \| \cdot \|_{L^\infty}. It is not hard to see that this is also a normed vector space. Observe that a sequence f_n \in L^\infty converges to a limit f \in L^\infty if and only if f_n converges essentially uniformly to f, i.e. it converges uniformly to f outside of a set of measure zero. (Compare with Egorov’s theorem (Theorem 3.6 from Notes 0), which equates pointwise convergence with uniform convergence outside of a set of arbitrarily small measure.)

Now we explain why we call this norm the L^\infty norm:

Example 1. Let f be a (generalised) step function, thus f = A 1_E for some amplitude {A > 0} and some set E; let us assume that E has positive finite measure. Then \|f\|_{L^p} = A \mu(E)^{1/p} for all 0 < p < \infty, and also \|f\|_{L^\infty} = A. Thus in this case, at least, the L^\infty norm is the limit of the L^p norms. This example illustrates also that the L^p norms behave like combinations of the “height” A of a function, and the “width” \mu(E) of such a function, though of course the concepts of height and width are not formally defined for functions that are not step functions. \diamond
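The limit claimed in Example 1 can be checked in one line; the computation below uses only the assumption 0 < \mu(E) < \infty made in the example, so that \log \mu(E) is a finite real number.

```latex
\[
\lim_{p \to \infty} \|f\|_{L^p}
 = \lim_{p \to \infty} A\, \mu(E)^{1/p}
 = A \lim_{p \to \infty} \exp\Big( \frac{\log \mu(E)}{p} \Big)
 = A\, e^{0} = A = \|f\|_{L^\infty}.
\]
```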

Exercise 4.

  1. If f \in L^\infty \cap L^{p_0} for some 0 < p_0 < \infty, show that \|f\|_{L^p} \to \|f\|_{L^\infty} as p \to \infty. (Hint: use the monotone convergence theorem.)
  2. If f \not \in L^\infty, show that \|f\|_{L^p} \to \infty as p \to \infty. \diamond

Once one has a vector space structure and a (quasi-)norm structure, we immediately get a (quasi-)metric structure:

Exercise 5. Let (V, \| \|) be a normed vector space. Show that the function d: V \times V \to [0,+\infty) defined by d(f,g) := \|f-g\| is a metric on V which is translation-invariant (thus d(f+h,g+h)=d(f,g) for all f,g \in V) and homogeneous (thus d(cf, cg) = |c| d(f,g) for all f,g \in V and scalars c). Conversely, show that every translation-invariant homogeneous metric on V arises from precisely one norm in this manner. Establish a similar claim relating quasi-norms with quasi-metrics (which are defined as metrics, but with the triangle inequality replaced by a quasi-triangle inequality; note that the term “quasi-metric” is occasionally used to denote a slightly different concept), or between seminorms and semimetrics (which are defined as metrics, but where distinct points are allowed to have a zero separation; these are also known as pseudometrics, with “semimetric” used to denote something else). \diamond

The (quasi-)metric structure in turn generates a topological structure in the usual manner using the (quasi-)metric balls as a base for the topology. In particular, a sequence of functions f_n \in L^p converges to a limit f \in L^p if \|f_n-f\|_{L^p} \to 0 as n \to \infty. We refer to this type of convergence as convergence in L^p norm, or strong convergence in L^p (we will discuss other modes of convergence in later lectures). As is usual in (quasi-)metric spaces (or more generally for Hausdorff spaces), the limit, if it exists, is unique. (This is however not the case for topological structures induced by seminorms or semimetrics, though we can solve this problem by quotienting out the degenerate elements as discussed earlier.)

Recall that any series \sum_{n=1}^\infty a_n of scalars is convergent if it is absolutely convergent (i.e. if \sum_{n=1}^\infty |a_n| < \infty). This fact turns out to be closely related to the fact that the field of scalars {\Bbb C} is complete. This can be seen from the following result:

Exercise 6. Let (V, \| \|) be a normed vector space (and hence also a metric space and a topological space). Show that the following are equivalent:

  1. V is a complete metric space (i.e. every Cauchy sequence converges).
  2. Every sequence f_n \in V which is absolutely convergent (i.e. \sum_{n=1}^\infty \|f_n\| < \infty), is also conditionally convergent (i.e. \sum_{n=1}^N f_n converges to a limit as N \to \infty). \diamond

Remark 3. The situation is more complicated for complete quasi-normed vector spaces; not every absolutely convergent series is conditionally convergent. On the other hand, if \|f_n\| decays faster than a sufficiently large negative power of n, one recovers conditional convergence; see these old notes of mine. \diamond

Remark 4. Let X be a topological space, and let BC(X) be the space of bounded continuous functions on X; this is a vector space.  We can place the uniform norm \|f\|_u := \sup_{x \in X} |f(x)| on this space; this makes BC(X) into a normed vector space.  It is not hard to verify that this space is complete, and so every absolutely convergent series in BC(X) is conditionally convergent.  This fact is better known as the Weierstrass M-test. \diamond

A space obeying the properties in Exercise 6 (i.e. a complete normed vector space) is known as a Banach space. We will study Banach spaces in more detail later in this course. For now, we give one of the fundamental examples of Banach spaces.

Proposition 1. L^p is a Banach space for every 1 \leq p \leq \infty.

Proof. By Exercise 6, it suffices to show that any series \sum_{n=1}^\infty f_n of functions in L^p which is absolutely convergent, is also conditionally convergent. This is easy in the case p=\infty and is left as an exercise. In the case 1 \leq p < \infty, we write M := \sum_{n=1}^\infty \|f_n\|_{L^p}, which is a finite quantity by hypothesis. By the triangle inequality, we have \| \sum_{n=1}^N |f_n| \|_{L^p} \leq M for all N. By monotone convergence, we conclude \| \sum_{n=1}^\infty |f_n| \|_{L^p} \leq M. In particular, \sum_{n=1}^\infty f_n(x) is absolutely convergent for almost every x. Write the limit of this series as F(x). By dominated convergence, we see that \sum_{n=1}^N f_n(x) converges in L^p norm to F, and we are done. \Box
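The dominated convergence step at the end of this proof can be unpacked as follows. This is a sketch of one way to fill in the details; the names G and S_N are introduced here only for convenience and do not appear in the proof itself.

```latex
% Notation introduced for this sketch:
%   G := \sum_{n=1}^\infty |f_n|, \qquad S_N := \sum_{n=1}^N f_n.
\begin{align*}
&\|G\|_{L^p} \leq M < \infty
  \quad\Longrightarrow\quad G^p \text{ is absolutely integrable and } G < \infty \text{ a.e.}; \\
&|F - S_N|^p = \Big| \sum_{n=N+1}^\infty f_n \Big|^p \leq G^p
  \quad\text{a.e., for every } N; \\
&|F - S_N|^p \to 0 \quad\text{a.e. as } N \to \infty; \\
&\text{so by dominated convergence}\quad
  \|F - S_N\|_{L^p}^p = \int_X |F - S_N|^p\ d\mu \to 0 .
\end{align*}
```

The pointwise bound |F| \leq G also shows that F itself lies in L^p, which is needed for the limit to belong to the space.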

An important fact is that functions in L^p can be approximated by simple functions:

Proposition 2. If 0 < p < \infty, then the space of simple functions with finite measure support is a dense subspace of L^p.

(The concept of a non-trivial dense subspace is one which only comes up in infinite dimensions, and is hard to visualise directly. Very roughly speaking, the infinite number of degrees of freedom in an infinite dimensional space gives a subspace an infinite number of “opportunities” to come as close as one desires to any given point in that space, which is what allows such spaces to be dense.)

Proof. The only non-trivial thing to show is the density. An application of the monotone convergence theorem shows that the space of bounded L^p functions is dense in L^p. Another application of monotone convergence (and Exercise 3) then shows that the space of bounded L^p functions of finite measure support is dense in the space of bounded L^p functions. Finally, by discretising the range of bounded L^p functions, we see that the space of simple functions with finite measure support is dense in the space of bounded L^p functions with finite measure support. \Box
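The final discretisation step can be made explicit in several ways. One possible choice, given here as a sketch under the assumption that f is bounded and supported on a set E with \mu(E) < \infty, is to round the real and imaginary parts of f down to multiples of a small parameter \varepsilon > 0 (the name f_\varepsilon is introduced only for this illustration).

```latex
\begin{align*}
f_\varepsilon(x) &:= \varepsilon \Big\lfloor \frac{\mathrm{Re}\, f(x)}{\varepsilon} \Big\rfloor
   + i\, \varepsilon \Big\lfloor \frac{\mathrm{Im}\, f(x)}{\varepsilon} \Big\rfloor, \\
|f(x) - f_\varepsilon(x)| &\leq \sqrt{2}\, \varepsilon \quad\text{for all } x \in X, \\
\|f - f_\varepsilon\|_{L^p} &\leq \sqrt{2}\, \varepsilon\, \mu(E)^{1/p} \to 0
   \quad\text{as } \varepsilon \to 0 .
\end{align*}
```

Since f is bounded, f_\varepsilon takes only finitely many values, and since f_\varepsilon(x) = 0 whenever f(x) = 0, it is a simple function supported in E.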

Remark 5. Since not every function in L^p is a simple function with finite measure support, we thus see that the space of simple functions with finite measure support with the L^p norm is an example of a normed vector space which is not complete. \diamond

Exercise 7. Show that the space of simple functions (not necessarily with finite measure support) is a dense subspace of L^\infty. Is the same true if one reinstates the finite measure support restriction? \diamond

Exercise 7a. Suppose that \mu is \sigma-finite and {\mathcal X} is separable (i.e. countably generated).  Show that L^p is separable (i.e. has a countable dense subset) for all 1 \leq p < \infty.  Give a counterexample that shows that L^\infty need not be separable.  (Hint: take the integers with counting measure.) \diamond

Next, we turn to algebra properties of L^p spaces. The key fact here is

Proposition 3. (Hölder’s inequality) Let f \in L^p and g \in L^q for some 0 < p,q \leq \infty. Then fg \in L^r and \|fg\|_{L^r} \leq \|f\|_{L^p} \|g\|_{L^q}, where the exponent r is defined by the formula \frac{1}{r} = \frac{1}{p} + \frac{1}{q}.

Proof. This will be a variant of the proof of the triangle inequality in Lemma 1, again relying ultimately on convexity. The claim is easy when p=\infty or q=\infty and is left as an exercise for the reader in this case, so we assume p, q < \infty. Raising f and g to the power r using (3) we may assume r=1, which makes 1 < p, q < \infty dual exponents in the sense that \frac{1}{p}+\frac{1}{q}=1. The claim is obvious if either \|f\|_{L^p} or \|g\|_{L^q} is zero, so we may assume they are non-zero; by homogeneity we may then normalise \|f\|_{L^p}=\|g\|_{L^q} = 1. Our task is now to show that

\int_X |fg|\ d\mu \leq 1. (6)

Here, we use the convexity of the exponential function t \mapsto e^t on {\Bbb R}, which implies the convexity of the function t \mapsto |f(x)|^{p(1-t)} |g(x)|^{qt} for t \in [0,1] for any x. In particular we have

|f(x) g(x)| \leq \frac{1}{p} |f(x)|^p + \frac{1}{q} |g(x)|^q (7)

and the claim (6) follows from the normalisations on p, q, f, g. \Box
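For completeness, here is one way to write out how (7) follows from the convexity just mentioned, and how (6) follows from (7). The name \phi is introduced only for this sketch, and the computation assumes f(x), g(x) are non-zero (if either vanishes, (7) is trivial).

```latex
\begin{align*}
\phi(t) &:= |f(x)|^{p(1-t)} |g(x)|^{qt}, \qquad
  \phi(0) = |f(x)|^p, \quad \phi(1) = |g(x)|^q, \\
\phi\big(\tfrac{1}{q}\big) &= |f(x)|^{p(1 - \frac{1}{q})} |g(x)|^{1}
  = |f(x)|\,|g(x)| \qquad \text{since } 1 - \tfrac{1}{q} = \tfrac{1}{p}, \\
|f(x) g(x)| = \phi\big(\tfrac{1}{q}\big)
  &\leq \big(1 - \tfrac{1}{q}\big) \phi(0) + \tfrac{1}{q} \phi(1)
  = \tfrac{1}{p} |f(x)|^p + \tfrac{1}{q} |g(x)|^q, \\
\int_X |fg|\ d\mu
  &\leq \tfrac{1}{p} \|f\|_{L^p}^p + \tfrac{1}{q} \|g\|_{L^q}^q
  = \tfrac{1}{p} + \tfrac{1}{q} = 1 .
\end{align*}
```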

Remark 6. For a different proof of this inequality (based on the tensor power trick), see Example 1 of this blog post of mine. \diamond

Remark 7. One can also use Hölder’s inequality to prove the triangle inequality for L^p, 1 \leq p < \infty (i.e. Minkowski’s inequality). From the complex triangle inequality |f+g| \leq |f| + |g|, it suffices to check the case when f, g are non-negative. In this case we have the identity

\|f+g\|_{L^p}^p = \| f |f+g|^{p-1} \|_{L^1} + \| g |f+g|^{p-1} \|_{L^1} (8)

while Hölder’s inequality gives \| f |f+g|^{p-1} \|_{L^1} \leq \|f\|_{L^p} \|f+g\|_{L^p}^{p-1} and \| g |f+g|^{p-1} \|_{L^1} \leq \|g\|_{L^p} \|f+g\|_{L^p}^{p-1}. The claim then follows from some algebra (and checking the degenerate cases separately, e.g. when \|f+g\|_{L^p}=0). \diamond
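The "some algebra" in Remark 7 can be sketched as follows for 1 < p < \infty (the case p = 1 is immediate). The dual exponent p' := p/(p-1) is notation introduced here for the sketch, and we assume 0 < \|f+g\|_{L^p} < \infty so that the final division is legitimate.

```latex
\begin{align*}
\big\| |f+g|^{p-1} \big\|_{L^{p'}}
  &= \Big( \int_X |f+g|^{(p-1)p'}\ d\mu \Big)^{1/p'}
   = \Big( \int_X |f+g|^{p}\ d\mu \Big)^{(p-1)/p}
   = \|f+g\|_{L^p}^{p-1}, \\
\|f+g\|_{L^p}^p
  &\leq \big( \|f\|_{L^p} + \|g\|_{L^p} \big)\, \|f+g\|_{L^p}^{p-1}, \\
\|f+g\|_{L^p}
  &\leq \|f\|_{L^p} + \|g\|_{L^p}
  \qquad \text{after dividing by } \|f+g\|_{L^p}^{p-1} .
\end{align*}
```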

Remark 8. The proofs of Hölder’s inequality and Minkowski’s inequality both relied on convexity of various functions in {\Bbb C} or {}[0,+\infty). One way to emphasise this is to deduce both inequalities from Jensen’s inequality, which is an inequality which manifestly exploits this convexity. We will not take this approach here, but see for instance the book of Lieb and Loss for a discussion. \diamond

Example 2. It is instructive to test Hölder’s inequality (and also Exercises 8-12 below) in the special case when f, g are generalised step functions, say f = A 1_E and g = B 1_F with A, B non-zero. The inequality then simplifies to

\mu( E \cap F )^{1/r} \leq \mu( E )^{1/p} \mu( F )^{1/q} (8')

which can be easily deduced from the hypothesis \frac{1}{p} + \frac{1}{q} = \frac{1}{r} and the trivial inequalities \mu(E \cap F) \leq \mu(E) and \mu(E \cap F) \leq \mu(F). One then easily sees (when p,q are finite) that equality in (8') only holds if \mu(E \cap F) = \mu(E) = \mu(F), or in other words if E and F agree almost everywhere. Note the above computations also explain why the condition \frac{1}{p}+\frac{1}{q}=\frac{1}{r} is necessary. \diamond

Exercise 8. Let 0 < p,q < \infty, and let f \in L^p, g \in L^q be such that Hölder’s inequality is obeyed with equality. Show that of the functions |f|^p, |g|^q, one of them is a scalar multiple of the other (up to equivalence, of course). What happens if p or q is infinite? \diamond

An important corollary of Hölder’s inequality is the Cauchy-Schwarz inequality

|\int_X f(x) \overline{g(x)}\ d\mu| \leq \|f\|_{L^2} \|g\|_{L^2} (9)

which can of course be proven by many other means.

Exercise 9. If f \in L^p for some 0 < p \leq \infty, and is also supported on a set E of finite measure, show that f \in L^q for all 0 < q \leq p, with \|f\|_{L^q} \leq \mu(E)^{\frac{1}{q}-\frac{1}{p}} \|f\|_{L^p}. When does equality occur? \diamond

Exercise 10. If f \in L^p for some 0 < p < \infty, and every set of positive measure in X has measure at least m, show that f \in L^q for all p < q \leq \infty, with \|f\|_{L^q} \leq m^{\frac{1}{q}-\frac{1}{p}} \|f\|_{L^p}. When does equality occur? (This result is especially useful for the \ell^p spaces, in which \mu is counting measure and m can be taken to be 1.) \diamond

Exercise 11. If f \in L^{p_0} \cap L^{p_1} for some 0 < p_0 < p_1 \leq \infty, show that f \in L^p for all p_0 \leq p \leq p_1, and that \|f\|_{L^p} \leq \|f\|_{L^{p_0}}^{1-\theta} \|f\|_{L^{p_1}}^{\theta}, where 0 \leq \theta \leq 1 is such that \frac{1}{p} = \frac{1-\theta}{p_0} + \frac{\theta}{p_1}. Another way of saying this is that the function \frac{1}{p} \mapsto \log \| f\|_{L^p} is convex. When does equality occur? This convexity is a prototypical example of interpolation, about which we shall say more in a later lecture. \diamond
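Before attempting the exercise, one can test the interpolation inequality numerically on a finite measure space. This is only a sketch under that assumption, restricted to finite exponents; the helper names `lp_norm` and `interpolation_bound` and the sample data are illustrative.

```python
def lp_norm(f, weights, p):
    """L^p norm on a finite measure space, where weights[i] = mu({i}); finite p only."""
    return sum(abs(v) ** p * w for v, w in zip(f, weights)) ** (1.0 / p)

def interpolation_bound(f, weights, p0, p1, theta):
    """Return (lhs, rhs) of ||f||_p <= ||f||_{p0}^(1-theta) ||f||_{p1}^theta,
    where 1/p = (1-theta)/p0 + theta/p1.
    """
    p = 1.0 / ((1.0 - theta) / p0 + theta / p1)
    lhs = lp_norm(f, weights, p)
    rhs = lp_norm(f, weights, p0) ** (1.0 - theta) * lp_norm(f, weights, p1) ** theta
    return lhs, rhs

# Hypothetical sample data on a six-point space.
f = [3.0, -1.0, 0.5, 2.0, 0.0, -4.0]
weights = [1.0, 0.5, 2.0, 1.0, 3.0, 0.25]

for theta in [0.0, 0.25, 0.5, 0.75, 1.0]:
    lhs, rhs = interpolation_bound(f, weights, 1.0, 4.0, theta)
    print(theta, lhs, rhs)
```

At the endpoints \theta = 0 and \theta = 1 the two columns should coincide up to rounding; in between, the left-hand side should not exceed the right-hand side.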

Exercise 12. If f \in L^{p_0} for some 0 < p_0 \leq \infty, and its support E := \{ x \in X: f(x) \neq 0 \} has finite measure, show that f \in L^p for all 0 < p < p_0, and that \|f\|_{L^p}^p \to \mu(E) as p \to 0. (Because of this, the measure of the support of f is sometimes known as the L^0 norm of f, or more precisely the L^0 norm raised to the power 0.) \diamond
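The limit in this exercise is easy to observe numerically. The following sketch assumes a finite measure space given by point masses; the helper names `lp_norm_pth_power` and `support_measure` and the data are hypothetical.

```python
def lp_norm_pth_power(f, weights, p):
    """Return ||f||_{L^p}^p = sum of |f(x)|^p mu({x}); points where f vanishes contribute nothing."""
    return sum(abs(v) ** p * w for v, w in zip(f, weights) if v != 0)

def support_measure(f, weights):
    """Return mu(E), where E = {x : f(x) != 0} is the support of f."""
    return sum(w for v, w in zip(f, weights) if v != 0)

# Hypothetical sample data; f vanishes at one point of mass 3.
f = [3.0, -1.0, 0.5, 2.0, 0.0, -4.0]
weights = [1.0, 0.5, 2.0, 1.0, 3.0, 0.25]

for p in [1.0, 0.1, 0.01, 0.001]:
    print(p, lp_norm_pth_power(f, weights, p))
print("mu(E) =", support_measure(f, weights))
```

The printed values of \|f\|_{L^p}^p should approach \mu(E) as p decreases, because |f(x)|^p tends to 1 at every point where f(x) is non-zero.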

– Linear functionals on L^p

Given an exponent 1 \leq p \leq \infty, define the dual exponent 1 \leq p' \leq \infty by the formula \frac{1}{p} + \frac{1}{p'} = 1 (thus p' = p/(p-1) for 1 < p < \infty, while 1 and \infty are duals of each other). From Hölder’s inequality, we see that for any g \in L^{p'}, the functional \lambda_g: L^p \to {\Bbb C} defined by

\lambda_g(f) := \int_X f \overline{g}\ d\mu (10)

is well-defined on L^p; the functional is also clearly linear. Furthermore, Hölder’s inequality also tells us that this functional is continuous.

A deep and important fact about L^p spaces is that, in most cases, the converse is true: the recipe (10) is the only way to create continuous linear functionals on L^p.

Theorem 1. Let 1 \leq p < \infty, and assume \mu is \sigma-finite. Let \lambda: L^p \to {\Bbb C} be a continuous linear functional. Then there exists a unique g \in L^{p'} such that \lambda = \lambda_g.

This result should be compared with the Radon-Nikodym theorem (Corollary 1 from Notes 1). Both theorems start with an abstract function \mu: {\mathcal X} \to {\Bbb R} or \lambda: L^p \to {\Bbb C}, and create a function out of it. Indeed, we shall see shortly that the two theorems are essentially equivalent to each other. We will develop Theorem 1 further in later lectures, once we introduce the notion of a dual space.

To prove Theorem 1, we first need a simple and useful lemma:

Lemma 1. (Continuity is equivalent to boundedness for linear operators) Let T: X \to Y be a linear transformation from one normed vector space (X, \| \|_X) to another (Y, \| \|_Y).  Then the following are equivalent:

  1. T is continuous.
  2. T is continuous at 0.
  3. There exists a constant C such that \|Tx\|_Y \leq C \|x\|_X for all x \in X.

Proof. It is clear that 1 implies 2, and that 3 implies 2.  Next, from linearity we have Tx = Tx_0 + T(x-x_0) for any x,x_0 \in X, which (together with the continuity of addition, which follows from the triangle inequality) shows that continuity of T at 0 implies continuity of T at any x_0, so that 2 implies 1.  The only remaining task is to show that 1 implies 3.  By continuity, the inverse image of the unit ball in Y must be an open neighbourhood of 0 in X, thus there exists some radius r > 0 such that \|Tx\|_Y < 1 whenever \|x\|_X < r.  The claim then follows (with C :=1/r) by homogeneity.  (Alternatively, one can deduce 3 from 2 by contradiction.  If 3 failed, then there exists a sequence x_n of non-zero elements of X such that \|Tx_n\|_Y / \|x_n\|_X goes to infinity.  By homogeneity, we can arrange matters so that \|x_n\|_X goes to zero, but \|Tx_n\|_Y stays away from zero, thus contradicting continuity at 0.) \Box

Proof of Theorem 1. The uniqueness claim is similar to the uniqueness claim in the Radon-Nikodym theorem (Exercise 2 from Notes 1) and is left as an exercise to the reader; the hard part is establishing existence.

Let us first consider the case when \mu is finite. The linear functional \lambda: L^p \to {\Bbb C} induces a functional \nu: {\mathcal X} \to {\Bbb C} on sets E by the formula

\nu(E) := \lambda(1_E). (11)

Since \lambda is linear, \nu is finitely additive (and sends the empty set to zero). Also, if E_1,E_2,\ldots are a sequence of disjoint sets, then 1_{\bigcup_{n=1}^N E_n} converges in L^p to 1_{\bigcup_{n=1}^\infty E_n} as N \to \infty (by the dominated convergence theorem and the finiteness of \mu), and thus (by continuity of \lambda and finite additivity of \nu), \nu is countably additive as well. Finally, from (11) we also see that \nu(E)=0 whenever \mu(E)=0, thus \nu is absolutely continuous with respect to \mu. Applying the Radon-Nikodym theorem (Corollary 1 from Notes 1) to both the real and imaginary components of \nu, we conclude that \nu = \mu_g for some g \in L^1; thus by (11) we have

\lambda(1_E) = \lambda_g(1_E) (12)

for all measurable E. By linearity, this implies that \lambda and \lambda_g agree on simple functions. Taking uniform limits (using Exercise 7) and using continuity (and the finite measure of \mu) we conclude that \lambda and \lambda_g agree on all bounded functions. Taking monotone limits (working on the positive and negative supports of the real and imaginary parts of g separately) we conclude that \lambda and \lambda_g agree on all functions in L^p, and in particular that \int_X f \overline{g}\ d\mu is absolutely convergent for all f \in L^p.

To finish the theorem in this case, we need to establish that g lies in L^{p'}. By taking real and imaginary parts we may assume without loss of generality that g is real; by splitting into the regions where g is positive and negative we may assume that g is non-negative.

We already know that \lambda_g = \lambda is a continuous functional from L^p to {\Bbb C}.  By Lemma 1, this implies a bound of the form |\lambda_g(f)| \leq C \|f\|_{L^p} for some C > 0.

Suppose first that p > 1. Heuristically, we would like to test this inequality with f := g^{p'-1}, since we formally have \lambda_g(f) = \|g\|_{L^{p'}}^{p'} and \|f\|_{L^p} = \|g\|_{L^{p'}}^{p'-1}. (Not coincidentally, this is also the choice that would make Hölder’s inequality an equality, see Exercise 8.) Cancelling the \|g\|_{L^{p'}} factors would then give the desired finiteness of \|g\|_{L^{p'}}.

We can’t quite make that argument work, because it is circular: it assumes \|g\|_{L^{p'}} is finite in order to show that \|g\|_{L^{p'}} is finite! But this can be easily remedied. We test the inequality with f_N := \min(g,N)^{p'-1} for some large N; this lies in L^p. We have \lambda_g(f_N) \geq \|\min(g,N)\|_{L^{p'}}^{p'} and \|f_N\|_{L^p} = \|\min(g,N)\|_{L^{p'}}^{p'-1}, and hence \| \min(g,N)\|_{L^{p'}} \leq C for all N. Letting N go to infinity and using monotone convergence, we obtain the claim.
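To see the test functions in action, here is a small numerical sketch on a finite measure space with a non-negative g. It is not part of the proof; the helper names `dual_test_function` and `functional_ratio` and the sample data are hypothetical. It compares \lambda_g(f_N)/\|f_N\|_{L^p} with \|\min(g,N)\|_{L^{p'}}, and the untruncated ratio with \|g\|_{L^{p'}}.

```python
def lp_norm(f, weights, p):
    """L^p norm on a finite measure space, where weights[i] = mu({i}); finite p only."""
    return sum(abs(v) ** p * w for v, w in zip(f, weights)) ** (1.0 / p)

def dual_test_function(g, p, cap=None):
    """Return f = min(g, cap)^(p'-1) for non-negative g, where p' = p/(p-1) and p > 1.
    With cap=None no truncation is applied.
    """
    p_dual = p / (p - 1.0)
    return [(v if cap is None else min(v, cap)) ** (p_dual - 1.0) for v in g]

def functional_ratio(f, g, weights, p):
    """Return lambda_g(f) / ||f||_p for real-valued f and g."""
    pairing = sum(a * b * w for a, b, w in zip(f, g, weights))
    return pairing / lp_norm(f, weights, p)

# Hypothetical sample data on a five-point space.
weights = [0.5, 1.0, 2.0, 0.25, 1.5]
g = [0.2, 3.0, 1.0, 7.0, 0.5]
p = 3.0
p_dual = p / (p - 1.0)

print(functional_ratio(dual_test_function(g, p), g, weights, p), lp_norm(g, weights, p_dual))
for cap in [1.0, 2.0, 5.0, 10.0]:
    capped = [min(v, cap) for v in g]
    ratio = functional_ratio(dual_test_function(g, p, cap), g, weights, p)
    print(cap, ratio, lp_norm(capped, weights, p_dual))
```

On a finite space every norm is of course finite, so the truncation is not needed here; the point is only to illustrate that the ratio dominates \|\min(g,N)\|_{L^{p'}} while staying below \|g\|_{L^{p'}}.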

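In the p=1 case, we instead use f := 1_{g>N} as the test functions. Since \lambda_g(1_{g>N}) \geq N \mu(\{g>N\}) while \|1_{g>N}\|_{L^1} = \mu(\{g>N\}), the set \{g>N\} must have measure zero whenever N > C, so that g is bounded almost everywhere by C; we leave the details to the reader.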

This handles the case when \mu is finite. When \mu is \sigma-finite, we can write X as the union of an increasing sequence E_n of sets of finite measure. On each such set, the above arguments let us write \lambda = \lambda_{g_n} for some g_n \in L^{p'}(E_n). The uniqueness arguments tell us that the g_n are all compatible with each other, in particular if n < m, then g_n and g_m agree on E_n. Thus all the g_n are in fact restrictions of a single function g to E_n. The previous arguments also tell us that the L^{p'} norm of g_n is bounded by the same constant C uniformly in n, so by monotone convergence, g has bounded L^{p'} norm also, and we are done. \Box

Remark 9. When 1 < p < \infty, the hypothesis that \mu is \sigma-finite can be dropped, but not when p=1; see e.g. Section 6.2 of Folland for further discussion. In these lectures, though, we will be content with working in the \sigma-finite setting. On the other hand, the claim fails when p=\infty (except when X is finite); we will see this in later lectures, when we discuss the Hahn-Banach theorem. \diamond

Remark 10. We have seen how the Lebesgue-Radon-Nikodym theorem can be used to establish Theorem 1. The converse is also true: Theorem 1 can be used to deduce the Lebesgue-Radon-Nikodym theorem (a fact essentially observed by von Neumann). For simplicity, let us restrict attention to the unsigned finite case, thus \mu and m are unsigned and finite. This implies that the sum \mu+m is also unsigned and finite. We observe that the linear functional \lambda: f \mapsto \int_X f\ d\mu is continuous on L^1(\mu+m), hence by Theorem 1 there must exist a function g \in L^\infty(\mu+m) such that

\int_X f\ d\mu =\int_X f \overline{g}\ d(\mu+m) (13)

for all f \in L^1(\mu+m). It is easy to see that g must be real and non-negative, and also at most 1 almost everywhere. If E is the set where g=1, we see by setting f=1_E in (13) that E has m-measure zero, and so \mu\downharpoonright_E is singular. Outside of E, we see from (13) and some rearrangement that

\int_{X \backslash E} (1-g) f\ d\mu = \int_X f g\ dm (14)

and one then easily verifies that \mu agrees with m_{\frac{g}{1-g}} outside of E. This gives the desired Lebesgue-Radon-Nikodym decomposition \mu = m_{\frac{g}{1-g}} + \mu\downharpoonright_E. \diamond
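The construction in Remark 10 becomes completely explicit when X is a finite set and \mu, m are given by point masses. The sketch below works in that toy setting only, using exact rational arithmetic so that the set where g=1 can be identified without rounding issues; the function name `lebesgue_decomposition` and the sample masses are hypothetical.

```python
from fractions import Fraction

def lebesgue_decomposition(mu, m):
    """mu, m: lists of non-negative Fractions giving point masses on {0, ..., n-1}.

    Returns (g, E, h), where g is the density in (13), i.e. mu = g (mu + m),
    E is the set of indices where g = 1, and h = g/(1-g) off E (set to 0 on E).
    Points where mu + m vanishes are null for both measures; we set g = 0 there.
    """
    g = [a / (a + b) if a + b > 0 else Fraction(0) for a, b in zip(mu, m)]
    E = [i for i, v in enumerate(g) if v == 1]
    h = [Fraction(0) if v == 1 else v / (1 - v) for v in g]
    return g, E, h

# Hypothetical sample masses on a five-point space.
mu = [Fraction(1, 2), Fraction(0), Fraction(3), Fraction(2), Fraction(0)]
m = [Fraction(1), Fraction(2), Fraction(0), Fraction(1, 3), Fraction(0)]

g, E, h = lebesgue_decomposition(mu, m)
abs_part = [hv * b for hv, b in zip(h, m)]                        # point masses of m_h
sing_part = [a if i in E else Fraction(0) for i, a in enumerate(mu)]  # mu restricted to E

print("E =", E)
print("m(E) =", sum(m[i] for i in E))
print("recovered mu:", [x + y for x, y in zip(abs_part, sing_part)])
```

In this example the singular part lives on the single point where m vanishes but \mu does not, and adding the absolutely continuous part m_{\frac{g}{1-g}} to it recovers \mu.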

Remark 11. The argument used in Remark 10 also shows that the Radon-Nikodym theorem implies the Lebesgue-Radon-Nikodym theorem. \diamond

In a later set of notes, we will give an alternate proof of Theorem 1, which relies on the geometry of L^p spaces rather than on the Radon-Nikodym theorem, and can thus be viewed as giving an independent proof of that theorem.

[Update, Jan 10: Another exercise added.]

[Update, Jan 13: Lemma 1 added.]

[Update, Jan 14: More remarks and another exercise added; note this changes exercise numbering.]

[Update, Jan 15: Exercise 0 added; other exercise numbering unchanged.]

[Update, Jan 16: Exercise 7a added; other exercise numbering unchanged.]