Now that we have reviewed the foundations of measure theory, let us put it to work to set up the basic theory of one of the fundamental families of function spaces in analysis, namely the $L^p$ spaces (also known as *Lebesgue spaces*). These spaces serve as important model examples for the general theory of topological and normed vector spaces, which we will discuss a little bit in this lecture and then in much greater detail in later lectures. (See also my previous blog post on function spaces.)

Just as scalar quantities live in the space of real or complex numbers, and vector quantities live in vector spaces, functions (or other objects closely related to functions, such as measures) live in *function spaces*. Like other spaces in mathematics (e.g. vector spaces, metric spaces, topological spaces, etc.), function spaces are not just mere sets of objects (in this case, the objects are functions); they also come with various important *structures* that allow one to do some useful *operations* inside these spaces, and from one space to another. For example, function spaces tend to have several (though usually not *all*) of the following types of structures, which are usually related to each other by various compatibility conditions:

- **Vector space structure.** One can often add two functions $f, g$ in a function space $V$, and expect to get another function $f+g$ in that space $V$; similarly, one can multiply a function $f$ in $V$ by a scalar $c$ and get another function $cf$ in $V$. Usually, these operations obey the axioms of a vector space, though it is important to caution that the dimension of a function space is typically infinite. (In some cases, the space of scalars is a more complicated ring than the real or complex field, in which case we need the notion of a module rather than a vector space, but we will not use this more general notion in this course.) Virtually all of the function spaces we shall encounter in this course will be vector spaces. Because the field of scalars is real or complex, vector spaces also come with the notion of *convexity*, which turns out to be crucial in many aspects of analysis. As a consequence (and in marked contrast to algebra or number theory), much of the theory in real analysis does not seem to extend to other fields of scalars (in particular, real analysis fails spectacularly in the finite characteristic setting).
- **Algebra structure.** Sometimes (though not always), we also wish to multiply two functions $f, g$ in $V$ and get another function $fg$ in $V$; when combined with the vector space structure and assuming some compatibility conditions (e.g. the distributive law), this makes $V$ an algebra. This multiplication operation is often just pointwise multiplication, but there are other important multiplication operations on function spaces too, such as convolution. (One sometimes sees other algebraic structures than multiplication appear in function spaces, most notably derivations, but again we will not encounter those in this course. Another common algebraic operation for function spaces is *conjugation* or *adjoint*, leading to the notion of a *-algebra.)
- **Norm structure.** We often want to distinguish “large” functions in $V$ from “small” ones, especially in analysis, in which “small” terms in an expression are routinely discarded or deemed to be acceptable errors. One way to do this is to assign a *magnitude* or *norm* $\|f\|_V$ to each function that measures its size. Unlike the situation with scalars, where there is basically a single notion of magnitude, functions have a wide variety of useful notions of size, each measuring a different aspect (or combination of aspects) of the function, such as height, width, oscillation, regularity, decay, and so forth. Typically, each such norm gives rise to a separate function space (although sometimes it is useful to consider a single function space with multiple norms on it). We usually require the norm to be compatible with the vector space structure (and algebra structure, if present), for instance by demanding that the triangle inequality hold.
- **Metric structure.** We also want to tell whether two functions f, g in a function space V are “near together” or “far apart”. A typical way to do this is to impose a metric $d: V \times V \to [0,+\infty)$ on the space $V$. If both a norm $\|\cdot\|_V$ and a vector space structure are available, there is an obvious way to do this: define the distance between two functions $f, g$ in $V$ to be $d(f,g) := \|f-g\|_V$. (This will be the only type of metric on function spaces encountered in this course. But there are some nonlinear function spaces of importance in nonlinear analysis (e.g. spaces of maps from one manifold to another) which have no vector space structure or norm, but still have a metric.) It is often important to know if the vector space is complete with respect to the given metric; this allows one to take limits of Cauchy sequences, and (with a norm and vector space structure) sum absolutely convergent series, as well as use some useful results from point set topology such as the Baire category theorem. All of these operations are of course vital in analysis. [Compactness would be an even better property than completeness to have, but function spaces unfortunately tend to be non-compact in various rather nasty ways, although there are useful partial substitutes for compactness that are available, see e.g. this blog post of mine.]
- **Topological structure.** It is often important to know when a sequence $f_n$ (or, occasionally, a net $f_\alpha$) of functions in $V$ “converges” in some sense to a limit $f$ (which, hopefully, is still in $V$); there are often many distinct modes of convergence (e.g. pointwise convergence, uniform convergence, etc.) that one wishes to carefully distinguish from each other. Also, in order to apply various powerful topological theorems (or to justify various formal operations involving limits, suprema, etc.), it is important to know when certain subsets of $V$ enjoy key topological properties (most notably *compactness* and *connectedness*), and to know which operations on $V$ are continuous. For all of this, one needs a topology on $V$. If one already has a metric, then one of course has a topology generated by the open balls of that metric; but there are many important topologies on function spaces in analysis that do not arise from metrics. We also often require the topology to be compatible with the other structures on the function space; for instance, we usually require the vector space operations of addition and scalar multiplication to be continuous. In some cases, the topology on $V$ extends to some natural superspace $W$ of more general functions that contain $V$; in such cases, it is often important to know whether $V$ is closed in $W$, so that limits of sequences in $V$ stay in $V$.
- **Functional structures.** Since numbers are easier to understand and deal with than functions, it is not surprising that we often study functions f in a function space V by first applying some functional $\lambda: V \to \mathbf{C}$ to V to identify some key numerical quantity $\lambda(f)$ associated to f. *Norms* are of course one important example of a functional; *integration* provides another; and *evaluation* at a point x provides a third important class. (Note, though, that while evaluation is the fundamental feature of a function in set theory, it is often a quite minor operation in analysis; indeed, in many function spaces, evaluation is not even defined at all, for instance because the functions in the space are only defined almost everywhere!) An *inner product* on $V$ (see below) also provides a large family of useful functionals. It is of particular interest to study functionals that are compatible with the vector space structure (i.e. are linear) and with the topological structure (i.e. are *continuous*); this will give rise to the important notion of duality on function spaces.
- **Inner product structure.** One often would like to pair a function f in a function space V with another object g (which is often, though not always, another function in the same function space V) and obtain a number $\langle f, g \rangle$ that typically measures the amount of “interaction” or “correlation” between f and g. Typical examples include inner products arising from integration, such as $\langle f, g \rangle := \int_X f \overline{g}\ d\mu$; integration itself can also be viewed as a pairing, $\mu(f) = \langle \mu, f \rangle := \int_X f\ d\mu$. Of course, we usually require such inner products to be compatible with the other structures present on the space (e.g., to be compatible with the vector space structure, we usually require the inner product to be bilinear or sesquilinear). Inner products, when available, are incredibly useful in understanding the metric and norm geometry of a space, due to such fundamental facts as the Cauchy-Schwarz inequality and the parallelogram law. They also give rise to the important notion of *orthogonality* between functions.
- **Group actions.** We often expect our function spaces to enjoy various *symmetries*; we might wish to rotate, reflect, translate, modulate, or dilate our functions and expect to preserve most of the structure of the space when doing so. In modern mathematics, symmetries are usually encoded by group actions (or actions of other group-like objects, such as semigroups or groupoids; one also often upgrades groups to more structured objects such as Lie groups). As usual, we typically require the group action to preserve the other structures present on the space, e.g. one often restricts attention to group actions that are linear (to preserve the vector space structure), continuous (to preserve topological structure), unitary (to preserve inner product structure), isometric (to preserve metric structure), and so forth. Besides giving us useful symmetries to spend, the presence of such group actions allows one to apply the powerful techniques of representation theory, Fourier analysis, and ergodic theory. However, as this is a foundational real analysis class, we will not discuss these important topics much here (and in fact will not deal with group actions much at all).
- **Order structure.** In some cases, we want to utilise the notion of a function f being “non-negative”, or “dominating” another function g. One might also want to take the “max” or “supremum” of two or more functions in a function space V, or split a function into “positive” and “negative” components. Such order structures interact with the other structures on a space in many useful ways (e.g. via the Stone-Weierstrass theorem). Much like convexity, order structure is specific to the real line and is another reason why much of real analysis breaks down over other fields. (The complex plane is of course an extension of the real line and so is able to exploit the order structure of that line, usually by treating the real and imaginary components separately.)

There are of course many ways to combine various flavours of these structures together, and there are entire subfields of mathematics that are devoted to studying particularly common and useful categories of such combinations (e.g. topological vector spaces, normed vector spaces, Banach spaces, Banach algebras, von Neumann algebras, $C^*$-algebras, Fréchet spaces, Hilbert spaces, group algebras, etc.). The study of these sorts of spaces is known collectively as *functional analysis*. We will study some (but certainly not all) of these combinations in an abstract and general setting later in this course, but to begin with we will focus on the $L^p$ spaces, which are very good model examples for many of the above general classes of spaces, and also of importance in many applications of analysis (such as probability or PDE).

— $L^p$ spaces —

In this post, $(X, \mathcal{X}, \mu)$ will be a fixed measure space; notions such as “measurable”, “measure”, “almost everywhere”, etc. will always be with respect to this space, unless otherwise specified. Similarly, unless otherwise specified, all subsets of X mentioned are restricted to be measurable, as are all scalar functions on X.

For the sake of concreteness, we shall select the field of scalars to be the complex numbers $\mathbf{C}$. The theory of real Lebesgue spaces is virtually identical to that of complex Lebesgue spaces, and the former can largely be deduced from the latter as a special case.

We already have the notion of an *absolutely integrable function* on X, which is a function $f: X \to \mathbf{C}$ such that $\int_X |f|\ d\mu$ is finite. More generally, given any exponent $0 < p < \infty$, we can define a *$p^{th}$-power integrable function* to be a function $f: X \to \mathbf{C}$ such that $\int_X |f|^p\ d\mu$ is finite. (Besides p=1, the case of most interest is the case of *square-integrable functions*, when $p=2$. We will also extend this notion later to $p=\infty$, which is also an important special case.)

**Remark 1.** One can also extend these notions to functions that take values in the extended complex plane $\mathbf{C} \cup \{\infty\}$, but one easily observes that $p^{th}$-power integrable functions must be finite almost everywhere, and so there is essentially no increase in generality afforded by extending the range in this manner.

Following the “Lebesgue philosophy” that one should ignore whatever is happening on a set of measure zero, let us declare two measurable functions to be *equivalent* if they agree almost everywhere. This is easily checked to be an equivalence relation, which does not affect the property of being $p^{th}$-power integrable. Thus, we can define the *Lebesgue space* $L^p(X, \mathcal{X}, \mu)$ to be the space of $p^{th}$-power integrable functions, quotiented out by this equivalence relation. Strictly speaking, then, a typical element of $L^p(X, \mathcal{X}, \mu)$ is not actually a specific function f, but is instead an equivalence class [f], consisting of all functions equivalent to a single function f. However, we shall abuse notation and speak loosely of a function f “belonging” to $L^p(X, \mathcal{X}, \mu)$, where it is understood that f is only defined up to equivalence, or more imprecisely is “defined almost everywhere”. For the purposes of integration, this equivalence is quite harmless, but this convention does mean that we can no longer evaluate a function f in $L^p(X, \mathcal{X}, \mu)$ at a single point x if that point x has zero measure. It takes a little bit of getting used to the idea of a function that cannot actually be evaluated at any specific point, but with some practice you will find that it will not cause any significant conceptual difficulty. [One could also take a more abstract view, dispensing with the set X altogether and defining the Lebesgue space on abstract measure spaces $(\mathcal{X}, \mu)$, but we will not do so here. Another way to think about elements of $L^p$ is that they are functions which are “unreliable” on an unknown set of measure zero, but remain “reliable” almost everywhere.]

**Exercise 0.** If $(X, \mathcal{X}, \mu)$ is a measure space, and $(X, \overline{\mathcal{X}}, \overline{\mu})$ is the completion of $(X, \mathcal{X}, \mu)$, show that the spaces $L^p(X, \mathcal{X}, \mu)$ and $L^p(X, \overline{\mathcal{X}}, \overline{\mu})$ are isomorphic using the obvious candidate for the isomorphism. Because of this, when dealing with $L^p$ spaces, we will usually not be too concerned with whether the underlying measure space is complete.

**Remark 2.** Depending on which of the three structures $X, \mathcal{X}, \mu$ of the measure space one wishes to emphasise, the space $L^p(X, \mathcal{X}, \mu)$ is often abbreviated $L^p(X)$, $L^p(\mathcal{X})$, $L^p(X, \mu)$, $L^p(\mu)$, or even just $L^p$. Since for this discussion the measure space $(X, \mathcal{X}, \mu)$ will be fixed, we shall usually use the $L^p$ abbreviation in this post. When the space X is discrete (i.e. $\mathcal{X} = 2^X$) and $\mu$ is counting measure, then $L^p(X, \mathcal{X}, \mu)$ is usually abbreviated $\ell^p(X)$ or just $\ell^p$ (and the almost everywhere equivalence relation trivialises and can thus be completely ignored).

At present, the Lebesgue spaces $L^p$ are just sets. We now begin to place several of the structures mentioned in the introduction on these sets, upgrading them to richer spaces.

We begin with vector space structure. Fix $0 < p < \infty$, and let $f, g \in L^p$ be two $p^{th}$-power integrable functions. From the crude pointwise (or more precisely, “pointwise almost everywhere”) inequality

$$|f(x)+g(x)|^p \leq \left( 2 \max(|f(x)|, |g(x)|) \right)^p \leq 2^p \left( |f(x)|^p + |g(x)|^p \right) \qquad (1)$$

we see that the sum of two $p^{th}$-power integrable functions is also $p^{th}$-power integrable. It is also easy to see that any scalar multiple of a $p^{th}$-power integrable function is also $p^{th}$-power integrable. These operations respect almost everywhere equivalence, and so $L^p$ becomes a (complex) vector space.

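Next, we set up the norm structure. If $f \in L^p$, we define the *$L^p$ norm* $\|f\|_{L^p}$ of f to be the number

$$\|f\|_{L^p} := \left( \int_X |f|^p\ d\mu \right)^{1/p}; \qquad (2)$$

this is a finite non-negative number by definition of $L^p$; in particular, we have the identity

$$\|f^r\|_{L^p} = \|f\|_{L^{pr}}^r \qquad (3)$$

for all $0 < p, r < \infty$.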
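As a quick illustration of definition (2), here is a worked computation under assumptions that are not part of the text above: take $X = (0,1]$ with Lebesgue measure and the hypothetical test function $f(x) = x^{-a}$ for a parameter $a > 0$. The sketch suggests that $f$ is $p^{th}$-power integrable exactly when $ap < 1$.

```latex
\begin{align*}
\int_0^1 |x^{-a}|^p\,dx &= \int_0^1 x^{-ap}\,dx
 = \begin{cases} \dfrac{1}{1-ap}, & ap < 1, \\[1ex] +\infty, & ap \geq 1, \end{cases} \\
\|f\|_{L^p} &= (1-ap)^{-1/p} \qquad \text{when } ap < 1 .
\end{align*}
```

In particular, for fixed $a$ this function lies in $L^p$ only for $p < 1/a$, so different exponents genuinely give different spaces.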

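The $L^p$ norm has the following three basic properties:

**Lemma 1.** Let $0 < p < \infty$ and $f, g \in L^p$.

1. (Non-degeneracy) $\|f\|_{L^p} = 0$ if and only if f = 0.
2. (Homogeneity) $\|cf\|_{L^p} = |c| \|f\|_{L^p}$ for all complex numbers c.
3. ((Quasi-)triangle inequality) We have $\|f+g\|_{L^p} \leq C ( \|f\|_{L^p} + \|g\|_{L^p} )$ for some constant C depending on p. If $p \geq 1$, then we can take C=1 (this fact is also known as Minkowski’s inequality).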

**Proof.** The claims 1, 2 are obvious. (Note how important it is that we equate functions that vanish almost everywhere in order to get 1.) The quasi-triangle inequality follows from a variant of the estimates in (1) and is left as an exercise. For the triangle inequality, we have to be more efficient than the crude estimate (1). By the non-degeneracy property we may take $\|f\|_{L^p}$ and $\|g\|_{L^p}$ to be non-zero. Using the homogeneity, we can normalise $\|f\|_{L^p} + \|g\|_{L^p}$ to equal 1, thus (by homogeneity again) we can write $f = (1-\theta) F$ and $g = \theta G$ for some $0 < \theta < 1$ and $F, G \in L^p$ with $\|F\|_{L^p} = \|G\|_{L^p} = 1$. Our task is now to show that

$$\int_X |(1-\theta) F(x) + \theta G(x)|^p\ d\mu \leq 1. \qquad (4)$$

But observe that for $1 \leq p < \infty$, the function $x \mapsto |x|^p$ is convex on $\mathbf{C}$, and in particular that

$$|(1-\theta) F(x) + \theta G(x)|^p \leq (1-\theta) |F(x)|^p + \theta |G(x)|^p. \qquad (5)$$

(If one wishes, one can use the complex triangle inequality to first reduce to the case when F, G are non-negative, in which case one only needs convexity on $[0,+\infty)$ rather than all of $\mathbf{C}$.) The claim (4) then follows from (5) and the normalisations of F, G.
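For completeness, the last step of this proof can be written out as follows. This is only a sketch, assuming the normalisation described above: $f = (1-\theta) F$ and $g = \theta G$ with $0 < \theta < 1$ and $\|F\|_{L^p} = \|G\|_{L^p} = 1$ (the symbols $\theta$, $F$, $G$ are the labels assumed here for the normalised quantities). Integrating the pointwise convexity bound gives

```latex
\begin{align*}
\int_X |(1-\theta)F + \theta G|^p\,d\mu
 &\leq (1-\theta)\int_X |F|^p\,d\mu + \theta \int_X |G|^p\,d\mu \\
 &= (1-\theta)\,\|F\|_{L^p}^p + \theta\,\|G\|_{L^p}^p
 = (1-\theta) + \theta = 1 ,
\end{align*}
```

so that $\|f+g\|_{L^p} \leq 1 = \|f\|_{L^p} + \|g\|_{L^p}$ under this normalisation.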

**Exercise 1.** Let $0 < p \leq 1$ and $f, g \in L^p$.

- Establish the variant $\|f+g\|_{L^p}^p \leq \|f\|_{L^p}^p + \|g\|_{L^p}^p$ of the triangle inequality.
- If furthermore f and g are non-negative (almost everywhere), establish also the reverse triangle inequality $\|f+g\|_{L^p} \geq \|f\|_{L^p} + \|g\|_{L^p}$.
- Show that the best constant C in the quasi-triangle inequality is $2^{\frac{1}{p}-1}$. In particular, the triangle inequality is false for $p < 1$.
- Now suppose instead that $1 < p < \infty$ or $0 < p < 1$. If $f, g \in L^p$ are non-negative and such that $\|f+g\|_{L^p} = \|f\|_{L^p} + \|g\|_{L^p}$, show that one of the functions f, g is a non-negative scalar multiple of the other (up to equivalence, of course). What happens when p=1?

A vector space V with a function $\|\cdot\|: V \to [0,+\infty)$ obeying the non-degeneracy, homogeneity, and (quasi-)triangle inequality is known as a (quasi-)normed vector space, and the function $f \mapsto \|f\|$ is then known as a *(quasi-)norm*; thus $L^p$ is a normed vector space for $1 \leq p < \infty$ but only a quasi-normed vector space for $0 < p < 1$. A function $\|\cdot\|: V \to [0,+\infty)$ obeying the homogeneity and triangle inequality, but not necessarily the non-degeneracy property, is known as a seminorm; thus for instance the $L^p$ norms for $1 \leq p < \infty$ would have been seminorms if we did not equate functions that agreed almost everywhere. (Conversely, given a seminormed vector space $(V, \|\cdot\|)$, one can convert it into a normed vector space by quotienting out the subspace $\{ f \in V: \|f\| = 0 \}$; we leave the details as an exercise for the reader.)

**Exercise 2.** Let $\|\cdot\|: V \to [0,+\infty)$ be a function on a vector space which obeys the non-degeneracy and homogeneity properties. Show that $\|\cdot\|$ is a norm if and only if the closed unit ball $\{ f: \|f\| \leq 1 \}$ is convex; show that the same equivalence also holds for the open unit ball $\{ f: \|f\| < 1 \}$. This emphasises the geometric nature of the triangle inequality.

**Exercise 3.** If $f \in L^p$ for some $0 < p < \infty$, show that the support $\{ x \in X: f(x) \neq 0 \}$ of f (which is defined only up to sets of measure zero) is a $\sigma$-finite set. (Because of this, we can often reduce from the non-$\sigma$-finite case to the $\sigma$-finite case in many, though not all, questions concerning $L^p$ spaces.)

We are now able to define $L^p$ norms and spaces in the limit $p = \infty$. We say that a function $f: X \to \mathbf{C}$ is *essentially bounded* if there exists an M such that $|f(x)| \leq M$ for almost every x, and define $\|f\|_{L^\infty}$ to be the least M that serves as such a bound. We let $L^\infty$ denote the space of essentially bounded functions, quotiented out by equivalence, and given the norm $\|\cdot\|_{L^\infty}$. It is not hard to see that this is also a normed vector space. Observe that a sequence $f_n \in L^\infty$ converges to a limit $f \in L^\infty$ if and only if $f_n$ converges *essentially uniformly* to f, i.e. it converges uniformly to f outside of a set of measure zero. (Compare with Egorov’s theorem (Theorem 3.6 from Notes 0), which equates pointwise convergence with uniform convergence outside of a set of arbitrarily small measure.)

Now we explain why we call this norm the $L^\infty$ norm:

**Example 1.** Let f be a (generalised) *step function*, thus $f = A 1_E$ for some amplitude $A > 0$ and some set E; let us assume that E has positive finite measure. Then $\|f\|_{L^p} = A \mu(E)^{1/p}$ for all $0 < p < \infty$, and also $\|f\|_{L^\infty} = A$. Thus in this case, at least, the $L^\infty$ norm is the limit of the $L^p$ norms. This example illustrates also that the $L^p$ norms behave like combinations of the “height” A of a function, and the “width” $\mu(E)$ of such a function, though of course the concepts of height and width are not formally defined for functions that are not step functions.
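The limit in this example can be checked directly. A short computation, assuming the step function has height $A > 0$ on a set $E$ with $0 < \mu(E) < \infty$ (so that $\log \mu(E)$ is a finite number):

```latex
\begin{align*}
\|f\|_{L^p} &= \Big( \int_E A^p\,d\mu \Big)^{1/p} = A\,\mu(E)^{1/p}, \\
\lim_{p \to \infty} A\,\mu(E)^{1/p}
 &= A \exp\Big( \lim_{p \to \infty} \tfrac{1}{p} \log \mu(E) \Big) = A = \|f\|_{L^\infty}.
\end{align*}
```

The width $\mu(E)$ thus matters less and less as $p$ grows, and only the height survives in the limit.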

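**Exercise 4.**

- If $f \in L^\infty \cap L^{p_0}$ for some $0 < p_0 < \infty$, show that $\|f\|_{L^p} \to \|f\|_{L^\infty}$ as $p \to \infty$. (Hint: use the monotone convergence theorem.)
- If $f \not\in L^\infty$, show that $\|f\|_{L^p} \to \infty$ as $p \to \infty$.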

Once one has a vector space structure and a (quasi-)norm structure, we immediately get a (quasi-)metric structure:

**Exercise 5.** Let $(V, \|\cdot\|)$ be a normed vector space. Show that the function $d: V \times V \to [0,+\infty)$ defined by $d(f,g) := \|f-g\|$ is a metric on V which is *translation-invariant* (thus $d(f+h, g+h) = d(f,g)$ for all $f, g, h \in V$) and *homogeneous* (thus $d(cf, cg) = |c| d(f,g)$ for all $f, g \in V$ and scalars c). Conversely, show that every translation-invariant homogeneous metric on V arises from precisely one norm in this manner. Establish a similar claim relating quasi-norms with *quasi-metrics* (which are defined as metrics, but with the triangle inequality replaced by a quasi-triangle inequality; note that the term “quasi-metric” is occasionally used to denote a slightly different concept), or between seminorms and *semimetrics* (which are defined as metrics, but where distinct points are allowed to have a zero separation; these are also known as pseudometrics, with “semimetric” used to denote something else).

The (quasi-)metric structure in turn generates a topological structure in the usual manner using the (quasi-)metric balls as a base for the topology. In particular, a sequence of functions $f_n \in L^p$ converges to a limit $f \in L^p$ if $\|f_n - f\|_{L^p} \to 0$ as $n \to \infty$. We refer to this type of convergence as *convergence in $L^p$ norm*, or *strong convergence in $L^p$* (we will discuss other modes of convergence in later lectures). As is usual in (quasi-)metric spaces (or more generally for Hausdorff spaces), the limit, if it exists, is unique. (This is however not the case for topological structures induced by seminorms or semimetrics, though we can solve this problem by quotienting out the degenerate elements as discussed earlier.)

Recall that any series $\sum_{n=1}^\infty a_n$ of scalars is convergent if it is absolutely convergent (i.e. if $\sum_{n=1}^\infty |a_n| < \infty$). This fact turns out to be closely related to the fact that the field of scalars is complete. This can be seen from the following result:

**Exercise 6.** Let $(V, \|\cdot\|)$ be a normed vector space (and hence also a metric space and a topological space). Show that the following are equivalent:

- V is a complete metric space (i.e. every Cauchy sequence converges).
- Every sequence $f_n \in V$ which is absolutely convergent (i.e. $\sum_{n=1}^\infty \|f_n\| < \infty$), is also conditionally convergent (i.e. $\sum_{n=1}^N f_n$ converges to a limit as $N \to \infty$).

**Remark 3.** The situation is more complicated for complete quasi-normed vector spaces; not every absolutely convergent series is conditionally convergent. On the other hand, if $\|f_n\|$ decays faster than a sufficiently large negative power of n, one recovers conditional convergence; see these old notes of mine.

**Remark 4.** Let X be a topological space, and let BC(X) be the space of bounded continuous functions on X; this is a vector space. We can place the uniform norm $\|f\|_u := \sup_{x \in X} |f(x)|$ on this space; this makes BC(X) into a normed vector space. It is not hard to verify that this space is complete, and so every absolutely convergent series in BC(X) is conditionally convergent. This fact is better known as the Weierstrass M-test.

A space obeying the properties in Exercise 6 (i.e. a complete normed vector space) is known as a Banach space. We will study Banach spaces in more detail later in this course. For now, we give one of the fundamental examples of Banach spaces.

**Proposition 1.** $L^p$ is a Banach space for every $1 \leq p \leq \infty$.

**Proof.** By Exercise 6, it suffices to show that any series $\sum_{n=1}^\infty f_n$ of functions in $L^p$ which is absolutely convergent, is also conditionally convergent. This is easy in the case $p = \infty$ and is left as an exercise. In the case $1 \leq p < \infty$, we write $M := \sum_{n=1}^\infty \|f_n\|_{L^p}$, which is a finite quantity by hypothesis. By the triangle inequality, we have $\| \sum_{n=1}^N |f_n| \|_{L^p} \leq M$ for all N. By monotone convergence, we conclude $\| \sum_{n=1}^\infty |f_n| \|_{L^p} \leq M$. In particular, $\sum_{n=1}^\infty f_n(x)$ is absolutely convergent for almost every x. Write the limit of this series as $F(x)$. By dominated convergence, we see that $\sum_{n=1}^N f_n$ converges in $L^p$ norm to F, and we are done.
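The dominated convergence step can be made explicit. The following is a sketch, writing $S_N := \sum_{n=1}^N f_n$ and $G := \sum_{n=1}^\infty |f_n|$ (names introduced here for convenience), and letting $M$ denote the finite sum of the norms $\|f_n\|_{L^p}$, so that $\|G\|_{L^p} \leq M$ by the monotone convergence argument:

```latex
\begin{align*}
|F(x) - S_N(x)|^p
 = \Big| \sum_{n=N+1}^\infty f_n(x) \Big|^p
 \leq G(x)^p \quad \text{for almost every } x,
 \qquad \int_X G^p\,d\mu \leq M^p < \infty .
\end{align*}
```

Since $F(x) - S_N(x) \to 0$ for almost every $x$, dominated convergence with the dominating function $G^p$ gives $\|F - S_N\|_{L^p}^p = \int_X |F - S_N|^p\ d\mu \to 0$. The same bound $|F| \leq G$ shows that $F$ itself lies in $L^p$.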

An important fact is that functions in $L^p$ can be approximated by simple functions:

**Proposition 2.** If $0 < p < \infty$, then the space of simple functions with finite measure support is a dense subspace of $L^p$.

(The concept of a non-trivial dense subspace is one which only comes up in infinite dimensions, and is hard to visualise directly. Very roughly speaking, the infinite number of degrees of freedom in an infinite dimensional space gives a subspace an infinite number of “opportunities” to come as close as one desires to any given point in that space, which is what allows such spaces to be dense.)

**Proof.** The only non-trivial thing to show is the density. An application of the monotone convergence theorem shows that the space of bounded $L^p$ functions is dense in $L^p$. Another application of monotone convergence (and Exercise 3) then shows that the space of bounded functions of finite measure support is dense in the space of bounded $L^p$ functions. Finally, by discretising the range of bounded functions, we see that the space of simple functions with finite measure support is dense in the space of bounded functions with finite measure support.

**Remark 5.** Since not every function in $L^p$ is a simple function with finite measure support, we thus see that the space of simple functions with finite measure support with the $L^p$ norm is an example of a normed vector space which is not complete.

**Exercise 7.** Show that the space of simple functions (not necessarily with finite measure support) is a dense subspace of $L^\infty$. Is the same true if one reinstates the finite measure support restriction?

**Exercise 7a.** Suppose that $\mu$ is $\sigma$-finite and $\mathcal{X}$ is separable (i.e. countably generated). Show that $L^p$ is separable (i.e. has a countable dense subset) for all $1 \leq p < \infty$. Give a counterexample that shows that $L^\infty$ need not be separable. (Hint: take the integers with counting measure.)

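Next, we turn to algebra properties of $L^p$ spaces. The key fact here is

**Proposition 3.** (Hölder’s inequality) Let $f \in L^p$ and $g \in L^q$ for some $0 < p, q \leq \infty$. Then $fg \in L^r$ and $\|fg\|_{L^r} \leq \|f\|_{L^p} \|g\|_{L^q}$, where the exponent r is defined by the formula $\frac{1}{r} = \frac{1}{p} + \frac{1}{q}$.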

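**Proof.** This will be a variant of the proof of the triangle inequality in Lemma 1, again relying ultimately on convexity. The claim is easy when $p = \infty$ or $q = \infty$ and is left as an exercise for the reader in this case, so we assume $p, q < \infty$. Raising f and g to the power r using (3) we may assume r=1, which makes $1 < p, q < \infty$ *dual exponents* in the sense that $\frac{1}{p} + \frac{1}{q} = 1$. The claim is obvious if either $\|f\|_{L^p}$ or $\|g\|_{L^q}$ is zero, so we may assume they are non-zero; by homogeneity we may then normalise $\|f\|_{L^p} = \|g\|_{L^q} = 1$. Our task is now to show that

$$\int_X |fg|\ d\mu \leq 1. \qquad (6)$$

Here, we use the convexity of the exponential function $t \mapsto e^t$ on $\mathbf{R}$, which implies the convexity of the function $t \mapsto |f(x)|^{p(1-t)} |g(x)|^{qt}$ for $t \in [0,1]$ for any x. In particular (taking $t = 1/q$) we have

$$|f(x) g(x)| \leq \frac{1}{p} |f(x)|^p + \frac{1}{q} |g(x)|^q \qquad (7)$$

and the claim (6) follows from the normalisations on p, q, f, g.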
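Spelling out this final step as a sketch, assuming the normalisation $\|f\|_{L^p} = \|g\|_{L^q} = 1$ with $\frac{1}{p} + \frac{1}{q} = 1$, and the pointwise bound $|fg| \leq \frac{1}{p}|f|^p + \frac{1}{q}|g|^q$ (which is how we read (7) here):

```latex
\begin{align*}
\int_X |fg|\,d\mu
 &\leq \frac{1}{p}\int_X |f|^p\,d\mu + \frac{1}{q}\int_X |g|^q\,d\mu \\
 &= \frac{1}{p}\,\|f\|_{L^p}^p + \frac{1}{q}\,\|g\|_{L^q}^q
 = \frac{1}{p} + \frac{1}{q} = 1 .
\end{align*}
```

Undoing the normalisation by homogeneity then gives $\|fg\|_{L^1} \leq \|f\|_{L^p} \|g\|_{L^q}$ for general non-zero $f$ and $g$.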

**Remark 6.** For a different proof of this inequality (based on the tensor power trick), see Example 1 of this blog post of mine.

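**Remark 7.** One can also use Hölder’s inequality to prove the triangle inequality for $L^p$, $1 \leq p < \infty$ (i.e. Minkowski’s inequality). From the complex triangle inequality $|f+g| \leq |f| + |g|$, it suffices to check the case when f, g are non-negative. In this case we have the identity

$$\|f+g\|_{L^p}^p = \| f |f+g|^{p-1} \|_{L^1} + \| g |f+g|^{p-1} \|_{L^1} \qquad (8)$$

while Hölder’s inequality gives $\| f |f+g|^{p-1} \|_{L^1} \leq \|f\|_{L^p} \|f+g\|_{L^p}^{p-1}$ and $\| g |f+g|^{p-1} \|_{L^1} \leq \|g\|_{L^p} \|f+g\|_{L^p}^{p-1}$. The claim then follows from some algebra (and checking the degenerate cases separately, e.g. when $\|f+g\|_{L^p} = 0$).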
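The algebra in this remark can be sketched as follows, for non-negative $f, g \in L^p$ with $1 < p < \infty$ and $\|f+g\|_{L^p}$ non-zero (it is finite because $L^p$ is a vector space). Here Hölder's inequality is applied with the exponents $p$ and $p/(p-1)$, which is an assumption about how the remark is meant to be read:

```latex
\begin{align*}
\|f+g\|_{L^p}^p
 &= \int_X f\,(f+g)^{p-1}\,d\mu + \int_X g\,(f+g)^{p-1}\,d\mu \\
 &\leq \big( \|f\|_{L^p} + \|g\|_{L^p} \big)\, \big\| (f+g)^{p-1} \big\|_{L^{p/(p-1)}} \\
 &= \big( \|f\|_{L^p} + \|g\|_{L^p} \big)\, \|f+g\|_{L^p}^{p-1} .
\end{align*}
```

Dividing both sides by $\|f+g\|_{L^p}^{p-1}$ gives $\|f+g\|_{L^p} \leq \|f\|_{L^p} + \|g\|_{L^p}$. The case $p = 1$ follows at once by integrating $|f+g| \leq |f| + |g|$.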

**Remark 8.** The proofs of Hölder’s inequality and Minkowski’s inequality both relied on convexity of various functions in $\mathbf{C}$ or $[0,+\infty)$. One way to emphasise this is to deduce both inequalities from Jensen’s inequality, which is an inequality which manifestly exploits this convexity. We will not take this approach here, but see for instance the book of Lieb and Loss for a discussion.

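**Example 2.** It is instructive to test Hölder’s inequality (and also Exercises 8-12 below) in the special case when f, g are generalised step functions, say $f = A 1_E$ and $g = B 1_F$ with A, B non-zero. The inequality then simplifies to

$$\mu(E \cap F)^{1/r} \leq \mu(E)^{1/p} \mu(F)^{1/q} \qquad (8)$$

which can be easily deduced from the hypothesis $\frac{1}{p} + \frac{1}{q} = \frac{1}{r}$ and the trivial inequalities $\mu(E \cap F) \leq \mu(E)$ and $\mu(E \cap F) \leq \mu(F)$. One then easily sees (when p,q are finite) that equality in (8) only holds if $\mu(E \cap F) = \mu(E) = \mu(F)$, or in other words if E and F agree almost everywhere. Note the above computations also explain why the condition $\frac{1}{p} + \frac{1}{q} = \frac{1}{r}$ is necessary.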
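The deduction for step functions fits in one line. A sketch, assuming $\frac{1}{r} = \frac{1}{p} + \frac{1}{q}$ with $p, q$ finite, and sets $E$, $F$ of finite measure:

```latex
\begin{align*}
\mu(E \cap F)^{1/r}
 = \mu(E \cap F)^{1/p}\, \mu(E \cap F)^{1/q}
 \leq \mu(E)^{1/p}\, \mu(F)^{1/q} .
\end{align*}
```

Multiplying through by $|A| |B|$ recovers Hölder's inequality for these step functions, since the product $fg$ has height $|A||B|$ on $E \cap F$ and so has $L^r$ norm $|A| |B| \mu(E \cap F)^{1/r}$.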

**Exercise 8.** Let $0 < p, q < \infty$, and let $f \in L^p$, $g \in L^q$ be such that Hölder’s inequality is obeyed with equality. Show that of the functions $|f|^p$, $|g|^q$, one of them is a scalar multiple of the other (up to equivalence, of course). What happens if p or q is infinite?

An important corollary of Hölder’s inequality is the Cauchy-Schwarz inequality

$$\left| \int_X f \overline{g}\ d\mu \right| \leq \|f\|_{L^2} \|g\|_{L^2} \qquad (9)$$

which can of course be proven by many other means.

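**Exercise 9.** If $f \in L^p$ for some $0 < p \leq \infty$, and is also supported on a set E of finite measure, show that $f \in L^q$ for all $0 < q \leq p$, with $\|f\|_{L^q} \leq \mu(E)^{\frac{1}{q} - \frac{1}{p}} \|f\|_{L^p}$. When does equality occur?

**Exercise 10.** If $f \in L^p$ for some $0 < p \leq \infty$, and every set of positive measure in X has measure at least m, show that $f \in L^q$ for all $p < q \leq \infty$, with $\|f\|_{L^q} \leq m^{\frac{1}{q} - \frac{1}{p}} \|f\|_{L^p}$. When does equality occur? (This result is especially useful for the $\ell^p$ spaces, in which $\mu$ is counting measure and m can be taken to be 1.)

**Exercise 11.** If $f \in L^{p_0} \cap L^{p_1}$ for some $0 < p_0 < p_1 \leq \infty$, show that $f \in L^p$ for all $p_0 \leq p \leq p_1$, and that $\|f\|_{L^p} \leq \|f\|_{L^{p_0}}^{1-\theta} \|f\|_{L^{p_1}}^{\theta}$, where $0 \leq \theta \leq 1$ is such that $\frac{1}{p} = \frac{1-\theta}{p_0} + \frac{\theta}{p_1}$. Another way of saying this is that the function $\frac{1}{p} \mapsto \log \|f\|_{L^p}$ is convex. When does equality occur? This convexity is a prototypical example of interpolation, about which we shall say more in a later lecture.

**Exercise 12.** If $f \in L^{p_0}$ for some $0 < p_0 \leq \infty$, and its support $E := \{ x \in X: f(x) \neq 0 \}$ has finite measure, show that $f \in L^p$ for all $0 < p < p_0$, and that $\|f\|_{L^p}^p \to \mu(E)$ as $p \to 0$. (Because of this, the measure of the support of f is sometimes known as the *$L^0$ norm* of f, or more precisely the $L^0$ norm raised to the power 0.)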

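— Linear functionals on $L^p$ —

Given an exponent $1 \leq p \leq \infty$, define the dual exponent $1 \leq p' \leq \infty$ by the formula $\frac{1}{p} + \frac{1}{p'} = 1$ (thus $p' = p/(p-1)$ for $1 < p < \infty$, while 1 and $\infty$ are duals of each other). From Hölder’s inequality, we see that for any $g \in L^{p'}$, the functional $\lambda_g: L^p \to \mathbf{C}$ defined by

$$\lambda_g(f) := \int_X f \overline{g}\ d\mu \qquad (10)$$

is well-defined on $L^p$; the functional is also clearly linear. Furthermore, Hölder’s inequality also tells us that this functional is continuous.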

A deep and important fact about $L^p$ spaces is that, in most cases, the converse is true: the recipe (10) is the *only* way to create continuous linear functionals on $L^p$.

**Theorem 1.** Let $1 \leq p < \infty$, and assume $\mu$ is $\sigma$-finite. Let $\lambda: L^p \to \mathbf{C}$ be a continuous linear functional. Then there exists a unique $g \in L^{p'}$ such that $\lambda = \lambda_g$.

This result should be compared with the Radon-Nikodym theorem (Corollary 1 from Notes 1). Both theorems start with an abstract function $\nu: \mathcal{X} \to \mathbf{R}$ or $\lambda: L^p \to \mathbf{C}$, and create a function out of it. Indeed, we shall see shortly that the two theorems are essentially equivalent to each other. We will develop Theorem 1 further in later lectures, once we introduce the notion of a dual space.

To prove Theorem 1, we first need a simple and useful lemma:

**Lemma 1.** (Continuity is equivalent to boundedness for linear operators) Let $T: X \to Y$ be a linear transformation from one normed vector space $(X, \|\cdot\|_X)$ to another $(Y, \|\cdot\|_Y)$. Then the following are equivalent:

1. T is continuous.
2. T is continuous at 0.
3. There exists a constant C such that $\|Tx\|_Y \leq C \|x\|_X$ for all $x \in X$.

**Proof.** It is clear that 1 implies 2, and that 3 implies 2. Next, from linearity we have $Tx = Tx_0 + T(x - x_0)$ for any $x, x_0 \in X$, which (together with the continuity of addition, which follows from the triangle inequality) shows that continuity of T at 0 implies continuity of T at any $x_0$, so that 2 implies 1. The only remaining task is to show that 1 implies 3. By continuity, the inverse image of the unit ball in Y must be an open neighbourhood of 0 in X, thus there exists some radius $r > 0$ such that $\|Tx\|_Y < 1$ whenever $\|x\|_X < r$. The claim then follows (with $C := 1/r$) by homogeneity. (Alternatively, one can deduce 3 from 2 by contradiction. If 3 failed, then there exists a sequence $x_n$ of non-zero elements of X such that $\|Tx_n\|_Y / \|x_n\|_X$ goes to infinity. By homogeneity, we can arrange matters so that $\|x_n\|_X$ goes to zero, but $\|Tx_n\|_Y$ stays away from zero, thus contradicting continuity at 0.)
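The homogeneity step at the end of the main argument can be sketched as follows. Assume a radius $r > 0$ such that $\|Tx\|_Y < 1$ whenever $\|x\|_X < r$. For non-zero $x$ and any $0 < s < r$ (an auxiliary parameter introduced here), the vector $s x / \|x\|_X$ has norm $s < r$, so

```latex
\begin{align*}
\|Tx\|_Y
 = \frac{\|x\|_X}{s}\, \Big\| T\Big( \frac{s\,x}{\|x\|_X} \Big) \Big\|_Y
 < \frac{\|x\|_X}{s} .
\end{align*}
```

Letting $s$ increase to $r$ gives $\|Tx\|_Y \leq \frac{1}{r} \|x\|_X$, which is the bound in claim 3 with $C = 1/r$; the case $x = 0$ is trivial.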

**Proof of Theorem 1.** The uniqueness claim is similar to the uniqueness claim in the Radon-Nikodym theorem (Exercise 2 from Notes 1) and is left as an exercise to the reader; the hard part is establishing existence.

Let us first consider the case when $\mu$ is finite. The linear functional $\lambda: L^p \to \mathbf{C}$ induces a functional $\nu: \mathcal{X} \to \mathbf{C}$ on sets E by the formula

$$\nu(E) := \lambda(1_E). \qquad (11)$$

Since $\lambda$ is linear, $\nu$ is finitely additive (and sends the empty set to zero). Also, if $E_1, E_2, \ldots$ are a sequence of disjoint sets, then $1_{\bigcup_{n=1}^N E_n}$ converges in $L^p$ to $1_{\bigcup_{n=1}^\infty E_n}$ as $N \to \infty$ (by the dominated convergence theorem and the finiteness of $\mu$), and thus (by continuity of $\lambda$ and finite additivity of $\nu$), $\nu$ is countably additive as well. Finally, from (11) we also see that $\nu(E) = 0$ whenever $\mu(E) = 0$, thus $\nu$ is absolutely continuous with respect to $\mu$. Applying the Radon-Nikodym theorem (Corollary 1 from Notes 1) to both the real and imaginary components of $\nu$, we conclude that $\nu(E) = \int_E \overline{g}\ d\mu$ for some $g \in L^1$; thus by (11) we have

$$\lambda(1_E) = \lambda_g(1_E) \qquad (12)$$

for all measurable E. By linearity, this implies that $\lambda$ and $\lambda_g$ agree on simple functions. Taking uniform limits (using Exercise 7) and using continuity (and the finite measure of $X$) we conclude that $\lambda$ and $\lambda_g$ agree on all bounded functions. Taking monotone limits (working on the positive and negative supports of the real and imaginary parts of g separately) we conclude that $\lambda$ and $\lambda_g$ agree on all functions in $L^p$, and in particular that $\int_X f \overline{g}\ d\mu$ is absolutely convergent for all $f \in L^p$.

To finish the theorem in this case, we need to establish that g lies in $L^{p'}$. By taking real and imaginary parts we may assume without loss of generality that g is real; by splitting into the regions where g is positive and negative we may assume that g is non-negative.

We already know that $\lambda_g$ is a continuous functional from $L^p$ to $\mathbf{C}$. By Lemma 1, this implies a bound of the form $|\lambda_g(f)| \leq C \|f\|_{L^p}$ for some $C > 0$.

Suppose first that $p > 1$. Heuristically, we would like to test this inequality with $f := g^{p'-1}$, since we formally have $\lambda_g(f) = \|g\|_{L^{p'}}^{p'}$ and $\|f\|_{L^p} = \|g\|_{L^{p'}}^{p'-1}$. (Not coincidentally, this is also the choice that would make Hölder’s inequality an equality, see Exercise 8.) Cancelling the $\|g\|_{L^{p'}}$ factors would then give the desired finiteness of $\|g\|_{L^{p'}}$.

We can’t quite make that argument work, because it is circular: it assumes $\|g\|_{L^{p'}}$ is finite in order to show that $\|g\|_{L^{p'}}$ is finite! But this can be easily remedied. We test the inequality with $f_N := \min(g,N)^{p'-1}$ for some large N; this lies in $L^p$. We have $\lambda_g(f_N) \geq \|\min(g,N)\|_{L^{p'}}^{p'}$ and $\|f_N\|_{L^p} = \|\min(g,N)\|_{L^{p'}}^{p'-1}$, and hence $\|\min(g,N)\|_{L^{p'}} \leq C$ for all N. Letting N go to infinity and using monotone convergence, we obtain the claim.
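The exponent bookkeeping behind the truncated test function can be checked numerically on a toy measure space. The sketch below assumes a four-point space with point masses `w`, a non-negative function `g`, and the exponent p = 3; these values and all function names are hypothetical choices for illustration only. On a finite set every function is trivially in every $L^{p'}$, so the example only illustrates the two relations used in the argument, not the finiteness conclusion itself.

```python
import math

def lp_norm(f, w, p):
    """L^p norm of f on a finite set whose points carry the masses w."""
    return sum(abs(t) ** p * wi for t, wi in zip(f, w)) ** (1.0 / p)

def pairing(f, g, w):
    """lambda_g(f): the integral of f * g against the weighted measure."""
    return sum(fi * gi * wi for fi, gi, wi in zip(f, g, w))

def truncated_test_function(g, q, N):
    """f_N = min(g, N)^(q - 1), where q plays the role of p'."""
    return [min(gi, N) ** (q - 1.0) for gi in g]

# Hypothetical data: four point masses and a non-negative g.
w = [0.5, 1.0, 0.25, 2.0]
g = [3.0, 0.0, 1.5, 0.2]
p = 3.0
q = p / (p - 1.0)  # dual exponent p', so that 1/p + 1/q = 1

def check(N):
    """Verify the two relations used for the test function f_N."""
    g_N = [min(gi, N) for gi in g]
    f_N = truncated_test_function(g, q, N)
    norm_gN = lp_norm(g_N, w, q)
    # lambda_g(f_N) >= ||min(g, N)||_{p'}^{p'}, because g >= min(g, N).
    lower = pairing(f_N, g, w) >= norm_gN ** q - 1e-9
    # ||f_N||_p = ||min(g, N)||_{p'}^{p'-1}, because (p'-1) p = p'.
    equal = math.isclose(lp_norm(f_N, w, p), norm_gN ** (q - 1.0))
    return lower and equal
```

Combining the two relations with the bound from Lemma 1 and dividing by $\|\min(g,N)\|_{L^{p'}}^{p'-1}$ (which is finite because of the truncation) is what yields $\|\min(g,N)\|_{L^{p'}} \leq C$.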

In the p=1 case, we instead use $f := 1_{g > N}$ as the test functions, to conclude that g is bounded almost everywhere by N as soon as $N > C$ (and hence by C); we leave the details to the reader.

This handles the case when $\mu$ is finite. When $\mu$ is $\sigma$-finite, we can write X as the union of an increasing sequence $E_n$ of sets of finite measure. On each such set, the above arguments let us write $\lambda = \lambda_{g_n}$ on $L^p(E_n)$ for some $g_n \in L^{p'}(E_n)$. The uniqueness arguments tell us that the $g_n$ are all compatible with each other, in particular if $n < m$, then $g_n$ and $g_m$ agree on $E_n$. Thus all the $g_n$ are in fact restrictions of a single function g to $E_n$. The previous arguments also tell us that the $L^{p'}$ norm of $g_n$ is bounded by the same constant C uniformly in n, so by monotone convergence, g has bounded $L^{p'}$ norm also, and we are done.

**Remark 9.** When $1 < p < \infty$, the hypothesis that $\mu$ is $\sigma$-finite can be dropped, but not when $p = 1$; see e.g. Section 6.2 of Folland for further discussion. In these lectures, though, we will be content with working in the $\sigma$-finite setting. On the other hand, the claim fails when $p = \infty$ (except when X is finite); we will see this in later lectures, when we discuss the Hahn-Banach theorem.

**Remark 10.** We have seen how the Lebesgue-Radon-Nikodym theorem can be used to establish Theorem 1. The converse is also true: Theorem 1 can be used to deduce the Lebesgue-Radon-Nikodym theorem (a fact essentially observed by von Neumann). For simplicity, let us restrict attention to the unsigned finite case, thus $\mu$ and $m$ are unsigned and finite. This implies that the sum $\mu + m$ is also unsigned and finite. We observe that the linear functional $\lambda: f \mapsto \int_X f\ d\mu$ is continuous on $L^1(\mu+m)$, hence by Theorem 1 there must exist a function $g \in L^\infty(\mu+m)$ such that

$\int_X f\ d\mu = \int_X f g\ d(\mu+m)$ (13)

for all $f \in L^1(\mu+m)$. It is easy to see that g must be real and non-negative, and also at most 1 almost everywhere. If E is the set where g=1, we see by setting $f = 1_E$ in (13) that E has m-measure zero, and so $\mu\downharpoonright_E$ is singular. Outside of E, we see from (13) and some rearrangement that

$\int_{X \backslash E} (1-g) f\ d\mu = \int_{X \backslash E} f g\ dm$ (14)

and one then easily verifies that $\mu$ agrees with $m_{\frac{g}{1-g}}$ outside of E. This gives the desired Lebesgue-Radon-Nikodym decomposition $\mu = m_{\frac{g}{1-g}} + \mu\downharpoonright_E$.
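On a finite set the construction in Remark 10 can be carried out by hand, which makes a convenient sanity check. The sketch below assumes two hypothetical measures `mu` and `m` on a four-point set, given by their point masses, with `mu[i] + m[i] > 0` at every point so that g is defined everywhere; none of these values come from the notes.

```python
import math

# Hypothetical finite measures on the four-point set {0, 1, 2, 3},
# described by their point masses; mu[i] + m[i] > 0 at every point.
mu = [1.0, 2.0, 0.0, 3.0]
m = [1.0, 0.0, 4.0, 1.0]

# g is the density of mu with respect to mu + m, as in (13).
g = [a / (a + b) for a, b in zip(mu, m)]

# E is the set where g = 1; it carries no m-mass.
E = {i for i, gi in enumerate(g) if gi == 1.0}

# Off E, the identity (14) says that mu has density g / (1 - g) against m.
density = [0.0 if i in E else g[i] / (1.0 - g[i]) for i in range(len(g))]
abs_cont_part = [density[i] * m[i] for i in range(len(g))]
singular_part = [mu[i] if i in E else 0.0 for i in range(len(g))]

def decomposition_holds():
    """Check mu = (absolutely continuous part) + (singular part) pointwise."""
    return all(
        math.isclose(mu[i], abs_cont_part[i] + singular_part[i], abs_tol=1e-12)
        for i in range(len(g))
    )
```

In this example E is the single point where m vanishes but mu does not, so the singular part is mu restricted to that point and everything else is absorbed into the density against m.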

**Remark 11.** The argument used in Remark 10 also shows that the Radon-Nikodym theorem implies the Lebesgue-Radon-Nikodym theorem.

In a later set of notes, we will give an alternate proof of Theorem 1, which relies on the geometry of $L^p$ spaces rather than on the Radon-Nikodym theorem, and can thus be viewed as giving an independent proof of that theorem.

[*Update*, Jan 10: Another exercise added.]

[*Update*, Jan 13: Lemma 1 added.]

[*Update*, Jan 14: More remarks and another exercise added; note this changes exercise numbering.]

[*Update*, Jan 15: Exercise 0 added; other exercise numbering unchanged.]

[*Update*, Jan 16: Exercise 7a added; other exercise numbering unchanged.]

## 51 comments

10 January, 2009 at 1:26 am

liuxiaochuan

Dear Professor Tao:

This post doesn’t appear in the “254B, Real Analysis” page.

[Corrected, thanks – T.]

13 January, 2009 at 1:51 am

liuxiaochuan

Dear Professor Tao:

I am considering (4) of Exercise 1. I think the scalar between f and g should be real, am I right?

[Yes – and it should be non-negative, too. Thanks – T.]

13 January, 2009 at 4:16 am

Mustafa Said

Dear Professor Tao:

In Exercise #3 I think you need where .

13 January, 2009 at 5:06 am

Matthew Folz

It is not necessary. We can get one inequality by using Hölder’s inequality to control the () norm of f by the and norms, and then letting p go to infinity (perhaps you were trying to use the estimate in Exercise 8 here instead?). For the reverse inequality, if on a set of measure , then Chebyshev’s inequality shows that the norm of f is at least . Letting p go to infinity gives the other inequality we need.

13 January, 2009 at 5:10 am

Matthew Folz

(Strictly speaking, the aforementioned inequalities are not quite ‘reverses’: one of them has a ‘liminf’ and the other has a ‘limsup’, but since the liminf of a sequence is never larger than the limsup, things work out fine.)

14 January, 2009 at 8:24 am

liuxiaochuan

Dear Professor Tao:

Here is a correction (if I am right):

In the fourth paragraph after (12), I think f should be . Then and should be and (with p modified to p’)

The same problem happens in the fifth paragraph.

14 January, 2009 at 9:10 am

Terence Tao

Thanks for the correction!

14 March, 2009 at 3:03 pm

ERIC

Dear Prof. Tao,

Is the following right? “$f$ is not in $L^\infty$” means that for every $M > 0$, there exists a set E with positive measure such that $f(x)$ is greater than M for all $x \in E$.

14 March, 2009 at 4:10 pm

Terence Tao

Dear Eric,

One needs to replace f(x) by |f(x)| (or assume that f is non-negative), but other than that, your statement is correct.

17 September, 2009 at 10:56 pm

Paul J

Hello,

I have some problems with proving the quasi-triangle inequality from Lemma 1. Could anyone give me some hint?

8 June, 2010 at 9:19 am

Kestutis Cesnavicius

A couple of typos:

1. Before “In the case…” paragraph in the proof of dual of theorem, one should have instead of .

2. In Remark 10 between the two displays should be . Also, after the last display in this remark should be .

[Corrected, thanks – T.]

12 November, 2010 at 6:28 pm

quantum probability

Oh … so THIS is why people are interested in $L^p$ norms.

Is there another post that contrasts $L^p$ spaces with other possible metrizations?

21 December, 2010 at 7:47 pm

mcknight0219

Hi, Prof Tao

In the last two lines of the proof of Proposition 2 there is a typo: ‘finit emeasure’ should be ‘finite measure’.

Qiang

[Corrected, thanks – T.]

12 March, 2011 at 6:53 pm

Anonymous

With Hölder’s inequality, one can estimate any integral of the form with assumption that and . However, if one would like to bound it from below, it seems to be unhelpful. For instance, if one wants to find , such that for any , , Hölder may not help, even just for finding a smaller group of . For a more concrete example, find all such that where . Maybe another kind of estimation is needed.

16 March, 2011 at 12:07 pm

Anonymous

In the case of , one can talk about the range and the kernel of the function . However, if , is an equivalence class instead of a “function”. Is it still possible to define the counterpart of the concepts in , say, “range” and “kernel”?

16 March, 2011 at 12:28 pm

Terence Tao

http://en.wikipedia.org/wiki/Essential_range

http://en.wikipedia.org/wiki/Support_(mathematics)

2 November, 2011 at 7:49 am

Jack

I am never confident in using the $L^p$ space, since its elements are not functions at all but “equivalence classes”. Every time I read something like ““, I convert it into . This is really strange. It seems that what we actually care about is the “property” that . The “equivalence class” is only used for forming a Banach space. Am I right?

2 November, 2011 at 8:08 am

Terence Tao

Well, yes, but most of the point of introducing L^p spaces in the first place is in order to exploit the properties of a Banach space. For instance, if one has , one would like to conclude that (because this is what normally happens in a Banach space), but because of the equivalence class in the way, one can only conclude that is equal to almost everywhere.

As mentioned in the lecture notes, L^p spaces adhere to the “Lebesgue philosophy” of analysis, in which one considers sets of measure zero to be negligible, and in particular allows functions to be uncontrolled on such sets; this is in order to take full advantage of the powerful tools of measure theory, integration theory, and function space theory. As such, analysis using Lebesgue methods (such as L^p spaces, the Lebesgue integral, etc.) tends to only give almost everywhere control of one’s functions, rather than everywhere control. If one has sufficient regularity (e.g. continuity is usually enough), one can upgrade almost everywhere control to everywhere control, but it is important to keep in mind (particularly in subjects such as ergodic theory) that this upgrade is not automatic in the absence of such regularity.

2 November, 2011 at 11:10 am

Jack

Ok. can be read as either or , where is the equivalence class. I am always wondering if it is necessary to do so.

What’s more, I don’t know how to completely accept the philosophy of Lebesgue. Because of this “almost everywhere” issue, I cannot specify the value of from an equivalence class “anywhere”! In this sense, one cannot control the function at all, instead of “almost everywhere”.

I don’t quite understand the word “control” you used in the answer, though I have seen it in lots of places. It seems to be a common jargon in mathematics. Does mean the integration of the function? But the “everywhere control” means, I think, specifying the value of the function.

2 November, 2011 at 1:40 pm

Terence Tao

The Lebesgue philosophy is analogous to the “noise-tolerant” philosophy in modern signal processing. If one is receiving a signal (e.g. a television signal) from a noisy source (e.g. a television station in the presence of electrical interference), then any individual component of that signal (e.g. a pixel of the television image) may be corrupted. But as long as the total number of corrupted data points is negligible, one can still get a good enough idea of the image to do things like distinguish foreground from background, compute the area of an object, or the mean intensity, etc. This is similar to how one can measure a set or integrate a function even in the presence of a measure zero “noise” which renders any specific point of the set or function value “unreliable”.

“Control” is a loose term which can mean exact specification of a function and its values, but can also mean estimation of the values to within some error tolerance, or some convergence rate of those values to some limit. Basically, control is the ability to acquire useful information on an object from the given hypotheses.

22 November, 2011 at 6:31 pm

Anonymous

As I understand from your comment, if “control” means exact specification of a function and its values, then in the sense of L^p space, one cannot control the function “anywhere”.

29 January, 2012 at 6:10 am

nn

Can anyone help with the second part of Exercise 7a? I want to guess what the completion would be in the case of simple functions with finite measure support.

29 January, 2012 at 6:16 am

nn

I meant Exercise 7, sorry.

30 April, 2014 at 5:05 am

J.W.

Regarding Exercise 1, part (4):

I feel in the case, we need the additional hypothesis that are non-negative a.e.

Consider the case when . Then, if I'm not doing something very stupid, provides a counterexample.

[Corrected, thanks – T.]

15 October, 2014 at 1:42 pm

Filanto

Dear Prof. Tao,

Is there some nice way of describing what $L^p$ norms really measure, i.e. what property of a function is reflected in a big $L^{100}$ norm? For the sake of simplicity let’s just assume we are considering the domain $[0,1]$ with Lebesgue measure. It seems from the definition that the higher the $p$, the more emphasis there should be on big values/spikes. Is this the right way to think about it? Could you give me some reference where I can find some discussion of this sort of question (on a heuristic/philosophical level)?

15 October, 2014 at 2:50 pm

Terence Tao

The $L^p$ norm of a function behaves like the “amplitude” of the function times the $p^{\mathrm{th}}$ root of the “width” of the function; see Example 1 in the post. In particular, as you say, a big norm can be easily caused by a narrow spike, but not easily caused by a broad flat bump, whereas for low (quasi-)norms like , the situation is reversed.

2 February, 2016 at 11:46 am

Anonymous

If one considers the vector-valued space, say where is a Banach space and , does one still have the similar results as in this note? For example, can the dual of be identified as ?

How much in general would fail?

2 February, 2016 at 1:28 pm

Terence Tao

I’ll restrict attention to separable spaces in order to not have to deal with subtleties regarding the definition of vector-valued measurability. The dual of always contains (after some canonical identifications), but can be larger if is not reflexive. Things can get rather subtle in the non-reflexive case, the main issue being whether a version of the Radon-Nikodym theorem holds for X-valued measures. One can search the literature on the “Radon-Nikodym property” (and also on duals of Bochner spaces) for more details.

22 February, 2016 at 12:18 am

Anonymous

Dear Prof. Tao,

In (3), $f^r$ should be replaced by $|f|^r$.

[Corrected, thanks – T.]

11 May, 2016 at 5:18 pm

Anonymous2

Do you have good references in mind for discussion of the space where is some bounded open subset in ?

20 May, 2016 at 4:01 pm

Anonymous

In Exercise 0 what kind of isomorphism are you referring to? Do you mean isometry of metric spaces?

20 May, 2016 at 4:33 pm

Terence Tao

Isomorphism of normed vector spaces. Thus, the isomorphism must be invertible, linear and norm-preserving.

22 May, 2016 at 6:51 pm

Anonymous

Another proof for Proposition 1 when is using the theorem that is a Banach space for any normed vector space . Are these two proofs essentially connected?

23 May, 2016 at 12:47 pm

Anonymous

In Proposition 1, when . Let

How can one show that

where ?

24 May, 2016 at 4:46 pm

Anonymous

Would you elaborate in the proof of Proposition 3, why the convexity of the exponential function on implies the convexity of the function for for any ?

24 May, 2016 at 5:21 pm

Anonymous

Convexity means

where denotes the map . Would you also elaborate why this implies (7)?

24 May, 2016 at 5:35 pm

Anonymous

I think one does not need the convexity of the map

By convexity of the exponential function,

for all . Now we can choose to get (7).