The notion of what it means for a subset E of a space X to be “small” varies from context to context.  For instance, in measure theory, when X = (X, {\mathcal X}, \mu) is a measure space, one useful notion of a “small” set is that of a null set: a set E of measure zero (or at least contained in a set of measure zero).  By countable additivity, countable unions of null sets are null.  Taking contrapositives, we obtain

Lemma 1. (Pigeonhole principle for measure spaces) Let E_1, E_2, \ldots be an at most countable sequence of measurable subsets of a measure space X.  If \bigcup_n E_n has positive measure, then at least one of the E_n has positive measure.

Now suppose that X was a Euclidean space {\Bbb R}^d with Lebesgue measure m.  The Lebesgue differentiation theorem easily implies that having positive measure is equivalent to being “dense” in certain balls:

Proposition 1. Let E be a measurable subset of {\Bbb R}^d.  Then the following are equivalent:

  1. E has positive measure.
  2. For any \varepsilon > 0, there exists a ball B such that m( E \cap B ) \geq (1-\varepsilon) m(B).

Thus one can think of a null set as a set which is “nowhere dense” in some measure-theoretic sense.

It turns out that there are analogues of these results when the measure space X = (X, {\mathcal X}, \mu)  is replaced instead by a complete metric space X = (X,d).  Here, the appropriate notion of a “small” set is not a null set, but rather that of a nowhere dense set: a set E which is not dense in any ball, or equivalently a set whose closure has empty interior.  (A good example of a nowhere dense set would be a proper subspace, or smooth submanifold, of {\Bbb R}^d, or a Cantor set; on the other hand, the rationals are a dense subset of {\Bbb R} and thus clearly not nowhere dense.)   We then have the following important result:

Theorem 1. (Baire category theorem). Let E_1, E_2, \ldots be an at most countable sequence of subsets of a complete metric space X.  If \bigcup_n E_n contains a ball B, then at least one of the E_n is dense in a sub-ball B’ of B (and in particular is not nowhere dense).  To put it in the contrapositive: the countable union of nowhere dense sets cannot contain a ball.

Exercise 1. Show that the Baire category theorem is equivalent to the claim that in a complete metric space, the countable intersection of open dense sets remains dense.  \diamond

Exercise 2. Using the Baire category theorem, show that any non-empty complete metric space without isolated points is uncountable.  (In particular, this shows that the Baire category theorem can fail for incomplete metric spaces such as the rationals {\Bbb Q}.)  \diamond

To quickly illustrate an application of the Baire category theorem, observe that it implies that one cannot cover a finite-dimensional real or complex vector space {\Bbb R}^n, {\Bbb C}^n by a countable number of proper subspaces.  One can of course also establish this fact by using Lebesgue measure on this space.  However, the advantage of the Baire category approach is that it also works well in infinite dimensional complete normed vector spaces, i.e. Banach spaces, whereas the measure-theoretic approach runs into significant difficulties in infinite dimensions.  This leads to three fundamental equivalences between the qualitative theory of continuous linear operators on Banach spaces (e.g. finiteness, surjectivity, etc.) and the quantitative theory (i.e. estimates):

  1. The uniform boundedness principle, that equates the qualitative boundedness (or convergence) of a family of continuous operators with their quantitative boundedness.
  2. The open mapping theorem, that equates the qualitative solvability of a linear problem Lu = f with the quantitative solvability.
  3. The closed graph theorem, that equates the qualitative regularity of a (weakly continuous) operator T with the quantitative regularity of that operator.

Strictly speaking, these theorems are not used much directly in practice, because one usually works in the reverse direction (i.e. first proving quantitative bounds, and then deriving qualitative corollaries); but the above three theorems help explain why we usually approach qualitative problems in functional analysis via their quantitative counterparts.

– Proof of Baire category theorem –

Assume that the Baire category theorem failed; then it would be possible to cover a ball B(x_0,r_0) in a complete metric space by a countable family E_1, E_2, E_3, \ldots of nowhere dense sets.

We now invoke the following easy observation: if E is nowhere dense, then every ball B contains a subball B’ which is disjoint from E.  Indeed, this follows immediately from the definition of a nowhere dense set.

Invoking this observation, we can find a ball B(x_1,r_1) in B(x_0,r_0/10) (say) which is disjoint from E_1; we may also assume that r_1 \leq r_0/10 by shrinking r_1 as necessary.  Then, inside B(x_1,r_1/10), we can find a ball B(x_2,r_2) which is also disjoint from E_2, with r_2 \leq r_1/10.  Continuing this process, we end up with a nested sequence of balls B(x_n,r_n), each of which is disjoint from E_1,\ldots,E_n, and such that B(x_n,r_n) \subset B(x_{n-1},r_{n-1}/10) and r_n \leq r_{n-1}/10 for all n=1,2,\ldots.

From the triangle inequality we have d(x_n,x_{n-1}) \leq 2 r_{n-1} / 10 \leq 2 \times 10^{-n} r_0, and so the sequence x_n is a Cauchy sequence.  As X is complete, x_n converges to a limit x.  Summing the geometric series, one verifies that x \in B(x_{n-1},r_{n-1}) for all n=1,2,\ldots, and in particular x is an element of B which avoids all of E_1, E_2, E_3, \ldots, a contradiction.  \Box
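The geometric series step can be spelled out as follows.  This is a sketch that uses only the bounds d(x_k,x_{k-1}) \leq 2 r_{k-1}/10 and r_k \leq r_{k-1}/10 from the construction; the index m is an auxiliary symbol introduced here.

```latex
% For m \geq n, telescope from x_{n-1} to x_m; note that r_{k-1} \leq 10^{-(k-n)} r_{n-1} for k \geq n.
\begin{align*}
d(x_m, x_{n-1}) &\leq \sum_{k=n}^{m} d(x_k, x_{k-1})
  \leq \sum_{k=n}^{m} \frac{2 r_{k-1}}{10}
  \leq \frac{2 r_{n-1}}{10} \sum_{j=0}^{\infty} 10^{-j}
  = \frac{2}{9} r_{n-1}.
\end{align*}
% Letting m \to \infty gives d(x, x_{n-1}) \leq \frac{2}{9} r_{n-1} < r_{n-1}.
```

In particular x lies in B(x_n,r_n), and hence outside E_n, for every n \geq 1, while also lying in B(x_0,r_0).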

We can illustrate the analogy between the Baire category theorem and the measure-theoretic analogs by introducing some further definitions.  Call a set E meager or of the first category if it can be expressed as (or covered by) a countable union of nowhere dense sets, and of the second category if it is not meager.  Thus, the Baire category theorem shows that any subset of a complete metric space with non-empty interior is of the second category, which may help explain the name for the property.  Call a set co-meager or residual if its complement is meager, and call a set Baire or almost open if it differs from an open set by a meager set (note that a Baire set is unrelated to the Baire \sigma-algebra).  Then we have the following analogy between complete metric space topology, and measure theory:

Complete non-empty metric space X | Measure space X of positive measure
--------------------------------- | -----------------------------------
first category (meager)           | zero measure (null)
second category                   | positive measure
residual (co-meager)              | full measure (co-null)
Baire                             | measurable

Nowhere dense sets are meager, and meager sets have empty interior.  Contrapositively, sets with dense interior are residual, and residual sets are somewhere dense.  Taking complements instead of contrapositives, we see that open dense sets are co-meager, and co-meager sets are dense.

While there are certainly many analogies between meager sets and null sets (for instance, both classes are closed under countable unions, or under intersections with arbitrary sets), the two concepts can differ in practice.  For instance, in the real line {\Bbb R} with the standard metric and measure space structures, the set

\bigcup_{n=1}^\infty (q_n - 2^{-n}, q_n + 2^{-n}), (1)

where q_1, q_2, \ldots is an enumeration of the rationals, is open and dense, but has Lebesgue measure at most 2; thus its complement has infinite measure in {\Bbb R} but is nowhere dense (hence meager).  As a variant of this, the set

\bigcap_{m=1}^\infty \bigcup_{n=1}^\infty (q_n - 2^{-n}/m, q_n + 2^{-n}/m), (2)

is a null set, but is the intersection of countably many open dense sets and is thus co-meager.
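The measure bounds behind (1) and (2) are just countable subadditivity.  As a quick check, write U_k (a name introduced here for illustration) for the open set obtained by fixing m = k in (2), so that U_1 is the set (1):

```latex
\begin{align*}
m(U_k) &\leq \sum_{n=1}^\infty m\big( (q_n - 2^{-n}/k, q_n + 2^{-n}/k) \big)
        = \sum_{n=1}^\infty \frac{2 \cdot 2^{-n}}{k} = \frac{2}{k}, \\
m\Big( \bigcap_{k=1}^\infty U_k \Big) &\leq \inf_{k \geq 1} m(U_k) \leq \inf_{k \geq 1} \frac{2}{k} = 0.
\end{align*}
```

Each U_k contains every rational and is therefore open and dense, which is what makes the intersection co-meager despite being null.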

Exercise 3. A real number x is Diophantine if for every \varepsilon > 0 there exists c_\varepsilon > 0 such that |x - \frac{a}{q}| \geq \frac{c_\varepsilon}{|q|^{2+\varepsilon}} for every rational number \frac{a}{q}.  Show that  the set of Diophantine real numbers has full measure but is meager.  \diamond

Remark 1. If one assumes some additional axioms of set theory (e.g. the continuum hypothesis), it is possible to show that the collection of meager subsets of {\Bbb R} and the collection of null subsets of {\Bbb R} (viewed as \sigma-ideals of the collection of all subsets of {\Bbb R}) are isomorphic; this is the Sierpinski-Erdös theorem, which we will not prove here.  Roughly speaking, this theorem tells us that any “effective” first-order statement which is true about meager sets will also be true about null sets, and conversely. \diamond

– The uniform boundedness principle –

As mentioned in the introduction, the Baire category theorem implies various equivalences between qualitative and quantitative properties of linear transformations between Banach spaces.  (Lemma 1 of Notes 3 already gives a prototypical such equivalence between a qualitative property (continuity) and a quantitative one (boundedness).)

Theorem 2. (Uniform boundedness principle)  Let X be a Banach space, let Y be a normed vector space, and let (T_\alpha)_{\alpha \in A} be a family of continuous linear operators T_\alpha: X \to Y.  Then the following are equivalent:

  1. (Pointwise boundedness) For every x \in X, the set \{ T_\alpha x: \alpha \in A \} is bounded.
  2. (Uniform boundedness) The operator norms \{ \|T_\alpha \|_{op}: \alpha \in A \} are bounded.

The uniform boundedness principle is also known as the Banach-Steinhaus theorem.

Proof. It is clear that 2. implies 1.; now assume 1 holds and let us obtain 2.

For each n = 1, 2, \ldots, let E_n be the set

E_n := \{ x \in X: \| T_\alpha x \|_Y \leq n \hbox{ for all } \alpha \in A \}. (3)

The hypothesis 1 is nothing more than the assertion that the E_n cover X, and thus by the Baire category theorem one of the E_n must be dense in a ball.  Since the T_\alpha are continuous, the E_n are closed, and so one of the E_n contains a ball.  Since E_n - E_n \subset E_{2n}, we see that one of the E_n contains a ball centred at the origin.  Dilating n as necessary, we see that one of the E_n contains the unit ball B(0,1).  But then all the \|T_\alpha\|_{op} are bounded by n, and the claim follows. \Box
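In more detail, the last steps of the proof can be sketched as follows, assuming that some E_n contains a ball B(x_0,r); the names x_0, r and h are introduced here for illustration.

```latex
% Step 1: for \|h\|_X < r, both x_0 + h and x_0 lie in B(x_0,r) \subset E_n, so for every \alpha \in A
\begin{align*}
\| T_\alpha h \|_Y = \| T_\alpha (x_0 + h) - T_\alpha x_0 \|_Y
  \leq \| T_\alpha (x_0+h) \|_Y + \| T_\alpha x_0 \|_Y \leq 2n,
\end{align*}
% i.e. B(0,r) \subset E_{2n}.
% Step 2 (dilation): for \|x\|_X < 1 we have r x \in B(0,r), hence
\begin{align*}
\| T_\alpha x \|_Y = \frac{1}{r} \| T_\alpha (r x) \|_Y \leq \frac{2n}{r},
\qquad \hbox{so} \qquad \sup_{\alpha \in A} \| T_\alpha \|_{op} \leq \frac{2n}{r}.
\end{align*}
```

Step 1 uses linearity and Step 2 uses homogeneity; completeness enters only through the Baire category theorem.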

Exercise 4. Give counterexamples to show that the uniform boundedness principle fails if one relaxes the assumptions in any of the following ways:

  1. X is merely a normed vector space rather than a Banach space (i.e. completeness is dropped).
  2. The T_\alpha are not assumed to be continuous.
  3. The T_\alpha are allowed to be nonlinear rather than linear.

Thus completeness, continuity, and linearity are all essential for the uniform boundedness principle to apply. \diamond

Remark 2. It is instructive to establish the uniform boundedness principle more “constructively” without the Baire category theorem (though the proof of the Baire category theorem is still implicitly present), as follows.  Suppose that 2 fails, then \|T_\alpha\|_{op} is unbounded.  We can then find a sequence \alpha_n \in A such that \| T_{\alpha_{n+1}} \|_{op} > 100^n \| T_{\alpha_n} \|_{op} (say) for all n.  We can then find unit vectors x_n such that \| T_{\alpha_n} x_n \|_Y \geq \frac{1}{2} \| T_{\alpha_n} \|_{op}.

We can then form the absolutely convergent (and hence conditionally convergent, by completeness) sum x = \sum_{n=1}^\infty \epsilon_n 10^{-n} x_n for some choice of signs \epsilon_n = \pm 1 recursively as follows: once \epsilon_1,\ldots,\epsilon_{n-1} have been chosen, choose the sign \epsilon_n so that

\|\sum_{m=1}^n \epsilon_m 10^{-m} T_{\alpha_n} x_m \|_Y \geq \| 10^{-n} T_{\alpha_n} x_n \|_Y \geq \frac{1}{2} 10^{-n} \| T_{\alpha_n} \|_{op}.  (4)

From the triangle inequality we soon conclude that

\| T_{\alpha_n} x \|_Y \geq \frac{1}{4} 10^{-n} \| T_{\alpha_n} \|_{op}. (5)

But by hypothesis, the RHS is unbounded in n, contradicting 1.  \Box
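The passage from (4) to (5) can be sketched as follows, reading the left-hand side of (4) as the norm of T_{\alpha_n} applied to the n-th partial sum of x, and using that the x_m are unit vectors.

```latex
\begin{align*}
\| T_{\alpha_n} x \|_Y
 &\geq \Big\| T_{\alpha_n} \sum_{m=1}^{n} \epsilon_m 10^{-m} x_m \Big\|_Y
      - \sum_{m=n+1}^{\infty} 10^{-m} \| T_{\alpha_n} x_m \|_Y \\
 &\geq \frac{1}{2} 10^{-n} \| T_{\alpha_n} \|_{op}
      - \frac{1}{9} 10^{-n} \| T_{\alpha_n} \|_{op}
  = \frac{7}{18} 10^{-n} \| T_{\alpha_n} \|_{op}
  \geq \frac{1}{4} 10^{-n} \| T_{\alpha_n} \|_{op}.
\end{align*}
```

The tail is controlled by \sum_{m > n} 10^{-m} = \frac{1}{9} 10^{-n}.  The rapid growth \| T_{\alpha_{n+1}} \|_{op} > 100^n \| T_{\alpha_n} \|_{op} then outpaces the factor 10^{-n}, provided the first operator T_{\alpha_1} is chosen to be non-zero.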

A common way to apply the uniform boundedness principle is via the following corollary:

Corollary 1. (Uniform boundedness principle for norm convergence)  Let X and Y be Banach spaces, and let (T_n)_{n=1}^\infty be a family of continuous linear operators T_n: X \to Y.  Then the following are equivalent:

  1. (Pointwise convergence) For every x \in X, T_n x converges strongly in Y as n \to \infty.
  2. (Pointwise convergence to a continuous limit) There exists a continuous linear T: X \to Y such that for every x \in X, T_n x converges strongly in Y to Tx as n \to \infty.
  3. (Uniform boundedness + dense subclass convergence) The operator norms \{ \|T_n\|: n = 1,2,\ldots \} are bounded, and for a dense set of x in X, T_n x converges strongly in Y as n \to \infty.

Proof. Clearly 2. implies 1., and as convergent sequences are bounded, we see from Theorem 2 that 1. implies 3.  The implication of 2 from 3 follows by a standard limiting argument and is left as an exercise.  \Box

Remark 3. The same equivalences hold if one replaces the sequence (T_n)_{n=1}^\infty by a net (T_\alpha)_{\alpha \in A}. \diamond

Example 1 (Fourier inversion formula).  For any f \in L^2({\Bbb R}) and N > 0, define the Dirichlet summation operator

S_N f(x) := \int_{-N}^N \hat f(\xi) e^{2\pi i x \xi}\ d\xi (4)

where \hat f is the Fourier transform of f, defined on smooth compactly supported functions f \in C^\infty_0({\Bbb R}) by the formula \hat f(\xi) := \int_{-\infty}^\infty f(x) e^{-2\pi i x \xi}\ dx and then extended to L^2 by the Plancherel theorem.  Using the Plancherel identity, we can verify that the operator norms \|S_N\|_{op} are uniformly bounded (indeed, they are all 1); also, one can check that for f \in C^\infty_0({\Bbb R}), S_N f converges in L^2 norm to f as N \to \infty.  As C^\infty_0({\Bbb R}) is known to be dense in L^2({\Bbb R}), this implies that S_N f converges in L^2 norm to f for every f \in L^2({\Bbb R}).
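For reference, the Plancherel computations alluded to here can be sketched as follows, using the fact that S_N acts on the Fourier side as multiplication by the indicator function of [-N,N].

```latex
\begin{align*}
\| S_N f \|_{L^2({\Bbb R})}^2 &= \int_{-N}^{N} |\hat f(\xi)|^2\ d\xi
   \leq \int_{-\infty}^\infty |\hat f(\xi)|^2\ d\xi = \| f \|_{L^2({\Bbb R})}^2, \\
\| S_N f - f \|_{L^2({\Bbb R})}^2 &= \int_{|\xi| > N} |\hat f(\xi)|^2\ d\xi.
\end{align*}
```

The first line gives \|S_N\|_{op} \leq 1, with equality attained by any non-zero f whose Fourier transform is supported in [-N,N]; the second line tends to zero as N \to \infty because \hat f is square-integrable.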

This argument only used the “easy” implication of Corollary 1, namely the deduction of 2. from 3.  The “hard” implication using the Baire category theorem was not directly utilised.  However, from a metamathematical standpoint, that implication is important because it tells us that the above strategy to prove convergence in norm of the Fourier inversion formula on L^2 – i.e. to obtain uniform bounds on the operator norms of the partial sums, and to establish convergence on a dense subclass of “nice” functions – is in some sense the only strategy available to prove such a result.  \diamond

Remark 4. There is a partial analogue of Corollary 1 for the question of pointwise almost everywhere convergence rather than norm convergence, known as Stein’s maximal principle (discussed for instance in this previous blog post of mine).  For instance, it reduces Carleson’s theorem on the pointwise almost everywhere convergence of Fourier series to the boundedness of a certain maximal function (the Carleson maximal operator) related to Fourier summation, although the latter task is again quite non-trivial.  (As in Example 1, the role of the maximal principle is meta-mathematical rather than direct.) \diamond

Of course, if we omit some of the hypotheses, it is no longer true that pointwise boundedness and uniform boundedness are the same.  For instance, if we let c_c({\Bbb N}) be the space of complex sequences with only finitely many non-zero entries and with the uniform topology, and let \lambda_n: c_c({\Bbb N}) \to {\Bbb C} be the map (a_m)_{m=1}^\infty \mapsto n a_n, then the \lambda_n are pointwise bounded but not uniformly bounded; thus completeness of X is important.  Also, even in the one-dimensional case X=Y={\Bbb R}, the uniform boundedness principle can easily be seen to fail if the T_\alpha are non-linear transformations rather than linear ones. \diamond
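To make the first counterexample explicit, write e_n for the sequence with a 1 in the n-th position and 0 elsewhere, and let M be an index beyond which a given finitely supported sequence a = (a_m)_{m=1}^\infty vanishes (both e_n and M are notation introduced here).

```latex
\begin{align*}
\sup_{n} |\lambda_n(a)| &= \sup_{n \leq M} n |a_n| \leq M \sup_m |a_m| < \infty
   \qquad \hbox{(pointwise bounded)}, \\
\| \lambda_n \|_{op} &\geq |\lambda_n(e_n)| = n \to \infty
   \qquad \hbox{(not uniformly bounded)}.
\end{align*}
```

The pointwise bound depends on a through M, and this dependence cannot be removed.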

– The open mapping theorem –

A map f: X \to Y between topological spaces X and Y is said to be open if it maps open sets to open sets.  This is similar to, but slightly different from, the more familiar property of being continuous, which is equivalent to the inverse image of open sets being open.  For instance, the map f: {\Bbb R} \to {\Bbb R} defined by f(x) := x^2 is continuous but not open; conversely, the function g: {\Bbb R}^2 \to {\Bbb R} defined by g(x,y) := \hbox{sgn}(y)+x is discontinuous but open.
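Both claims are quick to check.  In the sketch below, U, (x_0,y_0) and \delta are auxiliary names introduced here, and \hbox{sgn}(0) is taken to be 0.

```latex
% f(x) = x^2 is not open: the image of the open interval (-1,1) is not open.
\[ f( (-1,1) ) = [0,1). \]
% g is open: if U \subset {\Bbb R}^2 is open and (x_0,y_0) \in U, pick \delta > 0 with
% (x_0-\delta, x_0+\delta) \times \{y_0\} \subset U; then
\[ g(U) \supset ( g(x_0,y_0) - \delta, g(x_0,y_0) + \delta ), \]
% so g(U) is a union of open intervals.  But g is discontinuous, e.g.
\[ \lim_{y \to 0^+} g(0,y) = 1 \neq 0 = g(0,0). \]
```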

We have seen that it is quite possible for non-linear continuous maps to fail to be open.  But for linear maps between Banach spaces, the situation is much better:

Theorem 3. (Open mapping theorem)  Let L: X \to Y be a continuous linear transformation between two Banach spaces X and Y.  Then the following are equivalent:

  1. L is surjective.
  2. L is open.
  3. (Qualitative solvability) For every f \in Y there exists a solution u \in X to the equation Lu = f.
  4. (Quantitative solvability) There exists a constant C > 0 such that for every f \in Y there exists a solution u \in X to the equation Lu = f, which obeys the bound \|u\|_X \leq C \|f\|_Y.
  5. (Quantitative solvability for a dense subclass) There exists a constant C > 0 such that for a dense set of f in Y, there exists a solution u \in X to the equation Lu = f, which obeys the bound \|u\|_X \leq C \|f\|_Y.

Proof. Clearly 4. implies 3., which is equivalent to 1., and it is easy to see from linearity that 2. and 4. are equivalent (cf. the proof of Lemma 1 from Notes 3).  4. trivially implies 5., while to obtain 4. from 5., observe that if E is any dense subset of the Banach space Y, then any f in Y can be expressed as an absolutely convergent series f = \sum_n f_n of elements in E (since one can iteratively approximate the residual f - \sum_{n=1}^{N-1} f_n to arbitrary accuracy by an element of E for N=1,2,3,\ldots), and the claim easily follows.  So it suffices to show that 3. implies 4.
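The approximation step in the deduction of 4. from 5. can be sketched as follows for f \neq 0.  Here E is the dense set from 5., u_N denotes a solution to L u_N = f_N with \|u_N\|_X \leq C \|f_N\|_Y, and the tolerances 2^{-N} \|f\|_Y are one convenient choice rather than anything forced by the argument.

```latex
% Choose f_N \in E recursively so that the residual halves at each step:
\begin{align*}
\Big\| f - \sum_{n=1}^{N} f_n \Big\|_Y &\leq 2^{-N} \|f\|_Y, \\
\| f_N \|_Y &\leq \Big\| f - \sum_{n=1}^{N-1} f_n \Big\|_Y + \Big\| f - \sum_{n=1}^{N} f_n \Big\|_Y
   \leq 2^{-(N-1)} \|f\|_Y + 2^{-N} \|f\|_Y = 3 \cdot 2^{-N} \|f\|_Y, \\
\sum_{N=1}^\infty \| u_N \|_X &\leq C \sum_{N=1}^\infty \| f_N \|_Y \leq 3 C \|f\|_Y.
\end{align*}
```

Thus u := \sum_N u_N converges in X by completeness, Lu = f by continuity of L, and 4. holds with C replaced by 3C.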

For each n, let E_n \subset Y be the set of all f \in Y for which there exists a solution to Lu=f with \|u\|_X \leq n \|f\|_Y.  From the hypothesis 3, we see that \bigcup_n E_n = Y.  Since Y is complete, the Baire category theorem implies that there is some E_n which is dense in some ball B(f_0,r) in Y.  In other words, the problem Lu=f is approximately quantitatively solvable in the ball B(f_0,r) in the sense that

  • For every \varepsilon > 0 and every f \in B(f_0,r), there exists an approximate solution u with \| Lu - f \|_Y \leq \varepsilon and \|u\|_X \leq n \|Lu \|_Y, and thus \|u\|_X \leq n R + n \varepsilon, where R := \|f_0\|_Y + r (since \|Lu\|_Y \leq \|f\|_Y + \varepsilon \leq R + \varepsilon).

By subtracting two such approximate solutions (writing f \in B(0,2r) as the difference of f_0 + f/2 and f_0 - f/2, both of which lie in B(f_0,r)), we conclude that

  • For any f \in B(0,2r) and any \varepsilon > 0, there exists u \in X with \|Lu - f \|_Y \leq 2\varepsilon and \|u\|_X \leq 2nR + 2 n \varepsilon.

Since L is homogeneous, we can rescale and conclude that

  • For any f \in Y and any \varepsilon > 0 there exists u \in X with \|Lu - f \|_Y \leq 2 \varepsilon and \|u\|_X \leq 2n \frac{R}{r} \|f\|_Y + 2n \varepsilon.

In particular, setting \varepsilon = \frac{1}{4} \|f\|_Y (treating the case f=0 separately), we conclude that

  • For any f \in Y, we may write f = Lu + f', where \| f'\|_Y \leq \frac{1}{2} \|f\|_Y and \|u\|_X \leq (2n \frac{R}{r} + \frac{n}{2}) \|f\|_Y.

We can iterate this procedure and then take limits (now using the completeness of X rather than Y) to obtain a solution to Lu=f for every f \in Y with \|u\|_X \leq (4n \frac{R}{r} + n) \|f\|_Y, and the claim follows. \Box
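The iteration can be sketched as follows.  Write K for the constant in the preceding bound of the form \|u\|_X \leq K \|f\|_Y, and set g_0 := f (the names K, g_k and u_k are introduced here); applying the preceding step to g_{k-1} produces u_k and g_k with g_{k-1} = L u_k + g_k.

```latex
\begin{align*}
\| g_k \|_Y &\leq \frac{1}{2} \| g_{k-1} \|_Y \leq 2^{-k} \|f\|_Y, \\
\| u_k \|_X &\leq K \| g_{k-1} \|_Y \leq K 2^{-(k-1)} \|f\|_Y, \\
\sum_{k=1}^\infty \| u_k \|_X &\leq 2 K \|f\|_Y, \qquad
f - L \sum_{k=1}^{N} u_k = g_N \to 0 \hbox{ as } N \to \infty.
\end{align*}
```

So u := \sum_k u_k converges in X, solves Lu = f by continuity of L, and obeys \|u\|_X \leq 2K \|f\|_Y, which is where the final doubling of the constant comes from.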

Remark 5. The open mapping theorem provides metamathematical justification for the method of a priori estimates for solving linear equations such as Lu = f for a given datum f \in Y and for an unknown u \in X, which is of course a familiar problem in linear PDE.  The a priori method assumes that f is in some dense class of nice functions (e.g. smooth functions) in which solvability of Lu=f is presumably easy, and then proceeds to obtain the a priori estimate \|u\|_X \leq C \|f\|_Y for some constant C.  Theorem 3 then assures that Lu=f is solvable for all f in Y (with a similar bound).  As before, this implication does not directly use the Baire category theorem, but that theorem helps explain why this method is “not wasteful”.  \diamond

A pleasant corollary of the open mapping theorem is that, as with ordinary linear algebra or with arbitrary functions, invertibility is the same thing as bijectivity:

Corollary 2. Let T: X \to Y be a continuous linear operator between two Banach spaces X, Y.  Then the following are equivalent:

  1. (Qualitative invertibility) T is bijective.
  2. (Quantitative invertibility) T is bijective, and T^{-1}: Y \to X is a continuous (hence bounded) linear transformation.

Remark 6. The claim fails without the completeness hypotheses on X and Y.  For instance, consider the operator T: c_c({\Bbb N}) \to c_c({\Bbb N}) defined by T (a_n)_{n=1}^\infty := (\frac{a_n}{n})_{n=1}^\infty, where we give c_c({\Bbb N}) the uniform norm.  Then T is continuous and bijective, but T^{-1} is unbounded. \diamond
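The two assertions in Remark 6 can be verified directly; here e_n denotes the sequence with a 1 in the n-th position and 0 elsewhere (notation introduced for this check).

```latex
\begin{align*}
\| T a \|_{\ell^\infty} &= \sup_n \frac{|a_n|}{n} \leq \sup_n |a_n| = \| a \|_{\ell^\infty}, \\
T^{-1} e_n &= n e_n, \qquad \hbox{so} \qquad
\frac{\| T^{-1} e_n \|_{\ell^\infty}}{\| e_n \|_{\ell^\infty}} = n \hbox{ for every } n.
\end{align*}
```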

Exercise 5. Show that Corollary 2 can still fail if we drop the completeness hypothesis on just X, or just Y. \diamond

Exercise 6. Suppose that L: X \to Y is a surjective continuous linear transformation between Banach spaces.  By combining the open mapping theorem with the Hahn-Banach theorem, show that the transpose map L^*: Y^* \to X^* is bounded from below, i.e. there exists c > 0 such that \| L^* \lambda \|_{X^*} \geq c \|\lambda \|_{Y^*} for all \lambda \in Y^*.  Conclude that L^* is an isomorphism between Y^* and L^*(Y^*).  \diamond

Let L be as in Theorem 3, so that the problem Lu=f is both qualitatively and quantitatively solvable.  A standard application of Zorn’s lemma (similar to that used to prove the Hahn-Banach theorem) shows that the problem Lu=f is also qualitatively linearly solvable, in the sense that there exists a linear transformation S: Y \to X such that LSf = f for all f \in Y (i.e. S is a right-inverse of L).  In view of the open mapping theorem, it is then tempting to conjecture that L must also be quantitatively linearly solvable, in the sense that there exists a continuous linear transformation S: Y \to X such that LSf = f for all f \in Y.  By Corollary 2, we see that this conjecture is true when the problem Lu=f is determined, i.e. there is exactly one solution u for each datum f.  Unfortunately, the conjecture can fail when Lu=f is underdetermined (more than one solution u for each f); we discuss this in the appendix to these notes.  On the other hand, the situation is much better for Hilbert spaces:

Exercise 7. Suppose that L: H \to H' is a surjective continuous linear transformation between Hilbert spaces.  Show that there exists a continuous linear transformation S: H' \to H such that LS = I.  Furthermore, we can ensure that the range of S is orthogonal to the kernel of L, and that this condition determines S uniquely. \diamond

Remark 7. In fact, Hilbert spaces are essentially the only type of Banach space for which we have this nice property, due to the Lindenstrauss-Tzafriri solution of the complemented subspaces problem. \diamond

Exercise 8. Let M and N be closed subspaces of a Banach space X.  Show that the following statements are equivalent:

  1. (Qualitative complementation) Every x in X can be expressed in the form m+n for m \in M, n \in N in exactly one way.
  2. (Quantitative complementation)  Every x in X can be expressed in the form m+n for m \in M, n \in N in exactly one way.  Furthermore there exists C > 0 such that \|m\|_X, \|n\|_X \leq C \|x\|_X for all x.

When either of these two properties holds, we say that M (or N) is a complemented subspace, and that N is a complement of M (or vice versa).  \diamond

The property of being complemented is closely related to that of quantitative linear solvability:

Exercise 9. Let L: X \to Y be a surjective bounded linear map between Banach spaces.  Show that there exists a bounded linear map S: Y \to X such that LSf = f for all f \in Y if and only if the kernel \{ u \in X: Lu=0\} is a complemented subspace of X.  \diamond

Exercise 10. Show that any finite-dimensional or closed finite co-dimensional subspace of a Banach space is complemented.  \diamond

Remark 8. The problem of determining whether a given closed subspace of a Banach space is complemented or not is, in general, quite difficult.  However, non-complemented subspaces do exist in abundance; some examples are given in the appendix, and the Lindenstrauss-Tzafriri theorem referred to in Remark 7 asserts that any Banach space not isomorphic to a Hilbert space contains at least one non-complemented subspace.  There is also a remarkable construction of Gowers and Maurey of a Banach space such that every subspace, other than those ruled out by Exercise 10, is uncomplemented.  \diamond

– The closed graph theorem –

Recall that a map T: X \to Y between two metric spaces is continuous if and only if, whenever x_n converges to x in X, Tx_n converges to Tx in Y.  We can also define the weaker property of being closed: a map T: X \to Y is closed if and only if whenever x_n converges to x in X, and Tx_n converges to a limit y in Y, then y is equal to Tx; equivalently, T is closed if its graph \{ (x,Tx): x \in X \} is a closed subset of X \times Y.  This is weaker than continuity because it has the additional requirement that the sequence Tx_n is already convergent. (Despite the name, closed operators are not directly related to open operators.)

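Example 2. Let T: c_c({\Bbb N}) \to c_c({\Bbb N}) be the transformation T( a_m )_{m=1}^\infty := (ma_m)_{m=1}^\infty, where c_c({\Bbb N}) is the space of sequences with only finitely many non-zero entries, equipped with the uniform norm.  This transformation is unbounded and hence discontinuous, but one easily verifies that it is closed.  \diamond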

As Example 2 shows, being closed is often a weaker property than being continuous.  However, the remarkable closed graph theorem shows that as long as the domain and range of the operator are both Banach spaces, the two statements are equivalent:

Theorem 4. (Closed graph theorem)  Let T: X \to Y be a linear transformation between two Banach spaces.  Then the following are equivalent:

  1. T is continuous.
  2. T is closed.
  3. (Weak continuity) There exists some topology {\mathcal F} on Y, weaker than the norm topology (i.e. containing fewer open sets) but still Hausdorff, for which T: X \to (Y, {\mathcal F}) is continuous.

Proof. It is clear that 1 implies 3 (just take {\mathcal F} to equal the norm topology).  To see why 3 implies 2, observe that if x_n \to x in X and Tx_n \to y in norm, then Tx_n \to y in the weaker topology {\mathcal F} as well; but by weak continuity Tx_n \to Tx in {\mathcal F}.  Since Hausdorff topological spaces have unique limits, we have Tx=y and so T is closed.

Now we show that 2 implies 1.  If T is closed, then the graph \Gamma := \{ (x,Tx): x \in X \} is a closed linear subspace of the Banach space X \times Y and is thus also a Banach space.  On the other hand, the projection map \pi: (x,Tx) \mapsto x from \Gamma to X is clearly a continuous linear bijection.  By Corollary 2, its inverse x \mapsto (x,Tx) is also continuous, and so T is continuous as desired. \Box
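Quantitatively, the last step reads as follows, assuming that X \times Y is given the norm \|(x,y)\|_{X \times Y} := \|x\|_X + \|y\|_Y (one of several equivalent choices; the notes do not fix one) and writing C for the operator norm of \pi^{-1}.

```latex
\begin{align*}
\|x\|_X + \|Tx\|_Y = \| \pi^{-1} x \|_{X \times Y} \leq C \|x\|_X
\qquad \hbox{and hence} \qquad
\|Tx\|_Y \leq (C-1) \|x\|_X.
\end{align*}
```

With this norm, the continuity of \pi is just the trivial bound \|x\|_X \leq \|x\|_X + \|Tx\|_Y.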

We can reformulate the closed graph theorem in the following fashion:

Corollary 3. Let X, Y be Banach spaces, and suppose we have some continuous inclusion Y \subset Z of Y into a Hausdorff topological vector space Z.  Let T: X \to Z be a continuous linear transformation.  Then the following are equivalent.

  1. (Qualitative regularity) For all x \in X, Tx \in Y.
  2. (Quantitative regularity) For all x \in X, Tx \in Y, and furthermore \|Tx\|_Y \leq C \|x\|_X for some C > 0 independent of x.
  3. (Quantitative regularity on a dense subclass) For all x in a dense subset of X, Tx \in Y, and furthermore \|Tx\|_Y \leq C \|x\|_X for some C > 0 independent of x.

Proof. Clearly 2. implies 3. or 1.  If we have 3., then T extends uniquely to a bounded linear map from X to Y, which must agree with the original continuous map from X to Z since limits in the Hausdorff space Z are unique, and so 3. implies 2.  Finally, if 1. holds, then we can view T as a map from X to Y, which by Theorem 4 is continuous, and the claim now follows from Lemma 1 from Notes 3. \Box

In practice, one should think of Z as some sort of “low regularity” space with a weak topology, and Y as a “high regularity” subspace with a stronger topology.  Corollary 3 motivates the method of a priori estimates to establish the Y-regularity of some linear transform Tx of an arbitrary element x in a Banach space X, by first establishing the a priori estimate \|Tx\|_Y \leq C \|x\|_X for a dense subclass of “nice” elements of X, and then using the above corollary (and some weak continuity of T in a low regularity space) to conclude.  The closed graph theorem provides the metamathematical explanation as to why this approach is at least as powerful as any other approach to proving regularity.

Example 3. Let 1 \leq p \leq 2, and let p’ be the dual exponent of p.  To prove that the Fourier transform \hat f of a function f \in L^p({\Bbb R}) necessarily lies in L^{p'}({\Bbb R}), it suffices to prove the Hausdorff-Young inequality

\| \hat f \|_{L^{p'}({\Bbb R})} \leq C_p \|f\|_{L^p({\Bbb R})} (5)

for some constant C_p and all f in some suitable dense subclass of L^p({\Bbb R}) (e.g. the space C^\infty_0({\Bbb R}) of smooth functions of compact support), together with the “soft” observation that the Fourier transform is continuous from L^p({\Bbb R}) to the space of tempered distributions, which is a Hausdorff space into which L^{p'}({\Bbb R}) embeds continuously.  One can replace the Hausdorff-Young inequality here by countless other estimates in harmonic analysis to obtain similar qualitative regularity conclusions. \diamond

– Appendix: Nonlinear solvability (optional) –

In this appendix we give an example of a linear equation Lu=f which can only be quantitatively solved in a nonlinear fashion.  We will use a number of basic tools which we will only cover later in this course, and so this material is optional reading.

Let X = \{0,1\}^{\Bbb N} be the infinite discrete cube with the product topology; by Tychonoff’s theorem, this is a compact Hausdorff space.  The Borel \sigma-algebra is generated by the cylinder sets

E_n := \{ (x_m)_{m=1}^\infty \in \{0,1\}^{\Bbb N}: x_n = 1 \}. (6)

(From a probabilistic view point, one can think of X as the event space for flipping a countably infinite number of coins, and E_n as the event that the n^{th} coin lands as heads.)

Let M(X) be the space of finite Borel measures on X; this can be verified to be a Banach space.  There is a map L: M(X) \to \ell^\infty({\Bbb N}) defined by

L( \mu ) := ( \mu(E_n) )_{n=1}^\infty. (7)

This is a continuous linear transformation.  The equation Lu=f is quantitatively solvable for every f \in \ell^\infty({\Bbb N}).  Indeed, if f is an indicator function f = 1_A, then f = L \delta_{x_A}, where x_A \in \{0,1\}^{\Bbb N} is the sequence that equals 1 on A and 0 outside of A, and \delta_{x_A} is the Dirac mass at x_A.  The general case then follows by expressing a bounded sequence as an integral of indicator functions (e.g. if f takes values in [0,1], we can write f = \int_0^1 1_{\{f > t\}}\ dt).  Note however that this is a nonlinear operation, since the indicator 1_{\{f>t\}} depends nonlinearly on f.
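Concretely, for f taking values in [0,1] the construction can be sketched as follows.  The symbol \mu_f is introduced here, the norm on M(X) is assumed to be the total variation norm, and the integrand is measurable in t because each coordinate t \mapsto 1_{\{f(n) > t\}} of x_{\{f>t\}} is.

```latex
\begin{align*}
\mu_f(E) &:= \int_0^1 \delta_{x_{\{f > t\}}}(E)\ dt = \int_0^1 1_E( x_{\{f>t\}} )\ dt, \\
(L \mu_f)_n = \mu_f(E_n) &= \int_0^1 1_{\{ f(n) > t \}}\ dt = f(n), \\
\| \mu_f \|_{M(X)} &= \mu_f(X) = 1.
\end{align*}
```

The second line uses that x_{\{f>t\}} lies in E_n exactly when f(n) > t.  General bounded f would then be handled by rescaling and by splitting into positive and negative (and real and imaginary) parts.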

We now claim that the equation Lu=f is not quantitatively linearly solvable, i.e. there is no bounded linear map S: \ell^\infty({\Bbb N}) \to M(X) such that LSf = f for all f \in \ell^\infty({\Bbb N}).  This fact was first observed by Banach and Mazur; we shall give two proofs, one of a “soft analysis” flavour and one of a “hard analysis” flavour.

We begin with the “soft analysis” proof, starting with a measure-theoretic result which is of independent interest.

Theorem 5. (Nikodym convergence theorem) Let (X, {\mathcal B}) be a measurable space, and let \sigma_n: {\mathcal B} \to {\Bbb R} be a sequence of signed finite measures which is weakly convergent in the sense that \sigma_n(E) converges to some limit \sigma(E) for each E \in {\mathcal B}.  Then:

  1. The \sigma_n are uniformly countably additive, which means that for any sequence E_1, E_2, \ldots of disjoint measurable sets, the series \sum_{m=1}^\infty |\sigma_n(E_m)| converges uniformly in n.
  2. \sigma is a signed finite measure.

Proof. It suffices to prove the first part, since this easily implies that \sigma is also countably additive, and is hence a signed finite measure.  Suppose for contradiction that the claim failed.  Then one could find disjoint E_1, E_2, \ldots and \varepsilon > 0 such that one has \limsup_{n \to \infty} \sum_{m=M}^\infty |\sigma_n(E_m)| > \varepsilon for all M.  We now construct disjoint sets A_1, A_2, \ldots, each consisting of the union of a finite collection of the E_j, and an increasing sequence n_1, n_2, \ldots of positive integers, by the following recursive procedure:

  1. Initialise k=0.
  2. Suppose recursively that n_1 < \ldots < n_{2k} and A_1,\ldots,A_k have already been constructed for some k \geq 0.
  3. Choose n_{2k+1} > n_{2k} so large that for all n \geq n_{2k+1}, \sigma_n(A_1 \cup \ldots \cup A_k) differs from \sigma(A_1 \cup \ldots \cup A_k) by at most \varepsilon/10.
  4. Choose M_k so large that M_k is larger than j for any E_j \subset A_1 \cup \ldots \cup A_k, and such that \sum_{m=M_k}^\infty |\sigma_{n_j}(E_m)| \leq \varepsilon / 100^{k+1} for all 1 \leq j \leq 2k+1.
  5. Choose n_{2k+2} > n_{2k+1} so that \sum_{m=M_k}^\infty |\sigma_{n_{2k+2}}(E_m)| > \varepsilon.
  6. Pick A_{k+1} to be a finite union of the E_j with j \geq M_k such that |\sigma_{n_{2k+2}}(A_{k+1})| > \varepsilon/2.
  7. Increment k to k+1 and then return to Step 2.

It is then a routine matter to show that if A := \bigcup_{j=1}^\infty A_j, then |\sigma_{n_{2k+2}}(A) - \sigma_{n_{2k+1}}(A)| \geq \varepsilon/10 for all k, contradicting the hypothesis that \sigma_n is weakly convergent to \sigma. \Box
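For the reader who wants to see the routine verification, here is one way it might be organised.  The shorthand B_k and T_k is not part of the proof above and is introduced only for this sketch; each bound uses the countable additivity of the individual measures \sigma_n together with the indicated step of the construction.

```latex
% Hypothetical shorthand (not in the text above):
%   B_k := A_1 \cup \ldots \cup A_k,   T_k := \bigcup_{j \geq k+2} A_j,
% so that A = B_k \cup A_{k+1} \cup T_k is a disjoint union.
\begin{align*}
|\sigma_{n_{2k+2}}(A_{k+1})| &> \varepsilon/2
  && \text{(Step 6)} \\
|\sigma_{n_{2k+2}}(B_k) - \sigma_{n_{2k+1}}(B_k)| &\leq 2\varepsilon/10
  && \text{(Step 3, for } n = n_{2k+1} \text{ and } n = n_{2k+2}) \\
|\sigma_{n_{2k+1}}(A_{k+1} \cup T_k)|
  &\leq \sum_{m \geq M_k} |\sigma_{n_{2k+1}}(E_m)| \leq \varepsilon/100^{k+1}
  && \text{(Step 4 at stage } k) \\
|\sigma_{n_{2k+2}}(T_k)|
  &\leq \sum_{m \geq M_{k+1}} |\sigma_{n_{2k+2}}(E_m)| \leq \varepsilon/100^{k+2}
  && \text{(Step 4 at stage } k+1) \\
|\sigma_{n_{2k+2}}(A) - \sigma_{n_{2k+1}}(A)|
  &\geq \frac{\varepsilon}{2} - \frac{\varepsilon}{5}
      - \frac{\varepsilon}{100^{k+1}} - \frac{\varepsilon}{100^{k+2}}
   > \frac{\varepsilon}{10}.
\end{align*}
```

The last line follows from the triangle inequality after splitting \sigma_{n_{2k+2}}(A) = \sigma_{n_{2k+2}}(B_k) + \sigma_{n_{2k+2}}(A_{k+1}) + \sigma_{n_{2k+2}}(T_k) and \sigma_{n_{2k+1}}(A) = \sigma_{n_{2k+1}}(B_k) + \sigma_{n_{2k+1}}(A_{k+1} \cup T_k).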

Exercise 11. (Schur’s property for \ell^1)  Show that if a sequence in \ell^1({\Bbb N}) is convergent in the weak topology, then it is convergent in the strong topology.  \diamond

We return now to the map S: \ell^\infty({\Bbb N}) \to M(X).  Consider the sequence a_n \in c_0({\Bbb N}) \subset \ell^\infty defined by a_n := (1_{m \leq n})_{m=1}^\infty, i.e. each a_n is the sequence consisting of n ones followed by infinitely many zeros.   As the dual of c_0({\Bbb N}) is isomorphic to \ell^1({\Bbb N}), we see from the dominated convergence theorem that a_n is a weakly Cauchy sequence in c_0({\Bbb N}), in the sense that \lambda(a_n) is Cauchy for any \lambda \in c_0({\Bbb N})^*.  Applying S, we conclude that S(a_n) is weakly Cauchy in M(X).  In particular, using the bounded linear functionals \mu \mapsto \mu(E) on M(X), we see that S(a_n)(E) converges to some limit \mu(E) for all measurable sets E.  Applying the Nikodym convergence theorem we see that \mu is also a signed finite measure.  We then see that S(a_n) converges in the weak topology to \mu.  (One way to see this is to define \nu := \sum_{n=1}^\infty 2^{-n} |S(a_n)| + |\mu|; then \nu is finite and S(a_n), \mu are all absolutely continuous with respect to \nu.  Now use the Radon-Nikodym theorem (see Notes 1) and the fact that L^1(\nu)^* \equiv L^\infty(\nu).)  On the other hand, as LS=I and L and S are both bounded, S is a Banach space isomorphism between c_0 and S(c_0).  Thus S(c_0) is complete, hence closed, hence weakly closed (by Hahn-Banach), and so \mu = S(a) for some a \in c_0.  By Hahn-Banach again, this implies that a_n converges weakly to a \in c_0.  But this is easily seen to be impossible: testing against the coordinate functionals would force a to be the constant sequence (1)_{m=1}^\infty, which does not lie in c_0.  The claim follows.

Now we give the “hard analysis” proof.  Let e_1, e_2, \ldots be the standard basis for \ell^\infty({\Bbb N}), let N be a large number, and consider the random sums

S( \varepsilon_1 e_1 + \ldots + \varepsilon_N e_N ) (8)

where \varepsilon_n \in \{-1,1\} are iid random signs.  Since the \ell^\infty norm of \varepsilon_1 e_1 + \ldots + \varepsilon_N e_N is 1, we have

\| S( \varepsilon_1 e_1 + \ldots + \varepsilon_N e_N ) \|_{M(X)} \leq C (9)

for some constant C independent of N.  On the other hand, we can write S(e_n) = f_n \nu for some finite measure \nu and some f_n \in L^1(\nu) using Radon-Nikodym as in the previous proof, and then

\| \varepsilon_1 f_1 + \ldots + \varepsilon_N f_N \|_{L^1(\nu)} \leq C. (10)

Taking expectations and applying Khintchine’s inequality we conclude

\| (\sum_{n=1}^N |f_n|^2)^{1/2} \|_{L^1(\nu)} \leq C' (11)

for some constant C' independent of N.  By Cauchy-Schwarz this implies that

\| \sum_{n=1}^N |f_n| \|_{L^1(\nu)} \leq C' \sqrt{N} (12)

But as \|f_n\|_{L^1(\nu)} = \|S(e_n)\|_{M(X)} \geq c for some constant c > 0 independent of N (because L S e_n = e_n and L is bounded), the left-hand side of (12) is at least cN.  We thus obtain a contradiction for N large enough, and the claim follows.
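To get a feel for the scalar inequality that drives (11), here is a small Python sketch that computes {\Bbb E} |\varepsilon_1 a_1 + \ldots + \varepsilon_N a_N| exactly by enumerating all sign patterns and compares it with the \ell^2 norm of the coefficients.  The function names are hypothetical.  The lower constant 1/\sqrt{2} used in the check is the known sharp constant in the L^1 Khintchine inequality (due to Szarek), which is an outside fact not needed in the argument above.

```python
import itertools
import math

def mean_abs_random_sum(a):
    """Exact value of E|eps_1 a_1 + ... + eps_N a_N| over all 2^N sign choices."""
    n = len(a)
    total = 0.0
    for signs in itertools.product((-1, 1), repeat=n):
        total += abs(sum(s * x for s, x in zip(signs, a)))
    return total / 2 ** n

def l2_norm(a):
    """Euclidean norm of the coefficient vector."""
    return math.sqrt(sum(x * x for x in a))

def khintchine_holds(a, tol=1e-12):
    """Check l2_norm(a)/sqrt(2) <= E|sum eps_n a_n| <= l2_norm(a)."""
    m, q = mean_abs_random_sum(a), l2_norm(a)
    return q / math.sqrt(2) - tol <= m <= q + tol

# With all coefficients equal to 1, the mean is comparable to sqrt(N),
# far below the "no cancellation" value N = sum of |a_n|.
ones = [1.0] * 10
```

In the proof, this inequality is applied pointwise with a_n = f_n(x) and then integrated against \nu; the gap between \sqrt{N} and N is exactly what produces the contradiction.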

Remark 9. The phenomenon of nonlinear quantitative solvability actually comes up in many applications of interest.  For instance, consider the Fefferman-Stein decomposition theorem, which asserts that any f \in BMO({\Bbb R}) of bounded mean oscillation can be decomposed as f = g + Hh for some g, h \in L^\infty({\Bbb R}), where H is the Hilbert transform.  This theorem was first proven by using the duality of the Hardy space H^1({\Bbb R}) and BMO (and by using Exercise 13 from Notes 6), and by using the fact that a function f is in H^1({\Bbb R}) if and only if f and Hf both lie in L^1({\Bbb R}).  From the open mapping theorem we know that we can pick g, h so that the L^\infty norms of g, h are bounded by a multiple of the BMO norm of f.  But it turns out not to be possible to pick g and h in a bounded linear manner in terms of f, although this is a little tricky to prove.  (Uchiyama famously gave an explicit construction of g, h in terms of f, but the construction was highly nonlinear; see my blog post on the topic.)

An example in a similar spirit was given more recently by Bourgain and Brezis, who considered the problem of solving the equation \hbox{div} u = f on the d-dimensional torus {\Bbb T}^d for some function f: {\Bbb T}^d \to {\Bbb C} on the torus with mean zero, and with some unknown vector field u: {\Bbb T}^d \to {\Bbb C}^d, where the derivatives are interpreted in the weak sense.  They showed that if d \geq 2 and f \in L^d({\Bbb T}^d), then there existed a solution u to this problem with u \in W^{1,d} \cap C^0, despite the failure of Sobolev embedding at this endpoint.  Again, the open mapping theorem allows one to choose u with norm bounded by a multiple of the norm of f, but Bourgain and Brezis also show that one cannot select u in a bounded linear fashion depending on f.  \diamond

Question. All of the above constructions of non-complemented closed subspaces, or of linear problems that can only be quantitatively solved nonlinearly, were quite involved.  Is there a “soft” or “elementary” way to see that closed subspaces of Banach spaces exist which are not complemented, or (equivalently) that surjective continuous linear maps between Banach spaces do not always enjoy a continuous linear right-inverse?  I do not have a good answer to this question. \diamond

[Update, Feb 4: definition of “residual” corrected.]