The notion of what it means for a subset E of a space X to be “small” varies from context to context.  For instance, in measure theory, when $X = (X, {\mathcal X}, \mu)$ is a measure space, one useful notion of a “small” set is that of a null set: a set E of measure zero (or at least contained in a set of measure zero).  By countable additivity, countable unions of null sets are null.  Taking contrapositives, we obtain

Lemma 1. (Pigeonhole principle for measure spaces) Let $E_1, E_2, \ldots$ be an at most countable sequence of measurable subsets of a measure space X.  If $\bigcup_n E_n$ has positive measure, then at least one of the $E_n$ has positive measure.

Now suppose that X was a Euclidean space ${\Bbb R}^d$ with Lebesgue measure m.  The Lebesgue differentiation theorem easily implies that having positive measure is equivalent to being “dense” in certain balls:

Proposition 1. Let $E$ be a measurable subset of ${\Bbb R}^d$.  Then the following are equivalent:

1. E has positive measure.
2. For any $\varepsilon > 0$, there exists a ball B such that $m( E \cap B ) \geq (1-\varepsilon) m(B)$.

Thus one can think of a null set as a set which is “nowhere dense” in some measure-theoretic sense.

It turns out that there are analogues of these results when the measure space $X = (X, {\mathcal X}, \mu)$  is replaced instead by a complete metric space $X = (X,d)$.  Here, the appropriate notion of a “small” set is not a null set, but rather that of a nowhere dense set: a set E which is not dense in any ball, or equivalently a set whose closure has empty interior.  (A good example of a nowhere dense set would be a proper subspace, or smooth submanifold, of ${\Bbb R}^d$, or a Cantor set; on the other hand, the rationals are a dense subset of ${\Bbb R}$ and thus clearly not nowhere dense.)   We then have the following important result:

Theorem 1. (Baire category theorem). Let $E_1, E_2, \ldots$ be an at most countable sequence of subsets of a complete metric space X.  If $\bigcup_n E_n$ contains a ball B, then at least one of the $E_n$ is dense in a sub-ball B’ of B (and in particular is not nowhere dense).  To put it in the contrapositive: the countable union of nowhere dense sets cannot contain a ball.

Exercise 1. Show that the Baire category theorem is equivalent to the claim that in a complete metric space, the countable intersection of open dense sets remains dense.  $\diamond$

Exercise 2. Using the Baire category theorem, show that any non-empty complete metric space without isolated points is uncountable.  (In particular, this shows that the Baire category theorem can fail for incomplete metric spaces such as the rationals ${\Bbb Q}$.)  $\diamond$

To quickly illustrate an application of the Baire category theorem, observe that it implies that one cannot cover a finite-dimensional real or complex vector space ${\Bbb R}^n, {\Bbb C}^n$ by a countable number of proper subspaces.  One can of course also establish this fact by using Lebesgue measure on this space.  However, the advantage of the Baire category approach is that it also works well in infinite dimensional complete normed vector spaces, i.e. Banach spaces, whereas the measure-theoretic approach runs into significant difficulties in infinite dimensions.  This leads to three fundamental equivalences between the qualitative theory of continuous linear operators on Banach spaces (e.g. finiteness, surjectivity, etc.) and the quantitative theory (i.e. estimates):

1. The uniform boundedness principle, that equates the qualitative boundedness (or convergence) of a family of continuous operators with their quantitative boundedness.
2. The open mapping theorem, that equates the qualitative solvability of a linear problem Lu = f with the quantitative solvability.
3. The closed graph theorem, that equates the qualitative regularity of a (weakly continuous) operator T with the quantitative regularity of that operator.

Strictly speaking, these theorems are not used much directly in practice, because one usually works in the reverse direction (i.e. first proving quantitative bounds, and then deriving qualitative corollaries); but the above three theorems help explain why we usually approach qualitative problems in functional analysis via their quantitative counterparts.

– Proof of Baire category theorem –

Assume that the Baire category theorem failed; then it would be possible to cover a ball $B(x_0,r_0)$ in a complete metric space by a countable family $E_1, E_2, E_3, \ldots$ of nowhere dense sets.

We now invoke the following easy observation: if E is nowhere dense, then every ball B contains a subball B’ which is disjoint from E.  Indeed, this follows immediately from the definition of a nowhere dense set.

Invoking this observation, we can find a ball $B(x_1,r_1)$ in $B(x_0,r_0/10)$ (say) which is disjoint from $E_1$; we may also assume that $r_1 \leq r_0/10$ by shrinking $r_1$ as necessary.  Then, inside $B(x_1,r_1/10)$, we can find a ball $B(x_2,r_2)$ which is also disjoint from $E_2$, with $r_2 \leq r_1/10$.  Continuing this process, we end up with a nested sequence of balls $B(x_n,r_n)$, each of which is disjoint from $E_1,\ldots,E_n$, and such that $B(x_n,r_n) \subset B(x_{n-1},r_{n-1}/10)$ and $r_n \leq r_{n-1}/10$ for all $n=1,2,\ldots$.

From the triangle inequality we have $d(x_n,x_{n-1}) \leq 2 r_{n-1} / 10 \leq 2 \times 10^{-n} r_0$, and so the sequence $x_n$ is a Cauchy sequence.  As X is complete, $x_n$ converges to a limit x.  Summing the geometric series, one verifies that $x \in B(x_{n-1},r_{n-1})$ for all $n=1,2,\ldots$, and in particular $x$ is an element of B which avoids all of $E_1, E_2, E_3, \ldots$, a contradiction.  $\Box$
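The geometric series step can be written out as follows.  This is only a sketch using the bounds stated in the proof, with $n \geq 1$ fixed; the first inequality comes from letting $M \to \infty$ in the triangle inequality for $d(x_M, x_{n-1})$.

```latex
% Sketch: why the limit x lies in B(x_{n-1}, r_{n-1}) for every n >= 1.
% Uses d(x_m, x_{m-1}) <= 2 r_{m-1}/10 and r_m <= r_{m-1}/10 from the proof,
% so that r_{m-1} <= 10^{-(m-n)} r_{n-1} for all m >= n.
\begin{align*}
d(x, x_{n-1}) &\leq \sum_{m=n}^\infty d(x_m, x_{m-1})
  \leq \sum_{m=n}^\infty \frac{2 r_{m-1}}{10} \\
 &\leq \frac{2 r_{n-1}}{10} \sum_{j=0}^\infty 10^{-j}
  = \frac{2}{9} r_{n-1} < r_{n-1}.
\end{align*}
```

Since $B(x_{n-1},r_{n-1})$ is contained in $B(x_0,r_0)$ and (for $n \geq 2$) is disjoint from $E_1,\ldots,E_{n-1}$, letting n vary shows that x lies in the original ball and avoids every $E_n$.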

We can illustrate the analogy between the Baire category theorem and the measure-theoretic analogs by introducing some further definitions.  Call a set E meager or of the first category if it can be expressed as (or covered by) a countable union of nowhere dense sets, and of the second category if it is not meager.  Thus, the Baire category theorem shows that any subset of a complete metric space with non-empty interior is of the second category, which may help explain the name for the property.  Call a set co-meager or residual if its complement is meager, and call a set Baire or almost open if it differs from an open set by a meager set (note that a Baire set is unrelated to the Baire $\sigma$-algebra).  Then we have the following analogy between complete metric space topology, and measure theory:

| Complete non-empty metric space X | Measure space X of positive measure |
| --- | --- |
| first category (meager) | zero measure (null) |
| second category | positive measure |
| residual (co-meager) | full measure (co-null) |
| Baire | measurable |

Nowhere dense sets are meager, and meager sets have empty interior.  Contrapositively, sets with non-empty interior are of the second category, and sets of the second category are somewhere dense.  Taking complements instead of contrapositives, we see that sets with dense interior (in particular, open dense sets) are co-meager, and co-meager sets are dense.

While there are certainly many analogies between meager sets and null sets (for instance, both classes are closed under countable unions, or under intersections with arbitrary sets), the two concepts can differ in practice.  For instance, in the real line ${\Bbb R}$ with the standard metric and measure space structures, the set

$\bigcup_{n=1}^\infty (q_n - 2^{-n}, q_n + 2^{-n}),$ (1)

where $q_1, q_2, \ldots$ is an enumeration of the rationals, is open and dense, but has Lebesgue measure at most 2; thus its complement has infinite measure in ${\Bbb R}$ but is nowhere dense (hence meager).  As a variant of this, the set

$\bigcap_{m=1}^\infty \bigcup_{n=1}^\infty (q_n - 2^{-n}/m, q_n + 2^{-n}/m),$ (2)

is a null set, but is the intersection of countably many open dense sets and is thus co-meager.
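The two measure bounds just quoted follow from countable subadditivity.  In the sketch below, $|E|$ denotes the Lebesgue measure of E; this notation is an assumption made here only to avoid a clash with the index m in (2).

```latex
% Countable subadditivity applied to the sets (1) and (2).
% |E| is the Lebesgue measure of E (notation assumed for this sketch).
\begin{align*}
\Big| \bigcup_{n=1}^\infty (q_n - 2^{-n}, q_n + 2^{-n}) \Big|
  &\leq \sum_{n=1}^\infty 2 \cdot 2^{-n} = 2, \\
\Big| \bigcap_{m=1}^\infty \bigcup_{n=1}^\infty (q_n - 2^{-n}/m, q_n + 2^{-n}/m) \Big|
  &\leq \inf_{m \geq 1} \sum_{n=1}^\infty \frac{2 \cdot 2^{-n}}{m}
  = \inf_{m \geq 1} \frac{2}{m} = 0.
\end{align*}
```

Each set in the intersection (2) is open and contains every rational, hence is open and dense, which is what makes the intersection co-meager.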

Exercise 3. A real number x is Diophantine if for every $\varepsilon > 0$ there exists $c_\varepsilon > 0$ such that $|x - \frac{a}{q}| \geq \frac{c_\varepsilon}{|q|^{2+\varepsilon}}$ for every rational number $\frac{a}{q}$.  Show that  the set of Diophantine real numbers has full measure but is meager.  $\diamond$

Remark 1. If one assumes some additional axioms of set theory (e.g. the continuum hypothesis), it is possible to show that the collection of meager subsets of ${\Bbb R}$ and the collection of null subsets of ${\Bbb R}$ (viewed as $\sigma$-ideals of the collection of all subsets of ${\Bbb R}$) are isomorphic; this is the Sierpinski-Erdös theorem, which we will not prove here.  Roughly speaking, this theorem tells us that any “effective” first-order statement which is true about meager sets will also be true about null sets, and conversely. $\diamond$

– The uniform boundedness principle –

As mentioned in the introduction, the Baire category theorem implies various equivalences between qualitative and quantitative properties of linear transformations between Banach spaces.  (Lemma 1 of Notes 3 already gives a prototypical such equivalence between a qualitative property (continuity) and a quantitative one (boundedness).)

Theorem 2. (Uniform boundedness principle)  Let X be a Banach space, let Y be a normed vector space, and let $(T_\alpha)_{\alpha \in A}$ be a family of continuous linear operators $T_\alpha: X \to Y$.  Then the following are equivalent:

1. (Pointwise boundedness) For every $x \in X$, the set $\{ T_\alpha x: \alpha \in A \}$ is bounded.
2. (Uniform boundedness) The operator norms $\{ \|T_\alpha \|_{op}: \alpha \in A \}$ are bounded.

The uniform boundedness principle is also known as the Banach-Steinhaus theorem.

Proof. It is clear that 2. implies 1.; now assume 1 holds and let us obtain 2.

For each $n = 1, 2, \ldots$, let $E_n$ be the set

$E_n := \{ x \in X: \| T_\alpha x \|_Y \leq n \hbox{ for all } \alpha \in A \}$. (3)

The hypothesis 1 is nothing more than the assertion that the $E_n$ cover X, and thus by the Baire category theorem one of the $E_n$ must be dense in a ball.  Since the $T_\alpha$ are continuous, the $E_n$ are closed, and so one of the $E_n$ contains a ball.  Since $E_n - E_n \subset E_{2n}$, we see that one of the $E_n$ contains a ball centred at the origin.  Dilating n as necessary, we see that one of the $E_n$ contains the unit ball $B(0,1)$.  But then all the $\|T_\alpha\|_{op}$ are bounded by n, and the claim follows. $\Box$
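The last few steps of this proof can be made explicit as follows.  This is a sketch in which $B(x_0,r) \subset E_n$ denotes the ball produced by the Baire category argument; the names $x_0$ and r are introduced here for illustration.

```latex
% Sketch: from a ball B(x_0, r) inside E_n to a uniform operator norm bound.
% Step 1 (difference): if \|y\|_X < r then x_0 + y and x_0 both lie in E_n, so
\begin{align*}
\| T_\alpha y \|_Y
  &\leq \| T_\alpha (x_0 + y) \|_Y + \| T_\alpha x_0 \|_Y \leq 2n
  \quad \hbox{for all } \alpha \in A.
\end{align*}
% Step 2 (dilation): if \|x\|_X < 1 then \|r x\|_X < r, so by linearity
\begin{align*}
\| T_\alpha x \|_Y = \frac{1}{r} \| T_\alpha (r x) \|_Y \leq \frac{2n}{r},
  \quad \hbox{and hence} \quad
  \sup_{\alpha \in A} \| T_\alpha \|_{op} \leq \frac{2n}{r}.
\end{align*}
```

Step 1 is the inclusion $E_n - E_n \subset E_{2n}$, and Step 2 is the dilation: any integer at least $2n/r$ serves as the final index.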

Exercise 4. Give counterexamples to show that the uniform boundedness principle fails if one relaxes the assumptions in any of the following ways:

1. X is merely a normed vector space rather than a Banach space (i.e. completeness is dropped).
2. The $T_\alpha$ are not assumed to be continuous.
3. The $T_\alpha$ are allowed to be nonlinear rather than linear.

Thus completeness, continuity, and linearity are all essential for the uniform boundedness principle to apply. $\diamond$

Remark 2. It is instructive to establish the uniform boundedness principle more “constructively” without the Baire category theorem (though the proof of the Baire category theorem is still implicitly present), as follows.  Suppose that 2 fails, so that $\|T_\alpha\|_{op}$ is unbounded.  We can then find a sequence $\alpha_n \in A$ such that $\| T_{\alpha_{n+1}} \|_{op} > 100^n \| T_{\alpha_n} \|_{op}$ (say) for all n.  We can then find unit vectors $x_n$ such that $\| T_{\alpha_n} x_n \|_Y \geq \frac{1}{2} \| T_{\alpha_n} \|_{op}$.

We can then form the absolutely convergent (and hence conditionally convergent, by completeness) sum $x = \sum_{n=1}^\infty \epsilon_n 10^{-n} x_n$ for some choice of signs $\epsilon_n = \pm 1$ recursively as follows: once $\epsilon_1,\ldots,\epsilon_{n-1}$ have been chosen, choose the sign $\epsilon_n$ so that

$\|\sum_{m=1}^n \epsilon_m 10^{-m} T_{\alpha_n} x_m \|_Y \geq \| 10^{-n} T_{\alpha_n} x_n \|_Y \geq \frac{1}{2} 10^{-n} \| T_{\alpha_n} \|_{op}$.

From the triangle inequality we soon conclude that

$\| T_{\alpha_n} x \|_Y \geq \frac{1}{4} 10^{-n} \| T_{\alpha_n} \|_{op}.$

But by hypothesis, the RHS is unbounded in n, contradicting 1.  $\Box$
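The triangle inequality step in Remark 2 can be spelled out as below.  This is a sketch: it splits $T_{\alpha_n} x$ into the partial sum up to n, which is controlled by the choice of the sign $\epsilon_n$, and a tail, which is controlled using only that the $x_m$ are unit vectors.

```latex
% Sketch of the lower bound for T_{\alpha_n} x, where
% x = \sum_m \epsilon_m 10^{-m} x_m and each x_m is a unit vector.
\begin{align*}
\| T_{\alpha_n} x \|_Y
  &\geq \Big\| \sum_{m=1}^n \epsilon_m 10^{-m} T_{\alpha_n} x_m \Big\|_Y
     - \sum_{m=n+1}^\infty 10^{-m} \| T_{\alpha_n} \|_{op} \\
  &\geq \frac{1}{2} 10^{-n} \| T_{\alpha_n} \|_{op}
     - \frac{1}{9} 10^{-n} \| T_{\alpha_n} \|_{op}
   = \frac{7}{18} 10^{-n} \| T_{\alpha_n} \|_{op}
   \geq \frac{1}{4} 10^{-n} \| T_{\alpha_n} \|_{op}.
\end{align*}
```

Here the tail was summed as $\sum_{m > n} 10^{-m} = \frac{1}{9} 10^{-n}$.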

A common way to apply the uniform boundedness principle is via the following corollary:

Corollary 1. (Uniform boundedness principle for norm convergence)  Let X and Y be Banach spaces, and let $(T_n)_{n=1}^\infty$ be a family of continuous linear operators $T_n: X \to Y$.  Then the following are equivalent:

1. (Pointwise convergence) For every $x \in X$, $T_n x$ converges strongly in Y as $n \to \infty$.
2. (Pointwise convergence to a continuous limit) There exists a continuous linear $T: X \to Y$ such that for every $x \in X$, $T_n x$ converges strongly in Y to Tx as $n \to \infty$.
3. (Uniform boundedness + dense subclass convergence) The operator norms $\{ \|T_n\|: n = 1,2,\ldots \}$ are bounded, and for a dense set of x in X, $T_n x$ converges strongly in Y as $n \to \infty$.

Proof. Clearly 2. implies 1., and as convergent sequences are bounded, we see from Theorem 2 that 1. implies 3.  The implication of 2 from 3 follows by a standard limiting argument and is left as an exercise.  $\Box$

Remark 3. The same equivalences hold if one replaces the sequence $(T_n)_{n=1}^\infty$ by a net $(T_\alpha)_{\alpha \in A}$. $\diamond$

Example 1 (Fourier inversion formula).  For any $f \in L^2({\Bbb R})$ and N > 0, define the Dirichlet summation operator

$S_N f(x) := \int_{-N}^N \hat f(\xi) e^{2\pi i x \xi}\ d\xi$ (4)

where $\hat f$ is the Fourier transform of f, defined on smooth compactly supported functions $f \in C^\infty_0({\Bbb R})$ by the formula $\hat f(\xi) := \int_{-\infty}^\infty f(x) e^{-2\pi i x \xi}\ dx$ and then extended to $L^2$ by the Plancherel theorem.  Using the Plancherel identity, we can verify that the operator norms $\|S_N\|_{op}$ are uniformly bounded (indeed, they are all 1); also, one can check that for $f \in C^\infty_0({\Bbb R})$, $S_N f$ converges in $L^2$ norm to f as $N \to \infty$.  As $C^\infty_0({\Bbb R})$ is known to be dense in $L^2({\Bbb R})$, this implies that $S_N f$ converges in $L^2$ norm to f for every $f \in L^2({\Bbb R})$.
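The uniform bound on the operator norms is a one-line consequence of Plancherel.  A sketch, using only that $S_N f$ is the inverse Fourier transform of $1_{[-N,N]} \hat f$:

```latex
% Sketch: S_N is a contraction on L^2, by the Plancherel identity.
\begin{align*}
\| S_N f \|_{L^2({\Bbb R})}
  = \| 1_{[-N,N]} \hat f \|_{L^2({\Bbb R})}
  \leq \| \hat f \|_{L^2({\Bbb R})}
  = \| f \|_{L^2({\Bbb R})}.
\end{align*}
% Equality holds whenever \hat f is supported in [-N,N],
% which is why the operator norm is exactly 1.
```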

This argument only used the “easy” implication of Corollary 1, namely the deduction of 2. from 3.  The “hard” implication using the Baire category theorem was not directly utilised.  However, from a metamathematical standpoint, that implication is important because it tells us that the above strategy to prove convergence in norm of the Fourier inversion formula on $L^2$ – i.e. to obtain uniform operator norms on the partial sums, and to establish convergence on a dense subclass of “nice” functions – is in some sense the only strategy available to prove such a result.  $\diamond$

Remark 4. There is a partial analogue of Corollary 1 for the question of pointwise almost everywhere convergence rather than norm convergence, known as Stein’s maximal principle (discussed for instance in this previous blog post of mine).  For instance, it reduces Carleson’s theorem on the pointwise almost everywhere convergence of Fourier series to the boundedness of a certain maximal function (the Carleson maximal operator) related to Fourier summation, although the latter task is again quite non-trivial.  (As in Example 1, the role of the maximal principle is meta-mathematical rather than direct.) $\diamond$

Of course, if we omit some of the hypotheses, it is no longer true that pointwise boundedness and uniform boundedness are the same.  For instance, if we let $c_c({\Bbb N})$ be the space of complex sequences with only finitely many non-zero entries and with the uniform topology, and let $\lambda_n: c_c({\Bbb N}) \to {\Bbb C}$ be the map $(a_m)_{m=1}^\infty \mapsto n a_n$, then the $\lambda_n$ are pointwise bounded but not uniformly bounded; thus completeness of X is important.  Also, even in the one-dimensional case $X=Y={\Bbb R}$, the uniform boundedness principle can easily be seen to fail if the $T_\alpha$ are non-linear transformations rather than linear ones. $\diamond$
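To see the two claims about the functionals $\lambda_n$ concretely, one can argue as in the following sketch.  Here M is an index beyond which the finitely supported sequence a vanishes, and $e_n$ is the sequence equal to 1 in position n and 0 elsewhere; both names are introduced for illustration.

```latex
% Pointwise boundedness: if a_m = 0 for all m > M, then
\begin{align*}
\sup_{n \geq 1} |\lambda_n(a)| = \max_{1 \leq n \leq M} n |a_n|
  \leq M \sup_{m} |a_m| < \infty.
\end{align*}
% Failure of uniform boundedness: e_n has uniform norm 1, but
\begin{align*}
\lambda_n(e_n) = n, \quad \hbox{so} \quad \| \lambda_n \|_{op} \geq n .
\end{align*}
```

The bound in the first display depends on a through M, which is exactly the non-uniformity that completeness would rule out.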

– The open mapping theorem –

A map $f: X \to Y$ between topological spaces X and Y is said to be open if it maps open sets to open sets.  This is similar to, but slightly different from, the more familiar property of being continuous, which is equivalent to the inverse image of open sets being open.  For instance, the map $f: {\Bbb R} \to {\Bbb R}$ defined by $f(x) := x^2$ is continuous but not open; conversely, the function $g: {\Bbb R}^2 \to {\Bbb R}$ defined by $g(x,y) := \hbox{sgn}(y)+x$ is discontinuous but open.
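For the first of these examples, a single open set already witnesses the failure of openness.  A minimal check:

```latex
% Why f(x) = x^2 is not an open map on the real line.
\begin{align*}
f\big( (-1,1) \big) &= \{ x^2 : -1 < x < 1 \} = [0,1), \\
(-\delta, \delta) &\not\subset [0,1) \quad \hbox{for every } \delta > 0,
\end{align*}
% so the image of the open set (-1,1) contains 0 but no neighbourhood of 0.
```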

We have seen that it is quite possible for non-linear continuous maps to fail to be open.  But for linear maps between Banach spaces, the situation is much better:

Theorem 3. (Open mapping theorem)  Let $L: X \to Y$ be a continuous linear transformation between two Banach spaces X and Y.  Then the following are equivalent:

1. L is surjective.
2. L is open.
3. (Qualitative solvability) For every $f \in Y$ there exists a solution $u \in X$ to the equation $Lu = f$.
4. (Quantitative solvability) There exists a constant $C > 0$ such that for every $f \in Y$ there exists a solution $u \in X$ to the equation $Lu = f$, which obeys the bound $\|u\|_X \leq C \|f\|_Y$.
5. (Quantitative solvability for a dense subclass) There exists a constant $C > 0$ such that for a dense set of f in Y, there exists a solution $u \in X$ to the equation $Lu = f$, which obeys the bound $\|u\|_X \leq C \|f\|_Y$.

Proof. Clearly 4. implies 3., which is equivalent to 1., and it is easy to see from linearity that 2. and 4. are equivalent (cf. the proof of Lemma 1 from Notes 3).  4. trivially implies 5., while to obtain 4. from 5., observe that if E is any dense subset of the Banach space Y, then any f in Y can be expressed as an absolutely convergent series $f = \sum_n f_n$ of elements in E (since one can iteratively approximate the residual $f - \sum_{n=1}^{N-1} f_n$ to arbitrary accuracy by an element of E for $N=1,2,3,\ldots$), and the claim easily follows.  So it suffices to show that 3. implies 4.
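The series argument for deducing 4. from 5. can be sketched as follows.  Here E is the dense set from 5., C is the constant from 5., and $f \neq 0$ is fixed (the case f=0 being trivial); the specific tolerance $2^{-N} \|f\|_Y$ and the resulting constant 3C are choices made for this illustration rather than anything forced by the argument.

```latex
% Sketch: from quantitative solvability on a dense set E to all of Y.
% Choose f_N in E recursively so that the residuals decay geometrically:
\begin{align*}
\Big\| f - \sum_{n=1}^N f_n \Big\|_Y \leq 2^{-N} \|f\|_Y
  \quad \hbox{for } N = 1, 2, 3, \ldots
\end{align*}
% Comparing consecutive residuals (the residual for N = 0 being f itself):
\begin{align*}
\| f_N \|_Y \leq 2^{-(N-1)} \|f\|_Y + 2^{-N} \|f\|_Y = 3 \cdot 2^{-N} \|f\|_Y .
\end{align*}
% Solve L u_N = f_N with \|u_N\|_X <= C \|f_N\|_Y and sum:
\begin{align*}
\sum_{N=1}^\infty \| u_N \|_X
  \leq 3 C \|f\|_Y \sum_{N=1}^\infty 2^{-N} = 3 C \|f\|_Y .
\end{align*}
```

By completeness of X the series $u := \sum_N u_N$ converges, and continuity of L gives $Lu = \sum_N f_N = f$ with $\|u\|_X \leq 3C \|f\|_Y$.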

For each n, let $E_n \subset Y$ be the set of all $f \in Y$ for which there exists a solution to Lu=f with $\|u\|_X \leq n \|f\|_Y$.  From the hypothesis 3, we see that $\bigcup_n E_n = Y$.  Since Y is complete, the Baire category theorem implies that there is some $E_n$ which is dense in some ball $B(f_0,r)$ in Y.  In other words, the problem Lu=f is approximately quantitatively solvable in the ball $B(f_0,r)$ in the sense that

• For every $\varepsilon > 0$ and every $f \in B(f_0,r)$, there exists an approximate solution u with $\| Lu - f \|_Y \leq \varepsilon$ and $\|u\|_X \leq n \|Lu \|_Y$, and thus (since $\|Lu\|_Y \leq \|f\|_Y + \varepsilon \leq \|f_0\|_Y + r + \varepsilon$) $\|u\|_X \leq n \|f_0\|_Y + n r + n \varepsilon$.

By subtracting two such approximate solutions (writing $f = (f_0 + f/2) - (f_0 - f/2)$), we conclude that

• For any $f \in B(0,2r)$ and any $\varepsilon > 0$, there exists $u \in X$ with $\|Lu - f \|_Y \leq 2\varepsilon$ and $\|u\|_X \leq 2n \|f_0\|_Y + 2nr + 2 n \varepsilon$.

Since L is homogeneous, we can rescale and conclude that

• For any $f \in Y$ and any $\varepsilon > 0$ there exists $u \in X$ with $\|Lu - f \|_Y \leq 2 \varepsilon$ and $\|u\|_X \leq 2C_0 \|f\|_Y + 2C_0 \varepsilon$, where $C_0 := n (\|f_0\|_Y + r)/r$.

In particular, setting $\varepsilon = \frac{1}{4} \|f\|_Y$ (treating the case f=0 separately), we conclude that

• For any $f \in Y$, we may write $f = Lu + f'$, where $\| f'\|_Y \leq \frac{1}{2} \|f\|_Y$ and $\|u\|_X \leq \frac{5}{2} C_0 \|f\|_Y$.

We can iterate this procedure and then take limits (now using the completeness of X rather than Y) to obtain a solution to Lu=f for every $f \in Y$ with $\|u\|_X \leq 5 C_0 \|f\|_Y$, and the claim follows. $\Box$
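The final iteration can be sketched as follows.  Here K stands for whatever constant appears in the last bullet point (so that $f = Lu + f'$ with $\|f'\|_Y \leq \frac{1}{2} \|f\|_Y$ and $\|u\|_X \leq K \|f\|_Y$), and the names $g_k, u_k$ are introduced only for this illustration.

```latex
% Sketch of the iteration: set g_0 := f and apply the last bullet repeatedly,
%   g_{k-1} = L u_k + g_k,  \|g_k\|_Y <= (1/2) \|g_{k-1}\|_Y,
%   \|u_k\|_X <= K \|g_{k-1}\|_Y,   for k = 1, 2, 3, ...
\begin{align*}
\| g_k \|_Y &\leq 2^{-k} \|f\|_Y, \\
\sum_{k=1}^\infty \| u_k \|_X
  &\leq K \sum_{k=1}^\infty 2^{-(k-1)} \|f\|_Y = 2 K \|f\|_Y, \\
f - L \Big( \sum_{k=1}^N u_k \Big) &= g_N \to 0 \quad \hbox{as } N \to \infty .
\end{align*}
```

Completeness of X makes $u := \sum_k u_k$ converge, continuity of L gives $Lu = f$, and the bound $\|u\|_X \leq 2K \|f\|_Y$ doubles the constant from the last bullet point.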

Remark 5. The open mapping theorem provides metamathematical justification for the method of a priori estimates for solving linear equations such as $Lu = f$ for a given datum $f \in Y$ and for an unknown $u \in X$, which is of course a familiar problem in linear PDE.  The a priori method assumes that f is in some dense class of nice functions (e.g. smooth functions) in which solvability of Lu=f is presumably easy, and then proceeds to obtain the a priori estimate $\|u\|_X \leq C \|f\|_Y$ for some constant C.  Theorem 3 then assures that Lu=f is solvable for all f in Y (with a similar bound).  As before, this implication does not directly use the Baire category theorem, but that theorem helps explain why this method is “not wasteful”.  $\diamond$

A pleasant corollary of the open mapping theorem is that, as with ordinary linear algebra or with arbitrary functions, invertibility is the same thing as bijectivity:

Corollary 2. Let $T: X \to Y$ be a continuous linear operator between two Banach spaces X, Y.  Then the following are equivalent:

1. (Qualitative invertibility) T is bijective.
2. (Quantitative invertibility) T is bijective, and $T^{-1}: Y \to X$ is a continuous (hence bounded) linear transformation.

Remark 6. The claim fails without the completeness hypotheses on X and Y.  For instance, consider the operator $T: c_c({\Bbb N}) \to c_c({\Bbb N})$ defined by $T (a_n)_{n=1}^\infty := (\frac{a_n}{n})_{n=1}^\infty$, where we give $c_c({\Bbb N})$ the uniform norm.  Then T is continuous and bijective, but $T^{-1}$ is unbounded. $\diamond$
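The unboundedness of $T^{-1}$ in Remark 6 can be checked on basis sequences.  In this sketch $e_k$ denotes the sequence equal to 1 in position k and 0 elsewhere (a name introduced for illustration).

```latex
% T is a contraction, but its inverse is unbounded on c_c(N).
\begin{align*}
\| T a \|_\infty &= \sup_n \frac{|a_n|}{n} \leq \| a \|_\infty, \\
T^{-1} (b_n)_{n=1}^\infty &= (n b_n)_{n=1}^\infty, \qquad
\| T^{-1} e_k \|_\infty = k = k \| e_k \|_\infty .
\end{align*}
% Letting k grow shows that no bound \|T^{-1} b\| <= C \|b\| can hold.
```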

Exercise 5. Show that Corollary 2 can still fail if we drop the completeness hypothesis on just X, or just Y. $\diamond$

Exercise 6. Suppose that $L: X \to Y$ is a surjective continuous linear transformation between Banach spaces.  By combining the open mapping theorem with the Hahn-Banach theorem, show that the transpose map $L^*: Y^* \to X^*$ is bounded from below, i.e. there exists $c > 0$ such that $\| L^* \lambda \|_{X^*} \geq c \|\lambda \|_{Y^*}$ for all $\lambda \in Y^*$.  Conclude that $L^*$ is an isomorphism between $Y^*$ and $L^*(Y^*)$.  $\diamond$

Let L be as in Theorem 3, so that the problem Lu=f is both qualitatively and quantitatively solvable.  A standard application of Zorn’s lemma (similar to that used to prove the Hahn-Banach theorem) shows that the problem Lu=f is also qualitatively linearly solvable, in the sense that there exists a linear transformation $S: Y \to X$ such that $LSf = f$ for all $f \in Y$ (i.e. S is a right-inverse of L).  In view of the open mapping theorem, it is then tempting to conjecture that L must also be quantitatively linearly solvable, in the sense that there exists a continuous linear transformation $S: Y \to X$ such that $LSf = f$ for all $f \in Y$.  By Corollary 2, we see that this conjecture is true when the problem Lu=f is determined, i.e. there is exactly one solution u for each datum f.  Unfortunately, the conjecture can fail when Lu=f is underdetermined (more than one solution u for each f); we discuss this in the appendix to these notes.  On the other hand, the situation is much better for Hilbert spaces:

Exercise 7. Suppose that $L: H \to H'$ is a surjective continuous linear transformation between Hilbert spaces.  Show that there exists a continuous linear transformation $S: H' \to H$ such that $LS = I$.  Furthermore, we can ensure that the range of S is orthogonal to the kernel of L, and that this condition determines S uniquely. $\diamond$

Remark 7. In fact, Hilbert spaces are essentially the only type of Banach space for which we have this nice property, due to the Lindenstrauss-Tzafriri solution of the complemented subspaces problem. $\diamond$

Exercise 8. Let M and N be closed subspaces of a Banach space X.  Show that the following statements are equivalent:

1. (Qualitative complementation) Every x in X can be expressed in the form m+n for $m \in M, n \in N$ in exactly one way.
2. (Quantitative complementation)  Every x in X can be expressed in the form m+n for $m \in M, n \in N$ in exactly one way.  Furthermore there exists C > 0 such that $\|m\|_X, \|n\|_X \leq C \|x\|_X$ for all x.

When either of these two properties holds, we say that M (or N) is a complemented subspace, and that N is a complement of M (or vice versa).  $\diamond$

The property of being complemented is closely related to that of quantitative linear solvability:

Exercise 9. Let $L: X \to Y$ be a surjective bounded linear map between Banach spaces.  Show that there exists a bounded linear map $S: Y \to X$ such that $LSf = f$ for all $f \in Y$ if and only if the kernel $\{ u \in X: Lu=0\}$ is a complemented subspace of X.  $\diamond$

Exercise 10. Show that any finite-dimensional or closed finite co-dimensional subspace of a Banach space is complemented.  $\diamond$

Remark 8. The problem of determining whether a given closed subspace of a Banach space is complemented or not is, in general, quite difficult.  However, non-complemented subspaces do exist in abundance; some examples are given in the appendix, and the Lindenstrauss-Tzafriri theorem referred to in Remark 7 asserts that any Banach space not isomorphic to a Hilbert space contains at least one non-complemented subspace.  There is also a remarkable construction of Gowers and Maurey of a Banach space such that every subspace, other than those ruled out by Exercise 10, is uncomplemented.  $\diamond$

– The closed graph theorem –

Recall that a map $T: X \to Y$ between two metric spaces is continuous if and only if, whenever $x_n$ converges to x in X, $Tx_n$ converges to Tx in Y.  We can also define the weaker property of being closed: a map $T: X \to Y$ is closed if and only if whenever $x_n$ converges to x in X, and $Tx_n$ converges to a limit y in Y, then y is equal to Tx; equivalently, T is closed if its graph $\{ (x,Tx): x \in X \}$ is a closed subset of $X \times Y$.  This is weaker than continuity because it has the additional requirement that the sequence $Tx_n$ is already convergent. (Despite the name, closed operators are not directly related to open operators.)

Example 2. Let $T: c_c({\Bbb N}) \to c_c({\Bbb N})$ be the transformation $T( a_m )_{m=1}^\infty := (ma_m)_{m=1}^\infty$, where $c_c({\Bbb N})$ is the space of sequences with only finitely many non-zero entries, with the uniform norm.  This transformation is unbounded and hence discontinuous, but one easily verifies that it is closed.  $\diamond$

As Example 2 shows, being closed is often a weaker property than being continuous.  However, the remarkable closed graph theorem shows that as long as the domain and range of the operator are both Banach spaces, the two statements are equivalent:

Theorem 4. (Closed graph theorem)  Let $T: X \to Y$ be a linear transformation between two Banach spaces.  Then the following are equivalent:

1. T is continuous.
2. T is closed.
3. (Weak continuity) There exists some topology ${\mathcal F}$ on Y, weaker than the norm topology (i.e. containing fewer open sets) but still Hausdorff, for which $T: X \to (Y, {\mathcal F})$ is continuous.

Proof. It is clear that 1 implies 3 (just take ${\mathcal F}$ to equal the norm topology).  To see why 3 implies 2, observe that if $x_n \to x$ in X and $Tx_n \to y$ in norm, then $Tx_n \to y$ in the weaker topology ${\mathcal F}$ as well; but by weak continuity $Tx_n \to Tx$ in ${\mathcal F}$.  Since Hausdorff topological spaces have unique limits, we have Tx=y and so T is closed.

Now we show that 2 implies 1.  If T is closed, then the graph $\Gamma := \{ (x,Tx): x \in X \}$ is a closed linear subspace of the Banach space $X \times Y$ and is thus also a Banach space.  On the other hand, the projection map $\pi: (x,Tx) \mapsto x$ from $\Gamma$ to X is clearly a continuous linear bijection.  By Corollary 2, its inverse $x \mapsto (x,Tx)$ is also continuous, and so T is continuous as desired. $\Box$
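The last step can be made quantitative as in the following sketch.  It assumes that $X \times Y$ is given the norm $\|(x,y)\| := \|x\|_X + \|y\|_Y$ (one of several equivalent choices; the proof above does not specify one), and C denotes the operator norm of $\pi^{-1}$.

```latex
% Sketch: boundedness of the inverse of the projection gives boundedness of T.
% Assumed norm on the product: \|(x,y)\| := \|x\|_X + \|y\|_Y.
\begin{align*}
\| x \|_X + \| T x \|_Y
  = \| \pi^{-1} x \|_{X \times Y}
  \leq C \| x \|_X
  \quad \Longrightarrow \quad
  \| T x \|_Y \leq C \| x \|_X \quad \hbox{for all } x \in X .
\end{align*}
```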

We can reformulate the closed graph theorem in the following fashion:

Corollary 3. Let X, Y be Banach spaces, and suppose we have some continuous inclusion $Y \subset Z$ of Y into a Hausdorff topological vector space Z.  Let $T: X \to Z$ be a continuous linear transformation.  Then the following are equivalent.

1. (Qualitative regularity) For all $x \in X$, $Tx \in Y$.
2. (Quantitative regularity) For all $x \in X$, $Tx \in Y$, and furthermore $\|Tx\|_Y \leq C \|x\|_X$ for some $C > 0$ independent of x.
3. (Quantitative regularity on a dense subclass) For all x in a dense subset of X, $Tx \in Y$, and furthermore $\|Tx\|_Y \leq C \|x\|_X$ for some $C > 0$ independent of x.

Proof. Clearly 2. implies 3. or 1.  If we have 3., then T extends uniquely to a bounded linear map from X to Y, which must agree with the original continuous map from X to Z since limits in the Hausdorff space Z are unique, and so 3. implies 2.  Finally, if 1. holds, then we can view T as a map from X to Y, which by Theorem 4 is continuous, and the claim now follows from Lemma 1 from Notes 3. $\Box$

In practice, one should think of Z as some sort of “low regularity” space with a weak topology, and Y as a “high regularity” subspace with a stronger topology.  Corollary 3 motivates the method of a priori estimates to establish the Y-regularity of some linear transform Tx of an arbitrary element x in a Banach space X, by first establishing the a priori estimate $\|Tx\|_Y \leq C \|x\|_X$ for a dense subclass of “nice” elements of X, and then using the above corollary (and some weak continuity of T in a low regularity space) to conclude.  The closed graph theorem provides the metamathematical explanation as to why this approach is at least as powerful as any other approach to proving regularity.

Example 3. Let $1 \leq p \leq 2$, and let p’ be the dual exponent of p.  To prove that the Fourier transform $\hat f$ of a function $f \in L^p({\Bbb R})$ necessarily lies in $L^{p'}({\Bbb R})$, it suffices to prove the Hausdorff-Young inequality

$\| \hat f \|_{L^{p'}({\Bbb R})} \leq C_p \|f\|_{L^p({\Bbb R})}$ (5)

for some constant $C_p$ and all f in some suitable dense subclass of $L^p({\Bbb R})$ (e.g. the space $C^\infty_0({\Bbb R})$ of smooth functions of compact support), together with the “soft” observation that the Fourier transform is continuous from $L^p({\Bbb R})$ to the space of tempered distributions, which is a Hausdorff space into which $L^{p'}({\Bbb R})$ embeds continuously.  One can replace the Hausdorff-Young inequality here by countless other estimates in harmonic analysis to obtain similar qualitative regularity conclusions. $\diamond$

– Appendix: Nonlinear solvability (optional) –

In this appendix we give an example of a linear equation Lu=f which can only be quantitatively solved in a nonlinear fashion.  We will use a number of basic tools which we will only cover later in this course, and so this material is optional reading.

Let $X = \{0,1\}^{\Bbb N}$ be the infinite discrete cube with the product topology; by Tychonoff’s theorem, this is a compact Hausdorff space.  The Borel $\sigma$-algebra is generated by the cylinder sets

$E_n := \{ (x_m)_{m=1}^\infty \in \{0,1\}^{\Bbb N}: x_n = 1 \}$. (6)

(From a probabilistic view point, one can think of X as the event space for flipping a countably infinite number of coins, and $E_n$ as the event that the $n^{th}$ coin lands as heads.)

Let $M(X)$ be the space of finite signed Borel measures on X, equipped with the total variation norm; this can be verified to be a Banach space.  There is a map $L: M(X) \to \ell^\infty({\Bbb N})$ defined by

$L( \mu ) := ( \mu(E_n) )_{n=1}^\infty$. (7)

This is a continuous linear transformation.  The equation $Lu=f$ is quantitatively solvable for every $f \in \ell^\infty({\Bbb N})$.  Indeed, if f is an indicator function $f = 1_A$, then $f = L \delta_{x_A}$, where $x_A \in \{0,1\}^{\Bbb N}$ is the sequence that equals 1 on A and 0 outside of A, and $\delta_{x_A}$ is the Dirac mass at $x_A$.  The general case then follows by expressing a bounded sequence as an integral of indicator functions (e.g. if f takes values in [0,1], we can write $f = \int_0^1 1_{\{f > t\}}\ dt$).  Note however that this is a nonlinear operation, since the indicator $1_{\{f>t\}}$ depends nonlinearly on f.
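One way to make the integral of indicator functions precise, for f taking values in [0,1], is sketched below.  The map $\phi_f$, the measure $\mu_f$, and the use of a pushforward are choices made for this illustration and are not taken from the text; $|\cdot|$ denotes Lebesgue measure on [0,1].

```latex
% Sketch: a (nonlinear) solution operator f -> \mu_f for sequences with
% values in [0,1].  Define \phi_f : [0,1] -> X by
%   \phi_f(t) := x_{\{ n : f_n > t \}},
% and let \mu_f be the pushforward of Lebesgue measure on [0,1] under \phi_f.
\begin{align*}
\mu_f(E_n) &= \big| \{ t \in [0,1] : f_n > t \} \big| = f_n,
  \quad \hbox{so} \quad L \mu_f = f, \\
\mu_f(X) &= 1 \quad \hbox{for every such } f, \\
\mu_{f/2}(X) &= 1 \neq \tfrac{1}{2} = \tfrac{1}{2} \mu_f(X).
\end{align*}
```

The last line shows directly that $f \mapsto \mu_f$ is not linear, since a linear map would have to satisfy $\mu_{f/2} = \frac{1}{2} \mu_f$.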

We now claim that the equation $Lu=f$ is not quantitatively linearly solvable, i.e. there is no bounded linear map $S: \ell^\infty({\Bbb N}) \to M(X)$ such that LSf = f for all $f \in \ell^\infty({\Bbb N})$.  This fact was first observed by Banach and Mazur; we shall give two proofs, one of a “soft analysis” flavour and one of a “hard analysis” flavour.

We begin with the “soft analysis” proof, starting with a measure-theoretic result which is of independent interest.

Theorem 5. (Nikodym convergence theorem) Let $(X, {\mathcal B})$ be a measurable space, and let $\sigma_n: {\mathcal B} \to {\Bbb R}$ be a sequence of signed finite measures which is weakly convergent in the sense that $\sigma_n(E)$ converges to some limit $\sigma(E)$ for each $E \in {\mathcal B}$.  Then:

1. The $\sigma_n$ are uniformly countably additive, which means that for any sequence $E_1, E_2, \ldots$ of disjoint measurable sets, the series $\sum_{m=1}^\infty |\sigma_n(E_m)|$ converges uniformly in n.
2. $\sigma$ is a signed finite measure.
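
It may help to unpack the first conclusion before turning to the proof; the symbols $\varepsilon_0$, $n_0$ below are notation introduced only for this remark.  Uniform countable additivity asserts that for every sequence $E_1, E_2, \ldots$ of disjoint measurable sets and every $\varepsilon > 0$ there exists M such that

$\sup_n \sum_{m=M}^\infty |\sigma_n(E_m)| \leq \varepsilon$.

If this fails, there are disjoint $E_1, E_2, \ldots$ and $\varepsilon_0 > 0$ with $\sup_n \sum_{m=M}^\infty |\sigma_n(E_m)| > \varepsilon_0$ for all M.  For each fixed n the series $\sum_m |\sigma_n(E_m)|$ converges (as $\sigma_n$ is a finite signed measure), so its tails go to zero as $M \to \infty$; thus any finite number of indices n can be discarded by enlarging M.  Since the tails are non-increasing in M, this gives

$\sup_{n \geq n_0} \sum_{m=M}^\infty |\sigma_n(E_m)| > \varepsilon_0$ for all M and all $n_0$,

and hence $\limsup_{n \to \infty} \sum_{m=M}^\infty |\sigma_n(E_m)| \geq \varepsilon_0 > \varepsilon$ for all M, with $\varepsilon := \varepsilon_0/2$.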

Proof. It suffices to prove the first part, since this easily implies that $\sigma$ is also countably additive, and is thence a signed finite measure.  Suppose for contradiction that the claim failed; then one could find disjoint $E_1, E_2, \ldots$ and $\varepsilon > 0$ such that one has $\limsup_{n \to \infty} \sum_{m=M}^\infty |\sigma_n(E_m)| > \varepsilon$ for all M.  We now construct disjoint sets $A_1, A_2, \ldots$, each consisting of the union of a finite collection of the $E_j$, and an increasing sequence $n_1, n_2, \ldots$ of positive integers, by the following recursive procedure:

1. Initialise $k=0$.
2. Suppose recursively that $n_1 < \ldots < n_{2k}$ and $A_1,\ldots,A_k$ have already been constructed for some $k \geq 0$.
3. Choose $n_{2k+1} > n_{2k}$ so large that for all $n \geq n_{2k+1}$, $\sigma_n(A_1 \cup \ldots \cup A_k)$ differs from $\sigma(A_1 \cup \ldots \cup A_k)$ by at most $\varepsilon/10$.
4. Choose $M_k$ so large that $M_k$ is larger than j for any $E_j \subset A_1 \cup \ldots \cup A_k$, and such that $\sum_{m=M_k}^\infty |\sigma_{n_j}(E_m)| \leq \varepsilon / 100^{k+1}$ for all $1 \leq j \leq 2k+1$.
5. Choose $n_{2k+2} > n_{2k+1}$ so that $\sum_{m=M_k}^\infty |\sigma_{n_{2k+2}}(E_m)| > \varepsilon$.
6. Pick $A_{k+1}$ to be a finite union of the $E_j$ with $j \geq M_k$ such that $|\sigma_{n_{2k+2}}(A_{k+1})| > \varepsilon/2$.

It is then a routine matter to show that if $A := \bigcup_{j=1}^\infty A_j$, then $|\sigma_{n_{2k+2}}(A) - \sigma_{n_{2k+1}}(A)| \geq \varepsilon/10$ for all k, contradicting the hypothesis that $\sigma_n$ is weakly convergent to $\sigma$. $\Box$
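
For the reader who wishes to see the routine verification written out, here is one possible way to organise the bookkeeping (the notation $B_k$ and the specific numerical constants are just one convenient choice).  First, Step 6 is possible because $\sum_{m=M_k}^\infty |\sigma_{n_{2k+2}}(E_m)| > \varepsilon$ forces either the positive terms or the negative terms of $\sigma_{n_{2k+2}}(E_m)$, $m \geq M_k$, to sum to more than $\varepsilon/2$ in magnitude, and one can then take $A_{k+1}$ to be the union of finitely many of the corresponding $E_m$.  Next, for each $j \geq 1$ the set $A_j$ is a finite union of sets $E_m$ with $m \geq M_{j-1}$, so Step 4 (at stage $j-1$) gives

$|\sigma_{n_i}(A_j)| \leq \sum_{m=M_{j-1}}^\infty |\sigma_{n_i}(E_m)| \leq \varepsilon/100^{j}$ whenever $1 \leq i \leq 2j-1$.

Now fix $k \geq 0$ and write $B_k := A_1 \cup \ldots \cup A_k$.  By countable additivity,

$\sigma_{n_{2k+2}}(A) - \sigma_{n_{2k+1}}(A) = [\sigma_{n_{2k+2}}(B_k) - \sigma_{n_{2k+1}}(B_k)] + \sigma_{n_{2k+2}}(A_{k+1}) + \sum_{j=k+2}^\infty \sigma_{n_{2k+2}}(A_j) - \sum_{j=k+1}^\infty \sigma_{n_{2k+1}}(A_j)$.

The bracketed term has magnitude at most $\varepsilon/5$ by Step 3, the second term has magnitude greater than $\varepsilon/2$ by Step 6, and the two tail sums together have magnitude at most $2 \sum_{j=1}^\infty \varepsilon/100^j = 2\varepsilon/99 \leq \varepsilon/40$ by the preceding display.  By the triangle inequality,

$|\sigma_{n_{2k+2}}(A) - \sigma_{n_{2k+1}}(A)| > \varepsilon/2 - \varepsilon/5 - \varepsilon/40 = 11\varepsilon/40 > \varepsilon/10$.

Since $\sigma_n(A)$ converges as $n \to \infty$, the left-hand side would have to tend to zero as $k \to \infty$, which is the desired contradiction.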

Exercise 11. (Schur’s property for $\ell^1$)  Show that if a sequence in $\ell^1({\Bbb N})$ is convergent in the weak topology, then it is convergent in the strong topology.  $\diamond$

We return now to the map $S: \ell^\infty({\Bbb N}) \to M(X)$, which we assume for contradiction to exist.  Consider the sequence $a_n \in c_0({\Bbb N}) \subset \ell^\infty$ defined by $a_n := (1_{m \leq n})_{m=1}^\infty$, i.e. each $a_n$ is the sequence consisting of n 1's followed by an infinite number of 0's.   As the dual of $c_0({\Bbb N})$ is isomorphic to $\ell^1({\Bbb N})$, we see from the dominated convergence theorem that $a_n$ is a weakly Cauchy sequence in $c_0({\Bbb N})$, in the sense that $\lambda(a_n)$ is Cauchy for any $\lambda \in c_0({\Bbb N})^*$.  Applying S, we conclude that $S(a_n)$ is weakly Cauchy in $M(X)$.  In particular, using the bounded linear functionals $\mu \mapsto \mu(E)$ on M(X), we see that $S(a_n)(E)$ converges to some limit $\mu(E)$ for all measurable sets E.  Applying the Nikodym convergence theorem we see that $\mu$ is also a signed finite measure.  We then see that $S(a_n)$ converges in the weak topology to $\mu$.  (One way to see this is to define $\nu := \sum_{n=1}^\infty 2^{-n} |S(a_n)| + |\mu|$; then $\nu$ is finite and $S(a_n), \mu$ are all absolutely continuous with respect to $\nu$; now use the Radon-Nikodym theorem (see Notes 1) and the fact that $L^1(\nu)^* \equiv L^\infty(\nu)$.)  On the other hand, as $LS=I$ and L and S are both bounded, S is a Banach space isomorphism between $c_0$ and $S(c_0)$.  Thus $S(c_0)$ is complete, hence closed, hence weakly closed (by Hahn-Banach), and so $\mu = S(a)$ for some $a \in c_0$.  By Hahn-Banach again, this implies that $a_n$ converges weakly to $a \in c_0$.  But this is easily seen to be impossible, since the constant sequence $(1)_{m=1}^\infty$ does not lie in $c_0$, and the claim follows.
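
The two "easily seen" steps at the start and end of this argument can be made explicit as follows; here we identify $\lambda \in c_0({\Bbb N})^*$ with a sequence $(\lambda_m)_{m=1}^\infty \in \ell^1({\Bbb N})$, a notation introduced only for this sketch.  The weak Cauchy property comes from the absolute convergence of $\sum_m \lambda_m$:

$\lambda(a_n) = \sum_{m=1}^n \lambda_m \to \sum_{m=1}^\infty \lambda_m$ as $n \to \infty$.

For the final contradiction, suppose $a_n$ converged weakly to some $a = (a(m))_{m=1}^\infty \in c_0$.  Testing against the coordinate functional $b \mapsto b(m)$, which lies in $c_0({\Bbb N})^*$, one would have

$a(m) = \lim_{n \to \infty} a_n(m) = \lim_{n \to \infty} 1_{m \leq n} = 1$ for every m,

so that a would be the constant sequence $(1)_{m=1}^\infty$, which does not tend to zero.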

Now we give the “hard analysis” proof.  Let $e_1, e_2, \ldots$ be the standard basis for $\ell^\infty({\Bbb N})$, let N be a large number, and consider the random sums

$S( \varepsilon_1 e_1 + \ldots + \varepsilon_N e_N )$ (8)

where $\varepsilon_n \in \{-1,1\}$ are iid random signs.  Since the $\ell^\infty$ norm of $\varepsilon_1 e_1 + \ldots + \varepsilon_N e_N$ is 1, we have

$\| S( \varepsilon_1 e_1 + \ldots + \varepsilon_N e_N ) \|_{M(X)} \leq C$ (9)

for some constant C independent of N.  On the other hand, we can write $S(e_n) = f_n \nu$ for some finite measure $\nu$ and some $f_n \in L^1(\nu)$ using the Radon-Nikodym theorem as in the previous proof, and then

$\| \varepsilon_1 f_1 + \ldots + \varepsilon_N f_N \|_{L^1(\nu)} \leq C.$ (10)

Taking expectations and applying Khintchine’s inequality we conclude

$\| (\sum_{n=1}^N |f_n|^2)^{1/2} \|_{L^1(\nu)} \leq C'$ (11)

for some constant C’ independent of N.  By Cauchy-Schwarz this implies that

$\| \sum_{n=1}^N |f_n| \|_{L^1(\nu)} \leq C' \sqrt{N}$ (12)

But as $\|f_n\|_{L^1(\nu)} = \|S(e_n)\|_{M(X)} \geq c$ for some constant c > 0 independent of N (because $LS e_n = e_n$ and L is bounded), the left-hand side of (12) is at least $cN$.  We thus obtain a contradiction for N large enough, and the claim follows.
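
In more detail, the passage from (10) to (11) and (12) can be sketched as follows; the symbol $c_1$ is our notation for the lower constant in Khintchine's inequality.  For each fixed $x \in X$, Khintchine's inequality (in its $L^1$ form) gives

$c_1 (\sum_{n=1}^N |f_n(x)|^2)^{1/2} \leq {\Bbb E} | \sum_{n=1}^N \varepsilon_n f_n(x) |$

for some absolute constant $c_1 > 0$.  Integrating in x against $\nu$, interchanging the integral and the expectation by the Fubini-Tonelli theorem, and then using (10) for each choice of signs, we obtain

$c_1 \| (\sum_{n=1}^N |f_n|^2)^{1/2} \|_{L^1(\nu)} \leq {\Bbb E} \| \sum_{n=1}^N \varepsilon_n f_n \|_{L^1(\nu)} \leq C$,

which is (11) with $C' := C/c_1$.  The Cauchy-Schwarz step is also pointwise:

$\sum_{n=1}^N |f_n(x)| \leq \sqrt{N} (\sum_{n=1}^N |f_n(x)|^2)^{1/2}$,

and integrating this against $\nu$ and applying (11) yields (12).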

Remark 9. The phenomenon of nonlinear quantitative solvability actually comes up in many applications of interest.  For instance, consider the Fefferman-Stein decomposition theorem, which asserts that any $f \in BMO({\Bbb R})$ of bounded mean oscillation can be decomposed as $f = g + Hh$ for some $g, h \in L^\infty({\Bbb R})$, where H is the Hilbert transform.  This theorem was first proven by using the duality of the Hardy space $H^1({\Bbb R})$ and BMO (and by using Exercise 13 from Notes 6), and by using the fact that a function f is in $H^1({\Bbb R})$ if and only if f and Hf both lie in $L^1({\Bbb R})$.  From the open mapping theorem we know that we can pick g, h so that the $L^\infty$ norms of g, h are bounded by a multiple of the BMO norm of f.  But it turns out not to be possible to pick g and h in a bounded linear manner in terms of f, although this is a little tricky to prove.  (Uchiyama famously gave an explicit construction of g, h in terms of f, but the construction was highly nonlinear; see my blog post on the topic.)

An example in a similar spirit was given more recently by Bourgain and Brezis, who considered the problem of solving the equation $\hbox{div} u = f$ on the d-dimensional torus ${\Bbb T}^d$ for some function $f: {\Bbb T}^d \to {\Bbb C}$ on the torus with mean zero, and with some unknown vector field $u: {\Bbb T}^d \to {\Bbb C}^d$, where the derivatives are interpreted in the weak sense.  They showed that if $d \geq 2$ and $f \in L^d({\Bbb T}^d)$, then there existed a solution u to this problem with $u \in W^{1,d} \cap C^0$, despite the failure of Sobolev embedding at this endpoint.  Again, the open mapping theorem allows one to choose u with norm bounded by a multiple of the norm of f, but Bourgain and Brezis also show that one cannot select u in a bounded linear fashion depending on f.  $\diamond$

Question. All of the above constructions of non-complemented closed subspaces, or of linear problems that can only be quantitatively solved nonlinearly, were quite involved.  Is there a “soft” or “elementary” way to see that closed subspaces of Banach spaces exist which are not complemented, or (equivalently) that surjective continuous linear maps between Banach spaces do not always enjoy a continuous linear right-inverse?  I do not have a good answer to this question. $\diamond$

[Update, Feb 4: definition of "residual" corrected.]