
In the previous two quarters, we have been focusing largely on the “soft” side of real analysis, which is primarily concerned with “qualitative” properties such as convergence, compactness, measurability, and so forth. In contrast, we will begin this quarter with more of an emphasis on the “hard” side of real analysis, in which we study estimates and upper and lower bounds of various quantities, such as norms of functions or operators. (Of course, the two sides of analysis are closely connected to each other; an understanding of both sides and their interrelationships is needed in order to get the broadest and most complete perspective for this subject.)

One basic tool in hard analysis is that of interpolation, which allows one to start with a hypothesis of two (or more) “upper bound” estimates, e.g. ${A_0 \leq B_0}$ and ${A_1 \leq B_1}$, and conclude a family of intermediate estimates ${A_\theta \leq B_\theta}$ (or perhaps ${A_\theta \leq C_\theta B_\theta}$, where ${C_\theta}$ is a constant) for any choice of parameter ${0 < \theta < 1}$. Of course, interpolation is not a magic wand; one needs various hypotheses (e.g. linearity, sublinearity, convexity, or complexifiability) on ${A_i, B_i}$ in order for interpolation methods to be applicable. Nevertheless, these techniques are available for many important classes of problems, most notably that of establishing boundedness estimates such as ${\| T f \|_{L^q(Y, \nu)} \leq C \| f \|_{L^p(X, \mu)}}$ for linear (or “linear-like”) operators ${T}$ from one Lebesgue space ${L^p(X,\mu)}$ to another ${L^q(Y,\nu)}$. (Interpolation can also be performed for many normed vector spaces other than the Lebesgue spaces, but we will restrict attention to Lebesgue spaces in these notes to focus the discussion.) Using interpolation, it is possible to reduce the task of proving such estimates to that of proving various “endpoint” versions of these estimates. In some cases, each endpoint only faces a portion of the difficulty that the interpolated estimate did, and so by using interpolation one has split the task of proving the original estimate into two or more simpler subtasks. In other cases, one of the endpoint estimates is very easy, and the other one is significantly more difficult than the original estimate; in such cases interpolation does not really simplify the task of proving estimates, but it at least clarifies the relative difficulty between various estimates in a given family.
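The model example of such a family of estimates is the log-convexity of the Lebesgue norms: ${\|f\|_{L^{p_\theta}} \leq \|f\|_{L^{p_0}}^{1-\theta} \|f\|_{L^{p_1}}^{\theta}}$ whenever ${1/p_\theta = (1-\theta)/p_0 + \theta/p_1}$, a consequence of Hölder's inequality. Here is a quick numerical sanity check of this inequality, as a sketch using counting measure on a finite set (the function values and exponents below are arbitrary choices, not from the notes):

```python
import random

# Log-convexity of L^p norms: ||f||_p <= ||f||_{p0}^{1-theta} * ||f||_{p1}^{theta}
# whenever 1/p = (1-theta)/p0 + theta/p1, checked for a random "function" on a
# finite set equipped with counting measure.
random.seed(0)
f = [random.random() for _ in range(1000)]

def lp_norm(f, p):
    """L^p norm of f with respect to counting measure."""
    return sum(abs(x) ** p for x in f) ** (1.0 / p)

p0, p1 = 1.0, 4.0
for theta in (0.25, 0.5, 0.75):
    # intermediate exponent given by 1/p = (1 - theta)/p0 + theta/p1
    p = 1.0 / ((1 - theta) / p0 + theta / p1)
    lhs = lp_norm(f, p)
    rhs = lp_norm(f, p0) ** (1 - theta) * lp_norm(f, p1) ** theta
    assert lhs <= rhs * (1 + 1e-12)  # log-convexity holds at each theta
```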

As is the case with many other tools in analysis, interpolation is not captured by a single “interpolation theorem”; instead, there is a family of such theorems, which can be broadly divided into two major categories, reflecting the two basic methods that underlie the principle of interpolation. The real interpolation method is based on a divide and conquer strategy: to understand how to obtain control on some expression such as ${\| T f \|_{L^q(Y, \nu)}}$ for some operator ${T}$ and some function ${f}$, one would divide ${f}$ into two or more components, e.g. into components where ${f}$ is large and where ${f}$ is small, or where ${f}$ is oscillating with high frequency or only varying with low frequency. Each component would be estimated using a carefully chosen combination of the extreme estimates available; optimising over these choices and summing up (using whatever linearity-type properties on ${T}$ are available), one would hope to get a good estimate on the original expression. The strengths of the real interpolation method are that the linearity hypotheses on ${T}$ can be relaxed to weaker hypotheses, such as sublinearity or quasilinearity; also, the endpoint estimates are allowed to be of a weaker “type” than the interpolated estimates. On the other hand, the real interpolation method often concedes a multiplicative constant in the final estimates obtained, and one is usually obligated to keep the operator ${T}$ fixed throughout the interpolation process. The proofs of real interpolation theorems are also a little bit messy, though in many cases one can simply invoke a standard instance of such theorems (e.g. the Marcinkiewicz interpolation theorem) as a black box in applications.
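A toy illustration of this divide and conquer strategy is the basic inclusion ${L^{p_0} \cap L^{p_1} \subset L^p}$ for ${p_0 < p < p_1}$, which is proven by exactly this sort of splitting: the portion where ${|f| \leq 1}$ is controlled by the ${L^{p_0}}$ bound, and the portion where ${|f| > 1}$ by the ${L^{p_1}}$ bound. A numerical sketch of this splitting, with counting measure and arbitrary sample data:

```python
import random

# The inclusion L^{p0} \cap L^{p1} \subset L^p for p0 < p < p1, proven by
# splitting f into its "small" part (|f| <= 1, where |f|^p <= |f|^{p0})
# and its "large" part (|f| > 1, where |f|^p <= |f|^{p1}).
random.seed(1)
f = [random.expovariate(1.0) for _ in range(1000)]  # has both large and small values

def lp_p(f, p):
    """The quantity ||f||_{L^p}^p with respect to counting measure."""
    return sum(abs(x) ** p for x in f)

p0, p, p1 = 1.0, 2.0, 4.0
small = [x for x in f if abs(x) <= 1]  # controlled by the L^{p0} bound
large = [x for x in f if abs(x) > 1]   # controlled by the L^{p1} bound

# pointwise domination on each piece sums to the divide-and-conquer bound:
assert lp_p(f, p) <= lp_p(small, p0) + lp_p(large, p1)
```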

The complex interpolation method instead proceeds by exploiting the powerful tools of complex analysis, in particular the maximum modulus principle and its relatives (such as the Phragmén-Lindelöf principle). The idea is to rewrite the estimate to be proven (e.g. ${\| T f \|_{L^q(Y, \nu)} \leq C \| f \|_{L^p(X, \mu)}}$) in such a way that it can be embedded into a family of such estimates which depend holomorphically on a complex parameter ${s}$ in some domain (e.g. the strip ${\{ \sigma+it: t \in {\mathbb R}, \sigma \in [0,1]\}}$). One then exploits things like the maximum modulus principle to bound an estimate corresponding to an interior point of this domain by the estimates on the boundary of this domain. The strengths of the complex interpolation method are that it typically gives cleaner constants than the real interpolation method, and also allows the underlying operator ${T}$ to vary holomorphically with respect to the parameter ${s}$, which can significantly increase the flexibility of the interpolation technique. The proofs of these methods are also very short (if one takes the maximum modulus principle and its relatives as a black box), which makes the method particularly amenable for generalisation to more intricate settings (e.g. multilinear operators, mixed Lebesgue norms, etc.). On the other hand, the somewhat rigid requirement of holomorphicity makes it much more difficult to apply this method to non-linear operators, such as sublinear or quasilinear operators; also, the interpolated estimate tends to be of the same “type” as the extreme ones, so that one does not enjoy the upgrading of weak type estimates to strong type estimates that the real interpolation method typically produces. Also, the complex method runs into some minor technical problems when the target space ${L^q(Y,\nu)}$ ceases to be a Banach space (i.e. when ${q<1}$), as this makes it more difficult to exploit duality.
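The engine driving the complex method is the following classical lemma of Hadamard (the three lines lemma, a relative of the Phragmén-Lindelöf principle mentioned above), recorded here for reference:

```latex
\textbf{Three lines lemma.}\quad
Let $F$ be holomorphic on the strip $\{\sigma+it : 0 < \sigma < 1,\ t \in \mathbb{R}\}$
and continuous and bounded on its closure.  If $|F(it)| \leq B_0$ and
$|F(1+it)| \leq B_1$ for all $t \in \mathbb{R}$, then for every $0 < \theta < 1$,
\[
  \sup_{t \in \mathbb{R}} |F(\theta+it)| \;\leq\; B_0^{1-\theta} B_1^{\theta}.
\]
```

Note how the bound at the interior line ${\sigma = \theta}$ is exactly the geometric interpolant of the two boundary bounds, mirroring the interpolated estimates discussed above.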

Despite these differences, the real and complex methods tend to give broadly similar results in practice, especially if one is willing to ignore constant losses in the estimates or epsilon losses in the exponents.

The theory of both real and complex interpolation can be studied abstractly, in general normed or quasi-normed spaces; see e.g. this book for a detailed treatment. However in these notes we shall focus exclusively on interpolation for Lebesgue spaces ${L^p}$ (and their cousins, such as the weak Lebesgue spaces ${L^{p,\infty}}$ and the Lorentz spaces ${L^{p,r}}$).

A key theme in real analysis is that of studying general functions ${f: X \rightarrow {\bf R}}$ or ${f: X \rightarrow {\bf C}}$ by first approximating them by “simpler” or “nicer” functions. But the precise class of “simple” or “nice” functions may vary from context to context. In measure theory, for instance, it is common to approximate measurable functions by indicator functions or simple functions. But in other parts of analysis, it is often more convenient to approximate rough functions by continuous or smooth functions (perhaps with compact support, or some other decay condition), or by functions in some algebraic class, such as the class of polynomials or trigonometric polynomials.

In order to approximate rough functions by more continuous ones, one of course needs tools that can generate continuous functions with some specified behaviour. The two basic tools for this are Urysohn’s lemma, which approximates indicator functions by continuous functions, and the Tietze extension theorem, which extends continuous functions on a subdomain to continuous functions on a larger domain. An important consequence of these theorems is the Riesz representation theorem for linear functionals on the space of compactly supported continuous functions, which describes such functionals in terms of Radon measures.

Sometimes, approximation by continuous functions is not enough; one must approximate continuous functions in turn by an even smoother class of functions. A useful tool in this regard is the Stone-Weierstrass theorem, that generalises the classical Weierstrass approximation theorem to more general algebras of functions.
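To make the classical Weierstrass approximation theorem concrete, here is a small numerical sketch using Bernstein polynomials, which furnish one standard constructive proof of that theorem (the target function ${|x - 1/2|}$ and the degrees used are arbitrary choices for illustration):

```python
import math

def bernstein(f, n, x):
    """Evaluate the n-th Bernstein polynomial of f on [0,1] at x."""
    return sum(
        f(k / n) * math.comb(n, k) * x**k * (1 - x) ** (n - k)
        for k in range(n + 1)
    )

def f(x):
    return abs(x - 0.5)  # continuous on [0,1], but not smooth

def sup_error(n):
    """Uniform error of the degree-n approximant, sampled on a grid."""
    return max(abs(bernstein(f, n, j / 100) - f(j / 100)) for j in range(101))

# the uniform error shrinks as the degree grows, as Weierstrass guarantees
assert sup_error(200) < sup_error(20) < sup_error(2)
```

(The convergence here is rather slow, roughly ${O(n^{-1/2})}$ at the corner point, which is typical for Bernstein approximation of non-smooth functions.)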

As an application of this theory (and of many of the results accumulated in previous lecture notes), we will present (in an optional section) the commutative Gelfand-Naimark theorem classifying all commutative unital ${C^*}$-algebras.

A normed vector space ${(X, \| \|_X)}$ automatically generates a topology, known as the norm topology or strong topology on ${X}$, generated by the open balls ${B(x,r) := \{ y \in X: \|y-x\|_X < r \}}$. A sequence ${x_n}$ in such a space converges strongly (or converges in norm) to a limit ${x}$ if and only if ${\|x_n-x\|_X \rightarrow 0}$ as ${n \rightarrow \infty}$. This is the topology we have implicitly been using in our previous discussion of normed vector spaces.

However, in some cases it is useful to work in topologies on vector spaces that are weaker than a norm topology. One reason for this is that many important modes of convergence, such as pointwise convergence, convergence in measure, smooth convergence, or convergence on compact subsets, are not captured by a norm topology, and so it is useful to have a more general theory of topological vector spaces that contains these modes. Another reason (of particular importance in PDE) is that the norm topology on infinite-dimensional spaces is so strong that very few sets are compact or pre-compact in these topologies, making it difficult to apply compactness methods in these topologies. Instead, one often first works in a weaker topology, in which compactness is easier to establish, and then somehow upgrades any weakly convergent sequences obtained via compactness to stronger modes of convergence (or alternatively, one abandons strong convergence and exploits the weak convergence directly). Two basic weak topologies for this purpose are the weak topology on a normed vector space ${X}$, and the weak* topology on a dual vector space ${X^*}$. Compactness in the latter topology is usually obtained from the Banach-Alaoglu theorem (and its sequential counterpart), which will be a quick consequence of the Tychonoff theorem (and its sequential counterpart) from the previous lecture.
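A model example to keep in mind: in ${\ell^2}$, the standard basis vectors ${e_n}$ converge to zero in the weak topology (each pairing ${\langle e_n, g \rangle = g_n}$ tends to zero for fixed ${g \in \ell^2}$), but certainly not in the norm topology, since ${\|e_n\| = 1}$ for all ${n}$. A finite-dimensional numerical sketch of this (the truncation dimension and test vector are arbitrary choices):

```python
import math

# The standard basis e_n of ell^2 converges weakly to zero but not strongly.
N = 10_000  # truncation dimension, standing in for ell^2

def e(n):
    """The n-th standard basis vector, truncated to dimension N."""
    return [1.0 if k == n else 0.0 for k in range(N)]

g = [1.0 / (k + 1) for k in range(N)]  # a fixed square-summable test vector

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

# <e_n, g> = g_n -> 0: the pairings decay, i.e. weak convergence to zero ...
assert dot(e(10), g) > dot(e(100), g) > dot(e(1000), g)
# ... but ||e_n - 0|| = 1 for every n: no strong convergence.
assert all(abs(norm(e(n)) - 1.0) < 1e-12 for n in (10, 100, 1000))
```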

The strong and weak topologies on normed vector spaces also have analogues for the space ${B(X \rightarrow Y)}$ of bounded linear operators from ${X}$ to ${Y}$, thus supplementing the operator norm topology on that space with two weaker topologies, which (somewhat confusingly) are named the strong operator topology and the weak operator topology.

One of the most useful concepts for analysis arising from topology and metric spaces is that of compactness; recall that a space ${X}$ is compact if every open cover of ${X}$ has a finite subcover, or equivalently if any collection of closed sets with the finite intersection property (i.e. every finite subcollection of these sets has non-empty intersection) has non-empty intersection. In these notes, we explore how compactness interacts with other key topological concepts: the Hausdorff property, bases and sub-bases, product spaces, and equicontinuity, in particular establishing the useful Tychonoff and Arzelà-Ascoli theorems, which give criteria for compactness (or precompactness).

Exercise 1 (Basic properties of compact sets)

• Show that any finite set is compact.
• Show that any finite union of compact subsets of a topological space is still compact.
• Show that any image of a compact space under a continuous map is still compact.

Show that these three statements continue to hold if “compact” is replaced by “sequentially compact”.

The notion of what it means for a subset E of a space X to be “small” varies from context to context.  For instance, in measure theory, when $X = (X, {\mathcal X}, \mu)$ is a measure space, one useful notion of a “small” set is that of a null set: a set E of measure zero (or at least contained in a set of measure zero).  By countable additivity, countable unions of null sets are null.  Taking contrapositives, we obtain

Lemma 1. (Pigeonhole principle for measure spaces) Let $E_1, E_2, \ldots$ be an at most countable sequence of measurable subsets of a measure space X.  If $\bigcup_n E_n$ has positive measure, then at least one of the $E_n$ has positive measure.

Now suppose that X is a Euclidean space ${\Bbb R}^d$ with Lebesgue measure m.  The Lebesgue differentiation theorem easily implies that having positive measure is equivalent to being “dense” in certain balls:

Proposition 1. Let $E$ be a measurable subset of ${\Bbb R}^d$.  Then the following are equivalent:

1. E has positive measure.
2. For any $\varepsilon > 0$, there exists a ball B such that $m( E \cap B ) \geq (1-\varepsilon) m(B)$.

Thus one can think of a null set as a set which is “nowhere dense” in some measure-theoretic sense.

It turns out that there are analogues of these results when the measure space $X = (X, {\mathcal X}, \mu)$  is replaced instead by a complete metric space $X = (X,d)$.  Here, the appropriate notion of a “small” set is not a null set, but rather that of a nowhere dense set: a set E which is not dense in any ball, or equivalently a set whose closure has empty interior.  (A good example of a nowhere dense set would be a proper subspace, or smooth submanifold, of ${\Bbb R}^d$, or a Cantor set; on the other hand, the rationals are a dense subset of ${\Bbb R}$ and thus clearly not nowhere dense.)   We then have the following important result:

Theorem 1. (Baire category theorem). Let $E_1, E_2, \ldots$ be an at most countable sequence of subsets of a complete metric space X.  If $\bigcup_n E_n$ contains a ball B, then at least one of the $E_n$ is dense in a sub-ball B’ of B (and in particular is not nowhere dense).  To put it in the contrapositive: the countable union of nowhere dense sets cannot contain a ball.

Exercise 1. Show that the Baire category theorem is equivalent to the claim that in a complete metric space, the countable intersection of open dense sets remains dense.  $\diamond$

Exercise 2. Using the Baire category theorem, show that any non-empty complete metric space without isolated points is uncountable.  (In particular, this shows that the Baire category theorem can fail for incomplete metric spaces such as the rationals ${\Bbb Q}$.)  $\diamond$

To quickly illustrate an application of the Baire category theorem, observe that it implies that one cannot cover a finite-dimensional real or complex vector space ${\Bbb R}^n, {\Bbb C}^n$ by a countable number of proper subspaces.  One can of course also establish this fact by using Lebesgue measure on this space.  However, the advantage of the Baire category approach is that it also works well in infinite dimensional complete normed vector spaces, i.e. Banach spaces, whereas the measure-theoretic approach runs into significant difficulties in infinite dimensions.  This leads to three fundamental equivalences between the qualitative theory of continuous linear operators on Banach spaces (e.g. finiteness, surjectivity, etc.) and the quantitative theory (i.e. estimates):

1. The uniform boundedness principle, that equates the qualitative boundedness (or convergence) of a family of continuous operators with their quantitative boundedness.
2. The open mapping theorem, that equates the qualitative solvability of a linear problem Lu = f with the quantitative solvability.
3. The closed graph theorem, that equates the qualitative regularity of a (weakly continuous) operator T with the quantitative regularity of that operator.

Strictly speaking, these theorems are not used much directly in practice, because one usually works in the reverse direction (i.e. first proving quantitative bounds, and then deriving qualitative corollaries); but the above three theorems help explain why we usually approach qualitative problems in functional analysis via their quantitative counterparts.

When studying a mathematical space X (e.g. a vector space, a topological space, a manifold, a group, an algebraic variety, etc.), there are two fundamental ways to try to understand the space:

1. By looking at subobjects in X, or more generally maps $f: Y \to X$ from some other space Y into X.  For instance, a point in a space X can be viewed as a map from $pt$ to X; a curve in a space X could be thought of as a map from $[0,1]$ to X; a group G can be studied via its subgroups K, and so forth.
2. By looking at objects on X, or more precisely maps $f: X \to Y$ from X into some other space Y.  For instance, one can study a topological space X via the real- or complex-valued continuous functions $f \in C(X)$ on X; one can study a group G via its quotient groups $\pi: G \to G/H$; one can study an algebraic variety V by studying the polynomials on V (and in particular, the ideal of polynomials that vanish identically on V); and so forth.

(There are also more sophisticated ways to study an object via its maps, e.g. by studying extensions, joinings, splittings, universal lifts, etc.  The general study of objects via the maps between them is formalised abstractly in modern mathematics as category theory, and is also closely related to homological algebra.)

A remarkable phenomenon in many areas of mathematics is that of (contravariant) duality: that the maps into and out of one type of mathematical object X can be naturally associated to the maps out of and into a dual object $X^*$ (note the reversal of arrows here!).  In some cases, the dual object $X^*$ looks quite different from the original object X.  (For instance, in Stone duality, discussed in Notes 4, X would be a Boolean algebra (or some other partially ordered set) and $X^*$ would be a compact totally disconnected Hausdorff space (or some other topological space).)   In other cases, most notably with Hilbert spaces as discussed in Notes 5, the dual object $X^*$ is essentially identical to X itself.

In these notes we discuss a third important case of duality, namely duality of normed vector spaces, which is intermediate in nature between the previous two examples: the dual $X^*$ of a normed vector space turns out to be another normed vector space, but generally one which is not equivalent to X itself (except in the important special case when X is a Hilbert space, as mentioned above).  On the other hand, the double dual $(X^*)^*$ turns out to be closely related to X, and in several (but not all) important cases, is essentially identical to X.  One of the most important uses of dual spaces in functional analysis is that they allow one to define the transpose $T^*: Y^* \to X^*$ of a continuous linear operator $T: X \to Y$.
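In finite dimensions the transpose is just the familiar matrix transpose, characterised by the identity $\langle Tx, y \rangle = \langle x, T^* y \rangle$. A quick numerical sketch of this identity (the random matrix and vectors below are arbitrary choices):

```python
import math
import random

# The transpose (adjoint) in finite dimensions: for a real matrix T,
# the defining identity is <Tx, y> = <x, T^t y>.
random.seed(3)
m, n = 3, 4
T = [[random.gauss(0, 1) for _ in range(n)] for _ in range(m)]  # T: R^n -> R^m
x = [random.gauss(0, 1) for _ in range(n)]
y = [random.gauss(0, 1) for _ in range(m)]

def apply(M, v):
    """Matrix-vector product."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in M]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

Tt = [[T[i][j] for i in range(m)] for j in range(n)]  # the transpose T^t: R^m -> R^n

# <Tx, y> agrees with <x, T^t y> up to floating-point error
assert math.isclose(dot(apply(T, x), y), dot(x, apply(Tt, y)),
                    rel_tol=1e-9, abs_tol=1e-9)
```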

A fundamental tool in understanding duality of normed vector spaces will be the Hahn-Banach theorem, which is an indispensable tool for exploring the dual of a vector space.  (Indeed, without this theorem, it is not clear at all that the dual of a non-trivial normed vector space is non-trivial!)  Thus, we shall study this theorem in detail in these notes concurrently with our discussion of duality.

In the next few lectures, we will be studying four major classes of function spaces. In decreasing order of generality, these classes are the topological vector spaces, the normed vector spaces, the Banach spaces, and the Hilbert spaces. In order to motivate the discussion of the more general classes of spaces, we will first focus on the most special class – that of (real and complex) Hilbert spaces. These spaces can be viewed as generalisations of (real and complex) Euclidean spaces such as ${\Bbb R}^n$ and ${\Bbb C}^n$ to infinite-dimensional settings, and indeed much of one’s Euclidean geometry intuition concerning lengths, angles, orthogonality, subspaces, etc. will transfer readily to arbitrary Hilbert spaces; in contrast, this intuition is not always accurate in the more general vector spaces mentioned above. In addition to Euclidean spaces, another fundamental example of Hilbert spaces comes from the Lebesgue spaces $L^2(X,{\mathcal X},\mu)$ of a measure space $(X,{\mathcal X},\mu)$. (There are of course many other Hilbert spaces of importance in complex analysis, harmonic analysis, and PDE, such as Hardy spaces ${\mathcal H}^2$, Sobolev spaces $H^s = W^{s,2}$, and the space $HS$ of Hilbert-Schmidt operators, but we will not discuss those spaces much in this course.  Complex Hilbert spaces also play a fundamental role in the foundations of quantum mechanics, being the natural space to hold all the possible states of a quantum system (possibly after projectivising the Hilbert space), but we will not discuss this subject here.)

Hilbert spaces are the natural abstract framework in which to study two important (and closely related) concepts: orthogonality and unitarity, allowing us to generalise familiar concepts and facts from Euclidean geometry such as the Cartesian coordinate system, rotations and reflections, and the Pythagorean theorem to Hilbert spaces. (For instance, the Fourier transform is a unitary transformation and can thus be viewed as a kind of generalised rotation.) Furthermore, the Hodge duality on Euclidean spaces has a partial analogue for Hilbert spaces, namely the Riesz representation theorem for Hilbert spaces, which makes the theory of duality and adjoints for Hilbert spaces especially simple (when compared with the more subtle theory of duality for, say, Banach spaces). Much later (next quarter, in fact), we will see that this duality allows us to extend the spectral theorem for self-adjoint matrices to that of self-adjoint operators on a Hilbert space.
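Two of the fundamental facts invoked above, the Cauchy-Schwarz inequality and the parallelogram law (the latter being the identity that characterises which norms come from inner products), can be checked numerically in the Euclidean model case ${\Bbb R}^n$ (a sketch; the dimension and random vectors are arbitrary choices):

```python
import math
import random

# Numerical check of the Cauchy-Schwarz inequality and the parallelogram
# law for the Euclidean inner product on R^n.
random.seed(2)
n = 50
f = [random.gauss(0, 1) for _ in range(n)]
g = [random.gauss(0, 1) for _ in range(n)]

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(inner(u, u))

# Cauchy-Schwarz: |<f, g>| <= ||f|| ||g||
assert abs(inner(f, g)) <= norm(f) * norm(g)

# parallelogram law: ||f+g||^2 + ||f-g||^2 = 2||f||^2 + 2||g||^2
lhs = norm([a + b for a, b in zip(f, g)]) ** 2 + norm([a - b for a, b in zip(f, g)]) ** 2
rhs = 2 * norm(f) ** 2 + 2 * norm(g) ** 2
assert math.isclose(lhs, rhs, rel_tol=1e-9)
```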

These notes are only the most basic introduction to the theory of Hilbert spaces.  In particular, the theory of linear transformations between two Hilbert spaces, which is perhaps the most important aspect of the subject, is not covered much at all here (but I hope to discuss it further in future lectures.)

Now that we have reviewed the foundations of measure theory, let us now put it to work to set up the basic theory of one of the fundamental families of function spaces in analysis, namely the $L^p$ spaces (also known as Lebesgue spaces). These spaces serve as important model examples for the general theory of topological and normed vector spaces, which we will discuss a little bit in this lecture and then in much greater detail in later lectures. (See also my previous blog post on function spaces.)

Just as scalar quantities live in the space of real or complex numbers, and vector quantities live in vector spaces, functions $f: X \to {\Bbb C}$ (or other objects closely related to functions, such as measures) live in function spaces. Like other spaces in mathematics (e.g. vector spaces, metric spaces, topological spaces, etc.), a function space $V$ is not just a mere set of objects (in this case, the objects are functions); it also comes with various important structures that allow one to do some useful operations inside these spaces, and from one space to another. For example, function spaces tend to have several (though usually not all) of the following types of structures, which are usually related to each other by various compatibility conditions:

1. Vector space structure. One can often add two functions $f, g$ in a function space $V$, and expect to get another function $f+g$ in that space $V$; similarly, one can multiply a function $f$ in $V$ by a scalar $c$ and get another function $cf$ in $V$. Usually, these operations obey the axioms of a vector space, though it is important to caution that the dimension of a function space is typically infinite. (In some cases, the space of scalars is a more complicated ring than the real or complex field, in which case we need the notion of a module rather than a vector space, but we will not use this more general notion in this course.) Virtually all of the function spaces we shall encounter in this course will be vector spaces. Because the field of scalars is real or complex, vector spaces also come with the notion of convexity, which turns out to be crucial in many aspects of analysis. As a consequence (and in marked contrast to algebra or number theory), much of the theory in real analysis does not seem to extend to other fields of scalars (in particular, real analysis fails spectacularly in the finite characteristic setting).
2. Algebra structure. Sometimes (though not always), we also wish to multiply two functions $f$, $g$ in $V$ and get another function $fg$ in $V$; when combined with the vector space structure and assuming some compatibility conditions (e.g. the distributive law), this makes $V$ an algebra. This multiplication operation is often just pointwise multiplication, but there are other important multiplication operations on function spaces too, such as convolution. (One sometimes sees other algebraic structures than multiplication appear in function spaces, most notably derivations, but again we will not encounter those in this course. Another common algebraic operation for function spaces is conjugation or adjoint, leading to the notion of a *-algebra.)
3. Norm structure. We often want to distinguish “large” functions in $V$ from “small” ones, especially in analysis, in which “small” terms in an expression are routinely discarded or deemed to be acceptable errors. One way to do this is to assign a magnitude or norm $\|f\|_V$ to each function that measures its size. Unlike the situation with scalars, where there is basically a single notion of magnitude, functions have a wide variety of useful notions of size, each measuring a different aspect (or combination of aspects) of the function, such as height, width, oscillation, regularity, decay, and so forth. Typically, each such norm gives rise to a separate function space (although sometimes it is useful to consider a single function space with multiple norms on it). We usually require the norm to be compatible with the vector space structure (and algebra structure, if present), for instance by demanding that the triangle inequality hold.
4. Metric structure. We also want to tell whether two functions f, g in a function space V are “near together” or “far apart”. A typical way to do this is to impose a metric $d: V \times V \to {\Bbb R}^+$ on the space $V$. If both a norm $\| \|_V$ and a vector space structure are available, there is an obvious way to do this: define the distance between two functions $f, g$ in $V$ to be $d( f, g ) := \|f-g\|_V$. (This will be the only type of metric on function spaces encountered in this course. But there are some nonlinear function spaces of importance in nonlinear analysis (e.g. spaces of maps from one manifold to another) which have no vector space structure or norm, but still have a metric.) It is often important to know if the vector space is complete with respect to the given metric; this allows one to take limits of Cauchy sequences, and (with a norm and vector space structure) sum absolutely convergent series, as well as use some useful results from point set topology such as the Baire category theorem. All of these operations are of course vital in analysis. [Compactness would be an even better property than completeness to have, but function spaces unfortunately tend to be non-compact in various rather nasty ways, although there are useful partial substitutes for compactness that are available, see e.g. this blog post of mine.]
5. Topological structure. It is often important to know when a sequence (or, occasionally, nets) of functions $f_n$ in $V$ “converges” in some sense to a limit $f$ (which, hopefully, is still in $V$); there are often many distinct modes of convergence (e.g. pointwise convergence, uniform convergence, etc.) that one wishes to carefully distinguish from each other. Also, in order to apply various powerful topological theorems (or to justify various formal operations involving limits, suprema, etc.), it is important to know when certain subsets of $V$ enjoy key topological properties (most notably compactness and connectedness), and to know which operations on $V$ are continuous. For all of this, one needs a topology on $V$. If one already has a metric, then one of course has a topology generated by the open balls of that metric; but there are many important topologies on function spaces in analysis that do not arise from metrics. We also often require the topology to be compatible with the other structures on the function space; for instance, we usually require the vector space operations of addition and scalar multiplication to be continuous. In some cases, the topology on $V$ extends to some natural superspace $W$ of more general functions that contain $V$; in such cases, it is often important to know whether $V$ is closed in $W$, so that limits of sequences in $V$ stay in $V$.
6. Functional structures. Since numbers are easier to understand and deal with than functions, it is not surprising that we often study functions f in a function space V by first applying some functional $\lambda: V \to {\Bbb C}$ to f to identify some key numerical quantity $\lambda(f)$ associated to f. Norms $f \mapsto \|f\|_V$ are of course one important example of a functional; integration $f \mapsto \int_X f\ d\mu$ provides another; and evaluation $f \mapsto f(x)$ at a point x provides a third important class. (Note, though, that while evaluation is the fundamental feature of a function in set theory, it is often a quite minor operation in analysis; indeed, in many function spaces, evaluation is not even defined at all, for instance because the functions in the space are only defined almost everywhere!) An inner product $\langle,\rangle$ on $V$ (see below) also provides a large family $f \mapsto \langle f, g \rangle$ of useful functionals. It is of particular interest to study functionals that are compatible with the vector space structure (i.e. are linear) and with the topological structure (i.e. are continuous); this will give rise to the important notion of duality on function spaces.
7. Inner product structure. One often would like to pair a function f in a function space V with another object g (which is often, though not always, another function in the same function space V) and obtain a number $\langle f, g \rangle$, that typically measures the amount of “interaction” or “correlation” between f and g. Typical examples include inner products arising from integration, such as $\langle f, g\rangle := \int_X f \overline{g}\ d\mu$; integration itself can also be viewed as a pairing, $\langle f, \mu \rangle := \int_X f\ d\mu$. Of course, we usually require such inner products to be compatible with the other structures present on the space (e.g., to be compatible with the vector space structure, we usually require the inner product to be bilinear or sesquilinear). Inner products, when available, are incredibly useful in understanding the metric and norm geometry of a space, due to such fundamental facts as the Cauchy-Schwarz inequality and the parallelogram law. They also give rise to the important notion of orthogonality between functions.
8. Group actions. We often expect our function spaces to enjoy various symmetries; we might wish to rotate, reflect, translate, modulate, or dilate our functions and expect to preserve most of the structure of the space when doing so. In modern mathematics, symmetries are usually encoded by group actions (or actions of other group-like objects, such as semigroups or groupoids; one also often upgrades groups to more structured objects such as Lie groups). As usual, we typically require the group action to preserve the other structures present on the space, e.g. one often restricts attention to group actions that are linear (to preserve the vector space structure), continuous (to preserve topological structure), unitary (to preserve inner product structure), isometric (to preserve metric structure), and so forth. Besides giving us useful symmetries to spend, the presence of such group actions allows one to apply the powerful techniques of representation theory, Fourier analysis, and ergodic theory. However, as this is a foundational real analysis class, we will not discuss these important topics much here (and in fact will not deal with group actions much at all).
9. Order structure. In some cases, we want to utilise the notion of a function f being “non-negative”, or “dominating” another function g. One might also want to take the “max” or “supremum” of two or more functions in a function space V, or split a function into “positive” and “negative” components. Such order structures interact with the other structures on a space in many useful ways (e.g. via the Stone-Weierstrass theorem). Much like convexity, order structure is specific to the real line and is another reason why much of real analysis breaks down over other fields. (The complex plane is of course an extension of the real line and so is able to exploit the order structure of that line, usually by treating the real and imaginary components separately.)
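The decomposition into positive and negative components, and the pointwise supremum, can be sketched concretely for functions on a finite set (the helper names below are illustrative): one has $f = f_+ - f_-$ and $|f| = f_+ + f_-$ pointwise, with both $f_+$ and $f_-$ non-negative.

```python
# Pointwise order structure on real-valued functions:
# positive part f_+ = max(f, 0), negative part f_- = max(-f, 0),
# and the pointwise supremum of two functions.

def pos_part(f):
    return [max(v, 0.0) for v in f]

def neg_part(f):
    return [max(-v, 0.0) for v in f]

def sup(f, g):
    return [max(a, b) for a, b in zip(f, g)]

f = [3.0, -1.5, 0.0, -2.0]
fp, fm = pos_part(f), neg_part(f)

# f = f_+ - f_-, and |f| = f_+ + f_-
assert [a - b for a, b in zip(fp, fm)] == f
assert [a + b for a, b in zip(fp, fm)] == [abs(v) for v in f]

# sup(f, g) dominates both f and g pointwise
g = [1.0, 2.0, -1.0, -3.0]
h = sup(f, g)
assert all(hv >= fv and hv >= gv for hv, fv, gv in zip(h, f, g))
```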

There are of course many ways to combine various flavours of these structures together, and there are entire subfields of mathematics that are devoted to studying particularly common and useful categories of such combinations (e.g. topological vector spaces, normed vector spaces, Banach spaces, Banach algebras, von Neumann algebras, C^* algebras, Fréchet spaces, Hilbert spaces, group algebras, etc.). The study of these sorts of spaces is known collectively as functional analysis. We will study some (but certainly not all) of these combinations in an abstract and general setting later in this course, but to begin with we will focus on the $L^p$ spaces, which are very good model examples for many of the above general classes of spaces, and also of importance in many applications of analysis (such as probability or PDE).

From Tim Gowers, I hear the good news that the editing process of the Princeton Companion to Mathematics is finally nearing completion. It therefore seems like a good time to resume my own series of Companion articles, while there is still time to correct any errors.

I’ll start today with my article on “Function spaces“. Just as the analysis of numerical quantities relies heavily on the concept of magnitude or absolute value to measure the size of such quantities, or the extent to which two such quantities are close to each other, the analysis of functions relies on the concept of a norm to measure various “sizes” of such functions, as well as the extent to which two functions resemble each other. But while numbers mainly have just one notion of magnitude (not counting the p-adic valuations, which are of importance in number theory), functions have a wide variety of such magnitudes, such as “height” ($L^\infty$ or $C^0$ norm), “mass” ($L^1$ norm), “mean square” or “energy” ($L^2$ or $H^1$ norms), “slope” (Lipschitz or $C^1$ norms), and so forth. In modern mathematics, we use the framework of function spaces to understand the properties of functions and their magnitudes; they provide a precise and rigorous way to formalise such “fuzzy” notions as a function being tall, thin, flat, smooth, oscillating, etc. In this article I focus primarily on the analytic aspects of these function spaces (inequalities, interpolation, etc.), leaving aside the algebraic aspects and the connections with mathematical physics.
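These various magnitudes can be made concrete by discretising: for a function sampled at points $x_i = ih$ on $[0,1]$, the norms above become Riemann-sum approximations. The sketch below (an illustrative choice of discretisation, not from the article) computes the “height”, “mass”, “mean square”, and “slope” of $\sin(2\pi x)$, which are $1$, $2/\pi$, $1/\sqrt{2}$, and $2\pi$ respectively.

```python
import math

# Riemann-sum approximations of the norms of a function sampled
# at x_i = i*h on [0, 1].

def linf_norm(f):                  # "height": largest absolute value
    return max(abs(v) for v in f)

def l1_norm(f, h):                 # "mass": integral of |f|
    return h * sum(abs(v) for v in f)

def l2_norm(f, h):                 # root "energy": sqrt of integral of f^2
    return math.sqrt(h * sum(v * v for v in f))

def lipschitz_seminorm(f, h):      # "slope": largest difference quotient
    return max(abs(b - a) / h for a, b in zip(f, f[1:]))

n = 1000
h = 1.0 / n
f = [math.sin(2 * math.pi * i * h) for i in range(n + 1)]

# Up to discretisation error: height 1, mass 2/pi ~ 0.6366,
# root mean square 1/sqrt(2) ~ 0.7071, slope 2*pi ~ 6.2832.
assert abs(linf_norm(f) - 1.0) < 1e-2
assert abs(l1_norm(f, h) - 2 / math.pi) < 1e-2
assert abs(l2_norm(f, h) - 1 / math.sqrt(2)) < 1e-2
assert abs(lipschitz_seminorm(f, h) - 2 * math.pi) < 1e-2
```

Note how the four norms see genuinely different features of the same function: a tall thin spike has large height but small mass, while a rapid oscillation has moderate height and mass but enormous slope.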

The Companion has several short articles describing specific landmark achievements in mathematics. For instance, here is Peter Cameron‘s short article on “Gödel’s theorem“, on what is arguably one of the most popularised (and most misunderstood) theorems in all of mathematics.

In the previous lecture, we studied the recurrence properties of compact systems, which are systems in which all measurable functions exhibit almost periodicity – they almost return completely to themselves after repeated shifting. Now, we consider the opposite extreme of mixing systems – those in which all measurable functions (of mean zero) exhibit mixing – they become orthogonal to themselves after repeated shifting. (Actually, there are two different types of mixing, strong mixing and weak mixing, depending on whether the orthogonality occurs individually or on the average; it is the latter concept which is of more importance to the task of establishing the Furstenberg recurrence theorem.)
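In terms of the inner product $\langle f, g\rangle := \int_X f \overline{g}\ d\mu$, the two notions of mixing can be written (in one standard formulation, for $f$ of mean zero) as follows; the distinction between individual and averaged orthogonality is visible in the placement of the limit:

```latex
\text{(strong mixing)} \qquad \lim_{n \to \infty} \langle T^n f, f \rangle = 0;
```
```latex
\text{(weak mixing)} \qquad \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1}
\left| \langle T^n f, f \rangle \right| = 0.
```

Strong mixing clearly implies weak mixing, but not conversely: the averaged condition tolerates rare times $n$ at which the correlation $\langle T^n f, f\rangle$ is large.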

We shall see that for weakly mixing systems, averages such as $\frac{1}{N} \sum_{n=0}^{N-1} T^n f \ldots T^{(k-1)n} f$ can be computed very explicitly (in fact, this average converges to the constant $(\int_X f\ d\mu)^{k-1}$). More generally, we shall see that weakly mixing components of a system tend to average themselves out and thus become irrelevant when studying many types of ergodic averages. Our main tool here will be the humble Cauchy-Schwarz inequality, and in particular a certain consequence of it, known as the van der Corput lemma.
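For reference, here is one common Hilbert-space formulation of the van der Corput lemma (the precise version used in the lecture may differ slightly in normalisation):

```latex
\textbf{Lemma (van der Corput).} \textit{ Let } v_1, v_2, v_3, \ldots
\textit{ be a bounded sequence in a Hilbert space. If}
\[
\lim_{H \to \infty} \frac{1}{H} \sum_{h=1}^{H} \limsup_{N \to \infty}
\left| \frac{1}{N} \sum_{n=1}^{N} \langle v_{n+h}, v_n \rangle \right| = 0,
\]
\textit{then } \frac{1}{N} \sum_{n=1}^{N} v_n \to 0 \textit{ in norm as } N \to \infty.
\]
```

The point is that the lemma converts information about shifted correlations $\langle v_{n+h}, v_n\rangle$ (which for $v_n = T^n f$ is exactly the kind of information weak mixing provides) into the vanishing of the ergodic average itself.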

As one application of this theory, we will be able to establish Roth’s theorem (the k=3 case of Szemerédi’s theorem).