
Let ${L: H \rightarrow H}$ be a self-adjoint operator on a finite-dimensional Hilbert space ${H}$. The behaviour of this operator can be completely described by the spectral theorem for finite-dimensional self-adjoint operators (i.e. Hermitian matrices, when viewed in coordinates), which provides a sequence ${\lambda_1,\ldots,\lambda_n \in {\bf R}}$ of eigenvalues and an orthonormal basis ${e_1,\ldots,e_n}$ of eigenfunctions such that ${L e_i = \lambda_i e_i}$ for all ${i=1,\ldots,n}$. In particular, given any function ${m: \sigma(L) \rightarrow {\bf C}}$ on the spectrum ${\sigma(L) := \{ \lambda_1,\ldots,\lambda_n\}}$ of ${L}$, one can then define the linear operator ${m(L): H \rightarrow H}$ by the formula

$\displaystyle m(L) e_i := m(\lambda_i) e_i,$

which then gives a functional calculus, in the sense that the map ${m \mapsto m(L)}$ is a ${C^*}$-algebra isometric homomorphism from the algebra ${BC(\sigma(L) \rightarrow {\bf C})}$ of bounded continuous functions from ${\sigma(L)}$ to ${{\bf C}}$, to the algebra ${B(H \rightarrow H)}$ of bounded linear operators on ${H}$. Thus, for instance, one can define heat operators ${e^{-tL}}$ for ${t>0}$, Schrödinger operators ${e^{itL}}$ for ${t \in {\bf R}}$, resolvents ${\frac{1}{L-z}}$ for ${z \not \in \sigma(L)}$, and (if ${L}$ is positive) wave operators ${e^{it\sqrt{L}}}$ for ${t \in {\bf R}}$. These will be bounded operators (and, in the case of the Schrödinger and wave operators, unitary operators, and in the case of the heat operators with ${L}$ positive, they will be contractions). Among other things, this functional calculus can then be used to solve differential equations such as the heat equation

$\displaystyle u_t + Lu = 0; \quad u(0) = f \ \ \ \ \ (1)$

the Schrödinger equation

$\displaystyle u_t + iLu = 0; \quad u(0) = f \ \ \ \ \ (2)$

the wave equation

$\displaystyle u_{tt} + Lu = 0; \quad u(0) = f; \quad u_t(0) = g \ \ \ \ \ (3)$

or the Helmholtz equation

$\displaystyle (L-z) u = f. \ \ \ \ \ (4)$
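In the finite-dimensional setting all of this can be checked by direct computation. The following sketch (Python; the ${2 \times 2}$ eigenbasis, eigenvalues, and time ${t}$ are arbitrary illustrative choices) builds ${m(L)}$ from the spectral data and confirms that the heat operator ${e^{-tL}}$ obtained from the functional calculus agrees with the usual Taylor series for the matrix exponential.

```python
import math

# A 2x2 real symmetric L, assembled from a chosen orthonormal eigenbasis
# e_1, e_2 and eigenvalues lam_1, lam_2 (hypothetical numbers picked
# purely for illustration).
theta = 0.7
lam = [1.5, -0.25]
e = [(math.cos(theta), math.sin(theta)),
     (-math.sin(theta), math.cos(theta))]

def outer(u, v):
    return [[u[i] * v[j] for j in range(2)] for i in range(2)]

def mat_add(A, B):
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

def mat_scale(c, A):
    return [[c * A[i][j] for j in range(2)] for i in range(2)]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

L = mat_add(mat_scale(lam[0], outer(e[0], e[0])),
            mat_scale(lam[1], outer(e[1], e[1])))

def m_of_L(m):
    """Functional calculus: m(L) = sum_i m(lambda_i) e_i e_i^T."""
    return mat_add(mat_scale(m(lam[0]), outer(e[0], e[0])),
                   mat_scale(m(lam[1]), outer(e[1], e[1])))

# Heat operator e^{-tL} via the functional calculus...
t = 0.3
heat = m_of_L(lambda x: math.exp(-t * x))

# ...and, independently, via the Taylor series sum_k (-tL)^k / k!.
series = [[1.0, 0.0], [0.0, 1.0]]
term = [[1.0, 0.0], [0.0, 1.0]]
for k in range(1, 30):
    term = mat_scale(-t / k, mat_mul(term, L))
    series = mat_add(series, term)

err = max(abs(heat[i][j] - series[i][j]) for i in range(2) for j in range(2))
print(err)  # agreement to near machine precision
```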

The functional calculus can also be associated to a spectral measure. Indeed, for any vectors ${f, g \in H}$, there is a complex measure ${\mu_{f,g}}$ on ${\sigma(L)}$ with the property that

$\displaystyle \langle m(L) f, g \rangle_H = \int_{\sigma(L)} m(x) d\mu_{f,g}(x);$

indeed, one can set ${\mu_{f,g}}$ to be the discrete measure on ${\sigma(L)}$ defined by the formula

$\displaystyle \mu_{f,g}(E) := \sum_{i: \lambda_i \in E} \langle f, e_i \rangle_H \langle e_i, g \rangle_H.$

One can also view this complex measure as a coefficient

$\displaystyle \mu_{f,g} = \langle \mu f, g \rangle_H$

of a projection-valued measure ${\mu}$ on ${\sigma(L)}$, defined by setting

$\displaystyle \mu(E) f := \sum_{i: \lambda_i \in E} \langle f, e_i \rangle_H e_i.$

Finally, one can view ${L}$ as unitarily equivalent to a multiplication operator ${M: f \mapsto g f}$ on ${\ell^2(\{1,\ldots,n\})}$, where ${g}$ is the real-valued function ${g(i) := \lambda_i}$, and the intertwining map ${U: \ell^2(\{1,\ldots,n\}) \rightarrow H}$ is given by

$\displaystyle U ( (c_i)_{i=1}^n ) := \sum_{i=1}^n c_i e_i,$

so that ${L = U M U^{-1}}$.
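These identities are easy to verify numerically in a small example. The sketch below (Python; the matrix ${L}$, with entries ${2}$ on the diagonal and ${1}$ off it, and the test vectors are arbitrary choices) checks both the factorisation ${L = UMU^{-1}}$ and the spectral measure identity ${\langle m(L) f, g \rangle_H = \sum_i m(\lambda_i) \langle f, e_i \rangle_H \langle e_i, g \rangle_H}$, taking ${m}$ to be a resolvent.

```python
import math

# Explicit 2x2 symmetric matrix with known eigenpairs: L = [[2,1],[1,2]]
# has eigenvalues 3 and 1 with orthonormal eigenvectors (1,1)/sqrt(2)
# and (1,-1)/sqrt(2).
L = [[2.0, 1.0], [1.0, 2.0]]
lam = [3.0, 1.0]
s = 1 / math.sqrt(2)
e = [(s, s), (s, -s)]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def apply(A, v):
    return tuple(sum(A[i][j] * v[j] for j in range(2)) for i in range(2))

f, g = (0.8, -0.3), (0.5, 1.1)

# Diagonalisation L = U M U^{-1}: U^{-1} f lists the coefficients
# c_i = <f, e_i>, M multiplies coefficient i by lam_i, and U sums
# the coefficients back up against the eigenbasis.
c = [dot(f, e[i]) for i in range(2)]            # U^{-1} f
Mc = [lam[i] * c[i] for i in range(2)]          # M (U^{-1} f)
UMUinv_f = tuple(Mc[0] * e[0][k] + Mc[1] * e[1][k] for k in range(2))
direct_Lf = apply(L, f)
assert all(abs(a - b) < 1e-12 for a, b in zip(UMUinv_f, direct_Lf))

# Spectral measure: <m(L) f, g> = sum_i m(lam_i) <f,e_i><e_i,g>, here
# with m(x) = 1/(x - z) (a resolvent) at the hypothetical point z = -2.
z = -2.0
m = lambda x: 1 / (x - z)
lhs = sum(m(lam[i]) * dot(f, e[i]) * dot(e[i], g) for i in range(2))

# Independent check: solve (L - z) u = f by the 2x2 inverse, then <u, g>.
a, b = L[0][0] - z, L[0][1]
det = a * a - b * b
u = ((a * f[0] - b * f[1]) / det, (-b * f[0] + a * f[1]) / det)
rhs = dot(u, g)
print(abs(lhs - rhs))  # ~ 0
```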

It is an important fact in analysis that many of these above assertions extend to operators on an infinite-dimensional Hilbert space ${H}$, so long as one is careful about what “self-adjoint operator” means; these facts are collectively referred to as the spectral theorem. For instance, it turns out that most of the above claims have analogues for bounded self-adjoint operators ${L: H \rightarrow H}$. However, in the theory of partial differential equations, one often needs to apply the spectral theorem to unbounded, densely defined linear operators ${L: D \rightarrow H}$, which (initially, at least) are only defined on a dense subspace ${D}$ of the Hilbert space ${H}$. A very typical situation arises when ${H = L^2(\Omega)}$ is the space of square-integrable functions on some domain or manifold ${\Omega}$ (which may have a boundary or be otherwise “incomplete”), ${D = C^\infty_c(\Omega)}$ is the space of smooth compactly supported functions on ${\Omega}$, and ${L}$ is some linear differential operator. It is then of interest to obtain the spectral theorem for such operators, so that one can build operators such as ${e^{-tL}, e^{itL}, \frac{1}{L-z}, e^{it\sqrt{L}}}$, or solve equations such as (1), (2), (3), (4).

In order to do this, some necessary conditions on the densely defined operator ${L: D \rightarrow H}$ must be imposed. The most obvious is that of symmetry, which asserts that

$\displaystyle \langle Lf, g \rangle_H = \langle f, Lg \rangle_H \ \ \ \ \ (5)$

for all ${f, g \in D}$. In some applications, one also wants to impose positive definiteness, which asserts that

$\displaystyle \langle Lf, f \rangle_H \geq 0 \ \ \ \ \ (6)$

for all ${f \in D}$. These hypotheses are sufficient in the case when ${L}$ is bounded, and in particular when ${H}$ is finite dimensional. However, as it turns out, for unbounded operators these conditions are not, by themselves, enough to obtain a good spectral theory. For instance, one consequence of the spectral theorem should be that the resolvents ${(L-z)^{-1}}$ are well-defined for any strictly complex ${z}$, which by duality implies that the image of ${L-z}$ should be dense in ${H}$. However, this can fail if one just assumes symmetry, or symmetry and positive definiteness. A well-known example occurs when ${H}$ is the Hilbert space ${H := L^2((0,1))}$, ${D := C^\infty_c((0,1))}$ is the space of test functions, and ${L}$ is the one-dimensional Laplacian ${L := -\frac{d^2}{dx^2}}$. Then ${L}$ is symmetric and positive, but the operator ${L-k^2}$ does not have dense image for any complex ${k}$, since

$\displaystyle \langle (L-k^2) f, e^{i\overline{k}x} \rangle_H = 0$

for all test functions ${f \in C^\infty_c((0,1))}$, as can be seen from a routine integration by parts. As such, the resolvent map is not everywhere uniquely defined. There is also a lack of uniqueness for the wave, heat, and Schrödinger equations for this operator (note that there are no spatial boundary conditions specified in these equations).
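The orthogonality underlying this example is easy to check by numerical quadrature. The sketch below (Python; the test function ${\sin^4(\pi x)}$, which vanishes to first order at the endpoints, and the value of ${k}$ are arbitrary choices) integrates ${(-f'' - k^2 f)(x) e^{-ikx}}$ over ${(0,1)}$ by Simpson's rule and finds it zero up to quadrature error, as predicted by the integration by parts.

```python
import cmath, math

# Quadrature check of the obstruction for L = -d^2/dx^2 on (0,1):
# for f vanishing to first order at the endpoints,
#   <(L - k^2) f, e^{i conj(k) x}> = int_0^1 (-f'' - k^2 f)(x) e^{-ikx} dx = 0.
k = 1.3 + 0.7j  # an arbitrary complex test value

def f(x):
    return math.sin(math.pi * x) ** 4

def f2(x):  # second derivative of sin(pi x)^4
    s, c = math.sin(math.pi * x), math.cos(math.pi * x)
    return 4 * math.pi ** 2 * s ** 2 * (3 * c ** 2 - s ** 2)

def integrand(x):
    # (L - k^2) f  paired against  conj(e^{i conj(k) x}) = e^{-ikx}
    return (-f2(x) - k ** 2 * f(x)) * cmath.exp(-1j * k * x)

# Composite Simpson's rule on [0, 1].
N = 2000
h = 1.0 / N
total = integrand(0) + integrand(1)
for j in range(1, N):
    total += (4 if j % 2 else 2) * integrand(j * h)
integral = total * h / 3
print(abs(integral))  # ~ 0 up to quadrature error
```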

Another example occurs when ${H := L^2((0,+\infty))}$, ${D := C^\infty_c((0,+\infty))}$, and ${L}$ is the momentum operator ${L := -i \frac{d}{dx}}$. Then the resolvent ${(L-z)^{-1}}$ can be uniquely defined for ${z}$ in the upper half-plane, but not in the lower half-plane, due to the obstruction

$\displaystyle \langle (L-z) f, e^{i \bar{z} x} \rangle_H = 0$

for all test functions ${f}$ (note that the function ${e^{i\bar{z} x}}$ lies in ${L^2((0,+\infty))}$ when ${z}$ is in the lower half-plane). For related reasons, the translation operators ${e^{itL}}$ have a problem with either uniqueness or existence (depending on whether ${t}$ is positive or negative), due to the unspecified boundary behaviour at the origin.

The key property that lets one avoid this bad behaviour is that of essential self-adjointness. Once ${L}$ is essentially self-adjoint, the spectral theorem becomes applicable again, leading to all the expected behaviour (e.g. existence and uniqueness for the various PDE given above).

Unfortunately, the concept of essential self-adjointness is defined rather abstractly, and is difficult to verify directly; unlike the symmetry condition (5) or the positivity condition (6), it is not a “local” condition that can be easily verified just by testing ${L}$ on various inputs, but is instead a more “global” condition. In practice, to verify this property, one needs to invoke one of a number of partial converses to the spectral theorem, which roughly speaking assert that if at least one of the expected consequences of the spectral theorem holds for some symmetric densely defined operator ${L}$, then ${L}$ is essentially self-adjoint. Examples of “expected consequences” include:

• Existence of resolvents ${(L-z)^{-1}}$ (or equivalently, dense image for ${L-z}$);
• Existence of a contractive heat propagator semigroup ${e^{-tL}}$ (in the positive case);
• Existence of a unitary Schrödinger propagator group ${e^{itL}}$;
• Existence of a unitary wave propagator group ${e^{it\sqrt{L}}}$ (in the positive case);
• Existence of a “reasonable” functional calculus;
• Unitary equivalence with a multiplication operator.

Thus, to actually verify essential self-adjointness of a differential operator, one typically has to first solve a PDE (such as the wave, Schrödinger, heat, or Helmholtz equation) by some non-spectral method (e.g. by a contraction mapping argument, or a perturbation argument based on an operator already known to be essentially self-adjoint). Once one can solve one of the PDEs, then one can apply one of the known converse spectral theorems to obtain essential self-adjointness, and then by the forward spectral theorem one can then solve all the other PDEs as well. But there is no getting out of that first step, which requires some input (typically of an ODE, PDE, or geometric nature) that is external to what abstract spectral theory can provide. For instance, if one wants to establish essential self-adjointness of the Laplace-Beltrami operator ${L = -\Delta_g}$ on a smooth Riemannian manifold ${(M,g)}$ (using ${C^\infty_c(M)}$ as the domain space), it turns out (under reasonable regularity hypotheses) that essential self-adjointness is equivalent to geodesic completeness of the manifold, which is a global ODE condition rather than a local one: one needs geodesics to continue indefinitely in order to be able to (unitarily) solve PDEs such as the wave equation, which in turn leads to essential self-adjointness. (Note that the domains ${(0,1)}$ and ${(0,+\infty)}$ in the previous examples were not geodesically complete.) For this reason, essential self-adjointness of a differential operator is sometimes referred to as quantum completeness (with the completeness of the associated Hamilton-Jacobi flow then being the analogous classical completeness).

In these notes, I wanted to record (mostly for my own benefit) the forward and converse spectral theorems, and to verify essential self-adjointness of the Laplace-Beltrami operator on geodesically complete manifolds. This is extremely standard analysis (covered, for instance, in the texts of Reed and Simon), but I wanted to write it down myself to make sure that I really understood this foundational material properly.

In the foundations of modern probability, as laid out by Kolmogorov, the basic objects of study are constructed in the following order:

1. Firstly, one selects a sample space ${\Omega}$, whose elements ${\omega}$ represent all the possible states that one’s stochastic system could be in.
2. Then, one selects a ${\sigma}$-algebra ${{\mathcal B}}$ of events ${E}$ (modeled by subsets of ${\Omega}$), and assigns each of these events a probability ${{\bf P}(E) \in [0,1]}$ in a countably additive manner, so that the entire sample space has probability ${1}$.
3. Finally, one builds (commutative) algebras of random variables ${X}$ (such as complex-valued random variables, modeled by measurable functions from ${\Omega}$ to ${{\bf C}}$), and (assuming suitable integrability or moment conditions) one can assign expectations ${\mathop{\bf E} X}$ to each such random variable.

In measure theory, the underlying measure space ${\Omega}$ plays a prominent foundational role, with the measurable sets and measurable functions (the analogues of the events and the random variables) always being viewed as somehow being attached to that space. In probability theory, in contrast, it is the events and their probabilities that are viewed as being fundamental, with the sample space ${\Omega}$ being abstracted away as much as possible, and with the random variables and expectations being viewed as derived concepts. See Notes 0 for further discussion of this philosophy.

However, it is possible to take the abstraction process one step further, and view the algebra of random variables and their expectations as the foundational concept, abstracting away the original sample space, the algebra of events, and the probability measure entirely.

There are two reasons for wanting to shed (or abstract away) these previously foundational structures. Firstly, it allows one to more easily take certain types of limits, such as the large ${n}$ limit ${n \rightarrow \infty}$ when considering ${n \times n}$ random matrices, because quantities built from the algebra of random variables and their expectations, such as the normalised moments of random matrices, tend to be quite stable in the large ${n}$ limit (as we have seen in previous notes), even as the sample space and event space varies with ${n}$. (This theme of using abstraction to facilitate the taking of the large ${n}$ limit also shows up in the application of ergodic theory to combinatorics via the correspondence principle; see this previous blog post for further discussion.)

Secondly, this abstract formalism allows one to generalise the classical, commutative theory of probability to the more general theory of non-commutative probability theory, which does not have a classical underlying sample space or event space, but is instead built upon a (possibly) non-commutative algebra of random variables (or “observables”) and their expectations (or “traces”). This more general formalism not only encompasses classical probability, but also spectral theory (with matrices or operators taking the role of random variables, and the trace taking the role of expectation), random matrix theory (which can be viewed as a natural blend of classical probability and spectral theory), and quantum mechanics (with physical observables taking the role of random variables, and their expected value on a given quantum state being the expectation). It is also part of a more general “non-commutative way of thinking” (of which non-commutative geometry is the most prominent example), in which a space is understood primarily in terms of the ring or algebra of functions (or function-like objects, such as sections of bundles) placed on top of that space, and then the space itself is largely abstracted away in order to allow the algebraic structures to become less commutative. In short, the idea is to make algebra the foundation of the theory, as opposed to other possible choices of foundations such as sets, measures, categories, etc..

[Note that this foundational preference is to some extent a metamathematical one rather than a mathematical one; in many cases it is possible to rewrite the theory in a mathematically equivalent form so that some other mathematical structure becomes designated as the foundational one, much as probability theory can be equivalently formulated as the measure theory of probability measures. However, this does not negate the fact that a different choice of foundations can lead to a different way of thinking about the subject, and thus to ask a different set of questions and to discover a different set of proofs and solutions. Thus it is often of value to understand multiple foundational perspectives at once, to get a truly stereoscopic view of the subject.]

It turns out that non-commutative probability can be modeled using operator algebras such as ${C^*}$-algebras, von Neumann algebras, or algebras of bounded operators on a Hilbert space, with the latter being accomplished via the Gelfand-Naimark-Segal construction. We will discuss some of these models here, but just as probability theory seeks to abstract away its measure-theoretic models, the philosophy of non-commutative probability is also to downplay these operator algebraic models once some foundational issues are settled.
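A minimal concrete model of a non-commutative probability space is the algebra of ${2 \times 2}$ complex matrices equipped with the normalised trace. The sketch below (Python, with arbitrarily chosen matrices) checks the basic axioms: the expectation is unital, positive on elements of the form ${Y^* Y}$, and tracial, even though the algebra itself is non-commutative.

```python
# A minimal non-commutative probability space: the algebra of 2x2
# complex matrices with expectation tau = normalised trace. The
# concrete matrices below are arbitrary illustrative choices.
def mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def adj(A):  # conjugate transpose
    return [[A[j][i].conjugate() for j in range(2)] for i in range(2)]

def tau(A):  # normalised trace, so that tau(1) = 1
    return (A[0][0] + A[1][1]) / 2

X = [[0, 1], [1, 0]]              # a self-adjoint "random variable"
Y = [[1, 2 - 1j], [2 + 1j, -1]]   # another self-adjoint element

assert tau([[1, 0], [0, 1]]) == 1             # unital: tau(1) = 1
assert tau(mul(adj(Y), Y)).real >= 0          # positivity: tau(Y*Y) >= 0
assert tau(mul(X, Y)) == tau(mul(Y, X))       # tracial property...
assert mul(X, Y) != mul(Y, X)                 # ...even though XY != YX
print("ok")
```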

When one generalises the set of structures in one’s theory, for instance from the commutative setting to the non-commutative setting, the notion of what it means for a structure to be “universal”, “free”, or “independent” can change. The most familiar example of this comes from group theory. If one restricts attention to the category of abelian groups, then the “freest” object one can generate from two generators ${e,f}$ is the free abelian group of commutative words ${e^n f^m}$ with ${n,m \in {\bf Z}}$, which is isomorphic to the group ${{\bf Z}^2}$. If however one generalises to the non-commutative setting of arbitrary groups, then the “freest” object that can now be generated from two generators ${e,f}$ is the free group ${{\Bbb F}_2}$ of non-commutative words ${e^{n_1} f^{m_1} \ldots e^{n_k} f^{m_k}}$ with ${n_1,m_1,\ldots,n_k,m_k \in {\bf Z}}$, which is a significantly larger extension of the free abelian group ${{\bf Z}^2}$.
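One can see the gap between the two free objects by simply counting group elements. The following sketch (Python) enumerates, by breadth-first search, the distinct elements of ${{\bf Z}^2}$ and of ${{\Bbb F}_2}$ representable by words of length at most ${n}$ in the generators and their inverses: the abelian count grows quadratically (${2n^2+2n+1}$), while the free count grows exponentially (${2 \cdot 3^n - 1}$).

```python
# Counting distinct group elements given by words of length <= n in the
# generators and their inverses: the free abelian group Z^2 versus the
# free group F_2. The free object grows polynomially in the abelian
# setting but exponentially in the non-commutative one.
def ball_Z2(n):
    elems = {(0, 0)}
    frontier = {(0, 0)}
    for _ in range(n):
        frontier = {(x + dx, y + dy)
                    for (x, y) in frontier
                    for (dx, dy) in [(1, 0), (-1, 0), (0, 1), (0, -1)]}
        elems |= frontier
    return len(elems)

def ball_F2(n):
    inverse = {'a': 'A', 'A': 'a', 'b': 'B', 'B': 'b'}
    elems = {''}
    frontier = {''}
    for _ in range(n):
        new = set()
        for w in frontier:
            for g in 'aAbB':
                # multiply by a generator, cancelling g g^{-1} pairs
                new.add(w[:-1] if w and w[-1] == inverse[g] else w + g)
        frontier = new
        elems |= frontier
    return len(elems)

n = 6
print(ball_Z2(n), ball_F2(n))  # 2n^2+2n+1 = 85 versus 2*3^n - 1 = 1457
```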

Similarly, when generalising classical probability theory to non-commutative probability theory, the notion of what it means for two or more random variables to be independent changes. In the classical (commutative) setting, two (bounded, real-valued) random variables ${X, Y}$ are independent if one has

$\displaystyle \mathop{\bf E} f(X) g(Y) = 0$

whenever ${f, g: {\bf R} \rightarrow {\bf R}}$ are well-behaved functions (such as polynomials) such that ${\mathop{\bf E} f(X)}$ and ${\mathop{\bf E} g(Y)}$ both vanish. In the non-commutative setting, one can generalise the above definition to two commuting bounded self-adjoint variables; this concept is useful for instance in quantum probability, which is an abstraction of the theory of observables in quantum mechanics. But for two (bounded, self-adjoint) non-commutative random variables ${X, Y}$, the notion of classical independence no longer applies. As a substitute, one can instead consider the notion of being freely independent (or free for short), which means that

$\displaystyle \mathop{\bf E} f_1(X) g_1(Y) \ldots f_k(X) g_k(Y) = 0$

whenever ${f_1,g_1,\ldots,f_k,g_k: {\bf R} \rightarrow {\bf R}}$ are well-behaved functions such that all of ${\mathop{\bf E} f_1(X), \mathop{\bf E} g_1(Y), \ldots, \mathop{\bf E} f_k(X), \mathop{\bf E} g_k(Y)}$ vanish.

The concept of free independence was introduced by Voiculescu, and its study is now known as the subject of free probability. We will not attempt a systematic survey of this subject here; for this, we refer the reader to the surveys of Speicher and of Biane. Instead, we shall just discuss a small number of topics in this area, in order to give the flavour of the subject.

The significance of free probability to random matrix theory lies in the fundamental observation that random matrices which are independent in the classical sense also tend to be independent in the free probability sense, in the large ${n}$ limit ${n \rightarrow \infty}$. (This is only possible because of the highly non-commutative nature of these matrices; as we shall see, it is not possible for non-trivial commuting independent random variables to be freely independent.) Because of this, many tedious computations in random matrix theory, particularly those of an algebraic or enumerative combinatorial nature, can be done more quickly and systematically by using the framework of free probability, which by design is optimised for algebraic tasks rather than analytical ones.

Much as free groups are in some sense “maximally non-commutative”, freely independent random variables are about as far from being commuting as possible. For instance, if ${X, Y}$ are freely independent and of expectation zero, then ${\mathop{\bf E} XYXY}$ vanishes, but ${\mathop{\bf E} XXYY}$ instead factors as ${(\mathop{\bf E} X^2) (\mathop{\bf E} Y^2)}$. As a consequence, the behaviour of freely independent random variables can be quite different from the behaviour of their classically independent commuting counterparts. Nevertheless there is a remarkably strong analogy between the two types of independence, in that results which are true in the classically independent case often have an interesting analogue in the freely independent setting. For instance, the central limit theorem (Notes 2) for averages of classically independent random variables, which roughly speaking asserts that such averages become gaussian in the large ${n}$ limit, has an analogue for averages of freely independent variables, the free central limit theorem, which roughly speaking asserts that such averages become semicircular in the large ${n}$ limit. One can then use this theorem to provide yet another proof of Wigner’s semicircle law (Notes 4).
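The contrast between ${\mathop{\bf E} XYXY}$ and ${\mathop{\bf E} XXYY}$, together with the asymptotic freeness of independent random matrices, can be seen in a quick Monte Carlo experiment. The sketch below (Python; the dimension ${n=60}$ and the use of real symmetric Gaussian Wigner matrices are illustrative choices) computes normalised traces for two independent Wigner matrices: ${\hbox{tr}\ XYXY}$ should be near zero, while ${\hbox{tr}\ XXYY}$ should be near ${(\hbox{tr}\ X^2)(\hbox{tr}\ Y^2)}$, up to ${O(1/n)}$ errors.

```python
import random

# Monte Carlo illustration of asymptotic freeness: two independent n x n
# Wigner (real symmetric Gaussian) matrices, normalised so that the
# normalised trace tr_n(X^2) is close to 1. For freely independent
# centred variables one expects tr(XYXY) ~ 0 while
# tr(XXYY) ~ tr(X^2) tr(Y^2); the finite-n matrices should reproduce
# this up to O(1/n) errors.
random.seed(0)
n = 60

def wigner(n):
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            g = random.gauss(0, 1) / n ** 0.5
            a[i][j] = a[j][i] = g
    return a

def mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def tr_prod(A, B):  # normalised trace of the product A B
    return sum(A[i][j] * B[j][i] for i in range(n) for j in range(n)) / n

X, Y = wigner(n), wigner(n)
XX, YY, XY = mul(X, X), mul(Y, Y), mul(X, Y)

t_x2, t_y2 = tr_prod(X, X), tr_prod(Y, Y)
t_xyxy = tr_prod(XY, XY)   # tr XYXY: should be near 0
t_xxyy = tr_prod(XX, YY)   # tr XXYY: should be near tr(X^2) tr(Y^2)
print(t_xyxy, t_xxyy - t_x2 * t_y2)  # both should be small
```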

Another important (and closely related) analogy is that while the distribution of sums of independent commutative random variables can be quickly computed via the characteristic function (i.e. the Fourier transform of the distribution), the distribution of sums of freely independent non-commutative random variables can be quickly computed using the Stieltjes transform instead (or with closely related objects, such as the ${R}$-transform of Voiculescu). This is strongly reminiscent of the appearance of the Stieltjes transform in random matrix theory, and indeed we will see many parallels between the use of the Stieltjes transform here and in Notes 4.
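For the semicircular distribution itself the Stieltjes transform is explicit, and its basic properties can be verified numerically. The sketch below (Python) implements ${G(z) = \frac{1}{2}(z - \sqrt{z-2}\sqrt{z+2})}$ for the unit-variance semicircular law, checks the self-consistency equation ${G(z) = 1/(z-G(z))}$ (equivalently ${G^2 - zG + 1 = 0}$), and recovers the density ${\frac{1}{2\pi}\sqrt{4-x^2}}$ from boundary values of the imaginary part; the particular test points are arbitrary.

```python
import cmath, math

# Stieltjes transform of the unit-variance semicircular law, with the
# branch sqrt(z-2) sqrt(z+2) chosen so that G(z) ~ 1/z at infinity.
def G(z):
    return (z - cmath.sqrt(z - 2) * cmath.sqrt(z + 2)) / 2

# Self-consistency G = 1/(z - G) at an arbitrary point in the upper
# half-plane.
z = 0.7 + 1.3j
assert abs(G(z) - 1 / (z - G(z))) < 1e-12

# Density recovery on (-2, 2): rho(x) = -Im G(x + i eps) / pi.
eps = 1e-8
for x in (-1.5, -0.4, 0.0, 0.9, 1.8):
    density = -G(x + 1j * eps).imag / math.pi
    exact = math.sqrt(4 - x * x) / (2 * math.pi)
    assert abs(density - exact) < 1e-4
print("ok")
```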

As mentioned earlier, free probability is an excellent tool for computing various expressions of interest in random matrix theory, such as asymptotic values of normalised moments in the large ${n}$ limit ${n \rightarrow \infty}$. Nevertheless, as it only covers the asymptotic regime in which ${n}$ is sent to infinity while holding all other parameters fixed, there are some aspects of random matrix theory to which the tools of free probability are not sufficient by themselves to resolve (although it can be possible to combine free probability theory with other tools to then answer these questions). For instance, questions regarding the rate of convergence of normalised moments as ${n \rightarrow \infty}$ are not directly answered by free probability, though if free probability is combined with tools such as concentration of measure (Notes 1) then such rate information can often be recovered. For similar reasons, free probability lets one understand the behaviour of ${k^{th}}$ moments as ${n \rightarrow \infty}$ for fixed ${k}$, but has more difficulty dealing with the situation in which ${k}$ is allowed to grow slowly in ${n}$ (e.g. ${k = O(\log n)}$). Because of this, free probability methods are effective at controlling the bulk of the spectrum of a random matrix, but have more difficulty with the edges of that spectrum (as well as with related concepts such as the operator norm, Notes 3) as well as with fine-scale structure of the spectrum. Finally, free probability methods are most effective when dealing with matrices that are Hermitian with bounded operator norm, largely because the spectral theory of bounded self-adjoint operators in the infinite-dimensional setting of the large ${n}$ limit is non-pathological. (This is ultimately due to the stable nature of eigenvalues in the self-adjoint setting; see this previous blog post for discussion.) 
For non-self-adjoint operators, free probability needs to be augmented with additional tools, most notably by bounds on least singular values, in order to recover the required stability for the various spectral data of random matrices to behave continuously with respect to the large ${n}$ limit. We will discuss this latter point in a later set of notes.

I was asked recently (in relation to my recent work with Van Vu on the spectral theory of random matrices) to explain some standard heuristics regarding how the eigenvalues $\lambda_1,\ldots,\lambda_n$ of an $n \times n$ matrix $A$ behave under small perturbations.  These heuristics can be summarised as follows:

1. For normal matrices (and in particular, unitary or self-adjoint matrices), eigenvalues are very stable under small perturbations.  For more general matrices, eigenvalues can become unstable if there is pseudospectrum present.
2. For self-adjoint (Hermitian) matrices, eigenvalues that are too close together tend to repel quickly from each other under such perturbations.  For more general matrices, eigenvalues can either be repelled or be attracted to each other, depending on the type of perturbation.

In this post I would like to briefly explain why these heuristics are plausible.
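Both heuristics can already be seen in $2 \times 2$ examples. In the sketch below (Python; the size $\epsilon = 10^{-8}$ is an arbitrary choice), a Hermitian perturbation of a doubly degenerate eigenvalue moves the eigenvalues only by $\epsilon$ (while splitting them), whereas the same-sized perturbation of the non-normal Jordan block with a $1$ above the diagonal moves them by the much larger $\sqrt{\epsilon}$, reflecting the pseudospectrum present in the latter case.

```python
import cmath

# Two 2x2 caricatures of the heuristics: Hermitian eigenvalues are
# stable (move by O(eps)), while a non-normal Jordan block has
# pseudospectrum and its eigenvalues move by O(sqrt(eps)).
def eig2(a, b, c, d):
    """Eigenvalues of [[a, b], [c, d]] via the quadratic formula."""
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2

eps = 1e-8

# Hermitian perturbation of a doubly degenerate eigenvalue 0:
# [[0, eps], [eps, 0]] has eigenvalues +-eps (movement of size eps).
l1, l2 = eig2(0, eps, eps, 0)
herm_move = max(abs(l1), abs(l2))

# Same-size perturbation of the Jordan block [[0,1],[0,0]]:
# [[0, 1], [eps, 0]] has eigenvalues +-sqrt(eps) (movement 1e-4, not 1e-8).
m1, m2 = eig2(0, 1, eps, 0)
jordan_move = max(abs(m1), abs(m2))

print(herm_move, jordan_move)  # 1e-08 versus 1e-04
```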

We now begin our study of measure-preserving systems $(X, {\mathcal X}, \mu, T)$, i.e. a probability space $(X, {\mathcal X}, \mu)$ together with a probability space isomorphism $T: (X, {\mathcal X}, \mu) \to (X, {\mathcal X}, \mu)$ (thus $T: X \to X$ is invertible, with $T$ and $T^{-1}$ both being measurable, and $\mu(T^n E) = \mu(E)$ for all $E \in {\mathcal X}$ and all $n$). For various technical reasons it is convenient to restrict to the case when the $\sigma$-algebra ${\mathcal X}$ is separable, i.e. countably generated. One reason for this is as follows:

Exercise 1. Let $(X, {\mathcal X}, \mu)$ be a probability space with ${\mathcal X}$ separable. Show that the Banach spaces $L^p(X, {\mathcal X}, \mu)$ are separable (i.e. have a countable dense subset) for every $1 \leq p < \infty$; in particular, the Hilbert space $L^2(X, {\mathcal X}, \mu)$ is separable. Show also that the claim can fail for $p = \infty$. (We allow the $L^p$ spaces to be either real or complex valued, unless otherwise specified.) $\diamond$

Remark 1. In practice, the requirement that ${\mathcal X}$ be separable is not particularly onerous. For instance, if one is studying the recurrence properties of a function $f: X \to {\Bbb R}$ on a non-separable measure-preserving system $(X, {\mathcal X}, \mu, T)$, one can restrict ${\mathcal X}$ to the separable sub-$\sigma$-algebra ${\mathcal X}'$ generated by the level sets $\{ x \in X: T^n f(x) > q \}$ for integer $n$ and rational $q$, thus passing to a separable measure-preserving system $(X, {\mathcal X}', \mu, T)$ on which $f$ is still measurable. Thus we see that in many cases of interest, we can immediately reduce to the separable case. (In particular, for many of the theorems in this course, the hypothesis of separability can be dropped, though we won’t bother to specify for which ones this is the case.) $\diamond$

We are interested in the recurrence properties of sets $E \in {\mathcal X}$ or functions $f \in L^p(X, {\mathcal X}, \mu)$. The simplest such recurrence theorem is

Theorem 1. (Poincaré recurrence theorem) Let $(X,{\mathcal X},\mu,T)$ be a measure-preserving system, and let $E \in {\mathcal X}$ be a set of positive measure. Then $\limsup_{n \to +\infty} \mu( E \cap T^n E ) \geq \mu(E)^2$. In particular, $E \cap T^n E$ has positive measure (and is thus non-empty) for infinitely many $n$.

(Compare with Theorem 1 of Lecture 3.)

Proof. For any integer $N > 1$, observe that $\int_X \sum_{n=1}^N 1_{T^n E}\ d\mu = N \mu(E)$, and thus by Cauchy-Schwarz

$\int_X (\sum_{n=1}^N 1_{T^n E})^2\ d\mu \geq N^2 \mu(E)^2.$ (1)

The left-hand side of (1) can be rearranged as

$\sum_{n=1}^N \sum_{m=1}^N \mu( T^n E \cap T^m E ).$ (2)

On the other hand, $\mu( T^n E \cap T^m E) = \mu( E \cap T^{m-n} E )$. From this one easily obtains the asymptotic

$(2)\leq (\limsup_{n \to \infty} \mu( E \cap T^n E ) + o(1)) N^2,$ (3)

where $o(1)$ denotes an expression which goes to zero as $N$ goes to infinity. Combining (1), (2), (3) and taking limits as $N \to +\infty$ we obtain

$\limsup_{n \to \infty} \mu( E \cap T^n E ) \geq \mu(E)^2$ (4)

as desired. $\Box$

Remark 2. In classical physics, the evolution of a physical system in a compact phase space is given by a (continuous-time) measure-preserving system (this is Hamilton’s equations of motion combined with Liouville’s theorem). The Poincaré recurrence theorem then has the following unintuitive consequence: every collection $E$ of states of positive measure, no matter how small, must eventually return to overlap itself given sufficient time. For instance, if one were to burn a piece of paper in a closed system, then there exist arbitrarily small perturbations of the initial conditions such that, if one waits long enough, the piece of paper will eventually reassemble (modulo arbitrarily small error)! This seems to contradict the second law of thermodynamics, but the reason for the discrepancy is because the time required for the recurrence theorem to take effect is inversely proportional to the measure of the set $E$, which in physical situations is exponentially small in the number of degrees of freedom (which is already typically quite large, e.g. of the order of the Avogadro constant). This gives more than enough opportunity for Maxwell’s demon to come into play to reverse the increase of entropy. (This can be viewed as a manifestation of the curse of dimensionality.) The more sophisticated recurrence theorems we will see later have much poorer quantitative bounds still, so much so that they basically have no direct significance for any physical dynamical system with many relevant degrees of freedom. $\diamond$

Exercise 2. Prove the following generalisation of the Poincaré recurrence theorem: if $(X, {\mathcal X}, \mu, T)$ is a measure-preserving system and $f \in L^1(X, {\mathcal X},\mu)$ is non-negative, then $\limsup_{n \to +\infty} \int_X f \, T^n f\ d\mu \geq (\int_X f\ d\mu)^2$. $\diamond$

Exercise 3. Give examples to show that the quantity $\mu(E)^2$ in the conclusion of Theorem 1 cannot be replaced by any smaller quantity in general, regardless of the actual value of $\mu(E)$. (Hint: use a Bernoulli system example.) $\diamond$
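One can watch Theorem 1 in action for an irrational circle rotation $Tx = x + \alpha \hbox{ mod } 1$, where $\mu(E \cap T^n E)$ for an interval $E$ can be computed exactly. The sketch below (Python, with the arbitrary choices $\alpha = \sqrt{2}-1$ and $E = [0,0.3)$) finds that the overlap not only exceeds the guaranteed $\mu(E)^2$ but in fact returns close to the full measure $\mu(E)$, along the continued fraction denominators of $\alpha$.

```python
import math

# Recurrence for an irrational circle rotation T x = x + alpha mod 1
# with E = [0, 0.3): mu(E cap T^n E) is just the overlap length of E
# with its shift, computable exactly. Theorem 1 guarantees
# limsup_n mu(E cap T^n E) >= mu(E)^2 = 0.09; for a rotation the
# overlap in fact returns close to the full mu(E) = 0.3.
alpha = math.sqrt(2) - 1
muE = 0.3

def overlap(n):
    """mu(E cap T^n E) for E = [0, muE) under x -> x + alpha mod 1."""
    s = (n * alpha) % 1.0
    # overlap of [0, muE) with [s, s + muE) taken mod 1
    a = max(0.0, muE - s)            # unwrapped part starting at s
    b = max(0.0, muE - (1.0 - s))    # wrap-around part
    return min(a + b, muE)

best = max(overlap(n) for n in range(1, 1001))
print(best)  # close to mu(E) = 0.3, comfortably above mu(E)^2 = 0.09
```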

Exercise 4. Using the pigeonhole principle instead of the Cauchy-Schwarz inequality (and in particular, the statement that if $\mu(E_1) + \ldots + \mu(E_n) > 1$, then the sets $E_1,\ldots,E_n$ cannot all be disjoint), prove the weaker statement that for any set $E$ of positive measure in a measure-preserving system, the set $E \cap T^n E$ is non-empty for infinitely many $n$. (This exercise illustrates the general point that the Cauchy-Schwarz inequality can be viewed as a quantitative strengthening of the pigeonhole principle.) $\diamond$

For this lecture and the next we shall study several variants of the Poincaré recurrence theorem. We begin by looking at the mean ergodic theorem, which studies the limiting behaviour of the ergodic averages $\frac{1}{N} \sum_{n=1}^N T^n f$ in various $L^p$ spaces, and in particular in $L^2$.