You are currently browsing the tag archive for the ‘spectral theorem’ tag.

The prime number theorem can be expressed as the assertion

$\displaystyle \sum_{n \leq x} \Lambda(n) = x + o(x) \ \ \ \ \ (1)$

as ${x \rightarrow \infty}$, where

$\displaystyle \Lambda(n) := \sum_{d|n} \mu(d) \log \frac{n}{d}$

is the von Mangoldt function. It is a basic result in analytic number theory, but requires a bit of effort to prove. One “elementary” proof of this theorem proceeds through the Selberg symmetry formula

$\displaystyle \sum_{n \leq x} \Lambda_2(n) = 2 x \log x + O(x) \ \ \ \ \ (2)$

where the second von Mangoldt function ${\Lambda_2}$ is defined by the formula

$\displaystyle \Lambda_2(n) := \sum_{d|n} \mu(d) \log^2 \frac{n}{d} \ \ \ \ \ (3)$

or equivalently

$\displaystyle \Lambda_2(n) = \Lambda(n) \log n + \sum_{d|n} \Lambda(d) \Lambda(\frac{n}{d}). \ \ \ \ \ (4)$

(We are avoiding the use of the ${*}$ symbol here to denote Dirichlet convolution, as we will need this symbol to denote ordinary convolution shortly.) For the convenience of the reader, we give a proof of the Selberg symmetry formula below the fold. Actually, for the purposes of proving the prime number theorem, the weaker estimate

$\displaystyle \sum_{n \leq x} \Lambda_2(n) = 2 x \log x + o(x \log x) \ \ \ \ \ (5)$

suffices.

In this post I would like to record a somewhat “soft analysis” reformulation of the elementary proof of the prime number theorem in terms of Banach algebras, and specifically in Banach algebra structures on (completions of) the space ${C_c({\bf R})}$ of compactly supported continuous functions ${f: {\bf R} \rightarrow {\bf C}}$ equipped with the convolution operation

$\displaystyle f*g(t) := \int_{\bf R} f(u) g(t-u)\ du.$

This soft argument does not easily give any quantitative decay rate in the prime number theorem, but by the same token it avoids many of the quantitative calculations in the traditional proofs of this theorem. Ultimately, the key “soft analysis” fact used is the spectral radius formula

$\displaystyle \lim_{n \rightarrow \infty} \|f^n\|^{1/n} = \sup_{\lambda \in \hat B} |\lambda(f)| \ \ \ \ \ (6)$

for any element ${f}$ of a unital commutative Banach algebra ${B}$, where ${\hat B}$ is the space of characters (i.e., continuous unital algebra homomorphisms from ${B}$ to ${{\bf C}}$) of ${B}$. This formula is due to Gelfand and may be found in any text on Banach algebras; for sake of completeness we prove it below the fold.

The connection between prime numbers and Banach algebras is given by the following consequence of the Selberg symmetry formula.

Theorem 1 (Construction of a Banach algebra norm) For any ${G \in C_c({\bf R})}$, let ${\|G\|}$ denote the quantity

$\displaystyle \|G\| := \limsup_{x \rightarrow \infty} |\sum_n \frac{\Lambda(n)}{n} G( \log \frac{x}{n} ) - \int_{\bf R} G(t)\ dt|.$

Then ${\| \|}$ is a seminorm on ${C_c({\bf R})}$ with the bound

$\displaystyle \|G\| \leq \|G\|_{L^1({\bf R})} := \int_{\bf R} |G(t)|\ dt \ \ \ \ \ (7)$

for all ${G \in C_c({\bf R})}$. Furthermore, we have the Banach algebra bound

$\displaystyle \| G * H \| \leq \|G\| \|H\| \ \ \ \ \ (8)$

for all ${G,H \in C_c({\bf R})}$.

We prove this theorem below the fold. The prime number theorem then follows from Theorem 1 and the following two assertions. The first is an application of the spectral radius formula (6) and some basic Fourier analysis (in particular, the observation that ${C_c({\bf R})}$ contains a plentiful supply of local units):

Theorem 2 (Non-trivial Banach algebras with many local units have non-trivial spectrum) Let ${\| \|}$ be a seminorm on ${C_c({\bf R})}$ obeying (7), (8). Suppose that ${\| \|}$ is not identically zero. Then there exists ${\xi \in {\bf R}}$ such that

$\displaystyle |\int_{\bf R} G(t) e^{-it\xi}\ dt| \leq \|G\|$

for all ${G \in C_c}$. In particular, by (7), one has

$\displaystyle \|G\| = \| G \|_{L^1({\bf R})}$

whenever ${G(t) e^{-it\xi}}$ is a non-negative function.

The second is a consequence of the Selberg symmetry formula and the fact that ${\Lambda}$ is real (as well as Mertens’ theorem, in the ${\xi=0}$ case), and is closely related to the non-vanishing of the Riemann zeta function ${\zeta}$ on the line ${\{ 1+i\xi: \xi \in {\bf R}\}}$:

Theorem 3 (Breaking the parity barrier) Let ${\xi \in {\bf R}}$. Then there exists ${G \in C_c({\bf R})}$ such that ${G(t) e^{-it\xi}}$ is non-negative, and

$\displaystyle \|G\| < \|G\|_{L^1({\bf R})}.$

Assuming Theorems 1, 2, 3, we may now quickly establish the prime number theorem as follows. Theorem 2 and Theorem 3 imply that the seminorm ${\| \|}$ constructed in Theorem 1 is trivial, and thus

$\displaystyle \sum_n \frac{\Lambda(n)}{n} G( \log \frac{x}{n} ) = \int_{\bf R} G(t)\ dt + o(1)$

as ${x \rightarrow \infty}$ for any Schwartz function ${G}$ (the decay rate in ${o(1)}$ may depend on ${G}$). Specialising to functions of the form ${G(t) = e^{-t} \eta( e^{-t} )}$ for some smooth compactly supported ${\eta}$ on ${(0,+\infty)}$, we conclude that

$\displaystyle \sum_n \Lambda(n) \eta(\frac{n}{x}) = x \int_{\bf R} \eta(u)\ du + o(x)$

as ${x \rightarrow \infty}$; by the smooth Urysohn lemma this implies that

$\displaystyle \sum_{\varepsilon x \leq n \leq x} \Lambda(n) = x - \varepsilon x + o(x)$

as ${x \rightarrow \infty}$ for any fixed ${\varepsilon>0}$, and the prime number theorem then follows by a telescoping series argument.

The same argument also yields the prime number theorem in arithmetic progressions, or equivalently that

$\displaystyle \sum_{n \leq x} \Lambda(n) \chi(n) = o(x)$

for any fixed Dirichlet character ${\chi}$; the one difference is that the use of Mertens’ theorem is replaced by the basic fact that the quantity ${L(1,\chi) = \sum_n \frac{\chi(n)}{n}}$ is non-vanishing.

In the traditional foundations of probability theory, one selects a probability space ${(\Omega, {\mathcal B}, {\mathbf P})}$, and makes a distinction between deterministic mathematical objects, which do not depend on the sampled state ${\omega \in \Omega}$, and stochastic (or random) mathematical objects, which do depend (but in a measurable fashion) on the sampled state ${\omega \in \Omega}$. For instance, a deterministic real number would just be an element ${x \in {\bf R}}$, whereas a stochastic real number (or real random variable) would be a measurable function ${x: \Omega \rightarrow {\bf R}}$, where in this post ${{\bf R}}$ will always be endowed with the Borel ${\sigma}$-algebra. (For readers familiar with nonstandard analysis, the adjectives “deterministic” and “stochastic” will be used here in a manner analogous to the uses of the adjectives “standard” and “nonstandard” in nonstandard analysis. The analogy is particularly close when comparing with the “cheap nonstandard analysis” discussed in this previous blog post. We will also use “relative to ${\Omega}$” as a synonym for “stochastic”.)

Actually, for our purposes we will adopt the philosophy of identifying stochastic objects that agree almost surely, so if one was to be completely precise, we should define a stochastic real number to be an equivalence class ${[x]}$ of measurable functions ${x: \Omega \rightarrow {\bf R}}$, up to almost sure equivalence. However, we shall often abuse notation and write ${[x]}$ simply as ${x}$.

More generally, given any measurable space ${X = (X, {\mathcal X})}$, we can talk either about deterministic elements ${x \in X}$, or about stochastic elements of ${X}$, that is to say equivalence classes ${[x]}$ of measurable maps ${x: \Omega \rightarrow X}$ up to almost sure equivalence. We will use ${\Gamma(X|\Omega)}$ to denote the set of all stochastic elements of ${X}$. (For readers familiar with sheaves, it may helpful for the purposes of this post to think of ${\Gamma(X|\Omega)}$ as the space of measurable global sections of the trivial ${X}$bundle over ${\Omega}$.) Of course every deterministic element ${x}$ of ${X}$ can also be viewed as a stochastic element ${x|\Omega \in \Gamma(X|\Omega)}$ given by (the equivalence class of) the constant function ${\omega \mapsto x}$, thus giving an embedding of ${X}$ into ${\Gamma(X|\Omega)}$. We do not attempt here to give an interpretation of ${\Gamma(X|\Omega)}$ for sets ${X}$ that are not equipped with a ${\sigma}$-algebra ${{\mathcal X}}$.

Remark 1 In my previous post on the foundations of probability theory, I emphasised the freedom to extend the sample space ${(\Omega, {\mathcal B}, {\mathbf P})}$ to a larger sample space whenever one wished to inject additional sources of randomness. This is of course an important freedom to possess (and in the current formalism, is the analogue of the important operation of base change in algebraic geometry), but in this post we will focus on a single fixed sample space ${(\Omega, {\mathcal B}, {\mathbf P})}$, and not consider extensions of this space, so that one only has to consider two types of mathematical objects (deterministic and stochastic), as opposed to having many more such types, one for each potential choice of sample space (with the deterministic objects corresponding to the case when the sample space collapses to a point).

Any (measurable) ${k}$-ary operation on deterministic mathematical objects then extends to their stochastic counterparts by applying the operation pointwise. For instance, the addition operation ${+: {\bf R} \times {\bf R} \rightarrow {\bf R}}$ on deterministic real numbers extends to an addition operation ${+: \Gamma({\bf R}|\Omega) \times \Gamma({\bf R}|\Omega) \rightarrow \Gamma({\bf R}|\Omega)}$, by defining the class ${[x]+[y]}$ for ${x,y: \Omega \rightarrow {\bf R}}$ to be the equivalence class of the function ${\omega \mapsto x(\omega) + y(\omega)}$; this operation is easily seen to be well-defined. More generally, any measurable ${k}$-ary deterministic operation ${O: X_1 \times \dots \times X_k \rightarrow Y}$ between measurable spaces ${X_1,\dots,X_k,Y}$ extends to an stochastic operation ${O: \Gamma(X_1|\Omega) \times \dots \Gamma(X_k|\Omega) \rightarrow \Gamma(Y|\Omega)}$ in the obvious manner.

There is a similar story for ${k}$-ary relations ${R: X_1 \times \dots \times X_k \rightarrow \{\hbox{true},\hbox{false}\}}$, although here one has to make a distinction between a deterministic reading of the relation and a stochastic one. Namely, if we are given stochastic objects ${x_i \in \Gamma(X_i|\Omega)}$ for ${i=1,\dots,k}$, the relation ${R(x_1,\dots,x_k)}$ does not necessarily take values in the deterministic Boolean algebra ${\{ \hbox{true}, \hbox{false}\}}$, but only in the stochastic Boolean algebra ${\Gamma(\{ \hbox{true}, \hbox{false}\}|\Omega)}$ – thus ${R(x_1,\dots,x_k)}$ may be true with some positive probability and also false with some positive probability (with the event that ${R(x_1,\dots,x_k)}$ being stochastically true being determined up to null events). Of course, the deterministic Boolean algebra embeds in the stochastic one, so we can talk about a relation ${R(x_1,\dots,x_k)}$ being determinstically true or deterministically false, which (due to our identification of stochastic objects that agree almost surely) means that ${R(x_1(\omega),\dots,x_k(\omega))}$ is almost surely true or almost surely false respectively. For instance given two stochastic objects ${x,y}$, one can view their equality relation ${x=y}$ as having a stochastic truth value. This is distinct from the way the equality symbol ${=}$ is used in mathematical logic, which we will now call “equality in the deterministic sense” to reduce confusion. Thus, ${x=y}$ in the deterministic sense if and only if the stochastic truth value of ${x=y}$ is equal to ${\hbox{true}}$, that is to say that ${x(\omega)=y(\omega)}$ for almost all ${\omega}$.

Any universal identity for deterministic operations (or universal implication between identities) extends to their stochastic counterparts: for instance, addition is commutative, associative, and cancellative on the space of deterministic reals ${{\bf R}}$, and is therefore commutative, associative, and cancellative on stochastic reals ${\Gamma({\bf R}|\Omega)}$ as well. However, one has to be more careful when working with mathematical laws that are not expressible as universal identities, or implications between identities. For instance, ${{\bf R}}$ is an integral domain: if ${x_1,x_2 \in {\bf R}}$ are deterministic reals such that ${x_1 x_2=0}$, then one must have ${x_1=0}$ or ${x_2=0}$. However, if ${x_1, x_2 \in \Gamma({\bf R}|\Omega)}$ are stochastic reals such that ${x_1 x_2 = 0}$ (in the deterministic sense), then it is no longer necessarily the case that ${x_1=0}$ (in the deterministic sense) or that ${x_2=0}$ (in the deterministic sense); however, it is still true that “${x_1=0}$ or ${x_2=0}$” is true in the deterministic sense if one interprets the boolean operator “or” stochastically, thus “${x_1(\omega)=0}$ or ${x_2(\omega)=0}$” is true for almost all ${\omega}$. Another way to properly obtain a stochastic interpretation of the integral domain property of ${{\bf R}}$ is to rewrite it as

$\displaystyle x_1,x_2 \in {\bf R}, x_1 x_2 = 0 \implies x_i=0 \hbox{ for some } i \in \{1,2\}$

and then make all sets stochastic to obtain the true statement

$\displaystyle x_1,x_2 \in \Gamma({\bf R}|\Omega), x_1 x_2 = 0 \implies x_i=0 \hbox{ for some } i \in \Gamma(\{1,2\}|\Omega),$

thus we have to allow the index ${i}$ for which vanishing ${x_i=0}$ occurs to also be stochastic, rather than deterministic. (A technical note: when one proves this statement, one has to select ${i}$ in a measurable fashion; for instance, one can choose ${i(\omega)}$ to equal ${1}$ when ${x_1(\omega)=0}$, and ${2}$ otherwise (so that in the “tie-breaking” case when ${x_1(\omega)}$ and ${x_2(\omega)}$ both vanish, one always selects ${i(\omega)}$ to equal ${1}$).)

Similarly, the law of the excluded middle fails when interpreted deterministically, but remains true when interpreted stochastically: if ${S}$ is a stochastic statement, then it is not necessarily the case that ${S}$ is either deterministically true or deterministically false; however the sentence “${S}$ or not-${S}$” is still deterministically true if the boolean operator “or” is interpreted stochastically rather than deterministically.

To avoid having to keep pointing out which operations are interpreted stochastically and which ones are interpreted deterministically, we will use the following convention: if we assert that a mathematical sentence ${S}$ involving stochastic objects is true, then (unless otherwise specified) we mean that ${S}$ is deterministically true, assuming that all relations used inside ${S}$ are interpreted stochastically. For instance, if ${x,y}$ are stochastic reals, when we assert that “Exactly one of ${x < y}$, ${x=y}$, or ${x>y}$ is true”, then by default it is understood that the relations ${<}$, ${=}$, ${>}$ and the boolean operator “exactly one of” are interpreted stochastically, and the assertion is that the sentence is deterministically true.

In the above discussion, the stochastic objects ${x}$ being considered were elements of a deterministic space ${X}$, such as the reals ${{\bf R}}$. However, it can often be convenient to generalise this situation by allowing the ambient space ${X}$ to also be stochastic. For instance, one might wish to consider a stochastic vector ${v(\omega)}$ inside a stochastic vector space ${V(\omega)}$, or a stochastic edge ${e}$ of a stochastic graph ${G(\omega)}$. In order to formally describe this situation within the classical framework of measure theory, one needs to place all the ambient spaces ${X(\omega)}$ inside a measurable space. This can certainly be done in many contexts (e.g. when considering random graphs on a deterministic set of vertices, or if one is willing to work up to equivalence and place the ambient spaces inside a suitable moduli space), but is not completely natural in other contexts. For instance, if one wishes to consider stochastic vector spaces of potentially unbounded dimension (in particular, potentially larger than any given cardinal that one might specify in advance), then the class of all possible vector spaces is so large that it becomes a proper class rather than a set (even if one works up to equivalence), making it problematic to give this class the structure of a measurable space; furthermore, even once one does so, one needs to take additional care to pin down what it would mean for a random vector ${\omega \mapsto v_\omega}$ lying in a random vector space ${\omega \mapsto V_\omega}$ to depend “measurably” on ${\omega}$.

Of course, in any reasonable application one can avoid the set theoretic issues at least by various ad hoc means, for instance by restricting the dimension of all spaces involved to some fixed cardinal such as ${2^{\aleph_0}}$. However, the measure-theoretic issues can require some additional effort to resolve properly.

In this post I would like to describe a different way to formalise stochastic spaces, and stochastic elements of these spaces, by viewing the spaces as measure-theoretic analogue of a sheaf, but being over the probability space ${\Omega}$ rather than over a topological space; stochastic objects are then sections of such sheaves. Actually, for minor technical reasons it is convenient to work in the slightly more general setting in which the base space ${\Omega}$ is a finite measure space ${(\Omega, {\mathcal B}, \mu)}$ rather than a probability space, thus ${\mu(\Omega)}$ can take any value in ${[0,+\infty)}$ rather than being normalised to equal ${1}$. This will allow us to easily localise to subevents ${\Omega'}$ of ${\Omega}$ without the need for normalisation, even when ${\Omega'}$ is a null event (though we caution that the map ${x \mapsto x|\Omega'}$ from deterministic objects ${x}$ ceases to be injective in this latter case). We will however still continue to use probabilistic terminology. despite the lack of normalisation; thus for instance, sets ${E}$ in ${{\mathcal B}}$ will be referred to as events, the measure ${\mu(E)}$ of such a set will be referred to as the probability (which is now permitted to exceed ${1}$ in some cases), and an event whose complement is a null event shall be said to hold almost surely. It is in fact likely that almost all of the theory below extends to base spaces which are ${\sigma}$-finite rather than finite (for instance, by damping the measure to become finite, without introducing any further null events), although we will not pursue this further generalisation here.

The approach taken in this post is “topos-theoretic” in nature (although we will not use the language of topoi explicitly here), and is well suited to a “pointless” or “point-free” approach to probability theory, in which the role of the stochastic state ${\omega \in \Omega}$ is suppressed as much as possible; instead, one strives to always adopt a “relative point of view”, with all objects under consideration being viewed as stochastic objects relative to the underlying base space ${\Omega}$. In this perspective, the stochastic version of a set is as follows.

Definition 1 (Stochastic set) Unless otherwise specified, we assume that we are given a fixed finite measure space ${\Omega = (\Omega, {\mathcal B}, \mu)}$ (which we refer to as the base space). A stochastic set (relative to ${\Omega}$) is a tuple ${X|\Omega = (\Gamma(X|E)_{E \in {\mathcal B}}, ((|E))_{E \subset F, E,F \in {\mathcal B}})}$ consisting of the following objects:

• A set ${\Gamma(X|E)}$ assigned to each event ${E \in {\mathcal B}}$; and
• A restriction map ${x \mapsto x|E}$ from ${\Gamma(X|F)}$ to ${\Gamma(X|E)}$ to each pair ${E \subset F}$ of nested events ${E,F \in {\mathcal B}}$. (Strictly speaking, one should indicate the dependence on ${F}$ in the notation for the restriction map, e.g. using ${x \mapsto x|(E \leftarrow F)}$ instead of ${x \mapsto x|E}$, but we will abuse notation by omitting the ${F}$ dependence.)

We refer to elements of ${\Gamma(X|E)}$ as local stochastic elements of the stochastic set ${X|\Omega}$, localised to the event ${E}$, and elements of ${\Gamma(X|\Omega)}$ as global stochastic elements (or simply elements) of the stochastic set. (In the language of sheaves, one would use “sections” instead of “elements” here, but I prefer to use the latter terminology here, for compatibility with conventional probabilistic notation, where for instance measurable maps from ${\Omega}$ to ${{\bf R}}$ are referred to as real random variables, rather than sections of the reals.)

Furthermore, we impose the following axioms:

• (Category) The map ${x \mapsto x|E}$ from ${\Gamma(X|E)}$ to ${\Gamma(X|E)}$ is the identity map, and if ${E \subset F \subset G}$ are events in ${{\mathcal B}}$, then ${((x|F)|E) = (x|E)}$ for all ${x \in \Gamma(X|G)}$.
• (Null events trivial) If ${E \in {\mathcal B}}$ is a null event, then the set ${\Gamma(X|E)}$ is a singleton set. (In particular, ${\Gamma(X|\emptyset)}$ is always a singleton set; this is analogous to the convention that ${x^0=1}$ for any number ${x}$.)
• (Countable gluing) Suppose that for each natural number ${n}$, one has an event ${E_n \in {\mathcal B}}$ and an element ${x_n \in \Gamma(X|E_n)}$ such that ${x_n|(E_n \cap E_m) = x_m|(E_n \cap E_m)}$ for all ${n,m}$. Then there exists a unique ${x\in \Gamma(X|\bigcup_{n=1}^\infty E_n)}$ such that ${x_n = x|E_n}$ for all ${n}$.

If ${\Omega'}$ is an event in ${\Omega}$, we define the localisation ${X|\Omega'}$ of the stochastic set ${X|\Omega}$ to ${\Omega'}$ to be the stochastic set

$\displaystyle X|\Omega' := (\Gamma(X|E)_{E \in {\mathcal B}; E \subset \Omega'}, ((|E))_{E \subset F \subset \Omega', E,F \in {\mathcal B}})$

relative to ${\Omega'}$. (Note that there is no need to renormalise the measure on ${\Omega'}$, as we are not demanding that our base space have total measure ${1}$.)

The following fact is useful for actually verifying that a given object indeed has the structure of a stochastic set:

Exercise 1 Show that to verify the countable gluing axiom of a stochastic set, it suffices to do so under the additional hypothesis that the events ${E_n}$ are disjoint. (Note that this is quite different from the situation with sheaves over a topological space, in which the analogous gluing axiom is often trivial in the disjoint case but has non-trivial content in the overlapping case. This is ultimately because a ${\sigma}$-algebra is closed under all Boolean operations, whereas a topology is only closed under union and intersection.)

Let us illustrate the concept of a stochastic set with some examples.

Example 1 (Discrete case) A simple case arises when ${\Omega}$ is a discrete space which is at most countable. If we assign a set ${X_\omega}$ to each ${\omega \in \Omega}$, with ${X_\omega}$ a singleton if ${\mu(\{\omega\})=0}$. One then sets ${\Gamma(X|E) := \prod_{\omega \in E} X_\omega}$, with the obvious restriction maps, giving rise to a stochastic set ${X|\Omega}$. (Thus, a local element ${x}$ of ${\Gamma(X|E)}$ can be viewed as a map ${\omega \mapsto x(\omega)}$ on ${E}$ that takes values in ${X_\omega}$ for each ${\omega \in E}$.) Conversely, it is not difficult to see that any stochastic set over an at most countable discrete probability space ${\Omega}$ is of this form up to isomorphism. In this case, one can think of ${X|\Omega}$ as a bundle of sets ${X_\omega}$ over each point ${\omega}$ (of positive probability) in the base space ${\Omega}$. One can extend this bundle interpretation of stochastic sets to reasonably nice sample spaces ${\Omega}$ (such as standard Borel spaces) and similarly reasonable ${X}$; however, I would like to avoid this interpretation in the formalism below in order to be able to easily work in settings in which ${\Omega}$ and ${X}$ are very “large” (e.g. not separable in any reasonable sense). Note that we permit some of the ${X_\omega}$ to be empty, thus it can be possible for ${\Gamma(X|\Omega)}$ to be empty whilst ${\Gamma(X|E)}$ for some strict subevents ${E}$ of ${\Omega}$ to be non-empty. (This is analogous to how it is possible for a sheaf to have local sections but no global sections.) As such, the space ${\Gamma(X|\Omega)}$ of global elements does not completely determine the stochastic set ${X|\Omega}$; one sometimes needs to localise to an event ${E}$ in order to see the full structure of such a set. Thus it is important to distinguish between a stochastic set ${X|\Omega}$ and its space ${\Gamma(X|\Omega)}$ of global elements. (As such, it is a slight abuse of the axiom of extensionality to refer to global elements of ${X|\Omega}$ simply as “elements”, but hopefully this should not cause too much confusion.)

Example 2 (Measurable spaces as stochastic sets) Returning now to a general base space ${\Omega}$, any (deterministic) measurable space ${X}$ gives rise to a stochastic set ${X|\Omega}$, with ${\Gamma(X|E)}$ being defined as in previous discussion as the measurable functions from ${E}$ to ${X}$ modulo almost everywhere equivalence (in particular, ${\Gamma(X|E)}$ a singleton set when ${E}$ is null), with the usual restriction maps. The constraint of measurability on the maps ${x: E \rightarrow \Omega}$, together with the quotienting by almost sure equivalence, means that ${\Gamma(X|E)}$ is now more complicated than a plain Cartesian product ${\prod_{\omega \in E} X_\omega}$ of fibres, but this still serves as a useful first approximation to what ${\Gamma(X|E)}$ is for the purposes of developing intuition. Indeed, the measurability constraint is so weak (as compared for instance to topological or smooth constraints in other contexts, such as sheaves of continuous or smooth sections of bundles) that the intuition of essentially independent fibres is quite an accurate one, at least if one avoids consideration of an uncountable number of objects simultaneously.

Example 3 (Extended Hilbert modules) This example is the one that motivated this post for me. Suppose that one has an extension ${(\tilde \Omega, \tilde {\mathcal B}, \tilde \mu)}$ of the base space ${(\Omega, {\mathcal B},\mu)}$, thus we have a measurable factor map ${\pi: \tilde \Omega \rightarrow \Omega}$ such that the pushforward of the measure ${\tilde \mu}$ by ${\pi}$ is equal to ${\mu}$. Then we have a conditional expectation operator ${\pi_*: L^2(\tilde \Omega,\tilde {\mathcal B},\tilde \mu) \rightarrow L^2(\Omega,{\mathcal B},\mu)}$, defined as the adjoint of the pullback map ${\pi^*: L^2(\Omega,{\mathcal B},\mu) \rightarrow L^2(\tilde \Omega,\tilde {\mathcal B},\tilde \mu)}$. As is well known, the conditional expectation operator also extends to a contraction ${\pi_*: L^1(\tilde \Omega,\tilde {\mathcal B},\tilde \mu) \rightarrow L^1(\Omega,{\mathcal B}, \mu)}$; by monotone convergence we may also extend ${\pi_*}$ to a map from measurable functions from ${\tilde \Omega}$ to the extended non-negative reals ${[0,+\infty]}$, to measurable functions from ${\Omega}$ to ${[0,+\infty]}$. We then define the “extended Hilbert module” ${L^2(\tilde \Omega|\Omega)}$ to be the space of functions ${f \in L^2(\tilde \Omega,\tilde {\mathcal B},\tilde \mu)}$ with ${\pi_*(|f|^2)}$ finite almost everywhere. This is an extended version of the Hilbert module ${L^\infty_{\Omega} L^2(\tilde \Omega|\Omega)}$, which is defined similarly except that ${\pi_*(|f|^2)}$ is required to lie in ${L^\infty(\Omega,{\mathcal B},\mu)}$; this is a Hilbert module over ${L^\infty(\Omega, {\mathcal B}, \mu)}$ which is of particular importance in the Furstenberg-Zimmer structure theory of measure-preserving systems. We can then define the stochastic set ${L^2_\pi(\tilde \Omega)|\Omega}$ by setting

$\displaystyle \Gamma(L^2_\pi(\tilde \Omega)|E) := L^2( \pi^{-1}(E) | E )$

with the obvious restriction maps. In the case that ${\Omega,\Omega'}$ are standard Borel spaces, one can disintegrate ${\mu'}$ as an integral ${\mu' = \int_\Omega \nu_\omega\ d\mu(\omega)}$ of probability measures ${\nu_\omega}$ (supported in the fibre ${\pi^{-1}(\{\omega\})}$), in which case this stochastic set can be viewed as having fibres ${L^2( \tilde \Omega, \tilde {\mathcal B}, \nu_\omega )}$ (though if ${\Omega}$ is not discrete, there are still some measurability conditions in ${\omega}$ on the local and global elements that need to be imposed). However, I am interested in the case when ${\Omega,\Omega'}$ are not standard Borel spaces (in fact, I will take them to be algebraic probability spaces, as defined in this previous post), in which case disintegrations are not available. However, it appears that the stochastic analysis developed in this blog post can serve as a substitute for the tool of disintegration in this context.

We make the remark that if ${X|\Omega}$ is a stochastic set and ${E, F}$ are events that are equivalent up to null events, then one can identify ${\Gamma(X|E)}$ with ${\Gamma(X|F)}$ (through their common restriction to ${\Gamma(X|(E \cap F))}$, with the restriction maps now being bijections). As such, the notion of a stochastic set does not require the full structure of a concrete probability space ${(\Omega, {\mathcal B}, {\mathbf P})}$; one could also have defined the notion using only the abstract ${\sigma}$-algebra consisting of ${{\mathcal B}}$ modulo null events as the base space, or equivalently one could define stochastic sets over the algebraic probability spaces defined in this previous post. However, we will stick with the classical formalism of concrete probability spaces here so as to keep the notation reasonably familiar.

As a corollary of the above observation, we see that if the base space ${\Omega}$ has total measure ${0}$, then all stochastic sets are trivial (they are just points).

Exercise 2 If ${X|\Omega}$ is a stochastic set, show that there exists an event ${\Omega'}$ with the property that for any event ${E}$, ${\Gamma(X|E)}$ is non-empty if and only if ${E}$ is contained in ${\Omega'}$ modulo null events. (In particular, ${\Omega'}$ is unique up to null events.) Hint: consider the numbers ${\mu( E )}$ for ${E}$ ranging over all events with ${\Gamma(X|E)}$ non-empty, and form a maximising sequence for these numbers. Then use all three axioms of a stochastic set.

One can now start take many of the fundamental objects, operations, and results in set theory (and, hence, in most other categories of mathematics) and establish analogues relative to a finite measure space. Implicitly, what we will be doing in the next few paragraphs is endowing the category of stochastic sets with the structure of an elementary topos. However, to keep things reasonably concrete, we will not explicitly emphasise the topos-theoretic formalism here, although it is certainly lurking in the background.

Firstly, we define a stochastic function ${f: X|\Omega \rightarrow Y|\Omega}$ between two stochastic sets ${X|\Omega, Y|\Omega}$ to be a collection of maps ${f: \Gamma(X|E) \rightarrow \Gamma(Y|E)}$ for each ${E \in {\mathcal B}}$ which form a natural transformation in the sense that ${f(x|E) = f(x)|E}$ for all ${x \in \Gamma(X|F)}$ and nested events ${E \subset F}$. In the case when ${\Omega}$ is discrete and at most countable (and after deleting all null points), a stochastic function is nothing more than a collection of functions ${f_\omega: X_\omega \rightarrow Y_\omega}$ for each ${\omega \in \Omega}$, with the function ${f: \Gamma(X|E) \rightarrow \Gamma(Y|E)}$ then being a direct sum of the factor functions ${f_\omega}$:

$\displaystyle f( (x_\omega)_{\omega \in E} ) = ( f_\omega(x_\omega) )_{\omega \in E}.$

Thus (in the discrete, at most countable setting, at least) stochastic functions do not mix together information from different states ${\omega}$ in a sample space; the value of ${f(x)}$ at ${\omega}$ depends only on the value of ${x}$ at ${\omega}$. The situation is a bit more subtle for continuous probability spaces, due to the identification of stochastic objects that agree almost surely, nevertheness it is still good intuition to think of stochastic functions as essentially being “pointwise” or “local” in nature.

One can now form the stochastic set ${\hbox{Hom}(X \rightarrow Y)|\Omega}$ of functions from ${X|\Omega}$ to ${Y|\Omega}$, by setting ${\Gamma(\hbox{Hom}(X \rightarrow Y)|E)}$ for any event ${E}$ to be the set of local stochastic functions ${f: X|E \rightarrow Y|E}$ of the localisations of ${X|\Omega, Y|\Omega}$ to ${E}$; this is a stochastic set if we use the obvious restriction maps. In the case when ${\Omega}$ is discrete and at most countable, the fibre ${\hbox{Hom}(X \rightarrow Y)_\omega}$ at a point ${\omega}$ of positive measure is simply the set ${Y_\omega^{X_\omega}}$ of functions from ${X_\omega}$ to ${Y_\omega}$.

In a similar spirit, we say that one stochastic set ${Y|\Omega}$ is a (stochastic) subset of another ${X|\Omega}$, and write ${Y|\Omega \subset X|\Omega}$, if we have a stochastic inclusion map, thus ${\Gamma(Y|E) \subset \Gamma(X|E)}$ for all events ${E}$, with the restriction maps being compatible. We can then define the power set ${2^X|\Omega}$ of a stochastic set ${X|\Omega}$ by setting ${\Gamma(2^X|E)}$ for any event ${E}$ to be the set of all stochastic subsets ${Y|E}$ of ${X|E}$ relative to ${E}$; it is easy to see that ${2^X|\Omega}$ is a stochastic set with the obvious restriction maps (one can also identify ${2^X|\Omega}$ with ${\hbox{Hom}(X, \{\hbox{true},\hbox{false}\})|\Omega}$ in the obvious fashion). Again, when ${\Omega}$ is discrete and at most countable, the fibre of ${2^X|\Omega}$ at a point ${\omega}$ of positive measure is simply the deterministic power set ${2^{X_\omega}}$.

Note that if ${f: X|\Omega \rightarrow Y|\Omega}$ is a stochastic function and ${Y'|\Omega}$ is a stochastic subset of ${Y|\Omega}$, then the inverse image ${f^{-1}(Y')|\Omega}$, defined by setting ${\Gamma(f^{-1}(Y')|E)}$ for any event ${E}$ to be the set of those ${x \in \Gamma(X|E)}$ with ${f(x) \in \Gamma(Y'|E)}$, is a stochastic subset of ${X|\Omega}$. In particular, given a ${k}$-ary relation ${R: X_1 \times \dots \times X_k|\Omega \rightarrow \{\hbox{true}, \hbox{false}\}|\Omega}$, the inverse image ${R^{-1}( \{ \hbox{true} \}|\Omega )}$ is a stochastic subset of ${X_1 \times \dots \times X_k|\Omega}$, which by abuse of notation we denote as

$\displaystyle \{ (x_1,\dots,x_k) \in X_1 \times \dots \times X_k: R(x_1,\dots,x_k) \hbox{ is true} \}|\Omega.$

In a similar spirit, if ${X'|\Omega}$ is a stochastic subset of ${X|\Omega}$ and ${f: X|\Omega \rightarrow Y|\Omega}$ is a stochastic function, we can define the image ${f(X')|\Omega}$ by setting ${\Gamma(f(X')|E)}$ to be the set of those ${f(x)}$ with ${x \in \Gamma(X'|E)}$; one easily verifies that this is a stochastic subset of ${Y|\Omega}$.

Remark 2 One should caution that in the definition of the subset relation ${Y|\Omega \subset X|\Omega}$, it is important that ${\Gamma(Y|E) \subset \Gamma(X|E)}$ for all events ${E}$, not just the global event ${\Omega}$; in particular, just because a stochastic set ${X|\Omega}$ has no global sections, does not mean that it is contained in the stochastic empty set ${\emptyset|\Omega}$.

Now we discuss Boolean operations on stochastic subsets of a given stochastic set ${X|\Omega}$. Given two stochastic subsets ${X_1|\Omega, X_2|\Omega}$ of ${X|\Omega}$, the stochastic intersection ${(X_1 \cap X_2)|\Omega}$ is defined by setting ${\Gamma((X_1 \cap X_2)|E)}$ to be the set of ${x \in \Gamma(X|E)}$ that lie in both ${\Gamma(X_1|E)}$ and ${\Gamma(X_2|E)}$:

$\displaystyle \Gamma(X_1 \cap X_2)|E) := \Gamma(X_1|E) \cap \Gamma(X_2|E).$

This is easily verified to again be a stochastic subset of ${X|\Omega}$. More generally one may define stochastic countable intersections ${(\bigcap_{n=1}^\infty X_n)|\Omega}$ for any sequence ${X_n|\Omega}$ of stochastic subsets of ${X|\Omega}$. One could extend this definition to uncountable families if one wished, but I would advise against it, because some of the usual laws of Boolean algebra (e.g. the de Morgan laws) may break down in this setting.

Stochastic unions are a bit more subtle. The set ${\Gamma((X_1 \cup X_2)|E)}$ should not be defined to simply be the union of ${\Gamma(X_1|E)}$ and ${\Gamma(X_2|E)}$, as this would not respect the gluing axiom. Instead, we define ${\Gamma((X_1 \cup X_2)|E)}$ to be the set of all ${x \in \Gamma(X|E)}$ such that one can cover ${E}$ by measurable subevents ${E_1,E_2}$ such that ${x_i|E_i \in \Gamma(X_i|E_i)}$ for ${i=1,2}$; then ${(X_1 \cup X_2)|\Omega}$ may be verified to be a stochastic subset of ${X|\Omega}$. Thus for instance ${\{0,1\}|\Omega}$ is the stochastic union of ${\{0\}|\Omega}$ and ${\{1\}|\Omega}$. Similarly for countable unions ${(\bigcup_{n=1}^\infty X_n)|\Omega}$ of stochastic subsets ${X_n|\Omega}$ of ${X|\Omega}$, although for uncountable unions are extremely problematic (they are disliked by both the measure theory and the countable gluing axiom) and will not be defined here. Finally, the stochastic difference set ${\Gamma((X_1 \backslash X_2)|E)}$ is defined as the set of all ${x|E}$ in ${\Gamma(X_1|E)}$ such that ${x|F \not \in \Gamma(X_2|F)}$ for any subevent ${F}$ of ${E}$ of positive probability. One may verify that in the case when ${\Omega}$ is discrete and at most countable, these Boolean operations correspond to the classical Boolean operations applied separately to each fibre ${X_{i,\omega}}$ of the relevant sets ${X_i}$. We also leave as an exercise to the reader to verify the usual laws of Boolean arithmetic, e.g. the de Morgan laws, provided that one works with at most countable unions and intersections.

One can also consider a stochastic finite union ${(\bigcup_{n=1}^N X_n)|\Omega}$ in which the number ${N}$ of sets in the union is itself stochastic. More precisely, let ${X|\Omega}$ be a stochastic set, let ${N \in {\bf N}|\Omega}$ be a stochastic natural number, and let ${n \mapsto X_n|\Omega}$ be a stochastic function from the stochastic set ${\{ n \in {\bf N}: n \leq N\}|\Omega}$ (defined by setting ${\Gamma(\{n \in {\bf N}: n\leq N\}|E) := \{ n \in {\bf N}|E: n \leq N|E\}}$)) to the stochastic power set ${2^X|\Omega}$. Here we are considering ${0}$ to be a natural number, to allow for unions that are possibly empty, with ${{\bf N}_+ := {\bf N} \backslash \{0\}}$ used for the positive natural numbers. We also write ${(X_n)_{n=1}^N|\Omega}$ for the stochastic function ${n \mapsto X_n|\Omega}$. Then we can define the stochastic union ${\bigcup_{n=1}^N X_n|\Omega}$ by setting ${\Gamma(\bigcup_{n=1}^N X_n|E)}$ for an event ${E}$ to be the set of local elements ${x \in \Gamma(X|E)}$ with the property that there exists a covering of ${E}$ by measurable subevents ${E_{n_0}}$ for ${n_0 \in {\bf N}_+}$, such that one has ${n_0 \leq N|E_{n_0}}$ and ${x|E_{n_0} \in \Gamma(X_{n_0}|E_{n_0})}$. One can verify that ${\bigcup_{n=1}^N X_n|\Omega}$ is a stochastic set (with the obvious restriction maps). Again, in the model case when ${\Omega}$ is discrete and at most countable, the fibre ${(\bigcup_{n=1}^N X_n)_\omega}$ is what one would expect it to be, namely ${\bigcup_{n=1}^{N(\omega)} (X_n)_\omega}$.

The Cartesian product ${(X \times Y)|\Omega}$ of two stochastic sets may be defined by setting ${\Gamma((X \times Y)|E) := \Gamma(X|E) \times \Gamma(Y|E)}$ for all events ${E}$, with the obvious restriction maps; this is easily seen to be another stochastic set. This lets one define the concept of a ${k}$-ary operation ${f: (X_1 \times \dots \times X_k)|\Omega \rightarrow Y|\Omega}$ from ${k}$ stochastic sets ${X_1,\dots,X_k}$ to another stochastic set ${Y}$, or a ${k}$-ary relation ${R: (X_1 \times \dots \times X_k)|\Omega \rightarrow \{\hbox{true}, \hbox{false}\}|\Omega}$. In particular, given ${x_i \in X_i|\Omega}$ for ${i=1,\dots,k}$, the relation ${R(x_1,\dots,x_k)}$ may be deterministically true, deterministically false, or have some other stochastic truth value.

Remark 3 In the degenerate case when ${\Omega}$ is null, stochastic logic becomes a bit weird: all stochastic statements are deterministically true, as are their stochastic negations, since every event in ${\Omega}$ (even the empty set) now holds with full probability. Among other pathologies, the empty set now has a global element over ${\Omega}$ (this is analogous to the notorious convention ${0^0=1}$), and any two deterministic objects ${x,y}$ become equal over ${\Omega}$: ${x|\Omega=y|\Omega}$.

The following simple observation is crucial to subsequent discussion. If ${(x_n)_{n \in {\bf N}_+}}$ is a sequence taking values in the global elements ${\Gamma(X|\Omega)}$ of a stochastic space ${X|\Omega}$, then we may also define global elements ${x_n \in \Gamma(X|\Omega)}$ for stochastic indices ${n \in {\bf N}_+|\Omega}$ as well, by appealing to the countable gluing axiom to glue together ${x_{n_0}}$ restricted to the set ${\{ \omega \in \Omega: n(\omega) = n_0\}}$ for each deterministic natural number ${n_0}$ to form ${x_n}$. With this definition, the map ${n \mapsto x_n}$ is a stochastic function from ${{\bf N}_+|\Omega}$ to ${X|\Omega}$; indeed, this creates a one-to-one correspondence between external sequences (maps ${n \mapsto x_n}$ from ${{\bf N}_+}$ to ${\Gamma(X|\Omega)}$) and stochastic sequences (stochastic functions ${n \mapsto x_n}$ from ${{\bf N}_+|\Omega}$ to ${X|\Omega}$). Similarly with ${{\bf N}_+}$ replaced by any other at most countable set. This observation will be important in allowing many deterministic arguments involving sequences will be able to be carried over to the stochastic setting.

We now specialise from the extremely broad discipline of set theory to the more focused discipline of real analysis. There are two fundamental axioms that underlie real analysis (and in particular distinguishes it from real algebra). The first is the Archimedean property, which we phrase in the “no infinitesimal” formulation as follows:

Proposition 2 (Archimedean property) Let ${x \in {\bf R}}$ be such that ${x \leq 1/n}$ for all positive natural numbers ${n}$. Then ${x \leq 0}$.

The other is the least upper bound axiom:

Proposition 3 (Least upper bound axiom) Let ${S}$ be a non-empty subset of ${{\bf R}}$ which has an upper bound ${M \in {\bf R}}$, thus ${x \leq M}$ for all ${x \in S}$. Then there exists a unique real number ${\sup S \in {\bf R}}$ with the following properties:

• ${x \leq \sup S}$ for all ${x \in S}$.
• For any real ${L < \sup S}$, there exists ${x \in S}$ such that ${L < x \leq \sup S}$.
• ${\sup S \leq M}$.

Furthermore, ${\sup S}$ does not depend on the choice of ${M}$.

The Archimedean property extends easily to the stochastic setting:

Proposition 4 (Stochastic Archimedean property) Let ${x \in \Gamma({\bf R}|\Omega)}$ be such that ${x \leq 1/n}$ for all deterministic natural numbers ${n}$. Then ${x \leq 0}$.

Remark 4 Here, incidentally, is one place in which this stochastic formalism deviates from the nonstandard analysis formalism, as the latter certainly permits the existence of infinitesimal elements. On the other hand, we caution that stochastic real numbers are permitted to be unbounded, so that formulation of Archimedean property is not valid in the stochastic setting.

The proof is easy and is left to the reader. The least upper bound axiom also extends nicely to the stochastic setting, but the proof requires more work (in particular, our argument uses the monotone convergence theorem):

Theorem 5 (Stochastic least upper bound axiom) Let ${S|\Omega}$ be a stochastic subset of ${{\bf R}|\Omega}$ which has a global upper bound ${M \in {\bf R}|\Omega}$, thus ${x \leq M}$ for all ${x \in \Gamma(S|\Omega)}$, and is globally non-empty in the sense that there is at least one global element ${x \in \Gamma(S|\Omega)}$. Then there exists a unique stochastic real number ${\sup S \in \Gamma({\bf R}|\Omega)}$ with the following properties:

• ${x \leq \sup S}$ for all ${x \in \Gamma(S|\Omega)}$.
• For any stochastic real ${L < \sup S}$, there exists ${x \in \Gamma(S|\Omega)}$ such that ${L < x \leq \sup S}$.
• ${\sup S \leq M}$.

Furthermore, ${\sup S}$ does not depend on the choice of ${M}$.

For future reference, we note that the same result holds with ${{\bf R}}$ replaced by ${{\bf N} \cup \{+\infty\}}$ throughout, since the latter may be embedded in the former, for instance by mapping ${n}$ to ${1 - \frac{1}{n+1}}$ and ${+\infty}$ to ${1}$. In applications, the above theorem serves as a reasonable substitute for the countable axiom of choice, which does not appear to hold in unrestricted generality relative to a measure space; in particular, it can be used to generate various extremising sequences for stochastic functionals on various stochastic function spaces.

Proof: Uniqueness is clear (using the Archimedean property), as well as the independence on ${M}$, so we turn to existence. By using an order-preserving map from ${{\bf R}}$ to ${(-1,1)}$ (e.g. ${x \mapsto \frac{2}{\pi} \hbox{arctan}(x)}$) we may assume that ${S|\Omega}$ is a subset of ${(-1,1)|\Omega}$, and that ${M < 1}$.

We observe that ${\Gamma(S|\Omega)}$ is a lattice: if ${x, y \in \Gamma(S|\Omega)}$, then ${\max(x,y)}$ and ${\min(x,y)}$ also lie in ${\Gamma(S|\Omega)}$. Indeed, ${\max(x,y)}$ may be formed by appealing to the countable gluing axiom to glue ${y}$ (restricted the set ${\{ \omega \in \Omega: x(\omega) < y(\omega) \}}$) with ${x}$ (restricted to the set ${\{ \omega \in \Omega: x(\omega) \geq y(\omega) \}}$), and similarly for ${\min(x,y)}$. (Here we use the fact that relations such as ${<}$ are Borel measurable on ${{\bf R}}$.)

Let ${A \in {\bf R}}$ denote the deterministic quantity

$\displaystyle A := \sup \{ \int_\Omega x(\omega)\ d\mu(\omega): x \in \Gamma(S|\Omega) \}$

then (by Proposition 3!) ${A}$ is well-defined; here we use the hypothesis that ${\mu(\Omega)}$ is finite. Thus we may find a sequence ${(x_n)_{n \in {\bf N}}}$ of elements ${x_n}$ of ${\Gamma(S|\Omega)}$ such that

$\displaystyle \int_\Omega x_n(\omega)\ d\mu(\omega) \rightarrow A \hbox{ as } n \rightarrow \infty. \ \ \ \ \ (1)$

Using the lattice property, we may assume that the ${x_n}$ are non-decreasing: ${x_n \leq x_m}$ whenever ${n \leq m}$. If we then define ${\sup S(\omega) := \sup_n x_n(\omega)}$ (after choosing measurable representatives of each equivalence class ${x_n}$), then ${\sup S}$ is a stochastic real with ${\sup S \leq M}$.

If ${x \in \Gamma(S|\Omega)}$, then ${\max(x,x_n) \in \Gamma(S|\Omega)}$, and so

$\displaystyle \int_\Omega \max(x,x_n)\ d\mu(\omega) \leq A.$

From this and (1) we conclude that

$\displaystyle \int_\Omega \max(x-x_n,0) \rightarrow 0 \hbox{ as } n \rightarrow \infty.$

From monotone convergence, we conclude that

$\displaystyle \int_\Omega \max(x-\sup S,0) = 0$

and so ${x \leq \sup S}$, as required.

Now let ${L < \sup S}$ be a stochastic real. After choosing measurable representatives of each relevant equivalence class, we see that for almost every ${\omega \in \Omega}$, we can find a natural number ${n(\omega)}$ with ${x_{n(\omega)} > L}$. If we choose ${n(\omega)}$ to be the first such positive natural number when it exists, and (say) ${1}$ otherwise, then ${n}$ is a stochastic positive natural number and ${L < x_n}$. The claim follows. $\Box$

Remark 5 One can abstract away the role of the measure ${\mu}$ here, leaving only the ideal of null sets. The property that the measure is finite is then replaced by the more general property that given any non-empty family of measurable sets, there is an at most countable union of sets in that family that is an upper bound modulo null sets for all elements in that faily.

Using Proposition 4 and Theorem 5, one can then revisit many of the other foundational results of deterministic real analysis, and develop stochastic analogues; we give some examples of this below the fold (focusing on the Heine-Borel theorem and a case of the spectral theorem). As an application of this formalism, we revisit some of the Furstenberg-Zimmer structural theory of measure-preserving systems, particularly that of relatively compact and relatively weakly mixing systems, and interpret them in this framework, basically as stochastic versions of compact and weakly mixing systems (though with the caveat that the shift map is allowed to act non-trivially on the underlying probability space). As this formalism is “point-free”, in that it avoids explicit use of fibres and disintegrations, it will be well suited for generalising this structure theory to settings in which the underlying probability spaces are not standard Borel, and the underlying groups are uncountable; I hope to discuss such generalisations in future blog posts.

Remark 6 Roughly speaking, stochastic real analysis can be viewed as a restricted subset of classical real analysis in which all operations have to be “measurable” with respect to the base space. In particular, indiscriminate application of the axiom of choice is not permitted, and one should largely restrict oneself to performing countable unions and intersections rather than arbitrary unions or intersections. Presumably one can formalise this intuition with a suitable “countable transfer principle”, but I was not able to formulate a clean and general principle of this sort, instead verifying various assertions about stochastic objects by hand rather than by direct transfer from the deterministic setting. However, it would be desirable to have such a principle, since otherwise one is faced with the tedious task of redoing all the foundations of real analysis (or whatever other base theory of mathematics one is going to be working in) in the stochastic setting by carefully repeating all the arguments.

More generally, topos theory is a good formalism for capturing precisely the informal idea of performing mathematics with certain operations, such as the axiom of choice, the law of the excluded middle, or arbitrary unions and intersections, being somehow “prohibited” or otherwise “restricted”.

LLet ${L: H \rightarrow H}$ be a self-adjoint operator on a finite-dimensional Hilbert space ${H}$. The behaviour of this operator can be completely described by the spectral theorem for finite-dimensional self-adjoint operators (i.e. Hermitian matrices, when viewed in coordinates), which provides a sequence ${\lambda_1,\ldots,\lambda_n \in {\bf R}}$ of eigenvalues and an orthonormal basis ${e_1,\ldots,e_n}$ of eigenfunctions such that ${L e_i = \lambda_i e_i}$ for all ${i=1,\ldots,n}$. In particular, given any function ${m: \sigma(L) \rightarrow {\bf C}}$ on the spectrum ${\sigma(L) := \{ \lambda_1,\ldots,\lambda_n\}}$ of ${L}$, one can then define the linear operator ${m(L): H \rightarrow H}$ by the formula

$\displaystyle m(L) e_i := m(\lambda_i) e_i,$

which then gives a functional calculus, in the sense that the map ${m \mapsto m(L)}$ is a ${C^*}$-algebra isometric homomorphism from the algebra ${BC(\sigma(L) \rightarrow {\bf C})}$ of bounded continuous functions from ${\sigma(L)}$ to ${{\bf C}}$, to the algebra ${B(H \rightarrow H)}$ of bounded linear operators on ${H}$. Thus, for instance, one can define heat operators ${e^{-tL}}$ for ${t>0}$, Schrödinger operators ${e^{itL}}$ for ${t \in {\bf R}}$, resolvents ${\frac{1}{L-z}}$ for ${z \not \in \sigma(L)}$, and (if ${L}$ is positive) wave operators ${e^{it\sqrt{L}}}$ for ${t \in {\bf R}}$. These will be bounded operators (and, in the case of the Schrödinger and wave operators, unitary operators, and in the case of the heat operators with ${L}$ positive, they will be contractions). Among other things, this functional calculus can then be used to solve differential equations such as the heat equation

$\displaystyle u_t + Lu = 0; \quad u(0) = f \ \ \ \ \ (1)$

the Schrödinger equation

$\displaystyle u_t + iLu = 0; \quad u(0) = f \ \ \ \ \ (2)$

the wave equation

$\displaystyle u_{tt} + Lu = 0; \quad u(0) = f; \quad u_t(0) = g \ \ \ \ \ (3)$

or the Helmholtz equation

$\displaystyle (L-z) u = f. \ \ \ \ \ (4)$

The functional calculus can also be associated to a spectral measure. Indeed, for any vectors ${f, g \in H}$, there is a complex measure ${\mu_{f,g}}$ on ${\sigma(L)}$ with the property that

$\displaystyle \langle m(L) f, g \rangle_H = \int_{\sigma(L)} m(x) d\mu_{f,g}(x);$

indeed, one can set ${\mu_{f,g}}$ to be the discrete measure on ${\sigma(L)}$ defined by the formula

$\displaystyle \mu_{f,g}(E) := \sum_{i: \lambda_i \in E} \langle f, e_i \rangle_H \langle e_i, g \rangle_H.$

One can also view this complex measure as a coefficient

$\displaystyle \mu_{f,g} = \langle \mu f, g \rangle_H$

of a projection-valued measure ${\mu}$ on ${\sigma(L)}$, defined by setting

$\displaystyle \mu(E) f := \sum_{i: \lambda_i \in E} \langle f, e_i \rangle_H e_i.$

Finally, one can view ${L}$ as unitarily equivalent to a multiplication operator ${M: f \mapsto g f}$ on ${\ell^2(\{1,\ldots,n\})}$, where ${g}$ is the real-valued function ${g(i) := \lambda_i}$, and the intertwining map ${U: \ell^2(\{1,\ldots,n\}) \rightarrow H}$ is given by

$\displaystyle U ( (c_i)_{i=1}^n ) := \sum_{i=1}^n c_i e_i,$

so that ${L = U M U^{-1}}$.

It is an important fact in analysis that many of these above assertions extend to operators on an infinite-dimensional Hilbert space ${H}$, so long as one one is careful about what “self-adjoint operator” means; these facts are collectively referred to as the spectral theorem. For instance, it turns out that most of the above claims have analogues for bounded self-adjoint operators ${L: H \rightarrow H}$. However, in the theory of partial differential equations, one often needs to apply the spectral theorem to unbounded, densely defined linear operators ${L: D \rightarrow H}$, which (initially, at least), are only defined on a dense subspace ${D}$ of the Hilbert space ${H}$. A very typical situation arises when ${H = L^2(\Omega)}$ is the square-integrable functions on some domain or manifold ${\Omega}$ (which may have a boundary or be otherwise “incomplete”), and ${D = C^\infty_c(\Omega)}$ are the smooth compactly supported functions on ${\Omega}$, and ${L}$ is some linear differential operator. It is then of interest to obtain the spectral theorem for such operators, so that one build operators such as ${e^{-tL}, e^{itL}, \frac{1}{L-z}, e^{it\sqrt{L}}}$ or to solve equations such as (1), (2), (3), (4).

In order to do this, some necessary conditions on the densely defined operator ${L: D \rightarrow H}$ must be imposed. The most obvious is that of symmetry, which asserts that

$\displaystyle \langle Lf, g \rangle_H = \langle f, Lg \rangle_H \ \ \ \ \ (5)$

for all ${f, g \in D}$. In some applications, one also wants to impose positive definiteness, which asserts that

$\displaystyle \langle Lf, f \rangle_H \geq 0 \ \ \ \ \ (6)$

for all ${f \in D}$. These hypotheses are sufficient in the case when ${L}$ is bounded, and in particular when ${H}$ is finite dimensional. However, as it turns out, for unbounded operators these conditions are not, by themselves, enough to obtain a good spectral theory. For instance, one consequence of the spectral theorem should be that the resolvents ${(L-z)^{-1}}$ are well-defined for any strictly complex ${z}$, which by duality implies that the image of ${L-z}$ should be dense in ${H}$. However, this can fail if one just assumes symmetry, or symmetry and positive definiteness. A well-known example occurs when ${H}$ is the Hilbert space ${H := L^2((0,1))}$, ${D := C^\infty_c((0,1))}$ is the space of test functions, and ${L}$ is the one-dimensional Laplacian ${L := -\frac{d^2}{dx^2}}$. Then ${L}$ is symmetric and positive, but the operator ${L-k^2}$ does not have dense image for any complex ${k}$, since

$\displaystyle \langle (L-\overline{k}^2) f, e^{\overline{k}x} \rangle_H = 0$

for all test functions ${f \in C^\infty_c((0,1))}$, as can be seen from a routine integration by parts. As such, the resolvent map is not everywhere uniquely defined. There is also a lack of uniqueness for the wave, heat, and Schrödinger equations for this operator (note that there are no spatial boundary conditions specified in these equations).

Another example occurs when ${H := L^2((0,+\infty))}$, ${D := C^\infty_c((0,+\infty))}$, ${L}$ is the momentum operator ${L := i \frac{d}{dx}}$. Then the resolvent ${(L-z)^{-1}}$ can be uniquely defined for ${z}$ in the upper half-plane, but not in the lower half-plane, due to the obstruction

$\displaystyle \langle (L-z) f, e^{i \bar{z} x} \rangle_H = 0$

for all test functions ${f}$ (note that the function ${e^{i\bar{z} x}}$ lies in ${L^2((0,+\infty))}$ when ${z}$ is in the lower half-plane). For related reasons, the translation operators ${e^{itL}}$ have a problem with either uniqueness or existence (depending on whether ${t}$ is positive or negative), due to the unspecified boundary behaviour at the origin.

The key property that lets one avoid this bad behaviour is that of essential self-adjointness. Once ${L}$ is essentially self-adjoint, then spectral theorem becomes applicable again, leading to all the expected behaviour (e.g. existence and uniqueness for the various PDE given above).

Unfortunately, the concept of essential self-adjointness is defined rather abstractly, and is difficult to verify directly; unlike the symmetry condition (5) or the positive condition (6), it is not a “local” condition that can be easily verified just by testing ${L}$ on various inputs, but is instead a more “global” condition. In practice, to verify this property, one needs to invoke one of a number of a partial converses to the spectral theorem, which roughly speaking asserts that if at least one of the expected consequences of the spectral theorem is true for some symmetric densely defined operator ${L}$, then ${L}$ is self-adjoint. Examples of “expected consequences” include:

• Existence of resolvents ${(L-z)^{-1}}$ (or equivalently, dense image for ${L-z}$);
• Existence of a contractive heat propagator semigroup ${e^{tL}}$ (in the positive case);
• Existence of a unitary Schrödinger propagator group ${e^{itL}}$;
• Existence of a unitary wave propagator group ${e^{it\sqrt{L}}}$ (in the positive case);
• Existence of a “reasonable” functional calculus.
• Unitary equivalence with a multiplication operator.

Thus, to actually verify essential self-adjointness of a differential operator, one typically has to first solve a PDE (such as the wave, Schrödinger, heat, or Helmholtz equation) by some non-spectral method (e.g. by a contraction mapping argument, or a perturbation argument based on an operator already known to be essentially self-adjoint). Once one can solve one of the PDEs, then one can apply one of the known converse spectral theorems to obtain essential self-adjointness, and then by the forward spectral theorem one can then solve all the other PDEs as well. But there is no getting out of that first step, which requires some input (typically of an ODE, PDE, or geometric nature) that is external to what abstract spectral theory can provide. For instance, if one wants to establish essential self-adjointness of the Laplace-Beltrami operator ${L = -\Delta_g}$ on a smooth Riemannian manifold ${(M,g)}$ (using ${C^\infty_c(M)}$ as the domain space), it turns out (under reasonable regularity hypotheses) that essential self-adjointness is equivalent to geodesic completeness of the manifold, which is a global ODE condition rather than a local one: one needs geodesics to continue indefinitely in order to be able to (unitarily) solve PDEs such as the wave equation, which in turn leads to essential self-adjointness. (Note that the domains ${(0,1)}$ and ${(0,+\infty)}$ in the previous examples were not geodesically complete.) For this reason, essential self-adjointness of a differential operator is sometimes referred to as quantum completeness (with the completeness of the associated Hamilton-Jacobi flow then being the analogous classical completeness).

In these notes, I wanted to record (mostly for my own benefit) the forward and converse spectral theorems, and to verify essential self-adjointness of the Laplace-Beltrami operator on geodesically complete manifolds. This is extremely standard analysis (covered, for instance, in the texts of Reed and Simon), but I wanted to write it down myself to make sure that I really understood this foundational material properly.

In the foundations of modern probability, as laid out by Kolmogorov, the basic objects of study are constructed in the following order:

1. Firstly, one selects a sample space ${\Omega}$, whose elements ${\omega}$ represent all the possible states that one’s stochastic system could be in.
2. Then, one selects a ${\sigma}$-algebra ${{\mathcal B}}$ of events ${E}$ (modeled by subsets of ${\Omega}$), and assigns each of these events a probability ${{\bf P}(E) \in [0,1]}$ in a countably additive manner, so that the entire sample space has probability ${1}$.
3. Finally, one builds (commutative) algebras of random variables ${X}$ (such as complex-valued random variables, modeled by measurable functions from ${\Omega}$ to ${{\bf C}}$), and (assuming suitable integrability or moment conditions) one can assign expectations ${\mathop{\bf E} X}$ to each such random variable.

In measure theory, the underlying measure space ${\Omega}$ plays a prominent foundational role, with the measurable sets and measurable functions (the analogues of the events and the random variables) always being viewed as somehow being attached to that space. In probability theory, in contrast, it is the events and their probabilities that are viewed as being fundamental, with the sample space ${\Omega}$ being abstracted away as much as possible, and with the random variables and expectations being viewed as derived concepts. See Notes 0 for further discussion of this philosophy.

However, it is possible to take the abstraction process one step further, and view the algebra of random variables and their expectations as being the foundational concept, and ignoring both the presence of the original sample space, the algebra of events, or the probability measure.

There are two reasons for wanting to shed (or abstract away) these previously foundational structures. Firstly, it allows one to more easily take certain types of limits, such as the large ${n}$ limit ${n \rightarrow \infty}$ when considering ${n \times n}$ random matrices, because quantities built from the algebra of random variables and their expectations, such as the normalised moments of random matrices tend to be quite stable in the large ${n}$ limit (as we have seen in previous notes), even as the sample space and event space varies with ${n}$. (This theme of using abstraction to facilitate the taking of the large ${n}$ limit also shows up in the application of ergodic theory to combinatorics via the correspondence principle; see this previous blog post for further discussion.)

Secondly, this abstract formalism allows one to generalise the classical, commutative theory of probability to the more general theory of non-commutative probability theory, which does not have a classical underlying sample space or event space, but is instead built upon a (possibly) non-commutative algebra of random variables (or “observables”) and their expectations (or “traces”). This more general formalism not only encompasses classical probability, but also spectral theory (with matrices or operators taking the role of random variables, and the trace taking the role of expectation), random matrix theory (which can be viewed as a natural blend of classical probability and spectral theory), and quantum mechanics (with physical observables taking the role of random variables, and their expected value on a given quantum state being the expectation). It is also part of a more general “non-commutative way of thinking” (of which non-commutative geometry is the most prominent example), in which a space is understood primarily in terms of the ring or algebra of functions (or function-like objects, such as sections of bundles) placed on top of that space, and then the space itself is largely abstracted away in order to allow the algebraic structures to become less commutative. In short, the idea is to make algebra the foundation of the theory, as opposed to other possible choices of foundations such as sets, measures, categories, etc..

[Note that this foundational preference is to some extent a metamathematical one rather than a mathematical one; in many cases it is possible to rewrite the theory in a mathematically equivalent form so that some other mathematical structure becomes designated as the foundational one, much as probability theory can be equivalently formulated as the measure theory of probability measures. However, this does not negate the fact that a different choice of foundations can lead to a different way of thinking about the subject, and thus to ask a different set of questions and to discover a different set of proofs and solutions. Thus it is often of value to understand multiple foundational perspectives at once, to get a truly stereoscopic view of the subject.]

It turns out that non-commutative probability can be modeled using operator algebras such as ${C^*}$-algebras, von Neumann algebras, or algebras of bounded operators on a Hilbert space, with the latter being accomplished via the Gelfand-Naimark-Segal construction. We will discuss some of these models here, but just as probability theory seeks to abstract away its measure-theoretic models, the philosophy of non-commutative probability is also to downplay these operator algebraic models once some foundational issues are settled.

When one generalises the set of structures in one’s theory, for instance from the commutative setting to the non-commutative setting, the notion of what it means for a structure to be “universal”, “free”, or “independent” can change. The most familiar example of this comes from group theory. If one restricts attention to the category of abelian groups, then the “freest” object one can generate from two generators ${e,f}$ is the free abelian group of commutative words ${e^n f^m}$ with ${n,m \in {\bf Z}}$, which is isomorphic to the group ${{\bf Z}^2}$. If however one generalises to the non-commutative setting of arbitrary groups, then the “freest” object that can now be generated from two generators ${e,f}$ is the free group ${{\Bbb F}_2}$ of non-commutative words ${e^{n_1} f^{m_1} \ldots e^{n_k} f^{m_k}}$ with ${n_1,m_1,\ldots,n_k,m_k \in {\bf Z}}$, which is a significantly larger extension of the free abelian group ${{\bf Z}^2}$.

Similarly, when generalising classical probability theory to non-commutative probability theory, the notion of what it means for two or more random variables to be independent changes. In the classical (commutative) setting, two (bounded, real-valued) random variables ${X, Y}$ are independent if one has

$\displaystyle \mathop{\bf E} f(X) g(Y) = 0$

whenever ${f, g: {\bf R} \rightarrow {\bf R}}$ are well-behaved functions (such as polynomials) such that ${\mathop{\bf E} f(X)}$, ${\mathop{\bf E} g(Y)}$ both vanish. In the non-commutative setting, one can generalise the above definition to two commuting bounded self-adjoint variables; this concept is useful for instance in quantum probability, which is an abstraction of the theory of observables in quantum mechanics. But for two (bounded, self-adjoint) non-commutative random variables ${X, Y}$, the notion of classical independence no longer applies. As a substitute, one can instead consider the notion of being freely independent (or free for short), which means that

$\displaystyle \mathop{\bf E} f_1(X) g_1(Y) \ldots f_k(X) g_k(Y) = 0$

whenever ${f_1,g_1,\ldots,f_k,g_k: {\bf R} \rightarrow {\bf R}}$ are well-behaved functions such that all of ${\mathop{\bf E} f_1(X), \mathop{\bf E} g_1(Y), \ldots, \mathop{\bf E} f_k(X), \mathop{\bf E} g_k(Y)}$ vanish.

The concept of free independence was introduced by Voiculescu, and its study is now known as the subject of free probability. We will not attempt a systematic survey of this subject here; for this, we refer the reader to the surveys of Speicher and of Biane. Instead, we shall just discuss a small number of topics in this area to give the flavour of the subject only.

The significance of free probability to random matrix theory lies in the fundamental observation that random matrices which are independent in the classical sense, also tend to be independent in the free probability sense, in the large ${n}$ limit ${n \rightarrow \infty}$. (This is only possible because of the highly non-commutative nature of these matrices; as we shall see, it is not possible for non-trivial commuting independent random variables to be freely independent.) Because of this, many tedious computations in random matrix theory, particularly those of an algebraic or enumerative combinatorial nature, can be done more quickly and systematically by using the framework of free probability, which by design is optimised for algebraic tasks rather than analytical ones.

Much as free groups are in some sense “maximally non-commutative”, freely independent random variables are about as far from being commuting as possible. For instance, if ${X, Y}$ are freely independent and of expectation zero, then ${\mathop{\bf E} XYXY}$ vanishes, but ${\mathop{\bf E} XXYY}$ instead factors as ${(\mathop{\bf E} X^2) (\mathop{\bf E} Y^2)}$. As a consequence, the behaviour of freely independent random variables can be quite different from the behaviour of their classically independent commuting counterparts. Nevertheless there is a remarkably strong analogy between the two types of independence, in that results which are true in the classically independent case often have an interesting analogue in the freely independent setting. For instance, the central limit theorem (Notes 2) for averages of classically independent random variables, which roughly speaking asserts that such averages become gaussian in the large ${n}$ limit, has an analogue for averages of freely independent variables, the free central limit theorem, which roughly speaking asserts that such averages become semicircular in the large ${n}$ limit. One can then use this theorem to provide yet another proof of Wigner’s semicircle law (Notes 4).

Another important (and closely related) analogy is that while the distribution of sums of independent commutative random variables can be quickly computed via the characteristic function (i.e. the Fourier transform of the distribution), the distribution of sums of freely independent non-commutative random variables can be quickly computed using the Stieltjes transform instead (or with closely related objects, such as the ${R}$-transform of Voiculescu). This is strongly reminiscent of the appearance of the Stieltjes transform in random matrix theory, and indeed we will see many parallels between the use of the Stieltjes transform here and in Notes 4.

As mentioned earlier, free probability is an excellent tool for computing various expressions of interest in random matrix theory, such as asymptotic values of normalised moments in the large ${n}$ limit ${n \rightarrow \infty}$. Nevertheless, as it only covers the asymptotic regime in which ${n}$ is sent to infinity while holding all other parameters fixed, there are some aspects of random matrix theory to which the tools of free probability are not sufficient by themselves to resolve (although it can be possible to combine free probability theory with other tools to then answer these questions). For instance, questions regarding the rate of convergence of normalised moments as ${n \rightarrow \infty}$ are not directly answered by free probability, though if free probability is combined with tools such as concentration of measure (Notes 1) then such rate information can often be recovered. For similar reasons, free probability lets one understand the behaviour of ${k^{th}}$ moments as ${n \rightarrow \infty}$ for fixed ${k}$, but has more difficulty dealing with the situation in which ${k}$ is allowed to grow slowly in ${n}$ (e.g. ${k = O(\log n)}$). Because of this, free probability methods are effective at controlling the bulk of the spectrum of a random matrix, but have more difficulty with the edges of that spectrum (as well as with related concepts such as the operator norm, Notes 3) as well as with fine-scale structure of the spectrum. Finally, free probability methods are most effective when dealing with matrices that are Hermitian with bounded operator norm, largely because the spectral theory of bounded self-adjoint operators in the infinite-dimensional setting of the large ${n}$ limit is non-pathological. (This is ultimately due to the stable nature of eigenvalues in the self-adjoint setting; see this previous blog post for discussion.) For non-self-adjoint operators, free probability needs to be augmented with additional tools, most notably by bounds on least singular values, in order to recover the required stability for the various spectral data of random matrices to behave continuously with respect to the large ${n}$ limit. We will discuss this latter point in a later set of notes.

I was asked recently (in relation to my recent work with Van Vu on the spectral theory of random matrices) to explain some standard heuristics regarding how the eigenvalues $\lambda_1,\ldots,\lambda_n$ of an $n \times n$ matrix A behave under small perturbations.  These heuristics can be summarised as follows:

1. For normal matrices (and in particular, unitary or self-adjoint matrices), eigenvalues are very stable under small perturbations.  For more general matrices, eigenvalues can become unstable if there is pseudospectrum present.
2. For self-adjoint (Hermitian) matrices, eigenvalues that are too close together tend to repel quickly from each other under such perturbations.  For more general matrices, eigenvalues can either be repelled or be attracted to each other, depending on the type of perturbation.

In this post I would like to briefly explain why these heuristics are plausible.

We now begin our study of measure-preserving systems $(X, {\mathcal X}, \mu, T)$, i.e. a probability space $(X, {\mathcal X}, \mu)$ together with a probability space isomorphism $T: (X, {\mathcal X}, \mu) \to (X, {\mathcal X}, \mu)$ (thus $T: X \to X$ is invertible, with T and $T^{-1}$ both being measurable, and $\mu(T^n E) = \mu(E)$ for all $E \in {\mathcal X}$ and all n). For various technical reasons it is convenient to restrict to the case when the $\sigma$-algebra ${\mathcal X}$ is separable, i.e. countably generated. One reason for this is as follows:

Exercise 1. Let $(X, {\mathcal X}, \mu)$ be a probability space with ${\mathcal X}$ separable. Then the Banach spaces $L^p(X, {\mathcal X}, \mu)$ are separable (i.e. have a countable dense subset) for every $1 \leq p < \infty$; in particular, the Hilbert space $L^2(X, {\mathcal X}, \mu)$ is separable. Show that the claim can fail for $p = \infty$. (We allow the $L^p$ spaces to be either real or complex valued, unless otherwise specified.) $\diamond$

Remark 1. In practice, the requirement that ${\mathcal X}$ be separable is not particularly onerous. For instance, if one is studying the recurrence properties of a function $f: X \to {\Bbb R}$ on a non-separable measure-preserving system $(X, {\mathcal X}, \mu, T)$, one can restrict ${\mathcal X}$ to the separable sub-$\sigma$-algebra ${\mathcal X}'$ generated by the level sets $\{ x \in X: T^n f(x) > q \}$ for integer n and rational q, thus passing to a separable measure-preserving system $(X, {\mathcal X}', \mu, T)$ on which f is still measurable. Thus we see that in many cases of interest, we can immediately reduce to the separable case. (In particular, for many of the theorems in this course, the hypothesis of separability can be dropped, though we won’t bother to specify for which ones this is the case.) $\diamond$

We are interested in the recurrence properties of sets $E \in {\mathcal X}$ or functions $f \in L^p(X, {\mathcal X}, \mu)$. The simplest such recurrence theorem is

Theorem 1. (Poincaré recurrence theorem) Let $(X,{\mathcal X},\mu,T)$ be a measure-preserving system, and let $E \in {\mathcal X}$ be a set of positive measure. Then $\limsup_{n \to +\infty} \mu( E \cap T^n E ) \geq \mu(E)^2$. In particular, $E \cap T^n E$ has positive measure (and is thus non-empty) for infinitely many n.

(Compare with Theorem 1 of Lecture 3.)

Proof. For any integer $N > 1$, observe that $\int_X \sum_{n=1}^N 1_{T^n E}\ d\mu = N \mu(E)$, and thus by Cauchy-Schwarz

$\int_X (\sum_{n=1}^N 1_{T^n E})^2\ d\mu \geq N^2 \mu(E)^2.$ (1)

The left-hand side of (1) can be rearranged as

$\sum_{n=1}^N \sum_{m=1}^N \mu( T^n E \cap T^m E ).$ (2)

On the other hand, $\mu( T^n E \cap T^m E) = \mu( E \cap T^{m-n} E )$. From this one easily obtains the asymptotic

$(2)\leq (\limsup_{n \to \infty} \mu( E \cap T^n E ) + o(1)) N^2,$ (3)

where o(1) denotes an expression which goes to zero as N goes to infinity. Combining (1), (2), (3) and taking limits as $N \to +\infty$ we obtain

$\limsup_{n \to \infty} \mu( E \cap T^n E ) \geq \mu(E)^2$ (4)

as desired. $\Box$

Remark 2. In classical physics, the evolution of a physical system in a compact phase space is given by a (continuous-time) measure-preserving system (this is Hamilton’s equations of motion combined with Liouville’s theorem). The Poincaré recurrence theorem then has the following unintuitive consequence: every collection E of states of positive measure, no matter how small, must eventually return to overlap itself given sufficient time. For instance, if one were to burn a piece of paper in a closed system, then there exist arbitrarily small perturbations of the initial conditions such that, if one waits long enough, the piece of paper will eventually reassemble (modulo arbitrarily small error)! This seems to contradict the second law of thermodynamics, but the reason for the discrepancy is because the time required for the recurrence theorem to take effect is inversely proportional to the measure of the set E, which in physical situations is exponentially small in the number of degrees of freedom (which is already typically quite large, e.g. of the order of the Avogadro constant). This gives more than enough opportunity for Maxwell’s demon to come into play to reverse the increase of entropy. (This can be viewed as a manifestation of the curse of dimensionality.) The more sophisticated recurrence theorems we will see later have much poorer quantitative bounds still, so much so that they basically have no direct significance for any physical dynamical system with many relevant degrees of freedom. $\diamond$

Exercise 2. Prove the following generalisation of the Poincaré recurrence theorem: if $(X, {\mathcal X}, \mu, T)$ is a measure-preserving system and $f \in L^1(X, {\mathcal X},\mu)$ is non-negative, then $\limsup_{n \to +\infty} \int_X f T^n f \geq (\int_X f\ d\mu)^2$. $\diamond$

Exercise 3. Give examples to show that the quantity $\mu(E)^2$ in the conclusion of Theorem 1 cannot be replaced by any larger quantity in general, regardless of the actual value of $\mu(E)$. (Hint: use a Bernoulli system example.) $\diamond$

Exercise 4. Using the pigeonhole principle instead of the Cauchy-Schwarz inequality (and in particular, the statement that if $\mu(E_1) + \ldots + \mu(E_n) > 1$, then the sets $E_1,\ldots,E_n$ cannot all be disjoint), prove the weaker statement that for any set E of positive measure in a measure-preserving system, the set $E \cap T^n E$ is non-empty for infinitely many n. (This exercise illustrates the general point that the Cauchy-Schwarz inequality can be viewed as a quantitative strengthening of the pigeonhole principle.) $\diamond$

For this lecture and the next we shall study several variants of the Poincaré recurrence theorem. We begin by looking at the mean ergodic theorem, which studies the limiting behaviour of the ergodic averages $\frac{1}{N} \sum_{n=1}^N T^n f$ in various $L^p$ spaces, and in particular in $L^2$.