
In this course so far, we have focused primarily on one specific example of a countably additive measure, namely Lebesgue measure. This measure was constructed from a more primitive concept of Lebesgue outer measure, which in turn was constructed from the even more primitive concept of elementary measure.
It turns out that both of these constructions can be abstracted. In this set of notes, we will give the Carathéodory lemma, which constructs a countably additive measure from any abstract outer measure; this generalises the construction of Lebesgue measure from Lebesgue outer measure. One can in turn construct outer measures from another concept known as a pre-measure, of which elementary measure is a typical example.
With these tools, one can start constructing many more measures, such as Lebesgue-Stieltjes measures, product measures, and Hausdorff measures. With a little more effort, one can also establish the Kolmogorov extension theorem, which allows one to construct a variety of measures on infinite-dimensional spaces, and is of particular importance in the foundations of probability theory, as it allows one to set up probability spaces associated to both discrete and continuous random processes, even if they have infinite length.
The most important result about product measure, beyond the fact that it exists, is that one can use it to evaluate iterated integrals, and to interchange their order, provided that the integrand is either unsigned or absolutely integrable. This fact is known as the Fubini-Tonelli theorem, and is an absolutely indispensable tool for computing integrals, and for deducing higher-dimensional results from lower-dimensional ones.
We remark that these notes omit a very important way to construct measures, namely the Riesz representation theorem, but we will defer discussion of this theorem to 245B.
This is the final set of notes in this sequence. If time permits, the course will then begin covering the 245B notes, starting with the material on signed measures and the Radon-Nikodym-Lebesgue theorem.

Let ${n}$ be a large integer, and let ${M_n}$ be the Gaussian Unitary Ensemble (GUE), i.e. the random Hermitian matrix with probability distribution

$\displaystyle C_n e^{-\hbox{tr}(M_n^2)/2} dM_n$

where ${dM_n}$ is a Haar measure on Hermitian matrices and ${C_n}$ is the normalisation constant required to give the distribution unit mass. The eigenvalues ${\lambda_1 < \ldots < \lambda_n}$ of this matrix are then a coupled family of ${n}$ real random variables. For any ${1 \leq k \leq n}$, we can define the ${k}$-point correlation function ${\rho_k( x_1,\ldots,x_k )}$ to be the unique symmetric measure on ${{\bf R}^k}$ such that

$\displaystyle \int_{{\bf R}^k} F(x_1,\ldots,x_k) \rho_k(x_1,\ldots,x_k) = {\bf E} \sum_{1 \leq i_1 < \ldots < i_k \leq n} F( \lambda_{i_1}, \ldots, \lambda_{i_k} ).$

A standard computation (given for instance in these lecture notes of mine) gives the Ginibre formula

$\displaystyle \rho_n(x_1,\ldots,x_n) = C'_n (\prod_{1 \leq i < j \leq n} |x_i-x_j|^2) e^{-\sum_{j=1}^n |x_j|^2/2}$

for the ${n}$-point correlation function, where ${C'_n}$ is another normalisation constant. Using Vandermonde determinants, one can rewrite this expression in determinantal form as

$\displaystyle \rho_n(x_1,\ldots,x_n) = C''_n \det(K_n(x_i,x_j))_{1 \leq i, j \leq n}$

where the kernel ${K_n}$ is given by

$\displaystyle K_n(x,y) := \sum_{k=0}^{n-1} \phi_k(x) \phi_k(y)$

where ${\phi_k(x) := P_k(x) e^{-x^2/4}}$ and ${P_0, P_1, \ldots}$ are the (${L^2}$-normalised) Hermite polynomials (thus the ${\phi_k}$ are an orthonormal family, with each ${P_k}$ being a polynomial of degree ${k}$). Integrating out one or more of the variables, one is led to the Gaudin-Mehta formula

$\displaystyle \rho_k(x_1,\ldots,x_k) = \det(K_n(x_i,x_j))_{1 \leq i, j \leq k}. \ \ \ \ \ (1)$

(In particular, the normalisation constant ${C''_n}$ in the previous formula turns out to simply be equal to ${1}$.) Again, see these lecture notes for details.
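The kernel ${K_n}$ is easy to build numerically. The following sketch (not part of the original derivation; it uses numpy's probabilists' Hermite polynomials, normalised by hand) constructs the ${\phi_k}$ and checks the consequence of (1) with ${k=1}$: the one-point function ${\rho_1(x) = K_n(x,x)}$ should integrate to ${n}$, the total number of eigenvalues.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval
from math import factorial, pi, sqrt

def phi(k, x):
    # phi_k(x) = P_k(x) exp(-x^2/4), where P_k is the probabilists'
    # Hermite polynomial He_k normalised to have unit L^2 norm
    c = np.zeros(k + 1)
    c[k] = 1.0
    return hermeval(x, c) * np.exp(-x**2 / 4) / sqrt(sqrt(2 * pi) * factorial(k))

def K(n, x, y):
    # the kernel K_n(x, y) = sum_{k=0}^{n-1} phi_k(x) phi_k(y)
    return sum(phi(k, x) * phi(k, y) for k in range(n))

# by (1) with k = 1, the one-point function rho_1(x) = K_n(x, x)
# should integrate to n, the expected number of eigenvalues
n = 8
x = np.linspace(-10.0, 10.0, 4001)
dx = x[1] - x[0]
total = float(np.sum(K(n, x, x)) * dx)
print(total)  # ≈ 8
```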

The functions ${\phi_k(x)}$ can be viewed as an orthonormal basis of eigenfunctions for the harmonic oscillator operator

$\displaystyle L \phi := (-\frac{d^2}{dx^2} + \frac{x^2}{4})\phi;$

indeed it is a classical fact that

$\displaystyle L \phi_k = (k + \frac{1}{2}) \phi_k.$

As such, the kernel ${K_n}$ can be viewed as the integral kernel of the spectral projection operator ${1_{(-\infty,n]}(L)}$ (which captures precisely the eigenvalues ${k+\frac{1}{2}}$ with ${0 \leq k \leq n-1}$).
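The eigenfunction relation above can be checked symbolically for small ${k}$. A quick sketch with sympy (whose built-in `hermite` is the physicists' polynomial ${H_k}$; the standard conversion ${He_k(x) = 2^{-k/2} H_k(x/\sqrt{2})}$ recovers the probabilists' version, and normalisation constants are irrelevant to the eigen-equation):

```python
import sympy as sp

x = sp.symbols('x')
for k in range(5):
    # probabilists' Hermite polynomial He_k via the physicists' H_k:
    # He_k(x) = 2^{-k/2} H_k(x / sqrt(2))
    He_k = 2**sp.Rational(-k, 2) * sp.hermite(k, x / sp.sqrt(2))
    phi_k = He_k * sp.exp(-x**2 / 4)
    # L phi = -phi'' + (x^2/4) phi should equal (k + 1/2) phi
    L_phi = -sp.diff(phi_k, x, 2) + x**2 / 4 * phi_k
    assert sp.simplify(L_phi - (k + sp.Rational(1, 2)) * phi_k) == 0
print("L phi_k = (k + 1/2) phi_k verified for k = 0, ..., 4")
```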

From (1) we see that the fine-scale structure of the eigenvalues of GUE is controlled by the asymptotics of ${K_n}$ as ${n \rightarrow \infty}$. The two main asymptotics of interest are given by the following lemmas:

Lemma 1 (Asymptotics of ${K_n}$ in the bulk) Let ${x_0 \in (-2,2)}$, and let ${\rho_{sc}(x_0) := \frac{1}{2\pi} (4-x_0^2)^{1/2}_+}$ be the semicircular law density at ${x_0}$. Then, we have

$\displaystyle \frac{1}{\sqrt{n} \rho_{sc}(x_0)} K_n( x_0 \sqrt{n} + \frac{y}{\sqrt{n} \rho_{sc}(x_0)}, x_0 \sqrt{n} + \frac{z}{\sqrt{n} \rho_{sc}(x_0)} )$

$\displaystyle \rightarrow \frac{\sin(\pi(y-z))}{\pi(y-z)} \ \ \ \ \ (2)$

as ${n \rightarrow \infty}$ for any fixed ${y,z \in {\bf R}}$ (removing the singularity at ${y=z}$ in the usual manner).
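One can probe the bulk limit (2) numerically on the diagonal ${y=z}$, where the sine kernel equals ${1}$: the kernel at the bulk point ${x_0\sqrt{n}}$, normalised by the local eigenvalue density ${\sqrt{n} \rho_{sc}(x_0)}$, should tend to ${1}$. A sketch (rebuilding the Hermite functions with numpy, as an illustration rather than a proof):

```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval
from math import factorial, pi, sqrt

def phi(k, x):
    # L^2-normalised probabilists' Hermite function phi_k(x) = P_k(x) e^{-x^2/4}
    c = np.zeros(k + 1)
    c[k] = 1.0
    return hermeval(x, c) * np.exp(-x**2 / 4) / sqrt(sqrt(2 * pi) * factorial(k))

def K(n, x, y):
    return sum(phi(k, x) * phi(k, y) for k in range(n))

# normalised kernel on the diagonal at the bulk point x0 = 0,
# where rho_sc(0) = 1/pi; the sine kernel at y = z equals 1
n = 100
x0 = 0.0
rho = sqrt(4 - x0**2) / (2 * pi)
val = float(K(n, x0 * sqrt(n), x0 * sqrt(n)) / (sqrt(n) * rho))
print(val)  # close to 1 for large n
```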

Lemma 2 (Asymptotics of ${K_n}$ at the edge) We have

$\displaystyle \frac{1}{n^{1/6}} K_n( 2\sqrt{n} + \frac{y}{n^{1/6}}, 2\sqrt{n} + \frac{z}{n^{1/6}} )$

$\displaystyle \rightarrow \frac{Ai(y) Ai'(z) - Ai'(y) Ai(z)}{y-z} \ \ \ \ \ (3)$

as ${n \rightarrow \infty}$ for any fixed ${y,z \in {\bf R}}$, where ${Ai}$ is the Airy function

$\displaystyle Ai(x) := \frac{1}{\pi} \int_0^\infty \cos( \frac{t^3}{3} + tx )\ dt$

and again removing the singularity at ${y=z}$ in the usual manner.
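As a quick consistency check on the Airy function appearing in (3), one can verify numerically the classical ODE ${Ai''(x) = x\, Ai(x)}$ (a standard fact, though not stated above), using scipy's `airy`, which returns the tuple ${(Ai, Ai', Bi, Bi')}$:

```python
from scipy.special import airy

# verify Ai''(x) = x Ai(x) by finite-differencing Ai'
x, h = 1.0, 1e-5
ai = airy(x)[0]
ai_second = (airy(x + h)[1] - airy(x - h)[1]) / (2 * h)
print(abs(ai_second - x * ai))  # tiny
```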

The proof of these asymptotics usually proceeds via computing the asymptotics of Hermite polynomials, together with the Christoffel-Darboux formula; this is for instance the approach taken in the previous notes. However, there is a slightly different approach that is closer in spirit to the methods of semi-classical analysis, which was briefly mentioned in the previous notes but not elaborated upon. For sake of completeness, I am recording some notes on this approach here, although to focus on the main ideas I will not be completely rigorous in the derivation (ignoring issues such as convergence of integrals or of operators, or (removable) singularities in kernels caused by zeroes in the denominator).

This is going to be a somewhat experimental post. In class, I mentioned that when solving the type of homework problems encountered in a graduate real analysis course, there are really only about a dozen or so basic tricks and techniques that are used over and over again. But I had not thought to actually try to make these tricks explicit, so I am going to try to compile here a list of some of these techniques. But this list is going to be far from exhaustive; if other recent students of real analysis would like to share their own methods, I encourage them to do so in the comments (even – or especially – if the techniques are somewhat vague and general in nature).

(See also the Tricki for some general mathematical problem solving tips.  Once this page matures somewhat, I might migrate it to the Tricki.)

Note: the tricks occur here in no particular order, reflecting the stream-of-consciousness way in which they were arrived at.  Indeed, this list will be extended on occasion whenever I find another trick that can be added to this list.

Emmanuel Breuillard, Ben Green, Robert Guralnick, and I have just uploaded to the arxiv our paper “Strongly dense free subgroups of semisimple algebraic groups“, submitted to Israel J. Math. This paper was originally motivated by (and provides a key technical tool for) another forthcoming paper of ours, on expander Cayley graphs in finite simple groups of Lie type, but also has some independent interest due to connections with other topics, such as the Banach-Tarski paradox.

Recall that one of the basic facts underlying the Banach-Tarski paradox is that the rotation group $O(3)$ contains a copy of the free non-abelian group $F_2$ on two generators; thus there exist $a, b \in O(3)$ such that $a,b$ obey no nontrivial word identities.  In fact, using basic algebraic geometry, one can then deduce that a generic pair $(a,b)$ of group elements $a, b \in O(3)$ has this property, where for the purposes of this paper “generic” means “outside of at most countably many algebraic subvarieties of strictly smaller dimension”.    (In particular, using Haar measure on $O(3)$, almost every pair has this property.)  In fact one has a stronger property: given any non-trivial word $w \in F_2$, the associated word map $(a,b) \mapsto w(a,b)$ from $O(3) \times O(3)$ to $O(3)$ is a dominant map, which means that its image is Zariski-dense.  More succinctly, if $(a,b)$ is generic, then $w(a,b)$ is generic also.
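The free pair in $O(3)$ can be made completely explicit: it is classical (cf. Świerczkowski's work related to the Banach-Tarski paradox) that two rotations through the angle $\arccos(3/5)$ about perpendicular axes generate a free group, since $3/5$ is a rational cosine outside $\{0, \pm 1/2, \pm 1\}$. The sketch below does not prove freeness, but uses exact rational arithmetic to verify that one sample word, the commutator $aba^{-1}b^{-1}$, is a nontrivial rotation:

```python
from fractions import Fraction

F = Fraction
# rotation by arccos(3/5) about the z-axis, and about the x-axis
a = [[F(3, 5), F(-4, 5), F(0)], [F(4, 5), F(3, 5), F(0)], [F(0), F(0), F(1)]]
b = [[F(1), F(0), F(0)], [F(0), F(3, 5), F(-4, 5)], [F(0), F(4, 5), F(3, 5)]]

def mul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def inv(r):
    # rotations are orthogonal, so the inverse is the transpose
    return [[r[j][i] for j in range(3)] for i in range(3)]

identity = [[F(int(i == j)) for j in range(3)] for i in range(3)]
# the word w(a,b) = a b a^{-1} b^{-1}, evaluated exactly
w = mul(mul(a, b), mul(inv(a), inv(b)))
print(w == identity)  # False: this word identity fails
```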

In contrast, if one were working in a solvable, nilpotent, or abelian group (such as $O(2)$), then this property would not hold, since every subgroup of a solvable group is still solvable and thus not free (and similarly for nilpotent or abelian groups).  (This already goes a long way to explain why the Banach-Tarski paradox holds in three or more dimensions, but not in two or fewer.)  On the other hand, a famous result of Borel asserts that for any semisimple Lie group $G$ (over an algebraically closed field), and any nontrivial word $w \in F_2$, the word map $w: G \times G \to G$ is dominant, thus generalising the preceding discussion for $O(3)$.  (There is also the even more famous Tits alternative, that asserts that any linear group that is not (virtually) solvable will contain a copy of the free group $F_2$; as pointed out to me by Michael Cowling, this already shows that generic pairs of generators will generate a free group, and with a little more effort one can even show that it generates a Zariski-dense free group.)

Now suppose we take two words $w_1, w_2 \in F_2$, and look at the double word map $(w_1,w_2): G \times G \to G \times G$ on a semisimple Lie group $G$.  If $w_1, w_2$ are non-trivial, then Borel’s theorem tells us that each component of this map is dominant, but this does not mean that the entire map is dominant, because there could be constraints between $w_1(a,b)$ and $w_2(a,b)$.  For instance, if the two words $w_1, w_2$ commute, then $w_1(a,b), w_2(a,b)$ must also commute and so the image of the double word map is not Zariski-dense.  But there are also non-commuting examples of non-trivial constraints: for instance, if $w_1, w_2$ are conjugate, then $w_1(a,b), w_2(a,b)$ must also be conjugate, which is also a constraint that obstructs dominance.

It is still not clear exactly what pairs of words $w_1, w_2$ have the dominance property.  However, we are able to establish that all pairs of non-commuting words have a weaker property than dominance:

Theorem. Let $w_1, w_2 \in F_2$ be non-commuting words, and let $a, b$ be generic elements of a semisimple Lie group $G$ over an algebraically closed field.  Then $w_1(a,b), w_2(a,b)$ generate a Zariski-dense subgroup of $G$.

To put it another way, $G$ not only contains free subgroups, but contains what we call strongly dense free subgroups: free subgroups such that any two non-commuting elements generate a Zariski-dense subgroup.

Our initial motivation for this theorem is its implications for finite simple groups $G$ of Lie type.  Roughly speaking, one can use this theorem to show that a generic random walk in such a group cannot be trapped in a (bounded complexity) proper algebraic subgroup $H$ of $G$, and this “escape from subgroups” fact is a key ingredient in our forthcoming paper in which we demonstrate that random Cayley graphs in such groups are expander graphs.

It also has implications for results of Banach-Tarski type; it shows that for any semisimple Lie group G, and for generic $a, b \in G$, one can use $a, b$ to create Banach-Tarski paradoxical decompositions for all homogeneous spaces of $G$.  In particular there is one pair of $a,b$ that gives paradoxical decompositions for all homogeneous spaces simultaneously.

Our argument is based on a concept that we call “degeneration”.  Let $a, b$ be generic elements of $G$, and suppose for contradiction that $w_1(a,b), w_2(a,b)$ generated a group whose Zariski closure was conjugate to a proper algebraic subgroup $H$ of $G$.  Borel’s theorem lets us show that $w_1(a,b)$, $w_2(a,b)$, and $[w_1(a,b), w_2(a,b)]$ each generate a maximal torus of $G$, which by basic algebraic group theory can be used to show that $H$ must be a proper semisimple subgroup of $G$ of maximal rank.  If we were in the model case $G = SL_n$, then we would already be done, as there are no such maximal rank semisimple subgroups; but in the other groups, such proper maximal rank semisimple subgroups unfortunately exist.  Fortunately, they have been completely classified, and we take advantage of this classification in our argument.

The degeneration argument comes in as follows.  Let $(a,b)$ be a non-generic pair in $G \times G$.  Then $(a,b)$ lies in the Zariski closure of the generic pairs, which means that $(w_1(a,b), w_2(a,b))$ lies in the Zariski closure of the set formed by $H \times H$ and its conjugates.  In particular, if the non-generic pair is such that $w_1(a,b), w_2(a,b)$ generate a group that is dense in some proper algebraic subgroup $H'$, then $H' \times H'$ is in the Zariski closure of the union of the conjugates of $H \times H$.  When this happens, we say that $H'$ is a degeneration of $H$.  (For instance, $H$ could be the stabiliser of some non-degenerate quadratic form, and $H'$ could be the stabiliser of a degenerate limit of that form.)

The key fact we need (that relies on the classification, and a certain amount of representation theory) is:

Proposition. Given any proper semisimple maximal rank subgroup $H$ of $G$, there exists another proper semisimple subgroup $H'$ that is not a degeneration of $H$.

Using an induction hypothesis, we can find pairs $(a,b)$ such that $w_1(a,b), w_2(a,b)$ generate a dense subgroup of $H'$, which together with the preceding discussion contradicts the proposition.

The proposition is currently proven by using some known facts about certain representation-theoretic invariants of all the semisimple subgroups of the classical and exceptional simple Lie groups.  While the proof is of finite length, it is not particularly elegant, ultimately relying on the numerical value of one or more  invariants of $H$ being sufficiently different from their counterparts for $H'$ that one can prevent the latter being a degeneration of the former.  Perhaps there is another way to proceed here that is not based so heavily on classification.

One notable feature of mathematical reasoning is the reliance on counterfactual thinking – taking a hypothesis (or set of hypotheses) which may or may not be true, and following it (or them) to its logical conclusion.  For instance, most propositions in mathematics start with a set of hypotheses (e.g. “Let $n$ be a natural number such that …”), which may or may not apply to the particular value of $n$ one may have in mind.  Or, if one ever argues by dividing into separate cases (e.g. “Case 1: $n$ is even. … Case 2: $n$ is odd.  …”), then for any given $n$, at most one of these cases would actually be applicable, with the other cases being counterfactual alternatives.     But the purest example of counterfactual thinking in mathematics comes when one employs a proof by contradiction (or reductio ad absurdum) – one introduces a hypothesis that in fact has no chance of being true at all (e.g. “Suppose for sake of contradiction that $\sqrt{2}$ is equal to the ratio $p/q$ of two natural numbers.”), and proceeds to demonstrate this fact by showing that this hypothesis leads to absurdity.

Experienced mathematicians are so used to this type of counterfactual thinking that it is sometimes difficult for them to realise that this type of thinking is not automatically intuitive for students or non-mathematicians, who can anchor their thinking on the single, “real” world to the extent that they cannot easily consider hypothetical alternatives.  This can lead to confused exchanges such as the following:

Lecturer: “Theorem.  Let $p$ be a prime number.  Then…”

Student: “But how do you know that $p$ is a prime number?  Couldn’t it be composite?”

or

Lecturer: “Now we see what the function $f$ does when we give it the input of $x+dx$ instead.  …”

Student: “But didn’t you just say that the input was equal to $x$ just a moment ago?”

This is not to say that counterfactual thinking is not encountered at all outside of mathematics.  For instance, an obvious source of counterfactual thinking occurs in fictional writing or film, particularly in speculative fiction such as science fiction, fantasy, or alternate history.  Here, one can certainly take one or more counterfactual hypotheses (e.g. “what if magic really existed?”) and follow them to see what conclusions would result.  The analogy between this and mathematical counterfactual reasoning is not perfect, of course: in fiction, consequences are usually not logically entailed by their premises, but are instead driven by more contingent considerations, such as the need to advance the plot, to entertain or emotionally affect the reader, or to make some moral or ideological point, and these types of narrative elements are almost completely absent in mathematical writing.  Nevertheless, the analogy can be somewhat helpful when one is first coming to terms with mathematical reasoning.  For instance, the mathematical concept of a proof by contradiction can be viewed as roughly analogous in some ways to such literary concepts as satire, dark humour, or absurdist fiction, in which one takes a premise specifically with the intent to derive absurd consequences from it.  And if the proof of (say) a lemma is analogous to a short story, then the statement of that lemma can be viewed as analogous to the moral of that story.

Another source of counterfactual thinking outside of mathematics comes from simulation, when one feeds some initial data or hypotheses (that may or may not correspond to what actually happens in the real world) into a simulated environment (e.g. a piece of computer software, a laboratory experiment, or even just a thought-experiment), and then runs the simulation to see what consequences result from these hypotheses.   Here, proof by contradiction is roughly analogous to the “garbage in, garbage out” phenomenon that is familiar to anyone who has worked with computers: if one’s initial inputs to a simulation are not consistent with the hypotheses of that simulation, or with each other, one can obtain bizarrely illogical (and sometimes unintentionally amusing) outputs as a result; and conversely, such outputs can be used to detect and diagnose problems with the data, hypotheses, or implementation of the simulation.

Despite the presence of these non-mathematical analogies, though, proofs by contradiction are still often viewed with suspicion and unease by many students of mathematics.  Perhaps the quintessential example of this is the standard proof of Cantor’s theorem that the set ${\bf R}$ of real numbers is uncountable.  This is about as short and as elegant a proof by contradiction as one can have without being utterly trivial, and despite this (or perhaps because of this) it seems to offend the reason of many people when they are first exposed to it, to an extent far greater than most other results in mathematics.  (The only other two examples I know of that come close to doing this are the fact that the real number $0.999\ldots$ is equal to 1, and the solution to the blue-eyed islanders puzzle.)

Some time ago on this blog, I collected a family of well-known results in mathematics that were proven by contradiction, and specifically by a type of argument that I called the “no self-defeating object” argument; that any object that was so ridiculously overpowered that it could be used to “defeat” its own existence, could not actually exist.  Many basic results in mathematics can be phrased in this manner: not only Cantor’s theorem, but Euclid’s theorem on the infinitude of primes, Gödel’s incompleteness theorem, or the conclusion (from Russell’s paradox) that the class of all sets cannot itself be a set.

I presented each of these arguments in the usual “proof by contradiction” manner; I made the counterfactual hypothesis that the impossibly overpowered object existed, and then used this to eventually derive a contradiction.  Mathematically, there is nothing wrong with this reasoning, but because the argument spends almost its entire duration inside the bizarre counterfactual universe caused by an impossible hypothesis, readers who are not experienced with counterfactual thinking may view these arguments with unease.

It was pointed out to me, though (originally with regards to Euclid’s theorem, but the same point in fact applies to the other results I presented) that one can pull a large fraction of each argument out of this counterfactual world, so that one can see most of the argument directly, without the need for any intrinsically impossible hypotheses.  This is done by converting the “no self-defeating object” argument into a logically equivalent “any object can be defeated” argument, with the former then being viewed as an immediate corollary of the latter.  This change is almost trivial to enact (it is often little more than just taking the contrapositive of the original statement), but it does offer a slightly different “non-counterfactual” (or more precisely, “not necessarily counterfactual”) perspective on these arguments which may assist in understanding how they work.

For instance, consider the very first no-self-defeating result presented in the previous post:

Proposition 1 (No largest natural number). There does not exist a natural number $N$ that is larger than all the other natural numbers.

This is formulated in the “no self-defeating object” formulation.  But it has a logically equivalent “any object can be defeated” form:

Proposition 1′. Given any natural number $N$, one can find another natural number $N'$ which is larger than $N$.

Proof. Take $N' := N+1$. $\Box$

While Proposition 1 and Proposition 1′ are logically equivalent to each other, note one key difference: Proposition 1′ can be illustrated with examples (e.g. take $N = 100$, so that the proof gives $N'=101$ ), whilst Proposition 1 cannot (since there is, after all, no such thing as a largest natural number).  So there is a sense in which Proposition 1′ is more “non-counterfactual” or  “constructive” than the “counterfactual” Proposition 1.

In a similar spirit, Euclid’s theorem (which we give using the numbering from the previous post),

Proposition 3. There are infinitely many primes.

can be recast in “all objects can be defeated” form as

Proposition 3′.  Let $p_1,\ldots,p_n$ be a collection of primes.   Then there exists a prime $q$ which is distinct from any of the primes $p_1,\ldots,p_n$.

Proof. Take $q$ to be any prime factor of $p_1 \ldots p_n + 1$ (for instance, one could take the smallest prime factor, if one wished to be completely concrete).   Since $p_1 \ldots p_n + 1$ is not divisible by any of the primes $p_1,\ldots,p_n$, $q$ must be distinct from all of these primes.  $\Box$

One could argue that  there was a slight use of proof by contradiction in the proof of Proposition 3′ (because one had to briefly entertain and then rule out the counterfactual possibility that $q$ was equal to one of the $p_1,\ldots,p_n$), but the proposition itself is not inherently counterfactual, as  it does not make as patently impossible a hypothesis as a finite enumeration of the primes.  Incidentally, it can be argued that the proof of Proposition 3′ is closer in spirit to Euclid’s original proof of his theorem, than the proof of Proposition 3 that is usually given today.  Again, Proposition 3′ is “constructive”; one can apply it to any finite list of primes, say $2, 3, 5$, and it will actually exhibit a prime not in that list (in this case, $31$).  The same cannot be said of Proposition 3, despite the logical equivalence of the two statements.
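The constructive content of Proposition 3′ is easily implemented; the short sketch below takes the smallest prime factor of $p_1 \ldots p_n + 1$ by trial division, as suggested in the proof (this is an illustrative implementation, not part of the original text):

```python
def new_prime(primes):
    # Given a finite list of primes, exhibit a prime outside it:
    # the smallest prime factor of their product plus one.
    m = 1
    for p in primes:
        m *= p
    m += 1
    d = 2
    while d * d <= m:
        if m % d == 0:
            return d  # smallest prime factor of m
        d += 1
    return m  # m itself is prime

print(new_prime([2, 3, 5]))  # 31, as in the text: 2*3*5 + 1 = 31 is prime
```

Note that the prime produced need not be larger than the input primes: for the list $2, 3, 5, 7, 11, 13$, the product plus one is $30031 = 59 \cdot 509$, and the procedure returns $59$.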

[Note: the article below may make more sense if one first reviews the previous blog post on the “no self-defeating object”.  For instance, the section and theorem numbering here is deliberately chosen to match that of the preceding post.]

Let ${[a,b]}$ be a compact interval of positive length (thus ${-\infty < a < b < +\infty}$). Recall that a function ${F: [a,b] \rightarrow {\bf R}}$ is said to be differentiable at a point ${x \in [a,b]}$ if the limit

$\displaystyle F'(x) := \lim_{y \rightarrow x; y \in [a,b] \backslash \{x\}} \frac{F(y)-F(x)}{y-x} \ \ \ \ \ (1)$

exists. In that case, we call ${F'(x)}$ the strong derivative, classical derivative, or just derivative for short, of ${F}$ at ${x}$. We say that ${F}$ is everywhere differentiable, or differentiable for short, if it is differentiable at all points ${x \in [a,b]}$, and differentiable almost everywhere if it is differentiable at almost every point ${x \in [a,b]}$. If ${F}$ is differentiable everywhere and its derivative ${F'}$ is continuous, then we say that ${F}$ is continuously differentiable.

Remark 1 Much later in this sequence, when we cover the theory of distributions, we will see the notion of a weak derivative or distributional derivative, which can be applied to a much rougher class of functions and is in many ways more suitable than the classical derivative for doing “Lebesgue” type analysis (i.e. analysis centred around the Lebesgue integral, and in particular allowing functions to be uncontrolled, infinite, or even undefined on sets of measure zero). However, for now we will stick with the classical approach to differentiation.

Exercise 2 If ${F: [a,b] \rightarrow {\bf R}}$ is everywhere differentiable, show that ${F}$ is continuous and ${F'}$ is measurable. If ${F}$ is almost everywhere differentiable, show that the (almost everywhere defined) function ${F'}$ is measurable (i.e. it is equal to an everywhere defined measurable function on ${[a,b]}$ outside of a null set), but give an example to demonstrate that ${F}$ need not be continuous.

Exercise 3 Give an example of a function ${F: [a,b] \rightarrow {\bf R}}$ which is everywhere differentiable, but not continuously differentiable. (Hint: choose an ${F}$ that vanishes quickly at some point, say at the origin ${0}$, but which also oscillates rapidly near that point.)

In single-variable calculus, the operations of integration and differentiation are connected by a number of basic theorems, starting with Rolle’s theorem.

Theorem 4 (Rolle’s theorem) Let ${[a,b]}$ be a compact interval of positive length, and let ${F: [a,b] \rightarrow {\bf R}}$ be a differentiable function such that ${F(a)=F(b)}$. Then there exists ${x \in (a,b)}$ such that ${F'(x)=0}$.

Proof: By subtracting a constant from ${F}$ (which does not affect differentiability or the derivative) we may assume that ${F(a)=F(b)=0}$. If ${F}$ is identically zero then the claim is trivial, so assume that ${F}$ is non-zero somewhere. By replacing ${F}$ with ${-F}$ if necessary, we may assume that ${F}$ is positive somewhere, thus ${\sup_{x \in [a,b]} F(x) > 0}$. On the other hand, as ${F}$ is continuous and ${[a,b]}$ is compact, ${F}$ must attain its maximum somewhere, thus there exists ${x \in [a,b]}$ such that ${F(x) \geq F(y)}$ for all ${y \in [a,b]}$. Then ${F(x)}$ must be positive and so ${x}$ cannot equal either ${a}$ or ${b}$, and thus must lie in the interior. From the right limit of (1) we see that ${F'(x) \leq 0}$, while from the left limit we have ${F'(x) \geq 0}$. Thus ${F'(x)=0}$ and the claim follows. $\Box$
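The proof above is essentially an algorithm: locate an interior extremum, and the derivative must vanish there.  A toy numerical illustration (not from the original notes) with ${F(x) = \sin(\pi x)}$ on ${[0,1]}$, so that ${F(0)=F(1)=0}$, using a grid search as a stand-in for the exact compactness argument:

```python
from math import sin, cos, pi

# F(x) = sin(pi x) vanishes at both endpoints of [0, 1];
# locate its maximum on a fine grid and check F' there
N = 10000
x_max = max((j / N for j in range(N + 1)), key=lambda t: sin(pi * t))
deriv = pi * cos(pi * x_max)  # F'(x) = pi cos(pi x)
print(x_max, deriv)  # maximum at 1/2, where the derivative vanishes
```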

Remark 5 Observe that the same proof also works if ${F}$ is only differentiable in the interior ${(a,b)}$ of the interval ${[a,b]}$, so long as it is continuous all the way up to the boundary of ${[a,b]}$.

Exercise 6 Give an example to show that Rolle’s theorem can fail if ${F}$ is merely assumed to be almost everywhere differentiable, even if one adds the additional hypothesis that ${F}$ is continuous. This example illustrates that everywhere differentiability is a significantly stronger property than almost everywhere differentiability. We will see further evidence of this fact later in these notes; there are many theorems that assert in their conclusion that a function is almost everywhere differentiable, but few that manage to conclude everywhere differentiability.

Remark 7 It is important to note that Rolle’s theorem only works in the real scalar case when ${F}$ is real-valued, as it relies heavily on the least upper bound property for the domain ${{\bf R}}$. If, for instance, we consider complex-valued scalar functions ${F: [a,b] \rightarrow {\bf C}}$, then the theorem can fail; for instance, the function ${F: [0,1] \rightarrow {\bf C}}$ defined by ${F(x) := e^{2\pi i x} - 1}$ vanishes at both endpoints and is differentiable, but its derivative ${F'(x) = 2\pi i e^{2\pi i x}}$ is never zero. (Rolle’s theorem does imply that the real and imaginary parts of the derivative ${F'}$ both vanish somewhere, but the problem is that they don’t simultaneously vanish at the same point.) Similar remarks apply to functions taking values in a finite-dimensional vector space, such as ${{\bf R}^n}$.
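The counterexample in Remark 7 is easy to inspect numerically (an illustrative check, not part of the notes): ${F(x) = e^{2\pi i x} - 1}$ vanishes at both endpoints of ${[0,1]}$, yet its derivative has constant modulus ${2\pi}$.

```python
import cmath
from math import pi

F = lambda t: cmath.exp(2j * pi * t) - 1         # F(x) = e^{2 pi i x} - 1
Fp = lambda t: 2j * pi * cmath.exp(2j * pi * t)  # F'(x) = 2 pi i e^{2 pi i x}

end_vals = (abs(F(0.0)), abs(F(1.0)))
min_deriv = min(abs(Fp(j / 100)) for j in range(101))
print(end_vals, min_deriv)  # endpoints vanish; |F'| stays at 2 pi
```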

One can easily amplify Rolle’s theorem to the mean value theorem:

Corollary 8 (Mean value theorem) Let ${[a,b]}$ be a compact interval of positive length, and let ${F: [a,b] \rightarrow {\bf R}}$ be a differentiable function. Then there exists ${x \in (a,b)}$ such that ${F'(x)=\frac{F(b)-F(a)}{b-a}}$.

Proof: Apply Rolle’s theorem to the function ${x \mapsto F(x) - \frac{F(b)-F(a)}{b-a} (x-a)}$. $\Box$

Remark 9 As Rolle’s theorem is only applicable to real scalar-valued functions, the more general mean value theorem is also only applicable to such functions.

Exercise 10 (Uniqueness of antiderivatives up to constants) Let ${[a,b]}$ be a compact interval of positive length, and let ${F: [a,b] \rightarrow {\bf R}}$ and ${G: [a,b] \rightarrow {\bf R}}$ be differentiable functions. Show that ${F'(x)=G'(x)}$ for every ${x \in [a,b]}$ if and only if ${F(x)=G(x)+C}$ for some constant ${C \in {\bf R}}$ and all ${x \in [a,b]}$.

We can use the mean value theorem to deduce one of the fundamental theorems of calculus:

Theorem 11 (Second fundamental theorem of calculus) Let ${F: [a,b] \rightarrow {\bf R}}$ be a differentiable function, such that ${F'}$ is Riemann integrable. Then the Riemann integral ${\int_a^b F'(x)\ dx}$ of ${F'}$ is equal to ${F(b) - F(a)}$. In particular, we have ${\int_a^b F'(x)\ dx = F(b)-F(a)}$ whenever ${F}$ is continuously differentiable.

Proof: Let ${\varepsilon > 0}$. By the definition of Riemann integrability, there exists a finite partition ${a = t_0 < t_1 < \ldots < t_k = b}$ such that

$\displaystyle |\sum_{j=1}^k F'(t^*_j) (t_j - t_{j-1}) - \int_a^b F'(x)\ dx| \leq \varepsilon$

for every choice of ${t^*_j \in [t_{j-1},t_j]}$.

Fix this partition. From the mean value theorem, for each ${1 \leq j \leq k}$ one can find ${t^*_j \in [t_{j-1},t_j]}$ such that

$\displaystyle F'(t^*_j) (t_j - t_{j-1}) = F(t_j) - F(t_{j-1})$

and thus by telescoping series

$\displaystyle |(F(b)-F(a)) - \int_a^b F'(x)\ dx| \leq \varepsilon.$

Since ${\varepsilon > 0}$ was arbitrary, the claim follows. $\Box$
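The telescoping step can be seen concretely in the special case ${F(x) = x^2}$: here ${F'}$ is linear, so the mean value point of each subinterval is its midpoint, and the resulting tagged Riemann sum collapses exactly (up to rounding) to ${F(b)-F(a)}$. A small illustrative sketch:

```python
def tagged_sum(fprime, a, b, k):
    # Riemann sum of F' using the midpoint of each subinterval as tag
    h = (b - a) / k
    return sum(fprime(a + (j + 0.5) * h) * h for j in range(k))

# F(x) = x^2, F'(x) = 2x: the mean value theorem tag on [t_{j-1}, t_j]
# is its midpoint, so the sum telescopes to F(3) - F(0) = 9
val = tagged_sum(lambda t: 2 * t, 0.0, 3.0, 7)
print(val)  # ≈ 9.0
```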

Remark 12 Even though the mean value theorem only holds for real scalar functions, the fundamental theorem of calculus holds for complex or vector-valued functions, as one can simply apply that theorem to each component of that function separately.

Of course, we also have the other half of the fundamental theorem of calculus:

Theorem 13 (First fundamental theorem of calculus) Let ${[a,b]}$ be a compact interval of positive length. Let ${f: [a,b] \rightarrow {\bf C}}$ be a continuous function, and let ${F: [a,b] \rightarrow {\bf C}}$ be the indefinite integral ${F(x) := \int_a^x f(t)\ dt}$. Then ${F}$ is differentiable on ${[a,b]}$, with derivative ${F'(x) = f(x)}$ for all ${x \in [a,b]}$. In particular, ${F}$ is continuously differentiable.

Proof: It suffices to show that

$\displaystyle \lim_{h \rightarrow 0^+} \frac{F(x+h)-F(x)}{h} = f(x)$

for all ${x \in [a,b)}$, and

$\displaystyle \lim_{h \rightarrow 0^-} \frac{F(x+h)-F(x)}{h} = f(x)$

for all ${x \in (a,b]}$. After a change of variables, we can write

$\displaystyle \frac{F(x+h)-F(x)}{h} = \int_0^1 f(x+ht)\ dt$

for any ${x \in [a,b)}$ and any sufficiently small ${h>0}$, or any ${x \in (a,b]}$ and any sufficiently small ${h<0}$. As ${f}$ is continuous, the function ${t \mapsto f(x+ht)}$ converges uniformly to ${f(x)}$ on ${[0,1]}$ as ${h \rightarrow 0}$ (keeping ${x}$ fixed). As the interval ${[0,1]}$ is bounded, ${\int_0^1 f(x+ht)\ dt}$ thus converges to ${\int_0^1 f(x)\ dt = f(x)}$, and the claim follows. $\Box$

Corollary 14 (Differentiation theorem for continuous functions) Let ${f: [a,b] \rightarrow {\bf C}}$ be a continuous function on a compact interval. Then we have

$\displaystyle \lim_{h \rightarrow 0^+} \frac{1}{h} \int_{[x,x+h]} f(t)\ dt = f(x)$

for all ${x \in [a,b)}$,

$\displaystyle \lim_{h \rightarrow 0^+} \frac{1}{h} \int_{[x-h,x]} f(t)\ dt = f(x)$

for all ${x \in (a,b]}$, and thus

$\displaystyle \lim_{h \rightarrow 0^+} \frac{1}{2h} \int_{[x-h,x+h]} f(t)\ dt = f(x)$

for all ${x \in (a,b)}$.
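The averages in Corollary 14 can likewise be checked numerically; the sketch below uses the sample continuous function ${f(t) = \cos t}$ and a midpoint rule for the integrals, neither of which is prescribed by the text.

```python
# Illustration of Corollary 14: for continuous f, the averages
# (1/(2h)) * integral of f over [x-h, x+h] approach f(x) as h -> 0+.
import math

def average(f, x, h, k=1000):
    """Approximate (1/(2h)) * integral of f over [x-h, x+h] by a midpoint rule."""
    w = 2 * h / k
    total = sum(f(x - h + (j + 0.5) * w) * w for j in range(k))
    return total / (2 * h)

x = 1.0
errors = [abs(average(math.cos, x, h) - math.cos(x)) for h in (0.1, 0.01, 0.001)]
# The error shrinks as h does.
assert errors[0] > errors[1] > errors[2]
assert errors[2] < 1e-6
```

For ${f = \cos}$ the average works out in closed form to ${\cos(x) \sin(h)/h}$, so the error decays like ${h^2/6}$, consistent with what the code observes.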

In these notes we explore the question of the extent to which these theorems continue to hold when the differentiability or integrability conditions on the various functions ${F, F', f}$ are relaxed. Among the results proven in these notes are

• The Lebesgue differentiation theorem, which roughly speaking asserts that Corollary 14 continues to hold for almost every ${x}$ if ${f}$ is merely absolutely integrable, rather than continuous;
• A number of differentiation theorems, which assert for instance that monotone, Lipschitz, or bounded variation functions in one dimension are almost everywhere differentiable; and
• The second fundamental theorem of calculus for absolutely continuous functions.

The material here is loosely based on Chapter 3 of Stein-Shakarchi.

This week I once again gave some public lectures on the cosmic distance ladder in astronomy, once at Stanford and once at UCLA.  The slides I used were similar to the “version 3.0” slides I used for the same talk last year in Australia and elsewhere, but the images have been updated (and the permissions for copyrighted images secured), and some additional data has also been placed on them.    I am placing these slides here on this blog, in PowerPoint format and also in PDF format.  (Video for the UCLA talk should also be available on the UCLA web site at some point; I’ll add a link when it becomes available.)

These slides have evolved over a period of almost five years, particularly with regards to the imagery, but this is likely to be close to the final version.  Here are some of the older iterations of the slides:

I have found that working on and polishing a single public lecture over a period of several years has been very rewarding and educational, especially given that I had very little public speaking experience at the beginning.  There are several other mathematicians I know of who are also putting effort into giving good talks that communicate mathematics and science to the general public, but I think there could be many more such talks.

A note regarding copyright: I am happy to have the text or layout of these slides used as the basis for other presentations, so long as the source is acknowledged.  However, some of the images in these slides are copyrighted by others, and permission by the copyright holders was granted only for the display of the slides in their current format.  (The list of such images is given at the end of the slides.)  So if you wish to adapt the slides for your own purposes, you may need to use slightly different imagery.

(Update, October 11: Version 4.2 uploaded, and notice on copyright added.)

(Update, October 20: Some photos from the UCLA talk are available here.)

(Update, October 25: Video from the talk is available on YouTube and on iTunes.)

The following question came up in my 245A class today:

Is it possible to express a non-closed interval in the real line, such as [0,1), as a countable union of disjoint closed intervals?

I was not able to answer the question immediately, but by the end of the class some of the students had come up with an answer.  It is actually a nice little test of one’s basic knowledge of real analysis, so I am posing it here as well for anyone else who is interested.  Below the fold is the answer to the question (whited out; one has to highlight the text in order to read it).

If one has a sequence ${x_1, x_2, x_3, \ldots \in {\bf R}}$ of real numbers ${x_n}$, it is unambiguous what it means for that sequence to converge to a limit ${x \in {\bf R}}$: it means that for every ${\varepsilon > 0}$, there exists an ${N}$ such that ${|x_n-x| \leq \varepsilon}$ for all ${n > N}$. Similarly for a sequence ${z_1, z_2, z_3, \ldots \in {\bf C}}$ of complex numbers ${z_n}$ converging to a limit ${z \in {\bf C}}$.

More generally, if one has a sequence ${v_1, v_2, v_3, \ldots}$ of ${d}$-dimensional vectors ${v_n}$ in a real vector space ${{\bf R}^d}$ or complex vector space ${{\bf C}^d}$, it is also unambiguous what it means for that sequence to converge to a limit ${v \in {\bf R}^d}$ or ${v \in {\bf C}^d}$; it means that for every ${\varepsilon > 0}$, there exists an ${N}$ such that ${\|v_n-v\| \leq \varepsilon}$ for all ${n \geq N}$. Here, the norm ${\|v\|}$ of a vector ${v = (v^{(1)},\ldots,v^{(d)})}$ can be chosen to be the Euclidean norm ${\|v\|_2 := (\sum_{j=1}^d (v^{(j)})^2)^{1/2}}$, the supremum norm ${\|v\|_\infty := \sup_{1 \leq j \leq d} |v^{(j)}|}$, or any other number of norms, but for the purposes of convergence, these norms are all equivalent; a sequence of vectors converges in the Euclidean norm if and only if it converges in the supremum norm, and similarly for any other two norms on the finite-dimensional space ${{\bf R}^d}$ or ${{\bf C}^d}$.

If however one has a sequence ${f_1, f_2, f_3, \ldots}$ of functions ${f_n: X \rightarrow {\bf R}}$ or ${f_n: X \rightarrow {\bf C}}$ on a common domain ${X}$, and a putative limit ${f: X \rightarrow {\bf R}}$ or ${f: X \rightarrow {\bf C}}$, there can now be many different ways in which the sequence ${f_n}$ may or may not converge to the limit ${f}$. (One could also consider convergence of functions ${f_n: X_n \rightarrow {\bf C}}$ on different domains ${X_n}$, but we will not discuss this issue at all here.) This is in contrast with the situation with scalars ${x_n}$ or ${z_n}$ (which corresponds to the case when ${X}$ is a single point) or vectors ${v_n}$ (which corresponds to the case when ${X}$ is a finite set such as ${\{1,\ldots,d\}}$). Once ${X}$ becomes infinite, the functions ${f_n}$ acquire an infinite number of degrees of freedom, and this allows them to approach ${f}$ in any number of inequivalent ways.

What different types of convergence are there? As an undergraduate, one learns of the following two basic modes of convergence:

1. We say that ${f_n}$ converges to ${f}$ pointwise if, for every ${x \in X}$, ${f_n(x)}$ converges to ${f(x)}$. In other words, for every ${\varepsilon > 0}$ and ${x \in X}$, there exists ${N}$ (that depends on both ${\varepsilon}$ and ${x}$) such that ${|f_n(x)-f(x)| \leq \varepsilon}$ whenever ${n \geq N}$.
2. We say that ${f_n}$ converges to ${f}$ uniformly if, for every ${\varepsilon > 0}$, there exists ${N}$ such that for every ${n \geq N}$, ${|f_n(x) - f(x)| \leq \varepsilon}$ for every ${x \in X}$. The difference between uniform convergence and pointwise convergence is that with the former, the time ${N}$ at which ${f_n(x)}$ must be permanently ${\varepsilon}$-close to ${f(x)}$ is not permitted to depend on ${x}$, but must instead be chosen uniformly in ${x}$.

Uniform convergence implies pointwise convergence, but not conversely. A typical example: the functions ${f_n: {\bf R} \rightarrow {\bf R}}$ defined by ${f_n(x) := x/n}$ converge pointwise to the zero function ${f(x) := 0}$, but not uniformly.
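This example is easy to check by direct computation: at each fixed ${x}$ the values ${x/n}$ shrink to zero, but for each ${n}$ there are points (such as ${x = n}$) where ${f_n}$ still has size ${1}$, so no single ${N}$ works uniformly in ${x}$. A minimal sketch:

```python
# The functions f_n(x) = x/n converge to 0 at every fixed x, but
# sup over x of |f_n(x)| is infinite for every n, so the convergence
# is pointwise and not uniform.

def f(n, x):
    return x / n

# Pointwise: at the fixed point x = 5, the values shrink to 0.
assert abs(f(10**6, 5.0)) < 1e-5

# Not uniform: for each n there is an x (e.g. x = n) with |f_n(x)| = 1.
for n in (10, 1000, 10**6):
    assert abs(f(n, float(n))) == 1.0
```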

However, pointwise and uniform convergence are only two of the many modes of convergence that are of importance in analysis. We will not attempt to exhaustively enumerate these modes here (but see this Wikipedia page, and see also these 245B notes on strong and weak convergence). We will, however, discuss some of the modes of convergence that arise from measure theory, when the domain ${X}$ is equipped with the structure of a measure space ${(X, {\mathcal B}, \mu)}$, and the functions ${f_n}$ (and their limit ${f}$) are measurable with respect to this space. In this context, we have some additional modes of convergence:

1. We say that ${f_n}$ converges to ${f}$ pointwise almost everywhere if, for (${\mu}$-)almost every ${x \in X}$, ${f_n(x)}$ converges to ${f(x)}$.
2. We say that ${f_n}$ converges to ${f}$ uniformly almost everywhere, essentially uniformly, or in ${L^\infty}$ norm if, for every ${\varepsilon > 0}$, there exists ${N}$ such that for every ${n \geq N}$, ${|f_n(x) - f(x)| \leq \varepsilon}$ for ${\mu}$-almost every ${x \in X}$.
3. We say that ${f_n}$ converges to ${f}$ almost uniformly if, for every ${\varepsilon > 0}$, there exists an exceptional set ${E \in {\mathcal B}}$ of measure ${\mu(E) \leq \varepsilon}$ such that ${f_n}$ converges uniformly to ${f}$ on the complement of ${E}$.
4. We say that ${f_n}$ converges to ${f}$ in ${L^1}$ norm if the quantity ${\|f_n-f\|_{L^1(\mu)} = \int_X |f_n(x)-f(x)|\ d\mu}$ converges to ${0}$ as ${n \rightarrow \infty}$.
5. We say that ${f_n}$ converges to ${f}$ in measure if, for every ${\varepsilon > 0}$, the measures ${\mu( \{ x \in X: |f_n(x) - f(x)| \geq \varepsilon \} )}$ converge to zero as ${n \rightarrow \infty}$.

Observe that each of these five modes of convergence is unaffected if one modifies ${f_n}$ or ${f}$ on a set of measure zero. In contrast, the pointwise and uniform modes of convergence can be affected if one modifies ${f_n}$ or ${f}$ even on a single point.

Remark 1 In the context of probability theory, in which ${f_n}$ and ${f}$ are interpreted as random variables, convergence in ${L^1}$ norm is often referred to as convergence in mean, pointwise convergence almost everywhere is often referred to as almost sure convergence, and convergence in measure is often referred to as convergence in probability.

Exercise 2 (Linearity of convergence) Let ${(X, {\mathcal B}, \mu)}$ be a measure space, let ${f_n, g_n: X \rightarrow {\bf C}}$ be sequences of measurable functions, and let ${f, g: X \rightarrow {\bf C}}$ be measurable functions.

1. Show that ${f_n}$ converges to ${f}$ along one of the above seven modes of convergence if and only if ${|f_n-f|}$ converges to ${0}$ along the same mode.
2. If ${f_n}$ converges to ${f}$ along one of the above seven modes of convergence, and ${g_n}$ converges to ${g}$ along the same mode, show that ${f_n+g_n}$ converges to ${f+g}$ along the same mode, and that ${cf_n}$ converges to ${cf}$ along the same mode for any ${c \in {\bf C}}$.
3. (Squeeze test) If ${f_n}$ converges to ${0}$ along one of the above seven modes, and ${|g_n| \leq f_n}$ pointwise for each ${n}$, show that ${g_n}$ converges to ${0}$ along the same mode.

We have some easy implications between modes:

Exercise 3 (Easy implications) Let ${(X, {\mathcal B}, \mu)}$ be a measure space, and let ${f_n: X \rightarrow {\bf C}}$ and ${f: X \rightarrow {\bf C}}$ be measurable functions.

1. If ${f_n}$ converges to ${f}$ uniformly, then ${f_n}$ converges to ${f}$ pointwise.
2. If ${f_n}$ converges to ${f}$ uniformly, then ${f_n}$ converges to ${f}$ in ${L^\infty}$ norm. Conversely, if ${f_n}$ converges to ${f}$ in ${L^\infty}$ norm, then ${f_n}$ converges to ${f}$ uniformly outside of a null set (i.e. there exists a null set ${E}$ such that the restriction ${f_n\downharpoonright_{X \backslash E}}$ of ${f_n}$ to the complement of ${E}$ converges to the restriction ${f\downharpoonright_{X \backslash E}}$ of ${f}$).
3. If ${f_n}$ converges to ${f}$ in ${L^\infty}$ norm, then ${f_n}$ converges to ${f}$ almost uniformly.
4. If ${f_n}$ converges to ${f}$ almost uniformly, then ${f_n}$ converges to ${f}$ pointwise almost everywhere.
5. If ${f_n}$ converges to ${f}$ pointwise, then ${f_n}$ converges to ${f}$ pointwise almost everywhere.
6. If ${f_n}$ converges to ${f}$ in ${L^1}$ norm, then ${f_n}$ converges to ${f}$ in measure.
7. If ${f_n}$ converges to ${f}$ almost uniformly, then ${f_n}$ converges to ${f}$ in measure.

The reader is encouraged to draw a diagram that summarises the logical implications between the seven modes of convergence that the above exercise describes.
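One way to record such a diagram is as a directed graph whose edges are the implications of Exercise 3, with implication in general given by reachability. The sketch below is only illustrative, with ad hoc string labels for the seven modes; "Linf" abbreviates ${L^\infty}$ norm convergence, and the ${L^\infty}$-to-uniform direction is omitted since it only holds outside a null set.

```python
# The one-step implications of Exercise 3, as a directed graph:
# an edge u -> v means "convergence in mode u implies convergence in mode v".
implications = {
    "uniform": ["pointwise", "Linf"],
    "Linf": ["almost uniform"],
    "almost uniform": ["pointwise a.e.", "in measure"],
    "pointwise": ["pointwise a.e."],
    "L1": ["in measure"],
    "pointwise a.e.": [],
    "in measure": [],
}

def implies(u, v):
    """Transitive reachability: does mode u imply mode v?"""
    stack, seen = [u], set()
    while stack:
        w = stack.pop()
        if w == v:
            return True
        if w not in seen:
            seen.add(w)
            stack.extend(implications[w])
    return False

assert implies("uniform", "in measure")     # via Linf and almost uniform
assert not implies("L1", "pointwise a.e.")  # cf. the typewriter sequence below
```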

We give four key examples that distinguish between these modes, in the case when ${X}$ is the real line ${{\bf R}}$ with Lebesgue measure. The first three of these examples were already introduced in the previous set of notes.

Example 4 (Escape to horizontal infinity) Let ${f_n := 1_{[n,n+1]}}$. Then ${f_n}$ converges to zero pointwise (and thus, pointwise almost everywhere), but not uniformly, in ${L^\infty}$ norm, almost uniformly, in ${L^1}$ norm, or in measure.

Example 5 (Escape to width infinity) Let ${f_n := \frac{1}{n} 1_{[0,n]}}$. Then ${f_n}$ converges to zero uniformly (and thus, pointwise, pointwise almost everywhere, in ${L^\infty}$ norm, almost uniformly, and in measure), but not in ${L^1}$ norm.

Example 6 (Escape to vertical infinity) Let ${f_n := n 1_{[\frac{1}{n}, \frac{2}{n}]}}$. Then ${f_n}$ converges to zero pointwise (and thus, pointwise almost everywhere) and almost uniformly (and hence in measure), but not uniformly, in ${L^\infty}$ norm, or in ${L^1}$ norm.
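Since each ${f_n}$ in Examples 4-6 is a constant multiple of an indicator function of an interval, the quantities that distinguish the modes can be computed exactly. The sketch below (with hypothetical helper names) records the key point: all three sequences have ${L^1}$ norm equal to ${1}$ for every ${n}$, while the sup norm shrinks only in the width scenario and the measure of the support shrinks only in the vertical scenario.

```python
# Exact computations for the three "escape to infinity" scenarios.
# Each f_n has L1 norm (height * width) equal to 1 for all n, which is
# why none of the three sequences converge to zero in L1 norm.

def horizontal(n):          # f_n = 1_[n, n+1]
    L1 = 1.0 * 1.0          # height 1, width 1
    support = 1.0           # measure of {f_n != 0} never shrinks
    return L1, support

def width(n):               # f_n = (1/n) 1_[0, n]
    L1 = (1.0 / n) * n      # height 1/n, width n
    sup = 1.0 / n           # sup norm -> 0: uniform convergence
    return L1, sup

def vertical(n):            # f_n = n 1_[1/n, 2/n]
    L1 = n * (1.0 / n)      # height n, width 1/n
    support = 1.0 / n       # support measure -> 0: almost uniform convergence
    return L1, support

for n in (10, 100, 1000):
    for L1 in (horizontal(n)[0], width(n)[0], vertical(n)[0]):
        assert abs(L1 - 1.0) < 1e-12
assert width(1000)[1] < 1e-2 and vertical(1000)[1] < 1e-2
```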

Example 7 (Typewriter sequence) Let ${f_n}$ be defined by the formula

$\displaystyle f_n := 1_{[\frac{n-2^k}{2^k}, \frac{n-2^k+1}{2^k}]}$

whenever ${k \geq 0}$ and ${2^k \leq n < 2^{k+1}}$. This is a sequence of indicator functions of intervals of decreasing length, marching across the unit interval ${[0,1]}$ over and over again. Then ${f_n}$ converges to zero in measure and in ${L^1}$ norm, but not pointwise almost everywhere (and hence also not pointwise, not almost uniformly, nor in ${L^\infty}$ norm, nor uniformly).
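The supports of the typewriter sequence can be computed explicitly. The sketch below (using half-open intervals for convenience, which does not affect the conclusions) verifies that the interval lengths, and hence the ${L^1}$ norms, tend to zero, while a fixed point such as ${x = 1/3}$ lies in infinitely many of the supports, so ${f_n(x) = 1}$ for infinitely many ${n}$.

```python
# The support of the n-th typewriter function is the dyadic interval
# [ (n - 2^k)/2^k, (n - 2^k + 1)/2^k ) where 2^k <= n < 2^(k+1).

def interval(n):
    """Support interval of f_n (taken half-open here)."""
    k = n.bit_length() - 1          # the unique k with 2**k <= n < 2**(k+1)
    lo = (n - 2 ** k) / 2 ** k
    return lo, lo + 1.0 / 2 ** k

# The L1 norm of f_n is the interval length 2^(-k), which tends to 0 ...
lo, hi = interval(10 ** 6)
assert hi - lo < 1e-4

# ... yet x = 1/3 lies in exactly one support per dyadic generation,
# hence in infinitely many supports: no pointwise a.e. convergence.
x = 1.0 / 3.0
hits = [n for n in range(1, 2 ** 12) if interval(n)[0] <= x < interval(n)[1]]
assert len(hits) >= 10
```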

Remark 8 The ${L^\infty}$ norm ${\|f\|_{L^\infty(\mu)}}$ of a measurable function ${f: X \rightarrow {\bf C}}$ is defined to be the infimum of all the quantities ${M \in [0,+\infty]}$ that are essential upper bounds for ${f}$ in the sense that ${|f(x)| \leq M}$ for almost every ${x}$. Then ${f_n}$ converges to ${f}$ in ${L^\infty}$ norm if and only if ${\|f_n-f\|_{L^\infty(\mu)} \rightarrow 0}$ as ${n \rightarrow \infty}$. The ${L^\infty}$ and ${L^1}$ norms are part of the larger family of ${L^p}$ norms, which we will study in more detail in 245B.

One particular advantage of ${L^1}$ convergence is that, in the case when the ${f_n}$ are absolutely integrable, it implies convergence of the integrals,

$\displaystyle \int_X f_n\ d\mu \rightarrow \int_X f\ d\mu,$

as one sees from the triangle inequality. Unfortunately, none of the other modes of convergence automatically imply this convergence of the integral, as the above examples show.

The purpose of these notes is to compare these modes of convergence with each other. Unfortunately, the relationship between these modes is not particularly simple; unlike the situation with pointwise and uniform convergence, one cannot simply rank these modes in a linear order from strongest to weakest. This is ultimately because the different modes react in different ways to the three “escape to infinity” scenarios described above, as well as to the “typewriter” behaviour when a single set is “overwritten” many times. On the other hand, if one imposes some additional assumptions to shut down one or more of these escape to infinity scenarios, such as a finite measure hypothesis ${\mu(X) < \infty}$ or a uniform integrability hypothesis, then one can obtain some additional implications between the different modes.