You are currently browsing the monthly archive for October 2010.

In this course so far, we have focused primarily on one specific example of a countably additive measure, namely Lebesgue measure. This measure was constructed from a more primitive concept of *Lebesgue outer measure*, which in turn was constructed from the even more primitive concept of *elementary measure*.

It turns out that both of these constructions can be abstracted. In this set of notes, we will give the Carathéodory lemma, which constructs a countably additive measure from any abstract outer measure; this generalises the construction of Lebesgue measure from Lebesgue outer measure. One can in turn construct outer measures from another concept known as a pre-measure, of which elementary measure is a typical example.

With these tools, one can start constructing many more measures, such as Lebesgue-Stieltjes measures, product measures, and Hausdorff measures. With a little more effort, one can also establish the Kolmogorov extension theorem, which allows one to construct a variety of measures on infinite-dimensional spaces, and is of particular importance in the foundations of probability theory, as it allows one to set up probability spaces associated to both discrete and continuous random processes, even if they have infinite length.

The most important result about product measure, beyond the fact that it exists, is that one can use it to evaluate iterated integrals, and to interchange their order, provided that the integrand is either unsigned or absolutely integrable. This fact is known as the Fubini-Tonelli theorem, and is an absolutely indispensable tool for computing integrals, and for deducing higher-dimensional results from lower-dimensional ones.

We remark that these notes omit a very important way to construct measures, namely the Riesz representation theorem, but we will defer discussion of this theorem to 245B.

This is the final set of notes in this sequence. If time permits, the course will then begin covering the 245B notes, starting with the material on signed measures and the Radon-Nikodym-Lebesgue theorem.

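Let $n$ be a large integer, and let $M_n$ be the Gaussian Unitary Ensemble (GUE), i.e. the random Hermitian matrix with probability distribution

$$C_n e^{-\operatorname{tr}(M_n^2)/2}\ dM_n$$

where $dM_n$ is a Haar measure on Hermitian matrices and $C_n$ is the normalisation constant required to make the distribution of unit mass. The eigenvalues $\lambda_1 \leq \ldots \leq \lambda_n$ of this matrix are then a coupled family of $n$ real random variables. For any $1 \leq k \leq n$, we can define the *$k$-point correlation function* $\rho_k(x_1,\ldots,x_k)$ to be the unique symmetric measure on $\mathbf{R}^k$ such that

$$\int_{\mathbf{R}^k} F(x_1,\ldots,x_k) \rho_k(x_1,\ldots,x_k)\ dx_1 \ldots dx_k = \mathbf{E} \sum_{i_1,\ldots,i_k \text{ distinct}} F(\lambda_{i_1},\ldots,\lambda_{i_k})$$

for all test functions $F$.

A standard computation (given for instance in these lecture notes of mine) gives the *Ginibre formula*

$$\rho_n(x_1,\ldots,x_n) = C'_n \Big(\prod_{1 \leq i < j \leq n} |x_i-x_j|^2\Big) e^{-\sum_{j=1}^n |x_j|^2/2}$$

for the $n$-point correlation function, where $C'_n$ is another normalisation constant. Using Vandermonde determinants, one can rewrite this expression in determinantal form as

$$\rho_n(x_1,\ldots,x_n) = C''_n \det( K_n(x_i,x_j) )_{1 \leq i,j \leq n}$$

where the kernel $K_n$ is given by

$$K_n(x,y) := \sum_{k=0}^{n-1} \phi_k(x) \phi_k(y)$$

where $\phi_k(x) := P_k(x) e^{-x^2/4}$ and $P_0, P_1, \ldots$ are the ($L^2$-normalised) Hermite polynomials (thus the $\phi_k$ are an orthonormal family, with each $P_k$ being a polynomial of degree $k$). Integrating out one or more of the variables, one is led to the *Gaudin-Mehta formula*

$$\rho_k(x_1,\ldots,x_k) = \det( K_n(x_i,x_j) )_{1 \leq i,j \leq k}. \ \ \ \ \ (1)$$

(In particular, the normalisation constant $C''_n$ in the previous formula turns out to simply be equal to $1$.) Again, see these lecture notes for details.

The functions $\phi_k$ can be viewed as an orthonormal basis of eigenfunctions for the *harmonic oscillator operator*

$$L \phi := \Big(-\frac{d^2}{dx^2} + \frac{x^2}{4}\Big) \phi;$$

indeed it is a classical fact that

$$L \phi_k = \Big(k + \frac{1}{2}\Big) \phi_k.$$

As such, the kernel $K_n$ can be viewed as the integral kernel of the spectral projection operator $1_{(-\infty,n]}(L)$.

From (1) we see that the fine-scale structure of the eigenvalues of GUE is controlled by the asymptotics of $K_n$ as $n \to \infty$. The two main asymptotics of interest are given by the following lemmas:

Lemma 1 (Asymptotics of $K_n$ in the bulk) Let $x_0 \in (-2,2)$, and let $\rho_{sc}(x_0) := \frac{1}{2\pi} (4-x_0^2)_+^{1/2}$ be the semicircular law density at $x_0$. Then, we have

$$\frac{1}{\sqrt{n} \rho_{sc}(x_0)} K_n\Big( x_0 \sqrt{n} + \frac{y}{\sqrt{n} \rho_{sc}(x_0)}, x_0 \sqrt{n} + \frac{z}{\sqrt{n} \rho_{sc}(x_0)} \Big) \to \frac{\sin(\pi(y-z))}{\pi(y-z)}$$

as $n \to \infty$ for any fixed $y, z \in \mathbf{R}$ (removing the singularity at $y=z$ in the usual manner).

Lemma 2 (Asymptotics of $K_n$ at the edge) We have

$$\frac{1}{n^{1/6}} K_n\Big( 2\sqrt{n} + \frac{y}{n^{1/6}}, 2\sqrt{n} + \frac{z}{n^{1/6}} \Big) \to \frac{Ai(y) Ai'(z) - Ai'(y) Ai(z)}{y-z}$$

as $n \to \infty$ for any fixed $y, z \in \mathbf{R}$, where $Ai$ is the Airy function

$$Ai(x) := \frac{1}{\pi} \int_0^\infty \cos\Big( \frac{t^3}{3} + tx \Big)\ dt$$

and again removing the singularity at $y=z$ in the usual manner.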
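As a rough numerical sanity check of the bulk asymptotics, one can evaluate the kernel directly. The sketch below is not taken from the notes: it assumes the convention that $K_n(x,y) = \sum_{k<n} \phi_k(x)\phi_k(y)$ with $\phi_k(x) = P_k(x) e^{-x^2/4}$ orthonormal in $L^2(\mathbf{R})$, uses the standard three-term recurrence for Hermite functions, and compares the rescaled kernel at the bulk point $x_0 = 0$ (where the semicircular density is $1/\pi$) with the sine kernel. The function names are illustrative only.

```python
import math


def hermite_functions(n: int, x: float) -> list[float]:
    """Return [phi_0(x), ..., phi_{n-1}(x)] for n >= 1, where
    phi_k(x) = P_k(x) exp(-x^2/4) is orthonormal in L^2(R)."""
    phi = [0.0] * n
    phi[0] = (2 * math.pi) ** -0.25 * math.exp(-x * x / 4)
    if n > 1:
        phi[1] = x * phi[0]
    for k in range(1, n - 1):
        # Three-term recurrence: sqrt(k+1) phi_{k+1} = x phi_k - sqrt(k) phi_{k-1}
        phi[k + 1] = (x * phi[k] - math.sqrt(k) * phi[k - 1]) / math.sqrt(k + 1)
    return phi


def kernel(n: int, x: float, y: float) -> float:
    """K_n(x, y) = sum over k < n of phi_k(x) phi_k(y)."""
    return sum(p * q for p, q in zip(hermite_functions(n, x), hermite_functions(n, y)))


n = 100
rho_sc_0 = 1 / math.pi           # semicircular density (1/(2 pi)) sqrt(4 - x0^2) at x0 = 0
scale = math.sqrt(n) * rho_sc_0  # local density of eigenvalues near x0 * sqrt(n) = 0


def rescaled_kernel(y: float, z: float) -> float:
    """Kernel in the normalisation of the bulk lemma, at x0 = 0."""
    return kernel(n, y / scale, z / scale) / scale


def sine_kernel(y: float, z: float) -> float:
    t = math.pi * (y - z)
    return 1.0 if t == 0 else math.sin(t) / t


for y in (0.0, 0.5, 1.5):
    print(y, rescaled_kernel(y, 0.0), sine_kernel(y, 0.0))
```

Already at $n = 100$ the two columns should agree to within a few percent, although this is of course only an illustration and not a proof.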

The proof of these asymptotics usually proceeds via computing the asymptotics of Hermite polynomials, together with the Christoffel-Darboux formula; this is for instance the approach taken in the previous notes. However, there is a slightly different approach that is closer in spirit to the methods of semi-classical analysis, which was briefly mentioned in the previous notes but not elaborated upon. For the sake of completeness, I am recording some notes on this approach here, although to focus on the main ideas I will not be completely rigorous in the derivation (ignoring issues such as convergence of integrals or of operators, or (removable) singularities in kernels caused by zeroes in the denominator).

This is going to be a somewhat experimental post. In class, I mentioned that when solving the type of homework problems encountered in a graduate real analysis course, there are really only about a dozen or so basic tricks and techniques that are used over and over again. But I had not thought to actually try to make these tricks explicit, so I am going to try to compile a list of some of these techniques here. But this list is going to be far from exhaustive; perhaps if other recent students of real analysis would like to share their own methods, then I encourage you to do so in the comments (even – or especially – if the techniques are somewhat vague and general in nature).

(See also the Tricki for some general mathematical problem solving tips. Once this page matures somewhat, I might migrate it to the Tricki.)

Note: the tricks occur here in no particular order, reflecting the stream-of-consciousness way in which they were arrived at. Indeed, this list will be extended on occasion whenever I find another trick that can be added to this list.

Emmanuel Breuillard, Ben Green, Robert Guralnick, and I have just uploaded to the arxiv our paper “Strongly dense free subgroups of semisimple algebraic groups”, submitted to Israel J. Math. This paper was originally motivated by (and provides a key technical tool for) another forthcoming paper of ours, on expander Cayley graphs in finite simple groups of Lie type, but also has some independent interest due to connections with other topics, such as the Banach-Tarski paradox.

Recall that one of the basic facts underlying the Banach-Tarski paradox is that the rotation group $SO(3)$ contains a copy of the free non-abelian group $F_2$ on two generators; thus there exist $a, b \in SO(3)$ such that $a, b$ obey no nontrivial word identities. In fact, using basic algebraic geometry, one can then deduce that a *generic* pair $(a,b)$ of group elements has this property, where for the purposes of this paper “generic” means “outside of at most countably many algebraic subvarieties of strictly smaller dimension”. (In particular, using Haar measure on $SO(3)$, almost every pair has this property.) In fact one has a stronger property: given any non-trivial word $w \in F_2$, the associated word map $(a,b) \mapsto w(a,b)$ from $SO(3) \times SO(3)$ to $SO(3)$ is a dominant map, which means that its image is Zariski-dense. More succinctly, if $(a,b)$ is generic, then $w(a,b)$ is generic also.

In contrast, if one were working in a solvable, nilpotent, or abelian group (such as $SO(2)$), then this property would not hold, since every subgroup of a solvable group is still solvable and thus not free (and similarly for nilpotent or abelian groups). (This already goes a long way to explain why the Banach-Tarski paradox holds in three or more dimensions, but not in two or fewer.) On the other hand, a famous result of Borel asserts that for any semisimple Lie group $G$ (over an algebraically closed field), and any nontrivial word $w$, the word map $w: G \times G \to G$ is dominant, thus generalising the preceding discussion for $SO(3)$. (There is also the even more famous Tits alternative, that asserts that any linear group that is not (virtually) solvable will contain a copy of the free group $F_2$; as pointed out to me by Michael Cowling, this already shows that generic pairs of generators will generate a free group, and with a little more effort one can even show that it generates a Zariski-dense free group.)

Now suppose we take *two* words $w_1, w_2 \in F_2$, and look at the double word map $(a,b) \mapsto (w_1(a,b), w_2(a,b))$ on a semisimple Lie group $G$. If $w_1, w_2$ are non-trivial, then Borel’s theorem tells us that each component of this map is dominant, but this does not mean that the entire map is dominant, because there could be constraints between $w_1(a,b)$ and $w_2(a,b)$. For instance, if the two words $w_1, w_2$ commute, then $w_1(a,b), w_2(a,b)$ must also commute and so the image of the double word map is not Zariski-dense. But there are also non-commuting examples of non-trivial constraints: for instance, if $w_1, w_2$ are conjugate, then $w_1(a,b), w_2(a,b)$ must also be conjugate, which is also a constraint that obstructs dominance.

It is still not clear exactly what pairs of words have the dominance property. However, we are able to establish that all pairs of non-commuting words have a weaker property than dominance:

Theorem. Let $w_1, w_2 \in F_2$ be non-commuting words, and let $a, b$ be generic elements of a semisimple Lie group $G$ over an algebraically closed field. Then $w_1(a,b), w_2(a,b)$ generate a Zariski-dense subgroup of $G$.

To put it another way, $G$ not only contains free subgroups, but contains what we call strongly dense free subgroups: free subgroups such that any two non-commuting elements generate a Zariski-dense subgroup.

Our initial motivation for this theorem is its implications for finite simple groups $G$ of Lie type. Roughly speaking, one can use this theorem to show that a generic random walk in such a group cannot be trapped in a (bounded complexity) proper algebraic subgroup $H$ of $G$, and this “escape from subgroups” fact is a key ingredient in our forthcoming paper in which we demonstrate that random Cayley graphs in such groups are expander graphs.

It also has implications for results of Banach-Tarski type; it shows that for any semisimple Lie group $G$, and for generic $a, b \in G$, one can use $a, b$ to create Banach-Tarski paradoxical decompositions for all homogeneous spaces of $G$. In particular there is a single pair $a, b$ that gives paradoxical decompositions for all homogeneous spaces simultaneously.

Our argument is based on a concept that we call “degeneration”. Let $a, b$ be generic elements of $G$, and suppose for contradiction that $w_1(a,b), w_2(a,b)$ generically generated a group whose algebraic closure was conjugate to a proper algebraic subgroup $H$ of $G$. Borel’s theorem lets us show that $w_1(a,b)$, $w_2(a,b)$, and $[w_1(a,b), w_2(a,b)]$ each generate maximal tori of $G$, which by basic algebraic group theory can be used to show that $H$ must be a proper semisimple subgroup of $G$ of maximal rank. If we were in the model case $G = SL_2$, then we would already be done, as there are no such maximal rank semisimple subgroups; but in the other groups, such proper maximal rank semisimple subgroups unfortunately exist. Fortunately, they have been completely classified, and we take advantage of this classification in our argument.

The degeneration argument comes in as follows. Let $(a',b')$ be a *non*-generic pair in $G \times G$. Then $(a',b')$ lies in the Zariski closure of the generic pairs, which means that $(w_1(a',b'), w_2(a',b'))$ lies in the Zariski closure of the set formed by $H \times H$ and its conjugates. In particular, if the non-generic pair $(a',b')$ is such that $w_1(a',b'), w_2(a',b')$ generates a group that is dense in some proper algebraic subgroup $H'$, then $H'$ is in the Zariski closure of the union of the conjugates of $H$. When this happens, we say that $H'$ is a *degeneration* of $H$. (For instance, $H$ could be the stabiliser of some non-degenerate quadratic form, and $H'$ could be the stabiliser of a degenerate limit of that form.)

The key fact we need (that relies on the classification, and a certain amount of representation theory) is:

Proposition. Given any proper semisimple maximal rank subgroup $H$ of $G$, there exists another proper semisimple subgroup $H'$ that is *not* a degeneration of $H$.

Using an induction hypothesis, we can find pairs $(a',b')$ such that $w_1(a',b'), w_2(a',b')$ generate a dense subgroup of $H'$, which together with the preceding discussion contradicts the proposition.

The proposition is currently proven by using some known facts about certain representation-theoretic invariants of all the semisimple subgroups of the classical and exceptional simple Lie groups. While the proof is of finite length, it is not particularly elegant, ultimately relying on the numerical value of one or more invariants of $H'$ being sufficiently different from their counterparts for $H$ that one can prevent the former being a degeneration of the latter. Perhaps there is another way to proceed here that is not based so heavily on classification.

One notable feature of mathematical reasoning is the reliance on counterfactual thinking – taking a hypothesis (or set of hypotheses) which may or may not be true, and following it (or them) to its logical conclusion. For instance, most propositions in mathematics start with a set of hypotheses (e.g. “Let $n$ be a natural number such that …”), which may or may not apply to the particular value of $n$ one may have in mind. Or, if one ever argues by dividing into separate cases (e.g. “Case 1: $n$ is even. … Case 2: $n$ is odd. …”), then for any given $n$, at most one of these cases would actually be applicable, with the other cases being counterfactual alternatives. But the purest example of counterfactual thinking in mathematics comes when one employs a proof by contradiction (or *reductio ad absurdum*) – one introduces a hypothesis that in fact has no chance of being true at all (e.g. “Suppose for sake of contradiction that $\sqrt{2}$ is equal to the ratio $p/q$ of two natural numbers.”), and proceeds to demonstrate this fact by showing that this hypothesis leads to absurdity.

Experienced mathematicians are so used to this type of counterfactual thinking that it is sometimes difficult for them to realise that this type of thinking is not automatically intuitive for students or non-mathematicians, who can anchor their thinking on the single, “real” world to the extent that they cannot easily consider hypothetical alternatives. This can lead to confused exchanges such as the following:

Lecturer: “Theorem. Let $p$ be a prime number. Then…”

Student: “But how do you know that $p$ is a prime number? Couldn’t it be composite?”

or

Lecturer: “Now we see what the function $f$ does when we give it the input of $x+dx$ instead. …”

Student: “But didn’t you just say that the input was equal to $x$ just a moment ago?”

This is not to say that counterfactual thinking is not encountered at all outside of mathematics. For instance, an obvious source of counterfactual thinking occurs in fictional writing or film, particularly in speculative fiction such as science fiction, fantasy, or alternate history. Here, one can certainly take one or more counterfactual hypotheses (e.g. “what if magic really existed?”) and follow them to see what conclusions would result. The analogy between this and mathematical counterfactual reasoning is not perfect, of course: in fiction, consequences are usually not logically entailed by their premises, but are instead driven by more contingent considerations, such as the need to advance the plot, to entertain or emotionally affect the reader, or to make some moral or ideological point, and these types of narrative elements are almost completely absent in mathematical writing. Nevertheless, the analogy can be somewhat helpful when one is first coming to terms with mathematical reasoning. For instance, the mathematical concept of a proof by contradiction can be viewed as roughly analogous in some ways to such literary concepts as satire, dark humour, or absurdist fiction, in which one takes a premise specifically with the intent to derive absurd consequences from it. And if the proof of (say) a lemma is analogous to a short story, then the *statement* of that lemma can be viewed as analogous to the *moral* of that story.

Another source of counterfactual thinking outside of mathematics comes from simulation, when one feeds some initial data or hypotheses (that may or may not correspond to what actually happens in the real world) into a simulated environment (e.g. a piece of computer software, a laboratory experiment, or even just a thought-experiment), and then runs the simulation to see what consequences result from these hypotheses. Here, proof by contradiction is roughly analogous to the “garbage in, garbage out” phenomenon that is familiar to anyone who has worked with computers: if one’s initial inputs to a simulation are not consistent with the hypotheses of that simulation, or with each other, one can obtain bizarrely illogical (and sometimes unintentionally amusing) outputs as a result; and conversely, such outputs can be used to detect and diagnose problems with the data, hypotheses, or implementation of the simulation.

Despite the presence of these non-mathematical analogies, though, proofs by contradiction are still often viewed with suspicion and unease by many students of mathematics. Perhaps the quintessential example of this is the standard proof of Cantor’s theorem that the set $\mathbf{R}$ of real numbers is uncountable. This is about as short and as elegant a proof by contradiction as one can have without being utterly trivial, and despite this (or perhaps *because* of this) it seems to offend the reason of many people when they are first exposed to it, to an extent far greater than most other results in mathematics. (The only other two examples I know of that come close to doing this are the fact that the real number $0.999\ldots$ is equal to 1, and the solution to the blue-eyed islanders puzzle.)

Some time ago on this blog, I collected a family of well-known results in mathematics that were proven by contradiction, and specifically by a type of argument that I called the “no self-defeating object” argument; that any object that was so ridiculously overpowered that it could be used to “defeat” its own existence, could not actually exist. Many basic results in mathematics can be phrased in this manner: not only Cantor’s theorem, but Euclid’s theorem on the infinitude of primes, Gödel’s incompleteness theorem, or the conclusion (from Russell’s paradox) that the class of all sets cannot itself be a set.

I presented each of these arguments in the usual “proof by contradiction” manner; I made the counterfactual hypothesis that the impossibly overpowered object existed, and then used this to eventually derive a contradiction. Mathematically, there is nothing wrong with this reasoning, but because the argument spends almost its entire duration inside the bizarre counterfactual universe caused by an impossible hypothesis, readers who are not experienced with counterfactual thinking may view these arguments with unease.

It was pointed out to me, though (originally with regards to Euclid’s theorem, but the same point in fact applies to the other results I presented) that one can pull a large fraction of each argument out of this counterfactual world, so that one can see most of the argument directly, without the need for any intrinsically impossible hypotheses. This is done by converting the “no self-defeating object” argument into a logically equivalent “any object can be defeated” argument, with the former then being viewed as an immediate corollary of the latter. This change is almost trivial to enact (it is often little more than just taking the contrapositive of the original statement), but it does offer a slightly different “non-counterfactual” (or more precisely, “not necessarily counterfactual”) perspective on these arguments which may assist in understanding how they work.

For instance, consider the very first no-self-defeating result presented in the previous post:

Proposition 1 (No largest natural number). There does not exist a natural number $N$ that is larger than all the other natural numbers.

This is formulated in the “no self-defeating object” formulation. But it has a logically equivalent “any object can be defeated” form:

Proposition 1′. Given any natural number $N$, one can find another natural number $N'$ which is larger than $N$.

**Proof.** Take $N' := N+1$.

While Proposition 1 and Proposition 1′ are logically equivalent to each other, note one key difference: Proposition 1′ can be illustrated with examples (e.g. take $N = 100$, so that the proof gives $N' = 101$), whilst Proposition 1 cannot (since there is, after all, no such thing as a largest natural number). So there is a sense in which Proposition 1′ is more “non-counterfactual” or “constructive” than the “counterfactual” Proposition 1.

In a similar spirit, Euclid’s theorem (which we give using the numbering from the previous post),

Proposition 3. There are infinitely many primes.

can be recast in “all objects can be defeated” form as

Proposition 3′. Let $p_1, \ldots, p_n$ be a collection of primes. Then there exists a prime $q$ which is distinct from any of the primes $p_1, \ldots, p_n$.

**Proof.** Take $q$ to be any prime factor of $p_1 \ldots p_n + 1$ (for instance, one could take the smallest prime factor, if one wished to be completely concrete). Since $p_1 \ldots p_n + 1$ is not divisible by any of the primes $p_1, \ldots, p_n$, $q$ must be distinct from all of these primes.

One could argue that there was a slight use of proof by contradiction in the proof of Proposition 3′ (because one had to briefly entertain and then rule out the counterfactual possibility that $q$ was equal to one of the $p_1, \ldots, p_n$), but the proposition itself is not inherently counterfactual, as it does not make as patently impossible a hypothesis as a finite enumeration of the primes. Incidentally, it can be argued that the proof of Proposition 3′ is closer in spirit to Euclid’s original proof of his theorem, than the proof of Proposition 3 that is usually given today. Again, Proposition 3′ is “constructive”; one can apply it to any finite list of primes, say $2, 3, 5$, and it will actually exhibit a prime not in that list (in this case, $31$). The same cannot be said of Proposition 3, despite the logical equivalence of the two statements.
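To make the “constructive” nature of this argument concrete, here is a minimal sketch of the recipe in the proof: multiply the given primes, add one, and take the smallest prime factor. The function names `smallest_prime_factor` and `new_prime` are illustrative choices, and trial division is used only because it is the simplest way to be completely concrete; the input is assumed to be a list of primes and is not checked.

```python
def smallest_prime_factor(n: int) -> int:
    """Return the smallest prime factor of an integer n >= 2 by trial division."""
    if n < 2:
        raise ValueError("n must be at least 2")
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # no divisor up to sqrt(n), so n itself is prime


def new_prime(primes: list[int]) -> int:
    """Given a finite list of primes, return a prime that is not in the list."""
    product = 1
    for p in primes:
        product *= p
    return smallest_prime_factor(product + 1)


print(new_prime([2, 3, 5]))             # 2*3*5 + 1 = 31, which is prime
print(new_prime([2, 3, 5, 7, 11, 13]))  # 30031 = 59 * 509, so this gives 59
```

The second call illustrates that the product plus one need not itself be prime; it is only its prime factors that are guaranteed to lie outside the original list.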

[*Note*: the article below may make more sense if one first reviews the previous blog post on the “no self-defeating object”. For instance, the section and theorem numbering here is deliberately chosen to match that of the preceding post.]

Let $[a,b]$ be a compact interval of positive length (thus $-\infty < a < b < +\infty$). Recall that a function $F: [a,b] \to \mathbf{R}$ is said to be *differentiable* at a point $x \in [a,b]$ if the limit

$$F'(x) := \lim_{y \to x;\ y \in [a,b] \setminus \{x\}} \frac{F(y)-F(x)}{y-x} \ \ \ \ \ (1)$$

exists. In that case, we call $F'(x)$ the *strong derivative*, *classical derivative*, or just *derivative* for short, of $F$ at $x$. We say that $F$ is *everywhere differentiable*, or differentiable for short, if it is differentiable at all points $x \in [a,b]$, and *differentiable almost everywhere* if it is differentiable at almost every point $x \in [a,b]$. If $F$ is differentiable everywhere and its derivative $F'$ is continuous, then we say that $F$ is *continuously differentiable*.

Remark 1 Much later in this sequence, when we cover the theory of distributions, we will see the notion of a *weak derivative* or *distributional derivative*, which can be applied to a much rougher class of functions and is in many ways more suitable than the classical derivative for doing “Lebesgue” type analysis (i.e. analysis centred around the Lebesgue integral, and in particular allowing functions to be uncontrolled, infinite, or even undefined on sets of measure zero). However, for now we will stick with the classical approach to differentiation.

Exercise 1 If $F: [a,b] \to \mathbf{R}$ is everywhere differentiable, show that $F$ is continuous and $F'$ is measurable. If $F$ is almost everywhere differentiable, show that the (almost everywhere defined) function $F'$ is measurable (i.e. it is equal to an everywhere defined measurable function on $[a,b]$ outside of a null set), but give an example to demonstrate that $F$ need not be continuous.

Exercise 2 Give an example of a function $F: [a,b] \to \mathbf{R}$ which is everywhere differentiable, but not continuously differentiable. (*Hint:* choose an $F$ that vanishes quickly at some point, say at the origin $0$, but which also oscillates rapidly near that point.)

In single-variable calculus, the operations of integration and differentiation are connected by a number of basic theorems, starting with Rolle’s theorem.

Theorem 1 (Rolle’s theorem) Let $[a,b]$ be a compact interval of positive length, and let $F: [a,b] \to \mathbf{R}$ be a differentiable function such that $F(a) = F(b)$. Then there exists $x \in (a,b)$ such that $F'(x) = 0$.

*Proof:* By subtracting a constant from $F$ (which does not affect differentiability or the derivative) we may assume that $F(a) = F(b) = 0$. If $F$ is identically zero then the claim is trivial, so assume that $F$ is non-zero somewhere. By replacing $F$ with $-F$ if necessary, we may assume that $F$ is positive somewhere, thus $\sup_{x \in [a,b]} F(x) > 0$. On the other hand, as $F$ is continuous and $[a,b]$ is compact, $F$ must attain its maximum somewhere, thus there exists $x \in [a,b]$ such that $F(x) \geq F(y)$ for all $y \in [a,b]$. Then $F(x)$ must be positive and so $x$ cannot equal either $a$ or $b$, and thus must lie in the interior. From the right limit of (1) we see that $F'(x) \leq 0$, while from the left limit we have $F'(x) \geq 0$. Thus $F'(x) = 0$ and the claim follows.

Remark 2 Observe that the same proof also works if $F$ is only differentiable in the interior $(a,b)$ of the interval $[a,b]$, so long as it is continuous all the way up to the boundary of $[a,b]$.

Exercise 3 Give an example to show that Rolle’s theorem can fail if $F$ is merely assumed to be almost everywhere differentiable, even if one adds the additional hypothesis that $F$ is continuous. This example illustrates that everywhere differentiability is a significantly stronger property than almost everywhere differentiability. We will see further evidence of this fact later in these notes; there are many theorems that assert in their conclusion that a function is almost everywhere differentiable, but few that manage to conclude *everywhere* differentiability.

Remark 3 It is important to note that Rolle’s theorem only works in the real scalar case when $F$ is real-valued, as it relies heavily on the least upper bound property for the domain $\mathbf{R}$. If, for instance, we consider complex-valued scalar functions $F: [a,b] \to \mathbf{C}$, then the theorem can fail; for instance, the function $F: [0,1] \to \mathbf{C}$ defined by $F(x) := e^{2\pi i x} - 1$ vanishes at both endpoints and is differentiable, but its derivative $F'(x) = 2\pi i e^{2\pi i x}$ is never zero. (Rolle’s theorem does imply that the real and imaginary parts of the derivative $F'$ both vanish somewhere, but the problem is that they don’t *simultaneously* vanish at the same point.) Similar remarks apply to functions taking values in a finite-dimensional vector space, such as $\mathbf{R}^n$.
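A quick numerical illustration of the failure in the complex case, taking as an assumption the standard example $F(x) = e^{2\pi i x} - 1$ on $[0,1]$: the sketch below checks that $F$ vanishes (up to rounding) at both endpoints, while $|F'(x)| = 2\pi$ at every sample point, so the derivative stays away from zero.

```python
import cmath
import math


def F(x: float) -> complex:
    """The closed loop F(x) = exp(2 pi i x) - 1, which starts and ends at 0."""
    return cmath.exp(2j * math.pi * x) - 1


def F_prime(x: float) -> complex:
    """Derivative of F, computed by hand: 2 pi i exp(2 pi i x)."""
    return 2j * math.pi * cmath.exp(2j * math.pi * x)


endpoint_values = (abs(F(0.0)), abs(F(1.0)))
min_speed = min(abs(F_prime(k / 1000)) for k in range(1001))

print(endpoint_values)  # both entries are zero up to floating-point rounding
print(min_speed)        # stays at 2*pi (about 6.28), never near zero
```

Geometrically, $F$ traces out a circle at constant speed, which is why the real and imaginary parts of $F'$ each vanish somewhere but never at the same point.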

One can easily amplify Rolle’s theorem to the mean value theorem:

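Corollary 2 (Mean value theorem) Let $[a,b]$ be a compact interval of positive length, and let $F: [a,b] \to \mathbf{R}$ be a differentiable function. Then there exists $x \in (a,b)$ such that $F'(x) = \frac{F(b)-F(a)}{b-a}$.

*Proof:* Apply Rolle’s theorem to the function $x \mapsto F(x) - \frac{F(b)-F(a)}{b-a} (x-a)$.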
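The mean value theorem only asserts that a suitable point exists, but in simple cases one can locate it numerically. The following sketch is an illustration rather than part of the proof: it assumes that $F'$ is continuous and that $F' - \frac{F(b)-F(a)}{b-a}$ changes sign between the endpoints, so that bisection applies. The helper name `mean_value_point` and the test function $F(x) = x^3$ on $[0,2]$ are hypothetical choices.

```python
import math


def mean_value_point(F, dF, a, b, tol=1e-12):
    """Find x in [a, b] with dF(x) = (F(b) - F(a)) / (b - a) by bisection.

    Assumes dF is continuous and dF - slope has opposite signs at a and b.
    """
    slope = (F(b) - F(a)) / (b - a)

    def g(x):
        return dF(x) - slope

    lo, hi = a, b
    if g(lo) * g(hi) > 0:
        raise ValueError("bisection needs a sign change of F' - slope at the endpoints")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


# F(x) = x^3 on [0, 2]: the average slope is 4, and F'(x) = 3 x^2 = 4 at x = 2 / sqrt(3).
x_star = mean_value_point(lambda x: x ** 3, lambda x: 3 * x ** 2, 0.0, 2.0)
print(x_star, 2 / math.sqrt(3))
```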

Remark 4 As Rolle’s theorem is only applicable to real scalar-valued functions, the more general mean value theorem is also only applicable to such functions.

Exercise 4 (Uniqueness of antiderivatives up to constants) Let $[a,b]$ be a compact interval of positive length, and let $F: [a,b] \to \mathbf{R}$ and $G: [a,b] \to \mathbf{R}$ be differentiable functions. Show that $F'(x) = G'(x)$ for every $x \in [a,b]$ if and only if $F(x) = G(x) + C$ for some constant $C \in \mathbf{R}$ and all $x \in [a,b]$.

We can use the mean value theorem to deduce one of the fundamental theorems of calculus:

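Theorem 3 (Second fundamental theorem of calculus) Let $F: [a,b] \to \mathbf{R}$ be a differentiable function, such that $F'$ is Riemann integrable. Then the Riemann integral $\int_a^b F'(x)\ dx$ of $F'$ is equal to $F(b) - F(a)$. In particular, we have $\int_a^b F'(x)\ dx = F(b) - F(a)$ whenever $F$ is continuously differentiable.

*Proof:* Let $\varepsilon > 0$. By the definition of Riemann integrability, there exists a finite partition $a = t_0 < t_1 < \ldots < t_k = b$ such that

$$\Big| \sum_{j=1}^k F'(t^*_j) (t_j - t_{j-1}) - \int_a^b F'(x)\ dx \Big| \leq \varepsilon$$

for every choice of $t^*_j \in [t_{j-1}, t_j]$.

Fix this partition. From the mean value theorem, for each $1 \leq j \leq k$ one can find $t^*_j \in [t_{j-1}, t_j]$ such that

$$F'(t^*_j) (t_j - t_{j-1}) = F(t_j) - F(t_{j-1})$$

and thus by telescoping series

$$\Big| (F(b) - F(a)) - \int_a^b F'(x)\ dx \Big| \leq \varepsilon.$$

Since $\varepsilon > 0$ was arbitrary, the claim follows.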
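As a small numerical illustration of the statement (not of the proof itself), the sketch below compares left-endpoint Riemann sums of $F'$ with $F(b) - F(a)$ as the partition is refined. The choice $F = \sin$ on $[0,2]$ and the helper name are assumptions made purely for the example.

```python
import math


def riemann_sum_of_derivative(dF, a, b, k):
    """Left-endpoint Riemann sum of F' over a uniform partition of [a, b] into k pieces."""
    h = (b - a) / k
    return sum(dF(a + j * h) * h for j in range(k))


F, dF = math.sin, math.cos
a, b = 0.0, 2.0
exact = F(b) - F(a)

# The error should shrink as the partition is refined.
errors = [abs(riemann_sum_of_derivative(dF, a, b, k) - exact) for k in (10, 100, 1000)]
print(errors)
```

The proof above is sharper than this experiment suggests: with the sample points $t^*_j$ supplied by the mean value theorem, the Riemann sum equals $F(b) - F(a)$ exactly, for every partition.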

Remark 5 Even though the mean value theorem only holds for real scalar functions, the fundamental theorem of calculus holds for complex or vector-valued functions, as one can simply apply that theorem to each component of that function separately.

Of course, we also have the other half of the fundamental theorem of calculus:

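Theorem 4 (First fundamental theorem of calculus) Let $[a,b]$ be a compact interval of positive length. Let $f: [a,b] \to \mathbf{C}$ be a continuous function, and let $F: [a,b] \to \mathbf{C}$ be the indefinite integral $F(x) := \int_a^x f(t)\ dt$. Then $F$ is differentiable on $[a,b]$, with derivative $F'(x) = f(x)$ for all $x \in [a,b]$. In particular, $F$ is continuously differentiable.

*Proof:* It suffices to show that

$$\lim_{h \to 0^+} \frac{F(x+h)-F(x)}{h} = f(x)$$

for all $x \in [a,b)$, and

$$\lim_{h \to 0^-} \frac{F(x+h)-F(x)}{h} = f(x)$$

for all $x \in (a,b]$. After a change of variables, we can write

$$\frac{F(x+h)-F(x)}{h} = \int_0^1 f(x+ht)\ dt$$

for any $x \in [a,b)$ and any sufficiently small $h > 0$, or any $x \in (a,b]$ and any sufficiently small $h < 0$. As $f$ is continuous, the function $t \mapsto f(x+ht)$ converges uniformly to $f(x)$ on $[0,1]$ as $h \to 0$ (keeping $x$ fixed). As the interval $[0,1]$ is bounded, $\int_0^1 f(x+ht)\ dt$ thus converges to $\int_0^1 f(x)\ dt = f(x)$, and the claim follows.

Corollary 5 (Differentiation theorem for continuous functions) Let $f: [a,b] \to \mathbf{C}$ be a continuous function on a compact interval. Then we have

$$\lim_{h \to 0^+} \frac{1}{h} \int_{[x,x+h]} f(t)\ dt = f(x)$$

for all $x \in [a,b)$,

$$\lim_{h \to 0^+} \frac{1}{h} \int_{[x-h,x]} f(t)\ dt = f(x)$$

for all $x \in (a,b]$, and thus

$$\lim_{h \to 0^+} \frac{1}{2h} \int_{[x-h,x+h]} f(t)\ dt = f(x)$$

for all $x \in (a,b)$.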
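The content of this corollary is that averages of a continuous function $f$ over shrinking intervals around (or adjacent to) a point $x$ converge to $f(x)$. The following sketch illustrates this numerically for the hypothetical choice $f = \exp$ and $x = 0.5$, approximating each average by a midpoint rule; the helper `average` is an illustrative name, not something from the notes.

```python
import math


def average(f, lo, hi, samples=2000):
    """Midpoint-rule approximation of the average of f over the interval [lo, hi]."""
    width = (hi - lo) / samples
    return sum(f(lo + (i + 0.5) * width) for i in range(samples)) / samples


f = math.exp
x = 0.5

for h in (0.1, 0.01, 0.001):
    right = average(f, x, x + h)          # (1/h) * integral over [x, x+h]
    left = average(f, x - h, x)           # (1/h) * integral over [x-h, x]
    two_sided = average(f, x - h, x + h)  # (1/(2h)) * integral over [x-h, x+h]
    print(h, right - f(x), left - f(x), two_sided - f(x))
```

For a smooth function such as this one, the one-sided errors decay like $h$ and the two-sided error like $h^2$; for a merely continuous $f$ one only gets convergence, with no rate.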

In these notes we explore the question of the extent to which these theorems continue to hold when the differentiability or integrability conditions on the various functions are relaxed. Among the results proven in these notes are

- The Lebesgue differentiation theorem, which roughly speaking asserts that Corollary 5 continues to hold for *almost* every $x$ if $f$ is merely absolutely integrable, rather than continuous;
- A number of *differentiation theorems*, which assert for instance that monotone, Lipschitz, or bounded variation functions in one dimension are almost everywhere differentiable; and
- The second fundamental theorem of calculus for absolutely continuous functions.

The material here is loosely based on Chapter 3 of Stein-Shakarchi.

This week I once again gave some public lectures on the cosmic distance ladder in astronomy, once at Stanford and once at UCLA. The slides I used were similar to the “version 3.0” slides I used for the same talk last year in Australia and elsewhere, but the images have been updated (and the permissions for copyrighted images secured), and some additional data has also been placed on them. I am placing these slides here on this blog, in Powerpoint format and also in PDF format. (Video for the UCLA talk should also be available on the UCLA web site at some point; I’ll add a link when it becomes available.)

These slides have evolved over a period of almost five years, particularly with regards to the imagery, but this is likely to be close to the final version. Here are some of the older iterations of the slides:

- (Version 1.0, 2006) A text-based version of the slides, together with accompanying figures.
- (Version 2.0, 2007) First conversion to Powerpoint format.
- (Version 3.0, 2009) Second conversion to Powerpoint format, with completely new imagery and a slightly different arrangement.
- (Version 4.0, 2010) Images updated from the previous version, with copyright permissions secured.
- (Version 4.1, 2010) The version used for the UCLA talk, with some additional data and calculations added.
- (Version 4.2, 2010) A slightly edited version, incorporating some corrections and feedback.

I have found that working on and polishing a single public lecture over a period of several years has been very rewarding and educational, especially given that I had very little public speaking experience at the beginning; there are several other mathematicians I know of who are also putting some effort into giving good talks that communicate mathematics and science to the general public, but I think there could potentially be many more such talks like this.

A note regarding copyright: I am happy to have the text or layout of these slides used as the basis for other presentations, so long as the source is acknowledged. However, some of the images in these slides are copyrighted by others, and permission by the copyright holders was granted only for the display of the slides in their current format. (The list of such images is given at the end of the slides.) So if you wish to adapt the slides for your own purposes, you may need to use slightly different imagery.

(*Update*, October 11: Version 4.2 uploaded, and notice on copyright added.)

(*Update*, October 20: Some photos from the UCLA talk are available here.)

(*Update*, October 25: Video from the talk is available on YouTube and on iTunes.)

The following question came up in my 245A class today:

Is it possible to express a non-closed interval in the real line, such as [0,1), as a countable union of disjoint closed intervals?

I was not able to answer the question immediately, but by the end of the class some of the students had come up with an answer. It is actually a nice little test of one’s basic knowledge of real analysis, so I am posing it here as well for anyone else who is interested. Below the fold is the answer to the question (whited out; one has to highlight the text in order to read it).

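If one has a sequence $x_1, x_2, x_3, \ldots \in \mathbf{R}$ of real numbers $x_n$, it is unambiguous what it means for that sequence to converge to a limit $x \in \mathbf{R}$: it means that for every $\epsilon > 0$, there exists an $N$ such that $|x_n - x| \leq \epsilon$ for all $n \geq N$. Similarly for a sequence $z_1, z_2, z_3, \ldots \in \mathbf{C}$ of complex numbers $z_n$ converging to a limit $z \in \mathbf{C}$.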

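More generally, if one has a sequence $v_1, v_2, v_3, \ldots$ of $d$-dimensional vectors $v_n$ in a real vector space $\mathbf{R}^d$ or complex vector space $\mathbf{C}^d$, it is also unambiguous what it means for that sequence to converge to a limit $v \in \mathbf{R}^d$ or $v \in \mathbf{C}^d$; it means that for every $\epsilon > 0$, there exists an $N$ such that $\|v_n - v\| \leq \epsilon$ for all $n \geq N$. Here, the norm $\|v\|$ of a vector $v = (v^{(1)}, \ldots, v^{(d)})$ can be chosen to be the Euclidean norm $\|v\|_2 := (\sum_{j=1}^d |v^{(j)}|^2)^{1/2}$, the supremum norm $\|v\|_\infty := \sup_{1 \leq j \leq d} |v^{(j)}|$, or any other number of norms, but for the purposes of convergence, these norms are all *equivalent*; a sequence of vectors converges in the Euclidean norm if and only if it converges in the supremum norm, and similarly for any other two norms on the finite-dimensional space $\mathbf{R}^d$ or $\mathbf{C}^d$.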
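For the Euclidean and supremum norms, the equivalence can be checked directly. The following is a sketch of the standard two-sided comparison; the notation $v = (v^{(1)}, \ldots, v^{(d)})$, $\|v\|_2$ for the Euclidean norm and $\|v\|_\infty$ for the supremum norm is assumed here for illustration.

```latex
\|v\|_\infty
  = \sup_{1 \le j \le d} |v^{(j)}|
  \;\le\; \Big( \sum_{j=1}^d |v^{(j)}|^2 \Big)^{1/2}
  = \|v\|_2
  \;\le\; \Big( d \, \|v\|_\infty^2 \Big)^{1/2}
  = \sqrt{d} \, \|v\|_\infty .
```

Applying this to the difference $v_n - v$ shows that $\|v_n - v\|_2 \to 0$ if and only if $\|v_n - v\|_\infty \to 0$. Note that the constant $\sqrt{d}$ grows with the dimension, which gives a first hint as to why the equivalence can fail once the number of degrees of freedom becomes infinite.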

If however one has a sequence $f_1, f_2, f_3, \ldots$ of functions $f_n: X \to \mathbf{R}$ or $f_n: X \to \mathbf{C}$ on a common domain $X$, and a putative limit $f: X \to \mathbf{R}$ or $f: X \to \mathbf{C}$, there can now be many different ways in which the sequence $f_n$ may or may not converge to the limit $f$. (One could also consider convergence of functions $f_n: X_n \to \mathbf{C}$ on different domains $X_n$, but we will not discuss this issue at all here.) This is in contrast with the situation with scalars $x_n$ or $z_n$ (which corresponds to the case when $X$ is a single point) or vectors $v_n$ (which corresponds to the case when $X$ is a finite set such as $\{1, \ldots, d\}$). Once $X$ becomes infinite, the functions $f_n$ acquire an infinite number of degrees of freedom, and this allows them to approach $f$ in any number of inequivalent ways.

What different types of convergence are there? As an undergraduate, one learns of the following two basic modes of convergence:

- We say that $f_n$ converges to $f$ *pointwise* if, for every $x \in X$, $f_n(x)$ converges to $f(x)$. In other words, for every $\epsilon > 0$ and $x \in X$, there exists $N$ (that depends on *both* $\epsilon$ and $x$) such that $|f_n(x) - f(x)| \leq \epsilon$ whenever $n \geq N$.
- We say that $f_n$ converges to $f$ *uniformly* if, for every $\epsilon > 0$, there exists $N$ such that for every $n \geq N$, $|f_n(x) - f(x)| \leq \epsilon$ for every $x \in X$. The difference between uniform convergence and pointwise convergence is that with the former, the time $N$ at which $f_n(x)$ must be permanently $\epsilon$-close to $f(x)$ is not permitted to depend on $x$, but must instead be chosen uniformly in $x$.

Uniform convergence implies pointwise convergence, but not conversely. A typical example: the functions $f_n: \mathbf{R} \to \mathbf{R}$ defined by $f_n(x) := x/n$ converge pointwise to the zero function $f(x) := 0$, but not uniformly.
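The gap between the two modes can be seen numerically. The sketch below assumes the example in question is $f_n(x) = x/n$ on the real line; the helper names `threshold` and `sup_on_interval` are hypothetical and chosen only for this illustration. For each fixed $x$ a threshold $N$ exists, but it grows with $|x|$, so no single $N$ works for all $x$ at once.

```python
import math


def f(n, x):
    """n-th function of the (assumed) example f_n(x) = x / n."""
    return x / n


def threshold(x, eps):
    """Smallest N >= 1 such that |f_n(x)| <= eps for every n >= N.

    Since |x| / n decreases in n, this is the least integer N >= |x| / eps.
    """
    return max(1, math.ceil(abs(x) / eps))


def sup_on_interval(n, radius):
    """Supremum of |f_n| over [-radius, radius], attained at the endpoints."""
    return radius / n


if __name__ == "__main__":
    eps = 0.5
    for x in (1, 10, 100, 1000):
        # The threshold depends on x: this is pointwise convergence only.
        print(f"x = {x:5d}: N = {threshold(x, eps)}")
    for n in (1, 10, 100):
        # Evaluating f_n at the point x = n always gives 1, so the
        # supremum of |f_n| over the whole real line never drops below 1.
        print(f"n = {n:3d}: f_n(n) = {f(n, n)}")
```

On any bounded interval $[-R, R]$ the supremum is $R/n$, which does tend to zero, so the failure of uniform convergence here comes entirely from the unboundedness of the domain.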

However, pointwise and uniform convergence are only two of the dozens of modes of convergence that are of importance in analysis. We will not attempt to exhaustively enumerate these modes here (but see this Wikipedia page, and see also these 245B notes on strong and weak convergence). We will, however, discuss some of the modes of convergence that arise from measure theory, when the domain $X$ is equipped with the structure of a measure space $(X, \mathcal{B}, \mu)$, and the functions $f_n$ (and their limit $f$) are measurable with respect to this space. In this context, we have some additional modes of convergence:

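- We say that $f_n$ converges to $f$ *pointwise almost everywhere* if, for ($\mu$-)almost every $x \in X$, $f_n(x)$ converges to $f(x)$.
- We say that $f_n$ converges to $f$ *uniformly almost everywhere*, *essentially uniformly*, or *in $L^\infty$ norm* if, for every $\epsilon > 0$, there exists $N$ such that for every $n \geq N$, $|f_n(x) - f(x)| \leq \epsilon$ for $\mu$-almost every $x \in X$.
- We say that $f_n$ converges to $f$ *almost uniformly* if, for every $\epsilon > 0$, there exists an exceptional set $E \in \mathcal{B}$ of measure $\mu(E) \leq \epsilon$ such that $f_n$ converges uniformly to $f$ on the complement of $E$.
- We say that $f_n$ converges to $f$ *in $L^1$ norm* if the quantity $\|f_n - f\|_{L^1(\mu)} = \int_X |f_n(x) - f(x)|\ d\mu$ converges to $0$ as $n \to \infty$.
- We say that $f_n$ converges to $f$ *in measure* if, for every $\epsilon > 0$, the measures $\mu(\{ x \in X: |f_n(x) - f(x)| \geq \epsilon \})$ converge to zero as $n \to \infty$.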

Observe that each of these five modes of convergence is unaffected if one modifies $f_n$ or $f$ on a set of measure zero. In contrast, the pointwise and uniform modes of convergence can be affected if one modifies $f_n$ or $f$ even on a single point.

**Remark 1** In the context of probability theory, in which $f_n$ and $f$ are interpreted as random variables, convergence in $L^1$ norm is often referred to as convergence in mean, pointwise convergence almost everywhere is often referred to as almost sure convergence, and convergence in measure is often referred to as convergence in probability.

**Exercise 1** (Linearity of convergence) Let $(X, \mathcal{B}, \mu)$ be a measure space, let $f_n, g_n: X \to \mathbf{C}$ be sequences of measurable functions, and let $f, g: X \to \mathbf{C}$ be measurable functions.

- Show that $f_n$ converges to $f$ along one of the above seven modes of convergence if and only if $|f_n - f|$ converges to $0$ along the same mode.
- If $f_n$ converges to $f$ along one of the above seven modes of convergence, and $g_n$ converges to $g$ along the same mode, show that $f_n + g_n$ converges to $f + g$ along the same mode, and that $c f_n$ converges to $c f$ along the same mode for any $c \in \mathbf{C}$.
- (Squeeze test) If $f_n$ converges to $0$ along one of the above seven modes, and $|g_n| \leq f_n$ pointwise for each $n$, show that $g_n$ converges to $0$ along the same mode.

We have some easy implications between modes:

**Exercise 2** (Easy implications) Let $(X, \mathcal{B}, \mu)$ be a measure space, and let $f_n: X \to \mathbf{C}$ and $f: X \to \mathbf{C}$ be measurable functions.

- If $f_n$ converges to $f$ uniformly, then $f_n$ converges to $f$ pointwise.
- If $f_n$ converges to $f$ uniformly, then $f_n$ converges to $f$ in $L^\infty$ norm. Conversely, if $f_n$ converges to $f$ in $L^\infty$ norm, then $f_n$ converges to $f$ uniformly outside of a null set (i.e. there exists a null set $E$ such that the restriction of $f_n$ to the complement $X \backslash E$ of $E$ converges uniformly to the restriction of $f$ to $X \backslash E$).
- If $f_n$ converges to $f$ in $L^\infty$ norm, then $f_n$ converges to $f$ almost uniformly.
- If $f_n$ converges to $f$ almost uniformly, then $f_n$ converges to $f$ pointwise almost everywhere.
- If $f_n$ converges to $f$ pointwise, then $f_n$ converges to $f$ pointwise almost everywhere.
- If $f_n$ converges to $f$ in $L^1$ norm, then $f_n$ converges to $f$ in measure.
- If $f_n$ converges to $f$ almost uniformly, then $f_n$ converges to $f$ in measure.

The reader is encouraged to draw a diagram that summarises the logical implications between the seven modes of convergence that the above exercise describes.

We give four key examples that distinguish between these modes, in the case when $X$ is the real line $\mathbf{R}$ with Lebesgue measure. The first three of these examples were already introduced in the previous set of notes.

**Example 1** (Escape to horizontal infinity) Let $f_n := 1_{[n, n+1]}$. Then $f_n$ converges to zero pointwise (and thus, pointwise almost everywhere), but not uniformly, in $L^\infty$ norm, almost uniformly, in $L^1$ norm, or in measure.

**Example 2** (Escape to width infinity) Let $f_n := \frac{1}{n} 1_{[0, n]}$. Then $f_n$ converges to zero uniformly (and thus, pointwise, pointwise almost everywhere, in $L^\infty$ norm, almost uniformly, and in measure), but not in $L^1$ norm.

**Example 3** (Escape to vertical infinity) Let $f_n := n 1_{[\frac{1}{n}, \frac{2}{n}]}$. Then $f_n$ converges to zero pointwise (and thus, pointwise almost everywhere) and almost uniformly (and hence in measure), but not uniformly, in $L^\infty$ norm, or in $L^1$ norm.

**Example 4** (Typewriter sequence) Let $f_n$ be defined by the formula

$$f_n := 1_{[\frac{n - 2^k}{2^k}, \frac{n - 2^k + 1}{2^k}]}$$

whenever $k \geq 0$ and $2^k \leq n < 2^{k+1}$. This is a sequence of indicator functions of intervals of decreasing length, marching across the unit interval $[0,1]$ over and over again. Then $f_n$ converges to zero in measure and in $L^1$ norm, but not pointwise almost everywhere (and hence also not pointwise, not almost uniformly, nor in $L^\infty$ norm, nor uniformly).
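The behaviour of the typewriter sequence can be checked with exact rational arithmetic. The sketch below assumes the sequence is the indicator of the dyadic interval $[\frac{n-2^k}{2^k}, \frac{n-2^k+1}{2^k}]$ for $2^k \leq n < 2^{k+1}$; the function names are hypothetical. It shows that the interval lengths (which equal both the $L^1$ norm of $f_n$ and the measure of the set where $f_n$ is nonzero) shrink to zero, while a fixed point keeps being hit in every dyadic block.

```python
from fractions import Fraction


def typewriter_interval(n):
    """Endpoints of the interval whose indicator is f_n (assumed form), n >= 1."""
    k = n.bit_length() - 1  # the unique k >= 0 with 2^k <= n < 2^(k+1)
    left = Fraction(n - 2**k, 2**k)
    right = Fraction(n - 2**k + 1, 2**k)
    return left, right


def f(n, x):
    """Value of the n-th typewriter function at the point x."""
    left, right = typewriter_interval(n)
    return 1 if left <= x <= right else 0


def l1_norm(n):
    """L^1 norm of f_n, i.e. the length of its supporting interval."""
    left, right = typewriter_interval(n)
    return right - left


if __name__ == "__main__":
    x = Fraction(1, 3)
    for k in range(1, 6):
        block = range(2**k, 2 ** (k + 1))
        hits = [n for n in block if f(n, x) == 1]
        # Each block contains an index where f_n(x) = 1 and others where
        # f_n(x) = 0, so f_n(x) does not converge as n grows.
        print(f"k = {k}: length = {l1_norm(2**k)}, indices hitting x: {hits}")
```

Since every block $2^k \leq n < 2^{k+1}$ covers the whole unit interval, the same oscillation occurs at every point of $[0,1]$, which is why pointwise almost everywhere convergence fails even though the $L^1$ norms tend to zero.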

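**Remark 2** The *$L^\infty$ norm* $\|f\|_{L^\infty(\mu)}$ of a measurable function $f: X \to \mathbf{C}$ is defined to be the infimum of all the quantities $M \in [0, +\infty]$ that are *essential upper bounds* for $f$ in the sense that $|f(x)| \leq M$ for almost every $x$. Then $f_n$ converges to $f$ in $L^\infty$ norm if and only if $\|f_n - f\|_{L^\infty(\mu)} \to 0$ as $n \to \infty$. The $L^\infty$ and $L^1$ norms are part of the larger family of $L^p$ norms, which we will study in more detail in 245B.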
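A small worked example may help separate the essential supremum from the ordinary supremum. The function below is an illustrative choice (not taken from the notes), on the real line with Lebesgue measure: it is the indicator of the single point $0$.

```latex
f := 1_{\{0\}} : \mathbf{R} \to \mathbf{C}, \qquad
\sup_{x \in \mathbf{R}} |f(x)| = 1, \qquad
\|f\|_{L^\infty}
  = \inf \{ M \ge 0 : |f(x)| \le M \text{ for almost every } x \}
  = 0 .
```

Here $M = 0$ is already an essential upper bound, because $|f(x)| \leq 0$ fails only on the null set $\{0\}$. In particular the constant sequence $f_n := f$ converges to zero in this norm, but not uniformly, and not even pointwise.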

One particular advantage of $L^1$ convergence is that, in the case when the $f_n$ are absolutely integrable, it implies convergence of the integrals,

$$\int_X f_n\ d\mu \to \int_X f\ d\mu,$$

as one sees from the triangle inequality. Unfortunately, none of the other modes of convergence automatically imply this convergence of the integral, as the above examples show.
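The triangle inequality step can be written out as follows. This is a sketch in the notation $f_n$, $f$ for the functions and $\mu$ for the measure on $X$; it uses that $f$ is itself absolutely integrable, which follows from $|f| \leq |f_n| + |f_n - f|$ as soon as the $L^1$ distance between $f_n$ and $f$ is finite.

```latex
\Big| \int_X f_n \, d\mu - \int_X f \, d\mu \Big|
  = \Big| \int_X (f_n - f) \, d\mu \Big|
  \le \int_X |f_n - f| \, d\mu
  = \|f_n - f\|_{L^1(\mu)}
  \to 0 \quad \text{as } n \to \infty .
```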

The purpose of these notes is to compare these modes of convergence with each other. Unfortunately, the relationship between these modes is not particularly simple; unlike the situation with pointwise and uniform convergence, one cannot simply rank these modes in a linear order from strongest to weakest. This is ultimately because the different modes react in different ways to the three “escape to infinity” scenarios described above, as well as to the “typewriter” behaviour when a single set is “overwritten” many times. On the other hand, if one imposes some additional assumptions to shut down one or more of these escape to infinity scenarios, such as a finite measure hypothesis or a uniform integrability hypothesis, then one can obtain some additional implications between the different modes.
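The different reactions to the three escape scenarios can be tabulated exactly, since each example is a step function. The sketch below assumes the examples take the forms $1_{[n,n+1]}$ (horizontal), $\frac{1}{n} 1_{[0,n]}$ (width) and $n 1_{[1/n, 2/n]}$ (vertical); the helper names are hypothetical. For a step function of height $c > 0$ on an interval of length $w$, the supremum is $c$, the $L^1$ norm is $cw$, and the set where the function is at least $\epsilon$ has measure $w$ if $c \geq \epsilon$ and $0$ otherwise.

```python
from fractions import Fraction


def step_stats(height, left, right, eps):
    """Exact statistics of the step function height * 1_[left, right], height > 0."""
    width = right - left
    return {
        "sup": height,                             # uniform distance to zero
        "l1": height * width,                      # L^1 distance to zero
        "measure": width if height >= eps else 0,  # measure of {|f_n| >= eps}
    }


# Assumed forms of the three examples: n -> (height, left endpoint, right endpoint).
EXAMPLES = {
    "horizontal": lambda n: (Fraction(1), Fraction(n), Fraction(n + 1)),
    "width": lambda n: (Fraction(1, n), Fraction(0), Fraction(n)),
    "vertical": lambda n: (Fraction(n), Fraction(1, n), Fraction(2, n)),
}

NS = (1, 10, 100, 1000)


def table(eps=Fraction(1, 2), ns=NS):
    """Statistics of each example at the indices ns, for a fixed threshold eps."""
    return {
        name: [step_stats(*make(n), eps) for n in ns]
        for name, make in EXAMPLES.items()
    }


if __name__ == "__main__":
    for name, rows in table().items():
        print(name)
        for n, row in zip(NS, rows):
            print(f"  n={n:5d}  sup={row['sup']}  L1={row['l1']}  measure={row['measure']}")
```

In all three cases the $L^1$ norm stays equal to $1$, so none of them converges in $L^1$ norm. The supremum tends to zero only in the width example, and the measure of the large set tends to zero in the width and vertical examples but not in the horizontal one. Pointwise and almost uniform convergence are not captured by these three statistics and have to be checked by hand.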
