
We now begin the rigorous theory of the incompressible Navier-Stokes equations

$\displaystyle \partial_t u + (u \cdot \nabla) u = \nu \Delta u - \nabla p \ \ \ \ \ (1)$

$\displaystyle \nabla \cdot u = 0,$

where ${\nu>0}$ is a given constant (the kinematic viscosity, or viscosity for short), ${u: I \times {\bf R}^d \rightarrow {\bf R}^d}$ is an unknown vector field (the velocity field), and ${p: I \times {\bf R}^d \rightarrow {\bf R}}$ is an unknown scalar field (the pressure field). Here ${I}$ is a time interval, usually of the form ${[0,T]}$ or ${[0,T)}$. We will either be interested in spatially decaying situations, in which ${u(t,x)}$ decays to zero as ${x \rightarrow \infty}$, or ${{\bf Z}^d}$-periodic (or periodic for short) settings, in which one has ${u(t, x+n) = u(t,x)}$ for all ${n \in {\bf Z}^d}$. (One can also require the pressure ${p}$ to be periodic as well; this brings up a small subtlety in the uniqueness theory for these equations, which we will address later in this set of notes.) As is usual, we abuse notation by identifying a ${{\bf Z}^d}$-periodic function on ${{\bf R}^d}$ with a function on the torus ${{\bf R}^d/{\bf Z}^d}$.

In order for the system (1) to even make sense, one requires some level of regularity on the unknown fields ${u,p}$; this turns out to be a relatively important technical issue that will require some attention later in this set of notes, and we will end up transforming (1) into other forms that are more suitable for lower regularity candidate solutions. Our focus here will be on local existence of these solutions in a short time interval ${[0,T]}$ or ${[0,T)}$, for some ${T>0}$. (One could in principle also consider solutions that extend to negative times, but it turns out that the equations are not time-reversible, and the forward evolution is significantly more natural to study than the backwards one.) The study of the Euler equations, in which ${\nu=0}$, will be deferred to subsequent lecture notes.

As the unknown fields involve a time parameter ${t}$, and the first equation of (1) involves time derivatives of ${u}$, the system (1) should be viewed as describing an evolution for the velocity field ${u}$. (As we shall see later, the pressure ${p}$ is not really an independent dynamical field, as it can essentially be expressed in terms of the velocity field without requiring any differentiation or integration in time.) As such, the natural question to study for this system is the initial value problem, in which an initial velocity field ${u_0: {\bf R}^d \rightarrow {\bf R}^d}$ is specified, and one wishes to locate a solution ${(u,p)}$ to the system (1) with initial condition

$\displaystyle u(0,x) = u_0(x) \ \ \ \ \ (2)$

for ${x \in {\bf R}^d}$. Of course, in order for this initial condition to be compatible with the second equation in (1), we need the compatibility condition

$\displaystyle \nabla \cdot u_0 = 0 \ \ \ \ \ (3)$

and one should also impose some regularity, decay, and/or periodicity hypotheses on ${u_0}$ in order to be compatible with the corresponding level of regularity etc. on the solution ${u}$.

The fundamental questions in the local theory of an evolution equation are those of existence, uniqueness, and continuous dependence. In the context of the Navier-Stokes equations, these questions can be phrased (somewhat broadly) as follows:

• (a) (Local existence) Given suitable initial data ${u_0}$, does there exist a solution ${(u,p)}$ to the above initial value problem that exists for some time ${T>0}$? What can one say about the time ${T}$ of existence? How regular is the solution?
• (b) (Uniqueness) Is it possible to have two solutions ${(u,p), (u',p')}$ of a certain regularity class to the same initial value problem on a common time interval ${[0,T)}$? To what extent does the answer to this question depend on the regularity assumed on one or both of the solutions? Does one need to normalise the solutions beforehand in order to obtain uniqueness?
• (c) (Continuous dependence on data) If one perturbs the initial conditions ${u_0}$ by a small amount, what happens to the solution ${(u,p)}$ and to the time of existence ${T}$? (This question tends to only be sensible once one has a reasonable uniqueness theory.)

The answers to these questions tend to be more complicated than a simple “Yes” or “No”; for instance, they can depend on the precise regularity hypotheses one wishes to impose on the data and on the solution, and even on exactly how one interprets the concept of a “solution”. However, once one settles on such a set of hypotheses, it generally happens that one either gets a “strong” theory (in which one has existence, uniqueness, and continuous dependence on the data), a “weak” theory (in which one has existence of somewhat low-quality solutions, but with only limited uniqueness results (or even some spectacular failures of uniqueness) and almost no continuous dependence on data), or no satisfactory theory whatsoever. In the first case, we say (roughly speaking) that the initial value problem is locally well-posed, and one can then try to build upon the theory to explore more interesting topics such as global existence and asymptotics, classifying potential blowup, rigorous justification of conservation laws, and so forth. With a weak local theory, it becomes much more difficult to address these latter sorts of questions, and there are serious analytic pitfalls that one could fall into if one tries too strenuously to treat weak solutions as if they were strong. (For instance, conservation laws that are rigorously justified for strong, high-regularity solutions may well fail for weak, low-regularity ones.) Also, even if one is primarily interested in solutions at one level of regularity, the well-posedness theory at another level of regularity can be very helpful; for instance, if one is interested in smooth solutions in ${{\bf R}^d}$, it turns out that the well-posedness theory at the critical regularity of ${\dot H^{\frac{d}{2}-1}({\bf R}^d)}$ can be used to establish globally smooth solutions from small initial data. As such, it can become quite important to know what kind of local theory one can obtain for a given equation.

This set of notes will focus on the “strong” theory, in which a substantial amount of regularity is assumed in the initial data and solution, giving a satisfactory (albeit largely local-in-time) well-posedness theory. “Weak” solutions will be considered in later notes.

The Navier-Stokes equations are not the simplest of partial differential equations to study, in part because they are an amalgam of three more basic equations, which behave rather differently from each other (for instance the first equation is nonlinear, while the latter two are linear):

• (a) Transport equations such as ${\partial_t u + (u \cdot \nabla) u = 0}$.
• (b) Diffusion equations (or heat equations) such as ${\partial_t u = \nu \Delta u}$.
• (c) Systems such as ${v = F - \nabla p}$, ${\nabla \cdot v = 0}$, which (for want of a better name) we will call Leray systems.

Accordingly, we will devote some time to getting some preliminary understanding of the linear diffusion and Leray systems before returning to the theory for the Navier-Stokes equation. Transport systems will be discussed further in subsequent notes; in this set of notes, we will instead focus on a more basic example of nonlinear equations, namely the first-order ordinary differential equation

$\displaystyle \partial_t u = F(u) \ \ \ \ \ (4)$

where ${u: I \rightarrow V}$ takes values in some finite-dimensional (real or complex) vector space ${V}$ on some time interval ${I}$, and ${F: V \rightarrow V}$ is a given linear or nonlinear function. (Here, we use “interval” to denote a connected non-empty subset of ${{\bf R}}$; in particular, we allow intervals to be half-infinite or infinite, or to be open, closed, or half-open.) Fundamental results in this area include the Picard existence and uniqueness theorem, the Duhamel formula, and Grönwall’s inequality; they will serve as motivation for the approach to local well-posedness that we will adopt in this set of notes. (There are other ways to construct strong or weak solutions for Navier-Stokes and Euler equations, which we will discuss in later notes.)

A key role in our treatment here will be played by the fundamental theorem of calculus (in various forms and variations). Roughly speaking, this theorem, and its variants, allow us to recast differential equations (such as (1) or (4)) as integral equations. Such integral equations are less tractable algebraically than their differential counterparts (for instance, they are not ideal for verifying conservation laws), but are significantly more convenient for well-posedness theory, basically because integration tends to increase the regularity of a function, while differentiation reduces it. (Indeed, the problem of “losing derivatives”, or more precisely “losing regularity”, is a key obstacle that one often has to address when trying to establish well-posedness for PDE, particularly those that are quite nonlinear and with rough initial data, though for nonlinear parabolic equations such as Navier-Stokes the obstacle is not as serious as it is for some other PDE, due to the smoothing effects of the heat equation.)
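As a concrete illustration of recasting a differential equation as an integral equation, the following is a minimal numerical sketch of Picard iteration for the scalar version of the ODE (4), rewritten as ${u(t) = u_0 + \int_0^t F(u(s))\ ds}$. The function name `picard_iterate`, the grid size, the iteration count, and the use of the trapezoid rule are all assumptions made for this illustration; they are not part of the notes.

```python
import math


def picard_iterate(F, u0, T, n_grid=1000, n_iter=25):
    """Approximate the solution of u' = F(u), u(0) = u0 on [0, T] (scalar case).

    Each pass replaces u by  t -> u0 + int_0^t F(u(s)) ds,  with the integral
    evaluated by the trapezoid rule on a uniform grid (an illustrative choice).
    Returns the list of values of the final iterate at the grid points.
    """
    h = T / n_grid
    u = [u0] * (n_grid + 1)  # initial guess: the constant function u0
    for _ in range(n_iter):
        f = [F(v) for v in u]
        new_u = [u0]
        integral = 0.0
        for j in range(1, n_grid + 1):
            # trapezoid rule on the subinterval [t_{j-1}, t_j]
            integral += 0.5 * h * (f[j - 1] + f[j])
            new_u.append(u0 + integral)
        u = new_u
    return u


if __name__ == "__main__":
    # For F(u) = u and u0 = 1 the exact solution is exp(t).
    values = picard_iterate(lambda v: v, 1.0, 1.0)
    print(values[-1], math.e)
```

Note that each iteration only integrates the previous iterate and never differentiates it, which is the feature of the integral formulation that makes it convenient for well-posedness arguments.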

One weakness of the methods deployed here is that the quantitative bounds produced deteriorate to the point of uselessness in the inviscid limit ${\nu \rightarrow 0}$, rendering these techniques unsuitable for analysing the Euler equations in which ${\nu=0}$. However, some of the methods developed in later notes have bounds that remain uniform in the ${\nu \rightarrow 0}$ limit, allowing one to also treat the Euler equations.

In this and subsequent sets of notes, we use the following asymptotic notation (a variant of Vinogradov notation that is commonly used in PDE and harmonic analysis). The statement ${X \lesssim Y}$, ${Y \gtrsim X}$, or ${X = O(Y)}$ will be used to denote an estimate of the form ${|X| \leq CY}$ (or equivalently ${Y \geq C^{-1} |X|}$) for some constant ${C}$, and ${X \sim Y}$ will be used to denote the estimates ${X \lesssim Y \lesssim X}$. If the constant ${C}$ depends on other parameters (such as the dimension ${d}$), this will be indicated by subscripts, thus for instance ${X \lesssim_d Y}$ denotes the estimate ${|X| \leq C_d Y}$ for some ${C_d}$ depending on ${d}$.

This coming fall quarter, I am teaching a class on topics in the mathematical theory of incompressible fluid equations, focusing particularly on the incompressible Euler and Navier-Stokes equations. These two equations are by no means the only equations used to model fluids, but I will focus on these two equations in this course to narrow the focus down to something manageable. I have not fully decided on the choice of topics to cover in this course, but I would probably begin with some core topics such as local well-posedness theory and blowup criteria, conservation laws, and construction of weak solutions, then move on to some topics such as boundary layers and the Prandtl equations, the Euler-Poincare-Arnold interpretation of the Euler equations as an infinite dimensional geodesic flow, and some discussion of the Onsager conjecture. I will probably also continue to more advanced and recent topics in the winter quarter.

In this initial set of notes, we begin by reviewing the physical derivation of the Euler and Navier-Stokes equations from the first principles of Newtonian mechanics, and specifically from Newton’s famous three laws of motion. Strictly speaking, this derivation is not needed for the mathematical analysis of these equations, which can be viewed if one wishes as an arbitrarily chosen system of partial differential equations without any physical motivation; however, I feel that the derivation sheds some insight and intuition on these equations, and is also worth knowing on purely intellectual grounds regardless of its mathematical consequences. I also find it instructive to actually see the journey from Newton’s law

$\displaystyle F = ma$

to the seemingly rather different-looking law

$\displaystyle \partial_t u + (u \cdot \nabla) u = -\nabla p + \nu \Delta u$

$\displaystyle \nabla \cdot u = 0$

for incompressible Navier-Stokes (or, if one drops the viscosity term ${\nu \Delta u}$, the Euler equations).

Our discussion in this set of notes is physical rather than mathematical, and so we will not be working at mathematical levels of rigour and precision. In particular we will be fairly casual about interchanging summations, limits, and integrals, we will manipulate approximate identities ${X \approx Y}$ as if they were exact identities (e.g., by differentiating both sides of the approximate identity), and we will not attempt to verify any regularity or convergence hypotheses in the expressions being manipulated. (The same holds for the exercises in this text, which also do not need to be justified at mathematical levels of rigour.) Of course, once we resume the mathematical portion of this course in subsequent notes, such issues will be an important focus of careful attention. This is a basic division of labour in mathematical modeling: non-rigorous heuristic reasoning is used to derive a mathematical model from physical (or other “real-life”) principles, but once a precise model is obtained, the analysis of that model should be completely rigorous if at all possible (even if this requires applying the model to regimes which do not correspond to the original physical motivation of that model). See the discussion by John Ball quoted at the end of these slides of Gero Friesecke for an expansion of these points.

Note: our treatment here will differ slightly from that presented in many fluid mechanics texts, in that it will emphasise first-principles derivations from many-particle systems, rather than relying on bulk laws of physics, such as the laws of thermodynamics, which we will not cover here. (However, the derivations from bulk laws tend to be more robust, in that they are not as reliant on assumptions about the particular interactions between particles. In particular, the physical hypotheses we assume in this post are probably quite a bit stronger than the minimal assumptions needed to justify the Euler or Navier-Stokes equations, which can hold even in situations in which one or more of the hypotheses assumed here break down.)

As readers who have followed my previous post will know, I have been spending the last few weeks extending my previous interactive text on propositional logic (entitled “QED”) to also cover first-order logic.  The text has now reached what seems to be a stable form, with a complete set of deductive rules for first-order logic with equality, and no major bugs as far as I can tell (apart from one weird visual bug I can’t eradicate, in that some graphics elements can occasionally temporarily disappear when one clicks on an item).  So it will likely not change much going forward.

I feel though that there could be more that could be done with this sort of framework (e.g., improved GUI, modification to other logics, developing the ability to write one’s own texts and libraries, exploring mathematical theories such as Peano arithmetic, etc.).  But writing this text (particularly the first-order logic sections) has brought me close to the limit of my programming ability, as the number of bugs introduced with each new feature implemented has begun to grow at an alarming rate.  I would like to repackage the code so that it can be re-used by more adept programmers for further possible applications, though I have never done something like this before and would appreciate advice on how to do so.   The code is already available under a Creative Commons licence, but I am not sure how readable and modifiable it will be to others currently.  [Update: it is now on GitHub.]

[One thing I noticed is that I would probably have to make more of a decoupling between the GUI elements, the underlying logical elements, and the interactive text.  For instance, at some point I made the decision (convenient at the time) to use some GUI elements to store some of the state variables of the text, e.g. the exercise buttons are currently storing the status of what exercises are unlocked or not.  This is presumably not an example of good programming practice, though it would be relatively easy to fix.  More seriously, due to my inability to come up with a good general-purpose matching algorithm (or even specification of such an algorithm) for the laws of first-order logic, many of the laws have to be hard-coded into the matching routine, so one cannot currently remove them from the text.  It may well be that the best thing to do in fact is to rework the entire codebase from scratch using more professional software design methods.]

[Update, Aug 23: links moved to GitHub version.]

About six years ago on this blog, I started thinking about trying to make a web-based game based around high-school algebra, and ended up using Scratch to write a short but playable puzzle game in which one solves linear equations for an unknown ${x}$ using a restricted set of moves. (At almost the same time, there were a number of more professionally made games released along similar lines, most notably Dragonbox.)

Since then, I have thought a couple times about whether there were other parts of mathematics which could be gamified in a similar fashion. Shortly after my first blog posts on this topic, I experimented with a similar gamification of Lewis Carroll’s classic list of logic puzzles, but the results were quite clunky, and I was never satisfied with the results.

Over the last few weeks I returned to this topic though, thinking in particular about how to gamify the rules of inference of propositional logic, in a manner that at least vaguely resembles how mathematicians actually go about making logical arguments (e.g., splitting into cases, arguing by contradiction, using previous results as lemmas to help with subsequent ones, and so forth). The rules of inference are a list of a dozen or so deductive rules concerning propositional sentences (things like “(${A}$ AND ${B}$) OR (NOT ${C}$)”, where ${A,B,C}$ are some formulas). A typical such rule is Modus Ponens: if the sentence ${A}$ is known to be true, and the implication “${A}$ IMPLIES ${B}$” is also known to be true, then one can deduce that ${B}$ is also true. Furthermore, in this deductive calculus it is possible to temporarily introduce some unproven statements as an assumption, only to discharge them later. In particular, we have the deduction theorem: if, after making an assumption ${A}$, one is able to derive the statement ${B}$, then one can conclude that the implication “${A}$ IMPLIES ${B}$” is true without any further assumption.
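To make the two rules just described concrete, here is a small sketch in Lean 4. The choice of Lean is an assumption made purely for illustration; it is not the formalism or the language used by the interactive text. The first example is Modus Ponens; the second mimics the deduction theorem, in that one temporarily assumes ${A}$, derives ${C}$ under that assumption, and then discharges the assumption to conclude “${A}$ IMPLIES ${C}$”.

```lean
-- Modus Ponens: from A and (A → B), deduce B.
example (A B : Prop) (hA : A) (hAB : A → B) : B :=
  hAB hA

-- Deduction theorem in miniature: assume A (the `fun hA` opens the
-- sub-environment), derive C inside it, and discharge the assumption
-- to obtain the implication A → C.
example (A B C : Prop) (hAB : A → B) (hBC : B → C) : A → C :=
  fun hA => hBC (hAB hA)
```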

It took a while for me to come up with a workable game-like graphical interface for all of this, but I finally managed to set one up, now using JavaScript instead of Scratch (which would be hopelessly inadequate for this task); indeed, part of the motivation of this project was to finally learn how to program in JavaScript, which turned out to be not as formidable as I had feared (certainly having experience with other C-like languages like C++, Java, or Lua, as well as some prior knowledge of HTML, was very helpful). The main code for this project is available here. Using this code, I have created an interactive textbook in the style of a computer game, which I have titled “QED”. This text contains thirty-odd exercises arranged in twelve sections that function as game “levels”, in which one has to use a given set of rules of inference, together with a given set of hypotheses, to reach a desired conclusion. The set of available rules increases as one advances through the text; in particular, each new section gives one or more rules, and additionally each exercise one solves automatically becomes a new deduction rule one can exploit in later levels, much as lemmas and propositions are used in actual mathematics to prove more difficult theorems. The text automatically tries to match available deduction rules to the sentences one clicks on or drags, to try to minimise the amount of manual input one needs to actually make a deduction.

Most of one’s proof activity takes place in a “root environment” of statements that are known to be true (under the given hypothesis), but for more advanced exercises one has to also work in sub-environments in which additional assumptions are made. I found the graphical metaphor of nested boxes to be useful to depict this tree of sub-environments, and it seems to combine well with the drag-and-drop interface.

The text also logs one’s moves in a more traditional proof format, which shows how the mechanics of the game correspond to a traditional mathematical argument. My hope is that this will give students a way to understand the underlying concept of forming a proof in a manner that is more difficult to achieve using traditional, non-interactive textbooks.

I have tried to organise the exercises in a game-like progression in which one first works with easy levels that train the player on a small number of moves, with more advanced moves then introduced one at a time. As such, the order in which the rules of inference are introduced is a little idiosyncratic. The most powerful rule (the law of the excluded middle, which is what separates classical logic from intuitionistic logic) is saved for the final section of the text.

Anyway, I am now satisfied enough with the state of the code and the interactive text that I am willing to make both available (and open source; I selected a CC-BY licence for both), and would be happy to receive feedback on any aspect of either. In principle one could extend the game mechanics to other mathematical topics than the propositional calculus – the rules of inference for first-order logic being an obvious next candidate – but it seems to make sense to focus just on propositional logic for now.

Important note: As this is not a course in probability, we will try to avoid developing the general theory of stochastic calculus (which includes such concepts as filtrations, martingales, and Ito calculus). This will unfortunately limit what we can actually prove rigorously, and so at some places the arguments will be somewhat informal in nature. A rigorous treatment of many of the topics here can be found for instance in Lawler’s Conformally Invariant Processes in the Plane, from which much of the material here is drawn.

In these notes, random variables will be denoted in boldface.

Definition 1 A real random variable ${\mathbf{X}}$ is said to be normally distributed with mean ${x_0 \in {\bf R}}$ and variance ${\sigma^2 > 0}$ if one has

$\displaystyle \mathop{\bf E} F(\mathbf{X}) = \frac{1}{\sqrt{2\pi} \sigma} \int_{\bf R} e^{-(x-x_0)^2/2\sigma^2} F(x)\ dx$

for all test functions ${F \in C_c({\bf R})}$. Similarly, a complex random variable ${\mathbf{Z}}$ is said to be normally distributed with mean ${z_0 \in {\bf C}}$ and variance ${\sigma^2>0}$ if one has

$\displaystyle \mathop{\bf E} F(\mathbf{Z}) = \frac{1}{\pi \sigma^2} \int_{\bf C} e^{-|z-z_0|^2/\sigma^2} F(z)\ dx dy$

for all test functions ${F \in C_c({\bf C})}$, where ${dx dy}$ is the area element on ${{\bf C}}$.

A real Brownian motion with base point ${x_0 \in {\bf R}}$ is a random, almost surely continuous function ${\mathbf{B}^{x_0}: [0,+\infty) \rightarrow {\bf R}}$ (using the locally uniform topology on continuous functions) with the property that (almost surely) ${\mathbf{B}^{x_0}(0) = x_0}$, and for any sequence of times ${0 \leq t_0 < t_1 < t_2 < \dots < t_n}$, the increments ${\mathbf{B}^{x_0}(t_i) - \mathbf{B}^{x_0}(t_{i-1})}$ for ${i=1,\dots,n}$ are independent real random variables that are normally distributed with mean zero and variance ${t_i - t_{i-1}}$. Similarly, a complex Brownian motion with base point ${z_0 \in {\bf C}}$ is a random, almost surely continuous function ${\mathbf{B}^{z_0}: [0,+\infty) \rightarrow {\bf C}}$ with the property that (almost surely) ${\mathbf{B}^{z_0}(0) = z_0}$, and for any sequence of times ${0 \leq t_0 < t_1 < t_2 < \dots < t_n}$, the increments ${\mathbf{B}^{z_0}(t_i) - \mathbf{B}^{z_0}(t_{i-1})}$ for ${i=1,\dots,n}$ are independent complex random variables that are normally distributed with mean zero and variance ${t_i - t_{i-1}}$.
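The independent-increment description translates directly into a way of sampling a real Brownian motion at finitely many times. The following is a minimal sketch using only the Python standard library; the function name `sample_brownian_path` and its parameters are hypothetical names chosen for this illustration, and the sketch only produces the values at the requested times, not a continuous trajectory.

```python
import math
import random


def sample_brownian_path(times, x0=0.0, rng=random):
    """Sample a real Brownian motion with base point x0 at the given times.

    `times` must be increasing and non-negative.  Each increment over
    [prev_t, t] is drawn independently from a normal distribution with
    mean zero and variance t - prev_t, as in the definition.
    """
    path = []
    prev_t, value = 0.0, x0
    for t in times:
        value += rng.gauss(0.0, math.sqrt(t - prev_t))
        path.append(value)
        prev_t = t
    return path


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so that runs are reproducible
    finals = [sample_brownian_path([0.25, 0.5, 1.0], x0=2.0, rng=rng)[-1]
              for _ in range(4000)]
    mean = sum(finals) / len(finals)
    var = sum((v - mean) ** 2 for v in finals) / len(finals)
    # The value at time 1 should have mean close to x0 = 2 and variance close to 1.
    print(mean, var)
```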

Remark 2 Thanks to the central limit theorem, the hypothesis that the increments ${\mathbf{B}^{x_0}(t_i) - \mathbf{B}^{x_0}(t_{i-1})}$ be normally distributed can be dropped from the definition of a Brownian motion, so long as one retains the independence and the normalisation of the mean and variance (technically one also needs some uniform integrability on the increments beyond the second moment, but we will not detail this here). A similar statement is also true for the complex Brownian motion (where now we need to normalise the variances and covariances of the real and imaginary parts of the increments).

Real and complex Brownian motions exist from any base point ${x_0}$ or ${z_0}$; see e.g. this previous blog post for a construction. We have the following simple invariances:

Exercise 3

• (i) (Translation invariance) If ${\mathbf{B}^{x_0}}$ is a real Brownian motion with base point ${x_0 \in {\bf R}}$, and ${h \in {\bf R}}$, show that ${\mathbf{B}^{x_0}+h}$ is a real Brownian motion with base point ${x_0+h}$. Similarly, if ${\mathbf{B}^{z_0}}$ is a complex Brownian motion with base point ${z_0 \in {\bf C}}$, and ${h \in {\bf C}}$, show that ${\mathbf{B}^{z_0}+h}$ is a complex Brownian motion with base point ${z_0+h}$.
• (ii) (Dilation invariance) If ${\mathbf{B}^{0}}$ is a real Brownian motion with base point ${0}$, and ${\lambda \in {\bf R}}$ is non-zero, show that ${t \mapsto \lambda \mathbf{B}^0(t / |\lambda|^{2})}$ is also a real Brownian motion with base point ${0}$. Similarly, if ${\mathbf{B}^0}$ is a complex Brownian motion with base point ${0}$, and ${\lambda \in {\bf C}}$ is non-zero, show that ${t \mapsto \lambda \mathbf{B}^0(t / |\lambda|^{2})}$ is also a complex Brownian motion with base point ${0}$. (The time rescaling by ${|\lambda|^2}$ is what keeps the variance of each increment equal to the elapsed time.)
• (iii) (Real and imaginary parts) If ${\mathbf{B}^0}$ is a complex Brownian motion with base point ${0}$, show that ${\sqrt{2} \mathrm{Re} \mathbf{B}^0}$ and ${\sqrt{2} \mathrm{Im} \mathbf{B}^0}$ are independent real Brownian motions with base point ${0}$. Conversely, if ${\mathbf{B}^0_1, \mathbf{B}^0_2}$ are independent real Brownian motions of base point ${0}$, show that ${\frac{1}{\sqrt{2}} (\mathbf{B}^0_1 + i \mathbf{B}^0_2)}$ is a complex Brownian motion with base point ${0}$.

The next lemma is a special case of the optional stopping theorem.

Lemma 4 (Optional stopping identities)

• (i) (Real case) Let ${\mathbf{B}^{x_0}}$ be a real Brownian motion with base point ${x_0 \in {\bf R}}$. Let ${\mathbf{t}}$ be a bounded stopping time – a bounded random variable with the property that for any time ${t \geq 0}$, the event that ${\mathbf{t} \leq t}$ is determined by the values of the trajectory ${\mathbf{B}^{x_0}}$ for times up to ${t}$ (or more precisely, this event is measurable with respect to the ${\sigma}$-algebra generated by this portion of the trajectory). Then

$\displaystyle \mathop{\bf E} \mathbf{B}^{x_0}(\mathbf{t}) = x_0$

and

$\displaystyle \mathop{\bf E} (\mathbf{B}^{x_0}(\mathbf{t})-x_0)^2 - \mathbf{t} = 0$

and

$\displaystyle \mathop{\bf E} (\mathbf{B}^{x_0}(\mathbf{t})-x_0)^4 = O( \mathop{\bf E} \mathbf{t}^2 ).$

• (ii) (Complex case) Let ${\mathbf{B}^{z_0}}$ be a complex Brownian motion with base point ${z_0 \in {\bf C}}$. Let ${\mathbf{t}}$ be a bounded stopping time – a bounded random variable with the property that for any time ${t \geq 0}$, the event that ${\mathbf{t} \leq t}$ is determined by the values of the trajectory ${\mathbf{B}^{z_0}}$ for times up to ${t}$. Then

$\displaystyle \mathop{\bf E} \mathbf{B}^{z_0}(\mathbf{t}) = z_0$

$\displaystyle \mathop{\bf E} (\mathrm{Re}(\mathbf{B}^{z_0}(\mathbf{t})-z_0))^2 - \frac{1}{2} \mathbf{t} = 0$

$\displaystyle \mathop{\bf E} (\mathrm{Im}(\mathbf{B}^{z_0}(\mathbf{t})-z_0))^2 - \frac{1}{2} \mathbf{t} = 0$

$\displaystyle \mathop{\bf E} \mathrm{Re}(\mathbf{B}^{z_0}(\mathbf{t})-z_0) \mathrm{Im}(\mathbf{B}^{z_0}(\mathbf{t})-z_0) = 0$

$\displaystyle \mathop{\bf E} |\mathbf{B}^{z_0}(\mathbf{t})-z_0|^4 = O( \mathop{\bf E} \mathbf{t}^2 ).$

Proof: (Slightly informal) We just prove (i) and leave (ii) as an exercise. By translation invariance we can take ${x_0=0}$. Let ${T}$ be an upper bound for ${\mathbf{t}}$. Since ${\mathbf{B}^0(T)}$ is a real normally distributed variable with mean zero and variance ${T}$, we have

$\displaystyle \mathop{\bf E} \mathbf{B}^0( T ) = 0$

and

$\displaystyle \mathop{\bf E} \mathbf{B}^0( T )^2 = T$

and

$\displaystyle \mathop{\bf E} \mathbf{B}^0( T )^4 = 3T^2.$

By the law of total expectation, we thus have

$\displaystyle \mathop{\bf E} \mathop{\bf E}(\mathbf{B}^0( T ) | \mathbf{t}, \mathbf{B}^{0}(\mathbf{t}) ) = 0$

and

$\displaystyle \mathop{\bf E} \mathop{\bf E}((\mathbf{B}^0( T ))^2 | \mathbf{t}, \mathbf{B}^{0}(\mathbf{t}) ) = T$

and

$\displaystyle \mathop{\bf E} \mathop{\bf E}((\mathbf{B}^0( T ))^4 | \mathbf{t}, \mathbf{B}^{0}(\mathbf{t}) ) = 3T^2$

where the inner conditional expectations are with respect to the event that the pair ${(\mathbf{t}, \mathbf{B}^{0}(\mathbf{t}))}$ attains a particular point in ${[0,T] \times {\bf R}}$. However, from the independent increment nature of Brownian motion, once one conditions ${(\mathbf{t}, \mathbf{B}^{0}(\mathbf{t}))}$ to a fixed point ${(t, x)}$, the random variable ${\mathbf{B}^0(T)}$ becomes a real normally distributed variable with mean ${x}$ and variance ${T-t}$. Thus we have

$\displaystyle \mathop{\bf E}(\mathbf{B}^0( T ) | \mathbf{t}, \mathbf{B}^{0}(\mathbf{t}) ) = \mathbf{B}^{0}(\mathbf{t})$

and

$\displaystyle \mathop{\bf E}( (\mathbf{B}^0( T ))^2 | \mathbf{t}, \mathbf{B}^{0}(\mathbf{t}) ) = \mathbf{B}^{0}(\mathbf{t})^2 + T - \mathbf{t}$

and

$\displaystyle \mathop{\bf E}( (\mathbf{B}^0( T ))^4 | \mathbf{t}, \mathbf{B}^{0}(\mathbf{t}) ) = \mathbf{B}^{0}(\mathbf{t})^4 + 6(T - \mathbf{t}) \mathbf{B}^{0}(\mathbf{t})^2 + 3(T - \mathbf{t})^2$

which give the first two claims, and (after some algebra) the identity

$\displaystyle \mathop{\bf E} \mathbf{B}^{0}(\mathbf{t})^4 - 6 \mathbf{t} \mathbf{B}^{0}(\mathbf{t})^2 + 3 \mathbf{t}^2 = 0$

which then also gives the third claim. $\Box$
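For the reader who wants to see the “some algebra” and the final deduction written out, here is one possible route, in the same informal spirit as the proof and with the normalisation ${x_0=0}$. The appeal to the Cauchy-Schwarz inequality in the last step is an assumption about how the argument is meant to be closed, since the proof above does not specify it; it also presumes that the fourth moment on the left-hand side is finite.

```latex
% Take expectations in the fourth-moment conditional identity and expand:
\begin{align*}
3T^2 &= \mathbf{E}\left[ \mathbf{B}^0(\mathbf{t})^4
        + 6(T-\mathbf{t})\,\mathbf{B}^0(\mathbf{t})^2 + 3(T-\mathbf{t})^2 \right] \\
     &= \mathbf{E}\,\mathbf{B}^0(\mathbf{t})^4
        + 6T\,\mathbf{E}\,\mathbf{B}^0(\mathbf{t})^2
        - 6\,\mathbf{E}\,\mathbf{t}\,\mathbf{B}^0(\mathbf{t})^2
        + 3T^2 - 6T\,\mathbf{E}\,\mathbf{t} + 3\,\mathbf{E}\,\mathbf{t}^2 .
\end{align*}
% By the second claim, E B^0(t)^2 = E t, so the two terms with the factor 6T cancel:
\begin{align*}
\mathbf{E}\,\mathbf{B}^0(\mathbf{t})^4
   - 6\,\mathbf{E}\,\mathbf{t}\,\mathbf{B}^0(\mathbf{t})^2
   + 3\,\mathbf{E}\,\mathbf{t}^2 = 0 .
\end{align*}
% Discard the non-positive term -3 E t^2 and apply Cauchy-Schwarz:
\begin{align*}
\mathbf{E}\,\mathbf{B}^0(\mathbf{t})^4
   \le 6\,\mathbf{E}\,\mathbf{t}\,\mathbf{B}^0(\mathbf{t})^2
   \le 6\,(\mathbf{E}\,\mathbf{t}^2)^{1/2}\,(\mathbf{E}\,\mathbf{B}^0(\mathbf{t})^4)^{1/2},
\qquad\text{hence}\qquad
\mathbf{E}\,\mathbf{B}^0(\mathbf{t})^4 \le 36\,\mathbf{E}\,\mathbf{t}^2 .
\end{align*}
```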

Exercise 5 Prove the second part of Lemma 4.
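The first two identities of Lemma 4(i) have an exact discrete analogue that can be checked by machine. The sketch below replaces Brownian motion by a simple random walk with steps ${\pm 1}$ (an assumption of this illustration, not something asserted in the lemma) and uses the bounded stopping time given by the first time the walk reaches a barrier, capped at a fixed number of steps. The function name `stopped_walk_moments` and its parameters are hypothetical. Exact rational arithmetic is used so that the identities hold with equality rather than up to sampling error.

```python
from fractions import Fraction


def stopped_walk_moments(n_steps, barrier):
    """Exact moments of a simple random walk S stopped at a bounded time tau.

    tau is the first step at which |S| >= barrier, or n_steps if that comes
    first.  Returns the triple (E[S_tau], E[S_tau^2], E[tau]) as Fractions.
    """
    half = Fraction(1, 2)
    alive = {0: Fraction(1)}  # position -> probability of being there, not yet stopped
    mean = second = expected_time = Fraction(0)
    for step in range(1, n_steps + 1):
        # Move every surviving walker one step left or right.
        moved = {}
        for pos, prob in alive.items():
            for new_pos in (pos - 1, pos + 1):
                moved[new_pos] = moved.get(new_pos, Fraction(0)) + prob * half
        # Stop the walkers that hit the barrier, or everyone at the final step.
        alive = {}
        for pos, prob in moved.items():
            if abs(pos) >= barrier or step == n_steps:
                mean += prob * pos
                second += prob * pos * pos
                expected_time += prob * step
            else:
                alive[pos] = prob
    return mean, second, expected_time


if __name__ == "__main__":
    mean, second, expected_time = stopped_walk_moments(20, 3)
    print(mean, second, expected_time)
```

In this discrete setting the stopped walk has mean zero and its second moment equals the expected stopping time, mirroring the first two identities of the lemma.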

We now approach conformal maps from yet another perspective. Given an open subset ${U}$ of the complex numbers ${{\bf C}}$, define a univalent function on ${U}$ to be a holomorphic function ${f: U \rightarrow {\bf C}}$ that is also injective. We will primarily be studying this concept in the case when ${U}$ is the unit disk ${D(0,1) := \{ z \in {\bf C}: |z| < 1 \}}$.

Clearly, a univalent function ${f: D(0,1) \rightarrow {\bf C}}$ on the unit disk is a conformal map from ${D(0,1)}$ to the image ${f(D(0,1))}$; in particular, ${f(D(0,1))}$ is simply connected, and not all of ${{\bf C}}$ (since otherwise the inverse map ${f^{-1}: {\bf C} \rightarrow D(0,1)}$ would violate Liouville’s theorem). In the converse direction, the Riemann mapping theorem tells us that every open simply connected proper subset ${V \subsetneq {\bf C}}$ of the complex numbers is the image of a univalent function on ${D(0,1)}$. Furthermore, if ${V}$ contains the origin, then the univalent function ${f: D(0,1) \rightarrow {\bf C}}$ with this image becomes unique once we normalise ${f(0) = 0}$ and ${f'(0) > 0}$. Thus the Riemann mapping theorem provides a one-to-one correspondence between open simply connected proper subsets of the complex plane containing the origin, and univalent functions ${f: D(0,1) \rightarrow {\bf C}}$ with ${f(0)=0}$ and ${f'(0)>0}$. We will focus particular attention on the univalent functions ${f: D(0,1) \rightarrow {\bf C}}$ with the normalisation ${f(0)=0}$ and ${f'(0)=1}$; such functions will be called schlicht functions.

One basic example of a univalent function on ${D(0,1)}$ is the Cayley transform ${z \mapsto \frac{1+z}{1-z}}$, which is a Möbius transformation from ${D(0,1)}$ to the right half-plane ${\{ \mathrm{Re}(z) > 0 \}}$. (The slight variant ${z \mapsto \frac{1-z}{1+z}}$ is also referred to as the Cayley transform, as is the closely related map ${z \mapsto \frac{z-i}{z+i}}$, which maps the upper half-plane to ${D(0,1)}$.) One can square the map ${z \mapsto \frac{1+z}{1-z}}$ to obtain a further univalent function ${z \mapsto \left( \frac{1+z}{1-z} \right)^2}$, which now maps ${D(0,1)}$ to the complex numbers with the negative real axis ${(-\infty,0]}$ removed. One can normalise this function to be schlicht to obtain the Koebe function

$\displaystyle f(z) := \frac{1}{4}\left( \left( \frac{1+z}{1-z} \right)^2 - 1\right) = \frac{z}{(1-z)^2}, \ \ \ \ \ (1)$

which now maps ${D(0,1)}$ to the complex numbers with the half-line ${(-\infty,-1/4]}$ removed. A little more generally, for any ${\theta \in {\bf R}}$ we have the rotated Koebe function

$\displaystyle f(z) := \frac{z}{(1 - e^{i\theta} z)^2} \ \ \ \ \ (2)$

that is a schlicht function that maps ${D(0,1)}$ to the complex numbers with the half-line ${\{ -re^{-i\theta}: r \geq 1/4\}}$ removed.

Every schlicht function ${f: D(0,1) \rightarrow {\bf C}}$ has a convergent Taylor expansion

$\displaystyle f(z) = a_1 z + a_2 z^2 + a_3 z^3 + \dots$

for some complex coefficients ${a_1,a_2,\dots}$ with ${a_1=1}$. For instance, the Koebe function has the expansion

$\displaystyle f(z) = z + 2 z^2 + 3 z^3 + \dots = \sum_{n=1}^\infty n z^n$

and similarly the rotated Koebe function has the expansion

$\displaystyle f(z) = z + 2 e^{i\theta} z^2 + 3 e^{2i\theta} z^3 + \dots = \sum_{n=1}^\infty n e^{i(n-1)\theta} z^n.$
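
The expansion of the rotated Koebe function can be checked numerically. The following Python sketch (the function name is hypothetical) squares the geometric series for ${1/(1-e^{i\theta} z)}$ by a Cauchy product and compares the resulting coefficients with ${n e^{i(n-1)\theta}}$; in particular each coefficient has magnitude exactly ${n}$.

```python
import cmath

def rotated_koebe_coeffs(theta, n_terms):
    """Taylor coefficients a_1..a_{n_terms} of z / (1 - e^{i theta} z)^2.

    Uses 1/(1 - c z) = sum_k c^k z^k and squares it by Cauchy product.
    """
    c = cmath.exp(1j * theta)
    geom = [c ** k for k in range(n_terms)]
    # Cauchy product of the geometric series with itself gives 1/(1 - c z)^2.
    squared = [sum(geom[j] * geom[m - j] for j in range(m + 1))
               for m in range(n_terms)]
    # Multiplying by z shifts indices: squared[m] is the coefficient a_{m+1}.
    return squared

theta = 0.7
for n, a_n in enumerate(rotated_koebe_coeffs(theta, 8), start=1):
    expected = n * cmath.exp(1j * (n - 1) * theta)
    assert abs(a_n - expected) < 1e-9
    assert abs(abs(a_n) - n) < 1e-9  # the magnitude of a_n is exactly n
```

Setting `theta = 0` recovers the coefficients ${1, 2, 3, \dots}$ of the unrotated Koebe function.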

Intuitively, the Koebe function and its rotations should be the “largest” schlicht functions available. This is formalised by the famous Bieberbach conjecture, which asserts that for any schlicht function, the coefficients ${a_n}$ should obey the bound ${|a_n| \leq n}$ for all ${n}$. After a large number of partial results, this conjecture was eventually solved by de Branges; see for instance this survey of Korevaar or this survey of Koepf for a history.

It turns out that to resolve these sorts of questions, it is convenient to restrict attention to schlicht functions ${g: D(0,1) \rightarrow {\bf C}}$ that are odd, thus ${g(-z)=-g(z)}$ for all ${z}$, and the Taylor expansion now reads

$\displaystyle g(z) = b_1 z + b_3 z^3 + b_5 z^5 + \dots$

for some complex coefficients ${b_1,b_3,\dots}$ with ${b_1=1}$. One can transform a general schlicht function ${f: D(0,1) \rightarrow {\bf C}}$ to an odd schlicht function ${g: D(0,1) \rightarrow {\bf C}}$ by observing that the function ${f(z^2)/z^2: D(0,1) \rightarrow {\bf C}}$, after removing the singularity at zero, is a non-zero function that equals ${1}$ at the origin, and thus (as ${D(0,1)}$ is simply connected) has a unique holomorphic square root ${(f(z^2)/z^2)^{1/2}}$ that also equals ${1}$ at the origin. If one then sets

$\displaystyle g(z) := z (f(z^2)/z^2)^{1/2} \ \ \ \ \ (3)$

it is not difficult to verify that ${g}$ is an odd schlicht function which additionally obeys the equation

$\displaystyle f(z^2) = g(z)^2. \ \ \ \ \ (4)$

Conversely, given an odd schlicht function ${g}$, the formula (4) uniquely determines a schlicht function ${f}$.

For instance, if ${f}$ is the Koebe function (1), ${g}$ becomes

$\displaystyle g(z) = \frac{z}{1-z^2} = z + z^3 + z^5 + \dots, \ \ \ \ \ (5)$

which maps ${D(0,1)}$ to the complex numbers with two slits ${\{ \pm iy: y \geq 1/2 \}}$ removed, and if ${f}$ is the rotated Koebe function (2), ${g}$ becomes

$\displaystyle g(z) = \frac{z}{1- e^{i\theta} z^2} = z + e^{i\theta} z^3 + e^{2i\theta} z^5 + \dots. \ \ \ \ \ (6)$
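
At the level of Taylor coefficients, the passage from ${f}$ to ${g}$ in (3) is a power series square root. Here is a minimal Python sketch of this, assuming the coefficients ${a_1,\dots,a_N}$ of ${f}$ are supplied as a list; the function name `odd_schlicht_coeffs` is hypothetical, and the code manipulates formal power series only (it does not test univalence).

```python
def odd_schlicht_coeffs(a):
    """Given [a_1, ..., a_N] with a_1 = 1, return [b_1, b_3, ..., b_{2N-1}]
    for the odd function g with f(z^2) = g(z)^2.

    Writing g(z) = z * s(z^2), s is the power series square root of f(w)/w
    normalised by s(0) = 1, so b_{2m+1} is the w^m coefficient of s.
    """
    if a[0] != 1:
        raise ValueError("expected the schlicht normalisation a_1 = 1")
    c = [1]
    for m in range(1, len(a)):
        # Compare w^m coefficients in s(w)^2 = f(w)/w and solve for c_m.
        cross = sum(c[j] * c[m - j] for j in range(1, m))
        c.append((a[m] - cross) / 2)
    return c

koebe = [1, 2, 3, 4, 5, 6]
b = odd_schlicht_coeffs(koebe)
assert b == [1] * 6  # matches g(z) = z + z^3 + z^5 + ...

# Squaring g recovers f(z^2): recovered[n-1] is the z^{2n} coefficient of g^2.
recovered = [sum(b[j] * b[n - j] for j in range(n + 1)) for n in range(len(b))]
assert recovered == koebe
```

The recursion only ever divides by ${2 b_1 = 2}$, which reflects the fact that the square root is uniquely determined once its value at the origin is fixed.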

De Branges established the Bieberbach conjecture by first proving an analogous conjecture for odd schlicht functions known as Robertson’s conjecture. More precisely, we have

Theorem 1 (de Branges’ theorem) Let ${n \geq 1}$ be a natural number.

• (i) (Robertson conjecture) If ${g(z) = b_1 z + b_3 z^3 + b_5 z^5 + \dots}$ is an odd schlicht function, then

$\displaystyle \sum_{k=1}^n |b_{2k-1}|^2 \leq n.$

• (ii) (Bieberbach conjecture) If ${f(z) = a_1 z + a_2 z^2 + a_3 z^3 + \dots}$ is a schlicht function, then

$\displaystyle |a_n| \leq n.$

It is easy to see that the Robertson conjecture for a given value of ${n}$ implies the Bieberbach conjecture for the same value of ${n}$. Indeed, if ${f(z) = a_1 z + a_2 z^2 + a_3 z^3 + \dots}$ is schlicht, and ${g(z) = b_1 z + b_3 z^3 + b_5 z^5 + \dots}$ is the odd schlicht function given by (3), then from extracting the ${z^{2n}}$ coefficient of (4) we obtain a formula

$\displaystyle a_n = \sum_{j=1}^n b_{2j-1} b_{2(n+1-j)-1}$

for the coefficients of ${f}$ in terms of the coefficients of ${g}$. Applying the Cauchy-Schwarz inequality, we derive the Bieberbach conjecture for this value of ${n}$ from the Robertson conjecture for the same value of ${n}$. We remark that Littlewood and Paley had conjectured a stronger form ${|b_{2k-1}| \leq 1}$ of Robertson’s conjecture, but this was disproved for ${k=3}$ by Fekete and Szegö.
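
In more detail, the Cauchy-Schwarz step mentioned above can be written out as follows, using only the coefficient formula for ${a_n}$ and part (i) of Theorem 1:

$\displaystyle |a_n| \leq \sum_{j=1}^n |b_{2j-1}| |b_{2(n+1-j)-1}| \leq \left( \sum_{j=1}^n |b_{2j-1}|^2 \right)^{1/2} \left( \sum_{j=1}^n |b_{2(n+1-j)-1}|^2 \right)^{1/2} = \sum_{k=1}^n |b_{2k-1}|^2 \leq n,$

where the equality comes from the change of variables ${k = n+1-j}$ in the second factor, which shows that both factors are the square root of the same sum.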

To prove the Robertson and Bieberbach conjectures, one first takes a logarithm and deduces both conjectures from a similar conjecture about the Taylor coefficients of ${\log \frac{f(z)}{z}}$, known as the Milin conjecture. Next, one continuously enlarges the image ${f(D(0,1))}$ of the schlicht function to cover all of ${{\bf C}}$; done properly, this places the schlicht function ${f}$ as the initial function ${f = f_0}$ in a family ${(f_t)_{t \geq 0}}$ of univalent maps ${f_t: D(0,1) \rightarrow {\bf C}}$ known as a Loewner chain. The functions ${f_t}$ obey a useful differential equation known as the Loewner equation, that involves an unspecified forcing term ${\mu_t}$ (or ${\theta(t)}$, in the case that the image is a slit domain) coming from the boundary; this in turn gives useful differential equations for the Taylor coefficients of ${f(z)}$, ${g(z)}$, or ${\log \frac{f(z)}{z}}$. After some elementary calculus manipulations to “integrate” these equations, the Bieberbach, Robertson, and Milin conjectures are then reduced to establishing the non-negativity of a certain explicit hypergeometric function, which is non-trivial to prove (and will not be done here, except for small values of ${n}$) but for which several proofs exist in the literature.

The theory of Loewner chains subsequently became fundamental to a more recent topic in complex analysis, that of the Schramm-Loewner evolution (SLE), which is the focus of the next and final set of notes.

We now leave the topic of Riemann surfaces, and turn to the (loosely related) topic of conformal mapping (and quasiconformal mapping). Recall that a conformal map ${f: U \rightarrow V}$ from an open subset ${U}$ of the complex plane to another open set ${V}$ is a map that is holomorphic and bijective, which (by Rouché’s theorem) also forces the derivative of ${f}$ to be nowhere vanishing. We then say that the two open sets ${U,V}$ are conformally equivalent. From the Cauchy-Riemann equations we see that conformal maps are orientation-preserving and angle-preserving; from the Newton approximation ${f( z_0 + \Delta z) = f(z_0) + f'(z_0) \Delta z + O( |\Delta z|^2)}$ we see that they almost preserve small circles, indeed for ${\varepsilon}$ small the circle ${\{ z: |z-z_0| = \varepsilon\}}$ will approximately map to ${\{ w: |w - f(z_0)| = |f'(z_0)| \varepsilon \}}$.

Theorem 1 (Riemann mapping theorem) Let ${U}$ be a simply connected open subset of ${{\bf C}}$ that is not all of ${{\bf C}}$. Then ${U}$ is conformally equivalent to the unit disk ${D(0,1)}$.

This theorem was proven in these 246A lecture notes, using an argument of Koebe. At a very high level, one can sketch Koebe’s proof of the Riemann mapping theorem as follows: among all the injective holomorphic maps ${f: U \rightarrow D(0,1)}$ from ${U}$ to ${D(0,1)}$ that map some fixed point ${z_0 \in U}$ to ${0}$, pick one that maximises the magnitude ${|f'(z_0)|}$ of the derivative (ignoring for this discussion the issue of proving that a maximiser exists). If ${f(U)}$ avoids some point in ${D(0,1)}$, one can compose ${f}$ with various holomorphic maps and use Schwarz’s lemma and the chain rule to increase ${|f'(z_0)|}$ without destroying injectivity; see the previous lecture notes for details. The conformal map ${\phi: U \rightarrow D(0,1)}$ is unique up to Möbius automorphisms of the disk; one can fix the map by picking two distinct points ${z_0,z_1}$ in ${U}$, and requiring ${\phi(z_0)}$ to be zero and ${\phi(z_1)}$ to be positive real.

It is a beautiful observation of Thurston that the concept of a conformal mapping has a discrete counterpart, namely the mapping of one circle packing to another. Furthermore, one can run a version of Koebe’s argument (using now a discrete version of Perron’s method) to prove the Riemann mapping theorem through circle packings. In principle, this leads to a mostly elementary approach to conformal geometry, based on extremely classical mathematics that goes all the way back to Apollonius. However, in order to prove the basic existence and uniqueness theorems of circle packing, as well as the convergence to conformal maps in the continuous limit, it seems to be necessary (or at least highly convenient) to use much more modern machinery, including the theory of quasiconformal mapping, and also the Riemann mapping theorem itself (so in particular we are not structuring these notes to provide a completely independent proof of that theorem, though this may well be possible).

To make the above discussion more precise we need some notation.

Definition 2 (Circle packing) A (finite) circle packing is a finite collection ${(C_j)_{j \in J}}$ of circles ${C_j = \{ z \in {\bf C}: |z-z_j| = r_j\}}$ in the complex numbers indexed by some finite set ${J}$, whose interiors are all disjoint (but which are allowed to be tangent to each other), and whose union is connected. The nerve of a circle packing is the finite graph whose vertices ${\{z_j: j \in J \}}$ are the centres of the circle packing, with two such centres connected by an edge if the circles are tangent. (In these notes all graphs are undirected, finite and simple, unless otherwise specified.)

It is clear that the nerve of a circle packing is connected and planar, since one can draw the nerve by placing each vertex (tautologically) in its location in the complex plane, and drawing each edge by the line segment between the centres of the circles it connects (this line segment will pass through the point of tangency of the two circles). Later in these notes we will also have to consider some infinite circle packings, most notably the infinite regular hexagonal circle packing.
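
To make Definition 2 concrete, here is a small Python sketch that computes the nerve of a finite collection of circles and checks the two conditions of the definition (disjoint interiors, connected union). The function names and the floating point tolerance `tol` are our own choices for illustration; tangency is tested only up to that tolerance.

```python
from itertools import combinations
from math import sqrt

def nerve(circles, tol=1e-9):
    """Return the tangency edges of a finite circle packing.

    `circles` maps an index j to (z_j, r_j) with z_j complex and r_j > 0.
    Raises ValueError if two interiors overlap, i.e. |z_j - z_k| < r_j + r_k.
    """
    edges = set()
    for (j, (zj, rj)), (k, (zk, rk)) in combinations(circles.items(), 2):
        gap = abs(zj - zk) - (rj + rk)
        if gap < -tol:
            raise ValueError(f"circles {j} and {k} have overlapping interiors")
        if abs(gap) <= tol:
            edges.add((j, k))
    return edges

def is_connected(vertices, edges):
    """Depth-first search on the undirected graph (vertices, edges)."""
    vertices = list(vertices)
    if not vertices:
        return True
    adjacency = {v: set() for v in vertices}
    for j, k in edges:
        adjacency[j].add(k)
        adjacency[k].add(j)
    seen, stack = {vertices[0]}, [vertices[0]]
    while stack:
        for w in adjacency[stack.pop()] - seen:
            seen.add(w)
            stack.append(w)
    return len(seen) == len(vertices)

# Three mutually tangent unit circles, plus a smaller one tangent only to "a".
packing = {
    "a": (0 + 0j, 1.0),
    "b": (2 + 0j, 1.0),
    "c": (1 + sqrt(3) * 1j, 1.0),
    "d": (-1.5 + 0j, 0.5),
}
edges = nerve(packing)
assert edges == {("a", "b"), ("a", "c"), ("b", "c"), ("a", "d")}
assert is_connected(packing, edges)
```

In this example the nerve is a triangle with one pendant edge, which is visibly connected and planar.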

The first basic theorem in the subject is the following converse statement:

Theorem 3 (Circle packing theorem) Every connected planar graph is the nerve of a circle packing.

Of course, there can be multiple circle packings associated to a given connected planar graph; indeed, since reflections across a line and Möbius transformations map circles to circles (or lines), they will map circle packings to circle packings (unless one or more of the circles is sent to a line). It turns out that once one adds enough edges to the planar graph, the circle packing is otherwise rigid:

Theorem 4 (Koebe-Andreev-Thurston theorem) If a connected planar graph is maximal (i.e., no further edge can be added to it without destroying planarity), then the circle packing given by the above theorem is unique up to reflections and Möbius transformations.

Exercise 5 Let ${G}$ be a connected planar graph with ${n \geq 3}$ vertices. Show that the following are equivalent:

• (i) ${G}$ is a maximal planar graph.
• (ii) ${G}$ has ${3n-6}$ edges.
• (iii) Every drawing ${D}$ of ${G}$ divides the plane into faces that have three edges each. (This includes one unbounded face.)
• (iv) At least one drawing ${D}$ of ${G}$ divides the plane into faces that have three edges each.

(Hint: use Euler’s formula ${V-E+F=2}$, where ${F}$ is the number of faces including the unbounded face.)

Thurston conjectured that circle packings can be used to approximate the conformal map arising in the Riemann mapping theorem. Here is an informal statement:

Conjecture 6 (Informal Thurston conjecture) Let ${U}$ be a simply connected domain, with two distinct points ${z_0,z_1}$. Let ${\phi: U \rightarrow D(0,1)}$ be the conformal map from ${U}$ to ${D(0,1)}$ that maps ${z_0}$ to the origin and ${z_1}$ to a positive real. For any small ${\varepsilon>0}$, let ${{\mathcal C}_\varepsilon}$ be the portion of the regular hexagonal circle packing by circles of radius ${\varepsilon}$ that are contained in ${U}$, and let ${{\mathcal C}'_\varepsilon}$ be a circle packing of ${D(0,1)}$ with the same nerve (up to isomorphism) as ${{\mathcal C}_\varepsilon}$, with all “boundary circles” tangent to ${D(0,1)}$, giving rise to an “approximate map” ${\phi_\varepsilon: U_\varepsilon \rightarrow D(0,1)}$ defined on the subset ${U_\varepsilon}$ of ${U}$ consisting of the circles of ${{\mathcal C}_\varepsilon}$, their interiors, and the interstitial regions between triples of mutually tangent circles. Normalise this map so that ${\phi_\varepsilon(z_0)}$ is zero and ${\phi_\varepsilon(z_1)}$ is a positive real. Then ${\phi_\varepsilon}$ converges to ${\phi}$ as ${\varepsilon \rightarrow 0}$.

A rigorous version of this conjecture was proven by Rodin and Sullivan. Besides some elementary geometric lemmas (regarding the relative sizes of various configurations of tangent circles), the main ingredients are a rigidity result for the regular hexagonal circle packing, and the theory of quasiconformal maps. Quasiconformal maps are what seem on the surface to be a very broad generalisation of the notion of a conformal map. Informally, conformal maps take infinitesimal circles to infinitesimal circles, whereas quasiconformal maps take infinitesimal circles to infinitesimal ellipses of bounded eccentricity. In terms of Wirtinger derivatives, conformal maps obey the Cauchy-Riemann equation ${\frac{\partial \phi}{\partial \overline{z}} = 0}$, while (sufficiently smooth) quasiconformal maps only obey an inequality ${|\frac{\partial \phi}{\partial \overline{z}}| \leq \frac{K-1}{K+1} |\frac{\partial \phi}{\partial z}|}$. As such, quasiconformal maps are considerably more plentiful than conformal maps, and in particular it is possible to create piecewise smooth quasiconformal maps by gluing together various simple maps such as affine maps or Möbius transformations; such piecewise maps will naturally arise when trying to rigorously build the map ${\phi_\varepsilon}$ alluded to in the above conjecture. On the other hand, it turns out that quasiconformal maps still have many vestiges of the rigidity properties enjoyed by conformal maps; for instance, there are quasiconformal analogues of fundamental theorems in conformal mapping such as the Schwarz reflection principle, Liouville’s theorem, or Hurwitz’s theorem. Among other things, these quasiconformal rigidity theorems allow one to create conformal maps from the limit of quasiconformal maps in many circumstances, and this will be how the Thurston conjecture will be proven. 
A key technical tool in establishing these sorts of rigidity theorems will be the theory of an important quasiconformal (quasi-)invariant, the conformal modulus (or, equivalently, the extremal length, which is the reciprocal of the modulus).
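
The simplest examples of quasiconformal maps are the orientation-preserving real-linear maps, for which the Wirtinger derivatives are constant. The following Python sketch (with hypothetical function names) computes, for the linear map ${(x,y) \mapsto (px+qy, rx+sy)}$, the smallest ${K}$ for which the above inequality holds; this is a sketch for linear maps only, not a general quasiconformality test.

```python
def wirtinger_of_linear_map(p, q, r, s):
    """Wirtinger derivatives (df/dz, df/dzbar) of the real-linear map
    (x, y) -> (p x + q y, r x + s y), viewed as f(z) = a z + b zbar."""
    a = complex(p + s, r - q) / 2
    b = complex(p - s, r + q) / 2
    return a, b

def dilatation(p, q, r, s):
    """Smallest K >= 1 with |df/dzbar| <= (K-1)/(K+1) |df/dz| for this map."""
    a, b = wirtinger_of_linear_map(p, q, r, s)
    if abs(a) <= abs(b):
        # |a|^2 - |b|^2 is the determinant, so this case has determinant <= 0.
        raise ValueError("map is not orientation-preserving and invertible")
    return (abs(a) + abs(b)) / (abs(a) - abs(b))

assert dilatation(1, 0, 0, 1) == 1.0               # identity: conformal
assert abs(dilatation(0, -1, 1, 0) - 1.0) < 1e-12  # rotation by 90 degrees
assert abs(dilatation(3, 0, 0, 1) - 3.0) < 1e-12   # stretch x by a factor of 3
```

For the stretch ${(x,y) \mapsto (3x,y)}$ the unit circle is sent to an ellipse whose axes have ratio ${3}$, in agreement with the value ${K=3}$ returned above.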

The fundamental objects of study in real differential geometry are the real manifolds: Hausdorff topological spaces ${M = M^n}$ that locally look like open subsets of a Euclidean space ${{\bf R}^n}$, and which can be equipped with an atlas ${(\phi_\alpha: U_\alpha \rightarrow V_\alpha)_{\alpha \in A}}$ of coordinate charts ${\phi_\alpha: U_\alpha \rightarrow V_\alpha}$ from open subsets ${U_\alpha}$ covering ${M}$ to open subsets ${V_\alpha}$ in ${{\bf R}^n}$, which are homeomorphisms; in particular, the transition maps ${\tau_{\alpha,\beta}: \phi_\alpha( U_\alpha \cap U_\beta ) \rightarrow \phi_\beta( U_\alpha \cap U_\beta )}$ defined by ${\tau_{\alpha,\beta} := \phi_\beta \circ \phi_\alpha^{-1}}$ are all continuous. (It is also common to impose the requirement that the manifold ${M}$ be second countable, though this will not be important for the current discussion.) A smooth real manifold is a real manifold in which the transition maps are all smooth.

In a similar fashion, the fundamental objects of study in complex differential geometry are the complex manifolds, in which the model space is ${{\bf C}^n}$ rather than ${{\bf R}^n}$, and the transition maps ${\tau_{\alpha,\beta}}$ are required to be holomorphic (and not merely smooth or continuous). In the real case, the one-dimensional manifolds (curves) are quite simple to understand, particularly if one requires the manifold to be connected; for instance, all compact connected one-dimensional real manifolds are homeomorphic to the unit circle (why?). However, in the complex case, the connected one-dimensional manifolds – the ones that look locally like subsets of ${{\bf C}}$ – are much richer, and are known as Riemann surfaces. For the sake of completeness we give the (somewhat lengthy) formal definition:

Definition 1 (Riemann surface) If ${M}$ is a Hausdorff connected topological space, a (one-dimensional complex) atlas is a collection ${(\phi_\alpha: U_\alpha \rightarrow V_\alpha)_{\alpha \in A}}$ of homeomorphisms from open subsets ${(U_\alpha)_{\alpha \in A}}$ of ${M}$ that cover ${M}$ to open subsets ${V_\alpha}$ of the complex numbers ${{\bf C}}$, such that the transition maps ${\tau_{\alpha,\beta}: \phi_\alpha( U_\alpha \cap U_\beta ) \rightarrow \phi_\beta( U_\alpha \cap U_\beta )}$ defined by ${\tau_{\alpha,\beta} := \phi_\beta \circ \phi_\alpha^{-1}}$ are all holomorphic. Here ${A}$ is an arbitrary index set. Two atlases ${(\phi_\alpha: U_\alpha \rightarrow V_\alpha)_{\alpha \in A}}$, ${(\phi'_\beta: U'_\beta \rightarrow V'_\beta)_{\beta \in B}}$ on ${M}$ are said to be equivalent if their union is also an atlas, thus the transition maps ${\phi'_\beta \circ \phi_\alpha^{-1}: \phi_\alpha(U_\alpha \cap U'_\beta) \rightarrow \phi'_\beta(U_\alpha \cap U'_\beta)}$ and their inverses are all holomorphic. A Riemann surface is a Hausdorff connected topological space ${M}$ equipped with an equivalence class of one-dimensional complex atlases.

A map ${f: M \rightarrow M'}$ from one Riemann surface ${M}$ to another ${M'}$ is holomorphic if the maps ${\phi'_\beta \circ f \circ \phi_\alpha^{-1}: \phi_\alpha(U_\alpha \cap f^{-1}(U'_\beta)) \rightarrow {\bf C}}$ are holomorphic for any charts ${\phi_\alpha: U_\alpha \rightarrow V_\alpha}$, ${\phi'_\beta: U'_\beta \rightarrow V'_\beta}$ of an atlas of ${M}$ and ${M'}$ respectively; it is not hard to see that this definition does not depend on the choice of atlas. It is also clear that the composition of two holomorphic maps is holomorphic (and in fact the class of Riemann surfaces with their holomorphic maps forms a category).

Here are some basic examples of Riemann surfaces.

Example 2 (Quotients of ${{\bf C}}$) The complex numbers ${{\bf C}}$ clearly form a Riemann surface (using the identity map ${\phi: {\bf C} \rightarrow {\bf C}}$ as the single chart for an atlas). Of course, maps ${f: {\bf C} \rightarrow {\bf C}}$ that are holomorphic in the usual sense will also be holomorphic in the sense of the above definition, and vice versa, so the notion of holomorphicity for Riemann surfaces is compatible with that of holomorphicity for complex maps. More generally, given any discrete additive subgroup ${\Lambda}$ of ${{\bf C}}$, the quotient ${{\bf C}/\Lambda}$ is a Riemann surface. There are an infinite number of possible atlases to use here; one such is to pick a sufficiently small neighbourhood ${U}$ of the origin in ${{\bf C}}$ and take the atlas ${(\phi_\alpha: U_\alpha \rightarrow U)_{\alpha \in {\bf C}/\Lambda}}$ where ${U_\alpha := \alpha+U}$ and ${\phi_\alpha(\alpha+z) := z}$ for all ${z \in U}$. In particular, given any non-real complex number ${\omega}$, the complex torus ${{\bf C} / \langle 1, \omega \rangle}$ formed by quotienting ${{\bf C}}$ by the lattice ${\langle 1, \omega \rangle := \{ n + m \omega: n,m \in {\bf Z}\}}$ is a Riemann surface.

Example 3 Any open connected subset ${U}$ of ${{\bf C}}$ is a Riemann surface. By the Riemann mapping theorem, all simply connected open ${U \subset {\bf C}}$, other than ${{\bf C}}$ itself, are isomorphic (as Riemann surfaces) to the unit disk (or, equivalently, to the upper half-plane).

Example 4 (Riemann sphere) The Riemann sphere ${{\bf C} \cup \{\infty\}}$, as a topological manifold, is the one-point compactification of ${{\bf C}}$. Topologically, this is a sphere and is in particular connected. One can cover the Riemann sphere by the two open sets ${U_1 := {\bf C}}$ and ${U_2 := {\bf C} \cup \{\infty\} \backslash \{0\}}$, and give these two open sets the charts ${\phi_1: U_1 \rightarrow {\bf C}}$ and ${\phi_2: U_2 \rightarrow {\bf C}}$ defined by ${\phi_1(z) := z}$ for ${z \in {\bf C}}$, ${\phi_2(z) := 1/z}$ for ${z \in {\bf C} \backslash \{0\}}$, and ${\phi_2(\infty) := 0}$. This is a complex atlas since the map ${z \mapsto 1/z}$ is holomorphic on ${{\bf C} \backslash \{0\}}$.

An alternate way of viewing the Riemann sphere is as the projective line ${\mathbf{CP}^1}$. Topologically, this is the punctured complex plane ${{\bf C}^2 \backslash \{(0,0)\}}$ quotiented out by non-zero complex dilations, thus elements of this space are equivalence classes ${[z,w] := \{ (\lambda z, \lambda w): \lambda \in {\bf C} \backslash \{0\}\}}$ with the usual quotient topology. One can cover this space by two open sets ${U_1 := \{ [z,1]: z \in {\bf C} \}}$ and ${U_2 := \{ [1,w]: w \in {\bf C} \}}$ and give these two open sets the charts ${\phi_1: U_1 \rightarrow {\bf C}}$ and ${\phi_2: U_2 \rightarrow {\bf C}}$ defined by ${\phi_1([z,1]) := z}$ for ${z \in {\bf C}}$, ${\phi_2([1,w]) := w}$ for ${w \in {\bf C}}$. This is a complex atlas, basically because ${[z,1] = [1,1/z]}$ for ${z \in {\bf C} \backslash \{0\}}$ and ${1/z}$ is holomorphic on ${{\bf C} \backslash \{0\}}$.

Exercise 5 Verify that the Riemann sphere is isomorphic (as a Riemann surface) to the projective line.

Example 6 (Smooth algebraic plane curves) Let ${P(z_1,z_2,z_3)}$ be a complex polynomial in three variables which is homogeneous of some degree ${d \geq 1}$, thus

$\displaystyle P( \lambda z_1, \lambda z_2, \lambda z_3) = \lambda^d P( z_1, z_2, z_3). \ \ \ \ \ (1)$

Define the complex projective plane ${\mathbf{CP}^2}$ to be the punctured space ${{\bf C}^3 \backslash \{0\}}$ quotiented out by non-zero complex dilations, with the usual quotient topology. (There is another topology one can place here that is of fundamental importance in algebraic geometry, namely the Zariski topology, but we will ignore this topology here.) This is a compact space, whose elements are equivalence classes ${[z_1,z_2,z_3] := \{ (\lambda z_1, \lambda z_2, \lambda z_3): \lambda \in {\bf C} \backslash \{0\} \}}$. Inside this plane we can define the (projective, degree ${d}$) algebraic curve

$\displaystyle Z(P) := \{ [z_1,z_2,z_3] \in \mathbf{CP}^2: P(z_1,z_2,z_3) = 0 \};$

this is well defined thanks to (1). It is easy to verify that ${Z(P)}$ is a closed subset of ${\mathbf{CP}^2}$ and hence compact; it is non-empty thanks to the fundamental theorem of algebra.

Suppose that ${P}$ is irreducible, which means that it is not the product of polynomials of smaller degree. As we shall show in the appendix, this makes the algebraic curve connected. (Actually, algebraic curves remain connected even in the reducible case, thanks to Bezout’s theorem, but we will not prove that theorem here.) We will in fact make the stronger nonsingularity hypothesis: there is no triple ${(z_1,z_2,z_3) \in {\bf C}^3 \backslash \{(0,0,0)\}}$ such that the four numbers ${P(z_1,z_2,z_3), \frac{\partial}{\partial z_j} P(z_1,z_2,z_3)}$ simultaneously vanish for ${j=1,2,3}$. (This looks like four constraints, but is in fact essentially just three, due to the Euler identity

$\displaystyle \sum_{j=1}^3 z_j \frac{\partial}{\partial z_j} P(z_1,z_2,z_3) = d P(z_1,z_2,z_3)$

that arises from differentiating (1) in ${\lambda}$. The fact that nonsingularity implies irreducibility is another consequence of Bezout’s theorem, which is not proven here.) For instance, the polynomial ${z_1^2 z_3 - z_2^3}$ is irreducible but singular (there is a “cusp” singularity at ${[0,0,1]}$). With this hypothesis, we call the curve ${Z(P)}$ smooth.
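
Both the Euler identity and the cusp example can be verified by direct computation. The Python sketch below represents a polynomial in three variables as a dictionary from exponent triples to coefficients (a representation chosen here purely for illustration, as are the function names), checks the Euler identity for ${z_1^2 z_3 - z_2^3}$ at a test point, and confirms that this polynomial and all three of its partial derivatives vanish at ${(0,0,1)}$.

```python
def evaluate(poly, point):
    """Evaluate a polynomial given as {(e1, e2, e3): coeff} at (z1, z2, z3)."""
    total = 0
    for exps, coeff in poly.items():
        term = coeff
        for z, e in zip(point, exps):
            term *= z ** e
        total += term
    return total

def partial(poly, j):
    """Formal partial derivative in the j-th variable (j = 0, 1, 2)."""
    result = {}
    for exps, coeff in poly.items():
        if exps[j] == 0:
            continue
        new_exps = tuple(e - 1 if i == j else e for i, e in enumerate(exps))
        result[new_exps] = result.get(new_exps, 0) + coeff * exps[j]
    return result

# P = z1^2 z3 - z2^3, homogeneous of degree d = 3.
cusp = {(2, 0, 1): 1, (0, 3, 0): -1}
d = 3
gradient = [partial(cusp, j) for j in range(3)]

# Euler identity at an arbitrary integer test point (exact arithmetic).
pt = (2, -3, 5)
assert sum(z * evaluate(g, pt) for z, g in zip(pt, gradient)) == d * evaluate(cusp, pt)

# At (0, 0, 1) the polynomial and all three partials vanish: a singular point.
singular = (0, 0, 1)
assert evaluate(cusp, singular) == 0
assert all(evaluate(g, singular) == 0 for g in gradient)
```

By contrast, at the point ${(1,1,1)}$ of the same curve the partial derivative in ${z_1}$ equals ${2}$, so the curve is nonsingular there.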

Now suppose ${[z_1,z_2,z_3]}$ is a point in ${Z(P)}$; without loss of generality we may take ${z_3}$ non-zero, and then we can normalise ${z_3=1}$. Now one can think of ${P(z_1,z_2,1)}$ as an inhomogeneous polynomial in just two variables ${z_1,z_2}$, and by the nonsingularity hypothesis (and the Euler identity) we see that the gradient ${(\frac{\partial}{\partial z_1} P(z_1,z_2,1), \frac{\partial}{\partial z_2} P(z_1,z_2,1))}$ is non-zero whenever ${P(z_1,z_2,1)=0}$. By the (complexified) implicit function theorem, this ensures that the affine algebraic curve

$\displaystyle Z(P)_{aff} := \{ (z_1,z_2) \in {\bf C}^2: P(z_1,z_2,1) = 0 \}$

is a Riemann surface in a neighbourhood of ${(z_1,z_2)}$; we leave this as an exercise. This can be used to give a coordinate chart for ${Z(P)}$ in a neighbourhood of ${[z_1,z_2,z_3]}$ when ${z_3 \neq 0}$. One proceeds similarly when ${z_1}$ or ${z_2}$ is non-zero. This can be shown to give an atlas on ${Z(P)}$, which (assuming the connectedness claim that we will prove later) gives ${Z(P)}$ the structure of a Riemann surface.

Exercise 7 State and prove a complex version of the implicit function theorem that justifies the above claim that the charts in the above example form an atlas, and an algebraic curve associated to a non-singular polynomial is a Riemann surface.

Exercise 8

• (i) Show that all (irreducible plane projective) algebraic curves of degree ${1}$ are isomorphic to the Riemann sphere. (Hint: reduce to an explicit linear polynomial such as ${z_3}$.)
• (ii) Show that all (irreducible plane projective) algebraic curves of degree ${2}$ are isomorphic to the Riemann sphere. (Hint: to reduce computation, first use some linear algebra to reduce the homogeneous quadratic polynomial to a standard form, such as ${z_1^2+z_2^2+z_3^2}$ or ${z_2 z_3 - z_1^2}$.)

Exercise 9 If ${a,b}$ are complex numbers, show that the projective cubic curve

$\displaystyle \{ [z_1, z_2, z_3]: z_2^2 z_3 = z_1^3 + a z_1 z_3^2 + b z_3^3 \}$

is nonsingular if and only if the discriminant ${-16 (4a^3 + 27b^2)}$ is non-zero. (When this occurs, the curve is called an elliptic curve (in Weierstrass form), which is a fundamentally important example of a Riemann surface in many areas of mathematics, and number theory in particular. One can also define the discriminant for polynomials of higher degree, but we will not do so here.)

A recurring theme in mathematics is that an object ${X}$ is often best studied by understanding spaces of “good” functions on ${X}$. In complex analysis, there are two basic types of good functions:

Definition 10 Let ${X}$ be a Riemann surface. A holomorphic function on ${X}$ is a holomorphic map from ${X}$ to ${{\bf C}}$; the space of all such functions will be denoted ${{\mathcal O}(X)}$. A meromorphic function on ${X}$ is a holomorphic map from ${X}$ to the Riemann sphere ${{\bf C} \cup \{\infty\}}$, that is not identically equal to ${\infty}$; the space of all such functions will be denoted ${M(X)}$.

One can also define holomorphicity and meromorphicity in terms of charts: a function ${f: X \rightarrow {\bf C}}$ is holomorphic if and only if, for any chart ${\phi_\alpha: U_\alpha \rightarrow {\bf C}}$, the map ${f \circ \phi^{-1}_\alpha: \phi_\alpha(U_\alpha) \rightarrow {\bf C}}$ is holomorphic in the usual complex analysis sense; similarly, a function ${f: X \rightarrow {\bf C} \cup \{\infty\}}$ is meromorphic if and only if the preimage ${f^{-1}(\{\infty\})}$ is discrete (otherwise, by analytic continuation and the connectedness of ${X}$, ${f}$ will be identically equal to ${\infty}$) and for any chart ${\phi_\alpha: U_\alpha \rightarrow {\bf C}}$, the map ${f \circ \phi_\alpha^{-1}: \phi_\alpha(U_\alpha) \rightarrow {\bf C} \cup \{\infty\}}$ becomes a meromorphic function in the usual complex analysis sense, after removing the discrete set of complex numbers where this map is infinite. One consequence of this alternate definition is that the space ${{\mathcal O}(X)}$ of holomorphic functions is a commutative complex algebra (a complex vector space closed under pointwise multiplication), while the space ${M(X)}$ of meromorphic functions is a complex field (a commutative complex algebra where every non-zero element has an inverse). Another consequence is that one can define the notion of a zero of given order ${k}$, or a pole of order ${k}$, for a holomorphic or meromorphic function, by composing with a chart map and using the usual complex analysis notions there, noting (from the holomorphicity of transition maps and their inverses) that this does not depend on the choice of chart. (However, one cannot similarly define the residue of a meromorphic function on ${X}$ this way, as the residue turns out to be chart-dependent thanks to the chain rule. Residues should instead be applied to meromorphic ${1}$-forms, a concept we will introduce later.) A third consequence is analytic continuation: if two holomorphic or meromorphic functions on ${X}$ agree on a non-empty open set, then they agree everywhere.

On the complex numbers ${{\bf C}}$, there are of course many holomorphic functions and meromorphic functions; for instance any power series with an infinite radius of convergence will give a holomorphic function, and the quotient of any two such functions (with non-zero denominator) will give a meromorphic function. Furthermore, we have extremely wide latitude in how to specify the zeroes of the holomorphic function, or the zeroes and poles of the meromorphic function, thanks to tools such as the Weierstrass factorisation theorem or the Mittag-Leffler theorem (covered in previous quarters).

It turns out, however, that the situation changes dramatically when the Riemann surface ${X}$ is compact, with the holomorphic and meromorphic functions becoming much more rigid. First of all, compactness eliminates all holomorphic functions except for the constants:

Lemma 11 Let ${f \in \mathcal{O}(X)}$ be a holomorphic function on a compact Riemann surface ${X}$. Then ${f}$ is constant.

This result should be seen as a close sibling of Liouville’s theorem that all bounded entire functions are constant. (Indeed, in the case of a complex torus, this lemma is a corollary of Liouville’s theorem.)

Proof: As ${f}$ is continuous and ${X}$ is compact, ${|f|}$ must attain a maximum at some point ${z_0 \in X}$. Working in a chart around ${z_0}$ and applying the maximum principle, we conclude that ${f}$ is constant in a neighbourhood of ${z_0}$, and hence is constant everywhere by analytic continuation. $\Box$

This dramatically cuts down the number of possible meromorphic functions – indeed, for an abstract Riemann surface, it is not immediately obvious that there are any non-constant meromorphic functions at all! As the poles are isolated and the surface is compact, a meromorphic function can only have finitely many poles, and if one prescribes the location of the poles and the maximum order at each pole, then we shall see that the space of meromorphic functions is now finite dimensional. The precise dimensions of these spaces are in fact rather interesting, and obey a basic duality law known as the Riemann-Roch theorem. We will give a mostly self-contained proof of the Riemann-Roch theorem in these notes, omitting only some facts about genus and Euler characteristic, as well as construction of certain meromorphic ${1}$-forms (also known as Abelian differentials).

Next quarter (starting Monday, April 2) I will be teaching Math 246C (complex analysis) here at UCLA.  This is the third in a three-series graduate course on complex analysis; a few years ago I taught the first course in this series (246A), so this course can be thought of in some sense as a sequel to that one (and would certainly assume knowledge of the material in that course as a prerequisite), although it also assumes knowledge of material from the second course 246B (which covers such topics as Weierstrass factorization and the theory of harmonic functions).

246C is primarily a topics course, and tends to be a somewhat miscellaneous collection of complex analysis subjects that were not covered in the previous two installments of the series.  The initial topics I have in mind to cover are

• The Riemann-Roch theorem;
• Circle packings;
• The Bieberbach conjecture (proven by de Branges); and
• the Schramm-Loewner equation (SLE).
This list is however subject to change (it is the first time I will have taught on any of these topics, and I am not yet certain on the most logical way to arrange them; also I am not completely certain that I will be able to cover all the above topics in ten weeks). I welcome reference recommendations and other suggestions from readers who have taught on one or more of these topics.

As usual, I will be posting lecture notes on this blog as the course progresses.

[Update: Mar 13: removed elliptic functions, as I have just learned that this was already covered in the prior 246B course.]

Suppose one has a bounded sequence ${(a_n)_{n=1}^\infty = (a_1, a_2, \dots)}$ of real numbers. What kinds of limits can one form from this sequence?

Of course, we have the usual notion of limit ${\lim_{n \rightarrow \infty} a_n}$, which in this post I will refer to as the classical limit to distinguish from the other limits discussed in this post. The classical limit, if it exists, is the unique real number ${L}$ such that for every ${\varepsilon>0}$, one has ${|a_n-L| \leq \varepsilon}$ for all sufficiently large ${n}$. We say that a sequence is (classically) convergent if its classical limit exists. The classical limit obeys many useful limit laws when applied to classically convergent sequences. Firstly, it is linear: if ${(a_n)_{n=1}^\infty}$ and ${(b_n)_{n=1}^\infty}$ are classically convergent sequences, then ${(a_n+b_n)_{n=1}^\infty}$ is also classically convergent with

$\displaystyle \lim_{n \rightarrow \infty} (a_n + b_n) = (\lim_{n \rightarrow \infty} a_n) + (\lim_{n \rightarrow \infty} b_n) \ \ \ \ \ (1)$

and similarly for any scalar ${c}$, ${(ca_n)_{n=1}^\infty}$ is classically convergent with

$\displaystyle \lim_{n \rightarrow \infty} (ca_n) = c \lim_{n \rightarrow \infty} a_n. \ \ \ \ \ (2)$

It is also an algebra homomorphism: ${(a_n b_n)_{n=1}^\infty}$ is also classically convergent with

$\displaystyle \lim_{n \rightarrow \infty} (a_n b_n) = (\lim_{n \rightarrow \infty} a_n) (\lim_{n \rightarrow \infty} b_n). \ \ \ \ \ (3)$

We also have shift invariance: if ${(a_n)_{n=1}^\infty}$ is classically convergent, then so is ${(a_{n+1})_{n=1}^\infty}$ with

$\displaystyle \lim_{n \rightarrow \infty} a_{n+1} = \lim_{n \rightarrow \infty} a_n \ \ \ \ \ (4)$

and more generally in fact for any injection ${\phi: {\bf N} \rightarrow {\bf N}}$, ${(a_{\phi(n)})_{n=1}^\infty}$ is classically convergent with

$\displaystyle \lim_{n \rightarrow \infty} a_{\phi(n)} = \lim_{n \rightarrow \infty} a_n. \ \ \ \ \ (5)$

The classical limit of a sequence is unchanged if one modifies any finite number of elements of the sequence. Finally, we have boundedness: for any classically convergent sequence ${(a_n)_{n=1}^\infty}$, one has

$\displaystyle \inf_n a_n \leq \lim_{n \rightarrow \infty} a_n \leq \sup_n a_n. \ \ \ \ \ (6)$

One can in fact show without much difficulty that these laws uniquely determine the classical limit functional on convergent sequences.

One would like to extend the classical limit notion to more general bounded sequences; however, when doing so one must give up one or more of the desirable limit laws that were listed above. Consider for instance the sequence ${a_n = (-1)^n}$. On the one hand, one has ${a_n^2 = 1}$ for all ${n}$, so if one wishes to retain the homomorphism property (3), any “limit” of this sequence ${a_n}$ would have to necessarily square to ${1}$, that is to say it must equal ${+1}$ or ${-1}$. On the other hand, if one wished to retain the shift invariance property (4) as well as the homogeneity property (2), any “limit” of this sequence would have to equal its own negation and thus be zero.
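
In symbols, the second half of this argument can be sketched as follows, writing ${L}$ for the hypothetical “limit” assigned to ${a_n = (-1)^n}$ and ${\mathrm{Lim}}$ for the hypothetical extended limit (both notations are introduced here only for illustration). Since ${a_{n+1} = -a_n}$ for all ${n}$, shift invariance (4) followed by homogeneity (2) with ${c=-1}$ gives

$\displaystyle L = \mathrm{Lim}_{n \rightarrow \infty} a_{n+1} = \mathrm{Lim}_{n \rightarrow \infty} (-a_n) = -L,$

so that ${L = 0}$. On the other hand, the homomorphism property (3) applied to ${a_n^2 = 1}$ would force ${L^2 = 1}$, and these two conclusions are incompatible.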

Nevertheless there are a number of useful generalisations and variants of the classical limit concept for non-convergent sequences that obey a significant portion of the above limit laws. For instance, we have the limit superior

$\displaystyle \limsup_{n \rightarrow \infty} a_n := \inf_N \sup_{n \geq N} a_n$

and limit inferior

$\displaystyle \liminf_{n \rightarrow \infty} a_n := \sup_N \inf_{n \geq N} a_n$

which are well-defined real numbers for any bounded sequence ${(a_n)_{n=1}^\infty}$; they agree with the classical limit when the sequence is convergent, and differ from each other otherwise. They enjoy the shift-invariance property (4), and the boundedness property (6), but do not in general obey the homomorphism property (3) or the linearity property (1); indeed, we only have the subadditivity property

$\displaystyle \limsup_{n \rightarrow \infty} (a_n + b_n) \leq (\limsup_{n \rightarrow \infty} a_n) + (\limsup_{n \rightarrow \infty} b_n)$

for the limit superior, and the superadditivity property

$\displaystyle \liminf_{n \rightarrow \infty} (a_n + b_n) \geq (\liminf_{n \rightarrow \infty} a_n) + (\liminf_{n \rightarrow \infty} b_n)$

for the limit inferior. The homogeneity property (2) is only obeyed by the limits superior and inferior for non-negative ${c}$; for negative ${c}$, one must have the limit inferior on one side of (2) and the limit superior on the other, thus for instance

$\displaystyle \limsup_{n \rightarrow \infty} (-a_n) = - \liminf_{n \rightarrow \infty} a_n.$
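
To see that the subadditivity inequality can be strict, here is a minimal Python sketch using the pair ${a_n = (-1)^n}$, ${b_n = (-1)^{n+1}}$. The helper names `tail_sup` and `tail_inf` and the window `START, STOP` are assumptions made for this illustration. A finite window cannot compute a limit superior in general; it is exact here only because both sequences are periodic.

```python
def tail_sup(seq, start, stop):
    """Finite stand-in for sup_{n >= start} seq(n), taken over start <= n < stop."""
    return max(seq(n) for n in range(start, stop))


def tail_inf(seq, start, stop):
    """Finite stand-in for inf_{n >= start} seq(n), taken over start <= n < stop."""
    return min(seq(n) for n in range(start, stop))


def a(n):
    return (-1) ** n


def b(n):
    return (-1) ** (n + 1)


# Hypothetical window; any window of length >= 2 works for these periodic sequences.
START, STOP = 1000, 2000

limsup_a = tail_sup(a, START, STOP)
limsup_b = tail_sup(b, START, STOP)
limsup_sum = tail_sup(lambda n: a(n) + b(n), START, STOP)
limsup_neg_a = tail_sup(lambda n: -a(n), START, STOP)
liminf_a = tail_inf(a, START, STOP)

print(limsup_sum, "<=", limsup_a + limsup_b)  # 0 <= 2: strict subadditivity
print(limsup_neg_a, "==", -liminf_a)          # 1 == 1: limsup(-a) = -liminf(a)
```

Here ${a_n + b_n = 0}$ for every ${n}$, so the limit superior of the sum is ${0}$ while the sum of the limits superior is ${2}$.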

The limit superior and limit inferior are examples of limit points of the sequence, which can for instance be defined as points that are limits of at least one subsequence of the original sequence. Indeed, the limit superior is always the largest limit point of the sequence, and the limit inferior is always the smallest limit point. However, limit points can be highly non-unique (indeed they are unique if and only if the sequence is classically convergent), and so it is difficult to sensibly interpret most of the usual limit laws in this setting, with the exception of the homogeneity property (2) and the boundedness property (6) that are easy to state for limit points.

Another notion of limit is the Césaro limit

$\displaystyle \mathrm{C}\!-\!\lim_{n \rightarrow \infty} a_n := \lim_{N \rightarrow \infty} \frac{1}{N} \sum_{n=1}^N a_n;$

if this limit exists, we say that the sequence is Césaro convergent. If the sequence ${(a_n)_{n=1}^\infty}$ already has a classical limit, then it also has a Césaro limit that agrees with the classical limit; but there are additional sequences that have a Césaro limit but not a classical one. For instance, the non-classically convergent sequence ${a_n= (-1)^n}$ discussed above is Césaro convergent, with a Césaro limit of ${0}$. However, there are still bounded sequences that do not have a Césaro limit, such as ${a_n := \sin( \log n )}$ (exercise!). The Césaro limit is linear, bounded, and shift invariant, but is not an algebra homomorphism and does not obey the rearrangement property (5).
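
The two examples above can be explored numerically. The following is a minimal Python sketch; the function names and the sample values of ${N}$ are choices made for this illustration, and a finite computation can only suggest, not prove, convergence or divergence.

```python
import math


def cesaro_mean(seq, N):
    """Return the N-th Cesaro mean (1/N) * sum_{n=1}^N seq(n)."""
    return sum(seq(n) for n in range(1, N + 1)) / N


def alternating(n):
    return (-1) ** n


def sin_log(n):
    return math.sin(math.log(n))


# The means of (-1)^n are 0 for even N and -1/N for odd N, so they tend to 0.
for N in (10, 1000, 100001):
    print(N, cesaro_mean(alternating, N))

# The means of sin(log n) keep swinging between roughly -0.7 and +0.7;
# these N are chosen near successive extremes of the oscillation.
for N in (244, 5650, 130741):
    print(N, cesaro_mean(sin_log, N))
```

The sample points for ${\sin(\log n)}$ come from comparing the sum with the integral ${\int_1^N \sin(\log x)\ dx = \frac{N}{2}( \sin \log N - \cos \log N ) + \frac{1}{2}}$, which suggests that the means behave like ${\frac{1}{\sqrt{2}} \sin( \log N - \frac{\pi}{4} )}$ and so do not settle down.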

Using the Hahn-Banach theorem, one can extend the classical limit functional to generalised limit functionals ${\mathop{\widetilde \lim}_{n \rightarrow \infty} a_n}$, defined to be bounded linear functionals from the space ${\ell^\infty({\bf N})}$ of bounded real sequences to the real numbers ${{\bf R}}$ that extend the classical limit functional (defined on the space ${c_0({\bf N}) + {\bf R}}$ of convergent sequences) without any increase in the operator norm. (In some of my past writings I made the slight error of referring to these generalised limit functionals as Banach limits, though as discussed below, the latter actually refers to a subclass of generalised limit functionals.) It is not difficult to see that such generalised limit functionals will range between the limit inferior and limit superior. In fact, for any specific sequence ${(a_n)_{n=1}^\infty}$ and any number ${L}$ lying in the closed interval ${[\liminf_{n \rightarrow \infty} a_n, \limsup_{n \rightarrow \infty} a_n]}$, there exists at least one generalised limit functional ${\mathop{\widetilde \lim}_{n \rightarrow \infty}}$ that takes the value ${L}$ when applied to ${a_n}$; for instance, for any number ${\theta}$ in ${[-1,1]}$, there exists a generalised limit functional that assigns that number ${\theta}$ as the “limit” of the sequence ${a_n = (-1)^n}$. This claim can be seen by first designing such a limit functional on the vector space spanned by the convergent sequences and by ${(a_n)_{n=1}^\infty}$, and then appealing to the Hahn-Banach theorem to extend to all sequences. This observation also gives a necessary and sufficient criterion for a bounded sequence ${(a_n)_{n=1}^\infty}$ to classically converge to a limit ${L}$, namely that all generalised limits of this sequence must equal ${L}$.

Because of the reliance on the Hahn-Banach theorem, the existence of generalised limits requires the axiom of choice (or some weakened version thereof); there are presumably models of set theory without the axiom of choice in which no generalised limits exist, but I do not know of an explicit reference for this.

Generalised limits can obey the shift-invariance property (4) or the algebra homomorphism property (3), but as the above analysis of the sequence ${a_n = (-1)^n}$ shows, they cannot do both. Generalised limits that obey the shift-invariance property (4) are known as Banach limits; one can for instance construct them by applying the Hahn-Banach theorem to the Césaro limit functional; alternatively, if ${\mathop{\widetilde \lim}}$ is any generalised limit, then the Césaro-type functional ${(a_n)_{n=1}^\infty \mapsto \mathop{\widetilde \lim}_{N \rightarrow \infty} \frac{1}{N} \sum_{n=1}^N a_n}$ will be a Banach limit. The existence of Banach limits can be viewed as a demonstration of the amenability of the natural numbers (or integers); see this previous blog post for further discussion.
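
To spell out why the Césaro-type functional above is shift-invariant, here is a short verification (with ${M}$ introduced here as a bound ${|a_n| \leq M}$ on the sequence). The averages of the shifted sequence telescope against the original averages:

$\displaystyle \frac{1}{N} \sum_{n=1}^N a_{n+1} - \frac{1}{N} \sum_{n=1}^N a_n = \frac{a_{N+1} - a_1}{N}.$

The right-hand side is bounded in magnitude by ${2M/N}$ and hence converges classically to zero as ${N \rightarrow \infty}$. Since ${\mathop{\widetilde \lim}}$ is linear and agrees with the classical limit on convergent sequences, it assigns the same value to both sequences of averages, which is the shift-invariance property (4) for the new functional.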

Generalised limits that obey the algebra homomorphism property (3) are known as ultrafilter limits. If one is given a generalised limit functional ${p\!-\!\lim_{n \rightarrow \infty}}$ that obeys (3), then for any subset ${A}$ of the natural numbers ${{\bf N}}$, the generalised limit ${p\!-\!\lim_{n \rightarrow \infty} 1_A(n)}$ must equal its own square (since ${1_A(n)^2 = 1_A(n)}$) and is thus either ${0}$ or ${1}$. If one defines ${p \subset 2^{\bf N}}$ to be the collection of all subsets ${A}$ of ${{\bf N}}$ for which ${p\!-\!\lim_{n \rightarrow \infty} 1_A(n) = 1}$, one can verify that ${p}$ obeys the axioms of a non-principal ultrafilter. Conversely, if ${p}$ is a non-principal ultrafilter, one can define the associated generalised limit ${p\!-\!\lim_{n \rightarrow \infty} a_n}$ of any bounded sequence ${(a_n)_{n=1}^\infty}$ to be the unique real number ${L}$ such that the sets ${\{ n \in {\bf N}: |a_n - L| \leq \varepsilon \}}$ lie in ${p}$ for all ${\varepsilon>0}$; one can check that this does indeed give a well-defined generalised limit that obeys (3). Non-principal ultrafilters can be constructed using Zorn’s lemma. In fact, they do not quite need the full strength of the axiom of choice; see the Wikipedia article on the ultrafilter lemma for examples.

We have previously noted that generalised limits of a sequence can converge to any point between the limit inferior and limit superior. The same is not true if one restricts to Banach limits or ultrafilter limits. For instance, by the arguments already given, the only possible Banach limit for the sequence ${a_n = (-1)^n}$ is zero. Meanwhile, an ultrafilter limit must converge to a limit point of the original sequence, but conversely every limit point can be attained by at least one ultrafilter limit; we leave these assertions as an exercise to the interested reader. In particular, a bounded sequence converges classically to a limit ${L}$ if and only if all ultrafilter limits converge to ${L}$.

There is no generalisation of the classical limit functional to any space that includes non-classically convergent sequences that obeys the subsequence property (5), since any non-classically-convergent sequence will have one subsequence that converges to the limit superior, and another subsequence that converges to the limit inferior, and one of these will have to violate (5) since the limit superior and limit inferior are distinct. So the above limit notions come close to the best generalisations of limit that one can use in practice.

We summarise the above discussion in the following table:

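| Limit | Always defined | Linear | Shift-invariant | Homomorphism | Constructive |
|---|---|---|---|---|---|
| Classical | No | Yes | Yes | Yes | Yes |
| Superior | Yes | No | Yes | No | Yes |
| Inferior | Yes | No | Yes | No | Yes |
| Césaro | No | Yes | Yes | No | Yes |
| Generalised | Yes | Yes | Depends | Depends | No |
| Banach | Yes | Yes | Yes | No | No |
| Ultrafilter | Yes | Yes | No | Yes | No |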