You are currently browsing the category archive for the ‘math.GT’ category.

Let ${k}$ be a field, and let ${E}$ be a finite extension of that field; in this post we will denote such a relationship by ${k \hookrightarrow E}$. We say that ${E}$ is a Galois extension of ${k}$ if the cardinality of the automorphism group ${\mathrm{Aut}(E/k)}$ of ${E}$ fixing ${k}$ is as large as it can be, namely the degree ${[E:k]}$ of the extension. In that case, we call ${\mathrm{Aut}(E/k)}$ the Galois group of ${E}$ over ${k}$ and denote it also by ${\mathrm{Gal}(E/k)}$. The fundamental theorem of Galois theory then gives a one-to-one correspondence (also known as the Galois correspondence) between the intermediate extensions between ${E}$ and ${k}$ and the subgroups of ${\mathrm{Gal}(E/k)}$:

Theorem 1 (Fundamental theorem of Galois theory) Let ${E}$ be a Galois extension of ${k}$.

• (i) If ${k \hookrightarrow F \hookrightarrow E}$ is an intermediate field betwen ${k}$ and ${E}$, then ${E}$ is a Galois extension of ${F}$, and ${\mathrm{Gal}(E/F)}$ is a subgroup of ${\mathrm{Gal}(E/k)}$.
• (ii) Conversely, if ${H}$ is a subgroup of ${\mathrm{Gal}(E/k)}$, then there is a unique intermediate field ${k \hookrightarrow F \hookrightarrow E}$ such that ${\mathrm{Gal}(E/F)=H}$; namely ${F}$ is the set of elements of ${E}$ that are fixed by ${H}$.
• (iii) If ${k \hookrightarrow F_1 \hookrightarrow E}$ and ${k \hookrightarrow F_2 \hookrightarrow E}$, then ${F_1 \hookrightarrow F_2}$ if and only if ${\mathrm{Gal}(E/F_2)}$ is a subgroup of ${\mathrm{Gal}(E/F_1)}$.
• (iv) If ${k \hookrightarrow F \hookrightarrow E}$ is an intermediate field between ${k}$ and ${E}$, then ${F}$ is a Galois extension of ${k}$ if and only if ${\mathrm{Gal}(E/F)}$ is a normal subgroup of ${\mathrm{Gal}(E/k)}$. In that case, ${\mathrm{Gal}(F/k)}$ is isomorphic to the quotient group ${\mathrm{Gal}(E/k) / \mathrm{Gal}(E/F)}$.

Example 2 Let ${k= {\bf Q}}$, and let ${E = {\bf Q}(e^{2\pi i/n})}$ be the degree ${\phi(n)}$ Galois extension formed by adjoining a primitive ${n^{th}}$ root of unity (that is to say, ${E}$ is the cyclotomic field of order ${n}$). Then ${\mathrm{Gal}(E/k)}$ is isomorphic to the multiplicative cyclic group ${({\bf Z}/n{\bf Z})^\times}$ (the invertible elements of the ring ${{\bf Z}/n{\bf Z}}$). Amongst the intermediate fields, one has the cyclotomic fields of the form ${F = {\bf Q}(e^{2\pi i/m})}$ where ${m}$ divides ${n}$; they are also Galois extensions, with ${\mathrm{Gal}(F/k)}$ isomorphic to ${({\bf Z}/m{\bf Z})^\times}$ and ${\mathrm{Gal}(E/F)}$ isomorphic to the elements ${a}$ of ${({\bf Z}/n{\bf Z})^\times}$ such that ${a(n/m) = (n/m)}$ modulo ${n}$. (There can also be other intermediate fields, corresponding to other subgroups of ${({\bf Z}/n{\bf Z})^\times}$.)

Example 3 Let ${k = {\bf C}(z)}$ be the field of rational functions of one indeterminate ${z}$ with complex coefficients, and let ${E = {\bf C}(w)}$ be the field formed by adjoining an ${n^{th}}$ root ${w = z^{1/n}}$ to ${k}$, thus ${k = {\bf C}(w^n)}$. Then ${E}$ is a degree ${n}$ Galois extension of ${k}$ with Galois group isomorphic to ${{\bf Z}/n{\bf Z}}$ (with an element ${a \in {\bf Z}/n{\bf Z}}$ corresponding to the field automorphism of ${k}$ that sends ${w}$ to ${e^{2\pi i a/n} w}$). The intermediate fields are of the form ${F = {\bf C}(w^{n/m})}$ where ${m}$ divides ${n}$; they are also Galois extensions, with ${\mathrm{Gal}(F/k)}$ isomorphic to ${{\bf Z}/m{\bf Z}}$ and ${\mathrm{Gal}(E/F)}$ isomorphic to the multiples of ${m}$ in ${{\bf Z}/n{\bf Z}}$.

There is an analogous Galois correspondence in the covering theory of manifolds. For simplicity we restrict attention to finite covers. If ${L}$ is a connected manifold and ${\pi_{L \leftarrow M}: M \rightarrow L}$ is a finite covering map of ${L}$ by another connected manifold ${M}$, we denote this relationship by ${L \leftarrow M}$. (Later on we will change our function notations slightly and write ${\pi_{L \leftarrow M}: L \leftarrow M}$ in place of the more traditional ${\pi_{L \leftarrow M}: M \rightarrow L}$, and similarly for the deck transformations ${g: M \leftarrow M}$ below; more on this below the fold.) If ${L \leftarrow M}$, we can define ${\mathrm{Aut}(M/L)}$ to be the group of deck transformations: continuous maps ${g: M \rightarrow M}$ which preserve the fibres of ${\pi}$. We say that this covering map is a Galois cover if the cardinality of the group ${\mathrm{Aut}(M/L)}$ is as large as it can be. In that case we call ${\mathrm{Aut}(M/L)}$ the Galois group of ${M}$ over ${L}$ and denote it by ${\mathrm{Gal}(M/L)}$.

Suppose ${M}$ is a finite cover of ${L}$. An intermediate cover ${N}$ between ${M}$ and ${L}$ is a cover of ${N}$ by ${L}$, such that ${L \leftarrow N \leftarrow M}$, in such a way that the covering maps are compatible, in the sense that ${\pi_{L \leftarrow M}}$ is the composition of ${\pi_{L \leftarrow N}}$ and ${\pi_{N \leftarrow M}}$. This sort of compatibilty condition will be implicitly assumed whenever we chain together multiple instances of the ${\leftarrow}$ notation. Two intermediate covers ${N,N'}$ are equivalent if they cover each other, in a fashion compatible with all the other covering maps, thus ${L \leftarrow N \leftarrow N' \leftarrow M}$ and ${L \leftarrow N' \leftarrow N \leftarrow M}$. We then have the analogous Galois correspondence:

Theorem 4 (Fundamental theorem of covering spaces) Let ${L \leftarrow M}$ be a Galois covering.

• (i) If ${L \leftarrow N \leftarrow M}$ is an intermediate cover betwen ${L}$ and ${M}$, then ${M}$ is a Galois extension of ${N}$, and ${\mathrm{Gal}(M/N)}$ is a subgroup of ${\mathrm{Gal}(M/L)}$.
• (ii) Conversely, if ${H}$ is a subgroup of ${\mathrm{Gal}(M/L)}$, then there is a intermediate cover ${L \leftarrow N \leftarrow M}$, unique up to equivalence, such that ${\mathrm{Gal}(M/N)=H}$.
• (iii) If ${L \leftarrow N_1 \leftarrow M}$ and ${L \leftarrow N_2 \leftarrow M}$, then ${L \leftarrow N_1 \leftarrow N_2 \leftarrow M}$ if and only if ${\mathrm{Gal}(M/N_2)}$ is a subgroup of ${\mathrm{Gal}(M/N_1)}$.
• (iv) If ${L \leftarrow N \leftarrow M}$, then ${N}$ is a Galois cover of ${L}$ if and only if ${\mathrm{Gal}(M/N)}$ is a normal subgroup of ${\mathrm{Gal}(M/L)}$. In that case, ${\mathrm{Gal}(N/L)}$ is isomorphic to the quotient group ${\mathrm{Gal}(M/L) / \mathrm{Gal}(M/N)}$.

Example 5 Let ${L= {\bf C}^\times := {\bf C} \backslash \{0\}}$, and let ${M = {\bf C}^\times}$ be the ${n}$-fold cover of ${L}$ with covering map ${\pi_{L \leftarrow M}(w) := w^n}$. Then ${M}$ is a Galois cover of ${L}$, and ${\mathrm{Gal}(M/L)}$ is isomorphic to the cyclic group ${{\bf Z}/n{\bf Z}}$. The intermediate covers are (up to equivalence) of the form ${N = {\bf C}^\times}$ with covering map ${\pi_{L \leftarrow N}(u) := u^m}$ where ${m}$ divides ${n}$; they are also Galois covers, with ${\mathrm{Gal}(N/L)}$ isomorphic to ${{\bf Z}/m{\bf Z}}$ and ${\mathrm{Gal}(M/N)}$ isomorphic to the multiples of ${m}$ in ${{\bf Z}/n{\bf Z}}$.

Given the strong similarity between the two theorems, it is natural to ask if there is some more concrete connection between Galois theory and the theory of finite covers.

In one direction, if the manifolds ${L,M,N}$ have an algebraic structure (or a complex structure), then one can relate covering spaces to field extensions by considering the field of rational functions (or meromorphic functions) on the space. For instance, if ${L = {\bf C}^\times}$ and ${z}$ is the coordinate on ${L}$, one can consider the field ${{\bf C}(z)}$ of rational functions on ${L}$; the ${n}$-fold cover ${M = {\bf C}^\times}$ with coordinate ${w}$ from Example 5 similarly has a field ${{\bf C}(w)}$ of rational functions. The covering ${\pi_{L \leftarrow M}(w) = w^n}$ relates the two coordinates ${z,w}$ by the relation ${z = w^n}$, at which point one sees that the rational functions ${{\bf C}(w)}$ on ${L}$ are a degree ${n}$ extension of that of ${{\bf C}(z)}$ (formed by adjoining the ${n^{th}}$ root of unity ${w}$ to ${z}$). In this way we see that Example 5 is in fact closely related to Example 3.

Exercise 6 What happens if one uses meromorphic functions in place of rational functions in the above example? (To answer this question, I found it convenient to use a discrete Fourier transform associated to the multiplicative action of the ${n^{th}}$ roots of unity on ${M}$ to decompose the meromorphic functions on ${M}$ as a linear combination of functions invariant under this action, times a power ${w^j}$ of the coordinate ${w}$ for ${j=0,\dots,n-1}$.)

I was curious however about the reverse direction. Starting with some field extensions ${k \hookrightarrow F \hookrightarrow E}$, is it is possible to create manifold like spaces ${M_k \leftarrow M_F \leftarrow M_E}$ associated to these fields in such a fashion that (say) ${M_E}$ behaves like a “covering space” to ${M_k}$ with a group ${\mathrm{Aut}(M_E/M_k)}$ of deck transformations isomorphic to ${\mathrm{Aut}(E/k)}$, so that the Galois correspondences agree? Also, given how the notion of a path (and associated concepts such as loops, monodromy and the fundamental group) play a prominent role in the theory of covering spaces, can spaces such as ${M_k}$ or ${M_E}$ also come with a notion of a path that is somehow compatible with the Galois correspondence?

The standard answer from modern algebraic geometry (as articulated for instance in this nice MathOverflow answer by Minhyong Kim) is to set ${M_E}$ equal to the spectrum ${\mathrm{Spec}(E)}$ of the field ${E}$. As a set, the spectrum ${\mathrm{Spec}(R)}$ of a commutative ring ${R}$ is defined as the set of prime ideals of ${R}$. Generally speaking, the map ${R \mapsto \mathrm{Spec}(R)}$ that maps a commutative ring to its spectrum tends to act like an inverse of the operation that maps a space ${X}$ to a ring of functions on that space. For instance, if one considers the commutative ring ${{\bf C}[z, z^{-1}]}$ of regular functions on ${M = {\bf C}^\times}$, then each point ${z_0}$ in ${M}$ gives rise to the prime ideal ${\{ f \in {\bf C}[z, z^{-1}]: f(z_0)=0\}}$, and one can check that these are the only such prime ideals (other than the zero ideal ${(0)}$), giving an almost one-to-one correspondence between ${\mathrm{Spec}( {\bf C}[z,z^{-1}] )}$ and ${M}$. (The zero ideal corresponds instead to the generic point of ${M}$.)

Of course, the spectrum of a field such as ${E}$ is just a point, as the zero ideal ${(0)}$ is the only prime ideal. Naively, it would then seem that there is not enough space inside such a point to support a rich enough structure of paths to recover the Galois theory of this field. In modern algebraic geometry, one addresses this issue by considering not just the set-theoretic elements of ${E}$, but more general “base points” ${p: \mathrm{Spec}(b) \rightarrow \mathrm{Spec}(E)}$ that map from some other (affine) scheme ${\mathrm{Spec}(b)}$ to ${\mathrm{Spec}(E)}$ (one could also consider non-affine base points of course). One has to rework many of the fundamentals of the subject to accommodate this “relative point of view“, for instance replacing the usual notion of topology with an étale topology, but once one does so one obtains a very satisfactory theory.

As an exercise, I set myself the task of trying to interpret Galois theory as an analogue of covering space theory in a more classical fashion, without explicit reference to more modern concepts such as schemes, spectra, or étale topology. After some experimentation, I found a reasonably satisfactory way to do so as follows. The space ${M_E}$ that one associates with ${E}$ in this classical perspective is not the single point ${\mathrm{Spec}(E)}$, but instead the much larger space consisting of ring homomorphisms ${p: E \rightarrow b}$ from ${E}$ to arbitrary integral domains ${b}$; informally, ${M_E}$ consists of all the “models” or “representations” of ${E}$ (in the spirit of this previous blog post). (There is a technical set-theoretic issue here because the class of integral domains ${R}$ is a proper class, so that ${M_E}$ will also be a proper class; I will completely ignore such technicalities in this post.) We view each such homomorphism ${p: E \rightarrow b}$ as a single point in ${M_E}$. The analogous notion of a path from one point ${p: E \rightarrow b}$ to another ${p': E \rightarrow b'}$ is then a homomorphism ${\gamma: b \rightarrow b'}$ of integral domains, such that ${p'}$ is the composition of ${p}$ with ${\gamma}$. Note that every prime ideal ${I}$ in the spectrum ${\mathrm{Spec}(R)}$ of a commutative ring ${R}$ gives rise to a point ${p_I}$ in the space ${M_R}$ defined here, namely the quotient map ${p_I: R \rightarrow R/I}$ to the ring ${R/I}$, which is an integral domain because ${I}$ is prime. So one can think of ${\mathrm{Spec}(R)}$ as being a distinguished subset of ${M_R}$; alternatively, one can think of ${M_R}$ as a sort of “penumbra” surrounding ${\mathrm{Spec}(R)}$. In particular, when ${E}$ is a field, ${\mathrm{Spec}(E) = \{(0)\}}$ defines a special point ${p_R}$ in ${M_R}$, namely the identity homomorphism ${p_R: R \rightarrow R}$.

Below the fold I would like to record this interpretation of Galois theory, by first revisiting the theory of covering spaces using paths as the basic building block, and then adapting that theory to the theory of field extensions using the spaces indicated above. This is not too far from the usual scheme-theoretic way of phrasing the connection between the two topics (basically I have replaced étale-type points ${p: \mathrm{Spec}(b) \rightarrow \mathrm{Spec}(E)}$ with more classical points ${p: E \rightarrow b}$), but I had not seen it explicitly articulated before, so I am recording it here for my own benefit and for any other readers who may be interested.

How many groups of order four are there? Technically, there are an enormous number, so much so, in fact, that the class of groups of order four is not even a set, but merely a proper class. This is because any four objects ${a,b,c,d}$ can be turned into a group ${\{a,b,c,d\}}$ by designating one of the four objects, say ${a}$, to be the group identity, and imposing a suitable multiplication table (and inversion law) on the four elements in a manner that obeys the usual group axioms. Since all sets are themselves objects, the class of four-element groups is thus at least as large as the class of all sets, which by Russell’s paradox is known not to itself be a set (assuming the usual ZFC axioms of set theory).

A much better question is to ask how many groups of order four there are up to isomorphism, counting each isomorphism class of groups exactly once. Now, as one learns in undergraduate group theory classes, the answer is just “two”: the cyclic group ${C_4}$ and the Klein four-group ${C_2 \times C_2}$.

More generally, given a class of objects ${X}$ and some equivalence relation ${\sim}$ on ${X}$ (which one should interpret as describing the property of two objects in ${X}$ being “isomorphic”), one can consider the number ${|X / \sim|}$ of objects in ${X}$ “up to isomorphism”, which is simply the cardinality of the collection ${X/\sim}$ of equivalence classes ${[x]:=\{y\in X:x \sim {}y \}}$ of ${X}$. In the case where ${X}$ is finite, one can express this cardinality by the formula

$\displaystyle |X/\sim| = \sum_{x \in X} \frac{1}{|[x]|}, \ \ \ \ \ (1)$

thus one counts elements in ${X}$, weighted by the reciprocal of the number of objects they are isomorphic to.

Example 1 Let ${X}$ be the five-element set ${\{-2,-1,0,1,2\}}$ of integers between ${-2}$ and ${2}$. Let us say that two elements ${x, y}$ of ${X}$ are isomorphic if they have the same magnitude: ${x \sim y \iff |x| = |y|}$. Then the quotient space ${X/\sim}$ consists of just three equivalence classes: ${\{-2,2\} = [2] = [-2]}$, ${\{-1,1\} = [-1] = [1]}$, and ${\{0\} = [0]}$. Thus there are three objects in ${X}$ up to isomorphism, and the identity (1) is then just

$\displaystyle 3 = \frac{1}{2} + \frac{1}{2} + 1 + \frac{1}{2} + \frac{1}{2}.$

Thus, to count elements in ${X}$ up to equivalence, the elements ${-2,-1,1,2}$ are given a weight of ${1/2}$ because they are each isomorphic to two elements in ${X}$, while the element ${0}$ is given a weight of ${1}$ because it is isomorphic to just one element in ${X}$ (namely, itself).

Given a finite probability set ${X}$, there is also a natural probability distribution on ${X}$, namely the uniform distribution, according to which a random variable ${\mathbf{x} \in X}$ is set equal to any given element ${x}$ of ${X}$ with probability ${\frac{1}{|X|}}$:

$\displaystyle {\bf P}( \mathbf{x} = x ) = \frac{1}{|X|}.$

Given a notion ${\sim}$ of isomorphism on ${X}$, one can then define the random equivalence class ${[\mathbf{x}] \in X/\sim}$ that the random element ${\mathbf{x}}$ belongs to. But if the isomorphism classes are unequal in size, we now encounter a biasing effect: even if ${\mathbf{x}}$ was drawn uniformly from ${X}$, the equivalence class ${[\mathbf{x}]}$ need not be uniformly distributed in ${X/\sim}$. For instance, in the above example, if ${\mathbf{x}}$ was drawn uniformly from ${\{-2,-1,0,1,2\}}$, then the equivalence class ${[\mathbf{x}]}$ will not be uniformly distributed in the three-element space ${X/\sim}$, because it has a ${2/5}$ probability of being equal to the class ${\{-2,2\}}$ or to the class ${\{-1,1\}}$, and only a ${1/5}$ probability of being equal to the class ${\{0\}}$.

However, it is possible to remove this bias by changing the weighting in (1), and thus changing the notion of what cardinality means. To do this, we generalise the previous situation. Instead of considering sets ${X}$ with an equivalence relation ${\sim}$ to capture the notion of isomorphism, we instead consider groupoids, which are sets ${X}$ in which there are allowed to be multiple isomorphisms between elements in ${X}$ (and in particular, there are allowed to be multiple automorphisms from an element to itself). More precisely:

Definition 2 A groupoid is a set (or proper class) ${X}$, together with a (possibly empty) collection ${\mathrm{Iso}(x \rightarrow y)}$ of “isomorphisms” between any pair ${x,y}$ of elements of ${X}$, and a composition map ${f, g \mapsto g \circ f}$ from isomorphisms ${f \in \mathrm{Iso}(x \rightarrow y)}$, ${g \in \mathrm{Iso}(y \rightarrow z)}$ to isomorphisms in ${\mathrm{Iso}(x \rightarrow z)}$ for every ${x,y,z \in X}$, obeying the following group-like axioms:

• (Identity) For every ${x \in X}$, there is an identity isomorphism ${\mathrm{id}_x \in \mathrm{Iso}(x \rightarrow x)}$, such that ${f \circ \mathrm{id}_x = \mathrm{id}_y \circ f = f}$ for all ${f \in \mathrm{Iso}(x \rightarrow y)}$ and ${x,y \in X}$.
• (Associativity) If ${f \in \mathrm{Iso}(x \rightarrow y)}$, ${g \in \mathrm{Iso}(y \rightarrow z)}$, and ${h \in \mathrm{Iso}(z \rightarrow w)}$ for some ${x,y,z,w \in X}$, then ${h \circ (g \circ f) = (h \circ g) \circ f}$.
• (Inverse) If ${f \in \mathrm{Iso}(x \rightarrow y)}$ for some ${x,y \in X}$, then there exists an inverse isomorphism ${f^{-1} \in \mathrm{Iso}(y \rightarrow x)}$ such that ${f \circ f^{-1} = \mathrm{id}_y}$ and ${f^{-1} \circ f = \mathrm{id}_x}$.

We say that two elements ${x,y}$ of a groupoid are isomorphic, and write ${x \sim y}$, if there is at least one isomorphism from ${x}$ to ${y}$.

Example 3 Any category gives a groupoid by taking ${X}$ to be the set (or class) of objects, and ${\mathrm{Iso}(x \rightarrow y)}$ to be the collection of invertible morphisms from ${x}$ to ${y}$. For instance, in the category ${\mathbf{Set}}$ of sets, ${\mathrm{Iso}(x \rightarrow y)}$ would be the collection of bijections from ${x}$ to ${y}$; in the category ${\mathbf{Vec}/k}$ of linear vector spaces over some given base field ${k}$, ${\mathrm{Iso}(x \rightarrow y)}$ would be the collection of invertible linear transformations from ${x}$ to ${y}$; and so forth.

Every set ${X}$ equipped with an equivalence relation ${\sim}$ can be turned into a groupoid by assigning precisely one isomorphism ${\iota_{x \rightarrow y}}$ from ${x}$ to ${y}$ for any pair ${x,y \in X}$ with ${x \sim y}$, and no isomorphisms from ${x}$ to ${y}$ when ${x \not \sim y}$, with the groupoid operations of identity, composition, and inverse defined in the only way possible consistent with the axioms. We will call this the simply connected groupoid associated with this equivalence relation. For instance, with ${X = \{-2,-1,0,1,2\}}$ as above, if we turn ${X}$ into a simply connected groupoid, there will be precisely one isomorphism from ${2}$ to ${-2}$, and also precisely one isomorphism from ${2}$ to ${2}$, but no isomorphisms from ${2}$ to ${-1}$, ${0}$, or ${1}$.

However, one can also form multiply-connected groupoids in which there can be multiple isomorphisms from one element of ${X}$ to another. For instance, one can view ${X = \{-2,-1,0,1,2\}}$ as a space that is acted on by multiplication by the two-element group ${\{-1,+1\}}$. This gives rise to two types of isomorphisms, an identity isomorphism ${(+1)_x}$ from ${x}$ to ${x}$ for each ${x \in X}$, and a negation isomorphism ${(-1)_x}$ from ${x}$ to ${-x}$ for each ${x \in X}$; in particular, there are two automorphisms of ${0}$ (i.e., isomorphisms from ${0}$ to itself), namely ${(+1)_0}$ and ${(-1)_0}$, whereas the other four elements of ${X}$ only have a single automorphism (the identity isomorphism). One defines composition, identity, and inverse in this groupoid in the obvious fashion (using the group law of the two-element group ${\{-1,+1\}}$); for instance, we have ${(-1)_{-2} \circ (-1)_2 = (+1)_2}$.

For a finite multiply-connected groupoid, it turns out that the natural notion of “cardinality” (or as I prefer to call it, “cardinality up to isomorphism”) is given by the variant

$\displaystyle \sum_{x \in X} \frac{1}{|\{ f: f \in \mathrm{Iso}(x \rightarrow y) \hbox{ for some } y\}|}$

of (1). That is to say, in the multiply connected case, the denominator is no longer the number of objects ${y}$ isomorphic to ${x}$, but rather the number of isomorphisms from ${x}$ to other objects ${y}$. Grouping together all summands coming from a single equivalence class ${[x]}$ in ${X/\sim}$, we can also write this expression as

$\displaystyle \sum_{[x] \in X/\sim} \frac{1}{|\mathrm{Aut}(x)|} \ \ \ \ \ (2)$

where ${\mathrm{Aut}(x) := \mathrm{Iso}(x \rightarrow x)}$ is the automorphism group of ${x}$, that is to say the group of isomorphisms from ${x}$ to itself. (Note that if ${x,x'}$ belong to the same equivalence class ${[x]}$, then the two groups ${\mathrm{Aut}(x)}$ and ${\mathrm{Aut}(x')}$ will be isomorphic and thus have the same cardinality, and so the above expression is well-defined.

For instance, if we take ${X}$ to be the simply connected groupoid on ${\{-2,-1,0,1,2\}}$, then the number of elements of ${X}$ up to isomorphism is

$\displaystyle \frac{1}{2} + \frac{1}{2} + 1 + \frac{1}{2} + \frac{1}{2} = 1 + 1 + 1 = 3$

exactly as before. If however we take the multiply connected groupoid on ${\{-2,-1,0,1,2\}}$, in which ${0}$ has two automorphisms, the number of elements of ${X}$ up to isomorphism is now the smaller quantity

$\displaystyle \frac{1}{2} + \frac{1}{2} + \frac{1}{2} + \frac{1}{2} + \frac{1}{2} = 1 + \frac{1}{2} + 1 = \frac{5}{2};$

the equivalence class ${[0]}$ is now counted with weight ${1/2}$ rather than ${1}$ due to the two automorphisms on ${0}$. Geometrically, one can think of this groupoid as being formed by taking the five-element set ${\{-2,-1,0,1,2\}}$, and “folding it in half” around the fixed point ${0}$, giving rise to two “full” quotient points ${[1], [2]}$ and one “half” point ${[0]}$. More generally, given a finite group ${G}$ acting on a finite set ${X}$, and forming the associated multiply connected groupoid, the cardinality up to isomorphism of this groupoid will be ${|X|/|G|}$, since each element ${x}$ of ${X}$ will have ${|G|}$ isomorphisms on it (whether they be to the same element ${x}$, or to other elements of ${X}$).

The definition (2) can also make sense for some infinite groupoids; to my knowledge this was first explicitly done in this paper of Baez and Dolan. Consider for instance the category ${\mathbf{FinSet}}$ of finite sets, with isomorphisms given by bijections as in Example 3. Every finite set is isomorphic to ${\{1,\dots,n\}}$ for some natural number ${n}$, so the equivalence classes of ${\mathbf{FinSet}}$ may be indexed by the natural numbers. The automorphism group ${S_n}$ of ${\{1,\dots,n\}}$ has order ${n!}$, so the cardinality of ${\mathbf{FinSet}}$ up to isomorphism is

$\displaystyle \sum_{n=0}^\infty \frac{1}{n!} = e.$

(This fact is sometimes loosely stated as “the number of finite sets is ${e}$“, but I view this statement as somewhat misleading if the qualifier “up to isomorphism” is not added.) Similarly, when one allows for multiple isomorphisms from a group to itself, the number of groups of order four up to isomorphism is now

$\displaystyle \frac{1}{2} + \frac{1}{6} = \frac{2}{3}$

because the cyclic group ${C_4}$ has two automorphisms, whereas the Klein four-group ${C_2 \times C_2}$ has six.

In the case that the cardinality of a groupoid ${X}$ up to isomorphism is finite and non-empty, one can now define the notion of a random isomorphism class ${[\mathbf{x}]}$ in ${X/\sim}$ drawn “uniformly up to isomorphism”, by requiring the probability of attaining any given isomorphism class ${[x]}$ to be

$\displaystyle {\mathbf P}([\mathbf{x}] = [x]) = \frac{1 / |\mathrm{Aut}(x)|}{\sum_{[y] \in X/\sim} 1/|\mathrm{Aut}(y)|},$

thus the probability of being isomorphic to a given element ${x}$ will be inversely proportional to the number of automorphisms that ${x}$ has. For instance, if we take ${X}$ to be the set ${\{-2,-1,0,1,2\}}$ with the simply connected groupoid, ${[\mathbf{x}]}$ will be drawn uniformly from the three available equivalence classes ${[0], [1], [2]}$, with a ${1/3}$ probability of attaining each; but if instead one uses the multiply connected groupoid coming from the action of ${\{-1,+1\}}$, and draws ${[\mathbf{x}]}$ uniformly up to isomorphism, then ${[1]}$ and ${[2]}$ will now be selected with probability ${2/5}$ each, and ${[0]}$ will be selected with probability ${1/5}$. Thus this distribution has accounted for the bias mentioned previously: if a finite group ${G}$ acts on a finite space ${X}$, and ${\mathbf{x}}$ is drawn uniformly from ${X}$, then ${[\mathbf{x}]}$ now still be drawn uniformly up to isomorphism from ${X/G}$, if we use the multiply connected groupoid coming from the ${G}$ action, rather than the simply connected groupoid coming from just the ${G}$-orbit structure on ${X}$.

Using the groupoid of finite sets, we see that a finite set chosen uniformly up to isomorphism will have a cardinality that is distributed according to the Poisson distribution of parameter ${1}$, that is to say it will be of cardinality ${n}$ with probability ${\frac{e^{-1}}{n!}}$.

One important source of groupoids are the fundamental groupoids ${\pi_1(M)}$ of a manifold ${M}$ (one can also consider more general topological spaces than manifolds, but for simplicity we will restrict this discussion to the manifold case), in which the underlying space is simply ${M}$, and the isomorphisms from ${x}$ to ${y}$ are the equivalence classes of paths from ${x}$ to ${y}$ up to homotopy; in particular, the automorphism group of any given point is just the fundamental group of ${M}$ at that base point. The equivalence class ${[x]}$ of a point in ${M}$ is then the connected component of ${x}$ in ${M}$. The cardinality up to isomorphism of the fundamental groupoid is then

$\displaystyle \sum_{M' \in \pi_0(M)} \frac{1}{|\pi_1(M')|}$

where ${\pi_0(M)}$ is the collection of connected components ${M'}$ of ${M}$, and ${|\pi_1(M')|}$ is the order of the fundamental group of ${M'}$. Thus, simply connected components of ${M}$ count for a full unit of cardinality, whereas multiply connected components (which can be viewed as quotients of their simply connected cover by their fundamental group) will count for a fractional unit of cardinality, inversely to the order of their fundamental group.

This notion of cardinality up to isomorphism of a groupoid behaves well with respect to various basic notions. For instance, suppose one has an ${n}$-fold covering map ${\pi: X \rightarrow Y}$ of one finite groupoid ${Y}$ by another ${X}$. This means that ${\pi}$ is a functor that is surjective, with all preimages of cardinality ${n}$, with the property that given any pair ${y,y'}$ in the base space ${Y}$ and any ${x}$ in the preimage ${\pi^{-1}(\{y\})}$ of ${y}$, every isomorphism ${f \in \mathrm{Iso}(y \rightarrow y')}$ has a unique lift ${\tilde f \in \mathrm{Iso}(x \rightarrow x')}$ from the given initial point ${x}$ (and some ${x'}$ in the preimage of ${y'}$). Then one can check that the cardinality up to isomorphism of ${X}$ is ${n}$ times the cardinality up to isomorphism of ${Y}$, which fits well with the geometric picture of ${X}$ as the ${n}$-fold cover of ${Y}$. (For instance, if one covers a manifold ${M}$ with finite fundamental group by its universal cover, this is a ${|\pi_1(M)|}$-fold cover, the base has cardinality ${1/|\pi_1(M)|}$ up to isomorphism, and the universal cover has cardinality one up to isomorphism.) Related to this, if one draws an equivalence class ${[\mathrm{x}]}$ of ${X}$ uniformly up to isomorphism, then ${\pi([\mathrm{x}])}$ will be an equivalence class of ${Y}$ drawn uniformly up to isomorphism also.

Indeed, one can show that this notion of cardinality up to isomorphism for groupoids is uniquely determined by a small number of axioms such as these (similar to the axioms that determine Euler characteristic); see this blog post of Qiaochu Yuan for details.

The probability distributions on isomorphism classes described by the above recipe seem to arise naturally in many applications. For instance, if one draws a profinite abelian group up to isomorphism at random in this fashion (so that each isomorphism class ${[G]}$ of a profinite abelian group ${G}$ occurs with probability inversely proportional to the number of automorphisms of this group), then the resulting distribution is known as the Cohen-Lenstra distribution, and seems to emerge as the natural asymptotic distribution of many randomly generated profinite abelian groups in number theory and combinatorics, such as the class groups of random quadratic fields; see this previous blog post for more discussion. For a simple combinatorial example, the set of fixed points of a random permutation on ${n}$ elements will have a cardinality that converges in distribution to the Poisson distribution of rate ${1}$ (as discussed in this previous post), thus we see that the fixed points of a large random permutation asymptotically are distributed uniformly up to isomorphism. I’ve been told that this notion of cardinality up to isomorphism is also particularly compatible with stacks (which are a good framework to describe such objects as moduli spaces of algebraic varieties up to isomorphism), though I am not sufficiently acquainted with this theory to say much more than this.

I’ve just uploaded to the arXiv my paper “An integration approach to the Toeplitz square peg problem“, submitted to Forum of Mathematics, Sigma. This paper resulted from my attempts recently to solve the Toeplitz square peg problem (also known as the inscribed square problem):

Conjecture 1 (Toeplitz square peg problem) Let ${\gamma}$ be a simple closed curve in the plane. Is it necessarily the case that ${\gamma}$ contains four vertices of a square?

See this recent survey of Matschke in the Notices of the AMS for the latest results on this problem.

The route I took to the results in this paper was somewhat convoluted. I was motivated to look at this problem after lecturing recently on the Jordan curve theorem in my class. The problem is superficially similar to the Jordan curve theorem in that the result is known (and rather easy to prove) if ${\gamma}$ is sufficiently regular (e.g. if it is a polygonal path), but seems to be significantly more difficult when the curve is merely assumed to be continuous. Roughly speaking, all the known positive results on the problem have proceeded using (in some form or another) tools from homology: note for instance that one can view the conjecture as asking whether the four-dimensional subset ${\gamma^4}$ of the eight-dimensional space ${({\bf R}^2)^4}$ necessarily intersects the four-dimensional space ${\mathtt{Squares} \subset ({\bf R}^2)^4}$ consisting of the quadruples ${(v_1,v_2,v_3,v_4)}$ traversing a square in (say) anti-clockwise order; this space is a four-dimensional linear subspace of ${({\bf R}^2)^4}$, with a two-dimensional subspace of “degenerate” squares ${(v,v,v,v)}$ removed. If one ignores this degenerate subspace, one can use intersection theory to conclude (under reasonable “transversality” hypotheses) that ${\gamma^4}$ intersects ${\mathtt{Squares}}$ an odd number of times (up to the cyclic symmetries of the square), which is basically how Conjecture 1 is proven in the regular case. Unfortunately, if one then takes a limit and considers what happens when ${\gamma}$ is just a continuous curve, the odd number of squares created by these homological arguments could conceivably all degenerate to points, thus blocking one from proving the conjecture in the general case.

Inspired by my previous work on finite time blowup for various PDEs, I first tried looking for a counterexample in the category of (locally) self-similar curves that are smooth (or piecewise linear) away from a single origin where it can oscillate infinitely often; this is basically the smoothest type of curve that was not already covered by previous results. By a rescaling and compactness argument, it is not difficult to see that such a counterexample would exist if there was a counterexample to the following periodic version of the conjecture:

Conjecture 2 (Periodic square peg problem) Let ${\gamma_1, \gamma_2}$ be two disjoint simple closed piecewise linear curves in the cylinder ${({\bf R}/{\bf Z}) \times {\bf R}}$ which have a winding number of one, that is to say they are homologous to the loop ${x \mapsto (x,0)}$ from ${{\bf R}/{\bf Z}}$ to ${({\bf R}/{\bf Z}) \times {\bf R}}$. Then the union of ${\gamma_1}$ and ${\gamma_2}$ contains the four vertices of a square.

In contrast to Conjecture 1, which is known for polygonal paths, Conjecture 2 is still open even under the hypothesis of polygonal paths; the homological arguments alluded to previously now show that the number of inscribed squares in the periodic setting is even rather than odd, which is not enough to conclude the conjecture. (This flipping of parity from odd to even due to an infinite amount of oscillation is reminiscent of the “Eilenberg-Mazur swindle“, discussed in this previous post.)

I therefore tried to construct counterexamples to Conjecture 2. I began perturbatively, looking at curves ${\gamma_1, \gamma_2}$ that were small perturbations of constant functions. After some initial Taylor expansion, I was blocked from forming such a counterexample because an inspection of the leading Taylor coefficients required one to construct a continuous periodic function of mean zero that never vanished, which of course was impossible by the intermediate value theorem. I kept expanding to higher and higher order to try to evade this obstruction (this, incidentally, was when I discovered this cute application of Lagrange reversion) but no matter how high an accuracy I went (I think I ended up expanding to sixth order in a perturbative parameter ${\varepsilon}$ before figuring out what was going on!), this obstruction kept resurfacing again and again. I eventually figured out that this obstruction was being caused by a “conserved integral of motion” for both Conjecture 2 and Conjecture 1, which can in fact be used to largely rule out perturbative constructions. This yielded a new positive result for both conjectures:

Theorem 3

• (i) Conjecture 1 holds when ${\gamma}$ is the union ${\{ (t,f(t)): t \in [t_0,t_1]\} \cup \{ (t,g(t)): t \in [t_0,t_1]\}}$ of the graphs of two Lipschitz functions ${f,g: [t_0,t_1] \rightarrow {\bf R}}$ of Lipschitz constant less than one that agree at the endpoints.
• (ii) Conjecture 2 holds when ${\gamma_1, \gamma_2}$ are graphs of Lipschitz functions ${f: {\bf R}/{\bf Z} \rightarrow {\bf R}, g: {\bf R}/{\bf Z} \rightarrow {\bf R}}$ of Lipschitz constant less than one.

We sketch the proof of Theorem 3(i) as follows (the proof of Theorem 3(ii) is very similar). Let ${\gamma_1: [t_0, t_1] \rightarrow {\bf R}}$ be the curve ${\gamma_1(t) := (t,f(t))}$, thus ${\gamma_1}$ traverses one of the two graphs that comprise ${\gamma}$. For each time ${t \in [t_0,t_1]}$, there is a unique square with first vertex ${\gamma_1(t)}$ (and the other three vertices, traversed in anticlockwise order, denoted ${\gamma_2(t), \gamma_3(t), \gamma_4(t)}$) such that ${\gamma_2(t)}$ also lies in the graph of ${f}$ and ${\gamma_4(t)}$ also lies in the graph of ${g}$ (actually for technical reasons we have to extend ${f,g}$ by constants to all of ${{\bf R}}$ in order for this claim to be true). To see this, we simply rotate the graph of ${g}$ clockwise by ${\frac{\pi}{2}}$ around ${\gamma_1(t)}$, where (by the Lipschitz hypotheses) it must hit the graph of ${f}$ in a unique point, which is ${\gamma_2(t)}$, and which then determines the other two vertices ${\gamma_3(t), \gamma_4(t)}$ of the square. The curve ${\gamma_3(t)}$ has the same starting and ending point as the graph of ${f}$ or ${g}$; using the Lipschitz hypothesis one can show this graph is simple. If the curve ever hits the graph of ${g}$ other than at the endpoints, we have created an inscribed square, so we may assume for contradiction that ${\gamma_3(t)}$ avoids the graph of ${g}$, and hence by the Jordan curve theorem the two curves enclose some non-empty bounded open region ${\Omega}$.

Now for the conserved integral of motion. If we integrate the ${1}$-form ${y\ dx}$ on each of the four curves ${\gamma_1, \gamma_2, \gamma_3, \gamma_4}$, we obtain the identity

$\displaystyle \int_{\gamma_1} y\ dx - \int_{\gamma_2} y\ dx + \int_{\gamma_3} y\ dx - \int_{\gamma_4} y\ dx = 0.$

This identity can be established by the following calculation: one can parameterise

$\displaystyle \gamma_1(t) = (x(t), y(t))$

$\displaystyle \gamma_2(t) = (x(t)+a(t), y(t)+b(t))$

$\displaystyle \gamma_3(t) = (x(t)+a(t)-b(t), y(t)+a(t)+b(t))$

$\displaystyle \gamma_4(t) = (x(t)-b(t), y(t)+a(t))$

for some Lipschitz functions ${x,y,a,b: [t_0,t_1] \rightarrow {\bf R}}$; thus for instance ${\int_{\gamma_1} y\ dx = \int_{t_0}^{t_1} y(t)\ dx(t)}$. Inserting these parameterisations and doing some canceling, one can write the above integral as

$\displaystyle \int_{t_0}^{t_1} d \frac{a(t)^2-b(t)^2}{2}$

which vanishes because ${a(t), b(t)}$ (which represent the sidelengths of the squares determined by ${\gamma_1(t), \gamma_2(t), \gamma_3(t), \gamma_4(t)}$ vanish at the endpoints ${t=t_0,t_1}$.

Using this conserved integral of motion, one can show that

$\displaystyle \int_{\gamma_3} y\ dx = \int_{t_0}^{t_1} g(t)\ dt$

which by Stokes’ theorem then implies that the bounded open region ${\Omega}$ mentioned previously has zero area, which is absurd.

This argument hinged on the curve ${\gamma_3}$ being simple, so that the Jordan curve theorem could apply. Once one left the perturbative regime of curves of small Lipschitz constant, it became possible for ${\gamma_3}$ to be self-crossing, but nevertheless there still seemed to be some sort of integral obstruction. I eventually isolated the problem in the form of a strengthened version of Conjecture 2:

Conjecture 4 (Area formulation of square peg problem) Let ${\gamma_1, \gamma_2, \gamma_3, \gamma_4: {\bf R}/{\bf Z} \rightarrow ({\bf R}/{\bf Z}) \times {\bf R}}$ be simple closed piecewise linear curves of winding number ${1}$ obeying the area identity

$\displaystyle \int_{\gamma_1} y\ dx - \int_{\gamma_2} y\ dx + \int_{\gamma_3} y\ dx - \int_{\gamma_4} y\ dx = 0$

(note the ${1}$-form ${y\ dx}$ is still well defined on the cylinder ${({\bf R}/{\bf Z}) \times {\bf R}}$; note also that the curves ${\gamma_1,\gamma_2,\gamma_3,\gamma_4}$ are allowed to cross each other.) Then there exists a (possibly degenerate) square with vertices (traversed in anticlockwise order) lying on ${\gamma_1, \gamma_2, \gamma_3, \gamma_4}$ respectively.

It is not difficult to see that Conjecture 4 implies Conjecture 2. Actually I believe that the converse implication is at least morally true, in that any counterexample to Conjecture 4 can be eventually transformed to a counterexample to Conjecture 2 and Conjecture 1. The conserved integral of motion argument can establish Conjecture 4 in many cases, for instance if ${\gamma_2,\gamma_4}$ are graphs of functions of Lipschitz constant less than one.

Conjecture 4 has a model special case, when one of the ${\gamma_i}$ is assumed to just be a horizontal loop. In this case, the problem collapses to that of producing an intersection between two three-dimensional subsets of a six-dimensional space, rather than to four-dimensional subsets of an eight-dimensional space. More precisely, some elementary transformations reveal that this special case of Conjecture 4 can be formulated in the following fashion in which the geometric notion of a square is replaced by the additive notion of a triple of real numbers summing to zero:

Conjecture 5 (Special case of area formulation) Let ${\gamma_1, \gamma_2, \gamma_3: {\bf R}/{\bf Z} \rightarrow ({\bf R}/{\bf Z}) \times {\bf R}}$ be simple closed piecewise linear curves of winding number ${1}$ obeying the area identity

$\displaystyle \int_{\gamma_1} y\ dx + \int_{\gamma_2} y\ dx + \int_{\gamma_3} y\ dx = 0.$

Then there exist ${x \in {\bf R}/{\bf Z}}$ and ${y_1,y_2,y_3 \in {\bf R}}$ with ${y_1+y_2+y_3=0}$ such that ${(x,y_i) \in \gamma_i}$ for ${i=1,2,3}$.

This conjecture is easy to establish if one of the curves, say ${\gamma_3}$, is the graph ${\{ (t,f(t)): t \in {\bf R}/{\bf Z}\}}$ of some piecewise linear function ${f: {\bf R}/{\bf Z} \rightarrow {\bf R}}$, since in that case the curve ${\gamma_1}$ and the curve ${\tilde \gamma_2 := \{ (x, -y-f(x)): (x,y) \in \gamma_2 \}}$ enclose the same area in the sense that ${\int_{\gamma_1} y\ dx = \int_{\tilde \gamma_2} y\ dx}$, and hence must intersect by the Jordan curve theorem (otherwise they would enclose a non-zero amount of area between them), giving the claim. But when none of the ${\gamma_1,\gamma_2,\gamma_3}$ are graphs, the situation becomes combinatorially more complicated.

Using some elementary homological arguments (e.g. breaking up closed ${1}$-cycles into closed paths) and working with a generic horizontal slice of the curves, I was able to show that Conjecture 5 was equivalent to a one-dimensional problem that was largely combinatorial in nature, revolving around the sign patterns of various triple sums ${y_{1,a} + y_{2,b} + y_{3,c}}$ with ${y_{1,a}, y_{2,b}, y_{3,c}}$ drawn from various finite sets of reals.

Conjecture 6 (Combinatorial form) Let ${k_1,k_2,k_3}$ be odd natural numbers, and for each ${i=1,2,3}$, let ${y_{i,1},\dots,y_{i,k_i}}$ be distinct real numbers; we adopt the convention that ${y_{i,0}=y_{i,k_i+1}=-\infty}$. Assume the following axioms:

• (i) For any ${1 \leq p \leq k_1, 1 \leq q \leq k_2, 1 \leq r \leq k_3}$, the sums ${y_{1,p} + y_{2,q} + y_{3,r}}$ are non-zero.
• (ii) (Non-crossing) For any ${i=1,2,3}$ and ${0 \leq p < q \leq k_i}$ with the same parity, the pairs ${\{ y_{i,p}, y_{i,p+1}\}}$ and ${\{y_{i,q}, y_{i,q+1}\}}$ are non-crossing in the sense that

$\displaystyle \sum_{a \in \{p,p+1\}} \sum_{b \in \{q,q+1\}} (-1)^{a+b} \mathrm{sgn}( y_{i,a} - y_{i,b} ) = 0.$

• (iii) (Non-crossing sums) For any ${0 \leq p \leq k_1}$, ${0 \leq q \leq k_2}$, ${0 \leq r \leq k_3}$ of the same parity, one has

$\displaystyle \sum_{a \in \{p,p+1\}} \sum_{b \in \{q,q+1\}} \sum_{c \in \{r,r+1\}} (-1)^{a+b+c} \mathrm{sgn}( y_{1,a} + y_{2,b} + y_{3,c} ) = 0.$

Then one has

$\displaystyle \sum_{i=1}^3 \sum_{p=1}^{k_i} (-1)^{p-1} y_{i,p} < 0.$

Roughly speaking, Conjecture 6 and Conjecture 5 are connected by constructing curves ${\gamma_i}$ to connect ${(0, y_{i,p})}$ to ${(0,y_{i,p+1})}$ for ${0 \leq p \leq k+1}$ by various paths, which either lie to the right of the ${y}$ axis (when ${p}$ is odd) or to the left of the ${y}$ axis (when ${p}$ is even). The axiom (ii) is asserting that the numbers ${-\infty, y_{i,1},\dots,y_{i,k_i}}$ are ordered according to the permutation of a meander (formed by gluing together two non-crossing perfect matchings).

Using various ad hoc arguments involving “winding numbers”, it is possible to prove this conjecture in many cases (e.g. if one of the ${k_i}$ is at most ${3}$), to the extent that I have now become confident that this conjecture is true (and have now come full circle from trying to disprove Conjecture 1 to now believing that this conjecture holds also). But it seems that there is some non-trivial combinatorial argument to be made if one is to prove this conjecture; purely homological arguments seem to partially resolve the problem, but are not sufficient by themselves.

While I was not able to resolve the square peg problem, I think these results do provide a roadmap to attacking it, first by focusing on the combinatorial conjecture in Conjecture 6 (or its equivalent form in Conjecture 5), then after that is resolved moving on to Conjecture 4, and then finally to Conjecture 1.

Previous set of notes: Notes 1. Next set of notes: Notes 3.

Having discussed differentiation of complex mappings in the preceding notes, we now turn to the integration of complex maps. We first briefly review the situation of integration of (suitably regular) real functions ${f: [a,b] \rightarrow {\bf R}}$ of one variable. Actually there are three closely related concepts of integration that arise in this setting:

• (i) The signed definite integral ${\int_a^b f(x)\ dx}$, which is usually interpreted as the Riemann integral (or equivalently, the Darboux integral), which can be defined as the limit (if it exists) of the Riemann sums

$\displaystyle \sum_{j=1}^n f(x_j^*) (x_j - x_{j-1}) \ \ \ \ \ (1)$

where ${a = x_0 < x_1 < \dots < x_n = b}$ is some partition of ${[a,b]}$, ${x_j^*}$ is an element of the interval ${[x_{j-1},x_j]}$, and the limit is taken as the maximum mesh size ${\max_{1 \leq j \leq n} |x_j - x_{j-1}|}$ goes to zero (this can be formalised using the concept of a net). It is convenient to adopt the convention that ${\int_b^a f(x)\ dx := - \int_a^b f(x)\ dx}$ for ${a < b}$; alternatively one can interpret ${\int_b^a f(x)\ dx}$ as the limit of the Riemann sums (1), where now the (reversed) partition ${b = x_0 > x_1 > \dots > x_n = a}$ goes leftwards from ${b}$ to ${a}$, rather than rightwards from ${a}$ to ${b}$.

• (ii) The unsigned definite integral ${\int_{[a,b]} f(x)\ dx}$, usually interpreted as the Lebesgue integral. The precise definition of this integral is a little complicated (see e.g. this previous post), but roughly speaking the idea is to approximate ${f}$ by simple functions ${\sum_{i=1}^n c_i 1_{E_i}}$ for some coefficients ${c_i \in {\bf R}}$ and sets ${E_i \subset [a,b]}$, and then approximate the integral ${\int_{[a,b]} f(x)\ dx}$ by the quantities ${\sum_{i=1}^n c_i m(E_i)}$, where ${m(E_i)}$ is the Lebesgue measure of ${E_i}$. In contrast to the signed definite integral, no orientation is imposed or used on the underlying domain of integration, which is viewed as an “undirected” set ${[a,b]}$.
• (iii) The indefinite integral or antiderivative ${\int f(x)\ dx}$, defined as any function ${F: [a,b] \rightarrow {\bf R}}$ whose derivative ${F'}$ exists and is equal to ${f}$ on ${[a,b]}$. Famously, the antiderivative is only defined up to the addition of an arbitrary constant ${C}$, thus for instance ${\int x\ dx = \frac{1}{2} x^2 + C}$.

There are some other variants of the above integrals (e.g. the Henstock-Kurzweil integral, discussed for instance in this previous post), which can handle slightly different classes of functions and have slightly different properties than the standard integrals listed here, but we will not need to discuss such alternative integrals in this course (with the exception of some improper and principal value integrals, which we will encounter in later notes).

The above three notions of integration are closely related to each other. For instance, if ${f: [a,b] \rightarrow {\bf R}}$ is a Riemann integrable function, then the signed definite integral and unsigned definite integral coincide (when the former is oriented correctly), thus

$\displaystyle \int_a^b f(x)\ dx = \int_{[a,b]} f(x)\ dx$

and

$\displaystyle \int_b^a f(x)\ dx = -\int_{[a,b]} f(x)\ dx$

If ${f: [a,b] \rightarrow {\bf R}}$ is continuous, then by the fundamental theorem of calculus, it possesses an antiderivative ${F = \int f(x)\ dx}$, which is well defined up to an additive constant ${C}$, and

$\displaystyle \int_c^d f(x)\ dx = F(d) - F(c)$

for any ${c,d \in [a,b]}$, thus for instance ${\int_a^b f(x)\ dx = F(b) - F(a)}$ and ${\int_b^a f(x)\ dx = F(a) - F(b)}$.
All three of the above integration concepts have analogues in complex analysis. By far the most important notion will be the complex analogue of the signed definite integral, namely the contour integral ${\int_\gamma f(z)\ dz}$, in which the directed line segment from one real number ${a}$ to another ${b}$ is now replaced by a type of curve in the complex plane known as a contour. The contour integral can be viewed as the special case of the more general line integral ${\int_\gamma f(z) dx + g(z) dy}$, that is of particular relevance in complex analysis. There are also analogues of the Lebesgue integral, namely the arclength measure integrals ${\int_\gamma f(z)\ |dz|}$ and the area integrals ${\int_\Omega f(x+iy)\ dx dy}$, but these play only an auxiliary role in the subject. Finally, we still have the notion of an antiderivative ${F(z)}$ (also known as a primitive) of a complex function ${f(z)}$.

As it turns out, the fundamental theorem of calculus continues to hold in the complex plane: under suitable regularity assumptions on a complex function ${f}$ and a primitive ${F}$ of that function, one has

$\displaystyle \int_\gamma f(z)\ dz = F(z_1) - F(z_0)$

whenever ${\gamma}$ is a contour from ${z_0}$ to ${z_1}$ that lies in the domain of ${f}$. In particular, functions ${f}$ that possess a primitive must be conservative in the sense that ${\int_\gamma f(z)\ dz = 0}$ for any closed contour. This property of being conservative is not typical, in that “most” functions ${f}$ will not be conservative. However, there is a remarkable and far-reaching theorem, the Cauchy integral theorem (also known as the Cauchy-Goursat theorem), which asserts that any holomorphic function is conservative, so long as the domain is simply connected (or if one restricts attention to contractible closed contours). We will explore this theorem and several of its consequences in the next set of notes.
Read the rest of this entry »

Bill Thurston, who made fundamental contributions to our understanding of low-dimensional manifolds and related structures, died on Tuesday, aged 65.

Perhaps Thurston’s best known achievement is the proof of the hyperbolisation theorem for Haken manifolds, which showed that 3-manifolds which obeyed a certain number of topological conditions, could always be given a hyperbolic geometry (i.e. a Riemannian metric that made the manifold isometric to a quotient of the hyperbolic 3-space $H^3$).  This difficult theorem connecting the topological and geometric structure of 3-manifolds led Thurston to give his influential geometrisation conjecture, which (in principle, at least) completely classifies the topology of an arbitrary compact 3-manifold as a combination of eight model geometries (now known as Thurston model geometries).  This conjecture has many consequences, including Thurston’s hyperbolisation theorem and (most famously) the Poincaré conjecture.  Indeed, by placing that conjecture in the context of a conceptually appealing general framework, of which many other cases could already be verified, Thurston provided one of the strongest pieces of evidence towards the truth of the Poincaré conjecture, until the work of Grisha Perelman in 2002-2003 proved both the Poincaré conjecture and the geometrisation conjecture by developing Hamilton’s Ricci flow methods.  (There are now several variants of Perelman’s proof of both conjectures; in the proof of geometrisation by Bessieres, Besson, Boileau, Maillot, and Porti, Thurston’s hyperbolisation theorem is a crucial ingredient, allowing one to bypass the need for the theory of Alexandrov spaces in a key step in Perelman’s argument.)

One of my favourite results of Thurston’s is his elegant method for everting the sphere (smoothly turning a sphere $S^2$ in ${\bf R}^3$ inside out without any folds or singularities).  The fact that sphere eversion can be achieved at all is highly unintuitive, and is often referred to as Smale’s paradox, as Stephen Smale was the first to give a proof that such an eversion exists.  However, prior to Thurston’s method, the known constructions for sphere eversion were quite complicated.  Thurston’s method, relying on corrugating and then twisting the sphere, is sufficiently conceptual and geometric that it can in fact be explained quite effectively in non-technical terms, as was done in the following excellent video entitled “Outside In“, and produced by the Geometry Center:

In addition to his direct mathematical research contributions, Thurston was also an amazing mathematical expositor, having the rare knack of being able to describe the process of mathematical thinking in addition to the results of that process and the intuition underlying it.  His wonderful essay “On proof and progress in mathematics“, which I highly recommend, is the quintessential instance of this; more recent examples include his many insightful questions and answers on MathOverflow.

I unfortunately never had the opportunity to meet Thurston in person (although we did correspond a few times online), but I know many mathematicians who have been profoundly influenced by him and his work.  His death is a great loss for mathematics.

A topological space ${X}$ is said to be metrisable if one can find a metric ${d: X \times X \rightarrow [0,+\infty)}$ on it whose open balls ${B(x,r) := \{ y \in X: d(x,y) < r \}}$ generate the topology.

There are some obvious necessary conditions on the space ${X}$ in order for it to be metrisable. For instance, it must be Hausdorff, since all metric spaces are Hausdorff. It must also be first countable, because every point ${x}$ in a metric space has a countable neighbourhood base of balls ${B(x,1/n)}$, ${n=1,2,\ldots}$.

In the converse direction, being Hausdorff and first countable is not always enough to guarantee metrisability, for a variety of reasons. For instance the long line is not metrisable despite being both Hausdorff and first countable, due to a failure of paracompactness, which prevents one from gluing together the local metric structures on this line into a global one. Even after adding in paracompactness, this is still not enough; the real line with the lower limit topology (also known as the Sorgenfrey line) is Hausdorff, first countable, and paracompact, but still not metrisable (because of a failure of second countability despite being separable).

However, there is one important setting in which the Hausdorff and first countability axioms do suffice to give metrisability, and that is the setting of topological groups:

Theorem 1 (Birkhoff-Kakutani theorem) Let ${G}$ be a topological group (i.e. a topological space that is also a group, such that the group operations ${\cdot: G \times G \rightarrow G}$ and ${()^{-1}: G \rightarrow G}$ are continuous). Then ${G}$ is metrisable if and only if it is both Hausdorff and first countable.

Remark 1 It is not hard to show that a topological group is Hausdorff if and only if the singleton set ${\{\hbox{id}\}}$ is closed. More generally, in an arbitrary topological group, it is a good exercise to show that the closure of ${\{\hbox{id}\}}$ is always a closed normal subgroup ${H}$ of ${G}$, whose quotient ${G/H}$ is then a Hausdorff topological group. Because of this, the study of topological groups can usually be reduced immediately to the study of Hausdorff topological groups. (Indeed, in many texts, topological groups are automatically understood to be an abbreviation for “Hausdorff topological group”.)

The standard proof of the Birkhoff-Kakutani theorem (which we have taken from this book of Montgomery and Zippin) relies on the following Urysohn-type lemma:

Lemma 2 (Urysohn-type lemma) Let ${G}$ be a Hausdorff first countable group. Then there exists a bounded continuous function ${f: G \rightarrow [0,1]}$ with the following properties:

• (Unique maximum) ${f(\hbox{id}) = 1}$, and ${f(x) < 1}$ for all ${x \neq \hbox{id}}$.
• (Neighbourhood base) The sets ${\{ x \in G: f(x) > 1-1/n \}}$ for ${n=1,2,\ldots}$ form a neighbourhood base at the identity.
• (Uniform continuity) For every ${\varepsilon > 0}$, there exists an open neighbourhood ${U}$ of the identity such that ${|f(gx)-f(x)| \leq \epsilon}$ for all ${g \in U}$ and ${x \in G}$.

Note that if ${G}$ had a left-invariant metric, then the function ${f(x) := \max( 1 - \hbox{dist}(x,\hbox{id}), 0)}$ would suffice for this lemma, which already gives some indication as to why this lemma is relevant to the Birkhoff-Kakutani theorem.

Let us assume Lemma 2 for now and finish the proof of the Birkhoff-Kakutani theorem. We only prove the difficult direction, namely that a Hausdorff first countable topological group ${G}$ is metrisable. We let ${f}$ be the function from Lemma 2, and define the function ${d_f := G \times G \rightarrow [0,+\infty)}$ by the formula

$\displaystyle d_f( g, h ) := \| \tau_g f - \tau_h f \|_{BC(G)} = \sup_{x \in G} |f(g^{-1} x) - f(h^{-1} x)| \ \ \ \ \ (1)$

where ${BC(G)}$ is the space of bounded continuous functions on ${G}$ (with the supremum norm) and ${\tau_g}$ is the left-translation operator ${\tau_g f(x) := f(g^{-1} x)}$.

Clearly ${d_f}$ obeys the the identity ${d_f(g,g) = 0}$ and symmetry ${d_f(g,h) = d_f(h,g)}$ axioms, and the triangle inequality ${d_f(g,k) \leq d_f(g,h) + d_f(h,k)}$ is also immediate. This already makes ${d_f}$ a pseudometric. In order for ${d_f}$ to be a genuine metric, what is needed is that ${f}$ have no non-trivial translation invariances, i.e. one has ${\tau_g f \neq f}$ for all ${g \neq \hbox{id}}$. But this follows since ${f}$ attains its maximum at exactly one point, namely the group identity ${\hbox{id}}$.

To put it another way: because ${f}$ has no non-trivial translation invariances, the left translation action ${\tau}$ gives an embedding ${g \mapsto \tau_g f}$, and ${G}$ then inherits a metric ${d_f}$ from the metric structure on ${BC(G)}$.

Now we have to check whether the metric ${d_f}$ actually generates the topology. This amounts to verifying two things. Firstly, that every ball ${B(x,r)}$ in this metric is open; and secondly, that every open neighbourhood of a point ${x \in G}$ contains a ball ${B(x,r)}$.

To verify the former claim, it suffices to show that the map ${g \mapsto \tau_g f}$ from ${G}$ to ${BC(G)}$ is continuous, follows from the uniform continuity hypothesis. The second claim follows easily from the neighbourhood base hypothesis, since if ${d_f(g,h) < 1/n}$ then ${f(g^{-1} h) > 1-1/n}$.

Remark 2 The above argument in fact shows that if a group ${G}$ is metrisable, then it admits a left-invariant metric. The idea of using a suitable continuous function ${f}$ to generate a useful metric structure on a topological group is a powerful one, for instance underlying the Gleason lemmas which are fundamental to the solution of Hilbert’s fifth problem. I hope to return to this topic in a future post.

Now we prove Lemma 2. By first countability, we can find a countable neighbourhood base

$\displaystyle V_1 \supset V_2 \supset \ldots \supset \{\hbox{id}\}$

of the identity. As ${G}$ is Hausdorff, we must have

$\displaystyle \bigcap_{n=1}^\infty V_n = \{\hbox{id}\}.$

Using the continuity of the group axioms, we can recursively find a sequence of nested open neighbourhoods of the identity

$\displaystyle U_1 \supset U_{1/2} \supset U_{1/4} \supset \ldots \supset \{\hbox{id}\} \ \ \ \ \ (2)$

such that each ${U_{1/2^n}}$ is symmetric (i.e. ${g \in U_{1/2^n}}$ if and only if ${g^{-1} \in U_{1/2^n}}$), is contained in ${V_n}$, and is such that ${U_{1/2^{n+1}} \cdot U_{1/2^{n+1}} \subset U_{1/2^n}}$ for each ${n \geq 0}$. In particular the ${U_{1/2^n}}$ are also a neighbourhood base of the identity with

$\displaystyle \bigcap_{n=1}^\infty U_{1/2^n} = \{\hbox{id}\}. \ \ \ \ \ (3)$

For every dyadic rational ${a/2^n}$ in ${(0,1)}$, we can now define the open sets ${U_{a/2^n}}$ by setting

$\displaystyle U_{a/2^n} := U_{1/2^{n_k}} \cdot \ldots \cdot U_{1/2^{n_1}}$

where ${a/2^n = 2^{-n_1} + \ldots + 2^{-n_k}}$ is the binary expansion of ${a/2^n}$ with ${1 \leq n_1 < \ldots < n_k}$. By repeated use of the hypothesis ${U_{1/2^{n+1}} \cdot U_{1/2^{n+1}} \subset U_{1/2^n}}$ we see that the ${U_{a/2^n}}$ are increasing in ${a/2^n}$; indeed, we have the inclusion

$\displaystyle U_{1/2^n} \cdot U_{a/2^n} \subset U_{(a+1)/2^n} \ \ \ \ \ (4)$

for all ${n \geq 1}$ and ${1 \leq a < 2^n}$.

We now set

$\displaystyle f(x) := \sup \{ 1 - \frac{a}{2^n}: n \geq 1; 1 \leq a < 2^n; x \in U_{a/2^n} \}$

with the understanding that ${f(x)=0}$ if the supremum is over the empty set. One easily verifies using (4) that ${f}$ is continuous, and furthermore obeys the uniform continuity property. The neighbourhood base property follows since the ${U_{1/2^n}}$ are a neighbourhood base of the identity, and the unique maximum property follows from (3). This proves Lemma 2.

Remark 3 A very similar argument to the one above also establishes that every topological group ${G}$ is completely regular.

Notice that the function ${f}$ constructed in the above argument was localised to the set ${V_1}$. As such, it is not difficult to localise the Birkhoff-Kakutani theorem to local groups. A local group is a topological space ${G}$ equipped with an identity ${\hbox{id}}$, a partially defined inversion operation ${()^{-1}: \Lambda \rightarrow G}$, and a partially defined product operation ${\cdot: \Omega \rightarrow G}$, where ${\Lambda}$, ${\Omega}$ are open subsets of ${G}$ and ${G \times G}$, obeying the following restricted versions of the group axioms:

1. (Continuity) ${\cdot}$ and ${()^{-1}}$ are continuous on their domains of definition.
2. (Identity) For any ${g \in G}$, ${\hbox{id} \cdot g}$ and ${g \cdot \hbox{id}}$ are well-defined and equal to ${g}$.
3. (Inverse) For any ${g \in \Lambda}$, ${g \cdot g^{-1}}$ and ${g^{-1} \cdot g}$ are well-defined and equal to ${\hbox{id}}$. ${\hbox{id}^{-1}}$ is well-defined and equal to ${\hbox{id}}$.
4. (Local associativity) If ${g, h, k \in G}$ are such that ${g \cdot h}$, ${(g \cdot h) \cdot k}$, ${h \cdot k}$, and ${g \cdot (h \cdot k)}$ are all well-defined, then ${(g \cdot h) \cdot k = g \cdot (h \cdot k)}$.

Informally, one can view a local group as a topological group in which the closure axiom has been almost completely dropped, but with all the other axioms retained. A basic way to generate a local group is to start with an ordinary topological group ${G}$ and restrict it to an open neighbourhood ${U}$ of the identity, with ${\Lambda := \{ g \in U: g^{-1} \in U \}}$ and ${\Omega := \{ (g,h) \in U \times U: gh \in U \}}$. However, this is not quite the only way to generate local groups (ultimately because the local associativity axiom does not necessarily imply a (stronger) global associativity axiom in which one considers two different ways to multiply more than three group elements together).

Remark 4 Another important example of a local group is that of a group chunk, in which the sets ${\Lambda}$ and ${\Omega}$ are somehow “generic”; for instance, ${G}$ could be an algebraic variety, ${\Lambda, \Omega}$ Zariski-open, and the group operations birational on their domains of definition. This is somewhat analogous to the notion of a “${99\%}$ group” in additive combinatorics. There are a number of group chunk theorems, starting with a theorem of Weil in the algebraic setting, which roughly speaking assert that a generic portion of a group chunk can be identified with the generic portion of a genuine group.

We then have

Theorem 3 (Birkhoff-Kakutani theorem for local groups) Let ${G}$ be a local group which is Hausdorff and first countable. Then there exists an open neighbourhood ${V_0}$ of the identity which is metrisable.

Proof: (Sketch) It is not difficult to see that in a local group ${G}$, one can find a symmetric neighbourhood ${V_0}$ of the identity such that the product of any ${100}$ (say) elements of ${V_0}$ (multiplied together in any order) are well-defined, which effectively allows us to treat elements of ${V_0}$ as if they belonged to a group for the purposes of simple algebraic manipulation, such as applying the cancellation laws ${gh=gk \implies h=k}$ for ${g,h,k \in V_0}$. Inside this ${V_0}$, one can then repeat the previous arguments and eventually end up with a continuous function ${f \in BC(G)}$ supported in ${V_0}$ obeying the conclusions of Lemma 2 (but in the uniform continuity conclusion, one has to restrict ${x}$ to, say, ${V_0^{10}}$, to avoid issues of ill-definedness). The definition (1) then gives a metric on ${V_0}$ with the required properties, where we make the convention that ${\tau_g f(x)}$ vanishes for ${x \not \in V_0^{10}}$ (say) and ${g \in V_0}$. $\Box$

My motivation for studying local groups is that it turns out that there is a correspondence (first observed by Hrushovski) between the concept of an approximate group in additive combinatorics, and a locally compact local group in topological group theory; I hope to discuss this correspondence further in a subsequent post.

The ham sandwich theorem asserts that, given ${d}$ bounded open sets ${U_1,\ldots,U_d}$ in ${{\bf R}^d}$, there exists a hyperplane ${\{ x \in {\bf R}^d: x \cdot v = c \}}$ that bisects each of these sets ${U_i}$, in the sense that each of the two half-spaces ${\{ x \in {\bf R}^d: x \cdot v < c \}, \{ x \in {\bf R}^d: x \cdot v > c \}}$ on either side of the hyperplane captures exactly half of the volume of ${U_i}$. The shortest proof of this result proceeds by invoking the Borsuk-Ulam theorem.

A useful generalisation of the ham sandwich theorem is the polynomial ham sandwich theorem, which asserts that given ${m}$ bounded open sets ${U_1,\ldots,U_m}$ in ${{\bf R}^d}$, there exists a hypersurface ${\{ x \in {\bf R}^d: Q(x)=0\}}$ of degree ${O_d( m^{1/d} )}$ (thus ${P: {\bf R}^d \rightarrow {\bf R}}$ is a polynomial of degree ${O(m^{1/n})}$ such that the two semi-algebraic sets ${\{ Q > 0 \}}$ and ${\{ Q < 0\}}$ capture half the volume of each of the ${U_i}$. (More precisely, the degree will be at most ${D}$, where ${D}$ is the first positive integer for which ${\binom{D+d}{d}}$ exceeds ${m}$.) This theorem can be deduced from the Borsuk-Ulam theorem in the same manner that the ordinary ham sandwich theorem is (and can also be deduced directly from the ordinary ham sandwich theorem via the Veronese embedding).

The polynomial ham sandwich theorem is a theorem about continuous bodies (bounded open sets), but a simple limiting argument leads one to the following discrete analogue: given ${m}$ finite sets ${S_1,\ldots,S_m}$ in ${{\bf R}^d}$, there exists a hypersurface ${\{ x \in {\bf R}^d: Q(x)=0\}}$ of degree ${O_d( m^{1/d} )}$, such that each of the two semi-algebraic sets ${\{ Q > 0 \}}$ and ${\{ Q < 0\}}$ contain at most half of the points on ${S_i}$ (note that some of the points of ${S_i}$ can certainly lie on the boundary ${\{Q=0\}}$). This can be iterated to give a useful cell decomposition:

Proposition 1 (Cell decomposition) Let ${P}$ be a finite set of points in ${{\bf R}^d}$, and let ${D}$ be a positive integer. Then there exists a polynomial ${Q}$ of degree at most ${D}$, and a decomposition

$\displaystyle {\bf R}^d = \{ Q = 0\} \cup C_1 \cup \ldots \cup C_m$

into the hypersurface ${\{Q=0\}}$ and a collection ${C_1,\ldots,C_m}$ of cells bounded by ${\{P=0\}}$, such that ${m = O_d(D^d)}$, and such that each cell ${C_i}$ contains at most ${O_d( |P|/D^d )}$ points.

A proof is sketched in this previous blog post. The cells in the argument are not necessarily connected (being instead formed by intersecting together a number of semi-algebraic sets such as ${\{ Q > 0\}}$ and ${\{Q<0\}}$), but it is a classical result (established independently by Oleinik-Petrovskii, Milnor, and Thom) that any degree ${D}$ hypersurface ${\{Q=0\}}$ divides ${{\bf R}^d}$ into ${O_d(D^d)}$ connected components, so one can easily assume that the cells are connected if desired. (Incidentally, one does not need the full machinery of the results in the above cited papers – which control not just the number of components, but all the Betti numbers of the complement of ${\{Q=0\}}$ – to get the bound on connected components; one can instead observe that every bounded connected component has a critical point where ${\nabla Q = 0}$, and one can control the number of these points by Bezout’s theorem, after perturbing ${Q}$ slightly to enforce genericity, and then count the unbounded components by an induction on dimension.)

Remark 1 By setting ${D}$ as large as ${O_d(|P|^{1/m})}$, we obtain as a limiting case of the cell decomposition the fact that any finite set ${P}$ of points in ${{\bf R}^d}$ can be captured by a hypersurface of degree ${O_d(|P|^{1/m})}$. This fact is in fact true over arbitrary fields (not just over ${{\bf R}}$), and can be proven by a simple linear algebra argument (see e.g. this previous blog post). However, the cell decomposition is more flexible than this algebraic fact due to the ability to arbitrarily select the degree parameter ${D}$.

The cell decomposition can be viewed as a structural theorem for arbitrary large configurations of points in space, much as the Szemerédi regularity lemma can be viewed as a structural theorem for arbitrary large dense graphs. Indeed, just as many problems in the theory of large dense graphs can be profitably attacked by first applying the regularity lemma and then inspecting the outcome, it now seems that many problems in combinatorial incidence geometry can be attacked by applying the cell decomposition (or a similar such decomposition), with a parameter ${D}$ to be optimised later, to a relevant set of points, and seeing how the cells interact with each other and with the other objects in the configuration (lines, planes, circles, etc.). This strategy was spectacularly illustrated recently with Guth and Katz‘s use of the cell decomposition to resolve the Erdös distinct distance problem (up to logarithmic factors), as discussed in this blog post.

In this post, I wanted to record a simpler (but still illustrative) version of this method (that I learned from Nets Katz), namely to provide yet another proof of the Szemerédi-Trotter theorem in incidence geometry:

Theorem 2 (Szemerédi-Trotter theorem) Given a finite set of points ${P}$ and a finite set of lines ${L}$ in ${{\bf R}^2}$, the set of incidences ${I(P,L):= \{ (p,\ell) \in P \times L: p \in \ell \}}$ has cardinality

$\displaystyle |I(P,L)| \ll |P|^{2/3} |L|^{2/3} + |P| + |L|.$

This theorem has many short existing proofs, including one via crossing number inequalities (as discussed in this previous post) or via a slightly different type of cell decomposition (as discussed here). The proof given below is not that different, in particular, from the latter proof, but I believe it still serves as a good introduction to the polynomial method in combinatorial incidence geometry.

Emmanuel Breuillard, Ben Green, and I have just uploaded to the arXiv our paper “Approximate subgroups of linear groups“, submitted to GAFA. This paper contains (the first part) of the results announced previously by us; the second part of these results, concerning expander groups, will appear subsequently. The release of this paper has been coordinated with the release of a parallel paper by Pyber and Szabo (previously announced within an hour(!) of our own announcement).

Our main result describes (with polynomial accuracy) the “sufficiently Zariski dense” approximate subgroups of simple algebraic groups ${{\bf G}(k)}$, or more precisely absolutely almost simple algebraic groups over ${k}$, such as ${SL_d(k)}$. More precisely, define a ${K}$-approximate subgroup of a genuine group ${G}$ to be a finite symmetric neighbourhood of the identity ${A}$ (thus ${1 \in A}$ and ${A^{-1}=A}$) such that the product set ${A \cdot A}$ can be covered by ${K}$ left-translates (and equivalently, ${K}$ right-translates) of ${A}$.

Let ${k}$ be a field, and let ${\overline{k}}$ be its algebraic closure. For us, an absolutely almost simple algebraic group over ${k}$ is a linear algebraic group ${{\bf G}(k)}$ defined over ${k}$ (i.e. an algebraic subvariety of ${GL_n(k)}$ for some ${n}$ with group operations given by regular maps) which is connected (i.e. irreducible), and such that the completion ${{\bf G}(\overline{k})}$ has no proper normal subgroups of positive dimension (i.e. the only normal subgroups are either finite, or are all of ${{\bf G}(\overline{k})}$. To avoid degeneracies we also require ${{\bf G}}$ to be non-abelian (i.e. not one-dimensional). These groups can be classified in terms of their associated finite-dimensional simple complex Lie algebra, which of course is determined by its Dynkin diagram, together with a choice of weight lattice (and there are only finitely many such choices once the Lie algebra is fixed). However, the exact classification of these groups is not directly used in our work.

Our first main theorem classifies the approximate subgroups ${A}$ of such a group ${{\bf G}(k)}$ in the model case when ${A}$ generates the entire group ${{\bf G}(k)}$, and ${k}$ is finite; they are either very small or very large.

Theorem 1 (Approximate groups that generate) Let ${{\bf G}(k)}$ be an absolutely almost simple algebraic group over ${k}$. If ${k}$ is finite and ${A}$ is a ${K}$-approximate subgroup of ${{\bf G}(k)}$ that generates ${{\bf G}(k)}$, then either ${|A| \leq K^{O(1)}}$ or ${|A| \geq K^{-O(1)} |{\bf G}(k)|}$, where the implied constants depend only on ${{\bf G}}$.

The hypothesis that ${A}$ generates ${{\bf G}(k)}$ cannot be removed completely, since if ${A}$ was a proper subgroup of ${{\bf G}(k)}$ of size intermediate between that of the trivial group and of ${{\bf G}(k)}$, then the conclusion would fail (with ${K=O(1)}$). However, one can relax the hypothesis of generation to that of being sufficiently Zariski-dense in ${{\bf G}(k)}$. More precisely, we have

Theorem 2 (Zariski-dense approximate groups) Let ${{\bf G}(k)}$ be an absolutely almost simple algebraic group over ${k}$. If ${A}$ is a ${K}$-approximate group) is not contained in any proper algebraic subgroup of ${k}$ of complexity at most ${M}$ (where ${M}$ is sufficiently large depending on ${{\bf G}}$), then either ${|A| \leq K^{O(1)}}$ or ${|A| \geq K^{-O(1)} |\langle A \rangle|}$, where the implied constants depend only on ${{\bf G}}$ and ${\langle A \rangle}$ is the group generated by ${A}$.

Here, we say that an algebraic variety has complexity at most ${M}$ if it can be cut out of an ambient affine or projective space of dimension at most ${M}$ by using at most ${M}$ polynomials, each of degree at most ${M}$. (Note that this is not an intrinsic notion of complexity, but will depend on how one embeds the algebraic variety into an ambient space; but we are assuming that our algebraic group ${{\bf G}(k)}$ is a linear group and thus comes with such an embedding.)

In the case when ${k = {\bf C}}$, the second option of this theorem cannot occur since ${{\bf G}({\bf C})}$ is infinite, leading to a satisfactory classification of the Zariski-dense approximate subgroups of almost simple connected algebraic groups over ${{\bf C}}$. On the other hand, every approximate subgroup of ${GL_n({\bf C})}$ is Zariski-dense in some algebraic subgroup, which can be then split as an extension of a semisimple algebraic quotient group by a solvable algebraic group (the radical of the Zariski closure). Pursuing this idea (and glossing over some annoying technical issues relating to connectedness), together with the Freiman theory for solvable groups over ${{\bf C}}$ due to Breuillard and Green, we obtain our third theorem:

Theorem 3 (Freiman’s theorem in ${GL_n({\bf C})}$) Let ${A}$ be a ${K}$-approximate subgroup of ${GL_n({\bf C})}$. Then there exists a nilpotent ${K}$-approximate subgroup ${B}$ of size at most ${K^{O(1)}|A|}$, such that ${A}$ is covered by ${K^{O(1)}}$ translates of ${B}$.

This can be compared with Gromov’s celebrated theorem that any finitely generated group of polynomial growth is virtually nilpotent. Indeed, the above theorem easily implies Gromov’s theorem in the case of finitely generated subgroups of ${GL_n({\bf C})}$.

By fairly standard arguments, the above classification theorems for approximate groups can be used to give bounds on the expansion and diameter of Cayley graphs, for instance one can establish a conjecture of Babai and Seress that connected Cayley graphs on absolutely almost simple groups over a finite field have polylogarithmic diameter at most. Applications to expanders include the result on Suzuki groups mentioned in a previous post; further applications will appear in a forthcoming paper.

Apart from the general structural theory of algebraic groups, and some quantitative analogues of the basic theory of algebraic geometry (which we chose to obtain via ultrafilters, as discussed in this post), we rely on two basic tools. Firstly, we use a version of the pivot argument developed first by Konyagin and Bourgain-Glibichuk-Konyagin in the setting of sum-product estimates, and generalised to more non-commutative settings by Helfgott; this is discussed in this previous post. Secondly, we adapt an argument of Larsen and Pink (which we learned from a paper of Hrushovski) to obtain a sharp bound on the extent to which a sufficiently Zariski-dense approximate groups can concentrate in a (bounded complexity) subvariety; this is discussed at the end of this blog post.

This week I was in my home town of Adelaide, Australia, for the 2009 annual meeting of the Australian Mathematical Society. This was a fairly large meeting (almost 500 participants). One of the highlights of such a large meeting is the ability to listen to plenary lectures in fields adjacent to one’s own, in which speakers can give high-level overviews of a subject without getting too bogged down in the technical details. From the talks here I learned a number of basic things which were well known to experts in that field, but which I had not fully appreciated, and so I wanted to share them here.

The first instance of this was from a plenary lecture by Danny Calegari entitled “faces of the stable commutator length (scl) ball”. One thing I learned from this talk is that in homotopy theory, there is a very close relationship between topological spaces (such as manifolds) on one hand, and groups (and generalisations of groups) on the other, so that homotopy-theoretic questions about the former can often be converted to purely algebraic questions about the latter, and vice versa; indeed, it seems that homotopy theorists almost think of topological spaces and groups as being essentially the same concept, despite looking very different at first glance. To get from a space ${X}$ to a group, one looks at homotopy groups ${\pi_n(X)}$ of that space, and in particular the fundamental group ${\pi_1(X)}$; conversely, to get from a group ${G}$ back to a topological space one can use the Eilenberg-Maclane spaces ${K(G,n)}$ associated to that group (and more generally, a Postnikov tower associated to a sequence of such groups, together with additional data). In Danny’s talk, he gave the following specific example: the problem of finding the least complicated embedded surface with prescribed (and homologically trivial) boundary in a space ${X}$, where “least complicated” is measured by genus (or more precisely, the negative component of Euler characteristic), is essentially equivalent to computing the commutator length of the element in the fundamental group ${\pi(X)}$ corresponding to that boundary (i.e. the least number of commutators one is required to multiply together to express the element); and the stable version of this problem (where one allows the surface to wrap around the boundary ${n}$ times for some large ${n}$, and one computes the asymptotic ratio between the Euler characteristic and ${n}$) is similarly equivalent to computing the stable commutator length of that group element. (Incidentally, there is a simple combinatorial open problem regarding commutator length in the free group, which I have placed on the polymath wiki.)

This theme was reinforced by another plenary lecture by Ezra Getzler entitled “${n}$-groups”, in which he showed how sequences of groups (such as the first ${n}$ homotopy groups ${\pi_1(X),\ldots,\pi_n(X)}$) can be enhanced into a more powerful structure known as an ${n}$-group, which is more complicated to define, requiring the machinery of simplicial complexes, sheaves, and nerves. Nevertheless, this gives a very topological and geometric interpretation of the concept of a group and its generalisations, which are of use in topological quantum field theory, among other things.

Mohammed Abuzaid gave a plenary lecture entitled “Functoriality in homological mirror symmetry”. One thing I learned from this talk was that the (partially conjectural) phenomenon of (homological) mirror symmetry is one of several types of duality, in which the behaviour of maps into one mathematical object ${X}$ (e.g. immersed or embedded curves, surfaces, etc.) are closely tied to the behaviour of maps out of a dual mathematical object ${\hat X}$ (e.g. functionals, vector fields, forms, sections, bundles, etc.). A familiar example of this is in linear algebra: by taking adjoints, a linear map into a vector space ${X}$ can be related to an adjoint linear map mapping out of the dual space ${X^*}$. Here, the behaviour of curves in a two-dimensional symplectic manifold (or more generally, Lagrangian submanifolds in a higher-dimensional symplectic manifold), is tied to the behaviour of holomorphic sections on bundles over a dual algebraic variety, where the precise definition of “behaviour” is category-theoretic, involving some rather complicated gadgets such as the Fukaya category of a symplectic manifold. As with many other applications of category theory, it is not just the individual pairings between an object and its dual which are of interest, but also the relationships between these pairings, as formalised by various functors between categories (and natural transformations between functors). (One approach to mirror symmetry was discussed by Shing-Tung Yau at a distinguished lecture at UCLA, as transcribed in this previous post.)

There was a related theme in a talk by Dennis Gaitsgory entitled “The geometric Langlands program”. From my (very superficial) understanding of the Langlands program, the behaviour of specific maps into a reductive Lie group ${G}$, such as representations in ${G}$ of a fundamental group, étale fundamental group, class group, or Galois group of a global field, is conjecturally tied to specific maps out of a dual reductive Lie group ${\hat G}$, such as irreducible automorphic representations of ${\hat G}$, or of various structures (such as derived categories) attached to vector bundles on ${\hat G}$. There are apparently some tentatively conjectured links (due to Witten?) between Langlands duality and mirror symmetry, but they seem at present to be fairly distinct phenomena (one is topological and geometric, the other is more algebraic and arithmetic). For abelian groups, Langlands duality is closely connected to the much more classical Pontryagin duality in Fourier analysis. (There is an analogue of Fourier analysis for nonabelian groups, namely representation theory, but the link from this to the Langlands program is somewhat murky, at least to me.)

Related also to this was a plenary talk by Akshay Venkatesh, entitled “The Cohen-Lenstra heuristics over global fields”. Here, the question concerned the conjectural behaviour of class groups of quadratic fields, and in particular to explain the numerically observed phenomenon that about ${75.4\%}$ of all quadratic fields ${{\Bbb Q}[\sqrt{d}]}$ (with $d$ prime) enjoy unique factorisation (i.e. have trivial class group). (Class groups, as I learned in these two talks, are arithmetic analogues of the (abelianised) fundamental groups in topology, with Galois groups serving as the analogue of the full fundamental group.) One thing I learned here was that there was a canonical way to randomly generate a (profinite) abelian group, by taking the product of randomly generated finite abelian ${p}$-groups for each prime ${p}$. The way to canonically randomly generate a finite abelian ${p}$-group is to take large integers ${n, D}$, and look at the cokernel of a random homomorphism from ${({\mathbb Z}/p^n{\mathbb Z})^d}$ to ${({\mathbb Z}/p^n{\mathbb Z})^d}$. In the limit ${n,d \rightarrow \infty}$ (or by replacing ${{\mathbb Z}/p^n{\mathbb Z}}$ with the ${p}$-adics and just sending ${d \rightarrow \infty}$), this stabilises and generates any given ${p}$-group ${G}$ with probability

$\displaystyle \frac{1}{|\hbox{Aut}(G)|} \prod_{j=1}^\infty (1 - \frac{1}{p^j}), \ \ \ \ \ (1)$

where ${\hbox{Aut}(G)}$ is the group of automorphisms of ${G}$. In particular this leads to the strange identity

$\displaystyle \sum_G \frac{1}{|\hbox{Aut}(G)|} = \prod_{j=1}^\infty (1 - \frac{1}{p^j})^{-1} \ \ \ \ \ (2)$

where ${G}$ ranges over all ${p}$-groups; I do not know how to prove this identity other than via the above probability computation, the proof of which I give below the fold.

Based on the heuristic that the class group should behave “randomly” subject to some “obvious” constraints, it is expected that a randomly chosen real quadratic field ${{\Bbb Q}[\sqrt{d}]}$ has unique factorisation (i.e. the class group has trivial ${p}$-group component for every ${p}$) with probability

$\displaystyle \prod_{p \hbox{ odd}} \prod_{j=2}^\infty (1 - \frac{1}{p^j}) \approx 0.754,$

whereas a randomly chosen imaginary quadratic field ${{\Bbb Q}[\sqrt{-d}]}$ has unique factorisation with probability

$\displaystyle \prod_{p \hbox{ odd}} \prod_{j=1}^\infty (1 - \frac{1}{p^j}) = 0.$

The former claim is conjectural, whereas the latter claim follows from (for instance) Siegel’s theorem on the size of the class group, as discussed in this previous post. Ellenberg, Venkatesh, and Westerland have recently established some partial results towards the function field analogues of these heuristics.

One of the most important topological concepts in analysis is that of compactness (as discussed for instance in my Companion article on this topic).  There are various flavours of this concept, but let us focus on sequential compactness: a subset E of a topological space X is sequentially compact if every sequence in E has a convergent subsequence whose limit is also in E.  This property allows one to do many things with the set E.  For instance, it allows one to maximise a functional on E:

Proposition 1. (Existence of extremisers)  Let E be a non-empty sequentially compact subset of a topological space X, and let $F: E \to {\Bbb R}$ be a continuous function.  Then the supremum $\sup_{x \in E} f(x)$ is attained at at least one point $x_* \in E$, thus $F(x) \leq F(x_*)$ for all $x \in E$.  (In particular, this supremum is finite.)  Similarly for the infimum.

Proof. Let $-\infty < L \leq +\infty$ be the supremum $L := \sup_{x \in E} F(x)$.  By the definition of supremum (and the axiom of (countable) choice), one can find a sequence $x^{(n)}$ in E such that $F(x^{(n)}) \to L$.  By compactness, we can refine this sequence to a subsequence (which, by abuse of notation, we shall continue to call $x^{(n)}$) such that $x^{(n)}$ converges to a limit x in E.  Since we still have $f(x^{(n)}) \to L$, and f is continuous at x, we conclude that f(x)=L, and the claim for the supremum follows.  The claim for the infimum is similar.  $\Box$

Remark 1. An inspection of the argument shows that one can relax the continuity hypothesis on F somewhat: to attain the supremum, it suffices that F be upper semicontinuous, and to attain the infimum, it suffices that F be lower semicontinuous. $\diamond$

We thus see that sequential compactness is useful, among other things, for ensuring the existence of extremisers.  In finite-dimensional spaces (such as vector spaces), compact sets are plentiful; indeed, the Heine-Borel theorem asserts that every closed and bounded set is compact.  However, once one moves to infinite-dimensional spaces, such as function spaces, then the Heine-Borel theorem fails quite dramatically; most of the closed and bounded sets one encounters in a topological vector space are non-compact, if one insists on using a reasonably “strong” topology.  This causes a difficulty in (among other things) calculus of variations, which is often concerned to finding extremisers to a functional $F: E \to {\Bbb R}$ on a subset E of an infinite-dimensional function space X.

In recent decades, mathematicians have found a number of ways to get around this difficulty.  One of them is to weaken the topology to recover compactness, taking advantage of such results as the Banach-Alaoglu theorem (or its sequential counterpart).  Of course, there is a tradeoff: weakening the topology makes compactness easier to attain, but makes the continuity of F harder to establish.  Nevertheless, if F enjoys enough “smoothing” or “cancellation” properties, one can hope to obtain continuity in the weak topology, allowing one to do things such as locate extremisers.  (The phenomenon that cancellation can lead to continuity in the weak topology is sometimes referred to as compensated compactness.)

Another option is to abandon trying to make all sequences have convergent subsequences, and settle just for extremising sequences to have convergent subsequences, as this would still be enough to retain Theorem 1.  Pursuing this line of thought leads to the Palais-Smale condition, which is a substitute for compactness in some calculus of variations situations.

But in many situations, one cannot weaken the topology to the point where the domain E becomes compact, without destroying the continuity (or semi-continuity) of F, though one can often at least find an intermediate topology (or metric) in which F is continuous, but for which E is still not quite compact.  Thus one can find sequences $x^{(n)}$ in E which do not have any subsequences that converge to a constant element $x \in E$, even in this intermediate metric.  (As we shall see shortly, one major cause of this failure of compactness is the existence of a non-trivial action of a non-compact group G on E; such a group action can cause compensated compactness or the Palais-Smale condition to fail also.)  Because of this, it is a priori conceivable that a continuous function F need not attain its supremum or infimum.

Nevertheless, even though a sequence $x^{(n)}$ does not have any subsequences that converge to a constant x, it may have a subsequence (which we also call $x^{(n)}$) which converges to some non-constant sequence $y^{(n)}$ (in the sense that the distance $d(x^{(n)},y^{(n)})$ between the subsequence and the new sequence in a this intermediate metric), where the approximating sequence $y^{(n)}$ is of a very structured form (e.g. “concentrating” to a point, or “travelling” off to infinity, or a superposition $y^{(n)} = \sum_j y^{(n)}_j$ of several concentrating or travelling profiles of this form).  This weaker form of compactness, in which superpositions of a certain type of profile completely describe all the failures (or defects) of compactness, is known as concentration compactness, and the decomposition $x^{(n)} \approx \sum_j y^{(n)}_j$ of the subsequence is known as the profile decomposition.  In many applications, it is a sufficiently good substitute for compactness that one can still do things like locate extremisers for functionals F –  though one often has to make some additional assumptions of F to compensate for the more complicated nature of the compactness.  This phenomenon was systematically studied by P.L. Lions in the 80s, and found great application in calculus of variations and nonlinear elliptic PDE.  More recently, concentration compactness has been a crucial and powerful tool in the non-perturbative analysis of nonlinear dispersive PDE, in particular being used to locate “minimal energy blowup solutions” or “minimal mass blowup solutions” for such a PDE (analogously to how one can use calculus of variations to find minimal energy solutions to a nonlinear elliptic equation); see for instance this recent survey by Killip and Visan.

In typical applications, the concentration compactness phenomenon is exploited in moderately sophisticated function spaces (such as Sobolev spaces or Strichartz spaces), with the failure of traditional compactness being connected to a moderately complicated group G of symmetries (e.g. the group generated by translations and dilations).  Because of this, concentration compactness can appear to be a rather complicated and technical concept when it is first encountered.  In this note, I would like to illustrate concentration compactness in a simple toy setting, namely in the space $X = l^1({\Bbb Z})$ of absolutely summable sequences, with the uniform ($l^\infty$) metric playing the role of the intermediate metric, and the translation group ${\Bbb Z}$ playing the role of the symmetry group G.  This toy setting is significantly simpler than any model that one would actually use in practice [for instance, in most applications X is a Hilbert space], but hopefully it serves to illuminate this useful concept in a less technical fashion.