
Let ${(X,T,\mu)}$ be a measure-preserving system – a probability space ${(X,\mu)}$ equipped with a measure-preserving translation ${T}$ (which for simplicity of discussion we shall assume to be invertible). We will informally think of two points ${x,y}$ in this space as being “close” if ${y = T^n x}$ for some ${n}$ that is not too large; this allows one to distinguish between “local” structure at a point ${x}$ (in which one only looks at nearby points ${T^n x}$ for moderately large ${n}$) and “global” structure (in which one looks at the entire space ${X}$). The local/global distinction is also known as the time-averaged/space-averaged distinction in ergodic theory.

A measure-preserving system is said to be ergodic if all the invariant sets are either zero measure or full measure. An equivalent form of this statement is that any measurable function ${f: X \rightarrow {\bf R}}$ which is locally essentially constant in the sense that ${f(Tx) = f(x)}$ for ${\mu}$-almost every ${x}$, is necessarily globally essentially constant in the sense that there is a constant ${c}$ such that ${f(x) = c}$ for ${\mu}$-almost every ${x}$. A basic consequence of ergodicity is the mean ergodic theorem: if ${f \in L^2(X,\mu)}$, then the averages ${x \mapsto \frac{1}{N} \sum_{n=1}^N f(T^n x)}$ converge in ${L^2}$ norm to the mean ${\int_X f\ d\mu}$. (The mean ergodic theorem also applies to other ${L^p}$ spaces with ${1 < p < \infty}$, though it is usually proven first in the Hilbert space ${L^2}$.) Informally: in ergodic systems, time averages are asymptotically equal to space averages. Specialising to the case of indicator functions, this implies in particular that ${\frac{1}{N} \sum_{n=1}^N \mu( E \cap T^n E)}$ converges to ${\mu(E)^2}$ for any measurable set ${E}$.
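As a concrete illustration (a numerical sketch of my own, not part of the original discussion), one can take the irrational rotation ${Tx = x + \alpha \hbox{ mod } 1}$ on the unit circle with Lebesgue measure, a standard ergodic system, and compare time averages with the space average; the choices of ${\alpha}$, the test function, and the starting point below are arbitrary:

```python
import math

alpha = math.sqrt(2)  # irrational rotation angle (arbitrary choice)
f = lambda x: math.cos(2 * math.pi * x) ** 2  # test function; its space average is 1/2

x0 = 0.1  # arbitrary starting point
N = 200000
time_avg = sum(f((x0 + n * alpha) % 1.0) for n in range(1, N + 1)) / N
space_avg = 0.5  # = integral of cos(2 pi x)^2 over [0,1]

assert abs(time_avg - space_avg) < 1e-2
```

For smooth test functions the convergence here is quite fast; for general ${L^2}$ functions the mean ergodic theorem only guarantees convergence in norm.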

In this short note I would like to use the mean ergodic theorem to show that ergodic systems also have the property that “somewhat locally constant” functions are necessarily “somewhat globally constant”; this is not a deep observation, and probably already in the literature, but I found it a cute statement that I had not previously seen. More precisely:

Corollary 1 Let ${(X,T,\mu)}$ be an ergodic measure-preserving system, and let ${f: X \rightarrow {\bf R}}$ be measurable. Suppose that

$\displaystyle \limsup_{N \rightarrow \infty} \frac{1}{N} \sum_{n=1}^N \mu( \{ x \in X: f(T^n x) = f(x) \} ) \geq \delta \ \ \ \ \ (1)$

for some ${0 \leq \delta \leq 1}$. Then there exists a constant ${c}$ such that ${f(x)=c}$ for ${x}$ in a set of measure at least ${\delta}$.

Informally: if ${f}$ is locally constant on pairs ${x, T^n x}$ at least ${\delta}$ of the time, then ${f}$ is globally constant at least ${\delta}$ of the time. Of course the claim fails if the ergodicity hypothesis is dropped, as one can simply take ${f}$ to be an invariant function that is not essentially constant, such as the indicator function of an invariant set of intermediate measure. This corollary can be viewed as a manifestation of the general principle that ergodic systems have the same “global” (or “space-averaged”) behaviour as “local” (or “time-averaged”) behaviour, in contrast to non-ergodic systems in which local properties do not automatically transfer over to their global counterparts.

Proof: By composing ${f}$ with (say) the arctangent function, we may assume without loss of generality that ${f}$ is bounded. Let ${k>0}$, and partition ${X}$ as ${\bigcup_{m \in {\bf Z}} E_{m,k}}$, where ${E_{m,k}}$ is the level set

$\displaystyle E_{m,k} := \{ x \in X: m 2^{-k} \leq f(x) < (m+1) 2^{-k} \}.$

For each ${k}$, only finitely many of the ${E_{m,k}}$ are non-empty. By (1), one has

$\displaystyle \limsup_{N \rightarrow \infty} \sum_m \frac{1}{N} \sum_{n=1}^N \mu( E_{m,k} \cap T^n E_{m,k} ) \geq \delta.$

Using the ergodic theorem, we conclude that

$\displaystyle \sum_m \mu( E_{m,k} )^2 \geq \delta.$

On the other hand, ${\sum_m \mu(E_{m,k}) = 1}$, and hence ${\sum_m \mu( E_{m,k} )^2 \leq \max_m \mu(E_{m,k})}$. Thus there exists ${m_k}$ such that ${\mu(E_{m_k,k}) \geq \delta}$; that is,

$\displaystyle \mu( \{ x \in X: m_k 2^{-k} \leq f(x) < (m_k+1) 2^{-k} \} ) \geq \delta.$

Since ${f}$ is bounded, the quantities ${m_k 2^{-k}}$ lie in a compact interval; by the Bolzano-Weierstrass theorem, we may pass to a subsequence along which ${m_k 2^{-k}}$ converges to a limit ${c}$. For any ${\varepsilon > 0}$, the interval ${[m_k 2^{-k}, (m_k+1) 2^{-k})}$ is then contained in ${[c-\varepsilon, c+\varepsilon]}$ for all sufficiently large ${k}$ in this subsequence, and hence

$\displaystyle \mu( \{ x \in X: c-\varepsilon \leq f(x) \leq c+\varepsilon \}) \geq \delta$

for every ${\varepsilon > 0}$. Sending ${\varepsilon \rightarrow 0}$, we conclude that

$\displaystyle \mu( \{ x \in X: f(x) = c \}) \geq \delta.$

The claim follows. $\Box$

Let ${G = (G,+)}$, ${H = (H,+)}$ be additive groups (i.e., groups with an abelian addition group law). A map ${f: G \rightarrow H}$ is a homomorphism if one has

$\displaystyle f(x+y) - f(x) - f(y) = 0$

for all ${x,y \in G}$. A map ${f: G \rightarrow H}$ is an affine homomorphism if one has

$\displaystyle f(x_1) - f(x_2) + f(x_3) - f(x_4) = 0 \ \ \ \ \ (1)$

for all additive quadruples ${(x_1,x_2,x_3,x_4)}$ in ${G}$, by which we mean that ${x_1,x_2,x_3,x_4 \in G}$ and ${x_1-x_2+x_3-x_4=0}$. The two notions are closely related; it is easy to verify that ${f}$ is an affine homomorphism if and only if ${f}$ is the sum of a homomorphism and a constant.
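This equivalence can be checked by brute force on small finite groups. The following sketch (my own toy verification; the choice ${G = {\bf Z}/6{\bf Z}}$, ${H = {\bf Z}/4{\bf Z}}$ is purely for illustration) confirms that a map satisfies the additive-quadruple identity (1) exactly when subtracting ${f(0)}$ leaves a homomorphism:

```python
from itertools import product

G, H = 6, 4  # Z/6Z and Z/4Z (arbitrary small groups)

def is_affine(f):
    # identity (1) over all additive quadruples x1 - x2 + x3 - x4 = 0
    return all((f[x1] - f[x2] + f[x3] - f[(x1 - x2 + x3) % G]) % H == 0
               for x1, x2, x3 in product(range(G), repeat=3))

def is_hom_plus_constant(f):
    g = [(v - f[0]) % H for v in f]  # subtract the constant f(0)
    return all((g[(x + y) % G] - g[x] - g[y]) % H == 0
               for x, y in product(range(G), repeat=2))

checked = 0
for f in product(range(H), repeat=G):  # all 4^6 maps f: Z/6Z -> Z/4Z
    assert is_affine(f) == is_hom_plus_constant(f)
    checked += 1
```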

Now suppose that ${H}$ also has a translation-invariant metric ${d}$. A map ${f: G \rightarrow H}$ is said to be a quasimorphism if one has

$\displaystyle f(x+y) - f(x) - f(y) = O(1) \ \ \ \ \ (2)$

for all ${x,y \in G}$, where ${O(1)}$ denotes a quantity at a bounded distance from the origin. Similarly, ${f: G \rightarrow H}$ is an affine quasimorphism if

$\displaystyle f(x_1) - f(x_2) + f(x_3) - f(x_4) = O(1) \ \ \ \ \ (3)$

for all additive quadruples ${(x_1,x_2,x_3,x_4)}$ in ${G}$. Again, one can check that ${f}$ is an affine quasimorphism if and only if it is the sum of a quasimorphism and a constant (with the implied constant of the quasimorphism controlled by the implied constant of the affine quasimorphism). (Since every constant is itself a quasimorphism, it is in fact the case that affine quasimorphisms are quasimorphisms, but now the implied constant in the latter is not controlled by the implied constant of the former.)

“Trivial” examples of quasimorphisms include the sum of a homomorphism and a bounded function. Are there others? In some cases, the answer is no. For instance, suppose we have a quasimorphism ${f: {\bf Z} \rightarrow {\bf R}}$. Iterating (2), we see that ${f(kx) = kf(x) + O(k)}$ for any integer ${x}$ and natural number ${k}$, which we can rewrite as ${f(kx)/kx = f(x)/x + O(1/|x|)}$ for non-zero ${x}$. In particular, for non-zero ${x,y}$ one has ${f(xy)/xy = f(x)/x + O(1/|x|)}$ and ${f(xy)/xy = f(y)/y + O(1/|y|)}$, so ${f(x)/x}$ is a Cauchy sequence as ${x \rightarrow \infty}$ and thus tends to some limit ${\alpha}$; we have ${\alpha = f(x)/x + O(1/x)}$ for ${x \geq 1}$, hence ${f(x) = \alpha x + O(1)}$ for positive ${x}$, and then one can use (2) one last time to obtain ${f(x) = \alpha x + O(1)}$ for all ${x}$. Thus ${f}$ is the sum of the homomorphism ${x \mapsto \alpha x}$ and a bounded function.
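A hands-on way to see the slope ${\alpha}$ emerge (a numerical sketch of my own, not from the text): the map ${f(x) = \lfloor \alpha x \rfloor}$ is a quasimorphism on ${{\bf Z}}$ with defect at most ${1}$, and ${f(x)/x}$ recovers ${\alpha}$:

```python
import math

alpha = math.sqrt(2)  # arbitrary irrational slope
f = lambda x: math.floor(alpha * x)  # a quasimorphism on Z: its defect is always 0 or 1

# the quasimorphism defect f(x+y) - f(x) - f(y) is bounded
defects = {abs(f(x + y) - f(x) - f(y)) for x in range(-50, 50) for y in range(-50, 50)}
assert max(defects) <= 1

# f(x)/x recovers the slope alpha of the homomorphism part x -> alpha*x
assert abs(f(10**6) / 10**6 - alpha) < 1e-5
```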

In general, one can phrase this problem in the language of group cohomology (discussed in this previous post). Call a map ${f: G \rightarrow H}$ a ${0}$-cocycle. A ${1}$-cocycle is a map ${\rho: G \times G \rightarrow H}$ obeying the identity

$\displaystyle \rho(x,y+z) + \rho(y,z) = \rho(x,y) + \rho(x+y,z)$

for all ${x,y,z \in G}$. Given a ${0}$-cocycle ${f: G \rightarrow H}$, one can form its derivative ${\partial f: G \times G \rightarrow H}$ by the formula

$\displaystyle \partial f(x,y) := f(x+y)-f(x)-f(y).$

Such functions are called ${1}$-coboundaries. It is easy to see that the abelian group of ${1}$-coboundaries is a subgroup of the abelian group of ${1}$-cocycles. The quotient of these two groups is the first group cohomology of ${G}$ with coefficients in ${H}$, and is denoted ${H^1(G; H)}$.
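One can verify numerically that the derivative of any ${0}$-cocycle is indeed a ${1}$-cocycle. The following sketch (my own check; the group ${{\bf Z}/12{\bf Z}}$ and the quadratic map are arbitrary choices) tests the cocycle identity exhaustively:

```python
n = 12  # G = H = Z/12Z (arbitrary choice)
f = lambda x: (x * x + 3 * x + 7) % n  # an arbitrary 0-cocycle (not a homomorphism)

def df(x, y):
    # the derivative (1-coboundary) of f
    return (f((x + y) % n) - f(x) - f(y)) % n

# exhaustively verify the 1-cocycle identity rho(x,y+z) + rho(y,z) = rho(x,y) + rho(x+y,z)
violations = sum(1 for x in range(n) for y in range(n) for z in range(n)
                 if (df(x, (y + z) % n) + df(y, z)) % n
                 != (df(x, y) + df((x + y) % n, z)) % n)
assert violations == 0
```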

If a ${0}$-cocycle is bounded then its derivative is a bounded ${1}$-coboundary. The quotient of the group of bounded ${1}$-cocycles by the derivatives of bounded ${0}$-cocycles is called the bounded first group cohomology of ${G}$ with coefficients in ${H}$, and is denoted ${H^1_b(G; H)}$. There is an obvious homomorphism ${\phi}$ from ${H^1_b(G; H)}$ to ${H^1(G; H)}$, formed by taking a coset of the space of derivatives of bounded ${0}$-cocycles, and enlarging it to a coset of the space of ${1}$-coboundaries. By chasing all the definitions, we see that every quasimorphism from ${G}$ to ${H}$ is the sum of a homomorphism and a bounded function if and only if this homomorphism ${\phi}$ is injective; in fact the quotient of the space of quasimorphisms by the sum of homomorphisms and bounded functions is isomorphic to the kernel of ${\phi}$.

In additive combinatorics, one is often working with functions which only have additive structure a fraction of the time, thus for instance (1) or (3) might only hold “${1\%}$ of the time”. This makes it somewhat difficult to directly interpret the situation in terms of group cohomology. However, thanks to tools such as the Balog-Szemerédi-Gowers lemma, one can upgrade this sort of ${1\%}$-structure to ${100\%}$-structure – at the cost of restricting the domain to a smaller set. Here I record one such instance of this phenomenon, thus giving a tentative link between additive combinatorics and group cohomology. (I thank Yuval Wigderson for suggesting the problem of locating such a link.)

Theorem 1 Let ${G = (G,+)}$, ${H = (H,+)}$ be additive groups with ${|G|=N}$, let ${S}$ be a subset of ${H}$, let ${E \subset G}$, and let ${f: E \rightarrow H}$ be a function such that

$\displaystyle f(x_1) - f(x_2) + f(x_3) - f(x_4) \in S$

for ${\geq K^{-1} N^3}$ additive quadruples ${(x_1,x_2,x_3,x_4)}$ in ${E}$. Then there exists a subset ${A}$ of ${G}$ containing ${0}$ with ${|A| \gg K^{-O(1)} N}$, a subset ${X}$ of ${H}$ with ${|X| \ll K^{O(1)}}$, and a function ${g: 4A-4A \rightarrow H}$ such that

$\displaystyle g(x+y) - g(x)-g(y) \in X + 496S - 496S \ \ \ \ \ (4)$

for all ${x, y \in 2A-2A}$ (thus, the derivative ${\partial g}$ takes values in ${X + 496 S - 496 S}$ on ${2A - 2A}$), and such that for each ${h \in A}$, one has

$\displaystyle f(x+h) - f(x) - g(h) \in 8S - 8S \ \ \ \ \ (5)$

for ${\gg K^{-O(1)} N}$ values of ${x \in E}$.

Presumably the constants ${8}$ and ${496}$ can be improved further, but we have not attempted to optimise these constants. We chose ${2A-2A}$ as the domain on which one has a bounded derivative, as one can use the Bogolyubov lemma (see e.g. Proposition 4.39 of my book with Van Vu) to find a large Bohr set inside ${2A-2A}$. In applications, the set ${S}$ need not have bounded size, or even bounded doubling; for instance, in the inverse ${U^4}$ theory over a small finite field ${F}$, one would be interested in the situation where ${H}$ is the group of ${n \times n}$ matrices with coefficients in ${F}$ (for some large ${n}$), and ${S}$ is the subset consisting of those matrices of rank bounded by some bound ${C = O(1)}$.

Proof: By hypothesis, there are ${\geq K^{-1} N^3}$ triples ${(h,x,y) \in G^3}$ such that ${x,x+h,y,y+h \in E}$ and

$\displaystyle f(x+h) - f(x) \in f(y+h)-f(y) + S. \ \ \ \ \ (6)$

Thus, there is a set ${B \subset G}$ with ${|B| \gg K^{-1} N}$ such that for all ${h \in B}$, one has (6) for ${\gg K^{-1} N^2}$ pairs ${(x,y) \in G^2}$ with ${x,x+h,y,y+h \in E}$; in particular, there exists ${y = y(h) \in E \cap (E-h)}$ such that (6) holds for ${\gg K^{-1} N}$ values of ${x \in E \cap (E-h)}$. Setting ${g_0(h) := f(y(h)+h) - f(y(h))}$, we conclude that for each ${h \in B}$, one has

$\displaystyle f(x+h) - f(x) \in g_0(h) + S \ \ \ \ \ (7)$

for ${\gg K^{-1} N}$ values of ${x \in E \cap (E-h)}$.

Consider the bipartite graph whose vertex sets are two copies of ${E}$, and ${x}$ and ${x+h}$ connected by a (directed) edge if ${h \in B}$ and (7) holds. Then this graph has ${\gg K^{-2} N^2}$ edges. Applying (a slight modification of) the Balog-Szemerédi-Gowers theorem (for instance by modifying the proof of Corollary 5.19 of my book with Van Vu), we can then find a subset ${C}$ of ${E}$ with ${|C| \gg K^{-O(1)} N}$ with the property that for any ${x_1,x_3 \in C}$, there exist ${\gg K^{-O(1)} N^3}$ triples ${(x_2,y_1,y_2) \in E^3}$ such that the edges ${(x_1,y_1), (x_2,y_1), (x_2,y_2), (x_3,y_2)}$ all lie in this bipartite graph. This implies that, for all ${x_1,x_3 \in C}$, there exist ${\gg K^{-O(1)} N^7}$ septuples ${(x_2,y_1,y_2,z_{11},z_{21},z_{22},z_{32}) \in G^7}$ obeying the constraints

$\displaystyle f(y_j) - f(x_i), f(y_j+z_{ij}) - f(x_i+z_{ij}) \in g_0(y_j-x_i) + S$

and ${y_j, x_i, y_j+z_{ij}, x_i+z_{ij} \in E}$ for ${ij = 11, 21, 22, 32}$. These constraints imply in particular that

$\displaystyle f(x_3) - f(x_1) \in f(x_3+z_{32}) - f(y_2+z_{32}) + f(y_2+z_{22}) - f(x_2+z_{22}) + f(x_2+z_{21}) - f(y_1+z_{21}) + f(y_1+z_{11}) - f(x_1+z_{11}) + 4S - 4S.$

Also observe that

$\displaystyle x_3 - x_1 = (x_3+z_{32}) - (y_2+z_{32}) + (y_2+z_{22}) - (x_2+z_{22}) + (x_2+z_{21}) - (y_1+z_{21}) + (y_1+z_{11}) - (x_1+z_{11}).$

Thus, if ${h \in G}$ and ${x_3,x_1 \in C}$ are such that ${x_3-x_1 = h}$, we see that

$\displaystyle f(w_1) - f(w_2) + f(w_3) - f(w_4) + f(w_5) - f(w_6) + f(w_7) - f(w_8) \in f(x_3) - f(x_1) + 4S - 4S$

for ${\gg K^{-O(1)} N^7}$ octuples ${(w_1,w_2,w_3,w_4,w_5,w_6,w_7,w_8) \in E^8}$ in the hyperplane

$\displaystyle h = w_1 - w_2 + w_3 - w_4 + w_5 - w_6 + w_7 - w_8.$

By the pigeonhole principle, this implies that for any fixed ${h \in G}$, there can be at most ${O(K^{O(1)})}$ sets of the form ${f(x_3)-f(x_1) + 4S-4S}$ with ${x_3-x_1=h}$, ${x_1,x_3 \in C}$ that are pairwise disjoint. Using a greedy algorithm, we conclude that there is a set ${W_h}$ of cardinality ${O(K^{O(1)})}$, such that each set ${f(x_3) - f(x_1) + 4S-4S}$ with ${x_3-x_1=h}$, ${x_1,x_3 \in C}$ intersects ${w+4S -4S}$ for some ${w \in W_h}$, or in other words that

$\displaystyle f(x_3) - f(x_1) \in W_{x_3-x_1} + 8S-8S \ \ \ \ \ (8)$

whenever ${x_1,x_3 \in C}$. In particular,

$\displaystyle \sum_{h \in G} \sum_{w \in W_h} | \{ (x_1,x_3) \in C^2: x_3-x_1 = h; f(x_3) - f(x_1) \in w + 8S-8S \}| \geq |C|^2 \gg K^{-O(1)} N^2.$

This implies that there exists a subset ${A}$ of ${G}$ with ${|A| \gg K^{-O(1)} N}$, and an element ${g_1(h) \in W_h}$ for each ${h \in A}$, such that

$\displaystyle | \{ (x_1,x_3) \in C^2: x_3-x_1 = h; f(x_3) - f(x_1) \in g_1(h) + 8S-8S \}| \gg K^{-O(1)} N \ \ \ \ \ (9)$

for all ${h \in A}$. Note we may assume without loss of generality that ${0 \in A}$ and ${g_1(0)=0}$.

Suppose that ${h_1,\dots,h_{16} \in A}$ are such that

$\displaystyle \sum_{i=1}^{16} (-1)^{i-1} h_i = 0. \ \ \ \ \ (10)$

By construction of ${A}$, and permuting labels, we can find ${\gg K^{-O(1)} N^{16}}$ 32-tuples ${(x_1,\dots,x_{16},y_1,\dots,y_{16}) \in C^{32}}$ such that

$\displaystyle y_i - x_i = (-1)^{i-1} h_i$

and

$\displaystyle f(y_i) - f(x_i) \in (-1)^{i-1} g_1(h_i) + 8S - 8S$

for ${i=1,\dots,16}$. We sum this to obtain

$\displaystyle f(y_1) + \sum_{i=1}^{15} (f(y_{i+1})-f(x_i)) - f(x_{16}) \in \sum_{i=1}^{16} (-1)^{i-1} g_1(h_i) + 128 S - 128 S$

and hence by (8)

$\displaystyle f(y_1) - f(x_{16}) + \sum_{i=1}^{15} w_i \in \sum_{i=1}^{16} (-1)^{i-1} g_1(h_i) + 248 S - 248 S$

for some ${w_i \in W_{k_i}}$, where ${k_i := y_{i+1}-x_i}$. Since

$\displaystyle y_1 - x_{16} + \sum_{i=1}^{15} k_i = 0$

we see that there are only ${N^{16}}$ possible values of ${(y_1,x_{16},k_1,\dots,k_{15})}$. By the pigeonhole principle, we conclude that at most ${O(K^{O(1)})}$ of the sets ${\sum_{i=1}^{16} (-1)^{i-1} g_1(h_i) + 248 S - 248 S}$ can be pairwise disjoint. Arguing as before, we conclude that there exists a set ${X}$ of cardinality ${O(K^{O(1)})}$ such that

$\displaystyle \sum_{i=1}^{16} (-1)^{i-1} g_1(h_i) \in X + 496 S - 496 S \ \ \ \ \ (11)$

whenever (10) holds.

For any ${h \in 4A-4A}$, write ${h}$ arbitrarily as ${h = \sum_{i=1}^8 (-1)^{i-1} h_i}$ for some ${h_1,\dots,h_8 \in A}$ (with ${h_5=\dots=h_8=0}$ if ${h \in 2A-2A}$, and ${h_2 = \dots = h_8 = 0}$ if ${h \in A}$) and then set

$\displaystyle g(h) := \sum_{i=1}^8 (-1)^{i-1} g_1(h_i).$

Then from (11) we have (4). For ${h \in A}$ we have ${g(h) = g_1(h)}$, and (5) then follows from (9). $\Box$

Klaus Roth, who made fundamental contributions to analytic number theory, died this Tuesday, aged 90.

I never met or communicated with Roth personally, but was certainly influenced by his work; he wrote relatively few papers, but they tended to have outsized impact. For instance, he was one of the key people (together with Bombieri) to work on simplifying and generalising the large sieve, taking it from the technically formidable original formulation of Linnik and Rényi to the clean and general almost orthogonality principle that we have today (discussed for instance in these lecture notes of mine). The paper of Roth that had the most impact on my own personal work was his three-page paper proving what is now known as Roth’s theorem on arithmetic progressions:

Theorem 1 (Roth’s theorem on arithmetic progressions) Let ${A}$ be a set of natural numbers of positive upper density (thus ${\limsup_{N \rightarrow\infty} |A \cap \{1,\dots,N\}|/N > 0}$). Then ${A}$ contains infinitely many arithmetic progressions ${a,a+r,a+2r}$ of length three (with ${r}$ non-zero of course).

At the heart of Roth’s elegant argument was the following (surprising at the time) dichotomy: if ${A}$ had some moderately large density within some arithmetic progression ${P}$, either one could use Fourier-analytic methods to detect the presence of an arithmetic progression of length three inside ${A \cap P}$, or else one could locate a long subprogression ${P'}$ of ${P}$ on which ${A}$ had increased density. Iterating this dichotomy by an argument now known as the density increment argument, one eventually obtains Roth’s theorem, no matter which side of the dichotomy actually holds. This argument (and the many descendants of it), based on various “dichotomies between structure and randomness”, became essential in many other results of this type, most famously perhaps in Szemerédi’s proof of his celebrated theorem on arithmetic progressions that generalised Roth’s theorem to progressions of arbitrary length. More recently, my work on the Chowla and Elliott conjectures, which was a crucial component of the solution of the Erdös discrepancy problem, relies on an entropy decrement argument which was directly inspired by the density increment argument of Roth.

The Erdös discrepancy problem also is connected with another well known theorem of Roth:

Theorem 2 (Roth’s discrepancy theorem for arithmetic progressions) Let ${f(1),\dots,f(n)}$ be a sequence in ${\{-1,+1\}}$. Then there exists an arithmetic progression ${a+r, a+2r, \dots, a+kr}$ in ${\{1,\dots,n\}}$ with ${r}$ positive such that

$\displaystyle |\sum_{j=1}^k f(a+jr)| \geq c n^{1/4}$

for an absolute constant ${c>0}$.
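For small ${n}$ one can simply compute the maximum discrepancy by brute force. The sketch below (my own computation, using the Thue-Morse signs as a sample sequence; nothing here comes from Roth's proof) finds arithmetic progressions whose partial sums exceed the ${n^{1/4}}$ scale:

```python
n = 64
# sample +-1 sequence: Thue-Morse signs, f(j) = +1 iff the binary digit sum of j is even
f = {j: (1 if bin(j).count("1") % 2 == 0 else -1) for j in range(1, n + 1)}

max_disc = 0
for r in range(1, n + 1):          # common difference
    for a in range(0, n):          # offset; terms a + j*r for j = 1, ..., k
        s, k = 0, 1
        while a + k * r <= n:
            s += f[a + k * r]
            max_disc = max(max_disc, abs(s))
            k += 1

assert max_disc >= n ** 0.25  # consistent with Roth's bound (c = 1 happens to work here)
```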

In fact, Roth proved a stronger estimate regarding mean square discrepancy, which I am not writing down here; as with Roth’s theorem on arithmetic progressions, his proof was short and Fourier-analytic in nature (although non-Fourier-analytic proofs have since been found, for instance the semidefinite programming proof of Lovasz). The exponent ${1/4}$ is known to be sharp (a result of Matousek and Spencer).

As a particular corollary of the above theorem, for an infinite sequence ${f(1), f(2), \dots}$ of signs, the sums ${|\sum_{j=1}^k f(a+jr)|}$ are unbounded in ${a,r,k}$. The Erdös discrepancy problem asks whether the same statement holds when ${a}$ is restricted to be zero. (Roth also established discrepancy theorems for other sets, such as rectangles, which will not be discussed here.)

Finally, one has to mention Roth’s most famous result, cited for instance in his Fields medal citation:

Theorem 3 (Roth’s theorem on Diophantine approximation) Let ${\alpha}$ be an irrational algebraic number. Then for any ${\varepsilon > 0}$ there is a quantity ${c_{\alpha,\varepsilon} > 0}$ such that

$\displaystyle |\alpha - \frac{a}{q}| > \frac{c_{\alpha,\varepsilon}}{q^{2+\varepsilon}}$

for every rational number ${\frac{a}{q}}$ with ${q \geq 1}$.

From the Dirichlet approximation theorem (or from the theory of continued fractions) we know that the exponent ${2+\varepsilon}$ in the denominator cannot be reduced to ${2}$ or below. A classical and easy theorem of Liouville gives the claim with the exponent ${2+\varepsilon}$ replaced by the degree of the algebraic number ${\alpha}$; work of Thue and Siegel reduced this exponent, but Roth was the one who obtained the near-optimal result. An important point is that the constant ${c_{\alpha,\varepsilon}}$ is ineffective – it is a major open problem in Diophantine approximation to produce any bound significantly stronger than Liouville’s theorem with effective constants. This is because the proof of Roth’s theorem does not exclude any single rational ${a/q}$ from being close to ${\alpha}$, but instead very ingeniously shows that one cannot have two different rationals ${a/q}$, ${a'/q'}$ that are unusually close to ${\alpha}$, even when the denominators ${q,q'}$ are very different in size. (I refer to this sort of argument as a “dueling conspiracies” argument; they are strangely prevalent throughout analytic number theory.)

An abstract finite-dimensional complex Lie algebra, or Lie algebra for short, is a finite-dimensional complex vector space ${{\mathfrak g}}$ together with an anti-symmetric bilinear form ${[,] = [,]_{\mathfrak g}: {\mathfrak g} \times {\mathfrak g} \rightarrow {\mathfrak g}}$ that obeys the Jacobi identity

$\displaystyle [[x,y],z] + [[y,z],x] + [[z,x],y] = 0 \ \ \ \ \ (1)$

for all ${x,y,z \in {\mathfrak g}}$; by anti-symmetry one can also rewrite the Jacobi identity as

$\displaystyle [x,[y,z]] = [[x,y],z] + [y,[x,z]]. \ \ \ \ \ (2)$
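Both forms of the Jacobi identity can be sanity-checked numerically for the commutator bracket on matrices (a sketch of my own; real ${4 \times 4}$ matrices are used for simplicity, though the text works over the complexes):

```python
import numpy as np

rng = np.random.default_rng(0)
x, y, z = (rng.standard_normal((4, 4)) for _ in range(3))

bracket = lambda a, b: a @ b - b @ a  # commutator bracket on gl(V)

# form (1): [[x,y],z] + [[y,z],x] + [[z,x],y] = 0
form1 = bracket(bracket(x, y), z) + bracket(bracket(y, z), x) + bracket(bracket(z, x), y)
# form (2): [x,[y,z]] - [[x,y],z] - [y,[x,z]] = 0
form2 = bracket(x, bracket(y, z)) - bracket(bracket(x, y), z) - bracket(y, bracket(x, z))

assert np.allclose(form1, 0) and np.allclose(form2, 0)
```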

We will usually omit the subscript from the Lie bracket ${[,]_{\mathfrak g}}$ when this will not cause ambiguity. A homomorphism ${\phi: {\mathfrak g} \rightarrow {\mathfrak h}}$ between two Lie algebras ${{\mathfrak g},{\mathfrak h}}$ is a linear map that respects the Lie bracket, thus ${\phi([x,y]_{\mathfrak g}) =[\phi(x),\phi(y)]_{\mathfrak h}}$ for all ${x,y \in {\mathfrak g}}$. As with many other classes of mathematical objects, the class of Lie algebras together with their homomorphisms then form a category. One can of course also consider Lie algebras in infinite dimension or over other fields, but we will restrict attention throughout these notes to the finite-dimensional complex case. The trivial, zero-dimensional Lie algebra is denoted ${0}$; Lie algebras of positive dimension will be called non-trivial.

Lie algebras come up in many contexts in mathematics, in particular arising as the tangent space of complex Lie groups. It is thus very profitable to think of Lie algebras as being the infinitesimal component of a Lie group, and in particular almost all of the notation and concepts that are applicable to Lie groups (e.g. nilpotence, solvability, extensions, etc.) have infinitesimal counterparts in the category of Lie algebras (often with exactly the same terminology). See this previous blog post for more discussion about the connection between Lie algebras and Lie groups (that post was focused over the reals instead of the complexes, but much of the discussion carries over to the complex case).

A particular example of a Lie algebra is the general linear Lie algebra ${{\mathfrak{gl}}(V)}$ of linear transformations ${x: V \rightarrow V}$ on a finite-dimensional complex vector space (or vector space for short) ${V}$, with the commutator Lie bracket ${[x,y] := xy-yx}$; one easily verifies that this is indeed an abstract Lie algebra. We will define a concrete Lie algebra to be a Lie algebra that is a subalgebra of ${{\mathfrak{gl}}(V)}$ for some vector space ${V}$, and similarly define a representation of a Lie algebra ${{\mathfrak g}}$ to be a homomorphism ${\rho: {\mathfrak g} \rightarrow {\mathfrak h}}$ into a concrete Lie algebra ${{\mathfrak h}}$. It is a deep theorem of Ado (discussed in this previous post) that every abstract Lie algebra is in fact isomorphic to a concrete one (or equivalently, that every abstract Lie algebra has a faithful representation), but we will not need or prove this fact here.

Even without Ado’s theorem, though, the structure of abstract Lie algebras is very well understood. As with objects in many other algebraic categories, a basic way to understand a Lie algebra ${{\mathfrak g}}$ is to factor it into two simpler algebras ${{\mathfrak h}, {\mathfrak k}}$ via a short exact sequence

$\displaystyle 0 \rightarrow {\mathfrak h} \rightarrow {\mathfrak g} \rightarrow {\mathfrak k} \rightarrow 0, \ \ \ \ \ (3)$

thus one has an injective homomorphism from ${{\mathfrak h}}$ to ${{\mathfrak g}}$ and a surjective homomorphism from ${{\mathfrak g}}$ to ${{\mathfrak k}}$ such that the image of the former homomorphism is the kernel of the latter. (To be pedantic, a short exact sequence in a general category requires these homomorphisms to be monomorphisms and epimorphisms respectively, but in the category of Lie algebras these turn out to reduce to the more familiar concepts of injectivity and surjectivity respectively.) Given such a sequence, one can (non-uniquely) identify ${{\mathfrak g}}$ with the vector space ${{\mathfrak h} \times {\mathfrak k}}$ equipped with a Lie bracket of the form

$\displaystyle [(t,x), (s,y)]_{\mathfrak g} = ([t,s]_{\mathfrak h} + A(t,y) - A(s,x) + B(x,y), [x,y]_{\mathfrak k}) \ \ \ \ \ (4)$

for some bilinear maps ${A: {\mathfrak h} \times {\mathfrak k} \rightarrow {\mathfrak h}}$ and ${B: {\mathfrak k} \times {\mathfrak k} \rightarrow {\mathfrak h}}$ that obey some Jacobi-type identities which we will not record here. Understanding exactly what maps ${A,B}$ are possible here (up to coordinate change) can be a difficult task (and is one of the key objectives of Lie algebra cohomology), but in principle at least, the problem of understanding ${{\mathfrak g}}$ can be reduced to that of understanding that of its factors ${{\mathfrak k}, {\mathfrak h}}$. To emphasise this, I will (perhaps idiosyncratically) express the existence of a short exact sequence (3) by the ATLAS-type notation

$\displaystyle {\mathfrak g} = {\mathfrak h} . {\mathfrak k} \ \ \ \ \ (5)$

although one should caution that for given ${{\mathfrak h}}$ and ${{\mathfrak k}}$, there can be multiple non-isomorphic ${{\mathfrak g}}$ that can form a short exact sequence with ${{\mathfrak h},{\mathfrak k}}$, so that ${{\mathfrak h} . {\mathfrak k}}$ is not a uniquely defined combination of ${{\mathfrak h}}$ and ${{\mathfrak k}}$; one could emphasise this by writing ${{\mathfrak h} ._{A,B} {\mathfrak k}}$ instead of ${{\mathfrak h} . {\mathfrak k}}$, though we will not do so here. We will refer to ${{\mathfrak g}}$ as an extension of ${{\mathfrak k}}$ by ${{\mathfrak h}}$, and read the notation (5) as “ ${{\mathfrak g}}$ is ${{\mathfrak h}}$-by-${{\mathfrak k}}$“; confusingly, these two notations reverse the subject and object of “by”, but unfortunately both notations are well entrenched in the literature. We caution that the operation ${.}$ is not commutative, and it is only partly associative: every Lie algebra of the form ${{\mathfrak k} . ({\mathfrak h} . {\mathfrak l})}$ is also of the form ${({\mathfrak k} . {\mathfrak h}) . {\mathfrak l}}$, but the converse is not true (see this previous blog post for some related discussion). As we are working in the infinitesimal world of Lie algebras (which have an additive group operation) rather than Lie groups (in which the group operation is usually written multiplicatively), it may help to think of ${{\mathfrak h} . {\mathfrak k}}$ as a (twisted) “sum” of ${{\mathfrak h}}$ and ${{\mathfrak k}}$ rather than a “product”; for instance, we have ${{\mathfrak g} = 0 . {\mathfrak g}}$ and ${{\mathfrak g} = {\mathfrak g} . 0}$, and also ${\dim {\mathfrak h} . {\mathfrak k} = \dim {\mathfrak h} + \dim {\mathfrak k}}$.

Special examples of extensions ${{\mathfrak h} .{\mathfrak k}}$ of ${{\mathfrak k}}$ by ${{\mathfrak h}}$ include the direct sum (or direct product) ${{\mathfrak h} \oplus {\mathfrak k}}$ (also denoted ${{\mathfrak h} \times {\mathfrak k}}$), which is given by the construction (4) with ${A}$ and ${B}$ both vanishing, and the split extension (or semidirect product) ${{\mathfrak h} : {\mathfrak k} = {\mathfrak h} :_\rho {\mathfrak k}}$ (also denoted ${{\mathfrak h} \ltimes {\mathfrak k} = {\mathfrak h} \ltimes_\rho {\mathfrak k}}$), which is given by the construction (4) with ${B}$ vanishing and the bilinear map ${A: {\mathfrak h} \times {\mathfrak k} \rightarrow {\mathfrak h}}$ taking the form

$\displaystyle A( t, x ) = \rho(x)(t)$

for some representation ${\rho: {\mathfrak k} \rightarrow \hbox{Der} {\mathfrak h}}$ of ${{\mathfrak k}}$ in the concrete Lie algebra of derivations ${\hbox{Der} {\mathfrak h} \subset {\mathfrak{gl}}({\mathfrak h})}$ of ${{\mathfrak h}}$, that is to say the algebra of linear maps ${D: {\mathfrak h} \rightarrow {\mathfrak h}}$ that obey the Leibniz rule

$\displaystyle D[s,t]_{\mathfrak h} = [Ds,t]_{\mathfrak h} + [s,Dt]_{\mathfrak h}$

for all ${s,t \in {\mathfrak h}}$. (The derivation algebra ${\hbox{Der} {\mathfrak g}}$ of a Lie algebra ${{\mathfrak g}}$ is analogous to the automorphism group ${\hbox{Aut}(G)}$ of a Lie group ${G}$, with the two concepts being intertwined by the tangent space functor ${G \mapsto {\mathfrak g}}$ from Lie groups to Lie algebras (i.e. the derivation algebra is the infinitesimal version of the automorphism group). Of course, this functor also intertwines the Lie algebra and Lie group versions of most of the other concepts discussed here, such as extensions, semidirect products, etc.)
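As a minimal worked example (my own, not from the text), take ${{\mathfrak h} = {\mathfrak k} = {\bf C}}$ abelian, with ${\rho(x)}$ acting on ${{\mathfrak h}}$ by multiplication by ${x}$; the construction (4) with ${B = 0}$ then produces the two-dimensional non-abelian Lie algebra, and the Jacobi identity can be checked directly (real scalars are used below for simplicity):

```python
import itertools, random

def bracket(u, v):
    # [(t,x), (s,y)] = ([t,s]_h + A(t,y) - A(s,x) + B(x,y), [x,y]_k)
    # with [.,.]_h = [.,.]_k = 0, B = 0, and A(t,x) = rho(x)(t) = x*t
    t, x = u
    s, y = v
    return (y * t - x * s, 0.0)

def jacobi_sum(u, v, w):
    # [[u,v],w] + [[v,w],u] + [[w,u],v], computed componentwise
    terms = [bracket(bracket(u, v), w), bracket(bracket(v, w), u), bracket(bracket(w, u), v)]
    return (sum(t[0] for t in terms), sum(t[1] for t in terms))

random.seed(0)
vecs = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(3)]
max_err = max(abs(jacobi_sum(u, v, w)[0]) + abs(jacobi_sum(u, v, w)[1])
              for u, v, w in itertools.permutations(vecs))
assert max_err < 1e-12  # Jacobi identity holds up to rounding
```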

There are two general ways to factor a Lie algebra ${{\mathfrak g}}$ as an extension ${{\mathfrak h} . {\mathfrak k}}$ of a smaller Lie algebra ${{\mathfrak k}}$ by another smaller Lie algebra ${{\mathfrak h}}$. One is to locate a Lie algebra ideal (or ideal for short) ${{\mathfrak h}}$ in ${{\mathfrak g}}$, thus ${[{\mathfrak h},{\mathfrak g}] \subset {\mathfrak h}}$, where ${[{\mathfrak h},{\mathfrak g}]}$ denotes the Lie algebra generated by ${\{ [x,y]: x \in {\mathfrak h}, y \in {\mathfrak g} \}}$, and then take ${{\mathfrak k}}$ to be the quotient space ${{\mathfrak g}/{\mathfrak h}}$ in the usual manner; one can check that ${{\mathfrak h}}$, ${{\mathfrak k}}$ are also Lie algebras and that we do indeed have a short exact sequence

$\displaystyle {\mathfrak g} = {\mathfrak h} . ({\mathfrak g}/{\mathfrak h}).$

Conversely, whenever one has a factorisation ${{\mathfrak g} = {\mathfrak h} . {\mathfrak k}}$, one can identify ${{\mathfrak h}}$ with an ideal in ${{\mathfrak g}}$, and ${{\mathfrak k}}$ with the quotient of ${{\mathfrak g}}$ by ${{\mathfrak h}}$.

The other general way to obtain such a factorisation is to start with a homomorphism ${\rho: {\mathfrak g} \rightarrow {\mathfrak m}}$ of ${{\mathfrak g}}$ into another Lie algebra ${{\mathfrak m}}$, take ${{\mathfrak k}}$ to be the image ${\rho({\mathfrak g})}$ of ${{\mathfrak g}}$, and ${{\mathfrak h}}$ to be the kernel ${\hbox{ker} \rho := \{ x \in {\mathfrak g}: \rho(x) = 0 \}}$. Again, it is easy to see that this does indeed create a short exact sequence:

$\displaystyle {\mathfrak g} = \hbox{ker} \rho . \rho({\mathfrak g}).$

Conversely, whenever one has a factorisation ${{\mathfrak g} = {\mathfrak h} . {\mathfrak k}}$, one can identify ${{\mathfrak k}}$ with the image of ${{\mathfrak g}}$ under some homomorphism, and ${{\mathfrak h}}$ with the kernel of that homomorphism. Note that if a representation ${\rho: {\mathfrak g} \rightarrow {\mathfrak m}}$ is faithful (i.e. injective), then the kernel is trivial and ${{\mathfrak g}}$ is isomorphic to ${\rho({\mathfrak g})}$.

Now we consider some examples of factoring some class of Lie algebras into simpler Lie algebras. The easiest examples of Lie algebras to understand are the abelian Lie algebras ${{\mathfrak g}}$, in which the Lie bracket identically vanishes. Every one-dimensional Lie algebra is automatically abelian, and thus isomorphic to the scalar algebra ${{\bf C}}$. Conversely, by using an arbitrary linear basis of ${{\mathfrak g}}$, we see that an abelian Lie algebra is isomorphic to the direct sum of one-dimensional algebras. Thus, a Lie algebra is abelian if and only if it is isomorphic to the direct sum of finitely many copies of ${{\bf C}}$.

Now consider a Lie algebra ${{\mathfrak g}}$ that is not necessarily abelian. We then form the derived algebra ${[{\mathfrak g},{\mathfrak g}]}$; this algebra is trivial if and only if ${{\mathfrak g}}$ is abelian. It is easy to see that ${[{\mathfrak h},{\mathfrak k}]}$ is an ideal whenever ${{\mathfrak h},{\mathfrak k}}$ are ideals, so in particular the derived algebra ${[{\mathfrak g},{\mathfrak g}]}$ is an ideal and we thus have the short exact sequence

$\displaystyle {\mathfrak g} = [{\mathfrak g},{\mathfrak g}] . ({\mathfrak g}/[{\mathfrak g},{\mathfrak g}]).$

The algebra ${{\mathfrak g}/[{\mathfrak g},{\mathfrak g}]}$ is the maximal abelian quotient of ${{\mathfrak g}}$, and is known as the abelianisation of ${{\mathfrak g}}$. If it is trivial, we call the Lie algebra perfect. If instead it is non-trivial, then the derived algebra has strictly smaller dimension than ${{\mathfrak g}}$. From this, it is natural to associate two series to any Lie algebra ${{\mathfrak g}}$, the lower central series

$\displaystyle {\mathfrak g}_1 = {\mathfrak g}; {\mathfrak g}_2 := [{\mathfrak g}, {\mathfrak g}_1]; {\mathfrak g}_3 := [{\mathfrak g}, {\mathfrak g}_2]; \ldots$

and the derived series

$\displaystyle {\mathfrak g}^{(1)} := {\mathfrak g}; {\mathfrak g}^{(2)} := [{\mathfrak g}^{(1)}, {\mathfrak g}^{(1)}]; {\mathfrak g}^{(3)} := [{\mathfrak g}^{(2)}, {\mathfrak g}^{(2)}]; \ldots.$

By induction we see that these are both decreasing series of ideals of ${{\mathfrak g}}$, with the derived series being slightly smaller (${{\mathfrak g}^{(k)} \subseteq {\mathfrak g}_k}$ for all ${k}$). We say that a Lie algebra is nilpotent if its lower central series is eventually trivial, and solvable if its derived series eventually becomes trivial. Thus, abelian Lie algebras are nilpotent, and nilpotent Lie algebras are solvable, but the converses are not necessarily true. For instance, in the general linear Lie algebra ${{\mathfrak{gl}}_n = {\mathfrak{gl}}({\bf C}^n)}$, which can be identified with the Lie algebra of ${n \times n}$ complex matrices, the subalgebra ${{\mathfrak n}}$ of strictly upper triangular matrices is nilpotent (but not abelian for ${n \geq 3}$), while the subalgebra ${{\mathfrak b}}$ of upper triangular matrices is solvable (but not nilpotent for ${n \geq 2}$). It is also clear that any subalgebra of a nilpotent algebra is nilpotent, and similarly for solvable or abelian algebras.
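One can verify these assertions concretely for the ${3 \times 3}$ examples just mentioned. The following Python sketch (the helper names and the bases are my own choices, not notation from the text) represents a Lie algebra by a basis of matrices and iterates the bracket to compute the dimensions along the two series:

```python
import numpy as np
from itertools import product

def bracket(a, b):
    """Matrix Lie bracket [a, b] = ab - ba."""
    return a @ b - b @ a

def extract_basis(mats):
    """Thin a spanning list of matrices to a linearly independent subset."""
    basis, rows = [], []
    for m in mats:
        trial = np.array(rows + [m.flatten()])
        if np.linalg.matrix_rank(trial) > len(basis):
            basis.append(m)
            rows.append(m.flatten())
    return basis

def lower_central_dims(g):
    """Dimensions of g_1 = g, g_{k+1} = [g, g_k], until trivial or stable."""
    dims, cur = [len(g)], g
    while dims[-1] > 0:
        cur = extract_basis([bracket(x, y) for x, y in product(g, cur)])
        if len(cur) == dims[-1]:
            break  # series has stabilised without reaching 0: not nilpotent
        dims.append(len(cur))
    return dims

def derived_dims(g):
    """Dimensions of g^(1) = g, g^(k+1) = [g^(k), g^(k)], until trivial or stable."""
    dims, cur = [len(g)], g
    while dims[-1] > 0:
        cur = extract_basis([bracket(x, y) for x, y in product(cur, cur)])
        if len(cur) == dims[-1]:
            break
        dims.append(len(cur))
    return dims

def E(i, j, n=3):
    m = np.zeros((n, n)); m[i, j] = 1.0; return m

n3 = [E(0, 1), E(0, 2), E(1, 2)]        # strictly upper triangular 3x3
b3 = n3 + [E(0, 0), E(1, 1), E(2, 2)]   # upper triangular 3x3

print(lower_central_dims(n3))  # [3, 1, 0]: nilpotent
print(lower_central_dims(b3))  # [6, 3]: stabilises, so not nilpotent
print(derived_dims(b3))        # [6, 3, 1, 0]: solvable
```

The upper triangular algebra illustrates the strict inclusion between the two notions: its lower central series gets stuck at the strictly upper triangular part, while its derived series keeps shrinking to zero.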

From the above discussion we see that a Lie algebra is solvable if and only if it can be represented by a tower of abelian extensions, thus

$\displaystyle {\mathfrak g} = {\mathfrak a}_1 . ({\mathfrak a}_2 . \ldots ({\mathfrak a}_{k-1} . {\mathfrak a}_k) \ldots )$

for some abelian ${{\mathfrak a}_1,\ldots,{\mathfrak a}_k}$. Similarly, a Lie algebra ${{\mathfrak g}}$ is nilpotent if it is expressible as a tower of central extensions (so that in all the extensions ${{\mathfrak h} . {\mathfrak k}}$ in the above factorisation, ${{\mathfrak h}}$ is central in ${{\mathfrak h} . {\mathfrak k}}$, where we say that ${{\mathfrak h}}$ is central in ${{\mathfrak g}}$ if ${[{\mathfrak h},{\mathfrak g}]=0}$). We also see that an extension ${{\mathfrak h} . {\mathfrak k}}$ is solvable if and only if both factors ${{\mathfrak h}, {\mathfrak k}}$ are solvable. Splitting abelian algebras into cyclic (i.e. one-dimensional) ones, we thus see that a finite-dimensional Lie algebra is solvable if and only if it is polycyclic, i.e. it can be represented by a tower of cyclic extensions.

For our next fundamental example of using short exact sequences to split a general Lie algebra into simpler objects, we observe that every abstract Lie algebra ${{\mathfrak g}}$ has an adjoint representation ${\hbox{ad}: {\mathfrak g} \rightarrow \hbox{ad} {\mathfrak g} \subset {\mathfrak{gl}}({\mathfrak g})}$, where for each ${x \in {\mathfrak g}}$, ${\hbox{ad} x \in {\mathfrak{gl}}({\mathfrak g})}$ is the linear map ${(\hbox{ad} x)(y) := [x,y]}$; one easily verifies that this is indeed a representation (indeed, (2) is equivalent to the assertion that ${\hbox{ad} [x,y] = [\hbox{ad} x, \hbox{ad} y]}$ for all ${x,y \in {\mathfrak g}}$). The kernel of this representation is the center ${Z({\mathfrak g}) := \{ x \in {\mathfrak g}: [x,{\mathfrak g}] = 0\}}$, which is the maximal central subalgebra of ${{\mathfrak g}}$. We thus have the short exact sequence

$\displaystyle {\mathfrak g} = Z({\mathfrak g}) . \hbox{ad} {\mathfrak g} \ \ \ \ \ (6)$

which, among other things, shows that every abstract Lie algebra is a central extension of a concrete Lie algebra (which can serve as a cheap substitute for Ado’s theorem mentioned earlier).
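As a concrete illustration (a toy sketch in Python; the basis and helper names are ad hoc rather than standard), one can compute the matrices ${\hbox{ad} x}$ for the usual basis ${e,f,h}$ of ${\mathfrak{sl}_2}$, check the homomorphism property ${\hbox{ad} [x,y] = [\hbox{ad} x, \hbox{ad} y]}$, and confirm that ${\hbox{ad}}$ is faithful here since ${\mathfrak{sl}_2}$ is centerless:

```python
import numpy as np

# sl_2 basis: e, f, h with [e,f] = h, [h,e] = 2e, [h,f] = -2f
e = np.array([[0., 1.], [0., 0.]])
f = np.array([[0., 0.], [1., 0.]])
h = np.array([[1., 0.], [0., -1.]])
basis = [e, f, h]
B = np.array([m.flatten() for m in basis]).T  # columns = flattened basis

def bracket(a, b):
    return a @ b - b @ a

def ad(x):
    """Matrix of ad x = [x, .] in the basis (e, f, h)."""
    cols = [np.linalg.lstsq(B, bracket(x, m).flatten(), rcond=None)[0]
            for m in basis]
    return np.array(cols).T

# ad is a Lie algebra homomorphism: ad [x,y] = [ad x, ad y]
for x in basis:
    for y in basis:
        assert np.allclose(ad(bracket(x, y)), bracket(ad(x), ad(y)))

# e.g. ad h acts diagonally on the basis with eigenvalues 2, -2, 0
print(ad(h))  # diag(2, -2, 0)

# sl_2 has trivial center, so the images ad e, ad f, ad h stay independent
print(np.linalg.matrix_rank(np.array([ad(m).flatten() for m in basis])))  # 3
```

In the notation of (6), the center here is ${0}$ and ${\mathfrak{sl}_2 \cong \hbox{ad}\, \mathfrak{sl}_2}$, the simplest instance of an abstract Lie algebra being realised concretely by its adjoint action.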

For our next fundamental decomposition of Lie algebras, we need some more definitions. A Lie algebra ${{\mathfrak g}}$ is simple if it is non-abelian and has no ideals other than ${0}$ and ${{\mathfrak g}}$; thus simple Lie algebras cannot be factored ${{\mathfrak g} = {\mathfrak h} . {\mathfrak k}}$ into strictly smaller algebras ${{\mathfrak h},{\mathfrak k}}$. In particular, simple Lie algebras are automatically perfect and centerless. We have the following fundamental theorem:

Theorem 1 (Equivalent definitions of semisimplicity) Let ${{\mathfrak g}}$ be a Lie algebra. Then the following are equivalent:

• (i) ${{\mathfrak g}}$ does not contain any non-trivial solvable ideal.
• (ii) ${{\mathfrak g}}$ does not contain any non-trivial abelian ideal.
• (iii) The Killing form ${K: {\mathfrak g} \times {\mathfrak g} \rightarrow {\bf C}}$, defined as the bilinear form ${K(x,y) := \hbox{tr}_{\mathfrak g}( (\hbox{ad} x) (\hbox{ad} y) )}$, is non-degenerate on ${{\mathfrak g}}$.
• (iv) ${{\mathfrak g}}$ is isomorphic to the direct sum of finitely many non-abelian simple Lie algebras.

We review the proof of this theorem later in these notes. A Lie algebra obeying any (and hence all) of the properties (i)-(iv) is known as a semisimple Lie algebra. The statement (iv) is usually taken as the definition of semisimplicity; the equivalence of (iv) and (i) is a special case of Weyl’s complete reducibility theorem (see Theorem 32), and the equivalence of (iv) and (iii) is known as the Cartan semisimplicity criterion. (The equivalence of (i) and (ii) is easy.)

If ${{\mathfrak h}}$ and ${{\mathfrak k}}$ are solvable ideals of a Lie algebra ${{\mathfrak g}}$, then it is not difficult to see that the vector sum ${{\mathfrak h}+{\mathfrak k}}$ is also a solvable ideal (because on quotienting by ${{\mathfrak h}}$ we see that the derived series of ${{\mathfrak h}+{\mathfrak k}}$ must eventually fall inside ${{\mathfrak h}}$, and thence must eventually become trivial by the solvability of ${{\mathfrak h}}$). As our Lie algebras are finite dimensional, we conclude that ${{\mathfrak g}}$ has a unique maximal solvable ideal, known as the radical ${\hbox{rad} {\mathfrak g}}$ of ${{\mathfrak g}}$. The quotient ${{\mathfrak g}/\hbox{rad} {\mathfrak g}}$ is then a Lie algebra with trivial radical, and is thus semisimple by the above theorem, giving the Levi decomposition

$\displaystyle {\mathfrak g} = \hbox{rad} {\mathfrak g} . ({\mathfrak g} / \hbox{rad} {\mathfrak g})$

expressing an arbitrary Lie algebra as an extension of a semisimple Lie algebra ${{\mathfrak g}/\hbox{rad}{\mathfrak g}}$ by a solvable algebra ${\hbox{rad} {\mathfrak g}}$ (and it is not hard to see that this is the only possible such extension up to isomorphism). Indeed, a deep theorem of Levi allows one to upgrade this decomposition to a split extension

$\displaystyle {\mathfrak g} = \hbox{rad} {\mathfrak g} : ({\mathfrak g} / \hbox{rad} {\mathfrak g})$

although we will not need or prove this result here.

In view of the above decompositions, we see that we can factor any Lie algebra (using a suitable combination of direct sums and extensions) into a finite number of simple Lie algebras and the scalar algebra ${{\bf C}}$. In principle, this means that one can understand an arbitrary Lie algebra once one understands all the simple Lie algebras (which, being defined over ${{\bf C}}$, are somewhat confusingly referred to as simple complex Lie algebras in the literature). Amazingly, this latter class of algebras are completely classified:

Theorem 2 (Classification of simple Lie algebras) Up to isomorphism, every simple Lie algebra is of one of the following forms:

• ${A_n = \mathfrak{sl}_{n+1}}$ for some ${n \geq 1}$.
• ${B_n = \mathfrak{so}_{2n+1}}$ for some ${n \geq 2}$.
• ${C_n = \mathfrak{sp}_{2n}}$ for some ${n \geq 3}$.
• ${D_n = \mathfrak{so}_{2n}}$ for some ${n \geq 4}$.
• ${E_6, E_7}$, or ${E_8}$.
• ${F_4}$.
• ${G_2}$.

(The precise definition of the classical Lie algebras ${A_n,B_n,C_n,D_n}$ and the exceptional Lie algebras ${E_6,E_7,E_8,F_4,G_2}$ will be recalled later.)

(One can extend the families ${A_n,B_n,C_n,D_n}$ of classical Lie algebras a little bit to smaller values of ${n}$, but the resulting algebras are either isomorphic to other algebras on this list, or cease to be simple; see this previous post for further discussion.)

This classification is a basic starting point for the classification of many other related objects, including Lie algebras and Lie groups over more general fields (e.g. the reals ${{\bf R}}$), as well as finite simple groups. Being so fundamental to the subject, this classification is covered in almost every basic textbook in Lie algebras, and I myself learned it many years ago in an honours undergraduate course back in Australia. The proof is rather lengthy, though, and I have always had difficulty keeping it straight in my head. So I have decided to write some notes on the classification in this blog post, aiming to be self-contained (though moving rapidly). There is no new material in this post, though; it is all drawn from standard reference texts (I relied particularly on Fulton and Harris’s text, which I highly recommend). In fact it seems remarkably hard to deviate from the standard routes given in the literature to the classification; I would be interested in knowing about other ways to reach the classification (or substeps in that classification) that are genuinely different from the orthodox route.

This week I am in Bremen, where the 50th International Mathematical Olympiad is being held.  A number of former Olympians (Béla Bollobás, Tim Gowers, Laci Lovasz, Stas Smirnov, Jean-Christophe Yoccoz, and myself) were invited to give a short talk (20 minutes in length) at the celebratory event for this anniversary.  I chose to talk on a topic I have spoken about several times before, on “Structure and randomness in the prime numbers“.  Given the time constraints, there was a limit as to how much substance I could put into the talk; but I try to describe, in very general terms, what we know about the primes, and what we suspect to be true, but cannot yet establish.  As I have mentioned in previous talks, the key problem is that we suspect the distribution of the primes to obey no significant patterns (other than “local” structure, such as having a strong tendency to be odd (which is local information at the 2 place), or obeying the prime number theorem (which is local information at the infinity place)), but we still do not have fully satisfactory tools for establishing the absence of a pattern. (This is in contrast with many types of Olympiad problems, where the key to solving a problem often lies in discovering the right pattern or structure in the problem to exploit.)

The PDF of the talk is here; I decided to try out the Beamer LaTeX package for a change.

The AMS has just notified me that the book version of the first year of my blog, now retitled “Structure and Randomness: pages from year one of a mathematical blog“, is now available.  An official web page for this book has also been set up here, though it is fairly empty at present.  A (2MB) high-resolution PDF file of the cover can be found here.

I plan to start on converting this year’s blog posts to book form in January, and hopefully the process should be a little faster this time.  Given that my lecture notes on ergodic theory and on the Poincaré conjecture will form the bulk of that book, I have chosen the working title for that book to be “Poincaré’s legacies: pages from year two of a mathematical blog“.

One of the most important topological concepts in analysis is that of compactness (as discussed for instance in my Companion article on this topic).  There are various flavours of this concept, but let us focus on sequential compactness: a subset E of a topological space X is sequentially compact if every sequence in E has a convergent subsequence whose limit is also in E.  This property allows one to do many things with the set E.  For instance, it allows one to maximise a functional on E:

Proposition 1. (Existence of extremisers)  Let E be a non-empty sequentially compact subset of a topological space X, and let $F: E \to {\Bbb R}$ be a continuous function.  Then the supremum $\sup_{x \in E} F(x)$ is attained at at least one point $x_* \in E$, thus $F(x) \leq F(x_*)$ for all $x \in E$.  (In particular, this supremum is finite.)  Similarly for the infimum.

Proof. Let $-\infty < L \leq +\infty$ be the supremum $L := \sup_{x \in E} F(x)$.  By the definition of supremum (and the axiom of (countable) choice), one can find a sequence $x^{(n)}$ in E such that $F(x^{(n)}) \to L$.  By compactness, we can refine this sequence to a subsequence (which, by abuse of notation, we shall continue to call $x^{(n)}$) such that $x^{(n)}$ converges to a limit x in E.  Since we still have $F(x^{(n)}) \to L$, and F is continuous at x, we conclude that F(x)=L, and the claim for the supremum follows.  The claim for the infimum is similar.  $\Box$

Remark 1. An inspection of the argument shows that one can relax the continuity hypothesis on F somewhat: to attain the supremum, it suffices that F be upper semicontinuous, and to attain the infimum, it suffices that F be lower semicontinuous. $\diamond$

We thus see that sequential compactness is useful, among other things, for ensuring the existence of extremisers.  In finite-dimensional vector spaces (such as ${\Bbb R}^n$), compact sets are plentiful; indeed, the Heine-Borel theorem asserts that every closed and bounded set is compact.  However, once one moves to infinite-dimensional spaces, such as function spaces, then the Heine-Borel theorem fails quite dramatically; most of the closed and bounded sets one encounters in a topological vector space are non-compact, if one insists on using a reasonably “strong” topology.  This causes a difficulty in (among other things) calculus of variations, which is often concerned with finding extremisers to a functional $F: E \to {\Bbb R}$ on a subset E of an infinite-dimensional function space X.
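The standard example of this failure is the closed unit ball of a sequence space such as $\ell^2$: the basis vectors $e^{(n)}$ are all at distance $\sqrt{2}$ from one another, so no subsequence can be Cauchy. The following toy Python computation (modelling $e^{(n)}$ as one-hot vectors in a large but finite ambient dimension, purely for illustration) makes this concrete:

```python
import numpy as np

# The standard "basis vectors" e^(n) in l^2, truncated to ambient dimension N.
# Pairwise distances are all sqrt(2), so no subsequence of (e^(n)) is Cauchy:
# the closed unit ball is bounded and closed but not sequentially compact.
N = 100
basis = np.eye(N)  # row n plays the role of e^(n)

dists = [np.linalg.norm(basis[i] - basis[j])
         for i in range(N) for j in range(i + 1, N)]
print(min(dists), max(dists))  # both equal sqrt(2)
```

Since every pairwise distance is bounded below by a fixed constant, the sequence has no convergent subsequence in the norm topology, even though it lives in a closed bounded set.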

In recent decades, mathematicians have found a number of ways to get around this difficulty.  One of them is to weaken the topology to recover compactness, taking advantage of such results as the Banach-Alaoglu theorem (or its sequential counterpart).  Of course, there is a tradeoff: weakening the topology makes compactness easier to attain, but makes the continuity of F harder to establish.  Nevertheless, if F enjoys enough “smoothing” or “cancellation” properties, one can hope to obtain continuity in the weak topology, allowing one to do things such as locate extremisers.  (The phenomenon that cancellation can lead to continuity in the weak topology is sometimes referred to as compensated compactness.)

Another option is to abandon trying to make all sequences have convergent subsequences, and settle just for extremising sequences to have convergent subsequences, as this would still be enough to retain Proposition 1.  Pursuing this line of thought leads to the Palais-Smale condition, which is a substitute for compactness in some calculus of variations situations.

But in many situations, one cannot weaken the topology to the point where the domain E becomes compact, without destroying the continuity (or semi-continuity) of F, though one can often at least find an intermediate topology (or metric) in which F is continuous, but for which E is still not quite compact.  Thus one can find sequences $x^{(n)}$ in E which do not have any subsequences that converge to a constant element $x \in E$, even in this intermediate metric.  (As we shall see shortly, one major cause of this failure of compactness is the existence of a non-trivial action of a non-compact group G on E; such a group action can cause compensated compactness or the Palais-Smale condition to fail also.)  Because of this, it is a priori conceivable that a continuous function F need not attain its supremum or infimum.

Nevertheless, even though a sequence $x^{(n)}$ does not have any subsequences that converge to a constant x, it may have a subsequence (which we also call $x^{(n)}$) which converges to some non-constant sequence $y^{(n)}$ (in the sense that the distance $d(x^{(n)},y^{(n)})$ between the subsequence and the new sequence goes to zero in this intermediate metric), where the approximating sequence $y^{(n)}$ is of a very structured form (e.g. “concentrating” to a point, or “travelling” off to infinity, or a superposition $y^{(n)} = \sum_j y^{(n)}_j$ of several concentrating or travelling profiles of this form).  This weaker form of compactness, in which superpositions of a certain type of profile completely describe all the failures (or defects) of compactness, is known as concentration compactness, and the decomposition $x^{(n)} \approx \sum_j y^{(n)}_j$ of the subsequence is known as the profile decomposition.  In many applications, it is a sufficiently good substitute for compactness that one can still do things like locate extremisers for functionals F –  though one often has to make some additional assumptions on F to compensate for the more complicated nature of the compactness.  This phenomenon was systematically studied by P.L. Lions in the 80s, and found great application in calculus of variations and nonlinear elliptic PDE.  More recently, concentration compactness has been a crucial and powerful tool in the non-perturbative analysis of nonlinear dispersive PDE, in particular being used to locate “minimal energy blowup solutions” or “minimal mass blowup solutions” for such a PDE (analogously to how one can use calculus of variations to find minimal energy solutions to a nonlinear elliptic equation); see for instance this recent survey by Killip and Visan.

In typical applications, the concentration compactness phenomenon is exploited in moderately sophisticated function spaces (such as Sobolev spaces or Strichartz spaces), with the failure of traditional compactness being connected to a moderately complicated group G of symmetries (e.g. the group generated by translations and dilations).  Because of this, concentration compactness can appear to be a rather complicated and technical concept when it is first encountered.  In this note, I would like to illustrate concentration compactness in a simple toy setting, namely in the space $X = l^1({\Bbb Z})$ of absolutely summable sequences, with the uniform ($l^\infty$) metric playing the role of the intermediate metric, and the translation group ${\Bbb Z}$ playing the role of the symmetry group G.  This toy setting is significantly simpler than any model that one would actually use in practice [for instance, in most applications X is a Hilbert space], but hopefully it serves to illuminate this useful concept in a less technical fashion.
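A minimal numerical sketch of this toy setting (the particular sequence and window size below are my own choices for illustration, not taken from any reference): the sequence $x^{(n)}$ consisting of half a bump at the origin plus half a bump travelling to position $n$ has no $\ell^1$-convergent subsequence, since the two halves separate, but it is described exactly by a superposition of two translated profiles:

```python
import numpy as np

# Toy concentration compactness in l^1(Z), truncated to a finite window.
# x^(n) = 0.5*delta_0 + 0.5*delta_n: half the mass stays put, half travels.
W = 200  # work on coordinates 0..W-1 as a stand-in for Z

def delta(k):
    v = np.zeros(W); v[k] = 1.0; return v

def x(n):
    return 0.5 * delta(0) + 0.5 * delta(n)

# No l^1-convergent subsequence: distinct terms stay l^1-distance 1 apart.
print(np.abs(x(10) - x(50)).sum())  # 1.0

# But the profile decomposition x^(n) = y_1 + T^n y_2 is exact, with both
# profiles equal to half a bump at the origin (np.roll plays the role of T^n).
y1, y2 = 0.5 * delta(0), 0.5 * delta(0)
n = 70
profile_sum = y1 + np.roll(y2, n)
print(np.abs(x(n) - profile_sum).max())  # 0.0: the defect of compactness
                                         # is fully captured by the profiles
```

Here the travelling profile $T^n y_2$ is exactly the kind of "defect of compactness" that the translation group ${\Bbb Z}$ produces, and subtracting it off leaves a genuinely convergent remainder.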

The last two lectures of this course will be on Ratner’s theorems on equidistribution of orbits on homogeneous spaces. Due to lack of time, I will not be able to cover all the material here that I had originally planned; in particular, for an introduction to this family of results, and its connections with number theory, I will have to refer readers to my previous blog post on these theorems. In this course, I will discuss two special cases of Ratner-type theorems. In this lecture, I will talk about Ratner-type theorems for discrete actions (of the integers ${\Bbb Z}$) on nilmanifolds; this case is much simpler than the general case, because there is a simple criterion in the nilmanifold case to test whether any given orbit is equidistributed or not. Ben Green and I recently needed to develop quantitative versions of such theorems for a number-theoretic application. In the next and final lecture of this course, I will discuss Ratner-type theorems for actions of $SL_2({\Bbb R})$, which is simpler in a different way (due to the semisimplicity of $SL_2({\Bbb R})$, and lack of compact factors).

In this lecture – the final one on general measure-preserving dynamics – we put together the results from the past few lectures to establish the Furstenberg-Zimmer structure theorem for measure-preserving systems, and then use this to finish the proof of the Furstenberg recurrence theorem.

In Lecture 11, we studied compact measure-preserving systems – those systems $(X, {\mathcal X}, \mu, T)$ in which every function $f \in L^2(X, {\mathcal X}, \mu)$ was almost periodic, which meant that its orbit $\{ T^n f: n \in {\Bbb Z}\}$ was precompact in the $L^2(X, {\mathcal X}, \mu)$ topology. Among other things, we were able to easily establish the Furstenberg recurrence theorem (Theorem 1 from Lecture 11) for such systems.

In this lecture, we generalise these results to a “relative” or “conditional” setting, in which we study systems which are compact relative to some factor $(Y, {\mathcal Y}, \nu, S)$ of $(X, {\mathcal X}, \mu, T)$. Such systems are to compact systems as isometric extensions are to isometric systems in topological dynamics. The main result we establish here is that the Furstenberg recurrence theorem holds for such compact extensions whenever the theorem holds for the base. The proof is essentially the same as in the compact case; the main new trick is not to work in the Hilbert spaces $L^2(X,{\mathcal X},\mu)$ over the complex numbers, but rather in the Hilbert module $L^2(X,{\mathcal X},\mu|Y, {\mathcal Y}, \nu)$ over the (commutative) von Neumann algebra $L^\infty(Y,{\mathcal Y},\nu)$. (Modules are to rings as vector spaces are to fields.) Because of the compact nature of the extension, it turns out that results from topological dynamics (and in particular, van der Waerden’s theorem) can be exploited to good effect in this argument.

[Note: this operator-algebraic approach is not the only way to understand these extensions; one can also proceed by disintegrating $\mu$ into fibre measures $\mu_y$ for almost every $y \in Y$ and working fibre by fibre. We will discuss the connection between the two approaches below.]