
As laid out in the foundational work of Kolmogorov, a classical probability space (or probability space for short) is a triplet ${(X, {\mathcal X}, \mu)}$, where ${X}$ is a set, ${{\mathcal X}}$ is a ${\sigma}$-algebra of subsets of ${X}$, and ${\mu: {\mathcal X} \rightarrow [0,1]}$ is a countably additive probability measure on ${{\mathcal X}}$. Given such a space, one can form a number of interesting function spaces, including

• the (real) Hilbert space ${L^2(X, {\mathcal X}, \mu)}$ of square-integrable functions ${f: X \rightarrow {\bf R}}$, modulo ${\mu}$-almost everywhere equivalence, and with the positive definite inner product ${\langle f, g\rangle_{L^2(X, {\mathcal X}, \mu)} := \int_X f g\ d\mu}$; and
• the unital commutative Banach algebra ${L^\infty(X, {\mathcal X}, \mu)}$ of essentially bounded functions ${f: X \rightarrow {\bf R}}$, modulo ${\mu}$-almost everywhere equivalence, with ${\|f\|_{L^\infty(X, {\mathcal X}, \mu)}}$ defined as the essential supremum of ${|f|}$.

There is also a trace ${\tau = \tau_\mu: L^\infty(X, {\mathcal X}, \mu) \rightarrow {\bf R}}$ on ${L^\infty}$ defined by integration: ${\tau(f) := \int_X f\ d\mu}$.

One can form the category ${\mathbf{Prb}}$ of classical probability spaces, by defining a morphism ${\phi: (X, {\mathcal X}, \mu) \rightarrow (Y, {\mathcal Y}, \nu)}$ between probability spaces to be a function ${\phi: X \rightarrow Y}$ which is measurable (thus ${\phi^{-1}(E) \in {\mathcal X}}$ for all ${E \in {\mathcal Y}}$) and measure-preserving (thus ${\mu(\phi^{-1}(E)) = \nu(E)}$ for all ${E \in {\mathcal Y}}$).

Let us now abstract the algebraic features of these spaces as follows; for want of a better name, I will refer to this abstraction as an algebraic probability space. It is very similar to the non-commutative probability spaces studied in this previous post, except that these spaces are now commutative (and real).

Definition 1 An algebraic probability space is a pair ${({\mathcal A}, \tau)}$ where

• ${{\mathcal A}}$ is a unital commutative real algebra;
• ${\tau: {\mathcal A} \rightarrow {\bf R}}$ is a linear functional (that is, a homomorphism of real vector spaces) such that ${\tau(1)=1}$ and ${\tau( f^2 ) \geq 0}$ for all ${f \in {\mathcal A}}$;
• Every element ${f}$ of ${{\mathcal A}}$ is bounded in the sense that ${\sup_{k \geq 1} \tau( f^{2k} )^{1/2k} < \infty}$. (Technically, this isn’t an algebraic property, but I need it for technical reasons.)

A morphism ${\phi: ({\mathcal A}_1, \tau_1) \rightarrow ({\mathcal A}_2, \tau_2)}$ is a homomorphism ${\phi^*: {\mathcal A}_2 \rightarrow {\mathcal A}_1}$ which is trace-preserving, in the sense that ${\tau_1(\phi^*(f)) = \tau_2(f)}$ for all ${f \in {\mathcal A}_2}$.

For want of a better name, I’ll denote the category of algebraic probability spaces as ${\mathbf{AlgPrb}}$. One can view this category as the opposite category to that of (a subcategory of) the category of tracial commutative real algebras. One could emphasise this opposite nature by denoting the algebraic probability space as ${({\mathcal A}, \tau)^{op}}$ rather than ${({\mathcal A},\tau)}$; another suggestive (but slightly inaccurate) notation, inspired by the language of schemes, would be ${\hbox{Spec}({\mathcal A},\tau)}$ rather than ${({\mathcal A},\tau)}$. However, we will not adopt these conventions here, and refer to algebraic probability spaces just by the pair ${({\mathcal A},\tau)}$.

By the previous discussion, we have a covariant functor ${F: \textbf{Prb} \rightarrow \textbf{AlgPrb}}$ that takes a classical probability space ${(X, {\mathcal X}, \mu)}$ to its algebraic counterpart ${(L^\infty(X, {\mathcal X},\mu), \tau_\mu)}$, with a morphism ${\phi: (X, {\mathcal X}, \mu) \rightarrow (Y, {\mathcal Y}, \nu)}$ of classical probability spaces mapping to a morphism ${F(\phi): (L^\infty(X, {\mathcal X},\mu), \tau_\mu) \rightarrow (L^\infty(Y, {\mathcal Y},\nu), \tau_\nu)}$ of the corresponding algebraic probability spaces by the formula

$\displaystyle F(\phi)^* f := f \circ \phi$

for ${f \in L^\infty(Y, {\mathcal Y}, \nu)}$. One easily verifies that this is a functor.
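To make the functor ${F}$ concrete, here is a minimal numerical sketch on finite probability spaces, where functions are just arrays and the trace is a weighted sum. The specific weights, the map `phi`, and the helper names `trace` and `pullback` are hypothetical choices made for illustration; the measure `nu` is defined as the pushforward of `mu`, so that `phi` is measure-preserving by construction.

```python
import numpy as np

# Hypothetical finite example: X = {0,1,2,3} with weights mu, Y = {0,1}.
mu = np.array([0.1, 0.2, 0.3, 0.4])
phi = np.array([0, 1, 1, 0])  # phi: X -> Y, stored as an index array

# Define nu as the pushforward of mu, so that mu(phi^{-1}(E)) = nu(E).
nu = np.bincount(phi, weights=mu, minlength=2)

def trace(weights, f):
    """tau(f): integrate the function f against the probability weights."""
    return float(np.dot(weights, f))

def pullback(phi, f):
    """F(phi)^* f = f o phi, taking functions on Y to functions on X."""
    return f[phi]

f = np.array([2.0, -1.0])
g = np.array([0.5, 3.0])

trace_on_Y = trace(nu, f)                 # tau_nu(f)
trace_on_X = trace(mu, pullback(phi, f))  # tau_mu(F(phi)^* f)
```

In this toy case the two traces agree, and `pullback(phi, f * g)` coincides with `pullback(phi, f) * pullback(phi, g)`, which is the trace-preserving homomorphism property required of a morphism in ${\mathbf{AlgPrb}}$.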

In this post I would like to describe a functor ${G: \textbf{AlgPrb} \rightarrow \textbf{Prb}}$ which partially inverts ${F}$ (up to natural isomorphism), that is to say a recipe for starting with an algebraic probability space ${({\mathcal A}, \tau)}$ and producing a classical probability space ${(X, {\mathcal X}, \mu)}$. This recipe is not new – it is basically the (commutative) Gelfand-Naimark-Segal construction (discussed in this previous post) combined with the Loomis-Sikorski theorem (discussed in this previous post). However, I wanted to put the construction in a single location for the sake of reference. I also wanted to make the point that ${F}$ and ${G}$ are not complete inverses; there is a bit of information in the algebraic probability space (e.g. topological information) which is lost when passing back to the classical probability space. In some future posts, I would like to develop some ergodic theory using the algebraic foundations of probability theory rather than the classical foundations; this turns out to be convenient in the ergodic theory arising from nonstandard analysis (such as that described in this previous post), in which the groups involved are uncountable and the underlying spaces are not standard Borel spaces.

Let us describe how to construct the functor ${G}$, with details postponed to below the fold.

1. Starting with an algebraic probability space ${({\mathcal A}, \tau)}$, form an inner product on ${{\mathcal A}}$ by the formula ${\langle f, g \rangle := \tau(fg)}$, and also form the spectral radius ${\rho(f) :=\lim_{k \rightarrow \infty} \tau(f^{2^k})^{1/2^k}}$.
2. The inner product is clearly positive semi-definite. Quotienting out the null vectors and taking completions, we arrive at a real Hilbert space ${L^2 = L^2({\mathcal A},\tau)}$, to which the trace ${\tau}$ may be extended.
3. Somewhat less obviously, the spectral radius is well-defined and gives a norm on ${{\mathcal A}}$. Taking ${L^2}$ limits of sequences in ${{\mathcal A}}$ of bounded spectral radius gives us a subspace ${L^\infty = L^\infty({\mathcal A},\tau)}$ of ${L^2}$ that has the structure of a real commutative Banach algebra.
4. The idempotents ${1_E}$ of the Banach algebra ${L^\infty}$ may be indexed by elements ${E}$ of an abstract ${\sigma}$-algebra ${{\mathcal B}}$.
5. The Boolean algebra homomorphisms ${\delta_x: {\mathcal B} \rightarrow \{0,1\}}$ (or equivalently, the real algebra homomorphisms ${\iota_x: L^\infty \rightarrow {\bf R}}$) may be indexed by elements ${x}$ of a space ${X}$.
6. Let ${{\mathcal X}}$ denote the ${\sigma}$-algebra on ${X}$ generated by the basic sets ${\overline{E} := \{ x \in X: \delta_x(E) = 1 \}}$ for every ${E \in {\mathcal B}}$.
7. Let ${{\mathcal N}}$ be the ${\sigma}$-ideal of ${{\mathcal X}}$ generated by the sets ${\bigcap_n \overline{E_n}}$, where ${E_n \in {\mathcal B}}$ is a sequence with ${\bigcap_n E_n = \emptyset}$.
8. One verifies that ${{\mathcal B}}$ is isomorphic to ${{\mathcal X}/{\mathcal N}}$. Using this isomorphism, the trace ${\tau}$ on ${L^\infty}$ can be used to construct a countably additive measure ${\mu}$ on ${{\mathcal X}}$. The classical probability space ${(X, {\mathcal X}, \mu)}$ is then ${G( {\mathcal A}, \tau )}$, and the abstract spaces ${L^2, L^\infty}$ may now be identified with their concrete counterparts ${L^2(X, {\mathcal X}, \mu)}$, ${L^\infty(X, {\mathcal X}, \mu)}$.
9. Every algebraic probability space morphism ${\phi: ({\mathcal A}_1,\tau_1) \rightarrow ({\mathcal A}_2,\tau_2)}$ generates a classical probability morphism ${G(\phi): (X_1, {\mathcal X}_1, \mu_1) \rightarrow (X_2, {\mathcal X}_2, \mu_2)}$ via the formula

$\displaystyle \delta_{G(\phi)(x_1)}( E_2 ) = \delta_{x_1}( \phi^*(E_2) )$

using a pullback operation ${\phi^*}$ on the abstract ${\sigma}$-algebras ${{\mathcal B}_1, {\mathcal B}_2}$ that can be defined by density.
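As a sanity check on step 1, one can compute the quantities ${\tau(f^{2^k})^{1/2^k}}$ numerically in the simplest possible setting, namely functions on a finite set in which every point has positive weight. This is only a sketch under that assumption (the weights `mu`, the function `f`, and the helper names are hypothetical); in this setting the limit should be the maximum of ${|f|}$, which plays the role of the ${L^\infty}$ norm.

```python
import numpy as np

# Hypothetical finite probability space with three points of positive weight.
mu = np.array([0.5, 0.3, 0.2])
f = np.array([1.0, -3.0, 2.0])

def tau(h):
    """Trace of h: the weighted sum of its values."""
    return float(np.dot(mu, h))

def radius_estimate(h, k):
    """The k-th term tau(h^(2^k))^(1/2^k) of the spectral radius sequence."""
    n = 2 ** k
    return tau(h ** n) ** (1.0 / n)

# Terms for k = 1, ..., 7; these increase towards max|f| = 3.
estimates = [radius_estimate(f, k) for k in range(1, 8)]
```

The sequence is non-decreasing (a consequence of the Cauchy-Schwarz inequality ${\tau(f^{2^k})^2 \leq \tau(f^{2^{k+1}}) \tau(1)}$) and is bounded above by ${\max |f|}$, which is consistent with the claim that the limit exists.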

Remark 1 The classical probability space ${X}$ constructed by the functor ${G}$ has some additional structure; namely ${X}$ is a ${\sigma}$-Stone space (a Stone space with the property that the closure of any countable union of clopen sets is clopen), ${{\mathcal X}}$ is the Baire ${\sigma}$-algebra (generated by the clopen sets), and the null sets are the meager sets. However, we will not use this additional structure here.

The partial inversion relationship between the functors ${F: \textbf{Prb} \rightarrow \textbf{AlgPrb}}$ and ${G: \textbf{AlgPrb} \rightarrow \textbf{Prb}}$ is given by the following assertion:

1. There is a natural transformation from ${F \circ G: \textbf{AlgPrb} \rightarrow \textbf{AlgPrb}}$ to the identity functor ${I: \textbf{AlgPrb} \rightarrow \textbf{AlgPrb}}$.

More informally: if one starts with an algebraic probability space ${({\mathcal A},\tau)}$ and converts it back into a classical probability space ${(X, {\mathcal X}, \mu)}$, then there is a trace-preserving algebra homomorphism of ${{\mathcal A}}$ to ${L^\infty( X, {\mathcal X}, \mu )}$, which respects morphisms of the algebraic probability space. While this relationship is far weaker than an equivalence of categories (which would require that ${F \circ G}$ and ${G \circ F}$ are both naturally isomorphic to the identity functor), it is still good enough to allow many ergodic theory problems formulated using classical probability spaces to be reformulated instead as equivalent problems in algebraic probability spaces.

Remark 2 The opposite composition ${G \circ F: \textbf{Prb} \rightarrow \textbf{Prb}}$ is a little odd: it takes an arbitrary probability space ${(X, {\mathcal X}, \mu)}$ and returns a more complicated probability space ${(X', {\mathcal X}', \mu')}$, with ${X'}$ being the space of homomorphisms ${\iota_x: L^\infty(X, {\mathcal X}, \mu) \rightarrow {\bf R}}$. While there is “morally” an embedding of ${X}$ into ${X'}$ using the evaluation map, this map does not exist in general because points in ${X}$ may well have zero measure. However, if one takes a “pointless” approach and focuses just on the measure algebras ${({\mathcal X}, \mu)}$, ${({\mathcal X}', \mu')}$, then these algebras become naturally isomorphic after quotienting out by null sets.

Remark 3 An algebraic probability space captures a bit more structure than a classical probability space, because ${{\mathcal A}}$ may be identified with a proper subset of ${L^\infty}$ that describes the “regular” functions (or random variables) of the space. For instance, starting with the unit circle ${{\bf R}/{\bf Z}}$ (with the usual Haar measure and the usual trace ${\tau(f) = \int_{{\bf R}/{\bf Z}} f}$), any unital subalgebra ${{\mathcal A}}$ of ${L^\infty({\bf R}/{\bf Z})}$ that is dense in ${L^2({\bf R}/{\bf Z})}$ will generate the same classical probability space ${G( {\mathcal A}, \tau )}$ on applying the functor ${G}$, namely one will get the space ${({\bf R}/{\bf Z})'}$ of homomorphisms from ${L^\infty({\bf R}/{\bf Z})}$ to ${{\bf R}}$ (with the measure induced from ${\tau}$). Thus for instance ${{\mathcal A}}$ could be the continuous functions ${C( {\bf R}/{\bf Z} )}$, the Wiener algebra ${A({\bf R}/{\bf Z})}$ or the full space ${L^\infty({\bf R}/{\bf Z})}$, but the classical space ${G( {\mathcal A}, \tau )}$ will be unable to distinguish these spaces from each other. In particular, the functor ${F \circ G}$ loses information (roughly speaking, this functor takes an algebraic probability space and completes it to a von Neumann algebra, but then forgets exactly what algebra was initially used to create this completion). In ergodic theory, this sort of “extra structure” is traditionally encoded in topological terms, by assuming that the underlying probability space ${X}$ has a nice topological structure (e.g. a standard Borel space); however, with the algebraic perspective one has the freedom to have non-topological notions of extra structure, by choosing ${{\mathcal A}}$ to be something other than an algebra ${C(X)}$ of continuous functions on a topological space. 
I hope to discuss one such example of extra structure (coming from the Gowers-Host-Kra theory of uniformity seminorms) in a later blog post (this generalises the example of the Wiener algebra given previously, which is encoding “Fourier structure”).

A small example of how one could use the functors ${F, G}$ is as follows. Suppose one has a classical probability space ${(X, {\mathcal X}, \mu)}$ with a measure-preserving action of an uncountable group ${\Gamma}$, which is only defined (and an action) up to almost everywhere equivalence; thus for instance for any set ${E}$ and any ${g, h \in \Gamma}$, ${T^{gh} E}$ and ${T^g T^h E}$ might not be exactly equal, but only equal up to a null set. For similar reasons, an element ${E}$ of the invariant factor ${{\mathcal X}^\Gamma}$ might not be exactly invariant with respect to ${\Gamma}$, but instead one only has ${T^g E}$ and ${E}$ equal up to null sets for each ${g \in \Gamma}$. One might like to “clean up” the action of ${\Gamma}$ to make it defined everywhere, and a genuine action everywhere, but this is not immediately achievable if ${\Gamma}$ is uncountable, since the union of all the null sets where something bad occurs may cease to be a null set. However, by applying the functor ${F}$, each shift ${T^g: X \rightarrow X}$ defines a morphism ${T^g: L^\infty(X, {\mathcal X}, \mu) \rightarrow L^\infty(X, {\mathcal X}, \mu)}$ on the associated algebraic probability space (i.e. the Koopman operator), and then applying ${G}$, we obtain a shift ${T^g: X' \rightarrow X'}$ on a new classical probability space ${(X', {\mathcal X}', \mu')}$ which now gives a genuine measure-preserving action of ${\Gamma}$, and which is equivalent to the original action from a measure algebra standpoint. The invariant factor ${{\mathcal X}^\Gamma}$ now consists of those sets in ${{\mathcal X}'}$ which are genuinely ${\Gamma}$-invariant, not just up to null sets. 
(Basically, the classical probability space ${(X', {\mathcal X}', \mu')}$ contains a Boolean algebra ${\overline{\mathcal B}}$ with the property that every measurable set ${A \in {\mathcal X}'}$ is equivalent up to null sets to precisely one set in ${\overline{\mathcal B}}$, allowing for a canonical “retraction” onto ${\overline{\mathcal B}}$ that eliminates all null set issues.)

More indirectly, the functors ${F, G}$ suggest that one should be able to develop a “pointless” form of ergodic theory, in which the underlying probability spaces are given algebraically rather than classically. I hope to give some more specific examples of this in later posts.

In this previous post I recorded some (very standard) material on the structural theory of finite-dimensional complex Lie algebras (or Lie algebras for short), with a particular focus on those Lie algebras which were semisimple or simple. Among other things, these notes discussed the Weyl complete reducibility theorem (asserting that semisimple Lie algebras are the direct sum of simple Lie algebras) and the classification of simple Lie algebras (with all such Lie algebras being (up to isomorphism) of the form ${A_n}$, ${B_n}$, ${C_n}$, ${D_n}$, ${E_6}$, ${E_7}$, ${E_8}$, ${F_4}$, or ${G_2}$).

Among other things, the structural theory of Lie algebras can then be used to build analogous structures in nearby areas of mathematics, such as Lie groups and Lie algebras over more general fields than the complex field ${{\bf C}}$ (leading in particular to the notion of a Chevalley group), as well as finite simple groups of Lie type, which form the bulk of the classification of finite simple groups (with the exception of the alternating groups and a finite number of sporadic groups).

In the case of complex Lie groups, it turns out that every simple Lie algebra ${\mathfrak{g}}$ is associated with a finite number of connected complex Lie groups, ranging from a “minimal” Lie group ${G_{ad}}$ (the adjoint form of the Lie group) to a “maximal” Lie group ${\tilde G}$ (the simply connected form of the Lie group) that finitely covers ${G_{ad}}$, and occasionally also a number of intermediate forms which finitely cover ${G_{ad}}$, but are in turn finitely covered by ${\tilde G}$. For instance, ${\mathfrak{sl}_n({\bf C})}$ is associated with the projective special linear group ${\hbox{PSL}_n({\bf C}) = \hbox{PGL}_n({\bf C})}$ as its adjoint form and the special linear group ${\hbox{SL}_n({\bf C})}$ as its simply connected form, and intermediate groups can be created by quotienting out ${\hbox{SL}_n({\bf C})}$ by some subgroup of its centre (which is isomorphic to the ${n^{th}}$ roots of unity). The minimal form ${G_{ad}}$ is simple in the group-theoretic sense of having no non-trivial proper normal subgroups, but the other forms of the Lie group are merely quasisimple, although traditionally all of the forms of a Lie group associated to a simple Lie algebra are known as simple Lie groups.

Thanks to the work of Chevalley, a very similar story holds for algebraic groups over arbitrary fields ${k}$; given any Dynkin diagram, one can define a simple Lie algebra with that diagram over that field, and also one can find a finite number of connected algebraic groups over ${k}$ (known as Chevalley groups) with that Lie algebra, ranging from an adjoint form ${G_{ad}}$ to a universal form ${G_u}$, with every form having an isogeny (the analogue of a finite cover for algebraic groups) to the adjoint form, and in turn receiving an isogeny from the universal form. Thus, for instance, one could construct the universal form ${E_7(q)_u}$ of the ${E_7}$ algebraic group over a finite field ${{\bf F}_q}$ of order ${q}$.

When one restricts the Chevalley group construction to adjoint forms over a finite field (e.g. ${\hbox{PSL}_n({\bf F}_q)}$), one usually obtains a finite simple group (with a finite number of exceptions when the rank and the field are very small, and in some cases one also has to pass to a bounded index subgroup, such as the derived group, first). One could also use other forms than the adjoint form, but one then recovers the same finite simple group as before if one quotients out by the centre. This construction was then extended by Steinberg, Suzuki, and Ree by taking a Chevalley group over a finite field and then restricting to the fixed points of a certain automorphism of that group; after some additional minor modifications such as passing to a bounded index subgroup or quotienting out a bounded centre, this gives some additional finite simple groups of Lie type, including classical examples such as the projective special unitary groups ${\hbox{PSU}_n({\bf F}_{q^2})}$, as well as some more exotic examples such as the Suzuki groups or the Ree groups.

While I learned most of the classical structural theory of Lie algebras back when I was an undergraduate, and have interacted with Lie groups in many ways in the past (most recently in connection with Hilbert’s fifth problem, as discussed in this previous series of lectures), I have only recently had the need to understand more precisely the concepts of a Chevalley group and of a finite simple group of Lie type, as well as better understand the structural theory of simple complex Lie groups. As such, I am recording some notes here regarding these concepts, mainly for my own benefit, but perhaps they will also be of use to some other readers. The material here is standard, and was drawn from a number of sources, but primarily from Carter, Gorenstein-Lyons-Solomon, and Fulton-Harris, as well as the lecture notes on Chevalley groups by my colleague Robert Steinberg. The arrangement of material also reflects my own personal preferences; in particular, I tend to favour complex-variable or Riemannian geometry methods over algebraic ones, and this influenced a number of choices I had to make regarding how to prove certain key facts. The notes below are far from a comprehensive or fully detailed discussion of these topics, and I would refer interested readers to the references above for a properly thorough treatment.

An abstract finite-dimensional complex Lie algebra, or Lie algebra for short, is a finite-dimensional complex vector space ${{\mathfrak g}}$ together with an anti-symmetric bilinear map ${[,] = [,]_{\mathfrak g}: {\mathfrak g} \times {\mathfrak g} \rightarrow {\mathfrak g}}$ that obeys the Jacobi identity

$\displaystyle [[x,y],z] + [[y,z],x] + [[z,x],y] = 0 \ \ \ \ \ (1)$

for all ${x,y,z \in {\mathfrak g}}$; by anti-symmetry one can also rewrite the Jacobi identity as

$\displaystyle [x,[y,z]] = [[x,y],z] + [y,[x,z]]. \ \ \ \ \ (2)$

We will usually omit the subscript from the Lie bracket ${[,]_{\mathfrak g}}$ when this will not cause ambiguity. A homomorphism ${\phi: {\mathfrak g} \rightarrow {\mathfrak h}}$ between two Lie algebras ${{\mathfrak g},{\mathfrak h}}$ is a linear map that respects the Lie bracket, thus ${\phi([x,y]_{\mathfrak g}) =[\phi(x),\phi(y)]_{\mathfrak h}}$ for all ${x,y \in {\mathfrak g}}$. As with many other classes of mathematical objects, the class of Lie algebras together with their homomorphisms then form a category. One can of course also consider Lie algebras in infinite dimension or over other fields, but we will restrict attention throughout these notes to the finite-dimensional complex case. The trivial, zero-dimensional Lie algebra is denoted ${0}$; Lie algebras of positive dimension will be called non-trivial.

Lie algebras come up in many contexts in mathematics, in particular arising as the tangent space of complex Lie groups. It is thus very profitable to think of Lie algebras as being the infinitesimal component of a Lie group, and in particular almost all of the notation and concepts that are applicable to Lie groups (e.g. nilpotence, solvability, extensions, etc.) have infinitesimal counterparts in the category of Lie algebras (often with exactly the same terminology). See this previous blog post for more discussion about the connection between Lie algebras and Lie groups (that post was focused over the reals instead of the complexes, but much of the discussion carries over to the complex case).

A particular example of a Lie algebra is the general linear Lie algebra ${{\mathfrak{gl}}(V)}$ of linear transformations ${x: V \rightarrow V}$ on a finite-dimensional complex vector space (or vector space for short) ${V}$, with the commutator Lie bracket ${[x,y] := xy-yx}$; one easily verifies that this is indeed an abstract Lie algebra. We will define a concrete Lie algebra to be a Lie algebra that is a subalgebra of ${{\mathfrak{gl}}(V)}$ for some vector space ${V}$, and similarly define a representation of a Lie algebra ${{\mathfrak g}}$ to be a homomorphism ${\rho: {\mathfrak g} \rightarrow {\mathfrak h}}$ into a concrete Lie algebra ${{\mathfrak h}}$. It is a deep theorem of Ado (discussed in this previous post) that every abstract Lie algebra is in fact isomorphic to a concrete one (or equivalently, that every abstract Lie algebra has a faithful representation), but we will not need or prove this fact here.
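The claim that the commutator bracket on ${{\mathfrak{gl}}(V)}$ satisfies the Lie algebra axioms is easy to test numerically. The following is a minimal sketch, assuming ${V = {\bf C}^3}$ and three randomly chosen matrices (the seed and the helper name `bracket` are arbitrary); it checks anti-symmetry and both forms (1), (2) of the Jacobi identity up to floating point error. A numerical check on random inputs is of course not a proof.

```python
import numpy as np

rng = np.random.default_rng(0)

def bracket(x, y):
    """Commutator Lie bracket [x, y] = xy - yx on square matrices."""
    return x @ y - y @ x

def random_complex_matrix(n):
    return rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

x, y, z = (random_complex_matrix(3) for _ in range(3))

# Form (1) of the Jacobi identity: the cyclic sum should vanish.
jacobi = (bracket(bracket(x, y), z)
          + bracket(bracket(y, z), x)
          + bracket(bracket(z, x), y))

# Form (2): [x, .] acts as a derivation of the bracket.
derivation_defect = (bracket(x, bracket(y, z))
                     - bracket(bracket(x, y), z)
                     - bracket(y, bracket(x, z)))
```

Both `jacobi` and `derivation_defect` come out as zero matrices up to rounding, as does `bracket(x, y) + bracket(y, x)`.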

Even without Ado’s theorem, though, the structure of abstract Lie algebras is very well understood. As with objects in many other algebraic categories, a basic way to understand a Lie algebra ${{\mathfrak g}}$ is to factor it into two simpler algebras ${{\mathfrak h}, {\mathfrak k}}$ via a short exact sequence

$\displaystyle 0 \rightarrow {\mathfrak h} \rightarrow {\mathfrak g} \rightarrow {\mathfrak k} \rightarrow 0, \ \ \ \ \ (3)$

thus one has an injective homomorphism from ${{\mathfrak h}}$ to ${{\mathfrak g}}$ and a surjective homomorphism from ${{\mathfrak g}}$ to ${{\mathfrak k}}$ such that the image of the former homomorphism is the kernel of the latter. (To be pedantic, a short exact sequence in a general category requires these homomorphisms to be monomorphisms and epimorphisms respectively, but in the category of Lie algebras these turn out to reduce to the more familiar concepts of injectivity and surjectivity respectively.) Given such a sequence, one can (non-uniquely) identify ${{\mathfrak g}}$ with the vector space ${{\mathfrak h} \times {\mathfrak k}}$ equipped with a Lie bracket of the form

$\displaystyle [(t,x), (s,y)]_{\mathfrak g} = ([t,s]_{\mathfrak h} + A(t,y) - A(s,x) + B(x,y), [x,y]_{\mathfrak k}) \ \ \ \ \ (4)$

for some bilinear maps ${A: {\mathfrak h} \times {\mathfrak k} \rightarrow {\mathfrak h}}$ and ${B: {\mathfrak k} \times {\mathfrak k} \rightarrow {\mathfrak h}}$ that obey some Jacobi-type identities which we will not record here. Understanding exactly what maps ${A,B}$ are possible here (up to coordinate change) can be a difficult task (and is one of the key objectives of Lie algebra cohomology), but in principle at least, the problem of understanding ${{\mathfrak g}}$ can be reduced to that of understanding its factors ${{\mathfrak k}, {\mathfrak h}}$. To emphasise this, I will (perhaps idiosyncratically) express the existence of a short exact sequence (3) by the ATLAS-type notation

$\displaystyle {\mathfrak g} = {\mathfrak h} . {\mathfrak k} \ \ \ \ \ (5)$

although one should caution that for given ${{\mathfrak h}}$ and ${{\mathfrak k}}$, there can be multiple non-isomorphic ${{\mathfrak g}}$ that can form a short exact sequence with ${{\mathfrak h},{\mathfrak k}}$, so that ${{\mathfrak h} . {\mathfrak k}}$ is not a uniquely defined combination of ${{\mathfrak h}}$ and ${{\mathfrak k}}$; one could emphasise this by writing ${{\mathfrak h} ._{A,B} {\mathfrak k}}$ instead of ${{\mathfrak h} . {\mathfrak k}}$, though we will not do so here. We will refer to ${{\mathfrak g}}$ as an extension of ${{\mathfrak k}}$ by ${{\mathfrak h}}$, and read the notation (5) as “${{\mathfrak g}}$ is ${{\mathfrak h}}$-by-${{\mathfrak k}}$”; confusingly, these two notations reverse the subject and object of “by”, but unfortunately both notations are well entrenched in the literature. We caution that the operation ${.}$ is not commutative, and it is only partly associative: every Lie algebra of the form ${{\mathfrak k} . ({\mathfrak h} . {\mathfrak l})}$ is also of the form ${({\mathfrak k} . {\mathfrak h}) . {\mathfrak l}}$, but the converse is not true (see this previous blog post for some related discussion). As we are working in the infinitesimal world of Lie algebras (which have an additive group operation) rather than Lie groups (in which the group operation is usually written multiplicatively), it may help to think of ${{\mathfrak h} . {\mathfrak k}}$ as a (twisted) “sum” of ${{\mathfrak h}}$ and ${{\mathfrak k}}$ rather than a “product”; for instance, we have ${{\mathfrak g} = 0 . {\mathfrak g}}$ and ${{\mathfrak g} = {\mathfrak g} . 0}$, and also ${\dim {\mathfrak h} . {\mathfrak k} = \dim {\mathfrak h} + \dim {\mathfrak k}}$.

Special examples of extensions ${{\mathfrak h} .{\mathfrak k}}$ of ${{\mathfrak k}}$ by ${{\mathfrak h}}$ include the direct sum (or direct product) ${{\mathfrak h} \oplus {\mathfrak k}}$ (also denoted ${{\mathfrak h} \times {\mathfrak k}}$), which is given by the construction (4) with ${A}$ and ${B}$ both vanishing, and the split extension (or semidirect product) ${{\mathfrak h} : {\mathfrak k} = {\mathfrak h} :_\rho {\mathfrak k}}$ (also denoted ${{\mathfrak h} \ltimes {\mathfrak k} = {\mathfrak h} \ltimes_\rho {\mathfrak k}}$), which is given by the construction (4) with ${B}$ vanishing and the bilinear map ${A: {\mathfrak h} \times {\mathfrak k} \rightarrow {\mathfrak h}}$ taking the form

$\displaystyle A( t, x ) = \rho(x)(t)$

for some representation ${\rho: {\mathfrak k} \rightarrow \hbox{Der} {\mathfrak h}}$ of ${{\mathfrak k}}$ in the concrete Lie algebra of derivations ${\hbox{Der} {\mathfrak h} \subset {\mathfrak{gl}}({\mathfrak h})}$ of ${{\mathfrak h}}$, that is to say the algebra of linear maps ${D: {\mathfrak h} \rightarrow {\mathfrak h}}$ that obey the Leibniz rule

$\displaystyle D[s,t]_{\mathfrak h} = [Ds,t]_{\mathfrak h} + [s,Dt]_{\mathfrak h}$

for all ${s,t \in {\mathfrak h}}$. (The derivation algebra ${\hbox{Der} {\mathfrak g}}$ of a Lie algebra ${{\mathfrak g}}$ is analogous to the automorphism group ${\hbox{Aut}(G)}$ of a Lie group ${G}$, with the two concepts being intertwined by the tangent space functor ${G \mapsto {\mathfrak g}}$ from Lie groups to Lie algebras (i.e. the derivation algebra is the infinitesimal version of the automorphism group). Of course, this functor also intertwines the Lie algebra and Lie group versions of most of the other concepts discussed here, such as extensions, semidirect products, etc.)

There are two general ways to factor a Lie algebra ${{\mathfrak g}}$ as an extension ${{\mathfrak h} . {\mathfrak k}}$ of a smaller Lie algebra ${{\mathfrak k}}$ by another smaller Lie algebra ${{\mathfrak h}}$. One is to locate a Lie algebra ideal (or ideal for short) ${{\mathfrak h}}$ in ${{\mathfrak g}}$, thus ${[{\mathfrak h},{\mathfrak g}] \subset {\mathfrak h}}$, where ${[{\mathfrak h},{\mathfrak g}]}$ denotes the Lie algebra generated by ${\{ [x,y]: x \in {\mathfrak h}, y \in {\mathfrak g} \}}$, and then take ${{\mathfrak k}}$ to be the quotient space ${{\mathfrak g}/{\mathfrak h}}$ in the usual manner; one can check that ${{\mathfrak h}}$, ${{\mathfrak k}}$ are also Lie algebras and that we do indeed have a short exact sequence

$\displaystyle {\mathfrak g} = {\mathfrak h} . ({\mathfrak g}/{\mathfrak h}).$

Conversely, whenever one has a factorisation ${{\mathfrak g} = {\mathfrak h} . {\mathfrak k}}$, one can identify ${{\mathfrak h}}$ with an ideal in ${{\mathfrak g}}$, and ${{\mathfrak k}}$ with the quotient of ${{\mathfrak g}}$ by ${{\mathfrak h}}$.

The other general way to obtain such a factorisation is to start with a homomorphism ${\rho: {\mathfrak g} \rightarrow {\mathfrak m}}$ of ${{\mathfrak g}}$ into another Lie algebra ${{\mathfrak m}}$, take ${{\mathfrak k}}$ to be the image ${\rho({\mathfrak g})}$ of ${{\mathfrak g}}$, and ${{\mathfrak h}}$ to be the kernel ${\hbox{ker} \rho := \{ x \in {\mathfrak g}: \rho(x) = 0 \}}$. Again, it is easy to see that this does indeed create a short exact sequence:

$\displaystyle {\mathfrak g} = \hbox{ker} \rho . \rho({\mathfrak g}).$

Conversely, whenever one has a factorisation ${{\mathfrak g} = {\mathfrak h} . {\mathfrak k}}$, one can identify ${{\mathfrak k}}$ with the image of ${{\mathfrak g}}$ under some homomorphism, and ${{\mathfrak h}}$ with the kernel of that homomorphism. Note that if a representation ${\rho: {\mathfrak g} \rightarrow {\mathfrak m}}$ is faithful (i.e. injective), then the kernel is trivial and ${{\mathfrak g}}$ is isomorphic to ${\rho({\mathfrak g})}$.
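A simple instance of the kernel-image factorisation is the trace map ${\hbox{tr}: {\mathfrak{gl}}_n \rightarrow {\bf C}}$, which is a Lie algebra homomorphism into the abelian Lie algebra ${{\bf C}}$ because the trace of a commutator vanishes; its kernel is the algebra of traceless matrices. The sketch below (with ${n=3}$, real test matrices, and hypothetical helper names) checks the vanishing of the trace on a commutator numerically and obtains the dimension of the kernel from the rank-nullity theorem.

```python
import numpy as np

n = 3
rng = np.random.default_rng(1)
x = rng.standard_normal((n, n))
y = rng.standard_normal((n, n))

def bracket(a, b):
    """Commutator Lie bracket [a, b] = ab - ba."""
    return a @ b - b @ a

# The trace, written as a 1 x n^2 matrix acting on flattened matrices.
trace_row = np.eye(n).reshape(1, n * n)

dim_g = n * n                                   # dim gl_n
dim_image = np.linalg.matrix_rank(trace_row)    # dim of the image in C
dim_kernel = dim_g - dim_image                  # rank-nullity: dim of kernel
```

Here `dim_image` is 1 and `dim_kernel` is ${n^2-1 = 8}$, matching the additivity of dimension ${\dim {\mathfrak h} . {\mathfrak k} = \dim {\mathfrak h} + \dim {\mathfrak k}}$ for the factorisation of ${{\mathfrak{gl}}_3}$ into the traceless matrices and ${{\bf C}}$.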

Now we consider some examples of factoring some class of Lie algebras into simpler Lie algebras. The easiest examples of Lie algebras to understand are the abelian Lie algebras ${{\mathfrak g}}$, in which the Lie bracket identically vanishes. Every one-dimensional Lie algebra is automatically abelian, and thus isomorphic to the scalar algebra ${{\bf C}}$. Conversely, by using an arbitrary linear basis of ${{\mathfrak g}}$, we see that an abelian Lie algebra is isomorphic to the direct sum of one-dimensional algebras. Thus, a Lie algebra is abelian if and only if it is isomorphic to the direct sum of finitely many copies of ${{\bf C}}$.

Now consider a Lie algebra ${{\mathfrak g}}$ that is not necessarily abelian. We then form the derived algebra ${[{\mathfrak g},{\mathfrak g}]}$; this algebra is trivial if and only if ${{\mathfrak g}}$ is abelian. It is easy to see that ${[{\mathfrak h},{\mathfrak k}]}$ is an ideal whenever ${{\mathfrak h},{\mathfrak k}}$ are ideals, so in particular the derived algebra ${[{\mathfrak g},{\mathfrak g}]}$ is an ideal and we thus have the short exact sequence

$\displaystyle {\mathfrak g} = [{\mathfrak g},{\mathfrak g}] . ({\mathfrak g}/[{\mathfrak g},{\mathfrak g}]).$

The algebra ${{\mathfrak g}/[{\mathfrak g},{\mathfrak g}]}$ is the maximal abelian quotient of ${{\mathfrak g}}$, and is known as the abelianisation of ${{\mathfrak g}}$. If it is trivial, we call the Lie algebra perfect. If instead it is non-trivial, then the derived algebra has strictly smaller dimension than ${{\mathfrak g}}$. From this, it is natural to associate two series to any Lie algebra ${{\mathfrak g}}$, the lower central series

$\displaystyle {\mathfrak g}_1 = {\mathfrak g}; {\mathfrak g}_2 := [{\mathfrak g}, {\mathfrak g}_1]; {\mathfrak g}_3 := [{\mathfrak g}, {\mathfrak g}_2]; \ldots$

and the derived series

$\displaystyle {\mathfrak g}^{(1)} := {\mathfrak g}; {\mathfrak g}^{(2)} := [{\mathfrak g}^{(1)}, {\mathfrak g}^{(1)}]; {\mathfrak g}^{(3)} := [{\mathfrak g}^{(2)}, {\mathfrak g}^{(2)}]; \ldots.$

By induction we see that these are both decreasing series of ideals of ${{\mathfrak g}}$, with the derived series being slightly smaller (${{\mathfrak g}^{(k)} \subseteq {\mathfrak g}_k}$ for all ${k}$). We say that a Lie algebra is nilpotent if its lower central series is eventually trivial, and solvable if its derived series eventually becomes trivial. Thus, abelian Lie algebras are nilpotent, and nilpotent Lie algebras are solvable, but the converses are not necessarily true. For instance, in the general linear Lie algebra ${{\mathfrak{gl}}_n = {\mathfrak{gl}}({\bf C}^n)}$, which can be identified with the Lie algebra of ${n \times n}$ complex matrices, the subalgebra ${{\mathfrak n}}$ of strictly upper triangular matrices is nilpotent (but not abelian for ${n \geq 3}$), while the subalgebra ${{\mathfrak b}}$ of upper triangular matrices is solvable (but not nilpotent for ${n \geq 2}$). It is also clear that any subalgebra of a nilpotent algebra is nilpotent, and similarly for solvable or abelian algebras.
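As a quick sanity check on the ${n=3}$ case of these examples, the following sketch (not part of the classification argument) computes the dimensions of the first few terms of both series for the strictly upper triangular and the upper triangular ${3 \times 3}$ matrices. The helper names (`unit`, `bracket`, `basis_of_span`, and so on) are purely illustrative, and exact rational arithmetic is used so that the dimension counts are not affected by rounding.

```python
from fractions import Fraction


def unit(n, i, j):
    """Elementary n x n matrix E_ij, stored as a flat list of Fractions."""
    m = [Fraction(0)] * (n * n)
    m[i * n + j] = Fraction(1)
    return m


def bracket(a, b, n):
    """Commutator [a, b] = ab - ba of two flat n x n matrices."""
    def mul(x, y):
        return [sum(x[i * n + k] * y[k * n + j] for k in range(n))
                for i in range(n) for j in range(n)]
    return [p - q for p, q in zip(mul(a, b), mul(b, a))]


def basis_of_span(vectors):
    """Return a basis of the span of the given vectors (Gaussian elimination)."""
    basis = []  # pairs (pivot index, vector normalised to have 1 at the pivot)
    for v in vectors:
        v = list(v)
        for piv, w in basis:
            if v[piv] != 0:
                c = v[piv]
                v = [x - c * y for x, y in zip(v, w)]
        piv = next((k for k, x in enumerate(v) if x != 0), None)
        if piv is not None:
            c = v[piv]
            basis.append((piv, [x / c for x in v]))
    return [w for _, w in basis]


def bracket_span(A, B, n):
    """Basis of the span of all brackets [a, b] with a in A and b in B."""
    return basis_of_span(bracket(a, b, n) for a in A for b in B)


def lower_central_dims(g, n, steps):
    """Dimensions of g_1, g_2, ... where g_{k+1} = [g, g_k]."""
    dims, cur = [], g
    for _ in range(steps):
        dims.append(len(cur))
        cur = bracket_span(g, cur, n)
    return dims


def derived_dims(g, n, steps):
    """Dimensions of g^(1), g^(2), ... where g^(k+1) = [g^(k), g^(k)]."""
    dims, cur = [], g
    for _ in range(steps):
        dims.append(len(cur))
        cur = bracket_span(cur, cur, n)
    return dims


n = 3
strict_upper = [unit(n, i, j) for i in range(n) for j in range(n) if i < j]
upper = [unit(n, i, j) for i in range(n) for j in range(n) if i <= j]

print(lower_central_dims(strict_upper, n, 4))  # [3, 1, 0, 0]
print(lower_central_dims(upper, n, 4))         # [6, 3, 3, 3]
print(derived_dims(upper, n, 4))               # [6, 3, 1, 0]
```

In this small case the lower central series of the strictly upper triangular matrices reaches zero, while for the upper triangular matrices the lower central series stabilises at a non-zero ideal even though the derived series reaches zero.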

From the above discussion we see that a Lie algebra is solvable if and only if it can be represented by a tower of abelian extensions, thus

$\displaystyle {\mathfrak g} = {\mathfrak a}_1 . ({\mathfrak a}_2 . \ldots ({\mathfrak a}_{k-1} . {\mathfrak a}_k) \ldots )$

for some abelian ${{\mathfrak a}_1,\ldots,{\mathfrak a}_k}$. Similarly, a Lie algebra ${{\mathfrak g}}$ is nilpotent if it is expressible as a tower of central extensions (so that in all the extensions ${{\mathfrak h} . {\mathfrak k}}$ in the above factorisation, ${{\mathfrak h}}$ is central in ${{\mathfrak h} . {\mathfrak k}}$, where we say that ${{\mathfrak h}}$ is central in ${{\mathfrak g}}$ if ${[{\mathfrak h},{\mathfrak g}]=0}$). We also see that an extension ${{\mathfrak h} . {\mathfrak k}}$ is solvable if and only if both factors ${{\mathfrak h}, {\mathfrak k}}$ are solvable. Splitting abelian algebras into cyclic (i.e. one-dimensional) ones, we thus see that a finite-dimensional Lie algebra is solvable if and only if it is polycyclic, i.e. it can be represented by a tower of cyclic extensions.

For our next fundamental example of using short exact sequences to split a general Lie algebra into simpler objects, we observe that every abstract Lie algebra ${{\mathfrak g}}$ has an adjoint representation ${\hbox{ad}: {\mathfrak g} \rightarrow \hbox{ad} {\mathfrak g} \subset {\mathfrak{gl}}({\mathfrak g})}$, where for each ${x \in {\mathfrak g}}$, ${\hbox{ad} x \in {\mathfrak{gl}}({\mathfrak g})}$ is the linear map ${(\hbox{ad} x)(y) := [x,y]}$; one easily verifies that this is indeed a representation (indeed, (2) is equivalent to the assertion that ${\hbox{ad} [x,y] = [\hbox{ad} x, \hbox{ad} y]}$ for all ${x,y \in {\mathfrak g}}$). The kernel of this representation is the center ${Z({\mathfrak g}) := \{ x \in {\mathfrak g}: [x,{\mathfrak g}] = 0\}}$, which is the maximal central subalgebra of ${{\mathfrak g}}$. We thus have the short exact sequence

$\displaystyle {\mathfrak g} = Z({\mathfrak g}) . \hbox{ad} {\mathfrak g} \ \ \ \ \ (6)$

which, among other things, shows that every abstract Lie algebra is a central extension of a concrete Lie algebra (which can serve as a cheap substitute for Ado’s theorem mentioned earlier).

For our next fundamental decomposition of Lie algebras, we need some more definitions. A Lie algebra ${{\mathfrak g}}$ is simple if it is non-abelian and has no ideals other than ${0}$ and ${{\mathfrak g}}$; thus simple Lie algebras cannot be factored ${{\mathfrak g} = {\mathfrak h} . {\mathfrak k}}$ into strictly smaller algebras ${{\mathfrak h},{\mathfrak k}}$. In particular, simple Lie algebras are automatically perfect and centerless. We have the following fundamental theorem:

Theorem 1 (Equivalent definitions of semisimplicity) Let ${{\mathfrak g}}$ be a Lie algebra. Then the following are equivalent:

• (i) ${{\mathfrak g}}$ does not contain any non-trivial solvable ideal.
• (ii) ${{\mathfrak g}}$ does not contain any non-trivial abelian ideal.
• (iii) The Killing form ${K: {\mathfrak g} \times {\mathfrak g} \rightarrow {\bf C}}$, defined as the bilinear form ${K(x,y) := \hbox{tr}_{\mathfrak g}( (\hbox{ad} x) (\hbox{ad} y) )}$, is non-degenerate on ${{\mathfrak g}}$.
• (iv) ${{\mathfrak g}}$ is isomorphic to the direct sum of finitely many non-abelian simple Lie algebras.

We review the proof of this theorem later in these notes. A Lie algebra obeying any (and hence all) of the properties (i)-(iv) is known as a semisimple Lie algebra. The statement (iv) is usually taken as the definition of semisimplicity; the equivalence of (iv) and (i) is a special case of Weyl’s complete reducibility theorem (see Theorem 32), and the equivalence of (iv) and (iii) is known as the Cartan semisimplicity criterion. (The equivalence of (i) and (ii) is easy.)
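To make criterion (iii) concrete, here is a small illustrative computation (a sketch, not taken from the argument in these notes) of the Killing form of ${\mathfrak{sl}_2}$ in the standard basis ${e,f,h}$ of traceless ${2 \times 2}$ matrices. The function names are hypothetical helpers, and the coordinate map `coords` is specific to this basis. The Gram matrix of ${K}$ in this basis should have non-zero determinant, consistent with ${\mathfrak{sl}_2}$ being simple.

```python
# Standard basis of sl_2 as 2 x 2 integer matrices.
E = ((0, 1), (0, 0))
F = ((0, 0), (1, 0))
H = ((1, 0), (0, -1))
BASIS = (E, F, H)


def mat_mul(a, b):
    """Product of two matrices given as tuples of rows."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(len(b)))
              for j in range(len(b[0])))
        for i in range(len(a))
    )


def bracket(a, b):
    """Commutator [a, b] = ab - ba."""
    ab, ba = mat_mul(a, b), mat_mul(b, a)
    return tuple(tuple(p - q for p, q in zip(r, s)) for r, s in zip(ab, ba))


def coords(m):
    """Coordinates of the traceless matrix [[a, b], [c, -a]] in the basis (e, f, h)."""
    return (m[0][1], m[1][0], m[0][0])


def ad(x):
    """Matrix of ad x in the basis (e, f, h); column j holds coords of [x, basis_j]."""
    cols = [coords(bracket(x, b)) for b in BASIS]
    return tuple(tuple(cols[j][i] for j in range(3)) for i in range(3))


def killing(x, y):
    """K(x, y) = trace of (ad x)(ad y)."""
    prod = mat_mul(ad(x), ad(y))
    return sum(prod[i][i] for i in range(3))


def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


gram = [[killing(x, y) for y in BASIS] for x in BASIS]
print(gram)        # [[0, 4, 0], [4, 0, 0], [0, 0, 8]]
print(det3(gram))  # -128
```

The non-zero determinant is exactly the non-degeneracy of ${K}$ in this example; for an abelian Lie algebra every ${\hbox{ad} x}$ vanishes, so the same computation would return the zero matrix.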

If ${{\mathfrak h}}$ and ${{\mathfrak k}}$ are solvable ideals of a Lie algebra ${{\mathfrak g}}$, then it is not difficult to see that the vector sum ${{\mathfrak h}+{\mathfrak k}}$ is also a solvable ideal (because on quotienting by ${{\mathfrak h}}$ we see that the derived series of ${{\mathfrak h}+{\mathfrak k}}$ must eventually fall inside ${{\mathfrak h}}$, and thence must eventually become trivial by the solvability of ${{\mathfrak h}}$). As our Lie algebras are finite dimensional, we conclude that ${{\mathfrak g}}$ has a unique maximal solvable ideal, known as the radical ${\hbox{rad} {\mathfrak g}}$ of ${{\mathfrak g}}$. The quotient ${{\mathfrak g}/\hbox{rad} {\mathfrak g}}$ is then a Lie algebra with trivial radical, and is thus semisimple by the above theorem, giving the Levi decomposition

$\displaystyle {\mathfrak g} = \hbox{rad} {\mathfrak g} . ({\mathfrak g} / \hbox{rad} {\mathfrak g})$

expressing an arbitrary Lie algebra as an extension of a semisimple Lie algebra ${{\mathfrak g}/\hbox{rad}{\mathfrak g}}$ by a solvable algebra ${\hbox{rad} {\mathfrak g}}$ (and it is not hard to see that this is the only possible such extension up to isomorphism). Indeed, a deep theorem of Levi allows one to upgrade this decomposition to a split extension

$\displaystyle {\mathfrak g} = \hbox{rad} {\mathfrak g} : ({\mathfrak g} / \hbox{rad} {\mathfrak g})$

although we will not need or prove this result here.

In view of the above decompositions, we see that we can factor any Lie algebra (using a suitable combination of direct sums and extensions) into a finite number of simple Lie algebras and the scalar algebra ${{\bf C}}$. In principle, this means that one can understand an arbitrary Lie algebra once one understands all the simple Lie algebras (which, being defined over ${{\bf C}}$, are somewhat confusingly referred to as simple complex Lie algebras in the literature). Amazingly, this latter class of algebras is completely classified:

Theorem 2 (Classification of simple Lie algebras) Up to isomorphism, every simple Lie algebra is of one of the following forms:

• ${A_n = \mathfrak{sl}_{n+1}}$ for some ${n \geq 1}$.
• ${B_n = \mathfrak{so}_{2n+1}}$ for some ${n \geq 2}$.
• ${C_n = \mathfrak{sp}_{2n}}$ for some ${n \geq 3}$.
• ${D_n = \mathfrak{so}_{2n}}$ for some ${n \geq 4}$.
• ${E_6, E_7}$, or ${E_8}$.
• ${F_4}$.
• ${G_2}$.

(The precise definition of the classical Lie algebras ${A_n,B_n,C_n,D_n}$ and the exceptional Lie algebras ${E_6,E_7,E_8,F_4,G_2}$ will be recalled later.)

(One can extend the families ${A_n,B_n,C_n,D_n}$ of classical Lie algebras a little bit to smaller values of ${n}$, but the resulting algebras are either isomorphic to other algebras on this list, or cease to be simple; see this previous post for further discussion.)

This classification is a basic starting point for the classification of many other related objects, including Lie algebras and Lie groups over more general fields (e.g. the reals ${{\bf R}}$), as well as finite simple groups. Being so fundamental to the subject, this classification is covered in almost every basic textbook in Lie algebras, and I myself learned it many years ago in an honours undergraduate course back in Australia. The proof is rather lengthy, though, and I have always had difficulty keeping it straight in my head. So I have decided to write some notes on the classification in this blog post, aiming to be self-contained (though moving rapidly). There is no new material in this post, though; it is all drawn from standard reference texts (I relied particularly on Fulton and Harris’s text, which I highly recommend). In fact it seems remarkably hard to deviate from the standard routes given in the literature to the classification; I would be interested in knowing about other ways to reach the classification (or substeps in that classification) that are genuinely different from the orthodox route.

The fundamental notions of calculus, namely differentiation and integration, are often viewed as being the quintessential concepts in mathematical analysis, as their standard definitions involve the concept of a limit. However, it is possible to capture most of the essence of these notions by purely algebraic means (almost completely avoiding the use of limits, Riemann sums, and similar devices), which turns out to be useful when trying to generalise these concepts to more abstract situations in which it becomes convenient to permit the underlying number systems involved to be something other than the real or complex numbers, even if this makes many standard analysis constructions unavailable. For instance, the algebraic notion of a derivation often serves as a substitute for the analytic notion of a derivative in such cases, by abstracting out the key algebraic properties of differentiation, namely linearity and the Leibniz rule (also known as the product rule).

Abstract algebraic analogues of integration are less well known, but can still be developed. To motivate such an abstraction, consider the integration functional ${I: {\mathcal S}({\bf R} \rightarrow {\bf C}) \rightarrow {\bf C}}$ from the space ${{\mathcal S}({\bf R} \rightarrow {\bf C})}$ of complex-valued Schwartz functions ${f: {\bf R} \rightarrow {\bf C}}$ to the complex numbers, defined by

$\displaystyle I(f) := \int_{\bf R} f(x)\ dx$

where the integration on the right is the usual Lebesgue integral (or improper Riemann integral) from analysis. This functional obeys two obvious algebraic properties. Firstly, it is linear over ${{\bf C}}$, thus

$\displaystyle I(cf) = c I(f) \ \ \ \ \ (1)$

and

$\displaystyle I(f+g) = I(f) + I(g) \ \ \ \ \ (2)$

for all ${f,g \in {\mathcal S}({\bf R} \rightarrow {\bf C})}$ and ${c \in {\bf C}}$. Secondly, it is translation invariant, thus

$\displaystyle I(\tau_h f) = I(f) \ \ \ \ \ (3)$

for all ${h \in {\bf R}}$, where ${\tau_h f(x) := f(x-h)}$ is the translation of ${f}$ by ${h}$. Motivated by the uniqueness theory of Haar measure, one might expect that these two axioms already uniquely determine ${I}$ after one sets a normalisation, for instance by requiring that

$\displaystyle I( x \mapsto e^{-\pi x^2} ) = 1. \ \ \ \ \ (4)$

This is not quite true as stated (one can modify the proof of the Hahn-Banach theorem, after first applying a Fourier transform, to create pathological translation-invariant linear functionals on ${{\mathcal S}({\bf R} \rightarrow {\bf C})}$ that are not multiples of the standard integration functional), but if one adds a mild analytical axiom, such as continuity of ${I}$ (using the usual Schwartz topology on ${{\mathcal S}({\bf R} \rightarrow {\bf C})}$), then the above axioms are enough to uniquely pin down the notion of integration. Indeed, if ${I: {\mathcal S}({\bf R} \rightarrow {\bf C}) \rightarrow {\bf C}}$ is a continuous linear functional that is translation invariant, then from the linearity and translation invariance axioms one has

$\displaystyle I( \frac{\tau_h f - f}{h} ) = 0$

for all ${f \in {\mathcal S}({\bf R} \rightarrow {\bf C})}$ and non-zero reals ${h}$. If ${f}$ is Schwartz, then as ${h \rightarrow 0}$, one can verify that the Newton quotients ${\frac{\tau_h f - f}{h}}$ converge in the Schwartz topology to the negative ${-f'}$ of the derivative of ${f}$, so by the continuity and linearity axioms one has

$\displaystyle I(f') = 0.$

Next, note that any Schwartz function of integral zero has an antiderivative which is also Schwartz, and so ${I}$ annihilates all zero-integral Schwartz functions, and thus must be a scalar multiple of the usual integration functional. Using the normalisation (4), we see that ${I}$ must therefore be the usual integration functional, giving the claimed uniqueness.

Motivated by the above discussion, we can define the notion of an abstract integration functional ${I: X \rightarrow R}$ taking values in some vector space ${R}$, and applied to inputs ${f}$ in some other vector space ${X}$ that enjoys a linear action ${h \mapsto \tau_h}$ (the “translation action”) of some group ${V}$, as being a functional which is both linear and translation invariant, thus one has the axioms (1), (2), (3) for all ${f,g \in X}$, scalars ${c}$, and ${h \in V}$. The previous discussion then considered the special case when ${R = {\bf C}}$, ${X = {\mathcal S}({\bf R} \rightarrow {\bf C})}$, ${V = {\bf R}}$, and ${\tau}$ was the usual translation action.

Once we have performed this abstraction, we can now present analogues of classical integration which bear very little analytic resemblance to the classical concept, but which still have much of the algebraic structure of integration. Consider for instance the situation in which we keep the complex range ${R = {\bf C}}$, the translation group ${V = {\bf R}}$, and the usual translation action ${h \mapsto \tau_h}$, but we replace the space ${{\mathcal S}({\bf R} \rightarrow {\bf C})}$ of Schwartz functions by the space ${Poly_{\leq d}({\bf R} \rightarrow {\bf C})}$ of polynomials ${x \mapsto a_0 + a_1 x + \ldots + a_d x^d}$ of degree at most ${d}$ with complex coefficients, where ${d}$ is a fixed natural number; note that this space is translation invariant, so it makes sense to talk about an abstract integration functional ${I: Poly_{\leq d}({\bf R} \rightarrow {\bf C}) \rightarrow {\bf C}}$. Of course, one cannot apply traditional integration concepts to non-zero polynomials, as they are not absolutely integrable. But one can repeat the previous arguments to show that any abstract integration functional must annihilate derivatives of polynomials of degree at most ${d}$:

$\displaystyle I(f') = 0 \hbox{ for all } f \in Poly_{\leq d}({\bf R} \rightarrow {\bf C}). \ \ \ \ \ (5)$

Clearly, every polynomial of degree at most ${d-1}$ is thus annihilated by ${I}$, which makes ${I}$ a scalar multiple of the functional that extracts the top coefficient ${a_d}$ of a polynomial, thus if one sets a normalisation

$\displaystyle I( x \mapsto x^d ) = c$

for some constant ${c}$, then one has

$\displaystyle I( x \mapsto a_0 + a_1 x + \ldots + a_d x^d ) = c a_d \ \ \ \ \ (6)$

for any polynomial ${x \mapsto a_0 + a_1 x + \ldots + a_d x^d}$. So we see that up to a normalising constant, the operation of extracting the top order coefficient of a polynomial of fixed degree serves as the analogue of integration. In particular, despite the fact that integration is supposed to be the “opposite” of differentiation (as indicated for instance by (5)), we see in this case that integration is basically (${d}$-fold) differentiation; indeed, compare (6) with the identity

$\displaystyle (\frac{d}{dx})^d ( a_0 + a_1 x + \ldots + a_d x^d ) = d! a_d.$

In particular, we see that, in contrast to the usual Lebesgue integral, the integration functional (6) can be localised to an arbitrary location: one only needs to know the germ of the polynomial ${x \mapsto a_0 + a_1 x + \ldots + a_d x^d}$ at a single point ${x_0}$ in order to determine the value of the functional (6). This localisation property may initially seem at odds with the translation invariance, but the two can be reconciled thanks to the extremely rigid nature of the class ${Poly_{\leq d}({\bf R} \rightarrow {\bf C})}$, in contrast to the Schwartz class ${{\mathcal S}({\bf R} \rightarrow {\bf C})}$ which admits bump functions and so can generate local phenomena that can only be detected in small regions of the underlying spatial domain, and which therefore forces any translation-invariant integration functional on such function classes to measure the function at every single point in space.

The reversal of the relationship between integration and differentiation is also reflected in the fact that the abstract integration operation on polynomials interacts with the scaling operation ${\delta_\lambda f(x) := f(x/\lambda)}$ in essentially the opposite way from the classical integration operation. Indeed, for classical integration on ${{\bf R}^d}$, one has

$\displaystyle \int_{{\bf R}^d} f(x/\lambda)\ dx = \lambda^d \int f(x)\ dx$

for Schwartz functions ${f \in {\mathcal S}({\bf R}^d \rightarrow {\bf C})}$, and so in this case the integration functional ${I(f) := \int_{{\bf R}^d} f(x)\ dx}$ obeys the scaling law

$\displaystyle I( \delta_\lambda f ) = \lambda^d I(f).$

In contrast, the abstract integration operation defined in (6) obeys the opposite scaling law

$\displaystyle I( \delta_\lambda f ) = \lambda^{-d} I(f). \ \ \ \ \ (7)$
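The properties (5), (6), (7) of the top-coefficient functional are easy to test exactly on a computer. The following sketch represents a polynomial in ${Poly_{\leq d}({\bf R} \rightarrow {\bf C})}$ by its list of coefficients ${a_0,\ldots,a_d}$ (taken rational here for exactness) and uses the normalisation ${c=1}$ by default; the helper names `translate`, `dilate`, `derivative` and `I` are illustrative choices rather than anything standard.

```python
from fractions import Fraction
from math import comb


def translate(coeffs, h):
    """Coefficients of x -> p(x - h), where p has coefficients a_0, ..., a_d."""
    out = [Fraction(0)] * len(coeffs)
    for k, a in enumerate(coeffs):
        for j in range(k + 1):
            # Binomial expansion: (x - h)^k = sum_j C(k, j) x^j (-h)^(k - j)
            out[j] += a * comb(k, j) * (-h) ** (k - j)
    return out


def dilate(coeffs, lam):
    """Coefficients of x -> p(x / lam)."""
    return [a / lam ** k for k, a in enumerate(coeffs)]


def derivative(coeffs):
    """Coefficients of p', padded with a zero so it still lies in Poly_{<= d}."""
    d = len(coeffs) - 1
    return [k * coeffs[k] for k in range(1, d + 1)] + [Fraction(0)]


def I(coeffs, c=Fraction(1)):
    """Abstract integration functional (6): c times the top coefficient a_d."""
    return c * coeffs[-1]


# p(x) = 3 - x + 2 x^3, viewed as an element of Poly_{<= 3}
p = [Fraction(3), Fraction(-1), Fraction(0), Fraction(2)]
h, lam, d = Fraction(5, 7), Fraction(3), 3

print(I(translate(p, h)) == I(p))          # translation invariance (3)
print(I(derivative(p)) == 0)               # derivatives are annihilated (5)
print(I(dilate(p, lam)) == lam ** -d * I(p))  # scaling law (7)
```

All three comparisons hold for this choice of ${p}$, ${h}$ and ${\lambda}$, and the same checks can be repeated for other polynomials by changing the coefficient list.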

Remark 1 One way to interpret what is going on is to view the integration operation (6) as a renormalised version of integration. A polynomial ${x \mapsto a_0 + a_1 x + \ldots + a_d x^d}$ is, in general, not absolutely integrable, and the partial integrals

$\displaystyle \int_0^N a_0 + a_1 x + \ldots + a_d x^d\ dx$

diverge as ${N \rightarrow \infty}$. But if one renormalises these integrals by the factor ${\frac{1}{N^{d+1}}}$, then one recovers convergence,

$\displaystyle \lim_{N \rightarrow \infty} \frac{1}{N^{d+1}} \int_0^N a_0 + a_1 x + \ldots + a_d x^d\ dx = \frac{1}{d+1} a_d$

thus giving an interpretation of (6) as a renormalised classical integral, with the renormalisation being responsible for the unusual scaling relationship in (7). However, this interpretation is a little artificial, and it seems that it is best to view functionals such as (6) from an abstract algebraic perspective, rather than to try to force an analytic interpretation on them.
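The convergence of the renormalised partial integrals in this remark can be seen concretely with exact arithmetic. The sketch below is only an illustration: the function name `renormalised_integral` and the sample polynomial are arbitrary choices, and the integral is evaluated termwise by the power rule.

```python
from fractions import Fraction


def renormalised_integral(coeffs, N):
    """Exact value of N^-(d+1) times the integral over [0, N] of a_0 + a_1 x + ... + a_d x^d."""
    d = len(coeffs) - 1
    N = Fraction(N)
    # Termwise antiderivative: the integral of a_k x^k over [0, N] is a_k N^(k+1) / (k+1).
    integral = sum(a * N ** (k + 1) / (k + 1) for k, a in enumerate(coeffs))
    return integral / N ** (d + 1)


# Sample polynomial 3 - x + 2 x^3, so d = 3 and a_d = 2.
coeffs = [Fraction(3), Fraction(-1), Fraction(0), Fraction(2)]
limit = coeffs[-1] / len(coeffs)  # a_d / (d + 1), since len(coeffs) = d + 1

# Distance to the claimed limit for increasing N.
errors = [abs(renormalised_integral(coeffs, N) - limit) for N in (10, 100, 1000)]
for N, err in zip((10, 100, 1000), errors):
    print(N, float(err))
```

The lower order terms contribute at most ${O(1/N)}$ after renormalisation, which is why the printed errors shrink as ${N}$ grows.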

Now we return to the classical Lebesgue integral

$\displaystyle I(f) := \int_{\bf R} f(x)\ dx. \ \ \ \ \ (8)$

As noted earlier, this integration functional has a translation invariance associated to translations along the real line ${{\bf R}}$, as well as a dilation invariance by real dilation parameters ${\lambda>0}$. However, if we refine the class ${{\mathcal S}({\bf R} \rightarrow {\bf C})}$ of functions somewhat, we can obtain a stronger family of invariances, in which we allow complex translations and dilations. More precisely, let ${\mathcal{SE}({\bf C} \rightarrow {\bf C})}$ denote the space of all functions ${f: {\bf C} \rightarrow {\bf C}}$ which are entire (or equivalently, are given by a Taylor series with an infinite radius of convergence around the origin) and also admit rapid decay in a sectorial neighbourhood of the real line, or more precisely there exists an ${\epsilon>0}$ such that for every ${A > 0}$ there exists ${C_A > 0}$ such that one has the bound

$\displaystyle |f(z)| \leq C_A (1+|z|)^{-A}$

whenever ${|\hbox{Im}(z)| \leq A + \epsilon |\hbox{Re}(z)|}$. For want of a better name, we shall call elements of this space Schwartz entire functions. This is clearly a complex vector space. Typical examples of Schwartz entire functions are the complex gaussians

$\displaystyle f(z) := e^{-\pi (az^2 + 2bz + c)}$

where ${a,b,c}$ are complex numbers with ${\hbox{Re}(a) > 0}$. From the Cauchy integral formula (and its derivatives) we see that if ${f}$ lies in ${\mathcal{SE}({\bf C} \rightarrow {\bf C})}$, then the restriction of ${f}$ to the real line lies in ${{\mathcal S}({\bf R} \rightarrow {\bf C})}$; conversely, from analytic continuation we see that every function in ${{\mathcal S}({\bf R} \rightarrow {\bf C})}$ has at most one extension in ${\mathcal{SE}({\bf C} \rightarrow {\bf C})}$. Thus one can identify ${\mathcal{SE}({\bf C} \rightarrow {\bf C})}$ with a subspace of ${{\mathcal S}({\bf R} \rightarrow {\bf C})}$, and in particular the integration functional (8) is inherited by ${\mathcal{SE}({\bf C} \rightarrow {\bf C})}$, and by abuse of notation we denote the resulting functional ${I: \mathcal{SE}({\bf C} \rightarrow {\bf C}) \rightarrow {\bf C}}$ as ${I}$ also. Note, in analogy with the situation with polynomials, that this abstract integration functional is somewhat localised; one only needs to evaluate the function ${f}$ on the real line, rather than the entire complex plane, in order to compute ${I(f)}$. This is consistent with the rigid nature of Schwartz entire functions, as one can uniquely recover the entire function from its values on the real line by analytic continuation.

Of course, the functional ${I: \mathcal{SE}({\bf C} \rightarrow {\bf C}) \rightarrow {\bf C}}$ remains translation invariant with respect to real translation:

$\displaystyle I(\tau_h f) = I(f) \hbox{ for all } h \in {\bf R}.$

However, thanks to contour shifting, we now also have translation invariance with respect to complex translation:

$\displaystyle I(\tau_h f) = I(f) \hbox{ for all } h \in {\bf C},$

where of course we continue to define the translation operator ${\tau_h}$ for complex ${h}$ by the usual formula ${\tau_h f(x) := f(x-h)}$. In a similar vein, we also have the scaling law

$\displaystyle I(\delta_\lambda f) = \lambda I(f)$

for any ${f \in \mathcal{SE}({\bf C} \rightarrow {\bf C})}$, if ${\lambda}$ is a complex number sufficiently close to ${1}$ (where “sufficiently close” depends on ${f}$, and more precisely depends on the sectoral aperture parameter ${\epsilon}$ associated to ${f}$); again, one can verify that ${\delta_\lambda f}$ lies in ${\mathcal{SE}({\bf C} \rightarrow {\bf C})}$ for ${\lambda}$ sufficiently close to ${1}$. These invariances (which relocalise the integration functional ${I}$ onto other contours than the real line ${{\bf R}}$) are very useful for computing integrals, and in particular for computing gaussian integrals. For instance, the complex translation invariance tells us (after shifting by ${b/a}$) that

$\displaystyle I( z \mapsto e^{-\pi (az^2 + 2bz + c) } ) = e^{-\pi (c-b^2/a)} I( z \mapsto e^{-\pi a z^2} )$

when ${a,b,c \in {\bf C}}$ with ${\hbox{Re}(a) > 0}$, and then an application of the complex scaling law (and a continuity argument, observing that there is a compact path connecting ${a}$ to ${1}$ in the right half plane) gives

$\displaystyle I( z \mapsto e^{-\pi (az^2 + 2bz + c) } ) = a^{-1/2} e^{-\pi (c-b^2/a)} I( z \mapsto e^{-\pi z^2} )$

using the branch of ${a^{-1/2}}$ on the right half-plane for which ${1^{-1/2} = 1}$. Using the normalisation (4) we thus have

$\displaystyle I( z \mapsto e^{-\pi (az^2 + 2bz + c) } ) = a^{-1/2} e^{-\pi (c-b^2/a)}$

giving the usual gaussian integral formula

$\displaystyle \int_{\bf R} e^{-\pi (ax^2 + 2bx + c)}\ dx = a^{-1/2} e^{-\pi (c-b^2/a)}. \ \ \ \ \ (9)$

This is a basic illustration of the power that a large symmetry group (in this case, the complex homothety group) can bring to bear on the task of computing integrals.
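As a numerical sanity check of (9) for genuinely complex parameters, one can compare a simple trapezoid-rule approximation of the left-hand side with the closed form on the right. This is only a sketch: the function names, the truncation window and the step size are ad hoc assumptions, and the principal square root from `cmath.sqrt` is used for ${a^{-1/2}}$, which agrees with the branch normalised by ${1^{-1/2}=1}$ when ${\hbox{Re}(a) > 0}$.

```python
import cmath


def gaussian_integral_numeric(a, b, c, half_width=10.0, step=0.01):
    """Trapezoid-rule approximation of the integral over R of exp(-pi (a x^2 + 2 b x + c))."""
    n = int(round(2 * half_width / step))
    total = 0j
    for k in range(n + 1):
        x = -half_width + k * step
        weight = 0.5 if k in (0, n) else 1.0  # endpoint weights of the trapezoid rule
        total += weight * cmath.exp(-cmath.pi * (a * x * x + 2 * b * x + c))
    return total * step


def gaussian_integral_formula(a, b, c):
    """Right-hand side of (9), using the principal branch of a^(-1/2)."""
    return cmath.exp(-cmath.pi * (c - b * b / a)) / cmath.sqrt(a)


# Arbitrary sample parameters with Re(a) > 0.
a, b, c = 1 + 0.5j, 0.3 - 0.2j, 0.1j
numeric = gaussian_integral_numeric(a, b, c)
exact = gaussian_integral_formula(a, b, c)
print(numeric, exact, abs(numeric - exact))
```

Because the integrand is entire and rapidly decaying on the real line, the trapezoid rule converges very quickly here, so the two values should agree to many digits.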

One can extend this sort of analysis to higher dimensions. For any natural number ${n \geq 1}$, let ${\mathcal{SE}({\bf C}^n \rightarrow {\bf C})}$ denote the space of all functions ${f: {\bf C}^n \rightarrow {\bf C}}$ which are jointly entire in the sense that ${f(z_1,\ldots,z_n)}$ can be expressed as a Taylor series in ${z_1,\ldots,z_n}$ which is absolutely convergent for all choices of ${z_1,\ldots,z_n}$, and such that there exists an ${\epsilon > 0}$ such that for any ${A>0}$ there is ${C_A>0}$ for which one has the bound

$\displaystyle |f(z)| \leq C_A (1+|z|)^{-A}$

whenever ${|\hbox{Im}(z_j)| \leq A + \epsilon |\hbox{Re}(z_j)|}$ for all ${1 \leq j \leq n}$, where ${z = \begin{pmatrix} z_1 \\ \vdots \\ z_n \end{pmatrix}}$ and ${|z| := (|z_1|^2+\ldots+|z_n|^2)^{1/2}}$. Again, we call such functions Schwartz entire functions; a typical example is the function

$\displaystyle f(z) := e^{-\pi (z^T A z + 2b^T z + c)}$

where ${A}$ is an ${n \times n}$ complex symmetric matrix with positive definite real part, ${b}$ is a vector in ${{\bf C}^n}$, and ${c}$ is a complex number. We can then define an abstract integration functional ${I: \mathcal{SE}({\bf C}^n \rightarrow {\bf C}) \rightarrow {\bf C}}$ by integration on the real slice ${{\bf R}^n}$:

$\displaystyle I(f) := \int_{{\bf R}^n} f(x)\ dx$

where ${dx}$ is the usual Lebesgue measure on ${{\bf R}^n}$. By contour shifting in each of the ${n}$ variables ${z_1,\ldots,z_n}$ separately, we see that ${I}$ is invariant with respect to complex translations of each of the ${z_j}$ variables, and is thus invariant under translating the joint variable ${z}$ by ${{\bf C}^n}$. One can also verify the scaling law

$\displaystyle I(\delta_A f) = \hbox{det}(A) I(f)$

for ${n \times n}$ complex matrices ${A}$ sufficiently close to the identity, where ${\delta_A f(z) := f(A^{-1} z)}$. This can be seen for shear transformations ${A}$ by Fubini’s theorem and the aforementioned translation invariance, while for diagonal transformations near the identity this can be seen from ${n}$ applications of the one-dimensional scaling law, and the general case then follows by composition. Among other things, these laws then easily lead to the higher-dimensional generalisation

$\displaystyle \int_{{\bf R}^n} e^{-\pi (x^T A x + 2 b^T x + c)}\ dx = \hbox{det}(A)^{-1/2} e^{-\pi (c-b^T A^{-1} b)} \ \ \ \ \ (10)$

whenever ${A}$ is a complex symmetric matrix with positive definite real part, ${b}$ is a vector in ${{\bf C}^n}$, and ${c}$ is a complex number, basically by repeating the one-dimensional argument sketched earlier. Here, we choose the branch of ${\hbox{det}(A)^{-1/2}}$ for all matrices ${A}$ in the indicated class for which ${\hbox{det}(1)^{-1/2} = 1}$.

Now we turn to an integration functional suitable for computing complex gaussian integrals such as

$\displaystyle \int_{{\bf C}^n} e^{-2\pi (z^\dagger A z + b^\dagger z + z^\dagger \tilde b + c)}\ dz d\overline{z}, \ \ \ \ \ (11)$

where ${z}$ is now a complex variable

$\displaystyle z = \begin{pmatrix} z_1 \\ \vdots \\ z_n \end{pmatrix},$

${z^\dagger}$ is the adjoint

$\displaystyle z^\dagger := (\overline{z_1},\ldots, \overline{z_n}),$

${A}$ is a complex ${n \times n}$ matrix with positive definite Hermitian part, ${b, \tilde b}$ are column vectors in ${{\bf C}^n}$, ${c}$ is a complex number, and ${dz d\overline{z} = \prod_{j=1}^n 2 d\hbox{Re}(z_j) d\hbox{Im}(z_j)}$ is ${2^n}$ times Lebesgue measure on ${{\bf C}^n}$. (The factors of two here turn out to be a natural normalisation, but they can be ignored on a first reading.) As we shall see later, such integrals are relevant when performing computations on the Gaussian Unitary Ensemble (GUE) in random matrix theory. Note that the integrand here is not complex analytic due to the presence of the complex conjugates. However, this can be dealt with by the trick of replacing the complex conjugate ${\overline{z}}$ by a variable ${z^*}$ which is formally conjugate to ${z}$, but which is allowed to vary independently of ${z}$. More precisely, let ${\mathcal{SA}({\bf C}^n \times {\bf C}^n \rightarrow {\bf C})}$ be the space of all functions ${f: (z,z^*) \mapsto f(z,z^*)}$ of two independent ${n}$-tuples

$\displaystyle z = \begin{pmatrix} z_1 \\ \vdots \\ z_n \end{pmatrix}, z^* = \begin{pmatrix} z_1^* \\ \vdots \\ z_n^* \end{pmatrix}$

of complex variables, which are jointly entire in all ${2n}$ variables (in the sense defined previously, i.e. there is a joint Taylor series that is absolutely convergent for all independent choices of ${z, z^* \in {\bf C}^n}$), and such that there is an ${\epsilon>0}$ such that for every ${A>0}$ there is ${C_A>0}$ such that one has the bound

$\displaystyle |f(z,z^*)| \leq C_A (1 + |z|)^{-A}$

whenever ${|z^* - \overline{z}| \leq A + \epsilon |z|}$. We will call such functions Schwartz analytic. Note that the integrand in (11) is Schwartz analytic when ${A}$ has positive definite Hermitian part, if we reinterpret ${z^\dagger}$ as the transpose of ${z^*}$ rather than as the adjoint of ${z}$ in order to make the integrand entire in ${z}$ and ${z^*}$. We can then define an abstract integration functional ${I: \mathcal{SA}({\bf C}^n \times {\bf C}^n \rightarrow {\bf C}) \rightarrow {\bf C}}$ by the formula

$\displaystyle I(f) := \int_{{\bf C}^n} f(z,\overline{z})\ dz d\overline{z}, \ \ \ \ \ (12)$

thus ${I}$ can be localised to the slice ${\{ (z,\overline{z}): z \in {\bf C}^n\}}$ of ${{\bf C}^n \times {\bf C}^n}$ (though, as with previous functionals, one can use contour shifting to relocalise ${I}$ to other slices also.) One can also write this integral as

$\displaystyle I(f) = 2^n \int_{{\bf R}^n \times {\bf R}^n} f(x+iy, x-iy)\ dx dy$

and note that the integrand here is a Schwartz entire function on ${{\bf C}^n \times {\bf C}^n}$, thus linking the Schwartz analytic integral with the Schwartz entire integral. Using this connection, one can verify that this functional ${I}$ is invariant with respect to translating ${z}$ and ${z^*}$ by independent shifts in ${{\bf C}^n}$ (thus giving a ${{\bf C}^n \times {\bf C}^n}$ translation symmetry), and one also has the independent dilation symmetry

$\displaystyle I(\delta_{A,B} f) = \hbox{det}(A) \hbox{det}(B) I(f)$

for ${n \times n}$ complex matrices ${A,B}$ that are sufficiently close to the identity, where ${\delta_{A,B} f(z,z^*) := f(A^{-1} z, B^{-1} z^*)}$. Arguing as before, we can then compute (11) as

$\displaystyle \int_{{\bf C}^n} e^{-2\pi (z^\dagger A z + b^\dagger z + z^\dagger \tilde b + c)}\ dz d\overline{z} = \hbox{det}(A)^{-1} e^{-2\pi (c - b^\dagger A^{-1} \tilde b)}. \ \ \ \ \ (13)$

In particular, this gives an integral representation for the determinant-reciprocal ${\hbox{det}(A)^{-1}}$ of a complex ${n \times n}$ matrix with positive definite Hermitian part, in terms of gaussian expressions in which ${A}$ only appears linearly in the exponential:

$\displaystyle \hbox{det}(A)^{-1} = \int_{{\bf C}^n} e^{-2\pi z^\dagger A z}\ dz d\overline{z}.$

This formula is then convenient for computing statistics such as

$\displaystyle \mathop{\bf E} \hbox{det}(W_n-E-i\eta)^{-1}$

for random matrices ${W_n}$ drawn from the Gaussian Unitary Ensemble (GUE), and some choice of spectral parameter ${E+i\eta}$ with ${\eta>0}$; we review this computation later in this post. By the trick of matrix differentiation of the determinant (as reviewed in this recent blog post), one can also use this method to compute matrix-valued statistics such as

$\displaystyle \mathop{\bf E} \hbox{det}(W_n-E-i\eta)^{-1} (W_n-E-i\eta)^{-1}.$
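The ${n=1}$ case of (13), including the factor of two in the measure ${dz d\overline{z}}$, can be checked numerically. In the sketch below (illustrative names and ad hoc grid parameters, all assumptions of this example) the matrix ${A}$ is a single complex number ${a}$ with ${\hbox{Re}(a) > 0}$, so that ${\hbox{det}(A)^{-1} = 1/a}$, and ${b^\dagger z}$ is just ${\overline{b} z}$.

```python
import cmath


def complex_gaussian_numeric(a, b, b_tilde, c, half_width=4.0, step=0.05):
    """Riemann-sum approximation of the n = 1 case of (11), with dz dzbar = 2 dx dy."""
    n = int(round(2 * half_width / step))
    total = 0j
    for j in range(n + 1):
        x = -half_width + j * step
        for k in range(n + 1):
            y = -half_width + k * step
            z = complex(x, y)
            zbar = z.conjugate()
            exponent = a * zbar * z + b.conjugate() * z + zbar * b_tilde + c
            total += cmath.exp(-2 * cmath.pi * exponent)
    return 2 * total * step * step  # the factor 2 comes from dz dzbar = 2 dx dy


def complex_gaussian_formula(a, b, b_tilde, c):
    """Right-hand side of (13) when n = 1."""
    return cmath.exp(-2 * cmath.pi * (c - b.conjugate() * b_tilde / a)) / a


# Arbitrary sample parameters with Re(a) > 0.
a, b, b_tilde, c = 1 + 0.3j, 0.2 + 0.1j, -0.1 + 0.3j, 0.05j
numeric = complex_gaussian_numeric(a, b, b_tilde, c)
exact = complex_gaussian_formula(a, b, b_tilde, c)
print(numeric, exact, abs(numeric - exact))
```

Setting ${b = \tilde b = c = 0}$ in the same code gives a numerical version of the integral representation of ${\hbox{det}(A)^{-1}}$ in this one-dimensional case.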

However, if one restricts attention to classical integrals over real or complex (and in particular, commuting or bosonic) variables, it does not seem possible to easily eradicate the negative determinant factors in such calculations, which is unfortunate because many statistics of interest in random matrix theory, such as the expected Stieltjes transform

$\displaystyle \mathop{\bf E} \frac{1}{n} \hbox{tr} (W_n-E-i\eta)^{-1},$

which is the Stieltjes transform of the density of states, do not contain such determinant factors. However, it turns out (as I learned recently from Peter Sarnak and Tom Spencer) that it is possible to cancel out these negative determinant factors by balancing the bosonic gaussian integrals with an equal number of fermionic gaussian integrals, in which one integrates over a family of anticommuting variables. These fermionic integrals are closer in spirit to the polynomial integral (6) than to Lebesgue type integrals, and in particular obey a scaling law which is inverse to the Lebesgue scaling (in particular, a linear change of fermionic variables ${\zeta \mapsto A \zeta}$ ends up transforming a fermionic integral by ${\hbox{det}(A)}$ rather than ${\hbox{det}(A)^{-1}}$), which conveniently cancels out the reciprocal determinants in the previous calculations. Furthermore, one can combine the bosonic and fermionic integrals into a unified integration concept, known as the Berezin integral (or Grassmann integral), in which one integrates functions of supervectors (vectors with both bosonic and fermionic components), and is of particular importance in the theory of supersymmetry in physics. (The prefix “super” in physics means, roughly speaking, that the object or concept that the prefix is attached to contains both bosonic and fermionic aspects.) When one applies this unified integration concept to gaussians, this can lead to quite compact and efficient calculations (provided that one is willing to work with “super”-analogues of various concepts in classical linear algebra, such as the supertrace or superdeterminant).

Abstract integrals of the flavour of (6) arose in quantum field theory, when physicists sought to formally compute integrals of the form

$\displaystyle \int F( x_1, \ldots, x_n, \xi_1, \ldots, \xi_m )\ dx_1 \ldots dx_n d\xi_1 \ldots d\xi_m \ \ \ \ \ (14)$

where ${x_1,\ldots,x_n}$ are familiar commuting (or bosonic) variables (which, in particular, can often be localised to be scalar variables taking values in ${{\bf R}}$ or ${{\bf C}}$), while ${\xi_1,\ldots,\xi_m}$ were more exotic anticommuting (or fermionic) variables, taking values in some vector space of fermions. (As we shall see shortly, one can formalise these concepts by working in a supercommutative algebra.) The integrand ${F(x_1,\ldots,x_n,\xi_1,\ldots,\xi_m)}$ was a formally analytic function of ${x_1,\ldots,x_n,\xi_1,\ldots,\xi_m}$, in that it could be expanded as a (formal, noncommutative) power series in the variables ${x_1,\ldots,x_n,\xi_1,\ldots,\xi_m}$. For functions ${F(x_1,\ldots,x_n)}$ that depend only on bosonic variables, it is certainly possible for such analytic functions to be in the Schwartz class and thus fall under the scope of the classical integral, as discussed previously. However, functions ${F(\xi_1,\ldots,\xi_m)}$ that depend on fermionic variables ${\xi_1,\ldots,\xi_m}$ behave rather differently. Indeed, a fermionic variable ${\xi}$ must anticommute with itself, so that ${\xi^2 = 0}$. In particular, any power series in ${\xi}$ terminates after the linear term in ${\xi}$, so that a function ${F(\xi)}$ can only be analytic in ${\xi}$ if it is a polynomial of degree at most ${1}$ in ${\xi}$; more generally, an analytic function ${F(\xi_1,\ldots,\xi_m)}$ of ${m}$ fermionic variables ${\xi_1,\ldots,\xi_m}$ must be a polynomial of degree at most ${m}$, and an analytic function ${F(x_1,\ldots,x_n,\xi_1,\ldots,\xi_m)}$ of ${n}$ bosonic and ${m}$ fermionic variables can be Schwartz in the bosonic variables but will be polynomial in the fermionic variables.
As such, to interpret the integral (14), one can use classical (Lebesgue) integration (or the variants discussed above for integrating Schwartz entire or Schwartz analytic functions) for the bosonic variables, but must use abstract integrals such as (6) for the fermionic variables, leading to the concept of Berezin integration mentioned earlier.

In this post I would like to set out some of the basic algebraic formalism of Berezin integration, particularly with regards to integration of gaussian-type expressions, and then show how this formalism can be used to perform computations involving GUE (for instance, one can compute the density of states of GUE by this machinery without recourse to the theory of orthogonal polynomials). The use of supersymmetric gaussian integrals to analyse ensembles such as GUE appears in the work of Efetov (and was also proposed in the slightly earlier works of Parisi-Sourlas and McKane, with a related approach also appearing in the work of Wegner); the material here is adapted from this survey of Mirlin, as well as the later papers of Disertori-Pinson-Spencer and of Disertori.

The determinant ${\det(A)}$ of a square matrix ${A}$ obeys a large number of important identities, the most basic of which is the multiplicativity property

$\displaystyle \det(AB) = \det(A) \det(B) \ \ \ \ \ (1)$

whenever ${A,B}$ are square matrices of the same dimension. This identity then generates many other important identities. For instance, if ${A}$ is an ${n \times m}$ matrix and ${B}$ is an ${m \times n}$ matrix, then by applying the previous identity to equate the determinants of ${\begin{pmatrix} 1 & -A \\ B & 1 \end{pmatrix} \begin{pmatrix} 1 & A \\ 0 & 1 \end{pmatrix}}$ and ${\begin{pmatrix} 1 & A \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & -A \\ B & 1 \end{pmatrix}}$ (where we will adopt the convention that ${1}$ denotes an identity matrix of whatever dimension is needed to make sense of the expressions being computed, and similarly for ${0}$) we obtain the Sylvester determinant identity

$\displaystyle \det( 1 + AB ) = \det( 1 + BA ). \ \ \ \ \ (2)$

This identity, which relates an ${n \times n}$ determinant with an ${m \times m}$ determinant, is very useful in random matrix theory (a point emphasised in particular by Deift), particularly in regimes in which ${m}$ is much smaller than ${n}$.
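
As a quick numerical sanity check of (2), here is a minimal NumPy sketch. The sizes ${n=6}$, ${m=2}$, the random seed, and the variable names are arbitrary choices made for illustration, and a floating-point comparison is of course not a proof.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 2                        # illustrative sizes with m much smaller than n
A = rng.standard_normal((n, m))    # an n x m matrix
B = rng.standard_normal((m, n))    # an m x n matrix

lhs = np.linalg.det(np.eye(n) + A @ B)   # n x n determinant
rhs = np.linalg.det(np.eye(m) + B @ A)   # m x m determinant

# The two determinants should agree up to rounding error.
sylvester_ok = bool(np.isclose(lhs, rhs))
print(lhs, rhs, sylvester_ok)
```

In the regime where ${m}$ is small, the right-hand side is the cheaper of the two determinants to evaluate.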

Another identity generated from (1) arises when trying to compute the determinant of a ${(n+m) \times (n+m)}$ block matrix

$\displaystyle \begin{pmatrix} A & B \\ C & D \end{pmatrix}$

where ${A}$ is an ${n \times n}$ matrix, ${B}$ is an ${n \times m}$ matrix, ${C}$ is an ${m \times n}$ matrix, and ${D}$ is an ${m \times m}$ matrix. If ${A}$ is invertible, then we can manipulate this matrix via block Gaussian elimination as

$\displaystyle \begin{pmatrix} A & B \\ C & D \end{pmatrix} = \begin{pmatrix} A & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & A^{-1} B \\ C & D \end{pmatrix}$

$\displaystyle = \begin{pmatrix} A & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ C & 1 \end{pmatrix} \begin{pmatrix} 1 & A^{-1} B \\ 0 & D - C A^{-1} B \end{pmatrix}$

and on taking determinants using (1) we obtain the Schur determinant identity

$\displaystyle \det \begin{pmatrix} A & B \\ C & D \end{pmatrix} = \det(A) \det( D - C A^{-1} B ) \ \ \ \ \ (3)$

relating the determinant of a block matrix with the determinant of the Schur complement ${D-C A^{-1} B}$ of the upper left block ${A}$. This identity can be viewed as the correct way to generalise the ${2 \times 2}$ determinant formula

$\displaystyle \det \begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad-bc = a ( d - c a^{-1} b).$
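
The Schur determinant identity (3) can be tested numerically in the same spirit. The sketch below is only an illustration under assumptions of my own choosing: the block sizes and seed are arbitrary, and ${A}$ is made strictly diagonally dominant so that its invertibility is guaranteed.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 2

# Diagonal entries are at least 3 while off-diagonal row sums are at most 2,
# so A is strictly diagonally dominant and hence invertible.
A = 4 * np.eye(n) + rng.uniform(-1, 1, (n, n))
B = rng.uniform(-1, 1, (n, m))
C = rng.uniform(-1, 1, (m, n))
D = rng.uniform(-1, 1, (m, m))

M = np.block([[A, B], [C, D]])            # the (n+m) x (n+m) block matrix
schur = D - C @ np.linalg.solve(A, B)     # Schur complement D - C A^{-1} B

lhs = np.linalg.det(M)
rhs = np.linalg.det(A) * np.linalg.det(schur)
schur_det_ok = bool(np.isclose(lhs, rhs))
print(lhs, rhs, schur_det_ok)
```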

It is also possible to use determinant identities to deduce other matrix identities that do not involve the determinant, by the technique of matrix differentiation (or equivalently, matrix linearisation). The key observation is that near the identity, the determinant behaves like the trace, or more precisely one has

$\displaystyle \det( 1 + \epsilon A ) = 1 + \epsilon \hbox{tr}(A) + O(\epsilon^2) \ \ \ \ \ (4)$

for any bounded square matrix ${A}$ and infinitesimal ${\epsilon}$. (If one is uncomfortable with infinitesimals, one can interpret this sort of identity as an asymptotic as ${\epsilon\rightarrow 0}$.) Combining this with (1) we see that for square matrices ${A,B}$ of the same dimension with ${A}$ invertible and ${A^{-1}, B}$ bounded, one has

$\displaystyle \det( A + \epsilon B ) = \det(A) \det(1 + \epsilon A^{-1} B )$

$\displaystyle = \det(A) (1 + \epsilon \hbox{tr}( A^{-1} B ) + O(\epsilon^2) )$

for infinitesimal ${\epsilon}$. To put it another way, if ${A(t)}$ is a square matrix that depends in a differentiable fashion on a real parameter ${t}$, then

$\displaystyle \frac{d}{dt} \det(A(t)) = \det(A(t)) \hbox{tr}( A(t)^{-1} \frac{d}{dt} A(t) )$

whenever ${A(t)}$ is invertible. (Note that if one combines this identity with cofactor expansion, one recovers Cramer’s rule.)
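
One can compare this derivative formula against a central finite difference. In the sketch below, the linear path ${A(t) = A_0 + t A_1}$, the evaluation point ${t = 0.3}$, and the step size are all hypothetical choices made for illustration; the tolerance is deliberately loose, since finite differences are only approximate.

```python
import numpy as np

rng = np.random.default_rng(2)
A0 = 4 * np.eye(3) + rng.uniform(-1, 1, (3, 3))
A1 = rng.uniform(-1, 1, (3, 3))

def A(t):
    """Hypothetical matrix path A(t) = A0 + t*A1, so that dA/dt = A1."""
    return A0 + t * A1

t, h = 0.3, 1e-5   # A(0.3) is still strictly diagonally dominant, hence invertible

# Central finite difference approximation to d/dt det(A(t)).
numeric = (np.linalg.det(A(t + h)) - np.linalg.det(A(t - h))) / (2 * h)

# The formula det(A) tr(A^{-1} dA/dt), using solve() instead of an explicit inverse.
formula = np.linalg.det(A(t)) * np.trace(np.linalg.solve(A(t), A1))

derivative_ok = bool(np.isclose(numeric, formula, rtol=1e-6, atol=1e-6))
print(numeric, formula, derivative_ok)
```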

Let us see some examples of this differentiation method. If we take the Sylvester identity (2) and multiply one of the rectangular matrices ${A}$ by an infinitesimal ${\epsilon}$, we obtain

$\displaystyle \det( 1 + \epsilon A B ) = \det( 1 + \epsilon B A);$

applying (4) and extracting the linear term in ${\epsilon}$ (or equivalently, differentiating in ${\epsilon}$ and then setting ${\epsilon=0}$) we conclude the cyclic property of trace:

$\displaystyle \hbox{tr}(AB) = \hbox{tr}(BA).$

To manipulate derivatives and inverses, we begin with the Neumann series approximation

$\displaystyle (1 + \epsilon A)^{-1} = 1 - \epsilon A + O(\epsilon^2)$

for bounded square ${A}$ and infinitesimal ${\epsilon}$, which then leads to the more general approximation

$\displaystyle (A + \epsilon B)^{-1} = (1 + \epsilon A^{-1} B)^{-1} A^{-1}$

$\displaystyle = A^{-1} - \epsilon A^{-1} B A^{-1} + O(\epsilon^2) \ \ \ \ \ (5)$

for square matrices ${A,B}$ of the same dimension with ${B, A^{-1}}$ bounded. To put it another way, we have

$\displaystyle \frac{d}{dt} A(t)^{-1} = -A(t)^{-1} (\frac{d}{dt} A(t)) A(t)^{-1}$

whenever ${A(t)}$ depends in a differentiable manner on ${t}$ and ${A(t)}$ is invertible.

We can then differentiate (or linearise) the Schur identity (3) in a number of ways. For instance, if we replace the lower block ${D}$ by ${D + \epsilon H}$ for some test ${m \times m}$ matrix ${H}$, then by (4), the left-hand side of (3) becomes (assuming the invertibility of the block matrix)

$\displaystyle (\det \begin{pmatrix} A & B \\ C & D \end{pmatrix}) (1 + \epsilon \hbox{tr} \begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1} \begin{pmatrix} 0 & 0 \\ 0 & H \end{pmatrix} + O(\epsilon^2) )$

while the right-hand side becomes

$\displaystyle \det(A) \det(D-CA^{-1}B) (1 + \epsilon \hbox{tr}( (D-CA^{-1}B)^{-1} H ) + O(\epsilon^2) );$

extracting the linear term in ${\epsilon}$ (after dividing through by (3)), we conclude that

$\displaystyle \hbox{tr} (\begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1} \begin{pmatrix} 0 & 0 \\ 0 & H \end{pmatrix}) = \hbox{tr}( (D-CA^{-1}B)^{-1} H ).$

As ${H}$ was an arbitrary ${m \times m}$ matrix, we conclude from duality that the lower right ${m \times m}$ block of ${\begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1}}$ is given by the inverse ${(D-CA^{-1}B)^{-1}}$ of the Schur complement:

$\displaystyle \begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1} = \begin{pmatrix} ?? & ?? \\ ?? & (D-CA^{-1}B)^{-1} \end{pmatrix}.$
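
To illustrate this conclusion numerically, the following sketch inverts a block matrix and compares its lower right block with the inverse of the Schur complement. The sizes, seed, and diagonally dominant construction are assumptions chosen purely so that every inverse involved is guaranteed to exist.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 3, 2

# Strictly diagonally dominant (diagonal >= 5, off-diagonal row sums <= 4),
# so both M and its upper left block A are invertible.
M = 6 * np.eye(n + m) + rng.uniform(-1, 1, (n + m, n + m))
A, B = M[:n, :n], M[:n, n:]
C, D = M[n:, :n], M[n:, n:]

schur = D - C @ np.linalg.solve(A, B)       # D - C A^{-1} B
lower_right = np.linalg.inv(M)[n:, n:]      # lower right m x m block of M^{-1}

block_inverse_ok = bool(np.allclose(lower_right, np.linalg.inv(schur)))
print(block_inverse_ok)
```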

One can also compute the other components of this inverse in terms of the Schur complement ${D-CA^{-1} B}$ by a similar method (although the formulae become more complicated). As a variant of this method, we can perturb the block matrix in (3) by an infinitesimal multiple of the identity matrix giving

$\displaystyle \det \begin{pmatrix} A+\epsilon & B \\ C & D+\epsilon \end{pmatrix} = \det(A+\epsilon) \det( D +\epsilon - C (A+\epsilon)^{-1} B ). \ \ \ \ \ (6)$

By (4), the left-hand side is

$\displaystyle (\det \begin{pmatrix} A & B \\ C & D \end{pmatrix}) (1 + \epsilon \hbox{tr} \begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1} + O(\epsilon^2) ).$

From (5), we have

$\displaystyle D + \epsilon - C (A+ \epsilon)^{-1} B = D - C A^{-1} B + \epsilon(1 + C A^{-2} B) + O(\epsilon^2)$

and so from (4) the right-hand side of (6) is

$\displaystyle \det(A) \det(D-CA^{-1} B) \times$

$\displaystyle \times ( 1 + \epsilon (\hbox{tr}(A^{-1}) + \hbox{tr}( (D-CA^{-1} B)^{-1} (1 + C A^{-2} B)) ) + O(\epsilon^2) );$

extracting the linear component in ${\epsilon}$, we conclude the identity

$\displaystyle \hbox{tr} \begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1} = \hbox{tr}(A^{-1}) + \hbox{tr}( (D-CA^{-1} B)^{-1} (1 + C A^{-2} B)) \ \ \ \ \ (7)$

which relates the trace of the inverse of a block matrix, with the trace of the inverse of one of its blocks. This particular identity turns out to be useful in random matrix theory; I hope to elaborate on this in a later post.
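
Identity (7) is easy to get wrong by a sign or a misplaced factor, so a numerical check can be reassuring. This is a minimal sketch under the same kind of assumptions as before (arbitrary sizes and seed, and a diagonally dominant block matrix so that all inverses exist).

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 3, 2

# Strictly diagonally dominant, hence M and its upper left block A are invertible.
M = 6 * np.eye(n + m) + rng.uniform(-1, 1, (n + m, n + m))
A, B = M[:n, :n], M[:n, n:]
C, D = M[n:, :n], M[n:, n:]

Ainv = np.linalg.inv(A)
S = D - C @ Ainv @ B                         # Schur complement of A

lhs = np.trace(np.linalg.inv(M))
# tr(A^{-1}) + tr( S^{-1} (1 + C A^{-2} B) )
rhs = np.trace(Ainv) + np.trace(np.linalg.solve(S, np.eye(m) + C @ Ainv @ Ainv @ B))

trace_identity_ok = bool(np.isclose(lhs, rhs))
print(lhs, rhs, trace_identity_ok)
```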

As a final example of this method, we can analyse low rank perturbations ${A+BC}$ of a large (${n \times n}$) matrix ${A}$, where ${B}$ is an ${n \times m}$ matrix and ${C}$ is an ${m \times n}$ matrix for some ${m < n}$. (This type of situation is also common in random matrix theory, for instance it arose in this previous paper of mine on outliers to the circular law.) If ${A}$ is invertible, then from (1) and (2) one has the matrix determinant lemma

$\displaystyle \det( A + BC ) = \det(A) \det( 1 + A^{-1} BC) = \det(A) \det(1 + CA^{-1} B);$

if one then perturbs ${A}$ by an infinitesimal matrix ${\epsilon H}$, we have

$\displaystyle \det( A + BC + \epsilon H ) = \det(A + \epsilon H ) \det(1 + C(A+\epsilon H)^{-1} B).$

Extracting the linear component in ${\epsilon}$ as before, one soon arrives at

$\displaystyle \hbox{tr}( (A+BC)^{-1} H ) = \hbox{tr}( A^{-1} H ) - \hbox{tr}( (1 + C A^{-1} B)^{-1} C A^{-1} H A^{-1} B )$

assuming that ${A}$ and ${A+BC}$ are both invertible; as ${H}$ is arbitrary, we conclude (after using the cyclic property of trace) the Sherman-Morrison formula

$\displaystyle (A+BC)^{-1} = A^{-1} - A^{-1} B (1 + C A^{-1} B)^{-1} C A^{-1}$

for the inverse of a low rank perturbation ${A+BC}$ of a matrix ${A}$. While this identity can be easily verified by direct algebraic computation, it is somewhat difficult to discover this identity by such algebraic manipulation; thus we see that the “determinant first” approach to matrix identities can make it easier to find appropriate matrix identities (particularly those involving traces and/or inverses), even if the identities one is ultimately interested in do not involve determinants. (As differentiation typically makes an identity lengthier, but also more “linear” or “additive”, the determinant identity tends to be shorter (albeit more nonlinear and more multiplicative) than the differentiated identity, and can thus be slightly easier to derive.)
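
Both the matrix determinant lemma and the inverse formula above can be checked numerically. In this sketch the sizes ${n=5}$, ${m=2}$, the seed, and the scaling of ${B}$ and ${C}$ are assumptions chosen so that ${A}$ and ${A+BC}$ are strictly diagonally dominant, hence invertible; the name `K` for the small ${m \times m}$ matrix ${1 + CA^{-1}B}$ is likewise just a label of convenience.

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 5, 2

A = 8 * np.eye(n) + rng.uniform(-1, 1, (n, n))   # diagonal >= 7, off-diagonal sums <= 4
B = rng.uniform(-0.5, 0.5, (n, m))
C = rng.uniform(-0.5, 0.5, (m, n))               # entries of BC are at most 0.5 in size

Ainv = np.linalg.inv(A)
K = np.eye(m) + C @ Ainv @ B                     # the small m x m matrix 1 + C A^{-1} B

# Matrix determinant lemma: det(A + BC) = det(A) det(1 + C A^{-1} B).
det_lemma_ok = bool(np.isclose(np.linalg.det(A + B @ C),
                               np.linalg.det(A) * np.linalg.det(K)))

# Inverse of the low rank perturbation, using only an m x m solve beyond A^{-1}.
low_rank_inverse = Ainv - Ainv @ B @ np.linalg.solve(K, C @ Ainv)
inverse_ok = bool(np.allclose(np.linalg.inv(A + B @ C), low_rank_inverse))
print(det_lemma_ok, inverse_ok)
```

The practical appeal is that once ${A^{-1}}$ is known, updating it after a rank ${m}$ perturbation only requires solving an ${m \times m}$ system.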

Exercise 1 Use the “determinant first” approach to derive the Woodbury matrix identity (also known as the binomial inverse theorem)

$\displaystyle (A+BDC)^{-1} = A^{-1} - A^{-1} B (D^{-1} + CA^{-1} B)^{-1} C A^{-1}$

where ${A}$ is an ${n \times n}$ matrix, ${B}$ is an ${n \times m}$ matrix, ${C}$ is an ${m \times n}$ matrix, and ${D}$ is an ${m \times m}$ matrix, assuming that ${A}$, ${D}$ and ${A+BDC}$ are all invertible.

Exercise 2 Let ${A,B}$ be invertible ${n \times n}$ matrices. Establish the identity

$\displaystyle \det(A + B) \det(A - B) = \det(B) \det( AB^{-1} A - B)$

and differentiate this in ${A}$ to deduce the identity

$\displaystyle (A+B)^{-1} + (A-B)^{-1} = 2 (A - BA^{-1} B)^{-1}$

(assuming that all inverses exist) and hence

$\displaystyle (A+B)^{-1} = (A - BA^{-1} B)^{-1} + (B - AB^{-1} A)^{-1}.$

Rotating ${B}$ by ${i}$ then gives

$\displaystyle (A+iB)^{-1} = (A + BA^{-1} B)^{-1} - i (B + AB^{-1} A)^{-1},$

which is useful for inverting a matrix ${A+iB}$ that has been split into a self-adjoint component ${A}$ and a skew-adjoint component ${iB}$.
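
Without spoiling the exercise, one can at least confirm the final formula for ${(A+iB)^{-1}}$ numerically. The sketch below assumes, for convenience only, that ${A}$ and ${B}$ are real symmetric positive definite, which guarantees that every inverse appearing in the formula exists; the size and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 4
G = rng.standard_normal((n, n))
H = rng.standard_normal((n, n))
A = G @ G.T + np.eye(n)      # symmetric positive definite
B = H @ H.T + np.eye(n)      # symmetric positive definite

lhs = np.linalg.inv(A + 1j * B)

# (A + B A^{-1} B)^{-1} - i (B + A B^{-1} A)^{-1}
real_part = np.linalg.inv(A + B @ np.linalg.solve(A, B))
imag_part = -np.linalg.inv(B + A @ np.linalg.solve(B, A))
rhs = real_part + 1j * imag_part

split_inverse_ok = bool(np.allclose(lhs, rhs))
print(split_inverse_ok)
```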

Mathematicians study a variety of different mathematical structures, but perhaps the structures that are most commonly associated with mathematics are the number systems, such as the integers ${{\bf Z}}$ or the real numbers ${{\bf R}}$. Indeed, the use of number systems is so closely identified with the practice of mathematics that one sometimes forgets that it is possible to do mathematics without explicit reference to any concept of number. For instance, the ancient Greeks were able to prove many theorems in Euclidean geometry, well before the development of Cartesian coordinates and analytic geometry in the seventeenth century, or the formal constructions or axiomatisations of the real number system that emerged in the nineteenth century (not to mention precursor concepts such as zero or negative numbers, whose very existence was highly controversial, if entertained at all, to the ancient Greeks). To do this, the Greeks used geometric operations as substitutes for the arithmetic operations that would be more familiar to modern mathematicians. For instance, concatenation of line segments or planar regions serves as a substitute for addition; the operation of forming a rectangle out of two line segments would serve as a substitute for multiplication; the concept of similarity can be used as a substitute for ratios or division; and so forth.

A similar situation exists in modern physics. Physical quantities such as length, mass, momentum, charge, and so forth are routinely measured and manipulated using the real number system ${{\bf R}}$ (or related systems, such as ${{\bf R}^3}$ if one wishes to measure a vector-valued physical quantity such as velocity). Much as analytic geometry allows one to use the laws of algebra and trigonometry to calculate and prove theorems in geometry, the identification of physical quantities with numbers allows one to express physical laws and relationships (such as Einstein’s famous mass-energy equivalence ${E=mc^2}$) as algebraic (or differential) equations, which can then be solved and otherwise manipulated through the extensive mathematical toolbox that has been developed over the centuries to deal with such equations.

However, as any student of physics is aware, most physical quantities are not represented purely by one or more numbers, but instead by a combination of a number and some sort of unit. For instance, it would be a category error to assert that the length of some object was a number such as ${10}$; instead, one has to say something like “the length of this object is ${10}$ yards”, combining both a number ${10}$ and a unit (in this case, the yard). Changing the unit leads to a change in the numerical value assigned to this physical quantity, even though no physical change to the object being measured has occurred. For instance, if one decides to use feet as the unit of length instead of yards, then the length of the object is now ${30}$ feet; if one instead uses metres, the length is now ${9.144}$ metres; and so forth. But nothing physical has changed when performing this change of units, and these lengths are considered all equal to each other:

$\displaystyle 10 \hbox{ yards } = 30 \hbox{ feet } = 9.144 \hbox{ metres}.$
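
The arithmetic behind this chain of equalities can be sketched as follows. The table `METRES_PER` and the helper `convert` are hypothetical names introduced for this illustration; the conversion factors used are the exact international definitions of the yard (0.9144 metres) and the foot (0.3048 metres), and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction

# Length of each unit expressed in metres (exact rational values).
METRES_PER = {
    "yard": Fraction(9144, 10000),
    "foot": Fraction(3048, 10000),
    "metre": Fraction(1),
}

def convert(value, from_unit, to_unit):
    """Re-express a length given in from_unit as a number of to_unit."""
    return Fraction(value) * METRES_PER[from_unit] / METRES_PER[to_unit]

in_feet = convert(10, "yard", "foot")
in_metres = convert(10, "yard", "metre")
print(in_feet, float(in_metres))   # 30 and 9.144
```

The physical length never changes here; only the number attached to it does, inversely to the size of the unit.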

It is then common to declare that while physical quantities and units are not, strictly speaking, numbers, they should be manipulated using the laws of algebra as if they were numerical quantities. For instance, if an object travels ${10}$ metres in ${5}$ seconds, then its speed should be

$\displaystyle (10 m) / (5 s) = 2 ms^{-1}$

where we use the usual abbreviations of ${m}$ and ${s}$ for metres and seconds respectively. Similarly, if the speed of light ${c}$ is ${c=299 792 458 ms^{-1}}$ and an object has mass ${10 kg}$, then Einstein’s mass-energy equivalence ${E=mc^2}$ tells us that the energy-content of this object is

$\displaystyle (10 kg) (299 792 458 ms^{-1})^2 \approx 8.99 \times 10^{17} kg m^2 s^{-2}.$

Note that the symbols ${kg, m, s}$ are being manipulated algebraically as if they were mathematical variables such as ${x}$ and ${y}$. By collecting all these units together, we see that every physical quantity gets assigned a unit of a certain dimension: for instance, we see here that the energy ${E}$ of an object can be given the unit of ${kg m^2 s^{-2}}$ (more commonly known as a Joule), which has the dimension of ${M L^2 T^{-2}}$ where ${M, L, T}$ are the dimensions of mass, length, and time respectively.

There is however one important limitation to the ability to manipulate “dimensionful” quantities as if they were numbers: one is not supposed to add, subtract, or compare two physical quantities if they have different dimensions, although it is acceptable to multiply or divide two such quantities. For instance, if ${m}$ is a mass (having the dimension ${M}$) and ${v}$ is a speed (having the dimension ${LT^{-1}}$), then it is physically “legitimate” to form an expression such as ${\frac{1}{2} mv^2}$, but not an expression such as ${m+v}$ or ${m-v}$; in a similar spirit, statements such as ${m=v}$ or ${m\geq v}$ are physically meaningless. This combines well with the mathematical distinction between vector, scalar, and matrix quantities, which among other things prohibits one from adding together two such quantities if their vector or matrix types are different (e.g. one cannot add a scalar to a vector, or a vector to a matrix), and also places limitations on when two such quantities can be multiplied together. A related limitation, which is not always made explicit in physics texts, is that transcendental mathematical functions such as ${\sin}$ or ${\exp}$ should only be applied to arguments that are dimensionless; thus, for instance, if ${v}$ is a speed, then ${\hbox{arctanh}(v)}$ is not physically meaningful, but ${\hbox{arctanh}(v/c)}$ is (this particular quantity is known as the rapidity associated to this speed).
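
These rules are simple enough to enforce mechanically. The following is a minimal sketch, not a full units library: the class name `Quantity` is hypothetical, dimensions are stored as a tuple of exponents of ${(M, L, T)}$, and numerical values are assumed to be expressed in one fixed system of units (kilograms, metres, seconds).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Quantity:
    value: float
    dim: tuple  # exponents of (M, L, T)

    def __mul__(self, other):
        # Multiplication is always allowed: exponents add.
        return Quantity(self.value * other.value,
                        tuple(a + b for a, b in zip(self.dim, other.dim)))

    def __truediv__(self, other):
        # Division is always allowed: exponents subtract.
        return Quantity(self.value / other.value,
                        tuple(a - b for a, b in zip(self.dim, other.dim)))

    def __pow__(self, k):
        return Quantity(self.value ** k, tuple(a * k for a in self.dim))

    def __add__(self, other):
        # Addition is only allowed between quantities of the same dimension.
        if self.dim != other.dim:
            raise TypeError("cannot add quantities of different dimensions")
        return Quantity(self.value + other.value, self.dim)

mass = Quantity(10.0, (1, 0, 0))       # 10 kg
speed = Quantity(2.0, (0, 1, -1))      # 2 m s^-1
half = Quantity(0.5, (0, 0, 0))        # a dimensionless constant
kinetic = half * mass * speed ** 2     # (1/2) m v^2, dimension M L^2 T^-2

try:
    mass + speed                       # dimensionally inconsistent
    addition_rejected = False
except TypeError:
    addition_rejected = True
print(kinetic, addition_rejected)
```

A fuller version would similarly restrict comparisons, and would only permit functions such as `math.exp` on quantities whose `dim` is `(0, 0, 0)`.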

These limitations may seem like a weakness in the mathematical modeling of physical quantities; one may think that one could get a more “powerful” mathematical framework if one were allowed to perform dimensionally inconsistent operations, such as add together a mass and a velocity, add together a vector and a scalar, exponentiate a length, etc. Certainly there is some precedent for this in mathematics; for instance, the formalism of Clifford algebras does in fact allow one to (among other things) add vectors with scalars, and in differential geometry it is quite common to formally apply transcendental functions (such as the exponential function) to a differential form (for instance, the Liouville measure ${\frac{1}{n!} \omega^n}$ of a symplectic manifold can be usefully thought of as a component of the exponential ${\exp(\omega)}$ of the symplectic form ${\omega}$).

However, there are several reasons why it is advantageous to retain the limitation to only perform dimensionally consistent operations. One is that of error correction: one can often catch (and correct for) errors in one’s calculations by discovering a dimensional inconsistency, and tracing it back to the first step where it occurs. Also, by performing dimensional analysis, one can often identify the form of a physical law before one has fully derived it. For instance, if one postulates the existence of a mass-energy relationship involving only the mass of an object ${m}$, the energy content ${E}$, and the speed of light ${c}$, dimensional analysis is already sufficient to deduce that the relationship must be of the form ${E = \alpha mc^2}$ for some dimensionless absolute constant ${\alpha}$; the only remaining task is then to work out the constant of proportionality ${\alpha}$, which requires physical arguments beyond that provided by dimensional analysis. (This is a simple instance of a more general application of dimensional analysis known as the Buckingham ${\pi}$ theorem.)
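
The mass-energy example can be made concrete with a small search. Assuming a relationship of the form ${E = \alpha m^a c^b}$ with small integer exponents (the search range below is an arbitrary choice), matching the exponents of ${(M, L, T)}$ on both sides forces ${a=1}$ and ${b=2}$:

```python
from itertools import product

# Dimensions as exponents of (M, L, T).
MASS = (1, 0, 0)
SPEED = (0, 1, -1)
ENERGY = (1, 2, -2)

# Find all (a, b) in a small range such that m^a c^b has the dimension of energy.
solutions = [
    (a, b)
    for a, b in product(range(-4, 5), repeat=2)
    if tuple(a * m + b * c for m, c in zip(MASS, SPEED)) == ENERGY
]
print(solutions)  # [(1, 2)]
```

Nothing in this computation determines the dimensionless constant ${\alpha}$, in line with the remark above.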

Units and dimensional analysis have certainly proven to be very effective tools in physics. But one can pose the question of whether they have a properly grounded mathematical foundation, in order to settle any lingering unease about using such tools in physics, and also in order to rigorously develop such tools for purely mathematical purposes (such as analysing identities and inequalities in such fields of mathematics as harmonic analysis or partial differential equations).

The example of Euclidean geometry mentioned previously offers one possible approach to formalising the use of dimensions. For instance, one could model the length of a line segment not by a number, but rather by the equivalence class of all line segments congruent to the original line segment (cf. the Frege-Russell definition of a number). Similarly, the area of a planar region can be modeled not by a number, but by the equivalence class of all regions that are equidecomposable with the original region (one can, if one wishes, restrict attention here to measurable sets in order to avoid Banach-Tarski-type paradoxes, though that particular paradox actually only arises in three and higher dimensions). As mentioned before, it is then geometrically natural to multiply two lengths to form an area, by taking a rectangle whose line segments have the stated lengths, and using the area of that rectangle as a product. This geometric picture works well for units such as length and volume that have a spatial geometric interpretation, but it is less clear how to apply it for more general units. For instance, it does not seem geometrically natural (or, for that matter, conceptually helpful) to envision the equation ${E=mc^2}$ as the assertion that the energy ${E}$ is the volume of a rectangular box whose height is the mass ${m}$ and whose length and width are given by the speed of light ${c}$.

But there are at least two other ways to formalise dimensionful quantities in mathematics, which I will discuss below the fold. The first is a “parametric” model in which dimensionful objects are modeled as numbers (or vectors, matrices, etc.) depending on some base dimensional parameters (such as units of length, mass, and time, or perhaps a coordinate system for space or spacetime), and transforming according to some representation of a structure group that encodes the range of these parameters; this type of “coordinate-heavy” model is often used (either implicitly or explicitly) by physicists in order to efficiently perform calculations, particularly when manipulating vector or tensor-valued quantities. The second is an “abstract” model in which dimensionful objects now live in an abstract mathematical space (e.g. an abstract vector space), in which only a subset of the operations available to general-purpose number systems such as ${{\bf R}}$ or ${{\bf R}^3}$ are available, namely those operations which are “dimensionally consistent” or invariant (or more precisely, equivariant) with respect to the action of the underlying structure group. This sort of “coordinate-free” approach tends to be the one which is preferred by pure mathematicians, particularly in the various branches of modern geometry, in part because it can lead to greater conceptual clarity, as well as results of great generality; it is also close to the more informal practice of treating mathematical manipulations that do not preserve dimensional consistency as being physically meaningless.

Given a function ${f: X \rightarrow Y}$ between two sets ${X, Y}$, we can form the graph

$\displaystyle \Sigma := \{ (x,f(x)): x\in X \},$

which is a subset of the Cartesian product ${X \times Y}$.

There are a number of “closed graph theorems” in mathematics which relate the regularity properties of the function ${f}$ with the closure properties of the graph ${\Sigma}$, assuming some “completeness” properties of the domain ${X}$ and range ${Y}$. The most famous of these is the closed graph theorem from functional analysis, which I phrase as follows:

Theorem 1 (Closed graph theorem (functional analysis)) Let ${X, Y}$ be complete normed vector spaces over the reals (i.e. Banach spaces). Then a function ${f: X \rightarrow Y}$ is a continuous linear transformation if and only if the graph ${\Sigma := \{ (x,f(x)): x \in X \}}$ is both linearly closed (i.e. it is a linear subspace of ${X \times Y}$) and topologically closed (i.e. closed in the product topology of ${X \times Y}$).

I like to think of this theorem as linking together qualitative and quantitative notions of regularity preservation properties of an operator ${f}$; see this blog post for further discussion.

The theorem is equivalent to the assertion that any continuous linear bijection ${f: X \rightarrow Y}$ from one Banach space to another is necessarily an isomorphism in the sense that the inverse map is also continuous and linear. Indeed, to see that this claim implies the closed graph theorem, one applies it to the projection from ${\Sigma}$ to ${X}$, which is a continuous linear bijection; conversely, to deduce this claim from the closed graph theorem, observe that the graph of the inverse ${f^{-1}}$ is the reflection of the graph of ${f}$. As such, the closed graph theorem is a corollary of the open mapping theorem, which asserts that any continuous linear surjection from one Banach space to another is open. (Conversely, one can deduce the open mapping theorem from the closed graph theorem by quotienting out the kernel of the continuous surjection to get a bijection.)

It turns out that there is a closed graph theorem (or equivalent reformulations of that theorem, such as an assertion that bijective morphisms between sufficiently “complete” objects are necessarily isomorphisms, or as an open mapping theorem) in many other categories in mathematics as well. Here are some easy ones:

Theorem 2 (Closed graph theorem (linear algebra)) Let ${X, Y}$ be vector spaces over a field ${k}$. Then a function ${f: X \rightarrow Y}$ is a linear transformation if and only if the graph ${\Sigma := \{ (x,f(x)): x \in X \}}$ is linearly closed.

Theorem 3 (Closed graph theorem (group theory)) Let ${X, Y}$ be groups. Then a function ${f: X \rightarrow Y}$ is a group homomorphism if and only if the graph ${\Sigma := \{ (x,f(x)): x \in X \}}$ is closed under the group operations (i.e. it is a subgroup of ${X \times Y}$).

Theorem 4 (Closed graph theorem (order theory)) Let ${X, Y}$ be totally ordered sets. Then a function ${f: X \rightarrow Y}$ is monotone increasing if and only if the graph ${\Sigma := \{ (x,f(x)): x \in X \}}$ is totally ordered (using the product order on ${X \times Y}$).
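
For finite structures, Theorems 3 and 4 can be confirmed by brute force. The sketch below is only an illustration on small examples of my own choosing: the cyclic groups ${{\bf Z}/4}$ and ${{\bf Z}/6}$ for Theorem 3, and the chains ${\{0,1,2,3\}}$ and ${\{0,1,2\}}$ for Theorem 4, with “monotone increasing” read as non-decreasing. A function is encoded as the tuple of its values.

```python
from itertools import product

# Theorem 3 for X = Z/4 and Y = Z/6 (additive cyclic groups).
n, m = 4, 6

def is_homomorphism(f):
    return all(f[(x + y) % n] == (f[x] + f[y]) % m
               for x in range(n) for y in range(n))

def graph_is_subgroup(f):
    graph = {(x, f[x]) for x in range(n)}
    # A nonempty finite subset closed under the group law is a subgroup.
    return all(((x1 + x2) % n, (y1 + y2) % m) in graph
               for (x1, y1) in graph for (x2, y2) in graph)

group_maps = list(product(range(m), repeat=n))   # all 6^4 functions Z/4 -> Z/6
theorem3_ok = all(is_homomorphism(f) == graph_is_subgroup(f) for f in group_maps)

# Theorem 4 for the chains X = {0,1,2,3} and Y = {0,1,2}.
def is_monotone(f):
    return all(f[i] <= f[i + 1] for i in range(len(f) - 1))

def graph_is_chain(f):
    points = list(enumerate(f))
    # Any two points of the graph must be comparable in the product order.
    return all((x1 <= x2 and y1 <= y2) or (x2 <= x1 and y2 <= y1)
               for (x1, y1) in points for (x2, y2) in points)

order_maps = list(product(range(3), repeat=4))   # all 3^4 functions between the chains
theorem4_ok = all(is_monotone(f) == graph_is_chain(f) for f in order_maps)

print(theorem3_ok, theorem4_ok)
```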

Remark 1 Similar results to the above three theorems (with similarly easy proofs) hold for other algebraic structures, such as rings (using the usual product of rings), modules, algebras, or Lie algebras, groupoids, or even categories (a map between categories is a functor iff its graph is again a category). (ADDED IN VIEW OF COMMENTS: further examples include affine spaces and ${G}$-sets (sets with an action of a given group ${G}$).) There are also various approximate versions of this theorem that are useful in arithmetic combinatorics, that relate the property of a map ${f}$ being an “approximate homomorphism” in some sense with its graph being an “approximate group” in some sense. This is particularly useful for this subfield of mathematics because there are currently more theorems about approximate groups than about approximate homomorphisms, so that one can profitably use closed graph theorems to transfer results about the former to results about the latter.

A slightly more sophisticated result in the same vein:

Theorem 5 (Closed graph theorem (point set topology)) Let ${X, Y}$ be compact Hausdorff spaces. Then a function ${f: X \rightarrow Y}$ is continuous if and only if the graph ${\Sigma := \{ (x,f(x)): x \in X \}}$ is topologically closed.

Indeed, the “only if” direction is easy, while for the “if” direction, note that if ${\Sigma}$ is a closed subset of ${X \times Y}$, then it is compact Hausdorff, and the projection map from ${\Sigma}$ to ${X}$ is then a bijective continuous map between compact Hausdorff spaces, which is then closed, thus open, and hence a homeomorphism, giving the claim.

Note that the compactness hypothesis is necessary: for instance, the function ${f: {\bf R} \rightarrow {\bf R}}$ defined by ${f(x) := 1/x}$ for ${x \neq 0}$ and ${f(0) := 0}$ is a function which has a closed graph, but is discontinuous.

A similar result (but relying on a much deeper theorem) is available in algebraic geometry, as I learned after asking this MathOverflow question:

Theorem 6 (Closed graph theorem (algebraic geometry)) Let ${X, Y}$ be normal projective varieties over an algebraically closed field ${k}$ of characteristic zero. Then a function ${f: X \rightarrow Y}$ is a regular map if and only if the graph ${\Sigma := \{ (x,f(x)): x \in X \}}$ is Zariski-closed.

Proof: (Sketch) For the only if direction, note that the map ${x \mapsto (x,f(x))}$ is a regular map from the projective variety ${X}$ to the projective variety ${X \times Y}$ and is thus a projective morphism, hence is proper. In particular, the image ${\Sigma}$ of ${X}$ under this map is Zariski-closed.

Conversely, if ${\Sigma}$ is Zariski-closed, then it is also a projective variety, and the projection ${(x,y) \mapsto x}$ is a projective morphism from ${\Sigma}$ to ${X}$, which is clearly quasi-finite; by the characteristic zero hypothesis, it is also separated. Applying (Grothendieck’s form of) Zariski’s main theorem, this projection is the composition of an open immersion and a finite map. As projective varieties are complete, the open immersion is an isomorphism, and so the projection from ${\Sigma}$ to ${X}$ is finite. Being injective and separable, the degree of this finite map must be one, and hence ${k(\Sigma)}$ and ${k(X)}$ are isomorphic, hence (by normality of ${X}$) ${k[\Sigma]}$ is contained in (the image of) ${k[X]}$, which makes the map from ${X}$ to ${\Sigma}$ regular, which makes ${f}$ regular. $\Box$

The counterexample of the map ${f: k \rightarrow k}$ given by ${f(x) := 1/x}$ for ${x \neq 0}$ and ${f(0) := 0}$ demonstrates why the projective hypothesis is necessary. The necessity of the normality condition (or more precisely, a weak normality condition) is demonstrated by (the projective version of) the map ${(t^2,t^3) \mapsto t}$ from the cuspidal curve ${\{ (t^2,t^3): t \in k \}}$ to ${k}$. (If one restricts attention to smooth varieties, though, normality becomes automatic.) The necessity of characteristic zero is demonstrated by (the projective version of) the inverse of the Frobenius map ${x \mapsto x^p}$ on a field ${k}$ of characteristic ${p}$.

There are also a number of closed graph theorems for topological groups, of which the following is typical (see Exercise 3 of these previous blog notes):

Theorem 7 (Closed graph theorem (topological group theory)) Let ${X, Y}$ be ${\sigma}$-compact, locally compact Hausdorff groups. Then a function ${f: X \rightarrow Y}$ is a continuous homomorphism if and only if the graph ${\Sigma := \{ (x,f(x)): x \in X \}}$ is both group-theoretically closed and topologically closed.

The hypotheses of being ${\sigma}$-compact, locally compact, and Hausdorff can be relaxed somewhat, but I doubt that they can be eliminated entirely (though I do not have a ready counterexample for this).

In several complex variables, it is a classical theorem (see e.g. Lemma 4 of this blog post) that a holomorphic function from a domain in ${{\bf C}^n}$ to ${{\bf C}^n}$ is locally injective if and only if it is a local diffeomorphism (i.e. its derivative is everywhere non-singular). This leads to a closed graph theorem for complex manifolds:

Theorem 8 (Closed graph theorem (complex manifolds)) Let ${X, Y}$ be complex manifolds. Then a function ${f: X \rightarrow Y}$ is holomorphic if and only if the graph ${\Sigma := \{ (x,f(x)): x \in X \}}$ is a complex manifold (using the complex structure inherited from ${X \times Y}$) of the same dimension as ${X}$.

Indeed, one applies the previous observation to the projection from ${\Sigma}$ to ${X}$. The dimension requirement is needed, as can be seen from the example of the map ${f: {\bf C} \rightarrow {\bf C}}$ defined by ${f(z) =1/z}$ for ${z \neq 0}$ and ${f(0)=0}$.

(ADDED LATER:) There is a real analogue to the above theorem:

Theorem 9 (Closed graph theorem (real manifolds)) Let ${X, Y}$ be real manifolds. Then a function ${f: X \rightarrow Y}$ is continuous if and only if the graph ${\Sigma := \{ (x,f(x)): x \in X \}}$ is a real manifold of the same dimension as ${X}$.

This theorem can be proven by applying invariance of domain (discussed in this previous post) to the projection of ${\Sigma}$ to ${X}$, to show that it is open if ${\Sigma}$ has the same dimension as ${X}$.

Note though that the analogous claim for smooth real manifolds fails: the function ${f: {\bf R} \rightarrow {\bf R}}$ defined by ${f(x) := x^{1/3}}$ has a smooth graph, but is not itself smooth.
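A quick numerical illustration of this counterexample may be helpful. The following is only a sketch (the helper name `cbrt` and the sample step sizes are choices made here for illustration): the graph is the image of the smooth embedding ${y \mapsto (y^3,y)}$, yet the difference quotients of ${f}$ at the origin grow like ${h^{-2/3}}$, so ${f}$ is not even differentiable there.

```python
import math

def cbrt(x):
    """Real cube root, defined for negative inputs as well."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)

# The graph {(x, x^(1/3))} coincides with the smooth curve {(y^3, y)}:
# every point (y^3, y) satisfies cbrt(first coordinate) = second coordinate.
samples = [-2.0, -0.5, 0.0, 0.3, 1.7]
on_graph = all(abs(cbrt(y ** 3) - y) < 1e-12 for y in samples)

# Difference quotients (f(h) - f(0)) / h at the origin behave like h^(-2/3),
# so they are unbounded as h -> 0.
steps = [1e-2, 1e-4, 1e-6]
quotients = [(cbrt(h) - cbrt(0.0)) / h for h in steps]

print(on_graph)
print(quotients)
```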

(ADDED YET LATER:) Here is an easy closed graph theorem in the symplectic category:

Theorem 10 (Closed graph theorem (symplectic geometry)) Let ${X = (X,\omega_X)}$ and ${Y = (Y,\omega_Y)}$ be smooth symplectic manifolds of the same dimension. Then a smooth map ${f: X \rightarrow Y}$ is a symplectic morphism (i.e. ${f^* \omega_Y = \omega_X}$) if and only if the graph ${\Sigma := \{(x,f(x)): x \in X \}}$ is a Lagrangian submanifold of ${X \times Y}$ with the symplectic form ${\omega_X \oplus -\omega_Y}$.
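In the simplest linear case the equivalence can be checked by hand or by machine. The sketch below assumes ${X = Y = {\bf R}^2}$ with the standard form ${\omega(u,v) = u_1 v_2 - u_2 v_1}$ and a linear map ${f(u) = Au}$; the function names are illustrative only. Since the graph has half the dimension of ${X \times Y}$, it is Lagrangian exactly when ${\omega_X \oplus -\omega_Y}$ vanishes on a basis of its tangent space.

```python
def omega(u, v):
    """Standard symplectic form on R^2: omega(u, v) = u1*v2 - u2*v1."""
    return u[0] * v[1] - u[1] * v[0]

def apply(A, u):
    """Apply the 2x2 matrix A to the vector u."""
    return (A[0][0] * u[0] + A[0][1] * u[1],
            A[1][0] * u[0] + A[1][1] * u[1])

def graph_form(A, u, v):
    """(omega_X (+) -omega_Y) evaluated on the graph tangent vectors (u, Au), (v, Av)."""
    return omega(u, v) - omega(apply(A, u), apply(A, v))

E1, E2 = (1, 0), (0, 1)

def is_symplectic(A):
    """f^* omega = omega, tested on the basis pair (by bilinearity this suffices)."""
    return omega(apply(A, E1), apply(A, E2)) == omega(E1, E2)

def graph_is_lagrangian(A):
    """The graph is 2-dimensional inside R^4, so it is Lagrangian iff it is isotropic."""
    return graph_form(A, E1, E2) == 0

shear = ((1, 1), (0, 1))     # determinant 1: area-preserving
dilation = ((2, 0), (0, 2))  # determinant 4: not area-preserving

for name, A in [("shear", shear), ("dilation", dilation)]:
    print(name, is_symplectic(A), graph_is_lagrangian(A))
```

In this two-dimensional setting both conditions reduce to ${\det A = 1}$, which is why the two tests agree on each example.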

In view of the symplectic rigidity phenomenon, it is likely that the smoothness hypotheses on ${f,X,Y}$ can be relaxed substantially, but I will not try to formulate such a result here.

There are presumably many further examples of closed graph theorems (or closely related theorems, such as criteria for inverting a morphism, or open mapping type theorems) throughout mathematics; I would be interested to know of further examples.


Let ${{\mathfrak g}}$ be a finite-dimensional Lie algebra (over the reals). Given two sufficiently small elements ${x, y}$ of ${{\mathfrak g}}$, define the right Baker-Campbell-Hausdorff-Dynkin law

$\displaystyle R_y(x) := x + \int_0^1 F_R( \hbox{Ad}_x \hbox{Ad}_{ty} ) y \ dt \ \ \ \ \ (1)$

where ${\hbox{Ad}_x := \exp(\hbox{ad}_x)}$, ${\hbox{ad}_x: {\mathfrak g} \rightarrow {\mathfrak g}}$ is the adjoint map ${\hbox{ad}_x(y) := [x,y]}$, and ${F_R}$ is the function ${F_R(z) := \frac{z \log z}{z-1}}$, which is analytic for ${z}$ near ${1}$. Similarly, define the left Baker-Campbell-Hausdorff-Dynkin law

$\displaystyle L_x(y) := y + \int_0^1 F_L( \hbox{Ad}_{tx} \hbox{Ad}_y ) x\ dt \ \ \ \ \ (2)$

where ${F_L(z) := \frac{\log z}{z-1}}$. One easily verifies that these expressions are well-defined (and depend smoothly on ${x}$ and ${y}$) when ${x}$ and ${y}$ are sufficiently small.

We have the famous Baker-Campbell-Hausdorff-Dynkin formula:

Theorem 1 (BCH formula) Let ${G}$ be a finite-dimensional Lie group over the reals with Lie algebra ${{\mathfrak g}}$. Let ${\log}$ be a local inverse of the exponential map ${\exp: {\mathfrak g} \rightarrow G}$, defined in a neighbourhood of the identity. Then for sufficiently small ${x, y \in {\mathfrak g}}$, one has

$\displaystyle \log( \exp(x) \exp(y) ) = R_y(x) = L_x(y).$

See for instance these notes of mine for a proof of this formula (it is for ${R_y}$, but one easily obtains a similar proof for ${L_x}$).

In particular, one can give a neighbourhood of the identity in ${{\mathfrak g}}$ the structure of a local Lie group by defining the group operation ${\ast}$ as

$\displaystyle x \ast y := R_y(x) = L_x(y) \ \ \ \ \ (3)$

for sufficiently small ${x, y}$, and the inverse operation by ${x^{-1} := -x}$ (one easily verifies that ${R_x(-x) = L_x(-x) = 0}$ for all small ${x}$).
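As a sanity check of (3) in the simplest non-abelian setting, one can take ${{\mathfrak g}}$ to be the Heisenberg algebra of strictly upper triangular ${3 \times 3}$ matrices. Here all double brackets vanish, so (1) and (2) both collapse to ${x + y + \frac{1}{2}[x,y]}$, and the exponential and logarithm are finite sums. The sketch below (the helper names are illustrative choices, and exact rational arithmetic is used to avoid rounding) compares this expression with ${\log(\exp(x)\exp(y))}$ and checks the inverse and associativity.

```python
from fractions import Fraction

DIM = 3
IDENTITY = [[Fraction(int(i == j)) for j in range(DIM)] for i in range(DIM)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(DIM)) for j in range(DIM)]
            for i in range(DIM)]

def combine(*terms):
    """Linear combination: combine((c1, A1), (c2, A2), ...) = c1*A1 + c2*A2 + ..."""
    return [[sum(c * A[i][j] for c, A in terms) for j in range(DIM)]
            for i in range(DIM)]

def heisenberg(a, b, c):
    """Strictly upper triangular matrix with entries a, b above the diagonal and c in the corner."""
    z = Fraction(0)
    return [[z, Fraction(a), Fraction(c)], [z, z, Fraction(b)], [z, z, z]]

def exp_nilpotent(X):
    """exp(X) = I + X + X^2/2, exact because X^3 = 0."""
    return combine((1, IDENTITY), (1, X), (Fraction(1, 2), matmul(X, X)))

def log_unipotent(M):
    """log(M) = E - E^2/2 with E = M - I, exact because E^3 = 0."""
    E = combine((1, M), (-1, IDENTITY))
    return combine((1, E), (Fraction(-1, 2), matmul(E, E)))

def bracket(X, Y):
    return combine((1, matmul(X, Y)), (-1, matmul(Y, X)))

def star(X, Y):
    """Group law pulled back to the Lie algebra: log(exp(X) exp(Y))."""
    return log_unipotent(matmul(exp_nilpotent(X), exp_nilpotent(Y)))

def bch(X, Y):
    """What (1) and (2) reduce to when all double brackets vanish."""
    return combine((1, X), (1, Y), (Fraction(1, 2), bracket(X, Y)))

# In the nilpotent case no smallness assumption is needed.
x = heisenberg(1, 2, 3)
y = heisenberg(Fraction(-1, 2), 5, 7)
z = heisenberg(4, -1, Fraction(2, 3))
zero = heisenberg(0, 0, 0)
minus_x = combine((-1, x))

print(star(x, y) == bch(x, y))
print(star(x, minus_x) == zero)
print(star(star(x, y), z) == star(x, star(y, z)))
```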

It is tempting to reverse the BCH formula and conclude (the local form of) Lie’s third theorem, that every finite-dimensional Lie algebra is isomorphic to the Lie algebra of some local Lie group, by using (3) to define a smooth local group structure on a neighbourhood of the identity. (See this previous post for a definition of a local Lie group.) The main difficulty in doing so is in verifying that the definition (3) is well-defined (i.e. that ${R_y(x)}$ is always equal to ${L_x(y)}$) and locally associative. The well-definedness issue can be trivially disposed of by using just one of the expressions ${R_y(x)}$ or ${L_x(y)}$ as the definition of ${\ast}$ (though, as we shall see, it will be very convenient to use both of them simultaneously). However, the associativity is not obvious at all.

With the assistance of Ado’s theorem, which places ${{\mathfrak g}}$ inside the general linear Lie algebra ${\mathfrak{gl}_n({\bf R})}$ for some ${n}$, one can deduce both the well-definedness and associativity of (3) from the Baker-Campbell-Hausdorff formula for ${\mathfrak{gl}_n({\bf R})}$. However, Ado’s theorem is rather difficult to prove (see for instance this previous blog post for a proof), and it is natural to ask whether there is a way to establish these facts without Ado’s theorem.

After playing around with this for some time, I managed to extract a direct proof of well-definedness and local associativity of (3), giving a proof of Lie’s third theorem independent of Ado’s theorem. This is not a new result by any means (indeed, the original proofs of Lie and Cartan of Lie’s third theorem did not use Ado’s theorem), but I found it an instructive exercise to work out the details, and so I am putting it up on this blog in case anyone else is interested (and also because I want to be able to find the argument again if I ever need it in the future).

Jordan’s theorem is a basic theorem in the theory of finite linear groups, and can be formulated as follows:

Theorem 1 (Jordan’s theorem) Let ${G}$ be a finite subgroup of the general linear group ${GL_d({\bf C})}$. Then there is an abelian subgroup ${G'}$ of ${G}$ of index ${[G:G'] \leq C_d}$, where ${C_d}$ depends only on ${d}$.

Informally, Jordan’s theorem asserts that finite linear groups over the complex numbers are almost abelian. The theorem can be extended to other fields of characteristic zero, and also to fields of positive characteristic so long as the characteristic does not divide the order of ${G}$, but we will not consider these generalisations here. A proof of this theorem can be found for instance in these lecture notes of mine.
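For a concrete instance, consider the dihedral group of order ${8}$, realised inside ${GL_2({\bf C})}$ by integer matrices. The sketch below (the function names and the choice of generators are illustrative assumptions, not taken from the lecture notes) generates the group by brute force and exhibits the cyclic rotation subgroup as an abelian subgroup of index ${2}$.

```python
def mul(A, B):
    """Product of 2x2 integer matrices stored as tuples of tuples."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

IDENTITY = ((1, 0), (0, 1))

def generate(gens):
    """Subgroup generated by gens, assuming that this subgroup is finite.

    In a finite group, closing up under right multiplication by the
    generators already yields inverses, so this produces the full subgroup.
    """
    group = {IDENTITY}
    frontier = [IDENTITY]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = mul(g, s)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group

def is_abelian(group):
    return all(mul(g, h) == mul(h, g) for g in group for h in group)

rotation = ((0, -1), (1, 0))    # rotation by 90 degrees
reflection = ((1, 0), (0, -1))  # reflection in the horizontal axis

G = generate([rotation, reflection])
G_prime = generate([rotation])
index = len(G) // len(G_prime)

print(len(G), is_abelian(G), len(G_prime), is_abelian(G_prime), index)
```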

I recently learned (from this comment of Kevin Ventullo) that the finiteness hypothesis on the group ${G}$ in this theorem can be relaxed to the significantly weaker condition of periodicity. Recall that a group ${G}$ is periodic if all elements are of finite order. Jordan’s theorem with “finite” replaced by “periodic” is known as the Jordan-Schur theorem.

The Jordan-Schur theorem can be quickly deduced from Jordan’s theorem, and the following result of Schur:

Theorem 2 (Schur’s theorem) Every finitely generated periodic subgroup of a general linear group ${GL_d({\bf C})}$ is finite. (Equivalently, every periodic linear group is locally finite.)

Remark 1 The question of whether all finitely generated periodic subgroups (not necessarily linear in nature) were finite was known as the Burnside problem; the answer was shown to be negative by Golod and Shafarevich in 1964.

Let us see how Jordan’s theorem and Schur’s theorem combine via a compactness argument to form the Jordan-Schur theorem. Let ${G}$ be a periodic subgroup of ${GL_d({\bf C})}$. Then for every finite subset ${S}$ of ${G}$, the group ${G_S}$ generated by ${S}$ is finite by Theorem 2. Applying Jordan’s theorem, ${G_S}$ contains an abelian subgroup ${G'_S}$ of index at most ${C_d}$.

In particular, given any finite number ${S_1,\ldots,S_m}$ of finite subsets of ${G}$, we can find abelian subgroups ${G'_{S_1},\ldots,G'_{S_m}}$ of ${G_{S_1},\ldots,G_{S_m}}$ respectively such that each ${G'_{S_j}}$ has index at most ${C_d}$ in ${G_{S_j}}$. We claim that we may furthermore impose the compatibility condition ${G'_{S_i} = G'_{S_j} \cap G_{S_i}}$ whenever ${S_i \subset S_j}$. To see this, we set ${S := S_1 \cup \ldots \cup S_m}$, locate an abelian subgroup ${G'_S}$ of ${G_S}$ of index at most ${C_d}$, and then set ${G'_{S_i} := G'_S \cap G_{S_i}}$. As ${G_S}$ is covered by at most ${C_d}$ cosets of ${G'_S}$, we see that ${G_{S_i}}$ is covered by at most ${C_d}$ cosets of ${G'_{S_i}}$, and the claim follows.

Note that for each ${S}$, the set of possible ${G'_S}$ is finite, and so the product space of all configurations ${(G'_S)_{S \subset G}}$, as ${S}$ ranges over finite subsets of ${G}$, is compact by Tychonoff’s theorem. Using the finite intersection property, we may thus locate an abelian subgroup ${G'_S}$ of ${G_S}$ of index at most ${C_d}$ for all finite subsets ${S}$ of ${G}$, obeying the compatibility condition ${G'_T = G'_S \cap G_T}$ whenever ${T \subset S}$. If we then set ${G' := \bigcup_S G'_S}$, where ${S}$ ranges over all finite subsets of ${G}$, we then easily verify that ${G'}$ is abelian and has index at most ${C_d}$ in ${G}$, as required.

Below I record a proof of Schur’s theorem, which I extracted from this book of Wehrfritz. This was primarily an exercise for my own benefit, but perhaps it may be of interest to some other readers.

Let ${G}$ be a Lie group with Lie algebra ${{\mathfrak g}}$. As is well known, the exponential map ${\exp: {\mathfrak g} \rightarrow G}$ is a local homeomorphism near the identity. As such, the group law on ${G}$ can be locally pulled back to an operation ${*: U \times U \rightarrow {\mathfrak g}}$ defined on a neighbourhood ${U}$ of the origin in ${{\mathfrak g}}$, defined as

$\displaystyle x * y := \log( \exp(x) \exp(y) )$

where ${\log}$ is the local inverse of the exponential map. One can view ${*}$ as the group law expressed in local exponential coordinates around the origin.

An asymptotic expansion for ${x*y}$ is provided by the Baker-Campbell-Hausdorff (BCH) formula

$\displaystyle x*y = x+y+ \frac{1}{2} [x,y] + \frac{1}{12}[x,[x,y]] - \frac{1}{12}[y,[x,y]] + \ldots$

for all sufficiently small ${x,y}$, where ${[,]: {\mathfrak g} \times {\mathfrak g} \rightarrow {\mathfrak g}}$ is the Lie bracket. More explicitly, one has the Baker-Campbell-Hausdorff-Dynkin formula

$\displaystyle x * y = x + \int_0^1 F( \hbox{Ad}_x \hbox{Ad}_{ty} ) y\ dt \ \ \ \ \ (1)$

for all sufficiently small ${x,y}$, where ${\hbox{Ad}_x = \exp( \hbox{ad}_x )}$, ${\hbox{ad}_x: {\mathfrak g} \rightarrow {\mathfrak g}}$ is the adjoint representation ${\hbox{ad}_x(y) := [x,y]}$, and ${F}$ is the function

$\displaystyle F( t ) := \frac{t \log t}{t-1}$

which is real analytic near ${t=1}$ and can thus be applied to linear operators sufficiently close to the identity. One corollary of this is that the multiplication operation ${*}$ is real analytic in local coordinates, and so every smooth Lie group is in fact a real analytic Lie group.
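The quality of the truncated expansion ${x+y+\frac{1}{2}[x,y]+\frac{1}{12}[x,[x,y]]-\frac{1}{12}[y,[x,y]]}$ can be tested numerically. The following sketch works in ${\mathfrak{gl}_2({\bf R})}$ with two arbitrarily chosen small matrices, and computes ${\exp}$ and ${\log}$ by their power series; the helper names, the sample matrices, and the numbers of series terms are all assumptions made for illustration. One expects the error to drop noticeably once the third-order brackets are included.

```python
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def combine(*terms):
    """Linear combination: combine((c1, A1), (c2, A2), ...) = c1*A1 + c2*A2 + ..."""
    return [[sum(c * A[i][j] for c, A in terms) for j in range(2)]
            for i in range(2)]

def expm(X, terms=25):
    """Taylor series I + X + X^2/2! + ... (adequate for small X)."""
    result, power = IDENTITY, IDENTITY
    for k in range(1, terms):
        power = combine((1.0 / k, matmul(power, X)))  # now equals X^k / k!
        result = combine((1.0, result), (1.0, power))
    return result

def logm(M, terms=60):
    """Series log(I + E) = E - E^2/2 + E^3/3 - ..., valid when E = M - I is small."""
    E = combine((1.0, M), (-1.0, IDENTITY))
    result = [[0.0, 0.0], [0.0, 0.0]]
    power = IDENTITY
    for k in range(1, terms):
        power = matmul(power, E)  # now equals E^k
        result = combine((1.0, result), ((-1.0) ** (k + 1) / k, power))
    return result

def bracket(X, Y):
    return combine((1.0, matmul(X, Y)), (-1.0, matmul(Y, X)))

def max_diff(A, B):
    return max(abs(A[i][j] - B[i][j]) for i in range(2) for j in range(2))

x = [[0.03, 0.05], [-0.02, 0.01]]
y = [[-0.01, 0.04], [0.06, 0.02]]

exact = logm(matmul(expm(x), expm(y)))
xy = bracket(x, y)
second_order = combine((1.0, x), (1.0, y), (0.5, xy))
third_order = combine((1.0, second_order),
                      (1.0 / 12, bracket(x, xy)),
                      (-1.0 / 12, bracket(y, xy)))

err2 = max_diff(exact, second_order)
err3 = max_diff(exact, third_order)
print(err2, err3)
```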

It turns out that one does not need the full force of the smoothness hypothesis to obtain these conclusions. It is, for instance, a classical result that ${C^2}$ regularity of the group operations is already enough to obtain the Baker-Campbell-Hausdorff formula. Actually, it turns out that we can weaken this a bit, and show that even ${C^{1,1}}$ regularity (i.e. that the group operations are continuously differentiable, and the derivatives are locally Lipschitz) is enough to make the classical derivation of the Baker-Campbell-Hausdorff formula work. More precisely, we have

Theorem 1 (${C^{1,1}}$ Baker-Campbell-Hausdorff formula) Let ${{\bf R}^d}$ be a finite-dimensional vector space, and suppose one has a continuous operation ${*: U \times U \rightarrow {\bf R}^d}$ defined on a neighbourhood ${U}$ around the origin, which obeys the following three axioms:

• (Approximate additivity) For ${x,y}$ sufficiently close to the origin, one has

$\displaystyle x*y = x+y+O(|x| |y|). \ \ \ \ \ (2)$

(In particular, ${0*x=x*0=x}$ for ${x}$ sufficiently close to the origin.)

• (Associativity) For ${x,y,z}$ sufficiently close to the origin, ${(x*y)*z = x*(y*z)}$.
• (Radial homogeneity) For ${x}$ sufficiently close to the origin, one has

$\displaystyle (sx) * (tx) = (s+t)x \ \ \ \ \ (3)$

for all ${s,t \in [-1,1]}$. (In particular, ${x * (-x) = (-x) * x = 0}$ for all ${x}$ sufficiently close to the origin.)

Then ${*}$ is real analytic (and in particular, smooth) near the origin. (In particular, ${*}$ gives a neighbourhood of the origin the structure of a local Lie group.)

Indeed, we will recover the Baker-Campbell-Hausdorff-Dynkin formula (after defining ${\hbox{Ad}_x}$ appropriately) in this setting; see below the fold.

The reason that we call this a ${C^{1,1}}$ Baker-Campbell-Hausdorff formula is that if the group operation ${*}$ has ${C^{1,1}}$ regularity, and has ${0}$ as an identity element, then Taylor expansion already gives (2), and in exponential coordinates (which, as it turns out, can be defined without much difficulty in the ${C^{1,1}}$ category) one automatically has (3).
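To see the three axioms of Theorem 1 in a concrete non-abelian example, one can take the two-dimensional "${ax+b}$" group in exponential coordinates, identifying ${(a,b) \in {\bf R}^2}$ with ${\exp \begin{pmatrix} a & b \\ 0 & 0 \end{pmatrix}}$. A direct computation (not taken from this post, so it should be treated as an assumption of the sketch) gives ${(a,b) * (a',b') = (a+a', (b \phi(a) + e^a b' \phi(a'))/\phi(a+a'))}$ with ${\phi(t) := (e^t-1)/t}$. The code below tests (2), associativity, and (3) at a few arbitrarily chosen sample points; the names and tolerances are illustrative.

```python
import math
from itertools import product

def phi(t):
    """(e^t - 1)/t, extended continuously by phi(0) = 1."""
    return math.expm1(t) / t if t != 0.0 else 1.0

def star(x, y):
    """Group law of the ax+b group in exponential coordinates."""
    (a, b), (c, d) = x, y
    return (a + c, (b * phi(a) + math.exp(a) * d * phi(c)) / phi(a + c))

def norm(x):
    return math.hypot(x[0], x[1])

def dist(x, y):
    return math.hypot(x[0] - y[0], x[1] - y[1])

def scaled(s, x):
    return (s * x[0], s * x[1])

points = [(0.05, -0.08), (-0.03, 0.06), (0.07, 0.02), (-0.09, -0.04)]

# Axiom (2): |x*y - x - y| should be bounded by a constant times |x||y|.
additivity_ratio = max(
    dist(star(x, y), (x[0] + y[0], x[1] + y[1])) / (norm(x) * norm(y))
    for x, y in product(points, repeat=2)
)

# Associativity: (x*y)*z = x*(y*z), up to rounding error.
assoc_error = max(
    dist(star(star(x, y), z), star(x, star(y, z)))
    for x, y, z in product(points, repeat=3)
)

# Axiom (3): (sx)*(tx) = (s+t)x for s, t in [-1, 1], up to rounding error.
scalars = [-1.0, -0.5, 0.0, 0.25, 1.0]
homogeneity_error = max(
    dist(star(scaled(s, x), scaled(t, x)), scaled(s + t, x))
    for x in points for s, t in product(scalars, repeat=2)
)

print(additivity_ratio, assoc_error, homogeneity_error)
```

The bounded ratio in the first test reflects the leading correction ${\frac{1}{2}[x,y]}$, which in these coordinates is ${(0, \frac{1}{2}(ab'-a'b))}$.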

We will record the proof of Theorem 1 below the fold; it largely follows the classical derivation of the BCH formula, but due to the low regularity one will rely on tools such as telescoping series and Riemann sums rather than on the fundamental theorem of calculus. As an application of this theorem, we can give an alternate derivation of one of the components of the solution to Hilbert’s fifth problem, namely the construction of a Lie group structure from a Gleason metric, which was covered in the previous post; we discuss this at the end of this article. With this approach, one can avoid any appeal to von Neumann’s theorem and Cartan’s theorem (discussed in this post), or the Kuranishi-Gleason extension theorem (discussed in this post).