In mathematics, one frequently starts with some space {X} and wishes to extend it to a larger space {Y}. Generally speaking, there are two ways in which one can extend a space {X}:

  • By embedding {X} into a space {Y} that has {X} (or at least an isomorphic copy of {X}) as a subspace.
  • By covering {X} by a space {Y} that has {X} (or an isomorphic copy thereof) as a quotient.

For many important categories of interest (such as abelian categories), the former type of extension can be represented by the exact sequence

\displaystyle  0 \rightarrow X \rightarrow Y

and the latter type of extension can be represented by the exact sequence

\displaystyle  Y \rightarrow X \rightarrow 0.

In some cases, {X} can be both embedded in, and covered by, {Y}, in a consistent fashion; in such cases we sometimes say that the above exact sequences split.

A useful analogy is with digital images. When a computer represents an image, it is limited both by the scope of the image (what it is picturing), and by the resolution of the image (how much physical space is represented by a given pixel). To make the image “larger”, one could either embed the image in an image of larger scope but equal resolution (e.g. embedding a {200 \times 200} pixel image of a person’s face into an {800 \times 800} pixel image that covers a region of space that is four times larger in both dimensions, e.g. the person’s upper body) or cover the image with an image of higher resolution but of equal scope (e.g. enhancing a {200 \times 200} pixel picture of a face to an {800 \times 800} pixel picture of the same face). In the former case, the original image is a sub-image (or cropped image) of the extension, but in the latter case the original image is a quotient (or a pixelation) of the extension. In the former case, each pixel in the original image can be identified with a pixel in the extension, but not every pixel in the extension is covered. In the latter case, every pixel in the original image is covered by several pixels in the extension, but the pixel in the original image is not canonically identified with any particular pixel in the extension that covers it; it “loses its identity” by dispersing into higher resolution pixels.

(Note that “zooming in” the visual representation of an image by making each pixel occupy a larger region of the screen increases neither the scope nor the resolution; in this language, a zoomed-in version of an image is merely an isomorphic copy of the original image; it carries the same amount of information as the original image, but has been represented in a new coordinate system which may make it easier to view, especially to the visually impaired.)
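To make the two kinds of extension concrete, here is a minimal Python sketch of the image analogy, with images modelled as lists of rows of numbers. The function names `embed`, `crop`, `refine` and `pixelate` are my own choices for illustration, and averaging is just one possible way to pixelate.

```python
def embed(img, pad, fill=0):
    """Embedding: place img inside a wider-scope image of the same resolution."""
    width = len(img[0]) + 2 * pad
    rows = [[fill] * width for _ in range(pad)]
    rows += [[fill] * pad + list(r) + [fill] * pad for r in img]
    rows += [[fill] * width for _ in range(pad)]
    return rows

def crop(img, pad):
    """Recover the central sub-image of a wide-scope image."""
    return [r[pad:len(r) - pad] for r in img[pad:len(img) - pad]]

def refine(img, k):
    """Covering: replace each pixel by a k x k block (same scope, finer resolution)."""
    return [[p for p in r for _ in range(k)] for r in img for _ in range(k)]

def pixelate(img, k):
    """Quotient map: average each k x k block down to a single pixel."""
    n, m = len(img) // k, len(img[0]) // k
    return [[sum(img[i * k + a][j * k + b] for a in range(k) for b in range(k)) / (k * k)
             for j in range(m)] for i in range(n)]

face = [[1, 2], [3, 4]]
wide = embed(face, 1)   # 4 x 4 image containing face as a sub-image
fine = refine(face, 2)  # 4 x 4 image having face as a pixelation
```

Cropping `wide` and pixelating `fine` both return the original `face`, but only in the embedded case does each original pixel correspond to a single pixel of the extension.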

In the study of a given category of spaces (e.g. topological spaces, manifolds, groups, fields, etc.), embeddings and coverings are both important; this is particularly true in the more topological areas of mathematics, such as manifold theory. But typically, the term extension is reserved for just one of these two operations. For instance, in the category of fields, coverings are quite trivial; if one covers a field {k} by a field {l}, the kernel of the covering map {\pi: l \rightarrow k} is necessarily trivial and so {k, l} are in fact isomorphic. So in field theory, a field extension refers to an embedding of a field, rather than a covering of a field. Similarly, in the theory of metric spaces, there are no non-trivial isometric coverings of a metric space, and so the only useful notion of an extension of a metric space is the one given by embedding the original space in the extension.

On the other hand, in group theory (and in group-like theories, such as the theory of dynamical systems, which studies group actions), the term “extension” is reserved for coverings, rather than for embeddings. I think one of the main reasons for this is that coverings of groups automatically generate a special type of embedding (a normal embedding), whereas most embeddings don’t generate coverings. More precisely, given a group extension {G} of a base group {H},

\displaystyle  G \rightarrow H \rightarrow 0,

one can form the kernel {K = \hbox{ker}(\pi)} of the covering map {\pi: G \rightarrow H}, which is a normal subgroup of {G}, and we thus can extend the above sequence canonically to a short exact sequence

\displaystyle  0 \rightarrow K \rightarrow G \rightarrow H \rightarrow 0.
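As a small sanity check of this construction, the following Python sketch takes {G = S_3}, {H = {\mathbb Z}/2{\mathbb Z}} and the sign map as the covering map (an example chosen by me for illustration, with permutations encoded as tuples), and verifies by brute force that the kernel is a normal subgroup.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation p o q (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(p)))

def inverse(p):
    inv = [0] * len(p)
    for i, j in enumerate(p):
        inv[j] = i
    return tuple(inv)

def sign(p):
    """Parity of p as an element of Z/2Z: 0 for even, 1 for odd."""
    inversions = sum(1 for i in range(len(p))
                     for j in range(i + 1, len(p)) if p[i] > p[j])
    return inversions % 2

G = list(permutations(range(3)))      # the group S_3
K = [g for g in G if sign(g) == 0]    # kernel of the covering map sign: G -> Z/2Z

# sign is a homomorphism onto Z/2Z ...
is_hom = all(sign(compose(g, h)) == (sign(g) + sign(h)) % 2 for g in G for h in G)
# ... and its kernel is closed under conjugation by every element of G
is_normal = all(compose(compose(g, k), inverse(g)) in K for g in G for k in K)
```

Here `K` is the alternating group {A_3}, so the resulting short exact sequence exhibits {S_3} as an extension of {{\mathbb Z}/2{\mathbb Z}} by {A_3}.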

On the other hand, an embedding of {K} into {G},

\displaystyle  0 \rightarrow K \rightarrow G

does not similarly extend to a short exact sequence unless the embedding is normal.

Another reason for the notion of extension varying between embeddings and coverings from subject to subject is that there are various natural duality operations (and more generally, contravariant functors) which turn embeddings into coverings and vice versa. For instance, an embedding of one vector space {V} into another {W} induces a covering of the dual space {V^*} by the dual space {W^*}, and conversely; similarly, an embedding of a locally compact abelian group {H} in another {G} induces a covering of the Pontryagin dual {\hat H} by the Pontryagin dual {\hat G}. In the language of images, embedding an image in an image of larger scope is largely equivalent to covering the Fourier transform of that image by a transform of higher resolution, and conversely; this is ultimately a manifestation of the basic fact that frequency is inversely proportional to wavelength.

Similarly, a common duality operation arises in many areas of mathematics by starting with a space {X} and then considering a space {C(X)} of functions on that space (e.g. continuous real-valued functions, if {X} was a topological space, or in more algebraic settings one could consider homomorphisms from {X} to some fixed space). Embedding {X} into {Y} then induces a covering of {C(X)} by {C(Y)}, and conversely, a covering of {X} by {Y} induces an embedding of {C(X)} into {C(Y)}. Returning again to the analogy with images, if one looks at the collection of all images of a fixed scope and resolution, rather than just a single image, then increasing the available resolution causes an embedding of the space of low-resolution images into the space of high-resolution images (since of course every low-resolution image is an example of a high-resolution image), whereas increasing the available scope causes a covering of the space of narrow-scope images by the space of wide-scope images (since every wide-scope image can be cropped into a narrow-scope image). Note in the case of images, that these extensions can be split: not only can a low-resolution image be viewed as a special case of a high-resolution image, but any high-resolution image can be pixelated into a low-resolution one. Similarly, not only can any wide-scope image be cropped into a narrow-scope one, a narrow-scope image can be extended to a wide-scope one simply by filling in all the new areas of scope with black (or by using more advanced image processing tools to create a more visually pleasing extension). (In the category of sets, the statement that every covering can be split is precisely the axiom of choice.)
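The duality between embeddings and coverings for function spaces can be checked by hand on tiny finite sets. In the sketch below, the sets `X`, `Y`, `Z`, the maps `iota` and `pi`, and the choice of {\{0,1\}}-valued functions are all hypothetical illustrations: `iota` embeds `X` into `Y`, and `pi` covers `X` by `Z`.

```python
from itertools import product

def functions(domain, values=(0, 1)):
    """All functions domain -> values, each encoded as a dict."""
    domain = list(domain)
    return [dict(zip(domain, vals)) for vals in product(values, repeat=len(domain))]

def key(f):
    """Hashable fingerprint of a function, so that functions can be put in sets."""
    return tuple(sorted(f.items()))

X = [0, 1]
Y = [0, 1, 2]
iota = {0: 0, 1: 1}                  # an embedding X -> Y

def restrict(f):
    """The induced map C(Y) -> C(X): restrict f to the copy of X inside Y."""
    return {x: f[iota[x]] for x in X}

Z = ["a", "b", "c"]
pi = {"a": 0, "b": 0, "c": 1}        # a covering (surjection) Z -> X

def pullback(f):
    """The induced map C(X) -> C(Z): compose f with the covering map."""
    return {z: f[pi[z]] for z in Z}

restricted = {key(restrict(f)) for f in functions(Y)}
pulled = {key(pullback(f)) for f in functions(X)}

restriction_is_onto = restricted == {key(f) for f in functions(X)}
pullback_is_injective = len(pulled) == len(functions(X))
```

So the embedding of `X` produces a covering of function spaces (restriction is onto), while the covering of `X` produces an embedding of function spaces (pullback is injective).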

I’ve recently found myself having to deal quite a bit with group extensions in my research, so I have decided to make some notes on the basic theory of such extensions here. This is utterly elementary material for a group theorist, but I found this task useful for organising my own thoughts on this topic, and also in pinning down some of the jargon in this field.

Definition 1 (Group extension) An extension of a group {H} is a group {G}, together with a surjective projection map (or covering map) {\pi: G \rightarrow H}. If the kernel of {\pi} can be identified with (i.e. is isomorphic to) a group {K}, we say that {G} is an extension of {H} by {K}, and we have the short exact sequence

\displaystyle  0 \rightarrow K \rightarrow G \rightarrow H \rightarrow 0.

If the group {K} has some property {{\mathcal P}}, we say that {G} is a {{\mathcal P}} extension of {H}. Thus for instance, if {K} is abelian, {G} is an abelian extension of {H}; if {K} is central (in {G}), {G} is a central extension of {H}; and so forth. (Note that all extensions are automatically normal extensions, so we will almost never use the latter term.) We refer to {H} as the base of the extension, and {K} as the fibre, and refer to {H} and {K} collectively as factors of {G}.

If {K} has some property {{\mathcal P}}, and {H} has some property {{\mathcal Q}}, then we say that {G} is {{\mathcal P}}-by-{{\mathcal Q}}. (I have no idea why the order is traditionally arranged in this way; I would have thought that extending a {{\mathcal Q}} group by a {{\mathcal P}} group would give a {{\mathcal P}}-by-{{\mathcal Q}} group, rather than the other way around; perhaps at one point the idea of a normal embedding was considered more important than a group extension. Nevertheless, the notation seems to be entrenched by now.) Thus, for instance, {G} is abelian-by-finite if {K} is abelian and {H} is finite, but finite-by-abelian if {K} is finite and {H} is abelian.

One can think of a {K}-by-{H} group as a group that looks like {H} “at large scales” and like {K} “at small scales”; one can also view this group as a principal {K}-bundle over {H}.

There are several ways to generate a group extension {G \rightarrow H \rightarrow 0}. Firstly, given any homomorphism {\pi: G \rightarrow G'} from one group {G} to another, the homomorphism theorem tells us that {G} is an extension of the image {\pi(G)}, with kernel {\hbox{ker}(\pi)}:

\displaystyle  0 \rightarrow \hbox{ker}(\pi) \rightarrow G \rightarrow \pi(G) \rightarrow 0.

Of course, every group extension arises in this manner.

A group extension {\pi: G \rightarrow H} splits if there is a homomorphism {\phi: H \rightarrow G} such that {\pi(\phi(h)) = h} for all {h \in H}. In this case, {H} acts on the kernel {K} by conjugation (after identifying {H} with {\phi(H)}); denoting this action by {\rho} (thus {\rho(h) k := \phi(h) k \phi(h)^{-1}}), we can then canonically identify {G} with the semi-direct product {K \rtimes_\rho H}, defined as the set of pairs {(k,h)} with {k \in K}, {h \in H}, with the group law {(k,h) (k',h') := (k \rho(h)(k'), hh')}, by identifying {(k,h)} with {k \phi(h)}. Conversely, every semi-direct product {K \rtimes_\rho H} is a group extension of {H} by {K} which splits. If the conjugation action {\rho} is trivial, then the semi-direct product simplifies to the direct product {K \times H}. In particular, any semi-direct product which is a central extension is of this form.
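The group law of the semi-direct product is easy to experiment with. Here is a minimal sketch for cyclic factors, written additively; the helper names (`semidirect`, `reflect`, `trivial`, `section`) are my own, and the example {{\mathbb Z}/3{\mathbb Z} \rtimes {\mathbb Z}/2{\mathbb Z}} with the reflection action is just one convenient choice.

```python
from itertools import product

def semidirect(n, m, rho):
    """Return (elements, multiplication) for the semi-direct product of Z/n by Z/m,
    where rho(h, k) is the image of k in Z/n under the automorphism attached to h."""
    elems = list(product(range(n), range(m)))

    def mul(x, y):
        (k, h), (k2, h2) = x, y
        # the law (k,h)(k',h') = (k rho(h)(k'), h h'), in additive notation
        return ((k + rho(h, k2)) % n, (h + h2) % m)

    return elems, mul

def is_associative(elems, mul):
    return all(mul(mul(x, y), z) == mul(x, mul(y, z))
               for x in elems for y in elems for z in elems)

def is_abelian(elems, mul):
    return all(mul(x, y) == mul(y, x) for x in elems for y in elems)

reflect = lambda h, k: k if h % 2 == 0 else -k   # Z/2 acts on Z/3 by k -> -k
trivial = lambda h, k: k                         # trivial action

S, mul_S = semidirect(3, 2, reflect)   # non-abelian of order 6 (a copy of S_3)
D, mul_D = semidirect(3, 2, trivial)   # the direct product Z/3 x Z/2

def section(h):
    """The splitting map phi: H -> G."""
    return (0, h)

def projection(x):
    """The covering map pi: G -> H."""
    return x[1]

splits = (all(projection(section(h)) == h for h in range(2)) and
          all(mul_S(section(h), section(h2)) == section((h + h2) % 2)
              for h in range(2) for h2 in range(2)))
```

With the reflection action the product is non-abelian, whereas with the trivial action it collapses to the (abelian) direct product, in line with the discussion above.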

Note that, in general, an extension of {H} by {K} is a different concept from an extension of {K} by {H}, because one can have {H} as a normal subgroup but not as a quotient, or vice versa. For instance, {S_3} has {A_3} as a normal subgroup, but not as a quotient; {S_3} is an extension of {{\mathbb Z}/2{\mathbb Z}} by {A_3}, but not vice versa. To put it another way, the operator “-by-” is not commutative: {H}-by-{K} is a different concept from {K}-by-{H}.
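The claim about {S_3} can be confirmed by brute force. The sketch below (with permutations encoded as tuples, an encoding chosen purely for illustration) lists all subgroups of {S_3} and records the orders of the normal ones; a quotient isomorphic to {A_3} would require a normal subgroup of order {2}.

```python
from itertools import permutations, combinations

def compose(p, q):
    """Return the permutation p o q (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(p)))

def inverse(p):
    inv = [0] * len(p)
    for i, j in enumerate(p):
        inv[j] = i
    return tuple(inv)

G = list(permutations(range(3)))   # the group S_3
e = tuple(range(3))                # identity permutation

def is_subgroup(S):
    # a finite subset containing the identity and closed under products is a subgroup
    return e in S and all(compose(a, b) in S for a in S for b in S)

def is_normal(S):
    return all(compose(compose(g, s), inverse(g)) in S for g in G for s in S)

subgroups = [set(c) for r in range(1, 7)
             for c in combinations(G, r) if is_subgroup(set(c))]
normal_orders = sorted(len(S) for S in subgroups if is_normal(S))
```

The normal subgroups found have orders {1}, {3} and {6} only, so {S_3} has quotients of order {6}, {2} and {1}, but none of order {3}.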

A subgroup of a {K}-by-{H} group is automatically a {K'}-by-{H'} group for some subgroups {H', K'} of {H, K} respectively; this is essentially Goursat’s lemma. Furthermore, the index of the subgroup is the product of the index of {H'} in {H}, and the index of {K'} in {K}.
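The index formula can be tested on a toy example. In this sketch I take {G = {\mathbb Z}/12{\mathbb Z}} with the covering map {g \mapsto g \hbox{ mod } 3} onto {H = {\mathbb Z}/3{\mathbb Z}} (a choice made only for illustration), so that the fibre {K} is a copy of {{\mathbb Z}/4{\mathbb Z}}, and run through every subgroup {G'} of {G}, taking {K' = K \cap G'} and {H' = \pi(G')}.

```python
n = 12
G = set(range(n))                   # Z/12 under addition mod 12

def pi(g):
    """Covering map Z/12 -> Z/3 (well defined since 3 divides 12)."""
    return g % 3

K = {g for g in G if pi(g) == 0}    # kernel {0, 3, 6, 9}, a copy of Z/4
H = {pi(g) for g in G}              # base group Z/3

def indices(d):
    """For the subgroup G' of multiples of d, return ([G:G'], [K:K'], [H:H'])."""
    G_sub = {g for g in G if g % d == 0}
    K_sub = K & G_sub               # K' = K intersected with G'
    H_sub = {pi(g) for g in G_sub}  # H' = image of G' in H
    return (len(G) // len(G_sub), len(K) // len(K_sub), len(H) // len(H_sub))

# the subgroups of Z/12 are exactly the multiples of d for each divisor d of 12
results = {d: indices(d) for d in (1, 2, 3, 4, 6, 12)}
formula_holds = all(g_idx == k_idx * h_idx for g_idx, k_idx, h_idx in results.values())
```

For instance, the subgroup {\{0, 6\}} has index {6} in {G}, while its fibre part has index {2} in {K} and its image has index {3} in {H}.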

Some standard notions in group theory can be defined using group extensions.

  • A metabelian group is the same thing as an abelian-by-abelian group, i.e. an abelian extension of an abelian group.
  • A metacyclic group is the same thing as a cyclic-by-cyclic group, i.e. a cyclic extension of a cyclic group.
  • A polycyclic group is defined recursively by declaring the trivial group to be polycyclic of length {0}, and defining a polycyclic group of length {l} to be an extension of a cyclic group by a polycyclic group of length {l-1}. Thus polycyclic groups are polycyclic-by-cyclic, where the polycyclic factor has a shorter length.
  • A supersolvable group is defined recursively by declaring the trivial group to be supersolvable of length {0}, and defining a supersolvable group of length {l} to be a cyclic extension of a supersolvable group of length {l-1}. Thus supersolvable groups are cyclic-by-supersolvable, where the supersolvable factor has a shorter length. In other words, supersolvable groups are towers of cyclic extensions.
  • A solvable group is defined recursively by declaring the trivial group to be solvable of length {0}, and defining a solvable group of length {l} to be an extension of an abelian group by a solvable group of length {l-1}. Thus solvable groups are solvable-by-abelian, where the solvable factor has a shorter length. One can equivalently define solvable groups as abelian-by-solvable, where the solvable factor again has a shorter length (because the last non-trivial term in the derived series is abelian and normal). In other words, a solvable group is a tower of abelian extensions.
  • A nilpotent group is defined recursively by declaring the trivial group to be nilpotent of step {0}, and defining a nilpotent group of step {s} to be a central extension of a nilpotent group of step {s-1}. Thus nilpotent groups are central-by-nilpotent. In other words, a nilpotent group is a tower of central extensions.

(The inclusions here are: cyclic implies abelian implies metabelian implies solvable, cyclic implies metacyclic implies supersolvable implies polycyclic implies solvable, metacyclic implies metabelian, and abelian implies nilpotent implies solvable.)
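To see a tower of abelian extensions in practice, one can compute the derived series of a small permutation group. The following is a brute-force sketch only (the helpers `generate`, `derived_subgroup` and `derived_series` are names of my own, and nothing here is optimised); it records the orders of the successive commutator subgroups until they stabilise.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation p o q (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(p)))

def inverse(p):
    inv = [0] * len(p)
    for i, j in enumerate(p):
        inv[j] = i
    return tuple(inv)

def generate(gens, n):
    """Subgroup of S_n generated by gens (closure under right multiplication)."""
    identity = tuple(range(n))
    group, frontier = {identity}, [identity]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = compose(x, g)
            if y not in group:
                group.add(y)
                frontier.append(y)
    return group

def derived_subgroup(G, n):
    """Subgroup generated by all commutators a b a^-1 b^-1 with a, b in G."""
    comms = {compose(compose(a, b), compose(inverse(a), inverse(b)))
             for a in G for b in G}
    return generate(comms, n)

def derived_series(G, n):
    """Orders of G, [G,G], [[G,G],[G,G]], ... until the series stabilises."""
    orders = [len(G)]
    while True:
        D = derived_subgroup(G, n)
        if len(D) == len(G):
            return orders
        orders.append(len(D))
        G = D

S4 = set(permutations(range(4)))
S5 = set(permutations(range(5)))
series_S4 = derived_series(S4, 4)
series_S5 = derived_series(S5, 5)
```

The series for {S_4} reaches the trivial group, so {S_4} is solvable; the series for {S_5} stalls at {A_5}, so {S_5} is not.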

The trivial group is the identity for the “-by-” operator: trivial-by-{{\mathcal P}} or {{\mathcal P}}-by-trivial is the same thing as {{\mathcal P}}.

Now we comment on the associativity of the “-by-” operator. If {N, H, K} are groups, observe that an {N}-by-({H}-by-{K}) group (i.e. an extension of an {H}-by-{K} group by {N}) is automatically an ({N}-by-{H})-by-{K} group (i.e. an extension of {K} by an {N}-by-{H} group): if we denote the {N}-by-({H}-by-{K}) group by {G}, and by {\pi} the quotient map from {G} to the {H}-by-{K} group, then {\pi^{-1}(H)} is an {N}-by-{H} normal subgroup of {G} whose quotient is {K}. Thus, for instance, every cyclic-by-metacyclic group is metacyclic-by-cyclic, and more generally every supersolvable group is polycyclic.

On the other hand, the converse is not true: not every ({N}-by-{H})-by-{K} group is an {N}-by-({H}-by-{K}) group. The problem is that {N} is normal in the {N}-by-{H} group, but need not be normal in the ({N}-by-{H})-by-{K} group. For instance, the semi-direct product {{\mathbb Z}^2 \rtimes SL_2({\mathbb Z})} is ({{\mathbb Z}}-by-{{\mathbb Z}})-by-{SL_2({\mathbb Z})} but not {{\mathbb Z}}-by-({{\mathbb Z}}-by-{SL_2({\mathbb Z})}). So the “-by-” operation is not associative in general (for instance, there are polycyclic groups that are not supersolvable). However, if {N} is not just normal in the {N}-by-{H} group, but is characteristic in that group, then it is automatically normal in the larger ({N}-by-{H})-by-{K} group, and then one can interpret the ({N}-by-{H})-by-{K} group as an {N}-by-({H}-by-{K}) group. So one recovers associativity when the first factor is characteristic. This explains why solvable groups can be recursively expressed both as abelian-by-solvable, and equivalently as solvable-by-abelian; this is ultimately because the commutator subgroup {[G,G]} is a characteristic subgroup of {G}. An easy but useful related observation is that solvable-by-solvable groups are again solvable (with the length of the product being bounded by the sum of the lengths of the factors).

Given a group property {{\mathcal P}}, a group {G} is said to be virtually {{\mathcal P}} if it has a finite index subgroup with the property {{\mathcal P}}; thus for instance a virtually abelian group is one with a finite index abelian subgroup, and so forth. (With this convention, “finite” is the same as “virtually trivial”.) This concept is not directly expressible in terms of group extensions for arbitrary properties {{\mathcal P}}; however, if the group property {{\mathcal P}} is hereditary in the sense that subgroups of a {{\mathcal P}} group are also {{\mathcal P}}, then a virtually {{\mathcal P}} group is the same concept as a {{\mathcal P}}-by-finite group. This is because every finite index subgroup {H} of a group {G} automatically contains a finite index normal subgroup of {G}. (Proof: {G} acts on the finite quotient space {G/H} by left multiplication, hence the kernel of this action (the subgroup of {G} that fixes every coset in {G/H}) has finite index in {G}. But this kernel is also normal in {G} and contained in {H}.)
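The normal subgroup produced in this proof is the intersection of all conjugates of {H}. Here is a small brute-force sketch in a finite setting, taking {G = S_4} and {H} a dihedral subgroup of order {8}; this example and the helper names `generate` and `core` are my own choices for illustration.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation p o q (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(p)))

def inverse(p):
    inv = [0] * len(p)
    for i, j in enumerate(p):
        inv[j] = i
    return tuple(inv)

def generate(gens, n):
    """Subgroup of S_n generated by gens (closure under right multiplication)."""
    identity = tuple(range(n))
    group, frontier = {identity}, [identity]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = compose(x, g)
            if y not in group:
                group.add(y)
                frontier.append(y)
    return group

def core(G, H):
    """Elements g fixing every coset xH, i.e. with x^-1 g x in H for all x in G."""
    return {g for g in G
            if all(compose(compose(inverse(x), g), x) in H for x in G)}

G = set(permutations(range(4)))        # the group S_4
rotation = (1, 2, 3, 0)                # the 4-cycle i -> i+1 mod 4
reflection = (2, 1, 0, 3)              # the transposition swapping 0 and 2
H = generate([rotation, reflection], 4)

N = core(G, H)
N_is_normal = all(compose(compose(g, k), inverse(g)) in N for g in G for k in N)
```

Here `H` has index {3} in `G`, and the subgroup `N` found inside it is the Klein four-group, which is normal of index {6}.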

One also observes that if {{\mathcal P}}, {{\mathcal Q}} are hereditary properties, then the property of {{\mathcal P}}-by-{{\mathcal Q}} is hereditary also; if {0 \rightarrow P \rightarrow G \rightarrow Q \rightarrow 0} is a {{\mathcal P}}-by-{{\mathcal Q}} group, and {G'} is a subgroup of {G}, then the short exact sequence

\displaystyle  0 \rightarrow (P \cap G') \rightarrow G' \rightarrow \pi(G') \rightarrow 0,

where {\pi: G \rightarrow Q} is a projection map, demonstrates that {G'} is also a {{\mathcal P}}-by-{{\mathcal Q}} group. Thus for instance the properties of being metabelian, metacyclic, polycyclic, supersolvable, solvable, or nilpotent, are hereditary. As a consequence, virtually nilpotent is the same as nilpotent-by-finite, etc.

We saw for hereditary properties {{\mathcal P}} that “{{\mathcal P}}-by-finite” was the same concept as “virtually {{\mathcal P}}”. It is natural to ask whether the same is true for “finite-by-{{\mathcal P}}”. The answer is no; for instance, one can extend an infinite vector space {V} over a finite field {F} of odd characteristic by {F} (using some non-degenerate bilinear anti-symmetric form {\omega: V \times V \rightarrow F}, and defining {(v,f) (w,g) = (v+w, f+g+\omega(v,w))} for {v,w \in V} and {f,g \in F}) to create a nilpotent group which is finite-by-abelian, but not virtually abelian. Conversely, the semi-direct product {{\mathbb Z} \rtimes {\mathbb Z}/2{\mathbb Z}} (where {{\mathbb Z}/2{\mathbb Z}} acts on {{\mathbb Z}} by reflection) is virtually abelian, but not finite-by-abelian. On the other hand, for hereditary {{\mathcal P}}, a finite-by-{{\mathcal P}} group is virtually (central finite)-by-{{\mathcal P}}. This is because if {G} is an extension of a {{\mathcal P}} group {P} by a finite group {F}, then {G} acts by conjugation on the finite group {F}; the stabiliser {G'} of this action is then a finite index subgroup, whose intersection with {F} is then central in {G'}. The projection of {G'} onto {P} is also a {{\mathcal P}} group by the hereditary nature of {{\mathcal P}}, and the claim follows.
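The group law built from {\omega} can be explored in a finite toy version. The sketch below takes {F = {\mathbb F}_3} and {V = F^2} with {\omega(v,w) = v_1 w_2 - v_2 w_1}; these are my own choices, made so that {2} is invertible in {F}, and since this {V} is finite the sketch only illustrates the group law and the central fibre, not the failure of virtual abelianness (which needs {V} to be infinite).

```python
from itertools import product

p = 3  # work over the field F_3, where 2 is invertible

def omega(v, w):
    """A non-degenerate anti-symmetric bilinear form on F_p^2."""
    return (v[0] * w[1] - v[1] * w[0]) % p

def mul(x, y):
    """Group law (v,f)(w,g) = (v+w, f+g+omega(v,w))."""
    (v, f), (w, g) = x, y
    return (((v[0] + w[0]) % p, (v[1] + w[1]) % p), (f + g + omega(v, w)) % p)

V = list(product(range(p), repeat=2))
G = [(v, f) for v in V for f in range(p)]   # 27 elements

is_associative = all(mul(mul(x, y), z) == mul(x, mul(y, z))
                     for x in G for y in G for z in G)
is_abelian = all(mul(x, y) == mul(y, x) for x in G for y in G)
centre = [x for x in G if all(mul(x, y) == mul(y, x) for y in G)]
```

In this toy model the centre is exactly the copy of {F} sitting in the second coordinate, and the quotient by it is the abelian group {V}, so the group is a central extension of {V} by {F} without being abelian.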

Remark 1 There is a variant of the above result which is also useful. Suppose one has an {H}-by-{K} group {G} in which the action of {K} on {H} is virtually trivial (i.e. there are only a finite number of distinct automorphisms of {H} induced by {K}). Then {G} is virtually a central {H'}-by-{K'} group for some finite index subgroups {H', K'} of {H, K}.

One can phrase various results in group theory in a succinct form using this notation. For instance, one of the results in my earlier blog post on amenability now states that amenable-by-amenable groups are amenable. Another example that I have been looking at recently is this paper of Larsen and Pink, the main result of which is a classification of finite linear groups over a field of characteristic {p}, namely that such groups are virtually ({p}-group by abelian) by (semisimple of Lie type), where one has bounds on the index of the “virtually” and on the type of the semisimple group.