The classification of finite simple groups (CFSG), first announced in 1983 but only fully completed in 2004, is one of the monumental achievements of twentieth-century mathematics. Spanning hundreds of papers and tens of thousands of pages, it has been called the “enormous theorem”. A “second generation” proof of the theorem, which is nearly complete, is somewhat shorter (estimated at about five thousand pages in length), but currently there is no reasonably sized proof of the classification.

An important precursor of the CFSG is the Feit-Thompson theorem from 1962-1963, which asserts that every finite group of odd order is solvable, or equivalently that every non-abelian finite simple group has even order. This is an immediate consequence of CFSG, and conversely the Feit-Thompson theorem is an essential starting point in the proof of the classification, since it allows one to reduce matters to groups of even order for which key additional tools (such as the Brauer-Fowler theorem) become available. The original proof of the Feit-Thompson theorem is 255 pages long, which is significantly shorter than the proof of the CFSG, but still far from short. While parts of the proof of the Feit-Thompson theorem have been simplified (and it has recently been converted, after six years of effort, into an argument that has been verified by the proof assistant Coq), the available proofs of this theorem are still extremely lengthy by any reasonable standard.

However, there is a significantly simpler special case of the Feit-Thompson theorem that was established previously by Suzuki in 1957, which was influential in the proof of the more general Feit-Thompson theorem (and thus indirectly to the proof of CFSG). Define a CA-group to be a group {G} with the property that the centraliser {C_G(x) := \{ g \in G: gx=xg \}} of any non-identity element {x \in G} is abelian; equivalently, the commuting relation {x \sim y} (defined as the relation that holds when {x} commutes with {y}, thus {xy=yx}) is an equivalence relation on the non-identity elements {G \backslash \{1\}} of {G}. Trivially, every abelian group is CA. A non-abelian example of a CA-group is the {ax+b} group of invertible affine transformations {x \mapsto ax+b} on a field {F}. A little less obviously, the special linear group {SL_2(F_q)} over a finite field {F_q} is a CA-group when {q} is a power of two. The finite simple groups of Lie type are not, in general, CA-groups, but when the rank is bounded they tend to behave as if they were “almost CA”; the centraliser of a generic element in {SL_d(F_q)}, for instance (when {d} is bounded and {q} is large), is typically a maximal torus (because most elements in {SL_d(F_q)} are regular semisimple), which is certainly abelian. In view of the CFSG, we thus see that CA or nearly CA groups form an important subclass of the simple groups, and it is thus of interest to study them separately. To this end, we have
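The CA property is easy to test by brute force on small examples. The following Python sketch (the helper names `affine_group`, `affine_compose`, `perm_compose` and `is_ca_group` are illustrative choices, not taken from any library) checks the definition directly for the {ax+b} group over {F_5} and, for contrast, for the symmetric group {S_4}, in which the centraliser of a double transposition is non-abelian.

```python
from itertools import permutations, product

def affine_group(p):
    """Maps x -> a*x + b on F_p (p prime, a != 0), encoded as pairs (a, b)."""
    return [(a, b) for a in range(1, p) for b in range(p)]

def affine_compose(p):
    """Return the composition law (g o h)(x) = g(h(x)) on pairs (a, b)."""
    def mul(g, h):
        return ((g[0] * h[0]) % p, (g[0] * h[1] + g[1]) % p)
    return mul

def perm_compose(g, h):
    """Composition of permutations stored as tuples: (g o h)[i] = g[h[i]]."""
    return tuple(g[i] for i in h)

def is_ca_group(elements, mul, identity):
    """True if the centraliser of every non-identity element is abelian."""
    for x in elements:
        if x == identity:
            continue
        centraliser = [g for g in elements if mul(g, x) == mul(x, g)]
        if any(mul(g, h) != mul(h, g) for g, h in product(centraliser, repeat=2)):
            return False
    return True

# The ax+b group over F_5 (order 20) versus the symmetric group S_4 (order 24).
affine_is_ca = is_ca_group(affine_group(5), affine_compose(5), (1, 0))
s4_is_ca = is_ca_group(list(permutations(range(4))), perm_compose, (0, 1, 2, 3))
print(affine_is_ca, s4_is_ca)
```

The check is quadratic in the size of each centraliser, so it is only meant for very small groups.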

Theorem 1 (Suzuki’s theorem on CA-groups) Every finite CA-group of odd order is solvable.

Of course, this theorem is superseded by the more general Feit-Thompson theorem, but Suzuki’s proof is substantially shorter (the original proof is nine pages) and will be given in this post. (See this survey of Solomon for some discussion of the link between Suzuki’s argument and the Feit-Thompson argument.) Suzuki’s analysis can be pushed further to give an essentially complete classification of all the finite CA-groups (of either odd or even order), but we will not pursue these matters here.

Moving even further down the ladder of simple precursors of CFSG is the following theorem of Frobenius from 1901. Define a Frobenius group to be a finite group {G} which has a subgroup {H} (called the Frobenius complement) with the property that all the non-trivial conjugates {gHg^{-1}} of {H}, for {g \in G \backslash H}, intersect {H} only at the identity. For instance the {ax+b} group is also a Frobenius group (take {H} to be the affine transformations that fix a specified point {x_0 \in F}, e.g. the origin). This example suggests that there is some overlap between the notions of a Frobenius group and a CA group. Indeed, note that if {G} is a CA-group and {H} is a maximal abelian subgroup of {G}, then any conjugate {gHg^{-1}} of {H} that is not identical to {H} will intersect {H} only at the identity (because {H} and each of its conjugates consist of equivalence classes under the commuting relation {\sim}, together with the identity). So if a maximal abelian subgroup {H} of a CA-group is its own normaliser (thus {N(H) := \{ g \in G: gH=Hg\}} is equal to {H}), then the group is a Frobenius group.

Frobenius’ theorem places an unexpectedly strong amount of structure on a Frobenius group:

Theorem 2 (Frobenius’ theorem) Let {G} be a Frobenius group with Frobenius complement {H}. Then there exists a normal subgroup {K} of {G} (called the Frobenius kernel of {G}) such that {G} is the semi-direct product {H \ltimes K} of {H} and {K}.

Roughly speaking, this theorem indicates that all Frobenius groups “behave” like the {ax+b} example (which is a quintessential example of a semi-direct product).

Note that if every CA-group of odd order was either Frobenius or abelian, then Theorem 2 would imply Theorem 1 by an induction on the order of {G}, since any subgroup of a CA-group is clearly again a CA-group. Indeed, the proof of Suzuki’s theorem does basically proceed by this route (Suzuki’s arguments do indeed imply that CA-groups of odd order are Frobenius or abelian, although we will not quite establish that fact here).

Frobenius’ theorem can be reformulated in the following concrete combinatorial form:

Theorem 3 (Frobenius’ theorem, equivalent version) Let {G} be a group of permutations acting transitively on a finite set {X}, with the property that any non-identity permutation in {G} fixes at most one point in {X}. Then the set of permutations in {G} that fix no points in {X}, together with the identity, is closed under composition.

Again, a good example to keep in mind for this theorem is when {G} is the group of affine permutations on a field {F} (i.e. the {ax+b} group for that field), and {X} is the set of points on that field. In that case, the set of permutations in {G} that do not fix any points are the non-trivial translations.
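This example can be checked mechanically. The sketch below (with the prime {p=7} an arbitrary choice, and all names illustrative) verifies that every non-identity affine map on {F_7} fixes at most one point, and that the fixed-point-free maps together with the identity are exactly the translations and are closed under composition.

```python
p = 7  # any prime would do; 7 is an arbitrary small choice
G = [(a, b) for a in range(1, p) for b in range(p)]  # (a, b) encodes x -> a*x + b
identity = (1, 0)

def compose(g, h):
    """(g o h)(x) = g(h(x))."""
    return ((g[0] * h[0]) % p, (g[0] * h[1] + g[1]) % p)

def fixed_points(g):
    """Points of F_p fixed by the affine map g."""
    return [x for x in range(p) if (g[0] * x + g[1]) % p == x]

# Hypothesis of Theorem 3: non-identity elements fix at most one point.
at_most_one_fixed_point = all(len(fixed_points(g)) <= 1 for g in G if g != identity)

# K = identity together with the fixed-point-free permutations.
K = {g for g in G if g == identity or not fixed_points(g)}
K_is_closed = all(compose(g, h) in K for g in K for h in K)
K_is_translations = K == {(1, b) for b in range(p)}
print(at_most_one_fixed_point, K_is_closed, K_is_translations)
```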

To deduce Theorem 3 from Theorem 2, one applies Theorem 2 to the stabiliser of a single point in {X}. Conversely, to deduce Theorem 2 from Theorem 3, set {X := G/H = \{ gH: g \in G \}} to be the space of left-cosets of {H}, with the obvious left {G}-action; one easily verifies that this action is faithful, transitive, and each non-identity element {g} of {G} fixes at most one left-coset of {H} (basically because it lies in at most one conjugate of {H}). If we let {K} be the elements of {G} that do not fix any point in {X}, plus the identity, then by Theorem 3 {K} is closed under composition; it is also clearly closed under inverse and conjugation, and is hence a normal subgroup of {G}. From construction {K} is the identity plus the complement of all the {|G|/|H|} conjugates of {H}, which are all disjoint except at the identity, so by counting elements we see that

\displaystyle |K| = |G| - \frac{|G|}{|H|}(|H|-1) = |G|/|H|.

As {H} normalises {K} and is disjoint from {K}, we thus see that {KH = H \ltimes K} is all of {G}, giving Theorem 2.
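As a quick sanity check of this count, consider the {ax+b} group over a finite field {F_q} with {q} elements (an illustrative special case), with {H} the stabiliser of the origin. Then {|G| = q(q-1)} and {|H| = q-1}, so there are {|G|/|H| = q} conjugates of {H} (one for each point of {F_q}), and

\displaystyle |K| = q(q-1) - q (q-2) = q,

which matches the fact that {K} consists of the {q} translations {x \mapsto x+b}.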

Despite the appealingly concrete and elementary form of Theorem 3, the only known proofs of that theorem (or equivalently, Theorem 2) in its full generality proceed via the machinery of group characters (which one can think of as a version of Fourier analysis for nonabelian groups). On the other hand, once one establishes the basic theory of these characters (reviewed below the fold), the proof of Frobenius’ theorem is very short, which gives quite a striking example of the power of character theory. The proof of Suzuki’s theorem also proceeds via character theory, and is basically a more involved version of the Frobenius argument; again, no character-free proof of Suzuki’s theorem is currently known. (The proofs of Feit-Thompson and CFSG also involve characters, but those proofs also contain many other arguments of much greater complexity than the character-based portions of the proof.)

It seems to me that the above four theorems (Frobenius, Suzuki, Feit-Thompson, and CFSG) provide a ladder of sorts (with exponentially increasing complexity at each step) to the full classification, and that any new approach to the classification might first begin by revisiting the earlier theorems on this ladder and finding new proofs of these results first (in particular, if one had a “robust” proof of Suzuki’s theorem that also gave non-trivial control on “almost CA-groups” – whatever that means – then this might lead to a new route to classifying the finite simple groups of Lie type and bounded rank). But even for the simplest two results on this ladder – Frobenius and Suzuki – it seems remarkably difficult to find any proof that is not essentially the character-based proof. (Even trying to replace character theory by its close cousin, representation theory, doesn’t seem to work unless one gives in to the temptation to take traces everywhere and put the characters back in; it seems that rather than abandon characters altogether, one needs to find some sort of “robust” generalisation of existing character-based methods.) In any case, I am recording here the standard character-based proofs of the theorems of Frobenius and Suzuki below the fold. There is nothing particularly novel here, but I wanted to collect all the relevant material in one place, largely for my own benefit.

— 1. Basic character theory —

Let {G} be a finite group. Then we can form the finite-dimensional complex Hilbert space {L^2(G)} of functions {f:G \rightarrow {\bf C}} with the inner product

\displaystyle  \langle f_1, f_2 \rangle_{L^2(G)} = \mathop{\bf E}_{x \in G} f_1(x) \overline{f_2(x)}

and thus norm

\displaystyle  \|f\|_{L^2(G)} = (\mathop{\bf E}_{x \in G} |f(x)|^2)^{1/2}

where {\mathop{\bf E}_{x \in G} f(x) := \frac{1}{|G|} \sum_{x \in G} f(x)} is the averaging operator. Inside this space, we have the subspace {L^2(G)^G} of class functions: functions {f \in L^2(G)} which are invariant under the conjugation action of {G} on itself, thus

\displaystyle  f(gxg^{-1}) = f(x)

for all {x, g \in G}. Equivalently, {f} is constant on each conjugacy class of {G}. In particular, we see that the dimension of {L^2(G)^G} is equal to the class number of {G} – the number of conjugacy classes of {G}.

One way to generate class functions is from taking traces of finite-dimensional unitary representations {\rho: G \rightarrow U(V)}, i.e. a homomorphism from the group {G} to unitary operators on a finite-dimensional complex Hilbert space {V}. We will abbreviate “finite-dimensional unitary representation” as “representation” henceforth. Given any such representation, one has an associated character {\chi_\rho \in L^2(G)^G} defined by

\displaystyle  \chi_\rho(g) := \hbox{tr}_V( \rho(g) ).

One easily verifies that this is a class function. For instance, the regular representation {\tau: G \rightarrow U(L^2(G))}, in which {\tau(g) f(x) := f(g^{-1} x)}, has character

\displaystyle  \chi_\tau(g) = |G| 1_{g = 1_G}

and every linear character {\chi: G \rightarrow S^1 = U({\bf C})} (i.e. a homomorphism to the complex unit circle) is a character associated to the obvious one-dimensional representation corresponding to {\chi}. In particular, the constant function {1} is a character, associated to the principal one-dimensional representation.

The characters interact well with various representation-theoretic operations. For instance, isomorphic representations clearly have the same character. If {\rho_1: G \rightarrow U(V_1), \rho_2: G \rightarrow U(V_2)} are representations, then the characters of the direct sum {\rho_1 \oplus \rho_2: G \rightarrow U(V_1 \oplus V_2)} and tensor product {\rho_1 \otimes \rho_2: G \rightarrow U(V_1 \otimes V_2)} are the sum and product of the individual characters:

\displaystyle  \chi_{\rho_1 \oplus \rho_2} = \chi_{\rho_1} + \chi_{\rho_2}

and

\displaystyle  \chi_{\rho_1 \otimes \rho_2} = \chi_{\rho_1} \chi_{\rho_2}.

Also, if {\overline{V_1}} is the Hilbert space {V_1} with the conjugated inner product, then the conjugate representation {\overline{\rho_1}: G \rightarrow U(\overline{V_1})} (given by taking {\rho_1} and conjugating the inner product structure) has a conjugated character:

\displaystyle  \chi_{\overline{\rho_1}} = \overline{\chi_{\rho_1}}.

Thus the space of characters forms a semi-ring in {L^2(G)^G} that is closed under complex conjugation. Also, since any element of the absolute Galois group {\hbox{Gal}(\overline{{\bf Q}}/{\bf Q})} of the rationals can be extended to the complex numbers {{\bf C}}, we have the stronger fact that the space of characters is also invariant with respect to the action of {\hbox{Gal}(\overline{{\bf Q}}/{\bf Q})}; we will need this fact somewhat later in this post.

The space of characters is not a ring, because characters are certainly not preserved with respect to negation: the value {\chi_\rho(1_G)} of a character at the identity is the dimension {\hbox{dim}(\rho) := \hbox{dim}(V)} of the space that {\rho} acts on, and so is non-negative; in particular, {-\chi_\rho} will not be a character as long as {\rho} is positive dimensional. We can then define the space of generalised characters to be the ring generated by the characters, thus a generalised character is nothing more than a difference of two characters.

By repeatedly taking orthogonal complements, one can easily see that representations are completely reducible, thus if {\hat G} is a collection of one representative of each of the isomorphism classes of irreducible finite-dimensional representations of {G}, then every character can be written as a linear combination (over the natural numbers) of the irreducible characters {\{ \chi_\xi: \xi \in \hat G \}}.

From the ergodic theorem (which is a triviality in the case of an action of a finite group {G}), the average value of a character {\chi_\rho} of a representation {\rho: G \rightarrow U(V)} is equal to the dimension of its invariant component {V^G := \{ v \in V: \rho(g)v = v \hbox{ for all } g \in G \}}:

\displaystyle  \mathop{\bf E}_{x \in G} \chi_\rho(x) = \hbox{dim}( V^G ).

As a consequence, the inner product {\langle \chi_{\rho_1}, \chi_{\rho_2} \rangle_{L^2(G)^G}} of two characters is equal to the dimension of the {G}-invariant component of {\rho_1 \otimes \overline{\rho_2}}. By Schur’s lemma, this implies in particular that the irreducible characters {\{ \chi_\xi: \xi \in \hat G \}} are an orthonormal system in {L^2(G)}:

\displaystyle  \langle \chi_\xi, \chi_{\xi'} \rangle_{L^2(G)^G} = 1_{\xi=\xi'}.

In fact, they form an orthonormal basis for {L^2(G)^G}. (Proof: given any non-trivial {K \in L^2(G)^G}, the convolution operator {f \mapsto K * f} is non-trivial in the regular representation, and thus must also be non-trivial with respect to at least one irreducible representation {\xi}, which implies that {K} has non-zero inner product with {\chi_\xi}. Thus there is no non-zero element of {L^2(G)^G} that is orthogonal to all the irreducible characters, giving the claim.) In particular, we see that {|\hat G|} is equal to the class number of {G}.
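For a concrete instance, one can take {G = S_3}, whose irreducible characters are the trivial character, the sign character, and the character of the two-dimensional standard representation (which equals the number of fixed points minus one). The sketch below, with illustrative helper names, computes the conjugacy classes by brute force and checks orthonormality in exact arithmetic; since these three characters are real-valued, the complex conjugate in the inner product is omitted.

```python
from fractions import Fraction
from itertools import permutations

G = list(permutations(range(3)))  # S_3 as permutations of {0, 1, 2}

def compose(g, h):
    """(g o h)[i] = g[h[i]]."""
    return tuple(g[i] for i in h)

def inverse(g):
    inv = [0] * len(g)
    for i, gi in enumerate(g):
        inv[gi] = i
    return tuple(inv)

def sign(g):
    inversions = sum(1 for i in range(len(g)) for j in range(i + 1, len(g)) if g[i] > g[j])
    return -1 if inversions % 2 else 1

def fixed_points(g):
    return sum(1 for i, gi in enumerate(g) if i == gi)

# The three irreducible characters of S_3 (all real-valued).
characters = {
    "trivial": lambda g: 1,
    "sign": sign,
    "standard": lambda g: fixed_points(g) - 1,
}

def inner(f1, f2):
    """<f1, f2> = E_{x in G} f1(x) f2(x) for real-valued class functions."""
    return Fraction(sum(f1(x) * f2(x) for x in G), len(G))

conjugacy_classes = {frozenset(compose(compose(g, x), inverse(g)) for g in G) for x in G}
gram = {(a, b): inner(characters[a], characters[b]) for a in characters for b in characters}
print(len(conjugacy_classes), gram)
```

The Gram matrix is the identity, and the number of irreducible characters agrees with the class number.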

Using this basis, we now have a Fourier transform {f \mapsto \hat f} that is an isometry between the Hilbert space {L^2(G)^G} of class functions, and the Hilbert space {\ell^2(\hat G)}, defined by taking inner products with characters

\displaystyle  \hat f(\xi) := \langle f, \chi_\xi \rangle_{L^2(G)}

and with the usual Fourier inversion formula

\displaystyle  f = \sum_{\xi \in \hat G} \hat f(\xi) \chi_\xi

and Plancherel and Parseval identities

\displaystyle  \|f\|_{L^2(G)^G} = \| \hat f \|_{\ell^2(\hat G)}

and

\displaystyle  \langle f_1,f_2 \rangle_{L^2(G)^G} = \langle \hat f_1, \hat f_2 \rangle_{\ell^2(\hat G)}.

(One can also relate convolution in {L^2(G)^G} with pointwise multiplication in {\ell^2(\hat G)}, and pointwise multiplication in {L^2(G)^G} is related to a plethysm on {\ell^2(\hat G)} involving tensor product multiplicities, but we will not need these operations here.)

A character {\chi_\rho} of a (not necessarily irreducible) representation {\rho} then has Fourier coefficients {\widehat{\chi_\rho}(\xi)} that count the multiplicity of {\xi} in {\rho}; in particular, a class function is a character (resp. generalised character) iff its Fourier coefficients are all natural numbers (resp. integers), and any two representations are isomorphic iff they have the same character.

Thus for instance the regular representation {\tau} has Fourier coefficients {\hat{\chi_\tau}(\xi) = \hbox{dim}(\xi)}, leading to the identity

\displaystyle  \chi_\tau = \sum_{\xi \in \hat G} \hbox{dim}(\xi) \chi_\xi

which gives the Peter-Weyl theorem that {\tau} is isomorphic to the direct sum of {\hbox{dim}(\xi)} copies of {\xi} for each {\xi \in \hat G}. In particular we have

\displaystyle  |G| = \sum_{\xi \in \hat G} \hbox{dim}(\xi)^2. \ \ \ \ \ (1)
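For instance, the symmetric group {S_3} has three irreducible representations (the trivial representation, the sign representation, and the two-dimensional standard representation), and (1) reads

\displaystyle 6 = 1^2 + 1^2 + 2^2.

Similarly, the {ax+b} group over a finite field {F_q} has {q-1} linear characters and one irreducible representation of dimension {q-1} (a standard fact that is not proven here), which is consistent with

\displaystyle q(q-1) = (q-1) \cdot 1^2 + (q-1)^2.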

From these Fourier identities we can now detect whether a representation is irreducible (or a combination of a small number of representations) through the {L^2} structure of its character. Indeed, by taking Fourier transforms and working in {\ell^2(\hat G)} we now have the following immediate corollaries, which will be very useful to us in the sequel:

Lemma 4 (Small {L^2(G)^G} norm and irreducibility) Let {\chi} be a generalised character.

  • (i) {\|\chi\|_{L^2(G)^G}^2} is a natural number.
  • (ii) {\| \chi\|_{L^2(G)^G}^2} equals {1} iff {\chi = \epsilon \chi_\xi} for some sign {\epsilon \in \{-1,+1\}} and irreducible character {\chi_\xi}. If we also know that {\chi(1_G)>0}, this forces {\epsilon = +1} (since {\chi_\xi(1_G) = \hbox{dim}(\xi)} is positive).
  • (iii) {\|\chi\|_{L^2(G)^G}^2} equals {2} iff {\chi = \epsilon \chi_\xi - \epsilon' \chi_{\xi'}} for some signs {\epsilon,\epsilon' \in \{-1,+1\}} and distinct irreducible characters {\chi_\xi,\chi_{\xi'}}. If we also know that {\chi(1_G)=0}, then this forces {\epsilon=\epsilon'} (again because {\chi_\xi(1_G), \chi_{\xi'}(1_G)} are positive).

One can of course also characterise when {\|\chi\|_{L^2(G)^G}^2} is equal to {3}, {4}, etc. by this method, although the descriptions rapidly become more complicated and less useful. In practice, this lemma will allow us to construct interesting examples of irreducible representations by first exhibiting a generalised character of small norm (or equivalently, two characters that are close to each other in {L^2(G)^G} norm). It seems very difficult to mimic this type of construction by any other means, including non-character-based representation theoretic methods. (But perhaps one could categorify this lemma somehow using K-theory.)

Lemma 4 is a typical application of the integrality gap, which is the trivial but fundamental fact that integers are either zero or have magnitude at least one. It is the interplay between the integrality gap and the Fourier analysis of characters which drives the proof of both Frobenius’ theorem and Suzuki’s theorem, as we shall soon see.
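Here is a small illustration of how Lemma 4 gets used, on an example chosen for convenience rather than taken from the arguments below. For {G = S_4} acting on four points, the function {\pi(g)} counting the fixed points of {g} is the character of the permutation representation. The sketch computes {\|\pi\|_{L^2(G)^G}^2} and {\|\pi - 1\|_{L^2(G)^G}^2} exactly; the values turn out to be {2} and {1}, so by Lemma 4(ii) the generalised character {\pi - 1} (which equals {3 > 0} at the identity) is an irreducible character.

```python
from fractions import Fraction
from itertools import permutations

n = 4
G = list(permutations(range(n)))  # S_4 acting on {0, 1, 2, 3}

def perm_char(g):
    """Character of the permutation representation: number of fixed points."""
    return sum(1 for i, gi in enumerate(g) if i == gi)

def norm_sq(f):
    """Squared L^2(G) norm of a real-valued function f, in exact arithmetic."""
    return Fraction(sum(f(x) ** 2 for x in G), len(G))

norm_sq_pi = norm_sq(perm_char)
norm_sq_pi_minus_one = norm_sq(lambda g: perm_char(g) - 1)
value_at_identity = perm_char(tuple(range(n))) - 1
print(norm_sq_pi, norm_sq_pi_minus_one, value_at_identity)
```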

We also record some additional easy properties of characters which we will need later. Firstly, we have the identity

\displaystyle  \chi(g^{-1}) = \overline{\chi(g)} \ \ \ \ \ (2)

for any {g \in G}, since the inverse of a unitary operator is also its adjoint. Secondly, the kernel

\displaystyle  \{ x \in G: \chi(x) = \chi(1_G) \} \ \ \ \ \ (3)

of a character is automatically a normal subgroup of {G}. This is because a unitary operator on a finite-dimensional space {V} has trace {\hbox{dim}(V)} if and only if it is equal to the identity operator, and so (3) is also the kernel of the associated representation {\rho: G \rightarrow U(V)}. This latter fact suggests a strategy to prove Frobenius’ theorem by exhibiting a character whose kernel is precisely the complement of the conjugacy classes of {H} excluding the identity, and this is in fact exactly what we will do.

Thus far we have focused on the representation theory of a single group {G}. However, the situation becomes significantly more interesting when one relates the representation theory of two groups, a finite group {G} and a subgroup {H} (not necessarily normal). We then have an obvious restriction map {\hbox{Res}^H_G: L^2(G)^G \rightarrow L^2(H)^H} that restricts any class function on {G} to a class function on {H} (since any two elements of {H} that are conjugate in {H} are clearly also conjugate in {G}). The adjoint map {\hbox{Ind}^G_H: L^2(H)^H \rightarrow L^2(G)^G} can be easily computed: given any class function {f \in L^2(H)^H}, the induced class function {\hbox{Ind}^G_H f\in L^2(G)^G} is given by the Frobenius formula

\displaystyle  \hbox{Ind}^G_H f(x) = \frac{1}{|H|} \sum_{g \in G} f(gxg^{-1}) \ \ \ \ \ (4)

with the convention that {f} is extended by zero from {H} to {G}. Equivalently, one has

\displaystyle  \hbox{Ind}^G_H f(x) = \sum_{i=1}^k f(g_ixg_i^{-1})

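where {Hg_1, \ldots, Hg_k} is an enumeration of the right-cosets of {H} (the summand {f(gxg^{-1})} is unchanged when {g} is replaced by {hg} with {h \in H}, since {f} is a class function on {H}).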
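The Frobenius formula (4) is straightforward to implement. The following sketch works in the {ax+b} group over {F_5} with {H} the stabiliser of the origin (both arbitrary illustrative choices, as are the function names), induces two real-valued class functions from {H} to {G}, and checks the adjoint relationship {\langle \hbox{Ind}^G_H f, F \rangle_{L^2(G)} = \langle f, \hbox{Res}^H_G F \rangle_{L^2(H)}} in exact arithmetic. It also checks that inducing the constant function {1} gives the fixed-point-counting character of the action on {F_5}.

```python
from fractions import Fraction

p = 5
G = [(a, b) for a in range(1, p) for b in range(p)]  # (a, b) encodes x -> a*x + b
H = [(a, 0) for a in range(1, p)]                    # stabiliser of the origin
H_set = set(H)

def compose(g, h):
    return ((g[0] * h[0]) % p, (g[0] * h[1] + g[1]) % p)

def inverse(g):
    a_inv = pow(g[0], -1, p)  # modular inverse; needs Python 3.8+
    return (a_inv, (-a_inv * g[1]) % p)

def induce(f):
    """Formula (4): Ind f(x) = (1/|H|) sum_{g in G} f(g x g^{-1}), f extended by zero."""
    def induced(x):
        total = 0
        for g in G:
            y = compose(compose(g, x), inverse(g))
            if y in H_set:
                total += f(y)
        return Fraction(total, len(H))
    return induced

def inner(f1, f2, group):
    """Inner product of real-valued functions on the given group."""
    return Fraction(sum(f1(x) * f2(x) for x in group), len(group))

def count_fixed_points(g):
    return sum(1 for x in range(p) if (g[0] * x + g[1]) % p == x)

f = lambda h: h[0]          # an arbitrary class function on the abelian group H
F = count_fixed_points      # a class function on G

lhs = inner(induce(f), F, G)
rhs = inner(f, F, H)        # F restricted to H
ind_one = induce(lambda h: 1)
ind_one_is_fixed_point_count = all(ind_one(x) == count_fixed_points(x) for x in G)
print(lhs, rhs, ind_one_is_fixed_point_count)
```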

In a similar fashion, given a representation {\rho: G \rightarrow U(V)} of {G}, we may restrict it to {H} to obtain a representation {\hbox{Res}^H_G \rho: H \rightarrow U(V)} of {H}. In the adjoint direction, given a representation {\eta: H \rightarrow U(V)} of {H}, one can associate an induced representation {\hbox{Ind}^G_H \eta: G \rightarrow U(\tilde V)} of {G} by the following construction. One takes {\tilde V} to be the space of functions {v: G \rightarrow V} such that {v(gh) = \eta(h^{-1}) v(g)} for all {g \in G} and {h \in H}, with inner product

\displaystyle  \langle v_1, v_2 \rangle_{\tilde V} := \mathop{\bf E}_{x \in G} \langle v_1(x), v_2(x) \rangle_V,

and then lets {G} act on {\tilde V} by the formula

\displaystyle  \hbox{Ind}^G_H \eta(g) v(x) := v(g^{-1} x)

which one can check to indeed give a representation. Thus, for instance, the regular representation on {H} induces (an isomorphic copy of) the regular representation on {G}. It is not difficult to show that these representation-theoretic constructions are compatible with the operations on class functions mentioned earlier, thus

\displaystyle  \hbox{Res}^H_G \chi_\rho = \chi_{\hbox{Res}^H_G \rho}

for every representation {\rho} of {G}, and dually

\displaystyle  \hbox{Ind}^G_H \chi_\rho = \chi_{\hbox{Ind}^G_H \rho}.

In particular, any character of {G} restricts to a character of {H}, and every character of {H} induces a character on {G}. Thus the adjoint relationship between {\hbox{Res}^H_G} and {\hbox{Ind}^G_H} for class functions induces a corresponding adjoint relationship for representations, known as Frobenius reciprocity.

In general, the restriction or induction of an irreducible representation will not be irreducible (and the operations of restriction and induction do not invert each other). However, the {L^2} geometry of the characters can often be controlled quite precisely (especially for structured groups such as Frobenius groups or CA-groups), and this together with tools such as Lemma 4 can allow us to create interesting irreducible representations of {G} in non-trivial fashion from irreducible representations of {H}.

— 2. Frobenius’ theorem —

We are now ready to prove Frobenius’ theorem. Let {G} be a Frobenius group with Frobenius complement {H}. Then there are {|G|/|H|} distinct conjugates {gHg^{-1}} of {H}, which are all disjoint except for the origin, thus one can partition

\displaystyle  G = \{1_G\} \cup \bigcup_{g \in G/H} (gHg^{-1} \backslash \{1_G\}) \cup (K \backslash \{1_G\}) \ \ \ \ \ (5)

where (by abuse of notation) {g} ranges over a set of representatives {gH} of the left cosets of {H} and {K} is the identity together with all the elements that do not lie in any conjugate of {H}. I like to think of this decomposition by picturing {G} as being like a plane, {1_G} being the origin in this plane, each conjugate {gHg^{-1}} being a non-vertical line through the origin, and {K} being the vertical line through the origin (note that in the case of the {ax+b} group, this is more or less exactly what (5) actually looks like). Counting elements in (5), we thus have

\displaystyle  |G|= 1 + \frac{|G|}{|H|} (|H|-1) + (|K|-1)

and so

\displaystyle  |K| = \frac{|G|}{|H|}. \ \ \ \ \ (6)

We now start inducing characters from {H} to {G} and determine their geometry. Let {\xi \in \hat H} be an irreducible representation of {H} of some dimension {d = \hbox{dim}(\xi)}; then the character {\chi_\xi} obeys the identities

\displaystyle  \chi_\xi(1_G) = d

and

\displaystyle  \sum_{h \in H} |\chi_\xi(h)|^2 = |H| \|\chi_\xi\|_{L^2(H)^H}^2 = |H|.

In particular,

\displaystyle  \sum_{h \in H \backslash \{1_G\}} |\chi_\xi(h)|^2 = |H| - d^2.

Now we consider the induced character

\displaystyle  \hbox{Ind}^G_H \chi_\xi(x) = \sum_{g \in G/H} \chi_\xi(gxg^{-1}).

Using the above identities and the partition (5), we see that {\hbox{Ind}^G_H \chi_\xi} equals {d |G|/|H|} at {1_G}, vanishes at all other elements of {K}, and on each of the {|G|/|H|} sets {gHg^{-1} \backslash \{1_G\}} is given by a conjugate of {\chi_\xi}. In particular,

\displaystyle  \sum_{x \in G \backslash K} |\hbox{Ind}^G_H \chi_\xi(x)|^2 = \frac{|G|}{|H|} (|H|-d^2).

Similarly, if one induces the trivial character {1} on {H} to a character {\hbox{Ind}^G_H 1} on {G}, this character will equal {|G|/|H|} at {1_G}, vanish at all other elements of {K}, and will equal {1} on {G \backslash K}. If we then form the generalised character

\displaystyle  \tilde \chi_\xi := \hbox{Ind}^G_H \chi_\xi - d\hbox{Ind}^G_H 1 + d,

then {\tilde \chi_\xi} equals {d} on {K} and is equal to {\hbox{Ind}^G_H \chi_\xi} outside of {K}. In particular, we have

\displaystyle  \sum_{x \in G} |\tilde \chi_\xi(x)|^2 = d^2 |K| + \frac{|G|}{|H|} (|H|-d^2).

Using (6), we thus see that {\tilde \chi_\xi} has surprisingly small norm:

\displaystyle  \| \tilde \chi_\xi \|_{L^2(G)^G} = 1.

We can then apply Lemma 4 (noting that {\tilde \chi_\xi(1_G) = d > 0}) and conclude that {\tilde \chi_\xi = \chi_{\tilde \xi}} for some irreducible representation {\tilde \xi \in \hat G} of {G}. Note that {\hbox{Res}^H_G \tilde \chi_\xi = \chi_\xi}. Thus we have shown that every irreducible representation {\xi} of {H} is the restriction of an irreducible representation {\tilde \xi} of {G}.
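The construction of {\tilde \chi_\xi} can be tested numerically in a small case. The sketch below (an illustration under specific assumptions, not part of the proof) takes {G} to be the {ax+b} group over {F_5}, {H} the stabiliser of the origin, and {\xi} the linear character of {H} sending the generator {x \mapsto 2x} to {i}, so that {d=1}. It then forms {\tilde \chi_\xi = \hbox{Ind}^G_H \chi_\xi - d \hbox{Ind}^G_H 1 + d} using the Frobenius formula and checks that it has norm one, restricts to {\chi_\xi} on {H}, and equals {d} on {K}.

```python
p = 5
G = [(a, b) for a in range(1, p) for b in range(p)]  # (a, b) encodes x -> a*x + b
H = {(a, 0) for a in range(1, p)}                    # Frobenius complement
K = {(1, b) for b in range(p)}                       # translations

def compose(g, h):
    return ((g[0] * h[0]) % p, (g[0] * h[1] + g[1]) % p)

def inverse(g):
    a_inv = pow(g[0], -1, p)  # modular inverse; needs Python 3.8+
    return (a_inv, (-a_inv * g[1]) % p)

# A linear character of H: 2 generates the multiplicative group mod 5,
# and (2^k, 0) is sent to i^k.
powers_of_i = [1, 1j, -1, -1j]
discrete_log = {pow(2, k, p): k for k in range(p - 1)}

def chi(h):
    return powers_of_i[discrete_log[h[0]]]

def induce(f):
    """Frobenius formula: Ind f(x) = (1/|H|) sum_{g in G} f(g x g^{-1})."""
    def induced(x):
        conjugates = (compose(compose(g, x), inverse(g)) for g in G)
        return sum(f(y) for y in conjugates if y in H) / len(H)
    return induced

d = 1
ind_chi = induce(chi)
ind_one = induce(lambda h: 1)

def chi_tilde(x):
    return ind_chi(x) - d * ind_one(x) + d

norm_sq = sum(abs(chi_tilde(x)) ** 2 for x in G) / len(G)
restricts_to_chi = all(abs(chi_tilde(h) - chi(h)) < 1e-9 for h in H)
equals_d_on_K = all(abs(chi_tilde(k) - d) < 1e-9 for k in K)
print(norm_sq, restricts_to_chi, equals_d_on_K)
```

In this example {\tilde \chi_\xi} is simply the linear character {(x \mapsto ax+b) \mapsto \chi_\xi(x \mapsto ax)} of {G}, which is the expected extension of {\chi_\xi}.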

Remark 1 The above analysis shows a little bit more, namely that {\tilde \xi} arises in {\hbox{Ind}^G_H \xi} as the orthogonal complement of a copy of the mean zero component of quasiregular representation on {L^2(G/H)} (i.e. the induction of the trivial representation on {H}), although it is not obvious to me how one would demonstrate (other than via an inspection of characters) that the induced representation {\hbox{Ind}^G_H \xi} actually contains a copy of this component of the quasiregular representation.

Now we consider the regular character of {H}:

\displaystyle  \chi_\tau = \sum_{\xi \in \hat H} \hbox{dim}(\xi) \chi_\xi.

This character equals {|H|} at the identity, and vanishes at the other elements of {H}. If we then form the associated character

\displaystyle  \tilde \chi_\tau := \sum_{\xi \in \hat H} \hbox{dim}(\xi) \chi_{\tilde \xi}

of {G}, then {\tilde \chi_\tau} restricts to {\chi_\tau} and so also equals {|H|} at the identity and vanishes at the other elements of {H}. By (5), we conclude that {\tilde \chi_\tau} (which is a class function) is supported on {K}. Also, as each of the {\chi_{\tilde \xi}} are constant on {K}, {\tilde \chi_\tau} is also, and so {\tilde \chi_\tau} equals {|H|} on all of {K}. Thus {K} is the kernel (3) of the character {\tilde \chi_\tau} and is thus normal. This gives Frobenius’ theorem as discussed in the introduction.

— 3. More character theory —

Before we turn to Suzuki’s theorem, we will need some additional facts about characters which go beyond the Fourier-analytic considerations of Section 1 by also employing some tools from algebraic number theory.

Let {G} be a finite group. Observe that if {\rho: G \rightarrow U(V)} is a representation, then for any {g \in G}, the unitary operator {\rho(g)} can be diagonalised. As {g} (and hence {\rho(g)}) has finite order, the eigenvalues of {\rho(g)} are roots of unity, and so the trace {\chi_\rho(g) = \hbox{tr}(\rho(g))} is the sum of finitely many roots of unity. In particular, {\chi_\rho(g)} is always an algebraic integer. Unlike rational integers, algebraic integers do not directly enjoy an integrality gap; one can have algebraic integers of arbitrarily small nonzero magnitude (e.g. powers of {\sqrt{2}-1}). However, we will rely in several places on the basic but fundamental fact that a number which is both an algebraic integer and a rational is necessarily a rational integer, which then is subject to the integrality gap.
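To illustrate the lack of an integrality gap: the number {\alpha := \sqrt{2}-1 \approx 0.414} is an algebraic integer, since it is a root of the monic polynomial equation

\displaystyle x^2 + 2x - 1 = 0,

and its powers {\alpha^n} are non-zero algebraic integers that tend to zero as {n \rightarrow \infty}. On the other hand, a rational number such as {1/2} is not an algebraic integer: if a rational {a/b} in lowest terms is a root of a monic polynomial of degree {n} with integer coefficients, then clearing denominators shows that {b} divides {a^n}, and hence {b = \pm 1}.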

We have a variant of the above fact:

Lemma 5 Let {G} be a finite group, let {\xi: G \rightarrow U(V)} be a {d}-dimensional irreducible representation, and let {x \in G}. Then {\frac{|\hbox{Cl}(x)|}{d} \chi_\xi(x)} is an algebraic integer, where {\hbox{Cl}(x) := \{ gxg^{-1}: g \in G \}} is the conjugacy class of {x}.

Proof: The endomorphism {A := \sum_{y \in \hbox{Cl}(x)} \xi(y) \in \hbox{End}(V)} is {G}-equivariant and has trace {|\hbox{Cl}(x)| \chi_\xi(x)}; by Schur’s lemma, it is thus equal to {\frac{|\hbox{Cl}(x)|}{d} \chi_\xi(x)} times the identity. It thus suffices to show that the diagonal entries of {A} are algebraic integers; thus it will suffice to show that {P(A)=0} for some monic polynomial {P} with integer coefficients.

Consider the associated element {a := \sum_{y \in \hbox{Cl}(x)} y} in the group ring {{\bf Z} G} of {G}. Then the modules {\langle 1 \rangle}, {\langle 1, a \rangle}, {\langle 1, a, a^2 \rangle}, etc. form an increasing sequence of submodules of {{\bf Z} G}. As {{\bf Z} G \equiv {\bf Z}^{|G|}} is Noetherian, we thus have

\displaystyle  \langle 1, a, \ldots, a^n \rangle = \langle 1, a, \ldots, a^{n-1} \rangle

for some {n \geq 1}, or equivalently that {a^n} is an integer combination of {1,a,\ldots,a^{n-1}}. This implies that {A^n} is an integer combination of {1,A,\ldots,A^{n-1}}, and the claim follows. \Box
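As a small sanity check of Lemma 5, take {G = S_3} and let {\xi} be the two-dimensional standard representation, so that {d=2}, and {\chi_\xi} equals {0} on the three transpositions and {-1} on the two {3}-cycles. Then

\displaystyle \frac{|\hbox{Cl}(x)|}{d} \chi_\xi(x) = \frac{3}{2} \cdot 0 = 0 \hbox{ (transpositions)}, \qquad \frac{|\hbox{Cl}(x)|}{d} \chi_\xi(x) = \frac{2}{2} \cdot (-1) = -1 \hbox{ (3-cycles)},

both of which are (rational) integers, even though {|\hbox{Cl}(x)|/d = 3/2} is not an integer in the first case.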

This leads to an important corollary:

Corollary 6 (Dimension divides order) Let {G} be a finite group, and let {\xi: G \rightarrow U(V)} be an irreducible {d}-dimensional representation. Then {d} divides {|G|}.

Proof: As the character {\chi_\xi} has {L^2(G)^G} norm one, we have

\displaystyle  \sum_{x \in G} |\chi_\xi(x)|^2 = |G|.

Grouping the {x} summation by conjugacy classes {\hbox{Cl}(x)}, we can express the left-hand side as the sum of terms of the form {|\hbox{Cl}(x)| \chi_\xi(x) \overline{\chi_\xi(x)}}, each of which by the preceding lemma and discussion is equal to {d} times an algebraic integer. We conclude that {|G|/d} is an algebraic integer also; but it is rational, and so must be a rational integer also. \Box

In particular, if {G} is of odd order, then every irreducible representation of {G} has odd dimension. This has a further important consequence, due to Burnside:

Proposition 7 (Odd groups have non-real characters) Let {G} be a finite group of odd order, and let {\chi} be a non-principal irreducible character of {G}. Then {\chi} is not a real-valued character. In other words, {\overline{\chi} \neq \chi}.

Proof: Suppose for contradiction that {\chi} is real-valued. By (2) this implies that {\chi(x)=\chi(x^{-1})} for all {x \in G}.

As {G} has odd order, there are no elements in {G} of order {2}. Thus one can partition

\displaystyle  G = \{1_G\} \cup \bigcup_{g \in A} \{g,g^{-1}\} \ \ \ \ \ (7)

for some subset {A} of {G} of cardinality {(|G|-1)/2}. On the other hand, as {\chi} is non-principal, we have

\displaystyle  \sum_{x \in G} \chi(x) = |G| \langle \chi,1\rangle_{L^2(G)^G} = 0.

By (7) one has

\displaystyle  0 = \chi(1_G) + 2 \sum_{g \in A} \chi(g).

But {\chi(1_G)} is the dimension of {\chi}, which is odd by Corollary 6, so {\sum_{g \in A} \chi(g)} is a half-integer. But it is also an algebraic integer, giving the desired contradiction. \Box
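The simplest illustration of Proposition 7 is a cyclic group {{\bf Z}/n{\bf Z}}, whose irreducible characters are {x \mapsto e^{2\pi i k x/n}} for {k=0,\ldots,n-1}. The sketch below (a floating-point check with an assumed tolerance of {10^{-9}}, and illustrative function names) lists the non-principal characters that are real-valued: there are none when {n=9}, whereas for the even order {n=8} the character with {k=4} takes only the values {\pm 1}.

```python
import cmath

def cyclic_character(n, k):
    """Values of the character x -> exp(2*pi*i*k*x/n) on Z/nZ."""
    return [cmath.exp(2j * cmath.pi * k * x / n) for x in range(n)]

def is_real_valued(values, tol=1e-9):
    """Floating-point test that all values have negligible imaginary part."""
    return all(abs(v.imag) < tol for v in values)

def real_nonprincipal_characters(n):
    """Indices k in 1..n-1 whose character on Z/nZ is real-valued."""
    return [k for k in range(1, n) if is_real_valued(cyclic_character(n, k))]

odd_case = real_nonprincipal_characters(9)
even_case = real_nonprincipal_characters(8)
print(odd_case, even_case)
```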

— 4. Suzuki’s theorem —

We can now begin the proof of Suzuki’s theorem; we will basically use an arrangement of the proof of this theorem from the thesis of Wilcox. We begin with an easy reduction to the simple case:

Proposition 8 (Reduction to the simple case) Let {G} be a finite CA-group of odd order which is not simple. Suppose that all CA groups of smaller odd order than {G} are solvable. Then {G} is solvable also.

Proof: If {G} is not simple, it has a non-trivial proper normal subgroup {N}. This group is also of odd order and inherits the CA property from {G}, so by hypothesis {N} is solvable. If we let {A} be the last non-trivial group in the derived series of {N}, then {A} is a non-trivial abelian characteristic subgroup of {N}, and is thus also a normal subgroup of {G}. Let {C} be the centraliser of {A}; then {C} is also a normal subgroup of {G}, which is still non-trivial and abelian as {G} is a CA-group. Furthermore, {C} is maximal abelian (it is not contained in any larger abelian group).

To show that {G} is solvable, it then suffices to show that the quotient {G/C} is solvable. As this group has an odd order smaller than that of {G}, it suffices to show that {G/C} is a CA-group. Thus, if {x,y,z} are non-identity elements of {G/C} with {x,z} both commuting with {y}, we need to show that {x,z} commute with each other. Equivalently, if {a,b,c \in G \backslash C} are such that {a,c} both commute with {b} modulo {C}, then {a} commutes with {c} modulo {C}.

If we fix {b}, then {b} acts on {C} by conjugation. This action cannot fix any non-identity element {d} of {C}, else the centraliser of {d} would contain {b} as well as {C}, contradicting the maximal abelian nature of {C}. Thus the map {d \mapsto bdb^{-1}d^{-1}}, which is a homomorphism on the normal abelian group {C}, has trivial kernel and is thus an isomorphism. From this we see that if {a} commutes with {b} modulo {C}, then one can multiply {a} by an element of {C} (on the left or right) in order to make it commute with {b} exactly. Thus, without loss of generality, {a} and {c} both commute with {b} exactly, and so {a,c} commute exactly as well as {G} is a CA-group, giving the claim. \Box

In view of this proposition, we see that to prove Suzuki’s theorem it suffices to show that simple non-abelian CA-groups of odd order do not exist.

Observe that in a CA-group {G}, every non-identity element {x} of {G} is contained in a unique maximal abelian subgroup of {G}, namely the centraliser {C(x) := \{ g \in G: gx=xg\}} of {x}. Thus the maximal abelian subgroups of {G}, once one removes the identity, form a partition of {G \backslash \{1\}}. It is instructive to keep some examples in mind:

  • In the case of the {ax+b} group on a field {F}, the maximal abelian subgroups are the translation group {\{ x \mapsto x+b: b \in F \}} and the stabilisers {\hbox{Stab}(x_0) = \{ x \mapsto a(x-x_0)+x_0: a \in F^\times\}} of points {x_0 \in F}, where {F^\times} is the multiplicative group {F^\times := F \backslash \{0\}}.
  • In the case of the special linear group {SL_2(F_q)} with {q} a power of two, the maximal abelian groups are conjugates of the split torus

    \displaystyle  T := \{ \begin{pmatrix} t & 0 \\ 0 & t^{-1} \end{pmatrix}: t \in F_q^\times \},

    the non-split torus

    \displaystyle  T' := \{ \begin{pmatrix} a & b \\ bs & a+b \end{pmatrix}: a,b \in F_q; a^2+ab+b^2s = 1 \}

    (where {s \in F_q} is any quantity not of the form {x^2+x} for some {x \in F_q}) or the unipotent group

    \displaystyle  U := \{ \begin{pmatrix} 1 & x \\ 0 & 1 \end{pmatrix}: x \in F_q \}.
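The first example can be checked by brute force. The following is a minimal sketch, assuming {F = F_p} for a prime {p} and encoding the map {x \mapsto ax+b} as the pair `(a, b)`; the function names are my own and purely illustrative. It computes the centraliser of every non-identity element, confirms that each one is abelian, and collects the distinct centralisers, which should be the translation group together with one stabiliser per point.

```python
from itertools import product

def affine_group(p):
    """All maps x -> a*x + b over F_p (p prime), stored as pairs (a, b) with a != 0."""
    return [(a, b) for a, b in product(range(1, p), range(p))]

def compose(g, h, p):
    """Composition g after h: x -> a*(c*x + d) + b."""
    (a, b), (c, d) = g, h
    return (a * c % p, (a * d + b) % p)

def centraliser(x, G, p):
    """Set of elements of G that commute with x."""
    return frozenset(g for g in G if compose(g, x, p) == compose(x, g, p))

def is_abelian(S, p):
    """True if every pair of elements of S commutes."""
    return all(compose(g, h, p) == compose(h, g, p) for g in S for h in S)

def maximal_abelian_subgroups(p):
    """Distinct centralisers of non-identity elements; raises if one is non-abelian."""
    G = affine_group(p)
    identity = (1, 0)
    cents = {centraliser(x, G, p) for x in G if x != identity}
    if not all(is_abelian(C, p) for C in cents):
        raise ValueError("not a CA-group")
    return cents

cents = maximal_abelian_subgroups(7)
print(sorted(len(C) for C in cents))
# Partition check: the non-identity parts should account for all of G \ {1}.
print(sum(len(C) - 1 for C in cents), 7 * 6 - 1)
```

For {p=7} one expects seven stabilisers of order {6} and one translation group of order {7}, whose non-identity elements together exhaust the {41} non-identity elements of the group.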

As these examples show, while many of the maximal abelian subgroups may be conjugate to each other, there can certainly be several non-conjugate examples of maximal abelian subgroups. Let {H_1,\ldots,H_n} be a set consisting of one representative from each of these conjugacy classes; then we have the following analogue of the partition (5):

\displaystyle  G = \{1_G\} \cup \bigcup_{i=1}^n \bigcup_{g \in G/N(H_i)} (gH_ig^{-1} \backslash \{1_G\}), \ \ \ \ \ (8)

where {N(H_i) := \{ g \in G: gH_i = H_i g \}} is the normaliser of {H_i}. This partition turns out to be somewhat less favourable than (5), but one can still run analogues of the Frobenius argument, particularly in the case when {|G|} is odd (which forces many other related quantities, such as {|G/N(H_i)|}, to be odd also). I like to think of this decomposition by viewing {G} as a plane, {1_G} as the origin, and {\bigcup_{i=1}^n \bigcup_{g \in G/N(H_i)} (gH_ig^{-1} \backslash \{1_G\})} as sweeping out various sectors of this plane, with each conjugate {gH_i g^{-1}} of {H_i} being one of the rays in the sector. (This picture is an oversimplification; for instance, it does not accurately reflect the closure of {H_i} with respect to group inversion, but I still find it a useful picture to have in mind.)

We first give the analogue of (6). Taking cardinalities in (8) we obtain the class equation

\displaystyle  |G| = 1 + \sum_{i=1}^n \frac{|G|}{|N(H_i)|} (|H_i|-1)

for a CA-group, which we can rearrange as

\displaystyle  1 = \frac{1}{|G|} + \sum_{i=1}^n \frac{1}{|W_i|} - \frac{1}{|W_i| |H_i|} \ \ \ \ \ (9)

where {W_i} is the group {W_i := N(H_i) / H_i} (I like to think of this group as a sort of “Weyl group” associated to {H_i}). As a first approximation, the right hand side of (9) is close to {\sum_{i=1}^n 1/|W_i|}. Thus, if one can somehow prevent {|W_i|} from getting too small for too many values of {i}, one can hope to upper bound the right-hand side of (9) by something less than {1}, leading to the desired contradiction. (This strategy won’t quite work when {G} is very small – {|G| \leq 70} to be precise – but this case can be worked out by hand.) Thus, we will be looking for such things as lower bounds on {|W_i|} or upper bounds on {n}. The fact that {|G|} is odd will force the {|W_i|} and {|H_i|} to be odd as well, which will turn out to be useful in improving these bounds by doubling the power of the integrality gap. (We will also rely on the odd order of {G} in a number of other places, in particular using Proposition 7.)
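As a sanity check on (9), one can evaluate its right-hand side exactly on the two running examples. The sketch below is only illustrative: the pairs {(|H_i|, |W_i|)} fed into it are assumptions based on the standard normaliser orders (dihedral normalisers of the two tori and the Borel subgroup normalising {U} in {SL_2(F_q)} with {q \geq 4} a power of two; the translation group being normal and the point stabilisers self-normalising in the {ax+b} group over {F_p}), and are not derived in the text above.

```python
from fractions import Fraction

def class_equation_rhs(order_G, subgroups):
    """Right-hand side of (9); subgroups is a list of pairs (|H_i|, |W_i|)."""
    total = Fraction(1, order_G)
    for h, w in subgroups:
        total += Fraction(1, w) - Fraction(1, w * h)
    return total

def sl2_data(q):
    """(|G|, [(|H_i|, |W_i|)]) for SL_2(F_q), q = 2^k >= 4 (assumed normaliser orders)."""
    # split torus, non-split torus, unipotent group
    return q**3 - q, [(q - 1, 2), (q + 1, 2), (q, q - 1)]

def affine_data(p):
    """Same data for the ax+b group over F_p, p an odd prime (assumed normaliser orders)."""
    # translation group (normal), point stabiliser (self-normalising)
    return p * (p - 1), [(p, p - 1), (p - 1, 1)]

for q in (4, 8, 16):
    print("SL_2, q =", q, "->", class_equation_rhs(*sl2_data(q)))
for p in (3, 5, 7):
    print("ax+b, p =", p, "->", class_equation_rhs(*affine_data(p)))
```

Both families have even order, so they are of course not counterexamples to anything here; the point is only to see how the terms {1/|W_i| - 1/(|W_i||H_i|)} add up to exactly {1 - 1/|G|}, with small values of {|W_i|} contributing the most.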

To get started on this strategy, suppose first that {W_i} was trivial for some {i}, thus {N(H_i) = H_i}, and then all the {|G|/|H_i|} conjugates of {H_i} are distinct. This makes {G} a Frobenius group, and so by Theorem 2 there is a Frobenius kernel {K}, which is a normal subgroup of order {|G|/|H_i|}. If {G} is simple, then {K} has to be trivial, which makes {G=H_i} and so {G} is abelian. This gives Suzuki’s theorem in this case. Thus we may assume that {|W_i| \geq 2} for all {i}; as the {|W_i|} are all odd, we may improve this to

\displaystyle  |W_i| \geq 3 \ \ \ \ \ (10)

for all {i}. From this and (9) (and bounding {1/|G|} by one of the {1/|W_i| |H_i|}) we thus see that {n} is not too small:

\displaystyle  n>3. \ \ \ \ \ (11)

Next, we use the Sylow theorems to make the {|H_i|} pairwise coprime:

Lemma 9 For any distinct {i,j}, {|H_i|} and {|H_j|} are coprime.

Proof: Suppose for contradiction that {|H_i|} and {|H_j|} are divisible by a common prime {p}. Then {H_i} and {H_j} both contain groups of order {p}, and thus both non-trivially intersect a Sylow {p}-group. On the other hand, non-trivial Sylow {p}-groups have non-trivial centre (otherwise all conjugacy classes other than the identity would have order divisible by {p}, contradiction) and so must be abelian in a CA group. By further application of the CA property we thus conclude that {H_i} and {H_j} both contain a Sylow {p}-group. But all Sylow {p}-groups are conjugate, and so {H_i} non-trivially intersects a conjugate of {H_j}, contradicting (8). \Box

We remark that the above analysis also reveals that

\displaystyle  |G| = \prod_{i=1}^n |H_i| \ \ \ \ \ (12)

(because the order of a Sylow {p}-group is the largest power of {p} dividing {|G|}), although we will not need to rely on this fact here. (Actually we will barely use Lemma 9 itself; it is needed only to dispose of one technical case in the final analysis.) It is interesting though to see that classical techniques such as Sylow theorems are capable of demonstrating a number of facts about the various quantities appearing in the class equation (9), although without the additional control arising from character theory these facts appear to be insufficient in and of themselves to actually contradict that equation. As an example of (12) one can take the special linear group {G = SL_2(F_q)} with {q} a power of two, in which there are three maximal abelian groups {H_1,H_2,H_3} up to conjugacy (split torus, non-split torus, and unipotent group) of orders {q-1, q+1, q} respectively, with the entire group {G} being of order {q^3-q = (q-1)(q+1)q}.
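The {SL_2(F_q)} instance of Lemma 9 and (12) is easy to confirm numerically. Here is a small sketch (the helper names are my own) that checks, for a few powers of two, that the orders {q-1, q+1, q} are pairwise coprime and multiply to {q^3-q}.

```python
from itertools import combinations
from math import gcd, prod

def sl2_orders(q):
    """Orders q-1, q+1, q of the split torus, non-split torus and unipotent group."""
    return [q - 1, q + 1, q]

def check_factorisation(q):
    """True if the three orders are pairwise coprime and their product is q^3 - q."""
    orders = sl2_orders(q)
    coprime = all(gcd(a, b) == 1 for a, b in combinations(orders, 2))
    return coprime and prod(orders) == q**3 - q

for q in (4, 8, 16, 32):
    print(q, sl2_orders(q), check_factorisation(q))
```

The coprimality genuinely uses the hypothesis that {q} is even: for odd {q} the numbers {q-1} and {q+1} share the factor {2}, so the check fails, consistent with {SL_2(F_q)} only being claimed to be a CA-group when {q} is a power of two.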

We do not yet have a sufficiently strong upper bound on the right-hand side of (9), basically because we have no upper bound on the number {n} of conjugacy classes of maximal abelian subgroups (or sufficiently good lower bounds on the {|W_i|}). To get further bounds we have to return to character theory. The basic idea will be to construct generalised characters which have small {L^2(G)^G} norm but which take non-trivial rational integer values at many places, which when combined with the integrality gap will yield useful bounds on various quantities that appear in (9).

We turn to the details. Let {H_i} be one of the maximal abelian groups. Being abelian, the character theory of {H_i} is just Fourier analysis: {\hat H_i = \hbox{Hom}(H_i,S^1)} is the group of linear characters on {H_i} (i.e. the Pontryagin dual of {H_i}). (Indeed, from (1) and the observation that the class number of an abelian group is the same as its order, we see that all irreducible representations of an abelian group are one-dimensional.)

The normaliser {N(H_i)} acts on {H_i} by conjugation; as {H_i} is abelian, the action of {H_i} on itself is trivial, and so we obtain an action of {W_i = N(H_i)/H_i} on {H_i} also. Taking adjoints, we obtain an action of {W_i} on {\hat H_i} as well. Any non-trivial element of {W_i} cannot fix a non-trivial element {h_i} of {H_i}, as the centraliser of {h_i} would then contain an element outside of {H_i}, contradicting the CA-group nature of {G}. Taking adjoints, we conclude that a non-trivial element {v} of {W_i} cannot fix a non-trivial element {\xi_i} of {\hat H_i} either (otherwise the action of {v} minus the identity would be non-injective, hence non-surjective, on {\hat H_i}, so that the corresponding homomorphism on {H_i} is non-injective). Thus we see that the action of {W_i} foliates the non-identity elements {\hat H_i \backslash \{1\}} of {\hat H_i} into orbits of size {|W_i|}. Among other things, this implies that {|W_i|} divides {|H_i|-1}, and that the number {w_i} of such orbits is {(|H_i|-1)/|W_i|}. As {|H_i|, |W_i|} are both odd, {w_i} is even, and in particular

\displaystyle  w_i \geq 2 \ \ \ \ \ (13)

for all {i=1,\ldots,n}.
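To see the orbit count in a concrete (and purely hypothetical) model, suppose {H_i} is cyclic of prime order {h}, its characters are labelled by {k \in \{0,\ldots,h-1\}}, and {W_i} is a subgroup of the units of {{\bf Z}/h{\bf Z}} acting by multiplication on the labels. None of this is taken from the argument above; it is just the simplest situation in which the free action on non-trivial characters can be enumerated directly.

```python
def orbits_on_nontrivial_characters(h, W):
    """Orbits of multiplication by W (a subgroup of units mod h) on labels 1..h-1."""
    remaining = set(range(1, h))
    orbits = []
    while remaining:
        k = min(remaining)
        orbit = frozenset(a * k % h for a in W)
        orbits.append(orbit)
        remaining -= orbit
    return orbits

# h = 7 with W = {1, 2, 4} (the cubes mod 7), and h = 31 with W = {1, 5, 25}.
for h, W in ((7, (1, 2, 4)), (31, (1, 5, 25))):
    orbits = orbits_on_nontrivial_characters(h, W)
    sizes = {len(o) for o in orbits}
    print(h, len(W), sizes, len(orbits), (h - 1) // len(W))
```

In both cases every orbit has size {|W_i|}, and the number of orbits {(|H_i|-1)/|W_i|} is even, as it must be when {|H_i|} and {|W_i|} are both odd.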

Now let us make some generalised characters. Let {\xi_i, \xi'_i} be non-identity elements of {\hat H_i} which do not lie in the same {W_i}-orbit. Then {\xi_i-\xi'_i} is a generalised character of {H_i} that vanishes at the identity. Applying induction and (4), we see that

\displaystyle  \hbox{Ind}^G_{H_i} (\xi_i - \xi'_i) \ \ \ \ \ (14)

is a generalised character of {G} that is supported on the set

\displaystyle  \bigcup_{g \in G/N(H_i)} g H_i g^{-1} \backslash \{1_G \},

and whose restriction to {H_i} takes the form

\displaystyle  \sum_{w \in W_i} w\xi_i - w\xi'_i.

From the Plancherel identity (and the assumption that {\xi_i, \xi'_i} have disjoint {W_i}-orbits) we see that

\displaystyle  \sum_{x \in H_i} |\sum_{w \in W_i} w\xi_i(x) - w\xi'_i(x)|^2 = 2 |W_i| |H_i|

and so

\displaystyle  \sum_{x \in G} |\hbox{Ind}^G_{H_i} (\xi_i - \xi'_i)(x)|^2 = \frac{|G|}{|N(H_i)|} \times 2 |W_i| |H_i|;

since {|W_i| = |N(H_i)|/|H_i|}, we thus have

\displaystyle  \| \hbox{Ind}^G_{H_i} (\xi_i - \xi'_i) \|_{L^2(G)^G}^2 = 2.

We can then apply Lemma 4 and conclude the important fact that {\hbox{Ind}^G_{H_i} (\xi_i - \xi'_i)} is the difference of two distinct irreducible characters of {G}.
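The norm computation can be tested numerically on a small group of odd order. The sketch below makes several assumptions that are not in the text: it takes {G} to be the group of order {21} consisting of the maps {x \mapsto ax+b} on {F_7} with {a \in \{1,2,4\}}, takes {H} to be the translation subgroup, and uses the usual averaging formula {\frac{1}{|H|} \sum_{g \in G} \chi(g^{-1}xg)} (with {\chi} extended by zero outside {H}) for the induced class function, which I am assuming agrees with the convention used here. The characters labelled {1} and {3} lie in different orbits of the conjugation action.

```python
import cmath

P = 7
UNITS = (1, 2, 4)  # the cubes in F_7^x, a subgroup of order 3
G = [(a, b) for a in UNITS for b in range(P)]  # (a, b) encodes x -> a*x + b
H = [(1, b) for b in range(P)]                 # translation subgroup

def compose(g, h):
    """Composition g after h."""
    (a, b), (c, d) = g, h
    return (a * c % P, (a * d + b) % P)

def inverse(g):
    """Inverse of the affine map g."""
    a, b = g
    a_inv = pow(a, -1, P)
    return (a_inv, -a_inv * b % P)

def xi(k):
    """Linear character of H labelled by k."""
    return lambda h: cmath.exp(2j * cmath.pi * k * h[1] / P)

def induce(chi):
    """Induced class function on G, with chi extended by zero outside H."""
    H_set = set(H)
    def induced(x):
        total = 0
        for g in G:
            y = compose(compose(inverse(g), x), g)
            if y in H_set:
                total += chi(y)
        return total / len(H)
    return induced

def norm_squared(f):
    """Squared norm with the normalised counting measure on G."""
    return sum(abs(f(x)) ** 2 for x in G) / len(G)

def difference(h):
    return xi(1)(h) - xi(3)(h)

print(round(norm_squared(induce(difference)), 9))
```

This group is a Frobenius group rather than a simple one, so it only illustrates the norm identity, not the later steps; up to floating point error the printed value should be {2}.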

Let {\xi_{i,1},\ldots,\xi_{i,w_i}} be a set of representatives of all the {W_i} orbits of {\hat H_i \backslash \{1\}}. Then by the above discussion, we see that

\displaystyle  \hbox{Ind}^G_{H_i} \xi_{i,a} - \hbox{Ind}^G_{H_i} \xi_{i,b}

is the difference of two distinct irreducible characters of {G} whenever {a,b \in \{1,\ldots,w_i\}} are distinct. From the linear independence of the irreducible characters and some easy combinatorics (using (13)), we then see that we can find distinct irreducible characters {\xi_{i,1}^*,\ldots,\xi_{i,w_i}^*} of {G} and a sign {\epsilon_i \in \{-1,+1\}} such that

\displaystyle  \hbox{Ind}^G_{H_i} \xi_{i,a} - \hbox{Ind}^G_{H_i} \xi_{i,b} = \epsilon_i (\xi_{i,a}^* - \xi_{i,b}^*) \ \ \ \ \ (15)

for all {a,b \in \{1,\ldots,w_i\}}. For {w_i \geq 3}, the sign {\epsilon_i} and the characters {\xi_{i,a}^*} are unique. When {w_i=2}, there is a non-uniqueness: one then has the freedom to swap {\xi^*_{i,1},\xi^*_{i,2}} while reversing the sign of {\epsilon_i}. But the set of characters {\xi_{i,1}^*,\ldots,\xi^*_{i,w_i}} remains unique. We will call the {\xi^*_{i,1},\ldots,\xi^*_{i,w_i}} the exceptional characters associated to {H_i}.

Note that if {\sigma \in \hbox{Gal}(\overline{{\bf Q}}/{\bf Q})} is in the absolute Galois group of the rationals, then {\sigma} permutes the non-trivial linear characters of {H_i}. Applying {\sigma} to (15) and using the uniqueness of the set of exceptional characters, we conclude that {\sigma} also permutes the exceptional characters of {H_i}. On the other hand, from (15) we know that the exceptional characters of {H_i} all agree outside of {\bigcup_{g \in G/N(H_i)} (gH_ig^{-1} \backslash \{1_G\})}, and are thus fixed by the absolute Galois group in this region; in other words, they are rational outside of {\bigcup_{g \in G/N(H_i)} (gH_ig^{-1} \backslash \{1_G\})}. On the other hand, as mentioned in the previous section, characters always take the values of algebraic integers. We conclude that

Lemma 10 Any exceptional character for {H_i} takes rational integer values outside of {\bigcup_{g \in G/N(H_i)} (gH_ig^{-1} \backslash \{1_G\})}.

As remarked earlier, from (15) we see that {\xi_{i,a}^* - \xi_{i,b}^*} is supported on the set {\bigcup_{g \in G/N(H_i)} (gH_ig^{-1} \backslash \{1_G\})}. In particular, if {i} and {j} are distinct, and {a,b \in \{1,\ldots,w_i\}} and {c,d \in \{1,\ldots,w_j\}}, then {\xi_{i,a}^* - \xi_{i,b}^*} and {\xi_{j,c}^* - \xi_{j,d}^*} are orthogonal. From this and the orthonormality of irreducible characters, we conclude (again using (13)) that {\xi_{i,a}^*} and {\xi_{j,c}^*} are distinct. Thus we see that the total number of exceptional characters is {\sum_{i=1}^n w_i}; together with the trivial character, this gives {1+\sum_{i=1}^n w_i} distinct irreducible characters of {G}. On the other hand, observe that {H_i \backslash \{1_G\}}, and hence {\bigcup_{g \in G/N(H_i)} (gH_ig^{-1} \backslash \{1_G\})}, consists of {w_i} conjugacy classes, and so from (8) we see that the class number of {G} is also {1 + \sum_{i=1}^n w_i}. As these numbers match, we see that we have located all of the irreducible characters of {G}; thus every non-principal irreducible character of {G} is an exceptional character for some {H_i}.

Now that we have identified the irreducible characters of {G}, we can analyse other generalised characters in terms of them. We pick {i \in \{1,\ldots,n\}} and consider the generalised character

\displaystyle  \alpha_i = \hbox{Ind}^G_{H_i} 1 - \hbox{Ind}^G_{H_i} \xi_{i,1}.

As with (14), {\alpha_i} is supported on {\bigcup_{g \in G/N(H_i)} (gH_ig^{-1} \backslash \{1_G\})} and when restricted to {H_i}, is equal to

\displaystyle  |W_i| - \sum_{w \in W_i} w\xi_{i,1}. \ \ \ \ \ (16)

In particular, from Plancherel’s theorem we have

\displaystyle  \sum_{x \in G} |\alpha_i(x)|^2 = \frac{|G|}{|N(H_i)|} (|W_i|^2 + |W_i|) |H_i|

and hence (since {|N(H_i)| = |W_i| |H_i|})

\displaystyle  \| \alpha_i \|_{L^2(G)^G}^2 = |W_i| + 1. \ \ \ \ \ (17)

This is a bit too large of a norm to apply Lemma 4 again, but {\alpha_i} is still only of moderate size (recall our enemy when trying to contradict (9) is that the {|W_i|} are too small, too frequently), and we can nevertheless use the {L^2(G)^G} geometry of {\alpha_i} and the other known characters, together with the integrality gap, to limit how {\alpha_i} breaks up into irreducible components. Firstly, since the {w\xi_{i,1}} all have mean zero, we see that (16) sums to {|W_i| |H_i|} on {H_i}, and thus

\displaystyle  \langle \alpha_i, 1 \rangle_{L^2(G)^G} = \frac{1}{|G|} \frac{|G|}{|N(H_i)|} |W_i| |H_i| = 1.

Thus the Fourier coefficient of {\alpha_i} at the trivial representation is {1}. Next, we see that (16) is orthogonal to {v\xi_{i,a}-v\xi_{i,b}} for any {1 < a,b \leq w_i} and {v \in W_i}, which upon summing on the conjugacy classes of {H_i} and on {v} gives that

\displaystyle  \langle \alpha_i, \xi^*_{i,a} - \xi^*_{i,b} \rangle = 0.

Thus the Fourier coefficients of {\alpha_i} at the exceptional characters {\xi^*_{i,a}} for {a \neq 1} are all equal. Similarly, we have

\displaystyle  \sum_{x \in H_i} (|W_i| - \sum_{w \in W_i} w\xi_{i,1}(x)) \overline{(v\xi_{i,1}(x)-v\xi_{i,a}(x))} = - |H_i|

for any {1 < a \leq w_i} and {v \in W_i}, so from (15) we have

\displaystyle  \langle \alpha_i, \xi^*_{i,1} - \xi^*_{i,a} \rangle = - \epsilon_i,

and so the Fourier coefficient at {\xi^*_{i,1}} is {-\epsilon_i} plus the Fourier coefficients at all the other exceptional characters of {H_i}. Next, for {j} distinct from {i} and {a,b \in \{1,\ldots,w_j\}}, we see from (15) that the generalised character {\xi^*_{j,a}-\xi^*_{j,b}} is supported on {\bigcup_{g \in G/N(H_j)} (gH_jg^{-1} \backslash \{1_G\})}, which by (8) is disjoint from the support of {\alpha_i}, thus

\displaystyle  \langle \alpha_i, \xi^*_{j,a} - \xi^*_{j,b} \rangle = 0.

Thus all the Fourier coefficients of {\alpha_i} at exceptional characters of {H_j} are the same. We thus have obtained a decomposition of the form

\displaystyle  \alpha_i = 1 + a_i \sum_{k=1}^{w_i} \xi^*_{i,k} - \epsilon_i \xi^*_{i,1} + \sum_{j \neq i} c_{i,j} \sum_{k=1}^{w_j} \xi^*_{j,k} \ \ \ \ \ (18)

for some integers {a_i, c_{i,j}}.

Taking {L^2(G)^G} norms using the orthonormality of the irreducible characters, we conclude that

\displaystyle  |W_i|+1 = 1 + (w_i-1) a_i^2 + (a_i-\epsilon_i)^2 + \sum_{j \neq i} c_{i,j}^2 w_j.

Note that regardless of what {a_i} is, the quantity {(w_i-1) a_i^2 + (a_i-\epsilon_i)^2} is always at least one, thanks to (13). We thus obtain an upper bound on the {c_{i,j}}:

\displaystyle  \sum_{j \neq i} c_{i,j}^2 w_j \leq |W_i| - 1.

In particular, from (13) we see that there are not many {j} for which {c_{i,j}} is non-zero:

\displaystyle  |\{ j \neq i: c_{i,j} \neq 0 \}| \leq \frac{|W_i| - 1}{2}. \ \ \ \ \ (19)
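The one step above that is asserted without computation is the lower bound {(w_i-1) a_i^2 + (a_i-\epsilon_i)^2 \geq 1}. A quick exhaustive check over a finite window of parameters (the window sizes are arbitrary choices of mine) is consistent with this, with the minimum attained at {a_i = 0}, and also at {a_i = \epsilon_i} when {w_i = 2}.

```python
def exceptional_norm_term(w, a, eps):
    """The quantity (w_i - 1) a_i^2 + (a_i - eps_i)^2 from the norm identity."""
    return (w - 1) * a * a + (a - eps) ** 2

smallest = min(
    exceptional_norm_term(w, a, eps)
    for w in range(2, 41, 2)   # w_i is even and at least 2 by (13)
    for a in range(-50, 51)    # integer values of a_i, finite window only
    for eps in (-1, 1)
)
print(smallest)

# Parameter choices attaining the minimum when w_i = 2.
minimisers = [(a, eps) for a in range(-50, 51) for eps in (-1, 1)
              if exceptional_norm_term(2, a, eps) == smallest]
print(minimisers)
```

This is of course no substitute for the one-line proof (if {a_i = 0} the second term is {1}, and otherwise the first term is at least {w_i - 1 \geq 1}), but it guards against sign slips.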

This is progress towards our goal of bounding (9) (because it helps control {n}), except that we also need to deal with those {j} for which {c_{i,j}} is zero. For this, the generalised character {\alpha_i} will no longer be useful, but another character of small norm – namely, {\sum_{k=1}^{w_i} \xi^*_{i,k}} – will be available as a substitute.

We turn to the details. Let {j} be such that {c_{i,j}=0}. Then we return to (18) and conclude that

\displaystyle  0 = 1 + a_i \sum_{k=1}^{w_i} \xi^*_{i,k} - \epsilon_i \xi^*_{i,1}

on the set {\bigcup_{g \in G/N(H_j)} (gH_jg^{-1} \backslash \{1_G\})}. On this set, we know from Lemma 10 that {\sum_{k=1}^{w_i} \xi^*_{i,k}} is an integer. But furthermore, from Proposition 7 we know that the exceptional characters {\xi^*_{i,1},\ldots,\xi^*_{i,w_i}} come in conjugate pairs, so in fact Lemma 10 gives that {\sum_{k=1}^{w_i} \xi^*_{i,k}} is an even integer. We conclude that {\xi^*_{i,1}} is an odd integer on {\bigcup_{g \in G/N(H_j)} (gH_jg^{-1} \backslash \{1_G\})}, and in particular, has magnitude at least {1} on this set. As all the {\xi^*_{i,1},\ldots,\xi^*_{i,w_i}} agree outside of {\bigcup_{g \in G/N(H_i)} (gH_ig^{-1} \backslash \{1_G\})}, we conclude that

\displaystyle  |\sum_{k=1}^{w_i} \xi^*_{i,k}| \geq w_i

on {\bigcup_{g \in G/N(H_j)} (gH_jg^{-1} \backslash \{1_G\})}. On the other hand, from the orthonormality of the {\xi^*_{i,k}} we know that {\sum_{k=1}^{w_i} \xi^*_{i,k}} has an {L^2(G)^G} norm of {w_i^{1/2}}. Since each set {\bigcup_{g \in G/N(H_j)} (gH_jg^{-1} \backslash \{1_G\})} has cardinality {|G| (\frac{1}{|W_j|} - \frac{1}{|W_j| |H_j|})} (as was shown in the derivation of the class equation (9)), we conclude that

\displaystyle  \sum_{j \neq i: c_{i,j} = 0} \frac{1}{|W_j|} - \frac{1}{|W_j| |H_j|} \leq \frac{1}{w_i}. \ \ \ \ \ (20)

We now have enough bounds on the various terms in (9) to obtain the necessary contradiction to finish Suzuki’s theorem from an elementary (though admittedly ad hoc) analysis. It is convenient to order the subgroups {H_1,\ldots,H_n} so that

\displaystyle  |W_1| \leq |W_2| \leq \ldots \leq |W_n|. \ \ \ \ \ (21)

We then write the right-hand side of (9) as

\displaystyle  \frac{1}{|G|} + \frac{1}{|W_1|} - \frac{1}{|W_1| |H_1|}

\displaystyle  + \sum_{j \neq 1: c_{1,j}\neq 0} \frac{1}{|W_j|} - \frac{1}{|W_j| |H_j|}

\displaystyle  + \sum_{j \neq 1: c_{1,j} = 0} \frac{1}{|W_j|} - \frac{1}{|W_j| |H_j|}.

For the first summation, we crudely bound {\frac{1}{|W_j|} - \frac{1}{|W_j| |H_j|}} by {\frac{1}{|W_2|}} and use (19); for the second summation we use (20). We conclude from (9) that

\displaystyle  1 \leq \frac{1}{|G|} + \frac{1}{|W_1|} - \frac{1}{|W_1| |H_1|} + \frac{|W_1|-1}{2} \frac{1}{|W_2|} + \frac{1}{w_1} \ \ \ \ \ (22)

and thus (bounding {1/|G|} by {1/|N(H_1)| = 1/(|W_1| |H_1|)} and {\frac{1}{|W_2|}} by {\frac{1}{|W_1|}})

\displaystyle  \frac{1}{2} \leq \frac{1}{2|W_1|} + \frac{1}{w_1}.

Applying (10), we conclude that {w_1 \leq 3}, which forces {w_1=2} since {w_1} is even. Since {w_1 = (|H_1|-1)/|W_1|}, we conclude that {|H_1|=2|W_1|+1}. We use this to return to the bound (22) to obtain

\displaystyle  1 \leq \frac{1}{|G|} + \frac{1}{|W_1|} - \frac{1}{|W_1| (2|W_1|+1)} + \frac{|W_1|-1}{2} \frac{1}{|W_2|} + \frac{1}{2}

and hence

\displaystyle  \frac{1}{2} \leq \frac{1}{|G|} + \frac{2}{2|W_1|+1} + \frac{|W_1|-1}{2} \frac{1}{|W_2|}.

If {|W_2| \geq |W_1|+2}, then

\displaystyle  \frac{|W_1|-1}{2} \frac{1}{|W_2|} \leq \frac{1}{2} - \frac{3}{2|W_1|+4}

and so

\displaystyle  \frac{1}{|G|} \geq \frac{3}{2|W_1|+4} - \frac{2}{2|W_1|+1} = \frac{2|W_1|-5}{(2|W_1|+4)(2|W_1|+1)}

and so

\displaystyle  |G| \leq \frac{(2|W_1|+4)(2|W_1|+1)}{2|W_1|-5}. \ \ \ \ \ (23)

On the other hand

\displaystyle  |G| \geq |N(H_1)| = |W_1| (2|W_1|+1).

The two bounds are inconsistent for {|W_1| \geq 5}, so we have {|W_1|=3} from (10) (and the odd nature of {|W_1|}), which then gives the upper bound {|G| \leq 70} from (23). In this range, Suzuki’s theorem can be verified by classical computations for the odd non-abelian groups of order less than {70} (of which there are actually not that many); alternatively one can use (11), Lemma 9, and (12) to eliminate this case (as no odd number less than {70} has more than three prime factors).

So the only remaining case is when {w_1=2} and {|W_2|=|W_1|}. In this case we may interchange the indices {1} and {2} (which does not affect (21)); repeating the above arguments, we may thus also assume that {w_2=2}. Since {w_i = (|H_i|-1)/|W_i|}, we conclude that {|H_1|=|H_2|}. But this contradicts Lemma 9, and Suzuki’s theorem is proved.
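Since this final analysis is elementary but fiddly, it may be worth double-checking the arithmetic by machine. The following sketch (function names and search ranges are my own, and a finite search is of course not a proof) confirms three things: that {\frac{1}{2} \leq \frac{1}{2|W_1|} + \frac{1}{w_1}} with {|W_1| \geq 3} odd and {w_1} even only allows {w_1 = 2}; that the lower bound {|W_1|(2|W_1|+1)} is compatible with (23) only for {|W_1| = 3}, where (23) gives {70}; and that no odd number below {70} has more than three distinct prime factors.

```python
from fractions import Fraction

def forced_orbit_counts(max_W=199, max_w=200):
    """Even w_1 >= 2 with 1/2 <= 1/(2|W_1|) + 1/w_1 for some odd |W_1| >= 3."""
    return {w for W in range(3, max_W + 1, 2) for w in range(2, max_w + 1, 2)
            if Fraction(1, 2) <= Fraction(1, 2 * W) + Fraction(1, w)}

def upper_bound_23(W):
    """Right-hand side of (23), valid when 2|W_1| - 5 > 0."""
    return Fraction((2 * W + 4) * (2 * W + 1), 2 * W - 5)

def lower_bound(W):
    """|N(H_1)| = |W_1| (2|W_1| + 1) when w_1 = 2."""
    return W * (2 * W + 1)

def consistent_W(max_W=199):
    """Odd |W_1| >= 3 for which the lower bound does not exceed (23)."""
    return [W for W in range(3, max_W + 1, 2)
            if lower_bound(W) <= upper_bound_23(W)]

def distinct_prime_factors(n):
    """Number of distinct primes dividing n, by trial division."""
    count, p = 0, 2
    while p * p <= n:
        if n % p == 0:
            count += 1
            while n % p == 0:
                n //= p
        p += 1
    return count + (1 if n > 1 else 0)

print(forced_orbit_counts())
print(consistent_W(), upper_bound_23(3))
print(max(distinct_prime_factors(n) for n in range(3, 70, 2)))
```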