In the previous set of notes we saw how a representation-theoretic property of groups, namely Kazhdan’s property (T), could be used to demonstrate expansion in Cayley graphs. In this set of notes we discuss a different representation-theoretic property of groups, namely quasirandomness, which is also useful for demonstrating expansion in Cayley graphs, though in a somewhat different way from property (T). For instance, whereas property (T), being qualitative in nature, is only interesting for infinite groups such as {SL_d({\bf Z})} or {SL_d({\bf R})}, and only creates Cayley graphs after passing to a finite quotient, quasirandomness is a quantitative property which is directly applicable to finite groups, and can be used to deduce expansion in a Cayley graph provided that random walks in that graph are known to become sufficiently “flat” in a certain sense.

The definition of quasirandomness is easy enough to state:

Definition 1 (Quasirandom groups) Let {G} be a finite group, and let {D \geq 1}. We say that {G} is {D}-quasirandom if all non-trivial unitary representations {\rho: G \rightarrow U(H)} of {G} have dimension at least {D}. (Recall a representation is trivial if {\rho(g)} is the identity for all {g \in G}.)

Exercise 1 Let {G} be a finite group, and let {D \geq 1}. A unitary representation {\rho: G \rightarrow U(H)} is said to be irreducible if {H} has no {G}-invariant subspaces other than {\{0\}} and {H}. Show that {G} is {D}-quasirandom if and only if every non-trivial irreducible representation of {G} has dimension at least {D}.

Remark 1 The terminology “quasirandom group” was introduced explicitly (though with slightly different notational conventions) by Gowers in 2008 in his detailed study of the concept; the name arises because dense Cayley graphs in quasirandom groups are quasirandom graphs in the sense of Chung, Graham, and Wilson, as we shall see below. This property had already been used implicitly to construct expander graphs by Sarnak and Xue in 1991, and more recently by Gamburd in 2002 and by Bourgain and Gamburd in 2008. One can of course define quasirandomness for more general locally compact groups than the finite ones, but we will only need this concept in the finite case. (A paper of Kunze and Stein from 1960, for instance, exploits the quasirandomness properties of the locally compact group {SL_2({\bf R})} to obtain mixing estimates in that group.)

Quasirandomness behaves fairly well with respect to quotients and short exact sequences:

Exercise 2 Let {0 \rightarrow H \rightarrow G \rightarrow K \rightarrow 0} be a short exact sequence of finite groups {H,G,K}.

  • (i) If {G} is {D}-quasirandom, show that {K} is {D}-quasirandom also. (Equivalently: any quotient of a {D}-quasirandom finite group is again a {D}-quasirandom finite group.)
  • (ii) Conversely, if {H} and {K} are both {D}-quasirandom, show that {G} is {D}-quasirandom also. (In particular, the direct or semidirect product of two {D}-quasirandom finite groups is again a {D}-quasirandom finite group.)

Informally, we will call {G} quasirandom if it is {D}-quasirandom for some “large” {D}, though the precise meaning of “large” will depend on context. For applications to expansion in Cayley graphs, “large” will mean “{D \geq |G|^c} for some constant {c>0} independent of the size of {G}“, but other regimes of {D} are certainly of interest.

The way we have set things up, the trivial group {G = \{1\}} is infinitely quasirandom (i.e. it is {D}-quasirandom for every {D}). This is however a degenerate case and will not be discussed further here. In the non-trivial case, a finite group can only be quasirandom if it is large and has no large subgroups:

Exercise 3 Let {D \geq 1}, and let {G} be a finite {D}-quasirandom group.

  • (i) Show that if {G} is non-trivial, then {|G| \geq D+1}. (Hint: use the mean zero component {\tau\downharpoonright_{\ell^2(G)_0}} of the regular representation {\tau: G \rightarrow U(\ell^2(G))}.) In particular, non-trivial finite groups cannot be infinitely quasirandom.
  • (ii) Show that any proper subgroup {H} of {G} has index {[G:H] \geq D+1}. (Hint: use the mean zero component of the quasiregular representation.)

The following exercise shows that quasirandom groups have to be quite non-abelian, and in particular perfect:

Exercise 4 (Quasirandomness, abelianness, and perfection) Let {G} be a finite group.

  • (i) If {G} is abelian and non-trivial, show that {G} is not {2}-quasirandom. (Hint: use Fourier analysis or the classification of finite abelian groups.)
  • (ii) Show that {G} is {2}-quasirandom if and only if it is perfect, i.e. the commutator subgroup {[G,G]} is equal to {G}. (Equivalently, {G} is {2}-quasirandom if and only if it has no non-trivial abelian quotients.)
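As a concrete illustration of part (ii), here is a brute-force verification (a sketch with our own helper names, not part of the exercise) that {A_5} equals its own commutator subgroup, and is hence {2}-quasirandom:

```python
from itertools import permutations

def compose(p, q):               # (p∘q)(i) = p[q[i]]
    return tuple(p[i] for i in q)

def inverse(p):
    r = [0] * len(p)
    for i, v in enumerate(p):
        r[v] = i
    return tuple(r)

def is_even(p):
    # even permutations have an even number of inversions
    inversions = sum(p[i] > p[j] for i in range(len(p)) for j in range(i + 1, len(p)))
    return inversions % 2 == 0

A5 = [p for p in permutations(range(5)) if is_even(p)]
assert len(A5) == 60

# all commutators [g,h] = g h g^{-1} h^{-1}
comms = {compose(compose(g, h), compose(inverse(g), inverse(h)))
         for g in A5 for h in A5}

# closure of the commutators under composition
subgroup = set(comms)
frontier = list(subgroup)
while frontier:
    new = []
    for x in frontier:
        for c in comms:
            y = compose(x, c)
            if y not in subgroup:
                subgroup.add(y)
                new.append(y)
    frontier = new

assert len(subgroup) == 60       # [A_5, A_5] = A_5, so A_5 is perfect
```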

Later on we shall see that there is a converse to the above two exercises; any non-trivial perfect finite group with no large subgroups will be quasirandom.

Exercise 5 Let {G} be a finite {D}-quasirandom group. Show that for any subgroup {G'} of {G}, {G'} is {D/[G:G']}-quasirandom, where {[G:G'] := |G|/|G'|} is the index of {G'} in {G}. (Hint: use induced representations.)

Now we give an example of a more quasirandom group.

Lemma 2 (Frobenius lemma) If {F_p} is a field of some prime order {p}, then {SL_2(F_p)} is {\frac{p-1}{2}}-quasirandom.

This should be compared with the cardinality {|SL_2(F_p)|} of the special linear group, which is easily computed to be {(p^2-1) \times p = p^3 - p}.
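As a quick sanity check of this cardinality formula, one can count the matrices by brute force for small {p} (the helper name `sl2_order` is ours):

```python
# Brute-force check that |SL_2(F_p)| = p^3 - p, by enumerating all 2x2
# matrices over F_p and counting those of determinant 1.
from itertools import product

def sl2_order(p):
    """Count 2x2 matrices over F_p with determinant 1."""
    return sum(1 for a, b, c, d in product(range(p), repeat=4)
               if (a * d - b * c) % p == 1)

for p in [2, 3, 5, 7]:
    assert sl2_order(p) == p**3 - p
```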

Proof: We may of course take {p} to be odd. Suppose for contradiction that we have a non-trivial representation {\rho: SL_2(F_p) \rightarrow U_d({\bf C})} on a unitary group of some dimension {d} with {d < \frac{p-1}{2}}. Set {a} to be the group element

\displaystyle  a := \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix},

and suppose first that {\rho(a)} is non-trivial. Since {a^p=1}, we have {\rho(a)^p=1}; thus all the eigenvalues of {\rho(a)} are {p^{th}} roots of unity. On the other hand, by conjugating {a} by diagonal matrices in {SL_2(F_p)}, we see that {a} is conjugate to {a^m} (and hence {\rho(a)} conjugate to {\rho(a)^m}) whenever {m} is a quadratic residue mod {p}. As such, the eigenvalues of {\rho(a)} must be permuted by the operation {x \mapsto x^m} for any quadratic residue {m} mod {p}. Since {\rho(a)} has at least one non-trivial eigenvalue {\zeta}, and the powers {\zeta^m} of such an eigenvalue are distinct as {m} ranges over the {\frac{p-1}{2}} distinct quadratic residues (as {\zeta} is a non-trivial {p^{th}} root of unity), we conclude that {\rho(a)} has at least {\frac{p-1}{2}} distinct eigenvalues. But {\rho(a)} is a {d \times d} matrix with {d < \frac{p-1}{2}}, a contradiction. Thus {a} lies in the kernel of {\rho}. By conjugation, we then see that this kernel contains all unipotent matrices. But these matrices generate {SL_2(F_p)} (see exercise below), and so {\rho} is trivial, a contradiction. \Box
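The key conjugation identity behind this argument, namely that conjugating {a} by {\hbox{diag}(t,t^{-1})} yields {a^{t^2}}, can be verified numerically for a small prime; the following sketch (with our own helper names) checks it for {p=11}:

```python
# Verify: diag(t, t^{-1}) a diag(t, t^{-1})^{-1} = a^{t^2} in SL_2(F_p),
# so a = [[1,1],[0,1]] is conjugate to a^m for every quadratic residue m.
def matmul(A, B, p):
    return [[(A[i][0] * B[0][j] + A[i][1] * B[1][j]) % p for j in range(2)]
            for i in range(2)]

def matpow(A, n, p):
    R = [[1, 0], [0, 1]]
    for _ in range(n):
        R = matmul(R, A, p)
    return R

p = 11
a = [[1, 1], [0, 1]]
for t in range(1, p):
    tinv = pow(t, -1, p)                    # modular inverse (Python 3.8+)
    D = [[t, 0], [0, tinv]]
    Dinv = [[tinv, 0], [0, t]]
    assert matmul(matmul(D, a, p), Dinv, p) == matpow(a, t * t % p, p)
```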

Exercise 6 Show that for any prime {p}, the unipotent matrices

\displaystyle  \begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ t & 1 \end{pmatrix}

for {t} ranging over {F_p} generate {SL_2(F_p)} as a group.
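For small primes this exercise can be confirmed by a brute-force breadth-first closure of the unipotent generators (a sketch; the helper `generated_subgroup` is ours):

```python
# Closure of the unipotent generators under multiplication recovers all
# p^3 - p elements of SL_2(F_p), for small p.
def generated_subgroup(gens, p):
    def mul(A, B):
        return tuple(tuple((A[i][0] * B[0][j] + A[i][1] * B[1][j]) % p
                           for j in range(2)) for i in range(2))
    seen = {((1, 0), (0, 1))}               # start from the identity
    frontier = list(seen)
    while frontier:
        nxt = []
        for g in frontier:
            for s in gens:
                h = mul(g, s)
                if h not in seen:
                    seen.add(h)
                    nxt.append(h)
        frontier = nxt
    return seen

for p in [2, 3, 5]:
    gens = ([((1, t), (0, 1)) for t in range(1, p)] +
            [((1, 0), (t, 1)) for t in range(1, p)])
    assert len(generated_subgroup(gens, p)) == p**3 - p
```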

Exercise 7 Let {G} be a finite group, and let {D \geq 1}. If {G} is generated by a collection {G_1,\ldots,G_k} of {D}-quasirandom subgroups, show that {G} is itself {D}-quasirandom.

Exercise 8 Show that {SL_d(F_p)} is {\frac{p-1}{2}}-quasirandom for any {d \geq 2} and any prime {p}. (This is not sharp; the optimal bound here is {\gg_d p^{d-1}}, which follows from the results of Landazuri and Seitz.)

As a corollary of the above results and Exercise 2, we see that the projective special linear group {PSL_d(F_p)} is also {\frac{p-1}{2}}-quasirandom.

Remark 2 One can ask whether the bound {\frac{p-1}{2}} in Lemma 2 is sharp, assuming of course that {p} is odd. Noting that {SL_2(F_p)} acts linearly on the plane {F_p^2}, we see that it also acts projectively on the projective line {PF_p^1 := (F_p^2 \backslash \{0\}) / F_p^\times}, which has {p+1} elements. Thus {SL_2(F_p)} acts via the quasiregular representation on the {p+1}-dimensional space {\ell^2(PF_p^1)}, and also on the {p}-dimensional subspace {\ell^2(PF_p^1)_0}; this latter representation (known as the Steinberg representation) is irreducible. This shows that the {\frac{p-1}{2}} bound cannot be improved beyond {p}. More generally, given any character {\chi: F_p^\times \rightarrow S^1}, {SL_2(F_p)} acts on the {p+1}-dimensional space {V_\chi} of functions {f \in \ell^2( F_p^2 \backslash \{0\} )} that obey the twisted dilation invariance {f(tx) = \chi(t) f(x)} for all {t \in F_p^\times} and {x \in F_p^2 \backslash \{0\}}; these are known as the principal series representations. When {\chi} is the trivial character, this is the quasiregular representation discussed earlier. For most other characters, this is an irreducible representation, but it turns out that when {\chi} is the quadratic character (thus taking values in {\{-1,+1\}} while being non-trivial), the principal series representation splits into the direct sum of two {\frac{p+1}{2}}-dimensional representations, which comes very close to matching the bound in Lemma 2. There is a parallel series of representations to the principal series (known as the discrete series) which is more complicated to describe (roughly speaking, one has to embed {F_p} in a quadratic extension {F_{p^2}} and then use a rotated version of the above construction, to change a split torus into a non-split torus), but can generate irreducible representations of dimension {\frac{p-1}{2}}, showing that the bound in Lemma 2 is in fact exactly sharp. 
These constructions can be generalised to arbitrary finite groups of Lie type using Deligne-Lusztig theory, but this is beyond the scope of this course (and of my own knowledge in the subject).

Exercise 9 Let {p} be an odd prime. Show that for any {n \geq p+2}, the alternating group {A_n} is {p-1}-quasirandom. (Hint: show that all cycles of order {p} in {A_n} are conjugate to each other in {A_n} (and not just in {S_n}); in particular, a cycle is conjugate to its {j^{th}} power for all {j=1,\ldots,p-1}. Also, as {n \geq 5}, {A_n} is simple, and so the cycles of order {p} generate the entire group.)

Remark 3 By using more precise information on the representations of the alternating group (using the theory of Specht modules and Young tableaux), one can show the slightly sharper statement that {A_n} is {n-1}-quasirandom for {n \geq 6} (but is only {3}-quasirandom for {n=5} due to icosahedral symmetry, and {1}-quasirandom for {n \leq 4} due to lack of perfectness). Using Exercise 3 with the index {n} subgroup {A_{n-1}}, we see that the bound {n-1} cannot be improved. Thus, {A_n} (for large {n}) is not as quasirandom as the special linear groups {SL_d(F_p)} (for {p} large and {d} bounded), because in the latter case the quasirandomness is as strong as a power of the size of the group, whereas in the former case it is only logarithmic in size.

If one replaces the alternating group {A_n} with the slightly larger symmetric group {S_n}, then quasirandomness is destroyed (since {S_n}, having the abelian quotient {S_n/A_n}, is not perfect); indeed, {S_n} is {1}-quasirandom and no better.

Remark 4 Thanks to the monumental achievement of the classification of finite simple groups, we know that apart from a finite number (26, to be precise) of sporadic exceptions, all finite simple groups (up to isomorphism) are either a cyclic group {{\bf Z}/p{\bf Z}}, an alternating group {A_n}, or a finite simple group of Lie type such as {PSL_d(F_p)}. (We will define the concept of a finite simple group of Lie type more precisely in later notes, but suffice to say for now that such groups are constructed from reductive algebraic groups, for instance {PSL_d(F_p)} is constructed from {SL_d} in characteristic {p}.) In the case of finite simple groups {G} of Lie type with bounded rank {r=O(1)}, it is known from the work of Landazuri and Seitz that such groups are {\gg |G|^c}-quasirandom for some {c>0} depending only on the rank. On the other hand, by the previous remark, the large alternating groups do not have this property, and one can show that the finite simple groups of Lie type with large rank also do not have this property. Thus, we see using the classification that if a finite simple group {G} is {|G|^c}-quasirandom for some {c>0} and {|G|} is sufficiently large depending on {c}, then {G} is a finite simple group of Lie type with rank {O_c(1)}. It would be of interest to see whether there is an alternate way to establish this fact that does not rely on the classification, as it may lead to an alternate approach to proving the classification (or perhaps a weakened version thereof).

A key reason why quasirandomness is desirable for the purposes of demonstrating expansion is that quasirandom groups happen to be rapidly mixing at large scales, as we shall see below the fold. As such, quasirandomness is an important tool for demonstrating expansion in Cayley graphs, though because expansion is a phenomenon that must hold at all scales, one needs to supplement quasirandomness with some additional input that creates mixing at small or medium scales also before one can deduce expansion. As an example of this technique of combining quasirandomness with mixing at small and medium scales, we present a proof (due to Sarnak-Xue, and simplified by Gamburd) of a weak version of the famous “3/16 theorem” of Selberg on the least non-trivial eigenvalue of the Laplacian on a modular curve, which among other things can be used to construct a family of expander Cayley graphs in {SL_2({\bf Z}/N{\bf Z})} (compare this with the property (T)-based methods in the previous notes, which could construct expander Cayley graphs in {SL_d({\bf Z}/N{\bf Z})} for any fixed {d \geq 3}).

— 1. Mixing in quasirandom groups —

Let {G} be a finite group. Given two functions {f, g \in \ell^2(G)}, we can define the convolution {f*g \in \ell^2(G)} by the formula

\displaystyle  f*g(x) := \sum_{y \in G} f(y) g(y^{-1} x) = \sum_{y \in G} f(x y^{-1}) g(y).

This operation is bilinear and associative, but is not commutative unless {G} is abelian. From the Cauchy-Schwarz inequality one has

\displaystyle  \|f*g\|_{\ell^\infty(G)} \leq \|f\|_{\ell^2(G)} \|g\|_{\ell^2(G)}

and hence

\displaystyle  \|f*g\|_{\ell^2(G)} \leq |G|^{1/2} \|f\|_{\ell^2(G)} \|g\|_{\ell^2(G)}.

This inequality is sharp in the sense that if we set {f} and {g} to both be constant-valued, then the left-hand side and right-hand side match. For abelian groups, one can also see that the inequality is sharp when {f} and {g} are multiples of the same character.
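Both bounds, and the sharpness of the {\ell^2} bound for constant functions, can be checked numerically on the cyclic group {{\bf Z}/8{\bf Z}} (a minimal sketch in additive notation; helper names are ours):

```python
# On Z/8: check ||f*g||_inf <= ||f||_2 ||g||_2 and that constant f, g
# make ||f*g||_2 = |G|^{1/2} ||f||_2 ||g||_2 an exact equality.
import math

n = 8
f = [1.0] * n                   # constant function on Z/8
g = [1.0] * n

def convolve(f, g):
    # f*g(x) = sum_y f(y) g(x - y), in additive notation for Z/n
    return [sum(f[y] * g[(x - y) % n] for y in range(n)) for x in range(n)]

l2 = lambda v: math.sqrt(sum(t * t for t in v))
linf = lambda v: max(abs(t) for t in v)

h = convolve(f, g)
assert linf(h) <= l2(f) * l2(g) + 1e-9
assert abs(l2(h) - math.sqrt(n) * l2(f) * l2(g)) < 1e-9   # equality for constants
```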

It turns out, though, that if one restricts one of {f} or {g} (or both) to be of mean zero, and {G} is quasirandom, then one can improve this inequality, which first appeared explicitly in the work of Babai, Nikolov, and Pyber:

Proposition 3 (Mixing inequality) Let {G} be a finite {D}-quasirandom group, and let {f, g \in \ell^2(G)}. If at least one of {f, g} has mean zero, then

\displaystyle  \|f*g\|_{\ell^2(G)} \leq D^{-1/2} |G|^{1/2} \|f\|_{\ell^2(G)} \|g\|_{\ell^2(G)}.

Proof: By subtracting a constant from {f} or {g} if necessary, we may assume that {f} and {g} both have mean zero.

Observe that {f*g} (being a superposition of right-translates of {f}) also has mean zero. Thus, we see that we may define an operator {T_g: \ell^2(G)_0 \rightarrow \ell^2(G)_0} by setting {T_g f := f*g}. It thus suffices to show that the operator norm of {T_g} is at most {D^{-1/2} |G|^{1/2} \|g\|_{\ell^2(G)}}.

Fix {g}. We can view {T_g} as a {(|G|-1) \times (|G|-1)} matrix. We apply the singular value decomposition to this matrix to obtain singular values

\displaystyle  \sigma_1 \geq \ldots \geq \sigma_{|G|-1} \geq 0

of {T_g}, together with associated singular vectors. The operator norm of {T_g} is the largest singular value {\sigma_1}. The operator {T_g^* T_g} is then a self-adjoint operator (or matrix) with eigenvalues {\sigma_1^2, \ldots, \sigma_{|G|-1}^2}. In particular, we have

\displaystyle  \hbox{tr} T_g^* T_g = \sigma_1^2 + \ldots + \sigma_{|G|-1}^2.

Now, a short computation shows that {T_g^* T_g f = f * g * \tilde g}, where {\tilde g(x) := \overline{g(x^{-1})}}, and (by embedding {\ell^2(G)_0} in {\ell^2(G)}, and noting that {T_g^* T_g} annihilates constants) the trace can be computed as

\displaystyle  \hbox{tr} T_g^* T_g = |G|\, g * \tilde g(1) = |G| \|g\|_{\ell^2(G)}^2.

Thus, if {V} is the eigenspace of {T_g^* T_g} corresponding to the eigenvalue {\sigma_1^2} (so that the dimension of {V} is the multiplicity of {\sigma_1}), we have

\displaystyle  \hbox{dim}(V) \sigma_1^2 \leq |G| \|g\|_{\ell^2(G)}^2.

Now observe that if {\tau: G \rightarrow U(\ell^2(G)_0)} is the left-regular representation (restricted to {\ell^2(G)_0}) then

\displaystyle  T_g^* T_g \tau(h) f = \tau(h) T_g^* T_g f

for any {f \in \ell^2(G)_0} and {h \in G} (this is a special case of the associativity of convolution). In particular, we see that {V} is invariant under {\tau}. Since {\tau} has no non-trivial invariant vectors in {\ell^2(G)_0}, we conclude from quasirandomness that {V} has dimension at least {D}, and the claim follows. \Box
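One can test Proposition 3 numerically: the sketch below builds {G = SL_2(F_5)} (which is {2}-quasirandom by Lemma 2), convolves a random mean-zero {f} with a random {g}, and checks the claimed bound. The helper names are our own.

```python
# Numerical check of ||f*g||_2 <= D^{-1/2} |G|^{1/2} ||f||_2 ||g||_2
# on G = SL_2(F_5), with D = 2 and f of mean zero.
import random
from itertools import product

p, D = 5, 2
G = [m for m in product(range(p), repeat=4) if (m[0]*m[3] - m[1]*m[2]) % p == 1]
index = {g: i for i, g in enumerate(G)}
n = len(G)                                  # 120

def mul(x, y):
    a, b, c, d = x
    e, f, g, h = y
    return ((a*e + b*g) % p, (a*f + b*h) % p, (c*e + d*g) % p, (c*f + d*h) % p)

def inv(x):
    a, b, c, d = x
    return (d, (-b) % p, (-c) % p, a)       # adjugate = inverse when det = 1

random.seed(0)
f = [random.gauss(0, 1) for _ in range(n)]
mean = sum(f) / n
f = [v - mean for v in f]                   # force f to have mean zero
g = [random.gauss(0, 1) for _ in range(n)]

# convolution f*g(x) = sum_y f(y) g(y^{-1} x)
conv = [0.0] * n
for iy, y in enumerate(G):
    yi = inv(y)
    fy = f[iy]
    for ix, x in enumerate(G):
        conv[ix] += fy * g[index[mul(yi, x)]]

l2 = lambda v: sum(t * t for t in v) ** 0.5
assert l2(conv) <= D ** -0.5 * n ** 0.5 * l2(f) * l2(g) + 1e-9
```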

Remark 5 One can also establish the above inequality using the nonabelian Fourier transform (which is based on the Peter-Weyl theorem combined with Schur’s lemma, and is developed in this blog post); we leave this as an exercise for the interested reader.

Exercise 10 Let {A, B, C} be subsets of a finite {D}-quasirandom group {G}. Show that

\displaystyle  \| 1_A * 1_B - \frac{|A| |B|}{|G|} \|_{\ell^2(G)} \leq D^{-1/2} |G|^{1/2} |A|^{1/2} |B|^{1/2}

and

\displaystyle  \| 1_A * 1_B * 1_C - \frac{|A| |B| |C|}{|G|} \|_{\ell^\infty(G)} \leq D^{-1/2} |G|^{1/2} |A|^{1/2} |B|^{1/2} |C|^{1/2}.

Conclude in particular that

\displaystyle  |AB| \geq |G| - \frac{|G|^3}{D|A| |B|},

and that {ABC=G} whenever {|A| |B| |C| > |G|^3/D}. (The bounds here are not quite sharp, but are simpler than the optimal bounds, and suffice for most applications.)
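The last conclusion can be checked on a random instance: in {G = SL_2(F_5)} with {D = 2}, sets of size {100} satisfy {|A||B||C| = 10^6 > |G|^3/D = 864000}, so the exercise forces {ABC = G}. A brute-force sketch (helper names ours):

```python
# ABC = G for random subsets A, B, C of SL_2(F_5) of size 100, since
# 100^3 > 120^3 / 2 and SL_2(F_5) is 2-quasirandom.
import random
from itertools import product

p = 5
G = [m for m in product(range(p), repeat=4) if (m[0]*m[3] - m[1]*m[2]) % p == 1]

def mul(x, y):
    a, b, c, d = x
    e, f, g, h = y
    return ((a*e + b*g) % p, (a*f + b*h) % p, (c*e + d*g) % p, (c*f + d*h) % p)

random.seed(0)
size = 100
A = random.sample(G, size)
B = random.sample(G, size)
C = random.sample(G, size)

AB = {mul(a, b) for a in A for b in B}
ABC = {mul(ab, c) for ab in AB for c in C}
assert len(ABC) == len(G) == p**3 - p
```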

Thus, for instance, if {A} is a subset of a finite {D}-quasirandom group {G} of density {|A|/|G|} more than {D^{-1/3}}, then {A^2} will be most of {G} (with fewer than {|A|} elements omitted), and {A^3} will be all of {G}; thus large subsets of a quasirandom group rapidly expand to fill out the whole group. In the converse direction, we have

Exercise 11 Let {D \geq 1}, and let {G} be a finite group which is not {D}-quasirandom. Show that there exists a subset {A} of {G} with {|A|/|G| \geq C^{-D}} for some absolute constant {C > 1}, such that {A^3 \subsetneq G}. (Hint: by hypothesis, one has a non-trivial unitary representation {\rho: G \rightarrow U(H)} of dimension at most {D}. Show that {\rho(G)} contains an element at distance {\gg 1} from the identity in operator norm, and take {A} to be the preimage of a suitable ball around the identity in the operator norm.)

One can improve this result by using a quantitative form of Jordan’s theorem; see the next section.

Exercise 12 (Mixing inequality for actions) Let {G} be a finite {D}-quasirandom group acting (on the left) on a discrete set {X}. Given functions {f \in \ell^2(G)} and {g \in\ell^2(X)}, one can define the convolution {f*g \in \ell^2(X)} in much the same way as before:

\displaystyle  f*g(x):=\sum_{h \in G} f(h) g(h^{-1} x).

Show that

\displaystyle  \|f*g\|_{\ell^2(X)} \leq D^{-1/2} |G|^{1/2} \|f\|_{\ell^2(G)} \|g\|_{\ell^2(X)}

whenever {f} has mean zero, or whenever {g} has mean zero on every orbit of {G}.

One can use quasirandomness to show that Cayley graphs of very large degree {k} in a quasirandom group are expanders:

Exercise 13 Let {Cay(G,S)} be a {k}-regular Cayley graph in a finite {D}-quasirandom group {G} on {n} vertices.

  • (i) If {A}, {B} are subsets of {G}, show that

    \displaystyle  \left| E(A,B) - \frac{k}{n} |A||B| \right| \leq \sqrt{ \frac{kn |A||B|}{D} }

    (compare with the expander mixing lemma, Exercise 9 from Notes 1).

  • (ii) Show that {Cay(G,S)} is a two-sided {\epsilon}-expander whenever

    \displaystyle  \epsilon \leq 1 - \frac{1}{n} - \sqrt{\frac{n}{Dk}}.

Unfortunately, the above result is only non-trivial in the regime {k \gg n/D}, whereas for our applications we are interested instead in the regime when {k=O(1)}. We record a tool for this purpose.

Proposition 4 (Using quasirandomness to demonstrate expansion) Let {Cay(G,S)} be a {k}-regular Cayley graph in a finite group {G}. Assume the following:

  • (i) (Quasirandomness) {G} is {c |G|^\alpha}-quasirandom for some {c,\alpha>0}.
  • (ii) (Flattening of random walk) One has

    \displaystyle  \| \mu^{*n} \|_{\ell^2(G)} \leq C |G|^{-1/2+\beta} \ \ \ \ \ (1)

    for some {C, \beta, n > 0} with {\beta < \alpha/2} and {n \leq C \log |G|}, where {\mu := \frac{1}{|S|} \sum_{s \in S} \delta_s} and {\mu^{*n}} is the {n}-fold convolution of {\mu}.

Then {Cay(G,S)} is a two-sided {\epsilon}-expander for some {\epsilon>0} depending only on {c,\alpha,C,\beta,k}, if {|G|} is sufficiently large depending on these quantities. If we replace {\mu} by {\nu := \frac{1}{2}\delta_1 + \frac{1}{2} \mu} in the flattening hypothesis, then {Cay(G,S)} is a one-sided {\epsilon}-expander instead.

Proof: We allow implied constants to depend on {c, \alpha, C, \beta}. We will just prove the first claim, as the second claim is similar. By Exercise 5 from Notes 2, it will suffice to show that

\displaystyle  \| \mu^{*m} - \frac{1}{|G|} \|_{\ell^2(G)} \ll |G|^{-1}

(say) for some {m = O(\log |G|)}. But from Proposition 3 (with {f := \mu^{*m}-\frac{1}{|G|}} and {g := \mu^{*n}}) and the hypotheses we have

\displaystyle  \| \mu^{*(m+n)} - \frac{1}{|G|} \|_{\ell^2(G)} \ll |G|^{-\alpha/2+\beta} \| \mu^{*m} - \frac{1}{|G|} \|_{\ell^2(G)}

for any {m \geq 0}. Iterating this (starting from, say, {m=n}, and advancing {m} in steps of {n} a total of {O(1)} times) we obtain the claim. \Box

Exercise 14 Obtain an alternate proof of the above result that proceeds directly from the spectral decomposition of the adjacency operator {Af := f * \mu} into eigenvalues and eigenvectors and quasirandomness, rather than through Exercise 5 from Notes 2 and Proposition 3. (This alternate approach is closer in spirit to the arguments of Sarnak-Xue and Bourgain-Gamburd, though the two approaches are largely equivalent in the final analysis.)

Informally, the flattening hypothesis in Proposition 4 asserts that by time {O(\log |G|)}, the random walk has expanded to the point where it is covering a large portion of the group {G} (roughly speaking, it is spread out over a set of size at least {|G|^{1-2\beta}}). The point is that the scale of this set is large enough for the quasirandomness properties of the group {G} to then mix the random walk rapidly towards the uniform distribution. However, this proposition provides no tools with which to prove this flattening property; this task will be a focus of subsequent notes.
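The flattening quantity {\| \mu^{*n} \|_{\ell^2(G)}} can be watched directly on a small example: the sketch below runs the random walk on {SL_2(F_5)} with the symmetric generating set {\{a, a^{-1}, b, b^{-1}\}}, where {a = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}} and {b = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}}, and observes the {\ell^2} norm decay towards the flat value {|G|^{-1/2} \approx 0.091}. (The generating set and step count are our own choices for illustration.)

```python
# Track ||mu^{*n}||_2 for the random walk on Cay(SL_2(F_5), {a, a^-1, b, b^-1}).
from itertools import product

p = 5
G = [m for m in product(range(p), repeat=4) if (m[0]*m[3] - m[1]*m[2]) % p == 1]
index = {g: i for i, g in enumerate(G)}
n = len(G)                                  # 120

def mul(x, y):
    a, b, c, d = x
    e, f, g, h = y
    return ((a*e + b*g) % p, (a*f + b*h) % p, (c*e + d*g) % p, (c*f + d*h) % p)

def inv(x):
    a, b, c, d = x
    return (d, (-b) % p, (-c) % p, a)       # adjugate = inverse when det = 1

# S = {a, a^{-1}, b, b^{-1}} with a = [[1,1],[0,1]], b = [[0,1],[-1,0]]
S = [(1, 1, 0, 1), (1, p - 1, 0, 1), (0, 1, p - 1, 0), (0, p - 1, 1, 0)]

mu = [0.0] * n
for s in S:
    mu[index[s]] += 1.0 / len(S)

# precompute, for each x, the indices of x s^{-1} for s in S
steps = [[index[mul(x, inv(s))] for s in S] for x in G]

def convolve_with_mu(nu):
    # nu * mu(x) = (1/|S|) sum_s nu(x s^{-1})
    return [sum(nu[j] for j in steps[ix]) / len(S) for ix in range(n)]

l2 = lambda v: sum(t * t for t in v) ** 0.5
nu, norms = mu[:], [l2(mu)]
for _ in range(100):
    nu = convolve_with_mu(nu)
    norms.append(l2(nu))

# convolution with a symmetric probability measure is an l^2 contraction,
# so the norms decrease, and they approach the flat value (1/120)^{1/2} ~ 0.0913
assert all(norms[i + 1] <= norms[i] + 1e-12 for i in range(len(norms) - 1))
assert (1.0 / n) ** 0.5 - 1e-12 <= norms[-1] <= 0.12
```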

The following exercise extends some of the above theory from quasirandom groups to “virtually quasirandom” groups, which have a bounded index subgroup that is quasirandom, but need not themselves be quasirandom.

Exercise 15 (Virtually quasirandom groups) Let {G} be a finite group that contains a normal {D}-quasirandom subgroup {G'} of index at most {K}.

  • (i) If {f, g \in \ell^2(G)}, and at least one of {f, g} has mean zero on every coset of {G'}, show that

    \displaystyle  \|f*g\|_{\ell^2(G)} \leq D^{-1/2} |G|^{1/2} \|f\|_{\ell^2(G)} \|g\|_{\ell^2(G)}.

  • (ii) If {D \geq c |G|^\alpha} for some {c,\alpha > 0}, and {Cay(G,S)} is a connected {k}-regular Cayley graph obeying (1) for some {C, \beta, n > 0} with {\beta < \alpha/2} and {n \leq C \log |G|}, show that {Cay(G,S)} is a two-sided {\epsilon}-expander for some {\epsilon>0} depending only on {c,\alpha,C,\beta,k,K}.

— 2. An algebraic description of quasirandomness (optional) —

As defined above, quasirandomness is a property of representations. However, one can reformulate this property (at a qualitative level, at least) in a more algebraic fashion, by means of Jordan’s theorem:

Theorem 5 (Jordan’s theorem) Let {G} be a finite subgroup of {U_d({\bf C})} for some {d \geq 1}. Then {G} contains a normal abelian subgroup of index at most {K(d)}, where {K(d)} depends only on {d}.

A proof of this theorem (giving a rather poor value of {K(d)}) may be found in this previous blog post. The optimal value of {K(d)} is known for almost all {d}, thanks to the classification of finite simple groups: for instance, it is a result of Collins that {K(d)=(d+1)!} for {d \geq 71} (which is attained with the example of the symmetric group {S_{d+1}} which acts on the space {{\bf C}^{d+1}_0} of {d+1}-dimensional complex vectors whose coefficients sum to zero).

Jordan’s theorem can be used to give a qualitative description of quasirandomness, providing a converse to Exercises 3 and 4:

Exercise 16 Let {D>1} be an integer. Let {G} be a perfect finite group, with the property that all proper normal subgroups of {G} have index greater than {K(D-1)}, where {K(D-1)} is the quantity in Jordan’s theorem. Show that {G} is {D}-quasirandom.

Conclude in particular that any finite simple nonabelian group of cardinality greater than {K(D-1)} is {D}-quasirandom.

By using the classification of finite simple groups more carefully, Nikolov and Pyber were able to replace {K(D-1)} here by {10^{10} D^2}. Using related arguments, they also showed that if {G} was not {D}-quasirandom, then there was a subset {A} of {G} with cardinality {\gg |G|/D} such that {A^3 \neq G}, thus giving a reasonably tight converse to Exercise 10.

— 3. A weak form of Selberg’s 3/16 theorem (optional) —

Remark 6 This section presumes some familiarity with Riemannian geometry, as well as the functional analysis of Sobolev spaces and distributions. See for instance this blog post for a very brief introduction to Riemannian geometry, and these two previous posts for an introduction to distributions and Sobolev spaces.

We now give an application of quasirandomness to establish the following result, first observed explicitly by Lubotzky, Phillips and Sarnak as a corollary of a famous theorem of Selberg:

Theorem 6 (Selberg’s expander construction) If {S} is a symmetric set of generators of {SL_2({\bf Z})} that does not contain the identity, then the Cayley graphs {Cay( SL_2(F_p), \pi_p(S) )} form a one-sided expander family, where {\pi_p: SL_2({\bf Z}) \rightarrow SL_2(F_p)} is the obvious projection homomorphism, and {p} ranges over primes.

This is the {d=2} analogue of Margulis’s expander construction (Corollary 13 from Notes 2), except that the modulus {n} has been restricted here to be prime. This restriction can be removed with some additional effort, but we will not discuss this issue here. The condition that {S} generates the entire group {SL_2({\bf Z})} can be substantially relaxed; we will discuss this point in later notes.

In the property (T) approach to expansion, one passed from discrete groups such as {SL_d({\bf Z})} to continuous groups such as {SL_d({\bf R})}, in order to take advantage of tools from analysis (such as limits). Similarly, to prove Theorem 6, we will pass from {SL_2({\bf Z})} to {SL_2({\bf R})}. Actually, it will be convenient to work with the quotient space {{\bf H} := SL_2({\bf R})/SO_2({\bf R})}, better known as the hyperbolic plane. We will endow this plane with the structure of a Riemannian manifold, in order to access the Laplace-Beltrami operator on that plane, which is a continuous analogue (after some renormalisation) of the adjacency operator of a Cayley graph, which enjoys some nice exact identities which are difficult to discern in the discrete world.

We now therefore digress from the topic of expansion to recall the geometry of the hyperbolic plane. It will be convenient to switch between a number of different coordinatisations of this plane. Our primary model will be the half-plane model:

Definition 7 (Poincaré half-plane model) The Poincaré half-plane is the upper half-plane {{\bf H} := \{ x+iy \in {\bf C}: y>0 \}} with the Riemannian metric {ds^2 := \frac{dx^2 + dy^2}{y^2}}. The (left) action of {SL_2({\bf R})} on this half-plane is given by the formula

\displaystyle  \begin{pmatrix} a & b \\ c & d \end{pmatrix} z := \frac{az+b}{cz+d}.

One easily verifies that this gives an isometric action on {{\bf H}}.

Exercise 17 Verify that {SL_2({\bf R})} acts isometrically and transitively on {{\bf H}}, with stabiliser group isomorphic to {SO_2({\bf R})}; thus {{\bf H}} is isomorphic (as an {SL_2({\bf R})}-homogeneous space) to {SL_2({\bf R})/SO_2({\bf R})}.

Note that in some of the literature, a right action is used instead of a left action, leading to some reversals in the notational conventions used below, but this does not lead to any essential changes in the arguments or results.

Exercise 18 Show that the distance {d(z,w)} between two points {z,w \in {\bf H}} is given by the formula

\displaystyle  \cosh d(z,w) = 1 + \frac{|z-w|^2}{2 \hbox{Im}(z) \hbox{Im}(w)}.
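This formula can be tested numerically against the isometric action of Definition 7: the sketch below applies random determinant-one Möbius transformations and checks that the quantity {\cosh d(z,w)} is unchanged (helper names ours):

```python
# Invariance of the hyperbolic distance formula under the SL_2(R) action.
import random

def cosh_dist(z, w):
    # the distance formula from Exercise 18:
    # cosh d(z,w) = 1 + |z-w|^2 / (2 Im(z) Im(w))
    return 1 + abs(z - w) ** 2 / (2 * z.imag * w.imag)

def moebius(M, z):
    a, b, c, d = M
    return (a * z + b) / (c * z + d)

random.seed(0)
for _ in range(100):
    # a random real 2x2 matrix with determinant 1
    a = random.uniform(0.5, 2.0) * random.choice([-1, 1])
    b = random.uniform(-2.0, 2.0)
    c = random.uniform(-2.0, 2.0)
    d = (1 + b * c) / a                     # forces ad - bc = 1
    M = (a, b, c, d)
    z = complex(random.uniform(-3, 3), random.uniform(0.2, 3))
    w = complex(random.uniform(-3, 3), random.uniform(0.2, 3))
    assert abs(cosh_dist(moebius(M, z), moebius(M, w)) - cosh_dist(z, w)) \
        <= 1e-6 * cosh_dist(z, w)
```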

We can also use the model of the Poincaré disk {{\bf D} := \{ a+ib \in {\bf C}: a^2+b^2 < 1 \}} with the Riemannian metric {ds^2 := 4 \frac{da^2+db^2}{(1-a^2-b^2)^2}}.

Exercise 19 Show that the Cayley transform {z \mapsto \frac{z-i}{z+i}} is an isometric isomorphism from the half-plane {{\bf H}} to the disk {{\bf D}}.

Expressing an element {a+ib} of the Poincaré disk in exponential polar coordinates as {\tanh(\rho/2) e^{i\theta}}, we can also model the Poincaré disk (in slightly singular coordinates) as the half-cylinder {\{ (\rho,\theta): \rho \in [0,+\infty); \theta \in {\bf R}/2\pi{\bf Z}\}} with metric {ds^2 = d\rho^2 + \sinh^2 \rho d\theta^2}. (Compare with the Euclidean plane in polar coordinates, which is similar but with the {\sinh^2 \rho} factor replaced by {\rho^2}, or the sphere in Euler coordinates, which is also similar but with {\rho} restricted to {[0,\pi]} and {\sinh^2 \rho} replaced by {\sin^2 \rho}. This similarity reflects the fact that these three Riemannian surfaces have constant curvature {-1, 0, +1} respectively.)
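One can numerically confirm that these coordinates are consistent: a point {\tanh(\rho/2) e^{i\theta}} of the disk should lie at hyperbolic distance exactly {\rho} from the centre, which corresponds to {i} under the inverse Cayley transform. A sketch (helper names ours):

```python
# Check that the exponential polar coordinate rho really is the hyperbolic
# distance from the centre of the Poincare disk, via the half-plane model.
import cmath
import math

def cosh_dist(z, w):
    # the half-plane distance formula from Exercise 18
    return 1 + abs(z - w) ** 2 / (2 * z.imag * w.imag)

def disk_to_half(u):
    # inverse of the Cayley transform z -> (z - i)/(z + i) of Exercise 19
    return 1j * (1 + u) / (1 - u)

for k in range(60):
    rho = 0.05 * k                          # distances 0, 0.05, ..., 2.95
    theta = 0.1 * k
    u = math.tanh(rho / 2) * cmath.exp(1j * theta)
    # the centre of the disk corresponds to i in the half-plane
    d = cosh_dist(disk_to_half(u), 1j)
    assert abs(d - math.cosh(rho)) <= 1e-9 * math.cosh(rho)
```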

The action of {SL_2({\bf R})} can of course be described explicitly in the disk or half-plane models, but we will not need these explicit formulae here.

A Riemannian metric on a manifold always generates a measure {d\mu} on that manifold. For the Poincaré half-plane, the measure is {d\mu = \frac{dx dy}{y^2}}. For the Poincaré disk, it is {d\mu = 4 \frac{da db}{(1-a^2-b^2)^2}}. For the half-cylinder, it is {\sinh \rho\ d \rho d\theta}. In all cases, the action of {SL_2({\bf R})} will preserve the measure, because it preserves the metric, thus one can view {d\mu} as a Haar measure on the hyperbolic plane.

A Riemannian metric also generates a Laplace-Beltrami operator {\Delta}. In the Poincaré half-plane model, it is

\displaystyle  \Delta = y^2 ( \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} );

in the Poincaré disk model, it is

\displaystyle  \Delta = \frac{(1-a^2-b^2)^2}{4} ( \frac{\partial^2}{\partial a^2} + \frac{\partial^2}{\partial b^2} );

and in the half-cylinder model, it is

\displaystyle  \Delta = \frac{\partial^2}{\partial \rho^2} + \coth \rho \frac{\partial}{\partial \rho} + \frac{1}{\sinh^2 \rho} \frac{\partial^2}{\partial \theta^2}.

Again, in all cases, the Laplacian will commute with the action of {SL_2({\bf R})}, because this action preserves the metric.

The discrete group {SL_2({\bf Z})} acts on the hyperbolic plane, giving rise to a quotient {X(1) := SL_2({\bf Z}) \backslash {\bf H}}, known as the principal modular curve of level {1}. This quotient can also be described by taking the (closure of a) fundamental domain

\displaystyle  \Omega := \{ z \in {\bf H}: |\hbox{Re}(z)| \leq 1/2; |z| \geq 1 \}

and then identifying {-1/2+it} with {1/2+it} on the left and right sides of this domain, and also identifying {z} with {-1/z} on the lower boundary of this domain. The quotient {X(1)} is not compact, but it does have finite measure with respect to {\mu}; indeed, outside of a compact set, {X(1)} behaves like the cusp

\displaystyle  \{ x+iy: -1/2 \leq x \leq 1/2; y > C \} \ \ \ \ \ (2)

for any constant {C>1}, again identifying the {x=-1/2} boundary with the {x=1/2} boundary, and this strip has measure

\displaystyle  \int_C^\infty\int_{-1/2}^{1/2} \frac{dx dy}{y^2} < \infty.

Thus {\mu} descends to a finite Haar measure on {X(1)}.

Remark 7 One can interpret the modular curve geometrically as follows. As in the previous set of notes, one can think of {SL_2({\bf R})} as the space of all unimodular lattices in {{\bf R}^2} (or equivalently, in {{\bf C}}) with two marked generators {z_1, z_2 \in {\bf C}} with {\hbox{Im}(\overline{z_1} z_2)=1}. We can then map {SL_2({\bf R})} to the Poincaré half-plane {{\bf H}} by sending such a lattice with generators {z_1,z_2} to the point {z_2/z_1}; note that the fibers of this map correspond to rotations of the lattice and marked generators, thus identifying {{\bf H}} with {SL_2({\bf R})/SO_2({\bf R})}. The action of {SL_2({\bf Z})} on {{\bf H}} corresponds to moving the generators around while keeping the lattice (or more precisely, the lattice modulo rotations) fixed. The (interior of the) fundamental domain {\Omega} then corresponds to the selection of generators given by setting {z_1} to be the non-zero lattice element of smallest norm, and {z_2} to be the generator whose {z_1} component lies between {-z_1/2} and {+z_1/2}.
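The reduction implicit in this choice of generators (repeatedly normalising the real part with the translations {z \mapsto z \pm 1}, and applying the inversion {z \mapsto -1/z} whenever {|z| < 1}) is the standard algorithm for moving a point of {{\bf H}} into {\Omega}; a minimal sketch (the function name is ours):

```python
def reduce_to_fundamental_domain(tau, tol=1e-12):
    """Move tau in the upper half-plane into the closed fundamental domain
    Omega = { |Re z| <= 1/2, |z| >= 1 } using the generators T: z -> z+1
    and S: z -> -1/z of (P)SL_2(Z)."""
    assert tau.imag > 0
    while True:
        tau = complex(tau.real - round(tau.real), tau.imag)  # apply T^{-n}
        if abs(tau) >= 1.0 - tol:
            return tau
        tau = -1.0 / tau  # apply S; this strictly increases Im(tau) when |tau| < 1

z = reduce_to_fundamental_domain(complex(3.7, 0.02))
# z now lies in Omega (up to rounding), with imaginary part at least that of the input
```

The algorithm terminates because each inversion strictly increases the imaginary part, which is bounded on any orbit.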

The quotient {X(1)} is not quite a smooth Riemannian manifold, due to the presence of partially fixed points of the {SL_2({\bf Z})} action at {+i} and {\pm 1/2 + \sqrt{3}/2 i} (of order {2} and order {3} respectively), and is thus technically an orbifold rather than a manifold. However, this distinction turns out to not significantly affect the analysis and will be glossed over here.

The Laplace-Beltrami operator {\Delta} is defined on smooth compactly supported functions {f \in C^\infty_c({\bf H})} on {{\bf H}}, and then descends to an operator on smooth compactly supported functions {f \in C^\infty_c(X(1))} on {X(1)}. (Here, we use the smooth structure on {X(1)} inherited from {{\bf H}}, thus a function is smooth at a point in {X(1)} if it lifts to a function smooth at the preimage of that point.) On {{\bf H}}, we have the integration by parts formula

\displaystyle  \int_{\bf H} (-\Delta f) g\ d\mu = \int_{\bf H} \langle \nabla f, \nabla g \rangle\ d\mu

where {\nabla} is the gradient with respect to the Riemannian metric, and {\langle,\rangle} the inner product; in the half-plane coordinates, we have

\displaystyle  \langle \nabla f, \nabla g \rangle = y^2 (\frac{\partial f}{\partial x} \frac{\partial g}{\partial x} + \frac{\partial f}{\partial y} \frac{\partial g}{\partial y}),

in the disk model, it is

\displaystyle  \langle \nabla f, \nabla g \rangle = \frac{(1-a^2-b^2)^2}{4} (\frac{\partial f}{\partial a} \frac{\partial g}{\partial a} + \frac{\partial f}{\partial b} \frac{\partial g}{\partial b})

and in the half-cylinder model, it is

\displaystyle \langle \nabla f, \nabla g \rangle = \frac{\partial f}{\partial \rho} \frac{\partial g}{\partial \rho} + \frac{1}{\sinh^2 \rho} \frac{\partial f}{\partial \theta} \frac{\partial g}{\partial \theta}.

In particular, we have the positive definiteness property

\displaystyle  \langle -\Delta f, f \rangle_{L^2({\bf H}, \mu)} = \int_{\bf H} |\nabla f|^2\ d\mu \geq 0

for all {f \in C^\infty_c({\bf H})}. This descends to {X(1)}:

\displaystyle  \langle -\Delta f, f \rangle_{L^2(X(1), \mu)} = \int_{X(1)} |\nabla f|^2\ d\mu \geq 0

Thus {-\Delta} is a symmetric positive-definite densely-defined operator on {L^2(X(1),\mu)}. One can in fact show (by solving some PDE, such as the wave equation or the resolvent equation, and exploiting at some point the fact that the Riemannian manifold {X(1)} is complete) that {-\Delta} is essentially self-adjoint and is thus subject to the spectral theorem, but we will avoid using the full force of spectral theory here.

Since {X(1)} has finite measure and {\Delta 1 = 0}, we see that {1 \in L^2(X(1),\mu)} is an eigenfunction of {\Delta} (or {-\Delta}) with eigenvalue zero. We eliminate this eigenfunction by working in the space {L^2(X(1))_0} (or {C^\infty_c(X(1))_0}) of functions in {L^2(X(1))} (or {C^\infty_c(X(1))}) of mean zero. Let us now define the spectral gap {\lambda_1(X(1))} to be the quantity

\displaystyle  \lambda_1(X(1)) := \inf \{ \int_{X(1)} |\nabla f|^2\ d\mu: f \in C^\infty_c(X(1))_0; \|f\|_{L^2(X(1))} = 1 \}.

Then {\lambda_1(X(1)) \geq 0}. Using the spectral theorem, one can interpret the spectral gap as the infimum of the spectrum {\sigma(-\Delta)} of the (negative) Laplacian on {L^2(X(1))_0}. Note also that one can take {f} to be either real or complex valued, as this will not affect the value of the spectral gap. Also by a truncation and mollification argument we may allow {f} to range in {L^2(X(1))_0} instead of {C^\infty_c(X(1))_0} here if desired.

We have the following bounds:

Proposition 8 (Spectral gap of {X(1)}) We have {0 < \lambda_1(X(1)) \leq \frac{1}{4}}.

Proof: We first establish the upper bound {\lambda_1(X(1)) \leq \frac{1}{4}}. It will suffice to find non-zero functions {f \in C^\infty_c(X(1))_0} whose Rayleigh quotient

\displaystyle  \frac{\int_{X(1)} |\nabla f|^2\ d\mu}{\int_{X(1)} |f|^2\ d\mu}

is arbitrarily close to {1/4}.

We will restrict attention to smooth compactly supported functions {f} supported on the cusp (2) for a fixed {C} (e.g. one can take {C=2}). In coordinates, the Rayleigh quotient becomes

\displaystyle  \frac{\int_C^\infty \int_{-1/2}^{1/2} |f_x|^2 + |f_y|^2\ dx dy}{\int_C^\infty \int_{-1/2}^{1/2} \frac{|f|^2}{y^2}\ dx dy}

where we use subscripts to denote partial differentiation, while the mean zero condition becomes

\displaystyle  \int_C^\infty \int_{-1/2}^{1/2} \frac{f}{y^2}\ dx dy = 0.

To build such functions, we select a large parameter {R \gg C}, and choose a function {f(x,y) = f_R(y)} that depends only on the {y} variable, is supported on the region {\{ C < y < 2R \}}, equals {y^{1/2}} in the region {\{ 2C \leq y \leq R \}}, and is smoothly truncated in the intermediate regions (assigning enough negative mass in the region {\{ C < y \leq 2C \}} to obtain the mean zero condition). A brief calculation shows that

\displaystyle  \int_C^\infty \int_{-1/2}^{1/2} \frac{|f|^2}{y^2}\ dx dy = \log R + O(1)

and

\displaystyle  \int_C^\infty \int_{-1/2}^{1/2} |f_x|^2 + |f_y|^2\ dx dy = \frac{1}{4} \log R + O(1)

(where the implied constant can depend on {C} but not on {R}), and so the claim follows by sending {R \rightarrow \infty}.
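One can replicate this calculation numerically. In the variable {u = \log y}, a test function of the form {f = y^{1/2} w(\log y)} has {\int |f_y|^2\ dx dy = \int (\frac{w}{2} + w')^2\ du} and {\int \frac{|f|^2}{y^2}\ dx dy = \int w^2\ du}, so the Rayleigh quotient of a long smooth plateau {w} decays to {1/4} as the plateau lengthens. A quick sketch (ignoring the mean zero correction, which perturbs the quotient negligibly for long plateaux):

```python
import numpy as np

def trapezoid(y, x):
    # simple trapezoid rule (avoiding version-dependent numpy helpers)
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2)

def rayleigh_quotient(L, ramp=5.0, n=100001):
    """Rayleigh quotient of f = sqrt(y) w(log y) for a smooth plateau cutoff w
    of length L in the u = log y variable."""
    u = np.linspace(0.0, L + 2 * ramp, n)
    s = lambda x: np.clip(x, 0, 1) ** 2 * (3 - 2 * np.clip(x, 0, 1))  # smoothstep
    w = s(u / ramp) * s((L + 2 * ramp - u) / ramp)
    wp = np.gradient(w, u)
    return trapezoid((w / 2 + wp) ** 2, u) / trapezoid(w ** 2, u)

# the quotient stays above 1/4 (Hardy) and decreases towards 1/4 as the plateau grows
q50, q500 = rayleigh_quotient(50), rayleigh_quotient(500)
```

The plateau length here plays the role of {\log R} in the argument above.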

Now we show the lower bound {\lambda_1(X(1)) > 0}. Suppose this claim failed; then we may find a sequence of functions {f_n \in C^\infty_c(X(1))_0} with {\|f_n\|_{L^2(X(1))}=1} such that

\displaystyle  \int_{X(1)} |\nabla f_n|^2\ d\mu= o(1)

where {o(1)} denotes a quantity that goes to zero as {n \rightarrow \infty}. We can take the {f_n} to be real valued.

To deal with the non-compact portion of {X(1)} (i.e. the strip (2)) we now use Hardy’s inequality. Observe that if {f} is smooth, real-valued, and compactly supported on a cusp (2) for some {C} (we can take {C=2} as before), then by integration by parts

\displaystyle  \int_C^\infty \int_{-1/2}^{1/2} \frac{f f_y}{y}\ dx dy = \frac{1}{2} \int_C^\infty \int_{-1/2}^{1/2} \frac{f^2}{y^2}\ dx dy

and hence by Cauchy-Schwarz

\displaystyle  \int_C^\infty \int_{-1/2}^{1/2} \frac{|f|^2}{y^2}\ dx dy \leq 4 \int_C^\infty \int_{-1/2}^{1/2} |f_y|^2\ dx dy.

Applying this to a truncated version {f(x,y) = \chi(y/R) f_n(x,y)} of {f_n} for some {R>C} and some smooth cutoff {\chi: {\bf R}^+ \rightarrow [0,1]} supported on {[1,+\infty)} that equals one on {[2,+\infty)}, we conclude that

\displaystyle  \int_{R}^\infty \int_{-1/2}^{1/2} \frac{|f_n|^2}{y^2}\ dx dy \leq 4 \int_R^\infty \int_{-1/2}^{1/2}|(f_n)_y|^2\ dx dy + O( \int_R^{2R} \frac{|f_n|^2}{y^2}\ dx dy ).

For any {\epsilon > 0}, one can use the pigeonhole principle to find an {R = O_\epsilon(1)} (depending on {n}) such that

\displaystyle  \int_R^{2R} \frac{|f_n|^2}{y^2}\ dx dy \leq \epsilon

and thus we see that

\displaystyle  \int_{R_\epsilon}^\infty \int_{-1/2}^{1/2} \frac{|f_n|^2}{y^2}\ dx dy \ll \epsilon + o(1)

for some {R_\epsilon} depending only on {\epsilon}. Thus, the probability measures {|f_n|^2\ d\mu} form a tight sequence of measures in {X(1)}. As the {f_n} are also locally uniformly bounded in the Sobolev space {H^1(X(1))}, we conclude from the Rellich compactness theorem (or the Poincaré inequality) that after passing to a subsequence, the {f_n} converge strongly in {L^2(X(1))} to a limit {f}, which then has {L^2(X(1))} norm one, mean zero, and {\nabla f = 0} in a distributional sense. But then by the Poincaré inequality, {f} is constant, which is absurd. \Box

Remark 8 One can in fact establish after some calculation using the theory of modular forms that {\lambda_1(X(1))} is exactly {1/4}, but we will not do so here. By modifying the above arguments, one can in fact show that {-\Delta} on {X(1)} has absolutely continuous spectrum on {[1/4,+\infty)}.

Now we move back towards the task of establishing expansion for the Cayley graphs {Cay( SL_2(F_p), \pi_p(S) )}. Let {\Gamma(p)} denote the kernel of the projection map {\pi_p}; this is the group of matrices in {SL_2({\bf Z})} that are equal to {1} mod {p}, and is known as a principal congruence subgroup of {SL_2({\bf Z})}. It is a finite index normal subgroup of {SL_2({\bf Z})}, and the quotient {SL_2({\bf Z})/\Gamma(p)} can easily be seen to be isomorphic to {SL_2(F_p)}. In analogy with what we did for {X(1)}, we can then define the principal modular curve {X(p) := \Gamma(p) \backslash {\bf H}}, and then define the Laplacian {\Delta} on this curve and the spectral gap {\lambda_1(X(p))}. At a qualitative level, the geometry of {X(p)} is similar to that of {X(1)}, except that instead of having just one cusp (2), there are now multiple cusps (which do not necessarily go to infinity as in (2), but may instead go to some other point on the boundary {{\bf R} \cup \{\infty\}} of the hyperbolic plane).

Remark 9 One may think of {X(p)} as being formed by cutting up a finite number of {X(1)}‘s and then (pseudo-)randomly sewing them together to create a tangled orbifold that is a continuous analogue of an expander graph; see this Notices article of Sarnak for more discussion of this perspective. Indeed, one can view {X(p)} as a continuous analogue of the Cayley graph {Cay( PSL_2(F_p), \pi'_p(S))}, where {S := \{ \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \}}, and {\pi'_p} is the projection onto {PSL_2(F_p)}, with each vertex of the Cayley graph being replaced by a copy of the fundamental domain {\Omega} of {X(1)}, with these domains then being glued together along their edges as prescribed by the edges of the Cayley graph.
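To make the discrete half of this analogy concrete, one can assemble such a Cayley graph by brute force for a small prime and inspect its spectrum; a minimal sketch with {p=5} (so {|PSL_2(F_5)| = 60}), using the three distinct images of {S} in {PSL_2(F_5)} as generators (the two rotation matrices coincide there):

```python
import numpy as np
from itertools import product

p = 5

def canon(m):
    """Canonical representative of {m, -m}, i.e. of an element of PSL_2(F_p)."""
    return min(m, tuple((-x) % p for x in m))

def mul(m, n):
    a, b, c, d = m
    e, f, g, h = n
    return canon(((a*e + b*g) % p, (a*f + b*h) % p,
                  (c*e + d*g) % p, (c*f + d*h) % p))

verts = sorted({canon(m) for m in product(range(p), repeat=4)
                if (m[0]*m[3] - m[1]*m[2]) % p == 1})
idx = {v: i for i, v in enumerate(verts)}

# generators: T, T^{-1}, and the rotation [[0,-1],[1,0]] (its own inverse in PSL_2)
gens = [canon((1, 1, 0, 1)), canon((1, p - 1, 0, 1)), canon((0, p - 1, 1, 0))]

A = np.zeros((len(verts), len(verts)))
for v in verts:
    for g in gens:
        A[idx[v], idx[mul(v, g)]] += 1

eigs = np.sort(np.linalg.eigvalsh(A))
# top eigenvalue is the degree 3; the gap 3 - eigs[-2] is visibly bounded away from zero
```

Since the generating set is symmetric, the adjacency matrix is symmetric and the spectrum is real.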

By a routine modification of Proposition 8, one can show that

\displaystyle  0 < \lambda_1(X(p)) \leq \frac{1}{4}.

(Note also that as {X(p)} is a finite isometric cover of {X(1)}, we have the trivial bound {\lambda_1(X(p)) \leq \lambda_1(X(1))}.) However, these arguments do not keep {\lambda_1(X(p))} uniformly bounded away from zero. Much more is conjectured to be true:

Conjecture 9 (Selberg’s conjecture) One has {\lambda_1(X(p)) = \frac{1}{4}} for all {p} (not necessarily prime).

This conjecture remains open (though it has been verified numerically for small values of {p}, in particular for all {p \leq 857} by Booker and Strömbergsson). On the other hand, we have the following celebrated result of Selberg:

Theorem 10 (Selberg’s 3/16 theorem) One has {\lambda_1(X(p)) \geq \frac{3}{16}} for all {p} (not necessarily prime).

Selberg’s argument uses a serious amount of number-theoretic machinery (in particular, bounds for Kloosterman sums) and will not be reproduced here. The {\frac{3}{16}} bound has since been improved; the best bound currently known is {\frac{975}{4096}}, due to Kim and Sarnak and involving even more number-theoretic machinery (related to the Langlands conjectures); this argument will also not be discussed further here.

In the papers of Sarnak and Xue and of Gamburd, an argument based primarily on quasirandomness, using only very elementary number theory, was introduced to obtain the following result:

Theorem 11 (Weak Selberg theorem) One has {\lambda_1(X(p)) \geq \min( \lambda_1(X(1)), \frac{5}{36}-o(1) )} for all primes {p}, where {o(1)} goes to zero as {p \rightarrow \infty}.

In particular, one has a uniform lower bound {\lambda_1(X(p)) \geq c} for some absolute constant {c>0} (and, since one can compute that {\lambda_1(X(1)) = \frac{1}{4}}, one can in fact take {c=\frac{5}{36}}). Despite giving a weaker result than Theorem 10, the argument is more flexible and can be applied to arithmetic surfaces other than {X(p)}, for which the method of Selberg does not seem to apply; see the papers of Sarnak-Xue and Gamburd for further discussion.

We will not quite prove Theorem 11 here, but instead establish the following even weaker version which uses the same ideas, but in a slightly less computation-intensive fashion (at the cost of some efficiency in the argument):

Theorem 12 (Even weaker Selberg theorem) One has {\lambda_1(X(p)) \geq \min( \lambda_1(X(1)), \frac{1}{12} - o(1) )} for all primes {p}.

Of course, this result is still strong enough to supply a uniform lower bound on {\lambda_1(X(p))}.

Before we prove Theorem 12 (a spectral gap in the continuous world), let us show how it can be transferred to deduce Theorem 6 (a spectral gap in the discrete world). Suppose for contradiction that Theorem 6 failed. Then we can find a finite symmetric generating set {S} for {SL_2({\bf Z})} (not containing the identity) and a sequence of primes {p_n} going to infinity such that the one-sided expansion constant of {Cay( SL_2(F_{p_n}), \pi_{p_n}(S) )} goes to zero. Write {G_n := SL_2(F_{p_n})}. Applying the weak discrete Cheeger inequality (Exercise 3 from Notes 2), we conclude that we can find non-empty subsets {E_n \subset G_n} of size {|E_n| \leq \frac{1}{2} |G_n|} which are almost {\pi_{p_n}(S)}-invariant in the sense that {|E_n \pi_{p_n}(S)| = (1+o(1)) |E_n|}. Since {S} generates {SL_2({\bf Z})}, we conclude in particular that

\displaystyle  |E_n \pi_{p_n}(s) \Delta E_n| = o(|E_n|) \ \ \ \ \ (3)

for any {s \in SL_2({\bf Z})} independent of {n}.

The idea now is to pass from this nearly-invariant discrete set {E_n} to a nearly-invariant continuous analogue {f_n} to which the uniform bound on the spectral gap can be applied to obtain a contradiction. (This argument is similar in spirit to Proposition 9 from Notes 2.)

We turn to the details. Let {R \geq 1} be a large parameter (independent of {n}) to be chosen later, and let {z_0} be a point on {X(p_n)} (avoiding fixed points of the {G_n} action); for sake of concreteness we can take {z_0} to be the projection of {2i \in {\bf H}} to {X(p_n)}. Note that {G_n} acts on {X(p_n)}. We consider the function {f_n: X(p_n) \rightarrow [0,1]} defined by the formula

\displaystyle  f_n(z) := \max( \min( 2 - \frac{\hbox{dist}(z, E_n z_0)}{R}, 1), 0).

This function equals {1} when {z} is within {R} (in the hyperbolic metric) of a point in the orbit {E_n z_0}, and equals {0} when {z} is further than {2R} from this orbit; in particular, it is compactly supported. The function is also Lipschitz with constant {1/R}, so {|\nabla f_n| \leq 1/R} (using a weak derivative).

The curve {X(p_n)} is a {|G_n|}-fold cover of {X(1)} and thus has volume {|G_n| \mu(X(1))}. Observe that {f_n} equals {1} on the {R}-neighbourhood of any point in {E_n z_0}. As these points are separated from each other by a bounded distance (independent of {n} and {R}), we conclude that

\displaystyle  \mu( \{ x \in X(p_n): f_n(x)=1 \} ) \gg |E_n|,

where the bound is uniform in {R}. Conversely, if {\gamma \in G_n} is not of the form {\gamma = \gamma_1 \pi_{p_n}(\gamma_2)} for some {\gamma_1 \in E_n} and some {\gamma_2 \in SL_2({\bf Z})} within distance {3R} from the identity, we have {f_n} equal to {0} on the {R}-neighbourhood of {\gamma z_0}. There are only {O_R(1)} possible choices for {\gamma_2}; since {R} is independent of {n}, we conclude from (3) that at most {(1+o(1)) |E_n|} of the {\gamma} in {G_n} are of the form described above. Since {|E_n| \leq |G_n|/2}, we conclude that

\displaystyle  \mu( \{ x \in X(p_n): f_n(x)=0 \} ) \gg |G_n|

if {n} is sufficiently large depending on {R}. As a consequence, if we let

\displaystyle  \tilde f_n := f_n - \frac{1}{\mu(X(p_n))} \int_{X(p_n)} f_n\ d\mu

be the mean-free component of {f_n}, we have the lower bound

\displaystyle  \| \tilde f_n \|_{L^2(X(p_n),\mu)} \gg |E_n|^{1/2}, \ \ \ \ \ (4)

for {n} sufficiently large depending on {R}.

On the other hand, {\nabla \tilde f_n = \nabla f_n} is non-zero only at points which are at distance between {R} and {2R} from {E_n z_0}. Call the set of such points {A}. To estimate the volume of {A}, we partition {X(p_n)} into {|G_n|} sets of the form {\gamma \Omega}, where {\Omega} is a fundamental domain of {X(1)} (projected onto {X(p_n)}) and {\gamma} ranges over {G_n}. Because the ball of radius {2R} centred at {z_0} is precompact and thus meets only {O_R(1)} of the translated domains {\gamma \Omega}, we see that the only {\gamma} for which {\gamma \Omega} meets {A} are of the form {\gamma_1 \pi_{p_n}(\gamma_2)}, where {\gamma_1} lies in {E_n} and {\gamma_2} lies in a subset of {SL_2({\bf Z})} of size {O_R(1)} that is independent of {n}. From (3) we conclude that all but at most {o(|E_n|)} of these {\gamma} thus lie in {E_n}, and so

\displaystyle  \mu(A) \ll |E_n|.

Since {\nabla \tilde f_n = \nabla f_n = O(1/R)}, we conclude that

\displaystyle  \| \nabla \tilde f_n \|_{L^2(X(p_n),\mu)} \ll \frac{1}{R} |E_n|^{1/2},

for {n} sufficiently large depending on {R}. But this and (4) contradict the uniform lower bound on the spectral gap {\lambda_1(X(p_n))} (after regularising {\tilde f_n} in a standard fashion to make it smooth rather than merely Lipschitz), giving the desired contradiction.

We now turn to the proof of Theorem 12. The first step is to show that the only source of spectrum below {1/4} is provided by eigenfunctions.

Proposition 13 (Discrete spectrum below {1/4}) Suppose that {\lambda_1(X(p)) < 1/4}. Then there exists a non-zero {\phi \in L^2(X(p))} such that {-\Delta \phi = \lambda_1(X(p)) \phi} (in the distributional sense).

Note that while {\phi} is only initially in {L^2(X(p))}, it is a routine application of elliptic regularity (which we omit here) to show that {\phi} is necessarily smooth.

Proof: For notational simplicity, we will just prove the claim in the {p=1} case, though the general case is similar. Write {\lambda := \lambda_1(X(1))}, so that {\lambda < 1/4}. The argument will be similar in spirit to the proof of the lower bound {\lambda > 0}. Indeed, by definition of {\lambda}, we can find a sequence of functions {f_n \in C^\infty_c(X(1))_0} with {\|f_n\|_{L^2(X(1))}=1} such that

\displaystyle  \int_{X(1)} |\nabla f_n|^2\ d\mu= \lambda+o(1). \ \ \ \ \ (5)

As before, we can take the {f_n} to be real valued.

Using Hardy’s inequality as in the proof of Proposition 8, we see that

\displaystyle  \int_{R}^\infty \int_{-1/2}^{1/2} \frac{|f_n|^2}{y^2}\ dx dy \leq 4 \int_R^\infty \int_{-1/2}^{1/2} |(f_n)_y|^2\ dx dy + O( \int_R^{2R} \frac{|f_n|^2}{y^2}\ dx dy ) \ \ \ \ \ (6)

for any {R>C}. For any {\epsilon > 0}, one can use the pigeonhole principle to find an {R = O_\epsilon(1)} (depending on {n}) such that

\displaystyle  \int_R^{2R} \frac{|f_n|^2}{y^2}\ dx dy \leq \epsilon

and thus we see that

\displaystyle  \int_{R_\epsilon}^\infty \int_{-1/2}^{1/2} \frac{|f_n|^2}{y^2}\ dx dy \ll \epsilon + 4\lambda + o(1)

for some {R_\epsilon} depending only on {\epsilon}. If {\epsilon} is small enough, {\epsilon+4\lambda < 1 = \int_{X(1)} |f_n|^2\ d\mu}, and thus

\displaystyle  \int_{y \leq R_\epsilon} |f_n|^2\ d\mu \gg 1

for all sufficiently large {n}. By (5), {f_n} is also uniformly bounded in {H^1} norm. Thus by the Rellich compactness theorem, we may pass to a subsequence and assume that {f_n} converges weakly in {L^2(X(1))} and strongly in {L^2_{loc}} to a limit {\phi}, which is then non-zero. Also, from (6) we see that for each {\epsilon, R_0} and {n} there is an {R_0 \leq R = O_{R_0,\epsilon}(1)} such that

\displaystyle  \lambda \int_{y>R} |f_n|^2\ d\mu \leq \int_{y > R} |\nabla f_n|^2\ d\mu + O(\epsilon)

and thus

\displaystyle  \lambda \int_{y \leq R} |f_n|^2\ d\mu \geq \int_{y \leq R} |\nabla f_n|^2\ d\mu + O(\epsilon) + o(1).

Taking limits in weak {L^2} (and strong {L^2_{loc}}), we conclude that for some {R = O_{\epsilon,R_0}(1)} larger than {R_0} that

\displaystyle  \lambda \int_{y \leq R} |\phi|^2\ d\mu \geq \int_{y \leq R} |\nabla \phi|^2\ d\mu + O(\epsilon).

Sending {R_0 \rightarrow \infty} using monotone convergence, we conclude that

\displaystyle  \lambda \int_{X(1)} |\phi|^2\ d\mu \geq \int_{X(1)} |\nabla \phi|^2\ d\mu + O(\epsilon)

for any {\epsilon}; by definition of {\lambda}, we must then have

\displaystyle  \lambda \int_{X(1)} |\phi|^2\ d\mu = \int_{X(1)} |\nabla \phi|^2\ d\mu.

Perturbing {\phi} in some test function direction {g \in C^\infty_c(X(1))} of mean zero, and using the definition of {\lambda}, we conclude that

\displaystyle  \lambda \int_{X(1)} \langle \phi, g \rangle\ d\mu = \int_{X(1)} \langle \nabla \phi, \nabla g \rangle\ d\mu

for all such {g}. The mean zero condition on {g} can be removed since both sides of this equation vanish when {g} is constant. By duality we thus see that {- \Delta \phi = \lambda \phi} in the sense of distributions, as required. \Box

Exercise 20 Establish the above proposition for general {p}.

Exercise 21 Show that for any {\lambda < 1/4}, the spectrum of {-\Delta} in {[0,\lambda]} on {X(p)} is finite (and in particular consists only of eigenvalues), with each eigenvalue having finite multiplicity. (For this exercise you may use without proof the fact that {-\Delta} is essentially self-adjoint.)

We will also need a variant of the above proposition:

Lemma 14 (Eigenfunctions do not concentrate in cusps) Let {\lambda_0 < 1/4}. Then there is a compact subset {F} of {X(1)}, such that for any {p} and any eigenfunction {-\Delta \phi = \lambda \phi} on {X(p)} with some {\lambda < \lambda_0}, one has

\displaystyle  \int_{\eta_p^{-1}(F)} |\phi(x)|^2\ d\mu(x) \gg_{\lambda_0} \int_{X(p)} |\phi(x)|^2\ d\mu(x)

where the implied constant is independent of {p}, {\lambda}, and {\phi}, and {\eta_p: X(p) \rightarrow X(1)} is the covering map.

The lemma is basically proved by applying Hardy’s inequality to each cusp of {X(p)}; see the paper of Gamburd for details.

Now we can start using quasirandomness. Let {V \subset L^2(X(p))_0} be the space of all eigenfunctions of {-\Delta} with eigenvalue {\lambda = \lambda_1(X(p))}:

\displaystyle  V := \{ \phi \in L^2(X(p))_0: -\Delta \phi = \lambda \phi \}.

By the above proposition, this is a non-trivial Hilbert space. From Exercise 21, {V} is finite-dimensional (though we do not really need to know this fact yet in the argument that follows, as it will be a consequence of the computations). Since {SL_2(F_p)} acts isometrically on {X(p)}, it also acts on {V}. If {\phi} is an {SL_2(F_p)}-invariant vector in {V}, then it descends to a function in {L^2(X(1))_0}, which is impossible if {\lambda < \lambda_1(X(1))}. Applying the Frobenius lemma (Lemma 2), we conclude

Lemma 15 (Quasirandomness) If {\lambda < \lambda_1(X(1))}, then {V} has dimension at least {\frac{p-1}{2}}.

To complement this quasirandomness to get expansion, we need a flattening property, as in Proposition 4. In the discrete world, we applied a flattening property to the distribution {\mu^{*m}} of a long discrete random walk. The direct analogue of such a distribution would be a heat kernel {e^{t\Delta}} of the Laplacian {\Delta}, and this is what we shall use here. (It turns out that the heat kernel is not quite the most efficient object to analyse here; see Remark 12 below.)

We first recall the formula for the heat kernel on the hyperbolic plane {{\bf H}} (which can be found in many places, such as this text of Chavel, or this text of Terras):

Exercise 22 Show that the heat operator {e^{t\Delta}} on test functions {f} in {{\bf H}} is given by the formula

\displaystyle  e^{t\Delta} f(x) = \int_{\bf H} K_t( d(x,y) ) f(y)\ d\mu(y)

where {K_t} is the kernel

\displaystyle  K_t(\rho) := \frac{\sqrt{2}}{(4\pi t)^{3/2}} e^{-t/4} \int_{\rho}^\infty \frac{ s e^{-s^2/4t}}{(\cosh s - \cosh \rho)^{1/2}}\ ds.

(Hint: There are two main computations. One is to show that {K_t(\rho)} obeys the heat equation, which in half-cylindrical coordinates means that one has to verify that

\displaystyle  \frac{\partial}{\partial t} K_t(\rho) = (\frac{\partial^2}{\partial \rho^2} + \frac{1}{\tanh \rho} \frac{\partial}{\partial \rho}) K_t(\rho). \ \ \ \ \ (7)

The other is to show that {K_t(\rho)} resembles the Euclidean heat kernel {\frac{1}{4\pi t} e^{-\rho^2/4t}} for small {t}. There are several other ways to derive this formula in terms of formulae for other operators (e.g. the wave propagator); see for instance Terras’s book for some discussion.)
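The normalisation in this formula can also be sanity-checked numerically: being a probability density, {K_t} should integrate to one against {d\mu = \sinh \rho\ d\rho d\theta}. A sketch (assuming scipy; the substitution {s = \rho + v^2} and the identity {\cosh s - \cosh \rho = 2 \sinh \frac{s+\rho}{2} \sinh \frac{s-\rho}{2}} tame the inverse square root singularity at {s = \rho}):

```python
import numpy as np
from scipy.integrate import quad

t = 1.0

def K(rho):
    """Heat kernel K_t(rho), evaluated via the substitution s = rho + v^2."""
    c = np.sqrt(2.0) / (4.0 * np.pi * t) ** 1.5 * np.exp(-t / 4.0)
    def integrand(v):
        s = rho + v * v
        # cosh(s) - cosh(rho) = 2 sinh((s+rho)/2) sinh((s-rho)/2), computed stably
        den = np.sqrt(2.0 * np.sinh((s + rho) / 2) * np.sinh(v * v / 2))
        return 2.0 * v * s * np.exp(-s * s / (4.0 * t)) / den
    return c * quad(integrand, 0.0, 8.0, limit=200)[0]

# integrate K_t against the hyperbolic measure sinh(rho) d(rho) d(theta)
rhos = np.linspace(1e-3, 20.0, 800)
vals = np.array([2 * np.pi * K(r) * np.sinh(r) for r in rhos])
total = float(np.sum((vals[1:] + vals[:-1]) * np.diff(rhos)) / 2)
# total is very close to 1
```

The truncations of the integration ranges are harmless here because of the Gaussian decay of the integrand.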

For our purposes, we only need a crude upper bound on the heat kernel:

Exercise 23 With the notation of the preceding exercise, show that

\displaystyle  K_t(\rho) \ll (t+\rho)^{O(1)} e^{-t/4} e^{-\rho/2} e^{-\rho^2/4t}.

when {t \geq 1} and {\rho \geq 0}.

In our applications, the polynomial factors {(t+\rho)^{O(1)}} will be negligible; only the exponential factors will be of importance. Note that if one integrates the above estimate against the measure {d\mu = \sinh \rho d \rho d \theta}, one sees that

\displaystyle  \int_{\bf H} K_t\ d\mu \ll \int_0^\infty (t+\rho)^{O(1)} e^{-t/4} e^{+\rho/2} e^{-\rho^2/4t} d \rho. \ \ \ \ \ (8)

The right-hand side evaluates to {O(t^{O(1)})}: completing the square gives {-\frac{t}{4} + \frac{\rho}{2} - \frac{\rho^2}{4t} = -\frac{(\rho-t)^2}{4t}}, so the integrand is (up to polynomial factors) a Gaussian centred at {\rho=t} of width {O(\sqrt{t})}. On the other hand, as the heat kernel is a probability measure, one has {\int_{\bf H} K_t\ d\mu= 1}. Thus, up to polynomial factors, the above estimate is quite tight.

Remark 10 From Exercise 23, we see that the probability measure {K_t(\rho) \sinh \rho d\rho d\theta} concentrates around the region {\rho = t+ O(\sqrt{t})}; thus on the hyperbolic plane, Brownian motion moves “ballistically” away from its starting point at a unit speed, in contrast to the situation in Euclidean geometry, where after time {t} a Brownian motion is only expected to move by a distance {O(\sqrt{t})}. One can see this phenomenon also from the heat equation (7), which when expressed in terms of the probability density {u(\rho) := K_t(\rho) \sinh \rho} becomes a Fokker-Planck equation

\displaystyle  \frac{\partial}{\partial t} u(\rho) = \frac{\partial^2}{\partial \rho^2} u(\rho) - \frac{\partial}{\partial \rho}(\frac{1}{\tanh \rho} u)(\rho)

with unit diffusion and drift speed {\frac{1}{\tanh \rho}}. Since {\frac{1}{\tanh \rho}} rapidly approaches {1} when {\rho} becomes large, we thus expect {u} to concentrate in the region {\rho = t + O(\sqrt{t})}, as is indeed the case.
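This ballistic behaviour is easy to observe in simulation: the radial part of hyperbolic Brownian motion solves the SDE {d\rho = \frac{1}{\tanh \rho}\ dt + \sqrt{2}\ dW_t} (matching the drift {\frac{1}{\tanh \rho}} in the Fokker-Planck equation above), and a crude Euler-Maruyama discretisation (a sketch; the step size, horizon and path count are arbitrary choices) exhibits {\rho(t) \approx t}:

```python
import numpy as np

rng = np.random.default_rng(0)
T, dt, paths = 50.0, 0.01, 2000
rho = np.ones(paths)                    # start at rho = 1, away from the origin
for _ in range(int(T / dt)):
    # clamp the argument of coth: the discretised drift would otherwise blow up
    # on the rare excursions very close to the origin
    drift = 1.0 / np.tanh(np.maximum(rho, 0.05))
    rho += drift * dt + np.sqrt(2 * dt) * rng.standard_normal(paths)
    rho = np.abs(rho)                   # reflect at the origin

mean_speed = rho.mean() / T             # close to 1: unit escape speed
```

In Euclidean space the same experiment (drift {\frac{d-1}{\rho}} in dimension {d}) would give {\rho(T) = O(\sqrt{T})} instead.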

We let {t \geq 1} be a parameter to be optimised later. The heat operator {e^{t\Delta}} on {{\bf H}} descends to a heat operator on the quotient {X(p)}, defined by the formula

\displaystyle  e^{t\Delta} f(x) = \int_{X(p)} \sum_{z \in \Gamma(p) y} K_t(d(x,z)) f(y)\ d\mu(y)

for {f \in C_c(X(p))}; note that the sum {\sum_{z \in \Gamma(p) y} K_t(d(x,z))} is {\Gamma(p)}-invariant, and so makes sense for {x \in X(p)} and not just for {x \in {\bf H}}. When applied to an eigenfunction {\phi \in V}, one has {e^{t\Delta} \phi = e^{-t\lambda} \phi}; in particular, {e^{t\Delta}} preserves {V}, and thus also preserves the orthogonal complement of {V}. As {e^{t\Delta}} is positive semi-definite, it therefore splits as the sum of {e^{-t\lambda} P_V} and another positive semi-definite operator, where {P_V} is the orthogonal projection to {V}. Note that {e^{-t\lambda} P_V} is an integral operator with kernel

\displaystyle  e^{-t\lambda} \sum_{i=1}^{\hbox{dim}(V)} \phi_i(x) \overline{\phi_i(y)}

where {\phi_1,\ldots,\phi_{\hbox{dim}(V)}} is an orthonormal basis of {V}. (Here we use the fact that {V} is finite dimensional, but if one does not want to use this fact yet, one can work instead with a finite-dimensional subspace of {V} in the argument that follows.) Since positive semi-definite integral operators (with continuous kernel) are non-negative on the diagonal, we conclude the pointwise inequality

\displaystyle  e^{-t\lambda} \sum_{i=1}^{\hbox{dim}(V)} |\phi_i(x)|^2 \leq \sum_{\gamma \in \Gamma(p)} K_t(d(x,\gamma x))

for all {x \in X(p)}.

This will be our starting point to get a lower bound on {\lambda}. But first we must deal with the other quantities {\phi_i}, {K_t} in this expression. A simple way to proceed here is to integrate out in {X(p)} to exploit the {L^2} normalisation of the {\phi_i}:

\displaystyle  e^{-t\lambda} \hbox{dim}(V) \leq \int_{X(p)} \sum_{\gamma \in \Gamma(p)} K_t(d(x,\gamma x))\ d\mu(x).

However, this turns out to be a little unfavourable because the integrand on the right-hand side does not behave well enough at cusps. If one instead uses Lemma 14 first (assuming that {\lambda \leq 1/12}), and integrates over the resulting region {\eta_p^{-1}(F)}, one can avoid the cusps:

\displaystyle  e^{-t\lambda} \hbox{dim}(V) \ll \int_{\eta_p^{-1}(F)} \sum_{\gamma \in \Gamma(p)} K_t(d(x,\gamma x))\ d\mu(x).

Because the sum here is {SL_2(F_p)}-invariant, we can descend from {X(p)} to {X(1)}:

\displaystyle  e^{-t\lambda} \hbox{dim}(V) \ll |SL_2(F_p)| \int_{F} \sum_{\gamma \in \Gamma(p)} K_t(d(x,\gamma x))\ d\mu(x).

We now insert the bound in Exercise 23, as well as the bound {|SL_2(F_p)| \ll p^3}:

\displaystyle  e^{-t\lambda} \hbox{dim}(V) \ll p^3 \int_{F} \sum_{\gamma \in \Gamma(p)} (t + d(x,\gamma x))^{O(1)} e^{-t/4} e^{-d(x,\gamma x)/2} e^{-d(x,\gamma x)^2/4t}\ d\mu(x).

Because {F} is compact, we can get a good bound on {d(x,\gamma x)}:

Exercise 24

  • (i) For any {\gamma \in SL_2({\bf R})}, show that {d(i,\gamma i) = 2 \log \| \gamma \| + O(1)}, where {\| \gamma \| := (a^2+b^2+c^2+d^2)^{1/2}} is the Frobenius norm of the matrix {\gamma =: \begin{pmatrix} a & b \\ c & d \end{pmatrix}}.
  • (ii) More generally, if {F} is a compact subset of {X(1)}, show that {d(x,\gamma x) = 2 \log \|\gamma \| + O_F(1)} for all {x \in F}, where {O_F(1)} denotes a quantity bounded in magnitude by a constant {C_F} depending only on {F}.
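Part (i) reflects the exact identity {\cosh d(i, \gamma i) = \frac{1}{2} \|\gamma\|^2}, valid for all {\gamma \in SL_2({\bf R})}, which can be checked numerically (a sketch; the helper function here is ours):

```python
import numpy as np

def hyp_dist(z, w):
    """Hyperbolic distance between two points of the upper half-plane."""
    return np.arccosh(1 + abs(z - w) ** 2 / (2 * z.imag * w.imag))

gamma = np.linalg.matrix_power(np.array([[2, 1], [1, 1]]), 12)  # a large element of SL_2(Z)
a, b, c, d = (float(x) for x in gamma.ravel())
z = (a * 1j + b) / (c * 1j + d)          # the Mobius action of gamma on i
dist = hyp_dist(1j, z)
frob = np.sqrt(a * a + b * b + c * c + d * d)
# cosh(dist) equals frob**2 / 2 exactly, so dist = 2 log(frob) + o(1) for large frob
```

The {O(1)} error in part (i) is just the gap between {\hbox{arccosh}(x)} and {\log(2x)}, which vanishes as {x \rightarrow \infty}.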

Inserting these bounds, we obtain

\displaystyle  e^{-t\lambda} \hbox{dim}(V) \ll p^3 \sum_{\gamma \in \Gamma(p)} (t + \log \|\gamma\|)^{O(1)} e^{-t/4} e^{-\log \| \gamma \|} e^{-(\log \|\gamma\| + O(1))^2/t}.

Decomposing according to the integer part {R} of {\log \|\gamma\| + O(1)}, we thus have

\displaystyle  e^{-t\lambda} \hbox{dim}(V) \ll p^3 \sum_{R=1}^\infty (t+R)^{O(1)} e^{-t/4} e^{-R} e^{-R^2 / t} N_p(e^{R+O(1)}) \ \ \ \ \ (9)

where {N_p(T)} is the counting function

\displaystyle  N_p(T) := |\{ \gamma \in \Gamma(p): \|\gamma\| \leq T \}|.

So one is left with the purely number-theoretic task of estimating {N_p(T)}. This is basically the number of points of {\Gamma(p) i} in the ball of radius {2 \log T} in {{\bf H}}. From the half-cylinder model, we see that the measure of this ball is {O( e^{2\log T} ) = O(T^2)}. On the other hand, {\Gamma(p)} has index {|SL_2(F_p)| \sim p^3} in {\Gamma(1)}, which has bounded covolume in {{\bf H}}. We thus heuristically expect {N_p(T)} to be {O(T^2/p^3)}. If this were the truth, then the right-hand side of (9) would be {O(t^{O(1)})} (cf. the evaluation of (8)), which when combined with quasirandomness (Lemma 15) would give a lower bound on {\lambda}, which would be particularly strong when {t} was small.
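As an aside, the size {|SL_2(F_p)| \sim p^3} used here comes from the exact count {|SL_2(F_p)| = p^3 - p = p(p-1)(p+1)}, which is quick to confirm by brute force for small primes (a sketch):

```python
from itertools import product

def sl2_order(p):
    """Count 2x2 matrices over F_p with determinant 1 by brute force."""
    return sum(1 for a, b, c, d in product(range(p), repeat=4)
               if (a * d - b * c) % p == 1)

for p in (2, 3, 5, 7):
    assert sl2_order(p) == p**3 - p  # = p (p-1) (p+1)
```

(The closed form follows by noting that each of the {p^2-1} choices of non-zero first row admits exactly {p} compatible second rows.)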

The key is then the following “flattening lemma”, that shows that {N_p(T)} is indeed roughly of the order of {O(T^2/p^3)} when {T} is large, and is the main number-theoretic input to the argument:

Lemma 16 (Flattening lemma) For any {\epsilon>0}, one has

\displaystyle  N_p(T) \ll_\epsilon \frac{T^{2+\epsilon}}{p^3} + \frac{T^{1+\epsilon}}{p} + T^\epsilon.

Proof: Using the definition of {\Gamma(p)}, we are basically counting the number of integer solutions {(a,b,c,d) \in {\bf Z}^4} to the equation

\displaystyle  ad-bc = 1

subject to the congruences

\displaystyle  a = d = 1 \hbox{ mod } p; \quad b = c = 0 \hbox{ mod } p

and the bounds

\displaystyle  a, b, c, d = O(T).

Since {b, c} are both divisible by {p}, we see also that {ad = 1 \hbox{ mod } p^2}. Similarly, as {a-1} and {d-1} are divisible by {p}, we have {(a-1)(d-1)=0 \hbox{ mod } p^2}. Subtracting, we conclude that

\displaystyle  a + d = 2 \hbox{ mod } p^2.

Now we proceed as follows. The number of integers {a = 1 \hbox{ mod } p} with {a = O(T)} is {O( \frac{T}{p} + 1 )}. For each such {a}, the number of {d} with {a+d=2 \hbox{ mod } p^2} and {d = O(T)} is {O( \frac{T}{p^2} + 1 )}. For each fixed {a} and {d}, the expression {ad-1} is of size {O(T^2)}; by the divisor bound, there are thus {O_\epsilon(T^\epsilon)} ways to factor {ad-1} into {bc}. (In the degenerate case {ad-1=0}, which occurs for only {O(1)} pairs {(a,d)}, the solutions have {bc=0} and number {O(T/p)}, which is acceptable for the claimed bound.) Thus, we obtain a final bound of

\displaystyle  N_p(T) \ll_\epsilon (\frac{T}{p}+1) (\frac{T}{p^2}+1) T^\epsilon

giving the claim. \Box
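
The counting in this proof can be sanity-checked by brute force for small {p} and {T}. The Python sketch below (helper names ours; tiny parameters only, not an efficient counter) computes {N_p(T)} twice: once by looping over {a,b,c} as in the proof and solving {ad-bc=1} for {d}, and once by exhaustive search over all four entries, and checks that the two counts agree.

```python
def N_fast(p, T):
    # Count gamma in Gamma(p) with Frobenius norm <= T by looping over
    # a = 1 mod p and b, c = 0 mod p, then solving ad - bc = 1 for d.
    # Note a != 0 automatically (a = 1 mod p), and d = 1 mod p follows
    # from ad = 1 + bc = 1 mod p.
    mults = [p * k for k in range(-(T // p), T // p + 1)]
    a_vals = [a for a in range(-T, T + 1) if a % p == 1]
    count = 0
    for a in a_vals:
        for b in mults:
            for c in mults:
                num = 1 + b * c
                if num % a == 0:
                    d = num // a
                    if a * a + b * b + c * c + d * d <= T * T:
                        count += 1
    return count

def N_slow(p, T):
    # Independent check: exhaustive search over all four entries.
    count = 0
    for a in range(-T, T + 1):
        for b in range(-T, T + 1):
            for c in range(-T, T + 1):
                for d in range(-T, T + 1):
                    if (a * d - b * c == 1
                            and a % p == 1 and d % p == 1
                            and b % p == 0 and c % p == 0
                            and a * a + b * b + c * c + d * d <= T * T):
                        count += 1
    return count

p, T = 3, 12
print(N_fast(p, T), N_slow(p, T))
# For comparison, the heuristic main term T^2/p^3 at these parameters:
print(T * T / p ** 3)
```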

Remark 11 One can obtain improved bounds on {N_p(T)} for some ranges of {T} (particularly when {T} ranges between {p} and {p^2}) by using more advanced tools, such as bounds on Kloosterman sums. Unfortunately, such improvements do not actually improve the final constants in this argument. (Kloosterman sums do however play a key role in the proof of Theorem 10, which proceeds by a different, and more highly arithmetic, argument.)

A routine calculation then finishes off the proof of Theorem 12:

Exercise 25 Using the above lemma, show that the right-hand side of (9) is

\displaystyle  \ll_\epsilon e^{\epsilon t} ( 1 + p^3 e^{-t/4} ) \ \ \ \ \ (10)

for any {\epsilon>0}. Optimising this in {t} and using Lemma 15, establish a contradiction whenever {\lambda < \min(\frac{1}{12}-\epsilon,\lambda_1(X(1)))} and {p} is sufficiently large depending on {\epsilon}, thus giving Theorem 12.
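
As a hint for the optimisation step (this is only a sketch of one way to proceed, under the assumption that Lemma 15 gives {\hbox{dim}(V) \gg p} for the non-trivial representations {V} arising here): comparing (10) with the left-hand side of (9), one has

\displaystyle  e^{-t\lambda} p \ll_\epsilon e^{\epsilon t} ( 1 + p^3 e^{-t/4} ).

Choosing {t := 12 \log p} makes {p^3 e^{-t/4} = 1}, and the inequality becomes

\displaystyle  p^{1 - 12\lambda} \ll_\epsilon p^{12 \epsilon},

which fails for {p} sufficiently large once {\lambda < \frac{1}{12} - \epsilon} (after adjusting {\epsilon} by a constant factor).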

Remark 12 An inspection of the above argument shows that the {p^3 e^{-t/4}} term in (10) is the main obstacle to improving the {\frac{1}{12}} constant. This term can ultimately be “blamed” on the relatively large value of the heat kernel {K_t(\rho)} at the origin. To improve this, one can observe that the main features of the heat kernel {K_t(\rho)} needed for the argument were that it was positive definite, and that it had an explicit effect on eigenfunctions {\phi}. It turns out that there are several other kernels with these properties, and by selecting a kernel with less concentration at the identity, one can obtain a better result. In particular, an efficient choice of kernel turns out to be the convolution of a ball of radius {t} with itself. By performing some additional calculations in hyperbolic geometry (in particular, using the Selberg/Harish-Chandra theory of spherical functions) one can use this kernel to improve the {1/12} bound given here to {5/36}; see the paper of Gamburd for details. Unfortunately, the fraction {5/36} here appears to be the limit of this particular method.