You are currently browsing the yearly archive for 2011.
One of the most fundamental principles in Fourier analysis is the uncertainty principle. It does not have a single canonical formulation, but one typical informal description of the principle is that if a function is restricted to a narrow region of physical space, then its Fourier transform must be necessarily “smeared out” over a broad region of frequency space. Some versions of the uncertainty principle are discussed in this previous blog post.
In this post I would like to highlight a useful instance of the uncertainty principle, due to Hugh Montgomery, which is useful in analytic number theory contexts. Specifically, suppose we are given a complex-valued function on the integers. To avoid irrelevant issues at spatial infinity, we will assume that the support of this function is finite (in practice, we will only work with functions that are supported in an interval for some natural numbers ). Then we can define the Fourier transform by the formula
where . (In some literature, the sign in the exponential phase is reversed, but this will make no substantial difference to the arguments below.)
The classical uncertainty principle, in this context, asserts that if is localised in an interval of length , then must be “smeared out” at a scale of at least (and essentially constant at scales less than ). For instance, if is supported in , then we have the Plancherel identity
while from the Cauchy-Schwarz inequality we have
for each frequency , and in particular that
for any arc in the unit circle (with denoting the length of ). In particular, an interval of length significantly less than can only capture a fraction of the energy of the Fourier transform of , which is consistent with the above informal statement of the uncertainty principle.
Another manifestation of the classical uncertainty principle is the large sieve inequality. A particularly nice formulation of this inequality is due independently to Montgomery and Vaughan and Selberg: if is supported in , and are frequencies in that are -separated for some , thus for all (where denotes the distance of to the origin in ), then
The reader is encouraged to see how this inequality is consistent with the Plancherel identity (1) and the intuition that is essentially constant at scales less than . The factor can in fact be amplified a little bit to , which is essentially optimal, by using a neat dilation trick of Paul Cohen, in which one dilates to (and replaces each frequency by their roots), and then sending (cf. the tensor product trick); see this survey of Montgomery for details. But we will not need this refinement here.
In the above instances of the uncertainty principle, the concept of narrow support in physical space was formalised in the Archimedean sense, using the standard Archimedean metric on the integers (in particular, the parameter is essentially the Archimedean diameter of the support of ). However, in number theory, the Archimedean metric is not the only metric of importance on the integers; the -adic metrics play an equally important role; indeed, it is common to unify the Archimedean and -adic perspectives together into a unified adelic perspective. In the -adic world, the metric balls are no longer intervals, but are instead residue classes modulo some power of . Intersecting these balls from different -adic metrics together, we obtain residue classes with respect to various moduli (which may be either prime or composite). As such, another natural manifestation of the concept of “narrow support in physical space” is “vanishes on many residue classes modulo “. This notion of narrowness is particularly common in sieve theory, when one deals with functions supported on thin sets such as the primes, which naturally tend to avoid many residue classes (particularly if one throws away the first few primes).
In this context, the uncertainty principle is this: the more residue classes modulo that avoids, the more the Fourier transform must spread out along multiples of . To illustrate a very simple example of this principle, let us take , and suppose that is supported only on odd numbers (thus it completely avoids the residue class ). We write out the formulae for and :
If is supported on the odd numbers, then is always equal to on the support of , and so we have . Thus, whenever has a significant presence at a frequency , it also must have an equally significant presence at the frequency ; there is a spreading out across multiples of . Note that one has a similar effect if was supported instead on the even integers instead of the odd integers.
A little more generally, suppose now that avoids a single residue class modulo a prime ; for sake of argument let us say that it avoids the zero residue class , although the situation for the other residue classes is similar. For any frequency and any , one has
From basic Fourier analysis, we know that the phases sum to zero as ranges from to whenever is not a multiple of . We thus have
Let us continue this analysis a bit further. Now suppose that avoids residue classes modulo a prime , for some . (We exclude the case as it is clearly degenerates by forcing to be identically zero.) Let be the function that equals on these residue classes and zero away from these residue classes, then
Using the periodic Fourier transform, we can write
for some coefficients , thus
Some Fourier-analytic computations reveal that
and so after some routine algebra and the Cauchy-Schwarz inequality, we obtain a generalisation of (3):
Thus we see that the more residue classes mod we exclude, the more Fourier energy has to disperse along multiples of . It is also instructive to consider the extreme case , in which is supported on just a single residue class ; in this case, one clearly has , and so spreads its energy completely evenly along multiples of .
In 1968, Montgomery observed the following useful generalisation of the above calculation to arbitrary modulus:
where is the Möbius function.
We give a proof of this proposition below the fold.
Following the “adelic” philosophy, it is natural to combine this uncertainty principle with the large sieve inequality to take simultaneous advantage of localisation both in the Archimedean sense and in the -adic senses. This leads to the following corollary:
Corollary 2 (Arithmetic large sieve inequality) Let be a function supported on an interval which, for each prime , avoids residue classes modulo for some . Let , and let be a finite set of natural numbers. Suppose that the frequencies with , , and are -separated. Then one has
where was defined in (4).
Indeed, from the large sieve inequality one has
while from Proposition 1 one has
whence the claim.
There is a great deal of flexibility in the above inequality, due to the ability to select the set , the frequencies , the omitted classes , and the separation parameter . Here is a typical application concerning the original motivation for the large sieve inequality, namely in bounding the size of sets which avoid many residue classes:
Corollary 3 (Large sieve) Let be a set of integers contained in which avoids residue classes modulo for each prime , and let . Then
whenever are distinct fractions in this sequence.
If, for instance, is the set of all primes in larger than , then one can set for all , which makes , where is the Euler totient function. It is a classical estimate that
Using this fact and optimising in , we obtain (a special case of) the Brun-Titchmarsh inequality
where is the prime counting function; a variant of the same argument gives the more general Brun-Titchmarsh inequality
for any primitive residue class , where is the number of primes less than or equal to that are congruent to . By performing a more careful optimisation using a slightly sharper version of the large sieve inequality (2) that exploits the irregular spacing of the Farey sequence, Montgomery and Vaughan were able to delete the error term in the Brun-Titchmarsh inequality, thus establishing the very nice inequality
for any natural numbers with . This is a particularly useful inequality in non-asymptotic analytic number theory (when one wishes to study number theory at explicit orders of magnitude, rather than the number theory of sufficiently large numbers), due to the absence of asymptotic notation.
I recently realised that Corollary 2 also establishes a stronger version of the “restriction theorem for the Selberg sieve” that Ben Green and I proved some years ago (indeed, one can view Corollary 2 as a “restriction theorem for the large sieve”). I’m placing the details below the fold.
In 1964, Kemperman established the following result:
Remark 1 The estimate is sharp, as can be seen by considering the case when is a unit circle, and are arcs; similarly if is any compact connected group that projects onto the circle. The connectedness hypothesis is essential, as can be seen by considering what happens if and are a non-trivial open subgroup of . For locally compact connected groups which are unimodular but not compact, there is an analogous statement, but with now a Haar measure instead of a Haar probability measure, and the right-hand side replaced simply by . The case when is a torus is due to Macbeath, and the case when is a circle is due to Raikov. The theorem is closely related to the Cauchy-Davenport inequality; indeed, it is not difficult to use that inequality to establish the circle case, and the circle case can be used to deduce the torus case by considering increasingly dense circle subgroups of the torus (alternatively, one can also use Kneser’s theorem).
By inner regularity, the hypothesis that are compact can be replaced with Borel measurability, so long as one then adds the additional hypothesis that is also Borel measurable.
A short proof of Kemperman’s theorem was given by Ruzsa. In this post I wanted to record how this argument can be used to establish the following more “robust” version of Kemperman’s theorem, which not only lower bounds , but gives many elements of some multiplicity:
Indeed, Theorem 1 can be deduced from Theorem 2 by dividing (1) by and then taking limits as . The bound in (1) is sharp, as can again be seen by considering the case when are arcs in a circle. The analogous claim for cyclic groups for prime order was established by Pollard, and for general abelian groups by Green and Ruzsa.
for any compact set . Our task is to establish that whenever .
We first verify the extreme cases. If , then , and so in this case (since ). At the other extreme, if , then from the inclusion-exclusion principle we see that , and so again in this case.
and thus (noting that the quantities on the left are closer to each other than the quantities on the right)
at which point (2) follows by integrating over and then using the inclusion-exclusion principle.
Now introduce the function
for . From the preceding discussion vanishes at the endpoints ; our task is to show that is non-negative in the interior region . Suppose for contradiction that this was not the case. It is easy to see that is continuous (indeed, it is even Lipschitz continuous), so there must be at which is a local minimum and not locally constant. In particular, . But for any with , we have the translation-invariance
for any , and hence by (2)
Note that depends continuously on , equals when is the identity, and has an average value of . As is connected, we thus see from the intermediate value theorem that for any , we can find such that , and thus by inclusion-exclusion . By definition of , we thus have
Taking infima in (and noting that the hypotheses on are independent of ) we conclude that
for all . As is a local minimum and is arbitrarily small, this implies that is locally constant, a contradiction. This establishes Theorem 2.
We observe the following corollary:
Corollary 3 Let be a compact connected group, with a Haar probability measure . Let be compact subsets of , and let . Then one has the pointwise estimate
if , and
Once again, the bounds are completely sharp, as can be seen by computing when are arcs of a circle. For quasirandom , one can do much better than these bounds, as discussed in this recent blog post; thus, the abelian case is morally the worst case here, although it seems difficult to convert this intuition into a rigorous reduction.
Proof: By cyclic permutation we may take . For any
we can bound
where we used Theorem 2 to obtain the third line. Optimising in , we obtain the claim.
Emmanuel Breuillard, Ben Green and I have just uploaded to the arXiv the short paper “A nilpotent Freiman dimension lemma“, submitted to the special volume of the European Journal of Combinatorics in honour of Yahya Ould Hamidoune. This paper is a nonabelian (or more precisely, nilpotent) variant of the following additive combinatorics lemma of Freiman:
Freiman’s lemma. Let A be a finite subset of a Euclidean space with . Then A is contained in an affine subspace of dimension at most .
This can be viewed as a “cheap” version of the more well known theorem of Freiman that places sets of small doubling in a torsion-free abelian group inside a generalised arithmetic progression. The advantage here is that the bound on the dimension is extremely explicit.
Our main result is
Theorem. Let A be a finite subset of a simply-connected nilpotent Lie group G which is a K-approximate group (i.e. A is symmetric, contains the identity, and can be covered by up to K left translates of A. Then A can be covered by at most left-translates of a closed connected Lie subgroup of dimension at most .
We remark that our previous paper established a similar result, in which the dimension bound was improved to , but at the cost of worsening the covering number to , and with a much more complicated proof (91 pages instead of 8). Furthermore, the bound on is ineffective, due to the use of ultraproducts in the argument (though it is likely that some extremely lousy explicit bound could eventually be squeezed out of the argument by finitising everything). Note that the step of the ambient nilpotent group G does not influence the final bounds in the theorem, although we do of course need this step to be finite. A simple quotienting argument allows one to deduce a corollary of the above theorem in which the ambient group is assumed to be residually torsion-free nilpotent instead of being a simply connected nilpotent Lie group, but we omit the statement of this corollary here.
To motivate the proof of this theorem, let us first show a simple case of an argument of Gleason, which is very much in the spirit of Freiman’s lemma:
Gleason Lemma (special case). Let be a finite symmetric subset of a Euclidean space, and let be a sequence of subspaces in this space, such that the sets are strictly increasing in i for . Then , where .
Proof. By hypothesis, for each , the projection of to is non-trivial, finite, and symmetric. In particular, since the vector space is torsion-free, is strictly larger than . Equivalently, one can find in that does not lie in ; in particular, and is disjoint from . As a consequence, the are disjoint and lie in 5A, whence the claim.
Note that by combining the contrapositive of this lemma with a greedy algorithm, one can show that any K-approximate group in a Euclidean space is contained in a subspace of dimension at most , which is a weak version of Freiman’s lemma.
To extend the argument to the nilpotent setting we use the following idea. Observe that any non-trivial genuine subgroup H of a nilpotent group G will contain at least one non-trivial central element; indeed, by intersecting H with the lower central series of G, and considering the last intersection which is non-trivial, one obtains the claim. It turns out that one can adapt this argument to approximate groups, so that any sufficiently large K-approximate subgroup A of G will contain a non-trivial element that centralises a large fraction of A. Passing to this large fraction and quotienting out the central element, we obtain a new approximate group. If, after a bounded number of steps, this procedure gives an approximate group of bounded size, we are basically done. If, however, the process continues, then by using some Lie group theory, one can find a long sequence of connected Lie subgroups of G, such that the sets are strictly increasing in i. Using some Lie group theory and the hypotheses on G, one can deduce that the group generated by is much larger than , in the sense that the latter group has infinite index in the former. It then turns out that the Gleason argument mentioned above can be adapted to this setting.
Let be a self-adjoint operator on a finite-dimensional Hilbert space . The behaviour of this operator can be completely described by the spectral theorem for finite-dimensional self-adjoint operators (i.e. Hermitian matrices, when viewed in coordinates), which provides a sequence of eigenvalues and an orthonormal basis of eigenfunctions such that for all . In particular, given any function on the spectrum of , one can then define the linear operator by the formula
which then gives a functional calculus, in the sense that the map is a -algebra isometric homomorphism from the algebra of bounded continuous functions from to , to the algebra of bounded linear operators on . Thus, for instance, one can define heat operators for , Schrödinger operators for , resolvents for , and (if is positive) wave operators for . These will be bounded operators (and, in the case of the Schrödinger and wave operators, unitary operators, and in the case of the heat operators with positive, they will be contractions). Among other things, this functional calculus can then be used to solve differential equations such as the heat equation
The functional calculus can also be associated to a spectral measure. Indeed, for any vectors , there is a complex measure on with the property that
indeed, one can set to be the discrete measure on defined by the formula
One can also view this complex measure as a coefficient
of a projection-valued measure on , defined by setting
Finally, one can view as unitarily equivalent to a multiplication operator on , where is the real-valued function , and the intertwining map is given by
so that .
It is an important fact in analysis that many of these above assertions extend to operators on an infinite-dimensional Hilbert space , so long as one one is careful about what “self-adjoint operator” means; these facts are collectively referred to as the spectral theorem. For instance, it turns out that most of the above claims have analogues for bounded self-adjoint operators . However, in the theory of partial differential equations, one often needs to apply the spectral theorem to unbounded, densely defined linear operators , which (initially, at least), are only defined on a dense subspace of the Hilbert space . A very typical situation arises when is the square-integrable functions on some domain or manifold (which may have a boundary or be otherwise “incomplete”), and are the smooth compactly supported functions on , and is some linear differential operator. It is then of interest to obtain the spectral theorem for such operators, so that one build operators such as or to solve equations such as (1), (2), (3), (4).
for all . These hypotheses are sufficient in the case when is bounded, and in particular when is finite dimensional. However, as it turns out, for unbounded operators these conditions are not, by themselves, enough to obtain a good spectral theory. For instance, one consequence of the spectral theorem should be that the resolvents are well-defined for any strictly complex , which by duality implies that the image of should be dense in . However, this can fail if one just assumes symmetry, or symmetry and positive definiteness. A well-known example occurs when is the Hilbert space , is the space of test functions, and is the one-dimensional Laplacian . Then is symmetric and positive, but the operator does not have dense image for any complex , since
for all test functions , as can be seen from a routine integration by parts. As such, the resolvent map is not everywhere uniquely defined. There is also a lack of uniqueness for the wave, heat, and Schrödinger equations for this operator (note that there are no spatial boundary conditions specified in these equations).
Another example occurs when , , is the momentum operator . Then the resolvent can be uniquely defined for in the upper half-plane, but not in the lower half-plane, due to the obstruction
for all test functions (note that the function lies in when is in the lower half-plane). For related reasons, the translation operators have a problem with either uniqueness or existence (depending on whether is positive or negative), due to the unspecified boundary behaviour at the origin.
The key property that lets one avoid this bad behaviour is that of essential self-adjointness. Once is essentially self-adjoint, then spectral theorem becomes applicable again, leading to all the expected behaviour (e.g. existence and uniqueness for the various PDE given above).
Unfortunately, the concept of essential self-adjointness is defined rather abstractly, and is difficult to verify directly; unlike the symmetry condition (5) or the positive condition (6), it is not a “local” condition that can be easily verified just by testing on various inputs, but is instead a more “global” condition. In practice, to verify this property, one needs to invoke one of a number of a partial converses to the spectral theorem, which roughly speaking asserts that if at least one of the expected consequences of the spectral theorem is true for some symmetric densely defined operator , then is self-adjoint. Examples of “expected consequences” include:
- Existence of resolvents (or equivalently, dense image for );
- Existence of a contractive heat propagator semigroup (in the positive case);
- Existence of a unitary Schrödinger propagator group ;
- Existence of a unitary wave propagator group (in the positive case);
- Existence of a “reasonable” functional calculus.
- Unitary equivalence with a multiplication operator.
Thus, to actually verify essential self-adjointness of a differential operator, one typically has to first solve a PDE (such as the wave, Schrödinger, heat, or Helmholtz equation) by some non-spectral method (e.g. by a contraction mapping argument, or a perturbation argument based on an operator already known to be essentially self-adjoint). Once one can solve one of the PDEs, then one can apply one of the known converse spectral theorems to obtain essential self-adjointness, and then by the forward spectral theorem one can then solve all the other PDEs as well. But there is no getting out of that first step, which requires some input (typically of an ODE, PDE, or geometric nature) that is external to what abstract spectral theory can provide. For instance, if one wants to establish essential self-adjointness of the Laplace-Beltrami operator on a smooth Riemannian manifold (using as the domain space), it turns out (under reasonable regularity hypotheses) that essential self-adjointness is equivalent to geodesic completeness of the manifold, which is a global ODE condition rather than a local one: one needs geodesics to continue indefinitely in order to be able to (unitarily) solve PDEs such as the wave equation, which in turn leads to essential self-adjointness. (Note that the domains and in the previous examples were not geodesically complete.) For this reason, essential self-adjointness of a differential operator is sometimes referred to as quantum completeness (with the completeness of the associated Hamilton-Jacobi flow then being the analogous classical completeness).
In these notes, I wanted to record (mostly for my own benefit) the forward and converse spectral theorems, and to verify essential self-adjointness of the Laplace-Beltrami operator on geodesically complete manifolds. This is extremely standard analysis (covered, for instance, in the texts of Reed and Simon), but I wanted to write it down myself to make sure that I really understood this foundational material properly.
In the previous set of notes we saw how a representation-theoretic property of groups, namely Kazhdan’s property (T), could be used to demonstrate expansion in Cayley graphs. In this set of notes we discuss a different representation-theoretic property of groups, namely quasirandomness, which is also useful for demonstrating expansion in Cayley graphs, though in a somewhat different way to property (T). For instance, whereas property (T), being qualitative in nature, is only interesting for infinite groups such as or , and only creates Cayley graphs after passing to a finite quotient, quasirandomness is a quantitative property which is directly applicable to finite groups, and is able to deduce expansion in a Cayley graph, provided that random walks in that graph are known to become sufficiently “flat” in a certain sense.
The definition of quasirandomness is easy enough to state:
Definition 1 (Quasirandom groups) Let be a finite group, and let . We say that is -quasirandom if all non-trivial unitary representations of have dimension at least . (Recall a representation is trivial if is the identity for all .)
Exercise 1 Let be a finite group, and let . A unitary representation is said to be irreducible if has no -invariant subspaces other than and . Show that is -quasirandom if and only if every non-trivial irreducible representation of has dimension at least .
Remark 1 The terminology “quasirandom group” was introduced explicitly (though with slightly different notational conventions) by Gowers in 2008 in his detailed study of the concept; the name arises because dense Cayley graphs in quasirandom groups are quasirandom graphs in the sense of Chung, Graham, and Wilson, as we shall see below. This property had already been used implicitly to construct expander graphs by Sarnak and Xue in 1991, and more recently by Gamburd in 2002 and by Bourgain and Gamburd in 2008. One can of course define quasirandomness for more general locally compact groups than the finite ones, but we will only need this concept in the finite case. (A paper of Kunze and Stein from 1960, for instance, exploits the quasirandomness properties of the locally compact group to obtain mixing estimates in that group.)
Quasirandomness behaves fairly well with respect to quotients and short exact sequences:
- (i) If is -quasirandom, show that is -quasirandom also. (Equivalently: any quotient of a -quasirandom finite group is again a -quasirandom finite group.)
- (ii) Conversely, if and are both -quasirandom, show that is -quasirandom also. (In particular, the direct or semidirect product of two -quasirandom finite groups is again a -quasirandom finite group.)
Informally, we will call quasirandom if it is -quasirandom for some “large” , though the precise meaning of “large” will depend on context. For applications to expansion in Cayley graphs, “large” will mean “ for some constant independent of the size of “, but other regimes of are certainly of interest.
The way we have set things up, the trivial group is infinitely quasirandom (i.e. it is -quasirandom for every ). This is however a degenerate case and will not be discussed further here. In the non-trivial case, a finite group can only be quasirandom if it is large and has no large subgroups:
- (i) Show that if is non-trivial, then . (Hint: use the mean zero component of the regular representation .) In particular, non-trivial finite groups cannot be infinitely quasirandom.
- (ii) Show that any proper subgroup of has index . (Hint: use the mean zero component of the quasiregular representation.)
The following exercise shows that quasirandom groups have to be quite non-abelian, and in particular perfect:
- (i) If is abelian and non-trivial, show that is not -quasirandom. (Hint: use Fourier analysis or the classification of finite abelian groups.)
- (ii) Show that is -quasirandom if and only if it is perfect, i.e. the commutator group is equal to . (Equivalently, is -quasirandom if and only if it has no non-trivial abelian quotients.)
Later on we shall see that there is a converse to the above two exercises; any non-trivial perfect finite group with no large subgroups will be quasirandom.
Exercise 5 Let be a finite -quasirandom group. Show that for any subgroup of , is -quasirandom, where is the index of in . (Hint: use induced representations.)
Now we give an example of a more quasirandom group.
This should be compared with the cardinality of the special linear group, which is easily computed to be .
Proof: We may of course take to be odd. Suppose for contradiction that we have a non-trivial representation on a unitary group of some dimension with . Set to be the group element
and suppose first that is non-trivial. Since , we have ; thus all the eigenvalues of are roots of unity. On the other hand, by conjugating by diagonal matrices in , we see that is conjugate to (and hence conjugate to ) whenever is a quadratic residue mod . As such, the eigenvalues of must be permuted by the operation for any quadratic residue mod . Since has at least one non-trivial eigenvalue, and there are distinct quadratic residues, we conclude that has at least distinct eigenvalues. But is a matrix with , a contradiction. Thus lies in the kernel of . By conjugation, we then see that this kernel contains all unipotent matrices. But these matrices generate (see exercise below), and so is trivial, a contradiction.
Exercise 6 Show that for any prime , the unipotent matrices
for ranging over generate as a group.
Exercise 7 Let be a finite group, and let . If is generated by a collection of -quasirandom subgroups, show that is itself -quasirandom.
Exercise 8 Show that is -quasirandom for any and any prime . (This is not sharp; the optimal bound here is , which follows from the results of Landazuri and Seitz.)
Remark 2 One can ask whether the bound in Lemma 2 is sharp, assuming of course that is odd. Noting that acts linearly on the plane , we see that it also acts projectively on the projective line , which has elements. Thus acts via the quasiregular representation on the -dimensional space , and also on the -dimensional subspace ; this latter representation (known as the Steinberg representation) is irreducible. This shows that the bound cannot be improved beyond . More generally, given any character , acts on the -dimensional space of functions that obey the twisted dilation invariance for all and ; these are known as the principal series representations. When is the trivial character, this is the quasiregular representation discussed earlier. For most other characters, this is an irreducible representation, but it turns out that when is the quadratic representation (thus taking values in while being non-trivial), the principal series representation splits into the direct sum of two -dimensional representations, which comes very close to matching the bound in Lemma 2. There is a parallel series of representations to the principal series (known as the discrete series) which is more complicated to describe (roughly speaking, one has to embed in a quadratic extension and then use a rotated version of the above construction, to change a split torus into a non-split torus), but can generate irreducible representations of dimension , showing that the bound in Lemma 2 is in fact exactly sharp. These constructions can be generalised to arbitrary finite groups of Lie type using Deligne-Luzstig theory, but this is beyond the scope of this course (and of my own knowledge in the subject).
Exercise 9 Let be an odd prime. Show that for any , the alternating group is -quasirandom. (Hint: show that all cycles of order in are conjugate to each other in (and not just in ); in particular, a cycle is conjugate to its power for all . Also, as , is simple, and so the cycles of order generate the entire group.)
Remark 3 By using more precise information on the representations of the alternating group (using the theory of Specht modules and Young tableaux), one can show the slightly sharper statement that is -quasirandom for (but is only -quasirandom for due to icosahedral symmetry, and -quasirandom for due to lack of perfectness). Using Exercise 3 with the index subgroup , we see that the bound cannot be improved. Thus, (for large ) is not as quasirandom as the special linear groups (for large and bounded), because in the latter case the quasirandomness is as strong as a power of the size of the group, whereas in the former case it is only logarithmic in size.
If one replaces the alternating group with the slightly larger symmetric group , then quasirandomness is destroyed (since , having the abelian quotient , is not perfect); indeed, is -quasirandom and no better.
Remark 4 Thanks to the monumental achievement of the classification of finite simple groups, we know that apart from a finite number (26, to be precise) of sporadic exceptions, all finite simple groups (up to isomorphism) are either a cyclic group , an alternating group , or is a finite simple group of Lie type such as . (We will define the concept of a finite simple group of Lie type more precisely in later notes, but suffice to say for now that such groups are constructed from reductive algebraic groups, for instance is constructed from in characteristic .) In the case of finite simple groups of Lie type with bounded rank , it is known from the work of Landazuri and Seitz that such groups are -quasirandom for some depending only on the rank. On the other hand, by the previous remark, the large alternating groups do not have this property, and one can show that the finite simple groups of Lie type with large rank also do not have this property. Thus, we see using the classification that if a finite simple group is -quasirandom for some and is sufficiently large depending on , then is a finite simple group of Lie type with rank . It would be of interest to see if there was an alternate way to establish this fact that did not rely on the classification, as it may lead to an alternate approach to proving the classification (or perhaps a weakened version thereof).
A key reason why quasirandomness is desirable for the purposes of demonstrating expansion is that quasirandom groups happen to be rapidly mixing at large scales, as we shall see below the fold. As such, quasirandomness is an important tool for demonstrating expansion in Cayley graphs, though because expansion is a phenomenon that must hold at all scales, one needs to supplement quasirandomness with some additional input that creates mixing at small or medium scales also before one can deduce expansion. As an example of this technique of combining quasirandomness with mixing at small and medium scales, we present a proof (due to Sarnak-Xue, and simplified by Gamburd) of a weak version of the famous “3/16 theorem” of Selberg on the least non-trivial eigenvalue of the Laplacian on a modular curve, which among other things can be used to construct a family of expander Cayley graphs in (compare this with the property (T)-based methods in the previous notes, which could construct expander Cayley graphs in for any fixed ).
Van Vu and I have just uploaded to the arXiv our short survey article, “Random matrices: The Four Moment Theorem for Wigner ensembles“, submitted to the MSRI book series, as part of the proceedings on the MSRI semester program on random matrix theory from last year. This is a highly condensed version (at 17 pages) of a much longer survey (currently at about 48 pages, though not completely finished) that we are currently working on, devoted to the recent advances in understanding the universality phenomenon for spectral statistics of Wigner matrices. In this abridged version of the survey, we focus on a key tool in the subject, namely the Four Moment Theorem which roughly speaking asserts that the statistics of a Wigner matrix depend only on the first four moments of the entries. We give a sketch of proof of this theorem, and two sample applications: a central limit theorem for individual eigenvalues of a Wigner matrix (extending a result of Gustavsson in the case of GUE), and the verification of a conjecture of Wigner, Dyson, and Mehta on the universality of the asymptotic k-point correlation functions even for discrete ensembles (provided that we interpret convergence in the vague topology sense).
For reasons of space, this paper is very far from an exhaustive survey even of the narrow topic of universality for Wigner matrices, but should hopefully be an accessible entry point into the subject nevertheless.
In the previous set of notes we introduced the notion of expansion in arbitrary -regular graphs. For the rest of the course, we will now focus attention primarily to a special type of -regular graph, namely a Cayley graph.
Definition 1 (Cayley graph) Let be a group, and let be a finite subset of . We assume that is symmetric (thus whenever ) and does not contain the identity (this is to avoid loops). Then the (right-invariant) Cayley graph is defined to be the graph with vertex set and edge set , thus each vertex is connected to the elements for , and so is a -regular graph.
Example 1 The graph in Exercise 3 of Notes 1 is the Cayley graph on with generators .
Remark 1 We call the above Cayley graphs right-invariant because every right translation on is a graph automorphism of . This group of automorphisms acts transitively on the vertex set of the Cayley graph. One can thus view a Cayley graph as a homogeneous space of , as it “looks the same” from every vertex. One could of course also consider left-invariant Cayley graphs, in which is connected to rather than . However, the two such graphs are isomorphic using the inverse map , so we may without loss of generality restrict our attention throughout to left Cayley graphs.
Remark 2 For minor technical reasons, it will be convenient later on to allow to contain the identity and to come with multiplicity (i.e. it will be a multiset rather than a set). If one does so, of course, the resulting Cayley graph will now contain some loops and multiple edges.
For the purposes of building expander families, we would of course want the underlying group to be finite. However, it will be convenient at various times to “lift” a finite Cayley graph up to an infinite one, and so we permit to be infinite in our definition of a Cayley graph.
We will also sometimes consider a generalisation of a Cayley graph, known as a Schreier graph:
Definition 2 (Schreier graph) Let be a finite group that acts (on the left) on a space , thus there is a map from to such that and for all and . Let be a symmetric subset of which acts freely on in the sense that for all and , and for all distinct and . Then the Schreier graph is defined to be the graph with vertex set and edge set .
Example 2 Every Cayley graph is also a Schreier graph , using the obvious left-action of on itself. The -regular graphs formed from permutations that were studied in the previous set of notes is also a Schreier graph provided that for all distinct , with the underlying group being the permutation group (which acts on the vertex set in the obvious manner), and .
Exercise 1 If is an even integer, show that every -regular graph is a Schreier graph involving a set of generators of cardinality . (Hint: first show that every -regular graph can be decomposed into unions of cycles, each of which partition the vertex set, then use the previous example.
We return now to Cayley graphs. It is easy to characterise qualitative expansion properties of Cayley graphs:
Exercise 2 (Qualitative expansion) Let be a finite Cayley graph.
- (i) Show that is a one-sided -expander for for some if and only if generates .
- (ii) Show that is a two-sided -expander for for some if and only if generates , and furthermore intersects each index subgroup of .
We will however be interested in more quantitative expansion properties, in which the expansion constant is independent of the size of the Cayley graph, so that one can construct non-trivial expander families of Cayley graphs.
One can analyse the expansion of Cayley graphs in a number of ways. For instance, by taking the edge expansion viewpoint, one can study Cayley graphs combinatorially, using the product set operation
of subsets of .
Exercise 3 (Combinatorial description of expansion) Let be a family of finite -regular Cayley graphs. Show that is a one-sided expander family if and only if there is a constant independent of such that for all sufficiently large and all subsets of with .
One can also give a combinatorial description of two-sided expansion, but it is more complicated and we will not use it here.
Exercise 4 (Abelian groups do not expand) Let be a family of finite -regular Cayley graphs, with the all abelian, and the generating . Show that are a one-sided expander family if and only if the Cayley graphs have bounded cardinality (i.e. ). (Hint: assume for contradiction that is a one-sided expander family with , and show by two different arguments that grows at least exponentially in and also at most polynomially in , giving the desired contradiction.)
The left-invariant nature of Cayley graphs also suggests that such graphs can be profitably analysed using some sort of Fourier analysis; as the underlying symmetry group is not necessarily abelian, one should use the Fourier analysis of non-abelian groups, which is better known as (unitary) representation theory. The Fourier-analytic nature of Cayley graphs can be highlighted by recalling the operation of convolution of two functions , defined by the formula
This convolution operation is bilinear and associative (at least when one imposes a suitable decay condition on the functions, such as compact support), but is not commutative unless is abelian. (If one is more algebraically minded, one can also identify (when is finite, at least) with the group algebra , in which case convolution is simply the multiplication operation in this algebra.) The adjacency operator on a Cayley graph can then be viewed as a convolution
whenever is orthogonal to the constant function .
We remark that the above spectral definition of expansion can be easily extended to symmetric sets which contain the identity or have multiplicity (i.e. are multisets). (We retain symmetry, though, in order to keep the operation of convolution by self-adjoint.) In particular, one can say (with some slight abuse of notation) that a set of elements of (possibly with repetition, and possibly with some elements equalling the identity) generates a one-sided or two-sided -expander if the associated symmetric probability density
We saw in the last set of notes that expansion can be characterised in terms of random walks. One can of course specialise this characterisation to the Cayley graph case:
Exercise 5 (Random walk description of expansion) Let be a family of finite -regular Cayley graphs, and let be the associated probability density functions. Let be a constant.
- Show that the are a two-sided expander family if and only if there exists a such that for all sufficiently large , one has for some , where denotes the convolution of copies of .
- Show that the are a one-sided expander family if and only if there exists a such that for all sufficiently large , one has for some .
In this set of notes, we will connect expansion of Cayley graphs to an important property of certain infinite groups, known as Kazhdan’s property (T) (or property (T) for short). In 1973, Margulis exploited this property to create the first known explicit and deterministic examples of expanding Cayley graphs. As it turns out, property (T) is somewhat overpowered for this purpose; in particular, we now know that there are many families of Cayley graphs for which the associated infinite group does not obey property (T) (or weaker variants of this property, such as property ). In later notes we will therefore turn to other methods of creating Cayley graphs that do not rely on property (T). Nevertheless, property (T) is of substantial intrinsic interest, and also has many connections to other parts of mathematics than the theory of expander graphs, so it is worth spending some time to discuss it here.
The material here is based in part on this recent text on property (T) by Bekka, de la Harpe, and Valette (available online here).
The objective of this course is to present a number of recent constructions of expander graphs, which are a type of sparse but “pseudorandom” graph of importance in computer science, the theory of random walks, geometric group theory, and in number theory. The subject of expander graphs and their applications is an immense one, and we will not possibly be able to cover it in full in this course. In particular, we will say almost nothing about the important applications of expander graphs to computer science, for instance in constructing good pseudorandom number generators, derandomising a probabilistic algorithm, constructing error correcting codes, or in building probabilistically checkable proofs. For such topics, I recommend the survey of Hoory-Linial-Wigderson. We will also only pass very lightly over the other applications of expander graphs, though if time permits I may discuss at the end of the course the application of expander graphs in finite groups such as to certain sieving problems in analytic number theory, following the work of Bourgain, Gamburd, and Sarnak.
Instead of focusing on applications, this course will concern itself much more with the task of constructing expander graphs. This is a surprisingly non-trivial problem. On one hand, an easy application of the probabilistic method shows that a randomly chosen (large, regular, bounded-degree) graph will be an expander graph with very high probability, so expander graphs are extremely abundant. On the other hand, in many applications, one wants an expander graph that is more deterministic in nature (requiring either no or very few random choices to build), and of a more specialised form. For the applications to number theory or geometric group theory, it is of particular interest to determine the expansion properties of a very symmetric type of graph, namely a Cayley graph; we will also occasionally work with the more general concept of a Schreier graph. It turns out that such questions are related to deep properties of various groups of Lie type (such as or ), such as Kazhdan’s property (T), the first nontrivial eigenvalue of a Laplacian on a symmetric space associated to , the quasirandomness of (as measured by the size of irreducible representations), and the product theory of subsets of . These properties are of intrinsic interest to many other fields of mathematics (e.g. ergodic theory, operator algebras, additive combinatorics, representation theory, finite group theory, number theory, etc.), and it is quite remarkable that a single problem – namely the construction of expander graphs – is so deeply connected with such a rich and diverse array of mathematical topics. (Perhaps this is because so many of these fields are all grappling with aspects of a single general problem in mathematics, namely when to determine whether a given mathematical object or process of interest “behaves pseudorandomly” or not, and how this is connected with the symmetry group of that object or process.)
(There are also other important constructions of expander graphs that are not related to Cayley or Schreier graphs, such as those graphs constructed by the zigzag product construction, but we will not discuss those types of graphs in this course, again referring the reader to the survey of Hoory, Linial, and Wigderson.)
Van Vu and I have just uploaded to the arXiv our paper A central limit theorem for the determinant of a Wigner matrix, submitted to Adv. Math.. It studies the asymptotic distribution of the determinant of a random Wigner matrix (such as a matrix drawn from the Gaussian Unitary Ensemble (GUE) or Gaussian Orthogonal Ensemble (GOE)).
Before we get to these results, let us first discuss the simpler problem of studying the determinant of a random iid matrix , such as a real gaussian matrix (where all entries are independently and identically distributed using the standard real normal distribution ), a complex gaussian matrix (where all entries are independently and identically distributed using the standard complex normal distribution , thus the real and imaginary parts are independent with law ), or the random sign matrix (in which all entries are independently and identically distributed according to the Bernoulli distribution (with a chance of either sign). More generally, one can consider a matrix in which all the entries are independently and identically distributed with mean zero and variance .
where ranges over the permutations of , and is the product
From the iid nature of the , we easily see that each has mean zero and variance one, and are pairwise uncorrelated as varies. We conclude that has mean zero and variance (an observation first made by Turán). In particular, from Chebyshev’s inequality we see that is typically of size .
It turns out, though, that this is not quite best possible. This is easiest to explain in the real gaussian case, by performing a computation first made by Goodman. In this case, is clearly symmetrical, so we can focus attention on the magnitude . We can interpret this quantity geometrically as the volume of an -dimensional parallelopiped whose generating vectors are independent real gaussian vectors in (i.e. their coefficients are iid with law ). Using the classical base-times-height formula, we thus have
where is the -dimensional linear subspace of spanned by (note that , having an absolutely continuous joint distribution, are almost surely linearly independent). Taking logarithms, we conclude
Now, we take advantage of a fundamental symmetry property of the Gaussian vector distribution, namely its invariance with respect to the orthogonal group . Because of this, we see that if we fix (and thus , the random variable has the same distribution as , or equivalently the distribution
where are iid copies of . As this distribution does not depend on the , we conclude that the law of is given by the sum of independent -variables:
A standard computation shows that each has mean and variance , and then a Taylor series (or Ito calculus) computation (using concentration of measure tools to control tails) shows that has mean and variance . As such, has mean and variance . Applying a suitable version of the central limit theorem, one obtains the asymptotic law
when is a real gaussian matrix; thus, for instance, the median value of is . At first glance, this appears to conflict with the second moment bound of Turán mentioned earlier, but once one recalls that has a second moment of , we see that the two facts are in fact perfectly consistent; the upper tail of the normal distribution in the exponent in (4) ends up dominating the second moment.
It turns out that the central limit theorem (3) is valid for any real iid matrix with mean zero, variance one, and an exponential decay condition on the entries; this was first claimed by Girko, though the arguments in that paper appear to be incomplete. Another proof of this result, with more quantitative bounds on the convergence rate has been recently obtained by Hoi Nguyen and Van Vu. The basic idea in these arguments is to express the sum in (2) in terms of a martingale and apply the martingale central limit theorem.
If one works with complex gaussian random matrices instead of real gaussian random matrices, the above computations change slightly (one has to replace the real distribution with the complex distribution, in which the are distributed according to the complex gaussian instead of the real one). At the end of the day, one ends up with the law
(but note that this new asymptotic is still consistent with Turán’s second moment calculation).
We can now turn to the results of our paper. Here, we replace the iid matrices by Wigner matrices , which are defined similarly but are constrained to be Hermitian (or real symmetric), thus for all . Model examples here include the Gaussian Unitary Ensemble (GUE), in which for and for , the Gaussian Orthogonal Ensemble (GOE), in which for and for , and the symmetric Bernoulli ensemble, in which for (with probability of either sign). In all cases, the upper triangular entries of the matrix are assumed to be jointly independent. For a more precise definition of the Wigner matrix ensembles we are considering, see the introduction to our paper.
The determinants of these matrices still have a Leibniz expansion. However, in the Wigner case, the mean and variance of the are slightly different, and what is worse, they are not all pairwise uncorrelated any more. For instance, the mean of is still usually zero, but equals in the exceptional case when is a perfect matching (i.e. the union of exactly -cycles, a possibility that can of course only happen when is even). As such, the mean still vanishes when is odd, but for even it is equal to
(the fraction here simply being the number of perfect matchings on vertices). Using Stirling’s formula, one then computes that is comparable to when is large and even. The second moment calculation is more complicated (and uses facts about the distribution of cycles in random permutations, mentioned in this previous post), but one can compute that is comparable to for GUE and for GOE. (The discrepancy here comes from the fact that in the GOE case, and can correlate when contains reversals of -cycles of for , but this does not happen in the GUE case.) For GUE, much more precise asymptotics for the moments of the determinant are known, starting from the work of Brezin and Hikami, though we do not need these more sophisticated computations here.
Our main results are then as follows.
Theorem 1 Let be a Wigner matrix.
- If is drawn from GUE, then
- If is drawn from GOE, then
- The previous two results also hold for more general Wigner matrices, assuming that the real and imaginary parts are independent, a finite moment condition is satisfied, and the entries match moments with those of GOE or GUE to fourth order. (See the paper for a more precise formulation of the result.)
Thus, we informally have
when is drawn from GUE, or from another Wigner ensemble matching GUE to fourth order (and obeying some additional minor technical hypotheses); and
when is drawn from GOE, or from another Wigner ensemble matching GOE to fourth order. Again, these asymptotic limiting distributions are consistent with the asymptotic behaviour for the second moments.
The extension from the GUE or GOE case to more general Wigner ensembles is a fairly routine application of the four moment theorem for Wigner matrices, although for various technical reasons we do not quite use the existing four moment theorems in the literature, but adapt them to the log determinant. The main idea is to express the log-determinant as an integral
of the Stieltjes transform
of . Strictly speaking, the integral in (7) is divergent at infinity (and also can be ill-behaved near zero), but this can be addressed by standard truncation and renormalisation arguments (combined with known facts about the least singular value of Wigner matrices), which we omit here. We then use a variant of the four moment theorem for the Stieltjes transform, as used by Erdos, Yau, and Yin (based on a previous four moment theorem for individual eigenvalues introduced by Van Vu and myself). The four moment theorem is proven by the now-standard Lindeberg exchange method, combined with the usual resolvent identities to control the behaviour of the resolvent (and hence the Stieltjes transform) with respect to modifying one or two entries, together with the delocalisation of eigenvector property (which in turn arises from local semicircle laws) to control the error terms.
Somewhat surprisingly (to us, at least), it turned out that it was the first part of the theorem (namely, the verification of the limiting law for the invariant ensembles GUE and GOE) that was more difficult than the extension to the Wigner case. Even in an ensemble as highly symmetric as GUE, the rows are no longer independent, and the formula (2) is basically useless for getting any non-trivial control on the log determinant. There is an explicit formula for the joint distribution of the eigenvalues of GUE (or GOE), which does eventually give the distribution of the cumulants of the log determinant, which then gives the required central limit theorem; but this is a lengthy computation, first performed by Delannay and Le Caer.
Following a suggestion of my colleague, Rowan Killip, we give an alternate proof of this central limit theorem in the GUE and GOE cases, by using a beautiful observation of Trotter, namely that the GUE or GOE ensemble can be conjugated into a tractable tridiagonal form. Let me state it just for GUE:
where the are jointly independent real random variables, with being standard real Gaussians, and each having a -distribution:
where are iid complex gaussians. Let be drawn from GUE. Then the joint eigenvalue distribution of is identical to the joint eigenvalue distribution of .
Proof: Let be drawn from GUE. We can write
where is drawn from the GUE, , and is a random gaussian vector with all entries iid with distribution . Furthermore, are jointly independent.
We now apply the tridiagonal matrix algorithm. Let , then has the -distribution indicated in the proposition. We then conjugate by a unitary matrix that preserves the final basis vector , and maps to . Then we have
where is conjugate to . Now we make the crucial observation: because is distributed according to GUE (which is a unitarily invariant ensemble), and is a unitary matrix independent of , is also distributed according to GUE, and remains independent of both and .
We continue this process, expanding as
Applying a further unitary conjugation that fixes but maps to , we may replace by while transforming to another GUE matrix independent of . Iterating this process, we eventually obtain a coupling of to by unitary conjugations, and the claim follows.
The determinant of a tridiagonal matrix is not quite as simple as the determinant of a triangular matrix (in which it is simply the product of the diagonal entries), but it is pretty close: the determinant of the above matrix is given by solving the recursion
with and . Thus, instead of the product of a sequence of independent scalar distributions as in the gaussian matrix case, the determinant of GUE ends up being controlled by the product of a sequence of independent matrices whose entries are given by gaussians and distributions. In this case, one cannot immediately take logarithms and hope to get something for which the martingale central limit theorem can be applied, but some ad hoc manipulation of these matrix products eventually does make this strategy work. (Roughly speaking, one has to work with the logarithm of the Frobenius norm of the matrix first.)
In the Winter quarter (starting on January 9), I will be teaching a graduate course on expansion in groups of Lie type. This course will focus on constructions of expanding Cayley graphs on finite groups of Lie type (such as the special linear groups , or their simple quotients , but also including more exotic “twisted” groups of Lie type, such as the Steinberg or Suzuki-Ree groups), including the “classical” constructions of Margulis and of Selberg, but also the more recent constructions of Bourgain-Gamburd and later authors (including some very recent work of Ben Green, Emmanuel Breuillard, Rob Guralnick, and myself which is nearing completion and which I plan to post about shortly). As usual, I plan to start posting lecture notes on this blog before the course begins.