You are currently browsing the category archive for the ‘math.SP’ category.

Perhaps the most important structural result about general large dense graphs is the Szemerédi regularity lemma. Here is a standard formulation of that lemma:

Lemma 1 (Szemerédi regularity lemma) Let {G = (V,E)} be a graph on {n} vertices, and let {\epsilon > 0}. Then there exists a partition {V = V_1 \cup \ldots \cup V_M} for some {M \leq M(\epsilon)} with the property that for all but at most {\epsilon M^2} of the pairs {1 \leq i \leq j \leq M}, the pair {V_i, V_j} is {\epsilon}-regular in the sense that

\displaystyle  | d( A, B ) - d( V_i, V_j ) | \leq \epsilon

whenever {A \subset V_i, B \subset V_j} are such that {|A| \geq \epsilon |V_i|} and {|B| \geq \epsilon |V_j|}, and {d(A,B) := |\{ (a,b) \in A \times B: \{a,b\} \in E \}|/|A| |B|} is the edge density between {A} and {B}. Furthermore, the partition is equitable in the sense that {||V_i| - |V_j|| \leq 1} for all {1 \leq i \leq j \leq M}.

There are many proofs of this lemma, which is actually not that difficult to establish; see for instance these previous blog posts for some examples. In this post I would like to record one further proof, based on the spectral decomposition of the adjacency matrix of {G}, which is essentially due to Frieze and Kannan. (Strictly speaking, Frieze and Kannan used a variant of this argument to establish a weaker form of the regularity lemma, but it is not difficult to modify the Frieze-Kannan argument to obtain the usual form of the regularity lemma instead. Some closely related spectral regularity lemmas were also developed by Szegedy.) I found recently (while speaking at the Abel conference in honour of this year’s laureate, Endre Szemerédi) that this particular argument is not as widely known among graph theory experts as I had thought, so I thought I would record it here.

For reasons of exposition, it is convenient to first establish a slightly weaker form of the lemma, in which one drops the hypothesis of equitability (but then has to weight the cells {V_i} by their magnitude when counting bad pairs):

Lemma 2 (Szemerédi regularity lemma, weakened variant) . Let {G = (V,E)} be a graph on {n} vertices, and let {\epsilon > 0}. Then there exists a partition {V = V_1 \cup \ldots \cup V_M} for some {M \leq M(\epsilon)} with the property that for all pairs {(i,j) \in \{1,\ldots,M\}^2} outside of an exceptional set {\Sigma}, one has

\displaystyle  | E(A,B) - d_{ij} |A| |B| | \ll \epsilon |V_i| |V_j| \ \ \ \ \ (1)

whenever {A \subset V_i, B \subset V_j}, for some real number {d_{ij}}, where {E(A,B) := |\{ (a,b) \in A \times B: \{a,b\} \in E \}|} is the number of edges between {A} and {B}. Furthermore, we have

\displaystyle  \sum_{(i,j) \in \Sigma} |V_i| |V_j| \ll \epsilon |V|^2. \ \ \ \ \ (2)

Let us now prove Lemma 2. We enumerate {V} (after relabeling) as {V = \{1,\ldots,n\}}. The adjacency matrix {T} of the graph {G} is then a self-adjoint {n \times n} matrix, and thus admits an eigenvalue decomposition

\displaystyle  T = \sum_{i=1}^n \lambda_i u_i^* u_i

for some orthonormal basis {u_1,\ldots,u_n} of {{\bf C}^n} and some eigenvalues {\lambda_1,\ldots,\lambda_n \in {\bf R}}, which we arrange in decreasing order of magnitude:

\displaystyle  |\lambda_1| \geq \ldots \geq |\lambda_n|.

We can compute the trace of {T^2} as

\displaystyle  \hbox{tr}(T^2) = \sum_{i=1}^n |\lambda_i|^2.

But we also have {\hbox{tr}(T^2) = 2|E| \leq n^2}, so

\displaystyle  \sum_{i=1}^n |\lambda_i|^2 \leq n^2. \ \ \ \ \ (3)

Among other things, this implies that

\displaystyle  |\lambda_i| \leq \frac{n}{\sqrt{i}} \ \ \ \ \ (4)

for all {i \geq 1}.

Let {F: {\bf N} \rightarrow {\bf N}} be a function (depending on {\epsilon}) to be chosen later, with {F(i) \geq i} for all {i}. Applying (3) and the pigeonhole principle (or the finite convergence principle, see this blog post), we can find {J \leq C(F,\epsilon)} such that

\displaystyle  \sum_{J \leq i < F(J)} |\lambda_i|^2 \leq \epsilon^3 n^2.

(Indeed, the bound on {J} is basically {F} iterated {1/\epsilon^3} times.) We can now split

\displaystyle  T = T_1 + T_2 + T_3, \ \ \ \ \ (5)

where {T_1} is the “structured” component

\displaystyle  T_1 := \sum_{i < J} \lambda_i u_i^* u_i, \ \ \ \ \ (6)

{T_2} is the “small” component

\displaystyle  T_2 := \sum_{J \leq i < F(J)} \lambda_i u_i^* u_i, \ \ \ \ \ (7)

and {T_3} is the “pseudorandom” component

\displaystyle  T_3 := \sum_{i > F(J)} \lambda_i u_i^* u_i. \ \ \ \ \ (8)

We now design a vertex partition to make {T_1} approximately constant on most cells. For each {i < J}, we partition {V} into {O_{J,\epsilon}(1)} cells on which {u_i} (viewed as a function from {V} to {{\bf C}}) only fluctuates by {O(\epsilon n^{-1/2} /J)}, plus an exceptional cell of size {O( \frac{\epsilon}{J} |V|)} coming from the values where {|u_i|} is excessively large (larger than {\sqrt{\frac{J}{\epsilon}} n^{-1/2}}). Combining all these partitions together, we can write {V = V_1 \cup \ldots \cup V_{M-1} \cup V_M} for some {M = O_{J,\epsilon}(1)}, where {V_M} has cardinality at most {\epsilon |V|}, and for all {1 \leq i \leq M-1}, the eigenfunctions {u_1,\ldots,u_{J-1}} all fluctuate by at most {O(\epsilon/J)}. In particular, if {1 \leq i,j \leq M-1}, then (by (4) and (6)) the entries of {T_1} fluctuate by at most {O(\epsilon)} on each block {V_i \times V_j}. If we let {d_{ij}} be the mean value of these entries on {V_i \times V_j}, we thus have

\displaystyle  1_B^* T_1 1_A = d_{ij} |A| |B| + O( \epsilon |V_i| |V_j| ) \ \ \ \ \ (9)

for any {1 \leq i,j \leq M-1} and {A \subset V_i, B \subset V_j}, where we view the indicator functions {1_A, 1_B} as column vectors of dimension {n}.

Next, we observe from (3) and (7) that {\hbox{tr} T_2^2 \leq \epsilon^3 n^2}. If we let {x_{ab}} be the coefficients of {T_2}, we thus have

\displaystyle  \sum_{a,b \in V} |x_{ab}|^2 \leq \epsilon^3 n^2

and hence by Markov’s inequality we have

\displaystyle  \sum_{a \in V_i} \sum_{b \in V_j} |x_{ab}|^2 \leq \epsilon^2 |V_i| |V_j| \ \ \ \ \ (10)

for all pairs {(i,j) \in \{1,\ldots,M-1\}^2} outside of an exceptional set {\Sigma_1} with

\displaystyle  \sum_{(i,j) \in \Sigma_1} |V_i| |V_j| \leq \epsilon |V|^2.

If {(i,j) \in \{1,\ldots,M-1\}^2} avoids {\Sigma_1}, we thus have

\displaystyle  1_B^* T_2 1_A = O( \epsilon |V_i| |V_j| ) \ \ \ \ \ (11)

for any {A \subset V_i, B \subset V_j}, by (10) and the Cauchy-Schwarz inequality.

Finally, to control {T_3} we see from (4) and (8) that {T_3} has an operator norm of at most {n/\sqrt{F(J)}}. In particular, we have from the Cauchy-Schwarz inequality that

\displaystyle  1_B^* T_3 1_A = O( n^2 / \sqrt{F(J)} ) \ \ \ \ \ (12)

for any {A, B \subset V}.

Let {\Sigma} be the set of all pairs {(i,j) \in \{1,\ldots,M\}^2} where either {(i,j) \in \Sigma_1}, {i = M}, {j=M}, or

\displaystyle  \min(|V_i|, |V_j|) \leq \frac{\epsilon}{M} n.

One easily verifies that (2) holds. If {(i,j) \in \{1,\ldots,M\}^2} is not in {\Sigma}, then by summing (9), (11), (12) and using (5), we see that

\displaystyle  1_B^* T 1_A = d_{ij} |A| |B| + O( \epsilon |V_i| |V_j| ) + O( n^2 / \sqrt{F(J)} ) \ \ \ \ \ (13)

for all {A \subset V_i, B \subset V_j}. The left-hand side is just {E(A,B)}. As {(i,j) \not \in \Sigma}, we have

\displaystyle  |V_i|, |V_j| > \frac{\epsilon}{M} n

and so (since {M = O_{J,\epsilon}(1)})

\displaystyle  n^2 / \sqrt{F(J)} \ll_{J,\epsilon} |V_i| |V_j| / \sqrt{F(J)}.

If we let {F} be a sufficiently rapidly growing function of {J} that depends on {\epsilon}, the second error term in (13) can be absorbed in the first, and (1) follows. This concludes the proof of Lemma 2.

To prove Lemma 1, one argues similarly (after modifying {\epsilon} as necessary), except that the initial partition {V_1,\ldots,V_M} of {V} constructed above needs to be subdivided further into equitable components (of size {\epsilon |V|/M+O(1)}), plus some remainder sets which can be aggregated into an exceptional component of size {O( \epsilon |V| )} (and which can then be redistributed amongst the other components to arrive at a truly equitable partition). We omit the details.

Remark 1 It is easy to verify that {F} needs to be growing exponentially in {J} in order for the above argument to work, which leads to tower-exponential bounds in the number of cells {M} in the partition. It was shown by Gowers that a tower-exponential bound is actually necessary here. By varying {F}, one basically obtains the strong regularity lemma first established by Alon, Fischer, Krivelevich, and Szegedy; in the opposite direction, setting {F(J) := J} essentially gives the weak regularity lemma of Frieze and Kannan.

Remark 2 If we specialise to a Cayley graph, in which {V = (V,+)} is a finite abelian group and {E = \{ (a,b): a-b \in A \}} for some (symmetric) subset {A} of {V}, then the eigenvectors are characters, and one essentially recovers the arithmetic regularity lemma of Green, in which the vertex partition classes {V_i} are given by Bohr sets (and one can then place additional regularity properties on these Bohr sets with some additional arguments). The components {T_1, T_2, T_3} of {T}, representing high, medium, and low eigenvalues of {T}, then become a decomposition associated to high, medium, and low Fourier coefficients of {A}.

Remark 3 The use of spectral theory here is parallel to the use of Fourier analysis to establish results such as Roth’s theorem on arithmetic progressions of length three. In analogy with this, one could view hypergraph regularity as being a sort of “higher order spectral theory”, although this spectral perspective is not as convenient as it is in the graph case.

Van Vu and I have just uploaded to the arXiv our paper “Random matrices: Universality of local spectral statistics of non-Hermitian matrices“. The main result of this paper is a “Four Moment Theorem” that establishes universality for local spectral statistics of non-Hermitian matrices with independent entries, under the additional hypotheses that the entries of the matrix decay exponentially, and match moments with either the real or complex gaussian ensemble to fourth order. This is the non-Hermitian analogue of a long string of recent results establishing universality of local statistics in the Hermitian case (as discussed for instance in this recent survey of Van and myself, and also in several other places).

The complex case is somewhat easier to describe. Given a (non-Hermitian) random matrix ensemble {M_n} of {n \times n} matrices, one can arbitrarily enumerate the (geometric) eigenvalues as {\lambda_1(M_n),\ldots,\lambda_n(M_n) \in {\bf C}}, and one can then define the {k}-point correlation functions {\rho^{(k)}_n: {\bf C}^k \rightarrow {\bf R}^+} to be the symmetric functions such that

\displaystyle  \int_{{\bf C}^k} F(z_1,\ldots,z_k) \rho^{(k)}_n(z_1,\ldots,z_k)\ dz_1 \ldots dz_k

\displaystyle  = {\bf E} \sum_{1 \leq i_1 < \ldots < i_k \leq n} F(\lambda_1(M_n),\ldots,\lambda_k(M_n)).

In the case when {M_n} is drawn from the complex gaussian ensemble, so that all the entries are independent complex gaussians of mean zero and variance one, it is a classical result of Ginibre that the asymptotics of {\rho^{(k)}_n} near some point {z \sqrt{n}} as {n \rightarrow \infty} and {z \in {\bf C}} is fixed are given by the determinantal rule

\displaystyle  \rho^{(k)}_n(z\sqrt{n} + w_1,\ldots,z\sqrt{n}+w_k) \rightarrow \hbox{det}( K(w_i,w_j) )_{1 \leq i,j \leq k} \ \ \ \ \ (1)

for {|z| < 1} and

\displaystyle  \rho^{(k)}_n(z\sqrt{n} + w_1,\ldots,z\sqrt{n}+w_k) \rightarrow 0

for {|z| > 1}, where {K} is the reproducing kernel

\displaystyle  K(z,w) := \frac{1}{\pi} e^{-|z|^2/2 - |w|^2/2 + z \overline{w}}.

(There is also an asymptotic for the boundary case {|z|=1}, but it is more complicated to state.) In particular, we see that {\rho^{(k)}_n(z \sqrt{n}) \rightarrow \frac{1}{\pi} 1_{|z| \leq 1}} for almost every {z}, which is a manifestation of the well-known circular law for these matrices; but the circular law only captures the macroscopic structure of the spectrum, whereas the asymptotic (1) describes the microscopic structure.

Our first main result is that the asymptotic (1) for {|z|<1} also holds (in the sense of vague convergence) when {M_n} is a matrix whose entries are independent with mean zero, variance one, exponentially decaying tails, and which all match moments with the complex gaussian to fourth order. (Actually we prove a stronger result than this which is valid for all bounded {z} and has more uniform bounds, but is a bit more technical to state.) An analogous result is also established for real gaussians (but now one has to separate the correlation function into components depending on how many eigenvalues are real and how many are strictly complex; also, the limiting distribution is more complicated, being described by Pfaffians rather than determinants). Among other things, this allows us to partially extend some known results on complex or real gaussian ensembles to more general ensembles. For instance, there is a central limit theorem of Rider which establishes a central limit theorem for the number of eigenvalues of a complex gaussian matrix in a mesoscopic disk; from our results, we can extend this central limit theorem to matrices that match the complex gaussian ensemble to fourth order, provided that the disk is small enough (for technical reasons, our error bounds are not strong enough to handle large disks). Similarly, extending some results of Edelman-Kostlan-Shub and of Forrester-Nagao, we can show that for a matrix matching the real gaussian ensemble to fourth order, the number of real eigenvalues is {\sqrt{\frac{2n}{\pi}} + O(n^{1/2-c})} with probability {1-O(n^{-c})} for some absolute constant {c>0}.

There are several steps involved in the proof. The first step is to apply the Girko Hermitisation trick to replace the problem of understanding the spectrum of a non-Hermitian matrix, with that of understanding the spectrum of various Hermitian matrices. The two identities that realise this trick are, firstly, Jensen’s formula

\displaystyle  \log |\det(M_n-z_0)| = - \sum_{1 \leq i \leq n: \lambda_i(M_n) \in B(z_0,r)} \log \frac{r}{|\lambda_i(M_n)-z_0|}

\displaystyle + \frac{1}{2\pi} \int_0^{2\pi} \log |\det(M_n-z_0-re^{i\theta})|\ d\theta

that relates the local distribution of eigenvalues to the log-determinants {\log |\det(M_n-z_0)|}, and secondly the elementary identity

\displaystyle  \log |\det(M_n - z)| = \frac{1}{2} \log|\det W_{n,z}| + \frac{1}{2} n \log n

that relates the log-determinants of {M_n-z} to the log-determinants of the Hermitian matrices

\displaystyle  W_{n,z} := \frac{1}{\sqrt{n}} \begin{pmatrix} 0 & M_n -z \\ (M_n-z)^* & 0 \end{pmatrix}.

The main difficulty is then to obtain concentration and universality results for the Hermitian log-determinants {\log|\det W_{n,z}|}. This turns out to be a task that is analogous to the task of obtaining concentration for Wigner matrices (as we did in this recent paper), as well as central limit theorems for log-determinants of Wigner matrices (as we did in this other recent paper). In both of these papers, the main idea was to use the Four Moment Theorem for Wigner matrices (which can now be proven relatively easily by a combination of the local semi-circular law and resolvent swapping methods), combined with (in the latter paper) a central limit theorem for the gaussian unitary ensemble (GUE). This latter task was achieved by using the convenient Trotter normal form to tridiagonalise a GUE matrix, which has the effect of revealing the determinant of that matrix as the solution to a certain linear stochastic difference equation, and one can analyse the distribution of that solution via such tools as the martingale central limit theorem.

The matrices {W_{n,z}} are somewhat more complicated than Wigner matrices (for instance, the semi-circular law must be replaced by a distorted Marchenko-Pastur law), but the same general strategy works to obtain concentration and universality for their log-determinants. The main new difficulty that arises is that the analogue of the Trotter norm for gaussian random matrices is not tridiagonal, but rather Hessenberg (i.e. upper-triangular except for the lower diagonal). This ultimately has the effect of expressing the relevant determinant as the solution to a nonlinear stochastic difference equation, which is a bit trickier to solve for. Fortunately, it turns out that one only needs good lower bounds on the solution, as one can use the second moment method to upper bound the determinant and hence the log-determinant (following a classical computation of Turan). This simplifies the analysis on the equation somewhat.

While this result is the first local universality result in the category of random matrices with independent entries, there are still two limitations to the result which one would like to remove. The first is the moment matching hypotheses on the matrix. Very recently, one of the ingredients of our paper, namely the local circular law, was proved without moment matching hypotheses by Bourgade, Yau, and Yin (provided one stays away from the edge of the spectrum); however, as of this time of writing the other main ingredient – the universality of the log-determinant – still requires moment matching. (The standard tool for obtaining universality without moment matching hypotheses is the heat flow method (and more specifically, the local relaxation flow method), but the analogue of Dyson Brownian motion in the non-Hermitian setting appears to be somewhat intractible, being a coupled flow on both the eigenvalues and eigenvectors rather than just on the eigenvalues alone.)

I’ve just uploaded to the arXiv my paper The asymptotic distribution of a single eigenvalue gap of a Wigner matrix, submitted to Probability Theory and Related Fields. This paper (like several of my previous papers) is concerned with the asymptotic distribution of the eigenvalues {\lambda_1(M_n) \leq \ldots \leq \lambda_n(M_n)} of a random Wigner matrix {M_n} in the limit {n \rightarrow \infty}, with a particular focus on matrices drawn from the Gaussian Unitary Ensemble (GUE). This paper is focused on the bulk of the spectrum, i.e. to eigenvalues {\lambda_i(M_n)} with {\delta n \leq i \leq (1-\delta) n} for some fixed {\delta>0}.

The location of an individual eigenvalue {\lambda_i(M_n)} is by now quite well understood. If we normalise the entries of the matrix {M_n} to have mean zero and variance {1}, then in the asymptotic limit {n \rightarrow \infty}, the Wigner semicircle law tells us that with probability {1-o(1)} one has

\displaystyle  \lambda_i(M_n) =\sqrt{n} u + o(\sqrt{n})

where the classical location {u = u_{i/n} \in [-2,2]} of the eigenvalue is given by the formula

\displaystyle  \int_{-2}^{u} \rho_{sc}(x)\ dx = \frac{i}{n}

and the semicircular distribution {\rho_{sc}(x)\ dx} is given by the formula

\displaystyle  \rho_{sc}(x) := \frac{1}{2\pi} (4-x^2)_+^{1/2}.

Actually, one can improve the error term here from {o(\sqrt{n})} to {O( \log^{1/2+\epsilon} n)} for any {\epsilon>0} (see this previous recent paper of Van and myself for more discussion of these sorts of estimates, sometimes known as eigenvalue rigidity estimates).

From the semicircle law (and the fundamental theorem of calculus), one expects the {i^{th}} eigenvalue spacing {\lambda_{i+1}(M_n)-\lambda_i(M_n)} to have an average size of {\frac{1}{\sqrt{n} \rho_{sc}(u)}}. It is thus natural to introduce the normalised eigenvalue spacing

\displaystyle  X_i := \frac{\lambda_{i+1}(M_n) - \lambda_i(M_n)}{1/\sqrt{n} \rho_{sc}(u)}

and ask what the distribution of {X_i} is.

As mentioned previously, we will focus on the bulk case {\delta n \leq i\leq (1-\delta)n}, and begin with the model case when {M_n} is drawn from GUE. (In the edge case when {i} is close to {1} or to {n}, the distribution is given by the famous Tracy-Widom law.) Here, the distribution was almost (but as we shall see, not quite) worked out by Gaudin and Mehta. By using the theory of determinantal processes, they were able to compute a quantity closely related to {X_i}, namely the probability

\displaystyle  {\bf P}( N_{[\sqrt{n} u + \frac{x}{\sqrt{n} \rho_{sc}(u)}, \sqrt{n} u + \frac{y}{\sqrt{n} \rho_{sc}(u)}]} = 0) \ \ \ \ \ (1)

that an interval {[\sqrt{n} u + \frac{x}{\sqrt{n} \rho_{sc}(u)}, \sqrt{n} u + \frac{y}{\sqrt{n} \rho_{sc}(u)}]} near {\sqrt{n} u} of length comparable to the expected eigenvalue spacing {1/\sqrt{n} \rho_{sc}(u)} is devoid of eigenvalues. For {u} in the bulk and fixed {x,y}, they showed that this probability is equal to

\displaystyle  \det( 1 - 1_{[x,y]} P 1_{[x,y]} ) + o(1),

where {P} is the Dyson projection

\displaystyle  P f(x) = \int_{\bf R} \frac{\sin(\pi(x-y))}{\pi(x-y)} f(y)\ dy

to Fourier modes in {[-1/2,1/2]}, and {\det} is the Fredholm determinant. As shown by Jimbo, Miwa, Tetsuji, Mori, and Sato, this determinant can also be expressed in terms of a solution to a Painleve V ODE, though we will not need this fact here. In view of this asymptotic and some standard integration by parts manipulations, it becomes plausible to propose that {X_i} will be asymptotically distributed according to the Gaudin-Mehta distribution {p(x)\ dx}, where

\displaystyle  p(x) := \frac{d^2}{dx^2} \det( 1 - 1_{[0,x]} P 1_{[0,x]} ).

A reasonably accurate approximation for {p} is given by the Wigner surmise {p(x) \approx \frac{1}{2} \pi x e^{-\pi x^2/4}}, which was presciently proposed by Wigner as early as 1957; it is exact for {n=2} but not in the asymptotic limit {n \rightarrow \infty}.

Unfortunately, when one tries to make this argument rigorous, one finds that the asymptotic for (1) does not control a single gap {X_i}, but rather an ensemble of gaps {X_i}, where {i} is drawn from an interval {[i_0 - L, i_0 + L]} of some moderate size {L} (e.g. {L = \log n}); see for instance this paper of Deift, Kriecherbauer, McLaughlin, Venakides, and Zhou for a more precise formalisation of this statement (which is phrased slightly differently, in which one samples all gaps inside a fixed window of spectrum, rather than inside a fixed range of eigenvalue indices {i}). (This result is stated for GUE, but can be extended to other Wigner ensembles by the Four Moment Theorem, at least if one assumes a moment matching condition; see this previous paper with Van Vu for details. The moment condition can in fact be removed, as was done in this subsequent paper with Erdos, Ramirez, Schlein, Vu, and Yau.)

The problem is that when one specifies a given window of spectrum such as {[\sqrt{n} u + \frac{x}{\sqrt{n} \rho_{sc}(u)}, \sqrt{n} u + \frac{y}{\sqrt{n} \rho_{sc}(u)}]}, one cannot quite pin down in advance which eigenvalues {\lambda_i(M_n)} are going to lie to the left or right of this window; even with the strongest eigenvalue rigidity results available, there is a natural uncertainty of {\sqrt{\log n}} or so in the {i} index (as can be quantified quite precisely by this central limit theorem of Gustavsson).

The main difficulty here is that there could potentially be some strange coupling between the event (1) of an interval being devoid of eigenvalues, and the number {N_{(-\infty,\sqrt{n} u + \frac{x}{\sqrt{n} \rho_{sc}(u)})}(M_n)} of eigenvalues to the left of that interval. For instance, one could conceive of a possible scenario in which the interval in (1) tends to have many eigenvalues when {N_{(-\infty,\sqrt{n} u + \frac{x}{\sqrt{n} \rho_{sc}(u)})}(M_n)} is even, but very few when {N_{(-\infty,\sqrt{n} u + \frac{x}{\sqrt{n} \rho_{sc}(u)})}(M_n)} is odd. In this sort of situation, the gaps {X_i} may have different behaviour for even {i} than for odd {i}, and such anomalies would not be picked up in the averaged statistics in which {i} is allowed to range over some moderately large interval.

The main result of the current paper is that these anomalies do not actually occur, and that all of the eigenvalue gaps {X_i} in the bulk are asymptotically governed by the Gaudin-Mehta law without the need for averaging in the {i} parameter. Again, this is shown first for GUE, and then extended to other Wigner matrices obeying a matching moment condition using the Four Moment Theorem. (It is likely that the moment matching condition can be removed here, but I was unable to achieve this, despite all the recent advances in establishing universality of local spectral statistics for Wigner matrices, mainly because the universality results in the literature are more focused on specific energy levels {u} than on specific eigenvalue indices {i}. To make matters worse, in some cases universality is currently known only after an additional averaging in the energy parameter.)

The main task in the proof is to show that the random variable {N_{(-\infty,\sqrt{n} u + \frac{x}{\sqrt{n} \rho_{sc}(u)})}(M_n)} is largely decoupled from the event in (1) when {M_n} is drawn from GUE. To do this we use some of the theory of determinantal processes, and in particular the nice fact that when one conditions a determinantal process to the event that a certain spatial region (such as an interval) contains no points of the process, then one obtains a new determinantal process (with a kernel that is closely related to the original kernel). The main task is then to obtain a sufficiently good control on the distance between the new determinantal kernel and the old one, which we do by some functional-analytic considerations involving the manipulation of norms of operators (and specifically, the operator norm, Hilbert-Schmidt norm, and nuclear norm). Amusingly, the Fredholm alternative makes a key appearance, as I end up having to invert a compact perturbation of the identity at one point (specifically, I need to invert {1 - 1_{[x,y]}P1_{[x,y]}}, where {P} is the Dyson projection and {[x,y]} is an interval). As such, the bounds in my paper become ineffective, though I am sure that with more work one can invert this particular perturbation of the identity by hand, without the need to invoke the Fredholm alternative.

Van Vu and I have just uploaded to the arXiv our paper Random matrices: Sharp concentration of eigenvalues, submitted to the Electronic Journal of Probability. As with many of our previous papers, this paper is concerned with the distribution of the eigenvalues {\lambda_1(M_n) \leq \ldots \leq \lambda_n(M_n)} of a random Wigner matrix {M_n} (such as a matrix drawn from the Gaussian Unitary Ensemble (GUE) or Gaussian Orthogonal Ensemble (GOE)). To simplify the discussion we shall mostly restrict attention to the bulk of the spectrum, i.e. to eigenvalues {\lambda_i(M_n)} with {\delta n \leq i \leq (1-\delta) i n} for some fixed {\delta>0}, although analogues of most of the results below have also been obtained at the edge of the spectrum.

If we normalise the entries of the matrix {M_n} to have mean zero and variance {1/n}, then in the asymptotic limit {n \rightarrow \infty}, we have the Wigner semicircle law, which asserts that the eigenvalues are asymptotically distributed according to the semicircular distribution {\rho_{sc}(x)\ dx}, where

\displaystyle  \rho_{sc}(x) := \frac{1}{2\pi} (4-x^2)_+^{1/2}.

An essentially equivalent way of saying this is that for large {n}, we expect the {i^{th}} eigenvalue {\lambda_i(M_n)} of {M_n} to stay close to the classical location {\gamma_i \in [-2,2]}, defined by the formula

\displaystyle  \int_{-2}^{\gamma_i} \rho_{sc}(x)\ dx = \frac{i}{n}.

In particular, from the Wigner semicircle law it can be shown that asymptotically almost surely, one has

\displaystyle  \lambda_i(M_n) = \gamma_i + o(1) \ \ \ \ \ (1)

for all {1 \leq i \leq n}.

In the modern study of the spectrum of Wigner matrices (and in particular as a key tool in establishing universality results), it has become of interest to improve the error term in (1) as much as possible. A typical early result in this direction was by Bai, who used the Stieltjes transform method to obtain polynomial convergence rates of the shape {O(n^{-c})} for some absolute constant {c>0}; see also the subsequent papers of Alon-Krivelevich-Vu and of of Meckes, who were able to obtain such convergence rates (with exponentially high probability) by using concentration of measure tools, such as Talagrand’s inequality. On the other hand, in the case of the GUE ensemble it is known (by this paper of Gustavsson) that {\lambda_i(M_n)} has variance comparable to {\frac{\log n}{n^2}} in the bulk, so that the optimal error term in (1) should be about {O(\log^{1/2} n/n)}. (One may think that if one wanted bounds on (1) that were uniform in {i}, one would need to enlarge the error term further, but this does not appear to be the case, due to strong correlations between the {\lambda_i}; note for instance this recent result of Ben Arous and Bourgarde that the largest gap between eigenvalues in the bulk is typically of order {O(\log^{1/2} n/n)}.)

A significant advance in this direction was achieved by Erdos, Schlein, and Yau in a series of papers where they used a combination of Stieltjes transform and concentration of measure methods to obtain local semicircle laws which showed, among other things, that one had asymptotics of the form

\displaystyle  N(I) = (1+o(1)) \int_I \rho_{sc}(x)\ dx

with exponentially high probability for intervals {I} in the bulk that were as short as {n^{-1+\epsilon}} for some {\epsilon>0}, where {N(I)} is the number of eigenvalues. These asymptotics are consistent with a good error term in (1), and are already sufficient for many applications, but do not quite imply a strong concentration result for individual eigenvalues {\lambda_i} (basically because they do not preclude long-range or “secular” shifts in the spectrum that involve large blocks of eigenvalues at mesoscopic scales). Nevertheless, this was rectified in a subsequent paper of Erdos, Yau, and Yin, which roughly speaking obtained a bound of the form

\displaystyle  \lambda_i(M_n) = \gamma_i + O( \frac{\log^{O(\log\log n)} n}{n} )

in the bulk with exponentially high probability, for Wigner matrices obeying some exponential decay conditions on the entries. This was achieved by a rather delicate high moment calculation, in which the contribution of the diagonal entries of the resolvent (whose average forms the Stieltjes transform) was shown to mostly cancel each other out.

As the GUE computations show, this concentration result is sharp up to the quasilogarithmic factor {\log^{O(\log\log n)} n}. The main result of this paper is to improve the concentration result to one more in line with the GUE case, namely

\displaystyle  \lambda_i(M_n) = \gamma_i + O( \frac{\log^{O(1)} n}{n} )

with exponentially high probability (see the paper for a more precise statement of results). The one catch is that an additional hypothesis is required, namely that the entries of the Wigner matrix have vanishing third moment. We also obtain similar results for the edge of the spectrum (but with a different scaling).

Our arguments are rather different from those of Erdos, Yau, and Yin, and thus provide an alternate approach to establishing eigenvalue concentration. The main tool is the Lindeberg exchange strategy, which is also used to prove the Four Moment Theorem (although we do not directly invoke the Four Moment Theorem in our analysis). The main novelty is that this exchange strategy is now used to establish large deviation estimates (i.e. exponentially small tail probabilities) rather than universality of the limiting distribution. Roughly speaking, the basic point is as follows. The Lindeberg exchange strategy seeks to compare a function {F(X_1,\ldots,X_n)} of many independent random variables {X_1,\ldots,X_n} with the same function {F(Y_1,\ldots,Y_n)} of a different set of random variables (which match moments with the original set of variables to some order, such as to second or fourth order) by exchanging the random variables one at a time. Typically, one tries to upper bound expressions such as

\displaystyle  {\bf E} \phi(F(X_1,\ldots,X_n)) - \phi(F(X_1,\ldots,X_{n-1},Y_n))

for various smooth test functions {\phi}, by performing a Taylor expansion in the variable being swapped and taking advantage of the matching moment hypotheses. In previous implementations of this strategy, {\phi} was a bounded test function, which allowed one to get control of the bulk of the distribution of {F(X_1,\ldots,X_n)}, and in particular in controlling probabilities such as

\displaystyle  {\bf P}( a \leq F(X_1,\ldots,X_n) \leq b )

for various thresholds {a} and {b}, but did not give good control on the tail as the error terms tended to be polynomially decaying in {n} rather than exponentially decaying. However, it turns out that one can modify the exchange strategy to deal with moments such as

\displaystyle  {\bf E} (1 + F(X_1,\ldots,X_n)^2)^k

for various moderately large {k} (e.g. of size comparable to {\log n}), obtaining results such as

\displaystyle  {\bf E} (1 + F(Y_1,\ldots,Y_n)^2)^k = (1+o(1)) {\bf E} (1 + F(X_1,\ldots,X_n)^2)^k

after performing all the relevant exchanges. As such, one can then use large deviation estimates on {F(X_1,\ldots,X_n)} to deduce large deviation estimates on {F(Y_1,\ldots,Y_n)}.

In this paper we also take advantage of a simplification, first noted by Erdos, Yau, and Yin, that Four Moment Theorems become somewhat easier to prove if one works with resolvents {(M_n-z)^{-1}} (and the closely related Stieltjes transform {s(z) := \frac{1}{n} \hbox{tr}( (M_n-z)^{-1} )}) rather than with individual eigenvalues, as the Taylor expansion of resolvents are very simple (essentially being a Neumann series). The relationship between the Stieltjes transform and the location of individual eigenvalues can be seen by taking advantage of the identity

\displaystyle  \frac{\pi}{2} - \frac{\pi}{n} N((-\infty,E)) = \int_0^\infty \hbox{Re} s(E + i \eta)\ d\eta

for any energy level {E \in {\bf R}}, which can be verified from elementary calculus. (In practice, we would truncate {\eta} near zero and near infinity to avoid some divergences, but this is a minor technicality.) As such, a concentration result for the Stieltjes transform can be used to establish an analogous concentration result for the eigenvalue counting functions {N((-\infty,E))}, which in turn can be used to deduce concentration results for individual eigenvalues {\lambda_i(M_n)} by some basic combinatorial manipulations.

LLet {L: H \rightarrow H} be a self-adjoint operator on a finite-dimensional Hilbert space {H}. The behaviour of this operator can be completely described by the spectral theorem for finite-dimensional self-adjoint operators (i.e. Hermitian matrices, when viewed in coordinates), which provides a sequence {\lambda_1,\ldots,\lambda_n \in {\bf R}} of eigenvalues and an orthonormal basis {e_1,\ldots,e_n} of eigenfunctions such that {L e_i = \lambda_i e_i} for all {i=1,\ldots,n}. In particular, given any function {m: \sigma(L) \rightarrow {\bf C}} on the spectrum {\sigma(L) := \{ \lambda_1,\ldots,\lambda_n\}} of {L}, one can then define the linear operator {m(L): H \rightarrow H} by the formula

\displaystyle m(L) e_i := m(\lambda_i) e_i,

which then gives a functional calculus, in the sense that the map {m \mapsto m(L)} is a {C^*}-algebra isometric homomorphism from the algebra {BC(\sigma(L) \rightarrow {\bf C})} of bounded continuous functions from {\sigma(L)} to {{\bf C}}, to the algebra {B(H \rightarrow H)} of bounded linear operators on {H}. Thus, for instance, one can define heat operators {e^{-tL}} for {t>0}, Schrödinger operators {e^{itL}} for {t \in {\bf R}}, resolvents {\frac{1}{L-z}} for {z \not \in \sigma(L)}, and (if {L} is positive) wave operators {e^{it\sqrt{L}}} for {t \in {\bf R}}. These will be bounded operators (and, in the case of the Schrödinger and wave operators, unitary operators, and in the case of the heat operators with {L} positive, they will be contractions). Among other things, this functional calculus can then be used to solve differential equations such as the heat equation

\displaystyle u_t + Lu = 0; \quad u(0) = f \ \ \ \ \ (1)


the Schrödinger equation

\displaystyle u_t + iLu = 0; \quad u(0) = f \ \ \ \ \ (2)


the wave equation

\displaystyle u_{tt} + Lu = 0; \quad u(0) = f; \quad u_t(0) = g \ \ \ \ \ (3)


or the Helmholtz equation

\displaystyle (L-z) u = f. \ \ \ \ \ (4)


The functional calculus can also be associated to a spectral measure. Indeed, for any vectors {f, g \in H}, there is a complex measure {\mu_{f,g}} on {\sigma(L)} with the property that

\displaystyle \langle m(L) f, g \rangle_H = \int_{\sigma(L)} m(x) d\mu_{f,g}(x);

indeed, one can set {\mu_{f,g}} to be the discrete measure on {\sigma(L)} defined by the formula

\displaystyle \mu_{f,g}(E) := \sum_{i: \lambda_i \in E} \langle f, e_i \rangle_H \langle e_i, g \rangle_H.

One can also view this complex measure as a coefficient

\displaystyle \mu_{f,g} = \langle \mu f, g \rangle_H

of a projection-valued measure {\mu} on {\sigma(L)}, defined by setting

\displaystyle \mu(E) f := \sum_{i: \lambda_i \in E} \langle f, e_i \rangle_H e_i.

Finally, one can view {L} as unitarily equivalent to a multiplication operator {M: f \mapsto g f} on {\ell^2(\{1,\ldots,n\})}, where {g} is the real-valued function {g(i) := \lambda_i}, and the intertwining map {U: \ell^2(\{1,\ldots,n\}) \rightarrow H} is given by

\displaystyle U ( (c_i)_{i=1}^n ) := \sum_{i=1}^n c_i e_i,

so that {L = U M U^{-1}}.

It is an important fact in analysis that many of these above assertions extend to operators on an infinite-dimensional Hilbert space {H}, so long as one one is careful about what “self-adjoint operator” means; these facts are collectively referred to as the spectral theorem. For instance, it turns out that most of the above claims have analogues for bounded self-adjoint operators {L: H \rightarrow H}. However, in the theory of partial differential equations, one often needs to apply the spectral theorem to unbounded, densely defined linear operators {L: D \rightarrow H}, which (initially, at least), are only defined on a dense subspace {D} of the Hilbert space {H}. A very typical situation arises when {H = L^2(\Omega)} is the square-integrable functions on some domain or manifold {\Omega} (which may have a boundary or be otherwise “incomplete”), and {D = C^\infty_c(\Omega)} are the smooth compactly supported functions on {\Omega}, and {L} is some linear differential operator. It is then of interest to obtain the spectral theorem for such operators, so that one build operators such as {e^{-tL}, e^{itL}, \frac{1}{L-z}, e^{it\sqrt{L}}} or to solve equations such as (1), (2), (3), (4).

In order to do this, some necessary conditions on the densely defined operator {L: D \rightarrow H} must be imposed. The most obvious is that of symmetry, which asserts that

\displaystyle \langle Lf, g \rangle_H = \langle f, Lg \rangle_H \ \ \ \ \ (5)


for all {f, g \in D}. In some applications, one also wants to impose positive definiteness, which asserts that

\displaystyle \langle Lf, f \rangle_H \geq 0 \ \ \ \ \ (6)


for all {f \in D}. These hypotheses are sufficient in the case when {L} is bounded, and in particular when {H} is finite dimensional. However, as it turns out, for unbounded operators these conditions are not, by themselves, enough to obtain a good spectral theory. For instance, one consequence of the spectral theorem should be that the resolvents {(L-z)^{-1}} are well-defined for any strictly complex {z}, which by duality implies that the image of {L-z} should be dense in {H}. However, this can fail if one just assumes symmetry, or symmetry and positive definiteness. A well-known example occurs when {H} is the Hilbert space {H := L^2((0,1))}, {D := C^\infty_c((0,1))} is the space of test functions, and {L} is the one-dimensional Laplacian {L := -\frac{d^2}{dx^2}}. Then {L} is symmetric and positive, but the operator {L-k^2} does not have dense image for any complex {k}, since

\displaystyle \langle (L-\overline{k}^2) f, e^{\overline{k}x} \rangle_H = 0

for all test functions {f \in C^\infty_c((0,1))}, as can be seen from a routine integration by parts. As such, the resolvent map is not everywhere uniquely defined. There is also a lack of uniqueness for the wave, heat, and Schrödinger equations for this operator (note that there are no spatial boundary conditions specified in these equations).

Another example occurs when {H := L^2((0,+\infty))}, {D := C^\infty_c((0,+\infty))}, {L} is the momentum operator {L := i \frac{d}{dx}}. Then the resolvent {(L-z)^{-1}} can be uniquely defined for {z} in the upper half-plane, but not in the lower half-plane, due to the obstruction

\displaystyle \langle (L-z) f, e^{i \bar{z} x} \rangle_H = 0

for all test functions {f} (note that the function {e^{i\bar{z} x}} lies in {L^2((0,+\infty))} when {z} is in the lower half-plane). For related reasons, the translation operators {e^{itL}} have a problem with either uniqueness or existence (depending on whether {t} is positive or negative), due to the unspecified boundary behaviour at the origin.

The key property that lets one avoid this bad behaviour is that of essential self-adjointness. Once {L} is essentially self-adjoint, then spectral theorem becomes applicable again, leading to all the expected behaviour (e.g. existence and uniqueness for the various PDE given above).

Unfortunately, the concept of essential self-adjointness is defined rather abstractly, and is difficult to verify directly; unlike the symmetry condition (5) or the positive condition (6), it is not a “local” condition that can be easily verified just by testing {L} on various inputs, but is instead a more “global” condition. In practice, to verify this property, one needs to invoke one of a number of a partial converses to the spectral theorem, which roughly speaking asserts that if at least one of the expected consequences of the spectral theorem is true for some symmetric densely defined operator {L}, then {L} is self-adjoint. Examples of “expected consequences” include:

  • Existence of resolvents {(L-z)^{-1}} (or equivalently, dense image for {L-z});
  • Existence of a contractive heat propagator semigroup {e^{tL}} (in the positive case);
  • Existence of a unitary Schrödinger propagator group {e^{itL}};
  • Existence of a unitary wave propagator group {e^{it\sqrt{L}}} (in the positive case);
  • Existence of a “reasonable” functional calculus.
  • Unitary equivalence with a multiplication operator.

Thus, to actually verify essential self-adjointness of a differential operator, one typically has to first solve a PDE (such as the wave, Schrödinger, heat, or Helmholtz equation) by some non-spectral method (e.g. by a contraction mapping argument, or a perturbation argument based on an operator already known to be essentially self-adjoint). Once one can solve one of the PDEs, then one can apply one of the known converse spectral theorems to obtain essential self-adjointness, and then by the forward spectral theorem one can then solve all the other PDEs as well. But there is no getting out of that first step, which requires some input (typically of an ODE, PDE, or geometric nature) that is external to what abstract spectral theory can provide. For instance, if one wants to establish essential self-adjointness of the Laplace-Beltrami operator {L = -\Delta_g} on a smooth Riemannian manifold {(M,g)} (using {C^\infty_c(M)} as the domain space), it turns out (under reasonable regularity hypotheses) that essential self-adjointness is equivalent to geodesic completeness of the manifold, which is a global ODE condition rather than a local one: one needs geodesics to continue indefinitely in order to be able to (unitarily) solve PDEs such as the wave equation, which in turn leads to essential self-adjointness. (Note that the domains {(0,1)} and {(0,+\infty)} in the previous examples were not geodesically complete.) For this reason, essential self-adjointness of a differential operator is sometimes referred to as quantum completeness (with the completeness of the associated Hamilton-Jacobi flow then being the analogous classical completeness).

In these notes, I wanted to record (mostly for my own benefit) the forward and converse spectral theorems, and to verify essential self-adjointness of the Laplace-Beltrami operator on geodesically complete manifolds. This is extremely standard analysis (covered, for instance, in the texts of Reed and Simon), but I wanted to write it down myself to make sure that I really understood this foundational material properly.

Read the rest of this entry »

In the previous set of notes we saw how a representation-theoretic property of groups, namely Kazhdan’s property (T), could be used to demonstrate expansion in Cayley graphs. In this set of notes we discuss a different representation-theoretic property of groups, namely quasirandomness, which is also useful for demonstrating expansion in Cayley graphs, though in a somewhat different way to property (T). For instance, whereas property (T), being qualitative in nature, is only interesting for infinite groups such as {SL_d({\bf Z})} or {SL_d({\bf R})}, and only creates Cayley graphs after passing to a finite quotient, quasirandomness is a quantitative property which is directly applicable to finite groups, and is able to deduce expansion in a Cayley graph, provided that random walks in that graph are known to become sufficiently “flat” in a certain sense.

The definition of quasirandomness is easy enough to state:

Definition 1 (Quasirandom groups) Let {G} be a finite group, and let {D \geq 1}. We say that {G} is {D}-quasirandom if all non-trivial unitary representations {\rho: G \rightarrow U(H)} of {G} have dimension at least {D}. (Recall a representation is trivial if {\rho(g)} is the identity for all {g \in G}.)

Exercise 1 Let {G} be a finite group, and let {D \geq 1}. A unitary representation {\rho: G \rightarrow U(H)} is said to be irreducible if {H} has no {G}-invariant subspaces other than {\{0\}} and {H}. Show that {G} is {D}-quasirandom if and only if every non-trivial irreducible representation of {G} has dimension at least {D}.

Remark 1 The terminology “quasirandom group” was introduced explicitly (though with slightly different notational conventions) by Gowers in 2008 in his detailed study of the concept; the name arises because dense Cayley graphs in quasirandom groups are quasirandom graphs in the sense of Chung, Graham, and Wilson, as we shall see below. This property had already been used implicitly to construct expander graphs by Sarnak and Xue in 1991, and more recently by Gamburd in 2002 and by Bourgain and Gamburd in 2008. One can of course define quasirandomness for more general locally compact groups than the finite ones, but we will only need this concept in the finite case. (A paper of Kunze and Stein from 1960, for instance, exploits the quasirandomness properties of the locally compact group {SL_2({\bf R})} to obtain mixing estimates in that group.)

Quasirandomness behaves fairly well with respect to quotients and short exact sequences:

Exercise 2 Let {0 \rightarrow H \rightarrow G \rightarrow K \rightarrow 0} be a short exact sequence of finite groups {H,G,K}.

  • (i) If {G} is {D}-quasirandom, show that {K} is {D}-quasirandom also. (Equivalently: any quotient of a {D}-quasirandom finite group is again a {D}-quasirandom finite group.)
  • (ii) Conversely, if {H} and {K} are both {D}-quasirandom, show that {G} is {D}-quasirandom also. (In particular, the direct or semidirect product of two {D}-quasirandom finite groups is again a {D}-quasirandom finite group.)

Informally, we will call {G} quasirandom if it is {D}-quasirandom for some “large” {D}, though the precise meaning of “large” will depend on context. For applications to expansion in Cayley graphs, “large” will mean “{D \geq |G|^c} for some constant {c>0} independent of the size of {G}“, but other regimes of {D} are certainly of interest.

The way we have set things up, the trivial group {G = \{1\}} is infinitely quasirandom (i.e. it is {D}-quasirandom for every {D}). This is however a degenerate case and will not be discussed further here. In the non-trivial case, a finite group can only be quasirandom if it is large and has no large subgroups:

Exercise 3 Let {D \geq 1}, and let {G} be a finite {D}-quasirandom group.

  • (i) Show that if {G} is non-trivial, then {|G| \geq D+1}. (Hint: use the mean zero component {\tau\downharpoonright_{\ell^2(G)_0}} of the regular representation {\tau: G \rightarrow U(\ell^2(G))}.) In particular, non-trivial finite groups cannot be infinitely quasirandom.
  • (ii) Show that any proper subgroup {H} of {G} has index {[G:H] \geq D+1}. (Hint: use the mean zero component of the quasiregular representation.)

The following exercise shows that quasirandom groups have to be quite non-abelian, and in particular perfect:

Exercise 4 (Quasirandomness, abelianness, and perfection) Let {G} be a finite group.

  • (i) If {G} is abelian and non-trivial, show that {G} is not {2}-quasirandom. (Hint: use Fourier analysis or the classification of finite abelian groups.)
  • (ii) Show that {G} is {2}-quasirandom if and only if it is perfect, i.e. the commutator group {[G,G]} is equal to {G}. (Equivalently, {G} is {2}-quasirandom if and only if it has no non-trivial abelian quotients.)

Later on we shall see that there is a converse to the above two exercises; any non-trivial perfect finite group with no large subgroups will be quasirandom.

Exercise 5 Let {G} be a finite {D}-quasirandom group. Show that for any subgroup {G'} of {G}, {G'} is {D/[G:G']}-quasirandom, where {[G:G'] := |G|/|G'|} is the index of {G'} in {G}. (Hint: use induced representations.)

Now we give an example of a more quasirandom group.

Lemma 2 (Frobenius lemma) If {F_p} is a field of some prime order {p}, then {SL_2(F_p)} is {\frac{p-1}{2}}-quasirandom.

This should be compared with the cardinality {|SL_2(F_p)|} of the special linear group, which is easily computed to be {(p^2-1) \times p = p^3 - p}.

Proof: We may of course take {p} to be odd. Suppose for contradiction that we have a non-trivial representation {\rho: SL_2(F_p) \rightarrow U_d({\bf C})} on a unitary group of some dimension {d} with {d < \frac{p-1}{2}}. Set {a} to be the group element

\displaystyle a := \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix},

and suppose first that {\rho(a)} is non-trivial. Since {a^p=1}, we have {\rho(a)^p=1}; thus all the eigenvalues of {\rho(a)} are {p^{th}} roots of unity. On the other hand, by conjugating {a} by diagonal matrices in {SL_2(F_p)}, we see that {a} is conjugate to {a^m} (and hence {\rho(a)} conjugate to {\rho(a)^m}) whenever {m} is a quadratic residue mod {p}. As such, the eigenvalues of {\rho(a)} must be permuted by the operation {x \mapsto x^m} for any quadratic residue mod {p}. Since {\rho(a)} has at least one non-trivial eigenvalue, and there are {\frac{p-1}{2}} distinct quadratic residues, we conclude that {\rho(a)} has at least {\frac{p-1}{2}} distinct eigenvalues. But {\rho(a)} is a {d \times d} matrix with {d < \frac{p-1}{2}}, a contradiction. Thus {a} lies in the kernel of {\rho}. By conjugation, we then see that this kernel contains all unipotent matrices. But these matrices generate {SL_2(F_p)} (see exercise below), and so {\rho} is trivial, a contradiction. \Box

Exercise 6 Show that for any prime {p}, the unipotent matrices

\displaystyle \begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ t & 1 \end{pmatrix}

for {t} ranging over {F_p} generate {SL_2(F_p)} as a group.

Exercise 7 Let {G} be a finite group, and let {D \geq 1}. If {G} is generated by a collection {G_1,\ldots,G_k} of {D}-quasirandom subgroups, show that {G} is itself {D}-quasirandom.

Exercise 8 Show that {SL_d(F_p)} is {\frac{p-1}{2}}-quasirandom for any {d \geq 2} and any prime {p}. (This is not sharp; the optimal bound here is {\gg_d p^{d-1}}, which follows from the results of Landazuri and Seitz.)

As a corollary of the above results and Exercise 2, we see that the projective special linear group {PSL_d(F_p)} is also {\frac{p-1}{2}}-quasirandom.

Remark 2 One can ask whether the bound {\frac{p-1}{2}} in Lemma 2 is sharp, assuming of course that {p} is odd. Noting that {SL_2(F_p)} acts linearly on the plane {F_p^2}, we see that it also acts projectively on the projective line {PF_p^1 := (F_p^2 \backslash \{0\}) / F_p^\times}, which has {p+1} elements. Thus {SL_2(F_p)} acts via the quasiregular representation on the {p+1}-dimensional space {\ell^2(PF_p^1)}, and also on the {p}-dimensional subspace {\ell^2(PF_p^1)_0}; this latter representation (known as the Steinberg representation) is irreducible. This shows that the {\frac{p-1}{2}} bound cannot be improved beyond {p}. More generally, given any character {\chi: F_p^\times \rightarrow S^1}, {SL_2(F_p)} acts on the {p+1}-dimensional space {V_\chi} of functions {f \in \ell^2( F_p^2 \backslash \{0\} )} that obey the twisted dilation invariance {f(tx) = \chi(t) f(x)} for all {t \in F_p^\times} and {x \in F_p^2 \backslash \{0\}}; these are known as the principal series representations. When {\chi} is the trivial character, this is the quasiregular representation discussed earlier. For most other characters, this is an irreducible representation, but it turns out that when {\chi} is the quadratic representation (thus taking values in {\{-1,+1\}} while being non-trivial), the principal series representation splits into the direct sum of two {\frac{p+1}{2}}-dimensional representations, which comes very close to matching the bound in Lemma 2. There is a parallel series of representations to the principal series (known as the discrete series) which is more complicated to describe (roughly speaking, one has to embed {F_p} in a quadratic extension {F_{p^2}} and then use a rotated version of the above construction, to change a split torus into a non-split torus), but can generate irreducible representations of dimension {\frac{p-1}{2}}, showing that the bound in Lemma 2 is in fact exactly sharp. These constructions can be generalised to arbitrary finite groups of Lie type using Deligne-Luzstig theory, but this is beyond the scope of this course (and of my own knowledge in the subject).

Exercise 9 Let {p} be an odd prime. Show that for any {n \geq p+2}, the alternating group {A_n} is {p-1}-quasirandom. (Hint: show that all cycles of order {p} in {A_n} are conjugate to each other in {A_n} (and not just in {S_n}); in particular, a cycle is conjugate to its {j^{th}} power for all {j=1,\ldots,p-1}. Also, as {n \geq 5}, {A_n} is simple, and so the cycles of order {p} generate the entire group.)

Remark 3 By using more precise information on the representations of the alternating group (using the theory of Specht modules and Young tableaux), one can show the slightly sharper statement that {A_n} is {n-1}-quasirandom for {n \geq 6} (but is only {3}-quasirandom for {n=5} due to icosahedral symmetry, and {1}-quasirandom for {n \leq 4} due to lack of perfectness). Using Exercise 3 with the index {n} subgroup {A_{n-1}}, we see that the bound {n-1} cannot be improved. Thus, {A_n} (for large {n}) is not as quasirandom as the special linear groups {SL_d(F_p)} (for {p} large and {d} bounded), because in the latter case the quasirandomness is as strong as a power of the size of the group, whereas in the former case it is only logarithmic in size.

If one replaces the alternating group {A_n} with the slightly larger symmetric group {S_n}, then quasirandomness is destroyed (since {S_n}, having the abelian quotient {S_n/A_n}, is not perfect); indeed, {S_n} is {1}-quasirandom and no better.

Remark 4 Thanks to the monumental achievement of the classification of finite simple groups, we know that apart from a finite number (26, to be precise) of sporadic exceptions, all finite simple groups (up to isomorphism) are either a cyclic group {{\bf Z}/p{\bf Z}}, an alternating group {A_n}, or is a finite simple group of Lie type such as {PSL_d(F_p)}. (We will define the concept of a finite simple group of Lie type more precisely in later notes, but suffice to say for now that such groups are constructed from reductive algebraic groups, for instance {PSL_d(F_p)} is constructed from {SL_d} in characteristic {p}.) In the case of finite simple groups {G} of Lie type with bounded rank {r=O(1)}, it is known from the work of Landazuri and Seitz that such groups are {\gg |G|^c}-quasirandom for some {c>0} depending only on the rank. On the other hand, by the previous remark, the large alternating groups do not have this property, and one can show that the finite simple groups of Lie type with large rank also do not have this property. Thus, we see using the classification that if a finite simple group {G} is {|G|^c}-quasirandom for some {c>0} and {|G|} is sufficiently large depending on {c}, then {G} is a finite simple group of Lie type with rank {O_c(1)}. It would be of interest to see if there was an alternate way to establish this fact that did not rely on the classification, as it may lead to an alternate approach to proving the classification (or perhaps a weakened version thereof).

A key reason why quasirandomness is desirable for the purposes of demonstrating expansion is that quasirandom groups happen to be rapidly mixing at large scales, as we shall see below the fold. As such, quasirandomness is an important tool for demonstrating expansion in Cayley graphs, though because expansion is a phenomenon that must hold at all scales, one needs to supplement quasirandomness with some additional input that creates mixing at small or medium scales also before one can deduce expansion. As an example of this technique of combining quasirandomness with mixing at small and medium scales, we present a proof (due to Sarnak-Xue, and simplified by Gamburd) of a weak version of the famous “3/16 theorem” of Selberg on the least non-trivial eigenvalue of the Laplacian on a modular curve, which among other things can be used to construct a family of expander Cayley graphs in {SL_2({\bf Z}/N{\bf Z})} (compare this with the property (T)-based methods in the previous notes, which could construct expander Cayley graphs in {SL_d({\bf Z}/N{\bf Z})} for any fixed {d \geq 3}).

Read the rest of this entry »

Van Vu and I have just uploaded to the arXiv our short survey article, “Random matrices: The Four Moment Theorem for Wigner ensembles“, submitted to the MSRI book series, as part of the proceedings on the MSRI semester program on random matrix theory from last year.  This is a highly condensed version (at 17 pages) of a much longer survey (currently at about 48 pages, though not completely finished) that we are currently working on, devoted to the recent advances in understanding the universality phenomenon for spectral statistics of Wigner matrices.  In this abridged version of the survey, we focus on a key tool in the subject, namely the Four Moment Theorem which roughly speaking asserts that the statistics of a Wigner matrix depend only on the first four moments of the entries.  We give a sketch of proof of this theorem, and two sample applications: a central limit theorem for individual eigenvalues of a Wigner matrix (extending a result of Gustavsson in the case of GUE), and the verification of a conjecture of Wigner, Dyson, and Mehta on the universality of the asymptotic k-point correlation functions even for discrete ensembles (provided that we interpret convergence in the vague topology sense).

For reasons of space, this paper is very far from an exhaustive survey even of the narrow topic of universality for Wigner matrices, but should hopefully be an accessible entry point into the subject nevertheless.

In the previous set of notes we introduced the notion of expansion in arbitrary {k}-regular graphs. For the rest of the course, we will now focus attention primarily to a special type of {k}-regular graph, namely a Cayley graph.

Definition 1 (Cayley graph) Let {G = (G,\cdot)} be a group, and let {S} be a finite subset of {G}. We assume that {S} is symmetric (thus {s^{-1} \in S} whenever {s \in S}) and does not contain the identity {1} (this is to avoid loops). Then the (right-invariant) Cayley graph {Cay(G,S)} is defined to be the graph with vertex set {G} and edge set {\{ \{sx,x\}: x \in G, s \in S \}}, thus each vertex {x \in G} is connected to the {|S|} elements {sx} for {s \in S}, and so {Cay(G,S)} is a {|S|}-regular graph.

Example 1 The graph in Exercise 3 of Notes 1 is the Cayley graph on {{\bf Z}/N{\bf Z}} with generators {S = \{-1,+1\}}.

Remark 1 We call the above Cayley graphs right-invariant because every right translation {x\mapsto xg} on {G} is a graph automorphism of {Cay(G,S)}. This group of automorphisms acts transitively on the vertex set of the Cayley graph. One can thus view a Cayley graph as a homogeneous space of {G}, as it “looks the same” from every vertex. One could of course also consider left-invariant Cayley graphs, in which {x} is connected to {xs} rather than {sx}. However, the two such graphs are isomorphic using the inverse map {x \mapsto x^{-1}}, so we may without loss of generality restrict our attention throughout to left Cayley graphs.

Remark 2 For minor technical reasons, it will be convenient later on to allow {S} to contain the identity and to come with multiplicity (i.e. it will be a multiset rather than a set). If one does so, of course, the resulting Cayley graph will now contain some loops and multiple edges.

For the purposes of building expander families, we would of course want the underlying group {G} to be finite. However, it will be convenient at various times to “lift” a finite Cayley graph up to an infinite one, and so we permit {G} to be infinite in our definition of a Cayley graph.

We will also sometimes consider a generalisation of a Cayley graph, known as a Schreier graph:

Definition 2 (Schreier graph) Let {G} be a finite group that acts (on the left) on a space {X}, thus there is a map {(g,x) \mapsto gx} from {G \times X} to {X} such that {1x = x} and {(gh)x = g(hx)} for all {g,h \in G} and {x \in X}. Let {S} be a symmetric subset of {G} which acts freely on {X} in the sense that {sx \neq x} for all {s \in S} and {x \in X}, and {sx \neq s'x} for all distinct {s,s' \in S} and {x \in X}. Then the Schreier graph {Sch(X,S)} is defined to be the graph with vertex set {X} and edge set {\{ \{sx,x\}: x \in X, s \in S \}}.

Example 2 Every Cayley graph {Cay(G,S)} is also a Schreier graph {Sch(G,S)}, using the obvious left-action of {G} on itself. The {k}-regular graphs formed from {l} permutations {\pi_1,\ldots,\pi_l \in S_n} that were studied in the previous set of notes is also a Schreier graph provided that {\pi_i(v) \neq v, \pi_i^{-1}(v), \pi_j(v)} for all distinct {1 \leq i,j \leq l}, with the underlying group being the permutation group {S_n} (which acts on the vertex set {X = \{1,\ldots,n\}} in the obvious manner), and {S := \{\pi_1,\ldots,\pi_l,\pi_1^{-1},\ldots,\pi_l^{-1}\}}.

Exercise 1 If {k} is an even integer, show that every {k}-regular graph is a Schreier graph involving a set {S} of generators of cardinality {k/2}. (Hint: first show that every {k}-regular graph can be decomposed into {k/2} unions of cycles, each of which partition the vertex set, then use the previous example.

We return now to Cayley graphs. It is easy to characterise qualitative expansion properties of Cayley graphs:

Exercise 2 (Qualitative expansion) Let {Cay(G,S)} be a finite Cayley graph.

  • (i) Show that {Cay(G,S)} is a one-sided {\epsilon}-expander for {G} for some {\epsilon>0} if and only if {S} generates {G}.
  • (ii) Show that {Cay(G,S)} is a two-sided {\epsilon}-expander for {G} for some {\epsilon>0} if and only if {S} generates {G}, and furthermore {S} intersects each index {2} subgroup of {G}.

We will however be interested in more quantitative expansion properties, in which the expansion constant {\epsilon} is independent of the size of the Cayley graph, so that one can construct non-trivial expander families {Cay(G_n,S_n)} of Cayley graphs.

One can analyse the expansion of Cayley graphs in a number of ways. For instance, by taking the edge expansion viewpoint, one can study Cayley graphs combinatorially, using the product set operation

\displaystyle A \cdot B := \{ab: a \in A, b \in B \}

of subsets of {G}.

Exercise 3 (Combinatorial description of expansion) Let {Cay(G_n,S_n)} be a family of finite {k}-regular Cayley graphs. Show that {Cay(G_n,S_n)} is a one-sided expander family if and only if there is a constant {c>0} independent of {n} such that {|E_n \cup E_n S_n| \geq (1+c) |E_n|} for all sufficiently large {n} and all subsets {E_n} of {G_n} with {|E_n| \leq |G_n|/2}.

One can also give a combinatorial description of two-sided expansion, but it is more complicated and we will not use it here.

Exercise 4 (Abelian groups do not expand) Let {Cay(G_n,S_n)} be a family of finite {k}-regular Cayley graphs, with the {G_n} all abelian, and the {S_n} generating {G_n}. Show that {Cay(G_n,S_n)} are a one-sided expander family if and only if the Cayley graphs have bounded cardinality (i.e. {\sup_n |G_n| < \infty}). (Hint: assume for contradiction that {Cay(G_n,S_n)} is a one-sided expander family with {|G_n| \rightarrow \infty}, and show by two different arguments that {\sup_n |S_n^m|} grows at least exponentially in {m} and also at most polynomially in {m}, giving the desired contradiction.)

The left-invariant nature of Cayley graphs also suggests that such graphs can be profitably analysed using some sort of Fourier analysis; as the underlying symmetry group is not necessarily abelian, one should use the Fourier analysis of non-abelian groups, which is better known as (unitary) representation theory. The Fourier-analytic nature of Cayley graphs can be highlighted by recalling the operation of convolution of two functions {f, g \in \ell^2(G)}, defined by the formula

\displaystyle f * g(x) := \sum_{y \in G} f(y) g(y^{-1} x) = \sum_{y \in G} f(x y^{-1}) g(y).

This convolution operation is bilinear and associative (at least when one imposes a suitable decay condition on the functions, such as compact support), but is not commutative unless {G} is abelian. (If one is more algebraically minded, one can also identify {\ell^2(G)} (when {G} is finite, at least) with the group algebra {{\bf C} G}, in which case convolution is simply the multiplication operation in this algebra.) The adjacency operator {A} on a Cayley graph {Cay(G,S)} can then be viewed as a convolution

\displaystyle Af = |S| \mu * f,

where {\mu} is the probability density

\displaystyle \mu := \frac{1}{|S|} \sum_{s \in S} \delta_s \ \ \ \ \ (1)


where {\delta_s} is the Kronecker delta function on {s}. Using the spectral definition of expansion, we thus see that {Cay(G,S)} is a one-sided expander if and only if

\displaystyle \langle f, \mu*f \rangle \leq (1-\epsilon) \|f\|_{\ell^2(G)} \ \ \ \ \ (2)


whenever {f \in \ell^2(G)} is orthogonal to the constant function {1}, and is a two-sided expander if

\displaystyle \| \mu*f \|_{\ell^2(G)} \leq (1-\epsilon) \|f\|_{\ell^2(G)} \ \ \ \ \ (3)


whenever {f \in \ell^2(G)} is orthogonal to the constant function {1}.

We remark that the above spectral definition of expansion can be easily extended to symmetric sets {S} which contain the identity or have multiplicity (i.e. are multisets). (We retain symmetry, though, in order to keep the operation of convolution by {\mu} self-adjoint.) In particular, one can say (with some slight abuse of notation) that a set of elements {s_1,\ldots,s_l} of {G} (possibly with repetition, and possibly with some elements equalling the identity) generates a one-sided or two-sided {\epsilon}-expander if the associated symmetric probability density

\displaystyle \mu := \frac{1}{2l} \sum_{i=1}^l \delta_{s_i} + \delta_{s_i^{-1}}

obeys either (2) or (3).

We saw in the last set of notes that expansion can be characterised in terms of random walks. One can of course specialise this characterisation to the Cayley graph case:

Exercise 5 (Random walk description of expansion) Let {Cay(G_n,S_n)} be a family of finite {k}-regular Cayley graphs, and let {\mu_n} be the associated probability density functions. Let {A > 1/2} be a constant.

  • Show that the {Cay(G_n,S_n)} are a two-sided expander family if and only if there exists a {C>0} such that for all sufficiently large {n}, one has {\| \mu_n^{*m} - \frac{1}{|G_n|} \|_{\ell^2(G_n)} \leq \frac{1}{|G_n|^A}} for some {m \leq C \log |G_n|}, where {\mu_n^{*m} := \mu_n * \ldots * \mu_n} denotes the convolution of {m} copies of {\mu_n}.
  • Show that the {Cay(G_n,S_n)} are a one-sided expander family if and only if there exists a {C>0} such that for all sufficiently large {n}, one has {\| (\frac{1}{2} \delta_1 + \frac{1}{2} \mu_n)^{*m} - \frac{1}{|G_n|} \|_{\ell^2(G_n)} \leq \frac{1}{|G_n|^A}} for some {m \leq C \log |G_n|}.

In this set of notes, we will connect expansion of Cayley graphs to an important property of certain infinite groups, known as Kazhdan’s property (T) (or property (T) for short). In 1973, Margulis exploited this property to create the first known explicit and deterministic examples of expanding Cayley graphs. As it turns out, property (T) is somewhat overpowered for this purpose; in particular, we now know that there are many families of Cayley graphs for which the associated infinite group does not obey property (T) (or weaker variants of this property, such as property {\tau}). In later notes we will therefore turn to other methods of creating Cayley graphs that do not rely on property (T). Nevertheless, property (T) is of substantial intrinsic interest, and also has many connections to other parts of mathematics than the theory of expander graphs, so it is worth spending some time to discuss it here.

The material here is based in part on this recent text on property (T) by Bekka, de la Harpe, and Valette (available online here).

Read the rest of this entry »

The objective of this course is to present a number of recent constructions of expander graphs, which are a type of sparse but “pseudorandom” graph of importance in computer science, the theory of random walks, geometric group theory, and in number theory. The subject of expander graphs and their applications is an immense one, and we will not possibly be able to cover it in full in this course. In particular, we will say almost nothing about the important applications of expander graphs to computer science, for instance in constructing good pseudorandom number generators, derandomising a probabilistic algorithm, constructing error correcting codes, or in building probabilistically checkable proofs. For such topics, I recommend the survey of Hoory-Linial-Wigderson. We will also only pass very lightly over the other applications of expander graphs, though if time permits I may discuss at the end of the course the application of expander graphs in finite groups such as {SL_2(F_p)} to certain sieving problems in analytic number theory, following the work of Bourgain, Gamburd, and Sarnak.

Instead of focusing on applications, this course will concern itself much more with the task of constructing expander graphs. This is a surprisingly non-trivial problem. On one hand, an easy application of the probabilistic method shows that a randomly chosen (large, regular, bounded-degree) graph will be an expander graph with very high probability, so expander graphs are extremely abundant. On the other hand, in many applications, one wants an expander graph that is more deterministic in nature (requiring either no or very few random choices to build), and of a more specialised form. For the applications to number theory or geometric group theory, it is of particular interest to determine the expansion properties of a very symmetric type of graph, namely a Cayley graph; we will also occasionally work with the more general concept of a Schreier graph. It turns out that such questions are related to deep properties of various groups {G} of Lie type (such as {SL_2({\bf R})} or {SL_2({\bf Z})}), such as Kazhdan’s property (T), the first nontrivial eigenvalue of a Laplacian on a symmetric space {G/\Gamma} associated to {G}, the quasirandomness of {G} (as measured by the size of irreducible representations), and the product theory of subsets of {G}. These properties are of intrinsic interest to many other fields of mathematics (e.g. ergodic theory, operator algebras, additive combinatorics, representation theory, finite group theory, number theory, etc.), and it is quite remarkable that a single problem – namely the construction of expander graphs – is so deeply connected with such a rich and diverse array of mathematical topics. (Perhaps this is because so many of these fields are all grappling with aspects of a single general problem in mathematics, namely when to determine whether a given mathematical object or process of interest “behaves pseudorandomly” or not, and how this is connected with the symmetry group of that object or process.)

(There are also other important constructions of expander graphs that are not related to Cayley or Schreier graphs, such as those graphs constructed by the zigzag product construction, but we will not discuss those types of graphs in this course, again referring the reader to the survey of Hoory, Linial, and Wigderson.)

Read the rest of this entry »

Van Vu and I have just uploaded to the arXiv our paper A central limit theorem for the determinant of a Wigner matrix, submitted to Adv. Math.. It studies the asymptotic distribution of the determinant {\det M_n} of a random Wigner matrix (such as a matrix drawn from the Gaussian Unitary Ensemble (GUE) or Gaussian Orthogonal Ensemble (GOE)).

Before we get to these results, let us first discuss the simpler problem of studying the determinant {\det A_n} of a random iid matrix {A_n = (\zeta_{ij})_{1 \leq i,j \leq n}}, such as a real gaussian matrix (where all entries are independently and identically distributed using the standard real normal distribution {\zeta_{ij} \equiv N(0,1)_{\bf R}}), a complex gaussian matrix (where all entries are independently and identically distributed using the standard complex normal distribution {\zeta_{ij} \equiv N(0,1)_{\bf C}}, thus the real and imaginary parts are independent with law {N(0,1/2)_{\bf R}}), or the random sign matrix (in which all entries are independently and identically distributed according to the Bernoulli distribution {\zeta_{ij} \equiv \pm 1} (with a {1/2} chance of either sign). More generally, one can consider a matrix {A_n} in which all the entries {\zeta_{ij}} are independently and identically distributed with mean zero and variance {1}.

We can expand {\det A_n} using the Leibniz expansion

\displaystyle  \det A_n = \sum_{\sigma \in S_n} I_\sigma, \ \ \ \ \ (1)

where {\sigma: \{1,\ldots,n\} \rightarrow \{1,\ldots,n\}} ranges over the permutations of {\{1,\ldots,n\}}, and {I_\sigma} is the product

\displaystyle  I_\sigma := \hbox{sgn}(\sigma) \prod_{i=1}^n \zeta_{i\sigma(i)}.

From the iid nature of the {\zeta_{ij}}, we easily see that each {I_\sigma} has mean zero and variance one, and are pairwise uncorrelated as {\sigma} varies. We conclude that {\det A_n} has mean zero and variance {n!} (an observation first made by Turán). In particular, from Chebyshev’s inequality we see that {\det A_n} is typically of size {O(\sqrt{n!})}.

It turns out, though, that this is not quite best possible. This is easiest to explain in the real gaussian case, by performing a computation first made by Goodman. In this case, {\det A_n} is clearly symmetrical, so we can focus attention on the magnitude {|\det A_n|}. We can interpret this quantity geometrically as the volume of an {n}-dimensional parallelopiped whose generating vectors {X_1,\ldots,X_n} are independent real gaussian vectors in {{\bf R}^n} (i.e. their coefficients are iid with law {N(0,1)_{\bf R}}). Using the classical base-times-height formula, we thus have

\displaystyle  |\det A_n| = \prod_{i=1}^n \hbox{dist}(X_i, V_i) \ \ \ \ \ (2)

where {V_i} is the {i-1}-dimensional linear subspace of {{\bf R}^n} spanned by {X_1,\ldots,X_{i-1}} (note that {X_1,\ldots,X_n}, having an absolutely continuous joint distribution, are almost surely linearly independent). Taking logarithms, we conclude

\displaystyle  \log |\det A_n| = \sum_{i=1}^n \log \hbox{dist}(X_i, V_i).

Now, we take advantage of a fundamental symmetry property of the Gaussian vector distribution, namely its invariance with respect to the orthogonal group {O(n)}. Because of this, we see that if we fix {X_1,\ldots,X_{i-1}} (and thus {V_i}, the random variable {\hbox{dist}(X_i,V_i)} has the same distribution as {\hbox{dist}(X_i,{\bf R}^{i-1})}, or equivalently the {\chi} distribution

\displaystyle  \chi_{n-i+1} := (\sum_{j=1}^{n-i+1} \xi_{n-i+1,j}^2)^{1/2}

where {\xi_{n-i+1,1},\ldots,\xi_{n-i+1,n-i+1}} are iid copies of {N(0,1)_{\bf R}}. As this distribution does not depend on the {X_1,\ldots,X_{i-1}}, we conclude that the law of {\log |\det A_n|} is given by the sum of {n} independent {\chi}-variables:

\displaystyle  \log |\det A_n| \equiv \sum_{j=1}^{n} \log \chi_j.

A standard computation shows that each {\chi_j^2} has mean {j} and variance {2j}, and then a Taylor series (or Ito calculus) computation (using concentration of measure tools to control tails) shows that {\log \chi_j} has mean {\frac{1}{2} \log j - \frac{1}{2j} + O(1/j^{3/2})} and variance {\frac{1}{2j}+O(1/j^{3/2})}. As such, {\log |\det A_n|} has mean {\frac{1}{2} \log n! - \frac{1}{2} \log n + O(1)} and variance {\frac{1}{2} \log n + O(1)}. Applying a suitable version of the central limit theorem, one obtains the asymptotic law

\displaystyle  \frac{\log |\det A_n| - \frac{1}{2} \log n! + \frac{1}{2} \log n}{\sqrt{\frac{1}{2}\log n}} \rightarrow N(0,1)_{\bf R}, \ \ \ \ \ (3)

where {\rightarrow} denotes convergence in distribution. A bit more informally, we have

\displaystyle  |\det A_n| \approx n^{-1/2} \sqrt{n!} \exp( N( 0, \log n / 2 )_{\bf R} ) \ \ \ \ \ (4)

when {A_n} is a real gaussian matrix; thus, for instance, the median value of {|\det A_n|} is {n^{-1/2+o(1)} \sqrt{n!}}. At first glance, this appears to conflict with the second moment bound {\mathop{\bf E} |\det A_n|^2 = n!} of Turán mentioned earlier, but once one recalls that {\exp(N(0,t)_{\bf R})} has a second moment of {\exp(2t)}, we see that the two facts are in fact perfectly consistent; the upper tail of the normal distribution in the exponent in (4) ends up dominating the second moment.

It turns out that the central limit theorem (3) is valid for any real iid matrix with mean zero, variance one, and an exponential decay condition on the entries; this was first claimed by Girko, though the arguments in that paper appear to be incomplete. Another proof of this result, with more quantitative bounds on the convergence rate has been recently obtained by Hoi Nguyen and Van Vu. The basic idea in these arguments is to express the sum in (2) in terms of a martingale and apply the martingale central limit theorem.

If one works with complex gaussian random matrices instead of real gaussian random matrices, the above computations change slightly (one has to replace the real {\chi} distribution with the complex {\chi} distribution, in which the {\xi_{i,j}} are distributed according to the complex gaussian {N(0,1)_{\bf C}} instead of the real one). At the end of the day, one ends up with the law

\displaystyle  \frac{\log |\det A_n| - \frac{1}{2} \log n! + \frac{1}{4} \log n}{\sqrt{\frac{1}{4}\log n}} \rightarrow N(0,1)_{\bf R}, \ \ \ \ \ (5)

or more informally

\displaystyle  |\det A_n| \approx n^{-1/4} \sqrt{n!} \exp( N( 0, \log n / 4 )_{\bf R} ) \ \ \ \ \ (6)

(but note that this new asymptotic is still consistent with Turán’s second moment calculation).

We can now turn to the results of our paper. Here, we replace the iid matrices {A_n} by Wigner matrices {M_n = (\zeta_{ij})_{1 \leq i,j \leq n}}, which are defined similarly but are constrained to be Hermitian (or real symmetric), thus {\zeta_{ij} = \overline{\zeta_{ji}}} for all {i,j}. Model examples here include the Gaussian Unitary Ensemble (GUE), in which {\zeta_{ij} \equiv N(0,1)_{\bf C}} for {1 \leq i < j \leq n} and {\zeta_{ij} \equiv N(0,1)_{\bf R}} for {1 \leq i=j \leq n}, the Gaussian Orthogonal Ensemble (GOE), in which {\zeta_{ij} \equiv N(0,1)_{\bf R}} for {1 \leq i < j \leq n} and {\zeta_{ij} \equiv N(0,2)_{\bf R}} for {1 \leq i=j \leq n}, and the symmetric Bernoulli ensemble, in which {\zeta_{ij} \equiv \pm 1} for {1 \leq i \leq j \leq n} (with probability {1/2} of either sign). In all cases, the upper triangular entries of the matrix are assumed to be jointly independent. For a more precise definition of the Wigner matrix ensembles we are considering, see the introduction to our paper.

The determinants {\det M_n} of these matrices still have a Leibniz expansion. However, in the Wigner case, the mean and variance of the {I_\sigma} are slightly different, and what is worse, they are not all pairwise uncorrelated any more. For instance, the mean of {I_\sigma} is still usually zero, but equals {(-1)^{n/2}} in the exceptional case when {\sigma} is a perfect matching (i.e. the union of exactly {n/2} {2}-cycles, a possibility that can of course only happen when {n} is even). As such, the mean {\mathop{\bf E} \det M_n} still vanishes when {n} is odd, but for even {n} it is equal to

\displaystyle  (-1)^{n/2} \frac{n!}{(n/2)!2^{n/2}}

(the fraction here simply being the number of perfect matchings on {n} vertices). Using Stirling’s formula, one then computes that {|\mathop{\bf E} \det A_n|} is comparable to {n^{-1/4} \sqrt{n!}} when {n} is large and even. The second moment calculation is more complicated (and uses facts about the distribution of cycles in random permutations, mentioned in this previous post), but one can compute that {\mathop{\bf E} |\det A_n|^2} is comparable to {n^{1/2} n!} for GUE and {n^{3/2} n!} for GOE. (The discrepancy here comes from the fact that in the GOE case, {I_\sigma} and {I_\rho} can correlate when {\rho} contains reversals of {k}-cycles of {\sigma} for {k \geq 3}, but this does not happen in the GUE case.) For GUE, much more precise asymptotics for the moments of the determinant are known, starting from the work of Brezin and Hikami, though we do not need these more sophisticated computations here.

Our main results are then as follows.

Theorem 1 Let {M_n} be a Wigner matrix.

  • If {M_n} is drawn from GUE, then

    \displaystyle  \frac{\log |\det M_n| - \frac{1}{2} \log n! + \frac{1}{4} \log n}{\sqrt{\frac{1}{2}\log n}} \rightarrow N(0,1)_{\bf R}.

  • If {M_n} is drawn from GOE, then

    \displaystyle  \frac{\log |\det M_n| - \frac{1}{2} \log n! + \frac{1}{4} \log n}{\sqrt{\log n}} \rightarrow N(0,1)_{\bf R}.

  • The previous two results also hold for more general Wigner matrices, assuming that the real and imaginary parts are independent, a finite moment condition is satisfied, and the entries match moments with those of GOE or GUE to fourth order. (See the paper for a more precise formulation of the result.)

Thus, we informally have

\displaystyle  |\det M_n| \approx n^{-1/4} \sqrt{n!} \exp( N( 0, \log n / 2 )_{\bf R} )

when {M_n} is drawn from GUE, or from another Wigner ensemble matching GUE to fourth order (and obeying some additional minor technical hypotheses); and

\displaystyle  |\det M_n| \approx n^{-1/4} \sqrt{n!} \exp( N( 0, \log n )_{\bf R} )

when {M_n} is drawn from GOE, or from another Wigner ensemble matching GOE to fourth order. Again, these asymptotic limiting distributions are consistent with the asymptotic behaviour for the second moments.

The extension from the GUE or GOE case to more general Wigner ensembles is a fairly routine application of the four moment theorem for Wigner matrices, although for various technical reasons we do not quite use the existing four moment theorems in the literature, but adapt them to the log determinant. The main idea is to express the log-determinant as an integral

\displaystyle  \log|\det M_n| = \frac{1}{2} n \log n - n \hbox{Im} \int_0^\infty s(\sqrt{-1}\eta)\ d\eta \ \ \ \ \ (7)

of the Stieltjes transform

\displaystyle  s(z) := \frac{1}{n} \hbox{tr}( \frac{1}{\sqrt{n}} M_n - z )^{-1}

of {M_n}. Strictly speaking, the integral in (7) is divergent at infinity (and also can be ill-behaved near zero), but this can be addressed by standard truncation and renormalisation arguments (combined with known facts about the least singular value of Wigner matrices), which we omit here. We then use a variant of the four moment theorem for the Stieltjes transform, as used by Erdos, Yau, and Yin (based on a previous four moment theorem for individual eigenvalues introduced by Van Vu and myself). The four moment theorem is proven by the now-standard Lindeberg exchange method, combined with the usual resolvent identities to control the behaviour of the resolvent (and hence the Stieltjes transform) with respect to modifying one or two entries, together with the delocalisation of eigenvector property (which in turn arises from local semicircle laws) to control the error terms.

Somewhat surprisingly (to us, at least), it turned out that it was the first part of the theorem (namely, the verification of the limiting law for the invariant ensembles GUE and GOE) that was more difficult than the extension to the Wigner case. Even in an ensemble as highly symmetric as GUE, the rows are no longer independent, and the formula (2) is basically useless for getting any non-trivial control on the log determinant. There is an explicit formula for the joint distribution of the eigenvalues of GUE (or GOE), which does eventually give the distribution of the cumulants of the log determinant, which then gives the required central limit theorem; but this is a lengthy computation, first performed by Delannay and Le Caer.

Following a suggestion of my colleague, Rowan Killip, we give an alternate proof of this central limit theorem in the GUE and GOE cases, by using a beautiful observation of Trotter, namely that the GUE or GOE ensemble can be conjugated into a tractable tridiagonal form. Let me state it just for GUE:

Proposition 2 (Tridiagonal form of GUE) Let {M'_n} be the random tridiagonal real symmetric matrix

\displaystyle  M'_n = \begin{pmatrix} a_1 & b_1 & 0 & \ldots & 0 & 0 \\ b_1 & a_2 & b_2 & \ldots & 0 & 0 \\ 0 & b_2 & a_3 & \ldots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \ldots & a_{n-1} & b_{n-1} \\ 0 & 0 & 0 & \ldots & b_{n-1} & a_n \end{pmatrix}

where the {a_1,\ldots,a_n, b_1,\ldots,b_{n-1}} are jointly independent real random variables, with {a_1,\ldots,a_n \equiv N(0,1)_{\bf R}} being standard real Gaussians, and each {b_i} having a {\chi}-distribution:

\displaystyle  b_i = (\sum_{j=1}^i |z_{i,j}|^2)^{1/2}

where {z_{i,j} \equiv N(0,1)_{\bf C}} are iid complex gaussians. Let {M_n} be drawn from GUE. Then the joint eigenvalue distribution of {M_n} is identical to the joint eigenvalue distribution of {M'_n}.

Proof: Let {M_n} be drawn from GUE. We can write

\displaystyle  M_n = \begin{pmatrix} M_{n-1} & X_n \\ X_n^* & a_n \end{pmatrix}

where {M_{n-1}} is drawn from the {n-1\times n-1} GUE, {a_n \equiv N(0,1)_{\bf R}}, and {X_n \in {\bf C}^{n-1}} is a random gaussian vector with all entries iid with distribution {N(0,1)_{\bf C}}. Furthermore, {M_{n-1}, X_n, a_n} are jointly independent.

We now apply the tridiagonal matrix algorithm. Let {b_{n-1} := |X_n|}, then {b_n} has the {\chi}-distribution indicated in the proposition. We then conjugate {M_n} by a unitary matrix {U} that preserves the final basis vector {e_n}, and maps {X} to {b_{n-1} e_{n-1}}. Then we have

\displaystyle  U M_n U^* = \begin{pmatrix} \tilde M_{n-1} & b_{n-1} e_{n-1} \\ b_{n-1} e_{n-1}^* & a_n \end{pmatrix}

where {\tilde M_{n-1}} is conjugate to {M_{n-1}}. Now we make the crucial observation: because {M_{n-1}} is distributed according to GUE (which is a unitarily invariant ensemble), and {U} is a unitary matrix independent of {M_{n-1}}, {\tilde M_{n-1}} is also distributed according to GUE, and remains independent of both {b_{n-1}} and {a_n}.

We continue this process, expanding {U M_n U^*} as

\displaystyle \begin{pmatrix} M_{n-2} & X_{n-1} & 0 \\ X_{n-1}^* & a_{n-1} & b_{n-1} \\ 0 & b_{n-1} & a_n. \end{pmatrix}

Applying a further unitary conjugation that fixes {e_{n-1}, e_n} but maps {X_{n-1}} to {b_{n-2} e_{n-2}}, we may replace {X_{n-1}} by {b_{n-2} e_{n-2}} while transforming {M_{n-2}} to another GUE matrix {\tilde M_{n-2}} independent of {a_n, b_{n-1}, a_{n-1}, b_{n-2}}. Iterating this process, we eventually obtain a coupling of {M_n} to {M'_n} by unitary conjugations, and the claim follows. \Box

The determinant of a tridiagonal matrix is not quite as simple as the determinant of a triangular matrix (in which it is simply the product of the diagonal entries), but it is pretty close: the determinant {D_n} of the above matrix is given by solving the recursion

\displaystyle  D_i = a_i D_{i-1} + b_{i-1}^2 D_{i-2}

with {D_0=1} and {D_{-1} = 0}. Thus, instead of the product of a sequence of independent scalar {\chi} distributions as in the gaussian matrix case, the determinant of GUE ends up being controlled by the product of a sequence of independent {2\times 2} matrices whose entries are given by gaussians and {\chi} distributions. In this case, one cannot immediately take logarithms and hope to get something for which the martingale central limit theorem can be applied, but some ad hoc manipulation of these {2 \times 2} matrix products eventually does make this strategy work. (Roughly speaking, one has to work with the logarithm of the Frobenius norm of the matrix first.)