You are currently browsing the category archive for the ‘math.PR’ category.

Van Vu and I have just uploaded to the arXiv our paper “Random matrices: Universality of local spectral statistics of non-Hermitian matrices“. The main result of this paper is a “Four Moment Theorem” that establishes universality for local spectral statistics of non-Hermitian matrices with independent entries, under the additional hypotheses that the entries of the matrix decay exponentially, and match moments with either the real or complex gaussian ensemble to fourth order. This is the non-Hermitian analogue of a long string of recent results establishing universality of local statistics in the Hermitian case (as discussed for instance in this recent survey of Van and myself, and also in several other places).

The complex case is somewhat easier to describe. Given a (non-Hermitian) random matrix ensemble ${M_n}$ of ${n \times n}$ matrices, one can arbitrarily enumerate the (geometric) eigenvalues as ${\lambda_1(M_n),\ldots,\lambda_n(M_n) \in {\bf C}}$, and one can then define the ${k}$-point correlation functions ${\rho^{(k)}_n: {\bf C}^k \rightarrow {\bf R}^+}$ to be the symmetric functions such that

$\displaystyle \int_{{\bf C}^k} F(z_1,\ldots,z_k) \rho^{(k)}_n(z_1,\ldots,z_k)\ dz_1 \ldots dz_k$

$\displaystyle = {\bf E} \sum_{1 \leq i_1 < \ldots < i_k \leq n} F(\lambda_1(M_n),\ldots,\lambda_k(M_n)).$

In the case when ${M_n}$ is drawn from the complex gaussian ensemble, so that all the entries are independent complex gaussians of mean zero and variance one, it is a classical result of Ginibre that the asymptotics of ${\rho^{(k)}_n}$ near some point ${z \sqrt{n}}$ as ${n \rightarrow \infty}$ and ${z \in {\bf C}}$ is fixed are given by the determinantal rule

$\displaystyle \rho^{(k)}_n(z\sqrt{n} + w_1,\ldots,z\sqrt{n}+w_k) \rightarrow \hbox{det}( K(w_i,w_j) )_{1 \leq i,j \leq k} \ \ \ \ \ (1)$

for ${|z| < 1}$ and

$\displaystyle \rho^{(k)}_n(z\sqrt{n} + w_1,\ldots,z\sqrt{n}+w_k) \rightarrow 0$

for ${|z| > 1}$, where ${K}$ is the reproducing kernel

$\displaystyle K(z,w) := \frac{1}{\pi} e^{-|z|^2/2 - |w|^2/2 + z \overline{w}}.$

(There is also an asymptotic for the boundary case ${|z|=1}$, but it is more complicated to state.) In particular, we see that ${\rho^{(k)}_n(z \sqrt{n}) \rightarrow \frac{1}{\pi} 1_{|z| \leq 1}}$ for almost every ${z}$, which is a manifestation of the well-known circular law for these matrices; but the circular law only captures the macroscopic structure of the spectrum, whereas the asymptotic (1) describes the microscopic structure.

Our first main result is that the asymptotic (1) for ${|z|<1}$ also holds (in the sense of vague convergence) when ${M_n}$ is a matrix whose entries are independent with mean zero, variance one, exponentially decaying tails, and which all match moments with the complex gaussian to fourth order. (Actually we prove a stronger result than this which is valid for all bounded ${z}$ and has more uniform bounds, but is a bit more technical to state.) An analogous result is also established for real gaussians (but now one has to separate the correlation function into components depending on how many eigenvalues are real and how many are strictly complex; also, the limiting distribution is more complicated, being described by Pfaffians rather than determinants). Among other things, this allows us to partially extend some known results on complex or real gaussian ensembles to more general ensembles. For instance, there is a central limit theorem of Rider which establishes a central limit theorem for the number of eigenvalues of a complex gaussian matrix in a mesoscopic disk; from our results, we can extend this central limit theorem to matrices that match the complex gaussian ensemble to fourth order, provided that the disk is small enough (for technical reasons, our error bounds are not strong enough to handle large disks). Similarly, extending some results of Edelman-Kostlan-Shub and of Forrester-Nagao, we can show that for a matrix matching the real gaussian ensemble to fourth order, the number of real eigenvalues is ${\sqrt{\frac{2n}{\pi}} + O(n^{1/2-c})}$ with probability ${1-O(n^{-c})}$ for some absolute constant ${c>0}$.

There are several steps involved in the proof. The first step is to apply the Girko Hermitisation trick to replace the problem of understanding the spectrum of a non-Hermitian matrix, with that of understanding the spectrum of various Hermitian matrices. The two identities that realise this trick are, firstly, Jensen’s formula

$\displaystyle \log |\det(M_n-z_0)| = - \sum_{1 \leq i \leq n: \lambda_i(M_n) \in B(z_0,r)} \log \frac{r}{|\lambda_i(M_n)-z_0|}$

$\displaystyle + \frac{1}{2\pi} \int_0^{2\pi} \log |\det(M_n-z_0-re^{i\theta})|\ d\theta$

that relates the local distribution of eigenvalues to the log-determinants ${\log |\det(M_n-z_0)|}$, and secondly the elementary identity

$\displaystyle \log |\det(M_n - z)| = \frac{1}{2} \log|\det W_{n,z}| + \frac{1}{2} n \log n$

that relates the log-determinants of ${M_n-z}$ to the log-determinants of the Hermitian matrices

$\displaystyle W_{n,z} := \frac{1}{\sqrt{n}} \begin{pmatrix} 0 & M_n -z \\ (M_n-z)^* & 0 \end{pmatrix}.$

The main difficulty is then to obtain concentration and universality results for the Hermitian log-determinants ${\log|\det W_{n,z}|}$. This turns out to be a task that is analogous to the task of obtaining concentration for Wigner matrices (as we did in this recent paper), as well as central limit theorems for log-determinants of Wigner matrices (as we did in this other recent paper). In both of these papers, the main idea was to use the Four Moment Theorem for Wigner matrices (which can now be proven relatively easily by a combination of the local semi-circular law and resolvent swapping methods), combined with (in the latter paper) a central limit theorem for the gaussian unitary ensemble (GUE). This latter task was achieved by using the convenient Trotter normal form to tridiagonalise a GUE matrix, which has the effect of revealing the determinant of that matrix as the solution to a certain linear stochastic difference equation, and one can analyse the distribution of that solution via such tools as the martingale central limit theorem.

The matrices ${W_{n,z}}$ are somewhat more complicated than Wigner matrices (for instance, the semi-circular law must be replaced by a distorted Marchenko-Pastur law), but the same general strategy works to obtain concentration and universality for their log-determinants. The main new difficulty that arises is that the analogue of the Trotter norm for gaussian random matrices is not tridiagonal, but rather Hessenberg (i.e. upper-triangular except for the lower diagonal). This ultimately has the effect of expressing the relevant determinant as the solution to a nonlinear stochastic difference equation, which is a bit trickier to solve for. Fortunately, it turns out that one only needs good lower bounds on the solution, as one can use the second moment method to upper bound the determinant and hence the log-determinant (following a classical computation of Turan). This simplifies the analysis on the equation somewhat.

While this result is the first local universality result in the category of random matrices with independent entries, there are still two limitations to the result which one would like to remove. The first is the moment matching hypotheses on the matrix. Very recently, one of the ingredients of our paper, namely the local circular law, was proved without moment matching hypotheses by Bourgade, Yau, and Yin (provided one stays away from the edge of the spectrum); however, as of this time of writing the other main ingredient – the universality of the log-determinant – still requires moment matching. (The standard tool for obtaining universality without moment matching hypotheses is the heat flow method (and more specifically, the local relaxation flow method), but the analogue of Dyson Brownian motion in the non-Hermitian setting appears to be somewhat intractible, being a coupled flow on both the eigenvalues and eigenvectors rather than just on the eigenvalues alone.)

I’ve just uploaded to the arXiv my paper The asymptotic distribution of a single eigenvalue gap of a Wigner matrix, submitted to Probability Theory and Related Fields. This paper (like several of my previous papers) is concerned with the asymptotic distribution of the eigenvalues ${\lambda_1(M_n) \leq \ldots \leq \lambda_n(M_n)}$ of a random Wigner matrix ${M_n}$ in the limit ${n \rightarrow \infty}$, with a particular focus on matrices drawn from the Gaussian Unitary Ensemble (GUE). This paper is focused on the bulk of the spectrum, i.e. to eigenvalues ${\lambda_i(M_n)}$ with ${\delta n \leq i \leq (1-\delta) i n}$ for some fixed ${\delta>0}$.

The location of an individual eigenvalue ${\lambda_i(M_n)}$ is by now quite well understood. If we normalise the entries of the matrix ${M_n}$ to have mean zero and variance ${1}$, then in the asymptotic limit ${n \rightarrow \infty}$, the Wigner semicircle law tells us that with probability ${1-o(1)}$ one has

$\displaystyle \lambda_i(M_n) =\sqrt{n} u + o(\sqrt{n})$

where the classical location ${u = u_{i/n} \in [-2,2]}$ of the eigenvalue is given by the formula

$\displaystyle \int_{-2}^{u} \rho_{sc}(x)\ dx = \frac{i}{n}$

and the semicircular distribution ${\rho_{sc}(x)\ dx}$ is given by the formula

$\displaystyle \rho_{sc}(x) := \frac{1}{2\pi} (4-x^2)_+^{1/2}.$

Actually, one can improve the error term here from ${o(\sqrt{n})}$ to ${O( \log^{1/2+\epsilon} n)}$ for any ${\epsilon>0}$ (see this previous recent paper of Van and myself for more discussion of these sorts of estimates, sometimes known as eigenvalue rigidity estimates).

From the semicircle law (and the fundamental theorem of calculus), one expects the ${i^{th}}$ eigenvalue spacing ${\lambda_{i+1}(M_n)-\lambda_i(M_n)}$ to have an average size of ${\frac{1}{\sqrt{n} \rho_{sc}(u)}}$. It is thus natural to introduce the normalised eigenvalue spacing

$\displaystyle X_i := \frac{\lambda_{i+1}(M_n) - \lambda_i(M_n)}{1/\sqrt{n} \rho_{sc}(u)}$

and ask what the distribution of ${X_i}$ is.

As mentioned previously, we will focus on the bulk case ${\delta n \leq i\leq (1-\delta)n}$, and begin with the model case when ${M_n}$ is drawn from GUE. (In the edge case when ${i}$ is close to ${1}$ or to ${n}$, the distribution is given by the famous Tracy-Widom law.) Here, the distribution was almost (but as we shall see, not quite) worked out by Gaudin and Mehta. By using the theory of determinantal processes, they were able to compute a quantity closely related to ${X_i}$, namely the probability

$\displaystyle {\bf P}( N_{[\sqrt{n} u + \frac{x}{\sqrt{n} \rho_{sc}(u)}, \sqrt{n} u + \frac{y}{\sqrt{n} \rho_{sc}(u)}]} = 0) \ \ \ \ \ (1)$

that an interval ${[\sqrt{n} u + \frac{x}{\sqrt{n} \rho_{sc}(u)}, \sqrt{n} u + \frac{y}{\sqrt{n} \rho_{sc}(u)}]}$ near ${\sqrt{n} u}$ of length comparable to the expected eigenvalue spacing ${1/\sqrt{n} \rho_{sc}(u)}$ is devoid of eigenvalues. For ${u}$ in the bulk and fixed ${x,y}$, they showed that this probability is equal to

$\displaystyle \det( 1 - 1_{[x,y]} P 1_{[x,y]} ) + o(1),$

where ${P}$ is the Dyson projection

$\displaystyle P f(x) = \int_{\bf R} \frac{\sin(\pi(x-y))}{\pi(x-y)} f(y)\ dy$

to Fourier modes in ${[-1/2,1/2]}$, and ${\det}$ is the Fredholm determinant. As shown by Jimbo, Miwa, Tetsuji, Mori, and Sato, this determinant can also be expressed in terms of a solution to a Painleve V ODE, though we will not need this fact here. In view of this asymptotic and some standard integration by parts manipulations, it becomes plausible to propose that ${X_i}$ will be asymptotically distributed according to the Gaudin-Mehta distribution ${p(x)\ dx}$, where

$\displaystyle p(x) := \frac{d^2}{dx^2} \det( 1 - 1_{[0,x]} P 1_{[0,x]} ).$

A reasonably accurate approximation for ${p}$ is given by the Wigner surmise ${p(x) \approx \frac{1}{2} \pi x e^{-\pi x^2/4}}$, which was presciently proposed by Wigner as early as 1957; it is exact for ${n=2}$ but not in the asymptotic limit ${n \rightarrow \infty}$.

Unfortunately, when one tries to make this argument rigorous, one finds that the asymptotic for (1) does not control a single gap ${X_i}$, but rather an ensemble of gaps ${X_i}$, where ${i}$ is drawn from an interval ${[i_0 - L, i_0 + L]}$ of some moderate size ${L}$ (e.g. ${L = \log n}$); see for instance this paper of Deift, Kriecherbauer, McLaughlin, Venakides, and Zhou for a more precise formalisation of this statement (which is phrased slightly differently, in which one samples all gaps inside a fixed window of spectrum, rather than inside a fixed range of eigenvalue indices ${i}$). (This result is stated for GUE, but can be extended to other Wigner ensembles by the Four Moment Theorem, at least if one assumes a moment matching condition; see this previous paper with Van Vu for details. The moment condition can in fact be removed, as was done in this subsequent paper with Erdos, Ramirez, Schlein, Vu, and Yau.)

The problem is that when one specifies a given window of spectrum such as ${[\sqrt{n} u + \frac{x}{\sqrt{n} \rho_{sc}(u)}, \sqrt{n} u + \frac{y}{\sqrt{n} \rho_{sc}(u)}]}$, one cannot quite pin down in advance which eigenvalues ${\lambda_i(M_n)}$ are going to lie to the left or right of this window; even with the strongest eigenvalue rigidity results available, there is a natural uncertainty of ${\sqrt{\log n}}$ or so in the ${i}$ index (as can be quantified quite precisely by this central limit theorem of Gustavsson).

The main difficulty here is that there could potentially be some strange coupling between the event (1) of an interval being devoid of eigenvalues, and the number ${N_{(-\infty,\sqrt{n} u + \frac{x}{\sqrt{n} \rho_{sc}(u)})}(M_n)}$ of eigenvalues to the left of that interval. For instance, one could conceive of a possible scenario in which the interval in (1) tends to have many eigenvalues when ${N_{(-\infty,\sqrt{n} u + \frac{x}{\sqrt{n} \rho_{sc}(u)})}(M_n)}$ is even, but very few when ${N_{(-\infty,\sqrt{n} u + \frac{x}{\sqrt{n} \rho_{sc}(u)})}(M_n)}$ is odd. In this sort of situation, the gaps ${X_i}$ may have different behaviour for even ${i}$ than for odd ${i}$, and such anomalies would not be picked up in the averaged statistics in which ${i}$ is allowed to range over some moderately large interval.

The main result of the current paper is that these anomalies do not actually occur, and that all of the eigenvalue gaps ${X_i}$ in the bulk are asymptotically governed by the Gaudin-Mehta law without the need for averaging in the ${i}$ parameter. Again, this is shown first for GUE, and then extended to other Wigner matrices obeying a matching moment condition using the Four Moment Theorem. (It is likely that the moment matching condition can be removed here, but I was unable to achieve this, despite all the recent advances in establishing universality of local spectral statistics for Wigner matrices, mainly because the universality results in the literature are more focused on specific energy levels ${u}$ than on specific eigenvalue indices ${i}$. To make matters worse, in some cases universality is currently known only after an additional averaging in the energy parameter.)

The main task in the proof is to show that the random variable ${N_{(-\infty,\sqrt{n} u + \frac{x}{\sqrt{n} \rho_{sc}(u)})}(M_n)}$ is largely decoupled from the event in (1) when ${M_n}$ is drawn from GUE. To do this we use some of the theory of determinantal processes, and in particular the nice fact that when one conditions a determinantal process to the event that a certain spatial region (such as an interval) contains no points of the process, then one obtains a new determinantal process (with a kernel that is closely related to the original kernel). The main task is then to obtain a sufficiently good control on the distance between the new determinantal kernel and the old one, which we do by some functional-analytic considerations involving the manipulation of norms of operators (and specifically, the operator norm, Hilbert-Schmidt norm, and nuclear norm). Amusingly, the Fredholm alternative makes a key appearance, as I end up having to invert a compact perturbation of the identity at one point (specifically, I need to invert ${1 - 1_{[x,y]}P1_{[x,y]}}$, where ${P}$ is the Dyson projection and ${[x,y]}$ is an interval). As such, the bounds in my paper become ineffective, though I am sure that with more work one can invert this particular perturbation of the identity by hand, without the need to invoke the Fredholm alternative.

In the last three notes, we discussed the Bourgain-Gamburd expansion machine and two of its three ingredients, namely quasirandomness and product theorems, leaving only the non-concentration ingredient to discuss. We can summarise the results of the last three notes, in the case of fields of prime order, as the following theorem.

Theorem 1 (Non-concentration implies expansion in ${SL_d}$) Let ${p}$ be a prime, let ${d \geq 1}$, and let ${S}$ be a symmetric set of elements in ${G := SL_d(F_p)}$ of cardinality ${|S|=k}$ not containing the identity. Write ${\mu := \frac{1}{|S|} \sum_{s\in S}\delta_s}$, and suppose that one has the non-concentration property

$\displaystyle \sup_{H < G}\mu^{(n)}(H) < |G|^{-\kappa} \ \ \ \ \ (1)$

for some ${\kappa>0}$ and some even integer ${n \leq \Lambda \log |G|}$. Then ${Cay(G,S)}$ is a two-sided ${\epsilon}$-expander for some ${\epsilon>0}$ depending only on ${k, d, \kappa,\Lambda}$.

Proof: From (1) we see that ${\mu^{(n)}}$ is not supported in any proper subgroup ${H}$ of ${G}$, which implies that ${S}$ generates ${G}$. The claim now follows from the Bourgain-Gamburd expansion machine (Theorem 2 of Notes 4), the product theorem (Theorem 1 of Notes 5), and quasirandomness (Exercise 8 of Notes 3). $\Box$

Remark 1 The same argument also works if we replace ${F_p}$ by the field ${F_{p^j}}$ of order ${p^j}$ for some bounded ${j}$. However, there is a difficulty in the regime when ${j}$ is unbounded, because the quasirandomness property becomes too weak for the Bourgain-Gamburd expansion machine to be directly applicable. On theother hand, the above type of theorem was generalised to the setting of cyclic groups ${{\bf Z}/q{\bf Z}}$ with ${q}$ square-free by Varju, to arbitrary ${q}$ by Bourgain and Varju, and to more general algebraic groups than ${SL_d}$ and square-free ${q}$ by Salehi Golsefidy and Varju. It may be that some modification of the proof techniques in these papers may also be able to handle the field case ${F_{p^j}}$ with unbounded ${j}$.

It thus remains to construct tools that can establish the non-concentration property (1). The situation is particularly simple in ${SL_2(F_p)}$, as we have a good understanding of the subgroups of that group. Indeed, from Theorem 14 from Notes 5, we obtain the following corollary to Theorem 1:

Corollary 2 (Non-concentration implies expansion in ${SL_2}$) Let ${p}$ be a prime, and let ${S}$ be a symmetric set of elements in ${G := SL_2(F_p)}$ of cardinality ${|S|=k}$ not containing the identity. Write ${\mu := \frac{1}{|S|} \sum_{s\in S}\delta_s}$, and suppose that one has the non-concentration property

$\displaystyle \sup_{B}\mu^{(n)}(B) < |G|^{-\kappa} \ \ \ \ \ (2)$

for some ${\kappa>0}$ and some even integer ${n \leq \Lambda \log |G|}$, where ${B}$ ranges over all Borel subgroups of ${SL_2(\overline{F})}$. Then, if ${|G|}$ is sufficiently large depending on ${k,\kappa,\Lambda}$, ${Cay(G,S)}$ is a two-sided ${\epsilon}$-expander for some ${\epsilon>0}$ depending only on ${k, \kappa,\Lambda}$.

It turns out (2) can be verified in many cases by exploiting the solvable nature of the Borel subgroups ${B}$. We give two examples of this in these notes. The first result, due to Bourgain and Gamburd (with earlier partial results by Gamburd and by Shalom) generalises Selberg’s expander construction to the case when ${S}$ generates a thin subgroup of ${SL_2({\bf Z})}$:

Theorem 3 (Expansion in thin subgroups) Let ${S}$ be a symmetric subset of ${SL_2({\bf Z})}$ not containing the identity, and suppose that the group ${\langle S \rangle}$ generated by ${S}$ is not virtually solvable. Then as ${p}$ ranges over all sufficiently large primes, the Cayley graphs ${Cay(SL_2(F_p), \pi_p(S))}$ form a two-sided expander family, where ${\pi_p: SL_2({\bf Z}) \rightarrow SL_2(F_p)}$ is the usual projection.

Remark 2 One corollary of Theorem 3 (or of the non-concentration estimate (3) below) is that ${\pi_p(S)}$ generates ${SL_2(F_p)}$ for all sufficiently large ${p}$, if ${\langle S \rangle}$ is not virtually solvable. This is a special case of a much more general result, known as the strong approximation theorem, although this is certainly not the most direct way to prove such a theorem. Conversely, the strong approximation property is used in generalisations of this result to higher rank groups than ${SL_2}$.

Exercise 1 In the converse direction, if ${\langle S\rangle}$ is virtually solvable, show that for sufficiently large ${p}$, ${\pi_p(S)}$ fails to generate ${SL_2(F_p)}$. (Hint: use Theorem 14 from Notes 5 to prevent ${SL_2(F_p)}$ from having bounded index solvable subgroups.)

Exercise 2 (Lubotzsky’s 1-2-3 problem) Let ${S := \{ \begin{pmatrix}1 & \pm 3 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix}1 & 0 \\ \pm 3 & 1 \end{pmatrix}}$.

• (i) Show that ${S}$ generates a free subgroup of ${SL_2({\bf Z})}$. (Hint: use a ping-pong argument, as in Exercise 23 of Notes 2.)
• (ii) Show that if ${v, w}$ are two distinct elements of the sector ${\{ (x,y) \in {\bf R}^2_+: x/2 < y < 2x \}}$, then there os no element ${g \in \langle S \rangle}$ for which ${gv = w}$. (Hint: this is another ping-pong argument.) Conclude that ${\langle S \rangle}$ has infinite index in ${SL_2({\bf Z})}$. (Contrast this with the situation in which the ${3}$ coefficients in ${S}$ are replaced by ${1}$ or ${2}$, in which case ${\langle S \rangle}$ is either all of ${SL_2({\bf Z})}$, or a finite index subgroup, as demonstrated in Exercise 23 of Notes 2).
• (iii) Show that ${Cay(SL_2(F_p), \pi_p(S))}$ for sufficiently large primes ${p}$ form a two-sided expander family.

Remark 3 Theorem 3 has been generalised to arbitrary linear groups, and with ${F_p}$ replaced by ${{\bf Z}/q{\bf Z}}$ for square-free ${q}$; see this paper of Salehi Golsefidy and Varju. In this more general setting, the condition of virtual solvability must be replaced by the condition that the connected component of the Zariski closure of ${\langle S \rangle}$ is perfect. An effective version of Theorem 3 (with completely explicit constants) was recently obtained by Kowalski.

The second example concerns Cayley graphs constructed using random elements of ${SL_2(F_p)}$.

Theorem 4 (Random generators expand) Let ${p}$ be a prime, and let ${x,y}$ be two elements of ${SL_2(F_p)}$ chosen uniformly at random. Then with probability ${1-o_{p \rightarrow \infty}(1)}$, ${Cay(SL_2(F_p), \{x,x^{-1},y,y^{-1}\})}$ is a two-sided ${\epsilon}$-expander for some absolute constant ${\epsilon}$.

Remark 4 As with Theorem 3, Theorem 4 has also been extended to a number of other groups, such as the Suzuki groups (in this paper of Breuillard, Green, and Tao), and more generally to finite simple groups of Lie type of bounded rank (in forthcoming work of Breuillard, Green, Guralnick, and Tao). There are a number of other constructions of expanding Cayley graphs in such groups (and in other interesting groups, such as the alternating groups) beyond those discussed in these notes; see this recent survey of Lubotzky for further discussion. It has been conjectured by Lubotzky and Weiss that any pair ${x,y}$ of (say) ${SL_2(F_p)}$ that generates the group, is a two-sided ${\epsilon}$-expander for an absolute constant ${\epsilon}$: in the case of ${SL_2(F_p)}$, this has been established for a density one set of primes by Breuillard and Gamburd.

— 1. Expansion in thin subgroups —

We now prove Theorem 3. The first observation is that the expansion property is monotone in the group ${\langle S \rangle}$:

Exercise 3 Let ${S, S'}$ be symmetric subsets of ${SL_2({\bf Z})}$ not containing the identity, such that ${\langle S \rangle \subset \langle S' \rangle}$. Suppose that ${Cay(SL_2(F_p), \pi_p(S))}$ is a two-sided expander family for sufficiently large primes ${p}$. Show that ${Cay(SL_2(F_p), \pi_p(S'))}$ is also a two-sided expander family.

As a consequence, Theorem 3 follows from the following two statments:

Theorem 5 (Tits alternative) Let ${\Gamma \subset SL_2({\bf Z})}$ be a group. Then exactly one of the following statements holds:

• (i) ${\Gamma}$ is virtually solvable.
• (ii) ${\Gamma}$ contains a copy of the free group ${F_2}$ of two generators as a subgroup.

Theorem 6 (Expansion in free groups) Let ${x,y \in SL_2({\bf Z})}$ be generators of a free subgroup of ${SL_2({\bf Z})}$. Then as ${p}$ ranges over all sufficiently large primes, the Cayley graphs ${Cay(SL_2(F_p), \pi_p(\{x,y,x^{-1},y^{-1}\}))}$ form a two-sided expander family.

Theorem 5 is a special case of the famous Tits alternative, which among other things allows one to replace ${SL_2({\bf Z})}$ by ${GL_d(k)}$ for any ${d \geq 1}$ and any field ${k}$ of characteristic zero (and fields of positive characteristic are also allowed, if one adds the requirement that ${\Gamma}$ be finitely generated). We will not prove the full Tits alternative here, but instead just give an ad hoc proof of the special case in Theorem 5 in the following exercise.

Exercise 4 Given any matrix ${g \in SL_2({\bf Z})}$, the singular values are ${\|g\|_{op}}$ and ${\|g\|_{op}^{-1}}$, and we can apply the singular value decomposition to decompose

$\displaystyle g = u_1(g) \|g\|_{op} v_1^*(g) + u_2(g) \|g\|_{op}^{-1} v_2(g)^*$

where ${u_1(g),u_2(g)\in {\bf C}^2}$ and ${v_1(g), v_2(g) \in {\bf C}^2}$ are orthonormal bases. (When ${\|g\|_{op}>1}$, these bases are uniquely determined up to phase rotation.) We let ${\tilde u_1(g) \in {\bf CP}^1}$ be the projection of ${u_1(g)}$ to the projective complex plane, and similarly define ${\tilde v_2(g)}$.

Let ${\Gamma}$ be a subgroup of ${SL_2({\bf Z})}$. Call a pair ${(u,v) \in {\bf CP}^1 \times {\bf CP}^1}$ a limit point of ${\Gamma}$ if there exists a sequence ${g_n \in \Gamma}$ with ${\|g_n\|_{op} \rightarrow \infty}$ and ${(\tilde u_1(g_n), \tilde v_2(g_n)) \rightarrow (u,v)}$.

• (i) Show that if ${\Gamma}$ is infinite, then there is at least one limit point.
• (ii) Show that if ${(u,v)}$ is a limit point, then so is ${(v,u)}$.
• (iii) Show that if there are two limit points ${(u,v), (u',v')}$ with ${\{u,v\} \cap \{u',v'\} = \emptyset}$, then there exist ${g,h \in \Gamma}$ that generate a free group. (Hint: Choose ${(\tilde u_1(g), \tilde v_2(g))}$ close to ${(u,v)}$ and ${(\tilde u_1(h),\tilde v_2(h))}$ close to ${(u',v')}$, and consider the action of ${g}$ and ${h}$ on ${{\bf CP}^1}$, and specifically on small neighbourhoods of ${u,v,u',v'}$, and set up a ping-pong type situation.)
• (iv) Show that if ${g \in SL_2({\bf Z})}$ is hyperbolic (i.e. it has an eigenvalue greater than 1), with eigenvectors ${u,v}$, then the projectivisations ${(\tilde u,\tilde v)}$ of ${u,v}$ form a limit point. Similarly, if ${g}$ is regular parabolic (i.e. it has an eigenvalue at 1, but is not the identity) with eigenvector ${u}$, show that ${(\tilde u,\tilde bu)}$ is a limit point.
• (v) Show that if ${\Gamma}$ has no free subgroup of two generators, then all hyperbolic and regular parabolic elements of ${\Gamma}$ have a common eigenvector. Conclude that all such elements lie in a solvable subgroup of ${\Gamma}$.
• (vi) Show that if an element ${g \in SL_2({\bf Z})}$ is neither hyperbolic nor regular parabolic, and is not a multiple of the identity, then ${g}$ is conjugate to a rotation by ${\pi/2}$ (in particular, ${g^2=-1}$).
• (vii) Establish Theorem 5. (Hint: show that two square roots of ${-1}$ in ${SL_2({\bf Z})}$ cannot multiply to another square root of ${-1}$.)

Now we prove Theorem 6. Let ${\Gamma}$ be a free subgroup of ${SL_2({\bf Z})}$ generated by two generators ${x,y}$. Let ${\mu := \frac{1}{4} (\delta_x +\delta_{x^{-1}} + \delta_y + \delta_{y^{-1}})}$ be the probability measure generating a random walk on ${SL_2({\bf Z})}$, thus ${(\pi_p)_* \mu}$ is the corresponding generator on ${SL_2(F_p)}$. By Corollary 2, it thus suffices to show that

$\displaystyle \sup_{B}((\pi_p)_* \mu)^{(n)}(B) < p^{-\kappa} \ \ \ \ \ (3)$

for all sufficiently large ${p}$, some absolute constant ${\kappa>0}$, and some even ${n = O(\log p)}$ (depending on ${p}$, of course), where ${B}$ ranges over Borel subgroups.

As ${\pi_p}$ is a homomorphism, one has ${((\pi_p)_* \mu)^{(n)}(B) = (\pi_p)_* (\mu^{(n)})(B) = \mu^{(n)}(\pi_p^{-1}(B))}$ and so it suffices to show that

$\displaystyle \sup_{B} \mu^{(n)}(\pi_p^{-1}(B)) < p^{-\kappa}.$

To deal with the supremum here, we will use an argument of Bourgain and Gamburd, taking advantage of the fact that all Borel groups of ${SL_2}$ obey a common group law, the point being that free groups such as ${\Gamma}$ obey such laws only very rarely. More precisely, we use the fact that the Borel groups are solvable of derived length two; in particular we have

$\displaystyle [[a,b],[c,d]] = 1 \ \ \ \ \ (4)$

for all ${a,b,c,d \in B}$. Now, ${\mu^{(n)}}$ is supported on matrices in ${SL_2({\bf Z})}$ whose coefficients have size ${O(\exp(O(n)))}$ (where we allow the implied constants to depend on the choice of generators ${x,y}$), and so ${(\pi_p)_*( \mu^{(n)} )}$ is supported on matrices in ${SL_2(F_p)}$ whose coefficients also have size ${O(\exp(O(n)))}$. If ${n}$ is less than a sufficiently small multiple of ${\log p}$, these coefficients are then less than ${p^{1/10}}$ (say). As such, if ${\tilde a,\tilde b,\tilde c,\tilde d \in SL_2({\bf Z})}$ lie in the support of ${\mu^{(n)}}$ and their projections ${a = \pi_p(\tilde a), \ldots, d = \pi_p(\tilde d)}$ obey the word law (4) in ${SL_2(F_p)}$, then the original matrices ${\tilde a, \tilde b, \tilde c, \tilde d}$ obey the word law (4) in ${SL_2({\bf Z})}$. (This lifting of identities from the characteristic ${p}$ setting of ${SL_2(F_p)}$ to the characteristic ${0}$ setting of ${SL_2({\bf Z})}$ is a simple example of the “Lefschetz principle”.)

To summarise, if we let ${E_{n,p,B}}$ be the set of all elements of ${\pi_p^{-1}(B)}$ that lie in the support of ${\mu^{(n)}}$, then (4) holds for all ${a,b,c,d \in E_{n,p,B}}$. This severely limits the size of ${E_{n,p,B}}$ to only be of polynomial size, rather than exponential size:

Proposition 7 Let ${E}$ be a subset of the support of ${\mu^{(n)}}$ (thus, ${E}$ consists of words in ${x,y,x^{-1},y^{-1}}$ of length ${n}$) such that the law (4) holds for all ${a,b,c,d \in E}$. Then ${|E| \ll n^2}$.

The proof of this proposition is laid out in the exercise below.

Exercise 5 Let ${\Gamma}$ be a free group generated by two generators ${x,y}$. Let ${B}$ be the set of all words of length at most ${n}$ in ${x,y,x^{-1},y^{-1}}$.

• (i) Show that if ${a,b \in \Gamma}$ commute, then ${a, b}$ lie in the same cyclic group, thus ${a = c^i, b = c^j}$ for some ${c \in \Gamma}$ and ${i,j \in {\bf Z}}$.
• (ii) Show that if ${a \in \Gamma}$, there are at most ${O(n)}$ elements of ${B}$ that commute with ${a}$.
• (iii) Show that if ${a,c \in \Gamma}$, there are at most ${O(n)}$ elements ${b}$ of ${B}$ with ${[a,b] = c}$.
• (iv) Prove Proposition 7.

Now we can conclude the proof of Theorem 3:

Exercise 6 Let ${\Gamma}$ be a free group generated by two generators ${x,y}$.

• (i) Show that ${\| \mu^{(n)} \|_{\ell^\infty(\Gamma)} \ll c^n}$ for some absolute constant ${0 < c<1}$. (For much more precise information on ${\mu^{(n)}}$, see this paper of Kesten.)
• (ii) Conclude the proof of Theorem 3.

— 2. Random generators expand —

We now prove Theorem 4. Let ${{\bf F}_2}$ be the free group on two formal generators ${a,b}$, and let ${\mu := \frac{1}{4}(\delta_a + \delta_b + \delta_{a^{-1}}+ \delta_{b^{-1}}}$ be the generator of the random walk. For any word ${w \in {\bf F}_2}$ and any ${x,y}$ in a group ${G}$, let ${w(x,y) \in G}$ be the element of ${G}$ formed by substituting ${x,y}$ for ${a,b}$ respectively in the word ${w}$; thus ${w}$ can be viewed as a map ${w: G \times G \rightarrow G}$ for any group ${G}$. Observe that if ${w}$ is drawn randomly using the distribution ${\mu^{(n)}}$, and ${x,y \in SL_2(F_p)}$, then ${w(x,y)}$ is distributed according to the law ${\tilde \mu^{(n)}}$, where ${\tilde \mu := \frac{1}{4}(\delta_x + \delta_y + \delta_{x^{-1}}+ \delta_{y^{-1}})}$. Applying Corollary 2, it suffices to show that whenever ${p}$ is a large prime and ${x,y}$ are chosen uniformly and independently at random from ${SL_2(F_p)}$, that with probability ${1-o_{p \rightarrow \infty}(1)}$, one has

$\displaystyle \sup_B {\bf P}_w ( w(x,y) \in B ) \leq p^{-\kappa} \ \ \ \ \ (5)$

for some absolute constant ${\kappa}$, where ${B}$ ranges over all Borel subgroups of ${SL_2(\overline{F_p})}$ and ${w}$ is drawn from the law ${\mu^{(n)}}$ for some even natural number ${n = O(\log p)}$.

Let ${B_n}$ denote the words in ${{\bf F}_2}$ of length at most ${n}$. We may use the law (4) to obtain good bound on the supremum in (5) assuming a certain non-degeneracy property of the word evaluations ${w(x,y)}$:

Exercise 7 Let ${n}$ be a natural number, and suppose that ${x,y \in SL_2(F_p)}$ is such that ${w(x,y) \neq 1}$ for ${w \in B_{100n} \backslash \{1\}}$. Show that

$\displaystyle \sup_B {\bf P}_w ( w(x,y) \in B ) \ll \exp(-cn)$

for some absolute constant ${c>0}$, where ${w}$ is drawn from the law ${\mu^{(n)}}$. (Hint: use (4) and the hypothesis to lift the problem up to ${{\bf F}_2}$, at which point one can use Proposition 7 and Exercise 6.)

In view of this exercise, it suffices to show that with probability ${1-o_{p \rightarrow\infty}(1)}$, one has ${w(x,y) \neq 1}$ for all ${w \in B_{100n} \backslash \{1\}}$ for some ${n}$ comparable to a small multiple of ${\log p}$. As ${B_{100n}}$ has ${\exp(O(n))}$ elements, it thus suffices by the union bound to show that

$\displaystyle {\bf P}_{x,y}(w(x,y)=1) \leq p^{-\gamma} \ \ \ \ \ (6)$

for some absolute constant ${\gamma > 0}$, and any ${w \in {\bf F}_2 \backslash \{1\}}$ of length less than ${c\log p}$ for some sufficiently small absolute constant ${c>0}$.

Let us now fix a non-identity word ${w}$ of length ${|w|}$ less than ${c\log p}$, and consider ${w}$ as a function from ${SL_2(k) \times SL_2(k)}$ to ${SL_2(k)}$ for an arbitrary field ${k}$. We can identify ${SL_2(k)}$ with the set ${\{ (a,b,c,d)\in k^4: ad-bc=1\}}$. A routine induction then shows that the expression ${w((a,b,c,d),(a',b',c',d'))}$ is then a polynomial in the eight variables ${a,b,c,d,a',b',c',d'}$ of degree ${O(|w|)}$ and coefficients which are integers of size ${O( \exp( O(|w|) ) )}$. Let us then make the additional restriction to the case ${a,a' \neq 0}$, in which case we can write ${d = \frac{bc+1}{a}}$ and ${d' =\frac{b'c'+1}{a'}}$. Then ${w((a,b,c,d),(a',b',c',d'))}$ is now a rational function of ${a,b,c,a',b',c'}$ whose numerator is a polynomial of degree ${O(|w|)}$ and coefficients of size ${O( \exp( O(|w|) ) )}$, and the denominator is a monomial of ${a,a'}$ of degree ${O(|w|)}$.

We then specialise this rational function to the field ${k=F_p}$. It is conceivable that when one does so, the rational function collapses to the constant polynomial ${(1,0,0,1)}$, thus ${w((a,b,c,d),(a',b',c',d'))=1}$ for all ${(a,b,c,d),(a',b',c',d') \in SL_2(F_p)}$ with ${a,a' \neq 0}$. (For instance, this would be the case if ${w(x,y) = x^{|SL_2(F_p)|}}$, by Lagrange’s theorem, if it were not for the fact that ${|w|}$ is far too large here.) But suppose that this rational function does not collapse to the constant rational function. Applying the Schwarz-Zippel lemma (Exercise 23 from Notes 5), we then see that the set of pairs ${(a,b,c,d),(a',b',c',d') \in SL_2(F_p)}$ with ${a,a' \neq 0}$ and ${w((a,b,c,d),(a',b',c',d'))=1}$ is at most ${O( |w| p^5 )}$; adding in the ${a=0}$ and ${a'=0}$ cases, one still obtains a bound of ${O(|w|p^5)}$, which is acceptable since ${|SL_2(F_p)|^2 \sim p^6}$ and ${|w| = O( \log p )}$. Thus, the only remaining case to consider is when the rational function ${w((a,b,c,d),(a',b',c',d'))}$ is identically ${1}$ on ${SL_2(F_p)}$ with ${a,a' \neq 0}$.

Now we perform another “Lefschetz principle” maneuvre to change the underlying field. Recall that the denominator of rational function ${w((a,b,c,d),(a',b',c',d'))}$ is monomial in ${a,a'}$, and the numerator has coefficients of size ${O(\exp(O(|w|)))}$. If ${|w|}$ is less than ${c\log p}$ for a sufficiently small ${p}$, we conclude in particular (for ${p}$ large enough) that the coefficients all have magnitude less than ${p}$. As such, the only way that this function can be identically ${1}$ on ${SL_2(F_p)}$ is if it is identically ${1}$ on ${SL_2(k)}$ for all ${k}$ with ${a,a' \neq 0}$, and hence for ${a=0}$ or ${a'=0}$ also by taking Zariski closures.

On the other hand, we know that for some choices of ${k}$, e.g. ${k={\bf R}}$, ${SL_2(k)}$ contains a copy ${\Gamma}$ of the free group on two generators (see e.g. Exercise 23 of Notes 2). As such, it is not possible for any non-identity word ${w}$ to be identically trivial on ${SL_2(k) \times SL_2(k)}$. Thus this case cannot actually occur, completing the proof of (6) and hence of Theorem 4.

Remark 5 We see from the above argument that the existence of subgroups ${\Gamma}$ of an algebraic group with good “independence” properties – such as that of generating a free group – can be useful in studying the expansion properties of that algebraic group, even if the field of interest in the latter is distinct from that of the former. For more complicated algebraic groups than ${SL_2}$, in which laws such as (4) are not always available, it turns out to be useful to place further properties on the subgroup ${\Gamma}$, for instance by requiring that all non-abelian subgroups of that group be Zariski dense (a property which has been called strong density), as this turns out to be useful for preventing random walks from concentrating in proper algebraic subgroups. See this paper of Breuillard, Guralnick, Green and Tao for constructions of strongly dense free subgroups of algebraic groups and further discussion.

Van Vu and I have just uploaded to the arXiv our paper “Random matrices: The Universality phenomenon for Wigner ensembles“. This survey is a longer version (58 pages) of a previous short survey we wrote up a few months ago. The survey focuses on recent progress in understanding the universality phenomenon for Hermitian Wigner ensembles, of which the Gaussian Unitary Ensemble (GUE) is the most well known. The one-sentence summary of this progress is that many of the asymptotic spectral statistics (e.g. correlation functions, eigenvalue gaps, determinants, etc.) that were previously known for GUE matrices, are now known for very large classes of Wigner ensembles as well. There are however a wide variety of results of this type, due to the large number of interesting spectral statistics, the varying hypotheses placed on the ensemble, and the different modes of convergence studied, and it is difficult to isolate a single such result currently as the definitive universality result. (In particular, there is at present a tradeoff between generality of ensemble and strength of convergence; the universality results that are available for the most general classes of ensemble are only presently able to demonstrate a rather weak sense of convergence to the universal distribution (involving an additional averaging in the energy parameter), which limits the applicability of such results to a number of interesting questions in which energy averaging is not permissible, such as the study of the least singular value of a Wigner matrix, or of related quantities such as the condition number or determinant. But it is conceivable that this tradeoff is a temporary phenomenon and may be eliminated by future work in this area; in the case of Hermitian matrices whose entries have the same second moments as that of the GUE ensemble, for instance, the need for energy averaging has already been removed.)

Nevertheless, throughout the family of results that have been obtained recently, there are two main methods which have been fundamental to almost all of the recent progress in extending from special ensembles such as GUE to general ensembles. The first method, developed extensively by Erdos, Schlein, Yau, Yin, and others (and building on an initial breakthrough by Johansson), is the heat flow method, which exploits the rapid convergence to equilibrium of the spectral statistics of matrices undergoing Dyson-type flows towards GUE. (An important aspect to this method is the ability to accelerate the convergence to equilibrium by localising the Hamiltonian, in order to eliminate the slowest modes of the flow; this refinement of the method is known as the “local relaxation flow” method. Unfortunately, the translation mode is not accelerated by this process, which is the principal reason why results obtained by pure heat flow methods still require an energy averaging in the final conclusion; it would of interest to find a way around this difficulty.) The other method, which goes all the way back to Lindeberg in his classical proof of the central limit theorem, and which was introduced to random matrix theory by Chatterjee and then developed for the universality problem by Van Vu and myself, is the swapping method, which is based on the observation that spectral statistics of Wigner matrices tend to be stable if one replaces just one or two entries of the matrix with another distribution, with the stability of the swapping process becoming stronger if one assumes that the old and new entries have many matching moments. The main formalisations of this observation are known as four moment theorems, because they require four matching moments between the entries, although there are some variant three moment theorems and two moment theorems in the literature as well. Our initial four moment theorems were focused on individual eigenvalues (and later also to eigenvectors), but it was later observed by Erdos, Yau, and Yin that simpler four moment theorems could also be established for aggregate spectral statistics, such as the coefficients of the Greens function, and Knowles and Yin also subsequently observed that these latter theorems could be used to recover a four moment theorem for eigenvalues and eigenvectors, giving an alternate approach to proving such theorems.

Interestingly, it seems that the heat flow and swapping methods are complementary to each other; the heat flow methods are good at removing moment hypotheses on the coefficients, while the swapping methods are good at removing regularity hypotheses. To handle general ensembles with minimal moment or regularity hypotheses, it is thus necessary to combine the two methods (though perhaps in the future a third method, or a unification of the two existing methods, might emerge).

Besides the heat flow and swapping methods, there are also a number of other basic tools that are also needed in these results, such as local semicircle laws and eigenvalue rigidity, which are also discussed in the survey. We also survey how universality has been established for wide variety of spectral statistics; the ${k}$-point correlation functions are the most well known of these statistics, but they do not tell the whole story (particularly if one can only control these functions after an averaging in the energy), and there are a number of other statistics, such as eigenvalue counting functions, determinants, or spectral gaps, for which the above methods can be applied.

In order to prevent the survey from becoming too enormous, we decided to restrict attention to Hermitian matrix ensembles, whose entries off the diagonal are identically distributed, as this is the case in which the strongest results are available. There are several results that are applicable to more general ensembles than these which are briefly mentioned in the survey, but they are not covered in detail.

We plan to submit this survey eventually to the proceedings of a workshop on random matrix theory, and will continue to update the references on the arXiv version until the time comes to actually submit the paper.

Finally, in the survey we issue some errata for previous papers of Van and myself in this area, mostly centering around the three moment theorem (a variant of the more widely used four moment theorem), for which the original proof of Van and myself was incomplete. (Fortunately, as the three moment theorem had many fewer applications than the four moment theorem, and most of the applications that it did have ended up being superseded by subsequent papers, the actual impact of this issue was limited, but still an erratum is in order.)

Van Vu and I have just uploaded to the arXiv our paper Random matrices: Sharp concentration of eigenvalues, submitted to the Electronic Journal of Probability. As with many of our previous papers, this paper is concerned with the distribution of the eigenvalues ${\lambda_1(M_n) \leq \ldots \leq \lambda_n(M_n)}$ of a random Wigner matrix ${M_n}$ (such as a matrix drawn from the Gaussian Unitary Ensemble (GUE) or Gaussian Orthogonal Ensemble (GOE)). To simplify the discussion we shall mostly restrict attention to the bulk of the spectrum, i.e. to eigenvalues ${\lambda_i(M_n)}$ with ${\delta n \leq i \leq (1-\delta) i n}$ for some fixed ${\delta>0}$, although analogues of most of the results below have also been obtained at the edge of the spectrum.

If we normalise the entries of the matrix ${M_n}$ to have mean zero and variance ${1/n}$, then in the asymptotic limit ${n \rightarrow \infty}$, we have the Wigner semicircle law, which asserts that the eigenvalues are asymptotically distributed according to the semicircular distribution ${\rho_{sc}(x)\ dx}$, where

$\displaystyle \rho_{sc}(x) := \frac{1}{2\pi} (4-x^2)_+^{1/2}.$

An essentially equivalent way of saying this is that for large ${n}$, we expect the ${i^{th}}$ eigenvalue ${\lambda_i(M_n)}$ of ${M_n}$ to stay close to the classical location ${\gamma_i \in [-2,2]}$, defined by the formula

$\displaystyle \int_{-2}^{\gamma_i} \rho_{sc}(x)\ dx = \frac{i}{n}.$

In particular, from the Wigner semicircle law it can be shown that asymptotically almost surely, one has

$\displaystyle \lambda_i(M_n) = \gamma_i + o(1) \ \ \ \ \ (1)$

for all ${1 \leq i \leq n}$.

In the modern study of the spectrum of Wigner matrices (and in particular as a key tool in establishing universality results), it has become of interest to improve the error term in (1) as much as possible. A typical early result in this direction was by Bai, who used the Stieltjes transform method to obtain polynomial convergence rates of the shape ${O(n^{-c})}$ for some absolute constant ${c>0}$; see also the subsequent papers of Alon-Krivelevich-Vu and of of Meckes, who were able to obtain such convergence rates (with exponentially high probability) by using concentration of measure tools, such as Talagrand’s inequality. On the other hand, in the case of the GUE ensemble it is known (by this paper of Gustavsson) that ${\lambda_i(M_n)}$ has variance comparable to ${\frac{\log n}{n^2}}$ in the bulk, so that the optimal error term in (1) should be about ${O(\log^{1/2} n/n)}$. (One may think that if one wanted bounds on (1) that were uniform in ${i}$, one would need to enlarge the error term further, but this does not appear to be the case, due to strong correlations between the ${\lambda_i}$; note for instance this recent result of Ben Arous and Bourgarde that the largest gap between eigenvalues in the bulk is typically of order ${O(\log^{1/2} n/n)}$.)

A significant advance in this direction was achieved by Erdos, Schlein, and Yau in a series of papers where they used a combination of Stieltjes transform and concentration of measure methods to obtain local semicircle laws which showed, among other things, that one had asymptotics of the form

$\displaystyle N(I) = (1+o(1)) \int_I \rho_{sc}(x)\ dx$

with exponentially high probability for intervals ${I}$ in the bulk that were as short as ${n^{-1+\epsilon}}$ for some ${\epsilon>0}$, where ${N(I)}$ is the number of eigenvalues. These asymptotics are consistent with a good error term in (1), and are already sufficient for many applications, but do not quite imply a strong concentration result for individual eigenvalues ${\lambda_i}$ (basically because they do not preclude long-range or “secular” shifts in the spectrum that involve large blocks of eigenvalues at mesoscopic scales). Nevertheless, this was rectified in a subsequent paper of Erdos, Yau, and Yin, which roughly speaking obtained a bound of the form

$\displaystyle \lambda_i(M_n) = \gamma_i + O( \frac{\log^{O(\log\log n)} n}{n} )$

in the bulk with exponentially high probability, for Wigner matrices obeying some exponential decay conditions on the entries. This was achieved by a rather delicate high moment calculation, in which the contribution of the diagonal entries of the resolvent (whose average forms the Stieltjes transform) was shown to mostly cancel each other out.

As the GUE computations show, this concentration result is sharp up to the quasilogarithmic factor ${\log^{O(\log\log n)} n}$. The main result of this paper is to improve the concentration result to one more in line with the GUE case, namely

$\displaystyle \lambda_i(M_n) = \gamma_i + O( \frac{\log^{O(1)} n}{n} )$

with exponentially high probability (see the paper for a more precise statement of results). The one catch is that an additional hypothesis is required, namely that the entries of the Wigner matrix have vanishing third moment. We also obtain similar results for the edge of the spectrum (but with a different scaling).

Our arguments are rather different from those of Erdos, Yau, and Yin, and thus provide an alternate approach to establishing eigenvalue concentration. The main tool is the Lindeberg exchange strategy, which is also used to prove the Four Moment Theorem (although we do not directly invoke the Four Moment Theorem in our analysis). The main novelty is that this exchange strategy is now used to establish large deviation estimates (i.e. exponentially small tail probabilities) rather than universality of the limiting distribution. Roughly speaking, the basic point is as follows. The Lindeberg exchange strategy seeks to compare a function ${F(X_1,\ldots,X_n)}$ of many independent random variables ${X_1,\ldots,X_n}$ with the same function ${F(Y_1,\ldots,Y_n)}$ of a different set of random variables (which match moments with the original set of variables to some order, such as to second or fourth order) by exchanging the random variables one at a time. Typically, one tries to upper bound expressions such as

$\displaystyle {\bf E} \phi(F(X_1,\ldots,X_n)) - \phi(F(X_1,\ldots,X_{n-1},Y_n))$

for various smooth test functions ${\phi}$, by performing a Taylor expansion in the variable being swapped and taking advantage of the matching moment hypotheses. In previous implementations of this strategy, ${\phi}$ was a bounded test function, which allowed one to get control of the bulk of the distribution of ${F(X_1,\ldots,X_n)}$, and in particular in controlling probabilities such as

$\displaystyle {\bf P}( a \leq F(X_1,\ldots,X_n) \leq b )$

for various thresholds ${a}$ and ${b}$, but did not give good control on the tail as the error terms tended to be polynomially decaying in ${n}$ rather than exponentially decaying. However, it turns out that one can modify the exchange strategy to deal with moments such as

$\displaystyle {\bf E} (1 + F(X_1,\ldots,X_n)^2)^k$

for various moderately large ${k}$ (e.g. of size comparable to ${\log n}$), obtaining results such as

$\displaystyle {\bf E} (1 + F(Y_1,\ldots,Y_n)^2)^k = (1+o(1)) {\bf E} (1 + F(X_1,\ldots,X_n)^2)^k$

after performing all the relevant exchanges. As such, one can then use large deviation estimates on ${F(X_1,\ldots,X_n)}$ to deduce large deviation estimates on ${F(Y_1,\ldots,Y_n)}$.

In this paper we also take advantage of a simplification, first noted by Erdos, Yau, and Yin, that Four Moment Theorems become somewhat easier to prove if one works with resolvents ${(M_n-z)^{-1}}$ (and the closely related Stieltjes transform ${s(z) := \frac{1}{n} \hbox{tr}( (M_n-z)^{-1} )}$) rather than with individual eigenvalues, as the Taylor expansion of resolvents are very simple (essentially being a Neumann series). The relationship between the Stieltjes transform and the location of individual eigenvalues can be seen by taking advantage of the identity

$\displaystyle \frac{\pi}{2} - \frac{\pi}{n} N((-\infty,E)) = \int_0^\infty \hbox{Re} s(E + i \eta)\ d\eta$

for any energy level ${E \in {\bf R}}$, which can be verified from elementary calculus. (In practice, we would truncate ${\eta}$ near zero and near infinity to avoid some divergences, but this is a minor technicality.) As such, a concentration result for the Stieltjes transform can be used to establish an analogous concentration result for the eigenvalue counting functions ${N((-\infty,E))}$, which in turn can be used to deduce concentration results for individual eigenvalues ${\lambda_i(M_n)}$ by some basic combinatorial manipulations.

Van Vu and I have just uploaded to the arXiv our short survey article, “Random matrices: The Four Moment Theorem for Wigner ensembles“, submitted to the MSRI book series, as part of the proceedings on the MSRI semester program on random matrix theory from last year.  This is a highly condensed version (at 17 pages) of a much longer survey (currently at about 48 pages, though not completely finished) that we are currently working on, devoted to the recent advances in understanding the universality phenomenon for spectral statistics of Wigner matrices.  In this abridged version of the survey, we focus on a key tool in the subject, namely the Four Moment Theorem which roughly speaking asserts that the statistics of a Wigner matrix depend only on the first four moments of the entries.  We give a sketch of proof of this theorem, and two sample applications: a central limit theorem for individual eigenvalues of a Wigner matrix (extending a result of Gustavsson in the case of GUE), and the verification of a conjecture of Wigner, Dyson, and Mehta on the universality of the asymptotic k-point correlation functions even for discrete ensembles (provided that we interpret convergence in the vague topology sense).

For reasons of space, this paper is very far from an exhaustive survey even of the narrow topic of universality for Wigner matrices, but should hopefully be an accessible entry point into the subject nevertheless.

Van Vu and I have just uploaded to the arXiv our paper A central limit theorem for the determinant of a Wigner matrix, submitted to Adv. Math.. It studies the asymptotic distribution of the determinant ${\det M_n}$ of a random Wigner matrix (such as a matrix drawn from the Gaussian Unitary Ensemble (GUE) or Gaussian Orthogonal Ensemble (GOE)).

Before we get to these results, let us first discuss the simpler problem of studying the determinant ${\det A_n}$ of a random iid matrix ${A_n = (\zeta_{ij})_{1 \leq i,j \leq n}}$, such as a real gaussian matrix (where all entries are independently and identically distributed using the standard real normal distribution ${\zeta_{ij} \equiv N(0,1)_{\bf R}}$), a complex gaussian matrix (where all entries are independently and identically distributed using the standard complex normal distribution ${\zeta_{ij} \equiv N(0,1)_{\bf C}}$, thus the real and imaginary parts are independent with law ${N(0,1/2)_{\bf R}}$), or the random sign matrix (in which all entries are independently and identically distributed according to the Bernoulli distribution ${\zeta_{ij} \equiv \pm 1}$ (with a ${1/2}$ chance of either sign). More generally, one can consider a matrix ${A_n}$ in which all the entries ${\zeta_{ij}}$ are independently and identically distributed with mean zero and variance ${1}$.

We can expand ${\det A_n}$ using the Leibniz expansion

$\displaystyle \det A_n = \sum_{\sigma \in S_n} I_\sigma, \ \ \ \ \ (1)$

where ${\sigma: \{1,\ldots,n\} \rightarrow \{1,\ldots,n\}}$ ranges over the permutations of ${\{1,\ldots,n\}}$, and ${I_\sigma}$ is the product

$\displaystyle I_\sigma := \hbox{sgn}(\sigma) \prod_{i=1}^n \zeta_{i\sigma(i)}.$

From the iid nature of the ${\zeta_{ij}}$, we easily see that each ${I_\sigma}$ has mean zero and variance one, and are pairwise uncorrelated as ${\sigma}$ varies. We conclude that ${\det A_n}$ has mean zero and variance ${n!}$ (an observation first made by Turán). In particular, from Chebyshev’s inequality we see that ${\det A_n}$ is typically of size ${O(\sqrt{n!})}$.

It turns out, though, that this is not quite best possible. This is easiest to explain in the real gaussian case, by performing a computation first made by Goodman. In this case, ${\det A_n}$ is clearly symmetrical, so we can focus attention on the magnitude ${|\det A_n|}$. We can interpret this quantity geometrically as the volume of an ${n}$-dimensional parallelopiped whose generating vectors ${X_1,\ldots,X_n}$ are independent real gaussian vectors in ${{\bf R}^n}$ (i.e. their coefficients are iid with law ${N(0,1)_{\bf R}}$). Using the classical base-times-height formula, we thus have

$\displaystyle |\det A_n| = \prod_{i=1}^n \hbox{dist}(X_i, V_i) \ \ \ \ \ (2)$

where ${V_i}$ is the ${i-1}$-dimensional linear subspace of ${{\bf R}^n}$ spanned by ${X_1,\ldots,X_{i-1}}$ (note that ${X_1,\ldots,X_n}$, having an absolutely continuous joint distribution, are almost surely linearly independent). Taking logarithms, we conclude

$\displaystyle \log |\det A_n| = \sum_{i=1}^n \log \hbox{dist}(X_i, V_i).$

Now, we take advantage of a fundamental symmetry property of the Gaussian vector distribution, namely its invariance with respect to the orthogonal group ${O(n)}$. Because of this, we see that if we fix ${X_1,\ldots,X_{i-1}}$ (and thus ${V_i}$, the random variable ${\hbox{dist}(X_i,V_i)}$ has the same distribution as ${\hbox{dist}(X_i,{\bf R}^{i-1})}$, or equivalently the ${\chi}$ distribution

$\displaystyle \chi_{n-i+1} := (\sum_{j=1}^{n-i+1} \xi_{n-i+1,j}^2)^{1/2}$

where ${\xi_{n-i+1,1},\ldots,\xi_{n-i+1,n-i+1}}$ are iid copies of ${N(0,1)_{\bf R}}$. As this distribution does not depend on the ${X_1,\ldots,X_{i-1}}$, we conclude that the law of ${\log |\det A_n|}$ is given by the sum of ${n}$ independent ${\chi}$-variables:

$\displaystyle \log |\det A_n| \equiv \sum_{j=1}^{n} \log \chi_j.$

A standard computation shows that each ${\chi_j^2}$ has mean ${j}$ and variance ${2j}$, and then a Taylor series (or Ito calculus) computation (using concentration of measure tools to control tails) shows that ${\log \chi_j}$ has mean ${\frac{1}{2} \log j - \frac{1}{2j} + O(1/j^{3/2})}$ and variance ${\frac{1}{2j}+O(1/j^{3/2})}$. As such, ${\log |\det A_n|}$ has mean ${\frac{1}{2} \log n! - \frac{1}{2} \log n + O(1)}$ and variance ${\frac{1}{2} \log n + O(1)}$. Applying a suitable version of the central limit theorem, one obtains the asymptotic law

$\displaystyle \frac{\log |\det A_n| - \frac{1}{2} \log n! + \frac{1}{2} \log n}{\sqrt{\frac{1}{2}\log n}} \rightarrow N(0,1)_{\bf R}, \ \ \ \ \ (3)$

where ${\rightarrow}$ denotes convergence in distribution. A bit more informally, we have

$\displaystyle |\det A_n| \approx n^{-1/2} \sqrt{n!} \exp( N( 0, \log n / 2 )_{\bf R} ) \ \ \ \ \ (4)$

when ${A_n}$ is a real gaussian matrix; thus, for instance, the median value of ${|\det A_n|}$ is ${n^{-1/2+o(1)} \sqrt{n!}}$. At first glance, this appears to conflict with the second moment bound ${\mathop{\bf E} |\det A_n|^2 = n!}$ of Turán mentioned earlier, but once one recalls that ${\exp(N(0,t)_{\bf R})}$ has a second moment of ${\exp(2t)}$, we see that the two facts are in fact perfectly consistent; the upper tail of the normal distribution in the exponent in (4) ends up dominating the second moment.

It turns out that the central limit theorem (3) is valid for any real iid matrix with mean zero, variance one, and an exponential decay condition on the entries; this was first claimed by Girko, though the arguments in that paper appear to be incomplete. Another proof of this result, with more quantitative bounds on the convergence rate has been recently obtained by Hoi Nguyen and Van Vu. The basic idea in these arguments is to express the sum in (2) in terms of a martingale and apply the martingale central limit theorem.

If one works with complex gaussian random matrices instead of real gaussian random matrices, the above computations change slightly (one has to replace the real ${\chi}$ distribution with the complex ${\chi}$ distribution, in which the ${\xi_{i,j}}$ are distributed according to the complex gaussian ${N(0,1)_{\bf C}}$ instead of the real one). At the end of the day, one ends up with the law

$\displaystyle \frac{\log |\det A_n| - \frac{1}{2} \log n! + \frac{1}{4} \log n}{\sqrt{\frac{1}{4}\log n}} \rightarrow N(0,1)_{\bf R}, \ \ \ \ \ (5)$

$\displaystyle |\det A_n| \approx n^{-1/4} \sqrt{n!} \exp( N( 0, \log n / 4 )_{\bf R} ) \ \ \ \ \ (6)$

(but note that this new asymptotic is still consistent with Turán’s second moment calculation).

We can now turn to the results of our paper. Here, we replace the iid matrices ${A_n}$ by Wigner matrices ${M_n = (\zeta_{ij})_{1 \leq i,j \leq n}}$, which are defined similarly but are constrained to be Hermitian (or real symmetric), thus ${\zeta_{ij} = \overline{\zeta_{ji}}}$ for all ${i,j}$. Model examples here include the Gaussian Unitary Ensemble (GUE), in which ${\zeta_{ij} \equiv N(0,1)_{\bf C}}$ for ${1 \leq i < j \leq n}$ and ${\zeta_{ij} \equiv N(0,1)_{\bf R}}$ for ${1 \leq i=j \leq n}$, the Gaussian Orthogonal Ensemble (GOE), in which ${\zeta_{ij} \equiv N(0,1)_{\bf R}}$ for ${1 \leq i < j \leq n}$ and ${\zeta_{ij} \equiv N(0,2)_{\bf R}}$ for ${1 \leq i=j \leq n}$, and the symmetric Bernoulli ensemble, in which ${\zeta_{ij} \equiv \pm 1}$ for ${1 \leq i \leq j \leq n}$ (with probability ${1/2}$ of either sign). In all cases, the upper triangular entries of the matrix are assumed to be jointly independent. For a more precise definition of the Wigner matrix ensembles we are considering, see the introduction to our paper.

The determinants ${\det M_n}$ of these matrices still have a Leibniz expansion. However, in the Wigner case, the mean and variance of the ${I_\sigma}$ are slightly different, and what is worse, they are not all pairwise uncorrelated any more. For instance, the mean of ${I_\sigma}$ is still usually zero, but equals ${(-1)^{n/2}}$ in the exceptional case when ${\sigma}$ is a perfect matching (i.e. the union of exactly ${n/2}$ ${2}$-cycles, a possibility that can of course only happen when ${n}$ is even). As such, the mean ${\mathop{\bf E} \det M_n}$ still vanishes when ${n}$ is odd, but for even ${n}$ it is equal to

$\displaystyle (-1)^{n/2} \frac{n!}{(n/2)!2^{n/2}}$

(the fraction here simply being the number of perfect matchings on ${n}$ vertices). Using Stirling’s formula, one then computes that ${|\mathop{\bf E} \det A_n|}$ is comparable to ${n^{-1/4} \sqrt{n!}}$ when ${n}$ is large and even. The second moment calculation is more complicated (and uses facts about the distribution of cycles in random permutations, mentioned in this previous post), but one can compute that ${\mathop{\bf E} |\det A_n|^2}$ is comparable to ${n^{1/2} n!}$ for GUE and ${n^{3/2} n!}$ for GOE. (The discrepancy here comes from the fact that in the GOE case, ${I_\sigma}$ and ${I_\rho}$ can correlate when ${\rho}$ contains reversals of ${k}$-cycles of ${\sigma}$ for ${k \geq 3}$, but this does not happen in the GUE case.) For GUE, much more precise asymptotics for the moments of the determinant are known, starting from the work of Brezin and Hikami, though we do not need these more sophisticated computations here.

Our main results are then as follows.

Theorem 1 Let ${M_n}$ be a Wigner matrix.

• If ${M_n}$ is drawn from GUE, then

$\displaystyle \frac{\log |\det M_n| - \frac{1}{2} \log n! + \frac{1}{4} \log n}{\sqrt{\frac{1}{2}\log n}} \rightarrow N(0,1)_{\bf R}.$

• If ${M_n}$ is drawn from GOE, then

$\displaystyle \frac{\log |\det M_n| - \frac{1}{2} \log n! + \frac{1}{4} \log n}{\sqrt{\log n}} \rightarrow N(0,1)_{\bf R}.$

• The previous two results also hold for more general Wigner matrices, assuming that the real and imaginary parts are independent, a finite moment condition is satisfied, and the entries match moments with those of GOE or GUE to fourth order. (See the paper for a more precise formulation of the result.)

Thus, we informally have

$\displaystyle |\det M_n| \approx n^{-1/4} \sqrt{n!} \exp( N( 0, \log n / 2 )_{\bf R} )$

when ${M_n}$ is drawn from GUE, or from another Wigner ensemble matching GUE to fourth order (and obeying some additional minor technical hypotheses); and

$\displaystyle |\det M_n| \approx n^{-1/4} \sqrt{n!} \exp( N( 0, \log n )_{\bf R} )$

when ${M_n}$ is drawn from GOE, or from another Wigner ensemble matching GOE to fourth order. Again, these asymptotic limiting distributions are consistent with the asymptotic behaviour for the second moments.

The extension from the GUE or GOE case to more general Wigner ensembles is a fairly routine application of the four moment theorem for Wigner matrices, although for various technical reasons we do not quite use the existing four moment theorems in the literature, but adapt them to the log determinant. The main idea is to express the log-determinant as an integral

$\displaystyle \log|\det M_n| = \frac{1}{2} n \log n - n \hbox{Im} \int_0^\infty s(\sqrt{-1}\eta)\ d\eta \ \ \ \ \ (7)$

of the Stieltjes transform

$\displaystyle s(z) := \frac{1}{n} \hbox{tr}( \frac{1}{\sqrt{n}} M_n - z )^{-1}$

of ${M_n}$. Strictly speaking, the integral in (7) is divergent at infinity (and also can be ill-behaved near zero), but this can be addressed by standard truncation and renormalisation arguments (combined with known facts about the least singular value of Wigner matrices), which we omit here. We then use a variant of the four moment theorem for the Stieltjes transform, as used by Erdos, Yau, and Yin (based on a previous four moment theorem for individual eigenvalues introduced by Van Vu and myself). The four moment theorem is proven by the now-standard Lindeberg exchange method, combined with the usual resolvent identities to control the behaviour of the resolvent (and hence the Stieltjes transform) with respect to modifying one or two entries, together with the delocalisation of eigenvector property (which in turn arises from local semicircle laws) to control the error terms.

Somewhat surprisingly (to us, at least), it turned out that it was the first part of the theorem (namely, the verification of the limiting law for the invariant ensembles GUE and GOE) that was more difficult than the extension to the Wigner case. Even in an ensemble as highly symmetric as GUE, the rows are no longer independent, and the formula (2) is basically useless for getting any non-trivial control on the log determinant. There is an explicit formula for the joint distribution of the eigenvalues of GUE (or GOE), which does eventually give the distribution of the cumulants of the log determinant, which then gives the required central limit theorem; but this is a lengthy computation, first performed by Delannay and Le Caer.

Following a suggestion of my colleague, Rowan Killip, we give an alternate proof of this central limit theorem in the GUE and GOE cases, by using a beautiful observation of Trotter, namely that the GUE or GOE ensemble can be conjugated into a tractable tridiagonal form. Let me state it just for GUE:

Proposition 2 (Tridiagonal form of GUE) \cite{trotter} Let ${M'_n}$ be the random tridiagonal real symmetric matrix

$\displaystyle M'_n = \begin{pmatrix} a_1 & b_1 & 0 & \ldots & 0 & 0 \\ b_1 & a_2 & b_2 & \ldots & 0 & 0 \\ 0 & b_2 & a_3 & \ldots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \ldots & a_{n-1} & b_{n-1} \\ 0 & 0 & 0 & \ldots & b_{n-1} & a_n \end{pmatrix}$

where the ${a_1,\ldots,a_n, b_1,\ldots,b_{n-1}}$ are jointly independent real random variables, with ${a_1,\ldots,a_n \equiv N(0,1)_{\bf R}}$ being standard real Gaussians, and each ${b_i}$ having a ${\chi}$-distribution:

$\displaystyle b_i = (\sum_{j=1}^i |z_{i,j}|^2)^{1/2}$

where ${z_{i,j} \equiv N(0,1)_{\bf C}}$ are iid complex gaussians. Let ${M_n}$ be drawn from GUE. Then the joint eigenvalue distribution of ${M_n}$ is identical to the joint eigenvalue distribution of ${M'_n}$.

Proof: Let ${M_n}$ be drawn from GUE. We can write

$\displaystyle M_n = \begin{pmatrix} M_{n-1} & X_n \\ X_n^* & a_n \end{pmatrix}$

where ${M_{n-1}}$ is drawn from the ${n-1\times n-1}$ GUE, ${a_n \equiv N(0,1)_{\bf R}}$, and ${X_n \in {\bf C}^{n-1}}$ is a random gaussian vector with all entries iid with distribution ${N(0,1)_{\bf C}}$. Furthermore, ${M_{n-1}, X_n, a_n}$ are jointly independent.

We now apply the tridiagonal matrix algorithm. Let ${b_{n-1} := |X_n|}$, then ${b_n}$ has the ${\chi}$-distribution indicated in the proposition. We then conjugate ${M_n}$ by a unitary matrix ${U}$ that preserves the final basis vector ${e_n}$, and maps ${X}$ to ${b_{n-1} e_{n-1}}$. Then we have

$\displaystyle U M_n U^* = \begin{pmatrix} \tilde M_{n-1} & b_{n-1} e_{n-1} \\ b_{n-1} e_{n-1}^* & a_n \end{pmatrix}$

where ${\tilde M_{n-1}}$ is conjugate to ${M_{n-1}}$. Now we make the crucial observation: because ${M_{n-1}}$ is distributed according to GUE (which is a unitarily invariant ensemble), and ${U}$ is a unitary matrix independent of ${M_{n-1}}$, ${\tilde M_{n-1}}$ is also distributed according to GUE, and remains independent of both ${b_{n-1}}$ and ${a_n}$.

We continue this process, expanding ${U M_n U^*}$ as

$\displaystyle \begin{pmatrix} M_{n-2} & X_{n-1} & 0 \\ X_{n-1}^* & a_{n-1} & b_{n-1} \\ 0 & b_{n-1} & a_n. \end{pmatrix}$

Applying a further unitary conjugation that fixes ${e_{n-1}, e_n}$ but maps ${X_{n-1}}$ to ${b_{n-2} e_{n-2}}$, we may replace ${X_{n-1}}$ by ${b_{n-2} e_{n-2}}$ while transforming ${M_{n-2}}$ to another GUE matrix ${\tilde M_{n-2}}$ independent of ${a_n, b_{n-1}, a_{n-1}, b_{n-2}}$. Iterating this process, we eventually obtain a coupling of ${M_n}$ to ${M'_n}$ by unitary conjugations, and the claim follows. $\Box$

The determinant of a tridiagonal matrix is not quite as simple as the determinant of a triangular matrix (in which it is simply the product of the diagonal entries), but it is pretty close: the determinant ${D_n}$ of the above matrix is given by solving the recursion

$\displaystyle D_i = a_i D_{i-1} + b_{i-1}^2 D_{i-2}$

with ${D_0=1}$ and ${D_{-1} = 0}$. Thus, instead of the product of a sequence of independent scalar ${\chi}$ distributions as in the gaussian matrix case, the determinant of GUE ends up being controlled by the product of a sequence of independent ${2\times 2}$ matrices whose entries are given by gaussians and ${\chi}$ distributions. In this case, one cannot immediately take logarithms and hope to get something for which the martingale central limit theorem can be applied, but some ad hoc manipulation of these ${2 \times 2}$ matrix products eventually does make this strategy work. (Roughly speaking, one has to work with the logarithm of the Frobenius norm of the matrix first.)

Let ${n}$ be a natural number, and let ${\sigma: \{1,\ldots,n\} \rightarrow \{1,\ldots,n\}}$ be a permutation of ${\{1,\ldots,n\}}$, drawn uniformly at random. Using the cycle decomposition, one can view ${\sigma}$ as the disjoint union of cycles of varying lengths (from ${1}$ to ${n}$). For each ${1 \leq k \leq n}$, let ${C_k}$ denote the number of cycles of ${\sigma}$ of length ${k}$; thus the ${C_k}$ are natural number-valued random variables with the constraint

$\displaystyle \sum_{k=1}^n k C_k = n. \ \ \ \ \ (1)$

We let ${C := \sum_{k=1}^n C_k}$ be the number of cycles (of arbitrary length); this is another natural number-valued random variable, of size at most ${n}$.

I recently had need to understand the distribution of the random variables ${C_k}$ and ${C}$. As it turns out this is an extremely classical subject, but as an exercise I worked out what I needed using a quite tedious computation involving generating functions that I will not reproduce here. But the resulting identities I got were so nice, that they strongly suggested the existence of elementary bijective (or “double counting”) proofs, in which the identities are proven with a minimum of computation, by interpreting each side of the identity as the cardinality (or probability) of the same quantity (or event), viewed in two different ways. I then found these bijective proofs, which I found to be rather cute; again, these are all extremely classical (closely related, for instance, to Stirling numbers of the first kind), but I thought some readers might be interested in trying to find these proofs themselves as an exercise (and I also wanted a place to write the identities down so I could retrieve them later), so I have listed the identities I found below.

1. For any ${1 \leq k \leq n}$, one has ${{\bf E} C_k = \frac{1}{k}}$. In particular, ${{\bf E} C = 1 + \frac{1}{2} + \ldots + \frac{1}{n} = \log n + O(1)}$.
2. More generally, for any ${1 \leq k \leq n}$ and ${j \geq 1}$ with ${jk \leq n}$, one has ${{\bf E} \binom{C_k}{j} = \frac{1}{k^j j!}}$.
3. More generally still, for any ${1 \leq k_1 < \ldots < k_r \leq n}$ and ${j_1,\ldots,j_r \geq 1}$ with ${\sum_{i=1}^r j_i k_i \leq n}$, one has

$\displaystyle {\bf E} \prod_{i=1}^r \binom{C_{k_i}}{j_i} = \prod_{i=1}^r \frac{1}{k_i^{j_i} j_i!}.$

4. In particular, we have Cauchy’s formula: if ${\sum_{k=1}^n j_k k = n}$, then the probability that ${C_k = j_k}$ for all ${k=1,\ldots,n}$ is precisely ${\prod_{k=1}^n \frac{1}{k^{j_k} j_k!}}$. (This in particular leads to a reasonably tractable formula for the joint generating function of the ${C_k}$, which is what I initially used to compute everything that I needed, before finding the slicker bijective proofs.)
5. For fixed ${k}$, ${C_k}$ converges in distribution as ${n \rightarrow \infty}$ to the Poisson distribution of intensity ${\frac{1}{k}}$.
6. More generally, for fixed ${1 \leq k_1 < \ldots < k_r}$, ${C_{k_1},\ldots,C_{k_r}}$ converge in joint distribution to ${r}$ independent Poisson distributions of intensity ${\frac{1}{k_1},\ldots,\frac{1}{k_r}}$ respectively. (A more precise version of this claim can be found in this paper of Arratia and Tavaré.)
7. One has ${{\bf E} 2^C = n+1}$.
8. More generally, one has ${{\bf E} m^C = \binom{n+m-1}{n}}$ for all natural numbers ${m}$.

Van Vu and I have just uploaded to the arXiv our paper “Random matrices: Universality of eigenvectors“, submitted to Random Matrices: Theory and Applications. This paper concerns an extension of our four moment theorem for eigenvalues. Roughly speaking, that four moment theorem asserts (under mild decay conditions on the coefficients of the random matrix) that the fine-scale structure of individual eigenvalues of a Wigner random matrix depend only on the first four moments of each of the entries.

In this paper, we extend this result from eigenvalues to eigenvectors, and specifically to the coefficients of, say, the ${i^{th}}$ eigenvector ${u_i(M_n)}$ of a Wigner random matrix ${M_n}$. Roughly speaking, the main result is that the distribution of these coefficients also only depends on the first four moments of each of the entries. In particular, as the distribution of coefficients eigenvectors of invariant ensembles such as GOE or GUE are known to be asymptotically gaussian real (in the GOE case) or gaussian complex (in the GUE case), the same asymptotic automatically holds for Wigner matrices whose coefficients match GOE or GUE to fourth order.

(A technical point here: strictly speaking, the eigenvectors ${u_i(M_n)}$ are only determined up to a phase, even when the eigenvalues are simple. So, to phrase the question properly, one has to perform some sort of normalisation, for instance by working with the coefficients of the spectral projection operators ${P_i(M_n) := u_i(M_n) u_i(M_n)^*}$ instead of the eigenvectors, or rotating each eigenvector by a random phase, or by fixing the first component of each eigenvector to be positive real. This is a fairly minor technical issue here, though, and will not be discussed further.)

This theorem strengthens a four moment theorem for eigenvectors recently established by Knowles and Yin (by a somewhat different method), in that the hypotheses are weaker (no level repulsion assumption is required, and the matrix entries only need to obey a finite moment condition rather than an exponential decay condition), and a slightly stronger conclusion (less regularity is needed on the test function, and one can handle the joint distribution of polynomially many coefficients, rather than boundedly many coefficients). On the other hand, the Knowles-Yin paper can also handle generalised Wigner ensembles in which the variances of the entries are allowed to fluctuate somewhat.

The method used here is a variation of that in our original paper (incorporating the subsequent improvements to extend the four moment theorem from the bulk to the edge, and to replace exponential decay by a finite moment condition). That method was ultimately based on the observation that if one swapped a single entry (and its adjoint) in a Wigner random matrix, then an individual eigenvalue ${\lambda_i(M_n)}$ would not fluctuate much as a consequence (as long as one had already truncated away the event of an unexpectedly small eigenvalue gap). The same analysis shows that the projection matrices ${P_i(M_n)}$ obeys the same stability property.

As an application of the eigenvalue four moment theorem, we establish a four moment theorem for the coefficients of resolvent matrices ${(\frac{1}{\sqrt{n}} M_n - zI)^{-1}}$, even when ${z}$ is on the real axis (though in that case we need to make a level repulsion hypothesis, which has been already verified in many important special cases and is likely to be true in general). This improves on an earlier four moment theorem for resolvents of Erdos, Yau, and Yin, which required ${z}$ to stay some distance away from the real axis (specifically, that ${\hbox{Im}(z) \geq n^{-1-c}}$ for some small ${c>0}$).

Let ${G = (G,+)}$ be an abelian countable discrete group. A measure-preserving ${G}$-system ${X = (X, {\mathcal X}, \mu, (T_g)_{g \in G})}$ (or ${G}$-system for short) is a probability space ${(X, {\mathcal X}, \mu)}$, equipped with a measure-preserving action ${T_g: X \rightarrow X}$ of the group ${G}$, thus

$\displaystyle \mu( T_g(E) ) = \mu(E)$

for all ${E \in {\mathcal X}}$ and ${g \in G}$, and

$\displaystyle T_g T_h = T_{g+h}$

for all ${g, h \in G}$, with ${T_0}$ equal to the identity map. Classically, ergodic theory has focused on the cyclic case ${G={\bf Z}}$ (in which the ${T_g}$ are iterates of a single map ${T = T_1}$, with elements of ${G}$ being interpreted as a time parameter), but one can certainly consider actions of other groups ${G}$ also (including continuous or non-abelian groups).

A ${G}$-system is said to be strongly ${2}$-mixing, or strongly mixing for short, if one has

$\displaystyle \lim_{g \rightarrow \infty} \mu( A \cap T_g B ) = \mu(A) \mu(B)$

for all ${A, B \in {\mathcal X}}$, where the convergence is with respect to the one-point compactification of ${G}$ (thus, for every ${\epsilon > 0}$, there exists a compact (hence finite) subset ${K}$ of ${G}$ such that ${|\mu(A \cap T_g B) - \mu(A)\mu(B)| \leq \epsilon}$ for all ${g \not \in K}$).

Similarly, we say that a ${G}$-system is strongly ${3}$-mixing if one has

$\displaystyle \lim_{g,h,h-g \rightarrow \infty} \mu( A \cap T_g B \cap T_h C ) = \mu(A) \mu(B) \mu(C)$

for all ${A,B,C \in {\mathcal X}}$, thus for every ${\epsilon > 0}$, there exists a finite subset ${K}$ of ${G}$ such that

$\displaystyle |\mu( A \cap T_g B \cap T_h C ) - \mu(A) \mu(B) \mu(C)| \leq \epsilon$

whenever ${g, h, h-g}$ all lie outside ${K}$.

It is obvious that a strongly ${3}$-mixing system is necessarily strong ${2}$-mixing. In the case of ${{\bf Z}}$-systems, it has been an open problem for some time, due to Rohlin, whether the converse is true:

Problem 1 (Rohlin’s problem) Is every strongly mixing ${{\bf Z}}$-system necessarily strongly ${3}$-mixing?

This is a surprisingly difficult problem. In the positive direction, a routine application of the Cauchy-Schwarz inequality (via van der Corput’s inequality) shows that every strongly mixing system is weakly ${3}$-mixing, which roughly speaking means that ${\mu(A \cap T_g B \cap T_h C)}$ converges to ${\mu(A) \mu(B) \mu(C)}$ for most ${g, h \in {\bf Z}}$. Indeed, every weakly mixing system is in fact weakly mixing of all orders; see for instance this blog post of Carlos Matheus, or these lecture notes of myself. So the problem is to exclude the possibility of correlation between ${A}$, ${T_g B}$, and ${T_h C}$ for a small but non-trivial number of pairs ${(g,h)}$.

It is also known that the answer to Rohlin’s problem is affirmative for rank one transformations (a result of Kalikow) and for shifts with purely singular continuous spectrum (a result of Host; note that strongly mixing systems cannot have any non-trivial point spectrum). Indeed, any counterexample to the problem, if it exists, is likely to be highly pathological.

In the other direction, Rohlin’s problem is known to have a negative answer for ${{\bf Z}^2}$-systems, by a well-known counterexample of Ledrappier which can be described as follows. One can view a ${{\bf Z}^2}$-system as being essentially equivalent to a stationary process ${(x_{n,m})_{(n,m) \in {\bf Z}^2}}$ of random variables ${x_{n,m}}$ in some range space ${\Omega}$ indexed by ${{\bf Z}^2}$, with ${X}$ being ${\Omega^{{\bf Z}^2}}$ with the obvious shift map

$\displaystyle T_{(g,h)} (x_{n,m})_{(n,m) \in {\bf Z}^2} := (x_{n-g,m-h})_{(n,m) \in {\bf Z}^2}.$

In Ledrappier’s example, the ${x_{n,m}}$ take values in the finite field ${{\bf F}_2}$ of two elements, and are selected at uniformly random subject to the “Pascal’s triangle” linear constraints

$\displaystyle x_{n,m} = x_{n-1,m} + x_{n,m-1}.$

A routine application of the Kolmogorov extension theorem allows one to build such a process. The point is that due to the properties of Pascal’s triangle modulo ${2}$ (known as Sierpinski’s triangle), one has

$\displaystyle x_{n,m} = x_{n-2^k,m} + x_{n,m-2^k}$

for all powers of two ${2^k}$. This is enough to destroy strong ${3}$-mixing, because it shows a strong correlation between ${x}$, ${T_{(2^k,0)} x}$, and ${T_{(0,2^k)} x}$ for arbitrarily large ${k}$ and randomly chosen ${x \in X}$. On the other hand, one can still show that ${x}$ and ${T_g x}$ are asymptotically uncorrelated for large ${g}$, giving strong ${2}$-mixing. Unfortunately, there are significant obstructions to converting Ledrappier’s example from a ${{\bf Z}^2}$-system to a ${{\bf Z}}$-system, as pointed out by de la Rue.

In this post, I would like to record a “finite field” variant of Ledrappier’s construction, in which ${{\bf Z}^2}$ is replaced by the function field ring ${{\bf F}_3[t]}$, which is a “dyadic” (or more precisely, “triadic”) model for the integers (cf. this earlier blog post of mine). In other words:

Theorem 2 There exists a ${{\bf F}_3[t]}$-system that is strongly ${2}$-mixing but not strongly ${3}$-mixing.

The idea is much the same as that of Ledrappier; one builds a stationary ${{\bf F}_3[t]}$-process ${(x_n)_{n \in {\bf F}_3[t]}}$ in which ${x_n \in {\bf F}_3}$ are chosen uniformly at random subject to the constraints

$\displaystyle x_n + x_{n + t^k} + x_{n + 2t^k} = 0 \ \ \ \ \ (1)$

for all ${n \in {\bf F}_3[t]}$ and all ${k \geq 0}$. Again, this system is manifestly not strongly ${3}$-mixing, but can be shown to be strongly ${2}$-mixing; I give details below the fold.

As I discussed in this previous post, in many cases the dyadic model serves as a good guide for the non-dyadic model. However, in this case there is a curious rigidity phenomenon that seems to prevent Ledrappier-type examples from being transferable to the one-dimensional non-dyadic setting; once one restores the Archimedean nature of the underlying group, the constraints (1) not only reinforce each other strongly, but also force so much linearity on the system that one loses the strong mixing property.