Van Vu and I have just uploaded to the arXiv our paper “Random matrices: universality of local eigenvalue statistics“, submitted to Acta Math..  This paper concerns the eigenvalues \lambda_1(M_n) \leq \ldots \leq \lambda_n(M_n) of a Wigner matrix M_n = (\zeta_{ij})_{1 \leq i,j \leq n}, which we define to be a random Hermitian n \times n matrix whose upper-triangular entries \zeta_{ij}, 1 \leq i \leq j \leq n are independent (and whose strictly upper-triangular entries \zeta_{ij}, 1 \leq i < j \leq n are also identically distributed).  [The lower-triangular entries are of course determined from the upper-triangular ones by the Hermitian property.]  We normalise the matrices so that all the entries have mean zero and variance 1.  Basic examples of Wigner Hermitian matrices include

  1. The Gaussian Unitary Ensemble (GUE), in which the upper-triangular entries \zeta_{ij}, i<j are complex gaussian, and the diagonal entries \zeta_{ii} are real gaussians;
  2. The Gaussian Orthogonal Ensemble (GOE), in which all entries are real gaussian;
  3. The Bernoulli Ensemble, in which all entries take values \pm 1 (with equal probability of each).

We will make a further distinction into Wigner real symmetric matrices (which are Wigner matrices with real coefficients, such as GOE and the Bernoulli ensemble) and Wigner Hermitian matrices (which are Wigner matrices whose upper-triangular coefficients have real and imaginary parts iid, such as GUE).

The GUE and GOE ensembles have a rich algebraic structure (for instance, the GUE distribution is invariant under conjugation by unitary matrices, while the GOE distribution is similarly invariant under conjugation by orthogonal matrices, hence the terminology), and as a consequence their eigenvalue distribution can be computed explicitly.  For instance, the joint distribution of the eigenvalues \lambda_1(M_n),\ldots,\lambda_n(M_n) for GUE is given by the explicit formula

\displaystyle C_n \prod_{1 \leq i<j \leq n} |\lambda_i-\lambda_j|^2 \exp( - \frac{1}{2n} (\lambda_1^2+\ldots+\lambda_n^2))\ d\lambda_1 \ldots d\lambda_n (0)

for some explicitly computable constant C_n on the orthant \{ \lambda_1 \leq \ldots \leq \lambda_n\} (a result first established by Ginibre).  (A similar formula exists for GOE, but for simplicity we will just discuss GUE here.)  Using this explicit formula one can compute a wide variety of asymptotic eigenvalue statistics.  For instance, the (bulk) empirical spectral distribution (ESD) measure \frac{1}{n} \sum_{i=1}^n \delta_{\lambda_i(M_n)/\sqrt{n}} for GUE (and indeed for all Wigner matrices, see below) is known to converge (in the vague sense) to the Wigner semicircular law

\displaystyle \frac{1}{2\pi} (4-x^2)_+^{1/2}\ dx =: \rho_{sc}(x)\ dx (1)

as n \to \infty.  Actually, more precise statements are known for GUE; for instance, for 1 \leq i \leq n, the i^{th} eigenvalue \lambda_i(M_n) is known to equal

\displaystyle \lambda_i(M_n) = \sqrt{n} t(\frac{i}{n}) + O( \frac{\log n}{n} ) (2)

with probability 1-o(1), where t(a) \in [-2,2] is the inverse cumulative distribution function of the semicircular law, thus

\displaystyle a = \int_{-2}^{t(a)} \rho_{sc}(x)\ dx.

Furthermore, the distribution of the normalised eigenvalue spacing \sqrt{n} \rho_{sc}(\frac{i}{n}) (\lambda_{i+1}(M_n) - \lambda_i(M_n)) is known; in the bulk region \varepsilon n \leq i \leq 1-\varepsilon n for fixed \varepsilon > 0, it converges as n \to \infty to the Gaudin distribution, which can be described explicitly in terms of determinants of the Dyson sine kernel K(x,y) := \frac{\sin \pi(x-y)}{\pi(x-y)}.  Many further local statistics of the eigenvalues of GUE are in fact governed by this sine kernel, a result usually proven using the asymptotics of orthogonal polynomials (and specifically, the Hermite polynomials).  (At the edge of the spectrum, say i = n-O(1), the asymptotic distribution is a bit different, being governed instead by the  Tracy-Widom law.)

It has been widely believed that these GUE facts enjoy a universality property, in the sense that they should also hold for wide classes of other matrix models. In particular, Wigner matrices should enjoy the same bulk distribution (1), the same asymptotic law (2) for individual eigenvalues, and the same sine kernel statistics as GUE. (The statistics for Wigner symmetric matrices are slightly different, and should obey GOE statistics rather than GUE ones.)

There has been a fair body of evidence to support this belief.  The bulk distribution (1) is in fact valid for all Wigner matrices (a result of Pastur, building on the original work of Wigner of course).  The Tracy-Widom statistics on the edge were established for all Wigner Hermitian matrices (assuming that the coefficients had a distribution which was symmetric and decayed exponentially) by Soshnikov (with some further refinements by Soshnikov and Peche).  Soshnikov’s arguments were based on an advanced version of the moment method.

The sine kernel statistics were established by Johansson for Wigner Hermitian matrices which were gaussian divisible, which means that they could be expressed as a non-trivial linear combination of another Wigner Hermitian matrix and an independent GUE.  (Basically, this means that distribution of the coefficients is a convolution of some other distribution with a gaussian.  There were some additional technical decay conditions in Johansson’s work which were removed in subsequent work of Ben Arous and Peche.)   Johansson’s work was based on an explicit formula for the joint distribution for gauss divisible matrices that generalises (0) (but is significantly more complicated).

Just last week, Erdos, Ramirez, Schlein, and Yau established sine kernel statistics for Wigner Hermitian matrices with exponential decay and a high degree of smoothness (roughly speaking, they require  control of up to six derivatives of the Radon-Nikodym derivative of the distribution).  Their method is based on an analysis of the dynamics of the eigenvalues under a smooth transition from a general Wigner Hermitian matrix to GUE, essentially a matrix version of the Ornstein-Uhlenbeck process, whose eigenvalue dynamics are governed by Dyson Brownian motion.

In my paper with Van, we establish similar results to that of Erdos et al. under slightly different hypotheses, and by a somewhat different method.  Informally, our main result is as follows:

Theorem 1. (Informal version)  Suppose M_n is a Wigner Hermitian matrix whose coefficients have an exponentially decaying distribution, and whose real and imaginary parts are supported on at least three points (basically, this excludes Bernoulli-type distributions only) and have vanishing third moment (which is for instance the case for symmetric distributions).  Then one has the local statistics (2) (but with an error term of O(n^{-1+\delta}) for any \delta>0 rather than O(\log n/n)) and the sine kernel statistics for individual eigenvalue spacings \sqrt{n} \rho_{sc}(\frac{i}{n}) (\lambda_{i+1}(M_n) - \lambda_i(M_n)) (as well as higher order correlations) in the bulk.

If one removes the vanishing third moment hypothesis, one still has the sine kernel statistics provided one averages over all i.

There are analogous results for Wigner real symmetric matrices (see paper for details).  There are also some related results, such as a universal distribution for the least singular value of matrices of the form in Theorem 1, and a crude asymptotic for the determinant (in particular, \log |\det M_n| = (1+o(1)) \log \sqrt{n!} with probability 1-o(1)).

The arguments are based primarily on the Lindeberg replacement strategy, which Van and I also used to obtain universality for the circular law for iid matrices, and for the least singular value for iid matrices, but also rely on other tools, such as some recent arguments of Erdos, Schlein, and Yau, as well as a very useful concentration inequality of Talagrand which lets us tackle both discrete and continuous matrix ensembles in a unified manner.  (I plan to talk about Talagrand’s inequality in my next blog post.)

— The replacement strategy —

Suppose one has a Wigner Hermitian random matrix M_n.  To compute the eigenvalue statistics of M_n, e.g. the expectation {\Bbb E} G( \lambda_i(M_n) ) of some suitably normalised function G() of the i^{th} eigenvalue, the Lindeberg strategy is not to perform this computation directly, but to first compare the relevant statistic {\Bbb E} G( \lambda_i(M_n) ) of M_n with the corresponding statistic {\Bbb E} G( \lambda_i(M'_n) ) of a much nicer random matrix ensemble M'_n, such as GUE.  Since the statistics of the latter have already been computed, one will be done as soon as one establishes an invariance principle {\Bbb E} G( \lambda_i(M_n) ) \approx {\Bbb E} G( \lambda_i(M'_n) ) for the statistic being studied.

After trying a number of implementations of this strategy, Van and I eventually settled on that of replacing the entries of M_n with the corresponding gaussian entries of M'_n one by one (or more precisely, two by two, since we need to keep the matrix Hermitian) and seeing how the eigenvalues evolve.  A key tool here is the Hadamard variation formula

\displaystyle \dot \lambda_i = u_i^* \dot M_n u_i

that describes the rate of change of an eigenvalue \lambda_i in terms of the rate of change \dot M_n of the matrix, and the unit eigenvector u_i of the matrix associated to that eigenvalue.  (See this earlier blog post for further discussion of this and related formulae).  If one assumes that the eigenvectors u_i are delocalised in the sense that their n coefficients have magnitude O( n^{-1/2+o(1)} ) (which is essentially the best possible, given that these magnitudes must square-sum to 1 by Pythagoras’ theorem), this formula suggests that the replacement of one or two entries in a random matrix should only move the eigenvalues by about 1/n.  (This type of delocalisation result was recently established by Erdos, Schlein, and Yau, for a slightly more restrictive class of matrices, but we were able to extend their method to our class.) Since the mean eigenvalue spacing is about 1/\sqrt{n}, this is encouraging for the purposes of establishing the invariance principle.  However, we have to perform about O(n^2) separate replacement operations.  A random walk of n^2 steps of size 1/n would be expected to move a much larger distance than the mean eigenvalue spacing 1/\sqrt{n}, so it looks like this strategy is not going to work.

Fortunately, one can salvage things by performing a Taylor expansion.  Suppose for instance that one is swapping one entry \zeta_{pq} (together with its adjoint \zeta_{qp} = \overline{\zeta_{pq}}) of a random matrix M into another entry\zeta'_{pq} to create another matrix M'.  Thus one can write

\displaystyle M=M(\zeta_{pq}), M' = M(\zeta'_{pq})


\displaystyle M(t) = M(0) + t e_{pq} + \overline{t} e_{qp},

M(0) is the matrix M with the pq and qp entries zeroed out, and e_{pq}, e_{qp} are the basis matrices corresponding to the positions pq, qp.  If we write F(t) := G(\lambda_i(M(t))), we are trying to compare {\Bbb E} F( \zeta_{pq} ) with {\Bbb E} F(\zeta'_{pq} ).  To do this, we can perform a Taylor expansion with remainder:

\displaystyle F(\zeta_{pq}) = F(0) + F'(0) \zeta_{pq} + \frac{1}{r!} F''(0) \zeta_{pq}^2 + \ldots

\displaystyle + \frac{1}{r!} F^{(r)}(0) \zeta_{pq}^r + O( \| F^{(r+1)}\|_{L^\infty} |\zeta_{pq}|^{r+1} ), (3)

where r is a parameter that one can choose (it is ultimately going to be something like 4 in practice).  Technically, the Taylor expansion is a bit messier than this because \zeta_{pq} is complex rather than real (and F is not holomorphic), but let us ignore this technicality for now.

If the moments of \zeta_{pq} and \zeta'_{pq} agree up to r^{th} order, then when one takes expectations, all the terms in (3) except for the error term remain unchanged when swapping \zeta_{pq} into \zeta'_{pq}.  A more advanced version of the Hadamard variation formula can be used to establish (at least heuristically) that F^{(r+1)} is “typically” of size n^{-(r+2)/2+o(1)}, which suggests that one should be able to swap all n^2 entries and still get fine-scale distributional results as long as r is at least 4.

It turns out that (with a certain amount of technical effort, and some tweaks to the above strategy) one can make this plan work, with the upshot that one can replace all the entries in a Wigner matrix without significantly affecting the local statistics, as long as one preserves all moments up to fourth order of the entries.  This would give Theorem 1 in the case where the entries agreed with the GUE entries up to fourth order, but as stated they only agree up to third order.  To drop the moment matching condition by one order, we use the result of Johansson on gaussian divisible matrices, combined with an elementary observation that any random variable that is supported on three or more points can be matched with a gaussian-divisible random variable up to fourth order.  (The Bernoulli random variables, which are supported on only two points, are unfortunately an extremal for the moment matching problem and cannot be matched to fourth order with any other distribution.)