Van Vu and I have just uploaded to the arXiv our paper “Random matrices: Universality of local eigenvalue statistics up to the edge“, submitted to Comm. Math. Phys..  This is a sequel to our previous paper, in which we studied universality of local eigenvalue statistics (such as normalised eigenvalue spacings \sqrt{n} ( \lambda_{i+1}(M_n) - \lambda_i(M_n) )) for random matrices M_n of Wigner type, i.e. Hermitian (or symmetric) random matrices in which the upper-triangular entries are independent with mean zero and variance one (for technical reasons we also have to assume an exponential decay condition on the distribution of the entries).   The results in the previous paper were almost entirely focused on the bulk region, in which the index i of the eigenvalues involved was in the range \varepsilon n \leq i \leq (1-\varepsilon) n.  The main purpose of this paper is to extend the main results of the previous paper all the way up to the edge, thus allowing one to control all indices 1 \leq i \leq n.  As an application, we obtain a variant of Soshnikov’s well-known result that the largest eigenvalue of Wigner matrices is distributed (after suitable normalisation) according to the Tracy-Widom law: whereas Soshnikov assumed the coefficients to be symmetrically distributed, we assume instead that the coefficients have vanishing third moment.

As one transitions from the bulk to the edge, the density of the eigenvalues decreases to zero (in accordance with the Wigner semicircular law), and so the average spacing between eigenvalues increases.  (For instance, the spacing between eigenvalues in the bulk is of size n^{-1/2}, but at the extreme edge it increases to n^{-1/6}.)  On the one hand, the increase in average spacing should make life easier, because one does not have to work at such a fine spatial scale in order to see the eigenvalue distribution.  On the other hand, a certain key technical step in the previous paper (in which we adapted an argument of Erdos, Schlein, and Yau to show that eigenvectors of Wigner matrices were delocalised) seemed to require eigenvalue spacings to be of size O(n^{-1/2}), which was the main technical obstacle to extending the preceding results from the bulk to the edge.
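
As a quick numerical illustration of these two scales (a rough sketch, not from the paper, using a gaussian model for the Wigner matrix and numpy's eigvalsh):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 2000
    # complex Wigner model: Hermitian, independent entries of mean zero and variance one
    A = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    M = (A + A.conj().T) / np.sqrt(2)
    lam = np.linalg.eigvalsh(M)                            # eigenvalues in increasing order
    bulk = np.mean(np.diff(lam[n//2 - 50 : n//2 + 50]))    # average spacing near the centre of the spectrum
    edge = lam[-1] - lam[-2]                               # spacing at the extreme edge
    print(bulk, n ** -0.5)       # both of size about n^{-1/2}, up to constants
    print(edge, n ** (-1 / 6))   # both of size about n^{-1/6}, up to constants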

The main new observation in the paper is that it is not the eigenvalue spacings \lambda_{i+1}(M_n) - \lambda_i(M_n) which are of importance for eigenvalue delocalisation, but rather the somewhat smaller interlaced eigenvalue spacings \lambda_{i+1}(M_n) - \lambda_i(M_{n-1}), where M_{n-1} is an (n-1) \times (n-1) minor of M_n.  The Cauchy interlacing law asserts that the latter is smaller than the former.  But the interesting thing is that at the edge (when i is close to n), the interlaced spacings are much smaller than the non-interlaced ones, and in particular remain of size about O(n^{-1/2}) (up to log factors) even though the non-interlaced spacings increase to be as large as O(n^{-1/6}).  This is ultimately due to a sort of “attractive force” on eigenvalues that draws them towards the origin and counteracts the usual “eigenvalue repulsion effect” which pushes eigenvalues away from each other.  This “bias” for eigenvalues to move in towards the bulk rescues the delocalisation result, and the remainder of the arguments in our previous paper then go through with few changes.
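
One can also see this contrast numerically (again only a rough sketch with a gaussian model; the averages below are order-of-magnitude checks):

    import numpy as np

    rng = np.random.default_rng(1)
    n, trials = 1000, 10
    plain, interlaced = [], []
    for _ in range(trials):
        A = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
        M = (A + A.conj().T) / np.sqrt(2)
        lam = np.linalg.eigvalsh(M)                # eigenvalues of M_n, increasing order
        mu = np.linalg.eigvalsh(M[:n-1, :n-1])     # eigenvalues of the upper left minor M_{n-1}
        plain.append(lam[-1] - lam[-2])            # \lambda_n(M_n) - \lambda_{n-1}(M_n)
        interlaced.append(lam[-1] - mu[-1])        # \lambda_n(M_n) - \lambda_{n-1}(M_{n-1})
    print(np.mean(plain), n ** (-1 / 6))           # non-interlaced edge spacing: about n^{-1/6}
    print(np.mean(interlaced), n ** -0.5)          # interlaced edge spacing: about n^{-1/2}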

Below the fold I wish to give some heuristic justification of the interlacing bias phenomenon, sketch why this is relevant for eigenvector delocalisation, and finally to recall why eigenvalue delocalisation in turn is relevant for universality.

[Update, Aug 16: sign error corrected.]

– Interlacing bias –

Let M_n be an n \times n Hermitian matrix, and let M_{n-1} be an (n-1) \times (n-1) minor, say the upper left minor for concreteness.  The i^{th} eigenvalue \lambda_i(M_n) of M_n (with the eigenvalues arranged in increasing order) can be given by the minimax formula

\lambda_i(M_n) = \inf_{V_i} \sup_{u \in V_i: \|u\|=1} u^* M_n u

where V_i ranges over all i-dimensional subspaces of {\Bbb C}^n.  The formula for \lambda_i(M_{n-1}) is similar, but V_i is now constrained to lie in the hyperplane {\Bbb C}^{n-1} of {\Bbb C}^n.  Comparing the two formulae (and noting that any (i+1)-dimensional subspace of {\Bbb C}^n intersects {\Bbb C}^{n-1} in a subspace of dimension at least i), one is led to the Cauchy interlacing inequality

\lambda_i(M_n) \leq \lambda_i(M_{n-1}) \leq \lambda_{i+1}(M_n) (1)

or equivalently

\lambda_i(M_{n-1}) \leq \lambda_{i+1}(M_n) \leq \lambda_{i+1}(M_{n-1}). (2)

Thus, the n-1 eigenvalues of M_{n-1} intersperse (or interlace) the n eigenvalues of M_n.
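
One can check the interlacing (1) numerically on a small example (a trivial sketch; numpy's eigvalsh returns eigenvalues in increasing order):

    import numpy as np

    rng = np.random.default_rng(2)
    n = 8
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    M = (A + A.conj().T) / 2                       # an arbitrary n x n Hermitian matrix
    lam = np.linalg.eigvalsh(M)                    # \lambda_1(M_n) <= ... <= \lambda_n(M_n)
    mu = np.linalg.eigvalsh(M[:n-1, :n-1])         # eigenvalues of the upper left minor M_{n-1}
    # Cauchy interlacing (1): \lambda_i(M_n) <= \lambda_i(M_{n-1}) <= \lambda_{i+1}(M_n)
    assert np.all(lam[:-1] <= mu) and np.all(mu <= lam[1:])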

One can then ask the question (for various matrix models of M_n) how \lambda_i(M_{n-1}) is distributed in the interval [\lambda_i(M_n), \lambda_{i+1}(M_n)], or how \lambda_{i+1}(M_n) is distributed in the interval [\lambda_i(M_{n-1}), \lambda_{i+1}(M_{n-1})].

The answer to the first question is rather interesting: if the eigenvalues of M_n are fixed, and the eigenvectors of M_n are chosen uniformly at random (or equivalently, if one conjugates M_n by a randomly chosen unitary matrix), then the eigenvalue \lambda_i(M_{n-1}) turns out to be distributed according to a certain piecewise polynomial distribution (connected with Gelfand-Tsetlin patterns).  The situation is easiest to describe when n=2, in which case \lambda_1(M_1) is uniformly distributed in the interval [\lambda_1(M_2), \lambda_2(M_2)].  Amusingly, this fact is equivalent to Archimedes’ famous result (inscribed on his tombstone) on the equivalence of areas between a sphere and a cylinder, and is an instructive exercise.
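
Here is a quick simulation of the n=2 case (a sketch, not from the paper; the fixed eigenvalues -1 and 2 of M_2 are arbitrary):

    import numpy as np

    rng = np.random.default_rng(3)
    a, b = -1.0, 2.0                                   # the fixed eigenvalues \lambda_1(M_2), \lambda_2(M_2)
    samples = []
    for _ in range(50000):
        Z = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
        Q, R = np.linalg.qr(Z)
        Q = Q * (np.diag(R) / np.abs(np.diag(R)))      # a Haar-distributed 2 x 2 unitary
        M2 = Q @ np.diag([a, b]) @ Q.conj().T          # random unitary conjugate of diag(a, b)
        samples.append(M2[0, 0].real)                  # the 1 x 1 minor M_1, i.e. \lambda_1(M_1)
    hist, _ = np.histogram(samples, bins=10, range=(a, b))
    print(hist / len(samples))                         # each bin close to 0.1: uniform on [a, b]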

However, this type of analysis does not seem to lend itself well to general matrix ensembles, which are not invariant under unitary conjugation.  An alternate approach proceeds by viewing M_n as the matrix M_{n-1} with an (n-1)-dimensional column vector X and its adjoint X^* attached, together with a final diagonal entry x_{nn}, like so:

M_n = \begin{pmatrix} M_{n-1} & X \\ X^* & x_{nn} \end{pmatrix}.

One can compute the characteristic polynomial \det(M_n - \lambda I) of M_n in terms of x_{nn}, X, and the eigenvalues \lambda_j(M_{n-1}) and unit eigenvectors u_j(M_{n-1}) of M_{n-1} to be given by the formula

\prod_{j=1}^{n-1} (\lambda_j(M_{n-1})-\lambda) [ x_{nn}-\lambda-\sum_{j=1}^{n-1} \frac{|u_j(M_{n-1})^* X|^2}{\lambda_j(M_{n-1}) - \lambda} ]

which implies that \lambda = \lambda_i(M_n) is the unique solution in the interval [\lambda_{i-1}(M_{n-1}), \lambda_i(M_{n-1})] (with the conventions \lambda_0(M_{n-1}) := -\infty and \lambda_n(M_{n-1}) := +\infty) to the equation

\sum_{j=1}^{n-1} \frac{|u_j(M_{n-1})^* X|^2}{\lambda - \lambda_j(M_{n-1})} = \lambda - x_{nn}. (3)

(Note that the LHS is decreasing in \lambda between its poles at the eigenvalues of M_{n-1}, while the RHS is increasing; this establishes the uniqueness and also provides an alternate proof of the Cauchy interlacing law.)
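
As a sanity check, one can verify (3) numerically by substituting the eigenvalues of M_n into both sides (a sketch; X and x_{nn} are read off from the block decomposition above):

    import numpy as np

    rng = np.random.default_rng(4)
    n = 6
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    M = (A + A.conj().T) / 2
    X = M[:n-1, n-1]                                   # the last column of M_n, with x_{nn} removed
    xnn = M[n-1, n-1].real
    mu, U = np.linalg.eigh(M[:n-1, :n-1])              # eigenvalues and eigenvectors of the minor M_{n-1}
    c = np.abs(U.conj().T @ X) ** 2                    # the numerators |u_j(M_{n-1})^* X|^2
    for lam in np.linalg.eigvalsh(M):
        print(np.sum(c / (lam - mu)), lam - xnn)       # the two sides of (3) agree up to rounding error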

I like to interpret the equation (3) as describing the balance between various “forces” acting on the eigenvalue \lambda = \lambda_i(M_n).  All the eigenvalues \lambda_j(M_{n-1}) to the left of \lambda (i.e. for j<i) exert a rightward force on \lambda that is inversely proportional to the distance; meanwhile, the eigenvalues on the right exert a leftward force; thus the eigenvalues of M_{n-1} “repel” the eigenvalues of M_n (cf. the eigenvalue repulsion discussed in this previous blog post).  On the other hand, the \lambda term on the right-hand side of (3) can be viewed as an attractive force towards the origin which becomes quite significant at the edge.  (The x_{nn} term tends to be insignificant in applications.)  This attractive force (which is of magnitude about \sqrt{n} at the edge) needs to be counterbalanced by repulsion from eigenvalues nearer the bulk, and one can show (under reasonable hypotheses on the matrix ensemble) that the only way this can occur is if \lambda is within O(1/\sqrt{n}) or so of an eigenvalue of M_{n-1} nearer the bulk (the numerators |u_j(M_{n-1})^* X|^2 have size about 1 on the average, as can be formalised by Berry-Esseen-type variants of the central limit theorem).  This is basically the source of the bias in the interlacing law.
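
To make this balance slightly more quantitative, here is a back-of-the-envelope computation (a heuristic sketch, not taken from the paper): at the right edge one has \lambda \approx 2\sqrt{n}; if one replaces the numerators |u_j(M_{n-1})^* X|^2 by their average value 1 and distributes the \lambda_j(M_{n-1}) according to the semicircular law (with density \rho_{sc}(x) = \frac{1}{2\pi} \sqrt{4-x^2} after rescaling by \sqrt{n}), then the contribution to the left-hand side of (3) from the eigenvalues far from \lambda is approximately

n \int_{-2}^2 \frac{\rho_{sc}(x)}{2\sqrt{n} - \sqrt{n} x}\ dx = \sqrt{n} \int_{-2}^2 \frac{\rho_{sc}(x)}{2 - x}\ dx = \sqrt{n}

which falls short of the right-hand side \lambda - x_{nn} \approx 2 \sqrt{n} by about \sqrt{n}.  This deficit can only be supplied by the handful of eigenvalues closest to \lambda, and a single term \frac{|u_{n-1}(M_{n-1})^* X|^2}{\lambda - \lambda_{n-1}(M_{n-1})} of size \sqrt{n} requires \lambda - \lambda_{n-1}(M_{n-1}) = O(n^{-1/2}).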

(Another way to see why the interlacing gap at the edge must remain of size n^{-1/2} rather than n^{-1/6} is to consider the asymptotic \lambda_n(M_n) \sim 2 \sqrt{n}: since 2\sqrt{n} - 2\sqrt{n-1} \approx n^{-1/2}, this suggests a spacing of about n^{-1/2}, on the average at least, for the increase \lambda_n(M_n) - \lambda_{n-1}(M_{n-1}) in the largest eigenvalue as n increases by one.)

– Eigenvector delocalisation –

The closeness between the eigenvalues of M_n and those of M_{n-1} impacts the eigenvectors of M_n.  Indeed, suppose that u =\begin{pmatrix} u' \\ u_n \end{pmatrix} was a unit eigenvector of M_n with eigenvalue \lambda, thus

\begin{pmatrix} M_{n-1}-\lambda I & X \\ X^* & x_{nn} - \lambda \end{pmatrix} \begin{pmatrix} u' \\ u_n \end{pmatrix} = 0.

In particular

(M_{n-1} - \lambda I) u' = - X u_n.

If \lambda is close to an eigenvalue of M_{n-1}, this makes the minor M_{n-1}-\lambda I ill-conditioned.  This allows the vector u' to be quite large compared to u_n, which (since u is normalised to be of unit size) makes u_n small.  Crunching the numbers, one can show that an eigenvalue gap of size O(1/\sqrt{n}) translates (using the random nature of X) to a bound of about O(1/\sqrt{n}) on u_n (ignoring logs).  By symmetry, we get a similar bound for the other coefficients of u, thus the coefficients of u are extremely uniform in magnitude (i.e. the eigenvector is delocalised).
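
Here is a small numerical illustration of this (a sketch; it uses the exact identity |u_n|^2 = 1/(1 + \| (M_{n-1}-\lambda I)^{-1} X\|^2), which follows from the displayed equation together with the normalisation of u):

    import numpy as np

    rng = np.random.default_rng(5)
    n = 1000
    A = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    M = (A + A.conj().T) / np.sqrt(2)
    lam, U = np.linalg.eigh(M)
    u = U[:, -1]                                       # unit eigenvector for the largest eigenvalue
    X = M[:n-1, n-1]
    w = np.linalg.solve(M[:n-1, :n-1] - lam[-1] * np.eye(n-1), X)
    print(abs(u[-1]) ** 2, 1.0 / (1.0 + np.linalg.norm(w) ** 2))   # the two sides of the identity agree
    print(np.max(np.abs(u)), n ** -0.5)  # delocalisation: every coordinate is about n^{-1/2}, up to log factors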

– Why delocalisation is needed –

The delocalisation of eigenvectors is an essential ingredient in our Lindeberg-style proof of universality, in which we replace the coefficients of M_n, one at a time, with coefficients drawn from a better understood distribution (e.g. a gaussian distribution).  In order to keep the errors under control, it is necessary that the effect of any given coefficient of M_n on an eigenvalue \lambda_i is significantly less than the mean spacing between eigenvalues.  On the other hand, if one modifies the (p,q) coefficient x_{pq} of M_n by O(1), then the Hadamard variation formula (see this previous blog post) suggests that the eigenvalue \lambda_i should move by about O( |u_{i,p}| |u_{i,q}| ), where u_{i,p} is the p^{th} coordinate of the unit eigenvector u_i corresponding to \lambda_i.  With delocalisation, these coordinates have size about O(1/\sqrt{n}), leading to a total movement of O(1/n), which is indeed smaller than the mean spacing (n^{-1/2} in the bulk, n^{-1/6} at the extreme edge).
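
Here is a sketch of the first-order version of this statement (the formula used is the derivative d\lambda_i = u_i^* (dM) u_i from the Hadamard variation formula; the indices p, q and the step size t below are arbitrary):

    import numpy as np

    rng = np.random.default_rng(6)
    n, p, q, t = 500, 3, 7, 1e-6
    A = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    M = (A + A.conj().T) / np.sqrt(2)
    lam, U = np.linalg.eigh(M)
    i = n // 2                                          # an eigenvalue in the bulk
    E = np.zeros((n, n)); E[p, q] = E[q, p] = 1.0       # perturb the (p,q) and (q,p) entries, keeping M Hermitian
    dlam = np.linalg.eigvalsh(M + t * E)[i] - lam[i]
    # first-order Hadamard variation: d\lambda_i/dt = 2 Re( \bar{u}_{i,p} u_{i,q} )
    print(dlam / t, 2 * np.real(np.conj(U[p, i]) * U[q, i]))
    print(abs(U[p, i]) * abs(U[q, i]), 1 / n)           # with delocalisation this product is of size about 1/n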

[Note there is a bit of room here; one could accept a somewhat weaker delocalisation result, conceding some powers of n, but one would need more moment assumptions on the distribution to compensate for this (one has to perform a higher order Taylor expansion before the net error terms become acceptable again).]