Van Vu and I have just uploaded to the arXiv our paper “Random matrices: Universality of local eigenvalue statistics up to the edge“, submitted to Comm. Math. Phys..  This is a sequel to our previous paper, in which we studied universality of local eigenvalue statistics (such as normalised eigenvalue spacings $\sqrt{n} ( \lambda_{i+1}(M_n) - \lambda_i(M_n) )$) for random matrices $M_n$ of Wigner type, i.e. Hermitian (or symmetric) random matrices in which the upper-triangular entries are independent with mean zero and variance one (for technical reasons we also have to assume an exponential decay condition on the distribution of the entries).   The results in the previous paper were almost entirely focused on the bulk region, in which the index i of the eigenvalues involved was in the range $\varepsilon n \leq i \leq (1-\varepsilon) n$.  The main purpose of this paper is to extend the main results of the previous paper all the way up to the edge, thus allowing one to control all indices $1 \leq i \leq n$.  As an application, we obtain a variant of Soshnikov’s well-known result that the largest eigenvalue of Wigner matrices is distributed (after suitable normalisation) according to the Tracy-Widom law when the coefficients are symmetric, by assuming instead that the coefficients have vanishing third moment.

As one transitions from the bulk to the edge, the density of the eigenvalues decreases to zero (in accordance to the Wigner semicircular law), and so the average spacing between eigenvalues increases.  (For instance, the spacing between eigenvalues in the bulk is of size $n^{-1/2}$, but at the extreme edge it increases to $n^{-1/6}$.)  On the one hand, the increase in average spacing should make life easier, because one does not have to work at such a fine spatial scale in order to see the eigenvalue distribution.  On the other hand, a certain key technical step in the previous paper (in which we adapted an argument of Erdos, Schlein, and Yau to show that eigenvectors of Wigner matrices were delocalised) seemed to require eigenvalue spacings to be of size $O(n^{-1/2})$, which was the main technical obstacle to extending the preceding results from the bulk to the edge.

The main new observation in the paper is that it was not the eigenvalue spacings $\lambda_{i+1}(M_n) - \lambda_i(M_n)$ which were of importance to eigenvalue delocalisation, but rather the somewhat smaller interlaced eigenvalue spacings $\lambda_{i+1}(M_n) - \lambda_i(M_{n-1})$, where $M_{n-1}$ is a $n-1 \times n-1$ minor of $M_n$.  The Cauchy interlacing law asserts that the latter is smaller than the former.  But the interesting thing is that at the edge (when i is close to n), the interlaced spacings are much smaller than the former, and in particular remain of size about $O(n^{-1/2})$ (up to log factors) even though the non-interlaced spacings increase to be as large as $O(n^{-1/6})$.  This is ultimately due to a sort of “attractive force” on eigenvalues that draws them towards the origin, and counteracts the usual “eigenvalue repulsion effect”, that pushes eigenvalues away from each other.  This induces “bias” for eigenvalues to move in towards the bulk rescues the delocalization result, and the remainder of the arguments in our previous paper then continue with few changes.

Below the fold I wish to give some heuristic justification of the interlacing bias phenomenon, sketch why this is relevant for eigenvector delocalisation, and finally to recall why eigenvalue delocalisation in turn is relevant for universality.

[Update, Aug 16: sign error corrected.]

– Interlacing bias –

Let $M_n$ be an $n \times n$ Hermitian matrix, and let $M_{n-1}$ be a $n-1 \times n-1$ symmetric minor, say the upper left minor for concreteness.  The $i^{th}$ largest eigenvalue $\lambda_i(M_n)$ of $M_n$ can be given by the minimax formula

$\lambda_i(M_n) = \sup_{V_i} \inf_{u \in V_i: \|u\|=1} u^* M_n u$

where $V_i$ ranges over all $i$-dimensional subspaces of ${\Bbb C}^n$.  The formula for $\lambda_i(M_{n-1})$ is similar, but $V_i$ is now constrained to lie in the hyperplane ${\Bbb C}^{n-1}$ of ${\Bbb C}^n$.  Comparing the two formulae (and noting that any i+1-dimensional subspace of ${\Bbb C}^n$ intersects ${\Bbb C}^{n-1}$ in a space of dimension at least i) one is led to the Cauchy interlacing inequality

$\lambda_i(M_n) \leq \lambda_i(M_{n-1}) \leq \lambda_{i+1}(M_n)$ (1)

or equivalently

$\lambda_i(M_{n-1}) \leq \lambda_{i+1}(M_n) \leq \lambda_{i+1}(M_{n-1})$. (2)

Thus, the $n-1$ eigenvalues of $M_{n-1}$ intersperse (or interlace) the $n$ eigenvalues of $M_n$.

One can then ask the question (for various matrix models of $M_n$) how $\lambda_i(M_{n-1})$ is distributed in the interval $[\lambda_i(M_n), \lambda_{i+1}(M_n)]$, or how $\lambda_{i+1}(M_n)$ is distributed in the interval $[\lambda_i(M_{n-1}), \lambda_{i+1}(M_{n-1})]$.

The answer to the first question is rather interesting: if the eigenvalues of $M_n$ are fixed, and the eigenvectors of $M_n$ are chosen uniformly at random (or equivalently, if one conjugates $M_n$ by a randomly chosen unitary matrix), then the eigenvalue $\lambda_i(M_{n-1})$ turns out to be distributed according to a certain piecewise polynomial distribution (connected with Gelfand-Tsetlin patterns).  The situation is easiest to describe when n=2, in which $\lambda_1(M_1)$ is uniformly distributed in the interval $[\lambda_1(M_2), \lambda_2(M_2)]$.  Amusingly, this fact is equivalent to Archimedes’ famous result (inscribed on his tombstone) on the equivalence of areas between a sphere and a cylinder, and is an instructive exercise.

However, this type of analysis does not seem to lend itself well to general matrix ensembles, which are not invariant under unitary conjugation.  An alternate approach proceeds by viewing $M_n$ as the matrix $M_{n-1}$ with an $n-1$-dimensional column vector $X$ and its transpose $X^*$ attached, together with a final coordinate $x_{nn}$, like so:

$M_n = \begin{pmatrix} M_{n-1} & X \\ X^* & x_{nn} \end{pmatrix}.$

One can compute the characteristic polynomial $\det(M_n - \lambda I)$ of $M_n$ in terms of $x_{nn}, X$, and the eigenvalues $\lambda_i(M_{n-1})$ and unit eigenvectors $u_i(M_{n-1})$ of $M_{n-1}$ to be given by the formula

$\prod_{j=1}^{n-1} (\lambda_j(M_{n-1})-\lambda) [ x_{nn}-\lambda-\sum_{j=1}^{n-1} \frac{|u_j(M_{n-1})^* X|^2}{\lambda_j(M_{n-1}) - \lambda} ]$

which implies that $\lambda = \lambda_i(M_n)$ is the unique solution in the interval $[\lambda_{i-1}(M_{n-1}), \lambda_i(M_{n-1})]$ to the equation

$\sum_{j=1}^{n-1} \frac{|u_j(M_{n-1})^* X|^2}{\lambda - \lambda_j(M_{n-1})} = \lambda - x_{nn}$. (3)

(Note that the LHS is increasing in $\lambda$, while the RHS is decreasing, and the LHS also has poles at each eigenvalue of $M_{n-1}$; this establishes the uniqueness and also provides an alternate proof of the Cauchy interlacing law.)

I like to interpret the equation (3) as describing the balance between various “forces” acting on the eigenvalue $\lambda = \lambda_i(M_n)$.  All the eigenvalues $\lambda_j(M_{n-1})$ to the left of $\lambda$ (i.e. for $j) exert a rightward force on $\lambda$ that is inversely proportional to the distance; meanwhile, the eigenvalues on the right exert a leftward force; thus the eigenvalues of $M_{n-1}$ “repel” the eigenvalues of $M_n$ (cf. the eigenvalue repulsion discussed in this previous blog post).  On the other hand, the $\lambda$ term on the right-hand side of (3) can be viewed as an attractive force towards the origin which becomes quite significant at the edge.  (The $x_{nn}$ term tends to be insignificant in applications.)  This attractive force (which is of magnitude about $\sqrt{n}$ at the edge) needs to be counterbalanced by repulsion from eigenvalues nearer the bulk, and one can show (under reasonable hypotheses on the matrix ensemble) that the only way this can occur is if $\lambda$ is within $O(1/\sqrt{n})$ or so of an eigenvalue of $M_{n-1}$ nearer the bulk (the numerators $|u_j(M_{n-1})^* X|^2$ have size about 1 on the average, as can be formalised by Berry-Esseen-type variants of the central limit theorem).  This is basically the source of the bias in the interlacing law.

(Another way to see why the interlacing gap at the edge must remain of size $n^{-1/2}$ rather than $n^{-1/6}$ is to consider the asymptotic $\lambda_n(M_n) \sim 2 \sqrt{n}$, which suggests a spacing of about $n^{-1/2}$, on the average at least, for the increase $\lambda_n(M_n) - \lambda_{n-1}(M_{n-1})$ in the largest eigenvalue.)

– Eigenvector delocalisation –

The closeness between the eigenvalues of $M_n$ and those of $M_{n-1}$ impacts the eigenvectors of $M_n$.  Indeed, suppose that $u =\begin{pmatrix} u' \\ u_n \end{pmatrix}$ was a unit eigenvector of $M_n$ with eigenvalue $\lambda$, thus

$\begin{pmatrix} M_{n-1}-\lambda I & X \\ X^* & x_{nn} - \lambda \end{pmatrix} \begin{pmatrix} u' \\ u_n \end{pmatrix} = 0.$

In particular

$(M_{n-1} - \lambda I) u' = - X u_n.$

If $\lambda$ is close to an eigenvalue of $M_{n-1}$, this makes the minor $M_{n-1}-\lambda I$ ill-conditioned.  This allows the vector $u'$ to be quite large compared to $u_n$, which (since u is normalised to be of unit size) makes $u_n$ small.  Crunching the numbers, one can show that an eigenvalue gap of size $O(1/\sqrt{n})$ translates (using the random nature of X) to a bound of about $O(1/\sqrt{n})$ on $u_n$ (ignoring logs).  By symmetry, we get a similar bound for the other coefficients of u, thus the coefficients of u are extremely uniform in magnitude (i.e. the eigenvector is delocalised).

– Why delocalisation is needed –

The delocalisation of eigenvectors is an essential ingredient in our Lindeberg-style proof of universality, in which we replace the coefficients of $M_n$ with a better understood distribution (e.g. a gaussian distribution) one at a time.  In order to keep the errors under control, it is necessary that the effect of any given coefficient of $M_n$ on an eigenvalue $\lambda_i$ is significantly less than the mean spacing between eigenvalues.  On the other hand, if one modifies the (p,q) coefficient $x_{pq}$ of $M_n$ by O(1), then the Hadamard variation formula (see this previous blog post) suggests that the eigenvalue $\lambda_i$ should move by about $O( |u_{i,p}| |u_{i,q}| )$, where $u_{i,p}$ is the p^th coordinate of the unit eigenvector $u_i$ corresponding to $\lambda_i$.  With delocalisation, these coefficients have size about $O(1/\sqrt{n})$, leading to a total movement of $O(1/n)$, which is indeed smaller than the mean spacing ($n^{-1/2}$ in the bulk, $n^{-1/6}$ at the extreme edge).

[Note there is a bit of room here; one could accept a somewhat weaker delocalisation result, conceding some powers of n, but one would need more moment assumptions on the distribution to compensate for this (one has to perform a higher order Taylor expansion before the net error terms become acceptable again.]