You are currently browsing the tag archive for the ‘Van Vu’ tag.

Van Vu and I have just uploaded to the arXiv our paper “The Wigner-Dyson-Mehta bulk universality conjecture for Wigner matrices“, submitted to the Proceedings of the National Academy of Sciences. This short note concerns the convergence of the ${k}$-point correlation functions of Wigner matrices in the bulk to the Dyson ${k}$-point functions, a statement conjectured by Wigner, Dyson, and Mehta. Thanks to the results of Erdös, Peche, Ramirez, Schlein, Vu, Yau, and myself, this conjecture has now been established for all Wigner matrices (assuming a finite moment condition on the entries), but only if one uses a quite weak notion of convergence, namely averaged vague convergence in which one averages in the energy parameter ${u}$. The main purpose of this note is to observe that by combining together existing results in the literature, one can improve the convergence to vague convergence (which is the natural notion of convergence in the discrete setting); and furthermore, if one assumes some regularity and decay conditions on the coefficient distribution, one can improve the convergence further to local ${L^1}$ convergence.

More precisely, let ${M_n}$ be an ${n \times n}$ Wigner matrix – a random Hermitian matrix whose off-diagonal elements ${\frac{1}{\sqrt{n}} \zeta_{ij}}$ for ${1 \leq i < j \leq n}$ are iid with mean zero and variance ${1/n}$ (and whose diagonal elements also obey similar hypotheses, which we omit here). For simplicity, we also assume that the real and imaginary parts of ${\zeta_{ij}}$ are also iid (as is the case for instance for the Gaussian Unitary Ensemble (GUE)). The eigenvalues ${\lambda_1(M_n) \leq \ldots \leq \lambda_n(M_n)}$ of such a matrix are known to be asymptotically distributed accordingly to the Wigner semicircular distribution ${\rho_{sc}(u)\ du}$, where

$\displaystyle \rho_{sc}(u) := \frac{1}{2\pi} (4-u^2)_+^{1/2}.$

In particular, this suggests that at any energy level ${u}$ in the bulk ${(-2,2)}$ of the spectrum, the average eigenvalue spacing should be about ${\frac{1}{n \rho_{sc}(u)}}$. It is then natural to introduce the normalised ${k}$-point correlation function

$\displaystyle \rho_{n,u}^{(k)}(t_1,\ldots,t_k) := \lim_{\epsilon \rightarrow 0} \frac{1}{\epsilon^k} {\bf P} E_\epsilon$

for any distinct reals ${t_1,\ldots,t_k}$ and ${k \geq 1}$, where ${E_\epsilon}$ is the event that there is an eigenvalue in each of the intervals ${[u + \frac{t_i}{n \rho_{sc}(u)}, u + \frac{t_i+\epsilon}{n \rho_{sc}(u)}]}$ for each ${1 \leq i \leq k}$. (This definition is valid when the Wigner ensemble is continuous; for discrete ensembles, one can define ${\rho_{n,u}^{(k)}}$ instead in a distributional sense.)

The Wigner-Dyson-Mehta conjecture asserts that ${\rho_{n,u}^{(k)}}$ converges (in various senses) as ${n \rightarrow \infty}$ to the Dyson ${k}$-point function

$\displaystyle \rho_{Dyson}^{(k)}(t_1,\ldots,t_k) := \hbox{det}( K( t_i,t_j) )_{1 \leq i,j \leq k}$

where ${K(t,t'):=\frac{\sin \pi(t-t')}{\pi(t-t')}}$ is the Dyson sine kernel. This conjecture was verified first for the GUE (with a quite strong notion of convergence, namely local uniform convergence) by Dyson, using an explicit formula for ${\rho_{n,u}^{(k)}}$ in the GUE case due to Gaudin and Mehta. Later results of Johansson, Erdos-Ramirez-Schlein-Yau, Erdos-Peche-Ramirez-Schlein-Yau, and Vu and myself, extended these results to increasingly wider ranges of Wigner matrices, but in the context of either weak convergence (which means that

$\displaystyle \int_{{\bf R}^k} \rho_{n,u}^{(k)}(t) F(t)\ dt \rightarrow \int_{{\bf R}^k} \rho_{Dyson}^{(k)}(t) F(t)\ dt \ \ \ \ \ (1)$

for any ${L^\infty}$, compactly supported function ${F}$), or the slightly weaker notion of vague convergence (which is the same as weak convergence, except that the function ${F}$ is also required to be continuous).

In a joint paper of Erdos, Ramirez, Schlein, Vu, Yau, and myself, we established the Wigner-Dyson-Mehta conjecture for all Wigner matrices (assuming only an exponential decay condition on the entries), but using a quite weak notion of convergence, namely averaged vague convergence, which allows for averaging in the energy parameter. Specifically, we showed that

$\displaystyle \lim_{b \rightarrow 0} \lim_{n \rightarrow \infty} \frac{1}{2b} \int_{u-b}^{u+b} \int_{{\bf R}^k} \rho_{n,u'}^{(k)}(t) F(t)\ dt = \int_{{\bf R}^k} \rho_{Dyson}^{(k)}(t) F(t)\ dt.$

Subsequently, Erdos, Schlein, and Yau introduced the powerful local relaxation flow method, which achieved a simpler proof of the same result which also generalised to other ensembles beyond the Wigner case. However, for technical reasons, this method was restricted to establishing averaged vague convergence only.

In the current paper, we show that by combining the argument of Erdos, Ramirez, Schlein, Vu, Yau, and myself with some more recent technical results, namely the relaxation of the exponential decay condition in the four moment theorem to a finite moment condition (established by Vu and myself) and a strong eigenvalue localisation bound of Erdos, Yau, and Yin, one can upgrade the averaged vague convergence to vague convergence, and handle all Wigner matrices that assume a finite moment condition. Vague convergence is the most natural notion of convergence for discrete random matrix ensembles; for such ensembles, the correlation function is a discrete measure, and so one does not expect convergence to a continuous limit in any stronger sense than the vague sense. Also, by carefully inspecting the earlier argument of Erdos, Peche, Ramirez, Schlein, and Yau, we were able to establish convergence in the stronger local ${L^1}$ sense once one assumed some regularity and positivity condition on the underlying coefficient distribution. These are somewhat modest and technical improvements over previous work on the Wigner-Dyson-Mehta conjecture, but they help to clarify and organise the profusion of results in this area, which are now reaching a fairly definitive form.

It may well be possible to go beyond local ${L^1}$ convergence in the case of smooth ensembles, for instance establishing local uniform convergence; this was recently accomplished in the ${k=1}$ case by Maltsev and Schlein. Indeed one may optimistically expect to even have convergence in the local smooth topology, which would basically be the strongest convergence one could hope for.

Van Vu and I have just uploaded to the arXiv our paper “Random matrices: Localization of the eigenvalues and the necessity of four moments“, submitted to Probability Theory and Related Fields. This paper concerns the distribution of the eigenvalues

$\displaystyle \lambda_1(M_n) \leq \ldots \leq \lambda_n(M_n)$

of a Wigner random matrix ${M_n}$. More specifically, we consider ${n \times n}$ Hermitian random matrices whose entries have mean zero and variance one, with the upper-triangular portion of the matrix independent, with the diagonal elements iid, and the real and imaginary parts of the strictly upper-triangular portion of the matrix iid. For technical reasons we also assume that the distribution of the coefficients decays exponentially or better. Examples of Wigner matrices include the Gaussian Unitary Ensemble (GUE) and random symmetric complex Bernoulli matrices (which equal ${\pm 1}$ on the diagonal, and ${\pm \frac{1}{2} \pm \frac{i}{2}}$ off the diagonal). The Gaussian Orthogonal Ensemble (GOE) is also an example once one makes the minor change of setting the diagonal entries to have variance two instead of one.

The most fundamental theorem about the distribution of these eigenvalues is the Wigner semi-circular law, which asserts that (almost surely) one has

$\displaystyle \frac{1}{n} \sum_{i=1}^n \delta_{\lambda_i(M_n)/\sqrt{n}} \rightarrow \rho_{sc}(x)\ dx$

(in the vague topology) where ${\rho_{sc}(x) := \frac{1}{\pi} (4-x^2)_+^{1/2}}$ is the semicircular distribution. (See these lecture notes on this blog for more discusssion of this law.)

One can phrase this law in a number of equivalent ways. For instance, in the bulk region ${\epsilon n \leq i \leq (1-\epsilon) n}$, one almost surely has

$\displaystyle \lambda_i(M_n) = \gamma_i \sqrt{n} + o(\sqrt{n}) \ \ \ \ \ (1)$

uniformly for in ${i}$, where the classical location ${\gamma_i \in [-2,2]}$ of the (normalised) ${i^{th}}$ eigenvalue ${\frac{1}{\sqrt{n}} \lambda_i}$ is defined by the formula

$\displaystyle \int_{-2}^{\gamma_i} \rho_{sc}(x)\ dx := \frac{i}{n}.$

The bound (1) also holds in the edge case (by using the operator norm bound ${\| M_n\|_{op} = (2+o(1)) \sqrt{n}}$, due to Bai and Yin), but for sake of exposition we shall restriction attention here only to the bulk case.

From (1) we see that the semicircular law controls the eigenvalues at the coarse scale of ${\sqrt{n}}$. There has been a significant amount of work in the literature in obtaining control at finer scales, and in particular at the scale of the average eigenvalue spacing, which is of the order of ${\sqrt{n}/n = n^{-1/2}}$. For instance, we now have a universal limit theorem for the normalised eigenvalue spacing ${\sqrt{n}(\lambda_{i+1}(M_n) - \lambda_i(M_n))}$ in the bulk for all Wigner matrices, a result of Erdos, Ramirez, Schlein, Vu, Yau, and myself. One tool for this is the four moment theorem of Van and myself, which roughly speaking shows that the behaviour of the eigenvalues at the scale ${n^{-1/2}}$ (and even at the slightly finer scale of ${n^{-1/2-c}}$ for some absolute constant ${c>0}$) depends only on the first four moments of the matrix entries. There is also a slight variant, the three moment theorem, which asserts that the behaviour of the eigenvalues at the slightly coarser scale of ${n^{-1/2+\epsilon}}$ depends only on the first three moments of the matrix entries.

It is natural to ask whether these moment conditions are necessary. From the result of Erdos, Ramirez, Schlein, Vu, Yau, and myself, it is known that to control the eigenvalue spacing ${\lambda_{i+1}(M_n) - \lambda_i(M_n)}$ at the critical scale ${n^{-1/2}}$, no knowledge of any moments beyond the second (i.e. beyond the mean and variance) are needed. So it is natural to conjecture that the same is true for the eigenvalues themselves.

The main result of this paper is to show that this is not the case; that at the critical scale ${n^{-1/2}}$, the distribution of eigenvalues ${\lambda_i(M_n)}$ is sensitive to the fourth moment, and so the hypothesis of the four moment theorem cannot be relaxed.

Heuristically, the reason for this is easy to explain. One begins with an inspection of the expected fourth moment

$\displaystyle \sum_{i=1}^n {\bf E}(\lambda_i(M_n)^4) = {\bf E} \hbox{tr} M_n^4.$

A standard moment method computation shows that the right hand side is equal to

$\displaystyle 2n^3 + 2a n^2 + \ldots$

where ${a}$ is the fourth moment of the real part of the off-diagonal coefficients of ${M_n}$. In particular, a change in the fourth moment ${a}$ by ${O(1)}$ leads to a change in the expression ${\sum_{i=1}^n {\bf E}(\lambda_i(M_n)^4)}$ by ${O(n^2)}$. Thus, for a typical ${i}$, one expects ${{\bf E}(\lambda_i(M_n)^4)}$ to shift by ${O(n)}$; since ${\lambda_i(M_n) = O(\sqrt{n})}$ on the average, we thus expect ${\lambda_i(M_n)}$ itself to shift by about ${O(n^{-1/2})}$ by the mean-value theorem.

To make this rigorous, one needs a sufficiently strong concentration of measure result for ${\lambda_i(M_n)}$ that keeps it close to its mean value. There are already a number of such results in the literature. For instance, Guionnet and Zeitouni showed that ${\lambda_i(M_n)}$ was sharply concentrated around an interval of size ${O(n^\epsilon)}$ around ${\sqrt{n} \gamma_i}$ for any ${\epsilon > 0}$ (in the sense that the probability that one was outside this interval was exponentially small). In one of my papers with Van, we showed that ${\lambda_i(M_n)}$ was also weakly concentrated around an interval of size ${O(n^{-1/2+\epsilon})}$ around ${\sqrt{n} \gamma_i}$, in the sense that the probability that one was outside this interval was ${O(n^{-c})}$ for some absolute constant ${c>0}$. Finally, if one made an additional log-Sobolev hypothesis on the entries, it was shown by by Erdos, Yau, and Yin that the average variance of ${\lambda_i(M_n)}$ as ${i}$ varied from ${1}$ to ${n}$ was of the size of ${O(n^{-c})}$ for some absolute ${c>0}$.

As it turns out, the first two concentration results are not sufficient to justify the previous heuristic argument. The Erdos-Yau-Yin argument suffices, but requires a log-Sobolev hypothesis. In our paper, we argue differently, using the three moment theorem (together with the theory of the eigenvalues of GUE, which is extremely well developed) to show that the variance of each individual ${\lambda_i(M_n)}$ is ${O(n^{-c})}$ (without averaging in ${i}$). No log-Sobolev hypothesis is required, but instead we need to assume that the third moment of the coefficients vanishes (because we want to use the three moment theorem to compare the Wigner matrix to GUE, and the coefficients of the latter have a vanishing third moment). From this we are able to make the previous arguments rigorous, and show that the mean ${{\bf E} \lambda_i(M_n)}$ is indeed sensitive to the fourth moment of the entries at the critical scale ${n^{-1/2}}$.

One curious feature of the analysis is how differently the median and the mean of the eigenvalue ${\lambda_i(M_n)}$ react to the available technology. To control the global behaviour of the eigenvalues (after averaging in ${i}$), it is much more convenient to use the mean, and we have very precise control on global averages of these means thanks to the moment method. But to control local behaviour, it is the median which is much better controlled. For instance, we can localise the median of ${\lambda_i(M_n)}$ to an interval of size ${O(n^{-1/2+\epsilon})}$, but can only localise the mean to a much larger interval of size ${O(n^{-c})}$. Ultimately, this is because with our current technology there is a possible exceptional event of probability as large as ${O(n^{-c})}$ for which all eigenvalues could deviate as far as ${O(n^\epsilon)}$ from their expected location, instead of their typical deviation of ${O(n^{-1/2})}$. The reason for this is technical, coming from the fact that the four moment theorem method breaks down when two eigenvalues are very close together (less than ${n^{-c}}$ times the average eigenvalue spacing), and so one has to cut out this event, which occurs with a probability of the shape ${O(n^{-c})}$. It may be possible to improve the four moment theorem proof to be less sensitive to eigenvalue near-collisions, in which case the above bounds are likely to improve.

Van Vu and I have just uploaded to the arXiv our joint paper “The Littlewood-Offord problem in high dimensions and a conjecture of Frankl and Füredi“. In this short paper we give a different proof of a high-dimensional Littlewood-Offord result of Frankl and Füredi, and in the process also affirmatively answer one of their open problems.

Let ${v_1,\ldots,v_n}$ be ${n}$ vectors in ${{\mathbb R}^d}$, which we normalise to all have length at least ${1}$. For any given radius ${\Delta > 0}$, we consider the small ball probability

$\displaystyle p(v_1,\ldots,v_n,\Delta) := \sup_B {\bf P}( \eta_1 v_1 + \ldots + \eta_n v_n \in B )$

where ${\eta_1,\ldots,\eta_n}$ are iid Bernoulli signs (i.e. they take values ${+1}$ or ${-1}$ independently with a probability of ${1/2}$ of each), and ${B}$ ranges over all (closed) balls of radius ${\Delta}$. The Littlewood-Offord problem is to compute the quantity

$\displaystyle p_d(n,\Delta) := \sup_{v_1,\ldots,v_n} p(v_1,\ldots,v_n,\Delta)$

where ${v_1,\ldots,v_n}$ range over all vectors in ${{\mathbb R}^d}$ of length at least one. Informally, this number measures the extent to which a random walk of length ${n}$ (with all steps of size at least one) can concentrate into a ball of radius ${\Delta}$.

The one-dimensional case of this problem was answered by Erdös. First, one observes that one can normalise all the ${v_i}$ to be at least ${+1}$ (as opposed to being at most ${-1}$). In the model case when ${\Delta < 1}$, he made the following simple observation: if a random sum ${\eta_1 v_1 + \ldots + \eta_n v_n}$ fell into a ball of radius ${\Delta}$ (which in the one-dimensional case, is an interval of length less than ${2}$), and one then changed one or more of the signs ${\eta_i}$ from ${-1}$ to ${+1}$, then the new sum must necessarily lie outside of the ball. In other words, for any ball ${B}$ of radius ${\Delta}$, the set of signs ${(\eta_1,\ldots,\eta_n) \in \{-1,+1\}^n}$ for which ${\eta_1 v_1 + \ldots + \eta_n v_n \in B}$ forms an antichain. Applying Sperner’s theorem, the maximal size of this antichain is ${\binom{n}{\lfloor n/2\rfloor}}$, and this soon leads to the exact value

$\displaystyle p_1(n,\Delta) = \binom{n}{\lfloor n/2\rfloor}/2^n = \frac{\sqrt{\frac{2}{\pi}}+o(1)}{\sqrt{n}}$

when ${0 \leq \Delta < 1}$ (the bound is attained in the extreme case ${v_1=\ldots=v_n=1}$).

A similar argument works for higher values of ${\Delta}$, using Dilworth’s theorem instead of Sperner’s theorem, and gives the exact value

$\displaystyle p_1(n,\Delta) = \sum_{j=1}^s \binom{n}{m_j}/2^n = \frac{s\sqrt{\frac{2}{\pi}}+o(1)}{\sqrt{n}}$

whenever ${n \geq s}$ and ${s-1 \leq \Delta < s}$ for some natural number ${s}$, where ${\binom{n}{m_1},\ldots,\binom{n}{m_s}}$ are the ${s}$ largest binomial coefficients of ${\binom{n}{1}, \ldots, \binom{n}{n}}$.

Now consider the higher-dimensional problem. One has the obvious bound

$\displaystyle p_d(n,\Delta) \geq p_1(n,\Delta),$

but it is not obvious whether this inequality is strict. In other words, is there some way to exploit the additional freedom given by higher dimensions to make random walks concentrate more than in the one-dimensional case?

For some values of ${\Delta}$, it turns out that the answer is no, as was first observed by Kleitman (and discussed further by Frankl and Füredi). Suppose for instance that

$\displaystyle \sqrt{(s-1)^2+1} \leq \Delta < s$

for some ${s \geq 2}$. Then one can consider the example in which ${v_1=\ldots=v_{n-1}=e_1}$ is one unit vector, and ${v_n=e_2}$ is another unit vector orthogonal to ${e_1}$. The small ball probability in this case can be computed to equal ${p_1(n-1,s-1)}$ rather than ${p_1(n,s-1)}$, which is slightly larger.

In the positive direction, Frankl and Füredi established the asymptotic

$\displaystyle p_d(n,\Delta) = (1 + o(1)) p_1(n,\Delta) \ \ \ \ \ (1)$

as ${n \rightarrow \infty}$ (holding ${d}$ and ${\Delta}$ fixed). Furthermore, if ${\Delta}$ was close to an integer, and more precisely if

$\displaystyle s-1 \leq \Delta < s-1 + \frac{1}{10s^2}$

(so that the above counterexample can be avoided) they showed that ${p_d(n,\Delta) = p_1(n,\Delta)}$ for sufficiently large ${n}$ (depending on ${s,\Delta}$).

The factor ${\frac{1}{10s^2}}$ was an artefact of their method, and they conjectured in fact that one should have ${p_d(n,\Delta) = p_1(n,\Delta)}$ for sufficiently large ${n}$ whenever

$\displaystyle s-1 \leq \Delta < \sqrt{(s-1)^2+1} \ \ \ \ \ (2)$

thus matching the counterexample exactly. This conjecture was verified for ${s = 1}$ by Kleitman and for ${s=2,3}$ by Frankl and Füredi.

In this paper we verify the conjecture of Frankl and Füredi (and give a new proof of their asymptotic (1)). Our main tool is the following high-dimensional Littlewood-Offord inequality:

Theorem 1 Suppose that ${v_1,\ldots,v_n \in {\mathbb R}^d}$ which is genuinely ${d}$-dimensional in the sense that for any hyperplane ${H}$ going through the origin, one has ${\hbox{dist}(v_i,H) \geq 1}$ for at least ${k}$ values of ${i}$. Then one has

$\displaystyle p(v_1,\ldots,v_n,\Delta) \ll_{d,\Delta} k^{-d/2}.$

Theorem 1 can be viewed as a high-dimensional variant of Erdös’s inequality (but without the sharp upper bound). It is proven by the Fourier-analytic method of Halász. (This theorem was announced in my book with Van Vu several years ago, but we did not get around to publishing it until now.)

Using Theorem 1, one can verify the conjecture of Frankl and Füredi fairly quickly (the deduction takes a little over a page). The main point is that if there is excessive concentration, then Theorem 1 quickly places almost all of the vectors ${v_1,\ldots,v_n}$ to lie very close to a line. If all the vectors are close to a line, then we can project onto this line and rescale, which causes ${\Delta}$ to worsen a little bit in this reduction to the one-dimensional case, but it turns out that the bounds (2) allow us to tolerate this degradation of ${\Delta}$ once ${s>3}$ (so it is fortunate that the cases ${s \leq 3}$ were already done for us!). If instead we have a vector far from the line (as is the case in the key counterexample), then we manually eliminate that vector using the parallelogram law, which effectively drops ${\Delta}$ below ${s-1}$ (half of the time, at least) if ${\Delta}$ was initially less than ${\sqrt{(s-1)^2+1}}$, which gives enough of a saving to conclude the argument.

One moral that one can draw from this argument is that one can use a quasi-sharp estimate (such as Theorem 1), which ostensibly loses constant factors, to then deduce a sharp estimate (such as the Frankl-Furëdi conjecture) that loses no constant factors, as long as one is in an asymptotic regime (in this case, ${s \geq 3}$ and ${n}$ large depending on ${d,\Delta}$). The key is to exploit the fine structure in the main term (in this case, the piecewise constant nature of ${p_1(n,\Delta)}$ when ${\Delta}$ passes over integers) to extract gains that can absorb the losses coming from the quasi-sharp estimate).

Van Vu and I have just uploaded to the arXiv our paper “Random covariance matrices: Universality of local statistics of eigenvalues“, to be submitted shortly. This paper draws heavily on the technology of our previous paper, in which we established a Four Moment Theorem for the local spacing statistics of eigenvalues of Wigner matrices. This theorem says, roughly speaking, that these statistics are completely determined by the first four moments of the coefficients of such matrices, at least in the bulk of the spectrum. (In a subsequent paper we extended the Four Moment Theorem to the edge of the spectrum.)

In this paper, we establish the analogous result for the singular values of rectangular iid matrices ${M = M_{n,p}}$, or (equivalently) the eigenvalues of the associated covariance matrix ${\frac{1}{n} M M^*}$. As is well-known, there is a parallel theory between the spectral theory of random Wigner matrices and those of covariance matrices; for instance, just as the former has asymptotic spectral distribution governed by the semi-circular law, the latter has asymptotic spectral distribution governed by the Marcenko-Pastur law. One reason for the connection can be seen by noting that the singular values of a rectangular matrix ${M}$ are essentially the same thing as the eigenvalues of the augmented matrix

$\displaystyle \begin{pmatrix} 0 & M \\ M^* & 0\end{pmatrix}$

after eliminating sign ambiguities and degeneracies. So one can view singular values of a rectangular iid matrix as the eigenvalues of a matrix which resembles a Wigner matrix, except that two diagonal blocks of that matrix have been zeroed out.

The zeroing out of these elements prevents one from applying the entire Wigner universality theory directly to the covariance matrix setting (in particular, the crucial Talagrand concentration inequality for the magnitude of a projection of a random vector to a subspace does not work perfectly once there are many zero coefficients). Nevertheless, a large part of the theory (particularly the deterministic components of the theory, such as eigenvalue variation formulae) carry through without much difficulty. The one place where one has to spend a bit of time to check details is to ensure that the Erdos-Schlein-Yau delocalisation result (that asserts, roughly speaking, that the eigenvectors of a Wigner matrix are about as small in ${\ell^\infty}$ norm as one could hope to get) is also true for in the covariance matrix setting, but this is a straightforward (though somewhat tedious) adaptation of the method (which is based on the Stieltjes transform).

As an application, we extend the sine kernel distribution of local covariance matrix statistics, first established in the case of Wishart ensembles (when the underlying variables are gaussian) by Nagao and Wadati, and later extended to gaussian-divisible matrices by Ben Arous and Peche, to any distributions which matches one of these distributions to up to four moments, which covers virtually all complex distributions with independent iid real and imaginary parts, with basically the lone exception of the complex Bernoulli ensemble.

Recently, Erdos, Schlein, Yau, and Yin generalised their local relaxation flow method to also obtain similar universality results for distributions which have a large amount of smoothness, but without any matching moment conditions. By combining their techniques with ours as in our joint paper, one should probably be able to remove both smoothness and moment conditions, in particular now covering the complex Bernoulli ensemble.

In this paper we also record a new observation that the exponential decay hypothesis in our earlier paper can be relaxed to a finite moment condition, for a sufficiently high (but fixed) moment. This is done by rearranging the order of steps of the original argument carefully.

Van Vu and I have just uploaded to the arXiv our paper “Random matrices: Universality of local eigenvalue statistics up to the edge“, submitted to Comm. Math. Phys..  This is a sequel to our previous paper, in which we studied universality of local eigenvalue statistics (such as normalised eigenvalue spacings $\sqrt{n} ( \lambda_{i+1}(M_n) - \lambda_i(M_n) )$) for random matrices $M_n$ of Wigner type, i.e. Hermitian (or symmetric) random matrices in which the upper-triangular entries are independent with mean zero and variance one (for technical reasons we also have to assume an exponential decay condition on the distribution of the entries).   The results in the previous paper were almost entirely focused on the bulk region, in which the index i of the eigenvalues involved was in the range $\varepsilon n \leq i \leq (1-\varepsilon) n$.  The main purpose of this paper is to extend the main results of the previous paper all the way up to the edge, thus allowing one to control all indices $1 \leq i \leq n$.  As an application, we obtain a variant of Soshnikov’s well-known result that the largest eigenvalue of Wigner matrices is distributed (after suitable normalisation) according to the Tracy-Widom law when the coefficients are symmetric, by assuming instead that the coefficients have vanishing third moment.

As one transitions from the bulk to the edge, the density of the eigenvalues decreases to zero (in accordance to the Wigner semicircular law), and so the average spacing between eigenvalues increases.  (For instance, the spacing between eigenvalues in the bulk is of size $n^{-1/2}$, but at the extreme edge it increases to $n^{-1/6}$.)  On the one hand, the increase in average spacing should make life easier, because one does not have to work at such a fine spatial scale in order to see the eigenvalue distribution.  On the other hand, a certain key technical step in the previous paper (in which we adapted an argument of Erdos, Schlein, and Yau to show that eigenvectors of Wigner matrices were delocalised) seemed to require eigenvalue spacings to be of size $O(n^{-1/2})$, which was the main technical obstacle to extending the preceding results from the bulk to the edge.

The main new observation in the paper is that it was not the eigenvalue spacings $\lambda_{i+1}(M_n) - \lambda_i(M_n)$ which were of importance to eigenvalue delocalisation, but rather the somewhat smaller interlaced eigenvalue spacings $\lambda_{i+1}(M_n) - \lambda_i(M_{n-1})$, where $M_{n-1}$ is a $n-1 \times n-1$ minor of $M_n$.  The Cauchy interlacing law asserts that the latter is smaller than the former.  But the interesting thing is that at the edge (when i is close to n), the interlaced spacings are much smaller than the former, and in particular remain of size about $O(n^{-1/2})$ (up to log factors) even though the non-interlaced spacings increase to be as large as $O(n^{-1/6})$.  This is ultimately due to a sort of “attractive force” on eigenvalues that draws them towards the origin, and counteracts the usual “eigenvalue repulsion effect”, that pushes eigenvalues away from each other.  This induces “bias” for eigenvalues to move in towards the bulk rescues the delocalization result, and the remainder of the arguments in our previous paper then continue with few changes.

Below the fold I wish to give some heuristic justification of the interlacing bias phenomenon, sketch why this is relevant for eigenvector delocalisation, and finally to recall why eigenvalue delocalisation in turn is relevant for universality.

[Update, Aug 16: sign error corrected.]

One further paper in this stream: László Erdős, José Ramírez, Benjamin Schlein, Van Vu, Horng-Tzer Yau, and myself have just uploaded to the arXiv the paper “Bulk universality for Wigner hermitian matrices with subexponential decay“, submitted to Mathematical Research Letters.  (Incidentally, this is my first six-author paper I have been involved in, not counting the polymath projects of course, though I have had a number of five-author papers.)

This short paper (9 pages) combines the machinery from two recent papers on the universality conjecture for the eigenvalue spacings in the bulk for Wigner random matrices (see my earlier blog post for more discussion).  On the one hand, the paper of Erdős-Ramírez-Schlein-Yau established this conjecture under the additional hypothesis that the distribution of the individual entries obeyed some smoothness and exponential decay conditions.  Meanwhile, the paper of Van Vu and myself (which I discussed in my earlier blog post) established the conjecture under a somewhat different set of hypotheses, namely that the distribution of the individual entries obeyed some moment conditions (in particular, the third moment had to vanish), a support condition (the entries had to have real part supported in at least three points), and an exponential decay condition.

After comparing our results, the six of us realised that our methods could in fact be combined rather easily to obtain a stronger result, establishing the universality conjecture assuming only a exponential decay (or more precisely, sub-exponential decay) bound ${\Bbb P}(|x_{\ell k}| > t ) \ll \exp( - t^c )$ on the coefficients; thus all regularity, moment, and support conditions have been eliminated.  (There is one catch, namely that we can no longer control a single spacing $\lambda_{i+1}-\lambda_i$ for a single fixed i, but must now average over all $1 \leq i \leq n$ before recovering the universality.  This is an annoying technical issue but it may be resolvable in the future with further refinements to the method.)

I can describe the main idea behind the unified approach here.  One can arrange the Wigner matrices in a hierarchy, from most structured to least structured:

• The most structured (or special) ensemble is the Gaussian Unitary Ensemble (GUE), in which the coefficients are gaussian. Here, one has very explicit and tractable formulae for the eigenvalue distributions, gap spacing, etc.
• The next most structured ensemble of Wigner matrices are the Gaussian-divisible or Johansson matrices, which are matrices H of the form $H = e^{-t/2} \hat H + (1-e^{-t})^{1/2} V$, where $\hat H$ is another Wigner matrix, V is a GUE matrix independent of $\hat H$, and $0 < t < 1$ is a fixed parameter independent of n.  Here, one still has quite explicit (though not quite as tractable) formulae for the joint eigenvalue distribution and related statistics.  Note that the limiting case t=1 is GUE.
• After this, one has the Ornstein-Uhlenbeck-evolved matrices, which are also of the form $H = e^{-t/2} \hat H + (1-e^{-t})^{1/2} V$, but now $t = n^{-1+\delta}$ decays at a power rate with n, rather than being comparable to 1.  Explicit formulae still exist for these matrices, but extracting universality out of this is hard work (and occupies the bulk of the paper of Erdős-Ramírez-Schlein-Yau).
• Finally, one has arbitrary Wigner matrices, which can be viewed as the t=0 limit of the above Ornstein-Uhlenbeck process.

The arguments in the paper of Erdős-Ramírez-Schlein-Yau can be summarised as follows (I assume subexponential decay throughout this discussion):

1. (Structured case) The universality conjecture is true for Ornstein-Uhlenbeck-evolved matrices with $t = n^{-1+\delta}$ for any $0 < \delta \leq 1$.  (The case $1/4 < \delta \leq 1$ was treated in an earlier paper of Erdős-Ramírez-Schlein-Yau, while the case where t is comparable to 1 was treated by Johansson.)
2. (Matching) Every Wigner matrix with suitable smoothness conditions can be “matched” with an Ornstein-Uhlenbeck-evolved matrix, in the sense that the eigenvalue statistics for the two matrices are asymptotically identical.  (This is relatively easy due to the fact that $\delta$ can be taken arbitrarily close to zero.)
3. Combining 1. and 2. one obtains universality for all Wigner matrices obeying suitable smoothness conditions.

The arguments in the paper of Van and myself can be summarised as follows:

1. (Structured case) The universality conjecture is true for Johansson matrices, by the paper of Johansson.
2. (Matching) Every Wigner matrix with some moment and support conditions can be “matched” with a Johansson matrix, in the sense that the first four moments of the entries agree, and hence (by the Lindeberg strategy in our paper) have asymptotically identical statistics.
3. Combining 1. and 2. one obtains universality for all Wigner matrices obtaining suitable moment and support conditions.

What we realised is by combining the hard part 1. of the paper of Erdős-Ramírez-Schlein-Yau with the hard part 2. of the paper of Van and myself, we can remove all regularity, moment, and support conditions.  Roughly speaking, the unified argument proceeds as follows:

1. (Structured case) By the arguments of Erdős-Ramírez-Schlein-Yau, the universality conjecture is true for Ornstein-Uhlenbeck-evolved matrices with $t = n^{-1+\delta}$ for any $0 < \delta \leq 1$.
2. (Matching) Every Wigner matrix $H$ can be “matched” with an Ornstein-Uhlenbeck-evolved matrix $e^{-t/2} H + (1-e^{-t})^{1/2} V$ for $t= n^{-1+0.01}$ (say), in the sense that the first four moments of the entries almost agree, which is enough (by the arguments of Van and myself) to show that these two matrices have asymptotically identical statistics on the average.
3. Combining 1. and 2. one obtains universality for the averaged statistics for all Wigner matrices.

The averaging should be removable, but this would require better convergence results to the semicircular law than are currently known (except with additional hypotheses, such as vanishing third moment).  The subexponential decay should also be relaxed to a condition of finiteness for some fixed moment ${\Bbb E} |x|^C$, but we did not pursue this direction in order to keep the paper short.

Van Vu and I have just uploaded to the arXiv our paper “Random matrices: universality of local eigenvalue statistics“, submitted to Acta Math..  This paper concerns the eigenvalues $\lambda_1(M_n) \leq \ldots \leq \lambda_n(M_n)$ of a Wigner matrix $M_n = (\zeta_{ij})_{1 \leq i,j \leq n}$, which we define to be a random Hermitian $n \times n$ matrix whose upper-triangular entries $\zeta_{ij}, 1 \leq i \leq j \leq n$ are independent (and whose strictly upper-triangular entries $\zeta_{ij}, 1 \leq i < j \leq n$ are also identically distributed).  [The lower-triangular entries are of course determined from the upper-triangular ones by the Hermitian property.]  We normalise the matrices so that all the entries have mean zero and variance 1.  Basic examples of Wigner Hermitian matrices include

1. The Gaussian Unitary Ensemble (GUE), in which the upper-triangular entries $\zeta_{ij}, i are complex gaussian, and the diagonal entries $\zeta_{ii}$ are real gaussians;
2. The Gaussian Orthogonal Ensemble (GOE), in which all entries are real gaussian;
3. The Bernoulli Ensemble, in which all entries take values $\pm 1$ (with equal probability of each).

We will make a further distinction into Wigner real symmetric matrices (which are Wigner matrices with real coefficients, such as GOE and the Bernoulli ensemble) and Wigner Hermitian matrices (which are Wigner matrices whose upper-triangular coefficients have real and imaginary parts iid, such as GUE).

The GUE and GOE ensembles have a rich algebraic structure (for instance, the GUE distribution is invariant under conjugation by unitary matrices, while the GOE distribution is similarly invariant under conjugation by orthogonal matrices, hence the terminology), and as a consequence their eigenvalue distribution can be computed explicitly.  For instance, the joint distribution of the eigenvalues $\lambda_1(M_n),\ldots,\lambda_n(M_n)$ for GUE is given by the explicit formula

$\displaystyle C_n \prod_{1 \leq i (0)

for some explicitly computable constant $C_n$ on the orthant $\{ \lambda_1 \leq \ldots \leq \lambda_n\}$ (a result first established by Ginibre).  (A similar formula exists for GOE, but for simplicity we will just discuss GUE here.)  Using this explicit formula one can compute a wide variety of asymptotic eigenvalue statistics.  For instance, the (bulk) empirical spectral distribution (ESD) measure $\frac{1}{n} \sum_{i=1}^n \delta_{\lambda_i(M_n)/\sqrt{n}}$ for GUE (and indeed for all Wigner matrices, see below) is known to converge (in the vague sense) to the Wigner semicircular law

$\displaystyle \frac{1}{2\pi} (4-x^2)_+^{1/2}\ dx =: \rho_{sc}(x)\ dx$ (1)

as $n \to \infty$.  Actually, more precise statements are known for GUE; for instance, for $1 \leq i \leq n$, the $i^{th}$ eigenvalue $\lambda_i(M_n)$ is known to equal

$\displaystyle \lambda_i(M_n) = \sqrt{n} t(\frac{i}{n}) + O( \frac{\log n}{n} )$ (2)

with probability $1-o(1)$, where $t(a) \in [-2,2]$ is the inverse cumulative distribution function of the semicircular law, thus

$\displaystyle a = \int_{-2}^{t(a)} \rho_{sc}(x)\ dx$.

Furthermore, the distribution of the normalised eigenvalue spacing $\sqrt{n} \rho_{sc}(\frac{i}{n}) (\lambda_{i+1}(M_n) - \lambda_i(M_n))$ is known; in the bulk region $\varepsilon n \leq i \leq 1-\varepsilon n$ for fixed $\varepsilon > 0$, it converges as $n \to \infty$ to the Gaudin distribution, which can be described explicitly in terms of determinants of the Dyson sine kernel $K(x,y) := \frac{\sin \pi(x-y)}{\pi(x-y)}$.  Many further local statistics of the eigenvalues of GUE are in fact governed by this sine kernel, a result usually proven using the asymptotics of orthogonal polynomials (and specifically, the Hermite polynomials).  (At the edge of the spectrum, say $i = n-O(1)$, the asymptotic distribution is a bit different, being governed instead by the  Tracy-Widom law.)

It has been widely believed that these GUE facts enjoy a universality property, in the sense that they should also hold for wide classes of other matrix models. In particular, Wigner matrices should enjoy the same bulk distribution (1), the same asymptotic law (2) for individual eigenvalues, and the same sine kernel statistics as GUE. (The statistics for Wigner symmetric matrices are slightly different, and should obey GOE statistics rather than GUE ones.)

There has been a fair body of evidence to support this belief.  The bulk distribution (1) is in fact valid for all Wigner matrices (a result of Pastur, building on the original work of Wigner of course).  The Tracy-Widom statistics on the edge were established for all Wigner Hermitian matrices (assuming that the coefficients had a distribution which was symmetric and decayed exponentially) by Soshnikov (with some further refinements by Soshnikov and Peche).  Soshnikov’s arguments were based on an advanced version of the moment method.

The sine kernel statistics were established by Johansson for Wigner Hermitian matrices which were gaussian divisible, which means that they could be expressed as a non-trivial linear combination of another Wigner Hermitian matrix and an independent GUE.  (Basically, this means that distribution of the coefficients is a convolution of some other distribution with a gaussian.  There were some additional technical decay conditions in Johansson’s work which were removed in subsequent work of Ben Arous and Peche.)   Johansson’s work was based on an explicit formula for the joint distribution for gauss divisible matrices that generalises (0) (but is significantly more complicated).

Just last week, Erdos, Ramirez, Schlein, and Yau established sine kernel statistics for Wigner Hermitian matrices with exponential decay and a high degree of smoothness (roughly speaking, they require  control of up to six derivatives of the Radon-Nikodym derivative of the distribution).  Their method is based on an analysis of the dynamics of the eigenvalues under a smooth transition from a general Wigner Hermitian matrix to GUE, essentially a matrix version of the Ornstein-Uhlenbeck process, whose eigenvalue dynamics are governed by Dyson Brownian motion.

In my paper with Van, we establish similar results to that of Erdos et al. under slightly different hypotheses, and by a somewhat different method.  Informally, our main result is as follows:

Theorem 1. (Informal version)  Suppose $M_n$ is a Wigner Hermitian matrix whose coefficients have an exponentially decaying distribution, and whose real and imaginary parts are supported on at least three points (basically, this excludes Bernoulli-type distributions only) and have vanishing third moment (which is for instance the case for symmetric distributions).  Then one has the local statistics (2) (but with an error term of $O(n^{-1+\delta})$ for any $\delta>0$ rather than $O(\log n/n)$) and the sine kernel statistics for individual eigenvalue spacings $\sqrt{n} \rho_{sc}(\frac{i}{n}) (\lambda_{i+1}(M_n) - \lambda_i(M_n))$ (as well as higher order correlations) in the bulk.

If one removes the vanishing third moment hypothesis, one still has the sine kernel statistics provided one averages over all i.

There are analogous results for Wigner real symmetric matrices (see paper for details).  There are also some related results, such as a universal distribution for the least singular value of matrices of the form in Theorem 1, and a crude asymptotic for the determinant (in particular, $\log |\det M_n| = (1+o(1)) \log \sqrt{n!}$ with probability $1-o(1)$).

The arguments are based primarily on the Lindeberg replacement strategy, which Van and I also used to obtain universality for the circular law for iid matrices, and for the least singular value for iid matrices, but also rely on other tools, such as some recent arguments of Erdos, Schlein, and Yau, as well as a very useful concentration inequality of Talagrand which lets us tackle both discrete and continuous matrix ensembles in a unified manner.  (I plan to talk about Talagrand’s inequality in my next blog post.)

Van Vu and I have just uploaded to the arXiv our paper “Random matrices: The distribution of the smallest singular values“, submitted to Geom. Func. Anal..   This paper concerns the least singular value $\sigma_n(M)$ of a random $n \times n$ matrix $M_n = (a_{ij})_{1 \leq i,j \leq n}$ with iid entries, which for simplicity we will take to be real (we also have analogues for complex random matrices), with mean zero and variance one.  A typical model to keep in mind here is the Bernoulli model, when each $a_{ij}$ is equal to +1 or -1 with an equal probability of each, while a privileged role will be planed by the gaussian model, when each $a_{ij} \equiv N(0,1)$ is given the standard gaussian distribution.

The distribution of the least singular value $\sigma_n(M_n)$, which is of importance in smoothed analysis and also has intrinsic interest within the field of random matrices, has been intensively studied in recent years.  For instance, in the Bernoulli case, there have been several recent papers on the singularity probability ${\Bbb P}( \sigma_n(M_n) = 0 )$; it is not hard to obtain a lower bound of $(\frac{1}{2}+o(1))^n$, and this is conjectured to be the correct asymptotic.  The best upper bound so far is by Bourgain, Vu, and Wood, who obtain $(\frac{1}{\sqrt{2}}+o(1))^n$.

Upper and lower tail bounds have also been obtained, starting with the breakthrough paper of Rudelson (building upon some earlier work on rectangular matrices by Litvak, Pajor, Rudelson, and Tomczak-Jaegermann), with subsequent papers by Van and myself, by Rudelson, and also by Rudelson and Vershynin.  To oversimplify somewhat, the conclusion of this work is that the least singular value $\sigma_n(M_n)$ has size comparable to $1/\sqrt{n}$ with high probability.  The techniques are based in part on inverse Littlewood-Offord theorems.

However, in the case of the gaussian ensemble, we know more than just the expected size of the least singular value; we know its asymptotic distribution.  Indeed, it was shown by Edelman in this case that one has

${\Bbb P}( \sigma_n(M_n) \leq t/\sqrt{n} ) = 1 - e^{-t^2/2 - t} + o(1)$ (1)

for any fixed $t > 0$.  This computation was highly algebraic in nature, relying on special identities that are available only for extremely symmetric random matrix ensembles, such as the gaussian random matrix model; in particular, it is not obvious at all that the Bernoulli ensemble necessarily obeys the same distribution as the gaussian one.  Nevertheless, motivated in part by this computation, Spielman and Teng conjectured that the bound

${\Bbb P}( \sigma_n(M_n) \leq t/\sqrt{n} ) \leq t + c^n$

should hold for some $c < 1$ for, say, the Bernoulli ensemble.    This conjecture was verified up to losses of a multiplicative constant by Rudelson and Vershynin.

The main result of our paper is to show that the distribution of the least singular value is in fact universal, being asymptotically the same for all iid (real) random matrix models with the same mean and variance, and with a sufficiently high number of moment conditions.  In particular, the asymptotic (1) for the gaussian ensemble is also true for the Bernoulli ensemble. Furthermore the error term o(1) can be shown to be of the shape $O(n^{-c})$ for some c > 0, which in turn confirms the Spielman-Teng conjecture (without a loss of constant) in the polynomial size regime $t \geq n^{-c'}$ for some $c' > 0$.  We also have some further results for other low singular values (e.g. $\sigma_{n-k}(M_n)$ for fixed k) but they are harder to state, and we will not do so here.

To our knowledge, this is the first universality result for the “hard edge” of the spectrum (i.e. the least few singular values) for iid square matrix models.  [For rectangular matrices, where the hard edge is bounded away from zero, universality was recently established by Feldheim and Sodin.] The bulk distribution for the singular values of such matrices has been known for some time (it is governed by the famous Marchenko-Pastur law), while the distribution at the “soft edge” of the spectrum (i.e. the largest few singular values) was established to be universal by Soshnikov (here the distribution is governed by the Tracy-Widom law for the top singular value, and by the Airy kernel for the next few singular values).  Both of these results are basically obtained by the moment method (or close relatives of this method, such as the resolvent method).  However, the moment method is not effective for discerning the hard edge of the spectrum, since the singular values here are very small compared with the bulk and so have a negligible influence on the moments.  [In the rectangular case, where the hard edge is bounded away from zero, the moment method becomes available again, though the application of it is quite delicate; see the Feldheim-Sodin paper for details.] Instead, we proceed by observing a certain central limit theorem type behaviour for the geometry of the columns of $M_n$, which is enough to give us the desired universality; more details on the proof lie below the fold.

Van Vu and I have just uploaded to the arXiv our preprint “A sharp inverse Littlewood-Offord theorem“, which we have submitted to Random Structures and Algorithms.  This paper gives a solution to the (inverse) Littlewood-Offord problem of understanding when random walks are concentrated in the case when the concentration is of polynomial size in the length $n$ of the walk; our description is sharp up to epsilon powers of $n$.  The theory of inverse Littlewood-Offord problems and related topics has been of importance in recent developments in the spectral theory of discrete random matrices (e.g. a “robust” variant of these theorems was crucial in our work on the circular law).

For simplicity I will restrict attention to the Bernoulli random walk.  Given $n$ real numbers $v_1,\ldots,v_n$, one can form the random variable

$S := \epsilon_1 v_1 + \ldots + \epsilon_n v_n$

where $\epsilon_1,\ldots,\epsilon_n \in \{-1,+1\}$ are iid random signs (with either sign +1, -1 chosen with probability 1/2).  This is a discrete random variable which typically takes $2^n$ values.  However, if there are various arithmetic relations between the step sizes $v_1,\ldots,v_n$, then many of the $2^n$ possible sums collide, and certain values may then arise with much higher probability.  To measure this, define the concentration probability $p(v_1,\ldots,v_n)$ by the formula

$p(v_1,\ldots,v_n) = \sup_x {\Bbb P}(S=x)$.

Intuitively, this probability measures the amount of additive structure present between the $v_1,\ldots,v_n$.  There are two (opposing) problems in the subject:

• (Forward Littlewood-Offord problem) Given some structural assumptions on $v_1,\ldots,v_n$, what bounds can one place on $p(v_1,\ldots,v_n)$?
• (Inverse Littlewood-Offord problem) Given some bounds on $p(v_1,\ldots,v_n)$, what structural assumptions can one then conclude about $v_1,\ldots,v_n$?

Ideally one would like answers to both of these problems which come close to inverting each other, and this is the guiding motivation for our paper.

Van Vu and I have just uploaded to the arXiv our new paper, “Random matrices: Universality of ESDs and the circular law“, with an appendix by Manjunath Krishnapur (and some numerical data and graphs by Philip Wood).  One of the things we do in this paper (which was our original motivation for this project) was to finally establish the endpoint case of the circular law (in both strong and weak forms) for random iid matrices $A_n = (a_{ij})_{1 \leq i,j \leq n}$, where the coefficients $a_{ij}$ are iid random variables with mean zero and unit variance.  (The strong circular law says that with probability 1, the empirical spectral distribution (ESD) of the normalised eigenvalues $\frac{1}{\sqrt{n}} \lambda_1, \ldots, \frac{1}{\sqrt{n}} \lambda_n$ of the matrix $A_n$ converges to the uniform distribution on the unit circle as $n \to \infty$.  The weak circular law asserts the same thing, but with convergence in probability rather than almost sure convergence; this is in complete analogy with the weak and strong law of large numbers, and in fact this law is used in the proof.)  In a previous paper, we had established the same claim but under the additional assumption that the $(2+\eta)^{th}$ moment ${\Bbb E} |a_{ij}|^{2+\eta}$ was finite for some $\eta > 0$; this builds upon a significant body of earlier work by Mehta, Girko, Bai, Bai-Silverstein, Gotze-Tikhomirov, and Pan-Zhou, as discussed in the blog article for the previous paper.

As it turned out, though, in the course of this project we found a more general universality principle (or invariance principle) which implied our results about the circular law, but is perhaps more interesting in its own right.  Observe that the statement of the circular law can be split into two sub-statements:

1. (Universality for iid ensembles) In the asymptotic limit $n \to \infty$, the ESD of the random matrix $A_n$ is independent of the choice of distribution of the coefficients, so long as they are normalised in mean and variance.  In particular, the ESD of such a matrix is asymptotically the same as that of a (real or complex) gaussian matrix $G_n$ with the same mean and variance.
2. (Circular law for gaussian matrices) In the asymptotic limit $n \to \infty$, the ESD of a gaussian matrix $G_n$ converges to the circular law.

The reason we single out the gaussian matrix ensemble $G_n$ is that it has a much richer algebraic structure (for instance, the real (resp. complex) gaussian ensemble is invariant under right and left multiplication by the orthogonal group O(n) (resp. the unitary group U(n))).  Because of this, it is possible to compute the eigenvalue distribution very explicitly by algebraic means (for instance, using the machinery of orthogonal polynomials).  In particular, the circular law for complex gaussian matrices (Statement 2 above) was established all the way back in 1967 by Mehta, using an explicit formula for the distribution of the ESD in this case due to Ginibre.

These highly algebraic techniques completely break down for more general iid ensembles, such as the Bernoulli ensemble of matrices whose entries are +1 or -1 with an equal probability of each.  Nevertheless, it is a remarkable phenomenon – which has been referred to as universality in the literature, for instance in this survey by Deift – that the spectral properties of random matrices for non-algebraic ensembles are in many cases asymptotically indistinguishable in the limit $n \to \infty$ from that of algebraic ensembles with the same mean and variance (i.e. Statement 1 above).  One might view this as a sort of “non-Hermitian, non-commutative” analogue of the universality phenomenon represented by the central limit theorem, in which the limiting distribution of a normalised average

$\displaystyle \overline{X}_n := \frac{1}{\sqrt{n}} (X_1 + \ldots + X_n )$ (1)

of an iid sequence depends only on the mean and variance of the elements of that sequence (assuming of course that these quantities are finite), and not on the underlying distribution.  (The Hermitian non-commutative analogue of the CLT is known as Wigner’s semicircular law.)

Previous approaches to the circular law did not build upon the gaussian case, but instead proceeded directly, in particular controlling the ESD of a random matrix $A_n$ via estimates on the Stieltjes transform

$\displaystyle \frac{1}{n} \log |\det( \frac{1}{\sqrt{n}} A_n - zI )|$ (2)

of that matrix for complex numbers z.  This method required a combination of delicate analysis (in particular, a bound on the least singular values of $\frac{1}{\sqrt{n}} A_n - zI$), and algebra (in order to compute and then invert the Stieltjes transform).  [As a general rule, and oversimplifying somewhat, algebra tends to be used to control main terms in a computation, while analysis is used to control error terms.]

What we discovered while working on our paper was that the algebra and analysis could be largely decoupled from each other: that one could establish a universality principle (Statement 1 above) by relying primarily on tools from analysis (most notably the bound on least singular values mentioned earlier, but also Talagrand’s concentration of measure inequality, and a universality principle for the singular value distribution of random matrices due to Dozier and Silverstein), so that the algebraic heavy lifting only needs to be done in the gaussian case (Statement 2 above) where the task is greatly simplified by all the additional algebraic structure available in that setting.   This suggests a possible strategy to proving other conjectures in random matrices (for instance concerning the eigenvalue spacing distribution of random iid matrices), by first establishing universality to swap the general random matrix ensemble with an algebraic ensemble (without fully understanding the limiting behaviour of either), and then using highly algebraic tools to understand the latter ensemble.  (There is now a sophisticated theory in place to deal with the latter task, but the former task – understanding universality – is still only poorly understood in many cases.)