Van Vu and I have just uploaded to the arXiv our new paper, “Random matrices: Universality of ESDs and the circular law“, with an appendix by Manjunath Krishnapur (and some numerical data and graphs by Philip Wood).  One of the things we do in this paper (which was our original motivation for this project) was to finally establish the endpoint case of the circular law (in both strong and weak forms) for random iid matrices A_n = (a_{ij})_{1 \leq i,j \leq n}, where the coefficients a_{ij} are iid random variables with mean zero and unit variance.  (The strong circular law says that with probability 1, the empirical spectral distribution (ESD) of the normalised eigenvalues \frac{1}{\sqrt{n}} \lambda_1, \ldots, \frac{1}{\sqrt{n}} \lambda_n of the matrix A_n converges to the uniform distribution on the unit circle as n \to \infty.  The weak circular law asserts the same thing, but with convergence in probability rather than almost sure convergence; this is in complete analogy with the weak and strong law of large numbers, and in fact this law is used in the proof.)  In a previous paper, we had established the same claim but under the additional assumption that the (2+\eta)^{th} moment {\Bbb E} |a_{ij}|^{2+\eta} was finite for some \eta > 0; this builds upon a significant body of earlier work by Mehta, Girko, Bai, Bai-Silverstein, Gotze-Tikhomirov, and Pan-Zhou, as discussed in the blog article for the previous paper.

As it turned out, though, in the course of this project we found a more general universality principle (or invariance principle) which implied our results about the circular law, but is perhaps more interesting in its own right.  Observe that the statement of the circular law can be split into two sub-statements:

  1. (Universality for iid ensembles) In the asymptotic limit n \to \infty, the ESD of the random matrix A_n is independent of the choice of distribution of the coefficients, so long as they are normalised in mean and variance.  In particular, the ESD of such a matrix is asymptotically the same as that of a (real or complex) gaussian matrix G_n with the same mean and variance.
  2. (Circular law for gaussian matrices) In the asymptotic limit n \to \infty, the ESD of a gaussian matrix G_n converges to the circular law.

The reason we single out the gaussian matrix ensemble G_n is that it has a much richer algebraic structure (for instance, the real (resp. complex) gaussian ensemble is invariant under right and left multiplication by the orthogonal group O(n) (resp. the unitary group U(n))).  Because of this, it is possible to compute the eigenvalue distribution very explicitly by algebraic means (for instance, using the machinery of orthogonal polynomials).  In particular, the circular law for complex gaussian matrices (Statement 2 above) was established all the way back in 1967 by Mehta, using an explicit formula for the distribution of the ESD in this case due to Ginibre.

These highly algebraic techniques completely break down for more general iid ensembles, such as the Bernoulli ensemble of matrices whose entries are +1 or -1 with an equal probability of each.  Nevertheless, it is a remarkable phenomenon – which has been referred to as universality in the literature, for instance in this survey by Deift – that the spectral properties of random matrices for non-algebraic ensembles are in many cases asymptotically indistinguishable in the limit n \to \infty from that of algebraic ensembles with the same mean and variance (i.e. Statement 1 above).  One might view this as a sort of “non-Hermitian, non-commutative” analogue of the universality phenomenon represented by the central limit theorem, in which the limiting distribution of a normalised average

\displaystyle \overline{X}_n := \frac{1}{\sqrt{n}} (X_1 + \ldots + X_n ) (1)

of an iid sequence depends only on the mean and variance of the elements of that sequence (assuming of course that these quantities are finite), and not on the underlying distribution.  (The Hermitian non-commutative analogue of the CLT is known as Wigner’s semicircular law.)

Previous approaches to the circular law did not build upon the gaussian case, but instead proceeded directly, in particular controlling the ESD of a random matrix A_n via estimates on the Stieltjes transform

\displaystyle \frac{1}{n} \log |\det( \frac{1}{\sqrt{n}} A_n - zI )| (2)

of that matrix for complex numbers z.  This method required a combination of delicate analysis (in particular, a bound on the least singular values of \frac{1}{\sqrt{n}} A_n - zI), and algebra (in order to compute and then invert the Stieltjes transform).  [As a general rule, and oversimplifying somewhat, algebra tends to be used to control main terms in a computation, while analysis is used to control error terms.]

What we discovered while working on our paper was that the algebra and analysis could be largely decoupled from each other: that one could establish a universality principle (Statement 1 above) by relying primarily on tools from analysis (most notably the bound on least singular values mentioned earlier, but also Talagrand’s concentration of measure inequality, and a universality principle for the singular value distribution of random matrices due to Dozier and Silverstein), so that the algebraic heavy lifting only needs to be done in the gaussian case (Statement 2 above) where the task is greatly simplified by all the additional algebraic structure available in that setting.   This suggests a possible strategy to proving other conjectures in random matrices (for instance concerning the eigenvalue spacing distribution of random iid matrices), by first establishing universality to swap the general random matrix ensemble with an algebraic ensemble (without fully understanding the limiting behaviour of either), and then using highly algebraic tools to understand the latter ensemble.  (There is now a sophisticated theory in place to deal with the latter task, but the former task – understanding universality – is still only poorly understood in many cases.)

To oversimplify a little bit, the universality principle we actually establish is the following:

Theorem 1. (Universality principle)  Let A_n = (a_{ij})_{1 \leq i,j \leq n} be a random matrix with bounded expected normalised second moment {\Bbb E} \frac{1}{n^2} \sum_{i,j=1}^n |a_{ij}|^2 such that the normalised coefficients a_{ij} - {\Bbb E} a_{ij} are iid.  Then in the limit n \to \infty, the ESD of A_n depends only on the ESD of the mean {\Bbb E} A_n, and on the variance of the coefficients.

The universality for the circular law (Statement 1) corresponds to the special case {\Bbb E} A_n = 0.  There are strong and weak versions of this principle, depending on whether one wants the limit to be in the sense of almost sure convergence or convergence in probability, but this is slightly technical and I will not detail it here.  In an appendix by Manjunath Krishnapur, the weak version of this universality principle is extended to the setting in which the normalised coefficients are not iid, but remain jointly independent, have variances uniformly bounded above and below, and also obey some technical uniform integrability conditions (e.g. the Pastur condition).

In the case of the central limit theorem, or for limit theorems for the (necessarily real) eigenvalues of Hermitian or symmetric random matrices, this sort of universality can be obtained via the moment method, and is well understood (and known as the “Lindeberg principle”).  Consider for instance the situation in the central limit theorem, and more precisely the task of understanding the distribution of the averages \overline{X}_n defined by (1) when the X_n are iid with zero mean and finite variance.  To show universality of these averages in the limit n \to \infty, it suffices to establish universality for the limits \lim_{n \to \infty} {\Bbb E} \overline{X_n}^k for k=1,2,3,\ldots, basically because a probability measure on the real line with suitable decay at infinity can be uniquely recovered from its moments.  (This statement is essentially the dual of the Weierstrass approximation theorem, combined with the Riesz representation theorem, though there are some technical issues regarding the “suitable decay at infinity” disclaimer.)

If one expands out the expected moment {\Bbb E} \overline{X_n}^k, one obtains a linear combination of mixed moments of the form

{\Bbb E} X_{i_1}^{a_1} \ldots X_{i_j}^{a_j}

for various indices 1 \leq i_1 < \ldots < i_j \leq n and exponents a_1, \ldots, a_j > 0 adding up to k.  As long as all the exponents are either 1 or 2, one can use the iid hypothesis to conclude that such moments depend only on the mean and variance of the random variable, and not on any finer properties of the distribution.  Thus, to establish universality, it suffices to show that the contribution of all the mixed moments which contain at least one exponent larger than 2 is negligible in the limit n \to \infty.  This can be done by rather crude combinatorial upper bounds regarding how many such contributions exist.

Note that this purely analytical approach did not attempt to actually compute what the limiting moments \lim_{n \to \infty} {\Bbb E} \overline{X_n}^k were; they only sought to show that such limits were universal.  Now, in this case, it is not terribly difficult to perform the enumerative combinatorics and compute these moments exactly, but once we have universality, we can pursue a simpler route: we use universality to swap the iid sequence X_1,\ldots,X_n with an iid gaussian sequence G_1,\ldots,G_n \equiv N(0,1).  The algebraic properties of gaussians then ensure that the averages \overline{G_n} are themselves gaussians N(0,1), and thus converge to a gaussian in the limit; universality then tells us that the same is true for the original iid sequence, thus proving the central limit theorem. (This proof is due to Lindeberg.)

A similar argument allows one to establish Wigner’s semicircular law by first reducing it to the case of the gaussian ensemble; see this paper of Chatterjee for a discussion and extensive generalisations.  (In this particular case, this route is not all that much simpler than the traditional method of using the enumerative combinatorics of Catalan numbers to compute the moments directly, or by using the slick machinery of free probability to clean up the computations, but it still illustrates the basic point that one can establish universality by primarily analytic means before having to understand what the limit actually is.)

In the case of non-Hermitian random matrix ensembles, such as iid ensembles, the eigenvalues are now complex instead of real, and the moment method no longer gives enough information on the ESD to settle the problem.  (For instance, in the case of the circular law, all normalised moments {\Bbb E} \frac{1}{n} \sum_{j=1}^n (\frac{1}{\sqrt{n}} \lambda_j)^k = {\Bbb E} \frac{1}{n} \hbox{tr} (\frac{1}{\sqrt{n}} A_n)^k in fact vanish identically in the limit n \to \infty, even though the normalised distribution is clearly not identically zero in the limit.  A more fundamental explanation for this phenomenon is that the ESD for non-Hermitian matrices can be unstable with respect to perturbations (due to the presence of pseudospectrum), whereas moments are stable, and so cannot control the ESD on their own.)

Nevertheless, Girko observed that the (normalised) ESD of a matrix A_n can be computed in a reasonably stable manner from the distribution of its Stieltjes transforms (2) (though the latter are themselves unstable when the least singular value of \frac{1}{\sqrt{n}} A_n - zI becomes small).  Because of this, one can deduce universality for the ESD from a universality property of the Stieltjes transform.  (We formalise this statement as a “replacement principle” in our paper.)

So the remaining task is to show a universal property for the magnitude of the determinant of random matrices such as \frac{1}{\sqrt{n}} A_n - zI.  The reason why it is so much more preferable to deal with determinants rather than with eigenvalues is that there are many different and useful ways to try to compute the determinant: as a product of eigenvalues, a product of singular values, as the volume of a parallelopiped, as a cofactor expansion, etc.

In previous work on the circular law, starting from the breakthrough paper of Girko, this determinant was tackled by expressing it as a product of the singular values of \frac{1}{\sqrt{n}} A_n - zI.  The limiting distribution of these values can be determined (with some effort) from the moment method and some non-trivial algebraic computations, but to transfer this limit to that of the determinant, one needed a good rate of convergence in this distribution, as well as control of the least singular value to avoid degeneracy in the product.  This required a certain amount of analysis which was first set out rigorously (under some moderately strong hypotheses on the coefficients) by Bai.

A recent paper of Dozier and Silverstein shows that the limiting distribution of \frac{1}{\sqrt{n}} A_n - zI enjoys universality under some rather general hypotheses on A_n (and also provides an explicit formula for this limit, though we do not need that formula for our arguments). In principle, this gives universality for the determinant and thus for the original ESD of A_n, but unfortunately the results in that paper do not supply a sufficiently strong rate of convergence to make these arguments rigorous.

To resolve this, we took a slightly different tack, expressing the magnitude of the determinant as the volume of the parallelopiped generated by its rows, which by the high-school “base times height” formula is then the product of various distances from one row to the subspace spanned by previous rows.  The first few terms in this product are easy to understand, but things become difficult for the very last few rows, as the subspace now has very high dimension.  Nevertheless, thanks to existing work on the least singular value problem (and in particular, the bounds on this problem contained in our previous circular law paper), one can show that the contribution of the last few rows is in fact negligible.  For the bulk of the rows, one can use a classical singular value interlacing inequality going back to Cauchy to control the bulk contribution by the distribution of the singular values, for which the Dozier-Silverstein approach applies.  It remains to control the contribution of the intermediate rows which are close, but not too close, to the bottom.  Previously, these rows needed a good rate of convergence on the singular values in order to be controlled; but, borrowing a trick from one of our previous papers, we applied a very useful concentration-of-measure inequality of Talagrand instead to show that the contribution of these rows is negligible also.

In the appendix by Manjunath Krishnapur, a variant of the Dozier-Silverstein result, which establishes universality (in the weak sense of convergence in probability) without computing the limit for a wider class of random matrices, is established by exploiting an invariance principle of Chatterjee that generalises the Lindeberg trick.