
Van Vu and I have just uploaded to the arXiv our paper “Random matrices: universality of local eigenvalue statistics”, submitted to Acta Math.  This paper concerns the eigenvalues \lambda_1(M_n) \leq \ldots \leq \lambda_n(M_n) of a Wigner matrix M_n = (\zeta_{ij})_{1 \leq i,j \leq n}, which we define to be a random Hermitian n \times n matrix whose upper-triangular entries \zeta_{ij}, 1 \leq i \leq j \leq n are independent (and whose strictly upper-triangular entries \zeta_{ij}, 1 \leq i < j \leq n are also identically distributed).  [The lower-triangular entries are of course determined from the upper-triangular ones by the Hermitian property.]  We normalise the matrices so that all the entries have mean zero and variance 1.  Basic examples of Wigner matrices include

  1. The Gaussian Unitary Ensemble (GUE), in which the upper-triangular entries \zeta_{ij}, i<j are complex gaussian, and the diagonal entries \zeta_{ii} are real gaussians;
  2. The Gaussian Orthogonal Ensemble (GOE), in which all entries are real gaussian;
  3. The Bernoulli Ensemble, in which all entries take values \pm 1 (with equal probability of each).

We will make a further distinction into Wigner real symmetric matrices (which are Wigner matrices with real coefficients, such as GOE and the Bernoulli ensemble) and Wigner Hermitian matrices (which are Wigner matrices whose upper-triangular coefficients have real and imaginary parts iid, such as GUE).
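To make the three basic ensembles concrete, here is a minimal NumPy sketch that samples them. The function names `sample_gue` and `sample_real_wigner` are illustrative choices rather than anything from the paper, and the normalisation follows the convention above that every entry has mean zero and variance 1 (which, for the real gaussian case, differs slightly from the diagonal normalisation used in some textbook definitions of GOE).

```python
import numpy as np


def sample_gue(n, rng):
    """GUE-type matrix: complex gaussian above the diagonal, real gaussian on it."""
    g = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    # (g + g^*)/2 is Hermitian; its off-diagonal entries satisfy E|z|^2 = 1
    # and its diagonal entries are real N(0, 1).
    return (g + g.conj().T) / 2


def sample_real_wigner(n, rng, entry="gaussian"):
    """Real symmetric Wigner matrix with N(0, 1) or +-1 entries (hypothetical helper)."""
    if entry == "gaussian":
        a = rng.standard_normal((n, n))
    elif entry == "bernoulli":
        a = rng.choice([-1.0, 1.0], size=(n, n))
    else:
        raise ValueError("entry must be 'gaussian' or 'bernoulli'")
    # Keep the upper triangle (diagonal included) and mirror it below the diagonal.
    return np.triu(a) + np.triu(a, 1).T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print(sample_gue(3, rng))
    print(sample_real_wigner(3, rng, "bernoulli"))
```

The lower-triangular entries are never sampled independently; they are determined by the symmetry, matching the definition of a Wigner matrix given earlier.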

The GUE and GOE ensembles have a rich algebraic structure (for instance, the GUE distribution is invariant under conjugation by unitary matrices, while the GOE distribution is similarly invariant under conjugation by orthogonal matrices, hence the terminology), and as a consequence their eigenvalue distribution can be computed explicitly.  For instance, the joint distribution of the eigenvalues \lambda_1(M_n),\ldots,\lambda_n(M_n) for GUE is given by the explicit formula

\displaystyle C_n \prod_{1 \leq i<j \leq n} |\lambda_i-\lambda_j|^2 \exp( - \frac{1}{2n} (\lambda_1^2+\ldots+\lambda_n^2))\ d\lambda_1 \ldots d\lambda_n (0)

for some explicitly computable constant C_n on the orthant \{ \lambda_1 \leq \ldots \leq \lambda_n\} (a result first established by Ginibre).  (A similar formula exists for GOE, but for simplicity we will just discuss GUE here.)  Using this explicit formula one can compute a wide variety of asymptotic eigenvalue statistics.  For instance, the (bulk) empirical spectral distribution (ESD) measure \frac{1}{n} \sum_{i=1}^n \delta_{\lambda_i(M_n)/\sqrt{n}} for GUE (and indeed for all Wigner matrices, see below) is known to converge (in the vague sense) to the Wigner semicircular law

\displaystyle \frac{1}{2\pi} (4-x^2)_+^{1/2}\ dx =: \rho_{sc}(x)\ dx (1)

as n \to \infty.  Actually, more precise statements are known for GUE; for instance, for 1 \leq i \leq n, the i^{th} eigenvalue \lambda_i(M_n) is known to equal

\displaystyle \lambda_i(M_n) = \sqrt{n} t(\frac{i}{n}) + O( \frac{\log n}{n} ) (2)

with probability 1-o(1), where t(a) \in [-2,2] is the inverse cumulative distribution function of the semicircular law, thus

\displaystyle a = \int_{-2}^{t(a)} \rho_{sc}(x)\ dx.
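As a numerical sanity check of (1) and (2), the following sketch computes t(a) by bisection on the closed-form cumulative distribution function of the semicircular law, and compares the classical locations \sqrt{n} t(i/n) with the sorted eigenvalues of a sampled GUE matrix. The helper names and the choice of bisection are assumptions made for illustration; any root finder would do.

```python
import numpy as np


def semicircle_cdf(x):
    """CDF of the density (1/(2 pi)) sqrt(4 - x^2) on [-2, 2]."""
    x = np.clip(x, -2.0, 2.0)
    return 0.5 + x * np.sqrt(4.0 - x**2) / (4.0 * np.pi) + np.arcsin(x / 2.0) / np.pi


def semicircle_quantile(a, iters=60):
    """Inverse CDF t(a), computed by vectorised bisection on [-2, 2]."""
    a = np.asarray(a, dtype=float)
    lo = np.full_like(a, -2.0)
    hi = np.full_like(a, 2.0)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        below = semicircle_cdf(mid) < a
        lo = np.where(below, mid, lo)
        hi = np.where(below, hi, mid)
    return 0.5 * (lo + hi)


def classical_locations(n):
    """Predicted positions sqrt(n) * t(i/n) for i = 1, ..., n."""
    i = np.arange(1, n + 1)
    return np.sqrt(n) * semicircle_quantile(i / n)


def gue_eigenvalues(n, rng):
    """Sorted eigenvalues of a GUE-type matrix with unit-variance entries."""
    g = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return np.linalg.eigvalsh((g + g.conj().T) / 2)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 300
    lam = gue_eigenvalues(n, rng)
    pred = classical_locations(n)
    bulk = slice(n // 10, n - n // 10)
    print("max bulk deviation:", np.max(np.abs(lam[bulk] - pred[bulk])))
```

In experiments of this kind the bulk deviations are small compared with the overall scale \sqrt{n} of the spectrum, which is the qualitative content of (2).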

Furthermore, the distribution of the normalised eigenvalue spacing \sqrt{n} \rho_{sc}(t(\frac{i}{n})) (\lambda_{i+1}(M_n) - \lambda_i(M_n)) is known; in the bulk region \varepsilon n \leq i \leq (1-\varepsilon) n for fixed \varepsilon > 0, it converges as n \to \infty to the Gaudin distribution, which can be described explicitly in terms of determinants of the Dyson sine kernel K(x,y) := \frac{\sin \pi(x-y)}{\pi(x-y)}.  Many further local statistics of the eigenvalues of GUE are in fact governed by this sine kernel, a result usually proven using the asymptotics of orthogonal polynomials (and specifically, the Hermite polynomials).  (At the edge of the spectrum, say i = n-O(1), the asymptotic distribution is a bit different, being governed instead by the Tracy-Widom law.)
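The normalisation of the spacings can be checked empirically. The sketch below samples a GUE matrix and rescales the bulk gaps by \sqrt{n} times the semicircular density. As a simplifying assumption it evaluates the density at the observed position \lambda_i/\sqrt{n} rather than at the classical location, which by (2) should make little difference in the bulk. With this rescaling the gaps should have mean close to 1; the sketch does not attempt to compute the Gaudin distribution itself.

```python
import numpy as np


def semicircle_density(x):
    """Density (1/(2 pi)) (4 - x^2)_+^{1/2} of the semicircular law."""
    return np.sqrt(np.clip(4.0 - x**2, 0.0, None)) / (2.0 * np.pi)


def normalised_bulk_gaps(n, rng, eps=0.1):
    """Rescaled gaps sqrt(n) * rho_sc * (lambda_{i+1} - lambda_i) for bulk indices."""
    g = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    lam = np.linalg.eigvalsh((g + g.conj().T) / 2)  # sorted ascending
    lo, hi = int(eps * n), int((1 - eps) * n)
    x = lam[lo:hi] / np.sqrt(n)                 # positions on the [-2, 2] scale
    gaps = lam[lo + 1:hi + 1] - lam[lo:hi]      # consecutive eigenvalue gaps
    return np.sqrt(n) * semicircle_density(x) * gaps


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    gaps = normalised_bulk_gaps(400, rng)
    print("mean normalised gap:", gaps.mean())
```

A histogram of `gaps` over many samples is the natural next step if one wants to compare against the Gaudin distribution numerically.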

It has been widely believed that these GUE facts enjoy a universality property, in the sense that they should also hold for wide classes of other matrix models. In particular, Wigner matrices should enjoy the same bulk distribution (1), the same asymptotic law (2) for individual eigenvalues, and the same sine kernel statistics as GUE. (The statistics for Wigner symmetric matrices are slightly different, and should obey GOE statistics rather than GUE ones.)

There has been a fair body of evidence to support this belief.  The bulk distribution (1) is in fact valid for all Wigner matrices (a result of Pastur, building on the original work of Wigner of course).  The Tracy-Widom statistics on the edge were established for all Wigner Hermitian matrices (assuming that the coefficients had a distribution which was symmetric and decayed exponentially) by Soshnikov (with some further refinements by Soshnikov and Peche).  Soshnikov’s arguments were based on an advanced version of the moment method.

The sine kernel statistics were established by Johansson for Wigner Hermitian matrices which were gaussian divisible, which means that they could be expressed as a non-trivial linear combination of another Wigner Hermitian matrix and an independent GUE.  (Basically, this means that the distribution of the coefficients is a convolution of some other distribution with a gaussian.  There were some additional technical decay conditions in Johansson’s work which were removed in subsequent work of Ben Arous and Peche.)   Johansson’s work was based on an explicit formula for the joint distribution for gaussian divisible matrices that generalises (0) (but is significantly more complicated).

Just last week, Erdos, Ramirez, Schlein, and Yau established sine kernel statistics for Wigner Hermitian matrices with exponential decay and a high degree of smoothness (roughly speaking, they require  control of up to six derivatives of the Radon-Nikodym derivative of the distribution).  Their method is based on an analysis of the dynamics of the eigenvalues under a smooth transition from a general Wigner Hermitian matrix to GUE, essentially a matrix version of the Ornstein-Uhlenbeck process, whose eigenvalue dynamics are governed by Dyson Brownian motion.

In my paper with Van, we establish results similar to those of Erdos et al. under slightly different hypotheses, and by a somewhat different method.  Informally, our main result is as follows:

Theorem 1. (Informal version)  Suppose M_n is a Wigner Hermitian matrix whose coefficients have an exponentially decaying distribution, and whose real and imaginary parts are supported on at least three points (basically, this excludes Bernoulli-type distributions only) and have vanishing third moment (which is for instance the case for symmetric distributions).  Then one has the local statistics (2) (but with an error term of O(n^{-1+\delta}) for any \delta>0 rather than O(\log n/n)) and the sine kernel statistics for individual eigenvalue spacings \sqrt{n} \rho_{sc}(t(\frac{i}{n})) (\lambda_{i+1}(M_n) - \lambda_i(M_n)) (as well as higher order correlations) in the bulk.

If one removes the vanishing third moment hypothesis, one still has the sine kernel statistics provided one averages over all i.

There are analogous results for Wigner real symmetric matrices (see paper for details).  There are also some related results, such as a universal distribution for the least singular value of matrices of the form in Theorem 1, and a crude asymptotic for the determinant (in particular, \log |\det M_n| = (1+o(1)) \log \sqrt{n!} with probability 1-o(1)).
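The determinant asymptotic is easy to probe numerically. Here is a small sketch that samples a GUE matrix (used here only as a convenient Wigner Hermitian example) and reports the ratio of \log |\det M_n| to \log \sqrt{n!}. The function names are hypothetical, and `math.lgamma(n + 1)` is used for \log n! to avoid overflow.

```python
import math

import numpy as np


def sample_gue(n, rng):
    """GUE-type matrix with unit-variance entries."""
    g = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (g + g.conj().T) / 2


def log_det_ratio(n, rng):
    """Ratio log|det M_n| / log sqrt(n!) for one sampled matrix."""
    m = sample_gue(n, rng)
    # slogdet returns (sign, log|det|) and avoids overflow in det itself.
    _, log_abs_det = np.linalg.slogdet(m)
    return float(log_abs_det) / (0.5 * math.lgamma(n + 1))


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    for n in (50, 100, 200):
        print(n, log_det_ratio(n, rng))
```

The ratio should drift towards 1 as n grows, consistent with the (1+o(1)) factor in the asymptotic.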

The arguments are based primarily on the Lindeberg replacement strategy, which Van and I also used to obtain universality for the circular law for iid matrices, and for the least singular value for iid matrices, but also rely on other tools, such as some recent arguments of Erdos, Schlein, and Yau, as well as a very useful concentration inequality of Talagrand which lets us tackle both discrete and continuous matrix ensembles in a unified manner.  (I plan to talk about Talagrand’s inequality in my next blog post.)


Van Vu and I have just uploaded to the arXiv our paper “Random matrices: The distribution of the smallest singular values”, submitted to Geom. Func. Anal.  This paper concerns the least singular value \sigma_n(M_n) of a random n \times n matrix M_n = (a_{ij})_{1 \leq i,j \leq n} with iid entries, which for simplicity we will take to be real (we also have analogues for complex random matrices), with mean zero and variance one.  A typical model to keep in mind here is the Bernoulli model, when each a_{ij} is equal to +1 or -1 with an equal probability of each, while a privileged role will be played by the gaussian model, when each a_{ij} \equiv N(0,1) is given the standard gaussian distribution.

The distribution of the least singular value \sigma_n(M_n), which is of importance in smoothed analysis and also has intrinsic interest within the field of random matrices, has been intensively studied in recent years.  For instance, in the Bernoulli case, there have been several recent papers on the singularity probability {\Bbb P}( \sigma_n(M_n) = 0 ); it is not hard to obtain a lower bound of (\frac{1}{2}+o(1))^n, and this is conjectured to be the correct asymptotic.  The best upper bound so far is by Bourgain, Vu, and Wood, who obtain (\frac{1}{\sqrt{2}}+o(1))^n.

Upper and lower tail bounds have also been obtained, starting with the breakthrough paper of Rudelson (building upon some earlier work on rectangular matrices by Litvak, Pajor, Rudelson, and Tomczak-Jaegermann), with subsequent papers by Van and myself, by Rudelson, and also by Rudelson and Vershynin.  To oversimplify somewhat, the conclusion of this work is that the least singular value \sigma_n(M_n) has size comparable to 1/\sqrt{n} with high probability.  The techniques are based in part on inverse Littlewood-Offord theorems.

However, in the case of the gaussian ensemble, we know more than just the expected size of the least singular value; we know its asymptotic distribution.  Indeed, it was shown by Edelman in this case that one has

{\Bbb P}( \sigma_n(M_n) \leq t/\sqrt{n} ) = 1 - e^{-t^2/2 - t} + o(1) (1)

for any fixed t > 0.  This computation was highly algebraic in nature, relying on special identities that are available only for extremely symmetric random matrix ensembles, such as the gaussian random matrix model; in particular, it is not obvious at all that the Bernoulli ensemble necessarily obeys the same distribution as the gaussian one.  Nevertheless, motivated in part by this computation, Spielman and Teng conjectured that the bound

{\Bbb P}( \sigma_n(M_n) \leq t/\sqrt{n} ) \leq t + c^n

should hold for some c < 1 for, say, the Bernoulli ensemble.    This conjecture was verified up to losses of a multiplicative constant by Rudelson and Vershynin.

The main result of our paper is to show that the distribution of the least singular value is in fact universal, being asymptotically the same for all iid (real) random matrix models with the same mean and variance, and with a sufficiently high number of moment conditions.  In particular, the asymptotic (1) for the gaussian ensemble is also true for the Bernoulli ensemble. Furthermore the error term o(1) can be shown to be of the shape O(n^{-c}) for some c > 0, which in turn confirms the Spielman-Teng conjecture (without a loss of constant) in the polynomial size regime t \geq n^{-c'} for some c' > 0.  We also have some further results for other low singular values (e.g. \sigma_{n-k}(M_n) for fixed k) but they are harder to state, and we will not do so here.
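The universality claim can be illustrated with a quick Monte Carlo experiment: sample many gaussian and Bernoulli matrices, record \sqrt{n} \sigma_n(M_n), and compare the empirical distribution function with the limit 1 - e^{-t^2/2 - t} from (1). This is only a sketch under arbitrary choices of n and sample size (both are assumptions, not values from the paper), and it says nothing about the rate of convergence.

```python
import numpy as np


def least_singular_values(n, trials, rng, entry="gaussian"):
    """Least singular value of `trials` independent n x n iid matrices."""
    out = np.empty(trials)
    for k in range(trials):
        if entry == "gaussian":
            a = rng.standard_normal((n, n))
        elif entry == "bernoulli":
            a = rng.choice([-1.0, 1.0], size=(n, n))
        else:
            raise ValueError("entry must be 'gaussian' or 'bernoulli'")
        # Singular values are returned in decreasing order; take the last one.
        out[k] = np.linalg.svd(a, compute_uv=False)[-1]
    return out


def edelman_cdf(t):
    """Limiting distribution function 1 - exp(-t^2/2 - t) from (1)."""
    t = np.asarray(t, dtype=float)
    return 1.0 - np.exp(-t**2 / 2.0 - t)


if __name__ == "__main__":
    rng = np.random.default_rng(4)
    n, trials = 80, 400
    for entry in ("gaussian", "bernoulli"):
        s = np.sqrt(n) * least_singular_values(n, trials, rng, entry)
        for t in (0.5, 1.0, 2.0):
            print(entry, t, np.mean(s <= t), float(edelman_cdf(t)))
```

For both ensembles the empirical frequencies should track the same limiting curve, which is the phenomenon the main result makes rigorous.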

To our knowledge, this is the first universality result for the “hard edge” of the spectrum (i.e. the least few singular values) for iid square matrix models.  [For rectangular matrices, where the hard edge is bounded away from zero, universality was recently established by Feldheim and Sodin.] The bulk distribution for the singular values of such matrices has been known for some time (it is governed by the famous Marchenko-Pastur law), while the distribution at the “soft edge” of the spectrum (i.e. the largest few singular values) was established to be universal by Soshnikov (here the distribution is governed by the Tracy-Widom law for the top singular value, and by the Airy kernel for the next few singular values).  Both of these results are basically obtained by the moment method (or close relatives of this method, such as the resolvent method).  However, the moment method is not effective for discerning the hard edge of the spectrum, since the singular values here are very small compared with the bulk and so have a negligible influence on the moments.  [In the rectangular case, where the hard edge is bounded away from zero, the moment method becomes available again, though the application of it is quite delicate; see the Feldheim-Sodin paper for details.] Instead, we proceed by observing a certain central limit theorem type behaviour for the geometry of the columns of M_n, which is enough to give us the desired universality; more details on the proof lie below the fold.


Van Vu and I have just uploaded to the arXiv our survey paper “From the Littlewood-Offord problem to the Circular Law: universality of the spectral distribution of random matrices”, submitted to Bull. Amer. Math. Soc.  This survey recaps (avoiding most of the technical details) the recent work of ourselves and others that exploits the inverse theory for the Littlewood-Offord problem (which, roughly speaking, amounts to figuring out what types of random walks exhibit concentration at any given point), and how this leads to bounds on condition numbers, least singular values, and resolvents of random matrices; and how the latter then leads to universality of the empirical spectral distributions (ESDs) of random matrices, and in particular to the circular law for the ESDs for iid random matrices with zero mean and unit variance (see my previous blog post on this topic, or my Lewis lectures).  We conclude by mentioning a few open problems in the subject.

While this subject does unfortunately contain a large amount of technical theory and detail, every so often we find a very elementary observation that simplifies the work required significantly.  One such observation is an identity which we call the negative second moment identity, which I would like to discuss here.    Let A be an n \times n matrix; for simplicity we assume that the entries are real-valued.  Denote the n rows of A by X_1,\ldots,X_n, which we view as vectors in {\Bbb R}^n.  Let \sigma_1(A) \geq \ldots \geq \sigma_n(A) \geq 0 be the singular values of A. In our applications, the vectors X_j are easily described (e.g. they might be randomly distributed on the discrete cube \{-1,1\}^n), but the distribution of the singular values \sigma_j(A) is much more mysterious, and understanding this distribution is a key objective in this entire theory.


Van Vu and I have just uploaded to the arXiv our new paper, “Random matrices: Universality of ESDs and the circular law”, with an appendix by Manjunath Krishnapur (and some numerical data and graphs by Philip Wood).  One of the things we do in this paper (which was our original motivation for this project) is to finally establish the endpoint case of the circular law (in both strong and weak forms) for random iid matrices A_n = (a_{ij})_{1 \leq i,j \leq n}, where the coefficients a_{ij} are iid random variables with mean zero and unit variance.  (The strong circular law says that with probability 1, the empirical spectral distribution (ESD) of the normalised eigenvalues \frac{1}{\sqrt{n}} \lambda_1, \ldots, \frac{1}{\sqrt{n}} \lambda_n of the matrix A_n converges to the uniform distribution on the unit disk as n \to \infty.  The weak circular law asserts the same thing, but with convergence in probability rather than almost sure convergence; this is in complete analogy with the weak and strong law of large numbers, and in fact this law is used in the proof.)  In a previous paper, we had established the same claim but under the additional assumption that the (2+\eta)^{th} moment {\Bbb E} |a_{ij}|^{2+\eta} was finite for some \eta > 0; this builds upon a significant body of earlier work by Mehta, Girko, Bai, Bai-Silverstein, Götze-Tikhomirov, and Pan-Zhou, as discussed in the blog article for the previous paper.

As it turned out, though, in the course of this project we found a more general universality principle (or invariance principle) which implied our results about the circular law, but is perhaps more interesting in its own right.  Observe that the statement of the circular law can be split into two sub-statements:

  1. (Universality for iid ensembles) In the asymptotic limit n \to \infty, the ESD of the random matrix A_n is independent of the choice of distribution of the coefficients, so long as they are normalised in mean and variance.  In particular, the ESD of such a matrix is asymptotically the same as that of a (real or complex) gaussian matrix G_n with the same mean and variance.
  2. (Circular law for gaussian matrices) In the asymptotic limit n \to \infty, the ESD of a gaussian matrix G_n converges to the circular law.
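Both sub-statements can be observed numerically. Since the circular law is the uniform distribution on the unit disk \{ |z| \leq 1 \}, the fraction of normalised eigenvalues with |z| \leq r should be close to r^2 for 0 \leq r \leq 1, whichever entry distribution is used. The sketch below checks this for gaussian and Bernoulli entries; the function names and the matrix size are illustrative assumptions.

```python
import numpy as np


def normalised_eigenvalues(n, rng, entry="bernoulli"):
    """Eigenvalues of A_n / sqrt(n) for an iid matrix with unit-variance entries."""
    if entry == "gaussian":
        a = rng.standard_normal((n, n))
    elif entry == "bernoulli":
        a = rng.choice([-1.0, 1.0], size=(n, n))
    else:
        raise ValueError("entry must be 'gaussian' or 'bernoulli'")
    return np.linalg.eigvals(a) / np.sqrt(n)


def fraction_within(eigs, r):
    """Fraction of eigenvalues in the closed disk of radius r."""
    return float(np.mean(np.abs(eigs) <= r))


if __name__ == "__main__":
    rng = np.random.default_rng(5)
    for entry in ("gaussian", "bernoulli"):
        eigs = normalised_eigenvalues(400, rng, entry)
        for r in (0.25, 0.5, 0.75):
            print(entry, r, fraction_within(eigs, r), r**2)
```

For real matrices a small number of eigenvalues lie exactly on the real axis at finite n, so agreement with r^2 is only approximate at this size.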

The reason we single out the gaussian matrix ensemble G_n is that it has a much richer algebraic structure (for instance, the real (resp. complex) gaussian ensemble is invariant under right and left multiplication by the orthogonal group O(n) (resp. the unitary group U(n))).  Because of this, it is possible to compute the eigenvalue distribution very explicitly by algebraic means (for instance, using the machinery of orthogonal polynomials).  In particular, the circular law for complex gaussian matrices (Statement 2 above) was established all the way back in 1967 by Mehta, using an explicit formula for the distribution of the ESD in this case due to Ginibre.

These highly algebraic techniques completely break down for more general iid ensembles, such as the Bernoulli ensemble of matrices whose entries are +1 or -1 with an equal probability of each.  Nevertheless, it is a remarkable phenomenon – which has been referred to as universality in the literature, for instance in this survey by Deift – that the spectral properties of random matrices for non-algebraic ensembles are in many cases asymptotically indistinguishable in the limit n \to \infty from those of algebraic ensembles with the same mean and variance (i.e. Statement 1 above).  One might view this as a sort of “non-Hermitian, non-commutative” analogue of the universality phenomenon represented by the central limit theorem, in which the limiting distribution of a normalised average

\displaystyle \overline{X}_n := \frac{1}{\sqrt{n}} (X_1 + \ldots + X_n ) (1)

of an iid sequence depends only on the mean and variance of the elements of that sequence (assuming of course that these quantities are finite), and not on the underlying distribution.  (The Hermitian non-commutative analogue of the CLT is known as Wigner’s semicircular law.)
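The scalar universality phenomenon in (1) is simple to simulate. The following sketch forms the normalised average for two different mean zero, unit variance distributions (Bernoulli signs and a uniform distribution on [-\sqrt{3}, \sqrt{3}], both chosen here purely for illustration) and compares the empirical probability of \overline{X}_n \leq 1 with the standard gaussian value.

```python
import math

import numpy as np


def normalised_sums(n, trials, rng, entry):
    """Samples of (X_1 + ... + X_n) / sqrt(n) for iid mean 0, variance 1 entries."""
    if entry == "bernoulli":
        x = rng.choice([-1.0, 1.0], size=(trials, n))
    elif entry == "uniform":
        # Uniform on [-sqrt(3), sqrt(3)] has mean 0 and variance 1.
        x = rng.uniform(-math.sqrt(3.0), math.sqrt(3.0), size=(trials, n))
    else:
        raise ValueError("entry must be 'bernoulli' or 'uniform'")
    return x.sum(axis=1) / math.sqrt(n)


def gaussian_cdf(t):
    """Distribution function of the standard gaussian N(0, 1)."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


if __name__ == "__main__":
    rng = np.random.default_rng(6)
    for entry in ("bernoulli", "uniform"):
        s = normalised_sums(400, 5000, rng, entry)
        print(entry, np.mean(s <= 1.0), gaussian_cdf(1.0))
```

The two empirical values agree with each other and with the gaussian limit up to sampling error and, in the Bernoulli case, a small lattice effect.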

Previous approaches to the circular law did not build upon the gaussian case, but instead proceeded directly, in particular controlling the ESD of a random matrix A_n via estimates on the Stieltjes transform

\displaystyle \frac{1}{n} \log |\det( \frac{1}{\sqrt{n}} A_n - zI )| (2)

of that matrix for complex numbers z.  This method required a combination of delicate analysis (in particular, a bound on the least singular values of \frac{1}{\sqrt{n}} A_n - zI), and algebra (in order to compute and then invert the Stieltjes transform).  [As a general rule, and oversimplifying somewhat, algebra tends to be used to control main terms in a computation, while analysis is used to control error terms.]
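To see why the least singular value enters, note that the quantity (2) equals the average of the logarithms of the singular values of \frac{1}{\sqrt{n}} A_n - zI, so a single very small singular value can drag it far below its typical size. The sketch below computes (2) both ways for a Bernoulli matrix. For comparison it also evaluates (|z|^2-1)/2, which is the logarithmic potential of the uniform measure on the unit disk at a point |z| \leq 1; that formula is a standard potential-theory computation and is not stated in the post, so treat it as an outside assumption.

```python
import math

import numpy as np


def shifted_matrix(a, z):
    """The matrix A_n / sqrt(n) - z I."""
    n = a.shape[0]
    return a / np.sqrt(n) - z * np.eye(n)


def log_determinant(a, z):
    """(1/n) log |det(A_n / sqrt(n) - z I)|, computed stably via slogdet."""
    _, log_abs_det = np.linalg.slogdet(shifted_matrix(a, z))
    return float(log_abs_det) / a.shape[0]


def log_determinant_via_svd(a, z):
    """Same quantity as an average of log singular values, plus the least one."""
    s = np.linalg.svd(shifted_matrix(a, z), compute_uv=False)
    return float(np.log(s).mean()), float(s[-1])


def disk_log_potential(z):
    """Log potential of the uniform measure on the unit disk (outside assumption)."""
    r = abs(z)
    return (r * r - 1.0) / 2.0 if r <= 1.0 else math.log(r)


if __name__ == "__main__":
    rng = np.random.default_rng(7)
    a = rng.choice([-1.0, 1.0], size=(300, 300))
    z = 0.3 + 0.4j
    print(log_determinant(a, z), log_determinant_via_svd(a, z), disk_log_potential(z))
```

The two computations of (2) agree to rounding error, and the returned least singular value shows how much room there is before the logarithm of the smallest term starts to dominate the average.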

What we discovered while working on our paper was that the algebra and analysis could be largely decoupled from each other: that one could establish a universality principle (Statement 1 above) by relying primarily on tools from analysis (most notably the bound on least singular values mentioned earlier, but also Talagrand’s concentration of measure inequality, and a universality principle for the singular value distribution of random matrices due to Dozier and Silverstein), so that the algebraic heavy lifting only needs to be done in the gaussian case (Statement 2 above) where the task is greatly simplified by all the additional algebraic structure available in that setting.   This suggests a possible strategy to proving other conjectures in random matrices (for instance concerning the eigenvalue spacing distribution of random iid matrices), by first establishing universality to swap the general random matrix ensemble with an algebraic ensemble (without fully understanding the limiting behaviour of either), and then using highly algebraic tools to understand the latter ensemble.  (There is now a sophisticated theory in place to deal with the latter task, but the former task – understanding universality – is still only poorly understood in many cases.)


Van Vu and I have just uploaded to the arXiv our paper “Random matrices: A general approach for the least singular value problem”, submitted to Israel J. Math. This paper continues a recent series of papers by ourselves and also by Rudelson and by Rudelson-Vershynin on understanding the least singular value \sigma_n(A) := \inf_{\|v\|=1} \|Av\| of a large random n \times n complex matrix A. There are many random matrix models that one can consider, but here we consider models of the form A = M_n + N_n, where M_n = {\Bbb E}(A) is a deterministic matrix depending on n, and N_n is a random matrix whose entries are iid with some complex distribution x of mean zero and unit variance. (In particular, this model is useful for studying the normalised resolvents (\frac{1}{\sqrt{n}} N_n - zI)^{-1} of random iid matrices N_n, which are of importance in the spectral theory of these matrices; understanding the least singular value of random perturbations of deterministic matrices is also important in numerical analysis, and particularly in smoothed analysis of algorithms such as the simplex method.)

In the model mean zero case M_n = 0, the normalised singular values \frac{1}{\sqrt{n}} \sigma_1(A) \geq \ldots \geq \frac{1}{\sqrt{n}} \sigma_n(A) \geq 0 of A = N_n are known to be asymptotically distributed according to the Marchenko-Pastur distribution \frac{1}{\pi} (4-x^2)_+^{1/2} dx, which in particular implies that most of the singular values are continuously distributed (via a semicircular distribution) in the interval [0, 2\sqrt{n}]. (Assuming only second moment hypotheses on the underlying distribution x, this result is due to Yin; there are many earlier results assuming stronger hypotheses on x.) This strongly suggests, but does not formally prove, that the least singular value \sigma_n(A) should be of size \sim 1/\sqrt{n} on the average. (To get such a sharp bound on the least singular value via the Marchenko-Pastur law would require an incredibly strong bound on the convergence rate to this law, which seems out of reach at present, especially when one does not assume strong moment conditions on x; current results such as those of Götze-Tikhomirov or Chatterjee-Bose give some upper bound on \sigma_n(A) which improves upon the trivial bound of O(n^{1/2}) by a polynomial factor assuming certain moment conditions on x, but as far as I am aware these bounds do not get close to the optimal value of O(n^{-1/2}), except perhaps in the special case when x is Gaussian.) The statement that \sigma_n(A) \sim 1/\sqrt{n} with high probability has been conjectured (in various forms) in a number of places, for instance by von Neumann, by Smale, and by Spielman-Teng.
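The heuristic in this paragraph can be explored with a short experiment in the case M_n = 0. The sketch below compares the empirical distribution of the normalised singular values with the distribution function of the density \frac{1}{\pi} (4-x^2)_+^{1/2} on [0,2] (obtained by integrating that density in closed form), and also reports \sqrt{n} \sigma_n(A). Real Bernoulli entries are used for simplicity even though the paper treats complex distributions; that choice, the matrix size, and the function names are all assumptions of this illustration.

```python
import numpy as np


def quarter_circle_cdf(x):
    """CDF of the density (1/pi) sqrt(4 - x^2) on [0, 2]."""
    x = np.clip(x, 0.0, 2.0)
    return (x * np.sqrt(4.0 - x**2) / 2.0 + 2.0 * np.arcsin(x / 2.0)) / np.pi


def singular_values(n, rng):
    """Singular values (in decreasing order) of an n x n Bernoulli matrix."""
    a = rng.choice([-1.0, 1.0], size=(n, n))
    return np.linalg.svd(a, compute_uv=False)


if __name__ == "__main__":
    rng = np.random.default_rng(8)
    n = 300
    s = singular_values(n, rng)
    normalised = s / np.sqrt(n)
    for x in (0.5, 1.0, 1.5):
        print(x, np.mean(normalised <= x), float(quarter_circle_cdf(x)))
    print("largest / sqrt(n):", normalised[0])
    print("sqrt(n) * least:", np.sqrt(n) * s[-1])
```

The bulk of the spectrum matches the limiting law closely at this size, while \sqrt{n} \sigma_n(A) fluctuates from sample to sample at scale comparable to 1, which is consistent with the conjectured size \sim 1/\sqrt{n} but is of course not a proof of it.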
