You are currently browsing the tag archive for the ‘random matrices’ tag.

Van Vu and I have just uploaded to the arXiv our paper “Random matrices: Localization of the eigenvalues and the necessity of four moments“, submitted to Probability Theory and Related Fields. This paper concerns the distribution of the eigenvalues

$\displaystyle \lambda_1(M_n) \leq \ldots \leq \lambda_n(M_n)$

of a Wigner random matrix ${M_n}$. More specifically, we consider ${n \times n}$ Hermitian random matrices whose entries have mean zero and variance one, with the upper-triangular portion of the matrix independent, with the diagonal elements iid, and the real and imaginary parts of the strictly upper-triangular portion of the matrix iid. For technical reasons we also assume that the distribution of the coefficients decays exponentially or better. Examples of Wigner matrices include the Gaussian Unitary Ensemble (GUE) and random symmetric complex Bernoulli matrices (which equal ${\pm 1}$ on the diagonal, and ${\pm \frac{1}{2} \pm \frac{i}{2}}$ off the diagonal). The Gaussian Orthogonal Ensemble (GOE) is also an example once one makes the minor change of setting the diagonal entries to have variance two instead of one.

The most fundamental theorem about the distribution of these eigenvalues is the Wigner semi-circular law, which asserts that (almost surely) one has

$\displaystyle \frac{1}{n} \sum_{i=1}^n \delta_{\lambda_i(M_n)/\sqrt{n}} \rightarrow \rho_{sc}(x)\ dx$

(in the vague topology) where ${\rho_{sc}(x) := \frac{1}{\pi} (4-x^2)_+^{1/2}}$ is the semicircular distribution. (See these lecture notes on this blog for more discusssion of this law.)

One can phrase this law in a number of equivalent ways. For instance, in the bulk region ${\epsilon n \leq i \leq (1-\epsilon) n}$, one almost surely has

$\displaystyle \lambda_i(M_n) = \gamma_i \sqrt{n} + o(\sqrt{n}) \ \ \ \ \ (1)$

uniformly for in ${i}$, where the classical location ${\gamma_i \in [-2,2]}$ of the (normalised) ${i^{th}}$ eigenvalue ${\frac{1}{\sqrt{n}} \lambda_i}$ is defined by the formula

$\displaystyle \int_{-2}^{\gamma_i} \rho_{sc}(x)\ dx := \frac{i}{n}.$

The bound (1) also holds in the edge case (by using the operator norm bound ${\| M_n\|_{op} = (2+o(1)) \sqrt{n}}$, due to Bai and Yin), but for sake of exposition we shall restriction attention here only to the bulk case.

From (1) we see that the semicircular law controls the eigenvalues at the coarse scale of ${\sqrt{n}}$. There has been a significant amount of work in the literature in obtaining control at finer scales, and in particular at the scale of the average eigenvalue spacing, which is of the order of ${\sqrt{n}/n = n^{-1/2}}$. For instance, we now have a universal limit theorem for the normalised eigenvalue spacing ${\sqrt{n}(\lambda_{i+1}(M_n) - \lambda_i(M_n))}$ in the bulk for all Wigner matrices, a result of Erdos, Ramirez, Schlein, Vu, Yau, and myself. One tool for this is the four moment theorem of Van and myself, which roughly speaking shows that the behaviour of the eigenvalues at the scale ${n^{-1/2}}$ (and even at the slightly finer scale of ${n^{-1/2-c}}$ for some absolute constant ${c>0}$) depends only on the first four moments of the matrix entries. There is also a slight variant, the three moment theorem, which asserts that the behaviour of the eigenvalues at the slightly coarser scale of ${n^{-1/2+\epsilon}}$ depends only on the first three moments of the matrix entries.

It is natural to ask whether these moment conditions are necessary. From the result of Erdos, Ramirez, Schlein, Vu, Yau, and myself, it is known that to control the eigenvalue spacing ${\lambda_{i+1}(M_n) - \lambda_i(M_n)}$ at the critical scale ${n^{-1/2}}$, no knowledge of any moments beyond the second (i.e. beyond the mean and variance) are needed. So it is natural to conjecture that the same is true for the eigenvalues themselves.

The main result of this paper is to show that this is not the case; that at the critical scale ${n^{-1/2}}$, the distribution of eigenvalues ${\lambda_i(M_n)}$ is sensitive to the fourth moment, and so the hypothesis of the four moment theorem cannot be relaxed.

Heuristically, the reason for this is easy to explain. One begins with an inspection of the expected fourth moment

$\displaystyle \sum_{i=1}^n {\bf E}(\lambda_i(M_n)^4) = {\bf E} \hbox{tr} M_n^4.$

A standard moment method computation shows that the right hand side is equal to

$\displaystyle 2n^3 + 2a n^2 + \ldots$

where ${a}$ is the fourth moment of the real part of the off-diagonal coefficients of ${M_n}$. In particular, a change in the fourth moment ${a}$ by ${O(1)}$ leads to a change in the expression ${\sum_{i=1}^n {\bf E}(\lambda_i(M_n)^4)}$ by ${O(n^2)}$. Thus, for a typical ${i}$, one expects ${{\bf E}(\lambda_i(M_n)^4)}$ to shift by ${O(n)}$; since ${\lambda_i(M_n) = O(\sqrt{n})}$ on the average, we thus expect ${\lambda_i(M_n)}$ itself to shift by about ${O(n^{-1/2})}$ by the mean-value theorem.

To make this rigorous, one needs a sufficiently strong concentration of measure result for ${\lambda_i(M_n)}$ that keeps it close to its mean value. There are already a number of such results in the literature. For instance, Guionnet and Zeitouni showed that ${\lambda_i(M_n)}$ was sharply concentrated around an interval of size ${O(n^\epsilon)}$ around ${\sqrt{n} \gamma_i}$ for any ${\epsilon > 0}$ (in the sense that the probability that one was outside this interval was exponentially small). In one of my papers with Van, we showed that ${\lambda_i(M_n)}$ was also weakly concentrated around an interval of size ${O(n^{-1/2+\epsilon})}$ around ${\sqrt{n} \gamma_i}$, in the sense that the probability that one was outside this interval was ${O(n^{-c})}$ for some absolute constant ${c>0}$. Finally, if one made an additional log-Sobolev hypothesis on the entries, it was shown by by Erdos, Yau, and Yin that the average variance of ${\lambda_i(M_n)}$ as ${i}$ varied from ${1}$ to ${n}$ was of the size of ${O(n^{-c})}$ for some absolute ${c>0}$.

As it turns out, the first two concentration results are not sufficient to justify the previous heuristic argument. The Erdos-Yau-Yin argument suffices, but requires a log-Sobolev hypothesis. In our paper, we argue differently, using the three moment theorem (together with the theory of the eigenvalues of GUE, which is extremely well developed) to show that the variance of each individual ${\lambda_i(M_n)}$ is ${O(n^{-c})}$ (without averaging in ${i}$). No log-Sobolev hypothesis is required, but instead we need to assume that the third moment of the coefficients vanishes (because we want to use the three moment theorem to compare the Wigner matrix to GUE, and the coefficients of the latter have a vanishing third moment). From this we are able to make the previous arguments rigorous, and show that the mean ${{\bf E} \lambda_i(M_n)}$ is indeed sensitive to the fourth moment of the entries at the critical scale ${n^{-1/2}}$.

One curious feature of the analysis is how differently the median and the mean of the eigenvalue ${\lambda_i(M_n)}$ react to the available technology. To control the global behaviour of the eigenvalues (after averaging in ${i}$), it is much more convenient to use the mean, and we have very precise control on global averages of these means thanks to the moment method. But to control local behaviour, it is the median which is much better controlled. For instance, we can localise the median of ${\lambda_i(M_n)}$ to an interval of size ${O(n^{-1/2+\epsilon})}$, but can only localise the mean to a much larger interval of size ${O(n^{-c})}$. Ultimately, this is because with our current technology there is a possible exceptional event of probability as large as ${O(n^{-c})}$ for which all eigenvalues could deviate as far as ${O(n^\epsilon)}$ from their expected location, instead of their typical deviation of ${O(n^{-1/2})}$. The reason for this is technical, coming from the fact that the four moment theorem method breaks down when two eigenvalues are very close together (less than ${n^{-c}}$ times the average eigenvalue spacing), and so one has to cut out this event, which occurs with a probability of the shape ${O(n^{-c})}$. It may be possible to improve the four moment theorem proof to be less sensitive to eigenvalue near-collisions, in which case the above bounds are likely to improve.

Starting in the winter quarter (Monday Jan 4, to  be precise), I will be giving a graduate course on random matrices, with lecture notes to be posted on this blog.  The topics I have in mind are somewhat fluid, but my initial plan is to cover a large fraction of the following:

• Central limit theorem, random walks, concentration of measure
• The semicircular and Marcenko-Pastur laws for bulk distribution
• A little bit on the connections with free probability
• The spectral distribution of GUE and gaussian random matrices; theory of determinantal processes
• A little bit on the connections with orthogonal polynomials and Riemann-Hilbert problems
• Singularity probability and the least singular value; connections with the Littlewood-Offord problem
• The circular law
• Universality for eigenvalue spacing; Erdos-Schlein-Yau delocalisation of eigenvectors and applications

If time permits, I may also cover

• The Tracy-Widom law
• Connections with Dyson Brownian motion and the Ornstein-Uhlenbeck process; the Erdos-Schlein-Yau approach to eigenvalue spacing universality
• Conjectural connections with zeroes of the Riemann zeta function

Depending on how the course progresses, I may also continue it into the spring quarter (or else have a spring graduate course on a different topic – one potential topic I have in mind is dynamics on nilmanifolds and applications to combinatorics).

Given a set ${S}$, a (simple) point process is a random subset ${A}$ of ${S}$. (A non-simple point process would allow multiplicity; more formally, ${A}$ is no longer a subset of ${S}$, but is a Radon measure on ${S}$, where we give ${S}$ the structure of a locally compact Polish space, but I do not wish to dwell on these sorts of technical issues here.) Typically, ${A}$ will be finite or countable, even when ${S}$ is uncountable. Basic examples of point processes include

• (Bernoulli point process) ${S}$ is an at most countable set, ${0 \leq p \leq 1}$ is a parameter, and ${A}$ a random set such that the events ${x \in A}$ for each ${x \in S}$ are jointly independent and occur with a probability of ${p}$ each. This process is automatically simple.
• (Discrete Poisson point process) ${S}$ is an at most countable space, ${\lambda}$ is a measure on ${S}$ (i.e. an assignment of a non-negative number ${\lambda(\{x\})}$ to each ${x \in S}$), and ${A}$ is a multiset where the multiplicity of ${x}$ in ${A}$ is a Poisson random variable with intensity ${\lambda(\{x\})}$, and the multiplicities of ${x \in A}$ as ${x}$ varies in ${S}$ are jointly independent. This process is usually not simple.
• (Continuous Poisson point process) ${S}$ is a locally compact Polish space with a Radon measure ${\lambda}$, and for each ${\Omega \subset S}$ of finite measure, the number of points ${|A \cap \Omega|}$ that ${A}$ contains inside ${\Omega}$ is a Poisson random variable with intensity ${\lambda(\Omega)}$. Furthermore, if ${\Omega_1,\ldots,\Omega_n}$ are disjoint sets, then the random variables ${|A \cap \Omega_1|, \ldots, |A \cap \Omega_n|}$ are jointly independent. (The fact that Poisson processes exist at all requires a non-trivial amount of measure theory, and will not be discussed here.) This process is almost surely simple iff all points in ${S}$ have measure zero.
• (Spectral point processes) The spectrum of a random matrix is a point process in ${{\mathbb C}}$ (or in ${{\mathbb R}}$, if the random matrix is Hermitian). If the spectrum is almost surely simple, then the point process is almost surely simple. In a similar spirit, the zeroes of a random polynomial are also a point process.

A remarkable fact is that many natural (simple) point processes are determinantal processes. Very roughly speaking, this means that there exists a positive semi-definite kernel ${K: S \times S \rightarrow {\mathbb R}}$ such that, for any ${x_1,\ldots,x_n \in S}$, the probability that ${x_1,\ldots,x_n}$ all lie in the random set ${A}$ is proportional to the determinant ${\det( (K(x_i,x_j))_{1 \leq i,j \leq n} )}$. Examples of processes known to be determinantal include non-intersecting random walks, spectra of random matrix ensembles such as GUE, and zeroes of polynomials with gaussian coefficients.

I would be interested in finding a good explanation (even at the heuristic level) as to why determinantal processes are so prevalent in practice. I do have a very weak explanation, namely that determinantal processes obey a large number of rather pretty algebraic identities, and so it is plausible that any other process which has a very algebraic structure (in particular, any process involving gaussians, characteristic polynomials, etc.) would be connected in some way with determinantal processes. I’m not particularly satisfied with this explanation, but I thought I would at least describe some of these identities below to support this case. (This is partly for my own benefit, as I am trying to learn about these processes, particularly in connection with the spectral distribution of random matrices.) The material here is partly based on this survey of Hough, Krishnapur, Peres, and Virág.

Van Vu and I have just uploaded to the arXiv our paper “Random matrices: Universality of local eigenvalue statistics up to the edge“, submitted to Comm. Math. Phys..  This is a sequel to our previous paper, in which we studied universality of local eigenvalue statistics (such as normalised eigenvalue spacings $\sqrt{n} ( \lambda_{i+1}(M_n) - \lambda_i(M_n) )$) for random matrices $M_n$ of Wigner type, i.e. Hermitian (or symmetric) random matrices in which the upper-triangular entries are independent with mean zero and variance one (for technical reasons we also have to assume an exponential decay condition on the distribution of the entries).   The results in the previous paper were almost entirely focused on the bulk region, in which the index i of the eigenvalues involved was in the range $\varepsilon n \leq i \leq (1-\varepsilon) n$.  The main purpose of this paper is to extend the main results of the previous paper all the way up to the edge, thus allowing one to control all indices $1 \leq i \leq n$.  As an application, we obtain a variant of Soshnikov’s well-known result that the largest eigenvalue of Wigner matrices is distributed (after suitable normalisation) according to the Tracy-Widom law when the coefficients are symmetric, by assuming instead that the coefficients have vanishing third moment.

As one transitions from the bulk to the edge, the density of the eigenvalues decreases to zero (in accordance to the Wigner semicircular law), and so the average spacing between eigenvalues increases.  (For instance, the spacing between eigenvalues in the bulk is of size $n^{-1/2}$, but at the extreme edge it increases to $n^{-1/6}$.)  On the one hand, the increase in average spacing should make life easier, because one does not have to work at such a fine spatial scale in order to see the eigenvalue distribution.  On the other hand, a certain key technical step in the previous paper (in which we adapted an argument of Erdos, Schlein, and Yau to show that eigenvectors of Wigner matrices were delocalised) seemed to require eigenvalue spacings to be of size $O(n^{-1/2})$, which was the main technical obstacle to extending the preceding results from the bulk to the edge.

The main new observation in the paper is that it was not the eigenvalue spacings $\lambda_{i+1}(M_n) - \lambda_i(M_n)$ which were of importance to eigenvalue delocalisation, but rather the somewhat smaller interlaced eigenvalue spacings $\lambda_{i+1}(M_n) - \lambda_i(M_{n-1})$, where $M_{n-1}$ is a $n-1 \times n-1$ minor of $M_n$.  The Cauchy interlacing law asserts that the latter is smaller than the former.  But the interesting thing is that at the edge (when i is close to n), the interlaced spacings are much smaller than the former, and in particular remain of size about $O(n^{-1/2})$ (up to log factors) even though the non-interlaced spacings increase to be as large as $O(n^{-1/6})$.  This is ultimately due to a sort of “attractive force” on eigenvalues that draws them towards the origin, and counteracts the usual “eigenvalue repulsion effect”, that pushes eigenvalues away from each other.  This induces “bias” for eigenvalues to move in towards the bulk rescues the delocalization result, and the remainder of the arguments in our previous paper then continue with few changes.

Below the fold I wish to give some heuristic justification of the interlacing bias phenomenon, sketch why this is relevant for eigenvector delocalisation, and finally to recall why eigenvalue delocalisation in turn is relevant for universality.

[Update, Aug 16: sign error corrected.]

One further paper in this stream: László Erdős, José Ramírez, Benjamin Schlein, Van Vu, Horng-Tzer Yau, and myself have just uploaded to the arXiv the paper “Bulk universality for Wigner hermitian matrices with subexponential decay“, submitted to Mathematical Research Letters.  (Incidentally, this is my first six-author paper I have been involved in, not counting the polymath projects of course, though I have had a number of five-author papers.)

This short paper (9 pages) combines the machinery from two recent papers on the universality conjecture for the eigenvalue spacings in the bulk for Wigner random matrices (see my earlier blog post for more discussion).  On the one hand, the paper of Erdős-Ramírez-Schlein-Yau established this conjecture under the additional hypothesis that the distribution of the individual entries obeyed some smoothness and exponential decay conditions.  Meanwhile, the paper of Van Vu and myself (which I discussed in my earlier blog post) established the conjecture under a somewhat different set of hypotheses, namely that the distribution of the individual entries obeyed some moment conditions (in particular, the third moment had to vanish), a support condition (the entries had to have real part supported in at least three points), and an exponential decay condition.

After comparing our results, the six of us realised that our methods could in fact be combined rather easily to obtain a stronger result, establishing the universality conjecture assuming only a exponential decay (or more precisely, sub-exponential decay) bound ${\Bbb P}(|x_{\ell k}| > t ) \ll \exp( - t^c )$ on the coefficients; thus all regularity, moment, and support conditions have been eliminated.  (There is one catch, namely that we can no longer control a single spacing $\lambda_{i+1}-\lambda_i$ for a single fixed i, but must now average over all $1 \leq i \leq n$ before recovering the universality.  This is an annoying technical issue but it may be resolvable in the future with further refinements to the method.)

I can describe the main idea behind the unified approach here.  One can arrange the Wigner matrices in a hierarchy, from most structured to least structured:

• The most structured (or special) ensemble is the Gaussian Unitary Ensemble (GUE), in which the coefficients are gaussian. Here, one has very explicit and tractable formulae for the eigenvalue distributions, gap spacing, etc.
• The next most structured ensemble of Wigner matrices are the Gaussian-divisible or Johansson matrices, which are matrices H of the form $H = e^{-t/2} \hat H + (1-e^{-t})^{1/2} V$, where $\hat H$ is another Wigner matrix, V is a GUE matrix independent of $\hat H$, and $0 < t < 1$ is a fixed parameter independent of n.  Here, one still has quite explicit (though not quite as tractable) formulae for the joint eigenvalue distribution and related statistics.  Note that the limiting case t=1 is GUE.
• After this, one has the Ornstein-Uhlenbeck-evolved matrices, which are also of the form $H = e^{-t/2} \hat H + (1-e^{-t})^{1/2} V$, but now $t = n^{-1+\delta}$ decays at a power rate with n, rather than being comparable to 1.  Explicit formulae still exist for these matrices, but extracting universality out of this is hard work (and occupies the bulk of the paper of Erdős-Ramírez-Schlein-Yau).
• Finally, one has arbitrary Wigner matrices, which can be viewed as the t=0 limit of the above Ornstein-Uhlenbeck process.

The arguments in the paper of Erdős-Ramírez-Schlein-Yau can be summarised as follows (I assume subexponential decay throughout this discussion):

1. (Structured case) The universality conjecture is true for Ornstein-Uhlenbeck-evolved matrices with $t = n^{-1+\delta}$ for any $0 < \delta \leq 1$.  (The case $1/4 < \delta \leq 1$ was treated in an earlier paper of Erdős-Ramírez-Schlein-Yau, while the case where t is comparable to 1 was treated by Johansson.)
2. (Matching) Every Wigner matrix with suitable smoothness conditions can be “matched” with an Ornstein-Uhlenbeck-evolved matrix, in the sense that the eigenvalue statistics for the two matrices are asymptotically identical.  (This is relatively easy due to the fact that $\delta$ can be taken arbitrarily close to zero.)
3. Combining 1. and 2. one obtains universality for all Wigner matrices obeying suitable smoothness conditions.

The arguments in the paper of Van and myself can be summarised as follows:

1. (Structured case) The universality conjecture is true for Johansson matrices, by the paper of Johansson.
2. (Matching) Every Wigner matrix with some moment and support conditions can be “matched” with a Johansson matrix, in the sense that the first four moments of the entries agree, and hence (by the Lindeberg strategy in our paper) have asymptotically identical statistics.
3. Combining 1. and 2. one obtains universality for all Wigner matrices obtaining suitable moment and support conditions.

What we realised is by combining the hard part 1. of the paper of Erdős-Ramírez-Schlein-Yau with the hard part 2. of the paper of Van and myself, we can remove all regularity, moment, and support conditions.  Roughly speaking, the unified argument proceeds as follows:

1. (Structured case) By the arguments of Erdős-Ramírez-Schlein-Yau, the universality conjecture is true for Ornstein-Uhlenbeck-evolved matrices with $t = n^{-1+\delta}$ for any $0 < \delta \leq 1$.
2. (Matching) Every Wigner matrix $H$ can be “matched” with an Ornstein-Uhlenbeck-evolved matrix $e^{-t/2} H + (1-e^{-t})^{1/2} V$ for $t= n^{-1+0.01}$ (say), in the sense that the first four moments of the entries almost agree, which is enough (by the arguments of Van and myself) to show that these two matrices have asymptotically identical statistics on the average.
3. Combining 1. and 2. one obtains universality for the averaged statistics for all Wigner matrices.

The averaging should be removable, but this would require better convergence results to the semicircular law than are currently known (except with additional hypotheses, such as vanishing third moment).  The subexponential decay should also be relaxed to a condition of finiteness for some fixed moment ${\Bbb E} |x|^C$, but we did not pursue this direction in order to keep the paper short.

In the theory of discrete random matrices (e.g. matrices whose entries are random signs ${\pm 1}$), one often encounters the problem of understanding the distribution of the random variable ${\hbox{dist}(X,V)}$, where ${X = (x_1,\ldots,x_n) \in \{-1,+1\}^n}$ is an ${n}$-dimensional random sign vector (so ${X}$ is uniformly distributed in the discrete cube ${\{-1,+1\}^n}$), and ${V}$ is some ${d}$-dimensional subspace of ${{\bf R}^n}$ for some ${0 \leq d \leq n}$.

It is not hard to compute the second moment of this random variable. Indeed, if ${P = (p_{ij})_{1 \leq i,j \leq n}}$ denotes the orthogonal projection matrix from ${{\bf R}^n}$ to the orthogonal complement ${V^\perp}$ of ${V}$, then one observes that

$\displaystyle \hbox{dist}(X,V)^2 = X \cdot P X = \sum_{i=1}^n \sum_{j=1}^n x_i x_j p_{ij}$

and so upon taking expectations we see that

$\displaystyle {\Bbb E} \hbox{dist}(X,V)^2 = \sum_{i=1}^n p_{ii} = \hbox{tr} P = n-d \ \ \ \ \ (1)$

since ${P}$ is a rank ${n-d}$ orthogonal projection. So we expect ${\hbox{dist}(X,V)}$ to be about ${\sqrt{n-d}}$ on the average.

In fact, one has sharp concentration around this value, in the sense that ${\hbox{dist}(X,V) = \sqrt{n-d}+O(1)}$ with high probability. More precisely, we have

Proposition 1 (Large deviation inequality) For any ${t>0}$, one has

$\displaystyle {\Bbb P}( |\hbox{dist}(X,V) - \sqrt{n-d}| \geq t ) \leq C \exp(- c t^2 )$

for some absolute constants ${C, c > 0}$.

In fact the constants ${C, c}$ are very civilised; for large ${t}$ one can basically take ${C=4}$ and ${c=1/16}$, for instance. This type of concentration, particularly for subspaces ${V}$ of moderately large codimension ${n-d}$, is fundamental to much of my work on random matrices with Van Vu, starting with our first paper (in which this proposition first appears). (For subspaces of small codimension (such as hyperplanes) one has to use other tools to get good results, such as inverse Littlewood-Offord theory or the Berry-Esséen central limit theorem, but that is another story.)

Proposition 1 is an easy consequence of the second moment computation and Talagrand’s inequality, which among other things provides a sharp concentration result for convex Lipschitz functions on the cube ${\{-1,+1\}^n}$; since ${\hbox{dist}(x,V)}$ is indeed a convex Lipschitz function, this inequality can be applied immediately. The proof of Talagrand’s inequality is short and can be found in several textbooks (e.g. Alon and Spencer), but I thought I would reproduce the argument here (specialised to the convex case), mostly to force myself to learn the proof properly. Note the concentration of ${O(1)}$ obtained by Talagrand’s inequality is much stronger than what one would get from more elementary tools such as Azuma’s inequality or McDiarmid’s inequality, which would only give concentration of about ${O(\sqrt{n})}$ or so (which is in fact trivial, since the cube ${\{-1,+1\}^n}$ has diameter ${2\sqrt{n}}$); the point is that Talagrand’s inequality is very effective at exploiting the convexity of the problem, as well as the Lipschitz nature of the function in all directions, whereas Azuma’s inequality can only easily take advantage of the Lipschitz nature of the function in coordinate directions. On the other hand, Azuma’s inequality works just as well if the ${\ell^2}$ metric is replaced with the larger ${\ell^1}$ metric, and one can conclude that the ${\ell^1}$ distance between ${X}$ and ${V}$ concentrates around its median to a width ${O(\sqrt{n})}$, which is a more non-trivial fact than the ${\ell^2}$ concentration bound given by that inequality. (The computation of the median of the ${\ell^1}$ distance is more complicated than for the ${\ell^2}$ distance, though, and depends on the orientation of ${V}$.)

Remark 1 If one makes the coordinates of ${X}$ iid Gaussian variables ${x_i \equiv N(0,1)}$ rather than random signs, then Proposition 1 is much easier to prove; the probability distribution of a Gaussian vector is rotation-invariant, so one can rotate ${V}$ to be, say, ${{\bf R}^d}$, at which point ${\hbox{dist}(X,V)^2}$ is clearly the sum of ${n-d}$ independent squares of Gaussians (i.e. a chi-square distribution), and the claim follows from direct computation (or one can use the Chernoff inequality). The gaussian counterpart of Talagrand’s inequality is more classical, being essentially due to Lévy, and will also be discussed later in this post.

Van Vu and I have just uploaded to the arXiv our paper “Random matrices: universality of local eigenvalue statistics“, submitted to Acta Math..  This paper concerns the eigenvalues $\lambda_1(M_n) \leq \ldots \leq \lambda_n(M_n)$ of a Wigner matrix $M_n = (\zeta_{ij})_{1 \leq i,j \leq n}$, which we define to be a random Hermitian $n \times n$ matrix whose upper-triangular entries $\zeta_{ij}, 1 \leq i \leq j \leq n$ are independent (and whose strictly upper-triangular entries $\zeta_{ij}, 1 \leq i < j \leq n$ are also identically distributed).  [The lower-triangular entries are of course determined from the upper-triangular ones by the Hermitian property.]  We normalise the matrices so that all the entries have mean zero and variance 1.  Basic examples of Wigner Hermitian matrices include

1. The Gaussian Unitary Ensemble (GUE), in which the upper-triangular entries $\zeta_{ij}, i are complex gaussian, and the diagonal entries $\zeta_{ii}$ are real gaussians;
2. The Gaussian Orthogonal Ensemble (GOE), in which all entries are real gaussian;
3. The Bernoulli Ensemble, in which all entries take values $\pm 1$ (with equal probability of each).

We will make a further distinction into Wigner real symmetric matrices (which are Wigner matrices with real coefficients, such as GOE and the Bernoulli ensemble) and Wigner Hermitian matrices (which are Wigner matrices whose upper-triangular coefficients have real and imaginary parts iid, such as GUE).

The GUE and GOE ensembles have a rich algebraic structure (for instance, the GUE distribution is invariant under conjugation by unitary matrices, while the GOE distribution is similarly invariant under conjugation by orthogonal matrices, hence the terminology), and as a consequence their eigenvalue distribution can be computed explicitly.  For instance, the joint distribution of the eigenvalues $\lambda_1(M_n),\ldots,\lambda_n(M_n)$ for GUE is given by the explicit formula

$\displaystyle C_n \prod_{1 \leq i (0)

for some explicitly computable constant $C_n$ on the orthant $\{ \lambda_1 \leq \ldots \leq \lambda_n\}$ (a result first established by Ginibre).  (A similar formula exists for GOE, but for simplicity we will just discuss GUE here.)  Using this explicit formula one can compute a wide variety of asymptotic eigenvalue statistics.  For instance, the (bulk) empirical spectral distribution (ESD) measure $\frac{1}{n} \sum_{i=1}^n \delta_{\lambda_i(M_n)/\sqrt{n}}$ for GUE (and indeed for all Wigner matrices, see below) is known to converge (in the vague sense) to the Wigner semicircular law

$\displaystyle \frac{1}{2\pi} (4-x^2)_+^{1/2}\ dx =: \rho_{sc}(x)\ dx$ (1)

as $n \to \infty$.  Actually, more precise statements are known for GUE; for instance, for $1 \leq i \leq n$, the $i^{th}$ eigenvalue $\lambda_i(M_n)$ is known to equal

$\displaystyle \lambda_i(M_n) = \sqrt{n} t(\frac{i}{n}) + O( \frac{\log n}{n} )$ (2)

with probability $1-o(1)$, where $t(a) \in [-2,2]$ is the inverse cumulative distribution function of the semicircular law, thus

$\displaystyle a = \int_{-2}^{t(a)} \rho_{sc}(x)\ dx$.

Furthermore, the distribution of the normalised eigenvalue spacing $\sqrt{n} \rho_{sc}(\frac{i}{n}) (\lambda_{i+1}(M_n) - \lambda_i(M_n))$ is known; in the bulk region $\varepsilon n \leq i \leq 1-\varepsilon n$ for fixed $\varepsilon > 0$, it converges as $n \to \infty$ to the Gaudin distribution, which can be described explicitly in terms of determinants of the Dyson sine kernel $K(x,y) := \frac{\sin \pi(x-y)}{\pi(x-y)}$.  Many further local statistics of the eigenvalues of GUE are in fact governed by this sine kernel, a result usually proven using the asymptotics of orthogonal polynomials (and specifically, the Hermite polynomials).  (At the edge of the spectrum, say $i = n-O(1)$, the asymptotic distribution is a bit different, being governed instead by the  Tracy-Widom law.)

It has been widely believed that these GUE facts enjoy a universality property, in the sense that they should also hold for wide classes of other matrix models. In particular, Wigner matrices should enjoy the same bulk distribution (1), the same asymptotic law (2) for individual eigenvalues, and the same sine kernel statistics as GUE. (The statistics for Wigner symmetric matrices are slightly different, and should obey GOE statistics rather than GUE ones.)

There has been a fair body of evidence to support this belief.  The bulk distribution (1) is in fact valid for all Wigner matrices (a result of Pastur, building on the original work of Wigner of course).  The Tracy-Widom statistics on the edge were established for all Wigner Hermitian matrices (assuming that the coefficients had a distribution which was symmetric and decayed exponentially) by Soshnikov (with some further refinements by Soshnikov and Peche).  Soshnikov’s arguments were based on an advanced version of the moment method.

The sine kernel statistics were established by Johansson for Wigner Hermitian matrices which were gaussian divisible, which means that they could be expressed as a non-trivial linear combination of another Wigner Hermitian matrix and an independent GUE.  (Basically, this means that distribution of the coefficients is a convolution of some other distribution with a gaussian.  There were some additional technical decay conditions in Johansson’s work which were removed in subsequent work of Ben Arous and Peche.)   Johansson’s work was based on an explicit formula for the joint distribution for gauss divisible matrices that generalises (0) (but is significantly more complicated).

Just last week, Erdos, Ramirez, Schlein, and Yau established sine kernel statistics for Wigner Hermitian matrices with exponential decay and a high degree of smoothness (roughly speaking, they require  control of up to six derivatives of the Radon-Nikodym derivative of the distribution).  Their method is based on an analysis of the dynamics of the eigenvalues under a smooth transition from a general Wigner Hermitian matrix to GUE, essentially a matrix version of the Ornstein-Uhlenbeck process, whose eigenvalue dynamics are governed by Dyson Brownian motion.

In my paper with Van, we establish similar results to that of Erdos et al. under slightly different hypotheses, and by a somewhat different method.  Informally, our main result is as follows:

Theorem 1. (Informal version)  Suppose $M_n$ is a Wigner Hermitian matrix whose coefficients have an exponentially decaying distribution, and whose real and imaginary parts are supported on at least three points (basically, this excludes Bernoulli-type distributions only) and have vanishing third moment (which is for instance the case for symmetric distributions).  Then one has the local statistics (2) (but with an error term of $O(n^{-1+\delta})$ for any $\delta>0$ rather than $O(\log n/n)$) and the sine kernel statistics for individual eigenvalue spacings $\sqrt{n} \rho_{sc}(\frac{i}{n}) (\lambda_{i+1}(M_n) - \lambda_i(M_n))$ (as well as higher order correlations) in the bulk.

If one removes the vanishing third moment hypothesis, one still has the sine kernel statistics provided one averages over all i.

There are analogous results for Wigner real symmetric matrices (see paper for details).  There are also some related results, such as a universal distribution for the least singular value of matrices of the form in Theorem 1, and a crude asymptotic for the determinant (in particular, $\log |\det M_n| = (1+o(1)) \log \sqrt{n!}$ with probability $1-o(1)$).

The arguments are based primarily on the Lindeberg replacement strategy, which Van and I also used to obtain universality for the circular law for iid matrices, and for the least singular value for iid matrices, but also rely on other tools, such as some recent arguments of Erdos, Schlein, and Yau, as well as a very useful concentration inequality of Talagrand which lets us tackle both discrete and continuous matrix ensembles in a unified manner.  (I plan to talk about Talagrand’s inequality in my next blog post.)

Van Vu and I have just uploaded to the arXiv our paper “Random matrices: The distribution of the smallest singular values“, submitted to Geom. Func. Anal..   This paper concerns the least singular value $\sigma_n(M)$ of a random $n \times n$ matrix $M_n = (a_{ij})_{1 \leq i,j \leq n}$ with iid entries, which for simplicity we will take to be real (we also have analogues for complex random matrices), with mean zero and variance one.  A typical model to keep in mind here is the Bernoulli model, when each $a_{ij}$ is equal to +1 or -1 with an equal probability of each, while a privileged role will be planed by the gaussian model, when each $a_{ij} \equiv N(0,1)$ is given the standard gaussian distribution.

The distribution of the least singular value $\sigma_n(M_n)$, which is of importance in smoothed analysis and also has intrinsic interest within the field of random matrices, has been intensively studied in recent years.  For instance, in the Bernoulli case, there have been several recent papers on the singularity probability ${\Bbb P}( \sigma_n(M_n) = 0 )$; it is not hard to obtain a lower bound of $(\frac{1}{2}+o(1))^n$, and this is conjectured to be the correct asymptotic.  The best upper bound so far is by Bourgain, Vu, and Wood, who obtain $(\frac{1}{\sqrt{2}}+o(1))^n$.

Upper and lower tail bounds have also been obtained, starting with the breakthrough paper of Rudelson (building upon some earlier work on rectangular matrices by Litvak, Pajor, Rudelson, and Tomczak-Jaegermann), with subsequent papers by Van and myself, by Rudelson, and also by Rudelson and Vershynin.  To oversimplify somewhat, the conclusion of this work is that the least singular value $\sigma_n(M_n)$ has size comparable to $1/\sqrt{n}$ with high probability.  The techniques are based in part on inverse Littlewood-Offord theorems.

However, in the case of the gaussian ensemble, we know more than just the expected size of the least singular value; we know its asymptotic distribution.  Indeed, it was shown by Edelman in this case that one has

${\Bbb P}( \sigma_n(M_n) \leq t/\sqrt{n} ) = 1 - e^{-t^2/2 - t} + o(1)$ (1)

for any fixed $t > 0$.  This computation was highly algebraic in nature, relying on special identities that are available only for extremely symmetric random matrix ensembles, such as the gaussian random matrix model; in particular, it is not obvious at all that the Bernoulli ensemble necessarily obeys the same distribution as the gaussian one.  Nevertheless, motivated in part by this computation, Spielman and Teng conjectured that the bound

${\Bbb P}( \sigma_n(M_n) \leq t/\sqrt{n} ) \leq t + c^n$

should hold for some $c < 1$ for, say, the Bernoulli ensemble.    This conjecture was verified up to losses of a multiplicative constant by Rudelson and Vershynin.

The main result of our paper is to show that the distribution of the least singular value is in fact universal, being asymptotically the same for all iid (real) random matrix models with the same mean and variance, and with a sufficiently high number of moment conditions.  In particular, the asymptotic (1) for the gaussian ensemble is also true for the Bernoulli ensemble. Furthermore the error term o(1) can be shown to be of the shape $O(n^{-c})$ for some c > 0, which in turn confirms the Spielman-Teng conjecture (without a loss of constant) in the polynomial size regime $t \geq n^{-c'}$ for some $c' > 0$.  We also have some further results for other low singular values (e.g. $\sigma_{n-k}(M_n)$ for fixed k) but they are harder to state, and we will not do so here.

To our knowledge, this is the first universality result for the “hard edge” of the spectrum (i.e. the least few singular values) for iid square matrix models.  [For rectangular matrices, where the hard edge is bounded away from zero, universality was recently established by Feldheim and Sodin.] The bulk distribution for the singular values of such matrices has been known for some time (it is governed by the famous Marchenko-Pastur law), while the distribution at the “soft edge” of the spectrum (i.e. the largest few singular values) was established to be universal by Soshnikov (here the distribution is governed by the Tracy-Widom law for the top singular value, and by the Airy kernel for the next few singular values).  Both of these results are basically obtained by the moment method (or close relatives of this method, such as the resolvent method).  However, the moment method is not effective for discerning the hard edge of the spectrum, since the singular values here are very small compared with the bulk and so have a negligible influence on the moments.  [In the rectangular case, where the hard edge is bounded away from zero, the moment method becomes available again, though the application of it is quite delicate; see the Feldheim-Sodin paper for details.] Instead, we proceed by observing a certain central limit theorem type behaviour for the geometry of the columns of $M_n$, which is enough to give us the desired universality; more details on the proof lie below the fold.

Van Vu and I have just uploaded to the arXiv our survey paper “From the Littlewood-Offord problem to the Circular Law: universality of the spectral distribution of random matrices“, submitted to Bull. Amer. Math. Soc..  This survey recaps (avoiding most of the technical details) the recent work of ourselves and others that exploits the inverse theory for the Littlewood-Offord problem (which, roughly speaking, amounts to figuring out what types of random walks exhibit concentration at any given point), and how this leads to bounds on condition numbers, least singular values, and resolvents of random matrices; and then how the latter then leads to universality of the empirical spectral distributions (ESDs) of random matrices, and in particular to the circular law for the ESDs for iid random matrices with zero mean and unit variance (see my previous blog post on this topic, or my Lewis lectures).  We conclude by mentioning a few open problems in the subject.

While this subject does unfortunately contain a large amount of technical theory and detail, every so often we find a very elementary observation that simplifies the work required significantly.  One such observation is an identity which we call the negative second moment identity, which I would like to discuss here.    Let A be an $n \times n$ matrix; for simplicity we assume that the entries are real-valued.  Denote the n rows of A by $X_1,\ldots,X_n$, which we view as vectors in ${\Bbb R}^n$.  Let $\sigma_1(A) \geq \ldots \geq \sigma_n(A) \geq 0$ be the singular values of A. In our applications, the vectors $X_j$ are easily described (e.g. they might be randomly distributed on the discrete cube $\{-1,1\}^n$), but the distribution of the singular values $\sigma_j(A)$ is much more mysterious, and understanding this distribution is a key objective in this entire theory.

Van Vu and I have just uploaded to the arXiv our new paper, “Random matrices: Universality of ESDs and the circular law“, with an appendix by Manjunath Krishnapur (and some numerical data and graphs by Philip Wood).  One of the things we do in this paper (which was our original motivation for this project) was to finally establish the endpoint case of the circular law (in both strong and weak forms) for random iid matrices $A_n = (a_{ij})_{1 \leq i,j \leq n}$, where the coefficients $a_{ij}$ are iid random variables with mean zero and unit variance.  (The strong circular law says that with probability 1, the empirical spectral distribution (ESD) of the normalised eigenvalues $\frac{1}{\sqrt{n}} \lambda_1, \ldots, \frac{1}{\sqrt{n}} \lambda_n$ of the matrix $A_n$ converges to the uniform distribution on the unit circle as $n \to \infty$.  The weak circular law asserts the same thing, but with convergence in probability rather than almost sure convergence; this is in complete analogy with the weak and strong law of large numbers, and in fact this law is used in the proof.)  In a previous paper, we had established the same claim but under the additional assumption that the $(2+\eta)^{th}$ moment ${\Bbb E} |a_{ij}|^{2+\eta}$ was finite for some $\eta > 0$; this builds upon a significant body of earlier work by Mehta, Girko, Bai, Bai-Silverstein, Gotze-Tikhomirov, and Pan-Zhou, as discussed in the blog article for the previous paper.

As it turned out, though, in the course of this project we found a more general universality principle (or invariance principle) which implied our results about the circular law, but is perhaps more interesting in its own right.  Observe that the statement of the circular law can be split into two sub-statements:

1. (Universality for iid ensembles) In the asymptotic limit $n \to \infty$, the ESD of the random matrix $A_n$ is independent of the choice of distribution of the coefficients, so long as they are normalised in mean and variance.  In particular, the ESD of such a matrix is asymptotically the same as that of a (real or complex) gaussian matrix $G_n$ with the same mean and variance.
2. (Circular law for gaussian matrices) In the asymptotic limit $n \to \infty$, the ESD of a gaussian matrix $G_n$ converges to the circular law.

The reason we single out the gaussian matrix ensemble $G_n$ is that it has a much richer algebraic structure (for instance, the real (resp. complex) gaussian ensemble is invariant under right and left multiplication by the orthogonal group O(n) (resp. the unitary group U(n))).  Because of this, it is possible to compute the eigenvalue distribution very explicitly by algebraic means (for instance, using the machinery of orthogonal polynomials).  In particular, the circular law for complex gaussian matrices (Statement 2 above) was established all the way back in 1967 by Mehta, using an explicit formula for the distribution of the ESD in this case due to Ginibre.

These highly algebraic techniques completely break down for more general iid ensembles, such as the Bernoulli ensemble of matrices whose entries are +1 or -1 with an equal probability of each.  Nevertheless, it is a remarkable phenomenon – which has been referred to as universality in the literature, for instance in this survey by Deift – that the spectral properties of random matrices for non-algebraic ensembles are in many cases asymptotically indistinguishable in the limit $n \to \infty$ from that of algebraic ensembles with the same mean and variance (i.e. Statement 1 above).  One might view this as a sort of “non-Hermitian, non-commutative” analogue of the universality phenomenon represented by the central limit theorem, in which the limiting distribution of a normalised average

$\displaystyle \overline{X}_n := \frac{1}{\sqrt{n}} (X_1 + \ldots + X_n )$ (1)

of an iid sequence depends only on the mean and variance of the elements of that sequence (assuming of course that these quantities are finite), and not on the underlying distribution.  (The Hermitian non-commutative analogue of the CLT is known as Wigner’s semicircular law.)

Previous approaches to the circular law did not build upon the gaussian case, but instead proceeded directly, in particular controlling the ESD of a random matrix $A_n$ via estimates on the Stieltjes transform

$\displaystyle \frac{1}{n} \log |\det( \frac{1}{\sqrt{n}} A_n - zI )|$ (2)

of that matrix for complex numbers z.  This method required a combination of delicate analysis (in particular, a bound on the least singular values of $\frac{1}{\sqrt{n}} A_n - zI$), and algebra (in order to compute and then invert the Stieltjes transform).  [As a general rule, and oversimplifying somewhat, algebra tends to be used to control main terms in a computation, while analysis is used to control error terms.]

What we discovered while working on our paper was that the algebra and analysis could be largely decoupled from each other: that one could establish a universality principle (Statement 1 above) by relying primarily on tools from analysis (most notably the bound on least singular values mentioned earlier, but also Talagrand’s concentration of measure inequality, and a universality principle for the singular value distribution of random matrices due to Dozier and Silverstein), so that the algebraic heavy lifting only needs to be done in the gaussian case (Statement 2 above) where the task is greatly simplified by all the additional algebraic structure available in that setting.   This suggests a possible strategy to proving other conjectures in random matrices (for instance concerning the eigenvalue spacing distribution of random iid matrices), by first establishing universality to swap the general random matrix ensemble with an algebraic ensemble (without fully understanding the limiting behaviour of either), and then using highly algebraic tools to understand the latter ensemble.  (There is now a sophisticated theory in place to deal with the latter task, but the former task – understanding universality – is still only poorly understood in many cases.)