You are currently browsing the tag archive for the ‘random matrices’ tag.

[These are notes intended mostly for myself, as these topics are useful in random matrix theory, but may be of interest to some readers also. -T.]

One of the most fundamental partial differential equations in mathematics is the heat equation

\displaystyle  \partial_t f = L f \ \ \ \ \ (1)

where {f: [0,+\infty) \times {\bf R}^n \rightarrow {\bf R}} is a scalar function {(t,x) \mapsto f(t,x)} of both time and space, and {L} is the Laplacian {L := \frac{1}{2} \Delta = \sum_{i=1}^n \frac{\partial^2}{\partial x_i^2}}. For the purposes of this post, we will ignore all technical issues of regularity and decay, and always assume that the solutions to equations such as (1) have all the regularity and decay in order to justify all formal operations such as the chain rule, integration by parts, or differentiation under the integral sign. The factor of {\frac{1}{2}} in the definition of the heat propagator {L} is of course an arbitrary normalisation, chosen for some minor technical reasons; one can certainly continue the discussion below with other choices of normalisations if desired.

In probability theory, this equation takes on particular significance when {f} is restricted to be non-negative, and furthermore to be a probability measure at each time, in the sense that

\displaystyle  \int_{{\bf R}^n} f(t,x)\ dx = 1

for all {t}. (Actually, it suffices to verify this constraint at time {t=0}, as the heat equation (1) will then preserve this constraint.) Indeed, in this case, one can interpret {f(t,x)\ dx} as the probability distribution of a Brownian motion

\displaystyle  dx = dB(t) \ \ \ \ \ (2)

where {x = x(t) \in {\bf R}^n} is a stochastic process with initial probability distribution {f(0,x)\ dx}; see for instance this previous blog post for more discussion.

A model example of a solution to the heat equation to keep in mind is that of the fundamental solution

\displaystyle  G(t,x) = \frac{1}{(2\pi t)^{n/2}} e^{-|x|^2/2t} \ \ \ \ \ (3)

defined for any {t>0}, which represents the distribution of Brownian motion of a particle starting at the origin {x=0} at time {t=0}. At time {t}, {G(t,x)} represents an {{\bf R}^n}-valued random variable, each coefficient of which is an independent random variable of mean zero and variance {t}. (As {t \rightarrow 0^+}, {G(t)} converges in the sense of distributions to a Dirac mass at the origin.)

The heat equation can also be viewed as the gradient flow for the Dirichlet form

\displaystyle  D(f,g) := \frac{1}{2} \int_{{\bf R}^n} \nabla f \cdot \nabla g\ dx \ \ \ \ \ (4)

since one has the integration by parts identity

\displaystyle  \int_{{\bf R}^n} Lf(x) g(x)\ dx = \int_{{\bf R}^n} f(x) Lg(x)\ dx = - D(f,g) \ \ \ \ \ (5)

for all smooth, rapidly decreasing {f,g}, which formally implies that {L f} is (half of) the negative gradient of the Dirichlet energy {D(f,f) = \frac{1}{2} \int_{{\bf R}^n} |\nabla f|^2\ dx} with respect to the {L^2({\bf R}^n,dx)} inner product. Among other things, this implies that the Dirichlet energy decreases in time:

\displaystyle  \partial_t D(f,f) = - 2 \int_{{\bf R}^n} |Lf|^2\ dx. \ \ \ \ \ (6)

For instance, for the fundamental solution (3), one can verify for any time {t>0} that

\displaystyle  D(G,G) = \frac{n}{2^{n+2} \pi^{n/2}} \frac{1}{t^{(n+2)/2}} \ \ \ \ \ (7)

(assuming I have not made a mistake in the calculation). In a similar spirit we have

\displaystyle  \partial_t \int_{{\bf R}^n} |f|^2\ dx = - 2 D(f,f). \ \ \ \ \ (8)

Since {D(f,f)} is non-negative, the formula (6) implies that {\int_{{\bf R}^n} |Lf|^2\ dx} is integrable in time, and in particular we see that {Lf} converges to zero as {t \rightarrow \infty}, in some averaged {L^2} sense at least; similarly, (8) suggests that {D(f,f)} also converges to zero. This suggests that {f} converges to a constant function; but as {f} is also supposed to decay to zero at spatial infinity, we thus expect solutions to the heat equation in {{\bf R}^n} to decay to zero in some sense as {t \rightarrow \infty}. However, the decay is only expected to be polynomial in nature rather than exponential; for instance, the solution (3) decays in the {L^\infty} norm like {O(t^{-n/2})}.

Since {L1=0}, we also observe the basic cancellation property

\displaystyle  \int_{{\bf R}^n} Lf(x) \ dx = 0 \ \ \ \ \ (9)

for any function {f}.

There are other quantities relating to {f} that also decrease in time under heat flow, particularly in the important case when {f} is a probability measure. In this case, it is natural to introduce the entropy

\displaystyle  S(f) := \int_{{\bf R}^n} f(x) \log f(x)\ dx.

Thus, for instance, if {f(x)\ dx} is the uniform distribution on some measurable subset {E} of {{\bf R}^n} of finite measure {|E|}, the entropy would be {-\log |E|}. Intuitively, as the entropy decreases, the probability distribution gets wider and flatter. For instance, in the case of the fundamental solution (3), one has {S(G) = -\frac{n}{2} \log( 2 \pi e t )} for any {t>0}, reflecting the fact that {G(t)} is approximately uniformly distributed on a ball of radius {O(\sqrt{t})} (and thus of measure {O(t^{n/2})}).

A short formal computation shows (if one assumes for simplicity that {f} is strictly positive, which is not an unreasonable hypothesis, particularly in view of the strong maximum principle) using (9), (5) that

\displaystyle  \partial_t S(f) = \int_{{\bf R}^n} (Lf) \log f + f \frac{Lf}{f}\ dx

\displaystyle  = \int_{{\bf R}^n} (Lf) \log f\ dx

\displaystyle  = - D( f, \log f )

\displaystyle  = - \frac{1}{2} \int_{{\bf R}^n} \frac{|\nabla f|^2}{f}\ dx

\displaystyle  = - 4D( g, g )

where {g := \sqrt{f}} is the square root of {f}. For instance, if {f} is the fundamental solution (3), one can check that {D(g,g) = \frac{n}{8t}} (note that this is a significantly cleaner formula than (7)!).

In particular, the entropy is decreasing, which corresponds well to one’s intuition that the heat equation (or Brownian motion) should serve to spread out a probability distribution over time.

Actually, one can say more: the rate of decrease {4D(g,g)} of the entropy is itself decreasing, or in other words the entropy is convex. I do not have a satisfactorily intuitive reason for this phenomenon, but it can be proved by straightforward application of basic several variable calculus tools (such as the chain rule, product rule, quotient rule, and integration by parts), and completing the square. Namely, by using the chain rule we have

\displaystyle  L \phi(f) = \phi'(f) Lf + \frac{1}{2} \phi''(f) |\nabla f|^2, \ \ \ \ \ (10)

valid for for any smooth function {\phi: {\bf R} \rightarrow {\bf R}}, we see from (1) that

\displaystyle  2 g \partial_t g = 2 g L g + |\nabla g|^2

and thus (again assuming that {f}, and hence {g}, is strictly positive to avoid technicalities)

\displaystyle  \partial_t g = Lg + \frac{|\nabla g|^2}{2g}.

We thus have

\displaystyle  \partial_t D(g,g) = 2 D(g,Lg) + D(g, \frac{|\nabla g|^2}{g} ).

It is now convenient to compute using the Einstein summation convention to hide the summation over indices {i,j = 1,\ldots,n}. We have

\displaystyle  2 D(g,Lg) = \frac{1}{2} \int_{{\bf R}^n} (\partial_i g) (\partial_i \partial_j \partial_j g)\ dx

and

\displaystyle  D(g, \frac{|\nabla g|^2}{g} ) = \frac{1}{2} \int_{{\bf R}^n} (\partial_i g) \partial_i \frac{\partial_j g \partial_j g}{g}\ dx.

By integration by parts and interchanging partial derivatives, we may write the first integral as

\displaystyle  2 D(g,Lg) = - \frac{1}{2} \int_{{\bf R}^n} (\partial_i \partial_j g) (\partial_i \partial_j g)\ dx,

and from the quotient and product rules, we may write the second integral as

\displaystyle  D(g, \frac{|\nabla g|^2}{g} ) = \int_{{\bf R}^n} \frac{(\partial_i g) (\partial_j g) (\partial_i \partial_j g)}{g} - \frac{(\partial_i g) (\partial_j g) (\partial_i g) (\partial_j g)}{2g^2}\ dx.

Gathering terms, completing the square, and making the summations explicit again, we see that

\displaystyle  \partial_t D(g,g) =- \frac{1}{2} \int_{{\bf R}^n} \frac{\sum_{i=1}^n \sum_{j=1}^n |g \partial_i \partial_j g - (\partial_i g) (\partial_j g)|^2}{g^2}\ dx

and so in particular {D(g,g)} is always decreasing.

The above identity can also be written as

\displaystyle  \partial_t D(g,g) = - \frac{1}{2} \int_{{\bf R}^n} |\nabla^2 \log g|^2 g^2\ dx.

Exercise 1 Give an alternate proof of the above identity by writing {f = e^{2u}}, {g = e^u} and deriving the equation {\partial_t u = Lu + |\nabla u|^2} for {u}.

It was observed in a well known paper of Bakry and Emery that the above monotonicity properties hold for a much larger class of heat flow-type equations, and lead to a number of important relations between energy and entropy, such as the log-Sobolev inequality of Gross and of Federbush, and the hypercontractivity inequality of Nelson; we will discuss one such family of generalisations (or more precisely, variants) below the fold.

Read the rest of this entry »

Van Vu and I have just uploaded to the arXiv our paper “Random matrices: Universality of local spectral statistics of non-Hermitian matrices“. The main result of this paper is a “Four Moment Theorem” that establishes universality for local spectral statistics of non-Hermitian matrices with independent entries, under the additional hypotheses that the entries of the matrix decay exponentially, and match moments with either the real or complex gaussian ensemble to fourth order. This is the non-Hermitian analogue of a long string of recent results establishing universality of local statistics in the Hermitian case (as discussed for instance in this recent survey of Van and myself, and also in several other places).

The complex case is somewhat easier to describe. Given a (non-Hermitian) random matrix ensemble {M_n} of {n \times n} matrices, one can arbitrarily enumerate the (geometric) eigenvalues as {\lambda_1(M_n),\ldots,\lambda_n(M_n) \in {\bf C}}, and one can then define the {k}-point correlation functions {\rho^{(k)}_n: {\bf C}^k \rightarrow {\bf R}^+} to be the symmetric functions such that

\displaystyle  \int_{{\bf C}^k} F(z_1,\ldots,z_k) \rho^{(k)}_n(z_1,\ldots,z_k)\ dz_1 \ldots dz_k

\displaystyle  = {\bf E} \sum_{1 \leq i_1 < \ldots < i_k \leq n} F(\lambda_1(M_n),\ldots,\lambda_k(M_n)).

In the case when {M_n} is drawn from the complex gaussian ensemble, so that all the entries are independent complex gaussians of mean zero and variance one, it is a classical result of Ginibre that the asymptotics of {\rho^{(k)}_n} near some point {z \sqrt{n}} as {n \rightarrow \infty} and {z \in {\bf C}} is fixed are given by the determinantal rule

\displaystyle  \rho^{(k)}_n(z\sqrt{n} + w_1,\ldots,z\sqrt{n}+w_k) \rightarrow \hbox{det}( K(w_i,w_j) )_{1 \leq i,j \leq k} \ \ \ \ \ (1)

for {|z| < 1} and

\displaystyle  \rho^{(k)}_n(z\sqrt{n} + w_1,\ldots,z\sqrt{n}+w_k) \rightarrow 0

for {|z| > 1}, where {K} is the reproducing kernel

\displaystyle  K(z,w) := \frac{1}{\pi} e^{-|z|^2/2 - |w|^2/2 + z \overline{w}}.

(There is also an asymptotic for the boundary case {|z|=1}, but it is more complicated to state.) In particular, we see that {\rho^{(k)}_n(z \sqrt{n}) \rightarrow \frac{1}{\pi} 1_{|z| \leq 1}} for almost every {z}, which is a manifestation of the well-known circular law for these matrices; but the circular law only captures the macroscopic structure of the spectrum, whereas the asymptotic (1) describes the microscopic structure.

Our first main result is that the asymptotic (1) for {|z|<1} also holds (in the sense of vague convergence) when {M_n} is a matrix whose entries are independent with mean zero, variance one, exponentially decaying tails, and which all match moments with the complex gaussian to fourth order. (Actually we prove a stronger result than this which is valid for all bounded {z} and has more uniform bounds, but is a bit more technical to state.) An analogous result is also established for real gaussians (but now one has to separate the correlation function into components depending on how many eigenvalues are real and how many are strictly complex; also, the limiting distribution is more complicated, being described by Pfaffians rather than determinants). Among other things, this allows us to partially extend some known results on complex or real gaussian ensembles to more general ensembles. For instance, there is a central limit theorem of Rider which establishes a central limit theorem for the number of eigenvalues of a complex gaussian matrix in a mesoscopic disk; from our results, we can extend this central limit theorem to matrices that match the complex gaussian ensemble to fourth order, provided that the disk is small enough (for technical reasons, our error bounds are not strong enough to handle large disks). Similarly, extending some results of Edelman-Kostlan-Shub and of Forrester-Nagao, we can show that for a matrix matching the real gaussian ensemble to fourth order, the number of real eigenvalues is {\sqrt{\frac{2n}{\pi}} + O(n^{1/2-c})} with probability {1-O(n^{-c})} for some absolute constant {c>0}.

There are several steps involved in the proof. The first step is to apply the Girko Hermitisation trick to replace the problem of understanding the spectrum of a non-Hermitian matrix, with that of understanding the spectrum of various Hermitian matrices. The two identities that realise this trick are, firstly, Jensen’s formula

\displaystyle  \log |\det(M_n-z_0)| = - \sum_{1 \leq i \leq n: \lambda_i(M_n) \in B(z_0,r)} \log \frac{r}{|\lambda_i(M_n)-z_0|}

\displaystyle + \frac{1}{2\pi} \int_0^{2\pi} \log |\det(M_n-z_0-re^{i\theta})|\ d\theta

that relates the local distribution of eigenvalues to the log-determinants {\log |\det(M_n-z_0)|}, and secondly the elementary identity

\displaystyle  \log |\det(M_n - z)| = \frac{1}{2} \log|\det W_{n,z}| + \frac{1}{2} n \log n

that relates the log-determinants of {M_n-z} to the log-determinants of the Hermitian matrices

\displaystyle  W_{n,z} := \frac{1}{\sqrt{n}} \begin{pmatrix} 0 & M_n -z \\ (M_n-z)^* & 0 \end{pmatrix}.

The main difficulty is then to obtain concentration and universality results for the Hermitian log-determinants {\log|\det W_{n,z}|}. This turns out to be a task that is analogous to the task of obtaining concentration for Wigner matrices (as we did in this recent paper), as well as central limit theorems for log-determinants of Wigner matrices (as we did in this other recent paper). In both of these papers, the main idea was to use the Four Moment Theorem for Wigner matrices (which can now be proven relatively easily by a combination of the local semi-circular law and resolvent swapping methods), combined with (in the latter paper) a central limit theorem for the gaussian unitary ensemble (GUE). This latter task was achieved by using the convenient Trotter normal form to tridiagonalise a GUE matrix, which has the effect of revealing the determinant of that matrix as the solution to a certain linear stochastic difference equation, and one can analyse the distribution of that solution via such tools as the martingale central limit theorem.

The matrices {W_{n,z}} are somewhat more complicated than Wigner matrices (for instance, the semi-circular law must be replaced by a distorted Marchenko-Pastur law), but the same general strategy works to obtain concentration and universality for their log-determinants. The main new difficulty that arises is that the analogue of the Trotter norm for gaussian random matrices is not tridiagonal, but rather Hessenberg (i.e. upper-triangular except for the lower diagonal). This ultimately has the effect of expressing the relevant determinant as the solution to a nonlinear stochastic difference equation, which is a bit trickier to solve for. Fortunately, it turns out that one only needs good lower bounds on the solution, as one can use the second moment method to upper bound the determinant and hence the log-determinant (following a classical computation of Turan). This simplifies the analysis on the equation somewhat.

While this result is the first local universality result in the category of random matrices with independent entries, there are still two limitations to the result which one would like to remove. The first is the moment matching hypotheses on the matrix. Very recently, one of the ingredients of our paper, namely the local circular law, was proved without moment matching hypotheses by Bourgade, Yau, and Yin (provided one stays away from the edge of the spectrum); however, as of this time of writing the other main ingredient – the universality of the log-determinant – still requires moment matching. (The standard tool for obtaining universality without moment matching hypotheses is the heat flow method (and more specifically, the local relaxation flow method), but the analogue of Dyson Brownian motion in the non-Hermitian setting appears to be somewhat intractible, being a coupled flow on both the eigenvalues and eigenvectors rather than just on the eigenvalues alone.)

Van Vu and I have just uploaded to the arXiv our paper Random matrices: Sharp concentration of eigenvalues, submitted to the Electronic Journal of Probability. As with many of our previous papers, this paper is concerned with the distribution of the eigenvalues {\lambda_1(M_n) \leq \ldots \leq \lambda_n(M_n)} of a random Wigner matrix {M_n} (such as a matrix drawn from the Gaussian Unitary Ensemble (GUE) or Gaussian Orthogonal Ensemble (GOE)). To simplify the discussion we shall mostly restrict attention to the bulk of the spectrum, i.e. to eigenvalues {\lambda_i(M_n)} with {\delta n \leq i \leq (1-\delta) i n} for some fixed {\delta>0}, although analogues of most of the results below have also been obtained at the edge of the spectrum.

If we normalise the entries of the matrix {M_n} to have mean zero and variance {1/n}, then in the asymptotic limit {n \rightarrow \infty}, we have the Wigner semicircle law, which asserts that the eigenvalues are asymptotically distributed according to the semicircular distribution {\rho_{sc}(x)\ dx}, where

\displaystyle  \rho_{sc}(x) := \frac{1}{2\pi} (4-x^2)_+^{1/2}.

An essentially equivalent way of saying this is that for large {n}, we expect the {i^{th}} eigenvalue {\lambda_i(M_n)} of {M_n} to stay close to the classical location {\gamma_i \in [-2,2]}, defined by the formula

\displaystyle  \int_{-2}^{\gamma_i} \rho_{sc}(x)\ dx = \frac{i}{n}.

In particular, from the Wigner semicircle law it can be shown that asymptotically almost surely, one has

\displaystyle  \lambda_i(M_n) = \gamma_i + o(1) \ \ \ \ \ (1)

for all {1 \leq i \leq n}.

In the modern study of the spectrum of Wigner matrices (and in particular as a key tool in establishing universality results), it has become of interest to improve the error term in (1) as much as possible. A typical early result in this direction was by Bai, who used the Stieltjes transform method to obtain polynomial convergence rates of the shape {O(n^{-c})} for some absolute constant {c>0}; see also the subsequent papers of Alon-Krivelevich-Vu and of of Meckes, who were able to obtain such convergence rates (with exponentially high probability) by using concentration of measure tools, such as Talagrand’s inequality. On the other hand, in the case of the GUE ensemble it is known (by this paper of Gustavsson) that {\lambda_i(M_n)} has variance comparable to {\frac{\log n}{n^2}} in the bulk, so that the optimal error term in (1) should be about {O(\log^{1/2} n/n)}. (One may think that if one wanted bounds on (1) that were uniform in {i}, one would need to enlarge the error term further, but this does not appear to be the case, due to strong correlations between the {\lambda_i}; note for instance this recent result of Ben Arous and Bourgarde that the largest gap between eigenvalues in the bulk is typically of order {O(\log^{1/2} n/n)}.)

A significant advance in this direction was achieved by Erdos, Schlein, and Yau in a series of papers where they used a combination of Stieltjes transform and concentration of measure methods to obtain local semicircle laws which showed, among other things, that one had asymptotics of the form

\displaystyle  N(I) = (1+o(1)) \int_I \rho_{sc}(x)\ dx

with exponentially high probability for intervals {I} in the bulk that were as short as {n^{-1+\epsilon}} for some {\epsilon>0}, where {N(I)} is the number of eigenvalues. These asymptotics are consistent with a good error term in (1), and are already sufficient for many applications, but do not quite imply a strong concentration result for individual eigenvalues {\lambda_i} (basically because they do not preclude long-range or “secular” shifts in the spectrum that involve large blocks of eigenvalues at mesoscopic scales). Nevertheless, this was rectified in a subsequent paper of Erdos, Yau, and Yin, which roughly speaking obtained a bound of the form

\displaystyle  \lambda_i(M_n) = \gamma_i + O( \frac{\log^{O(\log\log n)} n}{n} )

in the bulk with exponentially high probability, for Wigner matrices obeying some exponential decay conditions on the entries. This was achieved by a rather delicate high moment calculation, in which the contribution of the diagonal entries of the resolvent (whose average forms the Stieltjes transform) was shown to mostly cancel each other out.

As the GUE computations show, this concentration result is sharp up to the quasilogarithmic factor {\log^{O(\log\log n)} n}. The main result of this paper is to improve the concentration result to one more in line with the GUE case, namely

\displaystyle  \lambda_i(M_n) = \gamma_i + O( \frac{\log^{O(1)} n}{n} )

with exponentially high probability (see the paper for a more precise statement of results). The one catch is that an additional hypothesis is required, namely that the entries of the Wigner matrix have vanishing third moment. We also obtain similar results for the edge of the spectrum (but with a different scaling).

Our arguments are rather different from those of Erdos, Yau, and Yin, and thus provide an alternate approach to establishing eigenvalue concentration. The main tool is the Lindeberg exchange strategy, which is also used to prove the Four Moment Theorem (although we do not directly invoke the Four Moment Theorem in our analysis). The main novelty is that this exchange strategy is now used to establish large deviation estimates (i.e. exponentially small tail probabilities) rather than universality of the limiting distribution. Roughly speaking, the basic point is as follows. The Lindeberg exchange strategy seeks to compare a function {F(X_1,\ldots,X_n)} of many independent random variables {X_1,\ldots,X_n} with the same function {F(Y_1,\ldots,Y_n)} of a different set of random variables (which match moments with the original set of variables to some order, such as to second or fourth order) by exchanging the random variables one at a time. Typically, one tries to upper bound expressions such as

\displaystyle  {\bf E} \phi(F(X_1,\ldots,X_n)) - \phi(F(X_1,\ldots,X_{n-1},Y_n))

for various smooth test functions {\phi}, by performing a Taylor expansion in the variable being swapped and taking advantage of the matching moment hypotheses. In previous implementations of this strategy, {\phi} was a bounded test function, which allowed one to get control of the bulk of the distribution of {F(X_1,\ldots,X_n)}, and in particular in controlling probabilities such as

\displaystyle  {\bf P}( a \leq F(X_1,\ldots,X_n) \leq b )

for various thresholds {a} and {b}, but did not give good control on the tail as the error terms tended to be polynomially decaying in {n} rather than exponentially decaying. However, it turns out that one can modify the exchange strategy to deal with moments such as

\displaystyle  {\bf E} (1 + F(X_1,\ldots,X_n)^2)^k

for various moderately large {k} (e.g. of size comparable to {\log n}), obtaining results such as

\displaystyle  {\bf E} (1 + F(Y_1,\ldots,Y_n)^2)^k = (1+o(1)) {\bf E} (1 + F(X_1,\ldots,X_n)^2)^k

after performing all the relevant exchanges. As such, one can then use large deviation estimates on {F(X_1,\ldots,X_n)} to deduce large deviation estimates on {F(Y_1,\ldots,Y_n)}.

In this paper we also take advantage of a simplification, first noted by Erdos, Yau, and Yin, that Four Moment Theorems become somewhat easier to prove if one works with resolvents {(M_n-z)^{-1}} (and the closely related Stieltjes transform {s(z) := \frac{1}{n} \hbox{tr}( (M_n-z)^{-1} )}) rather than with individual eigenvalues, as the Taylor expansion of resolvents are very simple (essentially being a Neumann series). The relationship between the Stieltjes transform and the location of individual eigenvalues can be seen by taking advantage of the identity

\displaystyle  \frac{\pi}{2} - \frac{\pi}{n} N((-\infty,E)) = \int_0^\infty \hbox{Re} s(E + i \eta)\ d\eta

for any energy level {E \in {\bf R}}, which can be verified from elementary calculus. (In practice, we would truncate {\eta} near zero and near infinity to avoid some divergences, but this is a minor technicality.) As such, a concentration result for the Stieltjes transform can be used to establish an analogous concentration result for the eigenvalue counting functions {N((-\infty,E))}, which in turn can be used to deduce concentration results for individual eigenvalues {\lambda_i(M_n)} by some basic combinatorial manipulations.

Van Vu and I have just uploaded to the arXiv our short survey article, “Random matrices: The Four Moment Theorem for Wigner ensembles“, submitted to the MSRI book series, as part of the proceedings on the MSRI semester program on random matrix theory from last year.  This is a highly condensed version (at 17 pages) of a much longer survey (currently at about 48 pages, though not completely finished) that we are currently working on, devoted to the recent advances in understanding the universality phenomenon for spectral statistics of Wigner matrices.  In this abridged version of the survey, we focus on a key tool in the subject, namely the Four Moment Theorem which roughly speaking asserts that the statistics of a Wigner matrix depend only on the first four moments of the entries.  We give a sketch of proof of this theorem, and two sample applications: a central limit theorem for individual eigenvalues of a Wigner matrix (extending a result of Gustavsson in the case of GUE), and the verification of a conjecture of Wigner, Dyson, and Mehta on the universality of the asymptotic k-point correlation functions even for discrete ensembles (provided that we interpret convergence in the vague topology sense).

For reasons of space, this paper is very far from an exhaustive survey even of the narrow topic of universality for Wigner matrices, but should hopefully be an accessible entry point into the subject nevertheless.

Van Vu and I have just uploaded to the arXiv our paper “The Wigner-Dyson-Mehta bulk universality conjecture for Wigner matrices“, submitted to the Proceedings of the National Academy of Sciences. This short note concerns the convergence of the {k}-point correlation functions of Wigner matrices in the bulk to the Dyson {k}-point functions, a statement conjectured by Wigner, Dyson, and Mehta. Thanks to the results of Erdös, Peche, Ramirez, Schlein, Vu, Yau, and myself, this conjecture has now been established for all Wigner matrices (assuming a finite moment condition on the entries), but only if one uses a quite weak notion of convergence, namely averaged vague convergence in which one averages in the energy parameter {u}. The main purpose of this note is to observe that by combining together existing results in the literature, one can improve the convergence to vague convergence (which is the natural notion of convergence in the discrete setting); and furthermore, if one assumes some regularity and decay conditions on the coefficient distribution, one can improve the convergence further to local {L^1} convergence.

More precisely, let {M_n} be an {n \times n} Wigner matrix – a random Hermitian matrix whose off-diagonal elements {\frac{1}{\sqrt{n}} \zeta_{ij}} for {1 \leq i < j \leq n} are iid with mean zero and variance {1/n} (and whose diagonal elements also obey similar hypotheses, which we omit here). For simplicity, we also assume that the real and imaginary parts of {\zeta_{ij}} are also iid (as is the case for instance for the Gaussian Unitary Ensemble (GUE)). The eigenvalues {\lambda_1(M_n) \leq \ldots \leq \lambda_n(M_n)} of such a matrix are known to be asymptotically distributed accordingly to the Wigner semicircular distribution {\rho_{sc}(u)\ du}, where

\displaystyle  \rho_{sc}(u) := \frac{1}{2\pi} (4-u^2)_+^{1/2}.

In particular, this suggests that at any energy level {u} in the bulk {(-2,2)} of the spectrum, the average eigenvalue spacing should be about {\frac{1}{n \rho_{sc}(u)}}. It is then natural to introduce the normalised {k}-point correlation function

\displaystyle  \rho_{n,u}^{(k)}(t_1,\ldots,t_k) := \lim_{\epsilon \rightarrow 0} \frac{1}{\epsilon^k} {\bf P} E_\epsilon

for any distinct reals {t_1,\ldots,t_k} and {k \geq 1}, where {E_\epsilon} is the event that there is an eigenvalue in each of the intervals {[u + \frac{t_i}{n \rho_{sc}(u)}, u + \frac{t_i+\epsilon}{n \rho_{sc}(u)}]} for each {1 \leq i \leq k}. (This definition is valid when the Wigner ensemble is continuous; for discrete ensembles, one can define {\rho_{n,u}^{(k)}} instead in a distributional sense.)

The Wigner-Dyson-Mehta conjecture asserts that {\rho_{n,u}^{(k)}} converges (in various senses) as {n \rightarrow \infty} to the Dyson {k}-point function

\displaystyle  \rho_{Dyson}^{(k)}(t_1,\ldots,t_k) := \hbox{det}( K( t_i,t_j) )_{1 \leq i,j \leq k}

where {K(t,t'):=\frac{\sin \pi(t-t')}{\pi(t-t')}} is the Dyson sine kernel. This conjecture was verified first for the GUE (with a quite strong notion of convergence, namely local uniform convergence) by Dyson, using an explicit formula for {\rho_{n,u}^{(k)}} in the GUE case due to Gaudin and Mehta. Later results of Johansson, Erdos-Ramirez-Schlein-Yau, Erdos-Peche-Ramirez-Schlein-Yau, and Vu and myself, extended these results to increasingly wider ranges of Wigner matrices, but in the context of either weak convergence (which means that

\displaystyle  \int_{{\bf R}^k} \rho_{n,u}^{(k)}(t) F(t)\ dt \rightarrow \int_{{\bf R}^k} \rho_{Dyson}^{(k)}(t) F(t)\ dt \ \ \ \ \ (1)

for any {L^\infty}, compactly supported function {F}), or the slightly weaker notion of vague convergence (which is the same as weak convergence, except that the function {F} is also required to be continuous).

In a joint paper of Erdos, Ramirez, Schlein, Vu, Yau, and myself, we established the Wigner-Dyson-Mehta conjecture for all Wigner matrices (assuming only an exponential decay condition on the entries), but using a quite weak notion of convergence, namely averaged vague convergence, which allows for averaging in the energy parameter. Specifically, we showed that

\displaystyle  \lim_{b \rightarrow 0} \lim_{n \rightarrow \infty} \frac{1}{2b} \int_{u-b}^{u+b} \int_{{\bf R}^k} \rho_{n,u'}^{(k)}(t) F(t)\ dt = \int_{{\bf R}^k} \rho_{Dyson}^{(k)}(t) F(t)\ dt.

Subsequently, Erdos, Schlein, and Yau introduced the powerful local relaxation flow method, which achieved a simpler proof of the same result which also generalised to other ensembles beyond the Wigner case. However, for technical reasons, this method was restricted to establishing averaged vague convergence only.

In the current paper, we show that by combining the argument of Erdos, Ramirez, Schlein, Vu, Yau, and myself with some more recent technical results, namely the relaxation of the exponential decay condition in the four moment theorem to a finite moment condition (established by Vu and myself) and a strong eigenvalue localisation bound of Erdos, Yau, and Yin, one can upgrade the averaged vague convergence to vague convergence, and handle all Wigner matrices that assume a finite moment condition. Vague convergence is the most natural notion of convergence for discrete random matrix ensembles; for such ensembles, the correlation function is a discrete measure, and so one does not expect convergence to a continuous limit in any stronger sense than the vague sense. Also, by carefully inspecting the earlier argument of Erdos, Peche, Ramirez, Schlein, and Yau, we were able to establish convergence in the stronger local {L^1} sense once one assumed some regularity and positivity condition on the underlying coefficient distribution. These are somewhat modest and technical improvements over previous work on the Wigner-Dyson-Mehta conjecture, but they help to clarify and organise the profusion of results in this area, which are now reaching a fairly definitive form.

It may well be possible to go beyond local {L^1} convergence in the case of smooth ensembles, for instance establishing local uniform convergence; this was recently accomplished in the {k=1} case by Maltsev and Schlein. Indeed one may optimistically expect to even have convergence in the local smooth topology, which would basically be the strongest convergence one could hope for.

I’ve just uploaded to the arXiv my paper “Outliers in the spectrum of iid matrices with bounded rank perturbations“, submitted to Probability Theory and Related Fields. This paper is concerned with outliers to the circular law for iid random matrices. Recall that if {X_n} is an {n \times n} matrix whose entries are iid complex random variables with mean zero and variance one, then the {n} complex eigenvalues of the normalised matrix {\frac{1}{\sqrt{n}} X_n} will almost surely be distributed according to the circular law distribution {\frac{1}{\pi} 1_{|z| \leq 1} d^2 z} in the limit {n \rightarrow \infty}. (See these lecture notes for further discussion of this law.)

The circular law is also stable under bounded rank perturbations: if {C_n} is a deterministic rank {O(1)} matrix of polynomial size (i.e. of operator norm {O(n^{O(1)})}), then the circular law also holds for {\frac{1}{\sqrt{n}} X_n + C_n} (this is proven in a paper of myself, Van Vu, and Manjunath Krisnhapur). In particular, the bulk of the eigenvalues (i.e. {(1-o(1)) n} of the {n} eigenvalues) will lie inside the unit disk {\{ z \in {\bf C}: |z| \leq 1 \}}.

However, this leaves open the possibility for one or more outlier eigenvalues that lie significantly outside the unit disk; the arguments in the paper cited above give some upper bound on the number of such eigenvalues (of the form {O(n^{1-c})} for some absolute constant {c>0}) but does not exclude them entirely. And indeed, numerical data shows that such outliers can exist for certain bounded rank perturbations.

In this paper, some results are given as to when outliers exist, and how they are distributed. The easiest case is of course when there is no bounded rank perturbation: {C_n=0}. In that case, an old result of Bai and Yin and of Geman shows that the spectral radius of {\frac{1}{\sqrt{n}} X_n} is almost surely {1+o(1)}, thus all eigenvalues will be contained in a {o(1)} neighbourhood of the unit disk, and so there are no significant outliers. The proof is based on the moment method.

Now we consider a bounded rank perturbation {C_n} which is nonzero, but which has a bounded operator norm: {\|C_n\|_{op} = O(1)}. In this case, it turns out that the matrix {\frac{1}{\sqrt{n}} X_n + C_n} will have outliers if the deterministic component {C_n} has outliers. More specifically (and under the technical hypothesis that the entries of {X_n} have bounded fourth moment), if {\lambda} is an eigenvalue of {C_n} with {|\lambda| > 1}, then (for {n} large enough), {\frac{1}{\sqrt{n}} X_n + C_n} will almost surely have an eigenvalue at {\lambda+o(1)}, and furthermore these will be the only outlier eigenvalues of {\frac{1}{\sqrt{n}} X_n + C_n}.

Thus, for instance, adding a bounded nilpotent low rank matrix to {\frac{1}{\sqrt{n}} X_n} will not create any outliers, because the nilpotent matrix only has eigenvalues at zero. On the other hand, adding a bounded Hermitian low rank matrix will create outliers as soon as this matrix has an operator norm greater than {1}.

When I first thought about this problem (which was communicated to me by Larry Abbott), I believed that it was quite difficult, because I knew that the eigenvalues of non-Hermitian matrices were quite unstable with respect to general perturbations (as discussed in this previous blog post), and that there were no interlacing inequalities in this case to control bounded rank perturbations (as discussed in this post). However, as it turns out I had arrived at the wrong conclusion, especially in the exterior of the unit disk in which the resolvent is actually well controlled and so there is no pseudospectrum present to cause instability. This was pointed out to me by Alice Guionnet at an AIM workshop last week, after I had posed the above question during an open problems session. Furthermore, at the same workshop, Percy Deift emphasised the point that the basic determinantal identity

\displaystyle  \det(1 + AB) = \det(1 + BA) \ \ \ \ \ (1)

for {n \times k} matrices {A} and {k \times n} matrices {B} was a particularly useful identity in random matrix theory, as it converted problems about large ({n \times n}) matrices into problems about small ({k \times k}) matrices, which was particularly convenient in the regime when {n \rightarrow \infty} and {k} was fixed. (Percy was speaking in the context of invariant ensembles, but the point is in fact more general than this.)

From this, it turned out to be a relatively simple manner to transform what appeared to be an intractable {n \times n} matrix problem into quite a well-behaved {k \times k} matrix problem for bounded {k}. Specifically, suppose that {C_n} had rank {k}, so that one can factor {C_n = A_n B_n} for some (deterministic) {n \times k} matrix {A_n} and {k \times n} matrix {B_n}. To find an eigenvalue {z} of {\frac{1}{\sqrt{n}} X_n + C_n}, one has to solve the characteristic polynomial equation

\displaystyle  \det( \frac{1}{\sqrt{n}} X_n + A_n B_n - z ) = 0.

This is an {n \times n} determinantal equation, which looks difficult to control analytically. But we can manipulate it using (1). If we make the assumption that {z} is outside the spectrum of {\frac{1}{\sqrt{n}} X_n} (which we can do as long as {z} is well away from the unit disk, as the unperturbed matrix {\frac{1}{\sqrt{n}} X_n} has no outliers), we can divide by {\frac{1}{\sqrt{n}} X_n - z} to arrive at

\displaystyle  \det( 1 + (\frac{1}{\sqrt{n}} X_n-z)^{-1} A_n B_n ) = 0.

Now we apply the crucial identity (1) to rearrange this as

\displaystyle  \det( 1 + B_n (\frac{1}{\sqrt{n}} X_n-z)^{-1} A_n ) = 0.

The crucial point is that this is now an equation involving only a {k \times k} determinant, rather than an {n \times n} one, and is thus much easier to solve. The situation is particularly simple for rank one perturbations

\displaystyle  \frac{1}{\sqrt{n}} X_n + u_n v_n^*

in which case the eigenvalue equation is now just a scalar equation

\displaystyle  1 + \langle (\frac{1}{\sqrt{n}} X_n-z)^{-1} u_n, v_n \rangle = 0

that involves what is basically a single coefficient of the resolvent {(\frac{1}{\sqrt{n}} X_n-z)^{-1}}. (It is also an instructive exercise to derive this eigenvalue equation directly, rather than through (1).) There is by now a very well-developed theory for how to control such coefficients (particularly for {z} in the exterior of the unit disk, in which case such basic tools as Neumann series work just fine); in particular, one has precise enough control on these coefficients to obtain the result on outliers mentioned above.

The same method can handle some other bounded rank perturbations. One basic example comes from looking at iid matrices with a non-zero mean {\mu} and variance {1}; this can be modeled by {\frac{1}{\sqrt{n}} X_n + \mu \sqrt{n} \phi_n \phi_n^*} where {\phi_n} is the unit vector {\phi_n := \frac{1}{\sqrt{n}} (1,\ldots,1)^*}. Here, the bounded rank perturbation {\mu \sqrt{n} \phi_n \phi_n^*} has a large operator norm (equal to {|\mu| \sqrt{n}}), so the previous result does not directly apply. Nevertheless, the self-adjoint nature of the perturbation has a stabilising effect, and I was able to show that there is still only one outlier, and that it is at the expected location of {\mu \sqrt{n}+o(1)}.

If one moves away from the case of self-adjoint perturbations, though, the situation changes. Let us now consider a matrix of the form {\frac{1}{\sqrt{n}} X_n + \mu \sqrt{n} \phi_n \psi_n^*}, where {\psi_n} is a randomised version of {\phi_n}, e.g. {\psi_n := \frac{1}{\sqrt{n}} (\pm 1, \ldots, \pm 1)^*}, where the {\pm 1} are iid Bernoulli signs; such models were proposed recently by Rajan and Abbott as a model for neural networks in which some nodes are excitatory (and give columns with positive mean) and some are inhibitory (leading to columns with negative mean). Despite the superficial similarity with the previous example, the outlier behaviour is now quite different. Instead of having one extremely large outlier (of size {\sim\sqrt{n}}) at an essentially deterministic location, we now have a number of eigenvalues of size {O(1)}, scattered according to a random process. Indeed, (in the case when the entries of {X_n} were real and bounded) I was able to show that the outlier point process converged (in the sense of converging {k}-point correlation functions) to the zeroes of a random Laurent series

\displaystyle  g(z) = 1 - \mu \sum_{j=0}^\infty \frac{g_j}{z^{j+1}}

where {g_0,g_1,g_2,\ldots \equiv N(0,1)} are iid real Gaussians. This is basically because the coefficients of the resolvent {(\frac{1}{\sqrt{n}} X_n - zI)^{-1}} have a Neumann series whose coefficients enjoy a central limit theorem.

On the other hand, as already observed numerically (and rigorously, in the gaussian case) by Rajan and Abbott, if one projects such matrices to have row sum zero, then the outliers all disappear. This can be explained by another appeal to (1); this projection amounts to right-multiplying {\frac{1}{\sqrt{n}} X_n + \mu \sqrt{n} \phi_n \psi_n^*} by the projection matrix {P} to the zero-sum vectors. But by (1), the non-zero eigenvalues of the resulting matrix {(\frac{1}{\sqrt{n}} X_n + \mu \sqrt{n} \phi_n \psi_n^*)P} are the same as those for {P (\frac{1}{\sqrt{n}} X_n + \mu \sqrt{n} \phi_n \psi_n^*)}. Since {P} annihilates {\phi_n}, we thus see that in this case the bounded rank perturbation plays no role, and the question reduces to obtaining a circular law with no outliers for {P \frac{1}{\sqrt{n}} X_n}. As it turns out, this can be done by invoking the machinery of Van Vu and myself that we used to prove the circular law for various random matrix models.

This week I am at the American Institute of Mathematics, as an organiser on a workshop on the universality phenomenon in random matrices. There have been a number of interesting discussions so far in this workshop. Percy Deift, in a lecture on universality for invariant ensembles, gave some applications of what he only half-jokingly termed “the most important identity in mathematics”, namely the formula

\displaystyle  \hbox{det}( 1 + AB ) = \hbox{det}(1 + BA)

whenever {A, B} are {n \times k} and {k \times n} matrices respectively (or more generally, {A} and {B} could be linear operators with sufficiently good spectral properties that make both sides equal). Note that the left-hand side is an {n \times n} determinant, while the right-hand side is a {k \times k} determinant; this formula is particularly useful when computing determinants of large matrices (or of operators), as one can often use it to transform such determinants into much smaller determinants. In particular, the asymptotic behaviour of {n \times n} determinants as {n \rightarrow \infty} can be converted via this formula to determinants of a fixed size (independent of {n}), which is often a more favourable situation to analyse. Unsurprisingly, this trick is particularly useful for understanding the asymptotic behaviour of determinantal processes.

There are many ways to prove the identity. One is to observe first that when {A, B} are invertible square matrices of the same size, that {1+BA} and {1+AB} are conjugate to each other and thus clearly have the same determinant; a density argument then removes the invertibility hypothesis, and a padding-by-zeroes argument then extends the square case to the rectangular case. Another is to proceed via the spectral theorem, noting that {AB} and {BA} have the same non-zero eigenvalues.

By rescaling, one obtains the variant identity

\displaystyle  \hbox{det}( z + AB ) = z^{n-k} \hbox{det}(z + BA)

which essentially relates the characteristic polynomial of {AB} with that of {BA}. When {n=k}, a comparison of coefficients this already gives important basic identities such as {\hbox{tr}(AB) = \hbox{tr}(BA)} and {\hbox{det}(AB) = \hbox{det}(BA)}; when {n} is not equal to {k}, an inspection of the {z^{n-k}} coefficient similarly gives the Cauchy-Binet formula (which, incidentally, is also useful when performing computations on determinantal processes).

Thanks to this formula (and with a crucial insight of Alice Guionnet), I was able to solve a problem (on outliers for the circular law) that I had in the back of my mind for a few months, and initially posed to me by Larry Abbott; I hope to talk more about this in a future post.

Today, though, I wish to talk about another piece of mathematics that emerged from an afternoon of free-form discussion that we managed to schedule within the AIM workshop. Specifically, we hammered out a heuristic model of the mesoscopic structure of the eigenvalues {\lambda_1 \leq \ldots \leq \lambda_n} of the {n \times n} Gaussian Unitary Ensemble (GUE), where {n} is a large integer. As is well known, the probability density of these eigenvalues is given by the Ginebre distribution

\displaystyle  \frac{1}{Z_n} e^{-H(\lambda)}\ d\lambda

where {d\lambda = d\lambda_1 \ldots d\lambda_n} is Lebesgue measure on the Weyl chamber {\{ (\lambda_1,\ldots,\lambda_n) \in {\bf R}^n: \lambda_1 \leq \ldots \leq \lambda_n \}}, {Z_n} is a constant, and the Hamiltonian {H} is given by the formula

\displaystyle  H(\lambda_1,\ldots,\lambda_n) := \sum_{j=1}^n \frac{\lambda_j^2}{2} - 2 \sum_{1 \leq i < j \leq n} \log |\lambda_i-\lambda_j|.

At the macroscopic scale of {\sqrt{n}}, the eigenvalues {\lambda_j} are distributed according to the Wigner semicircle law

\displaystyle  \rho_{sc}(x) := \frac{1}{2\pi} (4-x^2)_+^{1/2}.

Indeed, if one defines the classical location {\gamma_i^{cl}} of the {i^{th}} eigenvalue to be the unique solution in {[-2\sqrt{n}, 2\sqrt{n}]} to the equation

\displaystyle  \int_{-2\sqrt{n}}^{\gamma_i^{cl}/\sqrt{n}} \rho_{sc}(x)\ dx = \frac{i}{n}

then it is known that the random variable {\lambda_i} is quite close to {\gamma_i^{cl}}. Indeed, a result of Gustavsson shows that, in the bulk region when {\epsilon n < i < (1-\epsilon) n} for some fixed {\epsilon > 0}, {\lambda_i} is distributed asymptotically as a gaussian random variable with mean {\gamma_i^{cl}} and variance {\sqrt{\frac{\log n}{\pi}} \times \frac{1}{\sqrt{n} \rho_{sc}(\gamma_i^{cl})}}. Note that from the semicircular law, the factor {\frac{1}{\sqrt{n} \rho_{sc}(\gamma_i^{cl})}} is the mean eigenvalue spacing.

At the other extreme, at the microscopic scale of the mean eigenvalue spacing (which is comparable to {1/\sqrt{n}} in the bulk, but can be as large as {n^{-1/6}} at the edge), the eigenvalues are asymptotically distributed with respect to a special determinantal point process, namely the Dyson sine process in the bulk (and the Airy process on the edge), as discussed in this previous post.

Here, I wish to discuss the mesoscopic structure of the eigenvalues, in which one involves scales that are intermediate between the microscopic scale {1/\sqrt{n}} and the macroscopic scale {\sqrt{n}}, for instance in correlating the eigenvalues {\lambda_i} and {\lambda_j} in the regime {|i-j| \sim n^\theta} for some {0 < \theta < 1}. Here, there is a surprising phenomenon; there is quite a long-range correlation between such eigenvalues. The result of Gustavsson shows that both {\lambda_i} and {\lambda_j} behave asymptotically like gaussian random variables, but a further result from the same paper shows that the correlation between these two random variables is asymptotic to {1-\theta} (in the bulk, at least); thus, for instance, adjacent eigenvalues {\lambda_{i+1}} and {\lambda_i} are almost perfectly correlated (which makes sense, as their spacing is much less than either of their standard deviations), but that even very distant eigenvalues, such as {\lambda_{n/4}} and {\lambda_{3n/4}}, have a correlation comparable to {1/\log n}. One way to get a sense of this is to look at the trace

\displaystyle  \lambda_1 + \ldots + \lambda_n.

This is also the sum of the diagonal entries of a GUE matrix, and is thus normally distributed with a variance of {n}. In contrast, each of the {\lambda_i} (in the bulk, at least) has a variance comparable to {\log n/n}. In order for these two facts to be consistent, the average correlation between pairs of eigenvalues then has to be of the order of {1/\log n}.

Below the fold, I give a heuristic way to see this correlation, based on Taylor expansion of the convex Hamiltonian {H(\lambda)} around the minimum {\gamma}, which gives a conceptual probabilistic model for the mesoscopic structure of the GUE eigenvalues. While this heuristic is in no way rigorous, it does seem to explain many of the features currently known or conjectured about GUE, and looks likely to extend also to other models.

Read the rest of this entry »

Let {n} be a large integer, and let {M_n} be the Gaussian Unitary Ensemble (GUE), i.e. the random Hermitian matrix with probability distribution

\displaystyle  C_n e^{-\hbox{tr}(M_n^2)/2} dM_n

where {dM_n} is a Haar measure on Hermitian matrices and {C_n} is the normalisation constant required to make the distribution of unit mass. The eigenvalues {\lambda_1 < \ldots < \lambda_n} of this matrix are then a coupled family of {n} real random variables. For any {1 \leq k \leq n}, we can define the {k}-point correlation function {\rho_k( x_1,\ldots,x_k )} to be the unique symmetric measure on {{\bf R}^k} such that

\displaystyle  \int_{{\bf R}^k} F(x_1,\ldots,x_k) \rho_k(x_1,\ldots,x_k) = {\bf E} \sum_{1 \leq i_1 < \ldots < i_k \leq n} F( \lambda_{i_1}, \ldots, \lambda_{i_k} ).

A standard computation (given for instance in these lecture notes of mine) gives the Ginebre formula

\displaystyle  \rho_n(x_1,\ldots,x_n) = C'_n (\prod_{1 \leq i < j \leq n} |x_i-x_j|^2) e^{-\sum_{j=1}^n |x_j|^2/2}.

for the {n}-point correlation function, where {C'_n} is another normalisation constant. Using Vandermonde determinants, one can rewrite this expression in determinantal form as

\displaystyle  \rho_n(x_1,\ldots,x_n) = C''_n \det(K_n(x_i,x_j))_{1 \leq i, j \leq n}

where the kernel {K_n} is given by

\displaystyle  K_n(x,y) := \sum_{k=0}^{n-1} \phi_k(x) \phi_k(y)

where {\phi_k(x) := P_k(x) e^{-x^2/4}} and {P_0, P_1, \ldots} are the ({L^2}-normalised) Hermite polynomials (thus the {\phi_k} are an orthonormal family, with each {P_k} being a polynomial of degree {k}). Integrating out one or more of the variables, one is led to the Gaudin-Mehta formula

\displaystyle  \rho_k(x_1,\ldots,x_k) = \det(K_n(x_i,x_j))_{1 \leq i, j \leq k}. \ \ \ \ \ (1)

(In particular, the normalisation constant {C''_n} in the previous formula turns out to simply be equal to {1}.) Again, see these lecture notes for details.

The functions {\phi_k(x)} can be viewed as an orthonormal basis of eigenfunctions for the harmonic oscillator operator

\displaystyle  L \phi := (-\frac{d^2}{dx^2} + \frac{x^2}{4})\phi;

indeed it is a classical fact that

\displaystyle  L \phi_k = (k + \frac{1}{2}) \phi_k.

As such, the kernel {K_n} can be viewed as the integral kernel of the spectral projection operator {1_{(-\infty,n+\frac{1}{2}]}(L)}.

From (1) we see that the fine-scale structure of the eigenvalues of GUE are controlled by the asymptotics of {K_n} as {n \rightarrow \infty}. The two main asymptotics of interest are given by the following lemmas:

Lemma 1 (Asymptotics of {K_n} in the bulk) Let {x_0 \in (-2,2)}, and let {\rho_{sc}(x_0) := \frac{1}{2\pi} (4-x_0^2)^{1/2}_+} be the semicircular law density at {x_0}. Then, we have

\displaystyle  K_n( x_0 \sqrt{n} + \frac{y}{\sqrt{n} \rho_{sc}(x_0)}, x_0 \sqrt{n} + \frac{z}{\sqrt{n} \rho_{sc}(x_0)} )

\displaystyle  \rightarrow \frac{\sin(\pi(y-z))}{\pi(y-z)} \ \ \ \ \ (2)

as {n \rightarrow \infty} for any fixed {y,z \in {\bf R}} (removing the singularity at {y=z} in the usual manner).

Lemma 2 (Asymptotics of {K_n} at the edge) We have

\displaystyle  K_n( 2\sqrt{n} + \frac{y}{n^{1/6}}, 2\sqrt{n} + \frac{z}{n^{1/6}} )

\displaystyle  \rightarrow \frac{Ai(y) Ai'(z) - Ai'(y) Ai(z)}{y-z} \ \ \ \ \ (3)

as {n \rightarrow \infty} for any fixed {y,z \in {\bf R}}, where {Ai} is the Airy function

\displaystyle  Ai(x) := \frac{1}{\pi} \int_0^\infty \cos( \frac{t^3}{3} + tx )\ dt

and again removing the singularity at {y=z} in the usual manner.

The proof of these asymptotics usually proceeds via computing the asymptotics of Hermite polynomials, together with the Christoffel-Darboux formula; this is for instance the approach taken in the previous notes. However, there is a slightly different approach that is closer in spirit to the methods of semi-classical analysis, which was briefly mentioned in the previous notes but not elaborated upon. For sake of completeness, I am recording some notes on this approach here, although to focus on the main ideas I will not be completely rigorous in the derivation (ignoring issues such as convegence of integrals or of operators, or (removable) singularities in kernels caused by zeroes in the denominator).

Read the rest of this entry »

The month of April has been designated as Mathematics Awareness Month by the major American mathematics organisations (the AMS, ASA, MAA, and SIAM).  I was approached to write a popular mathematics article for April 2011 (the theme for that month is “Mathematics and Complexity”).  While I have written a fair number of expository articles (including several on this blog) aimed at a mathematical audience, I actually have not had much experience writing articles at the popular mathematics level, and so I found this task to be remarkably difficult.  At this level of exposition, one not only needs to explain the facts, but also to tell a story; I have experience in the former but not in the latter.

I decided to write on the topic of universality – the phenomenon that the macroscopic behaviour of a dynamical system can be largely independent of the precise microscopic structure.   Below the fold is a first draft of the article; I would definitely welcome feedback and corrections.  It does not yet have any pictures, but I plan to rectify that in the final draft.  It also does not have a title, but this will be easy to address later.   But perhaps the biggest thing lacking right now is a narrative “hook”; I don’t yet have any good ideas as to how to make the story of universality compelling to a lay audience.  Any suggestions in this regard would be particularly appreciated.

I have not yet decided where I would try to publish this article; in fact, I might just publish it here on this blog (and eventually, in one of the blog book compilations).

Read the rest of this entry »

Van Vu and I have just uploaded to the arXiv our paper “Random matrices: Localization of the eigenvalues and the necessity of four moments“, submitted to Probability Theory and Related Fields. This paper concerns the distribution of the eigenvalues

\displaystyle  \lambda_1(M_n) \leq \ldots \leq \lambda_n(M_n)

of a Wigner random matrix {M_n}. More specifically, we consider {n \times n} Hermitian random matrices whose entries have mean zero and variance one, with the upper-triangular portion of the matrix independent, with the diagonal elements iid, and the real and imaginary parts of the strictly upper-triangular portion of the matrix iid. For technical reasons we also assume that the distribution of the coefficients decays exponentially or better. Examples of Wigner matrices include the Gaussian Unitary Ensemble (GUE) and random symmetric complex Bernoulli matrices (which equal {\pm 1} on the diagonal, and {\pm \frac{1}{2} \pm \frac{i}{2}} off the diagonal). The Gaussian Orthogonal Ensemble (GOE) is also an example once one makes the minor change of setting the diagonal entries to have variance two instead of one.

The most fundamental theorem about the distribution of these eigenvalues is the Wigner semi-circular law, which asserts that (almost surely) one has

\displaystyle  \frac{1}{n} \sum_{i=1}^n \delta_{\lambda_i(M_n)/\sqrt{n}} \rightarrow \rho_{sc}(x)\ dx

(in the vague topology) where {\rho_{sc}(x) := \frac{1}{\pi} (4-x^2)_+^{1/2}} is the semicircular distribution. (See these lecture notes on this blog for more discusssion of this law.)

One can phrase this law in a number of equivalent ways. For instance, in the bulk region {\epsilon n \leq i \leq (1-\epsilon) n}, one almost surely has

\displaystyle  \lambda_i(M_n) = \gamma_i \sqrt{n} + o(\sqrt{n}) \ \ \ \ \ (1)

uniformly for in {i}, where the classical location {\gamma_i \in [-2,2]} of the (normalised) {i^{th}} eigenvalue {\frac{1}{\sqrt{n}} \lambda_i} is defined by the formula

\displaystyle  \int_{-2}^{\gamma_i} \rho_{sc}(x)\ dx := \frac{i}{n}.

The bound (1) also holds in the edge case (by using the operator norm bound {\| M_n\|_{op} = (2+o(1)) \sqrt{n}}, due to Bai and Yin), but for sake of exposition we shall restriction attention here only to the bulk case.

From (1) we see that the semicircular law controls the eigenvalues at the coarse scale of {\sqrt{n}}. There has been a significant amount of work in the literature in obtaining control at finer scales, and in particular at the scale of the average eigenvalue spacing, which is of the order of {\sqrt{n}/n = n^{-1/2}}. For instance, we now have a universal limit theorem for the normalised eigenvalue spacing {\sqrt{n}(\lambda_{i+1}(M_n) - \lambda_i(M_n))} in the bulk for all Wigner matrices, a result of Erdos, Ramirez, Schlein, Vu, Yau, and myself. One tool for this is the four moment theorem of Van and myself, which roughly speaking shows that the behaviour of the eigenvalues at the scale {n^{-1/2}} (and even at the slightly finer scale of {n^{-1/2-c}} for some absolute constant {c>0}) depends only on the first four moments of the matrix entries. There is also a slight variant, the three moment theorem, which asserts that the behaviour of the eigenvalues at the slightly coarser scale of {n^{-1/2+\epsilon}} depends only on the first three moments of the matrix entries.

It is natural to ask whether these moment conditions are necessary. From the result of Erdos, Ramirez, Schlein, Vu, Yau, and myself, it is known that to control the eigenvalue spacing {\lambda_{i+1}(M_n) - \lambda_i(M_n)} at the critical scale {n^{-1/2}}, no knowledge of any moments beyond the second (i.e. beyond the mean and variance) are needed. So it is natural to conjecture that the same is true for the eigenvalues themselves.

The main result of this paper is to show that this is not the case; that at the critical scale {n^{-1/2}}, the distribution of eigenvalues {\lambda_i(M_n)} is sensitive to the fourth moment, and so the hypothesis of the four moment theorem cannot be relaxed.

Heuristically, the reason for this is easy to explain. One begins with an inspection of the expected fourth moment

\displaystyle  \sum_{i=1}^n {\bf E}(\lambda_i(M_n)^4) = {\bf E} \hbox{tr} M_n^4.

A standard moment method computation shows that the right hand side is equal to

\displaystyle  2n^3 + 2a n^2 + \ldots

where {a} is the fourth moment of the real part of the off-diagonal coefficients of {M_n}. In particular, a change in the fourth moment {a} by {O(1)} leads to a change in the expression {\sum_{i=1}^n {\bf E}(\lambda_i(M_n)^4)} by {O(n^2)}. Thus, for a typical {i}, one expects {{\bf E}(\lambda_i(M_n)^4)} to shift by {O(n)}; since {\lambda_i(M_n) = O(\sqrt{n})} on the average, we thus expect {\lambda_i(M_n)} itself to shift by about {O(n^{-1/2})} by the mean-value theorem.

To make this rigorous, one needs a sufficiently strong concentration of measure result for {\lambda_i(M_n)} that keeps it close to its mean value. There are already a number of such results in the literature. For instance, Guionnet and Zeitouni showed that {\lambda_i(M_n)} was sharply concentrated around an interval of size {O(n^\epsilon)} around {\sqrt{n} \gamma_i} for any {\epsilon > 0} (in the sense that the probability that one was outside this interval was exponentially small). In one of my papers with Van, we showed that {\lambda_i(M_n)} was also weakly concentrated around an interval of size {O(n^{-1/2+\epsilon})} around {\sqrt{n} \gamma_i}, in the sense that the probability that one was outside this interval was {O(n^{-c})} for some absolute constant {c>0}. Finally, if one made an additional log-Sobolev hypothesis on the entries, it was shown by by Erdos, Yau, and Yin that the average variance of {\lambda_i(M_n)} as {i} varied from {1} to {n} was of the size of {O(n^{-c})} for some absolute {c>0}.

As it turns out, the first two concentration results are not sufficient to justify the previous heuristic argument. The Erdos-Yau-Yin argument suffices, but requires a log-Sobolev hypothesis. In our paper, we argue differently, using the three moment theorem (together with the theory of the eigenvalues of GUE, which is extremely well developed) to show that the variance of each individual {\lambda_i(M_n)} is {O(n^{-c})} (without averaging in {i}). No log-Sobolev hypothesis is required, but instead we need to assume that the third moment of the coefficients vanishes (because we want to use the three moment theorem to compare the Wigner matrix to GUE, and the coefficients of the latter have a vanishing third moment). From this we are able to make the previous arguments rigorous, and show that the mean {{\bf E} \lambda_i(M_n)} is indeed sensitive to the fourth moment of the entries at the critical scale {n^{-1/2}}.

One curious feature of the analysis is how differently the median and the mean of the eigenvalue {\lambda_i(M_n)} react to the available technology. To control the global behaviour of the eigenvalues (after averaging in {i}), it is much more convenient to use the mean, and we have very precise control on global averages of these means thanks to the moment method. But to control local behaviour, it is the median which is much better controlled. For instance, we can localise the median of {\lambda_i(M_n)} to an interval of size {O(n^{-1/2+\epsilon})}, but can only localise the mean to a much larger interval of size {O(n^{-c})}. Ultimately, this is because with our current technology there is a possible exceptional event of probability as large as {O(n^{-c})} for which all eigenvalues could deviate as far as {O(n^\epsilon)} from their expected location, instead of their typical deviation of {O(n^{-1/2})}. The reason for this is technical, coming from the fact that the four moment theorem method breaks down when two eigenvalues are very close together (less than {n^{-c}} times the average eigenvalue spacing), and so one has to cut out this event, which occurs with a probability of the shape {O(n^{-c})}. It may be possible to improve the four moment theorem proof to be less sensitive to eigenvalue near-collisions, in which case the above bounds are likely to improve.

Archives

RSS Google+ feed

  • An error has occurred; the feed is probably down. Try again later.
Follow

Get every new post delivered to your Inbox.

Join 3,875 other followers