
Van Vu and I have just uploaded to the arXiv our joint paper “Local universality of zeroes of random polynomials“. This paper is a sequel to our previous work on local universality of eigenvalues of (non-Hermitian) random matrices {M_n} with independent entries. One can re-interpret these previous results as a universality result for a certain type of random polynomial {f: {\bf C} \rightarrow {\bf C}}, namely the characteristic polynomial {f(z) = \hbox{det}(M_n-z)} of the random matrix {M_n}. In this paper, we consider the analogous problem for a different model of random polynomial, namely polynomials {f} with independent random coefficients. More precisely, we consider random polynomials {f = f_n} of the form

\displaystyle  f(z) = \sum_{i=0}^n c_i \xi_i z^i

where {c_0,\ldots,c_n \in {\bf C}} are deterministic complex coefficients, and {\xi_0,\ldots,\xi_n \equiv \xi} are independent identically distributed copies of a complex random variable {\xi}, which we normalise to have mean zero and variance one. For simplicity we will ignore the technical issue that the leading coefficient {c_n \xi_n} is allowed to vanish; then {f} has {n} zeroes {\zeta_1,\ldots,\zeta_n \in {\bf C}} (counting multiplicity), which can be viewed as a random point process {\Sigma = \{\zeta_1,\ldots,\zeta_n\}} in the complex plane. In analogy with other models (such as random matrix models), we expect the (suitably normalised) asymptotic statistics of this point process in the limit {n \rightarrow \infty} to be universal, in the sense that it is largely independent of the precise distribution of the atom variable {\xi}.

Our results are fairly general with regard to the choice of coefficients {c_i}, but we isolate three particular choices of coefficients that are particularly natural and well-studied in the literature:

  • Flat polynomials (or Weyl polynomials) in which {c_i := \frac{1}{\sqrt{i!}}}.
  • Elliptic polynomials (or binomial polynomials) in which {c_i := \sqrt{\binom{n}{i}}}.
  • Kac polynomials in which {c_i := 1}.
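
As a quick numerical illustration (a sketch of my own, not from the paper), one can sample the zero point process {\Sigma} for each of the three models with numpy, building the coefficient vectors via log-gamma to avoid overflowing factorials and extracting the zeroes with `numpy.roots`:

```python
import numpy as np
from math import lgamma

rng = np.random.default_rng(0)
n = 80

def zeroes(c, xi):
    """Zeroes of f(z) = sum_i c_i xi_i z^i; numpy.roots expects the
    leading coefficient first, hence the reversal."""
    return np.roots((c * xi)[::-1])

# complex gaussian atoms (the overall scale of xi does not affect the zeroes)
xi = rng.standard_normal(n + 1) + 1j * rng.standard_normal(n + 1)
idx = np.arange(n + 1)

c_flat = np.exp(-0.5 * np.array([lgamma(k + 1) for k in idx]))            # 1/sqrt(i!)
c_elliptic = np.exp(0.5 * np.array([lgamma(n + 1) - lgamma(k + 1)
                                    - lgamma(n - k + 1) for k in idx]))   # sqrt(C(n,i))
c_kac = np.ones(n + 1)

for name, c in [("flat", c_flat), ("elliptic", c_elliptic), ("Kac", c_kac)]:
    print(name, zeroes(c, xi).shape)  # each model has n = 80 zeroes
```

Since the leading coefficient is almost surely nonzero, each draw gives exactly {n} zeroes, which can then be plotted or fed into empirical statistics.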

The flat and elliptic polynomials enjoy special symmetries in the model case when the atom distribution {\xi} is a complex Gaussian {N(0,1)_{\bf C}}. Indeed, the zeroes {\Sigma} of elliptic polynomials with complex Gaussian coefficients have a distribution which is invariant with respect to isometries {T: {\bf C} \cup \{\infty\} \rightarrow {\bf C} \cup \{\infty\}} of the Riemann sphere {{\bf C} \cup \{\infty\}} (thus {T\Sigma} has the same distribution as {\Sigma}), while the zeroes of the limiting case {\sum_{i=0}^\infty \frac{1}{\sqrt{i!}} \xi_i z^i} of the flat polynomials with complex Gaussian coefficients are similarly invariant with respect to isometries {T: {\bf C} \rightarrow {\bf C}} of the complex plane {{\bf C}}. (For a nice geometric proof of these facts, I recommend the book of Hough, Krishnapur, Peres, and Virag.)

The global (i.e. coarse-scale) distribution of zeroes of these polynomials is well understood, first in the case of gaussian distributions using the fundamental tool of the Kac-Rice formula, and then for more general choices of atom distribution in the recent work of Kabluchko and Zaporozhets. The zeroes of the flat polynomials are asymptotically distributed according to the circular law, normalised to be uniformly distributed on the disk {B(0,\sqrt{n})} of radius {\sqrt{n}} centred at the origin. To put it a bit informally, the zeroes are asymptotically distributed according to the measure {\frac{1}{\pi} 1_{|z| \leq \sqrt{n}} dz}, where {dz} denotes Lebesgue measure on the complex plane. One can non-rigorously see the scale {\sqrt{n}} appear by observing that when {|z|} is much larger than {\sqrt{n}}, we expect the leading term {\frac{1}{\sqrt{n!}} \xi_n z^n} of the flat polynomial {\sum_{i=0}^n \frac{1}{\sqrt{i!}} \xi_i z^i} to dominate, so that the polynomial should not have any zeroes in this region.

Similarly, the zeroes of the elliptic polynomials are known to be asymptotically distributed according to a Cauchy-type distribution {\frac{1}{\pi} \frac{1}{(1+|z|^2/n)^2} dz}. The Kac polynomials {\sum_{i=0}^n \xi_i z^i} behave differently; the zeroes concentrate uniformly on the unit circle {|z|=1} (which is reasonable, given that one would expect the top order term {\xi_n z^n} to dominate for {|z| > 1} and the bottom order term {\xi_0} to dominate for {|z| < 1}). In particular, whereas the typical spacing between zeroes in the flat and elliptic cases would be expected to be comparable to {1}, the typical spacing between zeroes for a Kac polynomial would be expected instead to be comparable to {1/n}.
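
These global predictions are easy to probe numerically. The following self-contained sketch (thresholds chosen loosely, and mine rather than the paper's) checks that a flat polynomial's zeroes essentially fill the disk of radius {\sqrt{n}}, while a Kac polynomial's zeroes hug the unit circle:

```python
import numpy as np
from math import lgamma

rng = np.random.default_rng(1)
n = 100
xi = rng.standard_normal(n + 1) + 1j * rng.standard_normal(n + 1)  # complex atoms

# flat: c_i = 1/sqrt(i!); zeroes should fill the disk of radius ~ sqrt(n)
c_flat = np.exp(-0.5 * np.array([lgamma(i + 1) for i in range(n + 1)]))
z_flat = np.roots((c_flat * xi)[::-1])

# Kac: c_i = 1; zeroes should concentrate near the unit circle |z| = 1
z_kac = np.roots(xi[::-1])

frac_flat = np.mean(np.abs(z_flat) < 1.2 * np.sqrt(n))
frac_kac = np.mean(np.abs(np.abs(z_kac) - 1) < 0.2)
print(frac_flat, frac_kac)  # both fractions should be close to 1
```

A handful of Kac zeroes do escape the thin annulus (the limiting gaussian analytic function has a few zeroes well inside and well outside the unit circle), which is why the check is stated with a generous tolerance.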

In our paper we studied the local distribution of zeroes at the scale of the typical spacing. In the case of polynomials with continuous complex atom distribution {\xi}, the natural statistic to measure here is the {k}-point correlation function {\rho^{(k)}(z_1,\ldots,z_k)}, which for distinct complex numbers {z_1,\ldots,z_k} can be defined as the probability that there is a zero in each of the balls {B(z_1,\varepsilon),\ldots,B(z_k,\varepsilon)} for some infinitesimal {\varepsilon >0}, divided by the normalising factor {(\pi \varepsilon^2)^k}. (One can also define a {k}-point correlation function in the case of a discrete distribution, but it will be a measure rather than a function in that case.) Our first main theorem is a general replacement principle which asserts, very roughly speaking, that the asymptotic {k}-point correlation functions of two random polynomials {f, \tilde f} will agree if the log-magnitudes {\log |f(z)|, \log |\tilde f(z)|} have asymptotically the same distribution (actually we have to consider the joint distribution of {\log |f(z_1)|, \ldots, \log |f(z_k)|} for several points {z_1,\ldots,z_k}, but let us ignore this detail for now), and if the polynomials {f, \tilde f} obey a “non-clustering property” which asserts, roughly speaking, that not too many of the zeroes of {f} can typically concentrate in a small ball. This replacement principle was implicit in our previous paper (and can be viewed as a local-scale version of the global-scale replacement principle in this earlier paper of ours). Specialising the replacement principle to the elliptic or flat polynomials, and using the Lindeberg swapping argument, we obtain a Two Moment Theorem that asserts, roughly speaking, that the asymptotic behaviour of the {k}-point correlation functions depends only on the first two moments of the real and imaginary components of {\xi}, as long as one avoids some regions of space where universality is expected to break down.
(In particular, because {f(0) = c_0 \xi_0} does not have a universal distribution, one does not expect universality to hold near the origin; there is a similar problem near infinity.) Closely related results, by a slightly different method, have also been obtained recently by Ledoan, Merkli, and Starr. A similar result holds for the Kac polynomials after rescaling to account for the narrower spacing between zeroes.

We also have analogous results in the case of polynomials with real coefficients (so that the coefficients {c_i} and the atom distribution {\xi} are both real). In this case one expects to see a certain number of real zeroes, together with conjugate pairs of complex zeroes. Instead of the {k}-point correlation function {\rho^{(k)}(z_1,\ldots,z_k)}, the natural object of study is now the mixed {(k,l)}-point correlation function {\rho^{(k,l)}(x_1,\ldots,x_k,z_1,\ldots,z_l)} that (roughly speaking) controls how often one can find a real zero near the real numbers {x_1,\ldots,x_k}, and a complex zero near the points {z_1,\ldots,z_l}. It turns out that one can disentangle the real and strictly complex zeroes and obtain separate universality results for both types of zeroes, provided that at least one of the polynomials involved obeys a “weak repulsion estimate” that shows that the real zeroes do not cluster very close to each other (and that the complex zeroes do not cluster very close to their complex conjugates). Such an estimate is needed to avoid the situation of two nearby real zeroes “colliding” to create a (barely) complex zero and its complex conjugate, or the time reversal of such a collision. Fortunately, in the case of real gaussian polynomials one can use the Kac-Rice formula to establish such a weak repulsion estimate, allowing analogues of the above universality results for complex random polynomials in the real case. Among other things, this gives universality results for the number {N_{\bf R}} of real zeroes of a random flat or elliptic polynomial; it turns out this number is typically equal to {\frac{2}{\pi} \sqrt{n} + O(n^{1/2-c})} and {\sqrt{n} + O(n^{1/2-c})} respectively. (For Kac polynomials, the situation is somewhat different; it was already known that {N_{\bf R} = \frac{2}{\pi} \log n + o(\log n)} thanks to a long series of papers, starting with the foundational work of Kac and culminating in the work of Ibragimov and Maslova.)

While our methods are based on our earlier work on eigenvalues of random matrices, the situation with random polynomials is actually somewhat easier to deal with. This is because the log-magnitude {\log |f(z)|} of a random polynomial with independent coefficients is significantly easier to control than the log-determinant {\log |\hbox{det}(M-z)|} of a random matrix, as the former can be controlled by the central limit theorem, while the latter requires significantly more difficult arguments (in particular, bounds on the least singular value combined with Girko’s Hermitization trick). As such, one could view the current paper as an introduction to our more complicated previous paper, and with this in mind we have written the current paper to be self-contained (though this did make the paper somewhat lengthy, coming in at 68 pages).

Van Vu and I have just uploaded to the arXiv our paper “Random matrices: Universality of local spectral statistics of non-Hermitian matrices“. The main result of this paper is a “Four Moment Theorem” that establishes universality for local spectral statistics of non-Hermitian matrices with independent entries, under the additional hypotheses that the entries of the matrix decay exponentially, and match moments with either the real or complex gaussian ensemble to fourth order. This is the non-Hermitian analogue of a long string of recent results establishing universality of local statistics in the Hermitian case (as discussed for instance in this recent survey of Van and myself, and also in several other places).

The complex case is somewhat easier to describe. Given a (non-Hermitian) random matrix ensemble {M_n} of {n \times n} matrices, one can arbitrarily enumerate the (geometric) eigenvalues as {\lambda_1(M_n),\ldots,\lambda_n(M_n) \in {\bf C}}, and one can then define the {k}-point correlation functions {\rho^{(k)}_n: {\bf C}^k \rightarrow {\bf R}^+} to be the symmetric functions such that

\displaystyle  \int_{{\bf C}^k} F(z_1,\ldots,z_k) \rho^{(k)}_n(z_1,\ldots,z_k)\ dz_1 \ldots dz_k

\displaystyle  = {\bf E} \sum_{1 \leq i_1 < \ldots < i_k \leq n} F(\lambda_{i_1}(M_n),\ldots,\lambda_{i_k}(M_n)).

In the case when {M_n} is drawn from the complex gaussian ensemble, so that all the entries are independent complex gaussians of mean zero and variance one, it is a classical result of Ginibre that the asymptotics of {\rho^{(k)}_n} near some point {z \sqrt{n}}, in the limit {n \rightarrow \infty} with {z \in {\bf C}} fixed, are given by the determinantal rule

\displaystyle  \rho^{(k)}_n(z\sqrt{n} + w_1,\ldots,z\sqrt{n}+w_k) \rightarrow \hbox{det}( K(w_i,w_j) )_{1 \leq i,j \leq k} \ \ \ \ \ (1)

for {|z| < 1} and

\displaystyle  \rho^{(k)}_n(z\sqrt{n} + w_1,\ldots,z\sqrt{n}+w_k) \rightarrow 0

for {|z| > 1}, where {K} is the reproducing kernel

\displaystyle  K(z,w) := \frac{1}{\pi} e^{-|z|^2/2 - |w|^2/2 + z \overline{w}}.

(There is also an asymptotic for the boundary case {|z|=1}, but it is more complicated to state.) In particular, we see that {\rho^{(k)}_n(z \sqrt{n}) \rightarrow \frac{1}{\pi} 1_{|z| \leq 1}} for almost every {z}, which is a manifestation of the well-known circular law for these matrices; but the circular law only captures the macroscopic structure of the spectrum, whereas the asymptotic (1) describes the microscopic structure.
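
The macroscopic statement is easy to test numerically. The following sketch (my own illustration) samples one complex Ginibre matrix and counts eigenvalues in a sub-disk; since the limiting density is {\frac{1}{\pi}} per unit area inside radius {\sqrt{n}}, a disk of radius {r} should capture about {r^2} eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
# complex Ginibre: iid complex gaussian entries, mean 0, variance 1
M = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
lam = np.linalg.eigvals(M)

r = 0.5 * np.sqrt(n)                 # a disk well inside the spectrum
inside = int(np.sum(np.abs(lam) < r))
print(inside, r**2)  # circular law: expect about r^2 = n/4 = 125 eigenvalues
```

The count fluctuates only mildly around {r^2} (the determinantal structure suppresses fluctuations well below the Poisson scale).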

Our first main result is that the asymptotic (1) for {|z|<1} also holds (in the sense of vague convergence) when {M_n} is a matrix whose entries are independent with mean zero, variance one, exponentially decaying tails, and which all match moments with the complex gaussian to fourth order. (Actually we prove a stronger result than this which is valid for all bounded {z} and has more uniform bounds, but is a bit more technical to state.) An analogous result is also established for real gaussians (but now one has to separate the correlation function into components depending on how many eigenvalues are real and how many are strictly complex; also, the limiting distribution is more complicated, being described by Pfaffians rather than determinants). Among other things, this allows us to partially extend some known results on complex or real gaussian ensembles to more general ensembles. For instance, there is a result of Rider which establishes a central limit theorem for the number of eigenvalues of a complex gaussian matrix in a mesoscopic disk; from our results, we can extend this central limit theorem to matrices that match the complex gaussian ensemble to fourth order, provided that the disk is small enough (for technical reasons, our error bounds are not strong enough to handle large disks). Similarly, extending some results of Edelman-Kostlan-Shub and of Forrester-Nagao, we can show that for a matrix matching the real gaussian ensemble to fourth order, the number of real eigenvalues is {\sqrt{\frac{2n}{\pi}} + O(n^{1/2-c})} with probability {1-O(n^{-c})} for some absolute constant {c>0}.
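
The {\sqrt{2n/\pi}} count for real gaussian matrices is also easy to reproduce empirically. A sketch (mine; it relies on the convenient fact that the real nonsymmetric eigensolver returns genuinely real eigenvalues with imaginary part exactly zero, so no thresholding is needed):

```python
import numpy as np

rng = np.random.default_rng(4)
n, trials = 100, 200
counts = []
for _ in range(trials):
    lam = np.linalg.eigvals(rng.standard_normal((n, n)))  # real gaussian entries
    # real eigenvalues of a real matrix have imaginary part exactly 0.0
    counts.append(int(np.sum(lam.imag == 0)))

mean_real = sum(counts) / trials
print(mean_real, np.sqrt(2 * n / np.pi))  # prediction ~ 7.98 for n = 100
```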

There are several steps involved in the proof. The first step is to apply the Girko Hermitisation trick to replace the problem of understanding the spectrum of a non-Hermitian matrix, with that of understanding the spectrum of various Hermitian matrices. The two identities that realise this trick are, firstly, Jensen’s formula

\displaystyle  \log |\det(M_n-z_0)| = - \sum_{1 \leq i \leq n: \lambda_i(M_n) \in B(z_0,r)} \log \frac{r}{|\lambda_i(M_n)-z_0|}

\displaystyle + \frac{1}{2\pi} \int_0^{2\pi} \log |\det(M_n-z_0-re^{i\theta})|\ d\theta

that relates the local distribution of eigenvalues to the log-determinants {\log |\det(M_n-z_0)|}, and secondly the elementary identity

\displaystyle  \log |\det(M_n - z)| = \frac{1}{2} \log|\det W_{n,z}| + \frac{1}{2} n \log n

that relates the log-determinants of {M_n-z} to the log-determinants of the Hermitian matrices

\displaystyle  W_{n,z} := \frac{1}{\sqrt{n}} \begin{pmatrix} 0 & M_n -z \\ (M_n-z)^* & 0 \end{pmatrix}.
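
The second identity is elementary (it boils down to {|\det W_{n,z}| = n^{-n} |\det(M_n-z)|^2} for the block anti-diagonal matrix) and can be sanity-checked directly with `numpy.linalg.slogdet`; a quick sketch:

```python
import numpy as np

rng = np.random.default_rng(5)
n, z = 60, 0.3 + 0.2j
M = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)

A = M - z * np.eye(n)
Z = np.zeros((n, n))
W = np.block([[Z, A], [A.conj().T, Z]]) / np.sqrt(n)  # the Hermitised matrix W_{n,z}

lhs = np.linalg.slogdet(A)[1]                      # log |det(M_n - z)|
rhs = 0.5 * np.linalg.slogdet(W)[1] + 0.5 * n * np.log(n)
print(lhs, rhs)  # the two agree up to rounding error
```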

The main difficulty is then to obtain concentration and universality results for the Hermitian log-determinants {\log|\det W_{n,z}|}. This turns out to be a task that is analogous to the task of obtaining concentration for Wigner matrices (as we did in this recent paper), as well as central limit theorems for log-determinants of Wigner matrices (as we did in this other recent paper). In both of these papers, the main idea was to use the Four Moment Theorem for Wigner matrices (which can now be proven relatively easily by a combination of the local semi-circular law and resolvent swapping methods), combined with (in the latter paper) a central limit theorem for the gaussian unitary ensemble (GUE). This latter task was achieved by using the convenient Trotter normal form to tridiagonalise a GUE matrix, which has the effect of revealing the determinant of that matrix as the solution to a certain linear stochastic difference equation, and one can analyse the distribution of that solution via such tools as the martingale central limit theorem.
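
The "linear stochastic difference equation" here is the classical three-term recurrence for determinants of tridiagonal matrices. A sketch with a generic symmetric tridiagonal matrix (not the actual Trotter form, whose entries are specific gaussian and chi-distributed variables):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 12
a = rng.standard_normal(n)        # diagonal entries
b = rng.standard_normal(n - 1)    # off-diagonal entries

T = np.diag(a) + np.diag(b, 1) + np.diag(b, -1)

# minors D_k of the top-left k x k block satisfy the linear recurrence
# D_k = a_k D_{k-1} - b_{k-1}^2 D_{k-2}, with D_0 = 1, D_1 = a_1
D = [1.0, a[0]]
for k in range(1, n):
    D.append(a[k] * D[-1] - b[k - 1] ** 2 * D[-2])

print(D[-1], np.linalg.det(T))  # the recurrence reproduces the determinant
```

Analysing the distribution of the solution of this recurrence (with random coefficients) is what yields the central limit theorem for the log-determinant in the GUE case.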

The matrices {W_{n,z}} are somewhat more complicated than Wigner matrices (for instance, the semi-circular law must be replaced by a distorted Marchenko-Pastur law), but the same general strategy works to obtain concentration and universality for their log-determinants. The main new difficulty that arises is that the analogue of the Trotter normal form for gaussian random matrices is not tridiagonal, but rather Hessenberg (i.e. upper-triangular except for the lower diagonal). This ultimately has the effect of expressing the relevant determinant as the solution to a nonlinear stochastic difference equation, which is a bit trickier to solve for. Fortunately, it turns out that one only needs good lower bounds on the solution, as one can use the second moment method to upper bound the determinant and hence the log-determinant (following a classical computation of Turan). This simplifies the analysis of the equation somewhat.

While this result is the first local universality result in the category of random matrices with independent entries, there are still two limitations to the result which one would like to remove. The first is the moment matching hypotheses on the matrix. Very recently, one of the ingredients of our paper, namely the local circular law, was proved without moment matching hypotheses by Bourgade, Yau, and Yin (provided one stays away from the edge of the spectrum); however, at the time of writing the other main ingredient – the universality of the log-determinant – still requires moment matching. (The standard tool for obtaining universality without moment matching hypotheses is the heat flow method (and more specifically, the local relaxation flow method), but the analogue of Dyson Brownian motion in the non-Hermitian setting appears to be somewhat intractable, being a coupled flow on both the eigenvalues and eigenvectors rather than just on the eigenvalues alone.)

Van Vu and I have just uploaded to the arXiv our paper “Random matrices: The Universality phenomenon for Wigner ensembles“. This survey is a longer version (58 pages) of a previous short survey we wrote up a few months ago. The survey focuses on recent progress in understanding the universality phenomenon for Hermitian Wigner ensembles, of which the Gaussian Unitary Ensemble (GUE) is the most well known. The one-sentence summary of this progress is that many of the asymptotic spectral statistics (e.g. correlation functions, eigenvalue gaps, determinants, etc.) that were previously known for GUE matrices, are now known for very large classes of Wigner ensembles as well. There are however a wide variety of results of this type, due to the large number of interesting spectral statistics, the varying hypotheses placed on the ensemble, and the different modes of convergence studied, and it is difficult to isolate a single such result currently as the definitive universality result. (In particular, there is at present a tradeoff between generality of ensemble and strength of convergence; the universality results that are available for the most general classes of ensemble are only presently able to demonstrate a rather weak sense of convergence to the universal distribution (involving an additional averaging in the energy parameter), which limits the applicability of such results to a number of interesting questions in which energy averaging is not permissible, such as the study of the least singular value of a Wigner matrix, or of related quantities such as the condition number or determinant. But it is conceivable that this tradeoff is a temporary phenomenon and may be eliminated by future work in this area; in the case of Hermitian matrices whose entries have the same second moments as that of the GUE ensemble, for instance, the need for energy averaging has already been removed.)

Nevertheless, throughout the family of results that have been obtained recently, there are two main methods which have been fundamental to almost all of the recent progress in extending from special ensembles such as GUE to general ensembles. The first method, developed extensively by Erdos, Schlein, Yau, Yin, and others (and building on an initial breakthrough by Johansson), is the heat flow method, which exploits the rapid convergence to equilibrium of the spectral statistics of matrices undergoing Dyson-type flows towards GUE. (An important aspect to this method is the ability to accelerate the convergence to equilibrium by localising the Hamiltonian, in order to eliminate the slowest modes of the flow; this refinement of the method is known as the “local relaxation flow” method. Unfortunately, the translation mode is not accelerated by this process, which is the principal reason why results obtained by pure heat flow methods still require an energy averaging in the final conclusion; it would be of interest to find a way around this difficulty.) The other method, which goes all the way back to Lindeberg in his classical proof of the central limit theorem, and which was introduced to random matrix theory by Chatterjee and then developed for the universality problem by Van Vu and myself, is the swapping method, which is based on the observation that spectral statistics of Wigner matrices tend to be stable if one replaces just one or two entries of the matrix with another distribution, with the stability of the swapping process becoming stronger if one assumes that the old and new entries have many matching moments. The main formalisations of this observation are known as four moment theorems, because they require four matching moments between the entries, although there are some variant three moment theorems and two moment theorems in the literature as well.
Our initial four moment theorems were focused on individual eigenvalues (and later also on eigenvectors), but it was later observed by Erdos, Yau, and Yin that simpler four moment theorems could also be established for aggregate spectral statistics, such as the coefficients of the Green’s function, and Knowles and Yin also subsequently observed that these latter theorems could be used to recover a four moment theorem for eigenvalues and eigenvectors, giving an alternate approach to proving such theorems.

Interestingly, it seems that the heat flow and swapping methods are complementary to each other; the heat flow methods are good at removing moment hypotheses on the coefficients, while the swapping methods are good at removing regularity hypotheses. To handle general ensembles with minimal moment or regularity hypotheses, it is thus necessary to combine the two methods (though perhaps in the future a third method, or a unification of the two existing methods, might emerge).

Besides the heat flow and swapping methods, there are also a number of other basic tools that are needed in these results, such as local semicircle laws and eigenvalue rigidity, which are also discussed in the survey. We also survey how universality has been established for a wide variety of spectral statistics; the {k}-point correlation functions are the most well known of these statistics, but they do not tell the whole story (particularly if one can only control these functions after an averaging in the energy), and there are a number of other statistics, such as eigenvalue counting functions, determinants, or spectral gaps, for which the above methods can be applied.

In order to prevent the survey from becoming too enormous, we decided to restrict attention to Hermitian matrix ensembles, whose entries off the diagonal are identically distributed, as this is the case in which the strongest results are available. There are several results that are applicable to more general ensembles than these which are briefly mentioned in the survey, but they are not covered in detail.

We plan to submit this survey eventually to the proceedings of a workshop on random matrix theory, and will continue to update the references on the arXiv version until the time comes to actually submit the paper.

Finally, in the survey we issue some errata for previous papers of Van and myself in this area, mostly centering around the three moment theorem (a variant of the more widely used four moment theorem), for which the original proof of Van and myself was incomplete. (Fortunately, as the three moment theorem had many fewer applications than the four moment theorem, and most of the applications that it did have ended up being superseded by subsequent papers, the actual impact of this issue was limited, but still an erratum is in order.)

Van Vu and I have just uploaded to the arXiv our paper Random matrices: Sharp concentration of eigenvalues, submitted to the Electronic Journal of Probability. As with many of our previous papers, this paper is concerned with the distribution of the eigenvalues {\lambda_1(M_n) \leq \ldots \leq \lambda_n(M_n)} of a random Wigner matrix {M_n} (such as a matrix drawn from the Gaussian Unitary Ensemble (GUE) or Gaussian Orthogonal Ensemble (GOE)). To simplify the discussion we shall mostly restrict attention to the bulk of the spectrum, i.e. to eigenvalues {\lambda_i(M_n)} with {\delta n \leq i \leq (1-\delta) n} for some fixed {\delta>0}, although analogues of most of the results below have also been obtained at the edge of the spectrum.

If we normalise the entries of the matrix {M_n} to have mean zero and variance {1/n}, then in the asymptotic limit {n \rightarrow \infty}, we have the Wigner semicircle law, which asserts that the eigenvalues are asymptotically distributed according to the semicircular distribution {\rho_{sc}(x)\ dx}, where

\displaystyle  \rho_{sc}(x) := \frac{1}{2\pi} (4-x^2)_+^{1/2}.

An essentially equivalent way of saying this is that for large {n}, we expect the {i^{th}} eigenvalue {\lambda_i(M_n)} of {M_n} to stay close to the classical location {\gamma_i \in [-2,2]}, defined by the formula

\displaystyle  \int_{-2}^{\gamma_i} \rho_{sc}(x)\ dx = \frac{i}{n}.

In particular, from the Wigner semicircle law it can be shown that asymptotically almost surely, one has

\displaystyle  \lambda_i(M_n) = \gamma_i + o(1) \ \ \ \ \ (1)

for all {1 \leq i \leq n}.
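
A numerical illustration of (1), with the much stronger rigidity discussed below already visible: sample one GUE matrix (normalised so the spectrum fills {[-2,2]}), invert the semicircle CDF by bisection to obtain the classical locations {\gamma_i}, and measure the worst deviation in the bulk. The function names and tolerances here are my own:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 400
# GUE normalised so that the entries have variance 1/n (spectrum fills [-2,2])
X = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
H = (X + X.conj().T) / np.sqrt(2 * n)
lam = np.sort(np.linalg.eigvalsh(H))

def F(x):
    """Semicircle CDF: integral of (1/2pi) sqrt(4 - t^2) from -2 to x."""
    x = min(max(x, -2.0), 2.0)
    return 0.5 + (x * np.sqrt(4 - x * x) + 4 * np.arcsin(x / 2)) / (4 * np.pi)

def gamma(i):
    """Classical location gamma_i, solving F(gamma_i) = i/n by bisection."""
    lo, hi = -2.0, 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if F(mid) < i / n:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

bulk = range(n // 10, (9 * n) // 10)
dev = max(abs(lam[i] - gamma(i + 1)) for i in bulk)
print(dev)  # bulk deviation is O(log n / n) in practice, well below 0.1
```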

In the modern study of the spectrum of Wigner matrices (and in particular as a key tool in establishing universality results), it has become of interest to improve the error term in (1) as much as possible. A typical early result in this direction was by Bai, who used the Stieltjes transform method to obtain polynomial convergence rates of the shape {O(n^{-c})} for some absolute constant {c>0}; see also the subsequent papers of Alon-Krivelevich-Vu and of Meckes, who were able to obtain such convergence rates (with exponentially high probability) by using concentration of measure tools, such as Talagrand’s inequality. On the other hand, in the case of the GUE ensemble it is known (by this paper of Gustavsson) that {\lambda_i(M_n)} has variance comparable to {\frac{\log n}{n^2}} in the bulk, so that the optimal error term in (1) should be about {O(\log^{1/2} n/n)}. (One may think that if one wanted bounds on (1) that were uniform in {i}, one would need to enlarge the error term further, but this does not appear to be the case, due to strong correlations between the {\lambda_i}; note for instance this recent result of Ben Arous and Bourgade that the largest gap between eigenvalues in the bulk is typically of order {O(\log^{1/2} n/n)}.)

A significant advance in this direction was achieved by Erdos, Schlein, and Yau in a series of papers where they used a combination of Stieltjes transform and concentration of measure methods to obtain local semicircle laws which showed, among other things, that one had asymptotics of the form

\displaystyle  N(I) = (1+o(1)) \int_I \rho_{sc}(x)\ dx

with exponentially high probability for intervals {I} in the bulk that were as short as {n^{-1+\epsilon}} for some {\epsilon>0}, where {N(I)} denotes the number of eigenvalues in {I}. These asymptotics are consistent with a good error term in (1), and are already sufficient for many applications, but do not quite imply a strong concentration result for individual eigenvalues {\lambda_i} (basically because they do not preclude long-range or “secular” shifts in the spectrum that involve large blocks of eigenvalues at mesoscopic scales). Nevertheless, this was rectified in a subsequent paper of Erdos, Yau, and Yin, which roughly speaking obtained a bound of the form

\displaystyle  \lambda_i(M_n) = \gamma_i + O( \frac{\log^{O(\log\log n)} n}{n} )

in the bulk with exponentially high probability, for Wigner matrices obeying some exponential decay conditions on the entries. This was achieved by a rather delicate high moment calculation, in which the contributions of the diagonal entries of the resolvent (whose average forms the Stieltjes transform) were shown to mostly cancel each other out.

As the GUE computations show, this concentration result is sharp up to the quasilogarithmic factor {\log^{O(\log\log n)} n}. The main result of this paper is to improve the concentration result to one more in line with the GUE case, namely

\displaystyle  \lambda_i(M_n) = \gamma_i + O( \frac{\log^{O(1)} n}{n} )

with exponentially high probability (see the paper for a more precise statement of results). The one catch is that an additional hypothesis is required, namely that the entries of the Wigner matrix have vanishing third moment. We also obtain similar results for the edge of the spectrum (but with a different scaling).

Our arguments are rather different from those of Erdos, Yau, and Yin, and thus provide an alternate approach to establishing eigenvalue concentration. The main tool is the Lindeberg exchange strategy, which is also used to prove the Four Moment Theorem (although we do not directly invoke the Four Moment Theorem in our analysis). The main novelty is that this exchange strategy is now used to establish large deviation estimates (i.e. exponentially small tail probabilities) rather than universality of the limiting distribution. Roughly speaking, the basic point is as follows. The Lindeberg exchange strategy seeks to compare a function {F(X_1,\ldots,X_n)} of many independent random variables {X_1,\ldots,X_n} with the same function {F(Y_1,\ldots,Y_n)} of a different set of random variables (which match moments with the original set of variables to some order, such as to second or fourth order) by exchanging the random variables one at a time. Typically, one tries to upper bound expressions such as

\displaystyle  {\bf E} \phi(F(X_1,\ldots,X_n)) - {\bf E} \phi(F(X_1,\ldots,X_{n-1},Y_n))

for various smooth test functions {\phi}, by performing a Taylor expansion in the variable being swapped and taking advantage of the matching moment hypotheses. In previous implementations of this strategy, {\phi} was a bounded test function, which allowed one to get control of the bulk of the distribution of {F(X_1,\ldots,X_n)}, and in particular in controlling probabilities such as

\displaystyle  {\bf P}( a \leq F(X_1,\ldots,X_n) \leq b )

for various thresholds {a} and {b}, but did not give good control on the tail as the error terms tended to be polynomially decaying in {n} rather than exponentially decaying. However, it turns out that one can modify the exchange strategy to deal with moments such as

\displaystyle  {\bf E} (1 + F(X_1,\ldots,X_n)^2)^k

for various moderately large {k} (e.g. of size comparable to {\log n}), obtaining results such as

\displaystyle  {\bf E} (1 + F(Y_1,\ldots,Y_n)^2)^k = (1+o(1)) {\bf E} (1 + F(X_1,\ldots,X_n)^2)^k

after performing all the relevant exchanges. As such, one can then use large deviation estimates on {F(X_1,\ldots,X_n)} to deduce large deviation estimates on {F(Y_1,\ldots,Y_n)}.
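
A toy version of this phenomenon can be seen already for the normalised sum {F(X_1,\ldots,X_n) = (X_1+\cdots+X_n)/\sqrt{n}}, comparing {{\bf E}(1+F^2)^k} under gaussian and Rademacher atoms (which match moments to third order). This is a Monte Carlo sketch with loose tolerances, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(8)
n, k, samples = 100, 3, 50_000

def moment(atoms):
    """Monte Carlo estimate of E (1 + F^2)^k for F = (X_1 + ... + X_n)/sqrt(n)."""
    F = atoms.sum(axis=1) / np.sqrt(n)
    return float(np.mean((1 + F**2) ** k))

m_gauss = moment(rng.standard_normal((samples, n)))
m_rademacher = moment(rng.choice([-1.0, 1.0], size=(samples, n)))
print(m_gauss, m_rademacher)  # both close to E(1+G^2)^3 = 28 for G ~ N(0,1)
```

The two estimates agree to within the Monte Carlo error plus an {O(1/n)} swapping error, exactly the kind of stability that the exchange strategy quantifies.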

In this paper we also take advantage of a simplification, first noted by Erdos, Yau, and Yin, that Four Moment Theorems become somewhat easier to prove if one works with resolvents {(M_n-z)^{-1}} (and the closely related Stieltjes transform {s(z) := \frac{1}{n} \hbox{tr}( (M_n-z)^{-1} )}) rather than with individual eigenvalues, as the Taylor expansions of resolvents are very simple (essentially being Neumann series). The relationship between the Stieltjes transform and the location of individual eigenvalues can be seen by taking advantage of the identity

\displaystyle  \frac{\pi}{2} - \frac{\pi}{n} N((-\infty,E)) = \int_0^\infty \hbox{Re} s(E + i \eta)\ d\eta

for any energy level {E \in {\bf R}}, which can be verified from elementary calculus. (In practice, we would truncate {\eta} near zero and near infinity to avoid some divergences, but this is a minor technicality.) As such, a concentration result for the Stieltjes transform can be used to establish an analogous concentration result for the eigenvalue counting functions {N((-\infty,E))}, which in turn can be used to deduce concentration results for individual eigenvalues {\lambda_i(M_n)} by some basic combinatorial manipulations.
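For instance, one can verify this identity numerically for a small fixed spectrum (the eigenvalues below are arbitrary stand-ins):

```python
import numpy as np
from scipy.integrate import quad

# A fixed spectrum standing in for the eigenvalues (arbitrary values)
lam = np.array([-1.7, -0.9, -0.2, 0.4, 1.1, 1.8])
n = len(lam)
E = -0.5  # energy level

def re_s(eta):
    # real part of the empirical Stieltjes transform at E + i*eta
    return np.mean((lam - E) / ((lam - E) ** 2 + eta ** 2))

integral = quad(re_s, 0, np.inf)[0]
N_below = int(np.sum(lam < E))       # N((-infinity, E))
lhs = np.pi / 2 - np.pi * N_below / n
print(lhs, integral)
```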

Van Vu and I have just uploaded to the arXiv our short survey article, “Random matrices: The Four Moment Theorem for Wigner ensembles“, submitted to the MSRI book series, as part of the proceedings on the MSRI semester program on random matrix theory from last year.  This is a highly condensed version (at 17 pages) of a much longer survey (currently at about 48 pages, though not completely finished) that we are currently working on, devoted to the recent advances in understanding the universality phenomenon for spectral statistics of Wigner matrices.  In this abridged version of the survey, we focus on a key tool in the subject, namely the Four Moment Theorem which roughly speaking asserts that the statistics of a Wigner matrix depend only on the first four moments of the entries.  We give a sketch of proof of this theorem, and two sample applications: a central limit theorem for individual eigenvalues of a Wigner matrix (extending a result of Gustavsson in the case of GUE), and the verification of a conjecture of Wigner, Dyson, and Mehta on the universality of the asymptotic k-point correlation functions even for discrete ensembles (provided that we interpret convergence in the vague topology sense).

For reasons of space, this paper is very far from an exhaustive survey even of the narrow topic of universality for Wigner matrices, but should hopefully be an accessible entry point into the subject nevertheless.

Van Vu and I have just uploaded to the arXiv our paper A central limit theorem for the determinant of a Wigner matrix, submitted to Adv. Math. It studies the asymptotic distribution of the determinant {\det M_n} of a random Wigner matrix (such as a matrix drawn from the Gaussian Unitary Ensemble (GUE) or Gaussian Orthogonal Ensemble (GOE)).

Before we get to these results, let us first discuss the simpler problem of studying the determinant {\det A_n} of a random iid matrix {A_n = (\zeta_{ij})_{1 \leq i,j \leq n}}, such as a real gaussian matrix (where all entries are independently and identically distributed using the standard real normal distribution {\zeta_{ij} \equiv N(0,1)_{\bf R}}), a complex gaussian matrix (where all entries are independently and identically distributed using the standard complex normal distribution {\zeta_{ij} \equiv N(0,1)_{\bf C}}, thus the real and imaginary parts are independent with law {N(0,1/2)_{\bf R}}), or the random sign matrix (in which all entries are independently and identically distributed according to the Bernoulli distribution {\zeta_{ij} \equiv \pm 1}, with a {1/2} chance of either sign). More generally, one can consider a matrix {A_n} in which all the entries {\zeta_{ij}} are independently and identically distributed with mean zero and variance {1}.

We can expand {\det A_n} using the Leibniz expansion

\displaystyle  \det A_n = \sum_{\sigma \in S_n} I_\sigma, \ \ \ \ \ (1)

where {\sigma: \{1,\ldots,n\} \rightarrow \{1,\ldots,n\}} ranges over the permutations of {\{1,\ldots,n\}}, and {I_\sigma} is the product

\displaystyle  I_\sigma := \hbox{sgn}(\sigma) \prod_{i=1}^n \zeta_{i\sigma(i)}.

From the iid nature of the {\zeta_{ij}}, we easily see that each {I_\sigma} has mean zero and variance one, and that the {I_\sigma} are pairwise uncorrelated as {\sigma} varies. We conclude that {\det A_n} has mean zero and variance {n!} (an observation first made by Turán). In particular, from Chebyshev’s inequality we see that {\det A_n} is typically of size {O(\sqrt{n!})}.
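For the random sign matrix, Turán's identity {\mathop{\bf E} |\det A_n|^2 = n!} can even be checked exhaustively for small {n}; here is a quick sketch in Python that averages {\det A_3^2} over all {2^9} sign patterns:

```python
import itertools
import numpy as np

# Exhaustive check of E|det A_n|^2 = n! for the 3x3 random sign matrix:
# average det^2 over all 2^9 sign patterns.
n = 3
total = 0.0
count = 0
for signs in itertools.product([-1.0, 1.0], repeat=n * n):
    A = np.array(signs).reshape(n, n)
    total += np.linalg.det(A) ** 2
    count += 1
second_moment = total / count
print(second_moment)  # n! = 6
```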

It turns out, though, that this is not quite best possible. This is easiest to explain in the real gaussian case, by performing a computation first made by Goodman. In this case, the distribution of {\det A_n} is clearly symmetric around the origin, so we can focus attention on the magnitude {|\det A_n|}. We can interpret this quantity geometrically as the volume of an {n}-dimensional parallelepiped whose generating vectors {X_1,\ldots,X_n} are independent real gaussian vectors in {{\bf R}^n} (i.e. their coefficients are iid with law {N(0,1)_{\bf R}}). Using the classical base-times-height formula, we thus have

\displaystyle  |\det A_n| = \prod_{i=1}^n \hbox{dist}(X_i, V_i) \ \ \ \ \ (2)

where {V_i} is the {i-1}-dimensional linear subspace of {{\bf R}^n} spanned by {X_1,\ldots,X_{i-1}} (note that {X_1,\ldots,X_n}, having an absolutely continuous joint distribution, are almost surely linearly independent). Taking logarithms, we conclude

\displaystyle  \log |\det A_n| = \sum_{i=1}^n \log \hbox{dist}(X_i, V_i).
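The base-times-height identity (2) is easy to confirm numerically for any fixed matrix, by running Gram-Schmidt on the generating vectors (here taken to be the columns of an arbitrary sample):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
A = rng.standard_normal((n, n))

# dist(X_i, V_i) via Gram-Schmidt: the norm of the component of the i-th
# column orthogonal to the span of the previous columns.
dists = []
basis = np.zeros((n, 0))
for i in range(n):
    x = A[:, i]
    resid = x - basis @ (basis.T @ x)
    d = np.linalg.norm(resid)
    dists.append(d)
    basis = np.column_stack([basis, resid / d])

lhs = abs(np.linalg.det(A))
rhs = float(np.prod(dists))
print(lhs, rhs)
```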

Now, we take advantage of a fundamental symmetry property of the Gaussian vector distribution, namely its invariance with respect to the orthogonal group {O(n)}. Because of this, we see that if we fix {X_1,\ldots,X_{i-1}} (and thus {V_i}), the random variable {\hbox{dist}(X_i,V_i)} has the same distribution as {\hbox{dist}(X_i,{\bf R}^{i-1})}, or equivalently the {\chi} distribution

\displaystyle  \chi_{n-i+1} := (\sum_{j=1}^{n-i+1} \xi_{n-i+1,j}^2)^{1/2}

where {\xi_{n-i+1,1},\ldots,\xi_{n-i+1,n-i+1}} are iid copies of {N(0,1)_{\bf R}}. As this distribution does not depend on the {X_1,\ldots,X_{i-1}}, we conclude that the law of {\log |\det A_n|} is given by the sum of {n} independent {\chi}-variables:

\displaystyle  \log |\det A_n| \equiv \sum_{j=1}^{n} \log \chi_j.
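This distributional identity can be probed by Monte Carlo simulation (a sketch only; the dimension and sample sizes below are arbitrary): the empirical mean of {\log|\det A_n|} over gaussian samples should agree, up to sampling error, with that of an independent sum of {\log \chi_j} variables:

```python
import numpy as np

rng = np.random.default_rng(2)
n, N = 20, 4000

# Empirical mean of log|det A_n| over gaussian samples...
logdets = np.array([np.linalg.slogdet(rng.standard_normal((n, n)))[1]
                    for _ in range(N)])

# ...versus the empirical mean of an independent sum of log chi_j variables
# (chi_j is the square root of a chi-squared variable with j degrees of freedom)
chi_sums = np.array([0.5 * np.log(rng.chisquare(df=np.arange(1, n + 1))).sum()
                     for _ in range(N)])

print(logdets.mean(), chi_sums.mean())
```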

A standard computation shows that each {\chi_j^2} has mean {j} and variance {2j}, and then a Taylor series (or Ito calculus) computation (using concentration of measure tools to control tails) shows that {\log \chi_j} has mean {\frac{1}{2} \log j - \frac{1}{2j} + O(1/j^{3/2})} and variance {\frac{1}{2j}+O(1/j^{3/2})}. As such, {\log |\det A_n|} has mean {\frac{1}{2} \log n! - \frac{1}{2} \log n + O(1)} and variance {\frac{1}{2} \log n + O(1)}. Applying a suitable version of the central limit theorem, one obtains the asymptotic law

\displaystyle  \frac{\log |\det A_n| - \frac{1}{2} \log n! + \frac{1}{2} \log n}{\sqrt{\frac{1}{2}\log n}} \rightarrow N(0,1)_{\bf R}, \ \ \ \ \ (3)

where {\rightarrow} denotes convergence in distribution. A bit more informally, we have

\displaystyle  |\det A_n| \approx n^{-1/2} \sqrt{n!} \exp( N( 0, \log n / 2 )_{\bf R} ) \ \ \ \ \ (4)

when {A_n} is a real gaussian matrix; thus, for instance, the median value of {|\det A_n|} is {n^{-1/2+o(1)} \sqrt{n!}}. At first glance, this appears to conflict with the second moment bound {\mathop{\bf E} |\det A_n|^2 = n!} of Turán mentioned earlier, but once one recalls that {\exp(N(0,t)_{\bf R})} has a second moment of {\exp(2t)}, we see that the two facts are in fact perfectly consistent; the upper tail of the normal distribution in the exponent in (4) ends up dominating the second moment.
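To see this consistency concretely (a quick numerical sketch; the parameter values are arbitrary), one can check that the second moment of {\exp(N(0,t)_{\bf R})} is indeed {\exp(2t)}, and that substituting {t = \frac{1}{2} \log n} into (4) recovers Turán's second moment {n!}:

```python
import math
import numpy as np

rng = np.random.default_rng(3)

# Monte Carlo check: the second moment of exp(N(0,t)) is exp(2t)
t = 0.25
samples = np.exp(rng.normal(0.0, math.sqrt(t), size=400_000))
emp = float(np.mean(samples ** 2))
print(emp, math.exp(2 * t))

# Plugging t = (log n)/2 into (4): (n^{-1/2} sqrt(n!))^2 * exp(2t) = n!
n = 8
second_moment_from_4 = (n ** -0.5 * math.sqrt(math.factorial(n))) ** 2 \
    * math.exp(2 * (math.log(n) / 2))
print(second_moment_from_4, math.factorial(n))
```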

It turns out that the central limit theorem (3) is valid for any real iid matrix with mean zero, variance one, and an exponential decay condition on the entries; this was first claimed by Girko, though the arguments in that paper appear to be incomplete. Another proof of this result, with more quantitative bounds on the convergence rate, has recently been obtained by Hoi Nguyen and Van Vu. The basic idea in these arguments is to express the sum of logarithms arising from (2) in terms of a martingale and apply the martingale central limit theorem.

If one works with complex gaussian random matrices instead of real gaussian random matrices, the above computations change slightly (one has to replace the real {\chi} distribution with the complex {\chi} distribution, in which the {\xi_{i,j}} are distributed according to the complex gaussian {N(0,1)_{\bf C}} instead of the real one). At the end of the day, one ends up with the law

\displaystyle  \frac{\log |\det A_n| - \frac{1}{2} \log n! + \frac{1}{4} \log n}{\sqrt{\frac{1}{4}\log n}} \rightarrow N(0,1)_{\bf R}, \ \ \ \ \ (5)

or more informally

\displaystyle  |\det A_n| \approx n^{-1/4} \sqrt{n!} \exp( N( 0, \log n / 4 )_{\bf R} ) \ \ \ \ \ (6)

(but note that this new asymptotic is still consistent with Turán’s second moment calculation).

We can now turn to the results of our paper. Here, we replace the iid matrices {A_n} by Wigner matrices {M_n = (\zeta_{ij})_{1 \leq i,j \leq n}}, which are defined similarly but are constrained to be Hermitian (or real symmetric), thus {\zeta_{ij} = \overline{\zeta_{ji}}} for all {i,j}. Model examples here include the Gaussian Unitary Ensemble (GUE), in which {\zeta_{ij} \equiv N(0,1)_{\bf C}} for {1 \leq i < j \leq n} and {\zeta_{ij} \equiv N(0,1)_{\bf R}} for {1 \leq i=j \leq n}, the Gaussian Orthogonal Ensemble (GOE), in which {\zeta_{ij} \equiv N(0,1)_{\bf R}} for {1 \leq i < j \leq n} and {\zeta_{ij} \equiv N(0,2)_{\bf R}} for {1 \leq i=j \leq n}, and the symmetric Bernoulli ensemble, in which {\zeta_{ij} \equiv \pm 1} for {1 \leq i \leq j \leq n} (with probability {1/2} of either sign). In all cases, the upper triangular entries of the matrix are assumed to be jointly independent. For a more precise definition of the Wigner matrix ensembles we are considering, see the introduction to our paper.

The determinants {\det M_n} of these matrices still have a Leibniz expansion. However, in the Wigner case, the mean and variance of the {I_\sigma} are slightly different, and what is worse, they are not all pairwise uncorrelated any more. For instance, the mean of {I_\sigma} is still usually zero, but equals {(-1)^{n/2}} in the exceptional case when {\sigma} is a perfect matching (i.e. the union of exactly {n/2} {2}-cycles, a possibility that can of course only happen when {n} is even). As such, the mean {\mathop{\bf E} \det M_n} still vanishes when {n} is odd, but for even {n} it is equal to

\displaystyle  (-1)^{n/2} \frac{n!}{(n/2)!2^{n/2}}

(the fraction here simply being the number of perfect matchings on {n} vertices). Using Stirling’s formula, one then computes that {|\mathop{\bf E} \det M_n|} is comparable to {n^{-1/4} \sqrt{n!}} when {n} is large and even. The second moment calculation is more complicated (and uses facts about the distribution of cycles in random permutations, mentioned in this previous post), but one can compute that {\mathop{\bf E} |\det M_n|^2} is comparable to {n^{1/2} n!} for GUE and {n^{3/2} n!} for GOE. (The discrepancy here comes from the fact that in the GOE case, {I_\sigma} and {I_\rho} can correlate when {\rho} contains reversals of {k}-cycles of {\sigma} for {k \geq 3}, but this does not happen in the GUE case.) For GUE, much more precise asymptotics for the moments of the determinant are known, starting from the work of Brezin and Hikami, though we do not need these more sophisticated computations here.
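The count of perfect matchings appearing here can be confirmed by brute force for small even {n}, by enumerating the fixed-point-free involutions in {S_n}:

```python
import itertools
import math

def is_perfect_matching(p):
    # a fixed-point-free involution: union of 2-cycles
    return all(p[p[i]] == i and p[i] != i for i in range(len(p)))

results = {}
for n in (2, 4, 6):
    count = sum(1 for p in itertools.permutations(range(n))
                if is_perfect_matching(p))
    formula = math.factorial(n) // (math.factorial(n // 2) * 2 ** (n // 2))
    results[n] = (count, formula)
print(results)  # {2: (1, 1), 4: (3, 3), 6: (15, 15)}
```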

Our main results are then as follows.

Theorem 1 Let {M_n} be a Wigner matrix.

  • If {M_n} is drawn from GUE, then

    \displaystyle  \frac{\log |\det M_n| - \frac{1}{2} \log n! + \frac{1}{4} \log n}{\sqrt{\frac{1}{2}\log n}} \rightarrow N(0,1)_{\bf R}.

  • If {M_n} is drawn from GOE, then

    \displaystyle  \frac{\log |\det M_n| - \frac{1}{2} \log n! + \frac{1}{4} \log n}{\sqrt{\log n}} \rightarrow N(0,1)_{\bf R}.

  • The previous two results also hold for more general Wigner matrices, assuming that the real and imaginary parts are independent, a finite moment condition is satisfied, and the entries match moments with those of GOE or GUE to fourth order. (See the paper for a more precise formulation of the result.)

Thus, we informally have

\displaystyle  |\det M_n| \approx n^{-1/4} \sqrt{n!} \exp( N( 0, \log n / 2 )_{\bf R} )

when {M_n} is drawn from GUE, or from another Wigner ensemble matching GUE to fourth order (and obeying some additional minor technical hypotheses); and

\displaystyle  |\det M_n| \approx n^{-1/4} \sqrt{n!} \exp( N( 0, \log n )_{\bf R} )

when {M_n} is drawn from GOE, or from another Wigner ensemble matching GOE to fourth order. Again, these asymptotic limiting distributions are consistent with the asymptotic behaviour for the second moments.

The extension from the GUE or GOE case to more general Wigner ensembles is a fairly routine application of the four moment theorem for Wigner matrices, although for various technical reasons we do not quite use the existing four moment theorems in the literature, but adapt them to the log determinant. The main idea is to express the log-determinant as an integral

\displaystyle  \log|\det M_n| = \frac{1}{2} n \log n - n \hbox{Im} \int_0^\infty s(\sqrt{-1}\eta)\ d\eta \ \ \ \ \ (7)

of the Stieltjes transform

\displaystyle  s(z) := \frac{1}{n} \hbox{tr}( \frac{1}{\sqrt{n}} M_n - z )^{-1}

of {M_n}. Strictly speaking, the integral in (7) is divergent at infinity (and also can be ill-behaved near zero), but this can be addressed by standard truncation and renormalisation arguments (combined with known facts about the least singular value of Wigner matrices), which we omit here. We then use a variant of the four moment theorem for the Stieltjes transform, as used by Erdos, Yau, and Yin (based on a previous four moment theorem for individual eigenvalues introduced by Van Vu and myself). The four moment theorem is proven by the now-standard Lindeberg exchange method, combined with the usual resolvent identities to control the behaviour of the resolvent (and hence the Stieltjes transform) with respect to modifying one or two entries, together with the delocalisation of eigenvector property (which in turn arises from local semicircle laws) to control the error terms.

Somewhat surprisingly (to us, at least), it turned out that it was the first part of the theorem (namely, the verification of the limiting law for the invariant ensembles GUE and GOE) that was more difficult than the extension to the Wigner case. Even in an ensemble as highly symmetric as GUE, the rows are no longer independent, and the formula (2) is basically useless for getting any non-trivial control on the log determinant. There is an explicit formula for the joint distribution of the eigenvalues of GUE (or GOE), which does eventually give the cumulants of the distribution of the log determinant, which then gives the required central limit theorem; but this is a lengthy computation, first performed by Delannay and Le Caer.

Following a suggestion of my colleague, Rowan Killip, we give an alternate proof of this central limit theorem in the GUE and GOE cases, by using a beautiful observation of Trotter, namely that the GUE or GOE ensemble can be conjugated into a tractable tridiagonal form. Let me state it just for GUE:

Proposition 2 (Tridiagonal form of GUE; due to Trotter) Let {M'_n} be the random tridiagonal real symmetric matrix

\displaystyle  M'_n = \begin{pmatrix} a_1 & b_1 & 0 & \ldots & 0 & 0 \\ b_1 & a_2 & b_2 & \ldots & 0 & 0 \\ 0 & b_2 & a_3 & \ldots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \ldots & a_{n-1} & b_{n-1} \\ 0 & 0 & 0 & \ldots & b_{n-1} & a_n \end{pmatrix}

where the {a_1,\ldots,a_n, b_1,\ldots,b_{n-1}} are jointly independent real random variables, with {a_1,\ldots,a_n \equiv N(0,1)_{\bf R}} being standard real Gaussians, and each {b_i} having a {\chi}-distribution:

\displaystyle  b_i = (\sum_{j=1}^i |z_{i,j}|^2)^{1/2}

where {z_{i,j} \equiv N(0,1)_{\bf C}} are iid complex gaussians. Let {M_n} be drawn from GUE. Then the joint eigenvalue distribution of {M_n} is identical to the joint eigenvalue distribution of {M'_n}.

Proof: Let {M_n} be drawn from GUE. We can write

\displaystyle  M_n = \begin{pmatrix} M_{n-1} & X_n \\ X_n^* & a_n \end{pmatrix}

where {M_{n-1}} is drawn from the {(n-1) \times (n-1)} GUE, {a_n \equiv N(0,1)_{\bf R}}, and {X_n \in {\bf C}^{n-1}} is a random gaussian vector with all entries iid with distribution {N(0,1)_{\bf C}}. Furthermore, {M_{n-1}, X_n, a_n} are jointly independent.

We now apply the tridiagonal matrix algorithm. Let {b_{n-1} := |X_n|}; then {b_{n-1}} has the {\chi}-distribution indicated in the proposition. We then conjugate {M_n} by a unitary matrix {U} that preserves the final basis vector {e_n}, and maps {X_n} to {b_{n-1} e_{n-1}}. Then we have

\displaystyle  U M_n U^* = \begin{pmatrix} \tilde M_{n-1} & b_{n-1} e_{n-1} \\ b_{n-1} e_{n-1}^* & a_n \end{pmatrix}

where {\tilde M_{n-1}} is conjugate to {M_{n-1}}. Now we make the crucial observation: because {M_{n-1}} is distributed according to GUE (which is a unitarily invariant ensemble), and {U} is a unitary matrix independent of {M_{n-1}}, {\tilde M_{n-1}} is also distributed according to GUE, and remains independent of both {b_{n-1}} and {a_n}.

We continue this process, expanding {U M_n U^*} as

\displaystyle \begin{pmatrix} M_{n-2} & X_{n-1} & 0 \\ X_{n-1}^* & a_{n-1} & b_{n-1} \\ 0 & b_{n-1} & a_n \end{pmatrix}

Applying a further unitary conjugation that fixes {e_{n-1}, e_n} but maps {X_{n-1}} to {b_{n-2} e_{n-2}}, we may replace {X_{n-1}} by {b_{n-2} e_{n-2}} while transforming {M_{n-2}} to another GUE matrix {\tilde M_{n-2}} independent of {a_n, b_{n-1}, a_{n-1}, b_{n-2}}. Iterating this process, we eventually obtain a coupling of {M_n} to {M'_n} by unitary conjugations, and the claim follows. \Box

The determinant of a tridiagonal matrix is not quite as simple as the determinant of a triangular matrix (in which it is simply the product of the diagonal entries), but it is pretty close: the determinant {D_n} of the above matrix is given by solving the recursion

\displaystyle  D_i = a_i D_{i-1} - b_{i-1}^2 D_{i-2}

with {D_0=1} and {D_{-1} = 0}. Thus, instead of the product of a sequence of independent scalar {\chi} distributions as in the gaussian matrix case, the determinant of GUE ends up being controlled by the product of a sequence of independent {2\times 2} matrices whose entries are given by gaussians and {\chi} distributions. In this case, one cannot immediately take logarithms and hope to get something for which the martingale central limit theorem can be applied, but some ad hoc manipulation of these {2 \times 2} matrix products eventually does make this strategy work. (Roughly speaking, one has to work with the logarithm of the Frobenius norm of the matrix first.)
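The recursion is easy to check numerically against a direct determinant computation, for a single sample of the tridiagonal model (a sketch; the dimension is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 10

# One sample of the tridiagonal model of Proposition 2: gaussian diagonal,
# chi-distributed off-diagonal built from complex gaussians (|z|^2 = (x^2+y^2)/2)
a = rng.standard_normal(n)
b = np.array([np.sqrt(0.5 * np.sum(rng.standard_normal(i) ** 2
                                   + rng.standard_normal(i) ** 2))
              for i in range(1, n)])
M = np.diag(a) + np.diag(b, 1) + np.diag(b, -1)

# Three-term recursion D_i = a_i D_{i-1} - b_{i-1}^2 D_{i-2},
# with D_0 = 1 and D_{-1} = 0
D_prev, D_cur = 0.0, 1.0
for i in range(n):
    D_prev, D_cur = D_cur, a[i] * D_cur - (b[i - 1] ** 2 * D_prev if i > 0 else 0.0)

det_M = np.linalg.det(M)
print(D_cur, det_M)
```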

Van Vu and I have just uploaded to the arXiv our paper “Random matrices: Universality of eigenvectors“, submitted to Random Matrices: Theory and Applications. This paper concerns an extension of our four moment theorem for eigenvalues. Roughly speaking, that four moment theorem asserts (under mild decay conditions on the coefficients of the random matrix) that the fine-scale structure of individual eigenvalues of a Wigner random matrix depend only on the first four moments of each of the entries.

In this paper, we extend this result from eigenvalues to eigenvectors, and specifically to the coefficients of, say, the {i^{th}} eigenvector {u_i(M_n)} of a Wigner random matrix {M_n}. Roughly speaking, the main result is that the distribution of these coefficients also only depends on the first four moments of each of the entries. In particular, as the coefficients of eigenvectors of invariant ensembles such as GOE or GUE are known to be asymptotically gaussian real (in the GOE case) or gaussian complex (in the GUE case), the same asymptotics automatically hold for Wigner matrices whose coefficients match GOE or GUE to fourth order.

(A technical point here: strictly speaking, the eigenvectors {u_i(M_n)} are only determined up to a phase, even when the eigenvalues are simple. So, to phrase the question properly, one has to perform some sort of normalisation, for instance by working with the coefficients of the spectral projection operators {P_i(M_n) := u_i(M_n) u_i(M_n)^*} instead of the eigenvectors, or rotating each eigenvector by a random phase, or by fixing the first component of each eigenvector to be positive real. This is a fairly minor technical issue here, though, and will not be discussed further.)

This theorem strengthens a four moment theorem for eigenvectors recently established by Knowles and Yin (by a somewhat different method), in that the hypotheses are weaker (no level repulsion assumption is required, and the matrix entries only need to obey a finite moment condition rather than an exponential decay condition), and the conclusion is slightly stronger (less regularity is needed on the test function, and one can handle the joint distribution of polynomially many coefficients, rather than boundedly many coefficients). On the other hand, the Knowles-Yin paper can also handle generalised Wigner ensembles in which the variances of the entries are allowed to fluctuate somewhat.

The method used here is a variation of that in our original paper (incorporating the subsequent improvements to extend the four moment theorem from the bulk to the edge, and to replace exponential decay by a finite moment condition). That method was ultimately based on the observation that if one swapped a single entry (and its adjoint) in a Wigner random matrix, then an individual eigenvalue {\lambda_i(M_n)} would not fluctuate much as a consequence (as long as one had already truncated away the event of an unexpectedly small eigenvalue gap). The same analysis shows that the projection matrices {P_i(M_n)} obey the same stability property.

As an application of the eigenvalue four moment theorem, we establish a four moment theorem for the coefficients of resolvent matrices {(\frac{1}{\sqrt{n}} M_n - zI)^{-1}}, even when {z} is on the real axis (though in that case we need to make a level repulsion hypothesis, which has been already verified in many important special cases and is likely to be true in general). This improves on an earlier four moment theorem for resolvents of Erdos, Yau, and Yin, which required {z} to stay some distance away from the real axis (specifically, that {\hbox{Im}(z) \geq n^{-1-c}} for some small {c>0}).

Van Vu and I have just uploaded to the arXiv our paper “The Wigner-Dyson-Mehta bulk universality conjecture for Wigner matrices“, submitted to the Proceedings of the National Academy of Sciences. This short note concerns the convergence of the {k}-point correlation functions of Wigner matrices in the bulk to the Dyson {k}-point functions, a statement conjectured by Wigner, Dyson, and Mehta. Thanks to the results of Erdös, Peche, Ramirez, Schlein, Vu, Yau, and myself, this conjecture has now been established for all Wigner matrices (assuming a finite moment condition on the entries), but only if one uses a quite weak notion of convergence, namely averaged vague convergence in which one averages in the energy parameter {u}. The main purpose of this note is to observe that by combining together existing results in the literature, one can improve the convergence to vague convergence (which is the natural notion of convergence in the discrete setting); and furthermore, if one assumes some regularity and decay conditions on the coefficient distribution, one can improve the convergence further to local {L^1} convergence.

More precisely, let {M_n} be an {n \times n} Wigner matrix – a random Hermitian matrix whose off-diagonal elements {\frac{1}{\sqrt{n}} \zeta_{ij}} for {1 \leq i < j \leq n} are iid with mean zero and variance {1/n} (and whose diagonal elements also obey similar hypotheses, which we omit here). For simplicity, we also assume that the real and imaginary parts of {\zeta_{ij}} are iid (as is the case for instance for the Gaussian Unitary Ensemble (GUE)). The eigenvalues {\lambda_1(M_n) \leq \ldots \leq \lambda_n(M_n)} of such a matrix are known to be asymptotically distributed according to the Wigner semicircular distribution {\rho_{sc}(u)\ du}, where

\displaystyle  \rho_{sc}(u) := \frac{1}{2\pi} (4-u^2)_+^{1/2}.
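As a quick sanity check (purely numerical, and not needed for anything below), this density integrates to one and has second moment one, consistent with the normalisation of the eigenvalues:

```python
import numpy as np
from scipy.integrate import quad

def rho_sc(u):
    # semicircular density, supported on [-2, 2]
    return np.sqrt(max(4.0 - u * u, 0.0)) / (2.0 * np.pi)

mass = quad(rho_sc, -2, 2)[0]
second_moment = quad(lambda u: u * u * rho_sc(u), -2, 2)[0]
print(mass, second_moment)  # 1 and 1
```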

In particular, this suggests that at any energy level {u} in the bulk {(-2,2)} of the spectrum, the average eigenvalue spacing should be about {\frac{1}{n \rho_{sc}(u)}}. It is then natural to introduce the normalised {k}-point correlation function

\displaystyle  \rho_{n,u}^{(k)}(t_1,\ldots,t_k) := \lim_{\epsilon \rightarrow 0} \frac{1}{\epsilon^k} {\bf P} E_\epsilon

for any distinct reals {t_1,\ldots,t_k} and {k \geq 1}, where {E_\epsilon} is the event that there is an eigenvalue in each of the intervals {[u + \frac{t_i}{n \rho_{sc}(u)}, u + \frac{t_i+\epsilon}{n \rho_{sc}(u)}]} for each {1 \leq i \leq k}. (This definition is valid when the Wigner ensemble is continuous; for discrete ensembles, one can define {\rho_{n,u}^{(k)}} instead in a distributional sense.)

The Wigner-Dyson-Mehta conjecture asserts that {\rho_{n,u}^{(k)}} converges (in various senses) as {n \rightarrow \infty} to the Dyson {k}-point function

\displaystyle  \rho_{Dyson}^{(k)}(t_1,\ldots,t_k) := \hbox{det}( K( t_i,t_j) )_{1 \leq i,j \leq k}

where {K(t,t'):=\frac{\sin \pi(t-t')}{\pi(t-t')}} is the Dyson sine kernel. This conjecture was verified first for the GUE (with a quite strong notion of convergence, namely local uniform convergence) by Dyson, using an explicit formula for {\rho_{n,u}^{(k)}} in the GUE case due to Gaudin and Mehta. Later results of Johansson, Erdos-Ramirez-Schlein-Yau, Erdos-Peche-Ramirez-Schlein-Yau, and Vu and myself extended these results to increasingly wide ranges of Wigner matrices, but in the context of either weak convergence (which means that

\displaystyle  \int_{{\bf R}^k} \rho_{n,u}^{(k)}(t) F(t)\ dt \rightarrow \int_{{\bf R}^k} \rho_{Dyson}^{(k)}(t) F(t)\ dt \ \ \ \ \ (1)

for any {L^\infty}, compactly supported function {F}), or the slightly weaker notion of vague convergence (which is the same as weak convergence, except that the function {F} is also required to be continuous).
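For {k=2}, the Dyson determinant is simply {1 - K(t_1,t_2)^2}, which already exhibits the repulsion between nearby eigenvalues; a small numerical sketch:

```python
import numpy as np

def K(t, s):
    # Dyson sine kernel, with K(t, t) = 1 by continuity
    d = t - s
    return 1.0 if d == 0 else float(np.sin(np.pi * d) / (np.pi * d))

def rho2(t1, t2):
    # k = 2 correlation: det of the 2x2 kernel matrix, i.e. 1 - K(t1, t2)^2
    return K(t1, t1) * K(t2, t2) - K(t1, t2) * K(t2, t1)

print(rho2(0.0, 0.001))   # nearly coincident points: close to 0 (repulsion)
print(rho2(0.0, 50.25))   # well separated points: close to 1 (independence)
```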

In a joint paper of Erdos, Ramirez, Schlein, Vu, Yau, and myself, we established the Wigner-Dyson-Mehta conjecture for all Wigner matrices (assuming only an exponential decay condition on the entries), but using a quite weak notion of convergence, namely averaged vague convergence, which allows for averaging in the energy parameter. Specifically, we showed that

\displaystyle  \lim_{b \rightarrow 0} \lim_{n \rightarrow \infty} \frac{1}{2b} \int_{u-b}^{u+b} \int_{{\bf R}^k} \rho_{n,u'}^{(k)}(t) F(t)\ dt\ du' = \int_{{\bf R}^k} \rho_{Dyson}^{(k)}(t) F(t)\ dt.

Subsequently, Erdos, Schlein, and Yau introduced the powerful local relaxation flow method, which achieved a simpler proof of the same result which also generalised to other ensembles beyond the Wigner case. However, for technical reasons, this method was restricted to establishing averaged vague convergence only.

In the current paper, we show that by combining the argument of Erdos, Ramirez, Schlein, Vu, Yau, and myself with some more recent technical results, namely the relaxation of the exponential decay condition in the four moment theorem to a finite moment condition (established by Vu and myself) and a strong eigenvalue localisation bound of Erdos, Yau, and Yin, one can upgrade the averaged vague convergence to vague convergence, and handle all Wigner matrices obeying a finite moment condition. Vague convergence is the most natural notion of convergence for discrete random matrix ensembles; for such ensembles, the correlation function is a discrete measure, and so one does not expect convergence to a continuous limit in any stronger sense than the vague sense. Also, by carefully inspecting the earlier argument of Erdos, Peche, Ramirez, Schlein, and Yau, we were able to establish convergence in the stronger local {L^1} sense once one assumed some regularity and positivity condition on the underlying coefficient distribution. These are somewhat modest and technical improvements over previous work on the Wigner-Dyson-Mehta conjecture, but they help to clarify and organise the profusion of results in this area, which are now reaching a fairly definitive form.

It may well be possible to go beyond local {L^1} convergence in the case of smooth ensembles, for instance establishing local uniform convergence; this was recently accomplished in the {k=1} case by Maltsev and Schlein. Indeed one may optimistically expect to even have convergence in the local smooth topology, which would basically be the strongest convergence one could hope for.

Van Vu and I have just uploaded to the arXiv our paper “Random matrices: Localization of the eigenvalues and the necessity of four moments“, submitted to Probability Theory and Related Fields. This paper concerns the distribution of the eigenvalues

\displaystyle  \lambda_1(M_n) \leq \ldots \leq \lambda_n(M_n)

of a Wigner random matrix {M_n}. More specifically, we consider {n \times n} Hermitian random matrices whose entries have mean zero and variance one, with the upper-triangular portion of the matrix independent, with the diagonal elements iid, and the real and imaginary parts of the strictly upper-triangular portion of the matrix iid. For technical reasons we also assume that the distribution of the coefficients decays exponentially or better. Examples of Wigner matrices include the Gaussian Unitary Ensemble (GUE) and random symmetric complex Bernoulli matrices (which equal {\pm 1} on the diagonal, and {\pm \frac{1}{\sqrt{2}} \pm \frac{i}{\sqrt{2}}} off the diagonal). The Gaussian Orthogonal Ensemble (GOE) is also an example once one makes the minor change of setting the diagonal entries to have variance two instead of one.

The most fundamental theorem about the distribution of these eigenvalues is the Wigner semi-circular law, which asserts that (almost surely) one has

\displaystyle  \frac{1}{n} \sum_{i=1}^n \delta_{\lambda_i(M_n)/\sqrt{n}} \rightarrow \rho_{sc}(x)\ dx

(in the vague topology) where {\rho_{sc}(x) := \frac{1}{2\pi} (4-x^2)_+^{1/2}} is the semicircular distribution. (See these lecture notes on this blog for more discussion of this law.)

One can phrase this law in a number of equivalent ways. For instance, in the bulk region {\epsilon n \leq i \leq (1-\epsilon) n}, one almost surely has

\displaystyle  \lambda_i(M_n) = \gamma_i \sqrt{n} + o(\sqrt{n}) \ \ \ \ \ (1)

uniformly in {i}, where the classical location {\gamma_i \in [-2,2]} of the (normalised) {i^{th}} eigenvalue {\frac{1}{\sqrt{n}} \lambda_i} is defined by the formula

\displaystyle  \int_{-2}^{\gamma_i} \rho_{sc}(x)\ dx := \frac{i}{n}.
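The classical locations are straightforward to compute numerically by inverting the semicircular cumulative distribution function (a sketch; the root-finding method and the choice of {n} are incidental):

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

def rho_sc(x):
    # semicircular density on [-2, 2]
    return np.sqrt(max(4.0 - x * x, 0.0)) / (2.0 * np.pi)

def classical_location(i, n):
    # solve the defining equation int_{-2}^{gamma} rho_sc = i/n
    return brentq(lambda g: quad(rho_sc, -2, g)[0] - i / n, -2.0, 2.0)

n = 100
gammas = [classical_location(i, n) for i in (25, 50, 75)]
print(gammas)
```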

The bound (1) also holds in the edge case (by using the operator norm bound {\| M_n\|_{op} = (2+o(1)) \sqrt{n}}, due to Bai and Yin), but for the sake of exposition we shall restrict attention here to the bulk case.

From (1) we see that the semicircular law controls the eigenvalues at the coarse scale of {\sqrt{n}}. There has been a significant amount of work in the literature in obtaining control at finer scales, and in particular at the scale of the average eigenvalue spacing, which is of the order of {\sqrt{n}/n = n^{-1/2}}. For instance, we now have a universal limit theorem for the normalised eigenvalue spacing {\sqrt{n}(\lambda_{i+1}(M_n) - \lambda_i(M_n))} in the bulk for all Wigner matrices, a result of Erdos, Ramirez, Schlein, Vu, Yau, and myself. One tool for this is the four moment theorem of Van and myself, which roughly speaking shows that the behaviour of the eigenvalues at the scale {n^{-1/2}} (and even at the slightly finer scale of {n^{-1/2-c}} for some absolute constant {c>0}) depends only on the first four moments of the matrix entries. There is also a slight variant, the three moment theorem, which asserts that the behaviour of the eigenvalues at the slightly coarser scale of {n^{-1/2+\epsilon}} depends only on the first three moments of the matrix entries.

It is natural to ask whether these moment conditions are necessary. From the result of Erdos, Ramirez, Schlein, Vu, Yau, and myself, it is known that to control the eigenvalue spacing {\lambda_{i+1}(M_n) - \lambda_i(M_n)} at the critical scale {n^{-1/2}}, no knowledge of any moments beyond the second (i.e. beyond the mean and variance) is needed. So it is natural to conjecture that the same is true for the eigenvalues themselves.

The main result of this paper is to show that this is not the case; that at the critical scale {n^{-1/2}}, the distribution of eigenvalues {\lambda_i(M_n)} is sensitive to the fourth moment, and so the hypothesis of the four moment theorem cannot be relaxed.

Heuristically, the reason for this is easy to explain. One begins with an inspection of the expected fourth moment

\displaystyle  \sum_{i=1}^n {\bf E}(\lambda_i(M_n)^4) = {\bf E} \hbox{tr} M_n^4.

A standard moment method computation shows that the right hand side is equal to

\displaystyle  2n^3 + 2a n^2 + \ldots

where {a} is the fourth moment of the real part of the off-diagonal coefficients of {M_n}. In particular, a change in the fourth moment {a} by {O(1)} leads to a change in the expression {\sum_{i=1}^n {\bf E}(\lambda_i(M_n)^4)} by {O(n^2)}. Thus, for a typical {i}, one expects {{\bf E}(\lambda_i(M_n)^4)} to shift by {O(n)}; since {\lambda_i(M_n) = O(\sqrt{n})} on the average, we thus expect {\lambda_i(M_n)} itself to shift by about {O(n^{-1/2})} by the mean-value theorem.
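To see where the {O(n^2)}-sized sensitivity comes from, one can carry out the moment computation exactly by brute force. The following sketch (not from the paper) does this in the simpler real symmetric model with unit-variance, mean-zero entries with vanishing third moment; there the off-diagonal fourth moment enters with coefficient {n(n-1)}, the analogue of the {2an^2} term above.

```python
from collections import Counter
from itertools import product

def expected_trace_m4(n, a_off, a_diag):
    """Exact E[tr M^4] for a real symmetric Wigner matrix whose entries
    have mean 0, variance 1, vanishing third moment, and fourth moments
    a_off (off-diagonal) / a_diag (diagonal), by enumerating the closed
    4-step index walks of the moment method."""
    total = 0.0
    for i, j, k, l in product(range(n), repeat=4):
        walk = [(i, j), (j, k), (k, l), (l, i)]
        edges = Counter(tuple(sorted(e)) for e in walk)
        term = 1.0
        for (u, v), count in edges.items():
            if count % 2:      # odd power of a mean-zero entry vanishes
                term = 0.0
                break
            if count == 4:     # fourth moment of a single entry
                term *= a_off if u != v else a_diag
            # count == 2 contributes the variance, which is 1
        total += term
    return total
```

In particular, shifting the off-diagonal fourth moment by {O(1)} shifts the total by exactly {n(n-1)} times that shift, i.e. by {O(n^2)}, matching the heuristic; the leading term is the Catalan-number contribution {2n^3}.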

To make this rigorous, one needs a sufficiently strong concentration of measure result for {\lambda_i(M_n)} that keeps it close to its mean value. There are already a number of such results in the literature. For instance, Guionnet and Zeitouni showed that {\lambda_i(M_n)} was sharply concentrated around an interval of size {O(n^\epsilon)} around {\sqrt{n} \gamma_i} for any {\epsilon > 0} (in the sense that the probability that one was outside this interval was exponentially small). In one of my papers with Van, we showed that {\lambda_i(M_n)} was also weakly concentrated around an interval of size {O(n^{-1/2+\epsilon})} around {\sqrt{n} \gamma_i}, in the sense that the probability that one was outside this interval was {O(n^{-c})} for some absolute constant {c>0}. Finally, if one made an additional log-Sobolev hypothesis on the entries, it was shown by Erdos, Yau, and Yin that the average variance of {\lambda_i(M_n)}, as {i} varied from {1} to {n}, was of size {O(n^{-c})} for some absolute constant {c>0}.

As it turns out, the first two concentration results are not sufficient to justify the previous heuristic argument. The Erdos-Yau-Yin argument suffices, but requires a log-Sobolev hypothesis. In our paper, we argue differently, using the three moment theorem (together with the theory of the eigenvalues of GUE, which is extremely well developed) to show that the variance of each individual {\lambda_i(M_n)} is {O(n^{-c})} (without averaging in {i}). No log-Sobolev hypothesis is required, but instead we need to assume that the third moment of the coefficients vanishes (because we want to use the three moment theorem to compare the Wigner matrix to GUE, and the coefficients of the latter have a vanishing third moment). From this we are able to make the previous arguments rigorous, and show that the mean {{\bf E} \lambda_i(M_n)} is indeed sensitive to the fourth moment of the entries at the critical scale {n^{-1/2}}.

One curious feature of the analysis is how differently the median and the mean of the eigenvalue {\lambda_i(M_n)} react to the available technology. To control the global behaviour of the eigenvalues (after averaging in {i}), it is much more convenient to use the mean, and we have very precise control on global averages of these means thanks to the moment method. But to control local behaviour, it is the median which is much better controlled. For instance, we can localise the median of {\lambda_i(M_n)} to an interval of size {O(n^{-1/2+\epsilon})}, but can only localise the mean to a much larger interval of size {O(n^{-c})}. Ultimately, this is because with our current technology there is a possible exceptional event of probability as large as {O(n^{-c})} for which all eigenvalues could deviate as far as {O(n^\epsilon)} from their expected location, instead of their typical deviation of {O(n^{-1/2})}. The reason for this is technical, coming from the fact that the four moment theorem method breaks down when two eigenvalues are very close together (less than {n^{-c}} times the average eigenvalue spacing), and so one has to cut out this event, which occurs with a probability of the shape {O(n^{-c})}. It may be possible to improve the four moment theorem proof to be less sensitive to eigenvalue near-collisions, in which case the above bounds are likely to improve.

Van Vu and I have just uploaded to the arXiv our joint paper “The Littlewood-Offord problem in high dimensions and a conjecture of Frankl and Füredi“. In this short paper we give a different proof of a high-dimensional Littlewood-Offord result of Frankl and Füredi, and in the process also affirmatively answer one of their open problems.

Let {v_1,\ldots,v_n} be {n} vectors in {{\mathbb R}^d}, which we normalise to all have length at least {1}. For any given radius {\Delta > 0}, we consider the small ball probability

\displaystyle  p(v_1,\ldots,v_n,\Delta) := \sup_B {\bf P}( \eta_1 v_1 + \ldots + \eta_n v_n \in B )

where {\eta_1,\ldots,\eta_n} are iid Bernoulli signs (i.e. they take values {+1} or {-1} independently with a probability of {1/2} of each), and {B} ranges over all (closed) balls of radius {\Delta}. The Littlewood-Offord problem is to compute the quantity

\displaystyle  p_d(n,\Delta) := \sup_{v_1,\ldots,v_n} p(v_1,\ldots,v_n,\Delta)

where {v_1,\ldots,v_n} range over all vectors in {{\mathbb R}^d} of length at least one. Informally, this number measures the extent to which a random walk of length {n} (with all steps of size at least one) can concentrate into a ball of radius {\Delta}.
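To get a concrete feel for this quantity, here is a crude Monte Carlo estimator (purely illustrative, not a tool from the paper); it approximates the supremum over ball centres by the sampled endpoints themselves, a rough proxy for the true supremum.

```python
import numpy as np

def small_ball_mc(vs, delta, trials=2000, seed=0):
    """Monte Carlo sketch of p(v_1, ..., v_n, delta): sample random
    sign sums and report the largest fraction of endpoints falling in
    a ball of radius delta centred at one of the sampled endpoints."""
    rng = np.random.default_rng(seed)
    vs = np.asarray(vs, dtype=float)          # shape (n, d)
    signs = rng.choice([-1.0, 1.0], size=(trials, len(vs)))
    sums = signs @ vs                         # walk endpoints, shape (trials, d)
    # pairwise squared distances between endpoints
    d2 = np.sum((sums[:, None, :] - sums[None, :, :]) ** 2, axis=-1)
    # best candidate centre: the endpoint with the most neighbours within delta
    return np.max(np.sum(d2 <= delta ** 2 + 1e-9, axis=1)) / trials
```

For instance, with {v_1 = \ldots = v_8 = 1} in one dimension and {\Delta = 1/2}, the estimate lands near the exact value {\binom{8}{4}/2^8 \approx 0.27} given by Erdös's theorem below.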

The one-dimensional case of this problem was answered by Erdös. First, one observes that one can normalise each {v_i} to be at least {+1} (as opposed to at most {-1}), by replacing {v_i} with {-v_i} and {\eta_i} with {-\eta_i} if necessary. In the model case when {\Delta < 1}, he made the following simple observation: if a random sum {\eta_1 v_1 + \ldots + \eta_n v_n} fell into a ball of radius {\Delta} (which, in the one-dimensional case, is an interval of length less than {2}), and one then changed one or more of the signs {\eta_i} from {-1} to {+1}, then the new sum must necessarily lie outside of the ball, since each such change increases the sum by {2 v_i \geq 2}. In other words, for any ball {B} of radius {\Delta}, the set of signs {(\eta_1,\ldots,\eta_n) \in \{-1,+1\}^n} for which {\eta_1 v_1 + \ldots + \eta_n v_n \in B} forms an antichain. Applying Sperner’s theorem, the maximal size of this antichain is {\binom{n}{\lfloor n/2\rfloor}}, and this soon leads to the exact value

\displaystyle  p_1(n,\Delta) = \binom{n}{\lfloor n/2\rfloor}/2^n = \frac{\sqrt{\frac{2}{\pi}}+o(1)}{\sqrt{n}}

when {0 \leq \Delta < 1} (the bound is attained in the extreme case {v_1=\ldots=v_n=1}).

A similar argument works for higher values of {\Delta}, using Dilworth’s theorem instead of Sperner’s theorem, and gives the exact value

\displaystyle  p_1(n,\Delta) = \sum_{j=1}^s \binom{n}{m_j}/2^n = \frac{s\sqrt{\frac{2}{\pi}}+o(1)}{\sqrt{n}}

whenever {n \geq s} and {s-1 \leq \Delta < s} for some natural number {s}, where {\binom{n}{m_1},\ldots,\binom{n}{m_s}} are the {s} largest binomial coefficients of {\binom{n}{0}, \ldots, \binom{n}{n}}.
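These exact values are easy to check by direct computation, since in the extremal case {v_1 = \ldots = v_n = 1} the sum {\eta_1 v_1 + \ldots + \eta_n v_n} lives on the lattice {\{n - 2k : 0 \leq k \leq n\}}, whose points are spaced {2} apart, so a ball of radius {\Delta < s} captures at most {s} consecutive lattice values. A small sketch (illustrative names, not from the paper):

```python
from math import comb

def p1_exact(n, s):
    """p_1(n, Delta) for s-1 <= Delta < s (and n >= s): the best window
    of s consecutive lattice values n - 2k of the walk."""
    w = [comb(n, k) for k in range(n + 1)]
    return max(sum(w[k:k + s]) for k in range(n - s + 2)) / 2 ** n

def s_largest_binomials(n, s):
    """Sum of the s largest binomial coefficients, normalised by 2^n."""
    return sum(sorted((comb(n, k) for k in range(n + 1)), reverse=True)[:s]) / 2 ** n
```

The two quantities agree because the {s} largest binomial coefficients are always consecutive, clustered around {k = n/2}.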

Now consider the higher-dimensional problem. One has the obvious bound

\displaystyle  p_d(n,\Delta) \geq p_1(n,\Delta),

but it is not obvious whether this inequality is strict. In other words, is there some way to exploit the additional freedom given by higher dimensions to make random walks concentrate more than in the one-dimensional case?

For some values of {\Delta}, it turns out that the answer is no, as was first observed by Kleitman (and discussed further by Frankl and Füredi). Suppose for instance that

\displaystyle  \sqrt{(s-1)^2+1} \leq \Delta < s

for some {s \geq 2}. Then one can consider the example in which {v_1=\ldots=v_{n-1}=e_1} is one unit vector, and {v_n=e_2} is another unit vector orthogonal to {e_1}. The small ball probability in this case can be computed to equal {p_1(n-1,s-1)}, which is slightly larger than the one-dimensional quantity {p_1(n,s-1)} appearing in the lower bound.
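A quick numerical check (with illustrative, hypothetical names) confirms that this example strictly beats the one-dimensional bound; here the second argument of the sketch function counts captured lattice values, so that {p1(m, s)} equals {p_1(m, \Delta)} for {s-1 \leq \Delta < s}.

```python
from math import comb

def p1(n, s):
    """p_1(n, Delta) for s-1 <= Delta < s, via the best window of s
    consecutive lattice values of the one-dimensional walk."""
    w = [comb(n, k) for k in range(n + 1)]
    return max(sum(w[k:k + s]) for k in range(n - s + 2)) / 2 ** n

# In the example, a well-placed ball captures both values of the last
# coordinate together with s consecutive values of the remaining
# (n-1)-step walk, giving p_1(n-1, s-1), which exceeds p_1(n, s-1):
n, s = 12, 2
assert p1(n - 1, s) > p1(n, s)
```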

In the positive direction, Frankl and Füredi established the asymptotic

\displaystyle  p_d(n,\Delta) = (1 + o(1)) p_1(n,\Delta) \ \ \ \ \ (1)

as {n \rightarrow \infty} (holding {d} and {\Delta} fixed). Furthermore, if {\Delta} was close to an integer, and more precisely if

\displaystyle  s-1 \leq \Delta < s-1 + \frac{1}{10s^2}

(so that the above counterexample can be avoided) they showed that {p_d(n,\Delta) = p_1(n,\Delta)} for sufficiently large {n} (depending on {s,\Delta}).

The factor {\frac{1}{10s^2}} was an artefact of their method, and they conjectured in fact that one should have {p_d(n,\Delta) = p_1(n,\Delta)} for sufficiently large {n} whenever

\displaystyle  s-1 \leq \Delta < \sqrt{(s-1)^2+1} \ \ \ \ \ (2)

thus matching the counterexample exactly. This conjecture was verified for {s = 1} by Kleitman and for {s=2,3} by Frankl and Füredi.

In this paper we verify the conjecture of Frankl and Füredi (and give a new proof of their asymptotic (1)). Our main tool is the following high-dimensional Littlewood-Offord inequality:

Theorem 1 Suppose that the vectors {v_1,\ldots,v_n \in {\mathbb R}^d} are genuinely {d}-dimensional in the sense that for any hyperplane {H} going through the origin, one has {\hbox{dist}(v_i,H) \geq 1} for at least {k} values of {i}. Then one has

\displaystyle  p(v_1,\ldots,v_n,\Delta) \ll_{d,\Delta} k^{-d/2}.

Theorem 1 can be viewed as a high-dimensional variant of Erdös’s inequality (but without the sharp upper bound). It is proven by the Fourier-analytic method of Halász. (This theorem was announced in my book with Van Vu several years ago, but we did not get around to publishing it until now.)

Using Theorem 1, one can verify the conjecture of Frankl and Füredi fairly quickly (the deduction takes a little over a page). The main point is that if there is excessive concentration, then Theorem 1 quickly places almost all of the vectors {v_1,\ldots,v_n} to lie very close to a line. If all the vectors are close to a line, then we can project onto this line and rescale, which causes {\Delta} to worsen a little bit in this reduction to the one-dimensional case, but it turns out that the bounds (2) allow us to tolerate this degradation of {\Delta} once {s>3} (so it is fortunate that the cases {s \leq 3} were already done for us!). If instead we have a vector far from the line (as is the case in the key counterexample), then we manually eliminate that vector using the parallelogram law, which effectively drops {\Delta} below {s-1} (half of the time, at least) if {\Delta} was initially less than {\sqrt{(s-1)^2+1}}, which gives enough of a saving to conclude the argument.

One moral that one can draw from this argument is that one can use a quasi-sharp estimate (such as Theorem 1), which ostensibly loses constant factors, to deduce a sharp estimate (such as the Frankl-Füredi conjecture) that loses no constant factors, as long as one is in an asymptotic regime (in this case, {s \geq 3} and {n} large depending on {d,\Delta}). The key is to exploit the fine structure in the main term (in this case, the piecewise constant nature of {p_1(n,\Delta)} as {\Delta} passes over integers) to extract gains that can absorb the losses coming from the quasi-sharp estimate.
