Van Vu and I have just uploaded to the arXiv our paper “Random matrices: Universality of local spectral statistics of non-Hermitian matrices”. The main result of this paper is a “Four Moment Theorem” that establishes universality for local spectral statistics of non-Hermitian matrices with independent entries, under the additional hypotheses that the entries of the matrix decay exponentially, and match moments with either the real or complex gaussian ensemble to fourth order. This is the non-Hermitian analogue of a long string of recent results establishing universality of local statistics in the Hermitian case (as discussed for instance in this recent survey of Van and myself, and also in several other places).
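The complex case is somewhat easier to describe. Given a (non-Hermitian) random matrix ensemble $M_n$ of $n \times n$ matrices, one can arbitrarily enumerate the (geometric) eigenvalues as $\lambda_1(M_n), \ldots, \lambda_n(M_n) \in \mathbb{C}$, and one can then define the $k$-point correlation functions $\rho^{(k)}_n: \mathbb{C}^k \to \mathbb{R}^+$ to be the symmetric functions such that
$$\int_{\mathbb{C}^k} F(z_1,\ldots,z_k) \rho^{(k)}_n(z_1,\ldots,z_k)\ dz_1 \ldots dz_k = \mathbb{E} \sum_{i_1,\ldots,i_k \text{ distinct}} F(\lambda_{i_1}(M_n),\ldots,\lambda_{i_k}(M_n)).$$
In the case when $M_n$ is drawn from the complex gaussian ensemble, so that all the entries are independent complex gaussians of mean zero and variance one, it is a classical result of Ginibre that the asymptotics of $\rho^{(k)}_n$ near some point $z\sqrt{n}$ as $n \to \infty$ and $z \in \mathbb{C}$ is fixed are given by the determinantal rule
$$\rho^{(k)}_n(z\sqrt{n} + w_1, \ldots, z\sqrt{n} + w_k) \to \det( K(w_i,w_j) )_{1 \leq i,j \leq k} \qquad (1)$$
for $|z| < 1$ (with the limit being zero instead when $|z| > 1$), where $K$ is the reproducing kernel
$$K(z,w) := \frac{1}{\pi} e^{-|z|^2/2 - |w|^2/2 + z \overline{w}}.$$
(There is also an asymptotic for the boundary case $|z| = 1$, but it is more complicated to state.) In particular, we see that $\rho^{(1)}_n(z\sqrt{n}) \to \frac{1}{\pi} 1_{|z| \leq 1}$ for almost every $z \in \mathbb{C}$, which is a manifestation of the well-known circular law for these matrices; but the circular law only captures the macroscopic structure of the spectrum, whereas the asymptotic (1) describes the microscopic structure.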
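To make the determinantal rule concrete, here is a minimal sketch that evaluates the limiting correlation functions numerically. It assumes the standard bulk Ginibre kernel $K(z,w) = \frac{1}{\pi} e^{-|z|^2/2 - |w|^2/2 + z\overline{w}}$; the function names `ginibre_kernel`, `det` and `limiting_correlation` are illustrative choices, not anything taken from the paper.

```python
import cmath
import math


def ginibre_kernel(z: complex, w: complex) -> complex:
    """Assumed bulk kernel K(z, w) = (1/pi) exp(-|z|^2/2 - |w|^2/2 + z * conj(w))."""
    exponent = -abs(z) ** 2 / 2 - abs(w) ** 2 / 2 + z * w.conjugate()
    return cmath.exp(exponent) / math.pi


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting (complex entries)."""
    a = [[complex(x) for x in row] for row in matrix]
    n = len(a)
    result = 1 + 0j
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) == 0:
            return 0j
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def limiting_correlation(points):
    """Limiting k-point correlation det(K(w_i, w_j)) at the given local points."""
    gram = [[ginibre_kernel(wi, wj) for wj in points] for wi in points]
    # The matrix is Hermitian, so its determinant is real up to rounding.
    return det(gram).real


w1, w2 = 0.3 + 0.4j, -0.2 + 1.0j
one_point = limiting_correlation([w1])        # constant density 1/pi
two_point = limiting_correlation([w1, w2])    # shows eigenvalue repulsion
```

With this kernel the two-point function simplifies to $\frac{1}{\pi^2}(1 - e^{-|w_1 - w_2|^2})$, which vanishes as the two points merge; this is the microscopic repulsion that the macroscopic circular law cannot see.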
Our first main result is that the asymptotic (1) for $|z| < 1$ also holds (in the sense of vague convergence) when $M_n$ is a matrix whose entries are independent with mean zero, variance one, and exponentially decaying tails, and which all match moments with the complex gaussian to fourth order. (Actually we prove a stronger result than this which is valid for all bounded $z$ and has more uniform bounds, but is a bit more technical to state.) An analogous result is also established for real gaussians (but now one has to separate the correlation function into components depending on how many eigenvalues are real and how many are strictly complex; also, the limiting distribution is more complicated, being described by Pfaffians rather than determinants). Among other things, this allows us to partially extend some known results on complex or real gaussian ensembles to more general ensembles. For instance, a result of Rider establishes a central limit theorem for the number of eigenvalues of a complex gaussian matrix in a mesoscopic disk; from our results, we can extend this central limit theorem to matrices that match the complex gaussian ensemble to fourth order, provided that the disk is small enough (for technical reasons, our error bounds are not strong enough to handle large disks). Similarly, extending some results of Edelman-Kostlan-Shub and of Forrester-Nagao, we can show that for a matrix matching the real gaussian ensemble to fourth order, the number of real eigenvalues is $\sqrt{\frac{2n}{\pi}} + O(n^{1/2-c})$ with probability $1 - O(n^{-c})$ for some absolute constant $c > 0$.
There are several steps involved in the proof. The first step is to apply the Girko Hermitisation trick to replace the problem of understanding the spectrum of a non-Hermitian matrix, with that of understanding the spectrum of various Hermitian matrices. The two identities that realise this trick are, firstly, Jensen’s formula
$$\log |\det(M_n - z_0)| = - \sum_{1 \leq i \leq n: \lambda_i(M_n) \in B(z_0,r)} \log \frac{r}{|\lambda_i(M_n) - z_0|} + \frac{1}{2\pi} \int_0^{2\pi} \log |\det(M_n - z_0 - r e^{i\theta})|\ d\theta$$
that relates the local distribution of eigenvalues to the log-determinants $\log |\det(M_n - z_0)|$, and secondly the elementary identity
$$\log |\det(M_n - z)| = \frac{1}{2} \log |\det W_{n,z}| + \frac{1}{2} n \log n$$
that relates the log-determinants of $M_n - z$ to the log-determinants of the Hermitian matrices
$$W_{n,z} := \frac{1}{\sqrt{n}} \begin{pmatrix} 0 & M_n - z \\ (M_n - z)^* & 0 \end{pmatrix}.$$
The main difficulty is then to obtain concentration and universality results for the Hermitian log-determinants $\log |\det W_{n,z}|$. This turns out to be a task that is analogous to the task of obtaining concentration for Wigner matrices (as we did in this recent paper), as well as central limit theorems for log-determinants of Wigner matrices (as we did in this other recent paper). In both of these papers, the main idea was to use the Four Moment Theorem for Wigner matrices (which can now be proven relatively easily by a combination of the local semi-circular law and resolvent swapping methods), combined with (in the latter paper) a central limit theorem for the gaussian unitary ensemble (GUE). This latter task was achieved by using the convenient Trotter normal form to tridiagonalise a GUE matrix, which has the effect of revealing the determinant of that matrix as the solution to a certain linear stochastic difference equation, and one can analyse the distribution of that solution via such tools as the martingale central limit theorem.
The Hermitised matrices are somewhat more complicated than Wigner matrices (for instance, the semi-circular law must be replaced by a distorted Marchenko-Pastur law), but the same general strategy works to obtain concentration and universality for their log-determinants. The main new difficulty that arises is that the analogue of the Trotter normal form for gaussian random matrices is not tridiagonal, but rather Hessenberg (i.e. upper-triangular except for the lower diagonal). This ultimately has the effect of expressing the relevant determinant as the solution to a nonlinear stochastic difference equation, which is a bit trickier to solve for. Fortunately, it turns out that one only needs good lower bounds on the solution, as one can use the second moment method to upper bound the determinant and hence the log-determinant (following a classical computation of Turan). This simplifies the analysis of the equation somewhat.
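The algebra behind the Hermitisation step described above can be checked on a small example. The sketch below assumes the unnormalised block form $H = \begin{pmatrix} 0 & A \\ A^* & 0 \end{pmatrix}$ with $A$ standing in for a shifted matrix $M_n - z$, for which $|\det H| = |\det A|^2$ and hence $\log|\det A| = \frac{1}{2} \log|\det H|$; any scalar normalisation of $H$ only shifts this by an explicit constant. The helper names and the test matrix are hypothetical.

```python
import math


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting (complex entries)."""
    a = [[complex(x) for x in row] for row in matrix]
    n = len(a)
    result = 1 + 0j
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) == 0:
            return 0j
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def hermitise(a):
    """Return the 2n x 2n Hermitian block matrix [[0, A], [A^*, 0]]."""
    n = len(a)
    zeros = [0j] * n
    top = [zeros + [complex(x) for x in row] for row in a]
    bottom = [[complex(a[j][i]).conjugate() for j in range(n)] + zeros
              for i in range(n)]
    return top + bottom


# A small non-Hermitian matrix, playing the role of M_n - z (illustrative values).
A = [[1 + 2j, 0.5, -1],
     [0, 2 - 1j, 3],
     [1j, 1, -2 + 0.5j]]
H = hermitise(A)

log_det_direct = math.log(abs(det(A)))
log_det_hermitised = 0.5 * math.log(abs(det(H)))
```

The point of the trick is that `H` is Hermitian, so its spectrum is real and stable, even though the spectrum of `A` itself need not be.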
While this result is the first local universality result in the category of random matrices with independent entries, there are still two limitations to the result which one would like to remove. The first is the moment matching hypotheses on the matrix. Very recently, one of the ingredients of our paper, namely the local circular law, was proved without moment matching hypotheses by Bourgade, Yau, and Yin (provided one stays away from the edge of the spectrum); however, as of this time of writing the other main ingredient – the universality of the log-determinant – still requires moment matching. (The standard tool for obtaining universality without moment matching hypotheses is the heat flow method (and more specifically, the local relaxation flow method), but the analogue of Dyson Brownian motion in the non-Hermitian setting appears to be somewhat intractable, being a coupled flow on both the eigenvalues and eigenvectors rather than just on the eigenvalues alone.)
I’ve just uploaded to the arXiv my paper “Outliers in the spectrum of iid matrices with bounded rank perturbations”, submitted to Probability Theory and Related Fields. This paper is concerned with outliers to the circular law for iid random matrices. Recall that if $M$ is an $n \times n$ matrix whose entries are iid complex random variables with mean zero and variance one, then the $n$ complex eigenvalues of the normalised matrix $\frac{1}{\sqrt{n}} M$ will almost surely be distributed according to the circular law distribution $\frac{1}{\pi} 1_{|z| \leq 1}\ d^2 z$ in the limit $n \to \infty$. (See these lecture notes for further discussion of this law.)
The circular law is also stable under bounded rank perturbations: if $C$ is a deterministic rank $O(1)$ matrix of polynomial size (i.e. of operator norm $O(n^{O(1)})$), then the circular law also holds for $\frac{1}{\sqrt{n}} M + C$ (this is proven in a paper of myself, Van Vu, and Manjunath Krishnapur). In particular, the bulk of the eigenvalues (i.e. $n - o(n)$ of the $n$ eigenvalues) will lie inside the unit disk $\{ z \in \mathbb{C}: |z| \leq 1 \}$.
However, this leaves open the possibility for one or more outlier eigenvalues that lie significantly outside the unit disk; the arguments in the paper cited above give some upper bound on the number of such eigenvalues (of the form $O(n^{1-c})$ for some absolute constant $c > 0$) but do not exclude them entirely. And indeed, numerical data shows that such outliers can exist for certain bounded rank perturbations.
In this paper, some results are given as to when outliers exist, and how they are distributed. The easiest case is of course when there is no bounded rank perturbation: $C = 0$. In that case, an old result of Bai and Yin and of Geman shows that the spectral radius of $\frac{1}{\sqrt{n}} M$ is almost surely $1 + o(1)$, thus all eigenvalues will be contained in an $o(1)$ neighbourhood of the unit disk, and so there are no significant outliers. The proof is based on the moment method.
Now we consider a bounded rank perturbation $C$ which is nonzero, but which has a bounded operator norm: $\|C\|_{op} = O(1)$. In this case, it turns out that the matrix $\frac{1}{\sqrt{n}} M + C$ will have outliers if the deterministic component $C$ has outliers. More specifically (and under the technical hypothesis that the entries of $M$ have bounded fourth moment), if $\lambda$ is an eigenvalue of $C$ with $|\lambda| > 1$, then (for $n$ large enough), $\frac{1}{\sqrt{n}} M + C$ will almost surely have an eigenvalue at $\lambda + o(1)$, and furthermore these will be the only outlier eigenvalues of $\frac{1}{\sqrt{n}} M + C$.
Thus, for instance, adding a bounded nilpotent low rank matrix to $\frac{1}{\sqrt{n}} M$ will not create any outliers, because the nilpotent matrix only has eigenvalues at zero. On the other hand, adding a bounded Hermitian low rank matrix will create outliers as soon as this matrix has an operator norm greater than $1$.
When I first thought about this problem (which was communicated to me by Larry Abbott), I believed that it was quite difficult, because I knew that the eigenvalues of non-Hermitian matrices were quite unstable with respect to general perturbations (as discussed in this previous blog post), and that there were no interlacing inequalities in this case to control bounded rank perturbations (as discussed in this post). However, as it turns out I had arrived at the wrong conclusion, especially in the exterior of the unit disk in which the resolvent is actually well controlled and so there is no pseudospectrum present to cause instability. This was pointed out to me by Alice Guionnet at an AIM workshop last week, after I had posed the above question during an open problems session. Furthermore, at the same workshop, Percy Deift emphasised the point that the basic determinantal identity
$$\det(1 + AB) = \det(1 + BA) \qquad (1)$$
for $n \times k$ matrices $A$ and $k \times n$ matrices $B$ was a particularly useful identity in random matrix theory, as it converted problems about large ($n \times n$) matrices into problems about small ($k \times k$) matrices, which was particularly convenient in the regime when $n \to \infty$ and $k$ was fixed. (Percy was speaking in the context of invariant ensembles, but the point is in fact more general than this.)
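The determinantal identity $\det(1+AB) = \det(1+BA)$ is easy to test exactly on a small example. The following sketch uses rational arithmetic so that the comparison is an exact equality rather than a floating-point approximation; the specific $4 \times 2$ and $2 \times 4$ matrices are arbitrary illustrative choices.

```python
from fractions import Fraction


def det(matrix):
    """Exact determinant via Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def matmul(x, y):
    """Plain matrix product of two lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def identity_plus(m):
    """Return 1 + m for a square matrix m."""
    return [[m[i][j] + (1 if i == j else 0) for j in range(len(m))]
            for i in range(len(m))]


# n = 4, k = 2: A is n x k and B is k x n.
A = [[1, 2], [0, 1], [3, -1], [2, 0]]
B = [[1, 0, 2, -1], [0, 3, 1, 1]]

big = det(identity_plus(matmul(A, B)))    # a 4 x 4 determinant
small = det(identity_plus(matmul(B, A)))  # a 2 x 2 determinant
```

Here `small` only requires a $2 \times 2$ determinant, which is the whole point: the cost of the right-hand side depends on $k$ and not on $n$.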
From this, it turned out to be a relatively simple matter to transform what appeared to be an intractable $n \times n$ matrix problem into quite a well-behaved $k \times k$ matrix problem for bounded $k$. Specifically, suppose that $C$ had rank $k$, so that one can factor $C = AB$ for some (deterministic) $n \times k$ matrix $A$ and $k \times n$ matrix $B$. To find an eigenvalue $z$ of $\frac{1}{\sqrt{n}} M + C$, one has to solve the characteristic polynomial equation
$$\det\left( \frac{1}{\sqrt{n}} M + AB - z \right) = 0.$$
This is an $n \times n$ determinantal equation, which looks difficult to control analytically. But we can manipulate it using (1). If we make the assumption that $z$ is outside the spectrum of $\frac{1}{\sqrt{n}} M$ (which we can do as long as $z$ is well away from the unit disk, as the unperturbed matrix $\frac{1}{\sqrt{n}} M$ has no outliers), we can divide by $\frac{1}{\sqrt{n}} M - z$ to arrive at
$$\det\left( 1 + \left( \frac{1}{\sqrt{n}} M - z \right)^{-1} AB \right) = 0.$$
Now we apply the crucial identity (1) to rearrange this as
$$\det\left( 1 + B \left( \frac{1}{\sqrt{n}} M - z \right)^{-1} A \right) = 0.$$
The crucial point is that this is now an equation involving only a $k \times k$ determinant, rather than an $n \times n$ one, and is thus much easier to solve. The situation is particularly simple for rank one perturbations
$$\frac{1}{\sqrt{n}} M + u v^*$$
in which case the eigenvalue equation is now just a scalar equation
$$1 + \left\langle \left( \frac{1}{\sqrt{n}} M - z \right)^{-1} u, v \right\rangle = 0$$
that involves what is basically a single coefficient of the resolvent $\left( \frac{1}{\sqrt{n}} M - z \right)^{-1}$. (It is also an instructive exercise to derive this eigenvalue equation directly, rather than through (1).) There is by now a very well-developed theory for how to control such coefficients (particularly for $z$ in the exterior of the unit disk, in which case such basic tools as Neumann series work just fine); in particular, one has precise enough control on these coefficients to obtain the result on outliers mentioned above.
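As a sanity check on the rank one reduction, the sketch below verifies the factorisation $\det(X + uv^T - z) = \det(X - z)\,(1 + v^T (X - z)^{-1} u)$ exactly for a small real matrix. Here `X` is a hypothetical stand-in for the unperturbed matrix, and `u`, `v`, `z` are arbitrary illustrative values; in the real case $v^*$ is just $v^T$. When $z$ is outside the spectrum of `X`, the scalar factor vanishes precisely at the eigenvalues of the perturbed matrix.

```python
from fractions import Fraction


def det(matrix):
    """Exact determinant via Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def solve(matrix, rhs):
    """Solve matrix @ y = rhs exactly (assumes the matrix is invertible)."""
    n = len(matrix)
    aug = [[Fraction(x) for x in row] + [Fraction(b)]
           for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [x / pivot_value for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def shift(matrix, z):
    """Return matrix - z * identity."""
    n = len(matrix)
    return [[matrix[i][j] - (z if i == j else 0) for j in range(n)]
            for i in range(n)]


def resolvent_scalar(x, u, v, z):
    """The scalar 1 + v^T (x - z)^{-1} u from the rank one eigenvalue equation."""
    y = solve(shift(x, z), u)
    return 1 + sum(vi * yi for vi, yi in zip(v, y))


X = [[2, -1, 0], [1, 0, 3], [0, 1, 1]]
u = [1, 2, 0]
v = [3, 1, -1]
z = Fraction(7, 2)  # not an eigenvalue of X (a non-integer rational cannot be one)

perturbed = [[X[i][j] + u[i] * v[j] for j in range(3)] for i in range(3)]
lhs = det(shift(perturbed, z))
rhs = det(shift(X, z)) * resolvent_scalar(X, u, v, z)

# Degenerate check: with X = 0 the only nonzero eigenvalue of u v^T is v . u.
zero = [[0] * 3 for _ in range(3)]
outlier = sum(ui * vi for ui, vi in zip(u, v))
scalar_at_outlier = resolvent_scalar(zero, u, v, outlier)
```

The degenerate case at the end mirrors the heuristic in the text: if the random part is negligible, the outlier sits at the eigenvalue of the perturbation itself.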
The same method can handle some other bounded rank perturbations. One basic example comes from looking at iid matrices with a non-zero mean $\mu$ and variance $1$; this can be modeled by $\frac{1}{\sqrt{n}} M + \mu \sqrt{n} \phi \phi^*$ where $\phi$ is the unit vector $\phi := \frac{1}{\sqrt{n}} (1,\ldots,1)^*$. Here, the bounded rank perturbation $\mu \sqrt{n} \phi \phi^*$ has a large operator norm (equal to $|\mu| \sqrt{n}$), so the previous result does not directly apply. Nevertheless, the self-adjoint nature of the perturbation has a stabilising effect, and I was able to show that there is still only one outlier, and that it is at the expected location of $\mu \sqrt{n} + o(1)$.
If one moves away from the case of self-adjoint perturbations, though, the situation changes. Let us now consider a matrix of the form $\frac{1}{\sqrt{n}} M + \mu \sqrt{n} \phi \psi^*$, where $\psi$ is a randomised version of $\phi$, e.g. $\psi := \frac{1}{\sqrt{n}} (\pm 1, \ldots, \pm 1)^*$, where the $\pm 1$ are iid Bernoulli signs; such models were proposed recently by Rajan and Abbott as a model for neural networks in which some nodes are excitatory (and give columns with positive mean) and some are inhibitory (leading to columns with negative mean). Despite the superficial similarity with the previous example, the outlier behaviour is now quite different. Instead of having one extremely large outlier (of size $\sim \sqrt{n}$) at an essentially deterministic location, we now have a number of eigenvalues of size $O(1)$, scattered according to a random process. Indeed, (in the case when the entries of $M$ were real and bounded) I was able to show that the outlier point process converged (in the sense of converging $k$-point correlation functions) to the zeroes of a random Laurent series
$$g(z) = 1 - \mu \sum_{j=0}^\infty \frac{g_j}{z^{j+1}}$$
where $g_0, g_1, g_2, \ldots$ are iid real Gaussians. This is basically because the coefficients of the resolvent $\left( \frac{1}{\sqrt{n}} M - z \right)^{-1}$ have a Neumann series whose coefficients enjoy a central limit theorem.
On the other hand, as already observed numerically (and rigorously, in the gaussian case) by Rajan and Abbott, if one projects such matrices to have row sum zero, then the outliers all disappear. This can be explained by another appeal to (1); this projection amounts to right-multiplying $\frac{1}{\sqrt{n}} M + \mu \sqrt{n} \phi \psi^*$ by the projection matrix $P$ to the zero-sum vectors. But by (1), the non-zero eigenvalues of the resulting matrix $\left( \frac{1}{\sqrt{n}} M + \mu \sqrt{n} \phi \psi^* \right) P$ are the same as those for $P \left( \frac{1}{\sqrt{n}} M + \mu \sqrt{n} \phi \psi^* \right)$. Since $P$ annihilates $\phi$, we thus see that in this case the bounded rank perturbation plays no role, and the question reduces to obtaining a circular law with no outliers for $\frac{1}{\sqrt{n}} P M$. As it turns out, this can be done by invoking the machinery of Van Vu and myself that we used to prove the circular law for various random matrix models.
In this final set of lecture notes for this course, we leave the realm of self-adjoint matrix ensembles, such as Wigner random matrices, and consider instead the simplest examples of non-self-adjoint ensembles, namely the iid matrix ensembles. (I had also hoped to discuss recent progress in eigenvalue spacing distributions of Wigner matrices, but have run out of time. For readers interested in this topic, I can recommend the recent Bourbaki exposé of Alice Guionnet.)
The basic result in this area is
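Theorem 1 (Circular law) Let $M_n$ be an $n \times n$ iid matrix, whose entries $\xi_{ij}$, $1 \leq i,j \leq n$ are iid with a fixed (complex) distribution $\xi_{ij} \equiv \xi$ of mean zero and variance one. Then the spectral measure $\mu_{\frac{1}{\sqrt{n}} M_n}$ converges both in probability and almost surely to the circular law $\mu_{circ} := \frac{1}{\pi} 1_{|x|^2+|y|^2 \leq 1}\ dx dy$, where $x, y$ are the real and imaginary coordinates of the complex plane.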
This theorem has a long history; it is analogous to the semi-circular law, but the non-Hermitian nature of the matrices makes the spectrum so unstable that key techniques that are used in the semi-circular case, such as truncation and the moment method, no longer work; significant new ideas are required. In the case of random gaussian matrices, this result was established by Mehta (in the complex case) and by Edelman (in the real case), as was sketched out in Notes. In 1984, Girko laid out a general strategy for establishing the result for non-gaussian matrices, which formed the base of all future work on the subject; however, a key ingredient in the argument, namely a bound on the least singular value of shifts $\frac{1}{\sqrt{n}} M_n - zI$, was not fully justified at the time. A rigorous proof of the circular law was then established by Bai, assuming additional moment and boundedness conditions on the individual entries. These additional conditions were then slowly removed in a sequence of papers by Götze-Tikhomirov, Girko, Pan-Zhou, and Tao-Vu, with the last moment condition being removed in a paper of myself, Van Vu, and Manjunath Krishnapur.
At present, the known methods used to establish the circular law for general ensembles rely very heavily on the joint independence of all the entries. It is a key challenge to see how to weaken this joint independence assumption.
Van Vu and I have just uploaded to the arXiv our survey paper “From the Littlewood-Offord problem to the Circular Law: universality of the spectral distribution of random matrices”, submitted to Bull. Amer. Math. Soc. This survey recaps (avoiding most of the technical details) the recent work of ourselves and others that exploits the inverse theory for the Littlewood-Offord problem (which, roughly speaking, amounts to figuring out what types of random walks exhibit concentration at any given point), and how this leads to bounds on condition numbers, least singular values, and resolvents of random matrices; and then how the latter then leads to universality of the empirical spectral distributions (ESDs) of random matrices, and in particular to the circular law for the ESDs for iid random matrices with zero mean and unit variance (see my previous blog post on this topic, or my Lewis lectures). We conclude by mentioning a few open problems in the subject.
While this subject does unfortunately contain a large amount of technical theory and detail, every so often we find a very elementary observation that simplifies the work required significantly. One such observation is an identity which we call the negative second moment identity, which I would like to discuss here. Let $A$ be an $n \times n$ matrix; for simplicity we assume that the entries are real-valued. Denote the $n$ rows of $A$ by $X_1, \ldots, X_n$, which we view as vectors in $\mathbb{R}^n$. Let $\sigma_1(A) \geq \ldots \geq \sigma_n(A) \geq 0$ be the singular values of $A$. In our applications, the vectors $X_j$ are easily described (e.g. they might be randomly distributed on the discrete cube $\{-1,1\}^n$), but the distribution of the singular values $\sigma_j(A)$ is much more mysterious, and understanding this distribution is a key objective in this entire theory.
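For orientation, the negative second moment identity as I understand it states that, for invertible $A$, $\sum_{j=1}^n \sigma_j(A)^{-2} = \sum_{j=1}^n \mathrm{dist}(X_j, V_j)^{-2}$, where $V_j$ is the span of the rows other than $X_j$. The following sketch checks this exactly on a small sign matrix. It relies on two standard facts that are assumptions of the sketch rather than statements from the text above: the left side equals $\mathrm{tr}((AA^T)^{-1})$, and squared distances are ratios of Gram determinants. All function names are illustrative.

```python
from fractions import Fraction


def det(matrix):
    """Exact determinant via Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def solve(matrix, rhs):
    """Solve matrix @ y = rhs exactly (assumes the matrix is invertible)."""
    n = len(matrix)
    aug = [[Fraction(x) for x in row] + [Fraction(b)]
           for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [x / pivot_value for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def gram(rows):
    """Gram matrix of inner products between the given row vectors."""
    return [[sum(p * q for p, q in zip(r, s)) for s in rows] for r in rows]


def sum_inverse_squared_singular_values(a):
    """tr((A A^T)^{-1}), which equals the sum of sigma_j(A)^{-2}."""
    g = gram(a)
    n = len(a)
    total = Fraction(0)
    for i in range(n):
        e_i = [1 if k == i else 0 for k in range(n)]
        total += solve(g, e_i)[i]  # i-th diagonal entry of the inverse
    return total


def sum_inverse_squared_distances(a):
    """Sum over j of dist(X_j, span of other rows)^{-2}, via Gram determinants."""
    full = det(gram(a))
    total = Fraction(0)
    for j in range(len(a)):
        others = a[:j] + a[j + 1:]
        total += det(gram(others)) / full  # equals 1 / dist(X_j, V_j)^2
    return total


A = [[1, 1, -1], [1, -1, 1], [-1, 1, 1]]  # rows on the discrete cube {-1, 1}^3
singular_side = sum_inverse_squared_singular_values(A)
distance_side = sum_inverse_squared_distances(A)
```

The usefulness of the identity is that the right-hand side involves only distances from a row to a subspace, which are far easier to control for random rows than the singular values themselves.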
Van Vu and I have just uploaded to the arXiv our new paper, “Random matrices: Universality of ESDs and the circular law”, with an appendix by Manjunath Krishnapur (and some numerical data and graphs by Philip Wood). One of the things we do in this paper (which was our original motivation for this project) is to finally establish the endpoint case of the circular law (in both strong and weak forms) for random iid matrices $A_n = (a_{ij})_{1 \leq i,j \leq n}$, where the coefficients $a_{ij}$ are iid random variables with mean zero and unit variance. (The strong circular law says that with probability 1, the empirical spectral distribution (ESD) of the normalised eigenvalues $\frac{1}{\sqrt{n}} \lambda_1, \ldots, \frac{1}{\sqrt{n}} \lambda_n$ of the matrix $A_n$ converges to the uniform distribution on the unit disk $\{ z \in \mathbb{C}: |z| \leq 1 \}$ as $n \to \infty$. The weak circular law asserts the same thing, but with convergence in probability rather than almost sure convergence; this is in complete analogy with the weak and strong law of large numbers, and in fact this law is used in the proof.) In a previous paper, we had established the same claim but under the additional assumption that the $(2+\eta)^{th}$ moment $\mathbb{E} |a_{ij}|^{2+\eta}$ was finite for some $\eta > 0$; this builds upon a significant body of earlier work by Mehta, Girko, Bai, Bai-Silverstein, Götze-Tikhomirov, and Pan-Zhou, as discussed in the blog article for the previous paper.
As it turned out, though, in the course of this project we found a more general universality principle (or invariance principle) which implied our results about the circular law, but is perhaps more interesting in its own right. Observe that the statement of the circular law can be split into two sub-statements:
- (Universality for iid ensembles) In the asymptotic limit $n \to \infty$, the ESD of the random matrix $\frac{1}{\sqrt{n}} A_n$ is independent of the choice of distribution of the coefficients, so long as they are normalised in mean and variance. In particular, the ESD of such a matrix is asymptotically the same as that of a (real or complex) gaussian matrix $\frac{1}{\sqrt{n}} G_n$ with the same mean and variance.
- (Circular law for gaussian matrices) In the asymptotic limit $n \to \infty$, the ESD of a gaussian matrix $\frac{1}{\sqrt{n}} G_n$ converges to the circular law.
The reason we single out the gaussian matrix ensemble is that it has a much richer algebraic structure (for instance, the real (resp. complex) gaussian ensemble is invariant under right and left multiplication by the orthogonal group O(n) (resp. the unitary group U(n))). Because of this, it is possible to compute the eigenvalue distribution very explicitly by algebraic means (for instance, using the machinery of orthogonal polynomials). In particular, the circular law for complex gaussian matrices (Statement 2 above) was established all the way back in 1967 by Mehta, using an explicit formula for the distribution of the ESD in this case due to Ginibre.
These highly algebraic techniques completely break down for more general iid ensembles, such as the Bernoulli ensemble of matrices whose entries are +1 or -1 with an equal probability of each. Nevertheless, it is a remarkable phenomenon – which has been referred to as universality in the literature, for instance in this survey by Deift – that the spectral properties of random matrices for non-algebraic ensembles are in many cases asymptotically indistinguishable in the limit $n \to \infty$ from that of algebraic ensembles with the same mean and variance (i.e. Statement 1 above). One might view this as a sort of “non-Hermitian, non-commutative” analogue of the universality phenomenon represented by the central limit theorem, in which the limiting distribution of a normalised average
$$\overline{X}_n := \frac{X_1 + \ldots + X_n}{\sqrt{n}}$$
of an iid sequence $X_1, X_2, \ldots$ depends only on the mean and variance of the elements of that sequence (assuming of course that these quantities are finite), and not on the underlying distribution. (The Hermitian non-commutative analogue of the CLT is known as Wigner’s semicircular law.)
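The scalar universality phenomenon can be seen in an exact computation. The sketch below, which assumes mean-zero Bernoulli signs purely as an illustrative choice, computes moments of the normalised sum $(X_1 + \ldots + X_n)/\sqrt{n}$ in closed form. The second moment is always $1$, while the fourth moment is $3 - 2/n$, converging to the gaussian value $3$ even though a single sign has fourth moment $1$.

```python
from fractions import Fraction
from math import comb


def normalised_sum_moment(n: int, power: int) -> Fraction:
    """Exact E[((X_1 + ... + X_n) / sqrt(n))^power] for iid +-1 signs.

    Only even powers are supported, so that n^(power/2) is an integer.
    """
    if power % 2 != 0:
        raise ValueError("power must be even")
    # With k plus-signs among n, the sum equals 2k - n, with weight C(n, k) / 2^n.
    total = sum(comb(n, k) * (2 * k - n) ** power for k in range(n + 1))
    return Fraction(total, 2 ** n * n ** (power // 2))


second_moment = normalised_sum_moment(10, 2)
fourth_moment = normalised_sum_moment(10, 4)
fourth_moments = [normalised_sum_moment(n, 4) for n in (1, 10, 100, 1000)]
```

More generally, for any mean-zero, variance-one distribution with finite fourth moment $m_4$, the same calculation gives $3 + (m_4 - 3)/n$, so the dependence on the underlying distribution is an $O(1/n)$ correction that disappears in the limit.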
Previous approaches to the circular law did not build upon the gaussian case, but instead proceeded directly, in particular controlling the ESD of a random matrix via estimates on the Stieltjes transform
$$s_n(z) := \frac{1}{n} \mathrm{tr} \left( \frac{1}{\sqrt{n}} A_n - zI \right)^{-1}$$
of that matrix for complex numbers $z$. This method required a combination of delicate analysis (in particular, a bound on the least singular values of $\frac{1}{\sqrt{n}} A_n - zI$), and algebra (in order to compute and then invert the Stieltjes transform). [As a general rule, and oversimplifying somewhat, algebra tends to be used to control main terms in a computation, while analysis is used to control error terms.]
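To illustrate how the Stieltjes transform encodes the spectrum, here is a small sketch that computes it in two ways for a matrix `A`: as the normalised trace of the resolvent $(A - zI)^{-1}$, and as the average of $1/(\lambda_i - z)$ over the eigenvalues. The sign convention and the upper-triangular test matrix (chosen so that the eigenvalues can be read off the diagonal) are assumptions of this sketch; in the setting above `A` would be the normalised random matrix.

```python
def solve(matrix, rhs):
    """Solve matrix @ y = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [[complex(x) for x in row] + [complex(b)]
           for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [x / pivot_value for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def stieltjes_via_resolvent(a, z):
    """(1/n) tr (A - z I)^{-1}, computed column by column."""
    n = len(a)
    shifted = [[a[i][j] - (z if i == j else 0) for j in range(n)]
               for i in range(n)]
    trace = 0j
    for i in range(n):
        e_i = [1 if k == i else 0 for k in range(n)]
        trace += solve(shifted, e_i)[i]  # i-th diagonal entry of the resolvent
    return trace / n


def stieltjes_via_eigenvalues(eigenvalues, z):
    """(1/n) sum of 1 / (lambda_i - z)."""
    return sum(1 / (lam - z) for lam in eigenvalues) / len(eigenvalues)


# Upper triangular and non-normal: eigenvalues are the diagonal entries.
A = [[1, 5, -2],
     [0, -1 + 1j, 4],
     [0, 0, 2j]]
z = 0.5 + 3j

from_resolvent = stieltjes_via_resolvent(A, z)
from_spectrum = stieltjes_via_eigenvalues([1, -1 + 1j, 2j], z)
```

The two computations agree, but the resolvent form degrades as $z$ approaches the spectrum and the shifted matrix becomes nearly singular, which is why least singular value bounds for the shifts play such a central role.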
What we discovered while working on our paper was that the algebra and analysis could be largely decoupled from each other: that one could establish a universality principle (Statement 1 above) by relying primarily on tools from analysis (most notably the bound on least singular values mentioned earlier, but also Talagrand’s concentration of measure inequality, and a universality principle for the singular value distribution of random matrices due to Dozier and Silverstein), so that the algebraic heavy lifting only needs to be done in the gaussian case (Statement 2 above) where the task is greatly simplified by all the additional algebraic structure available in that setting. This suggests a possible strategy to proving other conjectures in random matrices (for instance concerning the eigenvalue spacing distribution of random iid matrices), by first establishing universality to swap the general random matrix ensemble with an algebraic ensemble (without fully understanding the limiting behaviour of either), and then using highly algebraic tools to understand the latter ensemble. (There is now a sophisticated theory in place to deal with the latter task, but the former task – understanding universality – is still only poorly understood in many cases.)
This week I am at Rutgers University, giving the Lewis Memorial Lectures for this year, which are also concurrently part of a workshop in random matrices. I gave four lectures, three of which were on random matrices, and one of which was on the Szemerédi regularity lemma.
The titles, abstracts, and slides of these talks are as follows.
- Szemerédi’s lemma revisited. In this general-audience talk, I discuss the Szemerédi regularity lemma (which, roughly speaking, shows that an arbitrary large dense graph can always be viewed as the disjoint union of a bounded number of pseudorandom components), and how it has recently been reinterpreted in a more analytical (and infinitary) language using the theory of graph limits or of exchangeable measures. I also discuss arithmetic analogues of this lemma, including one which (implicitly) underlies my result with Ben Green that the primes contain arbitrarily long arithmetic progressions.
- Singularity and determinant of random matrices. Here, I present recent progress in understanding the question of how likely a random matrix (e.g. one whose entries are all +1 or -1 with equal probability) is to be invertible, as well as the related question of how large the determinant should be. The case of continuous matrix ensembles (such as the Gaussian ensemble) is well understood, but the discrete case contains some combinatorial difficulties and took longer to understand properly. In particular I present the results of Kahn-Komlós-Szemerédi and later authors showing that discrete random matrices are invertible with exponentially high probability, and also give some results for the distribution of the determinant.
- The least singular value of random matrices. A more quantitative version of the question “when is a matrix invertible?” is “what is the least singular value of that matrix?” I present here the recent results of Litvak-Pajor-Rudelson-Tomczak-Jaegermann, Rudelson, myself and Vu, and Rudelson–Vershynin on addressing this question in the discrete case. A central role is played by the inverse Littlewood-Offord theorems of additive combinatorics, which give reasonably sharp necessary conditions for a discrete random walk to concentrate in a small ball.
- The circular law. One interesting application of the above theory is to extend the circular law for the spectrum of random matrices from the continuous case to the discrete case. Previous arguments of Girko and Bai for the continuous case can be transplanted to the discrete case, but the key new ingredient needed is a least singular value bound for shifted matrices in order to avoid the spectrum being overwhelmed by pseudospectrum. It turns out that the results of the preceding lecture are almost precisely what are needed to accomplish this.
[Update, Mar 31: first lecture slides corrected. Thanks to Yoshiyasu Ishigami for pointing out a slight inaccuracy in the text.]