Van Vu and I have recently uploaded our joint paper, “Random matrices: the circular law“, submitted to Contributions to Discrete Mathematics. In this paper we come close to fully resolving the circular law conjecture regarding the eigenvalue distribution of random matrices, for arbitrary choices of coefficient distribution.
More precisely, suppose we have an $n \times n$ matrix $N_n$ for some large n, where each coefficient $x_{ij}$ of $N_n$ is an independent identically distributed copy of a single random variable x (possibly complex-valued). x could be continuous (e.g. a Gaussian) or discrete (e.g. a Bernoulli random variable, taking values +1 and -1 with equal probability). For simplicity, let us normalise x to have mean 0 and variance 1 (in particular, the second moment is finite). This matrix will not be self-adjoint or normal, but we still expect it to be diagonalisable, with n complex eigenvalues. Heuristic arguments suggest that these eigenvalues should mostly have magnitude $O(\sqrt{n})$; for instance, one can see this by observing that the Hilbert-Schmidt norm (a.k.a. the Frobenius norm) $(\sum_{1 \leq i,j \leq n} |x_{ij}|^2)^{1/2}$, whose square can be shown to dominate the sum of squares of the magnitudes of the eigenvalues, is of size comparable to $n$ on the average. Because of this, it is customary to normalise the matrix by $\frac{1}{\sqrt{n}}$; thus let $\lambda_1, \ldots, \lambda_n$ be the n complex eigenvalues of $\frac{1}{\sqrt{n}} N_n$, arranged in any order.
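As a quick numerical sanity check of this heuristic (my own illustration, not from the paper), one can sample a Bernoulli matrix and verify both the normalisation and the fact that the squared Frobenius norm dominates the sum of squared eigenvalue magnitudes:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
# Bernoulli entries: +1 or -1 with equal probability (mean 0, variance 1)
N = rng.choice([-1.0, 1.0], size=(n, n))
A = N / np.sqrt(n)            # the normalised matrix
lam = np.linalg.eigvals(A)    # n complex eigenvalues

fro_sq = np.linalg.norm(A, 'fro') ** 2   # equals n for Bernoulli entries
print(fro_sq)                            # comparable to n = 500
print(np.sum(np.abs(lam) ** 2))          # dominated by fro_sq (via the Schur decomposition)
print(np.max(np.abs(lam)))               # spectral radius, close to 1
```

Before normalisation the eigenvalues have magnitude on the order of $\sqrt{n}$; after dividing by $\sqrt{n}$ the spectral radius settles near 1.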
Numerical evidence (as seen for instance here) soon reveals that these n eigenvalues appear to distribute themselves uniformly in the unit circle in the limit $n \to \infty$. This phenomenon is known as the circular law. It can be made more precise; if we define the empirical spectral distribution $\mu_n = \mu_n(x,y)$ to be the function

$\displaystyle \mu_n(x,y) := \frac{1}{n} \# \{ 1 \leq i \leq n: \hbox{Re}(\lambda_i) \leq x, \hbox{Im}(\lambda_i) \leq y \},$

then with probability 1, $\mu_n$ should converge uniformly to the uniform distribution $\mu_\infty$ of the unit circle, defined as

$\displaystyle \mu_\infty(x,y) := \frac{1}{\pi} \hbox{mes}( \{ z: |z| \leq 1, \hbox{Re}(z) \leq x, \hbox{Im}(z) \leq y \} ).$
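As a numerical illustration of the circular law (my own sketch, not from the paper): a disk of radius $r < 1$ should asymptotically capture an $r^2$ fraction of the eigenvalues, since that is its share of the area of the unit disk.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
A = rng.choice([-1.0, 1.0], size=(n, n)) / np.sqrt(n)
lam = np.linalg.eigvals(A)

# Under the circular law, the disk of radius 1/2 should contain
# about (1/2)^2 = 25% of the eigenvalues.
frac_half = np.mean(np.abs(lam) <= 0.5)
print(frac_half)  # close to 0.25
```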
This statement is known as the circular law conjecture. In the case when x is a complex Gaussian, this law was verified by Mehta (using an explicit formula of Ginibre for the joint density function of the eigenvalues in this case). A strategy for attacking the general case was then formulated by Girko, although a fully rigorous execution of that strategy was first achieved by Bai (and then improved slightly by Bai and Silverstein). They established the circular law under the assumption that x had slightly better than bounded second moment (i.e. ${\bf E} |x|^{2+\eta} < \infty$ for some $\eta > 0$), but more importantly that the probability density function of x in the complex plane was bounded (in particular, this ruled out all discrete random variables, such as the Bernoulli random variable). The reason for this latter restriction was in order to control the event that the matrix $\frac{1}{\sqrt{n}} N_n$ (or more precisely $\frac{1}{\sqrt{n}} N_n - zI$ for various complex numbers z) becomes too ill-conditioned by having a very small least singular value.
In the last few years, work of Rudelson, of myself with Van Vu, and of Rudelson-Vershynin (building upon earlier work of Kahn, Komlós, and Szemerédi, and of Van and myself) has opened the way to control the condition number of random matrices even when the matrices are discrete, and so there has been a recent string of results using these techniques to extend the circular law to discrete settings. In particular, Götze and Tikhomirov established the circular law for discrete random variables which were sub-Gaussian, which was then relaxed by Pan and Zhou to an assumption of bounded fourth moment. In our paper, we get very close to the ideal assumption of bounded second moment; we need ${\bf E} |x|^2 \log^C(2+|x|) < \infty$ for some $C > 16$. (The power of 16 in the logarithm can certainly be improved, though our methods do not allow the logarithm to be removed entirely.)
The main new difficulty that arises when relaxing the moment condition so close to the optimal one is that one begins to lose control on the largest singular value of $\frac{1}{\sqrt{n}} N_n$, i.e. on the operator norm of $\frac{1}{\sqrt{n}} N_n$. Under high moment assumptions (e.g. fourth moment) one can keep this operator norm bounded with reasonable probability (especially after truncating away some exceptionally large elements), but when the moment conditions are loosened, one can only bound this operator norm by a quantity growing polynomially in n, even after truncation. This in turn causes certain metric entropy computations to become significantly more delicate, as one has to reduce the scale $\epsilon$ of the net below what one would ordinarily like to have.
The proof of the circular law can be reduced by standard methods to control on the least singular value of the matrices $\frac{1}{\sqrt{n}} N_n - zI$; the latter is then controlled by some standard counting methods coupled with a new inverse Littlewood-Offord theorem. Let me now briefly discuss these three distinct steps.
First, the reduction to the least singular value problem. Because the matrix $\frac{1}{\sqrt{n}} N_n$ is not self-adjoint or normal, approaches which rely on the spectral theorem, such as the moment method (i.e. computing expressions such as $\hbox{tr}( (\frac{1}{\sqrt{n}} N_n)^k )$ for various k), do not work terribly well (except to get some crude upper bounds). Any approach based on explicit formulae also seems hopeless. Instead, since the work of Girko, all work on the circular law has proceeded through an analysis of the Stieltjes transform

$\displaystyle s_n(z) := \frac{1}{n} \hbox{tr}\left( \left( \frac{1}{\sqrt{n}} N_n - zI \right)^{-1} \right)$

of the empirical spectral distribution $\mu_n$. On the one hand, this rational function encodes the same information as $\mu_n$; on the other hand, its real part can be expressed as

$\displaystyle \hbox{Re}(s_n(z)) = -\frac{1}{2n} \frac{\partial}{\partial x} \log \left| \det\left( \frac{1}{\sqrt{n}} N_n - zI \right) \right|^2 \ \ \ \ (1)$

where $\frac{\partial}{\partial x}$ denotes the derivative with respect to the real part $x$ of z. Because of this, one can (in principle, at least) deduce the circular law from a study of the spectral distribution of the self-adjoint matrix $\left( \frac{1}{\sqrt{n}} N_n - zI \right)^* \left( \frac{1}{\sqrt{n}} N_n - zI \right)$. The key advantage in doing so is that this latter matrix is self-adjoint (i.e. Hermitian), which makes many additional tools available (not just the moment method, but also various perturbation methods, since the spectrum of a self-adjoint matrix is very stable, whereas the spectrum of an arbitrary matrix is not [due to the influence of the pseudospectrum]). As it turns out, one particularly useful tool is the interspersing property: given an $(n-1) \times (n-1)$ minor of a self-adjoint $n \times n$ matrix, the eigenvalues of the former intersperse the eigenvalues of the latter. (This is a nice exercise to derive from the min-max characterisation of eigenvalues of self-adjoint matrices. It also shows up in work on Horn’s conjecture on sums of Hermitian matrices, but that is a whole story in itself.)
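The interspersing (Cauchy interlacing) property is easy to check numerically; the sketch below (my own illustration) draws a random Hermitian matrix and verifies that the eigenvalues of its top-left minor interlace those of the full matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8
# random self-adjoint (Hermitian) matrix
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
H = (B + B.conj().T) / 2

mu = np.sort(np.linalg.eigvalsh(H))            # eigenvalues of the full matrix
nu = np.sort(np.linalg.eigvalsh(H[:-1, :-1]))  # eigenvalues of the (n-1) x (n-1) minor

# Cauchy interlacing: mu_1 <= nu_1 <= mu_2 <= ... <= nu_{n-1} <= mu_n
for i in range(n - 1):
    assert mu[i] - 1e-10 <= nu[i] <= mu[i + 1] + 1e-10
print("interlacing verified")
```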
Anyway, using all this machinery one can formally derive the circular law by various (mostly algebraic) manipulations, but there is a significant analytical issue that has to be addressed. In order to exploit (1), one has to handle the two singularities of the logarithm function, one at 0 and the other at infinity. The singularity at infinity is not difficult to deal with, because one has reasonable bounds on the size of $\frac{1}{\sqrt{n}} N_n - zI$; but the singularity at 0 causes a lot of trouble. Basically, one can only control the effect of this singularity if one has good control on the least singular value of the matrix $\frac{1}{\sqrt{n}} N_n - zI$ for generic values of z (of course, things are bad when z is an eigenvalue; the problem is if there are also large regions of pseudospectrum in which this matrix has a very large inverse even though z is far from any eigenvalue).
So matters reduce to that of controlling the least singular value of $\frac{1}{\sqrt{n}} N_n - zI$. To simplify the discussion let us just look at the z=0 case, which already contains most of the main ideas. Here, we have to obtain good upper bounds for probabilities such as

$\displaystyle {\bf P}( \| N_n x \| \leq \epsilon \sqrt{n} \hbox{ for some unit vector } x ) \ \ \ \ (2)$

for various choices of the parameter $\epsilon$. If one had only a single unit vector x to deal with, one could easily use a variety of techniques to estimate this probability; the trouble is that we have to consider all possible values of x at once.
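Note that the event in (2) is precisely the event that the least singular value of $N_n$ is at most $\epsilon \sqrt{n}$, since $\min_{\|x\|=1} \|N_n x\|$ equals the least singular value. A quick experiment (my own illustration, with an arbitrarily chosen small $\epsilon$) shows this quantity staying well away from zero for Bernoulli matrices:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
eps = 1e-6

for _ in range(20):
    N = rng.choice([-1.0, 1.0], size=(n, n))
    # min over unit vectors x of ||N x|| is the least singular value
    s_min = np.linalg.svd(N, compute_uv=False)[-1]
    # the event in (2) fails for this sample: s_min exceeds eps * sqrt(n)
    assert s_min > eps * np.sqrt(n)
print("all samples well-conditioned at this scale")
```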
There are two main strategies for dealing with expressions such as (2), which I call the entropy strategy and the swapping strategy. The entropy strategy proceeds by using some operator norm bound on $N_n$ to discretise the space of all possible x, so that x now only ranges over a finite set. One then gets as good an estimate as one can for each individual x in that set, and sums in x. (There is a slightly more sophisticated multi-scale version of this idea, known as the method of generic chaining, but it turns out we do not need that method here.) This method works well as long as the metric entropy (a.k.a. covering numbers) of the set of x which one has to deal with is small. The swapping method, in contrast, is much more indirect; it uses various manipulations involving conditional expectation (and heavy exploitation of independence) to bound probabilities such as (2) in terms of fractions of other probabilities which can be more easily controlled, for instance using the trivial bound that any probability is bounded above by 1. The swapping method was used heavily in some earlier work (for instance in these two papers of Van and myself) but ends up not being very effective here (especially once one reinstates the perturbation $-zI$ for arbitrary z); we only use it to handle those vectors x which are “poor” in the sense that x is unlikely to be close to orthogonal to any of the rows of $N_n$ (and thus $\|N_n x\|$ is unlikely to be small).
To deal with the “rich” vectors, we are thus left with the entropy method, and so we need to control the covering numbers of the space of rich vectors. This problem falls under the purview of inverse Littlewood-Offord theory, which seeks to classify those coefficients $v_1, \ldots, v_n$ for which the random walk $x_1 v_1 + \ldots + x_n v_n$ is unusually concentrated (e.g. it hits a small ball with high probability), where $x_1, \ldots, x_n$ are iid copies of x. (In contrast, the more classical forward Littlewood-Offord theory starts with some assumptions on these coefficients $v_1, \ldots, v_n$, and then bounds the “small ball probability”, which measures the extent to which the random walk can concentrate.) This theory is somewhat analogous to the inverse sumset theory developed by Freiman, Ruzsa, and others, and indeed Fourier analysis and the geometry of numbers are used quite heavily in this subject. Here, we need to develop some fairly precise (but technical) inverse Littlewood-Offord theorems in order to obtain an acceptable entropy bound; roughly speaking, we show that most of the coefficients $v_1, \ldots, v_n$ are efficiently contained within a generalised arithmetic progression of bounded rank, with nearly sharp bounds on the size of the progression. One key difficulty is that the variable x is fairly arbitrary; since the information one extracts from Fourier-analytic methods depends heavily on the characteristic function of x, one has to manipulate this information using very general geometry-of-numbers tools, requiring a number of rather technical lemmas (for instance, we need to locate large lacunary sets inside generalised arithmetic progressions).
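The forward direction of this dichotomy is easy to see experimentally (my own toy illustration, with hypothetical coefficient choices): additively structured coefficients concentrate the signed random walk far more than spread-out ones.

```python
from itertools import product

def small_ball_prob_at_zero(v):
    """P( eps_1 v_1 + ... + eps_n v_n = 0 ) over uniform signs eps_i = +-1,
    computed by exhaustive enumeration."""
    n = len(v)
    hits = sum(1 for eps in product((-1, 1), repeat=n)
               if sum(e * c for e, c in zip(eps, v)) == 0)
    return hits / 2 ** n

n = 12
p_structured = small_ball_prob_at_zero([1] * n)            # maximal additive structure
p_spread = small_ball_prob_at_zero(list(range(1, n + 1)))  # distinct coefficients

# Equal coefficients give probability C(12,6)/2^12 ~ 0.226 (the n^{-1/2} regime),
# while distinct coefficients concentrate far less (the n^{-3/2} regime of Erdos).
print(p_structured, p_spread)
```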
28 comments
23 August, 2007 at 10:23 am
Mark Meckes
How important is the assumption that the matrix entries are identically distributed? Is it vital to (some of) the methods used here, or is it really a technical convenience?
23 August, 2007 at 10:57 am
Terence Tao
Dear Mark,
One can partially relax the hypothesis of identical distribution; it is enough to assume that the $x_{ij}$ are independent, and are “Fourier-dominated” by another random variable y which has bounded moments and nonzero variance. By Fourier-dominated, I mean that the characteristic functions of the $x_{ij}$ are uniformly controlled by the characteristic function of y, for all i, j and at all frequencies. This generalisation is more or less automatic from our methods, as the iid hypothesis is primarily exploited via Fourier-analytic techniques.
As a concrete example, we can handle the case in which all of the $x_{ij}$ are supported in a fixed finite set, but have different distributions (but still all have mean zero and variance 1, of course).
On the other hand, the assumption of independence is absolutely crucial for us (though perhaps we might be able to tolerate a small number of “bad” entries; e.g. we can “freeze” some entries to be deterministic).
23 August, 2007 at 11:19 am
Mark Meckes
Dear Terry,
Of course it’s to be expected that weakening the assumption of independence is a much more complicated matter, since it can be done in so many ways. Recent work in the setting of symmetric random matrices with dependent entries includes cases in which one sees the same asymptotic spectral behavior as in the classical case (like here), or something completely new (like here).
23 August, 2007 at 11:34 am
Terence Tao
Dear Mark,
Thanks for the links! (Incidentally, Mathscinet has a useful feature called “make link” which creates a civilised and stable URL for any article, in particular it points to the article itself rather than the search query that located that article.)
In the case of discrete matrices, even the simplest non-independent case, namely that of random symmetric Bernoulli matrices, is already very difficult; I was able to show with Kevin Costello and Van Vu that those matrices are non-singular with probability 1-o(1), but to get condition number bounds (i.e. to control the least singular value) looks much more difficult. (Of course, in this particular case, the matrix is self-adjoint, and one can use moment methods to get a spectral distribution – the semicircular law, I believe – but I have no idea what happens for, say, symmetric complex discrete random matrices.)
23 August, 2007 at 6:23 pm
Jim Clarage
What form does the Circle Law take when the eigenvalues are all real?
I’m asking as a biophysicist. E.g., the dynamics of large protein molecules (large N-body) involves matrices in the atomic motions. It would seem the real-restricted form of the conjecture would allow me to make some general limiting statements about the dynamics (instead of running hours of mindless computer grid time as I do now).
Would they be evenly distributed on real interval? Or would they be weighted at origin since a disk projection on interval would weight in middle?
-signed biophysicist posting to (my favorite) math blog for first time
23 August, 2007 at 8:03 pm
Terence Tao
Dear Jim,
For random self-adjoint matrices with iid entries (which automatically have real eigenvalues), the relevant law is Wigner’s semicircular law:
http://mathworld.wolfram.com/WignersSemicircleLaw.html
In particular, there is definitely a concentration near the origin.
More generally, the theory of random self-adjoint matrices is extremely well developed, in large part due to the effectiveness of the moment method in this setting. See for instance the book by Mehta on random matrices.
If instead you are asking about the distribution of the real parts of the complex eigenvalues of a random matrix, the distribution will again obey the semicircular law, since if one projects the uniform distribution of the circular law onto the real axis, one obtains the semicircular distribution on an interval. It is a curious coincidence that this distribution comes up in two different ways.
23 August, 2007 at 8:23 pm
Jim Clarage
Dear Terry,
Thanks so much for the answer. I’ll either goto sleep now, or figure out what this implies for the dynamics of very large protein molecules.
ps., I find it wonderfully coincidental that you mentioned the “curious coincidence” between the strictly Real and the real-parts of the Complex, since of course Wigner himself stressed such
seemingly unreasonable coincidences in mathematics and natural science.
24 August, 2007 at 7:47 am
Emmanuel Kowalski
Are there any results concerning the spectrum of (normalized) random unimodular matrices, obtained for instance by random walks on $SL(n,\mathbf{Z})$, using systems of generators like elementary matrices which seem to give a decent dependency on $n$? Of course there are two parameters involved here, the length (say $k$) of the random walk, and the size $n$ of the matrix, instead of simply the size $n$, and it’s not clear if fixing $k$ and letting $n$ go to infinity will lead to something interesting. For the limit of large $k$, with fixed $n$, one might expect to obtain a limit however (related to Haar measure on $SU(n)$?) And one could imagine “mixed” limits where $k$ is a function of $n$, so that the $k$-th step is thoroughly mixed.
In another direction, I’m wondering if in the bounds for the probability that a random Bernoulli-type discrete random matrix is singular, one can expect to have similar bounds for the probability that the characteristic polynomial is, say, irreducible, or has big splitting field? This is the type of question that can be attacked by sieve in the case of random walks, using harmonic analysis (uniform spectral gaps of Cayley graphs), but for Bernoulli-type random matrices, the same approach would need to know quite precisely how the characteristic polynomial of such a matrix is distributed modulo a prime $p$, which seems interesting but not obvious at all.
24 August, 2007 at 9:25 am
Terence Tao
Dear Emmanuel,
Regarding your first question, I don’t have a direct answer, but there is some work which is in the spirit of your questions. Random walks on $SL_2(\mathbf{Z})$ are essentially the same as transfer matrices for discrete one-dimensional Schrodinger (or Dirac) operators with random potentials, and there is an immense literature on these (the buzzword here is “Anderson localization”). Presumably, much of this literature extends to $SL_n(\mathbf{Z})$ in the regime where n is fixed and k is going to infinity.
There is a small amount of literature on how close a random walk in a fixed non-abelian Lie group can get to the origin, see for instance the Kaloshin-Rodnianski paper on a conjecture of Gamburd-Jacobson-Sarnak. More recently, the work of Bourgain and Gamburd establishes rapidly converging uniform distribution properties of random walks on SU(2).
In the opposite regime in which k is fixed and n is going to infinity, then one presumably wants to use the tools of free probability, free convolution, and so forth.
As for your second question, there is some literature on what the determinant mod p of a random Bernoulli matrix looks like (in the regime where p is fixed and n goes to infinity) – as one would expect, it becomes uniformly distributed on {1,…,p-1} and has a spike at 0, and of course one can also easily figure out what is going on with the trace and a few moments, but this is nowhere near enough control on the characteristic polynomial to start addressing questions such as irreducibility. It might be possible to make some conjectures though; if the matrix is not actually singular, then one expects the characteristic polynomial modulo p to look rather generic.
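The behaviour of the determinant mod p described above is easy to simulate (my own sketch, with arbitrarily chosen parameters p = 3, n = 12):

```python
import numpy as np

rng = np.random.default_rng(5)
p, n, trials = 3, 12, 2000
counts = [0] * p

for _ in range(trials):
    N = rng.integers(0, 2, size=(n, n)) * 2 - 1   # Bernoulli +-1 entries
    # for n = 12 with +-1 entries the determinant is well below 2^53 in
    # magnitude (Hadamard bound 12^6), so the float determinant rounds exactly
    d = int(round(np.linalg.det(N)))
    counts[d % p] += 1

# expect a spike at residue 0, and roughly equal counts on the nonzero residues
print(counts)
```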
24 August, 2007 at 11:04 am
Mark Meckes
Dear Terry,
I wouldn’t say that the two different occurrences of the semicircle law are just a coincidence. Rather they illustrate an interesting phenomenon. In addressing Jim’s question you pointed out that the semicircle law approximately describes the distribution of:
1) the (necessarily real) eigenvalues of a suitably normalized random self-adjoint matrix; and
2) the real parts of the eigenvalues of $\frac{1}{\sqrt{n}} N_n$.
If we, as usual, define the real part of $N_n$ to be the matrix $\frac{1}{2} (N_n + N_n^*)$, then the real part of $N_n$ is precisely the kind of random self-adjoint matrix covered by Wigner’s theorem. In general, the real parts of the eigenvalues of a matrix $N_n$ are not the same as the eigenvalues of the real part of $N_n$. But the two different occurrences of the semicircle law show that when the matrix is large and its entries are i.i.d. random variables, those two sets of numbers are nearly the same with high probability.
24 August, 2007 at 6:24 pm
Anonymous
To Mark and Terry,
It sounds interesting. Would there be a “soft” way to show that the distributions of the real parts of the eigenvalues and the eigenvalues of the real parts are the same, without computing these distributions precisely?
27 August, 2007 at 1:19 pm
Terence Tao
Dear anonymous,
As far as I know there is no “soft” way to do this, which is a pity as it would probably provide a much simpler proof of the circular law (which, at present, requires a much more intricate argument than the corresponding argument for the semicircular law, since the method of moments works very well for the latter but not the former).
One reason for caution, though, is that once one leaves the realm of self-adjoint (or at least normal) matrices, and considers arbitrary matrices, then the spectrum becomes a very unstable concept; small perturbations in the matrix can lead to dramatic changes in the spectrum, due to the presence of pseudospectrum. Because of this, it seems hopeless to get any control on the spectrum of such matrices by purely soft arguments, unless one has some control on pseudospectrum (such as the least singular value bounds alluded to in the main post), which almost certainly requires “hard analysis” methods to establish. But perhaps if one assumed that kind of control as a “black box”, then there is scope for softer arguments to get somewhere.
6 September, 2007 at 2:09 am
Fabrizio Dabbene
Dear Terence,
some time ago I wrote a brief paper “On the Stability of Random Matrices” for a book on Unsolved Problems in Mathematical Systems and Control Theory. I was wondering if your recent research could shed a new light on these open problems in the field of control theory.
6 September, 2007 at 5:02 pm
Terence Tao
Dear Fabrizio,
There is certainly a lot of work on the spectrum of randomly perturbed matrices; for instance, the condition number and least singular value of such matrices was studied by Spielman and Teng, and then by many other authors (including ourselves). I am not so familiar with the literature regarding your specific problem, but I would imagine that one should first look at simple ensembles such as the GUE; our methods here are more complicated and mainly designed to handle discrete random matrices, which are of importance in digital applications but probably not so much in analog control theory situations.
8 September, 2007 at 8:53 am
Djalil
Dear Terence & Fabrizio
It seems that the polynomial bound on the smallest singular value allows one to handle the non-central case: http://arxiv.org/abs/0709.0036
I used the logarithmic potential approach. I do not know if the Fourier-Stieltjes approach of Girko & Bai allows one to do the same.
The non-central case gives Fabrizio an example of a random non-Hermitian perturbation of a rank one matrix…
14 January, 2008 at 12:41 pm
Mark Meckes
Regarding my last comment, I think I was too sloppy about the normalization. The eigenvalues of the real part of $\frac{1}{\sqrt{n}} N_n$ are distributed according to the semicircle law with radius $\sqrt{2}$, whereas the real parts of the eigenvalues of $\frac{1}{\sqrt{n}} N_n$ are distributed according to the semicircle law with radius 1. So maybe the two appearances of the semicircle law are a coincidence after all, although there may be some heuristic explanation for that factor of $\sqrt{2}$.
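The normalization point above can be checked numerically (my own sketch): a semicircle of radius R has second moment $R^2/4$, so the eigenvalues of the real part of $\frac{1}{\sqrt{n}} N_n$ should have second moment about 1/2 (radius $\sqrt{2}$), while the real parts of the eigenvalues of $\frac{1}{\sqrt{n}} N_n$ should have second moment about 1/4 (radius 1):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1000
A = rng.choice([-1.0, 1.0], size=(n, n)) / np.sqrt(n)

# eigenvalues of the real part (a Wigner-type self-adjoint matrix)
eig_of_re = np.linalg.eigvalsh((A + A.T) / 2)
# real parts of the (complex) eigenvalues of A itself
re_of_eig = np.linalg.eigvals(A).real

# second moments: ~1/2 for the first set, ~1/4 for the second
print(np.mean(eig_of_re ** 2), np.mean(re_of_eig ** 2))
```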
10 February, 2008 at 1:58 am
Djalil Chafaï
Dear Terence,
One of the main results of your paper with Van Vu is the polynomial control of the smallest singular value of generic random matrices with i.i.d. entries of finite variance. Is it also the case for random matrices with i.i.d. exchangeable rows? Think about the uniform law on some l_p ball, or the uniform law on Markov kernels. The circular law theorem is probably true for such random matrices (confirmed by computer simulations).
10 February, 2008 at 9:25 am
Terence Tao
Dear Djalil,
Of course, some non-degeneracy condition on the rows would be required – if the rows were all constant, for instance, then the matrix would always be singular. More generally, if X denotes a random vector with the distribution of the rows, then the arguments used by Van and myself and by others basically require that X does not concentrate too often within a small neighbourhood of a hyperplane, unless the hyperplane is of a specific form (basically, the normal vector of that hyperplane has to obey some arithmetic relations between its coordinates, up to small error, in order for concentration to be allowed). It is conceivable that such properties could be verified for specific row distributions, though we do not yet have a general abstract theorem that would identify these properties precisely and give an efficient means to test for them. (For the l_p ball, though, I would imagine that concentration of measure effects would make this ensemble sufficiently close to the fully iid ensemble that one could probably deduce results on the former from the latter without much difficulty.)
14 February, 2008 at 2:32 pm
Djalil Chafaï
Thank you for your answer, Terence. Of course, exchangeability is not enough. I wonder if there is some simple additional assumption on the correlation and marginal law in order to ensure that the law is close enough to the i.i.d. case or to a more subtle “good” case as you explained. Concentration of measure is a good idea for some class of distributions, but quite restrictive maybe. I was thinking about exchangeability as a weak replacement of i.i.d. Well, my idea is probably too naive in view of the “arithmetic hyperplane property”.
15 February, 2008 at 5:35 am
Mark Meckes
Djalil,
I haven’t seen anything like what you’re asking about for the circular law, but if you don’t already know them, the following might interest you:
“Random points in the unit ball of $l_p^n$”, preprint available here. This shows that the Marchenko-Pastur and Bai-Yin results hold for random matrices with rows uniform in an $l_p$ ball.
“A generalization of the Lindeberg principle“, article available here. This shows that the semicircle law holds for random symmetric matrices with exchangeable entries above the main diagonal.
What is the “arithmetic hyperplane property” you refer to?
15 February, 2008 at 9:36 am
Djalil Chafaï
Thanks Mark. I’ve already noticed these two interesting papers.
“Arithmetic hyperplane property” is a shortcut I used for Terence’s explanation above of the approach used by him and Van Vu for the polynomial control of the smallest singular value. See Terence’s last message above.
22 May, 2008 at 1:06 pm
Random matrices: A general approach for the least singular value problem « What’s new
[…] size and with no assumptions on x other than zero mean and unit variance. (See also my earlier blog post on this […]
6 August, 2008 at 5:44 pm
Random matrices: Universality of ESDs and the circular law « What’s new
[…] with the weak and strong law of large numbers, and in fact this law is used in the proof.) In a previous paper, we had established the same claim but under the additional assumption that the moment was finite […]
14 March, 2010 at 11:32 am
254A, Notes 8: The circular law « What’s new
[…] were then slowly removed in a sequence of papers by Götze-Tikhomirov, Girko, Pan-Zhou, and Tao-Vu, with the last moment condition being removed in a paper of myself, Van Vu, and Manjunath […]
2 January, 2013 at 7:54 am
Small ball probability and Inverse Littlewood-Offord theorems « Vuhavan's Blog
[…] discuss the connection from here to the Circular Law conjecture, but interested reader can check this link or this link or this link for more […]
10 April, 2013 at 9:48 pm
Eric Foxall
Hi,
You mention it is possible to show that the sum of squares of singular values dominates the sum of squares of the eigenvalues of a matrix. Is there an easy way to see why this should be true?
Thanks,
Eric
11 April, 2013 at 1:23 pm
Terence Tao
This can be done directly for upper-triangular matrices (in which the eigenvalues are just the diagonal entries, and the sum of squares of singular values is the Frobenius norm), and the general case then follows from the Schur decomposition.