Van Vu and I have recently uploaded our joint paper, “Random matrices: the circular law“, submitted to Contributions to Discrete Mathematics. In this paper we come close to fully resolving the circular law conjecture regarding the eigenvalue distribution of random matrices, for arbitrary choices of coefficient distribution.
More precisely, suppose we have an matrix for some large n, where each coefficient of is an independent identically distributed copy of a single random variable x (possibly complex-valued). x could be continuous (e.g. a Gaussian) or discrete (e.g. a Bernoulli random variable, taking values +1 and -1 with equal probability). For simplicity, let us normalise x to have mean 0 and variance 1 (in particular, the second moment is finite). This matrix will not be self-adjoint or normal, but we still expect it to be diagonalisable, with n complex eigenvalues. Heuristic arguments suggest that these eigenvalues should mostly have magnitude ; for instance, one can see this by observing that the Hilbert-Schmidt norm (a.k.a. the Frobenius norm) , which can be shown to dominate the sum of squares of the magnitudes of the eigenvalues, is of size comparable to on the average. Because of this, it is customary to normalise the matrix by ; thus let be the n complex eigenvalues of , arranged in any order.
Numerical evidence (as seen for instance here) soon reveals that these n eigenvalues appear to distribute themselves uniformly in the unit circle in the limit . This phenomenon is known as the circular law. It can be made more precise; if we define the empirical spectral distribution to be the function
then with probability 1, should converge uniformly to the uniform distribution of the unit circle, defined as
This statement is known as the circular law conjecture. In the case when x is a complex Gaussian, this law was verified by Mehta (using an explicit formula of Ginibre for the joint density function of the eigenvalues in this case). A strategy for attacking the general case was then formulated by Girko, although a fully rigorous execution of that strategy was first achieved by Bai (and then improved slightly by Bai and Silverstein). They established the circular law under the assumption that x had slightly better than bounded second moment (i.e. for some ), but more importantly that the probability density function of x in the complex plane was bounded (in particular, this ruled out all discrete random variables, such as the Bernoulli random variable). The reason for this latter restriction was in order to control the event that the matrix (or more precisely for various complex numbers z) becomes too ill-conditioned by having a very small least singular value.
In the last few years, work of Rudelson, of myself with Van Vu, and of Rudelson-Vershynin (building upon earlier work of Kahn, Komlos, and Szemerédi, and of Van and myself), have opened the way to control the condition number of random matrices even when the matrices are discrete, and so there have been a recent string of results using these techniques to extend the circular law to discrete settings. In particular, Gotze and Tikhomirov established the circular law for discrete random variables which were sub-Gaussian, which was then relaxed by Pan and Zhou to an assumption of bounded fourth moment. In our paper, we get very close to the ideal assumption of bounded second moment; we need for some . (The power of 16 in the logarithm can certainly be improved, though our methods do not allow the logarithm to be removed entirely.)
The main new difficulty that arises when relaxing the moment condition so close to the optimal one is that one begins to lose control on the largest singular value of , i.e. on the operator norm of . Under high moment assumptions (e.g. fourth moment) one can keep this operator norm bounded with reasonable probability (especially after truncating away some exceptionally large elements), but when the moment conditions are loosened, one can only bound this operator norm by a quantity bounded polynomially in n, even after truncation. This in turn causes certain metric entropy computations to become significantly more delicate, as one has to reduce the scale of the net below what one would ordinarily like to have.
The proof of the circular law can be reduced by standard methods to control on the least singular value of the matrices ; the latter is then controlled by some standard counting methods coupled with a new inverse Littlewood-Offord theorem. Let me now briefly discuss these three distinct steps.
First, the reduction to the least singular value problem. Because the matrix is not self-adjoint or normal, approaches which rely on the spectral theorem, such as the moment method (i.e. computing expressions such as for various k) do not work terribly well (except to get some crude upper bounds). Any approach based on explicit formulae also seem hopeless. Instead, since the work of Girko, all work on the circular law has proceeded through an analysis of the Stieltjes transform
of the empirical spectral distribution . On the one hand, this rational function encodes the same information as ; on the other hand, its real part can be expressed as
where denotes the derivative with respect to the real part of z. Because of this, one can (in principle, at least) deduce the circular law from a study of the spectral distribution of . The key advantage in doing so is that this latter matrix is self-adjoint (i.e. Hermitian), which makes many additional tools available (not just the moment method, but also various perturbation methods, since the spectrum of a self-adjoint matrix is very stable, whereas the spectrum of an arbitrary matrix is not [due to the influence of the pseudospectrum]). As it turns out, one particularly useful tool is useful is the interspersing property: given an minor of a self-adjoint matrix, the eigenvalues of the former intersperse the eigenvalues of the latter. (This is a nice exercise to derive from the min-max characterisation of eigenvalues of self-adjoint matrices. It also shows up in work on Horn’s conjecture on sums of Hermitian matrices, but that is a whole story in itself.)
Anyway, using all this machinery one can formally derive the circular law by various (mostly algebraic) manipulations, but there is a significant analytical issue that has to be addressed. In order to exploit (1), one has to handle the two singularities of the logarithm function, one at 0 and the other at infinity. The singularity at infinity is not difficult to deal with, because one has reasonable bounds on the size of ; but the singularity at 0 causes a lot of trouble. Basically, one can only control the effect of this singularity if one has good control on the least singular value of the matrix for generic values of z (of course, things are bad when z is an eigenvalue; the problem is if there are also large regions of pseudospectrum in which this matrix has very large inverse even though z is far from any eigenvalue).
So matters reduce to that of controlling the least singular value of . To simplify the discussion let us just look at the z=0 case, which already contains most of the main ideas. Here, we have to obtain good upper bounds for probabilities such as
for various choices of the parameter . If one had only a single unit vector x to deal with, one could easily use a variety of techniques to estimate this probability; the trouble is that we have to consider all possible values of x at once.
There are two main strategies for dealing with expressions such as (2), which I call the entropy strategy and the swapping strategy. The entropy strategy proceeds by using some operator norm bound on to discretise the space of all possible x, so that x now only ranges over a finite set. One then gets as good an estimate as one can for each individual x in that set, and sums in x. (There is a slightly more sophisticated multi-scale version of this idea, known as the method of generic chaining, but it turns out we do not need that method here.) This method works well as long as the metric entropy (a.k.a. covering numbers) of the set of x which one has to deal with is small. The swapping method, in contrast, is much more indirect; it uses various manipulations using conditional expectation (and heavy exploitation of independence) to compare probabilities such as (2) in terms of fractions of other probabilities which can be more easily controlled, for instance using the trivial bound that any probability is bounded above by 1. The swapping method was used heavily in some earlier work (for instance in these two papers of Van and myself) but ends up not being very effective here (especially once one reinstates the perturbation for arbitrary z); we only use it to handle those vectors x which are “poor” in the sense that x is unlikely to be close to orthogonal to any of the rows of (and thus is unlikely to be small).
To deal with the “rich” vectors, we are thus left with the entropy method, and so we need to control the covering numbers of the space of rich vectors. This problem falls under the purview of inverse Littlewood-Offord theory, which seeks to classify those coefficients for which the random walk is unusually concentrated (e.g. it hits a small ball with high probability), where are iid copies of x. (In contrast, the more classical forward Littlewood-Offord theory starts with some assumptions on these coefficients , and then bounds the “small ball probability”, which measures the extent to which the random walk can concentrate.) This theory is somewhat analogous to the inverse sumset theory developed by Freiman, Ruzsa, and others, and indeed Fourier analysis and the geometry of numbers are used quite heavily in this subject. Here, we need to develop some fairly precise (but technical) inverse Littlewood-Offord theorems in order to obtain an acceptable entropy bound; roughly speaking, we show that most of the coefficients are efficiently contained within a generalised arithmetic progression of bounded rank, with nearly sharp bounds on the size of the progression. One key difficulty is that the variable x is fairly arbitrary; since the information one extracts from Fourier-analytic methods depends heavily on the characteristic function of x, one has to manipulate this information using very general geometry-of-numbers tools, requiring a number of rather technical lemmas (for instance, we need to locate large lacunary sets inside generalised arithmetic progressions).