In this final set of lecture notes for this course, we leave the realm of self-adjoint matrix ensembles, such as Wigner random matrices, and consider instead the simplest examples of non-self-adjoint ensembles, namely the iid matrix ensembles. (I had also hoped to discuss recent progress in eigenvalue spacing distributions of Wigner matrices, but have run out of time. For readers interested in this topic, I can recommend the recent Bourbaki exposé of Alice Guionnet.)
The basic result in this area is
Theorem 1 (Circular law) Let $M_n$ be an $n \times n$ iid matrix, whose entries $\xi_{ij}$, $1 \leq i,j \leq n$, are iid with a fixed (complex) distribution $\xi$ of mean zero and variance one. Then the spectral measure $\mu_{\frac{1}{\sqrt{n}} M_n}$ converges both in probability and almost surely to the circular law $\mu_{circ} := \frac{1}{\pi} 1_{x^2 + y^2 \leq 1}\, dx\, dy$, where $x, y$ are the real and imaginary coordinates of the complex plane.
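The statement is easy to test numerically; the following sketch (not part of the original notes, and assuming numpy is available) samples a single gaussian iid matrix and checks that the eigenvalues of $\frac{1}{\sqrt{n}} M_n$ fill out the unit disk roughly uniformly:

```python
import numpy as np

# Sample one n x n iid matrix (real gaussian entries: mean 0, variance 1)
# and inspect the eigenvalues of M_n / sqrt(n).  The circular law predicts
# they are asymptotically uniform on the unit disk.
rng = np.random.default_rng(0)
n = 500
M = rng.standard_normal((n, n))
eigs = np.linalg.eigvals(M / np.sqrt(n))

# Fraction of eigenvalues in a slightly enlarged unit disk (should be
# essentially all of them), and in the disk of radius 1/2 (should carry
# about 1/4 of the mass, since the limiting density is uniform).
frac_disk = np.mean(np.abs(eigs) <= 1.1)
frac_half = np.mean(np.abs(eigs) <= 0.5)
```

Of course, a single sample at finite $n$ only illustrates the theorem; the convergence statements concern the limit $n \to \infty$.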
This theorem has a long history; it is analogous to the semi-circular law, but the non-Hermitian nature of the matrices makes the spectrum so unstable that key techniques that are used in the semi-circular case, such as truncation and the moment method, no longer work; significant new ideas are required. In the case of random gaussian matrices, this result was established by Mehta (in the complex case) and by Edelman (in the real case), as was sketched out in Notes. In 1984, Girko laid out a general strategy for establishing the result for non-gaussian matrices, which formed the basis of all future work on the subject; however, a key ingredient in the argument, namely a bound on the least singular value of the shifts $\frac{1}{\sqrt{n}} M_n - zI$, was not fully justified at the time. A rigorous proof of the circular law was then established by Bai, assuming additional moment and boundedness conditions on the individual entries. These additional conditions were then slowly removed in a sequence of papers by Gotze-Tikhomirov, Girko, Pan-Zhou, and Tao-Vu, with the last moment condition being removed in a paper of myself, Van Vu, and Manjunath Krishnapur.
At present, the known methods used to establish the circular law for general ensembles rely very heavily on the joint independence of all the entries. It is a key challenge to see how to weaken this joint independence assumption.
— 1. Spectral instability —
One of the basic difficulties present in the non-Hermitian case is spectral instability: small perturbations in a large matrix can lead to large fluctuations in the spectrum. In order for any sort of analytic technique to be effective, this type of instability must somehow be precluded.
The canonical example of spectral instability comes from perturbing the right shift matrix
$$U_0 := \begin{pmatrix} 0 & 1 & 0 & \dots & 0 \\ 0 & 0 & 1 & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & 1 \\ 0 & 0 & 0 & \dots & 0 \end{pmatrix}$$
to the matrix
$$U_\epsilon := \begin{pmatrix} 0 & 1 & 0 & \dots & 0 \\ 0 & 0 & 1 & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & 1 \\ \epsilon & 0 & 0 & \dots & 0 \end{pmatrix}$$
for some $\epsilon > 0$.
The matrix $U_0$ is nilpotent: $U_0^n = 0$. Its characteristic polynomial is $\det(U_0 - \lambda I) = (-\lambda)^n$, and it thus has $n$ repeated eigenvalues at the origin. In contrast, $U_\epsilon$ obeys the equation $U_\epsilon^n = \epsilon I$, its characteristic polynomial is $\det(U_\epsilon - \lambda I) = (-1)^n (\lambda^n - \epsilon)$, and it thus has $n$ eigenvalues at the $n^{th}$ roots $\epsilon^{1/n} e^{2\pi i j/n}$, $j = 0, \dots, n-1$, of $\epsilon$. Thus, even for exponentially small values of $\epsilon$, say $\epsilon = 2^{-n}$, the eigenvalues of $U_\epsilon$ can be quite far from the eigenvalues of $U_0$, and can wander all over the unit disk. This is in sharp contrast with the Hermitian case, where eigenvalue inequalities such as the Weyl inequalities or Hoffman-Wielandt inequalities (Notes 3a) ensure stability of the spectrum.
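This instability is easy to see on the computer; here is a short numerical illustration (not part of the original notes; a moderate $n$ is used because the eigenvalue computation for $U_\epsilon$ is itself badly conditioned, which is exactly the phenomenon under discussion):

```python
import numpy as np

# The nilpotent right shift U_0 (ones on the superdiagonal) versus its
# perturbation U_eps, which adds eps in the bottom-left corner.
n = 20
eps = 2.0 ** (-n)          # exponentially small perturbation
U0 = np.diag(np.ones(n - 1), k=1)
Ueps = U0.copy()
Ueps[n - 1, 0] = eps

# U_0 has all n eigenvalues at the origin, while the eigenvalues of U_eps
# are the n-th roots of eps, all of modulus eps^(1/n) = 1/2 here.
eigs0 = np.linalg.eigvals(U0)
eigs_eps = np.linalg.eigvals(Ueps)
moduli = np.abs(eigs_eps)
```

A perturbation of size $2^{-20}$ thus moves every eigenvalue a distance $1/2$ from the origin.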
One can explain the problem in terms of pseudospectrum. The only spectrum of $U_0$ is at the origin, so the resolvents $(U_0 - zI)^{-1}$ of $U_0$ are finite for all non-zero $z$. However, while these resolvents are finite, they can be extremely large. Indeed, from the nilpotent nature of $U_0$ we have the Neumann series
$$(U_0 - zI)^{-1} = -\frac{1}{z} - \frac{U_0}{z^2} - \dots - \frac{U_0^{n-1}}{z^n},$$
so for $|z| < 1$ we see that the resolvent has size roughly $|z|^{-n}$, which is exponentially large in the interior of the unit disk. This exponentially large size of resolvent is consistent with the exponential instability of the spectrum:
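The $|z|^{-n}$ growth of the resolvent can be checked directly (a numerical sketch, not part of the original notes):

```python
import numpy as np

# The resolvent (U_0 - z I)^{-1} of the nilpotent shift blows up like
# |z|^{-n} inside the unit disk, even though U_0 has no spectrum there.
n = 30
U0 = np.diag(np.ones(n - 1), k=1)
z = 0.5
R = np.linalg.inv(U0 - z * np.eye(n))
resolvent_norm = np.linalg.norm(R, ord=2)

# Prediction from the Neumann series: the largest term has size |z|^{-n}.
predicted = abs(z) ** (-n)
```

For $n = 30$ and $z = 1/2$ the resolvent norm is already above $10^9$.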
Exercise 2 Let $M$ be a square matrix, and let $\lambda$ be a complex number. Show that $\|(M - \lambda I)^{-1}\|_{op} \geq 1/\epsilon$ if and only if there exists a perturbation $M + E$ of $M$ with $\|E\|_{op} \leq \epsilon$ such that $M + E$ has $\lambda$ as an eigenvalue.
This already hints strongly that if one wants to rigorously prove control on the spectrum of $\frac{1}{\sqrt{n}} M_n$ near $z$, one needs some sort of upper bound on $\|(\frac{1}{\sqrt{n}} M_n - zI)^{-1}\|_{op}$, or equivalently one needs some sort of lower bound on the least singular value $\sigma_n(\frac{1}{\sqrt{n}} M_n - zI)$ of $\frac{1}{\sqrt{n}} M_n - zI$.
Without such a bound, though, the instability precludes the direct use of the truncation method, which was so useful in the Hermitian case. In particular, there is no obvious way to reduce the proof of the circular law to the case of bounded coefficients, in contrast to the semi-circular law where this reduction follows easily from the Hoffman-Wielandt inequality (see Notes 4). Instead, we must continue working with unbounded random variables throughout the argument (unless, of course, one makes an additional decay hypothesis, such as assuming certain moments are finite; this helps explain the presence of such moment conditions in many papers on the circular law).
— 2. Incompleteness of the moment method —
In the Hermitian case, the moments
$$\frac{1}{n} \operatorname{tr} \left( \frac{1}{\sqrt{n}} M_n \right)^k = \int_{\mathbb{R}} x^k\, d\mu_{\frac{1}{\sqrt{n}} M_n}(x)$$
of a matrix can be used (in principle) to understand the distribution completely (at least when the measure $\mu_{\frac{1}{\sqrt{n}} M_n}$ has sufficient decay at infinity). This is ultimately because the space of real polynomials $P(x)$ is dense in various function spaces (the Weierstrass approximation theorem).
In the non-Hermitian case, the spectral measure is now supported on the complex plane rather than the real line. One still has the formula
$$\frac{1}{n} \operatorname{tr} \left( \frac{1}{\sqrt{n}} M_n \right)^k = \int_{\mathbb{C}} z^k\, d\mu_{\frac{1}{\sqrt{n}} M_n}(z),$$
but it is much less useful now, because the space of complex polynomials no longer has any good density properties. (For instance, the uniform closure of the space of polynomials on the unit disk is not the space of continuous functions, but rather the space of holomorphic functions that are continuous on the closed unit disk.) In particular, the moments no longer uniquely determine the spectral measure.
This can be illustrated with the shift examples given above. It is easy to see that $U_0$ and $U_\epsilon$ have vanishing moments up to $(n-1)^{st}$ order, i.e.
$$\frac{1}{n} \operatorname{tr} U_0^k = \frac{1}{n} \operatorname{tr} U_\epsilon^k = 0$$
for $k = 1, \dots, n-1$. Thus we have
$$\int_{\mathbb{C}} z^k\, d\mu_{U_0}(z) = \int_{\mathbb{C}} z^k\, d\mu_{U_\epsilon}(z) = 0$$
for $k = 1, \dots, n-1$. Despite this enormous number of matching moments, the spectral measures $\mu_{U_0}$ and $\mu_{U_\epsilon}$ are vastly different; the former is a Dirac mass at the origin, while the latter can be arbitrarily close to the unit circle. Indeed, even if we set all moments equal to zero,
$$\int_{\mathbb{C}} z^k\, d\mu = 0$$
for $k = 1, 2, \dots$, then there are an uncountable number of possible (continuous) probability measures that could still be the (asymptotic) spectral measure $\mu$: for instance, any measure which is rotationally symmetric around the origin would obey these conditions.
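The matching of moments between the two shifts can be verified directly (a numerical sketch, not part of the original notes):

```python
import numpy as np

# U_0 and U_eps share vanishing moments tr(U^k)/n = 0 for k = 1, ..., n-1,
# even though their spectra are completely different.
n = 12
eps = 1e-3
U0 = np.diag(np.ones(n - 1), k=1)
Ueps = U0.copy()
Ueps[n - 1, 0] = eps

moments0 = [np.trace(np.linalg.matrix_power(U0, k)) / n for k in range(1, n)]
moments_eps = [np.trace(np.linalg.matrix_power(Ueps, k)) / n for k in range(1, n)]

# Only at k = n do the moments finally differ:
# tr(U_eps^n)/n = eps while tr(U_0^n)/n = 0.
moment_n = np.trace(np.linalg.matrix_power(Ueps, n)) / n
```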
If one could somehow control the mixed moments
$$\int_{\mathbb{C}} z^k \overline{z}^l\, d\mu_{\frac{1}{\sqrt{n}} M_n}(z)$$
of the spectral measure, then this problem would be resolved, and one could use the moment method to reconstruct the spectral measure accurately. However, there does not appear to be any obvious way to compute this quantity; the obvious guess of $\frac{1}{n} \operatorname{tr}\left( \left( \frac{1}{\sqrt{n}} M_n \right)^k \left( \frac{1}{\sqrt{n}} M_n^* \right)^l \right)$ works when the matrix $\frac{1}{\sqrt{n}} M_n$ is normal, as $M_n$ and $M_n^*$ then share the same basis of eigenvectors, but generically one does not expect these matrices to be normal.
Remark 3 The failure of the moment method to control the spectral measure is consistent with the instability of spectral measure with respect to perturbations, because moments are stable with respect to perturbations.
Exercise 4 Let $k \geq 1$ be an integer, and let $M_n$ be an iid matrix whose entries have a fixed distribution $\xi$ with mean zero, variance $1$, and with $k^{th}$ moment finite. Show that $\frac{1}{n} \operatorname{tr} \left( \frac{1}{\sqrt{n}} M_n \right)^k$ converges to zero as $n \to \infty$ in expectation, in probability, and in the almost sure sense. Thus we see that $\int_{\mathbb{C}} z^k\, d\mu_{\frac{1}{\sqrt{n}} M_n}(z)$ converges to zero in these three senses also. This is of course consistent with the circular law, but does not come close to establishing that law, for the reasons given above.
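An empirical version of the exercise (a sketch, not part of the original notes; one sample at one value of $n$, so it only illustrates the expected smallness):

```python
import numpy as np

# Empirical check: the k-th moment (1/n) tr (M_n/sqrt(n))^k of an iid
# gaussian matrix is small for large n (it tends to zero as n -> infinity).
rng = np.random.default_rng(1)
n, k = 400, 3
M = rng.standard_normal((n, n))
W = M / np.sqrt(n)
moment = np.trace(np.linalg.matrix_power(W, k)) / n
```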
The failure of the moment method also shows that methods of free probability (Notes 5) do not work directly. For instance, observe that for fixed $\epsilon$, $U_0$ and $U_\epsilon$ (in the noncommutative probability space $(\operatorname{Mat}_n(\mathbb{C}), \frac{1}{n} \operatorname{tr})$) both converge in the sense of $*$-moments as $n \to \infty$ to that of the right shift operator on $\ell^2(\mathbb{Z})$ (with the trace $\tau(T) = \langle T e_0, e_0 \rangle$, with $e_0$ being the Kronecker delta at $0$); but the spectral measures of $U_0$ and $U_\epsilon$ are different. Thus the spectral measure cannot be read off directly from the free probability limit.
— 3. The logarithmic potential —
With the moment method out of consideration, attention naturally turns to the Stieltjes transform
$$s_n(z) := \frac{1}{n} \operatorname{tr} \left( \frac{1}{\sqrt{n}} M_n - zI \right)^{-1} = \int_{\mathbb{C}} \frac{d\mu_{\frac{1}{\sqrt{n}} M_n}(w)}{w - z}.$$
Even though the measure $\mu_{\frac{1}{\sqrt{n}} M_n}$ is now supported on $\mathbb{C}$ rather than $\mathbb{R}$, the Stieltjes transform is still well-defined. The Plemelj formula for reconstructing spectral measure from the Stieltjes transform that was used in previous notes is no longer applicable, but there are other formulae one can use instead; in particular, one has
Exercise 5 Show that
$$\mu_{\frac{1}{\sqrt{n}} M_n} = -\frac{1}{\pi} \partial_{\overline{z}} s_n(z)$$
in the sense of distributions, where
$$\partial_{\overline{z}} := \frac{1}{2} \left( \frac{\partial}{\partial x} + i \frac{\partial}{\partial y} \right)$$
is the Cauchy-Riemann operator, and one interprets the integral defining $s_n$ in a principal value sense.
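To see where such an identity comes from, one can argue formally (a sketch, relying on the standard distributional identity $\partial_{\overline{z}} \frac{1}{z} = \pi \delta_0$):

```latex
s_n(z) = \int_{\mathbb{C}} \frac{d\mu_{\frac{1}{\sqrt{n}} M_n}(w)}{w - z}
       = -\int_{\mathbb{C}} \frac{d\mu_{\frac{1}{\sqrt{n}} M_n}(w)}{z - w};
\qquad
\partial_{\overline{z}} \frac{1}{z - w} = \pi \delta_w
\;\Longrightarrow\;
\partial_{\overline{z}} s_n = -\pi\, \mu_{\frac{1}{\sqrt{n}} M_n}.
```

Making this rigorous amounts to justifying the differentiation under the integral sign in the distributional sense, which is the content of the exercise.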
One can control the Stieltjes transform quite effectively away from the origin. Indeed, for iid matrices with subgaussian entries, one can show (using the methods from Notes 3) that the operator norm of $\frac{1}{\sqrt{n}} M_n$ is $2 + o(1)$ almost surely; this, combined with the integral representation of $s_n$ and a Laurent expansion, tells us that $s_n(z)$ almost surely converges to $-1/z$ locally uniformly in the region $\{z : |z| > 2\}$, and that the spectral measure $\mu_{\frac{1}{\sqrt{n}} M_n}$ converges almost surely to zero in this region (which can of course also be deduced directly from the operator norm bound). This is of course consistent with the circular law, but is not sufficient to prove it (for instance, the above information is also consistent with the scenario in which the spectral measure collapses towards the origin). One also needs to control the Stieltjes transform inside the disk $\{z : |z| \leq 2\}$ in order to fully control the spectral measure.
For this, existing methods (such as predecessor comparison) are not particularly effective (mainly because of the spectral instability, and also because of the lack of analyticity in the interior of the spectrum). Instead, one proceeds by relating the Stieltjes transform to the logarithmic potential
$$f_n(z) := \int_{\mathbb{C}} \log|w - z|\, d\mu_{\frac{1}{\sqrt{n}} M_n}(w).$$
It is easy to see that $s_n(z)$ is essentially the (distributional) gradient of $f_n(z)$:
$$s_n(z) = -\left( \frac{\partial}{\partial x} - i \frac{\partial}{\partial y} \right) f_n(z),$$
and thus $f_n$ is related to the spectral measure by the distributional formula
$$\mu_{\frac{1}{\sqrt{n}} M_n} = \frac{1}{2\pi} \Delta f_n,$$
where $\Delta := \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}$ is the Laplacian. (This is of course just reflecting the fact that $\frac{1}{2\pi} \log|z|$ is the Newtonian potential in two dimensions.)
In analogy to previous continuity theorems, we have

Theorem 6 (Logarithmic potential continuity theorem) Let $M_n$ be a sequence of random matrices with $\|\frac{1}{\sqrt{n}} M_n\|_{op} = O(1)$ almost surely, and suppose that for almost every complex number $z$, the logarithmic potentials $f_n(z) := \int_{\mathbb{C}} \log|w - z|\, d\mu_{\frac{1}{\sqrt{n}} M_n}(w)$ converge almost surely (resp. in probability) to $f(z) := \int_{\mathbb{C}} \log|w - z|\, d\mu(w)$ for some compactly supported probability measure $\mu$. Then $\mu_{\frac{1}{\sqrt{n}} M_n}$ converges almost surely (resp. in probability) to $\mu$ in the vague topology.

We remark that the bound on the operator norm can be relaxed, for instance to a bound on the normalised Frobenius norm of $\frac{1}{\sqrt{n}} M_n$; similarly, the compact support hypothesis on $\mu$ can be relaxed to a moment condition such as $\int_{\mathbb{C}} |w|\, d\mu(w) < \infty$.
Proof: We prove the almost sure version of this theorem, and leave the convergence in probability version as an exercise.
On any bounded set $K$ in the complex plane, the functions $z \mapsto \log|w - z|$ lie in $L^2(K)$ uniformly in $w$, provided that $w$ also ranges in a bounded set (as it does here, thanks to the almost sure operator norm bound on $\frac{1}{\sqrt{n}} M_n$ and the compact support of $\mu$). From Minkowski's integral inequality, we conclude that the $f_n$ and $f$ are uniformly bounded in $L^2(K)$. On the other hand, almost surely the $f_n$ converge pointwise almost everywhere to $f$. From the dominated convergence theorem, this implies that $\min(|f_n - f|, M)$ converges to zero in $L^1(K)$ for any $M > 0$; using the uniform bound in $L^2(K)$ to compare $|f_n - f|$ with $\min(|f_n - f|, M)$ on the set where $|f_n - f|$ exceeds $M$, and then sending $M \to \infty$, we conclude that $f_n$ converges to $f$ in $L^1(K)$. In particular, $f_n$ converges to $f$ in the sense of distributions; taking distributional Laplacians and using the identity $\mu_{\frac{1}{\sqrt{n}} M_n} = \frac{1}{2\pi} \Delta f_n$, we obtain the claim.
Exercise 7 Establish the convergence in probability version of Theorem 6.
Thus, the task of establishing the circular law then reduces to showing, for almost every $z$, that the logarithmic potential $f_n(z) = \int_{\mathbb{C}} \log|w - z|\, d\mu_{\frac{1}{\sqrt{n}} M_n}(w)$ converges (in probability or almost surely) to the right limit $f(z)$.
Observe that the logarithmic potential
$$f_n(z) = \frac{1}{n} \sum_{i=1}^n \log\left| \frac{\lambda_i(M_n)}{\sqrt{n}} - z \right|$$
can be rewritten as a log-determinant:
$$f_n(z) = \frac{1}{n} \log\left| \det\left( \frac{1}{\sqrt{n}} M_n - zI \right) \right|.$$
To compute this determinant, we recall that the determinant of a matrix $A$ is not only the product of its eigenvalues, but also has a magnitude equal to the product of its singular values:
$$|\det A| = \prod_{i=1}^n |\lambda_i(A)| = \prod_{i=1}^n \sigma_i(A),$$
and thus
$$f_n(z) = \frac{1}{2} \int_0^\infty \log x\, d\nu_{n,z}(x),$$
where $\nu_{n,z}$ is the spectral measure of the matrix $\left( \frac{1}{\sqrt{n}} M_n - zI \right)^* \left( \frac{1}{\sqrt{n}} M_n - zI \right)$.
The advantage of working with this spectral measure, as opposed to the original spectral measure $\mu_{\frac{1}{\sqrt{n}} M_n}$, is that the matrix $\left( \frac{1}{\sqrt{n}} M_n - zI \right)^* \left( \frac{1}{\sqrt{n}} M_n - zI \right)$ is self-adjoint, and so methods such as the moment method or free probability can now be safely applied to compute the limiting spectral distribution. Indeed, Girko established that for almost every $z$, $\nu_{n,z}$ converged both in probability and almost surely to an explicit (though slightly complicated) limiting measure $\nu_z$ in the vague topology. Formally, this implied that $f_n(z)$ would converge pointwise (almost surely and in probability) to
$$f(z) = \frac{1}{2} \int_0^\infty \log x\, d\nu_z(x).$$
A lengthy but straightforward computation then showed that this expression was indeed the logarithmic potential of the circular measure $\mu_{circ}$, so that the circular law would then follow from the logarithmic potential continuity theorem.
Unfortunately, the vague convergence of $\nu_{n,z}$ to $\nu_z$ only allows one to deduce the convergence of $\int_0^\infty F(x)\, d\nu_{n,z}(x)$ to $\int_0^\infty F(x)\, d\nu_z(x)$ for $F$ continuous and compactly supported. The integrand $\log x$, however, has singularities at zero and at infinity, and so the convergence
$$\int_0^\infty \log x\, d\nu_{n,z}(x) \to \int_0^\infty \log x\, d\nu_z(x)$$
can fail if the spectral measure sends too much of its mass to zero or to infinity.
The latter scenario can be easily excluded, either by using operator norm bounds on $M_n$ (when one has enough moment conditions) or even just the Frobenius norm bounds (which require no moment conditions beyond the unit variance). The real difficulty is with preventing mass from going to the origin.
The approach of Bai proceeded in two steps. Firstly, he established a polynomial lower bound
$$\sigma_n\left( \frac{1}{\sqrt{n}} M_n - zI \right) \geq n^{-C}$$
asymptotically almost surely for the least singular value of $\frac{1}{\sqrt{n}} M_n - zI$. This has the effect of capping off the $\log x$ integrand to be of size $O(\log n)$. Next, by using Stieltjes transform methods, the convergence of $\nu_{n,z}$ to $\nu_z$ in an appropriate metric (e.g. the Lévy distance metric) was shown to be polynomially fast, so that the distance decayed like $O(n^{-c})$ for some $c > 0$. The $O(n^{-c})$ gain can safely absorb the $O(\log n)$ loss, and this leads to a proof of the circular law assuming enough boundedness and continuity hypotheses to ensure the least singular value bound and the convergence rate. This basic paradigm was also followed by later works such as that of Gotze-Tikhomirov, Pan-Zhou, and Tao-Vu, with the main new ingredient being the advances in the understanding of the least singular value (Notes 7).
Unfortunately, to get the polynomial convergence rate, one needs some moment conditions beyond the zero mean and unit variance hypotheses (e.g. a finite $(2+\eta)^{th}$ moment for some $\eta > 0$). In my paper with Vu and Krishnapur, we used the additional tool of the Talagrand concentration inequality to eliminate the need for the polynomial convergence. Intuitively, the point is that only a small fraction of the singular values of $\frac{1}{\sqrt{n}} M_n - zI$ are going to be as small as $n^{-C}$; most will be much larger than this, and so the $O(\log n)$ bound is only going to be needed for a small fraction of the measure. To make this rigorous, it turns out to be convenient to work with a slightly different formula for the determinant magnitude $|\det A|$ of a square matrix $A$ than the product of the eigenvalues, namely the base-times-height formula
$$|\det A| = \prod_{i=1}^n \operatorname{dist}(X_i, V_i),$$
where $X_i$ is the $i^{th}$ row of $A$ and $V_i$ is the span of $X_1, \dots, X_{i-1}$.
Exercise 8 Establish the inequality
$$\prod_{j=1}^{i} \sigma_{n-i+j}(A) \leq \prod_{j=1}^{i} \operatorname{dist}(X_j, V_j) \leq \prod_{j=1}^{i} \sigma_j(A)$$
for any $1 \leq i \leq n$. (Hint: the middle product is the product of the singular values of the first $i$ rows of $A$, and so one should try to use the Cauchy interlacing inequality for singular values, see Notes 3a.) Thus we see that $\operatorname{dist}(X_i, V_i)$ is a variant of $\sigma_i(A)$.
The least singular value bounds, translated in this language (with $A = \frac{1}{\sqrt{n}} M_n - zI$), tell us that $\operatorname{dist}(X_i, V_i) \geq n^{-C}$ with high probability; this lets one ignore the most dangerous values of $i$, namely those $i$ that are equal to $n - O(1)$ (say). For low values of $i$, say $i \leq (1 - \delta) n$ for some small $\delta > 0$, one can use the moment method to get a good lower bound for the distances and the singular values, to the extent that the logarithmic singularity of $\log x$ no longer causes difficulty in this regime; the limit of this contribution can then be seen by moment method or Stieltjes transform techniques to be universal in the sense that it does not depend on the precise distribution of the components of $M_n$. In the medium regime $(1 - \delta) n \leq i \leq n - O(1)$, one can use Talagrand's inequality to show that $\operatorname{dist}(X_i, V_i)$ has magnitude about $\sqrt{\frac{n-i}{n}}$, giving rise to a net contribution to $f_n(z)$ of size $O(\delta \log \frac{1}{\delta})$, which is small for $\delta$ small. Putting all this together, one can show that $f_n(z)$ converges to a universal limit as $n \to \infty$ (independent of the component distributions); see my paper with Vu and Krishnapur for details. As a consequence, once the circular law is established for one class of iid matrices, such as the complex gaussian random matrix ensemble, it automatically holds for all other ensembles also.
— 4. Brown measure —
We mentioned earlier that due to eigenvalue instability (or equivalently, due to the least singular value of shifts possibly going to zero), the moment method (and thus, by extension, free probability) was not sufficient by itself to compute the asymptotic spectral measure of non-Hermitian matrices in the large $n$ limit. However, this method can be used to give a heuristic prediction as to what that measure is, known as the Brown measure, introduced by Brown. While Brown measure is not always the limiting spectral measure of a sequence of matrices, it turns out in practice that this measure can (with some effort) be shown to be the limiting spectral measure in key cases. As Brown measure can be computed (again, after some effort) in many cases, this gives a general strategy towards computing asymptotic spectral measures for various ensembles.
To define Brown measure, we use the language of free probability (Notes 5). Let $u$ be a bounded element (not necessarily self-adjoint) of a non-commutative probability space $(\mathcal{A}, \tau)$, which we will assume to be tracial. To derive Brown measure, we mimic the Girko strategy used for the circular law. Firstly, for each complex number $z$, we let $\nu_z$ be the spectral measure of the non-negative self-adjoint element $(u - z)^* (u - z)$.
Exercise 9 Verify that the spectral measure of a positive element $a^* a$ is automatically supported on the non-negative real axis. (Hint: Show that $\tau\left( (a^* a) P(a^* a)^2 \right) \geq 0$ for any real polynomial $P$, and use the spectral theorem.)
By the above exercise, $\nu_z$ is a compactly supported probability measure on $[0, +\infty)$. We then define the logarithmic potential $f(z)$ by the formula
$$f(z) := \frac{1}{2} \int_0^\infty \log x\, d\nu_z(x).$$
Note that $f(z)$ may equal $-\infty$ at some points.
To understand this determinant, we introduce the regularised determinant
$$f_\epsilon(z) := \frac{1}{2} \int_0^\infty \log(x + \epsilon)\, d\nu_z(x)$$
for $\epsilon > 0$. From the monotone convergence theorem we see that $f_\epsilon(z)$ decreases pointwise to $f(z)$ as $\epsilon \to 0$.
We now invoke the Gelfand-Naimark theorem and embed $\mathcal{A}$ into the space of bounded operators on $L^2(\tau)$, so that we may now obtain a functional calculus. (If $\tau$ is not faithful, this embedding need not be injective, but this will not be an issue in what follows.) Then we can write
$$f_\epsilon(z) = \frac{1}{2} \tau\left( \log\left( (u-z)^*(u-z) + \epsilon \right) \right).$$
One can compute the first variation of $f_\epsilon$:
Exercise 10 Let $\epsilon > 0$. Show that the function $f_\epsilon$ is continuously differentiable with
$$\partial_x f_\epsilon(z) = -\operatorname{Re}\, \tau\left( \left( (u-z)^*(u-z) + \epsilon \right)^{-1} (u-z)^* \right)$$
and
$$\partial_y f_\epsilon(z) = \operatorname{Im}\, \tau\left( \left( (u-z)^*(u-z) + \epsilon \right)^{-1} (u-z)^* \right).$$
Then, one can compute the second variation at, say, the origin:

Exercise 11 Let $\epsilon > 0$. Show that the function $f_\epsilon$ is twice continuously differentiable, and that its Laplacian at the origin is given by
$$\Delta f_\epsilon(0) = 2\epsilon\, \tau\left( (u^* u + \epsilon)^{-1} (u u^* + \epsilon)^{-1} \right).$$

We conclude in particular that
$$\Delta f_\epsilon(0) = 2\epsilon\, \tau(a b),$$
where $a := (u^* u + \epsilon)^{-1}$ and $b := (u u^* + \epsilon)^{-1}$ are positive elements of $\mathcal{A}$.

Exercise 12 Show that
$$\tau(a b) \geq 0$$
whenever $a, b$ are non-negative elements of $\mathcal{A}$. (Hint: Adapt the proof of Lemma 5 from Notes 5.)
We conclude that $\Delta f_\epsilon$ is non-negative at zero. Translating $u$ by any complex number $z$, we see that $\Delta f_\epsilon$ is non-negative everywhere, that is to say that $f_\epsilon$ is subharmonic. Taking limits as $\epsilon \to 0$, we see that $f$ is subharmonic also; thus if we define the Brown measure $\mu = \mu_u$ of $u$ as
$$\mu := \frac{1}{2\pi} \Delta f$$
(cf. the distributional formula relating the logarithmic potential to spectral measure in the matrix setting), then $\mu$ is a non-negative measure.
Exercise 13 Show that for $|z| > \|u\|$, the potential $f$ is continuously differentiable with
$$\partial_x f(z) = -\operatorname{Re}\, \tau\left( (u - z)^{-1} \right)$$
and
$$\partial_y f(z) = \operatorname{Im}\, \tau\left( (u - z)^{-1} \right),$$
and conclude that $f$ is harmonic in this region; thus Brown measure is supported in the disk $\{z : |z| \leq \|u\|\}$. Using Green's theorem, conclude also that Brown measure is a probability measure.
Exercise 14 In a finite-dimensional non-commutative probability space $(\operatorname{Mat}_n(\mathbb{C}), \frac{1}{n} \operatorname{tr})$, show that Brown measure is the same as spectral measure.

Exercise 15 In a commutative probability space $(L^\infty(\Omega), \mathbb{E})$, show that Brown measure is the same as the probability distribution.
Exercise 16 If $u$ is the left shift on $\ell^2(\mathbb{Z})$ (with the trace $\tau(T) := \langle T e_0, e_0 \rangle$), show that the Brown measure of $u$ is the uniform measure on the unit circle $\{z \in \mathbb{C} : |z| = 1\}$.
This last exercise illustrates the limitations of Brown measure for understanding asymptotic spectral measure. The shift $U_0$ and the perturbed shift $U_\epsilon$ introduced in previous sections both converge in the sense of $*$-moments as $n \to \infty$ (holding $\epsilon$ fixed) to the shift $u$. For non-zero $\epsilon$, the spectral measure of $U_\epsilon$ does indeed converge to the Brown measure of $u$, but for $\epsilon = 0$ this is not the case. This illustrates a more general principle, that Brown measure is the right asymptotic limit for "generic" matrices, but not for exceptional matrices. (See this paper of Sniady for a precise formulation of this heuristic, using gaussian regularisation.)
The machinery used to establish the circular law in full generality can be used to show that Brown measure is the correct asymptotic spectral limit for other models:
Theorem 17 Let $M_n$ be a sequence of random matrices whose entries are jointly independent and with all moments uniformly bounded, with variance uniformly bounded from below, and such that $\frac{1}{\sqrt{n}} M_n$ converges in the sense of $*$-moments to an element $u$ of a non-commutative probability space. Then the spectral measure $\mu_{\frac{1}{\sqrt{n}} M_n}$ converges almost surely and in probability to the Brown measure of $u$.
This theorem is essentially Theorem 1.20 of my paper with Van Vu and Manjunath Krishnapur. The main ingredients are those mentioned earlier, namely a polynomial lower bound on the least singular value, and the use of Talagrand’s inequality to control medium singular values (or medium codimension distances to subspaces). Of the two ingredients, the former is more crucial, and is much more heavily dependent at present on the joint independence hypothesis; it would be of interest to see how to obtain lower bounds on the least singular value in more general settings.
23 comments
14 March, 2010 at 1:58 pm
James Smith
‘if wants’ should be ‘if one wants’ after exercise 1
[Corrected, thanks – T.]
15 March, 2010 at 6:17 am
Mario Figueiredo
Hi,
In the statement of Theorem 1, shouldn’t it be
$\mu_{\frac{1}{\sqrt{n}} M_n}$ instead of
$\mu_{\frac{1}{\sqrt{n} M_n}}$ ?
Mario.
[Corrected, thanks. -T]
16 March, 2010 at 5:41 am
Américo Tavares
Dear Professor Terence Tao
This is a HTML query. Maybe it would be better not asking it here.
In Exercice 10 you have more than one paragraph. How do you manage to insert the border?
I use, for instance
p style=”text-align: justify; border: green 1px solid; padding: 20px;”
enclosed by “”, but it works for one paragraph only.
19 March, 2010 at 11:24 am
Pif
I don’t know the answer, but I know you can get it by looking at the source of the page (something like right-button of the mouse and then source).
19 March, 2010 at 4:14 pm
Américo Tavares
Thank you.
19 March, 2010 at 2:51 pm
Terence Tao
I use a customised CSS for the paragraph formatting; see the bottom of
https://terrytao.wordpress.com/about/
23 July, 2010 at 3:02 pm
Mustafa Said
Dear Terry,
I have come upon a problem in my research project , concerning random matrices, that I need some help on. In particular: Suppose that
are square matrices that are independent and drawn from the uniform distribution on the sphere
with respect to the Hilbert-Schmidt norm,
. I want to find a lower bound on
, where
denotes the expectation.
22 September, 2010 at 5:33 pm
John Jiang
“Weilandt” -> “Wielandt” before section 2.
[Corrected, thanks – T.]
3 February, 2014 at 6:15 pm
Anonymous
How does one solve exercise 3 about the Cauchy-Riemann operator?
20 April, 2017 at 1:47 pm
keej
I have a question about the meaning of “almost sure convergence” in Theorem 1 (Circular Law), but it’s a more general question as well. Are the matrices to be interpreted as the n X n corner of an infinite matrix, or as independent matrices? I’m comparing this to the law of large numbers, where almost sure convergence relies on the numbers in the beginning of the sequence being “reused,” and doesn’t work if you take means of independent samples of n numbers.
[The former. If I recall correctly, this is needed in order to obtain almost sure uniform bounds on the operator norm of the matrices. -T.]
20 April, 2017 at 8:59 pm
keej
Thank you! I will try to check this.
20 April, 2017 at 8:57 pm
keej
In Section 2, “Incompleteness of the moment method,” the integrals involving
are meant to be over
, not
as written, right?
Regarding Exercise 3: Even in the simplest case where
is just a point mass at the origin, I am not quite sure how to make sense of
as a distribution. I guess one should take the Cauchy principal value, but is there a justification for this?
[Corrected, thanks. Yes, one takes the principal value interpretation here. -T.]
11 November, 2017 at 4:34 pm
tornado92
I have a question about the proof of Theorem 6, specifically “using the uniform bound in
to compare
with
and then sending $M \rightarrow \infty$ , we conclude that
converges to
in
.”
I’m having trouble converting this part from post-rigorous to rigorous, which I guess a mathematician could easily do. Could someone spell out the details for me? I have an ugly ad-hoc proof but it doesn’t really follow the high-level reasoning in the quote, so I’m curious what was intended here.
11 November, 2017 at 4:41 pm
tornado92
I didn’t mean to say it wasn’t rigorous. Maybe I should just say I didn’t follow the proof.
6 December, 2017 at 3:13 pm
keej
To prove weak convergence of probability measures, it seems it suffices to prove convergence of the fourier transforms, stieltjes transforms, log potentials, or possibly other transforms. The drawbacks to using the Stieltjes transform in this case were described nicely in this post.
I’m trying to understand the chief advantage of the log potential. Is the log potential favored here mainly because
(which allows us to consider measures on
)? Or are there other important benefits to using the log potential which I failed to discern?
14 December, 2017 at 2:51 pm
Terence Tao
Yes, this is the key identity that makes the Girko Hermitization method work (provided one can prevent the singular values from getting too close to zero, of course).
12 January, 2018 at 9:38 am
keej
When working with the noncommutative probability space of
matrices with state
, and when taking the limit as $n \to \infty$, what state does one use for the limiting probability space? It doesn’t seem like
has any nice limiting version, or in other words, I don’t know how to take the “normalized trace” of an infinite dimensional matrix. How do people get around this?
In the perturbed shift example you just take the vector state for
, but I’m not sure what the motivation is for this.
12 January, 2018 at 4:14 pm
Terence Tao
There is no unique choice here, but one can for instance (after choosing a non-principal ultrafilter) take an ultraproduct of the finitary noncommutative probability spaces, and take the ultralimit of the finitary states, to obtain a limiting noncommutative space (where the limit will be along the ultrafilter, at least).
14 June, 2020 at 2:29 am
Giovanni Barbarino
In the proof of Theorem 6, it is written that
lie uniformly in
for any
. It seems to me that it holds only when
is chosen in a bounded set, since
for big values of
.
This is important since in the next line you use Minkowski, probably as follows:
Here the integral that is inside is not uniformly bounded on
, and actually yields an unbounded function on
, so the overall integral should depend on
.
Probably I am wrongly interpreting the proof, so what is the correct line of reasoning here?
14 June, 2020 at 10:35 am
Terence Tao
Oops, I had forgotten here to impose moment conditions on the various measures involved. In this particular application the measure
and also the empirical measures
in fact all have uniform compact support, so this problem is avoided. (More generally, due to the very slow divergence of the logarithm function, virtually any moment condition will be fine here, e.g., it will be enough to obtain an almost sure bound on the Frobenius norm
.)
14 June, 2020 at 10:49 pm
Giovanni Barbarino
I think that a generalization is present in "Around the circular law" by Charles Bordenave and Djalil Chafaï, where they just assume that the relevant family of log-integrands is uniformly integrable for every $z$.
And shouldn't the Frobenius norm be greater than the operator norm? A bound on the Frobenius norm is then not a relaxation of a bound on the operator norm.
[I am bounding
, not
– T.]