In this final set of lecture notes for this course, we leave the realm of self-adjoint matrix ensembles, such as Wigner random matrices, and consider instead the simplest examples of non-self-adjoint ensembles, namely the iid matrix ensembles. (I had also hoped to discuss recent progress in eigenvalue spacing distributions of Wigner matrices, but have run out of time. For readers interested in this topic, I can recommend the recent Bourbaki exposé of Alice Guionnet.)

The basic result in this area is

Theorem 1 (Circular law)Let be an iid matrix, whose entries , are iid with a fixed (complex) distribution of mean zero and variance one. Then the spectral measure converges both in probability and almost surely to the circular law , where are the real and imaginary coordinates of the complex plane.

This theorem has a long history; it is analogous to the semi-circular law, but the non-Hermitian nature of the matrices makes the spectrum so unstable that key techniques that are used in the semi-circular case, such as truncation and the moment method, no longer work; significant new ideas are required. In the case of random gaussian matrices, this result was established by Mehta (in the complex case) and by Edelman (in the real case), as was sketched out in Notes. In 1984, Girko laid out a general strategy for establishing the result for non-gaussian matrices, which formed the base of all future work on the subject; however, a key ingredient in the argument, namely a bound on the least singular value of shifts , was not fully justified at the time. A rigorous proof of the circular law was then established by Bai, assuming additional moment and boundedness conditions on the individual entries. These additional conditions were then slowly removed in a sequence of papers by Gotze-Tikhimirov, Girko, Pan-Zhou, and Tao-Vu, with the last moment condition being removed in a paper of myself, Van Vu, and Manjunath Krishnapur.

At present, the known methods used to establish the circular law for general ensembles rely very heavily on the joint independence of all the entries. It is a key challenge to see how to weaken this joint independence assumption.

** — 1. Spectral instability — **

One of the basic difficulties present in the non-Hermitian case is *spectral instability*: small perturbations in a large matrix can lead to large fluctuations in the spectrum. In order for any sort of analytic technique to be effective, this type of instability must somehow be precluded.

The canonical example of spectral instability comes from perturbing the right shift matrix

to the matrix

for some .

The matrix is nilpotent: . Its characteristic polynomial is , and it thus has repeated eigenvalues at the origin. In contrast, obeys the equation , its characteristic polynomial is , and it thus has eigenvalues at the roots , of . Thus, even for exponentially small values of , say , the eigenvalues for can be quite far from the eigenvalues of , and can wander all over the unit disk. This is in sharp contrast with the Hermitian case, where eigenvalue inequalities such as the Weyl inequalities or Hoffman-Wielandt inequalities (Notes 3a) ensure stability of the spectrum.

One can explain the problem in terms of pseudospectrum. The only spectrum of is at the origin, so the resolvents of are finite for all non-zero . However, while these resolvents are finite, they can be extremely large. Indeed, from the nilpotent nature of we have the Neumann series

so for we see that the resolvent has size roughly , which is exponentially large in the interior of the unit disk. This exponentially large size of resolvent is consistent with the exponential instability of the spectrum:

Exercise 2Let be a square matrix, and let be a complex number. Show that if and only if there exists a perturbation of with such that has as an eigenvalue.

This already hints strongly that if one wants to rigorously prove control on the spectrum of near , one needs some sort of upper bound on , or equivalently one needs some sort of lower bound on the least singular value of .

Without such a bound, though, the instability precludes the direct use of the *truncation method*, which was so useful in the Hermitian case. In particular, there is no obvious way to reduce the proof of the circular law to the case of bounded coefficients, in contrast to the semi-circular law where this reduction follows easily from the Hoffman-Wielandt inequality (see Notes 4). Instead, we must continue working with unbounded random variables throughout the argument (unless, of course, one makes an additional decay hypothesis, such as assuming certain moments are finite; this helps explain the presence of such moment conditions in many papers on the circular law).

** — 2. Incompleteness of the moment method — **

In the Hermitian case, the moments

of a matrix can be used (in principle) to understand the distribution completely (at least, when the measure has sufficient decay at infinity. This is ultimately because the space of real polynomials is dense in various function spaces (the Weierstrass approximation theorem).

In the non-Hermitian case, the spectral measure is now supported on the complex plane rather than the real line. One still has the formula

but it is much less useful now, because the space of complex polynomials no longer has any good density properties. (For instance, the uniform closure of the space of polynomials on the unit disk is not the space of continuous functions, but rather the space of holomorphic functions that are continuous on the closed unit disk.) In particular, the moments no longer uniquely determine the spectral measure.

This can be illustrated with the shift examples given above. It is easy to see that and have vanishing moments up to order, i.e.

for . Thus we have

for . Despite this enormous number of matching moments, the spectral measures and are vastly different; the former is a Dirac mass at the origin, while the latter can be arbitrarily close to the unit circle. Indeed, even if we set *all* moments equal to zero,

for , then there are an uncountable number of possible (continuous) probability measures that could still be the (asymptotic) spectral measure : for instance, any measure which is rotationally symmetric around the origin would obey these conditions.

If one could somehow control the mixed moments

of the spectral measure, then this problem would be resolved, and one could use the moment method to reconstruct the spectral measure accurately. However, there does not appear to be any obvious way to compute this quantity; the obvious guess of works when the matrix is normal, as and then share the same basis of eigenvectors, but generically one does not expect these matrices to be normal.

Remark 3The failure of the moment method to control the spectral measure is consistent with the instability of spectral measure with respect to perturbations, because moments are stable with respect to perturbations.

Exercise 4Let be an integer, and let be an iid matrix whose entries have a fixed distribution with mean zero, variance , and with moment finite. Show that converges to zero as in expectation, in probability, and in the almost sure sense. Thus we see that converges to zero in these three senses also. This is of course consistent with the circular law, but does not come close to establishing that law, for the reasons given above.

The failure of the moment method also shows that methods of free probability (Notes 5) do not work directly. For instance, observe that for fixed , and (in the noncommutative probability space ) both converge in the sense of -moments as to that of the right shift operator on (with the trace , with being the Kronecker delta at ); but the spectral measures of and are different. Thus the spectral measure cannot be read off directly from the free probability limit.

** — 3. The logarithmic potential — **

With the moment method out of consideration, attention naturally turns to the Stieltjes transform

Even though the measure is now supported on rather than , the Stieltjes transform is still well-defined. The Plemelj formula for reconstructing spectral measure from the Stieltjes transform that was used in previous notes is no longer applicable, but there are other formulae one can use instead, in particular one has

Exercise 5Show thatin the sense of distributions, where

is the Cauchy-Riemann operator, and one interprets in a principal value sense.

One can control the Stieltjes transform quite effectively away from the origin. Indeed, for iid matrices with subgaussian entries, one can show (using the methods from Notes 3) that the operator norm of is almost surely; this, combined with (4) and Laurent expansion, tells us that almost surely converges to locally uniformly in the region , and that the spectral measure converges almost surely to zero in this region (which can of course also be deduced directly from the operator norm bound). This is of course consistent with the circular law, but is not sufficient to prove it (for instance, the above information is also consistent with the scenario in which the spectral measure collapses towards the origin). One also needs to control the Stieltjes transform inside the disk in order to fully control the spectral measure.

For this, existing methods (such as predecessor comparison) are not particularly effective (mainly because of the spectral instability, and also because of the lack of analyticity in the interior of the spectrum). Instead, one proceeds by relating the Stieltjes transform to the *logarithmic potential*

It is easy to see that is essentially the (distributional) gradient of :

and thus is related to the spectral measure by the distributional formula

where is the Laplacian. (This is of course just reflecting the fact that is the Newtonian potential in two dimensions.)

In analogy to previous continuity theorems, we have

Theorem 6 (Logarithmic potential continuity theorem)Let be a sequence of random matrices, and suppose that for almost every complex number , converges almost surely (resp. in probability) tofor some probability measure . Then converges almost surely (resp. in probability) to in the vague topology.

*Proof:* We prove the almost sure version of this theorem, and leave the convergence in probability version as an exercise.

On any bounded set in the complex plane, the functions lie in uniformly in . From Minkowski’s integral inequality, we conclude that the and are uniformly bounded in . On the other hand, almost surely the converge pointwise to . From the dominated convergence theorem this implies that converges in to zero for any ; using the uniform bound in to compare with and then sending , we conclude that converges to in . In particular, converges to in the sense of distributions; taking distributional Laplacians using (1) we obtain the claim.

Exercise 7Establish the convergence in probability version of Theorem 6.

Thus, the task of establishing the circular law then reduces to showing, for almost every , that the logarithmic potential converges (in probability or almost surely) to the right limit .

Observe that the logarithmic potential

can be rewritten as a log-determinant:

To compute this determinant, we recall that the determinant of a matrix is not only the product of its eigenvalues, but also has a magnitude equal to the product of its *singular* values:

and thus

where is the spectral measure of the matrix .

The advantage of working with this spectral measure, as opposed to the original spectral measure , is that the matrix is *self-adjoint*, and so methods such as the moment method or free probability can now be safely applied to compute the limiting spectral distribution. Indeed, Girko established that for almost every , converged both in probability and almost surely to an explicit (though slightly complicated) limiting measure in the vague topology. Formally, this implied that would converge pointwise (almost surely and in probability) to

A lengthy but straightforward computation then showed that this expression was indeed the logarithmic potential of the circular measure , so that the circular law would then follow from the logarithmic potential continuity theorem.

Unfortunately, the vague convergence of to only allows one to deduce the convergence of to for continuous and compactly supported. Unfortunately, has singularities at zero and at infinity, and so the convergence

can fail if the spectral measure sends too much of its mass to zero or to infinity.

The latter scenario can be easily excluded, either by using operator norm bounds on (when one has enough moment conditions) or even just the Frobenius norm bounds (which require no moment conditions beyond the unit variance). The real difficulty is with preventing mass from going to the origin.

The approach of Bai proceeded in two steps. Firstly, he established a polynomial lower bound

asymptotically almost surely for the least singular value of . This has the effect of capping off the integrand to be of size . Next, by using Stieltjes transform methods, the convergence of to in an appropriate metric (e.g. the Levi distance metric) was shown to be polynomially fast, so that the distance decayed like for some . The gain can safely absorb the loss, and this leads to a proof of the circular law assuming enough boundedness and continuity hypotheses to ensure the least singular value bound and the convergence rate. This basic paradigm was also followed by later works such as that of Gotze-Tikhomirov, Pan-Zhou, and Tao-Vu, with the main new ingredient being the advances in the understanding of the least singular value (Notes 7).

Unfortunately, to get the polynomial convergence rate, one needs some moment conditions beyond the zero mean and unit variance rate (e.g. finite moment for some ). In my paper with Vu and Krishnapur, we used the additional tool of the Talagrand concentration inequality to eliminate the need for the polynomial convergence. Intuitively, the point is that only a small fraction of the singular values of are going to be as small as ; most will be much larger than this, and so the bound is only going to be needed for a small fraction of the measure. To make this rigorous, it turns out to be convenient to work with a slightly different formula for the determinant magnitude of a square matrix than the product of the eigenvalues, namely the base-times-height formula

where is the row and is the span of .

Exercise 8Establish the inequalityfor any . (Hint: the middle product is the product of the singular values of the first rows of , and so one should try to use the Cauchy interlacing inequality for singular values, see Notes 3a.) Thus we see that is a variant of .

The least singular value bounds, translated in this language (with ), tell us that with high probability; this lets ignore the most dangerous values of , namely those that are equal to (say). For low values of , say for some small , one can use the moment method to get a good lower bound for the distances and the singular values, to the extent that the logarithmic singularity of no longer causes difficulty in this regime; the limit of this contribution can then be seen by moment method or Stieltjes transform techniques to be *universal* in the sense that it does not depend on the precise distribution of the components of . In the medium regime , one can use Talagrand’s inequality to show that has magnitude about , giving rise to a net contribution to of the form , which is small. Putting all this together, one can show that converges to a universal limit as (independent of the component distributions); see my paper with Vu and Krishnapur for details. As a consequence, once the circular law is established for one class of iid matrices, such as the complex gaussian random matrix ensemble, it automatically holds for all other ensembles also.

** — 4. Brown measure — **

We mentioned earlier that due to eigenvalue instability (or equivalently, due to the least singular value of shifts possibly going to zero), the moment method (and thus, by extension, free probability) was not sufficient by itself to compute the asymptotic spectral measure of non-Hermitian matrices in the large limit. However, this method can be used to give a heuristic *prediction* as to what that measure is, known as the *Brown measure*, introduced by Brown. While Brown measure is not *always* the limiting spectral measure of a sequence of matrices, it turns out in practice that this measure can (with some effort) be shown to be the limiting spectral measure in key cases. As Brown measure can be computed (again, after some effort) in many cases, this gives a general strategy towards computing asymptotic spectral measure for various ensembles.

To define Brown measure, we use the language of free probability (Notes 5). Let be a bounded element (not necessarily self-adjoint) of a non-commutative probability space , which we will assume to be tracial. To derive Brown measure, we mimic the Girko strategy used for the circular law. Firstly, for each complex number , we let be the spectral measure of the non-negative self-adjoint element .

Exercise 9Verify that the spectral measure of a positive element is automatically supported on the non-negative real axis. (Hint:Show that for any real polynomial , and use the spectral theorem.)

By the above exercise, is a compactly supported probability measure on . We then define the logarithmic potential by the formula

Note that may equal at some points.

To understand this determinant, we introduce the regularised determinant

for . From the monotone convergence theorem we see that decreases pointwise to as .

We now invoke the Gelfand-Naimark theorem and embed into the space of bounded operators on , so that we may now obtain a functional calculus. (If is not faithful, this embedding need not be injective, but this will not be an issue in what follows.) Then we can write

One can compute the first variation of :

Exercise 10Let . Show that the function is continuously differentiable withand

Then, one can compute the second variation at, say, the origin:

Exercise 11Let . Show that the function is twice continuously differentiable withand

We conclude in particular that

or equivalently

Exercise 12Show that(

Hint:Adapt the proof of Lemma 5 from Notes 5.)

We conclude that is non-negative at zero. Translating by any complex number we see that is non-negative everywhere, that is to say that is subharmonic. Taking limits we see that is subharmonic also; thus if we define the *Brown measure* of as

(cf. (1)) then is a non-negative measure.

Exercise 13Show that for , is continuously differentiable withand

and conclude that is harmonic in this region; thus Brown measure is supported in the disk . Using Green’s theorem, conclude also that Brown measure is a probability measure.

Exercise 14In a finite-dimensional non-commutative probability space , show that Brown measure is the same as spectral measure.

Exercise 15In a commutative probability space , show that Brown measure is the same as the probability distribution.

Exercise 16If is the left shift on (with the trace ), show that the Brown measure of is the uniform measure on the unit circle .

This last exercise illustrates the limitations of Brown measure for understanding asymptotic spectral measure. The shift and the perturbed shift introduced in previous sections both converge in the sense of -moments as (holding fixed) to the left shift . For non-zero , the spectral measure of does indeed converge to the Brown measure of , but for this is not the case. This illustrates a more general principle, that Brown measure is the right asymptotic limit for “generic” matrices, but not for exceptional matrices. (See this paper of Sniady for a precise formulation of this heuristic, using gaussian regularisation.)

The machinery used to establish the circular law in full generality can be used to show that Brown measure is the correct asymptotic spectral limit for other models:

Theorem 17Let be a sequence of random matrices whose entries are joint independent and with all moments uniformly bounded, with variance uniformly bounded from below, and which converges in the sense of -moments to an element of a non-commutative probability space. Then the spectral measure converges almost surely and in probability to the Brown measure of .

This theorem is essentially Theorem 1.20 of my paper with Van Vu and Manjunath Krishnapur. The main ingredients are those mentioned earlier, namely a polynomial lower bound on the least singular value, and the use of Talagrand’s inequality to control medium singular values (or medium codimension distances to subspaces). Of the two ingredients, the former is more crucial, and is much more heavily dependent at present on the joint independence hypothesis; it would be of interest to see how to obtain lower bounds on the least singular value in more general settings.

## 20 comments

Comments feed for this article

14 March, 2010 at 1:58 pm

James Smith‘if wants’ should be ‘if one wants’ after exercise 1

[Corrected, thanks – T.]15 March, 2010 at 6:17 am

Mario FigueiredoHi,

In the statement of Theorem 1, shouldn’t it be

$\mu_{\frac{1}{\sqrt{n}} M_n}$ instead of

$\mu_{\frac{1}{\sqrt{n} M_n}}$ ?

Mario.

[Corrected, thanks. -T]16 March, 2010 at 5:41 am

Américo TavaresDear Professor Terence Tao

This is a HTML query. Maybe it would be better not asking it here.

In Exercice 10 you have more than one paragraph. How do you manage to insert the border?

I use, for instance

p style=”text-align: justify; border: green 1px solid; padding: 20px;”

enclosed by “”, but it works for one paragraph only.

19 March, 2010 at 11:24 am

PifI don’t know the answer, but I know you can get it by looking at the source of the page (something like right-button of the mouse and then source).

19 March, 2010 at 4:14 pm

Américo TavaresThank you.

19 March, 2010 at 2:51 pm

Terence TaoI use a customised CSS for the paragraph formatting; see the bottom of

https://terrytao.wordpress.com/about/

23 July, 2010 at 3:02 pm

Mustafa SaidDear Terry,

I have come upon a problem in my research project , concerning random matrices, that I need some help on. In particular: Suppose that are square matrices that are independent and drawn from the uniform distribution on the sphere with respect to the Hilbert-Schmidt norm, . I want to find a lower bound on , where denotes the expectation.

22 September, 2010 at 5:33 pm

John Jiang“Weilandt” -> “Wielandt” before section 2.

[Corrected, thanks – T.]22 December, 2010 at 9:08 pm

Outliers in the spectrum of iid matrices with bounded rank permutations « What’s new[…] will almost surely be distributed according to the circular law distribution in the limit . (See these lecture notes for further discussion of this […]

2 November, 2011 at 12:11 pm

Smolin (2011) | Research Notebook[…] We can interpret the eigenvalue of this matrix valued process as the fraction of the time agent spends attending to the asset he follows most closely, the eigenvalue as the fraction of the time he spends attending to the asset he follows second most closely, and so on… Thus, we know that the distribution of attention across assets would subscribe to a Wigner Semi-Circle distribution as described here. […]

3 February, 2014 at 6:15 pm

AnonymousHow does one solve exercise 3 about the Cauchy-Riemann operator?

20 April, 2017 at 1:47 pm

keejI have a question about the meaning of “almost sure convergence” in Theorem 1 (Circular Law), but it’s a more general question as well. Are the matrices to be interpreted as the n X n corner of an infinite matrix, or as independent matrices? I’m comparing this to the law of large numbers, where almost sure convergence relies on the numbers in the beginning of the sequence being “reused,” and doesn’t work if you take means of independent samples of n numbers.

[The former. If I recall correctly, this is needed in order to obtain almost sure uniform bounds on the operator norm of the matrices. -T.]20 April, 2017 at 8:59 pm

keejThank you! I will try to check this.

20 April, 2017 at 8:57 pm

keejIn Section 2, “Incompleteness of the moment method,” the integrals involving are meant to be over , not as written, right?

Regarding Exercise 3: Even in the simplest case where is just a point mass at the origin, I am not quite sure how to make sense of as a distribution. I guess one should take the Cauchy principal value, but is there a justification for this?

[Corrected, thanks. Yes, one takes the principal value interpretation here. -T.]11 November, 2017 at 4:34 pm

tornado92I have a question about the proof of Theorem 6, specifically “using the uniform bound in to compare with and then sending $M \rightarrow \infty$ , we conclude that converges to in .”

I’m having trouble converting this part from post-rigorous to rigorous, which I guess a mathematician could easily do. Could someone spell out the details for me? I have an ugly ad-hoc proof but it doesn’t really follow the high-level reasoning in the quote, so I’m curious what was intended here.

11 November, 2017 at 4:41 pm

tornado92I didn’t mean to say it wasn’t rigorous. Maybe I should just say I didn’t follow the proof.

6 December, 2017 at 3:13 pm

keejTo prove weak convergence of probability measures, it seems it suffices to prove convergence of the fourier transforms, stieltjes transforms, log potentials, or possibly other transforms. The drawbacks to using the Stieltjes transform in this case were described nicely in this post.

I’m trying to understand the chief advantage of the log potential. Is the log potential favored here mainly because (which allows us to consider measures on )? Or are there other important benefits to using the log potential which I failed to discern?

14 December, 2017 at 2:51 pm

Terence TaoYes, this is the key identity that makes the Girko Hermitization method work (provided one can prevent the from getting too close to zero, of course.

12 January, 2018 at 9:38 am

keejWhen working with the noncommutative probability space of matrices with state , and when taking the limit as $n \to \infty$, what state does one use for the limiting probability space? It doesn’t seem like has any nice limiting version, or in other words, I don’t know how to take the “normalized trace” of an infinite dimensional matrix. How do people get around this?

In the perturbed shift example you just take the vector state for , but I’m not sure what the motivation is for this.

12 January, 2018 at 4:14 pm

Terence TaoThere is no unique choice here, but one can for instance (after choosing a non-principal ultrafilter) take an ultraproduct of the finitary noncommutative probability spaces, and take the ultralimit of the finitary states, to obtain a limiting noncommutative space (where the limit will be along the ultrafilter, at least).