In this final set of lecture notes for this course, we leave the realm of self-adjoint matrix ensembles, such as Wigner random matrices, and consider instead the simplest examples of non-self-adjoint ensembles, namely the iid matrix ensembles. (I had also hoped to discuss recent progress in eigenvalue spacing distributions of Wigner matrices, but have run out of time. For readers interested in this topic, I can recommend the recent Bourbaki exposé of Alice Guionnet.)
The basic result in this area is

Theorem 1 (Circular law) Let {M_n} be an {n \times n} iid matrix, whose entries {\xi_{ij}}, {1 \leq i,j \leq n} are iid with a fixed (complex) distribution {\xi_{ij} \equiv \xi} of mean zero and variance one. Then the spectral measure {\mu_{\frac{1}{\sqrt{n}} M_n}} converges both in probability and almost surely to the circular law {\mu_{circ} := \frac{1}{\pi} 1_{|x|^2+|y|^2 \leq 1}\ dx dy}, where {x, y} are the real and imaginary coordinates of the complex plane.

This theorem has a long history; it is analogous to the semi-circular law, but the non-Hermitian nature of the matrices makes the spectrum so unstable that key techniques that are used in the semi-circular case, such as truncation and the moment method, no longer work; significant new ideas are required. In the case of random gaussian matrices, this result was established by Mehta (in the complex case) and by Edelman (in the real case), as was sketched out in Notes. In 1984, Girko laid out a general strategy for establishing the result for non-gaussian matrices, which formed the basis of all subsequent work on the subject; however, a key ingredient in the argument, namely a bound on the least singular value of shifts {\frac{1}{\sqrt{n}} M_n - zI}, was not fully justified at the time. A rigorous proof of the circular law was then established by Bai, assuming additional moment and boundedness conditions on the individual entries. These additional conditions were then slowly removed in a sequence of papers by Gotze-Tikhomirov, Girko, Pan-Zhou, and Tao-Vu, with the last moment condition being removed in a paper of myself, Van Vu, and Manjunath Krishnapur.
At present, the known methods used to establish the circular law for general ensembles rely very heavily on the joint independence of all the entries. It is a key challenge to see how to weaken this joint independence assumption.

— 1. Spectral instability —

One of the basic difficulties present in the non-Hermitian case is spectral instability: small perturbations in a large matrix can lead to large fluctuations in the spectrum. In order for any sort of analytic technique to be effective, this type of instability must somehow be precluded.
The canonical example of spectral instability comes from perturbing the right shift matrix

\displaystyle  U_0 := \begin{pmatrix} 0 & 1 & 0 & \ldots & 0 \\ 0 & 0 & 1 & \ldots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \ldots & 0 \end{pmatrix}

to the matrix

\displaystyle  U_\varepsilon := \begin{pmatrix} 0 & 1 & 0 & \ldots & 0 \\ 0 & 0 & 1 & \ldots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \varepsilon & 0 & 0 & \ldots & 0 \end{pmatrix}

for some {\varepsilon > 0}.
The matrix {U_0} is nilpotent: {U_0^n = 0}. Its characteristic polynomial is {(-\lambda)^n}, and it thus has {n} repeated eigenvalues at the origin. In contrast, {U_\varepsilon} obeys the equation {U_\varepsilon^n = \varepsilon I}, its characteristic polynomial is {(-\lambda)^n - \varepsilon (-1)^n}, and it thus has {n} eigenvalues at the {n^{th}} roots {\varepsilon^{1/n} e^{2\pi i j/n}}, {j=0,\ldots,n-1} of {\varepsilon}. Thus, even for exponentially small values of {\varepsilon}, say {\varepsilon = 2^{-n}}, the eigenvalues for {U_\varepsilon} can be quite far from the eigenvalues of {U_0}, and can wander all over the unit disk. This is in sharp contrast with the Hermitian case, where eigenvalue inequalities such as the Weyl inequalities or Hoffman-Wielandt inequalities (Notes 3a) ensure stability of the spectrum.
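As a quick sanity check of the eigenvalue computation above, here is a minimal Python sketch (standard library only). The helper names `perturbed_shift` and `matvec`, and the choices {n=20}, {\varepsilon = 2^{-n}}, are purely illustrative assumptions. It verifies that each {n^{th}} root {\lambda} of {\varepsilon} is an eigenvalue of {U_\varepsilon}, with eigenvector {(1,\lambda,\ldots,\lambda^{n-1})}, and that these eigenvalues all have modulus {1/2} even though the perturbation has size {2^{-20}}.

```python
import cmath
import math

def perturbed_shift(n, eps):
    """U_eps as a list of rows: ones on the superdiagonal, eps in the bottom-left corner."""
    U = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        U[i][i + 1] = 1.0
    U[n - 1][0] = eps
    return U

def matvec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

n = 20
eps = 2.0 ** (-n)              # exponentially small perturbation
U = perturbed_shift(n, eps)
radius = eps ** (1.0 / n)      # common modulus of the n-th roots of eps

max_residual = 0.0
for j in range(n):
    lam = cmath.rect(radius, 2 * math.pi * j / n)   # eps^{1/n} e^{2 pi i j / n}
    v = [lam ** k for k in range(n)]                 # candidate eigenvector (1, lam, ..., lam^{n-1})
    Uv = matvec(U, v)
    residual = max(abs(a - lam * b) for a, b in zip(Uv, v))
    max_residual = max(max_residual, residual)

print(f"|lambda| = {radius:.6f}, largest residual of U v = lambda v: {max_residual:.2e}")
```

Setting `eps = 0.0` instead collapses every candidate eigenvalue to the origin, which is the instability described above.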
One can explain the problem in terms of pseudospectrum. The only spectrum of {U_0} is at the origin, so the resolvents {(U_0-zI)^{-1}} of {U_0} are finite for all non-zero {z}. However, while these resolvents are finite, they can be extremely large. Indeed, from the nilpotent nature of {U_0} we have the Neumann series

\displaystyle  (U_0-zI)^{-1} = -\frac{1}{z} - \frac{U_0}{z^2} - \ldots - \frac{U^{n-1}_0}{z^n}

so for {|z| < 1} we see that the resolvent has size roughly {|z|^{-n}}, which is exponentially large in the interior of the unit disk. This exponentially large size of resolvent is consistent with the exponential instability of the spectrum:

Exercise 2 Let {M} be a square matrix, and let {z} be a complex number. Show that {\| (M-zI)^{-1} \|_{op} \geq R} if and only if there exists a perturbation {M+E} of {M} with {\|E\|_{op} \leq 1/R} such that {M+E} has {z} as an eigenvalue.

This already hints strongly that if one wants to rigorously prove control on the spectrum of {M} near {z}, one needs some sort of upper bound on {\|(M-zI)^{-1}\|_{op}}, or equivalently one needs some sort of lower bound on the least singular value {\sigma_n(M-zI)} of {M-zI}.
Without such a bound, though, the instability precludes the direct use of the truncation method, which was so useful in the Hermitian case. In particular, there is no obvious way to reduce the proof of the circular law to the case of bounded coefficients, in contrast to the semi-circular law where this reduction follows easily from the Hoffman-Wielandt inequality (see Notes 4). Instead, we must continue working with unbounded random variables throughout the argument (unless, of course, one makes an additional decay hypothesis, such as assuming certain moments are finite; this helps explain the presence of such moment conditions in many papers on the circular law).
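Returning to the Neumann series for the resolvent of {U_0}, the following Python sketch (standard library only) builds the right-hand side of that series entry by entry, checks that it really inverts {U_0 - zI}, and reports the size of its top-right entry, which is {|z|^{-n}}. The function names and the sample values {n=12}, {z = 0.3+0.4i} (so {|z|=1/2}) are hypothetical choices made for this illustration.

```python
def shift_minus_z(n, z):
    """The matrix U_0 - z I, with U_0 having ones on the superdiagonal."""
    A = [[0j] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = -z
        if i + 1 < n:
            A[i][i + 1] = 1 + 0j
    return A

def neumann_resolvent(n, z):
    """-sum_{k=0}^{n-1} U_0^k / z^{k+1}; U_0^k has ones on the k-th superdiagonal."""
    R = [[0j] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            R[i][j] = -1 / z ** (j - i + 1)
    return R

def matmul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

n = 12
z = 0.3 + 0.4j                  # |z| = 1/2, inside the unit disk
A = shift_minus_z(n, z)
R = neumann_resolvent(n, z)
P = matmul(A, R)

# Largest deviation of (U_0 - zI) R from the identity matrix.
defect = max(abs(P[i][j] - (1 if i == j else 0)) for i in range(n) for j in range(n))
corner = abs(R[0][n - 1])       # top-right entry, of size |z|^{-n}

print(f"inverse defect: {defect:.2e}, corner entry: {corner:.1f}, |z|^-n: {abs(z) ** (-n):.1f}")
```

Since the operator norm dominates every entry, this gives {\| (U_0-zI)^{-1} \|_{op} \geq |z|^{-n}}, or equivalently {\sigma_n(U_0 - zI) \leq |z|^n}.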

— 2. Incompleteness of the moment method —

In the Hermitian case, the moments

\displaystyle  \frac{1}{n} \hbox{tr} (\frac{1}{\sqrt{n}} M_n)^k = \int_{\bf R} x^k\ d\mu_{\frac{1}{\sqrt{n}} M_n}(x)

of a matrix can be used (in principle) to understand the distribution {\mu_{\frac{1}{\sqrt{n}} M_n}} completely (at least, when the measure {\mu_{\frac{1}{\sqrt{n}} M_n}} has sufficient decay at infinity). This is ultimately because the space of real polynomials {P(x)} is dense in various function spaces (the Weierstrass approximation theorem).
In the non-Hermitian case, the spectral measure {\mu_{\frac{1}{\sqrt{n}} M_n}} is now supported on the complex plane rather than the real line. One still has the formula

\displaystyle  \frac{1}{n} \hbox{tr} (\frac{1}{\sqrt{n}} M_n)^k = \int_{\bf C} z^k\ d\mu_{\frac{1}{\sqrt{n}} M_n}(z)

but it is much less useful now, because the space of complex polynomials {P(z)} no longer has any good density properties. (For instance, the uniform closure of the space of polynomials on the unit disk is not the space of continuous functions, but rather the space of holomorphic functions that are continuous on the closed unit disk.) In particular, the moments no longer uniquely determine the spectral measure.
This can be illustrated with the shift examples given above. It is easy to see that {U_0} and {U_\varepsilon} have vanishing moments up to {(n-1)^{th}} order, i.e.

\displaystyle  \frac{1}{n} \hbox{tr} (\frac{1}{\sqrt{n}} U_0)^k = \frac{1}{n} \hbox{tr} (\frac{1}{\sqrt{n}} U_\varepsilon)^k = 0

for {k=1,\ldots,n-1}. Thus we have

\displaystyle  \int_{\bf C} z^k\ d\mu_{\frac{1}{\sqrt{n}} U_0}(z) = \int_{\bf C} z^k\ d\mu_{\frac{1}{\sqrt{n}} U_\varepsilon}(z) = 0

for {k=1,\ldots,n-1}. Despite this enormous number of matching moments, the spectral measures {\mu_{\frac{1}{\sqrt{n}} U_0}} and {\mu_{\frac{1}{\sqrt{n}} U_\varepsilon}} are vastly different; the former is a Dirac mass at the origin, while the latter can be arbitrarily close to the unit circle. Indeed, even if we set all moments equal to zero,

\displaystyle  \int_{\bf C} z^k\ d\mu = 0

for {k=1,2,\ldots}, then there are an uncountable number of possible (continuous) probability measures that could still be the (asymptotic) spectral measure {\mu}: for instance, any measure which is rotationally symmetric around the origin would obey these conditions.
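The vanishing of the first {n-1} moments of the perturbed shift can be confirmed directly. The Python sketch below (standard library only; the helper names and the values {n=8}, {\varepsilon=1/2} are illustrative assumptions) computes {\hbox{tr}\, U_\varepsilon^k} for {k=1,\ldots,n} by repeated multiplication. The normalisation by {\frac{1}{\sqrt{n}}} is omitted, since it only rescales each trace by {n^{-k/2}}.

```python
def perturbed_shift(n, eps):
    """U_eps as a list of rows: ones on the superdiagonal, eps in the bottom-left corner."""
    U = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        U[i][i + 1] = 1.0
    U[n - 1][0] = eps
    return U

def matmul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

n = 8
eps = 0.5
U = perturbed_shift(n, eps)

P = [row[:] for row in U]       # P holds U^k, starting with k = 1
traces = []
for k in range(1, n + 1):
    traces.append(sum(P[i][i] for i in range(n)))   # tr U^k
    P = matmul(P, U)

for k, t in enumerate(traces, start=1):
    print(f"tr U_eps^{k} = {t}")
```

Only the last trace, {\hbox{tr}\, U_\varepsilon^n = n \varepsilon}, is non-zero, reflecting the identity {U_\varepsilon^n = \varepsilon I}; the first {n-1} moments cannot distinguish {U_\varepsilon} from {U_0}.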
If one could somehow control the mixed moments

\displaystyle  \int_{\bf C} z^k \overline{z}^l\ d\mu_{\frac{1}{\sqrt{n}} M_n}(z) = \frac{1}{n} \sum_{j=1}^n (\frac{1}{\sqrt{n}} \lambda_j(M_n))^k (\frac{1}{\sqrt{n}} \overline{\lambda_j}(M_n))^l

of the spectral measure, then this problem would be resolved, and one could use the moment method to reconstruct the spectral measure accurately. However, there does not appear to be any obvious way to compute this quantity; the obvious guess of {\frac{1}{n} \hbox{tr} (\frac{1}{\sqrt{n}} M_n)^k (\frac{1}{\sqrt{n}} M_n^*)^l} works when the matrix {M_n} is normal, as {M_n} and {M_n^*} then share the same basis of eigenvectors, but generically one does not expect these matrices to be normal.

Remark 3 The failure of the moment method to control the spectral measure is consistent with the instability of spectral measure with respect to perturbations, because moments are stable with respect to perturbations.

Exercise 4 Let {k \geq 1} be an integer, and let {M_n} be an iid matrix whose entries have a fixed distribution {\xi} with mean zero, variance {1}, and with {k^{th}} moment finite. Show that {\frac{1}{n} \hbox{tr} (\frac{1}{\sqrt{n}} M_n)^k} converges to zero as {n \rightarrow \infty} in expectation, in probability, and in the almost sure sense. Thus we see that {\int_{\bf C} z^k\ d\mu_{\frac{1}{\sqrt{n}} M_n}(z)} converges to zero in these three senses also. This is of course consistent with the circular law, but does not come close to establishing that law, for the reasons given above.
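A small numerical experiment makes the conclusion of Exercise 4 plausible, though of course it proves nothing. The sketch below (standard library only) draws one iid random sign matrix and prints {\frac{1}{n} \hbox{tr} (\frac{1}{\sqrt{n}} M_n)^k} for {k=1,2,3}. The random sign distribution, the size {n=60}, the seed, and the function name `normalized_trace_moments` are all assumptions chosen for illustration.

```python
import random

def normalized_trace_moments(n, kmax, seed=0):
    """Return [(1/n) tr (M/sqrt(n))^k for k = 1..kmax] for one iid random sign matrix M."""
    rng = random.Random(seed)
    # iid random signs: mean zero, variance one (an assumed choice of distribution)
    scale = n ** -0.5
    A = [[scale * rng.choice((-1.0, 1.0)) for _ in range(n)] for _ in range(n)]
    P = [row[:] for row in A]   # P holds A^k, starting with k = 1
    moments = []
    for k in range(1, kmax + 1):
        moments.append(sum(P[i][i] for i in range(n)) / n)
        if k < kmax:
            P = [[sum(P[i][l] * A[l][j] for l in range(n)) for j in range(n)]
                 for i in range(n)]
    return moments

moments = normalized_trace_moments(n=60, kmax=3)
for k, m in enumerate(moments, start=1):
    print(f"k = {k}: normalised trace moment = {m:+.4f}")
```

All three values should come out small, in contrast with the Wigner case, where the second moment tends to {1}.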

The failure of the moment method also shows that methods of free probability (Notes 5) do not work directly. For instance, observe that for fixed {\varepsilon}, {U_0} and {U_\varepsilon} (in the noncommutative probability space {(\hbox{Mat}_n({\bf C}), \frac{1}{n} \hbox{tr})}) both converge in the sense of {*}-moments as {n \rightarrow \infty} to that of the right shift operator on {\ell^2({\bf Z})} (with the trace {\tau(T) = \langle e_0, T e_0 \rangle}, with {e_0} being the Kronecker delta at {0}); but the spectral measures of {U_0} and {U_\varepsilon} are different. Thus the spectral measure cannot be read off directly from the free probability limit.

— 3. The logarithmic potential —

With the moment method out of consideration, attention naturally turns to the Stieltjes transform

\displaystyle  s_n(z) = \frac{1}{n} \hbox{tr} (\frac{1}{\sqrt{n}} M_n - z I)^{-1} = \int_{{\bf C}} \frac{d\mu_{\frac{1}{\sqrt{n}} M_n}(w)}{w-z}.

Even though the measure {\mu_{\frac{1}{\sqrt{n}} M_n}} is now supported on {{\bf C}} rather than {{\bf R}}, the Stieltjes transform is still well-defined. The Plemelj formula for reconstructing spectral measure from the Stieltjes transform that was used in previous notes is no longer applicable, but there are other formulae one can use instead, in particular one has

Exercise 5 Show that

\displaystyle  \mu_{\frac{1}{\sqrt{n}} M_n} = -\frac{1}{\pi} \partial_{\bar z} s_n(z)

in the sense of distributions, where

\displaystyle  \partial_{\bar z} := \frac{1}{2}( \frac{\partial}{\partial x} + i \frac{\partial}{\partial y})

is the Cauchy-Riemann operator, and one interprets {\frac{1}{w-z}} in a principal value sense.

One can control the Stieltjes transform quite effectively away from the origin. Indeed, for iid matrices with subgaussian entries, one can show (using the methods from Notes 3) that the operator norm of {\frac{1}{\sqrt{n}} M_n} is {2+o(1)} almost surely; this, combined with the moment convergence of Exercise 4 and Laurent expansion, tells us that {s_n(z)} almost surely converges to {-1/z} locally uniformly in the region {\{ z: |z| > 2 \}}, and that the spectral measure {\mu_{\frac{1}{\sqrt{n}} M_n}} converges almost surely to zero in this region (which can of course also be deduced directly from the operator norm bound). This is of course consistent with the circular law, but is not sufficient to prove it (for instance, the above information is also consistent with the scenario in which the spectral measure collapses towards the origin). One also needs to control the Stieltjes transform inside the disk {\{ z: |z| \leq 2 \}} in order to fully control the spectral measure.
For this, existing methods (such as predecessor comparison) are not particularly effective (mainly because of the spectral instability, and also because of the lack of analyticity in the interior of the spectrum). Instead, one proceeds by relating the Stieltjes transform to the logarithmic potential

\displaystyle  f_n(z) := \int_{{\bf C}} \log|w-z| d\mu_{\frac{1}{\sqrt{n}} M_n}(w).

It is easy to see that {s_n(z)} is essentially the (distributional) gradient of {f_n(z)}:

\displaystyle  s_n(z) = (- \frac{\partial}{\partial x} + i \frac{\partial}{\partial y}) f_n(z),

and thus {f_n} is related to the spectral measure by the distributional formula

\displaystyle  \mu_{\frac{1}{\sqrt{n}} M_n} = \frac{1}{2\pi} \Delta f_n \ \ \ \ \ (1)

where {\Delta := \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}} is the Laplacian. (This is of course just reflecting the fact that {\frac{1}{2\pi} \log |z|} is the Newtonian potential in two dimensions.)
In analogy to previous continuity theorems, we have

Theorem 6 (Logarithmic potential continuity theorem) Let {M_n} be a sequence of random matrices with {\| \frac{1}{\sqrt{n}} M_n\|_{op} = O(1)} almost surely, and suppose that for almost every complex number {z}, {f_n(z)} converges almost surely (resp. in probability) to

\displaystyle  f(z) := \int_{\bf C} \log |z-w| d\mu(w)

for some compactly supported probability measure {\mu}. Then {\mu_{\frac{1}{\sqrt{n}} M_n}} converges almost surely (resp. in probability) to {\mu} in the vague topology.

We remark that the bound on the operator norm can be relaxed, for instance to a bound on the Frobenius norm; similarly, the compact support hypothesis on {\mu} can be relaxed to a moment condition such as {\int_{\bf C} |z|^2\ d\mu(z) < \infty}.
Proof: We prove the almost sure version of this theorem, and leave the convergence in probability version as an exercise.
On any bounded set {K} in the complex plane, the functions {\log |\cdot-w|} lie in {L^2(K)} uniformly in {w} for {w} in a bounded set. From Minkowski’s integral inequality, we conclude that the {f_n} and {f} are uniformly bounded in {L^2(K)}. On the other hand, almost surely the {f_n} converge pointwise to {f}. From the dominated convergence theorem this implies that {\min( |f_n-f|, M )} converges in {L^1(K)} to zero for any {M}; using the uniform bound in {L^2(K)} to compare {\min(|f_n-f|,M)} with {|f_n-f|} and then sending {M \rightarrow \infty}, we conclude that {f_n} converges to {f} in {L^1(K)}. In particular, {f_n} converges to {f} in the sense of distributions; taking distributional Laplacians using (1) we obtain the claim. \Box

Exercise 7 Establish the convergence in probability version of Theorem 6.

Thus, the task of establishing the circular law then reduces to showing, for almost every {z}, that the logarithmic potential {f_n(z)} converges (in probability or almost surely) to the right limit {f(z)}.
Observe that the logarithmic potential

\displaystyle  f_n(z) = \frac{1}{n} \sum_{j=1}^n \log|\frac{\lambda_j(M_n)}{\sqrt{n}}-z|

can be rewritten as a log-determinant:

\displaystyle  f_n(z) = \frac{1}{n} \log |\det( \frac{1}{\sqrt{n}} M_n - zI )|.

To compute this determinant, we recall that the determinant of a matrix {A} is not only the product of its eigenvalues, but also has a magnitude equal to the product of its singular values:

\displaystyle  |\det A| = \prod_{j=1}^n \sigma_j(A) = \prod_{j=1}^n \lambda_j(A^* A)^{1/2}

and thus

\displaystyle  f_n(z) = \frac{1}{2} \int_0^\infty \log x\ d\nu_{n,z}(x)

where {d\nu_{n,z}} is the spectral measure of the matrix {(\frac{1}{\sqrt{n}} M_n - zI)^* (\frac{1}{\sqrt{n}} M_n - zI)}.
The advantage of working with this spectral measure, as opposed to the original spectral measure {\mu_{\frac{1}{\sqrt{n}} M_n}}, is that the matrix {(\frac{1}{\sqrt{n}} M_n - zI)^* (\frac{1}{\sqrt{n}} M_n - zI)} is self-adjoint, and so methods such as the moment method or free probability can now be safely applied to compute the limiting spectral distribution. Indeed, Girko established that for almost every {z}, {\nu_{n,z}} converged both in probability and almost surely to an explicit (though slightly complicated) limiting measure {\nu_z} in the vague topology. Formally, this implied that {f_n(z)} would converge pointwise (almost surely and in probability) to

\displaystyle  \frac{1}{2} \int_0^\infty \log x\ d\nu_z(x).

A lengthy but straightforward computation then showed that this expression was indeed the logarithmic potential {f(z)} of the circular measure {\mu_{circ}}, so that the circular law would then follow from the logarithmic potential continuity theorem.
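For reference, the logarithmic potential of the circular measure has a simple closed form: {f(z) = \frac{|z|^2-1}{2}} for {|z| \leq 1} and {f(z) = \log |z|} for {|z| > 1}. (This is a standard fact, stated here without proof; inside the disk it is consistent with (1), since {\frac{1}{2\pi} \Delta \frac{|z|^2-1}{2} = \frac{1}{\pi}}.) The Python sketch below (standard library only) compares this closed form with a crude midpoint-rule approximation of {\frac{1}{\pi} \int_{|w| \leq 1} \log |w-z|\ dA(w)} in polar coordinates. The grid sizes, sample points, and function names are assumptions chosen for illustration, and the quadrature is only accurate to a few decimal places.

```python
import math

def log_potential_disk(z, nr=100, ntheta=200):
    """Midpoint-rule approximation of (1/pi) * integral over the unit disk of log|w - z| dA(w)."""
    dr = 1.0 / nr
    dtheta = 2 * math.pi / ntheta
    total = 0.0
    for i in range(nr):
        r = (i + 0.5) * dr
        for j in range(ntheta):
            theta = (j + 0.5) * dtheta
            w = complex(r * math.cos(theta), r * math.sin(theta))
            total += math.log(abs(w - z)) * r   # area element is r dr dtheta
    return total * dr * dtheta / math.pi

def closed_form(z):
    """(|z|^2 - 1)/2 inside the closed unit disk, log|z| outside."""
    return (abs(z) ** 2 - 1) / 2 if abs(z) <= 1 else math.log(abs(z))

sample_points = [0j, 0.5 + 0j, 0.3 + 0.6j, 2 + 0j]
errors = []
for z in sample_points:
    approx = log_potential_disk(z)
    exact = closed_form(z)
    errors.append(abs(approx - exact))
    print(f"z = {z}: quadrature {approx:+.4f}, closed form {exact:+.4f}")
```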
Unfortunately, the vague convergence of {\nu_{n,z}} to {\nu_z} only allows one to deduce the convergence of {\int_0^\infty F(x)\ d\nu_{n,z}(x)} to {\int_0^\infty F(x)\ d\nu_{z}(x)} for {F} continuous and compactly supported. The function {\log x}, however, has singularities at zero and at infinity, and so the convergence

\displaystyle \int_0^\infty \log x\ d\nu_{n,z}(x) \rightarrow \int_0^\infty \log x\ d\nu_{z}(x)

can fail if the spectral measure {\nu_{n,z}} sends too much of its mass to zero or to infinity.
The latter scenario can be easily excluded, either by using operator norm bounds on {M_n} (when one has enough moment conditions) or even just the Frobenius norm bounds (which require no moment conditions beyond the unit variance). The real difficulty is with preventing mass from going to the origin.
The approach of Bai proceeded in two steps. Firstly, he established a polynomial lower bound

\displaystyle  \sigma_n( \frac{1}{\sqrt{n}} M_n - z I ) \geq n^{-C}

asymptotically almost surely for the least singular value of {\frac{1}{\sqrt{n}} M_n - z I}. This has the effect of capping off the {\log x} integrand to be of size {O(\log n)}. Next, by using Stieltjes transform methods, the convergence of {\nu_{n,z}} to {\nu_z} in an appropriate metric (e.g. the Lévy distance metric) was shown to be polynomially fast, so that the distance decayed like {O(n^{-c})} for some {c>0}. The {O(n^{-c})} gain can safely absorb the {O(\log n)} loss, and this leads to a proof of the circular law assuming enough boundedness and continuity hypotheses to ensure the least singular value bound and the convergence rate. This basic paradigm was also followed by later works such as that of Gotze-Tikhomirov, Pan-Zhou, and Tao-Vu, with the main new ingredient being the advances in the understanding of the least singular value (Notes 7).
Unfortunately, to get the polynomial convergence rate, one needs some moment conditions beyond the zero mean and unit variance hypotheses (e.g. finite {(2+\eta)^{th}} moment for some {\eta>0}). In my paper with Vu and Krishnapur, we used the additional tool of the Talagrand concentration inequality to eliminate the need for the polynomial convergence. Intuitively, the point is that only a small fraction of the singular values of {\frac{1}{\sqrt{n}} M_n - zI} are going to be as small as {n^{-c}}; most will be much larger than this, and so the {O(\log n)} bound is only going to be needed for a small fraction of the measure. To make this rigorous, it turns out to be convenient to work with a slightly different formula for the determinant magnitude {|\det( A )|} of a square matrix than the product of the eigenvalues, namely the base-times-height formula

\displaystyle  |\det( A )| = \prod_{j=1}^n \hbox{dist}(X_j, V_j)

where {X_j} is the {j^{th}} row and {V_j} is the span of {X_1,\ldots,X_{j-1}}.
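The base-times-height formula is easy to test on a small example. In the Python sketch below (standard library only, real matrices only), the distances {\hbox{dist}(X_j, V_j)} are computed by Gram-Schmidt orthogonalisation of the rows and their product is compared with {|\det A|} obtained by cofactor expansion. The sample matrix and the helper names `det` and `row_distances` are illustrative assumptions.

```python
import math

def det(A):
    """Determinant by cofactor expansion along the first row (fine for small matrices)."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))

def row_distances(A):
    """dist(X_j, V_j) for each row X_j, where V_j is the span of the earlier rows."""
    basis = []   # orthonormal basis of the span of the rows processed so far
    dists = []
    for X in A:
        residual = list(X)
        for e in basis:
            c = sum(a * b for a, b in zip(residual, e))
            residual = [a - c * b for a, b in zip(residual, e)]   # remove the component along e
        d = math.sqrt(sum(a * a for a in residual))
        dists.append(d)
        if d > 1e-12:            # skip rows that already lie in the span
            basis.append([a / d for a in residual])
    return dists

A = [[2.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 4.0]]

dists = row_distances(A)
product = math.prod(dists)
print(f"distances: {[round(d, 4) for d in dists]}")
print(f"product of distances: {product:.6f}, |det A|: {abs(det(A)):.6f}")
```

For this matrix {\det A = 18}, and the product of the three distances agrees with it up to rounding.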

Exercise 8 Establish the inequality

\displaystyle  \prod_{j=n+1-m}^n \sigma_j(A) \leq \prod_{j=1}^m \hbox{dist}(X_j, V_j) \leq \prod_{j=1}^m \sigma_j(A)

for any {1 \leq m \leq n}. (Hint: the middle product is the product of the singular values of the first {m} rows of {A}, and so one should try to use the Cauchy interlacing inequality for singular values, see Notes 3a.) Thus we see that {\hbox{dist}(X_j, V_j)} is a variant of {\sigma_j(A)}.

The least singular value bounds, translated in this language (with {A := \frac{1}{\sqrt{n}} M_n - zI}), tell us that {\hbox{dist}(X_j,V_j) \geq n^{-C}} with high probability; this lets us ignore the most dangerous values of {j}, namely those {j} that are equal to {n - O(n^{0.99})} (say). For low values of {j}, say {j \leq (1-\delta) n} for some small {\delta}, one can use the moment method to get a good lower bound for the distances and the singular values, to the extent that the logarithmic singularity of {\log x} no longer causes difficulty in this regime; the limit of this contribution can then be seen by moment method or Stieltjes transform techniques to be universal in the sense that it does not depend on the precise distribution of the components of {M_n}. In the medium regime {(1-\delta)n < j < n-n^{0.99}}, one can use Talagrand’s inequality to show that {\hbox{dist}(X_j,V_j)} has magnitude about {\sqrt{n-j}}, giving rise to a net contribution to {f_n(z)} of the form {\frac{1}{n} \sum_{(1-\delta)n < j < n - n^{0.99}} O( \log \sqrt{n-j} )}, which is small. Putting all this together, one can show that {f_n(z)} converges to a universal limit as {n \rightarrow \infty} (independent of the component distributions); see my paper with Vu and Krishnapur for details. As a consequence, once the circular law is established for one class of iid matrices, such as the complex gaussian random matrix ensemble, it automatically holds for all other ensembles also.

— 4. Brown measure —

We mentioned earlier that due to eigenvalue instability (or equivalently, due to the least singular value of shifts possibly going to zero), the moment method (and thus, by extension, free probability) was not sufficient by itself to compute the asymptotic spectral measure of non-Hermitian matrices in the large {n} limit. However, this method can be used to give a heuristic prediction as to what that measure is, known as the Brown measure, introduced by Brown. While Brown measure is not always the limiting spectral measure of a sequence of matrices, it turns out in practice that this measure can (with some effort) be shown to be the limiting spectral measure in key cases. As Brown measure can be computed (again, after some effort) in many cases, this gives a general strategy towards computing asymptotic spectral measure for various ensembles.
To define Brown measure, we use the language of free probability (Notes 5). Let {u} be a bounded element (not necessarily self-adjoint) of a non-commutative probability space {({\mathcal A}, \tau)}, which we will assume to be tracial. To derive Brown measure, we mimic the Girko strategy used for the circular law. Firstly, for each complex number {z}, we let {\nu_z} be the spectral measure of the non-negative self-adjoint element {(u-z)^* (u-z)}.

Exercise 9 Verify that the spectral measure of a positive element {u^* u} is automatically supported on the non-negative real axis. (Hint: Show that {\tau( P(u^* u) u^* u P(u^* u) ) \geq 0} for any real polynomial {P}, and use the spectral theorem.)

By the above exercise, {\nu_z} is a compactly supported probability measure on {[0,+\infty)}. We then define the logarithmic potential {f(z)} by the formula

\displaystyle  f(z) = \frac{1}{2} \int_0^\infty \log x\ d\nu_z(x).

Note that {f} may equal {-\infty} at some points.
To understand this determinant, we introduce the regularised determinant

\displaystyle  f_\varepsilon(z) := \frac{1}{2} \int_0^\infty \log (\varepsilon + x)\ d\nu_z(x)

for {\varepsilon > 0}. From the monotone convergence theorem we see that {f_\varepsilon(z)} decreases pointwise to {f(z)} as {\varepsilon \rightarrow 0}.
We now invoke the Gelfand-Naimark theorem and embed {{\mathcal A}} into the space of bounded operators on {L^2(\tau)}, so that we may now obtain a functional calculus. (If {\tau} is not faithful, this embedding need not be injective, but this will not be an issue in what follows.) Then we can write

\displaystyle  f_\varepsilon(z) = \frac{1}{2} \tau( \log( \varepsilon + (u-z)^* (u-z) ) ).

One can compute the first variation of {f_\varepsilon}:

Exercise 10 Let {\varepsilon > 0}. Show that the function {f_\varepsilon} is continuously differentiable with

\displaystyle  \partial_x f_\varepsilon(z) = -\hbox{Re} \tau( (\varepsilon + (u-z)^* (u-z))^{-1} (u-z) )


\displaystyle  \partial_y f_\varepsilon(z) = -\hbox{Im} \tau( (\varepsilon + (u-z)^* (u-z))^{-1} (u-z) ).

Then, one can compute the second variation at, say, the origin:

Exercise 11 Let {\varepsilon > 0}. Show that the function {f_\varepsilon} is twice continuously differentiable with

\displaystyle  \partial_{xx} f_\varepsilon(0) = \hbox{Re} \tau( (\varepsilon + u^* u)^{-1} - (\varepsilon + u^* u)^{-1} (u+u^*) (\varepsilon + u^* u)^{-1} u )


\displaystyle  \partial_{yy} f_\varepsilon(0) = \hbox{Re} \tau( (\varepsilon + u^* u)^{-1} - (\varepsilon + u^* u)^{-1} (u^*-u) (\varepsilon + u^* u)^{-1} u ).

We conclude in particular that

\displaystyle  \Delta f_\varepsilon(0) = 2\hbox{Re} \tau( (\varepsilon + u^* u)^{-1} - (\varepsilon + u^* u)^{-1} u^* (\varepsilon + u^* u)^{-1} u )

or equivalently

\displaystyle  \Delta f_\varepsilon(0) = 2 ( \| (\varepsilon + u^* u)^{-1/2} \|_{L^2(\tau)}^2 - \| (\varepsilon + u^* u)^{-1/2} u (\varepsilon + u^* u)^{-1/2} \|_{L^2(\tau)}^2 ).

Exercise 12 Show that

\displaystyle  \| (\varepsilon + u^* u)^{-1/2} u (\varepsilon + u^* u)^{-1/2} \|_{L^2(\tau)} \leq \| (\varepsilon + u^* u)^{-1/2} \|_{L^2(\tau)}.

(Hint: Adapt the proof of Lemma 5 from Notes 5.)

We conclude that {\Delta f_\varepsilon} is non-negative at zero. Translating {u} by any complex number we see that {\Delta f_\varepsilon} is non-negative everywhere, that is to say that {f_\varepsilon} is subharmonic. Taking limits we see that {f} is subharmonic also; thus if we define the Brown measure {\mu = \mu_u} of {u} as

\displaystyle  \mu := \frac{1}{2\pi} \Delta f

(cf. (1)) then {\mu} is a non-negative measure.

Exercise 13 Show that for {|z| > \rho(u) := \rho(u^* u)^{1/2}}, {f} is continuously differentiable with

\displaystyle  \partial_x f(z) = -\hbox{Re} \tau( (u-z)^{-1} )


\displaystyle  \partial_y f(z) = \hbox{Im} \tau( (u-z)^{-1} )

and conclude that {f} is harmonic in this region; thus Brown measure is supported in the disk {\{ z: |z| \leq \rho(u) \}}. Using Green’s theorem, conclude also that Brown measure is a probability measure.

Exercise 14 In a finite-dimensional non-commutative probability space {(\hbox{Mat}_n({\bf C}), \frac{1}{n} \hbox{tr})}, show that Brown measure is the same as spectral measure.

Exercise 15 In a commutative probability space {(L^\infty(\Omega), {\bf E})}, show that Brown measure is the same as the probability distribution.

Exercise 16 If {u} is the left shift on {\ell^2({\bf Z})} (with the trace {\tau(T) := \langle T e_0, e_0 \rangle}), show that the Brown measure of {u} is the uniform measure on the unit circle {\{ z \in {\bf C}: |z|=1\}}.

This last exercise illustrates the limitations of Brown measure for understanding asymptotic spectral measure. The shift {U_0} and the perturbed shift {U_\varepsilon} introduced in previous sections both converge in the sense of {*}-moments as {n \rightarrow \infty} (holding {\varepsilon} fixed) to the left shift {u}. For non-zero {\varepsilon}, the spectral measure of {U_\varepsilon} does indeed converge to the Brown measure of {u}, but for {\varepsilon=0} this is not the case. This illustrates a more general principle, that Brown measure is the right asymptotic limit for “generic” matrices, but not for exceptional matrices. (See this paper of Sniady for a precise formulation of this heuristic, using gaussian regularisation.)
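The dichotomy between {\varepsilon=0} and {\varepsilon \neq 0} can be seen at the level of logarithmic potentials. From the characteristic polynomial computed in Section 1, {|\det(U_\varepsilon - zI)| = |z^n - \varepsilon|}, so the logarithmic potential of the (unnormalised) matrix {U_\varepsilon} is {\frac{1}{n} \log |z^n - \varepsilon|}. The Python sketch below (standard library only; the function name and the values {n=400}, {\varepsilon = 10^{-3}} are illustrative assumptions) evaluates this at one point inside and one point outside the unit circle.

```python
import math

def log_potential_shift(z, n, eps):
    """(1/n) log|det(U_eps - z I)| = (1/n) log|z^n - eps|."""
    return math.log(abs(z ** n - eps)) / n

n = 400
inside, outside = 0.5, 1.5

f0_inside = log_potential_shift(inside, n, 0.0)      # unperturbed shift: equals log|z|
feps_inside = log_potential_shift(inside, n, 1e-3)   # perturbed shift: close to 0
feps_outside = log_potential_shift(outside, n, 1e-3) # outside the circle: close to log|z|

print(f"eps = 0,    z = {inside}: {f0_inside:+.4f}   (log|z| = {math.log(inside):+.4f})")
print(f"eps = 1e-3, z = {inside}: {feps_inside:+.4f}")
print(f"eps = 1e-3, z = {outside}: {feps_outside:+.4f}   (log|z| = {math.log(outside):+.4f})")
```

For fixed {\varepsilon > 0} the values approach {\max(0, \log |z|)} as {n \rightarrow \infty}, which is the logarithmic potential of the uniform measure on the unit circle, whereas for {\varepsilon = 0} one obtains {\log |z|}, the logarithmic potential of a Dirac mass at the origin.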
The machinery used to establish the circular law in full generality can be used to show that Brown measure is the correct asymptotic spectral limit for other models:

Theorem 17 Let {M_n} be a sequence of random matrices whose entries are jointly independent and with all moments uniformly bounded, with variance uniformly bounded from below, and which converges in the sense of {*}-moments to an element {u} of a non-commutative probability space. Then the spectral measure {\mu_{\frac{1}{\sqrt{n}} M_n}} converges almost surely and in probability to the Brown measure of {u}.

This theorem is essentially Theorem 1.20 of my paper with Van Vu and Manjunath Krishnapur. The main ingredients are those mentioned earlier, namely a polynomial lower bound on the least singular value, and the use of Talagrand’s inequality to control medium singular values (or medium codimension distances to subspaces). Of the two ingredients, the former is more crucial, and is much more heavily dependent at present on the joint independence hypothesis; it would be of interest to see how to obtain lower bounds on the least singular value in more general settings.