We can now turn attention to one of the centerpiece universality results in random matrix theory, namely the Wigner semi-circle law for Wigner matrices. Recall from previous notes that a Wigner Hermitian matrix ensemble is a random matrix ensemble $M_n = (\zeta_{ij})_{1 \le i,j \le n}$ of Hermitian matrices (thus $\zeta_{ij} = \overline{\zeta_{ji}}$; this includes real symmetric matrices as an important special case), in which the upper-triangular entries $\zeta_{ij}$, $i < j$ are iid complex random variables with mean zero and unit variance, and the diagonal entries $\zeta_{ii}$, $1 \le i \le n$ are iid real variables, independent of the upper-triangular entries, with bounded mean and variance. Particular special cases of interest include the Gaussian Orthogonal Ensemble (GOE), the symmetric random sign matrices (aka symmetric Bernoulli ensemble), and the Gaussian Unitary Ensemble (GUE).
In previous notes we saw that the operator norm of $M_n$ was typically of size $O(\sqrt n)$, so it is natural to work with the normalised matrix $\frac{1}{\sqrt n} M_n$. Accordingly, given any $n \times n$ Hermitian matrix $M$, we can form the (normalised) empirical spectral distribution (or ESD for short)
$$\mu_{\frac{1}{\sqrt n} M} := \frac{1}{n} \sum_{j=1}^n \delta_{\lambda_j(M)/\sqrt n}$$
of $M$, where $\lambda_1(M) \le \dots \le \lambda_n(M)$ are the (necessarily real) eigenvalues of $M$, counting multiplicity. The ESD is a probability measure, which can be viewed as a distribution of the normalised eigenvalues $\frac{1}{\sqrt n}\lambda_1(M), \dots, \frac{1}{\sqrt n}\lambda_n(M)$ of $M$.
When $M = M_n$ is a random matrix ensemble, then the ESD $\mu_{\frac{1}{\sqrt n} M_n}$ is now a random measure – i.e. a random variable taking values in the space $\Pr({\mathbb R})$ of probability measures on the real line. (Thus, the distribution of $\mu_{\frac{1}{\sqrt n} M_n}$ is a probability measure on probability measures!)
Now we consider the behaviour of the ESD of a sequence of Hermitian matrix ensembles $M_n$ as $n \rightarrow \infty$. Recall from Notes 0 that for any sequence of random variables in a $\sigma$-compact metrisable space, one can define notions of convergence in probability and convergence almost surely. Specialising these definitions to the case of random probability measures on ${\mathbb R}$, and to deterministic limits, we see that a sequence of random ESDs $\mu_{\frac{1}{\sqrt n} M_n}$ converge in probability (resp. converge almost surely) to a deterministic limit $\mu \in \Pr({\mathbb R})$ (which, confusingly enough, is a deterministic probability measure!) if, for every test function $\varphi \in C_c({\mathbb R})$, the quantities $\int_{\mathbb R} \varphi\, d\mu_{\frac{1}{\sqrt n} M_n}$ converge in probability (resp. converge almost surely) to $\int_{\mathbb R} \varphi\, d\mu$.
Remark 1 As usual, convergence almost surely implies convergence in probability, but not vice versa. In the special case of random probability measures, there is an even weaker notion of convergence, namely convergence in expectation, defined as follows. Given a random ESD $\mu_{\frac{1}{\sqrt n} M_n}$, one can form its expectation ${\bf E} \mu_{\frac{1}{\sqrt n} M_n} \in \Pr({\mathbb R})$, defined via duality (the Riesz representation theorem) as
$$\int_{\mathbb R} \varphi\, d\,{\bf E}\mu_{\frac{1}{\sqrt n} M_n} := {\bf E} \int_{\mathbb R} \varphi\, d\mu_{\frac{1}{\sqrt n} M_n};$$
this probability measure can be viewed as the law of a random eigenvalue $\frac{1}{\sqrt n}\lambda_i(M_n)$ drawn from a random matrix $M_n$ from the ensemble. We then say that the ESDs converge in expectation to a limit $\mu \in \Pr({\mathbb R})$ if ${\bf E}\mu_{\frac{1}{\sqrt n} M_n}$ converges in the vague topology to $\mu$, thus
$$ {\bf E} \int_{\mathbb R} \varphi\, d\mu_{\frac{1}{\sqrt n} M_n} \rightarrow \int_{\mathbb R} \varphi\, d\mu$$
for all $\varphi \in C_c({\mathbb R})$.
In general, these notions of convergence are distinct from each other; but in practice, one often finds in random matrix theory that these notions are effectively equivalent to each other, thanks to the concentration of measure phenomenon.
Exercise 1 Let $M_n$ be a sequence of Hermitian matrix ensembles, and let $\mu$ be a continuous probability measure on ${\mathbb R}$.
- Show that $\mu_{\frac{1}{\sqrt n} M_n}$ converges almost surely to $\mu$ if and only if $\mu_{\frac{1}{\sqrt n} M_n}((-\infty,\lambda])$ converges almost surely to $\mu((-\infty,\lambda])$ for all $\lambda \in {\mathbb R}$.
- Show that $\mu_{\frac{1}{\sqrt n} M_n}$ converges in probability to $\mu$ if and only if $\mu_{\frac{1}{\sqrt n} M_n}((-\infty,\lambda])$ converges in probability to $\mu((-\infty,\lambda])$ for all $\lambda \in {\mathbb R}$.
- Show that $\mu_{\frac{1}{\sqrt n} M_n}$ converges in expectation to $\mu$ if and only if ${\bf E}\mu_{\frac{1}{\sqrt n} M_n}((-\infty,\lambda])$ converges to $\mu((-\infty,\lambda])$ for all $\lambda \in {\mathbb R}$.
We can now state the Wigner semi-circular law.
Theorem 1 (Semicircular law) Let $M_n$ be the top left $n \times n$ minors of an infinite Wigner matrix $(\zeta_{ij})_{i,j \ge 1}$. Then the ESDs $\mu_{\frac{1}{\sqrt n} M_n}$ converge almost surely (and hence also in probability and in expectation) to the Wigner semi-circular distribution
$$\mu_{sc} := \frac{1}{2\pi} (4-|x|^2)_+^{1/2}\ dx. \ \ \ \ \ (1)$$
A numerical example of this theorem in action can be seen at the MathWorld entry for this law.
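One can also run such a numerical example directly. The following is a sketch using numpy; the choice of the symmetric Bernoulli ensemble, the matrix size, and all function names here are ours:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Sample a symmetric random sign matrix (symmetric Bernoulli ensemble):
# iid +-1 entries above the diagonal, zero diagonal.
A = rng.choice([-1.0, 1.0], size=(n, n))
M = np.triu(A, 1)
M = M + M.T

# Eigenvalues of the normalised matrix M / sqrt(n).
eigs = np.linalg.eigvalsh(M / np.sqrt(n))

# Semicircle CDF by numerical integration of the density (1/2pi)sqrt(4-x^2).
xs = np.linspace(-2.0, 2.0, 4001)
density = np.sqrt(np.maximum(4.0 - xs**2, 0.0)) / (2.0 * np.pi)
cdf = np.cumsum(density) * (xs[1] - xs[0])

def semicircle_cdf(t):
    return np.interp(t, xs, cdf)

for t in [-1.0, 0.0, 1.0]:
    print(t, np.mean(eigs < t), semicircle_cdf(t))
```

Even at $n = 1000$, the empirical spectral distribution of a single sample already tracks the semicircle closely, which reflects the concentration of measure phenomenon discussed below.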
The semi-circular law nicely complements the upper Bai-Yin theorem from Notes 3, which asserts that (in the case when the entries have finite fourth moment, at least), the matrices $\frac{1}{\sqrt n} M_n$ almost surely have operator norm at most $2+o(1)$. Note that the operator norm is the same thing as the largest magnitude of the eigenvalues. Because the semi-circular distribution (1) is supported on the interval $[-2,2]$ with positive density on the interior of this interval, Theorem 1 easily supplies the lower Bai-Yin theorem, that the operator norm of $\frac{1}{\sqrt n} M_n$ is almost surely at least $2-o(1)$, and thus (in the finite fourth moment case) the norm is in fact equal to $2+o(1)$. Indeed, we have just shown that the semi-circular law provides an alternate proof of the lower Bai-Yin bound (Proposition 11 of Notes 3).
As will hopefully become clearer in the next set of notes, the semi-circular law is the noncommutative (or free probability) analogue of the central limit theorem, with the semi-circular distribution (1) taking on the role of the normal distribution. Of course, there is a striking difference between the two distributions, in that the former is compactly supported while the latter is merely subgaussian. One reason for this is that the concentration of measure phenomenon is more powerful in the case of ESDs of Wigner matrices than it is for averages of iid variables; compare the concentration of measure results in Notes 3 with those in Notes 1.
There are several ways to prove (or at least to heuristically justify) the semi-circular law. In this set of notes we shall focus on the two most popular methods, the moment method and the Stieltjes transform method, together with a third (heuristic) method based on Dyson Brownian motion (Notes 3b). In the next set of notes we shall also study the free probability method, and in the set of notes after that we use the determinantal processes method (although this method is initially only restricted to highly symmetric ensembles, such as GUE).
— 1. Preliminary reductions —
Before we begin any of the proofs of the semi-circular law, we make some simple observations which will reduce the difficulty of the arguments in the sequel.
The first observation is that the Cauchy interlacing law (Exercise 14 from Notes 3a) shows that the ESD of $\frac{1}{\sqrt n} M_n$ is very stable in $n$. Indeed, we see from the interlacing law that the eigenvalue counting functions of $M_n$ and $M_{n+1}$ differ by at most one at every point, and hence
$$n\, \mu_{\frac{1}{\sqrt n} M_n}\big( (-\infty, \lambda/\sqrt n) \big) = (n+1)\, \mu_{\frac{1}{\sqrt{n+1}} M_{n+1}}\big( (-\infty, \lambda/\sqrt{n+1}) \big) + O(1)$$
for any threshold $\lambda \in {\mathbb R}$ and any $n \ge 1$.
Exercise 2 Using this observation, show that to establish the semi-circular law (in any of the three senses of convergence), it suffices to do so for an arbitrary lacunary sequence $n_1 < n_2 < n_3 < \dots$ of $n$ (thus $n_{j+1}/n_j \ge c$ for some $c > 1$ and all $j$).
The above lacunary reduction does not help one establish convergence in probability or expectation, but will be useful when establishing almost sure convergence, as it significantly reduces the inefficiency of the union bound. (Note that a similar lacunary reduction was also used to prove the strong law of large numbers in Notes 1.)
The second observation concerns the stability of the ESD with respect to perturbations. Recall the Wielandt-Hoffman inequality: for $n \times n$ Hermitian matrices $M, N$ one has
$$\sum_{k=1}^n |\lambda_k(M+N) - \lambda_k(M)|^2 \le \|N\|_F^2, \ \ \ \ \ (2)$$
where $\|N\|_F := (\hbox{tr}\, N^* N)^{1/2}$ denotes the Frobenius norm of $N$. This gives:
Lemma 2 For any $n \times n$ Hermitian matrices $M, N$, any $\lambda \in {\mathbb R}$, and any $\varepsilon > 0$, we have
$$\mu_{\frac{1}{\sqrt n}(M+N)}\big( (-\infty,\lambda) \big) \le \mu_{\frac{1}{\sqrt n} M}\big( (-\infty,\lambda+\varepsilon) \big) + \frac{1}{\varepsilon^2 n^2} \|N\|_F^2,$$
and similarly
$$\mu_{\frac{1}{\sqrt n}(M+N)}\big( (-\infty,\lambda) \big) \ge \mu_{\frac{1}{\sqrt n} M}\big( (-\infty,\lambda-\varepsilon) \big) - \frac{1}{\varepsilon^2 n^2} \|N\|_F^2.$$
Proof: We just prove the first inequality, as the second is similar (and also follows from the first, by reversing the signs of $M$, $N$, and $\lambda$).
Let $i$ be the number of eigenvalues of $\frac{1}{\sqrt n}(M+N)$ that are less than $\lambda$, and let $j$ be the number of eigenvalues of $\frac{1}{\sqrt n} M$ that are less than $\lambda + \varepsilon$. Our task is to show that
$$i \le j + \frac{1}{\varepsilon^2 n} \|N\|_F^2.$$
If $i \le j$ then we are clearly done, so suppose that $i > j$. Then for every $j < k \le i$, the $k^{th}$ eigenvalue of $\frac{1}{\sqrt n}(M+N)$ is less than $\lambda$, while the $k^{th}$ eigenvalue of $\frac{1}{\sqrt n} M$ is at least $\lambda+\varepsilon$; thus $|\lambda_k(M+N) - \lambda_k(M)| \ge \varepsilon \sqrt n$ for all $j < k \le i$, and hence
$$(i-j)\, \varepsilon^2 n \le \sum_{k=1}^n |\lambda_k(M+N) - \lambda_k(M)|^2.$$
The claim now follows from (2).
This has the following corollary:
Exercise 3 (Stability of ESD laws wrt small perturbations) Let $M_n$ be a sequence of random Hermitian matrix ensembles such that $\mu_{\frac{1}{\sqrt n} M_n}$ converges almost surely to a limit $\mu$. Let $N_n$ be another sequence of Hermitian random matrix ensembles such that $\frac{1}{n^2} \|N_n\|_F^2$ converges almost surely to zero. Show that $\mu_{\frac{1}{\sqrt n}(M_n+N_n)}$ converges almost surely to $\mu$.
Show that the same claim holds if “almost surely” is replaced by “in probability” or “in expectation” throughout.
Informally, this exercise allows us to discard any portion of the matrix which is $o(n)$ in the Frobenius norm. For instance, the diagonal entries of $M_n$ have a Frobenius norm of $O(\sqrt n)$ almost surely, by the strong law of large numbers. Hence, without loss of generality, we may set the diagonal equal to zero for the purposes of the semi-circular law.
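The disparity of scales here is easy to see numerically. The following is a small sketch (Gaussian entries are one admissible choice of coefficient distribution; all names are ours) comparing the Frobenius norm of a Wigner matrix, which is of size comparable to $n$, with that of its diagonal, which is only of size comparable to $\sqrt n$:

```python
import numpy as np

rng = np.random.default_rng(1)

for n in [100, 400, 1600]:
    G = rng.standard_normal((n, n))
    M = (G + G.T) / np.sqrt(2)              # real symmetric Wigner matrix
    frob_full = np.linalg.norm(M)           # Frobenius norm: ~ n
    frob_diag = np.linalg.norm(np.diag(M))  # diagonal only: ~ sqrt(n)
    print(n, frob_full / n, frob_diag / np.sqrt(n))
```

The printed ratios stay bounded as $n$ grows, so the diagonal is indeed $o(n)$ in Frobenius norm and falls under the scope of Exercise 3.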
One can also remove any component of $M_n$ that is of rank $o(n)$:
Exercise 4 (Stability of ESD laws wrt small rank perturbations) Let $M_n$ be a sequence of random Hermitian matrix ensembles such that $\mu_{\frac{1}{\sqrt n} M_n}$ converges almost surely to a limit $\mu$. Let $N_n$ be another sequence of Hermitian random matrix ensembles such that $\frac{1}{n} \hbox{rank}(N_n)$ converges almost surely to zero. Show that $\mu_{\frac{1}{\sqrt n}(M_n+N_n)}$ converges almost surely to $\mu$. (Hint: use the Weyl inequalities instead of the Wielandt-Hoffman inequality.)
Show that the same claim holds if “almost surely” is replaced by “in probability” or “in expectation” throughout.
In a similar vein, we may apply the truncation argument (much as was done for the central limit theorem in Notes 2) to reduce the semi-circular law to the bounded case:
Exercise 5 Show that in order to prove the semi-circular law (in the almost sure sense), it suffices to do so under the additional hypothesis that the random variables $\zeta_{ij}$ are bounded. Similarly for the convergence in probability or in expectation senses.
Remark 2 These facts ultimately rely on the stability of eigenvalues with respect to perturbations. This stability is automatic in the Hermitian case, but for non-symmetric matrices, serious instabilities can occur due to the presence of pseudospectrum. We will discuss this phenomenon more in later lectures (but see also this earlier blog post).
— 2. The moment method —
We now prove the semi-circular law via the method of moments, which we have already used several times in the previous notes. In order to use this method, it is convenient to use the preceding reductions to assume that the coefficients $\zeta_{ij}$ are bounded, that the diagonal vanishes, and that $n$ ranges over a lacunary sequence. We will implicitly assume these hypotheses throughout the rest of the section.
As we have already discussed the moment method extensively, much of the argument here will be delegated to exercises. A full treatment of these computations can be found in the book of Bai and Silverstein.
The starting point of the moment method is the identity
$$\int_{\mathbb R} x^k\ d\mu_{\frac{1}{\sqrt n} M_n}(x) = \frac{1}{n} \hbox{tr} \Big(\frac{1}{\sqrt n} M_n\Big)^k \ \ \ \ \ (3)$$
for any natural number $k$. In particular, on taking expectations, we have
$$\int_{\mathbb R} x^k\ d\,{\bf E}\mu_{\frac{1}{\sqrt n} M_n}(x) = {\bf E}\, \frac{1}{n} \hbox{tr} \Big(\frac{1}{\sqrt n} M_n\Big)^k.$$
From concentration of measure (and the Bai-Yin theorem) for the operator norm of a random matrix (Proposition 7 of Notes 3), we see that the moments $\frac{1}{n} \hbox{tr} (\frac{1}{\sqrt n} M_n)^k$ are uniformly subgaussian, indeed we have
$${\bf P}\Big( \Big|\frac{1}{n} \hbox{tr} \Big(\frac{1}{\sqrt n} M_n\Big)^k\Big| \ge \lambda^k \Big) \le C e^{-c n \lambda^2}$$
for $\lambda \ge C$, where $C, c > 0$ are absolute constants (so the decay in fact improves quite rapidly with $k$). From this and the moment continuity theorem (Theorem 4 of Notes 2), we can now establish the semi-circular law through computing the mean and variance of moments:
- Show that to prove convergence in expectation to the semi-circular law, it suffices to show that
$${\bf E}\, \frac{1}{n} \hbox{tr} \Big(\frac{1}{\sqrt n} M_n\Big)^k = \int_{\mathbb R} x^k\ d\mu_{sc}(x) + o(1) \ \ \ \ \ (4)$$
for each fixed $k \ge 0$, where $o(1)$ is an expression that goes to zero as $n \rightarrow \infty$ for fixed $k$ (and fixed choice of coefficient distribution $\zeta$).
- Show that to prove convergence in probability to the semi-circular law, it suffices to show (4) together with the variance bound
$${\bf Var}\Big( \frac{1}{n} \hbox{tr} \Big(\frac{1}{\sqrt n} M_n\Big)^k \Big) = o(1) \ \ \ \ \ (5)$$
for each fixed $k$.
- Show that to prove almost sure convergence to the semi-circular law, it suffices to show (4) together with the variance bound
$${\bf Var}\Big( \frac{1}{n} \hbox{tr} \Big(\frac{1}{\sqrt n} M_n\Big)^k \Big) \le C_k\, n^{-\varepsilon} \ \ \ \ \ (6)$$
for each fixed $k$ and some $C_k, \varepsilon > 0$. (Note here that it is useful to restrict $n$ to a lacunary sequence!)
Ordinarily, computing second-moment quantities such as the left-hand side of (5) is harder than computing first-moment quantities such as (4). But one can obtain the required variance bounds from concentration of measure:
- When $k$ is a positive even integer, use Talagrand’s inequality and the convexity of the Schatten norm $\|A\|_{S^k} := (\hbox{tr}\, |A|^k)^{1/k}$ to establish (6) (and hence (5)).
- For odd $k$, the formula $\hbox{tr}\, A^k = \|A\|_{S^k}^k$ still applies as long as $A$ is positive definite. Applying this observation, the Bai-Yin theorem, and Talagrand’s inequality to the Schatten norms of $\frac{1}{\sqrt n} M_n + cI$ for a suitable absolute constant $c$, establish (6) (and hence (5)) when $k$ is odd also.
Remark 3 More generally, concentration of measure results (such as Talagrand’s inequality) can often be used to automatically upgrade convergence in expectation to convergence in probability or almost sure convergence. We will not attempt to formalise this principle here.
Exercise 8 Show that
$${\bf E}\, \frac{1}{n} \hbox{tr} \Big(\frac{1}{\sqrt n} M_n\Big)^k = C_{k/2} + o(1)$$
for all fixed $k \ge 0$, where the Catalan number $C_{k/2}$ is zero when $k$ is odd, and is equal to
$$C_{k/2} = \frac{k!}{(\frac{k}{2}+1)!\, (\frac{k}{2})!}$$
when $k$ is even.
In view of the above computations, the establishment of the semi-circular law now reduces to computing the moments of the semi-circular distribution:
Exercise 9 Show that
$$\int_{\mathbb R} x^k\ d\mu_{sc}(x) = C_{k/2}$$
for all $k \ge 0$. (Hint: use a trigonometric substitution $x = 2\cos\theta$, and then express the integrand in terms of Fourier phases $e^{ik\theta}$.)
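These moments can be sanity-checked numerically. The following is a small sketch (the quadrature grid and function names are our own):

```python
import numpy as np
from math import comb

xs = np.linspace(-2.0, 2.0, 200001)
density = np.sqrt(np.maximum(4.0 - xs**2, 0.0)) / (2.0 * np.pi)

def sc_moment(k):
    # Riemann sum for the k-th moment (the integrand vanishes at +-2).
    return float(np.sum(xs**k * density) * (xs[1] - xs[0]))

# Even moments are the Catalan numbers 1, 1, 2, 5, 14, ...; odd moments vanish.
catalan = [comb(2 * m, m) // (m + 1) for m in range(5)]
print([round(sc_moment(2 * m), 4) for m in range(5)], catalan)
```

The even moments agree with the Catalan numbers to the accuracy of the quadrature, and the odd moments vanish by symmetry of the grid.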
This concludes the proof of the semi-circular law (for any of the three modes of convergence).
Remark 4 In the spirit of the Lindeberg exchange method, observe that Exercise 9 is unnecessary if one already knows that the semi-circular law holds for at least one ensemble of Wigner matrices (e.g. the GUE ensemble); indeed, Exercise 9 can be deduced from such a piece of knowledge. In such a situation, it is not necessary to actually compute the main term on the right of (4); it would be sufficient to know that that limit is universal, in that it does not depend on the underlying distribution. In fact, it would even suffice to establish the slightly weaker statement
$${\bf E}\, \frac{1}{n} \hbox{tr} \Big(\frac{1}{\sqrt n} M_n\Big)^k = {\bf E}\, \frac{1}{n} \hbox{tr} \Big(\frac{1}{\sqrt n} M'_n\Big)^k + o(1)$$
whenever $M_n$ and $M'_n$ are two ensembles of Wigner matrices arising from different underlying distributions (but still normalised to have mean zero, unit variance, and to be bounded (or at worst subgaussian)). We will take advantage of this perspective later in these notes.
— 3. The Stieltjes transform method —
The moment method was computationally intensive, but straightforward. As noted in Remark 4, even without doing much of the algebraic computation, it is clear that the moment method will show that some universal limit for Wigner matrices exists (or, at least, that the differences between the distributions of two different Wigner matrices converge to zero). But it is not easy to see from this method why the limit should be given by the semi-circular law, as opposed to some other distribution (although one could eventually work this out from an inverse moment computation).
When studying the central limit theorem, we were able to use the Fourier method to control the distribution of random variables in a cleaner way than in the moment method. Analogues of this method exist for random matrices, but require non-trivial formulae from noncommutative Fourier analysis, such as the Harish-Chandra integration formula (and also only work for highly symmetric ensembles, such as GUE or GOE), and will not be discussed in this course. (Our later notes on determinantal processes, however, will contain some algebraic identities related in some ways to the noncommutative Fourier-analytic approach.)
We now turn to another method, the Stieltjes transform method, which uses complex-analytic methods rather than Fourier-analytic methods, and has turned out to be one of the most powerful and accurate tools in dealing with the ESD of random Hermitian matrices. Whereas the moment method started from the identity (3), the Stieltjes transform method proceeds from the identity
$$\int_{\mathbb R} \frac{d\mu_{\frac{1}{\sqrt n} M_n}(x)}{x - z} = \frac{1}{n} \hbox{tr} \Big( \frac{1}{\sqrt n} M_n - z I \Big)^{-1}$$
for any complex $z$ not in the support of $\mu_{\frac{1}{\sqrt n} M_n}$. We refer to the expression on the left-hand side as the Stieltjes transform of $M_n$ or of $\mu_{\frac{1}{\sqrt n} M_n}$, and denote it by $s_{\mu_{\frac{1}{\sqrt n} M_n}}(z)$ or as $s_n(z)$ for short. The expression $\frac{1}{n} ( \frac{1}{\sqrt n} M_n - z I )^{-1}$ is the normalised resolvent of $M_n$, and plays an important role in the spectral theory of that matrix. Indeed, in contrast to general-purpose methods such as the moment method, the Stieltjes transform method draws heavily on the specific linear-algebraic structure of this problem, and in particular on the rich structure of resolvents.
On the other hand, the Stieltjes transform can be viewed as a generating function of the moments via the Taylor series expansion
$$s_n(z) = -\frac{1}{z} - \frac{1}{z^2}\, \frac{1}{n} \hbox{tr}\, \frac{1}{\sqrt n} M_n - \frac{1}{z^3}\, \frac{1}{n} \hbox{tr} \Big(\frac{1}{\sqrt n} M_n\Big)^2 - \dots,$$
valid for $z$ sufficiently large. This is somewhat (though not exactly) analogous to how the characteristic function ${\bf E}\, e^{itX}$ of a scalar random variable $X$ can be viewed as a generating function of the moments ${\bf E}\, X^k$.
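For the semi-circular law this expansion can be checked numerically: the even moments are the Catalan numbers and the transform is $s(z) = \frac{-z+\sqrt{z^2-4}}{2}$, with the branch of the square root that makes $s(z)$ behave like $-1/z$ at infinity. The following is a small sketch (names are ours):

```python
import numpy as np
from math import comb

# Laurent expansion check: s(z) = -sum_k m_k / z^{k+1}, where the even
# moments m_{2m} of the semicircular law are the Catalan numbers C_m and
# the odd moments vanish.
def s_sc(z):
    return (-z + np.sqrt(z**2 - 4 + 0j)) / 2

z = 10.0
exact = s_sc(z).real
series = -sum(comb(2 * m, m) // (m + 1) / z**(2 * m + 1) for m in range(20))
print(exact, series)
```

At $z = 10$ the truncated Laurent series already matches the closed form to machine precision, since the terms decay geometrically.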
Now let us study the Stieltjes transform more systematically. Given any probability measure $\mu$ on the real line, we can form its Stieltjes transform
$$s_\mu(z) := \int_{\mathbb R} \frac{d\mu(x)}{x - z}$$
for any $z$ outside of the support of $\mu$; in particular, the Stieltjes transform is well-defined on the upper and lower half-planes in the complex plane. Even without any further hypotheses on $\mu$ other than it is a probability measure, we can say a remarkable amount about how this transform behaves in $z$. Applying conjugations we obtain the symmetry
$$s_\mu(\overline{z}) = \overline{s_\mu(z)},$$
so we may as well restrict attention to $z$ in the upper half-plane (say). Next, from the triangle inequality we obtain the trivial bound
$$|s_\mu(z)| \le \frac{1}{|\hbox{Im}(z)|}, \ \ \ \ \ (11)$$
together with the asymptotic
$$s_\mu(z) = \frac{-1+o(1)}{z}, \ \ \ \ \ (12)$$
where $o(1)$ is an expression that, for any fixed $\mu$, goes to zero as $z$ goes to infinity non-tangentially in the sense that $|\hbox{Re}(z)|/|\hbox{Im}(z)|$ is kept bounded, where the rate of convergence is allowed to depend on this bound. From differentiation under the integral sign (or an application of Morera’s theorem and Fubini’s theorem) we see that $s_\mu$ is complex analytic on the upper and lower half-planes; in particular, it is smooth away from the real axis. From the Cauchy integral formula (or differentiation under the integral sign) we in fact get some bounds for higher derivatives of the Stieltjes transform away from this axis:
$$\Big| \frac{d^j}{dz^j} s_\mu(z) \Big| = O_j\Big( \frac{1}{|\hbox{Im}(z)|^{j+1}} \Big). \ \ \ \ \ (13)$$
Informally, $s_\mu$ “behaves like a constant” at scales significantly less than the distance $|\hbox{Im}(z)|$ to the real axis; all the really interesting action here is going on near that axis.
The imaginary part of the Stieltjes transform is particularly interesting. Writing $z = a + bi$ with $b > 0$, we observe that
$$\hbox{Im}\, s_\mu(a+bi) = \int_{\mathbb R} \frac{b}{(x-a)^2 + b^2}\ d\mu(x) > 0,$$
and so we see that $s_\mu$ maps the upper half-plane to the upper half-plane; thus $s_\mu$ is a complex-analytic map from the upper half-plane to itself, a type of function known as a Herglotz function. (In fact, all complex-analytic maps from the upper half-plane to itself that obey the asymptotic (12) are of this form; this is a special case of the Herglotz representation theorem, which also gives a slightly more general description in the case when the asymptotic (12) is not assumed. A good reference for this material and its consequences is this book of Garnett.)
From the above formula we also have the identity
$$\frac{1}{\pi} \hbox{Im}\, s_\mu(a+bi) = \int_{\mathbb R} P_b(a-x)\ d\mu(x) = (\mu * P_b)(a), \ \ \ \ \ (14)$$
where $P_b$ is the Poisson kernel
$$P_b(x) := \frac{1}{\pi} \frac{b}{x^2+b^2}.$$
As the Poisson kernels form a family of approximations to the identity as $b \rightarrow 0^+$, we conclude that
$$\frac{1}{\pi} \hbox{Im}\, s_\mu(\cdot+bi) \rightarrow \mu \ \ \ \ \ (15)$$
as $b \rightarrow 0^+$ in the vague topology (this is closely related to the Plemelj formula in potential theory). Thus we see that a probability measure $\mu$ can be recovered in terms of the limiting behaviour of the Stieltjes transform on the real axis.
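This recovery procedure is easy to test numerically on the semi-circular law. The following is a sketch (the branch with cut on $[-2,2]$ is implemented via the product $\sqrt{z-2}\sqrt{z+2}$ of principal square roots, and all names are ours):

```python
import numpy as np

# Stieltjes transform of the semicircular law, branch cut on [-2, 2].
def s_sc(z):
    return (-z + np.sqrt(z - 2 + 0j) * np.sqrt(z + 2 + 0j)) / 2

def sc_density(a):
    return np.sqrt(np.maximum(4.0 - a**2, 0.0)) / (2.0 * np.pi)

# (1/pi) Im s(a + ib) converges to the density as b -> 0+.
a = np.linspace(-1.9, 1.9, 39)
for b in [0.1, 0.01, 0.001]:
    approx = s_sc(a + 1j * b).imag / np.pi
    print(b, np.max(np.abs(approx - sc_density(a))))
```

The maximal discrepancy shrinks as $b$ decreases, as the Poisson kernel at height $b$ smooths the density out at spatial scale $b$.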
A variant of the above machinery gives us a criterion for convergence:
Exercise 10 (Stieltjes continuity theorem) Let $\mu_n$ be a sequence of random probability measures on the real line, and let $\mu$ be a deterministic probability measure.
- Show that $\mu_n$ converges almost surely to $\mu$ in the vague topology if and only if $s_{\mu_n}(z)$ converges almost surely to $s_\mu(z)$ for every $z$ in the upper half-plane.
- Show that $\mu_n$ converges in probability to $\mu$ in the vague topology if and only if $s_{\mu_n}(z)$ converges in probability to $s_\mu(z)$ for every $z$ in the upper half-plane.
- Show that $\mu_n$ converges in expectation to $\mu$ in the vague topology if and only if ${\bf E}\, s_{\mu_n}(z)$ converges to $s_\mu(z)$ for every $z$ in the upper half-plane.
(Hint: The “only if” parts are fairly easy. For the “if” parts, take a test function $\varphi \in C_c({\mathbb R})$ and approximate $\int_{\mathbb R} \varphi\ d\mu_n$ by $\frac{1}{\pi} \int_{\mathbb R} \varphi(a)\, \hbox{Im}\, s_{\mu_n}(a+bi)\ da$ for small $b$. Then approximate this latter integral in turn by a Riemann sum, using (13).)
Thus, to prove the semi-circular law, it suffices to show that for each $z$ in the upper half-plane, the Stieltjes transform
$$s_n(z) := s_{\mu_{\frac{1}{\sqrt n} M_n}}(z) = \frac{1}{n} \hbox{tr} \Big( \frac{1}{\sqrt n} M_n - z I \Big)^{-1}$$
converges almost surely (and thus in probability and in expectation) to the Stieltjes transform $s(z) := s_{\mu_{sc}}(z)$ of the semi-circular law.
It is not difficult to compute the Stieltjes transform $s_{\mu_{sc}}$ of the semi-circular law, but let us hold off on that task for now, because we want to illustrate how the Stieltjes transform method can be used to find the semi-circular law, even if one did not know this law in advance, by directly controlling $s_n(z)$. We will fix $z$ to be a complex number not on the real line, and allow all implied constants in the discussion below to depend on $\hbox{Re}(z)$ and $\hbox{Im}(z)$ (we will focus here only on the behaviour as $n \rightarrow \infty$).
The main idea here is predecessor comparison: to compare the transform $s_n(z)$ of the matrix $M_n$ with the transform of the top left $n-1 \times n-1$ minor $M_{n-1}$, or of other minors. For instance, we have the Cauchy interlacing law (Exercise 14 from Notes 3a), which asserts that the eigenvalues of $M_n$ intersperse those of $M_{n+1}$. This implies that for a complex number $z$ with $\hbox{Im}(z) > 0$, the difference
$$\sum_{j=1}^{n+1} \frac{1}{\lambda_j(M_{n+1})/\sqrt n - z} - \sum_{j=1}^{n} \frac{1}{\lambda_j(M_n)/\sqrt n - z}$$
is an alternating sum of evaluations of the function $x \mapsto \frac{1}{x/\sqrt n - z}$. The total variation of this function is $O(1)$ (recall that we are suppressing dependence of constants on $z$), and so the alternating sum above is $O(1)$. Writing this in terms of the Stieltjes transform, we conclude that
$$(n+1)\, s_{\mu_{\frac{1}{\sqrt n} M_{n+1}}}(z) - n\, s_n(z) = O(1).$$
Applying (13) to account for the slightly different normalisation of $M_{n+1}$ (which only shifts the normalised eigenvalues by $O(1/n)$), we conclude that
$$s_{n+1}(z) = s_n(z) + O\Big(\frac{1}{n}\Big). \ \ \ \ \ (16)$$
So for fixed $z$ away from the real axis, the Stieltjes transform $s_n(z)$ is quite stable in $n$.
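This stability is easy to observe numerically. The following is a rough sketch (the Gaussian ensemble, the test point $z$, and the helper names are our choices):

```python
import numpy as np

rng = np.random.default_rng(2)

def stieltjes(M, z):
    # Stieltjes transform of the ESD of M / sqrt(dim).
    m = M.shape[0]
    eigs = np.linalg.eigvalsh(M / np.sqrt(m))
    return np.mean(1.0 / (eigs - z))

n = 800
z = 1.0 + 1.0j
G = rng.standard_normal((n + 1, n + 1))
M = (G + G.T) / np.sqrt(2)      # a real symmetric Wigner matrix

# Cauchy interlacing forces s_{n+1}(z) - s_n(z) = O(1/n).
diff = stieltjes(M, z) - stieltjes(M[:n, :n], z)
print(abs(diff), 1.0 / n)
```

For a single sample at $n = 800$ the difference is already of the same order as $1/n$, in line with (16).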
This stability has the following important consequence. Observe that while the left-hand side of (16) depends on the matrix $M_{n+1}$, the right-hand side depends only on the top left minor $M_n$ of that matrix. In particular, it is independent of the final row and column of $M_{n+1}$. This implies that this entire row and column has only a limited amount of influence on the Stieltjes transform $s_{n+1}(z)$: no matter what value one assigns to this row and column (including possibly unbounded values, as long as one keeps the matrix Hermitian of course), the transform $s_{n+1}(z)$ can only move by $O(1/n)$.
By permuting the rows and columns, we obtain that in fact any row or column of $M_n$ can influence $s_n(z)$ by at most $O(1/n)$. (This is closely related to the observation in Exercise 4 that low rank perturbations do not significantly affect the ESD.) On the other hand, the rows of (the upper triangular portion of) $M_n$ are jointly independent. When $M_n$ is a Wigner random matrix, we can then apply a standard concentration of measure result, such as McDiarmid’s inequality (Theorem 7 from Notes 1), to conclude concentration of $s_n(z)$ around its mean:
$${\bf P}\big( |s_n(z) - {\bf E}\, s_n(z)| \ge \lambda \big) \le C e^{-c \lambda^2 n}$$
for all $\lambda > 0$ and some absolute constants $C, c > 0$. (This is not necessarily the strongest concentration result one can establish for the Stieltjes transform, but it will certainly suffice for our discussion here.) In particular, we see from the Borel-Cantelli lemma (Exercise 24 of Notes 0a) that for any fixed $z$ away from the real line, $s_n(z) - {\bf E}\, s_n(z)$ converges almost surely (and thus also in probability) to zero. As a consequence, convergence of $s_n(z)$ in expectation automatically implies convergence in probability or almost sure convergence.
However, while concentration of measure tells us that $s_n(z)$ is close to its mean ${\bf E}\, s_n(z)$, it does not shed much light as to what this mean is. For this, we have to go beyond the Cauchy interlacing formula and deal with the resolvent more directly. Firstly, we observe from the linearity of trace that
$${\bf E}\, s_n(z) = \frac{1}{n} \sum_{j=1}^n {\bf E}\, \Big[ \Big( \Big(\frac{1}{\sqrt n} M_n - zI\Big)^{-1} \Big)_{jj} \Big],$$
where $(A)_{jj}$ denotes the $jj$ component of a matrix $A$. Because $M_n$ is a Wigner matrix, it is easy to see on permuting the rows and columns that all of the random variables $( (\frac{1}{\sqrt n} M_n - zI)^{-1} )_{jj}$ have the same distribution. Thus we may simplify the above formula as
$${\bf E}\, s_n(z) = {\bf E}\, \Big[ \Big( \Big(\frac{1}{\sqrt n} M_n - zI\Big)^{-1} \Big)_{nn} \Big]. \ \ \ \ \ (18)$$
So now we have to compute the last entry of an inverse of a matrix. There are of course a number of formulae for this, such as Cramer’s rule. But it will be more convenient here to use a formula based instead on the Schur complement:
Exercise 11 Let $A_n$ be an $n \times n$ matrix, let $A_{n-1}$ be the top left $n-1 \times n-1$ minor, let $a_{nn}$ be the bottom right entry of $A_n$, let $X \in {\mathbb C}^{n-1}$ be the right column of $A_n$ with the bottom right entry removed, and let $Y^* \in ({\mathbb C}^{n-1})^*$ be the bottom row with the bottom right entry removed. In other words,
$$A_n = \begin{pmatrix} A_{n-1} & X \\ Y^* & a_{nn} \end{pmatrix}.$$
Assume that $A_n$ and $A_{n-1}$ are both invertible. Show that
$$(A_n^{-1})_{nn} = \frac{1}{a_{nn} - Y^* A_{n-1}^{-1} X}.$$
(Hint: Solve the equation $A_n v = e_n$, where $e_n$ is the $n^{th}$ basis vector, using the method of Schur complements (or from first principles).)
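A quick numerical check of this identity (a sketch on a random complex test matrix; all names here are ours):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6

A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A_minor = A[:-1, :-1]   # top left (n-1) x (n-1) minor
X = A[:-1, -1]          # right column, bottom entry removed
Y = A[-1, :-1]          # bottom row, bottom entry removed
a_nn = A[-1, -1]        # bottom right entry

# Schur complement formula for the bottom right entry of the inverse.
lhs = np.linalg.inv(A)[-1, -1]
rhs = 1.0 / (a_nn - Y @ np.linalg.inv(A_minor) @ X)
print(lhs, rhs)
```

Both computations agree to machine precision (a random Gaussian matrix is invertible with probability one, so the hypotheses of the exercise hold in practice).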
The point of this identity is that it describes (part of) the inverse of $A_n$ in terms of the inverse of the minor $A_{n-1}$, which will eventually provide a non-trivial recursive relationship between $s_n(z)$ and $s_{n-1}(z)$, which can then be played off against (16) to solve for $s_n(z)$ in the asymptotic limit $n \rightarrow \infty$.
In our situation, the matrix $\frac{1}{\sqrt n} M_n - zI$ and its minor $\frac{1}{\sqrt n} M_{n-1} - zI$ are automatically invertible, since $z$ is not real. Inserting the above formula into (18) (and recalling that we normalised the diagonal of $M_n$ to vanish), we conclude that
$${\bf E}\, s_n(z) = {\bf E}\, \frac{1}{-z - \frac{1}{n} X^* \big( \frac{1}{\sqrt n} M_{n-1} - z I \big)^{-1} X}, \ \ \ \ \ (19)$$
where $X \in {\mathbb C}^{n-1}$ is the right column of $M_n$ with the bottom entry removed.
One may be concerned that the denominator here could vanish. However, observe that $-z$ has imaginary part $-\hbox{Im}(z) < 0$ if $\hbox{Im}(z) > 0$. Furthermore, from the spectral theorem we see that the imaginary part of $(\frac{1}{\sqrt n} M_{n-1} - zI)^{-1}$ is positive definite, and so $\frac{1}{n} X^* ( \frac{1}{\sqrt n} M_{n-1} - z I )^{-1} X$ has non-negative imaginary part. As a consequence the magnitude of the denominator here is bounded below by $|\hbox{Im}(z)|$, and so its reciprocal is $O(1)$ (compare with (11)). So the reciprocal here is not going to cause any discontinuity, as we are considering $z$ as fixed with $\hbox{Im}(z)$ non-zero.
Now we need to understand the expression $X^* ( \frac{1}{\sqrt n} M_{n-1} - z I )^{-1} X$. We write this as $X^* R X$, where $R$ is the resolvent matrix $R := ( \frac{1}{\sqrt n} M_{n-1} - z I )^{-1}$. The distribution of the random matrix $R$ could conceivably be quite complicated. However, the key point is that the vector $X$ only involves the entries of $M_n$ that do not lie in $M_{n-1}$, and so the random matrix $R$ and the vector $X$ are independent. Because of this, we can use the randomness of $X$ to do most of the work in understanding the expression $X^* R X$, without having to know much about $R$ at all.
To understand this, let us first condition $R$ to be a deterministic matrix $R = R_0$, and see what we can do with the expression $X^* R_0 X$.
Firstly, observe that $R_0$ will not be arbitrary; indeed, from the spectral theorem we see that $R_0$ will have operator norm at most $1/|\hbox{Im}(z)| = O(1)$. Meanwhile, from the Chernoff (or Hoeffding) inequality (Theorem 2 or Exercise 4 of Notes 1) we know that $X$ has magnitude $O(\sqrt n)$ with overwhelming probability. So we know that $X^* R_0 X$ has magnitude $O(n)$ with overwhelming probability.
Furthermore, we can use concentration of measure as follows. Given any positive semi-definite matrix $R_0$ of operator norm $O(1)$, the expression $(X^* R_0 X)^{1/2}$ is a Lipschitz and convex function of $X$ with Lipschitz constant $O(1)$. Applying Talagrand’s inequality (Theorem 9 of Notes 1) we see that this expression concentrates around its median:
$${\bf P}\big( |(X^* R_0 X)^{1/2} - {\bf M} (X^* R_0 X)^{1/2}| \ge \lambda \big) \le C e^{-c\lambda^2}$$
for any $\lambda > 0$. On the other hand, $(X^* R_0 X)^{1/2}$ has magnitude $O(\sqrt n)$ with overwhelming probability, so the median must be $O(\sqrt n)$ also. Squaring, we conclude that
$${\bf P}\big( |X^* R_0 X - {\bf M}(X^* R_0 X)| \ge \lambda \sqrt n \big) \le C e^{-c\lambda^2}$$
(possibly after adjusting the absolute constants $C, c$). As usual, we may replace the median with the expectation:
$${\bf P}\big( |X^* R_0 X - {\bf E}\, X^* R_0 X| \ge \lambda \sqrt n \big) \le C e^{-c\lambda^2}$$
for any deterministic matrix $R_0$ of operator norm $O(1)$ (the positive semi-definiteness hypothesis can be dropped by splitting $R_0$ into Hermitian and skew-Hermitian components, and these in turn into positive and negative parts).
But what is the expectation ${\bf E}\, X^* R_0 X$? This can be expressed in components as
$${\bf E}\, X^* R_0 X = \sum_{i=1}^{n-1} \sum_{j=1}^{n-1} {\bf E}\, \overline{\xi_i}\, (R_0)_{ij}\, \xi_j,$$
where $\xi_1,\dots,\xi_{n-1}$ are the entries of $X$, and $(R_0)_{ij}$ are the entries of $R_0$. But the $\xi_i$ are iid with mean zero and variance one, so the standard second moment computation shows that this expectation is nothing more than the trace
$${\bf E}\, X^* R_0 X = \hbox{tr}\, R_0$$
of $R_0$. We conclude that
$${\bf P}\big( |X^* R_0 X - \hbox{tr}\, R_0| \ge \lambda \sqrt n \big) \le C e^{-c\lambda^2} \ \ \ \ \ (21)$$
for any deterministic matrix $R_0$ of operator norm $O(1)$, and any $\lambda > 0$. Informally, $X^* R_0 X$ is typically $\hbox{tr}\, R_0 + O(\sqrt n)$.
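The identity ${\bf E}\, X^* R_0 X = \hbox{tr}\, R_0$ and the $O(\sqrt n)$ scale of the fluctuations can be illustrated numerically. This is a sketch only; random sign vectors are one admissible choice of mean-zero, unit-variance distribution, and the normalised Gaussian test matrix and all names are ours:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300
trials = 500

# A fixed matrix R of operator norm 1 (a normalised symmetric Gaussian
# matrix), probed by random sign vectors X with iid mean-zero,
# unit-variance entries.
G = rng.standard_normal((n, n))
R = (G + G.T) / 2
R = R / np.linalg.norm(R, 2)

samples = np.empty(trials)
for i in range(trials):
    X = rng.choice([-1.0, 1.0], size=n)
    samples[i] = X @ R @ X

# Mean should be tr(R); fluctuations live at scale sqrt(n).
print(samples.mean(), np.trace(R), samples.std(), np.sqrt(n))
```

The sample mean sits near $\hbox{tr}\, R$, and the empirical standard deviation is of size comparable to $\sqrt n$, consistent with (21).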
The bound (21) was proven for deterministic matrices $R_0$, but by using conditional expectation it also applies for any random matrix $R$, so long as that matrix is independent of $X$. In particular, we may apply it to our specific matrix of interest
$$R := \Big( \frac{1}{\sqrt n} M_{n-1} - z I \Big)^{-1}.$$
The trace of this matrix is essentially just the Stieltjes transform of $M_{n-1}$ at $z$. Actually, due to the normalisation factor being slightly off, we actually have
$$\hbox{tr}\, R = n\, s_n(z) + O(1)$$
(by (16) and (13)), and thus by (21)
$$\frac{1}{n} X^* R X = s_n(z) + O\Big( \frac{\log n}{\sqrt n} \Big)$$
with overwhelming probability. Putting this back into (19), and recalling that the denominator is bounded away from zero, we have the remarkable equation
$${\bf E}\, s_n(z) = \frac{1}{-z - {\bf E}\, s_n(z)} + o(1). \ \ \ \ \ (22)$$
Note how this equation came by playing off two ways in which the spectral properties of a matrix $M_n$ interacted with those of its minor $M_{n-1}$: firstly via the Cauchy interlacing inequality, and secondly via the Schur complement formula.
This equation already describes the behaviour of ${\bf E}\, s_n(z)$ quite well, but we will content ourselves with understanding the limiting behaviour as $n \rightarrow \infty$. From (13) and Fubini’s theorem we know that the functions $z \mapsto {\bf E}\, s_n(z)$ are locally uniformly equicontinuous and locally uniformly bounded away from the real line. Applying the Arzelá-Ascoli theorem, we thus conclude that on a subsequence at least, ${\bf E}\, s_n$ converges locally uniformly to a limit $s$. This will be a Herglotz function (i.e. an analytic function mapping the upper half-plane to the upper half-plane), and taking limits in (22) (observing that the imaginary part of the denominator here is bounded away from zero) we end up with the exact equation
$$s(z) = \frac{1}{-z - s(z)}. \ \ \ \ \ (23)$$
We can of course solve this by the quadratic formula, obtaining
$$s(z) = \frac{-z \pm \sqrt{z^2-4}}{2}.$$
To figure out what branch of the square root one has to use here, we use (12), which easily implies that
$$s(z) = \frac{-1+o(1)}{z}$$
as $z$ goes to infinity non-tangentially away from the real line. (To justify this, one has to make the error term in (12) uniform in $n$, but this can be accomplished without difficulty using the Bai-Yin theorem (for instance).) Also, we know that $s$ has to be complex analytic (and in particular, continuous) away from the real line. From this and basic complex analysis, we conclude that
$$s(z) = \frac{-z + \sqrt{z^2-4}}{2}, \ \ \ \ \ (24)$$
where $\sqrt{z^2-4}$ is the branch of the square root with a branch cut at $[-2,2]$ and which equals $z$ at infinity.
As there is only one possible subsequence limit of the ${\bf E}\, s_n$, we conclude that ${\bf E}\, s_n$ converges locally uniformly (and thus pointwise) to the function (24), and thus (by the concentration of measure of $s_n(z)$) we see that for each $z$ in the upper half-plane, $s_n(z)$ converges almost surely (and in probability) to $s(z)$.
Exercise 12 Find a direct proof (starting from (22), (12), and the smoothness of ${\bf E}\, s_n(z)$) that ${\bf E}\, s_n(z) = s(z) + o(1)$ for any fixed $z$ in the upper half-plane, that avoids using the Arzelá-Ascoli theorem. (The basic point here is that one has to solve the approximate equation (22), using some robust version of the quadratic formula. The fact that ${\bf E}\, s_n$ is a Herglotz function will help eliminate various unwanted possibilities, such as one coming from the wrong branch of the square root.)
From (24) one can compute that
$$\frac{1}{\pi} \hbox{Im}\, s(a+bi) \rightarrow \frac{1}{2\pi} (4-a^2)_+^{1/2} \ \ \ \ \ (25)$$
as $b \rightarrow 0^+$. Thus the semi-circular law is the only possible measure which has Stieltjes transform $s$, and indeed a simple application of the Cauchy integral formula and (25) shows us that $s$ is indeed the Stieltjes transform of $\mu_{sc}$.
Putting all this together, we have completed the Stieltjes transform proof of the semi-circular law.
Remark 5 In order to simplify the above exposition, we opted for a qualitative analysis of the semi-circular law here, ignoring such questions as the rate of convergence to this law. However, an inspection of the above arguments reveals that it is easy to make all of the above analysis quite quantitative, with quite reasonable control on all terms. (One has to use Exercise 12 instead of the Arzelá-Ascoli theorem if one wants everything to be quantitative.) In particular, it is not hard to use the above analysis to show that for $\hbox{Im}(z) \ge n^{-c}$ for some small absolute constant $c > 0$, one has $s_n(z) = s(z) + O(n^{-c'})$ with overwhelming probability, for some $c' > 0$. Combining this with a suitably quantitative version of the Stieltjes continuity theorem, this in turn gives a polynomial rate of convergence of the ESDs $\mu_{\frac{1}{\sqrt n} M_n}$ to the semi-circular law $\mu_{sc}$, in that one has
$$\mu_{\frac{1}{\sqrt n} M_n}\big( (-\infty,\lambda] \big) = \mu_{sc}\big( (-\infty,\lambda] \big) + O(n^{-c''})$$
with overwhelming probability for all $\lambda \in {\mathbb R}$ (and some $c'' > 0$).
A variant of this quantitative analysis can in fact get very good control on this ESD down to quite fine scales, namely to scales $n^{-1+\varepsilon}$ for any $\varepsilon > 0$, which is only just a little bit larger than the mean spacing $O(1/n)$ of the normalised eigenvalues (recall that we have $n$ normalised eigenvalues, constrained to lie in the interval $[-2-o(1), 2+o(1)]$ by the Bai-Yin theorem). This was accomplished by Erdős, Schlein, and Yau (under some additional regularity hypotheses on the distribution $\zeta$, but these can be easily removed with the assistance of Talagrand’s inequality) by using an additional observation, namely that the eigenvectors of a random matrix are very likely to be delocalised in the sense that their $\ell^2$ energy is dispersed more or less evenly across its coefficients. We will return to this point in later notes.
— 4. Dyson Brownian motion and the Stieltjes transform —
We now explore how the Stieltjes transform interacts with the Dyson Brownian motion introduced in Notes 3b. We let $n$ be a large number, and let $M(t)$ be a Wiener process of Hermitian random matrices, with associated eigenvalues $\lambda_1(t),\dots,\lambda_n(t)$, Stieltjes transforms
$$s(t,z) := \frac{1}{n} \sum_{j=1}^n \frac{1}{\lambda_j(t)/\sqrt n - z} \ \ \ \ \ (26)$$
and spectral measures
$$\mu(t) := \frac{1}{n} \sum_{j=1}^n \delta_{\lambda_j(t)/\sqrt n}. \ \ \ \ \ (27)$$
We now study how $s(t,z)$ and $\mu(t)$ evolve in time in the asymptotic limit $n \rightarrow \infty$. Our computation will be only heuristic in nature.
Recall from Notes 3b that the eigenvalues $\lambda_i = \lambda_i(t)$ undergo Dyson Brownian motion
$$d\lambda_i = dB_i + \sum_{j \neq i} \frac{dt}{\lambda_i - \lambda_j}. \ \ \ \ \ (28)$$
Applying (26) and Taylor expansion (dropping all terms of higher order than $dt$, using the Ito heuristic $dB_i = O(\sqrt{dt})$), we conclude that
$$ds(t,z) = -\frac{1}{n^{3/2}} \sum_i \frac{dB_i}{(\lambda_i/\sqrt n - z)^2} - \frac{1}{n^{3/2}} \sum_i \sum_{j \neq i} \frac{dt}{(\lambda_i/\sqrt n - z)^2 (\lambda_i - \lambda_j)} + \frac{1}{n^2} \sum_i \frac{dt}{(\lambda_i/\sqrt n - z)^3}.$$
For $z$ away from the real line, the final term is of size $O(dt/n)$ and can heuristically be ignored in the limit $n \rightarrow \infty$. Dropping this term, and then taking expectations to remove the Brownian motion term $dB_i$, we are led to
$$\partial_t\, {\bf E}\, s(t,z) = -{\bf E}\, \frac{1}{n^{3/2}} \sum_i \sum_{j \neq i} \frac{1}{(\lambda_i/\sqrt n - z)^2 (\lambda_i - \lambda_j)}.$$
Performing the summation using (26) we obtain
$$\partial_t\, {\bf E}\, s(t,z) = -{\bf E} \int_{\mathbb R} \int_{\mathbb R} \frac{d\mu(t)(x)\, d\mu(t)(y)}{(x-z)^2 (x-y)},$$
where we adopt the convention that for real $x$, $\frac{1}{x-x}$ is the average of $\frac{1}{x-x^-}$ and $\frac{1}{x-x^+}$, i.e. zero. Using (27), this becomes
$${\bf E}\, s_t(t,z) = {\bf E} \int_{\mathbb R} \frac{F(t,x)}{(x-z)^2}\, d\mu(t)(x), \qquad F(t,x) := \hbox{p.v.} \int_{\mathbb R} \frac{d\mu(t)(y)}{y-x}, \ \ \ \ \ (29)$$
where the subscript denotes differentiation in the indicated variable (thus $s_t = \partial_t s$ and $s_z = \partial_z s$). From (15) we heuristically have
$$s(t, x + 0^+ i) \approx F(t,x) + \pi i\, \mu(t)(x)$$
(heuristically treating $\mu(t)$ as a function rather than a measure, with $F(t,x)$ the principal value integral appearing in (29)) and on squaring one obtains
$$s(t, x + 0^+ i)^2 \approx F(t,x)^2 - \pi^2 \mu(t)(x)^2 + 2\pi i\, F(t,x)\, \mu(t)(x).$$
From this and the Cauchy integral formula around a slit in the real axis (using the bound (11) to ignore the contributions near infinity) we thus have
$$s(t,z)^2 = \int_{\mathbb R} \frac{2 F(t,x)}{x - z}\, d\mu(t)(x),$$
and thus on differentiation in $z$
$$2\, s(t,z)\, s_z(t,z) = \int_{\mathbb R} \frac{2 F(t,x)}{(x-z)^2}\, d\mu(t)(x).$$
Comparing this with (29), we obtain
$${\bf E}\, s_t = {\bf E}\, s\, s_z.$$
From concentration of measure, we expect $s(t,z)$ to concentrate around its mean $g(t,z) := {\bf E}\, s(t,z)$, and similarly $s\, s_z$ should concentrate around $g\, g_z$. In the limit $n \rightarrow \infty$, the expected Stieltjes transform $g(t,z)$ should thus obey Burgers’ equation
$$\partial_t g = g\, g_z. \ \ \ \ \ (30)$$
To illustrate how this equation works in practice, let us give an informal derivation of the semi-circular law. We consider the case when the Wiener process starts from $M(0) = 0$; thus $M(t) \equiv \sqrt t\, G$ for a GUE matrix $G$. As such, we have the scaling symmetry
$$g(t,z) = \frac{1}{\sqrt t}\, u\Big( \frac{z}{\sqrt t} \Big),$$
where $u(z) := g(1,z)$ is the asymptotic Stieltjes transform for GUE (which we secretly know to be given by (24), but let us pretend that we did not yet know this fact). Inserting this self-similar ansatz into (30) and setting $t = 1$, we conclude that
$$-\frac{1}{2} u - \frac{z}{2} u_z = u\, u_z;$$
multiplying by two and integrating, we conclude that
$$-z\, u = u^2 + C$$
for some constant $C$. But from the asymptotic (12) we see that $z u \rightarrow -1$ as $z \rightarrow \infty$ non-tangentially, and so $C$ must equal $1$. But then the above equation can be rearranged into (23), and so by repeating the arguments at the end of the previous section we can deduce the formula (24), which then gives the semi-circular law by (15).
As is well known in PDE, one can solve Burgers’ equation more generally by the method of characteristics. For reasons that will become clearer in the next set of notes, I will solve this equation by a slightly different (but ultimately equivalent) method. The idea is that rather than think of $s = s(t,z)$ as a function of $z$ for fixed $t$, we think of $z$ as a function of $s$ for fixed $t$. (This trick is sometimes known as the hodograph transform, especially if one views $s$ as “velocity” and $z$ as “position”.) Note from (12) that we expect to be able to invert the relationship between $s$ and $z$ as long as $|z|$ is large (and $|s|$ is small).
To exploit this change of perspective, we think of $s$, $z$, $t$ as all varying by infinitesimal amounts $ds$, $dz$, $dt$ respectively. Using (30) and the total derivative formula $ds = s_t\, dt + s_z\, dz$, we see that
$$ds = s_z\, (s\, dt + dz).$$
If we hold $s$ fixed (i.e. $ds = 0$), so that $z$ is now just a function of $t$, and cancel off the $s_z$ factor, we conclude that
$$\frac{dz}{dt} = -s.$$
This, in principle, gives a way to compute $s(t,\cdot)$ from $s(0,\cdot)$. First, we invert the relationship $s = s(0,z)$ to $z = z(0,s)$; then we add $-ts$ to $z(0,s)$; then we invert again to recover $s(t,z)$.
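As a sanity check of this recipe (a sketch, with our sign convention $s_\mu(z) = \int d\mu(x)/(x-z)$, under which the flow reads $z(t,s) = z(0,s) - ts$): starting from $\mu(0) = \delta_0$, whose transform is $s(0,z) = -1/z$, one unit of time should produce the semi-circular transform, which satisfies $s^2 + zs + 1 = 0$:

```python
import numpy as np

# Transform of the semicircular law, branch cut on [-2, 2].
def s_sc(z):
    return (-z + np.sqrt(z - 2 + 0j) * np.sqrt(z + 2 + 0j)) / 2

for z in [1.0 + 1.0j, -0.5 + 2.0j, 3.0 + 0.5j]:
    s = s_sc(z)        # s(1, z): transform after one unit of time
    z0 = -1.0 / s      # invert the t = 0 relation s(0, z) = -1/z
    z1 = z0 - s        # flow: z(1, s) = z(0, s) - 1 * s
    print(z, z1)       # z1 recovers z
```

This is just the algebraic identity $-1/s - s = z$, which is a rearrangement of $s^2 + zs + 1 = 0$, so the hodograph recipe is consistent with the semi-circular answer.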
Since $M(t) \equiv M(0) + \sqrt t\, G$, where $G$ is a GUE matrix independent of $M(0)$, we have thus given a formula to describe the Stieltjes transform of $M(0) + \sqrt t\, G$ in terms of the Stieltjes transform of $M(0)$. This formula is a special case of a more general formula of Voiculescu for free convolution, with the operation of inverting the Stieltjes transform essentially being the famous $R$-transform of Voiculescu; we will discuss this more in the next set of notes.