We can now turn attention to one of the centerpiece universality results in random matrix theory, namely the Wigner semi-circle law for Wigner matrices. Recall from previous notes that a *Wigner Hermitian matrix ensemble* is a random matrix ensemble of Hermitian matrices (thus ; this includes *real symmetric matrices* as an important special case), in which the upper-triangular entries , are iid complex random variables with mean zero and unit variance, and the diagonal entries are iid real variables, independent of the upper-triangular entries, with bounded mean and variance. Particular special cases of interest include the *Gaussian Orthogonal Ensemble (GOE)*, the *symmetric random sign matrices* (aka *symmetric Bernoulli ensemble*), and the Gaussian Unitary Ensemble (GUE).

In previous notes we saw that the operator norm of was typically of size , so it is natural to work with the normalised matrix . Accordingly, given any Hermitian matrix , we can form the (normalised) *empirical spectral distribution* (or *ESD* for short)

of , where are the (necessarily real) eigenvalues of , counting multiplicity. The ESD is a probability measure, which can be viewed as a distribution of the normalised eigenvalues of .

When is a random matrix ensemble, then the ESD is now a *random* measure – i.e. a random variable taking values in the space of probability measures on the real line. (Thus, the distribution of is a probability measure on probability measures!)

Now we consider the behaviour of the ESD of a sequence of Hermitian matrix ensembles as . Recall from Notes 0 that for any sequence of random variables in a -compact metrisable space, one can define notions of *convergence in probability* and *convergence almost surely*. Specialising these definitions to the case of random probability measures on , and to deterministic limits, we see that a sequence of random ESDs *converge in probability* (resp. *converge almost surely*) to a deterministic limit (which, confusingly enough, is a deterministic probability measure!) if, for every test function , the quantities converge in probability (resp. converge almost surely) to .

Remark 1As usual, convergence almost surely implies convergence in probability, but not vice versa. In the special case of random probability measures, there is an even weaker notion of convergence, namelyconvergence in expectation, defined as follows. Given a random ESD , one can form itsexpectation, defined via duality (the Riesz representation theorem) asthis probability measure can be viewed as the law of a

randomeigenvalue drawn from a random matrix from the ensemble. We then say that the ESDs converge in expectation to a limit if converges the vague topology to , thusfor all .

In general, these notions of convergence are distinct from each other; but in practice, one often finds in random matrix theory that these notions are effectively equivalent to each other, thanks to the concentration of measure phenomenon.

Exercise 1Let be a sequence of Hermitian matrix ensembles, and let be a continuous probability measure on .

- Show that converges almost surely to if and only if converges almost surely to for all .
- Show that converges in probability to if and only if converges in probability to for all .
- Show that converges in expectation to if and only if converges to for all .

We can now state the Wigner semi-circular law.

Theorem 1 (Semicircular law)Let be the top left minors of an infinite Wigner matrix . Then the ESDs converge almost surely (and hence also in probability and in expectation) to theWigner semi-circular distribution

A numerical example of this theorem in action can be seen at the MathWorld entry for this law.

The semi-circular law nicely complements the *upper Bai-Yin theorem* from Notes 3, which asserts that (in the case when the entries have finite fourth moment, at least), the matrices almost surely has operator norm at most . Note that the operator norm is the same thing as the largest magnitude of the eigenvalues. Because the semi-circular distribution (1) is supported on the interval with positive density on the interior of this interval, Theorem 1 easily supplies the *lower Bai-Yin theorem*, that the operator norm of is almost surely *at least* , and thus (in the finite fourth moment case) the norm is in fact *equal* to . Indeed, we have just shown that the semi-circular law provides an alternate proof of the lower Bai-Yin bound (Proposition 11 of Notes 3).

As will hopefully become clearer in the next set of notes, the semi-circular law is the noncommutative (or free probability) analogue of the central limit theorem, with the semi-circular distribution (1) taking on the role of the normal distribution. Of course, there is a striking difference between the two distributions, in that the former is compactly supported while the latter is merely subgaussian. One reason for this is that the concentration of measure phenomenon is more powerful in the case of ESDs of Wigner matrices than it is for averages of iid variables; compare the concentration of measure results in Notes 3 with those in Notes 1.

There are several ways to prove (or at least to heuristically justify) the semi-circular law. In this set of notes we shall focus on the two most popular methods, the *moment method* and the *Stieltjes transform method*, together with a third (heuristic) method based on Dyson Brownian motion (Notes 3b). In the next set of notes we shall also study the free probability method, and in the set of notes after that we use the determinantal processes method (although this method is initially only restricted to highly symmetric ensembles, such as GUE).

** — 1. Preliminary reductions — **

Before we begin any of the proofs of the semi-circular law, we make some simple observations which will reduce the difficulty of the arguments in the sequel.

The first observation is that the Cauchy interlacing law (Exercise 14 from Notes 3a) shows that the ESD of is very stable in . Indeed, we see from the interlacing law that

for any threshold and any .

Exercise 2Using this observation, show that to establish the semi-circular law (in any of the three senses of convergence), it suffices to do so for an arbitrary lacunary sequence of (thus for some and all ).

The above lacunary reduction does not help one establish convergence in probability or expectation, but will be useful when establishing almost sure convergence, as it significantly reduces the inefficiency of the union bound. (Note that a similar lacunary reduction was also used to prove the strong law of large numbers in Notes 1.)

Next, we exploit the stability of the ESD with respect to perturbations, by taking advantage of the *Weilandt-Hoffmann inequality*

for Hermitian matrices , where is the Frobenius norm of . (This inequality was established in Exercise 6 or Exercise 11 of Notes 3a.) We convert this inequality into an inequality about ESDs:

Lemma 2For any Hermitian matrices , any , and any , we haveand similarly

*Proof:* We just prove the first inequality, as the second is similar (and also follows from the first, by reversing the sign of ).

Let be the largest eigenvalue of less than , and let be the largest eigenvalue of less than . Our task is to show that

If then we are clearly done, so suppose that . Then we have for all , and hence

The claim now follows from (2).

This has the following corollary:

Exercise 3 (Stability of ESD laws wrt small perturbations)Let be a sequence of random Hermitian matrix ensembles such that converges almost surely to a limit . Let be another sequence of Hermitian random matrix ensembles such that converges almost surely to zero. Show that converges almost surely to .Show that the same claim holds if “almost surely” is replaced by “in probability” or “in expectation” throughout.

Informally, this exercise allows us to discard any portion of the matrix which is in the Frobenius norm. For instance, the diagonal entries of have a Frobenius norm of almost surely, by the strong law of large numbers. Hence, without loss of generality, we may set the diagonal equal to zero for the purposes of the semi-circular law.

One can also remove any component of that is of rank :

Exercise 4 (Stability of ESD laws wrt small rank perturbations)Let be a sequence of random Hermitian matrix ensembles such that converges almost surely to a limit . Let be another sequence of random Hermitian matrix ensembles such that converges almost surely to zero. Show that converges almost surely to . (Hint: use the Weyl inequalities instead of the Wielandt-Hoffman law.)Show that the same claim holds if “almost surely” is replaced by “in probability” or “in expectation” throughout.

In a similar vein, we may apply the truncation argument (much as was done for the central limit theorem in Notes 2) to reduce the semi-circular law to the bounded case:

Exercise 5Show that in order to prove the semi-circular law (in the almost sure sense), it suffices to do so under the additional hypothesis that the random variables are bounded. Similarly for the convergence in probability or in expectation senses.

Remark 2These facts ultimately rely on the stability of eigenvalues with respect to perturbations. This stability is automatic in the Hermitian case, but for non-symmetric matrices, serious instabilities can occur due to the presence of pseudospectrum. We will discuss this phenomenon more in later lectures (but see also this earlier blog post).

** — 2. The moment method — **

We now prove the semi-circular law via the *method of moments*, which we have already used several times in the previous notes. In order to use this method, it is convenient to use the preceding reductions to assume that the coefficients are bounded, the diagonal vanishes, and that ranges over a lacunary sequence. We will implicitly assume these hypotheses throughout the rest of the section.

As we have already discussed the moment method extensively, much of the argument here will be delegated to exercises. A full treatment of these computations can be found in the book of Bai and Silverstein.

The basic starting point is the observation that the moments of the ESD can be written as normalised traces of powers of :

In particular, on taking expectations, we have

From concentration of measure (and the Bai-Yin theorem) for the operator norm of a random matrix (Proposition 7 of Notes 3), we see that the are uniformly subgaussian, indeed we have

for , where are absolute (so the decay in fact improves quite rapidly with ). From this and the moment continuity theorem (Theorem 4 of Notes 2), we can now establish the semi-circular law through computing the mean and variance of moments:

Exercise 6

- Show that to prove convergence in expectation to the semi-circular law, it suffices to show that
for , where is an expression that goes to zero as for fixed (and fixed choice of coefficient distribution ).

- Show that to prove convergence in probability to the semi-circular law, it suffices to show (4) together with the variance bound
- Show that to prove almost sure convergence to the semi-circular law, it suffices to show (4) together with the variance bound
for and some . (Note here that it is useful to restrict to a lacunary sequence!)

Ordinarily, computing second-moment quantities such as the left-hand side of (5) is harder than computing first-moment quantities such as (4). But one can obtain the required variance bounds from concentration of measure:

Exercise 7

- When is a positive even integer, Use Talagrand’s inequality and convexity of the Schatten norm to establish (6) (and hence (5)) when is even.
- For odd, the formula still applies as long as is positive definite. Applying this observation, the Bai-Yin theorem, and Talagrand’s inequality to the norms of for a constant , establish (6) (and hence (5)) when is odd also.

Remark 3More generally, concentration of measure results (such as Talagrand’s inequality) can often be used to automatically upgrade convergence in expectation to convergence in probability or almost sure convergence. We will not attempt to formalise this principle here.

It is not difficult to establish (6), (5) through the moment method as well. Indeed, recall from Theorem 10 of Notes 3 that we have the expected moment

for all , where the Catalan number is zero when is odd, and is equal to

Exercise 8By modifying the proof of that theorem, show thatand deduce (5). By refining the error analysis (e.g. using Theorem 12 of Notes 3, also establish (6).

In view of the above computations, the establishment of the semi-circular law now reduces to computing the moments of the semi-circular distribution:

Exercise 9Show that for any , one has(Hint: use a trigonometric substitution , and then express the integrand in terms of Fourier phases .)

This concludes the proof of the semi-circular law (for any of the three modes of convergence).

Remark 4In the spirit of the Lindeberg exchange method, observe that Exercise (9) is unnecessary if one already knows that the semi-circular law holds for at least one ensemble of Wigner matrices (e.g. the GUE ensemble). Indeed, Exercise 9 can bededucedfrom such a piece of knowledge. In such a situation, it is not necessary to actually compute the main term on the right of (4); it would be sufficient to know that that limit isuniversal, in that it does not depend on the underlying distribution. In fact, it would even suffice to establish the slightly weaker statementwhenever are two ensembles of Wigner matrices arising from different underlying distributions (but still normalised to have mean zero, unit variance, and to be bounded (or at worst subgaussian)). We will take advantage of this perspective later in these notes.

** — 3. The Stieltjes transform method — **

The moment method was computationally intensive, but straightforward. As noted in Remark 4, even without doing much of the algebraic computation, it is clear that the moment method will show that some universal limit for Wigner matrices exists (or, at least, that the differences between the distributions of two different Wigner matrices converge to zero). But it is not easy to see from this method why the limit should be given by the semi-circular law, as opposed to some other distribution (although one could eventually work this out from an inverse moment computation).

When studying the central limit theorem, we were able to use the Fourier method to control the distribution of random matrices in a cleaner way than in the moment method. Analogues of this method exist, but require non-trivial formulae from noncommutative Fourier analysis, such as the Harish-Chandra integration formula (and also only work for highly symmetric ensembles, such as GUE or GOE), and will not be discussed in this course. (Our later notes on determinantal processes, however, will contain some algebraic identities related in some ways to the noncommutative Fourier-analytic approach.)

We now turn to another method, the *Stieltjes transform method*, which uses complex-analytic methods rather than Fourier-analytic methods, and has turned out to be one of the most powerful and accurate tools in dealing with the ESD of random Hermitian matrices. Whereas the moment method started from the identity (3), the Stieltjes transform method proceeds from the identity

for any complex not in the support of . We refer to the expression on the left-hand side as the *Stieltjes transform* of or of , and denote it by or as for short. The expression is the normalised resolvent of , and plays an important role in the spectral theory of that matrix. Indeed, in contrast to general-purpose methods such as the moment method, the Stieltjes transform method draws heavily on the specific linear-algebraic structure of this problem, and in particular on the rich structure of resolvents.

On the other hand, the Stieltjes transform can be viewed as a generating function of the moments via the Taylor series expansion

valid for sufficiently large. This is somewhat (though not exactly) analogous to how the characteristic function of a scalar random variable can be viewed as a generating function of the moments .

Now let us study the Stieltjes transform more systematically. Given any probability measure on the real line, we can form its *Stieltjes transform*

for any outside of the support of ; in particular, the Stieltjes transform is well-defined on the upper and lower half-planes in the complex plane. Even without any further hypotheses on other than it is a probability measure, we can say a remarkable amount about how this transform behaves in . Applying conjugations we obtain the symmetry

so we may as well restrict attention to in the upper half-plane (say). Next, from the trivial bound

In a similar spirit, an easy application of dominated convergence gives the asymptotic

where is an expression that, for any fixed , goes to zero as goes to infinity *non-tangentially* in the sense that is kept bounded, where the rate of convergence is allowed to depend on . From differentiation under the integral sign (or an application of Morera’s theorem and Fubini’s theorem) we see that is *complex analytic* on the upper and lower half-planes; in particular, it is smooth away from the real axis. From the Cauchy integral formula (or differentiation under the integral sign) we in fact get some bounds for higher derivatives of the Stieltjes transform away from this axis:

Informally, “behaves like a constant” at scales significantly less than the distance to the real axis; all the really interesting action here is going on near that axis.

The imaginary part of the Stieltjes transform is particularly interesting. Writing , we observe that

and so we see that

for in the upper half-plane; thus is a complex-analytic map from the upper half-plane to itself, a type of function known as a *Herglotz function*. (In fact, all complex-analytic maps from the upper half-plane to itself that obey the asymptotic (12) are of this form; this is a special case of the *Herglotz representation theorem*, which also gives a slightly more general description in the case when the asymptotic (12) is not assumed. A good reference for this material and its consequences is this book of Garnett.)

One can also express the imaginary part of the Stieltjes transform as a convolution

where is the Poisson kernel

As is well known, these kernels form a family of approximations to the identity, and thus converges in the vague topology to (see e.g. my notes on distributions). Thus we see that

as in the vague topology,or equivalently (by (10)) that

as (this is closely related to the *Plemelj formula* in potential theory). Thus we see that a probability measure can be recovered in terms of the limiting behaviour of the Stieltjes transform on the real axis.

A variant of the above machinery gives us a criterion for convergence:

Exercise 10 (Stieltjes continuity theorem)Let be a sequence of random probability measures on the real line, and let be a deterministic probability measure.

- converges almost surely to in the vague topology if and only if converges almost surely to for every in the upper half-plane.
- converges in probability to in the vague topology if and only if converges in probability to for every in the upper half-plane.
- converges in expectation to in the vague topology if and only if converges to for every in the upper half-plane.
(Hint: The “only if” parts are fairly easy. For the “if” parts, take a test function and approximate by . Then approximate this latter integral in turn by a Riemann sum, using (13).)

Thus, to prove the semi-circular law, it suffices to show that for each in the upper half-plane, the Stieltjes transform

converges almost surely (and thus in probability and in expectation) to the Stieltjes transform of the semi-circular law.

It is not difficult to compute the Stieltjes transform of the semi-circular law, but let us hold off on that task for now, because we want to illustrate how the Stieltjes transform method can be used to *find* the semi-circular law, even if one did not know this law in advance, by directly controlling . We will fix to be a complex number not on the real line, and allow all implied constants in the discussion below to depend on and (we will focus here only on the behaviour as ).

The main idea here is *predecessor comparison*: to compare the transform of the matrix with the transform of the top left minor , or of other minors. For instance, we have the *Cauchy interlacing law* (Exercise 14 from Notes 3a), which asserts that the eigenvalues of intersperse that of . This implies that for a complex number with , the difference

is an alternating sum of evaluations of the function . The total variation of this function is (recall that we are suppressing dependence of constaants on ), and so the alternating sum above is . Writing this in terms of the Stieltjes transform, we conclude that

Applying (13) to approximate by , we conclude that

So for fixed away from the real axis, the Stieltjes transform is quite stable in .

This stability has the following important consequence. Observe that while the left-hand side of (16) depends on the matrix , the right-hand side depends only on the top left minor of that matrix. In particular, it is *independent* of the row and column of . This implies that this entire row and column has only a limited amount of *influence* on the Stieltjes transform : no matter what value one assigns to this row and column (including possibly unbounded values, as long as one keeps the matrix Hermitian of course), the transform can only move by .

By permuting the rows and columns, we obtain that in fact any row or column of can influence is at most . (This is closely related to the observation in Exercise 4 that low rank perturbations do not significantly affect the ESD.) On the other hand, the rows of (the upper triangular portion of) are jointly independent. When is a Wigner random matrix, we can then apply a standard concentration of measure result, such as McDiarmid’s inequality (Theorem 7 from Notes 1) to conclude concetration of around its mean:

for all and some absolute constants . (This is not necessarily the strongest concentration result one can establish for the Stieltjes transform, but it will certainly suffice for our discussion here.) In particular, we see from the Borel-Cantelli lemma (Exercise 24 of Notes 0a)that for any fixed away from the real line, converges almost surely (and thus also in probability) to zero. As a consequence, convergence of in expectation automatically implies convergence in probability or almost sure convergence.

However, while concentration of measure tells us that is close to its mean, it does not shed much light as to what this mean *is*. For this, we have to go beyond the Cauchy interlacing formula and deal with the resolvent more directly. Firstly, we observe from the linearity of trace that

where denotes the component of a matrix . Because is a Wigner matrix, it is easy to see on permuting the rows and columns that all of the random variables have the same distribution. Thus we may simplify the above formula as

So now we have to compute the last entry of an inverse of a matrix. There are of course a number of formulae for this, such as Cramer’s rule. But it will be more convenient here to use a formula based instead on the Schur complement:

Exercise 11Let be a matrix, let be the top left minor, let be the bottom right entry of , let be the right column of with the bottom right entry removed, and let be the bottom row with the bottom right entry removed. In other words,Assume that and are both invertible. Show that

(

Hint:Solve the equation , where is the basis vector, using the method of Schur complements (or from first principles).)

The point of this identity is that it describes (part of) the inverse of in terms of the inverse of , which will eventually provide a non-trivial recursive relationship between and , which can then be played off against (16) to solve for in the asymptotic limit .

In our situation, the matrix and its minor is automatically invertible. Inserting the above formula into (18) (and recalling that we normalised the diagonal of to vanish), we conclude that

where is the top right column of with the bottom entry removed.

One may be concerned that the denominator here could vanish. However, observe that has imaginary part if . Furthermore, from the spectral theorem we see that the imaginary part of is positive definite, and so has non-negative imaginary part. As a consequence the magnitude of the denominator here is bounded below by , and so its reciprocal is (compare with (11)). So the reciprocal here is not going to cause any discontinuity, as we are considering is fixed and non-zero.

Now we need to understand the expression . We write this as , where is the resolvent matrix . The distribution of the random matrix could conceivably be quite complicated. However, the key point is that the vector only involves the entries of that do not lie in , and so the random matrix and the vector are *independent*. Because of this, we can use the randomness of to do most of the work in understanding the expression , without having to know much about at all.

To understand this, let us first condition to be a *deterministic* matrix , and see what we can do with the expression .

Firstly, observe that will not be arbitrary; indeed, from the spectral theorem we see that will have operator norm at most . Meanwhile, from the Chernoff (or Hoeffding) inequality (Theorem 2 or Exercise 4 of Notes 1) we know that has magnitude with overwhelming probability. So we know that has magnitude with overwhelming probability.

Furthermore, we can use concentration of measure as follows. Given any positive semi-definite matrix of operator norm , the expression is a Lipschitz function of with operator norm . Applying Talagrand’s inequality (Theorem 9 of Notes 1) we see that this expression concentrates around its median:

for any . On the other hand, has magnitude with overwhelming probability, so the median must be . Squaring, we conclude that

(possibly after adjusting the absolute constants ). As usual, we may replace the median with the expectation:

This was for positive-definite matrices, but one can easily use the triangle inequality to generalise to self-adjoint matrices, and then to arbitrary matrices, of operator norm , and conclude that

for any deterministic matrix of operator norm .

But what is the expectation ? This can be expressed in components as

where are the entries of , and are the entries of . But the are iid with mean zero and variance one, so the standard second moment computation shows that this expectation is nothing more than the trace

of . We have thus shown the concentration of measure result

for any deterministic matrix of operator norm , and any . Informally, is typically .

The bound (21) was proven for deterministic matrices, but by using conditional expectation it also applies for any random matrix , so long as that matrix is independent of . In particular, we may apply it to our specific matrix of interest

The trace of this matrix is essentially just the Stieltjes transform at . Actually, due to the normalisation factor being slightly off, we actually have

but by using the smoothness (13) of the Stieltjes transform, together with the stability property (16) we can simplify this as

In particular, from (21) and (17), we see that

with overwhelming probability. Putting this back into (19), and recalling that the denominator is bounded away from zero, we have the remarkable equation

Note how this equation came by playing off two ways in which the spectral properties of a matrix interacted with that of its minor ; firstly via the Cauchy interlacing inequality, and secondly via the Schur complement formula.

This equation already describes the behaviour of quite well, but we will content ourselves with understanding the limiting behaviour as . From (13) and Fubini’s theorem we know that the function is locally uniformly equicontinuous and locally uniformly bounded away from the real line. Applying the Arzelá-Ascoli theorem, we thus conclude that on a subsequence at least, converges locally uniformly to a limit . This will be a Herglotz function (i.e. an analytic function mapping the upper half-plane to the upper half-plane), and taking limits in (22) (observing that the imaginary part of the denominator here is bounded away from zero) we end up with the exact equation

We can of course solve this by the quadratic formula, obtaining

To figure out what branch of the square root one has to use here, we use (12), which easily implies that

as goes to infinity non-tangentially away from the real line. (To justify this, one has to make the error term in (12) uniform in , but this can be accomplished without difficulty using the Bai-Yin theorem (for instance).) Also, we know that has to be complex analytic (and in particular, continuous) away from the real line. From this and basic complex analysis, we conclude that

where is the branch of the square root with a branch cut at and which equals at infinity.

As there is only one possible subsequence limit of the , we conclude that converges locally uniformly (and thus pointwise) to the function (24), and thus (by the concentration of measure of ) we see that for each , converges almost surely (and in probability) to .

Exercise 12Find a direct proof (starting from (22), (12), and the smoothness of ) that for any fixed , that avoids using the Arzelá-Ascoli theorem. (The basic point here is that one has to solve the approximate equation (22), using some robust version of the quadratic formula. The fact that is a Herglotz function will help eliminate various unwanted possibilities, such as one coming from the wrong branch of the square root.)

To finish computing the limiting ESD of Wigner matrices, we have to figure out what probability measure comes from. But this is easily read off from (24) and (15):

as . Thus the semi-circular law is the only possible measure which has Stieltjes transform , and indeed a simple application of the Cauchy integral formula and (25) shows us that is indeed the Stieltjes transform of .

Putting all this together, we have completed the Stieltjes transform proof of the semi-circular law.

Remark 5In order to simplify the above exposition, we opted for a qualitative analysis of the semi-circular law here, ignoring such questions as the rate of convergence to this law. However, an inspection of the above arguments reveals that it is easy to make all of the above analysis quite quantitative, with quite reasonable control on all terms. (One has to use Exercise 12 instead of the Arzelá-Ascoli theorem if one wants everything to be quantitative.) In particular, it is not hard to use the above analysis to show that for for some small absolute constant , one has with overwhelming probability. Combining this with a suitably quantitative version of the Stieltjes continuity theorem, this in turn gives apolynomial rateof convergence of the ESDs to the semi-circular law , in that one haswith overwhelming probability for all .

A variant of this quantitative analysis can in fact get very good control on this ESD down to quite fine scales, namely to scales , which is only just a little bit larger than the mean spacing of the normalised eigenvalues (recall that we have normalised eigenvalues, constrained to lie in the interval by the Bai-Yin theorem). This was accomplished by Erdös, Schlein, and Yau (under some additional regularity hypotheses on the distribution , but these can be easily removed with the assistance of Talagrand’s inequality) by using an additional observation, namely that the

eigenvectorsof a random matrix are very likely to bedelocalisedin the sense that their energy is dispersed more or less evenly across its coefficients. We will return to this point in later notes.

** — 4. Dyson Brownian motion and the Stieltjes transform — **

We now explore how the Stieltjes transform interacts with the Dyson Brownian motion introduced in Notes 3b. We let be a large number, and let be a Wiener process of Hermitian random matrices, with associated eigenvalues , Stieltjes transforms

We now study how , evolve in time in the asymptotic limit . Our computation will be only heuristic in nature.

Recall from Notes 3b that the eigenvalues undergo Dyson Brownian motion

Applying (26) and Taylor expansion (dropping all terms of higher order than , using the Ito heuristic ), we conclude that

For away from the real line, the term is of size and can heuristically be ignored in the limit . Dropping this term, and then taking expectations to remove the Brownian motion term , we are led to

Performing the summation using (26) we obtain

where we adopt the convention that for real , is the average of and . Using (27), this becomes

where the subscript denotes differentiation in . From (15) we heuristically have

(heuristically treating as a function rather than a measure) and on squaring one obtains

From this the Cauchy integral formula around a slit in real axis (using the bound (11) to ignore the contributions near infinity) we thus have

and thus on differentiation in

Comparing this with (29), we obtain

From concentration of measure, we expect to concentrate around its mean , and similarly should concentrate around . In the limit , the expected Stieltjes transform should thus obey Burgers’ equation

To illustrate how this equation works in practice, let us give an informal derivation of the semi-circular law. We consider the case when the Wiener process starts from , thus for a GUE matrix . As such, we have the scaling symmetry

where is the asymptotic Stieltjes transform for GUE (which we secretly know to be given by (24), but let us pretend that we did not yet know this fact). Inserting this self-similar ansatz into (30) and setting , we conclude that

multiplying by two and integrating, we conclude that

for some constant . But from the asymptotic (12) we see that must equal . But then the above equation can be rearranged into (23), and so by repeating the arguments at the end of the previous section we can deduce the formula (24), which then gives the semi-circular law by (15).

As is well known in PDE, one can solve Burgers’ equation more generally by the method of characteristics. For reasons that will be come clearer in the next set of notes, I will solve this equation by a slightly different (but ultimately equivalent) method. The idea is that rather than think of as a function of for fixed , we think of as a function of for fixed . (This trick is sometimes known as the hodograph transform, especially if one views as “velocity” and as “position”.) Note from (12) that we expect to be able to invert the relationship between and as long as is large (and is small).

To exploit this change of perspective, we think of as all varying by infinitesimal amounts respectively. Using (30) and the total derivative formula , we see that

If we hold fixed (i.e. ), so that is now just a function of , and cancel off the factor, we conclude that

This, in principle, gives a way to compute from . First, we invert the relationship to ; then we add to ; then we invert again to recover .

Since , where is a GUE matrix independent of , we have thus given a formula to describe the Stieltjes transform of in terms of the Stieltjes transform of . This formula is a special case of a more general formula of Voiculescu for *free convolution*, with the operation of inverting the Stieltjes transform essentially being the famous *-transform* of Voiculescu; we will discuss this more in the next section.

## 55 comments

Comments feed for this article

2 February, 2010 at 6:11 pm

DavidIn the first sentence of the second paragraph I think the “normalized matrix” is misprinted.

[Corrected, thanks – T.]4 February, 2010 at 10:09 am

JanThx for the nice article and explanation!

6 February, 2010 at 8:16 pm

OrrIn the proof of Lemma 2, in the last line of text, perhaps you wanted to wrote .

[Corrected, thanks – T.]7 February, 2010 at 4:41 pm

anoDear Prof. Tao,

Sorry for a non-mathematical question — how did you manage to make the WordPress feed show only brief excerpts for the 254a posts, but full content for the others?

Thanks,

10 February, 2010 at 11:01 am

TobyDear Prof. Tao,

In exercise #1, why the requirement that ? It seems that by constructing matrix ensembles with all negative eigenvalues, you could give a counterexample. For instance, if consists of (deterministic) point masses at (which you could do by making just be a deterministic, diagonal matrix with the right things on the diagonal) then converges almost surely to a uniform measure on . But for would always be 1, so this would converge to for

anysupported on . Is a mistake, or is there some stronger meaning to Hermitian matrix ensemble that I’m missing?Thanks,

Toby

10 February, 2010 at 11:38 am

Terence TaoOops, you’re right, should be real throughout. This should be corrected now…

10 February, 2010 at 2:54 pm

TobyThanks!

10 February, 2010 at 10:56 pm

245A, Notes 5: Free probability « What’s new[…] Much as free groups are in some sense “maximally non-commutative”, freely independent random variables are about as far from being commuting as possible. For instance, if are freely independent and of expectation zero, then vanishes, but instead factors as . As a consequence, the behaviour of freely independent random variables can be quite different from the behaviour of their classically independent commuting counterparts. Nevertheless there is a remarkably strong analogy between the two types of independence, in that results which are true in the classically independent case often have an interesting analogue in the freely independent setting. For instance, the central limit theorem (Notes 2) for averages of classically independent random variables, which roughly speaking asserts that such averages become gaussian in the large limit, has an analogue for averages of freely independent variables, the free central limit theorem, which roughly speaking asserts that such averages become semicircular in the large limit. One can then use this theorem to provide yet another proof of Wigner’s semicircle law (Notes 4). […]

15 February, 2010 at 11:37 am

Choongbum LeeSeems like in Lemma 2 and Exercise 3, the $n^2$ terms should be $n$.

15 February, 2010 at 11:46 am

Choongbum LeeOh, it’s just in Lemma 2. And in the discussion following exercise 3,

shouldn’t it be we can discard matrices whose Frobenius norm is , and the diagonal entries of having Frobenius norm ?

[Corrected, thanks. -T.]16 February, 2010 at 4:14 pm

TobyDear Prof. Tao,

At the beginning of this post you suggest that the space of probability measures on is -compact. I set out to check this and eventually decided it was actually false. Am I mistaken? My proof works (or so I hope) by supposing you can decompose the space of measures into countably many compact (and therefore tight) sets, and then diagonalizing to produce a measure not in any of them.

Thanks,

Toby

16 February, 2010 at 4:37 pm

Terence TaoAh, yes, there is a slight subtlety here. The space of probability measures is not sigma-compact, but the larger space of non-negative measures with total mass at most 1 is sigma-compact (in fact it is compact) in the vague topology, basically by the Banach-Alaoglu theorem. So one can define the notions of convergence in probability, etc. in the larger space and then descend to the smaller space. (Actually, if all one wants is the definition, then sigma-compactness is not really needed; it’s only when one wants to use the definitions, e.g. to show that two different definitions are equivalent, that some regularity hypotheses on the underlying space are needed, and sigma compactness is probably overkill in most cases.)

16 February, 2010 at 5:24 pm

TobyAhh, thank you. That clears things up for me.

Toby

17 February, 2010 at 9:46 am

anonymousDear Prof. Tao,

You state that the Stieltjes transform “can be viewed as a generating function of the moments via the Taylor series expansion ” and that “This is somewhat (though not exactly) analogous to how the characteristic function {{\bf E} e^{itX}} of a scalar random variable can be viewed as a generating function of the moments {{\bf E} X^k}.”

Do (14) and (15) also have natural translations in terms of characteristic functions?

Thank you

17 February, 2010 at 12:37 pm

Terence TaoI suppose the Fourier inversion formula would be the closest analogue. The analogy between the Stieltjes transform and the Fourier transform only seems to be particularly strong near infinity (where the Laurent series look very similar, except for a factor of k!); otherwise the two transforms seem rather different in nature.

18 February, 2010 at 9:25 pm

Sungjin KimIn Exercise 2,

I think it is enough considering lacunary sequence of the form

a_n= [ a^n ], where [] is the floor function, and a>1.

Also, ‘it suffices to do so for a lacunary sequence ‘ should be changed to ‘it suffices to do so for every lacunary sequences’

[Thanks, I’ve reworded. – T.]19 February, 2010 at 9:26 am

Sungjin KimIn Exercise6, for the last part(almost sure convergence),

the error term in (4) should also be $O(n^{-c_k})$ ??

19 February, 2010 at 9:30 am

Terence TaoThis is not necessary; one can derive almost sure convergence without the stronger bound in (4) (although this bound is true). Consider for instance the extreme case when the error term in (4) is o(1), but the variance estimate is infinitely strong (i.e. the variance is zero).

19 February, 2010 at 11:19 pm

Sungjin KimI see. Thank you.

23 February, 2010 at 9:18 am

BenPossible typo. In the preliminary reduction section for the inequalities coming from the Cauchy interlacing law, is it supposed to be m>n?

23 February, 2010 at 9:31 am

Terence TaoAh, there was a sign error; I think I’ve fixed the problem now.

23 February, 2010 at 10:03 pm

254A, Notes 6: Gaussian ensembles « What’s new[…] can then be solved to obtain the semi-circular law as in previous notes. Remark 4 Recall from Notes 3b that Dyson Brownian motion can be used to derive the formula (1). […]

8 March, 2010 at 5:04 pm

BenI’d appreciate any help showing that pointwise convergence of gives weak convergence of the measures. Don’t you need some sort of uniformity of over n? Also, the hint about using (13) is very mysterious to me.

10 March, 2010 at 6:34 pm

Terence TaoOne can use (14) to control the integral of any function of the form F*P_b against mu_n with an integral of the corresponding Stieltjes transform against F. The latter can be shown to be convergent using (13).

10 March, 2010 at 9:13 am

Benpossible typo: in the formula before (28), it should be ?

[Corrected, thanks. – T.]14 March, 2010 at 11:33 am

254A, Notes 8: The circular law « What’s new[…] the semi-circular law where this reduction follows easily from the Hoffman-Weilandt inequality (see Notes 4). Instead, we must continue working with unbounded random variables throughout the argument […]

20 May, 2010 at 7:45 am

Random matrices: Localization of the eigenvalues and the necessity of four moments « What’s new[…] the vague topology) where is the semicircular distribution. (See these lecture notes on this blog for more discusssion of this […]

22 October, 2010 at 6:27 am

AnonymousDear Prof. Tao,

Do you plan to post on Tracy-Widom distribution (and hence somehow Airy function, Painleve II equation)?I would be great.

Thanks

3 March, 2011 at 10:11 am

FP Recent Follows « YW's E-Profile[…] https://terrytao.wordpress.com/2010/02/02/254a-notes-4-the-semi-circular-law/ […]

27 November, 2012 at 10:02 am

Alex GittensIn (12), it looks like the denominator’s missing a negative sign, since 1/z is not Herglotz

[Corrected, thanks – T.]21 February, 2014 at 6:15 pm

Luo YongmingDear Prof. Tao,

Did you use the Borel-Cantelli-inequality for the almost convergence in the Exercise 6? But to do so, i think ck should at least larger than 1. Or did you mean another approach cause i saw that the hint was “a lacunary sequence” but i couldn’t see that point. What i’ve read from the lecture notes of prob. thry is that every subsequence has a conv. subsequence almost sure iff xn converges to x in probability, but from this i also can’t derive the almost sure convergence.

27 December, 2014 at 2:23 pm

Appell polynomials, cumulants, noncrossing partitions, Dyck lattice paths, and inversion | Shadows of Simplicity[…] https://terrytao.wordpress.com/2010/02/02/254a-notes-4-the-semi-circular-law/#more-3426 […]

14 February, 2015 at 7:17 am

JulianSDear Prof. Tao,

the way you phrased Exercise 6(2) it sounds like one can prove weak convergence in probability directly from the Moment Continuity Theorem? I don’t see how this can work. I am not even sure how weak convergence in probability can be phrased in terms of the characteristic functions. Any comment would be highly appreciated!

14 February, 2015 at 8:33 am

Terence TaoFrom the moment continuity theorem in the contrapositive, one can show that if a real subgaussian random variable, is a Gaussian variable, and one wishes to show that for some , then it suffices to show that for sufficiently many (depending on and the subgaussian constants) and some (also depending on and the subgaussian constants). Applying this to a randomly selected eigenvalue of gives the result.

16 July, 2016 at 6:38 am

Kwangjun AhnDear prof. Tao,

At the beginning of [1. Preliminary reductions ], you made an argument that it suffices to consider for a lacunary subsequnces. However, I don’t see why this is true because in your argument it appeared that you considered $\mu (-\infty , \lambda/ \sqrt{n})$ rather than $\mu (-\infty , \lambda)$, which is our actual interest.

I would be grateful to hear from you!

17 July, 2016 at 8:14 am

Terence TaoIn the interlacing law one can replace by and use the fact that is monotone in $t$; this lets one compare the nonlacunary sequence from above and below by two lacunary sequences (with slightly different values of ). One also needs to use the fact that the semicircular law is continuous to conclude.

18 March, 2017 at 10:56 am

tornado92I am a bit confused in the discussion following (19).

Could someone please elaborate on how has a positive definite imaginary part, and why has operator norm ? How does this follow from the spectral theorem?

22 March, 2017 at 5:51 am

Terence TaoThe spectral theorem tells us that the Hermitian matrix is unitarily conjugate to a real diagonal matrix, so to establish these two claims it suffices to do so in the case that is a real diagonal matrix. Then the problem decouples into copies of the one-dimensional version of the problem, so it suffices to check the scalar case, which is then a straightforward calculation.

24 March, 2017 at 1:49 am

Eric Kauderer-AbramsDear Prof. Tao,

In your first observation in 1. Preliminary Reductions, when you apply the Cauchy interlacing law to demonstrate that the ESD is stable in n, shouldn’t the measures be $\mu_{\frac{1}{\sqrt{n}}M_n}(-\infty, \lambda \sqrt{n})$ and $\mu_{\frac{1}{\sqrt{m}}M_m}(-\infty, \lambda \sqrt{m})$,

instead of $\mu_{\frac{1}{\sqrt{n}}M_n}(-\infty, \lambda / \sqrt{n})$ and $\mu_{\frac{1}{\sqrt{m}}M_m}(-\infty, \lambda / \sqrt{m})$ ?

The definition that you use for the ESD counts the number of eigenvalues less than $\lambda / \sqrt{n}$, so it seems that the changes described above are necessary to ensure that we are counting up to the same value for both $\mu_{\frac{1}{\sqrt{n}}M_n}$ and $\mu_{\frac{1}{\sqrt{m}}M_m}$.

24 March, 2017 at 1:53 am

Eric Kauderer-AbramsDear Prof. Tao,

In your first observation in 1. Preliminary Reductions, when you apply the Cauchy interlacing law to demonstrate that the ESD is stable in n, shouldn’t the measures be and ,

instead of and ?

The definition that you use for the ESD counts the number of eigenvalues less than , so it seems that the changes described above are necessary to ensure that we are counting up to the same value for both and .

[I believe the formulae are correct as stated. Note for instance that . -T]28 March, 2017 at 9:57 pm

keejI think it’s fine; the number of eigenvalues of below is the same as the number of eigenvalues of below .

19 April, 2017 at 8:04 am

AnonymousDoes the Stieltjies transform proof, which uses conc. of measure, require that the matrix-entry distribution be bounded?

19 April, 2017 at 8:16 am

AnonymousThat was poorly phrased. I meant to ask, does the Stieltjes transform version of the proof of the semicircle law, in which Talagrand’s Concentration Inequality was used, assume that the entries of the random matrix have a bounded distribution? Otherwise it seems, one cannot appeal to Talagrand’s Inequality.

19 April, 2017 at 9:26 pm

Terence TaoYes, but thanks to Exercise 5, one can reduce to the bounded distribution case.

16 September, 2017 at 2:08 pm

Inverting the Schur complement, and large-dimensional Gelfand-Tsetlin patterns | What's new[…] 2 If is GUE normalised so that each entry has variance , then by the semi-circular law (see previous notes) one has (using an appropriate branch of the square root). One can then verify the self-similar […]

2 December, 2017 at 5:44 pm

keejI want to prove the semicircular law holds for the adjacency matrix of an Erdos Renyi graph G(n, p). The problem is its off-diagonal entries are mean p, not zero. So, I would like to subtract p times the all ones matrix, and then add back the p along the diagonal.

First, the diagonal doesn’t matter, by Exercise 3. Next, the all ones matrix is low rank, so I want to apply Exercise 4, which as written doesn’t require that the perturbation be Hermitian. But I’m suspicious because the hint for Exercise 4 says to use the Weyl inequalities, which do require Hermitian matrices.

So is Exercise 4 really true? If not, how does one finish the argument for the Erdos Renyi graph?

6 December, 2017 at 12:22 pm

Terence TaoThe hypothesis that be Hermitian (which is for instance the case in your Erdos-Renyi application) was mistakenly omitted from Exercise 4, and is now corrected.

6 December, 2017 at 2:49 pm

keejThank you! Ah, for some reason I had been thinking that the all-ones matrix was not Hermitian, so my previous comment must not have made too much sense.

24 December, 2018 at 6:10 pm

ChenThe question is: if we can discard any portion of the matrix that has small frobenius norm, then why can we not discard any small number of entries of , and repeat the process many times, so that we are left with the zero matrix? At each successive step, we only remove a matrix of small frobenius norm, so we should not change the limiting distribution. But then our starting matrix clearly is different from the zero matrix. What went wrong here?

24 December, 2018 at 6:43 pm

keejTo get the zero matrix, you’d have to discard an o(N) portion Omega(N) times. If you write out and index all the matrices involved in a 2d array, you will see that this is a question of interchanging limits, which is not valid in general (and your example proves it false in this case)

12 August, 2019 at 1:48 pm

Some news regarding the Paley graph – Short, Fat Matrices[…] with zeros on the diagonal and independent Rademacher variables on the off-diagonal. Since is a Wigner matrix, its spectrum follows a semi-circle law, and so enjoys a similar spectrum with an additional spike […]

7 November, 2020 at 4:10 am

The Cauchy residue trick: spectral analysis made “easy” – Machine Learning Research Blog[…] In this post, I have shown various applications of the Cauchy residue formula, for computing integrals and for the spectral analysis of matrices. I have just scratched the surface of spectral analysis, and what I presented extends to many interesting situations, for example, to more general linear operators in infinite-dimensional spaces [3], or to the analysis fo the eigenvalue distribution of random matrices (see a nice and reasonably simple derivation of the semi-circular law from Terry Tao’s blog). […]

26 August, 2021 at 7:28 am

AnonymousDear Prof Tao,

A minor notational inconsistency in this post: you sometimes write circular law when semi-circular law is intended. (For instance in the statement of Exercise 2.)

[Great, thanks – T.]1 June, 2022 at 2:49 pm

BenDear Professor Tao,

If one allows the variance of the diagonal terms to be of slower order, like O(N^{-1/3}) instead of O(N^-1), would the semicircle law still hold?

2 June, 2022 at 1:40 pm

Terence TaoYes; this can basically be derived from the small-diagonal case by treating the large diagonal as an additive perturbation and using the Weyl eigenvalue inequalities.