We can now turn attention to one of the centerpiece universality results in random matrix theory, namely the Wigner semi-circle law for Wigner matrices. Recall from previous notes that a Wigner Hermitian matrix ensemble is a random matrix ensemble $M_n = (\zeta_{ij})_{1 \le i,j \le n}$ of Hermitian matrices (thus $\zeta_{ij} = \overline{\zeta_{ji}}$; this includes real symmetric matrices as an important special case), in which the upper-triangular entries $\zeta_{ij}$, $i < j$ are iid complex random variables with mean zero and unit variance, and the diagonal entries $\zeta_{ii}$, $1 \le i \le n$ are iid real variables, independent of the upper-triangular entries, with bounded mean and variance. Particular special cases of interest include the Gaussian Orthogonal Ensemble (GOE), the symmetric random sign matrices (aka symmetric Bernoulli ensemble), and the Gaussian Unitary Ensemble (GUE).
In previous notes we saw that the operator norm of $M_n$ was typically of size $O(\sqrt n)$, so it is natural to work with the normalised matrix $\frac{1}{\sqrt n} M_n$. Accordingly, given any $n \times n$ Hermitian matrix $M$, we can form the (normalised) empirical spectral distribution (or ESD for short)
$$\mu_{\frac{1}{\sqrt n} M} := \frac{1}{n} \sum_{j=1}^n \delta_{\lambda_j(M)/\sqrt n}$$
of $M$, where $\lambda_1(M) \le \dots \le \lambda_n(M)$ are the (necessarily real) eigenvalues of $M$, counting multiplicity. The ESD is a probability measure, which can be viewed as a distribution of the normalised eigenvalues $\frac{1}{\sqrt n}\lambda_1(M), \dots, \frac{1}{\sqrt n}\lambda_n(M)$ of $M$.
When $M = M_n$ is a random matrix ensemble, then the ESD $\mu_{\frac{1}{\sqrt n} M_n}$ is now a random measure – i.e. a random variable taking values in the space $\Pr({\mathbb R})$ of probability measures on the real line. (Thus, the distribution of $\mu_{\frac{1}{\sqrt n} M_n}$ is a probability measure on probability measures!)
Now we consider the behaviour of the ESD of a sequence of Hermitian matrix ensembles $M_n$ as $n \rightarrow \infty$. Recall from Notes 0 that for any sequence of random variables in a $\sigma$-compact metrisable space, one can define notions of convergence in probability and convergence almost surely. Specialising these definitions to the case of random probability measures on ${\mathbb R}$, and to deterministic limits, we see that a sequence of random ESDs $\mu_{\frac{1}{\sqrt n} M_n}$ converge in probability (resp. converge almost surely) to a deterministic limit $\mu \in \Pr({\mathbb R})$ (which, confusingly enough, is a deterministic probability measure!) if, for every test function $\varphi \in C_c({\mathbb R})$, the quantities $\int_{\mathbb R} \varphi\, d\mu_{\frac{1}{\sqrt n} M_n}$ converge in probability (resp. converge almost surely) to $\int_{\mathbb R} \varphi\, d\mu$.
Remark 1 As usual, convergence almost surely implies convergence in probability, but not vice versa. In the special case of random probability measures, there is an even weaker notion of convergence, namely convergence in expectation, defined as follows. Given a random ESD $\mu_{\frac{1}{\sqrt n} M_n}$, one can form its expectation ${\bf E} \mu_{\frac{1}{\sqrt n} M_n} \in \Pr({\mathbb R})$, defined via duality (the Riesz representation theorem) as
$$\int_{\mathbb R} \varphi\, d\,{\bf E}\mu_{\frac{1}{\sqrt n} M_n} := {\bf E} \int_{\mathbb R} \varphi\, d\mu_{\frac{1}{\sqrt n} M_n};$$
this probability measure can be viewed as the law of a random eigenvalue $\frac{1}{\sqrt n}\lambda_i(M_n)$ drawn from a random matrix $M_n$ from the ensemble. We then say that the ESDs converge in expectation to a limit $\mu \in \Pr({\mathbb R})$ if ${\bf E}\mu_{\frac{1}{\sqrt n} M_n}$ converges in the vague topology to $\mu$, thus
$$ {\bf E} \int_{\mathbb R} \varphi\, d\mu_{\frac{1}{\sqrt n} M_n} \rightarrow \int_{\mathbb R} \varphi\, d\mu$$
for all $\varphi \in C_c({\mathbb R})$.
In general, these notions of convergence are distinct from each other; but in practice, one often finds in random matrix theory that these notions are effectively equivalent to each other, thanks to the concentration of measure phenomenon.
Exercise 1 Let $M_n$ be a sequence of Hermitian matrix ensembles, and let $\mu$ be a continuous probability measure on ${\mathbb R}$.
- Show that $\mu_{\frac{1}{\sqrt n} M_n}$ converges almost surely to $\mu$ if and only if $\mu_{\frac{1}{\sqrt n} M_n}((-\infty,\lambda])$ converges almost surely to $\mu((-\infty,\lambda])$ for all $\lambda \in {\mathbb R}$.
- Show that $\mu_{\frac{1}{\sqrt n} M_n}$ converges in probability to $\mu$ if and only if $\mu_{\frac{1}{\sqrt n} M_n}((-\infty,\lambda])$ converges in probability to $\mu((-\infty,\lambda])$ for all $\lambda \in {\mathbb R}$.
- Show that $\mu_{\frac{1}{\sqrt n} M_n}$ converges in expectation to $\mu$ if and only if ${\bf E}\mu_{\frac{1}{\sqrt n} M_n}((-\infty,\lambda])$ converges to $\mu((-\infty,\lambda])$ for all $\lambda \in {\mathbb R}$.
We can now state the Wigner semi-circular law.
Theorem 1 (Semicircular law) Let $M_n$ be the top left $n \times n$ minors of an infinite Wigner matrix $(\zeta_{ij})_{i,j \ge 1}$. Then the ESDs $\mu_{\frac{1}{\sqrt n} M_n}$ converge almost surely (and hence also in probability and in expectation) to the Wigner semi-circular distribution
$$\mu_{sc} := \frac{1}{2\pi} (4-|x|^2)_+^{1/2}\ dx. \ \ \ \ \ (1)$$
A numerical example of this theorem in action can be seen at the MathWorld entry for this law.
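One can also run such a numerical example directly. The following is a sketch using numpy; the choice of the symmetric Bernoulli ensemble, the matrix size, and all function names here are ours:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Sample a symmetric random sign matrix (symmetric Bernoulli ensemble):
# iid +-1 entries above the diagonal, zero diagonal.
A = rng.choice([-1.0, 1.0], size=(n, n))
M = np.triu(A, 1)
M = M + M.T

# Eigenvalues of the normalised matrix M / sqrt(n).
eigs = np.linalg.eigvalsh(M / np.sqrt(n))

# Semicircle CDF by numerical integration of the density (1/2pi)sqrt(4-x^2).
xs = np.linspace(-2.0, 2.0, 4001)
density = np.sqrt(np.maximum(4.0 - xs**2, 0.0)) / (2.0 * np.pi)
cdf = np.cumsum(density) * (xs[1] - xs[0])

def semicircle_cdf(t):
    return np.interp(t, xs, cdf)

for t in [-1.0, 0.0, 1.0]:
    print(t, np.mean(eigs < t), semicircle_cdf(t))
```

Even at $n = 1000$, the empirical spectral distribution of a single sample already tracks the semicircle closely, which reflects the concentration of measure phenomenon discussed below.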
The semi-circular law nicely complements the upper Bai-Yin theorem from Notes 3, which asserts that (in the case when the entries have finite fourth moment, at least), the matrices $\frac{1}{\sqrt n} M_n$ almost surely have operator norm at most $2+o(1)$. Note that the operator norm is the same thing as the largest magnitude of the eigenvalues. Because the semi-circular distribution (1) is supported on the interval $[-2,2]$ with positive density on the interior of this interval, Theorem 1 easily supplies the lower Bai-Yin theorem, that the operator norm of $\frac{1}{\sqrt n} M_n$ is almost surely at least $2-o(1)$, and thus (in the finite fourth moment case) the norm is in fact equal to $2+o(1)$. Indeed, we have just shown that the semi-circular law provides an alternate proof of the lower Bai-Yin bound (Proposition 11 of Notes 3).
As will hopefully become clearer in the next set of notes, the semi-circular law is the noncommutative (or free probability) analogue of the central limit theorem, with the semi-circular distribution (1) taking on the role of the normal distribution. Of course, there is a striking difference between the two distributions, in that the former is compactly supported while the latter is merely subgaussian. One reason for this is that the concentration of measure phenomenon is more powerful in the case of ESDs of Wigner matrices than it is for averages of iid variables; compare the concentration of measure results in Notes 3 with those in Notes 1.
There are several ways to prove (or at least to heuristically justify) the semi-circular law. In this set of notes we shall focus on the two most popular methods, the moment method and the Stieltjes transform method, together with a third (heuristic) method based on Dyson Brownian motion (Notes 3b). In the next set of notes we shall also study the free probability method, and in the set of notes after that we use the determinantal processes method (although this method is initially only restricted to highly symmetric ensembles, such as GUE).
— 1. Preliminary reductions —
Before we begin any of the proofs of the semi-circular law, we make some simple observations which will reduce the difficulty of the arguments in the sequel.
The first observation is that the Cauchy interlacing law (Exercise 14 from Notes 3a) shows that the ESD of $\frac{1}{\sqrt n} M_n$ is very stable in $n$. Indeed, we see from the interlacing law that the eigenvalue counting functions of $M_n$ and $M_{n+1}$ differ by at most one at every point, and hence
$$n\, \mu_{\frac{1}{\sqrt n} M_n}\big( (-\infty, \lambda/\sqrt n) \big) = (n+1)\, \mu_{\frac{1}{\sqrt{n+1}} M_{n+1}}\big( (-\infty, \lambda/\sqrt{n+1}) \big) + O(1)$$
for any threshold $\lambda \in {\mathbb R}$ and any $n \ge 1$.
Exercise 2 Using this observation, show that to establish the semi-circular law (in any of the three senses of convergence), it suffices to do so for an arbitrary lacunary sequence $n_1 < n_2 < n_3 < \dots$ of $n$ (thus $n_{j+1}/n_j \ge c$ for some $c > 1$ and all $j$).
The above lacunary reduction does not help one establish convergence in probability or expectation, but will be useful when establishing almost sure convergence, as it significantly reduces the inefficiency of the union bound. (Note that a similar lacunary reduction was also used to prove the strong law of large numbers in Notes 1.)
The second observation concerns the stability of the ESD with respect to perturbations. Recall the Wielandt-Hoffman inequality: for $n \times n$ Hermitian matrices $M, N$ one has
$$\sum_{k=1}^n |\lambda_k(M+N) - \lambda_k(M)|^2 \le \|N\|_F^2, \ \ \ \ \ (2)$$
where $\|N\|_F := (\hbox{tr}\, N^* N)^{1/2}$ denotes the Frobenius norm of $N$. This gives:
Lemma 2 For any $n \times n$ Hermitian matrices $M, N$, any $\lambda \in {\mathbb R}$, and any $\varepsilon > 0$, we have
$$\mu_{\frac{1}{\sqrt n}(M+N)}\big( (-\infty,\lambda) \big) \le \mu_{\frac{1}{\sqrt n} M}\big( (-\infty,\lambda+\varepsilon) \big) + \frac{1}{\varepsilon^2 n^2} \|N\|_F^2,$$
and similarly
$$\mu_{\frac{1}{\sqrt n}(M+N)}\big( (-\infty,\lambda) \big) \ge \mu_{\frac{1}{\sqrt n} M}\big( (-\infty,\lambda-\varepsilon) \big) - \frac{1}{\varepsilon^2 n^2} \|N\|_F^2.$$
Proof: We just prove the first inequality, as the second is similar (and also follows from the first, by reversing the signs of $M$, $N$, and $\lambda$).
Let $i$ be the number of eigenvalues of $\frac{1}{\sqrt n}(M+N)$ that are less than $\lambda$, and let $j$ be the number of eigenvalues of $\frac{1}{\sqrt n} M$ that are less than $\lambda + \varepsilon$. Our task is to show that
$$i \le j + \frac{1}{\varepsilon^2 n} \|N\|_F^2.$$
If $i \le j$ then we are clearly done, so suppose that $i > j$. Then for every $j < k \le i$, the $k^{th}$ eigenvalue of $\frac{1}{\sqrt n}(M+N)$ is less than $\lambda$, while the $k^{th}$ eigenvalue of $\frac{1}{\sqrt n} M$ is at least $\lambda+\varepsilon$; thus $|\lambda_k(M+N) - \lambda_k(M)| \ge \varepsilon \sqrt n$ for all $j < k \le i$, and hence
$$(i-j)\, \varepsilon^2 n \le \sum_{k=1}^n |\lambda_k(M+N) - \lambda_k(M)|^2.$$
The claim now follows from (2).
This has the following corollary:
Exercise 3 (Stability of ESD laws wrt small perturbations) Let $M_n$ be a sequence of random Hermitian matrix ensembles such that $\mu_{\frac{1}{\sqrt n} M_n}$ converges almost surely to a limit $\mu$. Let $N_n$ be another sequence of Hermitian random matrix ensembles such that $\frac{1}{n^2} \|N_n\|_F^2$ converges almost surely to zero. Show that $\mu_{\frac{1}{\sqrt n}(M_n+N_n)}$ converges almost surely to $\mu$.
Show that the same claim holds if “almost surely” is replaced by “in probability” or “in expectation” throughout.
Informally, this exercise allows us to discard any portion of the matrix which is $o(n)$ in the Frobenius norm. For instance, the diagonal entries of $M_n$ have a Frobenius norm of $O(\sqrt n)$ almost surely, by the strong law of large numbers. Hence, without loss of generality, we may set the diagonal equal to zero for the purposes of the semi-circular law.
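The disparity of scales here is easy to see numerically. The following is a small sketch (Gaussian entries are one admissible choice of coefficient distribution; all names are ours) comparing the Frobenius norm of a Wigner matrix, which is of size comparable to $n$, with that of its diagonal, which is only of size comparable to $\sqrt n$:

```python
import numpy as np

rng = np.random.default_rng(1)

for n in [100, 400, 1600]:
    G = rng.standard_normal((n, n))
    M = (G + G.T) / np.sqrt(2)              # real symmetric Wigner matrix
    frob_full = np.linalg.norm(M)           # Frobenius norm: ~ n
    frob_diag = np.linalg.norm(np.diag(M))  # diagonal only: ~ sqrt(n)
    print(n, frob_full / n, frob_diag / np.sqrt(n))
```

The printed ratios stay bounded as $n$ grows, so the diagonal is indeed $o(n)$ in Frobenius norm and falls under the scope of Exercise 3.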
One can also remove any component of $M_n$ that is of rank $o(n)$:
Exercise 4 (Stability of ESD laws wrt small rank perturbations) Let $M_n$ be a sequence of random Hermitian matrix ensembles such that $\mu_{\frac{1}{\sqrt n} M_n}$ converges almost surely to a limit $\mu$. Let $N_n$ be another sequence of Hermitian random matrix ensembles such that $\frac{1}{n} \hbox{rank}(N_n)$ converges almost surely to zero. Show that $\mu_{\frac{1}{\sqrt n}(M_n+N_n)}$ converges almost surely to $\mu$. (Hint: use the Weyl inequalities instead of the Wielandt-Hoffman inequality.)
Show that the same claim holds if “almost surely” is replaced by “in probability” or “in expectation” throughout.
In a similar vein, we may apply the truncation argument (much as was done for the central limit theorem in Notes 2) to reduce the semi-circular law to the bounded case:
Exercise 5 Show that in order to prove the semi-circular law (in the almost sure sense), it suffices to do so under the additional hypothesis that the random variables $\zeta_{ij}$ are bounded. Similarly for the convergence in probability or in expectation senses.
Remark 2 These facts ultimately rely on the stability of eigenvalues with respect to perturbations. This stability is automatic in the Hermitian case, but for non-symmetric matrices, serious instabilities can occur due to the presence of pseudospectrum. We will discuss this phenomenon more in later lectures (but see also this earlier blog post).
— 2. The moment method —
We now prove the semi-circular law via the method of moments, which we have already used several times in the previous notes. In order to use this method, it is convenient to use the preceding reductions to assume that the coefficients $\zeta_{ij}$ are bounded, that the diagonal vanishes, and that $n$ ranges over a lacunary sequence. We will implicitly assume these hypotheses throughout the rest of the section.
As we have already discussed the moment method extensively, much of the argument here will be delegated to exercises. A full treatment of these computations can be found in the book of Bai and Silverstein.
The starting point of the moment method is the identity
$$\int_{\mathbb R} x^k\ d\mu_{\frac{1}{\sqrt n} M_n}(x) = \frac{1}{n} \hbox{tr} \Big(\frac{1}{\sqrt n} M_n\Big)^k \ \ \ \ \ (3)$$
for any natural number $k$. In particular, on taking expectations, we have
$$\int_{\mathbb R} x^k\ d\,{\bf E}\mu_{\frac{1}{\sqrt n} M_n}(x) = {\bf E}\, \frac{1}{n} \hbox{tr} \Big(\frac{1}{\sqrt n} M_n\Big)^k.$$
From concentration of measure (and the Bai-Yin theorem) for the operator norm of a random matrix (Proposition 7 of Notes 3), we see that the moments $\frac{1}{n} \hbox{tr} (\frac{1}{\sqrt n} M_n)^k$ are uniformly subgaussian, indeed we have
$${\bf P}\Big( \Big|\frac{1}{n} \hbox{tr} \Big(\frac{1}{\sqrt n} M_n\Big)^k\Big| \ge \lambda^k \Big) \le C e^{-c n \lambda^2}$$
for $\lambda \ge C$, where $C, c > 0$ are absolute constants (so the decay in fact improves quite rapidly with $k$). From this and the moment continuity theorem (Theorem 4 of Notes 2), we can now establish the semi-circular law through computing the mean and variance of moments:
- Show that to prove convergence in expectation to the semi-circular law, it suffices to show that
$${\bf E}\, \frac{1}{n} \hbox{tr} \Big(\frac{1}{\sqrt n} M_n\Big)^k = \int_{\mathbb R} x^k\ d\mu_{sc}(x) + o(1) \ \ \ \ \ (4)$$
for each fixed $k \ge 0$, where $o(1)$ is an expression that goes to zero as $n \rightarrow \infty$ for fixed $k$ (and fixed choice of coefficient distribution $\zeta$).
- Show that to prove convergence in probability to the semi-circular law, it suffices to show (4) together with the variance bound
$${\bf Var}\Big( \frac{1}{n} \hbox{tr} \Big(\frac{1}{\sqrt n} M_n\Big)^k \Big) = o(1) \ \ \ \ \ (5)$$
for each fixed $k$.
- Show that to prove almost sure convergence to the semi-circular law, it suffices to show (4) together with the variance bound
$${\bf Var}\Big( \frac{1}{n} \hbox{tr} \Big(\frac{1}{\sqrt n} M_n\Big)^k \Big) \le C_k\, n^{-\varepsilon} \ \ \ \ \ (6)$$
for each fixed $k$ and some $C_k, \varepsilon > 0$. (Note here that it is useful to restrict $n$ to a lacunary sequence!)
Ordinarily, computing second-moment quantities such as the left-hand side of (5) is harder than computing first-moment quantities such as (4). But one can obtain the required variance bounds from concentration of measure:
- When $k$ is a positive even integer, use Talagrand’s inequality and the convexity of the Schatten norm $\|A\|_{S^k} := (\hbox{tr}\, |A|^k)^{1/k}$ to establish (6) (and hence (5)).
- For odd $k$, the formula $\hbox{tr}\, A^k = \|A\|_{S^k}^k$ still applies as long as $A$ is positive definite. Applying this observation, the Bai-Yin theorem, and Talagrand’s inequality to the Schatten norms of $\frac{1}{\sqrt n} M_n + cI$ for a suitable absolute constant $c$, establish (6) (and hence (5)) when $k$ is odd also.
Remark 3 More generally, concentration of measure results (such as Talagrand’s inequality) can often be used to automatically upgrade convergence in expectation to convergence in probability or almost sure convergence. We will not attempt to formalise this principle here.
Exercise 8 Show that
$${\bf E}\, \frac{1}{n} \hbox{tr} \Big(\frac{1}{\sqrt n} M_n\Big)^k = C_{k/2} + o(1)$$
for all fixed $k \ge 0$, where the Catalan number $C_{k/2}$ is zero when $k$ is odd, and is equal to
$$C_{k/2} = \frac{k!}{(\frac{k}{2}+1)!\, (\frac{k}{2})!}$$
when $k$ is even.
In view of the above computations, the establishment of the semi-circular law now reduces to computing the moments of the semi-circular distribution:
Exercise 9 Show that
$$\int_{\mathbb R} x^k\ d\mu_{sc}(x) = C_{k/2}$$
for all $k \ge 0$. (Hint: use a trigonometric substitution $x = 2\cos\theta$, and then express the integrand in terms of Fourier phases $e^{ik\theta}$.)
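These moments can be sanity-checked numerically. The following is a small sketch (the quadrature grid and function names are our own):

```python
import numpy as np
from math import comb

xs = np.linspace(-2.0, 2.0, 200001)
density = np.sqrt(np.maximum(4.0 - xs**2, 0.0)) / (2.0 * np.pi)

def sc_moment(k):
    # Riemann sum for the k-th moment (the integrand vanishes at +-2).
    return float(np.sum(xs**k * density) * (xs[1] - xs[0]))

# Even moments are the Catalan numbers 1, 1, 2, 5, 14, ...; odd moments vanish.
catalan = [comb(2 * m, m) // (m + 1) for m in range(5)]
print([round(sc_moment(2 * m), 4) for m in range(5)], catalan)
```

The even moments agree with the Catalan numbers to the accuracy of the quadrature, and the odd moments vanish by symmetry of the grid.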
This concludes the proof of the semi-circular law (for any of the three modes of convergence).
Remark 4 In the spirit of the Lindeberg exchange method, observe that Exercise 9 is unnecessary if one already knows that the semi-circular law holds for at least one ensemble of Wigner matrices (e.g. the GUE ensemble); indeed, Exercise 9 can be deduced from such a piece of knowledge. In such a situation, it is not necessary to actually compute the main term on the right of (4); it would be sufficient to know that that limit is universal, in that it does not depend on the underlying distribution. In fact, it would even suffice to establish the slightly weaker statement
$${\bf E}\, \frac{1}{n} \hbox{tr} \Big(\frac{1}{\sqrt n} M_n\Big)^k = {\bf E}\, \frac{1}{n} \hbox{tr} \Big(\frac{1}{\sqrt n} M'_n\Big)^k + o(1)$$
whenever $M_n$ and $M'_n$ are two ensembles of Wigner matrices arising from different underlying distributions (but still normalised to have mean zero, unit variance, and to be bounded (or at worst subgaussian)). We will take advantage of this perspective later in these notes.
— 3. The Stieltjes transform method —
The moment method was computationally intensive, but straightforward. As noted in Remark 4, even without doing much of the algebraic computation, it is clear that the moment method will show that some universal limit for Wigner matrices exists (or, at least, that the differences between the distributions of two different Wigner matrices converge to zero). But it is not easy to see from this method why the limit should be given by the semi-circular law, as opposed to some other distribution (although one could eventually work this out from an inverse moment computation).
When studying the central limit theorem, we were able to use the Fourier method to control the distribution of random variables in a cleaner way than in the moment method. Analogues of this method exist for random matrices, but require non-trivial formulae from noncommutative Fourier analysis, such as the Harish-Chandra integration formula (and also only work for highly symmetric ensembles, such as GUE or GOE), and will not be discussed in this course. (Our later notes on determinantal processes, however, will contain some algebraic identities related in some ways to the noncommutative Fourier-analytic approach.)
We now turn to another method, the Stieltjes transform method, which uses complex-analytic methods rather than Fourier-analytic methods, and has turned out to be one of the most powerful and accurate tools in dealing with the ESD of random Hermitian matrices. Whereas the moment method started from the identity (3), the Stieltjes transform method proceeds from the identity
$$\int_{\mathbb R} \frac{d\mu_{\frac{1}{\sqrt n} M_n}(x)}{x - z} = \frac{1}{n} \hbox{tr} \Big( \frac{1}{\sqrt n} M_n - z I \Big)^{-1}$$
for any complex $z$ not in the support of $\mu_{\frac{1}{\sqrt n} M_n}$. We refer to the expression on the left-hand side as the Stieltjes transform of $M_n$ or of $\mu_{\frac{1}{\sqrt n} M_n}$, and denote it by $s_{\mu_{\frac{1}{\sqrt n} M_n}}(z)$ or as $s_n(z)$ for short. The expression $\frac{1}{n} ( \frac{1}{\sqrt n} M_n - z I )^{-1}$ is the normalised resolvent of $M_n$, and plays an important role in the spectral theory of that matrix. Indeed, in contrast to general-purpose methods such as the moment method, the Stieltjes transform method draws heavily on the specific linear-algebraic structure of this problem, and in particular on the rich structure of resolvents.
On the other hand, the Stieltjes transform can be viewed as a generating function of the moments via the Taylor series expansion
$$s_n(z) = -\frac{1}{z} - \frac{1}{z^2}\, \frac{1}{n} \hbox{tr}\, \frac{1}{\sqrt n} M_n - \frac{1}{z^3}\, \frac{1}{n} \hbox{tr} \Big(\frac{1}{\sqrt n} M_n\Big)^2 - \dots,$$
valid for $z$ sufficiently large. This is somewhat (though not exactly) analogous to how the characteristic function ${\bf E}\, e^{itX}$ of a scalar random variable $X$ can be viewed as a generating function of the moments ${\bf E}\, X^k$.
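For the semi-circular law this expansion can be checked numerically: the even moments are the Catalan numbers and the transform is $s(z) = \frac{-z+\sqrt{z^2-4}}{2}$, with the branch of the square root that makes $s(z)$ behave like $-1/z$ at infinity. The following is a small sketch (names are ours):

```python
import numpy as np
from math import comb

# Laurent expansion check: s(z) = -sum_k m_k / z^{k+1}, where the even
# moments m_{2m} of the semicircular law are the Catalan numbers C_m and
# the odd moments vanish.
def s_sc(z):
    return (-z + np.sqrt(z**2 - 4 + 0j)) / 2

z = 10.0
exact = s_sc(z).real
series = -sum(comb(2 * m, m) // (m + 1) / z**(2 * m + 1) for m in range(20))
print(exact, series)
```

At $z = 10$ the truncated Laurent series already matches the closed form to machine precision, since the terms decay geometrically.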
Now let us study the Stieltjes transform more systematically. Given any probability measure $\mu$ on the real line, we can form its Stieltjes transform
$$s_\mu(z) := \int_{\mathbb R} \frac{d\mu(x)}{x - z}$$
for any $z$ outside of the support of $\mu$; in particular, the Stieltjes transform is well-defined on the upper and lower half-planes in the complex plane. Even without any further hypotheses on $\mu$ other than it is a probability measure, we can say a remarkable amount about how this transform behaves in $z$. Applying conjugations we obtain the symmetry
$$s_\mu(\overline{z}) = \overline{s_\mu(z)},$$
so we may as well restrict attention to $z$ in the upper half-plane (say). Next, from the triangle inequality we obtain the trivial bound
$$|s_\mu(z)| \le \frac{1}{|\hbox{Im}(z)|}, \ \ \ \ \ (11)$$
together with the asymptotic
$$s_\mu(z) = \frac{-1+o(1)}{z}, \ \ \ \ \ (12)$$
where $o(1)$ is an expression that, for any fixed $\mu$, goes to zero as $z$ goes to infinity non-tangentially in the sense that $|\hbox{Re}(z)|/|\hbox{Im}(z)|$ is kept bounded, where the rate of convergence is allowed to depend on this bound. From differentiation under the integral sign (or an application of Morera’s theorem and Fubini’s theorem) we see that $s_\mu$ is complex analytic on the upper and lower half-planes; in particular, it is smooth away from the real axis. From the Cauchy integral formula (or differentiation under the integral sign) we in fact get some bounds for higher derivatives of the Stieltjes transform away from this axis:
$$\Big| \frac{d^j}{dz^j} s_\mu(z) \Big| = O_j\Big( \frac{1}{|\hbox{Im}(z)|^{j+1}} \Big). \ \ \ \ \ (13)$$
Informally, $s_\mu$ “behaves like a constant” at scales significantly less than the distance $|\hbox{Im}(z)|$ to the real axis; all the really interesting action here is going on near that axis.
The imaginary part of the Stieltjes transform is particularly interesting. Writing $z = a + bi$ with $b > 0$, we observe that
$$\hbox{Im}\, s_\mu(a+bi) = \int_{\mathbb R} \frac{b}{(x-a)^2 + b^2}\ d\mu(x) > 0,$$
and so we see that $s_\mu$ maps the upper half-plane to the upper half-plane; thus $s_\mu$ is a complex-analytic map from the upper half-plane to itself, a type of function known as a Herglotz function. (In fact, all complex-analytic maps from the upper half-plane to itself that obey the asymptotic (12) are of this form; this is a special case of the Herglotz representation theorem, which also gives a slightly more general description in the case when the asymptotic (12) is not assumed. A good reference for this material and its consequences is this book of Garnett.)
From the above formula we also have the identity
$$\frac{1}{\pi} \hbox{Im}\, s_\mu(a+bi) = \int_{\mathbb R} P_b(a-x)\ d\mu(x) = (\mu * P_b)(a), \ \ \ \ \ (14)$$
where $P_b$ is the Poisson kernel
$$P_b(x) := \frac{1}{\pi} \frac{b}{x^2+b^2}.$$
As the Poisson kernels form a family of approximations to the identity as $b \rightarrow 0^+$, we conclude that
$$\frac{1}{\pi} \hbox{Im}\, s_\mu(\cdot+bi) \rightarrow \mu \ \ \ \ \ (15)$$
as $b \rightarrow 0^+$ in the vague topology (this is closely related to the Plemelj formula in potential theory). Thus we see that a probability measure $\mu$ can be recovered in terms of the limiting behaviour of the Stieltjes transform on the real axis.
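This recovery procedure is easy to test numerically on the semi-circular law. The following is a sketch (the branch with cut on $[-2,2]$ is implemented via the product $\sqrt{z-2}\sqrt{z+2}$ of principal square roots, and all names are ours):

```python
import numpy as np

# Stieltjes transform of the semicircular law, branch cut on [-2, 2].
def s_sc(z):
    return (-z + np.sqrt(z - 2 + 0j) * np.sqrt(z + 2 + 0j)) / 2

def sc_density(a):
    return np.sqrt(np.maximum(4.0 - a**2, 0.0)) / (2.0 * np.pi)

# (1/pi) Im s(a + ib) converges to the density as b -> 0+.
a = np.linspace(-1.9, 1.9, 39)
for b in [0.1, 0.01, 0.001]:
    approx = s_sc(a + 1j * b).imag / np.pi
    print(b, np.max(np.abs(approx - sc_density(a))))
```

The maximal discrepancy shrinks as $b$ decreases, as the Poisson kernel at height $b$ smooths the density out at spatial scale $b$.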
A variant of the above machinery gives us a criterion for convergence:
Exercise 10 (Stieltjes continuity theorem) Let $\mu_n$ be a sequence of random probability measures on the real line, and let $\mu$ be a deterministic probability measure.
- Show that $\mu_n$ converges almost surely to $\mu$ in the vague topology if and only if $s_{\mu_n}(z)$ converges almost surely to $s_\mu(z)$ for every $z$ in the upper half-plane.
- Show that $\mu_n$ converges in probability to $\mu$ in the vague topology if and only if $s_{\mu_n}(z)$ converges in probability to $s_\mu(z)$ for every $z$ in the upper half-plane.
- Show that $\mu_n$ converges in expectation to $\mu$ in the vague topology if and only if ${\bf E}\, s_{\mu_n}(z)$ converges to $s_\mu(z)$ for every $z$ in the upper half-plane.
(Hint: The “only if” parts are fairly easy. For the “if” parts, take a test function $\varphi \in C_c({\mathbb R})$ and approximate $\int_{\mathbb R} \varphi\ d\mu_n$ by $\frac{1}{\pi} \int_{\mathbb R} \varphi(a)\, \hbox{Im}\, s_{\mu_n}(a+bi)\ da$ for small $b$. Then approximate this latter integral in turn by a Riemann sum, using (13).)
Thus, to prove the semi-circular law, it suffices to show that for each $z$ in the upper half-plane, the Stieltjes transform
$$s_n(z) := s_{\mu_{\frac{1}{\sqrt n} M_n}}(z) = \frac{1}{n} \hbox{tr} \Big( \frac{1}{\sqrt n} M_n - z I \Big)^{-1}$$
converges almost surely (and thus in probability and in expectation) to the Stieltjes transform $s(z) := s_{\mu_{sc}}(z)$ of the semi-circular law.
It is not difficult to compute the Stieltjes transform $s_{\mu_{sc}}$ of the semi-circular law, but let us hold off on that task for now, because we want to illustrate how the Stieltjes transform method can be used to find the semi-circular law, even if one did not know this law in advance, by directly controlling $s_n(z)$. We will fix $z$ to be a complex number not on the real line, and allow all implied constants in the discussion below to depend on $\hbox{Re}(z)$ and $\hbox{Im}(z)$ (we will focus here only on the behaviour as $n \rightarrow \infty$).
The main idea here is predecessor comparison: to compare the transform $s_n(z)$ of the matrix $M_n$ with the transform of the top left $n-1 \times n-1$ minor $M_{n-1}$, or of other minors. For instance, we have the Cauchy interlacing law (Exercise 14 from Notes 3a), which asserts that the eigenvalues of $M_n$ intersperse those of $M_{n+1}$. This implies that for a complex number $z$ with $\hbox{Im}(z) > 0$, the difference
$$\sum_{j=1}^{n+1} \frac{1}{\lambda_j(M_{n+1})/\sqrt n - z} - \sum_{j=1}^{n} \frac{1}{\lambda_j(M_n)/\sqrt n - z}$$
is an alternating sum of evaluations of the function $x \mapsto \frac{1}{x/\sqrt n - z}$. The total variation of this function is $O(1)$ (recall that we are suppressing dependence of constants on $z$), and so the alternating sum above is $O(1)$. Writing this in terms of the Stieltjes transform, we conclude that
$$(n+1)\, s_{\mu_{\frac{1}{\sqrt n} M_{n+1}}}(z) - n\, s_n(z) = O(1).$$
Applying (13) to account for the slightly different normalisation of $M_{n+1}$ (which only shifts the normalised eigenvalues by $O(1/n)$), we conclude that
$$s_{n+1}(z) = s_n(z) + O\Big(\frac{1}{n}\Big). \ \ \ \ \ (16)$$
So for fixed $z$ away from the real axis, the Stieltjes transform $s_n(z)$ is quite stable in $n$.
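This stability is easy to observe numerically. The following is a rough sketch (the Gaussian ensemble, the test point $z$, and the helper names are our choices):

```python
import numpy as np

rng = np.random.default_rng(2)

def stieltjes(M, z):
    # Stieltjes transform of the ESD of M / sqrt(dim).
    m = M.shape[0]
    eigs = np.linalg.eigvalsh(M / np.sqrt(m))
    return np.mean(1.0 / (eigs - z))

n = 800
z = 1.0 + 1.0j
G = rng.standard_normal((n + 1, n + 1))
M = (G + G.T) / np.sqrt(2)      # a real symmetric Wigner matrix

# Cauchy interlacing forces s_{n+1}(z) - s_n(z) = O(1/n).
diff = stieltjes(M, z) - stieltjes(M[:n, :n], z)
print(abs(diff), 1.0 / n)
```

For a single sample at $n = 800$ the difference is already of the same order as $1/n$, in line with (16).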
This stability has the following important consequence. Observe that while the left-hand side of (16) depends on the matrix $M_{n+1}$, the right-hand side depends only on the top left minor $M_n$ of that matrix. In particular, it is independent of the final row and column of $M_{n+1}$. This implies that this entire row and column has only a limited amount of influence on the Stieltjes transform $s_{n+1}(z)$: no matter what value one assigns to this row and column (including possibly unbounded values, as long as one keeps the matrix Hermitian of course), the transform $s_{n+1}(z)$ can only move by $O(1/n)$.
By permuting the rows and columns, we obtain that in fact any row or column of $M_n$ can influence $s_n(z)$ by at most $O(1/n)$. (This is closely related to the observation in Exercise 4 that low rank perturbations do not significantly affect the ESD.) On the other hand, the rows of (the upper triangular portion of) $M_n$ are jointly independent. When $M_n$ is a Wigner random matrix, we can then apply a standard concentration of measure result, such as McDiarmid’s inequality (Theorem 7 from Notes 1), to conclude concentration of $s_n(z)$ around its mean:
$${\bf P}\big( |s_n(z) - {\bf E}\, s_n(z)| \ge \lambda \big) \le C e^{-c \lambda^2 n}$$
for all $\lambda > 0$ and some absolute constants $C, c > 0$. (This is not necessarily the strongest concentration result one can establish for the Stieltjes transform, but it will certainly suffice for our discussion here.) In particular, we see from the Borel-Cantelli lemma (Exercise 24 of Notes 0a) that for any fixed $z$ away from the real line, $s_n(z) - {\bf E}\, s_n(z)$ converges almost surely (and thus also in probability) to zero. As a consequence, convergence of $s_n(z)$ in expectation automatically implies convergence in probability or almost sure convergence.
However, while concentration of measure tells us that $s_n(z)$ is close to its mean ${\bf E}\, s_n(z)$, it does not shed much light as to what this mean is. For this, we have to go beyond the Cauchy interlacing formula and deal with the resolvent more directly. Firstly, we observe from the linearity of trace that
$${\bf E}\, s_n(z) = \frac{1}{n} \sum_{j=1}^n {\bf E}\, \Big[ \Big( \Big(\frac{1}{\sqrt n} M_n - zI\Big)^{-1} \Big)_{jj} \Big],$$
where $(A)_{jj}$ denotes the $jj$ component of a matrix $A$. Because $M_n$ is a Wigner matrix, it is easy to see on permuting the rows and columns that all of the random variables $( (\frac{1}{\sqrt n} M_n - zI)^{-1} )_{jj}$ have the same distribution. Thus we may simplify the above formula as
$${\bf E}\, s_n(z) = {\bf E}\, \Big[ \Big( \Big(\frac{1}{\sqrt n} M_n - zI\Big)^{-1} \Big)_{nn} \Big]. \ \ \ \ \ (18)$$
So now we have to compute the last entry of an inverse of a matrix. There are of course a number of formulae for this, such as Cramer’s rule. But it will be more convenient here to use a formula based instead on the Schur complement:
Exercise 11 Let $A_n$ be an $n \times n$ matrix, let $A_{n-1}$ be the top left $n-1 \times n-1$ minor, let $a_{nn}$ be the bottom right entry of $A_n$, let $X \in {\mathbb C}^{n-1}$ be the right column of $A_n$ with the bottom right entry removed, and let $Y^* \in ({\mathbb C}^{n-1})^*$ be the bottom row with the bottom right entry removed. In other words,
$$A_n = \begin{pmatrix} A_{n-1} & X \\ Y^* & a_{nn} \end{pmatrix}.$$
Assume that $A_n$ and $A_{n-1}$ are both invertible. Show that
$$(A_n^{-1})_{nn} = \frac{1}{a_{nn} - Y^* A_{n-1}^{-1} X}.$$
(Hint: Solve the equation $A_n v = e_n$, where $e_n$ is the $n^{th}$ basis vector, using the method of Schur complements (or from first principles).)
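A quick numerical check of this identity (a sketch on a random complex test matrix; all names here are ours):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6

A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A_minor = A[:-1, :-1]   # top left (n-1) x (n-1) minor
X = A[:-1, -1]          # right column, bottom entry removed
Y = A[-1, :-1]          # bottom row, bottom entry removed
a_nn = A[-1, -1]        # bottom right entry

# Schur complement formula for the bottom right entry of the inverse.
lhs = np.linalg.inv(A)[-1, -1]
rhs = 1.0 / (a_nn - Y @ np.linalg.inv(A_minor) @ X)
print(lhs, rhs)
```

Both computations agree to machine precision (a random Gaussian matrix is invertible with probability one, so the hypotheses of the exercise hold in practice).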
The point of this identity is that it describes (part of) the inverse of $A_n$ in terms of the inverse of the minor $A_{n-1}$, which will eventually provide a non-trivial recursive relationship between $s_n(z)$ and $s_{n-1}(z)$, which can then be played off against (16) to solve for $s_n(z)$ in the asymptotic limit $n \rightarrow \infty$.
In our situation, the matrix $\frac{1}{\sqrt n} M_n - zI$ and its minor $\frac{1}{\sqrt n} M_{n-1} - zI$ are automatically invertible, since $z$ is not real. Inserting the above formula into (18) (and recalling that we normalised the diagonal of $M_n$ to vanish), we conclude that
$${\bf E}\, s_n(z) = {\bf E}\, \frac{1}{-z - \frac{1}{n} X^* \big( \frac{1}{\sqrt n} M_{n-1} - z I \big)^{-1} X}, \ \ \ \ \ (19)$$
where $X \in {\mathbb C}^{n-1}$ is the right column of $M_n$ with the bottom entry removed.
One may be concerned that the denominator here could vanish. However, observe that $-z$ has imaginary part $-\hbox{Im}(z) < 0$ if $\hbox{Im}(z) > 0$. Furthermore, from the spectral theorem we see that the imaginary part of $(\frac{1}{\sqrt n} M_{n-1} - zI)^{-1}$ is positive definite, and so $\frac{1}{n} X^* ( \frac{1}{\sqrt n} M_{n-1} - z I )^{-1} X$ has non-negative imaginary part. As a consequence the magnitude of the denominator here is bounded below by $|\hbox{Im}(z)|$, and so its reciprocal is $O(1)$ (compare with (11)). So the reciprocal here is not going to cause any discontinuity, as we are considering $z$ as fixed with $\hbox{Im}(z)$ non-zero.
Now we need to understand the expression $X^* ( \frac{1}{\sqrt n} M_{n-1} - z I )^{-1} X$. We write this as $X^* R X$, where $R$ is the resolvent matrix $R := ( \frac{1}{\sqrt n} M_{n-1} - z I )^{-1}$. The distribution of the random matrix $R$ could conceivably be quite complicated. However, the key point is that the vector $X$ only involves the entries of $M_n$ that do not lie in $M_{n-1}$, and so the random matrix $R$ and the vector $X$ are independent. Because of this, we can use the randomness of $X$ to do most of the work in understanding the expression $X^* R X$, without having to know much about $R$ at all.
To understand this, let us first condition $R$ to be a deterministic matrix $R = R_0$, and see what we can do with the expression $X^* R_0 X$.
Firstly, observe that $R_0$ will not be arbitrary; indeed, from the spectral theorem we see that $R_0$ will have operator norm at most $1/|\hbox{Im}(z)| = O(1)$. Meanwhile, from the Chernoff (or Hoeffding) inequality (Theorem 2 or Exercise 4 of Notes 1) we know that $X$ has magnitude $O(\sqrt n)$ with overwhelming probability. So we know that $X^* R_0 X$ has magnitude $O(n)$ with overwhelming probability.
Furthermore, we can use concentration of measure as follows. Given any positive semi-definite matrix $R_0$ of operator norm $O(1)$, the expression $(X^* R_0 X)^{1/2}$ is a Lipschitz and convex function of $X$ with Lipschitz constant $O(1)$. Applying Talagrand’s inequality (Theorem 9 of Notes 1) we see that this expression concentrates around its median:
$${\bf P}\big( |(X^* R_0 X)^{1/2} - {\bf M} (X^* R_0 X)^{1/2}| \ge \lambda \big) \le C e^{-c\lambda^2}$$
for any $\lambda > 0$. On the other hand, $(X^* R_0 X)^{1/2}$ has magnitude $O(\sqrt n)$ with overwhelming probability, so the median must be $O(\sqrt n)$ also. Squaring, we conclude that
$${\bf P}\big( |X^* R_0 X - {\bf M}(X^* R_0 X)| \ge \lambda \sqrt n \big) \le C e^{-c\lambda^2}$$
(possibly after adjusting the absolute constants $C, c$). As usual, we may replace the median with the expectation:
$${\bf P}\big( |X^* R_0 X - {\bf E}\, X^* R_0 X| \ge \lambda \sqrt n \big) \le C e^{-c\lambda^2}$$
for any deterministic matrix $R_0$ of operator norm $O(1)$ (the positive semi-definiteness hypothesis can be dropped by splitting $R_0$ into Hermitian and skew-Hermitian components, and these in turn into positive and negative parts).
But what is the expectation ${\bf E}\, X^* R_0 X$? This can be expressed in components as
$${\bf E}\, X^* R_0 X = \sum_{i=1}^{n-1} \sum_{j=1}^{n-1} {\bf E}\, \overline{\xi_i}\, (R_0)_{ij}\, \xi_j,$$
where $\xi_1,\dots,\xi_{n-1}$ are the entries of $X$, and $(R_0)_{ij}$ are the entries of $R_0$. But the $\xi_i$ are iid with mean zero and variance one, so the standard second moment computation shows that this expectation is nothing more than the trace
$${\bf E}\, X^* R_0 X = \hbox{tr}\, R_0$$
of $R_0$. We conclude that
$${\bf P}\big( |X^* R_0 X - \hbox{tr}\, R_0| \ge \lambda \sqrt n \big) \le C e^{-c\lambda^2} \ \ \ \ \ (21)$$
for any deterministic matrix $R_0$ of operator norm $O(1)$, and any $\lambda > 0$. Informally, $X^* R_0 X$ is typically $\hbox{tr}\, R_0 + O(\sqrt n)$.
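The identity ${\bf E}\, X^* R_0 X = \hbox{tr}\, R_0$ and the $O(\sqrt n)$ scale of the fluctuations can be illustrated numerically. This is a sketch only; random sign vectors are one admissible choice of mean-zero, unit-variance distribution, and the normalised Gaussian test matrix and all names are ours:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300
trials = 500

# A fixed matrix R of operator norm 1 (a normalised symmetric Gaussian
# matrix), probed by random sign vectors X with iid mean-zero,
# unit-variance entries.
G = rng.standard_normal((n, n))
R = (G + G.T) / 2
R = R / np.linalg.norm(R, 2)

samples = np.empty(trials)
for i in range(trials):
    X = rng.choice([-1.0, 1.0], size=n)
    samples[i] = X @ R @ X

# Mean should be tr(R); fluctuations live at scale sqrt(n).
print(samples.mean(), np.trace(R), samples.std(), np.sqrt(n))
```

The sample mean sits near $\hbox{tr}\, R$, and the empirical standard deviation is of size comparable to $\sqrt n$, consistent with (21).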
The bound (21) was proven for deterministic matrices $R_0$, but by using conditional expectation it also applies for any random matrix $R$, so long as that matrix is independent of $X$. In particular, we may apply it to our specific matrix of interest
$$R := \Big( \frac{1}{\sqrt n} M_{n-1} - z I \Big)^{-1}.$$
The trace of this matrix is essentially just the Stieltjes transform of $M_{n-1}$ at $z$. Actually, due to the normalisation factor being slightly off, we actually have
$$\hbox{tr}\, R = n\, s_n(z) + O(1)$$
(by (16) and (13)), and thus by (21)
$$\frac{1}{n} X^* R X = s_n(z) + O\Big( \frac{\log n}{\sqrt n} \Big)$$
with overwhelming probability. Putting this back into (19), and recalling that the denominator is bounded away from zero, we have the remarkable equation
$${\bf E}\, s_n(z) = \frac{1}{-z - {\bf E}\, s_n(z)} + o(1). \ \ \ \ \ (22)$$
Note how this equation came by playing off two ways in which the spectral properties of a matrix $M_n$ interacted with those of its minor $M_{n-1}$: firstly via the Cauchy interlacing inequality, and secondly via the Schur complement formula.
This equation already describes the behaviour of ${\bf E}\, s_n(z)$ quite well, but we will content ourselves with understanding the limiting behaviour as $n \rightarrow \infty$. From (13) and Fubini’s theorem we know that the functions $z \mapsto {\bf E}\, s_n(z)$ are locally uniformly equicontinuous and locally uniformly bounded away from the real line. Applying the Arzelá-Ascoli theorem, we thus conclude that on a subsequence at least, ${\bf E}\, s_n$ converges locally uniformly to a limit $s$. This will be a Herglotz function (i.e. an analytic function mapping the upper half-plane to the upper half-plane), and taking limits in (22) (observing that the imaginary part of the denominator here is bounded away from zero) we end up with the exact equation
$$s(z) = \frac{1}{-z - s(z)}. \ \ \ \ \ (23)$$
We can of course solve this by the quadratic formula, obtaining
$$s(z) = \frac{-z \pm \sqrt{z^2-4}}{2}.$$
To figure out what branch of the square root one has to use here, we use (12), which easily implies that
$$s(z) = \frac{-1+o(1)}{z}$$
as $z$ goes to infinity non-tangentially away from the real line. (To justify this, one has to make the error term in (12) uniform in $n$, but this can be accomplished without difficulty using the Bai-Yin theorem (for instance).) Also, we know that $s$ has to be complex analytic (and in particular, continuous) away from the real line. From this and basic complex analysis, we conclude that
$$s(z) = \frac{-z + \sqrt{z^2-4}}{2}, \ \ \ \ \ (24)$$
where $\sqrt{z^2-4}$ is the branch of the square root with a branch cut at $[-2,2]$ and which equals $z$ at infinity.
As there is only one possible subsequence limit of the ${\bf E}\, s_n$, we conclude that ${\bf E}\, s_n$ converges locally uniformly (and thus pointwise) to the function (24), and thus (by the concentration of measure of $s_n(z)$) we see that for each $z$ in the upper half-plane, $s_n(z)$ converges almost surely (and in probability) to $s(z)$.
Exercise 12 Find a direct proof (starting from (22), (12), and the smoothness of ${\bf E}\, s_n(z)$) that ${\bf E}\, s_n(z) = s(z) + o(1)$ for any fixed $z$ in the upper half-plane, that avoids using the Arzelá-Ascoli theorem. (The basic point here is that one has to solve the approximate equation (22), using some robust version of the quadratic formula. The fact that ${\bf E}\, s_n$ is a Herglotz function will help eliminate various unwanted possibilities, such as one coming from the wrong branch of the square root.)
From (24) one can compute that
$$\frac{1}{\pi} \hbox{Im}\, s(a+bi) \rightarrow \frac{1}{2\pi} (4-a^2)_+^{1/2} \ \ \ \ \ (25)$$
as $b \rightarrow 0^+$. Thus the semi-circular law is the only possible measure which has Stieltjes transform $s$, and indeed a simple application of the Cauchy integral formula and (25) shows us that $s$ is indeed the Stieltjes transform of $\mu_{sc}$.
Putting all this together, we have completed the Stieltjes transform proof of the semi-circular law.
Remark 5 In order to simplify the above exposition, we opted for a qualitative analysis of the semi-circular law here, ignoring such questions as the rate of convergence to this law. However, an inspection of the above arguments reveals that it is easy to make all of the above analysis quite quantitative, with quite reasonable control on all terms. (One has to use Exercise 12 instead of the Arzelá-Ascoli theorem if one wants everything to be quantitative.) In particular, it is not hard to use the above analysis to show that for $\hbox{Im}(z) \ge n^{-c}$ for some small absolute constant $c > 0$, one has $s_n(z) = s(z) + O(n^{-c'})$ with overwhelming probability, for some $c' > 0$. Combining this with a suitably quantitative version of the Stieltjes continuity theorem, this in turn gives a polynomial rate of convergence of the ESDs $\mu_{\frac{1}{\sqrt n} M_n}$ to the semi-circular law $\mu_{sc}$, in that one has
$$\mu_{\frac{1}{\sqrt n} M_n}\big( (-\infty,\lambda] \big) = \mu_{sc}\big( (-\infty,\lambda] \big) + O(n^{-c''})$$
with overwhelming probability for all $\lambda \in {\mathbb R}$ (and some $c'' > 0$).
A variant of this quantitative analysis can in fact get very good control on this ESD down to quite fine scales, namely to scales $n^{-1+\varepsilon}$ for any $\varepsilon > 0$, which is only just a little bit larger than the mean spacing $O(1/n)$ of the normalised eigenvalues (recall that we have $n$ normalised eigenvalues, constrained to lie in the interval $[-2-o(1), 2+o(1)]$ by the Bai-Yin theorem). This was accomplished by Erdős, Schlein, and Yau (under some additional regularity hypotheses on the distribution $\zeta$, but these can be easily removed with the assistance of Talagrand’s inequality) by using an additional observation, namely that the eigenvectors of a random matrix are very likely to be delocalised in the sense that their $\ell^2$ energy is dispersed more or less evenly across its coefficients. We will return to this point in later notes.
— 4. Dyson Brownian motion and the Stieltjes transform —
We now explore how the Stieltjes transform interacts with the Dyson Brownian motion introduced in Notes 3b. We let $n$ be a large number, and let $M(t)$ be a Wiener process of Hermitian random matrices, with associated eigenvalues $\lambda_1(t),\dots,\lambda_n(t)$, Stieltjes transforms
$$s(t,z) := \frac{1}{n} \sum_{j=1}^n \frac{1}{\lambda_j(t)/\sqrt n - z} \ \ \ \ \ (26)$$
and spectral measures
$$\mu(t) := \frac{1}{n} \sum_{j=1}^n \delta_{\lambda_j(t)/\sqrt n}. \ \ \ \ \ (27)$$
We now study how $s(t,z)$ and $\mu(t)$ evolve in time in the asymptotic limit $n \rightarrow \infty$. Our computation will be only heuristic in nature.
Recall from Notes 3b that the eigenvalues $\lambda_i = \lambda_i(t)$ undergo Dyson Brownian motion
$$d\lambda_i = dB_i + \sum_{j \neq i} \frac{dt}{\lambda_i - \lambda_j}. \ \ \ \ \ (28)$$
Applying (26) and Taylor expansion (dropping all terms of higher order than $dt$, using the Ito heuristic $dB_i = O(\sqrt{dt})$), we conclude that
$$ds(t,z) = -\frac{1}{n^{3/2}} \sum_i \frac{dB_i}{(\lambda_i/\sqrt n - z)^2} - \frac{1}{n^{3/2}} \sum_i \sum_{j \neq i} \frac{dt}{(\lambda_i/\sqrt n - z)^2 (\lambda_i - \lambda_j)} + \frac{1}{n^2} \sum_i \frac{dt}{(\lambda_i/\sqrt n - z)^3}.$$
For $z$ away from the real line, the final term is of size $O(dt/n)$ and can heuristically be ignored in the limit $n \rightarrow \infty$. Dropping this term, and then taking expectations to remove the Brownian motion term $dB_i$, we are led to
$$\partial_t\, {\bf E}\, s(t,z) = -{\bf E}\, \frac{1}{n^{3/2}} \sum_i \sum_{j \neq i} \frac{1}{(\lambda_i/\sqrt n - z)^2 (\lambda_i - \lambda_j)}.$$
Performing the summation using (26) we obtain
$$\partial_t\, {\bf E}\, s(t,z) = -{\bf E} \int_{\mathbb R} \int_{\mathbb R} \frac{d\mu(t)(x)\, d\mu(t)(y)}{(x-z)^2 (x-y)},$$
where we adopt the convention that for real $x$, $\frac{1}{x-x}$ is the average of $\frac{1}{x-x^-}$ and $\frac{1}{x-x^+}$, i.e. zero. Using (27), this becomes
$${\bf E}\, s_t(t,z) = {\bf E} \int_{\mathbb R} \frac{F(t,x)}{(x-z)^2}\, d\mu(t)(x), \qquad F(t,x) := \hbox{p.v.} \int_{\mathbb R} \frac{d\mu(t)(y)}{y-x}, \ \ \ \ \ (29)$$
where the subscript denotes differentiation in the indicated variable (thus $s_t = \partial_t s$ and $s_z = \partial_z s$). From (15) we heuristically have
$$s(t, x + 0^+ i) \approx F(t,x) + \pi i\, \mu(t)(x)$$
(heuristically treating $\mu(t)$ as a function rather than a measure, with $F(t,x)$ the principal value integral appearing in (29)) and on squaring one obtains
$$s(t, x + 0^+ i)^2 \approx F(t,x)^2 - \pi^2 \mu(t)(x)^2 + 2\pi i\, F(t,x)\, \mu(t)(x).$$
From this and the Cauchy integral formula around a slit in the real axis (using the bound (11) to ignore the contributions near infinity) we thus have
$$s(t,z)^2 = \int_{\mathbb R} \frac{2 F(t,x)}{x - z}\, d\mu(t)(x),$$
and thus on differentiation in $z$
$$2\, s(t,z)\, s_z(t,z) = \int_{\mathbb R} \frac{2 F(t,x)}{(x-z)^2}\, d\mu(t)(x).$$
Comparing this with (29), we obtain
$${\bf E}\, s_t = {\bf E}\, s\, s_z.$$
From concentration of measure, we expect $s(t,z)$ to concentrate around its mean $g(t,z) := {\bf E}\, s(t,z)$, and similarly $s\, s_z$ should concentrate around $g\, g_z$. In the limit $n \rightarrow \infty$, the expected Stieltjes transform $g(t,z)$ should thus obey Burgers’ equation
$$\partial_t g = g\, g_z. \ \ \ \ \ (30)$$
To illustrate how this equation works in practice, let us give an informal derivation of the semi-circular law. We consider the case when the Wiener process starts from $M(0) = 0$; thus $M(t) \equiv \sqrt t\, G$ for a GUE matrix $G$. As such, we have the scaling symmetry
$$g(t,z) = \frac{1}{\sqrt t}\, u\Big( \frac{z}{\sqrt t} \Big),$$
where $u(z) := g(1,z)$ is the asymptotic Stieltjes transform for GUE (which we secretly know to be given by (24), but let us pretend that we did not yet know this fact). Inserting this self-similar ansatz into (30) and setting $t = 1$, we conclude that
$$-\frac{1}{2} u - \frac{z}{2} u_z = u\, u_z;$$
multiplying by two and integrating, we conclude that
$$-z\, u = u^2 + C$$
for some constant $C$. But from the asymptotic (12) we see that $z u \rightarrow -1$ as $z \rightarrow \infty$ non-tangentially, and so $C$ must equal $1$. But then the above equation can be rearranged into (23), and so by repeating the arguments at the end of the previous section we can deduce the formula (24), which then gives the semi-circular law by (15).
As is well known in PDE, one can solve Burgers’ equation more generally by the method of characteristics. For reasons that will become clearer in the next set of notes, I will solve this equation by a slightly different (but ultimately equivalent) method. The idea is that rather than think of $s = s(t,z)$ as a function of $z$ for fixed $t$, we think of $z$ as a function of $s$ for fixed $t$. (This trick is sometimes known as the hodograph transform, especially if one views $s$ as “velocity” and $z$ as “position”.) Note from (12) that we expect to be able to invert the relationship between $s$ and $z$ as long as $|z|$ is large (and $|s|$ is small).
To exploit this change of perspective, we think of $s$, $z$, $t$ as all varying by infinitesimal amounts $ds$, $dz$, $dt$ respectively. Using (30) and the total derivative formula $ds = s_t\, dt + s_z\, dz$, we see that
$$ds = s_z\, (s\, dt + dz).$$
If we hold $s$ fixed (i.e. $ds = 0$), so that $z$ is now just a function of $t$, and cancel off the $s_z$ factor, we conclude that
$$\frac{dz}{dt} = -s.$$
This, in principle, gives a way to compute $s(t,\cdot)$ from $s(0,\cdot)$. First, we invert the relationship $s = s(0,z)$ to $z = z(0,s)$; then we add $-ts$ to $z(0,s)$; then we invert again to recover $s(t,z)$.
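As a sanity check of this recipe (a sketch, with our sign convention $s_\mu(z) = \int d\mu(x)/(x-z)$, under which the flow reads $z(t,s) = z(0,s) - ts$): starting from $\mu(0) = \delta_0$, whose transform is $s(0,z) = -1/z$, one unit of time should produce the semi-circular transform, which satisfies $s^2 + zs + 1 = 0$:

```python
import numpy as np

# Transform of the semicircular law, branch cut on [-2, 2].
def s_sc(z):
    return (-z + np.sqrt(z - 2 + 0j) * np.sqrt(z + 2 + 0j)) / 2

for z in [1.0 + 1.0j, -0.5 + 2.0j, 3.0 + 0.5j]:
    s = s_sc(z)        # s(1, z): transform after one unit of time
    z0 = -1.0 / s      # invert the t = 0 relation s(0, z) = -1/z
    z1 = z0 - s        # flow: z(1, s) = z(0, s) - 1 * s
    print(z, z1)       # z1 recovers z
```

This is just the algebraic identity $-1/s - s = z$, which is a rearrangement of $s^2 + zs + 1 = 0$, so the hodograph recipe is consistent with the semi-circular answer.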
Since $M(t) \equiv M(0) + \sqrt t\, G$, where $G$ is a GUE matrix independent of $M(0)$, we have thus given a formula to describe the Stieltjes transform of $M(0) + \sqrt t\, G$ in terms of the Stieltjes transform of $M(0)$. This formula is a special case of a more general formula of Voiculescu for free convolution, with the operation of inverting the Stieltjes transform essentially being the famous $R$-transform of Voiculescu; we will discuss this more in the next set of notes.