You are currently browsing the tag archive for the ‘moment method’ tag.
We can now turn attention to one of the centerpiece universality results in random matrix theory, namely the Wigner semi-circle law for Wigner matrices. Recall from previous notes that a Wigner Hermitian matrix ensemble is a random matrix ensemble of Hermitian matrices (thus ; this includes real symmetric matrices as an important special case), in which the upper-triangular entries , are iid complex random variables with mean zero and unit variance, and the diagonal entries are iid real variables, independent of the upper-triangular entries, with bounded mean and variance. Particular special cases of interest include the Gaussian Orthogonal Ensemble (GOE), the symmetric random sign matrices (aka symmetric Bernoulli ensemble), and the Gaussian Unitary Ensemble (GUE).
In previous notes we saw that the operator norm of was typically of size , so it is natural to work with the normalised matrix . Accordingly, given any Hermitian matrix , we can form the (normalised) empirical spectral distribution (or ESD for short)
of , where are the (necessarily real) eigenvalues of , counting multiplicity. The ESD is a probability measure, which can be viewed as a distribution of the normalised eigenvalues of .
When is a random matrix ensemble, then the ESD is now a random measure – i.e. a random variable taking values in the space of probability measures on the real line. (Thus, the distribution of is a probability measure on probability measures!)
Now we consider the behaviour of the ESD of a sequence of Hermitian matrix ensembles as . Recall from Notes 0 that for any sequence of random variables in a -compact metrisable space, one can define notions of convergence in probability and convergence almost surely. Specialising these definitions to the case of random probability measures on , and to deterministic limits, we see that a sequence of random ESDs converge in probability (resp. converge almost surely) to a deterministic limit (which, confusingly enough, is a deterministic probability measure!) if, for every test function , the quantities converge in probability (resp. converge almost surely) to .
Remark 1 As usual, convergence almost surely implies convergence in probability, but not vice versa. In the special case of random probability measures, there is an even weaker notion of convergence, namely convergence in expectation, defined as follows. Given a random ESD , one can form its expectation , defined via duality (the Riesz representation theorem) as
this probability measure can be viewed as the law of a random eigenvalue drawn from a random matrix from the ensemble. We then say that the ESDs converge in expectation to a limit if converges the vague topology to , thus
for all .
In general, these notions of convergence are distinct from each other; but in practice, one often finds in random matrix theory that these notions are effectively equivalent to each other, thanks to the concentration of measure phenomenon.
Exercise 1 Let be a sequence of Hermitian matrix ensembles, and let be a continuous probability measure on .
- Show that converges almost surely to if and only if converges almost surely to for all .
- Show that converges in probability to if and only if converges in probability to for all .
- Show that converges in expectation to if and only if converges to for all .
We can now state the Wigner semi-circular law.
Theorem 1 (Semicircular law) Let be the top left minors of an infinite Wigner matrix . Then the ESDs converge almost surely (and hence also in probability and in expectation) to the Wigner semi-circular distribution
A numerical example of this theorem in action can be seen at the MathWorld entry for this law.
The semi-circular law nicely complements the upper Bai-Yin theorem from Notes 3, which asserts that (in the case when the entries have finite fourth moment, at least), the matrices almost surely has operator norm at most . Note that the operator norm is the same thing as the largest magnitude of the eigenvalues. Because the semi-circular distribution (1) is supported on the interval with positive density on the interior of this interval, Theorem 1 easily supplies the lower Bai-Yin theorem, that the operator norm of is almost surely at least , and thus (in the finite fourth moment case) the norm is in fact equal to . Indeed, we have just shown that the circular law provides an alternate proof of the lower Bai-Yin bound (Proposition 11 of Notes 3).
As will hopefully become clearer in the next set of notes, the semi-circular law is the noncommutative (or free probability) analogue of the central limit theorem, with the semi-circular distribution (1) taking on the role of the normal distribution. Of course, there is a striking difference between the two distributions, in that the former is compactly supported while the latter is merely subgaussian. One reason for this is that the concentration of measure phenomenon is more powerful in the case of ESDs of Wigner matrices than it is for averages of iid variables; compare the concentration of measure results in Notes 3 with those in Notes 1.
There are several ways to prove (or at least to heuristically justify) the circular law. In this set of notes we shall focus on the two most popular methods, the moment method and the Stieltjes transform method, together with a third (heuristic) method based on Dyson Brownian motion (Notes 3b). In the next set of notes we shall also study the free probability method, and in the set of notes after that we use the determinantal processes method (although this method is initially only restricted to highly symmetric ensembles, such as GUE).
Suppose we have a large number of scalar random variables , which each have bounded size on average (e.g. their mean and variance could be ). What can one then say about their sum ? If each individual summand varies in an interval of size , then their sum of course varies in an interval of size . However, a remarkable phenomenon, known as concentration of measure, asserts that assuming a sufficient amount of independence between the component variables , this sum sharply concentrates in a much narrower range, typically in an interval of size . This phenomenon is quantified by a variety of large deviation inequalities that give upper bounds (often exponential in nature) on the probability that such a combined random variable deviates significantly from its mean. The same phenomenon applies not only to linear expressions such as , but more generally to nonlinear combinations of such variables, provided that the nonlinear function is sufficiently regular (in particular, if it is Lipschitz, either separately in each variable, or jointly in all variables).
The basic intuition here is that it is difficult for a large number of independent variables to “work together” to simultaneously pull a sum or a more general combination too far away from its mean. Independence here is the key; concentration of measure results typically fail if the are too highly correlated with each other.
There are many applications of the concentration of measure phenomenon, but we will focus on a specific application which is useful in the random matrix theory topics we will be studying, namely on controlling the behaviour of random -dimensional vectors with independent components, and in particular on the distance between such random vectors and a given subspace.
Once one has a sufficient amount of independence, the concentration of measure tends to be sub-gaussian in nature; thus the probability that one is at least standard deviations from the mean tends to drop off like for some . In particular, one is standard deviations from the mean with high probability, and standard deviations from the mean with overwhelming probability. Indeed, concentration of measure is our primary tool for ensuring that various events hold with overwhelming probability (other moment methods can give high probability, but have difficulty ensuring overwhelming probability).
This is only a brief introduction to the concentration of measure phenomenon. A systematic study of this topic can be found in this book by Ledoux.
Let X be a real-valued random variable, and let be an infinite sequence of independent and identically distributed copies of X. Let be the empirical averages of this sequence. A fundamental theorem in probability theory is the law of large numbers, which comes in both a weak and a strong form:
Strong law of large numbers. Suppose that the first moment of X is finite. Then converges almost surely to , thus .
[The concepts of convergence in probability and almost sure convergence in probability theory are specialisations of the concepts of convergence in measure and pointwise convergence almost everywhere in measure theory.]
(If one strengthens the first moment assumption to that of finiteness of the second moment , then we of course have a more precise statement than the (weak) law of large numbers, namely the central limit theorem, but I will not discuss that theorem here. With even more hypotheses on X, one similarly has more precise versions of the strong law of large numbers, such as the Chernoff inequality, which I will again not discuss here.)
The weak law is easy to prove, but the strong law (which of course implies the weak law, by Egoroff’s theorem) is more subtle, and in fact the proof of this law (assuming just finiteness of the first moment) usually only appears in advanced graduate texts. So I thought I would present a proof here of both laws, which proceeds by the standard techniques of the moment method and truncation. The emphasis in this exposition will be on motivation and methods rather than brevity and strength of results; there do exist proofs of the strong law in the literature that have been compressed down to the size of one page or less, but this is not my goal here.