You are currently browsing the tag archive for the ‘random matrices’ tag.
[These are notes intended mostly for myself, as these topics are useful in random matrix theory, but may be of interest to some readers also. -T.]
One of the most fundamental partial differential equations in mathematics is the heat equation
is a scalar function
of both time and space, and
is the Laplacian
. For the purposes of this post, we will ignore all technical issues of regularity and decay, and always assume that the solutions to equations such as (1) have all the regularity and decay in order to justify all formal operations such as the chain rule, integration by parts, or differentiation under the integral sign. The factor of
in the definition of the heat propagator
is of course an arbitrary normalisation, chosen for some minor technical reasons; one can certainly continue the discussion below with other choices of normalisations if desired.
In probability theory, this equation takes on particular significance when is restricted to be non-negative, and furthermore to be a probability measure at each time, in the sense that
for all . (Actually, it suffices to verify this constraint at time
, as the heat equation (1) will then preserve this constraint.) Indeed, in this case, one can interpret
as the probability distribution of a Brownian motion
is a stochastic process with initial probability distribution
; see for instance this previous blog post for more discussion.
A model example of a solution to the heat equation to keep in mind is that of the fundamental solution
, which represents the distribution of Brownian motion of a particle starting at the origin
at time
. At time
,
represents an
-valued random variable, each coefficient of which is an independent random variable of mean zero and variance
. (As
,
converges in the sense of distributions to a Dirac mass at the origin.)
The heat equation can also be viewed as the gradient flow for the Dirichlet form
, which formally implies that
is (half of) the negative gradient of the Dirichlet energy
with respect to the
inner product. Among other things, this implies that the Dirichlet energy decreases in time:
that
Since is non-negative, the formula (6) implies that
is integrable in time, and in particular we see that
converges to zero as
, in some averaged
sense at least; similarly, (8) suggests that
also converges to zero. This suggests that
converges to a constant function; but as
is also supposed to decay to zero at spatial infinity, we thus expect solutions to the heat equation in
to decay to zero in some sense as
. However, the decay is only expected to be polynomial in nature rather than exponential; for instance, the solution (3) decays in the
norm like
.
Since , we also observe the basic cancellation property
.
There are other quantities relating to that also decrease in time under heat flow, particularly in the important case when
is a probability measure. In this case, it is natural to introduce the entropy
Thus, for instance, if is the uniform distribution on some measurable subset
of
of finite measure
, the entropy would be
. Intuitively, as the entropy decreases, the probability distribution gets wider and flatter. For instance, in the case of the fundamental solution (3), one has
for any
, reflecting the fact that
is approximately uniformly distributed on a ball of radius
(and thus of measure
).
A short formal computation shows (if one assumes for simplicity that is strictly positive, which is not an unreasonable hypothesis, particularly in view of the strong maximum principle) using (9), (5) that
where is the square root of
. For instance, if
is the fundamental solution (3), one can check that
(note that this is a significantly cleaner formula than (7)!).
In particular, the entropy is decreasing, which corresponds well to one’s intuition that the heat equation (or Brownian motion) should serve to spread out a probability distribution over time.
Actually, one can say more: the rate of decrease of the entropy is itself decreasing, or in other words the entropy is convex. I do not have a satisfactorily intuitive reason for this phenomenon, but it can be proved by straightforward application of basic several variable calculus tools (such as the chain rule, product rule, quotient rule, and integration by parts), and completing the square. Namely, by using the chain rule we have
, we see from (1) that
and thus (again assuming that , and hence
, is strictly positive to avoid technicalities)
We thus have
It is now convenient to compute using the Einstein summation convention to hide the summation over indices . We have
and
By integration by parts and interchanging partial derivatives, we may write the first integral as
and from the quotient and product rules, we may write the second integral as
Gathering terms, completing the square, and making the summations explicit again, we see that
and so in particular is always decreasing.
The above identity can also be written as
Exercise 1 Give an alternate proof of the above identity by writing
,
and deriving the equation
for
.
It was observed in a well known paper of Bakry and Emery that the above monotonicity properties hold for a much larger class of heat flow-type equations, and lead to a number of important relations between energy and entropy, such as the log-Sobolev inequality of Gross and of Federbush, and the hypercontractivity inequality of Nelson; we will discuss one such family of generalisations (or more precisely, variants) below the fold.
Van Vu and I have just uploaded to the arXiv our paper “Random matrices: Universality of local spectral statistics of non-Hermitian matrices“. The main result of this paper is a “Four Moment Theorem” that establishes universality for local spectral statistics of non-Hermitian matrices with independent entries, under the additional hypotheses that the entries of the matrix decay exponentially, and match moments with either the real or complex gaussian ensemble to fourth order. This is the non-Hermitian analogue of a long string of recent results establishing universality of local statistics in the Hermitian case (as discussed for instance in this recent survey of Van and myself, and also in several other places).
The complex case is somewhat easier to describe. Given a (non-Hermitian) random matrix ensemble of
matrices, one can arbitrarily enumerate the (geometric) eigenvalues as
, and one can then define the
-point correlation functions
to be the symmetric functions such that
In the case when is drawn from the complex gaussian ensemble, so that all the entries are independent complex gaussians of mean zero and variance one, it is a classical result of Ginibre that the asymptotics of
near some point
as
and
is fixed are given by the determinantal rule
and
for , where
is the reproducing kernel
(There is also an asymptotic for the boundary case , but it is more complicated to state.) In particular, we see that
for almost every
, which is a manifestation of the well-known circular law for these matrices; but the circular law only captures the macroscopic structure of the spectrum, whereas the asymptotic (1) describes the microscopic structure.
Our first main result is that the asymptotic (1) for also holds (in the sense of vague convergence) when
is a matrix whose entries are independent with mean zero, variance one, exponentially decaying tails, and which all match moments with the complex gaussian to fourth order. (Actually we prove a stronger result than this which is valid for all bounded
and has more uniform bounds, but is a bit more technical to state.) An analogous result is also established for real gaussians (but now one has to separate the correlation function into components depending on how many eigenvalues are real and how many are strictly complex; also, the limiting distribution is more complicated, being described by Pfaffians rather than determinants). Among other things, this allows us to partially extend some known results on complex or real gaussian ensembles to more general ensembles. For instance, there is a central limit theorem of Rider which establishes a central limit theorem for the number of eigenvalues of a complex gaussian matrix in a mesoscopic disk; from our results, we can extend this central limit theorem to matrices that match the complex gaussian ensemble to fourth order, provided that the disk is small enough (for technical reasons, our error bounds are not strong enough to handle large disks). Similarly, extending some results of Edelman-Kostlan-Shub and of Forrester-Nagao, we can show that for a matrix matching the real gaussian ensemble to fourth order, the number of real eigenvalues is
with probability
for some absolute constant
.
There are several steps involved in the proof. The first step is to apply the Girko Hermitisation trick to replace the problem of understanding the spectrum of a non-Hermitian matrix, with that of understanding the spectrum of various Hermitian matrices. The two identities that realise this trick are, firstly, Jensen’s formula
that relates the local distribution of eigenvalues to the log-determinants , and secondly the elementary identity
that relates the log-determinants of to the log-determinants of the Hermitian matrices
The main difficulty is then to obtain concentration and universality results for the Hermitian log-determinants . This turns out to be a task that is analogous to the task of obtaining concentration for Wigner matrices (as we did in this recent paper), as well as central limit theorems for log-determinants of Wigner matrices (as we did in this other recent paper). In both of these papers, the main idea was to use the Four Moment Theorem for Wigner matrices (which can now be proven relatively easily by a combination of the local semi-circular law and resolvent swapping methods), combined with (in the latter paper) a central limit theorem for the gaussian unitary ensemble (GUE). This latter task was achieved by using the convenient Trotter normal form to tridiagonalise a GUE matrix, which has the effect of revealing the determinant of that matrix as the solution to a certain linear stochastic difference equation, and one can analyse the distribution of that solution via such tools as the martingale central limit theorem.
The matrices are somewhat more complicated than Wigner matrices (for instance, the semi-circular law must be replaced by a distorted Marchenko-Pastur law), but the same general strategy works to obtain concentration and universality for their log-determinants. The main new difficulty that arises is that the analogue of the Trotter norm for gaussian random matrices is not tridiagonal, but rather Hessenberg (i.e. upper-triangular except for the lower diagonal). This ultimately has the effect of expressing the relevant determinant as the solution to a nonlinear stochastic difference equation, which is a bit trickier to solve for. Fortunately, it turns out that one only needs good lower bounds on the solution, as one can use the second moment method to upper bound the determinant and hence the log-determinant (following a classical computation of Turan). This simplifies the analysis on the equation somewhat.
While this result is the first local universality result in the category of random matrices with independent entries, there are still two limitations to the result which one would like to remove. The first is the moment matching hypotheses on the matrix. Very recently, one of the ingredients of our paper, namely the local circular law, was proved without moment matching hypotheses by Bourgade, Yau, and Yin (provided one stays away from the edge of the spectrum); however, as of this time of writing the other main ingredient – the universality of the log-determinant – still requires moment matching. (The standard tool for obtaining universality without moment matching hypotheses is the heat flow method (and more specifically, the local relaxation flow method), but the analogue of Dyson Brownian motion in the non-Hermitian setting appears to be somewhat intractible, being a coupled flow on both the eigenvalues and eigenvectors rather than just on the eigenvalues alone.)
Van Vu and I have just uploaded to the arXiv our paper Random matrices: Sharp concentration of eigenvalues, submitted to the Electronic Journal of Probability. As with many of our previous papers, this paper is concerned with the distribution of the eigenvalues of a random Wigner matrix
(such as a matrix drawn from the Gaussian Unitary Ensemble (GUE) or Gaussian Orthogonal Ensemble (GOE)). To simplify the discussion we shall mostly restrict attention to the bulk of the spectrum, i.e. to eigenvalues
with
for some fixed
, although analogues of most of the results below have also been obtained at the edge of the spectrum.
If we normalise the entries of the matrix to have mean zero and variance
, then in the asymptotic limit
, we have the Wigner semicircle law, which asserts that the eigenvalues are asymptotically distributed according to the semicircular distribution
, where
An essentially equivalent way of saying this is that for large , we expect the
eigenvalue
of
to stay close to the classical location
, defined by the formula
In particular, from the Wigner semicircle law it can be shown that asymptotically almost surely, one has
.
In the modern study of the spectrum of Wigner matrices (and in particular as a key tool in establishing universality results), it has become of interest to improve the error term in (1) as much as possible. A typical early result in this direction was by Bai, who used the Stieltjes transform method to obtain polynomial convergence rates of the shape for some absolute constant
; see also the subsequent papers of Alon-Krivelevich-Vu and of of Meckes, who were able to obtain such convergence rates (with exponentially high probability) by using concentration of measure tools, such as Talagrand’s inequality. On the other hand, in the case of the GUE ensemble it is known (by this paper of Gustavsson) that
has variance comparable to
in the bulk, so that the optimal error term in (1) should be about
. (One may think that if one wanted bounds on (1) that were uniform in
, one would need to enlarge the error term further, but this does not appear to be the case, due to strong correlations between the
; note for instance this recent result of Ben Arous and Bourgarde that the largest gap between eigenvalues in the bulk is typically of order
.)
A significant advance in this direction was achieved by Erdos, Schlein, and Yau in a series of papers where they used a combination of Stieltjes transform and concentration of measure methods to obtain local semicircle laws which showed, among other things, that one had asymptotics of the form
with exponentially high probability for intervals in the bulk that were as short as
for some
, where
is the number of eigenvalues. These asymptotics are consistent with a good error term in (1), and are already sufficient for many applications, but do not quite imply a strong concentration result for individual eigenvalues
(basically because they do not preclude long-range or “secular” shifts in the spectrum that involve large blocks of eigenvalues at mesoscopic scales). Nevertheless, this was rectified in a subsequent paper of Erdos, Yau, and Yin, which roughly speaking obtained a bound of the form
in the bulk with exponentially high probability, for Wigner matrices obeying some exponential decay conditions on the entries. This was achieved by a rather delicate high moment calculation, in which the contribution of the diagonal entries of the resolvent (whose average forms the Stieltjes transform) was shown to mostly cancel each other out.
As the GUE computations show, this concentration result is sharp up to the quasilogarithmic factor . The main result of this paper is to improve the concentration result to one more in line with the GUE case, namely
with exponentially high probability (see the paper for a more precise statement of results). The one catch is that an additional hypothesis is required, namely that the entries of the Wigner matrix have vanishing third moment. We also obtain similar results for the edge of the spectrum (but with a different scaling).
Our arguments are rather different from those of Erdos, Yau, and Yin, and thus provide an alternate approach to establishing eigenvalue concentration. The main tool is the Lindeberg exchange strategy, which is also used to prove the Four Moment Theorem (although we do not directly invoke the Four Moment Theorem in our analysis). The main novelty is that this exchange strategy is now used to establish large deviation estimates (i.e. exponentially small tail probabilities) rather than universality of the limiting distribution. Roughly speaking, the basic point is as follows. The Lindeberg exchange strategy seeks to compare a function of many independent random variables
with the same function
of a different set of random variables (which match moments with the original set of variables to some order, such as to second or fourth order) by exchanging the random variables one at a time. Typically, one tries to upper bound expressions such as
for various smooth test functions , by performing a Taylor expansion in the variable being swapped and taking advantage of the matching moment hypotheses. In previous implementations of this strategy,
was a bounded test function, which allowed one to get control of the bulk of the distribution of
, and in particular in controlling probabilities such as
for various thresholds and
, but did not give good control on the tail as the error terms tended to be polynomially decaying in
rather than exponentially decaying. However, it turns out that one can modify the exchange strategy to deal with moments such as
for various moderately large (e.g. of size comparable to
), obtaining results such as
after performing all the relevant exchanges. As such, one can then use large deviation estimates on to deduce large deviation estimates on
.
In this paper we also take advantage of a simplification, first noted by Erdos, Yau, and Yin, that Four Moment Theorems become somewhat easier to prove if one works with resolvents (and the closely related Stieltjes transform
) rather than with individual eigenvalues, as the Taylor expansion of resolvents are very simple (essentially being a Neumann series). The relationship between the Stieltjes transform and the location of individual eigenvalues can be seen by taking advantage of the identity
for any energy level , which can be verified from elementary calculus. (In practice, we would truncate
near zero and near infinity to avoid some divergences, but this is a minor technicality.) As such, a concentration result for the Stieltjes transform can be used to establish an analogous concentration result for the eigenvalue counting functions
, which in turn can be used to deduce concentration results for individual eigenvalues
by some basic combinatorial manipulations.
Van Vu and I have just uploaded to the arXiv our short survey article, “Random matrices: The Four Moment Theorem for Wigner ensembles“, submitted to the MSRI book series, as part of the proceedings on the MSRI semester program on random matrix theory from last year. This is a highly condensed version (at 17 pages) of a much longer survey (currently at about 48 pages, though not completely finished) that we are currently working on, devoted to the recent advances in understanding the universality phenomenon for spectral statistics of Wigner matrices. In this abridged version of the survey, we focus on a key tool in the subject, namely the Four Moment Theorem which roughly speaking asserts that the statistics of a Wigner matrix depend only on the first four moments of the entries. We give a sketch of proof of this theorem, and two sample applications: a central limit theorem for individual eigenvalues of a Wigner matrix (extending a result of Gustavsson in the case of GUE), and the verification of a conjecture of Wigner, Dyson, and Mehta on the universality of the asymptotic k-point correlation functions even for discrete ensembles (provided that we interpret convergence in the vague topology sense).
For reasons of space, this paper is very far from an exhaustive survey even of the narrow topic of universality for Wigner matrices, but should hopefully be an accessible entry point into the subject nevertheless.
Van Vu and I have just uploaded to the arXiv our paper “The Wigner-Dyson-Mehta bulk universality conjecture for Wigner matrices“, submitted to the Proceedings of the National Academy of Sciences. This short note concerns the convergence of the -point correlation functions of Wigner matrices in the bulk to the Dyson
-point functions, a statement conjectured by Wigner, Dyson, and Mehta. Thanks to the results of Erdös, Peche, Ramirez, Schlein, Vu, Yau, and myself, this conjecture has now been established for all Wigner matrices (assuming a finite moment condition on the entries), but only if one uses a quite weak notion of convergence, namely averaged vague convergence in which one averages in the energy parameter
. The main purpose of this note is to observe that by combining together existing results in the literature, one can improve the convergence to vague convergence (which is the natural notion of convergence in the discrete setting); and furthermore, if one assumes some regularity and decay conditions on the coefficient distribution, one can improve the convergence further to local
convergence.
More precisely, let be an
Wigner matrix – a random Hermitian matrix whose off-diagonal elements
for
are iid with mean zero and variance
(and whose diagonal elements also obey similar hypotheses, which we omit here). For simplicity, we also assume that the real and imaginary parts of
are also iid (as is the case for instance for the Gaussian Unitary Ensemble (GUE)). The eigenvalues
of such a matrix are known to be asymptotically distributed accordingly to the Wigner semicircular distribution
, where
In particular, this suggests that at any energy level in the bulk
of the spectrum, the average eigenvalue spacing should be about
. It is then natural to introduce the normalised
-point correlation function
for any distinct reals and
, where
is the event that there is an eigenvalue in each of the intervals
for each
. (This definition is valid when the Wigner ensemble is continuous; for discrete ensembles, one can define
instead in a distributional sense.)
The Wigner-Dyson-Mehta conjecture asserts that converges (in various senses) as
to the Dyson
-point function
where is the Dyson sine kernel. This conjecture was verified first for the GUE (with a quite strong notion of convergence, namely local uniform convergence) by Dyson, using an explicit formula for
in the GUE case due to Gaudin and Mehta. Later results of Johansson, Erdos-Ramirez-Schlein-Yau, Erdos-Peche-Ramirez-Schlein-Yau, and Vu and myself, extended these results to increasingly wider ranges of Wigner matrices, but in the context of either weak convergence (which means that
, compactly supported function
), or the slightly weaker notion of vague convergence (which is the same as weak convergence, except that the function
is also required to be continuous).
In a joint paper of Erdos, Ramirez, Schlein, Vu, Yau, and myself, we established the Wigner-Dyson-Mehta conjecture for all Wigner matrices (assuming only an exponential decay condition on the entries), but using a quite weak notion of convergence, namely averaged vague convergence, which allows for averaging in the energy parameter. Specifically, we showed that
Subsequently, Erdos, Schlein, and Yau introduced the powerful local relaxation flow method, which achieved a simpler proof of the same result which also generalised to other ensembles beyond the Wigner case. However, for technical reasons, this method was restricted to establishing averaged vague convergence only.
In the current paper, we show that by combining the argument of Erdos, Ramirez, Schlein, Vu, Yau, and myself with some more recent technical results, namely the relaxation of the exponential decay condition in the four moment theorem to a finite moment condition (established by Vu and myself) and a strong eigenvalue localisation bound of Erdos, Yau, and Yin, one can upgrade the averaged vague convergence to vague convergence, and handle all Wigner matrices that assume a finite moment condition. Vague convergence is the most natural notion of convergence for discrete random matrix ensembles; for such ensembles, the correlation function is a discrete measure, and so one does not expect convergence to a continuous limit in any stronger sense than the vague sense. Also, by carefully inspecting the earlier argument of Erdos, Peche, Ramirez, Schlein, and Yau, we were able to establish convergence in the stronger local sense once one assumed some regularity and positivity condition on the underlying coefficient distribution. These are somewhat modest and technical improvements over previous work on the Wigner-Dyson-Mehta conjecture, but they help to clarify and organise the profusion of results in this area, which are now reaching a fairly definitive form.
It may well be possible to go beyond local convergence in the case of smooth ensembles, for instance establishing local uniform convergence; this was recently accomplished in the
case by Maltsev and Schlein. Indeed one may optimistically expect to even have convergence in the local smooth topology, which would basically be the strongest convergence one could hope for.
I’ve just uploaded to the arXiv my paper “Outliers in the spectrum of iid matrices with bounded rank perturbations“, submitted to Probability Theory and Related Fields. This paper is concerned with outliers to the circular law for iid random matrices. Recall that if is an
matrix whose entries are iid complex random variables with mean zero and variance one, then the
complex eigenvalues of the normalised matrix
will almost surely be distributed according to the circular law distribution
in the limit
. (See these lecture notes for further discussion of this law.)
The circular law is also stable under bounded rank perturbations: if is a deterministic rank
matrix of polynomial size (i.e. of operator norm
), then the circular law also holds for
(this is proven in a paper of myself, Van Vu, and Manjunath Krisnhapur). In particular, the bulk of the eigenvalues (i.e.
of the
eigenvalues) will lie inside the unit disk
.
However, this leaves open the possibility for one or more outlier eigenvalues that lie significantly outside the unit disk; the arguments in the paper cited above give some upper bound on the number of such eigenvalues (of the form for some absolute constant
) but does not exclude them entirely. And indeed, numerical data shows that such outliers can exist for certain bounded rank perturbations.
In this paper, some results are given as to when outliers exist, and how they are distributed. The easiest case is of course when there is no bounded rank perturbation: . In that case, an old result of Bai and Yin and of Geman shows that the spectral radius of
is almost surely
, thus all eigenvalues will be contained in a
neighbourhood of the unit disk, and so there are no significant outliers. The proof is based on the moment method.
Now we consider a bounded rank perturbation which is nonzero, but which has a bounded operator norm:
. In this case, it turns out that the matrix
will have outliers if the deterministic component
has outliers. More specifically (and under the technical hypothesis that the entries of
have bounded fourth moment), if
is an eigenvalue of
with
, then (for
large enough),
will almost surely have an eigenvalue at
, and furthermore these will be the only outlier eigenvalues of
.
Thus, for instance, adding a bounded nilpotent low rank matrix to will not create any outliers, because the nilpotent matrix only has eigenvalues at zero. On the other hand, adding a bounded Hermitian low rank matrix will create outliers as soon as this matrix has an operator norm greater than
.
When I first thought about this problem (which was communicated to me by Larry Abbott), I believed that it was quite difficult, because I knew that the eigenvalues of non-Hermitian matrices were quite unstable with respect to general perturbations (as discussed in this previous blog post), and that there were no interlacing inequalities in this case to control bounded rank perturbations (as discussed in this post). However, as it turns out I had arrived at the wrong conclusion, especially in the exterior of the unit disk in which the resolvent is actually well controlled and so there is no pseudospectrum present to cause instability. This was pointed out to me by Alice Guionnet at an AIM workshop last week, after I had posed the above question during an open problems session. Furthermore, at the same workshop, Percy Deift emphasised the point that the basic determinantal identity
matrices
and
matrices
was a particularly useful identity in random matrix theory, as it converted problems about large (
) matrices into problems about small (
) matrices, which was particularly convenient in the regime when
and
was fixed. (Percy was speaking in the context of invariant ensembles, but the point is in fact more general than this.)
From this, it turned out to be a relatively simple manner to transform what appeared to be an intractable matrix problem into quite a well-behaved
matrix problem for bounded
. Specifically, suppose that
had rank
, so that one can factor
for some (deterministic)
matrix
and
matrix
. To find an eigenvalue
of
, one has to solve the characteristic polynomial equation
This is an determinantal equation, which looks difficult to control analytically. But we can manipulate it using (1). If we make the assumption that
is outside the spectrum of
(which we can do as long as
is well away from the unit disk, as the unperturbed matrix
has no outliers), we can divide by
to arrive at
Now we apply the crucial identity (1) to rearrange this as
The crucial point is that this is now an equation involving only a determinant, rather than an
one, and is thus much easier to solve. The situation is particularly simple for rank one perturbations
in which case the eigenvalue equation is now just a scalar equation
that involves what is basically a single coefficient of the resolvent . (It is also an instructive exercise to derive this eigenvalue equation directly, rather than through (1).) There is by now a very well-developed theory for how to control such coefficients (particularly for
in the exterior of the unit disk, in which case such basic tools as Neumann series work just fine); in particular, one has precise enough control on these coefficients to obtain the result on outliers mentioned above.
The same method can handle some other bounded rank perturbations. One basic example comes from looking at iid matrices with a non-zero mean and variance
; this can be modeled by
where
is the unit vector
. Here, the bounded rank perturbation
has a large operator norm (equal to
), so the previous result does not directly apply. Nevertheless, the self-adjoint nature of the perturbation has a stabilising effect, and I was able to show that there is still only one outlier, and that it is at the expected location of
.
If one moves away from the case of self-adjoint perturbations, though, the situation changes. Let us now consider a matrix of the form , where
is a randomised version of
, e.g.
, where the
are iid Bernoulli signs; such models were proposed recently by Rajan and Abbott as a model for neural networks in which some nodes are excitatory (and give columns with positive mean) and some are inhibitory (leading to columns with negative mean). Despite the superficial similarity with the previous example, the outlier behaviour is now quite different. Instead of having one extremely large outlier (of size
) at an essentially deterministic location, we now have a number of eigenvalues of size
, scattered according to a random process. Indeed, (in the case when the entries of
were real and bounded) I was able to show that the outlier point process converged (in the sense of converging
-point correlation functions) to the zeroes of a random Laurent series
where are iid real Gaussians. This is basically because the coefficients of the resolvent
have a Neumann series whose coefficients enjoy a central limit theorem.
On the other hand, as already observed numerically (and rigorously, in the gaussian case) by Rajan and Abbott, if one projects such matrices to have row sum zero, then the outliers all disappear. This can be explained by another appeal to (1); this projection amounts to right-multiplying by the projection matrix
to the zero-sum vectors. But by (1), the non-zero eigenvalues of the resulting matrix
are the same as those for
. Since
annihilates
, we thus see that in this case the bounded rank perturbation plays no role, and the question reduces to obtaining a circular law with no outliers for
. As it turns out, this can be done by invoking the machinery of Van Vu and myself that we used to prove the circular law for various random matrix models.
This week I am at the American Institute of Mathematics, as an organiser on a workshop on the universality phenomenon in random matrices. There have been a number of interesting discussions so far in this workshop. Percy Deift, in a lecture on universality for invariant ensembles, gave some applications of what he only half-jokingly termed “the most important identity in mathematics”, namely the formula
whenever are
and
matrices respectively (or more generally,
and
could be linear operators with sufficiently good spectral properties that make both sides equal). Note that the left-hand side is an
determinant, while the right-hand side is a
determinant; this formula is particularly useful when computing determinants of large matrices (or of operators), as one can often use it to transform such determinants into much smaller determinants. In particular, the asymptotic behaviour of
determinants as
can be converted via this formula to determinants of a fixed size (independent of
), which is often a more favourable situation to analyse. Unsurprisingly, this trick is particularly useful for understanding the asymptotic behaviour of determinantal processes.
There are many ways to prove the identity. One is to observe first that when are invertible square matrices of the same size, that
and
are conjugate to each other and thus clearly have the same determinant; a density argument then removes the invertibility hypothesis, and a padding-by-zeroes argument then extends the square case to the rectangular case. Another is to proceed via the spectral theorem, noting that
and
have the same non-zero eigenvalues.
By rescaling, one obtains the variant identity
which essentially relates the characteristic polynomial of with that of
. When
, a comparison of coefficients this already gives important basic identities such as
and
; when
is not equal to
, an inspection of the
coefficient similarly gives the Cauchy-Binet formula (which, incidentally, is also useful when performing computations on determinantal processes).
Thanks to this formula (and with a crucial insight of Alice Guionnet), I was able to solve a problem (on outliers for the circular law) that I had in the back of my mind for a few months, and initially posed to me by Larry Abbott; I hope to talk more about this in a future post.
Today, though, I wish to talk about another piece of mathematics that emerged from an afternoon of free-form discussion that we managed to schedule within the AIM workshop. Specifically, we hammered out a heuristic model of the mesoscopic structure of the eigenvalues of the
Gaussian Unitary Ensemble (GUE), where
is a large integer. As is well known, the probability density of these eigenvalues is given by the Ginebre distribution
where is Lebesgue measure on the Weyl chamber
,
is a constant, and the Hamiltonian
is given by the formula
At the macroscopic scale of , the eigenvalues
are distributed according to the Wigner semicircle law
Indeed, if one defines the classical location of the
eigenvalue to be the unique solution in
to the equation
then it is known that the random variable is quite close to
. Indeed, a result of Gustavsson shows that, in the bulk region when
for some fixed
,
is distributed asymptotically as a gaussian random variable with mean
and variance
. Note that from the semicircular law, the factor
is the mean eigenvalue spacing.
At the other extreme, at the microscopic scale of the mean eigenvalue spacing (which is comparable to in the bulk, but can be as large as
at the edge), the eigenvalues are asymptotically distributed with respect to a special determinantal point process, namely the Dyson sine process in the bulk (and the Airy process on the edge), as discussed in this previous post.
Here, I wish to discuss the mesoscopic structure of the eigenvalues, in which one involves scales that are intermediate between the microscopic scale and the macroscopic scale
, for instance in correlating the eigenvalues
and
in the regime
for some
. Here, there is a surprising phenomenon; there is quite a long-range correlation between such eigenvalues. The result of Gustavsson shows that both
and
behave asymptotically like gaussian random variables, but a further result from the same paper shows that the correlation between these two random variables is asymptotic to
(in the bulk, at least); thus, for instance, adjacent eigenvalues
and
are almost perfectly correlated (which makes sense, as their spacing is much less than either of their standard deviations), but that even very distant eigenvalues, such as
and
, have a correlation comparable to
. One way to get a sense of this is to look at the trace
This is also the sum of the diagonal entries of a GUE matrix, and is thus normally distributed with a variance of . In contrast, each of the
(in the bulk, at least) has a variance comparable to
. In order for these two facts to be consistent, the average correlation between pairs of eigenvalues then has to be of the order of
.
Below the fold, I give a heuristic way to see this correlation, based on Taylor expansion of the convex Hamiltonian around the minimum
, which gives a conceptual probabilistic model for the mesoscopic structure of the GUE eigenvalues. While this heuristic is in no way rigorous, it does seem to explain many of the features currently known or conjectured about GUE, and looks likely to extend also to other models.
Let be a large integer, and let
be the Gaussian Unitary Ensemble (GUE), i.e. the random Hermitian matrix with probability distribution
where is a Haar measure on Hermitian matrices and
is the normalisation constant required to make the distribution of unit mass. The eigenvalues
of this matrix are then a coupled family of
real random variables. For any
, we can define the
-point correlation function
to be the unique symmetric measure on
such that
A standard computation (given for instance in these lecture notes of mine) gives the Ginebre formula
for the -point correlation function, where
is another normalisation constant. Using Vandermonde determinants, one can rewrite this expression in determinantal form as
where the kernel is given by
where and
are the (
-normalised) Hermite polynomials (thus the
are an orthonormal family, with each
being a polynomial of degree
). Integrating out one or more of the variables, one is led to the Gaudin-Mehta formula
in the previous formula turns out to simply be equal to
.) Again, see these lecture notes for details.
The functions can be viewed as an orthonormal basis of eigenfunctions for the harmonic oscillator operator
indeed it is a classical fact that
As such, the kernel can be viewed as the integral kernel of the spectral projection operator
.
From (1) we see that the fine-scale structure of the eigenvalues of GUE are controlled by the asymptotics of as
. The two main asymptotics of interest are given by the following lemmas:
Lemma 1 (Asymptotics of
in the bulk) Let
, and let
be the semicircular law density at
. Then, we have
as
for any fixed
(removing the singularity at
in the usual manner).
Lemma 2 (Asymptotics of
at the edge) We have
as
for any fixed
, where
is the Airy function
and again removing the singularity at
in the usual manner.
The proof of these asymptotics usually proceeds via computing the asymptotics of Hermite polynomials, together with the Christoffel-Darboux formula; this is for instance the approach taken in the previous notes. However, there is a slightly different approach that is closer in spirit to the methods of semi-classical analysis, which was briefly mentioned in the previous notes but not elaborated upon. For sake of completeness, I am recording some notes on this approach here, although to focus on the main ideas I will not be completely rigorous in the derivation (ignoring issues such as convegence of integrals or of operators, or (removable) singularities in kernels caused by zeroes in the denominator).
The month of April has been designated as Mathematics Awareness Month by the major American mathematics organisations (the AMS, ASA, MAA, and SIAM). I was approached to write a popular mathematics article for April 2011 (the theme for that month is “Mathematics and Complexity”). While I have written a fair number of expository articles (including several on this blog) aimed at a mathematical audience, I actually have not had much experience writing articles at the popular mathematics level, and so I found this task to be remarkably difficult. At this level of exposition, one not only needs to explain the facts, but also to tell a story; I have experience in the former but not in the latter.
I decided to write on the topic of universality – the phenomenon that the macroscopic behaviour of a dynamical system can be largely independent of the precise microscopic structure. Below the fold is a first draft of the article; I would definitely welcome feedback and corrections. It does not yet have any pictures, but I plan to rectify that in the final draft. It also does not have a title, but this will be easy to address later. But perhaps the biggest thing lacking right now is a narrative “hook”; I don’t yet have any good ideas as to how to make the story of universality compelling to a lay audience. Any suggestions in this regard would be particularly appreciated.
I have not yet decided where I would try to publish this article; in fact, I might just publish it here on this blog (and eventually, in one of the blog book compilations).
Van Vu and I have just uploaded to the arXiv our paper “Random matrices: Localization of the eigenvalues and the necessity of four moments“, submitted to Probability Theory and Related Fields. This paper concerns the distribution of the eigenvalues
of a Wigner random matrix . More specifically, we consider
Hermitian random matrices whose entries have mean zero and variance one, with the upper-triangular portion of the matrix independent, with the diagonal elements iid, and the real and imaginary parts of the strictly upper-triangular portion of the matrix iid. For technical reasons we also assume that the distribution of the coefficients decays exponentially or better. Examples of Wigner matrices include the Gaussian Unitary Ensemble (GUE) and random symmetric complex Bernoulli matrices (which equal
on the diagonal, and
off the diagonal). The Gaussian Orthogonal Ensemble (GOE) is also an example once one makes the minor change of setting the diagonal entries to have variance two instead of one.
The most fundamental theorem about the distribution of these eigenvalues is the Wigner semi-circular law, which asserts that (almost surely) one has
(in the vague topology) where is the semicircular distribution. (See these lecture notes on this blog for more discusssion of this law.)
One can phrase this law in a number of equivalent ways. For instance, in the bulk region , one almost surely has
, where the classical location
of the (normalised)
eigenvalue
is defined by the formula
The bound (1) also holds in the edge case (by using the operator norm bound , due to Bai and Yin), but for sake of exposition we shall restriction attention here only to the bulk case.
From (1) we see that the semicircular law controls the eigenvalues at the coarse scale of . There has been a significant amount of work in the literature in obtaining control at finer scales, and in particular at the scale of the average eigenvalue spacing, which is of the order of
. For instance, we now have a universal limit theorem for the normalised eigenvalue spacing
in the bulk for all Wigner matrices, a result of Erdos, Ramirez, Schlein, Vu, Yau, and myself. One tool for this is the four moment theorem of Van and myself, which roughly speaking shows that the behaviour of the eigenvalues at the scale
(and even at the slightly finer scale of
for some absolute constant
) depends only on the first four moments of the matrix entries. There is also a slight variant, the three moment theorem, which asserts that the behaviour of the eigenvalues at the slightly coarser scale of
depends only on the first three moments of the matrix entries.
It is natural to ask whether these moment conditions are necessary. From the result of Erdos, Ramirez, Schlein, Vu, Yau, and myself, it is known that to control the eigenvalue spacing at the critical scale
, no knowledge of any moments beyond the second (i.e. beyond the mean and variance) are needed. So it is natural to conjecture that the same is true for the eigenvalues themselves.
The main result of this paper is to show that this is not the case; that at the critical scale , the distribution of eigenvalues
is sensitive to the fourth moment, and so the hypothesis of the four moment theorem cannot be relaxed.
Heuristically, the reason for this is easy to explain. One begins with an inspection of the expected fourth moment
A standard moment method computation shows that the right hand side is equal to
where is the fourth moment of the real part of the off-diagonal coefficients of
. In particular, a change in the fourth moment
by
leads to a change in the expression
by
. Thus, for a typical
, one expects
to shift by
; since
on the average, we thus expect
itself to shift by about
by the mean-value theorem.
To make this rigorous, one needs a sufficiently strong concentration of measure result for that keeps it close to its mean value. There are already a number of such results in the literature. For instance, Guionnet and Zeitouni showed that
was sharply concentrated around an interval of size
around
for any
(in the sense that the probability that one was outside this interval was exponentially small). In one of my papers with Van, we showed that
was also weakly concentrated around an interval of size
around
, in the sense that the probability that one was outside this interval was
for some absolute constant
. Finally, if one made an additional log-Sobolev hypothesis on the entries, it was shown by by Erdos, Yau, and Yin that the average variance of
as
varied from
to
was of the size of
for some absolute
.
As it turns out, the first two concentration results are not sufficient to justify the previous heuristic argument. The Erdos-Yau-Yin argument suffices, but requires a log-Sobolev hypothesis. In our paper, we argue differently, using the three moment theorem (together with the theory of the eigenvalues of GUE, which is extremely well developed) to show that the variance of each individual is
(without averaging in
). No log-Sobolev hypothesis is required, but instead we need to assume that the third moment of the coefficients vanishes (because we want to use the three moment theorem to compare the Wigner matrix to GUE, and the coefficients of the latter have a vanishing third moment). From this we are able to make the previous arguments rigorous, and show that the mean
is indeed sensitive to the fourth moment of the entries at the critical scale
.
One curious feature of the analysis is how differently the median and the mean of the eigenvalue react to the available technology. To control the global behaviour of the eigenvalues (after averaging in
), it is much more convenient to use the mean, and we have very precise control on global averages of these means thanks to the moment method. But to control local behaviour, it is the median which is much better controlled. For instance, we can localise the median of
to an interval of size
, but can only localise the mean to a much larger interval of size
. Ultimately, this is because with our current technology there is a possible exceptional event of probability as large as
for which all eigenvalues could deviate as far as
from their expected location, instead of their typical deviation of
. The reason for this is technical, coming from the fact that the four moment theorem method breaks down when two eigenvalues are very close together (less than
times the average eigenvalue spacing), and so one has to cut out this event, which occurs with a probability of the shape
. It may be possible to improve the four moment theorem proof to be less sensitive to eigenvalue near-collisions, in which case the above bounds are likely to improve.

Recent Comments