This week I am at the American Institute of Mathematics, as an organiser of a workshop on the universality phenomenon in random matrices. There have been a number of interesting discussions so far in this workshop. Percy Deift, in a lecture on universality for invariant ensembles, gave some applications of what he only half-jokingly termed “the most important identity in mathematics”, namely the formula
$$\det(1 + AB) = \det(1 + BA)$$
whenever $A, B$ are $n \times k$ and $k \times n$ matrices respectively (or more generally, $A$ and $B$ could be linear operators with sufficiently good spectral properties that make both sides well-defined). Note that the left-hand side is an $n \times n$ determinant, while the right-hand side is a $k \times k$ determinant. This formula is particularly useful when computing determinants of large matrices (or of operators), as one can often use it to transform such determinants into much smaller determinants. In particular, the asymptotic behaviour of $n \times n$ determinants as $n \to \infty$ can be converted via this formula to determinants of a fixed size (independent of $n$), which is often a more favourable situation to analyse. Unsurprisingly, this trick is particularly useful for understanding the asymptotic behaviour of determinantal processes.
There are many ways to prove the identity. One is to observe first that when $A, B$ are invertible square matrices of the same size, $1 + BA$ and $1 + AB$ are conjugate to each other (indeed $1 + BA = A^{-1}(1 + AB)A$) and thus clearly have the same determinant. A density argument then removes the invertibility hypothesis, and a padding-by-zeroes argument extends the square case to the rectangular case. Another is to proceed via the spectral theorem, noting that $AB$ and $BA$ have the same non-zero eigenvalues.
By rescaling, one obtains the variant identity
$$\det(z + AB) = z^{n-k} \det(z + BA)$$
which essentially relates the characteristic polynomial of $AB$ with that of $BA$. When $n = k$, a comparison of coefficients already gives important basic identities such as $\mathrm{tr}(AB) = \mathrm{tr}(BA)$ and $\det(AB) = \det(BA)$. When $n$ is not equal to $k$, an inspection of the $z^{n-k}$ coefficient similarly gives the Cauchy-Binet formula (which, incidentally, is also useful when performing computations on determinantal processes).
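These identities are easy to test in exact arithmetic. The minimal sketch below uses plain Python with `fractions.Fraction`, so every comparison is exact. The helpers `det`, `matmul` and `shift` are our own names, not a library API. On a random integer example with $n = 5$ and $k = 2$, it checks $\det(1+AB) = \det(1+BA)$, the rescaled variant, the trace identity, and the Cauchy-Binet formula $\det(BA) = \sum_{S} \det(A_{S,\cdot})\det(B_{\cdot,S})$, where $S$ runs over the $k$-element subsets of the $n$ row indices.

```python
from fractions import Fraction
from itertools import combinations
import random


def det(m):
    """Exact determinant of a square matrix (list of rows) by Gaussian elimination over Q."""
    a = [[Fraction(x) for x in row] for row in m]
    size = len(a)
    result = Fraction(1)
    for col in range(size):
        # find a nonzero pivot in this column
        pivot = next((r for r in range(col, size) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
    return result


def matmul(a, b):
    """Product of a (p x q) and a (q x r) matrix given as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def shift(m, z):
    """Return z * I + m for a square matrix m."""
    return [[m[i][j] + (z if i == j else 0) for j in range(len(m))] for i in range(len(m))]


random.seed(1)
n, k = 5, 2
A = [[random.randint(-3, 3) for _ in range(k)] for _ in range(n)]  # n x k
B = [[random.randint(-3, 3) for _ in range(n)] for _ in range(k)]  # k x n
AB, BA = matmul(A, B), matmul(B, A)  # n x n and k x k

# det(1 + AB), an n x n determinant, equals det(1 + BA), a k x k determinant
sylvester_ok = det(shift(AB, 1)) == det(shift(BA, 1))

# rescaled variant: det(z + AB) = z^(n-k) det(z + BA)
z = Fraction(3, 2)
variant_ok = det(shift(AB, z)) == z ** (n - k) * det(shift(BA, z))

# trace identity, and Cauchy-Binet: det(BA) = sum over k-subsets S of det(A[S, :]) det(B[:, S])
trace_ok = sum(AB[i][i] for i in range(n)) == sum(BA[i][i] for i in range(k))
cauchy_binet = sum(det([A[s] for s in S]) * det([[B[r][s] for s in S] for r in range(k)])
                   for S in combinations(range(n), k))
cauchy_binet_ok = det(BA) == cauchy_binet

print(sylvester_ok, variant_ok, trace_ok, cauchy_binet_ok)  # prints: True True True True
```

Exact rational arithmetic is used deliberately, so that equality (rather than closeness) of the two sides is what gets tested.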
Thanks to this formula (and with a crucial insight of Alice Guionnet), I was able to solve a problem (on outliers for the circular law) that I had had in the back of my mind for a few months, and that was initially posed to me by Larry Abbott; I hope to talk more about this in a future post.
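Today, though, I wish to talk about another piece of mathematics that emerged from an afternoon of free-form discussion that we managed to schedule within the AIM workshop. Specifically, we hammered out a heuristic model of the mesoscopic structure of the eigenvalues $\lambda_1 \leq \ldots \leq \lambda_n$ of the $n \times n$ Gaussian Unitary Ensemble (GUE), where $n$ is a large integer. As is well known, the probability density of these eigenvalues is given by the Ginibre distribution
$$\frac{1}{Z_n} e^{-H(\lambda)}\ d\lambda$$
where $d\lambda$ is Lebesgue measure on the Weyl chamber $\{(\lambda_1, \ldots, \lambda_n) \in \mathbb{R}^n: \lambda_1 \leq \ldots \leq \lambda_n\}$, $Z_n$ is a constant, and the Hamiltonian $H$ is given by the formula
$$H(\lambda_1, \ldots, \lambda_n) := \sum_{i=1}^n \frac{\lambda_i^2}{2} - 2 \sum_{1 \leq i < j \leq n} \log |\lambda_i - \lambda_j|.$$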
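At the macroscopic scale of $\sqrt{n}$, the eigenvalues are distributed according to the Wigner semicircle law, in the sense that the normalised eigenvalues $\lambda_i/\sqrt{n}$ have asymptotic density
$$\rho_{sc}(x)\ dx := \frac{1}{2\pi} (4 - x^2)_+^{1/2}\ dx.$$
More precisely, if one defines the classical location $\gamma_i^{cl}$ of the $i^{th}$ eigenvalue to be the unique solution in $[-2\sqrt{n}, 2\sqrt{n}]$ to the equation
$$\int_{-2}^{\gamma_i^{cl}/\sqrt{n}} \rho_{sc}(x)\ dx = \frac{i}{n},$$
then it is known that the random variable $\lambda_i$ is quite close to $\gamma_i^{cl}$. Indeed, a result of Gustavsson shows that, in the bulk region when $\epsilon n < i < (1-\epsilon) n$ for some fixed $\epsilon > 0$, $\lambda_i$ is distributed asymptotically as a gaussian random variable with mean $\gamma_i^{cl}$ and variance $\frac{\log n}{2\pi^2} \times \Big(\frac{1}{\sqrt{n} \rho_{sc}(\gamma_i^{cl}/\sqrt{n})}\Big)^2$. Note that from the semicircular law, the factor $\frac{1}{\sqrt{n} \rho_{sc}(\gamma_i^{cl}/\sqrt{n})}$ is the mean eigenvalue spacing.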
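To make the classical locations concrete, here is a small sketch that solves the defining equation for $\gamma_i^{cl}$ by bisection. It uses the closed form $\int_{-2}^{y} \rho_{sc} = \frac12 + \frac{y\sqrt{4-y^2}}{4\pi} + \frac{1}{\pi}\arcsin\frac{y}{2}$ and compares consecutive classical locations with the mean spacing $1/(\sqrt{n}\rho_{sc}(\gamma_i^{cl}/\sqrt{n}))$. All function names are ours. The `gustavsson_sd` helper only evaluates the asymptotic bulk standard deviation quoted above; it is not a statement about finite $n$.

```python
import math


def semicircle_density(y):
    """Wigner semicircle density rho_sc(y) = (1/2pi) sqrt(4 - y^2) on [-2, 2]."""
    return math.sqrt(max(4.0 - y * y, 0.0)) / (2.0 * math.pi)


def semicircle_cdf(y):
    """Closed-form integral of rho_sc from -2 to y."""
    y = min(max(y, -2.0), 2.0)
    return 0.5 + y * math.sqrt(4.0 - y * y) / (4.0 * math.pi) + math.asin(y / 2.0) / math.pi


def classical_location(i, n, iters=100):
    """gamma_i^cl = sqrt(n) * y, where y in [-2, 2] solves semicircle_cdf(y) = i / n (by bisection)."""
    lo, hi = -2.0, 2.0
    target = i / n
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if semicircle_cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    return math.sqrt(n) * 0.5 * (lo + hi)


def mean_spacing(x, n):
    """Predicted mean eigenvalue spacing 1 / (sqrt(n) rho_sc(x / sqrt(n))) near the location x."""
    return 1.0 / (math.sqrt(n) * semicircle_density(x / math.sqrt(n)))


def gustavsson_sd(i, n):
    """Asymptotic bulk standard deviation of lambda_i: sqrt(log n / (2 pi^2)) times the mean spacing."""
    return math.sqrt(math.log(n) / (2.0 * math.pi ** 2)) * mean_spacing(classical_location(i, n), n)


n = 1000
g500, g501 = classical_location(500, n), classical_location(501, n)
print(g500, (g501 - g500) / mean_spacing(g500, n), gustavsson_sd(500, n))
```

The spacing ratio printed should be very close to $1$ in the bulk, because $d\gamma_i^{cl}/di = 1/(\sqrt{n}\rho_{sc}(\gamma_i^{cl}/\sqrt{n}))$.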
At the other extreme, at the microscopic scale of the mean eigenvalue spacing (which is comparable to $1/\sqrt{n}$ in the bulk, but can be as large as $n^{-1/6}$ at the edge), the eigenvalues are asymptotically distributed with respect to a special determinantal point process, namely the Dyson sine process in the bulk (and the Airy process at the edge), as discussed in a previous post.
Here, I wish to discuss the mesoscopic structure of the eigenvalues, which involves scales that are intermediate between the microscopic scale $1/\sqrt{n}$ and the macroscopic scale $\sqrt{n}$, for instance in correlating the eigenvalues $\lambda_i$ and $\lambda_j$ in the regime $|i - j| \sim n^{\alpha}$ for some $0 < \alpha < 1$. In this regime there is a surprising phenomenon: there is quite a long-range correlation between such eigenvalues. The result of Gustavsson shows that both $\lambda_i$ and $\lambda_j$ behave asymptotically like gaussian random variables, but a further result from the same paper shows that the correlation between these two random variables is asymptotic to $1 - \alpha$ (in the bulk, at least). Thus, for instance, adjacent eigenvalues $\lambda_{i+1}$ and $\lambda_i$ are almost perfectly correlated (which makes sense, as their spacing is much less than either of their standard deviations), but even very distant eigenvalues, such as $\lambda_{n/4}$ and $\lambda_{3n/4}$, have a correlation comparable to $1/\log n$. One way to get a sense of this is to look at the trace
$$\lambda_1 + \ldots + \lambda_n.$$
This is also the sum of the diagonal entries of a GUE matrix, and is thus normally distributed with a variance of $n$. In contrast, each of the $\lambda_i$ (in the bulk, at least) has a variance comparable to $\log n / n$. For these two facts to be consistent, the average correlation between pairs of eigenvalues has to be of the order of $1/\log n$. Indeed, $n = \mathrm{Var}(\lambda_1 + \ldots + \lambda_n) = \sum_{i,j} \mathrm{Cov}(\lambda_i, \lambda_j)$ is a sum of $n^2$ covariances, each equal to the correlation between $\lambda_i$ and $\lambda_j$ times a factor comparable to $\log n / n$.
Below the fold, I give a heuristic way to see this correlation, based on Taylor expansion of the convex Hamiltonian $H$ around the minimum $\gamma$, which gives a conceptual probabilistic model for the mesoscopic structure of the GUE eigenvalues. While this heuristic is in no way rigorous, it does seem to explain many of the features currently known or conjectured about GUE, and looks likely to extend also to other models.
— 1. Fekete points —
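It is easy to see that the Hamiltonian $H(\lambda)$ is strictly convex in the Weyl chamber, and goes to infinity on the boundary of this chamber (and at infinity), so it must have a unique minimum, at a set of points $\gamma = (\gamma_1, \ldots, \gamma_n)$ known as the Fekete points. At the minimum, we have $\nabla H(\gamma) = 0$, which expands to become the set of conditions
$$\gamma_i - 2 \sum_{j \neq i} \frac{1}{\gamma_i - \gamma_j} = 0 \quad (1)$$
for all $1 \leq i \leq n$. To solve these conditions, we introduce the monic degree $n$ polynomial
$$P(x) := \prod_{i=1}^n (x - \gamma_i).$$
Differentiating this polynomial, we observe that
$$P'(x) = P(x) \sum_{i=1}^n \frac{1}{x - \gamma_i} \quad (2)$$
and
$$P''(x) = P(x) \sum_{1 \leq i, j \leq n: i \neq j} \frac{1}{(x - \gamma_i)(x - \gamma_j)}.$$
Using the identity
$$\frac{1}{(x - \gamma_i)(x - \gamma_j)} = \frac{1}{(x - \gamma_i)(\gamma_i - \gamma_j)} + \frac{1}{(x - \gamma_j)(\gamma_j - \gamma_i)}$$
followed by (1), we can rearrange this as
$$P''(x) = P(x) \sum_{i=1}^n \frac{\gamma_i}{x - \gamma_i}.$$
Writing $\frac{\gamma_i}{x - \gamma_i} = \frac{x}{x - \gamma_i} - 1$ and comparing this with (2), we conclude that
$$P''(x) = x P'(x) - n P(x),$$
or in other words (since this equation has only one monic polynomial solution of degree $n$) that $P$ is the Hermite polynomial
$$P(x) = He_n(x) := (-1)^n e^{x^2/2} \frac{d^n}{dx^n} e^{-x^2/2}.$$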
Thus the Fekete points are nothing more than the zeroes of the Hermite polynomial.
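The step from $P'' = xP' - nP$ to the Hermite polynomial can be checked mechanically. Comparing coefficients of $x^d$ gives $p_d = \frac{(d+2)(d+1)}{d-n} p_{d+2}$, so the monic solution is unique. It must therefore agree with the polynomial produced by the standard three-term recurrence $He_{k+1}(x) = x He_k(x) - k He_{k-1}(x)$ for the (probabilists') Hermite polynomials. Here is a minimal sketch in exact arithmetic (function names are ours):

```python
from fractions import Fraction


def hermite_he_coeffs(n):
    """Integer coefficients (lowest degree first) of the monic Hermite polynomial He_n,
    from the recurrence He_{k+1}(x) = x He_k(x) - k He_{k-1}(x)."""
    if n == 0:
        return [1]
    prev, cur = [1], [0, 1]  # He_0 = 1, He_1 = x
    for k in range(1, n):
        nxt = [0] + cur  # x * He_k
        for d, coeff in enumerate(prev):
            nxt[d] -= k * coeff  # - k He_{k-1}
        prev, cur = cur, nxt
    return cur


def monic_solution(n):
    """The unique monic degree-n polynomial solving P'' = x P' - n P.

    Matching coefficients of x^d gives (d+2)(d+1) p_{d+2} - d p_d + n p_d = 0,
    so p_d = (d+2)(d+1) p_{d+2} / (d - n), starting from p_n = 1 and p_{n+1} = p_{n+2} = 0.
    """
    p = [Fraction(0)] * (n + 3)
    p[n] = Fraction(1)
    for d in range(n - 1, -1, -1):
        p[d] = (d + 2) * (d + 1) * p[d + 2] / (d - n)
    return p[: n + 1]


print(hermite_he_coeffs(4))  # prints [3, 0, -6, 0, 1]
print(all(monic_solution(n) == hermite_he_coeffs(n) for n in range(15)))  # prints True
```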
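Heuristically, one can study these zeroes by looking at the function
$$\phi(x) := P(x) e^{-x^2/4}$$
which solves the eigenfunction equation
$$\phi''(x) + \Big(n + \frac{1}{2} - \frac{x^2}{4}\Big) \phi(x) = 0.$$
Comparing this equation with the harmonic oscillator equation $\phi''(x) + k^2 \phi(x) = 0$, which has plane wave solutions $\phi(x) = A \cos(kx + \theta)$ for $k^2$ positive and exponentially decaying solutions for $k^2$ negative, we are led (heuristically, at least) to conclude that $\phi$ is concentrated in the region where $n + \frac{1}{2} - \frac{x^2}{4}$ is positive (i.e. inside the interval $[-2\sqrt{n + 1/2}, 2\sqrt{n + 1/2}]$) and will oscillate at frequency roughly $\sqrt{n + \frac{1}{2} - \frac{x^2}{4}}$ inside this region. The zeroes of $\phi$ are then spaced roughly $\pi/\sqrt{n + \frac12 - \frac{x^2}{4}} \approx \frac{1}{\sqrt{n}\rho_{sc}(x/\sqrt{n})}$ apart, so we expect the Fekete points $\gamma_i$ to obey the same spacing law as the classical locations $\gamma_i^{cl}$. Indeed it is possible to show that $\gamma_i = \gamma_i^{cl} + O(1/\sqrt{n})$ in the bulk (with some standard modifications at the edge). In particular, we have the heuristic
$$\gamma_i - \gamma_j \sim \frac{i - j}{\sqrt{n}} \quad (3)$$
for $i, j$ in the bulk.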
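The following is a numerical sanity check of (1) and of the spacing heuristic. The sketch locates the zeros of $He_n$ using the three-term recurrence, a sign-change scan on a grid, and bisection. It then verifies the Fekete conditions (1) at those zeros and compares the gaps in the bulk with the semicircle prediction $1/(\sqrt{n}\rho_{sc}(x/\sqrt{n})) = 2\pi/\sqrt{4n - x^2}$. The grid step and the choice $n = 100$ are arbitrary. The scan window relies on the classical bound that all zeros of $He_n$ lie in $(-\sqrt{4n+2}, \sqrt{4n+2})$.

```python
import math


def hermite_he(n, x):
    """Evaluate the monic Hermite polynomial He_n(x) by He_{k+1} = x He_k - k He_{k-1}."""
    if n == 0:
        return 1.0
    prev, cur = 1.0, x
    for k in range(1, n):
        prev, cur = cur, x * cur - k * prev
    return cur


def hermite_zeros(n, step=0.01):
    """Zeros of He_n, bracketed by sign changes on a grid and refined by bisection."""
    bound = 2.0 * math.sqrt(n) + 1.0  # exceeds sqrt(4n + 2), so every zero lies inside
    zeros = []
    a, fa = -bound, hermite_he(n, -bound)
    while a < bound:
        b = a + step
        fb = hermite_he(n, b)
        if fa == 0.0:
            zeros.append(a)
        elif fb != 0.0 and (fa > 0.0) != (fb > 0.0):
            lo, hi, flo = a, b, fa
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                fm = hermite_he(n, mid)
                if (fm > 0.0) == (flo > 0.0):
                    lo, flo = mid, fm
                else:
                    hi = mid
            zeros.append(0.5 * (lo + hi))
        a, fa = b, fb
    return zeros


def fekete_residual(gamma):
    """max_i |gamma_i - 2 sum_{j != i} 1/(gamma_i - gamma_j)|, which vanishes at the Fekete points."""
    return max(abs(gi - 2.0 * sum(1.0 / (gi - gj) for j, gj in enumerate(gamma) if j != i))
               for i, gi in enumerate(gamma))


n = 100
gamma = hermite_zeros(n)
# ratio of each bulk gap to the predicted mean spacing 2 pi / sqrt(4n - x^2) at its midpoint x
ratios = [(gamma[i + 1] - gamma[i]) * math.sqrt(4 * n - ((gamma[i] + gamma[i + 1]) / 2) ** 2) / (2 * math.pi)
          for i in range(n // 4, 3 * n // 4)]
print(len(gamma), fekete_residual(gamma), min(ratios), max(ratios))
```

The grid step only needs to be smaller than the minimal gap between zeros (about $\pi/\sqrt{n}$ near the origin), so that each grid cell contains at most one zero.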
Remark 1 If one works with the circular unitary ensemble (CUE) instead of the GUE, the Fekete points become equally spaced around the unit circle, so that this heuristic essentially becomes exact.
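For the circular case, one can check directly that equally spaced points are critical. Assume the standard CUE Hamiltonian $H(\theta) = -2\sum_{j<k} \log|e^{i\theta_j} - e^{i\theta_k}|$ (this form is not spelled out above). The identity $|e^{ia} - e^{ib}| = 2|\sin\frac{a-b}{2}|$ then gives $\partial H/\partial\theta_j = -\sum_{k \neq j} \cot\frac{\theta_j - \theta_k}{2}$. For $\theta_k = 2\pi k/n$ this becomes $-\sum_{m=1}^{n-1}\cot\frac{\pi m}{n}$, which vanishes by the symmetry $m \mapsto n - m$. The sketch below confirms this numerically:

```python
import math


def cue_gradient(theta):
    """Gradient of H(theta) = -2 sum_{j<k} log|e^{i theta_j} - e^{i theta_k}|.

    Since |e^{ia} - e^{ib}| = 2|sin((a - b)/2)|, dH/dtheta_j = -sum_{k != j} cot((theta_j - theta_k)/2).
    """
    n = len(theta)
    return [-sum(1.0 / math.tan((theta[j] - theta[k]) / 2.0) for k in range(n) if k != j)
            for j in range(n)]


n = 12
equally_spaced = [2 * math.pi * k / n for k in range(n)]
perturbed = equally_spaced[:]
perturbed[0] += 0.1  # move one point off the equally spaced configuration
print(max(map(abs, cue_gradient(equally_spaced))), max(map(abs, cue_gradient(perturbed))))
```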
— 2. Taylor expansion —
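Now we expand around the Fekete points by making the ansatz
$$\lambda_i = \gamma_i + x_i,$$
thus Gustavsson’s result predicts that each $x_i$ is normally distributed with standard deviation $O(\sqrt{\log n}/\sqrt{n})$ (in the bulk). We Taylor expand
$$H(\lambda) = H(\gamma) + \nabla H(\gamma)(x) + \frac{1}{2} \nabla^2 H(\gamma)(x, x) + \ldots.$$
We heuristically drop the cubic and higher order terms. The constant term $H(\gamma)$ can be absorbed into the partition constant $Z_n$, while the linear term vanishes by the property $\nabla H(\gamma) = 0$ of the Fekete points. We are thus led to a quadratic (i.e. gaussian) model
$$\frac{1}{Z'_n} e^{-\frac{1}{2} \nabla^2 H(\gamma)(x, x)}\ dx \quad (4)$$
for the probability distribution of the shifts $x = (x_1, \ldots, x_n)$, where $Z'_n$ is the appropriate normalisation constant.
Direct computation allows us to expand the quadratic form as
$$\frac{1}{2} \nabla^2 H(\gamma)(x, x) = \frac{1}{2} \sum_{i=1}^n x_i^2 + \sum_{1 \leq i < j \leq n} \frac{(x_i - x_j)^2}{(\gamma_i - \gamma_j)^2}.$$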
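The "direct computation" of the quadratic form is easy to confirm numerically. In the sketch below, the sample point $\lambda$ and direction $x$ are arbitrary. A central second difference of $H$ along $x$ at $\lambda$ is compared with $2\big(\frac12\sum_i x_i^2 + \sum_{i<j}\frac{(x_i-x_j)^2}{(\lambda_i-\lambda_j)^2}\big)$. The Hessian formula holds at any point of the Weyl chamber, not only at the Fekete points, since each term $-2\log|\lambda_i-\lambda_j|$ has second derivative $2(x_i-x_j)^2/(\lambda_i-\lambda_j)^2$ along $x$.

```python
import math


def hamiltonian(lam):
    """H(lambda) = sum_i lambda_i^2 / 2 - 2 sum_{i<j} log|lambda_i - lambda_j|."""
    n = len(lam)
    h = sum(l * l / 2.0 for l in lam)
    for i in range(n):
        for j in range(i + 1, n):
            h -= 2.0 * math.log(abs(lam[i] - lam[j]))
    return h


def quadratic_form(lam, x):
    """Claimed value of (1/2) Hess H(lambda)(x, x)."""
    n = len(lam)
    q = 0.5 * sum(v * v for v in x)
    for i in range(n):
        for j in range(i + 1, n):
            q += (x[i] - x[j]) ** 2 / (lam[i] - lam[j]) ** 2
    return q


def second_directional_derivative(f, lam, x, h=1e-3):
    """Central difference (f(lam + h x) - 2 f(lam) + f(lam - h x)) / h^2 = Hess f(x, x) + O(h^2)."""
    plus = [l + h * v for l, v in zip(lam, x)]
    minus = [l - h * v for l, v in zip(lam, x)]
    return (f(plus) - 2.0 * f(lam) + f(minus)) / h ** 2


lam = [-3.0, -1.2, 0.5, 2.0, 3.7]  # an arbitrary point of the Weyl chamber
x = [0.3, -0.1, 0.4, 0.2, -0.5]    # an arbitrary direction
numeric = 0.5 * second_directional_derivative(hamiltonian, lam, x)
exact = quadratic_form(lam, x)
print(numeric, exact)
```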
The Taylor expansion is not particularly accurate when $i$ and $j$ are too close (so that $\gamma_i - \gamma_j$ is not much larger than the fluctuation $x_i - x_j$), but we will ignore this issue as it should only affect the microscopic behaviour rather than the mesoscopic behaviour. This models the $x_i$ as (coupled) gaussian random variables whose covariance matrix can in principle be explicitly computed by inverting the matrix of the quadratic form. Instead of doing this precisely, we shall work heuristically (and somewhat inaccurately) by re-expressing the quadratic form in the Haar basis. For simplicity, let us assume that $n$ is a power of $2$. Then the Haar basis consists of the basis vector
$$\psi_0 := \frac{1}{\sqrt{n}} (1, \ldots, 1)$$
together with the basis vectors
$$\psi_I := \frac{1}{\sqrt{|I|}} (1_{I_l} - 1_{I_r})$$
for every discrete dyadic interval $I \subset \{1, \ldots, n\}$ of length between $2$ and $n$, where $I_l$ and $I_r$ are the left and right halves of $I$, and $1_{I_l}, 1_{I_r} \in \mathbb{R}^n$ are the vectors that are one on $I_l$, $I_r$ respectively and zero elsewhere. These form an orthonormal basis of $\mathbb{R}^n$, thus we can write
$$x = \xi_0 \psi_0 + \sum_I \xi_I \psi_I$$
for some coefficients $\xi_0, \xi_I$.
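The Haar basis just described can be built and checked in a few lines. This sketch takes $n = 16$ and uses indices starting at $0$; the helper names are ours. It verifies orthonormality, recovers a random vector $x$ from its coefficients $\xi_0 = \langle x, \psi_0\rangle$ and $\xi_I = \langle x, \psi_I\rangle$, and checks the Parseval identity $\sum_i x_i^2 = \xi_0^2 + \sum_I \xi_I^2$.

```python
import math
import random


def haar_basis(n):
    """Orthonormal Haar basis of R^n (n a power of 2), as a list of (label, vector) pairs.

    psi_0 is the normalised constant vector; psi_I = |I|^{-1/2} (1_{I_l} - 1_{I_r}) for each
    dyadic interval I = [start, start + length) with length 2, 4, ..., n.
    """
    assert n >= 2 and n & (n - 1) == 0, "n must be a power of 2"
    basis = [("0", [1.0 / math.sqrt(n)] * n)]
    length = 2
    while length <= n:
        s = 1.0 / math.sqrt(length)
        for start in range(0, n, length):
            v = [0.0] * n
            for i in range(start, start + length):
                v[i] = s if i < start + length // 2 else -s  # +s on I_l, -s on I_r
            basis.append(((start, length), v))
        length *= 2
    return basis


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


n = 16
basis = haar_basis(n)
gram_error = max(abs(dot(u, v) - (1.0 if p == q else 0.0))
                 for p, (_, u) in enumerate(basis) for q, (_, v) in enumerate(basis))

random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(n)]
xi = [dot(x, v) for _, v in basis]  # coefficients xi_0, xi_I
x_rec = [sum(c * v[i] for c, (_, v) in zip(xi, basis)) for i in range(n)]
print(len(basis), gram_error, max(abs(a - b) for a, b in zip(x, x_rec)))
```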
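From orthonormality we have
$$\sum_{i=1}^n x_i^2 = \xi_0^2 + \sum_I \xi_I^2$$
and we have
$$\sum_{1 \leq i < j \leq n} \frac{(x_i - x_j)^2}{(\gamma_i - \gamma_j)^2} = \sum_{I, J} m_{I,J} \xi_I \xi_J$$
where the matrix coefficients $m_{I,J}$ are given by
$$m_{I,J} := \sum_{1 \leq i < j \leq n} \frac{(\psi_I(i) - \psi_I(j))(\psi_J(i) - \psi_J(j))}{(\gamma_i - \gamma_j)^2}$$
(the constant vector $\psi_0$ does not contribute, as all of its differences $\psi_0(i) - \psi_0(j)$ vanish).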
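The expansion of the interaction term in Haar coordinates can also be checked numerically. The sketch below assumes exactly equally spaced points $\gamma_i = i/\sqrt{n}$, a hypothetical idealisation of (3), and computes the matrix $m_{I,J}$ for $n = 32$. It verifies $\sum_{i<j}\frac{(x_i-x_j)^2}{(\gamma_i-\gamma_j)^2} = \sum_{I,J} m_{I,J}\xi_I\xi_J$ for a random $x$. It also prints the rescaled diagonal $m_{I,I}|I|/n$ averaged over each scale, together with the largest normalised off-diagonal entry $|m_{I,J}|/\sqrt{m_{I,I}m_{J,J}}$, so that one can see how far the Haar wavelets are from diagonalising the form.

```python
import math
import random


def haar_vectors(n):
    """Pairs (|I|, psi_I) with psi_I = |I|^{-1/2}(1_{I_l} - 1_{I_r}) for dyadic I of length 2..n."""
    vecs = []
    length = 2
    while length <= n:
        s = 1.0 / math.sqrt(length)
        for start in range(0, n, length):
            v = [0.0] * n
            for i in range(start, start + length):
                v[i] = s if i < start + length // 2 else -s
            vecs.append((length, v))
        length *= 2
    return vecs


def interaction(gamma, u, v):
    """Bilinear form sum_{i<j} (u_i - u_j)(v_i - v_j) / (gamma_i - gamma_j)^2."""
    n = len(gamma)
    return sum((u[i] - u[j]) * (v[i] - v[j]) / (gamma[i] - gamma[j]) ** 2
               for i in range(n) for j in range(i + 1, n))


n = 32
gamma = [i / math.sqrt(n) for i in range(n)]  # hypothetical idealised Fekete points
psi = haar_vectors(n)
m = [[interaction(gamma, u, v) for _, v in psi] for _, u in psi]  # m_{I,J}

random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(n)]
xi = [sum(a * b for a, b in zip(x, v)) for _, v in psi]  # xi_I (xi_0 drops out of the form)
lhs = interaction(gamma, x, x)
rhs = sum(m[p][q] * xi[p] * xi[q] for p in range(len(psi)) for q in range(len(psi)))

for length in sorted({L for L, _ in psi}):
    diag = [m[p][p] * length / n for p, (L, _) in enumerate(psi) if L == length]
    print(length, sum(diag) / len(diag))
off = max(abs(m[p][q]) / math.sqrt(m[p][p] * m[q][q])
          for p in range(len(psi)) for q in range(len(psi)) if p != q)
print(lhs, rhs, off)
```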
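A standard heuristic wavelet computation using (3) suggests that $m_{I,J}$ is small unless $I$ and $J$ are actually equal, in which case one has
$$m_{I,I} \sim \frac{n}{|I|}$$
(in the bulk, at least). Actually, the decay of the $m_{I,J}$ away from the diagonal is not so large, because the Haar wavelets have poor moment and regularity properties. But one could in principle use much smoother and much more balanced wavelets, in which case the decay should be much faster. Heuristically, then, the model (4) becomes the decoupled gaussian model
$$\frac{1}{Z''_n} \exp\Big(-\frac{1}{2} \xi_0^2 - c \sum_I \frac{n}{|I|} \xi_I^2\Big)\ d\xi$$
for some absolute constant $c > 0$ (the terms $\frac{1}{2}\xi_I^2$ coming from $\frac{1}{2}\sum_i x_i^2$ being absorbed, since $n/|I| \geq 1$). Thus we may model $\xi_0 = g_0$ and $\xi_I = \sqrt{\frac{|I|}{2cn}}\, g_I$ for some iid standard gaussians $g_0, g_I$ independent of $n$. We then have as a model
$$x_i = \frac{g_0}{\sqrt{n}} + \frac{1}{\sqrt{2cn}} \sum_{I \ni i} \big(1_{I_l}(i) - 1_{I_r}(i)\big) g_I \quad (5)$$
for the fluctuations $x_i$ themselves. This model does not capture the microscopic behaviour of the eigenvalues such as the sine kernel (indeed, as noted before, the contribution of the very short intervals $I$, which correspond to very small values of $|j - i|$, is inaccurate), but appears to be a good model to describe the mesoscopic behaviour. For instance, observe that for each $i$ there are $\log_2 n$ independent normalised gaussians in the above sum, and so this model is consistent with the result of Gustavsson that each $x_i$ is gaussian with standard deviation $\sim \sqrt{\log n}/\sqrt{n}$. Also, if $|i - j| \sim n^{\alpha}$, then the expansions (5) of $x_i, x_j$ share about $(1 - \alpha)\log_2 n$ of the $\log_2 n$ terms in the sum, which is consistent with the further result of Gustavsson that the correlation between such eigenvalues is comparable to $1 - \alpha$.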
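Under the model (5), covariances can be computed exactly by counting shared dyadic intervals. The sketch below does this for $n = 2^m$, with indices starting at $0$. Purely for illustration, it takes the unspecified absolute constant to be $c = 1/2$, so that (5) reads $x_i = (g_0 + \sum_{I \ni i} \pm g_I)/\sqrt{n}$. With $i = 0$ and $j = 2^k$, the two expansions share $m - k$ intervals, the smallest of which separates $i$ from $j$. This gives correlation $\frac{m-k-1}{m+1}$, which tends to $1 - \alpha$ when $k = \alpha m$ and $m \to \infty$.

```python
import math


def model_cov(i, j, m, c=0.5):
    """Cov(x_i, x_j) under the heuristic model (5) with n = 2**m:

    x_i = g_0 / sqrt(n) + (2 c n)^(-1/2) * sum_{I containing i} s_I(i) g_I,
    where s_I(i) = +1 on the left half of I and -1 on the right half, and g_0, g_I are iid N(0, 1).
    """
    n = 2 ** m
    total = 1.0  # contribution of g_0
    for k in range(1, m + 1):  # dyadic intervals of length 2, 4, ..., n
        length = 2 ** k
        start = (i // length) * length
        if start == (j // length) * length:  # i and j lie in the same interval I, so g_I is shared
            half = start + length // 2
            s_i = 1 if i < half else -1
            s_j = 1 if j < half else -1
            total += s_i * s_j / (2 * c)
    return total / n


def model_corr(i, j, m, c=0.5):
    return model_cov(i, j, m, c) / math.sqrt(model_cov(i, i, m, c) * model_cov(j, j, m, c))


m = 20
for k in (0, 5, 10, 15):
    print(k, model_corr(0, 2 ** k, m), 1 - k / m)  # model correlation vs the limit 1 - alpha
```

The finite-$m$ values lag the limit $1 - k/m$ by $O(1/m)$, which is the kind of $1/\log n$ correction visible in the trace computation earlier.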
If one looks at the gap $\lambda_{i+1} - \lambda_i$ using (5) (and replacing the Haar cutoff by something smoother for the purposes of computing the gap), one is led to a heuristic of the form
$$\lambda_{i+1} - \lambda_i \approx \frac{1}{\sqrt{n} \rho_{sc}(\gamma_i/\sqrt{n})} + \sum_{I \ni i} O\Big(\frac{1}{\sqrt{n} |I|}\Big) g_I.$$
The dominant terms here are the first term and the contribution of the very short intervals $I$. At present, this model cannot be accurate, because it predicts that the gap can sometimes be negative. The contribution of the very short intervals must instead be replaced by some other model that gives sine process behaviour, but we do not know of an easy way to set up a plausible such model.
On the other hand, the model suggests that the gaps are largely decoupled from each other, and have gaussian tails. Standard heuristics (the maximum of $N$ roughly independent gaussians is of size about $\sqrt{2 \log N}$) then suggest that of the $\sim n$ gaps in the bulk, the largest one should be comparable to $\sqrt{\log n}/\sqrt{n}$, which was indeed established recently by Ben Arous and Bourgade.
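The gap heuristic and the "standard heuristics" for the largest gap can be illustrated with a toy simulation. The sketch is a hypothetical caricature of the model above, not the model itself, and works in units of the mean spacing. Each gap is $1 + \kappa \sum_{k=1}^{m} g_{I_k(i)}/2^k$, where $I_k(i)$ is the dyadic interval of length $2^k$ containing $i$, the $g_I$ are iid standard gaussians, and $\kappa = 1$ is an arbitrary choice. It exhibits both the defect and the feature discussed above. A few percent of the gaps come out negative. The largest of the $n = 2^{16}$ gaps is of the order of $1 + \sigma\sqrt{2\log n}$, where $\sigma^2 = \kappa^2\sum_k 4^{-k}$, i.e. it grows like $\sqrt{\log n}$ times the mean spacing.

```python
import math
import random


def toy_gaps(m, kappa=1.0, seed=0):
    """Hypothetical toy gaps (in units of the mean spacing) for n = 2**m positions:

    gap_i = 1 + kappa * sum_{k=1}^{m} g_{I_k(i)} / 2**k,
    with one iid N(0, 1) variable g_I per dyadic interval I.
    """
    rng = random.Random(seed)
    n = 2 ** m
    gaps = [1.0] * n
    for k in range(1, m + 1):
        length = 2 ** k
        for start in range(0, n, length):
            g = rng.gauss(0.0, 1.0)  # shared by all positions in this interval
            for i in range(start, start + length):
                gaps[i] += kappa * g / length
    return gaps


m = 16
n = 2 ** m
gaps = toy_gaps(m)
sigma = math.sqrt(sum(4.0 ** -k for k in range(1, m + 1)))  # std of the fluctuation, about 1/sqrt(3)
frac_negative = sum(g < 0 for g in gaps) / n
print(frac_negative, max(gaps), 1 + sigma * math.sqrt(2 * math.log(n)))
```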
Given any probability measure $\mu = \rho\ d\lambda$ on $\mathbb{R}^n$ (or on the Weyl chamber) with a smooth nonzero density $\rho$, one can create an associated heat flow on other smooth probability measures $f\ d\mu$ by performing gradient flow with respect to the Dirichlet form
$$D(f) := \frac{1}{2} \int |\nabla f|^2\ d\mu.$$
Using the ansatz (4), expressed in the Haar coordinates above, this flow decouples into a system of independent Ornstein-Uhlenbeck processes
$$d\xi_0 = -\xi_0\ dt + \sqrt{2}\ dW_0, \qquad d\xi_I = -\frac{2cn}{|I|} \xi_I\ dt + \sqrt{2}\ dW_I$$
where $W_0, W_I$ are independent Wiener processes (i.e. Brownian motion). This is a toy model for the Dyson Brownian motion. In this model, we see that the mixing time for each $\xi_I$ is $O(|I|/n)$. Thus, the large-scale variables ($\xi_I$ for large $I$) evolve very slowly by Dyson Brownian motion, taking as long as $O(1)$ to reach equilibrium, while the fine scale modes ($\xi_I$ for small $I$) can achieve equilibrium in as brief a time as $O(1/n)$, with the intermediate modes taking an intermediate amount of time to reach equilibrium. It is precisely this picture that underlies the Erdos-Schlein-Yau approach to universality for Wigner matrices via the local equilibrium flow. In that approach, the measure (4) is given an additional (artificial) weight, roughly of the shape $\exp(-n^{1-\epsilon} \sum_i (\lambda_i - \gamma_i)^2)$, in order to make equilibrium achieved globally in just time $O(n^{-1+\epsilon})$. This leads to a local log-Sobolev type inequality that ensures convergence of the local statistics once one controls a Dirichlet form connected to the local equilibrium measure. One can then use the localisation of eigenvalues provided by a local semicircle law to control that Dirichlet form in turn for measures that have undergone Dyson Brownian motion.
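The separation of time scales in the Ornstein-Uhlenbeck model can be seen from the exact solution of $d\xi = -\theta\xi\ dt + \sqrt{2}\ dW$, namely $\xi(t) = \xi(0)e^{-\theta t} + \sqrt{(1 - e^{-2\theta t})/\theta}\, Z$ with $Z$ a standard gaussian. This sketch takes $\theta_I = 2cn/|I|$ as above, with $c = 1/2$ and $n = 2^{10}$ chosen purely for illustration. It starts every mode at $\xi_I(0) = 1$ and reports the sample mean at time $t = 0.01$. The finest modes have essentially forgotten their initial condition, while the coarsest have barely moved. The stationary variance $1/\theta_I = |I|/(2cn)$ matches the variance of $\xi_I$ in the static model.

```python
import math
import random


def ou_sample(x0, theta, t, rng):
    """Exact sample at time t of d xi = -theta xi dt + sqrt(2) dW started at x0."""
    mean = x0 * math.exp(-theta * t)
    var = (1.0 - math.exp(-2.0 * theta * t)) / theta
    return mean + math.sqrt(var) * rng.gauss(0.0, 1.0)


rng = random.Random(0)
n, c, t, paths = 2 ** 10, 0.5, 0.01, 4000
results = {}
for length in (2, 32, 1024):  # |I| from the finest to the coarsest scale
    theta = 2 * c * n / length  # relaxation rate; mixing time 1/theta = |I| / (2 c n)
    samples = [ou_sample(1.0, theta, t, rng) for _ in range(paths)]
    results[length] = (sum(samples) / paths, math.exp(-theta * t))  # (empirical mean, exact mean)
    print(length, 1 / theta, *results[length])
```

Sampling the exact transition law avoids any time-discretisation error, so the only discrepancy from $e^{-\theta_I t}$ is Monte Carlo noise.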