
Let ${n}$ be a large natural number, and let ${M_n}$ be a matrix drawn from the Gaussian Unitary Ensemble (GUE), by which we mean that ${M_n}$ is a Hermitian matrix whose upper triangular entries are iid complex gaussians with mean zero and variance one, and whose diagonal entries are iid real gaussians with mean zero and variance one (and independent of the upper triangular entries). The eigenvalues ${\lambda_1(M_n) \leq \ldots \leq \lambda_n(M_n)}$ are then real and almost surely distinct, and can be viewed as a random point process ${\Sigma^{(n)} := \{\lambda_1(M_n),\ldots,\lambda_n(M_n)\}}$ on the real line. One can then form the ${k}$-point correlation functions ${\rho_k^{(n)}: {\bf R}^k \rightarrow {\bf R}^+}$ for every ${k \geq 0}$, which can be defined by duality by requiring

$\displaystyle \mathop{\bf E} \sum_{i_1,\ldots,i_k \hbox{ distinct}} F( \lambda_{i_1}(M_n),\ldots,\lambda_{i_k}(M_n))$

$\displaystyle = \int_{{\bf R}^k} \rho_k^{(n)}(x_1,\ldots,x_k) F(x_1,\ldots,x_k)\ dx_1 \ldots dx_k$

for any test function ${F: {\bf R}^k \rightarrow {\bf R}^+}$. For GUE, which is a continuous matrix ensemble, one can also define ${\rho_k^{(n)}(x_1,\ldots,x_k)}$ for distinct ${x_1 < \ldots < x_k}$ as the unique quantity such that the probability that there is an eigenvalue in each of the intervals ${[x_1,x_1+\epsilon],\ldots,[x_k,x_k+\epsilon]}$ is ${(\rho_k^{(n)}(x_1,\ldots,x_k)+o(1))\epsilon^k}$ in the limit ${\epsilon\rightarrow 0}$.
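As a quick numerical illustration (not part of the argument), one can sample from this ensemble directly. The following Python/NumPy sketch uses the normalisations stated above: off-diagonal entries are complex gaussians of variance one (real and imaginary parts each of variance ${1/2}$), and the diagonal entries are real gaussians of variance one.

```python
import numpy as np

def sample_gue(n, rng=None):
    """Sample an n x n GUE matrix with the normalisation described above."""
    rng = np.random.default_rng(rng)
    # complex gaussian of variance 1: Re and Im parts each ~ N(0, 1/2)
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    m = np.triu(z, 1)                      # strict upper triangle
    m = m + m.conj().T                     # fill lower triangle by Hermitian symmetry
    m = m + np.diag(rng.standard_normal(n))  # real diagonal, variance one
    return m

m = sample_gue(200, rng=0)
lam = np.linalg.eigvalsh(m)  # real eigenvalues, returned in increasing order
```

The eigenvalues are real (as they must be for a Hermitian matrix) and, as claimed, almost surely distinct.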

As is well known, the GUE process is a determinantal point process, which means that ${k}$-point correlation functions can be explicitly computed as

$\displaystyle \rho^{(n)}_k(x_1,\ldots,x_k) = \det( K^{(n)}(x_i,x_j) )_{1 \leq i,j \leq k}$

for some kernel ${K^{(n)}: {\bf R} \times {\bf R} \rightarrow {\bf C}}$; explicitly, one has

$\displaystyle K^{(n)}(x,y) := \sum_{k=0}^{n-1} P_k(x) e^{-x^2/4}P_k(y) e^{-y^2/4}$

where ${P_0, P_1,\ldots}$ are the (normalised) Hermite polynomials; see this previous blog post for details.

Using the asymptotics of Hermite polynomials (which then give asymptotics for the kernel ${K^{(n)}}$), one can take a limit of a (suitably rescaled) sequence of GUE processes to obtain the Dyson sine process, which is a determinantal point process ${\Sigma}$ on the real line with correlation functions

$\displaystyle \rho_k(x_1,\ldots,x_k) = \det( K(x_i,x_j) )_{1 \leq i,j \leq k} \ \ \ \ \ (1)$

where ${K}$ is the Dyson sine kernel

$\displaystyle K(x,y) := \frac{\sin(\pi(x-y))}{\pi(x-y)}. \ \ \ \ \ (2)$

A bit more precisely, for any fixed bulk energy ${-2 < u < 2}$, the renormalised point processes ${\rho_{sc}(u) \sqrt{n} ( \Sigma^{(n)} - \sqrt{n} u )}$ converge in distribution in the vague topology to ${\Sigma}$ as ${n \rightarrow \infty}$, where ${\rho_{sc}(u) := \frac{1}{2\pi} (4-u^2)^{1/2}_+}$ is the semi-circular law density.
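The determinantal formulas (1)–(2) are easy to evaluate numerically. A small sketch in Python/NumPy (note that NumPy's `sinc` is already the normalised ${\sin(\pi t)/(\pi t)}$):

```python
import numpy as np

def sine_kernel(x, y):
    """The Dyson sine kernel K(x,y) = sin(pi(x-y)) / (pi(x-y)), equal to 1 on the diagonal."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return np.sinc(d)

def rho_k(points):
    """k-point correlation of the sine process: det(K(x_i, x_j))_{1<=i,j<=k}."""
    pts = np.asarray(points, dtype=float)
    gram = sine_kernel(pts[:, None], pts[None, :])
    return np.linalg.det(gram)
```

One can check that ${\rho_1}$ is identically ${1}$ (the process has unit mean density), and that ${\rho_2(x,x) = 0}$, reflecting the eigenvalue repulsion of the determinantal structure.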

On the other hand, an important feature of the GUE process ${\Sigma^{(n)} = \{\lambda_1,\ldots,\lambda_n\}}$ is its stationarity (modulo rescaling) under Dyson Brownian motion

$\displaystyle d\lambda_i = dB_i + \sum_{j \neq i} \frac{dt}{\lambda_i-\lambda_j}$

which describes the stochastic evolution of eigenvalues of a Hermitian matrix under independent Brownian motion of its entries, and is discussed in this previous blog post. To cut a long story short, this stationarity tells us that the self-similar ${n}$-point correlation function

$\displaystyle \rho^{(n)}_n(t,x) := t^{-n/2} \rho^{(n)}_n(x/\sqrt{t})$

obeys the Dyson heat equation

$\displaystyle \partial_t \rho^{(n)}_n = \frac{1}{2} \sum_{i=1}^n \partial_{x_i}^2 \rho^{(n)}_n - \sum_{1 \leq i,j \leq n;i\neq j} \partial_{x_i} \frac{\rho^{(n)}_n}{x_i-x_j}$

(see Exercise 11 of the previously mentioned blog post). Note that ${\rho^{(n)}_n}$ vanishes to second order whenever two of the ${x_i}$ coincide, so there is no singularity on the right-hand side. Setting ${t=1}$ and using self-similarity, we can rewrite this equation in time-independent form as

$\displaystyle -\frac{1}{2} \sum_{i=1}^n \partial_i (x_i \rho^{(n)}_n) = \frac{1}{2} \sum_{i=1}^n \partial_{x_i}^2 \rho^{(n)}_n - \sum_{1 \leq i,j \leq n;i\neq j} \partial_{x_i} \frac{\rho^{(n)}_n}{x_i-x_j}.$

One can then integrate out all but ${k}$ of these variables (after carefully justifying convergence) to obtain a system of equations for the ${k}$-point correlation functions ${\rho^{(n)}_k}$:

$\displaystyle -\frac{1}{2} \sum_{i=1}^k \partial_i (x_i \rho^{(n)}_k) = \frac{1}{2} \sum_{i=1}^k \partial_{x_i}^2 \rho^{(n)}_k - \sum_{1 \leq i,j \leq k;i\neq j} \partial_{x_i} \frac{\rho^{(n)}_k}{x_i-x_j}$

$\displaystyle - \sum_{i=1}^k \partial_{x_i} \int_{\bf R} \frac{\rho^{(n)}_{k+1}(x_1,\ldots,x_{k+1})}{x_i-x_{k+1}}\ dx_{k+1},$

where the integral is interpreted in the principal value sense. This system is an example of a BBGKY hierarchy.

If one carefully rescales and takes limits (say at the energy level ${u=0}$, for simplicity), the left-hand side turns out to rescale to be a lower order term, and one ends up with a hierarchy for the Dyson sine process:

$\displaystyle 0 = \frac{1}{2} \sum_{i=1}^k \partial_{x_i}^2 \rho_k - \sum_{1 \leq i,j \leq k;i\neq j} \partial_{x_i} \frac{\rho_k}{x_i-x_j} \ \ \ \ \ (3)$

$\displaystyle - \sum_{i=1}^k \partial_{x_i} \int_{\bf R} \frac{\rho_{k+1}(x_1,\ldots,x_{k+1})}{x_i-x_{k+1}}\ dx_{k+1}.$

Informally, these equations show that the Dyson sine process ${\Sigma = \{ \lambda_i: i \in {\bf Z} \}}$ is stationary with respect to the infinite Dyson Brownian motion

$\displaystyle d\lambda_i = dB_i + \sum_{j \neq i} \frac{dt}{\lambda_i-\lambda_j}$

where ${dB_i}$ are independent Brownian increments, and the sum is interpreted in a suitable principal value sense.
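One cannot simulate the infinite dynamics directly, but a crude Euler-Maruyama discretisation of the finite-${n}$ Dyson Brownian motion gives some feel for the repulsion term. The step size below is an arbitrary choice; too large a step can let the discretised eigenvalues cross, which the true dynamics almost surely never does.

```python
import numpy as np

def dyson_step(lam, dt, rng):
    """One Euler-Maruyama step of d lambda_i = dB_i + sum_{j != i} dt/(lambda_i - lambda_j)."""
    diff = lam[:, None] - lam[None, :]
    np.fill_diagonal(diff, np.inf)   # drop the i = j term (1/inf = 0)
    drift = np.sum(1.0 / diff, axis=1)
    return lam + drift * dt + np.sqrt(dt) * rng.standard_normal(lam.shape)

rng = np.random.default_rng(1)
lam = np.linspace(-5.0, 5.0, 10)     # well-separated initial "eigenvalues"
for _ in range(1000):
    lam = dyson_step(lam, 1e-4, rng)
```

Note that the drift terms cancel in pairs, so the centre of mass performs an ordinary Brownian motion, consistent with the behaviour of the trace discussed later in this archive.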

I recently set myself the exercise of deriving the identity (3) directly from the definition (1) of the Dyson sine process, without reference to GUE. This turns out to not be too difficult when done the right way (namely, by modifying the proof of Gaudin’s lemma), although it did take me an entire day of work before I realised this, and I could not find it in the literature (though I suspect that many people in the field have privately performed this exercise in the past). In any case, I am recording the computation here, largely because I really don’t want to have to do it again, but perhaps it will also be of interest to some readers.

This week I am at the American Institute of Mathematics, as an organiser on a workshop on the universality phenomenon in random matrices. There have been a number of interesting discussions so far in this workshop. Percy Deift, in a lecture on universality for invariant ensembles, gave some applications of what he only half-jokingly termed “the most important identity in mathematics”, namely the formula

$\displaystyle \hbox{det}( 1 + AB ) = \hbox{det}(1 + BA)$

whenever ${A, B}$ are ${n \times k}$ and ${k \times n}$ matrices respectively (or more generally, ${A}$ and ${B}$ could be linear operators with sufficiently good spectral properties that make both sides equal). Note that the left-hand side is an ${n \times n}$ determinant, while the right-hand side is a ${k \times k}$ determinant; this formula is particularly useful when computing determinants of large matrices (or of operators), as one can often use it to transform such determinants into much smaller determinants. In particular, the asymptotic behaviour of ${n \times n}$ determinants as ${n \rightarrow \infty}$ can be converted via this formula to determinants of a fixed size (independent of ${n}$), which is often a more favourable situation to analyse. Unsurprisingly, this trick is particularly useful for understanding the asymptotic behaviour of determinantal processes.
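A two-line numerical check of the identity (Python/NumPy), with ${A}$ a ${7 \times 3}$ matrix and ${B}$ a ${3 \times 7}$ matrix, so that the left-hand side is a ${7 \times 7}$ determinant and the right-hand side a ${3 \times 3}$ determinant:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 7, 3
A = rng.standard_normal((n, k))
B = rng.standard_normal((k, n))
lhs = np.linalg.det(np.eye(n) + A @ B)  # an n x n determinant
rhs = np.linalg.det(np.eye(k) + B @ A)  # a k x k determinant
```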

There are many ways to prove the identity. One is to observe that when ${A, B}$ are invertible square matrices of the same size, ${1+BA}$ and ${1+AB}$ are conjugate to each other and thus clearly have the same determinant; a density argument then removes the invertibility hypothesis, and a padding-by-zeroes argument then extends the square case to the rectangular case. Another is to proceed via the spectral theorem, noting that ${AB}$ and ${BA}$ have the same non-zero eigenvalues.

By rescaling, one obtains the variant identity

$\displaystyle \hbox{det}( z + AB ) = z^{n-k} \hbox{det}(z + BA)$

which essentially relates the characteristic polynomial of ${AB}$ with that of ${BA}$. When ${n=k}$, a comparison of coefficients already gives important basic identities such as ${\hbox{tr}(AB) = \hbox{tr}(BA)}$ and ${\hbox{det}(AB) = \hbox{det}(BA)}$; when ${n}$ is not equal to ${k}$, an inspection of the ${z^{n-k}}$ coefficient similarly gives the Cauchy-Binet formula (which, incidentally, is also useful when performing computations on determinantal processes).
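Both the rescaled identity and the trace identity that its coefficients encode can be checked numerically in the same way; the sample point ${z = 3/2}$ below is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, z = 6, 2, 1.5
A = rng.standard_normal((n, k))
B = rng.standard_normal((k, n))
# det(z + AB) = z^{n-k} det(z + BA)
lhs = np.linalg.det(z * np.eye(n) + A @ B)
rhs = z ** (n - k) * np.linalg.det(z * np.eye(k) + B @ A)
# the z^{n-1} coefficient encodes tr(AB) = tr(BA)
tr_ab, tr_ba = np.trace(A @ B), np.trace(B @ A)
```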

Thanks to this formula (and with a crucial insight of Alice Guionnet), I was able to solve a problem (on outliers for the circular law) that I had in the back of my mind for a few months, and initially posed to me by Larry Abbott; I hope to talk more about this in a future post.

Today, though, I wish to talk about another piece of mathematics that emerged from an afternoon of free-form discussion that we managed to schedule within the AIM workshop. Specifically, we hammered out a heuristic model of the mesoscopic structure of the eigenvalues ${\lambda_1 \leq \ldots \leq \lambda_n}$ of the ${n \times n}$ Gaussian Unitary Ensemble (GUE), where ${n}$ is a large integer. As is well known, the probability density of these eigenvalues is given by the Ginibre distribution

$\displaystyle \frac{1}{Z_n} e^{-H(\lambda)}\ d\lambda$

where ${d\lambda = d\lambda_1 \ldots d\lambda_n}$ is Lebesgue measure on the Weyl chamber ${\{ (\lambda_1,\ldots,\lambda_n) \in {\bf R}^n: \lambda_1 \leq \ldots \leq \lambda_n \}}$, ${Z_n}$ is a constant, and the Hamiltonian ${H}$ is given by the formula

$\displaystyle H(\lambda_1,\ldots,\lambda_n) := \sum_{j=1}^n \frac{\lambda_j^2}{2} - 2 \sum_{1 \leq i < j \leq n} \log |\lambda_i-\lambda_j|.$

At the macroscopic scale of ${\sqrt{n}}$, the eigenvalues ${\lambda_j}$ are distributed according to the Wigner semicircle law

$\displaystyle \rho_{sc}(x) := \frac{1}{2\pi} (4-x^2)_+^{1/2}.$

Indeed, if one defines the classical location ${\gamma_i^{cl}}$ of the ${i^{th}}$ eigenvalue to be the unique solution in ${[-2\sqrt{n}, 2\sqrt{n}]}$ to the equation

$\displaystyle \int_{-2}^{\gamma_i^{cl}/\sqrt{n}} \rho_{sc}(x)\ dx = \frac{i}{n}$

then it is known that the random variable ${\lambda_i}$ is quite close to ${\gamma_i^{cl}}$. Indeed, a result of Gustavsson shows that, in the bulk region when ${\epsilon n < i < (1-\epsilon) n}$ for some fixed ${\epsilon > 0}$, ${\lambda_i}$ is distributed asymptotically as a gaussian random variable with mean ${\gamma_i^{cl}}$ and standard deviation ${\sqrt{\frac{\log n}{\pi}} \times \frac{1}{\sqrt{n} \rho_{sc}(\gamma_i^{cl}/\sqrt{n})}}$. Note that from the semicircular law, the factor ${\frac{1}{\sqrt{n} \rho_{sc}(\gamma_i^{cl}/\sqrt{n})}}$ is the mean eigenvalue spacing.
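The classical locations are easy to compute: the semicircular density has the closed-form antiderivative ${F(t) = \frac{1}{2} + \frac{t\sqrt{4-t^2}}{4\pi} + \frac{1}{\pi}\arcsin(t/2)}$ on ${[-2,2]}$, so ${\gamma_i^{cl}}$ can be found by bisection. A sketch in Python:

```python
import numpy as np

def sc_cdf(t):
    """Closed-form antiderivative of the semicircular density rho_sc on [-2, 2]."""
    return 0.5 + t * np.sqrt(4.0 - t * t) / (4.0 * np.pi) + np.arcsin(t / 2.0) / np.pi

def gamma_cl(i, n, iters=60):
    """Classical location of the i-th eigenvalue: solve sc_cdf(g/sqrt(n)) = i/n by bisection."""
    lo, hi = -2.0, 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sc_cdf(mid) < i / n:
            lo = mid
        else:
            hi = mid
    return np.sqrt(n) * 0.5 * (lo + hi)
```

By symmetry one has ${\gamma_{n/2}^{cl} = 0}$ and ${\gamma_i^{cl} = -\gamma_{n-i}^{cl}}$, which serve as sanity checks on the computation.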

At the other extreme, at the microscopic scale of the mean eigenvalue spacing (which is comparable to ${1/\sqrt{n}}$ in the bulk, but can be as large as ${n^{-1/6}}$ at the edge), the eigenvalues are asymptotically distributed with respect to a special determinantal point process, namely the Dyson sine process in the bulk (and the Airy process on the edge), as discussed in this previous post.

Here, I wish to discuss the mesoscopic structure of the eigenvalues, which involves scales that are intermediate between the microscopic scale ${1/\sqrt{n}}$ and the macroscopic scale ${\sqrt{n}}$, for instance in correlating the eigenvalues ${\lambda_i}$ and ${\lambda_j}$ in the regime ${|i-j| \sim n^\theta}$ for some ${0 < \theta < 1}$. Here there is a surprising phenomenon: there is quite a long-range correlation between such eigenvalues. The result of Gustavsson shows that both ${\lambda_i}$ and ${\lambda_j}$ behave asymptotically like gaussian random variables, but a further result from the same paper shows that the correlation between these two random variables is asymptotic to ${1-\theta}$ (in the bulk, at least); thus, for instance, adjacent eigenvalues ${\lambda_{i+1}}$ and ${\lambda_i}$ are almost perfectly correlated (which makes sense, as their spacing is much less than either of their standard deviations), but even very distant eigenvalues, such as ${\lambda_{n/4}}$ and ${\lambda_{3n/4}}$, have a correlation comparable to ${1/\log n}$. One way to get a sense of this is to look at the trace

$\displaystyle \lambda_1 + \ldots + \lambda_n.$

This is also the sum of the diagonal entries of a GUE matrix, and is thus normally distributed with a variance of ${n}$. In contrast, each of the ${\lambda_i}$ (in the bulk, at least) has a variance comparable to ${\log n/n}$. In order for these two facts to be consistent, the average correlation between pairs of eigenvalues then has to be of the order of ${1/\log n}$.
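The first of these facts is easy to check by simulation; since the trace of a GUE matrix is just the sum of its ${n}$ iid standard gaussian diagonal entries, one can sample it directly without forming any matrices:

```python
import numpy as np

# The trace of an n x n GUE matrix is the sum of n iid N(0,1) diagonal
# entries, hence is N(0, n). A quick Monte Carlo check of the variance:
rng = np.random.default_rng(3)
n, trials = 50, 20000
traces = rng.standard_normal((trials, n)).sum(axis=1)
var_hat = traces.var()  # should be close to n = 50
```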

Below the fold, I give a heuristic way to see this correlation, based on Taylor expansion of the convex Hamiltonian ${H(\lambda)}$ around the minimum ${\gamma}$, which gives a conceptual probabilistic model for the mesoscopic structure of the GUE eigenvalues. While this heuristic is in no way rigorous, it does seem to explain many of the features currently known or conjectured about GUE, and looks likely to extend also to other models.

We can now turn attention to one of the centerpiece universality results in random matrix theory, namely the Wigner semi-circle law for Wigner matrices. Recall from previous notes that a Wigner Hermitian matrix ensemble is a random matrix ensemble ${M_n = (\xi_{ij})_{1 \leq i,j \leq n}}$ of Hermitian matrices (thus ${\xi_{ij} = \overline{\xi_{ji}}}$; this includes real symmetric matrices as an important special case), in which the upper-triangular entries ${\xi_{ij}}$, ${i<j}$ are iid complex random variables with mean zero and unit variance, and the diagonal entries ${\xi_{ii}}$ are iid real variables, independent of the upper-triangular entries, with bounded mean and variance. Particular special cases of interest include the Gaussian Orthogonal Ensemble (GOE), the symmetric random sign matrices (aka symmetric Bernoulli ensemble), and the Gaussian Unitary Ensemble (GUE).

In previous notes we saw that the operator norm of ${M_n}$ was typically of size ${O(\sqrt{n})}$, so it is natural to work with the normalised matrix ${\frac{1}{\sqrt{n}} M_n}$. Accordingly, given any ${n \times n}$ Hermitian matrix ${M_n}$, we can form the (normalised) empirical spectral distribution (or ESD for short)

$\displaystyle \mu_{\frac{1}{\sqrt{n}} M_n} := \frac{1}{n} \sum_{j=1}^n \delta_{\lambda_j(M_n) / \sqrt{n}},$

of ${M_n}$, where ${\lambda_1(M_n) \leq \ldots \leq \lambda_n(M_n)}$ are the (necessarily real) eigenvalues of ${M_n}$, counting multiplicity. The ESD is a probability measure, which can be viewed as a distribution of the normalised eigenvalues of ${M_n}$.
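As a concrete instance (taking GUE as the Wigner ensemble), a single normalised draw already places roughly the semicircular mass ${\approx 0.609}$ of the interval ${[-1,1]}$ under its ESD:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 400
# sample a GUE matrix: complex gaussian off-diagonal entries, real diagonal
z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
m = np.triu(z, 1)
m = m + m.conj().T + np.diag(rng.standard_normal(n))
lam = np.linalg.eigvalsh(m) / np.sqrt(n)  # normalised eigenvalues
frac = np.mean(np.abs(lam) <= 1.0)        # ESD mass of the interval [-1, 1]
```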

When ${M_n}$ is a random matrix ensemble, then the ESD ${\mu_{\frac{1}{\sqrt{n}} M_n}}$ is now a random measure – i.e. a random variable taking values in the space ${\hbox{Pr}({\mathbb R})}$ of probability measures on the real line. (Thus, the distribution of ${\mu_{\frac{1}{\sqrt{n}} M_n}}$ is a probability measure on probability measures!)

Now we consider the behaviour of the ESD of a sequence of Hermitian matrix ensembles ${M_n}$ as ${n \rightarrow \infty}$. Recall from Notes 0 that for any sequence of random variables in a ${\sigma}$-compact metrisable space, one can define notions of convergence in probability and convergence almost surely. Specialising these definitions to the case of random probability measures on ${{\mathbb R}}$, and to deterministic limits, we see that a sequence of random ESDs ${\mu_{\frac{1}{\sqrt{n}} M_n}}$ converge in probability (resp. converge almost surely) to a deterministic limit ${\mu \in \hbox{Pr}({\mathbb R})}$ (which, confusingly enough, is a deterministic probability measure!) if, for every test function ${\varphi \in C_c({\mathbb R})}$, the quantities ${\int_{\mathbb R} \varphi\ d\mu_{\frac{1}{\sqrt{n}} M_n}}$ converge in probability (resp. converge almost surely) to ${\int_{\mathbb R} \varphi\ d\mu}$.

Remark 1 As usual, convergence almost surely implies convergence in probability, but not vice versa. In the special case of random probability measures, there is an even weaker notion of convergence, namely convergence in expectation, defined as follows. Given a random ESD ${\mu_{\frac{1}{\sqrt{n}} M_n}}$, one can form its expectation ${{\bf E} \mu_{\frac{1}{\sqrt{n}} M_n} \in \hbox{Pr}({\mathbb R})}$, defined via duality (the Riesz representation theorem) as

$\displaystyle \int_{\mathbb R} \varphi\ d{\bf E} \mu_{\frac{1}{\sqrt{n}} M_n} := {\bf E} \int_{\mathbb R} \varphi\ d \mu_{\frac{1}{\sqrt{n}} M_n};$

this probability measure can be viewed as the law of a random eigenvalue ${\frac{1}{\sqrt{n}}\lambda_i(M_n)}$ drawn from a random matrix ${M_n}$ from the ensemble. We then say that the ESDs converge in expectation to a limit ${\mu \in \hbox{Pr}({\mathbb R})}$ if ${{\bf E} \mu_{\frac{1}{\sqrt{n}} M_n}}$ converges in the vague topology to ${\mu}$, thus

$\displaystyle {\bf E} \int_{\mathbb R} \varphi\ d \mu_{\frac{1}{\sqrt{n}} M_n} \rightarrow \int_{\mathbb R} \varphi\ d\mu$

for all ${\varphi \in C_c({\mathbb R})}$.

In general, these notions of convergence are distinct from each other; but in practice, one often finds in random matrix theory that these notions are effectively equivalent to each other, thanks to the concentration of measure phenomenon.

Exercise 1 Let ${M_n}$ be a sequence of ${n \times n}$ Hermitian matrix ensembles, and let ${\mu}$ be a continuous probability measure on ${{\mathbb R}}$.

• Show that ${\mu_{\frac{1}{\sqrt{n}} M_n}}$ converges almost surely to ${\mu}$ if and only if ${\mu_{\frac{1}{\sqrt{n}} M_n}(-\infty,\lambda)}$ converges almost surely to ${\mu(-\infty,\lambda)}$ for all ${\lambda \in {\mathbb R}}$.
• Show that ${\mu_{\frac{1}{\sqrt{n}} M_n}}$ converges in probability to ${\mu}$ if and only if ${\mu_{\frac{1}{\sqrt{n}} M_n}(-\infty,\lambda)}$ converges in probability to ${\mu(-\infty,\lambda)}$ for all ${\lambda \in {\mathbb R}}$.
• Show that ${\mu_{\frac{1}{\sqrt{n}} M_n}}$ converges in expectation to ${\mu}$ if and only if ${\mathop{\mathbb E} \mu_{\frac{1}{\sqrt{n}} M_n}(-\infty,\lambda)}$ converges to ${\mu(-\infty,\lambda)}$ for all ${\lambda \in {\mathbb R}}$.

We can now state the Wigner semi-circular law.

Theorem 1 (Semicircular law) Let ${M_n}$ be the top left ${n \times n}$ minors of an infinite Wigner matrix ${(\xi_{ij})_{i,j \geq 1}}$. Then the ESDs ${\mu_{\frac{1}{\sqrt{n}} M_n}}$ converge almost surely (and hence also in probability and in expectation) to the Wigner semi-circular distribution

$\displaystyle \mu_{sc} := \frac{1}{2\pi} (4-|x|^2)^{1/2}_+\ dx. \ \ \ \ \ (1)$

A numerical example of this theorem in action can be seen at the MathWorld entry for this law.

The semi-circular law nicely complements the upper Bai-Yin theorem from Notes 3, which asserts that (in the case when the entries have finite fourth moment, at least), the matrices ${\frac{1}{\sqrt{n}} M_n}$ almost surely have operator norm at most ${2+o(1)}$. Note that the operator norm is the same thing as the largest magnitude of the eigenvalues. Because the semi-circular distribution (1) is supported on the interval ${[-2,2]}$ with positive density on the interior of this interval, Theorem 1 easily supplies the lower Bai-Yin theorem, that the operator norm of ${\frac{1}{\sqrt{n}} M_n}$ is almost surely at least ${2-o(1)}$, and thus (in the finite fourth moment case) the norm is in fact equal to ${2+o(1)}$. Indeed, we have just shown that the semi-circular law provides an alternate proof of the lower Bai-Yin bound (Proposition 11 of Notes 3).
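As a numerical sanity check of this endpoint behaviour for one representative Wigner ensemble (GUE, which certainly has finite fourth moments), the operator norm of a normalised sample should already be close to ${2}$ at moderate sizes:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 400
# sample a GUE matrix
z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
m = np.triu(z, 1)
m = m + m.conj().T + np.diag(rng.standard_normal(n))
# operator norm of M_n / sqrt(n) = largest eigenvalue magnitude, expected 2 + o(1)
norm = np.max(np.abs(np.linalg.eigvalsh(m))) / np.sqrt(n)
```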

As will hopefully become clearer in the next set of notes, the semi-circular law is the noncommutative (or free probability) analogue of the central limit theorem, with the semi-circular distribution (1) taking on the role of the normal distribution. Of course, there is a striking difference between the two distributions, in that the former is compactly supported while the latter is merely subgaussian. One reason for this is that the concentration of measure phenomenon is more powerful in the case of ESDs of Wigner matrices than it is for averages of iid variables; compare the concentration of measure results in Notes 3 with those in Notes 1.

There are several ways to prove (or at least to heuristically justify) the semi-circular law. In this set of notes we shall focus on the two most popular methods, the moment method and the Stieltjes transform method, together with a third (heuristic) method based on Dyson Brownian motion (Notes 3b). In the next set of notes we shall also study the free probability method, and in the set of notes after that we use the determinantal processes method (although this method is initially only restricted to highly symmetric ensembles, such as GUE).

One theme in this course will be the central nature played by the gaussian random variables ${X \equiv N(\mu,\sigma^2)}$. Gaussians have an incredibly rich algebraic structure, and many results about general random variables can be established by first using this structure to verify the result for gaussians, and then using universality techniques (such as the Lindeberg exchange strategy) to extend the results to more general variables.

One way to exploit this algebraic structure is to continuously deform the variance ${t := \sigma^2}$ from an initial variance of zero (so that the random variable is deterministic) to some final level ${T}$. We would like to use this to give a continuous family ${t \mapsto X_t}$ of random variables ${X_t \equiv N(\mu, t)}$ as ${t}$ (viewed as a “time” parameter) runs from ${0}$ to ${T}$.

At present, we have not completely specified what ${X_t}$ should be, because we have only described the individual distribution ${X_t \equiv N(\mu,t)}$ of each ${X_t}$, and not the joint distribution. However, there is a very natural way to specify a joint distribution of this type, known as Brownian motion. In these notes we lay the necessary probability theory foundations to set up this motion, and indicate its connection with the heat equation, the central limit theorem, and the Ornstein-Uhlenbeck process. This is the beginning of stochastic calculus, which we will not develop fully here.

We will begin with one-dimensional Brownian motion, but it is a simple matter to extend the process to higher dimensions. In particular, we can define Brownian motion on vector spaces of matrices, such as the space of ${n \times n}$ Hermitian matrices. This process is equivariant with respect to conjugation by unitary matrices, and so we can quotient out by this conjugation and obtain a new process on the quotient space, or in other words on the spectrum of ${n \times n}$ Hermitian matrices. This process is called Dyson Brownian motion, and turns out to have a simple description in terms of ordinary Brownian motion; it will play a key role in several of the subsequent notes in this course.
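A small numerical sketch of this construction: run a Brownian motion on Hermitian matrices by adding independent Hermitian gaussian increments scaled by ${\sqrt{dt}}$, and check the unitary equivariance that justifies passing to the quotient, namely that conjugating by a fixed unitary leaves the spectrum unchanged.

```python
import numpy as np

rng = np.random.default_rng(6)
n, dt, steps = 8, 1e-3, 100

def hermitian_increment(n, rng):
    """An independent gaussian increment on the space of Hermitian matrices."""
    g = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (g + g.conj().T) / 2.0

# Brownian motion on Hermitian matrices, started at zero
h = np.zeros((n, n), dtype=complex)
for _ in range(steps):
    h += np.sqrt(dt) * hermitian_increment(n, rng)

# equivariance check: conjugation by a fixed unitary preserves the spectrum
q, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
spec = np.linalg.eigvalsh(h)
spec_conj = np.linalg.eigvalsh(q @ h @ q.conj().T)
```

The induced eigenvalue paths of this matrix-valued process are a realisation of Dyson Brownian motion, and (consistent with the repulsion in its drift term) the eigenvalues at any fixed time are almost surely distinct.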