
Let ${n}$ be a large natural number, and let ${M_n}$ be a matrix drawn from the Gaussian Unitary Ensemble (GUE), by which we mean that ${M_n}$ is a Hermitian matrix whose upper triangular entries are iid complex gaussians with mean zero and variance one, and whose diagonal entries are iid real gaussians with mean zero and variance one (and independent of the upper triangular entries). The eigenvalues ${\lambda_1(M_n) \leq \ldots \leq \lambda_n(M_n)}$ are then real and almost surely distinct, and can be viewed as a random point process ${\Sigma^{(n)} := \{\lambda_1(M_n),\ldots,\lambda_n(M_n)\}}$ on the real line. One can then form the ${k}$-point correlation functions ${\rho_k^{(n)}: {\bf R}^k \rightarrow {\bf R}^+}$ for every ${k \geq 0}$, which can be defined by duality by requiring

$\displaystyle \mathop{\bf E} \sum_{i_1,\ldots,i_k \hbox{ distinct}} F( \lambda_{i_1}(M_n),\ldots,\lambda_{i_k}(M_n))$

$\displaystyle = \int_{{\bf R}^k} \rho_k^{(n)}(x_1,\ldots,x_k) F(x_1,\ldots,x_k)\ dx_1 \ldots dx_k$

for any test function ${F: {\bf R}^k \rightarrow {\bf R}^+}$. For GUE, which is a continuous matrix ensemble, one can also define ${\rho_k^{(n)}(x_1,\ldots,x_k)}$ for distinct ${x_1 < \ldots < x_k}$ as the unique quantity such that the probability that there is an eigenvalue in each of the intervals ${[x_1,x_1+\epsilon],\ldots,[x_k,x_k+\epsilon]}$ is ${(\rho_k^{(n)}(x_1,\ldots,x_k)+o(1))\epsilon^k}$ in the limit ${\epsilon\rightarrow 0}$.
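As a concrete illustration of the normalisation above, here is a quick numerical sketch of my own (in Python with numpy; the function names are mine, not from any standard library): it samples GUE matrices as defined above, confirms that the eigenvalues are real and distinct, and checks that the mean number of eigenvalues near the centre of the spectrum is close to the semi-circular value ${\sqrt{n} \rho_{sc}(0) = \sqrt{n}/\pi}$ that appears below.

```python
import numpy as np

def sample_gue(n, rng):
    """Hermitian matrix with iid complex N(0,1) entries above the diagonal
    (real and imaginary parts each N(0,1/2)) and iid real N(0,1) entries
    on the diagonal, matching the definition above."""
    diag = rng.normal(size=n)
    off = (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))) * np.sqrt(0.5)
    upper = np.triu(off, k=1)
    return upper + upper.conj().T + np.diag(diag)

rng = np.random.default_rng(0)
n, trials, window = 100, 200, 1.0

counts = []
min_gap = np.inf
for _ in range(trials):
    lam = np.linalg.eigvalsh(sample_gue(n, rng))  # real, sorted eigenvalues
    counts.append(np.sum(np.abs(lam) < window))
    min_gap = min(min_gap, float(np.min(np.diff(lam))))

# mean eigenvalue count in [-1, 1] versus the semicircle prediction
# 2 * window * sqrt(n) * rho_sc(0), with rho_sc(0) = 1/pi
ratio = float(np.mean(counts)) / (2 * window * np.sqrt(n) / np.pi)
```

With these Monte Carlo parameters the ratio comes out within a few percent of ${1}$; the almost sure distinctness of the eigenvalues shows up as a strictly positive minimal gap.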

As is well known, the GUE process is a determinantal point process, which means that ${k}$-point correlation functions can be explicitly computed as

$\displaystyle \rho^{(n)}_k(x_1,\ldots,x_k) = \det( K^{(n)}(x_i,x_j) )_{1 \leq i,j \leq k}$

for some kernel ${K^{(n)}: {\bf R} \times {\bf R} \rightarrow {\bf C}}$; explicitly, one has

$\displaystyle K^{(n)}(x,y) := \sum_{k=0}^{n-1} P_k(x) e^{-x^2/4}P_k(y) e^{-y^2/4}$

where ${P_0, P_1,\ldots}$ are the (normalised) Hermite polynomials; see this previous blog post for details.
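For the reader who wants to see this kernel concretely, here is a hedged numerical sketch of my own. The polynomials ${P_k}$ orthonormal with respect to the weight ${e^{-x^2/2}\ dx}$ obey the three-term recurrence ${\sqrt{k+1}\,P_{k+1}(x) = x P_k(x) - \sqrt{k}\,P_{k-1}(x)}$ with ${P_0 = (2\pi)^{-1/4}}$ (this normalisation is my own reconstruction from the formula above), and the diagonal value ${K^{(n)}(x,x)}$, which is the one-point density, should then approximate ${\sqrt{n} \rho_{sc}(x/\sqrt{n})}$ for large ${n}$:

```python
import numpy as np

def hermite_orthonormal(nmax, x):
    """Values P_0(x), ..., P_{nmax-1}(x) of the polynomials orthonormal
    with respect to the weight exp(-x^2/2) dx on the real line."""
    p = np.zeros(nmax)
    p[0] = (2 * np.pi) ** (-0.25)
    if nmax > 1:
        p[1] = x * p[0]
    for k in range(1, nmax - 1):
        p[k + 1] = (x * p[k] - np.sqrt(k) * p[k - 1]) / np.sqrt(k + 1)
    return p

def kernel(n, x, y):
    """The GUE kernel K^{(n)}(x, y) from the formula above."""
    px = hermite_orthonormal(n, x) * np.exp(-x**2 / 4)
    py = hermite_orthonormal(n, y) * np.exp(-y**2 / 4)
    return float(np.dot(px, py))

n = 50
density_at_zero = kernel(n, 0.0, 0.0)  # should be close to sqrt(n) * rho_sc(0) = sqrt(n)/pi
```

For ${n=50}$ the diagonal value at the origin already agrees with ${\sqrt{n}/\pi \approx 2.25}$ to within about a percent.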

Using the asymptotics of Hermite polynomials (which then give asymptotics for the kernel ${K^{(n)}}$), one can take a limit of a (suitably rescaled) sequence of GUE processes to obtain the Dyson sine process, which is a determinantal point process ${\Sigma}$ on the real line with correlation functions

$\displaystyle \rho_k(x_1,\ldots,x_k) = \det( K(x_i,x_j) )_{1 \leq i,j \leq k} \ \ \ \ \ (1)$

where ${K}$ is the Dyson sine kernel

$\displaystyle K(x,y) := \frac{\sin(\pi(x-y))}{\pi(x-y)}. \ \ \ \ \ (2)$

A bit more precisely, for any fixed bulk energy ${-2 < u < 2}$, the renormalised point processes ${\rho_{sc}(u) \sqrt{n} ( \Sigma^{(n)} - \sqrt{n} u )}$ converge in distribution in the vague topology to ${\Sigma}$ as ${n \rightarrow \infty}$, where ${\rho_{sc}(u) := \frac{1}{2\pi} (4-u^2)^{1/2}_+}$ is the semi-circular law density.
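To make the limiting object concrete, the determinant formula (1) is easy to evaluate for small ${k}$: for ${k=2}$ it gives ${\rho_2(x,y) = 1 - (\frac{\sin \pi(x-y)}{\pi(x-y)})^2}$, exhibiting the repulsion ${\rho_2(x,x) = 0}$. A minimal numerical sketch of my own (numpy's ${\hbox{sinc}}$ happens to be exactly ${\sin(\pi t)/(\pi t)}$):

```python
import numpy as np

def sine_kernel(x, y):
    # Dyson sine kernel K(x, y) = sin(pi (x - y)) / (pi (x - y)),
    # with the diagonal value K(x, x) = 1 taken as the limit y -> x.
    return np.sinc(x - y)

def rho(points):
    """k-point correlation function of the sine process via (1)."""
    pts = np.asarray(points, dtype=float)
    gram = sine_kernel(pts[:, None], pts[None, :])
    return float(np.linalg.det(gram))

one_point = rho([1.234])            # constant density 1
two_point = rho([0.0, 0.5])         # 1 - (2/pi)^2, about 0.5947
repulsion = rho([0.3, 0.300001])    # nearly coincident points: almost 0
```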

On the other hand, an important feature of the GUE process ${\Sigma^{(n)} = \{\lambda_1,\ldots,\lambda_n\}}$ is its stationarity (modulo rescaling) under Dyson Brownian motion

$\displaystyle d\lambda_i = dB_i + \sum_{j \neq i} \frac{dt}{\lambda_i-\lambda_j}$

which describes the stochastic evolution of eigenvalues of a Hermitian matrix under independent Brownian motion of its entries, and is discussed in this previous blog post. To cut a long story short, this stationarity tells us that the self-similar ${n}$-point correlation function

$\displaystyle \rho^{(n)}_n(t,x) := t^{-n/2} \rho^{(n)}_n(x/\sqrt{t})$

obeys the Dyson heat equation

$\displaystyle \partial_t \rho^{(n)}_n = \frac{1}{2} \sum_{i=1}^n \partial_{x_i}^2 \rho^{(n)}_n - \sum_{1 \leq i,j \leq n;i\neq j} \partial_{x_i} \frac{\rho^{(n)}_n}{x_i-x_j}$

(see Exercise 11 of the previously mentioned blog post). Note that ${\rho^{(n)}_n}$ vanishes to second order whenever two of the ${x_i}$ coincide, so there is no singularity on the right-hand side. Setting ${t=1}$ and using self-similarity, we can rewrite this equation in time-independent form as

$\displaystyle -\frac{1}{2} \sum_{i=1}^n \partial_i (x_i \rho^{(n)}_n) = \frac{1}{2} \sum_{i=1}^n \partial_{x_i}^2 \rho^{(n)}_n - \sum_{1 \leq i,j \leq n;i\neq j} \partial_{x_i} \frac{\rho^{(n)}_n}{x_i-x_j}.$
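As a quick consistency check (my own, not from the original post), one can verify this time-independent equation symbolically in the first non-trivial case ${n=2}$, where the GUE eigenvalue density is proportional to ${(x_1-x_2)^2 e^{-(x_1^2+x_2^2)/2}}$; since the equation is linear in ${\rho^{(n)}_n}$, the normalising constant can be dropped:

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)

# unnormalised n = 2 GUE eigenvalue density (the constant drops out,
# as the equation is linear in rho)
rho = (x - y)**2 * sp.exp(-(x**2 + y**2) / 2)

# left-hand side: -(1/2) sum_i d/dx_i (x_i rho)
lhs = -sp.Rational(1, 2) * (sp.diff(x * rho, x) + sp.diff(y * rho, y))

# interaction term: sum over ordered pairs (i, j), i != j
interaction = sp.diff(rho / (x - y), x) + sp.diff(rho / (y - x), y)

rhs = sp.Rational(1, 2) * (sp.diff(rho, x, 2) + sp.diff(rho, y, 2)) - interaction

residual = sp.simplify(sp.expand(lhs - rhs))  # identically zero
```

Note that ${\rho/(x-y)}$ is a smooth expression here, in accordance with the observation above that ${\rho^{(n)}_n}$ vanishes to second order on the diagonal.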

One can then integrate out all but ${k}$ of these variables (after carefully justifying convergence) to obtain a system of equations for the ${k}$-point correlation functions ${\rho^{(n)}_k}$:

$\displaystyle -\frac{1}{2} \sum_{i=1}^k \partial_i (x_i \rho^{(n)}_k) = \frac{1}{2} \sum_{i=1}^k \partial_{x_i}^2 \rho^{(n)}_k - \sum_{1 \leq i,j \leq k;i\neq j} \partial_{x_i} \frac{\rho^{(n)}_k}{x_i-x_j}$

$\displaystyle - \sum_{i=1}^k \partial_{x_i} \int_{\bf R} \frac{\rho^{(n)}_{k+1}(x_1,\ldots,x_{k+1})}{x_i-x_{k+1}}\ dx_{k+1},$

where the integral is interpreted in the principal value sense. This system is an example of a BBGKY hierarchy.

If one carefully rescales and takes limits (say at the energy level ${u=0}$, for simplicity), the left-hand side turns out to rescale to be a lower order term, and one ends up with a hierarchy for the Dyson sine process:

$\displaystyle 0 = \frac{1}{2} \sum_{i=1}^k \partial_{x_i}^2 \rho_k - \sum_{1 \leq i,j \leq k;i\neq j} \partial_{x_i} \frac{\rho_k}{x_i-x_j} \ \ \ \ \ (3)$

$\displaystyle - \sum_{i=1}^k \partial_{x_i} \int_{\bf R} \frac{\rho_{k+1}(x_1,\ldots,x_{k+1})}{x_i-x_{k+1}}\ dx_{k+1}.$

Informally, these equations show that the Dyson sine process ${\Sigma = \{ \lambda_i: i \in {\bf Z} \}}$ is stationary with respect to the infinite Dyson Brownian motion

$\displaystyle d\lambda_i = dB_i + \sum_{j \neq i} \frac{dt}{\lambda_i-\lambda_j}$

where ${dB_i}$ are independent Brownian increments, and the sum is interpreted in a suitable principal value sense.

I recently set myself the exercise of deriving the identity (3) directly from the definition (1) of the Dyson sine process, without reference to GUE. This turns out not to be too difficult when done the right way (namely, by modifying the proof of Gaudin’s lemma), although it took me an entire day of work before I realised this, and I could not find the computation in the literature (though I suspect that many people in the field have privately performed this exercise in the past). In any case, I am recording the computation here, largely because I really don’t want to have to do it again, but perhaps it will also be of interest to some readers.

I’ve just uploaded to the arXiv my paper The asymptotic distribution of a single eigenvalue gap of a Wigner matrix, submitted to Probability Theory and Related Fields. This paper (like several of my previous papers) is concerned with the asymptotic distribution of the eigenvalues ${\lambda_1(M_n) \leq \ldots \leq \lambda_n(M_n)}$ of a random Wigner matrix ${M_n}$ in the limit ${n \rightarrow \infty}$, with a particular focus on matrices drawn from the Gaussian Unitary Ensemble (GUE). This paper is focused on the bulk of the spectrum, i.e. on eigenvalues ${\lambda_i(M_n)}$ with ${\delta n \leq i \leq (1-\delta) n}$ for some fixed ${\delta>0}$.

The location of an individual eigenvalue ${\lambda_i(M_n)}$ is by now quite well understood. If we normalise the entries of the matrix ${M_n}$ to have mean zero and variance ${1}$, then in the asymptotic limit ${n \rightarrow \infty}$, the Wigner semicircle law tells us that with probability ${1-o(1)}$ one has

$\displaystyle \lambda_i(M_n) =\sqrt{n} u + o(\sqrt{n})$

where the classical location ${u = u_{i/n} \in [-2,2]}$ of the eigenvalue is given by the formula

$\displaystyle \int_{-2}^{u} \rho_{sc}(x)\ dx = \frac{i}{n}$

and the semicircular distribution ${\rho_{sc}(x)\ dx}$ is given by the formula

$\displaystyle \rho_{sc}(x) := \frac{1}{2\pi} (4-x^2)_+^{1/2}.$

Actually, one can improve the error term here from ${o(\sqrt{n})}$ to ${O( \log^{1/2+\epsilon} n)}$ for any ${\epsilon>0}$ (see this recent paper of Van Vu and myself for more discussion of these sorts of estimates, sometimes known as eigenvalue rigidity estimates).

From the semicircle law (and the fundamental theorem of calculus), one expects the ${i^{th}}$ eigenvalue spacing ${\lambda_{i+1}(M_n)-\lambda_i(M_n)}$ to have an average size of ${\frac{1}{\sqrt{n} \rho_{sc}(u)}}$. It is thus natural to introduce the normalised eigenvalue spacing

$\displaystyle X_i := \frac{\lambda_{i+1}(M_n) - \lambda_i(M_n)}{1/\sqrt{n} \rho_{sc}(u)}$

and ask what the distribution of ${X_i}$ is.

As mentioned previously, we will focus on the bulk case ${\delta n \leq i\leq (1-\delta)n}$, and begin with the model case when ${M_n}$ is drawn from GUE. (In the edge case when ${i}$ is close to ${1}$ or to ${n}$, the distribution is given by the famous Tracy-Widom law.) Here, the distribution was almost (but as we shall see, not quite) worked out by Gaudin and Mehta. By using the theory of determinantal processes, they were able to compute a quantity closely related to ${X_i}$, namely the probability

$\displaystyle {\bf P}( N_{[\sqrt{n} u + \frac{x}{\sqrt{n} \rho_{sc}(u)}, \sqrt{n} u + \frac{y}{\sqrt{n} \rho_{sc}(u)}]} = 0) \ \ \ \ \ (1)$

that an interval ${[\sqrt{n} u + \frac{x}{\sqrt{n} \rho_{sc}(u)}, \sqrt{n} u + \frac{y}{\sqrt{n} \rho_{sc}(u)}]}$ near ${\sqrt{n} u}$ of length comparable to the expected eigenvalue spacing ${1/\sqrt{n} \rho_{sc}(u)}$ is devoid of eigenvalues. For ${u}$ in the bulk and fixed ${x,y}$, they showed that this probability is equal to

$\displaystyle \det( 1 - 1_{[x,y]} P 1_{[x,y]} ) + o(1),$

where ${P}$ is the Dyson projection

$\displaystyle P f(x) = \int_{\bf R} \frac{\sin(\pi(x-y))}{\pi(x-y)} f(y)\ dy$

to Fourier modes in ${[-1/2,1/2]}$, and ${\det}$ is the Fredholm determinant. As shown by Jimbo, Miwa, Môri, and Sato, this determinant can also be expressed in terms of a solution to a Painlevé V ODE, though we will not need this fact here. In view of this asymptotic and some standard integration by parts manipulations, it becomes plausible to propose that ${X_i}$ will be asymptotically distributed according to the Gaudin-Mehta distribution ${p(x)\ dx}$, where

$\displaystyle p(x) := \frac{d^2}{dx^2} \det( 1 - 1_{[0,x]} P 1_{[0,x]} ).$

A reasonably accurate approximation for ${p}$ is given by the Wigner surmise ${p(x) \approx \frac{1}{2} \pi x e^{-\pi x^2/4}}$ [EDIT: as pointed out in comments, in this GUE setting the correct surmise is ${p(x) \approx \frac{32}{\pi^2} x^2 e^{-4 x^2/\pi}}$], which was presciently proposed by Wigner as early as 1957; it is exact for ${n=2}$ but not in the asymptotic limit ${n \rightarrow \infty}$.
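As a small numerical sanity check of my own, the GUE surmise ${p(x) = \frac{32}{\pi^2} x^2 e^{-4x^2/\pi}}$ is indeed a probability density with unit mean spacing, as one can confirm by integrating it with the trapezoid rule:

```python
import numpy as np

def surmise(s):
    # GUE Wigner surmise p(s) = (32 / pi^2) s^2 exp(-4 s^2 / pi)
    return (32 / np.pi**2) * s**2 * np.exp(-4 * s**2 / np.pi)

def trapezoid(values, grid):
    # simple composite trapezoid rule
    return float(np.sum((values[1:] + values[:-1]) * np.diff(grid)) / 2)

s = np.linspace(0.0, 10.0, 200001)   # the tail beyond 10 is negligible
total_mass = trapezoid(surmise(s), s)    # should be 1: a probability density
mean_gap = trapezoid(s * surmise(s), s)  # should be 1: unit mean spacing
```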

Unfortunately, when one tries to make this argument rigorous, one finds that the asymptotic for (1) does not control a single gap ${X_i}$, but rather an ensemble of gaps ${X_i}$, where ${i}$ is drawn from an interval ${[i_0 - L, i_0 + L]}$ of some moderate size ${L}$ (e.g. ${L = \log n}$); see for instance this paper of Deift, Kriecherbauer, McLaughlin, Venakides, and Zhou for a more precise formalisation of this statement (which is phrased slightly differently, in which one samples all gaps inside a fixed window of spectrum, rather than inside a fixed range of eigenvalue indices ${i}$). (This result is stated for GUE, but can be extended to other Wigner ensembles by the Four Moment Theorem, at least if one assumes a moment matching condition; see this previous paper with Van Vu for details. The moment condition can in fact be removed, as was done in this subsequent paper with Erdos, Ramirez, Schlein, Vu, and Yau.)

The problem is that when one specifies a given window of spectrum such as ${[\sqrt{n} u + \frac{x}{\sqrt{n} \rho_{sc}(u)}, \sqrt{n} u + \frac{y}{\sqrt{n} \rho_{sc}(u)}]}$, one cannot quite pin down in advance which eigenvalues ${\lambda_i(M_n)}$ are going to lie to the left or right of this window; even with the strongest eigenvalue rigidity results available, there is a natural uncertainty of ${\sqrt{\log n}}$ or so in the ${i}$ index (as can be quantified quite precisely by this central limit theorem of Gustavsson).

The main difficulty here is that there could potentially be some strange coupling between the event (1) of an interval being devoid of eigenvalues, and the number ${N_{(-\infty,\sqrt{n} u + \frac{x}{\sqrt{n} \rho_{sc}(u)})}(M_n)}$ of eigenvalues to the left of that interval. For instance, one could conceive of a possible scenario in which the interval in (1) tends to have many eigenvalues when ${N_{(-\infty,\sqrt{n} u + \frac{x}{\sqrt{n} \rho_{sc}(u)})}(M_n)}$ is even, but very few when ${N_{(-\infty,\sqrt{n} u + \frac{x}{\sqrt{n} \rho_{sc}(u)})}(M_n)}$ is odd. In this sort of situation, the gaps ${X_i}$ may have different behaviour for even ${i}$ than for odd ${i}$, and such anomalies would not be picked up in the averaged statistics in which ${i}$ is allowed to range over some moderately large interval.

The main result of the current paper is that these anomalies do not actually occur, and that all of the eigenvalue gaps ${X_i}$ in the bulk are asymptotically governed by the Gaudin-Mehta law without the need for averaging in the ${i}$ parameter. Again, this is shown first for GUE, and then extended to other Wigner matrices obeying a matching moment condition using the Four Moment Theorem. (It is likely that the moment matching condition can be removed here, but I was unable to achieve this, despite all the recent advances in establishing universality of local spectral statistics for Wigner matrices, mainly because the universality results in the literature are more focused on specific energy levels ${u}$ than on specific eigenvalue indices ${i}$. To make matters worse, in some cases universality is currently known only after an additional averaging in the energy parameter.)

The main task in the proof is to show that the random variable ${N_{(-\infty,\sqrt{n} u + \frac{x}{\sqrt{n} \rho_{sc}(u)})}(M_n)}$ is largely decoupled from the event in (1) when ${M_n}$ is drawn from GUE. To do this we use some of the theory of determinantal processes, and in particular the nice fact that when one conditions a determinantal process to the event that a certain spatial region (such as an interval) contains no points of the process, then one obtains a new determinantal process (with a kernel that is closely related to the original kernel). The main task is then to obtain a sufficiently good control on the distance between the new determinantal kernel and the old one, which we do by some functional-analytic considerations involving the manipulation of norms of operators (and specifically, the operator norm, Hilbert-Schmidt norm, and nuclear norm). Amusingly, the Fredholm alternative makes a key appearance, as I end up having to invert a compact perturbation of the identity at one point (specifically, I need to invert ${1 - 1_{[x,y]}P1_{[x,y]}}$, where ${P}$ is the Dyson projection and ${[x,y]}$ is an interval). As such, the bounds in my paper become ineffective, though I am sure that with more work one can invert this particular perturbation of the identity by hand, without the need to invoke the Fredholm alternative.

Van Vu and I have just uploaded to the arXiv our paper A central limit theorem for the determinant of a Wigner matrix, submitted to Adv. Math. It studies the asymptotic distribution of the determinant ${\det M_n}$ of a random Wigner matrix (such as a matrix drawn from the Gaussian Unitary Ensemble (GUE) or Gaussian Orthogonal Ensemble (GOE)).

Before we get to these results, let us first discuss the simpler problem of studying the determinant ${\det A_n}$ of a random iid matrix ${A_n = (\zeta_{ij})_{1 \leq i,j \leq n}}$, such as a real gaussian matrix (where all entries are independently and identically distributed using the standard real normal distribution ${\zeta_{ij} \equiv N(0,1)_{\bf R}}$), a complex gaussian matrix (where all entries are independently and identically distributed using the standard complex normal distribution ${\zeta_{ij} \equiv N(0,1)_{\bf C}}$, thus the real and imaginary parts are independent with law ${N(0,1/2)_{\bf R}}$), or the random sign matrix (in which all entries are independently and identically distributed according to the Bernoulli distribution ${\zeta_{ij} \equiv \pm 1}$, with a ${1/2}$ chance of either sign). More generally, one can consider a matrix ${A_n}$ in which all the entries ${\zeta_{ij}}$ are independently and identically distributed with mean zero and variance ${1}$.

We can expand ${\det A_n}$ using the Leibniz expansion

$\displaystyle \det A_n = \sum_{\sigma \in S_n} I_\sigma, \ \ \ \ \ (1)$

where ${\sigma: \{1,\ldots,n\} \rightarrow \{1,\ldots,n\}}$ ranges over the permutations of ${\{1,\ldots,n\}}$, and ${I_\sigma}$ is the product

$\displaystyle I_\sigma := \hbox{sgn}(\sigma) \prod_{i=1}^n \zeta_{i\sigma(i)}.$

From the iid nature of the ${\zeta_{ij}}$, we easily see that each ${I_\sigma}$ has mean zero and variance one, and that the ${I_\sigma}$ are pairwise uncorrelated as ${\sigma}$ varies. We conclude that ${\det A_n}$ has mean zero and variance ${n!}$ (an observation first made by Turán). In particular, from Chebyshev’s inequality we see that ${\det A_n}$ is typically of size ${O(\sqrt{n!})}$.
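Turán’s computation is easy to confirm exhaustively for small ${n}$; for instance, averaging over all ${2^9}$ sign matrices of size ${3}$ (a quick check of my own) recovers mean zero and second moment ${3! = 6}$:

```python
import itertools
import numpy as np

n = 3
# enumerate every 3 x 3 matrix with entries +1 or -1
dets = np.array([
    np.linalg.det(np.array(signs, dtype=float).reshape(n, n))
    for signs in itertools.product([-1.0, 1.0], repeat=n * n)
])
mean_det = float(dets.mean())            # E det A_n = 0
second_moment = float((dets**2).mean())  # E (det A_n)^2 = n! = 6
```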

It turns out, though, that this is not quite best possible. This is easiest to explain in the real gaussian case, by performing a computation first made by Goodman. In this case, ${\det A_n}$ is clearly symmetrical, so we can focus attention on the magnitude ${|\det A_n|}$. We can interpret this quantity geometrically as the volume of an ${n}$-dimensional parallelepiped whose generating vectors ${X_1,\ldots,X_n}$ are independent real gaussian vectors in ${{\bf R}^n}$ (i.e. their coefficients are iid with law ${N(0,1)_{\bf R}}$). Using the classical base-times-height formula, we thus have

$\displaystyle |\det A_n| = \prod_{i=1}^n \hbox{dist}(X_i, V_i) \ \ \ \ \ (2)$

where ${V_i}$ is the ${i-1}$-dimensional linear subspace of ${{\bf R}^n}$ spanned by ${X_1,\ldots,X_{i-1}}$ (note that ${X_1,\ldots,X_n}$, having an absolutely continuous joint distribution, are almost surely linearly independent). Taking logarithms, we conclude

$\displaystyle \log |\det A_n| = \sum_{i=1}^n \log \hbox{dist}(X_i, V_i).$

Now, we take advantage of a fundamental symmetry property of the Gaussian vector distribution, namely its invariance with respect to the orthogonal group ${O(n)}$. Because of this, we see that if we fix ${X_1,\ldots,X_{i-1}}$ (and thus ${V_i}$), the random variable ${\hbox{dist}(X_i,V_i)}$ has the same distribution as ${\hbox{dist}(X_i,{\bf R}^{i-1})}$, or equivalently the ${\chi}$ distribution

$\displaystyle \chi_{n-i+1} := (\sum_{j=1}^{n-i+1} \xi_{n-i+1,j}^2)^{1/2}$

where ${\xi_{n-i+1,1},\ldots,\xi_{n-i+1,n-i+1}}$ are iid copies of ${N(0,1)_{\bf R}}$. As this distribution does not depend on the ${X_1,\ldots,X_{i-1}}$, we conclude that the law of ${\log |\det A_n|}$ is given by the sum of ${n}$ independent ${\chi}$-variables:

$\displaystyle \log |\det A_n| \equiv \sum_{j=1}^{n} \log \chi_j.$

A standard computation shows that each ${\chi_j^2}$ has mean ${j}$ and variance ${2j}$, and then a Taylor series (or Ito calculus) computation (using concentration of measure tools to control tails) shows that ${\log \chi_j}$ has mean ${\frac{1}{2} \log j - \frac{1}{2j} + O(1/j^{3/2})}$ and variance ${\frac{1}{2j}+O(1/j^{3/2})}$. As such, ${\log |\det A_n|}$ has mean ${\frac{1}{2} \log n! - \frac{1}{2} \log n + O(1)}$ and variance ${\frac{1}{2} \log n + O(1)}$. Applying a suitable version of the central limit theorem, one obtains the asymptotic law

$\displaystyle \frac{\log |\det A_n| - \frac{1}{2} \log n! + \frac{1}{2} \log n}{\sqrt{\frac{1}{2}\log n}} \rightarrow N(0,1)_{\bf R}, \ \ \ \ \ (3)$

where ${\rightarrow}$ denotes convergence in distribution. A bit more informally, we have

$\displaystyle |\det A_n| \approx n^{-1/2} \sqrt{n!} \exp( N( 0, \log n / 2 )_{\bf R} ) \ \ \ \ \ (4)$

when ${A_n}$ is a real gaussian matrix; thus, for instance, the median value of ${|\det A_n|}$ is ${n^{-1/2+o(1)} \sqrt{n!}}$. At first glance, this appears to conflict with the second moment bound ${\mathop{\bf E} |\det A_n|^2 = n!}$ of Turán mentioned earlier, but once one recalls that ${\exp(N(0,t)_{\bf R})}$ has a second moment of ${\exp(2t)}$, we see that the two facts are in fact perfectly consistent; the upper tail of the normal distribution in the exponent in (4) ends up dominating the second moment.
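The identity in law ${\log |\det A_n| \equiv \sum_{j=1}^{n} \log \chi_j}$ is straightforward to test by simulation; here is a rough Monte Carlo sketch of my own (the tolerances are deliberately generous):

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 50, 4000

# left-hand side: log|det A_n| for real gaussian matrices (slogdet for stability)
log_dets = np.array([np.linalg.slogdet(rng.normal(size=(n, n)))[1]
                     for _ in range(trials)])

# right-hand side: independent sums sum_j log chi_j, using chi_j = sqrt(chi^2_j)
chi_squares = rng.chisquare(df=np.arange(1, n + 1), size=(trials, n))
log_chi_sums = 0.5 * np.log(chi_squares).sum(axis=1)

mean_discrepancy = abs(float(log_dets.mean()) - float(log_chi_sums.mean()))
std_discrepancy = abs(float(log_dets.std()) - float(log_chi_sums.std()))
```

Both the empirical means and the empirical standard deviations of the two samples agree to within Monte Carlo error.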

It turns out that the central limit theorem (3) is valid for any real iid matrix with mean zero, variance one, and an exponential decay condition on the entries; this was first claimed by Girko, though the arguments in that paper appear to be incomplete. Another proof of this result, with more quantitative bounds on the convergence rate, has recently been obtained by Hoi Nguyen and Van Vu. The basic idea in these arguments is to express the logarithm of the product in (2) in terms of a martingale and apply the martingale central limit theorem.

If one works with complex gaussian random matrices instead of real gaussian random matrices, the above computations change slightly (one has to replace the real ${\chi}$ distribution with the complex ${\chi}$ distribution, in which the ${\xi_{i,j}}$ are distributed according to the complex gaussian ${N(0,1)_{\bf C}}$ instead of the real one). At the end of the day, one ends up with the law

$\displaystyle \frac{\log |\det A_n| - \frac{1}{2} \log n! + \frac{1}{4} \log n}{\sqrt{\frac{1}{4}\log n}} \rightarrow N(0,1)_{\bf R}, \ \ \ \ \ (5)$

$\displaystyle |\det A_n| \approx n^{-1/4} \sqrt{n!} \exp( N( 0, \log n / 4 )_{\bf R} ) \ \ \ \ \ (6)$

(but note that this new asymptotic is still consistent with Turán’s second moment calculation).

We can now turn to the results of our paper. Here, we replace the iid matrices ${A_n}$ by Wigner matrices ${M_n = (\zeta_{ij})_{1 \leq i,j \leq n}}$, which are defined similarly but are constrained to be Hermitian (or real symmetric), thus ${\zeta_{ij} = \overline{\zeta_{ji}}}$ for all ${i,j}$. Model examples here include the Gaussian Unitary Ensemble (GUE), in which ${\zeta_{ij} \equiv N(0,1)_{\bf C}}$ for ${1 \leq i < j \leq n}$ and ${\zeta_{ij} \equiv N(0,1)_{\bf R}}$ for ${1 \leq i=j \leq n}$, the Gaussian Orthogonal Ensemble (GOE), in which ${\zeta_{ij} \equiv N(0,1)_{\bf R}}$ for ${1 \leq i < j \leq n}$ and ${\zeta_{ij} \equiv N(0,2)_{\bf R}}$ for ${1 \leq i=j \leq n}$, and the symmetric Bernoulli ensemble, in which ${\zeta_{ij} \equiv \pm 1}$ for ${1 \leq i \leq j \leq n}$ (with probability ${1/2}$ of either sign). In all cases, the upper triangular entries of the matrix are assumed to be jointly independent. For a more precise definition of the Wigner matrix ensembles we are considering, see the introduction to our paper.

The determinants ${\det M_n}$ of these matrices still have a Leibniz expansion. However, in the Wigner case, the mean and variance of the ${I_\sigma}$ are slightly different, and what is worse, they are not all pairwise uncorrelated any more. For instance, the mean of ${I_\sigma}$ is still usually zero, but equals ${(-1)^{n/2}}$ in the exceptional case when ${\sigma}$ is a perfect matching (i.e. the union of exactly ${n/2}$ ${2}$-cycles, a possibility that can of course only happen when ${n}$ is even). As such, the mean ${\mathop{\bf E} \det M_n}$ still vanishes when ${n}$ is odd, but for even ${n}$ it is equal to

$\displaystyle (-1)^{n/2} \frac{n!}{(n/2)!2^{n/2}}$

(the fraction here simply being the number of perfect matchings on ${n}$ vertices). Using Stirling’s formula, one then computes that ${|\mathop{\bf E} \det M_n|}$ is comparable to ${n^{-1/4} \sqrt{n!}}$ when ${n}$ is large and even. The second moment calculation is more complicated (and uses facts about the distribution of cycles in random permutations, mentioned in this previous post), but one can compute that ${\mathop{\bf E} |\det M_n|^2}$ is comparable to ${n^{1/2} n!}$ for GUE and ${n^{3/2} n!}$ for GOE. (The discrepancy here comes from the fact that in the GOE case, ${I_\sigma}$ and ${I_\rho}$ can correlate when ${\rho}$ contains reversals of ${k}$-cycles of ${\sigma}$ for ${k \geq 3}$, but this does not happen in the GUE case.) For GUE, much more precise asymptotics for the moments of the determinant are known, starting from the work of Brezin and Hikami, though we do not need these more sophisticated computations here.
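The formula for the mean is easy to probe numerically (my own quick Monte Carlo, with the GUE conventions used throughout this post): for ${n=4}$ the prediction is ${(-1)^{2} \cdot 4!/(2! \cdot 2^2) = 3}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials = 4, 100000

# batch of GUE matrices: complex N(0,1) above the diagonal, real N(0,1) on it
g = (rng.normal(size=(trials, n, n))
     + 1j * rng.normal(size=(trials, n, n))) * np.sqrt(0.5)
upper = np.triu(g, k=1)                      # triu acts on the last two axes
diag = rng.normal(size=(trials, n))
m = upper + np.conj(np.swapaxes(upper, -1, -2)) + diag[:, :, None] * np.eye(n)

dets = np.linalg.det(m).real   # determinants of Hermitian matrices are real
mean_det = float(dets.mean())  # should approach 4!/(2! * 2^2) = 3
```

The tolerance below is loose because the second moment of the determinant is much larger than the square of the mean, so the Monte Carlo error is appreciable.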

Our main results are then as follows.

Theorem 1 Let ${M_n}$ be a Wigner matrix.

• If ${M_n}$ is drawn from GUE, then

$\displaystyle \frac{\log |\det M_n| - \frac{1}{2} \log n! + \frac{1}{4} \log n}{\sqrt{\frac{1}{2}\log n}} \rightarrow N(0,1)_{\bf R}.$

• If ${M_n}$ is drawn from GOE, then

$\displaystyle \frac{\log |\det M_n| - \frac{1}{2} \log n! + \frac{1}{4} \log n}{\sqrt{\log n}} \rightarrow N(0,1)_{\bf R}.$

• The previous two results also hold for more general Wigner matrices, assuming that the real and imaginary parts are independent, a finite moment condition is satisfied, and the entries match moments with those of GOE or GUE to fourth order. (See the paper for a more precise formulation of the result.)

Thus, we informally have

$\displaystyle |\det M_n| \approx n^{-1/4} \sqrt{n!} \exp( N( 0, \log n / 2 )_{\bf R} )$

when ${M_n}$ is drawn from GUE, or from another Wigner ensemble matching GUE to fourth order (and obeying some additional minor technical hypotheses); and

$\displaystyle |\det M_n| \approx n^{-1/4} \sqrt{n!} \exp( N( 0, \log n )_{\bf R} )$

when ${M_n}$ is drawn from GOE, or from another Wigner ensemble matching GOE to fourth order. Again, these asymptotic limiting distributions are consistent with the asymptotic behaviour for the second moments.

The extension from the GUE or GOE case to more general Wigner ensembles is a fairly routine application of the four moment theorem for Wigner matrices, although for various technical reasons we do not quite use the existing four moment theorems in the literature, but adapt them to the log determinant. The main idea is to express the log-determinant as an integral

$\displaystyle \log|\det M_n| = \frac{1}{2} n \log n - n \hbox{Im} \int_0^\infty s(\sqrt{-1}\eta)\ d\eta \ \ \ \ \ (7)$

of the Stieltjes transform

$\displaystyle s(z) := \frac{1}{n} \hbox{tr}( \frac{1}{\sqrt{n}} M_n - z )^{-1}$

of ${M_n}$. Strictly speaking, the integral in (7) is divergent at infinity (and also can be ill-behaved near zero), but this can be addressed by standard truncation and renormalisation arguments (combined with known facts about the least singular value of Wigner matrices), which we omit here. We then use a variant of the four moment theorem for the Stieltjes transform, as used by Erdos, Yau, and Yin (based on a previous four moment theorem for individual eigenvalues introduced by Van Vu and myself). The four moment theorem is proven by the now-standard Lindeberg exchange method, combined with the usual resolvent identities to control the behaviour of the resolvent (and hence the Stieltjes transform) with respect to modifying one or two entries, together with the delocalisation of eigenvector property (which in turn arises from local semicircle laws) to control the error terms.
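Before renormalisation, the identity behind (7) is just the fundamental theorem of calculus along the imaginary axis: for any finite cutoff ${T}$ one has ${\log|\det(\frac{1}{\sqrt{n}} M_n - \sqrt{-1} T)| - \log |\det \frac{1}{\sqrt{n}} M_n| = n \int_0^T \hbox{Im}\, s(\sqrt{-1}\eta)\ d\eta}$; the divergence as ${T \rightarrow \infty}$ is what the truncation arguments handle. Here is a direct numerical check of this finite-${T}$ identity (a sketch of my own):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30

# sample a GUE matrix and normalise by sqrt(n), as in the Stieltjes transform
diag = rng.normal(size=n)
g = (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))) * np.sqrt(0.5)
upper = np.triu(g, k=1)
w = (upper + upper.conj().T + np.diag(diag)) / np.sqrt(n)
lam = np.linalg.eigvalsh(w)

T = 10.0
eta = np.linspace(0.0, T, 100001)
# n Im s(i eta) = sum_j eta / (lambda_j^2 + eta^2)
integrand = (eta[None, :] / (lam[:, None]**2 + eta[None, :]**2)).sum(axis=0)
integral = float(np.sum((integrand[1:] + integrand[:-1]) * np.diff(eta)) / 2)

# exact value of the log-determinant increment along the imaginary axis
log_det_ratio = float(0.5 * np.sum(np.log((lam**2 + T**2) / lam**2)))
```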

Somewhat surprisingly (to us, at least), it turned out that it was the first part of the theorem (namely, the verification of the limiting law for the invariant ensembles GUE and GOE) that was more difficult than the extension to the Wigner case. Even in an ensemble as highly symmetric as GUE, the rows are no longer independent, and the formula (2) is basically useless for getting any non-trivial control on the log determinant. There is an explicit formula for the joint distribution of the eigenvalues of GUE (or GOE), which does eventually give the distribution of the cumulants of the log determinant, which then gives the required central limit theorem; but this is a lengthy computation, first performed by Delannay and Le Caer.

Following a suggestion of my colleague, Rowan Killip, we give an alternate proof of this central limit theorem in the GUE and GOE cases, by using a beautiful observation of Trotter, namely that the GUE or GOE ensemble can be conjugated into a tractable tridiagonal form. Let me state it just for GUE:

Proposition 2 (Tridiagonal form of GUE) Let ${M'_n}$ be the random tridiagonal real symmetric matrix

$\displaystyle M'_n = \begin{pmatrix} a_1 & b_1 & 0 & \ldots & 0 & 0 \\ b_1 & a_2 & b_2 & \ldots & 0 & 0 \\ 0 & b_2 & a_3 & \ldots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \ldots & a_{n-1} & b_{n-1} \\ 0 & 0 & 0 & \ldots & b_{n-1} & a_n \end{pmatrix}$

where the ${a_1,\ldots,a_n, b_1,\ldots,b_{n-1}}$ are jointly independent real random variables, with ${a_1,\ldots,a_n \equiv N(0,1)_{\bf R}}$ being standard real Gaussians, and each ${b_i}$ having a ${\chi}$-distribution:

$\displaystyle b_i = (\sum_{j=1}^i |z_{i,j}|^2)^{1/2}$

where ${z_{i,j} \equiv N(0,1)_{\bf C}}$ are iid complex gaussians. Let ${M_n}$ be drawn from GUE. Then the joint eigenvalue distribution of ${M_n}$ is identical to the joint eigenvalue distribution of ${M'_n}$.

Proof: Let ${M_n}$ be drawn from GUE. We can write

$\displaystyle M_n = \begin{pmatrix} M_{n-1} & X_n \\ X_n^* & a_n \end{pmatrix}$

where ${M_{n-1}}$ is drawn from the ${(n-1) \times (n-1)}$ GUE, ${a_n \equiv N(0,1)_{\bf R}}$, and ${X_n \in {\bf C}^{n-1}}$ is a random gaussian vector with all entries iid with distribution ${N(0,1)_{\bf C}}$. Furthermore, ${M_{n-1}, X_n, a_n}$ are jointly independent.

We now apply the tridiagonal matrix algorithm. Let ${b_{n-1} := |X_n|}$; then ${b_{n-1}}$ has the ${\chi}$-distribution indicated in the proposition. We then conjugate ${M_n}$ by a unitary matrix ${U}$ that preserves the final basis vector ${e_n}$, and maps ${X_n}$ to ${b_{n-1} e_{n-1}}$. Then we have

$\displaystyle U M_n U^* = \begin{pmatrix} \tilde M_{n-1} & b_{n-1} e_{n-1} \\ b_{n-1} e_{n-1}^* & a_n \end{pmatrix}$

where ${\tilde M_{n-1}}$ is conjugate to ${M_{n-1}}$. Now we make the crucial observation: because ${M_{n-1}}$ is distributed according to GUE (which is a unitarily invariant ensemble), and ${U}$ is a unitary matrix independent of ${M_{n-1}}$, ${\tilde M_{n-1}}$ is also distributed according to GUE, and remains independent of both ${b_{n-1}}$ and ${a_n}$.

We continue this process, expanding ${U M_n U^*}$ as

$\displaystyle \begin{pmatrix} M_{n-2} & X_{n-1} & 0 \\ X_{n-1}^* & a_{n-1} & b_{n-1} \\ 0 & b_{n-1} & a_n \end{pmatrix}.$

Applying a further unitary conjugation that fixes ${e_{n-1}, e_n}$ but maps ${X_{n-1}}$ to ${b_{n-2} e_{n-2}}$, we may replace ${X_{n-1}}$ by ${b_{n-2} e_{n-2}}$ while transforming ${M_{n-2}}$ to another GUE matrix ${\tilde M_{n-2}}$ independent of ${a_n, b_{n-1}, a_{n-1}, b_{n-2}}$. Iterating this process, we eventually obtain a coupling of ${M_n}$ to ${M'_n}$ by unitary conjugations, and the claim follows. $\Box$
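A Monte Carlo sanity check of Proposition 2 (my own; it compares low spectral moments, which depend only on the joint eigenvalue distribution): the averages of ${\hbox{tr}(M^2)}$ and ${\hbox{tr}(M^4)}$ should agree between the dense and tridiagonal models, and indeed ${\mathop{\bf E} \hbox{tr}(M^2) = n^2}$ exactly for both.

```python
import numpy as np

rng = np.random.default_rng(3)
n, trials = 40, 300

def sample_gue(n, rng):
    diag = rng.normal(size=n)
    g = (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))) * np.sqrt(0.5)
    upper = np.triu(g, k=1)
    return upper + upper.conj().T + np.diag(diag)

def sample_tridiagonal(n, rng):
    a = rng.normal(size=n)
    # b_i^2 is a sum of i iid |N(0,1)_C|^2 = Exp(1) variables, i.e. Gamma(i, 1)
    b = np.sqrt(rng.gamma(shape=np.arange(1.0, n)))
    return np.diag(a) + np.diag(b, 1) + np.diag(b, -1)

def moments(sampler):
    m2s, m4s = [], []
    for _ in range(trials):
        m = sampler(n, rng)
        msq = m @ m
        m2s.append(float(np.trace(msq).real))
        m4s.append(float(np.trace(msq @ msq).real))
    return np.mean(m2s), np.mean(m4s)

m2_dense, m4_dense = moments(sample_gue)
m2_tri, m4_tri = moments(sample_tridiagonal)
```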

The determinant of a tridiagonal matrix is not quite as simple as the determinant of a triangular matrix (in which it is simply the product of the diagonal entries), but it is pretty close: the determinant ${D_n}$ of the above matrix is given by solving the recursion

$\displaystyle D_i = a_i D_{i-1} - b_{i-1}^2 D_{i-2}$

with ${D_0=1}$ and ${D_{-1} = 0}$. Thus, instead of the product of a sequence of independent scalar ${\chi}$ distributions as in the gaussian matrix case, the determinant of GUE ends up being controlled by the product of a sequence of independent ${2\times 2}$ matrices whose entries are given by gaussians and ${\chi}$ distributions. In this case, one cannot immediately take logarithms and hope to get something for which the martingale central limit theorem can be applied, but some ad hoc manipulation of these ${2 \times 2}$ matrix products eventually does make this strategy work. (Roughly speaking, one has to work with the logarithm of the Frobenius norm of the matrix first.)

This week I am at the American Institute of Mathematics, as an organiser on a workshop on the universality phenomenon in random matrices. There have been a number of interesting discussions so far in this workshop. Percy Deift, in a lecture on universality for invariant ensembles, gave some applications of what he only half-jokingly termed “the most important identity in mathematics”, namely the formula

$\displaystyle \hbox{det}( 1 + AB ) = \hbox{det}(1 + BA)$

whenever ${A, B}$ are ${n \times k}$ and ${k \times n}$ matrices respectively (or more generally, ${A}$ and ${B}$ could be linear operators with sufficiently good spectral properties that make both sides equal). Note that the left-hand side is an ${n \times n}$ determinant, while the right-hand side is a ${k \times k}$ determinant; this formula is particularly useful when computing determinants of large matrices (or of operators), as one can often use it to transform such determinants into much smaller determinants. In particular, the asymptotic behaviour of ${n \times n}$ determinants as ${n \rightarrow \infty}$ can be converted via this formula to determinants of a fixed size (independent of ${n}$), which is often a more favourable situation to analyse. Unsurprisingly, this trick is particularly useful for understanding the asymptotic behaviour of determinantal processes.

There are many ways to prove the identity. One is to observe first that when ${A, B}$ are invertible square matrices of the same size, that ${1+BA}$ and ${1+AB}$ are conjugate to each other and thus clearly have the same determinant; a density argument then removes the invertibility hypothesis, and a padding-by-zeroes argument then extends the square case to the rectangular case. Another is to proceed via the spectral theorem, noting that ${AB}$ and ${BA}$ have the same non-zero eigenvalues.
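The identity is easy to test numerically; the following Python/numpy sketch (illustrative only, not part of the original text) checks it for random rectangular ${A, B}$, with the left-hand side a ${7 \times 7}$ determinant and the right-hand side a ${3 \times 3}$ one:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 7, 3
A = rng.normal(size=(n, k)) + 1j * rng.normal(size=(n, k))  # n x k
B = rng.normal(size=(k, n)) + 1j * rng.normal(size=(k, n))  # k x n

lhs = np.linalg.det(np.eye(n) + A @ B)   # an n x n determinant
rhs = np.linalg.det(np.eye(k) + B @ A)   # a  k x k determinant
print(np.isclose(lhs, rhs))  # True
```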

By rescaling, one obtains the variant identity

$\displaystyle \hbox{det}( z + AB ) = z^{n-k} \hbox{det}(z + BA)$

which essentially relates the characteristic polynomial of ${AB}$ with that of ${BA}$. When ${n=k}$, a comparison of coefficients already gives important basic identities such as ${\hbox{tr}(AB) = \hbox{tr}(BA)}$ and ${\hbox{det}(AB) = \hbox{det}(BA)}$; when ${n}$ is not equal to ${k}$, an inspection of the ${z^{n-k}}$ coefficient similarly gives the Cauchy-Binet formula (which, incidentally, is also useful when performing computations on determinantal processes).
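The rescaled variant, together with the trace identity one extracts from its coefficients, can be checked the same way (again an illustrative numpy sketch, not part of the original text):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 6, 2
A = rng.normal(size=(n, k))
B = rng.normal(size=(k, n))
z = 1.7

# det(z + AB) = z^{n-k} det(z + BA)
lhs = np.linalg.det(z * np.eye(n) + A @ B)
rhs = z ** (n - k) * np.linalg.det(z * np.eye(k) + B @ A)
print(np.isclose(lhs, rhs))  # True

# the trace identity obtained by comparing coefficients
print(np.isclose(np.trace(A @ B), np.trace(B @ A)))  # True
```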

Thanks to this formula (and with a crucial insight of Alice Guionnet), I was able to solve a problem (on outliers for the circular law) that I had in the back of my mind for a few months, and initially posed to me by Larry Abbott; I hope to talk more about this in a future post.

Today, though, I wish to talk about another piece of mathematics that emerged from an afternoon of free-form discussion that we managed to schedule within the AIM workshop. Specifically, we hammered out a heuristic model of the mesoscopic structure of the eigenvalues ${\lambda_1 \leq \ldots \leq \lambda_n}$ of the ${n \times n}$ Gaussian Unitary Ensemble (GUE), where ${n}$ is a large integer. As is well known, the probability density of these eigenvalues is given by the Ginibre distribution

$\displaystyle \frac{1}{Z_n} e^{-H(\lambda)}\ d\lambda$

where ${d\lambda = d\lambda_1 \ldots d\lambda_n}$ is Lebesgue measure on the Weyl chamber ${\{ (\lambda_1,\ldots,\lambda_n) \in {\bf R}^n: \lambda_1 \leq \ldots \leq \lambda_n \}}$, ${Z_n}$ is a constant, and the Hamiltonian ${H}$ is given by the formula

$\displaystyle H(\lambda_1,\ldots,\lambda_n) := \sum_{j=1}^n \frac{\lambda_j^2}{2} - 2 \sum_{1 \leq i < j \leq n} \log |\lambda_i-\lambda_j|.$

At the macroscopic scale of ${\sqrt{n}}$, the eigenvalues ${\lambda_j}$ are distributed according to the Wigner semicircle law

$\displaystyle \rho_{sc}(x) := \frac{1}{2\pi} (4-x^2)_+^{1/2}.$

Indeed, if one defines the classical location ${\gamma_i^{cl}}$ of the ${i^{th}}$ eigenvalue to be the unique solution in ${[-2\sqrt{n}, 2\sqrt{n}]}$ to the equation

$\displaystyle \int_{-2\sqrt{n}}^{\gamma_i^{cl}/\sqrt{n}} \rho_{sc}(x)\ dx = \frac{i}{n}$

then it is known that the random variable ${\lambda_i}$ is quite close to ${\gamma_i^{cl}}$. Indeed, a result of Gustavsson shows that, in the bulk region when ${\epsilon n < i < (1-\epsilon) n}$ for some fixed ${\epsilon > 0}$, ${\lambda_i}$ is distributed asymptotically as a gaussian random variable with mean ${\gamma_i^{cl}}$ and standard deviation ${\sqrt{\frac{\log n}{\pi}} \times \frac{1}{\sqrt{n} \rho_{sc}(\gamma_i^{cl}/\sqrt{n})}}$. Note that from the semicircular law, the factor ${\frac{1}{\sqrt{n} \rho_{sc}(\gamma_i^{cl}/\sqrt{n})}}$ is the mean eigenvalue spacing.
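One can see this rigidity of eigenvalues around their classical locations in a quick simulation (an illustrative Python/numpy sketch, not part of the original text; the GUE normalisation follows the definition above, and the classical locations are obtained by inverting the semicircular cumulative distribution function by bisection):

```python
import numpy as np

def semicircle_cdf(t):
    # integral of rho_sc from -2 to t
    return 0.5 + (t * np.sqrt(4 - t ** 2) / 2 + 2 * np.arcsin(t / 2)) / (2 * np.pi)

def gamma_cl(i, n):
    # classical location of the i-th eigenvalue, on the sqrt(n) scale
    lo, hi = -2.0, 2.0
    for _ in range(60):          # bisection on the monotone CDF
        mid = (lo + hi) / 2
        if semicircle_cdf(mid) < i / n:
            lo = mid
        else:
            hi = mid
    return np.sqrt(n) * (lo + hi) / 2

rng = np.random.default_rng(3)
n = 400
X = (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))) / np.sqrt(2)
M = (X + X.conj().T) / np.sqrt(2)     # GUE: off-diagonal complex variance 1, diagonal real variance 1
lam = np.linalg.eigvalsh(M)           # eigenvalues, sorted ascending

for i in (n // 4, n // 2, 3 * n // 4):
    # fluctuations are O(sqrt(log n / n)) times the sqrt(n) scale, hence small
    print(i, lam[i - 1] - gamma_cl(i, n))
```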

At the other extreme, at the microscopic scale of the mean eigenvalue spacing (which is comparable to ${1/\sqrt{n}}$ in the bulk, but can be as large as ${n^{-1/6}}$ at the edge), the eigenvalues are asymptotically distributed with respect to a special determinantal point process, namely the Dyson sine process in the bulk (and the Airy process on the edge), as discussed in this previous post.

Here, I wish to discuss the mesoscopic structure of the eigenvalues, in which one involves scales that are intermediate between the microscopic scale ${1/\sqrt{n}}$ and the macroscopic scale ${\sqrt{n}}$, for instance in correlating the eigenvalues ${\lambda_i}$ and ${\lambda_j}$ in the regime ${|i-j| \sim n^\theta}$ for some ${0 < \theta < 1}$. Here, there is a surprising phenomenon: there is quite a long-range correlation between such eigenvalues. The result of Gustavsson shows that both ${\lambda_i}$ and ${\lambda_j}$ behave asymptotically like gaussian random variables, but a further result from the same paper shows that the correlation between these two random variables is asymptotic to ${1-\theta}$ (in the bulk, at least); thus, for instance, adjacent eigenvalues ${\lambda_{i+1}}$ and ${\lambda_i}$ are almost perfectly correlated (which makes sense, as their spacing is much less than either of their standard deviations), but even very distant eigenvalues, such as ${\lambda_{n/4}}$ and ${\lambda_{3n/4}}$, have a correlation comparable to ${1/\log n}$. One way to get a sense of this is to look at the trace

$\displaystyle \lambda_1 + \ldots + \lambda_n.$

This is also the sum of the diagonal entries of a GUE matrix, and is thus normally distributed with a variance of ${n}$. In contrast, each of the ${\lambda_i}$ (in the bulk, at least) has a variance comparable to ${\log n/n}$. In order for these two facts to be consistent, the average correlation between pairs of eigenvalues then has to be of the order of ${1/\log n}$.
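This consistency check is easy to simulate (an illustrative numpy sketch, not part of the original text; the trace, which equals the sum of the eigenvalues, is read off directly from sampled GUE matrices):

```python
import numpy as np

rng = np.random.default_rng(4)
n, trials = 50, 3000
traces = np.empty(trials)
for t in range(trials):
    X = (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))) / np.sqrt(2)
    M = (X + X.conj().T) / np.sqrt(2)
    traces[t] = np.trace(M).real      # equals lambda_1 + ... + lambda_n

# the trace is a sum of n iid real N(0,1) diagonal entries,
# so its variance should be close to n = 50
print(traces.var())
```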

Below the fold, I give a heuristic way to see this correlation, based on Taylor expansion of the convex Hamiltonian ${H(\lambda)}$ around the minimum ${\gamma}$, which gives a conceptual probabilistic model for the mesoscopic structure of the GUE eigenvalues. While this heuristic is in no way rigorous, it does seem to explain many of the features currently known or conjectured about GUE, and looks likely to extend also to other models.

Let ${n}$ be a large integer, and let ${M_n}$ be the Gaussian Unitary Ensemble (GUE), i.e. the random Hermitian matrix with probability distribution

$\displaystyle C_n e^{-\hbox{tr}(M_n^2)/2} dM_n$

where ${dM_n}$ is Lebesgue measure on the space of Hermitian matrices and ${C_n}$ is the normalisation constant required to make the distribution of unit mass. The eigenvalues ${\lambda_1 < \ldots < \lambda_n}$ of this matrix are then a coupled family of ${n}$ real random variables. For any ${1 \leq k \leq n}$, we can define the ${k}$-point correlation function ${\rho_k( x_1,\ldots,x_k )}$ to be the unique symmetric measure on ${{\bf R}^k}$ such that

$\displaystyle \int_{{\bf R}^k} F(x_1,\ldots,x_k) \rho_k(x_1,\ldots,x_k)\ dx_1 \ldots dx_k = {\bf E} \sum_{1 \leq i_1 < \ldots < i_k \leq n} F( \lambda_{i_1}, \ldots, \lambda_{i_k} ).$

A standard computation (given for instance in these lecture notes of mine) gives the Ginibre formula

$\displaystyle \rho_n(x_1,\ldots,x_n) = C'_n (\prod_{1 \leq i < j \leq n} |x_i-x_j|^2) e^{-\sum_{j=1}^n |x_j|^2/2}$

for the ${n}$-point correlation function, where ${C'_n}$ is another normalisation constant. Using Vandermonde determinants, one can rewrite this expression in determinantal form as

$\displaystyle \rho_n(x_1,\ldots,x_n) = C''_n \det(K_n(x_i,x_j))_{1 \leq i, j \leq n}$

where the kernel ${K_n}$ is given by

$\displaystyle K_n(x,y) := \sum_{k=0}^{n-1} \phi_k(x) \phi_k(y)$

where ${\phi_k(x) := P_k(x) e^{-x^2/4}}$ and ${P_0, P_1, \ldots}$ are the (${L^2}$-normalised) Hermite polynomials (thus the ${\phi_k}$ are an orthonormal family, with each ${P_k}$ being a polynomial of degree ${k}$). Integrating out one or more of the variables, one is led to the Gaudin-Mehta formula

$\displaystyle \rho_k(x_1,\ldots,x_k) = \det(K_n(x_i,x_j))_{1 \leq i, j \leq k}. \ \ \ \ \ (1)$

(In particular, the normalisation constant ${C''_n}$ in the previous formula turns out to simply be equal to ${1}$.) Again, see these lecture notes for details.
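These formulae can be verified numerically; the following Python/numpy sketch (illustrative only, not part of the original text) generates the ${\phi_k}$ by the stable normalised three-term recurrence ${\phi_{k+1}(x) = (x \phi_k(x) - \sqrt{k}\, \phi_{k-1}(x))/\sqrt{k+1}}$ (the Hermite recurrence after ${L^2}$ normalisation) and checks both orthonormality and the fact that ${\int_{\bf R} K_n(x,x)\ dx = n}$ (there are ${n}$ eigenvalues in total):

```python
import numpy as np

def hermite_functions(n, x):
    """phi_k(x) = P_k(x) e^{-x^2/4} for k = 0..n-1, via the stable
    normalised recurrence phi_{k+1} = (x phi_k - sqrt(k) phi_{k-1}) / sqrt(k+1)."""
    out = np.empty((n, len(x)))
    out[0] = np.exp(-x ** 2 / 4) / (2 * np.pi) ** 0.25
    if n > 1:
        out[1] = x * out[0]
    for k in range(1, n - 1):
        out[k + 1] = (x * out[k] - np.sqrt(k) * out[k - 1]) / np.sqrt(k + 1)
    return out

n = 20
x = np.linspace(-30, 30, 6001)
dx = x[1] - x[0]
P = hermite_functions(n, x)

# Orthonormality: the Gram matrix of the phi_k is the identity.
gram = (P * dx) @ P.T
print(np.allclose(gram, np.eye(n), atol=1e-6))  # True

# K_n(x, x) integrates to n.
K_diag = (P ** 2).sum(axis=0)
print(K_diag.sum() * dx)  # approximately 20.0
```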

The functions ${\phi_k(x)}$ can be viewed as an orthonormal basis of eigenfunctions for the harmonic oscillator operator

$\displaystyle L \phi := (-\frac{d^2}{dx^2} + \frac{x^2}{4})\phi;$

indeed it is a classical fact that

$\displaystyle L \phi_k = (k + \frac{1}{2}) \phi_k.$

As such, the kernel ${K_n}$ can be viewed as the integral kernel of the spectral projection operator ${1_{(-\infty,n]}(L)}$.
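The eigenfunction equation can be checked by finite differences (an illustrative numpy sketch, not part of the original text; the second derivative is approximated by a central difference):

```python
import numpy as np

def phi_k(k, x):
    # k-th orthonormal Hermite function phi_k = P_k(x) e^{-x^2/4},
    # via the stable normalised recurrence
    p_prev = np.zeros_like(x)
    p = np.exp(-x ** 2 / 4) / (2 * np.pi) ** 0.25
    for j in range(k):
        p_prev, p = p, (x * p - np.sqrt(j) * p_prev) / np.sqrt(j + 1)
    return p

k = 5
x = np.linspace(-3.0, 3.0, 13)
h = 1e-3
# central-difference second derivative
d2 = (phi_k(k, x + h) - 2 * phi_k(k, x) + phi_k(k, x - h)) / h ** 2
L_phi = -d2 + (x ** 2 / 4) * phi_k(k, x)

# phi_5 is an eigenfunction of L with eigenvalue k + 1/2 = 5.5
err = np.max(np.abs(L_phi - (k + 0.5) * phi_k(k, x)))
print(err)  # small (finite-difference error only)
```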

From (1) we see that the fine-scale structure of the eigenvalues of GUE is controlled by the asymptotics of ${K_n}$ as ${n \rightarrow \infty}$. The two main asymptotics of interest are given by the following lemmas:

Lemma 1 (Asymptotics of ${K_n}$ in the bulk) Let ${x_0 \in (-2,2)}$, and let ${\rho_{sc}(x_0) := \frac{1}{2\pi} (4-x_0^2)^{1/2}_+}$ be the semicircular law density at ${x_0}$. Then, we have

$\displaystyle \frac{1}{\sqrt{n} \rho_{sc}(x_0)} K_n( x_0 \sqrt{n} + \frac{y}{\sqrt{n} \rho_{sc}(x_0)}, x_0 \sqrt{n} + \frac{z}{\sqrt{n} \rho_{sc}(x_0)} )$

$\displaystyle \rightarrow \frac{\sin(\pi(y-z))}{\pi(y-z)} \ \ \ \ \ (2)$

as ${n \rightarrow \infty}$ for any fixed ${y,z \in {\bf R}}$ (removing the singularity at ${y=z}$ in the usual manner).

Lemma 2 (Asymptotics of ${K_n}$ at the edge) We have

$\displaystyle \frac{1}{n^{1/6}} K_n( 2\sqrt{n} + \frac{y}{n^{1/6}}, 2\sqrt{n} + \frac{z}{n^{1/6}} )$

$\displaystyle \rightarrow \frac{Ai(y) Ai'(z) - Ai'(y) Ai(z)}{y-z} \ \ \ \ \ (3)$

as ${n \rightarrow \infty}$ for any fixed ${y,z \in {\bf R}}$, where ${Ai}$ is the Airy function

$\displaystyle Ai(x) := \frac{1}{\pi} \int_0^\infty \cos( \frac{t^3}{3} + tx )\ dt$

and again removing the singularity at ${y=z}$ in the usual manner.

The proof of these asymptotics usually proceeds via computing the asymptotics of Hermite polynomials, together with the Christoffel-Darboux formula; this is for instance the approach taken in the previous notes. However, there is a slightly different approach that is closer in spirit to the methods of semi-classical analysis, which was briefly mentioned in the previous notes but not elaborated upon. For sake of completeness, I am recording some notes on this approach here, although to focus on the main ideas I will not be completely rigorous in the derivation (ignoring issues such as convergence of integrals or of operators, or (removable) singularities in kernels caused by zeroes in the denominator).
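As a numerical illustration of the bulk asymptotics (a Python/numpy sketch, not part of the original text; here ${x_0=0}$, so ${\rho_{sc}(x_0) = 1/\pi}$, and the kernel is normalised by the local eigenvalue density ${\sqrt{n}\rho_{sc}(x_0)}$ so that its diagonal value tends to ${1}$):

```python
import numpy as np

def hermite_functions(n, x):
    # orthonormal Hermite functions phi_k = P_k(x) e^{-x^2/4}, stable recurrence
    out = np.empty((n, len(x)))
    out[0] = np.exp(-x ** 2 / 4) / (2 * np.pi) ** 0.25
    out[1] = x * out[0]
    for k in range(1, n - 1):
        out[k + 1] = (x * out[k] - np.sqrt(k) * out[k - 1]) / np.sqrt(k + 1)
    return out

n = 400
s = np.sqrt(n) / np.pi             # sqrt(n) * rho_sc(0): local density at the centre
y = np.array([0.0, 0.4, 1.0])      # sine-kernel coordinates
P = hermite_functions(n, y / s)    # phi_k at the microscopically rescaled points
K = (P.T @ P) / s                  # normalised kernel K_n(y_i/s, y_j/s)/s

target = np.sinc(y[:, None] - y[None, :])   # np.sinc(u) = sin(pi u)/(pi u)
print(np.max(np.abs(K - target)))           # tends to 0 as n grows
```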

<p>
Our study of random matrices, to date, has focused on somewhat general ensembles, such as iid random matrices or Wigner random matrices, in which the distribution of the individual entries of the matrices was essentially arbitrary (as long as certain moments, such as the mean and variance, were normalised). In these notes, we now focus on two much more special, and much more symmetric, ensembles:
<p>

<ul> <li> The <a href="http://en.wikipedia.org/wiki/Gaussian_Unitary_Ensemble">Gaussian Unitary Ensemble</a> (GUE), which is an ensemble of random ${n \times n}$ Hermitian matrices ${M_n}$ in which the upper-triangular entries are iid with distribution ${N(0,1)_{\bf C}}$, and the diagonal entries are iid with distribution ${N(0,1)_{\bf R}}$, and independent of the upper-triangular ones; and <li> The <em>Gaussian random matrix ensemble</em>, which is an ensemble of random ${n \times n}$ (non-Hermitian) matrices ${M_n}$ whose entries are iid with distribution ${N(0,1)_{\bf C}}$.
</ul>

<p>
The symmetric nature of these ensembles will allow us to compute the spectral distribution by exact algebraic means, revealing a surprising connection with orthogonal polynomials and with determinantal processes. This will, for instance, recover the semi-circular law for GUE, but will also reveal <em>fine</em> spacing information, such as the distribution of the gap between <em>adjacent</em> eigenvalues, which is largely out of reach of tools such as the Stieltjes transform method and the moment method (although the moment method, with some effort, is able to control the extreme edges of the spectrum).
<p>
Similarly, we will see for the first time the <em>circular law</em> for eigenvalues of non-Hermitian matrices.
<p>
There are a number of other highly symmetric ensembles which can also be treated by the same methods, most notably the Gaussian Orthogonal Ensemble (GOE) and the Gaussian Symplectic Ensemble (GSE). However, for simplicity we shall focus just on the above two ensembles. For a systematic treatment of these ensembles, see the <a href="http://www.ams.org/mathscinet-getitem?mr=1677884">text by Deift</a>.
<p>
<!–more–>
<p>

<p align=center><b> &mdash; 1. The spectrum of GUE &mdash; </b></p>

<p>
We have already shown using Dyson Brownian motion in <a href="https://terrytao.wordpress.com/2010/01/18/254a-notes-3b-brownian-motion-and-dyson-brownian-motion/">Notes 3b</a> that we have the <a href="http://www.ams.org/mathscinet-getitem?mr=173726">Ginibre formula</a> <a name="rhod"><p align=center>$\displaystyle \rho_n(\lambda) = \frac{1}{(2\pi)^{n/2} 1! \ldots (n-1)!} e^{-|\lambda|^2/2} |\Delta_n(\lambda)|^2 \ \ \ \ \ (1)$</p>
</a> for the density function of the eigenvalues ${(\lambda_1,\ldots,\lambda_n) \in {\bf R}^n_\geq}$ of a GUE matrix ${M_n}$, where <p align=center>$\displaystyle \Delta_n(\lambda) = \prod_{1 \leq i < j \leq n} (\lambda_i - \lambda_j)$</p>
is the <a href="http://en.wikipedia.org/wiki/Vandermonde_determinant">Vandermonde determinant</a>. We now give an alternate proof of this result (omitting the exact value of the normalising constant ${\frac{1}{(2\pi)^{n/2} 1! \ldots (n-1)!}}$) that exploits unitary invariance and the change of variables formula (the latter of which we shall do from first principles). The one thing to be careful about is that one has to somehow quotient out by the invariances of the problem before being able to apply the change of variables formula.
<p>
One approach here would be to artificially “fix a gauge” and work on some slice of the parameter space which is “transverse” to all the symmetries. With such an approach, one can use the classical change of variables formula. While this can certainly be done, we shall adopt a more “gauge-invariant” approach and carry the various invariances with us throughout the computation. (For a comparison of the two approaches, see <a href="https://terrytao.wordpress.com/2008/09/27/what-is-a-gauge/">this previous blog post</a>.)
<p>
We turn to the details. Let ${V_n}$ be the space of Hermitian ${n \times n}$ matrices, then the distribution ${\mu_{M_n}}$ of a GUE matrix ${M_n}$ is an absolutely continuous probability measure on ${V_n}$, which can be written using the definition of GUE as <p align=center>$\displaystyle \mu_{M_n} = C_n (\prod_{1 \leq i < j \leq n} e^{-|\xi_{ij}|^2}) (\prod_{1 \leq i \leq n} e^{-|\xi_{ii}|^2/2})\ dM_n$</p>
where ${dM_n}$ is Lebesgue measure on ${V_n}$, ${\xi_{ij}}$ are the coordinates of ${M_n}$, and ${C_n}$ is a normalisation constant (the exact value of which depends on how one normalises Lebesgue measure on ${V_n}$). We can express this more compactly as <p align=center>$\displaystyle \mu_{M_n} = C_n e^{- \hbox{tr}(M_n^2) / 2}\ dM_n.$</p>
Expressed this way, it is clear that the GUE ensemble is invariant under conjugations ${M_n \mapsto U M_n U^{-1}}$ by any unitary matrix.
<p>
Let ${D}$ be the diagonal matrix whose entries ${\lambda_1 \geq \ldots \geq \lambda_n}$ are the eigenvalues of ${M_n}$ in descending order. Then we have ${M_n = U D U^{-1}}$ for some unitary matrix ${U \in U(n)}$. The matrix ${U}$ is not uniquely determined; if ${R}$ is a diagonal unitary matrix, then ${R}$ commutes with ${D}$, and so one can freely replace ${U}$ with ${UR}$. On the other hand, if the eigenvalues of ${M_n}$ are simple, then the diagonal matrices are the <em>only</em> matrices that commute with ${D}$, and so this freedom to right-multiply ${U}$ by diagonal unitaries is the only failure of uniqueness here. And in any case, from the unitary invariance of GUE, we see that even after conditioning on ${D}$, we may assume without loss of generality that ${U}$ is drawn from the invariant Haar measure on ${U(n)}$. In particular, ${U}$ and ${D}$ can be taken to be independent.
<p>
Fix a diagonal matrix ${D_0 = \hbox{diag}(\lambda_1^0,\ldots,\lambda_n^0)}$ for some ${\lambda_1^0 > \ldots > \lambda_n^0}$, let ${\varepsilon > 0}$ be extremely small, and let us compute the probability <a name="dprop"><p align=center>$\displaystyle \mathop{\bf P}( \| M_n - D_0 \|_F \leq \varepsilon ) \ \ \ \ \ (2)$</p>
</a> that ${M_n}$ lies within ${\varepsilon}$ of ${D_0}$ in the Frobenius norm. On the one hand, the probability density of ${M_n}$ is proportional to <p align=center>$\displaystyle e^{-\hbox{tr}(D_0^2)/2} = e^{-|\lambda^0|^2/2}$</p>
near ${D_0}$ (where we write ${\lambda^0 := (\lambda_1^0,\ldots,\lambda_n^0)}$) and the volume of a ball of radius ${\varepsilon}$ in the ${n^2}$-dimensional space ${V_n}$ is proportional to ${\varepsilon^{n^2}}$, so <a href="#dprop">(2)</a> is equal to <a name="ciprian"><p align=center>$\displaystyle (C'_n + o(1)) \varepsilon^{n^2} e^{-\hbox{tr}(D_0^2)/2} \ \ \ \ \ (3)$</p>
</a> for some constant ${C'_n > 0}$ depending only on ${n}$, where ${o(1)}$ goes to zero as ${\varepsilon \rightarrow 0}$ (keeping ${n}$ and ${D_0}$ fixed). On the other hand, if ${\|M_n - D_0\|_F \leq \varepsilon}$, then by the Weyl inequality (or Hoffman-Wielandt inequality) we have ${D = D_0 + O(\varepsilon)}$ (we allow implied constants here to depend on ${n}$ and on ${D_0}$). This implies ${UDU^{-1} = D + O(\varepsilon)}$, thus ${UD - DU = O(\varepsilon)}$. As a consequence we see that the off-diagonal elements of ${U}$ are of size ${O(\varepsilon)}$. We can thus use the inverse function theorem in this local region of parameter space and make the ansatz <p align=center>$\displaystyle D = D_0 + \varepsilon E; \quad U = \exp( \varepsilon S ) R$</p>
where ${E}$ is a bounded diagonal matrix, ${R}$ is a diagonal unitary matrix, and ${S}$ is a bounded skew-adjoint matrix with zero diagonal. (Note here the emergence of the freedom to right-multiply ${U}$ by diagonal unitaries.) Note that the map ${(R,S) \mapsto \exp( \varepsilon S ) R}$ has a non-degenerate Jacobian, so the inverse function theorem applies to uniquely specify ${R, S}$ (and thus ${E}$) from ${U, D}$ in this local region of parameter space.
<p>
Conversely, if ${D, U}$ take the above form, then we can Taylor expand and conclude that <p align=center>$\displaystyle M_n = UDU^* = D_0 + \varepsilon E + \varepsilon (SD_0 - D_0S) + O(\varepsilon^2)$</p>
and so <p align=center>$\displaystyle \| M_n - D_0 \|_F = \varepsilon \| E + (SD_0 - D_0S)\|_F + O(\varepsilon^2).$</p>
We can thus bound <a href="#dprop">(2)</a> from above and below by expressions of the form <a name="esd"><p align=center>$\displaystyle \mathop{\bf P}( \| E + (SD_0 - D_0S)\|_F \leq 1 + O(\varepsilon) ). \ \ \ \ \ (4)$</p>
</a> As ${U}$ is distributed using Haar measure on ${U(n)}$, ${S}$ is (locally) distributed using ${(1+o(1)) \varepsilon^{n^2-n}}$ times a constant multiple of Lebesgue measure on the space ${W}$ of skew-adjoint matrices with zero diagonal, which has dimension ${n^2-n}$. Meanwhile, ${E}$ is distributed using ${(\rho_n(\lambda^0)+o(1)) \varepsilon^{n}}$ times Lebesgue measure on the space of diagonal elements. Thus we can rewrite <a href="#esd">(4)</a> as <p align=center>$\displaystyle C''_n \varepsilon^{n^2} (\rho_n(\lambda^0)+o(1)) \int\int_{\| E + (SD_0 - D_0S)\|_F \leq 1 + O(\varepsilon)}\ dE dS$</p>
where ${dE}$ and ${dS}$ denote Lebesgue measure and ${C''_n > 0}$ depends only on ${n}$.
<p>
Observe that the map ${S \mapsto SD_0-D_0 S}$ dilates the (complex-valued) ${ij}$ entry of ${S}$ by ${\lambda^0_j-\lambda^0_i}$, and so the Jacobian of this map is ${\prod_{1 \leq i < j \leq n} |\lambda^0_j-\lambda^0_i|^2 = |\Delta_n(\lambda^0)|^2}$. Applying the change of variables, we can express the above as <p align=center>$\displaystyle C''_n \varepsilon^{n^2} \frac{\rho_n(\lambda^0)+o(1)}{|\Delta_n(\lambda^0)|^2} \int\int_{\| E + S\|_F \leq 1 + O(\varepsilon)}\ dE dS.$</p>
The integral here is of the form ${C'''_n + O(\varepsilon)}$ for some other constant ${C'''_n > 0}$. Comparing this formula with <a href="#ciprian">(3)</a> we see that <p align=center>$\displaystyle \rho_n(\lambda^0)+o(1) = C''''_n e^{-|\lambda^0|^2/2} |\Delta_n(\lambda^0)|^2 + o(1)$</p>
for yet another constant ${C''''_n > 0}$. Sending ${\varepsilon \rightarrow 0}$ we recover the exact formula <p align=center>$\displaystyle \rho_n(\lambda) = C''''_n e^{-|\lambda|^2/2} |\Delta_n(\lambda)|^2$</p>
when ${\lambda}$ is simple. Since almost all Hermitian matrices have simple spectrum (see Exercise 10 of <a href="https://terrytao.wordpress.com/2010/01/12/254a-notes-3a-eigenvalues-and-sums-of-hermitian-matrices/">Notes 3a</a>), this gives the full spectral distribution of GUE, except for the issue of the unspecified constant.
<p>
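The key Jacobian computation above, namely that ${S \mapsto SD_0 - D_0S}$ has Jacobian ${|\Delta_n(\lambda^0)|^2}$ in real coordinates, can be confirmed numerically (an illustrative Python/numpy sketch for ${n=3}$, not part of the original text):

```python
import numpy as np
from itertools import combinations

lam = np.array([0.5, 1.5, 3.0])            # a sample simple spectrum lambda^0
n = len(lam)
D0 = np.diag(lam)
pairs = list(combinations(range(n), 2))
dim = 2 * len(pairs)                       # real dimension n^2 - n

def to_matrix(c):
    # skew-Hermitian matrix with zero diagonal from real coordinates
    S = np.zeros((n, n), dtype=complex)
    for idx, (i, j) in enumerate(pairs):
        S[i, j] = c[2 * idx] + 1j * c[2 * idx + 1]
        S[j, i] = -np.conj(S[i, j])
    return S

def coords(M):
    # real coordinates of the (Hermitian, zero-diagonal) image
    return np.concatenate([[M[i, j].real, M[i, j].imag] for (i, j) in pairs])

# matrix of the linear map S -> S D0 - D0 S in these real coordinates
J = np.column_stack([coords(to_matrix(e) @ D0 - D0 @ to_matrix(e))
                     for e in np.eye(dim)])
jac = abs(np.linalg.det(J))
target = np.prod([(lam[j] - lam[i]) ** 2 for (i, j) in pairs])  # |Delta_n(lambda^0)|^2
print(np.isclose(jac, target))  # True
```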

<blockquote><b>Remark 1</b> In principle, this method should also recover the explicit normalising constant ${\frac{1}{(2\pi)^{n/2} 1! \ldots (n-1)!}}$ in <a href="#rhod">(1)</a>, but to do this it appears one needs to understand the volume of the fundamental domain of ${U(n)}$ with respect to the logarithm map, or equivalently to understand the volume of the unit ball of Hermitian matrices in the operator norm. I do not know of a simple way to compute this quantity (though it can be inferred from <a href="#rhod">(1)</a> and the above analysis). One can also recover the normalising constant through the machinery of determinantal processes, see below. </blockquote>

<p>

<p>

<blockquote><b>Remark 2</b> <a name="daft"></a> The above computation can be generalised to other ${U(n)}$-conjugation-invariant ensembles ${M_n}$ whose probability distribution is of the form <p align=center>$\displaystyle \mu_{M_n} = C_n e^{- \hbox{tr} V(M_n)}\ dM_n$</p>
for some potential function ${V: {\bf R} \rightarrow {\bf R}}$ (where we use the spectral theorem to define ${V(M_n)}$), yielding a density function for the spectrum of the form <p align=center>$\displaystyle \rho_n(\lambda) = C'_n e^{- \sum_{j=1}^n V(\lambda_j)} |\Delta_n(\lambda)|^2.$</p>
Given suitable regularity conditions on ${V}$, one can then generalise many of the arguments in these notes to such ensembles. See <a href="http://www.ams.org/mathscinet-getitem?mr=1677884">this book by Deift</a> for details. </blockquote>

<p>

<p>

<p align=center><b> &mdash; 2. The spectrum of gaussian matrices &mdash; </b></p>

<p>
The above method also works for gaussian matrices ${G}$, as was first observed by Dyson (though the final formula was first obtained by Ginibre, using a different method). Here, the density function is given by <a name="can"><p align=center>$\displaystyle C_n e^{-\hbox{tr}( GG^*)} dG = C_n e^{-\|G\|_F^2} dG \ \ \ \ \ (5)$</p>
</a> where ${C_n > 0}$ is a constant and ${dG}$ is Lebesgue measure on the space ${M_n({\bf C})}$ of all complex ${n \times n}$ matrices. This is invariant under both left and right multiplication by unitary matrices, so in particular is invariant under unitary conjugations as before.
<p>
This matrix ${G}$ has ${n}$ complex (generalised) eigenvalues ${\sigma(G) = \{\lambda_1,\ldots,\lambda_n\}}$, which are usually distinct:
<p>

<blockquote><b>Exercise 3</b> Let ${n \geq 2}$. Show that the space of matrices in ${M_n({\bf C})}$ with a repeated eigenvalue has codimension ${2}$. </blockquote>

<p>

<p>
Unlike the Hermitian situation, though, there is no natural way to order these ${n}$ complex eigenvalues. We will thus consider all ${n!}$ possible permutations at once, and define the spectral density function ${\rho_n(\lambda_1,\ldots,\lambda_n)}$ of ${G}$ by duality and the formula <p align=center>$\displaystyle \int_{{\bf C}^n} F(\lambda) \rho_n(\lambda)\ d\lambda := \mathop{\bf E} \sum_{\{\lambda_1,\ldots,\lambda_n\} = \sigma(G)} F(\lambda_1,\ldots,\lambda_n)$</p>
for all test functions ${F}$. By the Riesz representation theorem, this uniquely defines ${\rho_n}$ (as a distribution, at least), although the total mass of ${\rho_n}$ is ${n!}$ rather than ${1}$ due to the ambiguity in the spectrum.
<p>
Now we compute ${\rho_n}$ (up to constants). In the Hermitian case, the key was to use the factorisation ${M_n = UDU^{-1}}$. This particular factorisation is of course unavailable in the non-Hermitian case. However, if the non-Hermitian matrix ${G}$ has simple spectrum, it can always be factored instead as ${G = UTU^{-1}}$, where ${U}$ is unitary and ${T}$ is upper triangular. Indeed, if one applies the <a href="http://en.wikipedia.org/wiki/Gram%E2%80%93Schmidt_process">Gram-Schmidt process</a> to the eigenvectors of ${G}$ and uses the resulting orthonormal basis to form ${U}$, one easily verifies the desired factorisation. Note that the eigenvalues of ${G}$ are the same as those of ${T}$, which in turn are just the diagonal entries of ${T}$.
<p>

<blockquote><b>Exercise 4</b> Show that this factorisation is also available when there are repeated eigenvalues. (Hint: use the <a href="http://en.wikipedia.org/wiki/Jordan_normal_form">Jordan normal form</a>.) </blockquote>

<p>

<p>
To use this factorisation, we first have to understand how unique it is, at least in the generic case when there are no repeated eigenvalues. As noted above, if ${G = UTU^{-1}}$, then the diagonal entries of ${T}$ form the same set as the eigenvalues of ${G}$.
<p>
Now suppose we fix the diagonal ${\lambda_1,\ldots,\lambda_n}$ of ${T}$, which amounts to picking an ordering of the ${n}$ eigenvalues of ${G}$. The eigenvalues of ${T}$ are ${\lambda_1,\ldots,\lambda_n}$, and furthermore for each ${1 \leq j \leq n}$, the eigenvector of ${T}$ associated to ${\lambda_j}$ lies in the span of the last ${n-j+1}$ basis vectors ${e_j,\ldots,e_n}$ of ${{\bf C}^n}$, with a non-zero ${e_j}$ coefficient (as can be seen by Gaussian elimination or Cramer’s rule). As ${G=UTU^{-1}}$ with ${U}$ unitary, we conclude that for each ${1 \leq j \leq n}$, the ${j^{th}}$ column of ${U}$ lies in the span of the eigenvectors associated to ${\lambda_j,\ldots,\lambda_n}$. As these columns are orthonormal, they must thus arise from applying the Gram-Schmidt process to these eigenvectors (as discussed earlier). This argument also shows that once the diagonal entries ${\lambda_1,\ldots,\lambda_n}$ of ${T}$ are fixed, each column of ${U}$ is determined up to rotation by a unit phase. In other words, the only remaining freedom is to replace ${U}$ by ${UR}$ for some unit diagonal matrix ${R}$, and then to replace ${T}$ by ${R^{-1}TR}$ to counterbalance this change of ${U}$.
<p>
To summarise, the factorisation ${G = UTU^{-1}}$ is unique up to specifying an enumeration ${\lambda_1,\ldots,\lambda_n}$ of the eigenvalues of ${G}$ and right-multiplying ${U}$ by diagonal unitary matrices, and then conjugating ${T}$ by the same matrix. Given a matrix ${G}$, we may apply these symmetries randomly, ending up with a random enumeration ${\lambda_1,\ldots,\lambda_n}$ of the eigenvalues of ${G}$ (whose distribution is invariant with respect to permutations) together with a random factorisation ${UTU^{-1}}$ such that ${T}$ has diagonal entries ${\lambda_1,\ldots,\lambda_n}$ in that order, and the distribution of ${T}$ is invariant under conjugation by diagonal unitary matrices. Also, since ${G}$ is itself invariant under unitary conjugations, we may also assume that ${U}$ is distributed uniformly according to the Haar measure of ${U(n)}$, and independently of ${T}$.
<p>
To summarise, the gaussian matrix ensemble ${G}$, together with a randomly chosen enumeration ${\lambda_1,\ldots,\lambda_n}$ of the eigenvalues, can almost surely be factorised as ${UTU^{-1}}$, where ${T = ( t_{ij} )_{1 \leq i \leq j \leq n}}$ is an upper-triangular matrix with diagonal entries ${\lambda_1,\ldots,\lambda_n}$, distributed according to some distribution <p align=center>$\displaystyle \psi( (t_{ij})_{1 \leq i \leq j \leq n} )\ \prod_{1 \leq i \leq j \leq n} dt_{ij}$</p>
which is invariant with respect to conjugating ${T}$ by diagonal unitary matrices, and ${U}$ is uniformly distributed according to the Haar measure of ${U(n)}$, independently of ${T}$.
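The factorisation just described is (essentially) the Schur decomposition, and the Gram-Schmidt construction can be carried out numerically via a QR factorisation of the eigenvector matrix (an illustrative Python/numpy sketch, not part of the original text):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5
# a gaussian matrix: iid complex N(0,1) entries
G = (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))) / np.sqrt(2)

evals, V = np.linalg.eig(G)    # columns of V are eigenvectors of G
Q, R = np.linalg.qr(V)         # Gram-Schmidt on the eigenvectors
T = Q.conj().T @ G @ Q         # then G = Q T Q^{-1}

# T is upper triangular, with the eigenvalues (in the chosen order) on its diagonal
print(np.allclose(np.tril(T, -1), 0, atol=1e-8))  # True
print(np.allclose(np.diag(T), evals))             # True
```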
<p>
Now let ${T_0 = ( t_{ij}^0 )_{1 \leq i \leq j \leq n}}$ be an upper triangular matrix with complex entries whose diagonal entries ${t_{11}^0,\ldots,t_{nn}^0 \in {\bf C}}$ are distinct. As in the previous section, we consider the probability <a name="gatf"><p align=center>$\displaystyle \mathop{\bf P}( \| G - T_0 \|_F \leq \varepsilon ) . \ \ \ \ \ (6)$</p>
</a> On the one hand, since the space ${M_n({\bf C})}$ of complex ${n \times n}$ matrices has ${2n^2}$ real dimensions, we see from <a href="#can">(5)</a> that this expression is equal to <a name="gatf2"><p align=center>$\displaystyle (C'_n+o(1)) e^{-\|T_0\|_F^2} \varepsilon^{2n^2} \ \ \ \ \ (7)$</p>
</a> for some constant ${C'_n > 0}$.
<p>
Now we compute <a href="#gatf">(6)</a> using the factorisation ${G = UTU^{-1}}$. Suppose that ${\|G-T_0\|_F \leq \varepsilon}$, so ${G = T_0+O(\varepsilon)}$. As the eigenvalues of ${T_0}$ are ${t_{11}^0,\ldots,t_{nn}^0}$, which are assumed to be distinct, we see (from the inverse function theorem) that for ${\varepsilon}$ small enough, ${G}$ has eigenvalues ${t_{11}^0 +O(\varepsilon), \ldots, t_{nn}^0 + O(\varepsilon)}$. With probability ${1/n!}$, the diagonal entries of ${T}$ are thus ${t_{11}^0 +O(\varepsilon), \ldots, t_{nn}^0 + O(\varepsilon)}$ (in that order). We now restrict to this event (the ${1/n!}$ factor will eventually be absorbed into one of the unspecified constants).
<p>
Let ${u^0_1,\ldots,u^0_n}$ be eigenvectors of ${T_0}$ associated to ${t_{11}^0,\ldots,t_{nn}^0}$; then the Gram-Schmidt process applied to ${u^0_1,\ldots,u^0_n}$ (starting at ${u^0_n}$ and working backwards to ${u^0_1}$) gives the standard basis ${e_1,\ldots,e_n}$ (in reverse order). By the inverse function theorem, we thus see that we have eigenvectors ${u_1 = u^0_1+O(\varepsilon),\ldots,u_n=u^0_n+O(\varepsilon)}$ of ${G}$, which when the Gram-Schmidt process is applied, gives a perturbation ${e_1+O(\varepsilon),\ldots,e_n+O(\varepsilon)}$ in reverse order. This gives a factorisation ${G=UTU^{-1}}$ in which ${U = I + O(\varepsilon)}$, and hence ${T = T_0+O(\varepsilon)}$. This is however not the most general factorisation available, even after fixing the diagonal entries of ${T}$, due to the freedom to right-multiply ${U}$ by diagonal unitary matrices ${R}$. We thus see that the correct ansatz here is to have <p align=center>$\displaystyle U = R + O(\varepsilon); \quad T = R^{-1} T_0 R + O(\varepsilon)$</p>
for some diagonal unitary matrix ${R}$.
<p>
In analogy with the GUE case, we can use the inverse function theorem and make the more precise ansatz <p align=center>$\displaystyle U = \exp(\varepsilon S) R; \quad T = R^{-1} (T_0 + \varepsilon E) R$</p>
where ${S}$ is skew-Hermitian with zero diagonal and size ${O(1)}$, ${R}$ is diagonal unitary, and ${E}$ is an upper triangular matrix of size ${O(1)}$. From the invariance ${U \mapsto UR; T \mapsto R^{-1} TR}$ we see that ${R}$ is distributed uniformly across all diagonal unitaries. Meanwhile, from the unitary conjugation invariance, ${S}$ is distributed according to a constant multiple of ${\varepsilon^{n^2-n}}$ times Lebesgue measure ${dS}$ on the ${n^2-n}$-dimensional space of skew Hermitian matrices with zero diagonal; and from the definition of ${\psi}$, ${E}$ is distributed according to a constant multiple of the measure <p align=center>$\displaystyle (1+o(1)) \varepsilon^{n^2+n} \psi(T_0)\ dE,$</p>
where ${dE}$ is Lebesgue measure on the ${n^2+n}$-dimensional space of upper-triangular matrices. Furthermore, the invariances ensure that the random variables ${S, R, E}$ are distributed independently. Finally, we have <p align=center>$\displaystyle G = UTU^{-1} = \exp(\varepsilon S) (T_0 + \varepsilon E) \exp(-\varepsilon S).$</p>
Thus we may rewrite <a href=”#gatf”>(6)</a> as <a name=”gatf-2″><p align=center>$\displaystyle (C''_n \psi(T_0)+o(1))\varepsilon^{2n^2} \int\int_{\|\exp(\varepsilon S) (T_0 + \varepsilon E) \exp(-\varepsilon S) - T_0\|_F \leq \varepsilon} dS dE \ \ \ \ \ (8)$</p>
</a> for some ${C''_n>0}$ (the ${R}$ integration being absorbable into this constant ${C''_n}$). We can Taylor expand <p align=center>$\displaystyle \exp(\varepsilon S) (T_0 + \varepsilon E) \exp(-\varepsilon S) = T_0 + \varepsilon (E + ST_0 - T_0S) + O(\varepsilon^2)$</p>
and so we can bound <a href=”#gatf-2″>(8)</a> above and below by expressions of the form <p align=center>$\displaystyle (C''_n \psi(T_0)+o(1))\varepsilon^{2n^2} \int\int_{\|E + ST_0 - T_0 S\|_F \leq 1+O(\varepsilon)} dS dE.$</p>
The Lebesgue measure ${dE}$ is invariant under translations by upper triangular matrices, so we may rewrite the above expression as <a name=”can”><p align=center>$\displaystyle (C''_n \psi(T_0)+o(1))\varepsilon^{2n^2} \int\int_{\|E + \pi(ST_0 - T_0 S)\|_F \leq 1+O(\varepsilon)} dS dE, \ \ \ \ \ (9)$</p>
</a> where ${\pi(ST_0 - T_0 S)}$ is the strictly lower triangular component of ${ST_0-T_0S}$.
<p>
The next step is to make the (linear) change of variables ${V := \pi(ST_0-T_0 S)}$. We check dimensions: ${S}$ ranges in the space of skew-Hermitian matrices with zero diagonal, which has complex dimension ${(n^2-n)/2}$, as does the space of strictly lower-triangular matrices, which is where ${V}$ ranges. So we can in principle make this change of variables, but we first have to compute the Jacobian of the transformation (and check that it is non-zero). For this, we switch to coordinates. Write ${S = (s_{ij})_{1 \leq i, j \leq n}}$ and ${V = (v_{ij})_{1 \leq j < i \leq n}}$. In coordinates, the equation ${V=\pi(ST_0-T_0S)}$ becomes <p align=center>$\displaystyle v_{ij} = \sum_{k=1}^j s_{ik} t_{kj}^0 - \sum_{k=i}^n t_{ik}^0 s_{kj}$</p>
or equivalently <p align=center>$\displaystyle v_{ij} = (t_{jj}^0-t_{ii}^0) s_{ij} + \sum_{k=1}^{j-1} t_{kj}^0 s_{ik}- \sum_{k=i+1}^n t_{ik}^0 s_{kj}.$</p>
Thus for instance <p align=center>$\displaystyle v_{n1} = (t^0_{11}-t^0_{nn}) s_{n1}$</p>
<p align=center>$\displaystyle v_{n2} = (t^0_{22}-t^0_{nn}) s_{n2} + t_{12}^0 s_{n1}$</p>
<p align=center>$\displaystyle v_{(n-1)1} = (t^0_{11} - t^0_{(n-1)(n-1)}) s_{(n-1)1} - t_{(n-1)n}^0 s_{n1}$</p>
<p align=center>$\displaystyle v_{n3} = (t^0_{33}-t^0_{nn}) s_{n3} + t_{13}^0 s_{n1} + t_{23}^0 s_{n2}$</p>
<p align=center>$\displaystyle v_{(n-1)2} = (t^0_{22}-t^0_{(n-1)(n-1)}) s_{(n-1)2} +t_{12}^0 s_{(n-1)1} - t_{(n-1)n}^0 s_{n2}$</p>
<p align=center>$\displaystyle v_{(n-2)1} = (t^0_{11}-t^0_{(n-2)(n-2)}) s_{(n-2)1} - t_{(n-2)(n-1)}^0 s_{(n-1)1} - t_{(n-2)n}^0 s_{n1}$</p>
etc. We then observe that the transformation matrix from ${s_{n1}, s_{n2}, s_{(n-1)1},\ldots}$ to ${v_{n1}, v_{n2}, v_{(n-1)1}, \ldots}$ is triangular, with diagonal entries given by ${t^0_{jj}-t^0_{ii}}$ for ${1 \leq j < i \leq n}$. The Jacobian of the (complex-linear) map ${S \mapsto V}$ is thus given by <p align=center>$\displaystyle |\prod_{1 \leq j < i \leq n} (t^0_{jj}-t^0_{ii})|^2 = |\Delta( t^0_{11},\ldots,t^0_{nn})|^2$</p>
which is non-zero by the hypothesis that the ${t^0_{11},\ldots,t^0_{nn}}$ are distinct. We may thus rewrite <a href=”#can”>(9)</a> as <p align=center>$\displaystyle \frac{C''_n \psi(T_0)+o(1)}{|\Delta( t^0_{11},\ldots,t^0_{nn})|^2} \varepsilon^{2n^2}\int\int_{\|E + V\|_F \leq 1+O(\varepsilon)} dS dV$</p>
where ${dV}$ is Lebesgue measure on strictly lower-triangular matrices. The integral here is equal to ${C'''_n + O(\varepsilon)}$ for some constant ${C'''_n}$. Comparing this with <a href=”#gatf”>(6)</a>, cancelling the factor of ${\varepsilon^{2n^2}}$, and sending ${\varepsilon \rightarrow 0}$, we obtain the formula <p align=center>$\displaystyle \psi( (t^0_{ij})_{1 \leq i \leq j \leq n} ) = C''''_n |\Delta( t^0_{11},\ldots,t^0_{nn})|^2 e^{-\|T_0\|_F^2}$</p>
for some constant ${C''''_n > 0}$. We can expand <p align=center>$\displaystyle e^{-\|T_0\|_F^2} = \prod_{1 \leq i \leq j \leq n} e^{-|t_{ij}^0|^2}.$</p>
If we integrate out the off-diagonal variables ${t^0_{ij}}$ for ${1 \leq i < j \leq n}$, we see that the density function for the diagonal entries ${(\lambda_1,\ldots,\lambda_n)}$ of ${T}$ is proportional to <p align=center>$\displaystyle |\Delta(\lambda_1,\ldots,\lambda_n)|^2 e^{-\sum_{j=1}^n |\lambda_j|^2}.$</p>
Since these entries are a random permutation of the eigenvalues of ${G}$, we conclude the <em>Ginibre formula</em> <a name=”ginibre-gaussian”><p align=center>$\displaystyle \rho_n(\lambda_1,\ldots,\lambda_n) = c_n |\Delta(\lambda_1,\ldots,\lambda_n)|^2 e^{-\sum_{j=1}^n |\lambda_j|^2} \ \ \ \ \ (10)$</p>
</a> for the joint density of the eigenvalues of a gaussian random matrix, where ${c_n > 0}$ is a constant.
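As a quick numerical sanity check (not part of the argument above), one can realise the complex-linear map ${S \mapsto \pi(ST_0-T_0S)}$ in coordinates for a small ${n}$ and compare the modulus of its determinant against ${|\Delta(t^0_{11},\ldots,t^0_{nn})|}$; the real Jacobian is then the square of this common value. A numpy sketch, with the function name and the random test matrix our own:

```python
import numpy as np

def jacobian_vs_vandermonde(n=5, seed=0):
    rng = np.random.default_rng(seed)
    # random upper-triangular T0 with (almost surely) distinct diagonal entries
    T0 = np.triu(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
    lower = [(i, j) for i in range(n) for j in range(i)]  # strictly lower index pairs
    m = len(lower)  # (n^2 - n)/2 complex coordinates
    M = np.zeros((m, m), dtype=complex)
    for col, (i, j) in enumerate(lower):
        S = np.zeros((n, n), dtype=complex)
        S[i, j] = 1.0   # one lower entry switched on
        S[j, i] = -1.0  # skew-Hermitian completion (does not affect the lower part)
        V = np.tril(S @ T0 - T0 @ S, -1)  # pi(S T0 - T0 S)
        M[:, col] = [V[a, b] for (a, b) in lower]
    jac = abs(np.linalg.det(M))  # |det| of the complex-linear map S -> V
    vand = np.prod([abs(T0[j, j] - T0[i, i]) for (i, j) in lower])
    return jac, vand

jac, vand = jacobian_vs_vandermonde()
print(jac, vand)  # the two values agree
```

Note that the strictly lower part of ${ST_0 - T_0S}$ only involves the strictly lower entries of ${S}$ (since ${T_0}$ is upper triangular), which is why the map can be treated as complex-linear in those coordinates.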
<p>

<blockquote><b>Remark 5</b> Given that <a href=”#rhod”>(1)</a> can be derived using Dyson Brownian motion, it is natural to ask whether <a href=”#ginibre-gaussian”>(10)</a> can be derived by a similar method. It seems that in order to do this, one needs to consider a Dyson-like process not just on the eigenvalues ${\lambda_1,\ldots,\lambda_n}$, but on the entire triangular matrix ${T}$ (or more precisely, on the moduli space formed by quotienting out the action of conjugation by unitary diagonal matrices). Unfortunately the computations seem to get somewhat complicated, and we do not present them here. </blockquote>

<p>

<p>

<p align=center><b> &mdash; 3. Mean field approximation &mdash; </b></p>

<p>
We can use the formula <a href=”#rhod”>(1)</a> for the joint distribution to heuristically derive the semicircular law, as follows.
<p>
It is intuitively plausible that the spectrum ${(\lambda_1,\ldots,\lambda_n)}$ should concentrate in regions in which ${\rho_n(\lambda_1,\ldots,\lambda_n)}$ is as large as possible. So it is now natural to ask how to optimise this function. Note that the expression in <a href=”#rhod”>(1)</a> is non-negative, and vanishes whenever two of the ${\lambda_i}$ collide, or when one or more of the ${\lambda_i}$ go off to infinity, so a maximum should exist away from these degenerate situations.
<p>
We may take logarithms and write <a name=”rhond”><p align=center>$\displaystyle - \log \rho_n(\lambda_1,\ldots,\lambda_n) = \sum_{j=1}^n \frac{1}{2} |\lambda_j|^2 + \sum\sum_{i \neq j} \log \frac{1}{|\lambda_i-\lambda_j|} + C \ \ \ \ \ (11)$</p>
</a> where ${C = C_n}$ is a constant whose exact value is not of importance to us. From a mathematical physics perspective, one can interpret <a href=”#rhond”>(11)</a> as a Hamiltonian for ${n}$ particles at positions ${\lambda_1,\ldots,\lambda_n}$, subject to a confining harmonic potential (these are the ${\frac{1}{2} |\lambda_j|^2}$ terms) and a repulsive logarithmic potential between particles (these are the ${\log \frac{1}{|\lambda_i-\lambda_j|}}$ terms).
<p>
Our objective is now to find a distribution of ${\lambda_1,\ldots,\lambda_n}$ that minimises this expression.
<p>
We know from previous notes that the ${\lambda_i}$ should have magnitude ${O(\sqrt{n})}$. Let us then heuristically make a <a href=”http://en.wikipedia.org/wiki/Mean_field_theory”>mean field approximation</a>, in that we approximate the discrete spectral measure ${\frac{1}{n} \sum_{j=1}^n \delta_{\lambda_j/\sqrt{n}}}$ by a continuous probability measure ${\rho(x)\ dx}$. (Secretly, we know from the semi-circular law that we should be able to take ${\rho = \frac{1}{2\pi} (4-x^2)_+^{1/2}}$, but pretend that we do not know this fact yet.) Then we can heuristically approximate <a href=”#rhond”>(11)</a> as <p align=center>$\displaystyle n^2 ( \int_{\bf R} \frac{1}{2} x^2 \rho(x)\ dx + \int_{\bf R} \int_{\bf R} \log \frac{1}{|x-y|} \rho(x) \rho(y)\ dx dy ) + C'_n$</p>
and so we expect the distribution ${\rho}$ to minimise the functional <a name=”rhos”><p align=center>$\displaystyle \int_{\bf R} \frac{1}{2} x^2 \rho(x)\ dx + \int_{\bf R} \int_{\bf R} \log \frac{1}{|x-y|} \rho(x) \rho(y)\ dx dy. \ \ \ \ \ (12)$</p>
</a>
<p>
One can compute the Euler-Lagrange equations of this functional:
<p>

<blockquote><b>Exercise 6</b> Working formally, and assuming that ${\rho}$ is a probability measure that minimises <a href=”#rhos”>(12)</a>, argue that <p align=center>$\displaystyle \frac{1}{2} x^2 + 2 \int_{\bf R} \log \frac{1}{|x-y|} \rho(y)\ dy = C$</p>
for some constant ${C}$ and all ${x}$ in the support of ${\rho}$. For all ${x}$ outside of the support, establish the inequality <p align=center>$\displaystyle \frac{1}{2} x^2 + 2 \int_{\bf R} \log \frac{1}{|x-y|} \rho(y)\ dy \geq C.$</p>
</blockquote>

<p>

<p>
There are various ways we can solve this equation for ${\rho}$; we sketch here a complex-analytic method. Differentiating in ${x}$, we formally obtain <p align=center>$\displaystyle x - 2 p.v. \int_{\bf R} \frac{1}{x-y} \rho(y)\ dy = 0$</p>
on the support of ${\rho}$. But recall that if we let <p align=center>$\displaystyle s(z) := \int_{\bf R} \frac{1}{y-z} \rho(y)\ dy$</p>
be the Stieltjes transform of the probability measure ${\rho(x)\ dx}$, then we have <p align=center>$\displaystyle \hbox{Im}(s(x+i0^+)) = \pi \rho(x)$</p>
and <p align=center>$\displaystyle \hbox{Re}(s(x+i0^+)) = - p.v. \int_{\bf R} \frac{1}{x-y} \rho(y)\ dy.$</p>
We conclude that <p align=center>$\displaystyle (x + 2 \hbox{Re}(s(x+i0^+))) \hbox{Im}(s(x+i0^+)) = 0$</p>
for all ${x}$, which we rearrange as <p align=center>$\displaystyle \hbox{Im}( s^2(x+i0^+) + x s(x+i0^+) ) = 0.$</p>
This makes the function ${f(z) = s^2(z) + z s(z)}$ entire (it is analytic in the upper half-plane, obeys the symmetry ${f(\overline{z}) = \overline{f(z)}}$, and has no jump across the real line). On the other hand, as ${s(z) = \frac{-1+o(1)}{z}}$ as ${z \rightarrow \infty}$, ${f}$ goes to ${-1}$ at infinity. Applying <a href=”http://en.wikipedia.org/wiki/Liouville%27s_theorem_(complex_analysis)”>Liouville’s theorem</a>, we conclude that ${f}$ is constant, thus we have the familiar equation <p align=center>$\displaystyle s^2 + zs = -1$</p>
which can then be solved to obtain the semi-circular law as in <a href=”https://terrytao.wordpress.com/2010/02/02/254a-notes-4-the-semi-circular-law/”>previous notes</a>.
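The semicircular prediction is also easy to test empirically. The following numpy snippet (our own illustration, using the normalisations of these notes) samples one large GUE matrix and compares moments of the rescaled spectrum with the moments ${1}$ and ${2}$ of the semicircular distribution ${\frac{1}{2\pi}(4-x^2)_+^{1/2}\ dx}$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
# GUE: iid complex gaussian upper entries (variance 1), real gaussian diagonal (variance 1)
A = (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))) / np.sqrt(2)
M = np.triu(A, 1)
M = M + M.conj().T + np.diag(rng.normal(size=n))
lam = np.linalg.eigvalsh(M) / np.sqrt(n)  # rescaled eigenvalues, expected in [-2, 2]

print(lam.min(), lam.max())   # approximately -2 and 2
print(np.mean(lam ** 2))      # approximately 1
print(np.mean(lam ** 4))      # approximately 2
```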
<p>

<blockquote><b>Remark 7</b> Recall from <a href=”https://terrytao.wordpress.com/2010/01/18/254a-notes-3b-brownian-motion-and-dyson-brownian-motion/”>Notes 3b</a> that Dyson Brownian motion can be used to derive the formula <a href=”#rhod”>(1)</a>. One can then interpret the Dyson Brownian motion proof of the semi-circular law for GUE in <a href=”https://terrytao.wordpress.com/2010/02/02/254a-notes-4-the-semi-circular-law/”>Notes 4</a> as a rigorous formalisation of the above mean field approximation heuristic argument. </blockquote>

<p>

<p>
One can perform a similar heuristic analysis for the spectral measure ${\mu_G}$ of a random gaussian matrix, giving a description of the limiting density:
<p>

<blockquote><b>Exercise 8</b> Using heuristic arguments similar to those above, argue that ${\mu_G}$ should be close to a continuous probability distribution ${\rho(z)\ dz}$ obeying the equation <p align=center>$\displaystyle |z|^2 + 2\int_{\bf C} \log \frac{1}{|z-w|} \rho(w)\ dw = C$</p>
on the support of ${\rho}$, for some constant ${C}$, with the inequality <a name=”zaw”><p align=center>$\displaystyle |z|^2 + 2\int_{\bf C} \log \frac{1}{|z-w|} \rho(w)\ dw \geq C \ \ \ \ \ (13)$</p>
</a> outside of this support. Using the <a href=”http://en.wikipedia.org/wiki/Newtonian_potential”>Newton potential</a> ${\frac{1}{2\pi} \log |z|}$ for the fundamental solution of the two-dimensional Laplacian ${\partial_x^2 + \partial_y^2}$, conclude (non-rigorously) that ${\rho}$ is equal to ${\frac{1}{\pi}}$ on its support.
<p>
Also argue that ${\rho}$ should be rotationally symmetric. Use <a href=”#zaw”>(13)</a> and Green’s formula to argue why the support of ${\rho}$ should be simply connected, and then conclude (again non-rigorously) the <em>circular law</em> <a name=”circular”><p align=center>$\displaystyle \mu_G \approx \frac{1}{\pi} 1_{|z| \leq 1}\ dz. \ \ \ \ \ (14)$</p>
</a> </blockquote>
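As with the semicircular law, the circular law prediction (14) can be seen numerically. In the numpy sketch below (our own illustration), the rescaled eigenvalues of a single large matrix with iid complex gaussian entries should be approximately uniform on the unit disk, so the fraction of eigenvalues of modulus at most ${r}$ should be close to ${r^2}$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
# iid complex gaussian entries of variance 1 (no Hermitian symmetry imposed)
G = (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))) / np.sqrt(2)
lam = np.linalg.eigvals(G) / np.sqrt(n)

print(np.abs(lam).max())  # approximately 1
# for the uniform measure on the unit disk, P(|z| <= r) = r^2
for r in (0.5, 0.8):
    print(r ** 2, np.mean(np.abs(lam) <= r))
```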

<p>

<p>
We will see more rigorous derivations of the circular law later in these notes, and also in subsequent notes.
<p>

<p align=center><b> &mdash; 4. Determinantal form of the GUE spectral distribution &mdash; </b></p>

<p>
In a previous section, we showed (up to constants) that the density function ${\rho_n(\lambda_1,\ldots,\lambda_n)}$ for the eigenvalues ${\lambda_1 \geq \ldots \geq \lambda_n}$ of GUE was given by the formula <a href=”#rhod”>(1)</a>.
<p>
As is well known, the Vandermonde determinant ${\Delta(\lambda_1,\ldots,\lambda_n)}$ that appears in <a href=”#rhod”>(1)</a> can be expressed up to sign as a determinant of an ${n \times n}$ matrix, namely the matrix ${(\lambda_i^{j-1})_{1 \leq i,j \leq n}}$. Indeed, this determinant is clearly a polynomial of degree ${n(n-1)/2}$ in ${\lambda_1,\ldots,\lambda_n}$ which vanishes whenever two of the ${\lambda_i}$ agree, and the claim then follows from the factor theorem (and inspecting a single coefficient of the Vandermonde determinant, e.g. the ${\prod_{j=1}^n \lambda_j^{j-1}}$ coefficient, to get the sign).
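This standard identity can be spot-checked numerically; here is a short numpy sketch (our own, with arbitrary sample points) using `np.vander`:

```python
import numpy as np

lam = np.array([0.5, -1.0, 2.0, 3.5])
n = len(lam)
V = np.vander(lam, increasing=True)  # the matrix (lambda_i^{j-1})
# with this convention, det V = prod_{i<j} (lambda_j - lambda_i)
prod = np.prod([lam[j] - lam[i] for i in range(n) for j in range(i + 1, n)])
print(np.linalg.det(V), prod)  # equal
```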
<p>
We can square the above fact (or more precisely, multiply the above matrix by its adjoint) and conclude that ${|\Delta(\lambda_1,\ldots,\lambda_n)|^2}$ is the determinant of the matrix <p align=center>$\displaystyle ( \sum_{k=0}^{n-1} \lambda_i^k \lambda_j^k )_{1 \leq i,j \leq n}.$</p>
More generally, if ${P_0(x),\ldots,P_{n-1}(x)}$ are any sequence of polynomials, in which ${P_i(x)}$ has degree ${i}$, then we see from row operations that the determinant of <p align=center>$\displaystyle (P_{j-1}(\lambda_i))_{1 \leq i,j \leq n}$</p>
is a non-zero constant multiple of ${\Delta(\lambda_1,\ldots,\lambda_n)}$ (with the constant depending on the leading coefficients of the ${P_i}$), and so the determinant of <p align=center>$\displaystyle ( \sum_{k=0}^{n-1} P_k(\lambda_i) P_k(\lambda_j) )_{1 \leq i,j \leq n}$</p>
is a non-zero constant multiple of ${|\Delta(\lambda_1,\ldots,\lambda_n)|^2}$. Comparing this with <a href=”#rhod”>(1)</a>, we obtain the formula <p align=center>$\displaystyle \rho_n(\lambda) = C \det( \sum_{k=0}^{n-1} P_k(\lambda_i) e^{-\lambda_i^2/4} P_k(\lambda_j) e^{-\lambda_j^2/4})_{1 \leq i,j \leq n}$</p>
for some non-zero constant ${C}$.
<p>
This formula is valid for any choice of polynomials ${P_i}$ of degree ${i}$. But the formula is particularly useful when we set ${P_i}$ equal to the (normalised) <a href=”http://en.wikipedia.org/wiki/Hermite_polynomials”>Hermite polynomials</a>, defined by applying the Gram-Schmidt process in ${L^2({\bf R})}$ to the functions ${x^i e^{-x^2/4}}$ for ${i=0,\ldots,n-1}$ to yield ${P_i(x) e^{-x^2/4}}$. (Equivalently, the ${P_i}$ are the <a href=”http://en.wikipedia.org/wiki/Orthogonal_polynomials”>orthogonal polynomials</a> associated to the measure ${e^{-x^2/2}\ dx}$.) In that case, the expression <a name=”knxy”><p align=center>$\displaystyle K_n(x,y) := \sum_{k=0}^{n-1} P_k(x) e^{-x^2/4} P_k(y) e^{-y^2/4} \ \ \ \ \ (15)$</p>
</a> becomes the integral kernel of the orthogonal projection operator ${\pi_{V_n}}$ in ${L^2({\bf R})}$ onto the span ${V_n}$ of the ${x^i e^{-x^2/4}}$ for ${i=0,\ldots,n-1}$, thus <p align=center>$\displaystyle \pi_{V_n} f(x) = \int_{\bf R} K_n(x,y) f(y)\ dy$</p>
for all ${f \in L^2({\bf R})}$, and so ${\rho_n(\lambda)}$ is now a constant multiple of <p align=center>$\displaystyle \det( K_n(\lambda_i, \lambda_j) )_{1 \leq i,j \leq n}.$</p>

<p>
The reason for working with orthogonal polynomials is that we have the trace identity <a name=”knxx”><p align=center>$\displaystyle \int_{\bf R} K_n(x,x)\ dx = \hbox{tr}(\pi_{V_n}) = n \ \ \ \ \ (16)$</p>
</a> and the reproducing formula <a name=”knxy2″><p align=center>$\displaystyle K_n(x,y) = \int_{\bf R} K_n(x,z) K_n(z,y)\ dz \ \ \ \ \ (17)$</p>
</a> which reflects the identity ${\pi_{V_n} = \pi_{V_n}^2}$. These two formulae have an important consequence:
<p>

<blockquote><b>Lemma 9 (Determinantal integration formula)</b> Let ${K_n: {\bf R} \times {\bf R} \rightarrow {\bf R}}$ be any symmetric rapidly decreasing function obeying <a href=”#knxx”>(16)</a>, <a href=”#knxy2″>(17)</a>. Then for any ${k \geq 0}$, one has <a name=”city”><p align=center>$\displaystyle \int_{\bf R} \det( K_n(\lambda_i, \lambda_j) )_{1 \leq i,j \leq k+1}\ d\lambda_{k+1} = (n-k) \det( K_n(\lambda_i, \lambda_j) )_{1 \leq i,j \leq k}. \ \ \ \ \ (18)$</p>
</a> </blockquote>

<p>

<p>

<blockquote><b>Remark 10</b> This remarkable identity is part of the beautiful algebraic theory of <em>determinantal processes</em>, which I discuss further in <a href=”https://terrytao.wordpress.com/2009/08/23/determinantal-processes/”>this blog post</a>. </blockquote>

<p>

<p>
<em>Proof:</em> We induct on ${k}$. When ${k=0}$ this is just <a href=”#knxx”>(16)</a>. Now assume that ${k \geq 1}$ and that the claim has already been proven for ${k-1}$. We apply <a href=”http://en.wikipedia.org/wiki/Cofactor_(linear_algebra)”>cofactor expansion</a> to the bottom row of the determinant ${\det( K_n(\lambda_i, \lambda_j) )_{1 \leq i,j \leq k+1}}$. This gives a principal term <a name=”princ”><p align=center>$\displaystyle \det( K_n(\lambda_i, \lambda_j) )_{1 \leq i,j \leq k} K_n(\lambda_{k+1},\lambda_{k+1}) \ \ \ \ \ (19)$</p>
</a> plus a sum of ${k}$ additional terms, the ${l^{th}}$ term of which is of the form <a name=”nonprinc”><p align=center>$\displaystyle (-1)^{k+1-l} K_n(\lambda_l,\lambda_{k+1}) \det( K_n(\lambda_i, \lambda_j) )_{1 \leq i \leq k; 1 \leq j \leq k+1; j \neq l}. \ \ \ \ \ (20)$</p>
</a> Using <a href=”#knxx”>(16)</a>, the principal term <a href=”#princ”>(19)</a> gives a contribution of ${n \det( K_n(\lambda_i, \lambda_j) )_{1 \leq i,j \leq k}}$ to <a href=”#city”>(18)</a>. For each nonprincipal term <a href=”#nonprinc”>(20)</a>, we use the multilinearity of the determinant to absorb the ${K_n(\lambda_l,\lambda_{k+1})}$ term into the ${j=k+1}$ column of the matrix. Using <a href=”#knxy2″>(17)</a>, we thus see that the contribution of <a href=”#nonprinc”>(20)</a> to <a href=”#city”>(18)</a> can be simplified as <p align=center>$\displaystyle (-1)^{k+1-l} \det( ( K_n(\lambda_i, \lambda_j) )_{1 \leq i \leq k; 1 \leq j \leq k; j \neq l}, (K_n(\lambda_i,\lambda_l))_{1 \leq i\leq k} )$</p>
which after column exchanges, simplifies to ${-\det( K_n(\lambda_i, \lambda_j) )_{1 \leq i,j \leq k}}$. The claim follows. $\Box$
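Both the trace identity (16) and the ${k=1}$ case of Lemma 9 can be verified numerically for the Hermite kernel (15), using Gauss quadrature for the weight ${e^{-x^2/2}}$ (supplied by numpy's `hermite_e` module, whose polynomials ${He_k}$ are orthogonal with respect to this weight; the helper names below are ours):

```python
import numpy as np
from numpy.polynomial import hermite_e as He
from math import factorial, pi

def P(k, x):
    # normalised Hermite polynomial P_k = He_k / ((2 pi)^{1/4} sqrt(k!))
    c = np.zeros(k + 1); c[k] = 1.0
    return He.hermeval(x, c) / ((2 * pi) ** 0.25 * factorial(k) ** 0.5)

def K(n, x, y):
    # the kernel (15)
    return sum(P(k, x) * P(k, y) for k in range(n)) * np.exp(-(x * x + y * y) / 4)

n, lam1 = 4, 0.7
x, w = He.hermegauss(30)  # nodes/weights for integrals against e^{-x^2/2}

# trace identity (16): the e^{x^2/2} factor cancels the weight built into K
trace = np.sum(w * K(n, x, x) * np.exp(x * x / 2))

# Lemma 9 with k=1: integrate out lambda_2 from the 2x2 determinant
det2 = K(n, lam1, lam1) * K(n, x, x) - K(n, lam1, x) ** 2
lhs = np.sum(w * det2 * np.exp(x * x / 2))
rhs = (n - 1) * K(n, lam1, lam1)
print(trace, lhs, rhs)  # trace is n; lhs and rhs agree
```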

<p>
In particular, if we iterate the above lemma using the Fubini-Tonelli theorem, we see that <p align=center>$\displaystyle \int_{{\bf R}^n} \det( K_n(\lambda_i, \lambda_j) )_{1 \leq i,j \leq n}\ d\lambda_1 \ldots d\lambda_n = n!.$</p>
On the other hand, if we extend the probability density function ${\rho_n(\lambda_1,\ldots,\lambda_n)}$ symmetrically from the Weyl chamber ${{\bf R}^n_{\geq}}$ to all of ${{\bf R}^n}$, its integral is also ${n!}$. Since ${\det( K_n(\lambda_i, \lambda_j) )_{1 \leq i,j \leq n}}$ is clearly symmetric in the ${\lambda_1,\ldots,\lambda_n}$, we can thus compare constants and conclude the <a href=”http://www.ams.org/mathscinet-getitem?mr=112895″>Gaudin-Mehta formula</a> <p align=center>$\displaystyle \rho_n(\lambda_1,\ldots,\lambda_n) = \det( K_n(\lambda_i, \lambda_j) )_{1 \leq i,j \leq n}.$</p>
More generally, if we define ${\rho_k: {\bf R}^k \rightarrow {\bf R}^+}$ to be the function <a name=”rhok-1″><p align=center>$\displaystyle \rho_k(\lambda_1,\ldots,\lambda_k) = \det( K_n(\lambda_i, \lambda_j) )_{1 \leq i,j \leq k}, \ \ \ \ \ (21)$</p>
</a> then the above formula shows that ${\rho_k}$ is the <em>${k}$-point correlation function</em> for the spectrum, in the sense that <a name=”rhok-2″><p align=center>$\displaystyle \int_{{\bf R}^k} \rho_k(\lambda_1,\ldots,\lambda_k) F(\lambda_1,\ldots,\lambda_k)\ d\lambda_1 \ldots d\lambda_k \ \ \ \ \ (22)$</p>
</a> <p align=center>$\displaystyle = \mathop{\bf E} \sum_{1 \leq i_1, \ldots, i_k \leq n, \hbox{ distinct}} F( \lambda_{i_1}(M_n),\ldots,\lambda_{i_k}(M_n) )$</p>
for any test function ${F: {\bf R}^k \rightarrow {\bf C}}$ supported in the region ${\{ (x_1,\ldots,x_k): x_1 \leq \ldots \leq x_k \}}$.
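To illustrate (22) in the simplest case ${k=1}$ with ${F(x)=x^2}$: both sides should equal ${\mathop{\bf E} \hbox{tr} M_n^2 = n^2}$. The sketch below (our own) computes the left-hand side by quadrature from the kernel and the right-hand side by Monte Carlo over GUE samples, for ${n=2}$:

```python
import numpy as np
from numpy.polynomial import hermite_e as He
from math import factorial, pi

def P(k, x):
    # normalised Hermite polynomial in the conventions of these notes
    c = np.zeros(k + 1); c[k] = 1.0
    return He.hermeval(x, c) / ((2 * pi) ** 0.25 * factorial(k) ** 0.5)

n = 2
x, w = He.hermegauss(20)
# left-hand side: int rho_1(x) x^2 dx with rho_1(x) = K_n(x,x)
lhs = np.sum(w * sum(P(k, x) ** 2 for k in range(n)) * x ** 2)

# right-hand side: E sum_i lambda_i^2, by Monte Carlo over GUE samples
rng = np.random.default_rng(0)
trials = 20000
acc = 0.0
for _ in range(trials):
    a = (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))) / np.sqrt(2)
    M = np.triu(a, 1)
    M = M + M.conj().T + np.diag(rng.normal(size=n))
    acc += np.sum(np.linalg.eigvalsh(M) ** 2)
rhs = acc / trials
print(lhs, rhs)  # both approximately n^2 = 4
```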
<p>
In particular, if we set ${k=1}$, we obtain the explicit formula <p align=center>$\displaystyle \mathop{\bf E} \mu_{M_n} = \frac{1}{n} K_n(x,x)\ dx$</p>
for the expected empirical spectral measure of ${M_n}$. Equivalently after renormalising by ${\sqrt{n}}$, we have <a name=”mun”><p align=center>$\displaystyle \mathop{\bf E} \mu_{M_n/\sqrt{n}} = \frac{1}{n^{1/2}} K_n(\sqrt{n}x,\sqrt{n}x)\ dx. \ \ \ \ \ (23)$</p>
</a>
<p>
It is thus of interest to understand the kernel ${K_n}$ better.
<p>
To do this, we begin by recalling that the functions ${P_i(x) e^{-x^2/4}}$ were obtained from ${x^i e^{-x^2/4}}$ by the Gram-Schmidt process. In particular, each ${P_i(x) e^{-x^2/4}}$ is orthogonal to the ${x^j e^{-x^2/4}}$ for all ${0 \leq j < i}$. This implies that ${x P_i(x) e^{-x^2/4}}$ is orthogonal to ${x^j e^{-x^2/4}}$ for ${0 \leq j < i-1}$. On the other hand, ${xP_i(x)}$ is a polynomial of degree ${i+1}$, so ${x P_i(x) e^{-x^2/4}}$ must lie in the span of ${x^j e^{-x^2/4}}$ for ${0 \leq j \leq i+1}$. Combining the two facts, we see that ${x P_i}$ must be a linear combination of ${P_{i-1}, P_i, P_{i+1}}$, with the ${P_{i+1}}$ coefficient being non-trivial. We rewrite this fact in the form <a name=”pii”><p align=center>$\displaystyle P_{i+1}(x) = (a_i x + b_i) P_i(x) - c_i P_{i-1}(x) \ \ \ \ \ (24)$</p>
</a> for some real numbers ${a_i,b_i,c_i}$ (with ${c_0=0}$). Taking inner products with ${P_{i+1}}$ and ${P_{i-1}}$ we see that <a name=”low”><p align=center>$\displaystyle \int_{\bf R} x P_i(x) P_{i+1}(x) e^{-x^2/2}\ dx = \frac{1}{a_i} \ \ \ \ \ (25)$</p>
</a> and <p align=center>$\displaystyle \int_{\bf R} x P_i(x) P_{i-1}(x) e^{-x^2/2}\ dx = \frac{c_i}{a_i}$</p>
and so (comparing the latter integral with <a href=”#low”>(25)</a>, with ${i}$ replaced by ${i-1}$) <a name=”aii”><p align=center>$\displaystyle c_i := \frac{a_i}{a_{i-1}} \ \ \ \ \ (26)$</p>
</a> (with the convention ${a_{-1}=\infty}$).
<p>
We will continue the computation of ${a_i,b_i,c_i}$ later. For now, we pick two distinct real numbers ${x,y}$ and consider the <a href=”http://en.wikipedia.org/wiki/Wronskian”>Wronskian</a>-type expression <p align=center>$\displaystyle P_{i+1}(x) P_i(y) - P_i(x) P_{i+1}(y).$</p>
Using <a href=”#pii”>(24)</a>, <a href=”#aii”>(26)</a>, we can write this as <p align=center>$\displaystyle a_i(x-y) P_i(x) P_i(y) + \frac{a_i}{a_{i-1}} (P_i(x) P_{i-1}(y) - P_{i-1}(x) P_i(y))$</p>
or in other words <p align=center>$\displaystyle P_i(x) P_i(y) = \frac{P_{i+1}(x) P_i(y) - P_i(x) P_{i+1}(y)}{a_i(x-y)}$</p>
<p align=center>$\displaystyle - \frac{P_i(x) P_{i-1}(y) - P_{i-1}(x) P_i(y)}{a_{i-1}(x-y)}.$</p>
We telescope this and obtain the <em>Christoffel-Darboux formula</em> for the kernel <a href=”#knxy”>(15)</a>: <a name=”darbo”><p align=center>$\displaystyle K_n(x,y) = \frac{P_{n}(x)P_{n-1}(y) - P_{n-1}(x) P_{n}(y)}{a_{n-1}(x-y)} e^{-(x^2+y^2)/4}. \ \ \ \ \ (27)$</p>
</a> Sending ${y \rightarrow x}$ using <a href=”http://en.wikipedia.org/wiki/L%27H%C3%B4pital%27s_rule”>L’Hopital’s rule</a>, we obtain in particular that <a name=”darbo-2″><p align=center>$\displaystyle K_n(x,x) = \frac{1}{a_{n-1}}( P'_{n}(x)P_{n-1}(x) - P'_{n-1}(x) P_{n}(x) ) e^{-x^2/2}. \ \ \ \ \ (28)$</p>
</a>
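The Christoffel-Darboux formula is easy to verify numerically against the defining sum (15); the numpy sketch below (our own, again via the `hermite_e` module) does so at a pair of sample points, using the value ${1/a_{n-1} = \sqrt{n}}$ computed later in this section:

```python
import numpy as np
from numpy.polynomial import hermite_e as He
from math import factorial, pi

def P(k, x):
    # normalised Hermite polynomial, P_k = He_k / ((2 pi)^{1/4} sqrt(k!))
    c = np.zeros(k + 1); c[k] = 1.0
    return He.hermeval(x, c) / ((2 * pi) ** 0.25 * factorial(k) ** 0.5)

def K_sum(n, x, y):
    # the defining sum (15)
    return sum(P(k, x) * P(k, y) for k in range(n)) * np.exp(-(x * x + y * y) / 4)

def K_cd(n, x, y):
    # the Christoffel-Darboux form (27), with 1/a_{n-1} = sqrt(n)
    return (np.sqrt(n) * (P(n, x) * P(n - 1, y) - P(n - 1, x) * P(n, y))
            / (x - y) * np.exp(-(x * x + y * y) / 4))

print(K_sum(8, 0.3, -1.1), K_cd(8, 0.3, -1.1))  # equal
```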
<p>
Inserting this into <a href=”#mun”>(23)</a>, we see that if we want to understand the expected spectral measure of GUE, we should understand the asymptotic behaviour of ${P_n}$ and the associated constants ${a_n}$. For this, we need to exploit the specific properties of the gaussian weight ${e^{-x^2/2}}$. In particular, we have the identity <a name=”gauss”><p align=center>$\displaystyle x e^{-x^2/2} = - \frac{d}{dx} e^{-x^2/2} \ \ \ \ \ (29)$</p>
</a> so upon integrating <a href=”#low”>(25)</a> by parts, we have <p align=center>$\displaystyle \int_{\bf R} (P'_i(x) P_{i+1}(x) + P_i(x) P'_{i+1}(x)) e^{-x^2/2}\ dx = \frac{1}{a_i}.$</p>
As ${P'_i}$ has degree at most ${i-1}$, the first term vanishes by the orthonormal nature of the ${P_i(x) e^{-x^2/4}}$, thus <a name=”ip”><p align=center>$\displaystyle \int_{\bf R} P_i(x) P'_{i+1}(x) e^{-x^2/2}\ dx = \frac{1}{a_i}. \ \ \ \ \ (30)$</p>
</a> To compute this, let us denote the leading coefficient of ${P_i}$ as ${k_i}$. Then ${P'_{i+1}}$ is equal to ${\frac{(i+1)k_{i+1}}{k_i} P_i}$ plus lower-order terms, and so we have <p align=center>$\displaystyle \frac{(i+1)k_{i+1}}{k_i} = \frac{1}{a_i}.$</p>
On the other hand, by inspecting the ${x^{i+1}}$ coefficient of <a href=”#pii”>(24)</a> we have <p align=center>$\displaystyle k_{i+1} = a_i k_i.$</p>
Combining the two formulae (and making the sign convention that the ${k_i}$ are always positive), we see that <p align=center>$\displaystyle a_i = \frac{1}{\sqrt{i+1}}$</p>
and <p align=center>$\displaystyle k_{i+1} = \frac{k_i}{\sqrt{i+1}}.$</p>
Meanwhile, a direct computation shows that ${P_0(x) = k_0 = \frac{1}{(2\pi)^{1/4}}}$, and thus by induction <p align=center>$\displaystyle k_i := \frac{1}{(2\pi)^{1/4} \sqrt{i!}}.$</p>
A similar method lets us compute the ${b_i}$. Indeed, taking inner products of <a href=”#pii”>(24)</a> with ${P_i(x) e^{-x^2/2}}$ and using orthonormality we have <p align=center>$\displaystyle b_i = - a_i \int_{\bf R} x P_i(x)^2 e^{-x^2/2}\ dx$</p>
which upon integrating by parts using <a href=”#gauss”>(29)</a> gives <p align=center>$\displaystyle b_i = - 2 a_i \int_{\bf R} P_i(x) P'_i(x) e^{-x^2/2}\ dx.$</p>
As ${P'_i}$ is of degree strictly less than ${i}$, the integral vanishes by orthonormality, thus ${b_i=0}$. The identity <a href=”#pii”>(24)</a> thus becomes the <em>Hermite recurrence relation</em> <a name=”i2″><p align=center>$\displaystyle P_{i+1}(x) = \frac{1}{\sqrt{i+1}} x P_i(x) - \frac{\sqrt{i}}{\sqrt{i+1}} P_{i-1}(x). \ \ \ \ \ (31)$</p>
</a> Another recurrence relation arises by considering the integral <p align=center>$\displaystyle \int_{\bf R} P_j(x) P'_{i+1}(x) e^{-x^2/2}\ dx.$</p>
On the one hand, as ${P'_{i+1}}$ has degree at most ${i}$, this integral vanishes if ${j>i}$ by orthonormality. On the other hand, integrating by parts using <a href=”#gauss”>(29)</a>, we can write the integral as <p align=center>$\displaystyle \int_{\bf R} (xP_j - P'_j)(x) P_{i+1}(x) e^{-x^2/2}\ dx.$</p>
If ${j < i}$, then ${xP_j-P'_j}$ has degree less than ${i+1}$, so the integral again vanishes. Thus the integral is non-vanishing only when ${j=i}$. Using <a href=”#ip”>(30)</a>, we conclude that <a name=”i1″><p align=center>$\displaystyle P'_{i+1} = \frac{1}{a_i} P_i = \sqrt{i+1} P_i. \ \ \ \ \ (32)$</p>
</a> We can combine <a href=”#i1″>(32)</a> with <a href=”#i2″>(31)</a> to obtain the formula <p align=center>$\displaystyle \frac{d}{dx}( e^{-x^2/2} P_i(x) ) = - \sqrt{i+1} e^{-x^2/2} P_{i+1}(x),$</p>
which together with the initial condition ${P_0 = \frac{1}{(2\pi)^{1/4}}}$ gives the explicit representation <a name=”pn-form”><p align=center>$\displaystyle P_n(x) := \frac{(-1)^n}{(2\pi)^{1/4} \sqrt{n!}} e^{x^2/2} \frac{d^n}{dx^n} e^{-x^2/2} \ \ \ \ \ (33)$</p>
</a> for the Hermite polynomials. Thus, for instance, at ${x=0}$ one sees from Taylor expansion that <a name=”piano”><p align=center>$\displaystyle P_n(0) = \frac{(-1)^{n/2}\sqrt{n!}}{(2\pi)^{1/4} 2^{n/2} (n/2)!} ;\quad P'_n(0) = 0 \ \ \ \ \ (34)$</p>
</a> when ${n}$ is even, and <a name=”piano-2″><p align=center>$\displaystyle P_n(0) = 0;\quad P'_n(0) = \frac{(-1)^{(n-1)/2} (n+1) \sqrt{n!}}{(2\pi)^{1/4} 2^{(n+1)/2} ((n+1)/2)!} \ \ \ \ \ (35)$</p>
</a> when ${n}$ is odd.
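All of the identities just derived are easy to confirm numerically. The following numpy sketch (our own; the `hermite_e` polynomials ${He_k}$ are orthogonal with respect to the weight ${e^{-x^2/2}}$, so ${P_k = He_k / ((2\pi)^{1/4}\sqrt{k!})}$) checks the orthonormality of the ${P_k}$, the derivative identity (32), and the value of ${P_n(0)}$ for an even ${n}$:

```python
import numpy as np
from numpy.polynomial import hermite_e as He
from math import factorial, pi, sqrt

def Pc(k):
    # coefficients (in the He basis) of the normalised polynomial P_k
    c = np.zeros(k + 1); c[k] = 1.0
    return c / ((2 * pi) ** 0.25 * sqrt(factorial(k)))

x, w = He.hermegauss(40)  # quadrature against the weight e^{-x^2/2}

# orthonormality of the P_k with respect to e^{-x^2/2} dx
G = np.array([[np.sum(w * He.hermeval(x, Pc(a)) * He.hermeval(x, Pc(b)))
               for b in range(8)] for a in range(8)])
print(np.allclose(G, np.eye(8)))  # True

# derivative identity (32): P_{k+1}' = sqrt(k+1) P_k
k = 5
lhs = He.hermeval(x, He.hermeder(Pc(k + 1)))
print(np.allclose(lhs, sqrt(k + 1) * He.hermeval(x, Pc(k))))  # True

# value at the origin for an even n, as in (34)
n = 6
val = He.hermeval(0.0, Pc(n))
pred = (-1) ** (n // 2) * sqrt(factorial(n)) / ((2 * pi) ** 0.25 * 2 ** (n // 2) * factorial(n // 2))
print(val, pred)  # equal
```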
<p>
In principle, the formula <a href=”#pn-form”>(33)</a>, together with <a href=”#darbo-2″>(28)</a>, gives us an explicit description of the kernel ${K_n(x,x)}$ (and thus of ${\mathop{\bf E} \mu_{M_n/\sqrt{n}}}$, by <a href=”#mun”>(23)</a>). However, to understand the asymptotic behaviour as ${n \rightarrow \infty}$, we would have to understand the asymptotic behaviour of ${\frac{d^n}{dx^n} e^{-x^2/2}}$ as ${n \rightarrow \infty}$, which is not immediately discernible by inspection. Fortunately, one can obtain such asymptotics by a variety of means. We give two such methods here: a method based on ODE analysis, and a complex-analytic method, based on the <a href=”http://en.wikipedia.org/wiki/Method_of_steepest_descent”>method of steepest descent</a>.
<p>
We begin with the ODE method. Combining <a href=”#i2″>(31)</a> with <a href=”#i1″>(32)</a> we see that each polynomial ${P_m}$ obeys the <em>Hermite differential equation</em> <p align=center>$\displaystyle P''_m(x) - x P'_m(x) + m P_m(x) = 0.$</p>
If we look instead at the Hermite functions ${\phi_m(x) := P_m(x) e^{-x^2/4}}$, we obtain the differential equation <p align=center>$\displaystyle L \phi_m(x) = (m+\frac{1}{2}) \phi_m$</p>
where ${L}$ is the <em>harmonic oscillator operator</em> <p align=center>$\displaystyle L \phi := - \phi'' + \frac{x^2}{4} \phi.$</p>
Note that the self-adjointness of ${L}$ here is consistent with the orthogonal nature of the ${\phi_m}$.
<p>

<blockquote><b>Exercise 11</b> Use <a href=”#knxy”>(15)</a>, <a href=”#darbo-2″>(28)</a>, <a href=”#pn-form”>(33)</a>, <a href=”#i2″>(31)</a>, <a href=”#i1″>(32)</a> to establish the identities <p align=center>$\displaystyle K_n(x,x) = \sum_{j=0}^{n-1} \phi_j(x)^2$</p>
<p align=center>$\displaystyle = \phi'_n(x)^2 + (n - \frac{x^2}{4}) \phi_n(x)^2$</p>
and thus by <a href=”#mun”>(23)</a> <p align=center>$\displaystyle \mathop{\bf E} \mu_{M_n/\sqrt{n}} = \frac{1}{\sqrt{n}} \sum_{j=0}^{n-1} \phi_j(\sqrt{n} x)^2\ dx =$</p>
<p align=center>$\displaystyle = [\frac{1}{\sqrt{n}} \phi'_n(\sqrt{n} x)^2 + \sqrt{n} (1 - \frac{x^2}{4}) \phi_n(\sqrt{n} x)^2]\ dx.$</p>
</blockquote>
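The second identity of Exercise 11 can be confirmed numerically (our own sketch; the Hermite functions ${\phi_j}$ are assembled from numpy's `hermite_e` polynomials):

```python
import numpy as np
from numpy.polynomial import hermite_e as He
from math import factorial, pi, sqrt

def phi(k, x):
    # Hermite function phi_k(x) = P_k(x) e^{-x^2/4}
    c = np.zeros(k + 1); c[k] = 1.0
    return He.hermeval(x, c) * np.exp(-x * x / 4) / ((2 * pi) ** 0.25 * sqrt(factorial(k)))

def dphi(k, x):
    # phi_k'(x) = (P_k'(x) - (x/2) P_k(x)) e^{-x^2/4}
    c = np.zeros(k + 1); c[k] = 1.0
    Pk = He.hermeval(x, c)
    dPk = He.hermeval(x, He.hermeder(c))
    return (dPk - x / 2 * Pk) * np.exp(-x * x / 4) / ((2 * pi) ** 0.25 * sqrt(factorial(k)))

n, x = 6, np.linspace(-3, 3, 7)
lhs = sum(phi(j, x) ** 2 for j in range(n))            # K_n(x,x)
rhs = dphi(n, x) ** 2 + (n - x * x / 4) * phi(n, x) ** 2
print(np.allclose(lhs, rhs))  # True
```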

<p>

<p>
It is thus natural to look at the rescaled functions <p align=center>$\displaystyle \tilde \phi_m(x) := n^{1/4} \phi_m(\sqrt{n} x)$</p>
which are orthonormal in ${L^2({\bf R})}$ and solve the equation <p align=center>$\displaystyle L_{1/n} \tilde \phi_m(x) = \frac{m+1/2}{n} \tilde \phi_m$</p>
where ${L_h}$ is the <em>semiclassical harmonic oscillator operator</em> <p align=center>$\displaystyle L_h \phi := - h^2 \phi'' + \frac{x^2}{4} \phi,$</p>
thus <p align=center>$\displaystyle \mathop{\bf E} \mu_{M_n/\sqrt{n}} = \frac{1}{n} \sum_{j=0}^{n-1} \tilde \phi_j(x)^2\ dx =$</p>
<a name=”wings”><p align=center>$\displaystyle = [\frac{1}{n^2} \tilde \phi'_n(x)^2 + (1 - \frac{x^2}{4}) \tilde \phi_n(x)^2]\ dx. \ \ \ \ \ (36)$</p>
</a>
<p>
The projection ${\pi_{V_n}}$ is then the spectral projection operator of ${L_{1/n}}$ to ${[0,1]}$. According to <a href=”http://en.wikipedia.org/wiki/Semiclassical”>semi-classical analysis</a>, with ${h}$ being interpreted as analogous to Planck’s constant, the operator ${L_h}$ has symbol ${p^2 + \frac{x^2}{4}}$, where ${p := -i h \frac{d}{dx}}$ is the momentum operator, so the projection ${\pi_{V_n}}$ is a projection to the region ${\{ (x,p): p^2 + \frac{x^2}{4} \leq 1 \}}$ of phase space, or equivalently to the region ${\{ (x,p): |p| \leq \frac{1}{2} (4-x^2)_+^{1/2}\}}$. In the semi-classical limit ${h \rightarrow 0}$, we thus expect the density ${\frac{1}{n} \sum_{j=0}^{n-1} \tilde \phi_j(x)^2}$ (the diagonal of the kernel of the normalised projection ${h \pi_{V_n}}$) to be proportional to the projection of this region to the ${x}$ variable, i.e. proportional to ${(4-x^2)_+^{1/2}}$. We are thus led to the semi-circular law via semi-classical analysis.
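This prediction is already visible numerically at moderate ${n}$: the rescaled density ${\frac{1}{\sqrt{n}} K_n(\sqrt{n}x,\sqrt{n}x)}$ from (23) is close to the semicircular density ${\frac{1}{2\pi}(4-x^2)_+^{1/2}}$. A numpy sketch (our own):

```python
import numpy as np
from numpy.polynomial import hermite_e as He
from math import factorial, pi, sqrt

def P(k, x):
    # normalised Hermite polynomial; the normalisation is folded into the coefficient
    c = np.zeros(k + 1)
    c[k] = 1.0 / ((2 * pi) ** 0.25 * sqrt(factorial(k)))
    return He.hermeval(x, c)

def density(n, x):
    # the density of E mu_{M_n/sqrt(n)}: (1/sqrt(n)) K_n(sqrt(n) x, sqrt(n) x), as in (23)
    u = sqrt(n) * x
    return sum(P(k, u) ** 2 for k in range(n)) * np.exp(-u * u / 2) / sqrt(n)

semicircle = lambda t: np.sqrt(np.maximum(4 - t * t, 0.0)) / (2 * pi)
for t in (0.0, 0.5, 1.0):
    print(t, density(100, t), semicircle(t))  # close for n = 100
```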
<p>
It is possible to make the above argument rigorous, but this would require developing the theory of <a href=”http://en.wikipedia.org/wiki/Microlocal_analysis”>microlocal analysis</a>, which would be overkill given that we are just dealing with an ODE rather than a PDE here (and an extremely classical ODE at that). We instead use a more basic semiclassical approximation, the <a href=”http://en.wikipedia.org/wiki/WKB_approximation”>WKB approximation</a>, which we will make rigorous using the classical <a href=”http://en.wikipedia.org/wiki/Method_of_variation_of_parameters”>method of variation of parameters</a> (one could also proceed using the closely related <em>Pr&uuml;fer transformation</em>, which we will not detail here). We study the eigenfunction equation <p align=center>$\displaystyle L_h \phi = \lambda \phi$</p>
where we think of ${h > 0}$ as being small, and ${\lambda}$ as being close to ${1}$. We rewrite this as <a name=”phik”><p align=center>$\displaystyle \phi'' = - \frac{1}{h^2} k(x)^2 \phi \ \ \ \ \ (37)$</p>
</a> where ${k(x) := \sqrt{\lambda - x^2/4}}$, where we will only work in the “classical” region ${x^2/4 < \lambda}$ (so ${k(x) > 0}$) for now.
<p>
Recall that the general solution to the constant coefficient ODE ${\phi'' = -\frac{1}{h^2} k^2 \phi}$ is given by ${\phi(x) = A e^{ikx/h} + B e^{-ikx/h}}$. Inspired by this, we make the ansatz <p align=center>$\displaystyle \phi(x) = A(x) e^{i \Psi(x)/h} + B(x) e^{-i \Psi(x)/h}$</p>
where ${\Psi(x) := \int_0^x k(y)\ dy}$ is the antiderivative of ${k}$. Differentiating this, we have <p align=center>$\displaystyle \phi'(x) = \frac{i k(x)}{h}( A(x) e^{i\Psi(x)/h} - B(x) e^{-i\Psi(x)/h} )$</p>
<p align=center>$\displaystyle + A'(x) e^{i\Psi(x)/h} + B'(x) e^{-i\Psi(x)/h}.$</p>
Because we are representing a single function ${\phi}$ by two functions ${A, B}$, we have the freedom to place an additional constraint on ${A,B}$. Following the usual variation of parameters strategy, we will use this freedom to eliminate the last two terms in the expansion of ${\phi}$, thus <a name=”aprime”><p align=center>$\displaystyle A'(x) e^{i\Psi(x)/h} + B'(x) e^{-i\Psi(x)/h} = 0. \ \ \ \ \ (38)$</p>
</a> We can now differentiate again and obtain <p align=center>$\displaystyle \phi''(x) = -\frac{k(x)^2}{h^2} \phi(x) + \frac{ik'(x)}{h} ( A(x) e^{i\Psi(x)/h} - B(x) e^{-i\Psi(x)/h} )$</p>
<p align=center>$\displaystyle + \frac{ik(x)}{h} ( A'(x) e^{i\Psi(x)/h} - B'(x) e^{-i\Psi(x)/h} ).$</p>
Comparing this with <a href=”#phik”>(37)</a> we see that <p align=center>$\displaystyle A'(x) e^{i\Psi(x)/h} - B'(x) e^{-i\Psi(x)/h} = - \frac{k'(x)}{k(x)} ( A(x) e^{i\Psi(x)/h} - B(x) e^{-i\Psi(x)/h} ).$</p>
Combining this with <a href=”#aprime”>(38)</a>, we obtain equations of motion for ${A}$ and ${B}$: <p align=center>$\displaystyle A'(x) = - \frac{k'(x)}{2 k(x)} A(x) + \frac{k'(x)}{2 k(x)} B(x) e^{-2i\Psi(x)/h}$</p>
<p align=center>$\displaystyle B'(x) = - \frac{k'(x)}{2 k(x)} B(x) + \frac{k'(x)}{2 k(x)} A(x) e^{2i\Psi(x)/h}.$</p>
We can simplify this using the <a href=”http://en.wikipedia.org/wiki/Integrating_factor”>integrating factor</a> substitution <p align=center>$\displaystyle A(x) = k(x)^{-1/2} a(x); \quad B(x) = k(x)^{-1/2} b(x)$</p>
to obtain <a name=”a1″><p align=center>$\displaystyle a'(x) = \frac{k'(x)}{2 k(x)} b(x) e^{-2i\Psi(x)/h}; \ \ \ \ \ (39)$</p>
</a> <a name=”a2″><p align=center>$\displaystyle b'(x) = \frac{k'(x)}{2 k(x)} a(x) e^{2i\Psi(x)/h}. \ \ \ \ \ (40)$</p>
</a> The point of doing all these transformations is that the parameter ${h}$ no longer manifests itself through amplitude factors, and is instead present only through the phase factors. In particular, we have <p align=center>$\displaystyle a', b' = O( |a| + |b| )$</p>
on any compact interval ${I}$ in the interior of the classical region ${x^2/4 < \lambda}$ (where we allow implied constants to depend on ${I}$), which by <a href=”http://en.wikipedia.org/wiki/Gronwall’s_inequality”>Gronwall’s inequality</a> gives the bounds <p align=center>$\displaystyle a'(x), b'(x), a(x), b(x) = O( |a(0)| + |b(0)| )$</p>
on this interval ${I}$. We can then insert these bounds into <a href=”#a1″>(39)</a>, <a href=”#a2″>(40)</a> again and integrate by parts (taking advantage of the non-stationary nature of ${\Psi}$) to obtain the improved bounds <a name=”axo”><p align=center>$\displaystyle a(x) = a(0) + O( h (|a(0)|+|b(0)|) ); \quad b(x) = b(0) + O(h (|a(0)|+|b(0)|) ) \ \ \ \ \ (41)$</p>
</a> on this interval. (More precise asymptotic expansions can be obtained by iterating this procedure, but we will not need them here.) This is already enough to get the asymptotics that we need:
<p>

<blockquote><b>Exercise 12</b> Use <a href=”#wings”>(36)</a> to show that on any compact interval ${I}$ in ${(-2,2)}$, the density of ${\mathop{\bf E} \mu_{M_n/\sqrt{n}}}$ is given by <p align=center>$\displaystyle (|a|^2(x)+|b|^2(x)) ( \sqrt{1-x^2/4} + o(1) ) + O( |a(x)| |b(x)| )$</p>
where ${a, b}$ are as above with ${\lambda=1+\frac{1}{2n}}$ and ${h=\frac{1}{n}}$. Combining this with <a href=”#axo”>(41)</a>, <a href=”#piano”>(34)</a>, <a href=”#piano-2″>(35)</a>, and Stirling’s formula, conclude that ${\mathop{\bf E} \mu_{M_n/\sqrt{n}}}$ converges in the vague topology to the semicircular law ${\frac{1}{2\pi} (4-x^2)_+^{1/2}\ dx}$. (Note that once one gets convergence inside ${(-2,2)}$, the convergence outside of ${[-2,2]}$ can be obtained for free since ${\mu_{M_n/\sqrt{n}}}$ and ${\frac{1}{2\pi} (4-x^2)_+^{1/2}\ dx}$ are both probability measures.) </blockquote>
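The bounds <a href=”#axo”>(41)</a> can also be watched in action numerically: integrate the ODE <a href=”#phik”>(37)</a>, recover ${a(x), b(x)}$ from ${\phi, \phi'}$ by inverting the ansatz, and observe that ${|a|, |b|}$ drift only by ${O(h)}$. Here is a minimal sketch (not from the original notes), with ${\lambda = 1}$, ${h = 0.01}$, a classical RK4 integrator, and ad hoc step counts:

```python
import cmath, math

h = 0.01      # semiclassical parameter
lam = 1.0     # spectral parameter lambda

def k(x):     # k(x) = sqrt(lambda - x^2/4), valid in the classical region
    return math.sqrt(lam - x * x / 4)

def Psi(x):   # antiderivative of k for lambda = 1: int_0^x sqrt(1 - y^2/4) dy
    return x * math.sqrt(4 - x * x) / 4 + math.asin(x / 2)

def rhs(x, y):          # first-order system for phi'' = -(k(x)/h)^2 phi
    phi, dphi = y
    return (dphi, -(k(x) / h) ** 2 * phi)

def rk4(x0, x1, y0, steps):
    dx, y, x = (x1 - x0) / steps, y0, x0
    for _ in range(steps):
        k1 = rhs(x, y)
        k2 = rhs(x + dx / 2, (y[0] + dx / 2 * k1[0], y[1] + dx / 2 * k1[1]))
        k3 = rhs(x + dx / 2, (y[0] + dx / 2 * k2[0], y[1] + dx / 2 * k2[1]))
        k4 = rhs(x + dx, (y[0] + dx * k3[0], y[1] + dx * k3[1]))
        y = (y[0] + dx / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
             y[1] + dx / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))
        x += dx
    return y

def ab(x, phi, dphi):
    """Invert phi = A e^{i Psi/h} + B e^{-i Psi/h},
    phi' = (i k/h)(A e^{i Psi/h} - B e^{-i Psi/h}); return a = sqrt(k) A, b = sqrt(k) B."""
    e = cmath.exp(1j * Psi(x) / h)
    A = (phi + h * dphi / (1j * k(x))) / (2 * e)
    B = (phi - h * dphi / (1j * k(x))) * e / 2
    return math.sqrt(k(x)) * A, math.sqrt(k(x)) * B

y0 = (1.0, 0.0)                 # phi(0) = 1, phi'(0) = 0, so a(0) = b(0) = 1/2
a0, b0 = ab(0.0, *y0)
phi1, dphi1 = rk4(0.0, 1.0, y0, 50000)
a1, b1 = ab(1.0, phi1, dphi1)
print(abs(a0), abs(a1), abs(b1))   # |a|, |b| stay within O(h) of 1/2
```

Despite ${\phi}$ oscillating about sixteen times over ${[0,1]}$, the amplitudes ${|a|, |b|}$ barely move, exactly as (41) predicts.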

<p>

<p>
We now sketch out the approach using the <a href=”http://en.wikipedia.org/wiki/Method_of_steepest_descent”>method of steepest descent</a>. The starting point is the Fourier inversion formula <p align=center>$\displaystyle e^{-x^2/2} = \frac{1}{\sqrt{2\pi}} \int_{\bf R} e^{itx} e^{-t^2/2}\ dt$</p>
which upon repeated differentiation gives <p align=center>$\displaystyle \frac{d^n}{dx^n} e^{-x^2/2} = \frac{i^n}{\sqrt{2\pi}} \int_{\bf R} t^n e^{itx} e^{-t^2/2}\ dt$</p>
and thus by <a href=”#pn-form”>(33)</a> <p align=center>$\displaystyle P_n(x) = \frac{(-i)^n}{(2\pi)^{3/4} \sqrt{n!}} \int_{\bf R} t^n e^{-(t-ix)^2/2}\ dt$</p>
and thus <p align=center>$\displaystyle \tilde \phi_n(x) = \frac{(-i)^n}{(2\pi)^{3/4} \sqrt{n!}} n^{(n+1)/2} \int_{\bf R} e^{n \phi(t)} \ dt$</p>
where <p align=center>$\displaystyle \phi(t) := \log t - (t-ix)^2/2 - x^2/4$</p>
where we use a suitable branch of the complex logarithm to handle the case of negative ${t}$.
<p>
The idea of the principle of steepest descent is to shift the contour of integration to where the real part of ${\phi(z)}$ is as small as possible. For this, it turns out that the stationary points of ${\phi(z)}$ play a crucial role. A brief calculation using the quadratic formula shows that there are two such stationary points, at <p align=center>$\displaystyle z = \frac{ix \pm \sqrt{4-x^2}}{2}.$</p>
When ${|x| < 2}$, ${\phi}$ is purely imaginary at these stationary points, while for ${|x| > 2}$ the real part of ${\phi}$ is negative at both points. One then draws a contour through these two stationary points in such a way that near each such point, the imaginary part of ${\phi(z)}$ is kept fixed, which keeps oscillation to a minimum and allows the real part to decay as steeply as possible (which explains the name of the method). After a certain amount of tedious computation, one obtains the same type of asymptotics for ${\tilde \phi_n}$ that were obtained by the ODE method when ${|x| < 2}$ (and exponentially decaying estimates for ${|x|>2}$).
<p>

<blockquote><b>Exercise 13</b> Let ${f: {\bf C} \rightarrow {\bf C}}$, ${g:{\bf C} \rightarrow {\bf C}}$ be functions which are analytic near a complex number ${z_0}$, with ${f'(z_0) = 0}$ and ${f''(z_0) \neq 0}$. Let ${\varepsilon > 0}$ be a small number, and let ${\gamma}$ be the line segment ${\{ z_0 + t v: -\varepsilon < t < \varepsilon \}}$, where ${v}$ is a complex phase such that ${f''(z_0) v^2}$ is a negative real. Show that for ${\varepsilon}$ sufficiently small, one has <p align=center>$\displaystyle \int_\gamma e^{\lambda f(z)} g(z)\ dz = (1+o(1)) \frac{\sqrt{2\pi} v}{\sqrt{f''(z_0) \lambda}} e^{\lambda f(z_0)} g(z_0)$</p>
as ${\lambda \rightarrow +\infty}$. This is the basic estimate behind the method of steepest descent; readers who are also familiar with the <a href=”http://en.wikipedia.org/wiki/Stationary_phase_approximation”>method of stationary phase</a> may see a close parallel. </blockquote>
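To see Exercise 13 in action, take ${f(t) = \log t - t}$ and ${g = 1}$: the stationary point is ${t_0 = 1}$ with ${f''(1) = -1}$, so ${v = 1}$ is admissible, and the estimate predicts ${\int_0^\infty e^{\lambda(\log t - t)}\ dt = \Gamma(\lambda+1)/\lambda^{\lambda+1} \approx \sqrt{2\pi/\lambda}\, e^{-\lambda}}$, which is just Stirling's formula. A quick numerical check of this instance (a sketch, not part of the text):

```python
import math

# Exercise 13 with f(t) = log t - t, g = 1: stationary point t = 1, f''(1) = -1,
# so v = 1 makes f''(1) v^2 a negative real, and the method predicts
#   int_0^infty e^{lambda (log t - t)} dt  ~  sqrt(2 pi / lambda) e^{-lambda}.
# The left side equals Gamma(lambda+1)/lambda^(lambda+1) exactly, so this is
# Stirling's formula, testable against math.lgamma.

def laplace_prediction(lam):
    # log of sqrt(2 pi / lam) e^{-lam}
    return 0.5 * math.log(2 * math.pi / lam) - lam

def exact(lam):
    # log of Gamma(lam+1) / lam^(lam+1)
    return math.lgamma(lam + 1) - (lam + 1) * math.log(lam)

for lam in (10.0, 100.0, 1000.0):
    print(lam, exact(lam) - laplace_prediction(lam))   # tends to 0 like 1/(12 lambda)
```

The discrepancy decays like ${1/(12\lambda)}$, the first correction term in Stirling's series, illustrating the ${o(1)}$ in the exercise.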

<p>

<p>

<blockquote><b>Remark 14</b> The method of steepest descent requires an explicit representation of the orthogonal polynomials as contour integrals, and as such is largely restricted to the classical orthogonal polynomials (such as the Hermite polynomials). However, there is a non-linear generalisation of the method of steepest descent developed by Deift and Zhou, in which one solves a matrix Riemann-Hilbert problem rather than a contour integral; see <a href=”http://www.ams.org/mathscinet-getitem?mr=1677884″>this book by Deift</a> for details. Using these sorts of tools, one can generalise much of the above theory to the spectral distribution of the ${U(n)}$-conjugation-invariant ensembles discussed in Remark <a href=”#daft”>2</a>, with the theory of Hermite polynomials being replaced by the more general theory of orthogonal polynomials; this is discussed in the above book of Deift, as well as the more recent <a href=”http://www.ams.org/mathscinet-getitem?mr=2514781″>book of Deift and Gioev</a>. </blockquote>

<p>

<p>
The computations performed above for the diagonal kernel ${K_n(x,x)}$ can be summarised by the asymptotic <p align=center>$\displaystyle K_n( \sqrt{n} x, \sqrt{n} x ) = \sqrt{n} ( \rho_{sc}(x) + o(1) )$</p>
whenever ${x \in {\bf R}}$ is fixed and ${n \rightarrow \infty}$, and ${\rho_{sc}(x) := \frac{1}{2\pi} (4-x^2)_+^{1/2}}$ is the semi-circular law distribution. It is reasonably straightforward to generalise these asymptotics to the off-diagonal case as well, obtaining the more general result <a name=”kn”><p align=center>$\displaystyle K_n( \sqrt{n} x + \frac{y_1}{\sqrt{n} \rho_{sc}(x)}, \sqrt{n} x + \frac{y_2}{\sqrt{n} \rho_{sc}(x)} ) = \sqrt{n} ( \rho_{sc}(x) K(y_1,y_2) + o(1) ) \ \ \ \ \ (42)$</p>
</a> for fixed ${x \in (-2,2)}$ and ${y_1,y_2 \in {\bf R}}$, where ${K}$ is the <em>Dyson sine kernel</em> <p align=center>$\displaystyle K(y_1,y_2) := \frac{\sin(\pi(y_1-y_2))}{\pi(y_1-y_2)}.$</p>
In the language of semi-classical analysis, what is going on here is that the rescaling in the left-hand side of <a href=”#kn”>(42)</a> is transforming the phase space region ${\{ (x,p): p^2 + \frac{x^2}{4} \leq 1 \}}$ to the region ${\{ (x,p): |p| \leq 1 \}}$ in the limit ${n \rightarrow \infty}$, and the projection to the latter region is given by the Dyson sine kernel. A formal proof of <a href=”#kn”>(42)</a> can be given by using either the ODE method or the steepest descent method to obtain asymptotics for Hermite polynomials, and thence (via the Christoffel-Darboux formula) to asymptotics for ${K_n}$; we do not give the details here, but see for instance the recent book of Anderson, Guionnet, and Zeitouni.
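The asymptotic <a href=”#kn”>(42)</a> can also be checked numerically from the Hermite recurrence. The sketch below (illustrative only; the choices ${n = 400}$, ${x = 0}$ are ad hoc) compares the rescaled kernel with the Dyson sine kernel:

```python
import math

def psis(x, n):
    """Orthonormal Hermite functions (weight e^{-x^2/2}) via the stable recurrence."""
    p = [math.exp(-x * x / 4) / (2 * math.pi) ** 0.25]
    if n > 1:
        p.append(x * p[0])
    for k in range(1, n - 1):
        p.append((x * p[k] - math.sqrt(k) * p[k - 1]) / math.sqrt(k + 1))
    return p

def kernel(u, v, n):
    """K_n(u, v) = sum_{k<n} psi_k(u) psi_k(v)."""
    return sum(a * b for a, b in zip(psis(u, n), psis(v, n)))

def sine_kernel(y1, y2):
    return 1.0 if y1 == y2 else math.sin(math.pi * (y1 - y2)) / (math.pi * (y1 - y2))

n, x = 400, 0.0
s = math.sqrt(n) * math.sqrt(4 - x * x) / (2 * math.pi)   # sqrt(n) rho_sc(x)

def rescaled(y1, y2):
    """Left-hand side of (42), divided by sqrt(n) rho_sc(x)."""
    return kernel(math.sqrt(n) * x + y1 / s, math.sqrt(n) * x + y2 / s, n) / s

for y1 in (0.0, 0.5, 1.0, 1.7):
    print(y1, rescaled(y1, 0.0), sine_kernel(y1, 0.0))
```

Note in particular the vanishing of the rescaled kernel at unit separation ${y_1 - y_2 = 1}$, reflecting the zeroes of ${\sin(\pi(y_1-y_2))}$.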
<p>
From <a href=”#kn”>(42)</a> and <a href=”#rhok-1″>(21)</a>, <a href=”#rhok-2″>(22)</a> we obtain the asymptotic formula <p align=center>$\displaystyle \mathop{\bf E} \sum_{1 \leq i_1 < \ldots < i_k \leq n} F( \sqrt{n} \rho_{sc}(x) (\lambda_{i_1}(M_n) - \sqrt{n} x),\ldots,$</p>
<p align=center>$\displaystyle \sqrt{n} \rho_{sc}(x) (\lambda_{i_k}(M_n) - \sqrt{n} x) )$</p>
<p align=center>$\displaystyle \rightarrow \int_{{\bf R}^k} F(y_1,\ldots,y_k) \det(K(y_i,y_j))_{1 \leq i,j \leq k}\ dy_1 \ldots dy_k$</p>
for the local statistics of eigenvalues. By means of further algebraic manipulations (using the general theory of determinantal processes), this allows one to control such quantities as the distribution of eigenvalue gaps near ${\sqrt{n} x}$, normalised at the scale ${\frac{1}{\sqrt{n} \rho_{sc}(x)}}$, which is the average size of these gaps as predicted by the semicircular law. For instance, for any ${s_0>0}$, one can show (basically by the above formulae combined with the inclusion-exclusion principle) that the proportion of eigenvalues ${\lambda_i}$ with normalised gap ${\sqrt{n} \rho_{sc}(t_{i/n}) (\lambda_{i+1}-\lambda_i)}$ less than ${s_0}$ converges as ${n \rightarrow \infty}$ to ${\int_0^{s_0} \frac{d^2}{ds^2} \det(1-K)_{L^2[0,s]}\ ds}$, where ${t_c \in [-2,2]}$ is defined by the formula ${\int_{-2}^{t_c} \rho_{sc}(x)\ dx = c}$, and ${K}$ is the integral operator with kernel ${K(x,y)}$ (this operator can be verified to be <a href=”http://en.wikipedia.org/wiki/Trace-class_operator”>trace class</a>, so the determinant can be defined in a <a href=”http://en.wikipedia.org/wiki/Fredholm_determinant”>Fredholm sense</a>). See for instance this <a href=”http://www.ams.org/mathscinet-getitem?mr=2129906″>book of Mehta</a>; my <a href=”https://terrytao.wordpress.com/2009/08/23/determinantal-processes/”>blog post on determinantal processes</a> describes a finitary version of the inclusion-exclusion argument used to obtain such a result.
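Fredholm determinants of this type are quite computable in practice. A convenient scheme (an observation of Bornemann, used here as an illustrative sketch with an ad hoc quadrature size ${m=30}$) is to discretise by Gauss-Legendre quadrature and use the approximation ${\det(1-K)_{L^2[0,s]} \approx \det( \delta_{ij} - \sqrt{w_i w_j} K(x_i,x_j) )}$, which converges very fast for analytic kernels such as the sine kernel:

```python
import math

def gauss_legendre(m):
    """Nodes and weights of m-point Gauss-Legendre quadrature on [-1,1],
    found by Newton iteration on the Legendre recurrence."""
    xs, ws = [], []
    for i in range(1, m + 1):
        x = math.cos(math.pi * (i - 0.25) / (m + 0.5))
        for _ in range(100):
            p0, p1 = 1.0, x
            for k in range(1, m):
                p0, p1 = p1, ((2 * k + 1) * x * p1 - k * p0) / (k + 1)
            dp = m * (x * p1 - p0) / (x * x - 1)   # P_m'(x)
            dx = p1 / dp
            x -= dx
            if abs(dx) < 1e-15:
                break
        xs.append(x)
        ws.append(2 / ((1 - x * x) * dp * dp))
    return xs, ws

def det(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in a]
    n, d = len(a), 1.0
    for j in range(n):
        p = max(range(j, n), key=lambda r: abs(a[r][j]))
        if p != j:
            a[j], a[p] = a[p], a[j]
            d = -d
        d *= a[j][j]
        for r in range(j + 1, n):
            f = a[r][j] / a[j][j]
            for c in range(j, n):
                a[r][c] -= f * a[j][c]
    return d

def sine_K(x, y):
    return 1.0 if x == y else math.sin(math.pi * (x - y)) / (math.pi * (x - y))

def gap_probability(s, m=30):
    """det(1 - K)_{L^2[0,s]} by the Nystrom-type discretisation."""
    xs, ws = gauss_legendre(m)
    xs = [s * (x + 1) / 2 for x in xs]      # map nodes to [0, s]
    ws = [s * w / 2 for w in ws]
    a = [[(1.0 if i == j else 0.0) - math.sqrt(ws[i] * ws[j]) * sine_K(xs[i], xs[j])
          for j in range(m)] for i in range(m)]
    return det(a)

for s in (0.1, 0.5, 1.0, 2.0):
    print(s, gap_probability(s))
```

For small ${s}$ one has ${\det(1-K)_{L^2[0,s]} = 1 - s + O(s^4)}$ (the sine process has density one), and the computed values decrease in ${s}$, as the probability of a gap of that normalised length must.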
<p>

<blockquote><b>Remark 15</b> One can also analyse the distribution of the eigenvalues at the edge of the spectrum, i.e. close to ${\pm 2 \sqrt{n}}$. This ultimately hinges on understanding the behaviour of the projection ${\pi_{V_n}}$ near the corners ${(0, \pm 2)}$ of the phase space region ${\Omega = \{ (p,x): p^2 + \frac{x^2}{4} \leq 1 \}}$, or of the Hermite polynomials ${P_n(x)}$ for ${x}$ close to ${\pm 2 \sqrt{n}}$. For instance, by using steepest descent methods, one can show that <p align=center>$\displaystyle n^{1/12} \phi_n(2\sqrt{n} + n^{-1/6} x) \rightarrow Ai(x)$</p>
as ${n \rightarrow \infty}$ for any fixed ${x}$, where ${Ai}$ is the <a href=”http://en.wikipedia.org/wiki/Airy_function”>Airy function</a> <p align=center>$\displaystyle Ai(x) := \frac{1}{\pi} \int_0^\infty \cos( \frac{t^3}{3} + tx )\ dt.$</p>
This asymptotic and the Christoffel-Darboux formula then give the asymptotic <a name=”nkn”><p align=center>$\displaystyle n^{-1/6} K_n(2\sqrt{n} + n^{-1/6} x, 2\sqrt{n} + n^{-1/6} y) \rightarrow K_{Ai}(x,y) \ \ \ \ \ (43)$</p>
</a> for any fixed ${x,y}$, where ${K_{Ai}}$ is the <em>Airy kernel</em> <p align=center>$\displaystyle K_{Ai}(x,y) := \frac{Ai(x) Ai'(y) - Ai'(x) Ai(y)}{x-y}.$</p>
(<em>Aside</em>: Semiclassical heuristics suggest that the rescaled kernel <a href=”#nkn”>(43)</a> should correspond to projection to the parabolic region of phase space ${\{ (p,x): p^2 + x \leq 1\}}$, but I do not know of a connection between this region and the Airy kernel; I am not sure whether semiclassical heuristics are in fact valid at this scaling regime. On the other hand, these heuristics do explain the emergence of the length scale ${n^{-1/6}}$ that appears in <a href=”#nkn”>(43)</a>, as this is the smallest scale at the edge which occupies a region in ${\Omega}$ consistent with the Heisenberg uncertainty principle.) This then gives an asymptotic description of the largest eigenvalues of a GUE matrix, which cluster in the region ${2\sqrt{n} + O(n^{-1/6})}$. For instance, one can use the above asymptotics to show that the largest eigenvalue ${\lambda_1}$ of a GUE matrix obeys the <a href=”http://en.wikipedia.org/wiki/Tracy%E2%80%93Widom_distribution”>Tracy-Widom law</a> <p align=center>$\displaystyle \mathop{\bf P}( n^{1/6} (\lambda_1 - 2\sqrt{n}) < t ) \rightarrow \det(1 - A)_{L^2[t,+\infty)}$</p>
for any fixed ${t}$, where ${A}$ is the integral operator with kernel ${K_{Ai}}$. See for instance the recent book of Anderson, Guionnet, and Zeitouni. </blockquote>
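The edge asymptotics also lend themselves to a quick numerical experiment. The sketch below (illustrative only, and not from the text) evaluates ${Ai}$ by its Maclaurin series, ${Ai'}$ by a central difference, and uses the diagonal limit ${K_{Ai}(x,x) = Ai'(x)^2 - x\, Ai(x)^2}$ (which follows from L'H&ocirc;pital's rule and the Airy equation ${Ai'' = x\, Ai}$) to test the rescaled diagonal kernel against the Airy kernel:

```python
import math

def airy(x, terms=60):
    """Ai(x) from its Maclaurin series Ai = c1*f - c2*g (fine for moderate |x|)."""
    c1 = 3 ** (-2 / 3) / math.gamma(2 / 3)   # Ai(0)
    c2 = 3 ** (-1 / 3) / math.gamma(1 / 3)   # -Ai'(0)
    f = tf = 1.0
    g = tg = x
    for k in range(terms):
        tf *= x ** 3 / ((3 * k + 2) * (3 * k + 3))
        tg *= x ** 3 / ((3 * k + 3) * (3 * k + 4))
        f += tf
        g += tg
    return c1 * f - c2 * g

def airy_kernel_diag(x):
    """K_Ai(x, x) = Ai'(x)^2 - x Ai(x)^2, with Ai' by central difference."""
    hh = 1e-5
    d = (airy(x + hh) - airy(x - hh)) / (2 * hh)
    return d * d - x * airy(x) ** 2

def psis(x, n):
    """Orthonormal Hermite functions (weight e^{-x^2/2}) via the stable recurrence."""
    p = [math.exp(-x * x / 4) / (2 * math.pi) ** 0.25]
    if n > 1:
        p.append(x * p[0])
    for k in range(1, n - 1):
        p.append((x * p[k] - math.sqrt(k) * p[k - 1]) / math.sqrt(k + 1))
    return p

def edge_density(s, n):
    """Left-hand side of (43) on the diagonal: n^{-1/6} K_n(2 sqrt(n) + n^{-1/6} s, same)."""
    u = 2 * math.sqrt(n) + s * n ** (-1 / 6)
    return n ** (-1 / 6) * sum(q * q for q in psis(u, n))

n = 400
for s in (-2.0, 0.0, 1.0):
    print(s, edge_density(s, n), airy_kernel_diag(s))
```

The agreement is only first-order at ${n=400}$ (the edge corrections decay rather slowly in ${n}$), but the Airy profile is clearly visible.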

<p>

<p>

<p align=center><b> &mdash; 5. Determinantal form of the gaussian matrix distribution &mdash; </b></p>

<p>
One can perform an analogous analysis of the joint distribution function <a href=”#ginibre-gaussian”>(10)</a> of gaussian random matrices. Indeed, given any family ${P_0(z),\ldots,P_{n-1}(z)}$ of polynomials, with each ${P_i}$ of degree ${i}$, much the same arguments as before show that <a href=”#ginibre-gaussian”>(10)</a> is equal to a constant multiple of <p align=center>$\displaystyle \det( \sum_{k=0}^{n-1} P_k(\lambda_i) e^{-|\lambda_i|^2/2} \overline{P_k(\lambda_j)} e^{-|\lambda_j|^2/2} )_{1 \leq i,j \leq n}.$</p>
One can then select ${P_k(z) e^{-|z|^2/2}}$ to be orthonormal in ${L^2({\bf C})}$. Actually in this case, the polynomials are very simple, being given explicitly by the formula <p align=center>$\displaystyle P_k(z) := \frac{1}{\sqrt{\pi k!}} z^k.$</p>

<p>

<blockquote><b>Exercise 16</b> Verify that the ${P_k(z) e^{-|z|^2/2}}$ are indeed orthonormal, and then conclude that <a href=”#ginibre-gaussian”>(10)</a> is equal to ${\det( K_n(\lambda_i,\lambda_j) )_{1 \leq i,j \leq n}}$, where <p align=center>$\displaystyle K_n(z,w) := \frac{1}{\pi} e^{-(|z|^2+|w|^2)/2} \sum_{k=0}^{n-1} \frac{(z\overline{w})^k}{k!}.$</p>
Conclude further that the ${m}$-point correlation functions ${\rho_m(z_1,\ldots,z_m)}$ are given as <p align=center>$\displaystyle \rho_m(z_1,\ldots,z_m) = \det( K_n(z_i,z_j) )_{1 \leq i,j \leq m}.$</p>
</blockquote>

<p>

<p>

<blockquote><b>Exercise 17</b> Show that as ${n \rightarrow \infty}$, one has <p align=center>$\displaystyle K_n( \sqrt{n} z, \sqrt{n} z ) = \frac{1}{\pi} 1_{|z| \leq 1} + o(1)$</p>
and deduce that the expected spectral measure ${{\bf E} \mu_{G/\sqrt{n}}}$ converges vaguely to the circular measure ${\mu_c := \frac{1}{\pi} 1_{|z| \leq 1}\ dz}$; this is a special case of the <em>circular law</em>. </blockquote>
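Exercise 17 reduces to a probabilistic identity: ${K_n(\sqrt{n} z, \sqrt{n} z) = \frac{1}{\pi} \mathop{\bf P}( \hbox{Poisson}(n|z|^2) \leq n-1 )}$, so the circular law at this level is just the law of large numbers for Poisson random variables (the mean ${n|z|^2}$ falls below the threshold ${n}$ precisely when ${|z| < 1}$). A short numerical sketch of this observation (not from the text):

```python
import math

def K_diag(z_abs2, n):
    """K_n(sqrt(n) z, sqrt(n) z) = (1/pi) P(Poisson(n |z|^2) <= n-1),
    with the Poisson probabilities computed stably via logarithms."""
    mu = n * z_abs2
    if mu == 0:
        return 1 / math.pi
    total = 0.0
    for k in range(n):
        total += math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))
    return total / math.pi

n = 500
for r2 in (0.0, 0.25, 0.81, 1.0, 1.21, 4.0):   # |z|^2 inside, on, and outside the unit circle
    print(r2, K_diag(r2, n))
```

Inside the unit disk the values sit at ${1/\pi \approx 0.3183}$, outside they vanish, and exactly on the circle one sees the central limit theorem value ${\frac{1}{2\pi}}$ emerging.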

<p>

<p>

<blockquote><b>Exercise 18</b> For any ${|z| < 1}$ and ${w_1, w_2 \in {\bf C}}$, show that <p align=center>$\displaystyle |K_n( \sqrt{n} z + w_1, \sqrt{n} z + w_2 )| = \frac{1}{\pi} \exp( - |w_1-w_2|^2 / 2 ) + o(1)$</p>
as ${n \rightarrow \infty}$. (The phase of ${K_n}$ contains a factor oscillating in ${n}$, but this factor takes the separable form ${e^{i\theta(w_1)} e^{-i\theta(w_2)}}$ and thus cancels in all determinantal formulae.) This formula (in principle, at least) describes the asymptotic local ${m}$-point correlation functions of the spectrum of gaussian matrices. </blockquote>
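This local limit is easy to witness directly: evaluating the kernel at two points at ${O(1)}$ separation around ${\sqrt{n} z}$ and taking the modulus reproduces the gaussian profile ${\frac{1}{\pi} \exp(-|w_1-w_2|^2/2)}$ essentially exactly, already for moderate ${n}$. A sketch (illustrative parameter choices):

```python
import math

def ginibre_K(a, b, n):
    """K_n(a, b) = (1/pi) e^{-(|a|^2+|b|^2)/2} sum_{k<n} (a conj(b))^k / k!."""
    w = a * b.conjugate()
    term = 1.0 + 0j
    s = 1.0 + 0j
    for k in range(1, n):
        term *= w / k
        s += term
    return s * math.exp(-(abs(a) ** 2 + abs(b) ** 2) / 2) / math.pi

n, z = 400, 0.5 + 0.0j

def local(w1, w2):
    """Modulus of the kernel at two points near sqrt(n) z at O(1) separation."""
    return abs(ginibre_K(math.sqrt(n) * z + w1, math.sqrt(n) * z + w2, n))

for w1, w2 in ((0j, 0j), (0.5 + 0j, 0j), (1j, 0.3 + 0j)):
    print(w1, w2, local(w1, w2), math.exp(-abs(w1 - w2) ** 2 / 2) / math.pi)
```

In fact, since ${-\frac{|a|^2+|b|^2}{2} + \hbox{Re}(a\overline{b}) = -\frac{|a-b|^2}{2}}$ exactly, the only error here is the truncation of ${e^{a\overline{b}}}$ at ${n}$ terms, which is astronomically small well inside the disk; the ${o(1)}$ only becomes visible near ${|z|=1}$.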

<p>

<p>

<blockquote><b>Remark 19</b> One can use the above formulae as the starting point for many other computations on the spectrum of random gaussian matrices; to give just one example, one can show that the expected number of eigenvalues which are real is of the order of ${\sqrt{n}}$ (see <a href=”http://www.ams.org/mathscinet-getitem?mr=1437734″>this paper of Edelman</a> for more precise results of this nature). It remains a challenge to extend these results to more general ensembles than the gaussian ensemble. </blockquote>

<p>

<p>

One theme in this course will be the central role played by the gaussian random variables ${X \equiv N(\mu,\sigma^2)}$. Gaussians have an incredibly rich algebraic structure, and many results about general random variables can be established by first using this structure to verify the result for gaussians, and then using universality techniques (such as the Lindeberg exchange strategy) to extend the results to more general variables.

One way to exploit this algebraic structure is to continuously deform the variance ${t := \sigma^2}$ from an initial variance of zero (so that the random variable is deterministic) to some final level ${T}$. We would like to use this to give a continuous family ${t \mapsto X_t}$ of random variables ${X_t \equiv N(\mu, t)}$ as ${t}$ (viewed as a “time” parameter) runs from ${0}$ to ${T}$.

At present, we have not completely specified what ${X_t}$ should be, because we have only described the individual distribution ${X_t \equiv N(\mu,t)}$ of each ${X_t}$, and not the joint distribution. However, there is a very natural way to specify a joint distribution of this type, known as Brownian motion. In these notes we lay the necessary probability theory foundations to set up this motion, and indicate its connection with the heat equation, the central limit theorem, and the Ornstein-Uhlenbeck process. This is the beginning of stochastic calculus, which we will not develop fully here.

We will begin with one-dimensional Brownian motion, but it is a simple matter to extend the process to higher dimensions. In particular, we can define Brownian motion on vector spaces of matrices, such as the space of ${n \times n}$ Hermitian matrices. This process is equivariant with respect to conjugation by unitary matrices, and so we can quotient out by this conjugation and obtain a new process on the quotient space, or in other words on the spectrum of ${n \times n}$ Hermitian matrices. This process is called Dyson Brownian motion, and turns out to have a simple description in terms of ordinary Brownian motion; it will play a key role in several of the subsequent notes in this course.

Given a set ${S}$, a (simple) point process is a random subset ${A}$ of ${S}$. (A non-simple point process would allow multiplicity; more formally, ${A}$ is no longer a subset of ${S}$, but is a Radon measure on ${S}$, where we give ${S}$ the structure of a locally compact Polish space, but I do not wish to dwell on these sorts of technical issues here.) Typically, ${A}$ will be finite or countable, even when ${S}$ is uncountable. Basic examples of point processes include

• (Bernoulli point process) ${S}$ is an at most countable set, ${0 \leq p \leq 1}$ is a parameter, and ${A}$ a random set such that the events ${x \in A}$ for each ${x \in S}$ are jointly independent and occur with a probability of ${p}$ each. This process is automatically simple.
• (Discrete Poisson point process) ${S}$ is an at most countable space, ${\lambda}$ is a measure on ${S}$ (i.e. an assignment of a non-negative number ${\lambda(\{x\})}$ to each ${x \in S}$), and ${A}$ is a multiset where the multiplicity of ${x}$ in ${A}$ is a Poisson random variable with intensity ${\lambda(\{x\})}$, and the multiplicities of ${x \in A}$ as ${x}$ varies in ${S}$ are jointly independent. This process is usually not simple.
• (Continuous Poisson point process) ${S}$ is a locally compact Polish space with a Radon measure ${\lambda}$, and for each ${\Omega \subset S}$ of finite measure, the number of points ${|A \cap \Omega|}$ that ${A}$ contains inside ${\Omega}$ is a Poisson random variable with intensity ${\lambda(\Omega)}$. Furthermore, if ${\Omega_1,\ldots,\Omega_n}$ are disjoint sets, then the random variables ${|A \cap \Omega_1|, \ldots, |A \cap \Omega_n|}$ are jointly independent. (The fact that Poisson processes exist at all requires a non-trivial amount of measure theory, and will not be discussed here.) This process is almost surely simple iff all points in ${S}$ have measure zero.
• (Spectral point processes) The spectrum of a random matrix is a point process in ${{\mathbb C}}$ (or in ${{\mathbb R}}$, if the random matrix is Hermitian). If the spectrum is almost surely simple, then the point process is almost surely simple. In a similar spirit, the zeroes of a random polynomial are also a point process.

A remarkable fact is that many natural (simple) point processes are determinantal processes. Very roughly speaking, this means that there exists a positive semi-definite kernel ${K: S \times S \rightarrow {\mathbb R}}$ such that, for any ${x_1,\ldots,x_n \in S}$, the probability that ${x_1,\ldots,x_n}$ all lie in the random set ${A}$ is proportional to the determinant ${\det( (K(x_i,x_j))_{1 \leq i,j \leq n} )}$. Examples of processes known to be determinantal include non-intersecting random walks, spectra of random matrix ensembles such as GUE, and zeroes of polynomials with gaussian coefficients.

I would be interested in finding a good explanation (even at the heuristic level) as to why determinantal processes are so prevalent in practice. I do have a very weak explanation, namely that determinantal processes obey a large number of rather pretty algebraic identities, and so it is plausible that any other process which has a very algebraic structure (in particular, any process involving gaussians, characteristic polynomials, etc.) would be connected in some way with determinantal processes. I’m not particularly satisfied with this explanation, but I thought I would at least describe some of these identities below to support this case. (This is partly for my own benefit, as I am trying to learn about these processes, particularly in connection with the spectral distribution of random matrices.) The material here is partly based on this survey of Hough, Krishnapur, Peres, and Virág.

The Riemann zeta function ${\zeta(s)}$, defined for ${\hbox{Re}(s)>1}$ by

$\displaystyle \zeta(s) := \sum_{n=1}^\infty \frac{1}{n^s} \ \ \ \ \ (1)$

and then continued meromorphically to other values of ${s}$ by analytic continuation, is a fundamentally important function in analytic number theory, as it is connected to the primes ${p=2,3,5,\ldots}$ via the Euler product formula

$\displaystyle \zeta(s) = \prod_p (1 - \frac{1}{p^s})^{-1} \ \ \ \ \ (2)$

(for ${\hbox{Re}(s) > 1}$, at least), where ${p}$ ranges over primes. (The equivalence between (1) and (2) is essentially the generating function version of the fundamental theorem of arithmetic.) The function ${\zeta}$ has a pole at ${1}$ and a number of zeroes ${\rho}$. A formal application of the factor theorem gives

$\displaystyle \zeta(s) = \frac{1}{s-1} \prod_\rho (s-\rho) \times \ldots \ \ \ \ \ (3)$

where ${\rho}$ ranges over zeroes of ${\zeta}$, and we will be vague about what the ${\ldots}$ factor is, how to make sense of the infinite product, and exactly which zeroes of ${\zeta}$ are involved in the product. Equating (2) and (3) and taking logarithms gives the formal identity

$\displaystyle - \log \zeta(s) = \sum_p \log(1 - \frac{1}{p^s}) = \log(s-1) - \sum_\rho \log(s-\rho) + \ldots; \ \ \ \ \ (4)$

using the Taylor expansion

$\displaystyle \log(1 - \frac{1}{p^s}) = - \frac{1}{p^s} - \frac{1}{2 p^{2s}} - \frac{1}{3p^{3s}} - \ldots \ \ \ \ \ (5)$

and differentiating the above identity in ${s}$ yields the formal identity

$\displaystyle - \frac{\zeta'(s)}{\zeta(s)} = \sum_n \frac{\Lambda(n)}{n^s} = \frac{1}{s-1} - \sum_\rho \frac{1}{s-\rho} + \ldots \ \ \ \ \ (6)$

where ${\Lambda(n)}$ is the von Mangoldt function, defined to be ${\log p}$ when ${n}$ is a power of a prime ${p}$, and zero otherwise. Thus we see that the behaviour of the primes (as encoded by the von Mangoldt function) is intimately tied to the distribution of the zeroes ${\rho}$. For instance, if we knew that the zeroes were far away from the axis ${\hbox{Re}(s)=1}$, then we would heuristically have

$\displaystyle \sum_n \frac{\Lambda(n)}{n^{1+it}} \approx \frac{1}{it}$

for real ${t}$. On the other hand, the integral test suggests that

$\displaystyle \sum_n \frac{1}{n^{1+it}} \approx \frac{1}{it}$

and thus we see that ${\frac{\Lambda(n)}{n}}$ and ${\frac{1}{n}}$ have essentially the same (multiplicative) Fourier transform:

$\displaystyle \sum_n \frac{\Lambda(n)}{n^{1+it}} \approx \sum_n \frac{1}{n^{1+it}}.$

Inverting the Fourier transform (or performing a contour integral closely related to the inverse Fourier transform), one is led to the prime number theorem

$\displaystyle \sum_{n \leq x} \Lambda(n) \approx \sum_{n \leq x} 1.$

In fact, the standard proof of the prime number theorem basically proceeds by making all of the above formal arguments precise and rigorous.
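The final approximation ${\sum_{n \leq x} \Lambda(n) \approx x}$ is easy to witness numerically; here is a small sketch (not from the text) computing the Chebyshev function ${\psi(x) = \sum_{n \leq x} \Lambda(n)}$ with a smallest-prime-factor sieve:

```python
import math

def mangoldt_psi(x):
    """Chebyshev function psi(x) = sum_{n <= x} Lambda(n), where Lambda(n) = log p
    if n is a power of the prime p and 0 otherwise."""
    spf = list(range(x + 1))                   # smallest prime factor of each n
    for p in range(2, math.isqrt(x) + 1):
        if spf[p] == p:                        # p is prime
            for m in range(p * p, x + 1, p):
                if spf[m] == m:
                    spf[m] = p
    psi = 0.0
    for n in range(2, x + 1):
        p, m = spf[n], n
        while m % p == 0:
            m //= p
        if m == 1:                             # n is a prime power p^k
            psi += math.log(p)
    return psi

for x in (10 ** 3, 10 ** 4, 10 ** 5):
    print(x, mangoldt_psi(x) / x)   # approaches 1, the prime number theorem
```

The ratio ${\psi(x)/x}$ is already within a fraction of a percent of ${1}$ at ${x = 10^5}$; the prime number theorem asserts that it converges to ${1}$.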

Unfortunately, we don’t know as much about the zeroes ${\rho}$ of the zeta function (and hence, about the ${\zeta}$ function itself) as we would like. The Riemann hypothesis (RH) asserts that all the zeroes (except for the “trivial” zeroes at the negative even numbers) lie on the critical line ${\hbox{Re}(s)=1/2}$; this hypothesis would make the error terms in the above proof of the prime number theorem significantly more accurate. Furthermore, the stronger GUE hypothesis asserts in addition to RH that the local distribution of these zeroes on the critical line should behave like the local distribution of the eigenvalues of a random matrix drawn from the gaussian unitary ensemble (GUE). I will not give a precise formulation of this hypothesis here, except to say that the adjective “local” in the context of distribution of zeroes ${\rho}$ means something like “at scale ${O(1/\log T)}$ when ${\hbox{Im}(s) = O(T)}$“.

Nevertheless, we do know some reasonably non-trivial facts about the zeroes ${\rho}$ and the zeta function ${\zeta}$, either unconditionally, or assuming RH (or GUE). Firstly, there are no zeroes for ${\hbox{Re}(s)>1}$ (as one can already see from the convergence of the Euler product (2) in this case) or for ${\hbox{Re}(s)=1}$ (this is trickier, relying on (6) and the elementary observation that

$\displaystyle \hbox{Re}( 3\frac{\Lambda(n)}{n^{\sigma}} + 4\frac{\Lambda(n)}{n^{\sigma+it}} + \frac{\Lambda(n)}{n^{\sigma+2it}} ) = 2\frac{\Lambda(n)}{n^\sigma} (1+\cos(t \log n))^2$

is non-negative for ${\sigma > 1}$ and ${t \in {\mathbb R}}$); from the functional equation

$\displaystyle \pi^{-s/2} \Gamma(s/2) \zeta(s) = \pi^{-(1-s)/2} \Gamma((1-s)/2) \zeta(1-s)$

(which can be viewed as a consequence of the Poisson summation formula, see e.g. my blog post on this topic) we know that there are no zeroes for ${\hbox{Re}(s) \leq 0}$ either (except for the trivial zeroes at negative even integers, corresponding to the poles of the Gamma function). Thus all the non-trivial zeroes lie in the critical strip ${0 < \hbox{Re}(s) < 1}$.
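The ${3 + 4\cos\theta + \cos 2\theta \geq 0}$ trick exponentiates (via (2) and (5)) to the classical inequality ${\zeta(\sigma)^3 |\zeta(\sigma+it)|^4 |\zeta(\sigma+2it)| \geq 1}$ for ${\sigma > 1}$, which is what forbids a zero on the line ${\hbox{Re}(s)=1}$. A numerical sketch of both facts (the truncation ${N = 10^4}$ and the sample points are ad hoc choices):

```python
import cmath, math

def zeta(s, N=10000):
    """zeta(s) for Re(s) > 1: truncated Dirichlet series plus the first two
    Euler-Maclaurin correction terms (ample accuracy for this check)."""
    total = sum(cmath.exp(-s * math.log(m)) for m in range(1, N))
    total += cmath.exp((1 - s) * math.log(N)) / (s - 1)   # integral approximation of the tail
    total += cmath.exp(-s * math.log(N)) / 2              # boundary correction
    return total

def bound341(sigma, t):
    """zeta(sigma)^3 |zeta(sigma+it)|^4 |zeta(sigma+2it)|, which is >= 1 for sigma > 1."""
    return (abs(zeta(sigma + 0j)) ** 3 * abs(zeta(sigma + 1j * t)) ** 4
            * abs(zeta(sigma + 2j * t)))

# the pointwise inequality, which is just the identity 2 (1 + cos theta)^2 >= 0
assert all(3 + 4 * math.cos(th) + math.cos(2 * th) >= -1e-12
           for th in [i * 0.05 for i in range(200)])

for t in (1.0, 5.0, 14.0, 25.0):
    print(t, bound341(1.2, t))   # always at least 1
```

Note that ${t = 14}$ sits close to the height of the first non-trivial zero ${\tfrac12 + 14.13\ldots i}$; the product dips there but, as the inequality guarantees, never below ${1}$, and this is exactly the tension that pushes zeroes off the line ${\hbox{Re}(s)=1}$.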

We also know that there are infinitely many non-trivial zeroes, and can approximately count how many zeroes there are in any large bounded region of the critical strip. For instance, for large ${T}$, the number of zeroes ${\rho}$ in this strip with ${\hbox{Im}(\rho) = T+O(1)}$ is ${O(\log T)}$. This can be seen by applying (6) to ${s = 2+iT}$ (say); the trivial zeroes at the negative even integers end up giving a contribution of ${O(\log T)}$ to this sum (this is a heavily disguised variant of Stirling’s formula, as one can view the trivial zeroes as essentially being poles of the Gamma function), while the ${\frac{1}{s-1}}$ and ${\ldots}$ terms end up being negligible (of size ${O(1)}$), while each non-trivial zero ${\rho}$ contributes a term which has a non-negative real part, and furthermore has size comparable to ${1}$ if ${\hbox{Im}(\rho) = T+O(1)}$. (Here I am glossing over a technical renormalisation needed to make the infinite series in (6) converge properly.) Meanwhile, the left-hand side of (6) is absolutely convergent for ${s=2+iT}$ and of size ${O(1)}$, and the claim follows. A more refined version of this argument shows that the number of non-trivial zeroes with ${0 \leq \hbox{Im}(\rho) \leq T}$ is ${\frac{T}{2\pi} \log \frac{T}{2\pi} - \frac{T}{2\pi} + O(\log T)}$, but we will not need this more precise formula here. (A fair fraction – at least 40%, in fact – of these zeroes are known to lie on the critical line; see this earlier blog post of mine for more discussion.)

Another thing that we happen to know is how the magnitude ${|\zeta(1/2+it)|}$ of the zeta function is distributed as ${t \rightarrow \infty}$; it turns out to be log-normally distributed with log-variance about ${\frac{1}{2} \log \log t}$. More precisely, we have the following result of Selberg:

Theorem 1 Let ${T}$ be a large number, and let ${t}$ be chosen uniformly at random from between ${T}$ and ${2T}$ (say). Then the distribution of ${\frac{1}{\sqrt{\frac{1}{2} \log \log T}} \log |\zeta(1/2+it)|}$ converges (in distribution) to the normal distribution ${N(0,1)}$.

To put it more informally, ${\log |\zeta(1/2+it)|}$ behaves like ${\sqrt{\frac{1}{2} \log \log t} \times N(0,1)}$ plus lower order terms for “typical” large values of ${t}$. (Zeroes ${\rho}$ of ${\zeta}$ are, of course, certainly not typical, but one can show that one can usually stay away from these zeroes.) In fact, Selberg showed a slightly more precise result, namely that for any fixed ${k \geq 1}$, the ${k^{th}}$ moment of ${\frac{1}{\sqrt{\frac{1}{2} \log \log T}} \log |\zeta(1/2+it)|}$ converges to the ${k^{th}}$ moment of ${N(0,1)}$.

Remarkably, Selberg’s result does not need RH or GUE, though it is certainly consistent with such hypotheses. (For instance, the determinant of a GUE matrix asymptotically obeys a remarkably similar log-normal law to that given by Selberg’s theorem.) Indeed, the net effect of these hypotheses only affects some error terms in ${\log |\zeta(1/2+it)|}$ of magnitude ${O(1)}$, which are thus asymptotically negligible compared to the main term, which has magnitude about ${\sqrt{\log \log T}}$. So Selberg’s result, while very pretty, manages to finesse the question of what the zeroes ${\rho}$ of ${\zeta}$ are actually doing – he makes the primes do most of the work, rather than the zeroes.

Selberg never actually published the above result, but it is reproduced in a number of places (e.g. in this book by Joyner, or this book by Laurincikas). As with many other results in analytic number theory, the actual details of the proof can get somewhat technical; but I would like to record here (partly for my own benefit) an informal sketch of some of the main ideas in the argument.