
Van Vu and I have just uploaded to the arXiv our paper “Random matrices: Universality of local spectral statistics of non-Hermitian matrices”. The main result of this paper is a “Four Moment Theorem” that establishes universality for local spectral statistics of non-Hermitian matrices with independent entries, under the additional hypotheses that the entries of the matrix decay exponentially, and match moments with either the real or complex gaussian ensemble to fourth order. This is the non-Hermitian analogue of a long string of recent results establishing universality of local statistics in the Hermitian case (as discussed for instance in this recent survey of Van and myself, and also in several other places).

The complex case is somewhat easier to describe. Given a (non-Hermitian) random matrix ensemble ${M_n}$ of ${n \times n}$ matrices, one can arbitrarily enumerate the (geometric) eigenvalues as ${\lambda_1(M_n),\ldots,\lambda_n(M_n) \in {\bf C}}$, and one can then define the ${k}$-point correlation functions ${\rho^{(k)}_n: {\bf C}^k \rightarrow {\bf R}^+}$ to be the symmetric functions such that

$\displaystyle \int_{{\bf C}^k} F(z_1,\ldots,z_k) \rho^{(k)}_n(z_1,\ldots,z_k)\ dz_1 \ldots dz_k$

$\displaystyle = {\bf E} \sum_{1 \leq i_1 < \ldots < i_k \leq n} F(\lambda_{i_1}(M_n),\ldots,\lambda_{i_k}(M_n)).$

In the case when ${M_n}$ is drawn from the complex gaussian ensemble, so that all the entries are independent complex gaussians of mean zero and variance one, it is a classical result of Ginibre that the asymptotics of ${\rho^{(k)}_n}$ near some point ${z \sqrt{n}}$ as ${n \rightarrow \infty}$ and ${z \in {\bf C}}$ is fixed are given by the determinantal rule

$\displaystyle \rho^{(k)}_n(z\sqrt{n} + w_1,\ldots,z\sqrt{n}+w_k) \rightarrow \hbox{det}( K(w_i,w_j) )_{1 \leq i,j \leq k} \ \ \ \ \ (1)$

for ${|z| < 1}$ and

$\displaystyle \rho^{(k)}_n(z\sqrt{n} + w_1,\ldots,z\sqrt{n}+w_k) \rightarrow 0$

for ${|z| > 1}$, where ${K}$ is the reproducing kernel

$\displaystyle K(z,w) := \frac{1}{\pi} e^{-|z|^2/2 - |w|^2/2 + z \overline{w}}.$

(There is also an asymptotic for the boundary case ${|z|=1}$, but it is more complicated to state.) In particular, we see that ${\rho^{(1)}_n(z \sqrt{n}) \rightarrow \frac{1}{\pi} 1_{|z| \leq 1}}$ for almost every ${z}$, which is a manifestation of the well-known circular law for these matrices; but the circular law only captures the macroscopic structure of the spectrum, whereas the asymptotic (1) describes the microscopic structure.
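As a quick sanity check on (1), the determinant can be evaluated directly for small ${k}$. The following Python sketch (the function names `K`, `rho1`, `rho2` and the sample points are illustrative choices, not taken from the paper) confirms numerically that the ${k=1}$ limit is the constant ${1/\pi}$, and that the ${k=2}$ limit equals ${\pi^{-2}(1 - e^{-|w_1-w_2|^2})}$, which vanishes as ${w_1 \rightarrow w_2}$.

```python
import cmath
import math

def K(z, w):
    """Limiting kernel K(z,w) = (1/pi) exp(-|z|^2/2 - |w|^2/2 + z conj(w))."""
    return cmath.exp(-abs(z) ** 2 / 2 - abs(w) ** 2 / 2 + z * w.conjugate()) / math.pi

def rho1(w):
    """k = 1 case of (1): the 1x1 determinant K(w, w)."""
    return K(w, w).real

def rho2(w1, w2):
    """k = 2 case of (1): the 2x2 determinant det(K(w_i, w_j))."""
    return (K(w1, w1) * K(w2, w2) - K(w1, w2) * K(w2, w1)).real

# Arbitrary sample points (an assumption made purely for illustration).
w1, w2 = 0.3 + 0.1j, -0.2 + 0.5j
closed_form = (1 - math.exp(-abs(w1 - w2) ** 2)) / math.pi ** 2
print(rho1(w1), 1 / math.pi)
print(rho2(w1, w2), closed_form)
```

The vanishing of `rho2` on the diagonal is the usual determinantal form of eigenvalue repulsion.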

Our first main result is that the asymptotic (1) for ${|z|<1}$ also holds (in the sense of vague convergence) when ${M_n}$ is a matrix whose entries are independent with mean zero, variance one, exponentially decaying tails, and which all match moments with the complex gaussian to fourth order. (Actually we prove a stronger result than this which is valid for all bounded ${z}$ and has more uniform bounds, but is a bit more technical to state.) An analogous result is also established for real gaussians (but now one has to separate the correlation function into components depending on how many eigenvalues are real and how many are strictly complex; also, the limiting distribution is more complicated, being described by Pfaffians rather than determinants). Among other things, this allows us to partially extend some known results on complex or real gaussian ensembles to more general ensembles. For instance, a theorem of Rider establishes a central limit theorem for the number of eigenvalues of a complex gaussian matrix in a mesoscopic disk; from our results, we can extend this central limit theorem to matrices that match the complex gaussian ensemble to fourth order, provided that the disk is small enough (for technical reasons, our error bounds are not strong enough to handle large disks). Similarly, extending some results of Edelman-Kostlan-Shub and of Forrester-Nagao, we can show that for a matrix matching the real gaussian ensemble to fourth order, the number of real eigenvalues is ${\sqrt{\frac{2n}{\pi}} + O(n^{1/2-c})}$ with probability ${1-O(n^{-c})}$ for some absolute constant ${c>0}$.

There are several steps involved in the proof. The first step is to apply the Girko Hermitisation trick to replace the problem of understanding the spectrum of a non-Hermitian matrix, with that of understanding the spectrum of various Hermitian matrices. The two identities that realise this trick are, firstly, Jensen’s formula

$\displaystyle \log |\det(M_n-z_0)| = - \sum_{1 \leq i \leq n: \lambda_i(M_n) \in B(z_0,r)} \log \frac{r}{|\lambda_i(M_n)-z_0|}$

$\displaystyle + \frac{1}{2\pi} \int_0^{2\pi} \log |\det(M_n-z_0-re^{i\theta})|\ d\theta$

that relates the local distribution of eigenvalues to the log-determinants ${\log |\det(M_n-z_0)|}$, and secondly the elementary identity

$\displaystyle \log |\det(M_n - z)| = \frac{1}{2} \log|\det W_{n,z}| + \frac{1}{2} n \log n$

that relates the log-determinants of ${M_n-z}$ to the log-determinants of the Hermitian matrices

$\displaystyle W_{n,z} := \frac{1}{\sqrt{n}} \begin{pmatrix} 0 & M_n -z \\ (M_n-z)^* & 0 \end{pmatrix}.$
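The elementary identity relating ${\log|\det(M_n-z)|}$ to ${\log|\det W_{n,z}|}$ is easy to test numerically, since ${|\det W_{n,z}| = n^{-n} |\det(M_n-z)|^2}$. Here is a minimal sketch in plain Python; the helper `det`, the size ${n=4}$, the shift ${z}$ and the random seed are all assumptions made for illustration, and the matrix is just one complex gaussian sample.

```python
import math
import random

def det(a):
    """Determinant via Gaussian elimination with partial pivoting (complex entries allowed)."""
    a = [row[:] for row in a]
    n = len(a)
    d = 1
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) == 0:
            return 0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d

random.seed(0)
n = 4
z = 0.3 + 0.2j
# Complex gaussian entries with mean zero and variance one.
M = [[complex(random.gauss(0, 1), random.gauss(0, 1)) / math.sqrt(2) for _ in range(n)]
     for _ in range(n)]
A = [[M[i][j] - (z if i == j else 0) for j in range(n)] for i in range(n)]  # M_n - z

# Build the 2n x 2n Hermitian matrix W_{n,z}.
s = 1 / math.sqrt(n)
W = [[0j] * (2 * n) for _ in range(2 * n)]
for i in range(n):
    for j in range(n):
        W[i][n + j] = s * A[i][j]                  # upper right block: M_n - z
        W[n + i][j] = s * A[j][i].conjugate()      # lower left block: (M_n - z)^*

lhs = math.log(abs(det(A)))
rhs = 0.5 * math.log(abs(det(W))) + 0.5 * n * math.log(n)
print(lhs, rhs)
```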

The main difficulty is then to obtain concentration and universality results for the Hermitian log-determinants ${\log|\det W_{n,z}|}$. This turns out to be a task that is analogous to the task of obtaining concentration for Wigner matrices (as we did in this recent paper), as well as central limit theorems for log-determinants of Wigner matrices (as we did in this other recent paper). In both of these papers, the main idea was to use the Four Moment Theorem for Wigner matrices (which can now be proven relatively easily by a combination of the local semi-circular law and resolvent swapping methods), combined with (in the latter paper) a central limit theorem for the gaussian unitary ensemble (GUE). This latter task was achieved by using the convenient Trotter normal form to tridiagonalise a GUE matrix, which has the effect of revealing the determinant of that matrix as the solution to a certain linear stochastic difference equation, and one can analyse the distribution of that solution via such tools as the martingale central limit theorem.

The matrices ${W_{n,z}}$ are somewhat more complicated than Wigner matrices (for instance, the semi-circular law must be replaced by a distorted Marchenko-Pastur law), but the same general strategy works to obtain concentration and universality for their log-determinants. The main new difficulty that arises is that the analogue of the Trotter normal form for gaussian random matrices is not tridiagonal, but rather Hessenberg (i.e. upper-triangular except for the lower diagonal). This ultimately has the effect of expressing the relevant determinant as the solution to a nonlinear stochastic difference equation, which is a bit trickier to solve for. Fortunately, it turns out that one only needs good lower bounds on the solution, as one can use the second moment method to upper bound the determinant and hence the log-determinant (following a classical computation of Turan). This simplifies the analysis of the equation somewhat.

While this result is the first local universality result in the category of random matrices with independent entries, there are still two limitations to the result which one would like to remove. The first is the moment matching hypothesis on the matrix. Very recently, one of the ingredients of our paper, namely the local circular law, was proved without moment matching hypotheses by Bourgade, Yau, and Yin (provided one stays away from the edge of the spectrum); however, as of this time of writing the other main ingredient – the universality of the log-determinant – still requires moment matching. (The standard tool for obtaining universality without moment matching hypotheses is the heat flow method (and more specifically, the local relaxation flow method), but the analogue of Dyson Brownian motion in the non-Hermitian setting appears to be somewhat intractable, being a coupled flow on both the eigenvalues and eigenvectors rather than just on the eigenvalues alone.)

Van Vu and I have just uploaded to the arXiv our paper “The Wigner-Dyson-Mehta bulk universality conjecture for Wigner matrices”, submitted to the Proceedings of the National Academy of Sciences. This short note concerns the convergence of the ${k}$-point correlation functions of Wigner matrices in the bulk to the Dyson ${k}$-point functions, a statement conjectured by Wigner, Dyson, and Mehta. Thanks to the results of Erdös, Peche, Ramirez, Schlein, Vu, Yau, and myself, this conjecture has now been established for all Wigner matrices (assuming a finite moment condition on the entries), but only if one uses a quite weak notion of convergence, namely averaged vague convergence in which one averages in the energy parameter ${u}$. The main purpose of this note is to observe that by combining together existing results in the literature, one can improve the convergence to vague convergence (which is the natural notion of convergence in the discrete setting); and furthermore, if one assumes some regularity and decay conditions on the coefficient distribution, one can improve the convergence further to local ${L^1}$ convergence.

More precisely, let ${M_n}$ be an ${n \times n}$ Wigner matrix – a random Hermitian matrix whose off-diagonal elements ${\frac{1}{\sqrt{n}} \zeta_{ij}}$ for ${1 \leq i < j \leq n}$ are iid with mean zero and variance ${1/n}$ (and whose diagonal elements also obey similar hypotheses, which we omit here). For simplicity, we also assume that the real and imaginary parts of ${\zeta_{ij}}$ are also iid (as is the case for instance for the Gaussian Unitary Ensemble (GUE)). The eigenvalues ${\lambda_1(M_n) \leq \ldots \leq \lambda_n(M_n)}$ of such a matrix are known to be asymptotically distributed according to the Wigner semicircular distribution ${\rho_{sc}(u)\ du}$, where

$\displaystyle \rho_{sc}(u) := \frac{1}{2\pi} (4-u^2)_+^{1/2}.$

In particular, this suggests that at any energy level ${u}$ in the bulk ${(-2,2)}$ of the spectrum, the average eigenvalue spacing should be about ${\frac{1}{n \rho_{sc}(u)}}$. It is then natural to introduce the normalised ${k}$-point correlation function

$\displaystyle \rho_{n,u}^{(k)}(t_1,\ldots,t_k) := \lim_{\epsilon \rightarrow 0} \frac{1}{\epsilon^k} {\bf P} E_\epsilon$

for any distinct reals ${t_1,\ldots,t_k}$ and ${k \geq 1}$, where ${E_\epsilon}$ is the event that there is an eigenvalue in each of the intervals ${[u + \frac{t_i}{n \rho_{sc}(u)}, u + \frac{t_i+\epsilon}{n \rho_{sc}(u)}]}$ for each ${1 \leq i \leq k}$. (This definition is valid when the Wigner ensemble is continuous; for discrete ensembles, one can define ${\rho_{n,u}^{(k)}}$ instead in a distributional sense.)
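The normalisation by ${n \rho_{sc}(u)}$ above can be made concrete with a few lines of Python. This is only a sketch (the function names and the midpoint-rule integration are choices made here, not part of the paper): it checks that ${\rho_{sc}}$ has total mass one, and evaluates the heuristic mean spacing ${\frac{1}{n \rho_{sc}(u)}}$, which equals ${\pi/n}$ at the centre ${u=0}$ of the bulk.

```python
import math

def rho_sc(u):
    """Semicircular density (1/(2 pi)) * sqrt((4 - u^2)_+)."""
    return math.sqrt(max(4.0 - u * u, 0.0)) / (2 * math.pi)

def total_mass(steps=200_000):
    """Midpoint-rule approximation to the integral of rho_sc over [-2, 2]."""
    h = 4.0 / steps
    return sum(rho_sc(-2.0 + (i + 0.5) * h) for i in range(steps)) * h

def mean_spacing(u, n):
    """Heuristic mean eigenvalue spacing 1/(n rho_sc(u)) at a bulk energy u."""
    return 1.0 / (n * rho_sc(u))

mass = total_mass()
print(mass)                     # should be close to 1
print(mean_spacing(0.0, 1000))  # pi/1000 at the centre of the bulk
print(mean_spacing(1.9, 1000))  # larger spacing near the edge, where the density is small
```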

The Wigner-Dyson-Mehta conjecture asserts that ${\rho_{n,u}^{(k)}}$ converges (in various senses) as ${n \rightarrow \infty}$ to the Dyson ${k}$-point function

$\displaystyle \rho_{Dyson}^{(k)}(t_1,\ldots,t_k) := \hbox{det}( K( t_i,t_j) )_{1 \leq i,j \leq k}$

where ${K(t,t'):=\frac{\sin \pi(t-t')}{\pi(t-t')}}$ is the Dyson sine kernel. This conjecture was verified first for the GUE (with a quite strong notion of convergence, namely local uniform convergence) by Dyson, using an explicit formula for ${\rho_{n,u}^{(k)}}$ in the GUE case due to Gaudin and Mehta. Later results of Johansson, Erdos-Ramirez-Schlein-Yau, Erdos-Peche-Ramirez-Schlein-Yau, and Vu and myself, extended these results to increasingly wider ranges of Wigner matrices, but in the context of either weak convergence (which means that

$\displaystyle \int_{{\bf R}^k} \rho_{n,u}^{(k)}(t) F(t)\ dt \rightarrow \int_{{\bf R}^k} \rho_{Dyson}^{(k)}(t) F(t)\ dt \ \ \ \ \ (1)$

for any ${L^\infty}$, compactly supported function ${F}$), or the slightly weaker notion of vague convergence (which is the same as weak convergence, except that the function ${F}$ is also required to be continuous).
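To get a feel for the limiting object, one can evaluate the Dyson ${2}$-point function explicitly: the ${2 \times 2}$ determinant is ${1 - (\frac{\sin \pi(t_1-t_2)}{\pi(t_1-t_2)})^2}$. A minimal Python sketch (function names are illustrative) is below; it shows the function vanishing quadratically as ${t_1 \rightarrow t_2}$ and approaching ${1}$ at large separations.

```python
import math

def sine_kernel(t, s):
    """Dyson sine kernel sin(pi (t - s)) / (pi (t - s)), equal to 1 on the diagonal."""
    x = math.pi * (t - s)
    return 1.0 if x == 0 else math.sin(x) / x

def dyson_two_point(t1, t2):
    """k = 2 Dyson function: the 2x2 determinant det(K(t_i, t_j))."""
    return (sine_kernel(t1, t1) * sine_kernel(t2, t2)
            - sine_kernel(t1, t2) * sine_kernel(t2, t1))

for gap in (0.01, 0.5, 1.0, 3.0):
    print(gap, dyson_two_point(0.0, gap))
```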

In a joint paper of Erdos, Ramirez, Schlein, Vu, Yau, and myself, we established the Wigner-Dyson-Mehta conjecture for all Wigner matrices (assuming only an exponential decay condition on the entries), but using a quite weak notion of convergence, namely averaged vague convergence, which allows for averaging in the energy parameter. Specifically, we showed that

$\displaystyle \lim_{b \rightarrow 0} \lim_{n \rightarrow \infty} \frac{1}{2b} \int_{u-b}^{u+b} \int_{{\bf R}^k} \rho_{n,u'}^{(k)}(t) F(t)\ dt\ du' = \int_{{\bf R}^k} \rho_{Dyson}^{(k)}(t) F(t)\ dt.$

Subsequently, Erdos, Schlein, and Yau introduced the powerful local relaxation flow method, which achieved a simpler proof of the same result which also generalised to other ensembles beyond the Wigner case. However, for technical reasons, this method was restricted to establishing averaged vague convergence only.

In the current paper, we show that by combining the argument of Erdos, Ramirez, Schlein, Vu, Yau, and myself with some more recent technical results, namely the relaxation of the exponential decay condition in the four moment theorem to a finite moment condition (established by Vu and myself) and a strong eigenvalue localisation bound of Erdos, Yau, and Yin, one can upgrade the averaged vague convergence to vague convergence, and handle all Wigner matrices that obey a finite moment condition. Vague convergence is the most natural notion of convergence for discrete random matrix ensembles; for such ensembles, the correlation function is a discrete measure, and so one does not expect convergence to a continuous limit in any stronger sense than the vague sense. Also, by carefully inspecting the earlier argument of Erdos, Peche, Ramirez, Schlein, and Yau, we were able to establish convergence in the stronger local ${L^1}$ sense once one assumes some regularity and positivity condition on the underlying coefficient distribution. These are somewhat modest and technical improvements over previous work on the Wigner-Dyson-Mehta conjecture, but they help to clarify and organise the profusion of results in this area, which are now reaching a fairly definitive form.

It may well be possible to go beyond local ${L^1}$ convergence in the case of smooth ensembles, for instance establishing local uniform convergence; this was recently accomplished in the ${k=1}$ case by Maltsev and Schlein. Indeed one may optimistically expect to even have convergence in the local smooth topology, which would basically be the strongest convergence one could hope for.

Last week I gave a talk at the Trinity Mathematical Society at Trinity College, Cambridge UK.  As the audience was primarily undergraduate, I gave a fairly non-technical talk on the universality phenomenon, based on this blog article of mine on the same topic.  It was a quite light and informal affair, and this is reflected in the talk slides (which, in particular, play up quite strongly the role of former students and Fellows of Trinity College in this story).   There was some interest in making these slides available publicly, so I have placed them on this site here.  (Note: copyright for the images in these slides has not been secured.)

I’ve spent the last week or so reworking the first draft of my universality article for Mathematics Awareness Month, in view of the useful comments and feedback received on that draft here on this blog, as well as elsewhere.  In fact, I ended up rewriting the article from scratch, and expanding it substantially, in order to focus on a more engaging and less technical narrative.  I found that I had to use a substantially different mindset than the one I am used to having for technical expository writing; indeed, the exercise reminded me more of my high school English assignments than of my professional work.  (This is perhaps a bad sign: English was not exactly my strongest subject as a student.)

The piece now has title: “E pluribus unum: from complexity, universality”.  This is a somewhat US-centric piece of wordplay, but Mathematics Awareness Month is, after all, a US-based initiative, even though awareness of mathematics certainly transcends national boundaries.   Still, it is a trivial matter to modify the title later if a better proposal arises, and I am sure that if I send this text to be published, that the editors may have some suggestions in this regard.

By coincidence, I moved up and expanded the other US-centric item – the discussion of the 2008 US presidential elections – to the front of the paper to play the role of the hook.  I’ll try to keep the Commonwealth spelling conventions, though. :-)

I decided to cut out the discussion of the N-body problem for various values of N, in part due to the confusion over the notion of a “solution”; there is a nice mathematical story there, but perhaps one that gets in the way of the main story of universality.

I have added a fair number of relevant images, though some of them will have to be changed in the final version for copyright reasons.  The narrow column format of this blog means that the image placement is not optimal, but I am sure that this can be rectified if this article is published professionally.


The month of April has been designated as Mathematics Awareness Month by the major American mathematics organisations (the AMS, ASA, MAA, and SIAM).  I was approached to write a popular mathematics article for April 2011 (the theme for that month is “Mathematics and Complexity”).  While I have written a fair number of expository articles (including several on this blog) aimed at a mathematical audience, I actually have not had much experience writing articles at the popular mathematics level, and so I found this task to be remarkably difficult.  At this level of exposition, one not only needs to explain the facts, but also to tell a story; I have experience in the former but not in the latter.

I decided to write on the topic of universality – the phenomenon that the macroscopic behaviour of a dynamical system can be largely independent of the precise microscopic structure.   Below the fold is a first draft of the article; I would definitely welcome feedback and corrections.  It does not yet have any pictures, but I plan to rectify that in the final draft.  It also does not have a title, but this will be easy to address later.   But perhaps the biggest thing lacking right now is a narrative “hook”; I don’t yet have any good ideas as to how to make the story of universality compelling to a lay audience.  Any suggestions in this regard would be particularly appreciated.

I have not yet decided where I would try to publish this article; in fact, I might just publish it here on this blog (and eventually, in one of the blog book compilations).

Van Vu and I have just uploaded to the arXiv our paper “Random covariance matrices: Universality of local statistics of eigenvalues”, to be submitted shortly. This paper draws heavily on the technology of our previous paper, in which we established a Four Moment Theorem for the local spacing statistics of eigenvalues of Wigner matrices. This theorem says, roughly speaking, that these statistics are completely determined by the first four moments of the coefficients of such matrices, at least in the bulk of the spectrum. (In a subsequent paper we extended the Four Moment Theorem to the edge of the spectrum.)

In this paper, we establish the analogous result for the singular values of rectangular iid matrices ${M = M_{n,p}}$, or (equivalently) the eigenvalues of the associated covariance matrix ${\frac{1}{n} M M^*}$. As is well-known, there is a parallel theory between the spectral theory of random Wigner matrices and that of covariance matrices; for instance, just as the former has asymptotic spectral distribution governed by the semi-circular law, the latter has asymptotic spectral distribution governed by the Marchenko-Pastur law. One reason for the connection can be seen by noting that the singular values of a rectangular matrix ${M}$ are essentially the same thing as the eigenvalues of the augmented matrix

$\displaystyle \begin{pmatrix} 0 & M \\ M^* & 0\end{pmatrix}$

after eliminating sign ambiguities and degeneracies. So one can view singular values of a rectangular iid matrix as the eigenvalues of a matrix which resembles a Wigner matrix, except that two diagonal blocks of that matrix have been zeroed out.
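The relationship between the augmented matrix and the singular values can be checked at the level of characteristic polynomials: for a real ${n \times p}$ matrix ${M}$ and ${\lambda \neq 0}$, a Schur complement computation gives ${\det(\lambda - A) = \lambda^{n-p} \det(\lambda^2 - M^* M)}$, where ${A}$ is the augmented matrix, so the nonzero eigenvalues of ${A}$ are plus or minus the singular values of ${M}$. The Python sketch below tests this numerically; the helper names, the ${4 \times 2}$ shape, the real gaussian entries and the test value of ${\lambda}$ are all assumptions made for illustration.

```python
import random

def det(a):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in a]
    n = len(a)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) == 0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d

def char_poly_augmented(M, lam):
    """det(lam*I - A) for A = [[0, M], [M^T, 0]], with M real of size n x p."""
    n, p = len(M), len(M[0])
    size = n + p
    B = [[(lam if i == j else 0.0) for j in range(size)] for i in range(size)]
    for i in range(n):
        for j in range(p):
            B[i][n + j] -= M[i][j]   # upper right block: M
            B[n + j][i] -= M[i][j]   # lower left block: M^T
    return det(B)

def char_poly_gram(M, lam):
    """lam^(n-p) * det(lam^2 I - M^T M); its roots are 0 and +-(singular values of M)."""
    n, p = len(M), len(M[0])
    G = [[sum(M[k][i] * M[k][j] for k in range(n)) for j in range(p)] for i in range(p)]
    C = [[(lam * lam if i == j else 0.0) - G[i][j] for j in range(p)] for i in range(p)]
    return lam ** (n - p) * det(C)

random.seed(1)
M = [[random.gauss(0, 1) for _ in range(2)] for _ in range(4)]
lam = 1.7
lhs = char_poly_augmented(M, lam)
rhs = char_poly_gram(M, lam)
print(lhs, rhs)
```

The factor ${\lambda^{n-p}}$ accounts for the degenerate zero eigenvalues, and the dependence on ${\lambda^2}$ only accounts for the sign ambiguity.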

The zeroing out of these elements prevents one from applying the entire Wigner universality theory directly to the covariance matrix setting (in particular, the crucial Talagrand concentration inequality for the magnitude of a projection of a random vector to a subspace does not work perfectly once there are many zero coefficients). Nevertheless, a large part of the theory (particularly the deterministic components of the theory, such as eigenvalue variation formulae) carries through without much difficulty. The one place where one has to spend a bit of time to check details is to ensure that the Erdos-Schlein-Yau delocalisation result (that asserts, roughly speaking, that the eigenvectors of a Wigner matrix are about as small in ${\ell^\infty}$ norm as one could hope to get) is also true in the covariance matrix setting, but this is a straightforward (though somewhat tedious) adaptation of the method (which is based on the Stieltjes transform).

As an application, we extend the sine kernel distribution of local covariance matrix statistics, first established in the case of Wishart ensembles (when the underlying variables are gaussian) by Nagao and Wadati, and later extended to gaussian-divisible matrices by Ben Arous and Peche, to any distribution which matches one of these distributions to up to four moments; this covers virtually all complex distributions with iid real and imaginary parts, with basically the lone exception of the complex Bernoulli ensemble.

Recently, Erdos, Schlein, Yau, and Yin generalised their local relaxation flow method to also obtain similar universality results for distributions which have a large amount of smoothness, but without any matching moment conditions. By combining their techniques with ours as in our joint paper, one should probably be able to remove both smoothness and moment conditions, in particular now covering the complex Bernoulli ensemble.

In this paper we also record a new observation that the exponential decay hypothesis in our earlier paper can be relaxed to a finite moment condition, for a sufficiently high (but fixed) moment. This is done by rearranging the order of steps of the original argument carefully.

Van Vu and I have just uploaded to the arXiv our paper “Random matrices: Universality of local eigenvalue statistics up to the edge”, submitted to Comm. Math. Phys.  This is a sequel to our previous paper, in which we studied universality of local eigenvalue statistics (such as normalised eigenvalue spacings $\sqrt{n} ( \lambda_{i+1}(M_n) - \lambda_i(M_n) )$) for random matrices $M_n$ of Wigner type, i.e. Hermitian (or symmetric) random matrices in which the upper-triangular entries are independent with mean zero and variance one (for technical reasons we also have to assume an exponential decay condition on the distribution of the entries).   The results in the previous paper were almost entirely focused on the bulk region, in which the index i of the eigenvalues involved was in the range $\varepsilon n \leq i \leq (1-\varepsilon) n$.  The main purpose of this paper is to extend the main results of the previous paper all the way up to the edge, thus allowing one to control all indices $1 \leq i \leq n$.  As an application, we obtain a variant of Soshnikov’s well-known result that the largest eigenvalue of Wigner matrices is distributed (after suitable normalisation) according to the Tracy-Widom law when the coefficients are symmetric, by assuming instead that the coefficients have vanishing third moment.

As one transitions from the bulk to the edge, the density of the eigenvalues decreases to zero (in accordance with the Wigner semicircular law), and so the average spacing between eigenvalues increases.  (For instance, the spacing between eigenvalues in the bulk is of size $n^{-1/2}$, but at the extreme edge it increases to $n^{-1/6}$.)  On the one hand, the increase in average spacing should make life easier, because one does not have to work at such a fine spatial scale in order to see the eigenvalue distribution.  On the other hand, a certain key technical step in the previous paper (in which we adapted an argument of Erdos, Schlein, and Yau to show that eigenvectors of Wigner matrices were delocalised) seemed to require eigenvalue spacings to be of size $O(n^{-1/2})$, which was the main technical obstacle to extending the preceding results from the bulk to the edge.
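The two spacing scales $n^{-1/2}$ and $n^{-1/6}$ can be read off from the semicircular law alone, by looking at the gaps between consecutive quantiles (the "classical locations" of the eigenvalues) and rescaling by $\sqrt{n}$ to match the variance one normalisation used here. The Python sketch below is a deterministic heuristic rather than a simulation of random matrices; the function names, the bisection inversion and the choice of comparing $n$ with $64n$ are assumptions made for illustration.

```python
import math

def semicircle_cdf(x):
    """CDF of the semicircular law with density sqrt(4 - x^2) / (2 pi) on [-2, 2]."""
    x = max(-2.0, min(2.0, x))
    return 0.5 + x * math.sqrt(4.0 - x * x) / (4 * math.pi) + math.asin(x / 2) / math.pi

def quantile(q):
    """Invert the CDF by bisection."""
    lo, hi = -2.0, 2.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if semicircle_cdf(mid) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def classical_spacing(i, n):
    """sqrt(n) * (gamma_{i+1} - gamma_i), where gamma_i is the (i/n)-quantile."""
    return math.sqrt(n) * (quantile((i + 1) / n) - quantile(i / n))

n1, n2 = 10 ** 4, 64 * 10 ** 4
bulk_ratio = classical_spacing(n1 // 2, n1) / classical_spacing(n2 // 2, n2)
edge_ratio = classical_spacing(n1 - 1, n1) / classical_spacing(n2 - 1, n2)
print(bulk_ratio)  # compare with 64 ** (1/2) = 8
print(edge_ratio)  # compare with 64 ** (1/6) = 2
```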

The main new observation in the paper is that it was not the eigenvalue spacings $\lambda_{i+1}(M_n) - \lambda_i(M_n)$ which were of importance to eigenvector delocalisation, but rather the somewhat smaller interlaced eigenvalue spacings $\lambda_{i+1}(M_n) - \lambda_i(M_{n-1})$, where $M_{n-1}$ is an $(n-1) \times (n-1)$ minor of $M_n$.  The Cauchy interlacing law asserts that the latter is smaller than the former.  But the interesting thing is that at the edge (when i is close to n), the interlaced spacings are much smaller than the non-interlaced spacings, and in particular remain of size about $O(n^{-1/2})$ (up to log factors) even though the non-interlaced spacings increase to be as large as $O(n^{-1/6})$.  This is ultimately due to a sort of “attractive force” on eigenvalues that draws them towards the origin, and counteracts the usual “eigenvalue repulsion effect”, that pushes eigenvalues away from each other.  This induced “bias” for eigenvalues to move in towards the bulk rescues the delocalisation result, and the remainder of the arguments in our previous paper then continues with few changes.

Below the fold I wish to give some heuristic justification of the interlacing bias phenomenon, sketch why this is relevant for eigenvector delocalisation, and finally recall why eigenvector delocalisation in turn is relevant for universality.

[Update, Aug 16: sign error corrected.]

The Riemann zeta function ${\zeta(s)}$, defined for ${\hbox{Re}(s)>1}$ by

$\displaystyle \zeta(s) := \sum_{n=1}^\infty \frac{1}{n^s} \ \ \ \ \ (1)$

and then continued meromorphically to other values of ${s}$ by analytic continuation, is a fundamentally important function in analytic number theory, as it is connected to the primes ${p=2,3,5,\ldots}$ via the Euler product formula

$\displaystyle \zeta(s) = \prod_p (1 - \frac{1}{p^s})^{-1} \ \ \ \ \ (2)$

(for ${\hbox{Re}(s) > 1}$, at least), where ${p}$ ranges over primes. (The equivalence between (1) and (2) is essentially the generating function version of the fundamental theorem of arithmetic.) The function ${\zeta}$ has a pole at ${1}$ and a number of zeroes ${\rho}$. A formal application of the factor theorem gives

$\displaystyle \zeta(s) = \frac{1}{s-1} \prod_\rho (s-\rho) \times \ldots \ \ \ \ \ (3)$

where ${\rho}$ ranges over zeroes of ${\zeta}$, and we will be vague about what the ${\ldots}$ factor is, how to make sense of the infinite product, and exactly which zeroes of ${\zeta}$ are involved in the product. Equating (2) and (3) and taking logarithms gives the formal identity

$\displaystyle - \log \zeta(s) = \sum_p \log(1 - \frac{1}{p^s}) = \log(s-1) - \sum_\rho \log(s-\rho) + \ldots; \ \ \ \ \ (4)$

using the Taylor expansion

$\displaystyle \log(1 - \frac{1}{p^s}) = - \frac{1}{p^s} - \frac{1}{2 p^{2s}} - \frac{1}{3p^{3s}} - \ldots \ \ \ \ \ (5)$

and differentiating the above identity in ${s}$ yields the formal identity

$\displaystyle - \frac{\zeta'(s)}{\zeta(s)} = \sum_n \frac{\Lambda(n)}{n^s} = \frac{1}{s-1} - \sum_\rho \frac{1}{s-\rho} + \ldots \ \ \ \ \ (6)$

where ${\Lambda(n)}$ is the von Mangoldt function, defined to be ${\log p}$ when ${n}$ is a power of a prime ${p}$, and zero otherwise. Thus we see that the behaviour of the primes (as encoded by the von Mangoldt function) is intimately tied to the distribution of the zeroes ${\rho}$. For instance, if we knew that the zeroes were far away from the axis ${\hbox{Re}(s)=1}$, then we would heuristically have

$\displaystyle \sum_n \frac{\Lambda(n)}{n^{1+it}} \approx \frac{1}{it}$

for real ${t}$. On the other hand, the integral test suggests that

$\displaystyle \sum_n \frac{1}{n^{1+it}} \approx \frac{1}{it}$

and thus we see that ${\frac{\Lambda(n)}{n}}$ and ${\frac{1}{n}}$ have essentially the same (multiplicative) Fourier transform:

$\displaystyle \sum_n \frac{\Lambda(n)}{n^{1+it}} \approx \sum_n \frac{1}{n^{1+it}}.$

Inverting the Fourier transform (or performing a contour integral closely related to the inverse Fourier transform), one is led to the prime number theorem

$\displaystyle \sum_{n \leq x} \Lambda(n) \approx \sum_{n \leq x} 1.$

In fact, the standard proof of the prime number theorem basically proceeds by making all of the above formal arguments precise and rigorous.
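None of this requires heavy machinery to test numerically. The Python sketch below (the cutoff ${N = 10^5}$ and all function names are choices made here for illustration) tabulates the von Mangoldt function with a sieve, checks that ${\sum_{n \leq N} \Lambda(n)}$ is close to ${N}$, and compares truncated versions of the two Dirichlet series in the first equality of (6) at ${s=2}$. Truncation means the two sides only agree approximately.

```python
import math

def von_mangoldt_table(N):
    """Return a list lam with lam[n] = log p if n is a power of a prime p, else 0."""
    lam = [0.0] * (N + 1)
    is_composite = bytearray(N + 1)
    for p in range(2, N + 1):
        if not is_composite[p]:
            for m in range(p * p, N + 1, p):
                is_composite[m] = 1
            q = p
            while q <= N:          # mark p, p^2, p^3, ...
                lam[q] = math.log(p)
                q *= p
    return lam

N = 100_000
lam = von_mangoldt_table(N)

# Prime number theorem in the form sum_{n <= N} Lambda(n) ~ N.
psi = sum(lam)
print(psi / N)

# First equality in (6) at s = 2, with every series truncated at N.
zeta = sum(1.0 / n ** 2 for n in range(1, N + 1))
zeta_prime = -sum(math.log(n) / n ** 2 for n in range(2, N + 1))
lhs = -zeta_prime / zeta
rhs = sum(lam[n] / n ** 2 for n in range(2, N + 1))
print(lhs, rhs)
```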

Unfortunately, we don’t know as much about the zeroes ${\rho}$ of the zeta function (and hence, about the ${\zeta}$ function itself) as we would like. The Riemann hypothesis (RH) asserts that all the zeroes (except for the “trivial” zeroes at the negative even numbers) lie on the critical line ${\hbox{Re}(s)=1/2}$; this hypothesis would make the error terms in the above proof of the prime number theorem significantly more accurate. Furthermore, the stronger GUE hypothesis asserts in addition to RH that the local distribution of these zeroes on the critical line should behave like the local distribution of the eigenvalues of a random matrix drawn from the gaussian unitary ensemble (GUE). I will not give a precise formulation of this hypothesis here, except to say that the adjective “local” in the context of distribution of zeroes ${\rho}$ means something like “at scale ${O(1/\log T)}$ when ${\hbox{Im}(s) = O(T)}$”.

Nevertheless, we do know some reasonably non-trivial facts about the zeroes ${\rho}$ and the zeta function ${\zeta}$, either unconditionally, or assuming RH (or GUE). Firstly, there are no zeroes for ${\hbox{Re}(s)>1}$ (as one can already see from the convergence of the Euler product (2) in this case) or for ${\hbox{Re}(s)=1}$ (this is trickier, relying on (6) and the elementary observation that

$\displaystyle \hbox{Re}( 3\frac{\Lambda(n)}{n^{\sigma}} + 4\frac{\Lambda(n)}{n^{\sigma+it}} + \frac{\Lambda(n)}{n^{\sigma+2it}} ) = 2\frac{\Lambda(n)}{n^\sigma} (1+\cos(t \log n))^2$

is non-negative for ${\sigma > 1}$ and ${t \in {\mathbb R}}$); from the functional equation

$\displaystyle \pi^{-s/2} \Gamma(s/2) \zeta(s) = \pi^{-(1-s)/2} \Gamma((1-s)/2) \zeta(1-s)$

(which can be viewed as a consequence of the Poisson summation formula, see e.g. my blog post on this topic) we know that there are no zeroes for ${\hbox{Re}(s) \leq 0}$ either (except for the trivial zeroes at negative even integers, corresponding to the poles of the Gamma function). Thus all the non-trivial zeroes lie in the critical strip ${0 < \hbox{Re}(s) < 1}$.
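For the record, the elementary observation used above for the line ${\hbox{Re}(s)=1}$ is just a double angle computation. Writing ${\theta := t \log n}$, so that ${\hbox{Re}(n^{-it}) = \cos \theta}$ and ${\hbox{Re}(n^{-2it}) = \cos 2\theta}$, one has

$\displaystyle 3 + 4 \cos \theta + \cos 2\theta = 2 + 4 \cos \theta + 2 \cos^2 \theta = 2 (1 + \cos \theta)^2 \geq 0,$

and multiplying through by the non-negative quantity ${\frac{\Lambda(n)}{n^\sigma}}$ gives the stated identity.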

We also know that there are infinitely many non-trivial zeroes, and can approximately count how many zeroes there are in any large bounded region of the critical strip. For instance, for large ${T}$, the number of zeroes ${\rho}$ in this strip with ${\hbox{Im}(\rho) = T+O(1)}$ is ${O(\log T)}$. This can be seen by applying (6) to ${s = 2+iT}$ (say); the trivial zeroes at the negative even integers end up giving a contribution of ${O(\log T)}$ to this sum (this is a heavily disguised variant of Stirling’s formula, as one can view the trivial zeroes as essentially being poles of the Gamma function), while the ${\frac{1}{s-1}}$ and ${\ldots}$ terms end up being negligible (of size ${O(1)}$), while each non-trivial zero ${\rho}$ contributes a term which has a non-negative real part, and furthermore has size comparable to ${1}$ if ${\hbox{Im}(\rho) = T+O(1)}$. (Here I am glossing over a technical renormalisation needed to make the infinite series in (6) converge properly.) Meanwhile, the left-hand side of (6) is absolutely convergent for ${s=2+iT}$ and of size ${O(1)}$, and the claim follows. A more refined version of this argument shows that the number of non-trivial zeroes with ${0 \leq \hbox{Im}(\rho) \leq T}$ is ${\frac{T}{2\pi} \log \frac{T}{2\pi} - \frac{T}{2\pi} + O(\log T)}$, but we will not need this more precise formula here. (A fair fraction – at least 40%, in fact – of these zeroes are known to lie on the critical line; see this earlier blog post of mine for more discussion.)
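To see how good the main term ${\frac{T}{2\pi} \log \frac{T}{2\pi} - \frac{T}{2\pi}}$ already is, one can simply evaluate it. In the Python sketch below (the function name is illustrative), the value at ${T=100}$ comes out a little above ${28}$, while standard tables list ${29}$ non-trivial zeroes with imaginary part between ${0}$ and ${100}$; the discrepancy is comfortably within the ${O(\log T)}$ error term.

```python
import math

def zero_count_main_term(T):
    """Main term (T / 2 pi) log(T / 2 pi) - T / 2 pi of the zero-counting formula."""
    x = T / (2 * math.pi)
    return x * math.log(x) - x

for T in (100.0, 1000.0, 10000.0):
    # Print the main term next to log T, the size of the allowed error.
    print(T, zero_count_main_term(T), math.log(T))
```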

Another thing that we happen to know is how the magnitude ${|\zeta(1/2+it)|}$ of the zeta function is distributed as ${t \rightarrow \infty}$; it turns out to be log-normally distributed with log-variance about ${\frac{1}{2} \log \log t}$. More precisely, we have the following result of Selberg:

Theorem 1 Let ${T}$ be a large number, and let ${t}$ be chosen uniformly at random from between ${T}$ and ${2T}$ (say). Then the distribution of ${\frac{1}{\sqrt{\frac{1}{2} \log \log T}} \log |\zeta(1/2+it)|}$ converges (in distribution) to the normal distribution ${N(0,1)}$.

To put it more informally, ${\log |\zeta(1/2+it)|}$ behaves like ${\sqrt{\frac{1}{2} \log \log t} \times N(0,1)}$ plus lower order terms for “typical” large values of ${t}$. (Zeroes ${\rho}$ of ${\zeta}$ are, of course, certainly not typical, but one can show that one can usually stay away from these zeroes.) In fact, Selberg showed a slightly more precise result, namely that for any fixed ${k \geq 1}$, the ${k^{th}}$ moment of ${\frac{1}{\sqrt{\frac{1}{2} \log \log T}} \log |\zeta(1/2+it)|}$ converges to the ${k^{th}}$ moment of ${N(0,1)}$.

Remarkably, Selberg’s result does not need RH or GUE, though it is certainly consistent with such hypotheses. (For instance, the determinant of a GUE matrix asymptotically obeys a remarkably similar log-normal law to that given by Selberg’s theorem.) Indeed, these hypotheses only affect some error terms in ${\log |\zeta(1/2+it)|}$ of magnitude ${O(1)}$, which are thus asymptotically negligible compared to the main term, which has magnitude about ${O(\sqrt{\log \log T})}$. So Selberg’s result, while very pretty, manages to finesse the question of what the zeroes ${\rho}$ of ${\zeta}$ are actually doing – he makes the primes do most of the work, rather than the zeroes.

Selberg never actually published the above result, but it is reproduced in a number of places (e.g. in this book by Joyner, or this book by Laurincikas). As with many other results in analytic number theory, the actual details of the proof can get somewhat technical; but I would like to record here (partly for my own benefit) an informal sketch of some of the main ideas in the argument.

One further paper in this stream: László Erdős, José Ramírez, Benjamin Schlein, Van Vu, Horng-Tzer Yau, and myself have just uploaded to the arXiv the paper “Bulk universality for Wigner hermitian matrices with subexponential decay“, submitted to Mathematical Research Letters.  (Incidentally, this is the first six-author paper I have been involved in, not counting the polymath projects of course, though I have had a number of five-author papers.)

This short paper (9 pages) combines the machinery from two recent papers on the universality conjecture for the eigenvalue spacings in the bulk for Wigner random matrices (see my earlier blog post for more discussion).  On the one hand, the paper of Erdős-Ramírez-Schlein-Yau established this conjecture under the additional hypothesis that the distribution of the individual entries obeyed some smoothness and exponential decay conditions.  Meanwhile, the paper of Van Vu and myself (which I discussed in my earlier blog post) established the conjecture under a somewhat different set of hypotheses, namely that the distribution of the individual entries obeyed some moment conditions (in particular, the third moment had to vanish), a support condition (the entries had to have real part supported in at least three points), and an exponential decay condition.

After comparing our results, the six of us realised that our methods could in fact be combined rather easily to obtain a stronger result, establishing the universality conjecture assuming only an exponential decay (or more precisely, sub-exponential decay) bound ${\Bbb P}(|x_{\ell k}| > t ) \ll \exp( - t^c )$ on the coefficients; thus all regularity, moment, and support conditions have been eliminated.  (There is one catch, namely that we can no longer control a single spacing $\lambda_{i+1}-\lambda_i$ for a single fixed i, but must now average over all $1 \leq i \leq n$ before recovering the universality.  This is an annoying technical issue but it may be resolvable in the future with further refinements to the method.)

I can describe the main idea behind the unified approach here.  One can arrange the Wigner matrices in a hierarchy, from most structured to least structured:

• The most structured (or special) ensemble is the Gaussian Unitary Ensemble (GUE), in which the coefficients are gaussian. Here, one has very explicit and tractable formulae for the eigenvalue distributions, gap spacing, etc.
• The next most structured ensemble of Wigner matrices are the Gaussian-divisible or Johansson matrices, which are matrices H of the form $H = e^{-t/2} \hat H + (1-e^{-t})^{1/2} V$, where $\hat H$ is another Wigner matrix, V is a GUE matrix independent of $\hat H$, and $0 < t < 1$ is a fixed parameter independent of n.  Here, one still has quite explicit (though not quite as tractable) formulae for the joint eigenvalue distribution and related statistics.  Note that the limiting case $t = \infty$ is GUE.
• After this, one has the Ornstein-Uhlenbeck-evolved matrices, which are also of the form $H = e^{-t/2} \hat H + (1-e^{-t})^{1/2} V$, but now $t = n^{-1+\delta}$ decays at a power rate with n, rather than being comparable to 1.  Explicit formulae still exist for these matrices, but extracting universality out of this is hard work (and occupies the bulk of the paper of Erdős-Ramírez-Schlein-Yau).
• Finally, one has arbitrary Wigner matrices, which can be viewed as the t=0 limit of the above Ornstein-Uhlenbeck process.
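As a quick sanity check on the normalisation used in the two middle rungs of this hierarchy, one can verify that the interpolation preserves the variance of each entry. Here $h_{ij}, \hat h_{ij}, v_{ij}$ denote the coefficients of $H, \hat H, V$ respectively (labels introduced here purely for illustration), each assumed to have mean zero and variance one. Since $\hat H$ and $V$ are independent, the cross term has zero expectation, and so

$\displaystyle {\Bbb E} |h_{ij}|^2 = e^{-t} {\Bbb E} |\hat h_{ij}|^2 + (1-e^{-t}) {\Bbb E} |v_{ij}|^2 = e^{-t} + (1-e^{-t}) = 1.$

Thus $H$ has the same mean and variance normalisation as $\hat H$ for every $t$, and setting $t=0$ recovers $\hat H$ itself.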

The arguments in the paper of Erdős-Ramírez-Schlein-Yau can be summarised as follows (I assume subexponential decay throughout this discussion):

1. (Structured case) The universality conjecture is true for Ornstein-Uhlenbeck-evolved matrices with $t = n^{-1+\delta}$ for any $0 < \delta \leq 1$.  (The case $1/4 < \delta \leq 1$ was treated in an earlier paper of Erdős-Ramírez-Schlein-Yau, while the case where t is comparable to 1 was treated by Johansson.)
2. (Matching) Every Wigner matrix with suitable smoothness conditions can be “matched” with an Ornstein-Uhlenbeck-evolved matrix, in the sense that the eigenvalue statistics for the two matrices are asymptotically identical.  (This is relatively easy due to the fact that $\delta$ can be taken arbitrarily close to zero.)
3. Combining 1. and 2. one obtains universality for all Wigner matrices obeying suitable smoothness conditions.

The arguments in the paper of Van and myself can be summarised as follows:

1. (Structured case) The universality conjecture is true for Johansson matrices, by the paper of Johansson.
2. (Matching) Every Wigner matrix with some moment and support conditions can be “matched” with a Johansson matrix, in the sense that the first four moments of the entries agree, and hence (by the Lindeberg strategy in our paper) the two matrices have asymptotically identical statistics.
3. Combining 1. and 2. one obtains universality for all Wigner matrices obeying suitable moment and support conditions.

What we realised is that by combining the hard part 1. of the paper of Erdős-Ramírez-Schlein-Yau with the hard part 2. of the paper of Van and myself, we can remove all regularity, moment, and support conditions.  Roughly speaking, the unified argument proceeds as follows:

1. (Structured case) By the arguments of Erdős-Ramírez-Schlein-Yau, the universality conjecture is true for Ornstein-Uhlenbeck-evolved matrices with $t = n^{-1+\delta}$ for any $0 < \delta \leq 1$.
2. (Matching) Every Wigner matrix $H$ can be “matched” with an Ornstein-Uhlenbeck-evolved matrix $e^{-t/2} H + (1-e^{-t})^{1/2} V$ for $t= n^{-1+0.01}$ (say), in the sense that the first four moments of the entries almost agree, which is enough (by the arguments of Van and myself) to show that these two matrices have asymptotically identical statistics on the average.
3. Combining 1. and 2. one obtains universality for the averaged statistics for all Wigner matrices.
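The phrase “the first four moments of the entries almost agree” in step 2 can be illustrated with a small computation. The following Python sketch is a simplified real-valued model, not the argument from the paper: it takes a single real entry $x$ with mean zero and variance one, forms $y = e^{-t/2} x + (1-e^{-t})^{1/2} g$ with $g$ an independent standard real gaussian, and computes the second, third and fourth moments of $y$ exactly. The function name `ou_moments` and the choice $n = 10^6$ are assumptions made for illustration.

```python
import math

def ou_moments(m3, m4, t):
    """Second, third and fourth moments of y = a*x + b*g, where x has mean 0,
    variance 1, third moment m3 and fourth moment m4, g ~ N(0,1) is
    independent of x, a = exp(-t/2) and b = sqrt(1 - exp(-t))."""
    a2 = math.exp(-t)      # a^2
    b2 = 1.0 - a2          # b^2
    second = a2 + b2       # the cross term has mean zero
    third = a2 ** 1.5 * m3 # odd moments of g vanish
    fourth = a2 ** 2 * m4 + 6.0 * a2 * b2 + 3.0 * b2 ** 2  # uses E g^4 = 3
    return second, third, fourth

# Bernoulli entry (values +1 and -1 with equal probability): m3 = 0, m4 = 1.
n = 10 ** 6
t = n ** (-1 + 0.01)
print(t, ou_moments(0.0, 1.0, t))
```

In this model the second and third moments are unchanged for a Bernoulli entry, while the fourth moment moves from $1$ to $3 - 2e^{-2t} = 1 + O(t)$, which is a tiny perturbation when $t = n^{-1+0.01}$.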

The averaging should be removable, but this would require better convergence results to the semicircular law than are currently known (except with additional hypotheses, such as vanishing third moment).  The subexponential decay should also be relaxed to a condition of finiteness for some fixed moment ${\Bbb E} |x|^C$, but we did not pursue this direction in order to keep the paper short.

Van Vu and I have just uploaded to the arXiv our paper “Random matrices: universality of local eigenvalue statistics“, submitted to Acta Math.  This paper concerns the eigenvalues $\lambda_1(M_n) \leq \ldots \leq \lambda_n(M_n)$ of a Wigner matrix $M_n = (\zeta_{ij})_{1 \leq i,j \leq n}$, which we define to be a random Hermitian $n \times n$ matrix whose upper-triangular entries $\zeta_{ij}, 1 \leq i \leq j \leq n$ are independent (and whose strictly upper-triangular entries $\zeta_{ij}, 1 \leq i < j \leq n$ are also identically distributed).  [The lower-triangular entries are of course determined from the upper-triangular ones by the Hermitian property.]  We normalise the matrices so that all the entries have mean zero and variance 1.  Basic examples of Wigner matrices include

1. The Gaussian Unitary Ensemble (GUE), in which the upper-triangular entries $\zeta_{ij}, i < j$ are complex gaussian, and the diagonal entries $\zeta_{ii}$ are real gaussians;
2. The Gaussian Orthogonal Ensemble (GOE), in which all entries are real gaussian;
3. The Bernoulli Ensemble, in which all entries take values $\pm 1$ (with equal probability of each).
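The three ensembles above can be sampled in a few lines. The following is a minimal pure-Python sketch (the function name `sample_wigner` and the string labels are hypothetical), which follows the normalisation of this post in which every entry, including the diagonal ones, has mean zero and variance 1; other references normalise the diagonal differently.

```python
import math
import random

def sample_wigner(n, ensemble, rng):
    """Return an n x n Hermitian matrix (list of lists of complex numbers)
    drawn from the "GUE", "GOE" or "Bernoulli" ensemble, with every entry
    normalised to have mean zero and variance one."""
    def off_diagonal():
        if ensemble == "GUE":
            # Complex gaussian with E|z|^2 = 1: real and imaginary parts N(0, 1/2).
            s = math.sqrt(0.5)
            return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))
        if ensemble == "GOE":
            return complex(rng.gauss(0.0, 1.0), 0.0)
        if ensemble == "Bernoulli":
            return complex(rng.choice((-1, 1)), 0.0)
        raise ValueError("unknown ensemble: " + ensemble)

    def diagonal():
        # Diagonal entries must be real for the matrix to be Hermitian.
        if ensemble == "Bernoulli":
            return complex(rng.choice((-1, 1)), 0.0)
        return complex(rng.gauss(0.0, 1.0), 0.0)

    M = [[0j] * n for _ in range(n)]
    for i in range(n):
        M[i][i] = diagonal()
        for j in range(i + 1, n):
            M[i][j] = off_diagonal()
            # Lower-triangular entries are determined by the Hermitian property.
            M[j][i] = M[i][j].conjugate()
    return M

M = sample_wigner(4, "GUE", random.Random(0))
print(M[0][1], M[1][0])
```

Only the upper-triangular entries are sampled independently; the lower triangle is filled in by conjugation, exactly as in the definition of a Wigner matrix.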

We will make a further distinction into Wigner real symmetric matrices (which are Wigner matrices with real coefficients, such as GOE and the Bernoulli ensemble) and Wigner Hermitian matrices (which are Wigner matrices whose upper-triangular coefficients have real and imaginary parts iid, such as GUE).

The GUE and GOE ensembles have a rich algebraic structure (for instance, the GUE distribution is invariant under conjugation by unitary matrices, while the GOE distribution is similarly invariant under conjugation by orthogonal matrices, hence the terminology), and as a consequence their eigenvalue distribution can be computed explicitly.  For instance, the joint distribution of the eigenvalues $\lambda_1(M_n),\ldots,\lambda_n(M_n)$ for GUE is given by the explicit formula

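$\displaystyle C_n \prod_{1 \leq i < j \leq n} |\lambda_i - \lambda_j|^2 \exp( - \frac{1}{2} \sum_{i=1}^n \lambda_i^2 )\ d\lambda_1 \ldots d\lambda_n$ (0)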

for some explicitly computable constant $C_n$ on the orthant $\{ \lambda_1 \leq \ldots \leq \lambda_n\}$ (a result first established by Ginibre).  (A similar formula exists for GOE, but for simplicity we will just discuss GUE here.)  Using this explicit formula one can compute a wide variety of asymptotic eigenvalue statistics.  For instance, the (bulk) empirical spectral distribution (ESD) measure $\frac{1}{n} \sum_{i=1}^n \delta_{\lambda_i(M_n)/\sqrt{n}}$ for GUE (and indeed for all Wigner matrices, see below) is known to converge (in the vague sense) to the Wigner semicircular law

$\displaystyle \frac{1}{2\pi} (4-x^2)_+^{1/2}\ dx =: \rho_{sc}(x)\ dx$ (1)

as $n \to \infty$.  Actually, more precise statements are known for GUE; for instance, for $1 \leq i \leq n$, the $i^{th}$ eigenvalue $\lambda_i(M_n)$ is known to equal

$\displaystyle \lambda_i(M_n) = \sqrt{n} t(\frac{i}{n}) + O( \frac{\log n}{n} )$ (2)

with probability $1-o(1)$, where $t(a) \in [-2,2]$ is the inverse cumulative distribution function of the semicircular law, thus

$\displaystyle a = \int_{-2}^{t(a)} \rho_{sc}(x)\ dx$.
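The function $t(a)$ has no simple closed form, but it is easy to evaluate numerically. The sketch below (in Python, with illustrative function names) uses the elementary antiderivative $\frac{1}{2} + \frac{x \sqrt{4-x^2}}{4\pi} + \frac{1}{\pi} \arcsin \frac{x}{2}$ of $\rho_{sc}$ on $[-2,2]$ and inverts it by bisection; the tolerance `tol` is an arbitrary choice.

```python
import math

def rho_sc(x):
    """Semicircular density (1/2pi) (4 - x^2)_+^{1/2}."""
    return math.sqrt(max(4.0 - x * x, 0.0)) / (2.0 * math.pi)

def semicircle_cdf(x):
    """Integral of rho_sc from -2 to x, with x clamped to [-2, 2]."""
    x = min(max(x, -2.0), 2.0)
    return (0.5 + x * math.sqrt(4.0 - x * x) / (4.0 * math.pi)
            + math.asin(x / 2.0) / math.pi)

def t(a, tol=1e-12):
    """Inverse cumulative distribution function of the semicircular law,
    computed by bisection on [-2, 2] for 0 <= a <= 1."""
    lo, hi = -2.0, 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if semicircle_cdf(mid) < a:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Predicted location sqrt(n) * t(i/n) of the i-th eigenvalue, as in (2).
n = 1000
for i in (250, 500, 750):
    print(i, math.sqrt(n) * t(i / n))
```

By symmetry of the semicircular law one has $t(1/2) = 0$ and $t(1-a) = -t(a)$, which gives a simple check on the output.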

Furthermore, the distribution of the normalised eigenvalue spacing $\sqrt{n} \rho_{sc}(\frac{i}{n}) (\lambda_{i+1}(M_n) - \lambda_i(M_n))$ is known; in the bulk region $\varepsilon n \leq i \leq (1-\varepsilon) n$ for fixed $\varepsilon > 0$, it converges as $n \to \infty$ to the Gaudin distribution, which can be described explicitly in terms of determinants of the Dyson sine kernel $K(x,y) := \frac{\sin \pi(x-y)}{\pi(x-y)}$.  Many further local statistics of the eigenvalues of GUE are in fact governed by this sine kernel, a result usually proven using the asymptotics of orthogonal polynomials (and specifically, the Hermite polynomials).  (At the edge of the spectrum, say $i = n-O(1)$, the asymptotic distribution is a bit different, being governed instead by the Tracy-Widom law.)

It has been widely believed that these GUE facts enjoy a universality property, in the sense that they should also hold for wide classes of other matrix models. In particular, Wigner matrices should enjoy the same bulk distribution (1), the same asymptotic law (2) for individual eigenvalues, and the same sine kernel statistics as GUE. (The statistics for Wigner symmetric matrices are slightly different, and should obey GOE statistics rather than GUE ones.)

There has been a fair body of evidence to support this belief.  The bulk distribution (1) is in fact valid for all Wigner matrices (a result of Pastur, building on the original work of Wigner of course).  The Tracy-Widom statistics on the edge were established for all Wigner Hermitian matrices (assuming that the coefficients had a distribution which was symmetric and decayed exponentially) by Soshnikov (with some further refinements by Soshnikov and Peche).  Soshnikov’s arguments were based on an advanced version of the moment method.

The sine kernel statistics were established by Johansson for Wigner Hermitian matrices which were gaussian divisible, which means that they could be expressed as a non-trivial linear combination of another Wigner Hermitian matrix and an independent GUE.  (Basically, this means that the distribution of the coefficients is a convolution of some other distribution with a gaussian.  There were some additional technical decay conditions in Johansson’s work which were removed in subsequent work of Ben Arous and Peche.)   Johansson’s work was based on an explicit formula for the joint distribution for gaussian divisible matrices that generalises (0) (but is significantly more complicated).

Just last week, Erdős, Ramírez, Schlein, and Yau established sine kernel statistics for Wigner Hermitian matrices with exponential decay and a high degree of smoothness (roughly speaking, they require control of up to six derivatives of the Radon-Nikodym derivative of the distribution).  Their method is based on an analysis of the dynamics of the eigenvalues under a smooth transition from a general Wigner Hermitian matrix to GUE, essentially a matrix version of the Ornstein-Uhlenbeck process, whose eigenvalue dynamics are governed by Dyson Brownian motion.

In my paper with Van, we establish similar results to those of Erdős et al. under slightly different hypotheses, and by a somewhat different method.  Informally, our main result is as follows:

Theorem 1. (Informal version)  Suppose $M_n$ is a Wigner Hermitian matrix whose coefficients have an exponentially decaying distribution, and whose real and imaginary parts are supported on at least three points (basically, this excludes Bernoulli-type distributions only) and have vanishing third moment (which is for instance the case for symmetric distributions).  Then one has the local statistics (2) (but with an error term of $O(n^{-1+\delta})$ for any $\delta>0$ rather than $O(\log n/n)$) and the sine kernel statistics for individual eigenvalue spacings $\sqrt{n} \rho_{sc}(\frac{i}{n}) (\lambda_{i+1}(M_n) - \lambda_i(M_n))$ (as well as higher order correlations) in the bulk.

If one removes the vanishing third moment hypothesis, one still has the sine kernel statistics provided one averages over all i.

There are analogous results for Wigner real symmetric matrices (see paper for details).  There are also some related results, such as a universal distribution for the least singular value of matrices of the form in Theorem 1, and a crude asymptotic for the determinant (in particular, $\log |\det M_n| = (1+o(1)) \log \sqrt{n!}$ with probability $1-o(1)$).

The arguments are based primarily on the Lindeberg replacement strategy, which Van and I also used to obtain universality for the circular law for iid matrices, and for the least singular value for iid matrices, but also rely on other tools, such as some recent arguments of Erdos, Schlein, and Yau, as well as a very useful concentration inequality of Talagrand which lets us tackle both discrete and continuous matrix ensembles in a unified manner.  (I plan to talk about Talagrand’s inequality in my next blog post.)