
I’ve just uploaded to the arXiv my paper The asymptotic distribution of a single eigenvalue gap of a Wigner matrix, submitted to Probability Theory and Related Fields. This paper (like several of my previous papers) is concerned with the asymptotic distribution of the eigenvalues $\lambda_1(M_n) \leq \ldots \leq \lambda_n(M_n)$ of a random Wigner matrix $M_n$ in the limit $n \to \infty$, with a particular focus on matrices drawn from the Gaussian Unitary Ensemble (GUE). This paper is focused on the *bulk* of the spectrum, i.e. on eigenvalues $\lambda_i(M_n)$ with $\delta n \leq i \leq (1-\delta) n$ for some fixed $\delta > 0$.

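The location of an individual eigenvalue $\lambda_i(M_n)$ is by now quite well understood. If we normalise the entries of the matrix $M_n$ to have mean zero and variance $1$, then in the asymptotic limit $n \to \infty$, the Wigner semicircle law tells us that with probability $1-o(1)$ one has

$$\lambda_i(M_n) = \sqrt{n} u + o(\sqrt{n})$$

where the *classical location* $u = u_{i/n} \in [-2,2]$ of the eigenvalue is given by the formula

$$\int_{-2}^{u} \rho_{sc}(x)\ dx = \frac{i}{n}$$

and the semicircular distribution $\rho_{sc}(x)\ dx$ is given by the formula

$$\rho_{sc}(x) := \frac{1}{2\pi} (4-x^2)_+^{1/2}.$$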

Actually, one can improve the error term here from $o(\sqrt{n})$ to $O(n^{-1/2+\epsilon})$ for any $\epsilon > 0$ (see this previous recent paper of Van and myself for more discussion of these sorts of estimates, sometimes known as *eigenvalue rigidity* estimates).

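From the semicircle law (and the fundamental theorem of calculus), one expects the $i^{th}$ eigenvalue spacing $\lambda_{i+1}(M_n) - \lambda_i(M_n)$ to have an average size of $\frac{1}{\sqrt{n} \rho_{sc}(u)}$. It is thus natural to introduce the normalised eigenvalue spacing

$$X_i := \sqrt{n} \rho_{sc}(u) \left( \lambda_{i+1}(M_n) - \lambda_i(M_n) \right)$$

and ask what the distribution of $X_i$ is.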
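As a rough numerical illustration (a sketch, not taken from the paper), one can sample GUE matrices and look at the normalised spacing at the middle index $i = n/2$, where the classical location is $u = 0$ and $\rho_{sc}(0) = 1/\pi$. The matrix size, the number of samples, the seed, and the helper names `sample_gue`, `rho_sc` and `normalised_middle_gap` are all choices made for this example; the entries are normalised to have variance one, as assumed in the text.

```python
import numpy as np

def sample_gue(n, rng):
    """Sample an n x n GUE matrix whose entries have mean zero and variance one."""
    a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    # Off-diagonal entries (a_ij + conj(a_ji)) / 2 have E|.|^2 = 1; diagonal entries are N(0,1).
    return (a + a.conj().T) / 2

def rho_sc(x):
    """Semicircular density (1 / (2 pi)) * sqrt((4 - x^2)_+)."""
    return np.sqrt(np.maximum(4.0 - x * x, 0.0)) / (2.0 * np.pi)

def normalised_middle_gap(n, rng):
    """Normalised spacing between the eigenvalues of index n/2 and n/2 + 1 (so u is 0)."""
    eigs = np.linalg.eigvalsh(sample_gue(n, rng))  # returned in ascending order
    i = n // 2
    gap = eigs[i] - eigs[i - 1]
    return gap * np.sqrt(n) * rho_sc(0.0)

rng = np.random.default_rng(0)
gaps = np.array([normalised_middle_gap(100, rng) for _ in range(300)])
print("mean normalised gap:", gaps.mean())
```

With this normalisation the sample mean should come out close to one, which is all that the heuristic from the semicircle law predicts; the finer question of the actual distribution of the gap is the subject of the rest of this post.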

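As mentioned previously, we will focus on the bulk case $\delta n \leq i \leq (1-\delta) n$, and begin with the model case when $M_n$ is drawn from GUE. (In the edge case when $i$ is close to $1$ or to $n$, the distribution is given by the famous Tracy-Widom law.) Here, the distribution was almost (but as we shall see, not quite) worked out by Gaudin and Mehta. By using the theory of determinantal processes, they were able to compute a quantity closely related to $X_i$, namely the probability

$${\bf P}\left( N_{[\sqrt{n} u + \frac{x}{\sqrt{n} \rho_{sc}(u)}, \sqrt{n} u + \frac{y}{\sqrt{n} \rho_{sc}(u)}]} = 0 \right) \ \ \ \ \ (1)$$

that an interval $[\sqrt{n} u + \frac{x}{\sqrt{n} \rho_{sc}(u)}, \sqrt{n} u + \frac{y}{\sqrt{n} \rho_{sc}(u)}]$ near $\sqrt{n} u$ of length comparable to the expected eigenvalue spacing $\frac{1}{\sqrt{n} \rho_{sc}(u)}$ is devoid of eigenvalues (here $N_I$ denotes the number of eigenvalues in an interval $I$). For $u$ in the bulk and fixed $x, y$, they showed that this probability is equal to

$$\det( 1 - 1_{[x,y]} P 1_{[x,y]} ) + o(1),$$

where $P$ is the Dyson projection

$$P f(x) = \int_{\bf R} \frac{\sin(\pi(x-y))}{\pi(x-y)} f(y)\ dy$$

to Fourier modes in $[-1/2,1/2]$, and $\det$ is the Fredholm determinant. As shown by Jimbo, Miwa, Mori, and Sato, this determinant can also be expressed in terms of a solution to a Painleve V ODE, though we will not need this fact here. In view of this asymptotic and some standard integration by parts manipulations, it becomes plausible to propose that $X_i$ will be asymptotically distributed according to the *Gaudin-Mehta distribution* $p(x)\ dx$, where

$$p(x) := \frac{d^2}{dx^2} \det( 1 - 1_{[0,x]} P 1_{[0,x]} ).$$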

A reasonably accurate approximation for is given by the *Wigner surmise* , which was presciently proposed by Wigner as early as 1957; it is exact for but not in the asymptotic limit .
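For readers who want to see numbers, here is a minimal sketch of one way to approximate the Fredholm determinant of the sine kernel $\frac{\sin(\pi(x-y))}{\pi(x-y)}$ restricted to $[0,s]$, and hence the Gaudin-Mehta density as its second derivative in $s$. It replaces the integral operator by a Gauss-Legendre quadrature matrix, a standard numerical device that is not used in the paper; the function names, the number of nodes `m` and the finite-difference step `h` are assumptions of this example.

```python
import numpy as np

def gap_probability(s, m=40):
    """Approximate det(1 - 1_[0,s] P 1_[0,s]) for the sine kernel with m-point Gauss-Legendre quadrature."""
    if s == 0:
        return 1.0
    nodes, weights = np.polynomial.legendre.leggauss(m)
    x = 0.5 * s * (nodes + 1.0)   # quadrature nodes mapped from [-1, 1] to [0, s]
    w = 0.5 * s * weights         # correspondingly rescaled weights
    kernel = np.sinc(x[:, None] - x[None, :])  # np.sinc(t) = sin(pi t) / (pi t)
    sw = np.sqrt(w)
    # Symmetrised discretisation of the operator 1 - K on L^2([0, s]).
    return float(np.linalg.det(np.eye(m) - sw[:, None] * kernel * sw[None, :]))

def gaudin_mehta_density(s, h=1e-2):
    """Second derivative in s of the gap probability, by a central difference (requires s >= h)."""
    return (gap_probability(s + h) - 2.0 * gap_probability(s) + gap_probability(s - h)) / h ** 2

for s in (0.5, 1.0, 2.0):
    print(s, gap_probability(s), gaudin_mehta_density(s))
```

For small $s$ the determinant behaves like $1 - s$, reflecting the fact that the sine kernel process has unit intensity, and this gives a quick sanity check on the discretisation.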

Unfortunately, when one tries to make this argument rigorous, one finds that the asymptotic for (1) does not control a single gap $X_i$, but rather an ensemble of gaps $X_i$, where $i$ is drawn from an interval $[i_0 - L, i_0 + L]$ of some moderate size $L$ (e.g. $L = \log n$); see for instance this paper of Deift, Kriecherbauer, McLaughlin, Venakides, and Zhou for a more precise formalisation of this statement (which is phrased slightly differently, in that one samples all gaps inside a fixed window of spectrum, rather than inside a fixed range of eigenvalue indices $i$). (This result is stated for GUE, but can be extended to other Wigner ensembles by the Four Moment Theorem, at least if one assumes a moment matching condition; see this previous paper with Van Vu for details. The moment condition can in fact be removed, as was done in this subsequent paper with Erdos, Ramirez, Schlein, Vu, and Yau.)

The problem is that when one specifies a given window of spectrum such as $[\sqrt{n} u + \frac{x}{\sqrt{n} \rho_{sc}(u)}, \sqrt{n} u + \frac{y}{\sqrt{n} \rho_{sc}(u)}]$, one cannot quite pin down in advance which eigenvalues $\lambda_i(M_n)$ are going to lie to the left or right of this window; even with the strongest eigenvalue rigidity results available, there is a natural uncertainty of $\sqrt{\log n}$ or so in the index $i$ (as can be quantified quite precisely by this central limit theorem of Gustavsson).

The main difficulty here is that there could potentially be some strange coupling between the event (1) of an interval being devoid of eigenvalues, and the number of eigenvalues to the left of that interval. For instance, one could conceive of a possible scenario in which the interval in (1) tends to have many eigenvalues when the number of eigenvalues to its left is even, but very few when that number is odd. In this sort of situation, the gaps $X_i$ may have different behaviour for even $i$ than for odd $i$, and such anomalies would not be picked up in the averaged statistics in which $i$ is allowed to range over some moderately large interval.

The main result of the current paper is that these anomalies do not actually occur, and that all of the eigenvalue gaps $X_i$ in the bulk are asymptotically governed by the Gaudin-Mehta law without the need for averaging in the $i$ parameter. Again, this is shown first for GUE, and then extended to other Wigner matrices obeying a matching moment condition using the Four Moment Theorem. (It is likely that the moment matching condition can be removed here, but I was unable to achieve this, despite all the recent advances in establishing universality of local spectral statistics for Wigner matrices, mainly because the universality results in the literature are more focused on specific energy levels $u$ than on specific eigenvalue indices $i$. To make matters worse, in some cases universality is currently known only after an additional averaging in the energy parameter.)

The main task in the proof is to show that the random variable $N_{(-\infty, \sqrt{n} u + \frac{x}{\sqrt{n} \rho_{sc}(u)})}$, which counts the eigenvalues to the left of the interval, is largely decoupled from the event in (1) when $M_n$ is drawn from GUE. To do this we use some of the theory of determinantal processes, and in particular the nice fact that when one conditions a determinantal process to the event that a certain spatial region (such as an interval) contains no points of the process, then one obtains a new determinantal process (with a kernel that is closely related to the original kernel). One then has to obtain sufficiently good control on the distance between the new determinantal kernel and the old one, which we do by some functional-analytic considerations involving the manipulation of norms of operators (and specifically, the operator norm, Hilbert-Schmidt norm, and nuclear norm). Amusingly, the Fredholm alternative makes a key appearance, as I end up having to invert a compact perturbation of the identity at one point (specifically, I need to invert $1 - 1_{[x,y]} P 1_{[x,y]}$, where $P$ is the Dyson projection and $[x,y]$ is an interval). As such, the bounds in my paper become ineffective, though I am sure that with more work one can invert this particular perturbation of the identity by hand, without the need to invoke the Fredholm alternative.

In the last three notes, we discussed the Bourgain-Gamburd expansion machine and two of its three ingredients, namely quasirandomness and product theorems, leaving only the non-concentration ingredient to discuss. We can summarise the results of the last three notes, in the case of fields of prime order, as the following theorem.

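Theorem 1 (Non-concentration implies expansion in $SL_d$) Let $p$ be a prime, let $d \geq 1$, and let $S$ be a symmetric set of elements in $G := SL_d(F_p)$ of cardinality $|S| = k$ not containing the identity. Write $\mu := \frac{1}{|S|} \sum_{s \in S} \delta_s$, and suppose that one has the non-concentration property

$$\sup_{H < G} \mu^{(n)}(H) < |G|^{-\kappa} \ \ \ \ \ (1)$$

for some $\kappa > 0$ and some even integer $n \leq \Lambda \log |G|$, where $H$ ranges over the proper subgroups of $G$. Then $Cay(G,S)$ is a two-sided $\epsilon$-expander for some $\epsilon > 0$ depending only on $k, d, \kappa, \Lambda$.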

*Proof:* From (1) we see that $\mu^{(n)}$ is not supported in any proper subgroup $H$ of $G$, which implies that $S$ generates $G$. The claim now follows from the Bourgain-Gamburd expansion machine (Theorem 2 of Notes 4), the product theorem (Theorem 1 of Notes 5), and quasirandomness (Exercise 8 of Notes 3).

Remark 1 The same argument also works if we replace $F_p$ by the field $F_{p^j}$ of order $p^j$ for some bounded $j$. However, there is a difficulty in the regime when $j$ is unbounded, because the quasirandomness property becomes too weak for the Bourgain-Gamburd expansion machine to be directly applicable. On the other hand, the above type of theorem was generalised to the setting of cyclic groups ${\bf Z}/q{\bf Z}$ with $q$ square-free by Varju, to arbitrary $q$ by Bourgain and Varju, and to more general algebraic groups than $SL_d$ and square-free $q$ by Salehi Golsefidy and Varju. It may be that some modification of the proof techniques in these papers is also able to handle the field case with unbounded $j$.

It thus remains to construct tools that can establish the non-concentration property (1). The situation is particularly simple in $SL_2(F_p)$, as we have a good understanding of the subgroups of that group. Indeed, from Theorem 14 from Notes 5, we obtain the following corollary to Theorem 1:

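Corollary 2 (Non-concentration implies expansion in $SL_2$) Let $p$ be a prime, and let $S$ be a symmetric set of elements in $G := SL_2(F_p)$ of cardinality $|S| = k$ not containing the identity. Write $\mu := \frac{1}{|S|} \sum_{s \in S} \delta_s$, and suppose that one has the non-concentration property

$$\sup_{B} \mu^{(n)}(B) < |G|^{-\kappa} \ \ \ \ \ (2)$$

for some $\kappa > 0$ and some even integer $n \leq \Lambda \log |G|$, where $B$ ranges over all Borel subgroups of $SL_2(\overline{F_p})$. Then, if $|G|$ is sufficiently large depending on $k, \kappa, \Lambda$, $Cay(G,S)$ is a two-sided $\epsilon$-expander for some $\epsilon > 0$ depending only on $k, \kappa, \Lambda$.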

It turns out that (2) can be verified in many cases by exploiting the solvable nature of the Borel subgroups $B$. We give two examples of this in these notes. The first result, due to Bourgain and Gamburd (with earlier partial results by Gamburd and by Shalom), generalises Selberg’s expander construction to the case when $S$ generates a thin subgroup of $SL_2({\bf Z})$:

Theorem 3 (Expansion in thin subgroups) Let $S$ be a symmetric subset of $SL_2({\bf Z})$ not containing the identity, and suppose that the group $\langle S \rangle$ generated by $S$ is not virtually solvable. Then as $p$ ranges over all sufficiently large primes, the Cayley graphs $Cay(SL_2(F_p), \pi_p(S))$ form a two-sided expander family, where $\pi_p: SL_2({\bf Z}) \to SL_2(F_p)$ is the usual projection.

Remark 2 One corollary of Theorem 3 (or of the non-concentration estimate (3) below) is that $\pi_p(S)$ generates $SL_2(F_p)$ for all sufficiently large $p$, if $\langle S \rangle$ is not virtually solvable. This is a special case of a much more general result, known as the strong approximation theorem, although this is certainly not the most direct way to prove such a theorem. Conversely, the strong approximation property is used in generalisations of this result to higher rank groups than $SL_2$.

Exercise 1 In the converse direction, if $\langle S \rangle$ is virtually solvable, show that for sufficiently large $p$, $\pi_p(S)$ fails to generate $SL_2(F_p)$. (Hint: use Theorem 14 from Notes 5 to prevent $SL_2(F_p)$ from having bounded index solvable subgroups.)

Exercise 2 (Lubotzky’s 1-2-3 problem) Let $S := \left\{ \begin{pmatrix} 1 & \pm 3 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ \pm 3 & 1 \end{pmatrix} \right\}$.

- (i) Show that $S$ generates a free subgroup of $SL_2({\bf Z})$. (Hint: use a ping-pong argument, as in Exercise 23 of Notes 2.)
- (ii) Show that if $v, w$ are two distinct elements of the sector , then there is no element $g \in \langle S \rangle$ for which $gv = w$. (Hint: this is another ping-pong argument.) Conclude that $\langle S \rangle$ has infinite index in $SL_2({\bf Z})$. (Contrast this with the situation in which the $3$ coefficients in $S$ are replaced by $1$ or $2$, in which case $\langle S \rangle$ is either all of $SL_2({\bf Z})$, or a finite index subgroup, as demonstrated in Exercise 23 of Notes 2.)
- (iii) Show that $Cay(SL_2(F_p), \pi_p(S))$ for sufficiently large primes $p$ form a two-sided expander family.
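To get a feel for what the expansion claim in Exercise 2(iii) means, one can compute the spectrum of such a Cayley graph by brute force for a small prime. The sketch below does this for $p = 5$ with the four generators of Exercise 2 reduced mod $p$; the helper names (`sl2_elements`, `exercise_generators`, `second_eigenvalue`) and the choice of prime are assumptions of this example, and a single small prime of course says nothing about the uniformity in $p$ that the exercise is really about.

```python
import itertools
import numpy as np

def sl2_elements(p):
    """All matrices (a, b, c, d) over F_p with ad - bc = 1."""
    return [(a, b, c, d) for a, b, c, d in itertools.product(range(p), repeat=4)
            if (a * d - b * c) % p == 1]

def mul(g, h, p):
    """Product of two 2x2 matrices stored as 4-tuples, reduced mod p."""
    a, b, c, d = g
    e, f, g2, h2 = h
    return ((a * e + b * g2) % p, (a * f + b * h2) % p,
            (c * e + d * g2) % p, (c * f + d * h2) % p)

def exercise_generators(p):
    """The set S of Exercise 2 (upper and lower unipotents with entries +3, -3), reduced mod p."""
    return [(1, 3 % p, 0, 1), (1, -3 % p, 0, 1), (1, 0, 3 % p, 1), (1, 0, -3 % p, 1)]

def second_eigenvalue(p, gens):
    """Group order and second largest absolute eigenvalue of the normalised adjacency matrix."""
    elems = sl2_elements(p)
    index = {g: i for i, g in enumerate(elems)}
    adj = np.zeros((len(elems), len(elems)))
    for g in elems:
        for s in gens:
            adj[index[g], index[mul(g, s, p)]] += 1.0 / len(gens)
    eig = np.sort(np.abs(np.linalg.eigvalsh(adj)))  # adj is symmetric because gens is symmetric
    return len(elems), eig[-2]

size, lam = second_eigenvalue(5, exercise_generators(5))
print("group order:", size, "second largest |eigenvalue|:", lam)
```

A two-sided $\epsilon$-expander is one for which this second largest absolute eigenvalue is at most $1 - \epsilon$; the content of the exercise is that $\epsilon$ can be chosen independently of $p$.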

Remark 3 Theorem 3 has been generalised to arbitrary linear groups, and with $F_p$ replaced by ${\bf Z}/q{\bf Z}$ for square-free $q$; see this paper of Salehi Golsefidy and Varju. In this more general setting, the condition of virtual solvability must be replaced by the condition that the connected component of the Zariski closure of $\langle S \rangle$ is perfect. An effective version of Theorem 3 (with completely explicit constants) was recently obtained by Kowalski.

The second example concerns Cayley graphs constructed using random elements of $SL_2(F_p)$.

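Theorem 4 (Random generators expand) Let $p$ be a prime, and let $a, b$ be two elements of $SL_2(F_p)$ chosen uniformly at random. Then with probability $1 - o_{p \to \infty}(1)$, $Cay(SL_2(F_p), \{a, a^{-1}, b, b^{-1}\})$ is a two-sided $\epsilon$-expander for some absolute constant $\epsilon > 0$.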

Remark 4 As with Theorem 3, Theorem 4 has also been extended to a number of other groups, such as the Suzuki groups (in this paper of Breuillard, Green, and Tao), and more generally to finite simple groups of Lie type of bounded rank (in forthcoming work of Breuillard, Green, Guralnick, and Tao). There are a number of other constructions of expanding Cayley graphs in such groups (and in other interesting groups, such as the alternating groups) beyond those discussed in these notes; see this recent survey of Lubotzky for further discussion. It has been conjectured by Lubotzky and Weiss that for *any* pair $a, b$ of elements of (say) $SL_2(F_p)$ that generates the group, the Cayley graph $Cay(SL_2(F_p), \{a, a^{-1}, b, b^{-1}\})$ is a two-sided $\epsilon$-expander for an absolute constant $\epsilon > 0$: in the case of $SL_2(F_p)$, this has been established for a density one set of primes by Breuillard and Gamburd.

** — 1. Expansion in thin subgroups — **

We now prove Theorem 3. The first observation is that the expansion property is monotone in the group $\langle S \rangle$:

Exercise 3 Let $S, S'$ be symmetric subsets of $SL_2({\bf Z})$ not containing the identity, such that $\langle S \rangle \subset \langle S' \rangle$. Suppose that $Cay(SL_2(F_p), \pi_p(S))$ is a two-sided expander family for sufficiently large primes $p$. Show that $Cay(SL_2(F_p), \pi_p(S'))$ is also a two-sided expander family.

As a consequence, Theorem 3 follows from the following two statements:

Theorem 5 (Tits alternative) Let $\Gamma \subset SL_2({\bf Z})$ be a group. Then exactly one of the following statements holds:

- (i) $\Gamma$ is virtually solvable.
- (ii) $\Gamma$ contains a copy of the free group $F_2$ of two generators as a subgroup.

Theorem 6 (Expansion in free groups) Let $a, b \in SL_2({\bf Z})$ be generators of a free subgroup of $SL_2({\bf Z})$. Then as $p$ ranges over all sufficiently large primes, the Cayley graphs $Cay(SL_2(F_p), \pi_p(\{a, b, a^{-1}, b^{-1}\}))$ form a two-sided expander family.

Theorem 5 is a special case of the famous Tits alternative, which among other things allows one to replace $SL_2({\bf Z})$ by $GL_d(k)$ for any $d \geq 1$ and any field $k$ of characteristic zero (and fields of positive characteristic are also allowed, if one adds the requirement that $\Gamma$ be finitely generated). We will not prove the full Tits alternative here, but instead just give an *ad hoc* proof of the special case in Theorem 5 in the following exercise.

Exercise 4 Given any matrix , the singular values are and , and we can apply the singular value decomposition to decompose

where and are orthonormal bases. (When , these bases are uniquely determined up to phase rotation.) We let be the projection of to the projective complex plane, and similarly define .

Let be a subgroup of . Call a pair a *limit point* of if there exists a sequence with and .

- (i) Show that if is infinite, then there is at least one limit point.
- (ii) Show that if is a limit point, then so is .
- (iii) Show that if there are two limit points with , then there exist that generate a free group. (Hint: Choose close to and close to , and consider the action of and on , and specifically on small neighbourhoods of , and set up a ping-pong type situation.)
- (iv) Show that if is hyperbolic (i.e. it has an eigenvalue greater than 1), with eigenvectors , then the projectivisations of form a limit point. Similarly, if is regular parabolic (i.e. it has an eigenvalue at 1, but is not the identity) with eigenvector , show that is a limit point.
- (v) Show that if has no free subgroup of two generators, then all hyperbolic and regular parabolic elements of have a common eigenvector. Conclude that all such elements lie in a solvable subgroup of .
- (vi) Show that if an element is neither hyperbolic nor regular parabolic, and is not a multiple of the identity, then is conjugate to a rotation by (in particular, ).
- (vii) Establish Theorem 5. (Hint: show that two square roots of in cannot multiply to another square root of .)

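Now we prove Theorem 6. Let $\Gamma$ be a free subgroup of $SL_2({\bf Z})$ generated by two generators $a, b$. Let $\mu := \frac{1}{4}(\delta_a + \delta_b + \delta_{a^{-1}} + \delta_{b^{-1}})$ be the probability measure generating a random walk on $\Gamma$, thus $(\pi_p)_* \mu$ is the corresponding generator on $\pi_p(\Gamma)$. By Corollary 2, it thus suffices to show that

$$\sup_B ((\pi_p)_* \mu)^{(n)}(B) < p^{-\kappa} \ \ \ \ \ (3)$$

for all sufficiently large $p$, some absolute constant $\kappa > 0$, and some even $n = O(\log p)$ (depending on $p$, of course), where $B$ ranges over Borel subgroups.

As $\pi_p$ is a homomorphism, one has $((\pi_p)_* \mu)^{(n)}(B) = (\pi_p)_* (\mu^{(n)})(B) = \mu^{(n)}(\pi_p^{-1}(B))$ and so it suffices to show that

$$\sup_B \mu^{(n)}(\pi_p^{-1}(B)) < p^{-\kappa}.$$

To deal with the supremum here, we will use an argument of Bourgain and Gamburd, taking advantage of the fact that all Borel groups of $SL_2$ obey a common group law, the point being that free groups such as $\Gamma$ obey such laws only very rarely. More precisely, we use the fact that the Borel groups are solvable of derived length two; in particular we have

$$[[a,b],[c,d]] = 1 \ \ \ \ \ (4)$$

for all $a, b, c, d \in B$. Now, $\mu^{(n)}$ is supported on matrices in $SL_2({\bf Z})$ whose coefficients have size $O(\exp(O(n)))$ (where we allow the implied constants to depend on the choice of generators $a, b$), and so $(\pi_p)_* \mu^{(n)}$ is supported on matrices in $SL_2(F_p)$ whose coefficients also have size $O(\exp(O(n)))$. If $n$ is less than a sufficiently small multiple of $\log p$, these coefficients are then less than $p^{1/10}$ (say). As such, if $\tilde a, \tilde b, \tilde c, \tilde d \in SL_2({\bf Z})$ lie in the support of $\mu^{(n)}$ and their projections $\pi_p(\tilde a), \pi_p(\tilde b), \pi_p(\tilde c), \pi_p(\tilde d)$ obey the word law (4) in $SL_2(F_p)$, then the original matrices $\tilde a, \tilde b, \tilde c, \tilde d$ obey the word law (4) in $SL_2({\bf Z})$. (This lifting of identities from the characteristic $p$ setting of $SL_2(F_p)$ to the characteristic $0$ setting of $SL_2({\bf Z})$ is a simple example of the “Lefschetz principle”.)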
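The contrast between Borel subgroups and free groups that drives this argument can be checked directly in small cases. The following sketch verifies by brute force that the double commutator law (4) holds on the upper triangular subgroup of $SL_2(F_p)$ for $p = 7$, and that it fails for a specific choice of words in the two unipotent matrices with entry $3$ from Exercise 2. The helper names, the prime $7$, and the particular test words $a, b, a^2, b$ are choices made for this illustration only.

```python
IDENTITY = ((1, 0), (0, 1))

def _pack(vals, p):
    """Reduce entries mod p (if p is given) and repack as a 2x2 tuple."""
    if p is not None:
        vals = tuple(v % p for v in vals)
    return ((vals[0], vals[1]), (vals[2], vals[3]))

def mul(x, y, p=None):
    """Product of 2x2 matrices, over the integers or mod p."""
    (a, b), (c, d) = x
    (e, f), (g, h) = y
    return _pack((a * e + b * g, a * f + b * h, c * e + d * g, c * f + d * h), p)

def inv(x, p=None):
    """Inverse of a determinant-one 2x2 matrix."""
    (a, b), (c, d) = x
    return _pack((d, -b, -c, a), p)

def commutator(x, y, p=None):
    """[x, y] = x y x^{-1} y^{-1}."""
    return mul(mul(x, y, p), mul(inv(x, p), inv(y, p), p), p)

def double_commutator(a, b, c, d, p=None):
    """The word [[a, b], [c, d]] appearing in the law (4)."""
    return commutator(commutator(a, b, p), commutator(c, d, p), p)

def borel(p):
    """Upper triangular matrices in SL_2(F_p)."""
    return [((t, u), (0, pow(t, -1, p))) for t in range(1, p) for u in range(p)]

def law_holds_on_borel(p):
    """Check (4) on the whole upper triangular group by first listing all commutators."""
    elems = borel(p)
    comms = {commutator(x, y, p) for x in elems for y in elems}
    return all(commutator(g, h, p) == IDENTITY for g in comms for h in comms)

A = ((1, 3), (0, 1))
B = ((1, 0), (3, 1))
print(law_holds_on_borel(7))                      # law (4) on a Borel subgroup
print(double_commutator(A, B, mul(A, A), B))      # the same word evaluated in SL_2(Z)
```

Only the standard upper triangular Borel subgroup is tested here; the other Borel subgroups are conjugates of it (over the algebraic closure), so the law transfers to them automatically.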

To summarise, if we let $A$ be the set of all elements of $\pi_p^{-1}(B)$ that lie in the support of $\mu^{(n)}$, then (4) holds for all $a, b, c, d \in A$. This severely limits the size of $A$ to only be of polynomial size, rather than exponential size:

Proposition 7 Let $A$ be a subset of the support of $\mu^{(n)}$ (thus, $A$ consists of words in $a, b, a^{-1}, b^{-1}$ of length $n$) such that the law (4) holds for all $a, b, c, d \in A$. Then $|A| \ll n^2$.

The proof of this proposition is laid out in the exercise below.

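Exercise 5 Let $\Gamma$ be a free group generated by two generators $a, b$. Let $B(0,n)$ be the set of all words of length at most $n$ in $a, b, a^{-1}, b^{-1}$.

- (i) Show that if $x, y \in \Gamma$ commute, then $x, y$ lie in the same cyclic group, thus $x = z^i, y = z^j$ for some $z \in \Gamma$ and $i, j \in {\bf Z}$.
- (ii) Show that if $x \in \Gamma$ is not the identity, there are at most $O(n)$ elements of $B(0,n)$ that commute with $x$.
- (iii) Show that if $x, z \in \Gamma$ and $x$ is not the identity, there are at most $O(n)$ elements $y$ of $B(0,n)$ with $[x,y] = z$.
- (iv) Prove Proposition 7.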

Now we can conclude the proof of Theorem 3:

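Exercise 6 Let $\Gamma$ be a free group generated by two generators $a, b$, and let $\mu := \frac{1}{4}(\delta_a + \delta_b + \delta_{a^{-1}} + \delta_{b^{-1}})$.

- (i) Show that $\mu^{(n)}(x) \ll \exp(-cn)$ for all $x \in \Gamma$ and some absolute constant $c > 0$. (For much more precise information on $\mu^{(n)}$, see this paper of Kesten.)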
- (ii) Conclude the proof of Theorem 3.
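The exponential decay in Exercise 6(i) is easy to observe numerically. For the simple random walk on a free group on two generators, the word length of the walker is itself a Markov chain (from length zero it moves to length one, and otherwise it increases with probability $3/4$ and decreases with probability $1/4$), so the return probability $\mu^{(n)}(1)$ can be computed exactly. The sketch below does this with rational arithmetic; the function name is an assumption of this example, and the comparison with $(\sqrt{3}/2)^n$ uses the spectral radius $\sqrt{3}/2$ computed by Kesten, quoted here rather than derived.

```python
from fractions import Fraction
import math

def return_probability(n):
    """Exact probability that the simple random walk on the free group F_2 is at the identity after n steps."""
    dist = {0: Fraction(1)}  # dist[d] = probability that the current word has length d
    for _ in range(n):
        new = {}
        for d, pr in dist.items():
            if d == 0:
                # Every generator lengthens the empty word.
                new[1] = new.get(1, 0) + pr
            else:
                # Three of the four generators lengthen a nonempty reduced word, one shortens it.
                new[d + 1] = new.get(d + 1, 0) + pr * Fraction(3, 4)
                new[d - 1] = new.get(d - 1, 0) + pr * Fraction(1, 4)
        dist = new
    return dist.get(0, Fraction(0))

for n in (2, 4, 10, 20):
    print(n, float(return_probability(n)), (math.sqrt(3) / 2) ** n)
```

Since $\mu$ is symmetric, for even $n$ the largest value of $\mu^{(n)}$ is attained at the identity, so a bound on the return probability is a bound on $\mu^{(n)}(x)$ for every $x$.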

** — 2. Random generators expand — **

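We now prove Theorem 4. Let $F_2$ be the free group on two formal generators $a, b$, and let $\mu := \frac{1}{4}(\delta_a + \delta_b + \delta_{a^{-1}} + \delta_{b^{-1}})$ be the generator of the random walk. For any word $w \in F_2$ and any $\tilde a, \tilde b$ in a group $G$, let $w(\tilde a, \tilde b)$ be the element of $G$ formed by substituting $\tilde a, \tilde b$ for $a, b$ respectively in the word $w$; thus $w$ can be viewed as a map $w: G \times G \to G$ for any group $G$. Observe that if $w$ is drawn randomly using the distribution $\mu^{(n)}$, and $\tilde a, \tilde b \in SL_2(F_p)$, then $w(\tilde a, \tilde b)$ is distributed according to the law $\tilde \mu^{(n)}$, where $\tilde \mu := \frac{1}{4}(\delta_{\tilde a} + \delta_{\tilde b} + \delta_{\tilde a^{-1}} + \delta_{\tilde b^{-1}})$. Applying Corollary 2, it suffices to show that whenever $p$ is a large prime and $\tilde a, \tilde b$ are chosen uniformly and independently at random from $SL_2(F_p)$, then with probability $1 - o_{p \to \infty}(1)$, one has

$$\sup_B {\bf P}_w( w(\tilde a, \tilde b) \in B ) \leq p^{-\kappa} \ \ \ \ \ (5)$$

for some absolute constant $\kappa > 0$, where $B$ ranges over all Borel subgroups of $SL_2(\overline{F_p})$ and $w$ is drawn from the law $\mu^{(n)}$ for some even natural number $n = O(\log p)$.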

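Let $B(0,n)$ denote the words in $F_2$ of length at most $n$. We may use the law (4) to obtain a good bound on the supremum in (5) assuming a certain non-degeneracy property of the word evaluations $w(\tilde a, \tilde b)$:

Exercise 7 Let $n$ be a natural number, and suppose that $\tilde a, \tilde b \in SL_2(F_p)$ is such that $w(\tilde a, \tilde b) \neq 1$ for $w \in B(0,100n) \backslash \{1\}$. Show that

$$\sup_B {\bf P}_w( w(\tilde a, \tilde b) \in B ) \ll \exp(-cn)$$

for some absolute constant $c > 0$, where $w$ is drawn from the law $\mu^{(n)}$. (Hint: use (4) and the hypothesis to lift the problem up to $F_2$, at which point one can use Proposition 7 and Exercise 6.)

In view of this exercise, it suffices to show that with probability $1 - o_{p \to \infty}(1)$, one has $w(\tilde a, \tilde b) \neq 1$ for all $w \in B(0,100n) \backslash \{1\}$ for some $n$ comparable to a small multiple of $\log p$. As $B(0,100n)$ has $\exp(O(n))$ elements, it thus suffices by the union bound to show that

$${\bf P}_{\tilde a, \tilde b}( w(\tilde a, \tilde b) = 1 ) \leq p^{-\gamma} \ \ \ \ \ (6)$$

for some absolute constant $\gamma > 0$, and any $w \in F_2 \backslash \{1\}$ of length less than $c \log p$ for some sufficiently small absolute constant $c > 0$.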

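Let us now fix a non-identity word $w$ of length $|w|$ less than $c \log p$, and consider $w$ as a function from $SL_2(k) \times SL_2(k)$ to $SL_2(k)$ for an arbitrary field $k$. We can identify $SL_2(k)$ with the set $\{ (a,b,c,d) \in k^4: ad - bc = 1 \}$. A routine induction then shows that the expression $w((a,b,c,d),(a',b',c',d'))$ is a polynomial in the eight variables $a,b,c,d,a',b',c',d'$ of degree $O(|w|)$ and coefficients which are integers of size $O(\exp(O(|w|)))$. Let us then make the additional restriction to the case $a, a' \neq 0$, in which case we can write $d = \frac{bc+1}{a}$ and $d' = \frac{b'c'+1}{a'}$. Then $w$ is now a rational function of $a,b,c,a',b',c'$ whose numerator is a polynomial of degree $O(|w|)$ and coefficients of size $O(\exp(O(|w|)))$, and the denominator is a monomial of $a, a'$ of degree $O(|w|)$.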

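We then specialise this rational function to the field $k = F_p$. It is conceivable that when one does so, the rational function collapses to the constant polynomial $(1,0,0,1)$, thus $w((a,b,c,d),(a',b',c',d')) = 1$ for all $(a,b,c,d),(a',b',c',d') \in SL_2(F_p)$ with $a, a' \neq 0$. (For instance, this would be the case if $w = a^{|SL_2(F_p)|}$, by Lagrange’s theorem, if it were not for the fact that $|w|$ is far too large here.) But suppose that this rational function does not collapse to the constant rational function. Applying the Schwartz-Zippel lemma (Exercise 23 from Notes 5), we then see that the set of pairs $(a,b,c,d),(a',b',c',d') \in SL_2(F_p)$ with $a, a' \neq 0$ and $w((a,b,c,d),(a',b',c',d')) = 1$ has cardinality at most $O(|w| p^5)$; adding in the $a = 0$ and $a' = 0$ cases, one still obtains a bound of $O(|w| p^5)$, which is acceptable since $|SL_2(F_p)|^2 \sim p^6$ and $|w| = O(\log p)$. Thus, the only remaining case to consider is when the rational function is identically $1$ on $SL_2(F_p)$ with $a, a' \neq 0$.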
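For very small primes one can simply count the pairs on which a given word evaluates to the identity. The sketch below does this by exhaustive enumeration, which is a sanity check rather than an implementation of the Schwartz-Zippel argument; words are encoded as strings in which `A` and `B` stand for the inverses of `a` and `b`, and this encoding, the helper names and the choice $p = 5$ are all assumptions of the example.

```python
import itertools

def sl2(p):
    """All matrices (a, b, c, d) over F_p with ad - bc = 1."""
    return [(a, b, c, d) for a, b, c, d in itertools.product(range(p), repeat=4)
            if (a * d - b * c) % p == 1]

def mul(x, y, p):
    """Product of two 2x2 matrices stored as 4-tuples, reduced mod p."""
    a, b, c, d = x
    e, f, g, h = y
    return ((a * e + b * g) % p, (a * f + b * h) % p,
            (c * e + d * g) % p, (c * f + d * h) % p)

def inv(x, p):
    """Inverse of a determinant-one matrix mod p."""
    a, b, c, d = x
    return (d % p, -b % p, -c % p, a % p)

def evaluate(word, a, b, p):
    """Substitute matrices a, b (and their inverses for 'A', 'B') into a word such as 'abAB'."""
    letters = {'a': a, 'b': b, 'A': inv(a, p), 'B': inv(b, p)}
    result = (1, 0, 0, 1)
    for ch in word:
        result = mul(result, letters[ch], p)
    return result

def trivial_count(word, p):
    """Number of pairs (a, b) in SL_2(F_p)^2 with word(a, b) = 1, and the total number of pairs."""
    elems = sl2(p)
    hits = sum(1 for a in elems for b in elems if evaluate(word, a, b, p) == (1, 0, 0, 1))
    return hits, len(elems) ** 2

print(trivial_count("abAB", 5))  # commuting pairs in SL_2(F_5)
```

For the commutator word the count equals the number of commuting pairs, which is the number of conjugacy classes times the group order, so only a small proportion of pairs is lost; the point of the argument in the text is that a bound of this type holds for every non-trivial word of length $O(\log p)$.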

Now we perform another “Lefschetz principle” manoeuvre to change the underlying field. Recall that the denominator of the rational function is a monomial in $a, a'$, and the numerator has coefficients of size $O(\exp(O(|w|)))$. If $|w|$ is less than $c \log p$ for a sufficiently small $c > 0$, we conclude in particular (for $p$ large enough) that the coefficients all have magnitude less than $p$. As such, the only way that this function can be identically $1$ on $SL_2(F_p)$ is if it is identically $1$ on $SL_2(k)$, for an arbitrary field $k$, for all $(a,b,c,d),(a',b',c',d')$ with $a, a' \neq 0$, and hence for $a = 0$ or $a' = 0$ also by taking Zariski closures.

On the other hand, we know that for some choices of $k$, e.g. $k = {\bf R}$, $SL_2(k)$ contains a copy of the free group on two generators (see e.g. Exercise 23 of Notes 2). As such, it is not possible for any non-identity word $w$ to be identically trivial on $SL_2(k) \times SL_2(k)$. Thus this case cannot actually occur, completing the proof of (6) and hence of Theorem 4.

Remark 5 We see from the above argument that the existence of subgroups $\Gamma$ of an algebraic group with good “independence” properties (such as that of generating a free group) can be useful in studying the expansion properties of that algebraic group, even if the field of interest in the latter is distinct from that of the former. For more complicated algebraic groups than $SL_2$, in which laws such as (4) are not always available, it turns out to be useful to place further properties on the subgroup $\Gamma$, for instance by requiring that all non-abelian subgroups of that group be Zariski dense (a property which has been called *strong density*), as this turns out to be useful for preventing random walks from concentrating in proper algebraic subgroups. See this paper of Breuillard, Guralnick, Green and Tao for constructions of strongly dense free subgroups of algebraic groups and further discussion.

Van Vu and I have just uploaded to the arXiv our paper “Random matrices: The Universality phenomenon for Wigner ensembles”. This survey is a longer version (58 pages) of a previous short survey we wrote up a few months ago. The survey focuses on recent progress in understanding the universality phenomenon for Hermitian Wigner ensembles, of which the Gaussian Unitary Ensemble (GUE) is the most well known. The one-sentence summary of this progress is that many of the asymptotic spectral statistics (e.g. correlation functions, eigenvalue gaps, determinants, etc.) that were previously known for GUE matrices, are now known for very large classes of Wigner ensembles as well. There are however a wide variety of results of this type, due to the large number of interesting spectral statistics, the varying hypotheses placed on the ensemble, and the different modes of convergence studied, and it is difficult to isolate a single such result currently as *the* definitive universality result. (In particular, there is at present a tradeoff between generality of ensemble and strength of convergence; the universality results that are available for the most general classes of ensemble are only presently able to demonstrate a rather weak sense of convergence to the universal distribution (involving an additional averaging in the energy parameter), which limits the applicability of such results to a number of interesting questions in which energy averaging is not permissible, such as the study of the least singular value of a Wigner matrix, or of related quantities such as the condition number or determinant. But it is conceivable that this tradeoff is a temporary phenomenon and may be eliminated by future work in this area; in the case of Hermitian matrices whose entries have the same second moments as that of the GUE ensemble, for instance, the need for energy averaging has already been removed.)

Nevertheless, throughout the family of results that have been obtained recently, there are two main methods which have been fundamental to almost all of the recent progress in extending from special ensembles such as GUE to general ensembles. The first method, developed extensively by Erdos, Schlein, Yau, Yin, and others (and building on an initial breakthrough by Johansson), is the *heat flow method*, which exploits the rapid convergence to equilibrium of the spectral statistics of matrices undergoing Dyson-type flows towards GUE. (An important aspect to this method is the ability to accelerate the convergence to equilibrium by localising the Hamiltonian, in order to eliminate the slowest modes of the flow; this refinement of the method is known as the “local relaxation flow” method. Unfortunately, the translation mode is not accelerated by this process, which is the principal reason why results obtained by pure heat flow methods still require an energy averaging in the final conclusion; it would be of interest to find a way around this difficulty.) The other method, which goes all the way back to Lindeberg in his classical proof of the central limit theorem, and which was introduced to random matrix theory by Chatterjee and then developed for the universality problem by Van Vu and myself, is the *swapping method*, which is based on the observation that spectral statistics of Wigner matrices tend to be stable if one replaces just one or two entries of the matrix with another distribution, with the stability of the swapping process becoming stronger if one assumes that the old and new entries have many matching moments. The main formalisations of this observation are known as *four moment theorems*, because they require four matching moments between the entries, although there are some variant three moment theorems and two moment theorems in the literature as well. Our initial four moment theorems were focused on individual eigenvalues (and were later extended to eigenvectors), but it was later observed by Erdos, Yau, and Yin that simpler four moment theorems could also be established for aggregate spectral statistics, such as the coefficients of the Green’s function, and Knowles and Yin also subsequently observed that these latter theorems could be used to recover a four moment theorem for eigenvalues and eigenvectors, giving an alternate approach to proving such theorems.

Interestingly, it seems that the heat flow and swapping methods are complementary to each other; the heat flow methods are good at removing moment hypotheses on the coefficients, while the swapping methods are good at removing regularity hypotheses. To handle general ensembles with minimal moment or regularity hypotheses, it is thus necessary to combine the two methods (though perhaps in the future a third method, or a unification of the two existing methods, might emerge).

Besides the heat flow and swapping methods, there are also a number of other basic tools that are needed in these results, such as local semicircle laws and eigenvalue rigidity, which are also discussed in the survey. We also survey how universality has been established for a wide variety of spectral statistics; the $k$-point correlation functions are the most well known of these statistics, but they do not tell the whole story (particularly if one can only control these functions after an averaging in the energy), and there are a number of other statistics, such as eigenvalue counting functions, determinants, or spectral gaps, for which the above methods can be applied.

In order to prevent the survey from becoming too enormous, we decided to restrict attention to Hermitian matrix ensembles, whose entries off the diagonal are identically distributed, as this is the case in which the strongest results are available. There are several results that are applicable to more general ensembles than these which are briefly mentioned in the survey, but they are not covered in detail.

We plan to submit this survey eventually to the proceedings of a workshop on random matrix theory, and will continue to update the references on the arXiv version until the time comes to actually submit the paper.

Finally, in the survey we issue some errata for previous papers of Van and myself in this area, mostly centering around the three moment theorem (a variant of the more widely used four moment theorem), for which the original proof of Van and myself was incomplete. (Fortunately, as the three moment theorem had many fewer applications than the four moment theorem, and most of the applications that it did have ended up being superseded by subsequent papers, the actual impact of this issue was limited, but still an erratum is in order.)

Van Vu and I have just uploaded to the arXiv our paper Random matrices: Sharp concentration of eigenvalues, submitted to the Electronic Journal of Probability. As with many of our previous papers, this paper is concerned with the distribution of the eigenvalues $\lambda_1(M_n) \leq \ldots \leq \lambda_n(M_n)$ of a random Wigner matrix $M_n$ (such as a matrix drawn from the Gaussian Unitary Ensemble (GUE) or Gaussian Orthogonal Ensemble (GOE)). To simplify the discussion we shall mostly restrict attention to the *bulk* of the spectrum, i.e. to eigenvalues $\lambda_i(M_n)$ with $\delta n \leq i \leq (1-\delta) n$ for some fixed $\delta > 0$, although analogues of most of the results below have also been obtained at the edge of the spectrum.

If we normalise the entries of the matrix $M_n$ to have mean zero and variance $1/n$, then in the asymptotic limit $n \to \infty$, we have the Wigner semicircle law, which asserts that the eigenvalues are asymptotically distributed according to the semicircular distribution $\rho_{sc}(x)\ dx$, where

$$\rho_{sc}(x) := \frac{1}{2\pi} (4-x^2)_+^{1/2}.$$

An essentially equivalent way of saying this is that for large $n$, we expect the $i^{th}$ eigenvalue $\lambda_i(M_n)$ of $M_n$ to stay close to the *classical location* $\gamma_i \in [-2,2]$, defined by the formula

$$\int_{-2}^{\gamma_i} \rho_{sc}(x)\ dx = \frac{i}{n}.$$

In particular, from the Wigner semicircle law it can be shown that asymptotically almost surely, one has

$$\lambda_i(M_n) = \gamma_i + o(1). \ \ \ \ \ (1)$$

In the modern study of the spectrum of Wigner matrices (and in particular as a key tool in establishing universality results), it has become of interest to improve the error term in (1) as much as possible. A typical early result in this direction was by Bai, who used the Stieltjes transform method to obtain polynomial convergence rates of the shape $O(n^{-c})$ for some absolute constant $c > 0$; see also the subsequent papers of Alon-Krivelevich-Vu and of Meckes, who were able to obtain such convergence rates (with exponentially high probability) by using concentration of measure tools, such as Talagrand’s inequality. On the other hand, in the case of the GUE ensemble it is known (by this paper of Gustavsson) that $\lambda_i(M_n)$ has variance comparable to $\frac{\log n}{n^2}$ in the bulk, so that the optimal error term in (1) should be about $O(\sqrt{\log n}/n)$. (One may think that if one wanted bounds on (1) that were uniform in $i$, one would need to enlarge the error term further, but this does not appear to be the case, due to strong correlations between the $\lambda_i$; note for instance this recent result of Ben Arous and Bourgade that the largest gap between eigenvalues in the bulk is typically of order $O(\sqrt{\log n}/n)$.)
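To see how close the eigenvalues stay to their classical locations in practice, here is a small numerical sketch. It assumes the normalisation in which the entries have variance $1/n$, so that the spectrum concentrates on $[-2,2]$; the closed form used for the distribution function of the semicircle law, the bisection solver, the matrix size and the seed are all choices made for this example rather than anything taken from the paper.

```python
import numpy as np

def semicircle_cdf(x):
    """Distribution function of the semicircle law: integral of rho_sc from -2 to x."""
    x = np.clip(x, -2.0, 2.0)
    return 0.5 + x * np.sqrt(4.0 - x * x) / (4.0 * np.pi) + np.arcsin(x / 2.0) / np.pi

def classical_location(t, tol=1e-12):
    """Solve semicircle_cdf(gamma) = t for 0 <= t <= 1 by bisection."""
    lo, hi = -2.0, 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if semicircle_cdf(mid) < t:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

rng = np.random.default_rng(1)
n = 200
a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
m = (a + a.conj().T) / (2.0 * np.sqrt(n))   # GUE matrix with entries of variance 1/n
eigs = np.linalg.eigvalsh(m)                # eigs[i - 1] is the i-th smallest eigenvalue

bulk = range(n // 10, 9 * n // 10)          # indices i with 0.1 n <= i < 0.9 n
max_dev = max(abs(eigs[i - 1] - classical_location(i / n)) for i in bulk)
print("largest bulk deviation from the classical location:", max_dev)
```

At this matrix size the deviations are already a small fraction of the width of the spectrum, though a single sample at $n = 200$ cannot distinguish between the various logarithmic error terms discussed in this post.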

A significant advance in this direction was achieved by Erdos, Schlein, and Yau in a series of papers where they used a combination of Stieltjes transform and concentration of measure methods to obtain *local semicircle laws* which showed, among other things, that one had asymptotics of the form

$$N_I = (1+o(1)) n \int_I \rho_{sc}(x)\ dx$$

with exponentially high probability for intervals $I$ in the bulk that were as short as $n^{-1+\epsilon}$ for some $\epsilon > 0$, where $N_I$ is the number of eigenvalues in $I$. These asymptotics are consistent with a good error term in (1), and are already sufficient for many applications, but do not quite imply a strong concentration result for individual eigenvalues $\lambda_i(M_n)$ (basically because they do not preclude long-range or “secular” shifts in the spectrum that involve large blocks of eigenvalues at mesoscopic scales). Nevertheless, this was rectified in a subsequent paper of Erdos, Yau, and Yin, which roughly speaking obtained a bound of the form

$$\lambda_i(M_n) = \gamma_i + O\left( \frac{\log^{O(\log\log n)} n}{n} \right)$$

in the bulk with exponentially high probability, for Wigner matrices obeying some exponential decay conditions on the entries. This was achieved by a rather delicate high moment calculation, in which the contribution of the diagonal entries of the resolvent (whose average forms the Stieltjes transform) was shown to mostly cancel each other out.

As the GUE computations show, this concentration result is sharp up to the quasilogarithmic factor $\log^{O(\log\log n)} n$. The main result of this paper is to improve the concentration result to one more in line with the GUE case, namely

$$\lambda_i(M_n) = \gamma_i + O\left( \frac{\log^{O(1)} n}{n} \right)$$

with exponentially high probability (see the paper for a more precise statement of results). The one catch is that an additional hypothesis is required, namely that the entries of the Wigner matrix have vanishing third moment. We also obtain similar results for the edge of the spectrum (but with a different scaling).

Our arguments are rather different from those of Erdos, Yau, and Yin, and thus provide an alternate approach to establishing eigenvalue concentration. The main tool is the Lindeberg exchange strategy, which is also used to prove the Four Moment Theorem (although we do not directly invoke the Four Moment Theorem in our analysis). The main novelty is that this exchange strategy is now used to establish large deviation estimates (i.e. exponentially small tail probabilities) rather than universality of the limiting distribution. Roughly speaking, the basic point is as follows. The Lindeberg exchange strategy seeks to compare a function of many independent random variables with the same function of a different set of random variables (which match moments with the original set of variables to some order, such as to second or fourth order) by exchanging the random variables one at a time. Typically, one tries to upper bound expressions such as

for various smooth test functions , by performing a Taylor expansion in the variable being swapped and taking advantage of the matching moment hypotheses. In previous implementations of this strategy, was a bounded test function, which allowed one to get control of the bulk of the distribution of , and in particular to control probabilities such as

for various thresholds and , but did not give good control on the tail as the error terms tended to be polynomially decaying in rather than exponentially decaying. However, it turns out that one can modify the exchange strategy to deal with moments such as

for various moderately large (e.g. of size comparable to ), obtaining results such as

after performing all the relevant exchanges. As such, one can then use large deviation estimates on to deduce large deviation estimates on .

In this paper we also take advantage of a simplification, first noted by Erdos, Yau, and Yin, that Four Moment Theorems become somewhat easier to prove if one works with resolvents (and the closely related Stieltjes transform ) rather than with individual eigenvalues, as the Taylor expansion of a resolvent is very simple (essentially being a Neumann series). The relationship between the Stieltjes transform and the location of individual eigenvalues can be seen by taking advantage of the identity

for any energy level , which can be verified from elementary calculus. (In practice, we would truncate near zero and near infinity to avoid some divergences, but this is a minor technicality.) As such, a concentration result for the Stieltjes transform can be used to establish an analogous concentration result for the eigenvalue counting functions , which in turn can be used to deduce concentration results for individual eigenvalues by some basic combinatorial manipulations.
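The link between the Stieltjes transform and the counting function can be checked numerically. The sketch below is only illustrative and rests on assumptions that are not in the text: it takes the normalisation $s(z) = \frac{1}{n} \sum_j \frac{1}{\lambda_j - z}$, and it takes the identity in the form $\int_0^\infty \mathrm{Re}\, s(E + i\eta)\, d\eta = \frac{\pi}{2} - \frac{\pi}{n} N_E$, where $N_E$ is the number of eigenvalues at most $E$ (this follows from $\int_0^\infty \frac{a}{a^2+\eta^2}\, d\eta = \frac{\pi}{2} \mathrm{sgn}(a)$). The function names are hypothetical.

```python
import math

def stieltjes_real_integral(eigenvalues, energy, steps=20000):
    """Midpoint-rule approximation of the integral over eta in (0, inf) of
    Re s(energy + i*eta), with s(z) = (1/n) * sum_j 1/(lambda_j - z).
    The substitution eta = tan(theta) maps (0, inf) onto (0, pi/2)."""
    n = len(eigenvalues)
    h = (math.pi / 2) / steps
    total = 0.0
    for m in range(steps):
        theta = (m + 0.5) * h
        c, s = math.cos(theta), math.sin(theta)
        for lam in eigenvalues:
            a = lam - energy
            # Re 1/(a - i*eta) = a / (a^2 + eta^2); after eta = tan(theta)
            # the integrand becomes a / (a^2 cos^2(theta) + sin^2(theta)).
            total += a / (a * a * c * c + s * s)
    return total * h / n

def counting_side(eigenvalues, energy):
    """pi/2 - (pi/n) * #{eigenvalues <= energy} (the assumed form of the identity)."""
    n = len(eigenvalues)
    count = sum(1 for lam in eigenvalues if lam <= energy)
    return math.pi / 2 - math.pi * count / n

# Hypothetical toy spectrum, chosen so that no eigenvalue sits exactly at the energy level.
toy_spectrum = [-1.5, -0.4, 0.3, 1.2]
```

Comparing `stieltjes_real_integral(toy_spectrum, E)` with `counting_side(toy_spectrum, E)` for a few energies `E` away from the eigenvalues shows the two sides agreeing to within the quadrature error; no truncation near zero or infinity is needed for a fixed finite spectrum like this one.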

Van Vu and I have just uploaded to the arXiv our short survey article, “Random matrices: The Four Moment Theorem for Wigner ensembles”, submitted to the MSRI book series, as part of the proceedings on the MSRI semester program on random matrix theory from last year. This is a highly condensed version (at 17 pages) of a much longer survey (currently at about 48 pages, though not completely finished) that we are currently working on, devoted to the recent advances in understanding the universality phenomenon for spectral statistics of Wigner matrices. In this abridged version of the survey, we focus on a key tool in the subject, namely the *Four Moment Theorem* which roughly speaking asserts that the statistics of a Wigner matrix depend only on the first four moments of the entries. We give a sketch of proof of this theorem, and two sample applications: a central limit theorem for individual eigenvalues of a Wigner matrix (extending a result of Gustavsson in the case of GUE), and the verification of a conjecture of Wigner, Dyson, and Mehta on the universality of the asymptotic k-point correlation functions even for discrete ensembles (provided that we interpret convergence in the vague topology sense).

For reasons of space, this paper is very far from an exhaustive survey even of the narrow topic of universality for Wigner matrices, but should hopefully be an accessible entry point into the subject nevertheless.

Van Vu and I have just uploaded to the arXiv our paper A central limit theorem for the determinant of a Wigner matrix, submitted to Adv. Math. It studies the asymptotic distribution of the determinant of a random Wigner matrix (such as a matrix drawn from the Gaussian Unitary Ensemble (GUE) or Gaussian Orthogonal Ensemble (GOE)).

Before we get to these results, let us first discuss the simpler problem of studying the determinant $\det A_n$ of a random *iid* matrix $A_n = (a_{ij})_{1 \leq i,j \leq n}$, such as a real gaussian matrix (where all entries are independently and identically distributed using the standard real normal distribution $N(0,1)_{\bf R}$), a complex gaussian matrix (where all entries are independently and identically distributed using the standard complex normal distribution $N(0,1)_{\bf C}$, thus the real and imaginary parts are independent with law $N(0,1/2)_{\bf R}$), or the random sign matrix (in which all entries are independently and identically distributed according to the Bernoulli distribution, with a $1/2$ chance of either sign). More generally, one can consider a matrix $A_n$ in which all the entries are independently and identically distributed with mean zero and variance one.

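We can expand $\det A_n$ (writing $a_{ij}$ for the entries of the iid matrix $A_n$) using the Leibniz expansion

$$\det A_n = \sum_{\sigma \in S_n} I_\sigma$$

where $\sigma$ ranges over the permutations of $\{1,\ldots,n\}$, and $I_\sigma$ is the product

$$I_\sigma := \mathrm{sgn}(\sigma) \prod_{i=1}^n a_{i\sigma(i)}.$$

From the iid nature of the $a_{ij}$, we easily see that each $I_\sigma$ has mean zero and variance one, and that the $I_\sigma$ are pairwise uncorrelated as $\sigma$ varies. We conclude that $\det A_n$ has mean zero and variance $n!$ (an observation first made by Turán). In particular, from Chebyshev’s inequality we see that $\det A_n$ is typically of size $O(\sqrt{n!})$.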
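Turán's observation can be confirmed exactly for very small random sign matrices by brute force. The following is a minimal sketch rather than anything from the paper: it enumerates every $\pm 1$ matrix of a given size, evaluates the determinant through the Leibniz expansion, and averages. The value being checked is the classical one, mean zero and second moment $n!$; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations, product

def sign(perm):
    """Sign of a permutation (given as a tuple), computed from its inversion count."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1

def leibniz_det(matrix):
    """Determinant as the signed sum over permutations of products of entries."""
    n = len(matrix)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for i in range(n):
            term *= matrix[i][perm[i]]
        total += term
    return total

def sign_matrix_moments(n):
    """Exact mean and second moment of det over all 2^(n*n) random sign matrices."""
    count = first = second = 0
    for entries in product((-1, 1), repeat=n * n):
        matrix = [entries[i * n:(i + 1) * n] for i in range(n)]
        d = leibniz_det(matrix)
        first += d
        second += d * d
        count += 1
    return Fraction(first, count), Fraction(second, count)
```

Exhaustive enumeration is only feasible for $n \leq 4$ or so, but it has the advantage of being exact, so the cancellation of all the cross terms between distinct permutations is visible without any sampling error.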

It turns out, though, that this is not quite best possible. This is easiest to explain in the real gaussian case, by performing a computation first made by Goodman. In this case, $\det A_n$ is clearly symmetrical, so we can focus attention on the magnitude $|\det A_n|$. We can interpret this quantity geometrically as the volume of an $n$-dimensional parallelepiped whose generating vectors $X_1,\ldots,X_n$ are independent real gaussian vectors in ${\bf R}^n$ (i.e. their coefficients are iid with law $N(0,1)_{\bf R}$). Using the classical base-times-height formula, we thus have

$$|\det A_n| = \prod_{i=1}^n \mathrm{dist}(X_i, V_i)$$

where $V_i$ is the $(i-1)$-dimensional linear subspace of ${\bf R}^n$ spanned by $X_1,\ldots,X_{i-1}$ (note that $X_1,\ldots,X_n$, having an absolutely continuous joint distribution, are almost surely linearly independent). Taking logarithms, we conclude

$$\log |\det A_n| = \sum_{i=1}^n \log \mathrm{dist}(X_i, V_i). \qquad (2)$$

Now, we take advantage of a fundamental symmetry property of the Gaussian vector distribution, namely its invariance with respect to the orthogonal group $O(n)$. Because of this, we see that if we fix $X_1,\ldots,X_{i-1}$ (and thus $V_i$), the random variable $\mathrm{dist}(X_i,V_i)$ has the same distribution as $\mathrm{dist}(X_i, {\bf R}^{i-1})$, or equivalently the $\chi$ distribution

$$\chi_{n-i+1} := \Big( \sum_{j=1}^{n-i+1} \xi_{n-i+1,j}^2 \Big)^{1/2}$$

where $\xi_{n-i+1,1},\ldots,\xi_{n-i+1,n-i+1}$ are iid copies of $N(0,1)_{\bf R}$. As this distribution does not depend on the $X_1,\ldots,X_{i-1}$, we conclude that the law of $\log |\det A_n|$ is given by the sum of $n$ independent $\chi$-variables:

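$$\log |\det A_n| \equiv \sum_{j=1}^n \log \chi_j.$$

A standard computation shows that each $\chi_j^2$ has mean $j$ and variance $2j$, and then a Taylor series (or Ito calculus) computation (using concentration of measure tools to control tails) shows that $\log \chi_j$ has mean $\frac{1}{2} \log j - \frac{1}{2j} + O(1/j^{3/2})$ and variance $\frac{1}{2j} + O(1/j^{3/2})$. As such, $\log |\det A_n|$ has mean $\frac{1}{2} \log n! - \frac{1}{2} \log n + O(1)$ and variance $\frac{1}{2} \log n + O(1)$. Applying a suitable version of the central limit theorem, one obtains the asymptotic law

$$\frac{\log |\det A_n| - \frac{1}{2} \log n! + \frac{1}{2} \log n}{\sqrt{\frac{1}{2} \log n}} \rightarrow N(0,1)_{\bf R} \qquad (3)$$

where $\rightarrow$ denotes convergence in distribution. A bit more informally, we have

$$|\det A_n| \approx n^{-1/2} \sqrt{n!} \exp( N( 0, \log n / 2 )_{\bf R} ) \qquad (4)$$

when $A_n$ is a real gaussian matrix; thus, for instance, the median value of $|\det A_n|$ is $n^{-1/2+o(1)} \sqrt{n!}$. At first glance, this appears to conflict with the second moment bound ${\bf E} |\det A_n|^2 = n!$ of Turán mentioned earlier, but once one recalls that $\exp(N(0,t)_{\bf R})$ has a second moment of $\exp(2t)$, we see that the two facts are in fact perfectly consistent; the upper tail of the normal distribution in the exponent in (4) ends up dominating the second moment.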
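The base-times-height step used above is easy to test on a sampled gaussian matrix. The sketch below is an illustration under stated assumptions, not the argument of Goodman: it computes each distance from a row to the span of the preceding rows by Gram-Schmidt, and compares the sum of the logarithms of these distances with the logarithm of the absolute determinant obtained independently by Gaussian elimination. The helper names, the matrix size and the random seed are all hypothetical choices.

```python
import math
import random

def distances_to_spans(rows):
    """For each row X_i, the distance from X_i to the span of X_1..X_{i-1},
    computed by (modified) Gram-Schmidt. Assumes the rows are linearly
    independent, which holds almost surely for gaussian rows."""
    basis = []  # orthonormal basis of the span of the rows processed so far
    dists = []
    for x in rows:
        residual = list(x)
        for q in basis:
            coeff = sum(r * qi for r, qi in zip(residual, q))
            residual = [r - coeff * qi for r, qi in zip(residual, q)]
        d = math.sqrt(sum(r * r for r in residual))
        dists.append(d)
        basis.append([r / d for r in residual])
    return dists

def det_by_elimination(rows):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(r) for r in rows]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

rng = random.Random(0)  # fixed seed so the example is reproducible
size = 6
gaussian_rows = [[rng.gauss(0.0, 1.0) for _ in range(size)] for _ in range(size)]
sum_log_dist = sum(math.log(d) for d in distances_to_spans(gaussian_rows))
log_abs_det = math.log(abs(det_by_elimination(gaussian_rows)))
```

The two quantities `sum_log_dist` and `log_abs_det` agree up to floating-point error, which is the deterministic content of the formula; the probabilistic content (that the individual distances are independent $\chi$-variables) is not tested by this sketch.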

It turns out that the central limit theorem (3) is valid for any real iid matrix with mean zero, variance one, and an exponential decay condition on the entries; this was first claimed by Girko, though the arguments in that paper appear to be incomplete. Another proof of this result, with more quantitative bounds on the convergence rate, has been recently obtained by Hoi Nguyen and Van Vu. The basic idea in these arguments is to express the sum in (2) in terms of a martingale and apply the martingale central limit theorem.

If one works with complex gaussian random matrices instead of real gaussian random matrices, the above computations change slightly (one has to replace the real distribution with the complex distribution, in which the are distributed according to the complex gaussian instead of the real one). At the end of the day, one ends up with the law

(but note that this new asymptotic is still consistent with Turán’s second moment calculation).

We can now turn to the results of our paper. Here, we replace the iid matrices by *Wigner matrices* , which are defined similarly but are constrained to be Hermitian (or real symmetric), thus for all . Model examples here include the Gaussian Unitary Ensemble (GUE), in which for and for , the Gaussian Orthogonal Ensemble (GOE), in which for and for , and the *symmetric Bernoulli ensemble*, in which for (with probability of either sign). In all cases, the upper triangular entries of the matrix are assumed to be jointly independent. For a more precise definition of the Wigner matrix ensembles we are considering, see the introduction to our paper.

The determinants of these matrices still have a Leibniz expansion. However, in the Wigner case, the mean and variance of the terms $I_\sigma$ of the expansion (one for each permutation $\sigma$) are slightly different, and what is worse, they are not all pairwise uncorrelated any more. For instance, the mean of $I_\sigma$ is still usually zero, but equals $(-1)^{n/2}$ in the exceptional case when $\sigma$ is a perfect matching (i.e. the union of exactly $n/2$ $2$-cycles, a possibility that can of course only happen when $n$ is even). As such, the mean of the determinant still vanishes when $n$ is odd, but for even $n$ it is equal to

$$(-1)^{n/2} \frac{n!}{(n/2)! \, 2^{n/2}}$$

(the fraction here simply being the number of perfect matchings on $n$ vertices). Using Stirling’s formula, one then computes that the magnitude of this mean is comparable to $n^{-1/4} \sqrt{n!}$ when $n$ is large and even. The second moment calculation is more complicated (and uses facts about the distribution of cycles in random permutations, mentioned in this previous post), but one can compute that is comparable to for GUE and for GOE. (The discrepancy here comes from the fact that in the GOE case, $I_\sigma$ and $I_{\sigma'}$ can correlate when $\sigma'$ contains reversals of $k$-cycles of $\sigma$ for $k \geq 3$, but this does not happen in the GUE case.) For GUE, much more precise asymptotics for the moments of the determinant are known, starting from the work of Brezin and Hikami, though we do not need these more sophisticated computations here.
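The perfect matching count for the mean can be checked exactly on a tiny symmetric sign ensemble. This is a sketch under assumptions that go beyond the text: the diagonal entries are also taken to be independent random signs (any mean-zero choice would give the same mean), and the expected value is taken to be $(-1)^{n/2} \frac{n!}{(n/2)! 2^{n/2}}$ for even $n$ and zero for odd $n$. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations, product
from math import factorial

def sign(perm):
    """Sign of a permutation (given as a tuple), computed from its inversion count."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1

def leibniz_det(matrix):
    """Determinant as the signed sum over permutations of products of entries."""
    n = len(matrix)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for i in range(n):
            term *= matrix[i][perm[i]]
        total += term
    return total

def mean_det_symmetric_sign(n):
    """Exact mean of det over all real symmetric n x n matrices whose
    upper-triangular entries (diagonal included, an assumption) are +-1."""
    positions = [(i, j) for i in range(n) for j in range(i, n)]
    total = count = 0
    for values in product((-1, 1), repeat=len(positions)):
        m = [[0] * n for _ in range(n)]
        for (i, j), v in zip(positions, values):
            m[i][j] = v
            m[j][i] = v
        total += leibniz_det(m)
        count += 1
    return Fraction(total, count)

def signed_matching_count(n):
    """(-1)^(n/2) times the number of perfect matchings on n vertices (0 if n is odd)."""
    if n % 2:
        return 0
    half = n // 2
    return (-1) ** half * (factorial(n) // (factorial(half) * 2 ** half))
```

For $n = 4$ there are three perfect matchings, each an even permutation, so the exact mean comes out as $3$; for $n = 2$ the single transposition contributes $-1$.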

Our main results are then as follows.

Theorem 1 Let be a Wigner matrix.

- If is drawn from GUE, then
- If is drawn from GOE, then
- The previous two results also hold for more general Wigner matrices, assuming that the real and imaginary parts are independent, a finite moment condition is satisfied, and the entries match moments with those of GOE or GUE to fourth order. (See the paper for a more precise formulation of the result.)

Thus, we informally have

when is drawn from GUE, or from another Wigner ensemble matching GUE to fourth order (and obeying some additional minor technical hypotheses); and

when is drawn from GOE, or from another Wigner ensemble matching GOE to fourth order. Again, these asymptotic limiting distributions are consistent with the asymptotic behaviour for the second moments.

The extension from the GUE or GOE case to more general Wigner ensembles is a fairly routine application of the *four moment theorem* for Wigner matrices, although for various technical reasons we do not quite use the existing four moment theorems in the literature, but adapt them to the log determinant. The main idea is to express the log-determinant as an integral

of . Strictly speaking, the integral in (7) is divergent at infinity (and also can be ill-behaved near zero), but this can be addressed by standard truncation and renormalisation arguments (combined with known facts about the least singular value of Wigner matrices), which we omit here. We then use a variant of the four moment theorem for the Stieltjes transform, as used by Erdos, Yau, and Yin (based on a previous four moment theorem for individual eigenvalues introduced by Van Vu and myself). The four moment theorem is proven by the now-standard Lindeberg exchange method, combined with the usual resolvent identities to control the behaviour of the resolvent (and hence the Stieltjes transform) with respect to modifying one or two entries, together with the delocalisation of eigenvector property (which in turn arises from local semicircle laws) to control the error terms.

Somewhat surprisingly (to us, at least), it turned out that it was the first part of the theorem (namely, the verification of the limiting law for the invariant ensembles GUE and GOE) that was more difficult than the extension to the Wigner case. Even in an ensemble as highly symmetric as GUE, the rows are no longer independent, and the formula (2) is basically useless for getting any non-trivial control on the log determinant. There is an explicit formula for the joint distribution of the eigenvalues of GUE (or GOE), which does eventually give the distribution of the cumulants of the log determinant, which then gives the required central limit theorem; but this is a lengthy computation, first performed by Delannay and Le Caer.

Following a suggestion of my colleague, Rowan Killip, we give an alternate proof of this central limit theorem in the GUE and GOE cases, by using a beautiful observation of Trotter, namely that the GUE or GOE ensemble can be conjugated into a tractable tridiagonal form. Let me state it just for GUE:

Proposition 2 (Tridiagonal form of GUE) [Trotter] Let be the random tridiagonal real symmetric matrix where the are jointly independent real random variables, with being standard real Gaussians, and each having a -distribution:

where are iid complex gaussians. Let be drawn from GUE. Then the joint eigenvalue distribution of is identical to the joint eigenvalue distribution of .

*Proof:* Let be drawn from GUE. We can write

where is drawn from the GUE, , and is a random gaussian vector with all entries iid with distribution . Furthermore, are jointly independent.

We now apply the tridiagonal matrix algorithm. Let , then has the -distribution indicated in the proposition. We then conjugate by a unitary matrix that preserves the final basis vector , and maps to . Then we have

where is conjugate to . Now we make the crucial observation: because is distributed according to GUE (which is a unitarily invariant ensemble), and is a unitary matrix independent of , is also distributed according to GUE, and remains independent of both and .

We continue this process, expanding as

Applying a further unitary conjugation that fixes but maps to , we may replace by while transforming to another GUE matrix independent of . Iterating this process, we eventually obtain a coupling of to by unitary conjugations, and the claim follows.

The determinant of a tridiagonal matrix is not quite as simple as the determinant of a triangular matrix (in which it is simply the product of the diagonal entries), but it is pretty close: the determinant of the above matrix is given by solving the recursion

with and . Thus, instead of the product of a sequence of independent scalar distributions as in the gaussian matrix case, the determinant of GUE ends up being controlled by the product of a sequence of independent matrices whose entries are given by gaussians and distributions. In this case, one cannot immediately take logarithms and hope to get something for which the martingale central limit theorem can be applied, but some *ad hoc* manipulation of these matrix products eventually does make this strategy work. (Roughly speaking, one has to work with the logarithm of the Frobenius norm of the matrix first.)
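To make the recursion concrete, here is a minimal sketch of the standard three-term recursion for the determinant of a symmetric tridiagonal matrix, checked against a direct Leibniz expansion. It is assumed (not confirmed by the text) that this is the recursion intended: with diagonal entries `diag[0..n-1]` and off-diagonal entries `off[0..n-2]`, the leading principal minors satisfy $D_i = a_i D_{i-1} - b_{i-1}^2 D_{i-2}$ with $D_0 = 1$ and $D_{-1} = 0$. The names `diag`, `off` and both function names are hypothetical.

```python
from itertools import permutations

def tridiagonal_det(diag, off):
    """Determinant of the symmetric tridiagonal matrix with diagonal `diag`
    and off-diagonal `off`, via the three-term recursion on leading minors."""
    prev2, prev1 = 0, 1  # D_{-1} = 0 and D_0 = 1
    for i, a in enumerate(diag):
        b_squared = off[i - 1] ** 2 if i > 0 else 0
        prev2, prev1 = prev1, a * prev1 - b_squared * prev2
    return prev1

def dense_det(diag, off):
    """The same determinant via the Leibniz expansion of the full matrix."""
    n = len(diag)
    m = [[0] * n for _ in range(n)]
    for i in range(n):
        m[i][i] = diag[i]
    for i in range(n - 1):
        m[i][i + 1] = off[i]
        m[i + 1][i] = off[i]
    total = 0
    for perm in permutations(range(n)):
        inversions = sum(
            1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j]
        )
        term = -1 if inversions % 2 else 1
        for i in range(n):
            term *= m[i][perm[i]]
        total += term
    return total
```

Each step of the recursion sends the pair $(D_{i-1}, D_{i-2})$ to $(D_i, D_{i-1})$ by a linear map depending only on one diagonal and one off-diagonal entry, which is one way to see the product of $2 \times 2$ matrices mentioned above.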

Let $n$ be a natural number, and let $\sigma$ be a permutation of $\{1,\ldots,n\}$, drawn uniformly at random. Using the cycle decomposition, one can view $\sigma$ as the disjoint union of cycles of varying lengths (from $1$ to $n$). For each $1 \leq k \leq n$, let $C_k$ denote the number of cycles of $\sigma$ of length $k$; thus the $C_k$ are natural number-valued random variables with the constraint

$$\sum_{k=1}^n k C_k = n.$$

We let $C := \sum_{k=1}^n C_k$ be the number of cycles (of arbitrary length); this is another natural number-valued random variable, of size at most $n$.

I recently had need to understand the distribution of the random variables $C_k$ and $C$. As it turns out this is an extremely classical subject, but as an exercise I worked out what I needed using a quite tedious computation involving generating functions that I will not reproduce here. But the resulting identities I got were so nice, that they strongly suggested the existence of elementary bijective (or “double counting”) proofs, in which the identities are proven with a minimum of computation, by interpreting each side of the identity as the cardinality (or probability) of the same quantity (or event), viewed in two different ways. I then found these bijective proofs, which I found to be rather cute; again, these are all extremely classical (closely related, for instance, to Stirling numbers of the first kind), but I thought some readers might be interested in trying to find these proofs themselves as an exercise (and I also wanted a place to write the identities down so I could retrieve them later), so I have listed the identities I found below.

- For any $1 \leq k \leq n$, one has ${\bf E} C_k = \frac{1}{k}$. In particular, ${\bf E} C = 1 + \frac{1}{2} + \ldots + \frac{1}{n} = \log n + O(1)$.
- More generally, for any $1 \leq k \leq n$ and $j \geq 1$ with $jk \leq n$, one has ${\bf E} \binom{C_k}{j} = \frac{1}{k^j j!}$.
- More generally still, for any $1 \leq k_1 < \ldots < k_r \leq n$ and $j_1,\ldots,j_r \geq 1$ with $\sum_{i=1}^r j_i k_i \leq n$, one has ${\bf E} \prod_{i=1}^r \binom{C_{k_i}}{j_i} = \prod_{i=1}^r \frac{1}{k_i^{j_i} j_i!}$.
- In particular, we have *Cauchy’s formula*: if $\sum_{k=1}^n k a_k = n$, then the probability that $C_k = a_k$ for all $k=1,\ldots,n$ is precisely $\prod_{k=1}^n \frac{1}{k^{a_k} a_k!}$. (This in particular leads to a reasonably tractable formula for the joint generating function of the $C_k$, which is what I initially used to compute everything that I needed, before finding the slicker bijective proofs.)
- For fixed $k$, $C_k$ converges in distribution as $n \to \infty$ to the Poisson distribution of intensity $\frac{1}{k}$.
- More generally, for fixed $r$, $C_1,\ldots,C_r$ converge in joint distribution to independent Poisson distributions of intensity $1, \frac{1}{2}, \ldots, \frac{1}{r}$ respectively. (A more precise version of this claim can be found in this paper of Arratia and Tavaré.)
- One has ${\bf E} 2^C = n+1$.
- More generally, one has ${\bf E} m^C = \binom{n+m-1}{n}$ for all natural numbers $m$.
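Several of the identities in this list can be confirmed exactly for small $n$ by enumerating all permutations. The sketch below is only a brute-force check, not one of the bijective proofs, and it assumes the classical forms of the identities: the expected number of cycles of length $k$ is $1/k$, the expected total number of cycles is the harmonic number $1 + \frac{1}{2} + \ldots + \frac{1}{n}$, and the expectation of $m^C$ is $\binom{n+m-1}{n}$. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import comb

def cycle_lengths(perm):
    """Lengths of the cycles of a permutation of {0, ..., n-1} given as a tuple."""
    seen = [False] * len(perm)
    lengths = []
    for start in range(len(perm)):
        if not seen[start]:
            length = 0
            j = start
            while not seen[j]:
                seen[j] = True
                j = perm[j]
                length += 1
            lengths.append(length)
    return lengths

def cycle_statistics(n, m_values=(2, 3)):
    """Exact E[C_k] for each k, E[C], and E[m^C] for each m in m_values,
    averaging over all n! permutations."""
    count_by_length = {k: 0 for k in range(1, n + 1)}
    power_sums = {m: 0 for m in m_values}
    total_cycles = 0
    total = 0
    for perm in permutations(range(n)):
        lengths = cycle_lengths(perm)
        for k in lengths:
            count_by_length[k] += 1
        total_cycles += len(lengths)
        for m in m_values:
            power_sums[m] += m ** len(lengths)
        total += 1
    expected_ck = {k: Fraction(c, total) for k, c in count_by_length.items()}
    expected_c = Fraction(total_cycles, total)
    expected_powers = {m: Fraction(s, total) for m, s in power_sums.items()}
    return expected_ck, expected_c, expected_powers
```

Using `Fraction` keeps every average exact, so an identity such as the expected value of $2^C$ being $n+1$ is verified as an equality of rationals rather than to within rounding.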

Van Vu and I have just uploaded to the arXiv our paper “Random matrices: Universality of eigenvectors”, submitted to Random Matrices: Theory and Applications. This paper concerns an extension of our four moment theorem for eigenvalues. Roughly speaking, that four moment theorem asserts (under mild decay conditions on the coefficients of the random matrix) that the fine-scale structure of individual eigenvalues of a Wigner random matrix depends only on the first four moments of each of the entries.

In this paper, we extend this result from eigenvalues to eigen*vectors*, and specifically to the coefficients of, say, the eigenvector of a Wigner random matrix . Roughly speaking, the main result is that the distribution of these coefficients also only depends on the first four moments of each of the entries. In particular, as the coefficients of eigenvectors of invariant ensembles such as GOE or GUE are known to be asymptotically gaussian real (in the GOE case) or gaussian complex (in the GUE case), the same asymptotic automatically holds for Wigner matrices whose coefficients match GOE or GUE to fourth order.

(A technical point here: strictly speaking, the eigenvectors are only determined up to a phase, even when the eigenvalues are simple. So, to phrase the question properly, one has to perform some sort of normalisation, for instance by working with the coefficients of the spectral projection operators instead of the eigenvectors, or rotating each eigenvector by a random phase, or by fixing the first component of each eigenvector to be positive real. This is a fairly minor technical issue here, though, and will not be discussed further.)

This theorem strengthens a four moment theorem for eigenvectors recently established by Knowles and Yin (by a somewhat different method), in that the hypotheses are weaker (no level repulsion assumption is required, and the matrix entries only need to obey a finite moment condition rather than an exponential decay condition), and the conclusion is slightly stronger (less regularity is needed on the test function, and one can handle the joint distribution of polynomially many coefficients, rather than boundedly many coefficients). On the other hand, the Knowles-Yin paper can also handle generalised Wigner ensembles in which the variances of the entries are allowed to fluctuate somewhat.

The method used here is a variation of that in our original paper (incorporating the subsequent improvements to extend the four moment theorem from the bulk to the edge, and to replace exponential decay by a finite moment condition). That method was ultimately based on the observation that if one swapped a single entry (and its adjoint) in a Wigner random matrix, then an individual eigenvalue would not fluctuate much as a consequence (as long as one had already truncated away the event of an unexpectedly small eigenvalue gap). The same analysis shows that the projection matrices obey the same stability property.

As an application of the eigenvalue four moment theorem, we establish a four moment theorem for the coefficients of resolvent matrices , even when is on the real axis (though in that case we need to make a level repulsion hypothesis, which has been already verified in many important special cases and is likely to be true in general). This improves on an earlier four moment theorem for resolvents of Erdos, Yau, and Yin, which required to stay some distance away from the real axis (specifically, that for some small ).

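Let $G$ be an abelian countable discrete group. A measure-preserving $G$-system $(X, {\mathcal X}, \mu, (T_g)_{g \in G})$ (or *$G$-system* for short) is a probability space $(X, {\mathcal X}, \mu)$, equipped with a measure-preserving action $T_g: X \to X$ of the group $G$, thus

$$\mu( T_g(E) ) = \mu(E)$$

for all $E \in {\mathcal X}$ and $g \in G$, and

$$T_g T_h = T_{g+h}$$

for all $g, h \in G$, with $T_0$ equal to the identity map. Classically, ergodic theory has focused on the cyclic case $G = {\bf Z}$ (in which the $T_g$ are iterates of a single map $T = T_1$, with elements of $G$ being interpreted as a time parameter), but one can certainly consider actions of other groups also (including continuous or non-abelian groups).

A $G$-system is said to be *strongly $2$-mixing*, or strongly mixing for short, if one has

$$\lim_{g \to \infty} \mu( A \cap T_g B ) = \mu(A) \mu(B)$$

for all $A, B \in {\mathcal X}$, where the convergence is with respect to the one-point compactification of $G$ (thus, for every $\varepsilon > 0$, there exists a compact (hence finite) subset $K$ of $G$ such that $|\mu(A \cap T_g B) - \mu(A) \mu(B)| \leq \varepsilon$ for all $g \not\in K$).

Similarly, we say that a $G$-system is *strongly $3$-mixing* if one has

$$\lim_{g, h, h-g \to \infty} \mu( A \cap T_g B \cap T_h C ) = \mu(A) \mu(B) \mu(C)$$

for all $A, B, C \in {\mathcal X}$, thus for every $\varepsilon > 0$, there exists a finite subset $K$ of $G$ such that

$$|\mu( A \cap T_g B \cap T_h C ) - \mu(A) \mu(B) \mu(C)| \leq \varepsilon$$

whenever $g, h, h-g$ all lie outside $K$.

It is obvious that a strongly $3$-mixing system is necessarily strongly $2$-mixing. In the case of ${\bf Z}$-systems, it has been an open problem for some time, due to Rohlin, whether the converse is true:

Problem 1 (Rohlin’s problem) Is every strongly mixing ${\bf Z}$-system necessarily strongly $3$-mixing?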

This is a surprisingly difficult problem. In the positive direction, a routine application of the Cauchy-Schwarz inequality (via van der Corput’s inequality) shows that every strongly mixing system is *weakly -mixing*, which roughly speaking means that converges to for *most* . Indeed, every weakly mixing system is in fact weakly mixing of all orders; see for instance this blog post of Carlos Matheus, or these lecture notes of myself. So the problem is to exclude the possibility of correlation between , , and for a small but non-trivial number of pairs .

It is also known that the answer to Rohlin’s problem is affirmative for rank one transformations (a result of Kalikow) and for shifts with purely singular continuous spectrum (a result of Host; note that strongly mixing systems cannot have any non-trivial point spectrum). Indeed, any counterexample to the problem, if it exists, is likely to be highly pathological.

In the other direction, Rohlin’s problem is known to have a negative answer for -systems, by a well-known counterexample of Ledrappier which can be described as follows. One can view a -system as being essentially equivalent to a stationary process of random variables in some range space indexed by , with being with the obvious shift map

In Ledrappier’s example, the take values in the finite field of two elements, and are selected uniformly at random subject to the “Pascal’s triangle” linear constraints

A routine application of the Kolmogorov extension theorem allows one to build such a process. The point is that due to the properties of Pascal’s triangle modulo (known as Sierpinski’s triangle), one has

for all powers of two . This is enough to destroy strong -mixing, because it shows a strong correlation between , , and for arbitrarily large and randomly chosen . On the other hand, one can still show that and are asymptotically uncorrelated for large , giving strong -mixing. Unfortunately, there are significant obstructions to converting Ledrappier’s example from a -system to a -system, as pointed out by de la Rue.
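The Pascal's triangle mechanism behind Ledrappier's example can be seen on a finite patch of the array. The sketch below is an illustration under assumptions not spelled out in the text: the constraint is taken in the form $x_{i,j} + x_{i+1,j} + x_{i,j+1} = 0$ over the field of two elements, so that each row is determined by the previous one, and the consequence being tested is $x_{i,j} + x_{i+m,j} + x_{i,j+m} = 0$ when $m$ is a power of two. The function names and the sample first row are hypothetical.

```python
def pascal_rows(first_row):
    """Rows of a finite triangular patch over F_2: rows[j][i] plays the role of
    x_{i,j}, and each new row is forced by rows[j+1][i] = rows[j][i] + rows[j][i+1] (mod 2)."""
    rows = [list(first_row)]
    while len(rows[-1]) > 1:
        prev = rows[-1]
        rows.append([(prev[i] + prev[i + 1]) % 2 for i in range(len(prev) - 1)])
    return rows

def constraint_holds(rows, m):
    """True if x_{i,j} + x_{i+m,j} + x_{i,j+m} = 0 (mod 2) at every position
    of the patch where all three entries are defined."""
    for j in range(len(rows) - m):
        for i in range(len(rows[j + m])):
            if (rows[j][i] + rows[j][i + m] + rows[j + m][i]) % 2 != 0:
                return False
    return True

# A hypothetical first row; any 0/1 row satisfies the power-of-two relations.
sample_rows = pascal_rows([0, 1, 0, 0, 0, 0, 0, 0])
```

For this patch `constraint_holds(sample_rows, m)` is true for `m = 1, 2, 4` but false for `m = 3`, reflecting the fact that $(1+S)^m = 1 + S^m$ modulo $2$ (with $S$ the shift along a row) exactly when $m$ is a power of two.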

In this post, I would like to record a “finite field” variant of Ledrappier’s construction, in which is replaced by the function field ring , which is a “dyadic” (or more precisely, “triadic”) model for the integers (cf. this earlier blog post of mine). In other words:

Theorem 2 There exists a ${\bf F}_3[t]$-system that is strongly $2$-mixing but not strongly $3$-mixing.

The idea is much the same as that of Ledrappier; one builds a stationary -process in which are chosen uniformly at random subject to the constraints

for all and all . Again, this system is manifestly not strongly -mixing, but can be shown to be strongly -mixing; I give details below the fold.

As I discussed in this previous post, in many cases the dyadic model serves as a good guide for the non-dyadic model. However, in this case there is a curious rigidity phenomenon that seems to prevent Ledrappier-type examples from being transferable to the one-dimensional non-dyadic setting; once one restores the Archimedean nature of the underlying group, the constraints (1) not only reinforce each other strongly, but also force so much linearity on the system that one loses the strong mixing property.

Van Vu and I have just uploaded to the arXiv our paper “The Wigner-Dyson-Mehta bulk universality conjecture for Wigner matrices”, submitted to the Proceedings of the National Academy of Sciences. This short note concerns the convergence of the $k$-point correlation functions of Wigner matrices in the bulk to the Dyson $k$-point functions, a statement conjectured by Wigner, Dyson, and Mehta. Thanks to the results of Erdos, Peche, Ramirez, Schlein, Vu, Yau, and myself, this conjecture has now been established for all Wigner matrices (assuming a finite moment condition on the entries), but only if one uses a quite weak notion of convergence, namely *averaged vague convergence* in which one averages in the energy parameter . The main purpose of this note is to observe that by combining together existing results in the literature, one can improve the convergence to vague convergence (which is the natural notion of convergence in the discrete setting); and furthermore, if one assumes some regularity and decay conditions on the coefficient distribution, one can improve the convergence further to *local convergence*.

More precisely, let be an *Wigner matrix* – a random Hermitian matrix whose off-diagonal elements for are iid with mean zero and variance (and whose diagonal elements also obey similar hypotheses, which we omit here). For simplicity, we also assume that the real and imaginary parts of are also iid (as is the case for instance for the Gaussian Unitary Ensemble (GUE)). The eigenvalues of such a matrix are known to be asymptotically distributed according to the Wigner semicircular distribution , where

In particular, this suggests that at any energy level in the bulk of the spectrum, the average eigenvalue spacing should be about . It is then natural to introduce the normalised -point correlation function

for any distinct reals and , where is the event that there is an eigenvalue in each of the intervals for each . (This definition is valid when the Wigner ensemble is continuous; for discrete ensembles, one can define instead in a distributional sense.)

The *Wigner-Dyson-Mehta conjecture* asserts that converges (in various senses) as to the Dyson -point function

where is the Dyson sine kernel. This conjecture was verified first for the GUE (with a quite strong notion of convergence, namely local uniform convergence) by Dyson, using an explicit formula for in the GUE case due to Gaudin and Mehta. Later results of Johansson, Erdos-Ramirez-Schlein-Yau, Erdos-Peche-Ramirez-Schlein-Yau, and Vu and myself, extended these results to increasingly wider ranges of Wigner matrices, but in the context of either *weak* convergence (which means that

for any , compactly supported function ), or the slightly weaker notion of *vague convergence* (which is the same as weak convergence, except that the function is also required to be continuous).
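For orientation, the limiting object in the conjecture is easy to evaluate numerically. The sketch below assumes the standard form of the Dyson $k$-point function, namely the determinant of the $k \times k$ matrix with entries $K(x_i, x_j)$, where $K(x,y) = \frac{\sin(\pi(x-y))}{\pi(x-y)}$ is the sine kernel (equal to $1$ on the diagonal); this formula is not reproduced in the text above, and the function names are hypothetical.

```python
import math

def sine_kernel(x, y):
    """Dyson sine kernel sin(pi (x - y)) / (pi (x - y)), equal to 1 when x == y."""
    d = math.pi * (x - y)
    return 1.0 if d == 0.0 else math.sin(d) / d

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def dyson_k_point(points):
    """Determinant of the sine-kernel matrix at the given (normalised) points."""
    kernel_matrix = [[sine_kernel(x, y) for y in points] for x in points]
    return determinant(kernel_matrix)
```

For two points at normalised distance $d$ this reduces to $1 - \left(\frac{\sin \pi d}{\pi d}\right)^2$, which vanishes as $d \to 0$ (eigenvalue repulsion) and equals $1$ whenever $d$ is a non-zero integer.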

In a joint paper of Erdos, Ramirez, Schlein, Vu, Yau, and myself, we established the Wigner-Dyson-Mehta conjecture for all Wigner matrices (assuming only an exponential decay condition on the entries), but using a quite weak notion of convergence, namely *averaged vague convergence*, which allows for averaging in the energy parameter. Specifically, we showed that

Subsequently, Erdos, Schlein, and Yau introduced the powerful *local relaxation flow method*, which achieved a simpler proof of the same result which also generalised to other ensembles beyond the Wigner case. However, for technical reasons, this method was restricted to establishing averaged vague convergence only.

In the current paper, we show that by combining the argument of Erdos, Ramirez, Schlein, Vu, Yau, and myself with some more recent technical results, namely the relaxation of the exponential decay condition in the four moment theorem to a finite moment condition (established by Vu and myself) and a strong eigenvalue localisation bound of Erdos, Yau, and Yin, one can upgrade the averaged vague convergence to vague convergence, and handle all Wigner matrices that assume a finite moment condition. Vague convergence is the most natural notion of convergence for discrete random matrix ensembles; for such ensembles, the correlation function is a discrete measure, and so one does not expect convergence to a continuous limit in any stronger sense than the vague sense. Also, by carefully inspecting the earlier argument of Erdos, Peche, Ramirez, Schlein, and Yau, we were able to establish convergence in the stronger local sense once one assumed some regularity and positivity condition on the underlying coefficient distribution. These are somewhat modest and technical improvements over previous work on the Wigner-Dyson-Mehta conjecture, but they help to clarify and organise the profusion of results in this area, which are now reaching a fairly definitive form.

It may well be possible to go beyond local convergence in the case of smooth ensembles, for instance establishing local uniform convergence; this was recently accomplished in the case by Maltsev and Schlein. Indeed one may optimistically expect to even have convergence in the local smooth topology, which would basically be the strongest convergence one could hope for.
