You are currently browsing the tag archive for the ‘eigenvalues’ tag.

Perhaps the most important structural result about general large dense graphs is the Szemerédi regularity lemma. Here is a standard formulation of that lemma:

Lemma 1 (Szemerédi regularity lemma) Let {G = (V,E)} be a graph on {n} vertices, and let {\epsilon > 0}. Then there exists a partition {V = V_1 \cup \ldots \cup V_M} for some {M \leq M(\epsilon)} with the property that for all but at most {\epsilon M^2} of the pairs {1 \leq i \leq j \leq M}, the pair {V_i, V_j} is {\epsilon}-regular in the sense that

\displaystyle  | d( A, B ) - d( V_i, V_j ) | \leq \epsilon

whenever {A \subset V_i, B \subset V_j} are such that {|A| \geq \epsilon |V_i|} and {|B| \geq \epsilon |V_j|}, and {d(A,B) := |\{ (a,b) \in A \times B: \{a,b\} \in E \}|/|A| |B|} is the edge density between {A} and {B}. Furthermore, the partition is equitable in the sense that {||V_i| - |V_j|| \leq 1} for all {1 \leq i \leq j \leq M}.

There are many proofs of this lemma, which is actually not that difficult to establish; see for instance these previous blog posts for some examples. In this post I would like to record one further proof, based on the spectral decomposition of the adjacency matrix of {G}, which is essentially due to Frieze and Kannan. (Strictly speaking, Frieze and Kannan used a variant of this argument to establish a weaker form of the regularity lemma, but it is not difficult to modify the Frieze-Kannan argument to obtain the usual form of the regularity lemma instead. Some closely related spectral regularity lemmas were also developed by Szegedy.) I found recently (while speaking at the Abel conference in honour of this year’s laureate, Endre Szemerédi) that this particular argument is not as widely known among graph theory experts as I had thought, so I thought I would record it here.

For reasons of exposition, it is convenient to first establish a slightly weaker form of the lemma, in which one drops the hypothesis of equitability (but then has to weight the cells {V_i} by their magnitude when counting bad pairs):

Lemma 2 (Szemerédi regularity lemma, weakened variant) . Let {G = (V,E)} be a graph on {n} vertices, and let {\epsilon > 0}. Then there exists a partition {V = V_1 \cup \ldots \cup V_M} for some {M \leq M(\epsilon)} with the property that for all pairs {(i,j) \in \{1,\ldots,M\}^2} outside of an exceptional set {\Sigma}, one has

\displaystyle  | E(A,B) - d_{ij} |A| |B| | \ll \epsilon |V_i| |V_j| \ \ \ \ \ (1)

whenever {A \subset V_i, B \subset V_j}, for some real number {d_{ij}}, where {E(A,B) := |\{ (a,b) \in A \times B: \{a,b\} \in E \}|} is the number of edges between {A} and {B}. Furthermore, we have

\displaystyle  \sum_{(i,j) \in \Sigma} |V_i| |V_j| \ll \epsilon |V|^2. \ \ \ \ \ (2)

Let us now prove Lemma 2. We enumerate {V} (after relabeling) as {V = \{1,\ldots,n\}}. The adjacency matrix {T} of the graph {G} is then a self-adjoint {n \times n} matrix, and thus admits an eigenvalue decomposition

\displaystyle  T = \sum_{i=1}^n \lambda_i u_i^* u_i

for some orthonormal basis {u_1,\ldots,u_n} of {{\bf C}^n} and some eigenvalues {\lambda_1,\ldots,\lambda_n \in {\bf R}}, which we arrange in decreasing order of magnitude:

\displaystyle  |\lambda_1| \geq \ldots \geq |\lambda_n|.

We can compute the trace of {T^2} as

\displaystyle  \hbox{tr}(T^2) = \sum_{i=1}^n |\lambda_i|^2.

But we also have {\hbox{tr}(T^2) = 2|E| \leq n^2}, so

\displaystyle  \sum_{i=1}^n |\lambda_i|^2 \leq n^2. \ \ \ \ \ (3)

Among other things, this implies that

\displaystyle  |\lambda_i| \leq \frac{n}{\sqrt{i}} \ \ \ \ \ (4)

for all {i \geq 1}.

Let {F: {\bf N} \rightarrow {\bf N}} be a function (depending on {\epsilon}) to be chosen later, with {F(i) \geq i} for all {i}. Applying (3) and the pigeonhole principle (or the finite convergence principle, see this blog post), we can find {J \leq C(F,\epsilon)} such that

\displaystyle  \sum_{J \leq i < F(J)} |\lambda_i|^2 \leq \epsilon^3 n^2.

(Indeed, the bound on {J} is basically {F} iterated {1/\epsilon^3} times.) We can now split

\displaystyle  T = T_1 + T_2 + T_3, \ \ \ \ \ (5)

where {T_1} is the “structured” component

\displaystyle  T_1 := \sum_{i < J} \lambda_i u_i^* u_i, \ \ \ \ \ (6)

{T_2} is the “small” component

\displaystyle  T_2 := \sum_{J \leq i < F(J)} \lambda_i u_i^* u_i, \ \ \ \ \ (7)

and {T_3} is the “pseudorandom” component

\displaystyle  T_3 := \sum_{i > F(J)} \lambda_i u_i^* u_i. \ \ \ \ \ (8)

We now design a vertex partition to make {T_1} approximately constant on most cells. For each {i < J}, we partition {V} into {O_{J,\epsilon}(1)} cells on which {u_i} (viewed as a function from {V} to {{\bf C}}) only fluctuates by {O(\epsilon n^{-1/2} /J)}, plus an exceptional cell of size {O( \frac{\epsilon}{J} |V|)} coming from the values where {|u_i|} is excessively large (larger than {\sqrt{\frac{J}{\epsilon}} n^{-1/2}}). Combining all these partitions together, we can write {V = V_1 \cup \ldots \cup V_{M-1} \cup V_M} for some {M = O_{J,\epsilon}(1)}, where {V_M} has cardinality at most {\epsilon |V|}, and for all {1 \leq i \leq M-1}, the eigenfunctions {u_1,\ldots,u_{J-1}} all fluctuate by at most {O(\epsilon/J)}. In particular, if {1 \leq i,j \leq M-1}, then (by (4) and (6)) the entries of {T_1} fluctuate by at most {O(\epsilon)} on each block {V_i \times V_j}. If we let {d_{ij}} be the mean value of these entries on {V_i \times V_j}, we thus have

\displaystyle  1_B^* T_1 1_A = d_{ij} |A| |B| + O( \epsilon |V_i| |V_j| ) \ \ \ \ \ (9)

for any {1 \leq i,j \leq M-1} and {A \subset V_i, B \subset V_j}, where we view the indicator functions {1_A, 1_B} as column vectors of dimension {n}.

Next, we observe from (3) and (7) that {\hbox{tr} T_2^2 \leq \epsilon^3 n^2}. If we let {x_{ab}} be the coefficients of {T_2}, we thus have

\displaystyle  \sum_{a,b \in V} |x_{ab}|^2 \leq \epsilon^3 n^2

and hence by Markov’s inequality we have

\displaystyle  \sum_{a \in V_i} \sum_{b \in V_j} |x_{ab}|^2 \leq \epsilon^2 |V_i| |V_j| \ \ \ \ \ (10)

for all pairs {(i,j) \in \{1,\ldots,M-1\}^2} outside of an exceptional set {\Sigma_1} with

\displaystyle  \sum_{(i,j) \in \Sigma_1} |V_i| |V_j| \leq \epsilon |V|^2.

If {(i,j) \in \{1,\ldots,M-1\}^2} avoids {\Sigma_1}, we thus have

\displaystyle  1_B^* T_2 1_A = O( \epsilon |V_i| |V_j| ) \ \ \ \ \ (11)

for any {A \subset V_i, B \subset V_j}, by (10) and the Cauchy-Schwarz inequality.

Finally, to control {T_3} we see from (4) and (8) that {T_3} has an operator norm of at most {n/\sqrt{F(J)}}. In particular, we have from the Cauchy-Schwarz inequality that

\displaystyle  1_B^* T_3 1_A = O( n^2 / \sqrt{F(J)} ) \ \ \ \ \ (12)

for any {A, B \subset V}.

Let {\Sigma} be the set of all pairs {(i,j) \in \{1,\ldots,M\}^2} where either {(i,j) \in \Sigma_1}, {i = M}, {j=M}, or

\displaystyle  \min(|V_i|, |V_j|) \leq \frac{\epsilon}{M} n.

One easily verifies that (2) holds. If {(i,j) \in \{1,\ldots,M\}^2} is not in {\Sigma}, then by summing (9), (11), (12) and using (5), we see that

\displaystyle  1_B^* T 1_A = d_{ij} |A| |B| + O( \epsilon |V_i| |V_j| ) + O( n^2 / \sqrt{F(J)} ) \ \ \ \ \ (13)

for all {A \subset V_i, B \subset V_j}. The left-hand side is just {E(A,B)}. As {(i,j) \not \in \Sigma}, we have

\displaystyle  |V_i|, |V_j| > \frac{\epsilon}{M} n

and so (since {M = O_{J,\epsilon}(1)})

\displaystyle  n^2 / \sqrt{F(J)} \ll_{J,\epsilon} |V_i| |V_j| / \sqrt{F(J)}.

If we let {F} be a sufficiently rapidly growing function of {J} that depends on {\epsilon}, the second error term in (13) can be absorbed in the first, and (1) follows. This concludes the proof of Lemma 2.

To prove Lemma 1, one argues similarly (after modifying {\epsilon} as necessary), except that the initial partition {V_1,\ldots,V_M} of {V} constructed above needs to be subdivided further into equitable components (of size {\epsilon |V|/M+O(1)}), plus some remainder sets which can be aggregated into an exceptional component of size {O( \epsilon |V| )} (and which can then be redistributed amongst the other components to arrive at a truly equitable partition). We omit the details.

Remark 1 It is easy to verify that {F} needs to be growing exponentially in {J} in order for the above argument to work, which leads to tower-exponential bounds in the number of cells {M} in the partition. It was shown by Gowers that a tower-exponential bound is actually necessary here. By varying {F}, one basically obtains the strong regularity lemma first established by Alon, Fischer, Krivelevich, and Szegedy; in the opposite direction, setting {F(J) := J} essentially gives the weak regularity lemma of Frieze and Kannan.

Remark 2 If we specialise to a Cayley graph, in which {V = (V,+)} is a finite abelian group and {E = \{ (a,b): a-b \in A \}} for some (symmetric) subset {A} of {V}, then the eigenvectors are characters, and one essentially recovers the arithmetic regularity lemma of Green, in which the vertex partition classes {V_i} are given by Bohr sets (and one can then place additional regularity properties on these Bohr sets with some additional arguments). The components {T_1, T_2, T_3} of {T}, representing high, medium, and low eigenvalues of {T}, then become a decomposition associated to high, medium, and low Fourier coefficients of {A}.

Remark 3 The use of spectral theory here is parallel to the use of Fourier analysis to establish results such as Roth’s theorem on arithmetic progressions of length three. In analogy with this, one could view hypergraph regularity as being a sort of “higher order spectral theory”, although this spectral perspective is not as convenient as it is in the graph case.

Van Vu and I have just uploaded to the arXiv our paper Random matrices: Sharp concentration of eigenvalues, submitted to the Electronic Journal of Probability. As with many of our previous papers, this paper is concerned with the distribution of the eigenvalues {\lambda_1(M_n) \leq \ldots \leq \lambda_n(M_n)} of a random Wigner matrix {M_n} (such as a matrix drawn from the Gaussian Unitary Ensemble (GUE) or Gaussian Orthogonal Ensemble (GOE)). To simplify the discussion we shall mostly restrict attention to the bulk of the spectrum, i.e. to eigenvalues {\lambda_i(M_n)} with {\delta n \leq i \leq (1-\delta) i n} for some fixed {\delta>0}, although analogues of most of the results below have also been obtained at the edge of the spectrum.

If we normalise the entries of the matrix {M_n} to have mean zero and variance {1/n}, then in the asymptotic limit {n \rightarrow \infty}, we have the Wigner semicircle law, which asserts that the eigenvalues are asymptotically distributed according to the semicircular distribution {\rho_{sc}(x)\ dx}, where

\displaystyle  \rho_{sc}(x) := \frac{1}{2\pi} (4-x^2)_+^{1/2}.

An essentially equivalent way of saying this is that for large {n}, we expect the {i^{th}} eigenvalue {\lambda_i(M_n)} of {M_n} to stay close to the classical location {\gamma_i \in [-2,2]}, defined by the formula

\displaystyle  \int_{-2}^{\gamma_i} \rho_{sc}(x)\ dx = \frac{i}{n}.

In particular, from the Wigner semicircle law it can be shown that asymptotically almost surely, one has

\displaystyle  \lambda_i(M_n) = \gamma_i + o(1) \ \ \ \ \ (1)

for all {1 \leq i \leq n}.

In the modern study of the spectrum of Wigner matrices (and in particular as a key tool in establishing universality results), it has become of interest to improve the error term in (1) as much as possible. A typical early result in this direction was by Bai, who used the Stieltjes transform method to obtain polynomial convergence rates of the shape {O(n^{-c})} for some absolute constant {c>0}; see also the subsequent papers of Alon-Krivelevich-Vu and of of Meckes, who were able to obtain such convergence rates (with exponentially high probability) by using concentration of measure tools, such as Talagrand’s inequality. On the other hand, in the case of the GUE ensemble it is known (by this paper of Gustavsson) that {\lambda_i(M_n)} has variance comparable to {\frac{\log n}{n^2}} in the bulk, so that the optimal error term in (1) should be about {O(\log^{1/2} n/n)}. (One may think that if one wanted bounds on (1) that were uniform in {i}, one would need to enlarge the error term further, but this does not appear to be the case, due to strong correlations between the {\lambda_i}; note for instance this recent result of Ben Arous and Bourgarde that the largest gap between eigenvalues in the bulk is typically of order {O(\log^{1/2} n/n)}.)

A significant advance in this direction was achieved by Erdos, Schlein, and Yau in a series of papers where they used a combination of Stieltjes transform and concentration of measure methods to obtain local semicircle laws which showed, among other things, that one had asymptotics of the form

\displaystyle  N(I) = (1+o(1)) \int_I \rho_{sc}(x)\ dx

with exponentially high probability for intervals {I} in the bulk that were as short as {n^{-1+\epsilon}} for some {\epsilon>0}, where {N(I)} is the number of eigenvalues. These asymptotics are consistent with a good error term in (1), and are already sufficient for many applications, but do not quite imply a strong concentration result for individual eigenvalues {\lambda_i} (basically because they do not preclude long-range or “secular” shifts in the spectrum that involve large blocks of eigenvalues at mesoscopic scales). Nevertheless, this was rectified in a subsequent paper of Erdos, Yau, and Yin, which roughly speaking obtained a bound of the form

\displaystyle  \lambda_i(M_n) = \gamma_i + O( \frac{\log^{O(\log\log n)} n}{n} )

in the bulk with exponentially high probability, for Wigner matrices obeying some exponential decay conditions on the entries. This was achieved by a rather delicate high moment calculation, in which the contribution of the diagonal entries of the resolvent (whose average forms the Stieltjes transform) was shown to mostly cancel each other out.

As the GUE computations show, this concentration result is sharp up to the quasilogarithmic factor {\log^{O(\log\log n)} n}. The main result of this paper is to improve the concentration result to one more in line with the GUE case, namely

\displaystyle  \lambda_i(M_n) = \gamma_i + O( \frac{\log^{O(1)} n}{n} )

with exponentially high probability (see the paper for a more precise statement of results). The one catch is that an additional hypothesis is required, namely that the entries of the Wigner matrix have vanishing third moment. We also obtain similar results for the edge of the spectrum (but with a different scaling).

Our arguments are rather different from those of Erdos, Yau, and Yin, and thus provide an alternate approach to establishing eigenvalue concentration. The main tool is the Lindeberg exchange strategy, which is also used to prove the Four Moment Theorem (although we do not directly invoke the Four Moment Theorem in our analysis). The main novelty is that this exchange strategy is now used to establish large deviation estimates (i.e. exponentially small tail probabilities) rather than universality of the limiting distribution. Roughly speaking, the basic point is as follows. The Lindeberg exchange strategy seeks to compare a function {F(X_1,\ldots,X_n)} of many independent random variables {X_1,\ldots,X_n} with the same function {F(Y_1,\ldots,Y_n)} of a different set of random variables (which match moments with the original set of variables to some order, such as to second or fourth order) by exchanging the random variables one at a time. Typically, one tries to upper bound expressions such as

\displaystyle  {\bf E} \phi(F(X_1,\ldots,X_n)) - \phi(F(X_1,\ldots,X_{n-1},Y_n))

for various smooth test functions {\phi}, by performing a Taylor expansion in the variable being swapped and taking advantage of the matching moment hypotheses. In previous implementations of this strategy, {\phi} was a bounded test function, which allowed one to get control of the bulk of the distribution of {F(X_1,\ldots,X_n)}, and in particular in controlling probabilities such as

\displaystyle  {\bf P}( a \leq F(X_1,\ldots,X_n) \leq b )

for various thresholds {a} and {b}, but did not give good control on the tail as the error terms tended to be polynomially decaying in {n} rather than exponentially decaying. However, it turns out that one can modify the exchange strategy to deal with moments such as

\displaystyle  {\bf E} (1 + F(X_1,\ldots,X_n)^2)^k

for various moderately large {k} (e.g. of size comparable to {\log n}), obtaining results such as

\displaystyle  {\bf E} (1 + F(Y_1,\ldots,Y_n)^2)^k = (1+o(1)) {\bf E} (1 + F(X_1,\ldots,X_n)^2)^k

after performing all the relevant exchanges. As such, one can then use large deviation estimates on {F(X_1,\ldots,X_n)} to deduce large deviation estimates on {F(Y_1,\ldots,Y_n)}.

In this paper we also take advantage of a simplification, first noted by Erdos, Yau, and Yin, that Four Moment Theorems become somewhat easier to prove if one works with resolvents {(M_n-z)^{-1}} (and the closely related Stieltjes transform {s(z) := \frac{1}{n} \hbox{tr}( (M_n-z)^{-1} )}) rather than with individual eigenvalues, as the Taylor expansion of resolvents are very simple (essentially being a Neumann series). The relationship between the Stieltjes transform and the location of individual eigenvalues can be seen by taking advantage of the identity

\displaystyle  \frac{\pi}{2} - \frac{\pi}{n} N((-\infty,E)) = \int_0^\infty \hbox{Re} s(E + i \eta)\ d\eta

for any energy level {E \in {\bf R}}, which can be verified from elementary calculus. (In practice, we would truncate {\eta} near zero and near infinity to avoid some divergences, but this is a minor technicality.) As such, a concentration result for the Stieltjes transform can be used to establish an analogous concentration result for the eigenvalue counting functions {N((-\infty,E))}, which in turn can be used to deduce concentration results for individual eigenvalues {\lambda_i(M_n)} by some basic combinatorial manipulations.

Van Vu and I have just uploaded to the arXiv our paper “Random matrices: Localization of the eigenvalues and the necessity of four moments“, submitted to Probability Theory and Related Fields. This paper concerns the distribution of the eigenvalues

\displaystyle  \lambda_1(M_n) \leq \ldots \leq \lambda_n(M_n)

of a Wigner random matrix {M_n}. More specifically, we consider {n \times n} Hermitian random matrices whose entries have mean zero and variance one, with the upper-triangular portion of the matrix independent, with the diagonal elements iid, and the real and imaginary parts of the strictly upper-triangular portion of the matrix iid. For technical reasons we also assume that the distribution of the coefficients decays exponentially or better. Examples of Wigner matrices include the Gaussian Unitary Ensemble (GUE) and random symmetric complex Bernoulli matrices (which equal {\pm 1} on the diagonal, and {\pm \frac{1}{2} \pm \frac{i}{2}} off the diagonal). The Gaussian Orthogonal Ensemble (GOE) is also an example once one makes the minor change of setting the diagonal entries to have variance two instead of one.

The most fundamental theorem about the distribution of these eigenvalues is the Wigner semi-circular law, which asserts that (almost surely) one has

\displaystyle  \frac{1}{n} \sum_{i=1}^n \delta_{\lambda_i(M_n)/\sqrt{n}} \rightarrow \rho_{sc}(x)\ dx

(in the vague topology) where {\rho_{sc}(x) := \frac{1}{\pi} (4-x^2)_+^{1/2}} is the semicircular distribution. (See these lecture notes on this blog for more discusssion of this law.)

One can phrase this law in a number of equivalent ways. For instance, in the bulk region {\epsilon n \leq i \leq (1-\epsilon) n}, one almost surely has

\displaystyle  \lambda_i(M_n) = \gamma_i \sqrt{n} + o(\sqrt{n}) \ \ \ \ \ (1)

uniformly for in {i}, where the classical location {\gamma_i \in [-2,2]} of the (normalised) {i^{th}} eigenvalue {\frac{1}{\sqrt{n}} \lambda_i} is defined by the formula

\displaystyle  \int_{-2}^{\gamma_i} \rho_{sc}(x)\ dx := \frac{i}{n}.

The bound (1) also holds in the edge case (by using the operator norm bound {\| M_n\|_{op} = (2+o(1)) \sqrt{n}}, due to Bai and Yin), but for sake of exposition we shall restriction attention here only to the bulk case.

From (1) we see that the semicircular law controls the eigenvalues at the coarse scale of {\sqrt{n}}. There has been a significant amount of work in the literature in obtaining control at finer scales, and in particular at the scale of the average eigenvalue spacing, which is of the order of {\sqrt{n}/n = n^{-1/2}}. For instance, we now have a universal limit theorem for the normalised eigenvalue spacing {\sqrt{n}(\lambda_{i+1}(M_n) - \lambda_i(M_n))} in the bulk for all Wigner matrices, a result of Erdos, Ramirez, Schlein, Vu, Yau, and myself. One tool for this is the four moment theorem of Van and myself, which roughly speaking shows that the behaviour of the eigenvalues at the scale {n^{-1/2}} (and even at the slightly finer scale of {n^{-1/2-c}} for some absolute constant {c>0}) depends only on the first four moments of the matrix entries. There is also a slight variant, the three moment theorem, which asserts that the behaviour of the eigenvalues at the slightly coarser scale of {n^{-1/2+\epsilon}} depends only on the first three moments of the matrix entries.

It is natural to ask whether these moment conditions are necessary. From the result of Erdos, Ramirez, Schlein, Vu, Yau, and myself, it is known that to control the eigenvalue spacing {\lambda_{i+1}(M_n) - \lambda_i(M_n)} at the critical scale {n^{-1/2}}, no knowledge of any moments beyond the second (i.e. beyond the mean and variance) are needed. So it is natural to conjecture that the same is true for the eigenvalues themselves.

The main result of this paper is to show that this is not the case; that at the critical scale {n^{-1/2}}, the distribution of eigenvalues {\lambda_i(M_n)} is sensitive to the fourth moment, and so the hypothesis of the four moment theorem cannot be relaxed.

Heuristically, the reason for this is easy to explain. One begins with an inspection of the expected fourth moment

\displaystyle  \sum_{i=1}^n {\bf E}(\lambda_i(M_n)^4) = {\bf E} \hbox{tr} M_n^4.

A standard moment method computation shows that the right hand side is equal to

\displaystyle  2n^3 + 2a n^2 + \ldots

where {a} is the fourth moment of the real part of the off-diagonal coefficients of {M_n}. In particular, a change in the fourth moment {a} by {O(1)} leads to a change in the expression {\sum_{i=1}^n {\bf E}(\lambda_i(M_n)^4)} by {O(n^2)}. Thus, for a typical {i}, one expects {{\bf E}(\lambda_i(M_n)^4)} to shift by {O(n)}; since {\lambda_i(M_n) = O(\sqrt{n})} on the average, we thus expect {\lambda_i(M_n)} itself to shift by about {O(n^{-1/2})} by the mean-value theorem.

To make this rigorous, one needs a sufficiently strong concentration of measure result for {\lambda_i(M_n)} that keeps it close to its mean value. There are already a number of such results in the literature. For instance, Guionnet and Zeitouni showed that {\lambda_i(M_n)} was sharply concentrated around an interval of size {O(n^\epsilon)} around {\sqrt{n} \gamma_i} for any {\epsilon > 0} (in the sense that the probability that one was outside this interval was exponentially small). In one of my papers with Van, we showed that {\lambda_i(M_n)} was also weakly concentrated around an interval of size {O(n^{-1/2+\epsilon})} around {\sqrt{n} \gamma_i}, in the sense that the probability that one was outside this interval was {O(n^{-c})} for some absolute constant {c>0}. Finally, if one made an additional log-Sobolev hypothesis on the entries, it was shown by by Erdos, Yau, and Yin that the average variance of {\lambda_i(M_n)} as {i} varied from {1} to {n} was of the size of {O(n^{-c})} for some absolute {c>0}.

As it turns out, the first two concentration results are not sufficient to justify the previous heuristic argument. The Erdos-Yau-Yin argument suffices, but requires a log-Sobolev hypothesis. In our paper, we argue differently, using the three moment theorem (together with the theory of the eigenvalues of GUE, which is extremely well developed) to show that the variance of each individual {\lambda_i(M_n)} is {O(n^{-c})} (without averaging in {i}). No log-Sobolev hypothesis is required, but instead we need to assume that the third moment of the coefficients vanishes (because we want to use the three moment theorem to compare the Wigner matrix to GUE, and the coefficients of the latter have a vanishing third moment). From this we are able to make the previous arguments rigorous, and show that the mean {{\bf E} \lambda_i(M_n)} is indeed sensitive to the fourth moment of the entries at the critical scale {n^{-1/2}}.

One curious feature of the analysis is how differently the median and the mean of the eigenvalue {\lambda_i(M_n)} react to the available technology. To control the global behaviour of the eigenvalues (after averaging in {i}), it is much more convenient to use the mean, and we have very precise control on global averages of these means thanks to the moment method. But to control local behaviour, it is the median which is much better controlled. For instance, we can localise the median of {\lambda_i(M_n)} to an interval of size {O(n^{-1/2+\epsilon})}, but can only localise the mean to a much larger interval of size {O(n^{-c})}. Ultimately, this is because with our current technology there is a possible exceptional event of probability as large as {O(n^{-c})} for which all eigenvalues could deviate as far as {O(n^\epsilon)} from their expected location, instead of their typical deviation of {O(n^{-1/2})}. The reason for this is technical, coming from the fact that the four moment theorem method breaks down when two eigenvalues are very close together (less than {n^{-c}} times the average eigenvalue spacing), and so one has to cut out this event, which occurs with a probability of the shape {O(n^{-c})}. It may be possible to improve the four moment theorem proof to be less sensitive to eigenvalue near-collisions, in which case the above bounds are likely to improve.

Van Vu and I have just uploaded to the arXiv our paper “Random matrices: universality of local eigenvalue statistics“, submitted to Acta Math..  This paper concerns the eigenvalues \lambda_1(M_n) \leq \ldots \leq \lambda_n(M_n) of a Wigner matrix M_n = (\zeta_{ij})_{1 \leq i,j \leq n}, which we define to be a random Hermitian n \times n matrix whose upper-triangular entries \zeta_{ij}, 1 \leq i \leq j \leq n are independent (and whose strictly upper-triangular entries \zeta_{ij}, 1 \leq i < j \leq n are also identically distributed).  [The lower-triangular entries are of course determined from the upper-triangular ones by the Hermitian property.]  We normalise the matrices so that all the entries have mean zero and variance 1.  Basic examples of Wigner Hermitian matrices include

  1. The Gaussian Unitary Ensemble (GUE), in which the upper-triangular entries \zeta_{ij}, i<j are complex gaussian, and the diagonal entries \zeta_{ii} are real gaussians;
  2. The Gaussian Orthogonal Ensemble (GOE), in which all entries are real gaussian;
  3. The Bernoulli Ensemble, in which all entries take values \pm 1 (with equal probability of each).

We will make a further distinction into Wigner real symmetric matrices (which are Wigner matrices with real coefficients, such as GOE and the Bernoulli ensemble) and Wigner Hermitian matrices (which are Wigner matrices whose upper-triangular coefficients have real and imaginary parts iid, such as GUE).

The GUE and GOE ensembles have a rich algebraic structure (for instance, the GUE distribution is invariant under conjugation by unitary matrices, while the GOE distribution is similarly invariant under conjugation by orthogonal matrices, hence the terminology), and as a consequence their eigenvalue distribution can be computed explicitly.  For instance, the joint distribution of the eigenvalues \lambda_1(M_n),\ldots,\lambda_n(M_n) for GUE is given by the explicit formula

\displaystyle C_n \prod_{1 \leq i<j \leq n} |\lambda_i-\lambda_j|^2 \exp( - \frac{1}{2n} (\lambda_1^2+\ldots+\lambda_n^2))\ d\lambda_1 \ldots d\lambda_n (0)

for some explicitly computable constant C_n on the orthant \{ \lambda_1 \leq \ldots \leq \lambda_n\} (a result first established by Ginibre).  (A similar formula exists for GOE, but for simplicity we will just discuss GUE here.)  Using this explicit formula one can compute a wide variety of asymptotic eigenvalue statistics.  For instance, the (bulk) empirical spectral distribution (ESD) measure \frac{1}{n} \sum_{i=1}^n \delta_{\lambda_i(M_n)/\sqrt{n}} for GUE (and indeed for all Wigner matrices, see below) is known to converge (in the vague sense) to the Wigner semicircular law

\displaystyle \frac{1}{2\pi} (4-x^2)_+^{1/2}\ dx =: \rho_{sc}(x)\ dx (1)

as n \to \infty.  Actually, more precise statements are known for GUE; for instance, for 1 \leq i \leq n, the i^{th} eigenvalue \lambda_i(M_n) is known to equal

\displaystyle \lambda_i(M_n) = \sqrt{n} t(\frac{i}{n}) + O( \frac{\log n}{n} ) (2)

with probability 1-o(1), where t(a) \in [-2,2] is the inverse cumulative distribution function of the semicircular law, thus

\displaystyle a = \int_{-2}^{t(a)} \rho_{sc}(x)\ dx.

Furthermore, the distribution of the normalised eigenvalue spacing \sqrt{n} \rho_{sc}(\frac{i}{n}) (\lambda_{i+1}(M_n) - \lambda_i(M_n)) is known; in the bulk region \varepsilon n \leq i \leq 1-\varepsilon n for fixed \varepsilon > 0, it converges as n \to \infty to the Gaudin distribution, which can be described explicitly in terms of determinants of the Dyson sine kernel K(x,y) := \frac{\sin \pi(x-y)}{\pi(x-y)}.  Many further local statistics of the eigenvalues of GUE are in fact governed by this sine kernel, a result usually proven using the asymptotics of orthogonal polynomials (and specifically, the Hermite polynomials).  (At the edge of the spectrum, say i = n-O(1), the asymptotic distribution is a bit different, being governed instead by the  Tracy-Widom law.)

It has been widely believed that these GUE facts enjoy a universality property, in the sense that they should also hold for wide classes of other matrix models. In particular, Wigner matrices should enjoy the same bulk distribution (1), the same asymptotic law (2) for individual eigenvalues, and the same sine kernel statistics as GUE. (The statistics for Wigner symmetric matrices are slightly different, and should obey GOE statistics rather than GUE ones.)

There has been a fair body of evidence to support this belief.  The bulk distribution (1) is in fact valid for all Wigner matrices (a result of Pastur, building on the original work of Wigner of course).  The Tracy-Widom statistics on the edge were established for all Wigner Hermitian matrices (assuming that the coefficients had a distribution which was symmetric and decayed exponentially) by Soshnikov (with some further refinements by Soshnikov and Peche).  Soshnikov’s arguments were based on an advanced version of the moment method.

The sine kernel statistics were established by Johansson for Wigner Hermitian matrices which were gaussian divisible, which means that they could be expressed as a non-trivial linear combination of another Wigner Hermitian matrix and an independent GUE.  (Basically, this means that distribution of the coefficients is a convolution of some other distribution with a gaussian.  There were some additional technical decay conditions in Johansson’s work which were removed in subsequent work of Ben Arous and Peche.)   Johansson’s work was based on an explicit formula for the joint distribution for gauss divisible matrices that generalises (0) (but is significantly more complicated).

Just last week, Erdos, Ramirez, Schlein, and Yau established sine kernel statistics for Wigner Hermitian matrices with exponential decay and a high degree of smoothness (roughly speaking, they require  control of up to six derivatives of the Radon-Nikodym derivative of the distribution).  Their method is based on an analysis of the dynamics of the eigenvalues under a smooth transition from a general Wigner Hermitian matrix to GUE, essentially a matrix version of the Ornstein-Uhlenbeck process, whose eigenvalue dynamics are governed by Dyson Brownian motion.

In my paper with Van, we establish similar results to that of Erdos et al. under slightly different hypotheses, and by a somewhat different method.  Informally, our main result is as follows:

Theorem 1. (Informal version)  Suppose M_n is a Wigner Hermitian matrix whose coefficients have an exponentially decaying distribution, and whose real and imaginary parts are supported on at least three points (basically, this excludes Bernoulli-type distributions only) and have vanishing third moment (which is for instance the case for symmetric distributions).  Then one has the local statistics (2) (but with an error term of O(n^{-1+\delta}) for any \delta>0 rather than O(\log n/n)) and the sine kernel statistics for individual eigenvalue spacings \sqrt{n} \rho_{sc}(\frac{i}{n}) (\lambda_{i+1}(M_n) - \lambda_i(M_n)) (as well as higher order correlations) in the bulk.

If one removes the vanishing third moment hypothesis, one still has the sine kernel statistics provided one averages over all i.

There are analogous results for Wigner real symmetric matrices (see paper for details).  There are also some related results, such as a universal distribution for the least singular value of matrices of the form in Theorem 1, and a crude asymptotic for the determinant (in particular, \log |\det M_n| = (1+o(1)) \log \sqrt{n!} with probability 1-o(1)).

The arguments are based primarily on the Lindeberg replacement strategy, which Van and I also used to obtain universality for the circular law for iid matrices, and for the least singular value for iid matrices, but also rely on other tools, such as some recent arguments of Erdos, Schlein, and Yau, as well as a very useful concentration inequality of Talagrand which lets us tackle both discrete and continuous matrix ensembles in a unified manner.  (I plan to talk about Talagrand’s inequality in my next blog post.)

Read the rest of this entry »

I was asked recently (in relation to my recent work with Van Vu on the spectral theory of random matrices) to explain some standard heuristics regarding how the eigenvalues \lambda_1,\ldots,\lambda_n of an n \times n matrix A behave under small perturbations.  These heuristics can be summarised as follows:

  1. For normal matrices (and in particular, unitary or self-adjoint matrices), eigenvalues are very stable under small perturbations.  For more general matrices, eigenvalues can become unstable if there is pseudospectrum present.
  2. For self-adjoint (Hermitian) matrices, eigenvalues that are too close together tend to repel quickly from each other under such perturbations.  For more general matrices, eigenvalues can either be repelled or be attracted to each other, depending on the type of perturbation.

In this post I would like to briefly explain why these heuristics are plausible.

Read the rest of this entry »

This is my second Milliman lecture, in which I talk about recent applications of ideas from additive combinatorics (and in particular, from the inverse Littlewood-Offord problem) to the theory of discrete random matrices.
Read the rest of this entry »

Van Vu and I have recently uploaded our joint paper, “Random matrices: the circular law“, submitted to Contributions to Discrete Mathematics. In this paper we come close to fully resolving the circular law conjecture regarding the eigenvalue distribution of random matrices, for arbitrary choices of coefficient distribution.

More precisely, suppose we have an n \times n matrix N_n for some large n, where each coefficient x_{ij} of N_n is an independent identically distributed copy of a single random variable x (possibly complex-valued). x could be continuous (e.g. a Gaussian) or discrete (e.g. a Bernoulli random variable, taking values +1 and -1 with equal probability). For simplicity, let us normalise x to have mean 0 and variance 1 (in particular, the second moment is finite). This matrix will not be self-adjoint or normal, but we still expect it to be diagonalisable, with n complex eigenvalues. Heuristic arguments suggest that these eigenvalues should mostly have magnitude O(\sqrt{n}); for instance, one can see this by observing that the Hilbert-Schmidt norm (a.k.a. the Frobenius norm) \hbox{tr} N_n^* N_n, which can be shown to dominate the sum of squares of the magnitudes of the eigenvalues, is of size comparable to n^2 on the average. Because of this, it is customary to normalise the matrix by 1/\sqrt{n}; thus let \lambda_1,\ldots,\lambda_n be the n complex eigenvalues of \frac{1}{\sqrt{n}} N_n, arranged in any order.

Numerical evidence (as seen for instance here) soon reveals that these n eigenvalues appear to distribute themselves uniformly in the unit circle \{ z \in {\Bbb C}: |z| \leq 1 \} in the limit n \to \infty. This phenomenon is known as the circular law. It can be made more precise; if we define the empirical spectral distribution \mu_n: {\Bbb R}^2 \to [0,1] to be the function

\mu_n(s,t) := \frac{1}{n} \# \{ 1 \leq k \leq n: \hbox{Re}(\lambda_k) \leq s; \hbox{Im}(\lambda_k) \leq t \}

then with probability 1, \mu_n should converge uniformly to the uniform distribution \mu_\infty of the unit circle, defined as

\mu_\infty(s,t) := \frac{1}{\pi} \hbox{mes}(\{ (x,y): |x|^2 + |y|^2 \leq 1; x \leq s; y \leq t \}.

This statement is known as the circular law conjecture. In the case when x is a complex Gaussian, this law was verified by Mehta (using an explicit formula of Ginibre for the joint density function of the eigenvalues in this case). A strategy for attacking the general case was then formulated by Girko, although a fully rigorous execution of that strategy was first achieved by Bai (and then improved slightly by Bai and Silverstein). They established the circular law under the assumption that x had slightly better than bounded second moment (i.e. {\Bbb E}|x|^{2+\delta} < \infty for some \delta > 0), but more importantly that the probability density function of x in the complex plane was bounded (in particular, this ruled out all discrete random variables, such as the Bernoulli random variable). The reason for this latter restriction was in order to control the event that the matrix N_n (or more precisely \frac{1}{\sqrt{n}} N_n - z I for various complex numbers z) becomes too ill-conditioned by having a very small least singular value.

In the last few years, work of Rudelson, of myself with Van Vu, and of Rudelson-Vershynin (building upon earlier work of Kahn, Komlos, and Szemerédi, and of Van and myself), have opened the way to control the condition number of random matrices even when the matrices are discrete, and so there have been a recent string of results using these techniques to extend the circular law to discrete settings. In particular, Gotze and Tikhomirov established the circular law for discrete random variables which were sub-Gaussian, which was then relaxed by Pan and Zhou to an assumption of bounded fourth moment. In our paper, we get very close to the ideal assumption of bounded second moment; we need {\Bbb E} |x|^2 \log^C(1+|x|) < \infty for some C > 16. (The power of 16 in the logarithm can certainly be improved, though our methods do not allow the logarithm to be removed entirely.)

The main new difficulty that arises when relaxing the moment condition so close to the optimal one is that one begins to lose control on the largest singular value of \frac{1}{\sqrt{n}} N_n, i.e. on the operator norm of \frac{1}{\sqrt{n}} N_n. Under high moment assumptions (e.g. fourth moment) one can keep this operator norm bounded with reasonable probability (especially after truncating away some exceptionally large elements), but when the moment conditions are loosened, one can only bound this operator norm by a quantity bounded polynomially in n, even after truncation. This in turn causes certain metric entropy computations to become significantly more delicate, as one has to reduce the scale \epsilon of the net below what one would ordinarily like to have.

Read the rest of this entry »

This problem lies in the highly interconnected interface between algebraic combinatorics (esp. the combinatorics of Young tableaux and related objects, including honeycombs and puzzles), algebraic geometry (particularly classical and quantum intersection theory and geometric invariant theory), linear algebra (additive and multiplicative, real and tropical), and the representation theory (classical, quantum, crystal, etc.) of classical groups. (Another open problem in this subject is to find a succinct and descriptive name for the field.) I myself haven’t actively worked in this area for several years, but I still find it a fascinating and beautiful subject. (With respect to the dichotomy between structure and randomness, this subject lies deep within the “structure” end of the spectrum.)

As mentioned above, the problems in this area can be approached from a variety of quite diverse perspectives, but here I will focus on the linear algebra perspective, which is perhaps the most accessible. About nine years ago, Allen Knutson and I introduced a combinatorial gadget, called a honeycomb, which among other things controlled the relationship between the eigenvalues of two arbitrary Hermitian matrices A, B, and the eigenvalues of their sum A+B; this was not the first such gadget that achieved this purpose, but it was a particularly convenient one for studying this problem, in particular it was used to resolve two conjectures in the subject, the saturation conjecture and the Horn conjecture. (These conjectures have since been proven by a variety of other methods.) There is a natural multiplicative version of these problems, which now relates the eigenvalues of two arbitrary unitary matrices U, V and the eigenvalues of their product UV; this led to the “quantum saturation” and “quantum Horn” conjectures, which were proven a couple years ago. However, the quantum analogue of a “honeycomb” remains a mystery; this is the main topic of the current post.

Read the rest of this entry »


RSS Google+ feed

  • An error has occurred; the feed is probably down. Try again later.

Get every new post delivered to your Inbox.

Join 3,330 other followers