Van Vu and I have just uploaded to the arXiv our paper “Random matrices: Localization of the eigenvalues and the necessity of four moments“, submitted to Probability Theory and Related Fields. This paper concerns the distribution of the eigenvalues

\displaystyle  \lambda_1(M_n) \leq \ldots \leq \lambda_n(M_n)

of a Wigner random matrix {M_n}. More specifically, we consider {n \times n} Hermitian random matrices whose entries have mean zero and variance one, with the entries of the upper-triangular portion of the matrix jointly independent, the diagonal entries iid, and the real and imaginary parts of the strictly upper-triangular entries iid. For technical reasons we also assume that the distribution of the coefficients decays exponentially or better. Examples of Wigner matrices include the Gaussian Unitary Ensemble (GUE) and random symmetric complex Bernoulli matrices (which equal {\pm 1} on the diagonal, and {\pm \frac{1}{2} \pm \frac{i}{2}} off the diagonal). The Gaussian Orthogonal Ensemble (GOE) is also an example once one makes the minor change of setting the diagonal entries to have variance two instead of one.

The most fundamental theorem about the distribution of these eigenvalues is the Wigner semi-circular law, which asserts that (almost surely) one has

\displaystyle  \frac{1}{n} \sum_{i=1}^n \delta_{\lambda_i(M_n)/\sqrt{n}} \rightarrow \rho_{sc}(x)\ dx

(in the vague topology) where {\rho_{sc}(x) := \frac{1}{2\pi} (4-x^2)_+^{1/2}} is the semicircular distribution. (See these lecture notes on this blog for more discussion of this law.)
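
As a quick numerical illustration (a sketch of my own, not taken from the paper), one can sample a GUE matrix, normalised as above so that the entries have mean zero and variance one, and compare the empirical distribution of the rescaled eigenvalues {\lambda_i/\sqrt{n}} with {\rho_{sc}}:

```python
import numpy as np

# A sketch of my own (not from the paper): sample a GUE matrix and compare the
# empirical distribution of the rescaled eigenvalues lambda_i/sqrt(n) with the
# semicircular density rho_sc(x) = (4 - x^2)_+^{1/2} / (2*pi).

rng = np.random.default_rng(0)
n = 1000

# Strictly upper-triangular entries (a+ib)/sqrt(2) with a, b standard normal,
# so E|xi|^2 = 1; diagonal entries real standard normal, variance one.
z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
m = np.triu(z, 1)
m = m + m.conj().T + np.diag(rng.standard_normal(n))

eigs = np.linalg.eigvalsh(m) / np.sqrt(n)
hist, edges = np.histogram(eigs, bins=50, range=(-2.5, 2.5), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
rho_sc = np.sqrt(np.clip(4 - centers**2, 0, None)) / (2 * np.pi)
print("max deviation from rho_sc:", np.abs(hist - rho_sc).max())  # small for large n
```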

One can phrase this law in a number of equivalent ways. For instance, in the bulk region {\epsilon n \leq i \leq (1-\epsilon) n}, one almost surely has

\displaystyle  \lambda_i(M_n) = \gamma_i \sqrt{n} + o(\sqrt{n}) \ \ \ \ \ (1)

uniformly in {i}, where the classical location {\gamma_i \in [-2,2]} of the (normalised) {i^{th}} eigenvalue {\frac{1}{\sqrt{n}} \lambda_i} is defined by the formula

\displaystyle  \int_{-2}^{\gamma_i} \rho_{sc}(x)\ dx := \frac{i}{n}.

The bound (1) also holds in the edge case (by using the operator norm bound {\| M_n\|_{op} = (2+o(1)) \sqrt{n}}, due to Bai and Yin), but for the sake of exposition we shall restrict attention here to the bulk case.
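
To see the classical locations concretely, here is another small sketch of my own: it inverts the semicircle cumulative distribution function {F(\gamma) = \int_{-2}^{\gamma} \rho_{sc}(x)\ dx} (which has the closed form {\frac{1}{2\pi}(\frac{\gamma}{2}\sqrt{4-\gamma^2} + 2\arcsin(\gamma/2) + \pi)}) by bisection, and compares {\gamma_i \sqrt{n}} with the {i^{th}} eigenvalue of a sampled GUE matrix:

```python
import numpy as np

# A sketch of my own: recover the classical location gamma_i by inverting the
# semicircle CDF F(g) = (g*sqrt(4-g^2)/2 + 2*arcsin(g/2) + pi) / (2*pi)
# by bisection, and compare gamma_i * sqrt(n) with the i-th GUE eigenvalue.

def semicircle_cdf(g):
    return (g * np.sqrt(4.0 - g * g) / 2 + 2 * np.arcsin(g / 2) + np.pi) / (2 * np.pi)

def classical_location(i, n, tol=1e-12):
    lo, hi = -2.0, 2.0                      # solve F(gamma) = i/n on [-2, 2]
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if semicircle_cdf(mid) < i / n else (lo, mid)
    return 0.5 * (lo + hi)

rng = np.random.default_rng(1)
n = 1000
z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
m = np.triu(z, 1)
m = m + m.conj().T + np.diag(rng.standard_normal(n))
eigs = np.sort(np.linalg.eigvalsh(m))

for i in (n // 4, n // 2, 3 * n // 4):      # bulk indices
    print(i, eigs[i - 1], classical_location(i, n) * np.sqrt(n))
```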

From (1) we see that the semicircular law controls the eigenvalues at the coarse scale of {\sqrt{n}}. There has been a significant amount of work in the literature in obtaining control at finer scales, and in particular at the scale of the average eigenvalue spacing, which is of the order of {\sqrt{n}/n = n^{-1/2}}. For instance, we now have a universal limit theorem for the normalised eigenvalue spacing {\sqrt{n}(\lambda_{i+1}(M_n) - \lambda_i(M_n))} in the bulk for all Wigner matrices, a result of Erdos, Ramirez, Schlein, Vu, Yau, and myself. One tool for this is the four moment theorem of Van and myself, which roughly speaking shows that the behaviour of the eigenvalues at the scale {n^{-1/2}} (and even at the slightly finer scale of {n^{-1/2-c}} for some absolute constant {c>0}) depends only on the first four moments of the matrix entries. There is also a slight variant, the three moment theorem, which asserts that the behaviour of the eigenvalues at the slightly coarser scale of {n^{-1/2+\epsilon}} depends only on the first three moments of the matrix entries.
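
As a sanity check on the {n^{-1/2}} spacing scale (again an illustration of my own, not an argument from the paper), one can compute the normalised bulk spacings {\sqrt{n}(\lambda_{i+1}(M_n) - \lambda_i(M_n))} for a sampled GUE matrix; since the local mean spacing near {\sqrt{n} x} is about {1/(\sqrt{n}\rho_{sc}(x))}, a further rescaling by the local density should bring the mean spacing close to one:

```python
import numpy as np

# A sketch of my own: normalised bulk spacings sqrt(n)*(lambda_{i+1}-lambda_i)
# for a sampled GUE matrix; after multiplying by the local density rho_sc, the
# mean normalised spacing should be close to 1.

rng = np.random.default_rng(2)
n = 2000
z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
m = np.triu(z, 1)
m = m + m.conj().T + np.diag(rng.standard_normal(n))
eigs = np.sort(np.linalg.eigvalsh(m))

bulk = slice(n // 4, 3 * n // 4)            # stay away from the spectral edge
gaps = np.sqrt(n) * np.diff(eigs)[bulk]     # normalised spacings, order one
x = eigs[bulk] / np.sqrt(n)                 # approximate classical locations
rho = np.sqrt(np.clip(4 - x * x, 0, None)) / (2 * np.pi)
print("mean rescaled spacing:", np.mean(gaps * rho))  # should be close to 1
```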

It is natural to ask whether these moment conditions are necessary. From the result of Erdos, Ramirez, Schlein, Vu, Yau, and myself, it is known that to control the eigenvalue spacing {\lambda_{i+1}(M_n) - \lambda_i(M_n)} at the critical scale {n^{-1/2}}, no knowledge of any moments beyond the second (i.e. beyond the mean and variance) is needed. So it is natural to conjecture that the same is true for the eigenvalues themselves.

The main result of this paper is to show that this is not the case; that at the critical scale {n^{-1/2}}, the distribution of eigenvalues {\lambda_i(M_n)} is sensitive to the fourth moment, and so the hypothesis of the four moment theorem cannot be relaxed.

Heuristically, the reason for this is easy to explain. One begins with an inspection of the expected fourth moment

\displaystyle  \sum_{i=1}^n {\bf E}(\lambda_i(M_n)^4) = {\bf E} \hbox{tr} M_n^4.

A standard moment method computation shows that the right hand side is equal to

\displaystyle  2n^3 + 2a n^2 + \ldots

where {a} is the fourth moment of the real part of the off-diagonal coefficients of {M_n}. In particular, a change in the fourth moment {a} by {O(1)} leads to a change in the expression {\sum_{i=1}^n {\bf E}(\lambda_i(M_n)^4)} by {O(n^2)}. Thus, for a typical {i}, one expects {{\bf E}(\lambda_i(M_n)^4)} to shift by {O(n)}; since {\lambda_i(M_n) = O(\sqrt{n})} on the average (so that {\frac{d}{d\lambda} \lambda^4 = 4\lambda^3 = O(n^{3/2})}), the mean-value theorem then suggests that {\lambda_i(M_n)} itself shifts by about {O(n)/O(n^{3/2}) = O(n^{-1/2})}.
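
This heuristic can be probed numerically. The following rough Monte Carlo sketch (my own illustration; it checks only the {a}-dependence, not the full expansion) compares {{\bf E} \hbox{tr} M_n^4} across two Wigner ensembles that share the first three moments but whose off-diagonal real parts have fourth moments {a = 3/4} (Gaussian) and {a = 1/4} (Bernoulli) respectively; the difference should be roughly {2(3/4 - 1/4) n^2 = n^2}:

```python
import numpy as np

# A rough Monte Carlo sketch of my own: two Wigner ensembles with the same
# first three moments but different fourth moments a of the real part of the
# off-diagonal entries; the difference in E tr M_n^4 should be roughly
# 2*(3/4 - 1/4)*n^2 = n^2.

def wigner(n, rng, off_diag):
    # off_diag(shape, rng) draws iid real variables of mean 0 and variance 1/2.
    z = off_diag((n, n), rng) + 1j * off_diag((n, n), rng)
    m = np.triu(z, 1)
    return m + m.conj().T + np.diag(np.sign(rng.standard_normal(n)))

gauss = lambda s, rng: rng.standard_normal(s) / np.sqrt(2)           # a = 3/4
bern = lambda s, rng: np.sign(rng.standard_normal(s)) / np.sqrt(2)   # a = 1/4

def mean_tr4(off_diag, n, trials, rng):
    total = 0.0
    for _ in range(trials):
        m = wigner(n, rng, off_diag)
        m2 = m @ m
        total += np.sum(np.abs(m2) ** 2)   # tr M^4 = ||M^2||_F^2 for Hermitian M
    return total / trials

rng = np.random.default_rng(3)
n, trials = 100, 2000
diff = mean_tr4(gauss, n, trials, rng) - mean_tr4(bern, n, trials, rng)
print(diff, "vs predicted", n ** 2)
```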

To make this rigorous, one needs a sufficiently strong concentration of measure result for {\lambda_i(M_n)} that keeps it close to its mean value. There are already a number of such results in the literature. For instance, Guionnet and Zeitouni showed that {\lambda_i(M_n)} was sharply concentrated in an interval of size {O(n^\epsilon)} around {\sqrt{n} \gamma_i} for any {\epsilon > 0} (in the sense that the probability that one was outside this interval was exponentially small). In one of my papers with Van, we showed that {\lambda_i(M_n)} was also weakly concentrated in an interval of size {O(n^{-1/2+\epsilon})} around {\sqrt{n} \gamma_i}, in the sense that the probability that one was outside this interval was {O(n^{-c})} for some absolute constant {c>0}. Finally, if one made an additional log-Sobolev hypothesis on the entries, it was shown by Erdos, Yau, and Yin that the average variance of {\lambda_i(M_n)} as {i} varied from {1} to {n} was of size {O(n^{-c})} for some absolute constant {c>0}.

As it turns out, the first two concentration results are not sufficient to justify the previous heuristic argument. The Erdos-Yau-Yin argument suffices, but requires a log-Sobolev hypothesis. In our paper, we argue differently, using the three moment theorem (together with the theory of the eigenvalues of GUE, which is extremely well developed) to show that the variance of each individual {\lambda_i(M_n)} is {O(n^{-c})} (without averaging in {i}). No log-Sobolev hypothesis is required, but instead we need to assume that the third moment of the coefficients vanishes (because we want to use the three moment theorem to compare the Wigner matrix to GUE, and the coefficients of the latter have a vanishing third moment). From this we are able to make the previous arguments rigorous, and show that the mean {{\bf E} \lambda_i(M_n)} is indeed sensitive to the fourth moment of the entries at the critical scale {n^{-1/2}}.

One curious feature of the analysis is how differently the median and the mean of the eigenvalue {\lambda_i(M_n)} react to the available technology. To control the global behaviour of the eigenvalues (after averaging in {i}), it is much more convenient to use the mean, and we have very precise control on global averages of these means thanks to the moment method. But to control local behaviour, it is the median which is much better controlled. For instance, we can localise the median of {\lambda_i(M_n)} to an interval of size {O(n^{-1/2+\epsilon})}, but can only localise the mean to a much larger interval of size {O(n^{-c})}. Ultimately, this is because with our current technology there is a possible exceptional event of probability as large as {O(n^{-c})} for which all eigenvalues could deviate as far as {O(n^\epsilon)} from their expected location, instead of their typical deviation of {O(n^{-1/2})}. The reason for this is technical, coming from the fact that the four moment theorem method breaks down when two eigenvalues are very close together (less than {n^{-c}} times the average eigenvalue spacing), and so one has to cut out this event, which occurs with a probability of the shape {O(n^{-c})}. It may be possible to improve the four moment theorem proof to be less sensitive to eigenvalue near-collisions, in which case the above bounds are likely to improve.

I am currently at Princeton for the conference “The power of Analysis” honouring Charlie Fefferman's 60th birthday. I myself gave a talk at this conference entitled “Recent progress on the Kakeya conjecture”; I plan to post a version of this talk on this blog shortly.

But one nice thing about attending these sorts of conferences is that one can also learn some neat mathematical facts, and I wanted to show two such small gems here; neither is particularly deep, but I found both of them cute. The first one, which I learned from my former student Soonsik Kwon, is a unified way to view the mean, median, and mode of a probability distribution {\mu} on the real line. If one assumes that this is a continuous distribution {\mu = f(x)\ dx} for some smooth, rapidly decreasing function {f: {\mathbb R} \rightarrow {\mathbb R}^+} with {\int_{\mathbb R} f(x)\ dx = 1}, then the mean is the value of {x_0} that minimises the second moment

\displaystyle  \int_{\mathbb R} |x-x_0|^2 f(x)\ dx,

the median is the value of {x_0} that minimises the first moment

\displaystyle  \int_{\mathbb R} |x-x_0| f(x)\ dx,

and the mode is the value of {x_0} that maximises the “pseudo-negative first moment”

\displaystyle  \int_{\mathbb R} \delta(x-x_0) f(x)\ dx.

(Note that the Dirac delta function {\delta(x-x_0)} has the same scaling as {|x-x_0|^{-1}}, hence my terminology “pseudo-negative first moment”.)
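
These variational characterisations are easy to test numerically. In the following sketch (my own), the empirical measure of a skewed sample stands in for {f(x)\ dx}, and a grid search recovers the sample mean and the sample median as the respective minimisers:

```python
import numpy as np

# A small numerical sketch of my own: for a skewed sample (whose empirical
# measure stands in for f(x) dx), a grid search shows that the minimiser of
# the second moment is the sample mean and the minimiser of the first moment
# is the sample median.

rng = np.random.default_rng(0)
x = rng.gamma(shape=2.0, scale=1.0, size=50_000)   # an asymmetric distribution

grid = np.linspace(0.0, 8.0, 2001)
second = [np.mean(np.abs(x - t) ** 2) for t in grid]   # E |x - x0|^2
first = [np.mean(np.abs(x - t)) for t in grid]         # E |x - x0|

print("argmin of second moment:", grid[np.argmin(second)], " mean:", x.mean())
print("argmin of first moment: ", grid[np.argmin(first)], " median:", np.median(x))
```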

The other fact, which I learned from my former classmate Diego Córdoba (and used in a joint paper of Diego with Antonio Córdoba), is a pointwise inequality

\displaystyle  |\nabla|^\alpha ( f^2 )(x) \leq 2 f(x) |\nabla|^\alpha f(x)

for the fractional differentiation operators {|\nabla|^\alpha} applied to a sufficiently nice real-valued function {f: {\mathbb R}^d \rightarrow {\mathbb R}} (e.g. Schwartz class will do), in any dimension {d} and for any {0 \leq \alpha \leq 1}; this should be compared with the product rule {\nabla (f^2 ) = 2 f \nabla f}.

The proof is as follows. By a limiting argument we may assume that {0 < \alpha < 1}. In this case, there is a formula

\displaystyle  |\nabla|^\alpha f(x) = c(\alpha) \int_{{\mathbb R}^d} \frac{f(x)-f(y)}{|x-y|^{d+\alpha}}\ dy

for some explicit constant {c(\alpha) > 0} (this can be seen by computations similar to those in my recent lecture notes on distributions, or by analytically continuing such computations; see also Stein’s “Singular integrals and differentiability properties of functions”). Using this formula, one soon sees that

\displaystyle  2 f(x) |\nabla|^\alpha f(x) - |\nabla|^\alpha ( f^2 )(x) = c(\alpha) \int_{{\mathbb R}^d} \frac{|f(x)-f(y)|^2}{|x-y|^{d+\alpha}}\ dy

and the claim follows.
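
One can also check the inequality numerically. The sketch below (my own; it works on a periodic grid rather than with Schwartz functions on {{\mathbb R}^d}, which should be harmless at this box size) realises {|\nabla|^\alpha} as the Fourier multiplier {|\xi|^\alpha} in dimension {d=1} and verifies that {2 f |\nabla|^\alpha f - |\nabla|^\alpha(f^2)} is pointwise nonnegative up to discretisation error:

```python
import numpy as np

# A discretised check of my own: realise |nabla|^alpha spectrally as the
# Fourier multiplier |xi|^alpha on a periodic grid, and verify that
# 2 f |nabla|^alpha f - |nabla|^alpha (f^2) is pointwise nonnegative.

n, L, alpha = 2048, 40.0, 0.5
x = np.linspace(-L / 2, L / 2, n, endpoint=False)
xi = 2 * np.pi * np.fft.fftfreq(n, d=L / n)        # angular frequency grid

def frac_deriv(f):
    # |nabla|^alpha f computed via the fast Fourier transform.
    return np.real(np.fft.ifft(np.abs(xi) ** alpha * np.fft.fft(f)))

f = np.exp(-x ** 2)                                # a smooth, rapidly decreasing bump
gap = 2 * f * frac_deriv(f) - frac_deriv(f ** 2)
print("min of 2 f |D|^a f - |D|^a(f^2):", gap.min())  # >= 0 up to rounding error
```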
