
I’ve just uploaded to the arXiv my paper “Outliers in the spectrum of iid matrices with bounded rank perturbations“, submitted to Probability Theory and Related Fields. This paper is concerned with outliers to the circular law for iid random matrices. Recall that if {X_n} is an {n \times n} matrix whose entries are iid complex random variables with mean zero and variance one, then the {n} complex eigenvalues of the normalised matrix {\frac{1}{\sqrt{n}} X_n} will almost surely be distributed according to the circular law distribution {\frac{1}{\pi} 1_{|z| \leq 1} d^2 z} in the limit {n \rightarrow \infty}. (See these lecture notes for further discussion of this law.)

The circular law is also stable under bounded rank perturbations: if {C_n} is a deterministic rank {O(1)} matrix of polynomial size (i.e. of operator norm {O(n^{O(1)})}), then the circular law also holds for {\frac{1}{\sqrt{n}} X_n + C_n} (this is proven in a paper of myself, Van Vu, and Manjunath Krishnapur). In particular, the bulk of the eigenvalues (i.e. {(1-o(1)) n} of the {n} eigenvalues) will lie inside the unit disk {\{ z \in {\bf C}: |z| \leq 1 \}}.

However, this leaves open the possibility of one or more outlier eigenvalues that lie significantly outside the unit disk; the arguments in the paper cited above give some upper bound on the number of such eigenvalues (of the form {O(n^{1-c})} for some absolute constant {c>0}) but do not exclude them entirely. And indeed, numerical data shows that such outliers can exist for certain bounded rank perturbations.

In this paper, some results are given as to when outliers exist, and how they are distributed. The easiest case is of course when there is no bounded rank perturbation: {C_n=0}. In that case, an old result of Bai and Yin and of Geman shows that the spectral radius of {\frac{1}{\sqrt{n}} X_n} is almost surely {1+o(1)}, thus all eigenvalues will be contained in an {o(1)} neighbourhood of the unit disk, and so there are no significant outliers. The proof is based on the moment method.
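As a quick numerical sanity check of the Bai-Yin/Geman statement (a minimal sketch, not part of the paper; the Gaussian entry distribution, matrix size, and random seed are arbitrary illustrative choices), one can draw a single iid matrix and inspect its spectral radius:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000                                    # arbitrary illustrative size
X = rng.standard_normal((n, n))             # iid entries, mean zero, variance one
eigs = np.linalg.eigvals(X / np.sqrt(n))
print("spectral radius:", np.abs(eigs).max())   # typically 1 + o(1), so no significant outliers
```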

Now we consider a bounded rank perturbation {C_n} which is nonzero, but which has a bounded operator norm: {\|C_n\|_{op} = O(1)}. In this case, it turns out that the matrix {\frac{1}{\sqrt{n}} X_n + C_n} will have outliers if the deterministic component {C_n} has outliers. More specifically (and under the technical hypothesis that the entries of {X_n} have bounded fourth moment), if {\lambda} is an eigenvalue of {C_n} with {|\lambda| > 1}, then (for {n} large enough), {\frac{1}{\sqrt{n}} X_n + C_n} will almost surely have an eigenvalue at {\lambda+o(1)}, and furthermore these will be the only outlier eigenvalues of {\frac{1}{\sqrt{n}} X_n + C_n}.

Thus, for instance, adding a bounded nilpotent low rank matrix to {\frac{1}{\sqrt{n}} X_n} will not create any outliers, because the nilpotent matrix only has eigenvalues at zero. On the other hand, adding a bounded Hermitian low rank matrix will create outliers as soon as this matrix has an operator norm greater than {1}.
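To see this dichotomy numerically, here is a minimal sketch under illustrative assumptions (Gaussian entries, {n = 1000}, and two hypothetical rank-one perturbations with the stated eigenvalue structure; none of these specific choices come from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
M = rng.standard_normal((n, n)) / np.sqrt(n)
e1 = np.zeros(n); e1[0] = 1.0
e2 = np.zeros(n); e2[1] = 1.0

# Hermitian rank-one perturbation with eigenvalue 2 > 1: expect one outlier near 2
eigs_h = np.linalg.eigvals(M + 2.0 * np.outer(e1, e1))
print(eigs_h[np.abs(eigs_h) > 1.1])     # typically a single eigenvalue close to 2

# nilpotent rank-one perturbation 2*e1*e2^T: its only eigenvalue is 0, so no outliers expected
eigs_n = np.linalg.eigvals(M + 2.0 * np.outer(e1, e2))
print(eigs_n[np.abs(eigs_n) > 1.1])     # typically empty
```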

When I first thought about this problem (which was communicated to me by Larry Abbott), I believed that it was quite difficult, because I knew that the eigenvalues of non-Hermitian matrices were quite unstable with respect to general perturbations (as discussed in this previous blog post), and that there were no interlacing inequalities in this case to control bounded rank perturbations (as discussed in this post). However, as it turns out, I had arrived at the wrong conclusion: in the exterior of the unit disk the resolvent is actually well controlled, and so there is no pseudospectrum present to cause instability. This was pointed out to me by Alice Guionnet at an AIM workshop last week, after I had posed the above question during an open problems session. Furthermore, at the same workshop, Percy Deift emphasised the point that the basic determinantal identity

\displaystyle  \det(1 + AB) = \det(1 + BA) \ \ \ \ \ (1)

for {n \times k} matrices {A} and {k \times n} matrices {B} was a particularly useful identity in random matrix theory, as it converted problems about large ({n \times n}) matrices into problems about small ({k \times k}) matrices, which was particularly convenient in the regime when {n \rightarrow \infty} and {k} was fixed. (Percy was speaking in the context of invariant ensembles, but the point is in fact more general than this.)
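The identity is elementary to check numerically; here is a quick sketch (the rectangular sizes and Gaussian entries below are arbitrary) showing the {n \times n} and {k \times k} determinants agreeing:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 50, 3                             # arbitrary illustrative sizes
A = rng.standard_normal((n, k))
B = rng.standard_normal((k, n))

lhs = np.linalg.det(np.eye(n) + A @ B)   # n x n determinant
rhs = np.linalg.det(np.eye(k) + B @ A)   # k x k determinant
print(lhs, rhs)                          # equal up to floating-point error
```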

From this, it turned out to be a relatively simple matter to transform what appeared to be an intractable {n \times n} matrix problem into quite a well-behaved {k \times k} matrix problem for bounded {k}. Specifically, suppose that {C_n} had rank {k}, so that one can factor {C_n = A_n B_n} for some (deterministic) {n \times k} matrix {A_n} and {k \times n} matrix {B_n}. To find an eigenvalue {z} of {\frac{1}{\sqrt{n}} X_n + C_n}, one has to solve the characteristic polynomial equation

\displaystyle  \det( \frac{1}{\sqrt{n}} X_n + A_n B_n - z ) = 0.

This is an {n \times n} determinantal equation, which looks difficult to control analytically. But we can manipulate it using (1). If we make the assumption that {z} is outside the spectrum of {\frac{1}{\sqrt{n}} X_n} (which we can do as long as {z} is well away from the unit disk, as the unperturbed matrix {\frac{1}{\sqrt{n}} X_n} has no outliers), we can divide by {\frac{1}{\sqrt{n}} X_n - z} to arrive at

\displaystyle  \det( 1 + (\frac{1}{\sqrt{n}} X_n-z)^{-1} A_n B_n ) = 0.

Now we apply the crucial identity (1) to rearrange this as

\displaystyle  \det( 1 + B_n (\frac{1}{\sqrt{n}} X_n-z)^{-1} A_n ) = 0.

The crucial point is that this is now an equation involving only a {k \times k} determinant, rather than an {n \times n} one, and is thus much easier to solve. The situation is particularly simple for rank one perturbations

\displaystyle  \frac{1}{\sqrt{n}} X_n + u_n v_n^*

in which case the eigenvalue equation is now just a scalar equation

\displaystyle  1 + \langle (\frac{1}{\sqrt{n}} X_n-z)^{-1} u_n, v_n \rangle = 0

that involves what is basically a single coefficient of the resolvent {(\frac{1}{\sqrt{n}} X_n-z)^{-1}}. (It is also an instructive exercise to derive this eigenvalue equation directly, rather than through (1).) There is by now a very well-developed theory for how to control such coefficients (particularly for {z} in the exterior of the unit disk, in which case such basic tools as Neumann series work just fine); in particular, one has precise enough control on these coefficients to obtain the result on outliers mentioned above.
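As a concrete check of this scalar equation (again only a sketch: the vectors {u_n, v_n} below are an arbitrary hypothetical rank-one example of operator norm {2}), one can locate the outlier eigenvalue {z} of the perturbed matrix numerically and verify that {1 + \langle (\frac{1}{\sqrt{n}} X_n-z)^{-1} u_n, v_n \rangle} vanishes there:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
M = rng.standard_normal((n, n)) / np.sqrt(n)

# illustrative rank-one perturbation u v^* of operator norm 2
u = np.zeros(n); u[0] = 2.0
v = np.zeros(n); v[0] = 1.0

eigs = np.linalg.eigvals(M + np.outer(u, v))
z = eigs[np.abs(eigs).argmax()]                      # the outlier, near 2 for large n

# the scalar eigenvalue equation: 1 + v^* (M - z)^{-1} u should vanish at z
val = 1.0 + v @ np.linalg.solve(M - z * np.eye(n), u)
print(z, abs(val))                                   # |val| is at floating-point level
```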

The same method can handle some other bounded rank perturbations. One basic example comes from looking at iid matrices with a non-zero mean {\mu} and variance {1}; this can be modeled by {\frac{1}{\sqrt{n}} X_n + \mu \sqrt{n} \phi_n \phi_n^*} where {\phi_n} is the unit vector {\phi_n := \frac{1}{\sqrt{n}} (1,\ldots,1)^*}. Here, the bounded rank perturbation {\mu \sqrt{n} \phi_n \phi_n^*} has a large operator norm (equal to {|\mu| \sqrt{n}}), so the previous result does not directly apply. Nevertheless, the self-adjoint nature of the perturbation has a stabilising effect, and I was able to show that there is still only one outlier, and that it is at the expected location of {\mu \sqrt{n}+o(1)}.
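A quick numerical illustration of this (again a sketch with Gaussian entries and arbitrary parameters {\mu = 1}, {n = 1000}):

```python
import numpy as np

rng = np.random.default_rng(4)
n, mu = 1000, 1.0
X = rng.standard_normal((n, n))
phi = np.ones(n) / np.sqrt(n)

# mean-mu iid matrix: X/sqrt(n) + mu*sqrt(n)*phi phi^T
M = X / np.sqrt(n) + mu * np.sqrt(n) * np.outer(phi, phi)
eigs = np.linalg.eigvals(M)
print(np.sort(np.abs(eigs))[-3:])   # one eigenvalue of size about mu*sqrt(n) ~ 31.6; the rest O(1)
```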

If one moves away from the case of self-adjoint perturbations, though, the situation changes. Let us now consider a matrix of the form {\frac{1}{\sqrt{n}} X_n + \mu \sqrt{n} \phi_n \psi_n^*}, where {\psi_n} is a randomised version of {\phi_n}, e.g. {\psi_n := \frac{1}{\sqrt{n}} (\pm 1, \ldots, \pm 1)^*}, where the {\pm 1} are iid Bernoulli signs; such models were proposed recently by Rajan and Abbott as a model for neural networks in which some nodes are excitatory (and give columns with positive mean) and some are inhibitory (leading to columns with negative mean). Despite the superficial similarity with the previous example, the outlier behaviour is now quite different. Instead of having one extremely large outlier (of size {\sim\sqrt{n}}) at an essentially deterministic location, we now have a number of eigenvalues of size {O(1)}, scattered according to a random process. Indeed, (in the case when the entries of {X_n} were real and bounded) I was able to show that the outlier point process converged (in the sense of converging {k}-point correlation functions) to the zeroes of a random Laurent series

\displaystyle  g(z) = 1 - \mu \sum_{j=0}^\infty \frac{g_j}{z^{j+1}}

where {g_0,g_1,g_2,\ldots \equiv N(0,1)} are iid real Gaussians. This is basically because the coefficients of the resolvent {(\frac{1}{\sqrt{n}} X_n - zI)^{-1}} have a Neumann series whose coefficients enjoy a central limit theorem.
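Numerically, the contrast with the previous example is easy to see (another illustrative sketch; the real Gaussian entries and the parameters are arbitrary choices, and a single sample of course says nothing about the limiting point process itself):

```python
import numpy as np

rng = np.random.default_rng(5)
n, mu = 1000, 1.0
X = rng.standard_normal((n, n))
phi = np.ones(n) / np.sqrt(n)
psi = rng.choice([-1.0, 1.0], size=n) / np.sqrt(n)   # iid Bernoulli signs

M = X / np.sqrt(n) + mu * np.sqrt(n) * np.outer(phi, psi)
eigs = np.linalg.eigvals(M)
print(eigs[np.abs(eigs) > 1.1])   # any O(1) outliers in this draw; their number and locations vary from sample to sample
```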

On the other hand, as already observed numerically (and rigorously, in the gaussian case) by Rajan and Abbott, if one projects such matrices to have row sum zero, then the outliers all disappear. This can be explained by another appeal to (1); this projection amounts to right-multiplying {\frac{1}{\sqrt{n}} X_n + \mu \sqrt{n} \phi_n \psi_n^*} by the projection matrix {P} to the zero-sum vectors. But by (1), the non-zero eigenvalues of the resulting matrix {(\frac{1}{\sqrt{n}} X_n + \mu \sqrt{n} \phi_n \psi_n^*)P} are the same as those for {P (\frac{1}{\sqrt{n}} X_n + \mu \sqrt{n} \phi_n \psi_n^*)}. Since {P} annihilates {\phi_n}, we thus see that in this case the bounded rank perturbation plays no role, and the question reduces to obtaining a circular law with no outliers for {P \frac{1}{\sqrt{n}} X_n}. As it turns out, this can be done by invoking the machinery of Van Vu and myself that we used to prove the circular law for various random matrix models.
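And the corresponding sketch for the projected model (the same hypothetical setup as above, now right-multiplied by the projection {P} onto zero-sum vectors):

```python
import numpy as np

rng = np.random.default_rng(6)
n, mu = 1000, 1.0
X = rng.standard_normal((n, n))
phi = np.ones(n) / np.sqrt(n)
psi = rng.choice([-1.0, 1.0], size=n) / np.sqrt(n)

M = X / np.sqrt(n) + mu * np.sqrt(n) * np.outer(phi, psi)
P = np.eye(n) - np.outer(phi, phi)        # orthogonal projection onto zero-sum vectors

# nonzero eigenvalues of M P agree with those of P M, and P annihilates phi
eigs = np.linalg.eigvals(M @ P)
print(eigs[np.abs(eigs) > 1.1])           # typically empty: the outliers disappear
```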

Now that we have developed the basic probabilistic tools that we will need, we turn to the main subject of this course, namely the study of random matrices. There are many random matrix models (aka matrix ensembles) of interest – far too many to all be discussed in a single course. We will thus focus on just a few simple models. First of all, we shall restrict attention to square matrices {M = (\xi_{ij})_{1 \leq i,j \leq n}}, where {n} is a (large) integer and the {\xi_{ij}} are real or complex random variables. (One can certainly study rectangular matrices as well, but for simplicity we will only look at the square case.) Then, we shall restrict to three main models (a brief sampling sketch in code follows the list):

  • Iid matrix ensembles, in which the coefficients {\xi_{ij}} are iid random variables with a single distribution {\xi_{ij} \equiv \xi}. We will often normalise {\xi} to have mean zero and unit variance. Examples of iid models include the Bernoulli ensemble (aka random sign matrices) in which the {\xi_{ij}} are signed Bernoulli variables, the real gaussian matrix ensemble in which {\xi_{ij} \equiv N(0,1)_{\bf R}}, and the complex gaussian matrix ensemble in which {\xi_{ij} \equiv N(0,1)_{\bf C}}.
  • Symmetric Wigner matrix ensembles, in which the upper triangular coefficients {\xi_{ij}}, {j \geq i} are jointly independent and real, but the lower triangular coefficients {\xi_{ij}}, {j<i} are constrained to equal their transposes: {\xi_{ij}=\xi_{ji}}. Thus {M} by construction is always a real symmetric matrix. Typically, the strictly upper triangular coefficients will be iid, as will the diagonal coefficients, but the two classes of coefficients may have a different distribution. One example here is the symmetric Bernoulli ensemble, in which both the strictly upper triangular and the diagonal entries are signed Bernoulli variables; another important example is the Gaussian Orthogonal Ensemble (GOE), in which the upper triangular entries have distribution {N(0,1)_{\bf R}} and the diagonal entries have distribution {N(0,2)_{\bf R}}. (We will explain the reason for this discrepancy later.)
  • Hermitian Wigner matrix ensembles, in which the upper triangular coefficients are jointly independent, with the diagonal entries being real and the strictly upper triangular entries complex, and the lower triangular coefficients {\xi_{ij}}, {j<i} are constrained to equal their adjoints: {\xi_{ij} = \overline{\xi_{ji}}}. Thus {M} by construction is always a Hermitian matrix. This class of ensembles contains the symmetric Wigner ensembles as a subclass. Another very important example is the Gaussian Unitary Ensemble (GUE), in which all off-diagonal entries have distribution {N(0,1)_{\bf C}}, but the diagonal entries have distribution {N(0,1)_{\bf R}}.
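Here is a brief sketch (with an arbitrarily small {n}, purely to display the structure) of how one might sample from the Bernoulli, GOE, and GUE models just described:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 4   # small, just to display the structure

# iid Bernoulli (random sign) matrix
M_bernoulli = rng.choice([-1.0, 1.0], size=(n, n))

# GOE: real symmetric; strictly upper triangular entries N(0,1), diagonal N(0,2)
G = rng.standard_normal((n, n))
M_goe = np.triu(G, 1) + np.triu(G, 1).T + np.diag(np.sqrt(2.0) * rng.standard_normal(n))

# GUE: Hermitian; strictly upper triangular entries N(0,1)_C, diagonal real N(0,1)
Z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2.0)
M_gue = np.triu(Z, 1) + np.triu(Z, 1).conj().T + np.diag(rng.standard_normal(n))

print(np.allclose(M_goe, M_goe.T), np.allclose(M_gue, M_gue.conj().T))   # True True
```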

Given a matrix ensemble {M}, there are many statistics of {M} that one may wish to consider, e.g. the eigenvalues or singular values of {M}, the trace and determinant, etc. In these notes we will focus on a basic statistic, namely the operator norm

\displaystyle  \| M \|_{op} := \sup_{x \in {\bf C}^n: |x|=1} |Mx| \ \ \ \ \ (1)

of the matrix {M}. This is an interesting quantity in its own right, but also serves as a basic upper bound on many other quantities. (For instance, {\|M\|_{op}} is also the largest singular value {\sigma_1(M)} of {M} and thus dominates the other singular values; similarly, all eigenvalues {\lambda_i(M)} of {M} clearly have magnitude at most {\|M\|_{op}}.) Because of this, it is particularly important to get good upper tail bounds
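These relations are easy to confirm numerically (a small sketch with an arbitrary Gaussian matrix):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 200
M = rng.standard_normal((n, n))

op_norm = np.linalg.norm(M, 2)                        # operator norm = largest singular value
sigma_1 = np.linalg.svd(M, compute_uv=False)[0]
max_eig = np.abs(np.linalg.eigvals(M)).max()

print(op_norm, sigma_1, max_eig)   # op_norm equals sigma_1, and max_eig <= op_norm
```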

\displaystyle  {\bf P}( \|M\|_{op} \geq \lambda ) \leq \ldots

on this quantity, for various thresholds {\lambda}. (Lower tail bounds are also of interest, of course; for instance, they give us confidence that the upper tail bounds are sharp.) Also, as we shall see, the problem of upper bounding {\|M\|_{op}} can be viewed as a non-commutative analogue of upper bounding the quantity {|S_n|} studied in Notes 1. (The analogue of the central limit theorem in Notes 2 is the Wigner semi-circular law, which will be studied in the next set of notes.)

An {n \times n} matrix consisting entirely of {1}s has an operator norm of exactly {n}, as can for instance be seen from the Cauchy-Schwarz inequality. More generally, any matrix whose entries are all uniformly {O(1)} will have an operator norm of {O(n)} (which can again be seen from Cauchy-Schwarz, or alternatively from Schur’s test, or from a computation of the Frobenius norm). However, this argument does not take advantage of possible cancellations in {M}. Indeed, from analogy with concentration of measure, when the entries of the matrix {M} are independent, bounded and have mean zero, we expect the operator norm to be of size {O(\sqrt{n})} rather than {O(n)}. We shall see shortly that this intuition is indeed correct. (One can see, though, that the mean zero hypothesis is important; from the triangle inequality we see that if we add the all-ones matrix (for instance) to a random matrix with mean zero, to obtain a random matrix whose coefficients all have mean {1}, then at least one of the two random matrices necessarily has operator norm at least {n/2}.)
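The contrast between the all-ones matrix and a mean-zero random matrix is easy to see numerically (a quick sketch; the {\pm 1} entry distribution is just one convenient mean-zero choice):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 1000
ones = np.ones((n, n))
signs = rng.choice([-1.0, 1.0], size=(n, n))   # iid mean-zero +-1 entries

print(np.linalg.norm(ones, 2))    # exactly n = 1000
print(np.linalg.norm(signs, 2))   # roughly 2*sqrt(n), about 63, far smaller than n
```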

As mentioned before, there is an analogy here with the concentration of measure phenomenon, and many of the tools used in the latter (e.g. the moment method) will also appear here. (Indeed, we will be able to use some of the concentration inequalities from Notes 1 directly to help control {\|M\|_{op}} and related quantities.) Similarly, just as many of the tools from concentration of measure could be adapted to help prove the central limit theorem, several of the tools seen here will be of use in deriving the semi-circular law.

The most advanced knowledge we have on the operator norm is given by the Tracy-Widom law, which not only tells us where the operator norm is concentrated (it turns out, for instance, that for a Wigner matrix (with some additional technical assumptions), it is concentrated in the range {[2\sqrt{n} - O(n^{-1/6}), 2\sqrt{n} + O(n^{-1/6})]}), but also what its distribution in that range is. While the methods in this set of notes can eventually be pushed to establish this result, this is far from trivial, and will only be briefly discussed here. (We may return to the Tracy-Widom law later in this course, though.)
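As a rough numerical illustration of the scale involved (one sample from a GOE-type Wigner matrix, an illustrative choice; a single draw of course says nothing about the Tracy-Widom distribution itself):

```python
import numpy as np

rng = np.random.default_rng(10)
n = 1000
# GOE-type Wigner matrix: off-diagonal N(0,1), diagonal N(0,2)
G = rng.standard_normal((n, n))
W = np.triu(G, 1) + np.triu(G, 1).T + np.diag(np.sqrt(2.0) * rng.standard_normal(n))

lam_max = np.abs(np.linalg.eigvalsh(W)).max()   # operator norm of the symmetric matrix
print(lam_max, 2 * np.sqrt(n))                  # typically within O(n^{-1/6}) of 2*sqrt(n)
```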

