You are currently browsing the tag archive for the ‘outliers’ tag.

I’ve just uploaded to the arXiv my paper “Outliers in the spectrum of iid matrices with bounded rank perturbations“, submitted to Probability Theory and Related Fields. This paper is concerned with outliers to the circular law for iid random matrices. Recall that if {X_n} is an {n \times n} matrix whose entries are iid complex random variables with mean zero and variance one, then the {n} complex eigenvalues of the normalised matrix {\frac{1}{\sqrt{n}} X_n} will almost surely be distributed according to the circular law distribution {\frac{1}{\pi} 1_{|z| \leq 1} d^2 z} in the limit {n \rightarrow \infty}. (See these lecture notes for further discussion of this law.)

The circular law is also stable under bounded rank perturbations: if {C_n} is a deterministic rank {O(1)} matrix of polynomial size (i.e. of operator norm {O(n^{O(1)})}), then the circular law also holds for {\frac{1}{\sqrt{n}} X_n + C_n} (this is proven in a paper of myself, Van Vu, and Manjunath Krisnhapur). In particular, the bulk of the eigenvalues (i.e. {(1-o(1)) n} of the {n} eigenvalues) will lie inside the unit disk {\{ z \in {\bf C}: |z| \leq 1 \}}.

However, this leaves open the possibility for one or more outlier eigenvalues that lie significantly outside the unit disk; the arguments in the paper cited above give some upper bound on the number of such eigenvalues (of the form {O(n^{1-c})} for some absolute constant {c>0}) but does not exclude them entirely. And indeed, numerical data shows that such outliers can exist for certain bounded rank perturbations.

In this paper, some results are given as to when outliers exist, and how they are distributed. The easiest case is of course when there is no bounded rank perturbation: {C_n=0}. In that case, an old result of Bai and Yin and of Geman shows that the spectral radius of {\frac{1}{\sqrt{n}} X_n} is almost surely {1+o(1)}, thus all eigenvalues will be contained in a {o(1)} neighbourhood of the unit disk, and so there are no significant outliers. The proof is based on the moment method.

Now we consider a bounded rank perturbation {C_n} which is nonzero, but which has a bounded operator norm: {\|C_n\|_{op} = O(1)}. In this case, it turns out that the matrix {\frac{1}{\sqrt{n}} X_n + C_n} will have outliers if the deterministic component {C_n} has outliers. More specifically (and under the technical hypothesis that the entries of {X_n} have bounded fourth moment), if {\lambda} is an eigenvalue of {C_n} with {|\lambda| > 1}, then (for {n} large enough), {\frac{1}{\sqrt{n}} X_n + C_n} will almost surely have an eigenvalue at {\lambda+o(1)}, and furthermore these will be the only outlier eigenvalues of {\frac{1}{\sqrt{n}} X_n + C_n}.

Thus, for instance, adding a bounded nilpotent low rank matrix to {\frac{1}{\sqrt{n}} X_n} will not create any outliers, because the nilpotent matrix only has eigenvalues at zero. On the other hand, adding a bounded Hermitian low rank matrix will create outliers as soon as this matrix has an operator norm greater than {1}.

When I first thought about this problem (which was communicated to me by Larry Abbott), I believed that it was quite difficult, because I knew that the eigenvalues of non-Hermitian matrices were quite unstable with respect to general perturbations (as discussed in this previous blog post), and that there were no interlacing inequalities in this case to control bounded rank perturbations (as discussed in this post). However, as it turns out I had arrived at the wrong conclusion, especially in the exterior of the unit disk in which the resolvent is actually well controlled and so there is no pseudospectrum present to cause instability. This was pointed out to me by Alice Guionnet at an AIM workshop last week, after I had posed the above question during an open problems session. Furthermore, at the same workshop, Percy Deift emphasised the point that the basic determinantal identity

\displaystyle  \det(1 + AB) = \det(1 + BA) \ \ \ \ \ (1)

for {n \times k} matrices {A} and {k \times n} matrices {B} was a particularly useful identity in random matrix theory, as it converted problems about large ({n \times n}) matrices into problems about small ({k \times k}) matrices, which was particularly convenient in the regime when {n \rightarrow \infty} and {k} was fixed. (Percy was speaking in the context of invariant ensembles, but the point is in fact more general than this.)

From this, it turned out to be a relatively simple manner to transform what appeared to be an intractable {n \times n} matrix problem into quite a well-behaved {k \times k} matrix problem for bounded {k}. Specifically, suppose that {C_n} had rank {k}, so that one can factor {C_n = A_n B_n} for some (deterministic) {n \times k} matrix {A_n} and {k \times n} matrix {B_n}. To find an eigenvalue {z} of {\frac{1}{\sqrt{n}} X_n + C_n}, one has to solve the characteristic polynomial equation

\displaystyle  \det( \frac{1}{\sqrt{n}} X_n + A_n B_n - z ) = 0.

This is an {n \times n} determinantal equation, which looks difficult to control analytically. But we can manipulate it using (1). If we make the assumption that {z} is outside the spectrum of {\frac{1}{\sqrt{n}} X_n} (which we can do as long as {z} is well away from the unit disk, as the unperturbed matrix {\frac{1}{\sqrt{n}} X_n} has no outliers), we can divide by {\frac{1}{\sqrt{n}} X_n - z} to arrive at

\displaystyle  \det( 1 + (\frac{1}{\sqrt{n}} X_n-z)^{-1} A_n B_n ) = 0.

Now we apply the crucial identity (1) to rearrange this as

\displaystyle  \det( 1 + B_n (\frac{1}{\sqrt{n}} X_n-z)^{-1} A_n ) = 0.

The crucial point is that this is now an equation involving only a {k \times k} determinant, rather than an {n \times n} one, and is thus much easier to solve. The situation is particularly simple for rank one perturbations

\displaystyle  \frac{1}{\sqrt{n}} X_n + u_n v_n^*

in which case the eigenvalue equation is now just a scalar equation

\displaystyle  1 + \langle (\frac{1}{\sqrt{n}} X_n-z)^{-1} u_n, v_n \rangle = 0

that involves what is basically a single coefficient of the resolvent {(\frac{1}{\sqrt{n}} X_n-z)^{-1}}. (It is also an instructive exercise to derive this eigenvalue equation directly, rather than through (1).) There is by now a very well-developed theory for how to control such coefficients (particularly for {z} in the exterior of the unit disk, in which case such basic tools as Neumann series work just fine); in particular, one has precise enough control on these coefficients to obtain the result on outliers mentioned above.

The same method can handle some other bounded rank perturbations. One basic example comes from looking at iid matrices with a non-zero mean {\mu} and variance {1}; this can be modeled by {\frac{1}{\sqrt{n}} X_n + \mu \sqrt{n} \phi_n \phi_n^*} where {\phi_n} is the unit vector {\phi_n := \frac{1}{\sqrt{n}} (1,\ldots,1)^*}. Here, the bounded rank perturbation {\mu \sqrt{n} \phi_n \phi_n^*} has a large operator norm (equal to {|\mu| \sqrt{n}}), so the previous result does not directly apply. Nevertheless, the self-adjoint nature of the perturbation has a stabilising effect, and I was able to show that there is still only one outlier, and that it is at the expected location of {\mu \sqrt{n}+o(1)}.

If one moves away from the case of self-adjoint perturbations, though, the situation changes. Let us now consider a matrix of the form {\frac{1}{\sqrt{n}} X_n + \mu \sqrt{n} \phi_n \psi_n^*}, where {\psi_n} is a randomised version of {\phi_n}, e.g. {\psi_n := \frac{1}{\sqrt{n}} (\pm 1, \ldots, \pm 1)^*}, where the {\pm 1} are iid Bernoulli signs; such models were proposed recently by Rajan and Abbott as a model for neural networks in which some nodes are excitatory (and give columns with positive mean) and some are inhibitory (leading to columns with negative mean). Despite the superficial similarity with the previous example, the outlier behaviour is now quite different. Instead of having one extremely large outlier (of size {\sim\sqrt{n}}) at an essentially deterministic location, we now have a number of eigenvalues of size {O(1)}, scattered according to a random process. Indeed, (in the case when the entries of {X_n} were real and bounded) I was able to show that the outlier point process converged (in the sense of converging {k}-point correlation functions) to the zeroes of a random Laurent series

\displaystyle  g(z) = 1 - \mu \sum_{j=0}^\infty \frac{g_j}{z^{j+1}}

where {g_0,g_1,g_2,\ldots \equiv N(0,1)} are iid real Gaussians. This is basically because the coefficients of the resolvent {(\frac{1}{\sqrt{n}} X_n - zI)^{-1}} have a Neumann series whose coefficients enjoy a central limit theorem.

On the other hand, as already observed numerically (and rigorously, in the gaussian case) by Rajan and Abbott, if one projects such matrices to have row sum zero, then the outliers all disappear. This can be explained by another appeal to (1); this projection amounts to right-multiplying {\frac{1}{\sqrt{n}} X_n + \mu \sqrt{n} \phi_n \psi_n^*} by the projection matrix {P} to the zero-sum vectors. But by (1), the non-zero eigenvalues of the resulting matrix {(\frac{1}{\sqrt{n}} X_n + \mu \sqrt{n} \phi_n \psi_n^*)P} are the same as those for {P (\frac{1}{\sqrt{n}} X_n + \mu \sqrt{n} \phi_n \psi_n^*)}. Since {P} annihilates {\phi_n}, we thus see that in this case the bounded rank perturbation plays no role, and the question reduces to obtaining a circular law with no outliers for {P \frac{1}{\sqrt{n}} X_n}. As it turns out, this can be done by invoking the machinery of Van Vu and myself that we used to prove the circular law for various random matrix models.

Archives

RSS Google+ feed

  • An error has occurred; the feed is probably down. Try again later.
Follow

Get every new post delivered to your Inbox.

Join 3,964 other followers