I was asked recently (in relation to my recent work with Van Vu on the spectral theory of random matrices) to explain some standard heuristics regarding how the eigenvalues \lambda_1,\ldots,\lambda_n of an n \times n matrix A behave under small perturbations.  These heuristics can be summarised as follows:

  1. For normal matrices (and in particular, unitary or self-adjoint matrices), eigenvalues are very stable under small perturbations.  For more general matrices, eigenvalues can become unstable if there is pseudospectrum present.
  2. For self-adjoint (Hermitian) matrices, eigenvalues that are too close together tend to repel quickly from each other under such perturbations.  For more general matrices, eigenvalues can either be repelled or be attracted to each other, depending on the type of perturbation.

In this post I would like to briefly explain why these heuristics are plausible.

— Pseudospectrum —

As any student of linear algebra knows, the spectrum \sigma(A) of an n \times n matrix A consists of all the complex numbers \lambda such that A-\lambda I fails to be invertible, thus there exists a non-zero vector v such that (A-\lambda I)v = 0.   This is a finite set, consisting of at most n points.  But there is also an important set containing the spectrum (and which, at times, can be much larger than that spectrum), which is the pseudospectrum of the matrix A.  Unlike the spectrum, which is canonically defined, there is more than one definition of the pseudospectrum; also, this concept depends on an additional parameter, a small number \varepsilon > 0.  We will define the pseudospectrum (or more precisely, the \varepsilon-pseudospectrum) \sigma_\varepsilon(A) to be the set of all the complex numbers \lambda such that A-\lambda I has least singular value at most \varepsilon, or equivalently that there exists a unit vector v such that | (A-\lambda I) v | \leq \varepsilon.  (Another equivalent definition is that the \varepsilon-pseudospectrum consists of the spectrum, together with those complex numbers \lambda for which the resolvent (A-\lambda I)^{-1} has operator norm at least 1/\varepsilon.)
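In numerical terms, the least-singular-value definition is easy to test directly.  Here is a minimal NumPy sketch (the function name is just for illustration) that checks whether a given point \lambda lies in the \varepsilon-pseudospectrum of A:

```python
import numpy as np

def in_pseudospectrum(A, lam, eps):
    """Return True if lam lies in the eps-pseudospectrum of A, i.e. if the
    least singular value of A - lam*I is at most eps."""
    n = A.shape[0]
    least_sv = np.linalg.svd(A - lam * np.eye(n), compute_uv=False)[-1]
    return least_sv <= eps
```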

The significance of the pseudospectrum \sigma_\varepsilon(A) is that it describes where the spectrum \sigma(A) can go to under small perturbations.  Indeed, if \lambda lies in the pseudospectrum \sigma_\varepsilon(A), so that there exists a unit vector v whose image w := (A-\lambda I) v has magnitude at most \varepsilon, then we see that

(A - w v^* - \lambda I) v = (A-\lambda I) v - w v^* v = w - w = 0

and so \lambda lies in the spectrum \sigma(A-wv^*) of the perturbation A-wv^* of A.  Note that the operator norm of wv^* is at most \varepsilon.
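This rank-one construction can also be carried out numerically; the sketch below (with an illustrative function name) takes the unit vector v attaining the least singular value of A-\lambda I and forms the perturbation A - wv^*, after which \lambda reappears as an eigenvalue up to rounding error:

```python
import numpy as np

def pin_eigenvalue(A, lam):
    """Build the rank-one perturbation A - w v^* described above, which has
    lam as an eigenvalue; the perturbation has operator norm equal to the
    least singular value of A - lam*I."""
    n = A.shape[0]
    _, s, Vh = np.linalg.svd(A - lam * np.eye(n))
    v = Vh[-1].conj()                      # unit vector with |(A - lam I)v| = s[-1]
    w = (A - lam * np.eye(n)) @ v
    return A - np.outer(w, v.conj())

A = np.random.randn(5, 5)
lam = 0.3 + 0.1j
print(min(abs(np.linalg.eigvals(pin_eigenvalue(A, lam)) - lam)))   # ~ 0
```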

Conversely, if \lambda does not lie in the pseudospectrum \sigma_\varepsilon(A), and A+E is a small perturbation of A (with E having operator norm at most \varepsilon), then for any unit vector v, one has

|(A+E - \lambda I) v| \geq |(A-\lambda I)v| - |Ev| > \varepsilon - \varepsilon = 0

by the triangle inequality, and so \lambda cannot lie in the spectrum of A+E.

Thus, if the pseudospectrum is tightly clustered around the spectrum, the spectrum is stable under small perturbations; but if the pseudospectrum is widely dispersed, then the spectrum becomes unstable.

No matter what A is, the pseudospectrum \sigma_\varepsilon(A) always contains the \varepsilon-neighbourhood of the spectrum \sigma(A).  Indeed, if v is a unit eigenvector with eigenvalue \lambda \in \sigma(A), then (A-\lambda I) v = 0, which implies that |(A-\lambda' I) v| = |\lambda - \lambda'| \leq \varepsilon for any \lambda' in the \varepsilon-neighbourhood of \lambda, and the claim follows.

Conversely, when A is normal, \sigma_\varepsilon(A) consists only of the \varepsilon-neighbourhood of \sigma(A).  This is easiest to see by using the spectral theorem to diagonalise A and then computing everything explicitly.  In particular, we conclude that if we perturb a normal matrix by a (possibly non-normal) perturbation of operator norm at most \varepsilon, then the spectrum moves by at most \varepsilon.
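One can also see this agreement numerically: for a random Hermitian A, the least singular value of A-\lambda I coincides with the distance from \lambda to the spectrum, as in the following sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
A = (X + X.conj().T) / 2                          # Hermitian, hence normal
lam = 0.7 + 0.2j
least_sv = np.linalg.svd(A - lam * np.eye(6), compute_uv=False)[-1]
dist = np.min(np.abs(np.linalg.eigvalsh(A) - lam))
print(least_sv, dist)                             # the two numbers agree
```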

In the non-normal case, things can be quite different.  A good example is provided by the shift matrix

\displaystyle U := \begin{pmatrix} 0 & 1 & 0 & \ldots & 0 \\ 0 & 0 & 1 & \ldots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \ldots & 1 \\ 0 & 0 & 0 & \ldots & 0 \end{pmatrix}.

This matrix is nilpotent: U^n=0.  As such, the only eigenvalue is zero.  But observe that for any complex number \lambda,

\displaystyle (U -\lambda I) \begin{pmatrix} 1 \\ \lambda \\ \vdots \\ \lambda^{n-2} \\ \lambda^{n-1} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ -\lambda^{n} \end{pmatrix}.

From this and a little computation (the vector on the left-hand side has norm at least 1, so after normalising it the right-hand side still has norm at most |\lambda|^n), we see that if |\lambda| < 1, then \lambda will lie in the |\lambda|^{n}-pseudospectrum of U.  For fixed \varepsilon, we thus see that \sigma_\varepsilon(U) fills up the unit disk in the high dimensional limit n \to \infty.  (The pseudospectrum will not venture far beyond the unit disk, as the operator norm of U is 1.)  And indeed, it is easy to perturb U so that its spectrum moves far away from the origin.  For instance, observe that the perturbation

\displaystyle U + E := \begin{pmatrix} 0 & 1 & 0 & \ldots & 0 \\ 0 & 0 & 1 & \ldots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \ldots & 1 \\ \varepsilon & 0 & 0 & \ldots & 0 \end{pmatrix}

of U has characteristic polynomial (-\lambda)^n - (-1)^n \varepsilon (or note that (U+E)^n = \varepsilon I, which is basically the same fact thanks to the Cayley-Hamilton theorem), and so its eigenvalues are the n^{th} roots of \varepsilon; for fixed \varepsilon and n tending to infinity, this spectrum becomes asymptotically uniformly distributed on the unit circle, rather than concentrating at the origin.
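A quick numerical experiment makes the contrast vivid; in the NumPy sketch below, the unperturbed shift matrix has all eigenvalues at the origin, while the \varepsilon-perturbation spreads them around a circle of radius \varepsilon^{1/n}:

```python
import numpy as np

n, eps = 50, 1e-6
U = np.diag(np.ones(n - 1), k=1)                  # the shift matrix
print(np.max(np.abs(np.linalg.eigvals(U))))       # 0: all eigenvalues at the origin

UE = U.copy()
UE[-1, 0] = eps                                   # the perturbation displayed above
print(np.abs(np.linalg.eigvals(UE)))              # all of magnitude eps**(1/n) ~ 0.76
```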

[Update, Nov 8: Much more on the theory of pseudospectra can be found at this web site; thanks to Nick Trefethen for the link.]

— Spectral dynamics —

The pseudospectrum tells us, roughly speaking, how far the spectrum \sigma(A) of a matrix A can move with respect to a small perturbation, but does not tell us the direction in which the spectrum moves.  For this, it is convenient to use the language of calculus: we suppose that A=A(t) varies smoothly with respect to some time parameter t, and would like to “differentiate” the spectrum \sigma(A) with respect to t.

Since it is a little unclear what it means to differentiate a set, let us work instead with the eigenvalues \lambda_j = \lambda_j(t) of A = A(t).  Note that generically (e.g. for almost all A), the eigenvalues will be distinct.  (Proof: the eigenvalues are distinct when the characteristic polynomial has no repeated roots, or equivalently when the resultant of the characteristic polynomial with its derivative is non-zero.  This is clearly a Zariski-open condition; since the condition is obeyed at least once, it is thus Zariski-dense.)  So it is not unreasonable to assume that for all t in some open interval, the \lambda_j(t) are distinct; an application of the implicit function theorem then allows one to make the \lambda_j(t) smooth in t.  Similarly, we can make the eigenvectors v_j = v_j(t) vary smoothly in t.  (There is some freedom to multiply each eigenvector by a scalar, but this freedom will cancel itself out in the end, as we are ultimately interested only in the eigenvalues rather than the eigenvectors.)

The eigenvectors v_1,\ldots,v_n form a basis of {\Bbb C}^n.  Let w_1,\ldots,w_n be the dual basis, thus w_j^* v_k = \delta_{jk} for all 1 \leq j,k \leq n, and so we have the reproducing formula

u = \sum_{j=1}^n (w_j^* u) v_j (1)

for all vectors u.  Combining this with the eigenvector equations

A v_k = \lambda_k v_k (2)

we obtain the adjoint eigenvector equations

w_k^* A = \lambda_k w_k^*. (3)

Next, we differentiate (2) using the product rule to obtain

\dot A v_k + A \dot v_k = \dot \lambda_k v_k + \lambda_k \dot v_k. (4)

Taking the inner product of this with the dual vector w_k, and using (3) to cancel some terms, we obtain the first variation formula for eigenvalues:

\dot \lambda_k = w_k^* \dot A v_k. (5)

Note that if A is normal, then we can take the eigenbasis v_k to be orthonormal, in which case the dual basis w_k is identical to v_k.  In particular we see that |\dot \lambda_k| \leq \|\dot A\|_{op}; the infinitesimal change of each eigenvalue does not exceed the infinitesimal size of the perturbation.  This is consistent with the stability of the spectrum for normal operators mentioned in the previous section.
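As a sanity check on (5), one can compare it against a finite difference along a linear path A(t) = A_0 + tB; the following sketch (with an arbitrary choice of eigenvalue index k) does this for a generic non-normal matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
A0, B = rng.standard_normal((5, 5)), rng.standard_normal((5, 5))

lam0, V = np.linalg.eig(A0)            # columns of V are eigenvectors
W = np.linalg.inv(V).conj().T          # dual basis, so that W^* V = identity
k = 0
predicted = W[:, k].conj() @ B @ V[:, k]          # w_k^* Adot v_k, as in (5)

h = 1e-6
lam_h = np.linalg.eigvals(A0 + h * B)
observed = (lam_h[np.argmin(np.abs(lam_h - lam0[k]))] - lam0[k]) / h
print(predicted, observed)             # agree up to an O(h) error
```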

[Side remark: if A evolves by the Lax pair equation \dot A = [A,P] for some matrix P = P(t), then w_k^* \dot A v_k = w_k^* (AP - PA) v_k = \lambda_k w_k^* P v_k - \lambda_k w_k^* P v_k = 0, and so from (5) we see that the spectrum of A is invariant in time.  This fact underpins the inverse scattering method for solving integrable systems.]

Now we look at how the eigenvectors vary. Taking the inner product instead with a dual vector w_j for j \neq k, we obtain

w_j^* \dot A v_k + (\lambda_j - \lambda_k) w_j^* \dot v_k = 0;

applying (1) we conclude a first variation formula for the eigenvectors v_k, namely that

\displaystyle \dot v_k = \sum_{j \neq k} \frac{w_j^* \dot A v_k}{\lambda_k - \lambda_j} v_j + c_k v_k (6)

for some scalar c_k (the presence of this term reflects the freedom to multiply v_k by a scalar).  Similar considerations for the adjoint give

\displaystyle \dot w_k^* = \sum_{j \neq k} \frac{w_k^* \dot A v_j}{\lambda_k - \lambda_j} w_j^* - c_k w_k^* (6′)

(here we use the derivative of the identity w_k^* v_k = 1 to get the correct multiple of w_k^* on the right.)

We can use (5), (6), (6′) to obtain a second variation formula for the eigenvalues.  Indeed, by differentiating (5) we obtain

\ddot \lambda_k = \dot w_k^* \dot A v_k + w_k^* \ddot A v_k + w_k^* \dot A \dot v_k;

applying (6), (6′) we conclude the second variation formula

\displaystyle \ddot \lambda_k = w_k^* \ddot A v_k + 2\sum_{j \neq k} \frac{ (w_k^* \dot A v_j) (w_j^* \dot A v_k) }{\lambda_k - \lambda_j}. (7)

Now suppose that A is self-adjoint, so as before we can take v_k=w_k to be orthonormal. The above formula then becomes

\displaystyle \ddot \lambda_k = v_k^* \ddot A v_k + 2\sum_{j \neq k} \frac{ |v_k^* \dot A v_j|^2 }{\lambda_k - \lambda_j}.

One can view the terms on the right-hand side here as various “forces” acting on the eigenvalue \lambda_k; the acceleration of the original matrix A provides one such force, while all the other eigenvalues \lambda_j provide a repulsive force (and note, as with Newton’s third law, that the force that \lambda_j exerts on \lambda_k is equal and opposite to the force that \lambda_k exerts on \lambda_j; this is consistent with the trace formula \sum_{k=1}^n \lambda_k = \hbox{tr}(A)).  The closer two eigenvalues are to each other, the stronger the repulsive force becomes.
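Here too one can check the formula numerically against a second finite difference; the sketch below follows the k-th eigenvalue (in increasing order) along a Hermitian path A(t) = A_0 + tB, for which \ddot A = 0:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
Y = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
A0, B = (X + X.conj().T) / 2, (Y + Y.conj().T) / 2    # Hermitian matrices

lam, V = np.linalg.eigh(A0)            # orthonormal eigenbasis, so w_k = v_k
k = 2
predicted = 2 * sum(abs(V[:, k].conj() @ B @ V[:, j]) ** 2 / (lam[k] - lam[j])
                    for j in range(5) if j != k)

h = 1e-4
second_diff = (np.linalg.eigvalsh(A0 + h * B)[k] - 2 * lam[k]
               + np.linalg.eigvalsh(A0 - h * B)[k]) / h ** 2
print(predicted, second_diff)          # agree to several decimal places
```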

When the matrix A is not self-adjoint, then the interaction between \lambda_j and \lambda_k can be either repulsive or attractive.  Consider for instance the matrices

\begin{pmatrix} 1 & t \\ -t & -1 \end{pmatrix}.

The eigenvalues of this matrix are \pm \sqrt{1-t^2} for -1 < t < 1, so we see that they are attracted to each other as |t| increases, colliding when the matrix becomes degenerate at t = \pm 1.  In contrast, the self-adjoint matrices

\begin{pmatrix} 1 & t \\ t & -1 \end{pmatrix}

have eigenvalues \pm \sqrt{1+t^2}, which repel each other as t evolves.
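These two families are small enough to examine directly; in the sketch below, the eigenvalue gap of the first family shrinks as |t| grows, while the gap of the second grows:

```python
import numpy as np

for t in (0.0, 0.5, 0.9):
    attract = np.linalg.eigvals(np.array([[1.0, t], [-t, -1.0]]))   # gap 2*sqrt(1-t^2)
    repel = np.linalg.eigvalsh(np.array([[1.0, t], [t, -1.0]]))     # gap 2*sqrt(1+t^2)
    print(t, abs(attract[0] - attract[1]), repel[1] - repel[0])
```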

The repulsion effect of eigenvalues is also consistent with the smallness of the set of matrices with repeated eigenvalues. Consider for instance the space of Hermitian n \times n matrices, which has real dimension n^2.   The subset of Hermitian matrices with distinct eigenvalues can be described by a collection of n orthogonal (complex) one-dimensional eigenspaces (which can be computed to have 2(n-1) + 2(n-2) + \ldots + 2 = n(n-1) degrees of freedom) plus n real eigenvalues (for an additional n degrees of freedom), thus the set of matrices with distinct eigenvalues has full dimension n^2.

Now consider the space of Hermitian matrices with one repeated eigenvalue.  This can be described by n-2 orthogonal complex one-dimensional eigenspaces, plus a complex two-dimensional orthogonal complement (a configuration with 2(n-1) + 2(n-2)+\ldots+4 = n(n-1)-2 degrees of freedom), plus n-1 real eigenvalues, and so the set of matrices with a repeated eigenvalue only has dimension n^2-3.  Thus it is in fact very rare for eigenvalues to actually collide, which helps explain why there must be a repulsion effect in the first place.

An example can help illustrate this phenomenon.  Consider the one-parameter family of Hermitian matrices

\begin{pmatrix} t & 0 \\ 0 & -t \end{pmatrix}.

The eigenvalues of this matrix at time t are of course t and -t, which cross over each other when t changes sign.  Now consider instead the Hermitian perturbation

\begin{pmatrix} t & \varepsilon \\ \varepsilon & -t \end{pmatrix}

for some small \varepsilon > 0.  The eigenvalues are now \sqrt{t^2+\varepsilon^2} and -\sqrt{t^2+\varepsilon^2}; they come close to each other as t approaches 0, but their separation never drops below 2\varepsilon, and they then “bounce” off each other due to the repulsion effect.
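The “bounce” is easy to see numerically: as t sweeps through 0 in the sketch below, the two eigenvalues of the perturbed matrix approach each other but always stay at least 2\varepsilon apart.

```python
import numpy as np

eps = 0.1
for t in np.linspace(-0.3, 0.3, 7):
    evals = np.linalg.eigvalsh(np.array([[t, eps], [eps, -t]]))
    print(round(t, 2), evals, evals[1] - evals[0])    # gap = 2*sqrt(t^2 + eps^2) >= 2*eps
```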