Let {A} be a Hermitian {n \times n} matrix. By the spectral theorem for Hermitian matrices (which, for the sake of completeness, we prove below), one can diagonalise {A} using a sequence

\displaystyle \lambda_1(A) \geq \ldots \geq \lambda_n(A)

of {n} real eigenvalues, together with an orthonormal basis of eigenvectors {u_1(A),\ldots,u_n(A) \in {\mathbb C}^n}. (The eigenvalues are uniquely determined by {A}, but the eigenvectors have a little ambiguity to them, particularly if there are repeated eigenvalues; for instance, one could multiply each eigenvector by a complex phase {e^{i\theta}}. In these notes we are arranging eigenvalues in descending order; of course, one can also arrange eigenvalues in increasing order, which causes some slight notational changes in the results below.) The set {\{\lambda_1(A),\ldots,\lambda_n(A)\}} is known as the spectrum of {A}.
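
For readers who like to experiment, here is a minimal numerical sketch of this decomposition, assuming Python with numpy (note that numpy's eigh returns eigenvalues in ascending order, so we reverse them to match the descending convention of these notes):

    import numpy as np

    rng = np.random.default_rng(0)
    M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    A = (M + M.conj().T) / 2          # a random 4x4 Hermitian matrix

    # eigh returns ascending eigenvalues; reverse to get lambda_1 >= ... >= lambda_n
    evals, evecs = np.linalg.eigh(A)
    evals, evecs = evals[::-1], evecs[:, ::-1]

    assert np.allclose(A @ evecs, evecs * evals)              # A u_i = lambda_i u_i
    assert np.allclose(evecs.conj().T @ evecs, np.eye(4))     # orthonormal eigenbasis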

A basic question in linear algebra asks the extent to which the eigenvalues {\lambda_1(A),\ldots,\lambda_n(A)} and {\lambda_1(B),\ldots,\lambda_n(B)} of two Hermitian matrices {A, B} constrain the eigenvalues {\lambda_1(A+B),\ldots,\lambda_n(A+B)} of the sum. For instance, the linearity of trace

\displaystyle \hbox{tr}(A+B) = \hbox{tr}(A)+\hbox{tr}(B),

when expressed in terms of eigenvalues, gives the trace constraint

\displaystyle \lambda_1(A+B)+\ldots+\lambda_n(A+B) = \lambda_1(A)+\ldots+\lambda_n(A) + \lambda_1(B)+\ldots+\lambda_n(B); \ \ \ \ \ (1)

the identity

\displaystyle \lambda_1(A) = \sup_{|v|=1} v^* Av \ \ \ \ \ (2)

(together with the counterparts for {B} and {A+B}) gives the inequality

\displaystyle \lambda_1(A+B) \leq \lambda_1(A) + \lambda_1(B); \ \ \ \ \ (3)

and so forth.
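
As a quick sanity check of (1) and (3), one can test them on random Hermitian matrices; the following sketch (Python/numpy, with an ad hoc helper herm for generating test matrices) is illustrative only:

    import numpy as np

    rng = np.random.default_rng(1)
    def herm(n):
        # an ad hoc generator of random n x n Hermitian test matrices
        M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
        return (M + M.conj().T) / 2

    n = 5
    A, B = herm(n), herm(n)
    lam = lambda X: np.linalg.eigvalsh(X)[::-1]   # descending eigenvalues

    # the trace constraint (1), up to rounding
    assert np.isclose(lam(A + B).sum(), lam(A).sum() + lam(B).sum())
    # subadditivity of the top eigenvalue (3)
    assert lam(A + B)[0] <= lam(A)[0] + lam(B)[0] + 1e-12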

The complete answer to this problem is a fascinating one, requiring a strangely recursive description (once known as Horn’s conjecture, which is now solved), and connected to a large number of other fields of mathematics, such as geometric invariant theory, intersection theory, and the combinatorics of a certain gadget known as a “honeycomb”. See for instance my survey with Allen Knutson on this topic some years ago.

In typical applications to random matrices, one of the matrices (say, {B}) is “small” in some sense, so that {A+B} is a perturbation of {A}. In this case, one does not need the full strength of the above theory, and can instead rely on a simple aspect of it, pointed out by Helmke and Rosenthal and by Totaro, which generates several of the eigenvalue inequalities relating {A}, {B}, and {A+B}, of which (1) and (3) are examples. (Actually, this method eventually generates all of the eigenvalue inequalities, but this is a non-trivial fact to prove.) These eigenvalue inequalities can mostly be deduced from a number of minimax characterisations of eigenvalues (of which (2) is a typical example), together with some basic facts about intersections of subspaces. Examples include the Weyl inequalities

\displaystyle \lambda_{i+j-1}(A+B) \leq \lambda_i(A) + \lambda_j(B), \ \ \ \ \ (4)

valid whenever {i,j \geq 1} and {i+j-1 \leq n}, and the Ky Fan inequality

\displaystyle \lambda_1(A+B)+\ldots+\lambda_k(A+B) \leq \lambda_1(A)+\ldots+\lambda_k(A) + \lambda_1(B)+\ldots+\lambda_k(B). \ \ \ \ \ (5)

One consequence of these inequalities is that the spectrum of a Hermitian matrix is stable with respect to small perturbations.

We will also establish some closely related inequalities concerning the relationships between the eigenvalues of a matrix, and the eigenvalues of its minors.

Many of the inequalities here have analogues for the singular values of non-Hermitian matrices (which is consistent with the discussion near Exercise 16 of Notes 3). However, the situation is markedly different when dealing with eigenvalues of non-Hermitian matrices; here, the spectrum can be far more unstable, if pseudospectrum is present. Because of this, the theory of the eigenvalues of a random non-Hermitian matrix requires an additional ingredient, namely upper bounds on the prevalence of pseudospectrum, which after recentering the matrix is basically equivalent to establishing lower bounds on least singular values. We will discuss this point in more detail in later notes.
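
The instability of non-Hermitian spectra is easy to witness numerically. In the following sketch (Python/numpy; the size and perturbation strength here are arbitrary choices), a single entry of size {10^{-10}} added to a nilpotent Jordan block, whose eigenvalues all vanish, moves the spectrum by about {\varepsilon^{1/n} \approx 0.3}:

    import numpy as np

    n, eps = 20, 1e-10
    J = np.diag(np.ones(n - 1), k=1)   # nilpotent Jordan block: all eigenvalues are 0
    J[-1, 0] = eps                     # one tiny corner entry

    # the perturbed matrix has characteristic polynomial lambda^n - eps, so its
    # eigenvalues are the n-th roots of eps, of magnitude eps**(1/n) ~ 0.32
    assert np.max(np.abs(np.linalg.eigvals(J))) > 0.25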

We will work primarily here with Hermitian matrices, which can be viewed as self-adjoint transformations on complex vector spaces such as {{\mathbb C}^n}. One can of course specialise the discussion to real symmetric matrices, in which case one can restrict these complex vector spaces to their real counterparts {{\mathbb R}^n}. The specialisation of the complex theory below to the real case is straightforward and is left to the interested reader.

— 1. Proof of spectral theorem —

To prove the spectral theorem, it is convenient to work more abstractly, in the context of self-adjoint operators on finite-dimensional Hilbert spaces:

Theorem 1 (Spectral theorem) Let {V} be a finite-dimensional complex Hilbert space of some dimension {n}, and let {T: V \rightarrow V} be a self-adjoint operator. Then there exists an orthonormal basis {v_1,\ldots,v_n \in V} of {V} and eigenvalues {\lambda_1,\ldots,\lambda_n \in {\mathbb R}} such that {T v_i = \lambda_i v_i} for all {1 \leq i \leq n}.

The spectral theorem as stated in the introduction then follows by specialising to the case {V = {\mathbb C}^n} and ordering the eigenvalues.

Proof: We induct on the dimension {n}. The claim is vacuous for {n=0}, so suppose that {n \geq 1} and that the claim has already been proven for {n-1}.

Let {v} be a unit vector in {V} (thus {v^* v = 1}) that maximises the form {\hbox{Re} v^* Tv}; this maximum exists by compactness. By the method of Lagrange multipliers, {v} is a critical point of {\hbox{Re} v^* Tv - \lambda v^* v} for some {\lambda \in {\mathbb R}}. Differentiating in an arbitrary direction {w \in V}, we conclude that

\displaystyle \hbox{Re} ( v^* Tw + w^* Tv - \lambda v^* w - \lambda w^* v ) = 0;

this simplifies using self-adjointness to

\displaystyle \hbox{Re} ( w^* (Tv - \lambda v) ) = 0.

Since {w \in V} was arbitrary, we conclude that {Tv = \lambda v}, thus {v} is a unit eigenvector of {T}. By self-adjointness, this implies that the orthogonal complement {v^\perp := \{ w \in V: v^* w = 0 \}} of {v} is preserved by {T}. Restricting {T} to this lower-dimensional subspace and applying the induction hypothesis, we can find an orthonormal basis of eigenvectors of {T} on {v^\perp}. Adjoining the unit vector {v} to this orthonormal basis, we obtain the claim. \Box

Suppose we have a self-adjoint transformation {A: {\mathbb C}^n \rightarrow {\mathbb C}^n}, which of course can be identified with a Hermitian matrix. Using the orthonormal eigenbasis provided by the spectral theorem, we can perform a unitary change of variables to set that eigenbasis to be the standard basis {e_1,\ldots,e_n}, so that the matrix of {A} becomes diagonal. This is very useful when dealing with just a single matrix {A} – for instance, it makes the task of computing functions of {A}, such as {A^k} or {\exp(tA)}, much easier. However, when one has several Hermitian matrices in play (e.g. {A, B, C}), then it is usually not possible to standardise all the eigenbases simultaneously (i.e. to simultaneously diagonalise all the matrices), except when the matrices all commute. Nevertheless one can still normalise one of the eigenbases to be the standard basis, and this is still useful for several applications, as we shall soon see.
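
To illustrate the point about computing functions of {A}: the following sketch (Python/numpy; the truncated power series is only there as an independent check) computes {\exp(tA)} by exponentiating the eigenvalues in the eigenbasis:

    import math
    import numpy as np

    rng = np.random.default_rng(2)
    M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    A = (M + M.conj().T) / 2
    t = 0.7

    evals, U = np.linalg.eigh(A)                       # A = U diag(evals) U^*
    expA = U @ np.diag(np.exp(t * evals)) @ U.conj().T

    # independent check against a truncated power series for exp(tA)
    series = sum(np.linalg.matrix_power(t * A, k) / math.factorial(k) for k in range(25))
    assert np.allclose(expA, series)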

Exercise 1 Suppose that the eigenvalues {\lambda_1(A) > \ldots > \lambda_n(A)} of an {n \times n} Hermitian matrix are distinct. Show that the associated eigenbasis {u_1(A),\ldots,u_n(A)} is unique up to rotating each individual eigenvector {u_j(A)} by a complex phase {e^{i\theta_j}}. In particular, the spectral projections {P_j(A) := u_j(A) u_j(A)^*} are unique. What happens when there is eigenvalue multiplicity?

— 2. Minimax formulae —

The {i^{th}} eigenvalue functional {A \mapsto \lambda_i(A)} is not a linear functional (except in dimension one). It is not even a convex functional (except when {i=1}) or a concave functional (except when {i=n}). However, it is the next best thing, namely it is a minimax expression of linear functionals. (Note that a convex functional is the same thing as a max of linear functionals, while a concave functional is the same thing as a min of linear functionals.) More precisely, we have

Theorem 2 (Courant-Fischer min-max theorem) Let {A} be an {n \times n} Hermitian matrix. Then we have

\displaystyle \lambda_i(A) = \sup_{\hbox{dim}(V)=i} \inf_{v \in V: |v|=1} v^* A v \ \ \ \ \ (6)

and

\displaystyle \lambda_i(A) = \inf_{\hbox{dim}(V)=n-i+1} \sup_{v \in V: |v|=1} v^* A v \ \ \ \ \ (7)

for all {1 \leq i \leq n}, where {V} ranges over all subspaces of {{\mathbb C}^n} with the indicated dimension.

Proof: It suffices to prove (6), as (7) follows by replacing {A} by {-A} (noting that {\lambda_i(-A) = -\lambda_{n-i+1}(A)}).

We first verify the {i=1} case, i.e. (2). By the spectral theorem, we can assume that {A} has the standard eigenbasis {e_1,\ldots,e_n}, in which case we have

\displaystyle v^* A v = \sum_{i=1}^n \lambda_i |v_i|^2 \ \ \ \ \ (8)

whenever {v = (v_1,\ldots,v_n)}. The claim (2) is then easily verified.

To prove the general case, we may again assume {A} has the standard eigenbasis. By considering the space {V} spanned by {e_1,\ldots,e_i}, we easily see the inequality

\displaystyle \lambda_i(A) \leq \sup_{\hbox{dim}(V)=i} \inf_{v \in V: |v|=1} v^* A v

so we only need to prove the reverse inequality. In other words, for every {i}-dimensional subspace {V} of {{\mathbb C}^n}, we have to show that {V} contains a unit vector {v} such that

\displaystyle v^* A v \leq \lambda_i(A).

Let {W} be the space spanned by {e_i,\ldots,e_n}. This space has codimension {i-1}, so it must have non-trivial intersection with {V}. If we let {v} be a unit vector in {V \cap W}, the claim then follows from (8). \Box

Remark 1 By homogeneity, one can replace the restriction {|v|=1} with {v \neq 0} provided that one replaces the quadratic form {v^* A v} with the Rayleigh quotient {v^* A v / v^* v}.
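
In coordinates, if the columns of a matrix {Q} form an orthonormal basis of {V}, then {\inf_{v \in V: |v|=1} v^* A v} is the least eigenvalue of the compressed matrix {Q^* A Q}, and the supremum in (6) is attained when {V} is spanned by the top {i} eigenvectors. A hedged numerical sketch (Python/numpy):

    import numpy as np

    rng = np.random.default_rng(3)
    M = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
    A = (M + M.conj().T) / 2
    evals, U = np.linalg.eigh(A)
    evals, U = evals[::-1], U[:, ::-1]                 # descending order

    i = 3
    Q = U[:, :i]                                       # the extremal subspace V
    assert np.isclose(np.linalg.eigvalsh(Q.conj().T @ A @ Q)[0], evals[i - 1])

    # any other i-dimensional subspace gives an infimum of at most lambda_i(A)
    Q2 = np.linalg.qr(rng.standard_normal((6, i)) + 1j * rng.standard_normal((6, i)))[0]
    assert np.linalg.eigvalsh(Q2.conj().T @ A @ Q2)[0] <= evals[i - 1] + 1e-12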

A closely related formula is as follows. Given an {n \times n} Hermitian matrix {A} and an {m}-dimensional subspace {V} of {{\mathbb C}^n}, we define the partial trace {\hbox{tr}(A\downharpoonright_V)} to be the expression

\displaystyle \hbox{tr}(A\downharpoonright_V) := \sum_{i=1}^m v_i^* A v_i

where {v_1,\ldots,v_m} is any orthonormal basis of {V}. It is easy to see that this expression is independent of the choice of orthonormal basis, and so the partial trace is well-defined.
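
Concretely, if the columns of {Q} form an orthonormal basis of {V}, then {\hbox{tr}(A\downharpoonright_V) = \hbox{tr}(Q^* A Q)}, and the basis-independence amounts to the invariance of the trace under a unitary change of basis within {V}. A quick numerical sketch (Python/numpy):

    import numpy as np

    rng = np.random.default_rng(4)
    M = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
    A = (M + M.conj().T) / 2

    # an orthonormal basis of a random 3-dimensional subspace V of C^6
    Q = np.linalg.qr(rng.standard_normal((6, 3)) + 1j * rng.standard_normal((6, 3)))[0]
    # rotate that basis within V by a random 3x3 unitary
    W = np.linalg.qr(rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3)))[0]

    t1 = np.trace(Q.conj().T @ A @ Q)
    t2 = np.trace((Q @ W).conj().T @ A @ (Q @ W))
    assert np.isclose(t1, t2)          # the partial trace does not depend on the basis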

Proposition 3 (Extremal partial trace) Let {A} be an {n \times n} Hermitian matrix. Then for any {1 \leq k \leq n}, one has

\displaystyle \lambda_1(A)+\ldots+\lambda_k(A) = \sup_{\hbox{dim}(V)=k} \hbox{tr}(A \downharpoonright_V)

and

\displaystyle \lambda_{n-k+1}(A)+\ldots+\lambda_n(A) = \inf_{\hbox{dim}(V)=k} \hbox{tr}(A \downharpoonright_V).

As a corollary, we see that {A \mapsto \lambda_1(A)+\ldots+\lambda_k(A)} is a convex function, and {A \mapsto \lambda_{n-k+1}(A)+\ldots+\lambda_n(A)} is a concave function.

Proof: Again, by symmetry it suffices to prove the first formula. As before, we may assume without loss of generality that {A} has the standard eigenbasis. By selecting {V} to be the span of {e_1,\ldots,e_k} we have the inequality

\displaystyle \lambda_1(A)+\ldots+\lambda_k(A) \leq \sup_{\hbox{dim}(V)=k} \hbox{tr}(A \downharpoonright_V)

so it suffices to prove the reverse inequality. For this we induct on dimension. If {V} has dimension {k}, then it has a {k-1}-dimensional subspace {V'} that is contained in the span of {e_2,\ldots,e_n}. By the induction hypothesis, we have

\displaystyle \lambda_2(A)+\ldots+\lambda_k(A) \geq \hbox{tr}(A \downharpoonright_{V'}).

On the other hand, if {v} is a unit vector in the orthogonal complement of {V'} in {V}, we see from (2) that

\displaystyle \lambda_1(A) \geq v^* A v.

Adding the two inequalities we obtain the claim. \Box

Specialising Proposition 3 to the case when {V} is a coordinate subspace (i.e. the span of {k} of the basis vectors {e_1,\ldots,e_n}), we conclude the Schur-Horn inequalities

\displaystyle \lambda_{n-k+1}(A)+\ldots+\lambda_n(A) \leq a_{i_1 i_1} + \ldots + a_{i_k i_k} \leq \lambda_1(A)+\ldots+\lambda_k(A) \ \ \ \ \ (9)

for any {1 \leq i_1 < \ldots < i_k \leq n}, where {a_{11},a_{22},\ldots,a_{nn}} are the diagonal entries of {A}.
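
A numerical spot-check of (9) over all index sets (Python/numpy; purely illustrative):

    import numpy as np
    from itertools import combinations

    rng = np.random.default_rng(5)
    M = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
    A = (M + M.conj().T) / 2
    lam = np.linalg.eigvalsh(A)[::-1]          # descending eigenvalues
    d = np.real(np.diag(A))                    # diagonal entries (real, as A is Hermitian)

    for k in range(1, 6):
        for idx in combinations(range(5), k):
            s = d[list(idx)].sum()
            # the Schur-Horn inequalities (9)
            assert lam[-k:].sum() - 1e-10 <= s <= lam[:k].sum() + 1e-10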

Exercise 2 Show that the inequalities (9) are equivalent to the assertion that the diagonal entries {\hbox{diag}(A) = (a_{11},a_{22},\ldots,a_{nn})} lie in the permutahedron of {\lambda_1(A),\ldots,\lambda_n(A)}, defined as the convex hull of the {n!} permutations of {(\lambda_1(A),\ldots,\lambda_n(A))} in {{\mathbb R}^n}.

Remark 2 It is a theorem of Schur and Horn that these are the complete set of inequalities connecting the diagonal entries {\hbox{diag}(A) = (a_{11},a_{22},\ldots,a_{nn})} of a Hermitian matrix to its spectrum. To put it another way, the image of any coadjoint orbit {{\mathcal O}_A := \{ UAU^*: U \in U(n) \}} of a matrix {A} with a given spectrum {\lambda_1,\ldots,\lambda_n} under the diagonal map {\hbox{diag}: A \mapsto \hbox{diag}(A)} is the permutahedron of {\lambda_1,\ldots,\lambda_n}. Note that the vertices of this permutahedron can be attained by considering the diagonal matrices inside this coadjoint orbit, whose entries are then a permutation of the eigenvalues. One can interpret this diagonal map {\hbox{diag}} as the moment map associated with the conjugation action of the standard maximal torus of {U(n)} (i.e. the diagonal unitary matrices) on the coadjoint orbit. When viewed in this fashion, the Schur-Horn theorem can be viewed as the special case of the more general Atiyah convexity theorem (also proven independently by Guillemin and Sternberg) in symplectic geometry. Indeed, the topic of eigenvalues of Hermitian matrices turns out to be quite profitably viewed as a question in symplectic geometry (and also in algebraic geometry, particularly when viewed through the machinery of geometric invariant theory).

There is a simultaneous generalisation of Theorem 2 and Proposition 3:

Exercise 3 (Wielandt minimax formula) Let {1 \leq i_1 < \ldots < i_k \leq n} be integers. Define a partial flag to be a nested collection {V_1 \subset \ldots \subset V_k} of subspaces of {{\mathbb C}^n} such that {\hbox{dim}(V_j) = i_j} for all {1 \leq j \leq k}. Define the associated Schubert variety {X(V_1,\ldots,V_k)} to be the collection of all {k}-dimensional subspaces {W} such that {\hbox{dim}(W \cap V_j) \geq j}. Show that for any {n \times n} Hermitian matrix {A},

\displaystyle \lambda_{i_1}(A) + \ldots + \lambda_{i_k}(A) = \sup_{V_1,\ldots,V_k} \inf_{W \in X(V_1,\ldots,V_k)} \hbox{tr}(A\downharpoonright_W).

— 3. Eigenvalue inequalities —

Using the above minimax formulae, we can now quickly prove a variety of eigenvalue inequalities. The basic idea is to exploit the linearity relationship

\displaystyle v^* (A+B) v = v^* A v + v^* B v \ \ \ \ \ (10)

for any unit vector {v}, and more generally

\displaystyle \hbox{tr}((A+B) \downharpoonright_V ) = \hbox{tr}(A \downharpoonright_V ) +\hbox{tr}(B \downharpoonright_V ) \ \ \ \ \ (11)

for any subspace {V}.

For instance, as mentioned before, the inequality (3) follows immediately from (2) and (10). Similarly, for the Ky Fan inequality (5), one observes from (11) and Proposition 3 that

\displaystyle \hbox{tr}((A+B) \downharpoonright_W ) \leq \hbox{tr}(A \downharpoonright_W ) + \lambda_1(B) + \ldots + \lambda_k(B)

for any {k}-dimensional subspace {W}. Substituting this into Proposition 3 gives the claim. If one uses Exercise 3 instead of Proposition 3, one obtains the more general Lidskii inequality

\displaystyle \lambda_{i_1}(A+B)+\ldots+\lambda_{i_k}(A+B) \leq \lambda_{i_1}(A)+\ldots+\lambda_{i_k}(A) + \lambda_1(B)+\ldots+\lambda_k(B) \ \ \ \ \ (12)

for any {1 \leq i_1 < \ldots < i_k \leq n}.

In a similar spirit, using the inequality

\displaystyle |v^* B v| \leq \|B\|_{op} = \max( |\lambda_1(B)|, |\lambda_n(B)| )

for unit vectors {v}, combined with (10) and (6), we obtain the eigenvalue stability inequality

\displaystyle |\lambda_i(A+B) - \lambda_i(A)| \leq \|B\|_{op}, \ \ \ \ \ (13)

thus the spectrum of {A+B} is close to that of {A} if {B} is small in operator norm. In particular, we see that the map {A \mapsto \lambda_i(A)} is Lipschitz continuous on the space of Hermitian matrices, for fixed {1 \leq i \leq n}.
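
A quick numerical illustration of the stability inequality (13) (Python/numpy sketch; the perturbation scale is an arbitrary choice):

    import numpy as np

    rng = np.random.default_rng(6)
    def herm(n, scale=1.0):
        M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
        return scale * (M + M.conj().T) / 2

    A, B = herm(8), herm(8, scale=1e-3)        # B is a small perturbation
    lam = lambda X: np.linalg.eigvalsh(X)[::-1]
    op_norm = np.abs(np.linalg.eigvalsh(B)).max()

    # every eigenvalue moves by at most ||B||_op, as in (13)
    assert np.all(np.abs(lam(A + B) - lam(A)) <= op_norm + 1e-12)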

More generally, suppose one wants to establish the Weyl inequality (4). From (6), it suffices to show that every {i+j-1}-dimensional subspace {V} contains a unit vector {v} such that

\displaystyle v^*(A+B)v \leq \lambda_i(A) + \lambda_j(B).

But from (6), one can find a subspace {U} of codimension {i-1} such that {v^* A v \leq \lambda_i(A)} for all unit vectors {v} in {U}, and a subspace {W} of codimension {j-1} such that {v^* B v \leq \lambda_j(B)} for all unit vectors {v} in {W}. The intersection {U \cap W} has codimension at most {i+j-2} and so has a nontrivial intersection with {V}; and the claim follows.
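
One can likewise spot-check the Weyl inequalities (4) over all admissible pairs {(i,j)} (Python/numpy sketch):

    import numpy as np

    rng = np.random.default_rng(7)
    def herm(n):
        M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
        return (M + M.conj().T) / 2

    n = 6
    A, B = herm(n), herm(n)
    lam = lambda X: np.linalg.eigvalsh(X)[::-1]
    lA, lB, lAB = lam(A), lam(B), lam(A + B)

    # the Weyl inequality (4) for all i, j >= 1 with i + j - 1 <= n
    for i in range(1, n + 1):
        for j in range(1, n + 2 - i):
            assert lAB[i + j - 2] <= lA[i - 1] + lB[j - 1] + 1e-10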

Remark 3 More generally, one can generate an eigenvalue inequality whenever the intersection number of three Schubert varieties of compatible dimensions is non-zero; see the paper of Helmke and Rosenthal. In fact, this generates a complete set of inequalities; this is a result of Klyachko. One can in fact restrict attention to those varieties whose intersection number is exactly one; this is a result of Knutson, Woodward, and myself. Finally, in those cases, the fact that the intersection number is one can be proven by entirely elementary means (based on the standard inequalities relating the dimension of two subspaces {V, W} to their intersection {V \cap W} and sum {V + W}); this is a result of Bercovici, Collins, Dykema, Li, and Timotin. As a consequence, the methods in this section can, in principle, be used to derive all possible eigenvalue inequalities for sums of Hermitian matrices.

Exercise 4 Verify the inequalities (12) and (4) by hand in the case when {A} and {B} commute (and are thus simultaneously diagonalisable), without the use of minimax formulae.

Exercise 5 Establish the dual Lidskii inequality

\displaystyle \lambda_{i_1}(A+B)+\ldots+\lambda_{i_k}(A+B) \geq \lambda_{i_1}(A)+\ldots+\lambda_{i_k}(A) + \lambda_{n-k+1}(B)+\ldots+\lambda_n(B)

for any {1 \leq i_1 < \ldots < i_k \leq n} and the dual Weyl inequality

\displaystyle \lambda_{i+j-n}(A+B) \geq \lambda_i(A) + \lambda_j(B)

whenever {1 \leq i,j,i+j-n \leq n}.

Exercise 6 Use the Lidskii inequality to establish the more general inequality

\displaystyle \sum_{i=1}^n c_i \lambda_i(A+B) \leq \sum_{i=1}^n c_i \lambda_i(A) + \sum_{i=1}^n c^*_i \lambda_i(B)

whenever {c_1,\ldots,c_n \geq 0}, and {c_1^* \geq \ldots \geq c^*_n \geq 0} is the decreasing rearrangement of {c_1,\ldots,c_n}. (Hint: express {c_i} as the integral of {{\bf I}(c_i \geq \lambda)} as {\lambda} runs from {0} to infinity. For each fixed {\lambda}, apply (12).) Combine this with Hölder’s inequality to conclude the {p}-Wielandt-Hoffman inequality

\displaystyle \| (\lambda_i(A+B) - \lambda_i(A))_{i=1}^n \|_{\ell^p_n} \leq \|B\|_{S^p} \ \ \ \ \ (14)

for any {1 \leq p \leq \infty}, where

\displaystyle \| (a_i)_{i=1}^n \|_{\ell^p_n} := (\sum_{i=1}^n |a_i|^p)^{1/p}

is the usual {\ell^p} norm (with the usual convention that {\| (a_i)_{i=1}^n \|_{\ell^\infty_n} := \sup_{1 \leq i \leq n} |a_i|}), and

\displaystyle \|B\|_{S^p} := \| (\lambda_i(B))_{i=1}^n \|_{\ell^p_n} \ \ \ \ \ (15)

is the {p}-Schatten norm of {B}.
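
A numerical spot-check of (14) for several values of {p} (Python/numpy; here the vector {\ell^p} norm of the eigenvalue sequence serves as the Schatten norm (15)):

    import numpy as np

    rng = np.random.default_rng(8)
    def herm(n):
        M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
        return (M + M.conj().T) / 2

    n = 7
    A, B = herm(n), herm(n)
    lam = lambda X: np.linalg.eigvalsh(X)[::-1]
    diff = lam(A + B) - lam(A)

    for p in [1.0, 2.0, 4.0, np.inf]:
        lhs = np.linalg.norm(diff, ord=p)          # l^p norm of the eigenvalue shifts
        rhs = np.linalg.norm(lam(B), ord=p)        # the p-Schatten norm (15) of B
        assert lhs <= rhs + 1e-10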

Exercise 7 Show that the {p}-Schatten norm is indeed a norm on the space of Hermitian matrices for every {1 \leq p \leq \infty}.

Exercise 8 Show that for any {1 \leq p \leq \infty} and any Hermitian matrix {A = (a_{ij})_{1 \leq i,j \leq n}}, one has

\displaystyle \| (a_{ii})_{i=1}^n \|_{\ell^p_n} \leq \| A \|_{S^p}. \ \ \ \ \ (16)

Exercise 9 Establish the Hölder inequality

\displaystyle |\hbox{tr}(AB)| \leq \|A\|_{S^p} \|B\|_{S^{p'}}

whenever {1 \leq p,p' \leq \infty} with {1/p + 1/p' = 1}, and {A, B} are {n \times n} Hermitian matrices. (Hint: Diagonalise one of the matrices and use the preceding exercise.)

The most important {p}-Schatten norms are the {\infty}-Schatten norm {\|A\|_{S^\infty} = \|A\|_{op}}, which is just the operator norm, and the {2}-Schatten norm {\|A\|_{S^2} = (\sum_{i=1}^n \lambda_i(A)^2)^{1/2}}, which is also the Frobenius norm (or Hilbert-Schmidt norm)

\displaystyle \|A\|_{S^2} = \|A\|_F := \hbox{tr}(AA^*)^{1/2} = (\sum_{i=1}^n \sum_{j=1}^n |a_{ij}|^2)^{1/2}

where {a_{ij}} are the coefficients of {A}. (The {1}-Schatten norm {S^1}, also known as the nuclear norm or trace class norm, is important in a number of applications, such as matrix completion, but will not be used often in this course.) Thus we see that the {p=2} case of the Wielandt-Hoffman inequality can be written as

\displaystyle \sum_{i=1}^n |\lambda_i(A+B) - \lambda_i(A)|^2 \leq \|B\|_F^2. \ \ \ \ \ (17)

We will give an alternate proof of this inequality, based on eigenvalue deformation, in the next section.

— 4. Eigenvalue deformation —

From the Weyl inequality (13), we know that the eigenvalue maps {A \mapsto \lambda_i(A)} are Lipschitz continuous on Hermitian matrices (and thus also on real symmetric matrices). It turns out that we can obtain better regularity, provided that we avoid repeated eigenvalues. Fortunately, repeated eigenvalues are rare:

Exercise 10 (Dimension count) Suppose that {n \geq 2}. Show that the space of Hermitian matrices with at least one repeated eigenvalue has codimension {3} in the space of all Hermitian matrices, and the space of real symmetric matrices with at least one repeated eigenvalue has codimension {2} in the space of all real symmetric matrices. (When {n=1}, repeated eigenvalues of course do not occur.)

Let us say that a Hermitian matrix has simple spectrum if it has no repeated eigenvalues. We thus see from the above exercise and (13) that the set of Hermitian matrices with simple spectrum forms an open dense set in the space of all Hermitian matrices, and similarly for real symmetric matrices; thus simple spectrum is the generic behaviour of such matrices. Indeed, the unexpectedly high codimension of the non-simple matrices (naively, one would expect a codimension {1} set for a collision between, say, {\lambda_i(A)} and {\lambda_{i+1}(A)}) suggests a repulsion phenomenon: because it is unexpectedly rare for eigenvalues to be equal, there must be some “force” that “repels” eigenvalues of Hermitian (and to a lesser extent, real symmetric) matrices from getting too close to each other. We now develop some machinery to make this more precise.

We first observe that when {A} has simple spectrum, the zeroes of the characteristic polynomial {\lambda \mapsto \det(A - \lambda I)} are simple (i.e. the polynomial has nonzero derivative at those zeroes). From this and the inverse function theorem, we see that each of the eigenvalue maps {A \mapsto \lambda_i(A)} is smooth on the region where {A} has simple spectrum. Because the eigenvectors {u_i(A)} are determined (up to phase) by the equations {(A - \lambda_i(A) I) u_i(A) = 0} and {u_i(A)^* u_i(A) = 1}, another application of the inverse function theorem tells us that we can (locally) select the maps {A \mapsto u_i(A)} to also be smooth. (There may be topological obstructions to smoothly selecting these vectors globally, but this will not concern us here as we will be performing a local analysis only. In later notes, we will in fact not work with the {u_i(A)} at all due to their phase ambiguity, and work instead with the spectral projections {P_i(A) := u_i(A) u_i(A)^*}, which do not have this ambiguity.)

Now suppose that {A = A(t)} depends smoothly on a time variable {t}, so that (when {A} has simple spectrum) the eigenvalues {\lambda_i(t) = \lambda_i(A(t))} and eigenvectors {u_i(t) = u_i(A(t))} also depend smoothly on {t}. We can then differentiate the equations

\displaystyle A u_i = \lambda_i u_i \ \ \ \ \ (18)

and

\displaystyle u_i^* u_i = 1 \ \ \ \ \ (19)

to obtain various equations of motion for {\lambda_i} and {u_i} in terms of the derivatives of {A}.

Let’s see how this works. Taking first derivatives of (18), (19) using the product rule, we obtain

\displaystyle \dot A u_i + A \dot u_i = \dot \lambda_i u_i + \lambda_i \dot u_i \ \ \ \ \ (20)

and

\displaystyle \dot u_i^* u_i + u_i^* \dot u_i = 0. \ \ \ \ \ (21)

The equation (21) simplifies to {\hbox{Re}( \dot u_i^* u_i ) = 0}, thus {\dot u_i} is orthogonal to {u_i}. Taking (real) inner products of (20) with {u_i}, we conclude the Hadamard first variation formula

\displaystyle \dot \lambda_i = u_i^* \dot A u_i. \ \ \ \ \ (22)

This can already be used to give alternate proofs of various eigenvalue identities. For instance, if we apply this to {A(t) := A + tB}, we see that

\displaystyle \frac{d}{dt} \lambda_i(A+tB) = u_i(A+tB)^* B u_i(A+tB)

whenever {A+tB} has simple spectrum. The right-hand side can be bounded in magnitude by {\|B\|_{op}}, and so we see that the map {t \mapsto \lambda_i(A+tB)} is Lipschitz with constant {\|B\|_{op}} whenever {A+tB} has simple spectrum, which happens for generic {A, B} (and all {t}) by Exercise 10. By the fundamental theorem of calculus, we thus conclude (13).
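
The first variation formula (22) is also easy to test numerically against a finite difference (Python/numpy sketch; the step size {h} is an arbitrary choice):

    import numpy as np

    rng = np.random.default_rng(9)
    def herm(n):
        M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
        return (M + M.conj().T) / 2

    n, i, h = 5, 2, 1e-6
    A, B = herm(n), herm(n)                        # A(t) = A + tB, so dot A = B

    _, U = np.linalg.eigh(A)
    u_i = U[:, ::-1][:, i - 1]                     # eigenvector for lambda_i(A)

    lam_i = lambda t: np.linalg.eigvalsh(A + t * B)[::-1][i - 1]
    fd = (lam_i(h) - lam_i(-h)) / (2 * h)          # finite-difference derivative at t = 0

    # Hadamard first variation formula (22)
    assert np.isclose(fd, np.real(u_i.conj() @ B @ u_i), atol=1e-5)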

Exercise 11 Use a similar argument to the one above to establish (17) without using minimax formulae or Lidskii’s inequality.

Exercise 12 Use a similar argument to the one above to deduce Lidskii’s inequality (12) from Proposition 3 rather than Exercise 3.

One can also compute the second derivative of eigenvalues:

Exercise 13 Suppose that {A = A(t)} depends smoothly on {t}. By differentiating (20), (21) twice, establish the Hadamard second variation formula

\displaystyle \frac{d^2}{dt^2} \lambda_k = u_k^* \ddot A u_k + 2 \sum_{j \neq k} \frac{ |u_j^* \dot A u_k|^2 }{\lambda_k - \lambda_j} \ \ \ \ \ (23)

whenever {A} has simple spectrum and {1 \leq k \leq n}.

If one interprets the second derivative of the eigenvalues as being proportional to a “force” on those eigenvalues (in analogy with Newton’s second law), (23) is asserting that each eigenvalue {\lambda_j} “repels” the other eigenvalues {\lambda_k} by exerting a force that is inversely proportional to their separation (and also proportional to the square of the matrix coefficient of {\dot A} in the eigenbasis). See this earlier blog post for more discussion.
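
The second variation formula (23) can be checked the same way, now against a second finite difference (Python/numpy sketch; here {A(t) = A + tB}, so the {\ddot A} term vanishes):

    import numpy as np

    rng = np.random.default_rng(10)
    def herm(n):
        M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
        return (M + M.conj().T) / 2

    n, k, h = 5, 3, 1e-4
    A, B = herm(n), herm(n)

    evals, U = np.linalg.eigh(A)
    evals, U = evals[::-1], U[:, ::-1]

    # right-hand side of (23), with the \ddot A term equal to zero
    rhs = 2 * sum(abs(U[:, j].conj() @ B @ U[:, k - 1]) ** 2 / (evals[k - 1] - evals[j])
                  for j in range(n) if j != k - 1)

    lam_k = lambda t: np.linalg.eigvalsh(A + t * B)[::-1][k - 1]
    fd2 = (lam_k(h) - 2 * lam_k(0.0) + lam_k(-h)) / h ** 2
    assert np.isclose(fd2, rhs, atol=1e-3)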

Remark 4 In the proof of the four moment theorem of Van Vu and myself, which we will discuss in a subsequent lecture, we will also need the variation formulae for the third, fourth, and fifth derivatives of the eigenvalues (the first four derivatives match up with the four moments mentioned in the theorem, and the fifth derivative is needed to control error terms). Fortunately, we do not need the precise formulae for these derivatives (which, as one can imagine, are quite complicated), but only their general form, and in particular an upper bound for these derivatives in terms of more easily computable quantities.

— 5. Minors —

In the previous sections, we perturbed {n \times n} Hermitian matrices {A = A_n} by adding a (small) {n \times n} Hermitian correction matrix {B} to them to form a new {n \times n} Hermitian matrix {A+B}. Another important way to perturb a matrix is to pass to a principal minor, for instance to the top left {n-1 \times n-1} minor {A_{n-1}} of {A_n}. There is an important relationship between the eigenvalues of the two matrices:

Exercise 14 (Cauchy interlacing inequalities) Let {A_n} be an {n \times n} Hermitian matrix with top left {n-1 \times n-1} minor {A_{n-1}}. Show that

\displaystyle \lambda_{i+1}(A_n) \leq \lambda_i(A_{n-1}) \leq \lambda_i(A_n) \ \ \ \ \ (24)

for all {1 \leq i \leq n-1}. (Hint: use the Courant-Fischer min-max theorem, Theorem 2.) Show furthermore that the space of {A_n} for which equality holds in one of the inequalities in (24) has codimension {2} (for Hermitian matrices) or {1} (for real symmetric matrices).
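
A numerical illustration of the interlacing inequalities (24) (Python/numpy sketch):

    import numpy as np

    rng = np.random.default_rng(11)
    n = 6
    M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    An = (M + M.conj().T) / 2
    Am = An[:n - 1, :n - 1]                    # the top left (n-1) x (n-1) minor

    l_n = np.linalg.eigvalsh(An)[::-1]
    l_m = np.linalg.eigvalsh(Am)[::-1]

    # Cauchy interlacing (24)
    for i in range(n - 1):
        assert l_n[i + 1] - 1e-10 <= l_m[i] <= l_n[i] + 1e-10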

Remark 5 If one takes successive minors {A_{n-1}, A_{n-2},\ldots,A_1} of an {n \times n} Hermitian matrix {A_n}, and computes their spectra, then (24) shows that this triangular array of numbers forms a pattern known as a Gelfand-Tsetlin pattern. These patterns are discussed a little more in this blog post.

One can obtain a more precise formula for the eigenvalues of {A_n} in terms of those for {A_{n-1}}:

Exercise 15 (Eigenvalue equation) Let {A_n} be an {n \times n} Hermitian matrix with top left {n-1 \times n-1} minor {A_{n-1}}. Suppose that {\lambda} is an eigenvalue of {A_n} distinct from all the eigenvalues of {A_{n-1}} (and thus simple, by (24)). Show that

\displaystyle \sum_{j=1}^{n-1} \frac{ | u_j(A_{n-1})^* X |^2 }{\lambda_j(A_{n-1}) - \lambda} = a_{nn} - \lambda \ \ \ \ \ (25)

where {a_{nn}} is the bottom right entry of {A_n}, and {X = (a_{jn})_{j=1}^{n-1} \in {\mathbb C}^{n-1}} is the rightmost column of {A_n} with the bottom entry removed. (Hint: Expand out the eigenvalue equation {A_n u = \lambda u} into the {{\mathbb C}^{n-1}} and {{\mathbb C}} components.) Note the similarities between (25) and (23).
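
Since a random Hermitian matrix almost surely has eigenvalues distinct from those of its minor, one can verify (25) numerically at every eigenvalue of {A_n} (Python/numpy sketch):

    import numpy as np

    rng = np.random.default_rng(12)
    n = 5
    M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    An = (M + M.conj().T) / 2
    Am = An[:n - 1, :n - 1]
    X = An[:n - 1, n - 1]                      # rightmost column, bottom entry removed
    a_nn = np.real(An[n - 1, n - 1])

    mu, U = np.linalg.eigh(Am)                 # spectrum of the minor
    for lam in np.linalg.eigvalsh(An):
        lhs = np.sum(np.abs(U.conj().T @ X) ** 2 / (mu - lam))
        assert np.isclose(lhs, a_nn - lam)     # the eigenvalue equation (25)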

Observe that the function {\lambda \mapsto \sum_{j=1}^{n-1} \frac{ | u_j(A_{n-1})^* X |^2 }{\lambda_j(A_{n-1}) - \lambda}} is a rational function of {\lambda} which is increasing away from the eigenvalues of {A_{n-1}}, where it has a pole (except in the rare case when the inner product {u_j(A_{n-1})^* X} vanishes, in which case it can have a removable singularity). By graphing this function one can see that the interlacing formula (24) can also be interpreted as a manifestation of the intermediate value theorem.

The identity (25) suggests that under typical circumstances, an eigenvalue {\lambda} of {A_n} can only get close to an eigenvalue {\lambda_j(A_{n-1})} if the associated inner product {u_j(A_{n-1})^* X} is small. This type of observation is useful to achieve eigenvalue repulsion – to show that it is unlikely that the gap between two adjacent eigenvalues is small. We shall see examples of this in later notes.

— 6. Singular values —

The theory of eigenvalues of {n \times n} Hermitian matrices has an analogue in the theory of singular values of {p \times n} non-Hermitian matrices. We first begin with the counterpart to the spectral theorem, namely the singular value decomposition.

Theorem 4 (Singular value decomposition) Let {0 \leq p \leq n}, and let {A} be a linear transformation from an {n}-dimensional complex Hilbert space {U} to a {p}-dimensional complex Hilbert space {V}. (In particular, {A} could be a {p \times n} matrix with complex entries, viewed as a linear transformation from {{\mathbb C}^n} to {{\mathbb C}^p}.) Then there exist non-negative real numbers

\displaystyle \sigma_1(A) \geq \ldots \geq \sigma_p(A) \geq 0

(known as the singular values of {A}) and orthonormal sets {u_1(A),\ldots,u_p(A) \in U} and {v_1(A),\ldots,v_p(A) \in V} (known as singular vectors of {A}), such that

\displaystyle A u_j = \sigma_j v_j; \quad A^* v_j = \sigma_j u_j

for all {1 \leq j \leq p}, where we abbreviate {u_j = u_j(A)}, etc.

Furthermore, {Au=0} whenever {u} is orthogonal to all of the {u_1(A),\ldots,u_p(A)}.

We adopt the convention that {\sigma_i(A)=0} for {i>p}. The above theorem only applies to matrices with at least as many columns as rows, but one can also extend the definition to matrices with more rows than columns by adopting the convention {\sigma_i(A^*) := \sigma_i(A)} (it is easy to check that this extension is consistent on square matrices). All of the results below extend (with minor modifications) to the case when there are more rows than columns, but we have not displayed those extensions here in order to simplify the notation.

Proof: We induct on {p}. The claim is vacuous for {p=0}, so suppose that {p \geq 1} and that the claim has already been proven for {p-1}.

We follow a similar strategy to the proof of Theorem 1. We may assume that {A} is not identically zero, as the claim is obvious otherwise. The function {u \mapsto \|Au\|^2} is continuous on the unit sphere of {U}, so there exists a unit vector {u_1} which maximises this quantity. If we set {\sigma_1 := \|Au_1\| > 0}, one easily verifies that {u_1} is a critical point of the map {u \mapsto \|Au\|^2 - \sigma_1^2 \|u\|^2}, which then implies that {A^* A u_1 = \sigma_1^2 u_1}. Thus, if we set {v_1 := A u_1 / \sigma_1}, then {A u_1 = \sigma_1 v_1} and {A^* v_1 = \sigma_1 u_1}. This implies that {A} maps the orthogonal complement {u_1^\perp} of {u_1} in {U} to the orthogonal complement {v_1^\perp} of {v_1} in {V}. By induction hypothesis, the restriction of {A} to {u_1^\perp} (and {v_1^\perp}) then admits a singular value decomposition with singular values {\sigma_2 \geq \ldots \geq \sigma_p \geq 0} and singular vectors {u_2,\ldots,u_p \in u_1^\perp}, {v_2,\ldots,v_p \in v_1^\perp} with the stated properties. By construction we see that {\sigma_2,\ldots,\sigma_p} are less than or equal to {\sigma_1}. If we now adjoin {\sigma_1,u_1,v_1} to the other singular values and vectors we obtain the claim. \Box
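
In numerical practice one obtains this decomposition from a library routine rather than from the above induction; the following sketch (Python/numpy) checks the defining equations of Theorem 4, with the caveat that numpy writes {A = W \Sigma Z^*}, so that in the notation of the theorem the columns of {Z} are the {u_j} and the columns of {W} are the {v_j}:

    import numpy as np

    rng = np.random.default_rng(13)
    p, n = 3, 5
    A = rng.standard_normal((p, n)) + 1j * rng.standard_normal((p, n))

    W, sigma, Zh = np.linalg.svd(A, full_matrices=False)
    Z = Zh.conj().T

    for j in range(p):
        assert np.allclose(A @ Z[:, j], sigma[j] * W[:, j])            # A u_j = sigma_j v_j
        assert np.allclose(A.conj().T @ W[:, j], sigma[j] * Z[:, j])   # A^* v_j = sigma_j u_j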

Exercise 16 Show that the singular values {\sigma_1(A) \geq \ldots \geq \sigma_p(A) \geq 0} of a {p \times n} matrix {A} are unique. If we have {\sigma_1(A) > \ldots > \sigma_p(A) > 0}, show that the singular vectors are unique up to rotation by a complex phase.

By construction (and the above uniqueness claim) we see that {\sigma_i(UAV) = \sigma_i(A)} whenever {A} is a {p \times n} matrix, {U} is a unitary {p \times p} matrix, and {V} is a unitary {n \times n} matrix. Thus the singular spectrum of a matrix is invariant under left and right unitary transformations.

Exercise 17 If {A} is a {p \times n} complex matrix for some {1 \leq p \leq n}, show that the augmented matrix

\displaystyle \tilde A := \begin{pmatrix} 0 & A \\ A^* & 0 \end{pmatrix}

is a {p+n \times p+n} Hermitian matrix whose eigenvalues consist of {\pm \sigma_1(A),\ldots,\pm \sigma_p(A)}, together with {n-p} copies of the eigenvalue zero. (This generalises Exercise 16 from Notes 3.) What is the relationship between the singular vectors of {A} and the eigenvectors of {\tilde A}?
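
A numerical check of the spectrum of the augmented matrix (Python/numpy sketch):

    import numpy as np

    rng = np.random.default_rng(14)
    p, n = 3, 5
    A = rng.standard_normal((p, n)) + 1j * rng.standard_normal((p, n))
    sigma = np.linalg.svd(A, compute_uv=False)

    T = np.block([[np.zeros((p, p)), A], [A.conj().T, np.zeros((n, n))]])
    expected = np.sort(np.concatenate([sigma, -sigma, np.zeros(n - p)]))
    assert np.allclose(np.sort(np.linalg.eigvalsh(T)), expected)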

Exercise 18 If {A} is an {n \times n} Hermitian matrix, show that the singular values {\sigma_1(A),\ldots,\sigma_n(A)} of {A} are simply the absolute values {|\lambda_1(A)|,\ldots,|\lambda_n(A)|} of its eigenvalues, arranged in descending order. Show that the same claim also holds when {A} is a normal matrix. What is the relationship between the singular vectors and eigenvectors of {A}?

Remark 6 When {A} is not normal, the relationship between eigenvalues and singular values is more subtle. We will discuss this point in later notes.

Exercise 19 If {A} is a {p \times n} complex matrix for some {1 \leq p \leq n}, show that {AA^*} has eigenvalues {\sigma_1(A)^2,\ldots,\sigma_p(A)^2}, and {A^* A} has eigenvalues {\sigma_1(A)^2,\ldots,\sigma_p(A)^2} together with {n-p} copies of the eigenvalue zero. Based on this observation, give an alternate proof of the singular value decomposition theorem using the spectral theorem for (positive semi-definite) Hermitian matrices.

Exercise 20 Show that the rank of a {p \times n} matrix is equal to the number of non-zero singular values.

Exercise 21 Let {A} be a {p \times n} complex matrix for some {1 \leq p \leq n}. Establish the Courant-Fischer min-max formula

\displaystyle \sigma_i(A) = \sup_{\hbox{dim}(V)=i} \inf_{v \in V: |v|=1} |Av| \ \ \ \ \ (26)

for all {1 \leq i \leq p}, where the supremum ranges over all subspaces of {{\mathbb C}^n} of dimension {i}.

One can use the above exercises to deduce many inequalities about singular values from analogous ones about eigenvalues. We give some examples below.

Exercise 22 Let {A, B} be {p \times n} complex matrices for some {1 \leq p \leq n}.

  • (i) Establish the Weyl inequality {\sigma_{i+j-1}(A+B) \leq \sigma_i(A) + \sigma_j(B)} whenever {1 \leq i,j,i+j-1 \leq p}.
  • (ii) Establish the Lidskii inequality

    \displaystyle \sigma_{i_1}(A+B)+\ldots+\sigma_{i_k}(A+B) \leq \sigma_{i_1}(A)+\ldots+\sigma_{i_k}(A) + \sigma_1(B)+\ldots+\sigma_k(B)

    whenever {1 \leq i_1 < \ldots < i_k \leq p}.

  • (iii) Show that for any {1 \leq k \leq p}, the map {A \mapsto \sigma_1(A)+\ldots+\sigma_k(A)} defines a norm on the space {{\mathbb C}^{p \times n}} of complex {p \times n} matrices (this norm is known as the {k^{th}} Ky Fan norm).
  • (iv) Establish the Weyl inequality {|\sigma_i(A+B) - \sigma_i(A)| \leq \| B \|_{op}} for all {1 \leq i \leq p}.
  • (v) More generally, establish the {q}-Wielandt-Hoffman inequality {\| (\sigma_i(A+B)-\sigma_i(A))_{1 \leq i \leq p} \|_{\ell^q_p} \leq \|B\|_{S^q}} for any {1 \leq q \leq \infty}, where {\|B\|_{S^q} := \| (\sigma_i(B))_{1 \leq i \leq p} \|_{\ell^q_p}} is the {q}-Schatten norm of {B}. (Note that this is consistent with the previous definition of the Schatten norms.)
  • (vi) Show that the {q}-Schatten norm is indeed a norm on {{\mathbb C}^{p \times n}} for any {1 \leq q \leq \infty}.
  • (vii) If {A'} is formed by removing one row from {A}, show that {\sigma_{i+1}(A) \leq \sigma_i(A') \leq \sigma_i(A)} for all {1 \leq i < p}.
  • (viii) If {p<n} and {A'} is formed by removing one column from {A}, show that {\sigma_{i+1}(A) \leq \sigma_i(A') \leq \sigma_i(A)} for all {1 \leq i < p} and {\sigma_p(A') \leq \sigma_p(A)}. What changes when {p=n}?
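
A numerical spot-check of parts (i) and (vii) of Exercise 22 (Python/numpy sketch; the other parts can be tested in the same fashion):

    import numpy as np

    rng = np.random.default_rng(15)
    p, n = 4, 6
    A = rng.standard_normal((p, n)) + 1j * rng.standard_normal((p, n))
    B = rng.standard_normal((p, n)) + 1j * rng.standard_normal((p, n))
    sv = lambda X: np.linalg.svd(X, compute_uv=False)   # descending singular values
    sA, sB, sAB = sv(A), sv(B), sv(A + B)

    # (i): the Weyl inequality for singular values
    for i in range(1, p + 1):
        for j in range(1, p + 2 - i):
            assert sAB[i + j - 2] <= sA[i - 1] + sB[j - 1] + 1e-10

    # (vii): interlacing after deleting one row
    sR = sv(np.delete(A, 0, axis=0))
    for i in range(p - 1):
        assert sA[i + 1] - 1e-10 <= sR[i] <= sA[i] + 1e-10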

Exercise 23 Let {A} be a {p \times n} complex matrix for some {1 \leq p \leq n}. Observe that the linear transformation {A: {\mathbb C}^n \rightarrow {\mathbb C}^p} naturally induces a linear transformation {A^{\wedge k}: \bigwedge^k {\mathbb C}^n \rightarrow \bigwedge^k {\mathbb C}^p} from {k}-forms on {{\mathbb C}^n} to {k}-forms on {{\mathbb C}^p}. We give {\bigwedge^k {\mathbb C}^n} the structure of a Hilbert space by declaring the basic forms {e_{i_1} \wedge \ldots \wedge e_{i_k}} for {1 \leq i_1 < \ldots < i_k \leq n} to be orthonormal.

For any {1 \leq k \leq p}, show that the operator norm of {A^{\wedge k}} is equal to {\sigma_1(A) \ldots \sigma_k(A)}.

Exercise 24 Let {A} be a {p \times n} matrix for some {1 \leq p \leq n}, let {B} be an {r \times p} matrix, and let {C} be an {n \times s} matrix for some {r, s \geq 1}.

Show that {\sigma_i(BA) \leq \|B\|_{op} \sigma_i(A)} and {\sigma_i(AC) \leq \sigma_i(A) \|C\|_{op}} for any {1 \leq i \leq p}.

Exercise 25 Let {A = (a_{ij})_{1 \leq i \leq p; 1 \leq j \leq n}} be a {p \times n} matrix for some {1 \leq p \leq n}, let {i_1,\ldots,i_k \in \{1,\ldots,p\}} be distinct, and let {j_1,\ldots,j_k \in \{1,\ldots,n\}} be distinct. Show that

\displaystyle |a_{i_1 j_1}| + \ldots + |a_{i_k j_k}| \leq \sigma_1(A) + \ldots + \sigma_k(A).

Using this, show that if {j_1,\ldots,j_p \in \{1,\ldots,n\}} are distinct, then

\displaystyle \| (a_{i j_i})_{i=1}^p \|_{\ell^q_p} \leq \|A\|_{S^q}

for every {1 \leq q \leq \infty}.

Exercise 26 Establish the Hölder inequality

\displaystyle |\hbox{tr}(A B^*)| \leq \|A\|_{S^q} \|B\|_{S^{q'}}

whenever {A, B} are {p \times n} complex matrices and {1 \leq q,q' \leq \infty} are such that {1/q+1/q'=1}.

Acknowledgments: Thanks to Allen Knutson for corrections.