Let {A} be a Hermitian {n \times n} matrix. By the spectral theorem for Hermitian matrices (which, for the sake of completeness, we prove below), one can diagonalise {A} using a sequence

\displaystyle \lambda_1(A) \geq \ldots \geq \lambda_n(A)

of {n} real eigenvalues, together with an orthonormal basis of eigenvectors {u_1(A),\ldots,u_n(A) \in {\mathbb C}^n}. (The eigenvalues are uniquely determined by {A}, but the eigenvectors have a little ambiguity to them, particularly if there are repeated eigenvalues; for instance, one could multiply each eigenvector by a complex phase {e^{i\theta}}. In these notes we are arranging eigenvalues in descending order; of course, one can also arrange eigenvalues in increasing order, which causes some slight notational changes in the results below.) The set {\{\lambda_1(A),\ldots,\lambda_n(A)\}} is known as the spectrum of {A}.
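
For readers who like to experiment, here is a minimal numerical sketch of this decomposition, assuming Python with numpy (note that numpy's eigh returns eigenvalues in ascending order, so we reverse them to match the descending convention of these notes):

    import numpy as np

    rng = np.random.default_rng(0)
    M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    A = (M + M.conj().T) / 2          # a random 4x4 Hermitian matrix

    # eigh returns ascending eigenvalues; reverse to get lambda_1 >= ... >= lambda_n
    evals, evecs = np.linalg.eigh(A)
    evals, evecs = evals[::-1], evecs[:, ::-1]

    assert np.allclose(A @ evecs, evecs * evals)              # A u_i = lambda_i u_i
    assert np.allclose(evecs.conj().T @ evecs, np.eye(4))     # orthonormal eigenbasis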

A basic question in linear algebra asks the extent to which the eigenvalues {\lambda_1(A),\ldots,\lambda_n(A)} and {\lambda_1(B),\ldots,\lambda_n(B)} of two Hermitian matrices {A, B} constrain the eigenvalues {\lambda_1(A+B),\ldots,\lambda_n(A+B)} of the sum. For instance, the linearity of trace

\displaystyle \hbox{tr}(A+B) = \hbox{tr}(A)+\hbox{tr}(B),

when expressed in terms of eigenvalues, gives the trace constraint

\displaystyle \lambda_1(A+B)+\ldots+\lambda_n(A+B) = \lambda_1(A)+\ldots+\lambda_n(A) + \lambda_1(B)+\ldots+\lambda_n(B); \ \ \ \ \ (1)

the identity

\displaystyle \lambda_1(A) = \sup_{|v|=1} v^* Av \ \ \ \ \ (2)

(together with the counterparts for {B} and {A+B}) gives the inequality

\displaystyle \lambda_1(A+B) \leq \lambda_1(A) + \lambda_1(B); \ \ \ \ \ (3)

and so forth.
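
As a quick sanity check of (1) and (3), one can test them on random Hermitian matrices; the following sketch (Python/numpy, with an ad hoc helper herm for generating test matrices) is illustrative only:

    import numpy as np

    rng = np.random.default_rng(1)
    def herm(n):
        # an ad hoc generator of random n x n Hermitian test matrices
        M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
        return (M + M.conj().T) / 2

    n = 5
    A, B = herm(n), herm(n)
    lam = lambda X: np.linalg.eigvalsh(X)[::-1]   # descending eigenvalues

    # the trace constraint (1), up to rounding
    assert np.isclose(lam(A + B).sum(), lam(A).sum() + lam(B).sum())
    # subadditivity of the top eigenvalue (3)
    assert lam(A + B)[0] <= lam(A)[0] + lam(B)[0] + 1e-12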

The complete answer to this problem is a fascinating one, requiring a strangely recursive description (once known as Horn’s conjecture, which is now solved), and connected to a large number of other fields of mathematics, such as geometric invariant theory, intersection theory, and the combinatorics of a certain gadget known as a “honeycomb”. See for instance my survey with Allen Knutson on this topic some years ago.

In typical applications to random matrices, one of the matrices (say, {B}) is “small” in some sense, so that {A+B} is a perturbation of {A}. In this case, one does not need the full strength of the above theory, and can instead rely on a simple aspect of it, pointed out by Helmke and Rosenthal and by Totaro, which generates several of the eigenvalue inequalities relating {A}, {B}, and {A+B}, of which (1) and (3) are examples. (Actually, this method eventually generates all of the eigenvalue inequalities, but this is a non-trivial fact to prove.) These eigenvalue inequalities can mostly be deduced from a number of minimax characterisations of eigenvalues (of which (2) is a typical example), together with some basic facts about intersections of subspaces. Examples include the Weyl inequalities

\displaystyle \lambda_{i+j-1}(A+B) \leq \lambda_i(A) + \lambda_j(B), \ \ \ \ \ (4)

valid whenever {i,j \geq 1} and {i+j-1 \leq n}, and the Ky Fan inequality

\displaystyle \lambda_1(A+B)+\ldots+\lambda_k(A+B) \leq \lambda_1(A)+\ldots+\lambda_k(A) + \lambda_1(B)+\ldots+\lambda_k(B). \ \ \ \ \ (5)

One consequence of these inequalities is that the spectrum of a Hermitian matrix is stable with respect to small perturbations.

We will also establish some closely related inequalities concerning the relationships between the eigenvalues of a matrix, and the eigenvalues of its minors.

Many of the inequalities here have analogues for the singular values of non-Hermitian matrices (which is consistent with the discussion near Exercise 16 of Notes 3). However, the situation is markedly different when dealing with eigenvalues of non-Hermitian matrices; here, the spectrum can be far more unstable, if pseudospectrum is present. Because of this, the theory of the eigenvalues of a random non-Hermitian matrix requires an additional ingredient, namely upper bounds on the prevalence of pseudospectrum, which after recentering the matrix is basically equivalent to establishing lower bounds on least singular values. We will discuss this point in more detail in later notes.
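
The instability of non-Hermitian spectra is easy to witness numerically. In the following sketch (Python/numpy; the size and perturbation strength here are arbitrary choices), a single entry of size {10^{-10}} added to a nilpotent Jordan block, whose eigenvalues all vanish, moves the spectrum by about {\varepsilon^{1/n} \approx 0.3}:

    import numpy as np

    n, eps = 20, 1e-10
    J = np.diag(np.ones(n - 1), k=1)   # nilpotent Jordan block: all eigenvalues are 0
    J[-1, 0] = eps                     # one tiny corner entry

    # the perturbed matrix has characteristic polynomial lambda^n - eps, so its
    # eigenvalues are the n-th roots of eps, of magnitude eps**(1/n) ~ 0.32
    assert np.max(np.abs(np.linalg.eigvals(J))) > 0.25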

We will work primarily here with Hermitian matrices, which can be viewed as self-adjoint transformations on complex vector spaces such as {{\mathbb C}^n}. One can of course specialise the discussion to real symmetric matrices, in which case one can restrict these complex vector spaces to their real counterparts {{\mathbb R}^n}. The specialisation of the complex theory below to the real case is straightforward and is left to the interested reader.

— 1. Proof of spectral theorem —

To prove the spectral theorem, it is convenient to work more abstractly, in the context of self-adjoint operators on finite-dimensional Hilbert spaces:

Theorem 1 (Spectral theorem) Let {V} be a finite-dimensional complex Hilbert space of some dimension {n}, and let {T: V \rightarrow V} be a self-adjoint operator. Then there exists an orthonormal basis {v_1,\ldots,v_n \in V} of {V} and eigenvalues {\lambda_1,\ldots,\lambda_n \in {\mathbb R}} such that {T v_i = \lambda_i v_i} for all {1 \leq i \leq n}.

The spectral theorem as stated in the introduction then follows by specialising to the case {V = {\mathbb C}^n} and ordering the eigenvalues.

Proof: We induct on the dimension {n}. The claim is vacuous for {n=0}, so suppose that {n \geq 1} and that the claim has already been proven for {n-1}.

Let {v} be a unit vector in {V} (thus {v^* v = 1}) that maximises the form {\hbox{Re} v^* Tv}; this maximum exists by compactness. By the method of Lagrange multipliers, {v} is a critical point of {\hbox{Re} v^* Tv - \lambda v^* v} for some {\lambda \in {\mathbb R}}. Differentiating in an arbitrary direction {w \in V}, we conclude that

\displaystyle \hbox{Re} ( v^* Tw + w^* Tv - \lambda v^* w - \lambda w^* v ) = 0;

this simplifies using self-adjointness to

\displaystyle \hbox{Re} ( w^* (Tv - \lambda v) ) = 0.

Since {w \in V} was arbitrary, we conclude that {Tv = \lambda v}, thus {v} is a unit eigenvector of {T}. By self-adjointness, this implies that the orthogonal complement {v^\perp := \{ w \in V: v^* w = 0 \}} of {v} is preserved by {T}. Restricting {T} to this lower-dimensional subspace and applying the induction hypothesis, we can find an orthonormal basis of eigenvectors of {T} on {v^\perp}. Adjoining the unit vector {v} to this orthonormal basis, we obtain the claim. \Box

Suppose we have a self-adjoint transformation {A: {\mathbb C}^n \rightarrow {\mathbb C}^n}, which of course can be identified with a Hermitian matrix. Using the orthonormal eigenbasis provided by the spectral theorem, we can perform a unitary change of variables to set that eigenbasis to be the standard basis {e_1,\ldots,e_n}, so that the matrix of {A} becomes diagonal. This is very useful when dealing with just a single matrix {A} – for instance, it makes the task of computing functions of {A}, such as {A^k} or {\exp(tA)}, much easier. However, when one has several Hermitian matrices in play (e.g. {A, B, C}), then it is usually not possible to standardise all the eigenbases simultaneously (i.e. to simultaneously diagonalise all the matrices), except when the matrices all commute. Nevertheless one can still normalise one of the eigenbases to be the standard basis, and this is still useful for several applications, as we shall soon see.
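
To illustrate the point about computing functions of {A}: the following sketch (Python/numpy; the truncated power series is only there as an independent check) computes {\exp(tA)} by exponentiating the eigenvalues in the eigenbasis:

    import math
    import numpy as np

    rng = np.random.default_rng(2)
    M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    A = (M + M.conj().T) / 2
    t = 0.7

    evals, U = np.linalg.eigh(A)                       # A = U diag(evals) U^*
    expA = U @ np.diag(np.exp(t * evals)) @ U.conj().T

    # independent check against a truncated power series for exp(tA)
    series = sum(np.linalg.matrix_power(t * A, k) / math.factorial(k) for k in range(25))
    assert np.allclose(expA, series)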

Exercise 1 Suppose that the eigenvalues {\lambda_1(A) > \ldots > \lambda_n(A)} of an {n \times n} Hermitian matrix are distinct. Show that the associated eigenbasis {u_1(A),\ldots,u_n(A)} is unique up to rotating each individual eigenvector {u_j(A)} by a complex phase {e^{i\theta_j}}. In particular, the spectral projections {P_j(A) := u_j(A) u_j(A)^*} are unique. What happens when there is eigenvalue multiplicity?

— 2. Minimax formulae —

The {i^{th}} eigenvalue functional {A \mapsto \lambda_i(A)} is not a linear functional (except in dimension one). It is not even a convex functional (except when {i=1}) or a concave functional (except when {i=n}). However, it is the next best thing, namely it is a minimax expression of linear functionals. (Note that a convex functional is the same thing as a max of linear functionals, while a concave functional is the same thing as a min of linear functionals.) More precisely, we have

Theorem 2 (Courant-Fischer min-max theorem) Let {A} be an {n \times n} Hermitian matrix. Then we have

\displaystyle \lambda_i(A) = \sup_{\hbox{dim}(V)=i} \inf_{v \in V: |v|=1} v^* A v \ \ \ \ \ (6)

and

\displaystyle \lambda_i(A) = \inf_{\hbox{dim}(V)=n-i+1} \sup_{v \in V: |v|=1} v^* A v \ \ \ \ \ (7)

for all {1 \leq i \leq n}, where {V} ranges over all subspaces of {{\mathbb C}^n} with the indicated dimension.

Proof: It suffices to prove (6), as (7) follows by replacing {A} by {-A} (noting that {\lambda_i(-A) = -\lambda_{n-i+1}(A)}).

We first verify the {i=1} case, i.e. (2). By the spectral theorem, we can assume that {A} has the standard eigenbasis {e_1,\ldots,e_n}, in which case we have

\displaystyle v^* A v = \sum_{i=1}^n \lambda_i |v_i|^2 \ \ \ \ \ (8)

whenever {v = (v_1,\ldots,v_n)}. The claim (2) is then easily verified.

To prove the general case, we may again assume {A} has the standard eigenbasis. By considering the space {V} spanned by {e_1,\ldots,e_i}, we easily see the inequality

\displaystyle \lambda_i(A) \leq \sup_{\hbox{dim}(V)=i} \inf_{v \in V: |v|=1} v^* A v

so we only need to prove the reverse inequality. In other words, for every {i}-dimensional subspace {V} of {{\mathbb C}^n}, we have to show that {V} contains a unit vector {v} such that

\displaystyle v^* A v \leq \lambda_i(A).

Let {W} be the space spanned by {e_i,\ldots,e_n}. This space has codimension {i-1}, so it must have non-trivial intersection with {V}. If we let {v} be a unit vector in {V \cap W}, the claim then follows from (8). \Box

Remark 1 By homogeneity, one can replace the restriction {|v|=1} with {v \neq 0} provided that one replaces the quadratic form {v^* A v} with the Rayleigh quotient {v^* A v / v^* v}.
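
In coordinates, if the columns of a matrix {Q} form an orthonormal basis of {V}, then {\inf_{v \in V: |v|=1} v^* A v} is the least eigenvalue of the compressed matrix {Q^* A Q}, and the supremum in (6) is attained when {V} is spanned by the top {i} eigenvectors. A hedged numerical sketch (Python/numpy):

    import numpy as np

    rng = np.random.default_rng(3)
    M = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
    A = (M + M.conj().T) / 2
    evals, U = np.linalg.eigh(A)
    evals, U = evals[::-1], U[:, ::-1]                 # descending order

    i = 3
    Q = U[:, :i]                                       # the extremal subspace V
    assert np.isclose(np.linalg.eigvalsh(Q.conj().T @ A @ Q)[0], evals[i - 1])

    # any other i-dimensional subspace gives an infimum of at most lambda_i(A)
    Q2 = np.linalg.qr(rng.standard_normal((6, i)) + 1j * rng.standard_normal((6, i)))[0]
    assert np.linalg.eigvalsh(Q2.conj().T @ A @ Q2)[0] <= evals[i - 1] + 1e-12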

A closely related formula is as follows. Given an {n \times n} Hermitian matrix {A} and an {m}-dimensional subspace {V} of {{\mathbb C}^n}, we define the partial trace {\hbox{tr}(A\downharpoonright_V)} to be the expression

\displaystyle \hbox{tr}(A\downharpoonright_V) := \sum_{i=1}^m v_i^* A v_i

where {v_1,\ldots,v_m} is any orthonormal basis of {V}. It is easy to see that this expression is independent of the choice of orthonormal basis, and so the partial trace is well-defined.
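
Concretely, if the columns of {Q} form an orthonormal basis of {V}, then {\hbox{tr}(A\downharpoonright_V) = \hbox{tr}(Q^* A Q)}, and the basis-independence amounts to the invariance of the trace under a unitary change of basis within {V}. A quick numerical sketch (Python/numpy):

    import numpy as np

    rng = np.random.default_rng(4)
    M = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
    A = (M + M.conj().T) / 2

    # an orthonormal basis of a random 3-dimensional subspace V of C^6
    Q = np.linalg.qr(rng.standard_normal((6, 3)) + 1j * rng.standard_normal((6, 3)))[0]
    # rotate that basis within V by a random 3x3 unitary
    W = np.linalg.qr(rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3)))[0]

    t1 = np.trace(Q.conj().T @ A @ Q)
    t2 = np.trace((Q @ W).conj().T @ A @ (Q @ W))
    assert np.isclose(t1, t2)          # the partial trace does not depend on the basis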

Proposition 3 (Extremal partial trace) Let {A} be an {n \times n} Hermitian matrix. Then for any {1 \leq k \leq n}, one has

\displaystyle \lambda_1(A)+\ldots+\lambda_k(A) = \sup_{\hbox{dim}(V)=k} \hbox{tr}(A \downharpoonright_V)

and

\displaystyle \lambda_{n-k+1}(A)+\ldots+\lambda_n(A) = \inf_{\hbox{dim}(V)=k} \hbox{tr}(A \downharpoonright_V).

As a corollary, we see that {A \mapsto \lambda_1(A)+\ldots+\lambda_k(A)} is a convex function, and {A \mapsto \lambda_{n-k+1}(A)+\ldots+\lambda_n(A)} is a concave function.

Proof: Again, by symmetry it suffices to prove the first formula. As before, we may assume without loss of generality that {A} has the standard eigenbasis. By selecting {V} to be the span of {e_1,\ldots,e_k} we have the inequality

\displaystyle \lambda_1(A)+\ldots+\lambda_k(A) \leq \sup_{\hbox{dim}(V)=k} \hbox{tr}(A \downharpoonright_V)

so it suffices to prove the reverse inequality. For this we induct on dimension. If {V} has dimension {k}, then it has a {k-1}-dimensional subspace {V'} that is contained in the span of {e_2,\ldots,e_n}. By the induction hypothesis, we have

\displaystyle \lambda_2(A)+\ldots+\lambda_k(A) \geq \hbox{tr}(A \downharpoonright_{V'}).

On the other hand, if {v} is a unit vector in the orthogonal complement of {V'} in {V}, we see from (2) that

\displaystyle \lambda_1(A) \geq v^* A v.

Adding the two inequalities we obtain the claim. \Box

Specialising Proposition 3 to the case when {V} is a coordinate subspace (i.e. the span of {k} of the basis vectors {e_1,\ldots,e_n}), we conclude the Schur-Horn inequalities

\displaystyle \lambda_{n-k+1}(A)+\ldots+\lambda_n(A) \leq a_{i_1 i_1} + \ldots + a_{i_k i_k} \leq \lambda_1(A)+\ldots+\lambda_k(A) \ \ \ \ \ (9)

for any {1 \leq i_1 < \ldots < i_k \leq n}, where {a_{11},a_{22},\ldots,a_{nn}} are the diagonal entries of {A}.
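
A numerical spot-check of (9) over all index sets (Python/numpy; purely illustrative):

    import numpy as np
    from itertools import combinations

    rng = np.random.default_rng(5)
    M = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
    A = (M + M.conj().T) / 2
    lam = np.linalg.eigvalsh(A)[::-1]          # descending eigenvalues
    d = np.real(np.diag(A))                    # diagonal entries (real, as A is Hermitian)

    for k in range(1, 6):
        for idx in combinations(range(5), k):
            s = d[list(idx)].sum()
            # the Schur-Horn inequalities (9)
            assert lam[-k:].sum() - 1e-10 <= s <= lam[:k].sum() + 1e-10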

Exercise 2 Show that the inequalities (9) are equivalent to the assertion that the diagonal entries {\hbox{diag}(A) = (a_{11},a_{22},\ldots,a_{nn})} lie in the permutahedron of {\lambda_1(A),\ldots,\lambda_n(A)}, defined as the convex hull of the {n!} permutations of {(\lambda_1(A),\ldots,\lambda_n(A))} in {{\mathbb R}^n}.

Remark 2 It is a theorem of Schur and Horn that these are the complete set of inequalities connecting the diagonal entries {\hbox{diag}(A) = (a_{11},a_{22},\ldots,a_{nn})} of a Hermitian matrix to its spectrum. To put it another way, the image of any coadjoint orbit {{\mathcal O}_A := \{ UAU^*: U \in U(n) \}} of a matrix {A} with a given spectrum {\lambda_1,\ldots,\lambda_n} under the diagonal map {\hbox{diag}: A \mapsto \hbox{diag}(A)} is the permutahedron of {\lambda_1,\ldots,\lambda_n}. Note that the vertices of this permutahedron can be attained by considering the diagonal matrices inside this coadjoint orbit, whose entries are then a permutation of the eigenvalues. One can interpret this diagonal map {\hbox{diag}} as the moment map associated with the conjugation action of the standard maximal torus of {U(n)} (i.e. the diagonal unitary matrices) on the coadjoint orbit. When viewed in this fashion, the Schur-Horn theorem can be viewed as the special case of the more general Atiyah convexity theorem (also proven independently by Guillemin and Sternberg) in symplectic geometry. Indeed, the topic of eigenvalues of Hermitian matrices turns out to be quite profitably viewed as a question in symplectic geometry (and also in algebraic geometry, particularly when viewed through the machinery of geometric invariant theory).

There is a simultaneous generalisation of Theorem 2 and Proposition 3:

Exercise 3 (Wielandt minimax formula) Let {1 \leq i_1 < \ldots < i_k \leq n} be integers. Define a partial flag to be a nested collection {V_1 \subset \ldots \subset V_k} of subspaces of {{\mathbb C}^n} such that {\hbox{dim}(V_j) = i_j} for all {1 \leq j \leq k}. Define the associated Schubert variety {X(V_1,\ldots,V_k)} to be the collection of all {k}-dimensional subspaces {W} such that {\hbox{dim}(W \cap V_j) \geq j}. Show that for any {n \times n} Hermitian matrix {A},

\displaystyle \lambda_{i_1}(A) + \ldots + \lambda_{i_k}(A) = \sup_{V_1,\ldots,V_k} \inf_{W \in X(V_1,\ldots,V_k)} \hbox{tr}(A\downharpoonright_W).

— 3. Eigenvalue inequalities —

Using the above minimax formulae, we can now quickly prove a variety of eigenvalue inequalities. The basic idea is to exploit the linearity relationship

\displaystyle v^* (A+B) v = v^* A v + v^* B v \ \ \ \ \ (10)

for any unit vector {v}, and more generally

\displaystyle \hbox{tr}((A+B) \downharpoonright_V ) = \hbox{tr}(A \downharpoonright_V ) +\hbox{tr}(B \downharpoonright_V ) \ \ \ \ \ (11)

for any subspace {V}.

For instance, as mentioned before, the inequality (3) follows immediately from (2) and (10). Similarly, for the Ky Fan inequality (5), one observes from (11) and Proposition 3 that

\displaystyle \hbox{tr}((A+B) \downharpoonright_W ) \leq \hbox{tr}(A \downharpoonright_W ) + \lambda_1(B) + \ldots + \lambda_k(B)

for any {k}-dimensional subspace {W}. Substituting this into Proposition 3 gives the claim. If one uses Exercise 3 instead of Proposition 3, one obtains the more general Lidskii inequality

\displaystyle \lambda_{i_1}(A+B)+\ldots+\lambda_{i_k}(A+B) \leq \lambda_{i_1}(A)+\ldots+\lambda_{i_k}(A) + \lambda_1(B)+\ldots+\lambda_k(B) \ \ \ \ \ (12)

for any {1 \leq i_1 < \ldots < i_k \leq n}.

In a similar spirit, using the inequality

\displaystyle |v^* B v| \leq \|B\|_{op} = \max( |\lambda_1(B)|, |\lambda_n(B)| )

for unit vectors {v}, combined with (10) and (6), we obtain the eigenvalue stability inequality

\displaystyle |\lambda_i(A+B) - \lambda_i(A)| \leq \|B\|_{op}, \ \ \ \ \ (13)

thus the spectrum of {A+B} is close to that of {A} if {B} is small in operator norm. In particular, we see that the map {A \mapsto \lambda_i(A)} is Lipschitz continuous on the space of Hermitian matrices, for fixed {1 \leq i \leq n}.
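
A quick numerical illustration of the stability inequality (13) (Python/numpy sketch; the perturbation scale is an arbitrary choice):

    import numpy as np

    rng = np.random.default_rng(6)
    def herm(n, scale=1.0):
        M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
        return scale * (M + M.conj().T) / 2

    A, B = herm(8), herm(8, scale=1e-3)        # B is a small perturbation
    lam = lambda X: np.linalg.eigvalsh(X)[::-1]
    op_norm = np.abs(np.linalg.eigvalsh(B)).max()

    # every eigenvalue moves by at most ||B||_op, as in (13)
    assert np.all(np.abs(lam(A + B) - lam(A)) <= op_norm + 1e-12)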

More generally, suppose one wants to establish the Weyl inequality (4). From (6), it suffices to show that every {i+j-1}-dimensional subspace {V} contains a unit vector {v} such that

\displaystyle v^*(A+B)v \leq \lambda_i(A) + \lambda_j(B).

But from (6), one can find a subspace {U} of codimension {i-1} such that {v^* A v \leq \lambda_i(A)} for all unit vectors {v} in {U}, and a subspace {W} of codimension {j-1} such that {v^* B v \leq \lambda_j(B)} for all unit vectors {v} in {W}. The intersection {U \cap W} has codimension at most {i+j-2} and so has a nontrivial intersection with {V}; and the claim follows.
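
One can likewise spot-check the Weyl inequalities (4) over all admissible pairs {(i,j)} (Python/numpy sketch):

    import numpy as np

    rng = np.random.default_rng(7)
    def herm(n):
        M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
        return (M + M.conj().T) / 2

    n = 6
    A, B = herm(n), herm(n)
    lam = lambda X: np.linalg.eigvalsh(X)[::-1]
    lA, lB, lAB = lam(A), lam(B), lam(A + B)

    # the Weyl inequality (4) for all i, j >= 1 with i + j - 1 <= n
    for i in range(1, n + 1):
        for j in range(1, n + 2 - i):
            assert lAB[i + j - 2] <= lA[i - 1] + lB[j - 1] + 1e-10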

Remark 3 More generally, one can generate an eigenvalue inequality whenever the intersection number of three Schubert varieties of compatible dimensions is non-zero; see the paper of Helmke and Rosenthal. In fact, this generates a complete set of inequalities; this is a result of Klyachko. One can in fact restrict attention to those varieties whose intersection number is exactly one; this is a result of Knutson, Woodward, and myself. Finally, in those cases, the fact that the intersection number is one can be proven by entirely elementary means (based on the standard inequalities relating the dimension of two subspaces {V, W} to their intersection {V \cap W} and sum {V + W}); this is a result of Bercovici, Collins, Dykema, Li, and Timotin. As a consequence, the methods in this section can, in principle, be used to derive all possible eigenvalue inequalities for sums of Hermitian matrices.

Exercise 4 Verify the inequalities (12) and (4) by hand in the case when {A} and {B} commute (and are thus simultaneously diagonalisable), without the use of minimax formulae.

Exercise 5 Establish the dual Lidskii inequality

\displaystyle \lambda_{i_1}(A+B)+\ldots+\lambda_{i_k}(A+B) \geq \lambda_{i_1}(A)+\ldots+\lambda_{i_k}(A) + \lambda_{n-k+1}(B)+\ldots+\lambda_n(B)

for any {1 \leq i_1 < \ldots < i_k \leq n} and the dual Weyl inequality

\displaystyle \lambda_{i+j-n}(A+B) \geq \lambda_i(A) + \lambda_j(B)

whenever {1 \leq i,j,i+j-n \leq n}.

Exercise 6 Use the Lidskii inequality to establish the more general inequality

\displaystyle \sum_{i=1}^n c_i \lambda_i(A+B) \leq \sum_{i=1}^n c_i \lambda_i(A) + \sum_{i=1}^n c^*_i \lambda_i(B)

whenever {c_1,\ldots,c_n \geq 0}, and {c_1^* \geq \ldots \geq c^*_n \geq 0} is the decreasing rearrangement of {c_1,\ldots,c_n}. (Hint: express {c_i} as the integral of {{\bf I}(c_i \geq \lambda)} as {\lambda} runs from {0} to infinity. For each fixed {\lambda}, apply (12).) Combine this with Hölder’s inequality to conclude the {p}-Wielandt-Hoffman inequality

\displaystyle \| (\lambda_i(A+B) - \lambda_i(A))_{i=1}^n \|_{\ell^p_n} \leq \|B\|_{S^p} \ \ \ \ \ (14)

for any {1 \leq p \leq \infty}, where

\displaystyle \| (a_i)_{i=1}^n \|_{\ell^p_n} := (\sum_{i=1}^n |a_i|^p)^{1/p}

is the usual {\ell^p} norm (with the usual convention that {\| (a_i)_{i=1}^n \|_{\ell^\infty_n} := \sup_{1 \leq i \leq n} |a_i|}), and

\displaystyle \|B\|_{S^p} := \| (\lambda_i(B))_{i=1}^n \|_{\ell^p_n} \ \ \ \ \ (15)

is the {p}-Schatten norm of {B}.
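
A numerical spot-check of (14) for several values of {p} (Python/numpy; here the vector {\ell^p} norm of the eigenvalue sequence serves as the Schatten norm (15)):

    import numpy as np

    rng = np.random.default_rng(8)
    def herm(n):
        M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
        return (M + M.conj().T) / 2

    n = 7
    A, B = herm(n), herm(n)
    lam = lambda X: np.linalg.eigvalsh(X)[::-1]
    diff = lam(A + B) - lam(A)

    for p in [1.0, 2.0, 4.0, np.inf]:
        lhs = np.linalg.norm(diff, ord=p)          # l^p norm of the eigenvalue shifts
        rhs = np.linalg.norm(lam(B), ord=p)        # the p-Schatten norm (15) of B
        assert lhs <= rhs + 1e-10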

Exercise 7 Show that the {p}-Schatten norm is indeed a norm on the space of Hermitian matrices for every {1 \leq p \leq \infty}.

Exercise 8 Show that for any {1 \leq p \leq \infty} and any Hermitian matrix {A = (a_{ij})_{1 \leq i,j \leq n}}, one has

\displaystyle \| (a_{ii})_{i=1}^n \|_{\ell^p_n} \leq \| A \|_{S^p}. \ \ \ \ \ (16)

Exercise 9 Establish the Hölder inequality

\displaystyle |\hbox{tr}(AB)| \leq \|A\|_{S^p} \|B\|_{S^{p'}}

whenever {1 \leq p,p' \leq \infty} with {1/p + 1/p' = 1}, and {A, B} are {n \times n} Hermitian matrices. (Hint: Diagonalise one of the matrices and use the preceding exercise.)

The most important {p}-Schatten norms are the {\infty}-Schatten norm {\|A\|_{S^\infty} = \|A\|_{op}}, which is just the operator norm, and the {2}-Schatten norm {\|A\|_{S^2} = (\sum_{i=1}^n \lambda_i(A)^2)^{1/2}}, which is also the Frobenius norm (or Hilbert-Schmidt norm)

\displaystyle \|A\|_{S^2} = \|A\|_F := \hbox{tr}(AA^*)^{1/2} = (\sum_{i=1}^n \sum_{j=1}^n |a_{ij}|^2)^{1/2}

where {a_{ij}} are the coefficients of {A}. (The {1}-Schatten norm {S^1}, also known as the nuclear norm or trace class norm, is important in a number of applications, such as matrix completion, but will not be used often in this course.) Thus we see that the {p=2} case of the Wielandt-Hoffman inequality can be written as

\displaystyle \sum_{i=1}^n |\lambda_i(A+B) - \lambda_i(A)|^2 \leq \|B\|_F^2. \ \ \ \ \ (17)

We will give an alternate proof of this inequality, based on eigenvalue deformation, in the next section.

— 4. Eigenvalue deformation —

From the Weyl inequality (13), we know that the eigenvalue maps {A \mapsto \lambda_i(A)} are Lipschitz continuous on Hermitian matrices (and thus also on real symmetric matrices). It turns out that we can obtain better regularity, provided that we avoid repeated eigenvalues. Fortunately, repeated eigenvalues are rare:

Exercise 10 (Dimension count) Suppose that {n \geq 2}. Show that the space of Hermitian matrices with at least one repeated eigenvalue has codimension {3} in the space of all Hermitian matrices, and the space of real symmetric matrices with at least one repeated eigenvalue has codimension {2} in the space of all real symmetric matrices. (When {n=1}, repeated eigenvalues of course do not occur.)

Let us say that a Hermitian matrix has simple spectrum if it has no repeated eigenvalues. We thus see from the above exercise and (13) that the set of Hermitian matrices with simple spectrum forms an open dense set in the space of all Hermitian matrices, and similarly for real symmetric matrices; thus simple spectrum is the generic behaviour of such matrices. Indeed, the unexpectedly high codimension of the non-simple matrices (naively, one would expect a codimension {1} set for a collision between, say, {\lambda_i(A)} and {\lambda_{i+1}(A)}) suggests a repulsion phenomenon: because it is unexpectedly rare for eigenvalues to be equal, there must be some “force” that “repels” eigenvalues of Hermitian (and to a lesser extent, real symmetric) matrices from getting too close to each other. We now develop some machinery to make this more precise.

We first observe that when {A} has simple spectrum, the zeroes of the characteristic polynomial {\lambda \mapsto \det(A - \lambda I)} are simple (i.e. the polynomial has nonzero derivative at those zeroes). From this and the inverse function theorem, we see that each of the eigenvalue maps {A \mapsto \lambda_i(A)} is smooth on the region where {A} has simple spectrum. Because the eigenvectors {u_i(A)} are determined (up to phase) by the equations {(A - \lambda_i(A) I) u_i(A) = 0} and {u_i(A)^* u_i(A) = 1}, another application of the inverse function theorem tells us that we can (locally) select the maps {A \mapsto u_i(A)} to also be smooth. (There may be topological obstructions to smoothly selecting these vectors globally, but this will not concern us here as we will be performing a local analysis only. In later notes, we will in fact not work with the {u_i(A)} at all due to their phase ambiguity, and work instead with the spectral projections {P_i(A) := u_i(A) u_i(A)^*}, which do not have this ambiguity.)

Now suppose that {A = A(t)} depends smoothly on a time variable {t}, so that (when {A} has simple spectrum) the eigenvalues {\lambda_i(t) = \lambda_i(A(t))} and eigenvectors {u_i(t) = u_i(A(t))} also depend smoothly on {t}. We can then differentiate the equations

\displaystyle A u_i = \lambda_i u_i \ \ \ \ \ (18)

and

\displaystyle u_i^* u_i = 1 \ \ \ \ \ (19)

to obtain various equations of motion for {\lambda_i} and {u_i} in terms of the derivatives of {A}.

Let’s see how this works. Taking first derivatives of (18), (19) using the product rule, we obtain

\displaystyle \dot A u_i + A \dot u_i = \dot \lambda_i u_i + \lambda_i \dot u_i \ \ \ \ \ (20)

and

\displaystyle \dot u_i^* u_i + u_i^* \dot u_i = 0. \ \ \ \ \ (21)

The equation (21) simplifies to {\hbox{Re}( \dot u_i^* u_i ) = 0}, thus {\dot u_i} is orthogonal to {u_i}. Taking (real) inner products of (20) with {u_i}, we conclude the Hadamard first variation formula

\displaystyle \dot \lambda_i = u_i^* \dot A u_i. \ \ \ \ \ (22)

This can already be used to give alternate proofs of various eigenvalue identities. For instance, if we apply this to {A(t) := A + tB}, we see that

\displaystyle \frac{d}{dt} \lambda_i(A+tB) = u_i(A+tB)^* B u_i(A+tB)

whenever {A+tB} has simple spectrum. The right-hand side can be bounded in magnitude by {\|B\|_{op}}, and so we see that the map {t \mapsto \lambda_i(A+tB)} is Lipschitz with constant {\|B\|_{op}} whenever {A+tB} has simple spectrum, which happens for generic {A, B} (and all {t}) by Exercise 10. By the fundamental theorem of calculus, we thus conclude (13).
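
The first variation formula (22) is also easy to test numerically against a finite difference (Python/numpy sketch; the step size {h} is an arbitrary choice):

    import numpy as np

    rng = np.random.default_rng(9)
    def herm(n):
        M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
        return (M + M.conj().T) / 2

    n, i, h = 5, 2, 1e-6
    A, B = herm(n), herm(n)                        # A(t) = A + tB, so dot A = B

    _, U = np.linalg.eigh(A)
    u_i = U[:, ::-1][:, i - 1]                     # eigenvector for lambda_i(A)

    lam_i = lambda t: np.linalg.eigvalsh(A + t * B)[::-1][i - 1]
    fd = (lam_i(h) - lam_i(-h)) / (2 * h)          # finite-difference derivative at t = 0

    # Hadamard first variation formula (22)
    assert np.isclose(fd, np.real(u_i.conj() @ B @ u_i), atol=1e-5)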

Exercise 11 Use a similar argument to the one above to establish (17) without using minimax formulae or Lidskii’s inequality.

Exercise 12 Use a similar argument to the one above to deduce Lidskii’s inequality (12) from Proposition 3 rather than Exercise 3.

One can also compute the second derivative of eigenvalues:

Exercise 13 Suppose that {A = A(t)} depends smoothly on {t}. By differentiating (20), (21) twice, establish the Hadamard second variation formula

\displaystyle \frac{d^2}{dt^2} \lambda_k = u_k^* \ddot A u_k + 2 \sum_{j \neq k} \frac{ |u_j^* \dot A u_k|^2 }{\lambda_k - \lambda_j} \ \ \ \ \ (23)

whenever {A} has simple spectrum and {1 \leq k \leq n}.

If one interprets the second derivative of the eigenvalues as being proportional to a “force” on those eigenvalues (in analogy with Newton’s second law), (23) is asserting that each eigenvalue {\lambda_j} “repels” the other eigenvalues {\lambda_k} by exerting a force that is inversely proportional to their separation (and also proportional to the square of the matrix coefficient of {\dot A} in the eigenbasis). See this earlier blog post for more discussion.
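
The second variation formula (23) can be checked the same way, now against a second finite difference (Python/numpy sketch; here {A(t) = A + tB}, so the {\ddot A} term vanishes):

    import numpy as np

    rng = np.random.default_rng(10)
    def herm(n):
        M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
        return (M + M.conj().T) / 2

    n, k, h = 5, 3, 1e-4
    A, B = herm(n), herm(n)

    evals, U = np.linalg.eigh(A)
    evals, U = evals[::-1], U[:, ::-1]

    # right-hand side of (23), with the \ddot A term equal to zero
    rhs = 2 * sum(abs(U[:, j].conj() @ B @ U[:, k - 1]) ** 2 / (evals[k - 1] - evals[j])
                  for j in range(n) if j != k - 1)

    lam_k = lambda t: np.linalg.eigvalsh(A + t * B)[::-1][k - 1]
    fd2 = (lam_k(h) - 2 * lam_k(0.0) + lam_k(-h)) / h ** 2
    assert np.isclose(fd2, rhs, atol=1e-3)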

Remark 4 In the proof of the four moment theorem of Van Vu and myself, which we will discuss in a subsequent lecture, we will also need the variation formulae for the third, fourth, and fifth derivatives of the eigenvalues (the first four derivatives match up with the four moments mentioned in the theorem, and the fifth derivative is needed to control error terms). Fortunately, we do not need the precise formulae for these derivatives (which, as one can imagine, are quite complicated), but only their general form, and in particular an upper bound for these derivatives in terms of more easily computable quantities.

— 5. Minors —

In the previous sections, we perturbed {n \times n} Hermitian matrices {A = A_n} by adding a (small) {n \times n} Hermitian correction matrix {B} to them to form a new {n \times n} Hermitian matrix {A+B}. Another important way to perturb a matrix is to pass to a principal minor, for instance to the top left {n-1 \times n-1} minor {A_{n-1}} of {A_n}. There is an important relationship between the eigenvalues of the two matrices:

Exercise 14 (Cauchy interlacing inequalities) Let {A_n} be an {n \times n} Hermitian matrix with top left {n-1 \times n-1} minor {A_{n-1}}. Show that

\displaystyle \lambda_{i+1}(A_n) \leq \lambda_i(A_{n-1}) \leq \lambda_i(A_n) \ \ \ \ \ (24)

for all {1 \leq i \leq n-1}. (Hint: use the Courant-Fischer min-max theorem, Theorem 2.) Show furthermore that the space of {A_n} for which equality holds in one of the inequalities in (24) has codimension {2} (for Hermitian matrices) or {1} (for real symmetric matrices).
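
A numerical illustration of the interlacing inequalities (24) (Python/numpy sketch):

    import numpy as np

    rng = np.random.default_rng(11)
    n = 6
    M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    An = (M + M.conj().T) / 2
    Am = An[:n - 1, :n - 1]                    # the top left (n-1) x (n-1) minor

    l_n = np.linalg.eigvalsh(An)[::-1]
    l_m = np.linalg.eigvalsh(Am)[::-1]

    # Cauchy interlacing (24)
    for i in range(n - 1):
        assert l_n[i + 1] - 1e-10 <= l_m[i] <= l_n[i] + 1e-10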

Remark 5 If one takes successive minors {A_{n-1}, A_{n-2},\ldots,A_1} of an {n \times n} Hermitian matrix {A_n}, and computes their spectra, then (24) shows that this triangular array of numbers forms a pattern known as a Gelfand-Tsetlin pattern. These patterns are discussed a little more in this blog post.

One can obtain a more precise formula for the eigenvalues of {A_n} in terms of those for {A_{n-1}}:

Exercise 15 (Eigenvalue equation) Let {A_n} be an {n \times n} Hermitian matrix with top left {n-1 \times n-1} minor {A_{n-1}}. Suppose that {\lambda} is an eigenvalue of {A_n} distinct from all the eigenvalues of {A_{n-1}} (and thus simple, by (24)). Show that

\displaystyle \sum_{j=1}^{n-1} \frac{ | u_j(A_{n-1})^* X |^2 }{\lambda_j(A_{n-1}) - \lambda} = a_{nn} - \lambda \ \ \ \ \ (25)

where {a_{nn}} is the bottom right entry of {A_n}, and {X = (a_{jn})_{j=1}^{n-1} \in {\mathbb C}^{n-1}} is the rightmost column of {A_n} with the bottom entry removed. (Hint: Expand out the eigenvalue equation {A_n u = \lambda u} into the {{\mathbb C}^{n-1}} and {{\mathbb C}} components.) Note the similarities between (25) and (23).
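
Since a random Hermitian matrix almost surely has eigenvalues distinct from those of its minor, one can verify (25) numerically at every eigenvalue of {A_n} (Python/numpy sketch):

    import numpy as np

    rng = np.random.default_rng(12)
    n = 5
    M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    An = (M + M.conj().T) / 2
    Am = An[:n - 1, :n - 1]
    X = An[:n - 1, n - 1]                      # rightmost column, bottom entry removed
    a_nn = np.real(An[n - 1, n - 1])

    mu, U = np.linalg.eigh(Am)                 # spectrum of the minor
    for lam in np.linalg.eigvalsh(An):
        lhs = np.sum(np.abs(U.conj().T @ X) ** 2 / (mu - lam))
        assert np.isclose(lhs, a_nn - lam)     # the eigenvalue equation (25)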

Observe that the function {\lambda \mapsto \sum_{j=1}^{n-1} \frac{ | u_j(A_{n-1})^* X |^2 }{\lambda_j(A_{n-1}) - \lambda}} is a rational function of {\lambda} which is increasing away from the eigenvalues of {A_{n-1}}, where it has a pole (except in the rare case when the inner product {u_j(A_{n-1})^* X} vanishes, in which case it can have a removable singularity). By graphing this function one can see that the interlacing formula (24) can also be interpreted as a manifestation of the intermediate value theorem.

The identity (25) suggests that under typical circumstances, an eigenvalue {\lambda} of {A_n} can only get close to an eigenvalue {\lambda_j(A_{n-1})} if the associated inner product {u_j(A_{n-1})^* X} is small. This type of observation is useful to achieve eigenvalue repulsion – to show that it is unlikely that the gap between two adjacent eigenvalues is small. We shall see examples of this in later notes.

— 6. Singular values —

The theory of eigenvalues of {n \times n} Hermitian matrices has an analogue in the theory of singular values of {p \times n} non-Hermitian matrices. We first begin with the counterpart to the spectral theorem, namely the singular value decomposition.

Theorem 4 (Singular value decomposition) Let {0 \leq p \leq n}, and let {A} be a linear transformation from an {n}-dimensional complex Hilbert space {U} to a {p}-dimensional complex Hilbert space {V}. (In particular, {A} could be a {p \times n} matrix with complex entries, viewed as a linear transformation from {{\mathbb C}^n} to {{\mathbb C}^p}.) Then there exist non-negative real numbers

\displaystyle \sigma_1(A) \geq \ldots \geq \sigma_p(A) \geq 0

(known as the singular values of {A}) and orthonormal sets {u_1(A),\ldots,u_p(A) \in U} and {v_1(A),\ldots,v_p(A) \in V} (known as singular vectors of {A}), such that

\displaystyle A u_j = \sigma_j v_j; \quad A^* v_j = \sigma_j u_j

for all {1 \leq j \leq p}, where we abbreviate {u_j = u_j(A)}, etc.

Furthermore, {Au=0} whenever {u} is orthogonal to all of the {u_1(A),\ldots,u_p(A)}.

We adopt the convention that {\sigma_i(A)=0} for {i>p}. The above theorem only applies to matrices with at least as many columns as rows, but one can also extend the definition to matrices with more rows than columns by adopting the convention {\sigma_i(A^*) := \sigma_i(A)} (it is easy to check that this extension is consistent on square matrices). All of the results below extend (with minor modifications) to the case when there are more rows than columns, but we have not displayed those extensions here in order to simplify the notation.

Proof: We induct on {p}. The claim is vacuous for {p=0}, so suppose that {p \geq 1} and that the claim has already been proven for {p-1}.

We follow a similar strategy to the proof of Theorem 1. We may assume that {A} is not identically zero, as the claim is obvious otherwise. The function {u \mapsto \|Au\|^2} is continuous on the unit sphere of {U}, so there exists a unit vector {u_1} which maximises this quantity. If we set {\sigma_1 := \|Au_1\| > 0}, one easily verifies that {u_1} is a critical point of the map {u \mapsto \|Au\|^2 - \sigma_1^2 \|u\|^2}, which then implies that {A^* A u_1 = \sigma_1^2 u_1}. Thus, if we set {v_1 := A u_1 / \sigma_1}, then {A u_1 = \sigma_1 v_1} and {A^* v_1 = \sigma_1 u_1}. This implies that {A} maps the orthogonal complement {u_1^\perp} of {u_1} in {U} to the orthogonal complement {v_1^\perp} of {v_1} in {V}. By induction hypothesis, the restriction of {A} to {u_1^\perp} (and {v_1^\perp}) then admits a singular value decomposition with singular values {\sigma_2 \geq \ldots \geq \sigma_p \geq 0} and singular vectors {u_2,\ldots,u_p \in u_1^\perp}, {v_2,\ldots,v_p \in v_1^\perp} with the stated properties. By construction we see that {\sigma_2,\ldots,\sigma_p} are less than or equal to {\sigma_1}. If we now adjoin {\sigma_1,u_1,v_1} to the other singular values and vectors we obtain the claim. \Box
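
In numerical practice one obtains this decomposition from a library routine rather than from the above induction; the following sketch (Python/numpy) checks the defining equations of Theorem 4, with the caveat that numpy writes {A = W \Sigma Z^*}, so that in the notation of the theorem the columns of {Z} are the {u_j} and the columns of {W} are the {v_j}:

    import numpy as np

    rng = np.random.default_rng(13)
    p, n = 3, 5
    A = rng.standard_normal((p, n)) + 1j * rng.standard_normal((p, n))

    W, sigma, Zh = np.linalg.svd(A, full_matrices=False)
    Z = Zh.conj().T

    for j in range(p):
        assert np.allclose(A @ Z[:, j], sigma[j] * W[:, j])            # A u_j = sigma_j v_j
        assert np.allclose(A.conj().T @ W[:, j], sigma[j] * Z[:, j])   # A^* v_j = sigma_j u_j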

Exercise 16 Show that the singular values {\sigma_1(A) \geq \ldots \geq \sigma_p(A) \geq 0} of a {p \times n} matrix {A} are unique. If we have {\sigma_1(A) > \ldots > \sigma_p(A) > 0}, show that the singular vectors are unique up to rotation by a complex phase.

By construction (and the above uniqueness claim) we see that {\sigma_i(UAV) = \sigma_i(A)} whenever {A} is a {p \times n} matrix, {U} is a unitary {p \times p} matrix, and {V} is a unitary {n \times n} matrix. Thus the singular spectrum of a matrix is invariant under left and right unitary transformations.

Exercise 17 If {A} is a {p \times n} complex matrix for some {1 \leq p \leq n}, show that the augmented matrix

\displaystyle \tilde A := \begin{pmatrix} 0 & A \\ A^* & 0 \end{pmatrix}

is a {p+n \times p+n} Hermitian matrix whose eigenvalues consist of {\pm \sigma_1(A),\ldots,\pm \sigma_p(A)}, together with {n-p} copies of the eigenvalue zero. (This generalises Exercise 16 from Notes 3.) What is the relationship between the singular vectors of {A} and the eigenvectors of {\tilde A}?
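
A numerical check of the spectrum of the augmented matrix (Python/numpy sketch):

    import numpy as np

    rng = np.random.default_rng(14)
    p, n = 3, 5
    A = rng.standard_normal((p, n)) + 1j * rng.standard_normal((p, n))
    sigma = np.linalg.svd(A, compute_uv=False)

    T = np.block([[np.zeros((p, p)), A], [A.conj().T, np.zeros((n, n))]])
    expected = np.sort(np.concatenate([sigma, -sigma, np.zeros(n - p)]))
    assert np.allclose(np.sort(np.linalg.eigvalsh(T)), expected)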

Exercise 18 If {A} is an {n \times n} Hermitian matrix, show that the singular values {\sigma_1(A),\ldots,\sigma_n(A)} of {A} are simply the absolute values {|\lambda_1(A)|,\ldots,|\lambda_n(A)|} of its eigenvalues, arranged in descending order. Show that the same claim also holds when {A} is a normal matrix. What is the relationship between the singular vectors and eigenvectors of {A}?

Remark 6 When {A} is not normal, the relationship between eigenvalues and singular values is more subtle. We will discuss this point in later notes.

Exercise 19 If {A} is a {p \times n} complex matrix for some {1 \leq p \leq n}, show that {AA^*} has eigenvalues {\sigma_1(A)^2,\ldots,\sigma_p(A)^2}, and {A^* A} has eigenvalues {\sigma_1(A)^2,\ldots,\sigma_p(A)^2} together with {n-p} copies of the eigenvalue zero. Based on this observation, give an alternate proof of the singular value decomposition theorem using the spectral theorem for (positive semi-definite) Hermitian matrices.

Exercise 20 Show that the rank of a {p \times n} matrix is equal to the number of non-zero singular values.

Exercise 21 Let {A} be a {p \times n} complex matrix for some {1 \leq p \leq n}. Establish the Courant-Fischer min-max formula

\displaystyle \sigma_i(A) = \sup_{\hbox{dim}(V)=i} \inf_{v \in V: |v|=1} |Av| \ \ \ \ \ (26)

for all {1 \leq i \leq p}, where the supremum ranges over all subspaces of {{\mathbb C}^n} of dimension {i}.

One can use the above exercises to deduce many inequalities about singular values from analogous ones about eigenvalues. We give some examples below.

Exercise 22 Let {A, B} be {p \times n} complex matrices for some {1 \leq p \leq n}.

  • (i) Establish the Weyl inequality {\sigma_{i+j-1}(A+B) \leq \sigma_i(A) + \sigma_j(B)} whenever {1 \leq i,j,i+j-1 \leq p}.
  • (ii) Establish the Lidskii inequality

    \displaystyle \sigma_{i_1}(A+B)+\ldots+\sigma_{i_k}(A+B) \leq \sigma_{i_1}(A)+\ldots+\sigma_{i_k}(A) + \sigma_1(B)+\ldots+\sigma_k(B)

    whenever {1 \leq i_1 < \ldots < i_k \leq p}.

  • (iii) Show that for any {1 \leq k \leq p}, the map {A \mapsto \sigma_1(A)+\ldots+\sigma_k(A)} defines a norm on the space {{\mathbb C}^{p \times n}} of complex {p \times n} matrices (this norm is known as the {k^{th}} Ky Fan norm).
  • (iv) Establish the Weyl inequality {|\sigma_i(A+B) - \sigma_i(A)| \leq \| B \|_{op}} for all {1 \leq i \leq p}.
  • (v) More generally, establish the {q}-Wielandt-Hoffman inequality {\| (\sigma_i(A+B)-\sigma_i(A))_{1 \leq i \leq p} \|_{\ell^q_p} \leq \|B\|_{S^q}} for any {1 \leq q \leq \infty}, where {\|B\|_{S^q} := \| (\sigma_i(B))_{1 \leq i \leq p} \|_{\ell^q_p}} is the {q}-Schatten norm of {B}. (Note that this is consistent with the previous definition of the Schatten norms.)
  • (vi) Show that the {q}-Schatten norm is indeed a norm on {{\mathbb C}^{p \times n}} for any {1 \leq q \leq \infty}.
  • (vii) If {A'} is formed by removing one row from {A}, show that {\sigma_{i+1}(A) \leq \sigma_i(A') \leq \sigma_i(A)} for all {1 \leq i < p}.
  • (viii) If {p<n} and {A'} is formed by removing one column from {A}, show that {\sigma_{i+1}(A) \leq \sigma_i(A') \leq \sigma_i(A)} for all {1 \leq i < p} and {\sigma_p(A') \leq \sigma_p(A)}. What changes when {p=n}?
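
A numerical spot-check of parts (i) and (vii) of Exercise 22 (Python/numpy sketch; the other parts can be tested in the same fashion):

    import numpy as np

    rng = np.random.default_rng(15)
    p, n = 4, 6
    A = rng.standard_normal((p, n)) + 1j * rng.standard_normal((p, n))
    B = rng.standard_normal((p, n)) + 1j * rng.standard_normal((p, n))
    sv = lambda X: np.linalg.svd(X, compute_uv=False)   # descending singular values
    sA, sB, sAB = sv(A), sv(B), sv(A + B)

    # (i): the Weyl inequality for singular values
    for i in range(1, p + 1):
        for j in range(1, p + 2 - i):
            assert sAB[i + j - 2] <= sA[i - 1] + sB[j - 1] + 1e-10

    # (vii): interlacing after deleting one row
    sR = sv(np.delete(A, 0, axis=0))
    for i in range(p - 1):
        assert sA[i + 1] - 1e-10 <= sR[i] <= sA[i] + 1e-10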

Exercise 23 Let {A} be a {p \times n} complex matrix for some {1 \leq p \leq n}. Observe that the linear transformation {A: {\mathbb C}^n \rightarrow {\mathbb C}^p} naturally induces a linear transformation {A^{\wedge k}: \bigwedge^k {\mathbb C}^n \rightarrow \bigwedge^k {\mathbb C}^p} from {k}-forms on {{\mathbb C}^n} to {k}-forms on {{\mathbb C}^p}. We give {\bigwedge^k {\mathbb C}^n} the structure of a Hilbert space by declaring the basic forms {e_{i_1} \wedge \ldots \wedge e_{i_k}} for {1 \leq i_1 < \ldots < i_k \leq n} to be orthonormal.

For any {1 \leq k \leq p}, show that the operator norm of {A^{\wedge k}} is equal to {\sigma_1(A) \ldots \sigma_k(A)}.

Exercise 24 Let {A} be a {p \times n} matrix for some {1 \leq p \leq n}, let {B} be an {r \times p} matrix, and let {C} be an {n \times s} matrix for some {r, s \geq 1}.

Show that {\sigma_i(BA) \leq \|B\|_{op} \sigma_i(A)} and {\sigma_i(AC) \leq \sigma_i(A) \|C\|_{op}} for any {1 \leq i \leq p}.

Exercise 25 Let {A = (a_{ij})_{1 \leq i \leq p; 1 \leq j \leq n}} be a {p \times n} matrix for some {1 \leq p \leq n}, let {i_1,\ldots,i_k \in \{1,\ldots,p\}} be distinct, and let {j_1,\ldots,j_k \in \{1,\ldots,n\}} be distinct. Show that

\displaystyle |a_{i_1 j_1}| + \ldots + |a_{i_k j_k}| \leq \sigma_1(A) + \ldots + \sigma_k(A).

Using this, show that if {j_1,\ldots,j_p \in \{1,\ldots,n\}} are distinct, then

\displaystyle \| (a_{i j_i})_{i=1}^p \|_{\ell^q_p} \leq \|A\|_{S^q}

for every {1 \leq q \leq \infty}.

Exercise 26 Establish the Hölder inequality

\displaystyle |\hbox{tr}(A B^*)| \leq \|A\|_{S^q} \|B\|_{S^{q'}}

whenever {A, B} are {p \times n} complex matrices and {1 \leq q,q' \leq \infty} are such that {1/q+1/q'=1}.

Acknowledgments: Thanks to Allen Knutson for corrections.