Let $A$ be an $n \times n$ Hermitian matrix. By the spectral theorem for Hermitian matrices (which, for sake of completeness, we prove below), one can diagonalise $A$ using a sequence
$$\lambda_1(A) \geq \lambda_2(A) \geq \ldots \geq \lambda_n(A)$$
of $n$ real eigenvalues, together with an orthonormal basis of eigenvectors $u_1(A),\ldots,u_n(A) \in {\bf C}^n$. (The eigenvalues are uniquely determined by $A$, but the eigenvectors have a little ambiguity to them, particularly if there are repeated eigenvalues; for instance, one could multiply each eigenvector $u_i(A)$ by a complex phase $e^{i\theta}$. In these notes we are arranging eigenvalues in descending order; of course, one can also arrange eigenvalues in increasing order, which causes some slight notational changes in the results below.) The set $\{\lambda_1(A),\ldots,\lambda_n(A)\}$ is known as the spectrum of $A$.
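As a concrete numerical illustration of the spectral theorem (an editorial numpy sketch, not part of the original notes; note that `numpy.linalg.eigh` returns eigenvalues in ascending order, so we reverse them to match the descending convention used here):

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a random 5x5 Hermitian matrix A = (X + X*)/2.
X = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
A = (X + X.conj().T) / 2

# eigh returns real eigenvalues in ascending order; reverse to match
# the descending convention lambda_1(A) >= ... >= lambda_n(A).
evals, evecs = np.linalg.eigh(A)
lam = evals[::-1]
U = evecs[:, ::-1]   # columns are the orthonormal eigenbasis u_1(A), ..., u_n(A)

# Check the diagonalisation A = U diag(lam) U^* and the orthonormality of the basis.
assert np.allclose(A, U @ np.diag(lam) @ U.conj().T)
assert np.allclose(U.conj().T @ U, np.eye(5))
assert np.all(np.diff(lam) <= 1e-12)   # eigenvalues are in descending order
```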
A basic question in linear algebra asks the extent to which the eigenvalues $\lambda_1(A),\ldots,\lambda_n(A)$ and $\lambda_1(B),\ldots,\lambda_n(B)$ of two Hermitian matrices $A, B$ constrain the eigenvalues $\lambda_1(A+B),\ldots,\lambda_n(A+B)$ of the sum. For instance, the linearity of trace
$$\hbox{tr}(A+B) = \hbox{tr}(A) + \hbox{tr}(B),$$
when expressed in terms of eigenvalues, gives the trace constraint
$$\lambda_1(A+B) + \ldots + \lambda_n(A+B) = \lambda_1(A) + \ldots + \lambda_n(A) + \lambda_1(B) + \ldots + \lambda_n(B); \qquad (1)$$
the identity
$$\lambda_1(A) = \sup_{|v|=1} v^* A v \qquad (2)$$
gives the inequality
$$\lambda_1(A+B) \leq \lambda_1(A) + \lambda_1(B); \qquad (3)$$
and so forth.
The complete answer to this problem is a fascinating one, requiring a strangely recursive description (once known as Horn’s conjecture, which is now solved), and connected to a large number of other fields of mathematics, such as geometric invariant theory, intersection theory, and the combinatorics of a certain gadget known as a “honeycomb”. See for instance my survey with Allen Knutson on this topic some years ago.
In typical applications to random matrices, one of the matrices (say, $B$) is “small” in some sense, so that $A+B$ is a perturbation of $A$. In this case, one does not need the full strength of the above theory, and can instead rely on a simple aspect of it pointed out by Helmke and Rosenthal and by Totaro, which generates several of the eigenvalue inequalities relating the spectra of $A$, $B$, and $A+B$, of which (1) and (3) are examples. (Actually, this method eventually generates all of the eigenvalue inequalities, but this is a non-trivial fact to prove.) These eigenvalue inequalities can mostly be deduced from a number of minimax characterisations of eigenvalues (of which (2) is a typical example), together with some basic facts about intersections of subspaces. Examples include the Weyl inequalities
$$\lambda_{i+j-1}(A+B) \leq \lambda_i(A) + \lambda_j(B), \qquad (4)$$
valid whenever $i, j \geq 1$ and $i+j-1 \leq n$, and the Ky Fan inequality
$$\lambda_1(A+B) + \ldots + \lambda_k(A+B) \leq \lambda_1(A) + \ldots + \lambda_k(A) + \lambda_1(B) + \ldots + \lambda_k(B). \qquad (5)$$
One consequence of these inequalities is that the spectrum of a Hermitian matrix is stable with respect to small perturbations.
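These inequalities are easy to probe numerically. The following sketch (an editorial addition, not from the original notes) checks the Weyl and Ky Fan inequalities, and the operator-norm stability of the spectrum, on random Hermitian matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6

def rand_herm(n):
    # Random n x n Hermitian matrix.
    X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (X + X.conj().T) / 2

def eigs_desc(A):
    return np.linalg.eigvalsh(A)[::-1]   # eigenvalues in descending order

A, B = rand_herm(n), rand_herm(n)
lA, lB, lAB = eigs_desc(A), eigs_desc(B), eigs_desc(A + B)

# Weyl: lambda_{i+j-1}(A+B) <= lambda_i(A) + lambda_j(B) whenever i+j-1 <= n.
for i in range(1, n + 1):
    for j in range(1, n + 2 - i):
        assert lAB[i + j - 2] <= lA[i - 1] + lB[j - 1] + 1e-10

# Ky Fan: the top-k partial sums of eigenvalues are subadditive.
for k in range(1, n + 1):
    assert lAB[:k].sum() <= lA[:k].sum() + lB[:k].sum() + 1e-10

# Stability: |lambda_i(A+B) - lambda_i(A)| <= ||B||_op for each i.
assert np.max(np.abs(lAB - lA)) <= np.linalg.norm(B, 2) + 1e-10
```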
We will also establish some closely related inequalities concerning the relationships between the eigenvalues of a matrix, and the eigenvalues of its minors.
Many of the inequalities here have analogues for the singular values of non-Hermitian matrices (which is consistent with the discussion near Exercise 16 of Notes 3). However, the situation is markedly different when dealing with eigenvalues of non-Hermitian matrices; here, the spectrum can be far more unstable, if pseudospectrum is present. Because of this, the theory of the eigenvalues of a random non-Hermitian matrix requires an additional ingredient, namely upper bounds on the prevalence of pseudospectrum, which after recentering the matrix is basically equivalent to establishing lower bounds on least singular values. We will discuss this point in more detail in later notes.
We will work primarily here with Hermitian matrices, which can be viewed as self-adjoint transformations on complex vector spaces such as ${\bf C}^n$. One can of course specialise the discussion to real symmetric matrices, in which case one can restrict these complex vector spaces to their real counterparts ${\bf R}^n$. The specialisation of the complex theory below to the real case is straightforward and is left to the interested reader.
— 1. Proof of spectral theorem —
To prove the spectral theorem, it is convenient to work more abstractly, in the context of self-adjoint operators on finite-dimensional Hilbert spaces:
Theorem 1 (Spectral theorem) Let $V$ be a finite-dimensional complex Hilbert space of some dimension $n$, and let $T: V \rightarrow V$ be a self-adjoint operator. Then there exists an orthonormal basis $v_1,\ldots,v_n$ of $V$ and eigenvalues $\lambda_1,\ldots,\lambda_n \in {\bf R}$ such that $T v_i = \lambda_i v_i$ for all $1 \leq i \leq n$.

The spectral theorem as stated in the introduction then follows by specialising to the case $V = {\bf C}^n$ and ordering the eigenvalues in descending order.
Proof: We induct on the dimension $n$. The claim is vacuous for $n=0$, so suppose that $n \geq 1$ and that the claim has already been proven for $n-1$.
Let $v$ be a unit vector in $V$ (thus $\langle v, v \rangle = 1$) that maximises the form $\hbox{Re} \langle Tv, v \rangle$; this maximum exists by compactness. By the method of Lagrange multipliers, $v$ is a critical point of $\hbox{Re} \langle Tv, v \rangle - \lambda \langle v, v \rangle$ for some $\lambda \in {\bf R}$. Differentiating in an arbitrary direction $w \in V$, we conclude that
$$\hbox{Re} ( \langle Tv, w \rangle + \langle Tw, v \rangle - \lambda \langle v, w \rangle - \lambda \langle w, v \rangle ) = 0;$$
this simplifies using self-adjointness to
$$\hbox{Re} \langle Tv - \lambda v, w \rangle = 0.$$
Since $w \in V$ was arbitrary, we conclude that $Tv = \lambda v$, thus $v$ is a unit eigenvector of $T$. By self-adjointness, this implies that the orthogonal complement $v^\perp := \{ w \in V: \langle v, w \rangle = 0 \}$ of $v$ is preserved by $T$. Restricting $T$ to this lower-dimensional subspace and applying the induction hypothesis, we can find an orthonormal basis of eigenvectors of $T$ on $v^\perp$. Adjoining the new unit vector $v$ to that orthonormal basis, we obtain the claim.
Suppose we have a self-adjoint transformation $T: {\bf C}^n \rightarrow {\bf C}^n$, which of course can be identified with a Hermitian matrix $A$. Using the orthonormal eigenbasis provided by the spectral theorem, we can perform an orthonormal change of variables to set that eigenbasis to be the standard basis $e_1,\ldots,e_n$, so that the matrix of $A$ becomes diagonal. This is very useful when dealing with just a single matrix $A$ – for instance, it makes the task of computing functions of $A$, such as $A^k$ or $e^{tA}$, much easier. However, when one has several Hermitian matrices in play (e.g. $A$, $B$, $A+B$), then it is usually not possible to standardise all the eigenbases simultaneously (i.e. to simultaneously diagonalise all the matrices), except when the matrices all commute. Nevertheless one can still normalise one of the eigenbases to be the standard basis, and this is still useful for several applications, as we shall soon see.
Exercise 1 Suppose that the eigenvalues $\lambda_1(A) > \ldots > \lambda_n(A)$ of an $n \times n$ Hermitian matrix $A$ are distinct. Show that the associated eigenbasis $u_1(A),\ldots,u_n(A)$ is unique up to rotating each individual eigenvector $u_i(A)$ by a complex phase $e^{i\theta}$. In particular, the spectral projections $P_i(A) := u_i(A) u_i(A)^*$ are unique. What happens when there is eigenvalue multiplicity?
— 2. Minimax formulae —
The eigenvalue functional $A \mapsto \lambda_i(A)$ is not a linear functional (except in dimension one). It is not even a convex functional (except when $i=1$) or a concave functional (except when $i=n$). However, it is the next best thing, namely it is a minimax expression of linear functionals. (Note that a convex functional is the same thing as a max of linear functionals, while a concave functional is the same thing as a min of linear functionals.) More precisely, we have

Theorem 2 (Courant-Fischer min-max theorem) Let $A$ be an $n \times n$ Hermitian matrix. Then we have
$$\lambda_i(A) = \max_{\dim(V) = i} \min_{v \in V: |v|=1} v^* A v \qquad (6)$$
and
$$\lambda_i(A) = \min_{\dim(V) = n-i+1} \max_{v \in V: |v|=1} v^* A v \qquad (7)$$
for all $1 \leq i \leq n$, where $V$ ranges over all subspaces of ${\bf C}^n$ with the indicated dimension.
We first verify the $i=1$ case, i.e. (2). By the spectral theorem, we can assume that $A$ has the standard eigenbasis $e_1,\ldots,e_n$, in which case we have
$$v^* A v = \sum_{i=1}^n \lambda_i(A) |v_i|^2 \qquad (8)$$
whenever $v = (v_1,\ldots,v_n)$. The claim (2) is then easily verified.
To prove the general case of (6) (the companion formula (7) is proven similarly, or can be deduced from (6) by replacing $A$ with $-A$), we may again assume $A$ has the standard eigenbasis. By considering the space $V$ spanned by $e_1,\ldots,e_i$, we easily see the inequality
$$\lambda_i(A) \leq \max_{\dim(V)=i} \min_{v \in V: |v|=1} v^* A v,$$
so we only need to prove the reverse inequality. In other words, for every $i$-dimensional subspace $V$ of ${\bf C}^n$, we have to show that $V$ contains a unit vector $v$ such that
$$v^* A v \leq \lambda_i(A).$$
Let $W$ be the space spanned by $e_i,\ldots,e_n$. This space has codimension $i-1$, so it must have non-trivial intersection with $V$. If we let $v$ be a unit vector in $V \cap W$, the claim then follows from (8).
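The max-min characterisation is easy to see in action numerically (an editorial sketch, not from the original notes): for a subspace with orthonormal basis $Q$, the minimum of the quadratic form over unit vectors in the subspace is the least eigenvalue of the compressed matrix $Q^* A Q$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, i = 6, 3
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (X + X.conj().T) / 2
evals, U = np.linalg.eigh(A)
lam, U = evals[::-1], U[:, ::-1]   # descending eigenvalues and matching eigenbasis

def min_rayleigh(A, Q):
    # Minimum of v* A v over unit vectors v in the column span of Q
    # (Q has orthonormal columns) = least eigenvalue of Q* A Q.
    return np.linalg.eigvalsh(Q.conj().T @ A @ Q)[0]

# The span of the top i eigenvectors attains the maximum: value lambda_i(A).
assert abs(min_rayleigh(A, U[:, :i]) - lam[i - 1]) < 1e-10

# Any other i-dimensional subspace does no better.
for _ in range(200):
    V = rng.standard_normal((n, i)) + 1j * rng.standard_normal((n, i))
    Q, _ = np.linalg.qr(V)
    assert min_rayleigh(A, Q) <= lam[i - 1] + 1e-10
```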
Remark 1 By homogeneity, one can replace the restriction $|v|=1$ with $v \neq 0$ provided that one replaces the quadratic form $v^* A v$ with the Rayleigh quotient $v^* A v / v^* v$.
A closely related formula is as follows. Given an $n \times n$ Hermitian matrix $A$ and an $i$-dimensional subspace $V$ of ${\bf C}^n$, we define the partial trace $\hbox{tr}(A\downharpoonright_V)$ to be the expression
$$\hbox{tr}(A\downharpoonright_V) := \sum_{j=1}^i v_j^* A v_j$$
where $v_1,\ldots,v_i$ is any orthonormal basis of $V$. It is easy to see that this expression is independent of the choice of orthonormal basis, and so the partial trace is well-defined.

Proposition 3 (Extremal partial trace) Let $A$ be an $n \times n$ Hermitian matrix. Then
$$\lambda_1(A) + \ldots + \lambda_i(A) = \max_{\dim(V)=i} \hbox{tr}(A\downharpoonright_V)$$
and
$$\lambda_{n-i+1}(A) + \ldots + \lambda_n(A) = \min_{\dim(V)=i} \hbox{tr}(A\downharpoonright_V).$$

As a corollary, we see that $A \mapsto \lambda_1(A) + \ldots + \lambda_i(A)$ is a convex function, and $A \mapsto \lambda_{n-i+1}(A) + \ldots + \lambda_n(A)$ is a concave function.
Proof: Again, by symmetry (replacing $A$ with $-A$) it suffices to prove the first formula. As before, we may assume without loss of generality that $A$ has the standard eigenbasis $e_1,\ldots,e_n$. By selecting $V$ to be the span of $e_1,\ldots,e_i$ we have the inequality
$$\lambda_1(A) + \ldots + \lambda_i(A) \leq \max_{\dim(V)=i} \hbox{tr}(A\downharpoonright_V),$$
so it suffices to prove the reverse inequality. For this we induct on dimension. If $V$ has dimension $i$, then it has an $i-1$-dimensional subspace $V'$ that is contained in the span of $e_2,\ldots,e_n$. By the induction hypothesis applied to the restriction of $A$ to that span (which has eigenvalues $\lambda_2(A),\ldots,\lambda_n(A)$), we have
$$\hbox{tr}(A\downharpoonright_{V'}) \leq \lambda_2(A) + \ldots + \lambda_i(A).$$
On the other hand, if $v$ is a unit vector in the orthogonal complement of $V'$ in $V$, we see from (2) that
$$v^* A v \leq \lambda_1(A).$$
Adding the two inequalities we obtain the claim.
Specialising Proposition 3 to the case when $V$ is a coordinate subspace (i.e. the span of $i$ of the basis vectors $e_1,\ldots,e_n$), we conclude the Schur-Horn inequalities
$$\lambda_{n-i+1}(A) + \ldots + \lambda_n(A) \leq a_{j_1 j_1} + \ldots + a_{j_i j_i} \leq \lambda_1(A) + \ldots + \lambda_i(A) \qquad (9)$$
for any $1 \leq j_1 < \ldots < j_i \leq n$, where $a_{11},\ldots,a_{nn}$ are the diagonal entries of $A$.
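These diagonal-versus-spectrum constraints can be checked exhaustively on a small random example (an editorial sketch, not part of the original notes):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(4)
n = 5
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (X + X.conj().T) / 2
lam = np.linalg.eigvalsh(A)[::-1]      # descending eigenvalues
d = np.real(np.diag(A))                # diagonal entries (real, since A is Hermitian)

# Schur-Horn: every sum of i diagonal entries is sandwiched between
# the bottom-i and top-i partial sums of the spectrum.
for i in range(1, n + 1):
    top = lam[:i].sum()
    bottom = lam[n - i:].sum()
    for J in combinations(range(n), i):
        s = d[list(J)].sum()
        assert bottom - 1e-10 <= s <= top + 1e-10
```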
Exercise 2 Show that the inequalities (9) are equivalent to the assertion that the diagonal entries $(a_{11},\ldots,a_{nn})$ of a Hermitian matrix $A$ lie in the permutahedron of $(\lambda_1(A),\ldots,\lambda_n(A))$, defined as the convex hull of the $n!$ permutations of $(\lambda_1(A),\ldots,\lambda_n(A))$ in ${\bf R}^n$.
Remark 2 It is a theorem of Schur and Horn that these are the complete set of inequalities connecting the diagonal entries of a Hermitian matrix to its spectrum. To put it another way, the image of the coadjoint orbit of matrices with a given spectrum $\lambda$ under the diagonal map $A \mapsto (a_{11},\ldots,a_{nn})$ is the permutahedron of $\lambda$. Note that the vertices of this permutahedron can be attained by considering the diagonal matrices inside this coadjoint orbit, whose entries are then a permutation of the eigenvalues. One can interpret this diagonal map as the moment map associated with the conjugation action of the standard maximal torus of $U(n)$ (i.e. the diagonal unitary matrices) on the coadjoint orbit. When viewed in this fashion, the Schur-Horn theorem can be viewed as the special case of the more general Atiyah convexity theorem (also proven independently by Guillemin and Sternberg) in symplectic geometry. Indeed, the topic of eigenvalues of Hermitian matrices turns out to be quite profitably viewed as a question in symplectic geometry (and also in algebraic geometry, particularly when viewed through the machinery of geometric invariant theory).
Exercise 3 (Wielandt minimax formula) Let $1 \leq i_1 < i_2 < \ldots < i_k \leq n$ be integers. Define a partial flag to be a nested collection $V_1 \subset \ldots \subset V_k$ of subspaces of ${\bf C}^n$ such that $\dim(V_j) = i_j$ for all $1 \leq j \leq k$. Define the associated Schubert variety $X(V_1,\ldots,V_k)$ to be the collection of all $k$-dimensional subspaces $W$ such that $\dim(W \cap V_j) \geq j$ for all $1 \leq j \leq k$. Show that for any $n \times n$ Hermitian matrix $A$,
$$\lambda_{i_1}(A) + \ldots + \lambda_{i_k}(A) = \max_{V_1,\ldots,V_k} \min_{W \in X(V_1,\ldots,V_k)} \hbox{tr}(A \downharpoonright_W),$$
where the maximum ranges over all partial flags with the indicated dimensions.
— 3. Eigenvalue inequalities —
The basic observation here is the linearity of the partial trace,
$$\hbox{tr}((A+B)\downharpoonright_V) = \hbox{tr}(A\downharpoonright_V) + \hbox{tr}(B\downharpoonright_V) \qquad (10)$$
for any subspace $V$. Combining this with Proposition 3, we immediately conclude the Ky Fan inequality (5).
Similarly, combining the Wielandt minimax formula from Exercise 3 with (10), we obtain the Lidskii inequality
$$\lambda_{i_1}(A+B) + \ldots + \lambda_{i_k}(A+B) \leq \lambda_{i_1}(A) + \ldots + \lambda_{i_k}(A) + \lambda_1(B) + \ldots + \lambda_k(B) \qquad (12)$$
for any $1 \leq i_1 < \ldots < i_k \leq n$.
In a similar spirit, using the inequality
$$|v^* B v| \leq \|B\|_{op} \hbox{ for all unit vectors } v,$$
together with the minimax formulae of Theorem 2, we obtain the bound
$$|\lambda_i(A+B) - \lambda_i(A)| \leq \|B\|_{op}; \qquad (13)$$
thus the spectrum of $A+B$ is close to that of $A$ if $B$ is small in operator norm. In particular, we see that the map $A \mapsto \lambda_i(A)$ is Lipschitz continuous on the space of Hermitian matrices, for fixed $1 \leq i \leq n$.
The Weyl inequality (4) can be proven in the same spirit. By (6), it suffices to show that every subspace $V$ of dimension $i+j-1$ contains a unit vector $v$ with $v^* (A+B) v \leq \lambda_i(A) + \lambda_j(B)$. But from (6) (or its dual formulation), one can find a subspace of codimension $i-1$ such that $v^* A v \leq \lambda_i(A)$ for all unit vectors $v$ in it, and a subspace of codimension $j-1$ such that $v^* B v \leq \lambda_j(B)$ for all unit vectors $v$ in it. The intersection of these two subspaces has codimension at most $i+j-2$ and so has a nontrivial intersection with $V$; and the claim follows.
Remark 3 More generally, one can generate an eigenvalue inequality whenever the intersection number of three Schubert varieties of compatible dimensions is non-zero; see the paper of Helmke and Rosenthal. In fact, this generates a complete set of inequalities; this is a result of Klyachko. One can in fact restrict attention to those varieties whose intersection number is exactly one; this is a result of Knutson, Woodward, and myself. Finally, in those cases, the fact that the intersection number is one can be proven by entirely elementary means (based on the standard inequalities relating the dimensions of two subspaces $V, W$ to the dimensions of their intersection $V \cap W$ and sum $V + W$); this is a result of Bercovici, Collins, Dykema, Li, and Timotin. As a consequence, the methods in this section can, in principle, be used to derive all possible eigenvalue inequalities for sums of Hermitian matrices.
Exercise 5 Establish the dual Lidskii inequality
$$\lambda_{i_1}(A+B) + \ldots + \lambda_{i_k}(A+B) \geq \lambda_{i_1}(A) + \ldots + \lambda_{i_k}(A) + \lambda_{n-k+1}(B) + \ldots + \lambda_n(B)$$
for any $1 \leq i_1 < \ldots < i_k \leq n$, and the dual Weyl inequality
$$\lambda_{i+j-n}(A+B) \geq \lambda_i(A) + \lambda_j(B)$$
whenever $1 \leq i, j \leq n$ and $i+j-n \geq 1$.
Exercise 6 Use the Lidskii inequality to establish the more general inequality
$$\sum_{i=1}^n c_i \lambda_i(A+B) \leq \sum_{i=1}^n c_i \lambda_i(A) + \sum_{i=1}^n c_i^* \lambda_i(B)$$
whenever $c_1,\ldots,c_n \geq 0$, and $c_1^* \geq \ldots \geq c_n^* \geq 0$ is the decreasing rearrangement of $c_1,\ldots,c_n$. (Hint: express $c_i$ as the integral of ${\bf 1}_{c_i \geq \lambda}$ as $\lambda$ runs from $0$ to infinity. For each fixed $\lambda$, apply (12).) Combine this with Hölder’s inequality to conclude the $p$-Wielandt-Hoffman inequality
$$\| (\lambda_i(A+B) - \lambda_i(A))_{i=1}^n \|_{\ell^p_n} \leq \|B\|_{S^p} \qquad (14)$$
for any $1 \leq p \leq \infty$, where $\|(a_i)_{i=1}^n\|_{\ell^p_n} := (\sum_{i=1}^n |a_i|^p)^{1/p}$ (with the usual convention when $p = \infty$) and
$$\|B\|_{S^p} := \| (\lambda_i(B))_{i=1}^n \|_{\ell^p_n}$$
is the $p$-Schatten norm of $B$.
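The Wielandt-Hoffman family of inequalities is easy to spot-check for several exponents at once (an editorial numpy sketch, not from the original notes):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 6

def rand_herm(n):
    X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (X + X.conj().T) / 2

def schatten(B, p):
    # p-Schatten norm of Hermitian B: the l^p norm of its eigenvalues.
    return np.linalg.norm(np.linalg.eigvalsh(B), ord=p)

A, B = rand_herm(n), rand_herm(n)
lA = np.linalg.eigvalsh(A)[::-1]
lAB = np.linalg.eigvalsh(A + B)[::-1]

# p-Wielandt-Hoffman: || (lambda_i(A+B) - lambda_i(A))_i ||_{l^p} <= ||B||_{S^p}.
for p in (1, 2, 4, np.inf):
    assert np.linalg.norm(lAB - lA, ord=p) <= schatten(B, p) + 1e-10
```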
Exercise 7 Show that the $p$-Schatten norms are indeed norms on the space of $n \times n$ Hermitian matrices for every $1 \leq p \leq \infty$.
Exercise 9 Establish the Hölder inequality
$$|\hbox{tr}(AB)| \leq \|A\|_{S^p} \|B\|_{S^{p'}}$$
whenever $1 \leq p, p' \leq \infty$ with $1/p + 1/p' = 1$, and $A, B$ are $n \times n$ Hermitian matrices. (Hint: Diagonalise one of the matrices and use the preceding exercise.)
The most important cases of the Schatten norms are $p = \infty$, in which case $\|B\|_{S^\infty}$ is just the operator norm $\|B\|_{op}$, and $p=2$, in which case
$$\|B\|_{S^2} = (\sum_{i=1}^n \sum_{j=1}^n |b_{ij}|^2)^{1/2}$$
is the Frobenius norm $\|B\|_F$, where $b_{ij}$ are the coefficients of $B$. (The $1$-Schatten norm $\|B\|_{S^1}$, also known as the nuclear norm or trace class norm, is important in a number of applications, such as matrix completion, but will not be used often in this course.) Thus we see that the $p=2$ case of the Wielandt-Hoffman inequality can be written as
$$\sum_{i=1}^n |\lambda_i(A+B) - \lambda_i(A)|^2 \leq \|B\|_F^2. \qquad (17)$$
We will give an alternate proof of this inequality, based on eigenvalue deformation, in the next section.
— 4. Eigenvalue deformation —
From the Weyl inequality (13), we know that the eigenvalue maps $A \mapsto \lambda_i(A)$ are Lipschitz continuous on Hermitian matrices (and thus also on real symmetric matrices). It turns out that we can obtain better regularity, provided that we avoid repeated eigenvalues. Fortunately, repeated eigenvalues are rare:
Exercise 10 (Dimension count) Suppose that $n \geq 2$. Show that the space of Hermitian matrices with at least one repeated eigenvalue has codimension $3$ in the space of all Hermitian matrices, and the space of real symmetric matrices with at least one repeated eigenvalue has codimension $2$ in the space of all real symmetric matrices. (When $n = 1$, repeated eigenvalues of course do not occur.)
Let us say that a Hermitian matrix has simple spectrum if it has no repeated eigenvalues. We thus see from the above exercise and (13) that the set of Hermitian matrices with simple spectrum forms an open dense set in the space of all Hermitian matrices, and similarly for real symmetric matrices; thus simple spectrum is the generic behaviour of such matrices. Indeed, the unexpectedly high codimension of the non-simple matrices (naively, one would expect a codimension $1$ set for a collision between, say, $\lambda_i(A)$ and $\lambda_{i+1}(A)$) suggests a repulsion phenomenon: because it is unexpectedly rare for eigenvalues to be equal, there must be some “force” that “repels” eigenvalues of Hermitian (and to a lesser extent, real symmetric) matrices from getting too close to each other. We now develop some machinery to make this more precise.
We first observe that when $A$ has simple spectrum, the zeroes of the characteristic polynomial $\lambda \mapsto \det(A - \lambda I)$ are simple (i.e. the polynomial has nonzero derivative at those zeroes). From this and the inverse function theorem, we see that each of the eigenvalue maps $A \mapsto \lambda_i(A)$ are smooth on the region where $A$ has simple spectrum. Because the eigenvectors $u_i(A)$ are determined (up to phase) by the equations $(A - \lambda_i(A)) u_i(A) = 0$ and $u_i(A)^* u_i(A) = 1$, another application of the inverse function theorem tells us that we can (locally) select the maps $A \mapsto u_i(A)$ to also be smooth. (There may be topological obstructions to smoothly selecting these vectors globally, but this will not concern us here as we will be performing a local analysis only. In later notes, we will in fact not work with the $u_i(A)$ at all due to their phase ambiguity, and work instead with the spectral projections $P_i(A) := u_i(A) u_i(A)^*$, which do not have this ambiguity.)
Now suppose that $A = A(t)$ depends smoothly on a time parameter $t$ and stays within the region of simple spectrum, so that the eigenvalues $\lambda_i(A)$ and (locally chosen) eigenvectors $u_i(A)$ also depend smoothly on $t$. We can then differentiate the equations
$$A u_i(A) = \lambda_i(A) u_i(A) \hbox{ and } u_i(A)^* u_i(A) = 1$$
to obtain various equations of motion for $\lambda_i(A)$ and $u_i(A)$ in terms of the derivatives of $A$. For instance, one obtains the first Hadamard variation formula
$$\frac{d}{dt} \lambda_i(A) = u_i(A)^* \dot A u_i(A).$$
This can already be used to give alternate proofs of various eigenvalue inequalities. For instance, if we apply this to $A(t) := A + tB$, we see that
$$\frac{d}{dt} \lambda_i(A+tB) = u_i(A+tB)^* B u_i(A+tB)$$
whenever $A+tB$ has simple spectrum. The right-hand side can be bounded in magnitude by $\|B\|_{op}$, and so we see that the map $t \mapsto \lambda_i(A+tB)$ is Lipschitz with norm $\|B\|_{op}$ whenever $A+tB$ has simple spectrum, which happens for generic $A, B$ (and all $t$) by Exercise 10. By the fundamental theorem of calculus, we thus conclude (13).
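The first variation formula can be verified against a finite difference (an editorial numpy sketch, not from the original notes; a generic random matrix has simple spectrum, so the formula applies):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5

def rand_herm(n):
    X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (X + X.conj().T) / 2

A, B = rand_herm(n), rand_herm(n)
i, h = 2, 1e-6

evals, U = np.linalg.eigh(A)
lam, U = evals[::-1], U[:, ::-1]   # descending convention
u = U[:, i]                        # eigenvector of the (i+1)-th largest eigenvalue

# Hadamard first variation: d/dt lambda(A + tB) at t=0 equals u* B u.
predicted = np.real(u.conj() @ B @ u)
lam_plus  = np.linalg.eigvalsh(A + h * B)[::-1][i]
lam_minus = np.linalg.eigvalsh(A - h * B)[::-1][i]
finite_diff = (lam_plus - lam_minus) / (2 * h)

assert abs(predicted - finite_diff) < 1e-5
```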
Exercise 11 Use a similar argument to the one above to establish (17) without using minimax formulae or Lidskii’s inequality.
One can also compute the second derivative of eigenvalues, obtaining the second Hadamard variation formula
$$\frac{d^2}{dt^2} \lambda_i(A) = u_i(A)^* \ddot A u_i(A) + 2 \sum_{j \neq i} \frac{|u_j(A)^* \dot A u_i(A)|^2}{\lambda_i(A) - \lambda_j(A)} \qquad (23)$$
whenever $A$ has simple spectrum and $1 \leq i \leq n$.
If one interprets the second derivative of the eigenvalues as being proportional to a “force” on those eigenvalues (in analogy with Newton’s second law), (23) is asserting that each eigenvalue “repels” the other eigenvalues by exerting a force that is inversely proportional to their separation (and also proportional to the square of the matrix coefficient of in the eigenbasis). See this earlier blog post for more discussion.
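The repulsion formula (23) can also be checked numerically (an editorial sketch, not from the original notes; for the linear path $A + tB$ the second-derivative term of the path itself vanishes, and I use a diagonal $A$ with well-separated eigenvalues so the finite difference is stable):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5

# Diagonal A with simple, well-separated spectrum; random Hermitian direction B.
A = np.diag(np.arange(n, dtype=float)).astype(complex)
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = (X + X.conj().T) / 2

evals, U = np.linalg.eigh(A)
lam, U = evals[::-1], U[:, ::-1]   # descending convention
i, h = 2, 1e-3

# Second Hadamard variation along A(t) = A + tB:
#   d^2/dt^2 lambda_i = 2 * sum_{j != i} |u_j^* B u_i|^2 / (lambda_i - lambda_j).
pred = 2 * sum(abs(U[:, j].conj() @ B @ U[:, i]) ** 2 / (lam[i] - lam[j])
               for j in range(n) if j != i)

f = lambda t: np.linalg.eigvalsh(A + t * B)[::-1][i]
finite_diff = (f(h) - 2 * f(0) + f(-h)) / h ** 2

assert abs(pred - finite_diff) < 1e-3
```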
Remark 4 In the proof of the four moment theorem of Van Vu and myself, which we will discuss in a subsequent lecture, we will also need the variation formulae for the third, fourth, and fifth derivatives of the eigenvalues (the first four derivatives match up with the four moments mentioned in the theorem, and the fifth derivative is needed to control error terms). Fortunately, we do not need the precise formulae for these derivatives (which, as one can imagine, are quite complicated), but only their general form, and in particular an upper bound for these derivatives in terms of more easily computable quantities.
— 5. Minors —
In the previous sections, we perturbed $n \times n$ Hermitian matrices $A$ by adding a (small) Hermitian correction matrix $B$ to them to form a new Hermitian matrix $A+B$. Another important way to perturb a matrix is to pass to a principal minor, for instance to the top left $n-1 \times n-1$ minor $A_{n-1}$ of an $n \times n$ matrix $A_n$. There is an important relationship between the eigenvalues of the two matrices:
Exercise 13 (Cauchy interlacing law) For any $n \times n$ Hermitian matrix $A_n$ with top left $n-1 \times n-1$ minor $A_{n-1}$, show that
$$\lambda_{i+1}(A_n) \leq \lambda_i(A_{n-1}) \leq \lambda_i(A_n) \qquad (24)$$
for all $1 \leq i < n$. (Hint: use the Courant-Fischer min-max theorem, Theorem 2.) Show furthermore that the space of $A_n$ for which equality holds in one of the inequalities in (24) has codimension $2$ (for Hermitian matrices) or $1$ (for real symmetric matrices).
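Interlacing is easy to observe numerically (an editorial numpy sketch, not from the original notes):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 6
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (X + X.conj().T) / 2
lam_n = np.linalg.eigvalsh(A)[::-1]             # spectrum of A_n, descending
lam_m = np.linalg.eigvalsh(A[:-1, :-1])[::-1]   # spectrum of the minor A_{n-1}

# Cauchy interlacing: lambda_{i+1}(A_n) <= lambda_i(A_{n-1}) <= lambda_i(A_n).
for i in range(n - 1):
    assert lam_n[i + 1] - 1e-12 <= lam_m[i] <= lam_n[i] + 1e-12
```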
Remark 5 If one takes successive minors $A_1, A_2, \ldots, A_n$ of an $n \times n$ Hermitian matrix $A_n$, and computes their spectra, then (24) shows that this triangular array of $\frac{n(n+1)}{2}$ numbers forms a pattern known as a Gelfand-Tsetlin pattern. These patterns are discussed a little more in this blog post.
One can obtain a more precise formula for the eigenvalues of $A_n$ in terms of those for $A_{n-1}$:
Exercise 15 (Eigenvalue equation) Let $A_n$ be an $n \times n$ Hermitian matrix with top left $n-1 \times n-1$ minor $A_{n-1}$. Suppose that $\lambda$ is an eigenvalue of $A_n$ distinct from all the eigenvalues of $A_{n-1}$ (and thus simple, by (24)). Show that
$$\sum_{j=1}^{n-1} \frac{|u_j(A_{n-1})^* X|^2}{\lambda_j(A_{n-1}) - \lambda} = a_{nn} - \lambda \qquad (25)$$
where $a_{nn}$ is the bottom right entry of $A_n$, and $X \in {\bf C}^{n-1}$ is the right column of $A_n$ (minus the bottom entry). (Hint: Expand out the eigenvalue equation $A_n u = \lambda u$ into the ${\bf C}^{n-1}$ and ${\bf C}$ components.) Note the similarities between (25) and (23).
Observe that the function $\lambda \mapsto \sum_{j=1}^{n-1} \frac{|u_j(A_{n-1})^* X|^2}{\lambda_j(A_{n-1}) - \lambda}$ is a rational function of $\lambda$ which is increasing away from the eigenvalues of $A_{n-1}$, where it has a pole (except in the rare case when the inner product $u_j(A_{n-1})^* X$ vanishes, in which case it can have a removable singularity). By graphing this function one can see that the interlacing formula (24) can also be interpreted as a manifestation of the intermediate value theorem.

The identity (25) suggests that under typical circumstances, an eigenvalue $\lambda$ of $A_n$ can only get close to an eigenvalue $\lambda_j(A_{n-1})$ if the associated inner product $u_j(A_{n-1})^* X$ is small. This type of observation is useful to achieve eigenvalue repulsion – to show that it is unlikely that the gap between two adjacent eigenvalues is small. We shall see examples of this in later notes.
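The eigenvalue equation of Exercise 15 can be tested directly (an editorial numpy sketch, not from the original notes; eigenvalues that happen to lie very close to the minor's spectrum are skipped, since the identity degenerates there numerically):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 5
Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (Z + Z.conj().T) / 2

minor = A[:-1, :-1]
X = A[:-1, -1]                  # right column of A, minus the bottom entry
a_nn = np.real(A[-1, -1])       # bottom right entry

mu, W = np.linalg.eigh(minor)   # eigenvalues/eigenvectors of the minor
checked = 0
for lam in np.linalg.eigvalsh(A):
    if np.min(np.abs(mu - lam)) < 1e-3:
        continue                # avoid near-degenerate denominators
    # sum_j |u_j(A_{n-1})^* X|^2 / (lambda_j(A_{n-1}) - lambda) = a_nn - lambda
    lhs = np.sum(np.abs(W.conj().T @ X) ** 2 / (mu - lam))
    assert abs(lhs - (a_nn - lam)) < 1e-6
    checked += 1

assert checked > 0              # the identity was actually exercised
```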
— 6. Singular values —
The theory of eigenvalues of Hermitian matrices has an analogue in the theory of singular values of non-Hermitian matrices. We first begin with the counterpart to the spectral theorem, namely the singular value decomposition.
Theorem 4 (Singular value decomposition) Let $0 \leq p \leq n$, and let $M$ be a linear transformation from an $n$-dimensional complex Hilbert space $U$ to a $p$-dimensional complex Hilbert space $V$. (In particular, $M$ could be a $p \times n$ matrix with complex entries, viewed as a linear transformation from ${\bf C}^n$ to ${\bf C}^p$.) Then there exist non-negative real numbers
$$\sigma_1(M) \geq \ldots \geq \sigma_p(M) \geq 0$$
(known as the singular values of $M$) and orthonormal sets $u_1(M),\ldots,u_p(M) \in U$ and $v_1(M),\ldots,v_p(M) \in V$ (known as singular vectors of $M$), such that
$$M u_j = \sigma_j v_j \hbox{ and } M^* v_j = \sigma_j u_j$$
for all $1 \leq j \leq p$, where we abbreviate $u_j = u_j(M)$, etc.

Furthermore, $Mu = 0$ whenever $u$ is orthogonal to all of the $u_1(M),\ldots,u_p(M)$.
We adopt the convention that $\sigma_i(M) = 0$ for $i > p$. The above theorem only applies to matrices with at least as many columns as rows, but one can also extend the definition to matrices with more rows than columns by adopting the convention $\sigma_i(M^*) := \sigma_i(M)$ (it is easy to check that this extension is consistent on square matrices). All of the results below extend (with minor modifications) to the case when there are more rows than columns, but we have not displayed those extensions here in order to simplify the notation.
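The conclusions of Theorem 4 map directly onto `numpy.linalg.svd` (an editorial sketch, not part of the original notes; numpy factors $M = W \, \hbox{diag}(\sigma) \, U^*$, so the columns of $U$ and $W$ play the roles of the singular vectors $u_j$ and $v_j$):

```python
import numpy as np

rng = np.random.default_rng(10)
p, n = 3, 5
M = rng.standard_normal((p, n)) + 1j * rng.standard_normal((p, n))

# Thin SVD: M = W @ diag(sigma) @ Uh, with sigma in descending order.
W, sigma, Uh = np.linalg.svd(M, full_matrices=False)
U = Uh.conj().T      # columns u_1, ..., u_p live in C^n
# columns of W are v_1, ..., v_p in C^p

for j in range(p):
    assert np.allclose(M @ U[:, j], sigma[j] * W[:, j])          # M u_j = sigma_j v_j
    assert np.allclose(M.conj().T @ W[:, j], sigma[j] * U[:, j]) # M* v_j = sigma_j u_j

# M kills any vector orthogonal to u_1, ..., u_p; the full SVD exposes
# such a vector as a trailing column.
_, _, Uh_full = np.linalg.svd(M, full_matrices=True)
w = Uh_full.conj().T[:, -1]
assert np.linalg.norm(M @ w) < 1e-10
```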
Proof: We induct on $p$. The claim is vacuous for $p = 0$, so suppose that $p \geq 1$ and that the claim has already been proven for $p-1$.

We follow a similar strategy to the proof of Theorem 1. We may assume that $M$ is not identically zero, as the claim is obvious otherwise. The function $u \mapsto \|Mu\|^2$ is continuous on the unit sphere of $U$, so there exists a unit vector $u_1$ which maximises this quantity. If we set $\sigma_1 := \|Mu_1\| > 0$, one easily verifies that $u_1$ is a critical point of the map $u \mapsto \|Mu\|^2 - \sigma_1^2 \|u\|^2$, which then implies that $M^* M u_1 = \sigma_1^2 u_1$. Thus, if we set $v_1 := M u_1 / \sigma_1$, then $M u_1 = \sigma_1 v_1$ and $M^* v_1 = \sigma_1 u_1$. This implies that $M$ maps the orthogonal complement $u_1^\perp$ of $u_1$ in $U$ to the orthogonal complement $v_1^\perp$ of $v_1$ in $V$. By induction hypothesis, the restriction of $M$ to $u_1^\perp$ (and $v_1^\perp$) then admits a singular value decomposition with singular values $\sigma_2 \geq \ldots \geq \sigma_p \geq 0$ and singular vectors $u_2,\ldots,u_p \in u_1^\perp$, $v_2,\ldots,v_p \in v_1^\perp$, with the stated properties. By construction we see that $\sigma_2,\ldots,\sigma_p$ are less than or equal to $\sigma_1$. If we now adjoin $\sigma_1, u_1, v_1$ to the other singular values and vectors we obtain the claim.
Exercise 16 Show that the singular values $\sigma_1(M) \geq \ldots \geq \sigma_p(M) \geq 0$ of a $p \times n$ matrix $M$ are unique. If we have $\sigma_1(M) > \ldots > \sigma_p(M) > 0$, show that the singular vectors are unique up to rotating each pair $u_j(M), v_j(M)$ by a common complex phase.
By construction (and the above uniqueness claim) we see that $\sigma_i(UMV) = \sigma_i(M)$ whenever $M$ is a $p \times n$ matrix, $U$ is a unitary $p \times p$ matrix, and $V$ is a unitary $n \times n$ matrix. Thus the singular spectrum of a matrix is invariant under left and right unitary transformations.
Exercise 17 Let $M$ be a $p \times n$ complex matrix for some $1 \leq p \leq n$. Show that the augmented matrix
$$\tilde M := \begin{pmatrix} 0 & M \\ M^* & 0 \end{pmatrix}$$
is a $p+n \times p+n$ Hermitian matrix whose eigenvalues consist of $\sigma_1(M), -\sigma_1(M), \ldots, \sigma_p(M), -\sigma_p(M)$, together with $n-p$ copies of the eigenvalue zero. (This generalises Exercise 16 from Notes 3.) What is the relationship between the singular vectors of $M$ and the eigenvectors of $\tilde M$?
Exercise 18 If $M$ is an $n \times n$ Hermitian matrix, show that the singular values $\sigma_1(M),\ldots,\sigma_n(M)$ of $M$ are simply the absolute values $|\lambda_1(M)|,\ldots,|\lambda_n(M)|$ of the eigenvalues of $M$, arranged in descending order. Show that the same claim also holds when $M$ is a normal matrix. What is the relationship between the singular vectors and eigenvectors of $M$?
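The Hermitian case of this exercise is a one-line numerical check (an editorial sketch, not from the original notes):

```python
import numpy as np

rng = np.random.default_rng(11)
n = 5
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (X + X.conj().T) / 2   # random Hermitian matrix

sing = np.linalg.svd(A, compute_uv=False)                 # descending
eig_abs = np.sort(np.abs(np.linalg.eigvalsh(A)))[::-1]    # |eigenvalues|, descending

# Singular values of a Hermitian matrix = sorted absolute eigenvalues.
assert np.allclose(sing, eig_abs)
```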
Remark 6 When is not normal, the relationship between eigenvalues and singular values is more subtle. We will discuss this point in later notes.
Exercise 19 If $M$ is a $p \times n$ complex matrix for some $1 \leq p \leq n$, show that $MM^*$ has eigenvalues $\sigma_1(M)^2,\ldots,\sigma_p(M)^2$, and $M^*M$ has eigenvalues $\sigma_1(M)^2,\ldots,\sigma_p(M)^2$ together with $n-p$ copies of the eigenvalue zero. Based on this observation, give an alternate proof of the singular value decomposition theorem using the spectral theorem for (positive semi-definite) Hermitian matrices.
Exercise 20 Show that the rank of a $p \times n$ matrix is equal to the number of non-zero singular values.
Exercise 21 Establish the minimax formula
$$\sigma_i(M) = \max_{\dim(V)=i} \min_{v \in V: |v|=1} \|Mv\|$$
for all $1 \leq i \leq p$, where the supremum ranges over all subspaces $V$ of ${\bf C}^n$ of dimension $i$.
One can use the above exercises to deduce many inequalities about singular values from analogous ones about eigenvalues. We give some examples below.
Exercise 22 Let $M, N$ be $p \times n$ complex matrices for some $1 \leq p \leq n$.
- (i) Establish the Weyl inequality $\sigma_{i+j-1}(M+N) \leq \sigma_i(M) + \sigma_j(N)$ whenever $i, j \geq 1$ and $i+j-1 \leq p$.
- (ii) Establish the Lidskii inequality
$$\sigma_{i_1}(M+N) + \ldots + \sigma_{i_k}(M+N) \leq \sigma_{i_1}(M) + \ldots + \sigma_{i_k}(M) + \sigma_1(N) + \ldots + \sigma_k(N)$$
for any $1 \leq i_1 < \ldots < i_k \leq p$.
- (iii) Show that for any $1 \leq k \leq p$, the map $M \mapsto \sigma_1(M) + \ldots + \sigma_k(M)$ defines a norm on the space of $p \times n$ complex matrices (this norm is known as the $k$-th Ky Fan norm).
- (iv) Establish the Weyl inequality $|\sigma_i(M+N) - \sigma_i(M)| \leq \|N\|_{op}$ for all $1 \leq i \leq p$.
- (v) More generally, establish the $q$-Wielandt-Hoffman inequality $\|(\sigma_i(M+N) - \sigma_i(M))_{i=1}^p\|_{\ell^q_p} \leq \|N\|_{S^q}$ for any $1 \leq q \leq \infty$, where $\|N\|_{S^q} := \|(\sigma_i(N))_{i=1}^p\|_{\ell^q_p}$ is the $q$-Schatten norm of $N$. (Note that this is consistent with the previous definition of the Schatten norms.)
- (vi) Show that the $q$-Schatten norm is indeed a norm on the space of $p \times n$ complex matrices for any $1 \leq q \leq \infty$.
- (vii) If $M'$ is formed by removing one row from $M$, show that $\sigma_{i+1}(M) \leq \sigma_i(M') \leq \sigma_i(M)$ for all $1 \leq i < p$.
- (viii) If $p < n$ and $M'$ is formed by removing one column from $M$, show that $\sigma_{i+1}(M) \leq \sigma_i(M') \leq \sigma_i(M)$ for all $1 \leq i < p$ and $\sigma_p(M') \leq \sigma_p(M)$. What changes when $p = n$?
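A few of the inequalities in this exercise can be spot-checked numerically (an editorial sketch, not part of the original notes):

```python
import numpy as np

rng = np.random.default_rng(12)
p, n = 4, 6
M = rng.standard_normal((p, n)) + 1j * rng.standard_normal((p, n))
N = rng.standard_normal((p, n)) + 1j * rng.standard_normal((p, n))

sM = np.linalg.svd(M, compute_uv=False)        # descending singular values
sN = np.linalg.svd(N, compute_uv=False)
sMN = np.linalg.svd(M + N, compute_uv=False)

# Weyl for singular values: sigma_{i+j-1}(M+N) <= sigma_i(M) + sigma_j(N).
for i in range(1, p + 1):
    for j in range(1, p + 2 - i):
        assert sMN[i + j - 2] <= sM[i - 1] + sN[j - 1] + 1e-10

# Stability: |sigma_i(M+N) - sigma_i(M)| <= ||N||_op = sigma_1(N).
assert np.max(np.abs(sMN - sM)) <= sN[0] + 1e-10

# Removing a row interlaces the singular values.
sRow = np.linalg.svd(M[:-1, :], compute_uv=False)
for i in range(p - 1):
    assert sM[i + 1] - 1e-12 <= sRow[i] <= sM[i] + 1e-12
```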
Exercise 23 Let $M$ be a $p \times n$ complex matrix for some $1 \leq p \leq n$. Observe that the linear transformation $M: {\bf C}^n \rightarrow {\bf C}^p$ naturally induces a linear transformation $M^{\wedge k}: \bigwedge^k {\bf C}^n \rightarrow \bigwedge^k {\bf C}^p$ from $k$-forms on ${\bf C}^n$ to $k$-forms on ${\bf C}^p$. We give $\bigwedge^k {\bf C}^n$ the structure of a Hilbert space by declaring the basic forms $e_{i_1} \wedge \ldots \wedge e_{i_k}$ for $1 \leq i_1 < \ldots < i_k \leq n$ to be orthonormal.

For any $1 \leq k \leq p$, show that the operator norm of $M^{\wedge k}$ is equal to $\sigma_1(M) \sigma_2(M) \cdots \sigma_k(M)$.
Exercise 24 Let $M$ be a $p \times n$ matrix for some $1 \leq p \leq n$, let $A$ be a $p \times p$ matrix, and let $B$ be an $n \times n$ matrix.

Show that $\sigma_i(AM) \leq \|A\|_{op} \sigma_i(M)$ and $\sigma_i(MB) \leq \sigma_i(M) \|B\|_{op}$ for any $1 \leq i \leq p$.
Exercise 25 Let $M = (m_{ij})_{1 \leq i \leq p; 1 \leq j \leq n}$ be a $p \times n$ matrix for some $1 \leq p \leq n$, let $i_1,\ldots,i_k \in \{1,\ldots,p\}$ be distinct, and let $j_1,\ldots,j_k \in \{1,\ldots,n\}$ be distinct. Show that
$$|m_{i_1 j_1}| + \ldots + |m_{i_k j_k}| \leq \sigma_1(M) + \ldots + \sigma_k(M).$$
Using this, show that if $j_1,\ldots,j_p \in \{1,\ldots,n\}$ are distinct, then
$$\| (m_{i j_i})_{i=1}^p \|_{\ell^q_p} \leq \|M\|_{S^q}$$
for every $1 \leq q \leq \infty$.
Exercise 26 Establish the Hölder inequality
$$|\hbox{tr}(M N^*)| \leq \|M\|_{S^q} \|N\|_{S^{q'}}$$
whenever $M, N$ are $p \times n$ complex matrices and $1 \leq q, q' \leq \infty$ are such that $1/q + 1/q' = 1$.
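This trace inequality can also be verified numerically for several conjugate exponent pairs (an editorial sketch, not from the original notes):

```python
import numpy as np

rng = np.random.default_rng(13)
p, n = 3, 5
M = rng.standard_normal((p, n)) + 1j * rng.standard_normal((p, n))
N = rng.standard_normal((p, n)) + 1j * rng.standard_normal((p, n))

def schatten(M, q):
    # q-Schatten norm: the l^q norm of the singular values.
    return np.linalg.norm(np.linalg.svd(M, compute_uv=False), ord=q)

# |tr(M N^*)| <= ||M||_{S^q} ||N||_{S^q'} whenever 1/q + 1/q' = 1.
for q, qp in ((1, np.inf), (2, 2), (4, 4 / 3), (np.inf, 1)):
    assert abs(np.trace(M @ N.conj().T)) <= schatten(M, q) * schatten(N, qp) + 1e-10
```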
Acknowledgments: Thanks to Allen Knutson for corrections.