In one of my recent posts, I used the Jordan normal form for a matrix in order to justify a couple of arguments. As a student, I learned the derivation of this form twice: firstly (as an undergraduate) by using the minimal polynomial, and secondly (as a graduate) by using the structure theorem for finitely generated modules over a principal ideal domain. I found though that the former proof was too concrete and the latter proof too abstract, and so I never really got a good intuition on how the theorem really worked. So I went back and tried to synthesise a proof that I was happy with, by taking the best bits of both arguments that I knew. I ended up with something which wasn’t too different from the standard proofs (relying primarily on the (extended) Euclidean algorithm and the fundamental theorem of algebra), but seems to get at the heart of the matter fairly quickly, so I thought I’d put it up on this blog anyway.

Before we begin, though, let us recall what the Jordan normal form theorem is. For this post, I’ll take the perspective of abstract linear transformations rather than of concrete matrices. Let $T: V \to V$ be a linear transformation on a finite dimensional complex vector space V, with no preferred coordinate system. We are interested in asking what possible “kinds” of linear transformations V can support (more technically, we want to classify the conjugacy classes of $\operatorname{End}(V)$, the ring of linear endomorphisms of V to itself). Here are some simple examples of linear transformations.

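- **The right shift.** Here, $V = \mathbb{C}^n$ is a standard vector space, and the *right shift* $U: V \to V$ is defined as $U(x_1, \ldots, x_n) = (0, x_1, \ldots, x_{n-1})$; thus all elements are shifted right by one position. (For instance, the 1-dimensional right shift is just the zero operator.)
- **The right shift plus a constant.** Here we consider an operator $U + \lambda I$, where $U: V \to V$ is a right shift, $I$ is the identity on V, and $\lambda \in \mathbb{C}$ is a complex number.
- **Direct sums.** Given two linear transformations $T: V \to V$ and $S: W \to W$, we can form their direct sum $T \oplus S: V \oplus W \to V \oplus W$ by the formula $(T \oplus S)(v, w) := (Tv, Sw)$.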
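To make the three examples concrete, here is a minimal Python sketch that writes them as matrices with respect to the standard basis of $\mathbb{C}^n$. The helper names `right_shift`, `shift_plus_constant`, `direct_sum` and `apply` are hypothetical, chosen only for illustration; matrices are stored as lists of rows, so `M[i][j]` is the $e_i$-coefficient of $M e_j$.

```python
def right_shift(n):
    """Matrix of U(x_1, ..., x_n) = (0, x_1, ..., x_{n-1}): U e_j = e_{j+1}, U e_n = 0."""
    return [[1 if i == j + 1 else 0 for j in range(n)] for i in range(n)]

def shift_plus_constant(n, lam):
    """Matrix of U + lam * I, where U is the n-dimensional right shift."""
    U = right_shift(n)
    return [[U[i][j] + (lam if i == j else 0) for j in range(n)] for i in range(n)]

def direct_sum(A, B):
    """Block-diagonal matrix of A (+) B acting on V (+) W."""
    n, m = len(A), len(B)
    top = [row + [0] * m for row in A]
    bottom = [[0] * n + row for row in B]
    return top + bottom

def apply(M, v):
    """Apply the matrix M to the coordinate vector v."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

# The right shift moves every coordinate one place to the right.
print(apply(right_shift(3), [1, 2, 3]))   # [0, 1, 2]
# The 1-dimensional right shift is the zero operator.
print(right_shift(1))                     # [[0]]
# (2 I + U) on C^2, direct-summed with the 1x1 operator 5.
print(direct_sum(shift_plus_constant(2, 2), shift_plus_constant(1, 5)))
# [[2, 0, 0], [1, 2, 0], [0, 0, 5]]
```

In this column-vector convention the right shift shows up as 1s just below the diagonal; the more familiar textbook Jordan block, with 1s just above the diagonal, is the matrix of the left shift, in line with the remark that left shifts work equally well.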

Our objective is then to prove the following.

**Jordan normal form theorem.** Every linear transformation $T: V \to V$ on a finite dimensional complex vector space V is similar to a direct sum of transformations, each of which is a right shift plus a constant.

(Of course, the same theorem also holds with left shifts instead of right shifts.)

— Reduction to the nilpotent case —

Recall that a linear transformation $T: V \to V$ is *nilpotent* if we have $T^m = 0$ for some positive integer m. For instance, every right shift operator is nilpotent, as is any direct sum of right shifts. In fact, these are essentially the only nilpotent transformations:

**Nilpotent Jordan normal form theorem.** Every nilpotent linear transformation $T: V \to V$ on a finite dimensional vector space is similar to a direct sum of right shifts.

We will prove this theorem later, but for now let us see how we can quickly deduce the full Jordan normal form theorem from this special case. The idea here is, of course, to split up the minimal polynomial, but it turns out that we don’t actually need the minimal polynomial *per se*; any polynomial that annihilates the transformation will do.

More precisely, let $T: V \to V$ be a linear transformation on a finite-dimensional complex vector space V. Then the powers $I, T, T^2, T^3, \ldots$ are all linear transformations on V. On the other hand, the space $\operatorname{End}(V)$ of all linear transformations on V is a finite-dimensional vector space (of dimension $(\dim V)^2$). Thus there must be a non-trivial linear dependence between these powers. In other words, we have $P(T) = 0$ (or equivalently, $V = \ker P(T)$) for some non-zero polynomial P with complex coefficients.
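The linear-dependence argument can be run directly: flatten $I, T, T^2, \ldots$ into vectors and stop at the first dependence. A minimal Python sketch, assuming $T$ has rational entries so that exact `fractions.Fraction` arithmetic can decide dependence (`find_relation` and `annihilating_polynomial` are hypothetical names). Stopping at the *first* dependence actually yields the minimal polynomial, though, as noted above, any annihilating polynomial would do; by the dimension count the loop stops after at most $(\dim V)^2$ multiplications.

```python
from fractions import Fraction

def mat_mul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def find_relation(vectors):
    """Return c (not all zero) with sum_k c[k] * vectors[k] == 0, or None if independent.

    Gauss-Jordan elimination on the matrix whose columns are `vectors`; the first
    non-pivot column is expressed in terms of the earlier pivot columns.
    """
    n, K = len(vectors[0]), len(vectors)
    M = [[Fraction(vectors[k][i]) for k in range(K)] for i in range(n)]
    pivots, r = [], 0
    for col in range(K):
        p = next((i for i in range(r, n) if M[i][col] != 0), None)
        if p is None:
            c = [Fraction(0)] * K
            c[col] = Fraction(1)
            for row, pc in enumerate(pivots):
                c[pc] = -M[row][col]
            return c
        M[r], M[p] = M[p], M[r]
        piv = M[r][col]
        M[r] = [a / piv for a in M[r]]
        for i in range(n):
            if i != r and M[i][col] != 0:
                f = M[i][col]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots.append(col)
        r += 1
    return None

def annihilating_polynomial(T):
    """Coefficients c_0, ..., c_d (lowest degree first, c_d = 1) with sum c_k T^k = 0."""
    n = len(T)
    power = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]  # T^0 = I
    flattened = []
    while True:
        flattened.append([entry for row in power for entry in row])
        c = find_relation(flattened)
        if c is not None:
            return c  # I, ..., T^(d-1) were independent, so the relation ends in T^d
        power = mat_mul(power, T)

print([int(c) for c in annihilating_polynomial([[2, 0], [0, 3]])])  # [6, -5, 1]
print([int(c) for c in annihilating_polynomial([[0, 0, 0], [1, 0, 0], [0, 1, 0]])])  # [0, 0, 0, 1]
```

The first example recovers $t^2 - 5t + 6 = (t - 2)(t - 3)$, and the second shows that the 3-dimensional right shift is annihilated by $t^3$.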

Now suppose that we can factor this polynomial P into two coprime factors of lower degree, P = QR. Using the extended Euclidean algorithm (or more precisely, Bézout’s identity), we can find more polynomials A, B such that AQ + BR = 1. In particular,

$$A(T)Q(T) + B(T)R(T) = I. \qquad (1)$$
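The polynomials A and B come out of the extended Euclidean algorithm. Here is a minimal Python sketch of it for polynomials with rational coefficients, stored as coefficient lists from the lowest degree up; the helper names (`poly_divmod`, `bezout`, and so on) are hypothetical, and exact `Fraction` arithmetic is an assumption made to keep the example free of rounding.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zero coefficients (coefficients are stored lowest degree first)."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def add(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)])

def scale(p, c):
    return trim([c * a for a in p])

def mul(p, q):
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)

def poly_divmod(p, q):
    """Long division: p = s * q + r with deg r < deg q (q must be non-zero)."""
    q = trim(q)
    r = [Fraction(a) for a in trim(p)]
    s = [Fraction(0)] * max(len(r) - len(q) + 1, 0)
    while len(r) >= len(q):
        shift = len(r) - len(q)
        coef = r[-1] / q[-1]
        s[shift] = coef
        r = add(r, scale([0] * shift + q, -coef))  # cancels the leading term of r
    return trim(s), r

def bezout(Q, R):
    """Extended Euclidean algorithm: return (A, B, g) with A*Q + B*R = g = monic gcd(Q, R)."""
    r0, r1 = [Fraction(a) for a in trim(Q)], [Fraction(a) for a in trim(R)]
    a0, a1 = [Fraction(1)], []   # invariant: a_i * Q + b_i * R = r_i
    b0, b1 = [], [Fraction(1)]
    while r1:
        s, rem = poly_divmod(r0, r1)
        r0, r1 = r1, rem
        a0, a1 = a1, add(a0, scale(mul(s, a1), -1))
        b0, b1 = b1, add(b0, scale(mul(s, b1), -1))
    inv = 1 / r0[-1]
    return scale(a0, inv), scale(b0, inv), scale(r0, inv)

Q = [1, -2, 1]   # Q(t) = (t - 1)^2
R = [-2, 1]      # R(t) = t - 2, coprime to Q
A, B, g = bezout(Q, R)
assert g == [1] and A == [1] and B == [0, -1]   # (t - 1)^2 - t (t - 2) = 1
assert add(mul(A, Q), mul(B, R)) == [1]
```

When Q and R are coprime the monic gcd is the constant 1, which is exactly the identity $AQ + BR = 1$ needed for (1).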

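The formula (1) has two important consequences. Firstly, it shows that $\ker Q(T) \cap \ker R(T) = \{0\}$, since if a vector v were in the kernel of both Q(T) and R(T), then by applying (1) to v we would obtain v = 0. Secondly, it shows that $\ker Q(T) + \ker R(T) = V$. Indeed, given any $v \in V$, we see from (1) that $v = A(T)Q(T)v + B(T)R(T)v$; since $Q(T)R(T) = R(T)Q(T) = P(T) = 0$ on V, we see that $A(T)Q(T)v$ and $B(T)R(T)v$ lie in $\ker R(T)$ and $\ker Q(T)$ respectively. Finally, since all polynomials in T commute with each other, the spaces $\ker Q(T)$ and $\ker R(T)$ are T-invariant.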
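A small numerical check of this splitting, as a sketch under assumptions of my own choosing: the matrix `T` below is a hypothetical example annihilated by $P(t) = (t-1)^2(t-2)$, and $A = 1$, $B = -t$ solve $(t-1)^2 A + (t-2) B = 1$. The two operators $A(T)Q(T)$ and $B(T)R(T)$ from (1) then turn out to be complementary projections onto $\ker R(T)$ and $\ker Q(T)$.

```python
from fractions import Fraction

def mat_mul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def poly_at(p, T):
    """Evaluate p(T) by Horner's rule; p lists coefficients from the lowest degree up."""
    n = len(T)
    result = [[Fraction(0)] * n for _ in range(n)]
    for c in reversed(p):
        result = mat_mul(result, T)  # result <- result * T + c * I
        result = [[result[i][j] + (c if i == j else 0) for j in range(n)] for i in range(n)]
    return result

# T is annihilated by P(t) = (t - 1)^2 (t - 2) = Q(t) R(t).
T = [[1, 1, 0],
     [0, 1, 0],
     [0, 0, 2]]
Q, R = [1, -2, 1], [-2, 1]   # Q(t) = (t - 1)^2, R(t) = t - 2
A, B = [1], [0, -1]          # A Q + B R = (t - 1)^2 - t (t - 2) = 1

E_R = mat_mul(poly_at(A, T), poly_at(Q, T))  # A(T)Q(T): projection onto ker R(T)
E_Q = mat_mul(poly_at(B, T), poly_at(R, T))  # B(T)R(T): projection onto ker Q(T)
print([[int(x) for x in row] for row in E_Q])  # [[1, 0, 0], [0, 1, 0], [0, 0, 0]]
print([[int(x) for x in row] for row in E_R])  # [[0, 0, 0], [0, 0, 0], [0, 0, 1]]
```

Here $\ker Q(T) = \operatorname{span}(e_1, e_2)$, on which $T - I$ acts as a shift ($e_2 \mapsto e_1 \mapsto 0$), and $\ker R(T) = \operatorname{span}(e_3)$, on which $T = 2$.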

Putting all this together, we see that the linear transformation T on $V = \ker Q(T) \oplus \ker R(T)$ is similar to the direct sum of the restrictions of T to $\ker Q(T)$ and $\ker R(T)$ respectively. We can iterate this observation, reducing the degree of the polynomial P which annihilates T (the restriction of T to $\ker Q(T)$ is annihilated by Q, and its restriction to $\ker R(T)$ by R), until we reduce to the case in which this polynomial P cannot be split into coprime factors of lesser degree. But by the fundamental theorem of algebra, this can only occur if P takes the form $P(t) = c(t - \lambda)^m$ for some $c \neq 0$, $\lambda \in \mathbb{C}$ and $m \geq 0$. In other words, we can reduce to the case when $(T - \lambda I)^m = 0$, that is, when T is equal to $\lambda I$ plus a nilpotent transformation. If we then subtract off the $\lambda I$ term, the claim now easily follows from the nilpotent Jordan normal form theorem.

[From a modern algebraic geometry perspective, all we have done here is split the spectrum of T (or of the ring generated by T) into connected components.]

It is interesting to see what happens when two eigenvalues get very close together. If one carefully inspects how the Euclidean algorithm works, one concludes that the coefficients of the polynomials A and B above become very large (one is trying to separate two polynomials Q and R that are only barely coprime to each other). Because of this, the Jordan decomposition becomes very unstable when eigenvalues begin to collide.
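To see the blow-up explicitly, here is a small worked example of my own (not part of the argument above). Take two distinct eigenvalues $a \neq b$, with $Q(t) = t - a$ and $R(t) = t - b$. Bézout's identity is then solved by constant polynomials,

$$\frac{1}{b-a}(t - a) - \frac{1}{b-a}(t - b) = 1,$$

so $A = \frac{1}{b-a}$ and $B = -\frac{1}{b-a}$. For the operator

$$T = \begin{pmatrix} a & 1 \\ 0 & b \end{pmatrix},$$

which is annihilated by $P = QR$, the projection onto $\ker Q(T) = \ker(T - aI)$ along $\ker R(T)$ given by (1) is

$$B(T)R(T) = \frac{T - bI}{a - b} = \begin{pmatrix} 1 & \frac{1}{a-b} \\ 0 & 0 \end{pmatrix},$$

whose entries grow like $1/|a - b|$ as $b \to a$. The two eigenvectors $(1, 0)$ and $(1, b - a)$ become nearly parallel, and at $b = a$ the matrix $T$ is a single Jordan block, so no such splitting exists at all.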

Because the fundamental theorem of algebra is used, it was necessary to work in an algebraically closed field such as the complex numbers $\mathbb{C}$. (Indeed, one can in fact *deduce* the fundamental theorem of algebra from the Jordan normal form theorem.) Over the reals, one picks up other “elliptic” components, such as rotation matrices, which are not decomposable into translates of shift operators.
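As a minimal illustration (my own example, not from the text), the rotation by a right angle on $\mathbb{R}^2$,

$$J = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, \qquad J^2 + I = 0,$$

is annihilated by $P(t) = t^2 + 1$, which has no real roots; over $\mathbb{R}$ this P can neither be split into coprime factors nor be written as $c(t - \lambda)^m$, so the reduction above gets stuck. Over $\mathbb{C}$ it factors into the coprime pair $t - i$, $t + i$, with Bézout identity

$$\frac{i}{2}(t - i) - \frac{i}{2}(t + i) = 1,$$

so $\mathbb{C}^2 = \ker(J - iI) \oplus \ker(J + iI)$, and $J$ is similar to the direct sum of the $1 \times 1$ operators $i$ and $-i$ (each a 1-dimensional right shift plus a constant).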

Thus far, the decompositions have been canonical – the spaces one is decomposing into can be defined uniquely in terms of T (they are the kernels of the primary factors of the minimal polynomial). However, the further splitting of the nilpotent (or shifted nilpotent) operators into smaller components will be non-canonical, depending on an arbitrary choice of basis. [However, the multiplicities of each type of shift-plus-constant factor will remain canonical; this is easiest to see by inspecting the dimensions of the kernels of $(T - \lambda I)^m$ for various $\lambda \in \mathbb{C}$ and $m \geq 0$ using a Jordan normal form.]
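The bracketed remark can be turned into a computation: the number of blocks $\lambda I + U$ of size at least $k$ equals $\dim\ker(T - \lambda I)^k - \dim\ker(T - \lambda I)^{k-1}$, and by rank-nullity each kernel dimension is $n$ minus a rank. A minimal Python sketch, assuming rational entries so that ranks can be computed exactly; `rank` and `block_sizes` are hypothetical helper names.

```python
from fractions import Fraction

def mat_mul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def rank(M):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for col in range(len(rows[0]) if rows else 0):
        p = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][col] / rows[r][col]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def block_sizes(T, lam):
    """Sizes of the blocks lam*I + U of T, read off from dim ker (T - lam I)^k."""
    n = len(T)
    N = [[Fraction(T[i][j]) - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    ranks = [n]                                   # rank of (T - lam I)^0 = I
    power = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for _ in range(n + 1):
        power = mat_mul(power, N)
        ranks.append(rank(power))
    # at_least[k - 1] = dim ker N^k - dim ker N^(k-1) = number of blocks of size >= k
    at_least = [ranks[k - 1] - ranks[k] for k in range(1, n + 2)]
    sizes = []
    for k in range(1, n + 1):
        sizes += [k] * (at_least[k - 1] - at_least[k])
    return sorted(sizes, reverse=True)

# 5 I + (3-dimensional right shift (+) 1-dimensional right shift), in the standard basis.
T = [[5, 0, 0, 0],
     [1, 5, 0, 0],
     [0, 1, 5, 0],
     [0, 0, 0, 5]]
print(block_sizes(T, 5))   # [3, 1]
print(block_sizes(T, 7))   # []  (7 is not an eigenvalue)
```

Since only ranks of powers of $T - \lambda I$ enter, the answer does not depend on any choice of basis, which is the canonicity claimed above.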

— Proof of the nilpotent case —

To prove the nilpotent Jordan normal form theorem, I would like to take a dynamical perspective, looking at orbits $x, Tx, T^2 x, \ldots$ of T. (These orbits will be a cheap substitute for the concept of a Jordan chain.) Since T is nilpotent, every such orbit terminates in some finite time $m_x$, which I will call the *lifespan* of the orbit (i.e. $m_x$ is the least integer such that $T^{m_x} x = 0$). We will call x the *initial point* of the orbit, $T^{m_x - 1} x$ the *final point*, and $x, Tx, \ldots, T^{m_x - 1} x$ the *finite orbit* of x.

The elements of a finite orbit are all linearly independent. This is best illustrated with an example. Suppose x has lifespan 3, thus $T^3 x = 0$ and $T^2 x \neq 0$. Suppose there were a linear dependence between $x, Tx, T^2 x$, say $3x + 4Tx + 5T^2 x = 0$. Applying $T^2$ we obtain $3T^2 x = 0$, a contradiction. A similar argument works for any other linear dependence or for any other lifespan. (Note how we used the shift T here to eliminate all but the final point of the orbit; we shall use a similar trick with multiple orbits shortly.)
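A quick numerical check of this fact on one example (a sketch, not a proof): the code below computes the finite orbit of a vector under a hypothetical nilpotent matrix `N` of my choosing, reads off the lifespan as the orbit's length, and confirms that the rank of the orbit equals the lifespan. Exact `Fraction` arithmetic is an assumption made so that the rank test is reliable.

```python
from fractions import Fraction

def apply(T, v):
    """Apply the square matrix T (a list of rows) to the vector v."""
    return [sum(T[i][j] * v[j] for j in range(len(v))) for i in range(len(T))]

def orbit(T, x):
    """The finite orbit x, Tx, ..., T^(m-1) x; its length m is the lifespan of x."""
    out = []
    while any(c != 0 for c in x):
        out.append(x)
        x = apply(T, x)
    return out

def rank(vectors):
    """Rank of a list of vectors, by Gaussian elimination over the rationals."""
    rows = [[Fraction(c) for c in v] for v in vectors]
    r = 0
    for col in range(len(rows[0]) if rows else 0):
        p = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][col] / rows[r][col]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

N = [[0, 1, 1],
     [0, 0, 1],
     [0, 0, 0]]
orb = orbit(N, [0, 0, 1])
print(orb)                   # [[0, 0, 1], [1, 1, 0], [1, 0, 0]]
print(len(orb), rank(orb))   # 3 3
```

Here $x = e_3$ has lifespan 3, its final point is $T^2 x = e_1$, and the three orbit vectors are independent, as the argument predicts.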

The vector space spanned by a finite orbit is clearly T-invariant, and if we use this finite orbit as a basis for this space then the restriction of T to this space is just the right shift. Thus to prove the nilpotent Jordan normal form theorem, it suffices to show

**Nilpotent Jordan normal form theorem, again.** Let $T: V \to V$ be nilpotent. Then there exists a basis of V which is the concatenation of finite orbits $x_i, Tx_i, \ldots, T^{m_{x_i} - 1} x_i$ for $i = 1, \ldots, k$.

We can prove this by the following argument (basically the same argument used to prove the Steinitz exchange lemma, but over the ring $\mathbb{C}[T]$ instead of $\mathbb{C}$). First observe that it is a triviality to obtain a concatenation of finite orbits which *span* V: just take any basis of V and look at all the finite orbits that they generate. Now all we need to do is to keep whittling down this over-determined set of vectors so that they span *and* are linearly independent, thus forming a basis.

Suppose instead that we had some collection of finite orbits $x_i, Tx_i, \ldots, T^{m_{x_i} - 1} x_i$ for $i = 1, \ldots, k$, which spanned V, but which contained some non-trivial linear relation. To take a concrete example, suppose we had three orbits $x, Tx$; $y, Ty, T^2 y$; and $z, Tz, T^2 z, T^3 z$ (of lifespans 2, 3 and 4 respectively), which had a non-trivial linear relation

$$3x + 4Tx + 5Ty + 6T^2 y + 7T^2 z = 0.$$

By applying some powers of T if necessary, and stopping just before everything vanishes, we may assume that our non-trivial linear relation only involves the final points of our orbits. For instance, in the above case we can apply T once (recalling that $T^2 x = T^3 y = 0$) to obtain the non-trivial linear relation

$$3Tx + 5T^2 y + 7T^3 z = 0.$$

We then factor out as many powers of T as we can; in this case, we have

$$T(3x + 5Ty + 7T^2 z) = 0.$$

The expression in parentheses is a linear combination of various elements of our putative basis, in this case $x$, $Ty$, and $T^2 z$, with each orbit being represented at most once. At least one of these elements is an initial point (in this case $x$). We can then replace that element with the element in parentheses, thus shortening one of the orbits but keeping the span of the concatenated orbits unchanged. (In our example, we replace the orbit $x, Tx$ of lifespan 2 with the orbit of $3x + 5Ty + 7T^2 z$, which has lifespan at most 1.)

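Each such replacement strictly reduces the total number of vectors in the concatenated orbits, so iterating this procedure must eventually leave no further linear dependences. At that point we obtain the nilpotent Jordan normal form theorem, and thus the usual Jordan normal form.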
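The whole whittling procedure fits in a short program. What follows is a minimal Python sketch of my own, not the author's code: it assumes T is a nilpotent square matrix with rational entries, uses exact `fractions.Fraction` arithmetic (deciding linear dependence in floating point is fragile, which echoes the instability remark earlier), and the names `find_relation` and `jordan_orbits` are hypothetical. It starts from the orbits of the standard basis, exactly as in the "triviality" step, and then repeatedly applies the exchange step.

```python
from fractions import Fraction

def apply(T, v):
    """Apply the square matrix T (a list of rows) to the vector v."""
    return [sum(T[i][j] * v[j] for j in range(len(v))) for i in range(len(T))]

def orbit(T, x):
    """The finite orbit x, Tx, ..., T^(m-1) x; its length m is the lifespan of x."""
    out = []
    while any(c != 0 for c in x):
        out.append(x)
        x = apply(T, x)
    return out

def find_relation(vectors):
    """Return c (not all zero) with sum_k c[k] * vectors[k] == 0, or None if independent."""
    if not vectors:
        return None
    n, K = len(vectors[0]), len(vectors)
    M = [[Fraction(vectors[k][i]) for k in range(K)] for i in range(n)]  # columns = vectors
    pivots, r = [], 0
    for col in range(K):
        p = next((i for i in range(r, n) if M[i][col] != 0), None)
        if p is None:
            # Column `col` is a combination of the earlier pivot columns.
            c = [Fraction(0)] * K
            c[col] = Fraction(1)
            for row, pc in enumerate(pivots):
                c[pc] = -M[row][col]
            return c
        M[r], M[p] = M[p], M[r]
        piv = M[r][col]
        M[r] = [a / piv for a in M[r]]
        for i in range(n):
            if i != r and M[i][col] != 0:
                f = M[i][col]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots.append(col)
        r += 1
    return None

def jordan_orbits(T):
    """Return finite orbits whose concatenation is a basis of V (T must be nilpotent)."""
    n = len(T)
    # Start from the orbits of the standard basis: they span V but may be dependent.
    starts = [[Fraction(int(i == j)) for i in range(n)] for j in range(n)]
    while True:
        orbits = [orbit(T, x) for x in starts]
        starts = [x for x, o in zip(starts, orbits) if o]   # discard zero initial points
        orbits = [o for o in orbits if o]
        index = [(i, j) for i, o in enumerate(orbits) for j in range(len(o))]
        c = find_relation([orbits[i][j] for i, j in index])
        if c is None:
            return orbits
        terms = [(i, j, ck) for (i, j), ck in zip(index, c) if ck != 0]
        # Apply T^k, k = largest "remaining life" len(o) - 1 - j among the terms: only
        # terms landing on final points survive, giving sum_i a_i T^(m_i - 1) x_i = 0.
        k = max(len(orbits[i]) - 1 - j for i, j, _ in terms)
        a = {i: ck for i, j, ck in terms if len(orbits[i]) - 1 - j == k}
        # Factor out T^(m-1), m = shortest lifespan involved: y = sum_i a_i T^(m_i - m) x_i.
        m = min(len(orbits[i]) for i in a)
        i0 = next(i for i in a if len(orbits[i]) == m)
        y = [Fraction(0)] * n
        for i, ai in a.items():
            y = [yy + ai * vv for yy, vv in zip(y, orbits[i][len(orbits[i]) - m])]
        # T^(m-1) y = 0, so y has a shorter orbit than x_{i0}; the span is unchanged.
        starts[i0] = y

N = [[0, 1, 1],
     [0, 0, 1],
     [0, 0, 0]]
print(sorted(len(o) for o in jordan_orbits(N)))   # [3]
```

The sorted orbit lengths are the sizes of the right shifts in the nilpotent Jordan normal form. This top-down exchange is written for clarity rather than speed; each pass redoes a full elimination.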

## 21 comments


13 October, 2007 at 12:20 am

Attila Smith

Dear Terence,

like you I was dissatisfied by both paths to Jordan’s theorem, but I would never have had the guts to admit it, nor the talent to find a satisfying new way.

Your blog is truly liberating in showing how a great mathematician cares and thinks about relatively elementary mathematics, and then improves the traditional presentation.

I value this psychological effect of your blog as highly as its wonderful mathematical content.

Yours thankfully,

Attila

13 October, 2007 at 7:03 am

im

bring back snap!

14 October, 2007 at 1:35 am

Bart

Nice job, Terry!

There is another result, namely Fermat’s Little Theorem (a^p congruent to a mod p), for which I’ve seen two proofs as a student: one by induction over a, using the fact that the (p choose k) are divisible by p, and one via Lagrange’s theorem in the multiplicative group F_p\{0}. There, too, I find the first proof “too concrete” and the second one “too abstract”.

Does anybody see a way of “combining” them similarly to what Terry did with Jordan’s theorem?

(Of course, there are many more proofs; see .)

14 October, 2007 at 1:37 am

Bart

Sorry, there should have been a URL at the end of my comment: http://en.wikipedia.org/wiki/Proofs_of_Fermat%27s_little_theorem

14 October, 2007 at 11:38 am

Greg Kuperberg

I can only like this proof of the Jordan canonical form theorem. Who could be against it? But I don’t completely agree with the precept that the classification of finitely-generated PID modules is too abstract. Because, basically, you’ve proved that more general result in the important case of finitely-generated modules with a non-trivial annihilator. In the case of modules over a polynomial ring F[X], these modules are finite-dimensional vector spaces on which an operator acts. In the case of Z-modules, these modules are finite abelian groups.

For example, consider your discussion step by step for finite abelian groups. First you prove the abelian Sylow theorem, then you show dynamically that an abelian p-group is a direct sum of cyclic groups. The dynamic in this case is multiplication by p.

Certainly the classification of finitely generated modules over a PID can be proved in an excessively abstract style. Even so, I considered it a great revelation to learn that Jordan canonical form and canonical form for finite abelian groups are really the same theorem.

For that matter, I was even more excited to learn that the spectral theorem for self-adjoint operators in infinite dimensions has a strong form that also looks the same. One form of the classification for PID modules is the prime-power form that you emphasize here. I don’t know of a way in which that view is useful for Hermitian operators. The other form is the minimal cyclic form, in which you say that any finitely-generated module is a direct sum of R/n_k, such that each cyclic period n_k is divisible by the next one, n_{k+1}. That form generalizes to the Hahn-Hellinger theorem, the revelatory result to which I refer.

Since the answer is so good for self-adjoint operators on a Hilbert space, it makes me wonder why life is so much harder for non-self-adjoint bounded operators. Certainly there are examples that look like nice infinite forms of Jordan blocks. But is it that there are also more complicated examples, or that no one can show that the nice examples are everything?

14 October, 2007 at 3:57 pm

Terence Tao

Dear Greg,

That is a nice observation! I had not realised that the classification of finite abelian groups and the classification of linear transformations were essentially the same. I guess this shows the power of the abstract viewpoint.

As for what goes wrong in the non-self-adjoint case, it seems to me the problem is that the spectrum no longer controls the resolvent, due to the presence of pseudospectrum (i.e. the operator norm of $(T - zI)^{-1}$ can be large even when z is nowhere near the spectrum). This seems to destroy the utility of the spectrum for the analysis of non-self-adjoint infinite-dimensional operators. As for your last question, you are veering dangerously close to the invariant subspace problem and so I suspect there is no simple answer.

Dear Bart: this type of algebra is not really my forte, but I think the two proofs are not truly related, being caused by the field $\mathbb{F}_p$ of p elements “wearing two hats”: one as a field of characteristic p, and the other as a multiplicative group with a zero element adjoined. In the former, we have the Frobenius automorphism, and in the latter, we have Lagrange’s theorem. Now $\mathbb{F}_p$ is one-dimensional (over itself) and so the Frobenius automorphism happens to be the identity, but this is sort of a coincidence due to the absence of a non-trivial Galois group in this field. In higher fields $\mathbb{F}_{p^n}$, the approach via Lagrange’s theorem gives $a^{p^n} = a$, whereas the Frobenius automorphism $a \mapsto a^p$, while still an automorphism, is no longer the identity.

17 October, 2007 at 7:10 am

Pace Nielsen

I think rather than a top down approach, where one takes a whole bunch of orbits and slowly makes them linearly independent, it is more intuitive (and computationally efficient) to take a bottom up approach.

To do this, one lets V_i be the vector subspace killed by T^i. We arrive at the chain V_0 <= V_1 <= … <= V_k = V. Fix *any* basis A_(k-1) for V_(k-1), and extend to a basis A’_(k-1) for V_k. Let B_k=A’_(k-1)\A_(k-1). The elements in B_k will generate linearly independent orbits (of maximal length). I like to think of these vectors as a basis for some maximal complement to the stuff killed by T^{k-1}.

At this point, one repeats the construction, working with V_(k-2). To avoid linear dependencies, we make sure that all of the new bases we construct at the very least contain the appropriate elements in the orbits of B_k, B_(k-1), etc… (In other words, once we have constructed an orbit we want to keep, we always force everything new we work with to be linearly independent from these orbits.)

17 October, 2007 at 7:14 am

Pace Nielsen

In other words, we let B_(k-1) be a basis for a maximal complement in V_(k-1) of V_(k-2), but containing T(B_k). In turn, we let B_(k-2) be a basis for a maximal complement in V_(k-2) of V_(k-3), containing T(B_(k-1)). etc…

20 October, 2007 at 5:27 am

Anonymous

I would like to ask something only tangentially related to the topic. You mentioned you learned the proof “using the structure theorem for finitely generated modules over a principal ideal domain” in your graduate years. There’s a rather big controversy where I study (University of Buenos Aires) as to whether it’s right to teach something like that at the undergraduate level (third/fourth year). Sorry to bother you with this, but it called our attention and would love to hear an outside opinion (feels like we’re running around in circles here…). Thank You.

22 October, 2007 at 5:09 pm

Terence Tao

Dear Said,

I’m afraid I’m not really the right person to ask; I haven’t taught algebra at either level. My undergraduate education in algebra got as far as Galois theory, which was a fun topic for that level I thought, but anything more abstract would have to be done properly. (My graduate education in algebra mostly consisted of reading Hungerford in order to prepare for qualifying exams; not exactly the most thorough education in the subject, but it did suffice for the type of maths I was doing at the time.)

16 January, 2008 at 12:41 pm

Not Even Wrong » Blog Archive » Accumulated Links

[…] As always, Terry Tao’s blog has wonderful postings and articles, often of a general expository nature. For some recent examples, see one about the Schrodinger Equation, and another about Jordan normal form. […]

29 January, 2008 at 5:10 am

Prokrastination oder Blogroll (I): Terence Tao at LEMUREN-Blog

[…] Jordan normal form and the Euclidean algorithm […]

18 March, 2008 at 3:59 pm

254A, Lecture 17: A Ratner-type theorem for SL_2(R) orbits « What’s new

[…] many eigenvalues of , and so is nilpotent on each of the generalised eigenvectors of . By the Jordan normal form, these generalised eigenvectors span V, and we are […]

12 October, 2008 at 1:34 pm

Anonymous

Hello Terence,

Thank you for this fascinating discussion. As a physics student I have found utility in the idea of extending the Jordan form to infinite dimensions. Could you help me understand why this isn’t possible?

12 October, 2008 at 2:53 pm

Terence Tao

Dear anonymous,

Basically, the problem is that the Jordan normal form becomes extremely sensitive to small perturbations in high dimensions, as the eigenvalues become increasingly close to each other (and thus difficult to separate), and also due to some related phenomena such as the emergence of pseudospectrum, and the increasingly degenerate nature of the Jordan basis. So there does not appear to be a useful and well-defined limiting normal form for general bounded linear operators on (say) a Hilbert space (this issue also comes dangerously close to the notorious invariant subspace problem, which remains open).

The situation is much better though for special classes of operators such as compact, Hermitian, unitary, or normal operators, which do not have much pseudospectrum and so enjoy a useful spectral theorem.

28 April, 2012 at 2:03 am

jamal

I can’t see why the span is unchanged after replacing the orbits. Is it not possible that belongs to span(y,z)?

28 April, 2012 at 5:37 am

Terence Tao

Replacing $x$ with $x'$ is basically an elementary row operation and thus replacing x by x’ does not affect the span, because x’ can be written as a linear combination of x and various shifts of y,z, and x can conversely be written as a linear combination of x’ and various shifts of y,z, and similarly for shifts of x,x’. In particular,

$$\operatorname{span}(x, Tx, y, Ty, T^2 y, z, Tz, T^2 z, T^3 z) = \operatorname{span}(x', Tx', y, Ty, T^2 y, z, Tz, T^2 z, T^3 z)$$

and then since $Tx' = 0$, we can shorten the spanning set by one element.

28 April, 2012 at 10:01 am

Ryan Reich

Your exchange lemma is just what I need! It’s a simple argument but I find the exchanging part strangely unintuitive, in the sense that my mind recoils from the idea of applying a nilpotent operator and _then_ looking for something linearly independent.

27 April, 2013 at 9:25 pm

Notes on the classification of complex Lie algebras | What's new

[…] is not difficult to then place in Jordan normal form by selecting a suitable basis for ; see e.g. this previous blog post. But in contrast to the Jordan-Chevalley decomposition, the basis is not unique in general, and we […]

15 March, 2015 at 9:02 pm

Anonymous

Dear Terence,

Was looking to provide better understanding to my students on Jordan canonical theory and stumbled upon your page. This is my favourite proof too and is described in the book by Blyth and Robertson titled Further Linear Algebra.

Thanks much!

Best wishes,

Anupam

22 November, 2016 at 7:29 am

Anonymous

Knuth mentioned in his article Computer Science and Its Relation to Mathematics that

> For three years I taught a sophomore course in abstract algebra, for mathematics majors at Caltech, and the most difficult topic was always the study of “Jordan canonical form” for matrices. The third year I tried a new approach, by looking at the subject algorithmically, and suddenly it became quite clear. The same thing happened with the discussion of finite groups defined by generators and relations; and in another course, with the reduction theory of binary quadratic forms. By presenting the subject in terms of algorithms, the purpose and meaning of the mathematical theorems became transparent.

Are you presenting the proof in a “global” algorithm-way or just using (locally) the Euclidean algorithm?