Let be two Hermitian matrices. When and commute, we have the identity

When and do not commute, the situation is more complicated; we have the *Baker-Campbell-Hausdorff formula*

where the infinite product here is explicit but very messy. On the other hand, taking determinants we still have the identity

Recently I learned (from Emmanuel Candes, who in turn learned it from David Gross) that there is another very nice relationship between and , namely the Golden-Thompson inequality

The remarkable thing about this inequality is that no commutativity hypotheses whatsoever on the matrices are required. Note that the right-hand side can be rearranged using the cyclic property of trace as ; the expression inside the trace is positive definite so the right-hand side is positive. (On the other hand, there is no reason why expressions such as need to be positive or even real, so the obvious extension of the Golden-Thompson inequality to three or more Hermitian matrices fails.) I am told that this inequality is quite useful in statistical mechanics, although I do not know the details of this.

To get a sense of how delicate the Golden-Thompson inequality is, let us expand both sides to fourth order in . The left-hand side expands as

while the right-hand side expands as

Using the cyclic property of trace , one can verify that all terms up to third order agree. Turning to the fourth order terms, one sees after expanding out and using the cyclic property of trace as much as possible, we see that the fourth order terms *almost* agree, but the left-hand side contains a term whose counterpart on the right-hand side is . The difference between the two can be factorised (again using the cyclic property of trace) as . Since is skew-Hermitian, is positive definite, and so we have proven the Golden-Thompson inequality to fourth order. (One could also have used the Cauchy-Schwarz inequality for the Frobenius norm to establish this; see below.)

Intuitively, the Golden-Thompson inequality is asserting that interactions between a pair of non-commuting Hermitian matrices are strongest when cross-interactions are kept to a minimum, so that all the factors lie on one side of a product and all the factors lie on the other. Indeed, this theme will be running through the proof of this inequality, to which we now turn.

The proof of the Golden-Thompson inequality relies on the somewhat magical power of the tensor power trick. For any even integer and any matrix (not necessarily Hermitian), we define the -Schatten norm of by the formula

(This formula in fact defines a norm for any , but we will only need the even integer case here.) This norm can be viewed as a non-commutative analogue of the norm; indeed, the -Schatten norm of a *diagonal* matrix is just the norm of the coefficients.

Note that the -Schatten norm

is the Hilbert space norm associated to the Frobenius inner product (or Hilbert-Schmidt inner product)

This is clearly a non-negative Hermitian inner product, so by the Cauchy-Schwarz inequality we conclude that

for any matrices . As , we conclude in particular that

We can iterate this and establish the *non-commutative Hölder inequality*

whenever is an even power of . Indeed, we induct on , the case already having been established. If is a power of , then by the induction hypothesis (grouping into pairs) we can bound

On the other hand, we may expand

We use the cyclic property of trace to move the rightmost factor to the left. Applying the induction hypothesis again, we conclude that

But from the cyclic property of trace again, we have and . We conclude that

and similarly for , etc. Inserting this into (3) we obtain (2).

Remark 1Though we will not need to do so here, it is interesting to note that one can use the tensor power trick to amplify (2) for equal to a power of two, to obtain (2) for all positive integers , at least when the are all Hermitian. Indeed, pick a large integer and let be the integer part of . Then expand the left-hand side of (2) as and apply (2) with replaced by to bound this by . Sending (noting that ) we obtain the claim.

Specialising (2) to the case where for some Hermitian matrices , we conclude that

and hence by cyclic permutation

for any . Iterating this we conclude that

Applying this with replaced by and respectively, we obtain

Now we send . Since and , we have , and so the left-hand side is ; taking the limit as we obtain the Golden-Thompson inequality. (See also these notes of Vershynin for a slight variant of this proof.)

If we stop the iteration at an earlier point, then the same argument gives the inequality

for a power of two; one can view the original Golden-Thompson inequality as the endpoint of this case in some sense. (In fact, the Golden-Thompson inequality is true in any operator norm; see Theorem 9.3.7 of Bhatia’s book.) In the limit , we obtain in particular the operator norm inequality

This inequality has a nice consequence:

Corollary 1Let be Hermitian matrices. If (i.e. is positive semi-definite), then .

*Proof:* Since , we have for all vectors , or in other words for all . This implies that is a contraction, i.e. . By (5), we conclude that , thus , and the claim follows.

It is not difficult to reverse the above argument and conclude that (1) is in fact equivalent to (5).

It is remarkably tricky to try to prove Corollary 1 directly. Here is a somewhat messy proof; I would be interested in seeing a more elegant argument. By the fundamental theorem of calculus, it suffices to show that whenever is a Hermitian matrix depending smoothly on a real parameter with , then . Indeed, Corollary 1 follows from this claim by setting and concluding that .

To obtain this claim, we use the *Duhamel formula*

This formula can be proven by Taylor expansion, or by carefully approximating by ; alternatively, one can integrate the identity

which follows from the product rule and by interchanging the and derivatives at a key juncture. We rearrange the Duhamel formula as

Using the basic identity , we thus have

formally evaluating the integral, we obtain

and thus

As was positive semi-definite by hypothesis, is also. It thus suffices to show that for any Hermitian , the operator preserves the property of being semi-definite.

Note that for any real , the operator maps a positive semi-definite matrix to another positive semi-definite matrix, namely . By the Fourier inversion formula, it thus suffices to show that the kernel is positive semi-definite in the sense that it has non-negative Fourier transform (cf. Bochner’s theorem). But a routine (but somewhat tedious) application of contour integration shows that the Fourier transform is given by the formula , and the claim follows.

Because of the Golden-Thompson inequality, many applications of the *exponential moment method* in commutative probability theory can be extended without difficulty to the non-commutative case, as was observed by Ahlswede and Winter. For instance, consider (a special case of) the *Chernoff inequality*

for any , where are iid scalar random variables taking values in of mean zero and with total variance (i.e. each factor has variance ). We briefly sketch the standard proof of this inequality. We first use Markov’s inequality to obtain

for some parameter to be optimised later. In the scalar case, we can factor as and then use the iid hypothesis to write the right-hand side as

An elementary Taylor series computation then reveals the bound when ; inserting this bound and optimising in we obtain the claim.

Now suppose that are iid Hermitian matrices. One can try to adapt the above method to control the size of the sum . The key point is then to bound expressions such as

As need not commute, we cannot separate the product completely. But by Golden-Thompson, we can bound this expression by

which by independence we can then factorise as

As the matrices involved are positive definite, we can then take out the final factor in operator norm:

Iterating this procedure, we can eventually obtain the bound

Combining this with the rest of the Chernoff inequality argument, we can establish a matrix generalisation

of the Chernoff inequality, under the assumption that the are iid with mean zero, have operator norm bounded by , and have total variance equal to ; see for instance these notes of Vershynin for details.

Further discussion of the use of the Golden-Thompson inequality and its variants to non-commutative Chernoff-type inequalities can be found in this paper of Gross, these notes of Vershynin and this recent survey of Tropp. It seems that the use of this inequality may be quite useful in simplifying the proofs of several of the basic estimates in this subject.

## 27 comments

Comments feed for this article

15 July, 2010 at 10:29 am

Michael NielsenThere’s a beautiful related result, due to Thompson, stating that when A and B are Hermitian, e^A e^B = e^{U A U^\dagger +V B V^\dagger}, where U and V are unitary. My memory of the proof is very hazy, but I believe it relies on Horn’s conjecture, which, of course, wasn’t proved until some years later.

See: R. C. Thompson, Linear and Multilinear Algebra 19, 187 (1986).

This result has some applications in quantum computing: http://arxiv.org/abs/quant-ph/0307190

15 July, 2010 at 10:59 am

Michael NielsenI believe Corollary 1 has a straightforward proof. Start by observing that it is equivalent to proving that the log function is operator monotone, i.e., A \leq B implies ln(A) \leq \ln(B) where A and B are positive definite.

There are two ideas we need: (1) the observation that the function, A -> -A^{-1} is operator monotone, when A is positive definite; and (2) an integral representation for the log function.

The proof of (1) can easily be supplied by elementary techniques. For (2), a suitable integral representation is:

\ln(A) = \int_0^1 dy / (y+(A-I)^{-1})

Assuming A \leq B we can applying (1) twice to obtain 1/(y+(A-I)^{-1}) \leq 1/(y+(B-I)^{-1}). The result then follows.

There is something wrong with this proof: (A-I) may be singular, and so the the integral representation may not exist. This can be fixed by perturbing A slightly and using a continuity argument. However, I suspect that with a slightly different integral representation that won’t be necessary. I don’t have time to fiddle with it now, unfortunately…

15 July, 2010 at 11:02 am

Terence TaoAh, that is a nice argument! I had forgotten that reverses the positive definite ordering, that is certainly a useful thing to keep in mind. (I was relying instead on the fact that preserves positive definiteness, but this was significantly trickier to use.)

15 July, 2010 at 11:13 am

Michael NielsenI have a vague memory that there may even be a representation theorem for operator monotone functions in terms of integrals over inverse functions. That’s why I tried this strategy. Bhatia is most likely where I saw the result. My copy is in storage, so I can’t look it up.

20 July, 2010 at 9:49 am

Michael NielsenIt’s bothering me that (A-I) may not be invertible, and even if it is, s+(A-I)^{-1} may not be. Anyone see a clean fix?

20 July, 2010 at 9:55 am

Terence TaoIt seems to me from the spectral theorem that there will not be an issue as long as the spectrum of A stays away from both 0 and 1, but this seems easy to ensure via a standard perturbation argument.

20 July, 2010 at 9:59 am

Michael NielsenIf A is finite-dimensional a perturbation argument should work just fine. But if A has a spectrum dense in some interval around 0, say, then it will break down. In the context of the post (finite-dimensional matrices) I guess there’s no problem.

20 July, 2010 at 10:09 am

Terence TaoI think there are a number of ways around this. For instance, one can first show that for every , and then take limits in, say, the weak operator topology. Now there is no spectrum near zero. There is still the issue of spectrum near 1, but if the operators are bounded then we can simply rescale them to have spectrum away from 1. If they are unbounded, then perhaps one can approximate them by bounded operators and take further limits.

It may also be possible to deduce the infinite dimensional case from the finite dimensional case by a suitable approximation argument. Certainly I would imagine the compact case would follow fairly easily in this fashion.

19 July, 2010 at 8:05 am

Brian DaviesThere is a complete and simple classification of monotone matrix functions, due to Lowner in 1934. See for example

W. F. Donoghue, Monotone matrix functions and analytic continuation. Springer-Verlag, New York, Heidelberg, Berlin, 1974

or

G Sparr: Math. Scand 47 (1980) 266-274

for a short proof.

15 July, 2010 at 12:07 pm

AshleyVery nice post! One typo fix: “Ashwelde and Winter” should be “Ahlswede and Winter”.

[Corrected, thanks - T.]15 July, 2010 at 12:54 pm

Steve FlammiaIt might be worth mentioning that the naive generalization to three matrices,

Tr(e^{A+B+C}) \le Tr(e^A e^B e^C)

is false in general. But there is a nice paper by Lieb with a nontrivial generalization:

E. Lieb, “Convex trace functions and the Wigner-Yanase-Dyson conjecture”, Advances in Mathematics, 11, 267–288 (1973).

(The generalization is at the bottom of page 282.)

15 July, 2010 at 2:12 pm

mickI second Ashley’s comment, a very nice post.

Also, I assume that the David Gross mentioned is the one that works in quantum information theory (ie the one that doesn’t have a Nobel prize)? More specifically, this one.

16 July, 2010 at 1:58 pm

AnonymousProf. Tao,

Thanks for the great posts! Is the following comment correct?

“Since is skew-Hermitian, is positive definite…”

Try and .

Then

16 July, 2010 at 6:50 pm

Terence TaoA factor of i is missing in your computation of the commutator [A,B].

16 July, 2010 at 9:44 pm

AnonymousHa! Sorry, I had intended to write:

,

which is not positive definite, but both A and B are hermitian.

17 July, 2010 at 9:43 am

Terence TaoAh, I see what happened now. Yes, there was a sign error in the original text, which has now been fixed.

17 July, 2010 at 3:09 am

Klas MarkströmWith the recent discussion of Cayley graphs in mind I can mention that Demetres Christofides and I used the methods of Ahlswede and Winter to prove a matrix version of Hoeffding’s concentration inequality for martingales, which we used to give an sharpened version of the Alon-Roichman theorem.

http://dx.doi.org/10.1002/rsa.20177

The Alon-Roichman theorem states that Cayley graphs based on a random subset of the group are likely to be expander graphs if the subset is large enough.

18 July, 2010 at 9:01 am

Miguel LacruzWhat follows is an elementary proof of corollary 1 in the special case that are commuting matrices. The proof uses the fact that the product of two positive definite, commuting matrices, is positive definite.

Consider the matrix valued function defined for every This function is differentiable and is positive definite because it is the product of two positive definite commuting matrices. Finally, take a vector and apply the mean value theorem to find some real number such that

19 July, 2010 at 3:34 am

Miguel LacruzYour proof of corollary 1 in case commute does not need the Golden-Thompson inequality, because then This is simpler than what I mentioned above.

18 July, 2010 at 11:25 am

Anders KarlssonDear Terry,

These types of inequalities express that the exponential mapping (in the sense of differential geometry) is distance (semi)increasing, which is the source of nonpositive curvature for symmetric spaces of non-compact type. If I don’t misremember one version of the Golden-Thompson is used in one of the standard proofs (see Lang’s Fundamentals of Differential Geometry; going back to Mostow) that GLnR/OnR has nonpositive curvature.

I believe that the related inequality (5) for operators is called Segal’s inequality and is again a source of nonpositive curvature (in a weaker sense, because the tangent space is not a Hilbert space) for infinite dimensional symmetric spaces, see Gorach, G.; Porta, H.; Recht, L. A geometric interpretation of Segal’s inequality $\Vert e^{X+Y}\Vert \leq\Vert e^{X/2}e^Ye^{X/2}\Vert $. Proc. Amer. Math. Soc. 115 (1992), no. 1, 229–231, and some of their other papers.

As always, thanks for an excellent blog,

Anders

20 July, 2010 at 9:40 am

Denes PETZI proved in 1994 that equality in the Golden-Thompson inequality if and only if the two matrices commute. ( D. Petz, A survey of trace inequalities, in Functional Analysis and Operator Theory, 287-298, Banach Center Publications, 30 (Warszawa 1994). )

29 July, 2010 at 7:27 pm

percy deiftDear Terry,

I have some comments regarding your blog on the Golden-Thompson inequalities that may

be of interest.

First one may ask the following quantum-mechanical-type question: For linear operators

and ,

if one “knows” , how much does one know about ?

I believe that the best one can do is given by the following:

(1)

with its quantitative version

(2)

(1)(2) have many and varied applications in scattering theory, spectral theory,

soliton theory, …( see e.g. my paper in the Duke Math Journal (1978) “On a commutation

formula”), and also random matrix theory. Over the last 30 years, every 2 or 3 months,

a new application of “commutation” seems to come up. and can even be unbounded

operators.

Here’s how it works for the simplest GT inequality

(3)

I restrict to the bounded (self-adjoint) operator case for simplicity. Let denote the spectral radius of an operator : we have

(4)

Using self-adjointness, (4), commutation, and the fact that , we have

Replacing with , with , and with , shows that

is decreasing with . Inequality (3) now follows by the Trotter product formula.

A proof of

(5)

using commutation is in Reed-Simon IV. Whereas (3) only uses information about the top of the spectrum,

the commutation proof of (5) uses the fact that the (algebraic) multiplicity of an eigenvalue

of is the same as the algebraic multiplicity of in . This fact also follows from (2).

Concerning the result

implies ,

it is clearly enough to show that

(5) implies

But it is easy to see that if then for any real nonzero t,

which implies

Setting and , (5) is then immediate.

As regards 3 operators , one notices that in the Ising model and other integrable models,

one always has at hand an extra relation of the type . This implies by commutation that all

six of the permutations are equal. I believe that this goes to the

heart of the success of the analysis of the Ising model etc., but I have not thought about this

for a while.

Best,

Percy

5 August, 2010 at 12:06 am

Christian Dr. JaekelDear Prof. Tao,

I would like to point out a reference, that you might find helpful:

“Golden-Thompson and Peierls-Bogolubov inequalities for a general von Neumann algebra” by Huzihiro Araki, Comm. Math. Phys. Volume 34, Number 3 (1973), 167-178.”

Best regards,

Christian

16 October, 2010 at 8:58 am

Igor RivinFor a very simple proof of the G/T inequality you might want to check out my recent arxiv.org preprint

http://arxiv.org/abs/1010.2193

part of the point of which is to give yet another plug for Chandler Davis’ convexity theorem.

16 October, 2010 at 9:22 am

Michael NielsenThis is very beautiful, and seems like the “right” way of thinking about the result. It’s definitely a great plug for Davis’s convexity theorem, which makes G/T trivial.

23 February, 2011 at 6:54 am

Topics in random matrix theory « What’s new[...] primarily on my graduate course in the topic, though it also contains material from some additional posts related to random matrices on the blog. It is available online here. As usual, comments and [...]

27 February, 2013 at 1:42 pm

Chao GaoDear Prof. Tao:

I have a question about Frobenius norm. Let || . || be the Frobenius norm. Suppose both the largest eigenvalues of A and B are universally upper bounded. Then is it true that ||A-B|| can be upper and lower bounded by a constant (only depends on the largest eigenvalues of A and B) times ||exp(A)-exp(B)||?

Best,

Chao