Let {A, B} be two Hermitian {n \times n} matrices. When {A} and {B} commute, we have the identity

\displaystyle  e^{A+B} = e^A e^B.

When {A} and {B} do not commute, the situation is more complicated; we have the Baker-Campbell-Hausdorff formula

\displaystyle  e^{A+B} = e^A e^B e^{-\frac{1}{2}[A,B]} \ldots

where the infinite product here is explicit but very messy. On the other hand, taking determinants we still have the identity

\displaystyle  \hbox{det}(e^{A+B}) = \hbox{det}(e^A e^B).

Recently I learned (from Emmanuel Candes, who in turn learned it from David Gross) that there is another very nice relationship between {e^{A+B}} and {e^A e^B}, namely the Golden-Thompson inequality

\displaystyle  \hbox{tr}(e^{A+B}) \leq \hbox{tr}(e^A e^B). \ \ \ \ \ (1)

The remarkable thing about this inequality is that no commutativity hypotheses whatsoever on the matrices {A, B} are required. Note that the right-hand side can be rearranged using the cyclic property of trace as {\hbox{tr}( e^{B/2} e^A e^{B/2} )}; the expression inside the trace is positive definite so the right-hand side is positive. (On the other hand, there is no reason why expressions such as {\hbox{tr}(e^A e^B e^C)} need to be positive or even real, so the obvious extension of the Golden-Thompson inequality to three or more Hermitian matrices fails.) I am told that this inequality is quite useful in statistical mechanics, although I do not know the details of this.
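For readers who like to experiment, here is a quick numerical sanity check of (1) in Python (using numpy and scipy; the random-matrix construction is purely for illustration). It also exhibits the failure of the naive three-matrix extension, in that {\hbox{tr}(e^A e^B e^C)} generically acquires a nonzero imaginary part:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)

def random_hermitian(n):
    # a random n x n Hermitian matrix
    M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (M + M.conj().T) / 2

A, B, C = (random_hermitian(4) for _ in range(3))

lhs = np.trace(expm(A + B)).real
rhs = np.trace(expm(A) @ expm(B)).real
assert lhs <= rhs + 1e-10   # Golden-Thompson: tr e^{A+B} <= tr(e^A e^B)

# tr(e^A e^B e^C) is typically not even real:
triple = np.trace(expm(A) @ expm(B) @ expm(C))
print(lhs, rhs, triple.imag)
```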
To get a sense of how delicate the Golden-Thompson inequality is, let us expand both sides to fourth order in {A, B}. The left-hand side expands as

\displaystyle  \hbox{tr} 1 + \hbox{tr} (A+B) + \frac{1}{2} \hbox{tr} (A^2 + AB + BA + B^2) + \frac{1}{6} \hbox{tr} (A+B)^3

\displaystyle  + \frac{1}{24} \hbox{tr} (A+B)^4 + \ldots

while the right-hand side expands as

\displaystyle  \hbox{tr} 1 + \hbox{tr} (A+B) + \frac{1}{2} \hbox{tr} (A^2 + 2AB + B^2)

\displaystyle  + \frac{1}{6} \hbox{tr} (A^3 + 3A^2 B + 3 A B^2+B^3) +

\displaystyle  \frac{1}{24} \hbox{tr} (A^4 + 4 A^3 B + 6 A^2 B^2 + 4 A B^3 +B^4) + \ldots

Using the cyclic property of trace {\hbox{tr}(AB) = \hbox{tr}(BA)}, one can verify that all terms up to third order agree. Turning to the fourth order terms, after expanding out {(A+B)^4} and using the cyclic property of trace as much as possible, one sees that the fourth order terms almost agree, but the left-hand side contains a term {\frac{1}{12} \hbox{tr}(ABAB)} whose counterpart on the right-hand side is {\frac{1}{12} \hbox{tr}(ABBA)}. The difference between the two can be factorised (again using the cyclic property of trace) as {-\frac{1}{24} \hbox{tr} [A,B]^2}. Since {[A,B] := AB-BA} is skew-Hermitian, {-[A,B]^2} is positive semi-definite, so this difference is non-negative and we have proven the Golden-Thompson inequality to fourth order. (One could also have used the Cauchy-Schwarz inequality for the Frobenius norm to establish this; see below.)
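As a quick check of this computation, one can verify numerically that the mismatch between the two fourth order terms is exactly {-\frac{1}{24} \hbox{tr} [A,B]^2} (a minimal sketch, again with random matrices for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

def random_hermitian(n):
    M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (M + M.conj().T) / 2

A, B = random_hermitian(5), random_hermitian(5)
comm = A @ B - B @ A                       # [A,B], skew-Hermitian

lhs_term = np.trace(A @ B @ A @ B) / 12    # fourth-order term from tr e^{A+B}
rhs_term = np.trace(A @ B @ B @ A) / 12    # its counterpart from tr(e^A e^B)
diff = rhs_term - lhs_term
assert np.isclose(diff, -np.trace(comm @ comm) / 24)
assert diff.real >= 0                      # consistent with (1) at fourth order
```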
Intuitively, the Golden-Thompson inequality is asserting that interactions between a pair {A, B} of non-commuting Hermitian matrices are strongest when cross-interactions are kept to a minimum, so that all the {A} factors lie on one side of a product and all the {B} factors lie on the other. Indeed, this theme will be running through the proof of this inequality, to which we now turn.

The proof of the Golden-Thompson inequality relies on the somewhat magical power of the tensor power trick. For any even integer {p = 2,4,6,\ldots} and any {n \times n} matrix {A} (not necessarily Hermitian), we define the {p}-Schatten norm {\|A\|_p} of {A} by the formula

\displaystyle  \| A \|_p := (\hbox{tr}((AA^*)^{p/2}))^{1/p}.

(This formula in fact defines a norm for any {p \geq 1}, but we will only need the even integer case here.) This norm can be viewed as a non-commutative analogue of the {\ell^p} norm; indeed, the {p}-Schatten norm of a diagonal matrix is just the {\ell^p} norm of the coefficients.
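In computational terms, the {p}-Schatten norm is just the {\ell^p} norm of the singular values of {A}; here is a minimal numerical sketch of the definition above, using numpy:

```python
import numpy as np

def schatten_norm(A, p):
    # (tr (AA^*)^{p/2})^{1/p} = l^p norm of the singular values of A
    s = np.linalg.svd(A, compute_uv=False)
    return np.sum(s ** p) ** (1.0 / p)

# for a diagonal matrix this is the l^p norm of the diagonal entries:
D = np.diag([3.0, -4.0, 1.0])
assert np.isclose(schatten_norm(D, 4.0),
                  np.sum(np.abs([3.0, -4.0, 1.0]) ** 4) ** 0.25)
```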
Note that the {2}-Schatten norm

\displaystyle  \|A\|_2 := (\hbox{tr}(AA^*))^{1/2}

is the Hilbert space norm associated to the Frobenius inner product (or Hilbert-Schmidt inner product)

\displaystyle  \langle A, B \rangle := \hbox{tr}(A B^*).

This is clearly a non-negative Hermitian inner product, so by the Cauchy-Schwarz inequality we conclude that

\displaystyle  |\hbox{tr}(A_1 A_2^*)| \leq \| A_1 \|_2 \|A_2\|_2

for any {n \times n} matrices {A_1, A_2}. As {\|A_2\|_2 = \|A_2^*\|_2}, we conclude in particular that

\displaystyle  |\hbox{tr}(A_1 A_2)| \leq \| A_1 \|_2 \|A_2\|_2.

We can iterate this and establish the non-commutative Hölder inequality

\displaystyle  |\hbox{tr}(A_1 A_2 \ldots A_p)| \leq \| A_1 \|_p \|A_2\|_p \ldots \|A_p\|_p \ \ \ \ \ (2)


whenever {p=2,4,8,\ldots} is a power of two. Indeed, we induct on {p}, the case {p=2} already having been established. If {p \geq 4} is a power of {2}, then by the induction hypothesis (grouping {A_1 \ldots A_p} into {p/2} pairs) we can bound

\displaystyle  |\hbox{tr}(A_1 A_2 \ldots A_p)| \leq \| A_1 A_2 \|_{p/2} \|A_3 A_4\|_{p/2} \ldots \|A_{p-1} A_p\|_{p/2}. \ \ \ \ \ (3)

On the other hand, we may expand

\displaystyle  \| A_1 A_2\|_{p/2}^{p/2} = \hbox{tr} A_1 A_2 A_2^* A_1^* \ldots A_1 A_2 A_2^* A_1^*.

We use the cyclic property of trace to move the rightmost {A_1^*} factor to the left. Applying the induction hypothesis again (grouping the {p} factors into {p/2} adjacent pairs), we conclude that

\displaystyle  \| A_1 A_2\|_{p/2}^{p/2} \leq \| A_1^* A_1 \|_{p/2} \|A_2 A_2^*\|_{p/2} \ldots \| A_1^* A_1 \|_{p/2} \| A_2 A_2^* \|_{p/2}.

But from the cyclic property of trace again, we have {\| A_1^* A_1 \|_{p/2} = \|A_1\|_p^2} and {\| A_2 A_2^* \|_{p/2} = \|A_2\|_p^2}. We conclude that

\displaystyle  \|A_1 A_2 \|_{p/2} \leq \|A_1\|_p \|A_2\|_p

and similarly for {\|A_3 A_4\|_{p/2}}, etc. Inserting this into (3) we obtain (2).
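One can spot-check the non-commutative Hölder inequality (2) numerically, say for {p=4}; note that the matrices need not be Hermitian. (This reuses the schatten_norm helper from the earlier sketch, repeated here to keep the snippet self-contained.)

```python
import numpy as np

rng = np.random.default_rng(2)

def schatten_norm(A, p):
    s = np.linalg.svd(A, compute_uv=False)
    return np.sum(s ** p) ** (1.0 / p)

mats = [rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
        for _ in range(4)]
lhs = abs(np.trace(mats[0] @ mats[1] @ mats[2] @ mats[3]))
rhs = np.prod([schatten_norm(A, 4) for A in mats])
assert lhs <= rhs + 1e-9   # |tr(A_1 A_2 A_3 A_4)| <= prod of 4-Schatten norms
```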

Remark 1 Though we will not need to do so here, it is interesting to note that one can use the tensor power trick to amplify (2) for {p} equal to a power of two, to obtain (2) for all positive integers {p}, at least when the {A_i} are all positive semi-definite (so that the fractional powers {A_i^{1/N}} below make sense). Indeed, pick a large integer {m} and let {N} be the integer part of {2^m/p}. Then expand the left-hand side of (2) as {\hbox{tr}( A_1^{1/N} \ldots A_1^{1/N} A_2^{1/N} \ldots A_p^{1/N} \ldots A_p^{1/N} )} and apply (2) with {p} replaced by {2^m} (padding the product with {2^m-pN} copies of the identity matrix) to bound this by {\| A_1^{1/N} \|_{2^m}^N \ldots \|A_p^{1/N}\|_{2^m}^N \| 1 \|_{2^m}^{2^m-pN}}. Sending {m \rightarrow \infty} (noting that {2^m = (1+o(1)) Np}) we obtain the claim.

Specialising (2) to the case where {A_1=\ldots=A_p = AB} for some Hermitian matrices {A, B}, we conclude that

\displaystyle  \hbox{tr}( (AB)^{p} ) \leq \| AB \|_p^p

and hence by cyclic permutation

\displaystyle  \hbox{tr}( (AB)^{p} ) \leq \hbox{tr}( (A^2 B^2)^{p/2} )

for any power of two {p = 2,4,8,\ldots}. Iterating this (replacing {A, B} by {A^2, B^2} and {p} by {p/2} at each step) we conclude that

\displaystyle  \hbox{tr}( (AB)^{p} ) \leq \hbox{tr}( A^p B^p ). \ \ \ \ \ (4)

Applying this with {A, B} replaced by {e^{A/p}} and {e^{B/p}} respectively, we obtain

\displaystyle  \hbox{tr}( (e^{A/p} e^{B/p})^{p} ) \leq \hbox{tr}( e^A e^B ).

Now we send {p \rightarrow \infty}. Since {e^{A/p} = 1 + A/p + O(1/p^2)} and {e^{B/p} = 1 + B/p + O(1/p^2)}, we have {e^{A/p} e^{B/p} = e^{(A+B)/p + O(1/p^2)}}, and so the left-hand side is {\hbox{tr}( e^{A+B + O(1/p)} )}; taking the limit as {p \rightarrow \infty} we obtain the Golden-Thompson inequality. (See also these notes of Vershynin for a slight variant of this proof.)
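Numerically, one can watch this limiting argument in action: for {p} a power of two, the quantity {\hbox{tr}( (e^{A/p} e^{B/p})^{p} )} decreases from {\hbox{tr}(e^A e^B)} down towards {\hbox{tr}(e^{A+B})} as {p} doubles. A minimal sketch:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(3)
M = rng.standard_normal((4, 4)); A = (M + M.T) / 2
M = rng.standard_normal((4, 4)); B = (M + M.T) / 2

target = np.trace(expm(A + B)).real        # tr e^{A+B}
upper = np.trace(expm(A) @ expm(B)).real   # tr(e^A e^B)
for p in [2, 4, 8, 16, 32, 64]:
    step = expm(A / p) @ expm(B / p)
    val = np.trace(np.linalg.matrix_power(step, p)).real
    print(p, val, target, upper)  # target <= val <= upper, and val -> target
```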
If we stop the iteration at an earlier point, then the same argument gives the inequality

\displaystyle  \| e^{A+B} \|_p \leq \| e^A e^B \|_p

for {p=2,4,8,\ldots} a power of two; one can view the original Golden-Thompson inequality as the {p=1} endpoint of this family in some sense. (In fact, the Golden-Thompson inequality holds in any unitarily invariant norm; see Theorem 9.3.7 of Bhatia’s book.) In the limit {p \rightarrow \infty}, we obtain in particular the operator norm inequality

\displaystyle  \| e^{A+B} \|_{op} \leq \| e^A e^B \|_{op}. \ \ \ \ \ (5)

This inequality has a nice consequence:

Corollary 2 Let {A, B} be Hermitian matrices. If {e^A \leq e^B} (i.e. {e^B-e^A} is positive semi-definite), then {A \leq B}.

Proof: Since {e^A \leq e^B}, we have {\langle e^A x, x \rangle \leq \langle e^B x, x \rangle} for all vectors {x}, or in other words {\|e^{A/2} x \| \leq \| e^{B/2} x \|} for all {x}. This implies that {e^{A/2} e^{-B/2}} is a contraction, i.e. {\|e^{A/2} e^{-B/2} \|_{op} \leq 1}. By (5), we conclude that {\|e^{(A-B)/2}\|_{op} \leq 1}, thus {(A-B)/2 \leq 0}, and the claim follows. \Box
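Here is a quick numerical illustration of Corollary 2, a sketch using scipy’s expm and logm; the construction of {B} below, by adding a random positive semi-definite matrix to {e^A} and taking logarithms, is just one convenient way to generate pairs with {e^A \leq e^B}:

```python
import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(4)
n = 4
M = rng.standard_normal((n, n)); A = (M + M.T) / 2
G = rng.standard_normal((n, n)); P = G @ G.T       # positive semi-definite

B = logm(expm(A) + P)           # so that e^B = e^A + P >= e^A
B = (B + B.conj().T).real / 2   # symmetrise away rounding error

assert np.linalg.eigvalsh(B - A).min() >= -1e-8    # Corollary 2: A <= B
```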

It is not difficult to reverse the above argument and conclude that Corollary 2 is in fact equivalent to (5).

It is remarkably tricky to try to prove Corollary 2 directly. Here is a somewhat messy proof; I would be interested in seeing a more elegant argument. By the fundamental theorem of calculus, it suffices to show that whenever {A(t)} is a Hermitian matrix depending smoothly on a real parameter with {\frac{d}{dt} e^{A(t)} \geq 0}, then {\frac{d}{dt} A(t) \geq 0}. Indeed, Corollary 2 follows from this claim by setting {A(t) := \log(e^A + t (e^B - e^A))} and concluding that {A(1) \geq A(0)}.

To obtain this claim, we use the Duhamel formula

\displaystyle  \frac{d}{dt} e^{A(t)} = \int_0^1 e^{(1-s)A(t)} (\frac{d}{dt} A(t)) e^{sA(t)}\ ds.

This formula can be proven by Taylor expansion, or by carefully approximating {e^{A(t)}} by {(1 + A(t)/N)^N}; alternatively, one can integrate the identity

\displaystyle  \frac{\partial}{\partial s}( e^{-sA(t)} \frac{\partial }{\partial t} e^{sA(t)} ) = e^{-sA(t)} (\frac{\partial}{\partial t} A(t)) e^{sA(t)}

which follows from the product rule and by interchanging the {s} and {t} derivatives at a key juncture. We rearrange the Duhamel formula as

\displaystyle  \frac{d}{dt} e^{A(t)} = e^{A(t)/2} (\int_{-1/2}^{1/2} e^{sA(t)} (\frac{d}{dt} A(t)) e^{-sA(t)}\ ds) e^{A(t)/2}.

Using the basic identity {e^A B e^{-A} = e^{\hbox{ad}(A)} B}, we thus have

\displaystyle  \frac{d}{dt} e^{A(t)} = e^{A(t)/2} [(\int_{-1/2}^{1/2} e^{s \hbox{ad}(A(t))}\ ds) (\frac{d}{dt} A(t))] e^{A(t)/2};

formally evaluating the integral, we obtain

\displaystyle  \frac{d}{dt} e^{A(t)} = e^{A(t)/2} [\frac{\sinh(\hbox{ad}(A(t))/2)}{\hbox{ad}(A(t))/2} (\frac{d}{dt} A(t))] e^{A(t)/2},

and thus

\displaystyle  \frac{d}{dt} A(t) = \frac{\hbox{ad}(A(t))/2}{\sinh(\hbox{ad}(A(t))/2)} ( e^{-A(t)/2} (\frac{d}{dt} e^{A(t)}) e^{-A(t)/2} ).

As {\frac{d}{dt} e^{A(t)}} was positive semi-definite by hypothesis, {e^{-A(t)/2} (\frac{d}{dt} e^{A(t)}) e^{-A(t)/2}} is also. It thus suffices to show that for any Hermitian {A}, the operator {\frac{\hbox{ad}(A)}{\sinh(\hbox{ad}(A))}} preserves the property of being positive semi-definite.
Note that for any real {\xi}, the operator {e^{2\pi i \xi \hbox{ad}(A)}} maps a positive semi-definite matrix {B} to another positive semi-definite matrix, namely {e^{2\pi i \xi A} B e^{-2\pi i \xi A}}. By the Fourier inversion formula, it thus suffices to show that the kernel {F(x) := \frac{x}{\sinh(x)}} is positive semi-definite in the sense that it has non-negative Fourier transform (cf. Bochner’s theorem). But a routine (but somewhat tedious) application of contour integration shows that the Fourier transform {\hat F(\xi) = \int_{\bf R} e^{-2\pi i x \xi} F(x)\ dx} is given by the formula {\hat F(\xi) = \frac{\pi^2}{2 \cosh^2( \pi^2 \xi)}}, and the claim follows.
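(Those who distrust contour integration can check the Fourier transform numerically; the quadrature below, using scipy, matches the closed form to many digits.)

```python
import numpy as np
from scipy.integrate import quad

def F(x):
    # x / sinh(x), with the removable singularity at x = 0 filled in
    return x / np.sinh(x) if x != 0 else 1.0

for xi in [0.0, 0.05, 0.1]:
    # F is even, so the Fourier transform reduces to a cosine transform;
    # F decays exponentially, so truncating to [-40, 40] is harmless
    val, _ = quad(lambda x: F(x) * np.cos(2 * np.pi * x * xi), -40, 40)
    closed = np.pi ** 2 / (2 * np.cosh(np.pi ** 2 * xi) ** 2)
    print(xi, val, closed)   # the two columns agree
```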

Because of the Golden-Thompson inequality, many applications of the exponential moment method in commutative probability theory can be extended without difficulty to the non-commutative case, as was observed by Ahlswede and Winter. For instance, consider (a special case of) the Chernoff inequality

\displaystyle  {\bf P}( X_1 + \ldots + X_N \geq \lambda \sigma ) \leq \max( e^{-\lambda^2/4}, e^{-\lambda \sigma / 2} )

for any {\lambda > 0}, where {X_1,\ldots,X_N \equiv X} are iid scalar random variables taking values in {[-1,1]} of mean zero and with total variance {\sigma^2} (i.e. each {X_i} has variance {\sigma^2/N}). We briefly sketch the standard proof of this inequality. We first use Markov’s inequality to obtain

\displaystyle  {\bf P}( X_1 + \ldots + X_N \geq \lambda \sigma ) \leq e^{-t\lambda \sigma } {\bf E} e^{t(X_1 + \ldots + X_N)}

for some parameter {t>0} to be optimised later. In the scalar case, we can factor {e^{t(X_1+\ldots+X_N)}} as {e^{tX_1} \ldots e^{tX_N}} and then use the iid hypothesis to write the right-hand side as

\displaystyle  e^{-t\lambda \sigma } ( {\bf E} e^{tX} )^N.

An elementary Taylor series computation (using the bounds {e^x \leq 1+x+x^2} for {|x| \leq 1} and {1+y \leq e^y}) then reveals the bound {{\bf E} e^{tX} \leq \exp( t^2 \sigma^2 / N )} when {0 \leq t \leq 1}; inserting this bound and optimising in {t} we obtain the claim.
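For concreteness, here is the arithmetic of these last two steps in code (a sketch; the uniform distribution below is just one convenient example of a mean zero variable in {[-1,1]} with variance {\sigma^2/N}):

```python
import numpy as np

rng = np.random.default_rng(5)
N, sigma2 = 100, 0.5
sigma = np.sqrt(sigma2)
# uniform on [-a, a] has variance a^2/3; choose a so the variance is sigma^2/N
a = np.sqrt(3 * sigma2 / N)
X = rng.uniform(-a, a, size=10 ** 6)

for t in [0.25, 0.5, 1.0]:
    # Monte Carlo check of E e^{tX} <= exp(t^2 sigma^2 / N)
    assert np.exp(t * X).mean() <= np.exp(t ** 2 * sigma2 / N)

# optimising t -> exp(-t*lam*sigma + t^2*sigma^2) over 0 <= t <= 1: the
# minimiser is t = min(1, lam/(2*sigma)), and the resulting bound is at
# most max(e^{-lam^2/4}, e^{-lam*sigma/2})
lam = 3.0
t = min(1.0, lam / (2 * sigma))
bound = np.exp(-t * lam * sigma + t ** 2 * sigma2)
assert bound <= max(np.exp(-lam ** 2 / 4), np.exp(-lam * sigma / 2)) + 1e-12
```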
Now suppose that {X_1,\ldots,X_N \equiv X} are iid {d \times d} Hermitian matrices. One can try to adapt the above method to control the size of the sum {X_1 + \ldots + X_N}. The key point is then to bound expressions such as

\displaystyle  {\bf E} \hbox{tr} e^{t(X_1 + \ldots + X_N)}.

As {X_1,\ldots,X_N} need not commute, we cannot separate the product completely. But by Golden-Thompson, we can bound this expression by

\displaystyle  {\bf E} \hbox{tr} e^{t(X_1 + \ldots + X_{N-1})} e^{tX_N}

which by independence we can then factorise as

\displaystyle  \hbox{tr} ({\bf E} e^{t(X_1 + \ldots + X_{N-1})}) ({\bf E} e^{tX_N}).

As the matrices involved are positive definite, we can then take out the final factor in operator norm:

\displaystyle  \| {\bf E} e^{tX_N} \|_{op} \hbox{tr} {\bf E} e^{t(X_1 + \ldots + X_{N-1})}.

Iterating this procedure, and bounding the final factor {\hbox{tr}({\bf E} e^{tX_1})} by {d \| {\bf E} e^{tX} \|_{op}}, we can eventually obtain the bound

\displaystyle  {\bf E} \hbox{tr} e^{t(X_1 + \ldots + X_N)} \leq d \| {\bf E} e^{tX} \|_{op}^N.

Combining this with the rest of the Chernoff inequality argument, we can establish a matrix generalisation

\displaystyle  {\bf P}( \| X_1 + \ldots + X_N \|_{op} \geq \lambda \sigma ) \leq d \max( e^{-\lambda^2/4}, e^{-\lambda \sigma / 2} )

of the Chernoff inequality, under the assumption that the {X_1,\ldots,X_N} are iid with mean zero, have operator norm bounded by {1}, and have total variance {\sum_{i=1}^N \| {\bf E} X_i^2 \|_{op}} equal to {\sigma^2}; see for instance these notes of Vershynin for details.
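As a closing experiment, one can test the key intermediate bound {{\bf E} \hbox{tr} e^{t(X_1 + \ldots + X_N)} \leq d \| {\bf E} e^{tX} \|_{op}^N} by Monte Carlo (a rough sketch: the expectations are estimated by sampling rather than computed exactly, and the particular distribution of the summands is chosen only to satisfy the hypotheses):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(6)
d, N, t, trials = 3, 5, 0.5, 2000

def sample_X():
    # iid mean zero Hermitian summands with operator norm at most 1
    M = rng.standard_normal((d, d))
    H = (M + M.T) / 2
    return 0.5 * H / max(1.0, np.linalg.norm(H, 2))

lhs = np.mean([np.trace(expm(t * sum(sample_X() for _ in range(N)))).real
               for _ in range(trials)])
factor = np.mean([expm(t * sample_X()) for _ in range(trials)], axis=0)
rhs = d * np.linalg.norm(factor, 2) ** N
print(lhs, rhs)   # expect lhs <= rhs up to sampling error
```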
Further discussion of the application of the Golden-Thompson inequality and its variants to non-commutative Chernoff-type inequalities can be found in this paper of Gross, these notes of Vershynin and this recent article of Tropp. It seems that this inequality may be quite useful in simplifying the proofs of several of the basic estimates in this subject.