Let $A, B$ be two Hermitian $n \times n$ matrices. When $A$ and $B$ commute, we have the identity

$$e^{A+B} = e^A e^B.$$
When $A$ and $B$ do not commute, the situation is more complicated; we have the Baker-Campbell-Hausdorff formula (in its Zassenhaus form)

$$e^{A+B} = e^A e^B e^{-\frac{1}{2}[A,B]} \cdots,$$
where the infinite product here is explicit but very messy. On the other hand, taking determinants we still have the identity

$$\det(e^{A+B}) = \det(e^A e^B).$$
Recently I learned (from Emmanuel Candes, who in turn learned it from David Gross) that there is another very nice relationship between $e^{A+B}$ and $e^A e^B$, namely the Golden-Thompson inequality

$$\operatorname{tr}(e^{A+B}) \leq \operatorname{tr}(e^A e^B). \qquad (1)$$
The remarkable thing about this inequality is that no commutativity hypotheses whatsoever on the matrices $A, B$ are required. Note that the right-hand side can be rearranged using the cyclic property of trace as $\operatorname{tr}(e^{A/2} e^B e^{A/2})$; the expression inside the trace is positive definite, so the right-hand side is positive. (On the other hand, there is no reason why expressions such as $\operatorname{tr}(e^A e^B e^C)$ need to be positive or even real, so the obvious extension of the Golden-Thompson inequality to three or more Hermitian matrices fails.) I am told that this inequality is quite useful in statistical mechanics, although I do not know the details of this.
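The inequality $\operatorname{tr}(e^{A+B}) \leq \operatorname{tr}(e^A e^B)$ is easy to test numerically. The following sketch (assuming NumPy is available; the helper names `expm_h` and `rand_hermitian` are mine, not from any library) checks it on random Hermitian matrices, and also illustrates that the three-matrix expression $\operatorname{tr}(e^A e^B e^C)$ is generally not even real:

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_hermitian(n):
    # Random complex matrix symmetrised into a Hermitian matrix.
    M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (M + M.conj().T) / 2

def expm_h(H):
    # Matrix exponential of a Hermitian matrix via its eigendecomposition.
    w, V = np.linalg.eigh(H)
    return (V * np.exp(w)) @ V.conj().T

n = 4
worst_gap = np.inf
for _ in range(100):
    A, B = rand_hermitian(n), rand_hermitian(n)
    lhs = np.trace(expm_h(A + B)).real
    rhs = np.trace(expm_h(A) @ expm_h(B)).real
    worst_gap = min(worst_gap, rhs - lhs)
print("smallest value of tr(e^A e^B) - tr(e^{A+B}):", worst_gap)

# With three matrices the analogous trace is typically complex:
C = rand_hermitian(n)
print("tr(e^A e^B e^C) =", np.trace(expm_h(A) @ expm_h(B) @ expm_h(C)))
```

(The quantity `worst_gap` stays non-negative across all trials, as (1) predicts.)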
To get a sense of how delicate the Golden-Thompson inequality is, let us expand both sides to fourth order in $A, B$. The left-hand side expands as

$$\operatorname{tr}(1) + \operatorname{tr}(A+B) + \frac{1}{2}\operatorname{tr}((A+B)^2) + \frac{1}{6}\operatorname{tr}((A+B)^3) + \frac{1}{24}\operatorname{tr}((A+B)^4) + \ldots,$$
while the right-hand side expands as

$$\operatorname{tr}\Big[ \Big(1 + A + \frac{A^2}{2} + \frac{A^3}{6} + \frac{A^4}{24} + \ldots\Big)\Big(1 + B + \frac{B^2}{2} + \frac{B^3}{6} + \frac{B^4}{24} + \ldots\Big) \Big].$$
Using the cyclic property of trace $\operatorname{tr}(XY) = \operatorname{tr}(YX)$, one can verify that all terms up to third order agree. Turning to the fourth order terms, one sees, after expanding out and using the cyclic property of trace as much as possible, that the fourth order terms almost agree, but the left-hand side contains a term $\frac{1}{12}\operatorname{tr}(ABAB)$ whose counterpart on the right-hand side is $\frac{1}{12}\operatorname{tr}(ABBA)$. The difference between the two can be factorised (again using the cyclic property of trace) as $\frac{1}{24}\operatorname{tr}((AB-BA)^2)$. Since $AB-BA$ is skew-Hermitian, $-(AB-BA)^2$ is positive semi-definite, so this difference is non-positive, and so we have proven the Golden-Thompson inequality to fourth order. (One could also have used the Cauchy-Schwarz inequality for the Frobenius norm to establish this; see below.)
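The fourth-order computation can be checked numerically: for small $\varepsilon$ one should have $\operatorname{tr}(e^{\varepsilon(A+B)}) - \operatorname{tr}(e^{\varepsilon A}e^{\varepsilon B}) = \frac{\varepsilon^4}{24}\operatorname{tr}((AB-BA)^2) + O(\varepsilon^5)$. A minimal NumPy sketch (my own; averaging over $\pm\varepsilon$ cancels the odd-order error terms):

```python
import numpy as np

rng = np.random.default_rng(1)

def expm_h(H):
    # Matrix exponential of a (real symmetric) Hermitian matrix.
    w, V = np.linalg.eigh(H)
    return (V * np.exp(w)) @ V.conj().T

n = 3
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
B = rng.standard_normal((n, n)); B = (B + B.T) / 2

def gap(e):
    # Golden-Thompson gap at scale e; always <= 0.
    return np.trace(expm_h(e * (A + B))) - np.trace(expm_h(e * A) @ expm_h(e * B))

eps = 0.05
actual = (gap(eps) + gap(-eps)) / 2            # odd-order terms cancel
comm = A @ B - B @ A
predicted = eps**4 / 24 * np.trace(comm @ comm)  # leading fourth-order term
print(actual, predicted)  # both negative, and close to each other
```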
Intuitively, the Golden-Thompson inequality is asserting that interactions between a pair of non-commuting Hermitian matrices $A, B$ are strongest when cross-interactions are kept to a minimum, so that all the $A$ factors lie on one side of a product and all the $B$ factors lie on the other. Indeed, this theme will be running through the proof of this inequality, to which we now turn.
The proof of the Golden-Thompson inequality relies on the somewhat magical power of the tensor power trick. For any even integer $p \geq 2$ and any $n \times n$ matrix $A$ (not necessarily Hermitian), we define the $p$-Schatten norm $\|A\|_{S^p}$ of $A$ by the formula

$$\|A\|_{S^p} := \big(\operatorname{tr}\big((AA^*)^{p/2}\big)\big)^{1/p}.$$
(This formula in fact defines a norm for any $p \geq 1$, but we will only need the even integer case here.) This norm can be viewed as a non-commutative analogue of the $\ell^p$ norm; indeed, the $p$-Schatten norm of a diagonal matrix is just the $\ell^p$ norm of its diagonal coefficients.
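In terms of the singular values $s_1, \ldots, s_n$ of $A$, the definition reads $\|A\|_{S^p} = (\sum_i s_i^p)^{1/p}$, which makes the diagonal-matrix claim immediate. A small sketch (my own helper `schatten`, assuming NumPy):

```python
import numpy as np

def schatten(A, p):
    # ||A||_{S^p} = (tr((AA*)^{p/2}))^{1/p}; in terms of the singular values
    # s_i of A this is (sum_i s_i^p)^{1/p}, which makes sense for any p >= 1.
    s = np.linalg.svd(A, compute_uv=False)
    return np.sum(s ** p) ** (1.0 / p)

# For a diagonal matrix, the S^p norm is the l^p norm of the diagonal entries:
d = np.array([3.0, -1.0, 2.0])
D = np.diag(d)
p = 4
lp_norm = np.sum(np.abs(d) ** p) ** (1.0 / p)
print(schatten(D, p), lp_norm)  # equal
```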
Note that the $2$-Schatten norm

$$\|A\|_{S^2} = (\operatorname{tr}(AA^*))^{1/2}$$

is the Hilbert space norm associated to the Frobenius (or Hilbert-Schmidt) inner product

$$\langle A, B \rangle := \operatorname{tr}(AB^*).$$
This is clearly a non-negative Hermitian inner product, so by the Cauchy-Schwarz inequality we conclude that

$$|\operatorname{tr}(AB^*)| \leq \|A\|_{S^2} \|B\|_{S^2}$$
for any matrices $A, B$. As $\|B^*\|_{S^2} = \|B\|_{S^2}$, we conclude in particular that

$$|\operatorname{tr}(AB)| \leq \|A\|_{S^2} \|B\|_{S^2}.$$

This is the $p = 2$ case of the more general inequality

$$|\operatorname{tr}(A_1 \cdots A_p)| \leq \|A_1\|_{S^p} \cdots \|A_p\|_{S^p}, \qquad (2)$$

valid for arbitrary matrices $A_1, \ldots, A_p$ whenever $p$ is a power of two, which we now prove by induction on $p$. Suppose then that $p \geq 4$ and that the claim has already been proven for $p/2$. Grouping the product $A_1 \cdots A_p$ into the $p/2$ pairs $A_1 A_2, A_3 A_4, \ldots, A_{p-1} A_p$ and applying the induction hypothesis, we see that

$$|\operatorname{tr}(A_1 \cdots A_p)| \leq \|A_1 A_2\|_{S^{p/2}} \cdots \|A_{p-1} A_p\|_{S^{p/2}},$$

so it suffices to show that $\|AB\|_{S^{p/2}} \leq \|A\|_{S^p} \|B\|_{S^p}$ for arbitrary matrices $A, B$.
To this end, we may expand

$$\|AB\|_{S^{p/2}}^{p/2} = \operatorname{tr}\big((ABB^*A^*)^{p/4}\big) = \operatorname{tr}\big(ABB^*A^* \cdots ABB^*A^*\big).$$
We use the cyclic property of trace to move the rightmost factor $A^*$ to the left, writing this as

$$\operatorname{tr}\big((A^*A)(BB^*)\cdots(A^*A)(BB^*)\big),$$

a product of $p/2$ factors. Applying the induction hypothesis again, we conclude that

$$\|AB\|_{S^{p/2}}^{p/2} \leq \|A^*A\|_{S^{p/2}}^{p/4}\, \|BB^*\|_{S^{p/2}}^{p/4}.$$
But from the cyclic property of trace again, we have $\|A^*A\|_{S^{p/2}} = \|A\|_{S^p}^2$ and $\|BB^*\|_{S^{p/2}} = \|B\|_{S^p}^2$. We conclude that

$$\|AB\|_{S^{p/2}} \leq \|A\|_{S^p}\, \|B\|_{S^p},$$

which closes the induction and establishes (2).
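The inequality (2), namely $|\operatorname{tr}(A_1 \cdots A_p)| \leq \|A_1\|_{S^p} \cdots \|A_p\|_{S^p}$, can be spot-checked numerically for $p = 4$ on random complex matrices (sketch assuming NumPy; `schatten` is my own helper):

```python
import numpy as np

rng = np.random.default_rng(2)

def schatten(A, p):
    # ||A||_{S^p} computed from the singular values of A.
    s = np.linalg.svd(A, compute_uv=False)
    return np.sum(s ** p) ** (1.0 / p)

n, p = 4, 4
min_slack = np.inf
for _ in range(200):
    As = [rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
          for _ in range(p)]
    lhs = abs(np.trace(As[0] @ As[1] @ As[2] @ As[3]))
    rhs = np.prod([schatten(A, p) for A in As])
    min_slack = min(min_slack, rhs - lhs)
print("smallest rhs - lhs over trials:", min_slack)  # >= 0
```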
Remark 1 Though we will not need to do so here, it is interesting to note that one can use the tensor power trick to amplify (2) for $p$ equal to a power of two, to obtain (2) for all positive integers $p$, at least when the $A_1, \ldots, A_p$ are all Hermitian. Indeed, pick a large integer $m$ and let $M$ be the largest odd integer with $pM \leq 2^m$ (so that $0 \leq 2^m - pM < 2p$). Each Hermitian $A_i$ has a Hermitian $M^{th}$ root $A_i^{1/M}$ (defined via the spectral theorem, using the real odd root of each eigenvalue). Then expand the left-hand side of (2) as

$$\big|\operatorname{tr}\big( (A_1^{1/M})^M \cdots (A_p^{1/M})^M\, 1^{2^m - pM} \big)\big|,$$

a trace of a product of $2^m$ matrices, and apply (2) with $p$ replaced by $2^m$ to bound this by

$$\prod_{i=1}^p \|A_i^{1/M}\|_{S^{2^m}}^M\; n^{(2^m - pM)/2^m} = \prod_{i=1}^p \|A_i\|_{S^{2^m/M}}\; n^{(2^m - pM)/2^m},$$

where $\|A\|_{S^q} := (\operatorname{tr}|A|^q)^{1/q}$ for non-integer exponents $q$. Sending $m \rightarrow \infty$ (noting that $2^m/M \rightarrow p$, while the exponent $(2^m - pM)/2^m$ of the dimension factor goes to zero) we obtain the claim.
Specialising (2) to the case where $A_1 = \cdots = A_p = e^{A/p} e^{B/p}$ for some Hermitian matrices $A, B$, we conclude that

$$\operatorname{tr}\big((e^{A/p} e^{B/p})^p\big) \leq \big\|e^{A/p} e^{B/p}\big\|_{S^p}^p = \operatorname{tr}\big(\big(e^{A/p} e^{2B/p} e^{A/p}\big)^{p/2}\big),$$
and hence by cyclic permutation

$$\operatorname{tr}\big((e^{A/p} e^{B/p})^p\big) \leq \operatorname{tr}\big(\big(e^{2A/p} e^{2B/p}\big)^{p/2}\big). \qquad (3)$$
Applying this with $p$, $A$, $B$ replaced by $p/2$, $2A$, $2B$ respectively, we obtain

$$\operatorname{tr}\big(\big(e^{2A/p} e^{2B/p}\big)^{p/2}\big) \leq \operatorname{tr}\big(\big(e^{4A/p} e^{4B/p}\big)^{p/4}\big);$$

iterating this, we conclude that

$$\operatorname{tr}\big((e^{A/p} e^{B/p})^p\big) \leq \operatorname{tr}(e^A e^B). \qquad (4)$$
Now we send $p \rightarrow \infty$ (along powers of two). Since $e^{A/p} = 1 + \frac{A}{p} + O(\frac{1}{p^2})$ and $e^{B/p} = 1 + \frac{B}{p} + O(\frac{1}{p^2})$, we have $e^{A/p} e^{B/p} = e^{(A+B)/p + O(1/p^2)}$, and so the left-hand side of (4) is $\operatorname{tr}(e^{A+B}) + o(1)$ (this is a special case of the Lie product formula); taking the limit as $p \rightarrow \infty$ we obtain the Golden-Thompson inequality. (See also these notes of Vershynin for a slight variant of this proof.)
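The quantities $T(p) := \operatorname{tr}((e^{A/p} e^{B/p})^p)$ appearing in this argument can be watched converging numerically: by (3) they decrease as $p$ doubles, starting from $T(1) = \operatorname{tr}(e^A e^B)$ and squeezing down towards $\operatorname{tr}(e^{A+B})$. A sketch (my own, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(3)

def expm_h(H):
    # Matrix exponential of a Hermitian matrix via its eigendecomposition.
    w, V = np.linalg.eigh(H)
    return (V * np.exp(w)) @ V.conj().T

n = 4
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (A + A.conj().T) / 4
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = (B + B.conj().T) / 4

def T(p):
    # T(p) = tr((e^{A/p} e^{B/p})^p); note T(1) = tr(e^A e^B).
    M = expm_h(A / p) @ expm_h(B / p)
    return np.trace(np.linalg.matrix_power(M, p)).real

vals = [T(2 ** k) for k in range(7)]       # p = 1, 2, 4, ..., 64
target = np.trace(expm_h(A + B)).real      # tr(e^{A+B})
print(vals, target)  # non-increasing sequence approaching the target from above
```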
If we stop the iteration at an earlier point, then the same argument gives the inequality

$$\big\|e^{A+B}\big\|_{S^p} \leq \big\|e^{A/2} e^B e^{A/2}\big\|_{S^p}$$
for $p$ a power of two; one can view the original Golden-Thompson inequality as the $p = 1$ endpoint of this family in some sense. (In fact, the Golden-Thompson inequality is true in any Schatten norm, and more generally in any unitarily invariant norm; see Theorem 9.3.7 of Bhatia's book.) In the limit $p \rightarrow \infty$, we obtain in particular the operator norm inequality

$$\big\|e^{A+B}\big\|_{op} \leq \big\|e^{A/2} e^B e^{A/2}\big\|_{op}. \qquad (5)$$
This inequality has a nice consequence:

Corollary 1 Let $A, B$ be Hermitian matrices. If $e^A \leq e^B$ (i.e. $e^B - e^A$ is positive semi-definite), then $A \leq B$.
Proof: Since $e^A \leq e^B$, we have $\langle e^A v, v \rangle \leq \langle e^B v, v \rangle$ for all vectors $v$, or in other words $\|e^{A/2} v\| \leq \|e^{B/2} v\|$ for all $v$. This implies that $e^{A/2} e^{-B/2}$ is a contraction, i.e. $\|e^{A/2} e^{-B/2}\|_{op} \leq 1$. By (5) (with $B$ replaced by $-B$), we conclude that

$$\|e^{A-B}\|_{op} \leq \|e^{A/2} e^{-B} e^{A/2}\|_{op} = \|e^{A/2} e^{-B/2}\|_{op}^2 \leq 1,$$

thus $A - B \leq 0$, and the claim follows. $\Box$
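Corollary 1 (the statement that $e^A \leq e^B$ implies $A \leq B$, i.e. the operator monotonicity of the logarithm) can also be tested numerically: add a random positive semi-definite perturbation to $e^A$ and check that the logarithm moved upwards. A sketch assuming NumPy (`funm_h` is my own helper applying a function to a Hermitian matrix through its spectrum):

```python
import numpy as np

rng = np.random.default_rng(4)

def funm_h(H, f):
    # Apply the scalar function f to the Hermitian matrix H spectrally.
    w, V = np.linalg.eigh(H)
    return (V * f(w)) @ V.conj().T

n = 4
min_eig = np.inf
for _ in range(100):
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    A = (A + A.conj().T) / 2
    R = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    P = R @ R.conj().T                          # positive semi-definite bump
    B = funm_h(funm_h(A, np.exp) + P, np.log)   # B := log(e^A + P), so e^A <= e^B
    min_eig = min(min_eig, np.linalg.eigvalsh(B - A).min())
print("smallest eigenvalue of B - A over trials:", min_eig)  # >= 0
```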
It is remarkably tricky to try to prove Corollary 1 directly. Here is a somewhat messy proof; I would be interested in seeing a more elegant argument. By the fundamental theorem of calculus, it suffices to show that whenever $F(t)$ is a Hermitian matrix depending smoothly on a real parameter $t$ with $\frac{d}{dt} e^{F(t)}$ positive semi-definite, then $\frac{d}{dt} F(t)$ is positive semi-definite. Indeed, Corollary 1 follows from this claim by setting $F(t) := \log\big((1-t) e^A + t e^B\big)$ for $0 \leq t \leq 1$ and concluding that $B - A = \int_0^1 \frac{d}{dt} F(t)\, dt \geq 0$.
To obtain this claim, we use the Duhamel formula

$$\frac{d}{dt} e^{F(t)} = \int_0^1 e^{s F(t)}\, \Big(\frac{d}{dt} F(t)\Big)\, e^{(1-s) F(t)}\, ds.$$
This formula can be proven by Taylor expansion, or by carefully approximating $e^{F(t)}$ by $(1 + F(t)/N)^N$; alternatively, one can integrate the identity

$$\frac{\partial}{\partial s}\Big( e^{s F(t)}\, \frac{\partial}{\partial t} e^{(1-s) F(t)} \Big) = -\, e^{s F(t)}\, \Big(\frac{d}{dt} F(t)\Big)\, e^{(1-s) F(t)},$$
which follows from the product rule and by interchanging the $s$ and $t$ derivatives at a key juncture. We rearrange the Duhamel formula by working, for a fixed time $t$, in an orthonormal basis of eigenvectors of $F(t)$, with eigenvalues $\lambda_1, \ldots, \lambda_n$; in coordinates the formula reads

$$\Big(\frac{d}{dt} e^{F(t)}\Big)_{ij} = \int_0^1 \Big( e^{s F(t)}\, \Big(\frac{d}{dt} F(t)\Big)\, e^{(1-s) F(t)} \Big)_{ij}\, ds.$$
Using the basic identity $(e^{s F(t)})_{ij} = e^{s \lambda_i} \delta_{ij}$ in this basis, we thus have

$$\Big(\frac{d}{dt} e^{F(t)}\Big)_{ij} = \int_0^1 e^{s \lambda_i}\, e^{(1-s) \lambda_j}\, \Big(\frac{d}{dt} F(t)\Big)_{ij}\, ds;$$
formally evaluating the integral, we obtain

$$\Big(\frac{d}{dt} e^{F(t)}\Big)_{ij} = \frac{e^{\lambda_i} - e^{\lambda_j}}{\lambda_i - \lambda_j}\, \Big(\frac{d}{dt} F(t)\Big)_{ij},$$

with the convention that the factor $\frac{e^{\lambda_i} - e^{\lambda_j}}{\lambda_i - \lambda_j}$ equals $e^{\lambda_i}$ when $\lambda_i = \lambda_j$; equivalently,

$$\Big(\frac{d}{dt} F(t)\Big)_{ij} = \frac{\lambda_i - \lambda_j}{e^{\lambda_i} - e^{\lambda_j}}\, \Big(\frac{d}{dt} e^{F(t)}\Big)_{ij}.$$
As $\frac{d}{dt} e^{F(t)}$ was positive semi-definite by hypothesis, its matrix of coefficients in this basis is also. It thus suffices to show that for any real numbers $\lambda_1, \ldots, \lambda_n$, the Schur multiplier operator

$$(M_{ij}) \mapsto \Big( \frac{\lambda_i - \lambda_j}{e^{\lambda_i} - e^{\lambda_j}}\, M_{ij} \Big)$$

preserves the property of being positive semi-definite.
We first factor out a harmless conjugation: writing $\frac{\lambda_i - \lambda_j}{e^{\lambda_i} - e^{\lambda_j}} = e^{-\lambda_i/2}\, K(\lambda_i - \lambda_j)\, e^{-\lambda_j/2}$ with $K(u) := \frac{u}{2 \sinh(u/2)}$, and noting that $M \mapsto D M D^*$ preserves positive semi-definiteness for the diagonal matrix $D := \operatorname{diag}(e^{-\lambda_1/2}, \ldots, e^{-\lambda_n/2})$, it suffices to handle Schur multiplication by $K(\lambda_i - \lambda_j)$. Note that for any real $s$, the operator $(M_{ij}) \mapsto (e^{i s \lambda_i}\, M_{ij}\, e^{-i s \lambda_j})$ maps a positive semi-definite matrix $M$ to another positive semi-definite matrix, namely $U M U^*$ with $U := \operatorname{diag}(e^{i s \lambda_1}, \ldots, e^{i s \lambda_n})$. By the Fourier inversion formula, it thus suffices to show that the kernel $K$ is positive semi-definite in the sense that it has non-negative Fourier transform (cf. Bochner's theorem). But a routine (but somewhat tedious) application of contour integration shows that the Fourier transform is given (with the normalisation $\hat K(\xi) := \frac{1}{2\pi} \int_{\mathbb{R}} K(u)\, e^{-i u \xi}\, du$) by the formula $\hat K(\xi) = \frac{\pi}{2} \operatorname{sech}^2(\pi \xi)$, which is non-negative, and the claim follows.
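The key claim here, that Schur multiplication by the matrix $\big[\frac{\lambda_i - \lambda_j}{e^{\lambda_i} - e^{\lambda_j}}\big]$ (with diagonal entries $e^{-\lambda_i}$, the limiting value) maps positive semi-definite matrices to positive semi-definite matrices, can be spot-checked numerically (my own sketch, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(5)

n = 5
min_eig = np.inf
for _ in range(200):
    lam = rng.standard_normal(n)              # eigenvalues lambda_i
    L_i, L_j = np.meshgrid(lam, lam, indexing="ij")
    num, den = L_i - L_j, np.exp(L_i) - np.exp(L_j)
    # Entry (i,j) is (lambda_i - lambda_j)/(e^lambda_i - e^lambda_j),
    # with the diagonal set to the limiting value e^{-lambda_i}.
    K = np.where(num == 0, np.exp(-L_i), num / np.where(den == 0, 1.0, den))
    R = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    M = R @ R.conj().T                        # random positive semi-definite M
    min_eig = min(min_eig, np.linalg.eigvalsh(K * M).min())
print("smallest eigenvalue of the Schur product:", min_eig)  # >= 0
```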
Because of the Golden-Thompson inequality, many applications of the exponential moment method in commutative probability theory can be extended without difficulty to the non-commutative case, as was observed by Ahlswede and Winter. For instance, consider (a special case of) the Chernoff inequality

$$\mathbb{P}(X_1 + \cdots + X_n \geq \lambda \sigma) \leq C e^{-c \lambda^2}$$
for any $0 < \lambda \leq \sigma$ and some absolute constants $C, c > 0$, where $X_1, \ldots, X_n$ are iid scalar random variables taking values in $[-1,1]$ of mean zero and with total variance $\sigma^2$ (i.e. each factor has variance $\sigma^2/n$). We briefly sketch the standard proof of this inequality. We first use Markov's inequality to obtain

$$\mathbb{P}(X_1 + \cdots + X_n \geq \lambda \sigma) \leq e^{-t \lambda \sigma}\; \mathbb{E}\, e^{t(X_1 + \cdots + X_n)}$$
for some parameter $t > 0$ to be optimised later. In the scalar case, we can factor $e^{t(X_1 + \cdots + X_n)}$ as $e^{t X_1} \cdots e^{t X_n}$ and then use the iid hypothesis to write the right-hand side as

$$e^{-t \lambda \sigma}\, \big(\mathbb{E}\, e^{t X_1}\big)^n.$$
An elementary Taylor series computation then reveals the bound $\mathbb{E}\, e^{t X_1} \leq e^{O(t^2 \sigma^2/n)}$ when $|t| \leq 1$; inserting this bound and optimising in $t$ we obtain the claim.
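One convenient explicit form of this Taylor-series bound (the explicit constant is my choice; the argument above only needs an $e^{O(t^2 \sigma^2/n)}$ bound) is $\mathbb{E}\, e^{tX} \leq e^{t^2 \operatorname{Var} X}$ for $|t| \leq 1$ and any mean-zero $X$ taking values in $[-1,1]$. A grid check over two-point distributions, which are the natural extreme cases:

```python
import numpy as np

# X takes the value -a with probability b/(a+b) and +b with probability
# a/(a+b), so that E X = 0 and Var X = a*b, with X ranging in [-1, 1].
worst = -np.inf
for a in np.linspace(0.05, 1.0, 20):
    for b in np.linspace(0.05, 1.0, 20):
        pa, pb = b / (a + b), a / (a + b)
        var = a * b
        for t in np.linspace(-1.0, 1.0, 41):
            mgf = pa * np.exp(-t * a) + pb * np.exp(t * b)
            worst = max(worst, mgf - np.exp(t * t * var))
print("max of E e^{tX} - e^{t^2 Var X} over the grid:", worst)  # <= 0
```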
Now suppose that $X_1, \ldots, X_n$ are iid $d \times d$ Hermitian random matrices. One can try to adapt the above method to control the size of the sum $X_1 + \cdots + X_n$. The key point is then to bound expressions such as

$$\mathbb{E}\, \operatorname{tr}\, e^{t(X_1 + \cdots + X_n)}.$$
As the $X_i$ need not commute, we cannot separate the product completely. But by Golden-Thompson, we can bound this expression by

$$\mathbb{E}\, \operatorname{tr}\big( e^{t(X_1 + \cdots + X_{n-1})}\, e^{t X_n} \big),$$
which by independence we can then factorise as

$$\operatorname{tr}\big( \mathbb{E}\big(e^{t(X_1 + \cdots + X_{n-1})}\big)\; \mathbb{E}\big(e^{t X_n}\big) \big).$$
As the matrices involved are positive definite, we can then take out the final factor in operator norm:

$$\operatorname{tr}\big( \mathbb{E}\big(e^{t(X_1 + \cdots + X_{n-1})}\big)\; \mathbb{E}\big(e^{t X_n}\big) \big) \leq \big\| \mathbb{E}\, e^{t X_n} \big\|_{op}\; \mathbb{E}\, \operatorname{tr}\, e^{t(X_1 + \cdots + X_{n-1})}.$$
Iterating this procedure, we can eventually obtain the bound

$$\mathbb{E}\, \operatorname{tr}\, e^{t(X_1 + \cdots + X_n)} \leq d\, \big\| \mathbb{E}\, e^{t X_1} \big\|_{op}^n.$$
Combining this with the rest of the Chernoff inequality argument, we can establish a matrix generalisation

$$\mathbb{P}\big( \| X_1 + \cdots + X_n \|_{op} \geq \lambda \sigma \big) \leq C d\, e^{-c \lambda^2}$$
of the Chernoff inequality (for $0 < \lambda \leq \sigma$ and absolute constants $C, c > 0$), under the assumption that the $X_i$ are iid with mean zero, have operator norm bounded by $1$, and have total variance equal to $\sigma^2$; see for instance these notes of Vershynin for details.
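The Golden-Thompson step of this argument can be seen in action on a toy example where all expectations are exact: take $X$ uniform on two fixed Hermitian matrices and two iid copies $X_1, X_2$, and compare $\mathbb{E}\,\operatorname{tr}\, e^{t(X_1+X_2)}$ with $\operatorname{tr}\big((\mathbb{E}\, e^{tX})^2\big)$ and with $d\, \|\mathbb{E}\, e^{tX}\|_{op}^2$. (A sketch of my own, assuming NumPy; the names `M1`, `M2`, `expm_h` are illustrative.)

```python
import numpy as np

rng = np.random.default_rng(6)

def expm_h(H):
    # Matrix exponential of a Hermitian matrix via its eigendecomposition.
    w, V = np.linalg.eigh(H)
    return (V * np.exp(w)) @ V.conj().T

d, t = 3, 0.7

def rand_herm():
    M = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    return (M + M.conj().T) / 2

M1, M2 = rand_herm(), rand_herm()

# X is uniform on {M1, M2}; X1, X2 are iid copies.  All expectations below
# are exact averages over the four equally likely outcomes.
lhs = np.mean([np.trace(expm_h(t * (Mi + Mj))).real
               for Mi in (M1, M2) for Mj in (M1, M2)])   # E tr e^{t(X1+X2)}
Q = (expm_h(t * M1) + expm_h(t * M2)) / 2                # E e^{tX}
rhs = np.trace(Q @ Q).real                               # tr((E e^{tX})^2)
bound = d * np.linalg.norm(Q, 2) ** 2                    # d ||E e^{tX}||_op^2
print(lhs, rhs, bound)  # lhs <= rhs <= bound
```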
Further discussion of the use of the Golden-Thompson inequality and its variants in proving non-commutative Chernoff-type inequalities can be found in this paper of Gross, these notes of Vershynin, and this recent survey of Tropp. It seems that this inequality may be quite useful in simplifying the proofs of several of the basic estimates in this subject.