As many readers may already know, my good friend and fellow mathematical blogger Tim Gowers, having wrapped up work on the Princeton Companion to Mathematics (which I believe is now in press), has begun another mathematical initiative, namely a “Tricks Wiki” to act as a repository for mathematical tricks and techniques. Tim has already started the ball rolling with several seed articles on his own blog, and asked me to also contribute some articles. (As I understand it, these articles will be migrated to the Wiki in a few months, once it is fully set up, and then they will evolve with edits and contributions by anyone who wishes to pitch in, in the spirit of Wikipedia; in particular, articles are not intended to be permanently authored or signed by any single contributor.)
So today I’d like to start by extracting some material from an old post of mine on “Amplification, arbitrage, and the tensor power trick” (as well as from some of the comments), and converting it to the Tricks Wiki format, while also taking the opportunity to add a few more examples.
Title: The tensor power trick
Quick description: If one wants to prove an inequality for some non-negative quantities X, Y, but can only see how to prove a quasi-inequality that loses a multiplicative constant C, then try to replace all objects involved in the problem by “tensor powers” of themselves and apply the quasi-inequality to those powers. If all goes well, one can show that for all , with a constant C which is independent of M, which implies that as desired by taking roots and then taking limits as .
General discussion: This trick works best when using techniques which are “dimension-independent” (or depend only weakly (e.g. polynomially) on the ambient dimension M). On the other hand, the constant C is allowed to depend on other parameters than the dimension M. In particular, even if the eventual inequality one wishes to prove is supposed to be uniform over all choices of some auxiliary parameter n, it is possible to establish this estimate by first establishing a non-uniform estimate , so long as this estimate “tensorises” to , where n does not grow with M.
It is of course essential that the problem “commutes” or otherwise “plays well” with the tensor power operation in order for the trick to be effective. In order for this to occur, one often has to phrase the problem in a sufficiently abstract setting, for instance generalising a one-dimensional problem to one in arbitrary dimensions.
Prerequisites: Advanced undergraduate analysis (real, complex, and Fourier). The user of the trick also has to be very comfortable with working in high-dimensional spaces such as , or more generally for some mathematical object G (e.g. a set, a group, a measure space, etc.) The later examples are more advanced but are only given as sketches.
Example 1. (Convexity of norms) Let be a measurable function such that and for some . Show that for all .
As a first attempt to solve this problem, observe that is less than when , and is less than when . Thus the inequality holds for all x, regardless of whether is larger or smaller than 1. This gives us the bound , which is off by a factor of 2 from what we need.
But we can eliminate this loss of 2 by the tensor power trick. We pick a large integer , and define the tensor power of f by the formula
From the Fubini-Tonelli theorem we see that
If we thus apply our previous arguments with and replaced by and respectively, we conclude that
applying Fubini-Tonelli again we conclude that
Taking roots and then taking limits as we obtain the claim.
Exercise. More generally, show that if , is a measure space, and is measurable, then we have the inequality
whenever the right-hand side is finite, where is such that . (Hint: by multiplying f and by appropriate constants one can normalise .)
Exercise. Use the previous exercise (and a clever choice of f, r, and – there is more than one choice available) to prove Hölder’s inequality.
Example 2. (The maximum principle) This example is due to Landau. Let be a simple closed curve in the complex plane that bounds a domain D, and let be a function which is complex analytic in the closure of this domain, and which obeys a bound on the boundary . The maximum principle for such functions asserts that one also has on the interior D as well. One way to prove this is by using the Cauchy integral formula
for (assuming that is oriented anti-clockwise). Taking absolute values and using the triangle inequality, we obtain the crude bound
where is the length of . This bound is off by a factor of . This loss depends on the point z and the curve , but it is crucial to observe that it does not depend on f. In particular, one can apply it with f replaced by (and A replaced by ) for any positive integer M, noting that is of course also complex analytic. We conclude that
and on taking roots and then taking limits as we obtain the maximum principle.
Example 3. (New Strichartz estimates from old; this observation is due to Jon Bennet) Technically, this is not an application of the tensor power trick as we will not let the dimension go off to infinity, but it is certainly in a similar spirit.
Strichartz estimates are an important tool in the theory of linear and nonlinear dispersive equations. Here is a typical such estimate: if solves the two-dimensional Schrödinger equation , then one has the inequality
for some absolute constant C. (In this case, the best value of C is known to equal , a result of Foschi and of Hundertmark-Zharnitsky.) It is possible to use this two-dimensional Strichartz estimate and a version of the tensor power trick to deduce a one-dimensional Strichartz estimate. Specifically, if solves the one-dimensional Schrödinger equation , then observe that the tensor square solves the two-dimensional Schrödinger equation. (This tensor product symmetry of the Schrödinger equation is fundamental in quantum physics; it allows one to model many-particle systems and single-particle systems by essentially the same equation.) Applying the above Strichartz inequality to this product we conclude that
Remark. A similar trick allows us to deduce “interaction” or “many-particle” Morawetz estimates for the Schrödinger equation from their more traditional “single-particle” counterparts; see for instance Chapter 3.5 of my book.
From the triangle inequality we have
while from Plancherel’s theorem we have
whenever and . However, it is also possible to deduce (3) from (2) and (1) by a more elementary method based on the tensor power trick. First suppose that f is supported on a set and that |f| takes values between (say) and on A. Then from (1) and (2) we have
from which we establish a “restricted weak-type” version of Hausdorff-Young, namely
This inequality is restricted to those f whose non-zero magnitudes |f(x)| range inside a dyadic interval , but by the technique of dyadic decomposition one can remove this restriction at the cost of an additional factor of , thus obtaining the estimate
for general f. (There are of course an infinite number of dyadic scales in the world, but if one normalises , then it is not hard to see that any scale above, say, or below is very easy to deal with, leaving only dyadic scales to consider.)
The estimate (3′) is off by a factor of from the true Hausdorff-Young inequality, but if one replaces G with a Cartesian power and f with its tensor power , and makes the crucial observation that the Fourier transform of a tensor power is the tensor power of the Fourier transform, we see from applying (3′) to the tensor power that
Taking roots and sending we obtain the desired inequality. Note that the logarithmic dependence on |G| in the constants turned out not to be a problem, because . Thus the tensor power trick is able to handle a certain amount of dependence on the dimension in the constants, as long as the loss does not grow too rapidly in that dimension.
Exercise. Establish Young’s inequality for and and finite abelian groups G, where , by the same method.
Exercise. Prove the Riesz-Thorin interpolation theorem by this method, thus avoiding all use of complex analysis. (Note the similarity here with Example 1 and Example 2.)
Example 5. (An example from additive combinatorics, due to Imre Ruzsa.) An important inequality of Plünnecke asserts, among other things, that for finite non-empty sets A, B of an additive group G, and any positive integer k, the iterated sumset obeys the bound
(This inequality, incidentally, is itself proven using a version of the tensor power trick, in conjunction with Hall’s marriage theorem, but never mind that here.) This inequality can be amplified to the more general inequality
via the tensor power trick as follows. Applying (4) with , we obtain
The right-hand side looks a bit too big, but we can resolve this problem with a Cartesian product trick (which can be viewed as a cousin of the tensor product trick). If we replace G with the larger group and replace each set with the larger set , where is the standard basis for and are arbitrary positive integers (and replacing A with ), we obtain
Optimising this in (basically, by making the close to constant; this is a general rule in optimisation, namely that to optimise X+Y it makes sense to make X and Y comparable in magnitude) we obtain the amplified estimate
for some constant ; but then if one replaces with their Cartesian powers , takes roots, and then sends M to infinity, we can delete the constant and recover the inequality.
Example 6. (Cotlar-Knapp-Stein lemma) This example is not exactly a use of the tensor power trick, but is very much in the same spirit. Let be bounded linear operators on a Hilbert space; for simplicity of discussion let us say that they are self-adjoint, though one can certainly generalise the discussion here to more general operators. If we assume that the operators are uniformly bounded in the operator norm, say
for all i and some fixed A, then this is not enough to ensure that the sum is bounded uniformly in N; indeed, the operator norm can be as large as AN. If however one has the stronger almost orthogonality estimate
for all i, then the very useful Cotlar-Knapp-Stein lemma asserts that is now bounded uniformly in N, with . To prove this, we first recall that a direct application of the triangle inequality using (5) (which is a consequence of (6)) only gave the inferior bound of AN. To improve this, first observe from self-adjointness that
Expanding the right-hand side out using the triangle inequality, we obtain
using (6) one soon ends up with a bound of . This is better than before as far as the dependence on N is concerned, but one can do better by using higher powers. In particular, if we raise to the power for some M, and repeat the previous arguments, we end up with
Now, one can estimate the operator norm either by
or (using (5)) by
We take the geometric mean of these upper bounds to obtain
Summing this in , then , and so forth down to using (6) repeatedly, one eventually establishes the bound
Sending one eliminates the dependence on N and obtains the claim.
Exercise. Show that the hypothesis of self-adjointness can be dropped if one replaces (6) with the two conditions
for all i.
Example 7. (Entropy estimates) Suppose that X is a random variable taking finitely many values. The Shannon entropy H(X) of this random variable is defined by the formula
where x runs over all possible values of X. For instance, if X is uniformly distributed over N values, then
If X is not uniformly distributed, then the formula is not quite as simple. However, the entropy formula (7) does simplify to the uniform distribution formula (8) after using the tensor power trick. More precisely, let be the random variable formed by taking M independent and identicaly distributed samples of X; thus for instance, if X was uniformly distributed on N values, then is uniformly distributed on values. For more general X, it is not hard to verify the formula
which is of course consistent with the uniformly distributed case thanks to (8).
A key observation is that as , the probability distribution of becomes “asymptotically uniform” in a certain sense. Indeed, the law of large numbers tells us that with very high probability, each possible value x of X will be attained by of the M trials . The number of possible configurations of which are consistent with this distribution can be computed (using Stirling’s formula) to be , and each such configuration appears with probability (again by Stirling’s formula). Thus, at a heuristic level at least, behaves like a uniform distribution on possible values; note that this is consistent with (8) and (9), and can in fact be taken as a definition of Shannon entropy.
One can use this “microstate” or “statistical mechanics” interpretation of entropy in conjunction with the tensor power trick to give short (heuristic) proofs of various fundamental entropy inequalities, such as the subadditivity inequality
whenever X, Y are discrete random variables which are not necessarily independent. Indeed, since and (heuristically) take only and values respectively, then will (mostly) take on at most values. On the other hand, this random variable is supposed to behave like a uniformly distributed random variable over values. These facts are only compatible if
taking roots and then sending we obtain the claim.
Exercise. Make the above arguments rigorous. (The final proof will be significantly longer than the standard proof of subadditivity based on Jensen’s inequality, but it may be clearer conceptually. One may also compare the arguments here with those in Example 1.)
Example 8. (Monotonicity of Perelman’s reduced volume) One of the key ingredients of Perelman’s proof of the Poincaré conjecture is a monotonicity formula for Ricci flows, that establishes that a certain geometric quantity, now known as the Perelman reduced volume, increases as time goes to negative infinity. Perelman gave both a rigorous proof and a heuristic proof of this formula. The heuristic proof is much shorter, and proceeds by first (formally) applying the Bishop-Gromov inequality for Ricci flat metrics (which shows that another geometric quantity – the Bishop-Gromov reduced volume, increases as the radius goes to infinity) not to the Ricci flow itself, but to a high-dimensional version of this Ricci flow formed by adjoining an M-dimensional sphere to the flow in a certain way, and then taking (formal) limits as . This is not precisely the tensor power trick, but is certainly in a similar spirit. For further discussion see “285G Lecture 9: Comparison geometry, the high-dimensional limit, and Perelman’s reduced volume.”
Example 9. (Riemann hypothesis for function fields) This example is much deeper than the previous ones, and I am not qualified to explain it in its entirety, but I can at least describe the one piece of this argument which uses the tensor power trick. For simplicity let us restrict attention to the Riemann hypothesis for curves C over a finite field of prime power order. Using the Riemann-Roch theorem and some other tools from arithmetic geometry, one can show that the number of points in C taking values (projectively) in is given by the formula
where g is the genus of the curve, and are complex numbers that depend on p but not on M. Weil showed that all of these complex numbers had magnitude exactly (the elliptic curve case g=1 being done earlier by Hasse, and the g=0 case morally going all the way back to Diophantus!), which is the analogue of the Riemann hypotheses for such curves. There is also an analogue of the functional equation for these curves, which asserts that the 2g numbers come in pairs .
As a corollary, we see that stays quite close to :
But because of the formula (10) and the functional equation, one can in fact deduce the Riemann hypothesis from the apparently weaker statement
where the implied constants can depend on the curve C, the prime p, and the genus g, but must be independent of the “dimension” M. Indeed, from (12) and (11) one can see that the largest magnitude of the (which can be viewed as a sort of “spectral radius” for an underlying operator) is at most ; combining this with the functional equation, one obtains the Riemann hypothesis.
[This is of course only a small part of the story; the proof of (12) is by far the hardest part of the whole proof, and beyond the scope of this article.]
Further reading: The blog post “Amplification, arbitrage, and the tensor power trick” discusses the tensor product trick as an example of a more general “amplification trick”.