As many readers may already know, my good friend and fellow mathematical blogger Tim Gowers, having wrapped up work on the Princeton Companion to Mathematics (which I believe is now in press), has begun another mathematical initiative, namely a “Tricks Wiki” to act as a repository for mathematical tricks and techniques.    Tim has already started the ball rolling with several seed articles on his own blog, and asked me to also contribute some articles.  (As I understand it, these articles will be migrated to the Wiki in a few months, once it is fully set up, and then they will evolve with edits and contributions by anyone who wishes to pitch in, in the spirit of Wikipedia; in particular, articles are not intended to be permanently authored or signed by any single contributor.)

So today I’d like to start by extracting some material from an old post of mine on “Amplification, arbitrage, and the tensor power trick” (as well as from some of the comments), and converting it to the Tricks Wiki format, while also taking the opportunity to add a few more examples.

Title: The tensor power trick

Quick description: If one wants to prove an inequality X \leq Y for some non-negative quantities X, Y, but can only see how to prove a quasi-inequality X \leq CY that loses a multiplicative constant C, then try to replace all objects involved in the problem by “tensor powers” of themselves and apply the quasi-inequality to those powers.  If all goes well, one can show that X^M \leq C Y^M for all M \geq 1, with a constant C which is independent of M, which implies that X \leq Y as desired by taking M^{th} roots and then taking limits as M \to \infty.

General discussion: This trick works best when using techniques which are “dimension-independent” (or depend only weakly (e.g. polynomially) on the ambient dimension M).  On the other hand, the constant C is allowed to depend on other parameters than the dimension M.  In particular, even if the eventual inequality X \leq Y one wishes to prove is supposed to be uniform over all choices of some auxiliary parameter n, it is possible to establish this estimate by first establishing a non-uniform estimate X \leq C_n Y, so long as this estimate “tensorises” to X^M \leq C_n Y^M, where n does not grow with M.

It is of course essential that the problem “commutes” or otherwise “plays well” with the tensor power operation in order for the trick to be effective.  In order for this to occur, one often has to phrase the problem in a sufficiently abstract setting, for instance generalising a one-dimensional problem to one in arbitrary dimensions.

Prerequisites: Advanced undergraduate analysis (real, complex, and Fourier).  The user of the trick also has to be very comfortable with working in high-dimensional spaces such as {\Bbb R}^M, or more generally G^M for some mathematical object G (e.g. a set, a group, a measure space, etc.)  The later examples are more advanced but are only given as sketches.

Example 1. (Convexity of L^p norms)  Let f: {\Bbb R} \to {\Bbb C} be a measurable function such that \int_{\Bbb R} |f(x)|^p\ dx \leq 1 and \int_{\Bbb R} |f(x)|^q\ dx \leq 1 for some 0 < p < q < \infty.  Show that \int_{\Bbb R} |f(x)|^r\ dx \leq 1 for all p < r < q.

As a first attempt to solve this problem, observe that |f(x)|^r is less than |f(x)|^q when |f(x)| \geq 1, and is less than |f(x)|^p when |f(x)| \leq 1.  Thus the inequality |f(x)|^r \leq |f(x)|^p + |f(x)|^q holds for all x, regardless of whether |f(x)| is larger or smaller than 1.   This gives us the bound \int_{\Bbb R} |f(x)|^r\ dx \leq 2, which is off by a factor of 2 from what we need.

But we can eliminate this loss of 2 by the tensor power trick. We pick a large integer M \ge 1, and define the tensor power f^{\otimes M}: {\Bbb R}^M \to {\Bbb C} of f by the formula

f^{\otimes M}(x_1,\ldots,x_M) = f(x_1) \ldots f(x_M).

From the Fubini-Tonelli theorem we see that

\int_{{\Bbb R}^M} |f^{\otimes M}|^p\ dx = (\int_{\Bbb R} |f|^p\ dx)^M \leq 1

and similarly

\int_{{\Bbb R}^M} |f^{\otimes M}|^q\ dx = (\int_{\Bbb R} |f|^q\ dx)^M \leq 1.

If we thus apply our previous arguments with f and {\Bbb R} replaced by f^{\otimes M} and {\Bbb R}^M respectively, we conclude that

\int_{{\Bbb R}^M} |f^{\otimes M}|^r\ dx \leq 2;

applying Fubini-Tonelli again we conclude that

(\int_{\Bbb R} |f|^r\ dx)^M \leq 2.

Taking M^{th} roots and then taking limits as M \to \infty we obtain the claim.

Exercise. More generally, show that if 0 < p < r < q < \infty, (X,\mu) is a measure space, and f: X \to {\Bbb C} is measurable, then we have the inequality

\| f\|_{L^r(X,\mu)} \leq \|f\|_{L^p(X,\mu)}^{1-\theta} \|f\|_{L^q(X,\mu)}^\theta

whenever the right-hand side is finite, where 0 < \theta < 1 is such that \frac{1}{r} = (1-\theta) \frac{1}{p} + \theta \frac{1}{q}.  (Hint: by multiplying f and \mu by appropriate constants one can normalise \|f\|_{L^p(X,\mu)} = \|f\|_{L^q(X,\mu)}=1.)

Exercise. Use the previous exercise (and a clever choice of f, r, \theta and \mu – there is more than one choice available) to prove Hölder’s inequality.

Example 2. (The maximum principle)  This example is due to Landau.  Let \gamma be a simple closed curve in the complex plane that bounds a domain D, and let f: \overline{D} \to {\Bbb C} be a function which is complex analytic in the closure of this domain, and which obeys a bound |f(z)| \leq A on the boundary \gamma.  The maximum principle for such functions asserts that one also has |f(z)| \leq A on the interior D as well.  One way to prove this is by using the Cauchy integral formula

\displaystyle f(z) =\frac{1}{2\pi i} \int_\gamma \frac{f(w)}{w-z}\ dw

for z \in D (assuming that \gamma is oriented anti-clockwise).  Taking absolute values and using the triangle inequality, we obtain the crude bound

\displaystyle |f(z)| \leq \frac{1}{2\pi} \frac{|\gamma|}{\hbox{dist}(z,\gamma)} A

where |\gamma| is the length of \gamma.  This bound is off by a factor of \frac{1}{2\pi} \frac{|\gamma|}{\hbox{dist}(z,\gamma)}.  This loss depends on the point z and the curve \gamma, but it is crucial to observe that it does not depend on f.  In particular, one can apply it with f replaced by f^M (and A replaced by A^M) for any positive integer M, noting that f^M is of course also complex analytic.  We conclude that

\displaystyle |f(z)|^M \leq \frac{1}{2\pi} \frac{|\gamma|}{\hbox{dist}(z,\gamma)} A^M

and on taking M^{th} roots and then taking limits as M \to \infty we obtain the maximum principle.

Example 3. (New Strichartz estimates from old; this observation is due to Jon Bennet) Technically, this is not an application of the tensor power trick as we will not let the dimension go off to infinity, but it is certainly in a similar spirit.

Strichartz estimates are an important tool in the theory of linear and nonlinear dispersive equations.  Here is a typical such estimate: if u: {\Bbb R} \times {\Bbb R}^2 \to {\Bbb C} solves the two-dimensional Schrödinger equation i u_t + \Delta u = 0, then one has the inequality

\| u \|_{L^4_t L^4_x( {\Bbb R} \times {\Bbb R}^2 )} \leq C \| u(0) \|_{L^2( {\Bbb R}^2 )}

for some absolute constant C.  (In this case, the best value of C is known to equal 1/\sqrt{2}, a result of Foschi and of Hundertmark-Zharnitsky.)  It is possible to use this two-dimensional Strichartz estimate and a version of the tensor power trick to deduce a one-dimensional Strichartz estimate.  Specifically, if u: {\Bbb R} \times {\Bbb R} \to {\Bbb C} solves the one-dimensional Schrödinger equation iu_t + u_{xx} = 0, then observe that the tensor square u^{\otimes 2}(t,x,y) := u(t,x) u(t,y) solves the two-dimensional Schrödinger equation.  (This tensor product symmetry of the Schrödinger equation is fundamental in quantum physics; it allows one to model many-particle systems and single-particle systems by essentially the same equation.)  Applying the above Strichartz inequality to this product we conclude that

\| u \|_{L^8_t L^4_x( {\Bbb R} \times {\Bbb R} )} \leq C^{1/2} \| u(0) \|_{L^2( {\Bbb R} )}.

Remark. A similar trick allows us to deduce “interaction” or “many-particle” Morawetz estimates for the Schrödinger equation from their more traditional “single-particle” counterparts; see for instance Chapter 3.5 of my book.

Example 4. (Hausdorff-Young inequality)  Let G be a finite abelian group, and let f: G \to {\Bbb C} be a function.  We let \hat G be the group of characters \chi: G \to S^1 of G, and define the Fourier transform \hat f: \hat G \to {\Bbb C} by the formula

\hat f(\xi) := \frac{1}{|G|} \sum_{x \in G} f(x) \overline{\chi(x)}.

From the triangle inequality we have

\displaystyle \sup_{\xi \in \hat G} |\hat f(\xi)| \leq \frac{1}{|G|} \sum_{x \in G} |f(x)| (1)

while from Plancherel’s theorem we have

\displaystyle (\sum_{\xi \in \hat G} |\hat f(\xi)|^2)^{1/2} = (\frac{1}{|G|} \sum_{x \in G} |f(x)|^2)^{1/2} (2)

By applying the Riesz-Thorin interpolation theorem, we can then conclude the Hausdorff-Young inequality

\displaystyle (\sum_{\xi \in \hat G} |\hat f(\xi)|^q)^{1/q} \leq (\frac{1}{|G|} \sum_{x \in G} |f(x)|^p)^{1/p} (3)

whenever 1 < p < 2 and \frac{1}{p}+\frac{1}{q}=1.  However, it is also possible to deduce (3) from (2) and (1) by a more elementary method based on the tensor power trick.  First suppose that f is supported on a set A \subset G and that |f| takes values between (say) 2^m and 2^{m+1} on A.  Then from (1) and (2) we have

\displaystyle \sup_{\xi \in \hat G} |\hat f(\xi)| \leq \frac{|A|}{|G|} 2^{m+1}

\displaystyle (\sum_{\xi \in \hat G} |\hat f(\xi)|^2)^{1/2} \leq (\frac{|A|}{|G|})^{1/2} 2^{m+1}

from which we establish a “restricted weak-type” version of Hausdorff-Young, namely

\displaystyle (\sum_{\xi \in \hat G} |\hat f(\xi)|^q)^{1/q} \leq (\frac{|A|}{|G|})^{1/p} 2^{m+1}

and thus

\displaystyle (\sum_{\xi \in \hat G} |\hat f(\xi)|^q)^{1/q} \leq 2 (\frac{1}{|G|} \sum_{x \in G} |f(x)|^p)^{1/p}.

This inequality is restricted to those f whose non-zero magnitudes |f(x)| range inside a dyadic interval {}[2^m, 2^{m+1}], but by the technique of dyadic decomposition one can remove this restriction at the cost of an additional factor of O( 1 + \log |G| ), thus obtaining the estimate

\displaystyle (\sum_{\xi \in \hat G} |\hat f(\xi)|^q)^{1/q} \leq C (1 + \log |G|) (\frac{1}{|G|} \sum_{x \in G} |f(x)|^p)^{1/p} (3′)

for general f.  (There are of course an infinite number of dyadic scales {}[2^m, 2^{m+1}] in the world, but if one normalises (\frac{1}{|G|} \sum_{x \in G} |f(x)|^p)^{1/p} = 1, then it is not hard to see that any scale above, say, |G|^{100} or below |G|^{-100} is very easy to deal with, leaving only O( 1 + \log |G| ) dyadic scales to consider.)

The estimate (3′) is off by a factor of O( 1 + \log |G| ) from the true Hausdorff-Young inequality, but if one replaces G with a Cartesian power G^M and f with its tensor power f^{\otimes M}, and makes the crucial observation that the Fourier transform of a tensor power is the tensor power of the Fourier transform, we see from applying (3′) to the tensor power that

\displaystyle (\sum_{\xi \in \hat G} |\hat f(\xi)|^q)^{M/q} \leq C (1 + \log |G|^M) (\frac{1}{|G|} \sum_{x \in G} |f(x)|^p)^{M/p}.

Taking M^{th} roots and sending M \to \infty we obtain the desired inequality.  Note that the logarithmic dependence on |G| in the constants turned out not to be a problem, because M^{1/M} \to 1.  Thus the tensor power trick is able to handle a certain amount of dependence on the dimension in the constants, as long as the loss does not grow too rapidly in that dimension.

Exercise. Establish Young’s inequality \|f*g\|_{l^r(G)} \leq \|f\|_{l^p(G)} \|g\|_{l^q(G)} for 1 \leq p,q,r < \infty and \frac{1}{r}+1=\frac{1}{p}+\frac{1}{q} and finite abelian groups G, where f*g(x) := \sum_G f(y) g(x-y), by the same method.

Exercise. Prove the Riesz-Thorin interpolation theorem  (in the regime q \geq p) by this method, thus avoiding all use of complex analysis.  (Note the similarity here with Example 1 and Example 2.)

Example 5. (An example from additive combinatorics, due to Imre Ruzsa.) An important inequality of Plünnecke asserts, among other things, that for finite non-empty sets A, B of an additive group G, and any positive integer k, the iterated sumset kB = B + \ldots + B obeys the bound

\displaystyle |kB| \leq \frac{|A+B|^k}{|A|^{k-1}}. (4)

(This inequality, incidentally, is itself proven using a version of the tensor power trick, in conjunction with Hall’s marriage theorem, but never mind that here.) This inequality can be amplified to the more general inequality

\displaystyle |B_1 + \ldots + B_k| \leq \frac{|A+B_1| \ldots |A+B_k|}{|A|^{k-1}}

via the tensor power trick as follows. Applying (4) with B := B_1 \cup \ldots \cup B_k, we obtain

\displaystyle |B_1 + \ldots + B_k| \leq \frac{(|A+B_1| + \ldots + |A+B_k|)^k}{|A|^{k-1}}.

The right-hand side looks a bit too big, but we can resolve this problem with a Cartesian product trick (which can be viewed as a cousin of the tensor product trick). If we replace G with the larger group G \times {\Bbb Z}^k and replace each set B_i with the larger set B_i \times \{ e_i, 2e_i, \ldots, N_i e_i \}, where e_1,\ldots,e_k is the standard basis for {\Bbb Z}^k and N_i are arbitrary positive integers (and replacing A with A \times \{0\}), we obtain

\displaystyle N_1 \ldots N_k |B_1 + \ldots + B_k| \leq \frac{(N_1 |A+B_1| + \ldots + N_k |A+B_k|)^k}{|A|^{k-1}}.

Optimising this in N_1,\ldots,N_k (basically, by making the N_i |A+B_i| close to constant; this is a general rule in optimisation, namely that to optimise X+Y it makes sense to make X and Y comparable in magnitude) we obtain the amplified estimate

\displaystyle |B_1 + \ldots + B_k| \leq C_k \frac{|A+B_1| \ldots |A+B_k|}{|A|^{k-1}}.

for some constant C_k; but then if one replaces A, B_1, \ldots, B_k with their Cartesian powers A^M, B_1^M, \ldots, B_k^M, takes M^{th} roots, and then sends M to infinity, we can delete the constant C_k and recover the inequality.

Example 6. (Cotlar-Knapp-Stein lemma)  This example is not exactly a use of the tensor power trick, but is very much in the same spirit.  Let T_1, T_2, \ldots, T_N: H \to H be bounded linear operators on a Hilbert space; for simplicity of discussion let us say that they are self-adjoint, though one can certainly generalise the discussion here to more general operators.  If we assume that the operators are uniformly bounded in the operator norm, say

\| T_i \|_{op} \leq A (5)

for all i and some fixed A, then this is not enough to ensure that the sum \sum_{i=1}^N T_i is bounded uniformly in N; indeed, the operator norm can be as large as AN.  If however one has the stronger almost orthogonality estimate

\sum_{j=1}^N \| T_i T_j \|_{op}^{1/2} \leq A (6)

for all i, then the very useful Cotlar-Knapp-Stein lemma asserts that \sum_{i=1}^N T_i is now bounded uniformly in N, with \| \sum_{i=1}^N T_i \|_{op} \leq A.  To prove this, we first recall that a direct application of the triangle inequality using (5) (which is a consequence of (6)) only gave the inferior bound of AN.  To improve this, first observe from self-adjointness that

\| \sum_{i=1}^N T_i \|_{op} = \| (\sum_{i=1}^N T_i)^2 \|_{op}^{1/2}.

Expanding the right-hand side out using the triangle inequality, we obtain

\| \sum_{i=1}^N T_i \|_{op} \leq (\sum_{i=1}^N \sum_{j=1}^N \| T_i T_j \|_{op})^{1/2};

using (6) one soon ends up with a bound of A N^{1/2}.  This is better than before as far as the dependence on N is concerned, but one can do better by using higher powers.  In particular, if we raise \sum_{i=1}^N to the 2M^{th} power for some M, and repeat the previous arguments, we end up with

\| \sum_{i=1}^N T_i \|_{op} \leq (\sum_{1 \leq i_1,\ldots,i_{2M} \leq N} \| T_{i_1} \ldots T_{i_{2M}} \|_{op})^{1/2M}.

Now, one can estimate the operator norm \| T_{i_1} \ldots T_{i_{2M}} \|_{op} either by

\| T_{i_1} T_{i_2} \|_{op} \ldots \| T_{i_{2M-1}} T_{i_{2M}} \|_{op},

or (using (5)) by

A^2 \| T_{i_2} T_{i_3} \|_{op} \ldots \| T_{i_{2M-2}} T_{i_{2M-1}} \|_{op}.

We take the geometric mean of these upper bounds to obtain

A \| T_{i_1} T_{i_2} \|_{op}^{1/2} \| T_{i_2} T_{i_3} \|_{op}^{1/2} \ldots \| T_{i_{2M-1}} T_{i_{2M}} \|_{op}^{1/2}.

Summing this in i_{2M}, then i_{2M-1}, and so forth down to i_1 using (6) repeatedly, one eventually establishes the bound

\| \sum_{i=1}^N T_i \|_{op} \leq (N A^{2M})^{1/2M}.

Sending M \to \infty one eliminates the dependence on N and obtains the claim.

Exercise. Show that the hypothesis of self-adjointness can be dropped if one replaces (6) with the two conditions

\sum_{j=1}^N \| T_i T_j^* \|_{op}^{1/2} \leq A

and

\sum_{j=1}^N \| T_i^* T_j \|_{op}^{1/2} \leq A

for all i.

Example 7. (Entropy estimates) Suppose that X is a random variable taking finitely many values.  The Shannon entropy H(X) of this random variable is defined by the formula

H(X) := - \sum_x {\Bbb P}(X=x) \log {\Bbb P}(X=x), (7)

where x runs over all possible values of X.  For instance, if X is uniformly distributed over N values, then

H(X)=\log N. (8)

If X is not uniformly distributed, then the formula is not quite as simple.  However, the entropy formula (7) does simplify to the uniform distribution formula (8) after using the tensor power trick.  More precisely, let X^{\otimes M} = (X_1,\ldots,X_M) be the random variable formed by taking M independent and identicaly distributed samples of X; thus for instance, if X was uniformly distributed on N values, then X^{\otimes M} is uniformly distributed on N^M values.  For more general X, it is not hard to verify the formula

H( X^{\otimes M} ) = M H(X),(9)

which is of course consistent with the uniformly distributed case thanks to (8).

A key observation is that as M \to \infty, the probability distribution of X^{\otimes M} becomes “asymptotically uniform” in a certain sense.  Indeed, the law of large numbers tells us that with very high probability, each possible value x of X will be attained by ({\Bbb P}(X=x)+o(1)) M of the M trials X_1,\ldots,X_M.  The number of possible configurations of X^{\otimes M} = (X_1,\ldots,X_M) which are consistent with this distribution can be computed (using Stirling’s formula) to be e^{M (H(X)+o(1))}, and each such configuration appears with probability e^{-M (H(X)+o(1))} (again by Stirling’s formula).  Thus, at a heuristic level at least, X^{\otimes M} behaves like a uniform distribution on e^{M (H(X)+o(1))} possible values; note that this is consistent with (8) and (9), and can in fact be taken as a definition of Shannon entropy.

One can use this “microstate” or “statistical mechanics” interpretation of entropy in conjunction with the tensor power trick to give short (heuristic) proofs of various fundamental entropy inequalities, such as the subadditivity inequality

H(X,Y) \leq H(X) + H(Y)

whenever X, Y are discrete random variables which are not necessarily independent.  Indeed, since X^{\otimes M} and Y^{\otimes M} (heuristically) take only e^{M (H(X)+o(1))} and e^{M (H(Y)+o(1))} values respectively, then (X,Y)^{\otimes M} \equiv (X^{\otimes M}, Y^{\otimes M}) will (mostly) take on at most e^{M (H(X)+o(1))} e^{M (H(Y)+o(1))} values.  On the other hand, this random variable is supposed to behave like a uniformly distributed random variable over e^{M (H(X,Y)+o(1))} values.  These facts are only compatible if

e^{M (H(X,Y)+o(1))} \leq e^{M (H(X)+o(1))} e^{M (H(Y)+o(1))};

taking M^{th} roots and then sending M \to \infty we obtain the claim.

Exercise. Make the above arguments rigorous.  (The final proof will be significantly longer than the standard proof of subadditivity based on Jensen’s inequality, but it may be clearer conceptually.  One may also compare the arguments here with those in Example 1.)

Example 8. (Monotonicity of Perelman’s reduced volume)  One of the key ingredients of Perelman’s proof of the Poincaré conjecture is a monotonicity formula for Ricci flows, that establishes that a certain geometric quantity, now known as the Perelman reduced volume, increases as time goes to negative infinity.  Perelman gave both a rigorous proof and a heuristic proof of this formula.  The heuristic proof is much shorter, and proceeds by first (formally) applying the Bishop-Gromov inequality for Ricci flat metrics (which shows that another geometric quantity – the Bishop-Gromov reduced volume, increases as the radius goes to infinity) not to the Ricci flow itself, but to a high-dimensional version of this Ricci flow formed by adjoining an M-dimensional sphere to the flow in a certain way, and then taking (formal) limits as M \to \infty.  This is not precisely the tensor power trick, but is certainly in a similar spirit.  For further discussion see “285G Lecture 9: Comparison geometry, the high-dimensional limit, and Perelman’s reduced volume.”

Example 9. (Riemann hypothesis for function fields)  This example is much deeper than the previous ones, and I am not qualified to explain it in its entirety, but I can at least describe the one piece of this argument which uses the tensor power trick.  For simplicity let us restrict attention to the Riemann hypothesis for curves C over a finite field F_{p^M} of prime power order.  Using the Riemann-Roch theorem and some other tools from arithmetic geometry, one can show that the number |C(F_{p^M})| of points in C taking values (projectively) in F_{p^r} is given by the formula

|C(F_{p^M})| = p^M - \sum_{i=1}^{2g} \omega_i^M + 1 (10)

where g is the genus of the curve, and \omega_1,\ldots,\omega_{2g} are complex numbers that depend on p but not on M.  Weil showed that all of these complex numbers had magnitude exactly p^{1/2} (the elliptic curve case g=1 being done earlier by Hasse, and the g=0 case morally going all the way back to Diophantus!), which is the analogue of the Riemann hypotheses for such curves. There is also an analogue of the functional equation for these curves, which asserts that the 2g numbers \omega_1,\ldots,\omega_{2g} come in pairs \omega, p/\omega.

As a corollary, we see that |C(F_{p^M})| stays quite close to p^M:

\left| |C(F_{p^M})| - p^M - 1 \right| \leq 2g p^{M/2}.

But because of the formula (10) and the functional equation, one can in fact deduce the Riemann hypothesis from the apparently weaker statement

|C(F_{p^M})| = p^M + O( p^{M/2} ) (12)

where the implied constants can depend on the curve C, the prime p, and the genus g, but must be independent of the “dimension” M.  Indeed, from (12) and (11) one can see that the largest magnitude of the \omega_1,\ldots,\omega_{2g} (which can be viewed as a sort of “spectral radius” for an underlying operator) is at most p^{1/2}; combining this with the functional equation, one obtains the Riemann hypothesis.

[This is of course only a small part of the story; the proof of (12) is by far the hardest part of the whole proof, and beyond the scope of this article.]

Further reading: The blog post “Amplification, arbitrage, and the tensor power trick” discusses the tensor product trick as an example of a more general “amplification trick”.