You are currently browsing the tag archive for the ‘tensor power trick’ tag.

In this previous blog post I noted the following easy application of Cauchy-Schwarz:

Lemma 1 (Van der Corput inequality) Let {v,u_1,\dots,u_n} be unit vectors in a Hilbert space {H}. Then

\displaystyle  (\sum_{i=1}^n |\langle v, u_i \rangle_H|)^2 \leq \sum_{1 \leq i,j \leq n} |\langle u_i, u_j \rangle_H|.

Proof: The left-hand side may be written as {\langle v, \sum_{i=1}^n \epsilon_i u_i \rangle_H} for some unit complex numbers {\epsilon_i}. By Cauchy-Schwarz we have

\displaystyle  |\langle v, \sum_{i=1}^n \epsilon_i u_i \rangle_H|^2 \leq \langle \sum_{i=1}^n \epsilon_i u_i, \sum_{j=1}^n \epsilon_j u_j \rangle_H

and the claim now follows from the triangle inequality. \Box

As a corollary, correlation becomes transitive in a statistical sense (even though it is not transitive in an absolute sense):

Corollary 2 (Statistical transitivity of correlation) Let {v,u_1,\dots,u_n} be unit vectors in a Hilbert space {H} such that {|\langle v,u_i \rangle_H| \geq \delta} for all {i=1,\dots,n} and some {0 < \delta \leq 1}. Then we have {|\langle u_i, u_j \rangle_H| \geq \delta^2/2} for at least {\delta^2 n^2/2} of the pairs {(i,j) \in \{1,\dots,n\}^2}.

Proof: From the lemma, we have

\displaystyle  \sum_{1 \leq i,j \leq n} |\langle u_i, u_j \rangle_H| \geq \delta^2 n^2.

The contribution of those {i,j} with {|\langle u_i, u_j \rangle_H| < \delta^2/2} is at most {\delta^2 n^2/2}, and all the remaining summands are at most {1}, giving the claim. \Box

One drawback with this corollary is that it does not tell us which pairs {u_i,u_j} correlate. In particular, if the vector {v} also correlates with a separate collection {w_1,\dots,w_n} of unit vectors, the pairs {(i,j)} for which {u_i,u_j} correlate may have no intersection whatsoever with the pairs in which {w_i,w_j} correlate (except of course on the diagonal {i=j} where they must correlate).

While working on an ongoing research project, I recently found that there is a very simple way to get around the latter problem by exploiting the tensor power trick:

Corollary 3 (Simultaneous statistical transitivity of correlation) Let {v, u^k_i} be unit vectors in a Hilbert space for {i=1,\dots,n} and {k=1,\dots,K} such that {|\langle v, u^k_i \rangle_H| \geq \delta_k} for all {i=1,\dots,n}, {k=1,\dots,K} and some {0 < \delta_k \leq 1}. Then there are at least {(\delta_1 \dots \delta_K)^2 n^2/2} pairs {(i,j) \in \{1,\dots,n\}^2} such that {\prod_{k=1}^K |\langle u^k_i, u^k_j \rangle_H| \geq (\delta_1 \dots \delta_K)^2/2}. In particular (by Cauchy-Schwarz) we have {|\langle u^k_i, u^k_j \rangle_H| \geq (\delta_1 \dots \delta_K)^2/2} for all {k}.

Proof: Apply Corollary 2 to the unit vectors {v^{\otimes K}} and {u^1_i \otimes \dots \otimes u^K_i}, {i=1,\dots,n} in the tensor power Hilbert space {H^{\otimes K}}. \Box

It is surprisingly difficult to obtain even a qualitative version of the above conclusion (namely, if {v} correlates with all of the {u^k_i}, then there are many pairs {(i,j)} for which {u^k_i} correlates with {u^k_j} for all {k} simultaneously) without some version of the tensor power trick. For instance, even the powerful Szemerédi regularity lemma, when applied to the set of pairs {i,j} for which one has correlation of {u^k_i}, {u^k_j} for a single {i,j}, does not seem to be sufficient. However, there is a reformulation of the argument using the Schur product theorem as a substitute for (or really, a disguised version of) the tensor power trick. For simplicity of notation let us just work with real Hilbert spaces to illustrate the argument. We start with the identity

\displaystyle  \langle u^k_i, u^k_j \rangle_H = \langle v, u^k_i \rangle_H \langle v, u^k_j \rangle_H + \langle \pi(u^k_i), \pi(u^k_j) \rangle_H

where {\pi} is the orthogonal projection to the complement of {v}. This implies a Gram matrix inequality

\displaystyle  (\langle u^k_i, u^k_j \rangle_H)_{1 \leq i,j \leq n} \succ (\langle v, u^k_i \rangle_H \langle v, u^k_j \rangle_H)_{1 \leq i,j \leq n} \succ 0

for each {k} where {A \succ B} denotes the claim that {A-B} is positive semi-definite. By the Schur product theorem, we conclude that

\displaystyle  (\prod_{k=1}^K \langle u^k_i, u^k_j \rangle_H)_{1 \leq i,j \leq n} \succ (\prod_{k=1}^K \langle v, u^k_i \rangle_H \langle v, u^k_j \rangle_H)_{1 \leq i,j \leq n}

and hence for a suitable choice of signs {\epsilon_1,\dots,\epsilon_n},

\displaystyle  \sum_{1 \leq i, j \leq n} \epsilon_i \epsilon_j \prod_{k=1}^K \langle u^k_i, u^k_j \rangle_H \geq \delta_1^2 \dots \delta_K^2 n^2.

One now argues as in the proof of Corollary 2.

A separate application of tensor powers to amplify correlations was also noted in this previous blog post giving a cheap version of the Kabatjanskii-Levenstein bound, but this seems to not be directly related to this current application.

Let {n} be a natural number. We consider the question of how many “almost orthogonal” unit vectors {v_1,\ldots,v_m} one can place in the Euclidean space {{\bf R}^n}. Of course, if we insist on {v_1,\ldots,v_m} being exactly orthogonal, so that {\langle v_i,v_j \rangle = 0} for all distinct {i,j}, then we can only pack at most {n} unit vectors into this space. However, if one is willing to relax the orthogonality condition a little, so that {\langle v_i,v_j\rangle} is small rather than zero, then one can pack a lot more unit vectors into {{\bf R}^n}, due to the important fact that pairs of vectors in high dimensions are typically almost orthogonal to each other. For instance, if one chooses {v_i} uniformly and independently at random on the unit sphere, then a standard computation (based on viewing the {v_i} as gaussian vectors projected onto the unit sphere) shows that each inner product {\langle v_i,v_j \rangle} concentrates around the origin with standard deviation {O(1/\sqrt{n})} and with gaussian tails, and a simple application of the union bound then shows that for any fixed {K \geq 1}, one can pack {n^K} unit vectors into {{\bf R}^n} whose inner products are all of size {O( K^{1/2} n^{-1/2} \log^{1/2} n )}.

One can remove the logarithm by using some number theoretic constructions. For instance, if {n} is twice a prime {n=2p}, one can identify {{\bf R}^n} with the space {\ell^2({\bf F}_p)} of complex-valued functions {f: {\bf F}_p \rightarrow {\bf C}}, where {{\bf F}_p} is the field of {p} elements, and if one then considers the {p^2} different quadratic phases {x \mapsto \frac{1}{\sqrt{p}} e_p( ax^2 + bx )} for {a,b \in {\bf F}_p}, where {e_p(a) := e^{2\pi i a/p}} is the standard character on {{\bf F}_p}, then a standard application of Gauss sum estimates reveals that these {p^2} unit vectors in {{\bf R}^n} all have inner products of magnitude at most {p^{-1/2}} with each other. More generally, if we take {d \geq 1} and consider the {p^d} different polynomial phases {x \mapsto \frac{1}{\sqrt{p}} e_p( a_d x^d + \ldots + a_1 x )} for {a_1,\ldots,a_d \in {\bf F}_p}, then an application of the Weil conjectures for curves, proven by Weil, shows that the inner products of the associated {p^d} unit vectors with each other have magnitude at most {(d-1) p^{-1/2}}.

As it turns out, this construction is close to optimal, in that there is a polynomial limit to how many unit vectors one can pack into {{\bf R}^n} with an inner product of {O(1/\sqrt{n})}:

Theorem 1 (Cheap Kabatjanskii-Levenstein bound) Let {v_1,\ldots,v_m} be unit vector in {{\bf R}^n} such that {|\langle v_i, v_j \rangle| \leq A n^{-1/2}} for some {1/2 \leq A \leq \frac{1}{2} \sqrt{n}}. Then we have {m \leq (\frac{Cn}{A^2})^{C A^2}} for some absolute constant {C}.

In particular, for fixed {d} and large {p}, the number of unit vectors one can pack in {{\bf R}^{2p}} whose inner products all have magnitude at most {dp^{-1/2}} will be {O( p^{O(d^2)} )}. This doesn’t quite match the construction coming from the Weil conjectures, although it is worth noting that the upper bound of {(d-1)p^{-1/2}} for the inner product is usually not sharp (the inner product is actually {p^{-1/2}} times the sum of {d-1} unit phases which one expects (cf. the Sato-Tate conjecture) to be uniformly distributed on the unit circle, and so the typical inner product is actually closer to {(d-1)^{1/2} p^{-1/2}}).

Note that for {0 \leq A < 1/2}, the {A=1/2} case of the above theorem (or more precisely, Lemma 2 below) gives the bound {m=O(n)}, which is essentially optimal as the example of an orthonormal basis shows. For {A \geq \sqrt{n}}, the condition {|\langle v_i, v_j \rangle| \leq A n^{-1/2}} is trivially true from Cauchy-Schwarz, and {m} can be arbitrariy large. Finally, in the range {\frac{1}{2} \sqrt{n} \leq A \leq \sqrt{n}}, we can use a volume packing argument: we have {\|v_i-v_j\|^2 \geq 2 (1 - A n^{-1/2})}, so of we set {r := 2^{-1/2} (1-A n^{-1/2})^{1/2}}, then the open balls of radius {r} around each {v_i} are disjoint, while all lying in a ball of radius {O(1)}, giving rise to the bound {m \leq C^n (1-A n^{-1/2})^{-n/2}} for some absolute constant {C}.

As I learned recently from Philippe Michel, a more precise version of this theorem is due to Kabatjanskii and Levenstein, who studied the closely related problem of sphere packing (or more precisely, cap packing) in the unit sphere {S^{n-1}} of {{\bf R}^n}. However, I found a short proof of the above theorem which relies on one of my favorite tricks – the tensor power trick – so I thought I would give it here.

We begin with an easy case, basically the {A=1/2} case of the above theorem:

Lemma 2 Let {v_1,\ldots,v_m} be unit vectors in {{\bf R}^n} such that {|\langle v_i, v_j \rangle| \leq \frac{1}{2n^{1/2}}} for all distinct {i,j}. Then {m < 2n}.

Proof: Suppose for contradiction that {m \geq 2n}. We consider the {2n \times 2n} Gram matrix {( \langle v_i,v_j \rangle )_{1 \leq i,j \leq 2n}}. This matrix is real symmetric with rank at most {n}, thus if one subtracts off the identity matrix, it has an eigenvalue of {-1} with multiplicity at least {n}. Taking Hilbert-Schmidt norms, we conclude that

\displaystyle \sum_{1 \leq i,j \leq 2n: i \neq j} |\langle v_i, v_j \rangle|^2 \geq n.

But by hypothesis, the left-hand side is at most {2n(2n-1) \frac{1}{4n} = n-\frac{1}{2}}, giving the desired contradiction. \Box

To amplify the above lemma to cover larger values of {A}, we apply the tensor power trick. A direct application of the tensor power trick does not gain very much; however one can do a lot better by using the symmetric tensor power rather than the raw tensor power. This gives

Corollary 3 Let {k} be a natural number, and let {v_1,\ldots,v_m} be unit vectors in {{\bf R}^n} such that {|\langle v_i, v_j \rangle| \leq 2^{-1/k} (\binom{n+k-1}{k})^{-1/2k}} for all distinct {i,j}. Then {m < 2\binom{n+k-1}{k}}.

Proof: We work in the symmetric component {\hbox{Sym}^k {\bf R}^n} of the tensor power {({\bf R}^n)^{\otimes k} \equiv {\bf R}^{n^k}}, which has dimension {\binom{n+k-1}{k}}. Applying the previous lemma to the tensor powers {v_1^{\otimes k},\ldots,v_m^{\otimes k}}, we obtain the claim. \Box

Using the trivial bound {e^k \geq \frac{k^k}{k!}}, we can lower bound

\displaystyle 2^{-1/k} (\binom{n+k-1}{k})^{-1/2k} \geq 2^{-1/k} (n+k-1)^{-1/2} (k!)^{1/2k}

\displaystyle \geq 2^{-1/k} e^{-1/2} k^{1/2} (n+k-1)^{-1/2} .

We can thus prove Theorem 1 by setting {k := \lfloor C A^2 \rfloor} for some sufficiently large absolute constant {C}.

As many readers may already know, my good friend and fellow mathematical blogger Tim Gowers, having wrapped up work on the Princeton Companion to Mathematics (which I believe is now in press), has begun another mathematical initiative, namely a “Tricks Wiki” to act as a repository for mathematical tricks and techniques.    Tim has already started the ball rolling with several seed articles on his own blog, and asked me to also contribute some articles.  (As I understand it, these articles will be migrated to the Wiki in a few months, once it is fully set up, and then they will evolve with edits and contributions by anyone who wishes to pitch in, in the spirit of Wikipedia; in particular, articles are not intended to be permanently authored or signed by any single contributor.)

So today I’d like to start by extracting some material from an old post of mine on “Amplification, arbitrage, and the tensor power trick” (as well as from some of the comments), and converting it to the Tricks Wiki format, while also taking the opportunity to add a few more examples.

Title: The tensor power trick

Quick description: If one wants to prove an inequality X \leq Y for some non-negative quantities X, Y, but can only see how to prove a quasi-inequality X \leq CY that loses a multiplicative constant C, then try to replace all objects involved in the problem by “tensor powers” of themselves and apply the quasi-inequality to those powers.  If all goes well, one can show that X^M \leq C Y^M for all M \geq 1, with a constant C which is independent of M, which implies that X \leq Y as desired by taking M^{th} roots and then taking limits as M \to \infty.

Read the rest of this entry »