Let be a natural number. We consider the question of how many “almost orthogonal” unit vectors
one can place in the Euclidean space
. Of course, if we insist on
being exactly orthogonal, so that
for all distinct
, then we can only pack at most
unit vectors into this space. However, if one is willing to relax the orthogonality condition a little, so that
is small rather than zero, then one can pack a lot more unit vectors into
, due to the important fact that pairs of vectors in high dimensions are typically almost orthogonal to each other. For instance, if one chooses
uniformly and independently at random on the unit sphere, then a standard computation (based on viewing the
as gaussian vectors projected onto the unit sphere) shows that each inner product
concentrates around the origin with standard deviation
and with gaussian tails, and a simple application of the union bound then shows that for any fixed
, one can pack
unit vectors into
whose inner products are all of size
.
One can remove the logarithm by using some number theoretic constructions. For instance, if is twice a prime
, one can identify
with the space
of complex-valued functions
, where
is the field of
elements, and if one then considers the
different quadratic phases
for
, where
is the standard character on
, then a standard application of Gauss sum estimates reveals that these
unit vectors in
all have inner products of magnitude at most
with each other. More generally, if we take
and consider the
different polynomial phases
for
, then an application of the Weil conjectures for curves, proven by Weil, shows that the inner products of the associated
unit vectors with each other have magnitude at most
.
As it turns out, this construction is close to optimal, in that there is a polynomial limit to how many unit vectors one can pack into with an inner product of
:
Theorem 1 (Cheap Kabatjanskii-Levenstein bound) Let
be unit vector in
such that
for some
. Then we have
for some absolute constant
.
In particular, for fixed and large
, the number of unit vectors one can pack in
whose inner products all have magnitude at most
will be
. This doesn’t quite match the construction coming from the Weil conjectures, although it is worth noting that the upper bound of
for the inner product is usually not sharp (the inner product is actually
times the sum of
unit phases which one expects (cf. the Sato-Tate conjecture) to be uniformly distributed on the unit circle, and so the typical inner product is actually closer to
).
Note that for , the
case of the above theorem (or more precisely, Lemma 2 below) gives the bound
, which is essentially optimal as the example of an orthonormal basis shows. For
, the condition
is trivially true from Cauchy-Schwarz, and
can be arbitrariy large. Finally, in the range
, we can use a volume packing argument: we have
, so of we set
, then the open balls of radius
around each
are disjoint, while all lying in a ball of radius
, giving rise to the bound
for some absolute constant
.
As I learned recently from Philippe Michel, a more precise version of this theorem is due to Kabatjanskii and Levenstein, who studied the closely related problem of sphere packing (or more precisely, cap packing) in the unit sphere of
. However, I found a short proof of the above theorem which relies on one of my favorite tricks – the tensor power trick – so I thought I would give it here.
We begin with an easy case, basically the case of the above theorem:
Lemma 2 Let
be unit vectors in
such that
for all distinct
. Then
.
Proof: Suppose for contradiction that . We consider the
Gram matrix
. This matrix is real symmetric with rank at most
, thus if one subtracts off the identity matrix, it has an eigenvalue of
with multiplicity at least
. Taking Hilbert-Schmidt norms, we conclude that
But by hypothesis, the left-hand side is at most , giving the desired contradiction.
To amplify the above lemma to cover larger values of , we apply the tensor power trick. A direct application of the tensor power trick does not gain very much; however one can do a lot better by using the symmetric tensor power rather than the raw tensor power. This gives
Corollary 3 Let
be a natural number, and let
be unit vectors in
such that
for all distinct
. Then
.
Proof: We work in the symmetric component of the tensor power
, which has dimension
. Applying the previous lemma to the tensor powers
, we obtain the claim.
Using the trivial bound , we can lower bound
We can thus prove Theorem 1 by setting for some sufficiently large absolute constant
.
26 comments
Comments feed for this article
18 July, 2013 at 2:34 pm
frankpmurphyh
Reblogged this on algebrafm.
18 July, 2013 at 7:13 pm
John Mangual
Didn’t you build
independent vectors with inner product less than
? Then why is your bound
in Lemma 2 ?
I am skeptical as we had drilled into our heads that
had exactly
basis vectors.
18 July, 2013 at 7:23 pm
Terence Tao
Lemma 2 concerns vectors whose inner product is less than
in magnitude, which in the case
would be
. This is less than the
bound obeyed by the example of the quadratic phase functions. The factor of
makes a big difference!
This is one of the initially unintuitive aspects of high-dimensional geometry; perfect orthogonality is difficult to achieve (and, as you say, one can only pack
perfectly orthogonal vectors into
), but almost orthogonality becomes increasingly easy to achieve as one loosens the definition of almost orthogonality, basically because with the exponentially large amount of “room” available in a high-dimensional space. With
, one can pack
vectors together with an inner product of at most
, or
vectors together with an inner product of at most
, and so forth. Of course, these vectors cannot be linearly independent, but it is possible in high dimensions for a family of vectors to be simultaneously linearly dependent and almost orthogonal, as the latter only guarantees a “local” form of linear independence. (Among other things, this is one of the reasons why compressed sensing can work.)
19 July, 2013 at 12:26 am
Mikhail Katz
I wonder if this ties in with Gil Kalai’s negative solution of the Borsuk problem (coloring pointsets in Euclidean space).
19 July, 2013 at 6:43 am
A.
Sorry for the stupid question. I haven’t seen the symmetric component before. Is the scalar product of tensor powers in the symmetric component the same as in the tensor product?
19 July, 2013 at 6:58 am
A.
Ok, I got it now.
19 July, 2013 at 10:44 am
valuevar
This is quite nice! A few years ago, Akshay wondered whether there was a bound of this quality with a simple proof (but didn’t get anything).
This means that the number of points one can pack on an
-dimensional sphere at angles
from each other is at most
, where
. Can the
be removed?
19 July, 2013 at 2:06 pm
Terence Tao
For very small
, comparable to
, one can’t remove the logarithm, because of the number-theoretic examples, e.g.
vectors all at angle at least
from each other. But one may be able to do better for larger values of
; when
the upper bound is of shape
or so but the number-theoretic construction only gives
vectors (this is something to do with the Sato-Tate type fluctuations in the Frobenius eigenvalues in the number theoretic constructions, and so may represent the limit of the number-theoretic method).
20 July, 2013 at 5:08 pm
The Riemann hypothesis in various settings | What's new
[…] To enter in LaTeX in comments, use
(without the < and > signs, of course; in fact, these signs should be avoided as they can cause formatting errors). See the about page for details and for other commenting policy. « A cheap version of the Kabatjanskii-Levenstein bound for almost orthogonal vectors […]
21 July, 2013 at 1:27 am
Gil Kalai
A few (a bit rusty) remarks and thoughts on the post. We start with a nice symmetric space
. In our case
is an
-dimensional sphere, and other important cases are when
(Hamming scheme) , and
(Johnson scheme). (Association schemes is a useful general setting, and Gelfand pairs may serve as another nice general relevant setting.) The questions we can ask are:
1) What is the maximum number of points of a set Y in X of vectors with the property that the distance between every two points in Y is at least α?
2) What is the maximum measure for a (measurable) set Y where the distance between every two points is at most α?
3) What is the maximum measure for a (measurable) set
where the distance between every two points is different than α?
4) What is the maximum ratio
where the distance between every two points in Y is at least α, and between two points in X is greater than α.
5) What is the maximum ratio between the measures
where the distance between every two points in Y is at least α, and between two points in X is greater than α?
The object for question 1) are called codes. (Error-correcting codes for
and spherical codes for
.) Question 2 is an isoperimetric question. Question 3 is related to the Frankl-Wilson theorem where the polynomial methods is used to derive (close to optimal) bounds for the {0,1}^n case. The Frankl-Rodl paper gives an amazing method to move from question 2 to question 3. Problem 5 is related to the Borsuk’s problem that Jeff Kahn and I disproved. A special case of Problem 4 is the Erdos-Faber-Lovasz conjecture. A natural construction for question 3 is to take open caps of radius β/2 around the codewords of an optimal spherical code of minimal distance α+2β for some β. (Maybe it is even always the best to take α+2β=π so the code is two antipodal points.)
The very basic upper bounds for codes is the volume bound. There is a simple improvement based on intersecting our code with a small spherical strip (for
) or with all
vectors of fixed weight (for the Hamming scheme) and apply the volume bound for the intersection. (I am not quite sure if this already gives the quantitative results in this post.) The most advanced method is Delsartes linear-programming method which is used by Kabatjanskii and Levenstein, to give the best known results for spherical codes. Improving asymptotically these bounds for spherical codes and the bounds by McELIECE, RODEMICH, RUMSEY, and WELCH for error correcting codes are outstanding open problems with no progress over three decades. Of course the question of spherical codes is closely related to the classical problem of the densest sphere packing in Euclidean spaces.
The tensor power/symmetric power trick is indeed very nice and it is curious that here we have to insist on symmetric powers. A similar trick is used to move from problem 3 to 5 in the case of Borsuk’s problem, and it also played a role in the disproof by Khot and Vishnoi of the Goemans-Linial conjecture. A propos this, I am curious if symmetric powers of sets of constant width have constant width but this is a long shot.
21 July, 2013 at 1:50 am
Gil Kalai
Let me state more carefully problems 4 and 5.
4) What is the minimum over subsets
of
of the maximum over subsets
of
of the ratio
, where the distance between every two points in
is at least α, and between two points in
is greater than α.
5) What is the minimum over
of the maximum over
of the ratio between the measures
where the distance between every two points in Y is at least α, and between two points in Z is greater than α?
22 July, 2013 at 2:57 am
Gil Kalai
Sorry, 5) still needs to be corrected
5) What is the minimum over choices of
of the maximum over
of the ratio between the measures
where the distance between every two points in Y is at most α, and between every two points in Z is less than α?
22 July, 2013 at 7:39 am
konradswanepoel
This proof seems to be closely related to results of of Noga Alon on matrices close to the identity. See Section 9 of his paper
Problems and results in extremal combinatorics, I, Discrete Math. 273 (2003), 31-53
Click to access extremal1.pdf
or his paper
Perturbed identity matrices have high rank: proof and applications, Combinatorics, Probability and Computing 18 (2009), 3-15
Click to access identity1.pdf
22 July, 2013 at 9:22 am
Terence Tao
Thanks for the reference! This isn’t the first time I managed to rediscover some argument or trick that Noga had figured out years ago… :-) (I once gave a talk mentioning a result I had proven recently and was quite proud of, and within a minute Noga (who was in the audience) asked, “Did you use [technique X] to reduce the problem to [simpler problem Y]?”. Of course, this is what I had done, and was the heart of the argument.)
I also see that Noga has a discussion of Harald’s question of removing the logarithm factor (when
is fixed rather than decaying in
), but it seems that this is closely related to the notorious problem of removing the logarithm in the Johnson-Lindenstrauss lemma, so is likely to be very difficult (though also very interesting).
22 July, 2013 at 9:52 am
Emmanuel Kowalski
This is very nice indeed!
Actually, I think that Kabatjanski-Levenstein never explicitly address the regime where the inner product is O(1/sqrt(n)); in my paper with Phllippe and Fouvry where we used such ideas to give upper bounds on the number of sheaves with bounded complexity on a curve ( http://www.math.ethz.ch/~kowalski/counting-sheaves.pdf ) we ended proving such a bound using their method and a lemma on zeros of Hermite polynomials.
The question of optimizing the lower bounds for the regime O(1/sqrt(n)) using as many sheaves as possible is very interesting, and related to conjectures of Drinfeld, Deligne and others. We haven’t yet succeeded in getting a serious improvement on counting merely rank one sheaves.
26 July, 2013 at 5:22 am
A cheap version of the Kabatjanskii-Levenstein bound for almost orthogonal vectors | Fashion
[…] Read more… […]
8 November, 2013 at 10:59 pm
Mustafa Said
Some of the ideas and results in this blog may be used to study “almost normal matrices.” If we let
be the matrix with “almost orthogonal vectors,” then
will be small. Moreover, one can show that there exists a normal matrix
and a universal constant,
such that $||N-A|| \leq K ||AA^* – A^*A||$. In other words, “almost normal matrices” are near normal approximates.
6 February, 2014 at 2:50 am
gowers
I don’t understand your remark that the bound when A=1 is O(n). It seems to be
for an absolute constant C.
I’m asking because I’m interested to know what the correct bound is when A is something like 1/2. Do you have any information about this?
6 February, 2014 at 9:08 am
Terence Tao
Oops, you’re right; the argument above only gives a linear bound for A=1/2 rather than A=1 (Lemma 2). I’ve reworded the text accordingly.
6 February, 2014 at 10:41 am
gowers
Oops — I missed that what I wanted to know about was dealt with in Lemma 2.
3 March, 2014 at 3:08 pm
The Rank Lemma | Geometry and Combinatorics
[…] Before giving a proof of Lemma 1, I present the following application, which is a slight strengthening of Lemma 2 of this post by Terence Tao. […]
5 June, 2014 at 9:55 am
When is correlation transitive? | What's new
[…] the exceptional pairs that are even weakly correlated to each other become exponentially rare. (See this previous blog post for some related discussion; in particular Lemma 2 from that post is closely related to the van der […]
1 September, 2015 at 6:39 pm
Anonymous
Is this post available in any of your books, I just checked some of them but I could not find anyone which includes with this post?
[The posts from 2012 onward have not been published in book format; I may do so at some later point in time, though. -T]
27 February, 2021 at 12:22 pm
Boosting the van der Corput inequality using the tensor power trick | What's new
[…] separate application of tensor powers to amplify correlations was also noted in this previous blog post giving a cheap version of the Kabatjanskii-Levenstein bound, but this seems to not be directly […]
24 March, 2023 at 11:30 am
Brent Werness
Hi, I know this post is a decade old, but I was just implementing these vectors as a practical alternative to random generation (they work significantly better in some regimes) and I’m a bit perplexed by the ones depending on the Weil Conjecture. As a specific example, if you set
and
, there are off-diagonal elements which are exactly one by Fermat’s little theorem for instance
.
Even if you are not in this extremely degenerate case, I’m still getting inner products off diagonal which are much larger than could be explained by machine error (for instance for
,
where I find off-diagonal terms of size 0.54 vs a bound of 0.43).
This is in sharp contrast to the
case, where the bound can be established by much more elementary means, and appear to always hold up to machine precision. Short python code can be provided if interested.
29 March, 2023 at 4:40 am
Terence Tao
The Weil bound is
, which is consistent with both of the examples you provided (for instance, when
and
the bound is
, not 0.43 as you write). It is rare though that the Weil bound would be attained with equality.