You are currently browsing the category archive for the ‘math.MG’ category.

We now leave the topic of Riemann surfaces, and turn now to the (loosely related) topic of conformal mapping (and quasiconformal mapping). Recall that a conformal map from an open subset of the complex plane to another open set is a map that is holomorphic and bijective, which (by Rouché’s theorem) also forces the derivative of to be nowhere vanishing. We then say that the two open sets are conformally equivalent. From the Cauchy-Riemann equations we see that conformal maps are orientation-preserving and angle-preserving; from the Newton approximation we see that they almost preserve small circles, indeed for small the circle will approximately map to .

In previous quarters, we proved a fundamental theorem about this concept, the Riemann mapping theorem:

Theorem 1 (Riemann mapping theorem)Let be a simply connected open subset of that is not all of . Then is conformally equivalent to the unit disk .

This theorem was proven in these 246A lecture notes, using an argument of Koebe. At a very high level, one can sketch Koebe’s proof of the Riemann mapping theorem as follows: among all the injective holomorphic maps from to that map some fixed point to , pick one that maximises the magnitude of the derivative (ignoring for this discussion the issue of proving that a maximiser exists). If avoids some point in , one can compose with various holomorphic maps and use Schwarz’s lemma and the chain rule to increase without destroying injectivity; see the previous lecture notes for details. The conformal map is unique up to Möbius automorphisms of the disk; one can fix the map by picking two distinct points in , and requiring to be zero and to be positive real.

It is a beautiful observation of Thurston that the concept of a conformal mapping has a discrete counterpart, namely the mapping of one circle packing to another. Furthermore, one can run a version of Koebe’s argument (using now a discrete version of Perron’s method) to prove the Riemann mapping theorem through circle packings. In principle, this leads to a mostly elementary approach to conformal geometry, based on extremely classical mathematics that goes all the way back to Apollonius. However, in order to *prove* the basic existence and uniqueness theorems of circle packing, as well as the convergence to conformal maps in the continuous limit, it seems to be necessary (or at least highly convenient) to use much more modern machinery, including the theory of quasiconformal mapping, and also the Riemann mapping theorem itself (so in particular we are not structuring these notes to provide a completely independent proof of that theorem, though this may well be possible).

To make the above discussion more precise we need some notation.

Definition 2 (Circle packing)A (finite)circle packingis a finite collection of circles in the complex numbers indexed by some finite set , whose interiors are all disjoint (but which are allowed to be tangent to each other), and whose union is connected. Thenerveof a circle packing is the finite graph whose vertices are the centres of the circle packing, with two such centres connected by an edge if the circles are tangent. (In these notes all graphs are undirected, finite and simple, unless otherwise specified.)

It is clear that the nerve of a circle packing is connected and planar, since one can draw the nerve by placing each vertex (tautologically) in its location in the complex plane, and drawing each edge by the line segment between the centres of the circles it connects (this line segment will pass through the point of tangency of the two circles). Later in these notes we will also have to consider some infinite circle packings, most notably the infinite regular hexagonal circle packing.

The first basic theorem in the subject is the following converse statement:

Theorem 3 (Circle packing theorem)Every connected planar graph is the nerve of a circle packing.

Of course, there can be multiple circle packings associated to a given connected planar graph; indeed, since reflections across a line and Möbius transformations map circles to circles (or lines), they will map circle packings to circle packings (unless one or more of the circles is sent to a line). It turns out that once one adds enough edges to the planar graph, the circle packing is otherwise rigid:

Theorem 4 (Koebe-Andreev-Thurston theorem)If a connected planar graph is maximal (i.e., no further edge can be added to it without destroying planarity), then the circle packing given by the above theorem is unique up to reflections and Möbius transformations.

Exercise 5Let be a connected planar graph with vertices. Show that the following are equivalent:

- (i) is a maximal planar graph.
- (ii) has edges.
- (iii) Every drawing of divides the plane into faces that have three edges each. (This includes one unbounded face.)
- (iv) At least one drawing of divides the plane into faces that have three edges each.
(

Hint:use Euler’s formula , where is the number of faces including the unbounded face.)

Thurston conjectured that circle packings can be used to approximate the conformal map arising in the Riemann mapping theorem. Here is an informal statement:

Conjecture 6 (Informal Thurston conjecture)Let be a simply connected domain, with two distinct points . Let be the conformal map from to that maps to the origin and to a positive real. For any small , let be the portion of the regular hexagonal circle packing by circles of radius that are contained in , and let be an circle packing of with all “boundary circles” tangent to , giving rise to an “approximate map” defined on the subset of consisting of the circles of , their interiors, and the interstitial regions between triples of mutually tangent circles. Normalise this map so that is zero and is a positive real. Then converges to as .

A rigorous version of this conjecture was proven by Rodin and Sullivan. Besides some elementary geometric lemmas (regarding the relative sizes of various configurations of tangent circles), the main ingredients are a rigidity result for the regular hexagonal circle packing, and the theory of quasiconformal maps. Quasiconformal maps are what seem on the surface to be a very broad generalisation of the notion of a conformal map. Informally, conformal maps take infinitesimal circles to infinitesimal circles, whereas quasiconformal maps take infinitesimal circles to infinitesimal ellipses of bounded eccentricity. In terms of Wirtinger derivatives, conformal maps obey the Cauchy-Riemann equation , while (sufficiently smooth) quasiconformal maps only obey an inequality . As such, quasiconformal maps are considerably more plentiful than conformal maps, and in particular it is possible to create piecewise smooth quasiconformal maps by gluing together various simple maps such as affine maps or Möbius transformations; such piecewise maps will naturally arise when trying to rigorously build the map alluded to in the above conjecture. On the other hand, it turns out that quasiconformal maps still have many vestiges of the rigidity properties enjoyed by conformal maps; for instance, there are quasiconformal analogues of fundamental theorems in conformal mapping such as the Schwarz reflection principle, Liouville’s theorem, or Hurwitz’s theorem. Among other things, these quasiconformal rigidity theorems allow one to create conformal maps from the limit of quasiconformal maps in many circumstances, and this will be how the Thurston conjecture will be proven. A key technical tool in establishing these sorts of rigidity theorems will be the theory of an important quasiconformal (quasi-)invariant, the *conformal modulus* (or, equivalently, the extremal length, which is the reciprocal of the modulus).

The Polymath14 online collaboration has uploaded to the arXiv its paper “Homogeneous length functions on groups“, submitted to Algebra & Number Theory. The paper completely classifies *homogeneous length functions* on an arbitrary group , that is to say non-negative functions that obey the symmetry condition , the non-degeneracy condition , the triangle inequality , and the homogeneity condition . It turns out that these norms can only arise from pulling back the norm of a Banach space by an isometric embedding of the group. Among other things, this shows that can only support a homogeneous length function if and only if it is abelian and torsion free, thus giving a metric description of this property.

The proof is based on repeated use of the homogeneous length function axioms, combined with elementary identities of commutators, to obtain increasingly good bounds on quantities such as , until one can show that such norms have to vanish. See the previous post for a full proof. The result is robust in that it allows for some loss in the triangle inequality and homogeneity condition, allowing for some new results on “quasinorms” on groups that relate to quasihomomorphisms.

As there are now a large number of comments on the previous post on this project, this post will also serve as the new thread for any final discussion of this project as it winds down.

In the tradition of “Polymath projects“, the problem posed in the previous two blog posts has now been solved, thanks to the cumulative effect of many small contributions by many participants (including, but not limited to, Sean Eberhard, Tobias Fritz, Siddharta Gadgil, Tobias Hartnick, Chris Jerdonek, Apoorva Khare, Antonio Machiavelo, Pace Nielsen, Andy Putman, Will Sawin, Alexander Shamov, Lior Silberman, and David Speyer). In this post I’ll write down a streamlined resolution, eliding a number of important but ultimately removable partial steps and insights made by the above contributors en route to the solution.

Theorem 1Let be a group. Suppose one has a “seminorm” function which obeys the triangle inequalityfor all , with equality whenever . Then the seminorm factors through the abelianisation map .

*Proof:* By the triangle inequality, it suffices to show that for all , where is the commutator.

We first establish some basic facts. Firstly, by hypothesis we have , and hence whenever is a power of two. On the other hand, by the triangle inequality we have for all positive , and hence by the triangle inequality again we also have the matching lower bound, thus

for all . The claim is also true for (apply the preceding bound with and ). By replacing with if necessary we may now also assume without loss of generality that , thus

Next, for any , and any natural number , we have

so on taking limits as we have . Replacing by gives the matching lower bound, thus we have the conjugation invariance

Next, we observe that if are such that is conjugate to both and , then one has the inequality

Indeed, if we write for some , then for any natural number one has

where the and terms each appear times. From (2) we see that conjugation by does not affect the norm. Using this and the triangle inequality several times, we conclude that

and the claim (3) follows by sending .

The following special case of (3) will be of particular interest. Let , and for any integers , define the quantity

Observe that is conjugate to both and to , hence by (3) one has

which by (2) leads to the recursive inequality

We can write this in probabilistic notation as

where is a random vector that takes the values and with probability each. Iterating this, we conclude in particular that for any large natural number , one has

where and are iid copies of . We can write where are iid signs. By the triangle inequality, we thus have

noting that is an even integer. On the other hand, has mean zero and variance , hence by Cauchy-Schwarz

But by (1), the left-hand side is equal to . Dividing by and then sending , we obtain the claim.

The above theorem reduces such seminorms to abelian groups. It is easy to see from (1) that any torsion element of such groups has zero seminorm, so we can in fact restrict to torsion-free groups, which we now write using additive notation , thus for instance for . We think of as a -module. One can then extend the seminorm to the associated -vector space by the formula , and then to the associated -vector space by continuity, at which point it becomes a genuine seminorm (provided we have ensured the symmetry condition ). Conversely, any seminorm on induces a seminorm on . (These arguments also appear in this paper of Khare and Rajaratnam.)

This post is a continuation of the previous post, which has attracted a large number of comments. I’m recording here some calculations that arose from those comments (particularly those of Pace Nielsen, Lior Silberman, Tobias Fritz, and Apoorva Khare). Please feel free to either continue these calculations or to discuss other approaches to the problem, such as those mentioned in the remaining comments to the previous post.

Let be the free group on two generators , and let be a quantity obeying the triangle inequality

and the linear growth property

for all and integers ; this implies the conjugation invariance

or equivalently

We consider inequalities of the form

for various real numbers . For instance, since

we have (1) for . We also have the following further relations:

Proposition 1

*Proof:* For (i) we simply observe that

For (ii), we calculate

giving the claim.

For (iii), we calculate

giving the claim.

For (iv), we calculate

giving the claim.

Here is a typical application of the above estimates. If (1) holds for , then by part (i) it holds for , then by (ii) (2) holds for , then by (iv) (1) holds for . The map has fixed point , thus

For instance, if , then .

Here is a curious question posed to me by Apoorva Khare that I do not know the answer to. Let be the free group on two generators . Does there exist a metric on this group which is

- bi-invariant, thus for all ; and
- linear growth in the sense that for all and all natural numbers ?

By defining the “norm” of an element to be , an equivalent formulation of the problem asks if there exists a non-negative norm function that obeys the conjugation invariance

for all , the triangle inequality

for all , and the linear growth

for all and , and such that for all non-identity . Indeed, if such a norm exists then one can just take to give the desired metric.

One can normalise the norm of the generators to be at most , thus

This can then be used to upper bound the norm of other words in . For instance, from (1), (3) one has

A bit less trivially, from (3), (2), (1) one can bound commutators as

In a similar spirit one has

What is not clear to me is if one can keep arguing like this to continually improve the upper bounds on the norm of a given non-trivial group element to the point where this norm must in fact vanish, which would demonstrate that no metric with the above properties on would exist (and in fact would impose strong constraints on similar metrics existing on other groups as well). It is also tempting to use some ideas from geometric group theory (e.g. asymptotic cones) to try to understand these metrics further, though I wasn’t able to get very far with this approach. Anyway, this feels like a problem that might be somewhat receptive to a more crowdsourced attack, so I am posing it here in case any readers wish to try to make progress on it.

I’ve just uploaded to the arXiv my paper “On the universality of the incompressible Euler equation on compact manifolds“, submitted to Discrete and Continuous Dynamical Systems. This is a variant of my recent paper on the universality of potential well dynamics, but instead of trying to embed dynamical systems into a potential well , here we try to embed dynamical systems into the incompressible Euler equations

on a Riemannian manifold . (One is particularly interested in the case of flat manifolds , particularly or , but for the main result of this paper it is essential that one is permitted to consider curved manifolds.) This system, first studied by Ebin and Marsden, is the natural generalisation of the usual incompressible Euler equations to curved space; it can be viewed as the formal geodesic flow equation on the infinite-dimensional manifold of volume-preserving diffeomorphisms on (see this previous post for a discussion of this in the flat space case).

The Euler equations can be viewed as a nonlinear equation in which the nonlinearity is a quadratic function of the velocity field . It is thus natural to compare the Euler equations with quadratic ODE of the form

where is the unknown solution, and is a bilinear map, which we may assume without loss of generality to be symmetric. One can ask whether such an ODE may be linearly embedded into the Euler equations on some Riemannian manifold , which means that there is an injective linear map from to smooth vector fields on , as well as a bilinear map to smooth scalar fields on , such that the map takes solutions to (2) to solutions to (1), or equivalently that

for all .

For simplicity let us restrict to be compact. There is an obvious necessary condition for this embeddability to occur, which comes from energy conservation law for the Euler equations; unpacking everything, this implies that the bilinear form in (2) has to obey a cancellation condition

for some positive definite inner product on . The main result of the paper is the converse to this statement: if is a symmetric bilinear form obeying a cancellation condition (3), then it is possible to embed the equations (2) into the Euler equations (1) on some Riemannian manifold ; the catch is that this manifold will depend on the form and on the dimension (in fact in the construction I have, is given explicitly as , with a funny metric on it that depends on ).

As a consequence, any finite dimensional portion of the usual “dyadic shell models” used as simplified toy models of the Euler equation, can actually be embedded into a genuine Euler equation, albeit on a high-dimensional and curved manifold. This includes portions of the self-similar “machine” I used in a previous paper to establish finite time blowup for an averaged version of the Navier-Stokes (or Euler) equations. Unfortunately, the result in this paper does not apply to infinite-dimensional ODE, so I cannot yet establish finite time blowup for the Euler equations on a (well-chosen) manifold. It does not seem so far beyond the realm of possibility, though, that this could be done in the relatively near future. In particular, the result here suggests that one could construct something resembling a universal Turing machine within an Euler flow on a manifold, which was one ingredient I would need to engineer such a finite time blowup.

The proof of the main theorem proceeds by an “elimination of variables” strategy that was used in some of my previous papers in this area, though in this particular case the Nash embedding theorem (or variants thereof) are not required. The first step is to lessen the dependence on the metric by partially reformulating the Euler equations (1) in terms of the covelocity (which is a -form) instead of the velocity . Using the freedom to modify the dimension of the underlying manifold , one can also decouple the metric from the volume form that is used to obtain the divergence-free condition. At this point the metric can be eliminated, with a certain positive definiteness condition between the velocity and covelocity taking its place. After a substantial amount of trial and error (motivated by some “two-and-a-half-dimensional” reductions of the three-dimensional Euler equations, and also by playing around with a number of variants of the classic “separation of variables” strategy), I eventually found an ansatz for the velocity and covelocity that automatically solved most of the components of the Euler equations (as well as most of the positive definiteness requirements), as long as one could find a number of scalar fields that obeyed a certain nonlinear system of transport equations, and also obeyed a positive definiteness condition. Here I was stuck for a bit because the system I ended up with was overdetermined – more equations than unknowns. After trying a number of special cases I eventually found a solution to the transport system on the sphere, except that the scalar functions sometimes degenerated and so the positive definiteness property I wanted was only obeyed with positive semi-definiteness. I tried for some time to perturb this example into a strictly positive definite solution before eventually working out that this was not possible. Finally I had the brainwave to lift the solution from the sphere to an even more symmetric space, and this quickly led to the final solution of the problem, using the special orthogonal group rather than the sphere as the underlying domain. The solution ended up being rather simple in form, but it is still somewhat miraculous to me that it exists at all; in retrospect, given the overdetermined nature of the problem, relying on a large amount of symmetry to cut down the number of equations was basically the only hope.

The wave equation is usually expressed in the form

where is a function of both time and space , with being the Laplacian operator. One can generalise this equation in a number of ways, for instance by replacing the spatial domain with some other manifold and replacing the Laplacian with the Laplace-Beltrami operator or adding lower order terms (such as a potential, or a coupling with a magnetic field). But for sake of discussion let us work with the classical wave equation on . We will work formally in this post, being unconcerned with issues of convergence, justifying interchange of integrals, derivatives, or limits, etc.. One then has a conserved energy

which we can rewrite using integration by parts and the inner product on as

A key feature of the wave equation is *finite speed of propagation*: if, at time (say), the initial position and initial velocity are both supported in a ball , then at any later time , the position and velocity are supported in the larger ball . This can be seen for instance (formally, at least) by inspecting the exterior energy

and observing (after some integration by parts and differentiation under the integral sign) that it is non-increasing in time, non-negative, and vanishing at time .

The wave equation is second order in time, but one can turn it into a first order system by working with the pair rather than just the single field , where is the velocity field. The system is then

and the conserved energy is now

Finite speed of propagation then tells us that if are both supported on , then are supported on for all . One also has time reversal symmetry: if is a solution, then is a solution also, thus for instance one can establish an analogue of finite speed of propagation for negative times using this symmetry.

If one has an eigenfunction

of the Laplacian, then we have the explicit solutions

of the wave equation, which formally can be used to construct all other solutions via the principle of superposition.

When one has vanishing initial velocity , the solution is given via functional calculus by

and the propagator can be expressed as the average of half-wave operators:

One can view as a minor of the full wave propagator

which is unitary with respect to the energy form (1), and is the fundamental solution to the wave equation in the sense that

Viewing the contraction as a minor of a unitary operator is an instance of the “dilation trick“.

It turns out (as I learned from Yuval Peres) that there is a useful discrete analogue of the wave equation (and of all of the above facts), in which the time variable now lives on the integers rather than on , and the spatial domain can be replaced by discrete domains also (such as graphs). Formally, the system is now of the form

where is now an integer, take values in some Hilbert space (e.g. functions on a graph ), and is some operator on that Hilbert space (which in applications will usually be a self-adjoint contraction). To connect this with the classical wave equation, let us first consider a rescaling of this system

where is a small parameter (representing the discretised time step), now takes values in the integer multiples of , and is the wave propagator operator or the heat propagator (the two operators are different, but agree to fourth order in ). One can then formally verify that the wave equation emerges from this rescaled system in the limit . (Thus, is not exactly the direct analogue of the Laplacian , but can be viewed as something like in the case of small , or if we are not rescaling to the small case. The operator is sometimes known as the *diffusion operator*)

Assuming is self-adjoint, solutions to the system (3) formally conserve the energy

This energy is positive semi-definite if is a contraction. We have the same time reversal symmetry as before: if solves the system (3), then so does . If one has an eigenfunction

to the operator , then one has an explicit solution

to (3), and (in principle at least) this generates all other solutions via the principle of superposition.

Finite speed of propagation is a lot easier in the discrete setting, though one has to offset the support of the “velocity” field by one unit. Suppose we know that has unit speed in the sense that whenever is supported in a ball , then is supported in the ball . Then an easy induction shows that if are supported in respectively, then are supported in .

The fundamental solution to the discretised wave equation (3), in the sense of (2), is given by the formula

where and are the Chebyshev polynomials of the first and second kind, thus

and

In particular, is now a minor of , and can also be viewed as an average of with its inverse :

As before, is unitary with respect to the energy form (4), so this is another instance of the dilation trick in action. The powers and are discrete analogues of the heat propagators and wave propagators respectively.

One nice application of all this formalism, which I learned from Yuval Peres, is the Varopoulos-Carne inequality:

Theorem 1 (Varopoulos-Carne inequality)Let be a (possibly infinite) regular graph, let , and let be vertices in . Then the probability that the simple random walk at lands at at time is at most , where is the graph distance.

This general inequality is quite sharp, as one can see using the standard Cayley graph on the integers . Very roughly speaking, it asserts that on a regular graph of reasonably controlled growth (e.g. polynomial growth), random walks of length concentrate on the ball of radius or so centred at the origin of the random walk.

*Proof:* Let be the graph Laplacian, thus

for any , where is the degree of the regular graph and sum is over the vertices that are adjacent to . This is a contraction of unit speed, and the probability that the random walk at lands at at time is

where are the Dirac deltas at . Using (5), we can rewrite this as

where we are now using the energy form (4). We can write

where is the simple random walk of length on the integers, that is to say where are independent uniform Bernoulli signs. Thus we wish to show that

By finite speed of propagation, the inner product here vanishes if . For we can use Cauchy-Schwarz and the unitary nature of to bound the inner product by . Thus the left-hand side may be upper bounded by

and the claim now follows from the Chernoff inequality.

This inequality has many applications, particularly with regards to relating the entropy, mixing time, and concentration of random walks with volume growth of balls; see this text of Lyons and Peres for some examples.

For sake of comparison, here is a continuous counterpart to the Varopoulos-Carne inequality:

Theorem 2 (Continuous Varopoulos-Carne inequality)Let , and let be supported on compact sets respectively. Thenwhere is the Euclidean distance between and .

*Proof:* By Fourier inversion one has

for any real , and thus

By finite speed of propagation, the inner product vanishes when ; otherwise, we can use Cauchy-Schwarz and the contractive nature of to bound this inner product by . Thus

Bounding by , we obtain the claim.

Observe that the argument is quite general and can be applied for instance to other Riemannian manifolds than .

A core foundation of the subject now known as arithmetic combinatorics (and particularly the subfield of *additive combinatorics*) are the elementary sum set estimates (sometimes known as “Ruzsa calculus”) that relate the cardinality of various sum sets

and difference sets

as well as iterated sumsets such as , , and so forth. Here, are finite non-empty subsets of some additive group (classically one took or , but nowadays one usually considers more general additive groups). Some basic estimates in this vein are the following:

Lemma 1 (Ruzsa covering lemma)Let be finite non-empty subsets of . Then may be covered by at most translates of .

*Proof:* Consider a maximal set of disjoint translates of by elements . These translates have cardinality , are disjoint, and lie in , so there are at most of them. By maximality, for any , must intersect at least one of the selected , thus , and the claim follows.

Lemma 2 (Ruzsa triangle inequality)Let be finite non-empty subsets of . Then .

*Proof:* Consider the addition map from to . Every element of has a preimage of this map of cardinality at least , thanks to the obvious identity for each . Since has cardinality , the claim follows.

Such estimates (which are covered, incidentally, in Section 2 of my book with Van Vu) are particularly useful for controlling finite sets of small doubling, in the sense that for some bounded . (There are deeper theorems, most notably Freiman’s theorem, which give more control than what elementary Ruzsa calculus does, however the known bounds in the latter theorem are worse than polynomial in (although it is conjectured otherwise), whereas the elementary estimates are almost all polynomial in .)

However, there are some settings in which the standard sum set estimates are not quite applicable. One such setting is the continuous setting, where one is dealing with bounded open sets in an additive Lie group (e.g. or a torus ) rather than a finite setting. Here, one can largely replicate the discrete sum set estimates by working with a Haar measure in place of cardinality; this is the approach taken for instance in this paper of mine. However, there is another setting, which one might dub the “discretised” setting (as opposed to the “discrete” setting or “continuous” setting), in which the sets remain finite (or at least discretisable to be finite), but for which there is a certain amount of “roundoff error” coming from the discretisation. As a typical example (working now in a non-commutative multiplicative setting rather than an additive one), consider the orthogonal group of orthogonal matrices, and let be the matrices obtained by starting with all of the orthogonal matrice in and rounding each coefficient of each matrix in this set to the nearest multiple of , for some small . This forms a finite set (whose cardinality grows as like a certain negative power of ). In the limit , the set is not a set of small doubling in the discrete sense. However, is still close to in a metric sense, being contained in the -neighbourhood of . Another key example comes from graphs of maps from a subset of one additive group to another . If is “approximately additive” in the sense that for all , is close to in some metric, then might not have small doubling in the discrete sense (because could take a large number of values), but could be considered a set of small doubling in a discretised sense.

One would like to have a sum set (or product set) theory that can handle these cases, particularly in “high-dimensional” settings in which the standard methods of passing back and forth between continuous, discrete, or discretised settings behave poorly from a quantitative point of view due to the exponentially large doubling constant of balls. One way to do this is to impose a translation invariant metric on the underlying group (reverting back to additive notation), and replace the notion of cardinality by that of metric entropy. There are a number of almost equivalent ways to define this concept:

Definition 3Let be a metric space, let be a subset of , and let be a radius.

- The
packing numberis the largest number of points one can pack inside such that the balls are disjoint.- The
internal covering numberis the fewest number of points such that the balls cover .- The
external covering numberis the fewest number of points such that the balls cover .- The
metric entropyis the largest number of points one can find in that are -separated, thus for all .

It is an easy exercise to verify the inequalities

for any , and that is non-increasing in and non-decreasing in for the three choices (but monotonicity in can fail for !). It turns out that the external covering number is slightly more convenient than the other notions of metric entropy, so we will abbreviate . The cardinality can be viewed as the limit of the entropies as .

If we have the bounded doubling property that is covered by translates of for each , and one has a Haar measure on which assigns a positive finite mass to each ball, then any of the above entropies is comparable to , as can be seen by simple volume packing arguments. Thus in the bounded doubling setting one can usually use the measure-theoretic sum set theory to derive entropy-theoretic sumset bounds (see e.g. this paper of mine for an example of this). However, it turns out that even in the absence of bounded doubling, one still has an entropy analogue of most of the elementary sum set theory, except that one has to accept some degradation in the radius parameter by some absolute constant. Such losses can be acceptable in applications in which the underlying sets are largely “transverse” to the balls , so that the -entropy of is largely independent of ; this is a situation which arises in particular in the case of graphs discussed above, if one works with “vertical” metrics whose balls extend primarily in the vertical direction. (I hope to present a specific application of this type here in the near future.)

Henceforth we work in an additive group equipped with a translation-invariant metric . (One can also generalise things slightly by allowing the metric to attain the values or , without changing much of the analysis below.) By the Heine-Borel theorem, any precompact set will have finite entropy for any . We now have analogues of the two basic Ruzsa lemmas above:

Lemma 4 (Ruzsa covering lemma)Let be precompact non-empty subsets of , and let . Then may be covered by at most translates of .

*Proof:* Let be a maximal set of points such that the sets are all disjoint. Then the sets are disjoint in and have entropy , and furthermore any ball of radius can intersect at most one of the . We conclude that , so . If , then must intersect one of the , so , and the claim follows.

Lemma 5 (Ruzsa triangle inequality)Let be precompact non-empty subsets of , and let . Then .

*Proof:* Consider the addition map from to . The domain may be covered by product balls . Every element of has a preimage of this map which projects to a translate of , and thus must meet at least of these product balls. However, if two elements of are separated by a distance of at least , then no product ball can intersect both preimages. We thus see that , and the claim follows.

Below the fold we will record some further metric entropy analogues of sum set estimates (basically redoing much of Chapter 2 of my book with Van Vu). Unfortunately there does not seem to be a direct way to abstractly deduce metric entropy results from their sum set analogues (basically due to the failure of a certain strong version of Freiman’s theorem, as discussed in this previous post); nevertheless, the proofs of the discrete arguments are elementary enough that they can be modified with a small amount of effort to handle the entropy case. (In fact, there should be a very general model-theoretic framework in which both the discrete and entropy arguments can be processed in a unified manner; see this paper of Hrushovski for one such framework.)

It is also likely that many of the arguments here extend to the non-commutative setting, but for simplicity we will not pursue such generalisations here.

Let be a natural number. We consider the question of how many “almost orthogonal” unit vectors one can place in the Euclidean space . Of course, if we insist on being *exactly* orthogonal, so that for all distinct , then we can only pack at most unit vectors into this space. However, if one is willing to relax the orthogonality condition a little, so that is small rather than zero, then one can pack a lot more unit vectors into , due to the important fact that pairs of vectors in high dimensions are typically almost orthogonal to each other. For instance, if one chooses uniformly and independently at random on the unit sphere, then a standard computation (based on viewing the as gaussian vectors projected onto the unit sphere) shows that each inner product concentrates around the origin with standard deviation and with gaussian tails, and a simple application of the union bound then shows that for any fixed , one can pack unit vectors into whose inner products are all of size .

One can remove the logarithm by using some number theoretic constructions. For instance, if is twice a prime , one can identify with the space of complex-valued functions , whee is the field of elements, and if one then considers the different quadratic phases for , where is the standard character on , then a standard application of Gauss sum estimates reveals that these unit vectors in all have inner products of magnitude at most with each other. More generally, if we take and consider the different polynomial phases for , then an application of the Weil conjectures for curves, proven by Weil, shows that the inner products of the associated unit vectors with each other have magnitude at most .

As it turns out, this construction is close to optimal, in that there is a polynomial limit to how many unit vectors one can pack into with an inner product of :

Theorem 1 (Cheap Kabatjanskii-Levenstein bound)Let be unit vector in such that for some . Then we have for some absolute constant .

In particular, for fixed and large , the number of unit vectors one can pack in whose inner products all have magnitude at most will be . This doesn’t quite match the construction coming from the Weil conjectures, although it is worth noting that the upper bound of for the inner product is usually not sharp (the inner product is actually times the sum of unit phases which one expects (cf. the Sato-Tate conjecture) to be uniformly distributed on the unit circle, and so the typical inner product is actually closer to ).

Note that for , the case of the above theorem (or more precisely, Lemma 2 below) gives the bound , which is essentially optimal as the example of an orthonormal basis shows. For , the condition is trivially true from Cauchy-Schwarz, and can be arbitrariy large. Finally, in the range , we can use a volume packing argument: we have , so of we set , then the open balls of radius around each are disjoint, while all lying in a ball of radius , giving rise to the bound for some absolute constant .

As I learned recently from Philippe Michel, a more precise version of this theorem is due to Kabatjanskii and Levenstein, who studied the closely related problem of sphere packing (or more precisely, cap packing) in the unit sphere of . However, I found a short proof of the above theorem which relies on one of my favorite tricks – the tensor power trick – so I thought I would give it here.

We begin with an easy case, basically the case of the above theorem:

Lemma 2Let be unit vectors in such that for all distinct . Then .

*Proof:* Suppose for contradiction that . We consider the Gram matrix . This matrix is real symmetric with rank at most , thus if one subtracts off the identity matrix, it has an eigenvalue of with multiplicity at least . Taking Hilbert-Schmidt norms, we conclude that

But by hypothesis, the left-hand side is at most , giving the desired contradiction.

To amplify the above lemma to cover larger values of , we apply the tensor power trick. A direct application of the tensor power trick does not gain very much; however one can do a lot better by using the *symmetric* tensor power rather than the raw tensor power. This gives

Corollary 3Let be a natural number, and let be unit vectors in such that for all distinct . Then .

*Proof:* We work in the symmetric component of the tensor power , which has dimension . Applying the previous lemma to the tensor powers , we obtain the claim.

Using the trivial bound , we can lower bound

We can thus prove Theorem 1 by setting for some sufficiently large absolute constant .

## Recent Comments