
Here is a curious question posed to me by Apoorva Khare that I do not know the answer to. Let {F_2} be the free group on two generators {a,b}. Does there exist a metric {d} on this group which is

  • bi-invariant, thus {d(xg,yg)=d(gx,gy) = d(x,y)} for all {x,y,g \in F_2}; and
  • linear growth in the sense that {d(x^n,1) = n d(x,1)} for all {x \in F_2} and all natural numbers {n}?

By defining the “norm” of an element {x \in F_2} to be {\| x\| := d(x,1)}, an equivalent formulation of the problem asks if there exists a non-negative norm function {\| \|: F_2 \rightarrow {\bf R}^+} that obeys the conjugation invariance

\displaystyle  \| gxg^{-1} \| = \|x \| \ \ \ \ \ (1)

for all {x,g \in F_2}, the triangle inequality

\displaystyle  \| xy \| \leq \| x\| + \| y\| \ \ \ \ \ (2)

for all {x,y \in F_2}, and the linear growth

\displaystyle  \| x^n \| = |n| \|x\| \ \ \ \ \ (3)

for all {x \in F_2} and {n \in {\bf Z}}, and such that {\|x\| > 0} for all non-identity {x \in F_2}. Indeed, if such a norm exists then one can just take {d(x,y) := \| x y^{-1} \|} to give the desired metric.

One can normalise the norm of the generators to be at most {1}, thus

\displaystyle  \| a \|, \| b \| \leq 1.

This can then be used to upper bound the norm of other words in {F_2}. For instance, from (1), (3) one has

\displaystyle  \| aba^{-1} \|, \| b^{-1} a b \|, \| a^{-1} b^{-1} a \|, \| ba^{-1}b^{-1}\| \leq 1.

A bit less trivially, from (3), (2), (1) one can bound commutators as

\displaystyle  \| aba^{-1} b^{-1} \| = \frac{1}{3} \| (aba^{-1} b^{-1})^3 \|

\displaystyle  = \frac{1}{3} \| (aba^{-1}) (b^{-1} ab) (a^{-1} b^{-1} a) (b a^{-1} b^{-1}) \|

\displaystyle  \leq \frac{4}{3}.

In a similar spirit one has

\displaystyle  \| aba^{-2} b^{-1} \| = \frac{1}{2} \| (aba^{-2} b^{-1})^2 \|

\displaystyle  = \frac{1}{2} \| (aba^{-1}) (a^{-1} b^{-1} a) (ba^{-1} b^{-1}) (ba^{-1} b^{-1}) \|

\displaystyle  \leq 2.
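
(As a quick sanity check, the two word identities used in these bounds can be verified by freely reducing the products of conjugates; the few lines of Python below, with ad hoc helper names, are only a verification aid and not part of the argument.)

```python
# Sanity check of the two word identities used above, by freely reducing words
# in F_2 (letters: 'a', 'b', with 'A' = a^{-1}, 'B' = b^{-1}).  Helper names are ad hoc.

def reduce_word(w):
    """Freely reduce a word by cancelling adjacent inverse letters."""
    out = []
    for x in w:
        if out and out[-1] == x.swapcase():
            out.pop()
        else:
            out.append(x)
    return "".join(out)

# (aba^{-1}b^{-1})^3 = (aba^{-1}) (b^{-1}ab) (a^{-1}b^{-1}a) (ba^{-1}b^{-1})
assert reduce_word("abAB" * 3) == reduce_word("abA" + "Bab" + "ABa" + "bAB")

# (aba^{-2}b^{-1})^2 = (aba^{-1}) (a^{-1}b^{-1}a) (ba^{-1}b^{-1}) (ba^{-1}b^{-1})
assert reduce_word("abAAB" * 2) == reduce_word("abA" + "ABa" + "bAB" + "bAB")

print("both identities check out")
```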

What is not clear to me is if one can keep arguing like this to continually improve the upper bounds on the norm {\| g\|} of a given non-trivial group element {g} to the point where this norm must in fact vanish, which would demonstrate that no metric with the above properties on {F_2} would exist (and in fact would impose strong constraints on similar metrics existing on other groups as well). It is also tempting to use some ideas from geometric group theory (e.g. asymptotic cones) to try to understand these metrics further, though I wasn’t able to get very far with this approach. Anyway, this feels like a problem that might be somewhat receptive to a more crowdsourced attack, so I am posing it here in case any readers wish to try to make progress on it.

I’ve just uploaded to the arXiv my paper “On the universality of the incompressible Euler equation on compact manifolds“, submitted to Discrete and Continuous Dynamical Systems. This is a variant of my recent paper on the universality of potential well dynamics, but instead of trying to embed dynamical systems into a potential well {\partial_{tt} u = -\nabla V(u)}, here we try to embed dynamical systems into the incompressible Euler equations

\displaystyle  \partial_t u + \nabla_u u = - \mathrm{grad}_g p \ \ \ \ \ (1)

\displaystyle  \mathrm{div}_g u = 0

on a Riemannian manifold {(M,g)}. (One is particularly interested in the case of flat manifolds {M}, particularly {{\bf R}^3} or {({\bf R}/{\bf Z})^3}, but for the main result of this paper it is essential that one is permitted to consider curved manifolds.) This system, first studied by Ebin and Marsden, is the natural generalisation of the usual incompressible Euler equations to curved space; it can be viewed as the formal geodesic flow equation on the infinite-dimensional manifold of volume-preserving diffeomorphisms on {M} (see this previous post for a discussion of this in the flat space case).

The Euler equations can be viewed as a nonlinear equation in which the nonlinearity is a quadratic function of the velocity field {u}. It is thus natural to compare the Euler equations with quadratic ODE of the form

\displaystyle  \partial_t y = B(y,y) \ \ \ \ \ (2)

where {y: {\bf R} \rightarrow {\bf R}^n} is the unknown solution, and {B: {\bf R}^n \times {\bf R}^n \rightarrow {\bf R}^n} is a bilinear map, which we may assume without loss of generality to be symmetric. One can ask whether such an ODE may be linearly embedded into the Euler equations on some Riemannian manifold {(M,g)}, which means that there is an injective linear map {U: {\bf R}^n \rightarrow \Gamma(TM)} from {{\bf R}^n} to smooth vector fields on {M}, as well as a bilinear map {P: {\bf R}^n \times {\bf R}^n \rightarrow C^\infty(M)} to smooth scalar fields on {M}, such that the map {y \mapsto (U(y), P(y,y))} takes solutions to (2) to solutions to (1), or equivalently that

\displaystyle  U(B(y,y)) + \nabla_{U(y)} U(y) = - \mathrm{grad}_g P(y,y)

\displaystyle  \mathrm{div}_g U(y) = 0

for all {y \in {\bf R}^n}.

For simplicity let us restrict {M} to be compact. There is an obvious necessary condition for this embeddability to occur, which comes from the energy conservation law for the Euler equations; unpacking everything, this implies that the bilinear form {B} in (2) has to obey a cancellation condition

\displaystyle  \langle B(y,y), y \rangle = 0 \ \ \ \ \ (3)

for some positive definite inner product {\langle, \rangle: {\bf R}^n \times {\bf R}^n \rightarrow {\bf R}} on {{\bf R}^n}. The main result of the paper is the converse to this statement: if {B} is a symmetric bilinear form obeying a cancellation condition (3), then it is possible to embed the equations (2) into the Euler equations (1) on some Riemannian manifold {(M,g)}; the catch is that this manifold will depend on the form {B} and on the dimension {n} (in fact in the construction I have, {M} is given explicitly as {SO(n) \times ({\bf R}/{\bf Z})^{n+1}}, with a funny metric on it that depends on {B}).
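
As a toy illustration of the cancellation condition (3) (this example is mine and is not the construction used in the paper): on {{\bf R}^3} one can take {B(y,z) := \frac{1}{2}(y \times Mz + z \times My)} for a fixed {3 \times 3} matrix {M}, so that {B(y,y) = y \times (My)} is automatically orthogonal to {y} with respect to the standard inner product. A short numerical check:

```python
# Toy example of a symmetric bilinear form B on R^3 obeying the cancellation
# condition (3) with the standard inner product: B(y,z) = (y x Mz + z x My)/2,
# so that B(y,y) = y x (My) is always orthogonal to y.  This is only meant to
# illustrate condition (3); it is not the construction used in the paper.
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((3, 3))

def B(y, z):
    return 0.5 * (np.cross(y, M @ z) + np.cross(z, M @ y))

for _ in range(5):
    y = rng.standard_normal(3)
    print(abs(np.dot(B(y, y), y)))   # numerically zero
```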

As a consequence, any finite-dimensional portion of the usual “dyadic shell models” used as simplified toy models of the Euler equation can actually be embedded into a genuine Euler equation, albeit on a high-dimensional and curved manifold. This includes portions of the self-similar “machine” I used in a previous paper to establish finite time blowup for an averaged version of the Navier-Stokes (or Euler) equations. Unfortunately, the result in this paper does not apply to infinite-dimensional ODE, so I cannot yet establish finite time blowup for the Euler equations on a (well-chosen) manifold. It does not seem so far beyond the realm of possibility, though, that this could be done in the relatively near future. In particular, the result here suggests that one could construct something resembling a universal Turing machine within an Euler flow on a manifold, which was one ingredient I would need to engineer such a finite time blowup.

The proof of the main theorem proceeds by an “elimination of variables” strategy that was used in some of my previous papers in this area, though in this particular case the Nash embedding theorem (or variants thereof) are not required. The first step is to lessen the dependence on the metric {g} by partially reformulating the Euler equations (1) in terms of the covelocity {g \cdot u} (which is a {1}-form) instead of the velocity {u}. Using the freedom to modify the dimension of the underlying manifold {M}, one can also decouple the metric {g} from the volume form that is used to obtain the divergence-free condition. At this point the metric can be eliminated, with a certain positive definiteness condition between the velocity and covelocity taking its place. After a substantial amount of trial and error (motivated by some “two-and-a-half-dimensional” reductions of the three-dimensional Euler equations, and also by playing around with a number of variants of the classic “separation of variables” strategy), I eventually found an ansatz for the velocity and covelocity that automatically solved most of the components of the Euler equations (as well as most of the positive definiteness requirements), as long as one could find a number of scalar fields that obeyed a certain nonlinear system of transport equations, and also obeyed a positive definiteness condition. Here I was stuck for a bit because the system I ended up with was overdetermined – more equations than unknowns. After trying a number of special cases I eventually found a solution to the transport system on the sphere, except that the scalar functions sometimes degenerated and so the positive definiteness property I wanted was only obeyed with positive semi-definiteness. I tried for some time to perturb this example into a strictly positive definite solution before eventually working out that this was not possible. Finally I had the brainwave to lift the solution from the sphere to an even more symmetric space, and this quickly led to the final solution of the problem, using the special orthogonal group rather than the sphere as the underlying domain. The solution ended up being rather simple in form, but it is still somewhat miraculous to me that it exists at all; in retrospect, given the overdetermined nature of the problem, relying on a large amount of symmetry to cut down the number of equations was basically the only hope.

Throughout this post we shall always work in the smooth category, thus all manifolds, maps, coordinate charts, and functions are assumed to be smooth unless explicitly stated otherwise.

A (real) manifold {M} can be defined in at least two ways. On one hand, one can define the manifold extrinsically, as a subset of some standard space such as a Euclidean space {{\bf R}^d}. On the other hand, one can define the manifold intrinsically, as a topological space equipped with an atlas of coordinate charts. The fundamental embedding theorems show that, under reasonable assumptions, the intrinsic and extrinsic approaches give the same classes of manifolds (up to isomorphism in various categories). For instance, we have the following (special case of) the Whitney embedding theorem:

Theorem 1 (Whitney embedding theorem) Let {M} be a compact manifold. Then there exists an embedding {u: M \rightarrow {\bf R}^d} from {M} to a Euclidean space {{\bf R}^d}.

In fact, if {M} is {n}-dimensional, one can take {d} to equal {2n}, which is often best possible (easy examples include the circle {{\bf R}/{\bf Z}} which embeds into {{\bf R}^2} but not {{\bf R}^1}, or the Klein bottle that embeds into {{\bf R}^4} but not {{\bf R}^3}). One can also relax the compactness hypothesis on {M} to second countability, but we will not pursue this extension here. We give a “cheap” proof of this theorem below the fold which allows one to take {d} equal to {2n+1}.

A significant strengthening of the Whitney embedding theorem is (a special case of) the Nash embedding theorem:

Theorem 2 (Nash embedding theorem) Let {(M,g)} be a compact Riemannian manifold. Then there exists an isometric embedding {u: M \rightarrow {\bf R}^d} from {M} to a Euclidean space {{\bf R}^d}.

In order to obtain the isometric embedding, the dimension {d} has to be a bit larger than what is needed for the Whitney embedding theorem; in this article of Gunther the bound

\displaystyle  d = \max( 	n(n+5)/2, n(n+3)/2 + 5) \ \ \ \ \ (1)

is attained, which I believe is still the record for large {n}. (In the converse direction, one cannot do better than {d = \frac{n(n+1)}{2}}, basically because this is the number of degrees of freedom in the Riemannian metric {g}.) Nash’s original proof of the theorem used what is now known as the Nash-Moser inverse function theorem, but a subsequent simplification of Gunther allowed one to proceed using just the ordinary inverse function theorem (in Banach spaces).

I recently had the need to invoke the Nash embedding theorem to establish a blowup result for a nonlinear wave equation, which motivated me to go through the proof of the theorem more carefully. Below the fold I give a proof of the theorem that does not attempt to give an optimal value of {d}, but which hopefully isolates the main ideas of the argument (as simplified by Gunther). One advantage of not optimising in {d} is that it allows one to freely exploit the very useful tool of pairing together two maps {u_1: M \rightarrow {\bf R}^{d_1}}, {u_2: M \rightarrow {\bf R}^{d_2}} to form a combined map {(u_1,u_2): M \rightarrow {\bf R}^{d_1+d_2}} that can be closer to an embedding or an isometric embedding than the original maps {u_1,u_2}. This lets one perform a “divide and conquer” strategy in which one first starts with the simpler problem of constructing some “partial” embeddings of {M} and then pairs them together to form a “better” embedding.
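
(A toy numerical illustration of the pairing trick: on the circle, neither coordinate function {\cos\theta} nor {\sin\theta} is injective on its own, but the pair {(\cos\theta,\sin\theta)} already embeds the circle into {{\bf R}^2}. The snippet below just checks this on a discretisation and is purely illustrative.)

```python
# Toy illustration of the pairing trick on the circle: neither coordinate map
# theta -> cos(theta) nor theta -> sin(theta) is injective on [0, 2*pi), but the
# paired map theta -> (cos(theta), sin(theta)) is injective (it embeds the circle).
import numpy as np

theta = np.linspace(0, 2 * np.pi, 1000, endpoint=False)
u1, u2 = np.round(np.cos(theta), 8), np.round(np.sin(theta), 8)

# each coordinate alone takes only about 500 distinct values; the pair takes all 1000
print(len(set(u1)), len(set(u2)), len(set(zip(u1, u2))))
```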

In preparing these notes, I found the articles of Deane Yang and of Siyuan Lu to be helpful.

Read the rest of this entry »

The wave equation is usually expressed in the form

\displaystyle  \partial_{tt} u - \Delta u = 0

where {u \colon {\bf R} \times {\bf R}^d \rightarrow {\bf C}} is a function of both time {t \in {\bf R}} and space {x \in {\bf R}^d}, with {\Delta} being the Laplacian operator. One can generalise this equation in a number of ways, for instance by replacing the spatial domain {{\bf R}^d} with some other manifold and replacing the Laplacian {\Delta} with the Laplace-Beltrami operator or adding lower order terms (such as a potential, or a coupling with a magnetic field). But for sake of discussion let us work with the classical wave equation on {{\bf R}^d}. We will work formally in this post, being unconcerned with issues of convergence, justifying interchange of integrals, derivatives, or limits, etc.. One then has a conserved energy

\displaystyle  \int_{{\bf R}^d} \frac{1}{2} |\nabla u(t,x)|^2 + \frac{1}{2} |\partial_t u(t,x)|^2\ dx

which we can rewrite using integration by parts and the {L^2} inner product {\langle, \rangle} on {{\bf R}^d} as

\displaystyle  \frac{1}{2} \langle -\Delta u(t), u(t) \rangle + \frac{1}{2} \langle \partial_t u(t), \partial_t u(t) \rangle.

A key feature of the wave equation is finite speed of propagation: if, at time {t=0} (say), the initial position {u(0)} and initial velocity {\partial_t u(0)} are both supported in a ball {B(x_0,R) := \{ x \in {\bf R}^d: |x-x_0| \leq R \}}, then at any later time {t>0}, the position {u(t)} and velocity {\partial_t u(t)} are supported in the larger ball {B(x_0,R+t)}. This can be seen for instance (formally, at least) by inspecting the exterior energy

\displaystyle  \int_{|x-x_0| > R+t} \frac{1}{2} |\nabla u(t,x)|^2 + \frac{1}{2} |\partial_t u(t,x)|^2\ dx

and observing (after some integration by parts and differentiation under the integral sign) that it is non-increasing in time, non-negative, and vanishing at time {t=0}.

The wave equation is second order in time, but one can turn it into a first order system by working with the pair {(u(t),v(t))} rather than just the single field {u(t)}, where {v(t) := \partial_t u(t)} is the velocity field. The system is then

\displaystyle  \partial_t u(t) = v(t)

\displaystyle  \partial_t v(t) = \Delta u(t)

and the conserved energy is now

\displaystyle  \frac{1}{2} \langle -\Delta u(t), u(t) \rangle + \frac{1}{2} \langle v(t), v(t) \rangle. \ \ \ \ \ (1)

Finite speed of propagation then tells us that if {u(0),v(0)} are both supported on {B(x_0,R)}, then {u(t),v(t)} are supported on {B(x_0,R+t)} for all {t>0}. One also has time reversal symmetry: if {t \mapsto (u(t),v(t))} is a solution, then {t \mapsto (u(-t), -v(-t))} is a solution also, thus for instance one can establish an analogue of finite speed of propagation for negative times {t<0} using this symmetry.

If one has an eigenfunction

\displaystyle  -\Delta \phi = \lambda^2 \phi

of the Laplacian, then we have the explicit solutions

\displaystyle  u(t) = e^{\pm it \lambda} \phi

\displaystyle  v(t) = \pm i \lambda e^{\pm it \lambda} \phi

of the wave equation, which formally can be used to construct all other solutions via the principle of superposition.

When one has vanishing initial velocity {v(0)=0}, the solution {u(t)} is given via functional calculus by

\displaystyle  u(t) = \cos(t \sqrt{-\Delta}) u(0)

and the propagator {\cos(t \sqrt{-\Delta})} can be expressed as the average of half-wave operators:

\displaystyle  \cos(t \sqrt{-\Delta}) = \frac{1}{2} ( e^{it\sqrt{-\Delta}} + e^{-it\sqrt{-\Delta}} ).

One can view {\cos(t \sqrt{-\Delta} )} as a minor of the full wave propagator

\displaystyle  U(t) := \exp \begin{pmatrix} 0 & t \\ t\Delta & 0 \end{pmatrix}

\displaystyle  = \begin{pmatrix} \cos(t \sqrt{-\Delta}) & \frac{\sin(t\sqrt{-\Delta})}{\sqrt{-\Delta}} \\ -\sin(t\sqrt{-\Delta}) \sqrt{-\Delta} & \cos(t \sqrt{-\Delta} ) \end{pmatrix}

which is unitary with respect to the energy form (1), and is the fundamental solution to the wave equation in the sense that

\displaystyle  \begin{pmatrix} u(t) \\ v(t) \end{pmatrix} = U(t) \begin{pmatrix} u(0) \\ v(0) \end{pmatrix}. \ \ \ \ \ (2)

Viewing the contraction {\cos(t\sqrt{-\Delta})} as a minor of a unitary operator is an instance of the “dilation trick“.
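
(For a single Fourier mode, in which {-\Delta} acts by multiplication by {\lambda^2}, the displayed formula for {U(t)} reduces to a {2 \times 2} matrix identity that can be checked numerically; the sketch below uses scipy's matrix exponential as a stand-in for the operator exponential and is only a sanity check.)

```python
# Sanity check of the displayed formula for U(t) on a single Fourier mode,
# where -Delta acts by multiplication by lam**2 (so Delta acts by -lam**2).
import numpy as np
from scipy.linalg import expm

lam, t = 1.7, 0.9
Delta = -lam**2

U = expm(np.array([[0.0, t], [t * Delta, 0.0]]))
expected = np.array([
    [np.cos(t * lam),         np.sin(t * lam) / lam],
    [-lam * np.sin(t * lam),  np.cos(t * lam)],
])
print(np.allclose(U, expected))   # True
```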

It turns out (as I learned from Yuval Peres) that there is a useful discrete analogue of the wave equation (and of all of the above facts), in which the time variable {t} now lives on the integers {{\bf Z}} rather than on {{\bf R}}, and the spatial domain can be replaced by discrete domains also (such as graphs). Formally, the system is now of the form

\displaystyle  u(t+1) = P u(t) + v(t) \ \ \ \ \ (3)

\displaystyle  v(t+1) = P v(t) - (1-P^2) u(t)

where {t} is now an integer, {u(t), v(t)} take values in some Hilbert space (e.g. {\ell^2} functions on a graph {G}), and {P} is some operator on that Hilbert space (which in applications will usually be a self-adjoint contraction). To connect this with the classical wave equation, let us first consider a rescaling of this system

\displaystyle  u(t+\varepsilon) = P_\varepsilon u(t) + \varepsilon v(t)

\displaystyle  v(t+\varepsilon) = P_\varepsilon v(t) - \frac{1}{\varepsilon} (1-P_\varepsilon^2) u(t)

where {\varepsilon>0} is a small parameter (representing the discretised time step), {t} now takes values in the integer multiples {\varepsilon {\bf Z}} of {\varepsilon}, and {P_\varepsilon} is the wave propagator operator {P_\varepsilon := \cos( \varepsilon \sqrt{-\Delta} )} or the heat propagator {P_\varepsilon := \exp( \varepsilon^2 \Delta/2 )} (the two operators are different, but agree to fourth order in {\varepsilon}). One can then formally verify that the wave equation emerges from this rescaled system in the limit {\varepsilon \rightarrow 0}. (Thus, {P} is not exactly the direct analogue of the Laplacian {\Delta}, but can be viewed as something like {P_\varepsilon = 1 + \frac{\varepsilon^2}{2} \Delta + O( \varepsilon^4 )} in the case of small {\varepsilon}, or {P = 1 + \frac{1}{2}\Delta + O(\Delta^2)} if we are not rescaling to the small {\varepsilon} case. The operator {P} is sometimes known as the diffusion operator.)

Assuming {P} is self-adjoint, solutions to the system (3) formally conserve the energy

\displaystyle  \frac{1}{2} \langle (1-P^2) u(t), u(t) \rangle + \frac{1}{2} \langle v(t), v(t) \rangle. \ \ \ \ \ (4)

This energy is positive semi-definite if {P} is a contraction. We have the same time reversal symmetry as before: if {t \mapsto (u(t),v(t))} solves the system (3), then so does {t \mapsto (u(-t), -v(-t))}. If one has an eigenfunction

\displaystyle  P \phi = \cos(\lambda) \phi

to the operator {P}, then one has an explicit solution

\displaystyle  u(t) = e^{\pm it \lambda} \phi

\displaystyle  v(t) = \pm i \sin(\lambda) e^{\pm it \lambda} \phi

to (3), and (in principle at least) this generates all other solutions via the principle of superposition.

Finite speed of propagation is a lot easier in the discrete setting, though one has to offset the support of the “velocity” field {v} by one unit. Suppose we know that {P} has unit speed in the sense that whenever {f} is supported in a ball {B(x,R)}, then {Pf} is supported in the ball {B(x,R+1)}. Then an easy induction shows that if {u(0), v(0)} are supported in {B(x_0,R), B(x_0,R+1)} respectively, then {u(t), v(t)} are supported in {B(x_0,R+t), B(x_0, R+t+1)}.

The fundamental solution {U(t) = U^t} to the discretised wave equation (3), in the sense of (2), is given by the formula

\displaystyle  U(t) = U^t = \begin{pmatrix} P & 1 \\ P^2-1 & P \end{pmatrix}^t

\displaystyle  = \begin{pmatrix} T_t(P) & U_{t-1}(P) \\ (P^2-1) U_{t-1}(P) & T_t(P) \end{pmatrix}

where {T_t} and {U_t} are the Chebyshev polynomials of the first and second kind, thus

\displaystyle  T_t( \cos \theta ) = \cos(t\theta)

and

\displaystyle  U_t( \cos \theta ) = \frac{\sin((t+1)\theta)}{\sin \theta}.

In particular, {P} is now a minor of {U(1) = U}, and can also be viewed as an average of {U} with its inverse {U^{-1}}:

\displaystyle  \begin{pmatrix} P & 0 \\ 0 & P \end{pmatrix} = \frac{1}{2} (U + U^{-1}). \ \ \ \ \ (5)

As before, {U} is unitary with respect to the energy form (4), so this is another instance of the dilation trick in action. The powers {P^n} and {U^n} are discrete analogues of the heat propagators {e^{t\Delta/2}} and wave propagators {U(t)} respectively.
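
(Again one can sanity-check the Chebyshev formula for {U^t} on a single mode, where {P} acts as the scalar {\cos\theta}; the short computation below is only a verification aid.)

```python
# Sanity check of the Chebyshev formula for U^t on a single mode, where P acts
# as the scalar cos(theta); here T_t(cos theta) = cos(t*theta) and
# U_{t-1}(cos theta) = sin(t*theta)/sin(theta).
import numpy as np

theta, t = 0.7, 6
P = np.cos(theta)

Ut = np.linalg.matrix_power(np.array([[P, 1.0], [P**2 - 1.0, P]]), t)
T_t = np.cos(t * theta)
U_tm1 = np.sin(t * theta) / np.sin(theta)
expected = np.array([[T_t, U_tm1], [(P**2 - 1.0) * U_tm1, T_t]])
print(np.allclose(Ut, expected))   # True
```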

One nice application of all this formalism, which I learned from Yuval Peres, is the Varopoulos-Carne inequality:

Theorem 1 (Varopoulos-Carne inequality) Let {G} be a (possibly infinite) regular graph, let {n \geq 1}, and let {x, y} be vertices in {G}. Then the probability that the simple random walk at {x} lands at {y} at time {n} is at most {2 \exp( - d(x,y)^2 / 2n )}, where {d} is the graph distance.

This general inequality is quite sharp, as one can see using the standard Cayley graph on the integers {{\bf Z}}. Very roughly speaking, it asserts that on a regular graph of reasonably controlled growth (e.g. polynomial growth), random walks of length {n} concentrate on the ball of radius {O(\sqrt{n})} or so centred at the origin of the random walk.

Proof: Let {P \colon \ell^2(G) \rightarrow \ell^2(G)} be the diffusion (averaging) operator of the simple random walk on {G}, thus

\displaystyle  Pf(x) = \frac{1}{D} \sum_{y \sim x} f(y)

for any {f \in \ell^2(G)}, where {D} is the degree of the regular graph and the sum is over the {D} vertices {y} that are adjacent to {x}. This is a self-adjoint contraction of unit speed, and the probability that the random walk at {x} lands at {y} at time {n} is

\displaystyle  \langle P^n \delta_x, \delta_y \rangle

where {\delta_x, \delta_y} are the Dirac deltas at {x,y}. Using (5), we can rewrite this as

\displaystyle  \langle (\frac{1}{2} (U + U^{-1}))^n \begin{pmatrix} 0 \\ \delta_x\end{pmatrix}, \begin{pmatrix} 0 \\ \delta_y\end{pmatrix} \rangle

where we are now using the energy form (4). We can write

\displaystyle  (\frac{1}{2} (U + U^{-1}))^n = {\bf E} U^{S_n}

where {S_n} is the simple random walk of length {n} on the integers, that is to say {S_n = \xi_1 + \dots + \xi_n} where {\xi_1,\dots,\xi_n = \pm 1} are independent uniform Bernoulli signs. Thus we wish to show that

\displaystyle  {\bf E} \langle U^{S_n} \begin{pmatrix} 0 \\ \delta_x\end{pmatrix}, \begin{pmatrix} 0 \\ \delta_y\end{pmatrix} \rangle \leq 2 \exp(-d(x,y)^2 / 2n ).

By finite speed of propagation, the inner product here vanishes if {|S_n| < d(x,y)}. For {|S_n| \geq d(x,y)} we can use Cauchy-Schwarz and the unitary nature of {U} to bound the inner product by {1}. Thus the left-hand side may be upper bounded by

\displaystyle  {\bf P}( |S_n| \geq d(x,y) )

and the claim now follows from the Chernoff inequality. \Box
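
For a concrete feel for the inequality (this is just an illustration, not part of the proof), one can compare both sides on the standard Cayley graph of {{\bf Z}}, where the transition probabilities of the simple random walk are binomial; a minimal sketch:

```python
# Illustration of Varopoulos-Carne on the standard Cayley graph of Z: the chance
# that the simple random walk goes from 0 to y in n steps is C(n, (n+y)/2)/2^n
# (when n+y is even), which should be at most 2*exp(-y^2/(2n)).
import math

n = 100
for y in range(-n, n + 1):
    if (n + y) % 2 == 0:
        p = math.comb(n, (n + y) // 2) / 2**n
        assert p <= 2 * math.exp(-y**2 / (2 * n)) + 1e-12
print("bound verified for n =", n)
```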

This inequality has many applications, particularly with regards to relating the entropy, mixing time, and concentration of random walks with volume growth of balls; see this text of Lyons and Peres for some examples.

For sake of comparison, here is a continuous counterpart to the Varopoulos-Carne inequality:

Theorem 2 (Continuous Varopoulos-Carne inequality) Let {t > 0}, and let {f,g \in L^2({\bf R}^d)} be supported on compact sets {F,G} respectively. Then

\displaystyle  |\langle e^{t\Delta/2} f, g \rangle| \leq \sqrt{\frac{2t}{\pi d(F,G)^2}} \exp( - d(F,G)^2 / 2t ) \|f\|_{L^2} \|g\|_{L^2}

where {d(F,G)} is the Euclidean distance between {F} and {G}.

Proof: By Fourier inversion one has

\displaystyle  e^{-t\xi^2/2} = \frac{1}{\sqrt{2\pi t}} \int_{\bf R} e^{-s^2/2t} e^{is\xi}\ ds

\displaystyle  = \sqrt{\frac{2}{\pi t}} \int_0^\infty e^{-s^2/2t} \cos(s \xi )\ ds

for any real {\xi}, and thus

\displaystyle  \langle e^{t\Delta/2} f, g\rangle = \sqrt{\frac{2}{\pi t}} \int_0^\infty e^{-s^2/2t} \langle \cos(s \sqrt{-\Delta} ) f, g \rangle\ ds.

By finite speed of propagation, the inner product {\langle \cos(s \sqrt{-\Delta} ) f, g \rangle} vanishes when {s < d(F,G)}; otherwise, we can use Cauchy-Schwarz and the contractive nature of {\cos(s \sqrt{-\Delta} )} to bound this inner product by {\|f\|_{L^2} \|g\|_{L^2}}. Thus

\displaystyle  |\langle e^{t\Delta/2} f, g\rangle| \leq \sqrt{\frac{2}{\pi t}} \|f\|_{L^2} \|g\|_{L^2} \int_{d(F,G)}^\infty e^{-s^2/2t}\ ds.

Bounding {e^{-s^2/2t}} by {e^{-d(F,G)^2/2t} e^{-d(F,G) (s-d(F,G))/t}}, we obtain the claim. \Box

Observe that the argument is quite general and can be applied for instance to other Riemannian manifolds than {{\bf R}^d}.
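
(Here is a quick numerical illustration of Theorem 2 in one dimension, taking {f,g} to be indicators of two unit intervals and approximating {\langle e^{t\Delta/2} f, g\rangle} by a Riemann sum against the Gaussian heat kernel; the parameters and grid sizes below are arbitrary and the computation is only a sanity check.)

```python
# Numerical illustration of Theorem 2 on R (d = 1) with f = 1_[0,1], g = 1_[3,4],
# so that d(F,G) = 2 and ||f|| = ||g|| = 1; the pairing <exp(t*Delta/2) f, g> is a
# double integral of the heat kernel, approximated here by a Riemann sum.
import numpy as np

t, d = 0.5, 2.0
xs = np.linspace(0.0, 1.0, 400)
ys = np.linspace(3.0, 4.0, 400)
dx, dy = xs[1] - xs[0], ys[1] - ys[0]

kernel = np.exp(-(xs[:, None] - ys[None, :])**2 / (2 * t)) / np.sqrt(2 * np.pi * t)
inner = kernel.sum() * dx * dy

bound = np.sqrt(2 * t / (np.pi * d**2)) * np.exp(-d**2 / (2 * t))
print(inner, "<=", bound, ":", inner <= bound)
```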

At the core of the subject now known as arithmetic combinatorics (and particularly the subfield of additive combinatorics) are the elementary sum set estimates (sometimes known as “Ruzsa calculus”) that relate the cardinalities of various sum sets

\displaystyle  A+B := \{ a+b: a \in A, b \in B \}

and difference sets

\displaystyle  A-B := \{ a-b: a \in A, b \in B \},

as well as iterated sumsets such as {3A=A+A+A}, {2A-2A=A+A-A-A}, and so forth. Here, {A, B} are finite non-empty subsets of some additive group {G = (G,+)} (classically one took {G={\bf Z}} or {G={\bf R}}, but nowadays one usually considers more general additive groups). Some basic estimates in this vein are the following:

Lemma 1 (Ruzsa covering lemma) Let {A, B} be finite non-empty subsets of {G}. Then {A} may be covered by at most {\frac{|A+B|}{|B|}} translates of {B-B}.

Proof: Consider a maximal set of disjoint translates {a+B} of {B} by elements {a \in A}. These translates have cardinality {|B|}, are disjoint, and lie in {A+B}, so there are at most {\frac{|A+B|}{|B|}} of them. By maximality, for any {a' \in A}, {a'+B} must intersect at least one of the selected {a+B}, thus {a' \in a+B-B}, and the claim follows. \Box

Lemma 2 (Ruzsa triangle inequality) Let {A,B,C} be finite non-empty subsets of {G}. Then {|A-C| \leq \frac{|A-B| |B-C|}{|B|}}.

Proof: Consider the addition map {+: (x,y) \mapsto x+y} from {(A-B) \times (B-C)} to {G}. Every element {a-c} of {A - C} has a preimage {\{ (x,y) \in (A-B) \times (B-C): x+y = a-c\}} under this map of cardinality at least {|B|}, thanks to the obvious identity {a-c = (a-b) + (b-c)} for each {b \in B}. Since {(A-B) \times (B-C)} has cardinality {|A-B| |B-C|}, the claim follows. \Box
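
Both lemmas are easy to test by brute force on random subsets of a cyclic group; the snippet below (with ad hoc helper names) checks the triangle inequality of Lemma 2 directly and also runs the greedy covering argument from the proof of Lemma 1.

```python
# Brute-force check of the two Ruzsa lemmas for random subsets of Z/p.
# All helper names here are ad hoc.
import random

p = 101
rng = random.Random(0)
A, B, C = (set(rng.sample(range(p), k)) for k in (8, 6, 10))

def sumset(X, Y):
    return {(x + y) % p for x in X for y in Y}

def diffset(X, Y):
    return {(x - y) % p for x in X for y in Y}

# Triangle inequality: |A-C| |B| <= |A-B| |B-C|
assert len(diffset(A, C)) * len(B) <= len(diffset(A, B)) * len(diffset(B, C))

# Covering lemma: a maximal disjoint family {a+B : a in chosen} yields a cover of A
# by the translates a + (B-B), and there are at most |A+B|/|B| of them.
chosen = []
for a in A:
    if all(not (sumset({a}, B) & sumset({c}, B)) for c in chosen):
        chosen.append(a)
assert len(chosen) * len(B) <= len(sumset(A, B))
assert A <= set().union(*(sumset({c}, diffset(B, B)) for c in chosen))
print("translates of B-B used:", len(chosen))
```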

Such estimates (which are covered, incidentally, in Section 2 of my book with Van Vu) are particularly useful for controlling finite sets {A} of small doubling, in the sense that {|A+A| \leq K|A|} for some bounded {K}. (There are deeper theorems, most notably Freiman’s theorem, which give more control than what elementary Ruzsa calculus does; however, the known bounds in the latter theorem are worse than polynomial in {K} (although it is conjectured otherwise), whereas the elementary estimates are almost all polynomial in {K}.)

However, there are some settings in which the standard sum set estimates are not quite applicable. One such setting is the continuous setting, where one is dealing with bounded open sets in an additive Lie group (e.g. {{\bf R}^n} or a torus {({\bf R}/{\bf Z})^n}) rather than a finite setting. Here, one can largely replicate the discrete sum set estimates by working with a Haar measure in place of cardinality; this is the approach taken for instance in this paper of mine. However, there is another setting, which one might dub the “discretised” setting (as opposed to the “discrete” setting or “continuous” setting), in which the sets {A} remain finite (or at least discretisable to be finite), but for which there is a certain amount of “roundoff error” coming from the discretisation. As a typical example (working now in a non-commutative multiplicative setting rather than an additive one), consider the orthogonal group {O_n({\bf R})} of orthogonal {n \times n} matrices, and let {A} be the set of matrices obtained by starting with all of the orthogonal matrices in {O_n({\bf R})} and rounding each coefficient of each matrix in this set to the nearest multiple of {\epsilon}, for some small {\epsilon>0}. This forms a finite set (whose cardinality grows as {\epsilon\rightarrow 0} like a certain negative power of {\epsilon}). In the limit {\epsilon \rightarrow 0}, the set {A} is not a set of small doubling in the discrete sense. However, {A \cdot A} is still close to {A} in a metric sense, being contained in the {O_n(\epsilon)}-neighbourhood of {A}. Another key example comes from graphs {\Gamma := \{ (x, f(x)): x \in G \}} of maps {f: G \rightarrow H} from one additive group {G = (G,+)} to another {H = (H,+)}. If {f} is “approximately additive” in the sense that for all {x,y \in G}, {f(x+y)} is close to {f(x)+f(y)} in some metric, then {\Gamma} might not have small doubling in the discrete sense (because {f(x+y)-f(x)-f(y)} could take a large number of values), but could be considered a set of small doubling in a discretised sense.

One would like to have a sum set (or product set) theory that can handle these cases, particularly in “high-dimensional” settings in which the standard methods of passing back and forth between continuous, discrete, or discretised settings behave poorly from a quantitative point of view due to the exponentially large doubling constant of balls. One way to do this is to impose a translation invariant metric {d} on the underlying group {G = (G,+)} (reverting back to additive notation), and replace the notion of cardinality by that of metric entropy. There are a number of almost equivalent ways to define this concept:

Definition 3 Let {(X,d)} be a metric space, let {E} be a subset of {X}, and let {r>0} be a radius.

  • The packing number {N^{pack}_r(E)} is the largest number of points {x_1,\dots,x_n} one can pack inside {E} such that the balls {B(x_1,r),\dots,B(x_n,r)} are disjoint.
  • The internal covering number {N^{int}_r(E)} is the fewest number of points {x_1,\dots,x_n \in E} such that the balls {B(x_1,r),\dots,B(x_n,r)} cover {E}.
  • The external covering number {N^{ext}_r(E)} is the fewest number of points {x_1,\dots,x_n \in X} such that the balls {B(x_1,r),\dots,B(x_n,r)} cover {E}.
  • The metric entropy {N^{ent}_r(E)} is the largest number of points {x_1,\dots,x_n} one can find in {E} that are {r}-separated, thus {d(x_i,x_j) \geq r} for all {i \neq j}.

It is an easy exercise to verify the inequalities

\displaystyle  N^{ent}_{2r}(E) \leq N^{pack}_r(E) \leq N^{ext}_r(E) \leq N^{int}_r(E) \leq N^{ent}_r(E)

for any {r>0}, and that {N^*_r(E)} is non-increasing in {r} and non-decreasing in {E} for the three choices {* = pack,ext,ent} (but monotonicity in {E} can fail for {*=int}!). It turns out that the external covering number {N^{ext}_r(E)} is slightly more convenient than the other notions of metric entropy, so we will abbreviate {N_r(E) = N^{ext}_r(E)}. The cardinality {|E|} can be viewed as the limit of the entropies {N^*_r(E)} as {r \rightarrow 0}.
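
For a small finite set in the plane one can compute several of these quantities exactly by brute force and verify the chain of inequalities; the sketch below does this for the packing, internal covering, and separation (entropy) numbers, omitting the external covering number since it requires optimising over centres in all of {X}. This is only meant to build intuition for Definition 3.

```python
# Brute-force computation of three of the quantities in Definition 3 for a small
# random set E in the plane, checking N^{ent}_{2r} <= N^{pack}_r <= N^{int}_r <= N^{ent}_r.
# (The external covering number, which sits between the packing and internal covering
# numbers, would require optimising over centres in all of X and is omitted here.)
import itertools
import numpy as np

rng = np.random.default_rng(1)
E = rng.uniform(0, 4, size=(10, 2))
r = 1.0
dist = np.linalg.norm(E[:, None, :] - E[None, :, :], axis=-1)
idx = range(len(E))

def ent(rad):       # largest rad-separated subset of E
    for k in range(len(E), 0, -1):
        for S in itertools.combinations(idx, k):
            if all(dist[i, j] >= rad for i, j in itertools.combinations(S, 2)):
                return k
    return 0

def pack(rad):      # most points of E with disjoint closed rad-balls (pairwise distance > 2*rad)
    for k in range(len(E), 0, -1):
        for S in itertools.combinations(idx, k):
            if all(dist[i, j] > 2 * rad for i, j in itertools.combinations(S, 2)):
                return k
    return 0

def internal(rad):  # fewest centres in E whose rad-balls cover E
    for k in range(1, len(E) + 1):
        for S in itertools.combinations(idx, k):
            if all(min(dist[i, j] for j in S) <= rad for i in idx):
                return k
    return len(E)

print(ent(2 * r), "<=", pack(r), "<=", internal(r), "<=", ent(r))
```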

If we have the bounded doubling property that {B(0,2r)} is covered by {O(1)} translates of {B(0,r)} for each {r>0}, and one has a Haar measure {m} on {G} which assigns a positive finite mass to each ball, then any of the above entropies {N^*_r(E)} is comparable to {m( E + B(0,r) ) / m(B(0,r))}, as can be seen by simple volume packing arguments. Thus in the bounded doubling setting one can usually use the measure-theoretic sum set theory to derive entropy-theoretic sumset bounds (see e.g. this paper of mine for an example of this). However, it turns out that even in the absence of bounded doubling, one still has an entropy analogue of most of the elementary sum set theory, except that one has to accept some degradation in the radius parameter {r} by some absolute constant. Such losses can be acceptable in applications in which the underlying sets {A} are largely “transverse” to the balls {B(0,r)}, so that the {N_r}-entropy of {A} is largely independent of {r}; this is a situation which arises in particular in the case of graphs {\Gamma = \{ (x,f(x)): x \in G \}} discussed above, if one works with “vertical” metrics whose balls extend primarily in the vertical direction. (I hope to present a specific application of this type here in the near future.)

Henceforth we work in an additive group {G} equipped with a translation-invariant metric {d}. (One can also generalise things slightly by allowing the metric to attain the values {0} or {+\infty}, without changing much of the analysis below.) By the Heine-Borel theorem, any precompact set {E} will have finite entropy {N_r(E)} for any {r>0}. We now have analogues of the two basic Ruzsa lemmas above:

Lemma 4 (Ruzsa covering lemma) Let {A, B} be precompact non-empty subsets of {G}, and let {r>0}. Then {A} may be covered by at most {\frac{N_r(A+B)}{N_r(B)}} translates of {B-B+B(0,2r)}.

Proof: Let {a_1,\dots,a_n \in A} be a maximal set of points such that the sets {a_i + B + B(0,r)} are all disjoint. Then the sets {a_i+B} are disjoint in {A+B} and have entropy {N_r(a_i+B)=N_r(B)}, and furthermore any ball of radius {r} can intersect at most one of the {a_i+B}. We conclude that {N_r(A+B) \geq n N_r(B)}, so {n \leq \frac{N_r(A+B)}{N_r(B)}}. If {a \in A}, then {a+B+B(0,r)} must intersect one of the {a_i + B + B(0,r)}, so {a \in a_i + B-B + B(0,2r)}, and the claim follows. \Box

Lemma 5 (Ruzsa triangle inequality) Let {A,B,C} be precompact non-empty subsets of {G}, and let {r>0}. Then {N_{4r}(A-C) \leq \frac{N_r(A-B) N_r(B-C)}{N_r(B)}}.

Proof: Consider the addition map {+: (x,y) \mapsto x+y} from {(A-B) \times (B-C)} to {G}. The domain {(A-B) \times (B-C)} may be covered by {N_r(A-B) N_r(B-C)} product balls {B(x,r) \times B(y,r)}. Every element {a-c} of {A - C} has a preimage {\{ (x,y) \in (A-B) \times (B-C): x+y = a-c\}} under this map which projects to a translate of {B}, and thus must meet at least {N_r(B)} of these product balls. However, if two elements of {A-C} are separated by a distance of at least {4r}, then no product ball can intersect both preimages. We thus see that {N_{4r}^{ent}(A-C) \leq \frac{N_r(A-B) N_r(B-C)}{N_r(B)}}, and the claim follows. \Box

Below the fold we will record some further metric entropy analogues of sum set estimates (basically redoing much of Chapter 2 of my book with Van Vu). Unfortunately there does not seem to be a direct way to abstractly deduce metric entropy results from their sum set analogues (basically due to the failure of a certain strong version of Freiman’s theorem, as discussed in this previous post); nevertheless, the proofs of the discrete arguments are elementary enough that they can be modified with a small amount of effort to handle the entropy case. (In fact, there should be a very general model-theoretic framework in which both the discrete and entropy arguments can be processed in a unified manner; see this paper of Hrushovski for one such framework.)

It is also likely that many of the arguments here extend to the non-commutative setting, but for simplicity we will not pursue such generalisations here.

Read the rest of this entry »

Let {n} be a natural number. We consider the question of how many “almost orthogonal” unit vectors {v_1,\ldots,v_m} one can place in the Euclidean space {{\bf R}^n}. Of course, if we insist on {v_1,\ldots,v_m} being exactly orthogonal, so that {\langle v_i,v_j \rangle = 0} for all distinct {i,j}, then we can only pack at most {n} unit vectors into this space. However, if one is willing to relax the orthogonality condition a little, so that {\langle v_i,v_j\rangle} is small rather than zero, then one can pack a lot more unit vectors into {{\bf R}^n}, due to the important fact that pairs of vectors in high dimensions are typically almost orthogonal to each other. For instance, if one chooses {v_i} uniformly and independently at random on the unit sphere, then a standard computation (based on viewing the {v_i} as gaussian vectors projected onto the unit sphere) shows that each inner product {\langle v_i,v_j \rangle} concentrates around the origin with standard deviation {O(1/\sqrt{n})} and with gaussian tails, and a simple application of the union bound then shows that for any fixed {K \geq 1}, one can pack {n^K} unit vectors into {{\bf R}^n} whose inner products are all of size {O( K^{1/2} n^{-1/2} \log^{1/2} n )}.
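
(The concentration phenomenon described above is easy to observe experimentally: the snippet below samples {n^K} random unit vectors and compares the largest pairwise inner product against a multiple of {\sqrt{K \log n / n}}. The constants are not optimised and this is only an illustration of the union bound heuristic.)

```python
# Empirical version of the union bound computation: n^K random unit vectors in R^n
# typically have all pairwise inner products of size O(sqrt(K log n / n)).
# Constants below are not optimised.
import numpy as np

rng = np.random.default_rng(0)
n, K = 100, 1.5
m = int(n**K)

V = rng.standard_normal((m, n))
V /= np.linalg.norm(V, axis=1, keepdims=True)   # Gaussian vectors projected to the sphere

G = np.abs(V @ V.T)
np.fill_diagonal(G, 0.0)
print("max |<v_i,v_j>|      :", G.max())
print("2 sqrt(K log n / n)  :", 2 * np.sqrt(K * np.log(n) / n))
```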

One can remove the logarithm by using some number theoretic constructions. For instance, if {n} is twice a prime {n=2p}, one can identify {{\bf R}^n} with the space {\ell^2({\bf F}_p)} of complex-valued functions {f: {\bf F}_p \rightarrow {\bf C}}, where {{\bf F}_p} is the field of {p} elements, and if one then considers the {p^2} different quadratic phases {x \mapsto \frac{1}{\sqrt{p}} e_p( ax^2 + bx )} for {a,b \in {\bf F}_p}, where {e_p(a) := e^{2\pi i a/p}} is the standard character on {{\bf F}_p}, then a standard application of Gauss sum estimates reveals that these {p^2} unit vectors in {{\bf R}^n} all have inner products of magnitude at most {p^{-1/2}} with each other. More generally, if we take {d \geq 1} and consider the {p^d} different polynomial phases {x \mapsto \frac{1}{\sqrt{p}} e_p( a_d x^d + \ldots + a_1 x )} for {a_1,\ldots,a_d \in {\bf F}_p}, then an application of the Weil conjectures for curves, proven by Weil, shows that the inner products of the associated {p^d} unit vectors with each other have magnitude at most {(d-1) p^{-1/2}}.
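
(The quadratic phase construction can be checked directly for a small prime: viewing {\ell^2({\bf F}_p)} as {{\bf C}^p}, the {p^2} vectors below have pairwise inner products of magnitude at most {p^{-1/2}}, as the Gauss sum estimate predicts. A short verification, purely illustrative:)

```python
# Direct check of the quadratic phase construction for a small prime p: the p^2
# unit vectors x -> p^{-1/2} e_p(a x^2 + b x) in C^p have pairwise inner products
# of magnitude at most p^{-1/2} (exactly p^{-1/2} when the quadratic coefficients differ).
import itertools
import numpy as np

p = 13
x = np.arange(p)
vecs = [np.exp(2j * np.pi * (a * x**2 + b * x) / p) / np.sqrt(p)
        for a in range(p) for b in range(p)]

worst = max(abs(np.vdot(v, w)) for v, w in itertools.combinations(vecs, 2))
print(worst, "<=", p**-0.5)
```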

As it turns out, this construction is close to optimal, in that there is a polynomial limit to how many unit vectors one can pack into {{\bf R}^n} with an inner product of {O(1/\sqrt{n})}:

Theorem 1 (Cheap Kabatjanskii-Levenstein bound) Let {v_1,\ldots,v_m} be unit vectors in {{\bf R}^n} such that {|\langle v_i, v_j \rangle| \leq A n^{-1/2}} for all distinct {i,j} and some {1/2 \leq A \leq \frac{1}{2} \sqrt{n}}. Then we have {m \leq (\frac{Cn}{A^2})^{C A^2}} for some absolute constant {C}.

In particular, for fixed {d} and large {p}, the number of unit vectors one can pack in {{\bf R}^{2p}} whose inner products all have magnitude at most {dp^{-1/2}} will be {O( p^{O(d^2)} )}. This doesn’t quite match the construction coming from the Weil conjectures, although it is worth noting that the upper bound of {(d-1)p^{-1/2}} for the inner product is usually not sharp (the inner product is actually {p^{-1/2}} times the sum of {d-1} unit phases which one expects (cf. the Sato-Tate conjecture) to be uniformly distributed on the unit circle, and so the typical inner product is actually closer to {(d-1)^{1/2} p^{-1/2}}).

Note that for {0 \leq A < 1/2}, the {A=1/2} case of the above theorem (or more precisely, Lemma 2 below) gives the bound {m=O(n)}, which is essentially optimal as the example of an orthonormal basis shows. For {A \geq \sqrt{n}}, the condition {|\langle v_i, v_j \rangle| \leq A n^{-1/2}} is trivially true from Cauchy-Schwarz, and {m} can be arbitrarily large. Finally, in the range {\frac{1}{2} \sqrt{n} \leq A \leq \sqrt{n}}, we can use a volume packing argument: we have {\|v_i-v_j\|^2 \geq 2 (1 - A n^{-1/2})}, so if we set {r := 2^{-1/2} (1-A n^{-1/2})^{1/2}}, then the open balls of radius {r} around each {v_i} are disjoint, while all lying in a ball of radius {O(1)}, giving rise to the bound {m \leq C^n (1-A n^{-1/2})^{-n/2}} for some absolute constant {C}.

As I learned recently from Philippe Michel, a more precise version of this theorem is due to Kabatjanskii and Levenstein, who studied the closely related problem of sphere packing (or more precisely, cap packing) in the unit sphere {S^{n-1}} of {{\bf R}^n}. However, I found a short proof of the above theorem which relies on one of my favorite tricks – the tensor power trick – so I thought I would give it here.

We begin with an easy case, basically the {A=1/2} case of the above theorem:

Lemma 2 Let {v_1,\ldots,v_m} be unit vectors in {{\bf R}^n} such that {|\langle v_i, v_j \rangle| \leq \frac{1}{2n^{1/2}}} for all distinct {i,j}. Then {m < 2n}.

Proof: Suppose for contradiction that {m \geq 2n}. We consider the {2n \times 2n} Gram matrix {( \langle v_i,v_j \rangle )_{1 \leq i,j \leq 2n}}. This matrix is real symmetric with rank at most {n}, thus if one subtracts off the identity matrix, it has an eigenvalue of {-1} with multiplicity at least {n}. Taking Hilbert-Schmidt norms, we conclude that

\displaystyle  \sum_{1 \leq i,j \leq 2n: i \neq j} |\langle v_i, v_j \rangle|^2 \geq n.

But by hypothesis, the left-hand side is at most {2n(2n-1) \frac{1}{4n} = n-\frac{1}{2}}, giving the desired contradiction. \Box

To amplify the above lemma to cover larger values of {A}, we apply the tensor power trick. A direct application of the tensor power trick does not gain very much; however one can do a lot better by using the symmetric tensor power rather than the raw tensor power. This gives

Corollary 3 Let {k} be a natural number, and let {v_1,\ldots,v_m} be unit vectors in {{\bf R}^n} such that {|\langle v_i, v_j \rangle| \leq 2^{-1/k} (\binom{n+k-1}{k})^{-1/2k}} for all distinct {i,j}. Then {m < 2\binom{n+k-1}{k}}.

Proof: We work in the symmetric component {\hbox{Sym}^k {\bf R}^n} of the tensor power {({\bf R}^n)^{\otimes k} \equiv {\bf R}^{n^k}}, which has dimension {\binom{n+k-1}{k}}. Applying the previous lemma to the tensor powers {v_1^{\otimes k},\ldots,v_m^{\otimes k}}, we obtain the claim. \Box
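
(The mechanism behind the tensor power trick here is simply the identity {\langle v^{\otimes k}, w^{\otimes k}\rangle = \langle v,w\rangle^k}, which shrinks small inner products while keeping unit vectors unit vectors; the symmetric subspace enters only through the dimension count {\binom{n+k-1}{k}}. A quick numerical check of the identity:)

```python
# The identity behind the tensor power trick: <v^{(x)k}, w^{(x)k}> = <v,w>^k.
# (The symmetric subspace only enters through the dimension count binom(n+k-1, k).)
import numpy as np
from functools import reduce

rng = np.random.default_rng(2)
n, k = 5, 3
v, w = rng.standard_normal(n), rng.standard_normal(n)
v, w = v / np.linalg.norm(v), w / np.linalg.norm(w)

def tensor_power(u, k):
    return reduce(np.multiply.outer, [u] * k).ravel()

print(np.isclose(np.dot(tensor_power(v, k), tensor_power(w, k)), np.dot(v, w)**k))  # True
```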

Using the trivial bound {e^k \geq \frac{k^k}{k!}}, we can lower bound

\displaystyle  2^{-1/k} (\binom{n+k-1}{k})^{-1/2k} \geq 2^{-1/k} (n+k-1)^{-1/2} (k!)^{1/2k}

\displaystyle  \geq 2^{-1/k} e^{-1/2} k^{1/2} (n+k-1)^{-1/2} .

We can thus prove Theorem 1 by setting {k := \lfloor C A^2 \rfloor} for some sufficiently large absolute constant {C}.

In the last set of notes, we obtained the following structural theorem concerning approximate groups:

Theorem 1 Let {A} be a finite {K}-approximate group. Then there exists a coset nilprogression {P} of rank and step {O_K(1)} contained in {A^4}, such that {A} is covered by {O_K(1)} left-translates of {P} (and hence also by {O_K(1)} right-translates of {P}).

Remark 1 Under some mild additional hypotheses (e.g. if the dimensions of {P} are sufficiently large, or if {P} is placed in a certain “normal form”, details of which may be found in this paper), a coset nilprogression {P} of rank and step {O_K(1)} will be an {O_K(1)}-approximate group, thus giving a partial converse to Theorem 1. (It is not quite a full converse though, even if one works qualitatively and forgets how the constants depend on {K}: if {A} is covered by a bounded number of left- and right-translates {gP, Pg} of {P}, one needs the group elements {g} to “approximately normalise” {P} in some sense if one wants to then conclude that {A} is an approximate group.) The mild hypotheses alluded to above can be enforced in the statement of the theorem, but we will not discuss this technicality here, and refer the reader to the above-mentioned paper for details.

By placing the coset nilprogression in a virtually nilpotent group, we have the following corollary in the global case:

Corollary 2 Let {A} be a finite {K}-approximate group in an ambient group {G}. Then {A} is covered by {O_K(1)} left cosets of a virtually nilpotent subgroup {G'} of {G}.

In this final set of notes, we give some applications of the above results. The first application is to replace “{K}-approximate group” by “sets of bounded doubling”:

Proposition 3 Let {A} be a finite non-empty subset of a (global) group {G} such that {|A^2| \leq K |A|}. Then there exists a coset nilprogression {P} of rank and step {O_K(1)} and cardinality {|P| \gg_K |A|} such that {A} can be covered by {O_K(1)} left-translates of {P}, and also by {O_K(1)} right-translates of {P}.

We will also establish (a strengthening of) a well-known theorem of Gromov on groups of polynomial growth, as promised back in Notes 0, as well as a variant result (of a type known as a “generalised Margulis lemma”) controlling the almost stabilisers of discrete actions of isometries.

The material here is largely drawn from my recent paper with Emmanuel Breuillard and Ben Green.

Read the rest of this entry »

In the previous set of notes, we introduced the notion of an ultra approximate group – an ultraproduct {A = \prod_{n \rightarrow\alpha} A_n} of finite {K}-approximate groups {A_n} for some {K} independent of {n}, where each {K}-approximate group {A_n} may lie in a distinct ambient group {G_n}. Although these objects arise initially from the “finitary” objects {A_n}, it turns out that ultra approximate groups {A} can be profitably analysed by means of infinitary groups {L} (and in particular, locally compact groups or Lie groups {L}), by means of certain models {\rho: \langle A \rangle \rightarrow L} of {A} (or of the group {\langle A \rangle} generated by {A}). We will define precisely what we mean by a model later, but as a first approximation one can view a model as a representation of the ultra approximate group {A} (or of {\langle A \rangle}) that is “macroscopically faithful” in that it accurately describes the “large scale” behaviour of {A} (or equivalently, that the kernel of the representation is “microscopic” in some sense). In the next section we will see how one can use “Gleason lemma” technology to convert this macroscopic control of an ultra approximate group into microscopic control, which will be the key to classifying approximate groups.

Models of ultra approximate groups can be viewed as the multiplicative combinatorics analogue of the more well known concept of an ultralimit of metric spaces, which we briefly review below the fold as motivation.

The crucial observation is that ultra approximate groups enjoy a local compactness property which allows them to be usefully modeled by locally compact groups (and hence, through the Gleason-Yamabe theorem from previous notes, by Lie groups also). As per the Heine-Borel theorem, the local compactness will come from a combination of a completeness property and a local total boundedness property. The completeness property turns out to be a direct consequence of the countable saturation property of ultraproducts, thus illustrating one of the key advantages of the ultraproduct setting. The local total boundedness property is more interesting. Roughly speaking, it asserts that “large bounded sets” (such as {A} or {A^{100}}) can be covered by finitely many translates of “small bounded sets” {S}, where “small” is meant in a topological group sense, implying in particular that large powers {S^m} of {S} lie inside a set such as {A} or {A^4}. The easiest way to obtain such a property comes from the following lemma of Sanders:

Lemma 1 (Sanders lemma) Let {A} be a finite {K}-approximate group in a (global) group {G}, and let {m \geq 1}. Then there exists a symmetric subset {S} of {A^4} with {|S| \gg_{K,m} |A|} containing the identity such that {S^m \subset A^4}.

This lemma has an elementary combinatorial proof, and is the key to endowing an ultra approximate group with locally compact structure. There is also a closely related lemma of Croot and Sisask which can achieve similar results, and which will also be discussed below. (The locally compact structure can also be established more abstractly using the much more general methods of definability theory, as was first done by Hrushovski, but we will not discuss this approach here.)

By combining the locally compact structure of ultra approximate groups {A} with the Gleason-Yamabe theorem, one ends up being able to model a large “ultra approximate subgroup” {A'} of {A} by a Lie group {L}. Such Lie models serve a number of important purposes in the structure theory of approximate groups. Firstly, as all Lie groups have a dimension which is a natural number, they allow one to assign a natural number “dimension” to ultra approximate groups, which opens up the ability to perform “induction on dimension” arguments. Secondly, Lie groups have an escape property (which is in fact equivalent to no small subgroups property): if a group element {g} lies outside of a very small ball {B_\epsilon}, then some power {g^n} of it will escape a somewhat larger ball {B_1}. Or equivalently: if a long orbit {g, g^2, \ldots, g^n} lies inside the larger ball {B_1}, one can deduce that the original element {g} lies inside the small ball {B_\epsilon}. Because all Lie groups have this property, we will be able to show that all ultra approximate groups {A} “essentially” have a similar property, in that they are “controlled” by a nearby ultra approximate group which obeys a number of escape-type properties analogous to those enjoyed by small balls in a Lie group, and which we will call a strong ultra approximate group. This will be discussed in the next set of notes, where we will also see how these escape-type properties can be exploited to create a metric structure on strong approximate groups analogous to the Gleason metrics studied in previous notes, which can in turn be exploited (together with an induction on dimension argument) to fully classify such approximate groups (in the finite case, at least).

There are some cases where the analysis is particularly simple. For instance, in the bounded torsion case, one can show that the associated Lie model {L} is necessarily zero-dimensional, which allows for an easy classification of approximate groups of bounded torsion.

Some of the material here is drawn from my recent paper with Ben Green and Emmanuel Breuillard, which is in turn inspired by a previous paper of Hrushovski.

Read the rest of this entry »

Emmanuel Breuillard, Ben Green, and I have just uploaded to the arXiv our paper “The structure of approximate groups“, submitted to Pub. IHES. We had announced the main results of this paper in various forums (including this blog) for a few months now, but it had taken some time to fully write up the paper and put in various refinements and applications.

As announced previously, the main result of this paper is what is a (virtually, qualitatively) complete description of finite approximate groups in an arbitrary (local or global) group {G}. For simplicity let us work in the much more familiar setting of global groups, although our results also apply (but are a bit more technical to state) in the local group setting.

Recall that in a global group {G = (G,\cdot)}, a {K}-approximate group is a symmetric subset {A} of {G} containing the origin, with the property that the product set {A \cdot A} is covered by {K} left-translates of {A}. Examples of {O(1)}-approximate groups include genuine groups, convex bodies in a bounded dimensional vector space, small balls in a bounded dimensional Lie group, large balls in a discrete nilpotent group of bounded rank or step, or generalised arithmetic progressions (or more generally, coset progressions) of bounded rank in an abelian group. Specialising now to finite approximate groups, a key example of such a group is what we call a coset nilprogression: a set of the form {\pi^{-1}(P)}, where {\pi: G' \rightarrow N} is a homomorphism with finite kernel from a subgroup {G'} of {G} to a nilpotent group {N} of bounded step, and {P = P(u_1,\ldots,u_r;N_1,\ldots,N_r)} is a nilprogression with a bounded number of generators {u_1,\ldots,u_r} in {N} and some lengths {N_1,\ldots,N_r \gg 1}, where {P(u_1,\ldots,u_r;N_1,\ldots,N_r)} consists of all the words involving at most {N_1} copies of {u_1^{\pm 1}}, {N_2} copies of {u_2^{\pm 1}}, and so forth up to {N_r} copies of {u_r^{\pm 1}}. One can show (by some nilpotent algebra) that all such coset nilprogressions are {O(1)}-approximate groups so long as the step and the rank {r} are bounded (and if {N_1,\ldots,N_r} are sufficiently large).
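
To get some feel for why coset nilprogressions are approximate groups, one can generate a small nilprogression {P(u_1,u_2;N_1,N_2)} in the discrete Heisenberg group by exhaustive word enumeration and measure the doubling ratio {|P \cdot P|/|P|}; the rough sketch below (with ad hoc encodings, and only feasible for small {N_1,N_2}) is purely illustrative.

```python
# A small nilprogression P(u1,u2; N,N) in the discrete Heisenberg group, encoded as
# triples (a,b,c) with (a,b,c)*(a',b',c') = (a+a', b+b', c+c'+a*b'), together with
# its doubling ratio |P.P|/|P|.  Exhaustive word enumeration, so only small N is feasible.
from itertools import product

def mul(g, h):
    return (g[0] + h[0], g[1] + h[1], g[2] + h[2] + g[0] * h[1])

GENS = {"u1": (1, 0, 0), "U1": (-1, 0, 0), "u2": (0, 1, 0), "U2": (0, -1, 0)}

def nilprogression(N1, N2):
    """All words with at most N1 letters u1^{+-1} and at most N2 letters u2^{+-1}."""
    P = set()
    for length in range(N1 + N2 + 1):
        for word in product(GENS, repeat=length):
            if sum(w in ("u1", "U1") for w in word) > N1:
                continue
            if sum(w in ("u2", "U2") for w in word) > N2:
                continue
            g = (0, 0, 0)
            for w in word:
                g = mul(g, GENS[w])
            P.add(g)
    return P

P = nilprogression(3, 3)
PP = {mul(g, h) for g in P for h in P}
print(len(P), len(PP), len(PP) / len(P))   # the ratio stays bounded as the lengths grow
```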

Our main theorem (which was essentially conjectured independently by Helfgott and by Lindenstrauss) asserts, roughly speaking, that coset nilprogressions are essentially the only examples of approximate groups.

Theorem 1 Let {A} be a {K}-approximate group. Then {A^4} contains a coset nilprogression {P} of rank and step {O_K(1)}, such that {A} can be covered by {O_K(1)} left-translates of {P}.

In the torsion-free abelian case, this result is essentially Freiman’s theorem (with an alternate proof by Ruzsa); for general abelian case, it is due to Green and Ruzsa. Various partial results in this direction for some other groups (e.g. free groups, nilpotent groups, solvable groups, or simple groups of Lie type) are also known; see these previous blog posts for a summary of several of these results.

This result has a number of applications to geometric growth theory, and in particular to variants of Gromov’s theorem on groups of polynomial growth, which asserts that a finitely generated group is of polynomial growth if and only if it is virtually nilpotent. The connection lies in the fact that if the balls {B_S(R)} associated to a finite set of generators {S} have polynomial growth, then some simple volume-packing arguments combined with the pigeonhole principle will show that {B_S(R)} will end up being an {O(1)}-approximate group for many radii {R}. In fact, since our theorem only needs a single approximate group to obtain virtually nilpotent structure, we are able to obtain some new strengthenings of Gromov’s theorem. For instance, if {A} is any {K}-approximate group in a finitely generated group {G} that contains {B_S(R_0)} for some set of generators {S} and some {R_0} that is sufficiently large depending on {K}, our theorem implies that {G} is virtually nilpotent, answering a question of Petrunin. Among other things, this gives an alternate proof of a recent result of Kapovitch and Wilking (see also this previous paper of Cheeger and Colding) that a compact manifold of bounded diameter and Ricci curvature at least {-\epsilon} necessarily has a virtually nilpotent fundamental group if {\epsilon} is sufficiently small (depending only on dimension). The main point here is that no lower bound on the injectivity radius is required. Another application is a “Margulis-type lemma”, which asserts that if a metric space {X} has “bounded packing” (in the sense that any ball of radius (say) {4} is covered by a bounded number of balls of radius {1}), and {\Gamma} is a group of isometries on {X} that acts discretely (i.e. every orbit has only finitely many elements (counting multiplicity) in each bounded set), then the near-stabiliser {\{ \gamma \in \Gamma: d(\gamma x, x) \leq \epsilon \}} of a point {x} is virtually nilpotent if {\epsilon} is small enough depending on the packing constant.

There are also some variants and refinements to the main theorem proved in the paper, such as an extension to local groups, and also an improvement on the bound on the rank and step from {O_K(1)} to {O(\log K)} (but at the cost of replacing {A^4} in the theorem with {A^{O(1)}}).

I’ll be discussing the proof of the main theorem in detail in the next few lecture notes of my current graduate course. The full proof is somewhat lengthy (occupying about 50 pages of the 90-page paper), but can be summarised in the following steps:

  1. (Hrushovski) Take an arbitrary sequence {A_n} of finite {K}-approximate groups, and show that an appropriate limit {A} of such groups can be “modeled” in some sense by an open bounded subset of a locally compact group. (The precise definition of “model” is technical, but “macroscopically faithful representation” is a good first approximation.) As discussed in the previous lecture notes, we use an ultralimit for this purpose; the paper of Hrushovski where this strategy was first employed also considered more sophisticated model-theoretic limits. To build a locally compact topology, Hrushovski used some tools from definability theory; in our paper, we instead use a combinatorial lemma of Sanders (closely related to a similar result of Croot and Sisask.)
  2. (Gleason-Yamabe) The locally compact group can in turn be “modeled” by a Lie group (possibly after shrinking the group, and thus the ultralimit {A}, slightly). (This result arose from the solution to Hilbert’s fifth problem, as discussed here. For our extension to local groups, we use a recent local version of the Gleason-Yamabe theorem, due to Goldbring.)
  3. (Gleason) Using the escape properties of the Lie model, construct a norm {\| \|} (and thus a left-invariant metric {d}) on the ultralimit approximate group {A} (and also on the finitary groups {A_n}) that obeys a number of good properties, such as a commutator estimate {\| [g,h]\| \ll \|g\| \|h\|}. (This is modeled on an analogous construction used in the theory of Hilbert’s fifth problem, as discussed in this previous set of lecture notes.) This norm is essentially an escape norm associated to (a slight modification) of {A} or {A_n}.
  4. (Jordan-Bieberbach-Frobenius) We now take advantage of the finite nature of the {A_n} by locating the non-trivial element {e} of {A_n} with minimal escape norm (after first quotienting out the elements of zero escape norm). The commutator estimate mentioned previously ensures that this element is essentially “central” in {A_n}. One can then quotient out a progression {P(e;N)} generated by this central element (reducing the dimension of the Lie model by one in the process) and iterate the process until the dimension of the model drops to zero. Reversing the process, this constructs a coset nilprogression inside {A_n^4}. This argument is based on the classic proof of Jordan’s theorem due to Bieberbach and Frobenius, as discussed in this blog post.
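
As a concrete (and heavily simplified) illustration of the escape norms appearing in steps 3 and 4, here is a toy computation of my own in the additive group {{\bf Z}}, with the interval {A = \{-N,\dots,N\}} standing in for the approximate group: the escape norm of {g} is, roughly, the reciprocal of the number of steps the orbit {g, 2g, 3g, \dots} takes to leave {A}, so the non-zero element of smallest escape norm is {g = \pm 1}. None of this is code from the paper, and the genuinely noncommutative situation is of course far more delicate.

def escape_norm(g, A):
    """Toy escape norm of a non-zero g relative to A: equal to 1/(n+1), where n is the
    number of consecutive "powers" g, 2g, ..., ng (additive notation) that lie in A."""
    n = 0
    while (n + 1) * g in A:
        n += 1
    return 1.0 / (n + 1)

N = 100
A = set(range(-N, N + 1))                 # the model "approximate group"
norms = {g: escape_norm(g, A) for g in range(1, 3 * N)}
g_min = min(norms, key=norms.get)
print(g_min, norms[g_min])                # g = 1 escapes slowest: norm 1/(N+1)
print(norms[7], norms[250])               # norm of 7 is about 7/N; elements outside A have norm 1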

One quirk of the argument is that it requires one to work in the category of local groups rather than global groups. (This is somewhat analogous to how, in the standard proofs of Freiman’s theorem, one needs to work with the category of Freiman homomorphisms, rather than group homomorphisms.) The reason for this arises when performing the quotienting step in the Jordan-Bieberbach-Frobenius leg of the argument. The obvious way to perform this step (and the thing that we tried first) would be to quotient out by the entire cyclic group {\langle e \rangle} generated by the element {e} of minimal escape norm. However, it turns out that this doesn’t work too well, because the group quotiented out is so “large” that it can create a lot of torsion in the quotient. In particular, elements which used to have positive escape norm can now become trapped in the quotient of {A_n}, thus sending their escape norm to zero. This leads to an inferior conclusion (in which a coset nilprogression is replaced by a more complicated tower of alternating extensions between central progressions and finite groups, similar to the towers encountered in my previous paper on this topic). To prevent this unwanted creation of torsion, one has to truncate the cyclic group {\langle e \rangle} before it escapes {A_n}, so that one quotients out by a geometric progression {P(e;N)} rather than by the full cyclic group. But the operation of quotienting out by {P(e;N)}, which is merely a local group rather than a global one, cannot be formalised in the category of global groups, but only in the category of local groups. Because of this, we were forced to carry out the entire argument using the language of local groups. As it turns out, the arguments are ultimately more natural in this setting, although there is an initial investment of notation required, given that global group theory is much more familiar and well-developed than local group theory.

One interesting feature of the argument is that it does not use much of the existing theory of Freiman-type theorems, instead building the coset nilprogression directly from the geometric properties of the approximate group. In particular, our argument gives a new proof of Freiman’s theorem in the abelian case, which largely avoids Fourier analysis (except through the theory of Hilbert’s fifth problem, which relies on the Peter-Weyl theorem, or, in the abelian case, Pontryagin duality, and these are themselves basically versions of Fourier analysis).

In this set of notes we will be able to finally prove the Gleason-Yamabe theorem from Notes 0, which we restate here:

Theorem 1 (Gleason-Yamabe theorem) Let {G} be a locally compact group. Then, for any open neighbourhood {U} of the identity, there exists an open subgroup {G'} of {G} and a compact normal subgroup {K} of {G'} contained in {U} such that {G'/K} is isomorphic to a Lie group.

In the next set of notes, we will combine the Gleason-Yamabe theorem with some topological analysis (and in particular, using the invariance of domain theorem) to establish some further control on locally compact groups, and in particular to obtain a solution to Hilbert’s fifth problem.

To prove the Gleason-Yamabe theorem, we will use three major tools developed in previous notes. The first (from Notes 2) is a criterion for Lie structure in terms of a special type of metric, which we will call a Gleason metric:

Definition 2 Let {G} be a topological group. A Gleason metric on {G} is a left-invariant metric {d: G \times G \rightarrow {\bf R}^+} which generates the topology on {G} and obeys the following properties for some constant {C>0}, writing {\|g\|} for {d(g,\hbox{id})}:

  • (Escape property) If {g \in G} and {n \geq 1} is such that {n \|g\| \leq \frac{1}{C}}, then {\|g^n\| \geq \frac{1}{C} n \|g\|}.
  • (Commutator estimate) If {g, h \in G} are such that {\|g\|, \|h\| \leq \frac{1}{C}}, then

    \displaystyle  \|[g,h]\| \leq C \|g\| \|h\|, \ \ \ \ \ (1)

    where {[g,h] := g^{-1}h^{-1}gh} is the commutator of {g} and {h}.

Theorem 3 (Building Lie structure from Gleason metrics) Let {G} be a locally compact group that has a Gleason metric. Then {G} is isomorphic to a Lie group.
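
Before moving on, here is a quick numerical sanity check of Definition 2 (my own toy computation, not part of these notes): for matrices {g} close to the identity, the quantity {\|g\| := \|g - I\|_{op}} satisfies both the escape property and the commutator estimate with a modest constant {C}. This only tests the two displayed inequalities in a small neighbourhood of the identity; it is not by itself a construction of a Gleason metric.

import numpy as np

rng = np.random.default_rng(0)
C = 10.0

def norm(g):
    """Gleason-type "norm": operator-norm distance from g to the identity."""
    return np.linalg.norm(g - np.eye(3), 2)

for _ in range(1000):
    a = 0.01 * rng.standard_normal((3, 3))
    b = 0.01 * rng.standard_normal((3, 3))
    g, h = np.eye(3) + a, np.eye(3) + b

    # Commutator estimate: ||[g,h]|| <= C ||g|| ||h||, with [g,h] = g^{-1} h^{-1} g h.
    comm = np.linalg.inv(g) @ np.linalg.inv(h) @ g @ h
    assert norm(comm) <= C * norm(g) * norm(h)

    # Escape property: ||g^n|| >= (1/C) n ||g|| for every n with n ||g|| <= 1/C.
    for n in range(1, int(1.0 / (C * norm(g))) + 1):
        assert norm(np.linalg.matrix_power(g, n)) >= n * norm(g) / C

print("escape property and commutator estimate verified on random samples")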

The second tool is the existence of a left-invariant Haar measure on any locally compact group; see Theorem 3 from Notes 3. Finally, we will also need the compact case of the Gleason-Yamabe theorem (Theorem 8 from Notes 3), which was proven via the Peter-Weyl theorem:

Theorem 4 (Gleason-Yamabe theorem for compact groups) Let {G} be a compact Hausdorff group, and let {U} be a neighbourhood of the identity. Then there exists a compact normal subgroup {H} of {G} contained in {U} such that {G/H} is isomorphic to a linear group (i.e. a closed subgroup of a general linear group {GL_n({\bf C})}).

To finish the proof of the Gleason-Yamabe theorem, we have to somehow use the available structures on locally compact groups (such as Haar measure) to build good metrics on those groups (or on suitable subgroups or quotient groups). The basic construction is as follows:

Definition 5 (Building metrics out of test functions) Let {G} be a topological group, and let {\psi: G \rightarrow {\bf R}^+} be a bounded non-negative function. Writing {\tau(g) \psi(x) := \psi(g^{-1} x)} for the left-translation of {\psi} by {g}, we define the pseudometric {d_\psi: G \times G \rightarrow {\bf R}^+} by the formula

\displaystyle  d_\psi(g,h) := \sup_{x \in G} |\tau(g) \psi(x) - \tau(h) \psi(x)|

\displaystyle  = \sup_{x \in G} |\psi(g^{-1} x ) - \psi(h^{-1} x)|

and the semi-norm {\| \|_\psi: G \rightarrow {\bf R}^+} by the formula

\displaystyle  \|g\|_\psi := d_\psi(g, \hbox{id}).

Note that one can also write

\displaystyle  \|g\|_\psi = \sup_{x \in G} |\partial_g \psi(x)|

where {\partial_g \psi(x) := \psi(x) - \psi(g^{-1} x)} is the “derivative” of {\psi} in the direction {g}.

Exercise 1 Let the notation and assumptions be as in the above definition. For any {g,h,k \in G}, establish the metric-like properties

  1. (Identity) {d_\psi(g,h) \geq 0}, with equality when {g=h}.
  2. (Symmetry) {d_\psi(g,h) = d_\psi(h,g)}.
  3. (Triangle inequality) {d_\psi(g,k) \leq d_\psi(g,h) + d_\psi(h,k)}.
  4. (Continuity) If {\psi \in C_c(G)}, then the map {d_\psi: G \times G \rightarrow {\bf R}^+} is continuous.
  5. (Boundedness) One has {d_\psi(g,h) \leq \sup_{x \in G} |\psi(x)|}. If {\psi \in C_c(G)} is supported in a set {K}, then equality occurs unless {g^{-1} h \in K K^{-1}}.
  6. (Left-invariance) {d_\psi(g,h) = d_\psi(kg,kh)}. In particular, {d_\psi(g,h) = \| h^{-1} g \|_\psi = \| g^{-1} h \|_\psi}.

In particular, we have the norm-like properties

  1. (Identity) {\|g\|_\psi \geq 0}, with equality when {g=\hbox{id}}.
  2. (Symmetry) {\|g\|_\psi = \|g^{-1}\|_\psi}.
  3. (Triangle inequality) {\|gh\|_\psi \leq \|g\|_\psi + \|h\|_\psi}.
  4. (Continuity) If {\psi \in C_c(G)}, then the map {\|\|_\psi: G \rightarrow {\bf R}^+} is continuous.
  5. (Boundedness) One has {\|g\|_\psi \leq \sup_{x \in G} |\psi(x)|}. If {\psi \in C_c(G)} is supported in a set {K}, then equality occurs unless {g \in K K^{-1}}.

We remark that the first three properties of {d_\psi} in the above exercise ensure that {d_\psi} is indeed a pseudometric.
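
For readers who like to experiment, here is a brute-force check of my own (a toy illustration, not part of these notes) of Definition 5 and of some of the properties in Exercise 1, carried out in the finite cyclic group {{\bf Z}/N{\bf Z}} (written additively, so that {g^{-1} x} becomes {x - g}) with a “tent” test function {\psi}.

import random

N = 60

def psi(x):
    """A "tent" test function on Z/N, supported within distance 10 of the identity."""
    x = min(x % N, N - x % N)          # distance from x to 0 along the cycle
    return max(0.0, 1.0 - x / 10.0)

def d_psi(g, h):
    """d_psi(g,h) = sup_x |psi(x - g) - psi(x - h)| (additive notation)."""
    return max(abs(psi(x - g) - psi(x - h)) for x in range(N))

def norm_psi(g):
    return d_psi(g, 0)

random.seed(0)
for _ in range(2000):                  # spot checks of the pseudometric properties
    g, h, k = (random.randrange(N) for _ in range(3))
    assert d_psi(g, h) == d_psi(h, g)                                   # symmetry
    assert d_psi(g, k) <= d_psi(g, h) + d_psi(h, k) + 1e-12             # triangle inequality
    assert abs(d_psi(g, h) - d_psi(k + g, k + h)) < 1e-12               # left-invariance

print([round(norm_psi(g), 2) for g in range(8)])   # the norm of g grows linearly for small g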

To get good metrics (such as Gleason metrics) on groups {G}, it thus suffices to obtain test functions {\psi} that obey suitably good “regularity” properties. We will achieve this primarily by means of two tricks. The first trick is to obtain high-regularity test functions by convolving together two low-regularity test functions, taking advantage of the existence of a left-invariant Haar measure {\mu} on {G}. The second trick is to obtain low-regularity test functions by means of a metric-like object on {G}. This latter trick may seem circular, as our whole objective is to get a metric on {G} in the first place, but the key point is that the metric one starts with does not need to have as many “good properties” as the metric one ends up with, thanks to the regularity-improving properties of convolution. As such, one can use a “bootstrap argument” (or induction argument) to create a good metric out of almost nothing. It is this bootstrap miracle which is at the heart of the proof of the Gleason-Yamabe theorem (and hence of the solution of Hilbert’s fifth problem).
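
The regularity gained by convolution is already visible in the simplest finite setting. The following toy computation (my own, carried out on {{\bf Z}/N{\bf Z}} rather than a general locally compact group, with counting measure playing the role of Haar measure) convolves a rough indicator function with itself: the derivatives {\partial_g \psi} of the convolved function are small when {g} is close to the identity, whereas the raw indicator has no modulus of continuity at all.

import numpy as np

N, L = 360, 60
indicator = np.zeros(N)
indicator[:L] = 1.0                    # a rough test function: the indicator of an interval A

def convolve(f, h):
    """Group convolution on Z/N (counting measure): (f*h)(x) = sum_y f(y) h(x - y)."""
    return np.array([sum(f[y] * h[(x - y) % N] for y in range(N)) for x in range(N)])

def deriv_sup(psi, g):
    """sup_x |psi(x) - psi(x - g)|, normalised by sup_x |psi(x)|."""
    return np.max(np.abs(psi - np.roll(psi, g))) / np.max(np.abs(psi))

smoothed = convolve(indicator, indicator)   # a "tent" function, much more regular

for g in (1, 2, 5, 10):
    print(g, deriv_sup(indicator, g), round(deriv_sup(smoothed, g), 3))
    # the indicator jumps by its full height for every g != 0,
    # while the convolved function only moves by about g/L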

The arguments here are based on the nonstandard analysis arguments used to establish Hilbert’s fifth problem by Hirschfeld and by Goldbring (and also some unpublished lecture notes of Goldbring and van den Dries). However, we will not explicitly use any nonstandard analysis in this post.
