
Given three points ${A,B,C}$ in the plane, the distances ${|AB|, |BC|, |AC|}$ between them have to be non-negative and obey the triangle inequalities

$\displaystyle |AB| \leq |BC| + |AC|, |BC| \leq |AC| + |AB|, |AC| \leq |AB| + |BC|$

but are otherwise unconstrained. But if one has four points ${A,B,C,D}$ in the plane, then there is an additional constraint connecting the six distances ${|AB|, |AC|, |AD|, |BC|, |BD|, |CD|}$ between them, coming from the Cayley-Menger determinant:

Proposition 1 (Cayley-Menger determinant) If ${A,B,C,D}$ are four points in the plane, then the Cayley-Menger determinant

$\displaystyle \mathrm{det} \begin{pmatrix} 0 & 1 & 1 & 1 & 1 \\ 1 & 0 & |AB|^2 & |AC|^2 & |AD|^2 \\ 1 & |AB|^2 & 0 & |BC|^2 & |BD|^2 \\ 1 & |AC|^2 & |BC|^2 & 0 & |CD|^2 \\ 1 & |AD|^2 & |BD|^2 & |CD|^2 & 0 \end{pmatrix} \ \ \ \ \ (1)$

vanishes.

Proof: If we view ${A,B,C,D}$ as vectors in ${{\bf R}^2}$, then we have the usual cosine rule ${|AB|^2 = |A|^2 + |B|^2 - 2 A \cdot B}$, and similarly for all the other distances. The ${5 \times 5}$ matrix appearing in (1) can then be written as ${M+M^T-2\tilde G}$, where ${M}$ is the matrix

$\displaystyle M := \begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 1 & |A|^2 & |B|^2 & |C|^2 & |D|^2 \\ 1 & |A|^2 & |B|^2 & |C|^2 & |D|^2 \\ 1 & |A|^2 & |B|^2 & |C|^2 & |D|^2 \\ 1 & |A|^2 & |B|^2 & |C|^2 & |D|^2 \end{pmatrix}$

and ${\tilde G}$ is the (augmented) Gram matrix

$\displaystyle \tilde G := \begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & A \cdot A & A \cdot B & A \cdot C & A \cdot D \\ 0 & B \cdot A & B \cdot B & B \cdot C & B \cdot D \\ 0 & C \cdot A & C \cdot B & C \cdot C & C \cdot D \\ 0 & D \cdot A & D \cdot B & D \cdot C & D \cdot D \end{pmatrix}.$

The matrix ${M}$ is a rank one matrix, and so ${M^T}$ is also. The Gram matrix ${\tilde G}$ factorises as ${\tilde G = \tilde \Sigma \tilde \Sigma^T}$, where ${\tilde \Sigma}$ is the ${5 \times 2}$ matrix with rows ${0,A,B,C,D}$, and thus has rank at most ${2}$. Therefore the matrix ${M+M^T-2\tilde G}$ in (1) has rank at most ${1+1+2=4}$, and hence has determinant zero as claimed. $\Box$
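Proposition 1 is easy to verify numerically. The following sketch (my own code, not from the post) builds the Cayley-Menger matrix for four random points and checks that the determinant vanishes for planar points but not for generic points in ${{\bf R}^3}$:

```python
import numpy as np

def cayley_menger(points):
    """Build the (n+1)x(n+1) Cayley-Menger matrix from n points."""
    n = len(points)  # here n = 4, giving a 5x5 matrix
    M = np.zeros((n + 1, n + 1))
    M[0, 1:] = M[1:, 0] = 1.0
    for i in range(n):
        for j in range(n):
            M[i + 1, j + 1] = np.sum((points[i] - points[j]) ** 2)
    return M

rng = np.random.default_rng(0)

pts_planar = rng.standard_normal((4, 2))   # four random points in the plane
det_planar = np.linalg.det(cayley_menger(pts_planar))
print(det_planar)                          # ~ 0, up to floating-point rounding

pts_3d = rng.standard_normal((4, 3))       # four generic points in R^3
det_3d = np.linalg.det(cayley_menger(pts_3d))
print(det_3d)                              # generically nonzero
```

The same function works in any ambient dimension, since only the pairwise squared distances enter.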

For instance, if we know that ${|AB|=|AC|=|DB|=|DC|=1}$ and ${|BC|=\sqrt{2}}$, then in order for ${A,B,C,D}$ to be coplanar, the remaining distance ${|AD|}$ has to obey the equation

$\displaystyle \mathrm{det} \begin{pmatrix} 0 & 1 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 & |AD|^2 \\ 1 & 1 & 0 & 2 & 1 \\ 1 & 1 & 2 & 0 & 1 \\ 1 & |AD|^2 & 1 & 1 & 0 \end{pmatrix} = 0.$

After some calculation the left-hand side simplifies to ${-4 |AD|^4 + 8 |AD|^2}$, so the non-negative quantity ${|AD|}$ is constrained to equal either ${0}$ or ${\sqrt{2}}$. The former happens when ${A,B,C}$ form a unit right-angled triangle with the right angle at ${A}$ and ${D=A}$; the latter happens when ${A,B,D,C}$ form the vertices of a unit square traversed in that order. Any other value of ${|AD|}$ is incompatible with ${A,B,C,D}$ lying in a plane; hence the Cayley-Menger determinant can be used as a test for coplanarity.
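The simplification can be double-checked by machine (a quick sketch of my own): the determinant is a polynomial of degree at most ${2}$ in ${t = |AD|^2}$, so matching it against ${-4t^2+8t}$ at a few sample values pins down the identity.

```python
import numpy as np

def det_at(t):
    # The matrix from the worked example, with t = |AD|^2.
    M = np.array([
        [0, 1, 1, 1, 1],
        [1, 0, 1, 1, t],
        [1, 1, 0, 2, 1],
        [1, 1, 2, 0, 1],
        [1, t, 1, 1, 0],
    ], dtype=float)
    return np.linalg.det(M)

# Degree <= 2 in t, so agreement at three points proves the identity;
# a fourth point is checked for good measure.
for t in [0.0, 1.0, 2.0, 3.0]:
    assert abs(det_at(t) - (-4 * t**2 + 8 * t)) < 1e-9
print([round(det_at(t), 6) for t in [0.0, 1.0, 2.0]])  # roots at t = 0 and t = 2
```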

Now suppose that we have four points ${A,B,C,D}$ on a sphere ${S_R}$ of radius ${R}$, with the six distances ${|AB|_R, |AC|_R, |AD|_R, |BC|_R, |BD|_R, |CD|_R}$ now measured as lengths of arcs on the sphere. There is a spherical analogue of the Cayley-Menger determinant:

Proposition 2 (Spherical Cayley-Menger determinant) If ${A,B,C,D}$ are four points on a sphere ${S_R}$ of radius ${R}$ in ${{\bf R}^3}$, then the spherical Cayley-Menger determinant

$\displaystyle \mathrm{det} \begin{pmatrix} 1 & \cos \frac{|AB|_R}{R} & \cos \frac{|AC|_R}{R} & \cos \frac{|AD|_R}{R} \\ \cos \frac{|AB|_R}{R} & 1 & \cos \frac{|BC|_R}{R} & \cos \frac{|BD|_R}{R} \\ \cos \frac{|AC|_R}{R} & \cos \frac{|BC|_R}{R} & 1 & \cos \frac{|CD|_R}{R} \\ \cos \frac{|AD|_R}{R} & \cos \frac{|BD|_R}{R} & \cos \frac{|CD|_R}{R} & 1 \end{pmatrix} \ \ \ \ \ (2)$

vanishes.

Proof: We can assume that the sphere ${S_R}$ is centred at the origin of ${{\bf R}^3}$, and view ${A,B,C,D}$ as vectors in ${{\bf R}^3}$ of magnitude ${R}$. The angle subtended by ${AB}$ from the origin is ${|AB|_R/R}$, so by the cosine rule we have

$\displaystyle A \cdot B = R^{2} \cos \frac{|AB|_R}{R}.$

Similarly for all the other inner products. Thus the matrix in (2) can be written as ${R^{-2} G}$, where ${G}$ is the Gram matrix

$\displaystyle G := \begin{pmatrix} A \cdot A & A \cdot B & A \cdot C & A \cdot D \\ B \cdot A & B \cdot B & B \cdot C & B \cdot D \\ C \cdot A & C \cdot B & C \cdot C & C \cdot D \\ D \cdot A & D \cdot B & D \cdot C & D \cdot D \end{pmatrix}.$

We can factor ${G = \Sigma \Sigma^T}$ where ${\Sigma}$ is the ${4 \times 3}$ matrix with rows ${A,B,C,D}$. Thus ${R^{-2} G}$ has rank at most ${3}$ and thus the determinant vanishes as required. $\Box$
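Proposition 2 can likewise be checked numerically. The sketch below (my own, with an arbitrary choice of radius and random points) draws four points on a sphere, measures arc distances, and confirms the determinant vanishes:

```python
import numpy as np

rng = np.random.default_rng(1)
R = 2.0

# Four random points on the sphere of radius R centred at the origin.
pts = rng.standard_normal((4, 3))
pts = R * pts / np.linalg.norm(pts, axis=1, keepdims=True)

def arc(p, q):
    """Arc length on S_R: R times the angle subtended at the origin."""
    return R * np.arccos(np.clip(np.dot(p, q) / R**2, -1.0, 1.0))

# The spherical Cayley-Menger matrix of pairwise cos(arc/R) values;
# this equals R^{-2} times the Gram matrix of four vectors in R^3.
G = np.array([[np.cos(arc(p, q) / R) for q in pts] for p in pts])
det_sph = np.linalg.det(G)
print(det_sph)  # ~ 0: the Gram matrix of four vectors in R^3 has rank <= 3
```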

Just as the Cayley-Menger determinant can be used to test for coplanarity, the spherical Cayley-Menger determinant can be used to test for lying on a sphere of radius ${R}$. For instance, if we know that ${A,B,C,D}$ lie on ${S_R}$ and ${|AB|_R, |AC|_R, |BC|_R, |BD|_R, |CD|_R}$ are all equal to ${\pi R/2}$, then the above proposition gives

$\displaystyle \mathrm{det} \begin{pmatrix} 1 & 0 & 0 & \cos \frac{|AD|_R}{R} \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ \cos \frac{|AD|_R}{R} & 0 & 0 & 1 \end{pmatrix} = 0.$

The left-hand side evaluates to ${1 - \cos^2 \frac{|AD|_R}{R}}$; as ${|AD|_R}$ lies between ${0}$ and ${\pi R}$, the only choices for this distance are then ${0}$ and ${\pi R}$. The former happens for instance when ${A}$ lies on the north pole ${(R,0,0)}$, ${B = (0,R,0)}$ and ${C = (0,0,R)}$ are points on the equator with longitudes differing by 90 degrees, and ${D=(R,0,0)}$ is also equal to the north pole; the latter occurs when ${D=(-R,0,0)}$ is instead placed on the south pole.

The Cayley-Menger and spherical Cayley-Menger determinants look slightly different from each other, but one can transform the latter into something resembling the former by row and column operations. Indeed, the determinant (2) can be rewritten as

$\displaystyle \mathrm{det} \begin{pmatrix} 1 & 1 & 1 & 1 & 1 \\ 1 & 0 & 1 - \cos \frac{|AB|_R}{R} & 1 - \cos \frac{|AC|_R}{R} & 1 - \cos \frac{|AD|_R}{R} \\ 1 & 1-\cos \frac{|AB|_R}{R} & 0 & 1-\cos \frac{|BC|_R}{R} & 1- \cos \frac{|BD|_R}{R} \\ 1 & 1-\cos \frac{|AC|_R}{R} & 1-\cos \frac{|BC|_R}{R} & 0 & 1-\cos \frac{|CD|_R}{R} \\ 1 & 1-\cos \frac{|AD|_R}{R} & 1-\cos \frac{|BD|_R}{R} & 1- \cos \frac{|CD|_R}{R} & 0 \end{pmatrix}$

and by further row and column operations, this determinant vanishes if and only if the determinant

$\displaystyle \mathrm{det} \begin{pmatrix} \frac{1}{2R^2} & 1 & 1 & 1 & 1 \\ 1 & 0 & f_R(|AB|_R) & f_R(|AC|_R) & f_R(|AD|_R) \\ 1 & f_R(|AB|_R) & 0 & f_R(|BC|_R) & f_R(|BD|_R) \\ 1 & f_R(|AC|_R) & f_R(|BC|_R) & 0 & f_R(|CD|_R) \\ 1 & f_R(|AD|_R) & f_R(|BD|_R) & f_R(|CD|_R) & 0 \end{pmatrix} \ \ \ \ \ (3)$

vanishes, where ${f_R(x) := 2R^2 (1-\cos \frac{x}{R})}$. In the limit ${R \rightarrow \infty}$ (so that the curvature of the sphere ${S_R}$ tends to zero), ${|AB|_R}$ tends to ${|AB|}$, and by Taylor expansion ${f_R(|AB|_R)}$ tends to ${|AB|^2}$; similarly for the other distances. Now we see that the planar Cayley-Menger determinant emerges as the limit of (3) as ${R \rightarrow \infty}$, as would be expected from the intuition that a plane is essentially a sphere of infinite radius.
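The convergence ${f_R(|AB|_R) \rightarrow |AB|^2}$ is quantitative: Taylor expansion gives ${f_R(x) = x^2 - \frac{x^4}{12 R^2} + O(R^{-4})}$, so the error decays like ${R^{-2}}$. A quick numerical illustration (my own choice of test values):

```python
import math

def f(R, x):
    # f_R(x) = 2 R^2 (1 - cos(x/R)), which approaches x^2 as R -> infinity
    return 2 * R**2 * (1 - math.cos(x / R))

x = 3.0
errors = [abs(f(R, x) - x**2) for R in (10.0, 100.0, 1000.0)]
print(errors)  # each error is ~ x^4 / (12 R^2), shrinking ~100x per step
```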

In principle, one can now estimate the radius ${R}$ of the Earth (assuming that it is either a sphere ${S_R}$ or a flat plane ${S_\infty}$) if one is given the six distances ${|AB|_R, |AC|_R, |AD|_R, |BC|_R, |BD|_R, |CD|_R}$ between four points ${A,B,C,D}$ on the Earth. Of course, if one wishes to do so, one should have ${A,B,C,D}$ rather far apart from each other, since otherwise it would be difficult, for instance, to distinguish the round Earth from a flat one. As an experiment, and just for fun, I wanted to see how accurate this would be with some real world data. I decided to take ${A}$, ${B}$, ${C}$, ${D}$ to be the cities of London, Los Angeles, Tokyo, and Dubai respectively. As an initial test, I used distances from this online flight calculator, measured in kilometers:

$\displaystyle |AB|_R = 8790\ \mathrm{km}$

$\displaystyle |AC|_R = 9597\ \mathrm{km}$

$\displaystyle |AD|_R = 5488\ \mathrm{km}$

$\displaystyle |BC|_R = 8849\ \mathrm{km}$

$\displaystyle |BD|_R = 13435\ \mathrm{km}$

$\displaystyle |CD|_R = 7957\ \mathrm{km}.$

Given that the true radius of the Earth is about ${R_0 := 6371 \mathrm{km}}$, I chose the change of variables ${R = R_0/k}$ (so that ${k=1}$ corresponds to the round Earth model with the commonly accepted value for the Earth’s radius, and ${k=0}$ corresponds to the flat Earth), and obtained the following plot for (3):
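The computation behind this plot is easy to script. The following sketch is my own code (not the author's), with my own choice of sampling grid; it evaluates determinant (3) as a function of ${k}$, using the limit ${f_R(x) \rightarrow x^2}$ at ${k=0}$:

```python
import numpy as np

R0 = 6371.0  # commonly accepted Earth radius, in km
# Six great-circle distances (km) between London, LA, Tokyo, Dubai.
d = {"AB": 8790, "AC": 9597, "AD": 5488, "BC": 8849, "BD": 13435, "CD": 7957}

def det3(k):
    """Determinant (3) with R = R0/k; k = 0 recovers the planar case."""
    if k == 0:
        f = lambda x: x**2          # limit of f_R(x) as R -> infinity
        corner = 0.0                # limit of 1/(2R^2)
    else:
        R = R0 / k
        f = lambda x: 2 * R**2 * (1 - np.cos(x / R))
        corner = 1 / (2 * R**2)
    M = np.array([
        [corner, 1, 1, 1, 1],
        [1, 0, f(d["AB"]), f(d["AC"]), f(d["AD"])],
        [1, f(d["AB"]), 0, f(d["BC"]), f(d["BD"])],
        [1, f(d["AC"]), f(d["BC"]), 0, f(d["CD"])],
        [1, f(d["AD"]), f(d["BD"]), f(d["CD"]), 0],
    ])
    return np.linalg.det(M)

# Values one would plot against k on, say, a grid over [0, 1.5].
ks = np.linspace(0.0, 1.5, 151)
vals = [det3(k) for k in ks]
print(abs(det3(1.0)) / abs(det3(0.0)))  # ratio of |det| at k=1 to k=0
```

The printed ratio should be well below ${1}$: the round Earth model fits this data far better than the flat one.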

In particular, the determinant does indeed come very close to vanishing when ${k=1}$, which is unsurprising since, as explained on the web site, the online flight calculator uses a model in which the Earth is an ellipsoid of radii close to ${6371}$ km. There is another radius that would also be compatible with this data at ${k\approx 1.3}$ (corresponding to an Earth of radius about ${4900}$ km), but presumably one could rule this out as a spurious coincidence by experimenting with quadruples of cities other than the ones I selected. On the other hand, these distances are highly incompatible with the flat Earth model ${k=0}$; one could also see this with a piece of paper and a ruler, by trying to lay down four points ${A,B,C,D}$ on the paper with an appropriately rescaled version of the above distances (e.g., with ${|AB| = 8.790 \mathrm{cm}}$, ${|AC| = 9.597 \mathrm{cm}}$, etc.).

If instead one goes to the flight time calculator and uses flight travel times instead of distances, one now gets the following data (measured in hours):

$\displaystyle |AB|_R = 10\mathrm{h}\ 3\mathrm{m}$

$\displaystyle |AC|_R = 11\mathrm{h}\ 49\mathrm{m}$

$\displaystyle |AD|_R = 6\mathrm{h}\ 56\mathrm{m}$

$\displaystyle |BC|_R = 10\mathrm{h}\ 56\mathrm{m}$

$\displaystyle |BD|_R = 16\mathrm{h}\ 23\mathrm{m}$

$\displaystyle |CD|_R = 9\mathrm{h}\ 52\mathrm{m}.$

Assuming that planes travel at about ${800}$ kilometers per hour, the true radius of the Earth corresponds to about ${R_1 := 8\mathrm{h}}$ of flight time. If one then uses the normalisation ${R = R_1/k}$, one obtains the following plot:

Not too surprisingly, this is basically a rescaled version of the previous plot, with vanishing near ${k=1}$ and at ${k=1.3}$. (The website for the flight calculator does say it calculates short and long haul flight times slightly differently, which may be the cause of the slight discrepancies between this figure and the previous one.)

Of course, these two data sets are “cheating” since they come from a model which already presupposes what the radius of the Earth is. But one can input real world flight times between these four cities instead of the above idealised data. Here one runs into the issue that the flight time from ${A}$ to ${B}$ is not necessarily the same as that from ${B}$ to ${A}$, due to factors such as windspeed. For instance, I looked up the online flight time from Tokyo to Dubai to be 11 hours and 10 minutes, whereas the online flight time from Dubai to Tokyo was 9 hours and 50 minutes. The simplest thing to do here is to take the arithmetic mean of the two times as a preliminary estimate for the flight time without windspeed factors; thus for instance the Tokyo-Dubai flight time would now be 10 hours and 30 minutes, and more generally

$\displaystyle |AB|_R = 10\mathrm{h}\ 47\mathrm{m}$

$\displaystyle |AC|_R = 12\mathrm{h}\ 0\mathrm{m}$

$\displaystyle |AD|_R = 7\mathrm{h}\ 17\mathrm{m}$

$\displaystyle |BC|_R = 10\mathrm{h}\ 50\mathrm{m}$

$\displaystyle |BD|_R = 15\mathrm{h}\ 55\mathrm{m}$

$\displaystyle |CD|_R = 10\mathrm{h}\ 30\mathrm{m}.$

This data is not too far off from the online calculator data, but it does distort the graph slightly (taking ${R=R_1/k}$ as before):

Now one gets estimates for the radius of the Earth that are off by about a factor of ${2}$ from the truth, although the round Earth model ${k=1}$ is still twice as accurate as the flat Earth model ${k=0}$.

Given that windspeed should additively affect flight velocity rather than flight time, and the two are inversely proportional to each other, it is more natural to take a harmonic mean rather than an arithmetic mean. This gives the slightly different values

$\displaystyle |AB|_R = 10\mathrm{h}\ 51\mathrm{m}$

$\displaystyle |AC|_R = 11\mathrm{h}\ 59\mathrm{m}$

$\displaystyle |AD|_R = 7\mathrm{h}\ 16\mathrm{m}$

$\displaystyle |BC|_R = 10\mathrm{h}\ 46\mathrm{m}$

$\displaystyle |BD|_R = 15\mathrm{h}\ 54\mathrm{m}$

$\displaystyle |CD|_R = 10\mathrm{h}\ 27\mathrm{m}$

but one still gets essentially the same plot:

So the inaccuracies are presumably coming from some other source. (Note for instance that the true flight time from Tokyo to Dubai is about ${6\%}$ greater than the calculator predicts, while the flight time from LA to Dubai is about ${3\%}$ less; these sorts of errors seem to pile up in this calculation.) Nevertheless, it does seem that flight time data is (barely) enough to establish the roundness of the Earth and obtain a somewhat ballpark estimate for its radius. (I assume that the fit would be better if one could include some Southern Hemisphere cities such as Sydney or Santiago, but I was not able to find a good quadruple of widely spaced cities on both hemispheres for which there were direct flights between all six pairs.)
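For the one pair where both one-way times were quoted above (Tokyo to Dubai: 11h10m; Dubai to Tokyo: 9h50m), the two means are easy to compare directly. Note that if the two legs cover the same distance ${L}$ at speeds ${v+w}$ and ${v-w}$, the harmonic mean of the times recovers the windless time ${L/v}$ exactly. A quick check (the helper function is my own):

```python
from fractions import Fraction

def harmonic_mean(a, b):
    # If t1 = L/(v+w) and t2 = L/(v-w), then 2*t1*t2/(t1+t2) = L/v exactly,
    # so the harmonic mean removes an additive windspeed perturbation.
    return 2 * a * b / (a + b)

t1 = Fraction(11 * 60 + 10)  # Tokyo -> Dubai, in minutes
t2 = Fraction(9 * 60 + 50)   # Dubai -> Tokyo, in minutes

arith = (t1 + t2) / 2
harm = harmonic_mean(t1, t2)
print(float(arith), float(harm))  # 630.0 min (10h30m) vs ~627.5 min (~10h27m)
```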

I’ve just uploaded to the arXiv my paper “Embedding the Heisenberg group into a bounded dimensional Euclidean space with optimal distortion“, submitted to Revista Matematica Iberoamericana. This paper concerns the extent to which one can accurately embed the metric structure of the Heisenberg group

$\displaystyle H := \begin{pmatrix} 1 & {\bf R} & {\bf R} \\ 0 & 1 & {\bf R} \\ 0 & 0 & 1 \end{pmatrix}$

into Euclidean space, which we can write as ${\{ [x,y,z]: x,y,z \in {\bf R} \}}$ with the notation

$\displaystyle [x,y,z] := \begin{pmatrix} 1 & x & z \\ 0 & 1 & y \\ 0 & 0 & 1 \end{pmatrix}.$

Here we give ${H}$ the right-invariant Carnot-Carathéodory metric ${d}$ coming from the right-invariant vector fields

$\displaystyle X := \frac{\partial}{\partial x} + y \frac{\partial}{\partial z}; \quad Y := \frac{\partial}{\partial y}$

but not from the commutator vector field

$\displaystyle Z := [Y,X] = \frac{\partial}{\partial z}.$

This gives ${H}$ the geometry of a Carnot group. As observed by Semmes, it follows from the Carnot group differentiation theory of Pansu that there is no bilipschitz map from ${(H,d)}$ to any Euclidean space ${{\bf R}^D}$, or even to ${\ell^2}$: such a map would have to be differentiable almost everywhere in the sense of Carnot groups, and in particular its derivative map would annihilate ${Z}$ almost everywhere, which is incompatible with being bilipschitz.

On the other hand, if one snowflakes the Heisenberg group by replacing the metric ${d}$ with ${d^{1-\varepsilon}}$ for some ${0 < \varepsilon < 1}$, then it follows from the general theory of Assouad on embedding snowflaked metrics of doubling spaces that ${(H,d^{1-\varepsilon})}$ may be embedded in a bilipschitz fashion into ${\ell^2}$, or even into ${{\bf R}^{D_\varepsilon}}$ for some ${D_\varepsilon}$ depending on ${\varepsilon}$.

Of course, the distortion of this bilipschitz embedding must degenerate in the limit ${\varepsilon \rightarrow 0}$. From the work of Austin-Naor-Tessera and Naor-Neiman it follows that ${(H,d^{1-\varepsilon})}$ may be embedded into ${\ell^2}$ with a distortion of ${O( \varepsilon^{-1/2} )}$, but no better. The Naor-Neiman paper also embeds ${(H,d^{1-\varepsilon})}$ into a finite-dimensional space ${{\bf R}^D}$ with ${D}$ independent of ${\varepsilon}$, but at the cost of worsening the distortion to ${O(\varepsilon^{-1})}$. They then posed the question of whether this worsening of the distortion is necessary.

The main result of this paper answers this question in the negative:

Theorem 1 There exists an absolute constant ${D}$ such that ${(H,d^{1-\varepsilon})}$ may be embedded into ${{\bf R}^D}$ in a bilipschitz fashion with distortion ${O(\varepsilon^{-1/2})}$ for any ${0 < \varepsilon \leq 1/2}$.

To motivate the proof of this theorem, let us first present a bilipschitz map ${\Phi: {\bf R} \rightarrow \ell^2}$ from the snowflaked line ${({\bf R},d_{\bf R}^{1-\varepsilon})}$ (with ${d_{\bf R}}$ being the usual metric on ${{\bf R}}$) into complex Hilbert space ${\ell^2({\bf C})}$. The map is given explicitly as a Weierstrass type function

$\displaystyle \Phi(x) := \sum_{k \in {\bf Z}} 2^{-\varepsilon k} (\phi_k(x) - \phi_k(0))$

where for each ${k}$, ${\phi_k: {\bf R} \rightarrow \ell^2}$ is the function

$\displaystyle \phi_k(x) := 2^k e^{2\pi i x / 2^k} e_k,$

and ${(e_k)_{k \in {\bf Z}}}$ is an orthonormal basis of ${\ell^2({\bf C})}$. The subtraction of the constant ${\phi_k(0)}$ is purely to make the sum convergent as ${k \rightarrow \infty}$. If ${x,y \in {\bf R}}$ are such that ${2^{k_0-2} \leq d_{\bf R}(x,y) \leq 2^{k_0-1}}$ for some integer ${k_0}$, one can easily check the bounds

$\displaystyle 2^{-\varepsilon k}|\phi_k(x) - \phi_k(y)| \lesssim d_{\bf R}(x,y)^{(1-\varepsilon)} \min( 2^{-(1-\varepsilon) (k_0-k)}, 2^{-\varepsilon (k-k_0)} )$

with the lower bound

$\displaystyle 2^{-\varepsilon k_0}|\phi_{k_0}(x) - \phi_{k_0}(y)| \gtrsim d_{\bf R}(x,y)^{(1-\varepsilon)}$

at which point one finds that

$\displaystyle d_{\bf R}(x,y)^{1-\varepsilon} \lesssim |\Phi(x) - \Phi(y)| \lesssim \varepsilon^{-1/2} d_{\bf R}(x,y)^{1-\varepsilon}$

as desired.
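A finite truncation of this Weierstrass-type sum can be tested numerically. In the sketch below (my own code; the value of ${\varepsilon}$, the truncation range, and the test points are all my choices), ${\Phi(x)}$ is represented by its coefficient vector in the basis ${(e_k)}$:

```python
import numpy as np

eps = 0.1
K = range(-40, 61)  # truncation of the sum over k (tails decay at both ends)

def Phi(x):
    """Coefficients of Phi(x) in the orthonormal basis (e_k), truncated."""
    return np.array([2.0**(-eps * k) * 2.0**k *
                     (np.exp(2j * np.pi * x / 2.0**k) - 1.0) for k in K])

# The ratio |Phi(x)-Phi(y)| / d(x,y)^(1-eps) should stay between absolute
# constants (the upper one of size about eps^(-1/2)) across many scales.
ratios = []
for x, y in [(0.0, 0.7), (1.3, 2.9), (-4.0, 3.5), (10.0, 10.01)]:
    dist = abs(x - y)
    ratios.append(np.linalg.norm(Phi(x) - Phi(y)) / dist**(1 - eps))
print(ratios)
```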

The key here was that each function ${\phi_k}$ oscillated at a different spatial scale ${2^k}$, and the functions were all orthogonal to each other (so that the upper bound involved a factor of ${\varepsilon^{-1/2}}$ rather than ${\varepsilon^{-1}}$). One can replicate this example for the Heisenberg group without much difficulty. Indeed, if we let ${\Gamma := \{ [a,b,c]: a,b,c \in {\bf Z} \}}$ be the discrete Heisenberg group, then the nilmanifold ${H/\Gamma}$ is a three-dimensional smooth compact manifold; thus, by the Whitney embedding theorem, it smoothly embeds into ${{\bf R}^6}$. This gives a smooth immersion ${\phi: H \rightarrow {\bf R}^6}$ which is ${\Gamma}$-automorphic in the sense that ${\phi(p\gamma) = \phi(p)}$ for all ${p \in H}$ and ${\gamma \in \Gamma}$. If one then defines ${\phi_k: H \rightarrow \ell^2 \otimes {\bf R}^6}$ to be the function

$\displaystyle \phi_k(p) := 2^k \phi( \delta_{2^{-k}}(p) ) \otimes e_k$

where ${\delta_\lambda: H \rightarrow H}$ is the scaling map

$\displaystyle \delta_\lambda([x,y,z]) := [\lambda x, \lambda y, \lambda^2 z],$

then one can repeat the previous arguments to obtain the required bilipschitz bounds

$\displaystyle d(p,q)^{1-\varepsilon} \lesssim |\Phi(p) - \Phi(q)| \lesssim \varepsilon^{-1/2} d(p,q)^{1-\varepsilon}$

for the function

$\displaystyle \Phi(p) :=\sum_{k \in {\bf Z}} 2^{-\varepsilon k} (\phi_k(p) - \phi_k(0)).$

To adapt this construction to bounded dimension, the main obstruction was the requirement that the ${\phi_k}$ took values in orthogonal subspaces. But if one works things out carefully, it is enough to require the weaker orthogonality requirement

$\displaystyle B( \phi_{k_0}, \sum_{k>k_0} 2^{-\varepsilon(k-k_0)} \phi_k ) = 0$

for all ${k_0 \in {\bf Z}}$, where ${B(\phi, \psi): H \rightarrow {\bf R}^2}$ is the bilinear form

$\displaystyle B(\phi,\psi) := (X \phi \cdot X \psi, Y \phi \cdot Y \psi ).$

One can then try to construct the ${\phi_k: H \rightarrow {\bf R}^D}$ for bounded dimension ${D}$ by an iterative argument. After some standard reductions, the problem becomes this (roughly speaking): given a smooth, slowly varying function ${\psi: H \rightarrow {\bf R}^{D}}$ whose derivatives obey certain quantitative upper and lower bounds, construct a smooth oscillating function ${\phi: H \rightarrow {\bf R}^{D}}$, whose derivatives also obey certain quantitative upper and lower bounds, and which obeys the equation

$\displaystyle B(\phi,\psi) = 0. \ \ \ \ \ (1)$

We view this as an underdetermined system of differential equations for ${\phi}$ (two equations in ${D}$ unknowns; after some reductions, our ${D}$ can be taken to be the explicit value ${36}$). The trivial solution ${\phi=0}$ to this equation will be inadmissible for our purposes due to the lower bounds we will require on ${\phi}$ (in order to obtain the quantitative immersion property mentioned previously, as well as for a stronger “freeness” property that is needed to close the iteration). Because this construction will need to be iterated, it will be essential that the regularity control on ${\phi}$ is the same as that on ${\psi}$; one cannot afford to “lose derivatives” when passing from ${\psi}$ to ${\phi}$.

This problem has some formal similarities with the isometric embedding problem (discussed for instance in this previous post), which can be viewed as the problem of solving an equation of the form ${Q(\phi,\phi) = g}$, where ${(M,g)}$ is a Riemannian manifold and ${Q}$ is the bilinear form

$\displaystyle Q(\phi,\psi)_{ij} = \partial_i \phi \cdot \partial_j \psi.$

The isometric embedding problem also has the key obstacle that naive attempts to solve the equation ${Q(\phi,\phi)=g}$ iteratively can lead to an undesirable “loss of derivatives” that prevents one from iterating indefinitely. This obstacle was famously resolved by the Nash-Moser iteration scheme in which one alternates between perturbatively adjusting an approximate solution to improve the residual error term, and mollifying the resulting perturbation to counteract the loss of derivatives. The current equation (1) differs in some key respects from the isometric embedding equation ${Q(\phi,\phi)=g}$, in particular being linear in the unknown field ${\phi}$ rather than quadratic; nevertheless the key obstacle is the same, namely that naive attempts to solve either equation lose derivatives. Our approach to solving (1) was inspired by the Nash-Moser scheme; in retrospect, I also found similarities with Uchiyama’s constructive proof of the Fefferman-Stein decomposition theorem, discussed in this previous post (and in this recent one).

To motivate this iteration, we first express ${B(\phi,\psi)}$ using the product rule in a form that does not place derivatives directly on the unknown ${\phi}$:

$\displaystyle B(\phi,\psi) = \left( W(\phi \cdot W \psi) - \phi \cdot WW \psi\right)_{W = X,Y} \ \ \ \ \ (2)$

This reveals that one can construct solutions ${\phi}$ to (1) by solving the system of equations

$\displaystyle \phi \cdot W \psi = \phi \cdot WW \psi = 0 \ \ \ \ \ (3)$

for ${W \in \{X, Y \}}$. Because this system is zeroth order in ${\phi}$, this can easily be done by linear algebra (even in the presence of a forcing term ${B(\phi,\psi)=F}$) if one imposes a “freeness” condition (analogous to the notion of a free embedding in the isometric embedding problem) that ${X \psi(p), Y \psi(p), XX \psi(p), YY \psi(p)}$ are linearly independent at each point ${p}$, which (together with some other technical conditions of a similar nature) one then adds to the list of upper and lower bounds required on ${\psi}$ (with a related bound then imposed on ${\phi}$, in order to close the iteration). However, as mentioned previously, there is a “loss of derivatives” problem with this construction: due to the presence of the differential operators ${W}$ in (3), a solution ${\phi}$ constructed by this method can only be expected to have two degrees less regularity than ${\psi}$ at best, which makes this construction unsuitable for iteration.
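The zeroth-order solve is, at each point, plain linear algebra: one needs ${\phi(p)}$ orthogonal to the four vectors ${X \psi(p), Y \psi(p), XX \psi(p), YY \psi(p)}$, which leaves plenty of room once ${D}$ exceeds ${4}$ and freeness holds. A pointwise sketch with random stand-in data (the names and the use of an SVD null space are my own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
D = 36  # the explicit ambient dimension used after reductions

# Stand-ins for X psi(p), Y psi(p), XX psi(p), YY psi(p) at a fixed point p;
# the "freeness" condition asks these four vectors to be linearly independent.
V = rng.standard_normal((4, D))
assert np.linalg.matrix_rank(V) == 4

# An orthonormal basis of the null space of V parametrises the admissible
# values of phi(p); any nonzero element solves the four constraints.
_, _, vt = np.linalg.svd(V)
null_basis = vt[4:]        # (D - 4) x D matrix with orthonormal rows
phi = null_basis[0]        # one admissible unit-norm choice

residual = np.max(np.abs(V @ phi))
print(residual)  # ~ 0: phi satisfies all four orthogonality constraints
```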

To get around this obstacle (which also prominently appears when solving (linearisations of) the isometric embedding equation ${Q(\phi,\phi)=g}$), we instead first construct a smooth, low-frequency solution ${\phi_{\leq N_0} \colon H \rightarrow {\bf R}^{D}}$ to a low-frequency equation

$\displaystyle B( \phi_{\leq N_0}, P_{\leq N_0} \psi ) = 0 \ \ \ \ \ (4)$

where ${P_{\leq N_0} \psi}$ is a mollification of ${\psi}$ (of Littlewood-Paley type) applied at a small spatial scale ${1/N_0}$ for some ${N_0}$, and then gradually relax the frequency cutoff ${P_{\leq N_0}}$ to deform this low frequency solution ${\phi_{\leq N_0}}$ to a solution ${\phi}$ of the actual equation (1).

We will construct the low-frequency solution ${\phi_{\leq N_0}}$ rather explicitly, using the Whitney embedding theorem to construct an initial oscillating map ${f}$ into a very low dimensional space ${{\bf R}^6}$, composing it with a Veronese type embedding into a slightly larger dimensional space ${{\bf R}^{27}}$ to obtain a required “freeness” property, and then composing further with a slowly varying isometry ${U(p) \colon {\bf R}^{27} \rightarrow {\bf R}^{36}}$ depending on ${P_{\leq N_0}}$ and constructed by a quantitative topological lemma (relying ultimately on the vanishing of the first few homotopy groups of high-dimensional spheres), in order to obtain the required orthogonality (4). (This sort of “quantitative null-homotopy” was first proposed by Gromov, with some recent progress on optimal bounds by Chambers-Manin-Weinberger and by Chambers-Dotterer-Manin-Weinberger, but we will not need these more advanced results here, as one can rely on the classical qualitative vanishing ${\pi_k(S^d)=0}$ for ${k < d}$ together with a compactness argument to obtain (ineffective) quantitative bounds, which suffice for this application.)

To perform the deformation of ${\phi_{\leq N_0}}$ into ${\phi}$, we must solve what is essentially the linearised equation

$\displaystyle B( \dot \phi, \psi ) + B( \phi, \dot \psi ) = 0 \ \ \ \ \ (5)$

of (1) when ${\phi}$, ${\psi}$ (viewed as low frequency functions) are both being deformed at some rates ${\dot \phi, \dot \psi}$ (which should be viewed as high frequency functions). To avoid losing derivatives, the magnitude of the deformation ${\dot \phi}$ in ${\phi}$ should not be significantly greater than the magnitude of the deformation ${\dot \psi}$ in ${\psi}$, when measured in the same function space norms.

As before, if one directly solves the difference equation (5) using a naive application of (2) with ${B(\phi,\dot \psi)}$ treated as a forcing term, one will lose at least one derivative of regularity when passing from ${\dot \psi}$ to ${\dot \phi}$. However, observe that (2) (and the symmetry ${B(\phi, \dot \psi) = B(\dot \psi,\phi)}$) can be used to obtain the identity

$\displaystyle B( \dot \phi, \psi ) + B( \phi, \dot \psi ) = \left( W(\dot \phi \cdot W \psi + \dot \psi \cdot W \phi) - (\dot \phi \cdot WW \psi + \dot \psi \cdot WW \phi)\right)_{W = X,Y} \ \ \ \ \ (6)$

and then one can solve (5) by solving the system of equations

$\displaystyle \dot \phi \cdot W \psi = - \dot \psi \cdot W \phi$

for ${W \in \{X,XX,Y,YY\}}$. The key point here is that this system is zeroth order in both ${\dot \phi}$ and ${\dot \psi}$, so one can solve this system without losing any derivatives when passing from ${\dot \psi}$ to ${\dot \phi}$; compare this situation with that of the superficially similar system

$\displaystyle \dot \phi \cdot W \psi = - \phi \cdot W \dot \psi$

that one would obtain from naively linearising (3) without exploiting the symmetry of ${B}$. There is still however one residual “loss of derivatives” problem arising from the presence of a differential operator ${W}$ on the ${\phi}$ term, which prevents one from directly evolving this iteration scheme in time without losing regularity in ${\phi}$. It is here that we borrow the final key idea of the Nash-Moser scheme, which is to replace ${\phi}$ by a mollified version ${P_{\leq N} \phi}$ of itself (where the projection ${P_{\leq N}}$ depends on the time parameter). This creates an error term in (5), but it turns out that this error term is quite small and smooth (being a “high-high paraproduct” of ${\nabla \phi}$ and ${\nabla\psi}$, it ends up being far more regular than either ${\phi}$ or ${\psi}$, even with the presence of the derivatives) and can be iterated away provided that the initial frequency cutoff ${N_0}$ is large and the function ${\psi}$ has a fairly high (but finite) amount of regularity (we will eventually use the Hölder space ${C^{20,\alpha}}$ on the Heisenberg group to measure this).

Previous set of notes: Notes 1. Next set of notes: Notes 3.
We now leave the topic of Riemann surfaces, and turn to the (loosely related) topic of conformal mapping (and quasiconformal mapping). Recall that a conformal map ${f: U \rightarrow V}$ from an open subset ${U}$ of the complex plane to another open set ${V}$ is a map that is holomorphic and bijective, which (by Rouché’s theorem) also forces the derivative of ${f}$ to be nowhere vanishing. We then say that the two open sets ${U,V}$ are conformally equivalent. From the Cauchy-Riemann equations we see that conformal maps are orientation-preserving and angle-preserving; from the Newton approximation ${f( z_0 + \Delta z) \approx f(z_0) + f'(z_0) \Delta z + O( |\Delta z|^2)}$ we see that they almost preserve small circles: for small ${\varepsilon}$ the circle ${\{ z: |z-z_0| = \varepsilon\}}$ will approximately map to ${\{ w: |w - f(z_0)| = |f'(z_0)| \varepsilon \}}$.

Theorem 1 (Riemann mapping theorem) Let ${U}$ be a simply connected open subset of ${{\bf C}}$ that is not all of ${{\bf C}}$. Then ${U}$ is conformally equivalent to the unit disk ${D(0,1)}$.

This theorem was proven in these 246A lecture notes, using an argument of Koebe. At a very high level, one can sketch Koebe’s proof of the Riemann mapping theorem as follows: among all the injective holomorphic maps ${f: U \rightarrow D(0,1)}$ from ${U}$ to ${D(0,1)}$ that map some fixed point ${z_0 \in U}$ to ${0}$, pick one that maximises the magnitude ${|f'(z_0)|}$ of the derivative (ignoring for this discussion the issue of proving that a maximiser exists). If ${f(U)}$ avoids some point in ${D(0,1)}$, one can compose ${f}$ with various holomorphic maps and use Schwarz’s lemma and the chain rule to increase ${|f'(z_0)|}$ without destroying injectivity; see the previous lecture notes for details. The conformal map ${\phi: U \rightarrow D(0,1)}$ is unique up to Möbius automorphisms of the disk; one can fix the map by picking two distinct points ${z_0,z_1}$ in ${U}$, and requiring ${\phi(z_0)}$ to be zero and ${\phi(z_1)}$ to be positive real.
It is a beautiful observation of Thurston that the concept of a conformal mapping has a discrete counterpart, namely the mapping of one circle packing to another. Furthermore, one can run a version of Koebe’s argument (using now a discrete version of Perron’s method) to prove the Riemann mapping theorem through circle packings. In principle, this leads to a mostly elementary approach to conformal geometry, based on extremely classical mathematics that goes all the way back to Apollonius. However, in order to prove the basic existence and uniqueness theorems of circle packing, as well as the convergence to conformal maps in the continuous limit, it seems to be necessary (or at least highly convenient) to use much more modern machinery, including the theory of quasiconformal mapping, and also the Riemann mapping theorem itself (so in particular we are not structuring these notes to provide a completely independent proof of that theorem, though this may well be possible).
To make the above discussion more precise we need some notation.

Definition 2 (Circle packing) A (finite) circle packing is a finite collection ${(C_j)_{j \in J}}$ of circles ${C_j = \{ z \in {\bf C}: |z-z_j| = r_j\}}$ in the complex numbers indexed by some finite set ${J}$, whose interiors are all disjoint (but which are allowed to be tangent to each other), and whose union is connected. The nerve of a circle packing is the finite graph whose vertices ${\{z_j: j \in J \}}$ are the centres of the circle packing, with two such centres connected by an edge if the circles are tangent. (In these notes all graphs are undirected, finite and simple, unless otherwise specified.)

It is clear that the nerve of a circle packing is connected and planar, since one can draw the nerve by placing each vertex (tautologically) in its location in the complex plane, and drawing each edge by the line segment between the centres of the circles it connects (this line segment will pass through the point of tangency of the two circles). Later in these notes we will also have to consider some infinite circle packings, most notably the infinite regular hexagonal circle packing.
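Tangency is easy to test numerically: two circles with disjoint interiors are tangent precisely when the distance between their centres equals the sum of their radii. The following Python sketch (an illustration only, not part of the theory) builds the nerve of a small toy packing of four unit circles this way:

```python
import math
import itertools

# A toy circle packing as a dict index -> (centre, radius); illustrative only.
# A, B, C are three mutually tangent unit circles; D is tangent to B alone.
circles = {
    "A": (complex(0, 0), 1.0),
    "B": (complex(2, 0), 1.0),
    "C": (complex(1, math.sqrt(3)), 1.0),
    "D": (complex(4, 0), 1.0),
}

def nerve(circles, tol=1e-9):
    """Edges of the nerve: two circles are adjacent iff externally tangent,
    i.e. the distance between centres equals the sum of the radii."""
    edges = set()
    for i, j in itertools.combinations(circles, 2):
        (zi, ri), (zj, rj) = circles[i], circles[j]
        if abs(abs(zi - zj) - (ri + rj)) < tol:
            edges.add(frozenset((i, j)))
    return edges

nerve_edges = nerve(circles)
# {A,B}, {A,C}, {B,C} form a triangle; {B,D} is a pendant edge.
```

Here the circles A, B, C are mutually tangent, so the nerve contains a triangle, while D contributes a single pendant edge, matching the planar drawing by line segments between centres described above.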
The first basic theorem in the subject is the following converse statement:

Theorem 3 (Circle packing theorem) Every connected planar graph is the nerve of a circle packing.

Of course, there can be multiple circle packings associated to a given connected planar graph; indeed, since reflections across a line and Möbius transformations map circles to circles (or lines), they will map circle packings to circle packings (unless one or more of the circles is sent to a line). It turns out that once one adds enough edges to the planar graph, the circle packing is otherwise rigid:

Theorem 4 (Koebe-Andreev-Thurston theorem) If a connected planar graph is maximal (i.e., no further edge can be added to it without destroying planarity), then the circle packing given by the above theorem is unique up to reflections and Möbius transformations.

Exercise 5 Let ${G}$ be a connected planar graph with ${n \geq 3}$ vertices. Show that the following are equivalent:

• (i) ${G}$ is a maximal planar graph.
• (ii) ${G}$ has ${3n-6}$ edges.
• (iii) Every drawing ${D}$ of ${G}$ divides the plane into faces that have three edges each. (This includes one unbounded face.)
• (iv) At least one drawing ${D}$ of ${G}$ divides the plane into faces that have three edges each.

(Hint: use Euler’s formula ${V-E+F=2}$, where ${F}$ is the number of faces including the unbounded face.)
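As a quick numerical sanity check of part (ii), one can count edges and faces for a standard maximal planar graph, say the octahedron (every face of which is a triangle):

```python
# Sanity check of Exercise 5(ii) and Euler's formula on a maximal planar
# graph: the octahedron, vertices 0..5 with 0 and 5 antipodal.
n = 6
edges = {frozenset(e) for e in [
    (0, 1), (0, 2), (0, 3), (0, 4),
    (5, 1), (5, 2), (5, 3), (5, 4),
    (1, 2), (2, 3), (3, 4), (4, 1),
]}
num_edges = len(edges)
num_faces = 2 - n + num_edges        # Euler's formula V - E + F = 2
assert num_edges == 3 * n - 6        # part (ii)
assert 3 * num_faces == 2 * num_edges   # every face (incl. unbounded) is a triangle
```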

Thurston conjectured that circle packings can be used to approximate the conformal map arising in the Riemann mapping theorem. Here is an informal statement:

Conjecture 6 (Informal Thurston conjecture) Let ${U}$ be a simply connected proper subdomain of ${\bf C}$, with two distinct points ${z_0,z_1}$. Let ${\phi: U \rightarrow D(0,1)}$ be the conformal map from ${U}$ to ${D(0,1)}$ that maps ${z_0}$ to the origin and ${z_1}$ to a positive real. For any small ${\varepsilon>0}$, let ${{\mathcal C}_\varepsilon}$ be the portion of the regular hexagonal circle packing by circles of radius ${\varepsilon}$ that is contained in ${U}$, and let ${{\mathcal C}'_\varepsilon}$ be a circle packing of ${D(0,1)}$ with the same nerve as ${{\mathcal C}_\varepsilon}$ and with all “boundary circles” tangent to the boundary of ${D(0,1)}$, giving rise to an “approximate map” ${\phi_\varepsilon: U_\varepsilon \rightarrow D(0,1)}$ defined on the subset ${U_\varepsilon}$ of ${U}$ consisting of the circles of ${{\mathcal C}_\varepsilon}$, their interiors, and the interstitial regions between triples of mutually tangent circles. Normalise this map so that ${\phi_\varepsilon(z_0)}$ is zero and ${\phi_\varepsilon(z_1)}$ is a positive real. Then ${\phi_\varepsilon}$ converges to ${\phi}$ as ${\varepsilon \rightarrow 0}$.

A rigorous version of this conjecture was proven by Rodin and Sullivan. Besides some elementary geometric lemmas (regarding the relative sizes of various configurations of tangent circles), the main ingredients are a rigidity result for the regular hexagonal circle packing, and the theory of quasiconformal maps. Quasiconformal maps are what seem on the surface to be a very broad generalisation of the notion of a conformal map. Informally, conformal maps take infinitesimal circles to infinitesimal circles, whereas quasiconformal maps take infinitesimal circles to infinitesimal ellipses of bounded eccentricity. In terms of Wirtinger derivatives, conformal maps obey the Cauchy-Riemann equation ${\frac{\partial \phi}{\partial \overline{z}} = 0}$, while (sufficiently smooth) quasiconformal maps only obey an inequality ${|\frac{\partial \phi}{\partial \overline{z}}| \leq \frac{K-1}{K+1} |\frac{\partial \phi}{\partial z}|}$. As such, quasiconformal maps are considerably more plentiful than conformal maps, and in particular it is possible to create piecewise smooth quasiconformal maps by gluing together various simple maps such as affine maps or Möbius transformations; such piecewise maps will naturally arise when trying to rigorously build the map ${\phi_\varepsilon}$ alluded to in the above conjecture. On the other hand, it turns out that quasiconformal maps still have many vestiges of the rigidity properties enjoyed by conformal maps; for instance, there are quasiconformal analogues of fundamental theorems in conformal mapping such as the Schwarz reflection principle, Liouville’s theorem, or Hurwitz’s theorem. Among other things, these quasiconformal rigidity theorems allow one to create conformal maps from the limit of quasiconformal maps in many circumstances, and this will be how the Thurston conjecture will be proven. 
A key technical tool in establishing these sorts of rigidity theorems will be the theory of an important quasiconformal (quasi-)invariant, the conformal modulus (or, equivalently, the extremal length, which is the reciprocal of the modulus).

The Polymath14 online collaboration has uploaded to the arXiv its paper “Homogeneous length functions on groups“, submitted to Algebra & Number Theory. The paper completely classifies homogeneous length functions ${\| \|: G \rightarrow {\bf R}^+}$ on an arbitrary group ${G = (G,\cdot,e,()^{-1})}$, that is to say non-negative functions that obey the symmetry condition ${\|x^{-1}\| = \|x\|}$, the non-degeneracy condition ${\|x\|=0 \iff x=e}$, the triangle inequality ${\|xy\| \leq \|x\| + \|y\|}$, and the homogeneity condition ${\|x^2\| = 2\|x\|}$. It turns out that these norms can only arise from pulling back the norm of a Banach space by an isometric embedding of the group. Among other things, this shows that ${G}$ supports a homogeneous length function if and only if it is abelian and torsion free, thus giving a metric description of this property.

The proof is based on repeated use of the homogeneous length function axioms, combined with elementary identities of commutators, to obtain increasingly good bounds on quantities such as ${\|[x,y]\|}$, until one can show that such norms have to vanish. See the previous post for a full proof. The result is robust in that it allows for some loss in the triangle inequality and homogeneity condition, allowing for some new results on “quasinorms” on groups that relate to quasihomomorphisms.

As there are now a large number of comments on the previous post on this project, this post will also serve as the new thread for any final discussion of this project as it winds down.

In the tradition of “Polymath projects“, the problem posed in the previous two blog posts has now been solved, thanks to the cumulative effect of many small contributions by many participants (including, but not limited to, Sean Eberhard, Tobias Fritz, Siddharta Gadgil, Tobias Hartnick, Chris Jerdonek, Apoorva Khare, Antonio Machiavelo, Pace Nielsen, Andy Putman, Will Sawin, Alexander Shamov, Lior Silberman, and David Speyer). In this post I’ll write down a streamlined resolution, eliding a number of important but ultimately removable partial steps and insights made by the above contributors en route to the solution.

Theorem 1 Let ${G = (G,\cdot)}$ be a group. Suppose one has a “seminorm” function ${\| \|: G \rightarrow [0,+\infty)}$ which obeys the triangle inequality

$\displaystyle \|xy \| \leq \|x\| + \|y\|$

for all ${x,y \in G}$, with equality whenever ${x=y}$. Then the seminorm factors through the abelianisation map ${G \rightarrow G/[G,G]}$.

Proof: By the triangle inequality, it suffices to show that ${\| [x,y]\| = 0}$ for all ${x,y \in G}$, where ${[x,y] := xyx^{-1}y^{-1}}$ is the commutator.

We first establish some basic facts. Firstly, by hypothesis we have ${\|x^2\| = 2 \|x\|}$, and hence ${\|x^n \| = n \|x\|}$ whenever ${n}$ is a power of two. On the other hand, by the triangle inequality we have ${\|x^n \| \leq n\|x\|}$ for all positive ${n}$. To obtain the matching lower bound, pick a power of two ${2^k \geq n}$ and apply the triangle inequality to ${x^{2^k} = x^n x^{2^k - n}}$ to conclude that ${2^k \|x\| \leq \|x^n\| + (2^k - n) \|x\|}$, thus

$\displaystyle \|x^n \| = n \|x\|$

for all ${n > 0}$. The claim is also true for ${n=0}$ (apply the preceding bound with ${x=1}$ and ${n=2}$). By replacing ${\|x\|}$ with ${\max(\|x\|, \|x^{-1}\|)}$ if necessary we may now also assume without loss of generality that ${\|x^{-1} \| = \|x\|}$, thus

$\displaystyle \|x^n \| = |n| \|x\| \ \ \ \ \ (1)$

for all integers ${n}$.

Next, for any ${x,y \in G}$, and any natural number ${n}$, we have

$\displaystyle \|yxy^{-1} \| = \frac{1}{n} \| (yxy^{-1})^n \|$

$\displaystyle = \frac{1}{n} \| y x^n y^{-1} \|$

$\displaystyle \leq \frac{1}{n} ( \|y\| + n \|x\| + \|y^{-1}\| )$

so on taking limits as ${n \rightarrow \infty}$ we have ${\|yxy^{-1} \| \leq \|x\|}$. Replacing ${x,y}$ by ${yxy^{-1},y^{-1}}$ gives the matching lower bound, thus we have the conjugation invariance

$\displaystyle \|yxy^{-1} \| = \|x\|. \ \ \ \ \ (2)$

Next, we observe that if ${x,y,z,w}$ are such that ${x}$ is conjugate to both ${wy}$ and ${zw^{-1}}$, then one has the inequality

$\displaystyle \|x\| \leq \frac{1}{2} ( \|y \| + \| z \| ). \ \ \ \ \ (3)$

Indeed, if we write ${x = swys^{-1} = t zw^{-1} t^{-1}}$ for some ${s,t \in G}$, then for any natural number ${n}$ one has

$\displaystyle \|x\| = \frac{1}{2n} \| x^n x^n \|$

$\displaystyle = \frac{1}{2n} \| swy \dots wy s^{-1}t zw^{-1} \dots zw^{-1} t^{-1} \|$

where the ${wy}$ and ${zw^{-1}}$ terms each appear ${n}$ times. From (2) we see that conjugation by ${w}$ does not affect the norm. Using this and the triangle inequality several times, we conclude that

$\displaystyle \|x\| \leq \frac{1}{2n} ( \|s\| + n \|y\| + \| s^{-1} t\| + n \|z\| + \|t^{-1} \| ),$

and the claim (3) follows by sending ${n \rightarrow \infty}$.

The following special case of (3) will be of particular interest. Let ${x,y \in G}$, and for any integers ${m,k}$, define the quantity

$\displaystyle f(m,k) := \| x^m [x,y]^k \|.$

Observe that ${x^m [x,y]^k}$ is conjugate to both ${x (x^{m-1} [x,y]^k)}$ and to ${(y^{-1} x^m [x,y]^{k-1} xy) x^{-1}}$, hence by (3) one has

$\displaystyle \| x^m [x,y]^k \| \leq \frac{1}{2} ( \| x^{m-1} [x,y]^k \| + \| y^{-1} x^{m} [x,y]^{k-1} xy \|)$

which by (2) leads to the recursive inequality

$\displaystyle f(m,k) \leq \frac{1}{2} (f(m-1,k) + f(m+1,k-1)).$

We can write this in probabilistic notation as

$\displaystyle f(m,k) \leq {\bf E} f( (m,k) + X )$

where ${X}$ is a random vector that takes the values ${(-1,0)}$ and ${(1,-1)}$ with probability ${1/2}$ each. Iterating this, we conclude in particular that for any large natural number ${n}$, one has

$\displaystyle f(0,n) \leq {\bf E} f( Z )$

where ${Z := (0,n) + X_1 + \dots + X_{2n}}$ and ${X_1,\dots,X_{2n}}$ are iid copies of ${X}$. We can write ${Z = (1,-1/2) (Y_1 + \dots + Y_{2n})}$ where $Y_1,\dots,Y_{2n} = \pm 1$ are iid signs.  By the triangle inequality, we thus have

$\displaystyle f( Z ) \leq |Y_1+\dots+Y_{2n}| (\|x\| + \frac{1}{2} \| [x,y] \|),$

noting that $Y_1+\dots+Y_{2n}$ is an even integer.  On the other hand, $Y_1+\dots+Y_{2n}$ has mean zero and variance $2n$, hence by Cauchy-Schwarz

$\displaystyle f(0,n) \leq \sqrt{2n}( \|x\| + \frac{1}{2} \| [x,y] \|).$

But by (1), the left-hand side is equal to ${n \| [x,y]\|}$. Dividing by ${n}$ and then sending ${n \rightarrow \infty}$, we obtain the claim. $\Box$
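The quantity ${\bf E} |Y_1 + \dots + Y_{2n}|$ appearing at the end of the proof can be computed exactly from the binomial distribution, and one can confirm numerically that it is dominated by the Cauchy-Schwarz bound ${\sqrt{2n}}$; a small illustrative Python computation:

```python
import math

def expected_abs_sum(m):
    """E|Y_1 + ... + Y_m| for m iid uniform signs, computed exactly:
    the number of +1's is Binomial(m, 1/2), and the sum is then 2k - m."""
    return sum(math.comb(m, k) * abs(2 * k - m) for k in range(m + 1)) / 2 ** m

# Cauchy-Schwarz: E|S| <= (E S^2)^{1/2} = sqrt(m) with m = 2n, which is what
# makes the bound sqrt(2n)(||x|| + ||[x,y]||/2) beat f(0,n) = n ||[x,y]||.
for n in (1, 5, 50, 200):
    assert expected_abs_sum(2 * n) <= math.sqrt(2 * n)
```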

The above theorem reduces such seminorms to abelian groups. It is easy to see from (1) that any torsion element of such groups has zero seminorm, so we can in fact restrict to torsion-free groups, which we now write using additive notation ${G = (G,+)}$, thus for instance ${\| nx \| = |n| \|x\|}$ for ${n \in {\bf Z}}$. We think of ${G}$ as a ${{\bf Z}}$-module. One can then extend the seminorm to the associated ${{\bf Q}}$-vector space ${G \otimes_{\bf Z} {\bf Q}}$ by the formula ${\|\frac{a}{b} x\| := \frac{a}{b} \|x\|}$, and then to the associated ${{\bf R}}$-vector space ${G \otimes_{\bf Z} {\bf R}}$ by continuity, at which point it becomes a genuine seminorm (provided we have ensured the symmetry condition ${\|x\| = \|x^{-1}\|}$). Conversely, any seminorm on ${G \otimes_{\bf Z} {\bf R}}$ induces a seminorm on ${G}$. (These arguments also appear in this paper of Khare and Rajaratnam.)

This post is a continuation of the previous post, which has attracted a large number of comments. I’m recording here some calculations that arose from those comments (particularly those of Pace Nielsen, Lior Silberman, Tobias Fritz, and Apoorva Khare). Please feel free to either continue these calculations or to discuss other approaches to the problem, such as those mentioned in the remaining comments to the previous post.

Let ${F_2}$ be the free group on two generators ${a,b}$, and let ${\| \|: F_2 \rightarrow {\bf R}^+}$ be a quantity obeying the triangle inequality

$\displaystyle \| xy\| \leq \|x \| + \|y\|$

and the linear growth property

$\displaystyle \| x^n \| = |n| \| x\|$

for all ${x,y \in F_2}$ and integers ${n \in {\bf Z}}$; this implies the conjugation invariance

$\displaystyle \| y^{-1} x y \| = \|x\|$

or equivalently

$\displaystyle \| xy \| = \| yx\|$

We consider inequalities of the form

$\displaystyle \| xyx^{-1}y^{-1} \| \leq \alpha \|x\| + \beta \| y\| \ \ \ \ \ (1)$

or

$\displaystyle \| xyx^{-2}y^{-1} \| \leq \gamma \|x\| + \delta \| y\| \ \ \ \ \ (2)$

for various real numbers ${\alpha,\beta,\gamma,\delta}$. For instance, since

$\displaystyle \| xyx^{-1}y^{-1} \| \leq \| xyx^{-1}\| + \|y^{-1} \| = \|y\| + \|y\|$

we have (1) for ${(\alpha,\beta) = (0,2)}$. We also have the following further relations:

Proposition 1

• (i) If (1) holds for ${(\alpha,\beta)}$, then it holds for ${(\beta,\alpha)}$.
• (ii) If (1) holds for ${(\alpha,\beta)}$, then (2) holds for ${(\alpha+1, \frac{\beta}{2})}$.
• (iii) If (2) holds for ${(\gamma,\delta)}$, then (1) holds for ${(\frac{2\gamma}{3}, \frac{2\delta}{3})}$.
• (iv) If (1) holds for ${(\alpha,\beta)}$ and (2) holds for ${(\gamma,\delta)}$, then (1) holds for ${(\frac{2\alpha+1+\gamma}{4}, \frac{\delta+\beta}{4})}$.

Proof: For (i) we simply observe that

$\displaystyle \| xyx^{-1} y^{-1} \| = \| (xyx^{-1} y^{-1})^{-1} \| = \| y^{-1} x^{-1} y x \| = \| y x y^{-1} x^{-1} \|.$

For (ii), we calculate

$\displaystyle \| xyx^{-2}y^{-1} \| = \frac{1}{2}\| (xyx^{-2}y^{-1})^2 \|$

$\displaystyle = \frac{1}{2} \| (xyx^{-2}y^{-1} x) (yx^{-2} y^{-1}) \|$

$\displaystyle \leq \frac{1}{2} (\| xyx^{-2}y^{-1} x\| + \|yx^{-2} y^{-1}\|)$

$\displaystyle \leq \frac{1}{2} ( \| x^2 y x^{-2} y^{-1} \| + 2 \|x\| )$

$\displaystyle \leq \frac{1}{2} ( 2 \alpha \|x\| + \beta \|y\| + 2 \|x\|)$

giving the claim.

For (iii), we calculate

$\displaystyle \| xyx^{-1}y^{-1}\| = \frac{1}{3} \| (xyx^{-1}y^{-1})^3 \|$

$\displaystyle = \frac{1}{3} \| (xyx) (x^{-2} y^{-1} xy) (xyx)^{-1} (x^2 y x^{-1} y^{-1}) \|$

$\displaystyle \leq \frac{1}{3} ( \| x^{-2} y^{-1} xy\| + \| x^2 y x^{-1} y^{-1}\| )$

$\displaystyle = \frac{1}{3} ( \| xy x^{-2} y^{-1} \| + \|x^{-1} y^{-1} x^2 y \| )$

$\displaystyle \leq \frac{1}{3} ( \gamma \|x\| + \delta \|y\| + \gamma \|x\| + \delta \|y\|)$

giving the claim.

For (iv), we calculate

$\displaystyle \| xyx^{-1}y^{-1}\| = \frac{1}{4} \| (xyx^{-1}y^{-1})^4 \|$

$\displaystyle = \frac{1}{4} \| (xy) (x^{-1} y^{-1} x) (y x^{-1} y^{-1}) (xyx^{-1}) (xy)^{-1} (x^2yx^{-1}y^{-1}) \|$

$\displaystyle \leq \frac{1}{4} ( \| (x^{-1} y^{-1} x) (y x^{-1} y^{-1}) (xyx^{-1}) \| + \|x^2yx^{-1}y^{-1}\| )$

$\displaystyle \leq \frac{1}{4} ( \|(y x y^{-1}) (x^{-1} y x)(xy^{-1}x^{-1}) \| + \gamma \|x\| + \delta \|y\|)$

$\displaystyle \leq \frac{1}{4} ( \|x\| + \|(xy^{-1}x^{-1})(x^{-1} y x) \| + \gamma \|x\| + \delta \|y\|)$

$\displaystyle = \frac{1}{4} ( \|x\| + \|x^{-2} y x^2 y^{-1} \|+ \gamma \|x\| + \delta \|y\|)$

$\displaystyle \leq \frac{1}{4} ( \|x\| + 2\alpha \|x\| + \beta \|y\| + \gamma \|x\| + \delta \|y\|)$

giving the claim. $\Box$
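The word factorisations underlying (iii) and (iv) can be verified mechanically by free reduction. Here is a short Python sketch (purely illustrative, with uppercase letters denoting inverse generators):

```python
def word(s):
    """Parse a word in the free group F_2: lowercase = generator,
    uppercase = its inverse; e.g. 'x y X Y' is the commutator [x,y]."""
    return [(c.lower(), -1 if c.isupper() else 1) for c in s.split()]

def inv(w):
    """Inverse of a word: reverse the letters and flip exponents."""
    return [(g, -e) for g, e in reversed(w)]

def reduce_word(w):
    """Freely reduce: cancel adjacent g g^{-1} pairs with a stack."""
    out = []
    for g, e in w:
        if out and out[-1] == (g, -e):
            out.pop()
        else:
            out.append((g, e))
    return out

comm = word("x y X Y")                                   # [x,y]

# factorisation of (xyx^{-1}y^{-1})^3 used in (iii)
lhs3 = reduce_word(comm * 3)
a = word("x y x")
rhs3 = reduce_word(a + word("X X Y x y") + inv(a) + word("x x y X Y"))
assert lhs3 == rhs3

# factorisation of (xyx^{-1}y^{-1})^4 used in (iv)
lhs4 = reduce_word(comm * 4)
b = word("x y")
rhs4 = reduce_word(b + word("X Y x") + word("y X Y") + word("x y X")
                   + inv(b) + word("x x y X Y"))
assert lhs4 == rhs4
```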

Here is a typical application of the above estimates. If (1) holds for ${(\alpha,\beta)}$, then by part (i) it holds for ${(\beta,\alpha)}$, then by (ii) (2) holds for ${(\beta+1,\frac{\alpha}{2})}$, then by (iv) (1) holds for ${(\frac{3\beta+2}{4}, \frac{3\alpha}{8})}$. The map ${(\alpha,\beta) \mapsto (\frac{3\beta+2}{4}, \frac{3\alpha}{8})}$ has fixed point ${(\alpha,\beta) = (\frac{16}{23}, \frac{6}{23})}$, thus

$\displaystyle \| xyx^{-1}y^{-1} \| \leq \frac{16}{23} \|x\| + \frac{6}{23} \|y\|.$

For instance, if ${\|a\|, \|b\| \leq 1}$, then ${\|aba^{-1}b^{-1} \| \leq 22/23 = 0.95652\dots}$.
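The iteration and its fixed point are easy to check numerically. The following Python snippet (illustrative only) iterates the map starting from a trivial bound and verifies that ${(\frac{16}{23}, \frac{6}{23})}$ is exactly fixed:

```python
from fractions import Fraction

def step(alpha, beta):
    """If (1) holds for (alpha, beta), the chain (i) -> (ii) -> (iv)
    shows it also holds for ((3*beta + 2)/4, 3*alpha/8)."""
    return (3 * beta + 2) / 4, 3 * alpha / 8

# iterate from a trivial bound; the map is a contraction, so the
# starting point does not matter
alpha, beta = 2.0, 0.0
for _ in range(100):
    alpha, beta = step(alpha, beta)

# the fixed point is exactly (16/23, 6/23), in exact rational arithmetic
fp = (Fraction(16, 23), Fraction(6, 23))
assert step(*fp) == fp
```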

Here is a curious question posed to me by Apoorva Khare that I do not know the answer to. Let ${F_2}$ be the free group on two generators ${a,b}$. Does there exist a metric ${d}$ on this group which is

• bi-invariant, thus ${d(xg,yg)=d(gx,gy) = d(x,y)}$ for all ${x,y,g \in F_2}$; and
• of linear growth, in the sense that ${d(x^n,1) = n d(x,1)}$ for all ${x \in F_2}$ and all natural numbers ${n}$?

By defining the “norm” of an element ${x \in F_2}$ to be ${\| x\| := d(x,1)}$, an equivalent formulation of the problem asks if there exists a non-negative norm function ${\| \|: F_2 \rightarrow {\bf R}^+}$ that obeys the conjugation invariance

$\displaystyle \| gxg^{-1} \| = \|x \| \ \ \ \ \ (1)$

for all ${x,g \in F_2}$, the triangle inequality

$\displaystyle \| xy \| \leq \| x\| + \| y\| \ \ \ \ \ (2)$

for all ${x,y \in F_2}$, and the linear growth

$\displaystyle \| x^n \| = |n| \|x\| \ \ \ \ \ (3)$

for all ${x \in F_2}$ and ${n \in {\bf Z}}$, and such that ${\|x\| > 0}$ for all non-identity ${x \in F_2}$. Indeed, if such a norm exists then one can just take ${d(x,y) := \| x y^{-1} \|}$ to give the desired metric.

One can normalise the norm of the generators to be at most ${1}$, thus

$\displaystyle \| a \|, \| b \| \leq 1.$

This can then be used to upper bound the norm of other words in ${F_2}$. For instance, from (1), (3) one has

$\displaystyle \| aba^{-1} \|, \| b^{-1} a b \|, \| a^{-1} b^{-1} a \|, \| bab^{-1}\| \leq 1.$

A bit less trivially, from (3), (2), (1) one can bound commutators as

$\displaystyle \| aba^{-1} b^{-1} \| = \frac{1}{3} \| (aba^{-1} b^{-1})^3 \|$

$\displaystyle = \frac{1}{3} \| (aba^{-1}) (b^{-1} ab) (a^{-1} b^{-1} a) (b ab^{-1}) \|$

$\displaystyle \leq \frac{4}{3}.$

In a similar spirit one has

$\displaystyle \| aba^{-2} b^{-1} \| = \frac{1}{2} \| (aba^{-2} b^{-1})^2 \|$

$\displaystyle = \frac{1}{2} \| (aba^{-1}) (a^{-1} b^{-1} a) (ba^{-1} b^{-1}) (ba^{-1} b^{-1}) \|$

$\displaystyle \leq 2.$

What is not clear to me is if one can keep arguing like this to continually improve the upper bounds on the norm ${\| g\|}$ of a given non-trivial group element ${g}$ to the point where this norm must in fact vanish, which would demonstrate that no metric with the above properties on ${F_2}$ would exist (and in fact would impose strong constraints on similar metrics existing on other groups as well). It is also tempting to use some ideas from geometric group theory (e.g. asymptotic cones) to try to understand these metrics further, though I wasn’t able to get very far with this approach. Anyway, this feels like a problem that might be somewhat receptive to a more crowdsourced attack, so I am posing it here in case any readers wish to try to make progress on it.

I’ve just uploaded to the arXiv my paper “On the universality of the incompressible Euler equation on compact manifolds“, submitted to Discrete and Continuous Dynamical Systems. This is a variant of my recent paper on the universality of potential well dynamics, but instead of trying to embed dynamical systems into a potential well ${\partial_{tt} u = -\nabla V(u)}$, here we try to embed dynamical systems into the incompressible Euler equations

$\displaystyle \partial_t u + \nabla_u u = - \mathrm{grad}_g p \ \ \ \ \ (1)$

$\displaystyle \mathrm{div}_g u = 0$

on a Riemannian manifold ${(M,g)}$. (One is particularly interested in the case of flat manifolds ${M}$, particularly ${{\bf R}^3}$ or ${({\bf R}/{\bf Z})^3}$, but for the main result of this paper it is essential that one is permitted to consider curved manifolds.) This system, first studied by Ebin and Marsden, is the natural generalisation of the usual incompressible Euler equations to curved space; it can be viewed as the formal geodesic flow equation on the infinite-dimensional manifold of volume-preserving diffeomorphisms on ${M}$ (see this previous post for a discussion of this in the flat space case).

The Euler equations can be viewed as a nonlinear equation in which the nonlinearity is a quadratic function of the velocity field ${u}$. It is thus natural to compare the Euler equations with quadratic ODE of the form

$\displaystyle \partial_t y = B(y,y) \ \ \ \ \ (2)$

where ${y: {\bf R} \rightarrow {\bf R}^n}$ is the unknown solution, and ${B: {\bf R}^n \times {\bf R}^n \rightarrow {\bf R}^n}$ is a bilinear map, which we may assume without loss of generality to be symmetric. One can ask whether such an ODE may be linearly embedded into the Euler equations on some Riemannian manifold ${(M,g)}$, which means that there is an injective linear map ${U: {\bf R}^n \rightarrow \Gamma(TM)}$ from ${{\bf R}^n}$ to smooth vector fields on ${M}$, as well as a bilinear map ${P: {\bf R}^n \times {\bf R}^n \rightarrow C^\infty(M)}$ to smooth scalar fields on ${M}$, such that the map ${y \mapsto (U(y), P(y,y))}$ takes solutions to (2) to solutions to (1), or equivalently that

$\displaystyle U(B(y,y)) + \nabla_{U(y)} U(y) = - \mathrm{grad}_g P(y,y)$

$\displaystyle \mathrm{div}_g U(y) = 0$

for all ${y \in {\bf R}^n}$.

For simplicity let us restrict ${M}$ to be compact. There is an obvious necessary condition for this embeddability to occur, which comes from the energy conservation law for the Euler equations; unpacking everything, this implies that the bilinear form ${B}$ in (2) has to obey a cancellation condition

$\displaystyle \langle B(y,y), y \rangle = 0 \ \ \ \ \ (3)$

for some positive definite inner product ${\langle, \rangle: {\bf R}^n \times {\bf R}^n \rightarrow {\bf R}}$ on ${{\bf R}^n}$. The main result of the paper is the converse to this statement: if ${B}$ is a symmetric bilinear form obeying a cancellation condition (3), then it is possible to embed the equations (2) into the Euler equations (1) on some Riemannian manifold ${(M,g)}$; the catch is that this manifold will depend on the form ${B}$ and on the dimension ${n}$ (in fact in the construction I have, ${M}$ is given explicitly as ${SO(n) \times ({\bf R}/{\bf Z})^{n+1}}$, with a funny metric on it that depends on ${B}$).
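The necessity of (3) reflects conservation of ${\langle y, y\rangle}$: one has ${\frac{d}{dt} \langle y, y \rangle = 2 \langle B(y,y), y \rangle = 0}$. As an illustration, here is a Python sketch with a hypothetical choice of ${B}$ obeying (3) for the Euclidean inner product; integrating with a classical Runge-Kutta step, the energy drift stays at the level of the discretisation error:

```python
# A hypothetical bilinear form obeying the cancellation condition (3) for the
# Euclidean inner product: <B(y,y), y> = y1*y2*y3 + y1*y2*y3 - 2*y1*y2*y3 = 0.
def B(y):
    y1, y2, y3 = y
    return (y2 * y3, y3 * y1, -2 * y1 * y2)

def rk4_step(y, h):
    """One classical Runge-Kutta (RK4) step for the quadratic ODE y' = B(y,y)."""
    add = lambda u, v, c: tuple(ui + c * vi for ui, vi in zip(u, v))
    k1 = B(y)
    k2 = B(add(y, k1, h / 2))
    k3 = B(add(y, k2, h / 2))
    k4 = B(add(y, k3, h))
    return tuple(y[i] + h / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i])
                 for i in range(3))

y = (1.0, 0.5, -0.3)
E0 = sum(c * c for c in y)           # the conserved "energy" <y, y>
for _ in range(1000):                # integrate to time T = 10
    y = rk4_step(y, 0.01)
drift = abs(sum(c * c for c in y) - E0)
```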

As a consequence, any finite dimensional portion of the usual “dyadic shell models” used as simplified toy models of the Euler equation can actually be embedded into a genuine Euler equation, albeit on a high-dimensional and curved manifold. This includes portions of the self-similar “machine” I used in a previous paper to establish finite time blowup for an averaged version of the Navier-Stokes (or Euler) equations. Unfortunately, the result in this paper does not apply to infinite-dimensional ODE, so I cannot yet establish finite time blowup for the Euler equations on a (well-chosen) manifold. It does not seem so far beyond the realm of possibility, though, that this could be done in the relatively near future. In particular, the result here suggests that one could construct something resembling a universal Turing machine within an Euler flow on a manifold, which was one ingredient I would need to engineer such a finite time blowup.

The proof of the main theorem proceeds by an “elimination of variables” strategy that was used in some of my previous papers in this area, though in this particular case the Nash embedding theorem (or variants thereof) is not required. The first step is to lessen the dependence on the metric ${g}$ by partially reformulating the Euler equations (1) in terms of the covelocity ${g \cdot u}$ (which is a ${1}$-form) instead of the velocity ${u}$. Using the freedom to modify the dimension of the underlying manifold ${M}$, one can also decouple the metric ${g}$ from the volume form that is used to obtain the divergence-free condition. At this point the metric can be eliminated, with a certain positive definiteness condition between the velocity and covelocity taking its place. After a substantial amount of trial and error (motivated by some “two-and-a-half-dimensional” reductions of the three-dimensional Euler equations, and also by playing around with a number of variants of the classic “separation of variables” strategy), I eventually found an ansatz for the velocity and covelocity that automatically solved most of the components of the Euler equations (as well as most of the positive definiteness requirements), as long as one could find a number of scalar fields that obeyed a certain nonlinear system of transport equations, and also obeyed a positive definiteness condition. Here I was stuck for a bit because the system I ended up with was overdetermined – more equations than unknowns. After trying a number of special cases I eventually found a solution to the transport system on the sphere, except that the scalar functions sometimes degenerated and so the positive definiteness property I wanted was only obeyed with positive semi-definiteness. I tried for some time to perturb this example into a strictly positive definite solution before eventually working out that this was not possible.
Finally I had the brainwave to lift the solution from the sphere to an even more symmetric space, and this quickly led to the final solution of the problem, using the special orthogonal group rather than the sphere as the underlying domain. The solution ended up being rather simple in form, but it is still somewhat miraculous to me that it exists at all; in retrospect, given the overdetermined nature of the problem, relying on a large amount of symmetry to cut down the number of equations was basically the only hope.

Throughout this post we shall always work in the smooth category, thus all manifolds, maps, coordinate charts, and functions are assumed to be smooth unless explicitly stated otherwise.

A (real) manifold ${M}$ can be defined in at least two ways. On one hand, one can define the manifold extrinsically, as a subset of some standard space such as a Euclidean space ${{\bf R}^d}$. On the other hand, one can define the manifold intrinsically, as a topological space equipped with an atlas of coordinate charts. The fundamental embedding theorems show that, under reasonable assumptions, the intrinsic and extrinsic approaches give the same classes of manifolds (up to isomorphism in various categories). For instance, we have the following special case of the Whitney embedding theorem:

Theorem 1 (Whitney embedding theorem) Let ${M}$ be a compact manifold. Then there exists an embedding ${u: M \rightarrow {\bf R}^d}$ from ${M}$ to a Euclidean space ${{\bf R}^d}$.

In fact, if ${M}$ is ${n}$-dimensional, one can take ${d}$ to equal ${2n}$, which is often best possible (easy examples include the circle ${{\bf R}/{\bf Z}}$ which embeds into ${{\bf R}^2}$ but not ${{\bf R}^1}$, or the Klein bottle that embeds into ${{\bf R}^4}$ but not ${{\bf R}^3}$). One can also relax the compactness hypothesis on ${M}$ to second countability, but we will not pursue this extension here. We give a “cheap” proof of this theorem below the fold which allows one to take ${d}$ equal to ${2n+1}$.

A significant strengthening of the Whitney embedding theorem is (a special case of) the Nash embedding theorem:

Theorem 2 (Nash embedding theorem) Let ${(M,g)}$ be a compact Riemannian manifold. Then there exists an isometric embedding ${u: M \rightarrow {\bf R}^d}$ from ${M}$ to a Euclidean space ${{\bf R}^d}$.

In order to obtain the isometric embedding, the dimension ${d}$ has to be a bit larger than what is needed for the Whitney embedding theorem; in this article of Gunther the bound

$\displaystyle d = \max( n(n+5)/2, n(n+3)/2 + 5) \ \ \ \ \ (1)$

is attained, which I believe is still the record for large ${n}$. (In the converse direction, one cannot do better than ${d = \frac{n(n+1)}{2}}$, basically because this is the number of degrees of freedom in the Riemannian metric ${g}$.) Nash’s original proof of this theorem used what is now known as the Nash-Moser inverse function theorem, but a subsequent simplification by Gunther allowed one to proceed using just the ordinary inverse function theorem (in Banach spaces).

I recently had the need to invoke the Nash embedding theorem to establish a blowup result for a nonlinear wave equation, which motivated me to go through the proof of the theorem more carefully. Below the fold I give a proof of the theorem that does not attempt to give an optimal value of ${d}$, but which hopefully isolates the main ideas of the argument (as simplified by Gunther). One advantage of not optimising in ${d}$ is that it allows one to freely exploit the very useful tool of pairing together two maps ${u_1: M \rightarrow {\bf R}^{d_1}}$, ${u_2: M \rightarrow {\bf R}^{d_2}}$ to form a combined map ${(u_1,u_2): M \rightarrow {\bf R}^{d_1+d_2}}$ that can be closer to an embedding or an isometric embedding than the original maps ${u_1,u_2}$. This lets one perform a “divide and conquer” strategy in which one first starts with the simpler problem of constructing some “partial” embeddings of ${M}$ and then pairs them together to form a “better” embedding.

In preparing these notes, I found the articles of Deane Yang and of Siyuan Lu to be helpful.

The wave equation is usually expressed in the form

$\displaystyle \partial_{tt} u - \Delta u = 0$

where ${u \colon {\bf R} \times {\bf R}^d \rightarrow {\bf C}}$ is a function of both time ${t \in {\bf R}}$ and space ${x \in {\bf R}^d}$, with ${\Delta}$ being the Laplacian operator. One can generalise this equation in a number of ways, for instance by replacing the spatial domain ${{\bf R}^d}$ with some other manifold and replacing the Laplacian ${\Delta}$ with the Laplace-Beltrami operator, or by adding lower order terms (such as a potential, or a coupling with a magnetic field). But for sake of discussion let us work with the classical wave equation on ${{\bf R}^d}$. We will work formally in this post, being unconcerned with issues of convergence, justifying interchange of integrals, derivatives, or limits, etc. One then has a conserved energy

$\displaystyle \int_{{\bf R}^d} \frac{1}{2} |\nabla u(t,x)|^2 + \frac{1}{2} |\partial_t u(t,x)|^2\ dx$

which we can rewrite using integration by parts and the ${L^2}$ inner product ${\langle, \rangle}$ on ${{\bf R}^d}$ as

$\displaystyle \frac{1}{2} \langle -\Delta u(t), u(t) \rangle + \frac{1}{2} \langle \partial_t u(t), \partial_t u(t) \rangle.$

A key feature of the wave equation is finite speed of propagation: if, at time ${t=0}$ (say), the initial position ${u(0)}$ and initial velocity ${\partial_t u(0)}$ are both supported in a ball ${B(x_0,R) := \{ x \in {\bf R}^d: |x-x_0| \leq R \}}$, then at any later time ${t>0}$, the position ${u(t)}$ and velocity ${\partial_t u(t)}$ are supported in the larger ball ${B(x_0,R+t)}$. This can be seen for instance (formally, at least) by inspecting the exterior energy

$\displaystyle \int_{|x-x_0| > R+t} \frac{1}{2} |\nabla u(t,x)|^2 + \frac{1}{2} |\partial_t u(t,x)|^2\ dx$

and observing (after some integration by parts and differentiation under the integral sign) that it is non-increasing in time, non-negative, and vanishing at time ${t=0}$.

The wave equation is second order in time, but one can turn it into a first order system by working with the pair ${(u(t),v(t))}$ rather than just the single field ${u(t)}$, where ${v(t) := \partial_t u(t)}$ is the velocity field. The system is then

$\displaystyle \partial_t u(t) = v(t)$

$\displaystyle \partial_t v(t) = \Delta u(t)$

and the conserved energy is now

$\displaystyle \frac{1}{2} \langle -\Delta u(t), u(t) \rangle + \frac{1}{2} \langle v(t), v(t) \rangle. \ \ \ \ \ (1)$

Finite speed of propagation then tells us that if ${u(0),v(0)}$ are both supported on ${B(x_0,R)}$, then ${u(t),v(t)}$ are supported on ${B(x_0,R+t)}$ for all ${t>0}$. One also has time reversal symmetry: if ${t \mapsto (u(t),v(t))}$ is a solution, then ${t \mapsto (u(-t), -v(-t))}$ is a solution also, thus for instance one can establish an analogue of finite speed of propagation for negative times ${t<0}$ using this symmetry.

If one has an eigenfunction

$\displaystyle -\Delta \phi = \lambda^2 \phi$

of the Laplacian, then we have the explicit solutions

$\displaystyle u(t) = e^{\pm it \lambda} \phi$

$\displaystyle v(t) = \pm i \lambda e^{\pm it \lambda} \phi$

of the wave equation, which formally can be used to construct all other solutions via the principle of superposition.
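As a quick numerical sanity check of these explicit solutions (a sketch, not part of the argument): on an eigenfunction, ${\Delta}$ acts as multiplication by ${-\lambda^2}$, so the first-order system collapses to scalar ODEs ${u' = v}$, ${v' = -\lambda^2 u}$, which one can test with central differences:

```python
import numpy as np

# Scalar model: on an eigenfunction, Delta acts as -lam^2, so
# u(t) = e^{it lam}, v(t) = i lam e^{it lam} should satisfy
# u' = v and v' = -lam^2 u.  We test this with central differences.
lam, t, h = 1.3, 0.4, 1e-5
u = lambda s: np.exp(1j * s * lam)
v = lambda s: 1j * lam * np.exp(1j * s * lam)

du = (u(t + h) - u(t - h)) / (2 * h)  # approximates u'(t)
dv = (v(t + h) - v(t - h)) / (2 * h)  # approximates v'(t)
assert np.isclose(du, v(t))
assert np.isclose(dv, -lam**2 * u(t))
```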

When one has vanishing initial velocity ${v(0)=0}$, the solution ${u(t)}$ is given via functional calculus by

$\displaystyle u(t) = \cos(t \sqrt{-\Delta}) u(0)$

and the propagator ${\cos(t \sqrt{-\Delta})}$ can be expressed as the average of half-wave operators:

$\displaystyle \cos(t \sqrt{-\Delta}) = \frac{1}{2} ( e^{it\sqrt{-\Delta}} + e^{-it\sqrt{-\Delta}} ).$

One can view ${\cos(t \sqrt{-\Delta} )}$ as a minor of the full wave propagator

$\displaystyle U(t) := \exp \begin{pmatrix} 0 & t \\ t\Delta & 0 \end{pmatrix}$

$\displaystyle = \begin{pmatrix} \cos(t \sqrt{-\Delta}) & \frac{\sin(t\sqrt{-\Delta})}{\sqrt{-\Delta}} \\ -\sin(t\sqrt{-\Delta}) \sqrt{-\Delta} & \cos(t \sqrt{-\Delta} ) \end{pmatrix}$

which is unitary with respect to the energy form (1), and is the fundamental solution to the wave equation in the sense that

$\displaystyle \begin{pmatrix} u(t) \\ v(t) \end{pmatrix} = U(t) \begin{pmatrix} u(0) \\ v(0) \end{pmatrix}. \ \ \ \ \ (2)$

Viewing the contraction ${\cos(t\sqrt{-\Delta})}$ as a minor of a unitary operator is an instance of the “dilation trick”.
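One can sanity-check the matrix exponential formula for ${U(t)}$ numerically (again just an illustrative sketch): on a single eigenfunction, ${\Delta}$ acts as multiplication by ${-\lambda^2}$, and the resulting ${2 \times 2}$ matrix exponential should reproduce the cosine/sine block formula, with ${v = \partial_t u = -\sqrt{-\Delta}\sin(t\sqrt{-\Delta})u(0) + \cos(t\sqrt{-\Delta})v(0)}$ on this mode:

```python
import numpy as np
from scipy.linalg import expm

# Scalar model: on a single mode, Delta acts as -lam^2, so the wave
# propagator U(t) = exp([[0, t], [t*Delta, 0]]) becomes a 2x2 matrix
# that should match the cos/sin block formula.
lam, t = 1.7, 0.9
U = expm(np.array([[0.0, t], [-t * lam**2, 0.0]]))
expected = np.array([
    [np.cos(t * lam),        np.sin(t * lam) / lam],
    [-lam * np.sin(t * lam), np.cos(t * lam)],
])
assert np.allclose(U, expected)
```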

It turns out (as I learned from Yuval Peres) that there is a useful discrete analogue of the wave equation (and of all of the above facts), in which the time variable ${t}$ now lives on the integers ${{\bf Z}}$ rather than on ${{\bf R}}$, and the spatial domain can be replaced by discrete domains also (such as graphs). Formally, the system is now of the form

$\displaystyle u(t+1) = P u(t) + v(t) \ \ \ \ \ (3)$

$\displaystyle v(t+1) = P v(t) - (1-P^2) u(t)$

where ${t}$ is now an integer, ${u(t), v(t)}$ take values in some Hilbert space (e.g. ${\ell^2}$ functions on a graph ${G}$), and ${P}$ is some operator on that Hilbert space (which in applications will usually be a self-adjoint contraction). To connect this with the classical wave equation, let us first consider a rescaling of this system

$\displaystyle u(t+\varepsilon) = P_\varepsilon u(t) + \varepsilon v(t)$

$\displaystyle v(t+\varepsilon) = P_\varepsilon v(t) - \frac{1}{\varepsilon} (1-P_\varepsilon^2) u(t)$

where ${\varepsilon>0}$ is a small parameter (representing the discretised time step), ${t}$ now takes values in the integer multiples ${\varepsilon {\bf Z}}$ of ${\varepsilon}$, and ${P_\varepsilon}$ is either the wave propagator operator ${P_\varepsilon := \cos( \varepsilon \sqrt{-\Delta} )}$ or the heat propagator ${P_\varepsilon := \exp( \varepsilon^2 \Delta/2 )}$ (the two operators are different, but agree to fourth order in ${\varepsilon}$). One can then formally verify that the wave equation emerges from this rescaled system in the limit ${\varepsilon \rightarrow 0}$. (Thus, ${P}$ is not exactly the direct analogue of the Laplacian ${\Delta}$, but can be viewed as something like ${P_\varepsilon = 1 + \frac{\varepsilon^2}{2} \Delta + O( \varepsilon^4 )}$ in the case of small ${\varepsilon}$, or ${P = 1 + \frac{1}{2}\Delta + O(\Delta^2)}$ if we are not rescaling to the small ${\varepsilon}$ case. The operator ${P}$ is sometimes known as the diffusion operator.)
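To illustrate the fourth-order agreement of the two propagators numerically (a sketch only): on a Fourier mode, ${-\Delta}$ acts as multiplication by some ${\mu \geq 0}$, so one can compare ${\cos(\varepsilon\sqrt{\mu})}$ with ${e^{-\varepsilon^2\mu/2}}$ and watch the discrepancy shrink like ${\varepsilon^4}$:

```python
import numpy as np

# Scalar model: on a Fourier mode, -Delta acts as mu >= 0, so the wave
# propagator is cos(eps*sqrt(mu)) and the heat propagator exp(-eps^2*mu/2).
mu = 2.0
def gap(eps):
    return np.cos(eps * np.sqrt(mu)) - np.exp(-eps**2 * mu / 2)

# Both equal 1 - eps^2*mu/2 + O(eps^4), so halving eps should
# shrink the gap by a factor of about 2^4 = 16.
r = gap(1e-2) / gap(5e-3)
assert 14 < r < 18
```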

Assuming ${P}$ is self-adjoint, solutions to the system (3) formally conserve the energy

$\displaystyle \frac{1}{2} \langle (1-P^2) u(t), u(t) \rangle + \frac{1}{2} \langle v(t), v(t) \rangle. \ \ \ \ \ (4)$

This energy is positive semi-definite if ${P}$ is a contraction. We have the same time reversal symmetry as before: if ${t \mapsto (u(t),v(t))}$ solves the system (3), then so does ${t \mapsto (u(-t), -v(-t))}$. If one has an eigenfunction

$\displaystyle P \phi = \cos(\lambda) \phi$

to the operator ${P}$, then one has an explicit solution

$\displaystyle u(t) = e^{\pm it \lambda} \phi$

$\displaystyle v(t) = \pm i \sin(\lambda) e^{\pm it \lambda} \phi$

to (3), and (in principle at least) this generates all other solutions via the principle of superposition.
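The conservation of the energy (4) under the iteration (3) is easy to test numerically (a sketch; the random self-adjoint contraction here is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# A random self-adjoint contraction P on R^5: symmetrise, then scale
# the spectral norm strictly below 1.
A = rng.standard_normal((5, 5))
P = (A + A.T) / 2
P /= np.linalg.norm(P, 2) * 1.1

def energy(u, v):
    # The conserved quantity (4): (1/2)<(1-P^2)u, u> + (1/2)<v, v>.
    I = np.eye(5)
    return 0.5 * u @ (I - P @ P) @ u + 0.5 * v @ v

u, v = rng.standard_normal(5), rng.standard_normal(5)
E0 = energy(u, v)
for _ in range(100):
    # One step of the discrete wave system (3).
    u, v = P @ u + v, P @ v - (np.eye(5) - P @ P) @ u
assert np.isclose(energy(u, v), E0)
```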

Finite speed of propagation is a lot easier in the discrete setting, though one has to offset the support of the “velocity” field ${v}$ by one unit. Suppose we know that ${P}$ has unit speed in the sense that whenever ${f}$ is supported in a ball ${B(x,R)}$, then ${Pf}$ is supported in the ball ${B(x,R+1)}$. Then an easy induction shows that if ${u(0), v(0)}$ are supported in ${B(x_0,R), B(x_0,R+1)}$ respectively, then ${u(t), v(t)}$ are supported in ${B(x_0,R+t), B(x_0, R+t+1)}$.
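This induction can be illustrated numerically on a cycle ${{\bf Z}/N{\bf Z}}$ (a stand-in for ${{\bf Z}}$; the choice of graph is just for illustration), with ${P}$ the nearest-neighbour averaging operator, which has unit speed:

```python
import numpy as np

# Finite speed of propagation for the discrete system (3) on a cycle
# Z/NZ, with P the nearest-neighbour averaging operator (unit speed).
N = 101
P = np.zeros((N, N))
for i in range(N):
    P[i, (i - 1) % N] = P[i, (i + 1) % N] = 0.5

d = np.minimum(np.arange(N), N - np.arange(N))  # graph distance to vertex 0
u = np.zeros(N); u[0] = 1.0   # u(0) = delta_0, supported in B(0, 0)
v = np.zeros(N)               # v(0) = 0, supported in B(0, 1)
for t in range(1, 21):
    u, v = P @ u + v, P @ v - (np.eye(N) - P @ P) @ u
    assert np.all(u[d > t] == 0)      # u(t) supported in B(0, t)
    assert np.all(v[d > t + 1] == 0)  # v(t) supported in B(0, t+1)
```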

The fundamental solution ${U(t) = U^t}$ to the discretised wave equation (3), in the sense of (2), is given by the formula

$\displaystyle U(t) = U^t = \begin{pmatrix} P & 1 \\ P^2-1 & P \end{pmatrix}^t$

$\displaystyle = \begin{pmatrix} T_t(P) & U_{t-1}(P) \\ (P^2-1) U_{t-1}(P) & T_t(P) \end{pmatrix}$

where ${T_t}$ and ${U_t}$ are the Chebyshev polynomials of the first and second kind, thus

$\displaystyle T_t( \cos \theta ) = \cos(t\theta)$

and

$\displaystyle U_t( \cos \theta ) = \frac{\sin((t+1)\theta)}{\sin \theta}.$

In particular, ${P}$ is now a minor of ${U(1) = U}$, and can also be viewed as an average of ${U}$ with its inverse ${U^{-1}}$:

$\displaystyle \begin{pmatrix} P & 0 \\ 0 & P \end{pmatrix} = \frac{1}{2} (U + U^{-1}). \ \ \ \ \ (5)$

As before, ${U}$ is unitary with respect to the energy form (4), so this is another instance of the dilation trick in action. The powers ${P^n}$ and ${U^n}$ are discrete analogues of the heat propagators ${e^{t\Delta/2}}$ and wave propagators ${U(t)}$ respectively.
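The Chebyshev formula for ${U^t}$ and the identity (5) can both be verified numerically for a random self-adjoint contraction (again just a sketch, using the standard three-term recurrence to evaluate Chebyshev polynomials of a matrix argument):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
P = (A + A.T) / 2
P /= np.linalg.norm(P, 2) * 1.1   # self-adjoint contraction
I = np.eye(4)

def cheb_T(t, X):
    # Chebyshev T_t(X) via T_{k+1} = 2 X T_k - T_{k-1}.
    a, b = I, X
    for _ in range(t - 1):
        a, b = b, 2 * X @ b - a
    return I if t == 0 else b

def cheb_U(t, X):
    # Chebyshev U_t(X) via the same recurrence, seeded with U_0, U_1.
    a, b = I, 2 * X
    for _ in range(t - 1):
        a, b = b, 2 * X @ b - a
    return I if t == 0 else b

U = np.block([[P, I], [P @ P - I, P]])
t = 7
Ut = np.linalg.matrix_power(U, t)
assert np.allclose(Ut[:4, :4], cheb_T(t, P))          # top-left: T_t(P)
assert np.allclose(Ut[:4, 4:], cheb_U(t - 1, P))      # top-right: U_{t-1}(P)

# Identity (5): diag(P, P) = (U + U^{-1}) / 2.
assert np.allclose((U + np.linalg.inv(U)) / 2,
                   np.block([[P, 0 * I], [0 * I, P]]))
```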

One nice application of all this formalism, which I learned from Yuval Peres, is the Varopoulos-Carne inequality:

Theorem 1 (Varopoulos-Carne inequality) Let ${G}$ be a (possibly infinite) regular graph, let ${n \geq 1}$, and let ${x, y}$ be vertices in ${G}$. Then the probability that the simple random walk started at ${x}$ lands at ${y}$ at time ${n}$ is at most ${2 \exp( - d(x,y)^2 / 2n )}$, where ${d}$ is the graph distance.

This general inequality is quite sharp, as one can see using the standard Cayley graph on the integers ${{\bf Z}}$. Very roughly speaking, it asserts that on a regular graph of reasonably controlled growth (e.g. polynomial growth), random walks of length ${n}$ concentrate on the ball of radius ${O(\sqrt{n})}$ or so centred at the origin of the random walk.

Proof: Let ${P \colon \ell^2(G) \rightarrow \ell^2(G)}$ be the averaging operator of the simple random walk (the Markov operator), thus

$\displaystyle Pf(x) = \frac{1}{D} \sum_{y \sim x} f(y)$

for any ${f \in \ell^2(G)}$, where ${D}$ is the degree of the regular graph and the sum is over the ${D}$ vertices ${y}$ that are adjacent to ${x}$. This is a contraction of unit speed, and the probability that the random walk started at ${x}$ lands at ${y}$ at time ${n}$ is

$\displaystyle \langle P^n \delta_x, \delta_y \rangle$

where ${\delta_x, \delta_y}$ are the Dirac deltas at ${x,y}$. Using (5), we can rewrite this as

$\displaystyle \langle (\frac{1}{2} (U + U^{-1}))^n \begin{pmatrix} 0 \\ \delta_x\end{pmatrix}, \begin{pmatrix} 0 \\ \delta_y\end{pmatrix} \rangle$

where we are now using the energy form (4). We can write

$\displaystyle (\frac{1}{2} (U + U^{-1}))^n = {\bf E} U^{S_n}$

where ${S_n}$ is the simple random walk of length ${n}$ on the integers, that is to say ${S_n = \xi_1 + \dots + \xi_n}$ where ${\xi_1,\dots,\xi_n = \pm 1}$ are independent uniform Bernoulli signs. Thus we wish to show that
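(This identity follows from the binomial theorem, since ${U}$ and ${U^{-1}}$ commute: the coefficient of ${U^{2k-n} = U^k U^{-(n-k)}}$ in the expansion is ${\binom{n}{k}/2^n}$, which is precisely the probability that ${S_n = 2k-n}$.) Here is a scalar numerical check with ${U = e^{i\theta}}$, in which case both sides should equal ${\cos^n \theta}$ (a sketch only):

```python
import numpy as np
from math import comb

# Scalar model U = e^{i*theta}: both sides of
# ((U + U^{-1})/2)^n = E U^{S_n} should equal cos(theta)^n.
theta, n = 0.6, 9
lhs = np.cos(theta) ** n
# E U^{S_n}: S_n = 2k - n with probability C(n, k) / 2^n.
rhs = sum(comb(n, k) / 2**n * np.exp(1j * (2 * k - n) * theta)
          for k in range(n + 1))
assert np.isclose(lhs, rhs)
```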

$\displaystyle {\bf E} \langle U^{S_n} \begin{pmatrix} 0 \\ \delta_x\end{pmatrix}, \begin{pmatrix} 0 \\ \delta_y\end{pmatrix} \rangle \leq 2 \exp(-d(x,y)^2 / 2n ).$

By finite speed of propagation, the inner product here vanishes if ${|S_n| < d(x,y)}$. For ${|S_n| \geq d(x,y)}$ we can use Cauchy-Schwarz and the unitary nature of ${U}$ to bound the inner product by ${1}$. Thus the left-hand side may be upper bounded by

$\displaystyle {\bf P}( |S_n| \geq d(x,y) )$

and the claim now follows from the Chernoff inequality. $\Box$
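One can test the final bound directly on the Cayley graph of ${{\bf Z}}$, where the relevant probabilities are explicit binomial coefficients (a sketch; the walk started at ${0}$ is at ${d}$ after ${n}$ steps with probability ${\binom{n}{(n+d)/2}/2^n}$ when ${n}$ and ${d}$ have the same parity):

```python
import numpy as np
from math import comb

# Varopoulos-Carne sanity check on Z: P(walk from 0 is at d after n
# steps) = C(n, (n+d)/2) / 2^n, which should be <= 2 exp(-d^2 / 2n).
for n in range(1, 40):
    for d in range(0, n + 1):
        if (n + d) % 2 == 0:  # parity constraint for the walk on Z
            p = comb(n, (n + d) // 2) / 2**n
            assert p <= 2 * np.exp(-d**2 / (2 * n))
```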

This inequality has many applications, particularly with regards to relating the entropy, mixing time, and concentration of random walks with volume growth of balls; see this text of Lyons and Peres for some examples.

For sake of comparison, here is a continuous counterpart to the Varopoulos-Carne inequality:

Theorem 2 (Continuous Varopoulos-Carne inequality) Let ${t > 0}$, and let ${f,g \in L^2({\bf R}^d)}$ be supported on compact sets ${F,G}$ respectively. Then

$\displaystyle |\langle e^{t\Delta/2} f, g \rangle| \leq \sqrt{\frac{2t}{\pi d(F,G)^2}} \exp( - d(F,G)^2 / 2t ) \|f\|_{L^2} \|g\|_{L^2}$

where ${d(F,G)}$ is the Euclidean distance between ${F}$ and ${G}$.

Proof: By Fourier inversion one has

$\displaystyle e^{-t\xi^2/2} = \frac{1}{\sqrt{2\pi t}} \int_{\bf R} e^{-s^2/2t} e^{is\xi}\ ds$

$\displaystyle = \sqrt{\frac{2}{\pi t}} \int_0^\infty e^{-s^2/2t} \cos(s \xi )\ ds$

for any real ${\xi}$, and thus

$\displaystyle \langle e^{t\Delta/2} f, g\rangle = \sqrt{\frac{2}{\pi}} \int_0^\infty e^{-s^2/2t} \langle \cos(s \sqrt{-\Delta} ) f, g \rangle\ ds.$

By finite speed of propagation, the inner product ${\langle \cos(s \sqrt{-\Delta} ) f, g \rangle}$ vanishes when ${s < d(F,G)}$; otherwise, we can use Cauchy-Schwarz and the contractive nature of ${\cos(s \sqrt{-\Delta} )}$ to bound this inner product by ${\|f\|_{L^2} \|g\|_{L^2}}$. Thus

$\displaystyle |\langle e^{t\Delta/2} f, g\rangle| \leq \sqrt{\frac{2}{\pi t}} \|f\|_{L^2} \|g\|_{L^2} \int_{d(F,G)}^\infty e^{-s^2/2t}\ ds.$

Bounding ${e^{-s^2/2t}}$ by ${e^{-d(F,G)^2/2t} e^{-d(F,G) (s-d(F,G))/t}}$, we obtain the claim. $\Box$
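The subordination identity at the start of the proof can be checked numerically (a sketch; scipy's quadrature is used purely for illustration):

```python
import numpy as np
from scipy.integrate import quad

# Numerical check of the identity
#   e^{-t xi^2 / 2} = sqrt(2/(pi t)) * int_0^inf e^{-s^2/(2t)} cos(s xi) ds.
t, xi = 1.3, 2.1
integral, _ = quad(lambda s: np.exp(-s**2 / (2 * t)) * np.cos(s * xi),
                   0, np.inf)
assert np.isclose(np.sqrt(2 / (np.pi * t)) * integral,
                  np.exp(-t * xi**2 / 2))
```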

Observe that the argument is quite general and can be applied, for instance, to Riemannian manifolds other than ${{\bf R}^d}$.