You are currently browsing the tag archive for the ‘Heisenberg group’ tag.

Earlier this month, Hao Huang (who, incidentally, was a graduate student here at UCLA) gave a remarkably short proof of a long-standing problem in theoretical computer science known as the sensitivity conjecture. See for instance this blog post of Gil Kalai for further discussion and links to many other online discussions of this result. One formulation of the theorem proved is as follows. Define the {n}-dimensional hypercube graph {Q_n} to be the graph with vertex set {({\bf Z}/2{\bf Z})^n}, and with every vertex {v \in ({\bf Z}/2{\bf Z})^n} joined to the {n} vertices {v + e_1,\dots,v+e_n}, where {e_1,\dots,e_n} is the standard basis of {({\bf Z}/2{\bf Z})^n}.

Theorem 1 (Lower bound on maximum degree of induced subgraphs of hypercube) Let {E} be a set of at least {2^{n-1}+1} vertices in {Q_n}. Then there is a vertex in {E} that is adjacent (in {Q_n}) to at least {\sqrt{n}} other vertices in {E}.

The bound {\sqrt{n}} (or more precisely, {\lceil \sqrt{n} \rceil}) is completely sharp, as shown by Chung, Furedi, Graham, and Seymour; we describe this example below the fold. When combined with earlier reductions of Gotsman-Linial and Nisan-Szegedy; we give these below the fold also.

Let {A = (a_{vw})_{v,w \in ({\bf Z}/2{\bf Z})^n}} be the adjacency matrix of {Q_n} (where we index the rows and columns directly by the vertices in {({\bf Z}/2{\bf Z})^n}, rather than selecting some enumeration {1,\dots,2^n}), thus {a_{vw}=1} when {w = v+e_i} for some {i=1,\dots,n}, and {a_{vw}=0} otherwise. The above theorem then asserts that if {E} is a set of at least {2^{n-1}+1} vertices, then the {E \times E} minor {(a_{vw})_{v,w \in E}} of {A} has a row (or column) that contains at least {\sqrt{n}} non-zero entries.

The key step to prove this theorem is the construction of rather curious variant {\tilde A} of the adjacency matrix {A}:

Proposition 2 There exists a {({\bf Z}/2{\bf Z})^n \times ({\bf Z}/2{\bf Z})^n} matrix {\tilde A = (\tilde a_{vw})_{v,w \in ({\bf Z}/2{\bf Z})^n}} which is entrywise dominated by {A} in the sense that

\displaystyle  |\tilde a_{vw}| \leq a_{vw} \hbox{ for all } v,w \in ({\bf Z}/2{\bf Z})^n \ \ \ \ \ (1)

and such that {\tilde A} has {\sqrt{n}} as an eigenvalue with multiplicity {2^{n-1}}.

Assuming this proposition, the proof of Theorem 1 can now be quickly concluded. If we view {\tilde A} as a linear operator on the {2^n}-dimensional space {\ell^2(({\bf Z}/2{\bf Z})^n)} of functions of {({\bf Z}/2{\bf Z})^n}, then by hypothesis this space has a {2^{n-1}}-dimensional subspace {V} on which {\tilde A} acts by multiplication by {\sqrt{n}}. If {E} is a set of at least {2^{n-1}+1} vertices in {Q_n}, then the space {\ell^2(E)} of functions on {E} has codimension at most {2^{n-1}-1} in {\ell^2(({\bf Z}/2{\bf Z})^n)}, and hence intersects {V} non-trivially. Thus the {E \times E} minor {\tilde A_E} of {\tilde A} also has {\sqrt{n}} as an eigenvalue (this can also be derived from the Cauchy interlacing inequalities), and in particular this minor has operator norm at least {\sqrt{n}}. By Schur’s test, this implies that one of the rows or columns of this matrix has absolute values summing to at least {\sqrt{n}}, giving the claim.

Remark 3 The argument actually gives a strengthening of Theorem 1: there exists a vertex {v_0} of {E} with the property that for every natural number {k}, there are at least {n^{k/2}} paths of length {k} in the restriction {Q_n|_E} of {Q_n} to {E} that start from {v_0}. Indeed, if we let {(u_v)_{v \in E}} be an eigenfunction of {\tilde A} on {\ell^2(E)}, and let {v_0} be a vertex in {E} that maximises the value of {|u_{v_0}|}, then for any {k} we have that the {v_0} component of {\tilde A_E^k (u_v)_{v \in E}} is equal to {n^{k/2} |u_{v_0}|}; on the other hand, by the triangle inequality, this component is at most {|u_{v_0}|} times the number of length {k} paths in {Q_n|_E} starting from {v_0}, giving the claim.

This argument can be viewed as an instance of a more general “interlacing method” to try to control the behaviour of a graph {G} on all large subsets {E} by first generating a matrix {\tilde A} on {G} with very good spectral properties, which are then partially inherited by the {E \times E} minor of {\tilde A} by interlacing inequalities. In previous literature using this method (see e.g., this survey of Haemers, or this paper of Wilson), either the original adjacency matrix {A}, or some non-negatively weighted version of that matrix, was used as the controlling matrix {\tilde A}; the novelty here is the use of signed controlling matrices. It will be interesting to see what further variants and applications of this method emerge in the near future. (Thanks to Anurag Bishoi in the comments for these references.)

The “magic” step in the above argument is constructing {\tilde A}. In Huang’s paper, {\tilde A} is constructed recursively in the dimension {n} in a rather simple but mysterious fashion. Very recently, Roman Karasev gave an interpretation of this matrix in terms of the exterior algebra on {{\bf R}^n}. In this post I would like to give an alternate interpretation in terms of the operation of twisted convolution, which originated in the theory of the Heisenberg group in quantum mechanics.

Firstly note that the original adjacency matrix {A}, when viewed as a linear operator on {\ell^2(({\bf Z}/2{\bf Z})^n)}, is a convolution operator

\displaystyle  A f = f * \mu


\displaystyle \mu(x) := \sum_{i=1}^n 1_{x=e_i}

is the counting measure on the standard basis {e_1,\dots,e_n}, and {*} denotes the ordinary convolution operation

\displaystyle  f * g(x) := \sum_{y \in ({\bf Z}/2{\bf Z})^n} f(y) g(x-y) = \sum_{y_1+y_2 = x} f(y_1) g(y_2).

As is well known, this operation is commutative and associative. Thus for instance the square {A^2} of the adjacency operator {A} is also a convolution operator

\displaystyle  A^2 f = f * (\mu * \mu)(x)

where the convolution kernel {\mu * \mu} is moderately complicated:

\displaystyle  \mu*\mu(x) = n \times 1_{x=0} + \sum_{1 \leq i < j \leq n} 2 \times 1_{x = e_i + e_j}.

The factor {2} in this expansion comes from combining the two terms {1_{x=e_i} * 1_{x=e_j}} and {1_{x=e_j} * 1_{x=e_i}}, which both evaluate to {1_{x=e_i+e_j}}.

More generally, given any bilinear form {B: ({\bf Z}/2{\bf Z})^n \times ({\bf Z}/2{\bf Z})^n \rightarrow {\bf Z}/2{\bf Z}}, one can define the twisted convolution

\displaystyle  f *_B g(x) := \sum_{y \in ({\bf Z}/2{\bf Z})^n} (-1)^{B(y,x-y)} f(y) g(x-y)

\displaystyle  = \sum_{y_1+y_2=x} (-1)^{B(y_1,y_2)} f(y_1) g(y_2)

of two functions {f,g \in \ell^2(({\bf Z}/2{\bf Z})^n)}. This operation is no longer commutative (unless {B} is symmetric). However, it remains associative; indeed, one can easily compute that

\displaystyle  (f *_B g) *_B h(x) = f *_B (g *_B h)(x)

\displaystyle = \sum_{y_1+y_2+y_3=x} (-1)^{B(y_1,y_2)+B(y_1,y_3)+B(y_2,y_3)} f(y_1) g(y_2) h(y_3).

In particular, if we define the twisted convolution operator

\displaystyle  A_B f(x) := f *_B \mu(x)

then the square {A_B^2} is also a twisted convolution operator

\displaystyle  A_B^2 f = f *_B (\mu *_B \mu)

and the twisted convolution kernel {\mu *_B \mu} can be computed as

\displaystyle  \mu *_B \mu(x) = (\sum_{i=1}^n (-1)^{B(e_i,e_i)}) 1_{x=0}

\displaystyle + \sum_{1 \leq i < j \leq n} ((-1)^{B(e_i,e_j)} + (-1)^{B(e_j,e_i)}) 1_{x=e_i+e_j}.

For general bilinear forms {B}, this twisted convolution is just as messy as {\mu * \mu} is. But if we take the specific bilinear form

\displaystyle  B(x,y) := \sum_{1 \leq i < j \leq n} x_i y_j \ \ \ \ \ (2)

then {B(e_i,e_i)=0} for {1 \leq i \leq n} and {B(e_i,e_j)=1, B(e_j,e_i)=0} for {1 \leq i < j \leq n}, and the above twisted convolution simplifies to

\displaystyle  \mu *_B \mu(x) = n 1_{x=0}

and now {A_B^2} is very simple:

\displaystyle  A_B^2 f = n f.

Thus the only eigenvalues of {A_B} are {+\sqrt{n}} and {-\sqrt{n}}. The matrix {A_B} is entrywise dominated by {A} in the sense of (1), and in particular has trace zero; thus the {+\sqrt{n}} and {-\sqrt{n}} eigenvalues must occur with equal multiplicity, so in particular the {+\sqrt{n}} eigenvalue occurs with multiplicity {2^{n-1}} since the matrix has dimensions {2^n \times 2^n}. This establishes Proposition 2.

Remark 4 Twisted convolution {*_B} is actually just a component of ordinary convolution, but not on the original group {({\bf Z}/2{\bf Z})^n}; instead it relates to convolution on a Heisenberg group extension of this group. More specifically, define the Heisenberg group {H} to be the set of pairs {(x, t) \in ({\bf Z}/2{\bf Z})^n \times ({\bf Z}/2{\bf Z})} with group law

\displaystyle  (x,t) \cdot (y,s) := (x+y, t+s+B(x,y))

and inverse operation

\displaystyle  (x,t)^{-1} = (-x, -t+B(x,x))

(one can dispense with the negative signs here if desired, since we are in characteristic two). Convolution on {H} is defined in the usual manner: one has

\displaystyle  F*G( (x,t) ) := \sum_{(y,s) \in H} F(y,s) G( (y,s)^{-1} (x,t) )

for any {F,G \in \ell^2(H)}. Now if {f \in \ell^2(({\bf Z}/2{\bf Z})^n)} is a function on the original group {({\bf Z}/2{\bf Z})^n}, we can define the lift {\tilde f \in \ell^2(H)} by the formula

\displaystyle  \tilde f(x,t) := (-1)^t f(x)

and then by chasing all the definitions one soon verifies that

\displaystyle  \tilde f * \tilde g = 2 \widetilde{f *_B g}

for any {f,g \in \ell^2(({\bf Z}/2{\bf Z})^n)}, thus relating twisted convolution {*_B} to Heisenberg group convolution {*}.

Remark 5 With the twisting by the specific bilinear form {B} given by (2), convolution by {1_{x=e_i}} and {1_{x=e_j}} now anticommute rather than commute. This makes the twisted convolution algebra {(\ell^2(({\bf Z}/2{\bf Z})^n), *_B)} isomorphic to a Clifford algebra {Cl({\bf R}^n,I_n)} (the real or complex algebra generated by formal generators {v_1,\dots,v_n} subject to the relations {(v_iv_j+v_jv_i)/2 = 1_{i=j}} for {i,j=1,\dots,n}) rather than the commutative algebra more familiar to abelian Fourier analysis. This connection to Clifford algebra (also observed independently by Tom Mrowka and by Daniel Matthews) may be linked to the exterior algebra interpretation of the argument in the recent preprint of Karasev mentioned above.

Remark 6 One could replace the form (2) in this argument by any other bilinear form {B'} that obeyed the relations {B'(e_i,e_i)=0} and {B'(e_i,e_j) + B'(e_j,e_i)=1} for {i \neq j}. However, this additional level of generality does not add much; any such {B'} will differ from {B} by an antisymmetric form {C} (so that {C(x,x) = 0} for all {x}, which in characteristic two implied that {C(x,y) = C(y,x)} for all {x,y}), and such forms can always be decomposed as {C(x,y) = C'(x,y) + C'(y,x)}, where {C'(x,y) := \sum_{i<j} C(e_i,e_j) x_i y_j}. As such, the matrices {A_B} and {A_{B'}} are conjugate, with the conjugation operator being the diagonal matrix with entries {(-1)^{C'(x,x)}} at each vertex {x}.

Remark 7 (Added later) This remark combines the two previous remarks. One can view any of the matrices {A_{B'}} in Remark 6 as components of a single canonical matrix {A_{Cl}} that is still of dimensions {({\bf Z}/2{\bf Z})^n \times ({\bf Z}/2{\bf Z})^n}, but takes values in the Clifford algebra {Cl({\bf R}^n,I_n)} from Remark 5; with this “universal algebra” perspective, one no longer needs to make any arbitrary choices of form {B}. More precisely, let {\ell^2( ({\bf Z}/2{\bf Z})^n \rightarrow Cl({\bf R}^n,I_n))} denote the vector space of functions {f: ({\bf Z}/2{\bf Z})^n \rightarrow Cl({\bf R}^n,I_n)} from the hypercube to the Clifford algebra; as a real vector space, this is a {2^{2n}} dimensional space, isomorphic to the direct sum of {2^n} copies of {\ell^2(({\bf Z}/2{\bf Z})^n)}, as the Clifford algebra is itself {2^n} dimensional. One can then define a canonical Clifford adjacency operator {A_{Cl}} on this space by

\displaystyle  A_{Cl} f(x) := \sum_{i=1}^n f(x+e_i) v_i

where {v_1,\dots,v_n} are the generators of {Cl({\bf R}^n,I_n)}. This operator can either be identified with a Clifford-valued {2^n \times 2^n} matrix or as a real-valued {2^{2n} \times 2^{2n}} matrix. In either case one still has the key algebraic relations {A_{Cl}^2 = n} and {\mathrm{tr} A_{Cl} = 0}, ensuring that when viewed as a real {2^{2n} \times 2^{2n}} matrix, half of the eigenvalues are equal to {+\sqrt{n}} and half equal to {-\sqrt{n}}. One can then use this matrix in place of any of the {A_{B'}} to establish Theorem 1 (noting that Schur’s test continues to work for Clifford-valued matrices because of the norm structure on {Cl({\bf R}^n,I_n)}).

To relate {A_{Cl}} to the real {2^n \times 2^n} matrices {A_{B'}}, first observe that each point {x} in the hypercube {({\bf Z}/2{\bf Z})^n} can be associated with a one-dimensional real subspace {\ell_x} (i.e., a line) in the Clifford algebra {Cl({\bf R}^n,I_n)} by the formula

\displaystyle  \ell_{e_{i_1} + \dots + e_{i_k}} := \mathrm{span}_{\bf R}( v_{i_1} \dots v_{i_k} )

for any {i_1,\dots,i_k \in \{1,\dots,n\}} (note that this definition is well-defined even if the {i_1,\dots,i_k} are out of order or contain repetitions). This can be viewed as a discrete line bundle over the hypercube. Since {\ell_{x+e_i} = \ell_x e_i} for any {i}, we see that the {2^n}-dimensional real linear subspace {V} of {\ell^2( ({\bf Z}/2{\bf Z})^n \rightarrow Cl({\bf R}^n,I_n))} of sections of this bundle, that is to say the space of functions {f: ({\bf Z}/2{\bf Z})^n \rightarrow Cl({\bf R}^n,I_n)} such that {f(x) \in \ell_x} for all {x \in ({\bf Z}/2{\bf Z})^n}, is an invariant subspace of {A_{Cl}}. (Indeed, using the left-action of the Clifford algebra on {\ell^2( ({\bf Z}/2{\bf Z})^n \rightarrow Cl({\bf R}^n,I_n))}, which commutes with {A_{Cl}}, one can naturally identify {\ell^2( ({\bf Z}/2{\bf Z})^n \rightarrow Cl({\bf R}^n,I_n))} with {Cl({\bf R}^n,I_n) \otimes V}, with the left action of {Cl({\bf R}^n,I_n)} acting purely on the first factor and {A_{Cl}} acting purely on the second factor.) Any trivialisation of this line bundle lets us interpret the restriction {A_{Cl}|_V} of {A_{Cl}} to {V} as a real {2^n \times 2^n} matrix. In particular, given one of the bilinear forms {B'} from Remark 6, we can identify {V} with {\ell^2(({\bf Z}/2{\bf Z})^n)} by identifying any real function {f \in \ell^2( ({\bf Z}/2{\bf Z})^n)} with the lift {\tilde f \in V} defined by

\displaystyle  \tilde f(e_{i_1} + \dots + e_{i_k}) := (-1)^{\sum_{1 \leq j < j' \leq k} B'(e_{i_j}, e_{i_{j'}})}

\displaystyle f(e_{i_1} + \dots + e_{i_k}) v_{i_1} \dots v_{i_k}

whenever {1 \leq i_1 < \dots < i_k \leq n}. A somewhat tedious computation using the properties of {B'} then eventually gives the intertwining identity

\displaystyle  A_{Cl} \tilde f = \widetilde{A_{B'} f}

and so {A_{B'}} is conjugate to {A_{Cl}|_V}.

Read the rest of this entry »

I’ve just uploaded to the arXiv my paper “Embedding the Heisenberg group into a bounded dimensional Euclidean space with optimal distortion“, submitted to Revista Matematica Iberoamericana. This paper concerns the extent to which one can accurately embed the metric structure of the Heisenberg group

\displaystyle H := \begin{pmatrix} 1 & {\bf R} & {\bf R} \\ 0 & 1 & {\bf R} \\ 0 & 0 & 1 \end{pmatrix}

into Euclidean space, which we can write as {\{ [x,y,z]: x,y,z \in {\bf R} \}} with the notation

\displaystyle [x,y,z] := \begin{pmatrix} 1 & x & z \\ 0 & 1 & y \\ 0 & 0 & 1 \end{pmatrix}.

Here we give {H} the right-invariant Carnot-Carathéodory metric {d} coming from the right-invariant vector fields

\displaystyle X := \frac{\partial}{\partial x} + y \frac{\partial}{\partial z}; \quad Y := \frac{\partial}{\partial y}

but not from the commutator vector field

\displaystyle Z := [Y,X] = \frac{\partial}{\partial z}.

This gives {H} the geometry of a Carnot group. As observed by Semmes, it follows from the Carnot group differentiation theory of Pansu that there is no bilipschitz map from {(H,d)} to any Euclidean space {{\bf R}^D} or even to {\ell^2}, since such a map must be differentiable almost everywhere in the sense of Carnot groups, which in particular shows that the derivative map annihilate {Z} almost everywhere, which is incompatible with being bilipschitz.

On the other hand, if one snowflakes the Heisenberg group by replacing the metric {d} with {d^{1-\varepsilon}} for some {0 < \varepsilon < 1}, then it follows from the general theory of Assouad on embedding snowflaked metrics of doubling spaces that {(H,d^{1-\varepsilon})} may be embedded in a bilipschitz fashion into {\ell^2}, or even to {{\bf R}^{D_\varepsilon}} for some {D_\varepsilon} depending on {\varepsilon}.

Of course, the distortion of this bilipschitz embedding must degenerate in the limit {\varepsilon \rightarrow 0}. From the work of Austin-Naor-Tessera and Naor-Neiman it follows that {(H,d^{1-\varepsilon})} may be embedded into {\ell^2} with a distortion of {O( \varepsilon^{-1/2} )}, but no better. The Naor-Neiman paper also embeds {(H,d^{1-\varepsilon})} into a finite-dimensional space {{\bf R}^D} with {D} independent of {\varepsilon}, but at the cost of worsening the distortion to {O(\varepsilon^{-1})}. They then posed the question of whether this worsening of the distortion is necessary.

The main result of this paper answers this question in the negative:

Theorem 1 There exists an absolute constant {D} such that {(H,d^{1-\varepsilon})} may be embedded into {{\bf R}^D} in a bilipschitz fashion with distortion {O(\varepsilon^{-1/2})} for any {0 < \varepsilon \leq 1/2}.

To motivate the proof of this theorem, let us first present a bilipschitz map {\Phi: {\bf R} \rightarrow \ell^2} from the snowflaked line {({\bf R},d_{\bf R}^{1-\varepsilon})} (with {d_{\bf R}} being the usual metric on {{\bf R}}) into complex Hilbert space {\ell^2({\bf C})}. The map is given explicitly as a Weierstrass type function

\displaystyle \Phi(x) := \sum_{k \in {\bf Z}} 2^{-\varepsilon k} (\phi_k(x) - \phi_k(0))

where for each {k}, {\phi_k: {\bf R} \rightarrow \ell^2} is the function

\displaystyle \phi_k(x) := 2^k e^{2\pi i x / 2^k} e_k.

and {(e_k)_{k \in {\bf Z}}} are an orthonormal basis for {\ell^2({\bf C})}. The subtracting of the constant {\phi_k(0)} is purely in order to make the sum convergent as {k \rightarrow \infty}. If {x,y \in {\bf R}} are such that {2^{k_0-2} \leq d_{\bf R}(x,y) \leq 2^{k_0-1}} for some integer {k_0}, one can easily check the bounds

\displaystyle |\phi_k(x) - \phi_k(y)| \lesssim d_{\bf R}(x,y)^{(1-\varepsilon)} \min( 2^{-(1-\varepsilon) (k_0-k)}, 2^{-\varepsilon (k-k_0)} )

with the lower bound

\displaystyle |\phi_{k_0}(x) - \phi_{k_0}(y)| \gtrsim d_{\bf R}(x,y)^{(1-\varepsilon)}

at which point one finds that

\displaystyle d_{\bf R}(x,y)^{1-\varepsilon} \lesssim |\Phi(x) - \Phi(y)| \lesssim \varepsilon^{-1/2} d_{\bf R}(x,y)^{1-\varepsilon}

as desired.

The key here was that each function {\phi_k} oscillated at a different spatial scale {2^k}, and the functions were all orthogonal to each other (so that the upper bound involved a factor of {\varepsilon^{-1/2}} rather than {\varepsilon^{-1}}). One can replicate this example for the Heisenberg group without much difficulty. Indeed, if we let {\Gamma := \{ [a,b,c]: a,b,c \in {\bf Z} \}} be the discrete Heisenberg group, then the nilmanifold {H/\Gamma} is a three-dimensional smooth compact manifold; thus, by the Whitney embedding theorem, it smoothly embeds into {{\bf R}^6}. This gives a smooth immersion {\phi: H \rightarrow {\bf R}^6} which is {\Gamma}-automorphic in the sense that {\phi(p\gamma) = \phi(p)} for all {p \in H} and {\gamma \in \Gamma}. If one then defines {\phi_k: H \rightarrow \ell^2 \otimes {\bf R}^6} to be the function

\displaystyle \phi_k(p) := 2^k \phi( \delta_{2^{-k}}(p) ) \otimes e_k

where {\delta_\lambda: H \rightarrow H} is the scaling map

\displaystyle \delta_\lambda([x,y,z]) := [\lambda x, \lambda y, \lambda^2 z],

then one can repeat the previous arguments to obtain the required bilipschitz bounds

\displaystyle d(p,q)^{1-\varepsilon} \lesssim |\Phi(p) - \Phi(q) \lesssim \varepsilon^{-1/2} d(p,q)^{1-\varepsilon}

for the function

\displaystyle \Phi(p) :=\sum_{k \in {\bf Z}} 2^{-\varepsilon k} (\phi_k(p) - \phi_k(0)).

To adapt this construction to bounded dimension, the main obstruction was the requirement that the {\phi_k} took values in orthogonal subspaces. But if one works things out carefully, it is enough to require the weaker orthogonality requirement

\displaystyle B( \phi_{k_0}, \sum_{k>k_0} 2^{-\varepsilon(k-k_0)} \phi_k ) = 0

for all {k_0 \in {\bf Z}}, where {B(\phi, \psi): H \rightarrow {\bf R}^2} is the bilinear form

\displaystyle B(\phi,\psi) := (X \phi \cdot X \psi, Y \phi \cdot Y \psi ).

One can then try to construct the {\phi_k: H \rightarrow {\bf R}^D} for bounded dimension {D} by an iterative argument. After some standard reductions, the problem becomes this (roughly speaking): given a smooth, slowly varying function {\psi: H \rightarrow {\bf R}^{D}} whose derivatives obey certain quantitative upper and lower bounds, construct a smooth oscillating function {\phi: H \rightarrow {\bf R}^{D}}, whose derivatives also obey certain quantitative upper and lower bounds, which obey the equation

\displaystyle B(\phi,\psi) = 0. \ \ \ \ \ (1)


We view this as an underdetermined system of differential equations for {\phi} (two equations in {D} unknowns; after some reductions, our {D} can be taken to be the explicit value {36}). The trivial solution {\phi=0} to this equation will be inadmissible for our purposes due to the lower bounds we will require on {\phi} (in order to obtain the quantitative immersion property mentioned previously, as well as for a stronger “freeness” property that is needed to close the iteration). Because this construction will need to be iterated, it will be essential that the regularity control on {\phi} is the same as that on {\psi}; one cannot afford to “lose derivatives” when passing from {\psi} to {\phi}.

This problem has some formal similarities with the isometric embedding problem (discussed for instance in this previous post), which can be viewed as the problem of solving an equation of the form {Q(\phi,\phi) = g}, where {(M,g)} is a Riemannian manifold and {Q} is the bilinear form

\displaystyle Q(\phi,\psi)_{ij} = \partial_i \phi \cdot \partial_j \psi.

The isometric embedding problem also has the key obstacle that naive attempts to solve the equation {Q(\phi,\phi)=g} iteratively can lead to an undesirable “loss of derivatives” that prevents one from iterating indefinitely. This obstacle was famously resolved by the Nash-Moser iteration scheme in which one alternates between perturbatively adjusting an approximate solution to improve the residual error term, and mollifying the resulting perturbation to counteract the loss of derivatives. The current equation (1) differs in some key respects from the isometric embedding equation {Q(\phi,\phi)=g}, in particular being linear in the unknown field {\phi} rather than quadratic; nevertheless the key obstacle is the same, namely that naive attempts to solve either equation lose derivatives. Our approach to solving (1) was inspired by the Nash-Moser scheme; in retrospect, I also found similarities with Uchiyama’s constructive proof of the Fefferman-Stein decomposition theorem, discussed in this previous post (and in this recent one).

To motivate this iteration, we first express {B(\phi,\psi)} using the product rule in a form that does not place derivatives directly on the unknown {\phi}:

\displaystyle B(\phi,\psi) = \left( W(\phi \cdot W \psi) - \phi \cdot WW \psi\right)_{W = X,Y} \ \ \ \ \ (2)


This reveals that one can construct solutions {\phi} to (1) by solving the system of equations

\displaystyle \phi \cdot W \psi = \phi \cdot WW \psi = 0 \ \ \ \ \ (3)


for {W \in \{X, Y \}}. Because this system is zeroth order in {\phi}, this can easily be done by linear algebra (even in the presence of a forcing term {B(\phi,\psi)=F}) if one imposes a “freeness” condition (analogous to the notion of a free embedding in the isometric embedding problem) that {X \psi(p), Y \psi(p), XX \psi(p), YY \psi(p)} are linearly independent at each point {p}, which (together with some other technical conditions of a similar nature) one then adds to the list of upper and lower bounds required on {\psi} (with a related bound then imposed on {\phi}, in order to close the iteration). However, as mentioned previously, there is a “loss of derivatives” problem with this construction: due to the presence of the differential operators {W} in (3), a solution {\phi} constructed by this method can only be expected to have two degrees less regularity than {\psi} at best, which makes this construction unsuitable for iteration.

To get around this obstacle (which also prominently appears when solving (linearisations of) the isometric embedding equation {Q(\phi,\phi)=g}), we instead first construct a smooth, low-frequency solution {\phi_{\leq N_0} \colon H \rightarrow {\bf R}^{D}} to a low-frequency equation

\displaystyle B( \phi_{\leq N_0}, P_{\leq N_0} \psi ) = 0 \ \ \ \ \ (4)


where {P_{\leq N_0} \psi} is a mollification of {\psi} (of Littlewood-Paley type) applied at a small spatial scale {1/N_0} for some {N_0}, and then gradually relax the frequency cutoff {P_{\leq N_0}} to deform this low frequency solution {\phi_{\leq N_0}} to a solution {\phi} of the actual equation (1).

We will construct the low-frequency solution {\phi_{\leq N_0}} rather explicitly, using the Whitney embedding theorem to construct an initial oscillating map {f} into a very low dimensional space {{\bf R}^6}, composing it with a Veronese type embedding into a slightly larger dimensional space {{\bf R}^{27}} to obtain a required “freeness” property, and then composing further with a slowly varying isometry {U(p) \colon {\bf R}^{27} \rightarrow {\bf R}^{36}} depending on {P_{\leq N_0}} and constructed by a quantitative topological lemma (relying ultimately on the vanishing of the first few homotopy groups of high-dimensional spheres), in order to obtain the required orthogonality (4). (This sort of “quantitative null-homotopy” was first proposed by Gromov, with some recent progress on optimal bounds by Chambers-Manin-Weinberger and by Chambers-Dotterer-Manin-Weinberger, but we will not need these more advanced results here, as one can rely on the classical qualitative vanishing {\pi^k(S^d)=0} for {k < d} together with a compactness argument to obtain (ineffective) quantitative bounds, which suffice for this application).

To perform the deformation of {\phi_{\leq N_0}} into {\phi}, we must solve what is essentially the linearised equation

\displaystyle B( \dot \phi, \psi ) + B( \phi, \dot \psi ) = 0 \ \ \ \ \ (5)


of (1) when {\phi}, {\psi} (viewed as low frequency functions) are both being deformed at some rates {\dot \phi, \dot \psi} (which should be viewed as high frequency functions). To avoid losing derivatives, the magnitude of the deformation {\dot \phi} in {\phi} should not be significantly greater than the magnitude of the deformation {\dot \psi} in {\psi}, when measured in the same function space norms.

As before, if one directly solves the difference equation (5) using a naive application of (2) with {B(\phi,\dot \psi)} treated as a forcing term, one will lose at least one derivative of regularity when passing from {\dot \psi} to {\dot \phi}. However, observe that (2) (and the symmetry {B(\phi, \dot \psi) = B(\dot \psi,\phi)}) can be used to obtain the identity

\displaystyle B( \dot \phi, \psi ) + B( \phi, \dot \psi ) = \left( W(\dot \phi \cdot W \psi + \dot \psi \cdot W \phi) - (\dot \phi \cdot WW \psi + \dot \psi \cdot WW \phi)\right)_{W = X,Y} \ \ \ \ \ (6)


and then one can solve (5) by solving the system of equations

\displaystyle \dot \phi \cdot W \psi = - \dot \psi \cdot W \phi

for {W \in \{X,XX,Y,YY\}}. The key point here is that this system is zeroth order in both {\dot \phi} and {\dot \psi}, so one can solve this system without losing any derivatives when passing from {\dot \psi} to {\dot \phi}; compare this situation with that of the superficially similar system

\displaystyle \dot \phi \cdot W \psi = - \phi \cdot W \dot \psi

that one would obtain from naively linearising (3) without exploiting the symmetry of {B}. There is still however one residual “loss of derivatives” problem arising from the presence of a differential operator {W} on the {\phi} term, which prevents one from directly evolving this iteration scheme in time without losing regularity in {\phi}. It is here that we borrow the final key idea of the Nash-Moser scheme, which is to replace {\phi} by a mollified version {P_{\leq N} \phi} of itself (where the projection {P_{\leq N}} depends on the time parameter). This creates an error term in (5), but it turns out that this error term is quite small and smooth (being a “high-high paraproduct” of {\nabla \phi} and {\nabla\psi}, it ends up being far more regular than either {\phi} or {\psi}, even with the presence of the derivatives) and can be iterated away provided that the initial frequency cutoff {N_0} is large and the function {\psi} has a fairly high (but finite) amount of regularity (we will eventually use the Hölder space {C^{20,\alpha}} on the Heisenberg group to measure this).