This is an addendum to last quarter’s course notes on Hilbert’s fifth problem, which I am in the process of reviewing in order to transcribe them into a book (as was done similarly for several other sets of lecture notes on this blog). When reviewing the zeroth set of notes in particular, I found that I had made a claim (Proposition 11 from those notes) which asserted, roughly speaking, that any sufficiently large nilprogression was an approximate group, and promised to prove it later in the course when we had developed the ability to calculate efficiently in nilpotent groups. As it turned out, I managed finish the course without the need to develop these calculations, and so the proposition remained unproven. In order to rectify this, I will use this post to lay out some of the basic algebra of nilpotent groups, and use it to prove the above proposition, which turns out to be a bit tricky. (In my paper with Breuillard and Green, we avoid the need for this proposition by restricting attention to a special type of nilprogression, which we call a nilprogression in ${C}$-normal form, for which the computations are simpler.)

There are several ways to think about nilpotent groups; for instance one can use the model example of the Heisenberg group

$\displaystyle H(R) :=\begin{pmatrix} 1 & R & R \\ 0 & 1 & R\\ 0 & 0 & 1 \end{pmatrix}$

over an arbitrary ring ${R}$ (which need not be commutative), or more generally any matrix group consisting of unipotent upper triangular matrices, and view a general nilpotent group as being an abstract generalisation of such concrete groups. (In the case of nilpotent Lie groups, at least, this is quite an accurate intuition, thanks to Engel’s theorem.) Or, one can adopt a Lie-theoretic viewpoint and try to think of nilpotent groups as somehow arising from nilpotent Lie algebras; this intuition is rigorous when working with nilpotent Lie groups (at least when the characteristic is large, in order to avoid issues coming from the denominators in the Baker-Campbell-Hausdorff formula), but also retains some conceptual value in the non-Lie setting. In particular, nilpotent groups (particularly finitely generated ones) can be viewed in some sense as “nilpotent Lie groups over ${{\bf Z}}$“, even though Lie theory does not quite work perfectly when the underlying scalars merely form an integral domain instead of a field.

Another point of view, which arises naturally both in analysis and in algebraic geometry, is to view nilpotent groups as modeling “infinitesimal” perturbations of the identity, where the infinitesimals have a certain finite order. For instance, given a (not necessarily commutative) ring ${R}$ without identity (representing all the “small” elements of some larger ring or algebra), we can form the powers ${R^j}$ for ${j=1,2,\ldots}$, defined as the ring generated by ${j}$-fold products ${r_1 \ldots r_j}$ of elements ${r_1,\ldots,r_j}$ in ${R}$; this is an ideal of ${R}$ which represents the elements which are “${j^{th}}$ order” in some sense. If one then formally adjoins an identity ${1}$ onto the ring ${R}$, then for any ${s \geq 1}$, the multiplicative group ${G := 1+R \hbox{ mod } R^{s+1}}$ is a nilpotent group of step at most ${s}$. For instance, if ${R}$ is the ring of strictly upper ${s \times s}$ matrices (over some base ring), then ${R^{s+1}}$ vanishes and ${G}$ becomes the group of unipotent upper triangular matrices over the same ring, thus recovering the previous matrix-based example. In analysis applications, ${R}$ might be a ring of operators which are somehow of “order” ${O(\epsilon)}$ or ${O(\hbar)}$ for some small parameter ${\epsilon}$ or ${\hbar}$, and one wishes to perform Taylor expansions up to order ${O(\epsilon^s)}$ or ${O(\hbar^s)}$, thus discarding (i.e. quotienting out) all errors in ${R^{s+1}}$.

From a dynamical or group-theoretic perspective, one can also view nilpotent groups as towers of central extensions of a trivial group. Finitely generated nilpotent groups can also be profitably viewed as a special type of polycylic group; this is the perspective taken in this previous blog post. Last, but not least, one can view nilpotent groups from a combinatorial group theory perspective, as being words from some set of generators of various “degrees” subject to some commutation relations, with commutators of two low-degree generators being expressed in terms of higher degree objects, and all commutators of a sufficiently high degree vanishing. In particular, generators of a given degree can be moved freely around a word, as long as one is willing to generate commutator errors of higher degree.

With this last perspective, in particular, one can start computing in nilpotent groups by adopting the philosophy that the lowest order terms should be attended to first, without much initial concern for the higher order errors generated in the process of organising the lower order terms. Only after the lower order terms are in place should attention then turn to higher order terms, working successively up the hierarchy of degrees until all terms are dealt with. This turns out to be a relatively straightforward philosophy to implement in many cases (particularly if one is not interested in explicit expressions and constants, being content instead with qualitative expansions of controlled complexity), but the arguments are necessarily recursive in nature and as such can become a bit messy, and require a fair amount of notation to express precisely. So, unfortunately, the arguments here will be somewhat cumbersome and notation-heavy, even if the underlying methods of proof are relatively simple.

— 1. Some elementary group theory —

Let ${g, h}$ be two elements of a group ${G = (G,\cdot)}$. We define the conjugate ${g^h}$ and commutator ${[g,h]}$ by the formulae

$\displaystyle g^h := h^{-1} g h \ \ \ \ \ (1)$

and

$\displaystyle [g,h] := g^{-1} h^{-1} gh. \ \ \ \ \ (2)$

(Note that this convention for ${[g,h]}$ is not universal; for instance, the alternate convention ${[g,h] = ghg^{-1}h^{-1}}$ also appears in the literature. The distinctions between the two conventions however are quite minor; the conventions here are optimised for pulling group elements to the right of a word, whereas other conventions may be slightly better for pulling group elements to the left of a word.)

Conjugation by a fixed element ${h}$ is an automorphism of ${G}$, thus

$\displaystyle (gk)^h = g^h k^h$

and

$\displaystyle (g^h)^{-1} = (g^{-1})^h$

and

$\displaystyle 1^h = 1$

and

$\displaystyle [g,k]^h = [g^h,k^h]$

for all ${g,h,k \in G}$. Conjugation is also an action, thus

$\displaystyle (g^h)^k = g^{hk}$

and

$\displaystyle g^1 = g.$

An automorphism of the form ${g \mapsto g^h}$ is called an inner automorphism.

Conjugation is related to multiplication by the identity

$\displaystyle gh = h g^h,$

thus one can pull ${g}$ to the right of ${h}$ at the cost of twisting (i.e. conjugating) it by ${h}$. Commutation is related to multiplication by the identity

$\displaystyle gh = hg [g,h],$

thus one can pull ${g}$ to the right of ${h}$ at the cost of adding an additional commutator factor ${[g,h]}$ to the right. Finally, commutation is related to conjugation by the identity

$\displaystyle g^h = g [g,h].$

The commutator can be viewed as a nonlinear group-theoretic analogue of the Lie bracket. For instance, in a matrix group ${G = GL_n({\bf C})}$, we observe that the commutator ${[1+\epsilon A, 1+\epsilon B]}$ of two elements ${1+\epsilon A}$ and ${1+\epsilon B}$ close to the identity is of the form ${1 + \epsilon^2 (AB-BA) + O(\epsilon^3)}$, thus linking the group-theoretic commutator to the Lie bracket ${A,B \mapsto AB-BA}$.

Because of this link, we expect the group-theoretic commutator to obey some nonlinear analogues of the basic Liebracket identities, and this is indeed the case. For instance, one easily observes that the commutator is antisymmetric in the sense that

$\displaystyle [h,g] = [g,h]^{-1} \ \ \ \ \ (3)$

and is approximately odd in the sense that

$\displaystyle [g^{-1},h] = ([g,h]^{-1})^{g^{-1}} \ \ \ \ \ (4)$

and

$\displaystyle [g,h^{-1}] = ([g,h]^{-1})^{h^{-1}} \ \ \ \ \ (5)$

for any ${g,h \in G}$. We also have the easily verified approximate bilinearity identities

$\displaystyle [g,hk] = [g,h] [g,k]^h$

and

$\displaystyle [gh,k] = [g,k]^h [h,k]$

for any ${g,h,k \in G}$. Finally, we have the approximate Jacobi identity (better known as the Hall-Witt identity)

$\displaystyle [[g,h^{-1}],k]^h [[h,k^{-1}],g]^k [[k,g^{-1}],h]^g = 1. \ \ \ \ \ (6)$

A subgroup ${H}$ of ${G}$ is said to be normal if it is preserved by all inner automorphisms, thus ${H^g = H}$ for all ${g}$ (writing ${H^g := \{h^g: h \in H \}}$, of course), and characteristic if it is preserved by all automorphisms (not necessarily inner). Thus, all characteristic subgroups are normal, but the converse is not necessarily true. We write ${H \leq G}$ or ${G \geq H}$ if ${H}$ is a subgroup of ${G}$, and ${H \rhd G}$ or ${G \lhd H}$ if ${H}$ is a normal subgroup of ${G}$.

If ${N}$ is a normal subgroup of ${G}$, we write ${g \mapsto g \hbox{ mod } N}$ for the quotient map from ${G}$ to ${G/N}$, thus ${g = h \hbox{ mod } N}$ if ${g \in hN = Nh}$. Given any other subgroup ${H}$ of ${G}$, we write ${H \hbox{ mod } N}$ for the image of ${H}$ under the ${\hbox{ mod } N}$ map, thus

$\displaystyle H \hbox{ mod } N \equiv HN / N = NH / N \equiv H / (H \cap N).$

Given two subgroups ${H,K}$ of a group ${G}$, we define the commutator group ${[H,K]}$ to be the group generated by the commutators ${[h,k]}$ with ${h \in H, k\in K}$. We say that ${H}$ and ${K}$ commute if ${[H,K]}$ is trivial, or equivalently if every element of ${H}$ commutes with every element of ${K}$. Note that if ${H, K}$ are normal, then ${[H,K]}$ is also normal. In this case, one can view ${[H,K]}$ as the smallest subgroup of ${G}$ such that ${H, K}$ commute modulo ${[H,K]}$ (or equivalently, that ${H \hbox{ mod } [H,K]}$ and ${K \hbox{ mod } [H,K]}$ commute). Similarly, if ${H,K}$ are characteristic, then ${[H,K]}$ is also characteristic.

Observe from (3) that

$\displaystyle [H,K] = [K,H]$

for any subgroups ${H, K}$. Also that if ${N}$ is a normal subgroup of ${G}$, then (as ${\hbox{ mod } N}$ is a homomorphism)

$\displaystyle HK \hbox{ mod } N = (H \hbox{ mod } N) (K \hbox{ mod } N)$

and

$\displaystyle [H,K] \hbox{ mod } N = [H \hbox{ mod } N, K \hbox{ mod } N] \ \ \ \ \ (7)$

Exercise 1

• (i) If ${H, K}$ are normal subgroups of ${G}$ generated by subsets ${A \subset H}$, ${B \subset K}$ respectively, show that ${[H,K]}$ is the normal subgroup of ${G}$ generated (as a normal subgroup) by the commutators ${[a,b]}$ with ${a \in A}$, ${b \in B}$.
• (ii) If ${N,H,K}$ are normal subgroups of ${G}$, show that ${[[N,H],K]}$ lies in the normal subgroup generated by ${[[H,K],N]}$ and ${[[K,N],H]}$. (Hint: use the Hall-Witt identity (6).)

Given an arbitrary group ${G}$, we define the lower central series ${G = G_1 \lhd G_2 \ldots}$ of ${G}$ by setting ${G_1 := G}$ and ${G_{i+1} := [G,G_i]}$ for all ${i \geq 1}$. Observe that all of the groups ${G_i}$ in this series are characteristic and thus normal. A group ${G}$ is said to be nilpotent of step at most ${s}$ if ${G_{s+1}}$ is trivial; in particular, by (7), we see that ${G \mod G_{s+1} = G/G_{s+1}}$ is nilpotent of step at most ${s}$ for any group ${G}$. We have a basic inclusion:

Exercise 2 We have ${[G_i,G_j] \subset G_{i+j}}$ for all ${i,j \geq 1}$. (Hint: use Exercise 1.)

The commutator structure can be clarified by passing to the “top order” components of this commutator, as follows. (Strictly speaking, this analysis is not needed to study nilprogressions, but it is still conceptually useful nonetheless.) Consider the subquotients ${V_i := G_i \mod G_{i+1}}$ of the group ${G}$ for ${i=1,2,\ldots}$. As ${G_{i+1}}$ contains ${[G_i,G_i]}$, we see that each group ${V_i}$ is abelian. To emphasise this, we will write ${V_i}$ additively instead of multiplicatively, thus we shall denote the group operation on ${V_i}$ as ${+}$.

Lemma 1 For each ${i,j \geq 1}$, the commutator map ${[,]: G_i \times G_j \rightarrow [G_i,G_j]}$ descends to a map ${[,]_{i,j}: V_i \times V_j \rightarrow V_{i+j}}$, thus

$\displaystyle [g, h] \mod G_{i+j+1} = [g \mod G_{i+1}, h \mod G_{j+1}]_{i,j} \ \ \ \ \ (8)$

for ${g \in G_i, h \in G_j}$.

Proof: We can quotient out by ${G_{i+j+1}}$, and assume that ${G_{i+j+1}}$ is trivial. In particular, by Lemma 2, ${G_{i+1}}$ commutes with ${G_j}$, and ${G_i}$ commutes with ${G_{j+1}}$. As such, it is easy to see that if ${g \in G_i, h \in G_j}$, then ${[g,h]}$ is unchanged if one multiplies ${g}$ (on the left or right) by an element of ${G_{i+1}}$, and similarly for ${h}$ and ${G_{j+1}}$. This defines the map ${[,]_{i,j}}$ as required. $\Box$

We refer to the maps ${[,]_{i,j}}$ as quotiented commutator maps. The identity (8) asserts that these maps ${[,]_{i,j}}$ capture the “top order” nonlinear behaviour of the group ${G}$. As the following exercise shows, these maps behave like (graded components of) a Lie bracket.

Exercise 3 Let ${G}$ be a group with lower central series ${G = G_1 \lhd G_2 \lhd \ldots}$, subquotients ${V_i}$, and quotiented commutator maps ${[,]_{i,j}: V_i \times V_j \rightarrow V_{i+j}}$. Let ${i,j,k \geq 1}$, let ${x,x' \in V_i}$, ${y, y' \in V_j}$, and ${z \in V_k}$. Establish the antisymmetry

$\displaystyle [x,y]_{i,j} = -[y,x]_{j,i},$

the bihomomorphism properties

$\displaystyle [x+x',y]_{i,j} = [x,y]_{i,j} + [x',y]_{i,j}$

$\displaystyle [-x,y]_{i,j} = -[x,y]_{i,j}$

$\displaystyle [0,y]_{i,j} = 0$

and

$\displaystyle [x,y+y']_{i,j} = [x,y]_{i,j} + [x,y']_{i,j}$

$\displaystyle [x,-y]_{i,j} = -[x,y]_{i,j}$

$\displaystyle [x,0]_{i,j} = 0$

and the Jacobi identity

$\displaystyle [[x,y]_{i,j}, z]_{i+j,k} + [[y,z]_{j,k},x]_{j+k,i} + [[z,x]_{k,i},y]_{k+i,j} = 0.$

Remark 1 If one wished, one could combine all of these subquotients ${V_i}$ and quotiented commutator maps ${[,]_{i,j}}$ into a single graded object, namely the additive group ${V := \oplus_i V_i}$ consisting of tuples ${(v_i)_{i =1}^\infty}$ with ${v_i \in V_i}$ (and all but finitely many of the ${v_i}$ trivial), with a bracket ${[,]: V \times V \rightarrow V}$ defined by

$\displaystyle [ (v_i)_{i=1}^\infty, (w_j)_{j=1}^\infty ] := (\sum_{i+j=k} [v_i,w_j]_{i,j})_{k=1}^\infty .$

Then this bracket is an antisymmetric bihomomorphism obeying the Jacobi identity, thus ${V}$ is a Lie algebra over the integers ${{\bf Z}}$. One could view this object as the “top order” or “Carnot” component of the “Lie algebra” of ${G}$. We will however not need this object here.

We can iterate the “bilinear” commutator operations to create “multilinear” operations. Given a non-empty finite set ${{\mathcal A}}$ of formal group elements ${{\bf g}}$, we can define a formal commutator word ${w}$ on ${{\mathcal A}}$ to be a word that can be generated by the following rules:

• If ${{\bf g}}$ is a formal group element, then ${{\bf g}}$ is a formal commutator word on ${\{ {\bf g} \}}$.
• If ${{\mathcal A}, {\mathcal B}}$ are disjoint finite non-empty sets of formal group elements, and ${u, v}$ are formal commutator words on ${{\mathcal A}, {\mathcal B}}$ respectively, then ${[u,v]}$ is a formal commutator word on ${{\mathcal A} \cup {\mathcal B}}$.

We refer to ${|{\mathcal A}|}$ as the length of the formal commutator word. Thus, for instance, ${[[{\bf g}_1, {\bf g}_2], [{\bf g}_3, {\bf g}_4]]}$ is a formal commutator word on ${\{ {\bf g}_1, {\bf g}_2, {\bf g}_3, {\bf g}_4 \}}$ of length ${4}$. Given a formal commutator word ${w}$ on ${{\mathcal A}}$ and an assignment ${g: {\bf g} \mapsto g({\bf g})}$ of group elements ${g({\bf g})\in G}$ to each formal group element ${{\bf g} \in {\mathcal A}}$, we can define the group element ${w(g) \in G}$ by substituting ${g({\bf g})}$ for ${{\bf g}}$ in ${w}$ for each ${{\bf g} \in G}$. This gives a commutator word map ${w: G^{\mathcal A} \rightarrow G}$. Given an assignment ${i: {\bf g} \mapsto i({\bf g})}$ of a positive integer (or “degree”) ${i({\bf g})}$ to each formal group element ${{\mathcal A}}$, Lemma 1 and induction then gives a map ${w_i: \prod_{{\bf g} \in {\mathcal A}} V_{i({\bf g})} \rightarrow V_{\sum_{{\bf g} \in {\mathcal A}} i({\bf g})}}$ which is a multihomomorphism (i.e. a homomorphism in each variable). From (8) one has that

$\displaystyle w( g ) \hbox{ mod } G_{\sum_{{\bf g} \in {\mathcal A}} i({\bf g}) + 1} = w_i( g \hbox{ mod } G_{i+1} )$

where ${g \hbox{ mod } G_{i+1}}$ is shorthand for the assignment ${{\bf g} \mapsto g({\bf g}) \hbox{ mod } G_{i({\bf g}) + 1}}$.

In particular, using the degree map ${i({\bf g}) := 1}$, we obtain a multihomomorphism

$\displaystyle w_1: V_1^l \rightarrow V_l$

for any commutator word ${w}$ of length ${l}$, such that

$\displaystyle w( g ) \hbox{ mod } G_{l + 1} = w_1( g \hbox{ mod } G_2 ).$

From the multihomomorphism nature of ${w_1}$, we conclude in particular that

$\displaystyle w( g_1^{n_1}, \ldots, g_l^{n_l} ) = w(g_1,\ldots,g_l)^{n_1 \ldots n_l} \hbox{ mod } G_{l+1} \ \ \ \ \ (9)$

for any ${g_1,\ldots,g_l \in G}$ and integers ${n_1,\ldots,n_l}$. A variant of this approximate identity will be key in understanding nilprogressions.

One can use commutator words to generate a nilpotent group ${G}$ and its lower central series:

Exercise 4 Let ${G}$ be a nilpotent group that is generated by a set ${S}$ of generators.

• (i) Show that for every positive integer ${i}$, ${G_i}$ is generated by the words ${w(g)}$, where ${w}$ ranges over formal commutator words of length ${l}$ at least ${i}$, and ${g}$ is a collection of ${l}$ generators from ${S}$ (possibly with repetition).
• (ii) Suppose further that ${S}$ is finite (so that there are only finitely many possible choices for ${w(g)}$ for any given length ${l}$, and let ${e_1,\ldots,e_n}$ be all the values of ${w(g)}$ as ${w}$ varies over formal commutator words of length at most ${s}$, and ${g}$ ranges over collections of generators from ${S}$, arranged in non-decreasing length of ${w}$ (this arrangement is not unique, and may contain repeated values). Show that for every positive integer ${i}$, any element in ${G_i}$ can be expressed in the form ${e_j^{a_j} \ldots e_n^{a_n}}$, where ${e_j,\ldots,e_n}$ are the elements of ${e_1,\ldots,e_n}$ arising from words ${w}$ of length at least ${i}$, and ${a_j,\ldots,a_n}$ are integers.

— 2. Nilprogressions —

Recall from Notes 0 that a noncommutative progression ${P(a_1,\ldots,a_r;N_1,\ldots,N_r)}$ in a group ${G}$ with generators ${a_1,\ldots,a_r \in G}$ and dimensions ${N_1,\ldots,N_r > 0}$ is the set of all words with alphabet ${a_1^{\pm 1}, \ldots, a_r^{\pm 1}}$, such that for each ${1 \leq i \leq r}$, the symbols ${a_i, a_i^{-1}}$ are used a total of at most ${N_i}$ times. A nilprogression is a noncommutative progression in a nilpotent group. The objective of this section is to prove Proposition 11 from Notes 0, restated here:

Proposition 2 Suppose that ${a_1,\ldots,a_r \in G}$ generate a nilpotent group of step ${s}$, and suppose that ${N_1,\ldots,N_r}$ are all sufficiently large depending on ${r,s}$. Then ${P := P( a_1,\ldots,a_r;N_1,\ldots,N_r)}$ is an ${O_{r,s}(1)}$-approximate group.

Recall that a subset ${A}$ of a group ${G}$ is a ${K}$-approximate group if it is symmetric, contains the origin, and if ${A^2}$ can be covered by ${K}$ left-translates (or equivalently, right-translates) of ${A}$. The first two properties are clear, so it suffices to show that ${P^2}$ can be covered by ${O_{r,s}(1)}$ right-translates of ${P}$.

We allow all implied constants to depend on ${r, s}$. We pick a small constant ${\epsilon>0}$ (depending on ${r,s}$). It will suffice to show the inclusion

$\displaystyle P( a_1,\ldots,a_r;2N_1,\ldots,2N_r) \subset P( a_1,\ldots,a_r;O(\epsilon N_1 + 1),\ldots,O( \epsilon N_r + 1) ) X$

for some set ${X}$ of cardinality ${|X| \ll_\epsilon 1}$.

We will need some auxiliary objects. For ${i \geq 1}$ and ${t>0}$, let

$\displaystyle Q^t_i( a_1,\ldots,a_r;N_1,\ldots,N_r) := P( a'_1,\ldots,a'_R; tN'_1,\ldots,tN'_R ), \ \ \ \ \ (10)$

where ${a'_1,\ldots,a'_R}$ consists of all group elements of the form ${w(a_{i_1},\ldots,a_{i_l})}$ where ${w}$ is a formal commutator word of length ${l \geq i}$, ${1 \leq i_1,\ldots,i_l \leq r}$, and each ${a'_j = w(a_{i_1},\ldots,a_{i_l})}$ is associated to the dimension ${N'_j := N_{i_1} \ldots N_{i_l}}$ (note that we allow some of the ${a'_i}$ to be equal, or to degenerate to the identity). Thus the ${Q_i(a_1,\ldots,a_r;N_1,\ldots,N_r)}$ are non-increasing in ${i}$ and become trivial for ${i \geq s}$. We will usually abbreviate ${Q^t_i(a_1,\ldots,a_r;N_1,\ldots,N_r)}$ as ${Q^t_i}$. Trivially we also have

$\displaystyle P(a_1,\ldots,a_r;N_1,\ldots,N_r) \subset Q^t_1.$

Observe also that the ${Q^t_i}$ are symmetric, contain the identity, and ${Q^t_i Q^{t'}_i \subset Q^{t+t'}_i}$.

Exercise 5 (Approximate filtration property) If ${g \in Q_i^{O(1)}}$ and ${h \in Q_j(a_1,\ldots,a_r;N_1,\ldots,N_r)}$, show that

$\displaystyle [g,h] \in Q_{i+j}^{O(1)}$

and

$\displaystyle g^h \in Q_i^{O(1)}.$

Next, we need the following variant of (9).

Lemma 3 Let ${w}$ be a formal commutator word of length ${l \geq 1}$, and let ${1 \leq i_1,\ldots,i_l \leq r}$. For each ${1 \leq j \leq r}$, let ${n_j}$ be an integer with ${n_j = O(N_{i_j})}$. Then

$\displaystyle w(a_{i_1}^{n_1},\ldots,a_{i_l}^{n_l}) \in w(a_{i_1},\ldots,a_{i_l})^{n_1 \ldots n_l} Q_{l+1}^{O(1)}.$

Proof: When ${l > s}$, the expressions involving ${w}$ collapse to the identity, and the claim follows, so we may assume that ${1 \leq l \leq s}$. We induct on ${l}$. The claim ${l=1}$ is trivial, so suppose that ${1 < l \leq s}$ and that the claim has been proven for smaller values of ${l}$.

We first establish the key case ${l=2}$. In this case, it suffices to show that

$\displaystyle [a^n, b^m] \in [a,b]^{nm} Q_3^{O(1)}( a,b; |n|, |m| ) \ \ \ \ \ (11)$

for any ${a,b \in G}$ and ${n,m \in {\bf Z}}$. By using commutator identities (4), (5) we may assume that ${n,m}$ are positive. It suffices to show that

$\displaystyle (a^n)^{b^m} \in a^n [a,b]^{nm} Q_3^{O(1)}(a,b; n, m ).$

We can write ${(a^n)^{b^m} = (a^{b^m})^n}$. It is then not difficult to see (and we leave as an exercise to the reader) that the claim will follow if we can show

$\displaystyle (a)^{b^m} \in a [a,b]^{m} Q_3^{O(1)}(a,b; 1, m ),$

or equivalently

$\displaystyle [a,b^m] \in [a,b]^m Q_3^{O(1)}(a,b; 1,m).$

Thus we have effectively reduced the problem to the case ${n=1}$. A similar argument allows us to also obtain the additional reduction ${m=1}$, at which point the claim is trivial. This completes the treatment of the ${l=2}$ case.

Now we handle the general case. After some relabeling, we may write ${w = [u,v]}$, where ${u,v}$ are words of length ${l_1,l_2}$ respectively for some ${l_1,l_2 adding up to ${l}$, with

$\displaystyle w(a_{i_1}^{n_1},\ldots,a_{i_l}^{n_l}) = [u(a_{i_1}^{n_1},\ldots,a_{i_{l_1}}^{n_{l_1}}), v(a_{i_{l_1+1}}^{n_{l_1+1}},\ldots,a_{i_l}^{n_l}) ]$

and similarly

$\displaystyle w(a_{i_1},\ldots,a_{i_l}) = [u(a_{i_1},\ldots,a_{i_{l_1}}, v(a_{i_{l_1+1}},\ldots,a_{i_l}) ].$

By induction hypothesis, one has

$\displaystyle u(a_{i_1}^{n_1},\ldots,a_{i_{l_1}}^{n_{l_1}}) \in u(a_{i_1},\ldots,a_{i_{l_1}})^{n_1 \ldots n_{l_1}} Q_{l_1+1}^{O(1)}.$

In particular we have

$\displaystyle u(a_{i_1}^{n_1},\ldots,a_{i_{l_1}}^{n_{l_1}}) \in Q_{l_1}^{O(1)}.$

Similarly we have

$\displaystyle v(a_{i_{l_1+1}}^{n_{l_1+1}},\ldots,a_{i_l}^{n_l}) \in v(a_{i_{l_1+1}},\ldots,a_{i_l})^{n_{l_i+1}\ldots n_l} Q_{l_2+1}^{O(1)} \subset Q_{l_2}^{O(1)}.$

Using (5) and the approximate homomorphism properties of commutators, we conclude that

$\displaystyle [u(a_{i_1}^{n_1},\ldots,a_{i_{l_1}}^{n_{l_1}}), v(a_{i_{l_1+1}}^{n_{l_1+1}},\ldots,a_{i_l}^{n_l}) ] \in [u(a_{i_1},\ldots,a_{i_{l_1}})^{n_1 \ldots n_{l_1}}, v(a_{i_{l_1+1}},\ldots,a_{i_l})^{n_{l_i+1}\ldots n_l}] Q_{l+1}^{O(1)},$

but from (11) we have

$\displaystyle [u(a_{i_1},\ldots,a_{i_{l_1}})^{n_1 \ldots n_{l_1}}, v(a_{i_{l_1+1}},\ldots,a_{i_l})^{n_{l_i+1}\ldots n_l}] \subset [u(a_{i_1},\ldots,a_{i_{l_1}}), v(a_{i_{l_1+1}},\ldots,a_{i_l})]^{n_1 \ldots n_l} Q_{l+1}^{O(1)}.$

The claim follows.

Exercise 6 For positive integers ${a,b}$, show that

$\displaystyle g^n \in a^n [a,b]^{nm} Q_3^{O(1)}(a,b; n, m )$

whenever

$\displaystyle g \in a [a,b]^{m} Q_3^{O(1)}(a,b; 1, m ).$

Next, we need the following elementary number-theoretic lemma:

Exercise 7 Let ${M_1,\ldots,M_l \geq 1}$, and let ${n}$ be an integer with ${n = O(M_1 \ldots M_l)}$. Show that ${n}$ can be expressed as the sum of ${O_l(1)}$ terms of the form ${n_1 \ldots n_l}$, where the ${n_1,\ldots,n_l}$ are integers with ${n_i = O(M_i)}$ for each ${i=1,\ldots,l}$. (Hint: Despite the superficial similarity here with non-trivial number-theoretic questions such as the Waring problem, this is actually a very elementary fact which can be proven by induction on ${l}$.)

Exercise 8 Let ${w}$ be a formal commutator word of length ${l \geq 1}$, and let ${1 \leq i_1,\ldots,i_l \leq r}$. Let ${n = O(N_{i_1} \ldots N_{i_r})}$. Show that

$\displaystyle w(a_{i_1},\ldots,a_{i_l})^n \in P^{O(1)} Q_{l+1}^{O(1)}.$

Conclude that

$\displaystyle Q_l^C \subset P^{O(1)} Q_{l+1}^{O(1)}$

whenever ${C=O(1)}$, and on iteration conclude that

$\displaystyle Q_1^C \subset P^{O(1)}$

whenever ${C = O(1)}$.

A similar argument shows that ${Q_1^\epsilon \subset P}$ for a sufficiently small ${\epsilon>0}$ depending only on ${r,s}$, assuming that ${N_1,\ldots,N_r}$ are sufficiently large depending on ${r,s}$. Since ${P^2 \subset Q^2_1}$, it thus suffices to show that for any ${\delta>0}$, that ${Q^2_1}$ can be covered by ${O_\delta(1)}$ right-translates of ${Q^{O(\delta)}_1}$. But this then follows by iterating the following exercise:

Exercise 9 Let ${\delta>0}$, ${l \geq 1}$, and ${C = O(1)}$. Show that ${Q^C_l}$ can be covered by ${O_\delta(1)}$ right-translates of ${Q^{O(\delta)}_l \cdot Q^{O(1)}_{l+1}}$. (Hint: this is similar to Exercise 5. Factor an element of ${Q^C_l}$ into words ${w(a_{i_1},\ldots,a_{i_l})}$ of length ${l}$, together with words of higher length. Gather all the words of length ${l}$ into monomials ${w(a_{i_1},\ldots,a_{i_l})^n}$ with ${n = O( N_{i_1} \ldots N_{i_l})}$, times a factor in ${Q^{O(1)}_{l+1}}$. Split these monomials into a monomial with exponent ${O(\delta N_{i_1} \ldots N_{i_l})}$, times a monomial which can take at most ${O_\delta(1)}$ possible values. Then push the latter monomials (and the ${Q^{O(1)}_{l+1}}$ factor) to the right.)

Exercise 10 (Polynomial growth) Suppose that ${a_1,\ldots,a_r \in G}$ generate a nilpotent group of step ${s}$, and suppose that ${N_1,\ldots,N_r \geq 1}$. Show that

$\displaystyle |P(a_1,\ldots,a_r;N_1,\ldots,N_r)| \ll_{r,s} (N_1 \ldots N_r)^{O_{r,s}(1)}.$

Exercise 11 Suppose that ${a_1,\ldots,a_r \in G}$ generate a nilpotent group of step ${s}$, and suppose that ${N_1,\ldots,N_r}$ are sufficiently large depending on ${r,s}$. Write ${P := |P(a_1,\ldots,a_r;N_1,\ldots,N_r)|}$. Let ${a'_1,\ldots,a'_R, N'_1,\ldots,N'_R}$ be as in (10), arranged in non-decreasing order of the length of the associated formal commutator words.

• (i) Show that every element of ${P}$ can be represented in the form

$\displaystyle (a'_1)^{n_1} \ldots (a'_R)^{n_R}$

for some integers ${n_i = O_{r,s}(N'_i)}$. (We do not claim however that this representation is unique, and indeed the generators ${a'_1,\ldots,a'_R}$ are likely to contain quite a lot of redundancy.)

• (ii) Conversely, show that there exists an ${\epsilon>0}$ depending only on ${r,s}$ such that any expression of the form

$\displaystyle (a'_1)^{n_1} \ldots (a'_R)^{n_R}$

with integers ${n_i}$ with ${|n_i| \leq \epsilon N'_i}$, lies in ${P}$.