A celebrated theorem of Gromov reads:

Theorem 1 Every finitely generated group of polynomial growth is virtually nilpotent.

The original proof of Gromov’s theorem was quite non-elementary, using an infinitary limit and exploiting the work surrounding the solution to Hilbert’s fifth problem. More recently, Kleiner provided a proof which was more elementary (based in large part on an earlier paper of Colding and Minicozzi), though still not entirely so, relying in part on (a weak form of) the Tits alternative and also on an ultrafilter argument of Korevaar-Schoen and Mok. I discuss Kleiner’s argument further in this previous blog post.

Recently, Yehuda Shalom and I established a quantitative version of Gromov’s theorem by making every component of Kleiner’s argument finitary. Technically, this provides a fully elementary proof of Gromov’s theorem (we do use one infinitary limit to simplify the argument a little bit, but this is not truly necessary); however, because we were trying to quantify as much of the result as possible, the argument became quite lengthy.

In this note I want to record a short version of the argument of Yehuda and myself which is not quantitative, but gives a self-contained and largely elementary proof of Gromov’s theorem. The argument is not too far from the Kleiner argument, but has a number of simplifications at various places. In a number of places, there was a choice to take between a short argument that was “inefficient” in the sense that it did not lead to a good quantitative bound, and a lengthier argument which led to better quantitative bounds. I have opted for the former in all such cases.

Yehuda and I plan to write a short paper containing this argument as well as some additional material, but due to some interest in this particular proof, we are detailing it here on this blog in advance of our paper.

Note: this post will assume familiarity with the basic terminology of group theory, and will move somewhat quickly through the technical details.

— 1. Overview of argument —

The argument requires four separate ingredients. The first is the existence of non-trivial Lipschitz harmonic functions ${f: G \rightarrow {\bf R}}$:

Theorem 2 (Existence of non-trivial Lipschitz harmonic functions) Let ${G}$ be an infinite group generated by a finite symmetric set ${S}$. Then there exists a non-constant function ${f: G \rightarrow {\bf R}}$ which is harmonic in the sense that

$\displaystyle f(x) = \frac{1}{|S|}\sum_{s \in S} f(xs)$

for all ${x \in G}$, and Lipschitz in the sense that

$\displaystyle |f(x)-f(xs)| \leq C$

for all ${x \in G}$ and ${s \in S}$, and some ${C < \infty}$.
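As a quick sanity check (an aside, not needed for the proof), Theorem 2 can be verified by hand in the simplest infinite example ${G = {\bf Z}}$ with ${S = \{-1,+1\}}$, where the function ${f(x) = x}$ is non-constant, harmonic, and Lipschitz with ${C=1}$; a minimal sketch:

```python
# Sanity check of Theorem 2 on G = Z with S = {+1, -1}: the function
# f(x) = x is non-constant, harmonic, and Lipschitz with constant C = 1.
S = [1, -1]

def f(x):
    return x

for x in range(-10, 11):
    avg = sum(f(x + s) for s in S) / len(S)   # (1/|S|) * sum_s f(xs)
    assert avg == f(x)                        # harmonicity at x
    assert all(abs(f(x) - f(x + s)) <= 1 for s in S)   # Lipschitz bound
```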

The second is that there are not too many such harmonic functions:

Theorem 3 (Kleiner’s theorem) Let ${G}$ be a group of polynomial growth generated by a finite symmetric set ${S}$ of generators. Then the vector space ${V}$ of Lipschitz harmonic functions is finite-dimensional.

The third ingredient is that Gromov’s theorem is true in the compact linear Lie group case:

Theorem 4 (Gromov’s theorem in the compact Lie case) Let ${G}$ be a finitely generated subgroup of a compact linear Lie group ${H \subset GL_n({\bf C})}$ of polynomial growth. Then ${G}$ is virtually abelian.

The final ingredient is that Gromov’s theorem is inductively true once one can locate an infinite cyclic quotient:

Theorem 5 (Gromov’s theorem with a cyclic quotient) Let ${G}$ be a finitely generated group which has polynomial growth of exponent at most ${d}$ (i.e. the volume of a ball ${B_S(r)}$ grows like ${O(r^d)}$ for any fixed set of generators ${S}$). Suppose inductively that Gromov’s theorem is already known for groups of polynomial growth of exponent at most ${d-1}$, and suppose that ${G}$ contains a finite index subgroup ${G'}$ which can be mapped homomorphically onto an infinite cyclic group. Then ${G}$ is virtually nilpotent.

We prove these four facts in later sections. For now, let us see how they combine to establish Gromov’s theorem in full generality.

We assume that ${G}$ has polynomial growth of order ${d}$, and assume inductively that Gromov’s theorem has already been established for growth of order ${d-1}$ or less. We fix a symmetric set ${S}$ of generators.

We may assume that ${G}$ is infinite, since otherwise we are already done. So by Theorem 2, the space ${V}$ of (complex) Lipschitz harmonic functions consists of more than just the constants ${{\bf C}}$. In particular, setting ${W := V/{\bf C}}$, we have a non-trivial short exact sequence

$\displaystyle 0 \rightarrow {\bf C} \rightarrow V \rightarrow W \rightarrow 0.$

The left translation action of ${G}$ preserves the space of Lipschitz harmonic functions, and is thus an action of ${G}$ on ${V}$. Since ${G}$ preserves constants, it also acts on ${W}$. Now, on ${W}$, the homogeneous Lipschitz norm is a genuine norm, and is preserved by the action of ${G}$. Since all norms are equivalent on a finite-dimensional space, we can place an arbitrary Euclidean structure on ${W}$ and conclude that this structure is preserved up to constants by ${G}$. So, the image of the action of ${G}$ on ${W}$ is precompact, and thus its closure is a compact Lie group. By Theorem 4, this image is virtually abelian.

If this image is infinite, then a finite index subgroup of ${G}$ has an infinite abelian image, and thus has a surjective homomorphism onto the integers, and we are done by Theorem 5. So we may assume that this image is finite; thus there is a finite index subgroup ${G'}$ of ${G}$ that acts trivially on ${W}$. The action of ${G'}$ on ${V}$ then collapses to the form ${g f = f + \lambda_g(f)}$ for some linear functional ${\lambda_g \in V^*}$ (in fact ${\lambda_g}$ annihilates ${1}$ and so comes from ${W^*}$). Note that ${g \mapsto \lambda_g}$ is then an additive representation of ${G'}$. If the image of this representation is infinite, then we are again done by Theorem 5, so we may assume that it is finite; thus there is a finite index subgroup ${G''}$ of ${G'}$ that acts trivially on ${V}$. In other words, all Lipschitz harmonic functions are ${G''}$-invariant, and (being determined by their values on the finitely many cosets of ${G''}$) thus take only finitely many values. But looking at the maximum such value and using harmonicity (i.e. using the maximum principle) we conclude that all Lipschitz harmonic functions are constant, a contradiction.

— 2. Building a Lipschitz harmonic function —

Now we prove Theorem 2. We introduce the function

$\displaystyle \mu := \frac{1}{|S|} \sum_{s \in S} \delta_s$

where ${\delta_s}$ is the Kronecker delta function. The property of a function ${f: G \rightarrow {\bf C}}$ being harmonic is then simply that ${f*\mu = f}$, using the discrete convolution structure on the group.

To build such a function, we consider the functions

$\displaystyle f_n := \frac{1}{n} \sum_{m=1}^n \mu^{(m)}$

where ${\mu^{(m)} := \mu*\ldots*\mu}$ is the convolution of ${m}$ copies of ${\mu}$. This sequence of functions is “asymptotically harmonic” in the sense that

$\displaystyle \| f_n \|_{\ell^1(G)} = 1$

but

$\displaystyle \| f_n - f_n * \mu \|_{\ell^1(G)} = O(1/n)$

(we allow implied constants to depend on ${S}$).
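As an aside, both displays can be seen concretely in the model case ${G = {\bf Z}}$, ${S = \{\pm 1\}}$, where ${f_n - f_n * \mu = \frac{1}{n}(\mu - \mu^{(n+1)})}$ telescopes. The following sketch (purely illustrative; the window size ${N}$ and the value of ${n}$ are chosen for convenience) computes ${f_n}$ by repeated convolution:

```python
import numpy as np

# Toy model on G = Z with S = {+1, -1} (an illustration, not part of the
# proof): mu places weight 1/2 on each of +1 and -1, and functions on Z are
# represented as arrays indexed by positions -N..N.
n, N = 50, 200
mu = np.zeros(2 * N + 1)
mu[N - 1] = mu[N + 1] = 0.5

def conv(a, b):
    # centred convolution truncated to [-N, N]; exact here since every
    # measure involved is supported well inside the window
    return np.convolve(a, b, mode="same")

# f_n = (1/n) (mu^{(1)} + ... + mu^{(n)})
f_n = np.zeros_like(mu)
p = mu.copy()
for m in range(n):
    f_n += p
    p = conv(p, mu)
f_n /= n

assert abs(f_n.sum() - 1.0) < 1e-9               # ||f_n||_{l^1(G)} = 1
defect = np.abs(f_n - conv(f_n, mu)).sum()       # ||f_n - f_n * mu||_{l^1(G)}
assert defect <= 2.0 / n                         # telescoping: (mu - mu^{(n+1)})/n
```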

There are now two cases. The first case is the non-amenable case, when we have

$\displaystyle \| f_n - f_n * \delta_s \|_{\ell^1(G)} > \epsilon > 0$

for some ${s \in S}$, some ${\epsilon > 0}$, and infinitely many ${n}$; informally, this means that the averaged iterated convolutions ${f_n}$ are not getting smoother as ${n \rightarrow \infty}$. By the duality of ${\ell^1(G)}$ and ${\ell^\infty(G)}$, we see that for each such ${n}$ we can find ${H_n}$ with ${\| H_n\|_{\ell^\infty(G)} = 1}$ such that

$\displaystyle |H_n*f_n(\hbox{id}) - H_n*f_n(s)| > \epsilon.$

By Young’s inequality, ${H_n*f_n}$ has ${\ell^\infty(G)}$ norm of at most ${1}$, and

$\displaystyle \|H_n*f_n - H_n*f_n*\mu\|_{\ell^\infty(G)} = O(1/n).$

Using the sequential Banach-Alaoglu theorem we may take a subsequence limit and obtain a non-trivial bounded harmonic function. Since bounded functions are automatically Lipschitz, the claim follows.

The second case is the amenable case, when we have

$\displaystyle \| f_n - f_n*\delta_s \|_{\ell^1(G)} \rightarrow 0$

as ${n \rightarrow \infty}$ for each ${s \in S}$. Setting ${F_n := f_n^{1/2}}$, one soon verifies that

$\displaystyle \| F_n \|_{\ell^2(G)} = 1$

and

$\displaystyle \| F_n - F_n * \delta_s \|_{\ell^2(G)} = o(1).$

In particular

$\displaystyle \| F_n - F_n * \mu \|_{\ell^2(G)} = o(1).$

From this and the spectral theorem, we see that the positive-definite Laplacian operator ${\Delta: \ell^2(G) \rightarrow \ell^2(G)}$ defined by the formula

$\displaystyle \Delta F := F - F*\mu$

has non-trivial spectrum at the origin. On the other hand, as ${G}$ is infinite, there are no non-trivial harmonic functions in ${\ell^2(G)}$ (as can be seen from the maximum principle), and so the spectrum at the origin is not coming from a zero eigenfunction. From this and the spectral theorem (taking spectral projections to ${[0,\epsilon]}$ for small ${\epsilon}$), one can find a sequence ${G_n \in \ell^2(G)}$ of functions such that

$\displaystyle \sum_{g \in G} G_n(g) \Delta G_n(g) = 1$

but

$\displaystyle \| \Delta G_n \|_{\ell^2(G)} \rightarrow 0$

as ${n \rightarrow \infty}$.

A summation by parts gives the Dirichlet energy identity

$\displaystyle \sum_{g \in G} G_n(g) \Delta G_n(g) = \frac{1}{2|S|} \sum_{s \in S} \| G_n - G_n * \delta_s \|_{\ell^2(G)}^2$

and thus

$\displaystyle \| G_n - G_n * \delta_s \|_{\ell^2(G)} = O(1),$

and also there exists ${s_0 \in S}$ such that

$\displaystyle \| G_n - G_n * \delta_{s_0} \|_{\ell^2(G)} \gg 1$

for infinitely many ${n}$. By the self-duality of ${\ell^2(G)}$, we may thus find a sequence ${H_n \in \ell^2(G)}$ with ${\|H_n\|_{\ell^2(G)} = 1}$ such that

$\displaystyle |H_n*G_n(\hbox{id}) - H_n*G_n(s_0)| \gg 1$

for infinitely many ${n}$. From Young’s inequality we also see that

$\displaystyle \| H_n*G_n - H_n*G_n * \delta_s \|_{\ell^\infty(G)} = O(1)$

(so ${H_n*G_n}$ is uniformly Lipschitz) and

$\displaystyle \| \Delta(H_n*G_n) \|_{\ell^\infty(G)} \rightarrow 0$

as ${n \rightarrow \infty}$, thus ${H_n*G_n}$ is asymptotically harmonic. Using the Arzelà-Ascoli theorem to take another subsequence limit (after first subtracting a constant to normalise ${H_n*G_n}$ to be zero at the identity, so that ${H_n*G_n}$ becomes locally bounded by the uniform Lipschitz property) we obtain the required non-trivial Lipschitz harmonic function.
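As an illustration of the summation by parts identity invoked above, one can check it numerically on a finite model such as ${{\bf Z}/N{\bf Z}}$ with ${S = \{\pm 1\}}$ (the identity is purely algebraic and does not require the group to be infinite):

```python
import numpy as np

# Check sum_g G(g) (Delta G)(g) = (1/(2|S|)) sum_s ||G - G*delta_s||^2
# on Z/N with S = {+1, -1}, where (G*mu)(x) = (G(x-1) + G(x+1))/2.
rng = np.random.default_rng(0)
N = 100
G = rng.standard_normal(N)

G_mu = (np.roll(G, 1) + np.roll(G, -1)) / 2        # G * mu
lhs = np.sum(G * (G - G_mu))                       # sum_g G(g) Delta G(g)
rhs = sum(np.sum((G - np.roll(G, k)) ** 2) for k in (1, -1)) / (2 * 2)
assert abs(lhs - rhs) < 1e-9
```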

Remark 1 In the case of groups of polynomial growth, one can verify that one is always in the “amenable” case. In the non-amenable case, the theory of Poisson boundaries gives a plentiful supply of bounded Lipschitz harmonic functions (in fact, there is an infinite-dimensional space of such).

— 3. Kleiner’s theorem —

We now prove Theorem 3. Our proof will basically repeat the one in Kleiner’s original paper. For simplicity, let us assume a condition stronger than polynomial growth, namely bounded doubling:

$\displaystyle |B_S(2R)| \leq C |B_S(R)|$

for some fixed constant ${C}$ and all ${R>0}$. In general, polynomial growth does not obviously imply bounded doubling at all scales, but there is a simple pigeonhole argument that gives bounded doubling on most scales, and this turns out to be enough to run the argument below. But in order not to deal with the (minor) technicalities arising from exceptional scales in which bounded doubling fails, I will assume bounded doubling at all scales. The full proof in the general case can, of course, be found in Kleiner’s paper (which in turn was based upon an earlier argument of Colding and Minicozzi).

Let ${\epsilon > 0}$ be a small parameter. The key lemma is

Lemma 6 (Elliptic regularity) Cover ${B_S(4R)}$ by balls ${B}$ of radius ${\epsilon R}$. Suppose that a harmonic function ${f: G \rightarrow {\bf R}}$ has mean zero on every such ball. Then one has

$\displaystyle \| f \|_{\ell^2(B_S(R))} \ll \epsilon \| f \|_{\ell^2(B_S(4R))}.$

Let’s see how this lemma establishes the theorem. Consider some Lipschitz harmonic functions ${u_1,\ldots,u_D}$, which we normalise to all vanish at the identity. Let ${V}$ be the space spanned by ${u_1,\ldots,u_D}$. For each ${R}$, the ${L^2(B_S(R))}$ inner product gives a quadratic form ${Q_R}$ on ${V}$. Using this quadratic form, we can build a Gram matrix determinant

$\displaystyle \det( Q_R(u_i,u_j) )_{1 \leq i,j \leq D}.$

From the Lipschitz nature of the harmonic functions (and the polynomial growth hypothesis), we have a bound of the form

$\displaystyle \det( Q_R(u_i,u_j) )_{1 \leq i,j \leq D} \ll R^{CD} \ \ \ \ \ (1)$

as ${R \rightarrow \infty}$, for some constant ${C}$ depending only on the growth exponent ${d}$. On the other hand, we also have the monotonicity property

$\displaystyle \det( Q_R(u_i,u_j) )_{1 \leq i,j \leq D} \leq \det( Q_{4R}(u_i,u_j) )_{1 \leq i,j \leq D}.$

Now by bounded doubling, we can cover ${B_S(4R)}$ by ${O_\epsilon(1)}$ balls of radius ${\epsilon R}$. By Lemma 6, the space of functions in ${V}$ which have mean zero on each such ball is such that ${Q_R}$ is bounded (as a quadratic form) by ${O(\epsilon)}$ times ${Q_{4R}}$ on this space. Furthermore, by linear algebra, this space has codimension ${O_\epsilon(1)}$ in ${V}$. Using this, we obtain the improved bound

$\displaystyle \det( Q_R(u_i,u_j) )_{1 \leq i,j \leq D} \leq O(\epsilon)^{D - O_\epsilon(1)} \det( Q_{4R}(u_i,u_j) )_{1 \leq i,j \leq D}.$

For ${\epsilon}$ small enough and ${D}$ large enough, the factor ${O(\epsilon)^{D - O_\epsilon(1)}}$ is strictly less than ${4^{-CD}}$, where ${C}$ is the exponent in (1). Iterating this estimate as ${R}$ is repeatedly quadrupled off to infinity, and comparing against (1), we conclude in the limit that

$\displaystyle \det( Q_R(u_i,u_j) )_{1 \leq i,j \leq D} = 0$

for all ${R}$, and so ${u_1,\ldots,u_D}$ cannot be linearly independent. This implies that the space of Lipschitz harmonic functions (spanned by the constants together with the Lipschitz harmonic functions vanishing at the identity) has dimension at most ${D}$, and the claim follows.
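The monotonicity property used above is a general linear algebra fact: if ${Q \preceq Q'}$ in the positive semi-definite ordering (as holds for ${Q_R}$ and ${Q_{4R}}$, since the difference is the Gram form over ${B_S(4R) \setminus B_S(R)}$), then the corresponding Gram determinants obey ${\det Q \leq \det Q'}$. A quick numerical illustration with randomly generated forms (not tied to harmonic functions):

```python
import numpy as np

# If A and B are positive semi-definite with B - A also positive
# semi-definite, then det(A) <= det(B).  Here A plays the role of the Gram
# matrix (Q_R(u_i,u_j)) and B that of (Q_{4R}(u_i,u_j)).
rng = np.random.default_rng(1)
D = 5
M1 = rng.standard_normal((D, 20))
M2 = rng.standard_normal((D, 20))
A = M1 @ M1.T            # PSD Gram matrix
B = A + M2 @ M2.T        # B - A is PSD by construction
assert np.linalg.det(A) <= np.linalg.det(B) * (1 + 1e-9)
```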

It remains to prove the lemma. Fix the harmonic function ${f}$.

There are two basic ingredients here. The first is the reverse Poincaré inequality

$\displaystyle \sum_{x \in B_S(2R)} |\nabla f(x)|^2 \ll R^{-2} \sum_{x \in B_S(4R)} |f(x)|^2$

where

$\displaystyle |\nabla f(x)|^2 := \sum_{s \in S} |f(x)-f(xs)|^2.$

This claim (which heavily exploits the harmonicity of ${f}$) is proven by writing ${|f|^2}$ as ${f (f*\mu)}$, multiplying by a suitable cutoff function adapted to ${B_S(4R)}$ and equal to one on ${B_S(2R)}$, and summing by parts; we omit the standard details. (The above inequality is in the general spirit of the philosophy that a function which is harmonic on a ball should be smooth on that ball.)
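On ${G = {\bf Z}}$ every harmonic function is affine, ${f(x) = ax+b}$, and the reverse Poincaré inequality can be checked directly, in this toy case even with constant ${1}$ (an illustration only, not the general proof):

```python
# Reverse Poincare inequality on G = Z, S = {+1, -1}: every harmonic
# function on Z is affine, f(x) = a x + b, and here
#   sum_{|x| <= 2R} |grad f(x)|^2  <=  R^{-2} sum_{|x| <= 4R} |f(x)|^2.
def check(a, b, R):
    f = lambda x: a * x + b
    grad2 = lambda x: (f(x) - f(x + 1)) ** 2 + (f(x) - f(x - 1)) ** 2
    lhs = sum(grad2(x) for x in range(-2 * R, 2 * R + 1))
    rhs = sum(f(x) ** 2 for x in range(-4 * R, 4 * R + 1)) / R ** 2
    return lhs, rhs

for (a, b, R) in [(1, 0, 1), (3, -7, 5), (0.5, 100, 20)]:
    lhs, rhs = check(a, b, R)
    assert lhs <= rhs
```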

The second claim is the Poincaré inequality

$\displaystyle \sum_{x,y \in B(x_0,r)} |f(x)-f(y)|^2 \ll r^{2} |B_S(r)| \sum_{x \in B(x_0,3r)} |\nabla f(x)|^2,$

which does not require harmonicity. To prove this claim, observe that the left-hand side can be bounded by

$\displaystyle \sum_{g \in B_S(2r)} \sum_{x \in B(x_0,r)} |f(x)-f(xg)|^2.$

But by expanding each ${g \in B_S(2r)}$ as a word of length at most ${2r}$ and using the triangle inequality in ${\ell^2}$ and Cauchy-Schwarz, we have

$\displaystyle \sum_{x \in B(x_0,r)} |f(x)-f(xg)|^2 \ll r^2 \sum_{x \in B(x_0,3r)} |\nabla f(x)|^2$

and the claim follows.
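Here is a numerical sketch of the Poincaré inequality on ${G = {\bf Z}}$ for a random (not necessarily harmonic) ${f}$; on ${\bf Z}$ the argument above yields a small explicit constant, and we assert the inequality with constant ${4}$ for safety:

```python
import numpy as np

# Poincare inequality on G = Z, S = {+1, -1}, centred at x0 = 0:
#   sum_{x,y in B(r)} |f(x)-f(y)|^2
#       <= 4 r^2 |B(r)| sum_{x in B(3r)} |grad f(x)|^2
rng = np.random.default_rng(2)
r = 10
f = rng.standard_normal(6 * r + 1)   # f on positions -3r..3r (index x + 3r)

ball = range(-r, r + 1)
lhs = sum((f[x + 3*r] - f[y + 3*r]) ** 2 for x in ball for y in ball)
rhs = r**2 * len(ball) * sum(
    (f[x + 3*r] - f[x + 1 + 3*r]) ** 2 + (f[x + 3*r] - f[x - 1 + 3*r]) ** 2
    for x in range(-3*r + 1, 3*r))
assert lhs <= 4 * rhs
```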

If ${f}$ has mean zero on ${B(x_0,r)}$, the Poincaré inequality implies that

$\displaystyle \sum_{x \in B(x_0,r)} |f(x)|^2 \ll r^{2} \sum_{x \in B(x_0,3r)} |\nabla f(x)|^2. \ \ \ \ \ (2)$

To prove the lemma, we first use bounded doubling to refine the family of balls ${B = B(x_i,\epsilon R)}$ so that the triples ${3B = B(x_i,3\epsilon R)}$ have bounded overlap. Applying (2) for each such ball and summing we obtain the claim.

— 4. The compact Lie case —

Now we prove Theorem 4. It is a classical fact that all compact linear Lie groups ${H}$ are isomorphic to a subgroup of a unitary group ${U(n)}$; indeed, if one takes the standard inner product on ${{\bf C}^n}$ and averages it by the Haar measure of ${H}$, one obtains an inner product which is ${H}$-invariant, and so ${H}$ can be embedded inside the unitary group associated to this inner product. Thus it suffices to prove the claim when ${H=U(n)}$.

A key observation is that if two unitary elements ${g, h}$ are close to the identity, then their commutator ${[g,h] = ghg^{-1}h^{-1}}$ is even closer to the identity. Indeed, since multiplication on the left or right by unitary elements does not affect the operator norm, we have

$\displaystyle \| [g,h] - 1 \|_{op} = \| gh - hg \|_{op}$

$\displaystyle = \| (g-1)(h-1) - (h-1)(g-1) \|_{op}$

and so by the triangle inequality

$\displaystyle \| [g,h] - 1 \|_{op} \leq 2 \|g-1\|_{op} \|h-1\|_{op}. \ \ \ \ \ (3)$
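The estimate (3) can be tested numerically; the following sketch (an illustration, with unitaries generated as ${\exp(i\epsilon H)}$ for random Hermitian ${H}$) checks the inequality for elements near the identity:

```python
import numpy as np

# Check ||[g,h] - 1||_op <= 2 ||g - 1||_op ||h - 1||_op for random unitaries
# near the identity, built as exp(i*eps*H) with H Hermitian, ||H||_op = 1.
rng = np.random.default_rng(3)
op = lambda M: np.linalg.norm(M, 2)   # operator norm = largest singular value

def unitary_near_id(n, eps):
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    H = (A + A.conj().T) / 2
    H = H / op(H)
    w, V = np.linalg.eigh(H)
    return V @ np.diag(np.exp(1j * eps * w)) @ V.conj().T

I = np.eye(4)
for eps in (0.3, 0.1, 0.01):
    g, h = unitary_near_id(4, eps), unitary_near_id(4, eps)
    comm = g @ h @ g.conj().T @ h.conj().T    # [g,h] = g h g^{-1} h^{-1}
    assert op(comm - I) <= 2 * op(g - I) * op(h - I) + 1e-9
```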

We now need to exploit (3) to prove Theorem 4. As a warm-up, we first prove the following slightly easier classical result:

Theorem 7 (Jordan’s theorem) Let ${G}$ be a finite subgroup of ${U(n)}$. Then ${G}$ contains an abelian subgroup of index ${O_n(1)}$ (i.e. at most ${C_n}$, where ${C_n}$ depends only on ${n}$).

And indeed, the proof of the two results are very similar. Let us first prove Jordan’s theorem. We do this by induction on ${n}$, the case ${n=1}$ being trivial. Suppose first that ${G}$ contains a central element ${g}$ which is not a multiple of the identity. Then, by definition, ${G}$ is contained in the centraliser ${Z(g)}$ of ${g}$, which by the spectral theorem is isomorphic to a product ${U(n_1) \times \ldots \times U(n_k)}$ of smaller unitary groups. Projecting ${G}$ to each of these factor groups and applying the induction hypothesis, we obtain the claim.

Thus we may assume that ${G}$ contains no central elements other than multiples of the identity. Now pick a small ${\epsilon > 0}$ (one could take ${\epsilon=1/10}$ in fact) and consider the subgroup ${G'}$ of ${G}$ generated by those elements of ${G}$ that are within ${\epsilon}$ of the identity (in the operator norm). By considering a maximal ${\epsilon}$-net of ${G}$ we see that ${G'}$ has index at most ${O_{n,\epsilon}(1)}$ in ${G}$. By arguing as before, we may assume that ${G'}$ has no central elements other than multiples of the identity.

If ${G'}$ consists only of multiples of the identity, then we are done. If not, take an element ${g}$ of ${G'}$ that is not a multiple of the identity, and which is as close as possible to the identity (here is where we use that ${G}$ is finite). By (3), we see that if ${\epsilon}$ is sufficiently small depending on ${n}$, and if ${h}$ is one of the generators of ${G'}$, then ${[g,h]}$ lies in ${G'}$ and is closer to the identity than ${g}$, and is thus a multiple of the identity. On the other hand, ${[g,h]}$ has determinant ${1}$. Given that it is so close to the identity, it must therefore be the identity (if ${\epsilon}$ is small enough). In other words, ${g}$ is central in ${G'}$, and is thus a multiple of the identity. But this contradicts the hypothesis that there are no central elements other than multiples of the identity, and we are done.

The proof of Theorem 4 is analogous. Again, we pick a small ${\epsilon > 0}$, and define ${G'}$ as before. If ${G'}$ has a central element that is not a multiple of the identity, then we can again argue via induction, so suppose that there are no such elements.

Since ${G'}$ is finitely generated, it is not difficult to show that it can be generated by a finite set ${S}$ of generators within distance ${\epsilon}$ of the identity. Now pick an element ${h_1 \in S}$ which is not a multiple of the identity, and is at a distance ${\delta_1}$ from the identity for some ${0 < \delta_1 \leq \epsilon}$. We look at all the commutators ${[g,h_1]}$ where ${g \in S}$. By (3), they are all at distance ${O(\epsilon \delta_1)}$ from the identity, and have determinant ${1}$. If they are all constant multiples of the identity, then by arguing as before we see that ${h_1}$ is central in ${G'}$, a contradiction; so we can find an element ${h_2 := [g_1,h_1]}$ for some ${g_1 \in S}$ which is at a distance ${\delta_2 = O(\epsilon \delta_1)}$ from the identity and is not a multiple of the identity. Continuing this, we can construct ${h_3 := [g_2,h_2]}$, etc., where each ${h_n}$ is at a distance ${0 < \delta_n = O( \epsilon \delta_{n-1} )}$ from the identity, and is a commutator of ${h_{n-1}}$ with a generator.

Because of the lacunary nature of the distances of ${h_1,h_2,h_3,\ldots}$, we easily see that the words ${h_1^{i_1} \ldots h_m^{i_m}}$ with ${0 \leq i_1,\ldots,i_m \leq c \epsilon^{-1}}$ are distinct for some small ${c > 0}$. On the other hand, all of these words lie in the ball of radius ${O( m \epsilon^{-1} 2^m )}$ generated by ${S}$. This contradicts the polynomial growth hypothesis for ${\epsilon}$ taken small enough and ${m}$ large enough.

Remark 2 Theorem 4 can be deduced as a corollary of Gromov’s theorem, though we do not do so here as this would be circular. Indeed, it is not hard to see that the image of a nilpotent group in a unitary group must be virtually abelian.

— 5. The case of an infinite abelian quotient —

Now we prove Theorem 5 (which was already observed in Gromov’s original paper, and is also closely related to earlier work of Milnor and of Wolf).

Since ${G}$ is finitely generated and has polynomial growth of order ${d}$, the finite index subgroup ${G'}$ is also finitely generated of growth ${d}$. By hypothesis, there is a non-trivial homomorphism ${\phi: G' \rightarrow {\bf Z}}$. Using the Euclidean algorithm, one can move the generators ${e_1,\ldots,e_m}$ of ${G'}$ around so that all but one of them, say ${e_1,\ldots,e_{m-1}}$, lie in the kernel ${\hbox{ker}(\phi)}$ of ${\phi}$; we thus see that this kernel must then be generated by ${e_1,\ldots,e_{m-1}}$ and their conjugates ${e_m^k e_i e_m^{-k}}$ by powers of ${e_m}$.

Let ${S_k}$ be the set of ${e_m^{k'} e_i e_m^{-k'}}$ for ${1 \leq i \leq m-1}$ and ${|k'| \leq k}$, and let ${B_k}$ be the words of length at most ${k}$ generated by elements of ${S_k}$. Observe that if at least one of the elements in ${S_{k+1}}$ is not contained in ${B_k \cdot B_k^{-1}}$, then ${B_{k+1}}$ is at least twice as big as ${B_k}$. Because of polynomial growth, this implies that ${S_{k+1} \subset B_k \cdot B_k^{-1}}$ for some ${k \geq 1}$, which implies that ${\hbox{ker}(\phi)}$ is generated by ${S_k}$.

Observe that the ball of radius ${R}$ generated by ${S_k}$ and ${e_m}$ is at least ${R/2}$ times as large as the ball of radius ${R/2}$ generated by ${S_k}$. Since ${G'}$ has growth ${d}$, we conclude that ${\hbox{ker}(\phi)}$ has growth at most ${d-1}$, and is thus virtually nilpotent by hypothesis.

We have just seen that the kernel ${\hbox{ker}(\phi)}$ contains a nilpotent subgroup ${N}$ of some finite index ${M}$; being a finite index subgroup of the finitely generated group ${\hbox{ker}(\phi)}$, ${N}$ is itself finitely generated. We can take ${N}$ to be a normal subgroup of ${\hbox{ker}(\phi)}$. From Lagrange’s theorem, we see that the group ${N'}$ generated by the powers ${g^M}$ with ${g \in \hbox{ker}(\phi)}$ is then contained in ${N}$ and is therefore nilpotent. ${N'}$ is clearly a characteristic subgroup of ${\hbox{ker}(\phi)}$, and is thus normal in ${\hbox{ker}(\phi)}$. The group ${N/N'}$ is nilpotent and finitely generated with every element of order dividing ${M}$, and is thus finite; thus ${N'}$ has finite index in ${\hbox{ker}(\phi)}$. Since ${N'}$ is characteristic, it is in particular invariant under conjugation by ${e_m}$. If one lets ${G'' = {\bf Z} \ltimes_{e_m} N'}$ be the group generated by ${N'}$ and ${e_m}$, we see that ${G''}$ is a finite index subgroup of ${G}$. (Note that as ${e_m}$ is not annihilated by ${\phi}$, it will have infinite order even after quotienting out by ${N'}$.) In particular, ${G''}$ has polynomial growth.

To conclude, we need to show that ${G''}$ is virtually nilpotent. It will suffice to show that the conjugation action of ${e_m^a}$ on ${N'}$ is unipotent for some finite ${a>0}$. We can induct on the step of the nilpotent group ${N'}$, assuming that the claim has already been proven for the quotient group ${N'/Z(N')}$ (where ${Z(N')}$ is the centre of ${N'}$), whose step is one less than that of ${N'}$. Thus it suffices to prove unipotence on just the centre ${Z(N')}$, which is a finitely generated abelian group and thus isomorphic to some ${{\bf Z}^r \times H}$ for some finite group ${H}$. By Lagrange’s theorem, the action on ${H}$ becomes trivial after raising ${e_m}$ to a suitable power, so we only need to consider the action on ${{\bf Z}^r}$. In this case the conjugation action can be viewed as a matrix ${A}$ in ${GL_r({\bf Z})}$. Because ${G''}$ has polynomial growth, the powers ${A^n}$ of ${A}$ for ${n \in {\bf Z}}$ cannot grow exponentially (as otherwise the number of partial sums of the ${A^n e_i}$ for ${i=1,\ldots,r}$ and ${n=1,\ldots,N}$ would grow exponentially in ${N}$, contradicting the polynomial growth hypothesis); in other words, all the eigenvalues of ${A}$ have unit magnitude. On the other hand, these eigenvalues consist of Galois conjugacy classes of algebraic integers. But Kronecker’s theorem asserts that the only algebraic integers ${\alpha}$ whose Galois conjugates all have unit magnitude are the roots of unity. (Proof: the algebraic integers ${\alpha^n}$ have bounded degree and uniformly bounded Galois conjugates, hence their minimal polynomials have bounded integer coefficients; thus ${\alpha^n = \alpha^{n'}}$ for some ${n < n'}$, and ${\alpha}$ is a root of unity.) We conclude that all the eigenvalues of ${A}$ are roots of unity, i.e. some power of ${A}$ is unipotent, and the claim follows.
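The dichotomy in this last step, namely that an integer matrix whose eigenvalues are all roots of unity has a unipotent power (with polynomially growing powers), while an eigenvalue off the unit circle forces exponential growth, can be illustrated numerically. A toy demonstration with two hand-picked ${2 \times 2}$ matrices (not tied to any particular ${N'}$), using exact integer arithmetic:

```python
# U is unipotent (both eigenvalues 1): its powers grow only linearly.
# M has eigenvalue (3 + sqrt(5))/2 > 1 off the unit circle: its powers grow
# exponentially, which would contradict polynomial growth.
def mul(A, B):
    # exact 2x2 integer matrix product
    return [[A[0][0]*B[0][0] + A[0][1]*B[1][0], A[0][0]*B[0][1] + A[0][1]*B[1][1]],
            [A[1][0]*B[0][0] + A[1][1]*B[1][0], A[1][0]*B[0][1] + A[1][1]*B[1][1]]]

def matpow(A, n):
    P = [[1, 0], [0, 1]]
    for _ in range(n):
        P = mul(P, A)
    return P

U = [[1, 1], [0, 1]]   # unipotent
M = [[2, 1], [1, 1]]   # eigenvalues (3 +- sqrt(5))/2

assert matpow(U, 30) == [[1, 30], [0, 1]]   # U^n = [[1, n], [0, 1]]: linear growth
assert matpow(M, 30)[0][0] > 10**12         # exponential blow-up
```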