You are currently browsing the tag archive for the ‘free group’ tag.

This post is a continuation of the previous post, which has attracted a large number of comments. I’m recording here some calculations that arose from those comments (particularly those of Pace Nielsen, Lior Silberman, Tobias Fritz, and Apoorva Khare). Please feel free to either continue these calculations or to discuss other approaches to the problem, such as those mentioned in the remaining comments to the previous post.

Let ${F_2}$ be the free group on two generators ${a,b}$, and let ${\| \|: F_2 \rightarrow {\bf R}^+}$ be a quantity obeying the triangle inequality

$\displaystyle \| xy\| \leq \|x \| + \|y\|$

and the linear growth property

$\displaystyle \| x^n \| = |n| \| x\|$

for all ${x,y \in F_2}$ and integers ${n \in {\bf Z}}$; this implies the conjugation invariance

$\displaystyle \| y^{-1} x y \| = \|x\|$

or equivalently

$\displaystyle \| xy \| = \| yx\|$

We consider inequalities of the form

$\displaystyle \| xyx^{-1}y^{-1} \| \leq \alpha \|x\| + \beta \| y\| \ \ \ \ \ (1)$

or

$\displaystyle \| xyx^{-2}y^{-1} \| \leq \gamma \|x\| + \delta \| y\| \ \ \ \ \ (2)$

for various real numbers ${\alpha,\beta,\gamma,\delta}$. For instance, since

$\displaystyle \| xyx^{-1}y^{-1} \| \leq \| xyx^{-1}\| + \|y^{-1} \| = \|y\| + \|y\|$

we have (1) for ${(\alpha,\beta) = (2,0)}$. We also have the following further relations:

Proposition 1

• (i) If (1) holds for ${(\alpha,\beta)}$, then it holds for ${(\beta,\alpha)}$.
• (ii) If (1) holds for ${(\alpha,\beta)}$, then (2) holds for ${(\alpha+1, \frac{\beta}{2})}$.
• (iii) If (2) holds for ${(\gamma,\delta)}$, then (1) holds for ${(\frac{2\gamma}{3}, \frac{2\delta}{3})}$.
• (iv) If (1) holds for ${(\alpha,\beta)}$ and (2) holds for ${(\gamma,\delta)}$, then (1) holds for ${(\frac{2\alpha+1+\gamma}{4}, \frac{\delta+\beta}{4})}$.

Proof: For (i) we simply observe that

$\displaystyle \| xyx^{-1} y^{-1} \| = \| (xyx^{-1} y^{-1})^{-1} \| = \| y^{-1} x^{-1} y x \| = \| y x y^{-1} x^{-1} \|.$

For (ii), we calculate

$\displaystyle \| xyx^{-2}y^{-1} \| = \frac{1}{2}\| (xyx^{-2}y^{-1})^2 \|$

$\displaystyle = \frac{1}{2} \| (xyx^{-2}y^{-1} x) (yx^{-2} y^{-1}) \|$

$\displaystyle \leq \frac{1}{2} (\| xyx^{-2}y^{-1} x\| + \|yx^{-2} y^{-1}\|)$

$\displaystyle \leq \frac{1}{2} ( \| x^2 y x^{-2} y^{-1} \| + 2 \|x\| )$

$\displaystyle \leq \frac{1}{2} ( 2 \alpha \|x\| + \beta \|y\| + 2 \|x\|)$

giving the claim.

For (iii), we calculate

$\displaystyle \| xyx^{-1}y^{-1}\| = \frac{1}{3} \| (xyx^{-1}y^{-1})^3 \|$

$\displaystyle = \frac{1}{3} \| (xyx) (x^{-2} y^{-1} xy) (xyx)^{-1} (x^2 y x^{-1} y^{-1}) \|$

$\displaystyle \leq \frac{1}{3} ( \| x^{-2} y^{-1} xy\| + \| x^2 y x^{-1} y^{-1}\| )$

$\displaystyle = \frac{1}{3} ( \| xy x^{-2} y^{-1} \| + \|x^{-1} y^{-1} x^2 y \| )$

$\displaystyle \leq \frac{1}{3} ( \gamma \|x\| + \delta \|y\| + \gamma \|x\| + \delta \|y\|)$

giving the claim.

For (iv), we calculate

$\displaystyle \| xyx^{-1}y^{-1}\| = \frac{1}{4} \| (xyx^{-1}y^{-1})^4 \|$

$\displaystyle = \frac{1}{4} \| (xy) (x^{-1} y^{-1} x) (y x^{-1} y^{-1}) (xyx^{-1}) (xy)^{-1} (x^2yx^{-1}y^{-1}) \|$

$\displaystyle \leq \frac{1}{4} ( \| (x^{-1} y^{-1} x) (y x^{-1} y^{-1}) (xyx^{-1}) \| + \|x^2yx^{-1}y^{-1}\| )$

$\displaystyle \leq \frac{1}{4} ( \|(y x^{-1} y^{-1}) (xy^{-1}x^{-1})(x^{-1} y x) \| + \gamma \|x\| + \delta \|y\|)$

$\displaystyle \leq \frac{1}{4} ( \|x\| + \|(xy^{-1}x^{-1})(x^{-1} y x) \| + \gamma \|x\| + \delta \|y\|)$

$\displaystyle = \frac{1}{4} ( \|x\| + \|x^{-2} y x^2 y^{-1} \|+ \gamma \|x\| + \delta \|y\|)$

$\displaystyle \leq \frac{1}{4} ( \|x\| + 2\alpha \|x\| + \beta \|y\| + \gamma \|x\| + \delta \|y\|)$

giving the claim. $\Box$

Here is a typical application of the above estimates. If (1) holds for ${(\alpha,\beta)}$, then by part (i) it holds for ${(\beta,\alpha)}$, then by (ii) (2) holds for ${(\beta+1,\frac{\alpha}{2})}$, then by (iv) (1) holds for ${(\frac{3\beta+2}{4}, \frac{3\alpha}{8})}$. The map ${(\alpha,\beta) \mapsto (\frac{3\beta+2}{4}, \frac{3\alpha}{8})}$ has fixed point ${(\alpha,\beta) = (\frac{16}{23}, \frac{6}{23})}$, thus

$\displaystyle \| xyx^{-1}y^{-1} \| \leq \frac{16}{23} \|x\| + \frac{6}{23} \|y\|.$

For instance, if ${\|a\|, \|b\| \leq 1}$, then ${\|aba^{-1}b^{-1} \| \leq 22/23 = 0.95652\dots}$.

Here is a curious question posed to me by Apoorva Khare that I do not know the answer to. Let ${F_2}$ be the free group on two generators ${a,b}$. Does there exist a metric ${d}$ on this group which is

• bi-invariant, thus ${d(xg,yg)=d(gx,gy) = d(x,y)}$ for all ${x,y,g \in F_2}$; and
• linear growth in the sense that ${d(x^n,1) = n d(x,1)}$ for all ${x \in F_2}$ and all natural numbers ${n}$?

By defining the “norm” of an element ${x \in F_2}$ to be ${\| x\| := d(x,1)}$, an equivalent formulation of the problem asks if there exists a non-negative norm function ${\| \|: F_2 \rightarrow {\bf R}^+}$ that obeys the conjugation invariance

$\displaystyle \| gxg^{-1} \| = \|x \| \ \ \ \ \ (1)$

for all ${x,g \in F_2}$, the triangle inequality

$\displaystyle \| xy \| \leq \| x\| + \| y\| \ \ \ \ \ (2)$

for all ${x,y \in F_2}$, and the linear growth

$\displaystyle \| x^n \| = |n| \|x\| \ \ \ \ \ (3)$

for all ${x \in F_2}$ and ${n \in {\bf Z}}$, and such that ${\|x\| > 0}$ for all non-identity ${x \in F_2}$. Indeed, if such a norm exists then one can just take ${d(x,y) := \| x y^{-1} \|}$ to give the desired metric.

One can normalise the norm of the generators to be at most ${1}$, thus

$\displaystyle \| a \|, \| b \| \leq 1.$

This can then be used to upper bound the norm of other words in ${F_2}$. For instance, from (1), (3) one has

$\displaystyle \| aba^{-1} \|, \| b^{-1} a b \|, \| a^{-1} b^{-1} a \|, \| bab^{-1}\| \leq 1.$

A bit less trivially, from (3), (2), (1) one can bound commutators as

$\displaystyle \| aba^{-1} b^{-1} \| = \frac{1}{3} \| (aba^{-1} b^{-1})^3 \|$

$\displaystyle = \frac{1}{3} \| (aba^{-1}) (b^{-1} ab) (a^{-1} b^{-1} a) (b ab^{-1}) \|$

$\displaystyle \leq \frac{4}{3}.$

In a similar spirit one has

$\displaystyle \| aba^{-2} b^{-1} \| = \frac{1}{2} \| (aba^{-2} b^{-1})^2 \|$

$\displaystyle = \frac{1}{2} \| (aba^{-1}) (a^{-1} b^{-1} a) (ba^{-1} b^{-1}) (ba^{-1} b^{-1}) \|$

$\displaystyle \leq 2.$

What is not clear to me is if one can keep arguing like this to continually improve the upper bounds on the norm ${\| g\|}$ of a given non-trivial group element ${g}$ to the point where this norm must in fact vanish, which would demonstrate that no metric with the above properties on ${F_2}$ would exist (and in fact would impose strong constraints on similar metrics existing on other groups as well). It is also tempting to use some ideas from geometric group theory (e.g. asymptotic cones) to try to understand these metrics further, though I wasn’t able to get very far with this approach. Anyway, this feels like a problem that might be somewhat receptive to a more crowdsourced attack, so I am posing it here in case any readers wish to try to make progress on it.

I’ve just uploaded to the arXiv my paper “Failure of the ${L^1}$ pointwise and maximal ergodic theorems for the free group“, submitted to Forum of Mathematics, Sigma. This paper concerns a variant of the pointwise ergodic theorem of Birkhoff, which asserts that if one has a measure-preserving shift map ${T: X \rightarrow X}$ on a probability space ${X = (X,\mu)}$, then for any ${f \in L^1(X)}$, the averages ${\frac{1}{N} \sum_{n=1}^N f \circ T^{-n}}$ converge pointwise almost everywhere. (In the important case when the shift map ${T}$ is ergodic, the pointwise limit is simply the mean ${\int_X f\ d\mu}$ of the original function ${f}$.)

The pointwise ergodic theorem can be extended to measure-preserving actions of other amenable groups, if one uses a suitably “tempered” Folner sequence of averages; see this paper of Lindenstrauss for more details. (I also wrote up some notes on that paper here, back in 2006 before I had started this blog.) But the arguments used to handle the amenable case break down completely for non-amenable groups, and in particular for the free non-abelian group ${F_2}$ on two generators.

Nevo and Stein studied this problem and obtained a number of pointwise ergodic theorems for ${F_2}$-actions ${(T_g)_{g \in F_2}}$ on probability spaces ${(X,\mu)}$. For instance, for the spherical averaging operators

$\displaystyle {\mathcal A}_n f := \frac{1}{4 \times 3^{n-1}} \sum_{g \in F_2: |g| = n} f \circ T_g^{-1}$

(where ${|g|}$ denotes the length of the reduced word that forms ${g}$), they showed that ${{\mathcal A}_{2n} f}$ converged pointwise almost everywhere provided that ${f}$ was in ${L^p(X)}$ for some ${p>1}$. (The need to restrict to spheres of even radius can be seen by considering the action of ${F_2}$ on the two-element set ${\{0,1\}}$ in which both generators of ${F_2}$ act by interchanging the elements, in which case ${{\mathcal A}_n}$ is determined by the parity of ${n}$.) This result was reproven with a different and simpler proof by Bufetov, who also managed to relax the condition ${f \in L^p(X)}$ to the weaker condition ${f \in L \log L(X)}$.

The question remained open as to whether the pointwise ergodic theorem for ${F_2}$-actions held if one only assumed that ${f}$ was in ${L^1(X)}$. Nevo and Stein were able to establish this for the Cesáro averages ${\frac{1}{N} \sum_{n=1}^N {\mathcal A}_n}$, but not for ${{\mathcal A}_n}$ itself. About six years ago, Assaf Naor and I tried our hand at this problem, and was able to show an associated maximal inequality on ${\ell^1(F_2)}$, but due to the non-amenability of ${F_2}$, this inequality did not transfer to ${L^1(X)}$ and did not have any direct impact on this question, despite a fair amount of effort on our part to attack it.

Inspired by some recent conversations with Lewis Bowen, I returned to this problem. This time around, I tried to construct a counterexample to the ${L^1}$ pointwise ergodic theorem – something Assaf and I had not seriously attempted to do (perhaps due to being a bit too enamoured of our ${\ell^1(F_2)}$ maximal inequality). I knew of an existing counterexample of Ornstein regarding a failure of an ${L^1}$ ergodic theorem for iterates ${P^n}$ of a self-adjoint Markov operator – in fact, I had written some notes on this example back in 2007. Upon revisiting my notes, I soon discovered that the Ornstein construction was adaptable to the ${F_2}$ setting, thus settling the problem in the negative:

Theorem 1 (Failure of ${L^1}$ pointwise ergodic theorem) There exists a measure-preserving ${F_2}$-action on a probability space ${X}$ and a non-negative function ${f \in L^1(X)}$ such that ${\sup_n {\mathcal A}_{2n} f(x) = +\infty}$ for almost every ${x}$.

To describe the proof of this theorem, let me first briefly sketch the main ideas of Ornstein’s construction, which gave an example of a self-adjoint Markov operator ${P}$ on a probability space ${X}$ and a non-negative ${f \in L^1(X)}$ such that ${\sup_n P^n f(x) = +\infty}$ for almost every ${x}$. By some standard manipulations, it suffices to show that for any given ${\alpha > 0}$ and ${\varepsilon>0}$, there exists a self-adjoint Markov operator ${P}$ on a probability space ${X}$ and a non-negative ${f \in L^1(X)}$ with ${\|f\|_{L^1(X)} \leq \alpha}$, such that ${\sup_n P^n f \geq 1-\varepsilon}$ on a set of measure at least ${1-\varepsilon}$. Actually, it will be convenient to replace the Markov chain ${(P^n f)_{n \geq 0}}$ with an ancient Markov chain ${(f_n)_{n \in {\bf Z}}}$ – that is to say, a sequence of non-negative functions ${f_n}$ for both positive and negative ${f}$, such that ${f_{n+1} = P f_n}$ for all ${n \in {\bf Z}}$. The purpose of requiring the Markov chain to be ancient (that is, to extend infinitely far back in time) is to allow for the Markov chain to be shifted arbitrarily in time, which is key to Ornstein’s construction. (Technically, Ornstein’s original argument only uses functions that go back to a large negative time, rather than being infinitely ancient, but I will gloss over this point for sake of discussion, as it turns out that the ${F_2}$ version of the argument can be run using infinitely ancient chains.)

For any ${\alpha>0}$, let ${P(\alpha)}$ denote the claim that for any ${\varepsilon>0}$, there exists an ancient Markov chain ${(f_n)_{n \in {\bf Z}}}$ with ${\|f_n\|_{L^1(X)} = \alpha}$ such that ${\sup_{n \in {\bf Z}} f_n \geq 1-\varepsilon}$ on a set of measure at least ${1-\varepsilon}$. Clearly ${P(1)}$ holds since we can just take ${f_n=1}$ for all ${n}$. Our objective is to show that ${P(\alpha)}$ holds for arbitrarily small ${\alpha}$. The heart of Ornstein’s argument is then the implication

$\displaystyle P(\alpha) \implies P( \alpha (1 - \frac{\alpha}{4}) ) \ \ \ \ \ (1)$

for any ${0 < \alpha \leq 1}$, which upon iteration quickly gives the desired claim.

Let’s see informally how (1) works. By hypothesis, and ignoring epsilons, we can find an ancient Markov chain ${(f_n)_{n \in {\bf Z}}}$ on some probability space ${X}$ of total mass ${\|f_n\|_{L^1(X)} = \alpha}$, such that ${\sup_n f_n}$ attains the value of ${1}$ or greater almost everywhere. Assuming that the Markov process is irreducible, the ${f_n}$ will eventually converge as ${n \rightarrow \infty}$ to the constant value of ${\|f_n\|_{L^1(X)}}$, in particular its final state will essentially stay above ${\alpha}$ (up to small errors).

Now suppose we duplicate the Markov process by replacing ${X}$ with a double copy ${X \times \{1,2\}}$ (giving ${\{1,2\}}$ the uniform probability measure), and using the disjoint sum of the Markov operators on ${X \times \{1\}}$ and ${X \times \{2\}}$ as the propagator, so that there is no interaction between the two components of this new system. Then the functions ${f'_n(x,i) := f_n(x) 1_{i=1}}$ form an ancient Markov chain of mass at most ${\alpha/2}$ that lives solely in the first half ${X \times \{1\}}$ of this copy, and ${\sup_n f'_n}$ attains the value of ${1}$ or greater on almost all of the first half ${X \times \{1\}}$, but is zero on the second half. The final state of ${f'_n}$ will be to stay above ${\alpha}$ in the first half ${X \times \{1\}}$, but be zero on the second half.

Now we modify the above example by allowing an infinitesimal amount of interaction between the two halves ${X \times \{1\}}$, ${X \times \{2\}}$ of the system (I mentally think of ${X \times \{1\}}$ and ${X \times \{2\}}$ as two identical boxes that a particle can bounce around in, and now we wish to connect the boxes by a tiny tube). The precise way in which this interaction is inserted is not terribly important so long as the new Markov process is irreducible. Once one does so, then the ancient Markov chain ${(f'_n)_{n \in {\bf Z}}}$ in the previous example gets replaced by a slightly different ancient Markov chain ${(f''_n)_{n \in {\bf Z}}}$ which is more or less identical with ${f'_n}$ for negative times ${n}$, or for bounded positive times ${n}$, but for very large values of ${n}$ the final state is now constant across the entire state space ${X \times \{1,2\}}$, and will stay above ${\alpha/2}$ on this space.

Finally, we consider an ancient Markov chain ${F_n}$ which is basically of the form

$\displaystyle F_n(x,i) \approx f''_n(x,i) + (1 - \frac{\alpha}{2}) f_{n-M}(x) 1_{i=2}$

for some large parameter ${M}$ and for all ${n \leq M}$ (the approximation becomes increasingly inaccurate for ${n}$ much larger than ${M}$, but never mind this for now). This is basically two copies of the original Markov process in separate, barely interacting state spaces ${X \times \{1\}, X \times \{2\}}$, but with the second copy delayed by a large time delay ${M}$, and also attenuated in amplitude by a factor of ${1-\frac{\alpha}{2}}$. The total mass of this process is now ${\frac{\alpha}{2} + \frac{\alpha}{2} (1 -\frac{\alpha}{2}) = \alpha (1 - \alpha/4)}$. Because of the ${f''_n}$ component of ${F_n}$, we see that ${\sup_n F_n}$ basically attains the value of ${1}$ or greater on the first half ${X \times \{1\}}$. On the second half ${X \times \{2\}}$, we work with times ${n}$ close to ${M}$. If ${M}$ is large enough, ${f''_n}$ would have averaged out to about ${\alpha/2}$ at such times, but the ${(1 - \frac{\alpha}{2}) f_{n-M}(x)}$ component can get as large as ${1-\alpha/2}$ here. Summing (and continuing to ignore various epsilon losses), we see that ${\sup_n F_n}$ can get as large as ${1}$ on almost all of the second half of ${X \times \{2\}}$. This concludes the rough sketch of how one establishes the implication (1).

It was observed by Bufetov that the spherical averages ${{\mathcal A}_n}$ for a free group action can be lifted up to become powers ${P^n}$ of a Markov operator, basically by randomly assigning a “velocity vector” ${s \in \{a,b,a^{-1},b^{-1}\}}$ to one’s base point ${x}$ and then applying the Markov process that moves ${x}$ along that velocity vector (and then randomly changing the velocity vector at each time step to the “reduced word” condition that the velocity never flips from ${s}$ to ${s^{-1}}$). Thus the spherical average problem has a Markov operator interpretation, which opens the door to adapting the Ornstein construction to the setting of ${F_2}$ systems. This turns out to be doable after a certain amount of technical artifice; the main thing is to work with ${F_2}$-measure preserving systems that admit ancient Markov chains that are initially supported in a very small region in the “interior” of the state space, so that one can couple such systems to each other “at the boundary” in the fashion needed to establish the analogue of (1) without disrupting the ancient dynamics of such chains. The initial such system (used to establish the base case ${P(1)}$) comes from basically considering the action of ${F_2}$ on a (suitably renormalised) “infinitely large ball” in the Cayley graph, after suitably gluing together the boundary of this ball to complete the action. The ancient Markov chain associated to this system starts at the centre of this infinitely large ball at infinite negative time ${n=-\infty}$, and only reaches the boundary of this ball at the time ${n=0}$.

In a multiplicative group ${G}$, the commutator of two group elements ${g, h}$ is defined as ${[g,h] := g^{-1}h^{-1}gh}$ (other conventions are also in use, though they are largely equivalent for the purposes of this discussion). A group is said to be nilpotent of step ${s}$ (or more precisely, step ${\leq s}$), if all iterated commutators of order ${s+1}$ or higher necessarily vanish. For instance, a group is nilpotent of order ${1}$ if and only if it is abelian, and it is nilpotent of order ${2}$ if and only if ${[[g_1,g_2],g_3]=id}$ for all ${g_1,g_2,g_3}$ (i.e. all commutator elements ${[g_1,g_2]}$ are central), and so forth. A good example of an ${s}$-step nilpotent group is the group of ${s+1 \times s+1}$ upper-triangular unipotent matrices (i.e. matrices with ${1}$s on the diagonal and zero below the diagonal), and taking values in some ring (e.g. reals, integers, complex numbers, etc.).

Another important example of nilpotent groups arise from operations on polynomials. For instance, if ${V_{\leq s}}$ is the vector space of real polynomials of one variable of degree at most ${s}$, then there are two natural affine actions on ${V_{\leq s}}$. Firstly, every polynomial ${Q}$ in ${V_{\leq s}}$ gives rise to an “vertical” shift ${P \mapsto P+Q}$. Secondly, every ${h \in {\bf R}}$ gives rise to a “horizontal” shift ${P \mapsto P(\cdot+h)}$. The group generated by these two shifts is a nilpotent group of step ${\leq s}$; this reflects the well-known fact that a polynomial of degree ${\leq s}$ vanishes once one differentiates more than ${s}$ times. Because of this link between nilpotentcy and polynomials, one can view nilpotent algebra as a generalisation of polynomial algebra.

Suppose one has a finite number ${g_1,\ldots,g_n}$ of generators. Using abstract algebra, one can then construct the free nilpotent group ${{\mathcal F}_{\leq s}(g_1,\ldots,g_n)}$ of step ${\leq s}$, defined as the group generated by the ${g_1,\ldots,g_n}$ subject to the relations that all commutators of order ${s+1}$ involving the generators are trivial. This is the universal object in the category of nilpotent groups of step ${\leq s}$ with ${n}$ marked elements ${g_1,\ldots,g_n}$. In other words, given any other ${\leq s}$-step nilpotent group ${G'}$ with ${n}$ marked elements ${g'_1,\ldots,g'_n}$, there is a unique homomorphism from the free nilpotent group to ${G'}$ that maps each ${g_j}$ to ${g'_j}$ for ${1 \leq j \leq n}$. In particular, the free nilpotent group is well-defined up to isomorphism in this category.

In many applications, one wants to have a more concrete description of the free nilpotent group, so that one can perform computations more easily (and in particular, be able to tell when two words in the group are equal or not). This is easy for small values of ${s}$. For instance, when ${s=1}$, ${{\mathcal F}_{\leq 1}(g_1,\ldots,g_n)}$ is simply the free abelian group generated by ${g_1,\ldots,g_n}$, and so every element ${g}$ of ${{\mathcal F}_{\leq 1}(g_1,\ldots,g_n)}$ can be described uniquely as

$\displaystyle g = \prod_{j=1}^n g_j^{m_j} := g_1^{m_1} \ldots g_n^{m_n} \ \ \ \ \ (1)$

for some integers ${m_1,\ldots,m_n}$, with the obvious group law. Indeed, to obtain existence of this representation, one starts with any representation of ${g}$ in terms of the generators ${g_1,\ldots,g_n}$, and then uses the abelian property to push the ${g_1}$ factors to the far left, followed by the ${g_2}$ factors, and so forth. To show uniqueness, we observe that the group ${G}$ of formal abelian products ${\{ g_1^{m_1} \ldots g_n^{m_n}: m_1,\ldots,m_n \in {\bf Z} \} \equiv {\bf Z}^k}$ is already a ${\leq 1}$-step nilpotent group with marked elements ${g_1,\ldots,g_n}$, and so there must be a homomorphism from the free group to ${G}$. Since ${G}$ distinguishes all the products ${g_1^{m_1} \ldots g_n^{m_n}}$ from each other, the free group must also.

It is only slightly more tricky to describe the free nilpotent group ${{\mathcal F}_{\leq 2}(g_1,\ldots,g_n)}$ of step ${\leq 2}$. Using the identities

$\displaystyle gh = hg [g,h]; \quad gh^{-1} = ([g,h]^{-1})^{g^{-1}} h^{-1} g; \quad g^{-1} h = h [g,h]^{-1} g^{-1}; \quad g^{-1} h^{-1} := [g,h] g^{-1} h^{-1}$

(where ${g^h := h^{-1} g h}$ is the conjugate of ${g}$ by ${h}$) we see that whenever ${1 \leq i < j \leq n}$, one can push a positive or negative power of ${g_i}$ past a positive or negative power of ${g_j}$, at the cost of creating a positive or negative power of ${[g_i,g_j]}$, or one of its conjugates. Meanwhile, in a ${\leq 2}$-step nilpotent group, all the commutators are central, and one can pull all the commutators out of a word and collect them as in the abelian case. Doing all this, we see that every element ${g}$ of ${{\mathcal F}_{\leq 2}(g_1,\ldots,g_n)}$ has a representation of the form

$\displaystyle g = (\prod_{j=1}^n g_j^{m_j}) (\prod_{1 \leq i < j \leq n} [g_i,g_j]^{m_{[i,j]}}) \ \ \ \ \ (2)$

for some integers ${m_j}$ for ${1 \leq j \leq n}$ and ${m_{[i,j]}}$ for ${1 \leq i < j \leq n}$. Note that we don’t need to consider commutators ${[g_i,g_j]}$ for ${i \geq j}$, since

$\displaystyle [g_i,g_i] = id$

and

$\displaystyle [g_i,g_j] = [g_j,g_i]^{-1}.$

It is possible to show also that this representation is unique, by repeating the previous argument, i.e. by showing that the set of formal products

$\displaystyle G := \{ (\prod_{j=1}^k g_j^{m_j}) (\prod_{1 \leq i < j \leq n} [g_i,g_j]^{m_{[i,j]}}): m_j, m_{[i,j]} \in {\bf Z} \}$

forms a ${\leq 2}$-step nilpotent group, after using the above rules to define the group operations. This can be done, but verifying the group axioms (particularly the associative law) for ${G}$ is unpleasantly tedious.

Once one sees this, one rapidly loses an appetite for trying to obtain a similar explicit description for free nilpotent groups for higher step, especially once one starts seeing that higher commutators obey some non-obvious identities such as the Hall-Witt identity

$\displaystyle [[g, h^{-1}], k]^h\cdot[[h, k^{-1}], g]^k\cdot[[k, g^{-1}], h]^g = 1 \ \ \ \ \ (3)$

(a nonlinear version of the Jacobi identity in the theory of Lie algebras), which make one less certain as to the existence or uniqueness of various proposed generalisations of the representations (1) or (2). For instance, in the free ${\leq 3}$-step nilpotent group, it turns out that for representations of the form

$\displaystyle g = (\prod_{j=1}^n g_j^{m_j}) (\prod_{1 \leq i < j \leq n} [g_i,g_j]^{m_{[i,j]}}) (\prod_{1 \leq i < j < k \leq n} [[g_i,g_j],g_k]^{n_{[[i,j],k]}})$

one has uniqueness but not existence (e.g. even in the simplest case ${n=3}$, there is no place in this representation for, say, ${[[g_1,g_3],g_2]}$ or ${[[g_1,g_2],g_2]}$), but if one tries to insert more triple commutators into the representation to make up for this, one has to be careful not to lose uniqueness due to identities such as (3). One can paste these in by ad hoc means in the ${s=3}$ case, but the ${s=4}$ case looks more fearsome still, especially now that the quadruple commutators split into several distinct-looking species such as ${[[g_i,g_j],[g_k,g_l]]}$ and ${[[[g_i,g_j],g_k],g_l]}$ which are nevertheless still related to each other by identities such as (3). While one can eventually disentangle this mess for any fixed ${n}$ and ${s}$ by a finite amount of combinatorial computation, it is not immediately obvious how to give an explicit description of ${{\mathcal F}_{\leq s}(g_1,\ldots,g_n)}$ uniformly in ${n}$ and ${s}$.

Nevertheless, it turns out that one can give a reasonably tractable description of this group if one takes a polycyclic perspective rather than a nilpotent one – i.e. one views the free nilpotent group as a tower of group extensions of the trivial group by the cyclic group ${{\bf Z}}$. This seems to be a fairly standard observation in group theory – I found it in this book of Magnus, Karrass, and Solitar, via this paper of Leibman – but seems not to be so widely known outside of that field, so I wanted to record it here.

Notational convention: In this post only, I will colour a statement red if it assumes the axiom of choice. (For the rest of the course, the axiom of choice will be implicitly assumed throughout.) $\diamond$

The famous Banach-Tarski paradox asserts that one can take the unit ball in three dimensions, divide it up into finitely many pieces, and then translate and rotate each piece so that their union is now two disjoint unit balls.  As a consequence of this paradox, it is not possible to create a finitely additive measure on ${\Bbb R}^3$ that is both translation and rotation invariant, which can measure every subset of ${\Bbb R}^3$, and which gives the unit ball a non-zero measure. This paradox helps explain why Lebesgue measure (which is countably additive and both translation and rotation invariant, and gives the unit ball a non-zero measure) cannot measure every set, instead being restricted to measuring sets that are Lebesgue measurable.

On the other hand, it is not possible to replicate the Banach-Tarski paradox in one or two dimensions; the unit interval in ${\Bbb R}$ or unit disk in ${\Bbb R}^2$ cannot be rearranged into two unit intervals or two unit disks using only finitely many pieces, translations, and rotations, and indeed there do exist non-trivial finitely additive measures on these spaces. However, it is possible to obtain a Banach-Tarski type paradox in one or two dimensions using countably many such pieces; this rules out the possibility of extending Lebesgue measure to a countably additive translation invariant measure on all subsets of ${\Bbb R}$ (or any higher-dimensional space).

In these notes I would like to establish all of the above results, and tie them in with some important concepts and tools in modern group theory, most notably amenability and the ping-pong lemma.  This material is not required for the rest of the course, but nevertheless has some independent interest.