You are currently browsing the yearly archive for 2011.

Let ${n}$ be a natural number, and let ${\sigma: \{1,\ldots,n\} \rightarrow \{1,\ldots,n\}}$ be a permutation of ${\{1,\ldots,n\}}$, drawn uniformly at random. Using the cycle decomposition, one can view ${\sigma}$ as the disjoint union of cycles of varying lengths (from ${1}$ to ${n}$). For each ${1 \leq k \leq n}$, let ${C_k}$ denote the number of cycles of ${\sigma}$ of length ${k}$; thus the ${C_k}$ are natural number-valued random variables with the constraint

$\displaystyle \sum_{k=1}^n k C_k = n. \ \ \ \ \ (1)$

We let ${C := \sum_{k=1}^n C_k}$ be the number of cycles (of arbitrary length); this is another natural number-valued random variable, of size at most ${n}$.

I recently had need to understand the distribution of the random variables ${C_k}$ and ${C}$. As it turns out this is an extremely classical subject, but as an exercise I worked out what I needed using a quite tedious computation involving generating functions that I will not reproduce here. But the resulting identities I got were so nice, that they strongly suggested the existence of elementary bijective (or “double counting”) proofs, in which the identities are proven with a minimum of computation, by interpreting each side of the identity as the cardinality (or probability) of the same quantity (or event), viewed in two different ways. I then found these bijective proofs, which I found to be rather cute; again, these are all extremely classical (closely related, for instance, to Stirling numbers of the first kind), but I thought some readers might be interested in trying to find these proofs themselves as an exercise (and I also wanted a place to write the identities down so I could retrieve them later), so I have listed the identities I found below.

1. For any ${1 \leq k \leq n}$, one has ${{\bf E} C_k = \frac{1}{k}}$. In particular, ${{\bf E} C = 1 + \frac{1}{2} + \ldots + \frac{1}{n} = \log n + O(1)}$.
2. More generally, for any ${1 \leq k \leq n}$ and ${j \geq 1}$ with ${jk \leq n}$, one has ${{\bf E} \binom{C_k}{j} = \frac{1}{k^j j!}}$.
3. More generally still, for any ${1 \leq k_1 < \ldots < k_r \leq n}$ and ${j_1,\ldots,j_r \geq 1}$ with ${\sum_{i=1}^r j_i k_i \leq n}$, one has

$\displaystyle {\bf E} \prod_{i=1}^r \binom{C_{k_i}}{j_i} = \prod_{i=1}^r \frac{1}{k_i^{j_i} j_i!}.$

4. In particular, we have Cauchy’s formula: if ${\sum_{k=1}^n j_k k = n}$, then the probability that ${C_k = j_k}$ for all ${k=1,\ldots,n}$ is precisely ${\prod_{k=1}^n \frac{1}{k^{j_k} j_k!}}$. (This in particular leads to a reasonably tractable formula for the joint generating function of the ${C_k}$, which is what I initially used to compute everything that I needed, before finding the slicker bijective proofs.)
5. For fixed ${k}$, ${C_k}$ converges in distribution as ${n \rightarrow \infty}$ to the Poisson distribution of intensity ${\frac{1}{k}}$.
6. More generally, for fixed ${1 \leq k_1 < \ldots < k_r}$, ${C_{k_1},\ldots,C_{k_r}}$ converge in joint distribution to ${r}$ independent Poisson distributions of intensity ${\frac{1}{k_1},\ldots,\frac{1}{k_r}}$ respectively. (A more precise version of this claim can be found in this paper of Arratia and Tavaré.)
7. One has ${{\bf E} 2^C = n+1}$.
8. More generally, one has ${{\bf E} m^C = \binom{n+m-1}{n}}$ for all natural numbers ${m}$.

One of the basic problems in analytic number theory is to estimate sums of the form

$\displaystyle \sum_{p

as ${x \rightarrow \infty}$, where ${p}$ ranges over primes and ${f}$ is some explicit function of interest (e.g. a linear phase function ${f(p) = e^{2\pi i \alpha p}}$ for some real number ${\alpha}$). This is essentially the same task as obtaining estimates on the sum

$\displaystyle \sum_{n

where ${\Lambda}$ is the von Mangoldt function. If ${f}$ is bounded, ${f(n)=O(1)}$, then from the prime number theorem one has the trivial bound

$\displaystyle \sum_{n

but often (when ${f}$ is somehow “oscillatory” in nature) one is seeking the refinement

$\displaystyle \sum_{n

or equivalently

$\displaystyle \sum_{p

Thanks to identities such as

$\displaystyle \Lambda(n) = \sum_{d|n} \mu(d) \log(\frac{n}{d}), \ \ \ \ \ (3)$

where ${\mu}$ is the Möbius function, refinements such as (1) are similar in spirit to estimates of the form

$\displaystyle \sum_{n

Unfortunately, the connection between (1) and (4) is not particularly tight; roughly speaking, one needs to improve the bounds in (4) (and variants thereof) by about two factors of ${\log x}$ before one can use identities such as (3) to recover (1). Still, one generally thinks of (1) and (4) as being “morally” equivalent, even if they are not formally equivalent.

When ${f}$ is oscillating in a sufficiently “irrational” way, then one standard way to proceed is the method of Type I and Type II sums, which uses truncated versions of divisor identities such as (3) to expand out either (1) or (4) into linear (Type I) or bilinear sums (Type II) with which one can exploit the oscillation of ${f}$. For instance, Vaughan’s identity lets one rewrite the sum in (1) as the sum of the Type I sum

$\displaystyle \sum_{d \leq U} \mu(d) (\sum_{V/d \leq r \leq x/d} (\log r) f(rd)),$

the Type I sum

$\displaystyle -\sum_{d \leq UV} a(d) \sum_{V/d \leq r \leq x/d} f(rd),$

the Type II sum

$\displaystyle -\sum_{V \leq d \leq x/U} \sum_{U < m \leq x/V} \Lambda(d) b(m) f(dm),$

and the error term ${\sum_{d \leq V} \Lambda(n) f(n)}$, whenever ${1 \leq U, V \leq x}$ are parameters, and ${a, b}$ are the sequences

$\displaystyle a(d) := \sum_{e \leq U, f \leq V: ef = d} \Lambda(d) \mu(e)$

and

$\displaystyle b(m) := \sum_{d|m: d \leq U} \mu(d).$

Similarly one can express (4) as the Type I sum

$\displaystyle -\sum_{d \leq UV} c(d) \sum_{UV/d \leq r \leq x/d} f(rd),$

the Type II sum

$\displaystyle - \sum_{V < d \leq x/U} \sum_{U < m \leq x/d} \mu(m) b(d) f(dm)$

and the error term ${\sum_{d \leq UV} \mu(n) f(N)}$, whenever ${1 \leq U,V \leq x}$ with ${UV \leq x}$, and ${c}$ is the sequence

$\displaystyle c(d) := \sum_{e \leq U, f \leq V: ef = d} \mu(d) \mu(e).$

After eliminating troublesome sequences such as ${a(), b(), c()}$ via Cauchy-Schwarz or the triangle inequality, one is then faced with the task of estimating Type I sums such as

$\displaystyle \sum_{r \leq y} f(rd)$

or Type II sums such as

$\displaystyle \sum_{r \leq y} f(rd) \overline{f(rd')}$

for various ${y, d, d' \geq 1}$. Here, the trivial bound is ${O(y)}$, but due to a number of logarithmic inefficiencies in the above method, one has to obtain bounds that are more like ${O( \frac{y}{\log^C y})}$ for some constant ${C}$ (e.g. ${C=5}$) in order to end up with an asymptotic such as (1) or (4).

However, in a recent paper of Bourgain, Sarnak, and Ziegler, it was observed that as long as one is only seeking the Mobius orthogonality (4) rather than the von Mangoldt orthogonality (1), one can avoid losing any logarithmic factors, and rely purely on qualitative equidistribution properties of ${f}$. A special case of their orthogonality criterion (which actually dates back to an earlier paper of Katai, as was pointed out to me by Nikos Frantzikinakis) is as follows:

Proposition 1 (Orthogonality criterion) Let ${f: {\bf N} \rightarrow {\bf C}}$ be a bounded function such that

$\displaystyle \sum_{n \leq x} f(pn) \overline{f(qn)} = o(x) \ \ \ \ \ (5)$

for any distinct primes ${p, q}$ (where the decay rate of the error term ${o(x)}$ may depend on ${p}$ and ${q}$). Then

$\displaystyle \sum_{n \leq x} \mu(n) f(n) =o(x). \ \ \ \ \ (6)$

Actually, the Bourgain-Sarnak-Ziegler paper establishes a more quantitative version of this proposition, in which ${\mu}$ can be replaced by an arbitrary bounded multiplicative function, but we will content ourselves with the above weaker special case. (See also these notes of Harper, which uses the Katai argument to give a slightly weaker quantitative bound in the same spirit.) This criterion can be viewed as a multiplicative variant of the classical van der Corput lemma, which in our notation asserts that ${\sum_{n \leq x} f(n) = o(x)}$ if one has ${\sum_{n \leq x} f(n+h) \overline{f(n)} = o(x)}$ for each fixed non-zero ${h}$.

As a sample application, Proposition 1 easily gives a proof of the asymptotic

$\displaystyle \sum_{n \leq x} \mu(n) e^{2\pi i \alpha n} = o(x)$

for any irrational ${\alpha}$. (For rational ${\alpha}$, this is a little trickier, as it is basically equivalent to the prime number theorem in arithmetic progressions.) The paper of Bourgain, Sarnak, and Ziegler also apply this criterion to nilsequences (obtaining a quick proof of a qualitative version of a result of Ben Green and myself, see these notes of Ziegler for details) and to horocycle flows (for which no Möbius orthogonality result was previously known).

Informally, the connection between (5) and (6) comes from the multiplicative nature of the Möbius function. If (6) failed, then ${\mu(n)}$ exhibits strong correlation with ${f(n)}$; by change of variables, we then expect ${\mu(pn)}$ to correlate with ${f(pn)}$ and ${\mu(pm)}$ to correlate with ${f(qn)}$, for “typical” ${p,q}$ at least. On the other hand, since ${\mu}$ is multiplicative, ${\mu(pn)}$ exhibits strong correlation with ${\mu(qn)}$. Putting all this together (and pretending correlation is transitive), this would give the claim (in the contrapositive). Of course, correlation is not quite transitive, but it turns out that one can use the Cauchy-Schwarz inequality as a substitute for transitivity of correlation in this case.

I will give a proof of Proposition 1 below the fold (which is not quite based on the argument in the above mentioned paper, but on a variant of that argument communicated to me by Tamar Ziegler, and also independently discovered by Adam Harper). The main idea is to exploit the following observation: if ${P}$ is a “large” but finite set of primes (in the sense that the sum ${A := \sum_{p \in P} \frac{1}{p}}$ is large), then for a typical large number ${n}$ (much larger than the elements of ${P}$), the number of primes in ${P}$ that divide ${n}$ is pretty close to ${A = \sum_{p \in P} \frac{1}{p}}$:

$\displaystyle \sum_{p \in P: p|n} 1 \approx A. \ \ \ \ \ (7)$

A more precise formalisation of this heuristic is provided by the Turan-Kubilius inequality, which is proven by a simple application of the second moment method.

In particular, one can sum (7) against ${\mu(n) f(n)}$ and obtain an approximation

$\displaystyle \sum_{n \leq x} \mu(n) f(n) \approx \frac{1}{A} \sum_{p \in P} \sum_{n \leq x: p|n} \mu(n) f(n)$

that approximates a sum of ${\mu(n) f(n)}$ by a bunch of sparser sums of ${\mu(n) f(n)}$. Since

$\displaystyle x = \frac{1}{A} \sum_{p \in P} \frac{x}{p},$

we see (heuristically, at least) that in order to establish (4), it would suffice to establish the sparser estimates

$\displaystyle \sum_{n \leq x: p|n} \mu(n) f(n) = o(\frac{x}{p})$

for all ${p \in P}$ (or at least for “most” ${p \in P}$).

Now we make the change of variables ${n = pm}$. As the Möbius function is multiplicative, we usually have ${\mu(n) = \mu(p) \mu(m) = - \mu(m)}$. (There is an exception when ${n}$ is divisible by ${p^2}$, but this will be a rare event and we will be able to ignore it.) So it should suffice to show that

$\displaystyle \sum_{m \leq x/p} \mu(m) f(pm) = o( x/p )$

for most ${p \in P}$. However, by the hypothesis (5), the sequences ${m \mapsto f(pm)}$ are asymptotically orthogonal as ${p}$ varies, and this claim will then follow from a Cauchy-Schwarz argument.

Let ${G = (G,+)}$ be a finite additive group. A tiling pair is a pair of non-empty subsets ${A, B}$ such that every element of ${G}$ can be written in exactly one way as a sum of an element ${a}$ of ${A}$ and an element ${b}$ of ${B}$, in which case we write ${G = A \oplus B}$. The sets ${A, B}$ are then called tiles, with ${B}$ being a complementary tile to ${A}$ and vice versa. For instance, every subgroup ${H}$ of ${G}$ is a tile, as one can pick one representative from each coset of ${H}$ to form the complementary tile. Conversely, any set formed by taking one representative from each coset of ${H}$ is also a tile.

Tiles can be quite complicated, particularly when the group ${G}$ is “high-dimensional”. We will therefore restrict to the simple case of a cyclic group ${G = {\bf Z}/N{\bf Z}}$, and restrict even further to the special case when the modulus ${N}$ is square-free. Here, the situation should be much simpler. In particular, we have the following conjecture of Coven and Meyerowitz, which asserts that the previous construction of a tile is, in fact, the only such construction:

Conjecture 1 (Coven-Meyerowitz conjecture, square-free case) Let ${N}$ be square-free, and let ${A}$ be a tile of ${{\bf Z}/N{\bf Z}}$. Then there exists a subgroup ${H}$ of ${{\bf Z}/N{\bf Z}}$ such that ${A}$ consists of a single representative from each coset of ${H}$.

Note that in the square-free case, every subgroup ${H}$ of ${{\bf Z}/N{\bf Z}}$ has a complementary subgroup ${H^\perp}$ (thus ${{\bf Z}/N{\bf Z} = H \oplus H^\perp}$). In particular, ${H}$ consists of a single representative from each coset of ${H^\perp}$, and so the examples of subgroups of ${{\bf Z}/N{\bf Z}}$ are covered by the above conjecture in the square-free case.

In the non-square free case, the above assertion is not true; for instance, if ${p}$ is a prime, then the multiples of ${p}$ in ${{\bf Z}/p^2{\bf Z}}$ are a tile, but cannot be formed from taking a single representative from all the cosets of a given subgroup. There is a more general conjecture of Coven and Meyerowitz to handle this more general case, although it is more difficult to state:

Conjecture 2 (Coven-Meyerowitz conjecture, general case) Let ${N}$ be a natural number, and let ${A}$ be a tile of ${{\bf Z}/N{\bf Z}}$. Then there exists a set ${S_A}$ of prime powers with ${|A| = \prod_{p^j \in S_A} p}$ such that the Fourier transform

$\displaystyle \hat 1_A(k) := \sum_{n \in A} e^{2\pi i kn / N}$

vanishes whenever ${k}$ is a non-zero element of ${{\bf Z}/N{\bf Z}}$ whose order is the product of elements of ${S_A}$ that are powers of distinct primes. Equivalently, the generating polynomial ${\sum_{n \in A} x^n}$ is divisible by the cyclotomic polynomials ${\phi_m}$ whenever ${m}$ is the product of elements of ${S_A}$ that are powers of distinct primes.

It can be shown (with a modest amount of effort) that Conjecture 2 implies Conjecture 1, but we will not do so here, focusing instead exclusively on the square-free case for simplicity.

It was observed by Laba that Conjecture 2 is connected to the following well-known conjecture of Fuglede:

Conjecture 3 (One-dimensional Fuglede conjecture, tiling to spectral direction) Let ${E}$ be a compact subset of ${{\bf R}}$ of positive measure which is a tile (thus ${{\bf R} = E \oplus \Lambda}$ for some set ${\Lambda \subset {\bf R}}$). Then ${L^2(E)}$ (with Lebesgue measure) has a spectrum, that is to say an orthogonal set of plane waves ${x \mapsto e^{2\pi i \xi x}}$.

Indeed, it was shown by Laba that Conjecture 2 implies Conjecture 3 in the case when ${E}$ is the finite union of unit intervals. Actually, thanks to the more recent work of Farkas, Matolcsi, and Mora we know that Conjecture 2 in fact implies the universal spectrum conjecture of Lagarias and Wang, which in turn was known to imply Conjecture 3 in full generality. (On the other hand, the conjecture fails in four and higher dimensions; see the papers of Kolountzakis-Matolcsi and of Farkas-Revesz.)

Given the simple statement of Conjecture 1, it is perhaps somewhat surprising that it remains open, even in simple cases such as when ${N}$ is the product of just four primes. One reason for this is that some plausible strengthenings of this conjecture (such as the Tijdeman-Sands conjecture) are known to be false, as we will review below. On the other hand, as we shall see, tiling sets have a lot of combinatorial structure, and in principle one should be able to work out a lot of special cases of the conjecture. Given the combinatorial nature of this problem, it may well be quite suitable for a polymath project in fact, if there is sufficient interest.

In the last set of notes, we obtained the following structural theorem concerning approximate groups:

Theorem 1 Let ${A}$ be a finite ${K}$-approximate group. Then there exists a coset nilprogression ${P}$ of rank and step ${O_K(1)}$ contained in ${A^4}$, such that ${A}$ is covered by ${O_K(1)}$ left-translates of ${P}$ (and hence also by ${O_K(1)}$ right-translates of ${P}$).

Remark 1 Under some mild additional hypotheses (e.g. if the dimensions of ${P}$ are sufficiently large, or if ${P}$ is placed in a certain “normal form”, details of which may be found in this paper), a coset nilprogression ${P}$ of rank and step ${O_K(1)}$ will be an ${O_K(1)}$-approximate group, thus giving a partial converse to Theorem 1. (It is not quite a full converse though, even if one works qualitatively and forgets how the constants depend on ${K}$: if ${A}$ is covered by a bounded number of left- and right-translates ${gP, Pg}$ of ${P}$, one needs the group elements ${g}$ to “approximately normalise” ${P}$ in some sense if one wants to then conclude that ${A}$ is an approximate group.) The mild hypotheses alluded to above can be enforced in the statement of the theorem, but we will not discuss this technicality here, and refer the reader to the above-mentioned paper for details.

By placing the coset nilprogression in a virtually nilpotent group, we have the following corollary in the global case:

Corollary 2 Let ${A}$ be a finite ${K}$-approximate group in an ambient group ${G}$. Then ${A}$ is covered by ${O_K(1)}$ left cosets of a virtually nilpotent subgroup ${G'}$ of ${G}$.

In this final set of notes, we give some applications of the above results. The first application is to replace “${K}$-approximate group” by “sets of bounded doubling”:

Proposition 3 Let ${A}$ be a finite non-empty subset of a (global) group ${G}$ such that ${|A^2| \leq K |A|}$. Then there exists a coset nilprogression ${P}$ of rank and step ${O_K(1)}$ and cardinality ${|P| \gg_K |A|}$ such that ${A}$ can be covered by ${O_K(1)}$ left-translates of ${P}$, and also by ${O_K(1)}$ right-translates of ${P}$.

We will also establish (a strengthening of) a well-known theorem of Gromov on groups of polynomial growth, as promised back in Notes 0, as well as a variant result (of a type known as a “generalised Margulis lemma”) controlling the almost stabilisers of discrete actions of isometries.

The material here is largely drawn from my recent paper with Emmanuel Breuillard and Ben Green.

[Once again, some advertising on behalf of my department, following on a similar announcement in the previous two years.]

Two years ago, the UCLA mathematics department launched a scholarship opportunity for entering freshman students with exceptional background and promise in mathematics. We have offered one scholarship every year, but this year due to an additional source of funding, we will also be able to offer an additional scholarship for California residents.The UCLA Math Undergraduate Merit Scholarship provides for full tuition, and a room and board allowance for 4 years. In addition, scholarship recipients follow an individualized accelerated program of study, as determined after consultation with UCLA faculty.   The program of study leads to a Masters degree in Mathematics in four years.
More information and an application form for the scholarship can be found on the web at:
and
To be considered for Fall 2012, candidates must apply for the scholarship and also for admission to UCLA on or before November 30, 2011.

A common theme in mathematical analysis (particularly in analysis of a “geometric” or “statistical” flavour) is the interplay between “macroscopic” and “microscopic” scales. These terms are somewhat vague and imprecise, and their interpretation depends on the context and also on one’s choice of normalisations, but if one uses a “macroscopic” normalisation, “macroscopic” scales correspond to scales that are comparable to unit size (i.e. bounded above and below by absolute constants), while “microscopic” scales are much smaller, being the minimal scale at which nontrivial behaviour occurs. (Other normalisations are possible, e.g. making the microscopic scale a unit scale, and letting the macroscopic scale go off to infinity; for instance, such a normalisation is often used, at least initially, in the study of groups of polynomial growth. However, for the theory of approximate groups, a macroscopic scale normalisation is more convenient.)

One can also consider “mesoscopic” scales which are intermediate between microscopic and macroscopic scales, or large-scale behaviour at scales that go off to infinity (and in particular are larger than the macroscopic range of scales), although the behaviour of these scales will not be the main focus of this post. Finally, one can divide the macroscopic scales into “local” macroscopic scales (less than ${\epsilon}$ for some small but fixed ${\epsilon>0}$) and “global” macroscopic scales (scales that are allowed to be larger than a given large absolute constant ${C}$). For instance, given a finite approximate group ${A}$:

• Sets such as ${A^m}$ for some fixed ${m}$ (e.g. ${A^{10}}$) can be considered to be sets at a global macroscopic scale. Sending ${m}$ to infinity, one enters the large-scale regime.
• Sets such as the sets ${S}$ that appear in the Sanders lemma from the previous set of notes (thus ${S^m \subset A^4}$ for some fixed ${m}$, e.g. ${m=100}$) can be considered to be sets at a local macroscopic scale. Sending ${m}$ to infinity, one enters the mesoscopic regime.
• The non-identity element ${u}$ of ${A}$ that is “closest” to the identity in some suitable metric (cf. the proof of Jordan’s theorem from Notes 0) would be an element associated to the microscopic scale. The orbit ${u, u^2, u^3, \ldots}$ starts out at microscopic scales, and (assuming some suitable “escape” axioms) will pass through mesoscopic scales and finally entering the macroscopic regime. (Beyond this point, the orbit may exhibit a variety of behaviours, such as periodically returning back to the smaller scales, diverging off to ever larger scales, or filling out a dense subset of some macroscopic set; the escape axioms we will use do not exclude any of these possibilities.)

For comparison, in the theory of locally compact groups, properties about small neighbourhoods of the identity (e.g. local compactness, or the NSS property) would be properties at the local macroscopic scale, whereas the space ${L(G)}$ of one-parameter subgroups can be interpreted as an object at the microscopic scale. The exponential map then provides a bridge connecting the microscopic and macroscopic scales.

We return now to approximate groups. The macroscopic structure of these objects is well described by the Hrushovski Lie model theorem from the previous set of notes, which informally asserts that the macroscopic structure of an (ultra) approximate group can be modeled by a Lie group. This is already an important piece of information about general approximate groups, but it does not directly reveal the full structure of such approximate groups, because these Lie models are unable to see the microscopic behaviour of these approximate groups.

To illustrate this, let us review one of the examples of a Lie model of an ultra approximate group, namely Exercise 28 from Notes 7. In this example one studied a “nilbox” from a Heisenberg group, which we rewrite here in slightly different notation. Specifically, let ${G}$ be the Heisenberg group

$\displaystyle G := \{ (a,b,c): a,b,c \in {\bf Z} \}$

with group law

$\displaystyle (a,b,c) \ast (a',b',c') := (a+a', b+b', c+c'+ab') \ \ \ \ \ (1)$

and let ${A = \prod_{n \rightarrow \alpha} A_n}$, where ${A_n \subset G}$ is the box

$\displaystyle A_n := \{ (a,b,c) \in G: |a|, |b| \leq n; |c| \leq n^{10} \};$

thus ${A}$ is the nonstandard box

$\displaystyle A := \{ (a,b,c) \in {}^* G: |a|, |b| \leq N; |c| \leq N^{10} \}$

where ${N := \lim_{n \rightarrow \alpha} n}$. As the above exercise establishes, ${A \cup A^{-1}}$ is an ultra approximate group with a Lie model ${\pi: \langle A \rangle \rightarrow {\bf R}^3}$ given by the formula

$\displaystyle \pi( a, b, c ) := ( \hbox{st} \frac{a}{N}, \hbox{st} \frac{b}{N}, \hbox{st} \frac{c}{N^{10}} )$

for ${a,b = O(N)}$ and ${c = O(N^{10})}$. Note how the nonabelian nature of ${G}$ (arising from the ${ab'}$ term in the group law (1)) has been lost in the model ${{\bf R}^3}$, because the effect of that nonabelian term on ${\frac{c}{N^{10}}}$ is only ${O(\frac{N^2}{N^8})}$ which is infinitesimal and thus does not contribute to the standard part. In particular, if we replace ${G}$ with the abelian group ${G' := \{(a,b,c): a,b,c \in {\bf Z} \}}$ with the additive group law

$\displaystyle (a,b,c) \ast' (a',b',c') := (a+a',b+b',c+c')$

and let ${A'}$ and ${\pi'}$ be defined exactly as with ${A}$ and ${\pi}$, but placed inside the group structure of ${G'}$ rather than ${G}$, then ${A \cup A^{-1}}$ and ${A' \cup (A')^{-1}}$ are essentially “indistinguishable” as far as their models by ${{\bf R}^3}$ are concerned, even though the latter approximate group is abelian and the former is not. The problem is that the nonabelian-ness in the former example is so microscopic that it falls entirely inside the kernel of ${\pi}$ and is thus not detected at all by the model.

The problem of not being able to “see” the microscopic structure of a group (or approximate group) also was a key difficulty in the theory surrounding Hilbert’s fifth problem that was discussed in previous notes. A key tool in being able to resolve such structure was to build left-invariant metrics ${d}$ (or equivalently, norms ${\| \|}$) on one’s group, which obeyed useful “Gleason axioms” such as the commutator axiom

$\displaystyle \| [g,h] \| \ll \|g\| \|h\| \ \ \ \ \ (2)$

for sufficiently small ${g,h}$, or the escape axiom

$\displaystyle \| g^n \| \gg |n| \|g\| \ \ \ \ \ (3)$

when ${|n| \|g\|}$ was sufficiently small. Such axioms have important and non-trivial content even in the microscopic regime where ${g}$ or ${h}$ are extremely close to the identity. For instance, in the proof of Jordan’s theorem from Notes 0, which showed that any finite unitary group ${G}$ was boundedly virtually abelian, a key step was to apply the commutator axiom (2) (for the distance to the identity in operator norm) to the most “microscopic” element of ${G}$, or more precisely a non-identity element of ${G}$ of minimal norm. The key point was that this microscopic element was virtually central in ${G}$, and as such it restricted much of ${G}$ to a lower-dimensional subgroup of the unitary group, at which point one could argue using an induction-on-dimension argument. As we shall see, a similar argument can be used to place “virtually nilpotent” structure on finite approximate groups. For instance, in the Heisenberg-type approximate groups ${A \cup A^{-1}}$ and ${A' \cup (A')^{-1}}$ discussed earlier, the element ${(0,0,1)}$ will be “closest to the origin” in a suitable sense to be defined later, and is centralised by both approximate groups; quotienting out (the orbit of) that central element and iterating the process two more times, we shall see that one can express both ${A \cup A^{-1}}$ and ${A'\cup (A')^{-1}}$ as a tower of central cyclic extensions, which in particular establishes the nilpotency of both groups.

The escape axiom (3) is a particularly important axiom in connecting the microscopic structure of a group ${G}$ to its macroscopic structure; for instance, as shown in Notes 2, this axiom (in conjunction with the closely related commutator axiom) tends to imply dilation estimates such as ${d( g^n, h^n ) \sim n d(g,h)}$ that allow one to understand the microscopic geometry of points ${g,h}$ close to the identity in terms of the (local) macroscopic geometry of points ${g^n, h^n}$ that are significantly further away from the identity.

It is thus of interest to build some notion of a norm (or left-invariant metrics) on an approximate group ${A}$ that obeys the escape and commutator axioms (while being non-degenerate enough to adequately capture the geometry of ${A}$ in some sense), in a fashion analogous to the Gleason metrics that played such a key role in the theory of Hilbert’s fifth problem. It is tempting to use the Lie model theorem to do this, since Lie groups certainly come with Gleason metrics. However, if one does this, one ends up, roughly speaking, with a norm on ${A}$ that only obeys the escape and commutator estimates macroscopically; roughly speaking, this means that one has a macroscopic commutator inequality

$\displaystyle \| [g,h] \| \ll \|g\| \|h\| + o(1)$

and a macroscopic escape property

$\displaystyle \| g^n \| \gg |n| \|g\| - o(|n|)$

but such axioms are too weak for analysis at the microscopic scale, and in particular in establishing centrality of the element closest to the identity.

Another way to proceed is to build a norm that is specifically designed to obey the crucial escape property. Given an approximate group ${A}$ in a group ${G}$, and an element ${g}$ of ${G}$, we can define the escape norm ${\|g\|_{e,A}}$ of ${g}$ by the formula

$\displaystyle \| g \|_{e,A} := \inf \{ \frac{1}{n+1}: n \in {\bf N}: g, g^2, \ldots, g^n \in A \}.$

Thus, ${\|g\|_{e,A}}$ equals ${1}$ if ${g}$ lies outside of ${A}$, equals ${1/2}$ if ${g}$ lies in ${A}$ but ${g^2}$ lies outside of ${A}$, and so forth. Such norms had already appeared in Notes 4, in the context of analysing NSS groups.

As it turns out, this expression will obey an escape axiom, as long as we place some additional hypotheses on ${A}$ which we will present shortly. However, it need not actually be a norm; in particular, the triangle inequality

$\displaystyle \|gh\|_{e,A} \leq \|g\|_{e,A} + \|h\|_{e,A}$

is not necessarily true. Fortunately, it turns out that by a (slightly more complicated) version of the Gleason machinery from Notes 4 we can establish a usable substitute for this inequality, namely the quasi-triangle inequality

$\displaystyle \|g_1 \ldots g_k \|_{e,A} \leq C (\|g_1\|_{e,A} + \ldots + \|g_k\|_{e,A}),$

where ${C}$ is a constant independent of ${k}$. As we shall see, these estimates can then be used to obtain a commutator estimate (2).

However, to do all this, it is not enough for ${A}$ to be an approximate group; it must obey two additional “trapping” axioms that improve the properties of the escape norm. We formalise these axioms (somewhat arbitrarily) as follows:

Definition 1 (Strong approximate group) Let ${K \geq 1}$. A strong ${K}$-approximate group is a finite ${K}$-approximate group ${A}$ in a group ${G}$ with a symmetric subset ${S}$ obeying the following axioms:

An ultra strong ${K}$-approximate group is an ultraproduct ${A = \prod_{n \rightarrow \alpha} A_n}$ of strong ${K}$-approximate groups.

The first trapping condition can be rewritten as

$\displaystyle \|g\|_{e,A} \leq 1000 \|g\|_{e,A^{100}}$

and the second trapping condition can similarly be rewritten as

$\displaystyle \|g\|_{e,S} \leq 10^6 K^3 \|g\|_{e,A}.$

This makes the escape norms of ${A, A^{100}}$, and ${S}$ comparable to each other, which will be needed for a number of reasons (and in particular to close a certain bootstrap argument properly). Compare this with equation (12) from Notes 4, which used the NSS hypothesis to obtain similar conclusions. Thus, one can view the strong approximate group axioms as being a sort of proxy for the NSS property.

Example 1 Let ${N}$ be a large natural number. Then the interval ${A = [-N,N]}$ in the integers is a ${2}$-approximate group, which is also a strong ${2}$-approximate group (setting ${S = [10^{-6} N, 10^{-6} N]}$, for instance). On the other hand, if one places ${A}$ in ${{\bf Z}/5N{\bf Z}}$ rather than in the integers, then the first trapping condition is lost and one is no longer a strong ${2}$-approximate group. Also, if one remains in the integers, but deletes a few elements from ${A}$, e.g. deleting ${\pm \lfloor 10^{-10} N\rfloor}$ from ${A}$), then one is still a ${O(1)}$-approximate group, but is no longer a strong ${O(1)}$-approximate group, again because the first trapping condition is lost.

A key consequence of the Hrushovski Lie model theorem is that it allows one to replace approximate groups by strong approximate groups:

Exercise 1 (Finding strong approximate groups)

• (i) Let ${A}$ be an ultra approximate group with a good Lie model ${\pi: \langle A \rangle \rightarrow L}$, and let ${B}$ be a symmetric convex body (i.e. a convex open bounded subset) in the Lie algebra ${{\mathfrak l}}$. Show that if ${r>0}$ is a sufficiently small standard number, then there exists a strong ultra approximate group ${A'}$ with

$\displaystyle \pi^{-1}(\exp(rB)) \subset A' \subset \pi^{-1}(\exp(1.1 rB)) \subset A,$

and with ${A}$ can be covered by finitely many left translates of ${A'}$. Furthermore, ${\pi}$ is also a good model for ${A'}$.

• (ii) If ${A}$ is a finite ${K}$-approximate group, show that there is a strong ${O_K(1)}$-approximate group ${A'}$ inside ${A^4}$ with the property that ${A}$ can be covered by ${O_K(1)}$ left translates of ${A'}$. (Hint: use (i), Hrushovski’s Lie model theorem, and a compactness and contradiction argument.)

The need to compare the strong approximate group to an exponentiated small ball ${\exp(rB)}$ will be convenient later, as it allows one to easily use the geometry of ${L}$ to track various aspects of the strong approximate group.

As mentioned previously, strong approximate groups exhibit some of the features of NSS locally compact groups. In Notes 4, we saw that the escape norm for NSS locally compact groups was comparable to a Gleason metric. The following theorem is an analogue of that result:

Theorem 2 (Gleason lemma) Let ${A}$ be a strong ${K}$-approximate group in a group ${G}$.

• (Symmetry) For any ${g \in G}$, one has ${\|g^{-1}\|_{e,A} = \|g\|_{e,A}}$.
• (Conjugacy bound) For any ${g, h \in A^{10}}$, one has ${\|g^h\|_{e,A} \ll \|g\|_{e,A}}$.
• (Triangle inequality) For any ${g_1,\ldots,g_k \in G}$, one has ${\|g_1 \ldots g_k \|_{e,A} \ll_K (\|g_1\|_{e,A} + \ldots + \|g_k\|_{e,A})}$.
• (Escape property) One has ${\|g^n\|_{e,A} \gg |n| \|g\|_{e,A}}$ whenever ${|n| \|g\|_{e,A} < 1}$.
• (Commutator inequality) For any ${g,h \in A^{10}}$, one has ${\| [g,h] \|_{e,A} \ll_K \|g\|_{e,A} \|h\|_{e,A}}$.

The proof of this theorem will occupy a large part of the current set of notes. We then aim to use this theorem to classify strong approximate groups. The basic strategy (temporarily ignoring a key technical issue) follows the Bieberbach-Frobenius proof of Jordan’s theorem, as given in Notes 0, is as follows.

1. Start with an (ultra) strong approximate group ${A}$.
2. From the Gleason lemma, the elements with zero escape norm form a normal subgroup of ${A}$. Quotient these elements out. Show that all non-identity elements will have positive escape norm.
3. Find the non-identity element ${g_1}$ in (the quotient of) ${A}$ of minimal escape norm. Use the commutator estimate (assuming it is inherited by the quotient) to show that ${g_1}$ will centralise (most of) this quotient. In particular, the orbit ${\langle g_1 \rangle}$ is (essentially) a central subgroup of ${\langle A \rangle}$.
4. Quotient this orbit out; then find the next non-identity element ${g_2}$ in this new quotient of ${A}$. Again, show that ${\langle g_2 \rangle}$ is essentially a central subgroup of this quotient.
5. Repeat this process until ${A}$ becomes entirely trivial. Undoing all the quotients, this should demonstrate that ${\langle A \rangle}$ is virtually nilpotent, and that ${A}$ is essentially a coset nilprogression.

There are two main technical issues to resolve to make this strategy work. The first is to show that the iterative step in the argument terminates in finite time. This we do by returning to the Lie model theorem. It turns out that each time one quotients out by an orbit of an element that escapes, the dimension of the Lie model drops by at least one. This will ensure termination of the argument in finite time.

The other technical issue is that while the quotienting out all the elements of zero escape norm eliminates all “torsion” from ${A}$ (in the sense that the quotient of ${A}$ has no non-trivial elements of zero escape norm), further quotienting operations can inadvertently re-introduce such torsion. This torsion can be re-eradicated by further quotienting, but the price one pays for this is that the final structural description of ${\langle A \rangle}$ is no longer as strong as “virtually nilpotent”, but is instead a more complicated tower alternating between (ultra) finite extensions and central extensions.

Example 2 Consider the strong ${O(1)}$-approximate group

$\displaystyle A := \{ a N^{10} + 5 b: |a| \leq N; |b| \leq N^2 \}$

in the integers, where ${N}$ is a large natural number not divisible by ${5}$. As ${{\bf Z}}$ is torsion-free, all non-zero elements of ${A}$ have positive escape norm, and the nonzero element of minimal escape norm here is ${g=5}$ (or ${g=-5}$). But if one quotients by ${\langle g \rangle}$, ${A}$ projects down to ${{\bf Z}/5{\bf Z}}$, which now has torsion (and all elements in this quotient have zero escape norm). Thus torsion has been re-introduced by the quotienting operation. (A related observation is that the intersection of ${A}$ with ${\langle g \rangle = 5{\bf Z}}$ is not a simple progression, but is a more complicated object, namely a generalised arithmetic progression of rank two.)

To deal with this issue, we will not quotient out by the entire cyclic group ${\langle g \rangle = \{g^n: n \in {\bf Z} \}}$ generated by the element ${g}$ of minimal escape norm, but rather by an arithmetic progression ${P = \{g^n: |n| \leq N\}}$, where ${N}$ is a natural number comparable to the reciprocal ${1/\|g\|_{e,A}}$ of the escape norm, as this will be enough to cut the dimension of the Lie model down by one without introducing any further torsion. Of course, this cannot be done in the category of global groups, since the arithmetic progression ${P}$ will not, in general, be a group. However, it is still a local group, and it turns out that there is an analogue of the quotient space construction in local groups. This fixes the problem, but at a cost: in order to make the inductive portion of the argument work smoothly, it is now more natural to place the entire argument inside the category of local groups rather than global groups, even though the primary interest in approximate groups ${A}$ is in the global case when ${A}$ lies inside a global group. This necessitates some technical modification to some of the preceding discussion (for instance, the Gleason-Yamabe theorem must be replaced by the local version of this theorem, due to Goldbring); details can be found in this recent paper of Emmanuel Breuillard, Ben Green, and myself, but will only be sketched here.

Let ${{\mathfrak g}}$ be a finite-dimensional Lie algebra (over the reals). Given two sufficiently small elements ${x, y}$ of ${{\mathfrak g}}$, define the right Baker-Campbell-Hausdorff-Dynkin law

$\displaystyle R_y(x) := x + \int_0^1 F_R( \hbox{Ad}_x \hbox{Ad}_{ty} ) y \ dt \ \ \ \ \ (1)$

where ${\hbox{Ad}_x := \exp(\hbox{ad}_x)}$, ${\hbox{ad}_x: {\mathfrak g} \rightarrow {\mathfrak g}}$ is the adjoint map ${\hbox{ad}_x(y) := [x,y]}$, and ${F_R}$ is the function ${F_R(z) := \frac{z \log z}{z-1}}$, which is analytic for ${z}$ near ${1}$. Similarly, define the left Baker-Campbell-Hausdorff-Dynkin law

$\displaystyle L_x(y) := y + \int_0^1 F_L( \hbox{Ad}_{tx} \hbox{Ad}_y ) x\ dt \ \ \ \ \ (2)$

where ${F_L(z) := \frac{\log z}{z-1}}$. One easily verifies that these expressions are well-defined (and depend smoothly on ${x}$ and ${y}$) when ${x}$ and ${y}$ are sufficiently small.

We have the famous Baker-Campbell-Hausdoff-Dynkin formula:

Theorem 1 (BCH formula) Let ${G}$ be a finite-dimensional Lie group over the reals with Lie algebra ${{\mathfrak g}}$. Let ${\log}$ be a local inverse of the exponential map ${\exp: {\mathfrak g} \rightarrow G}$, defined in a neighbourhood of the identity. Then for sufficiently small ${x, y \in {\mathfrak g}}$, one has

$\displaystyle \log( \exp(x) \exp(y) ) = R_y(x) = L_x(y).$

See for instance these notes of mine for a proof of this formula (it is for ${R_y}$, but one easily obtains a similar proof for ${L_x}$).

In particular, one can give a neighbourhood of the identity in ${{\mathfrak g}}$ the structure of a local Lie group by defining the group operation ${\ast}$ as

$\displaystyle x \ast y := R_y(x) = L_x(y) \ \ \ \ \ (3)$

for sufficiently small ${x, y}$, and the inverse operation by ${x^{-1} := -x}$ (one easily verifies that ${R_x(-x) = L_x(-x) = 0}$ for all small ${x}$).

It is tempting to reverse the BCH formula and conclude (the local form of) Lie’s third theorem, that every finite-dimensional Lie algebra is isomorphic to the Lie algebra of some local Lie group, by using (3) to define a smooth local group structure on a neighbourhood of the identity. (See this previous post for a definition of a local Lie group.) The main difficulty in doing so is in verifying that the definition (3) is well-defined (i.e. that ${R_y(x)}$ is always equal to ${L_x(y)}$) and locally associative. The well-definedness issue can be trivially disposed of by using just one of the expressions ${R_y(x)}$ or ${L_x(y)}$ as the definition of ${\ast}$ (though, as we shall see, it will be very convenient to use both of them simultaneously). However, the associativity is not obvious at all.

With the assistance of Ado’s theorem, which places ${{\mathfrak g}}$ inside the general linear Lie algebra ${\mathfrak{gl}_n({\bf R})}$ for some ${n}$, one can deduce both the well-definedness and associativity of (3) from the Baker-Campbell-Hausdorff formula for ${\mathfrak{gl}_n({\bf R})}$. However, Ado’s theorem is rather difficult to prove (see for instance this previous blog post for a proof), and it is natural to ask whether there is a way to establish these facts without Ado’s theorem.

After playing around with this for some time, I managed to extract a direct proof of well-definedness and local associativity of (3), giving a proof of Lie’s third theorem independent of Ado’s theorem. This is not a new result by any means, (indeed, the original proofs of Lie and Cartan of Lie’s third theorem did not use Ado’s theorem), but I found it an instructive exercise to work out the details, and so I am putting it up on this blog in case anyone else is interested (and also because I want to be able to find the argument again if I ever need it in the future).

In the previous set of notes, we introduced the notion of an ultra approximate group – an ultraproduct ${A = \prod_{n \rightarrow\alpha} A_n}$ of finite ${K}$-approximate groups ${A_n}$ for some ${K}$ independent of ${n}$, where each ${K}$-approximate group ${A_n}$ may lie in a distinct ambient group ${G_n}$. Although these objects arise initially from the “finitary” objects ${A_n}$, it turns out that ultra approximate groups ${A}$ can be profitably analysed by means of infinitary groups ${L}$ (and in particular, locally compact groups or Lie groups ${L}$), by means of certain models ${\rho: \langle A \rangle \rightarrow L}$ of ${A}$ (or of the group ${\langle A \rangle}$ generated by ${A}$). We will define precisely what we mean by a model later, but as a first approximation one can view a model as a representation of the ultra approximate group ${A}$ (or of ${\langle A \rangle}$) that is “macroscopically faithful” in that it accurately describes the “large scale” behaviour of ${A}$ (or equivalently, that the kernel of the representation is “microscopic” in some sense). In the next section we will see how one can use “Gleason lemma” technology to convert this macroscopic control of an ultra approximate group into microscopic control, which will be the key to classifying approximate groups.

Models of ultra approximate groups can be viewed as the multiplicative combinatorics analogue of the more well known concept of an ultralimit of metric spaces, which we briefly review below the fold as motivation.

The crucial observation is that ultra approximate groups enjoy a local compactness property which allows them to be usefully modeled by locally compact groups (and hence, through the Gleason-Yamabe theorem from previous notes, by Lie groups also). As per the Heine-Borel theorem, the local compactness will come from a combination of a completeness property and a local total boundedness property. The completeness property turns out to be a direct consequence of the countable saturation property of ultraproducts, thus illustrating one of the key advantages of the ultraproduct setting. The local total boundedness property is more interesting. Roughly speaking, it asserts that “large bounded sets” (such as ${A}$ or ${A^{100}}$) can be covered by finitely many translates of “small bounded sets” ${S}$, where “small” is a topological group sense, implying in particular that large powers ${S^m}$ of ${S}$ lie inside a set such as ${A}$ or ${A^4}$. The easiest way to obtain such a property comes from the following lemma of Sanders:

Lemma 1 (Sanders lemma) Let ${A}$ be a finite ${K}$-approximate group in a (global) group ${G}$, and let ${m \geq 1}$. Then there exists a symmetric subset ${S}$ of ${A^4}$ with ${|S| \gg_{K,m} |A|}$ containing the identity such that ${S^m \subset A^4}$.

This lemma has an elementary combinatorial proof, and is the key to endowing an ultra approximate group with locally compact structure. There is also a closely related lemma of Croot and Sisask which can achieve similar results, and which will also be discussed below. (The locally compact structure can also be established more abstractly using the much more general methods of definability theory, as was first done by Hrushovski, but we will not discuss this approach here.)

By combining the locally compact structure of ultra approximate groups ${A}$ with the Gleason-Yamabe theorem, one ends up being able to model a large “ultra approximate subgroup” ${A'}$ of ${A}$ by a Lie group ${L}$. Such Lie models serve a number of important purposes in the structure theory of approximate groups. Firstly, as all Lie groups have a dimension which is a natural number, they allow one to assign a natural number “dimension” to ultra approximate groups, which opens up the ability to perform “induction on dimension” arguments. Secondly, Lie groups have an escape property (which is in fact equivalent to no small subgroups property): if a group element ${g}$ lies outside of a very small ball ${B_\epsilon}$, then some power ${g^n}$ of it will escape a somewhat larger ball ${B_1}$. Or equivalently: if a long orbit ${g, g^2, \ldots, g^n}$ lies inside the larger ball ${B_1}$, one can deduce that the original element ${g}$ lies inside the small ball ${B_\epsilon}$. Because all Lie groups have this property, we will be able to show that all ultra approximate groups ${A}$ “essentially” have a similar property, in that they are “controlled” by a nearby ultra approximate group which obeys a number of escape-type properties analogous to those enjoyed by small balls in a Lie group, and which we will call a strong ultra approximate group. This will be discussed in the next set of notes, where we will also see how these escape-type properties can be exploited to create a metric structure on strong approximate groups analogous to the Gleason metrics studied in previous notes, which can in turn be exploited (together with an induction on dimension argument) to fully classify such approximate groups (in the finite case, at least).

There are some cases where the analysis is particularly simple. For instance, in the bounded torsion case, one can show that the associated Lie model ${L}$ is necessarily zero-dimensional, which allows for a easy classification of approximate groups of bounded torsion.

Some of the material here is drawn from my recent paper with Ben Green and Emmanuel Breuillard, which is in turn inspired by a previous paper of Hrushovski.

Emmanuel Breuillard, Ben Green, and I have just uploaded to the arXiv our paper “The structure of approximate groups“, submitted to Pub. IHES. We had announced the main results of this paper in various forums (including this blog) for a few months now, but it had taken some time to fully write up the paper and put in various refinements and applications.

As announced previously, the main result of this paper is what is a (virtually, qualitatively) complete description of finite approximate groups in an arbitrary (local or global) group ${G}$. For simplicity let us work in the much more familiar setting of global groups, although our results also apply (but are a bit more technical to state) in the local group setting.

Recall that in a global group ${G = (G,\cdot)}$, a ${K}$-approximate group is a symmetric subset ${A}$ of ${G}$ containing the origin, with the property that the product set ${A \cdot A}$ is covered by ${K}$ left-translates of ${A}$. Examples of ${O(1)}$-approximate groups include genuine groups, convex bodies in a bounded dimensional vector space, small balls in a bounded dimensional Lie group, large balls in a discrete nilpotent group of bounded rank or step, or generalised arithmetic progressions (or more generally, coset progressions) of bounded rank in an abelian group. Specialising now to finite approximate groups, a key example of such a group is what we call a coset nilprogression: a set of the form ${\pi^{-1}(P)}$, where ${\pi: G' \rightarrow N}$ is a homomorphism with finite kernel from a subgroup ${G'}$ of ${G}$ to a nilpotent group ${N}$ of bounded step, and ${P = P(u_1,\ldots,u_r;N_1,\ldots,N_r)}$ is a nilprogression with a bounded number of generators ${u_1,\ldots,u_r}$ in ${N}$ and some lengths ${N_1,\ldots,N_r \gg 1}$, where ${P(u_1,\ldots,u_r;N_1,\ldots,N_r)}$ consists of all the words involving at most ${N_1}$ copies of ${u_1^{\pm 1}}$, ${N_2}$ copies of ${u_2^{\pm 1}}$, and so forth up to ${N_r}$ copies of ${u_r^{\pm 1}}$. One can show (by some nilpotent algebra) that all such coset nilprogressions are ${O(1)}$-approximate groups so long as the step and the rank ${r}$ are bounded (and if ${N_1,\ldots,N_r}$ are sufficiently large).

Our main theorem (which was essentially conjectured independently by Helfgott and by Lindenstrauss) asserts, roughly speaking, that coset nilprogressions are essentially the only examples of approximate groups.

Theorem 1 Let ${A}$ be a ${K}$-approximate group. Then ${A^4}$ contains a coset nilprogression ${P}$ of rank and step ${O_K(1)}$, such that ${A}$ can be covered by ${O_K(1)}$ left-translates of ${P}$.

In the torsion-free abelian case, this result is essentially Freiman’s theorem (with an alternate proof by Ruzsa); for general abelian case, it is due to Green and Ruzsa. Various partial results in this direction for some other groups (e.g. free groups, nilpotent groups, solvable groups, or simple groups of Lie type) are also known; see these previous blog posts for a summary of several of these results.

This result has a number of applications to geometric growth theory, and in particular to variants of Gromov’s theorem of groups of polynomial growth, which asserts that a finitely generated group is of polynomial growth if and only if it is virtually nilpotent. The connection lies in the fact that if the balls ${B_S(R)}$ associated to a finite set of generators ${S}$ has polynomial growth, then some simple volume-packing arguments combined with the pigeonhole principle will show that ${B_S(R)}$ will end up being a ${O(1)}$-approximate group for many radii ${R}$. In fact, since our theorem only needs a single approximate group to obtain virtually nilpotent structure, we are able to obtain some new strengthenings of Gromov’s theorem. For instance, if ${A}$ is any ${K}$-approximate group in a finitely generated group ${G}$ that contains ${B_S(R_0)}$ for some set of generators ${S}$ and some ${R_0}$ that is sufficiently large depending on ${K}$, our theorem implies that ${G}$ is virtually nilpotent, answering a question of Petrunin. Among other things, this gives an alternate proof of a recent result of Kapovitch and Wilking (see also this previous paper of Cheeger and Colding) that a compact manifold of bounded diameter and Ricci curvature at least ${-\epsilon}$ necessarily has a virtually nilpotent fundamental group if ${\epsilon}$ is sufficiently small (depending only on dimension). The main point here is that no lower bound on the injectivity radius is required. Another application is a “Margulis-type lemma”, which asserts that if a metric space ${X}$ has “bounded packing” (in the sense that any ball of radius (say) ${4}$ is covered by a bounded number of balls of radius ${1}$), and ${\Gamma}$ is a group of isometries on ${X}$ that acts discretely (i.e. every orbit has only finitely many elements (counting multiplicity) in each bounded set), then the near-stabiliser ${\{ \gamma \in \Gamma: d(\gamma x, x) \leq \epsilon \}}$ of a point ${x}$ is virtually nilpotent if ${\epsilon}$ is small enough depending on the packing constant.

There are also some variants and refinements to the main theorem proved in the paper, such as an extension to local groups, and also an improvement on the bound on the rank and step from ${O_K(1)}$ to ${O(\log K)}$ (but at the cost of replacing ${A^4}$ in the theorem with ${A^{O(1)}}$).

I’ll be discussing the proof of the main theorem in detail in the next few lecture notes of my current graduate course. The full proof is somewhat lengthy (occupying about 50 pages of the 90-page paper), but can be summarised in the following steps:

1. (Hrushovski) Take an arbitrary sequence ${A_n}$ of finite ${K}$-approximate groups, and show that an appropriate limit ${A}$ of such groups can be “modeled” in some sense by an open bounded subset of a locally compact group. (The precise definition of “model” is technical, but “macroscopically faithful representation” is a good first approximation.) As discussed in the previous lecture notes, we use an ultralimit for this purpose; the paper of Hrushovski where this strategy was first employed also considered more sophisticated model-theoretic limits. To build a locally compact topology, Hrushovski used some tools from definability theory; in our paper, we instead use a combinatorial lemma of Sanders (closely related to a similar result of Croot and Sisask.)
2. (Gleason-Yamabe) The locally compact group can in turn be “modeled” by a Lie group (possibly after shrinking the group, and thus the ultralimit ${A}$, slightly). (This result arose from the solution to Hilbert’s fifth problem, as discussed here. For our extension to local groups, we use a recent local version of the Gleason-Yamabe theorem, due to Goldbring.)
3. (Gleason) Using the escape properties of the Lie model, construct a norm ${\| \|}$ (and thus a left-invariant metric ${d}$) on the ultralimit approximate group ${A}$ (and also on the finitary groups ${A_n}$) that obeys a number of good properties, such as a commutator estimate ${\| [g,h]\| \ll \|g\| \|h\|}$. (This is modeled on an analogous construction used in the theory of Hilbert’s fifth problem, as discussed in this previous set of lecture notes.) This norm is essentially an escape norm associated to (a slight modification) of ${A}$ or ${A_n}$.
4. (Jordan-Bieberbach-Frobenius) We now take advantage of the finite nature of the ${A_n}$ by locating the non-trivial element ${e}$ of ${A_n}$ with minimal escape norm (but one has to first quotient out the elements of zero escape norm first). The commutator estimate mentioned previously ensures that this element is essentially “central” in ${A_n}$. One can then quotient out a progression ${P(e;N)}$ generated by this central element (reducing the dimension of the Lie model by one in the process) and iterates the process until the dimension of the model drops to zero. Reversing the process, this constructs a coset nilprogression inside ${A_n^4}$. This argument is based on the classic proof of Jordan’s theorem due to Bieberbach and Frobenius, as discussed in this blog post.

One quirk of the argument is that it requires one to work in the category of local groups rather than global groups. (This is somewhat analogous to how, in the standard proofs of Freiman’s theorem, one needs to work with the category of Freiman homomorphisms, rather than group homomorphisms.) The reason for this arises when performing the quotienting step in the Jordan-Bieberbach-Frobenius leg of the argument. The obvious way to perform this step (and the thing that we tried first) would be to quotient out by the entire cyclic group ${\langle e \rangle}$ generated by the element ${e}$ of minimal escape norm. However, it turns out that this doesn’t work too well, because the group quotiented out is so “large” that it can create a lot of torsion in the quotient. In particular, elements which used to have positive escape norm, can now become trapped in the quotient of ${A_n}$, thus sending their escape norm to zero. This leads to an inferior conclusion (in which a coset nilprogression is replaced by a more complicated tower of alternating extensions between central progressions and finite groups, similar to the towers encountered in my previous paper on this topic). To prevent this unwanted creation of torsion, one has to truncate the cyclic group ${\langle e \rangle}$ before it escapes ${A_n}$, so that one quotients out by a geometric progression ${P(e;N)}$ rather than the cyclic group. But the operation of quotienting out by a ${P(e;N)}$, which is a local group rather than a global one, cannot be formalised in the category of global groups, but only in the category of local groups. Because of this, we were forced to carry out the entire argument using the language of local groups. As it turns out, the arguments are ultimately more natural in this setting, although there is an initial investment of notation required, given that global group theory is much more familiar and well-developed than local group theory.

One interesting feature of the argument is that it does not use much of the existing theory of Freiman-type theorems, instead building the coset nilprogression directly from the geometric properties of the approximate group. In particular, our argument gives a new proof of Freiman’s theorem in the abelian case, which largely avoids Fourier analysis (except through the use of the theory of Hilbert’s fifth problem, which uses the Peter-Weyl theorem (or, in the abelian case, Pontryagin duality), which is basically a version of Fourier analysis).

Roughly speaking, mathematical analysis can be divided into two major styles, namely hard analysis and soft analysis. The precise distinction between the two types of analysis is imprecise (and in some cases one may use a blend the two styles), but some key differences can be listed as follows.

• Hard analysis tends to be concerned with quantitative or effective properties such as estimates, upper and lower bounds, convergence rates, and growth rates or decay rates. In contrast, soft analysis tends to be concerned with qualitative or ineffective properties such as existence and uniqueness, finiteness, measurability, continuity, differentiability, connectedness, or compactness.
• Hard analysis tends to be focused on finitary, finite-dimensional or discrete objects, such as finite sets, finitely generated groups, finite Boolean combination of boxes or balls, or “finite-complexity” functions, such as polynomials or functions on a finite set. In contrast, soft analysis tends to be focused on infinitary, infinite-dimensional, or continuous objects, such as arbitrary measurable sets or measurable functions, or abstract locally compact groups.
• Hard analysis tends to involve explicit use of many parameters such as ${\epsilon}$, ${\delta}$, ${N}$, etc. In contrast, soft analysis tends to rely instead on properties such as continuity, differentiability, compactness, etc., which implicitly are defined using a similar set of parameters, but whose parameters often do not make an explicit appearance in arguments.
• In hard analysis, it is often the case that a key lemma in the literature is not quite optimised for the application at hand, and one has to reprove a slight variant of that lemma (using a variant of the proof of the original lemma) in order for it to be suitable for applications. In contrast, in soft analysis, key results can often be used as “black boxes”, without need of further modification or inspection of the proof.
• The properties in soft analysis tend to enjoy precise closure properties; for instance, the composition or linear combination of continuous functions is again continuous, and similarly for measurability, differentiability, etc. In contrast, the closure properties in hard analysis tend to be fuzzier, in that the parameters in the conclusion are often different from the parameters in the hypotheses. For instance, the composition of two Lipschitz functions with Lipschitz constant ${K}$ is still Lipschitz, but now with Lipschitz constant ${K^2}$ instead of ${K}$. These changes in parameters mean that hard analysis arguments often require more “bookkeeping” than their soft analysis counterparts, and are less able to utilise algebraic constructions (e.g. quotient space constructions) that rely heavily on precise closure properties.

In the lectures so far, focusing on the theory surrounding Hilbert’s fifth problem, the results and techniques have fallen well inside the category of soft analysis. However, we will now turn to the theory of approximate groups, which is a topic which is traditionally studied using the methods of hard analysis. (Later we will also study groups of polynomial growth, which lies on an intermediate position in the spectrum between hard and soft analysis, and which can be profitably analysed using both styles of analysis.)

Despite the superficial differences between hard and soft analysis, though, there are a number of important correspondences between results in hard analysis and results in soft analysis. For instance, if one has some sort of uniform quantitative bound on some expression relating to finitary objects, one can often use limiting arguments to then conclude a qualitative bound on analogous expressions on infinitary objects, by viewing the latter objects as some sort of “limit” of the former objects. Conversely, if one has a qualitative bound on infinitary objects, one can often use compactness and contradiction arguments to recover uniform quantitative bounds on finitary objects as a corollary.

Remark 1 Another type of correspondence between hard analysis and soft analysis, which is “syntactical” rather than “semantical” in nature, arises by taking the proofs of a soft analysis result, and translating such a qualitative proof somehow (e.g. by carefully manipulating quantifiers) into a quantitative proof of an analogous hard analysis result. This type of technique is sometimes referred to as proof mining in the proof theory literature, and is discussed in this previous blog post (and its comments). We will however not employ systematic proof mining techniques here, although in later posts we will informally borrow arguments from infinitary settings (such as the methods used to construct Gleason metrics) and adapt them to finitary ones.

Let us illustrate the correspondence between hard and soft analysis results with a simple example.

Proposition 1 Let ${X}$ be a sequentially compact topological space, let ${S}$ be a dense subset of ${X}$, and let ${f: X \rightarrow [0,+\infty]}$ be a continuous function (giving the extended half-line ${[0,+\infty]}$ the usual order topology). Then the following statements are equivalent:

• (i) (Qualitative bound on infinitary objects) For all ${x \in X}$, one has ${f(x) < +\infty}$.
• (ii) (Quantitative bound on finitary objects) There exists ${M < +\infty}$ such that ${f(x) \leq M}$ for all ${x \in S}$.

In applications, ${S}$ is typically a (non-compact) set of “finitary” (or “finite complexity”) objects of a certain class, and ${X}$ is some sort of “completion” or “compactification” of ${S}$ which admits additional “infinitary” objects that may be viewed as limits of finitary objects.

Proof: To see that (ii) implies (i), observe from density that every point ${x}$ in ${X}$ is adherent to ${S}$, and so given any neighbourhood ${U}$ of ${x}$, there exists ${y \in S \cap U}$. Since ${f(y) \leq M}$, we conclude from the continuity of ${f}$ that ${f(x) \leq M}$ also, and the claim follows.

Conversely, to show that (i) implies (ii), we use the “compactness and contradiction” argument. Suppose for sake of contradiction that (ii) failed. Then for any natural number ${n}$, there exists ${x_n \in S}$ such that ${f(x_n) \geq n}$. (Here we have used the axiom of choice, which we will assume throughout this course.) Using sequential compactness, and passing to a subsequence if necessary, we may assume that the ${x_n}$ converge to a limit ${x \in X}$. By continuity of ${f}$, this implies that ${f(x) = +\infty}$, contradicting (i). $\Box$

Remark 2 Note that the above deduction of (ii) from (i) is ineffective in that it gives no explicit bound on the uniform bound ${M}$ in (ii). Without any further information on how the qualitative bound (i) is proven, this is the best one can do in general (and this is one of the most significant weaknesses of infinitary methods when used to solve finitary problems); but if one has access to the proof of (i), one can often finitise or proof mine that argument to extract an effective bound for ${M}$, although often the bound one obtains in the process is quite poor (particularly if the proof of (i) relied extensively on infinitary tools, such as limits). See this blog post for some related discussion.

The above simple example illustrates that in order to get from an “infinitary” statement such as (i) to a “finitary” statement such as (ii), a key step is to be able to take a sequence ${(x_n)_{n \in {\bf N}}}$ (or in some cases, a more general net ${(x_\alpha)_{\alpha \in A}}$) of finitary objects and extract a suitable infinitary limit object ${x}$. In the literature, there are three main ways in which one can extract such a limit:

• (Topological limit) If the ${x_n}$ are all elements of some topological space ${S}$ (e.g. an incomplete function space) which has a suitable “compactification” or “completion” ${X}$ (e.g. a Banach space), then (after passing to a subsequence if necessary) one can often ensure the ${x_n}$ converge in a topological sense (or in a metrical sense) to a limit ${x}$. The use of this type of limit to pass between quantitative/finitary and qualitative/infinitary results is particularly common in the more analytical areas of mathematics (such as ergodic theory, asymptotic combinatorics, or PDE), due to the abundance of useful compactness results in analysis such as the (sequential) Banach-Alaoglu theorem, Prokhorov’s theorem, the Helly selection theorem, the Arzelá-Ascoli theorem, or even the humble Bolzano-Weierstrass theorem. However, one often has to take care with the nature of convergence, as many compactness theorems only guarantee convergence in a weak sense rather than in a strong one.
• (Categorical limit) If the ${x_n}$ are all objects in some category (e.g. metric spaces, groups, fields, etc.) with a number of morphisms between the ${x_n}$ (e.g. morphisms from ${x_{n+1}}$ to ${x_n}$, or vice versa), then one can often form a direct limit ${\lim_{\rightarrow} x_n}$ or inverse limit ${\lim_{\leftarrow} x_n}$ of these objects to form a limiting object ${x}$. The use of these types of limits to connect quantitative and qualitative results is common in subjects such as algebraic geometry that are particularly amenable to categorical ways of thinking. (We have seen inverse limits appear in the discussion of Hilbert’s fifth problem, although in that context they were not really used to connect quantitative and qualitative results together.)
• (Logical limit) If the ${x_n}$ are all distinct spaces (or elements or subsets of distinct spaces), with few morphisms connecting them together, then topological and categorical limits are often unavailable or unhelpful. In such cases, however, one can still tie together such objects using an ultraproduct construction (or similar device) to create a limiting object ${\lim_{n \rightarrow \alpha} x_n}$ or limiting space ${\prod_{n \rightarrow \alpha} x_n}$ that is a logical limit of the ${x_n}$, in the sense that various properties of the ${x_n}$ (particularly those that can be phrased using the language of first-order logic) are preserved in the limit. As such, logical limits are often very well suited for the task of connecting finitary and infinitary mathematics together. Ultralimit type constructions are of course used extensively in logic (particularly in model theory), but are also popular in metric geometry. They can also be used in many of the previously mentioned areas of mathematics, such as algebraic geometry (as discussed in this previous post).

The three types of limits are analogous in many ways, with a number of connections between them. For instance, in the study of groups of polynomial growth, both topological limits (using the metric notion of Gromov-Hausdorff convergence) and logical limits (using the ultralimit construction) are commonly used, and to some extent the two constructions are at least partially interchangeable in this setting. (See also these previous posts for the use of ultralimits as a substitute for topological limits.) In the theory of approximate groups, though, it was observed by Hrushovski that logical limits (and in particular, ultraproducts) are the most useful type of limit to connect finitary approximate groups to their infinitary counterparts. One reason for this is that one is often interested in obtaining results on approximate groups ${A}$ that are uniform in the choice of ambient group ${G}$. As such, one often seeks to take a limit of approximate groups ${A_n}$ that lie in completely unrelated ambient groups ${G_n}$, with no obvious morphisms or metrics tying the ${G_n}$ to each other. As such, the topological and categorical limits are not easily usable, whereas the logical limits can still be employed without much difficulty.

Logical limits are closely tied with non-standard analysis. Indeed, by applying an ultraproduct construction to standard number systems such as the natural numbers ${{\bf N}}$ or the reals ${{\bf R}}$, one can obtain nonstandard number systems such as the nonstandard natural numbers ${{}^* {\bf N}}$ or the nonstandard real numbers (or hyperreals) ${{}^* {\bf R}}$. These nonstandard number systems behave very similarly to their standard counterparts, but also enjoy the advantage of containing the standard number systems as proper subsystems (e.g. ${{\bf R}}$ is a subring of ${{}^* {\bf R}}$), which allows for some convenient algebraic manipulations (such as the quotient space construction to create spaces such as ${{}^* {\bf R} / {\bf R}}$) which are not easily accessible in the purely standard universe. Nonstandard spaces also enjoy a useful completeness property, known as countable saturation, which is analogous to metric completeness (as discussed in this previous blog post) and which will be particularly useful for us in tying together the theory of approximate groups with the theory of Hilbert’s fifth problem. See this previous post for more discussion on ultrafilters and nonstandard analysis.

In these notes, we lay out the basic theory of ultraproducts and ultralimits (in particular, proving Los’s theorem, which roughly speaking asserts that ultralimits are limits in a logical sense, as well as the countable saturation property alluded to earlier). We also lay out some of the basic foundations of nonstandard analysis, although we will not rely too heavily on nonstandard tools in this course. Finally, we apply this general theory to approximate groups, to connect finite approximate groups to an infinitary type of approximate group which we will call an ultra approximate group. We will then study these ultra approximate groups (and models of such groups) in more detail in the next set of notes.

Remark 3 Throughout these notes (and in the rest of the course), we will assume the axiom of choice, in order to easily use ultrafilter-based tools. If one really wanted to expend the effort, though, one could eliminate the axiom of choice from the proofs of the final “finitary” results that one is ultimately interested in proving, at the cost of making the proofs significantly lengthier. Indeed, there is a general result of Gödel that any result which can be stated in the language of Peano arithmetic (which, roughly speaking, means that the result is “finitary” in nature), and can be proven in set theory using the axiom of choice (or more precisely, in the ZFC axiom system), can also be proven in set theory without the axiom of choice (i.e. in the ZF system). As this course is not focused on foundations, we shall simply assume the axiom of choice henceforth to avoid further distraction by such issues.