You are currently browsing the monthly archive for November 2011.

Van Vu and I have just uploaded to the arXiv our paper A central limit theorem for the determinant of a Wigner matrix, submitted to Adv. Math.. It studies the asymptotic distribution of the determinant ${\det M_n}$ of a random Wigner matrix (such as a matrix drawn from the Gaussian Unitary Ensemble (GUE) or Gaussian Orthogonal Ensemble (GOE)).

Before we get to these results, let us first discuss the simpler problem of studying the determinant ${\det A_n}$ of a random iid matrix ${A_n = (\zeta_{ij})_{1 \leq i,j \leq n}}$, such as a real gaussian matrix (where all entries are independently and identically distributed using the standard real normal distribution ${\zeta_{ij} \equiv N(0,1)_{\bf R}}$), a complex gaussian matrix (where all entries are independently and identically distributed using the standard complex normal distribution ${\zeta_{ij} \equiv N(0,1)_{\bf C}}$, thus the real and imaginary parts are independent with law ${N(0,1/2)_{\bf R}}$), or the random sign matrix (in which all entries are independently and identically distributed according to the Bernoulli distribution ${\zeta_{ij} \equiv \pm 1}$ (with a ${1/2}$ chance of either sign). More generally, one can consider a matrix ${A_n}$ in which all the entries ${\zeta_{ij}}$ are independently and identically distributed with mean zero and variance ${1}$.

We can expand ${\det A_n}$ using the Leibniz expansion

$\displaystyle \det A_n = \sum_{\sigma \in S_n} I_\sigma, \ \ \ \ \ (1)$

where ${\sigma: \{1,\ldots,n\} \rightarrow \{1,\ldots,n\}}$ ranges over the permutations of ${\{1,\ldots,n\}}$, and ${I_\sigma}$ is the product

$\displaystyle I_\sigma := \hbox{sgn}(\sigma) \prod_{i=1}^n \zeta_{i\sigma(i)}.$

From the iid nature of the ${\zeta_{ij}}$, we easily see that each ${I_\sigma}$ has mean zero and variance one, and are pairwise uncorrelated as ${\sigma}$ varies. We conclude that ${\det A_n}$ has mean zero and variance ${n!}$ (an observation first made by Turán). In particular, from Chebyshev’s inequality we see that ${\det A_n}$ is typically of size ${O(\sqrt{n!})}$.

It turns out, though, that this is not quite best possible. This is easiest to explain in the real gaussian case, by performing a computation first made by Goodman. In this case, ${\det A_n}$ is clearly symmetrical, so we can focus attention on the magnitude ${|\det A_n|}$. We can interpret this quantity geometrically as the volume of an ${n}$-dimensional parallelopiped whose generating vectors ${X_1,\ldots,X_n}$ are independent real gaussian vectors in ${{\bf R}^n}$ (i.e. their coefficients are iid with law ${N(0,1)_{\bf R}}$). Using the classical base-times-height formula, we thus have

$\displaystyle |\det A_n| = \prod_{i=1}^n \hbox{dist}(X_i, V_i) \ \ \ \ \ (2)$

where ${V_i}$ is the ${i-1}$-dimensional linear subspace of ${{\bf R}^n}$ spanned by ${X_1,\ldots,X_{i-1}}$ (note that ${X_1,\ldots,X_n}$, having an absolutely continuous joint distribution, are almost surely linearly independent). Taking logarithms, we conclude

$\displaystyle \log |\det A_n| = \sum_{i=1}^n \log \hbox{dist}(X_i, V_i).$

Now, we take advantage of a fundamental symmetry property of the Gaussian vector distribution, namely its invariance with respect to the orthogonal group ${O(n)}$. Because of this, we see that if we fix ${X_1,\ldots,X_{i-1}}$ (and thus ${V_i}$, the random variable ${\hbox{dist}(X_i,V_i)}$ has the same distribution as ${\hbox{dist}(X_i,{\bf R}^{i-1})}$, or equivalently the ${\chi}$ distribution

$\displaystyle \chi_{n-i+1} := (\sum_{j=1}^{n-i+1} \xi_{n-i+1,j}^2)^{1/2}$

where ${\xi_{n-i+1,1},\ldots,\xi_{n-i+1,n-i+1}}$ are iid copies of ${N(0,1)_{\bf R}}$. As this distribution does not depend on the ${X_1,\ldots,X_{i-1}}$, we conclude that the law of ${\log |\det A_n|}$ is given by the sum of ${n}$ independent ${\chi}$-variables:

$\displaystyle \log |\det A_n| \equiv \sum_{j=1}^{n} \log \chi_j.$

A standard computation shows that each ${\chi_j^2}$ has mean ${j}$ and variance ${2j}$, and then a Taylor series (or Ito calculus) computation (using concentration of measure tools to control tails) shows that ${\log \chi_j}$ has mean ${\frac{1}{2} \log j - \frac{1}{2j} + O(1/j^{3/2})}$ and variance ${\frac{1}{2j}+O(1/j^{3/2})}$. As such, ${\log |\det A_n|}$ has mean ${\frac{1}{2} \log n! - \frac{1}{2} \log n + O(1)}$ and variance ${\frac{1}{2} \log n + O(1)}$. Applying a suitable version of the central limit theorem, one obtains the asymptotic law

$\displaystyle \frac{\log |\det A_n| - \frac{1}{2} \log n! + \frac{1}{2} \log n}{\sqrt{\frac{1}{2}\log n}} \rightarrow N(0,1)_{\bf R}, \ \ \ \ \ (3)$

where ${\rightarrow}$ denotes convergence in distribution. A bit more informally, we have

$\displaystyle |\det A_n| \approx n^{-1/2} \sqrt{n!} \exp( N( 0, \log n / 2 )_{\bf R} ) \ \ \ \ \ (4)$

when ${A_n}$ is a real gaussian matrix; thus, for instance, the median value of ${|\det A_n|}$ is ${n^{-1/2+o(1)} \sqrt{n!}}$. At first glance, this appears to conflict with the second moment bound ${\mathop{\bf E} |\det A_n|^2 = n!}$ of Turán mentioned earlier, but once one recalls that ${\exp(N(0,t)_{\bf R})}$ has a second moment of ${\exp(2t)}$, we see that the two facts are in fact perfectly consistent; the upper tail of the normal distribution in the exponent in (4) ends up dominating the second moment.

It turns out that the central limit theorem (3) is valid for any real iid matrix with mean zero, variance one, and an exponential decay condition on the entries; this was first claimed by Girko, though the arguments in that paper appear to be incomplete. Another proof of this result, with more quantitative bounds on the convergence rate has been recently obtained by Hoi Nguyen and Van Vu. The basic idea in these arguments is to express the sum in (2) in terms of a martingale and apply the martingale central limit theorem.

If one works with complex gaussian random matrices instead of real gaussian random matrices, the above computations change slightly (one has to replace the real ${\chi}$ distribution with the complex ${\chi}$ distribution, in which the ${\xi_{i,j}}$ are distributed according to the complex gaussian ${N(0,1)_{\bf C}}$ instead of the real one). At the end of the day, one ends up with the law

$\displaystyle \frac{\log |\det A_n| - \frac{1}{2} \log n! + \frac{1}{4} \log n}{\sqrt{\frac{1}{4}\log n}} \rightarrow N(0,1)_{\bf R}, \ \ \ \ \ (5)$

$\displaystyle |\det A_n| \approx n^{-1/4} \sqrt{n!} \exp( N( 0, \log n / 4 )_{\bf R} ) \ \ \ \ \ (6)$

(but note that this new asymptotic is still consistent with Turán’s second moment calculation).

We can now turn to the results of our paper. Here, we replace the iid matrices ${A_n}$ by Wigner matrices ${M_n = (\zeta_{ij})_{1 \leq i,j \leq n}}$, which are defined similarly but are constrained to be Hermitian (or real symmetric), thus ${\zeta_{ij} = \overline{\zeta_{ji}}}$ for all ${i,j}$. Model examples here include the Gaussian Unitary Ensemble (GUE), in which ${\zeta_{ij} \equiv N(0,1)_{\bf C}}$ for ${1 \leq i < j \leq n}$ and ${\zeta_{ij} \equiv N(0,1)_{\bf R}}$ for ${1 \leq i=j \leq n}$, the Gaussian Orthogonal Ensemble (GOE), in which ${\zeta_{ij} \equiv N(0,1)_{\bf R}}$ for ${1 \leq i < j \leq n}$ and ${\zeta_{ij} \equiv N(0,2)_{\bf R}}$ for ${1 \leq i=j \leq n}$, and the symmetric Bernoulli ensemble, in which ${\zeta_{ij} \equiv \pm 1}$ for ${1 \leq i \leq j \leq n}$ (with probability ${1/2}$ of either sign). In all cases, the upper triangular entries of the matrix are assumed to be jointly independent. For a more precise definition of the Wigner matrix ensembles we are considering, see the introduction to our paper.

The determinants ${\det M_n}$ of these matrices still have a Leibniz expansion. However, in the Wigner case, the mean and variance of the ${I_\sigma}$ are slightly different, and what is worse, they are not all pairwise uncorrelated any more. For instance, the mean of ${I_\sigma}$ is still usually zero, but equals ${(-1)^{n/2}}$ in the exceptional case when ${\sigma}$ is a perfect matching (i.e. the union of exactly ${n/2}$ ${2}$-cycles, a possibility that can of course only happen when ${n}$ is even). As such, the mean ${\mathop{\bf E} \det M_n}$ still vanishes when ${n}$ is odd, but for even ${n}$ it is equal to

$\displaystyle (-1)^{n/2} \frac{n!}{(n/2)!2^{n/2}}$

(the fraction here simply being the number of perfect matchings on ${n}$ vertices). Using Stirling’s formula, one then computes that ${|\mathop{\bf E} \det A_n|}$ is comparable to ${n^{-1/4} \sqrt{n!}}$ when ${n}$ is large and even. The second moment calculation is more complicated (and uses facts about the distribution of cycles in random permutations, mentioned in this previous post), but one can compute that ${\mathop{\bf E} |\det A_n|^2}$ is comparable to ${n^{1/2} n!}$ for GUE and ${n^{3/2} n!}$ for GOE. (The discrepancy here comes from the fact that in the GOE case, ${I_\sigma}$ and ${I_\rho}$ can correlate when ${\rho}$ contains reversals of ${k}$-cycles of ${\sigma}$ for ${k \geq 3}$, but this does not happen in the GUE case.) For GUE, much more precise asymptotics for the moments of the determinant are known, starting from the work of Brezin and Hikami, though we do not need these more sophisticated computations here.

Our main results are then as follows.

Theorem 1 Let ${M_n}$ be a Wigner matrix.

• If ${M_n}$ is drawn from GUE, then

$\displaystyle \frac{\log |\det M_n| - \frac{1}{2} \log n! + \frac{1}{4} \log n}{\sqrt{\frac{1}{2}\log n}} \rightarrow N(0,1)_{\bf R}.$

• If ${M_n}$ is drawn from GOE, then

$\displaystyle \frac{\log |\det M_n| - \frac{1}{2} \log n! + \frac{1}{4} \log n}{\sqrt{\log n}} \rightarrow N(0,1)_{\bf R}.$

• The previous two results also hold for more general Wigner matrices, assuming that the real and imaginary parts are independent, a finite moment condition is satisfied, and the entries match moments with those of GOE or GUE to fourth order. (See the paper for a more precise formulation of the result.)

Thus, we informally have

$\displaystyle |\det M_n| \approx n^{-1/4} \sqrt{n!} \exp( N( 0, \log n / 2 )_{\bf R} )$

when ${M_n}$ is drawn from GUE, or from another Wigner ensemble matching GUE to fourth order (and obeying some additional minor technical hypotheses); and

$\displaystyle |\det M_n| \approx n^{-1/4} \sqrt{n!} \exp( N( 0, \log n )_{\bf R} )$

when ${M_n}$ is drawn from GOE, or from another Wigner ensemble matching GOE to fourth order. Again, these asymptotic limiting distributions are consistent with the asymptotic behaviour for the second moments.

The extension from the GUE or GOE case to more general Wigner ensembles is a fairly routine application of the four moment theorem for Wigner matrices, although for various technical reasons we do not quite use the existing four moment theorems in the literature, but adapt them to the log determinant. The main idea is to express the log-determinant as an integral

$\displaystyle \log|\det M_n| = \frac{1}{2} n \log n - n \hbox{Im} \int_0^\infty s(\sqrt{-1}\eta)\ d\eta \ \ \ \ \ (7)$

of the Stieltjes transform

$\displaystyle s(z) := \frac{1}{n} \hbox{tr}( \frac{1}{\sqrt{n}} M_n - z )^{-1}$

of ${M_n}$. Strictly speaking, the integral in (7) is divergent at infinity (and also can be ill-behaved near zero), but this can be addressed by standard truncation and renormalisation arguments (combined with known facts about the least singular value of Wigner matrices), which we omit here. We then use a variant of the four moment theorem for the Stieltjes transform, as used by Erdos, Yau, and Yin (based on a previous four moment theorem for individual eigenvalues introduced by Van Vu and myself). The four moment theorem is proven by the now-standard Lindeberg exchange method, combined with the usual resolvent identities to control the behaviour of the resolvent (and hence the Stieltjes transform) with respect to modifying one or two entries, together with the delocalisation of eigenvector property (which in turn arises from local semicircle laws) to control the error terms.

Somewhat surprisingly (to us, at least), it turned out that it was the first part of the theorem (namely, the verification of the limiting law for the invariant ensembles GUE and GOE) that was more difficult than the extension to the Wigner case. Even in an ensemble as highly symmetric as GUE, the rows are no longer independent, and the formula (2) is basically useless for getting any non-trivial control on the log determinant. There is an explicit formula for the joint distribution of the eigenvalues of GUE (or GOE), which does eventually give the distribution of the cumulants of the log determinant, which then gives the required central limit theorem; but this is a lengthy computation, first performed by Delannay and Le Caer.

Following a suggestion of my colleague, Rowan Killip, we give an alternate proof of this central limit theorem in the GUE and GOE cases, by using a beautiful observation of Trotter, namely that the GUE or GOE ensemble can be conjugated into a tractable tridiagonal form. Let me state it just for GUE:

Proposition 2 (Tridiagonal form of GUE) Let ${M'_n}$ be the random tridiagonal real symmetric matrix

$\displaystyle M'_n = \begin{pmatrix} a_1 & b_1 & 0 & \ldots & 0 & 0 \\ b_1 & a_2 & b_2 & \ldots & 0 & 0 \\ 0 & b_2 & a_3 & \ldots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \ldots & a_{n-1} & b_{n-1} \\ 0 & 0 & 0 & \ldots & b_{n-1} & a_n \end{pmatrix}$

where the ${a_1,\ldots,a_n, b_1,\ldots,b_{n-1}}$ are jointly independent real random variables, with ${a_1,\ldots,a_n \equiv N(0,1)_{\bf R}}$ being standard real Gaussians, and each ${b_i}$ having a ${\chi}$-distribution:

$\displaystyle b_i = (\sum_{j=1}^i |z_{i,j}|^2)^{1/2}$

where ${z_{i,j} \equiv N(0,1)_{\bf C}}$ are iid complex gaussians. Let ${M_n}$ be drawn from GUE. Then the joint eigenvalue distribution of ${M_n}$ is identical to the joint eigenvalue distribution of ${M'_n}$.

Proof: Let ${M_n}$ be drawn from GUE. We can write

$\displaystyle M_n = \begin{pmatrix} M_{n-1} & X_n \\ X_n^* & a_n \end{pmatrix}$

where ${M_{n-1}}$ is drawn from the ${n-1\times n-1}$ GUE, ${a_n \equiv N(0,1)_{\bf R}}$, and ${X_n \in {\bf C}^{n-1}}$ is a random gaussian vector with all entries iid with distribution ${N(0,1)_{\bf C}}$. Furthermore, ${M_{n-1}, X_n, a_n}$ are jointly independent.

We now apply the tridiagonal matrix algorithm. Let ${b_{n-1} := |X_n|}$, then ${b_n}$ has the ${\chi}$-distribution indicated in the proposition. We then conjugate ${M_n}$ by a unitary matrix ${U}$ that preserves the final basis vector ${e_n}$, and maps ${X}$ to ${b_{n-1} e_{n-1}}$. Then we have

$\displaystyle U M_n U^* = \begin{pmatrix} \tilde M_{n-1} & b_{n-1} e_{n-1} \\ b_{n-1} e_{n-1}^* & a_n \end{pmatrix}$

where ${\tilde M_{n-1}}$ is conjugate to ${M_{n-1}}$. Now we make the crucial observation: because ${M_{n-1}}$ is distributed according to GUE (which is a unitarily invariant ensemble), and ${U}$ is a unitary matrix independent of ${M_{n-1}}$, ${\tilde M_{n-1}}$ is also distributed according to GUE, and remains independent of both ${b_{n-1}}$ and ${a_n}$.

We continue this process, expanding ${U M_n U^*}$ as

$\displaystyle \begin{pmatrix} M_{n-2} & X_{n-1} & 0 \\ X_{n-1}^* & a_{n-1} & b_{n-1} \\ 0 & b_{n-1} & a_n. \end{pmatrix}$

Applying a further unitary conjugation that fixes ${e_{n-1}, e_n}$ but maps ${X_{n-1}}$ to ${b_{n-2} e_{n-2}}$, we may replace ${X_{n-1}}$ by ${b_{n-2} e_{n-2}}$ while transforming ${M_{n-2}}$ to another GUE matrix ${\tilde M_{n-2}}$ independent of ${a_n, b_{n-1}, a_{n-1}, b_{n-2}}$. Iterating this process, we eventually obtain a coupling of ${M_n}$ to ${M'_n}$ by unitary conjugations, and the claim follows. $\Box$

The determinant of a tridiagonal matrix is not quite as simple as the determinant of a triangular matrix (in which it is simply the product of the diagonal entries), but it is pretty close: the determinant ${D_n}$ of the above matrix is given by solving the recursion

$\displaystyle D_i = a_i D_{i-1} + b_{i-1}^2 D_{i-2}$

with ${D_0=1}$ and ${D_{-1} = 0}$. Thus, instead of the product of a sequence of independent scalar ${\chi}$ distributions as in the gaussian matrix case, the determinant of GUE ends up being controlled by the product of a sequence of independent ${2\times 2}$ matrices whose entries are given by gaussians and ${\chi}$ distributions. In this case, one cannot immediately take logarithms and hope to get something for which the martingale central limit theorem can be applied, but some ad hoc manipulation of these ${2 \times 2}$ matrix products eventually does make this strategy work. (Roughly speaking, one has to work with the logarithm of the Frobenius norm of the matrix first.)

In the Winter quarter (starting on January 9), I will be teaching a graduate course on expansion in groups of Lie type.  This course will focus on constructions of expanding Cayley graphs on finite groups of Lie type (such as the special linear groups $SL_d({\bf F}_q)$, or their simple quotients $PSL_d({\bf F}_q)$, but also including more exotic “twisted” groups of Lie type, such as the Steinberg or Suzuki-Ree groups), including the “classical” constructions of Margulis and of Selberg, but also the more recent constructions of Bourgain-Gamburd and later authors (including some very recent work of Ben Green, Emmanuel Breuillard, Rob Guralnick, and myself which is nearing completion and which I plan to post about shortly).  As usual, I plan to start posting lecture notes on this blog before the course begins.

Let ${n}$ be a natural number, and let ${\sigma: \{1,\ldots,n\} \rightarrow \{1,\ldots,n\}}$ be a permutation of ${\{1,\ldots,n\}}$, drawn uniformly at random. Using the cycle decomposition, one can view ${\sigma}$ as the disjoint union of cycles of varying lengths (from ${1}$ to ${n}$). For each ${1 \leq k \leq n}$, let ${C_k}$ denote the number of cycles of ${\sigma}$ of length ${k}$; thus the ${C_k}$ are natural number-valued random variables with the constraint

$\displaystyle \sum_{k=1}^n k C_k = n. \ \ \ \ \ (1)$

We let ${C := \sum_{k=1}^n C_k}$ be the number of cycles (of arbitrary length); this is another natural number-valued random variable, of size at most ${n}$.

I recently had need to understand the distribution of the random variables ${C_k}$ and ${C}$. As it turns out this is an extremely classical subject, but as an exercise I worked out what I needed using a quite tedious computation involving generating functions that I will not reproduce here. But the resulting identities I got were so nice, that they strongly suggested the existence of elementary bijective (or “double counting”) proofs, in which the identities are proven with a minimum of computation, by interpreting each side of the identity as the cardinality (or probability) of the same quantity (or event), viewed in two different ways. I then found these bijective proofs, which I found to be rather cute; again, these are all extremely classical (closely related, for instance, to Stirling numbers of the first kind), but I thought some readers might be interested in trying to find these proofs themselves as an exercise (and I also wanted a place to write the identities down so I could retrieve them later), so I have listed the identities I found below.

1. For any ${1 \leq k \leq n}$, one has ${{\bf E} C_k = \frac{1}{k}}$. In particular, ${{\bf E} C = 1 + \frac{1}{2} + \ldots + \frac{1}{n} = \log n + O(1)}$.
2. More generally, for any ${1 \leq k \leq n}$ and ${j \geq 1}$ with ${jk \leq n}$, one has ${{\bf E} \binom{C_k}{j} = \frac{1}{k^j j!}}$.
3. More generally still, for any ${1 \leq k_1 < \ldots < k_r \leq n}$ and ${j_1,\ldots,j_r \geq 1}$ with ${\sum_{i=1}^r j_i k_i \leq n}$, one has

$\displaystyle {\bf E} \prod_{i=1}^r \binom{C_{k_i}}{j_i} = \prod_{i=1}^r \frac{1}{k_i^{j_i} j_i!}.$

4. In particular, we have Cauchy’s formula: if ${\sum_{k=1}^n j_k k = n}$, then the probability that ${C_k = j_k}$ for all ${k=1,\ldots,n}$ is precisely ${\prod_{k=1}^n \frac{1}{k^{j_k} j_k!}}$. (This in particular leads to a reasonably tractable formula for the joint generating function of the ${C_k}$, which is what I initially used to compute everything that I needed, before finding the slicker bijective proofs.)
5. For fixed ${k}$, ${C_k}$ converges in distribution as ${n \rightarrow \infty}$ to the Poisson distribution of intensity ${\frac{1}{k}}$.
6. More generally, for fixed ${1 \leq k_1 < \ldots < k_r}$, ${C_{k_1},\ldots,C_{k_r}}$ converge in joint distribution to ${r}$ independent Poisson distributions of intensity ${\frac{1}{k_1},\ldots,\frac{1}{k_r}}$ respectively. (A more precise version of this claim can be found in this paper of Arratia and Tavaré.)
7. One has ${{\bf E} 2^C = n+1}$.
8. More generally, one has ${{\bf E} m^C = \binom{n+m-1}{n}}$ for all natural numbers ${m}$.

One of the basic problems in analytic number theory is to estimate sums of the form

$\displaystyle \sum_{p

as ${x \rightarrow \infty}$, where ${p}$ ranges over primes and ${f}$ is some explicit function of interest (e.g. a linear phase function ${f(p) = e^{2\pi i \alpha p}}$ for some real number ${\alpha}$). This is essentially the same task as obtaining estimates on the sum

$\displaystyle \sum_{n

where ${\Lambda}$ is the von Mangoldt function. If ${f}$ is bounded, ${f(n)=O(1)}$, then from the prime number theorem one has the trivial bound

$\displaystyle \sum_{n

but often (when ${f}$ is somehow “oscillatory” in nature) one is seeking the refinement

$\displaystyle \sum_{n

or equivalently

$\displaystyle \sum_{p

Thanks to identities such as

$\displaystyle \Lambda(n) = \sum_{d|n} \mu(d) \log(\frac{n}{d}), \ \ \ \ \ (3)$

where ${\mu}$ is the Möbius function, refinements such as (1) are similar in spirit to estimates of the form

$\displaystyle \sum_{n

Unfortunately, the connection between (1) and (4) is not particularly tight; roughly speaking, one needs to improve the bounds in (4) (and variants thereof) by about two factors of ${\log x}$ before one can use identities such as (3) to recover (1). Still, one generally thinks of (1) and (4) as being “morally” equivalent, even if they are not formally equivalent.

When ${f}$ is oscillating in a sufficiently “irrational” way, then one standard way to proceed is the method of Type I and Type II sums, which uses truncated versions of divisor identities such as (3) to expand out either (1) or (4) into linear (Type I) or bilinear sums (Type II) with which one can exploit the oscillation of ${f}$. For instance, Vaughan’s identity lets one rewrite the sum in (1) as the sum of the Type I sum

$\displaystyle \sum_{d \leq U} \mu(d) (\sum_{V/d \leq r \leq x/d} (\log r) f(rd)),$

the Type I sum

$\displaystyle -\sum_{d \leq UV} a(d) \sum_{V/d \leq r \leq x/d} f(rd),$

the Type II sum

$\displaystyle -\sum_{V \leq d \leq x/U} \sum_{U < m \leq x/V} \Lambda(d) b(m) f(dm),$

and the error term ${\sum_{d \leq V} \Lambda(n) f(n)}$, whenever ${1 \leq U, V \leq x}$ are parameters, and ${a, b}$ are the sequences

$\displaystyle a(d) := \sum_{e \leq U, f \leq V: ef = d} \Lambda(d) \mu(e)$

and

$\displaystyle b(m) := \sum_{d|m: d \leq U} \mu(d).$

Similarly one can express (4) as the Type I sum

$\displaystyle -\sum_{d \leq UV} c(d) \sum_{UV/d \leq r \leq x/d} f(rd),$

the Type II sum

$\displaystyle - \sum_{V < d \leq x/U} \sum_{U < m \leq x/d} \mu(m) b(d) f(dm)$

and the error term ${\sum_{d \leq UV} \mu(n) f(N)}$, whenever ${1 \leq U,V \leq x}$ with ${UV \leq x}$, and ${c}$ is the sequence

$\displaystyle c(d) := \sum_{e \leq U, f \leq V: ef = d} \mu(d) \mu(e).$

After eliminating troublesome sequences such as ${a(), b(), c()}$ via Cauchy-Schwarz or the triangle inequality, one is then faced with the task of estimating Type I sums such as

$\displaystyle \sum_{r \leq y} f(rd)$

or Type II sums such as

$\displaystyle \sum_{r \leq y} f(rd) \overline{f(rd')}$

for various ${y, d, d' \geq 1}$. Here, the trivial bound is ${O(y)}$, but due to a number of logarithmic inefficiencies in the above method, one has to obtain bounds that are more like ${O( \frac{y}{\log^C y})}$ for some constant ${C}$ (e.g. ${C=5}$) in order to end up with an asymptotic such as (1) or (4).

However, in a recent paper of Bourgain, Sarnak, and Ziegler, it was observed that as long as one is only seeking the Mobius orthogonality (4) rather than the von Mangoldt orthogonality (1), one can avoid losing any logarithmic factors, and rely purely on qualitative equidistribution properties of ${f}$. A special case of their orthogonality criterion (which actually dates back to an earlier paper of Katai, as was pointed out to me by Nikos Frantzikinakis) is as follows:

Proposition 1 (Orthogonality criterion) Let ${f: {\bf N} \rightarrow {\bf C}}$ be a bounded function such that

$\displaystyle \sum_{n \leq x} f(pn) \overline{f(qn)} = o(x) \ \ \ \ \ (5)$

for any distinct primes ${p, q}$ (where the decay rate of the error term ${o(x)}$ may depend on ${p}$ and ${q}$). Then

$\displaystyle \sum_{n \leq x} \mu(n) f(n) =o(x). \ \ \ \ \ (6)$

Actually, the Bourgain-Sarnak-Ziegler paper establishes a more quantitative version of this proposition, in which ${\mu}$ can be replaced by an arbitrary bounded multiplicative function, but we will content ourselves with the above weaker special case. (See also these notes of Harper, which uses the Katai argument to give a slightly weaker quantitative bound in the same spirit.) This criterion can be viewed as a multiplicative variant of the classical van der Corput lemma, which in our notation asserts that ${\sum_{n \leq x} f(n) = o(x)}$ if one has ${\sum_{n \leq x} f(n+h) \overline{f(n)} = o(x)}$ for each fixed non-zero ${h}$.

As a sample application, Proposition 1 easily gives a proof of the asymptotic

$\displaystyle \sum_{n \leq x} \mu(n) e^{2\pi i \alpha n} = o(x)$

for any irrational ${\alpha}$. (For rational ${\alpha}$, this is a little trickier, as it is basically equivalent to the prime number theorem in arithmetic progressions.) The paper of Bourgain, Sarnak, and Ziegler also apply this criterion to nilsequences (obtaining a quick proof of a qualitative version of a result of Ben Green and myself, see these notes of Ziegler for details) and to horocycle flows (for which no Möbius orthogonality result was previously known).

Informally, the connection between (5) and (6) comes from the multiplicative nature of the Möbius function. If (6) failed, then ${\mu(n)}$ exhibits strong correlation with ${f(n)}$; by change of variables, we then expect ${\mu(pn)}$ to correlate with ${f(pn)}$ and ${\mu(pm)}$ to correlate with ${f(qn)}$, for “typical” ${p,q}$ at least. On the other hand, since ${\mu}$ is multiplicative, ${\mu(pn)}$ exhibits strong correlation with ${\mu(qn)}$. Putting all this together (and pretending correlation is transitive), this would give the claim (in the contrapositive). Of course, correlation is not quite transitive, but it turns out that one can use the Cauchy-Schwarz inequality as a substitute for transitivity of correlation in this case.

I will give a proof of Proposition 1 below the fold (which is not quite based on the argument in the above mentioned paper, but on a variant of that argument communicated to me by Tamar Ziegler, and also independently discovered by Adam Harper). The main idea is to exploit the following observation: if ${P}$ is a “large” but finite set of primes (in the sense that the sum ${A := \sum_{p \in P} \frac{1}{p}}$ is large), then for a typical large number ${n}$ (much larger than the elements of ${P}$), the number of primes in ${P}$ that divide ${n}$ is pretty close to ${A = \sum_{p \in P} \frac{1}{p}}$:

$\displaystyle \sum_{p \in P: p|n} 1 \approx A. \ \ \ \ \ (7)$

A more precise formalisation of this heuristic is provided by the Turan-Kubilius inequality, which is proven by a simple application of the second moment method.

In particular, one can sum (7) against ${\mu(n) f(n)}$ and obtain an approximation

$\displaystyle \sum_{n \leq x} \mu(n) f(n) \approx \frac{1}{A} \sum_{p \in P} \sum_{n \leq x: p|n} \mu(n) f(n)$

that approximates a sum of ${\mu(n) f(n)}$ by a bunch of sparser sums of ${\mu(n) f(n)}$. Since

$\displaystyle x = \frac{1}{A} \sum_{p \in P} \frac{x}{p},$

we see (heuristically, at least) that in order to establish (4), it would suffice to establish the sparser estimates

$\displaystyle \sum_{n \leq x: p|n} \mu(n) f(n) = o(\frac{x}{p})$

for all ${p \in P}$ (or at least for “most” ${p \in P}$).

Now we make the change of variables ${n = pm}$. As the Möbius function is multiplicative, we usually have ${\mu(n) = \mu(p) \mu(m) = - \mu(m)}$. (There is an exception when ${n}$ is divisible by ${p^2}$, but this will be a rare event and we will be able to ignore it.) So it should suffice to show that

$\displaystyle \sum_{m \leq x/p} \mu(m) f(pm) = o( x/p )$

for most ${p \in P}$. However, by the hypothesis (5), the sequences ${m \mapsto f(pm)}$ are asymptotically orthogonal as ${p}$ varies, and this claim will then follow from a Cauchy-Schwarz argument.

Let ${G = (G,+)}$ be a finite additive group. A tiling pair is a pair of non-empty subsets ${A, B}$ such that every element of ${G}$ can be written in exactly one way as a sum of an element ${a}$ of ${A}$ and an element ${b}$ of ${B}$, in which case we write ${G = A \oplus B}$. The sets ${A, B}$ are then called tiles, with ${B}$ being a complementary tile to ${A}$ and vice versa. For instance, every subgroup ${H}$ of ${G}$ is a tile, as one can pick one representative from each coset of ${H}$ to form the complementary tile. Conversely, any set formed by taking one representative from each coset of ${H}$ is also a tile.

Tiles can be quite complicated, particularly when the group ${G}$ is “high-dimensional”. We will therefore restrict to the simple case of a cyclic group ${G = {\bf Z}/N{\bf Z}}$, and restrict even further to the special case when the modulus ${N}$ is square-free. Here, the situation should be much simpler. In particular, we have the following conjecture of Coven and Meyerowitz, which asserts that the previous construction of a tile is, in fact, the only such construction:

Conjecture 1 (Coven-Meyerowitz conjecture, square-free case) Let ${N}$ be square-free, and let ${A}$ be a tile of ${{\bf Z}/N{\bf Z}}$. Then there exists a subgroup ${H}$ of ${{\bf Z}/N{\bf Z}}$ such that ${A}$ consists of a single representative from each coset of ${H}$.

Note that in the square-free case, every subgroup ${H}$ of ${{\bf Z}/N{\bf Z}}$ has a complementary subgroup ${H^\perp}$ (thus ${{\bf Z}/N{\bf Z} = H \oplus H^\perp}$). In particular, ${H}$ consists of a single representative from each coset of ${H^\perp}$, and so the examples of subgroups of ${{\bf Z}/N{\bf Z}}$ are covered by the above conjecture in the square-free case.

In the non-square free case, the above assertion is not true; for instance, if ${p}$ is a prime, then the multiples of ${p}$ in ${{\bf Z}/p^2{\bf Z}}$ are a tile, but cannot be formed from taking a single representative from all the cosets of a given subgroup. There is a more general conjecture of Coven and Meyerowitz to handle this more general case, although it is more difficult to state:

Conjecture 2 (Coven-Meyerowitz conjecture, general case) Let ${N}$ be a natural number, and let ${A}$ be a tile of ${{\bf Z}/N{\bf Z}}$. Then there exists a set ${S_A}$ of prime powers with ${|A| = \prod_{p^j \in S_A} p}$ such that the Fourier transform

$\displaystyle \hat 1_A(k) := \sum_{n \in A} e^{2\pi i kn / N}$

vanishes whenever ${k}$ is a non-zero element of ${{\bf Z}/N{\bf Z}}$ whose order is the product of elements of ${S_A}$ that are powers of distinct primes. Equivalently, the generating polynomial ${\sum_{n \in A} x^n}$ is divisible by the cyclotomic polynomials ${\phi_m}$ whenever ${m}$ is the product of elements of ${S_A}$ that are powers of distinct primes.

It can be shown (with a modest amount of effort) that Conjecture 2 implies Conjecture 1, but we will not do so here, focusing instead exclusively on the square-free case for simplicity.

It was observed by Laba that Conjecture 2 is connected to the following well-known conjecture of Fuglede:

Conjecture 3 (One-dimensional Fuglede conjecture, tiling to spectral direction) Let ${E}$ be a compact subset of ${{\bf R}}$ of positive measure which is a tile (thus ${{\bf R} = E \oplus \Lambda}$ for some set ${\Lambda \subset {\bf R}}$). Then ${L^2(E)}$ (with Lebesgue measure) has a spectrum, that is to say an orthogonal set of plane waves ${x \mapsto e^{2\pi i \xi x}}$.

Indeed, it was shown by Laba that Conjecture 2 implies Conjecture 3 in the case when ${E}$ is the finite union of unit intervals. Actually, thanks to the more recent work of Farkas, Matolcsi, and Mora we know that Conjecture 2 in fact implies the universal spectrum conjecture of Lagarias and Wang, which in turn was known to imply Conjecture 3 in full generality. (On the other hand, the conjecture fails in four and higher dimensions; see the papers of Kolountzakis-Matolcsi and of Farkas-Revesz.)

Given the simple statement of Conjecture 1, it is perhaps somewhat surprising that it remains open, even in simple cases such as when ${N}$ is the product of just four primes. One reason for this is that some plausible strengthenings of this conjecture (such as the Tijdeman-Sands conjecture) are known to be false, as we will review below. On the other hand, as we shall see, tiling sets have a lot of combinatorial structure, and in principle one should be able to work out a lot of special cases of the conjecture. Given the combinatorial nature of this problem, it may well be quite suitable for a polymath project in fact, if there is sufficient interest.

In the last set of notes, we obtained the following structural theorem concerning approximate groups:

Theorem 1 Let ${A}$ be a finite ${K}$-approximate group. Then there exists a coset nilprogression ${P}$ of rank and step ${O_K(1)}$ contained in ${A^4}$, such that ${A}$ is covered by ${O_K(1)}$ left-translates of ${P}$ (and hence also by ${O_K(1)}$ right-translates of ${P}$).

Remark 1 Under some mild additional hypotheses (e.g. if the dimensions of ${P}$ are sufficiently large, or if ${P}$ is placed in a certain “normal form”, details of which may be found in this paper), a coset nilprogression ${P}$ of rank and step ${O_K(1)}$ will be an ${O_K(1)}$-approximate group, thus giving a partial converse to Theorem 1. (It is not quite a full converse though, even if one works qualitatively and forgets how the constants depend on ${K}$: if ${A}$ is covered by a bounded number of left- and right-translates ${gP, Pg}$ of ${P}$, one needs the group elements ${g}$ to “approximately normalise” ${P}$ in some sense if one wants to then conclude that ${A}$ is an approximate group.) The mild hypotheses alluded to above can be enforced in the statement of the theorem, but we will not discuss this technicality here, and refer the reader to the above-mentioned paper for details.

By placing the coset nilprogression in a virtually nilpotent group, we have the following corollary in the global case:

Corollary 2 Let ${A}$ be a finite ${K}$-approximate group in an ambient group ${G}$. Then ${A}$ is covered by ${O_K(1)}$ left cosets of a virtually nilpotent subgroup ${G'}$ of ${G}$.

In this final set of notes, we give some applications of the above results. The first application is to replace “${K}$-approximate group” by “sets of bounded doubling”:

Proposition 3 Let ${A}$ be a finite non-empty subset of a (global) group ${G}$ such that ${|A^2| \leq K |A|}$. Then there exists a coset nilprogression ${P}$ of rank and step ${O_K(1)}$ and cardinality ${|P| \gg_K |A|}$ such that ${A}$ can be covered by ${O_K(1)}$ left-translates of ${P}$, and also by ${O_K(1)}$ right-translates of ${P}$.

We will also establish (a strengthening of) a well-known theorem of Gromov on groups of polynomial growth, as promised back in Notes 0, as well as a variant result (of a type known as a “generalised Margulis lemma”) controlling the almost stabilisers of discrete actions of isometries.

The material here is largely drawn from my recent paper with Emmanuel Breuillard and Ben Green.

[Once again, some advertising on behalf of my department, following on a similar announcement in the previous two years.]

Two years ago, the UCLA mathematics department launched a scholarship opportunity for entering freshman students with exceptional background and promise in mathematics. We have offered one scholarship every year, but this year due to an additional source of funding, we will also be able to offer an additional scholarship for California residents.The UCLA Math Undergraduate Merit Scholarship provides for full tuition, and a room and board allowance for 4 years. In addition, scholarship recipients follow an individualized accelerated program of study, as determined after consultation with UCLA faculty.   The program of study leads to a Masters degree in Mathematics in four years.
More information and an application form for the scholarship can be found on the web at:
and
To be considered for Fall 2012, candidates must apply for the scholarship and also for admission to UCLA on or before November 30, 2011.

A common theme in mathematical analysis (particularly in analysis of a “geometric” or “statistical” flavour) is the interplay between “macroscopic” and “microscopic” scales. These terms are somewhat vague and imprecise, and their interpretation depends on the context and also on one’s choice of normalisations, but if one uses a “macroscopic” normalisation, “macroscopic” scales correspond to scales that are comparable to unit size (i.e. bounded above and below by absolute constants), while “microscopic” scales are much smaller, being the minimal scale at which nontrivial behaviour occurs. (Other normalisations are possible, e.g. making the microscopic scale a unit scale, and letting the macroscopic scale go off to infinity; for instance, such a normalisation is often used, at least initially, in the study of groups of polynomial growth. However, for the theory of approximate groups, a macroscopic scale normalisation is more convenient.)

One can also consider “mesoscopic” scales which are intermediate between microscopic and macroscopic scales, or large-scale behaviour at scales that go off to infinity (and in particular are larger than the macroscopic range of scales), although the behaviour of these scales will not be the main focus of this post. Finally, one can divide the macroscopic scales into “local” macroscopic scales (less than ${\epsilon}$ for some small but fixed ${\epsilon>0}$) and “global” macroscopic scales (scales that are allowed to be larger than a given large absolute constant ${C}$). For instance, given a finite approximate group ${A}$:

• Sets such as ${A^m}$ for some fixed ${m}$ (e.g. ${A^{10}}$) can be considered to be sets at a global macroscopic scale. Sending ${m}$ to infinity, one enters the large-scale regime.
• Sets such as the sets ${S}$ that appear in the Sanders lemma from the previous set of notes (thus ${S^m \subset A^4}$ for some fixed ${m}$, e.g. ${m=100}$) can be considered to be sets at a local macroscopic scale. Sending ${m}$ to infinity, one enters the mesoscopic regime.
• The non-identity element ${u}$ of ${A}$ that is “closest” to the identity in some suitable metric (cf. the proof of Jordan’s theorem from Notes 0) would be an element associated to the microscopic scale. The orbit ${u, u^2, u^3, \ldots}$ starts out at microscopic scales, and (assuming some suitable “escape” axioms) will pass through mesoscopic scales and finally entering the macroscopic regime. (Beyond this point, the orbit may exhibit a variety of behaviours, such as periodically returning back to the smaller scales, diverging off to ever larger scales, or filling out a dense subset of some macroscopic set; the escape axioms we will use do not exclude any of these possibilities.)

For comparison, in the theory of locally compact groups, properties about small neighbourhoods of the identity (e.g. local compactness, or the NSS property) would be properties at the local macroscopic scale, whereas the space ${L(G)}$ of one-parameter subgroups can be interpreted as an object at the microscopic scale. The exponential map then provides a bridge connecting the microscopic and macroscopic scales.

We return now to approximate groups. The macroscopic structure of these objects is well described by the Hrushovski Lie model theorem from the previous set of notes, which informally asserts that the macroscopic structure of an (ultra) approximate group can be modeled by a Lie group. This is already an important piece of information about general approximate groups, but it does not directly reveal the full structure of such approximate groups, because these Lie models are unable to see the microscopic behaviour of these approximate groups.

To illustrate this, let us review one of the examples of a Lie model of an ultra approximate group, namely Exercise 28 from Notes 7. In this example one studied a “nilbox” from a Heisenberg group, which we rewrite here in slightly different notation. Specifically, let ${G}$ be the Heisenberg group

$\displaystyle G := \{ (a,b,c): a,b,c \in {\bf Z} \}$

with group law

$\displaystyle (a,b,c) \ast (a',b',c') := (a+a', b+b', c+c'+ab') \ \ \ \ \ (1)$

and let ${A = \prod_{n \rightarrow \alpha} A_n}$, where ${A_n \subset G}$ is the box

$\displaystyle A_n := \{ (a,b,c) \in G: |a|, |b| \leq n; |c| \leq n^{10} \};$

thus ${A}$ is the nonstandard box

$\displaystyle A := \{ (a,b,c) \in {}^* G: |a|, |b| \leq N; |c| \leq N^{10} \}$

where ${N := \lim_{n \rightarrow \alpha} n}$. As the above exercise establishes, ${A \cup A^{-1}}$ is an ultra approximate group with a Lie model ${\pi: \langle A \rangle \rightarrow {\bf R}^3}$ given by the formula

$\displaystyle \pi( a, b, c ) := ( \hbox{st} \frac{a}{N}, \hbox{st} \frac{b}{N}, \hbox{st} \frac{c}{N^{10}} )$

for ${a,b = O(N)}$ and ${c = O(N^{10})}$. Note how the nonabelian nature of ${G}$ (arising from the ${ab'}$ term in the group law (1)) has been lost in the model ${{\bf R}^3}$, because the effect of that nonabelian term on ${\frac{c}{N^{10}}}$ is only ${O(\frac{N^2}{N^8})}$ which is infinitesimal and thus does not contribute to the standard part. In particular, if we replace ${G}$ with the abelian group ${G' := \{(a,b,c): a,b,c \in {\bf Z} \}}$ with the additive group law

$\displaystyle (a,b,c) \ast' (a',b',c') := (a+a',b+b',c+c')$

and let ${A'}$ and ${\pi'}$ be defined exactly as with ${A}$ and ${\pi}$, but placed inside the group structure of ${G'}$ rather than ${G}$, then ${A \cup A^{-1}}$ and ${A' \cup (A')^{-1}}$ are essentially “indistinguishable” as far as their models by ${{\bf R}^3}$ are concerned, even though the latter approximate group is abelian and the former is not. The problem is that the nonabelian-ness in the former example is so microscopic that it falls entirely inside the kernel of ${\pi}$ and is thus not detected at all by the model.

The problem of not being able to “see” the microscopic structure of a group (or approximate group) also was a key difficulty in the theory surrounding Hilbert’s fifth problem that was discussed in previous notes. A key tool in being able to resolve such structure was to build left-invariant metrics ${d}$ (or equivalently, norms ${\| \|}$) on one’s group, which obeyed useful “Gleason axioms” such as the commutator axiom

$\displaystyle \| [g,h] \| \ll \|g\| \|h\| \ \ \ \ \ (2)$

for sufficiently small ${g,h}$, or the escape axiom

$\displaystyle \| g^n \| \gg |n| \|g\| \ \ \ \ \ (3)$

when ${|n| \|g\|}$ was sufficiently small. Such axioms have important and non-trivial content even in the microscopic regime where ${g}$ or ${h}$ are extremely close to the identity. For instance, in the proof of Jordan’s theorem from Notes 0, which showed that any finite unitary group ${G}$ was boundedly virtually abelian, a key step was to apply the commutator axiom (2) (for the distance to the identity in operator norm) to the most “microscopic” element of ${G}$, or more precisely a non-identity element of ${G}$ of minimal norm. The key point was that this microscopic element was virtually central in ${G}$, and as such it restricted much of ${G}$ to a lower-dimensional subgroup of the unitary group, at which point one could argue using an induction-on-dimension argument. As we shall see, a similar argument can be used to place “virtually nilpotent” structure on finite approximate groups. For instance, in the Heisenberg-type approximate groups ${A \cup A^{-1}}$ and ${A' \cup (A')^{-1}}$ discussed earlier, the element ${(0,0,1)}$ will be “closest to the origin” in a suitable sense to be defined later, and is centralised by both approximate groups; quotienting out (the orbit of) that central element and iterating the process two more times, we shall see that one can express both ${A \cup A^{-1}}$ and ${A'\cup (A')^{-1}}$ as a tower of central cyclic extensions, which in particular establishes the nilpotency of both groups.

The escape axiom (3) is a particularly important axiom in connecting the microscopic structure of a group ${G}$ to its macroscopic structure; for instance, as shown in Notes 2, this axiom (in conjunction with the closely related commutator axiom) tends to imply dilation estimates such as ${d( g^n, h^n ) \sim n d(g,h)}$ that allow one to understand the microscopic geometry of points ${g,h}$ close to the identity in terms of the (local) macroscopic geometry of points ${g^n, h^n}$ that are significantly further away from the identity.

It is thus of interest to build some notion of a norm (or left-invariant metrics) on an approximate group ${A}$ that obeys the escape and commutator axioms (while being non-degenerate enough to adequately capture the geometry of ${A}$ in some sense), in a fashion analogous to the Gleason metrics that played such a key role in the theory of Hilbert’s fifth problem. It is tempting to use the Lie model theorem to do this, since Lie groups certainly come with Gleason metrics. However, if one does this, one ends up, roughly speaking, with a norm on ${A}$ that only obeys the escape and commutator estimates macroscopically; roughly speaking, this means that one has a macroscopic commutator inequality

$\displaystyle \| [g,h] \| \ll \|g\| \|h\| + o(1)$

and a macroscopic escape property

$\displaystyle \| g^n \| \gg |n| \|g\| - o(|n|)$

but such axioms are too weak for analysis at the microscopic scale, and in particular in establishing centrality of the element closest to the identity.

Another way to proceed is to build a norm that is specifically designed to obey the crucial escape property. Given an approximate group ${A}$ in a group ${G}$, and an element ${g}$ of ${G}$, we can define the escape norm ${\|g\|_{e,A}}$ of ${g}$ by the formula

$\displaystyle \| g \|_{e,A} := \inf \{ \frac{1}{n+1}: n \in {\bf N}: g, g^2, \ldots, g^n \in A \}.$

Thus, ${\|g\|_{e,A}}$ equals ${1}$ if ${g}$ lies outside of ${A}$, equals ${1/2}$ if ${g}$ lies in ${A}$ but ${g^2}$ lies outside of ${A}$, and so forth. Such norms had already appeared in Notes 4, in the context of analysing NSS groups.

As it turns out, this expression will obey an escape axiom, as long as we place some additional hypotheses on ${A}$ which we will present shortly. However, it need not actually be a norm; in particular, the triangle inequality

$\displaystyle \|gh\|_{e,A} \leq \|g\|_{e,A} + \|h\|_{e,A}$

is not necessarily true. Fortunately, it turns out that by a (slightly more complicated) version of the Gleason machinery from Notes 4 we can establish a usable substitute for this inequality, namely the quasi-triangle inequality

$\displaystyle \|g_1 \ldots g_k \|_{e,A} \leq C (\|g_1\|_{e,A} + \ldots + \|g_k\|_{e,A}),$

where ${C}$ is a constant independent of ${k}$. As we shall see, these estimates can then be used to obtain a commutator estimate (2).

However, to do all this, it is not enough for ${A}$ to be an approximate group; it must obey two additional “trapping” axioms that improve the properties of the escape norm. We formalise these axioms (somewhat arbitrarily) as follows:

Definition 1 (Strong approximate group) Let ${K \geq 1}$. A strong ${K}$-approximate group is a finite ${K}$-approximate group ${A}$ in a group ${G}$ with a symmetric subset ${S}$ obeying the following axioms:

An ultra strong ${K}$-approximate group is an ultraproduct ${A = \prod_{n \rightarrow \alpha} A_n}$ of strong ${K}$-approximate groups.

The first trapping condition can be rewritten as

$\displaystyle \|g\|_{e,A} \leq 1000 \|g\|_{e,A^{100}}$

and the second trapping condition can similarly be rewritten as

$\displaystyle \|g\|_{e,S} \leq 10^6 K^3 \|g\|_{e,A}.$

This makes the escape norms of ${A, A^{100}}$, and ${S}$ comparable to each other, which will be needed for a number of reasons (and in particular to close a certain bootstrap argument properly). Compare this with equation (12) from Notes 4, which used the NSS hypothesis to obtain similar conclusions. Thus, one can view the strong approximate group axioms as being a sort of proxy for the NSS property.

Example 1 Let ${N}$ be a large natural number. Then the interval ${A = [-N,N]}$ in the integers is a ${2}$-approximate group, which is also a strong ${2}$-approximate group (setting ${S = [10^{-6} N, 10^{-6} N]}$, for instance). On the other hand, if one places ${A}$ in ${{\bf Z}/5N{\bf Z}}$ rather than in the integers, then the first trapping condition is lost and one is no longer a strong ${2}$-approximate group. Also, if one remains in the integers, but deletes a few elements from ${A}$, e.g. deleting ${\pm \lfloor 10^{-10} N\rfloor}$ from ${A}$), then one is still a ${O(1)}$-approximate group, but is no longer a strong ${O(1)}$-approximate group, again because the first trapping condition is lost.

A key consequence of the Hrushovski Lie model theorem is that it allows one to replace approximate groups by strong approximate groups:

Exercise 1 (Finding strong approximate groups)

• (i) Let ${A}$ be an ultra approximate group with a good Lie model ${\pi: \langle A \rangle \rightarrow L}$, and let ${B}$ be a symmetric convex body (i.e. a convex open bounded subset) in the Lie algebra ${{\mathfrak l}}$. Show that if ${r>0}$ is a sufficiently small standard number, then there exists a strong ultra approximate group ${A'}$ with

$\displaystyle \pi^{-1}(\exp(rB)) \subset A' \subset \pi^{-1}(\exp(1.1 rB)) \subset A,$

and with ${A}$ can be covered by finitely many left translates of ${A'}$. Furthermore, ${\pi}$ is also a good model for ${A'}$.

• (ii) If ${A}$ is a finite ${K}$-approximate group, show that there is a strong ${O_K(1)}$-approximate group ${A'}$ inside ${A^4}$ with the property that ${A}$ can be covered by ${O_K(1)}$ left translates of ${A'}$. (Hint: use (i), Hrushovski’s Lie model theorem, and a compactness and contradiction argument.)

The need to compare the strong approximate group to an exponentiated small ball ${\exp(rB)}$ will be convenient later, as it allows one to easily use the geometry of ${L}$ to track various aspects of the strong approximate group.

As mentioned previously, strong approximate groups exhibit some of the features of NSS locally compact groups. In Notes 4, we saw that the escape norm for NSS locally compact groups was comparable to a Gleason metric. The following theorem is an analogue of that result:

Theorem 2 (Gleason lemma) Let ${A}$ be a strong ${K}$-approximate group in a group ${G}$.

• (Symmetry) For any ${g \in G}$, one has ${\|g^{-1}\|_{e,A} = \|g\|_{e,A}}$.
• (Conjugacy bound) For any ${g, h \in A^{10}}$, one has ${\|g^h\|_{e,A} \ll \|g\|_{e,A}}$.
• (Triangle inequality) For any ${g_1,\ldots,g_k \in G}$, one has ${\|g_1 \ldots g_k \|_{e,A} \ll_K (\|g_1\|_{e,A} + \ldots + \|g_k\|_{e,A})}$.
• (Escape property) One has ${\|g^n\|_{e,A} \gg |n| \|g\|_{e,A}}$ whenever ${|n| \|g\|_{e,A} < 1}$.
• (Commutator inequality) For any ${g,h \in A^{10}}$, one has ${\| [g,h] \|_{e,A} \ll_K \|g\|_{e,A} \|h\|_{e,A}}$.

The proof of this theorem will occupy a large part of the current set of notes. We then aim to use this theorem to classify strong approximate groups. The basic strategy (temporarily ignoring a key technical issue) follows the Bieberbach-Frobenius proof of Jordan’s theorem, as given in Notes 0, is as follows.

1. Start with an (ultra) strong approximate group ${A}$.
2. From the Gleason lemma, the elements with zero escape norm form a normal subgroup of ${A}$. Quotient these elements out. Show that all non-identity elements will have positive escape norm.
3. Find the non-identity element ${g_1}$ in (the quotient of) ${A}$ of minimal escape norm. Use the commutator estimate (assuming it is inherited by the quotient) to show that ${g_1}$ will centralise (most of) this quotient. In particular, the orbit ${\langle g_1 \rangle}$ is (essentially) a central subgroup of ${\langle A \rangle}$.
4. Quotient this orbit out; then find the next non-identity element ${g_2}$ in this new quotient of ${A}$. Again, show that ${\langle g_2 \rangle}$ is essentially a central subgroup of this quotient.
5. Repeat this process until ${A}$ becomes entirely trivial. Undoing all the quotients, this should demonstrate that ${\langle A \rangle}$ is virtually nilpotent, and that ${A}$ is essentially a coset nilprogression.

There are two main technical issues to resolve to make this strategy work. The first is to show that the iterative step in the argument terminates in finite time. This we do by returning to the Lie model theorem. It turns out that each time one quotients out by an orbit of an element that escapes, the dimension of the Lie model drops by at least one. This will ensure termination of the argument in finite time.

The other technical issue is that while the quotienting out all the elements of zero escape norm eliminates all “torsion” from ${A}$ (in the sense that the quotient of ${A}$ has no non-trivial elements of zero escape norm), further quotienting operations can inadvertently re-introduce such torsion. This torsion can be re-eradicated by further quotienting, but the price one pays for this is that the final structural description of ${\langle A \rangle}$ is no longer as strong as “virtually nilpotent”, but is instead a more complicated tower alternating between (ultra) finite extensions and central extensions.

Example 2 Consider the strong ${O(1)}$-approximate group

$\displaystyle A := \{ a N^{10} + 5 b: |a| \leq N; |b| \leq N^2 \}$

in the integers, where ${N}$ is a large natural number not divisible by ${5}$. As ${{\bf Z}}$ is torsion-free, all non-zero elements of ${A}$ have positive escape norm, and the nonzero element of minimal escape norm here is ${g=5}$ (or ${g=-5}$). But if one quotients by ${\langle g \rangle}$, ${A}$ projects down to ${{\bf Z}/5{\bf Z}}$, which now has torsion (and all elements in this quotient have zero escape norm). Thus torsion has been re-introduced by the quotienting operation. (A related observation is that the intersection of ${A}$ with ${\langle g \rangle = 5{\bf Z}}$ is not a simple progression, but is a more complicated object, namely a generalised arithmetic progression of rank two.)

To deal with this issue, we will not quotient out by the entire cyclic group ${\langle g \rangle = \{g^n: n \in {\bf Z} \}}$ generated by the element ${g}$ of minimal escape norm, but rather by an arithmetic progression ${P = \{g^n: |n| \leq N\}}$, where ${N}$ is a natural number comparable to the reciprocal ${1/\|g\|_{e,A}}$ of the escape norm, as this will be enough to cut the dimension of the Lie model down by one without introducing any further torsion. Of course, this cannot be done in the category of global groups, since the arithmetic progression ${P}$ will not, in general, be a group. However, it is still a local group, and it turns out that there is an analogue of the quotient space construction in local groups. This fixes the problem, but at a cost: in order to make the inductive portion of the argument work smoothly, it is now more natural to place the entire argument inside the category of local groups rather than global groups, even though the primary interest in approximate groups ${A}$ is in the global case when ${A}$ lies inside a global group. This necessitates some technical modification to some of the preceding discussion (for instance, the Gleason-Yamabe theorem must be replaced by the local version of this theorem, due to Goldbring); details can be found in this recent paper of Emmanuel Breuillard, Ben Green, and myself, but will only be sketched here.