You are currently browsing the category archive for the ‘math.MG’ category.

This fall (starting Monday, September 26), I will be teaching a graduate topics course which I have entitled “Hilbert’s fifth problem and related topics.” The course is going to focus on three related topics:

• Hilbert’s fifth problem on the topological description of Lie groups, as well as the closely related (local) classification of locally compact groups (the Gleason-Yamabe theorem).
• Approximate groups in nonabelian groups, and their classification via the Gleason-Yamabe theorem (this is very recent work of Emmanuel Breuillard, Ben Green, Tom Sanders, and myself, building upon earlier work of Hrushovski);
• Gromov’s theorem on groups of polynomial growth, as proven via the classification of approximate groups (as well as some consequences to fundamental groups of Riemannian manifolds).

I have already blogged about these topics repeatedly in the past (particularly with regard to Hilbert’s fifth problem), and I intend to recycle some of that material in the lecture notes for this course.

The above three families of results exemplify two broad principles (part of what I like to call “the dichotomy between structure and randomness“):

• (Rigidity) If a group-like object exhibits a weak amount of regularity, then it (or a large portion thereof) often automatically exhibits a strong amount of regularity as well;
• (Structure) This strong regularity manifests itself either as Lie type structure (in continuous settings) or nilpotent type structure (in discrete settings). (In some cases, “nilpotent” should be replaced by sister properties such as “abelian“, “solvable“, or “polycyclic“.)

Let me illustrate what I mean by these two principles with two simple examples, one in the continuous setting and one in the discrete setting. We begin with a continuous example. Given an ${n \times n}$ complex matrix ${A \in M_n({\bf C})}$, define the matrix exponential ${\exp(A)}$ of ${A}$ by the formula

$\displaystyle \exp(A) := \sum_{k=0}^\infty \frac{A^k}{k!} = 1 + A + \frac{1}{2!} A^2 + \frac{1}{3!} A^3 + \ldots$

which can easily be verified to be an absolutely convergent series.

Exercise 1 Show that the map ${A \mapsto \exp(A)}$ is a real analytic (and even complex analytic) map from ${M_n({\bf C})}$ to ${M_n({\bf C})}$, and obeys the restricted homomorphism property

$\displaystyle \exp(sA) \exp(tA) = \exp((s+t)A) \ \ \ \ \ (1)$

for all ${A \in M_n({\bf C})}$ and ${s,t \in {\bf C}}$.

Proposition 1 (Rigidity and structure of matrix homomorphisms) Let ${n}$ be a natural number. Let ${GL_n({\bf C})}$ be the group of invertible ${n \times n}$ complex matrices. Let ${\Phi: {\bf R} \rightarrow GL_n({\bf C})}$ be a map obeying two properties:

• (Group-like object) ${\Phi}$ is a homomorphism, thus ${\Phi(s) \Phi(t) = \Phi(s+t)}$ for all ${s,t \in {\bf R}}$.
• (Weak regularity) The map ${t \mapsto \Phi(t)}$ is continuous.

Then:

• (Strong regularity) The map ${t \mapsto \Phi(t)}$ is smooth (i.e. infinitely differentiable). In fact it is even real analytic.
• (Lie-type structure) There exists a (unique) complex ${n \times n}$ matrix ${A}$ such that ${\Phi(t) = \exp(tA)}$ for all ${t \in {\bf R}}$.

Proof: Let ${\Phi}$ be as above. Let ${\epsilon > 0}$ be a small number (depending only on ${n}$). By the homomorphism property, ${\Phi(0) = 1}$ (where we use ${1}$ here to denote the identity element of ${GL_n({\bf C})}$), and so by continuity we may find a small ${t_0>0}$ such that ${\Phi(t) = 1 + O(\epsilon)}$ for all ${t \in [-t_0,t_0]}$ (we use some arbitrary norm here on the space of ${n \times n}$ matrices, and allow implied constants in the ${O()}$ notation to depend on ${n}$).

The map ${A \mapsto \exp(A)}$ is real analytic and (by the inverse function theorem) is a diffeomorphism near ${0}$. Thus, by the inverse function theorem, we can (if ${\epsilon}$ is small enough) find a matrix ${B}$ of size ${B = O(\epsilon)}$ such that ${\Phi(t_0) = \exp(B)}$. By the homomorphism property and (1), we thus have

$\displaystyle \Phi(t_0/2)^2 = \Phi(t_0) = \exp(B) = \exp(B/2)^2.$

On the other hand, by another application of the inverse function theorem we see that the squaring map ${A \mapsto A^2}$ is a diffeomorphism near ${1}$ in ${GL_n({\bf C})}$, and thus (if ${\epsilon}$ is small enough)

$\displaystyle \Phi(t_0/2) = \exp(B/2).$

We may iterate this argument (for a fixed, but small, value of ${\epsilon}$) and conclude that

$\displaystyle \Phi(t_0/2^k) = \exp(B/2^k)$

for all ${k = 0,1,2,\ldots}$. By the homomorphism property and (1) we thus have

$\displaystyle \Phi(qt_0) = \exp(qB)$

whenever ${q}$ is a dyadic rational, i.e. a rational of the form ${a/2^k}$ for some integer ${a}$ and natural number ${k}$. By continuity we thus have

$\displaystyle \Phi(st_0) = \exp(sB)$

for all real ${s}$. Setting ${A := B/t_0}$ we conclude that

$\displaystyle \Phi(t) = \exp(tA)$

for all real ${t}$, which gives existence of the representation and also real analyticity and smoothness. Finally, uniqueness of the representation ${\Phi(t) = \exp(tA)}$ follows from the identity

$\displaystyle A = \frac{d}{dt} \exp(tA)|_{t=0}.$

$\Box$

Exercise 2 Generalise Proposition 1 by replacing the hypothesis that ${\Phi}$ is continuous with the hypothesis that ${\Phi}$ is Lebesgue measurable (Hint: use the Steinhaus theorem.). Show that the proposition fails (assuming the axiom of choice) if this hypothesis is omitted entirely.

Note how one needs both the group-like structure and the weak regularity in combination in order to ensure the strong regularity; neither is sufficient on its own. We will see variants of the above basic argument throughout the course. Here, the task of obtaining smooth (or real analytic structure) was relatively easy, because we could borrow the smooth (or real analytic) structure of the domain ${{\bf R}}$ and range ${M_n({\bf C})}$; but, somewhat remarkably, we shall see that one can still build such smooth or analytic structures even when none of the original objects have any such structure to begin with.

Now we turn to a second illustration of the above principles, namely Jordan’s theorem, which uses a discreteness hypothesis to upgrade Lie type structure to nilpotent (and in this case, abelian) structure. We shall formulate Jordan’s theorem in a slightly stilted fashion in order to emphasise the adherence to the above-mentioned principles.

Theorem 2 (Jordan’s theorem) Let ${G}$ be an object with the following properties:

• (Group-like object) ${G}$ is a group.
• (Discreteness) ${G}$ is finite.
• (Lie-type structure) ${G}$ is contained in ${U_n({\bf C})}$ (the group of unitary ${n \times n}$ matrices) for some ${n}$.

Then there is a subgroup ${G'}$ of ${G}$ such that

• (${G'}$ is close to ${G}$) The index ${|G/G'|}$ of ${G'}$ in ${G}$ is ${O_n(1)}$ (i.e. bounded by ${C_n}$ for some quantity ${C_n}$ depending only on ${n}$).
• (Nilpotent-type structure) ${G'}$ is abelian.

A key observation in the proof of Jordan’s theorem is that if two unitary elements ${g, h \in U_n({\bf C})}$ are close to the identity, then their commutator ${[g,h] = g^{-1}h^{-1}gh}$ is even closer to the identity (in, say, the operator norm ${\| \|_{op}}$). Indeed, since multiplication on the left or right by unitary elements does not affect the operator norm, we have

$\displaystyle \| [g,h] - 1 \|_{op} = \| gh - hg \|_{op}$

$\displaystyle = \| (g-1)(h-1) - (h-1)(g-1) \|_{op}$

and so by the triangle inequality

$\displaystyle \| [g,h] - 1 \|_{op} \leq 2 \|g-1\|_{op} \|h-1\|_{op}. \ \ \ \ \ (2)$

Now we can prove Jordan’s theorem.

Proof: We induct on ${n}$, the case ${n=1}$ being trivial. Suppose first that ${G}$ contains a central element ${g}$ which is not a multiple of the identity. Then, by definition, ${G}$ is contained in the centraliser ${Z(g)}$ of ${g}$, which by the spectral theorem is isomorphic to a product ${U_{n_1}({\bf C}) \times \ldots \times U_{n_k}({\bf C})}$ of smaller unitary groups. Projecting ${G}$ to each of these factor groups and applying the induction hypothesis, we obtain the claim.

Thus we may assume that ${G}$ contains no central elements other than multiples of the identity. Now pick a small ${\epsilon > 0}$ (one could take ${\epsilon=\frac{1}{10n}}$ in fact) and consider the subgroup ${G'}$ of ${G}$ generated by those elements of ${G}$ that are within ${\epsilon}$ of the identity (in the operator norm). By considering a maximal ${\epsilon}$-net of ${G}$ we see that ${G'}$ has index at most ${O_{n,\epsilon}(1)}$ in ${G}$. By arguing as before, we may assume that ${G'}$ has no central elements other than multiples of the identity.

If ${G'}$ consists only of multiples of the identity, then we are done. If not, take an element ${g}$ of ${G'}$ that is not a multiple of the identity, and which is as close as possible to the identity (here is where we crucially use that ${G}$ is finite). By (2), we see that if ${\epsilon}$ is sufficiently small depending on ${n}$, and if ${h}$ is one of the generators of ${G'}$, then ${[g,h]}$ lies in ${G'}$ and is closer to the identity than ${g}$, and is thus a multiple of the identity. On the other hand, ${[g,h]}$ has determinant ${1}$. Given that it is so close to the identity, it must therefore be the identity (if ${\epsilon}$ is small enough). In other words, ${g}$ is central in ${G'}$, and is thus a multiple of the identity. But this contradicts the hypothesis that there are no central elements other than multiples of the identity, and we are done. $\Box$

Commutator estimates such as (2) will play a fundamental role in many of the arguments we will see in this course; as we saw above, such estimates combine very well with a discreteness hypothesis, but will also be very useful in the continuous setting.

Exercise 3 Generalise Jordan’s theorem to the case when ${G}$ is a finite subgroup of ${GL_n({\bf C})}$ rather than of ${U_n({\bf C})}$. (Hint: The elements of ${G}$ are not necessarily unitary, and thus do not necessarily preserve the standard Hilbert inner product of ${{\bf C}^n}$. However, if one averages that inner product by the finite group ${G}$, one obtains a new inner product on ${{\bf C}^n}$ that is preserved by ${G}$, which allows one to conjugate ${G}$ to a subgroup of ${U_n({\bf C})}$. This averaging trick is (a small) part of Weyl’s unitary trick in representation theory.)

Exercise 4 (Inability to discretise nonabelian Lie groups) Show that if ${n \geq 3}$, then the orthogonal group ${O_n({\bf R})}$ cannot contain arbitrarily dense finite subgroups, in the sense that there exists an ${\epsilon = \epsilon_n > 0}$ depending only on ${n}$ such that for every finite subgroup ${G}$ of ${O_n({\bf R})}$, there exists a ball of radius ${\epsilon}$ in ${O_n({\bf R})}$ (with, say, the operator norm metric) that is disjoint from ${G}$. What happens in the ${n=2}$ case?

Remark 1 More precise classifications of the finite subgroups of ${U_n({\bf C})}$ are known, particularly in low dimensions. For instance, one can show that the only finite subgroups of ${SO_3({\bf R})}$ (which ${SU_2({\bf C})}$ is a double cover of) are isomorphic to either a cyclic group, a dihedral group, or the symmetry group of one of the Platonic solids.

One of the most well known problems from ancient Greek mathematics was that of trisecting an angle by straightedge and compass, which was eventually proven impossible in 1837 by Pierre Wantzel, using methods from Galois theory.

Formally, one can set up the problem as follows. Define a configuration to be a finite collection ${{\mathcal C}}$ of points, lines, and circles in the Euclidean plane. Define a construction step to be one of the following operations to enlarge the collection ${{\mathcal C}}$:

• (Straightedge) Given two distinct points ${A, B}$ in ${{\mathcal C}}$, form the line ${\overline{AB}}$ that connects ${A}$ and ${B}$, and add it to ${{\mathcal C}}$.
• (Compass) Given two distinct points ${A, B}$ in ${{\mathcal C}}$, and given a third point ${O}$ in ${{\mathcal C}}$ (which may or may not equal ${A}$ or ${B}$), form the circle with centre ${O}$ and radius equal to the length ${|AB|}$ of the line segment joining ${A}$ and ${B}$, and add it to ${{\mathcal C}}$.
• (Intersection) Given two distinct curves ${\gamma, \gamma'}$ in ${{\mathcal C}}$ (thus ${\gamma}$ is either a line or a circle in ${{\mathcal C}}$, and similarly for ${\gamma'}$), select a point ${P}$ that is common to both ${\gamma}$ and ${\gamma'}$ (there are at most two such points), and add it to ${{\mathcal C}}$.

We say that a point, line, or circle is constructible by straightedge and compass from a configuration ${{\mathcal C}}$ if it can be obtained from ${{\mathcal C}}$ after applying a finite number of construction steps.

Problem 1 (Angle trisection) Let ${A, B, C}$ be distinct points in the plane. Is it always possible to construct by straightedge and compass from ${A,B,C}$ a line ${\ell}$ through ${A}$ that trisects the angle ${\angle BAC}$, in the sense that the angle between ${\ell}$ and ${BA}$ is one third of the angle of ${\angle BAC}$?

Thanks to Wantzel’s result, the answer to this problem is known to be “no” in general; a generic angle ${\angle BAC}$ cannot be trisected by straightedge and compass. (On the other hand, some special angles can certainly be trisected by straightedge and compass, such as a right angle. Also, one can certainly trisect generic angles using other methods than straightedge and compass; see the Wikipedia page on angle trisection for some examples of this.)

The impossibility of angle trisection stands in sharp contrast to the easy construction of angle bisection via straightedge and compass, which we briefly review as follows:

1. Start with three points ${A, B, C}$.
2. Form the circle ${c_0}$ with centre ${A}$ and radius ${AB}$, and intersect it with the line ${\overline{AC}}$. Let ${D}$ be the point in this intersection that lies on the same side of ${A}$ as ${C}$. (${D}$ may well be equal to ${C}$).
3. Form the circle ${c_1}$ with centre ${B}$ and radius ${AB}$, and the circle ${c_2}$ with centre ${D}$ and radius ${AB}$. Let ${E}$ be the point of intersection of ${c_1}$ and ${c_2}$ that is not ${A}$.
4. The line ${\ell := \overline{AE}}$ will then bisect the angle ${\angle BAC}$.

The key difference between angle trisection and angle bisection ultimately boils down to the following trivial number-theoretic fact:

Lemma 2 There is no power of ${2}$ that is evenly divisible by ${3}$.

Proof: Obvious by modular arithmetic, by induction, or by the fundamental theorem of arithmetic. $\Box$

In contrast, there are of course plenty of powers of ${2}$ that are evenly divisible by ${2}$, and this is ultimately why angle bisection is easy while angle trisection is hard.

The standard way in which Lemma 2 is used to demonstrate the impossibility of angle trisection is via Galois theory. The implication is quite short if one knows this theory, but quite opaque otherwise. We briefly sketch the proof of this implication here, though we will not need it in the rest of the discussion. Firstly, Lemma 2 implies the following fact about field extensions.

Corollary 3 Let ${F}$ be a field, and let ${E}$ be an extension of ${F}$ that can be constructed out of ${F}$ by a finite sequence of quadratic extensions. Then ${E}$ does not contain any cubic extensions ${K}$ of ${F}$.

Proof: If $E$ contained a cubic extension $K$ of $F$, then the dimension of $E$ over $F$ would be a multiple of three. On the other hand, if $E$ is obtained from $F$ by a tower of quadratic extensions, then the dimension of $E$ over $F$ is a power of two. The claim then follows from Lemma 2. $\Box$

To conclude the proof, one then notes that any point, line, or circle that can be constructed from a configuration ${{\mathcal C}}$ is definable in a field obtained from the coefficients of all the objects in ${{\mathcal C}}$ after taking a finite number of quadratic extensions, whereas a trisection of an angle ${\angle ABC}$ will generically only be definable in a cubic extension of the field generated by the coordinates of ${A,B,C}$.

The Galois theory method also allows one to obtain many other impossibility results of this type, most famously the Abel-Ruffini theorem on the insolvability of the quintic equation by radicals. For this reason (and also because of the many applications of Galois theory to number theory and other branches of mathematics), the Galois theory argument is the “right” way to prove the impossibility of angle trisection within the broader framework of modern mathematics. However, this argument has the drawback that it requires one to first understand Galois theory (or at least field theory), which is usually not presented until an advanced undergraduate algebra or number theory course, whilst the angle trisection problem requires only high-school level mathematics to formulate. Even if one is allowed to “cheat” and sweep several technicalities under the rug, one still needs to possess a fair amount of solid intuition about advanced algebra in order to appreciate the proof. (This was undoubtedly one reason why, even after Wantzel’s impossibility result was published, a large amount of effort was still expended by amateur mathematicians to try to trisect a general angle.)

In this post I would therefore like to present a different proof (or perhaps more accurately, a disguised version of the standard proof) of the impossibility of angle trisection by straightedge and compass, that avoids explicit mention of Galois theory (though it is never far beneath the surface). With “cheats”, the proof is actually quite simple and geometric (except for Lemma 2, which is still used at a crucial juncture), based on the basic geometric concept of monodromy; unfortunately, some technical work is needed however to remove these cheats.

To describe the intuitive idea of the proof, let us return to the angle bisection construction, that takes a triple ${A, B, C}$ of points as input and returns a bisecting line ${\ell}$ as output. We iterate the construction to create a quadrisecting line ${m}$, via the following sequence of steps that extend the original bisection construction:

1. Start with three points ${A, B, C}$.
2. Form the circle ${c_0}$ with centre ${A}$ and radius ${AB}$, and intersect it with the line ${\overline{AC}}$. Let ${D}$ be the point in this intersection that lies on the same side of ${A}$ as ${C}$. (${D}$ may well be equal to ${C}$).
3. Form the circle ${c_1}$ with centre ${B}$ and radius ${AB}$, and the circle ${c_2}$ with centre ${D}$ and radius ${AB}$. Let ${E}$ be the point of intersection of ${c_1}$ and ${c_2}$ that is not ${A}$.
4. Let ${F}$ be the point on the line ${\ell := \overline{AE}}$ which lies on ${c_0}$, and is on the same side of ${A}$ as ${E}$.
5. Form the circle ${c_3}$ with centre ${F}$ and radius ${AB}$. Let ${G}$ be the point of intersection of ${c_1}$ and ${c_3}$ that is not ${A}$.
6. The line ${m := \overline{AG}}$ will then quadrisect the angle ${\angle BAC}$.

Let us fix the points ${A}$ and ${B}$, but not ${C}$, and view ${m}$ (as well as intermediate objects such as ${D}$, ${c_2}$, ${E}$, ${\ell}$, ${F}$, ${c_3}$, ${G}$) as a function of ${C}$.

Let us now do the following: we begin rotating ${C}$ counterclockwise around ${A}$, which drags around the other objects ${D}$, ${c_2}$, ${E}$, ${\ell}$, ${F}$, ${c_3}$, ${G}$ that were constructed by ${C}$ accordingly. For instance, here is an early stage of this rotation process, when the angle ${\angle BAC}$ has become obtuse:

Now for the slightly tricky bit. We are going to keep rotating ${C}$ beyond a half-rotation of ${180^\circ}$, so that ${\angle BAC}$ now becomes a reflex angle. At this point, a singularity occurs; the point ${E}$ collides into ${A}$, and so there is an instant in which the line ${\ell = \overline{AE}}$ is not well-defined. However, this turns out to be a removable singularity (and the easiest way to demonstrate this will be to tap the power of complex analysis, as complex numbers can easily route around such a singularity), and we can blast through it to the other side, giving a picture like this:

Note that we have now deviated from the original construction in that ${F}$ and ${E}$ are no longer on the same side of ${A}$; we are thus now working in a continuation of that construction rather than with the construction itself. Nevertheless, we can still work with this continuation (much as, say, one works with analytic continuations of infinite series such as ${\sum_{n=1}^\infty \frac{1}{n^s}}$ beyond their original domain of definition).

We now keep rotating ${C}$ around ${A}$. Here, ${\angle BAC}$ is approaching a full rotation of ${360^\circ}$:

When ${\angle BAC}$ reaches a full rotation, a different singularity occurs: ${c_1}$ and ${c_2}$ coincide. Nevertheless, this is also a removable singularity, and we blast through to beyond a full rotation:

And now ${C}$ is back where it started, as are ${D}$, ${c_2}$, ${E}$, and ${\ell}$… but the point ${F}$ has moved, from one intersection point of ${\ell \cap c_3}$ to the other. As a consequence, ${c_3}$, ${G}$, and ${m}$ have also changed, with ${m}$ being at right angles to where it was before. (In the jargon of modern mathematics, the quadrisection construction has a non-trivial monodromy.)

But nothing stops us from rotating ${C}$ some more. If we continue this procedure, we see that after two full rotations of ${C}$ around ${A}$, all points, lines, and circles constructed from ${A, B, C}$ have returned to their original positions. Because of this, we shall say that the quadrisection construction described above is periodic with period ${2}$.

Similarly, if one performs an octisection of the angle ${\angle BAC}$ by bisecting the quadrisection, one can verify that this octisection is periodic with period ${4}$; it takes four full rotations of ${C}$ around ${A}$ before the configuration returns to where it started. More generally, one can show

Proposition 4 Any construction of straightedge and compass from the points ${A,B,C}$ is periodic with period equal to a power of ${2}$.

The reason for this, ultimately, is because any two circles or lines will intersect each other in at most two points, and so at each step of a straightedge-and-compass construction there is an ambiguity of at most ${2! = 2}$. Each rotation of ${C}$ around ${A}$ can potentially flip one of these points to the other, but then if one rotates again, the point returns to its original position, and then one can analyse the next point in the construction in the same fashion until one obtains the proposition.

But now consider a putative trisection operation, that starts with an arbitrary angle ${\angle BAC}$ and somehow uses some sequence of straightedge and compass constructions to end up with a trisecting line ${\ell}$:

What is the period of this construction? If we continuously rotate ${C}$ around ${A}$, we observe that a full rotations of ${C}$ only causes the trisecting line ${\ell}$ to rotate by a third of a full rotation (i.e. by ${120^\circ}$):

Because of this, we see that the period of any construction that contains ${\ell}$ must be a multiple of ${3}$. But this contradicts Proposition 4 and Lemma 2.

Below the fold, I will make the above proof rigorous. Unfortunately, in doing so, I had to again leave the world of high-school mathematics, as one needs a little bit of algebraic geometry and complex analysis to resolve the issues with singularities that we saw in the above sketch. Still, I feel that at an intuitive level at least, this argument is more geometric and accessible than the Galois-theoretic argument (though anyone familiar with Galois theory will note that there is really not that much difference between the proofs, ultimately, as one has simply replaced the Galois group with a closely related monodromy group instead).

This is another installment of my my series of posts on Hilbert’s fifth problem. One formulation of this problem is answered by the following theorem of Gleason and Montgomery-Zippin:

Theorem 1 (Hilbert’s fifth problem) Let ${G}$ be a topological group which is locally Euclidean. Then ${G}$ is isomorphic to a Lie group.

Theorem 1 is deep and difficult result, but the discussion in the previous posts has reduced the proof of this Theorem to that of establishing two simpler results, involving the concepts of a no small subgroups (NSS) subgroup, and that of a Gleason metric. We briefly recall the relevant definitions:

Definition 2 (NSS) A topological group ${G}$ is said to have no small subgroups, or is NSS for short, if there is an open neighbourhood ${U}$ of the identity in ${G}$ that contains no subgroups of ${G}$ other than the trivial subgroup ${\{ \hbox{id}\}}$.

Definition 3 (Gleason metric) Let ${G}$ be a topological group. A Gleason metric on ${G}$ is a left-invariant metric ${d: G \times G \rightarrow {\bf R}^+}$ which generates the topology on ${G}$ and obeys the following properties for some constant ${C>0}$, writing ${\|g\|}$ for ${d(g,\hbox{id})}$:

The remaining steps in the resolution of Hilbert’s fifth problem are then as follows:

Theorem 4 (Reduction to the NSS case) Let ${G}$ be a locally compact group, and let ${U}$ be an open neighbourhood of the identity in ${G}$. Then there exists an open subgroup ${G'}$ of ${G}$, and a compact subgroup ${N}$ of ${G'}$ contained in ${U}$, such that ${G'/N}$ is NSS and locally compact.

Theorem 5 (Gleason’s lemma) Let ${G}$ be a locally compact NSS group. Then ${G}$ has a Gleason metric.

The purpose of this post is to establish these two results, using arguments that are originally due to Gleason. We will split this task into several subtasks, each of which improves the structure on the group ${G}$ by some amount:

Proposition 6 (From locally compact to metrisable) Let ${G}$ be a locally compact group, and let ${U}$ be an open neighbourhood of the identity in ${G}$. Then there exists an open subgroup ${G'}$ of ${G}$, and a compact subgroup ${N}$ of ${G'}$ contained in ${U}$, such that ${G'/N}$ is locally compact and metrisable.

For any open neighbourhood ${U}$ of the identity in ${G}$, let ${Q(U)}$ be the union of all the subgroups of ${G}$ that are contained in ${U}$. (Thus, for instance, ${G}$ is NSS if and only if ${Q(U)}$ is trivial for all sufficiently small ${U}$.)

Proposition 7 (From metrisable to subgroup trapping) Let ${G}$ be a locally compact metrisable group. Then ${G}$ has the subgroup trapping property: for every open neighbourhood ${U}$ of the identity, there exists another open neighbourhood ${V}$ of the identity such that ${Q(V)}$ generates a subgroup ${\langle Q(V) \rangle}$ contained in ${U}$.

Proposition 8 (From subgroup trapping to NSS) Let ${G}$ be a locally compact group with the subgroup trapping property, and let ${U}$ be an open neighbourhood of the identity in ${G}$. Then there exists an open subgroup ${G'}$ of ${G}$, and a compact subgroup ${N}$ of ${G'}$ contained in ${U}$, such that ${G'/N}$ is locally compact and NSS.

Proposition 9 (From NSS to the escape property) Let ${G}$ be a locally compact NSS group. Then there exists a left-invariant metric ${d}$ on ${G}$ generating the topology on ${G}$ which obeys the escape property (1) for some constant ${C}$.

Proposition 10 (From escape to the commutator estimate) Let ${G}$ be a locally compact group with a left-invariant metric ${d}$ that obeys the escape property (1). Then ${d}$ also obeys the commutator property (2).

It is clear that Propositions 6, 7, and 8 combine to give Theorem 4, and Propositions 9, 10 combine to give Theorem 5.

Propositions 610 are all proven separately, but their proofs share some common strategies and ideas. The first main idea is to construct metrics on a locally compact group ${G}$ by starting with a suitable “bump function” ${\phi \in C_c(G)}$ (i.e. a continuous, compactly supported function from ${G}$ to ${{\bf R}}$) and pulling back the metric structure on ${C_c(G)}$ by using the translation action ${\tau_g \phi(x) := \phi(g^{-1} x)}$, thus creating a (semi-)metric

$\displaystyle d_\phi( g, h ) := \| \tau_g \phi - \tau_h \phi \|_{C_c(G)} := \sup_{x \in G} |\phi(g^{-1} x) - \phi(h^{-1} x)|. \ \ \ \ \ (3)$

One easily verifies that this is indeed a (semi-)metric (in that it is non-negative, symmetric, and obeys the triangle inequality); it is also left-invariant, and so we have ${d_\phi(g,h) = \|g^{-1} h \|_\phi = \| h^{-1} g \|_\phi}$, where

$\displaystyle \| g \|_\phi = d_\phi(g,\hbox{id}) = \| \partial_g \phi \|_{C_c(G)}$

where ${\partial_g}$ is the difference operator ${\partial_g = 1 - \tau_g}$,

$\displaystyle \partial_g \phi(x) = \phi(x) - \phi(g^{-1} x).$

This construction was already seen in the proof of the Birkhoff-Kakutani theorem, which is the main tool used to establish Proposition 6. For the other propositions, the idea is to choose a bump function ${\phi}$ that is “smooth” enough that it creates a metric with good properties such as the commutator estimate (2). Roughly speaking, to get a bound of the form (2), one needs ${\phi}$ to have “${C^{1,1}}$ regularity” with respect to the “right” smooth structure on ${G}$ By ${C^{1,1}}$ regularity, we mean here something like a bound of the form

$\displaystyle \| \partial_g \partial_h \phi \|_{C_c(G)} \ll \|g\|_\phi \|h\|_\phi \ \ \ \ \ (4)$

for all ${g,h \in G}$. Here we use the usual asymptotic notation, writing ${X \ll Y}$ or ${X=O(Y)}$ if ${X \leq CY}$ for some constant ${C}$ (which can vary from line to line).

The following lemma illustrates how ${C^{1,1}}$ regularity can be used to build Gleason metrics.

Lemma 11 Suppose that ${\phi \in C_c(G)}$ obeys (4). Then the (semi-)metric ${d_\phi}$ (and associated (semi-)norm ${\|\|_\phi}$) obey the escape property (1) and the commutator property (2).

Proof: We begin with the commutator property (2). Observe the identity

$\displaystyle \tau_{[g,h]} = \tau_{hg}^{-1} \tau_{gh}$

whence

$\displaystyle \partial_{[g,h]} = \tau_{hg}^{-1} ( \tau_{hg} - \tau_{gh} )$

$\displaystyle = \tau_{hg}^{-1} ( \partial_h \partial_g - \partial_g \partial_h ).$

From the triangle inequality (and translation-invariance of the ${C_c(G)}$ norm) we thus see that (2) follows from (4). Similarly, to obtain the escape property (1), observe the telescoping identity

$\displaystyle \partial_{g^n} = n \partial_g + \sum_{i=0}^{n-1} \partial_g \partial_{g^i}$

for any ${g \in G}$ and natural number ${n}$, and thus by the triangle inequality

$\displaystyle \| g^n \|_\phi = n \| g \|_\phi + O( \sum_{i=0}^{n-1} \| \partial_g \partial_{g^i} \phi \|_{C_c(G)} ). \ \ \ \ \ (5)$

But from (4) (and the triangle inequality) we have

$\displaystyle \| \partial_g \partial_{g^i} \phi \|_{C_c(G)} \ll \|g\|_\phi \|g^i \|_\phi \ll i \|g\|_\phi^2$

and thus we have the “Taylor expansion”

$\displaystyle \|g^n\|_\phi = n \|g\|_\phi + O( n^2 \|g\|_\phi^2 )$

which gives (1). $\Box$

It remains to obtain ${\phi}$ that have the desired ${C^{1,1}}$ regularity property. In order to get such regular bump functions, we will use the trick of convolving together two lower regularity bump functions (such as two functions with “${C^{0,1}}$ regularity” in some sense to be determined later). In order to perform this convolution, we will use the fundamental tool of (left-invariant) Haar measure ${\mu}$ on the locally compact group ${G}$. Here we exploit the basic fact that the convolution

$\displaystyle f_1 * f_2(x) := \int_G f_1(y) f_2(y^{-1} x)\ d\mu(y) \ \ \ \ \ (6)$

of two functions ${f_1,f_2 \in C_c(G)}$ tends to be smoother than either of the two factors ${f_1,f_2}$. This is easiest to see in the abelian case, since in this case we can distribute derivatives according to the law

$\displaystyle \partial_g (f_1 * f_2) = (\partial_g f_1) * f_2 = f_1 * (\partial_g f_2),$

which suggests that the order of “differentiability” of ${f_1*f_2}$ should be the sum of the orders of ${f_1}$ and ${f_2}$ separately.

These ideas are already sufficient to establish Proposition 10 directly, and also Proposition 9 when comined with an additional bootstrap argument. The proofs of Proposition 7 and Proposition 8 use similar techniques, but is more difficult due to the potential presence of small subgroups, which require an application of the Peter-Weyl theorem to properly control. Both of these theorems will be proven below the fold, thus (when combined with the preceding posts) completing the proof of Theorem 1.

The presentation here is based on some unpublished notes of van den Dries and Goldbring on Hilbert’s fifth problem. I am indebted to Emmanuel Breuillard, Ben Green, and Tom Sanders for many discussions related to these arguments.

Hilbert’s fifth problem asks to clarify the extent that the assumption on a differentiable or smooth structure is actually needed in the theory of Lie groups and their actions. While this question is not precisely formulated and is thus open to some interpretation, the following result of Gleason and Montgomery-Zippin answers at least one aspect of this question:

Theorem 1 (Hilbert’s fifth problem) Let ${G}$ be a topological group which is locally Euclidean (i.e. it is a topological manifold). Then ${G}$ is isomorphic to a Lie group.

Theorem 1 can be viewed as an application of the more general structural theory of locally compact groups. In particular, Theorem 1 can be deduced from the following structural theorem of Gleason and Yamabe:

Theorem 2 (Gleason-Yamabe theorem) Let ${G}$ be a locally compact group, and let ${U}$ be an open neighbourhood of the identity in ${G}$. Then there exists an open subgroup ${G'}$ of ${G}$, and a compact subgroup ${N}$ of ${G'}$ contained in ${U}$, such that ${G'/N}$ is isomorphic to a Lie group.

The deduction of Theorem 1 from Theorem 2 proceeds using the Brouwer invariance of domain theorem and is discussed in this previous post. In this post, I would like to discuss the proof of Theorem 2. We can split this proof into three parts, by introducing two additional concepts. The first is the property of having no small subgroups:

Definition 3 (NSS) A topological group ${G}$ is said to have no small subgroups, or is NSS for short, if there is an open neighbourhood ${U}$ of the identity in ${G}$ that contains no subgroups of ${G}$ other than the trivial subgroup ${\{ \hbox{id}\}}$.

An equivalent definition of an NSS group is one which has an open neighbourhood ${U}$ of the identity that every non-identity element ${g \in G \backslash \{\hbox{id}\}}$ escapes in finite time, in the sense that ${g^n \not \in U}$ for some positive integer ${n}$. It is easy to see that all Lie groups are NSS; we shall shortly see that the converse statement (in the locally compact case) is also true, though significantly harder to prove.

Another useful property is that of having what I will call a Gleason metric:

Definition 4 Let ${G}$ be a topological group. A Gleason metric on ${G}$ is a left-invariant metric ${d: G \times G \rightarrow {\bf R}^+}$ which generates the topology on ${G}$ and obeys the following properties for some constant ${C>0}$, writing ${\|g\|}$ for ${d(g,\hbox{id})}$:

• (Escape property) If ${g \in G}$ and ${n \geq 1}$ is such that ${n \|g\| \leq \frac{1}{C}}$, then ${\|g^n\| \geq \frac{1}{C} n \|g\|}$.
• (Commutator estimate) If ${g, h \in G}$ are such that ${\|g\|, \|h\| \leq \frac{1}{C}}$, then

$\displaystyle \|[g,h]\| \leq C \|g\| \|h\|, \ \ \ \ \ (1)$

where ${[g,h] := g^{-1}h^{-1}gh}$ is the commutator of ${g}$ and ${h}$.

For instance, the unitary group ${U(n)}$ with the operator norm metric ${d(g,h) := \|g-h\|_{op}}$ can easily verified to be a Gleason metric, with the commutator estimate (1) coming from the inequality

$\displaystyle \| [g,h] - 1 \|_{op} = \| gh - hg \|_{op}$

$\displaystyle = \| (g-1) (h-1) - (h-1) (g-1) \|_{op}$

$\displaystyle \leq 2 \|g-1\|_{op} \|g-1\|_{op}.$

Similarly, any left-invariant Riemannian metric on a (connected) Lie group can be verified to be a Gleason metric. From the escape property one easily sees that all groups with Gleason metrics are NSS; again, we shall see that there is a partial converse.

Remark 1 The escape and commutator properties are meant to capture “Euclidean-like” structure of the group. Other metrics, such as Carnot-Carathéodory metrics on Carnot Lie groups such as the Heisenberg group, usually fail one or both of these properties.

The proof of Theorem 2 can then be split into three subtheorems:

Theorem 5 (Reduction to the NSS case) Let ${G}$ be a locally compact group, and let ${U}$ be an open neighbourhood of the identity in ${G}$. Then there exists an open subgroup ${G'}$ of ${G}$, and a compact subgroup ${N}$ of ${G'}$ contained in ${U}$, such that ${G'/N}$ is NSS, locally compact, and metrisable.

Theorem 6 (Gleason’s lemma) Let ${G}$ be a locally compact metrisable NSS group. Then ${G}$ has a Gleason metric.

Theorem 7 (Building a Lie structure) Let ${G}$ be a locally compact group with a Gleason metric. Then ${G}$ is isomorphic to a Lie group.

Clearly, by combining Theorem 5, Theorem 6, and Theorem 7 one obtains Theorem 2 (and hence Theorem 1).

Theorem 5 and Theorem 6 proceed by some elementary combinatorial analysis, together with the use of Haar measure (to build convolutions, and thence to build “smooth” bump functions with which to create a metric, in a variant of the analysis used to prove the Birkhoff-Kakutani theorem); Theorem 5 also requires Peter-Weyl theorem (to dispose of certain compact subgroups that arise en route to the reduction to the NSS case), which was discussed previously on this blog.

In this post I would like to detail the final component to the proof of Theorem 2, namely Theorem 7. (I plan to discuss the other two steps, Theorem 5 and Theorem 6, in a separate post.) The strategy is similar to that used to prove von Neumann’s theorem, as discussed in this previous post (and von Neumann’s theorem is also used in the proof), but with the Gleason metric serving as a substitute for the faithful linear representation. Namely, one first gives the space ${L(G)}$ of one-parameter subgroups of ${G}$ enough of a structure that it can serve as a proxy for the “Lie algebra” of ${G}$; specifically, it needs to be a vector space, and the “exponential map” needs to cover an open neighbourhood of the identity. This is enough to set up an “adjoint” representation of ${G}$, whose image is a Lie group by von Neumann’s theorem; the kernel is essentially the centre of ${G}$, which is abelian and can also be shown to be a Lie group by a similar analysis. To finish the job one needs to use arguments of Kuranishi and of Gleason, as discussed in this previous post.

The arguments here can be phrased either in the standard analysis setting (using sequences, and passing to subsequences often) or in the nonstandard analysis setting (selecting an ultrafilter, and then working with infinitesimals). In my view, the two approaches have roughly the same level of complexity in this case, and I have elected for the standard analysis approach.

Remark 2 From Theorem 7 we see that a Gleason metric structure is a good enough substitute for smooth structure that it can actually be used to reconstruct the entire smooth structure; roughly speaking, the commutator estimate (1) allows for enough “Taylor expansion” of expressions such as ${g^n h^n}$ that one can simulate the fundamentals of Lie theory (in particular, construction of the Lie algebra and the exponential map, and its basic properties. The advantage of working with a Gleason metric rather than a smoother structure, though, is that it is relatively undemanding with regards to regularity; in particular, the commutator estimate (1) is roughly comparable to the imposition ${C^{1,1}}$ structure on the group ${G}$, as this is the minimal regularity to get the type of Taylor approximation (with quadratic errors) that would be needed to obtain a bound of the form (1). We will return to this point in a later post.

In a previous blog post, I discussed the recent result of Guth and Katz obtaining a near-optimal bound on the Erdos distance problem. One of the tools used in the proof (building upon the earlier work of Elekes and Sharir) was the observation that the incidence geometry of the Euclidean group ${SE(2)}$ of rigid motions of the plane was almost identical to that of lines in the Euclidean space ${{\bf R}^3}$:

Proposition 1 One can identify a (Zariski-)dense portion of ${SE(2)}$ with ${{\bf R}^3}$, in such a way that for any two points ${A, B}$ in the plane ${{\bf R}^2}$, the set ${l_{AB} := \{ R \in SE(2): RA = B \}}$ of rigid motions mapping ${A}$ to ${B}$ forms a line in ${{\bf R}^3}$.

Proof: A rigid motion is either a translation or a rotation, with the latter forming a Zariski-dense subset of ${SE(2)}$. Identify a rotation ${R}$ in ${SE(2)}$ by an angle ${\theta}$ with ${|\theta| < \pi}$ around a point ${P}$ with the element ${(P, \cot \frac{\theta}{2})}$ in ${{\bf R}^3}$. (Note that such rotations also form a Zariski-dense subset of ${SE(2)}$.) Elementary trigonometry then reveals that if ${R}$ maps ${A}$ to ${B}$, then ${P}$ lies on the perpendicular bisector of ${AB}$, and depends in a linear fashion on ${\cot \frac{\theta}{2}}$ (for fixed ${A,B}$). The claim follows. $\Box$

As seen from the proof, this proposition is an easy (though ad hoc) application of elementary trigonometry, but it was still puzzling to me why such a simple parameterisation of the incidence structure of ${SE(2)}$ was possible. Certainly it was clear from general algebraic geometry considerations that some bounded-degree algebraic description was available, but why would the ${l_{AB}}$ be expressible as lines and not as, say, quadratic or cubic curves?

In this post I would like to record some observations arising from discussions with Jordan Ellenberg, Jozsef Solymosi, and Josh Zahl which give a more conceptual (but less elementary) derivation of the above proposition that avoids the use of ad hoc coordinate transformations such as ${R \mapsto (P, \cot\frac{\theta}{2})}$. The starting point is to view the Euclidean plane ${{\bf R}^2}$ as the scaling limit of the sphere ${S^2}$ (a fact which is familiar to all of us through the geometry of the Earth), which makes the Euclidean group ${SE(2)}$ a scaling limit of the rotation group ${SO(3)}$. The latter can then be lifted to a double cover, namely the spin group ${Spin(3)}$. This group has a natural interpretation as the unit quaternions, which is isometric to the unit sphere ${S^3}$. The analogue of the lines ${l_{AB}}$ in this setting become great circles on this sphere; applying a projective transformation, one can map ${S^3}$ to ${{\bf R}^3}$ (or more precisely to the projective space ${{\bf P}^3}$), at whichi point the great circles become lines. This gives a proof of Proposition 1.

Details of the correspondence are provided below the fold. One by-product of this analysis, incidentally, is the observation that the Guth-Katz bound ${g(N) \gg N / \log N}$ for the Erdos distance problem in the plane ${{\bf R}^2}$, immediately extends with almost no modification to the sphere ${S^2}$ as well (i.e. any ${N}$ points in ${S^2}$ determine ${\gg N/\log N}$ distances), as well as to the hyperbolic plane ${H^2}$.

Assaf Naor and I have just uploaded to the arXiv our joint paper “Scale-oblivious metric fragmentation and the nonlinear Dvoretzky theorem“.

Consider a finite metric space ${(X,d)}$, with ${X}$ being a set of ${n}$ points. It is not always the case that this space can be isometrically embedded into a Hilbert space such as ${\ell^2}$; for instance, in a Hilbert space every pair of points has a unique midpoint, but uniqueness can certainly fail for finite metric spaces (consider for instance the graph metric on a four-point diamond). The situation improves, however, if one allows (a) the embedding to have some distortion, and (b) one is willing to embed just a subset of ${X}$, rather than all of ${X}$. More precisely, for any integer ${n}$ and any distortion ${D \geq 1}$, let ${k(n,D)}$ be the largest integer with the property that given every ${n}$-point metric space ${(X,d)}$, there exists a ${k}$-point subset ${Y}$ of ${X}$, and a map ${f: Y \rightarrow \ell^2}$, which has distortion at most ${D}$ in the sense that

$\displaystyle d(y,y') \leq \| f(y) - f(y') \|_{\ell^2} \leq D d(y,y')$

for all ${y,y' \in Y}$.

Bourgain, Figiel, and Milman established that for any fixed ${D>1}$, one has the lower bound ${k(n,D) \gg_D \log n}$, and also showed for ${D}$ sufficiently close to ${1}$ there was a matching upper bound ${k(n,D) \ll_D \log n}$. This type of result was called a nonlinear Dvoretsky theorem, in analogy with the linear Dvoretsky theorem that asserts that given any ${n}$-dimensional normed vector space and ${D > 1}$, there existed a ${k}$-dimensional subspace which embedded with distortion ${D}$ into ${\ell^2}$, with ${k \sim_D \log n}$. In other words, one can ensure that about ${\log n}$ of the ${n}$ points in an arbitrary metric space behave in an essentially Euclidean manner, and that this is best possible if one wants the distortion to be small.

Bartel, Linial, Mendel, and Naor observed that there was a threshold phenomenon at ${D=2}$. Namely, for ${D < 2}$, the Bourgain-Figiel-Milman bounds were sharp with ${k(n,D) \sim_D \log n}$; but for ${D>2}$ one instead had a power law

$\displaystyle n^{1-B(D)} \ll_D k(n,D) \ll n^{1-b(D)}$

for some ${0 < b(D) \leq B(D) < 1}$. In other words, once one allows distortion by factors greater than ${2}$, one can now embed a polynomial portion of the points, rather than just a logarithmic portion, into a Hilbert space. This has some applications in theoretical computer science to constructing “approximate distance oracles” for high-dimensional sets of data. (The situation at the critical value ${D=2}$ is still unknown.)

In the special case that the metric ${d}$ is an ultrametric, so that the triangle inequality ${d(x,z) \leq d(x,y) + d(y,z)}$ is upgraded to the ultra-triangle inequality ${d(x,z) \leq \max(d(x,y), d(y,z))}$, then it is an easy exercise to show that ${X}$ can be embedded isometrically into a Hilbert space, and in fact into a sphere of radius ${\hbox{diam}(X)/\sqrt{2}}$. Indeed, this can be established by an induction on the cardinality of ${X}$, using the ultrametric inequality to partition any finite ultrametric space ${X}$ of two or more points into sets of strictly smaller diameter that are all separated by ${\hbox{diam}(X)}$, and using the inductive hypothesis and Pythagoras’ theorem.

One can then replace the concept of embedding into a Hilbert space, with the apparently stronger concept of embedding into an ultrametric space; this is useful for the computer science applications as ultrametrics have a tree structure which allows for some efficient algorithms for computing distances in such spaces. As it turns out, all the preceding constructions carry over without difficulty to this setting; thus, for ${D<2}$, one can embed a logarithmic number of points with distortion ${D}$ into an ultrametric space, and for ${D>2}$ one can embed a polynomial number of points.

One can view the task of locating a subset of a metric space that is equivalent (up to bounded distortion) to an ultrametric as that of fragmenting a metric space into a tree-like structure. For instance, the standard metric on the arithmetic progression ${\{1,\ldots,3^m\}}$ of length ${n=3^m}$ is not an ultrametric (and in fact needs a huge distortion factor of ${3^m}$ in order to embed into an ultrametric space), but if one restricts to the Cantor-like subset of integers in ${\{1,\ldots,3^m\}}$ whose base ${3}$ expansion consists solely of ${0}$s and ${1}$s, then one easily verifies that the resulting set is fragmented enough that the standard metric is equivalent (up to a distortion factor of ${3}$) to an ultrametric, namely the metric ${d(x,y) := 3^{j}/2}$ where ${j}$ is the largest integer for which ${3^j}$ divides ${y-x}$ and ${x \neq y}$.

The above fragmentation constructions were somewhat complicated and deterministic in nature. Mendel and Naor introduced a simpler probabilistic method, based on random partition trees of ${X}$, which reproved the above results in the high-distortion case ${D \gg 1}$ (with ${b(D), B(D) \sim 1/D}$ in the limit ${D \rightarrow \infty}$). However, this method had inherent limitations in the low-distortion case, in particular failing for ${D < 3}$ (basically because the method would establish a stronger embedding result which fails in that regime).

In this paper, we introduce a variant of the random partition tree method, which involves a random fragmentation at a randomly chosen set of scales, that works all the way down to ${D>2}$ and gives a clean value for the exponent ${b(D)}$, namely ${2e/D}$; in fact we have the more precise result that we can take ${b(D)}$ to be ${1-\theta}$, where ${0 < \theta < 1}$ is the unique solution to the equation

$\displaystyle \frac{2}{D} = (1-\theta) \theta^{\theta/(1-\theta)},$

and that this is the limit of the method if one chooses a “scale-oblivious” approach that applies uniformly to all metric spaces, ignoring the internal structure of each individual metric space.

The construction ends up to be relatively simple (taking about three pages). The basic idea is as follows. Pick two scales ${R > r > 0}$, and let ${x_1, x_2, x_3, \ldots}$ be a random sequence of points in ${X}$ (of course, being finite, every point in ${X}$ will almost surely be visited infinitely often by this sequence). We can then partition ${X}$ into pieces ${Y_1, Y_2, \ldots}$, by defining ${Y_m}$ to be the set of points ${x}$ for which ${x_m}$ is the first point in the sequence to lie in the ball ${B(x,R)}$. We then let ${S_m}$ be the subset of ${Y_m}$ in which ${x_m}$ actually falls in the smaller ball ${B(x,r)}$. As a consequence, we see that each ${S_m}$ is contained in a ball of radius ${r}$ (namely ${B(x_m,r)}$), but that any two ${S_m, S_{m'}}$ are separated by a distance of at least ${R-r}$ (from the triangle inequality). This gives one layer of a quasi-ultrametric tree structure on ${\bigcup_m S_m}$; if one iterates this over many different pairs of scales ${R, r}$, one gets a full quasi-ultrametric tree structure, which one can then adjust with bounded distortion to a genuine ultrametric structure. The game is then to optimise the choice of ${R, r}$ so as to maximise the portion of ${X}$ that remains in the tree; it turns out that a suitably random choice of such scales is the optimal one.

Assaf Naor and I have just uploaded to the arXiv our paper “Random Martingales and localization of maximal inequalities“, to be submitted shortly. This paper investigates the best constant in generalisations of the classical Hardy-Littlewood maximal inequality

for any absolutely integrable ${f: {\mathbb R}^n \rightarrow {\mathbb R}}$, where ${B(x,r)}$ is the Euclidean ball of radius ${r}$ centred at ${x}$, and ${|E|}$ denotes the Lebesgue measure of a subset ${E}$ of ${{\mathbb R}^n}$. This inequality is fundamental to a large part of real-variable harmonic analysis, and in particular to Calderón-Zygmund theory. A similar inequality in fact holds with the Euclidean norm replaced by any other convex norm on ${{\mathbb R}^n}$.

The exact value of the constant ${C_n}$ is only known in ${n=1}$, with a remarkable result of Melas establishing that ${C_1 = \frac{11+\sqrt{61}}{12}}$. Classical covering lemma arguments give the exponential upper bound ${C_n \leq 2^n}$ when properly optimised (a direct application of the Vitali covering lemma gives ${C_n \leq 5^n}$, but one can reduce ${5}$ to ${2}$ by being careful). In an important paper of Stein and Strömberg, the improved bound ${C_n = O( n \log n )}$ was obtained for any convex norm by a more intricate covering norm argument, and the slight improvement ${C_n = O(n)}$ obtained in the Euclidean case by another argument more adapted to the Euclidean setting that relied on heat kernels. In the other direction, a recent result of Aldaz shows that ${C_n \rightarrow \infty}$ in the case of the ${\ell^\infty}$ norm, and in fact in an even more recent preprint of Aubrun, the lower bound ${C_n \gg_\epsilon \log^{1-\epsilon} n}$ for any ${\epsilon > 0}$ has been obtained in this case. However, these lower bounds do not apply in the Euclidean case, and one may still conjecture that ${C_n}$ is in fact uniformly bounded in this case.

Unfortunately, we do not make direct progress on these problems here. However, we do show that the Stein-Strömberg bound ${C_n = O(n \log n)}$ is extremely general, applying to a wide class of metric measure spaces obeying a certain “microdoubling condition at dimension ${n}$“; and conversely, in such level of generality, it is essentially the best estimate possible, even with additional metric measure hypotheses on the space. Thus, if one wants to improve this bound for a specific maximal inequality, one has to use specific properties of the geometry (such as the connections between Euclidean balls and heat kernels). Furthermore, in the general setting of metric measure spaces, one has a general localisation principle, which roughly speaking asserts that in order to prove a maximal inequality over all scales ${r \in (0,+\infty)}$, it suffices to prove such an inequality in a smaller range ${r \in [R, nR]}$ uniformly in ${R>0}$. It is this localisation which ultimately explains the significance of the ${n \log n}$ growth in the Stein-Strömberg result (there are ${O(n \log n)}$ essentially distinct scales in any range ${[R,nR]}$). It also shows that if one restricts the radii ${r}$ to a lacunary range (such as powers of ${2}$), the best constant improvees to ${O(\log n)}$; if one restricts the radii to an even sparser range such as powers of ${n}$, the best constant becomes ${O(1)}$.

Yehuda Shalom and I have just uploaded to the arXiv our paper “A finitary version of Gromov’s polynomial growth theorem“, to be submitted to Geom. Func. Anal..  The purpose of this paper is to establish a quantitative version of Gromov’s polynomial growth theorem which, among other things, is meaningful for finite groups.   Here is a statement of Gromov’s theorem:

Gromov’s theorem. Let $G$ be a group generated by a finite (symmetric) set $S$, and suppose that one has the polynomial growth condition

$|B_S(R)| \leq R^d$ (1)

for all sufficiently large $R$ and some fixed $d$, where $B_S(R)$ is the ball of radius $R$ generated by $S$ (i.e. the set of all words in $S$ of length at most $d$, evaluated in $G$).  Then $G$ is virtually nilpotent, i.e. it has a finite index subgroup $H$ which is nilpotent of some finite step $s$.

As currently stated, Gromov’s theorem is qualitative rather than quantitative; it does not specify any relationship between the input data (the growth exponent $d$ and the range of scales $R$ for which one has (1)), and the output parameters (in particular, the index $|G/H|$ of the nilpotent subgroup $H$ of $G$, and the step $s$ of that subgroup).  However, a compactness argument (sketched in this previous blog post) shows that some such relationship must exist; indeed, if one has (1) for all $R_0 \leq R \leq C( R_0, d )$ for some sufficiently large $C(R_0,d)$, then one can ensure $H$ has index at most $C'(R_0,d)$ and step at most $C''(R_0,d)$ for some quantities $C'(R_0,d)$, $C''(R_0,d)$; thus Gromov’s theorem is inherently a “local” result which only requires one to multiply the generator set $S$ a finite number $C(R_0,d)$ of times before one sees the virtual nilpotency of the group.  However, the compactness argument does not give an explicit value to the quantities $C(R_0,d), C'(R_0,d), C''(R_0,d)$, and the nature of Gromov’s proof (using, in particular, the deep Montgomery-Zippin-Yamabe theory on Hilbert’s fifth problem) does not easily allow such an explicit value to be extracted.

Another point is that the original formulation of Gromov’s theorem required the polynomial bound (1) at all sufficiently large scales $R$.  A later proof of this theorem by van den Dries and Wilkie relaxed this hypothesis to requiring (1) just for infinitely many scales $R$; the later proof by Kleiner (which I blogged about here) also has this relaxed hypothesis.

Our main result reduces the hypothesis (1) to a single large scale, and makes most of the qualitative dependencies in the theorem quantitative:

Theorem 1. If (1) holds for some $R > \exp(\exp(C d^C))$ for some sufficiently large absolute constant $C$, then $G$ contains a finite index subgroup $H$ which is nilpotent of step at most $C^d$.

The argument does in principle provide a bound on the index of $H$ in $G$, but it is very poor (of Ackermann type).  If instead one is willing to relax “nilpotent” to “polycyclic“, the bounds on the index are somewhat better (of tower exponential type), though still far from ideal.

There is a related finitary analogue of Gromov’s theorem by Makarychev and Lee, which asserts that any finite group of uniformly polynomial growth has a subgroup with a large abelianisation.  The quantitative bounds in that result are quite strong, but on the other hand the hypothesis is also strong (it requires upper and lower bounds of the form (1) at all scales) and the conclusion is a bit weaker than virtual nilpotency.  The argument is based on a modification of Kleiner’s proof.

Our argument also proceeds by modifying Kleiner’s proof of Gromov’s theorem (a significant fraction of which was already quantitative), and carefully removing all of the steps which require one to take an asymptotic limit.  To ease this task, we look for the most elementary arguments available for each step of the proof (thus consciously avoiding powerful tools such as the Tits alternative).  A key technical issue is that because there is only a single scale $R$ for which one has polynomial growth, one has to work at scales significantly less than $R$ in order to have any chance of keeping control of the various groups and other objects being generated.

Below the fold, I discuss a stripped down version of Kleiner’s argument, and then how we convert it to a fully finitary argument.

I am posting the last two talks in my Clay-Mahler lecture series here:

[Update, Sep 14: Poincaré conjecture slides revised.]

[Update, Sep 18: Prime slides revised also.]

The celebrated Szemerédi-Trotter theorem gives a bound for the set of incidences ${I(P,L) := \{ (p,\ell) \in P \times L: p \in \ell \}}$ between a finite set of points ${P}$ and a finite set of lines ${L}$ in the Euclidean plane ${{\bf R}^2}$. Specifically, the bound is

$\displaystyle |I(P,L)| \ll |P|^{2/3} |L|^{2/3} + |P| + |L| \ \ \ \ \ (1)$

where we use the asymptotic notation ${X \ll Y}$ or ${X=O(Y)}$ to denote the statement that ${X \leq CY}$ for some absolute constant ${C}$. In particular, the number of incidences between ${n}$ points and ${n}$ lines is ${O(n^{4/3})}$. This bound is sharp; consider for instance the discrete box ${P := \{ (a,b) \in {\bf Z}^2: 1 \leq a \leq N; 1 \leq b \leq 2N^2 \}}$ with ${L}$ being the collection of lines ${\{ (x,mx+b): m, b \in {\bf Z}, 1 \leq m \leq N, 1 \leq b \leq N^2 \}}$. One easily verifies that ${|P|=2N^3}$, ${|L| = N^3}$, and ${|I(P,L)| = N^4}$, showing that (1) is essentially sharp in the case ${|P| \sim |L|}$; one can concoct similar examples for other regimes of ${|P|}$ and ${|L|}$.

On the other hand, if one replaces the Euclidean plane ${{\bf R}^2}$ by a finite field geometry ${F^2}$, where ${F}$ is a finite field, then the estimate (1) is false. For instance, if ${P}$ is the entire plane ${F^2}$, and ${L}$ is the set of all lines in ${F^2}$, then ${|P|, |L|}$ are both comparable to ${|F|^2}$, but ${|I(P,L)|}$ is comparable to ${|F|^3}$, thus violating (1) when ${|F|}$ is large. Thus any proof of the Szemerédi-Trotter theorem must use a special property of the Euclidean plane which is not enjoyed by finite field geometries. In particular, this strongly suggests that one cannot rely purely on algebra and combinatorics to prove (1); one must also use some Euclidean geometry or topology as well.

Nowadays, the slickest proof of the Szemerédi-Trotter theorem is via the crossing number inequality (as discussed in this previous post), which ultimately relies on Euler’s famous formula ${|V|-|E|+|F|=2}$; thus in this argument it is topology which is the feature of Euclidean space which one is exploiting, and which is not present in the finite field setting. Today, though, I would like to mention a different proof (closer in spirit to the original proof of Szemerédi-Trotter, and also a later argument of Clarkson et al.), based on the method of cell decomposition, which has proven to be a very flexible method in combinatorial incidence geometry. Here, the distinctive feature of Euclidean geometry one is exploiting is convexity, which again has no finite field analogue.

Roughly speaking, the idea is this. Using nothing more than the axiom that two points determine at most one line, one can obtain the bound

$\displaystyle |I(P,L)| \ll |P| |L|^{1/2} + |L|, \ \ \ \ \ (2)$

which is inferior to (1). (On the other hand, this estimate works in both Euclidean and finite field geometries, and is sharp in the latter case, as shown by the example given earlier.) Dually, the axiom that two lines determine at most one point gives the bound

$\displaystyle |I(P,L)| \ll |L| |P|^{1/2} + |P| \ \ \ \ \ (3)$

(or alternatively, one can use projective duality to interchange points and lines and deduce (3) from (2)).

An inspection of the proof of (2) shows that it is only expected to be sharp when the bushes ${L_p := \{ \ell \in L: \ell \ni p \}}$ associated to each point ${p \in P}$ behave like “independent” subsets of ${L}$, so that there is no significant correlation between the bush ${L_p}$ of one point and the bush of another point ${L_q}$.

However, in Euclidean space, we have the phenomenon that the bush of a point ${L_p}$ is influenced by the region of space that ${p}$ lies in. Clearly, if ${p}$ lies in a set ${\Omega}$ (e.g. a convex polygon), then the only lines ${\ell \in L}$ that can contribute to ${L_p}$ are those lines which pass through ${\Omega}$. If ${\Omega}$ is a small convex region of space, one expects only a fraction of the lines in ${L}$ to actually pass through ${\Omega}$. As such, if ${p}$ and ${q}$ both lie in ${\Omega}$, then ${L_p}$ and ${L_q}$ are compressed inside a smaller subset of ${L}$, namely the set of lines passing through ${\Omega}$, and so should be more likely to intersect than if they were independent. This should lead to an improvement to (2) (and indeed, as we shall see below, ultimately leads to (1)).

More formally, the argument proceeds by applying the following lemma:

Lemma 1 (Cell decomposition) Let ${L}$ be a finite collection of lines in ${{\bf R}^2}$, let ${P}$ be a finite set of points, and let ${r \geq 1}$. Then it is possible to find a set ${R}$ of ${O(r)}$ lines in ${L}$, plus some additional open line segments not containing any point in ${P}$, which subdivide ${{\bf R}^2}$ into ${O(r^2)}$ convex regions (or cells), such that the interior of each such cell is incident to at most ${O(|L|/r)}$ lines.

The deduction of (1) from (2), (3) and Lemma 1 is very quick. Firstly we may assume we are in the range

$\displaystyle |L|^{1/2} \ll |P| \ll |L|^2 \ \ \ \ \ (4)$

otherwise the bound (1) follows already from either (2) or (3) and some high-school algebra.

Let ${r \geq 1}$ be a parameter to be optimised later. We apply the cell decomposition to subdivide ${{\bf R}^2}$ into ${O(r^2)}$ open convex regions, plus a family ${R}$ of ${O(r)}$ lines. Each of the ${O(r^2)}$ convex regions ${\Omega}$ has only ${O(|L|/r)}$ lines through it, and so by (2) contributes ${O( |P \cap \Omega| |L|^{1/2}/r^{1/2} + |L| / r )}$ incidences. Meanwhile, on each of the lines ${\ell}$ in ${R}$ used to perform this decomposition, there are at most ${|L|}$ transverse incidences (because each line in ${L}$ distinct from ${\ell}$ can intersect ${\ell}$ at most once), plus all the incidences along ${\ell}$ itself. Putting all this together, one obtains

$\displaystyle |I(P,L)| \leq |I(P,L \cap R)| + O( |P| |L|^{1/2}/r^{1/2} + |L| r).$

We optimise this by selecting ${r \sim |P|^{2/3} / |L|^{1/3}}$; from (4) we can ensure that ${r \leq |L|/2}$, so that ${|L \cap R| \leq |L|/2}$. One then obtains

$\displaystyle |I(P,L)| \leq |I(P,L \cap R)| + O( |P|^{2/3} |L|^{2/3} ).$

We can iterate away the ${L \cap R}$ error (halving the number of lines each time) and sum the resulting geometric series to obtain (1).

It remains to prove (1). If one subdivides ${{\bf R}^2}$ using ${r}$ arbitrary lines, one creates at most ${O(r^2)}$ cells (because each new line intersects the existing lines at most once, and so can create at most ${O(r)}$ distinct cells), and for a similar reason, every line in ${L}$ visits at most ${r}$ of these regions, and so by double counting one expects ${O(|L|/r)}$ lines per cell “on the average”. The key difficulty is then to get ${O(|L|/r)}$ lines through every cell, not just on the average. It turns out that a probabilistic argument will almost work, but with a logarithmic loss (thus having ${O( |L| \log |L| / r )}$ lines per cell rather than ${O(|L|/r)}$); but with a little more work one can then iterate away this loss also. The arguments here are loosely based on those of Clarkson et al.; a related (deterministic) decomposition also appears in the original paper of Szemerédi and Trotter. But I wish to focus here on the probabilistic approach.)

It is also worth noting that the original (somewhat complicated) argument of Szemerédi-Trotter has been adapted to establish the analogue of (1) in the complex plane ${{\bf C}^2}$ by Toth, while the other known proofs of Szemerédi-Trotter, so far, have not been able to be extended to this setting (the Euler characteristic argument clearly breaks down, as does any proof based on using lines to divide planes into half-spaces). So all three proofs have their advantages and disadvantages.