You are currently browsing the category archive for the ‘math.CA’ category.

Two weeks ago I was at Oberwolfach, for the Arbeitsgemeinschaft in Ergodic Theory and Combinatorial Number Theory that I was one of the organisers for. At this workshop, I learned the details of a very nice recent convergence result of Miguel Walsh (who, incidentally, is an informal grandstudent of mine, as his advisor, Roman Sasyk, was my informal student), which considerably strengthens and generalises a number of previous convergence results in ergodic theory (including one of my own), with a remarkably simple proof. Walsh’s argument is phrased in a finitary language (somewhat similar, in fact, to the approach used in my paper mentioned previously), and (among other things) relies on the concept of metastability of sequences, a variant of the notion of convergence which is useful in situations in which one does not expect a uniform convergence rate; see this previous blog post for some discussion of metastability. When interpreted in a finitary setting, this concept requires a fair amount of “epsilon management” to manipulate; also, Walsh’s argument uses some other epsilon-intensive finitary arguments, such as a decomposition lemma of Gowers based on the Hahn-Banach theorem. As such, I was tempted to try to rewrite Walsh’s argument in the language of nonstandard analysis to see the extent to which these sorts of issues could be managed. As it turns out, the argument gets cleaned up rather nicely, with the notion of metastability being replaced with the simpler notion of external Cauchy convergence (which we will define below the fold).

Let’s first state Walsh’s theorem. This theorem is a norm convergence theorem in ergodic theory, and can be viewed as a substantial generalisation of one of the most fundamental theorems of this type, namely the mean ergodic theorem:

Theorem 1 (Mean ergodic theorem) Let ${(X,\mu,T)}$ be a measure-preserving system (a probability space ${(X,\mu)}$ with an invertible measure-preserving transformation ${T}$). Then for any ${f \in L^2(X,\mu)}$, the averages ${\frac{1}{N} \sum_{n=1}^N T^n f}$ converge in ${L^2(X,\mu)}$ norm as ${N \rightarrow \infty}$, where ${T^n f(x) := f(T^{-n} x)}$.

In this post, all functions in ${L^2(X,\mu)}$ and similar spaces will be taken to be real instead of complex-valued for simplicity, though the extension to the complex setting is routine.

Actually, we have a precise description of the limit of these averages, namely the orthogonal projection of ${f}$ to the ${T}$-invariant factors. (See for instance my lecture notes on this theorem.) While this theorem ostensibly involves measure theory, it can be abstracted to the more general setting of unitary operators on a Hilbert space:

Theorem 2 (von Neumann mean ergodic theorem) Let ${H}$ be a Hilbert space, and let ${U: H \rightarrow H}$ be a unitary operator on ${H}$. Then for any ${f \in H}$, the averages ${\frac{1}{N} \sum_{n=1}^N U^n f}$ converge strongly in ${H}$ as ${N \rightarrow \infty}$.

Again, see my lecture notes (or just about any text in ergodic theory) for a proof.

Now we turn to Walsh’s theorem.

Theorem 3 (Walsh’s convergence theorem) Let ${(X,\mu)}$ be a measure space with a measure-preserving action of a nilpotent group ${G}$. Let ${g_1,\ldots,g_k: {\bf Z} \rightarrow G}$ be polynomial sequences in ${G}$ (i.e. each ${g_i}$ takes the form ${g_i(n) = a_{i,1}^{p_{i,1}(n)} \ldots a_{i,j}^{p_{i,j}(n)}}$ for some ${a_{i,1},\ldots,a_{i,j} \in G}$ and polynomials ${p_{i,1},\ldots,p_{i,j}: {\bf Z} \rightarrow {\bf Z}}$). Then for any ${f_1,\ldots,f_k \in L^\infty(X,\mu)}$, the averages ${\frac{1}{N} \sum_{n=1}^N (g_1(n) f_1) \ldots (g_k(n) f_k)}$ converge in ${L^2(X,\mu)}$ norm as ${N \rightarrow \infty}$, where ${g(n) f(x) := f(g(n)^{-1} x)}$.

It turns out that this theorem can also be abstracted to some extent, although due to the multiplication in the summand ${(g_1(n) f_1) \ldots (g_k(n) f_k)}$, one cannot work purely with Hilbert spaces as in the von Neumann mean ergodic theorem, but must also work with something like the Banach algebra ${L^\infty(X,\mu)}$. There are a number of ways to formulate this abstraction (which will be of some minor convenience to us, as it will allow us to reduce the need to invoke the nonstandard measure theory of Loeb, discussed for instance in this blog post); we will use the notion of a (real) commutative probability space ${({\mathcal A},\tau)}$, which for us will be a commutative unital algebra ${{\mathcal A}}$ over the reals together with a linear functional ${\tau: {\mathcal A} \rightarrow {\bf R}}$ which maps ${1}$ to ${1}$ and obeys the non-negativity axiom ${\tau(f^2) \ge 0}$ for all ${f}$. The key example to keep in mind here is ${{\mathcal A} = L^\infty(X,\mu)}$ of essentially bounded real-valued measurable functions with the supremum norm, and with the trace ${\tau(f) := \int_X f\ d\mu}$. We will also assume in our definition of commutative probability spaces that all elements ${f}$ of ${{\mathcal A}}$ are bounded in the sense that the spectral radius ${\rho(f) := \lim_{k \rightarrow \infty} \tau(f^{2k})^{1/2k}}$ is finite. (In the concrete case of ${L^\infty(X,\mu)}$, the spectral radius is just the ${L^\infty}$ norm.)

Given a commutative probability space, we can form an inner product ${\langle, \rangle_{L^2(\tau)}}$ on it by the formula

$\displaystyle \langle f, g \rangle_{L^2(\tau)} := \tau(fg).$

This is a positive semi-definite form, and gives a (possibly degenerate) inner product structure on ${{\mathcal A}}$. We could complete this structure into a Hilbert space ${L^2(\tau)}$ (after quotienting out the elements of zero norm), but we will not do so here, instead just viewing ${L^2(\tau)}$ as providing a semi-metric on ${{\mathcal A}}$. For future reference we record the inequalities

$\displaystyle \rho(fg) \leq \rho(f) \rho(g)$

$\displaystyle \rho(f+g) \leq \rho(f) + \rho(g)$

$\displaystyle \| fg\|_{L^2(\tau)} \leq \|f\|_{L^2(\tau)} \rho(g)$

for any ${f,g}$, which we will use in the sequel without further comment; see e.g. these previous blog notes for proofs. (Actually, for the purposes of proving Theorem 3, one can specialise to the ${L^\infty(X,\mu)}$ case (and ultraproducts thereof), in which case these inequalities are just the triangle and Hölder inequalities.)

The abstract version of Theorem 3 is then

Theorem 4 (Walsh’s theorem, abstract version) Let ${({\mathcal A},\tau)}$ be a commutative probability space, and let ${G}$ be a nilpotent group acting on ${{\mathcal A}}$ by isomorphisms (preserving the algebra, conjugation, and trace structure, and thus also preserving the spectral radius and ${L^2(\tau)}$ norm). Let ${g_1,\ldots,g_k: {\bf Z} \rightarrow G}$ be polynomial sequences. Then for any ${f_1,\ldots,f_k \in {\mathcal A}}$, the averages ${\frac{1}{N} \sum_{n=1}^N (g_1(n) f_1) \ldots (g_k(n) f_k)}$ form a Cauchy sequence in ${L^2(\tau)}$ (semi-)norm as ${N \rightarrow \infty}$.

It is easy to see that this theorem generalises Theorem 3. Conversely, one can use the commutative Gelfand-Naimark theorem to deduce Theorem 4 from Theorem 3, although we will not need this implication. Note how we are abandoning all attempts to discern what the limit of the sequence actually is, instead contenting ourselves with demonstrating that it is merely a Cauchy sequence. With this phrasing, it is tempting to ask whether there is any analogue of Walsh’s theorem for noncommutative probability spaces, but unfortunately the answer to that question is negative for all but the simplest of averages, as was worked out in this paper of Austin, Eisner, and myself.

Our proof of Theorem 4 will proceed as follows. Firstly, in order to avoid the epsilon management alluded to earlier, we will take an ultraproduct to rephrase the theorem in the language of nonstandard analysis; for reasons that will be clearer later, we will also convert the convergence problem to a problem of obtaining metastability (external Cauchy convergence). Then, we observe that (the nonstandard counterpart of) the expression ${\|\frac{1}{N} \sum_{n=1}^N (g_1(n) f_1) \ldots (g_k(n) f_k)\|_{L^2(\tau)}^2}$ can be viewed as the inner product of (say) ${f_k}$ with a certain type of expression, which we call a dual function. By performing an orthogonal projection to the span of the dual functions, we can split ${f_k}$ into the sum of an expression orthogonal to all dual functions (the “pseudorandom” component), and a function that can be well approximated by finite linear combinations of dual functions (the “structured” component). The contribution of the pseudorandom component is asymptotically negligible, so we can reduce to consideration of the structured component. But by a little bit of rearrangement, this can be viewed as an average of expressions similar to the initial average ${\frac{1}{N} \sum_{n=1}^N (g_1(n) f_1) \ldots (g_k(n) f_k)}$, except with the polynomials ${g_1,\ldots,g_k}$ replaced by a “lower complexity” set of such polynomials, which can be greater in number, but which have slightly lower degrees in some sense. One can iterate this (using “PET induction”) until all the polynomials become trivial, at which point the claim follows.

One of the basic problems in the field of operator algebras is to develop a functional calculus for either a single operator ${A}$, or a collection ${A_1, A_2, \ldots, A_k}$ of operators. These operators could in principle act on any function space, but typically one either considers complex matrices (which act on a complex finite dimensional space), or operators (either bounded or unbounded) on a complex Hilbert space. (One can of course also obtain analogous results for real operators, but we will work throughout with complex operators in this post.)

Roughly speaking, a functional calculus is a way to assign an operator ${F(A)}$ or ${F(A_1,\ldots,A_k)}$ to any function ${F}$ in a suitable function space, which is linear over the complex numbers, preserve the scalars (i.e. ${c(A) = c}$ when ${c \in {\bf C}}$), and should be either an exact or approximate homomorphism in the sense that

$\displaystyle FG(A_1,\ldots,A_k) = F(A_1,\ldots,A_k) G(A_1,\ldots,A_k), \ \ \ \ \ (1)$

should hold either exactly or approximately. In the case when the ${A_i}$ are self-adjoint operators acting on a Hilbert space (or Hermitian matrices), one often also desires the identity

$\displaystyle \overline{F}(A_1,\ldots,A_k) = F(A_1,\ldots,A_k)^* \ \ \ \ \ (2)$

to also hold either exactly or approximately. (Note that one cannot reasonably expect (1) and (2) to hold exactly for all ${F,G}$ if the ${A_1,\ldots,A_k}$ and their adjoints ${A_1^*,\ldots,A_k^*}$ do not commute with each other, so in those cases one has to be willing to allow some error terms in the above wish list of properties of the calculus.) Ideally, one should also be able to relate the operator norm of ${f(A)}$ or ${f(A_1,\ldots,A_k)}$ with something like the uniform norm on ${f}$. In principle, the existence of a good functional calculus allows one to manipulate operators as if they were scalars (or at least approximately as if they were scalars), which is very helpful for a number of applications, such as partial differential equations, spectral theory, noncommutative probability, and semiclassical mechanics. A functional calculus for multiple operators ${A_1,\ldots,A_k}$ can be particularly valuable as it allows one to treat ${A_1,\ldots,A_k}$ as being exact or approximate scalars simultaneously. For instance, if one is trying to solve a linear differential equation that can (formally at least) be expressed in the form

$\displaystyle F(X,D) u = f$

for some data ${f}$, unknown function ${u}$, some differential operators ${X,D}$, and some nice function ${F}$, then if one’s functional calculus is good enough (and ${F}$ is suitably “elliptic” in the sense that it does not vanish or otherwise degenerate too often), one should be able to solve this equation either exactly or approximately by the formula

$\displaystyle u = F^{-1}(X,D) f,$

which is of course how one would solve this equation if one pretended that the operators ${X,D}$ were in fact scalars. Formalising this calculus rigorously leads to the theory of pseudodifferential operators, which allows one to (approximately) solve or at least simplify a much wider range of differential equations than one what can achieve with more elementary algebraic transformations (e.g. integrating factors, change of variables, variation of parameters, etc.). In quantum mechanics, a functional calculus that allows one to treat operators as if they were approximately scalar can be used to rigorously justify the correspondence principle in physics, namely that the predictions of quantum mechanics approximate that of classical mechanics in the semiclassical limit ${\hbar \rightarrow 0}$.

There is no universal functional calculus that works in all situations; the strongest functional calculi, which are close to being an exact *-homomorphisms on very large class of functions, tend to only work for under very restrictive hypotheses on ${A}$ or ${A_1,\ldots,A_k}$ (in particular, when ${k > 1}$, one needs the ${A_1,\ldots,A_k}$ to commute either exactly, or very close to exactly), while there are weaker functional calculi which have fewer nice properties and only work for a very small class of functions, but can be applied to quite general operators ${A}$ or ${A_1,\ldots,A_k}$. In some cases the functional calculus is only formal, in the sense that ${f(A)}$ or ${f(A_1,\ldots,A_k)}$ has to be interpreted as an infinite formal series that does not converge in a traditional sense. Also, when one wishes to select a functional calculus on non-commuting operators ${A_1,\ldots,A_k}$, there is a certain amount of non-uniqueness: one generally has a number of slightly different functional calculi to choose from, which generally have the same properties but differ in some minor technical details (particularly with regards to the behaviour of “lower order” components of the calculus). This is a similar to how one has a variety of slightly different coordinate systems available to parameterise a Riemannian manifold or Lie group. This is on contrast to the ${k=1}$ case when the underlying operator ${A = A_1}$ is (essentially) normal (so that ${A}$ commutes with ${A^*}$); in this special case (which includes the important subcases when ${A}$ is unitary or (essentially) self-adjoint), spectral theory gives us a canonical and very powerful functional calculus which can be used without further modification in applications.

Despite this lack of uniqueness, there is one standard choice for a functional calculus available for general operators ${A_1,\ldots,A_k}$, namely the Weyl functional calculus; it is analogous in some ways to normal coordinates for Riemannian manifolds, or exponential coordinates of the first kind for Lie groups, in that it treats lower order terms in a reasonably nice fashion. (But it is important to keep in mind that, like its analogues in Riemannian geometry or Lie theory, there will be some instances in which the Weyl calculus is not the optimal calculus to use for the application at hand.)

I decided to write some notes on the Weyl functional calculus (also known as Weyl quantisation), and to sketch the applications of this calculus both to the theory of pseudodifferential operators. They are mostly for my own benefit (so that I won’t have to redo these particular calculations again), but perhaps they will also be of interest to some readers here. (Of course, this material is also covered in many other places. e.g. Folland’s “harmonic analysis in phase space“.)

Nonstandard analysis is a mathematical framework in which one extends the standard mathematical universe ${{\mathfrak U}}$ of standard numbers, standard sets, standard functions, etc. into a larger nonstandard universe ${{}^* {\mathfrak U}}$ of nonstandard numbers, nonstandard sets, nonstandard functions, etc., somewhat analogously to how one places the real numbers inside the complex numbers, or the rationals inside the reals. This nonstandard universe enjoys many of the same properties as the standard one; in particular, we have the transfer principle that asserts that any statement in the language of first order logic is true in the standard universe if and only if it is true in the nonstandard one. (For instance, because Fermat’s last theorem is known to be true for standard natural numbers, it is automatically true for nonstandard natural numbers as well.) However, the nonstandard universe also enjoys some additional useful properties that the standard one does not, most notably the countable saturation property, which is a property somewhat analogous to the completeness property of a metric space; much as metric completeness allows one to assert that the intersection of a countable family of nested closed balls is non-empty, countable saturation allows one to assert that the intersection of a countable family of nested satisfiable formulae is simultaneously satisfiable. (See this previous blog post for more on the analogy between the use of nonstandard analysis and the use of metric completions.) Furthermore, by viewing both the standard and nonstandard universes externally (placing them both inside a larger metatheory, such as a model of Zermelo-Frankel-Choice (ZFC) set theory; in some more advanced set-theoretic applications one may also wish to add some large cardinal axioms), one can place some useful additional definitions and constructions on these universes, such as defining the concept of an infinitesimal nonstandard number (a number which is smaller in magnitude than any positive standard number). The ability to rigorously manipulate infinitesimals is of course one of the most well-known advantages of working with nonstandard analysis.

To build a nonstandard universe ${{}^* {\mathfrak U}}$ from a standard one ${{\mathfrak U}}$, the most common approach is to take an ultrapower of ${{\mathfrak U}}$ with respect to some non-principal ultrafilter over the natural numbers; see e.g. this blog post for details. Once one is comfortable with ultrafilters and ultrapowers, this becomes quite a simple and elegant construction, and greatly demystifies the nature of nonstandard analysis.

On the other hand, nonprincipal ultrafilters do have some unappealing features. The most notable one is that their very existence requires the axiom of choice (or more precisely, a weaker form of this axiom known as the boolean prime ideal theorem). Closely related to this is the fact that one cannot actually write down any explicit example of a nonprincipal ultrafilter, but must instead rely on nonconstructive tools such as Zorn’s lemma, the Hahn-Banach theorem, Tychonoff’s theorem, the Stone-Cech compactification, or the boolean prime ideal theorem to locate one. As such, ultrafilters definitely belong to the “infinitary” side of mathematics, and one may feel that it is inappropriate to use such tools for “finitary” mathematical applications, such as those which arise in hard analysis. From a more practical viewpoint, because of the presence of the infinitary ultrafilter, it can be quite difficult (though usually not impossible, with sufficient patience and effort) to take a finitary result proven via nonstandard analysis and coax an effective quantitative bound from it.

There is however a “cheap” version of nonstandard analysis which is less powerful than the full version, but is not as infinitary in that it is constructive (in the sense of not requiring any sort of choice-type axiom), and which can be translated into standard analysis somewhat more easily than a fully nonstandard argument; indeed, a cheap nonstandard argument can often be presented (by judicious use of asymptotic notation) in a way which is nearly indistinguishable from a standard one. It is obtained by replacing the nonprincipal ultrafilter in fully nonstandard analysis with the more classical Fréchet filter of cofinite subsets of the natural numbers, which is the filter that implicitly underlies the concept of the classical limit ${\lim_{{\bf n} \rightarrow \infty} a_{\bf n}}$ of a sequence when the underlying asymptotic parameter ${{\bf n}}$ goes off to infinity. As such, “cheap nonstandard analysis” aligns very well with traditional mathematics, in which one often allows one’s objects to be parameterised by some external parameter such as ${{\bf n}}$, which is then allowed to approach some limit such as ${\infty}$. The catch is that the Fréchet filter is merely a filter and not an ultrafilter, and as such some of the key features of fully nonstandard analysis are lost. Most notably, the law of the excluded middle does not transfer over perfectly from standard analysis to cheap nonstandard analysis; much as there exist bounded sequences of real numbers (such as ${0,1,0,1,\ldots}$) which do not converge to a (classical) limit, there exist statements in cheap nonstandard analysis which are neither true nor false (at least without passing to a subsequence, see below). The loss of such a fundamental law of mathematical reasoning may seem like a major disadvantage for cheap nonstandard analysis, and it does indeed make cheap nonstandard analysis somewhat weaker than fully nonstandard analysis. But in some situations (particularly when one is reasoning in a “constructivist” or “intuitionistic” fashion, and in particular if one is avoiding too much reliance on set theory) it turns out that one can survive the loss of this law; and furthermore, the law of the excluded middle is still available for standard analysis, and so one can often proceed by working from time to time in the standard universe to temporarily take advantage of this law, and then transferring the results obtained there back to the cheap nonstandard universe once one no longer needs to invoke the law of the excluded middle. Furthermore, the law of the excluded middle can be recovered by adopting the freedom to pass to subsequences with regards to the asymptotic parameter ${{\bf n}}$; this technique is already in widespread use in the analysis of partial differential equations, although it is generally referred to by names such as “the compactness method” rather than as “cheap nonstandard analysis”.

Below the fold, I would like to describe this cheap version of nonstandard analysis, which I think can serve as a pedagogical stepping stone towards fully nonstandard analysis, as it is formally similar to (though weaker than) fully nonstandard analysis, but on the other hand is closer in practice to standard analysis. As we shall see below, the relation between cheap nonstandard analysis and standard analysis is analogous in many ways to the relation between probabilistic reasoning and deterministic reasoning; it also resembles somewhat the preference in much of modern mathematics for viewing mathematical objects as belonging to families (or to categories) to be manipulated en masse, rather than treating each object individually. (For instance, nonstandard analysis can be used as a partial substitute for scheme theory in order to obtain uniformly quantitative results in algebraic geometry, as discussed for instance in this previous blog post.)

One of the most fundamental principles in Fourier analysis is the uncertainty principle. It does not have a single canonical formulation, but one typical informal description of the principle is that if a function ${f}$ is restricted to a narrow region of physical space, then its Fourier transform ${\hat f}$ must be necessarily “smeared out” over a broad region of frequency space. Some versions of the uncertainty principle are discussed in this previous blog post.

In this post I would like to highlight a useful instance of the uncertainty principle, due to Hugh Montgomery, which is useful in analytic number theory contexts. Specifically, suppose we are given a complex-valued function ${f: {\bf Z} \rightarrow {\bf C}}$ on the integers. To avoid irrelevant issues at spatial infinity, we will assume that the support ${\hbox{supp}(f) := \{ n \in {\bf Z}: f(n) \neq 0 \}}$ of this function is finite (in practice, we will only work with functions that are supported in an interval ${[M+1,M+N]}$ for some natural numbers ${M,N}$). Then we can define the Fourier transform ${\hat f: {\bf R}/{\bf Z} \rightarrow {\bf C}}$ by the formula

$\displaystyle \hat f(\xi) := \sum_{n \in {\bf Z}} f(n) e(-n\xi)$

where ${e(x) := e^{2\pi i x}}$. (In some literature, the sign in the exponential phase is reversed, but this will make no substantial difference to the arguments below.)

The classical uncertainty principle, in this context, asserts that if ${f}$ is localised in an interval of length ${N}$, then ${\hat f}$ must be “smeared out” at a scale of at least ${1/N}$ (and essentially constant at scales less than ${1/N}$). For instance, if ${f}$ is supported in ${[M+1,M+N]}$, then we have the Plancherel identity

$\displaystyle \int_{{\bf R}/{\bf Z}} |\hat f(\xi)|^2\ d\xi = \| f \|_{\ell^2({\bf Z})}^2 \ \ \ \ \ (1)$

while from the Cauchy-Schwarz inequality we have

$\displaystyle |\hat f(\xi)| \leq N^{1/2} \| f \|_{\ell^2({\bf Z})}$

for each frequency ${\xi}$, and in particular that

$\displaystyle \int_I |\hat f(\xi)|^2\ d\xi \leq N |I| \| f \|_{\ell^2({\bf Z})}^2$

for any arc ${I}$ in the unit circle (with ${|I|}$ denoting the length of ${I}$). In particular, an interval of length significantly less than ${1/N}$ can only capture a fraction of the ${L^2}$ energy of the Fourier transform of ${\hat f}$, which is consistent with the above informal statement of the uncertainty principle.

Another manifestation of the classical uncertainty principle is the large sieve inequality. A particularly nice formulation of this inequality is due independently to Montgomery and Vaughan and Selberg: if ${f}$ is supported in ${[M+1,M+N]}$, and ${\xi_1,\ldots,\xi_J}$ are frequencies in ${{\bf R}/{\bf Z}}$ that are ${\delta}$-separated for some ${\delta>0}$, thus ${\| \xi_i-\xi_j\|_{{\bf R}/{\bf Z}} \geq \delta}$ for all ${1 \leq i (where ${\|\xi\|_{{\bf R}/{\bf Z}}}$ denotes the distance of ${\xi}$ to the origin in ${{\bf R}/{\bf Z}}$), then

$\displaystyle \sum_{j=1}^J |\hat f(\xi_j)|^2 \leq (N + \frac{1}{\delta}) \| f \|_{\ell^2({\bf Z})}^2. \ \ \ \ \ (2)$

The reader is encouraged to see how this inequality is consistent with the Plancherel identity (1) and the intuition that ${\hat f}$ is essentially constant at scales less than ${1/N}$. The factor ${N + \frac{1}{\delta}}$ can in fact be amplified a little bit to ${N + \frac{1}{\delta} - 1}$, which is essentially optimal, by using a neat dilation trick of Paul Cohen, in which one dilates ${[M+1,M+N]}$ to ${[MK+K, MK+NK]}$ (and replaces each frequency ${\xi_j}$ by their ${K^{th}}$ roots), and then sending ${K \rightarrow \infty}$ (cf. the tensor product trick); see this survey of Montgomery for details. But we will not need this refinement here.

In the above instances of the uncertainty principle, the concept of narrow support in physical space was formalised in the Archimedean sense, using the standard Archimedean metric ${d_\infty(n,m) := |n-m|}$ on the integers ${{\bf Z}}$ (in particular, the parameter ${N}$ is essentially the Archimedean diameter of the support of ${f}$). However, in number theory, the Archimedean metric is not the only metric of importance on the integers; the ${p}$-adic metrics play an equally important role; indeed, it is common to unify the Archimedean and ${p}$-adic perspectives together into a unified adelic perspective. In the ${p}$-adic world, the metric balls are no longer intervals, but are instead residue classes modulo some power of ${p}$. Intersecting these balls from different ${p}$-adic metrics together, we obtain residue classes with respect to various moduli ${q}$ (which may be either prime or composite). As such, another natural manifestation of the concept of “narrow support in physical space” is “vanishes on many residue classes modulo ${q}$“. This notion of narrowness is particularly common in sieve theory, when one deals with functions supported on thin sets such as the primes, which naturally tend to avoid many residue classes (particularly if one throws away the first few primes).

In this context, the uncertainty principle is this: the more residue classes modulo ${q}$ that ${f}$ avoids, the more the Fourier transform ${\hat f}$ must spread out along multiples of ${1/q}$. To illustrate a very simple example of this principle, let us take ${q=2}$, and suppose that ${f}$ is supported only on odd numbers (thus it completely avoids the residue class ${0 \mod 2}$). We write out the formulae for ${\hat f(\xi)}$ and ${\hat f(\xi+1/2)}$:

$\displaystyle \hat f(\xi) = \sum_n f(n) e(-n\xi)$

$\displaystyle \hat f(\xi+1/2) = \sum_n f(n) e(-n\xi) e(-n/2).$

If ${f}$ is supported on the odd numbers, then ${e(-n/2)}$ is always equal to ${-1}$ on the support of ${f}$, and so we have ${\hat f(\xi+1/2)=-\hat f(\xi)}$. Thus, whenever ${\hat f}$ has a significant presence at a frequency ${\xi}$, it also must have an equally significant presence at the frequency ${\xi+1/2}$; there is a spreading out across multiples of ${1/2}$. Note that one has a similar effect if ${f}$ was supported instead on the even integers instead of the odd integers.

A little more generally, suppose now that ${f}$ avoids a single residue class modulo a prime ${p}$; for sake of argument let us say that it avoids the zero residue class ${0 \mod p}$, although the situation for the other residue classes is similar. For any frequency ${\xi}$ and any ${j=0,\ldots,p-1}$, one has

$\displaystyle \hat f(\xi+j/p) = \sum_n f(n) e(-n\xi) e(-jn/p).$

From basic Fourier analysis, we know that the phases ${e(-jn/p)}$ sum to zero as ${j}$ ranges from ${0}$ to ${p-1}$ whenever ${n}$ is not a multiple of ${p}$. We thus have

$\displaystyle \sum_{j=0}^{p-1} \hat f(\xi+j/p) = 0.$

In particular, if ${\hat f(\xi)}$ is large, then one of the other ${\hat f(\xi+j/p)}$ has to be somewhat large as well; using the Cauchy-Schwarz inequality, we can quantify this assertion in an ${\ell^2}$ sense via the inequality

$\displaystyle \sum_{j=1}^{p-1} |\hat f(\xi+j/p)|^2 \geq \frac{1}{p-1} |\hat f(\xi)|^2. \ \ \ \ \ (3)$

Let us continue this analysis a bit further. Now suppose that ${f}$ avoids ${\omega(p)}$ residue classes modulo a prime ${p}$, for some ${0 \leq \omega(p) < p}$. (We exclude the case ${\omega(p)=p}$ as it is clearly degenerates by forcing ${f}$ to be identically zero.) Let ${\nu(n)}$ be the function that equals ${1}$ on these residue classes and zero away from these residue classes, then

$\displaystyle \sum_n f(n) e(-n\xi) \nu(n) = 0.$

Using the periodic Fourier transform, we can write

$\displaystyle \nu(n) = \sum_{j=0}^{p-1} c(j) e(-jn/p)$

for some coefficients ${c(j)}$, thus

$\displaystyle \sum_{j=0}^{p-1} \hat f(\xi+j/p) c(j) = 0.$

Some Fourier-analytic computations reveal that

$\displaystyle c(0) = \frac{\omega(p)}{p}$

and

$\displaystyle \sum_{j=0}^{p-1} |c(j)|^2 = \frac{\omega(p)}{p}$

and so after some routine algebra and the Cauchy-Schwarz inequality, we obtain a generalisation of (3):

$\displaystyle \sum_{j=1}^{p-1} |\hat f(\xi+j/p)|^2 \geq \frac{\omega(p)}{p-\omega(p)} |\hat f(\xi)|^2.$

Thus we see that the more residue classes mod ${p}$ we exclude, the more Fourier energy has to disperse along multiples of ${1/p}$. It is also instructive to consider the extreme case ${\omega(p)=p-1}$, in which ${f}$ is supported on just a single residue class ${a \mod p}$; in this case, one clearly has ${\hat f(\xi+j/p) = e(-aj/p) \hat f(\xi)}$, and so ${\hat f}$ spreads its energy completely evenly along multiples of ${1/p}$.

In 1968, Montgomery observed the following useful generalisation of the above calculation to arbitrary modulus:

Proposition 1 (Montgomery’s uncertainty principle) Let ${f: {\bf Z} \rightarrow{\bf C}}$ be a finitely supported function which, for each prime ${p}$, avoids ${\omega(p)}$ residue classes modulo ${p}$ for some ${0 \leq \omega(p) < p}$. Then for each natural number ${q}$, one has

$\displaystyle \sum_{1 \leq a \leq q: (a,q)=1} |\hat f(\xi+\frac{a}{q})|^2 \geq h(q) |\hat f(\xi)|^2$

where ${h(q)}$ is the quantity

$\displaystyle h(q) := \mu(q)^2 \prod_{p|q} \frac{\omega(p)}{p-\omega(p)} \ \ \ \ \ (4)$

where ${\mu}$ is the Möbius function.

We give a proof of this proposition below the fold.

Following the “adelic” philosophy, it is natural to combine this uncertainty principle with the large sieve inequality to take simultaneous advantage of localisation both in the Archimedean sense and in the ${p}$-adic senses. This leads to the following corollary:

Corollary 2 (Arithmetic large sieve inequality) Let ${f: {\bf Z} \rightarrow {\bf C}}$ be a function supported on an interval ${[M+1,M+N]}$ which, for each prime ${p}$, avoids ${\omega(p)}$ residue classes modulo ${p}$ for some ${0 \leq \omega(p) < p}$. Let ${\xi_1,\ldots,\xi_J \in {\bf R}/{\bf Z}}$, and let ${{\mathcal Q}}$ be a finite set of natural numbers. Suppose that the frequencies ${\xi_j + \frac{a}{q}}$ with ${1 \leq j \leq J}$, ${q \in {\mathcal Q}}$, and ${(a,q)=1}$ are ${\delta}$-separated. Then one has

$\displaystyle \sum_{j=1}^J |\hat f(\xi_j)|^2 \leq \frac{N + \frac{1}{\delta}}{\sum_{q \in {\mathcal Q}} h(q)} \| f \|_{\ell^2({\bf Z})}^2$

where ${h(q)}$ was defined in (4).

Indeed, from the large sieve inequality one has

$\displaystyle \sum_{j=1}^J \sum_{q \in {\mathcal Q}} \sum_{1 \leq a \leq q: (a,q)=1} |\hat f(\xi_j+\frac{a}{q})|^2 \leq (N + \frac{1}{\delta}) \| f \|_{\ell^2({\bf Z})}^2$

while from Proposition 1 one has

$\displaystyle \sum_{q \in {\mathcal Q}} \sum_{1 \leq a \leq q: (a,q)=1} |\hat f(\xi_j+\frac{a}{q})|^2 \geq (\sum_{q \in {\mathcal Q}} h(q)) |\hat f(\xi_j)|^2$

whence the claim.

There is a great deal of flexibility in the above inequality, due to the ability to select the set ${{\mathcal Q}}$, the frequencies ${\xi_1,\ldots,\xi_J}$, the omitted classes ${\omega(p)}$, and the separation parameter ${\delta}$. Here is a typical application concerning the original motivation for the large sieve inequality, namely in bounding the size of sets which avoid many residue classes:

Corollary 3 (Large sieve) Let ${A}$ be a set of integers contained in ${[M+1,M+N]}$ which avoids ${\omega(p)}$ residue classes modulo ${p}$ for each prime ${p}$, and let ${R>0}$. Then

$\displaystyle |A| \leq \frac{N+R^2}{G(R)}$

where

$\displaystyle G(R) := \sum_{q \leq R} h(q).$

Proof: We apply Corollary 2 with ${f = 1_A}$, ${J=1}$, ${\delta=1/R^2}$, ${\xi_1=0}$, and ${{\mathcal Q} := \{ q: q \leq R\}}$. The key point is that the Farey sequence of fractions ${a/q}$ with ${q \leq R}$ and ${(a,q)=1}$ is ${1/R^2}$-separated, since

$\displaystyle \| \frac{a}{q}-\frac{a'}{q'} \|_{{\bf R}/{\bf Z}} \geq \frac{1}{qq'} \geq \frac{1}{R^2}$

whenever ${a/q, a'/q'}$ are distinct fractions in this sequence. $\Box$

If, for instance, ${A}$ is the set of all primes in ${[M+1,M+N]}$ larger than ${R}$, then one can set ${\omega(p)=1}$ for all ${p \leq R}$, which makes ${h(q) = \frac{\mu^2(q)}{\phi(q)}}$, where ${\phi}$ is the Euler totient function. It is a classical estimate that

$\displaystyle G(R) = \log R + O(1).$

Using this fact and optimising in ${R}$, we obtain (a special case of) the Brun-Titchmarsh inequality

$\displaystyle \pi(M+N)-\pi(M) \leq (2+o_{N \rightarrow \infty}(1)) \frac{N}{\log N}$

where ${\pi(x)}$ is the prime counting function; a variant of the same argument gives the more general Brun-Titchmarsh inequality

$\displaystyle \pi(M+N;a,q)-\pi(M;a,q) \leq (2+o_{N \rightarrow \infty;q}(1)) \frac{q}{\phi(q)} \frac{N}{\log N}$

for any primitive residue class ${a \mod q}$, where ${\pi(M;a,q)}$ is the number of primes less than or equal to ${M}$ that are congruent to ${a \mod q}$. By performing a more careful optimisation using a slightly sharper version of the large sieve inequality (2) that exploits the irregular spacing of the Farey sequence, Montgomery and Vaughan were able to delete the error term in the Brun-Titchmarsh inequality, thus establishing the very nice inequality

$\displaystyle \pi(M+N;a,q)-\pi(M;a,q) \leq 2 \frac{q}{\phi(q)} \frac{N}{\log N}$

for any natural numbers ${M,N,a,q}$ with ${N>1}$. This is a particularly useful inequality in non-asymptotic analytic number theory (when one wishes to study number theory at explicit orders of magnitude, rather than the number theory of sufficiently large numbers), due to the absence of asymptotic notation.

I recently realised that Corollary 2 also establishes a stronger version of the “restriction theorem for the Selberg sieve” that Ben Green and I proved some years ago (indeed, one can view Corollary 2 as a “restriction theorem for the large sieve”). I’m placing the details below the fold.

In the previous set of notes we introduced the notion of expansion in arbitrary ${k}$-regular graphs. For the rest of the course, we will now focus attention primarily to a special type of ${k}$-regular graph, namely a Cayley graph.

Definition 1 (Cayley graph) Let ${G = (G,\cdot)}$ be a group, and let ${S}$ be a finite subset of ${G}$. We assume that ${S}$ is symmetric (thus ${s^{-1} \in S}$ whenever ${s \in S}$) and does not contain the identity ${1}$ (this is to avoid loops). Then the (right-invariant) Cayley graph ${Cay(G,S)}$ is defined to be the graph with vertex set ${G}$ and edge set ${\{ \{sx,x\}: x \in G, s \in S \}}$, thus each vertex ${x \in G}$ is connected to the ${|S|}$ elements ${sx}$ for ${s \in S}$, and so ${Cay(G,S)}$ is a ${|S|}$-regular graph.

Example 1 The graph in Exercise 3 of Notes 1 is the Cayley graph on ${{\bf Z}/N{\bf Z}}$ with generators ${S = \{-1,+1\}}$.

Remark 1 We call the above Cayley graphs right-invariant because every right translation ${x\mapsto xg}$ on ${G}$ is a graph automorphism of ${Cay(G,S)}$. This group of automorphisms acts transitively on the vertex set of the Cayley graph. One can thus view a Cayley graph as a homogeneous space of ${G}$, as it “looks the same” from every vertex. One could of course also consider left-invariant Cayley graphs, in which ${x}$ is connected to ${xs}$ rather than ${sx}$. However, the two such graphs are isomorphic using the inverse map ${x \mapsto x^{-1}}$, so we may without loss of generality restrict our attention throughout to left Cayley graphs.

Remark 2 For minor technical reasons, it will be convenient later on to allow ${S}$ to contain the identity and to come with multiplicity (i.e. it will be a multiset rather than a set). If one does so, of course, the resulting Cayley graph will now contain some loops and multiple edges.

For the purposes of building expander families, we would of course want the underlying group ${G}$ to be finite. However, it will be convenient at various times to “lift” a finite Cayley graph up to an infinite one, and so we permit ${G}$ to be infinite in our definition of a Cayley graph.

We will also sometimes consider a generalisation of a Cayley graph, known as a Schreier graph:

Definition 2 (Schreier graph) Let ${G}$ be a finite group that acts (on the left) on a space ${X}$, thus there is a map ${(g,x) \mapsto gx}$ from ${G \times X}$ to ${X}$ such that ${1x = x}$ and ${(gh)x = g(hx)}$ for all ${g,h \in G}$ and ${x \in X}$. Let ${S}$ be a symmetric subset of ${G}$ which acts freely on ${X}$ in the sense that ${sx \neq x}$ for all ${s \in S}$ and ${x \in X}$, and ${sx \neq s'x}$ for all distinct ${s,s' \in S}$ and ${x \in X}$. Then the Schreier graph ${Sch(X,S)}$ is defined to be the graph with vertex set ${X}$ and edge set ${\{ \{sx,x\}: x \in X, s \in S \}}$.

Example 2 Every Cayley graph ${Cay(G,S)}$ is also a Schreier graph ${Sch(G,S)}$, using the obvious left-action of ${G}$ on itself. The ${k}$-regular graphs formed from ${l}$ permutations ${\pi_1,\ldots,\pi_l \in S_n}$ that were studied in the previous set of notes is also a Schreier graph provided that ${\pi_i(v) \neq v, \pi_i^{-1}(v), \pi_j(v)}$ for all distinct ${1 \leq i,j \leq l}$, with the underlying group being the permutation group ${S_n}$ (which acts on the vertex set ${X = \{1,\ldots,n\}}$ in the obvious manner), and ${S := \{\pi_1,\ldots,\pi_l,\pi_1^{-1},\ldots,\pi_l^{-1}\}}$.

Exercise 1 If ${k}$ is an even integer, show that every ${k}$-regular graph is a Schreier graph involving a set ${S}$ of generators of cardinality ${k/2}$. (Hint: first show that every ${k}$-regular graph can be decomposed into ${k/2}$ unions of cycles, each of which partition the vertex set, then use the previous example.

We return now to Cayley graphs. It is easy to characterise qualitative expansion properties of Cayley graphs:

Exercise 2 (Qualitative expansion) Let ${Cay(G,S)}$ be a finite Cayley graph.

• (i) Show that ${Cay(G,S)}$ is a one-sided ${\epsilon}$-expander for ${G}$ for some ${\epsilon>0}$ if and only if ${S}$ generates ${G}$.
• (ii) Show that ${Cay(G,S)}$ is a two-sided ${\epsilon}$-expander for ${G}$ for some ${\epsilon>0}$ if and only if ${S}$ generates ${G}$, and furthermore ${S}$ intersects each index ${2}$ subgroup of ${G}$.

We will however be interested in more quantitative expansion properties, in which the expansion constant ${\epsilon}$ is independent of the size of the Cayley graph, so that one can construct non-trivial expander families ${Cay(G_n,S_n)}$ of Cayley graphs.

One can analyse the expansion of Cayley graphs in a number of ways. For instance, by taking the edge expansion viewpoint, one can study Cayley graphs combinatorially, using the product set operation

$\displaystyle A \cdot B := \{ab: a \in A, b \in B \}$

of subsets of ${G}$.

Exercise 3 (Combinatorial description of expansion) Let ${Cay(G_n,S_n)}$ be a family of finite ${k}$-regular Cayley graphs. Show that ${Cay(G_n,S_n)}$ is a one-sided expander family if and only if there is a constant ${c>0}$ independent of ${n}$ such that ${|E_n \cup E_n S_n| \geq (1+c) |E_n|}$ for all sufficiently large ${n}$ and all subsets ${E_n}$ of ${G_n}$ with ${|E_n| \leq |G_n|/2}$.

One can also give a combinatorial description of two-sided expansion, but it is more complicated and we will not use it here.

Exercise 4 (Abelian groups do not expand) Let ${Cay(G_n,S_n)}$ be a family of finite ${k}$-regular Cayley graphs, with the ${G_n}$ all abelian, and the ${S_n}$ generating ${G_n}$. Show that ${Cay(G_n,S_n)}$ are a one-sided expander family if and only if the Cayley graphs have bounded cardinality (i.e. ${\sup_n |G_n| < \infty}$). (Hint: assume for contradiction that ${Cay(G_n,S_n)}$ is a one-sided expander family with ${|G_n| \rightarrow \infty}$, and show by two different arguments that ${\sup_n |S_n^m|}$ grows at least exponentially in ${m}$ and also at most polynomially in ${m}$, giving the desired contradiction.)

The left-invariant nature of Cayley graphs also suggests that such graphs can be profitably analysed using some sort of Fourier analysis; as the underlying symmetry group is not necessarily abelian, one should use the Fourier analysis of non-abelian groups, which is better known as (unitary) representation theory. The Fourier-analytic nature of Cayley graphs can be highlighted by recalling the operation of convolution of two functions ${f, g \in \ell^2(G)}$, defined by the formula

$\displaystyle f * g(x) := \sum_{y \in G} f(y) g(y^{-1} x) = \sum_{y \in G} f(x y^{-1}) g(y).$

This convolution operation is bilinear and associative (at least when one imposes a suitable decay condition on the functions, such as compact support), but is not commutative unless ${G}$ is abelian. (If one is more algebraically minded, one can also identify ${\ell^2(G)}$ (when ${G}$ is finite, at least) with the group algebra ${{\bf C} G}$, in which case convolution is simply the multiplication operation in this algebra.) The adjacency operator ${A}$ on a Cayley graph ${Cay(G,S)}$ can then be viewed as a convolution

$\displaystyle Af = |S| \mu * f,$

where ${\mu}$ is the probability density

$\displaystyle \mu := \frac{1}{|S|} \sum_{s \in S} \delta_s \ \ \ \ \ (1)$

where ${\delta_s}$ is the Kronecker delta function on ${s}$. Using the spectral definition of expansion, we thus see that ${Cay(G,S)}$ is a one-sided expander if and only if

$\displaystyle \langle f, \mu*f \rangle \leq (1-\epsilon) \|f\|_{\ell^2(G)} \ \ \ \ \ (2)$

whenever ${f \in \ell^2(G)}$ is orthogonal to the constant function ${1}$, and is a two-sided expander if

$\displaystyle \| \mu*f \|_{\ell^2(G)} \leq (1-\epsilon) \|f\|_{\ell^2(G)} \ \ \ \ \ (3)$

whenever ${f \in \ell^2(G)}$ is orthogonal to the constant function ${1}$.

We remark that the above spectral definition of expansion can be easily extended to symmetric sets ${S}$ which contain the identity or have multiplicity (i.e. are multisets). (We retain symmetry, though, in order to keep the operation of convolution by ${\mu}$ self-adjoint.) In particular, one can say (with some slight abuse of notation) that a set of elements ${s_1,\ldots,s_l}$ of ${G}$ (possibly with repetition, and possibly with some elements equalling the identity) generates a one-sided or two-sided ${\epsilon}$-expander if the associated symmetric probability density

$\displaystyle \mu := \frac{1}{2l} \sum_{i=1}^l \delta_{s_i} + \delta_{s_i^{-1}}$

obeys either (2) or (3).

We saw in the last set of notes that expansion can be characterised in terms of random walks. One can of course specialise this characterisation to the Cayley graph case:

Exercise 5 (Random walk description of expansion) Let ${Cay(G_n,S_n)}$ be a family of finite ${k}$-regular Cayley graphs, and let ${\mu_n}$ be the associated probability density functions. Let ${A > 1/2}$ be a constant.

• Show that the ${Cay(G_n,S_n)}$ are a two-sided expander family if and only if there exists a ${C>0}$ such that for all sufficiently large ${n}$, one has ${\| \mu_n^{*m} - \frac{1}{|G_n|} \|_{\ell^2(G_n)} \leq \frac{1}{|G_n|^A}}$ for some ${m \leq C \log |G_n|}$, where ${\mu_n^{*m} := \mu_n * \ldots * \mu_n}$ denotes the convolution of ${m}$ copies of ${\mu_n}$.
• Show that the ${Cay(G_n,S_n)}$ are a one-sided expander family if and only if there exists a ${C>0}$ such that for all sufficiently large ${n}$, one has ${\| (\frac{1}{2} \delta_1 + \frac{1}{2} \mu_n)^{*m} - \frac{1}{|G_n|} \|_{\ell^2(G_n)} \leq \frac{1}{|G_n|^A}}$ for some ${m \leq C \log |G_n|}$.

In this set of notes, we will connect expansion of Cayley graphs to an important property of certain infinite groups, known as Kazhdan’s property (T) (or property (T) for short). In 1973, Margulis exploited this property to create the first known explicit and deterministic examples of expanding Cayley graphs. As it turns out, property (T) is somewhat overpowered for this purpose; in particular, we now know that there are many families of Cayley graphs for which the associated infinite group does not obey property (T) (or weaker variants of this property, such as property ${\tau}$). In later notes we will therefore turn to other methods of creating Cayley graphs that do not rely on property (T). Nevertheless, property (T) is of substantial intrinsic interest, and also has many connections to other parts of mathematics than the theory of expander graphs, so it is worth spending some time to discuss it here.

The material here is based in part on this recent text on property (T) by Bekka, de la Harpe, and Valette (available online here).

In this set of notes we will be able to finally prove the Gleason-Yamabe theorem from Notes 0, which we restate here:

Theorem 1 (Gleason-Yamabe theorem) Let ${G}$ be a locally compact group. Then, for any open neighbourhood ${U}$ of the identity, there exists an open subgroup ${G'}$ of ${G}$ and a compact normal subgroup ${K}$ of ${G'}$ in ${U}$ such that ${G'/K}$ is isomorphic to a Lie group.

In the next set of notes, we will combine the Gleason-Yamabe theorem with some topological analysis (and in particular, using the invariance of domain theorem) to establish some further control on locally compact groups, and in particular obtaining a solution to Hilbert’s fifth problem.

To prove the Gleason-Yamabe theorem, we will use three major tools developed in previous notes. The first (from Notes 2) is a criterion for Lie structure in terms of a special type of metric, which we will call a Gleason metric:

Definition 2 Let ${G}$ be a topological group. A Gleason metric on ${G}$ is a left-invariant metric ${d: G \times G \rightarrow {\bf R}^+}$ which generates the topology on ${G}$ and obeys the following properties for some constant ${C>0}$, writing ${\|g\|}$ for ${d(g,\hbox{id})}$:

• (Escape property) If ${g \in G}$ and ${n \geq 1}$ is such that ${n \|g\| \leq \frac{1}{C}}$, then ${\|g^n\| \geq \frac{1}{C} n \|g\|}$.
• (Commutator estimate) If ${g, h \in G}$ are such that ${\|g\|, \|h\| \leq \frac{1}{C}}$, then

$\displaystyle \|[g,h]\| \leq C \|g\| \|h\|, \ \ \ \ \ (1)$

where ${[g,h] := g^{-1}h^{-1}gh}$ is the commutator of ${g}$ and ${h}$.

Theorem 3 (Building Lie structure from Gleason metrics) Let ${G}$ be a locally compact group that has a Gleason metric. Then ${G}$ is isomorphic to a Lie group.

The second tool is the existence of a left-invariant Haar measure on any locally compact group; see Theorem 3 from Notes 3. Finally, we will also need the compact case of the Gleason-Yamabe theorem (Theorem 8 from Notes 3), which was proven via the Peter-Weyl theorem:

Theorem 4 (Gleason-Yamabe theorem for compact groups) Let ${G}$ be a compact Hausdorff group, and let ${U}$ be a neighbourhood of the identity. Then there exists a compact normal subgroup ${H}$ of ${G}$ contained in ${U}$ such that ${G/H}$ is isomorphic to a linear group (i.e. a closed subgroup of a general linear group ${GL_n({\bf C})}$).

To finish the proof of the Gleason-Yamabe theorem, we have to somehow use the available structures on locally compact groups (such as Haar measure) to build good metrics on those groups (or on suitable subgroups or quotient groups). The basic construction is as follows:

Definition 5 (Building metrics out of test functions) Let ${G}$ be a topological group, and let ${\psi: G \rightarrow {\bf R}^+}$ be a bounded non-negative function. Then we define the pseudometric ${d_\psi: G \times G \rightarrow {\bf R}^+}$ by the formula

$\displaystyle d_\psi(g,h) := \sup_{x \in G} |\tau(g) \psi(x) - \tau(h) \psi(x)|$

$\displaystyle = \sup_{x \in G} |\psi(g^{-1} x ) - \psi(h^{-1} x)|$

and the semi-norm ${\| \|_\psi: G \rightarrow {\bf R}^+}$ by the formula

$\displaystyle \|g\|_\psi := d_\psi(g, \hbox{id}).$

Note that one can also write

$\displaystyle \|g\|_\psi = \sup_{x \in G} |\partial_g \psi(x)|$

where ${\partial_g \psi(x) := \psi(x) - \psi(g^{-1} x)}$ is the “derivative” of ${\psi}$ in the direction ${g}$.

Exercise 1 Let the notation and assumptions be as in the above definition. For any ${g,h,k \in G}$, establish the metric-like properties

1. (Identity) ${d_\psi(g,h) \geq 0}$, with equality when ${g=h}$.
2. (Symmetry) ${d_\psi(g,h) = d_\psi(h,g)}$.
3. (Triangle inequality) ${d_\psi(g,k) \leq d_\psi(g,h) + d_\psi(h,k)}$.
4. (Continuity) If ${\psi \in C_c(G)}$, then the map ${d_\psi: G \times G \rightarrow {\bf R}^+}$ is continuous.
5. (Boundedness) One has ${d_\psi(g,h) \leq \sup_{x \in G} |\psi(x)|}$. If ${\psi \in C_c(G)}$ is supported in a set ${K}$, then equality occurs unless ${g^{-1} h \in K K^{-1}}$.
6. (Left-invariance) ${d_\psi(g,h) = d_\psi(kg,kh)}$. In particular, ${d_\psi(g,h) = \| h^{-1} g \|_\psi = \| g^{-1} h \|_\psi}$.

In particular, we have the norm-like properties

1. (Identity) ${\|g\|_\psi \geq 0}$, with equality when ${g=\hbox{id}}$.
2. (Symmetry) ${\|g\|_\psi = \|g^{-1}\|_\psi}$.
3. (Triangle inequality) ${\|gh\|_\psi \leq \|g\|_\psi + \|h\|_\psi}$.
4. (Continuity) If ${\psi \in C_c(G)}$, then the map ${\|\|_\psi: G \rightarrow {\bf R}^+}$ is continuous.
5. (Boundedness) One has ${\|g\|_\psi \leq \sup_{x \in G} |\psi(x)|}$. If ${\psi \in C_c(G)}$ is supported in a set ${K}$, then equality occurs unless ${g \in K K^{-1}}$.

We remark that the first three properties of ${d_\psi}$ in the above exercise ensure that ${d_\psi}$ is indeed a pseudometric.

To get good metrics (such as Gleason metrics) on groups ${G}$, it thus suffices to obtain test functions ${\psi}$ that obey suitably good “regularity” properties. We will achieve this primarily by means of two tricks. The first trick is to obtain high-regularity test functions by convolving together two low-regularity test functions, taking advantage of the existence of a left-invariant Haar measure ${\mu}$ on ${G}$. The second trick is to obtain low-regularity test functions by means of a metric-like object on ${G}$. This latter trick may seem circular, as our whole objective is to get a metric on ${G}$ in the first place, but the key point is that the metric one starts with does not need to have as many “good properties” as the metric one ends up with, thanks to the regularity-improving properties of convolution. As such, one can use a “bootstrap argument” (or induction argument) to create a good metric out of almost nothing. It is this bootstrap miracle which is at the heart of the proof of the Gleason-Yamabe theorem (and hence to the solution of Hilbert’s fifth problem).

The arguments here are based on the nonstandard analysis arguments used to establish Hilbert’s fifth problem by Hirschfeld and by Goldbring (and also some unpublished lecture notes of Goldbring and van den Dries). However, we will not explicitly use any nonstandard analysis in this post.

In the last few notes, we have been steadily reducing the amount of regularity needed on a topological group in order to be able to show that it is in fact a Lie group, in the spirit of Hilbert’s fifth problem. Now, we will work on Hilbert’s fifth problem from the other end, starting with the minimal assumption of local compactness on a topological group ${G}$, and seeing what kind of structures one can build using this assumption. (For simplicity we shall mostly confine our discussion to global groups rather than local groups for now.) In view of the preceding notes, we would like to see two types of structures emerge in particular:

• representations of ${G}$ into some more structured group, such as a matrix group ${GL_n({\bf C})}$; and
• metrics on ${G}$ that capture the escape and commutator structure of ${G}$ (i.e. Gleason metrics).

To build either of these structures, a fundamentally useful tool is that of (left-) Haar measure – a left-invariant Radon measure ${\mu}$ on ${G}$. (One can of course also consider right-Haar measures; in many cases (such as for compact or abelian groups), the two concepts are the same, but this is not always the case.) This concept generalises the concept of Lebesgue measure on Euclidean spaces ${{\bf R}^d}$, which is of course fundamental in analysis on those spaces.

Haar measures will help us build useful representations and useful metrics on locally compact groups ${G}$. For instance, a Haar measure ${\mu}$ gives rise to the regular representation ${\tau: G \rightarrow U(L^2(G,d\mu))}$ that maps each element ${g \in G}$ of ${G}$ to the unitary translation operator ${\rho(g): L^2(G,d\mu) \rightarrow L^2(G,d\mu)}$ on the Hilbert space ${L^2(G,d\mu)}$ of square-integrable measurable functions on ${G}$ with respect to this Haar measure by the formula

$\displaystyle \tau(g) f(x) := f(g^{-1} x).$

(The presence of the inverse ${g^{-1}}$ is convenient in order to obtain the homomorphism property ${\tau(gh) = \tau(g)\tau(h)}$ without a reversal in the group multiplication.) In general, this is an infinite-dimensional representation; but in many cases (and in particular, in the case when ${G}$ is compact) we can decompose this representation into a useful collection of finite-dimensional representations, leading to the Peter-Weyl theorem, which is a fundamental tool for understanding the structure of compact groups. This theorem is particularly simple in the compact abelian case, where it turns out that the representations can be decomposed into one-dimensional representations ${\chi: G \rightarrow U({\bf C}) \equiv S^1}$, better known as characters, leading to the theory of Fourier analysis on general compact abelian groups. With this and some additional (largely combinatorial) arguments, we will also be able to obtain satisfactory structural control on locally compact abelian groups as well.

The link between Haar measure and useful metrics on ${G}$ is a little more complicated. Firstly, once one has the regular representation ${\tau: G\rightarrow U(L^2(G,d\mu))}$, and given a suitable “test” function ${\psi: G \rightarrow {\bf C}}$, one can then embed ${G}$ into ${L^2(G,d\mu)}$ (or into other function spaces on ${G}$, such as ${C_c(G)}$ or ${L^\infty(G)}$) by mapping a group element ${g \in G}$ to the translate ${\tau(g) \psi}$ of ${\psi}$ in that function space. (This map might not actually be an embedding if ${\psi}$ enjoys a non-trivial translation symmetry ${\tau(g)\psi=\psi}$, but let us ignore this possibility for now.) One can then pull the metric structure on the function space back to a metric on ${G}$, for instance defining an ${L^2(G,d\mu)}$-based metric

$\displaystyle d(g,h) := \| \tau(g) \psi - \tau(h) \psi \|_{L^2(G,d\mu)}$

if ${\psi}$ is square-integrable, or perhaps a ${C_c(G)}$-based metric

$\displaystyle d(g,h) := \| \tau(g) \psi - \tau(h) \psi \|_{C_c(G)} \ \ \ \ \ (1)$

if ${\psi}$ is continuous and compactly supported (with ${\|f \|_{C_c(G)} := \sup_{x \in G} |f(x)|}$ denoting the supremum norm). These metrics tend to have several nice properties (for instance, they are automatically left-invariant), particularly if the test function is chosen to be sufficiently “smooth”. For instance, if we introduce the differentiation (or more precisely, finite difference) operators

$\displaystyle \partial_g := 1-\tau(g)$

(so that ${\partial_g f(x) = f(x) - f(g^{-1} x)}$) and use the metric (1), then a short computation (relying on the translation-invariance of the ${C_c(G)}$ norm) shows that

$\displaystyle d([g,h], \hbox{id}) = \| \partial_g \partial_h \psi - \partial_h \partial_g \psi \|_{C_c(G)}$

for all ${g,h \in G}$. This suggests that commutator estimates, such as those appearing in the definition of a Gleason metric in Notes 2, might be available if one can control “second derivatives” of ${\psi}$; informally, we would like our test functions ${\psi}$ to have a “${C^{1,1}}$” type regularity.

If ${G}$ was already a Lie group (or something similar, such as a ${C^{1,1}}$ local group) then it would not be too difficult to concoct such a function ${\psi}$ by using local coordinates. But of course the whole point of Hilbert’s fifth problem is to do without such regularity hypotheses, and so we need to build ${C^{1,1}}$ test functions ${\psi}$ by other means. And here is where the Haar measure comes in: it provides the fundamental tool of convolution

$\displaystyle \phi * \psi(x) := \int_G \phi(x y^{-1}) \psi(y) d\mu(y)$

between two suitable functions ${\phi, \psi: G \rightarrow {\bf C}}$, which can be used to build smoother functions out of rougher ones. For instance:

Exercise 1 Let ${\phi, \psi: {\bf R}^d \rightarrow {\bf C}}$ be continuous, compactly supported functions which are Lipschitz continuous. Show that the convolution ${\phi * \psi}$ using Lebesgue measure on ${{\bf R}^d}$ obeys the ${C^{1,1}}$-type commutator estimate

$\displaystyle \| \partial_g \partial_h (\phi * \psi) \|_{C_c({\bf R}^d)} \leq C \|g\| \|h\|$

for all ${g,h \in {\bf R}^d}$ and some finite quantity ${C}$ depending only on ${\phi, \psi}$.

This exercise suggests a strategy to build Gleason metrics by convolving together some “Lipschitz” test functions and then using the resulting convolution as a test function to define a metric. This strategy may seem somewhat circular because one needs a notion of metric in order to define Lipschitz continuity in the first place, but it turns out that the properties required on that metric are weaker than those that the Gleason metric will satisfy, and so one will be able to break the circularity by using a “bootstrap” or “induction” argument.

We will discuss this strategy – which is due to Gleason, and is fundamental to all currently known solutions to Hilbert’s fifth problem – in later posts. In this post, we will construct Haar measure on general locally compact groups, and then establish the Peter-Weyl theorem, which in turn can be used to obtain a reasonably satisfactory structural classification of both compact groups and locally compact abelian groups.

My graduate text on measure theory (based on these lecture notes) is now published by the AMS as part of the Graduate Studies in Mathematics series.  (See also my own blog page for this book, which among other things contains a draft copy of the book in PDF format.)

The classical inverse function theorem reads as follows:

Theorem 1 (${C^1}$ inverse function theorem) Let ${\Omega \subset {\bf R}^n}$ be an open set, and let ${f: \Omega \rightarrow {\bf R}^n}$ be an continuously differentiable function, such that for every ${x_0 \in \Omega}$, the derivative map ${Df(x_0): {\bf R}^n \rightarrow {\bf R}^n}$ is invertible. Then ${f}$ is a local homeomorphism; thus, for every ${x_0 \in \Omega}$, there exists an open neighbourhood ${U}$ of ${x_0}$ and an open neighbourhood ${V}$ of ${f(x_0)}$ such that ${f}$ is a homeomorphism from ${U}$ to ${V}$.

It is also not difficult to show by inverting the Taylor expansion

$\displaystyle f(x) = f(x_0) + Df(x_0)(x-x_0) + o(\|x-x_0\|)$

that at each ${x_0}$, the local inverses ${f^{-1}: V \rightarrow U}$ are also differentiable at ${f(x_0)}$ with derivative

$\displaystyle Df^{-1}(f(x_0)) = Df(x_0)^{-1}. \ \ \ \ \ (1)$

The textbook proof of the inverse function theorem proceeds by an application of the contraction mapping theorem. Indeed, one may normalise ${x_0=f(x_0)=0}$ and ${Df(0)}$ to be the identity map; continuity of ${Df}$ then shows that ${Df(x)}$ is close to the identity for small ${x}$, which may be used (in conjunction with the fundamental theorem of calculus) to make ${x \mapsto x-f(x)+y}$ a contraction on a small ball around the origin for small ${y}$, at which point the contraction mapping theorem readily finishes off the problem.

I recently learned (after I asked this question on Math Overflow) that the hypothesis of continuous differentiability may be relaxed to just everywhere differentiability:

Theorem 2 (Everywhere differentiable inverse function theorem) Let ${\Omega \subset {\bf R}^n}$ be an open set, and let ${f: \Omega \rightarrow {\bf R}^n}$ be an everywhere differentiable function, such that for every ${x_0 \in \Omega}$, the derivative map ${Df(x_0): {\bf R}^n \rightarrow {\bf R}^n}$ is invertible. Then ${f}$ is a local homeomorphism; thus, for every ${x_0 \in \Omega}$, there exists an open neighbourhood ${U}$ of ${x_0}$ and an open neighbourhood ${V}$ of ${f(x_0)}$ such that ${f}$ is a homeomorphism from ${U}$ to ${V}$.

As before, one can recover the differentiability of the local inverses, with the derivative of the inverse given by the usual formula (1).

This result implicitly follows from the more general results of Cernavskii about the structure of finite-to-one open and closed maps, however the arguments there are somewhat complicated (and subsequent proofs of those results, such as the one by Vaisala, use some powerful tools from algebraic geometry, such as dimension theory). There is however a more elementary proof of Saint Raymond that was pointed out to me by Julien Melleray. It only uses basic point-set topology (for instance, the concept of a connected component) and the basic topological and geometric structure of Euclidean space (in particular relying primarily on local compactness, local connectedness, and local convexity). I decided to present (an arrangement of) Saint Raymond’s proof here.

To obtain a local homeomorphism near ${x_0}$, there are basically two things to show: local surjectivity near ${x_0}$ (thus, for ${y}$ near ${f(x_0)}$, one can solve ${f(x)=y}$ for some ${x}$ near ${x_0}$) and local injectivity near ${x_0}$ (thus, for distinct ${x_1, x_2}$ near ${f(x_0)}$, ${f(x_1)}$ is not equal to ${f(x_2)}$). Local surjectivity is relatively easy; basically, the standard proof of the inverse function theorem works here, after replacing the contraction mapping theorem (which is no longer available due to the possibly discontinuous nature of ${Df}$) with the Brouwer fixed point theorem instead (or one could also use degree theory, which is more or less an equivalent approach). The difficulty is local injectivity – one needs to preclude the existence of nearby points ${x_1, x_2}$ with ${f(x_1) = f(x_2) = y}$; note that in contrast to the contraction mapping theorem that provides both existence and uniqueness of fixed points, the Brouwer fixed point theorem only gives existence and not uniqueness.

In one dimension ${n=1}$ one can proceed by using Rolle’s theorem. Indeed, as one traverses the interval from ${x_1}$ to ${x_2}$, one must encounter some intermediate point ${x_*}$ which maximises the quantity ${|f(x_*)-y|}$, and which is thus instantaneously non-increasing both to the left and to the right of ${x_*}$. But, by hypothesis, ${f'(x_*)}$ is non-zero, and this easily leads to a contradiction.

Saint Raymond’s argument for the higher dimensional case proceeds in a broadly similar way. Starting with two nearby points ${x_1, x_2}$ with ${f(x_1)=f(x_2)=y}$, one finds a point ${x_*}$ which “locally extremises” ${\|f(x_*)-y\|}$ in the following sense: ${\|f(x_*)-y\|}$ is equal to some ${r_*>0}$, but ${x_*}$ is adherent to at least two distinct connected components ${U_1, U_2}$ of the set ${U = \{ x: \|f(x)-y\| < r_* \}}$. (This is an oversimplification, as one has to restrict the available points ${x}$ in ${U}$ to a suitably small compact set, but let us ignore this technicality for now.) Note from the non-degenerate nature of ${Df(x_*)}$ that ${x_*}$ was already adherent to ${U}$; the point is that ${x_*}$ “disconnects” ${U}$ in some sense. Very roughly speaking, the way such a critical point ${x_*}$ is found is to look at the sets ${\{ x: \|f(x)-y\| \leq r \}}$ as ${r}$ shrinks from a large initial value down to zero, and one finds the first value of ${r_*}$ below which this set disconnects ${x_1}$ from ${x_2}$. (Morally, one is performing some sort of Morse theory here on the function ${x \mapsto \|f(x)-y\|}$, though this function does not have anywhere near enough regularity for classical Morse theory to apply.)

The point ${x_*}$ is mapped to a point ${f(x_*)}$ on the boundary ${\partial B(y,r_*)}$ of the ball ${B(y,r_*)}$, while the components ${U_1, U_2}$ are mapped to the interior of this ball. By using a continuity argument, one can show (again very roughly speaking) that ${f(U_1)}$ must contain a “hemispherical” neighbourhood ${\{ z \in B(y,r_*): \|z-f(x_*)\| < \kappa \}}$ of ${f(x_*)}$ inside ${B(y,r_*)}$, and similarly for ${f(U_2)}$. But then from differentiability of ${f}$ at ${x_*}$, one can then show that ${U_1}$ and ${U_2}$ overlap near ${x_*}$, giving a contradiction.

The rigorous details of the proof are provided below the fold.

This is another installment of my my series of posts on Hilbert’s fifth problem. One formulation of this problem is answered by the following theorem of Gleason and Montgomery-Zippin:

Theorem 1 (Hilbert’s fifth problem) Let ${G}$ be a topological group which is locally Euclidean. Then ${G}$ is isomorphic to a Lie group.

Theorem 1 is deep and difficult result, but the discussion in the previous posts has reduced the proof of this Theorem to that of establishing two simpler results, involving the concepts of a no small subgroups (NSS) subgroup, and that of a Gleason metric. We briefly recall the relevant definitions:

Definition 2 (NSS) A topological group ${G}$ is said to have no small subgroups, or is NSS for short, if there is an open neighbourhood ${U}$ of the identity in ${G}$ that contains no subgroups of ${G}$ other than the trivial subgroup ${\{ \hbox{id}\}}$.

Definition 3 (Gleason metric) Let ${G}$ be a topological group. A Gleason metric on ${G}$ is a left-invariant metric ${d: G \times G \rightarrow {\bf R}^+}$ which generates the topology on ${G}$ and obeys the following properties for some constant ${C>0}$, writing ${\|g\|}$ for ${d(g,\hbox{id})}$:

The remaining steps in the resolution of Hilbert’s fifth problem are then as follows:

Theorem 4 (Reduction to the NSS case) Let ${G}$ be a locally compact group, and let ${U}$ be an open neighbourhood of the identity in ${G}$. Then there exists an open subgroup ${G'}$ of ${G}$, and a compact subgroup ${N}$ of ${G'}$ contained in ${U}$, such that ${G'/N}$ is NSS and locally compact.

Theorem 5 (Gleason’s lemma) Let ${G}$ be a locally compact NSS group. Then ${G}$ has a Gleason metric.

The purpose of this post is to establish these two results, using arguments that are originally due to Gleason. We will split this task into several subtasks, each of which improves the structure on the group ${G}$ by some amount:

Proposition 6 (From locally compact to metrisable) Let ${G}$ be a locally compact group, and let ${U}$ be an open neighbourhood of the identity in ${G}$. Then there exists an open subgroup ${G'}$ of ${G}$, and a compact subgroup ${N}$ of ${G'}$ contained in ${U}$, such that ${G'/N}$ is locally compact and metrisable.

For any open neighbourhood ${U}$ of the identity in ${G}$, let ${Q(U)}$ be the union of all the subgroups of ${G}$ that are contained in ${U}$. (Thus, for instance, ${G}$ is NSS if and only if ${Q(U)}$ is trivial for all sufficiently small ${U}$.)

Proposition 7 (From metrisable to subgroup trapping) Let ${G}$ be a locally compact metrisable group. Then ${G}$ has the subgroup trapping property: for every open neighbourhood ${U}$ of the identity, there exists another open neighbourhood ${V}$ of the identity such that ${Q(V)}$ generates a subgroup ${\langle Q(V) \rangle}$ contained in ${U}$.

Proposition 8 (From subgroup trapping to NSS) Let ${G}$ be a locally compact group with the subgroup trapping property, and let ${U}$ be an open neighbourhood of the identity in ${G}$. Then there exists an open subgroup ${G'}$ of ${G}$, and a compact subgroup ${N}$ of ${G'}$ contained in ${U}$, such that ${G'/N}$ is locally compact and NSS.

Proposition 9 (From NSS to the escape property) Let ${G}$ be a locally compact NSS group. Then there exists a left-invariant metric ${d}$ on ${G}$ generating the topology on ${G}$ which obeys the escape property (1) for some constant ${C}$.

Proposition 10 (From escape to the commutator estimate) Let ${G}$ be a locally compact group with a left-invariant metric ${d}$ that obeys the escape property (1). Then ${d}$ also obeys the commutator property (2).

It is clear that Propositions 6, 7, and 8 combine to give Theorem 4, and Propositions 9, 10 combine to give Theorem 5.

Propositions 6-10 are all proven separately, but their proofs share some common strategies and ideas. The first main idea is to construct metrics on a locally compact group ${G}$ by starting with a suitable “bump function” ${\phi \in C_c(G)}$ (i.e. a continuous, compactly supported function from ${G}$ to ${{\bf R}}$) and pulling back the metric structure on ${C_c(G)}$ by using the translation action ${\tau_g \phi(x) := \phi(g^{-1} x)}$, thus creating a (semi-)metric

$\displaystyle d_\phi( g, h ) := \| \tau_g \phi - \tau_h \phi \|_{C_c(G)} := \sup_{x \in G} |\phi(g^{-1} x) - \phi(h^{-1} x)|. \ \ \ \ \ (3)$

One easily verifies that this is indeed a (semi-)metric (in that it is non-negative, symmetric, and obeys the triangle inequality); it is also left-invariant, and so we have ${d_\phi(g,h) = \|g^{-1} h \|_\phi = \| h^{-1} g \|_\phi}$, where

$\displaystyle \| g \|_\phi = d_\phi(g,\hbox{id}) = \| \partial_g \phi \|_{C_c(G)}$

where ${\partial_g}$ is the difference operator ${\partial_g = 1 - \tau_g}$,

$\displaystyle \partial_g \phi(x) = \phi(x) - \phi(g^{-1} x).$

This construction was already seen in the proof of the Birkhoff-Kakutani theorem, which is the main tool used to establish Proposition 6. For the other propositions, the idea is to choose a bump function ${\phi}$ that is “smooth” enough that it creates a metric with good properties such as the commutator estimate (2). Roughly speaking, to get a bound of the form (2), one needs ${\phi}$ to have “${C^{1,1}}$ regularity” with respect to the “right” smooth structure on ${G}$ By ${C^{1,1}}$ regularity, we mean here something like a bound of the form

$\displaystyle \| \partial_g \partial_h \phi \|_{C_c(G)} \ll \|g\|_\phi \|h\|_\phi \ \ \ \ \ (4)$

for all ${g,h \in G}$. Here we use the usual asymptotic notation, writing ${X \ll Y}$ or ${X=O(Y)}$ if ${X \leq CY}$ for some constant ${C}$ (which can vary from line to line).

The following lemma illustrates how ${C^{1,1}}$ regularity can be used to build Gleason metrics.

Lemma 11 Suppose that ${\phi \in C_c(G)}$ obeys (4). Then the (semi-)metric ${d_\phi}$ (and associated (semi-)norm ${\|\|_\phi}$) obey the escape property (1) and the commutator property (2).

Proof: We begin with the commutator property (2). Observe the identity

$\displaystyle \tau_{[g,h]} = \tau_{hg}^{-1} \tau_{gh}$

whence

$\displaystyle \partial_{[g,h]} = \tau_{hg}^{-1} ( \tau_{hg} - \tau_{gh} )$

$\displaystyle = \tau_{hg}^{-1} ( \partial_h \partial_g - \partial_g \partial_h ).$

From the triangle inequality (and translation-invariance of the ${C_c(G)}$ norm) we thus see that (2) follows from (4). Similarly, to obtain the escape property (1), observe the telescoping identity

$\displaystyle \partial_{g^n} = n \partial_g + \sum_{i=0}^{n-1} \partial_g \partial_{g^i}$

for any ${g \in G}$ and natural number ${n}$, and thus by the triangle inequality

$\displaystyle \| g^n \|_\phi = n \| g \|_\phi + O( \sum_{i=0}^{n-1} \| \partial_g \partial_{g^i} \phi \|_{C_c(G)} ). \ \ \ \ \ (5)$

But from (4) (and the triangle inequality) we have

$\displaystyle \| \partial_g \partial_{g^i} \phi \|_{C_c(G)} \ll \|g\|_\phi \|g^i \|_\phi \ll i \|g\|_\phi^2$

and thus we have the “Taylor expansion”

$\displaystyle \|g^n\|_\phi = n \|g\|_\phi + O( n^2 \|g\|_\phi^2 )$

which gives (1). $\Box$

It remains to obtain ${\phi}$ that have the desired ${C^{1,1}}$ regularity property. In order to get such regular bump functions, we will use the trick of convolving together two lower regularity bump functions (such as two functions with “${C^{0,1}}$ regularity” in some sense to be determined later). In order to perform this convolution, we will use the fundamental tool of (left-invariant) Haar measure ${\mu}$ on the locally compact group ${G}$. Here we exploit the basic fact that the convolution

$\displaystyle f_1 * f_2(x) := \int_G f_1(y) f_2(y^{-1} x)\ d\mu(y) \ \ \ \ \ (6)$

of two functions ${f_1,f_2 \in C_c(G)}$ tends to be smoother than either of the two factors ${f_1,f_2}$. This is easiest to see in the abelian case, since in this case we can distribute derivatives according to the law

$\displaystyle \partial_g (f_1 * f_2) = (\partial_g f_1) * f_2 = f_1 * (\partial_g f_2),$

which suggests that the order of “differentiability” of ${f_1*f_2}$ should be the sum of the orders of ${f_1}$ and ${f_2}$ separately.

These ideas are already sufficient to establish Proposition 10 directly, and also Proposition 9 when comined with an additional bootstrap argument. The proofs of Proposition 7 and Proposition 8 use similar techniques, but is more difficult due to the potential presence of small subgroups, which require an application of the Peter-Weyl theorem to properly control. Both of these theorems will be proven below the fold, thus (when combined with the preceding posts) completing the proof of Theorem 1.

The presentation here is based on some unpublished notes of van den Dries and Goldbring on Hilbert’s fifth problem. I am indebted to Emmanuel Breuillard, Ben Green, and Tom Sanders for many discussions related to these arguments.