You are currently browsing the monthly archive for May 2011.

This is yet another post in a series on basic ingredients in the structural theory of locally compact groups, which is closely related to Hilbert’s fifth problem.

In order to understand the structure of a topological group ${G}$, a basic strategy is to try to split ${G}$ into two smaller factor groups ${H, K}$ by exhibiting a short exact sequence

$\displaystyle 0 \rightarrow K \rightarrow G \rightarrow H \rightarrow 0.$

If one has such a sequence, then ${G}$ is an extension of ${H}$ by ${K}$ (which includes direct products ${H \times K}$ and semidirect products ${H \ltimes K}$ as examples, but can be more general than these situations, as discussed in this previous blog post). In principle, the problem of understanding the structure of ${G}$ then splits into three simpler problems:

1. (Horizontal structure) Understanding the structure of the “horizontal” group ${H}$.
2. (Vertical structure) Understanding the structure of the “vertical” group ${K}$.
3. (Cohomology) Understanding the ways in which one can extend ${H}$ by ${K}$.

The “cohomological” aspect to this program can be nontrivial. However, in principle at least, this strategy reduces the study of the large group ${G}$ to the study of the smaller groups ${H, K}$. (This type of splitting strategy is not restricted to topological groups, but can also be adapted to many other categories, particularly those of groups or group-like objects.) Typically, splitting alone does not fully kill off a structural classification problem, but it can reduce matters to studying those objects which are somehow “simple” or “irreducible”. For instance, this strategy can often be used to reduce questions about arbitrary finite groups to finite simple groups.

A simple example of splitting is as follows. Given any locally compact group ${G}$, one can form the connected component ${G^\circ}$ of the identity – the maximal connected set containing the identity. It is not difficult to show that ${G^\circ}$ is a closed (and thus also locally compact) normal subgroup of ${G}$, whose quotient ${G/G^\circ}$ is another locally compact group. Furthermore, due to the maximal connected nature of ${G^\circ}$, ${G/G^\circ}$ is totally disconnected – the only connected sets are the singletons. In particular, ${G/G^\circ}$ is Hausdorff (the identity element is closed). Thus we have obtained a splitting

$\displaystyle 0 \rightarrow G^\circ \rightarrow G \rightarrow G/G^\circ \rightarrow 0$

of an arbitrary locally compact group into a connected locally compact group ${G^\circ}$, and a totally disconnected locally compact group ${G/G^\circ}$. In principle at least, the study of locally compact groups thus splits into the study of connected locally compact groups, and the study of totally disconnected locally compact groups (though the cohomological issues are not always trivial).

In the structural theory of totally disconnected locally compact groups, the first basic theorem in the subject is van Dantzig’s theorem (which we prove below the fold):

Theorem 1 (Van Dantzig’s theorem) Every totally disconnected locally compact group ${G}$ contains a compact open subgroup ${H}$ (which will of course still be totally disconnected).

Example 1 Let ${p}$ be a prime. Then the ${p}$-adic field ${{\bf Q}_p}$ (with the usual ${p}$-adic valuation) is totally disconnected locally compact, and the ${p}$-adic integers ${{\bf Z}_p}$ are a compact open subgroup.

Of course, this situation is the polar opposite of what occurs in the connected case, in which the only open subgroup is the whole group.

In view of van Dantzig’s theorem, we see that the “local” behaviour of totally disconnected locally compact groups can be modeled by the compact totally disconnected groups, which are better understood (for instance, one can start analysing them using the Peter-Weyl theorem, as discussed in this previous post). The global behaviour however remains more complicated, in part because the compact open subgroup given by van Dantzig’s theorem need not be normal, and so does not necessarily induce a splitting of ${G}$ into compact and discrete factors.

Example 2 Let ${p}$ be a prime, and let ${G}$ be the semi-direct product ${{\bf Z} \ltimes {\bf Q}_p}$, where the integers ${{\bf Z}}$ act on ${{\bf Q}_p}$ by the map ${m: x \mapsto p^m x}$, and we give ${G}$ the product of the discrete topology of ${{\bf Z}}$ and the ${p}$-adic topology on ${{\bf Q}_p}$. One easily verifies that ${G}$ is a totally disconnected locally compact group. It certainly has compact open subgroups, such as ${\{0\} \times {\bf Z}_p}$. However, it is easy to show that ${G}$ has no non-trivial compact normal subgroups (the problem is that the conjugation action of ${{\bf Z}}$ on ${{\bf Q}_p}$ has all non-trivial orbits unbounded).
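The unboundedness of the conjugation orbits in Example 2 can be seen numerically on rational points, which are dense in ${{\bf Q}_p}$. The sketch below is only illustrative: the helper names `padic_valuation` and `padic_norm` are hypothetical, and we assume the convention that conjugating ${(0,x)}$ by ${(m,0)}$ gives ${(0,p^m x)}$ (the opposite convention merely replaces ${m}$ by ${-m}$).

```python
from fractions import Fraction

def padic_valuation(x: Fraction, p: int) -> int:
    """Return v_p(x) for a nonzero rational x."""
    if x == 0:
        raise ValueError("v_p(0) is +infinity")
    num, den, v = x.numerator, x.denominator, 0
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v

def padic_norm(x: Fraction, p: int) -> Fraction:
    """Return |x|_p = p^(-v_p(x)), with |0|_p = 0."""
    if x == 0:
        return Fraction(0)
    return Fraction(p) ** (-padic_valuation(x, p))

p = 3
x = Fraction(1)  # a rational point of Z_p with |x|_p = 1

# Assumed convention: conjugation by (m, 0) sends (0, x) to (0, p^m x).
orbit_norms = [padic_norm(Fraction(p) ** m * x, p) for m in range(0, -5, -1)]
print(orbit_norms)  # the norms grow by a factor of p at each step

# Ultrametric inequality, which is why Z_p = {|x|_p <= 1} is closed under addition.
a, b = Fraction(5, 3), Fraction(7, 9)
print(padic_norm(a + b, p) <= max(padic_norm(a, p), padic_norm(b, p)))
```

Since the norms along the orbit of any non-zero ${x}$ are unbounded as ${m \rightarrow -\infty}$, no bounded set containing ${(0,x)}$ can be invariant under conjugation, which is the obstruction described in the example.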

Returning to more general locally compact groups, we obtain an immediate corollary:

Corollary 2 Every locally compact group ${G}$ contains an open subgroup ${H}$ which is “compact-by-connected” in the sense that ${H/H^\circ}$ is compact.

Indeed, one applies van Dantzig’s theorem to the totally disconnected group ${G/G^\circ}$, and then pulls back the resulting compact open subgroup.

Now we mention another application of van Dantzig’s theorem, of more direct relevance to Hilbert’s fifth problem. Define a generalised Lie group to be a topological group ${G}$ with the property that given any open neighbourhood ${U}$ of the identity, there exists an open subgroup ${G'}$ of ${G}$ and a compact normal subgroup ${N}$ of ${G'}$ in ${U}$ such that ${G'/N}$ is isomorphic to a Lie group. It is easy to see that such groups are locally compact. The deep Gleason-Yamabe theorem, which among other things establishes a satisfactory solution to Hilbert’s fifth problem (and which we will not prove here), asserts the converse:

Theorem 3 (Gleason-Yamabe theorem) Every locally compact group is a generalised Lie group.

Example 3 We consider the locally compact group ${G = {\bf Z} \ltimes {\bf Q}_p}$ from Example 2. This is of course not a Lie group. However, any open neighbourhood ${U}$ of the identity in ${G}$ will contain the compact subgroup ${N := \{0\} \times p^j {\bf Z}_p}$ for some natural number ${j}$. The open subgroup ${G' := \{0\} \times {\bf Z}_p}$ then has ${G'/N}$ isomorphic to the discrete finite group ${{\bf Z}/p^j{\bf Z}}$, which is certainly a Lie group. Thus ${G}$ is a generalised Lie group.

One important example of generalised Lie groups are those locally compact groups which are an inverse limit (or projective limit) of Lie groups. Indeed, suppose we have a family ${(G_i)_{i\in I}}$ of Lie groups ${G_i}$ indexed by a partially ordered set ${I}$ which is directed in the sense that every finite subset of ${I}$ has an upper bound, together with continuous homomorphisms ${\pi_{i \rightarrow j}: G_i \rightarrow G_j}$ for all ${i > j}$ which form a category in the sense that ${\pi_{j \rightarrow k} \circ \pi_{i \rightarrow j} = \pi_{i \rightarrow k}}$ for all ${i>j>k}$. Then we can form the inverse limit

$\displaystyle G := \lim_{\stackrel{\leftarrow}{i \in I}} G_i,$

which is the subgroup of ${\prod_{i \in I} G_i}$ consisting of all tuples ${(g_i)_{i \in I} \in \prod_{i \in I} G_i}$ which are compatible with the ${\pi_{i \rightarrow j}}$ in the sense that ${\pi_{i \rightarrow j}(g_i) = g_j}$ for all ${i>j}$. If we endow ${\prod_{i \in I} G_i}$ with the product topology, then ${G}$ is a closed subgroup of ${\prod_{i \in I} G_i}$, and thus has the structure of a topological group, with continuous homomorphisms ${\pi_i: G \rightarrow G_i}$ which are compatible with the ${\pi_{i \rightarrow j}}$ in the sense that ${\pi_{i \rightarrow j} \circ \pi_i = \pi_j}$ for all ${i>j}$. Such an inverse limit need not be locally compact; for instance, the inverse limit

$\displaystyle \lim_{\stackrel{\leftarrow}{n \in {\bf N}}} {\bf R}^n$

of Euclidean spaces with the usual coordinate projection maps is isomorphic to the infinite product space ${{\bf R}^{\bf N}}$ with the product topology, which is not locally compact. However, if an inverse limit

$\displaystyle G = \lim_{\stackrel{\leftarrow}{i \in I}} G_i$

of Lie groups is locally compact, it can be easily seen to be a generalised Lie group. Indeed, by local compactness, any open neighbourhood ${U}$ of the identity will contain an open precompact neighbourhood of the identity; by construction of the product topology (and the directed nature of ${I}$), this smaller neighbourhood will in turn contain the kernel of one of the ${\pi_i}$, which will be compact since the preceding neighbourhood was precompact. Quotienting out by this kernel, we obtain a locally compact subgroup of the Lie group ${G_i}$, which is necessarily again a Lie group by Cartan’s theorem, and the claim follows.
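To make the compatibility condition ${\pi_{i \rightarrow j}(g_i) = g_j}$ concrete, here is a minimal sketch for the simplest kind of inverse system, namely the finite (hence zero-dimensional Lie) groups ${{\bf Z}/p^i{\bf Z}}$ with the reduction maps, whose inverse limit is ${{\bf Z}_p}$. The function names are hypothetical, and only the first `depth` coordinates of a tuple are stored, so this models a truncation of the limit rather than the limit itself.

```python
def truncations(n, p, depth):
    """Images of the integer n in Z/p^i Z for i = 1, ..., depth."""
    return [n % p ** i for i in range(1, depth + 1)]

def is_compatible(residues, p):
    """Check pi_{i -> j}(g_i) = g_j for all i > j, where pi_{i -> j} is reduction mod p^j."""
    return all(
        residues[i - 1] % p ** j == residues[j - 1]
        for i in range(1, len(residues) + 1)
        for j in range(1, i)
    )

def add(g, h, p):
    """Componentwise group law inherited from the product of the Z/p^i Z."""
    return [(a + b) % p ** i for i, (a, b) in enumerate(zip(g, h), start=1)]

p, depth = 3, 4
minus_one = truncations(-1, p, depth)
one = truncations(1, p, depth)

print(minus_one)                          # [2, 8, 26, 80]
print(is_compatible(minus_one, p))        # True
print(add(minus_one, one, p))             # [0, 0, 0, 0]
print(is_compatible([2, 7, 26, 80], p))   # False: 7 mod 3 is not 2
```

The compatible tuples are closed under the componentwise group law, which is the reason the inverse limit is a subgroup of the product.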

In the converse direction, it is possible to use Corollary 2 to obtain the following observation of Gleason:

Theorem 4 Every Hausdorff generalised Lie group contains an open subgroup that is an inverse limit of Lie groups.

We show Theorem 4 below the fold. Combining this with the (substantially more difficult) Gleason-Yamabe theorem, we obtain quite a satisfactory description of the local structure of locally compact groups. (The situation is particularly simple for connected groups, which have no non-trivial open subgroups; we then conclude that every connected locally compact Hausdorff group is the inverse limit of Lie groups.)

Example 4 The locally compact group ${G := {\bf Z} \ltimes {\bf Q}_p}$ is not an inverse limit of Lie groups because (as noted earlier) it has no non-trivial compact normal subgroups, which would contradict the preceding analysis that showed that all locally compact inverse limits of Lie groups were generalised Lie groups. On the other hand, ${G}$ contains the open subgroup ${\{0\} \times {\bf Q}_p}$, which is the inverse limit of the discrete (and thus Lie) groups ${\{0\} \times {\bf Q}_p/p^j {\bf Z}_p}$ for ${j \in {\bf Z}}$ (where we give ${{\bf Z}}$ the usual ordering, and use the obvious projection maps).

This is another post in a series on various components to the solution of Hilbert’s fifth problem. One interpretation of this problem is to ask for a purely topological classification of the topological groups which are isomorphic to Lie groups. (Here we require Lie groups to be finite-dimensional, but allow them to be disconnected.)

There are some obvious necessary conditions on a topological group in order for it to be isomorphic to a Lie group; for instance, it must be Hausdorff and locally compact. These two conditions, by themselves, are not quite enough to force a Lie group structure; consider for instance a ${p}$-adic field ${{\mathbf Q}_p}$ for some prime ${p}$, which is a locally compact Hausdorff topological group which is not a Lie group (the topology is locally that of a Cantor set). Nevertheless, it turns out that by adding some key additional assumptions on the topological group, one can recover Lie structure. One such result, which is a key component of the full solution to Hilbert’s fifth problem, is the following result of von Neumann:

Theorem 1 Let ${G}$ be a locally compact Hausdorff topological group that has a faithful finite-dimensional linear representation, i.e. an injective continuous homomorphism ${\rho: G \rightarrow GL_d({\bf C})}$ into some linear group. Then ${G}$ can be given the structure of a Lie group. Furthermore, after giving ${G}$ this Lie structure, ${\rho}$ becomes smooth (and even analytic) and non-degenerate (the Jacobian always has full rank).

This result is closely related to a theorem of Cartan:

Theorem 2 (Cartan’s theorem) Any closed subgroup ${H}$ of a Lie group ${G}$ is again a Lie group (in particular, ${H}$ is an analytic submanifold of ${G}$, with the induced analytic structure).

Indeed, Theorem 1 immediately implies Theorem 2 in the important special case when the ambient Lie group is a linear group, and in any event it is not difficult to modify the proof of Theorem 1 to give a proof of Theorem 2. However, Theorem 1 is more general than Theorem 2 in some ways. For instance, let ${G}$ be the real line ${{\bf R}}$, which we faithfully represent in the ${2}$-torus ${({\bf R}/{\bf Z})^2}$ using an irrational embedding ${t \mapsto (t,\alpha t) \hbox{ mod } {\bf Z}^2}$ for some fixed irrational ${\alpha}$. The ${2}$-torus can in turn be embedded in a linear group (e.g. by identifying it with ${U(1) \times U(1)}$, or ${SO(2) \times SO(2)}$), thus giving a faithful linear representation ${\rho}$ of ${{\bf R}}$. However, the image is not closed (it is a dense subgroup of a ${2}$-torus), and so Cartan’s theorem does not directly apply (${\rho({\bf R})}$ fails to be a Lie group). Nevertheless, Theorem 1 still applies and guarantees that the original group ${{\bf R}}$ is a Lie group.

(On the other hand, the image of any compact subset of ${G}$ under a faithful representation ${\rho}$ must be closed, and so Theorem 1 is very close to the version of Theorem 2 for local groups.)

The key to building the Lie group structure on a topological group is to first build the associated Lie algebra structure, by means of one-parameter subgroups.

Definition 3 A one-parameter subgroup of a topological group ${G}$ is a continuous homomorphism ${\phi: {\bf R} \rightarrow G}$ from the real line (with the additive group structure) to ${G}$.

Remark 1 Technically, ${\phi}$ is a parameterisation of a subgroup ${\phi({\bf R})}$, rather than a subgroup itself, but we will abuse notation and refer to ${\phi}$ as the subgroup.
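As a concrete instance of Definition 3 in a linear group, one can take ${\phi(t) := \exp(tX)}$ for a fixed matrix ${X}$. The sketch below checks the homomorphism property ${\phi(s+t) = \phi(s)\phi(t)}$ numerically and recovers ${X}$ as a difference quotient at ${t=0}$. The helpers `mat_exp` and `phi` are hypothetical, and the exponential is approximated by a truncated Taylor series, which is only adequate for small matrices and moderate ${t}$.

```python
import math

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def mat_exp(x, terms=30):
    """Truncated Taylor series sum_{k < terms} x^k / k! for a small square matrix."""
    n = len(x)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[entry / k for entry in row] for row in mat_mul(term, x)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def phi(t, generator):
    """The one-parameter subgroup t -> exp(t * generator)."""
    return mat_exp([[t * entry for entry in row] for row in generator])

X = [[0.0, -1.0], [1.0, 0.0]]  # generator of the rotations in SO(2)
s, t = 0.3, 0.5

lhs = phi(s + t, X)
rhs = mat_mul(phi(s, X), phi(t, X))
homomorphism_error = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))

# Difference quotient (phi(h) - I) / h, which should be close to X for small h.
h = 1e-6
step = phi(h, X)
derivative = [[(step[i][j] - float(i == j)) / h for j in range(2)] for i in range(2)]

print(homomorphism_error)
print(derivative)
```

For this particular ${X}$, ${\phi(t)}$ is the rotation by angle ${t}$, so the homomorphism property is just the angle addition formula.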

In a Lie group ${G}$, the one-parameter subgroups are in one-to-one correspondence with the Lie algebra ${{\mathfrak g}}$, with each element ${X \in {\mathfrak g}}$ giving rise to a one-parameter subgroup ${\phi(t) := \exp(tX)}$, and conversely each one-parameter subgroup ${\phi}$ giving rise to an element ${\phi'(0)}$ of the Lie algebra; we will establish these basic facts in the special case of linear groups below the fold. On the other hand, the notion of a one-parameter subgroup can be defined in an arbitrary topological group. So this suggests the following strategy if one is to try to represent a topological group ${G}$ as a Lie group:

1. First, form the space ${L(G)}$ of one-parameter subgroups of ${G}$.
2. Show that ${L(G)}$ has the structure of a (finite-dimensional) Lie algebra.
3. Show that ${L(G)}$ “behaves like” the tangent space of ${G}$ at the identity (in particular, the one-parameter subgroups in ${L(G)}$ should cover a neighbourhood of the identity in ${G}$).
4. Conclude that ${G}$ has the structure of a Lie group.

It turns out that this strategy indeed works to give Theorem 1 (and variants of this strategy are ubiquitous in the rest of the theory surrounding Hilbert’s fifth problem).

Below the fold, I record the proof of Theorem 1 (based on the exposition of Montgomery and Zippin). I plan to organise these disparate posts surrounding Hilbert’s fifth problem (and its application to related topics, such as Gromov’s theorem or to the classification of approximate groups) at a later date.

A basic problem in harmonic analysis (as well as in linear algebra, random matrix theory, and high-dimensional geometry) is to estimate the operator norm ${\|T\|_{op}}$ of a linear map ${T: H \rightarrow H'}$ between two Hilbert spaces, which we will take to be complex for sake of discussion. Even the finite-dimensional case ${T: {\bf C}^m \rightarrow {\bf C}^n}$ is of interest, as this operator norm is the same as the largest singular value ${\sigma_1(A)}$ of the ${n \times m}$ matrix ${A}$ associated to ${T}$.

In general, this operator norm is hard to compute precisely, except in special cases. One such special case is that of a diagonal operator, such as that associated to an ${n \times n}$ diagonal matrix ${D = \hbox{diag}(\lambda_1,\ldots,\lambda_n)}$. In this case, the operator norm is simply the supremum norm of the diagonal coefficients:

$\displaystyle \|D\|_{op} = \sup_{1 \leq i \leq n} |\lambda_i|. \ \ \ \ \ (1)$

A variant of (1) is Schur’s test, which for simplicity we will phrase in the setting of finite-dimensional operators ${T: {\bf C}^m \rightarrow {\bf C}^n}$ given by a matrix ${A = (a_{ij})_{1 \leq i \leq n; 1 \leq j \leq m}}$ via the usual formula

$\displaystyle T (x_j)_{j=1}^m := ( \sum_{j=1}^m a_{ij} x_j )_{i=1}^n.$

A simple version of this test is as follows: if all the absolute row sums and column sums of ${A}$ are bounded by some constant ${M}$, thus

$\displaystyle \sum_{j=1}^m |a_{ij}| \leq M \ \ \ \ \ (2)$

for all ${1 \leq i \leq n}$ and

$\displaystyle \sum_{i=1}^n |a_{ij}| \leq M \ \ \ \ \ (3)$

for all ${1 \leq j \leq m}$, then

$\displaystyle \|T\|_{op} = \|A\|_{op} \leq M \ \ \ \ \ (4)$

(note that this generalises (the upper bound in) (1).) Indeed, to see (4), it suffices by duality and homogeneity to show that

$\displaystyle |\sum_{i=1}^n (\sum_{j=1}^m a_{ij} x_j) y_i| \leq M$

whenever ${(x_j)_{j=1}^m}$ and ${(y_i)_{i=1}^n}$ are sequences with ${\sum_{j=1}^m |x_j|^2 = \sum_{i=1}^n |y_i|^2 = 1}$; but this easily follows from the arithmetic mean-geometric mean inequality

$\displaystyle |a_{ij} x_j y_i| \leq \frac{1}{2} |a_{ij}| |x_j|^2 + \frac{1}{2} |a_{ij}| |y_i|^2$

and (2), (3).
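As a quick sanity check of the duality formulation, the following sketch computes the constant ${M}$ from (2), (3) for a small matrix and tests the bilinear bound on randomly sampled unit vectors. The matrix and the helper names are hypothetical, and for simplicity only real vectors are sampled, so this is an illustration of the bound and not a computation of the operator norm.

```python
import math
import random

def schur_constant(a):
    """Smallest M satisfying the row-sum bound (2) and the column-sum bound (3)."""
    row_sums = [sum(abs(v) for v in row) for row in a]
    col_sums = [sum(abs(row[j]) for row in a) for j in range(len(a[0]))]
    return max(row_sums + col_sums)

def random_unit_vector(dim, rng):
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    size = math.sqrt(sum(c * c for c in v))
    return [c / size for c in v]

def bilinear_form(a, x, y):
    """Compute sum_i (sum_j a_ij x_j) y_i."""
    return sum(sum(a[i][j] * x[j] for j in range(len(x))) * y[i] for i in range(len(y)))

A = [[1.0, -2.0, 0.0],
     [0.5, 1.0, 1.0]]  # n = 2 rows, m = 3 columns
M = schur_constant(A)

rng = random.Random(0)
worst = max(
    abs(bilinear_form(A, random_unit_vector(3, rng), random_unit_vector(2, rng)))
    for _ in range(2000)
)
print(M, worst)  # worst should never exceed M
```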

Schur’s test (4) (and its many generalisations to weighted situations, or to Lebesgue or Lorentz spaces) is particularly useful for controlling operators in which the role of oscillation (as reflected in the phase of the coefficients ${a_{ij}}$, as opposed to just their magnitudes ${|a_{ij}|}$) is not decisive. However, it is of limited use in situations that involve a lot of cancellation. For this, a different test, known as the Cotlar-Stein lemma, is much more flexible and powerful. It can be viewed in a sense as a non-commutative variant of Schur’s test (4) (or of (1)), in which the scalar coefficients ${\lambda_i}$ or ${a_{ij}}$ are replaced by operators instead.

To illustrate the basic flavour of the result, let us return to the bound (1), and now consider instead a block-diagonal matrix

$\displaystyle A = \begin{pmatrix} \Lambda_1 & 0 & \ldots & 0 \\ 0 & \Lambda_2 & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & \Lambda_n \end{pmatrix} \ \ \ \ \ (5)$

where each ${\Lambda_i}$ is now a ${m_i \times m_i}$ matrix, and so ${A}$ is an ${m \times m}$ matrix with ${m := m_1 + \ldots +m_n}$. Then we have

$\displaystyle \|A\|_{op} = \sup_{1 \leq i \leq n} \|\Lambda_i\|_{op}. \ \ \ \ \ (6)$

Indeed, the lower bound is trivial (as can be seen by testing ${A}$ on vectors which are supported on the ${i^{th}}$ block of coordinates), while to establish the upper bound, one can make use of the orthogonal decomposition

$\displaystyle {\bf C}^m \equiv \bigoplus_{i=1}^n {\bf C}^{m_i} \ \ \ \ \ (7)$

to decompose an arbitrary vector ${x \in {\bf C}^m}$ as

$\displaystyle x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$

with ${x_i \in {\bf C}^{m_i}}$, in which case we have

$\displaystyle Ax = \begin{pmatrix} \Lambda_1 x_1 \\ \Lambda_2 x_2 \\ \vdots \\ \Lambda_n x_n \end{pmatrix}$

and the upper bound in (6) then follows from a simple computation.

The operator ${T}$ associated to the matrix ${A}$ in (5) can be viewed as a sum ${T = \sum_{i=1}^n T_i}$, where each ${T_i}$ corresponds to the ${\Lambda_i}$ block of ${A}$, in which case (6) can also be written as

$\displaystyle \|T\|_{op} = \sup_{1 \leq i \leq n} \|T_i\|_{op}. \ \ \ \ \ (8)$

When ${n}$ is large, this is a significant improvement over the triangle inequality, which merely gives

$\displaystyle \|T\|_{op} \leq \sum_{1 \leq i \leq n} \|T_i\|_{op}.$

The reason for this gain can ultimately be traced back to the “orthogonality” of the ${T_i}$; that they “occupy different columns” and “different rows” of the range and domain of ${T}$. This is obvious when viewed in the matrix formalism, but can also be described in the more abstract Hilbert space operator formalism via the identities

$\displaystyle T_i^* T_j = 0 \ \ \ \ \ (9)$

and

$\displaystyle T_i T_j^* = 0 \ \ \ \ \ (10)$

whenever ${i \neq j}$. (The first identity asserts that the ranges of the ${T_i}$ are orthogonal to each other, and the second asserts that the coranges of the ${T_i}$ (the ranges of the adjoints ${T_i^*}$) are orthogonal to each other.) By replacing (7) with a more abstract orthogonal decomposition into these ranges and coranges, one can in fact deduce (8) directly from (9) and (10).
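The identities (9) and (10) are easy to confirm on a small block-diagonal example. In the sketch below (with hypothetical blocks and helper names, and real entries so that the adjoint is just the transpose), each ${T_i}$ is the ${i^{th}}$ block placed on the diagonal of an otherwise zero matrix.

```python
def embed_block(block, offset, size):
    """Place a square block on the diagonal of a size x size zero matrix."""
    t = [[0] * size for _ in range(size)]
    for r, row in enumerate(block):
        for c, value in enumerate(row):
            t[offset + r][offset + c] = value
    return t

def transpose(a):
    return [list(col) for col in zip(*a)]

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def is_zero(a):
    return all(v == 0 for row in a for v in row)

blocks = [[[1, 2], [0, 1]], [[3]], [[0, 1], [1, 0]]]
size = sum(len(b) for b in blocks)

pieces, offset = [], 0
for b in blocks:
    pieces.append(embed_block(b, offset, size))
    offset += len(b)

# Check T_i^* T_j = 0 and T_i T_j^* = 0 for all distinct i, j.
orthogonal = all(
    is_zero(mat_mul(transpose(pieces[i]), pieces[j]))
    and is_zero(mat_mul(pieces[i], transpose(pieces[j])))
    for i in range(len(pieces))
    for j in range(len(pieces))
    if i != j
)
print(orthogonal)  # True
```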

The Cotlar-Stein lemma is an extension of this observation to the case where the ${T_i}$ are merely almost orthogonal rather than orthogonal, in a manner somewhat analogous to how Schur’s test (partially) extends (1) to the non-diagonal case. Specifically, we have

Lemma 1 (Cotlar-Stein lemma) Let ${T_1,\ldots,T_n: H \rightarrow H'}$ be a finite sequence of bounded linear operators from one Hilbert space ${H}$ to another ${H'}$, obeying the bounds

$\displaystyle \sum_{j=1}^n \| T_i T_j^* \|_{op}^{1/2} \leq M \ \ \ \ \ (11)$

and

$\displaystyle \sum_{j=1}^n \| T_i^* T_j \|_{op}^{1/2} \leq M \ \ \ \ \ (12)$

for all ${i=1,\ldots,n}$ and some ${M > 0}$ (compare with (2), (3)). Then one has

$\displaystyle \| \sum_{i=1}^n T_i \|_{op} \leq M. \ \ \ \ \ (13)$

Note from the basic ${TT^*}$ identity

$\displaystyle \|T\|_{op} = \|TT^* \|_{op}^{1/2} = \|T^* T\|_{op}^{1/2} \ \ \ \ \ (14)$

that the hypothesis (11) (or (12)) already gives the bound

$\displaystyle \|T_i\|_{op} \leq M \ \ \ \ \ (15)$

on each component ${T_i}$ of ${T}$, which by the triangle inequality gives the inferior bound

$\displaystyle \| \sum_{i=1}^n T_i \|_{op} \leq nM;$

the point of the Cotlar-Stein lemma is that the dependence on ${n}$ in this bound is eliminated in (13), which in particular makes the bound suitable for extension to the limit ${n \rightarrow \infty}$ (see Remark 1 below).

The Cotlar-Stein lemma was first established by Cotlar in the special case of commuting self-adjoint operators, and then independently by Cotlar and Stein in full generality, with the proof appearing in a subsequent paper of Knapp and Stein.

The Cotlar-Stein lemma is often useful in controlling operators such as singular integral operators or pseudo-differential operators ${T}$ which “do not mix scales together too much”, in that operators ${T}$ map functions “that oscillate at a given scale ${2^{-i}}$” to functions that still mostly oscillate at the same scale ${2^{-i}}$. In that case, one can often split ${T}$ into components ${T_i}$ which essentially capture the scale ${2^{-i}}$ behaviour, and understanding ${L^2}$ boundedness properties of ${T}$ then reduces to establishing the boundedness of the simpler operators ${T_i}$ (and of establishing a sufficient decay in products such as ${T_i^* T_j}$ or ${T_i T_j^*}$ when ${i}$ and ${j}$ are separated from each other). In some cases, one can use Fourier-analytic tools such as Littlewood-Paley projections to generate the ${T_i}$, but the true power of the Cotlar-Stein lemma comes from situations in which the Fourier transform is not suitable, such as when one has a complicated domain (e.g. a manifold or a non-abelian Lie group), or very rough coefficients (which would then have badly behaved Fourier behaviour). One can then select the decomposition ${T = \sum_i T_i}$ in a fashion that is tailored to the particular operator ${T}$, and is not necessarily dictated by Fourier-analytic considerations.

Once one is in the almost orthogonal setting, as opposed to the genuinely orthogonal setting, the previous arguments based on orthogonal projection seem to fail completely. Instead, the proof of the Cotlar-Stein lemma proceeds via an elegant application of the tensor power trick (or perhaps more accurately, the power method), in which the operator norm of ${T}$ is understood through the operator norm of a large power of ${T}$ (or more precisely, of its self-adjoint square ${TT^*}$ or ${T^* T}$). Indeed, from an iteration of (14) we see that for any natural number ${N}$, one has

$\displaystyle \|T\|_{op}^{2N} = \| (TT^*)^N \|_{op}. \ \ \ \ \ (16)$

To estimate the right-hand side, we expand out the right-hand side and apply the triangle inequality to bound it by

$\displaystyle \sum_{i_1,j_1,\ldots,i_N,j_N \in \{1,\ldots,n\}} \| T_{i_1} T_{j_1}^* T_{i_2} T_{j_2}^* \ldots T_{i_N} T_{j_N}^* \|_{op}. \ \ \ \ \ (17)$

Recall that when we applied the triangle inequality directly to ${T}$, we lost a factor of ${n}$ in the final estimate; it will turn out that we will lose a similar factor here, but this factor will eventually be attenuated into nothingness by the tensor power trick.

To bound (17), we use the basic inequality ${\|ST\|_{op} \leq \|S\|_{op} \|T\|_{op}}$ in two different ways. If we group the product ${T_{i_1} T_{j_1}^* T_{i_2} T_{j_2}^* \ldots T_{i_N} T_{j_N}^*}$ in pairs, we can bound the summand of (17) by

$\displaystyle \| T_{i_1} T_{j_1}^* \|_{op} \ldots \| T_{i_N} T_{j_N}^* \|_{op}.$

On the other hand, we can group the product by pairs in another way, to obtain the bound of

$\displaystyle \| T_{i_1} \|_{op} \| T_{j_1}^* T_{i_2} \|_{op} \ldots \| T_{j_{N-1}}^* T_{i_N}\|_{op} \| T_{j_N}^* \|_{op}.$

We bound ${\| T_{i_1} \|_{op}}$ and ${\| T_{j_N}^* \|_{op}}$ crudely by ${M}$ using (15). Taking the geometric mean of the above bounds, we can thus bound (17) by

$\displaystyle M \sum_{i_1,j_1,\ldots,i_N,j_N \in \{1,\ldots,n\}} \| T_{i_1} T_{j_1}^* \|_{op}^{1/2} \| T_{j_1}^* T_{i_2} \|_{op}^{1/2} \ldots \| T_{j_{N-1}}^* T_{i_N}\|_{op}^{1/2} \| T_{i_N} T_{j_N}^* \|_{op}^{1/2}.$

If we then sum this series first in ${j_N}$, then in ${i_N}$, then moving back all the way to ${i_1}$, using (11) and (12) alternately, we obtain a final bound of

$\displaystyle n M^{2N}$

for (16). Taking ${(2N)^{th}}$ roots, we obtain

$\displaystyle \|T\|_{op} \leq n^{1/2N} M.$

Sending ${N \rightarrow \infty}$, we obtain the claim.
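To see the lemma in action on a toy example, the sketch below builds a hypothetical family of almost orthogonal ${n \times n}$ real matrices ${T_i}$ (a ${1}$ in position ${(i,i)}$ and a small ${\varepsilon}$ in position ${(i,i+1)}$), computes the constant ${M}$ from (11) and (12), and compares the norm of the sum with both ${M}$ and the triangle inequality bound. Operator norms are estimated by power iteration on ${A^T A}$, which approximates the norm from below; this is exact enough for the rank one products appearing here, but it is an assumption of the sketch and not part of the lemma.

```python
import math

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def op_norm(a, iterations=100):
    """Estimate ||A||_op for a real matrix by power iteration on A^T A."""
    at = transpose(a)
    x = [[1.0 + 0.1 * j] for j in range(len(a[0]))]  # column vector with positive entries
    for _ in range(iterations):
        z = mat_mul(at, mat_mul(a, x))
        size = math.sqrt(sum(v[0] ** 2 for v in z))
        if size == 0.0:
            return 0.0
        x = [[v[0] / size] for v in z]
    return math.sqrt(sum(v[0] ** 2 for v in mat_mul(a, x)))

n, eps = 6, 0.25

def make_piece(i):
    """T_i has a 1 at (i, i) and eps at (i, i + 1), so neighbouring pieces overlap slightly."""
    t = [[0.0] * n for _ in range(n)]
    t[i][i] = 1.0
    if i + 1 < n:
        t[i][i + 1] = eps
    return t

pieces = [make_piece(i) for i in range(n)]
total = [[sum(t[r][c] for t in pieces) for c in range(n)] for r in range(n)]

# The constant M in (11), (12): the worst sum over j, taken over both families and all i.
M = max(
    max(
        sum(math.sqrt(op_norm(mat_mul(ti, transpose(tj)))) for tj in pieces),
        sum(math.sqrt(op_norm(mat_mul(transpose(ti), tj))) for tj in pieces),
    )
    for ti in pieces
)
norm_of_sum = op_norm(total)
triangle_bound = sum(op_norm(t) for t in pieces)
print(norm_of_sum, M, triangle_bound)
```

In this example ${M = \sqrt{1+\varepsilon^2} + 2\sqrt{\varepsilon}}$ regardless of ${n}$, whereas the triangle inequality bound grows linearly in ${n}$.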

Remark 1 As observed in a number of places (see e.g. page 318 of Stein’s book, or this paper of Comech), the Cotlar-Stein lemma can be extended to infinite sums ${\sum_{i=1}^\infty T_i}$ (with the obvious changes to the hypotheses (11), (12)). Indeed, one can show that for any ${f \in H}$, the sum ${\sum_{i=1}^\infty T_i f}$ is unconditionally convergent in ${H'}$ (and furthermore has bounded ${2}$-variation), and the resulting operator ${\sum_{i=1}^\infty T_i}$ is a bounded linear operator with an operator norm bound of ${M}$.

Remark 2 If we specialise to the case where all the ${T_i}$ are equal, we see that the bound in the Cotlar-Stein lemma is sharp, at least in this case. Thus we see how the tensor power trick can convert an inefficient argument, such as that obtained using the triangle inequality or crude bounds such as (15), into an efficient one.

Remark 3 One can prove Schur’s test by a similar method. Indeed, starting from the inequality

$\displaystyle \|A\|_{op}^{2N} \leq \hbox{tr}( (AA^*)^N )$

(which follows easily from the singular value decomposition), we can bound ${\|A\|_{op}^{2N}}$ by

$\displaystyle \sum_{i_1,\ldots,i_N \in \{1,\ldots,n\}} \sum_{j_1,\ldots,j_N \in \{1,\ldots,m\}} a_{i_1,j_1} \overline{a_{i_2,j_1}} \ldots a_{i_N,j_N} \overline{a_{i_1,j_N}}.$

Estimating the other two terms in the summand by ${M}$, and then repeatedly summing the indices one at a time as before, we obtain

$\displaystyle \|A\|_{op}^{2N} \leq n M^{2N}$

and the claim follows from the tensor power trick as before. On the other hand, in the converse direction, I do not know of any way to prove the Cotlar-Stein lemma that does not basically go through the tensor power argument.
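The trace inequality in Remark 3 can also be watched converging numerically. The sketch below (with a hypothetical real ${2 \times 3}$ matrix, so that ${A^*}$ is the transpose) computes ${\hbox{tr}((AA^*)^N)^{1/2N}}$ for increasing ${N}$; each value is an upper bound for ${\|A\|_{op}}$, and each is in turn at most ${n^{1/2N} M}$, where ${M}$ is the Schur constant of ${A}$.

```python
def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def trace_power_bounds(a, max_power):
    """Return tr((A A^T)^N)^(1/(2N)) for N = 1, ..., max_power."""
    b = mat_mul(a, transpose(a))
    power, bounds = b, []
    for big_n in range(1, max_power + 1):
        trace = sum(power[i][i] for i in range(len(power)))
        bounds.append(trace ** (1.0 / (2 * big_n)))
        power = mat_mul(power, b)
    return bounds

A = [[1.0, -2.0, 0.0],
     [0.5, 1.0, 1.0]]
n = len(A)

# Schur constant: the largest absolute row sum or column sum.
M = max(
    [sum(abs(v) for v in row) for row in A]
    + [sum(abs(row[j]) for row in A) for j in range(len(A[0]))]
)

bounds = trace_power_bounds(A, 8)
schur_side = [n ** (1.0 / (2 * big_n)) * M for big_n in range(1, 9)]
for big_n, (lower, upper) in enumerate(zip(bounds, schur_side), start=1):
    print(big_n, lower, upper)
```

Both columns decrease as ${N}$ grows, the first towards ${\|A\|_{op}}$ and the second towards ${M}$, which is the tensor power trick removing the factor of ${n}$.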

Recall that a (real) topological vector space is a real vector space ${V = (V, 0, +, \cdot)}$ equipped with a topology ${{\mathcal F}}$ that makes the vector space operations ${+: V \times V \rightarrow V}$ and ${\cdot: {\bf R} \times V \rightarrow V}$ continuous. One often restricts attention to Hausdorff topological vector spaces; in practice, this is not a severe restriction because it turns out that any topological vector space can be made Hausdorff by quotienting out the closure ${\overline{\{0\}}}$ of the origin ${\{0\}}$. One can also discuss complex topological vector spaces, and the theory is not significantly different; but for sake of exposition we shall restrict attention here to the real case.

An obvious example of a topological vector space is a finite-dimensional vector space such as ${{\bf R}^n}$ with the usual topology. Of course, there are plenty of infinite-dimensional topological vector spaces also, such as infinite-dimensional normed vector spaces (with the strong, weak, or weak-* topologies) or Frechet spaces.

One way to distinguish the finite and infinite dimensional topological vector spaces is via local compactness. Recall that a topological space is locally compact if every point in that space has a compact neighbourhood. From the Heine-Borel theorem, all finite-dimensional vector spaces (with the usual topology) are locally compact. In infinite dimensions, one can trivially make a vector space locally compact by giving it a trivial topology, but once one restricts to the Hausdorff case, it seems impossible to make a space locally compact. For instance, in an infinite-dimensional normed vector space ${V}$ with the strong topology, an iteration of the Riesz lemma shows that the closed unit ball ${B}$ in that space contains an infinite sequence with no convergent subsequence, which (by the Heine-Borel theorem) implies that ${V}$ cannot be locally compact. If one gives ${V}$ the weak-* topology instead, then ${B}$ is now compact by the Banach-Alaoglu theorem, but is no longer a neighbourhood of the identity in this topology. In fact, we have the following result:

Theorem 1 Every locally compact Hausdorff topological vector space is finite-dimensional.

The first proof of this theorem that I am aware of is by André Weil. There is also a related result:

Theorem 2 Every finite-dimensional Hausdorff topological vector space has the usual topology.

As a corollary, every locally compact Hausdorff topological vector space is in fact isomorphic to ${{\bf R}^n}$ with the usual topology for some ${n}$. This can be viewed as a very special case of the theorem of Gleason, which is a key component of the solution to Hilbert’s fifth problem, that a locally compact group ${G}$ with no small subgroups (in the sense that there is a neighbourhood of the identity that contains no non-trivial subgroups) is necessarily isomorphic to a Lie group. Indeed, Theorem 1 is in fact used in the proof of Gleason’s theorem (the rough idea being to first locate a “tangent space” to ${G}$ at the origin, with the tangent vectors described by “one-parameter subgroups” of ${G}$, and show that this space is a locally compact Hausdorff topological vector space, and hence finite dimensional by Theorem 1).

Theorem 2 may seem devoid of content, but it does contain some subtleties, as it hinges crucially on the joint continuity of the vector space operations ${+: V \times V \rightarrow V}$ and ${\cdot: {\bf R} \times V \rightarrow V}$, and not just on the separate continuity in each coordinate. Consider for instance the one-dimensional vector space ${{\bf R}}$ with the co-compact topology (a non-empty set is open iff its complement is compact in the usual topology). In this topology, the space is ${T_1}$ (though not Hausdorff), the scalar multiplication map ${\cdot: {\bf R} \times {\bf R} \rightarrow {\bf R}}$ is jointly continuous as long as the scalar is not zero, and the addition map ${+: {\bf R} \times {\bf R} \rightarrow {\bf R}}$ is continuous in each coordinate (i.e. translations are continuous), but not jointly continuous; for instance, the set ${\{ (x,y) \in {\bf R}^2: x+y \not \in [0,1]\}}$ does not contain a non-trivial Cartesian product of two sets that are open in the co-compact topology. So this is not a counterexample to Theorem 2. Similarly for the cocountable or cofinite topologies on ${{\bf R}}$ (the latter topology, incidentally, is the same as the Zariski topology on ${{\bf R}}$).

Another near-counterexample comes from the topology of ${{\bf R}}$ inherited by pulling back the usual topology on the unit circle ${{\bf R}/{\bf Z}}$. Admittedly, this pullback topology is not quite Hausdorff, but the addition map ${+: {\bf R} \times {\bf R} \rightarrow {\bf R}}$ is jointly continuous. On the other hand, the scalar multiplication map ${\cdot: {\bf R} \times {\bf R} \rightarrow {\bf R}}$ is not continuous at all. A slight variant of this topology comes from pulling back the usual topology on the torus ${({\bf R}/{\bf Z})^2}$ under the map ${x \mapsto (x,\alpha x)}$ for some irrational ${\alpha}$; this restores the Hausdorff property, and addition is still jointly continuous, but multiplication remains discontinuous.

As some final examples, consider ${{\bf R}}$ with the discrete topology; here, the topology is Hausdorff, addition is jointly continuous, and every dilation is continuous, but multiplication is not jointly continuous. If one instead gives ${{\bf R}}$ the half-open topology, then again the topology is Hausdorff and addition is jointly continuous, but scalar multiplication is only jointly continuous once one restricts the scalar to be non-negative.

Below the fold, I record the textbook proof of Theorem 2 and Theorem 1. There is nothing particularly original in this presentation, but I wanted to record it here for my own future reference, and perhaps these results will also be of interest to some other readers.

If ${f: {\bf R}^d \rightarrow {\bf C}}$ is a locally integrable function, we define the Hardy-Littlewood maximal function ${Mf: {\bf R}^d \rightarrow [0,+\infty]}$ by the formula

$\displaystyle Mf(x) := \sup_{r>0} \frac{1}{|B(x,r)|} \int_{B(x,r)} |f(y)|\ dy,$

where ${B(x,r)}$ is the ball of radius ${r}$ centred at ${x}$, and ${|E|}$ denotes the measure of a set ${E}$. The Hardy-Littlewood maximal inequality asserts that

$\displaystyle |\{ x \in {\bf R}^d: Mf(x) > \lambda \}| \leq \frac{C_d}{\lambda} \|f\|_{L^1({\bf R}^d)} \ \ \ \ \ (1)$

for all ${f\in L^1({\bf R}^d)}$, all ${\lambda > 0}$, and some constant ${C_d > 0}$ depending only on ${d}$. By a standard density argument, this implies in particular that we have the Lebesgue differentiation theorem

$\displaystyle \lim_{r \rightarrow 0} \frac{1}{|B(x,r)|} \int_{B(x,r)} f(y)\ dy = f(x)$

for all ${f \in L^1({\bf R}^d)}$ and almost every ${x \in {\bf R}^d}$. See for instance my lecture notes on this topic.
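As a quick sanity check of (1) in one dimension (a worked example, with the specific function chosen purely for illustration), take ${d=1}$ and ${f = 1_{[0,1]}}$, so that ${\|f\|_{L^1({\bf R})} = 1}$. A direct computation gives ${Mf(x) = 1}$ for ${0 < x < 1}$, ${Mf(x) = \frac{1}{2x}}$ for ${x > 1}$ (the optimal radius being ${r = x}$), and ${Mf(x) = \frac{1}{2(1-x)}}$ for ${x < 0}$ by symmetry. Hence, for ${0 < \lambda < 1/2}$,

$\displaystyle |\{ x \in {\bf R}: Mf(x) > \lambda \}| = 1 + 2 \left( \frac{1}{2\lambda} - 1 \right) = \frac{1}{\lambda} - 1,$

which is consistent with (1), and which also indicates that the ${1/\lambda}$ decay on the right-hand side of (1) cannot be improved for this ${f}$ as ${\lambda \rightarrow 0}$.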

By combining the Hardy-Littlewood maximal inequality with the Marcinkiewicz interpolation theorem (and the trivial inequality ${\|Mf\|_{L^\infty({\bf R}^d)} \leq \|f\|_{L^\infty({\bf R}^d)}}$) we see that

$\displaystyle \|Mf\|_{L^p({\bf R}^d)} \leq C_{d,p} \|f\|_{L^p({\bf R}^d)} \ \ \ \ \ (2)$

for all ${p > 1}$ and ${f \in L^p({\bf R}^d)}$, and some constant ${C_{d,p}}$ depending on ${d}$ and ${p}$.

The exact dependence of ${C_{d,p}}$ on ${d}$ and ${p}$ is still not completely understood. The standard Vitali-type covering argument used to establish (1) has an exponential dependence on dimension, giving a constant of the form ${C_d = C^d}$ for some absolute constant ${C>1}$. Inserting this into the Marcinkiewicz theorem, one obtains a constant ${C_{d,p}}$ of the form ${C_{d,p} = \frac{C^d}{p-1}}$ for some ${C>1}$ (and taking ${p}$ bounded away from infinity, for simplicity). The dependence on ${p}$ is about right, but the dependence on ${d}$ should not be exponential.

In 1982, Stein gave an elegant argument (with full details appearing in a subsequent paper of Stein and Strömberg), based on the Calderón-Zygmund method of rotations, to eliminate the dependence on ${d}$:

Theorem 1 One can take ${C_{d,p} = C_p}$ for each ${p>1}$, where ${C_p}$ depends only on ${p}$.

The argument is based on an earlier bound of Stein from 1976 on the spherical maximal function

$\displaystyle M_S f(x) := \sup_{r>0} A_r |f|(x)$

where ${A_r}$ are the spherical averaging operators

$\displaystyle A_r f(x) := \int_{S^{d-1}} f(x+r\omega) d\sigma^{d-1}(\omega)$

and ${d\sigma^{d-1}}$ is normalised surface measure on the sphere ${S^{d-1}}$. Because this is an uncountable supremum, and the averaging operators ${A_r}$ do not have good continuity properties in ${r}$, it is not a priori obvious that ${M_S f}$ is even a measurable function for, say, locally integrable ${f}$; but we can avoid this technical issue, at least initially, by restricting attention to continuous functions ${f}$. The Stein maximal theorem for the spherical maximal function then asserts that if ${d \geq 3}$ and ${p > \frac{d}{d-1}}$, then we have

$\displaystyle \| M_S f \|_{L^p({\bf R}^d)} \leq C_{d,p} \|f\|_{L^p({\bf R}^d)} \ \ \ \ \ (3)$

for all (continuous) ${f \in L^p({\bf R}^d)}$. We will sketch a proof of this theorem below the fold. (Among other things, one can use this bound to show the pointwise convergence ${\lim_{r \rightarrow 0} A_r f(x) = f(x)}$ of the spherical averages for any ${f \in L^p({\bf R}^d)}$ when ${d \geq 3}$ and ${p > \frac{d}{d-1}}$, although we will not focus on this application here.)

The condition ${p > \frac{d}{d-1}}$ can be seen to be necessary as follows. Take ${f}$ to be any fixed bump function. A brief calculation then shows that ${M_S f(x)}$ decays like ${|x|^{1-d}}$ as ${|x| \rightarrow \infty}$, and hence ${M_S f}$ does not lie in ${L^p({\bf R}^d)}$ unless ${p > \frac{d}{d-1}}$. By taking ${f}$ to be a rescaled bump function supported on a small ball, one can show that the condition ${p > \frac{d}{d-1}}$ is necessary even if we replace ${{\bf R}^d}$ with a compact region (and similarly restrict the radius parameter ${r}$ to be bounded). The condition ${d \geq 3}$ however is not quite necessary; the result is also true when ${d=2}$, but this turned out to be a more difficult result, obtained first by Bourgain, with a simplified proof (based on the local smoothing properties of the wave equation) later given by Mockenhaupt-Seeger-Sogge.
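To spell out the integrability computation behind the first of these claims (a sketch, taking the decay rate ${|x|^{1-d}}$ as given, and writing ${c_d}$ for a dimensional constant that is not named in the text): in polar co-ordinates one has

$\displaystyle \int_{|x| \geq 1} |x|^{(1-d)p}\ dx = c_d \int_1^\infty r^{(1-d)p + d - 1}\ dr,$

which is finite precisely when ${(1-d)p + d - 1 < -1}$, that is to say when ${p > \frac{d}{d-1}}$.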

The Hardy-Littlewood maximal operator ${Mf}$, which involves averaging over balls, is clearly related to the spherical maximal operator, which averages over spheres. Indeed, by using polar co-ordinates, one easily verifies the pointwise inequality

$\displaystyle Mf(x) \leq M_S f(x)$

for any (continuous) ${f}$, which intuitively reflects the fact that one can think of a ball as an average of spheres. Thus, we see that the spherical maximal inequality (3) implies the Hardy-Littlewood maximal inequality (2) with the same constant ${C_{d,p}}$. (This implication is initially only valid for continuous functions, but one can then extend the inequality (2) to the rest of ${L^p({\bf R}^d)}$ by a standard limiting argument.)
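For the record, the polar co-ordinates computation can be sketched as follows (here we use the standard fact that the unnormalised surface area of ${S^{d-1}}$ is ${d |B(0,1)|}$, so that ${|B(x,r)| = |B(0,1)| r^d}$ and the normalisation of ${d\sigma^{d-1}}$ cancels):

$\displaystyle \frac{1}{|B(x,r)|} \int_{B(x,r)} |f(y)|\ dy = \frac{d}{r^d} \int_0^r A_s |f|(x)\ s^{d-1}\ ds \leq M_S f(x) \cdot \frac{d}{r^d} \int_0^r s^{d-1}\ ds = M_S f(x).$

Taking the supremum over ${r > 0}$ on the left-hand side then gives the pointwise inequality.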

At first glance, this observation does not immediately establish Theorem 1 for two reasons. Firstly, Stein’s spherical maximal theorem is restricted to the case when ${d \geq 3}$ and ${p > \frac{d}{d-1}}$; and secondly, the constant ${C_{d,p}}$ in that theorem still depends on dimension ${d}$. The first objection can be easily disposed of, for if ${p>1}$, then the hypotheses ${d \geq 3}$ and ${p > \frac{d}{d-1}}$ will automatically be satisfied for ${d}$ sufficiently large (depending on ${p}$); note that the case when ${d}$ is bounded (with a bound depending on ${p}$) is already handled by the classical maximal inequality (2).
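Explicitly (a one-line computation), for ${p > 1}$ and ${d \geq 2}$ one has

$\displaystyle p > \frac{d}{d-1} \iff p(d-1) > d \iff d > \frac{p}{p-1},$

so both hypotheses of the spherical maximal theorem hold as soon as ${d > \max(2, \frac{p}{p-1})}$.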

We still have to deal with the second objection, namely that the constant ${C_{d,p}}$ in (3) depends on ${d}$. However, here we can use the method of rotations to show that the constants ${C_{d,p}}$ can be taken to be non-increasing (and hence bounded) in ${d}$. The idea is to view high-dimensional spheres as an average of rotated low-dimensional spheres. We illustrate this with a demonstration that ${C_{d+1,p} \leq C_{d,p}}$, in the sense that any bound of the form

$\displaystyle \| M_S f \|_{L^p({\bf R}^d)} \leq A \|f\|_{L^p({\bf R}^d)} \ \ \ \ \ (4)$

for the ${d}$-dimensional spherical maximal function, implies the same bound

$\displaystyle \| M_S f \|_{L^p({\bf R}^{d+1})} \leq A \|f\|_{L^p({\bf R}^{d+1})} \ \ \ \ \ (5)$

for the ${d+1}$-dimensional spherical maximal function, with exactly the same constant ${A}$. For any direction ${\omega_0 \in S^d \subset {\bf R}^{d+1}}$, consider the maximal operators

$\displaystyle M_S^{\omega_0} f(x) := \sup_{r>0} A_r^{\omega_0} |f|(x)$

for any continuous ${f: {\bf R}^{d+1} \rightarrow {\bf C}}$, where

$\displaystyle A_r^{\omega_0} f(x) := \int_{S^{d-1}} f( x + r U_{\omega_0} \omega)\ d\sigma^{d-1}(\omega)$

where ${U_{\omega_0}}$ is some orthogonal transformation mapping the sphere ${S^{d-1}}$ to the sphere ${S^{d-1,\omega_0} := \{ \omega \in S^d: \omega \perp \omega_0\}}$; the exact choice of orthogonal transformation ${U_{\omega_0}}$ is irrelevant due to the rotation-invariance of surface measure ${d\sigma^{d-1}}$ on the sphere ${S^{d-1}}$. A simple application of Fubini’s theorem (after first rotating ${\omega_0}$ to be, say, the standard unit vector ${e_{d+1}}$) using (4) then shows that

$\displaystyle \| M_S^{\omega_0} f \|_{L^p({\bf R}^{d+1})} \leq A \|f\|_{L^p({\bf R}^{d+1})} \ \ \ \ \ (6)$

uniformly in ${\omega_0}$. On the other hand, by viewing the ${d}$-dimensional sphere ${S^d}$ as an average of the spheres ${S^{d-1,\omega_0}}$, we have the identity

$\displaystyle A_r f(x) = \int_{S^d} A_r^{\omega_0} f(x)\ d\sigma^d(\omega_0);$

indeed, one can deduce this from the uniqueness of Haar measure by noting that both the left-hand side and right-hand side are invariant means of ${f}$ on the sphere ${\{ y \in {\bf R}^{d+1}: |y-x|=r\}}$. This implies that

$\displaystyle M_S f(x) \leq \int_{S^d} M_S^{\omega_0} f(x)\ d\sigma^d(\omega_0)$

and thus by Minkowski’s inequality for integrals, we may deduce (5) from (6).
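In more detail, this last step can be sketched as the chain of inequalities

$\displaystyle \| M_S f \|_{L^p({\bf R}^{d+1})} \leq \left\| \int_{S^d} M_S^{\omega_0} f\ d\sigma^d(\omega_0) \right\|_{L^p({\bf R}^{d+1})} \leq \int_{S^d} \| M_S^{\omega_0} f \|_{L^p({\bf R}^{d+1})}\ d\sigma^d(\omega_0) \leq A \|f\|_{L^p({\bf R}^{d+1})},$

where the second inequality is Minkowski’s inequality for integrals, and the third uses (6) together with the fact that the normalised measure ${d\sigma^d}$ has total mass one.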

Remark 1 Unfortunately, the method of rotations does not work to show that the constant ${C_d}$ for the weak ${(1,1)}$ inequality (1) is independent of dimension, as the weak ${L^1}$ quasinorm ${\| \|_{L^{1,\infty}}}$ is not a genuine norm and does not obey the Minkowski inequality for integrals. Indeed, the question of whether ${C_d}$ in (1) can be taken to be independent of dimension remains open. The best known positive result is due to Stein and Strömberg, who showed that one can take ${C_d = Cd}$ for some absolute constant ${C}$, by comparing the Hardy-Littlewood maximal function with the heat kernel maximal function

$\displaystyle \sup_{t > 0} e^{t\Delta} |f|(x).$

The abstract semigroup maximal inequality of Dunford and Schwartz (discussed for instance in these lecture notes of mine) shows that the heat kernel maximal function is of weak-type ${(1,1)}$ with a constant of ${1}$, and this can be used, together with a comparison argument, to give the Stein-Strömberg bound. In the converse direction, it is a recent result of Aldaz that if one replaces the balls ${B(x,r)}$ with cubes, then the weak ${(1,1)}$ constant ${C_d}$ must go to infinity as ${d \rightarrow \infty}$.

I recently reposted my favourite logic puzzle, namely the blue-eyed islander puzzle. I am fond of this puzzle because in order to properly understand the correct solution (and to properly understand why the alternative solution is incorrect), one has to think very clearly (but unintuitively) about the nature of knowledge.

There is however an additional subtlety to the puzzle that was pointed out in comments, in that the correct solution to the puzzle has two components, a (necessary) upper bound and a (possible) lower bound (I’ll explain this further below the fold, in order to avoid blatantly spoiling the puzzle here). Only the upper bound is correctly explained in the puzzle (and even then, there are some slight inaccuracies, as will be discussed below). The lower bound, however, is substantially more difficult to establish, in part because the bound is merely possible and not necessary. Ultimately, this is because to demonstrate the upper bound, one merely has to show that a certain statement is logically deducible from an islander’s state of knowledge, which can be done by presenting an appropriate chain of logical deductions. But to demonstrate the lower bound, one needs to show that certain statements are not logically deducible from an islander’s state of knowledge, which is much harder, as one has to rule out all possible chains of deductive reasoning from arriving at this particular conclusion. In fact, to rigorously establish such impossibility statements, one ends up having to leave the “syntactic” side of logic (deductive reasoning), and move instead to the dual “semantic” side of logic (creation of models). As we shall see, semantics requires substantially more mathematical setup than syntax, and the demonstration of the lower bound will therefore be much lengthier than that of the upper bound.

To complicate things further, the particular logic that is used in the blue-eyed islander puzzle is not the same as the logics that are commonly used in mathematics, namely propositional logic and first-order logic. Because the logical reasoning here depends so crucially on the concept of knowledge, one must work instead with an epistemic logic (or more precisely, an epistemic modal logic) which can properly work with, and model, the knowledge of various agents. To add even more complication, the role of time is also important (an islander may not know a certain fact on one day, but learn it on the next day), so one also needs to incorporate the language of temporal logic in order to fully model the situation. This makes both the syntax and semantics of the logic quite intricate; to see this, one only needs to contemplate the task of programming a computer with enough epistemic and temporal deductive reasoning powers that it would be able to solve the islander puzzle (or even a smaller version thereof, say with just three or four islanders) without being deliberately “fed” the solution. (The fact that humans can grasp the correct solution without any formal logical training is therefore quite remarkable.)

As difficult as the syntax of temporal epistemic modal logic is, though, the semantics is more intricate still. For instance, it turns out that in order to completely model the epistemic state of a finite number of agents (such as 1000 islanders), one requires an infinite model, due to the existence of arbitrarily long nested chains of knowledge (e.g. “${A}$ knows that ${B}$ knows that ${C}$ knows that ${D}$ has blue eyes”), which cannot be automatically reduced to shorter chains of knowledge. Furthermore, because each agent has only an incomplete knowledge of the world, one must take into account multiple hypothetical worlds, which differ from the real world but which are considered to be possible worlds by one or more agents, thus introducing modality into the logic. More subtly, one must also consider worlds which each agent knows to be impossible, but are not commonly known to be impossible, so that (for instance) one agent is willing to admit the possibility that another agent considers that world to be possible; it is the consideration of such worlds which is crucial to the resolution of the blue-eyed islander puzzle. And this is even before one adds the temporal aspect (e.g. “On Tuesday, ${A}$ knows that on Monday, ${B}$ knew that by Wednesday, ${C}$ will know that ${D}$ has blue eyes”).
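To give a rough feel for the role of possible worlds, here is a minimal Python sketch of a much simplified, finite version of the puzzle. It is not the hierarchical construction developed in this post: it assumes only two eye colours (blue or not blue), assumes the rules and the public announcement are common knowledge, and tracks a single commonly known set of possible worlds. The function name `first_deduction` and the encoding of worlds as tuples of booleans are hypothetical choices made for this illustration.

```python
from itertools import product


def first_deduction(actual):
    """Return (day, islanders) for the first day on which some islander
    can deduce their own eye colour.

    `actual` is a tuple of booleans, True meaning "blue-eyed".
    """
    actual = tuple(bool(b) for b in actual)
    n = len(actual)
    if not any(actual):
        raise ValueError("the announcement 'someone has blue eyes' must be true")

    # Worlds still commonly considered possible after the announcement.
    possible = {w for w in product((False, True), repeat=n) if any(w)}

    def knowers(world):
        # Islander i sees everyone else, so the only alternative to `world`
        # that i entertains is the world with i's own colour flipped.
        result = set()
        for i in range(n):
            flipped = world[:i] + (not world[i],) + world[i + 1:]
            if flipped not in possible:
                result.add(i)
        return frozenset(result)

    day = 1
    while True:
        who = knowers(actual)
        if who:
            return day, who
        # Nobody acted today. This is publicly observed, so every world in
        # which somebody would have acted is discarded.
        possible = {w for w in possible if not knowers(w)}
        day += 1


if __name__ == "__main__":
    print(first_deduction((True, True, True, False)))
```

In this toy model, with ${k}$ blue-eyed islanders the first deduction occurs on day ${k}$ and is made by exactly the blue-eyed islanders; the point of the sketch is only that each day's public observation shrinks the set of worlds that are still considered possible.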

Despite all this fearsome complexity, it is still possible to set up both the syntax and semantics of temporal epistemic modal logic in such a way that one can formulate the blue-eyed islander problem rigorously, and in such a way that one has both an upper and a lower bound in the solution. The purpose of this post is to construct such a setup and to explain the lower bound in particular. The same logic is also useful for analysing another well-known paradox, the unexpected hanging paradox, and I will do so at the end of the post. Note though that there is more than one way to set up epistemic logics, and they are not all equivalent to each other.

(On the other hand, for puzzles such as the islander puzzle in which there are only a finite number of atomic propositions and no free variables, one at least can avoid the need to admit predicate logic, in which one has to discuss quantifiers such as ${\forall}$ and ${\exists}$. A fully formed predicate temporal epistemic modal logic would indeed be of terrifying complexity.)

Our approach here will be a little different from the approach commonly found in the epistemic logic literature, in which one jumps straight to “arbitrary-order epistemic logic” in which arbitrarily long nested chains of knowledge (“${A}$ knows that ${B}$ knows that ${C}$ knows that \ldots”) are allowed. Instead, we will adopt a hierarchical approach, recursively defining for ${k=0,1,2,\ldots}$ a “${k^{th}}$-order epistemic logic” in which knowledge chains of depth up to ${k}$, but no greater, are permitted. The arbitrary-order epistemic logic is then obtained as a limit (a direct limit on the syntactic side, and an inverse limit on the semantic side, which is dual to the syntactic side) of the finite order epistemic logics.

I should warn that this is going to be a rather formal and mathematical post. Readers who simply want to know the answer to the islander puzzle would probably be better off reading the discussion at the puzzle’s own blog post instead.

A topological space ${X}$ is said to be metrisable if one can find a metric ${d: X \times X \rightarrow [0,+\infty)}$ on it whose open balls ${B(x,r) := \{ y \in X: d(x,y) < r \}}$ generate the topology.

There are some obvious necessary conditions on the space ${X}$ in order for it to be metrisable. For instance, it must be Hausdorff, since all metric spaces are Hausdorff. It must also be first countable, because every point ${x}$ in a metric space has a countable neighbourhood base of balls ${B(x,1/n)}$, ${n=1,2,\ldots}$.

In the converse direction, being Hausdorff and first countable is not always enough to guarantee metrisability, for a variety of reasons. For instance the long line is not metrisable despite being both Hausdorff and first countable, due to a failure of paracompactness, which prevents one from gluing together the local metric structures on this line into a global one. Even after adding in paracompactness, this is still not enough; the real line with the lower limit topology (also known as the Sorgenfrey line) is Hausdorff, first countable, and paracompact, but still not metrisable (because of a failure of second countability despite being separable).

However, there is one important setting in which the Hausdorff and first countability axioms do suffice to give metrisability, and that is the setting of topological groups:

Theorem 1 (Birkhoff-Kakutani theorem) Let ${G}$ be a topological group (i.e. a topological space that is also a group, such that the group operations ${\cdot: G \times G \rightarrow G}$ and ${()^{-1}: G \rightarrow G}$ are continuous). Then ${G}$ is metrisable if and only if it is both Hausdorff and first countable.

Remark 1 It is not hard to show that a topological group is Hausdorff if and only if the singleton set ${\{\hbox{id}\}}$ is closed. More generally, in an arbitrary topological group, it is a good exercise to show that the closure of ${\{\hbox{id}\}}$ is always a closed normal subgroup ${H}$ of ${G}$, whose quotient ${G/H}$ is then a Hausdorff topological group. Because of this, the study of topological groups can usually be reduced immediately to the study of Hausdorff topological groups. (Indeed, in many texts, the term “topological group” is automatically understood to be an abbreviation for “Hausdorff topological group”.)

The standard proof of the Birkhoff-Kakutani theorem (which we have taken from this book of Montgomery and Zippin) relies on the following Urysohn-type lemma:

Lemma 2 (Urysohn-type lemma) Let ${G}$ be a Hausdorff first countable topological group. Then there exists a bounded continuous function ${f: G \rightarrow [0,1]}$ with the following properties:

• (Unique maximum) ${f(\hbox{id}) = 1}$, and ${f(x) < 1}$ for all ${x \neq \hbox{id}}$.
• (Neighbourhood base) The sets ${\{ x \in G: f(x) > 1-1/n \}}$ for ${n=1,2,\ldots}$ form a neighbourhood base at the identity.
• (Uniform continuity) For every ${\varepsilon > 0}$, there exists an open neighbourhood ${U}$ of the identity such that ${|f(gx)-f(x)| \leq \varepsilon}$ for all ${g \in U}$ and ${x \in G}$.

Note that if ${G}$ had a left-invariant metric, then the function ${f(x) := \max( 1 - \hbox{dist}(x,\hbox{id}), 0)}$ would suffice for this lemma, which already gives some indication as to why this lemma is relevant to the Birkhoff-Kakutani theorem.

Let us assume Lemma 2 for now and finish the proof of the Birkhoff-Kakutani theorem. We only prove the difficult direction, namely that a Hausdorff first countable topological group ${G}$ is metrisable. We let ${f}$ be the function from Lemma 2, and define the function ${d_f: G \times G \rightarrow [0,+\infty)}$ by the formula

$\displaystyle d_f( g, h ) := \| \tau_g f - \tau_h f \|_{BC(G)} = \sup_{x \in G} |f(g^{-1} x) - f(h^{-1} x)| \ \ \ \ \ (1)$

where ${BC(G)}$ is the space of bounded continuous functions on ${G}$ (with the supremum norm) and ${\tau_g}$ is the left-translation operator ${\tau_g f(x) := f(g^{-1} x)}$.

Clearly ${d_f}$ obeys the identity ${d_f(g,g) = 0}$ and symmetry ${d_f(g,h) = d_f(h,g)}$ axioms, and the triangle inequality ${d_f(g,k) \leq d_f(g,h) + d_f(h,k)}$ is also immediate. This already makes ${d_f}$ a pseudometric. In order for ${d_f}$ to be a genuine metric, what is needed is that ${f}$ have no non-trivial translation invariances, i.e. one has ${\tau_g f \neq f}$ for all ${g \neq \hbox{id}}$. But this follows since ${f}$ attains its maximum at exactly one point, namely the group identity ${\hbox{id}}$.

To put it another way: because ${f}$ has no non-trivial translation invariances, the left translation action ${\tau}$ gives an embedding ${g \mapsto \tau_g f}$, and ${G}$ then inherits a metric ${d_f}$ from the metric structure on ${BC(G)}$.
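As a sanity check (an illustrative example, not part of the proof), take ${G = {\bf R}}$ with the usual topology and ${f(x) := \max(1-|x|,0)}$, so that ${\tau_g f(x) = f(x-g)}$ in additive notation. Since ${f}$ is ${1}$-Lipschitz and takes values in ${[0,1]}$, one has ${|f(x-g) - f(x-h)| \leq \min(|g-h|,1)}$ for all ${x}$, with equality at ${x = g}$, and hence

$\displaystyle d_f(g,h) = \sup_{x \in {\bf R}} |f(x-g) - f(x-h)| = \min( |g-h|, 1 ),$

which is a bounded translation-invariant metric generating the usual topology on ${{\bf R}}$.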

Now we have to check whether the metric ${d_f}$ actually generates the topology. This amounts to verifying two things. Firstly, that every ball ${B(x,r)}$ in this metric is open; and secondly, that every open neighbourhood of a point ${x \in G}$ contains a ball ${B(x,r)}$.

To verify the former claim, it suffices to show that the map ${g \mapsto \tau_g f}$ from ${G}$ to ${BC(G)}$ is continuous, which follows from the uniform continuity hypothesis. The second claim follows easily from the neighbourhood base hypothesis, since if ${d_f(g,h) < 1/n}$ then ${f(g^{-1} h) > 1-1/n}$.

Remark 2 The above argument in fact shows that if a topological group ${G}$ is metrisable, then it admits a left-invariant metric. The idea of using a suitable continuous function ${f}$ to generate a useful metric structure on a topological group is a powerful one, for instance underlying the Gleason lemmas which are fundamental to the solution of Hilbert’s fifth problem. I hope to return to this topic in a future post.

Now we prove Lemma 2. By first countability, we can find a countable neighbourhood base

$\displaystyle V_1 \supset V_2 \supset \ldots \supset \{\hbox{id}\}$

of the identity. As ${G}$ is Hausdorff, we must have

$\displaystyle \bigcap_{n=1}^\infty V_n = \{\hbox{id}\}.$

Using the continuity of the group axioms, we can recursively find a sequence of nested open neighbourhoods of the identity

$\displaystyle U_1 \supset U_{1/2} \supset U_{1/4} \supset \ldots \supset \{\hbox{id}\} \ \ \ \ \ (2)$

such that each ${U_{1/2^n}}$ is symmetric (i.e. ${g \in U_{1/2^n}}$ if and only if ${g^{-1} \in U_{1/2^n}}$), is contained in ${V_n}$, and is such that ${U_{1/2^{n+1}} \cdot U_{1/2^{n+1}} \subset U_{1/2^n}}$ for each ${n \geq 0}$. In particular the ${U_{1/2^n}}$ are also a neighbourhood base of the identity with

$\displaystyle \bigcap_{n=1}^\infty U_{1/2^n} = \{\hbox{id}\}. \ \ \ \ \ (3)$

For every dyadic rational ${a/2^n}$ in ${(0,1)}$, we can now define the open sets ${U_{a/2^n}}$ by setting

$\displaystyle U_{a/2^n} := U_{1/2^{n_k}} \cdot \ldots \cdot U_{1/2^{n_1}}$

where ${a/2^n = 2^{-n_1} + \ldots + 2^{-n_k}}$ is the binary expansion of ${a/2^n}$ with ${1 \leq n_1 < \ldots < n_k}$. By repeated use of the hypothesis ${U_{1/2^{n+1}} \cdot U_{1/2^{n+1}} \subset U_{1/2^n}}$ we see that the ${U_{a/2^n}}$ are increasing in ${a/2^n}$; indeed, we have the inclusion

$\displaystyle U_{1/2^n} \cdot U_{a/2^n} \subset U_{(a+1)/2^n} \ \ \ \ \ (4)$

for all ${n \geq 1}$ and ${1 \leq a < 2^n}$.

We now set

$\displaystyle f(x) := \sup \{ 1 - \frac{a}{2^n}: n \geq 1; 1 \leq a < 2^n; x \in U_{a/2^n} \}$

with the understanding that ${f(x)=0}$ if the supremum is over the empty set. One easily verifies using (4) that ${f}$ is continuous, and furthermore obeys the uniform continuity property. The neighbourhood base property follows since the ${U_{1/2^n}}$ are a neighbourhood base of the identity, and the unique maximum property follows from (3). This proves Lemma 2.
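To see the construction in action in the simplest case (an illustrative example, with the specific choice of sets being ours), take ${G = {\bf R}}$ under addition and ${U_{1/2^n} := (-2^{-n}, 2^{-n})}$ for ${n \geq 0}$. These sets are symmetric and obey ${U_{1/2^{n+1}} + U_{1/2^{n+1}} \subset U_{1/2^n}}$, and summing the intervals in the binary expansion gives

$\displaystyle U_{a/2^n} = \left( -\frac{a}{2^n}, \frac{a}{2^n} \right), \qquad f(x) = \sup \left\{ 1 - \frac{a}{2^n}: |x| < \frac{a}{2^n} < 1 \right\} = \max( 1 - |x|, 0 ),$

by the density of the dyadic rationals. Thus in this model case the construction recovers exactly the function ${\max(1 - \hbox{dist}(x,\hbox{id}),0)}$ mentioned after the statement of Lemma 2.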

Remark 3 A very similar argument to the one above also establishes that every topological group ${G}$ is completely regular.

Notice that the function ${f}$ constructed in the above argument was localised to the set ${V_1}$. As such, it is not difficult to localise the Birkhoff-Kakutani theorem to local groups. A local group is a topological space ${G}$ equipped with an identity ${\hbox{id}}$, a partially defined inversion operation ${()^{-1}: \Lambda \rightarrow G}$, and a partially defined product operation ${\cdot: \Omega \rightarrow G}$, where ${\Lambda}$, ${\Omega}$ are open subsets of ${G}$ and ${G \times G}$ respectively, obeying the following restricted versions of the group axioms:

1. (Continuity) ${\cdot}$ and ${()^{-1}}$ are continuous on their domains of definition.
2. (Identity) For any ${g \in G}$, ${\hbox{id} \cdot g}$ and ${g \cdot \hbox{id}}$ are well-defined and equal to ${g}$.
3. (Inverse) For any ${g \in \Lambda}$, ${g \cdot g^{-1}}$ and ${g^{-1} \cdot g}$ are well-defined and equal to ${\hbox{id}}$. ${\hbox{id}^{-1}}$ is well-defined and equal to ${\hbox{id}}$.
4. (Local associativity) If ${g, h, k \in G}$ are such that ${g \cdot h}$, ${(g \cdot h) \cdot k}$, ${h \cdot k}$, and ${g \cdot (h \cdot k)}$ are all well-defined, then ${(g \cdot h) \cdot k = g \cdot (h \cdot k)}$.

Informally, one can view a local group as a topological group in which the closure axiom has been almost completely dropped, but with all the other axioms retained. A basic way to generate a local group is to start with an ordinary topological group ${G}$ and restrict it to an open neighbourhood ${U}$ of the identity, with ${\Lambda := \{ g \in U: g^{-1} \in U \}}$ and ${\Omega := \{ (g,h) \in U \times U: gh \in U \}}$. However, this is not quite the only way to generate local groups (ultimately because the local associativity axiom does not necessarily imply a (stronger) global associativity axiom in which one considers two different ways to multiply more than three group elements together).
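The restriction construction can be sketched in a few lines of Python for the additive group ${{\bf R}}$ and the neighbourhood ${U = (-1,1)}$. This is only a toy model: the class name `LocalInterval` and its methods are hypothetical names introduced here, and the partial product signals "not defined" by returning `None`.

```python
class LocalInterval:
    """Local group obtained by restricting (R, +) to U = (-radius, radius)."""

    def __init__(self, radius=1.0):
        self.radius = radius

    def contains(self, g):
        # Membership in the open neighbourhood U of the identity 0.
        return abs(g) < self.radius

    def inverse(self, g):
        # Lambda = {g in U : -g in U}, which is all of U for a symmetric interval.
        if not self.contains(g):
            raise ValueError("element lies outside U")
        return -g

    def product(self, g, h):
        # Omega = {(g, h) in U x U : g + h in U}; return None off Omega.
        if not (self.contains(g) and self.contains(h)):
            raise ValueError("element lies outside U")
        s = g + h
        return s if self.contains(s) else None


if __name__ == "__main__":
    G = LocalInterval()
    g, h, k = -0.5, 0.75, 0.5
    print(G.product(G.product(g, h), k))  # (g.h).k is defined
    print(G.product(h, k))                # h.k is not defined
```

With ${g = -0.5}$, ${h = 0.75}$, ${k = 0.5}$, the product ${(g \cdot h) \cdot k}$ is well-defined while ${h \cdot k}$ is not, which illustrates how the closure axiom fails and why the local associativity axiom only constrains triples for which all four products exist.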

Remark 4 Another important example of a local group is that of a group chunk, in which the sets ${\Lambda}$ and ${\Omega}$ are somehow “generic”; for instance, ${G}$ could be an algebraic variety, ${\Lambda, \Omega}$ Zariski-open, and the group operations birational on their domains of definition. This is somewhat analogous to the notion of a “${99\%}$ group” in additive combinatorics. There are a number of group chunk theorems, starting with a theorem of Weil in the algebraic setting, which roughly speaking assert that a generic portion of a group chunk can be identified with the generic portion of a genuine group.

We then have

Theorem 3 (Birkhoff-Kakutani theorem for local groups) Let ${G}$ be a local group which is Hausdorff and first countable. Then there exists an open neighbourhood ${V_0}$ of the identity which is metrisable.

Proof: (Sketch) It is not difficult to see that in a local group ${G}$, one can find a symmetric neighbourhood ${V_0}$ of the identity such that the product of any ${100}$ (say) elements of ${V_0}$ (multiplied together in any order) is well-defined, which effectively allows us to treat elements of ${V_0}$ as if they belonged to a group for the purposes of simple algebraic manipulation, such as applying the cancellation laws ${gh=gk \implies h=k}$ for ${g,h,k \in V_0}$. Inside this ${V_0}$, one can then repeat the previous arguments and eventually end up with a continuous function ${f \in BC(G)}$ supported in ${V_0}$ obeying the conclusions of Lemma 2 (but in the uniform continuity conclusion, one has to restrict ${x}$ to, say, ${V_0^{10}}$, to avoid issues of ill-definedness). The definition (1) then gives a metric on ${V_0}$ with the required properties, where we make the convention that ${\tau_g f(x)}$ vanishes for ${x \not \in V_0^{10}}$ (say) and ${g \in V_0}$. $\Box$

My motivation for studying local groups is that it turns out that there is a correspondence (first observed by Hrushovski) between the concept of an approximate group in additive combinatorics, and a locally compact local group in topological group theory; I hope to discuss this correspondence further in a subsequent post.

Suppose one has a measure space ${X = (X, {\mathcal B}, \mu)}$ and a sequence of operators ${T_n: L^p(X) \rightarrow L^p(X)}$ that are bounded on some ${L^p(X)}$ space, with ${1 \leq p < \infty}$. Suppose that on some dense subclass of functions ${f}$ in ${L^p(X)}$ (e.g. continuous compactly supported functions, if the space ${X}$ is reasonable), one already knows that ${T_n f}$ converges pointwise almost everywhere to some limit ${Tf}$, for another bounded operator ${T: L^p(X) \rightarrow L^p(X)}$ (e.g. ${T}$ could be the identity operator). What additional ingredient does one need to pass to the limit and conclude that ${T_n f}$ converges almost everywhere to ${Tf}$ for all ${f}$ in ${L^p(X)}$ (and not just for ${f}$ in a dense subclass)?

One standard way to proceed here is to study the maximal operator

$\displaystyle T_* f(x) := \sup_n |T_n f(x)|$

and aim to establish a weak-type maximal inequality

$\displaystyle \| T_* f \|_{L^{p,\infty}(X)} \leq C \| f \|_{L^p(X)} \ \ \ \ \ (1)$

for all ${f \in L^p(X)}$ (or all ${f}$ in the dense subclass), and some constant ${C}$, where ${L^{p,\infty}}$ is the weak ${L^p}$ norm

$\displaystyle \|f\|_{L^{p,\infty}(X)} := \sup_{t > 0} t \mu( \{ x \in X: |f(x)| \geq t \})^{1/p}.$

A standard approximation argument using (1) then shows that ${T_n f}$ will now indeed converge to ${Tf}$ pointwise almost everywhere for all ${f}$ in ${L^p(X)}$, and not just in the dense subclass. See for instance these lecture notes of mine, in which this method is used to deduce the Lebesgue differentiation theorem from the Hardy-Littlewood maximal inequality. This is by now a very standard approach to establishing pointwise almost everywhere convergence theorems, but it is natural to ask whether it is strictly necessary. In particular, is it possible to have a pointwise convergence result ${T_n f \rightarrow T f}$ without being able to obtain a weak-type maximal inequality of the form (1)?

In the case of norm convergence (in which one asks for ${T_n f}$ to converge to ${Tf}$ in the ${L^p}$ norm, rather than in the pointwise almost everywhere sense), the answer is no, thanks to the uniform boundedness principle, which among other things shows that norm convergence is only possible if one has the uniform bound

$\displaystyle \sup_n \| T_n f \|_{L^p(X)} \leq C \| f \|_{L^p(X)} \ \ \ \ \ (2)$

for some ${C>0}$ and all ${f \in L^p(X)}$; and conversely, if one has the uniform bound, and one has already established norm convergence of ${T_n f}$ to ${Tf}$ on a dense subclass of ${L^p(X)}$, (2) will extend that norm convergence to all of ${L^p(X)}$.
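The converse direction is the usual three-term density argument, which can be sketched as follows (assuming, as in the applications, that the ${T_n}$ and ${T}$ are linear, and writing ${\|T\|}$ for the operator norm of ${T}$ on ${L^p(X)}$): for any ${g}$ in the dense subclass,

$\displaystyle \| T_n f - T f \|_{L^p(X)} \leq \| T_n (f-g) \|_{L^p(X)} + \| T_n g - T g \|_{L^p(X)} + \| T(g-f) \|_{L^p(X)} \leq (C + \|T\|) \| f-g \|_{L^p(X)} + \| T_n g - T g \|_{L^p(X)}.$

Sending ${n \rightarrow \infty}$ and then letting ${g}$ approach ${f}$ in ${L^p(X)}$ gives the claim.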

Returning to pointwise almost everywhere convergence, the answer in general is “yes”. Consider for instance the rank one operators

$\displaystyle T_n f(x) := 1_{[n,n+1]} \int_0^1 f(y)\ dy$

from ${L^1({\bf R})}$ to ${L^1({\bf R})}$. It is clear that ${T_n f}$ converges pointwise almost everywhere to zero as ${n \rightarrow \infty}$ for any ${f \in L^1({\bf R})}$, and the operators ${T_n}$ are uniformly bounded on ${L^1({\bf R})}$, but the maximal function ${T_*}$ does not obey (1). One can modify this example in a number of ways to defeat almost any reasonable conjecture that something like (1) should be necessary for pointwise almost everywhere convergence.
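To verify the failure of (1) explicitly (a short computation, taking the index ${n}$ to range over ${1,2,3,\ldots}$ and ${p=1}$): since the intervals ${[n,n+1]}$ cover ${[1,+\infty)}$, one has

$\displaystyle T_* f(x) = \left| \int_0^1 f(y)\ dy \right| 1_{[1,+\infty)}(x).$

Taking for instance ${f = 1_{[0,1]}}$, the set ${\{ x: T_* f(x) \geq t \}}$ has infinite measure for every ${0 < t \leq 1}$, so ${\| T_* f \|_{L^{1,\infty}({\bf R})} = +\infty}$ while ${\|f\|_{L^1({\bf R})} = 1}$. On the other hand, for each fixed ${x}$ one has ${T_n f(x) = 0}$ as soon as ${n > x}$, which gives the pointwise convergence to zero.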

In spite of this, a remarkable observation of Stein, now known as Stein’s maximal principle, asserts that the maximal inequality is necessary to prove pointwise almost everywhere convergence, if one is working on a compact group and the operators ${T_n}$ are translation invariant, and if the exponent ${p}$ is at most ${2}$:

Theorem 1 (Stein maximal principle) Let ${G}$ be a compact group, let ${X}$ be a homogeneous space of ${G}$ with a finite Haar measure ${\mu}$, let ${1\leq p \leq 2}$, and let ${T_n: L^p(X) \rightarrow L^p(X)}$ be a sequence of bounded linear operators commuting with translations, such that ${T_n f}$ converges pointwise almost everywhere for each ${f \in L^p(X)}$. Then (1) holds.

This is not quite the most general version of the principle; some additional variants and generalisations are given in the original paper of Stein. For instance, one can replace the discrete sequence ${T_n}$ of operators with a continuous family ${T_t}$ without much difficulty. As a typical application of this principle, we see that Carleson’s celebrated theorem that the partial Fourier series ${\sum_{n=-N}^N \hat f(n) e^{2\pi i nx}}$ of an ${L^2({\bf R}/{\bf Z})}$ function ${f: {\bf R}/{\bf Z} \rightarrow {\bf C}}$ converge almost everywhere is in fact equivalent to the estimate

$\displaystyle \| \sup_{N>0} |\sum_{n=-N}^N \hat f(n) e^{2\pi i n\cdot}|\|_{L^{2,\infty}({\bf R}/{\bf Z})} \leq C \|f\|_{L^2({\bf R}/{\bf Z})}. \ \ \ \ \ (3)$

And unsurprisingly, most of the proofs of this (difficult) theorem have proceeded by first establishing (3), and Stein’s maximal principle strongly suggests that this is the optimal way to try to prove this theorem.

On the other hand, the theorem does fail for ${p>2}$, and almost everywhere convergence results in ${L^p}$ for ${p>2}$ can be proven by other methods than weak ${(p,p)}$ estimates. For instance, the convergence of Bochner-Riesz multipliers in ${L^p({\bf R}^n)}$ for any ${n}$ (and for ${p}$ in the range predicted by the Bochner-Riesz conjecture) was verified for ${p > 2}$ by Carbery, Rubio de Francia, and Vega, despite the fact that the weak ${(p,p)}$ boundedness of even a single Bochner-Riesz multiplier, let alone the maximal function, has still not been completely verified in this range. (Carbery, Rubio de Francia and Vega use weighted ${L^2}$ estimates for the maximal Bochner-Riesz operator, rather than ${L^p}$ type estimates.) For ${p \leq 2}$, though, Stein’s principle (after localising to a torus) does apply, and pointwise almost everywhere convergence of Bochner-Riesz means is equivalent to the weak ${(p,p)}$ estimate (1).

Stein’s principle is restricted to compact groups (such as the torus ${({\bf R}/{\bf Z})^n}$ or the rotation group ${SO(n)}$) and their homogeneous spaces (such as the torus ${({\bf R}/{\bf Z})^n}$ again, or the sphere ${S^{n-1}}$). As stated, the principle fails in the noncompact setting; for instance, in ${{\bf R}}$, the convolution operators ${T_n f := f * 1_{[n,n+1]}}$ are such that ${T_n f}$ converges pointwise almost everywhere to zero for every ${f \in L^1({\bf R})}$, but the maximal function is not of weak-type ${(1,1)}$. However, in many applications on non-compact domains, the ${T_n}$ are “localised” enough that one can transfer from a non-compact setting to a compact setting and then apply Stein’s principle. For instance, Carleson’s theorem on the real line ${{\bf R}}$ is equivalent to Carleson’s theorem on the circle ${{\bf R}/{\bf Z}}$ (due to the localisation of the Dirichlet kernels), which as discussed before is equivalent to the estimate (3) on the circle, which by a scaling argument is equivalent to the analogous estimate on the real line ${{\bf R}}$.

Stein’s argument from his 1961 paper can be viewed nowadays as an application of the probabilistic method; starting with a sequence of increasingly bad counterexamples to the maximal inequality (1), one randomly combines them together to create a single “infinitely bad” counterexample. To make this idea work, Stein employs two basic ideas:

1. The random rotations (or random translations) trick. Given a subset ${E}$ of ${X}$ of small but positive measure, one can randomly select about ${|X|/|E|}$ translates ${g_i E}$ of ${E}$ that cover most of ${X}$.
2. The random sums trick. Given a collection ${f_1,\ldots,f_n: X \rightarrow {\bf C}}$ of signed functions that may possibly cancel each other in a deterministic sum ${\sum_{i=1}^n f_i}$, one can perform a random sum ${\sum_{i=1}^n \pm f_i}$ instead to obtain a random function whose magnitude will usually be comparable to the square function ${(\sum_{i=1}^n |f_i|^2)^{1/2}}$; this can be made rigorous by concentration of measure results, such as Khintchine’s inequality.
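The random translations trick in the first item can be illustrated numerically. The sketch below is a toy model only: it takes ${X}$ to be the cyclic group ${{\bf Z}/N{\bf Z}}$ with counting measure and ${E}$ to be an interval (both assumptions made for illustration), and compares the average uncovered fraction after ${k}$ independent random translates with the exact expectation ${(1 - |E|/|X|)^k}$. All names and parameters are hypothetical.

```python
import random


def uncovered_fraction(group_size, set_size, num_translates, rng):
    """Fraction of Z/group_size missed by random translates of E = {0,...,set_size-1}."""
    covered = [False] * group_size
    for _ in range(num_translates):
        shift = rng.randrange(group_size)  # uniformly random group element g_i
        for e in range(set_size):
            covered[(shift + e) % group_size] = True
    return covered.count(False) / group_size


rng = random.Random(0)
N, size_E = 1000, 50
k = N // size_E          # about |X|/|E| translates
trials = 400
average = sum(uncovered_fraction(N, size_E, k, rng) for _ in range(trials)) / trials
expected = (1 - size_E / N) ** k   # each point is missed by one translate with prob. 1 - |E|/|X|
print(average, expected)
```

With exactly ${|X|/|E|}$ translates a fraction of roughly ${1/e}$ of ${X}$ is left uncovered on average; taking a constant multiple ${c |X|/|E|}$ of translates reduces this to about ${e^{-c}}$, which is one way to read “about ${|X|/|E|}$ translates cover most of ${X}$”.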

These ideas have since been used repeatedly in harmonic analysis. For instance, I used the random rotations trick in a recent paper with Jordan Ellenberg and Richard Oberlin on Kakeya-type estimates in finite fields. The random sums trick is by now a standard tool to build various counterexamples to estimates (or to convergence results) in harmonic analysis, for instance being used by Fefferman in his famous paper disproving the boundedness of the ball multiplier on ${L^p({\bf R}^n)}$ for ${p \neq 2}$, ${n \geq 2}$. Another use of the random sum trick is to show that Theorem 1 fails once ${p>2}$; see Stein’s original paper for details.
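The random sums trick can also be checked by brute force in a simple scalar model. The following sketch fixes a point ${x}$, writes ${a_i = f_i(x)}$ (a simplification made here for illustration), and computes the exact average of ${|\sum_i \pm a_i|^p}$ over all sign patterns. The coefficients are chosen so that the deterministic sum cancels completely; the function name is hypothetical.

```python
import itertools
import math


def average_abs_random_sum(coeffs, p=1.0):
    """Exact (E |sum_i eps_i a_i|^p)^(1/p) over all sign patterns eps_i = +-1."""
    total = 0.0
    for signs in itertools.product((1, -1), repeat=len(coeffs)):
        total += abs(sum(s * a for s, a in zip(signs, coeffs))) ** p
    return (total / 2 ** len(coeffs)) ** (1.0 / p)


coeffs = [1.0, -1.0] * 5                       # deterministic sum cancels to zero
deterministic = abs(sum(coeffs))
square_function = math.sqrt(sum(a * a for a in coeffs))
l1_average = average_abs_random_sum(coeffs, p=1.0)
l2_average = average_abs_random_sum(coeffs, p=2.0)
print(deterministic, square_function, l1_average, l2_average)
```

The ${L^2}$ average agrees with the square function exactly, and the ${L^1}$ average lies between ${1/\sqrt{2}}$ times the square function and the square function itself, as Khintchine’s inequality (with its sharp constant for ${p=1}$) predicts.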

Another use of the random rotations trick, closely related to Theorem 1, is the Nikishin-Stein factorisation theorem. Here is Stein’s formulation of this theorem:

Theorem 2 (Stein factorisation theorem) Let ${G}$ be a compact group, let ${X}$ be a homogeneous space of ${G}$ with a finite Haar measure ${\mu}$, let ${1\leq p \leq 2}$ and ${q>0}$, and let ${T: L^p(X) \rightarrow L^q(X)}$ be a bounded linear operator commuting with translations and obeying the estimate

$\displaystyle \|T f \|_{L^q(X)} \leq A \|f\|_{L^p(X)}$

for all ${f \in L^p(X)}$ and some ${A>0}$. Then ${T}$ also maps ${L^p(X)}$ to ${L^{p,\infty}(X)}$, with

$\displaystyle \|T f \|_{L^{p,\infty}(X)} \leq C_{p,q} A \|f\|_{L^p(X)}$

for all ${f \in L^p(X)}$, with ${C_{p,q}}$ depending only on ${p, q}$.

This result is trivial with ${q \geq p}$, but becomes useful when ${q < p}$. In this regime, the translation invariance allows one to freely “upgrade” a strong-type ${(p,q)}$ result to a weak-type ${(p,p)}$ result. In other words, bounded linear operators from ${L^p(X)}$ to ${L^q(X)}$ automatically factor through the inclusion ${L^{p,\infty}(X) \subset L^q(X)}$, which helps explain the name “factorisation theorem”. Factorisation theory has been developed further by many authors, including Maurey and Pisier.
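The inclusion ${L^{p,\infty}(X) \subset L^q(X)}$ for ${q < p}$ on a finite measure space can be seen in a discrete toy model. The sketch below assumes ${X = \{1,\ldots,N\}}$ with normalised counting measure and the test function ${f(k) = (N/k)^{1/p}}$ (both hypothetical choices, not taken from the post), and uses the convention ${\|f\|_{L^{p,\infty}} = \sup_\lambda \lambda \mu(\{|f| \geq \lambda\})^{1/p}}$.

```python
import math


def norms(N, p, q):
    """Weak L^p, strong L^p and strong L^q norms of f(k) = (N/k)^(1/p) on {1..N}."""
    values = [(N / k) ** (1.0 / p) for k in range(1, N + 1)]
    # f is decreasing, so the level set {f >= f(k)} is {1..k}, of measure k/N.
    weak = max(values[k - 1] * (k / N) ** (1.0 / p) for k in range(1, N + 1))
    strong_p = (sum(v ** p for v in values) / N) ** (1.0 / p)
    strong_q = (sum(v ** q for v in values) / N) ** (1.0 / q)
    return weak, strong_p, strong_q


p, q = 2.0, 1.0
small = norms(100, p, q)
large = norms(10000, p, q)
print(small)
print(large)
```

As ${N}$ grows, the weak ${L^p}$ norm stays at ${1}$ and the ${L^q}$ norm stays below ${(p/(p-q))^{1/q}}$, while the strong ${L^p}$ norm grows like ${(\log N)^{1/p}}$.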

Stein’s factorisation theorem (or more precisely, a variant of it) is useful in the theory of Kakeya and restriction theorems in Euclidean space, as first observed by Bourgain.

In 1970, Nikishin obtained the following generalisation of Stein’s factorisation theorem in which the translation-invariance hypothesis can be dropped, at the cost of excluding a set of small measure:

Theorem 3 (Nikishin-Stein factorisation theorem) Let ${X}$ be a finite measure space, let ${1\leq p \leq 2}$ and ${q>0}$, and let ${T: L^p(X) \rightarrow L^q(X)}$ be a bounded linear operator obeying the estimate

$\displaystyle \|T f \|_{L^q(X)} \leq A \|f\|_{L^p(X)}$

for all ${f \in L^p(X)}$ and some ${A>0}$. Then for any ${\epsilon > 0}$, there exists a subset ${E}$ of ${X}$ of measure at most ${\epsilon}$ such that

$\displaystyle \|T f \|_{L^{p,\infty}(X \backslash E)} \leq C_{p,q,\epsilon} A \|f\|_{L^p(X)} \ \ \ \ \ (4)$

for all ${f \in L^p(X)}$, with ${C_{p,q,\epsilon}}$ depending only on ${p, q, \epsilon}$.

One can recover Theorem 2 from Theorem 3 by an averaging argument to eliminate the exceptional set; we omit the details.

Recall that a (complex) abstract Lie algebra is a complex vector space ${{\mathfrak g}}$ (either finite or infinite dimensional) equipped with a bilinear antisymmetric form ${[\cdot,\cdot]: {\mathfrak g} \times {\mathfrak g} \rightarrow {\mathfrak g}}$ that obeys the Jacobi identity

$\displaystyle [[X,Y],Z] + [[Y,Z],X] + [[Z,X],Y] = 0. \ \ \ \ \ (1)$

(One can of course define Lie algebras over other fields than the complex numbers ${{\bf C}}$, but in order to avoid some technical issues we shall work solely with the complex case in this post.)

An important special case of the abstract Lie algebras are the concrete Lie algebras, in which ${{\mathfrak g} \subset \hbox{End}(V)}$ is a vector space of linear transformations ${X: V \rightarrow V}$ on a vector space ${V}$ (which again can be either finite or infinite dimensional), and the bilinear form is given by the usual Lie bracket

$\displaystyle [X,Y] := XY-YX.$

It is easy to verify that every concrete Lie algebra is an abstract Lie algebra. In the converse direction, we have

Theorem 1 Every abstract Lie algebra is isomorphic to a concrete Lie algebra.

To prove this theorem, we introduce the useful algebraic tool of the universal enveloping algebra ${U({\mathfrak g})}$ of the abstract Lie algebra ${{\mathfrak g}}$. This is the free (associative, complex) algebra generated by ${{\mathfrak g}}$ (viewed as a complex vector space), subject to the constraints

$\displaystyle [X,Y] = XY - YX. \ \ \ \ \ (2)$

This algebra is described by the Poincaré-Birkhoff-Witt theorem, which asserts that given an ordered basis ${(X_i)_{i \in I}}$ of ${{\mathfrak g}}$ as a vector space, a basis of ${U({\mathfrak g})}$ is given by “monomials” of the form

$\displaystyle X_{i_1}^{a_1} \ldots X_{i_m}^{a_m} \ \ \ \ \ (3)$

where ${m}$ is a natural number, the ${i_1 < \ldots < i_m}$ are an increasing sequence of indices in ${I}$, and the ${a_1,\ldots,a_m}$ are positive integers. Indeed, given two such monomials, one can express their product as a finite linear combination of further monomials of the form (3) after repeatedly applying (2) (which we rewrite as ${XY = YX + [X,Y]}$) to reorder the terms in this product modulo lower order terms until all monomials have their indices in the required increasing order. It is then a routine exercise in basic abstract algebra (using all the axioms of an abstract Lie algebra) to verify that this multiplication rule on monomials does indeed define a complex associative algebra which has the universal properties required of the universal enveloping algebra.
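The reordering procedure just described is easy to mechanise. Here is a minimal Python sketch for one concrete example, the three-dimensional Heisenberg algebra with ordered basis ${X < Y < Z}$ and ${[X,Y] = Z}$ (this choice of algebra, the encoding of words as tuples of indices, and all function names are assumptions made for illustration). It repeatedly applies ${X_a X_b = X_b X_a + [X_a,X_b]}$ to any adjacent pair that is out of order.

```python
from collections import defaultdict

# Structure constants of the Heisenberg algebra, basis X, Y, Z = indices 0, 1, 2:
# [X, Y] = Z, [Y, X] = -Z, and Z is central (all other brackets vanish).
BRACKET = {(0, 1): {2: 1}, (1, 0): {2: -1}}


def normal_form(word, coeff=1):
    """Rewrite coeff * X_{w_1} ... X_{w_n} as a combination of ordered monomials."""
    result = defaultdict(int)
    stack = [(tuple(word), coeff)]
    while stack:
        w, c = stack.pop()
        for i in range(len(w) - 1):
            a, b = w[i], w[i + 1]
            if a > b:
                # X_a X_b = X_b X_a + [X_a, X_b]
                stack.append((w[:i] + (b, a) + w[i + 2:], c))
                for k, s in BRACKET.get((a, b), {}).items():
                    stack.append((w[:i] + (k,) + w[i + 2:], c * s))
                break
        else:
            result[w] += c  # indices already weakly increasing
    return {w: c for w, c in result.items() if c != 0}


def product(u, v):
    """Product of two ordered monomials, expressed in the monomial basis."""
    return normal_form(tuple(u) + tuple(v))


print(normal_form((1, 0)))      # Y X
print(product((1, 1), (0,)))    # (Y^2) X
```

The rewriting terminates because each step either removes an inversion or shortens the word. A weakly increasing tuple such as `(0, 1, 1)` encodes the monomial ${X Y^2}$ in the notation of (3).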

The abstract Lie algebra ${{\mathfrak g}}$ acts on its universal enveloping algebra ${U({\mathfrak g})}$ by left-multiplication: ${X: M \mapsto XM}$, thus giving a map from ${{\mathfrak g}}$ to ${\hbox{End}(U({\mathfrak g}))}$. It is easy to verify that this map is a Lie algebra homomorphism (so this is indeed an action (or representation) of the Lie algebra), and this action is clearly faithful (i.e. the map from ${{\mathfrak g}}$ to ${\hbox{End}(U({\mathfrak g}))}$ is injective), since each element ${X}$ of ${{\mathfrak g}}$ maps the identity element ${1}$ of ${U({\mathfrak g})}$ to a different element of ${U({\mathfrak g})}$, namely ${X}$. Thus ${{\mathfrak g}}$ is isomorphic to its image in ${\hbox{End}(U({\mathfrak g}))}$, proving Theorem 1.

In the converse direction, every representation ${\rho: {\mathfrak g} \rightarrow \hbox{End}(V)}$ of a Lie algebra “factors through” the universal enveloping algebra, in that it extends to an algebra homomorphism from ${U({\mathfrak g})}$ to ${\hbox{End}(V)}$, which by abuse of notation we shall also call ${\rho}$.

One drawback of Theorem 1 is that the space ${U({\mathfrak g})}$ that the concrete Lie algebra acts on will almost always be infinite-dimensional, even when the original Lie algebra ${{\mathfrak g}}$ is finite-dimensional. However, there is a useful theorem of Ado that rectifies this:

Theorem 2 (Ado’s theorem) Every finite-dimensional abstract Lie algebra is isomorphic to a concrete Lie algebra over a finite-dimensional vector space ${V}$.

Among other things, this theorem can be used (in conjunction with the Baker-Campbell-Hausdorff formula) to show that every abstract (finite-dimensional) Lie group (or abstract local Lie group) is locally isomorphic to a linear group. (It is well-known, though, that abstract Lie groups are not necessarily globally isomorphic to a linear group, but we will not discuss these global obstructions here.)

Ado’s theorem is surprisingly tricky to prove in general, but some special cases are easy. For instance, one can try using the adjoint representation ${\hbox{ad}: {\mathfrak g} \rightarrow \hbox{End}({\mathfrak g})}$ of ${{\mathfrak g}}$ on itself, defined by the action ${X: Y \mapsto [X,Y]}$; the Jacobi identity (1) ensures that this is indeed a representation of ${{\mathfrak g}}$. The kernel of this representation is the centre ${Z({\mathfrak g}) := \{ X \in {\mathfrak g}: [X,Y]=0 \hbox{ for all } Y \in {\mathfrak g}\}}$. This already gives Ado’s theorem in the case when ${{\mathfrak g}}$ is semisimple, in which case the centre is trivial.
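As a small sanity check on the statement that the kernel of the adjoint representation is the centre, the sketch below computes the dimension of that kernel from structure constants for two standard examples: the Heisenberg algebra (one-dimensional centre) and ${\mathfrak{sl}_2}$ (semisimple, trivial centre). The encoding of brackets as coordinate lists and the helper names are hypothetical choices made for illustration.

```python
from fractions import Fraction


def ad_matrices(dim, bracket):
    """bracket[(i, j)] lists the coordinates of [X_i, X_j]; antisymmetry fills the rest."""
    def br(i, j):
        if (i, j) in bracket:
            return bracket[(i, j)]
        if (j, i) in bracket:
            return [-c for c in bracket[(j, i)]]
        return [0] * dim
    # ad(X_i) has the coordinates of [X_i, X_j] as its j-th column.
    return [[[br(i, j)[row] for j in range(dim)] for row in range(dim)]
            for i in range(dim)]


def rank(rows):
    """Rank of a matrix by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in r] for r in rows]
    rk = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(rk + 1, len(rows)):
            factor = rows[r][col] / rows[rk][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rk])]
        rk += 1
    return rk


def kernel_dimension(dim, bracket):
    """Dimension of the kernel of the linear map X -> ad(X)."""
    flattened = [[x for row in m for x in row] for m in ad_matrices(dim, bracket)]
    return dim - rank(flattened)


heisenberg = {(0, 1): [0, 0, 1]}                                   # [X, Y] = Z
sl2 = {(1, 0): [2, 0, 0], (1, 2): [0, 0, -2], (0, 2): [0, 1, 0]}   # basis E, H, F
print(kernel_dimension(3, heisenberg), kernel_dimension(3, sl2))
```

For the Heisenberg algebra the adjoint representation loses exactly the central direction ${Z}$, which is why a second representation that is faithful on the centre is needed.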

The adjoint representation does not suffice, by itself, to prove Ado’s theorem in the non-semisimple case. However, it does provide an important reduction in the proof, namely it reduces matters to showing that every finite-dimensional Lie algebra ${{\mathfrak g}}$ has a finite-dimensional representation ${\rho: {\mathfrak g} \rightarrow \hbox{End}(V)}$ which is faithful on the centre ${Z({\mathfrak g})}$. Indeed, if one has such a representation, one can then take the direct sum of that representation with the adjoint representation to obtain a new finite-dimensional representation which is now faithful on all of ${{\mathfrak g}}$, which then gives Ado’s theorem for ${{\mathfrak g}}$.

It remains to find a finite-dimensional representation of ${{\mathfrak g}}$ which is faithful on the centre ${Z({\mathfrak g})}$. In the case when ${{\mathfrak g}}$ is abelian, so that the centre ${Z({\mathfrak g})}$ is all of ${{\mathfrak g}}$, this is again easy, because ${{\mathfrak g}}$ then acts faithfully on ${{\mathfrak g} \times {\bf C}}$ by the infinitesimal shear maps ${X: (Y,t) \mapsto (tX, 0)}$. In matrix form, this representation identifies each ${X}$ in this abelian Lie algebra with an “upper-triangular” matrix:

$\displaystyle X \equiv \begin{pmatrix} 0 & X \\ 0 & 0 \end{pmatrix}.$

This construction gives a faithful finite-dimensional representation of the centre ${Z({\mathfrak g})}$ of any finite-dimensional Lie algebra. The standard proof of Ado’s theorem (which I believe dates back to work of Harish-Chandra) then proceeds by gradually “extending” this representation of the centre ${Z({\mathfrak g})}$ to larger and larger sub-algebras of ${{\mathfrak g}}$, while preserving the finite-dimensionality of the representation and the faithfulness on ${Z({\mathfrak g})}$, until one obtains a representation on the entire Lie algebra ${{\mathfrak g}}$ with the required properties. (For technical inductive reasons, one also needs to carry along an additional property of the representation, namely that it maps the nilradical to nilpotent elements, but we will discuss this technicality later.)
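The shear-map representation of an abelian Lie algebra described above can be checked directly in coordinates. The sketch below takes ${{\mathfrak g} = {\bf C}^3}$ with integer test vectors (an arbitrary choice for illustration; the helper names are hypothetical) and verifies that the matrices multiply to zero, so that all commutators vanish as they must for an abelian algebra, and that the action on ${(Y,t)}$ is ${(tX,0)}$.

```python
def shear(X):
    """Matrix of the map (Y, t) -> (t X, 0) on g x C, for X in g = C^n."""
    n = len(X)
    M = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        M[i][n] = X[i]  # the last column carries the coordinates of X
    return M


def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    size = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def apply(M, v):
    """Apply the matrix M to the column vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


X, Y = [1, 2, 3], [4, 5, 6]
zero = [[0] * 4 for _ in range(4)]
products_vanish = (matmul(shear(X), shear(Y)) == zero
                   and matmul(shear(Y), shear(X)) == zero)
image = apply(shear(X), Y + [7])   # the vector (Y, t) with t = 7
print(products_vanish, image)
```

Faithfulness is visible from the matrix itself: the last column of `shear(X)` is ${X}$, so the matrix vanishes only when ${X = 0}$.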

This procedure is a little tricky to execute in general, but becomes simpler in the nilpotent case, in which the lower central series ${{\mathfrak g}_1 := {\mathfrak g}; {\mathfrak g}_{n+1} := [{\mathfrak g}, {\mathfrak g}_n]}$ becomes trivial for sufficiently large ${n}$:

Theorem 3 (Ado’s theorem for nilpotent Lie algebras) Let ${{\mathfrak n}}$ be a finite-dimensional nilpotent Lie algebra. Then there exists a finite-dimensional faithful representation ${\rho: {\mathfrak n} \rightarrow \hbox{End}(V)}$ of ${{\mathfrak n}}$. Furthermore, there exists a natural number ${k}$ such that ${\rho({\mathfrak n})^k = \{0\}}$, i.e. one has ${\rho(X_1) \ldots \rho(X_k)=0}$ for all ${X_1,\ldots,X_k \in {\mathfrak n}}$.

The second conclusion of Ado’s theorem here is useful for induction purposes. (By Engel’s theorem, this conclusion is also equivalent to the assertion that every element of ${\rho({\mathfrak n})}$ is nilpotent, but we can prove Theorem 3 without explicitly invoking Engel’s theorem.)
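For a concrete instance of both conclusions of Theorem 3, one can take the Heisenberg algebra with ${[X,Y]=Z}$ and its familiar realisation by strictly upper-triangular ${3 \times 3}$ matrices. This example is an illustration chosen here and is not the construction used in the proof. The sketch checks the bracket relations and that every product of ${k=3}$ basis images vanishes, which by multilinearity gives ${\rho({\mathfrak n})^3 = \{0\}}$.

```python
import itertools


def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    size = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def unit(i, j, n=3):
    """Elementary matrix with a single 1 in row i, column j."""
    return [[1 if (r, c) == (i, j) else 0 for c in range(n)] for r in range(n)]


def commutator(A, B):
    """Matrix commutator AB - BA."""
    AB, BA = matmul(A, B), matmul(B, A)
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(AB, BA)]


rho = {"X": unit(0, 1), "Y": unit(1, 2), "Z": unit(0, 2)}
zero = [[0] * 3 for _ in range(3)]

bracket_ok = (commutator(rho["X"], rho["Y"]) == rho["Z"]
              and commutator(rho["X"], rho["Z"]) == zero
              and commutator(rho["Y"], rho["Z"]) == zero)
triple_products_vanish = all(
    matmul(matmul(rho[a], rho[b]), rho[c]) == zero
    for a, b, c in itertools.product("XYZ", repeat=3))
print(bracket_ok, triple_products_vanish)
```

The three matrices have their nonzero entries in different positions, so they are linearly independent and the representation is faithful.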

Below the fold, I give a proof of Theorem 3, and then extend the argument to cover the full strength of Ado’s theorem. This is not a new argument – indeed, I am basing this particular presentation on the one in Fulton and Harris – but it was an instructive exercise for me to try to extract the proof of Ado’s theorem from the more general structural theory of Lie algebras (e.g. Engel’s theorem, Lie’s theorem, Levi decomposition, etc.) in which the result is usually placed. (However, the proof I know of still needs Engel’s theorem to establish the solvable case, and the Levi decomposition to then establish the general case.)

Igor Rodnianski and I have just uploaded to the arXiv our paper “Effective limiting absorption principles, and applications”, submitted to Communications in Mathematical Physics. In this paper we derive limiting absorption principles (of the type discussed in this recent post) for a general class of Schrödinger operators ${H = -\Delta + V}$ on a wide class of manifolds, namely the asymptotically conic manifolds. The precise definition of such manifolds is somewhat technical, but they include as a special case the asymptotically flat manifolds, which in turn include as a further special case the smooth compact perturbations of Euclidean space ${{\bf R}^n}$ (i.e. the smooth Riemannian manifolds that are identical to ${{\bf R}^n}$ outside of a compact set). The potential ${V}$ is assumed to be a short range potential, which roughly speaking means that it decays faster than ${1/|x|}$ as ${x \rightarrow \infty}$; for several of the applications (particularly at very low energies) we need to in fact assume that ${V}$ is a strongly short range potential, which roughly speaking means that it decays faster than ${1/|x|^2}$.

To begin with, we make no hypotheses about the topology or geodesic geometry of the manifold ${M}$; in particular, we allow ${M}$ to be trapping in the sense that it contains geodesic flows that do not escape to infinity, but instead remain trapped in a bounded subset of ${M}$. We also allow the potential ${V}$ to be signed, which in particular allows bound states (eigenfunctions of negative energy) to be created. For standard technical reasons we restrict attention to dimensions three and higher: ${d \geq 3}$.

It is well known that such Schrödinger operators ${H}$ are essentially self-adjoint, and their spectrum consists of purely absolutely continuous spectrum on ${(0,+\infty)}$, together with possibly some eigenvalues at zero and negative energy (and at zero energy and in dimensions three and four, there is also the possibility of resonances which, while not strictly eigenvalues, have a somewhat analogous effect on the dynamics of the Laplacian and related objects, such as resolvents). In particular, the resolvents ${R(\lambda \pm i\epsilon) := (H - \lambda \mp i\epsilon)^{-1}}$ make sense as bounded operators on ${L^2(M)}$ for any ${\lambda \in {\bf R}}$ and ${\epsilon > 0}$. As discussed in the previous blog post, it is of interest to obtain bounds for the behaviour of these resolvents, as this can then be used via some functional calculus manipulations to obtain control on many other operators and PDE relating to the Schrödinger operator ${H}$, such as the Helmholtz equation, the time-dependent Schrödinger equation, and the wave equation. In particular, it is of interest to obtain limiting absorption estimates such as

$\displaystyle \| R(\lambda \pm i\epsilon) f \|_{H^{0,-1/2-\sigma}(M)} \leq C(M,V,\lambda,\sigma) \| f \|_{H^{0,1/2+\sigma}(M)} \ \ \ \ \ (1)$

for ${\lambda \in {\bf R}}$ (and particularly in the positive energy regime ${\lambda>0}$), where ${\sigma,\epsilon > 0}$ and ${f}$ is an arbitrary test function. The constant ${C(M,V,\lambda,\sigma)}$ needs to be independent of ${\epsilon}$ for such estimates to be truly useful, but it is also of interest to determine the extent to which these constants depend on ${M}$, ${V}$, and ${\lambda}$. The dependence on ${\sigma}$ is relatively uninteresting and henceforth we will suppress it. In particular, our paper focused to a large extent on quantitative methods that could give effective bounds on ${C(M,V,\lambda)}$ in terms of quantities such as the magnitude ${A}$ of the potential ${V}$ in a suitable norm.

It turns out to be convenient to distinguish between three regimes:

• The high-energy regime ${\lambda \gg 1}$;
• The medium-energy regime ${\lambda \sim 1}$; and
• The low-energy regime ${0 < \lambda \ll 1}$.

Our methods actually apply more or less uniformly to all three regimes, but the nature of the conclusions is quite different in each of the three regimes.

The high-energy regime ${\lambda \gg 1}$ was essentially worked out by Burq, although we give an independent treatment of Burq’s results here. In this regime it turns out that we have an unconditional estimate of the form (1) with a constant of the shape

$\displaystyle C(M,V,\lambda) = C(M,A) e^{C(M,A) \sqrt{\lambda}}$

where ${C(M,A)}$ is a constant that depends only on ${M}$ and on a parameter ${A}$ that controls the size of the potential ${V}$. This constant, while exponentially growing, is still finite, which among other things is enough to rule out the possibility that ${H}$ contains eigenfunctions (i.e. point spectrum) embedded in the high-energy portion of the spectrum. As is well known, if ${M}$ contains a certain type of trapped geodesic (in particular those arising from positively curved portions of the manifold, such as the equator of a sphere), then it is possible to construct pseudomodes ${f}$ that show that this sort of exponential growth is necessary. On the other hand, if we make the non-trapping hypothesis that all geodesics in ${M}$ escape to infinity, then we can obtain a much stronger high-energy limiting absorption estimate, namely

$\displaystyle C(M,V,\lambda) = C(M,A) \lambda^{-1/2}.$

The exponent ${1/2}$ here is closely related to the standard fact that on non-trapping manifolds, there is a local smoothing effect for the time-dependent Schrödinger equation that gains half a derivative of regularity (cf. previous blog post). In the high-energy regime, the dynamics are well-approximated by semi-classical methods, and in particular one can use tools such as the positive commutator method and pseudo-differential calculus to obtain the desired estimates. In the case of trapping one also needs the standard technique of Carleman inequalities to control the compact (and possibly trapping) core of the manifold, and in particular the delicate two-weight Carleman inequalities of Burq.

In the medium and low energy regimes one needs to work harder. In the medium energy regime ${\lambda \sim 1}$, we were able to obtain a uniform bound

$\displaystyle C(M,V,\lambda) \leq C(M,A)$

for all asymptotically conic manifolds (trapping or not) and all short-range potentials. To establish this bound, we have to supplement the existing tools of the positive commutator method and Carleman inequalities with an additional ODE-type analysis of various energies of the solution ${u = R(\lambda \pm i\epsilon) f}$ to a Helmholtz equation on large spheres, as will be discussed in more detail below the fold.

The methods also extend to the low-energy regime ${0 < \lambda \ll 1}$. Here, the bounds become somewhat interesting, with a subtle distinction between effective estimates that are uniform over all potentials ${V}$ which are bounded in a suitable sense by a parameter ${A}$ (e.g. obeying ${|V(x)| \leq A \langle x \rangle^{-2-2\sigma}}$ for all ${x}$), and ineffective estimates that exploit qualitative properties of ${V}$ (such as the absence of eigenfunctions or resonances at zero) and are thus not uniform over ${V}$. On the effective side, and for potentials that are strongly short range (at least at local scales ${|x| = O(\lambda^{-1/2})}$; one can tolerate merely short-range behaviour at more global scales, but this is a technicality that we will not discuss further here) we were able to obtain a polynomial bound of the form

$\displaystyle C(M,V,\lambda) \leq C(M,A) \lambda^{-C(M,A)}$

that blew up at a large polynomial rate at the origin. Furthermore, by carefully designing a sequence of potentials ${V}$ that induce near-eigenfunctions that resemble two different Bessel functions of the radial variable glued together, we are able to show that this type of polynomial bound is sharp in the following sense: given any constant ${C > 0}$, there exists a sequence ${V_n}$ of potentials on Euclidean space ${{\bf R}^d}$ uniformly bounded by ${A}$, and a sequence ${\lambda_n}$ of energies going to zero, such that

$\displaystyle C({\bf R}^d,V_n,\lambda_n) \geq \lambda_n^{-C}.$

This shows that if one wants bounds that are uniform in the potential ${V}$, then arbitrary polynomial blowup is necessary.

Interestingly, though, if we fix the potential ${V}$, and then ask for bounds that are not necessarily uniform in ${V}$, then one can do better, as was already observed in a classic paper of Jensen and Kato concerning power series expansions of the resolvent near the origin. In particular, if we make the spectral assumption that ${V}$ has no eigenfunctions or resonances at zero, then an argument (based on (a variant of) the Fredholm alternative, which as discussed in this recent blog post gives ineffective bounds) gives a bound of the form

$\displaystyle C(M,V,\lambda) \leq C(M,V) \lambda^{-1/2}$

in the low-energy regime (but note carefully here that the constant ${C(M,V)}$ on the right-hand side depends on the potential ${V}$ itself, and not merely on the parameter ${A}$ that upper bounds it). Even if there are eigenvalues or resonances, it turns out that one can still obtain a similar bound but with a factor of ${\lambda^{-3/2}}$ instead of ${\lambda^{-1/2}}$. This limited blowup as ${\lambda \rightarrow 0}$ is in sharp contrast to the arbitrarily large polynomial blowup rate that can occur if one demands uniform bounds. (This particular subtlety between uniform and non-uniform estimates confused us, by the way, for several weeks; for a long time we thought that we had somehow found a contradiction between our results and the results of Jensen and Kato.)

As applications of our limiting absorption estimates, we give local smoothing and dispersive estimates for solutions (as well as the closely related RAGE type theorems) to the time-dependent Schrödinger and wave equations, and also reprove standard facts about the spectrum of Schrödinger operators in this setting.