
A basic problem in harmonic analysis (as well as in linear algebra, random matrix theory, and high-dimensional geometry) is to estimate the operator norm ${\|T\|_{op}}$ of a linear map ${T: H \rightarrow H'}$ between two Hilbert spaces, which we will take to be complex for sake of discussion. Even the finite-dimensional case ${T: {\bf C}^m \rightarrow {\bf C}^n}$ is of interest, as this operator norm is the same as the largest singular value ${\sigma_1(A)}$ of the ${n \times m}$ matrix ${A}$ associated to ${T}$.

In general, this operator norm is hard to compute precisely, except in special cases. One such special case is that of a diagonal operator, such as that associated to an ${n \times n}$ diagonal matrix ${D = \hbox{diag}(\lambda_1,\ldots,\lambda_n)}$. In this case, the operator norm is simply the supremum norm of the diagonal coefficients:

$\displaystyle \|D\|_{op} = \sup_{1 \leq i \leq n} |\lambda_i|. \ \ \ \ \ (1)$

A variant of (1) is Schur’s test, which for simplicity we will phrase in the setting of finite-dimensional operators ${T: {\bf C}^m \rightarrow {\bf C}^n}$ given by a matrix ${A = (a_{ij})_{1 \leq i \leq n; 1 \leq j \leq m}}$ via the usual formula

$\displaystyle T (x_j)_{j=1}^m := ( \sum_{j=1}^m a_{ij} x_j )_{i=1}^n.$

A simple version of this test is as follows: if all the absolute row sums and column sums of ${A}$ are bounded by some constant ${M}$, thus

$\displaystyle \sum_{j=1}^m |a_{ij}| \leq M \ \ \ \ \ (2)$

for all ${1 \leq i \leq n}$ and

$\displaystyle \sum_{i=1}^n |a_{ij}| \leq M \ \ \ \ \ (3)$

for all ${1 \leq j \leq m}$, then

$\displaystyle \|T\|_{op} = \|A\|_{op} \leq M \ \ \ \ \ (4)$

(note that this generalises (the upper bound in) (1).) Indeed, to see (4), it suffices by duality and homogeneity to show that

$\displaystyle |\sum_{i=1}^n (\sum_{j=1}^m a_{ij} x_j) y_i| \leq M$

whenever ${(x_j)_{j=1}^m}$ and ${(y_i)_{i=1}^n}$ are sequences with ${\sum_{j=1}^m |x_j|^2 = \sum_{i=1}^n |y_i|^2 = 1}$; but this easily follows from the arithmetic mean-geometric mean inequality

$\displaystyle |a_{ij} x_j y_i| \leq \frac{1}{2} |a_{ij}| |x_j|^2 + \frac{1}{2} |a_{ij}| |y_i|^2$

and (2), (3).
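As a quick numerical sanity check of (4), here is a minimal Python sketch using only the standard library. The helper names (`schur_bound`, `op_norm_lower`, and so on) and the sample matrix are illustrative assumptions, not part of the original discussion. Power iteration only produces a lower bound for ${\|A\|_{op}}$, so this illustrates the inequality rather than proving it.

```python
import math
import random

def schur_bound(A):
    """Smallest M for which (2) and (3) hold: the largest absolute row or column sum."""
    row_max = max(sum(abs(a) for a in row) for row in A)
    col_max = max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))
    return max(row_max, col_max)

def apply(A, x):
    """Compute Ax for an n x m matrix A given as a list of rows."""
    return [sum(a * t for a, t in zip(row, x)) for row in A]

def apply_adjoint(A, y):
    """Compute A^* y, where A^* is the conjugate transpose of A."""
    return [sum(complex(A[i][j]).conjugate() * y[i] for i in range(len(A)))
            for j in range(len(A[0]))]

def norm(x):
    """Euclidean norm of a complex vector."""
    return math.sqrt(sum(abs(t) ** 2 for t in x))

def op_norm_lower(A, iters=500, seed=0):
    """Power iteration on A^* A. Returns ||Ax|| for a unit vector x, which is
    always a lower bound for ||A||_op and typically converges to it."""
    rng = random.Random(seed)
    x = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in A[0]]
    for _ in range(iters):
        x = apply_adjoint(A, apply(A, x))
        size = norm(x)
        if size == 0:
            return 0.0
        x = [t / size for t in x]
    return norm(apply(A, x))

# Hypothetical 2 x 3 example with complex entries.
A = [[1, -1, 0], [1j, 0.5, -0.5]]
M = schur_bound(A)
estimate = op_norm_lower(A)
print(f"M = {M}, power-iteration estimate of ||A||_op = {estimate:.4f}")
```

For this particular matrix the estimate should come out strictly below ${M}$, reflecting the fact that Schur's test ignores the phases of the coefficients.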

Schur’s test (4) (and its many generalisations to weighted situations, or to Lebesgue or Lorentz spaces) is particularly useful for controlling operators in which the role of oscillation (as reflected in the phase of the coefficients ${a_{ij}}$, as opposed to just their magnitudes ${|a_{ij}|}$) is not decisive. However, it is of limited use in situations that involve a lot of cancellation. For this, a different test, known as the Cotlar-Stein lemma, is much more flexible and powerful. It can be viewed in a sense as a non-commutative variant of Schur’s test (4) (or of (1)), in which the scalar coefficients ${\lambda_i}$ or ${a_{ij}}$ are replaced by operators instead.

To illustrate the basic flavour of the result, let us return to the bound (1), and now consider instead a block-diagonal matrix

$\displaystyle A = \begin{pmatrix} \Lambda_1 & 0 & \ldots & 0 \\ 0 & \Lambda_2 & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & \Lambda_n \end{pmatrix} \ \ \ \ \ (5)$

where each ${\Lambda_i}$ is now a ${m_i \times m_i}$ matrix, and so ${A}$ is an ${m \times m}$ matrix with ${m := m_1 + \ldots +m_n}$. Then we have

$\displaystyle \|A\|_{op} = \sup_{1 \leq i \leq n} \|\Lambda_i\|_{op}. \ \ \ \ \ (6)$

Indeed, the lower bound is trivial (as can be seen by testing ${A}$ on vectors which are supported on the ${i^{th}}$ block of coordinates), while to establish the upper bound, one can make use of the orthogonal decomposition

$\displaystyle {\bf C}^m \equiv \bigoplus_{i=1}^n {\bf C}^{m_i} \ \ \ \ \ (7)$

to decompose an arbitrary vector ${x \in {\bf C}^m}$ as

$\displaystyle x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$

with ${x_i \in {\bf C}^{m_i}}$, in which case we have

$\displaystyle Ax = \begin{pmatrix} \Lambda_1 x_1 \\ \Lambda_2 x_2 \\ \vdots \\ \Lambda_n x_n \end{pmatrix}$

and the upper bound in (6) then follows from a simple computation.
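For completeness, the computation can be spelled out as follows, using only the orthogonality of the blocks and the definition of the operator norm:

```latex
\|Ax\|^2 = \sum_{i=1}^n \|\Lambda_i x_i\|^2
\leq \sum_{i=1}^n \|\Lambda_i\|_{op}^2 \|x_i\|^2
\leq \Big( \sup_{1 \leq i \leq n} \|\Lambda_i\|_{op} \Big)^2 \sum_{i=1}^n \|x_i\|^2
= \Big( \sup_{1 \leq i \leq n} \|\Lambda_i\|_{op} \Big)^2 \|x\|^2.
```

Taking square roots and then the supremum over unit vectors ${x}$ gives the upper bound.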

The operator ${T}$ associated to the matrix ${A}$ in (5) can be viewed as a sum ${T = \sum_{i=1}^n T_i}$, where each ${T_i}$ corresponds to the ${\Lambda_i}$ block of ${A}$, in which case (6) can also be written as

$\displaystyle \|T\|_{op} = \sup_{1 \leq i \leq n} \|T_i\|_{op}. \ \ \ \ \ (8)$

When ${n}$ is large, this is a significant improvement over the triangle inequality, which merely gives

$\displaystyle \|T\|_{op} \leq \sum_{1 \leq i \leq n} \|T_i\|_{op}.$

The reason for this gain can ultimately be traced back to the “orthogonality” of the ${T_i}$; that they “occupy different columns” and “different rows” of the range and domain of ${T}$. This is obvious when viewed in the matrix formalism, but can also be described in the more abstract Hilbert space operator formalism via the identities

$\displaystyle T_i^* T_j = 0 \ \ \ \ \ (9)$

and

$\displaystyle T_i T_j^* = 0 \ \ \ \ \ (10)$

whenever ${i \neq j}$. (The first identity asserts that the ranges of the ${T_i}$ are orthogonal to each other, and the second asserts that the coranges of the ${T_i}$ (the ranges of the adjoints ${T_i^*}$) are orthogonal to each other.) By replacing (7) with a more abstract orthogonal decomposition into these ranges and coranges, one can in fact deduce (8) directly from (9) and (10).

The Cotlar-Stein lemma is an extension of this observation to the case where the ${T_i}$ are merely almost orthogonal rather than orthogonal, in a manner somewhat analogous to how Schur’s test (partially) extends (1) to the non-diagonal case. Specifically, we have

Lemma 1 (Cotlar-Stein lemma) Let ${T_1,\ldots,T_n: H \rightarrow H'}$ be a finite sequence of bounded linear operators from one Hilbert space ${H}$ to another ${H'}$, obeying the bounds

$\displaystyle \sum_{j=1}^n \| T_i T_j^* \|_{op}^{1/2} \leq M \ \ \ \ \ (11)$

and

$\displaystyle \sum_{j=1}^n \| T_i^* T_j \|_{op}^{1/2} \leq M \ \ \ \ \ (12)$

for all ${i=1,\ldots,n}$ and some ${M > 0}$ (compare with (2), (3)). Then one has

$\displaystyle \| \sum_{i=1}^n T_i \|_{op} \leq M. \ \ \ \ \ (13)$

Note from the basic ${TT^*}$ identity

$\displaystyle \|T\|_{op} = \|TT^* \|_{op}^{1/2} = \|T^* T\|_{op}^{1/2} \ \ \ \ \ (14)$

that the hypothesis (11) (or (12)) already gives the bound

$\displaystyle \|T_i\|_{op} \leq M \ \ \ \ \ (15)$

on each component ${T_i}$ of ${T}$, which by the triangle inequality gives the inferior bound

$\displaystyle \| \sum_{i=1}^n T_i \|_{op} \leq nM;$

the point of the Cotlar-Stein lemma is that the dependence on ${n}$ in this bound is eliminated in (13), which in particular makes the bound suitable for extension to the limit ${n \rightarrow \infty}$ (see Remark 1 below).
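To get a feel for the hypotheses (11), (12), the following Python sketch (standard library only) evaluates them for a pair of ${2 \times 2}$ matrices and compares the resulting constant with the norm of the sum and with the triangle inequality bound. The matrices, the parameter `eps` and all function names are hypothetical choices made for illustration; the closed form for the ${2 \times 2}$ operator norm is used only to keep the example self-contained.

```python
import math

def matmul(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def adjoint(A):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(A[j][i]).conjugate() for j in range(2)] for i in range(2)]

def op_norm(A):
    """Largest singular value s1 of a 2x2 matrix, computed from
    s1^2 + s2^2 = ||A||_F^2 and s1 * s2 = |det A|."""
    frob2 = sum(abs(a) ** 2 for row in A for a in row)
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    disc = max(frob2 ** 2 - 4 * abs(det) ** 2, 0.0)
    return math.sqrt((frob2 + math.sqrt(disc)) / 2)

def cotlar_stein_constant(Ts):
    """Smallest M for which (11) and (12) hold for every i."""
    sums_11 = [sum(math.sqrt(op_norm(matmul(Ti, adjoint(Tj)))) for Tj in Ts) for Ti in Ts]
    sums_12 = [sum(math.sqrt(op_norm(matmul(adjoint(Ti), Tj))) for Tj in Ts) for Ti in Ts]
    return max(sums_11 + sums_12)

# Two almost orthogonal pieces: eps measures the failure of T1 T2^* = 0.
eps = 0.01
T1 = [[1, 0], [0, 0]]
T2 = [[0, 0], [eps, 1]]
Ts = [T1, T2]

total = [[sum(T[i][j] for T in Ts) for j in range(2)] for i in range(2)]
M = cotlar_stein_constant(Ts)
triangle = sum(op_norm(T) for T in Ts)
print("||T1 + T2|| =", op_norm(total))
print("Cotlar-Stein constant M =", M)
print("triangle inequality bound =", triangle)
```

With only two summands the gain over the triangle inequality is modest; the point of the lemma is that the constant does not grow with the number of pieces.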

The Cotlar-Stein lemma was first established by Cotlar in the special case of commuting self-adjoint operators, and then independently by Cotlar and Stein in full generality, with the proof appearing in a subsequent paper of Knapp and Stein.

The Cotlar-Stein lemma is often useful in controlling operators such as singular integral operators or pseudo-differential operators ${T}$ which “do not mix scales together too much”, in that operators ${T}$ map functions “that oscillate at a given scale ${2^{-i}}$” to functions that still mostly oscillate at the same scale ${2^{-i}}$. In that case, one can often split ${T}$ into components ${T_i}$ which essentially capture the scale ${2^{-i}}$ behaviour, and understanding ${L^2}$ boundedness properties of ${T}$ then reduces to establishing the boundedness of the simpler operators ${T_i}$ (and of establishing a sufficient decay in products such as ${T_i^* T_j}$ or ${T_i T_j^*}$ when ${i}$ and ${j}$ are separated from each other). In some cases, one can use Fourier-analytic tools such as Littlewood-Paley projections to generate the ${T_i}$, but the true power of the Cotlar-Stein lemma comes from situations in which the Fourier transform is not suitable, such as when one has a complicated domain (e.g. a manifold or a non-abelian Lie group), or very rough coefficients (which would then have badly behaved Fourier behaviour). One can then select the decomposition ${T = \sum_i T_i}$ in a fashion that is tailored to the particular operator ${T}$, and is not necessarily dictated by Fourier-analytic considerations.

Once one is in the almost orthogonal setting, as opposed to the genuinely orthogonal setting, the previous arguments based on orthogonal projection seem to fail completely. Instead, the proof of the Cotlar-Stein lemma proceeds via an elegant application of the tensor power trick (or perhaps more accurately, the power method), in which the operator norm of ${T}$ is understood through the operator norm of a large power of ${T}$ (or more precisely, of its self-adjoint square ${TT^*}$ or ${T^* T}$). Indeed, from an iteration of (14) we see that for any natural number ${N}$, one has

$\displaystyle \|T\|_{op}^{2N} = \| (TT^*)^N \|_{op}. \ \ \ \ \ (16)$

To estimate the right-hand side, we expand out the right-hand side and apply the triangle inequality to bound it by

$\displaystyle \sum_{i_1,j_1,\ldots,i_N,j_N \in \{1,\ldots,n\}} \| T_{i_1} T_{j_1}^* T_{i_2} T_{j_2}^* \ldots T_{i_N} T_{j_N}^* \|_{op}. \ \ \ \ \ (17)$

Recall that when we applied the triangle inequality directly to ${T}$, we lost a factor of ${n}$ in the final estimate; it will turn out that we will lose a similar factor here, but this factor will eventually be attenuated into nothingness by the tensor power trick.

To bound (17), we use the basic inequality ${\|ST\|_{op} \leq \|S\|_{op} \|T\|_{op}}$ in two different ways. If we group the product ${T_{i_1} T_{j_1}^* T_{i_2} T_{j_2}^* \ldots T_{i_N} T_{j_N}^*}$ in pairs, we can bound the summand of (17) by

$\displaystyle \| T_{i_1} T_{j_1}^* \|_{op} \ldots \| T_{i_N} T_{j_N}^* \|_{op}.$

On the other hand, we can group the product by pairs in another way, to obtain the bound of

$\displaystyle \| T_{i_1} \|_{op} \| T_{j_1}^* T_{i_2} \|_{op} \ldots \| T_{j_{N-1}}^* T_{i_N}\|_{op} \| T_{j_N}^* \|_{op}.$

We bound ${\| T_{i_1} \|_{op}}$ and ${\| T_{j_N}^* \|_{op}}$ crudely by ${M}$ using (15). Taking the geometric mean of the above bounds, we can thus bound (17) by

$\displaystyle M \sum_{i_1,j_1,\ldots,i_N,j_N \in \{1,\ldots,n\}} \| T_{i_1} T_{j_1}^* \|_{op}^{1/2} \| T_{j_1}^* T_{i_2} \|_{op}^{1/2} \ldots \| T_{j_{N-1}}^* T_{i_N}\|_{op}^{1/2} \| T_{i_N} T_{j_N}^* \|_{op}^{1/2}.$

If we then sum this series first in ${j_N}$, then in ${i_N}$, then moving back all the way to ${i_1}$, using (11) and (12) alternately, we obtain a final bound of

$\displaystyle n M^{2N}$

for (16). Taking ${(2N)^{th}}$ roots, we obtain

$\displaystyle \|T\|_{op} \leq n^{1/2N} M.$

Sending ${N \rightarrow \infty}$, we obtain the claim.
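The last step can be made concrete with a few lines of Python. This is only an illustration of how quickly the loss ${n^{1/2N}}$ disappears; the function name and the sample value of ${n}$ are assumptions chosen for the example.

```python
def tensor_power_loss(n, N):
    """The factor n^{1/(2N)} that multiplies M after taking (2N)-th roots of n * M^(2N)."""
    return n ** (1.0 / (2 * N))

# Even with a million summands, the loss tends to 1 as N grows.
n = 10 ** 6
for N in (1, 10, 100, 1000):
    print(N, tensor_power_loss(n, N))
```

For ${N=1}$ one recovers a loss of ${\sqrt{n}}$, which already improves on the factor ${n}$ lost by the triangle inequality; larger ${N}$ removes the loss entirely in the limit.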

Remark 1 As observed in a number of places (see e.g. page 318 of Stein’s book, or this paper of Comech), the Cotlar-Stein lemma can be extended to infinite sums ${\sum_{i=1}^\infty T_i}$ (with the obvious changes to the hypotheses (11), (12)). Indeed, one can show that for any ${f \in H}$, the sum ${\sum_{i=1}^\infty T_i f}$ is unconditionally convergent in ${H'}$ (and furthermore has bounded ${2}$-variation), and the resulting operator ${\sum_{i=1}^\infty T_i}$ is a bounded linear operator with an operator norm bound of ${M}$.

Remark 2 If we specialise to the case where all the ${T_i}$ are equal, we see that the bound in the Cotlar-Stein lemma is sharp, at least in this case. Thus we see how the tensor power trick can convert an inefficient argument, such as that obtained using the triangle inequality or crude bounds such as (15), into an efficient one.
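To spell out the sharpness claim: if ${T_i = T_0}$ for all ${i}$, then by (14) the left-hand sides of (11) and (12) both equal ${n \|T_0\|_{op}}$, so the smallest admissible constant ${M}$ coincides with the norm of the sum:

```latex
\sum_{j=1}^n \| T_0 T_0^* \|_{op}^{1/2} = n \|T_0\|_{op} = M,
\qquad
\Big\| \sum_{i=1}^n T_0 \Big\|_{op} = n \|T_0\|_{op} = M.
```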

Remark 3 One can prove Schur’s test by a similar method. Indeed, starting from the inequality

$\displaystyle \|A\|_{op}^{2N} \leq \hbox{tr}( (AA^*)^N )$

(which follows easily from the singular value decomposition), we can bound ${\|A\|_{op}^{2N}}$ by

$\displaystyle \sum_{i_1,\ldots,i_N \in \{1,\ldots,n\}} \sum_{j_1,\ldots,j_N \in \{1,\ldots,m\}} a_{i_1,j_1} \overline{a_{i_2,j_1}} \ldots a_{i_N,j_N} \overline{a_{i_1,j_N}}.$

Estimating one of the factors in the summand crudely by ${M}$ (each coefficient obeys ${|a_{ij}| \leq M}$ by (2)), and then repeatedly summing the remaining indices one at a time as before using (2) and (3), we obtain

$\displaystyle \|A\|_{op}^{2N} \leq n M^{2N}$

and the claim follows from the tensor power trick as before. On the other hand, in the converse direction, I do not know of any way to prove the Cotlar-Stein lemma that does not basically go through the tensor power argument.
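The chain of inequalities in Remark 3 can be checked numerically on a small example. The sketch below (standard library only, with a hypothetical ${2 \times 2}$ complex matrix and illustrative function names) compares ${\|A\|_{op}}$, the ${(2N)^{th}}$ root of ${\hbox{tr}((AA^*)^N)}$, and ${n^{1/2N} M}$ for a few values of ${N}$.

```python
import math

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def adjoint(A):
    """Conjugate transpose."""
    return [[complex(A[i][j]).conjugate() for i in range(len(A))] for j in range(len(A[0]))]

def trace_of_power(A, N):
    """tr((A A^*)^N); real and non-negative since A A^* is positive semi-definite."""
    G = matmul(A, adjoint(A))
    P = G
    for _ in range(N - 1):
        P = matmul(P, G)
    return sum(P[i][i] for i in range(len(P))).real

def op_norm_2x2(A):
    """Exact largest singular value of a 2x2 matrix."""
    frob2 = sum(abs(a) ** 2 for row in A for a in row)
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    disc = max(frob2 ** 2 - 4 * abs(det) ** 2, 0.0)
    return math.sqrt((frob2 + math.sqrt(disc)) / 2)

def schur_bound(A):
    """Largest absolute row or column sum, i.e. the constant M in (2), (3)."""
    rows = [sum(abs(a) for a in row) for row in A]
    cols = [sum(abs(row[j]) for row in A) for j in range(len(A[0]))]
    return max(rows + cols)

A = [[1, 1j], [0.5, -1]]   # hypothetical example
n = len(A)
M = schur_bound(A)
for N in (1, 5, 20):
    root = trace_of_power(A, N) ** (1.0 / (2 * N))
    print(N, op_norm_2x2(A), root, n ** (1.0 / (2 * N)) * M)
```

As ${N}$ grows, the middle quantity decreases towards ${\|A\|_{op}}$ and the right-hand quantity decreases towards ${M}$.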

Recall that a (real) topological vector space is a real vector space ${V = (V, 0, +, \cdot)}$ equipped with a topology ${{\mathcal F}}$ that makes the vector space operations ${+: V \times V \rightarrow V}$ and ${\cdot: {\bf R} \times V \rightarrow V}$ continuous. One often restricts attention to Hausdorff topological vector spaces; in practice, this is not a severe restriction because it turns out that any topological vector space can be made Hausdorff by quotienting out the closure ${\overline{\{0\}}}$ of the origin ${\{0\}}$. One can also discuss complex topological vector spaces, and the theory is not significantly different; but for sake of exposition we shall restrict attention here to the real case.

An obvious example of a topological vector space is a finite-dimensional vector space such as ${{\bf R}^n}$ with the usual topology. Of course, there are plenty of infinite-dimensional topological vector spaces also, such as infinite-dimensional normed vector spaces (with the strong, weak, or weak-* topologies) or Frechet spaces.

One way to distinguish the finite and infinite dimensional topological vector spaces is via local compactness. Recall that a topological space is locally compact if every point in that space has a compact neighbourhood. From the Heine-Borel theorem, all finite-dimensional vector spaces (with the usual topology) are locally compact. In infinite dimensions, one can trivially make a vector space locally compact by giving it a trivial topology, but once one restricts to the Hausdorff case, it seems impossible to make a space locally compact. For instance, in an infinite-dimensional normed vector space ${V}$ with the strong topology, an iteration of the Riesz lemma shows that the closed unit ball ${B}$ in that space contains an infinite sequence with no convergent subsequence, which (by the Heine-Borel theorem) implies that ${V}$ cannot be locally compact. If one gives ${V}$ the weak-* topology instead, then ${B}$ is now compact by the Banach-Alaoglu theorem, but is no longer a neighbourhood of the identity in this topology. In fact, we have the following result:

Theorem 1 Every locally compact Hausdorff topological vector space is finite-dimensional.

The first proof of this theorem that I am aware of is by André Weil. There is also a related result:

Theorem 2 Every finite-dimensional Hausdorff topological vector space has the usual topology.

As a corollary, every locally compact Hausdorff topological vector space is in fact isomorphic to ${{\bf R}^n}$ with the usual topology for some ${n}$. This can be viewed as a very special case of the theorem of Gleason, which is a key component of the solution to Hilbert’s fifth problem, that a locally compact group ${G}$ with no small subgroups (in the sense that there is a neighbourhood of the identity that contains no non-trivial subgroups) is necessarily isomorphic to a Lie group. Indeed, Theorem 1 is in fact used in the proof of Gleason’s theorem (the rough idea being to first locate a “tangent space” to ${G}$ at the origin, with the tangent vectors described by “one-parameter subgroups” of ${G}$, and show that this space is a locally compact Hausdorff topological vector space, and hence finite dimensional by Theorem 1).

Theorem 2 may seem devoid of content, but it does contain some subtleties, as it hinges crucially on the joint continuity of the vector space operations ${+: V \times V \rightarrow V}$ and ${\cdot: {\bf R} \times V \rightarrow V}$, and not just on the separate continuity in each coordinate. Consider for instance the one-dimensional vector space ${{\bf R}}$ with the co-compact topology (a non-empty set is open iff its complement is compact in the usual topology). In this topology, the space is ${T_1}$ (though not Hausdorff), the scalar multiplication map ${\cdot: {\bf R} \times {\bf R} \rightarrow {\bf R}}$ is jointly continuous as long as the scalar is not zero, and the addition map ${+: {\bf R} \times {\bf R} \rightarrow {\bf R}}$ is continuous in each coordinate (i.e. translations are continuous), but not jointly continuous; for instance, the set ${\{ (x,y) \in {\bf R}^2: x+y \not \in [0,1]\}}$ does not contain a non-trivial Cartesian product of two sets that are open in the co-compact topology. So this is not a counterexample to Theorem 2. Similarly for the cocountable or cofinite topologies on ${{\bf R}}$ (the latter topology, incidentally, is the same as the Zariski topology on ${{\bf R}}$).

Another near-counterexample comes from the topology of ${{\bf R}}$ inherited by pulling back the usual topology on the unit circle ${{\bf R}/{\bf Z}}$. Admittedly, this pullback topology is not quite Hausdorff, but the addition map ${+: {\bf R} \times {\bf R} \rightarrow {\bf R}}$ is jointly continuous. On the other hand, the scalar multiplication map ${\cdot: {\bf R} \times {\bf R} \rightarrow {\bf R}}$ is not continuous at all. A slight variant of this topology comes from pulling back the usual topology on the torus ${({\bf R}/{\bf Z})^2}$ under the map ${x \mapsto (x,\alpha x)}$ for some irrational ${\alpha}$; this restores the Hausdorff property, and addition is still jointly continuous, but multiplication remains discontinuous.

As some final examples, consider ${{\bf R}}$ with the discrete topology; here, the topology is Hausdorff, addition is jointly continuous, and every dilation is continuous, but multiplication is not jointly continuous. If one instead gives ${{\bf R}}$ the half-open topology, then again the topology is Hausdorff and addition is jointly continuous, but scalar multiplication is only jointly continuous once one restricts the scalar to be non-negative.

Below the fold, I record the textbook proof of Theorem 2 and Theorem 1. There is nothing particularly original in this presentation, but I wanted to record it here for my own future reference, and perhaps these results will also be of interest to some other readers.

A few days ago, I found myself needing to use the Fredholm alternative in functional analysis:

Theorem 1 (Fredholm alternative) Let ${X}$ be a Banach space, let ${T: X \rightarrow X}$ be a compact operator, and let ${\lambda \in {\bf C}}$ be non-zero. Then exactly one of the following statements holds:

• (Eigenvalue) There is a non-trivial solution ${x \in X}$ to the equation ${Tx = \lambda x}$.
• (Bounded resolvent) The operator ${T-\lambda}$ has a bounded inverse ${(T-\lambda)^{-1}}$ on ${X}$.

Among other things, the Fredholm alternative can be used to establish the spectral theorem for compact operators. A hypothesis such as compactness is necessary; the shift operator ${U}$ on ${\ell^2({\bf Z})}$, for instance, has no eigenfunctions, but ${U-z}$ is not invertible for any unit complex number ${z}$. The claim is also false when ${\lambda=0}$; consider for instance the multiplication operator ${Tf(n) := \frac{1}{n} f(n)}$ on ${\ell^2({\bf N})}$, which is compact and has no eigenvalue at zero, but is not invertible.
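The failure at ${\lambda=0}$ can be seen numerically by truncating the multiplication operator to its first ${N}$ coordinates. The following Python sketch is only a finite-dimensional illustration (the function name and the sample values of ${\lambda}$ are assumptions): it uses the fact that the operator norm of a diagonal matrix is the largest modulus of a diagonal entry, and requires ${\lambda \neq 1/n}$ for all ${n \leq N}$.

```python
def truncated_resolvent_norm(lam, N):
    """Operator norm of (T_N - lam)^{-1}, where T_N = diag(1, 1/2, ..., 1/N) is the
    truncation of Tf(n) = f(n)/n to the first N coordinates.
    Assumes lam differs from every 1/n with 1 <= n <= N."""
    return max(1.0 / abs(1.0 / n - lam) for n in range(1, N + 1))

for N in (10, 100, 1000):
    # lam = 0: the norms grow like N, so there is no bounded inverse in the limit.
    # lam = -0.5: the norms stay below 2 uniformly in N.
    print(N, truncated_resolvent_norm(0, N), truncated_resolvent_norm(-0.5, N))
```

The uniform bound for non-zero ${\lambda}$ away from the eigenvalues ${1/n}$ is consistent with the bounded resolvent alternative, while the growth at ${\lambda = 0}$ reflects the unboundedness of the formal inverse ${f(n) \mapsto n f(n)}$.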

It had been a while since I had studied the spectral theory of compact operators, and I found that I could not immediately reconstruct a proof of the Fredholm alternative from first principles. So I set myself the exercise of doing so. I thought that I had managed to establish the alternative in all cases, but as pointed out in comments, my argument is restricted to the case where the compact operator ${T}$ is approximable, which means that it is the limit of finite rank operators in the uniform topology. Many Banach spaces (and in particular, all Hilbert spaces) have the approximation property that implies (by a result of Grothendieck) that all compact operators on that space are almost finite rank. For instance, if ${X}$ is a Hilbert space, then any compact operator is approximable, because any compact set can be approximated by a finite-dimensional subspace, and in a Hilbert space, the orthogonal projection operator to a subspace is always a contraction. (In more general Banach spaces, finite-dimensional subspaces are still complemented, but the operator norm of the projection can be large.) Unfortunately, there are examples of Banach spaces for which the approximation property fails; the first such examples were discovered by Enflo, and a subsequent paper by Alexander demonstrated the existence of compact operators in certain Banach spaces that are not approximable.

I also found out that this argument was essentially also discovered independently by MacCluer-Hull and by Uuye. Nevertheless, I am recording this argument here, together with two more traditional proofs of the Fredholm alternative (based on the Riesz lemma and a continuity argument respectively).

One of the most notorious open problems in functional analysis is the invariant subspace problem for Hilbert spaces, which I will state here as a conjecture:

Conjecture 1 (Invariant Subspace Problem, ISP0) Let ${H}$ be an infinite dimensional complex Hilbert space, and let ${T: H \rightarrow H}$ be a bounded linear operator. Then ${H}$ contains a proper closed invariant subspace ${V}$ (thus ${TV \subset V}$).

As stated this conjecture is quite infinitary in nature. Just for fun, I set myself the task of trying to find an equivalent reformulation of this conjecture that only involved finite-dimensional spaces and operators. This turned out to be somewhat difficult, but not entirely impossible, if one adopts a sufficiently generous version of “finitary” (cf. my discussion of how to finitise the infinitary pigeonhole principle). Unfortunately, the finitary formulation that I arrived at ended up being rather complicated (in particular, involving the concept of a “barrier”), and did not obviously suggest a path to resolving the conjecture; but it did at least provide some simpler finitary consequences of the conjecture which might be worth focusing on as subproblems.

I should point out that the arguments here are quite “soft” in nature and are not really addressing the heart of the invariant subspace problem; but I think it is still of interest to observe that this problem is not purely an infinitary problem, and does have some non-trivial finitary consequences.

I am indebted to Henry Towsner for many discussions on this topic.

A (complex, semi-definite) inner product space is a complex vector space ${V}$ equipped with a sesquilinear form ${\langle, \rangle: V \times V \rightarrow {\bf C}}$ which is conjugate symmetric, in the sense that ${\langle w, v \rangle = \overline{\langle v, w \rangle}}$ for all ${v,w \in V}$, and non-negative in the sense that ${\langle v, v \rangle \geq 0}$ for all ${v \in V}$. By inspecting the non-negativity of ${\langle v+\lambda w, v+\lambda w\rangle}$ for complex numbers ${\lambda \in {\bf C}}$, one obtains the Cauchy-Schwarz inequality

$\displaystyle |\langle v, w \rangle| \leq |\langle v, v \rangle|^{1/2} |\langle w, w \rangle|^{1/2};$

if one then defines ${\|v\| := |\langle v, v \rangle|^{1/2}}$, one then quickly concludes the triangle inequality

$\displaystyle \|v + w \| \leq \|v\| + \|w\|$

which then soon implies that ${\| \|}$ is a semi-norm on ${V}$. If we make the additional assumption that the inner product ${\langle,\rangle}$ is positive definite, i.e. that ${\langle v, v \rangle > 0}$ whenever ${v}$ is non-zero, then this semi-norm becomes a norm. If ${V}$ is complete with respect to the metric ${d(v,w) := \|v-w\|}$ induced by this norm, then ${V}$ is called a Hilbert space.

The above material is extremely standard, and can be found in any graduate real analysis course; I myself covered it here. But what is perhaps less well known (except inside the fields of additive combinatorics and ergodic theory) is that the above theory of classical Hilbert spaces is just the first case of a hierarchy of higher order Hilbert spaces, in which the binary inner product ${f, g \mapsto \langle f, g \rangle}$ is replaced with a ${2^d}$-ary inner product ${(f_\omega)_{\omega \in \{0,1\}^d} \mapsto \langle (f_\omega)_{\omega \in \{0,1\}^d} \rangle}$ that obeys an appropriate generalisation of the conjugate symmetry, sesquilinearity, and positive semi-definiteness axioms. Such inner products then obey a higher order Cauchy-Schwarz inequality, known as the Cauchy-Schwarz-Gowers inequality, and then also obey a triangle inequality and become semi-norms (or norms, if the inner product was non-degenerate). Examples of such norms and spaces include the Gowers uniformity norms ${\| \|_{U^d(G)}}$, the Gowers box norms ${\| \|_{\Box^d(X_1 \times \ldots \times X_d)}}$, and the Gowers-Host-Kra seminorms ${\| \|_{U^d(X)}}$; a more elementary example is the family of Lebesgue spaces ${L^{2^d}(X)}$ when the exponent is a power of two. They play a central role in modern additive combinatorics and in certain aspects of ergodic theory, particularly those relating to Szemerédi’s theorem (or its ergodic counterpart, the Furstenberg multiple recurrence theorem); they also arise in the regularity theory of hypergraphs (which is not unrelated to the other two topics).

A simple example to keep in mind here is the order two Hilbert space ${L^4(X)}$ on a measure space ${X = (X,{\mathcal B},\mu)}$, where the inner product takes the form

$\displaystyle \langle f_{00}, f_{01}, f_{10}, f_{11} \rangle_{L^4(X)} := \int_X f_{00}(x) \overline{f_{01}(x)} \overline{f_{10}(x)} f_{11}(x)\ d\mu(x).$
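To make this example concrete, here is a small Python sketch on a finite set with counting measure (an assumption made purely for illustration, as are the function names and sample vectors). It checks that the diagonal value ${\langle f, f, f, f \rangle_{L^4(X)}}$ recovers ${\|f\|_{L^4(X)}^4}$, and that the order two Cauchy-Schwarz-Gowers inequality, which in this case is just Hölder's inequality, holds on sample data.

```python
def gowers_inner_l4(f00, f01, f10, f11):
    """The 4-ary inner product on L^4 of a finite set with counting measure:
    sum over x of f00(x) * conj(f01(x)) * conj(f10(x)) * f11(x)."""
    return sum(a * complex(b).conjugate() * complex(c).conjugate() * d
               for a, b, c, d in zip(f00, f01, f10, f11))

def l4_norm(f):
    """The L^4 norm with respect to counting measure."""
    return sum(abs(a) ** 4 for a in f) ** 0.25

# Hypothetical sample functions on a 4-point set.
f = [1 + 2j, -1j, 0.5, 3]
g = [2, 1 - 1j, -1, 0.5j]
h = [0.25, 1j, 2 - 1j, 1]
k = [-1, 0.5, 1 + 1j, -2j]

diagonal = gowers_inner_l4(f, f, f, f)
lhs = abs(gowers_inner_l4(f, g, h, k))
rhs = l4_norm(f) * l4_norm(g) * l4_norm(h) * l4_norm(k)
print("<f,f,f,f> =", diagonal, " ||f||_4^4 =", l4_norm(f) ** 4)
print("|<f,g,h,k>| =", lhs, " <= ", rhs)
```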

In this brief note I would like to set out the abstract theory of such higher order Hilbert spaces. This is not new material, being already implicit in the breakthrough papers of Gowers and Host-Kra, but I just wanted to emphasise the fact that the material is abstract, and is not particularly tied to any explicit choice of norm so long as a certain axiom is satisfied. (Also, I wanted to write things down so that I would not have to reconstruct this formalism again in the future.) Unfortunately, the notation is quite heavy and the abstract axiom is a little strange; it may be that there is a better way to formulate things. In this particular case it does seem that a concrete approach is significantly clearer, but abstraction is at least possible.

Note: the discussion below is likely to be comprehensible only to readers who already have some exposure to the Gowers norms.

In harmonic analysis and PDE, one often wants to place a function $f: {\bf R}^d \to {\bf C}$ on some domain (let’s take a Euclidean space ${\bf R}^d$ for simplicity) in one or more function spaces in order to quantify its “size” in some sense.  Examples include

• The Lebesgue spaces $L^p$ of functions $f$ whose norm $\|f\|_{L^p} := (\int_{{\bf R}^d} |f|^p)^{1/p}$ is finite, as well as their relatives such as the weak $L^p$ spaces $L^{p,\infty}$ (and more generally the Lorentz spaces $L^{p,q}$) and Orlicz spaces such as $L \log L$ and $e^L$;
• The classical regularity spaces $C^k$, together with their Hölder continuous counterparts $C^{k,\alpha}$;
• The Sobolev spaces $W^{s,p}$ of functions $f$ whose norm $\|f\|_{W^{s,p}} = \|f\|_{L^p} + \| |\nabla|^s f\|_{L^p}$ is finite (other equivalent definitions of this norm exist, and there are technicalities if $s$ is negative or $p \not \in (1,\infty)$), as well as relatives such as homogeneous Sobolev spaces $\dot W^{s,p}$, Besov spaces $B^{s,p}_q$, and Triebel-Lizorkin spaces $F^{s,p}_q$.  (The conventions for the superscripts and subscripts here are highly variable.)
• Hardy spaces ${\mathcal H}^p$, the space BMO of functions of bounded mean oscillation (and the subspace VMO of functions of vanishing mean oscillation);
• The Wiener algebra $A$;
• Morrey spaces $M^p_q$;
• The space $M$ of finite measures;
• etc., etc.

As the above partial list indicates, there is an entire zoo of function spaces one could consider, and it can be difficult at first to see how they are organised with respect to each other.  However, one can get some clarity in this regard by drawing a type diagram for the function spaces one is trying to study.  A type diagram assigns a tuple (usually a pair) of relevant exponents to each function space.  For function spaces $X$ on Euclidean space, two such exponents are the regularity $s$ of the space, and the integrability $p$ of the space.  These two quantities are somewhat fuzzy in nature (and are not easily defined for all possible function spaces), but can basically be described as follows.  We test the function space norm $\|f\|_X$ of a modulated rescaled bump function

$f(x) := A e^{i x \cdot \xi} \phi( \frac{x-x_0}{R} )$ (1)

where $A > 0$ is an amplitude, $R > 0$ is a radius, $\phi \in C^\infty_c({\bf R}^d)$ is a test function, $x_0$ is a position, and $\xi \in {\bf R}^d$ is a frequency of some magnitude $|\xi| \sim N$.  One then studies how the norm $\|f\|_X$ depends on the parameters $A, R, N$.  Typically, one has a relationship of the form

$\|f\|_X \sim A N^s R^{d/p}$ (2)

for some exponents $s, p$, at least in the high-frequency case when $N$ is large (in particular, from the uncertainty principle it is natural to require $N \gtrsim 1/R$, and when dealing with inhomogeneous norms it is also natural to require $N \gtrsim 1$).  The exponent $s$ measures how sensitive the $X$ norm is to oscillation, and thus controls regularity; if $s$ is large, then oscillating functions will have large $X$ norm, and thus functions in $X$ will tend not to oscillate too much and thus be smooth.    Similarly, the exponent $p$ measures how sensitive the $X$ norm is to the function $f$ spreading out to large scales; if $p$ is small, then slowly decaying functions will have large norm, so that functions in $X$ tend to decay quickly; conversely, if $p$ is large, then singular functions will tend to have large norm, so that functions in $X$ will tend to not have high peaks.

Note that the exponent $s$ in (2) could be positive, zero, or negative; however, the exponent $p$ should be non-negative, since intuitively enlarging $R$ should always lead to a larger (or at least comparable) norm.  Finally, the exponent in the $A$ parameter should always be $1$, since norms are by definition homogeneous.  Note also that the position $x_0$ plays no role in (2); this reflects the fact that most of the popular function spaces in analysis are translation-invariant.

The type diagram below plots the $s, 1/p$ indices of various spaces.  The black dots indicate those spaces for which the $s, 1/p$ indices are fixed; the blue dots are those spaces for which at least one of the $s, 1/p$ indices are variable (and so, depending on the value chosen for these parameters, these spaces may end up in a different location on the type diagram than the typical location indicated here).

(There are some minor cheats in this diagram, for instance for the Orlicz spaces $L \log L$ and $e^L$ one has to adjust (2) by a logarithmic factor.   Also, the norms for the Schwartz space ${\mathcal S}$ are not translation-invariant and thus not perfectly describable by this formalism. This picture should be viewed as a visual aid only, and not as a genuinely rigorous mathematical statement.)

The type diagram can be used to clarify some of the relationships between function spaces, such as Sobolev embedding.  For instance, when working with inhomogeneous spaces (which basically identifies low frequencies $N \ll 1$ with medium frequencies $N \sim 1$, so that one is effectively always in the regime $N \gtrsim 1$), then decreasing the $s$ parameter results in decreasing the right-hand side of (2).  Thus, one expects the function space norms to get smaller (and the function spaces to get larger) if one decreases $s$ while keeping $p$ fixed.  Thus, for instance, $W^{k,p}$ should be contained in $W^{k-1,p}$, and so forth.  Note however that this inclusion is not available for homogeneous function spaces such as $\dot W^{k,p}$, in which the frequency parameter $N$ can be either much larger than $1$ or much smaller than $1$.

Similarly, if one is working in a compact domain rather than in ${\bf R}^d$, then one has effectively capped the radius parameter $R$ to be bounded, and so we expect the function space norms to get smaller (and the function spaces to get larger) as one increases $1/p$, thus for instance $L^2$ will be contained in $L^1$.  Conversely, if one is working in a discrete domain such as ${\Bbb Z}^d$, then the radius parameter $R$ has now effectively been bounded from below, and the reverse should occur: the function spaces should get larger as one decreases $1/p$.  (If the domain is both compact and discrete, then it is finite, and on a finite-dimensional space all norms are equivalent.)
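Both inclusions can be checked by hand in the simplest cases; the following is only a sketch of two standard inequalities, with $\Omega$ denoting a domain of finite measure $|\Omega|$ (notation introduced here for illustration).  On such a domain, the Cauchy-Schwarz inequality gives

$\|f\|_{L^1(\Omega)} = \int_\Omega |f| \cdot 1 \leq |\Omega|^{1/2} \|f\|_{L^2(\Omega)}$,

so that $L^2(\Omega)$ is contained in $L^1(\Omega)$.  In the discrete setting, if a sequence $a = (a_n)_{n \in {\Bbb Z}^d}$ is normalised so that $\|a\|_{\ell^1} = 1$, then $|a_n| \leq 1$ for every $n$, and hence

$\|a\|_{\ell^2}^2 = \sum_n |a_n|^2 \leq \sum_n |a_n| = 1$;

by homogeneity one concludes that $\|a\|_{\ell^2} \leq \|a\|_{\ell^1}$, so that $\ell^1$ is contained in $\ell^2$.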

As mentioned earlier, the uncertainty principle suggests that one has the restriction $N \gtrsim 1/R$.  From this and (2), we expect to be able to enlarge the function space by trading in the regularity parameter $s$ for the integrability parameter $p$, keeping the dimensional quantity $d/p - s$ fixed.  This is indeed how Sobolev embedding works.   Note that in some cases one runs out of regularity before $p$ goes all the way to infinity (thus ending up at an $L^p$ space), while in other cases $p$ hits infinity first.  In the latter case, one can embed the Sobolev space into a Hölder space such as $C^{k,\alpha}$.
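As a worked instance of this numerology (the two embeddings quoted are standard ones, used here purely as an illustration): in dimension $d=3$, the space $H^1 = W^{1,2}$ has

$\frac{d}{p} - s = \frac{3}{2} - 1 = \frac{1}{2} = \frac{3}{6} - 0$,

so spending all of the regularity lands one at $L^6$, in agreement with the Sobolev embedding of $H^1({\bf R}^3)$ into $L^6({\bf R}^3)$.  In dimension $d=1$, by contrast, $H^1$ has $d/p - s = 1/2 - 1 = -1/2 < 0$, so $p$ reaches infinity while half a derivative is still unspent; this matches the embedding of $H^1({\bf R})$ into the Hölder space $C^{0,1/2}({\bf R})$.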

On continuous domains, one can send the frequency $N$ off to infinity, keeping the amplitude $A$ and radius $R$ fixed.  From this and (1) we see that norms with a lower regularity $s$ can never hope to control norms with a higher regularity $s' > s$, no matter what one does with the integrability parameter.   Note however that in discrete settings this obstruction disappears; when working on, say, ${\bf Z}^d$, then in fact one can gain as much regularity as one wishes for free, and there is no distinction between a Lebesgue space $\ell^p$ and its Sobolev counterparts $W^{k,p}$ in such a setting.

When interpolating between two spaces (using either the real or complex interpolation method), the interpolated space usually has regularity and integrability exponents on the line segment between the corresponding exponents of the endpoint spaces.  (This can be heuristically justified from the formula (2) by thinking about how the real or complex interpolation methods actually work.)  Typically, one can control the norm of the interpolated space by the geometric mean of the endpoint norms that is indicated by this line segment; again, this is plausible from looking at (2).
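Perhaps the simplest instance of this principle is the log-convexity of the Lebesgue norms, which follows from Hölder's inequality (the symbols $p_0, p_1, \theta$ are introduced here for illustration).  If $0 \leq \theta \leq 1$ and $1/p_\theta := (1-\theta)/p_0 + \theta/p_1$, then

$\|f\|_{L^{p_\theta}} \leq \|f\|_{L^{p_0}}^{1-\theta} \|f\|_{L^{p_1}}^{\theta}$.

Thus the point $1/p_\theta$ lies on the line segment between $1/p_0$ and $1/p_1$ in the type diagram, and its position along that segment dictates the weights in the geometric mean.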

The space $L^2$ is self-dual.  More generally, the dual of a function space $X$ will generally have type exponents that are the reflection of the original exponents around the $L^2$ origin.  Consider for instance the dual spaces $H^s, H^{-s}$ or ${\mathcal H}^1, BMO$ in the above diagram.

Spaces whose integrability exponent $p$ is larger than 1 (i.e. which lie to the left of the dotted line) tend to be Banach spaces, while spaces whose integrability exponent is less than 1 are almost never Banach spaces.  (This can be justified by covering a large ball by small balls and considering how (1) would interact with the triangle inequality in this case).  The case $p=1$ is borderline; some spaces at this level of integrability, such as $L^1$, are Banach spaces, while other spaces, such as $L^{1,\infty}$, are not.
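For a concrete failure of the triangle inequality below this threshold, one can take $p = 1/2$ on the real line (a standard example, not specific to the type diagram).  The indicator functions $1_{[0,1]}$ and $1_{[1,2]}$ each have an $L^{1/2}$ "norm" of $1$, but their sum agrees almost everywhere with $1_{[0,2]}$, for which

$\|1_{[0,2]}\|_{L^{1/2}} = 2^{1/(1/2)} = 4 > 1 + 1$.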

While the regularity $s$ and integrability $p$ are usually the most important exponents in a function space (because amplitude, width, and frequency are usually the most important features of a function in analysis), they do not tell the entire story.  One major reason for this is that the modulated bump functions (1), while an important class of test examples of functions, are by no means the only functions that one would wish to study.  For instance, one could also consider sums of bump functions (1) at different scales.  The behaviour of the function space norms on such functions is often controlled by secondary exponents, such as the second exponent $q$ that arises in Lorentz spaces, Besov spaces, or Triebel-Lizorkin spaces.  For instance, consider the function

$f_M(x) := \sum_{m=1}^M 2^{-md} \phi(x/2^m)$, (3)

where $M$ is a large integer, representing the number of distinct scales present in $f_M$.  Any function space with regularity $s=0$ and $p=1$ should assign each summand $2^{-md} \phi(x/2^m)$ in (3) a norm of $O(1)$, so the norm of $f_M$ could be as large as $O(M)$ if one assumes the triangle inequality.  This is indeed the case for the $L^1$ norm, but for the weak $L^1$ norm, i.e. the $L^{1,\infty}$ norm, $f_M$ only has size $O(1)$.  More generally, for the Lorentz spaces $L^{1,q}$, $f_M$ will have a norm of about $O(M^{1/q})$.   Thus we see that such secondary exponents can influence the norm of a function by an amount which is polynomial in the number of scales.  In many applications, though, the number of scales is a “logarithmic” quantity and thus of lower order interest when compared against the “polynomial” exponents such as $s$ and $p$.  So the fine distinctions between, say, strong $L^1$ and weak $L^1$, are only of interest in “critical” situations in which one cannot afford to lose any logarithmic factors (this is for instance the case in much of Calderón-Zygmund theory).
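The contrast between the strong and weak $L^1$ norms of $f_M$ can be observed numerically.  The following is a minimal sketch under simplifying assumptions that are not in the text above: the dimension is taken to be $d=1$, the bump $\phi$ is replaced by the indicator function of $[-1,1]$, and the norms are approximated by Riemann sums on a uniform grid.  The function names are hypothetical.

```python
import numpy as np

def f_M(x, M):
    """f_M from (3) with d = 1 and phi := indicator of [-1, 1] (an assumed stand-in for the bump)."""
    total = np.zeros_like(x, dtype=float)
    for m in range(1, M + 1):
        # Summand 2^{-m} phi(x / 2^m): height 2^{-m} on the interval [-2^m, 2^m].
        total += 2.0 ** (-m) * (np.abs(x) <= 2.0 ** m)
    return total

def l1_norm(values, dx):
    """Riemann-sum approximation of the L^1 norm."""
    return float(np.sum(np.abs(values)) * dx)

def weak_l1_norm(values, dx):
    """Approximates sup over lambda of lambda * |{ |f| >= lambda }| from the sorted grid values."""
    decreasing = np.sort(np.abs(values))[::-1]
    counts = np.arange(1, decreasing.size + 1)  # grid points with |f| >= decreasing[i]
    return float(np.max(decreasing * counts) * dx)

def norms(M, dx=0.125):
    """Return (strong L^1 norm, weak L^1 norm) of f_M on the grid covering [-2^M, 2^M]."""
    x = np.arange(-2.0 ** M, 2.0 ** M + dx / 2, dx)
    values = f_M(x, M)
    return l1_norm(values, dx), weak_l1_norm(values, dx)

if __name__ == "__main__":
    for M in (2, 4, 8):
        strong, weak = norms(M)
        print(M, strong, weak)
```

With these choices the strong norm grows linearly in $M$ (each summand contributes about $2$), while the weak norm stays bounded as $M$ increases, in line with the $O(M)$ versus $O(1)$ behaviour described above.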

We have cheated somewhat by only working in the high frequency regime.  When dealing with inhomogeneous spaces, one often has a different set of exponents for (1) in the low-frequency regime than in the high-frequency regime.  In such cases, one sometimes has to use a more complicated type diagram to  genuinely model the situation, e.g. by assigning to each space a convex set of type exponents rather than a single exponent, or perhaps having two separate type diagrams, one for the high frequency regime and one for the low frequency regime.   Such diagrams can get quite complicated, and will probably not be much use to a beginner in the subject, though in the hands of an expert who knows what he or she is doing, they can still be an effective visual aid.

This is a technical post inspired by separate conversations with Jim Colliander and with Soonsik Kwon on the relationship between two techniques used to control non-radiating solutions to dispersive nonlinear equations, namely the “double Duhamel trick” and the “in/out decomposition”. See for instance these lecture notes of Killip and Visan for a survey of these two techniques and other related methods in the subject. (I should caution that this post is likely to be unintelligible to anyone not already working in this area.)

For sake of discussion we shall focus on solutions to a nonlinear Schrödinger equation

$\displaystyle iu_t + \Delta u = F(u)$

and we will not concern ourselves with the specific regularity of the solution ${u}$, or the specific properties of the nonlinearity ${F}$ here. We will also not address the issue of how to justify the formal computations being performed here.

Solutions to this equation enjoy the forward Duhamel formula

$\displaystyle u(t) = e^{i(t-t_0)\Delta} u(t_0) - i \int_{t_0}^t e^{i(t-t')\Delta} F(u(t'))\ dt'$

for times ${t}$ to the future of ${t_0}$ in the lifespan of the solution, as well as the backward Duhamel formula

$\displaystyle u(t) = e^{i(t-t_1)\Delta} u(t_1) + i \int_t^{t_1} e^{i(t-t')\Delta} F(u(t'))\ dt'$

for all times ${t}$ to the past of ${t_1}$ in the lifespan of the solution. The first formula asserts that the solution at a given time is determined by the initial state and by the immediate past, while the second formula is the time reversal of the first, asserting that the solution at a given time is determined by the final state and the immediate future. These basic causal formulae are the foundation of the local theory of these equations, and in particular play an instrumental role in establishing local well-posedness for these equations. In this local theory, the main philosophy is to treat the homogeneous (or linear) term ${e^{i(t-t_0)\Delta} u(t_0)}$ or ${e^{i(t-t_1)\Delta} u(t_1)}$ as the main term, and the inhomogeneous (or nonlinear, or forcing) integral term as an error term.
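As a purely formal sanity check on the forward Duhamel formula (assuming enough regularity to differentiate under the integral sign), one can apply ${i\partial_t + \Delta}$ to its right-hand side. The linear term is annihilated, since ${i \partial_t e^{i(t-t_0)\Delta} = -\Delta e^{i(t-t_0)\Delta}}$. For the integral term, write ${I(t) := \int_{t_0}^t e^{i(t-t')\Delta} F(u(t'))\ dt'}$ (a notation introduced only for this computation), so that ${\partial_t I(t) = F(u(t)) + i \Delta I(t)}$, and hence

$\displaystyle (i\partial_t + \Delta) ( -i I(t) ) = \partial_t I(t) - i \Delta I(t) = F(u(t)),$

which recovers the equation. The backward formula can be checked in the same way.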

The situation is reversed when one turns to the global theory, and looks at the asymptotic behaviour of a solution as one approaches a limiting time ${T}$ (which can be infinite if one has global existence, or finite if one has finite time blowup). After a suitable rescaling, the linear portion of the solution often disappears from view, leaving one with an asymptotic blowup profile solution which is non-radiating in the sense that the linear components of the Duhamel formulae vanish, thus

$\displaystyle u(t) = - i \int_{t_0}^t e^{i(t-t')\Delta} F(u(t'))\ dt' \ \ \ \ \ (1)$

and

$\displaystyle u(t) = i \int_t^{t_1} e^{i(t-t')\Delta} F(u(t'))\ dt' \ \ \ \ \ (2)$

where ${t_0, t_1}$ are the endpoint times of existence. (This type of situation comes up for instance in the Kenig-Merle approach to critical regularity problems, by reducing to a minimal blowup solution which is almost periodic modulo symmetries, and hence non-radiating.) These types of non-radiating solutions are propelled solely by their own nonlinear self-interactions from the immediate past or immediate future; they are generalisations of “nonlinear bound states” such as solitons.

A key task is then to somehow combine the forward representation (1) and the backward representation (2) to obtain new information on ${u(t)}$ itself, that cannot be obtained from either representation alone; it seems that the immediate past and immediate future can collectively exert more control on the present than they each do separately. This type of problem can be abstracted as follows. Let ${\|u(t)\|_{Y_+}}$ be the infimal value of ${\|F_+\|_N}$ over all forward representations of ${u(t)}$ of the form

$\displaystyle u(t) = \int_{t_0}^t e^{i(t-t')\Delta} F_+(t') \ dt' \ \ \ \ \ (3)$

where ${N}$ is some suitable spacetime norm (e.g. a Strichartz-type norm), and similarly let ${\|u(t)\|_{Y_-}}$ be the infimal value of ${\|F_-\|_N}$ over all backward representations of ${u(t)}$ of the form

$\displaystyle u(t) = \int_{t}^{t_1} e^{i(t-t')\Delta} F_-(t') \ dt'. \ \ \ \ \ (4)$

Typically, one already has (or is willing to assume as a bootstrap hypothesis) control on ${F(u)}$ in the norm ${N}$, which gives control of ${u(t)}$ in the norms ${Y_+, Y_-}$. The task is then to use the control of both the ${Y_+}$ and ${Y_-}$ norm of ${u(t)}$ to gain control of ${u(t)}$ in a more conventional Hilbert space norm ${X}$, which is typically a Sobolev space such as ${H^s}$ or ${L^2}$.

One can use some classical functional analysis to clarify this situation. By the closed graph theorem, the above task is (morally, at least) equivalent to establishing an a priori bound of the form

$\displaystyle \| u \|_X \lesssim \|u\|_{Y_+} + \|u\|_{Y_-} \ \ \ \ \ (5)$

for all reasonable ${u}$ (e.g. test functions). The double Duhamel trick accomplishes this by establishing the stronger estimate

$\displaystyle |\langle u, v \rangle_X| \lesssim \|u\|_{Y_+} \|v\|_{Y_-} \ \ \ \ \ (6)$

for all reasonable ${u, v}$; note that setting ${u=v}$ and applying the arithmetic-geometric inequality then gives (5). The point is that if ${u}$ has a forward representation (3) and ${v}$ has a backward representation (4), then the inner product ${\langle u, v \rangle_X}$ can (formally, at least) be expanded as a double integral

$\displaystyle \int_{t_0}^t \int_{t}^{t_1} \langle e^{i(t-t')\Delta} F_+(t'), e^{i(t-t'')\Delta} F_-(t'') \rangle_X\ dt'' dt'.$

The dispersive nature of the linear Schrödinger equation often causes ${\langle e^{i(t-t')\Delta} F_+(t'), e^{i(t-t'')\Delta} F_-(t'') \rangle_X}$ to decay as ${|t'-t''|}$ grows, especially in high dimensions. In high enough dimension (typically one needs five or higher dimensions, unless one already has some spacetime control on the solution), the decay is stronger than ${1/|t'-t''|^2}$, so that the integrand becomes absolutely integrable and one recovers (6).
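For completeness, the passage from (6) to (5) mentioned earlier is the following short computation: setting ${v=u}$ in (6) and using the elementary inequality ${ab \leq \frac{1}{4}(a+b)^2}$, one has

$\displaystyle \|u\|_X^2 = \langle u, u \rangle_X \lesssim \|u\|_{Y_+} \|u\|_{Y_-} \leq \frac{1}{4} ( \|u\|_{Y_+} + \|u\|_{Y_-} )^2,$

and (5) follows upon taking square roots.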

Unfortunately it appears that estimates of the form (6) fail in low dimensions (for the type of norms ${N}$ that actually show up in applications); there is just too much interaction between past and future to hope for any reasonable control of this inner product. But one can try to obtain (5) by other means. By the Hahn-Banach theorem (and ignoring various issues related to reflexivity), (5) is equivalent to the assertion that every ${u \in X}$ can be decomposed as ${u = u_+ + u_-}$, where ${\|u_+\|_{Y_+^*} \lesssim \|u\|_X}$ and ${\|u_-\|_{Y_-^*} \lesssim \|u\|_X}$. Indeed once one has such a decomposition, one obtains (5) by computing the inner product of ${u}$ with ${u=u_++u_-}$ in ${X}$ in two different ways. One can also (morally at least) write ${\|u_+\|_{Y_+^*}}$ as ${\| e^{i(\cdot-t)\Delta} u_+\|_{N^*([t_0,t])}}$ and similarly write ${\|u_-\|_{Y_-^*}}$ as ${\| e^{i(\cdot-t)\Delta} u_-\|_{N^*([t,t_1])}}$.
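To spell out this last computation (formally, and assuming that the ${X}$ inner product is what realises the pairing between ${Y_\pm}$ and ${Y_\pm^*}$): given a decomposition ${u = u_+ + u_-}$ with the stated bounds, one has

$\displaystyle \|u\|_X^2 = \langle u, u_+ \rangle_X + \langle u, u_- \rangle_X \leq \|u\|_{Y_+} \|u_+\|_{Y_+^*} + \|u\|_{Y_-} \|u_-\|_{Y_-^*} \lesssim ( \|u\|_{Y_+} + \|u\|_{Y_-} ) \|u\|_X,$

and dividing by ${\|u\|_X}$ gives (5).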

So one can dualise the task of proving (5) as that of obtaining a decomposition of an arbitrary initial state ${u}$ into two components ${u_+}$ and ${u_-}$, where the former disperses into the past and the latter disperses into the future under the linear evolution. We do not know how to achieve this type of task efficiently in general – and doing so would likely lead to a significant advance in the subject (perhaps one of the main areas in this topic where serious harmonic analysis is likely to play a major role). But in the model case of spherically symmetric data ${u}$, one can perform such a decomposition quite easily: one uses microlocal projections to set ${u_+}$ to be the “inward” pointing component of ${u}$, which propagates towards the origin in the future and away from the origin in the past, and ${u_-}$ to similarly be the “outward” component of ${u}$. As spherical symmetry significantly dilutes the amplitude of the solution (and hence the strength of the nonlinearity) away from the origin, this decomposition tends to work quite well for applications, and is one of the main reasons (though not the only one) why we have a global theory for low-dimensional nonlinear Schrödinger equations in the radial case, but not in general.

The in/out decomposition is a linear one, but the Hahn-Banach argument gives no reason why the decomposition needs to be linear. (Note that other well-known decompositions in analysis, such as the Fefferman-Stein decomposition of BMO, are necessarily nonlinear, a fact which is ultimately equivalent to the non-complemented nature of a certain subspace of a Banach space; see these lecture notes of mine and this old blog post for some discussion.) So one could imagine a sophisticated nonlinear decomposition as a general substitute for the in/out decomposition. See for instance this paper of Bourgain and Brezis for some of the subtleties of decomposition even in very classical function spaces such as ${H^{1/2}(R)}$. Alternatively, there may well be a third way to obtain estimates of the form (5) that do not require either decomposition or the double Duhamel trick; such a method may well clarify the relative relationship between past, present, and future for critical nonlinear dispersive equations, which seems to be a key aspect of the theory that is still only partially understood. (In particular, it seems that one needs a fairly strong decoupling of the present from both the past and the future to get the sort of elliptic-like regularity results that allow us to make further progress with such equations.)

As discussed in previous notes, a function space norm can be viewed as a means to rigorously quantify various statistics of a function ${f: X \rightarrow {\bf C}}$. For instance, the “height” and “width” can be quantified via the ${L^p(X,\mu)}$ norms (and their relatives, such as the Lorentz norms ${\|f\|_{L^{p,q}(X,\mu)}}$). Indeed, if ${f}$ is a step function ${f = A 1_E}$, then the ${L^p}$ norm of ${f}$ is a combination ${\|f\|_{L^p(X,\mu)} = |A| \mu(E)^{1/p}}$ of the height (or amplitude) ${A}$ and the width ${\mu(E)}$.

However, there are more features of a function ${f}$ of interest than just its width and height. When the domain ${X}$ is a Euclidean space ${{\bf R}^d}$ (or domains related to Euclidean spaces, such as open subsets of ${{\bf R}^d}$, or manifolds), then another important feature of such functions (especially in PDE) is the regularity of a function, as well as the related concept of the frequency scale of a function. These terms are not rigorously defined; but roughly speaking, regularity measures how smooth a function is (or how many times one can differentiate the function before it ceases to be a function), while the frequency scale of a function measures how quickly the function oscillates (and would be inversely proportional to the wavelength). One can illustrate this informal concept with some examples:

• Let ${\phi \in C^\infty_c({\bf R})}$ be a test function that equals ${1}$ near the origin, and ${N}$ be a large number. Then the function ${f(x) := \phi(x) \sin(Nx)}$ oscillates at a wavelength of about ${1/N}$, and a frequency scale of about ${N}$. While ${f}$ is, strictly speaking, a smooth function, it becomes increasingly less smooth in the limit ${N \rightarrow \infty}$; for instance, the derivative ${f'(x) = \phi'(x) \sin(Nx) + N \phi(x) \cos(Nx)}$ grows at a roughly linear rate as ${N \rightarrow \infty}$, and the higher derivatives grow at even faster rates. So this function does not really have any regularity in the limit ${N \rightarrow \infty}$. Note however that the height and width of this function are bounded uniformly in ${N}$; so regularity and frequency scale are independent of height and width.
• Continuing the previous example, now consider the function ${g(x) := N^{-s} \phi(x) \sin(Nx)}$, where ${s \geq 0}$ is some parameter. This function also has a frequency scale of about ${N}$. But now it has a certain amount of regularity, even in the limit ${N \rightarrow \infty}$; indeed, one easily checks that the ${k^{th}}$ derivative of ${g}$ stays bounded in ${N}$ as long as ${k \leq s}$. So one could view this function as having “${s}$ degrees of regularity” in the limit ${N \rightarrow \infty}$.
• In a similar vein, the function ${N^{-s} \phi(Nx)}$ also has a frequency scale of about ${N}$, and can be viewed as having ${s}$ degrees of regularity in the limit ${N \rightarrow \infty}$.
• The function ${\phi(x) |x|^s 1_{x > 0}}$ also has about ${s}$ degrees of regularity, in the sense that it can be differentiated up to ${s}$ times before becoming unbounded. By performing a dyadic decomposition of the ${x}$ variable, one can also decompose this function into components ${\psi(2^n x) |x|^s}$ for ${n \geq 0}$, where ${\psi(x) := (\phi(x)-\phi(2x)) 1_{x>0}}$ is a bump function supported away from the origin; each such component has frequency scale about ${2^n}$ and ${s}$ degrees of regularity. Thus we see that the original function ${\phi(x) |x|^s 1_{x > 0}}$ has a range of frequency scales, ranging from about ${1}$ all the way to ${+\infty}$.
• One can of course concoct higher-dimensional analogues of these examples. For instance, the localised plane wave ${\phi(x) \sin(\xi \cdot x)}$ in ${{\bf R}^d}$, where ${\phi \in C^\infty_c({\bf R}^d)}$ is a test function, would have a frequency scale of about ${|\xi|}$.

There are a variety of function space norms that can be used to capture frequency scale (or regularity) in addition to height and width. The most common and well-known examples of such spaces are the Sobolev space norms ${\| f\|_{W^{s,p}({\bf R}^d)}}$, although there are a number of other norms with similar features (such as Hölder norms, Besov norms, and Triebel-Lizorkin norms). Very roughly speaking, the ${W^{s,p}}$ norm is like the ${L^p}$ norm, but with “${s}$ additional degrees of regularity”. For instance, in one dimension, the function ${A \phi(x/R) \sin(Nx)}$, where ${\phi}$ is a fixed test function and ${R, N}$ are large, will have a ${W^{s,p}}$ norm of about ${|A| R^{1/p} N^s}$, thus combining the “height” ${|A|}$, the “width” ${R}$, and the “frequency scale” ${N}$ of this function together. (Compare this with the ${L^p}$ norm of the same function, which is about ${|A| R^{1/p}}$.)
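The heuristic ${|A| R^{1/p} N^s}$ can be tested numerically in the case ${s=1}$, where the relevant quantity is the ${L^p}$ norm of the first derivative. The sketch below rests on assumptions not made in the text: ${\phi}$ is taken to be the standard bump ${\exp(-1/(1-y^2))}$ on ${(-1,1)}$, the derivative is computed from the product rule, and the norm is approximated by a Riemann sum. All function names are hypothetical.

```python
import numpy as np

def bump(y):
    """Standard smooth bump exp(-1/(1-y^2)) on (-1, 1), zero elsewhere (an assumed choice of phi)."""
    out = np.zeros_like(y, dtype=float)
    inside = np.abs(y) < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - y[inside] ** 2))
    return out

def bump_prime(y):
    """Derivative of the bump above, computed by the chain rule."""
    out = np.zeros_like(y, dtype=float)
    inside = np.abs(y) < 1.0
    yi = y[inside]
    out[inside] = np.exp(-1.0 / (1.0 - yi ** 2)) * (-2.0 * yi / (1.0 - yi ** 2) ** 2)
    return out

def lp_norm(values, dx, p):
    """Riemann-sum approximation of the L^p norm."""
    return float((np.sum(np.abs(values) ** p) * dx) ** (1.0 / p))

def derivative_norm_ratio(A, R, N, p, points_per_wavelength=40):
    """Ratio of ||f'||_{L^p} to |A| R^{1/p} N for f(x) = A phi(x/R) sin(N x)."""
    target_dx = (2.0 * np.pi / N) / points_per_wavelength
    n = int(2.0 * R / target_dx) + 1
    x = np.linspace(-R, R, n)
    dx = x[1] - x[0]
    # Product rule: f'(x) = (A/R) phi'(x/R) sin(Nx) + A N phi(x/R) cos(Nx).
    f_prime = (A / R) * bump_prime(x / R) * np.sin(N * x) + A * N * bump(x / R) * np.cos(N * x)
    return lp_norm(f_prime, dx, p) / (abs(A) * R ** (1.0 / p) * N)

if __name__ == "__main__":
    print(derivative_norm_ratio(1.0, 4.0, 50.0, 2.0))
    print(derivative_norm_ratio(3.0, 16.0, 200.0, 2.0))
```

If the heuristic is accurate, the two printed ratios should be close to each other even though ${A}$, ${R}$ and ${N}$ differ substantially between the two calls; the common value is a constant depending only on ${\phi}$ and ${p}$.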

To a large extent, the theory of the Sobolev spaces ${W^{s,p}({\bf R}^d)}$ resembles their Lebesgue counterparts ${L^p({\bf R}^d)}$ (which are the special case ${s=0}$ of Sobolev spaces), but with the additional benefit of being able to interact very nicely with (weak) derivatives: a first derivative ${\frac{\partial f}{\partial x_j}}$ of a function in an ${L^p}$ space usually leaves all Lebesgue spaces, but a first derivative of a function in the Sobolev space ${W^{s,p}}$ will end up in another Sobolev space ${W^{s-1,p}}$. This compatibility with the differentiation operation begins to explain why Sobolev spaces are so useful in the theory of partial differential equations. Furthermore, the regularity parameter ${s}$ in Sobolev spaces is not restricted to be a natural number; it can be any real number, and one can use fractional derivative or integration operators to move from one regularity to another. Despite the fact that most partial differential equations involve differential operators of integer order, fractional spaces are still of importance; for instance it often turns out that the Sobolev spaces which are critical (scale-invariant) for a certain PDE are of fractional order.

The uncertainty principle in Fourier analysis places a constraint between the width and frequency scale of a function; roughly speaking (and in one dimension for simplicity), the product of the two quantities has to be bounded away from zero (or to put it another way, a wave is always at least as wide as its wavelength). This constraint can be quantified as the very useful Sobolev embedding theorem, which allows one to trade regularity for integrability: a function in a Sobolev space ${W^{s,p}}$ will automatically lie in a number of other Sobolev spaces ${W^{\tilde s,\tilde p}}$ with ${\tilde s < s}$ and ${\tilde p > p}$; in particular, one can often embed Sobolev spaces into Lebesgue spaces. The trade is not reversible: one cannot start with a function with a lot of integrability and no regularity, and expect to recover regularity in a space of lower integrability. (One can already see this with the most basic example of Sobolev embedding, coming from the fundamental theorem of calculus. If a (continuously differentiable) function ${f: {\bf R} \rightarrow {\bf R}}$ has ${f'}$ in ${L^1({\bf R})}$, then we of course have ${f \in L^\infty({\bf R})}$; but the converse is far from true.)
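To spell out this basic example: for any ${x \in {\bf R}}$, the fundamental theorem of calculus gives ${f(x) = f(0) + \int_0^x f'(y)\ dy}$, and hence

$\displaystyle \sup_{x \in {\bf R}} |f(x)| \leq |f(0)| + \|f'\|_{L^1({\bf R})}.$

For the failure of the converse, one can consider for instance ${f(x) := \sin(x^2)}$, which is bounded and continuously differentiable, but whose derivative ${f'(x) = 2x \cos(x^2)}$ is not absolutely integrable.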

Plancherel’s theorem reveals that Fourier-analytic tools are particularly powerful when applied to ${L^2}$ spaces. Because of this, the Fourier transform is very effective at dealing with the ${L^2}$-based Sobolev spaces ${W^{s,2}({\bf R}^d)}$, often abbreviated ${H^s({\bf R}^d)}$. Indeed, using the fact that the Fourier transform converts regularity to decay, we will see that the ${H^s({\bf R}^d)}$ spaces are nothing more than Fourier transforms of weighted ${L^2}$ spaces, and in particular enjoy a Hilbert space structure. These Sobolev spaces, and in particular the energy space ${H^1({\bf R}^d)}$, are of particular importance in any PDE that involves some sort of energy functional (this includes large classes of elliptic, parabolic, dispersive, and wave equations, and especially those equations connected to physics and/or geometry).
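The description of ${H^s}$ as a Fourier transform of a weighted ${L^2}$ space is easy to experiment with numerically. The following is only a sketch under assumptions that go beyond the text: it works on the circle ${{\bf R}/2\pi{\bf Z}}$ rather than on ${{\bf R}^d}$, it uses the weight ${(1+k^2)^s}$ on the Fourier coefficients ${c_k}$ as the (assumed) definition of the squared norm, and the function name is hypothetical.

```python
import numpy as np

def hs_norm_periodic(samples, s):
    """H^s norm of a 2*pi-periodic function from equally spaced samples on [0, 2*pi).

    Assumed normalisation: ||f||_{H^s}^2 = sum_k (1 + k^2)^s |c_k|^2,
    where c_k are the Fourier coefficients of f.
    """
    n = len(samples)
    coeffs = np.fft.fft(samples) / n        # Fourier coefficients c_k
    k = np.fft.fftfreq(n, d=1.0 / n)        # integer frequencies 0, 1, ..., -2, -1
    weights = (1.0 + k ** 2) ** s
    return float(np.sqrt(np.sum(weights * np.abs(coeffs) ** 2)))

if __name__ == "__main__":
    x = 2.0 * np.pi * np.arange(256) / 256
    for N in (4, 8, 16):
        f = np.sin(N * x)
        print(N, hs_norm_periodic(f, 0.0), hs_norm_periodic(f, 1.0))
```

For ${f(x) = \sin(Nx)}$ the only non-zero coefficients are ${c_{\pm N}}$, each of magnitude ${1/2}$, so with this normalisation the norm is ${(1+N^2)^{s/2}/\sqrt{2}}$; this grows like ${N^s}$, consistent with the heuristic that the ${H^s}$ norm weights a function of frequency scale ${N}$ by ${N^s}$.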

We will not fully develop the theory of Sobolev spaces here, as this would require the theory of singular integrals, which is beyond the scope of this course. There are of course many references for further reading; one is Stein’s “Singular integrals and differentiability properties of functions”.

In set theory, a function ${f: X \rightarrow Y}$ is defined as an object that evaluates every input ${x}$ to exactly one output ${f(x)}$. However, in various branches of mathematics, it has become convenient to generalise this classical concept of a function to a more abstract one. For instance, in operator algebras, quantum mechanics, or non-commutative geometry, one often replaces commutative algebras of (real or complex-valued) functions on some space ${X}$, such as ${C(X)}$ or ${L^\infty(X)}$, with a more general – and possibly non-commutative – algebra (e.g. a ${C^*}$-algebra or a von Neumann algebra). Elements in this more abstract algebra are no longer definable as functions in the classical sense of assigning a single value ${f(x)}$ to every point ${x \in X}$, but one can still define other operations on these “generalised functions” (e.g. one can multiply or take inner products between two such objects).

Generalisations of functions are also very useful in analysis. In our study of ${L^p}$ spaces, we have already seen one such generalisation, namely the concept of a function defined up to almost everywhere equivalence. Such a function ${f}$ (or more precisely, an equivalence class of classical functions) cannot be evaluated at any given point ${x}$, if that point has measure zero. However, it is still possible to perform algebraic operations on such functions (e.g. multiplying or adding two functions together), and one can also integrate such functions on measurable sets (provided, of course, that the function has some suitable integrability condition). We also know that the ${L^p}$ spaces can usually be described via duality, as the dual space of ${L^{p'}}$ (except in some endpoint cases, namely when ${p=1}$, or when ${p=\infty}$ and the underlying space is not ${\sigma}$-finite).

We have also seen (via the Lebesgue-Radon-Nikodym theorem) that locally integrable functions ${f \in L^1_{\hbox{loc}}({\bf R})}$ on, say, the real line ${{\bf R}}$, can be identified with locally finite absolutely continuous measures ${m_f}$ on the line, by multiplying Lebesgue measure ${m}$ by the function ${f}$. So another way to generalise the concept of a function is to consider arbitrary locally finite Radon measures ${\mu}$ (not necessarily absolutely continuous), such as the Dirac measure ${\delta_0}$. With this concept of “generalised function”, one can still add and subtract two measures ${\mu, \nu}$, and integrate any measure ${\mu}$ against a (bounded) measurable set ${E}$ to obtain a number ${\mu(E)}$, but one cannot evaluate a measure ${\mu}$ (or more precisely, the Radon-Nikodym derivative ${d\mu/dm}$ of that measure) at a single point ${x}$, and one also cannot multiply two measures together to obtain another measure. From the Riesz representation theorem, we also know that the space of (finite) Radon measures can be described via duality, as linear functionals on ${C_c({\bf R})}$.

There is an even larger class of generalised functions that is very useful, particularly in linear PDE, namely the space of distributions, say on a Euclidean space ${{\bf R}^d}$. In contrast to Radon measures ${\mu}$, which can be defined by how they “pair up” against continuous, compactly supported test functions ${f \in C_c({\bf R}^d)}$ to create numbers ${\langle f, \mu \rangle := \int_{{\bf R}^d} f\ d\overline{\mu}}$, a distribution ${\lambda}$ is defined by how it pairs up against a smooth compactly supported function ${f \in C^\infty_c({\bf R}^d)}$ to create a number ${\langle f, \lambda \rangle}$. As the space ${C^\infty_c({\bf R}^d)}$ of smooth compactly supported functions is smaller than (but dense in) the space ${C_c({\bf R}^d)}$ of continuous compactly supported functions (and has a stronger topology), the space of distributions is larger than that of measures. But the space ${C^\infty_c({\bf R}^d)}$ is closed under more operations than ${C_c({\bf R}^d)}$, and in particular is closed under differential operators (with smooth coefficients). Because of this, the space of distributions is similarly closed under such operations; in particular, one can differentiate a distribution and get another distribution, which is something that is not always possible with measures or ${L^p}$ functions. But as measures or functions can be interpreted as distributions, this leads to the notion of a weak derivative for such objects, which makes sense (but only as a distribution) even for functions that are not classically differentiable. Thus the theory of distributions can allow one to rigorously manipulate rough functions “as if” they were smooth, although one must still be careful as some operations on distributions are not well-defined, most notably the operation of multiplying two distributions together. Nevertheless one can use this theory to justify many formal computations involving derivatives, integrals, etc. (including several computations used routinely in physics) that would be difficult to formalise rigorously in a purely classical framework.
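To illustrate the notion of a weak derivative with the standard one-dimensional example (using the usual convention, assumed here, that the derivative ${\lambda'}$ of a distribution ${\lambda}$ is defined by ${\langle f, \lambda' \rangle := - \langle f', \lambda \rangle}$): the Heaviside function ${H := 1_{[0,+\infty)}}$ is not classically differentiable at the origin, but for any test function ${f \in C^\infty_c({\bf R})}$ one has

$\displaystyle \langle f, H' \rangle = - \int_0^\infty f'(x)\ dx = f(0) = \langle f, \delta_0 \rangle,$

so that the weak derivative of ${H}$ is the Dirac measure ${\delta_0}$.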

If one shrinks the space of distributions slightly, to the space of tempered distributions (which is formed by enlarging the dual class ${C^\infty_c({\bf R}^d)}$ to the Schwartz class ${{\mathcal S}({\bf R}^d)}$), then one obtains closure under another important operation, namely the Fourier transform. This allows one to define various Fourier-analytic operations (e.g. pseudodifferential operators) on such distributions.

Of course, at the end of the day, one is usually not all that interested in distributions in their own right, but would like to be able to use them as a tool to study more classical objects, such as smooth functions. Fortunately, one can recover facts about smooth functions from facts about the (far rougher) space of distributions in a number of ways. For instance, if one convolves a distribution with a smooth, compactly supported function, one gets back a smooth function. This is a particularly useful fact in the theory of constant-coefficient linear partial differential equations such as ${Lu=f}$, as it allows one to recover a smooth solution ${u}$ from smooth, compactly supported data ${f}$ by convolving ${f}$ with a specific distribution ${G}$, known as the fundamental solution of ${L}$. We will give some examples of this later in these notes.
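As a small numerical illustration of this principle, one can take the simplest one-dimensional example, which is an assumption made here and not one of the examples promised above: ${L = \frac{d^2}{dx^2}}$ on the real line, whose fundamental solution is ${G(x) = |x|/2}$. The sketch below convolves a bump function ${f}$ with ${G}$ on a uniform grid and checks, via second differences, that the result ${u}$ satisfies ${u'' = f}$. The function names are hypothetical.

```python
import numpy as np

def bump(x):
    """Smooth, compactly supported data f: the bump exp(-1/(1-x^2)) on (-1, 1)."""
    out = np.zeros_like(x, dtype=float)
    inside = np.abs(x) < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - x[inside] ** 2))
    return out

def solve_by_convolution(f_values, x):
    """Riemann-sum approximation of u = f * G, with G(x) = |x|/2."""
    dx = x[1] - x[0]
    kernel = 0.5 * np.abs(x[:, None] - x[None, :])  # matrix of G(x_i - x_j)
    return kernel @ f_values * dx

def second_difference(u, dx):
    """Centred second difference, a discrete stand-in for u''."""
    return (u[2:] - 2.0 * u[1:-1] + u[:-2]) / dx ** 2

if __name__ == "__main__":
    x = np.linspace(-3.0, 3.0, 601)
    f = bump(x)
    u = solve_by_convolution(f, x)
    residual = second_difference(u, x[1] - x[0]) - f[1:-1]
    print(np.max(np.abs(residual)))
```

On a uniform grid the second difference of the sampled kernel ${|x|/2}$ is a discrete delta function, so the printed residual should be at the level of rounding error rather than merely small.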

It is this unusual and useful combination of both being able to pass from classical functions to generalised functions (e.g. by differentiation) and then back from generalised functions to classical functions (e.g. by convolution) that sets the theory of distributions apart from other competing theories of generalised functions, in particular allowing one to justify many formal calculations in PDE and Fourier analysis rigorously with relatively little additional effort. On the other hand, being defined by linear duality, the theory of distributions becomes somewhat less useful when one moves to more nonlinear problems, such as nonlinear PDE. However, distributions still serve an important supporting role in such problems as an “ambient space” of functions, inside of which one carves out more useful function spaces, such as Sobolev spaces, which we will discuss in the next set of notes.

Recently, I have been studying the concept of amenability on groups. This concept can be defined in a “combinatorial” or “finitary” fashion, using Følner sequences, and also in a more “functional-analytic” or “infinitary” fashion, using invariant means. I wanted to get some practice passing back and forth between these two definitions, so I wrote down some notes on how to do this, and also how to take some facts about amenability that are usually proven in one setting, and prove them instead in the other. These notes are thus mostly for my own benefit, but I thought I might post them here also, in case anyone else is interested.