The classical inverse function theorem reads as follows:

Theorem 1 ({C^1} inverse function theorem) Let {\Omega \subset {\bf R}^n} be an open set, and let {f: \Omega \rightarrow {\bf R}^n} be a continuously differentiable function, such that for every {x_0 \in \Omega}, the derivative map {Df(x_0): {\bf R}^n \rightarrow {\bf R}^n} is invertible. Then {f} is a local homeomorphism; thus, for every {x_0 \in \Omega}, there exists an open neighbourhood {U} of {x_0} and an open neighbourhood {V} of {f(x_0)} such that {f} is a homeomorphism from {U} to {V}.

It is also not difficult to show by inverting the Taylor expansion

\displaystyle  f(x) = f(x_0) + Df(x_0)(x-x_0) + o(\|x-x_0\|)

that at each {x_0}, the local inverses {f^{-1}: V \rightarrow U} are also differentiable at {f(x_0)} with derivative

\displaystyle  Df^{-1}(f(x_0)) = Df(x_0)^{-1}. \ \ \ \ \ (1)

The textbook proof of the inverse function theorem proceeds by an application of the contraction mapping theorem. Indeed, one may normalise {x_0=f(x_0)=0} and {Df(0)} to be the identity map; continuity of {Df} then shows that {Df(x)} is close to the identity for small {x}, which may be used (in conjunction with the fundamental theorem of calculus) to make {x \mapsto x-f(x)+y} a contraction on a small ball around the origin for small {y}, at which point the contraction mapping theorem readily finishes off the problem.
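
As a minimal numerical sketch of this iteration, one can solve {f(x)=y} for small {y} by repeatedly applying {x \mapsto x-f(x)+y}. The map `f` below is a hypothetical example, chosen only so that {f(0)=0} and {Df(0)} is the identity; the number of iterations is likewise an arbitrary choice.

```python
import math

def f(x):
    """Hypothetical example map with f(0) = 0 and Df(0) = identity."""
    x1, x2 = x
    return (x1 + x2 * x2, x2 + x1 * x2)

def solve_by_contraction(y, iterations=100):
    """Iterate x -> x - f(x) + y, starting from the origin."""
    x = (0.0, 0.0)
    for _ in range(iterations):
        fx = f(x)
        x = (x[0] - fx[0] + y[0], x[1] - fx[1] + y[1])
    return x

def residual(x, y):
    """Euclidean size of f(x) - y."""
    fx = f(x)
    return math.hypot(fx[0] - y[0], fx[1] - y[1])

y = (0.05, -0.03)
x = solve_by_contraction(y)
print(x, residual(x, y))
```

For this particular `f`, the iteration map is {x \mapsto y - (x_2^2, x_1 x_2)}, which is a contraction on a small ball around the origin, so the residual shrinks geometrically.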

I recently learned (after I asked this question on Math Overflow) that the hypothesis of continuous differentiability may be relaxed to just everywhere differentiability:

Theorem 2 (Everywhere differentiable inverse function theorem) Let {\Omega \subset {\bf R}^n} be an open set, and let {f: \Omega \rightarrow {\bf R}^n} be an everywhere differentiable function, such that for every {x_0 \in \Omega}, the derivative map {Df(x_0): {\bf R}^n \rightarrow {\bf R}^n} is invertible. Then {f} is a local homeomorphism; thus, for every {x_0 \in \Omega}, there exists an open neighbourhood {U} of {x_0} and an open neighbourhood {V} of {f(x_0)} such that {f} is a homeomorphism from {U} to {V}.

As before, one can recover the differentiability of the local inverses, with the derivative of the inverse given by the usual formula (1).

This result implicitly follows from the more general results of Cernavskii about the structure of finite-to-one open and closed maps; however, the arguments there are somewhat complicated (and subsequent proofs of those results, such as the one by Vaisala, use some powerful tools from algebraic topology, such as dimension theory). There is, however, a more elementary proof of Saint Raymond that was pointed out to me by Julien Melleray. It only uses basic point-set topology (for instance, the concept of a connected component) and the basic topological and geometric structure of Euclidean space (in particular relying primarily on local compactness, local connectedness, and local convexity). I decided to present (an arrangement of) Saint Raymond’s proof here.

To obtain a local homeomorphism near {x_0}, there are basically two things to show: local surjectivity near {x_0} (thus, for {y} near {f(x_0)}, one can solve {f(x)=y} for some {x} near {x_0}) and local injectivity near {x_0} (thus, for distinct {x_1, x_2} near {x_0}, {f(x_1)} is not equal to {f(x_2)}). Local surjectivity is relatively easy; basically, the standard proof of the inverse function theorem works here, after replacing the contraction mapping theorem (which is no longer available due to the possibly discontinuous nature of {Df}) with the Brouwer fixed point theorem instead (or one could also use degree theory, which is more or less an equivalent approach). The difficulty is local injectivity – one needs to preclude the existence of nearby points {x_1, x_2} with {f(x_1) = f(x_2) = y}; note that in contrast to the contraction mapping theorem that provides both existence and uniqueness of fixed points, the Brouwer fixed point theorem only gives existence and not uniqueness.

In one dimension {n=1} one can proceed by using Rolle’s theorem. Indeed, as one traverses the interval from {x_1} to {x_2}, one must encounter some intermediate point {x_*} which maximises the quantity {|f(x_*)-y|}, and which is thus instantaneously non-increasing both to the left and to the right of {x_*}. But, by hypothesis, {f'(x_*)} is non-zero, and this easily leads to a contradiction.

Saint Raymond’s argument for the higher dimensional case proceeds in a broadly similar way. Starting with two nearby points {x_1, x_2} with {f(x_1)=f(x_2)=y}, one finds a point {x_*} which “locally extremises” {\|f(x_*)-y\|} in the following sense: {\|f(x_*)-y\|} is equal to some {r_*>0}, but {x_*} is adherent to at least two distinct connected components {U_1, U_2} of the set {U = \{ x: \|f(x)-y\| < r_* \}}. (This is an oversimplification, as one has to restrict the available points {x} in {U} to a suitably small compact set, but let us ignore this technicality for now.) Note from the non-degenerate nature of {Df(x_*)} that {x_*} was already adherent to {U}; the point is that {x_*} “disconnects” {U} in some sense. Very roughly speaking, the way such a critical point {x_*} is found is to look at the sets {\{ x: \|f(x)-y\| \leq r \}} as {r} shrinks from a large initial value down to zero, and one finds the first value of {r_*} below which this set disconnects {x_1} from {x_2}. (Morally, one is performing some sort of Morse theory here on the function {x \mapsto \|f(x)-y\|}, though this function does not have anywhere near enough regularity for classical Morse theory to apply.)

The point {x_*} is mapped to a point {f(x_*)} on the boundary {\partial B(y,r_*)} of the ball {B(y,r_*)}, while the components {U_1, U_2} are mapped to the interior of this ball. By using a continuity argument, one can show (again very roughly speaking) that {f(U_1)} must contain a “hemispherical” neighbourhood {\{ z \in B(y,r_*): \|z-f(x_*)\| < \kappa \}} of {f(x_*)} inside {B(y,r_*)}, and similarly for {f(U_2)}. But from the differentiability of {f} at {x_*}, one can then show that {U_1} and {U_2} overlap near {x_*}, giving a contradiction.

The rigorous details of the proof are provided below the fold.

This is another installment of my series of posts on Hilbert’s fifth problem. One formulation of this problem is answered by the following theorem of Gleason and Montgomery-Zippin:

Theorem 1 (Hilbert’s fifth problem) Let {G} be a topological group which is locally Euclidean. Then {G} is isomorphic to a Lie group.

Theorem 1 is a deep and difficult result, but the discussion in the previous posts has reduced the proof of this theorem to that of establishing two simpler results, involving the concepts of a no small subgroups (NSS) group, and that of a Gleason metric. We briefly recall the relevant definitions:

Definition 2 (NSS) A topological group {G} is said to have no small subgroups, or is NSS for short, if there is an open neighbourhood {U} of the identity in {G} that contains no subgroups of {G} other than the trivial subgroup {\{ \hbox{id}\}}.

Definition 3 (Gleason metric) Let {G} be a topological group. A Gleason metric on {G} is a left-invariant metric {d: G \times G \rightarrow {\bf R}^+} which generates the topology on {G} and obeys the following properties for some constant {C>0}, writing {\|g\|} for {d(g,\hbox{id})}:

  • (Escape property) If {g \in G} and {n \geq 1} is such that {n \|g\| \leq \frac{1}{C}}, then

    \displaystyle  \|g^n\| \geq \frac{1}{C} n \|g\|. \ \ \ \ \ (1)

  • (Commutator estimate) If {g, h \in G} are such that {\|g\|, \|h\| \leq \frac{1}{C}}, then

    \displaystyle  \|[g,h]\| \leq C \|g\| \|h\|, \ \ \ \ \ (2)

    where {[g,h] := g^{-1}h^{-1}gh} is the commutator of {g} and {h}.
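
To get a feel for these two properties, here is a small numerical sketch on a concrete compact group. Everything specific in it is an assumption made for illustration and is not taken from the post: the group is the unit quaternions, the metric is the Euclidean distance {d(g,h) = |g-h|} (which is left-invariant because multiplication by a unit quaternion is an isometry), and the constant is {C=2}.

```python
import math
import random

IDENTITY = (1.0, 0.0, 0.0, 0.0)

def qmul(p, q):
    """Hamilton product of two quaternions given as (a, b, c, d)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)

def qconj(p):
    """Conjugate, which is the inverse for unit quaternions."""
    return (p[0], -p[1], -p[2], -p[3])

def size(g):
    """||g|| = d(g, id) for the assumed metric d(g, h) = |g - h|."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(g, IDENTITY)))

def commutator(g, h):
    """[g, h] = g^{-1} h^{-1} g h."""
    return qmul(qmul(qconj(g), qconj(h)), qmul(g, h))

def random_small_element(rng):
    """Unit quaternion for a rotation by a small random angle."""
    axis = [rng.gauss(0.0, 1.0) for _ in range(3)]
    length = math.sqrt(sum(c * c for c in axis))
    theta = rng.uniform(0.02, 0.2)
    s = math.sin(theta / 2) / length
    return (math.cos(theta / 2), axis[0] * s, axis[1] * s, axis[2] * s)

def commutator_estimate_holds(g, h, C=2.0):
    return size(commutator(g, h)) <= C * size(g) * size(h) + 1e-12

def escape_property_holds(g, C=2.0):
    """Check ||g^n|| >= n ||g|| / C for every n with n ||g|| <= 1/C."""
    power, n = g, 1
    while n * size(g) <= 1.0 / C:
        if size(power) < n * size(g) / C - 1e-12:
            return False
        power, n = qmul(power, g), n + 1
    return True

rng = random.Random(0)
samples = [random_small_element(rng) for _ in range(20)]
print(all(commutator_estimate_holds(g, h) for g in samples for h in samples))
print(all(escape_property_holds(g) for g in samples))
```

In this example the commutator estimate comes from the identity {\|[g,h]\| = |gh-hg| = |(g-1)(h-1)-(h-1)(g-1)|} together with the multiplicativity of the quaternion norm.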

The remaining steps in the resolution of Hilbert’s fifth problem are then as follows:

Theorem 4 (Reduction to the NSS case) Let {G} be a locally compact group, and let {U} be an open neighbourhood of the identity in {G}. Then there exists an open subgroup {G'} of {G}, and a compact subgroup {N} of {G'} contained in {U}, such that {G'/N} is NSS and locally compact.

Theorem 5 (Gleason’s lemma) Let {G} be a locally compact NSS group. Then {G} has a Gleason metric.

The purpose of this post is to establish these two results, using arguments that are originally due to Gleason. We will split this task into several subtasks, each of which improves the structure on the group {G} by some amount:

Proposition 6 (From locally compact to metrisable) Let {G} be a locally compact group, and let {U} be an open neighbourhood of the identity in {G}. Then there exists an open subgroup {G'} of {G}, and a compact subgroup {N} of {G'} contained in {U}, such that {G'/N} is locally compact and metrisable.

For any open neighbourhood {U} of the identity in {G}, let {Q(U)} be the union of all the subgroups of {G} that are contained in {U}. (Thus, for instance, {G} is NSS if and only if {Q(U)} is trivial for all sufficiently small {U}.)

Proposition 7 (From metrisable to subgroup trapping) Let {G} be a locally compact metrisable group. Then {G} has the subgroup trapping property: for every open neighbourhood {U} of the identity, there exists another open neighbourhood {V} of the identity such that {Q(V)} generates a subgroup {\langle Q(V) \rangle} contained in {U}.

Proposition 8 (From subgroup trapping to NSS) Let {G} be a locally compact group with the subgroup trapping property, and let {U} be an open neighbourhood of the identity in {G}. Then there exists an open subgroup {G'} of {G}, and a compact subgroup {N} of {G'} contained in {U}, such that {G'/N} is locally compact and NSS.

Proposition 9 (From NSS to the escape property) Let {G} be a locally compact NSS group. Then there exists a left-invariant metric {d} on {G} generating the topology on {G} which obeys the escape property (1) for some constant {C}.

Proposition 10 (From escape to the commutator estimate) Let {G} be a locally compact group with a left-invariant metric {d} that obeys the escape property (1). Then {d} also obeys the commutator property (2).

It is clear that Propositions 6, 7, and 8 combine to give Theorem 4, and Propositions 9, 10 combine to give Theorem 5.

Propositions 6-10 are all proven separately, but their proofs share some common strategies and ideas. The first main idea is to construct metrics on a locally compact group {G} by starting with a suitable “bump function” {\phi \in C_c(G)} (i.e. a continuous, compactly supported function from {G} to {{\bf R}}) and pulling back the metric structure on {C_c(G)} by using the translation action {\tau_g \phi(x) := \phi(g^{-1} x)}, thus creating a (semi-)metric

\displaystyle  d_\phi( g, h ) := \| \tau_g \phi - \tau_h \phi \|_{C_c(G)} := \sup_{x \in G} |\phi(g^{-1} x) - \phi(h^{-1} x)|. \ \ \ \ \ (3)

One easily verifies that this is indeed a (semi-)metric (in that it is non-negative, symmetric, and obeys the triangle inequality); it is also left-invariant, and so we have {d_\phi(g,h) = \|g^{-1} h \|_\phi = \| h^{-1} g \|_\phi}, where

\displaystyle  \| g \|_\phi = d_\phi(g,\hbox{id}) = \| \partial_g \phi \|_{C_c(G)}

where {\partial_g} is the difference operator {\partial_g = 1 - \tau_g},

\displaystyle  \partial_g \phi(x) = \phi(x) - \phi(g^{-1} x).

This construction was already seen in the proof of the Birkhoff-Kakutani theorem, which is the main tool used to establish Proposition 6. For the other propositions, the idea is to choose a bump function {\phi} that is “smooth” enough that it creates a metric with good properties such as the commutator estimate (2). Roughly speaking, to get a bound of the form (2), one needs {\phi} to have “{C^{1,1}} regularity” with respect to the “right” smooth structure on {G}. By {C^{1,1}} regularity, we mean here something like a bound of the form

\displaystyle  \| \partial_g \partial_h \phi \|_{C_c(G)} \ll \|g\|_\phi \|h\|_\phi \ \ \ \ \ (4)

for all {g,h \in G}. Here we use the usual asymptotic notation, writing {X \ll Y} or {X=O(Y)} if {X \leq CY} for some constant {C} (which can vary from line to line).

The following lemma illustrates how {C^{1,1}} regularity can be used to build Gleason metrics.

Lemma 11 Suppose that {\phi \in C_c(G)} obeys (4). Then the (semi-)metric {d_\phi} (and associated (semi-)norm {\|\|_\phi}) obey the escape property (1) and the commutator property (2).

Proof: We begin with the commutator property (2). Observe the identity

\displaystyle  \tau_{[g,h]} = \tau_{hg}^{-1} \tau_{gh}

whence

\displaystyle  \partial_{[g,h]} = \tau_{hg}^{-1} ( \tau_{hg} - \tau_{gh} )

\displaystyle  = \tau_{hg}^{-1} ( \partial_h \partial_g - \partial_g \partial_h ).

From the triangle inequality (and translation-invariance of the {C_c(G)} norm) we thus see that (2) follows from (4). Similarly, to obtain the escape property (1), observe the telescoping identity

\displaystyle  \partial_{g^n} = n \partial_g - \sum_{i=0}^{n-1} \partial_g \partial_{g^i}

for any {g \in G} and natural number {n}, and thus by the triangle inequality

\displaystyle  \| g^n \|_\phi = n \| g \|_\phi + O( \sum_{i=0}^{n-1} \| \partial_g \partial_{g^i} \phi \|_{C_c(G)} ). \ \ \ \ \ (5)

But from (4) (and the triangle inequality) we have

\displaystyle  \| \partial_g \partial_{g^i} \phi \|_{C_c(G)} \ll \|g\|_\phi \|g^i \|_\phi \ll i \|g\|_\phi^2

and thus we have the “Taylor expansion”

\displaystyle  \|g^n\|_\phi = n \|g\|_\phi + O( n^2 \|g\|_\phi^2 )

which gives (1). \Box
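
The two algebraic identities used in this proof can be sanity-checked exactly on a small finite group. The sketch below is only an illustration under stated assumptions: the group is taken to be the symmetric group {S_3}, the function `phi` is an arbitrary integer-valued test function, and the telescoping identity is checked in the form {\partial_{g^n} = n \partial_g - \sum_{i=0}^{n-1} \partial_g \partial_{g^i}}, which follows from {1 - \tau_g^n = \sum_{i=0}^{n-1} \tau_g^i (1-\tau_g)}.

```python
from itertools import permutations

G = list(permutations(range(3)))  # the symmetric group S_3 (an assumed example)

def mul(a, b):
    """Composition of permutations: (a b)(k) = a[b[k]]."""
    return tuple(a[b[k]] for k in range(3))

def inv(a):
    out = [0] * 3
    for k, ak in enumerate(a):
        out[ak] = k
    return tuple(out)

def power(g, n):
    out = tuple(range(3))
    for _ in range(n):
        out = mul(out, g)
    return out

def tau(g, phi):
    """Translation: (tau_g phi)(x) = phi(g^{-1} x)."""
    g_inv = inv(g)
    return {x: phi[mul(g_inv, x)] for x in G}

def diff(g, phi):
    """Difference operator: partial_g = 1 - tau_g."""
    shifted = tau(g, phi)
    return {x: phi[x] - shifted[x] for x in G}

def check_commutator_identity(g, h, phi):
    comm = mul(mul(inv(g), inv(h)), mul(g, h))  # [g,h] = g^{-1} h^{-1} g h
    lhs = diff(comm, phi)
    a = diff(h, diff(g, phi))  # partial_h partial_g phi
    b = diff(g, diff(h, phi))  # partial_g partial_h phi
    inner = {x: a[x] - b[x] for x in G}
    return lhs == tau(inv(mul(h, g)), inner)

def check_telescoping_identity(g, n, phi):
    lhs = diff(power(g, n), phi)
    first = diff(g, phi)
    rhs = {x: n * first[x] for x in G}
    for i in range(n):
        second = diff(g, diff(power(g, i), phi))
        for x in G:
            rhs[x] -= second[x]
    return lhs == rhs

phi = {x: 3 * k * k + k for k, x in enumerate(G)}  # arbitrary integer test function
print(all(check_commutator_identity(g, h, phi) for g in G for h in G))
print(all(check_telescoping_identity(g, n, phi) for g in G for n in range(1, 7)))
```

Integer values are used so that both checks are exact rather than approximate.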

It remains to obtain {\phi} that have the desired {C^{1,1}} regularity property. In order to get such regular bump functions, we will use the trick of convolving together two lower regularity bump functions (such as two functions with “{C^{0,1}} regularity” in some sense to be determined later). In order to perform this convolution, we will use the fundamental tool of (left-invariant) Haar measure {\mu} on the locally compact group {G}. Here we exploit the basic fact that the convolution

\displaystyle  f_1 * f_2(x) := \int_G f_1(y) f_2(y^{-1} x)\ d\mu(y) \ \ \ \ \ (6)

of two functions {f_1,f_2 \in C_c(G)} tends to be smoother than either of the two factors {f_1,f_2}. This is easiest to see in the abelian case, since in this case we can distribute derivatives according to the law

\displaystyle  \partial_g (f_1 * f_2) = (\partial_g f_1) * f_2 = f_1 * (\partial_g f_2),

which suggests that the order of “differentiability” of {f_1*f_2} should be the sum of the orders of {f_1} and {f_2} separately.
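
Here is a toy sketch of this smoothing effect in the simplest abelian setting. The choices are assumptions for illustration only: the group is the cyclic group {{\bf Z}/N{\bf Z}} with counting measure in place of Haar measure, and the two factors are both the indicator function `box` of a short interval.

```python
N = 64  # size of the cyclic group Z/NZ (an arbitrary choice)

def convolve(f1, f2):
    """(f1 * f2)(x) = sum over y of f1(y) f2(x - y), as in (6) with counting measure."""
    return [sum(f1[y] * f2[(x - y) % N] for y in range(N)) for x in range(N)]

def diff(g, f):
    """(partial_g f)(x) = f(x) - f(x - g)."""
    return [f[x] - f[(x - g) % N] for x in range(N)]

def sup_norm(f):
    return max(abs(v) for v in f)

L = 8
box = [1 if x < L else 0 for x in range(N)]  # indicator of {0, ..., L-1}
tent = convolve(box, box)                    # a "tent" of height L

# The derivative can be moved onto either factor (abelian case).
g = 3
print(diff(g, tent) == convolve(diff(g, box), box) == convolve(box, diff(g, box)))

# Relative size of a unit increment: 1 for the box, 1/L for the tent.
print(sup_norm(diff(1, box)) / sup_norm(box), sup_norm(diff(1, tent)) / sup_norm(tent))
```

The box function jumps by its full height across a single step, whereas its self-convolution only changes by a {1/L} fraction of its height, which is a discrete shadow of the gain in regularity.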

These ideas are already sufficient to establish Proposition 10 directly, and also Proposition 9 when combined with an additional bootstrap argument. The proofs of Proposition 7 and Proposition 8 use similar techniques, but are more difficult due to the potential presence of small subgroups, which require an application of the Peter-Weyl theorem to properly control. Both of these theorems will be proven below the fold, thus (when combined with the preceding posts) completing the proof of Theorem 1.

The presentation here is based on some unpublished notes of van den Dries and Goldbring on Hilbert’s fifth problem. I am indebted to Emmanuel Breuillard, Ben Green, and Tom Sanders for many discussions related to these arguments.

A basic problem in harmonic analysis (as well as in linear algebra, random matrix theory, and high-dimensional geometry) is to estimate the operator norm {\|T\|_{op}} of a linear map {T: H \rightarrow H'} between two Hilbert spaces, which we will take to be complex for the sake of discussion. Even the finite-dimensional case {T: {\bf C}^m \rightarrow {\bf C}^n} is of interest, as this operator norm is the same as the largest singular value {\sigma_1(A)} of the {n \times m} matrix {A} associated to {T}.

In general, this operator norm is hard to compute precisely, except in special cases. One such special case is that of a diagonal operator, such as that associated to an {n \times n} diagonal matrix {D = \hbox{diag}(\lambda_1,\ldots,\lambda_n)}. In this case, the operator norm is simply the supremum norm of the diagonal coefficients:

\displaystyle  \|D\|_{op} = \sup_{1 \leq i \leq n} |\lambda_i|. \ \ \ \ \ (1)

A variant of (1) is Schur’s test, which for simplicity we will phrase in the setting of finite-dimensional operators {T: {\bf C}^m \rightarrow {\bf C}^n} given by a matrix {A = (a_{ij})_{1 \leq i \leq n; 1 \leq j \leq m}} via the usual formula

\displaystyle  T (x_j)_{j=1}^m := ( \sum_{j=1}^m a_{ij} x_j )_{i=1}^n.

A simple version of this test is as follows: if all the absolute row sums and column sums of {A} are bounded by some constant {M}, thus

\displaystyle  \sum_{j=1}^m |a_{ij}| \leq M \ \ \ \ \ (2)

for all {1 \leq i \leq n} and

\displaystyle  \sum_{i=1}^n |a_{ij}| \leq M \ \ \ \ \ (3)

for all {1 \leq j \leq m}, then

\displaystyle  \|T\|_{op} = \|A\|_{op} \leq M \ \ \ \ \ (4)

(note that this generalises (the upper bound in) (1)). Indeed, to see (4), it suffices by duality and homogeneity to show that

\displaystyle  |\sum_{i=1}^n (\sum_{j=1}^m a_{ij} x_j) y_i| \leq M

whenever {(x_j)_{j=1}^m} and {(y_i)_{i=1}^n} are sequences with {\sum_{j=1}^m |x_j|^2 = \sum_{i=1}^n |y_i|^2 = 1}; but this easily follows from the arithmetic mean-geometric mean inequality

\displaystyle  |a_{ij} x_j y_i| \leq \frac{1}{2} |a_{ij}| |x_j|^2 + \frac{1}{2} |a_{ij}| |y_i|^2

and (2), (3).
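
As a quick numerical illustration of (4), the following sketch compares a power-iteration estimate of the operator norm with the Schur bound {M}. The test matrices are arbitrary examples with real entries, and the helper names are hypothetical; the power iteration only ever produces a lower estimate of the true norm.

```python
import math

def transpose(A):
    return [list(col) for col in zip(*A)]

def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def op_norm(A, iterations=500):
    """Estimate the largest singular value by power iteration on A^T A."""
    At = transpose(A)
    x = [1.0 + 0.01 * k for k in range(len(A[0]))]
    for _ in range(iterations):
        y = matvec(At, matvec(A, x))
        length = math.sqrt(sum(v * v for v in y))
        if length == 0.0:
            return 0.0
        x = [v / length for v in y]
    return math.sqrt(sum(v * v for v in matvec(A, x)))

def schur_bound(A):
    """Smallest M for which the row bound (2) and the column bound (3) both hold."""
    row_sums = [sum(abs(a) for a in row) for row in A]
    col_sums = [sum(abs(a) for a in col) for col in zip(*A)]
    return max(row_sums + col_sums)

A = [[2.0, -1.0, 0.0],
     [1.0, 3.0, -1.0],
     [0.0, 1.0, 1.0],
     [-1.0, 0.0, 2.0]]
ones = [[1.0] * 3 for _ in range(3)]

print(op_norm(A), schur_bound(A))        # the norm estimate stays below M
print(op_norm(ones), schur_bound(ones))  # equality case: both are 3
```

The all-ones matrix shows that the bound (4) cannot be improved in general.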

Schur’s test (4) (and its many generalisations to weighted situations, or to Lebesgue or Lorentz spaces) is particularly useful for controlling operators in which the role of oscillation (as reflected in the phase of the coefficients {a_{ij}}, as opposed to just their magnitudes {|a_{ij}|}) is not decisive. However, it is of limited use in situations that involve a lot of cancellation. For this, a different test, known as the Cotlar-Stein lemma, is much more flexible and powerful. It can be viewed in a sense as a non-commutative variant of Schur’s test (4) (or of (1)), in which the scalar coefficients {\lambda_i} or {a_{ij}} are replaced by operators instead.

To illustrate the basic flavour of the result, let us return to the bound (1), and now consider instead a block-diagonal matrix

\displaystyle  A = \begin{pmatrix} \Lambda_1 & 0 & \ldots & 0 \\ 0 & \Lambda_2 & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & \Lambda_n \end{pmatrix} \ \ \ \ \ (5)

where each {\Lambda_i} is now an {m_i \times m_i} matrix, and so {A} is an {m \times m} matrix with {m := m_1 + \ldots +m_n}. Then we have

\displaystyle  \|A\|_{op} = \sup_{1 \leq i \leq n} \|\Lambda_i\|_{op}. \ \ \ \ \ (6)

Indeed, the lower bound is trivial (as can be seen by testing {A} on vectors which are supported on the {i^{th}} block of coordinates), while to establish the upper bound, one can make use of the orthogonal decomposition

\displaystyle  {\bf C}^m \equiv \bigoplus_{i=1}^n {\bf C}^{m_i} \ \ \ \ \ (7)

to decompose an arbitrary vector {x \in {\bf C}^m} as

\displaystyle  x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}

with {x_i \in {\bf C}^{m_i}}, in which case we have

\displaystyle  Ax = \begin{pmatrix} \Lambda_1 x_1 \\ \Lambda_2 x_2 \\ \vdots \\ \Lambda_n x_n \end{pmatrix}

and the upper bound in (6) then follows from a simple computation.
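
For completeness, the computation can be spelled out as follows. It is a short sketch that uses only the orthogonality of the block decomposition, with {\| \cdot \|} denoting the Euclidean norm:

```latex
\|Ax\|^2 = \sum_{i=1}^n \|\Lambda_i x_i\|^2
  \leq \sum_{i=1}^n \|\Lambda_i\|_{op}^2 \|x_i\|^2
  \leq \Big( \sup_{1 \leq i \leq n} \|\Lambda_i\|_{op} \Big)^2 \sum_{i=1}^n \|x_i\|^2
  = \Big( \sup_{1 \leq i \leq n} \|\Lambda_i\|_{op} \Big)^2 \|x\|^2.
```

Taking square roots and then the supremum over unit vectors {x} gives the upper bound.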

The operator {T} associated to the matrix {A} in (5) can be viewed as a sum {T = \sum_{i=1}^n T_i}, where each {T_i} corresponds to the {\Lambda_i} block of {A}, in which case (6) can also be written as

\displaystyle  \|T\|_{op} = \sup_{1 \leq i \leq n} \|T_i\|_{op}. \ \ \ \ \ (8)

When {n} is large, this is a significant improvement over the triangle inequality, which merely gives

\displaystyle  \|T\|_{op} \leq \sum_{1 \leq i \leq n} \|T_i\|_{op}.

The reason for this gain can ultimately be traced back to the “orthogonality” of the {T_i}; that they “occupy different columns” and “different rows” of the range and domain of {T}. This is obvious when viewed in the matrix formalism, but can also be described in the more abstract Hilbert space operator formalism via the identities

\displaystyle  T_i^* T_j = 0 \ \ \ \ \ (9)

and

\displaystyle  T_i T_j^* = 0 \ \ \ \ \ (10)

whenever {i \neq j}. (The first identity asserts that the ranges of the {T_i} are orthogonal to each other, and the second asserts that the coranges of the {T_i} (the ranges of the adjoints {T_i^*}) are orthogonal to each other.) By replacing (7) with a more abstract orthogonal decomposition into these ranges and coranges, one can in fact deduce (8) directly from (9) and (10).

The Cotlar-Stein lemma is an extension of this observation to the case where the {T_i} are merely almost orthogonal rather than orthogonal, in a manner somewhat analogous to how Schur’s test (partially) extends (1) to the non-diagonal case. Specifically, we have

Lemma 1 (Cotlar-Stein lemma) Let {T_1,\ldots,T_n: H \rightarrow H'} be a finite sequence of bounded linear operators from one Hilbert space {H} to another {H'}, obeying the bounds

\displaystyle  \sum_{j=1}^n \| T_i T_j^* \|_{op}^{1/2} \leq M \ \ \ \ \ (11)

and

\displaystyle  \sum_{j=1}^n \| T_i^* T_j \|_{op}^{1/2} \leq M \ \ \ \ \ (12)

for all {i=1,\ldots,n} and some {M > 0} (compare with (2), (3)). Then one has

\displaystyle  \| \sum_{i=1}^n T_i \|_{op} \leq M. \ \ \ \ \ (13)

Note from the basic {TT^*} identity

\displaystyle  \|T\|_{op} = \|TT^* \|_{op}^{1/2} = \|T^* T\|_{op}^{1/2} \ \ \ \ \ (14)

that the hypothesis (11) (or (12)) already gives the bound

\displaystyle  \|T_i\|_{op} \leq M \ \ \ \ \ (15)

on each component {T_i} of {T}, which by the triangle inequality gives the inferior bound

\displaystyle  \| \sum_{i=1}^n T_i \|_{op} \leq nM;

the point of the Cotlar-Stein lemma is that the dependence on {n} in this bound is eliminated in (13), which in particular makes the bound suitable for extension to the limit {n \rightarrow \infty} (see Remark 1 below).

The Cotlar-Stein lemma was first established by Cotlar in the special case of commuting self-adjoint operators, and then independently by Cotlar and Stein in full generality, with the proof appearing in a subsequent paper of Knapp and Stein.

The Cotlar-Stein lemma is often useful in controlling operators such as singular integral operators or pseudo-differential operators {T} which “do not mix scales together too much”, in that operators {T} map functions “that oscillate at a given scale {2^{-i}}” to functions that still mostly oscillate at the same scale {2^{-i}}. In that case, one can often split {T} into components {T_i} which essentially capture the scale {2^{-i}} behaviour, and understanding {L^2} boundedness properties of {T} then reduces to establishing the boundedness of the simpler operators {T_i} (and of establishing a sufficient decay in products such as {T_i^* T_j} or {T_i T_j^*} when {i} and {j} are separated from each other). In some cases, one can use Fourier-analytic tools such as Littlewood-Paley projections to generate the {T_i}, but the true power of the Cotlar-Stein lemma comes from situations in which the Fourier transform is not suitable, such as when one has a complicated domain (e.g. a manifold or a non-abelian Lie group), or very rough coefficients (which would then have badly behaved Fourier behaviour). One can then select the decomposition {T = \sum_i T_i} in a fashion that is tailored to the particular operator {T}, and is not necessarily dictated by Fourier-analytic considerations.

Once one is in the almost orthogonal setting, as opposed to the genuinely orthogonal setting, the previous arguments based on orthogonal projection seem to fail completely. Instead, the proof of the Cotlar-Stein lemma proceeds via an elegant application of the tensor power trick (or perhaps more accurately, the power method), in which the operator norm of {T} is understood through the operator norm of a large power of {T} (or more precisely, of its self-adjoint square {TT^*} or {T^* T}). Indeed, from an iteration of (14) we see that for any natural number {N}, one has

\displaystyle  \|T\|_{op}^{2N} = \| (TT^*)^N \|_{op}. \ \ \ \ \ (16)

To estimate the right-hand side, we expand it out and apply the triangle inequality to bound it by

\displaystyle  \sum_{i_1,j_1,\ldots,i_N,j_N \in \{1,\ldots,n\}} \| T_{i_1} T_{j_1}^* T_{i_2} T_{j_2}^* \ldots T_{i_N} T_{j_N}^* \|_{op}. \ \ \ \ \ (17)

Recall that when we applied the triangle inequality directly to {T}, we lost a factor of {n} in the final estimate; it will turn out that we will lose a similar factor here, but this factor will eventually be attenuated into nothingness by the tensor power trick.

To bound (17), we use the basic inequality {\|ST\|_{op} \leq \|S\|_{op} \|T\|_{op}} in two different ways. If we group the product {T_{i_1} T_{j_1}^* T_{i_2} T_{j_2}^* \ldots T_{i_N} T_{j_N}^*} in pairs, we can bound the summand of (17) by

\displaystyle  \| T_{i_1} T_{j_1}^* \|_{op} \ldots \| T_{i_N} T_{j_N}^* \|_{op}.

On the other hand, we can group the product by pairs in another way, to obtain the bound of

\displaystyle  \| T_{i_1} \|_{op} \| T_{j_1}^* T_{i_2} \|_{op} \ldots \| T_{j_{N-1}}^* T_{i_N}\|_{op} \| T_{j_N}^* \|_{op}.

We bound {\| T_{i_1} \|_{op}} and {\| T_{j_N}^* \|_{op}} crudely by {M} using (15). Taking the geometric mean of the above bounds, we can thus bound (17) by

\displaystyle  M \sum_{i_1,j_1,\ldots,i_N,j_N \in \{1,\ldots,n\}} \| T_{i_1} T_{j_1}^* \|_{op}^{1/2} \| T_{j_1}^* T_{i_2} \|_{op}^{1/2} \ldots \| T_{j_{N-1}}^* T_{i_N}\|_{op}^{1/2} \| T_{i_N} T_{j_N}^* \|_{op}^{1/2}.

If we then sum this series first in {j_N}, then in {i_N}, then moving back all the way to {i_1}, using (11) and (12) alternately, we obtain a final bound of

\displaystyle  n M^{2N}

for (16). Taking {2N^{th}} roots, we obtain

\displaystyle  \|T\|_{op} \leq n^{1/2N} M.

Sending {N \rightarrow \infty}, we obtain the claim.
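
To see the lemma in action, here is a small numerical sketch. The family of operators is a hypothetical example chosen for illustration (each {T_i} has a {1} in position {(i,i)} and a small entry `eps` in position {(i,i+1)}, cyclically), and the operator norms are estimated by power iteration rather than computed exactly.

```python
import math

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def op_norm(A, iterations=500):
    """Estimate the largest singular value by power iteration on A^T A."""
    At = transpose(A)
    x = [1.0 + 0.01 * k for k in range(len(A[0]))]
    for _ in range(iterations):
        y = matvec(At, matvec(A, x))
        length = math.sqrt(sum(v * v for v in y))
        if length == 0.0:
            return 0.0
        x = [v / length for v in y]
    return math.sqrt(sum(v * v for v in matvec(A, x)))

def almost_orthogonal_family(n, eps):
    """Hypothetical example: T_i has a 1 at (i, i) and eps at (i, i+1 mod n)."""
    family = []
    for i in range(n):
        T = [[0.0] * n for _ in range(n)]
        T[i][i] = 1.0
        T[i][(i + 1) % n] = eps
        family.append(T)
    return family

def cotlar_stein_constant(family):
    """Smallest M satisfying both (11) and (12) (real matrices, so T^* = T^T)."""
    M = 0.0
    for Ti in family:
        s11 = sum(math.sqrt(op_norm(matmul(Ti, transpose(Tj)))) for Tj in family)
        s12 = sum(math.sqrt(op_norm(matmul(transpose(Ti), Tj))) for Tj in family)
        M = max(M, s11, s12)
    return M

n = 4
family = almost_orthogonal_family(n, 0.1)
total = [[sum(T[r][c] for T in family) for c in range(n)] for r in range(n)]
M = cotlar_stein_constant(family)
triangle = sum(op_norm(T) for T in family)
print(op_norm(total), M, triangle)
```

For this family the norm of the sum is about {1.1}, the Cotlar-Stein constant {M} is about {1.64}, and the triangle inequality only gives about {4.02}, a bound that grows linearly in {n} while {M} does not.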

Remark 1 As observed in a number of places (see e.g. page 318 of Stein’s book, or this paper of Comech), the Cotlar-Stein lemma can be extended to infinite sums {\sum_{i=1}^\infty T_i} (with the obvious changes to the hypotheses (11), (12)). Indeed, one can show that for any {f \in H}, the sum {\sum_{i=1}^\infty T_i f} is unconditionally convergent in {H'} (and furthermore has bounded {2}-variation), and the resulting operator {\sum_{i=1}^\infty T_i} is a bounded linear operator with an operator norm bound of {M}.

Remark 2 If we specialise to the case where all the {T_i} are equal, we see that the bound in the Cotlar-Stein lemma is sharp, at least in this case. Thus we see how the tensor power trick can convert an inefficient argument, such as that obtained using the triangle inequality or crude bounds such as (15), into an efficient one.

Remark 3 One can prove Schur’s test by a similar method. Indeed, starting from the inequality

\displaystyle  \|A\|_{op}^{2N} \leq \hbox{tr}( (AA^*)^N )

(which follows easily from the singular value decomposition), we can bound {\|A\|_{op}^{2N}} by

\displaystyle  \sum_{i_1,\ldots,j_N \in \{1,\ldots,n\}} a_{i_1,j_1} \overline{a_{i_2,j_1}} \ldots a_{i_N,j_N} \overline{a_{i_1,j_N}}.

Estimating the other two terms in the summand by {M}, and then repeatedly summing the indices one at a time as before, we obtain

\displaystyle  \|A\|_{op}^{2N} \leq n M^{2N}

and the claim follows from the tensor power trick as before. On the other hand, in the converse direction, I do not know of any way to prove the Cotlar-Stein lemma that does not basically go through the tensor power argument.
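
The trace inequality used in Remark 3 also gives a simple way to watch the tensor power trick at work numerically: the quantity {\hbox{tr}((AA^*)^N)^{1/2N}} always dominates {\|A\|_{op}}, exceeds it by at most a factor of {n^{1/2N}}, and so converges to it as {N \rightarrow \infty}. The sketch below assumes a real {2 \times 2} example matrix whose operator norm, {1+\sqrt{2}}, can be computed by hand; the function name `trace_root` is hypothetical.

```python
import math

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def trace_root(A, N):
    """Return tr((A A^T)^N)^(1/(2N)), an upper bound for the operator norm of real A."""
    B = matmul(A, transpose(A))
    P = B
    for _ in range(N - 1):
        P = matmul(P, B)
    trace = sum(P[i][i] for i in range(len(P)))
    return trace ** (1.0 / (2 * N))

A = [[1.0, 2.0],
     [0.0, 1.0]]
exact = 1.0 + math.sqrt(2.0)  # largest singular value of A, since A A^T has eigenvalues 3 +- 2 sqrt(2)

for N in (1, 2, 5, 20):
    print(N, trace_root(A, N), exact)
```

The printed values decrease towards the exact norm as {N} grows, which is the factor {n^{1/2N}} being attenuated into nothingness.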

If {f: {\bf R}^d \rightarrow {\bf C}} is a locally integrable function, we define the Hardy-Littlewood maximal function {Mf: {\bf R}^d \rightarrow {\bf C}} by the formula

\displaystyle  Mf(x) := \sup_{r>0} \frac{1}{|B(x,r)|} \int_{B(x,r)} |f(y)|\ dy,

where {B(x,r)} is the ball of radius {r} centred at {x}, and {|E|} denotes the measure of a set {E}. The Hardy-Littlewood maximal inequality asserts that

\displaystyle  |\{ x \in {\bf R}^d: Mf(x) > \lambda \}| \leq \frac{C_d}{\lambda} \|f\|_{L^1({\bf R}^d)} \ \ \ \ \ (1)

for all {f\in L^1({\bf R}^d)}, all {\lambda > 0}, and some constant {C_d > 0} depending only on {d}. By a standard density argument, this implies in particular that we have the Lebesgue differentiation theorem

\displaystyle  \lim_{r \rightarrow 0} \frac{1}{|B(x,r)|} \int_{B(x,r)} f(y)\ dy = f(x)

for all {f \in L^1({\bf R}^d)} and almost every {x \in {\bf R}^d}. See for instance my lecture notes on this topic.
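
For readers who like to experiment, here is a minimal sketch of a discrete one-dimensional analogue of {Mf}. It is an illustration under stated assumptions rather than the continuous operator above: functions live on a finite window of the integers and are treated as zero outside it, balls are the intervals {\{x-r,\ldots,x+r\}}, and the constant {3} in the final check is the usual one-dimensional Vitali constant, assumed here for the discrete setting.

```python
def maximal_function(f):
    """Discrete centred maximal function: max over r >= 0 of the average of |f| on [x-r, x+r]."""
    n = len(f)
    prefix = [0.0]
    for v in f:
        prefix.append(prefix[-1] + abs(v))

    def window_sum(lo, hi):
        # Sum of |f| over lo..hi inclusive, with f treated as zero outside the array.
        lo, hi = max(lo, 0), min(hi, n - 1)
        return prefix[hi + 1] - prefix[lo] if lo <= hi else 0.0

    return [max(window_sum(x - r, x + r) / (2 * r + 1) for r in range(n))
            for x in range(n)]

def level_set_size(values, lam):
    """Number of points where the function exceeds lam."""
    return sum(1 for v in values if v > lam)

# A unit point mass at the centre of the window.
f = [0.0] * 41
f[20] = 1.0
Mf = maximal_function(f)

lam = 0.1
l1_norm = sum(abs(v) for v in f)
print(Mf[20], Mf[23], level_set_size(Mf, lam), 3 * l1_norm / lam)
```

For a point mass the maximal function decays like {1/(2|x-x_0|+1)}, which is not summable; this is the standard reason why one only has a weak-type bound such as (1) at {p=1}.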

By combining the Hardy-Littlewood maximal inequality with the Marcinkiewicz interpolation theorem (and the trivial inequality {\|Mf\|_{L^\infty({\bf R}^d)} \leq \|f\|_{L^\infty({\bf R}^d)}}) we see that

\displaystyle  \|Mf\|_{L^p({\bf R}^d)} \leq C_{d,p} \|f\|_{L^p({\bf R}^d)} \ \ \ \ \ (2)

for all {p > 1} and {f \in L^p({\bf R}^d)}, and some constant {C_{d,p}} depending on {d} and {p}.

The exact dependence of {C_{d,p}} on {d} and {p} is still not completely understood. The standard Vitali-type covering argument used to establish (1) has an exponential dependence on dimension, giving a constant of the form {C_d = C^d} for some absolute constant {C>1}. Inserting this into the Marcinkiewicz theorem, one obtains a constant {C_{d,p}} of the form {C_{d,p} = \frac{C^d}{p-1}} for some {C>1} (and taking {p} bounded away from infinity, for simplicity). The dependence on {p} is about right, but the dependence on {d} should not be exponential.

In 1982, Stein gave an elegant argument (with full details appearing in a subsequent paper of Stein and Strömberg), based on the Calderón-Zygmund method of rotations, to eliminate the dependence on {d}:

Theorem 1 One can take {C_{d,p} = C_p} for each {p>1}, where {C_p} depends only on {p}.

The argument is based on an earlier bound of Stein from 1976 on the spherical maximal function

\displaystyle  M_S f(x) := \sup_{r>0} A_r |f|(x)

where {A_r} are the spherical averaging operators

\displaystyle  A_r f(x) := \int_{S^{d-1}} f(x+r\omega) d\sigma^{d-1}(\omega)

and {d\sigma^{d-1}} is normalised surface measure on the sphere {S^{d-1}}. Because this is an uncountable supremum, and the averaging operators {A_r} do not have good continuity properties in {r}, it is not a priori obvious that {M_S f} is even a measurable function for, say, locally integrable {f}; but we can avoid this technical issue, at least initially, by restricting attention to continuous functions {f}. The Stein maximal theorem for the spherical maximal function then asserts that if {d \geq 3} and {p > \frac{d}{d-1}}, then we have

\displaystyle  \| M_S f \|_{L^p({\bf R}^d)} \leq C_{d,p} \|f\|_{L^p({\bf R}^d)} \ \ \ \ \ (3)

for all (continuous) {f \in L^p({\bf R}^d)}. We will sketch a proof of this theorem below the fold. (Among other things, one can use this bound to show the pointwise convergence {\lim_{r \rightarrow 0} A_r f(x) = f(x)} of the spherical averages for any {f \in L^p({\bf R}^d)} when {d \geq 3} and {p > \frac{d}{d-1}}, although we will not focus on this application here.)

The condition {p > \frac{d}{d-1}} can be seen to be necessary as follows. Take {f} to be any fixed bump function. A brief calculation then shows that {M_S f(x)} decays like {|x|^{1-d}} as {|x| \rightarrow \infty}, and hence {M_S f} does not lie in {L^p({\bf R}^d)} unless {p > \frac{d}{d-1}}. By taking {f} to be a rescaled bump function supported on a small ball, one can show that the condition {p > \frac{d}{d-1}} is necessary even if we replace {{\bf R}^d} with a compact region (and similarly restrict the radius parameter {r} to be bounded). The condition {d \geq 3} however is not quite necessary; the result is also true when {d=2}, but this turned out to be a more difficult result, obtained first by Bourgain, with a simplified proof (based on the local smoothing properties of the wave equation) later given by Mockenhaupt-Seeger-Sogge.

The Hardy-Littlewood maximal operator {Mf}, which involves averaging over balls, is clearly related to the spherical maximal operator, which averages over spheres. Indeed, by using polar co-ordinates, one easily verifies the pointwise inequality

\displaystyle  Mf(x) \leq M_S f(x)

for any (continuous) {f}, which intuitively reflects the fact that one can think of a ball as an average of spheres. Thus, we see that the spherical maximal inequality (3) implies the Hardy-Littlewood maximal inequality (2) with the same constant {C_{d,p}}. (This implication is initially only valid for continuous functions, but one can then extend the inequality (2) to the rest of {L^p({\bf R}^d)} by a standard limiting argument.)
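
The polar co-ordinates computation behind this pointwise inequality can be sketched as follows, assuming the usual definition of {Mf(x)} as the supremum over {r>0} of the averages of {|f|} over the balls {B(x,r)}, and using the fact that the surface area of the unit sphere is {d} times the volume of the unit ball:

```latex
\frac{1}{|B(x,r)|} \int_{B(x,r)} |f(y)|\ dy
  = \frac{d}{r^d} \int_0^r A_s |f|(x)\ s^{d-1}\ ds
  \leq M_S f(x) \cdot \frac{d}{r^d} \int_0^r s^{d-1}\ ds
  = M_S f(x).
```

Taking the supremum over {r > 0} on the left-hand side then gives the claim.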

At first glance, this observation does not immediately establish Theorem 1 for two reasons. Firstly, Stein’s spherical maximal theorem is restricted to the case when {d \geq 3} and {p > \frac{d}{d-1}}; and secondly, the constant {C_{d,p}} in that theorem still depends on dimension {d}. The first objection can be easily disposed of, for if {p>1}, then the hypotheses {d \geq 3} and {p > \frac{d}{d-1}} will automatically be satisfied for {d} sufficiently large (depending on {p}); note that the case when {d} is bounded (with a bound depending on {p}) is already handled by the classical maximal inequality (2).
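
To see concretely how large {d} needs to be, note that {p > \frac{d}{d-1}} is equivalent to {d > \frac{p}{p-1}}. The following small Python sketch (the function names are hypothetical, introduced only for this illustration) finds the first dimension from which the hypotheses of the spherical maximal theorem hold for a given {p > 1}:

```python
from fractions import Fraction

def stein_range_holds(d, p):
    """True when both hypotheses d >= 3 and p > d/(d-1) hold."""
    return d >= 3 and p > Fraction(d, d - 1)

def first_admissible_dimension(p):
    """Smallest d from which the hypotheses hold (d/(d-1) decreases in d)."""
    p = Fraction(p)
    if p <= 1:
        raise ValueError("need p > 1")
    d = 3
    while not stein_range_holds(d, p):
        d += 1
    return d

# Exact rational exponents avoid floating-point edge cases at p = d/(d-1).
thresholds = {p: first_admissible_dimension(p)
              for p in (Fraction(2), Fraction(3, 2), Fraction(11, 10))}
```

The finitely many dimensions below this threshold are the ones that have to be handled by the classical inequality (2).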

We still have to deal with the second objection, namely that the constant {C_{d,p}} in (3) depends on {d}. However, here we can use the method of rotations to show that the constants {C_{d,p}} can be taken to be non-increasing (and hence bounded) in {d}. The idea is to view high-dimensional spheres as an average of rotated low-dimensional spheres. We illustrate this with a demonstration that {C_{d+1,p} \leq C_{d,p}}, in the sense that any bound of the form

\displaystyle  \| M_S f \|_{L^p({\bf R}^d)} \leq A \|f\|_{L^p({\bf R}^d)} \ \ \ \ \ (4)

for the {d}-dimensional spherical maximal function, implies the same bound

\displaystyle  \| M_S f \|_{L^p({\bf R}^{d+1})} \leq A \|f\|_{L^p({\bf R}^{d+1})} \ \ \ \ \ (5)

for the {d+1}-dimensional spherical maximal function, with exactly the same constant {A}. For any direction {\omega_0 \in S^d \subset {\bf R}^{d+1}}, consider the maximal operators

\displaystyle  M_S^{\omega_0} f(x) := \sup_{r>0} A_r^{\omega_0} |f|(x)

for any continuous {f: {\bf R}^{d+1} \rightarrow {\bf C}}, where

\displaystyle  A_r^{\omega_0} f(x) := \int_{S^{d-1}} f( x + r U_{\omega_0} \omega)\ d\sigma^{d-1}(\omega)

where {U_{\omega_0}} is some orthogonal transformation mapping the sphere {S^{d-1}} to the sphere {S^{d-1,\omega_0} := \{ \omega \in S^d: \omega \perp \omega_0\}}; the exact choice of orthogonal transformation {U_{\omega_0}} is irrelevant due to the rotation-invariance of surface measure {d\sigma^{d-1}} on the sphere {S^{d-1}}. A simple application of Fubini’s theorem (after first rotating {\omega_0} to be, say, the standard unit vector {e_d}) using (4) then shows that

\displaystyle  \| M_S^{\omega_0} f \|_{L^p({\bf R}^{d+1})} \leq A \|f\|_{L^p({\bf R}^{d+1})} \ \ \ \ \ (6)

uniformly in {\omega_0}. On the other hand, by viewing the {d}-dimensional sphere {S^d} as an average of the spheres {S^{d-1,\omega_0}}, we have the identity

\displaystyle  A_r f(x) = \int_{S^d} A_r^{\omega_0} f(x)\ d\sigma^d(\omega_0);

indeed, one can deduce this from the uniqueness of Haar measure by noting that both the left-hand side and right-hand side are invariant means of {f} on the sphere {\{ y \in {\bf R}^{d+1}: |y-x|=r\}}. This implies that

\displaystyle  M_S f(x) \leq \int_{S^d} M_S^{\omega_0} f(x)\ d\sigma^d(\omega_0)

and thus by Minkowski’s inequality for integrals, we may deduce (5) from (6).
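
The averaging identity at the heart of this argument can be checked numerically in the simplest case {d=2}, where the sphere {S^2} is viewed as an average of great circles. The sketch below is only an illustration under stated assumptions: it takes {x=0}, {r=1}, uses midpoint quadrature, and the test function {f(y) = e^{y_1}} and all helper names are choices made here rather than anything from the argument above.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(a):
    n = math.sqrt(sum(c * c for c in a))
    return tuple(c / n for c in a)

def sphere_points(n_z, n_phi):
    """Midpoint grid on S^2; a uniform grid in (z, phi) is uniform for surface measure."""
    for i in range(n_z):
        z = -1.0 + 2.0 * (i + 0.5) / n_z
        s = math.sqrt(1.0 - z * z)
        for j in range(n_phi):
            phi = 2.0 * math.pi * (j + 0.5) / n_phi
            yield (s * math.cos(phi), s * math.sin(phi), z)

def sphere_average(f, n_z=100, n_phi=16):
    """Approximate the normalised average of f over S^2."""
    pts = list(sphere_points(n_z, n_phi))
    return sum(f(y) for y in pts) / len(pts)

def great_circle_average(f, omega0, n_t=32):
    """Approximate the average of f over the great circle perpendicular to omega0."""
    helper = (1.0, 0.0, 0.0) if abs(omega0[0]) < 0.9 else (0.0, 1.0, 0.0)
    u = unit(cross(omega0, helper))   # unit vector orthogonal to omega0
    v = cross(omega0, u)              # completes an orthonormal basis of the plane
    total = 0.0
    for k in range(n_t):
        t = 2.0 * math.pi * (k + 0.5) / n_t
        c, s = math.cos(t), math.sin(t)
        total += f(tuple(c * u[i] + s * v[i] for i in range(3)))
    return total / n_t

def sphere_average_by_rotations(f, n_z=100, n_phi=16):
    """Average over omega0 in S^2 of the great-circle averages of f."""
    return sphere_average(lambda w: great_circle_average(f, w), n_z, n_phi)

f = lambda y: math.exp(y[0])
direct = sphere_average(f)
rotated = sphere_average_by_rotations(f)
```

For this particular {f} the exact average over {S^2} is {\sinh(1)}, and the two quadratures should agree with it up to discretisation error.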

Remark 1 Unfortunately, the method of rotations does not work to show that the constant {C_d} for the weak {(1,1)} inequality (1) is independent of dimension, as the weak {L^1} quasinorm {\| \|_{L^{1,\infty}}} is not a genuine norm and does not obey the Minkowski inequality for integrals. Indeed, the question of whether {C_d} in (1) can be taken to be independent of dimension remains open. The best known positive result is due to Stein and Strömberg, who showed that one can take {C_d = Cd} for some absolute constant {C}, by comparing the Hardy-Littlewood maximal function with the heat kernel maximal function

\displaystyle  \sup_{t > 0} e^{t\Delta} |f|(x).

The abstract semigroup maximal inequality of Dunford and Schwartz (discussed for instance in these lecture notes of mine) shows that the heat kernel maximal function is of weak-type {(1,1)} with a constant of {1}, and this can be used, together with a comparison argument, to give the Stein-Strömberg bound. In the converse direction, it is a recent result of Aldaz that if one replaces the balls {B(x,r)} with cubes, then the weak {(1,1)} constant {C_d} must go to infinity as {d \rightarrow \infty}.

Suppose one has a measure space {X = (X, {\mathcal B}, \mu)} and a sequence of operators {T_n: L^p(X) \rightarrow L^p(X)} that are bounded on some {L^p(X)} space, with {1 \leq p < \infty}. Suppose that on some dense subclass of functions {f} in {L^p(X)} (e.g. continuous compactly supported functions, if the space {X} is reasonable), one already knows that {T_n f} converges pointwise almost everywhere to some limit {Tf}, for another bounded operator {T: L^p(X) \rightarrow L^p(X)} (e.g. {T} could be the identity operator). What additional ingredient does one need to pass to the limit and conclude that {T_n f} converges almost everywhere to {Tf} for all {f} in {L^p(X)} (and not just for {f} in a dense subclass)?

One standard way to proceed here is to study the maximal operator

\displaystyle T_* f(x) := \sup_n |T_n f(x)|

and aim to establish a weak-type maximal inequality

\displaystyle \| T_* f \|_{L^{p,\infty}(X)} \leq C \| f \|_{L^p(X)} \ \ \ \ \ (1)

 

for all {f \in L^p(X)} (or all {f} in the dense subclass), and some constant {C}, where {L^{p,\infty}} is the weak {L^p} norm

\displaystyle \|f\|_{L^{p,\infty}(X)} := \sup_{t > 0} t \mu( \{ x \in X: |f(x)| \geq t \})^{1/p}.
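
To get a feel for this quantity, here is a small Python sketch for functions on a finite measure space made of atoms; the function names and the example are hypothetical and only meant as illustration. For such step functions the supremum over {t} is attained at one of the values {|f(x)|}, so it suffices to test those levels.

```python
def weak_lp_quasinorm(values, weights, p):
    """sup_t t * mu({|f| >= t})^(1/p), where f equals values[i] on an atom of mass weights[i]."""
    levels = sorted({abs(v) for v in values if v != 0})
    best = 0.0
    for t in levels:
        mass = sum(w for v, w in zip(values, weights) if abs(v) >= t)
        best = max(best, t * mass ** (1.0 / p))
    return best

def lp_norm(values, weights, p):
    """The usual (strong) L^p norm of the same step function."""
    return sum(w * abs(v) ** p for v, w in zip(values, weights)) ** (1.0 / p)

# f(k) = 1/k on {1,...,50} with counting measure: weak L^1 size stays near 1,
# while the strong L^1 norm is the harmonic sum, which grows logarithmically.
values = [1.0 / k for k in range(1, 51)]
weights = [1.0] * 50
weak = weak_lp_quasinorm(values, weights, 1)
strong = lp_norm(values, weights, 1)
```

By Chebyshev's inequality the weak quantity never exceeds the strong one, and the example shows that it can be much smaller.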

A standard approximation argument using (1) then shows that {T_n f} will now indeed converge to {Tf} pointwise almost everywhere for all {f} in {L^p(X)}, and not just in the dense subclass. See for instance these lecture notes of mine, in which this method is used to deduce the Lebesgue differentiation theorem from the Hardy-Littlewood maximal inequality. This is by now a very standard approach to establishing pointwise almost everywhere convergence theorems, but it is natural to ask whether it is strictly necessary. In particular, is it possible to have a pointwise convergence result {T_n f \rightarrow T f} without being able to obtain a weak-type maximal inequality of the form (1)?

In the case of norm convergence (in which one asks for {T_n f} to converge to {Tf} in the {L^p} norm, rather than in the pointwise almost everywhere sense), the answer is no, thanks to the uniform boundedness principle, which among other things shows that norm convergence is only possible if one has the uniform bound

\displaystyle \sup_n \| T_n f \|_{L^p(X)} \leq C \| f \|_{L^p(X)} \ \ \ \ \ (2)

 

for some {C>0} and all {f \in L^p(X)}; and conversely, if one has the uniform bound, and one has already established norm convergence of {T_n f} to {Tf} on a dense subclass of {L^p(X)}, (2) will extend that norm convergence to all of {L^p(X)}.
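
The extension step in the converse direction is the usual three-term splitting. As a sketch, with {g} denoting an element of the dense subclass that approximates {f} (the symbol {g} and the operator norm {\|T\|} are notation introduced here for illustration):

```latex
\| T_n f - T f \|_{L^p(X)}
  \leq \| T_n (f-g) \|_{L^p(X)} + \| T_n g - T g \|_{L^p(X)} + \| T (g-f) \|_{L^p(X)}
  \leq (C + \|T\|) \| f-g \|_{L^p(X)} + \| T_n g - T g \|_{L^p(X)}.
```

One first chooses {g} to make the first term small, and then takes {n} large to make the second term small.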

Returning to pointwise almost everywhere convergence, the answer in general is “yes”. Consider for instance the rank one operators

\displaystyle T_n f(x) := 1_{[n,n+1]}(x) \int_0^1 f(y)\ dy

from {L^1({\bf R})} to {L^1({\bf R})}. It is clear that {T_n f} converges pointwise almost everywhere to zero as {n \rightarrow \infty} for any {f \in L^1({\bf R})}, and the operators {T_n} are uniformly bounded on {L^1({\bf R})}, but the maximal function {T_*} does not obey (1). One can modify this example in a number of ways to defeat almost any reasonable conjecture that something like (1) should be necessary for pointwise almost everywhere convergence.
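
To spell out why the maximal inequality fails here, one can test against the indicator function of {[0,1]}. This is a sketch which assumes that {n} ranges over the positive integers:

```latex
f = 1_{[0,1]}: \qquad
T_* f(x) = \sup_{n \geq 1} 1_{[n,n+1]}(x) = 1_{[1,\infty)}(x), \qquad
\| T_* f \|_{L^{1,\infty}({\bf R})} \geq 1 \cdot \mu( [1,\infty) ) = \infty, \qquad
\| f \|_{L^1({\bf R})} = 1.
```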

In spite of this, a remarkable observation of Stein, now known as Stein’s maximal principle, asserts that the maximal inequality is necessary to prove pointwise almost everywhere convergence, if one is working on a compact group and the operators {T_n} are translation invariant, and if the exponent {p} is at most {2}:

Theorem 1 (Stein maximal principle) Let {G} be a compact group, let {X} be a homogeneous space of {G} with a finite Haar measure {\mu}, let {1\leq p \leq 2}, and let {T_n: L^p(X) \rightarrow L^p(X)} be a sequence of bounded linear operators commuting with translations, such that {T_n f} converges pointwise almost everywhere for each {f \in L^p(X)}. Then (1) holds.

This is not quite the most general version of the principle; some additional variants and generalisations are given in the original paper of Stein. For instance, one can replace the discrete sequence {T_n} of operators with a continuous family {T_t} without much difficulty. As a typical application of this principle, we see that Carleson’s celebrated theorem that the partial Fourier series {\sum_{n=-N}^N \hat f(n) e^{2\pi i nx}} of an {L^2({\bf R}/{\bf Z})} function {f: {\bf R}/{\bf Z} \rightarrow {\bf C}} converge almost everywhere is in fact equivalent to the estimate

\displaystyle \| \sup_{N>0} |\sum_{n=-N}^N \hat f(n) e^{2\pi i n\cdot}|\|_{L^{2,\infty}({\bf R}/{\bf Z})} \leq C \|f\|_{L^2({\bf R}/{\bf Z})}. \ \ \ \ \ (3)

 

And unsurprisingly, most of the proofs of this (difficult) theorem have proceeded by first establishing (3), and Stein’s maximal principle strongly suggests that this is the optimal way to try to prove this theorem.

On the other hand, the theorem does fail for {p>2}, and almost everywhere convergence results in {L^p} for {p>2} can be proven by other methods than weak {(p,p)} estimates. For instance, the convergence of Bochner-Riesz multipliers in {L^p({\bf R}^n)} for any {n} (and for {p} in the range predicted by the Bochner-Riesz conjecture) was verified for {p > 2} by Carbery, Rubio de Francia, and Vega, despite the fact that the weak {(p,p)} boundedness of even a single Bochner-Riesz multiplier, let alone the maximal function, has still not been completely verified in this range. (Carbery, Rubio de Francia and Vega use weighted {L^2} estimates for the maximal Bochner-Riesz operator, rather than {L^p} type estimates.) For {p \leq 2}, though, Stein’s principle (after localising to a torus) does apply, and pointwise almost everywhere convergence of Bochner-Riesz means is equivalent to the weak {(p,p)} estimate (1).

Stein’s principle is restricted to compact groups (such as the torus {({\bf R}/{\bf Z})^n} or the rotation group {SO(n)}) and their homogeneous spaces (such as the torus {({\bf R}/{\bf Z})^n} again, or the sphere {S^{n-1}}). As stated, the principle fails in the noncompact setting; for instance, in {{\bf R}}, the convolution operators {T_n f := f * 1_{[n,n+1]}} are such that {T_n f} converges pointwise almost everywhere to zero for every {f \in L^1({\bf R})}, but the maximal function is not of weak-type {(1,1)}. However, in many applications on non-compact domains, the {T_n} are “localised” enough that one can transfer from a non-compact setting to a compact setting and then apply Stein’s principle. For instance, Carleson’s theorem on the real line {{\bf R}} is equivalent to Carleson’s theorem on the circle {{\bf R}/{\bf Z}} (due to the localisation of the Dirichlet kernels), which as discussed before is equivalent to the estimate (3) on the circle, which by a scaling argument is equivalent to the analogous estimate on the real line {{\bf R}}.

Stein’s argument from his 1961 paper can be viewed nowadays as an application of the probabilistic method; starting with a sequence of increasingly bad counterexamples to the maximal inequality (1), one randomly combines them together to create a single “infinitely bad” counterexample. To make this idea work, Stein employs two basic ideas:

  1. The random rotations (or random translations) trick. Given a subset {E} of {X} of small but positive measure, one can randomly select about {|G|/|E|} translates {g_i E} of {E} that cover most of {X}.
  2. The random sums trick. Given a collection {f_1,\ldots,f_n: X \rightarrow {\bf C}} of signed functions that may possibly cancel each other in a deterministic sum {\sum_{i=1}^n f_i}, one can perform a random sum {\sum_{i=1}^n \pm f_i} instead to obtain a random function whose magnitude will usually be comparable to the square function {(\sum_{i=1}^n |f_i|^2)^{1/2}}; this can be made rigorous by concentration of measure results, such as Khintchine’s inequality.
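
The random sums trick can be illustrated at a single point {x}, where the values {f_i(x)} are just numbers. The Python sketch below (with hypothetical names, and practical only for a small number of terms since it enumerates all sign patterns) computes the exact first and second moments of a random signed sum and compares them with the square function:

```python
import itertools
import math

def random_sum_moments(coeffs):
    """Exact mean of |S| and of |S|^2 for S = sum(eps_i * a_i) over all sign patterns."""
    n = len(coeffs)
    first = second = 0.0
    for signs in itertools.product((1, -1), repeat=n):
        s = sum(e * a for e, a in zip(signs, coeffs))
        first += abs(s)
        second += abs(s) ** 2
    return first / 2 ** n, second / 2 ** n

coeffs = [1.0, 1.0, 1.0, 1.0]
square_function = math.sqrt(sum(abs(a) ** 2 for a in coeffs))
mean_abs, mean_sq = random_sum_moments(coeffs)
```

The second moment equals the square of the square function exactly (the cross terms cancel on averaging over signs), while the first moment is comparable to the square function itself, as Khintchine’s inequality predicts.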

These ideas have since been used repeatedly in harmonic analysis. For instance, I used the random rotations trick in a recent paper with Jordan Ellenberg and Richard Oberlin on Kakeya-type estimates in finite fields. The random sums trick is by now a standard tool to build various counterexamples to estimates (or to convergence results) in harmonic analysis, for instance being used by Fefferman in his famous paper disproving the boundedness of the ball multiplier on {L^p({\bf R}^n)} for {p \neq 2}, {n \geq 2}. Another use of the random sums trick is to show that Theorem 1 fails once {p>2}; see Stein’s original paper for details.

Another use of the random rotations trick, closely related to Theorem 1, is the Nikishin-Stein factorisation theorem. Here is Stein’s formulation of this theorem:

Theorem 2 (Stein factorisation theorem) Let {G} be a compact group, let {X} be a homogeneous space of {G} with a finite Haar measure {\mu}, let {1\leq p \leq 2} and {q>0}, and let {T: L^p(X) \rightarrow L^q(X)} be a bounded linear operator commuting with translations and obeying the estimate

\displaystyle \|T f \|_{L^q(X)} \leq A \|f\|_{L^p(X)}

for all {f \in L^p(X)} and some {A>0}. Then {T} also maps {L^p(X)} to {L^{p,\infty}(X)}, with

\displaystyle \|T f \|_{L^{p,\infty}(X)} \leq C_{p,q} A \|f\|_{L^p(X)}

for all {f \in L^p(X)}, with {C_{p,q}} depending only on {p, q}.

This result is trivial with {q \geq p}, but becomes useful when {q<p}. In this regime, the translation invariance allows one to freely “upgrade” a strong-type {(p,q)} result to a weak-type {(p,p)} result. In other words, bounded linear operators from {L^p(X)} to {L^q(X)} automatically factor through the inclusion {L^{p,\infty}(X) \subset L^q(X)}, which helps explain the name “factorisation theorem”. Factorisation theory has been developed further by many authors, including Maurey and Pisier.
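
The triviality in the case {q \geq p} can be sketched in one line, using Chebyshev's inequality and then Hölder's inequality on the finite measure space {X}:

```latex
\| Tf \|_{L^{p,\infty}(X)}
  \leq \| Tf \|_{L^p(X)}
  \leq \mu(X)^{\frac{1}{p} - \frac{1}{q}} \| Tf \|_{L^q(X)}
  \leq \mu(X)^{\frac{1}{p} - \frac{1}{q}} A \| f \|_{L^p(X)}.
```

No translation invariance is needed for this; the content of the theorem lies entirely in the regime {q < p}.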

Stein’s factorisation theorem (or more precisely, a variant of it) is useful in the theory of Kakeya and restriction theorems in Euclidean space, as first observed by Bourgain.

In 1970, Nikishin obtained the following generalisation of Stein’s factorisation theorem in which the translation-invariance hypothesis can be dropped, at the cost of excluding a set of small measure:

Theorem 3 (Nikishin-Stein factorisation theorem) Let {X} be a finite measure space, let {1\leq p \leq 2} and {q>0}, and let {T: L^p(X) \rightarrow L^q(X)} be a bounded linear operator obeying the estimate

\displaystyle \|T f \|_{L^q(X)} \leq A \|f\|_{L^p(X)}

for all {f \in L^p(X)} and some {A>0}. Then for any {\epsilon > 0}, there exists a subset {E} of {X} of measure at most {\epsilon} such that

\displaystyle \|T f \|_{L^{p,\infty}(X \backslash E)} \leq C_{p,q,\epsilon} A \|f\|_{L^p(X)} \ \ \ \ \ (4)

 

for all {f \in L^p(X)}, with {C_{p,q,\epsilon}} depending only on {p, q, \epsilon}.

One can recover Theorem 2 from Theorem 3 by an averaging argument to eliminate the exceptional set; we omit the details.

Igor Rodnianski and I have just uploaded to the arXiv our paper “Effective limiting absorption principles, and applications”, submitted to Communications in Mathematical Physics. In this paper we derive limiting absorption principles (of the type discussed in this recent post) for a general class of Schrödinger operators {H = -\Delta + V} on a wide class of manifolds, namely the asymptotically conic manifolds. The precise definition of such manifolds is somewhat technical, but they include as a special case the asymptotically flat manifolds, which in turn include as a further special case the smooth compact perturbations of Euclidean space {{\bf R}^n} (i.e. the smooth Riemannian manifolds that are identical to {{\bf R}^n} outside of a compact set). The potential {V} is assumed to be a short range potential, which roughly speaking means that it decays faster than {1/|x|} as {x \rightarrow \infty}; for several of the applications (particularly at very low energies) we need to in fact assume that {V} is a strongly short range potential, which roughly speaking means that it decays faster than {1/|x|^2}.

To begin with, we make no hypotheses about the topology or geodesic geometry of the manifold {M}; in particular, we allow {M} to be trapping in the sense that it contains geodesic flows that do not escape to infinity, but instead remain trapped in a bounded subset of {M}. We also allow the potential {V} to be signed, which in particular allows bound states (eigenfunctions of negative energy) to be created. For standard technical reasons we restrict attention to dimensions three and higher: {d \geq 3}.

It is well known that such Schrödinger operators {H} are essentially self-adjoint, and their spectrum consists of purely absolutely continuous spectrum on {(0,+\infty)}, together with possibly some eigenvalues at zero and negative energy (and at zero energy and in dimensions three and four, there is also the possibility of resonances which, while not strictly eigenvalues, have a somewhat analogous effect on the dynamics of the Laplacian and related objects, such as resolvents). In particular, the resolvents {R(\lambda \pm i\epsilon) := (H - \lambda \mp i\epsilon)^{-1}} make sense as bounded operators on {L^2(M)} for any {\lambda \in {\bf R}} and {\epsilon > 0}. As discussed in the previous blog post, it is of interest to obtain bounds for the behaviour of these resolvents, as this can then be used via some functional calculus manipulations to obtain control on many other operators and PDE relating to the Schrödinger operator {H}, such as the Helmholtz equation, the time-dependent Schrödinger equation, and the wave equation. In particular, it is of interest to obtain limiting absorption estimates such as

\displaystyle  \| R(\lambda \pm i\epsilon) f \|_{H^{0,-1/2-\sigma}(M)} \leq C(M,V,\lambda,\sigma) \| f \|_{H^{0,1/2+\sigma}(M)} \ \ \ \ \ (1)

for {\lambda \in {\bf R}} (and particularly in the positive energy regime {\lambda>0}), where {\sigma,\epsilon > 0} and {f} is an arbitrary test function. The constant {C(M,V,\lambda,\sigma)} needs to be independent of {\epsilon} for such estimates to be truly useful, but it is also of interest to determine the extent to which these constants depend on {M}, {V}, and {\lambda}. The dependence on {\sigma} is relatively uninteresting and henceforth we will suppress it. In particular, our paper focused to a large extent on quantitative methods that could give effective bounds on {C(M,V,\lambda)} in terms of quantities such as the magnitude {A} of the potential {V} in a suitable norm.
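
For comparison, self-adjointness on its own gives only the following standard bound from the spectral theorem, which degenerates as {\epsilon \rightarrow 0}. It is recorded here as context, not as a statement taken from the paper:

```latex
\| R(\lambda \pm i\epsilon) f \|_{L^2(M)} \leq \frac{1}{\epsilon} \| f \|_{L^2(M)}.
```

The point of an estimate such as (1) is that passing to weighted norms allows the {1/\epsilon} factor to be replaced by a quantity independent of {\epsilon}.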

It turns out to be convenient to distinguish between three regimes:

  • The high-energy regime {\lambda \gg 1};
  • The medium-energy regime {\lambda \sim 1}; and
  • The low-energy regime {0 < \lambda \ll 1}.

Our methods actually apply more or less uniformly to all three regimes, but the nature of the conclusions is quite different in each of the three regimes.

The high-energy regime {\lambda \gg 1} was essentially worked out by Burq, although we give an independent treatment of Burq’s results here. In this regime it turns out that we have an unconditional estimate of the form (1) with a constant of the shape

\displaystyle  C(M,V,\lambda) = C(M,A) e^{C(M,A) \sqrt{\lambda}}

where {C(M,A)} is a constant that depends only on {M} and on a parameter {A} that controls the size of the potential {V}. This constant, while exponentially growing, is still finite, which among other things is enough to rule out the possibility that {H} contains eigenfunctions (i.e. point spectrum) embedded in the high-energy portion of the spectrum. As is well known, if {M} contains a certain type of trapped geodesic (in particular those arising from positively curved portions of the manifold, such as the equator of a sphere), then it is possible to construct pseudomodes {f} that show that this sort of exponential growth is necessary. On the other hand, if we make the non-trapping hypothesis that all geodesics in {M} escape to infinity, then we can obtain a much stronger high-energy limiting absorption estimate, namely

\displaystyle  C(M,V,\lambda) = C(M,A) \lambda^{-1/2}.

The exponent {1/2} here is closely related to the standard fact that on non-trapping manifolds, there is a local smoothing effect for the time-dependent Schrödinger equation that gains half a derivative of regularity (cf. previous blog post). In the high-energy regime, the dynamics are well-approximated by semi-classical methods, and in particular one can use tools such as the positive commutator method and pseudo-differential calculus to obtain the desired estimates. In the trapping case one also needs the standard technique of Carleman inequalities to control the compact (and possibly trapping) core of the manifold, in particular the delicate two-weight Carleman inequalities of Burq.

In the medium and low energy regimes one needs to work harder. In the medium energy regime {\lambda \sim 1}, we were able to obtain a uniform bound

\displaystyle  C(M,V,\lambda) \leq C(M,A)

for all asymptotically conic manifolds (trapping or not) and all short-range potentials. To establish this bound, we have to supplement the existing tools of the positive commutator method and Carleman inequalities with an additional ODE-type analysis of various energies of the solution {u = R(\lambda \pm i\epsilon) f} to a Helmholtz equation on large spheres, as will be discussed in more detail below the fold.

The methods also extend to the low-energy regime {0 < \lambda \ll 1}. Here, the bounds become somewhat interesting, with a subtle distinction between effective estimates that are uniform over all potentials {V} which are bounded in a suitable sense by a parameter {A} (e.g. obeying {|V(x)| \leq A \langle x \rangle^{-2-2\sigma}} for all {x}), and ineffective estimates that exploit qualitative properties of {V} (such as the absence of eigenfunctions or resonances at zero) and are thus not uniform over {V}. On the effective side, and for potentials that are strongly short range (at least at local scales {|x| = O(\lambda^{-1/2})}; one can tolerate merely short-range behaviour at more global scales, but this is a technicality that we will not discuss further here) we were able to obtain a polynomial bound of the form

\displaystyle  C(M,V,\lambda) \leq C(M,A) \lambda^{-C(M,A)}

that blew up at a large polynomial rate at the origin. Furthermore, by carefully designing a sequence of potentials {V} that induce near-eigenfunctions that resemble two different Bessel functions of the radial variable glued together, we are able to show that this type of polynomial bound is sharp in the following sense: given any constant {C > 0}, there exists a sequence {V_n} of potentials on Euclidean space {{\bf R}^d} uniformly bounded by {A}, and a sequence {\lambda_n} of energies going to zero, such that

\displaystyle  C({\bf R}^d,V_n,\lambda_n) \geq \lambda_n^{-C}.

This shows that if one wants bounds that are uniform in the potential {V}, then arbitrary polynomial blowup is necessary.

Interestingly, though, if we fix the potential {V}, and then ask for bounds that are not necessarily uniform in {V}, then one can do better, as was already observed in a classic paper of Jensen and Kato concerning power series expansions of the resolvent near the origin. In particular, if we make the spectral assumption that {V} has no eigenfunctions or resonances at zero, then an argument (based on (a variant of) the Fredholm alternative, which as discussed in this recent blog post gives ineffective bounds) gives a bound of the form

\displaystyle  C(M,V,\lambda) \leq C(M,V) \lambda^{-1/2}

in the low-energy regime (but note carefully here that the constant {C(M,V)} on the right-hand side depends on the potential {V} itself, and not merely on the parameter {A} that upper bounds it). Even if there are eigenvalues or resonances, it turns out that one can still obtain a similar bound but with {\lambda^{-3/2}} in place of {\lambda^{-1/2}}. This limited blowup as {\lambda \rightarrow 0} is in sharp contrast to the arbitrarily large polynomial blowup rate that can occur if one demands uniform bounds. (This particular subtlety between uniform and non-uniform estimates confused us, by the way, for several weeks; for a long time we thought that we had somehow found a contradiction between our results and the results of Jensen and Kato.)

As applications of our limiting absorption estimates, we give local smoothing and dispersive estimates for solutions (as well as the closely related RAGE type theorems) to the time-dependent Schrödinger and wave equations, and also reprove standard facts about the spectrum of Schrödinger operators in this setting.

In a few weeks, Princeton University will host a conference in Analysis and Applications in honour of the 80th birthday of Elias Stein (though, technically, Eli’s 80th birthday was actually in January). As one of Eli’s students, I was originally scheduled to be one of the speakers at this conference; but unfortunately, for family reasons I will be unable to attend. In lieu of speaking at this conference, I have decided to devote some space on this blog for this month to present some classic results of Eli from his many decades of work in harmonic analysis, ergodic theory, several complex variables, and related topics. My choice of selections here will be a personal and idiosyncratic one; the results I present are not necessarily the “best” or “deepest” of his results, but are ones that I find particularly elegant and appealing. (There will also inevitably be some overlap here with Charlie Fefferman’s article “Selected theorems by Eli Stein”, which not coincidentally was written for Stein’s 60th birthday conference in 1991.)

In this post I would like to describe one of Eli Stein’s very first results that is still used extremely widely today, namely his interpolation theorem from 1956 (and its refinement, the Fefferman-Stein interpolation theorem from 1972). This is a deceptively innocuous, yet remarkably powerful, generalisation of the classic Riesz-Thorin interpolation theorem which uses methods from complex analysis (and in particular, the Lindelöf theorem or the Phragmén-Lindelöf principle) to show that if a linear operator {T: L^{p_0}(X) + L^{p_1}(X) \rightarrow L^{q_0}(Y) + L^{q_1}(Y)} from one ({\sigma}-finite) measure space {X = (X,{\mathcal X},\mu)} to another {Y = (Y, {\mathcal Y}, \nu)} obeyed the estimates

\displaystyle  \| Tf \|_{L^{q_0}(Y)} \leq B_0 \|f\|_{L^{p_0}(X)} \ \ \ \ \ (1)

for all {f \in L^{p_0}(X)} and

\displaystyle  \| Tf \|_{L^{q_1}(Y)} \leq B_1 \|f\|_{L^{p_1}(X)} \ \ \ \ \ (2)

for all {f \in L^{p_1}(X)}, where {1 \leq p_0,p_1,q_0,q_1 \leq \infty} and {B_0,B_1 > 0}, then one automatically also has the interpolated estimates

\displaystyle  \| Tf \|_{L^{q_\theta}(Y)} \leq B_\theta \|f\|_{L^{p_\theta}(X)} \ \ \ \ \ (3)

for all {f \in L^{p_\theta}(X)} and {0 \leq \theta \leq 1}, where the quantities {p_\theta, q_\theta, B_\theta} are defined by the formulae

\displaystyle  \frac{1}{p_\theta} = \frac{1-\theta}{p_0} + \frac{\theta}{p_1}

\displaystyle  \frac{1}{q_\theta} = \frac{1-\theta}{q_0} + \frac{\theta}{q_1}

\displaystyle  B_\theta = B_0^{1-\theta} B_1^\theta.
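
As a quick sanity check on these formulae, the following Python sketch (the function name is hypothetical) computes the interpolated exponents and constant, treating an infinite exponent as having reciprocal zero. The example endpoints are those of the Hausdorff-Young inequality: {L^1 \rightarrow L^\infty} and {L^2 \rightarrow L^2}, both with constant {1}.

```python
import math

def interpolate(p0, p1, q0, q1, b0, b1, theta):
    """Return (p_theta, q_theta, B_theta) from the Riesz-Thorin formulae."""
    inv_p = (1 - theta) / p0 + theta / p1   # 1/inf evaluates to 0.0 in Python
    inv_q = (1 - theta) / q0 + theta / q1
    p_theta = math.inf if inv_p == 0 else 1 / inv_p
    q_theta = math.inf if inv_q == 0 else 1 / inv_q
    b_theta = b0 ** (1 - theta) * b1 ** theta
    return p_theta, q_theta, b_theta

# Halfway between the endpoints (1, infinity) and (2, 2).
p_half, q_half, b_half = interpolate(1, 2, math.inf, 2, 1, 1, 0.5)
```

In this example the interpolated exponents are dual to each other, {\frac{1}{p_\theta} + \frac{1}{q_\theta} = 1}, for every {\theta}.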

The Riesz-Thorin theorem is already quite useful (it gives, for instance, by far the quickest proof of the Hausdorff-Young inequality for the Fourier transform, to name just one application), but it requires the same linear operator {T} to appear in (1), (2), and (3). Eli Stein realised, though, that due to the complex-analytic nature of the proof of the Riesz-Thorin theorem, it was possible to allow different linear operators to appear in (1), (2), (3), so long as the dependence was analytic. A bit more precisely: if one had a family {T_z} of operators which depended in an analytic manner on a complex variable {z} in the strip {\{ z \in {\bf C}: 0 \leq \hbox{Re}(z) \leq 1 \}} (thus, for any test functions {f, g}, the inner product {\langle T_z f, g \rangle} would be analytic in {z}) which obeyed some mild regularity assumptions (which are slightly technical and are omitted here), and one had the estimates

\displaystyle  \| T_{0+it} f \|_{L^{q_0}(Y)} \leq C_t \|f\|_{L^{p_0}(X)}

and

\displaystyle  \| T_{1+it} f \|_{L^{q_1}(Y)} \leq C_t\|f\|_{L^{p_1}(X)}

for all {t \in {\bf R}} and some quantities {C_t} that grew at most exponentially in {t} (actually, any growth rate significantly slower than the double-exponential {e^{\exp(\pi |t|)}} would suffice here), then one also has the interpolated estimates

\displaystyle  \| T_\theta f \|_{L^{q_\theta}(Y)} \leq C' \|f\|_{L^{p_\theta}(X)}

for all {0 \leq \theta \leq 1} and a constant {C'} depending only on {\theta}, {p_0, p_1, q_0, q_1}, and the growth bounds {C_t}.

I’ve just finished writing the first draft of my third book coming out of the 2010 blog posts, namely “Higher order Fourier analysis”, which was based primarily on my graduate course in the topic, though it also contains material from some additional posts related to linear and higher order Fourier analysis on the blog. It is available online here. As usual, comments and corrections are welcome. There is also a stub page for the book, which at present does not contain much more than the above link.

 

As I have done in the last three years, I am spending some time at the beginning of this year converting some of my posts on this blog into book format.  This time round, the situation is a bit different because the majority of mathematical posts last year came from three courses I have taught: random matrices, higher-order Fourier analysis, and measure theory.  These topics are sufficiently unrelated to each other, and to the other mathematical posts from 2010, that I am thinking of having as many as four distinct books this time around, though my plans are not yet definite in this regard.

In any event, I have started the process by converting the measure theory notes to book form, a draft copy of which is now available here. I have also started up a stub of a book page for this text, though it has little content at present beyond that link. I will be continuing to work on it in parallel with the rest of the conversion process. As always, any comments and corrections are very welcome.

Let {G} be a compact group. (Throughout this post, all topological groups are assumed to be Hausdorff.) Then {G} has a number of unitary representations, i.e. continuous homomorphisms {\rho: G \rightarrow U(H)} to the group {U(H)} of unitary operators on a Hilbert space {H}, equipped with the strong operator topology. In particular, one has the left-regular representation {\tau: G \rightarrow U(L^2(G))}, where we equip {G} with its normalised Haar measure {\mu} (and the Borel {\sigma}-algebra) to form the Hilbert space {L^2(G)}, and {\tau} is the translation operation

\displaystyle  \tau(g) f(x) := f(g^{-1} x).

We call two unitary representations {\rho: G \rightarrow U(H)} and {\rho': G \rightarrow U(H')} isomorphic if one has {\rho'(g) = U \rho(g) U^{-1}} for all {g \in G} and some unitary transformation {U: H \rightarrow H'}, in which case we write {\rho \equiv \rho'}.

Given two unitary representations {\rho: G \rightarrow U(H)} and {\rho': G \rightarrow U(H')}, one can form their direct sum {\rho \oplus \rho': G \rightarrow U(H \oplus H')} in the obvious manner: {(\rho \oplus \rho')(g)(v,v') := (\rho(g) v, \rho'(g) v')}. Conversely, if a unitary representation {\rho: G \rightarrow U(H)} has a closed invariant subspace {V \subset H} of {H} (thus {\rho(g) V \subset V} for all {g \in G}), then the orthogonal complement {V^\perp} is also invariant, leading to a decomposition {\rho \equiv \rho\downharpoonright_V \oplus \rho\downharpoonright_{V^\perp}} of {\rho} into the subrepresentations {\rho\downharpoonright_V: G \rightarrow U(V)}, {\rho\downharpoonright_{V^\perp}: G \rightarrow U(V^\perp)}. Accordingly, we will call a unitary representation {\rho: G \rightarrow U(H)} irreducible if {H} is nontrivial (i.e. {H \neq \{0\}}) and there are no nontrivial invariant subspaces (i.e. no invariant subspaces other than {\{0\}} and {H}); the irreducible representations play a role in the subject analogous to that of prime numbers in multiplicative number theory. By the principle of infinite descent, every finite-dimensional unitary representation is then expressible (perhaps non-uniquely) as the direct sum of irreducible representations.

The Peter-Weyl theorem asserts, among other things, that the same claim is true for the regular representation:

Theorem 1 (Peter-Weyl theorem) Let {G} be a compact group. Then the regular representation {\tau: G \rightarrow U(L^2(G))} is isomorphic to the direct sum of irreducible representations. In fact, one has {\tau \equiv \bigoplus_{\xi \in \hat G} \rho_\xi^{\oplus \hbox{dim}(V_\xi)}}, where {(\rho_\xi)_{\xi \in \hat G}} is an enumeration of the irreducible finite-dimensional unitary representations {\rho_\xi: G \rightarrow U(V_\xi)} of {G} (up to isomorphism). (It is not difficult to see that such an enumeration exists.)

In the case when {G} is abelian, the Peter-Weyl theorem is a consequence of the Plancherel theorem; in that case, the irreducible representations are all one-dimensional, and are thus indexed by the space {\hat G} of characters {\xi: G \rightarrow {\bf R}/{\bf Z}} (i.e. continuous homomorphisms into the unit circle {{\bf R}/{\bf Z}}), known as the Pontryagin dual of {G}. (See for instance my lecture notes on the Fourier transform.) Conversely, the Peter-Weyl theorem can be used to deduce the Plancherel theorem for compact groups, as well as other basic results in Fourier analysis on these groups, such as the Fourier inversion formula.
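
To illustrate the abelian case with the simplest example, take {G = {\bf R}/{\bf Z}}, written additively, so that {g^{-1} x} becomes {x - g}. (The notation {e_n} below is introduced only for this illustration.) The characters are {\xi_n: x \mapsto nx \hbox{ mod } 1} for {n \in {\bf Z}}, and the associated exponentials {e_n(x) := e^{2\pi i n x}} obey

\displaystyle  \tau(g) e_n(x) = e_n(x - g) = e^{-2\pi i n g} e_n(x).

Thus each line {{\bf C} e_n} is a one-dimensional invariant subspace of {L^2({\bf R}/{\bf Z})}, on which {\tau(g)} acts as multiplication by the phase {e^{-2\pi i n g}}. The completeness of the Fourier basis (i.e. the Plancherel theorem for Fourier series) then gives the orthogonal decomposition

\displaystyle  L^2({\bf R}/{\bf Z}) = \bigoplus_{n \in {\bf Z}} {\bf C} e_n,

in which every irreducible representation appears exactly once, consistent with the multiplicity {\hbox{dim}(V_\xi) = 1} predicted by Theorem 1.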

Because the regular representation is faithful (i.e. injective), a corollary of the Peter-Weyl theorem (and a classical theorem of Cartan) is that every compact group can be expressed as the inverse limit of Lie groups, leading to a solution to Hilbert’s fifth problem in the compact case. Furthermore, the compact case is then an important building block in the more general theory surrounding Hilbert’s fifth problem, and in particular a result of Yamabe that any locally compact group contains an open subgroup that is the inverse limit of Lie groups.

I’ve recently become interested in the theory around Hilbert’s fifth problem, due to the existence of a correspondence principle between locally compact groups and approximate groups, which play a fundamental role in arithmetic combinatorics. I hope to elaborate upon this correspondence in a subsequent post, but I will mention that versions of this principle play a crucial role in Gromov’s proof of his theorem on groups of polynomial growth (discussed previously on this blog), and in a more recent paper of Hrushovski on approximate groups (also discussed previously). It is also analogous in many ways to the more well-known Furstenberg correspondence principle between ergodic theory and combinatorics (also discussed previously).

Because of the above motivation, I have decided to write some notes on how the Peter-Weyl theorem is proven. This is utterly standard stuff in abstract harmonic analysis; these notes are primarily for my own benefit, but perhaps they may be of interest to some readers also.

