
Just a short note that the memorial article “Analysis and applications: The mathematical work of Elias Stein” has just been published in the Bulletin of the American Mathematical Society.  This article was a collective effort led by Charlie Fefferman, Alex Ionescu, Steve Wainger and myself to describe the various mathematical contributions of Elias Stein, who passed away in December 2018; it also features contributions from Loredana Lanzani, Akos Magyar, Mariusz Mirek, Alexander Nagel, Duong Phong, Lillian Pierce, Fulvio Ricci, Christopher Sogge, and Brian Street.  (My contribution was mostly focused on Stein’s contribution to restriction theory.)

I was deeply saddened to learn that Elias Stein died yesterday, aged 87.

I have talked about some of Eli’s older mathematical work in these blog posts.  He continued to be quite active mathematically in recent years, for instance finishing six papers (with various co-authors including Jean Bourgain, Mariusz Mirek, Błażej Wróbel, and Pavel Zorin-Kranich) in just this year alone.  I last met him at Wrocław, Poland last September for a conference in his honour; he was in good health (and good spirits) then.   Here is a picture of Eli together with several of his students (including myself) who were at that meeting (taken from the conference web site):

Eli was an amazingly effective advisor; throughout my graduate studies I think he never had fewer than five graduate students, and there was often a line outside his door when he was meeting with students such as myself.   (The Mathematics Genealogy Project lists 52 students of Eli, but if anything this is an under-estimate.)  My weekly meetings with Eli would tend to go something like this: I would report on all the many different things I had tried over the past week, without much success, to solve my current research problem; Eli would listen patiently to everything I said, concentrate for a moment, and then go over to his filing cabinet and fish out a preprint to hand to me, saying “I think the authors in this paper encountered similar problems and resolved it using Method X”.  I would then go back to my office and read the preprint, and indeed they had faced something similar and I could often adapt the techniques there to resolve my immediate obstacles (only to encounter further ones for the next week, but that’s the way research tends to go, especially as a graduate student).  Amongst other things, these meetings impressed upon me the value of mathematical experience, by being able to make more key progress on a problem in a handful of minutes than I was able to accomplish in a whole week.  (There is a well known story about the famous engineer Charles Steinmetz fixing a broken piece of machinery by making a chalk mark; my meetings with Eli often had a similar feel to them.)

Eli’s lectures were always masterpieces of clarity.  In one hour, he would set up a theorem, motivate it, explain the strategy, and execute it flawlessly; even after twenty years of teaching my own classes, I have yet to figure out his secret of somehow always being able to arrive at the natural finale of a mathematical presentation at the end of each hour without having to improvise at least a little bit halfway during the lecture.  The clear and self-contained nature of his lectures (and his many books) was a large reason why I decided to specialise as a graduate student in harmonic analysis (though I would eventually return to other interests, such as analytic number theory, many years after my graduate studies).

Looking back at my time with Eli, I now realise that he was extraordinarily patient and understanding with the brash and naive teenager he had to meet with every week.  A key turning point in my own career came after my oral qualifying exams, in which I very nearly failed due to my overconfidence and lack of preparation, particularly in my chosen specialty of harmonic analysis.  After the exam, he sat down with me and told me, as gently and diplomatically as possible, that my performance was a disappointment, and that I seriously needed to solidify my mathematical knowledge.  This turned out to be exactly what I needed to hear; I got motivated to actually work properly so as not to disappoint my advisor again.

So many of us in the field of harmonic analysis were connected to Eli in one way or another; the field always felt to me like a large extended family, with Eli as one of the patriarchs.  He will be greatly missed.

[UPDATE: Here is Princeton’s obituary for Elias Stein.]

A basic problem in harmonic analysis (as well as in linear algebra, random matrix theory, and high-dimensional geometry) is to estimate the operator norm {\|T\|_{op}} of a linear map {T: H \rightarrow H'} between two Hilbert spaces, which we will take to be complex for the sake of discussion. Even the finite-dimensional case {T: {\bf C}^m \rightarrow {\bf C}^n} is of interest, as this operator norm is the same as the largest singular value {\sigma_1(A)} of the {n \times m} matrix {A} associated to {T}.

In general, this operator norm is hard to compute precisely, except in special cases. One such special case is that of a diagonal operator, such as that associated to an {n \times n} diagonal matrix {D = \hbox{diag}(\lambda_1,\ldots,\lambda_n)}. In this case, the operator norm is simply the supremum norm of the diagonal coefficients:

\displaystyle  \|D\|_{op} = \sup_{1 \leq i \leq n} |\lambda_i|. \ \ \ \ \ (1)

A variant of (1) is Schur’s test, which for simplicity we will phrase in the setting of finite-dimensional operators {T: {\bf C}^m \rightarrow {\bf C}^n} given by a matrix {A = (a_{ij})_{1 \leq i \leq n; 1 \leq j \leq m}} via the usual formula

\displaystyle  T (x_j)_{j=1}^m := ( \sum_{j=1}^m a_{ij} x_j )_{i=1}^n.

A simple version of this test is as follows: if all the absolute row sums and column sums of {A} are bounded by some constant {M}, thus

\displaystyle  \sum_{j=1}^m |a_{ij}| \leq M \ \ \ \ \ (2)

for all {1 \leq i \leq n} and

\displaystyle  \sum_{i=1}^n |a_{ij}| \leq M \ \ \ \ \ (3)

for all {1 \leq j \leq m}, then

\displaystyle  \|T\|_{op} = \|A\|_{op} \leq M \ \ \ \ \ (4)

(note that this generalises (the upper bound in) (1).) Indeed, to see (4), it suffices by duality and homogeneity to show that

\displaystyle  |\sum_{i=1}^n (\sum_{j=1}^m a_{ij} x_j) y_i| \leq M

whenever {(x_j)_{j=1}^m} and {(y_i)_{i=1}^n} are sequences with {\sum_{j=1}^m |x_j|^2 = \sum_{i=1}^n |y_i|^2 = 1}; but this easily follows from the arithmetic mean-geometric mean inequality

\displaystyle  |a_{ij} x_j y_i| \leq \frac{1}{2} |a_{ij}| |x_j|^2 + \frac{1}{2} |a_{ij}| |y_i|^2

and (2), (3).
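As a rough numerical sanity check of (4) (a sketch only, not part of the argument), one can compare a power-iteration estimate of {\|A\|_{op}} with the smallest constant {M} obeying (2) and (3). The helper names `op_norm_estimate` and `schur_constant`, and the random test matrix, are illustrative choices rather than anything from the text.

```python
import math
import random

def mat_vec(a, x):
    """Return A x for a matrix `a` stored as a list of rows."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def adj_vec(a, y):
    """Return A^* y, where A^* is the conjugate transpose of A."""
    return [sum(a[i][j].conjugate() * y[i] for i in range(len(a)))
            for j in range(len(a[0]))]

def norm(v):
    return math.sqrt(sum(abs(t) ** 2 for t in v))

def op_norm_estimate(a, iters=300, seed=0):
    """Power iteration on A^* A.  The returned value |A x| for a unit vector x
    cannot exceed the operator norm (up to rounding) and usually converges to it."""
    rng = random.Random(seed)
    x = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in a[0]]
    for _ in range(iters):
        length = norm(x)
        x = adj_vec(a, mat_vec(a, [t / length for t in x]))
    length = norm(x)
    return norm(mat_vec(a, [t / length for t in x]))

def schur_constant(a):
    """Smallest M for which the row sums (2) and column sums (3) are at most M."""
    rows = max(sum(abs(t) for t in row) for row in a)
    cols = max(sum(abs(row[j]) for row in a) for j in range(len(a[0])))
    return max(rows, cols)

rng = random.Random(1)
A = [[complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(6)]
     for _ in range(4)]           # a random 4 x 6 complex matrix
D = [[1, 0, 0], [0, -3, 0], [0, 0, 2]]   # diagonal case, where (4) is sharp

print(op_norm_estimate(A), "<=", schur_constant(A))
print(op_norm_estimate(D), "<=", schur_constant(D))
```

For the diagonal matrix the two quantities should agree, reflecting (1); for a generic matrix with oscillating entries one expects a visible gap.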

Schur’s test (4) (and its many generalisations to weighted situations, or to Lebesgue or Lorentz spaces) is particularly useful for controlling operators in which the role of oscillation (as reflected in the phase of the coefficients {a_{ij}}, as opposed to just their magnitudes {|a_{ij}|}) is not decisive. However, it is of limited use in situations that involve a lot of cancellation. For this, a different test, known as the Cotlar-Stein lemma, is much more flexible and powerful. It can be viewed in a sense as a non-commutative variant of Schur’s test (4) (or of (1)), in which the scalar coefficients {\lambda_i} or {a_{ij}} are replaced by operators instead.

To illustrate the basic flavour of the result, let us return to the bound (1), and now consider instead a block-diagonal matrix

\displaystyle  A = \begin{pmatrix} \Lambda_1 & 0 & \ldots & 0 \\ 0 & \Lambda_2 & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & \Lambda_n \end{pmatrix} \ \ \ \ \ (5)

where each {\Lambda_i} is now a {m_i \times m_i} matrix, and so {A} is an {m \times m} matrix with {m := m_1 + \ldots +m_n}. Then we have

\displaystyle  \|A\|_{op} = \sup_{1 \leq i \leq n} \|\Lambda_i\|_{op}. \ \ \ \ \ (6)

Indeed, the lower bound is trivial (as can be seen by testing {A} on vectors which are supported on the {i^{th}} block of coordinates), while to establish the upper bound, one can make use of the orthogonal decomposition

\displaystyle  {\bf C}^m \equiv \bigoplus_{i=1}^n {\bf C}^{m_i} \ \ \ \ \ (7)

to decompose an arbitrary vector {x \in {\bf C}^m} as

\displaystyle  x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}

with {x_i \in {\bf C}^{m_i}}, in which case we have

\displaystyle  Ax = \begin{pmatrix} \Lambda_1 x_1 \\ \Lambda_2 x_2 \\ \vdots \\ \Lambda_n x_n \end{pmatrix}

and the upper bound in (6) then follows from a simple computation.
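One way to spell out this computation (a sketch, using only the orthogonality of the decomposition (7) and the definition of the operator norm) is as follows:

```latex
\|Ax\|^2 = \sum_{i=1}^n \|\Lambda_i x_i\|^2
  \leq \sum_{i=1}^n \|\Lambda_i\|_{op}^2 \|x_i\|^2
  \leq \Big( \sup_{1 \leq i \leq n} \|\Lambda_i\|_{op} \Big)^2 \sum_{i=1}^n \|x_i\|^2
  = \Big( \sup_{1 \leq i \leq n} \|\Lambda_i\|_{op} \Big)^2 \|x\|^2 .
```

The first and last equalities are Pythagoras' theorem applied to the block decompositions of {Ax} and {x} respectively.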

The operator {T} associated to the matrix {A} in (5) can be viewed as a sum {T = \sum_{i=1}^n T_i}, where each {T_i} corresponds to the {\Lambda_i} block of {A}, in which case (6) can also be written as

\displaystyle  \|T\|_{op} = \sup_{1 \leq i \leq n} \|T_i\|_{op}. \ \ \ \ \ (8)

When {n} is large, this is a significant improvement over the triangle inequality, which merely gives

\displaystyle  \|T\|_{op} \leq \sum_{1 \leq i \leq n} \|T_i\|_{op}.

The reason for this gain can ultimately be traced back to the “orthogonality” of the {T_i}; that they “occupy different columns” and “different rows” of the range and domain of {T}. This is obvious when viewed in the matrix formalism, but can also be described in the more abstract Hilbert space operator formalism via the identities

\displaystyle  T_i^* T_j = 0 \ \ \ \ \ (9)


\displaystyle  T_i T_j^* = 0 \ \ \ \ \ (10)

whenever {i \neq j}. (The first identity asserts that the ranges of the {T_i} are orthogonal to each other, and the second asserts that the coranges of the {T_i} (the ranges of the adjoints {T_i^*}) are orthogonal to each other.) By replacing (7) with a more abstract orthogonal decomposition into these ranges and coranges, one can in fact deduce (8) directly from (9) and (10).

The Cotlar-Stein lemma is an extension of this observation to the case where the {T_i} are merely almost orthogonal rather than orthogonal, in a manner somewhat analogous to how Schur’s test (partially) extends (1) to the non-diagonal case. Specifically, we have

Lemma 1 (Cotlar-Stein lemma) Let {T_1,\ldots,T_n: H \rightarrow H'} be a finite sequence of bounded linear operators from one Hilbert space {H} to another {H'}, obeying the bounds

\displaystyle  \sum_{j=1}^n \| T_i T_j^* \|_{op}^{1/2} \leq M \ \ \ \ \ (11)


\displaystyle  \sum_{j=1}^n \| T_i^* T_j \|_{op}^{1/2} \leq M \ \ \ \ \ (12)

for all {i=1,\ldots,n} and some {M > 0} (compare with (2), (3)). Then one has

\displaystyle  \| \sum_{i=1}^n T_i \|_{op} \leq M. \ \ \ \ \ (13)

Note from the basic {TT^*} identity

\displaystyle  \|T\|_{op} = \|TT^* \|_{op}^{1/2} = \|T^* T\|_{op}^{1/2} \ \ \ \ \ (14)

that the hypothesis (11) (or (12)) already gives the bound

\displaystyle  \|T_i\|_{op} \leq M \ \ \ \ \ (15)

on each component {T_i} of {T}, which by the triangle inequality gives the inferior bound

\displaystyle  \| \sum_{i=1}^n T_i \|_{op} \leq nM;

the point of the Cotlar-Stein lemma is that the dependence on {n} in this bound is eliminated in (13), which in particular makes the bound suitable for extension to the limit {n \rightarrow \infty} (see Remark 1 below).
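To see the gap between (13) and the triangle inequality in a concrete case, here is a small sketch with a hypothetical toy family of diagonal operators {T_i = \hbox{diag}(2^{-|i-k|})_{k}}, chosen (as an assumption of this example) so that every norm can be computed exactly from (1). These operators commute and are self-adjoint, so this only probes the special case originally treated by Cotlar.

```python
def diag_norm(entries):
    """Operator norm of a diagonal operator: the sup of |entries|, as in (1)."""
    return max(abs(e) for e in entries)

def cotlar_stein_demo(n):
    # Hypothetical toy family: T_i = diag(2^{-|i-k|} : k = 0, ..., n-1).
    d = [[2.0 ** (-abs(i - k)) for k in range(n)] for i in range(n)]
    # For real diagonal operators, T_i T_j^* = T_i^* T_j = diag(d_i(k) d_j(k)),
    # so the hypotheses (11) and (12) coincide.
    M = max(
        sum(diag_norm([d[i][k] * d[j][k] for k in range(n)]) ** 0.5
            for j in range(n))
        for i in range(n)
    )
    # The sum of the T_i is again diagonal, so its norm is also given by (1).
    norm_of_sum = diag_norm([sum(d[i][k] for i in range(n)) for k in range(n)])
    # Triangle inequality: sum of the individual norms.
    triangle_bound = sum(diag_norm(d[i]) for i in range(n))
    return norm_of_sum, M, triangle_bound

for n in (5, 20, 80):
    print(n, cotlar_stein_demo(n))
```

In this family the Cotlar-Stein constant {M} stays bounded as {n} grows, while the triangle inequality bound grows linearly in {n}.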

The Cotlar-Stein lemma was first established by Cotlar in the special case of commuting self-adjoint operators, and then independently by Cotlar and Stein in full generality, with the proof appearing in a subsequent paper of Knapp and Stein.

The Cotlar-Stein lemma is often useful in controlling operators such as singular integral operators or pseudo-differential operators {T} which “do not mix scales together too much”, in that operators {T} map functions “that oscillate at a given scale {2^{-i}}” to functions that still mostly oscillate at the same scale {2^{-i}}. In that case, one can often split {T} into components {T_i} which essentially capture the scale {2^{-i}} behaviour, and understanding {L^2} boundedness properties of {T} then reduces to establishing the boundedness of the simpler operators {T_i} (and of establishing a sufficient decay in products such as {T_i^* T_j} or {T_i T_j^*} when {i} and {j} are separated from each other). In some cases, one can use Fourier-analytic tools such as Littlewood-Paley projections to generate the {T_i}, but the true power of the Cotlar-Stein lemma comes from situations in which the Fourier transform is not suitable, such as when one has a complicated domain (e.g. a manifold or a non-abelian Lie group), or very rough coefficients (which would then have badly behaved Fourier behaviour). One can then select the decomposition {T = \sum_i T_i} in a fashion that is tailored to the particular operator {T}, and is not necessarily dictated by Fourier-analytic considerations.

Once one is in the almost orthogonal setting, as opposed to the genuinely orthogonal setting, the previous arguments based on orthogonal projection seem to fail completely. Instead, the proof of the Cotlar-Stein lemma proceeds via an elegant application of the tensor power trick (or perhaps more accurately, the power method), in which the operator norm of {T} is understood through the operator norm of a large power of {T} (or more precisely, of its self-adjoint square {TT^*} or {T^* T}). Indeed, from an iteration of (14) we see that for any natural number {N}, one has

\displaystyle  \|T\|_{op}^{2N} = \| (TT^*)^N \|_{op}. \ \ \ \ \ (16)

To estimate the right-hand side, we expand out the right-hand side and apply the triangle inequality to bound it by

\displaystyle  \sum_{i_1,j_1,\ldots,i_N,j_N \in \{1,\ldots,n\}} \| T_{i_1} T_{j_1}^* T_{i_2} T_{j_2}^* \ldots T_{i_N} T_{j_N}^* \|_{op}. \ \ \ \ \ (17)

Recall that when we applied the triangle inequality directly to {T}, we lost a factor of {n} in the final estimate; it will turn out that we will lose a similar factor here, but this factor will eventually be attenuated into nothingness by the tensor power trick.

To bound (17), we use the basic inequality {\|ST\|_{op} \leq \|S\|_{op} \|T\|_{op}} in two different ways. If we group the product {T_{i_1} T_{j_1}^* T_{i_2} T_{j_2}^* \ldots T_{i_N} T_{j_N}^*} in pairs, we can bound the summand of (17) by

\displaystyle  \| T_{i_1} T_{j_1}^* \|_{op} \ldots \| T_{i_N} T_{j_N}^* \|_{op}.

On the other hand, we can group the product by pairs in another way, to obtain the bound of

\displaystyle  \| T_{i_1} \|_{op} \| T_{j_1}^* T_{i_2} \|_{op} \ldots \| T_{j_{N-1}}^* T_{i_N}\|_{op} \| T_{j_N}^* \|_{op}.

We bound {\| T_{i_1} \|_{op}} and {\| T_{j_N}^* \|_{op}} crudely by {M} using (15). Taking the geometric mean of the above bounds, we can thus bound (17) by

\displaystyle  M \sum_{i_1,j_1,\ldots,i_N,j_N \in \{1,\ldots,n\}} \| T_{i_1} T_{j_1}^* \|_{op}^{1/2} \| T_{j_1}^* T_{i_2} \|_{op}^{1/2} \ldots \| T_{j_{N-1}}^* T_{i_N}\|_{op}^{1/2} \| T_{i_N} T_{j_N}^* \|_{op}^{1/2}.

If we then sum this series first in {j_N}, then in {i_N}, then moving back all the way to {i_1}, using (11) and (12) alternately, we obtain a final bound of

\displaystyle  n M^{2N}

for (16). Taking {(2N)^{th}} roots, we obtain

\displaystyle  \|T\|_{op} \leq n^{1/2N} M.

Sending {N \rightarrow \infty}, we obtain the claim.
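The way the loss of {n} is “attenuated into nothingness” can be seen numerically. The following sketch simply tabulates the factor {n^{1/2N}} in the bound above; the values {n = 10^6} and {M = 1} are arbitrary illustrative choices.

```python
def tensor_power_bound(n, M, N):
    """The bound n^{1/(2N)} * M on the operator norm, for a given power N."""
    return n ** (1.0 / (2 * N)) * M

n, M = 10 ** 6, 1.0
for N in (1, 10, 100, 1000, 10000):
    # The factor in front of M decays to 1 as N grows.
    print(N, tensor_power_bound(n, M, N))
```

Even a loss as large as a million is reduced to a factor close to {1} once {N} is in the thousands, which is why the crude use of the triangle inequality in (17) costs nothing in the limit.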

Remark 1 As observed in a number of places (see e.g. page 318 of Stein’s book, or this paper of Comech), the Cotlar-Stein lemma can be extended to infinite sums {\sum_{i=1}^\infty T_i} (with the obvious changes to the hypotheses (11), (12)). Indeed, one can show that for any {f \in H}, the sum {\sum_{i=1}^\infty T_i f} is unconditionally convergent in {H'} (and furthermore has bounded {2}-variation), and the resulting operator {\sum_{i=1}^\infty T_i} is a bounded linear operator with an operator norm bound of {M}.

Remark 2 If we specialise to the case where all the {T_i} are equal, we see that the bound in the Cotlar-Stein lemma is sharp, at least in this case. Thus we see how the tensor power trick can convert an inefficient argument, such as that obtained using the triangle inequality or crude bounds such as (15), into an efficient one.

Remark 3 One can prove Schur’s test by a similar method. Indeed, starting from the inequality

\displaystyle  \|A\|_{op}^{2N} \leq \hbox{tr}( (AA^*)^N )

(which follows easily from the singular value decomposition), we can bound {\|A\|_{op}^{2N}} by

\displaystyle  \sum_{i_1,\ldots,j_N \in \{1,\ldots,n\}} a_{i_1,j_1} \overline{a_{j_1,i_2}} \ldots a_{i_N,j_N} \overline{a_{j_N,i_1}}.

Estimating one of the factors in the summand crudely by {M}, and then repeatedly summing the indices one at a time as before, we obtain

\displaystyle  \|A\|_{op}^{2N} \leq n M^{2N}

and the claim follows from the tensor power trick as before. On the other hand, in the converse direction, I do not know of any way to prove the Cotlar-Stein lemma that does not basically go through the tensor power argument.

If {f: {\bf R}^d \rightarrow {\bf C}} is a locally integrable function, we define the Hardy-Littlewood maximal function {Mf: {\bf R}^d \rightarrow {\bf C}} by the formula

\displaystyle  Mf(x) := \sup_{r>0} \frac{1}{|B(x,r)|} \int_{B(x,r)} |f(y)|\ dy,

where {B(x,r)} is the ball of radius {r} centred at {x}, and {|E|} denotes the measure of a set {E}. The Hardy-Littlewood maximal inequality asserts that

\displaystyle  |\{ x \in {\bf R}^d: Mf(x) > \lambda \}| \leq \frac{C_d}{\lambda} \|f\|_{L^1({\bf R}^d)} \ \ \ \ \ (1)

for all {f\in L^1({\bf R}^d)}, all {\lambda > 0}, and some constant {C_d > 0} depending only on {d}. By a standard density argument, this implies in particular that we have the Lebesgue differentiation theorem

\displaystyle  \lim_{r \rightarrow 0} \frac{1}{|B(x,r)|} \int_{B(x,r)} f(y)\ dy = f(x)

for all {f \in L^1({\bf R}^d)} and almost every {x \in {\bf R}^d}. See for instance my lecture notes on this topic.
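For readers who like to experiment, here is a minimal sketch of a discrete, one-dimensional analogue: the centred maximal function of a finitely supported sequence on the integers, with balls replaced by intervals {\{x-r,\ldots,x+r\}} and Lebesgue measure by counting measure. This discretisation, the function names, and the use of the constant {3} (which is what the Vitali-type covering argument gives in this one-dimensional setting) are assumptions of the example, not statements from the text.

```python
def maximal_function(f):
    """Centred discrete Hardy-Littlewood maximal function of a sequence f,
    evaluated at the indices 0..len(f)-1; f is treated as zero elsewhere."""
    n = len(f)
    prefix = [0.0]
    for v in f:
        prefix.append(prefix[-1] + abs(v))
    out = []
    for x in range(n):
        best = 0.0
        # Radii r >= n - 1 already capture all of f, so larger radii
        # can only lower the average.
        for r in range(n):
            lo, hi = max(0, x - r), min(n, x + r + 1)
            best = max(best, (prefix[hi] - prefix[lo]) / (2 * r + 1))
        out.append(best)
    return out

def weak_type_ratio(f, lam):
    """lam * #{x : Mf(x) > lam}, divided by the l^1 norm of f."""
    mf = maximal_function(f)
    count = sum(1 for v in mf if v > lam)
    return lam * count / sum(abs(v) for v in f)

f = [0.0] * 30 + [4.0, -1.0, 2.5] + [0.0] * 30
for lam in (0.05, 0.2, 1.0, 3.0):
    print(lam, weak_type_ratio(f, lam))
```

The printed ratios play the role of {C_d} in (1) for this particular {f} and these particular {\lambda}; they should all stay below the covering-lemma constant.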

By combining the Hardy-Littlewood maximal inequality with the Marcinkiewicz interpolation theorem (and the trivial inequality {\|Mf\|_{L^\infty({\bf R}^d)} \leq \|f\|_{L^\infty({\bf R}^d)}}) we see that

\displaystyle  \|Mf\|_{L^p({\bf R}^d)} \leq C_{d,p} \|f\|_{L^p({\bf R}^d)} \ \ \ \ \ (2)

for all {p > 1} and {f \in L^p({\bf R}^d)}, and some constant {C_{d,p}} depending on {d} and {p}.

The exact dependence of {C_{d,p}} on {d} and {p} is still not completely understood. The standard Vitali-type covering argument used to establish (1) has an exponential dependence on dimension, giving a constant of the form {C_d = C^d} for some absolute constant {C>1}. Inserting this into the Marcinkiewicz theorem, one obtains a constant {C_{d,p}} of the form {C_{d,p} = \frac{C^d}{p-1}} for some {C>1} (and taking {p} bounded away from infinity, for simplicity). The dependence on {p} is about right, but the dependence on {d} should not be exponential.

In 1982, Stein gave an elegant argument (with full details appearing in a subsequent paper of Stein and Strömberg), based on the Calderón-Zygmund method of rotations, to eliminate the dependence on {d}:

Theorem 1 One can take {C_{d,p} = C_p} for each {p>1}, where {C_p} depends only on {p}.

The argument is based on an earlier bound of Stein from 1976 on the spherical maximal function

\displaystyle  M_S f(x) := \sup_{r>0} A_r |f|(x)

where {A_r} are the spherical averaging operators

\displaystyle  A_r f(x) := \int_{S^{d-1}} f(x+r\omega) d\sigma^{d-1}(\omega)

and {d\sigma^{d-1}} is normalised surface measure on the sphere {S^{d-1}}. Because this is an uncountable supremum, and the averaging operators {A_r} do not have good continuity properties in {r}, it is not a priori obvious that {M_S f} is even a measurable function for, say, locally integrable {f}; but we can avoid this technical issue, at least initially, by restricting attention to continuous functions {f}. The Stein maximal theorem for the spherical maximal function then asserts that if {d \geq 3} and {p > \frac{d}{d-1}}, then we have

\displaystyle  \| M_S f \|_{L^p({\bf R}^d)} \leq C_{d,p} \|f\|_{L^p({\bf R}^d)} \ \ \ \ \ (3)

for all (continuous) {f \in L^p({\bf R}^d)}. We will sketch a proof of this theorem below the fold. (Among other things, one can use this bound to show the pointwise convergence {\lim_{r \rightarrow 0} A_r f(x) = f(x)} of the spherical averages for any {f \in L^p({\bf R}^d)} when {d \geq 3} and {p > \frac{d}{d-1}}, although we will not focus on this application here.)

The condition {p > \frac{d}{d-1}} can be seen to be necessary as follows. Take {f} to be any fixed bump function. A brief calculation then shows that {M_S f(x)} decays like {|x|^{1-d}} as {|x| \rightarrow \infty}, and hence {M_S f} does not lie in {L^p({\bf R}^d)} unless {p > \frac{d}{d-1}}. By taking {f} to be a rescaled bump function supported on a small ball, one can show that the condition {p > \frac{d}{d-1}} is necessary even if we replace {{\bf R}^d} with a compact region (and similarly restrict the radius parameter {r} to be bounded). The condition {d \geq 3} however is not quite necessary; the result is also true when {d=2}, but this turned out to be a more difficult result, obtained first by Bourgain, with a simplified proof (based on the local smoothing properties of the wave equation) later given by Mockenhaupt-Seeger-Sogge.

The Hardy-Littlewood maximal operator {Mf}, which involves averaging over balls, is clearly related to the spherical maximal operator, which averages over spheres. Indeed, by using polar co-ordinates, one easily verifies the pointwise inequality

\displaystyle  Mf(x) \leq M_S f(x)

for any (continuous) {f}, which intuitively reflects the fact that one can think of a ball as an average of spheres. Thus, we see that the spherical maximal inequality (3) implies the Hardy-Littlewood maximal inequality (2) with the same constant {C_{d,p}}. (This implication is initially only valid for continuous functions, but one can then extend the inequality (2) to the rest of {L^p({\bf R}^d)} by a standard limiting argument.)
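For completeness, the polar co-ordinates computation behind the pointwise inequality {Mf(x) \leq M_S f(x)} can be sketched as follows. It uses the standard facts that {|B(x,r)| = \frac{r^d}{d} |S^{d-1}|} (with {|S^{d-1}|} the unnormalised surface area) and that {A_s} is defined using normalised surface measure:

```latex
\frac{1}{|B(x,r)|} \int_{B(x,r)} |f(y)|\ dy
  = \frac{d}{r^d} \int_0^r A_s |f|(x)\ s^{d-1}\ ds
  \leq M_S f(x) \cdot \frac{d}{r^d} \int_0^r s^{d-1}\ ds
  = M_S f(x).
```

Taking the supremum over {r > 0} on the left-hand side then gives the claim.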

At first glance, this observation does not immediately establish Theorem 1 for two reasons. Firstly, Stein’s spherical maximal theorem is restricted to the case when {d \geq 3} and {p > \frac{d}{d-1}}; and secondly, the constant {C_{d,p}} in that theorem still depends on dimension {d}. The first objection can be easily disposed of, for if {p>1}, then the hypotheses {d \geq 3} and {p > \frac{d}{d-1}} will automatically be satisfied for {d} sufficiently large (depending on {p}); note that the case when {d} is bounded (with a bound depending on {p}) is already handled by the classical maximal inequality (2).
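To make “{d} sufficiently large (depending on {p})” concrete, the following sketch finds the first dimension {d \geq 3} at which {p > \frac{d}{d-1}} holds; exact rational arithmetic is used to avoid rounding at the endpoint. The function names are illustrative.

```python
from fractions import Fraction

def critical_exponent(d):
    """The threshold d/(d-1) in Stein's spherical maximal theorem."""
    return Fraction(d, d - 1)

def first_admissible_dimension(p):
    """Smallest d >= 3 with p > d/(d-1); requires p > 1."""
    p = Fraction(p)
    if p <= 1:
        raise ValueError("need p > 1")
    d = 3
    while not p > critical_exponent(d):
        d += 1
    return d

for p in (Fraction(2), Fraction(3, 2), Fraction(11, 10), Fraction(101, 100)):
    print(p, first_admissible_dimension(p))
```

All dimensions below the returned value are then covered by the classical inequality (2), as noted above; the threshold grows roughly like {\frac{p}{p-1}} as {p \rightarrow 1}.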

We still have to deal with the second objection, namely that the constant {C_{d,p}} in (3) depends on {d}. However, here we can use the method of rotations to show that the constants {C_{d,p}} can be taken to be non-increasing (and hence bounded) in {d}. The idea is to view high-dimensional spheres as an average of rotated low-dimensional spheres. We illustrate this with a demonstration that {C_{d+1,p} \leq C_{d,p}}, in the sense that any bound of the form

\displaystyle  \| M_S f \|_{L^p({\bf R}^d)} \leq A \|f\|_{L^p({\bf R}^d)} \ \ \ \ \ (4)

for the {d}-dimensional spherical maximal function, implies the same bound

\displaystyle  \| M_S f \|_{L^p({\bf R}^{d+1})} \leq A \|f\|_{L^p({\bf R}^{d+1})} \ \ \ \ \ (5)

for the {d+1}-dimensional spherical maximal function, with exactly the same constant {A}. For any direction {\omega_0 \in S^d \subset {\bf R}^{d+1}}, consider the averaging operators

\displaystyle  M_S^{\omega_0} f(x) := \sup_{r>0} A_r^{\omega_0} |f|(x)

for any continuous {f: {\bf R}^{d+1} \rightarrow {\bf C}}, where

\displaystyle  A_r^{\omega_0} f(x) := \int_{S^{d-1}} f( x + r U_{\omega_0} \omega)\ d\sigma^{d-1}(\omega)

where {U_{\omega_0}} is some orthogonal transformation mapping the sphere {S^{d-1}} to the sphere {S^{d-1,\omega_0} := \{ \omega \in S^d: \omega \perp \omega_0\}}; the exact choice of orthogonal transformation {U_{\omega_0}} is irrelevant due to the rotation-invariance of surface measure {d\sigma^{d-1}} on the sphere {S^{d-1}}. A simple application of Fubini’s theorem (after first rotating {\omega_0} to be, say, the standard unit vector {e_{d+1}}) using (4) then shows that

\displaystyle  \| M_S^{\omega_0} f \|_{L^p({\bf R}^{d+1})} \leq A \|f\|_{L^p({\bf R}^{d+1})} \ \ \ \ \ (6)

uniformly in {\omega_0}. On the other hand, by viewing the {d}-dimensional sphere {S^d} as an average of the spheres {S^{d-1,\omega_0}}, we have the identity

\displaystyle  A_r f(x) = \int_{S^d} A_r^{\omega_0} f(x)\ d\sigma^d(\omega_0);

indeed, one can deduce this from the uniqueness of Haar measure by noting that both the left-hand side and right-hand side are invariant means of {f} on the sphere {\{ y \in {\bf R}^{d+1}: |y-x|=r\}}. This implies that

\displaystyle  M_S f(x) \leq \int_{S^d} M_S^{\omega_0} f(x)\ d\sigma^d(\omega_0)

and thus by Minkowski’s inequality for integrals, we may deduce (5) from (6).

Remark 1 Unfortunately, the method of rotations does not work to show that the constant {C_d} for the weak {(1,1)} inequality (1) is independent of dimension, as the weak {L^1} quasinorm {\| \|_{L^{1,\infty}}} is not a genuine norm and does not obey the Minkowski inequality for integrals. Indeed, the question of whether {C_d} in (1) can be taken to be independent of dimension remains open. The best known positive result is due to Stein and Strömberg, who showed that one can take {C_d = Cd} for some absolute constant {C}, by comparing the Hardy-Littlewood maximal function with the heat kernel maximal function

\displaystyle  \sup_{t > 0} e^{t\Delta} |f|(x).

The abstract semigroup maximal inequality of Dunford and Schwartz (discussed for instance in these lecture notes of mine) shows that the heat kernel maximal function is of weak-type {(1,1)} with a constant of {1}, and this can be used, together with a comparison argument, to give the Stein-Strömberg bound. In the converse direction, it is a recent result of Aldaz that if one replaces the balls {B(x,r)} with cubes, then the weak {(1,1)} constant {C_d} must go to infinity as {d \rightarrow \infty}.


Suppose one has a measure space {X = (X, {\mathcal B}, \mu)} and a sequence of operators {T_n: L^p(X) \rightarrow L^p(X)} that are bounded on some {L^p(X)} space, with {1 \leq p < \infty}. Suppose that on some dense subclass of functions {f} in {L^p(X)} (e.g. continuous compactly supported functions, if the space {X} is reasonable), one already knows that {T_n f} converges pointwise almost everywhere to some limit {Tf}, for another bounded operator {T: L^p(X) \rightarrow L^p(X)} (e.g. {T} could be the identity operator). What additional ingredient does one need to pass to the limit and conclude that {T_n f} converges almost everywhere to {Tf} for all {f} in {L^p(X)} (and not just for {f} in a dense subclass)?

One standard way to proceed here is to study the maximal operator

\displaystyle T_* f(x) := \sup_n |T_n f(x)|

and aim to establish a weak-type maximal inequality

\displaystyle \| T_* f \|_{L^{p,\infty}(X)} \leq C \| f \|_{L^p(X)} \ \ \ \ \ (1)


for all {f \in L^p(X)} (or all {f} in the dense subclass), and some constant {C}, where {L^{p,\infty}} is the weak {L^p} norm

\displaystyle \|f\|_{L^{p,\infty}(X)} := \sup_{t > 0} t \mu( \{ x \in X: |f(x)| \geq t \})^{1/p}.
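As a small illustration of this definition (a sketch, specialised to counting measure on a finite set, which is an assumption of the example), the weak {L^p} quasi-norm can be computed from the decreasing rearrangement of {|f|}, since the supremum over {t} is attained at one of the values of {|f|}.

```python
def weak_lp_norm(values, p):
    """sup_t t * #{x : |f(x)| >= t}^{1/p} for counting measure on a finite set.
    If a_1 >= a_2 >= ... are the values |f(x)| in decreasing order, the
    supremum equals max_k a_k * k^{1/p}."""
    ordered = sorted((abs(v) for v in values), reverse=True)
    return max((a * k ** (1.0 / p) for k, a in enumerate(ordered, start=1)),
               default=0.0)

def lp_norm(values, p):
    """The usual (strong) L^p norm with respect to counting measure."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)

# f(k) = 1/k on {1, ..., 1000}: bounded in weak L^1, logarithmically large in L^1.
f = [1.0 / k for k in range(1, 1001)]
print(weak_lp_norm(f, 1), lp_norm(f, 1))
```

The example {f(k) = 1/k} shows how much weaker the weak norm can be: by Chebyshev’s inequality it never exceeds the strong norm, but it can be far smaller.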

A standard approximation argument using (1) then shows that {T_n f} will now indeed converge to {Tf} pointwise almost everywhere for all {f} in {L^p(X)}, and not just in the dense subclass. See for instance these lecture notes of mine, in which this method is used to deduce the Lebesgue differentiation theorem from the Hardy-Littlewood maximal inequality. This is by now a very standard approach to establishing pointwise almost everywhere convergence theorems, but it is natural to ask whether it is strictly necessary. In particular, is it possible to have a pointwise convergence result {T_n f \rightarrow T f} without being able to obtain a weak-type maximal inequality of the form (1)?

In the case of norm convergence (in which one asks for {T_n f} to converge to {Tf} in the {L^p} norm, rather than in the pointwise almost everywhere sense), the answer is no, thanks to the uniform boundedness principle, which among other things shows that norm convergence is only possible if one has the uniform bound

\displaystyle \sup_n \| T_n f \|_{L^p(X)} \leq C \| f \|_{L^p(X)} \ \ \ \ \ (2)


for some {C>0} and all {f \in L^p(X)}; and conversely, if one has the uniform bound, and one has already established norm convergence of {T_n f} to {Tf} on a dense subclass of {L^p(X)}, (2) will extend that norm convergence to all of {L^p(X)}.

Returning to pointwise almost everywhere convergence, the answer in general is “yes”. Consider for instance the rank one operators

\displaystyle T_n f(x) := 1_{[n,n+1]}(x) \int_0^1 f(y)\ dy

from {L^1({\bf R})} to {L^1({\bf R})}. It is clear that {T_n f} converges pointwise almost everywhere to zero as {n \rightarrow \infty} for any {f \in L^1({\bf R})}, and the operators {T_n} are uniformly bounded on {L^1({\bf R})}, but the maximal function {T_*} does not obey (1). One can modify this example in a number of ways to defeat almost any reasonable conjecture that something like (1) should be necessary for pointwise almost everywhere convergence.
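The failure of (1) in this example can be seen numerically. The sketch below takes an {f} with {\int_0^1 f = 1} (for instance {f = 1_{[0,1]}}, an illustrative choice), truncates the supremum to {1 \leq n \leq N}, and estimates the measure of a level set of the truncated maximal function by a Riemann sum; the grid step is an arbitrary parameter of the example.

```python
def t_n(n, integral_f, x):
    """Value at x of T_n f, i.e. 1_{[n,n+1]}(x) times the integral of f over [0,1]."""
    return integral_f if n <= x <= n + 1 else 0.0

def truncated_maximal(N, integral_f, x):
    """sup over 1 <= n <= N of |T_n f(x)|."""
    return max(abs(t_n(n, integral_f, x)) for n in range(1, N + 1))

def level_set_measure(N, integral_f, t, step=0.01):
    """Riemann-sum estimate of |{x in [0, N+2] : sup_{n <= N} |T_n f(x)| >= t}|."""
    points = int(round((N + 2) / step))
    return step * sum(
        1 for k in range(points)
        if truncated_maximal(N, integral_f, k * step) >= t
    )

integral_f = 1.0
for N in (1, 5, 25):
    # t * |{T_* f >= t}| at t = 1/2: a lower bound for the weak L^1 quasi-norm.
    print(N, 0.5 * level_set_measure(N, integral_f, 0.5))
```

The printed quantity grows linearly in {N} while {\|f\|_{L^1}} stays fixed, so no constant {C} can work in (1); at the same time, for each fixed {x} one has {T_n f(x) = 0} as soon as {n > x}.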

In spite of this, a remarkable observation of Stein, now known as Stein’s maximal principle, asserts that the maximal inequality is necessary to prove pointwise almost everywhere convergence, if one is working on a compact group and the operators {T_n} are translation invariant, and if the exponent {p} is at most {2}:

Theorem 1 (Stein maximal principle) Let {G} be a compact group, let {X} be a homogeneous space of {G} with a finite Haar measure {\mu}, let {1\leq p \leq 2}, and let {T_n: L^p(X) \rightarrow L^p(X)} be a sequence of bounded linear operators commuting with translations, such that {T_n f} converges pointwise almost everywhere for each {f \in L^p(X)}. Then (1) holds.

This is not quite the most general version of the principle; some additional variants and generalisations are given in the original paper of Stein. For instance, one can replace the discrete sequence {T_n} of operators with a continuous family {T_t} without much difficulty. As a typical application of this principle, we see that Carleson’s celebrated theorem that the partial Fourier series {\sum_{n=-N}^N \hat f(n) e^{2\pi i nx}} of an {L^2({\bf R}/{\bf Z})} function {f: {\bf R}/{\bf Z} \rightarrow {\bf C}} converge almost everywhere is in fact equivalent to the estimate

\displaystyle \| \sup_{N>0} |\sum_{n=-N}^N \hat f(n) e^{2\pi i n\cdot}|\|_{L^{2,\infty}({\bf R}/{\bf Z})} \leq C \|f\|_{L^2({\bf R}/{\bf Z})}. \ \ \ \ \ (3)


And unsurprisingly, most of the proofs of this (difficult) theorem have proceeded by first establishing (3), and Stein’s maximal principle strongly suggests that this is the optimal way to try to prove this theorem.

On the other hand, the theorem does fail for {p>2}, and almost everywhere convergence results in {L^p} for {p>2} can be proven by other methods than weak {(p,p)} estimates. For instance, the convergence of Bochner-Riesz multipliers in {L^p({\bf R}^n)} for any {n} (and for {p} in the range predicted by the Bochner-Riesz conjecture) was verified for {p > 2} by Carbery, Rubio de Francia, and Vega, despite the fact that the weak {(p,p)} boundedness of even a single Bochner-Riesz multiplier, let alone the maximal function, has still not been completely verified in this range. (Carbery, Rubio de Francia and Vega use weighted {L^2} estimates for the maximal Bochner-Riesz operator, rather than {L^p} type estimates.) For {p \leq 2}, though, Stein’s principle (after localising to a torus) does apply, and pointwise almost everywhere convergence of Bochner-Riesz means is equivalent to the weak {(p,p)} estimate (1).

Stein’s principle is restricted to compact groups (such as the torus {({\bf R}/{\bf Z})^n} or the rotation group {SO(n)}) and their homogeneous spaces (such as the torus {({\bf R}/{\bf Z})^n} again, or the sphere {S^{n-1}}). As stated, the principle fails in the noncompact setting; for instance, in {{\bf R}}, the convolution operators {T_n f := f * 1_{[n,n+1]}} are such that {T_n f} converges pointwise almost everywhere to zero for every {f \in L^1({\bf R})}, but the maximal function is not of weak-type {(1,1)}. However, in many applications on non-compact domains, the {T_n} are “localised” enough that one can transfer from a non-compact setting to a compact setting and then apply Stein’s principle. For instance, Carleson’s theorem on the real line {{\bf R}} is equivalent to Carleson’s theorem on the circle {{\bf R}/{\bf Z}} (due to the localisation of the Dirichlet kernels), which as discussed before is equivalent to the estimate (3) on the circle, which by a scaling argument is equivalent to the analogous estimate on the real line {{\bf R}}.

Stein’s argument from his 1961 paper can be viewed nowadays as an application of the probabilistic method; starting with a sequence of increasingly bad counterexamples to the maximal inequality (1), one randomly combines them together to create a single “infinitely bad” counterexample. To make this idea work, Stein employs two basic ideas:

  1. The random rotations (or random translations) trick. Given a subset {E} of {X} of small but positive measure, one can randomly select about {|X|/|E|} translates {g_i E} of {E} that cover most of {X}.
  2. The random sums trick. Given a collection {f_1,\ldots,f_n: X \rightarrow {\bf C}} of signed functions that may possibly cancel each other in a deterministic sum {\sum_{i=1}^n f_i}, one can perform a random sum {\sum_{i=1}^n \pm f_i} instead to obtain a random function whose magnitude will usually be comparable to the square function {(\sum_{i=1}^n |f_i|^2)^{1/2}}; this can be made rigorous by concentration of measure results, such as Khintchine’s inequality.
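
The random sums trick can be checked by brute force in a toy case. The following Python sketch is an illustration only: it replaces the functions {f_i} by constants {a_i} (the alternating choice {a_i = \pm 1} is an assumption made here, not something from Stein’s paper), enumerates every choice of signs, and compares the deterministic sum with the averaged random sum and with the square function {(\sum_i |a_i|^2)^{1/2}}.

```python
from itertools import product
from math import sqrt

def random_sum_moments(values):
    """Average |sum_i eps_i a_i| and |sum_i eps_i a_i|^2 over all sign patterns eps."""
    total_abs = 0.0
    total_sq = 0.0
    for signs in product((1, -1), repeat=len(values)):
        s = sum(e * a for e, a in zip(signs, values))
        total_abs += abs(s)
        total_sq += abs(s) ** 2
    count = 2 ** len(values)
    return total_abs / count, total_sq / count

# Hypothetical toy data: alternating signs, so the deterministic sum cancels.
values = [1, -1, 1, -1, 1, -1, 1, -1]
deterministic = abs(sum(values))
square_function = sqrt(sum(abs(a) ** 2 for a in values))
mean_abs, mean_sq = random_sum_moments(values)

print("deterministic sum:", deterministic)
print("square function:  ", square_function)
print("mean |random sum|:", mean_abs)
print("mean |random sum|^2:", mean_sq)
```

In this toy case the deterministic sum cancels completely, the mean square of the random sum equals {\sum_i |a_i|^2} exactly (by orthogonality of the signs), and the mean magnitude lies between {2^{-1/2}} and {1} times the square function, as the {L^1} form of Khintchine’s inequality predicts.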

These ideas have since been used repeatedly in harmonic analysis. For instance, I used the random rotations trick in a recent paper with Jordan Ellenberg and Richard Oberlin on Kakeya-type estimates in finite fields. The random sums trick is by now a standard tool to build various counterexamples to estimates (or to convergence results) in harmonic analysis, for instance being used by Fefferman in his famous paper disproving the boundedness of the ball multiplier on {L^p({\bf R}^n)} for {p \neq 2}, {n \geq 2}. Another use of the random sums trick is to show that Theorem 1 fails once {p>2}; see Stein’s original paper for details.
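
For a feel of the numbers in the random translations trick, here is a small Python sketch in a toy model. The model is an assumption made for illustration: {X} is the cyclic group {{\bf Z}/N{\bf Z}} and {E} is an interval, so each point of {X} lies in a uniformly random translate of {E} with probability {|E|/|X|}, and the expected fraction of {X} missed by {k} independent translates is exactly {(1 - |E|/|X|)^k}. The code compares this formula with a simulation when {k} is a small multiple of {|X|/|E|}.

```python
import random

def expected_coverage(group_size, set_size, num_translates):
    """Exact expected fraction of Z/group_size covered by independent uniform translates."""
    return 1.0 - (1.0 - set_size / group_size) ** num_translates

def simulated_coverage(group_size, set_size, num_translates, rng):
    """Fraction of Z/group_size covered by random translates of E = {0, ..., set_size - 1}."""
    covered = [False] * group_size
    for _ in range(num_translates):
        shift = rng.randrange(group_size)
        for j in range(set_size):
            covered[(shift + j) % group_size] = True
    return sum(covered) / group_size

rng = random.Random(0)  # fixed seed so that runs are reproducible
group_size, set_size = 1000, 10  # hypothetical sizes for the toy model
results = []
for multiple in (1, 3, 5):
    k = multiple * group_size // set_size
    trials = [simulated_coverage(group_size, set_size, k, rng) for _ in range(200)]
    average = sum(trials) / len(trials)
    results.append((multiple, expected_coverage(group_size, set_size, k), average))

for multiple, expected, average in results:
    print(multiple, round(expected, 3), round(average, 3))
```

In this model, exactly {|X|/|E|} translates cover a little under two thirds of {X} on average (the formula gives roughly {1 - 1/e}), while a modest constant multiple of {|X|/|E|} already covers all but a small fraction.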

Another use of the random rotations trick, closely related to Theorem 1, is the Nikishin-Stein factorisation theorem. Here is Stein’s formulation of this theorem:

Theorem 2 (Stein factorisation theorem) Let {G} be a compact group, let {X} be a homogeneous space of {G} with a finite Haar measure {\mu}, let {1\leq p \leq 2} and {q>0}, and let {T: L^p(X) \rightarrow L^q(X)} be a bounded linear operator commuting with translations and obeying the estimate

\displaystyle \|T f \|_{L^q(X)} \leq A \|f\|_{L^p(X)}

for all {f \in L^p(X)} and some {A>0}. Then {T} also maps {L^p(X)} to {L^{p,\infty}(X)}, with

\displaystyle \|T f \|_{L^{p,\infty}(X)} \leq C_{p,q} A \|f\|_{L^p(X)}

for all {f \in L^p(X)}, with {C_{p,q}} depending only on {p, q}.

This result is trivial with {q \geq p}, but becomes useful when {q<p}. In this regime, the translation invariance allows one to freely “upgrade” a strong-type {(p,q)} result to a weak-type {(p,p)} result. In other words, bounded linear operators from {L^p(X)} to {L^q(X)} automatically factor through the inclusion {L^{p,\infty}(X) \subset L^q(X)}, which helps explain the name “factorisation theorem”. Factorisation theory has been developed further by many authors, including Maurey and Pisier.
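
To see why the case {q \geq p} is trivial, suppose for simplicity that {\mu} is normalised to be a probability measure (an assumption made here for illustration). Then Chebyshev’s inequality and Hölder’s inequality give

\displaystyle \|g\|_{L^{p,\infty}(X)} \leq \|g\|_{L^p(X)} \leq \|g\|_{L^q(X)}

for any {g}, and applying this with {g = Tf} gives the conclusion with {C_{p,q} = 1}. The following Python sketch checks this chain numerically on a small weighted finite set; the values and weights are arbitrary illustrative data, and the function names are hypothetical.

```python
def lp_norm(values, weights, p):
    """L^p norm of a function on a finite weighted set."""
    return sum(w * abs(v) ** p for v, w in zip(values, weights)) ** (1.0 / p)

def weak_lp_quasinorm(values, weights, p):
    """sup over lam > 0 of lam * mu({|g| > lam})**(1/p) on a finite weighted set.

    The supremum is approached as lam increases to one of the values |g| takes,
    so it suffices to test the sets {|g| >= level} for each such level.
    """
    best = 0.0
    for level in set(abs(v) for v in values):
        mass = sum(w for v, w in zip(values, weights) if abs(v) >= level)
        best = max(best, level * mass ** (1.0 / p))
    return best

# Illustrative data: a function taking four values on a probability space.
values = [4.0, 2.0, 1.0, 0.5]
weights = [0.1, 0.2, 0.3, 0.4]
p, q = 1.5, 2.0

weak_p = weak_lp_quasinorm(values, weights, p)
strong_p = lp_norm(values, weights, p)
strong_q = lp_norm(values, weights, q)
print(weak_p, strong_p, strong_q)
```

The three printed numbers should be non-decreasing from left to right. When {q < p} the second inequality is no longer available, which is why the theorem has content in that regime.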

Stein’s factorisation theorem (or more precisely, a variant of it) is useful in the theory of Kakeya and restriction theorems in Euclidean space, as first observed by Bourgain.

In 1970, Nikishin obtained the following generalisation of Stein’s factorisation theorem in which the translation-invariance hypothesis can be dropped, at the cost of excluding a set of small measure:

Theorem 3 (Nikishin-Stein factorisation theorem) Let {X} be a finite measure space, let {1\leq p \leq 2} and {q>0}, and let {T: L^p(X) \rightarrow L^q(X)} be a bounded linear operator obeying the estimate

\displaystyle \|T f \|_{L^q(X)} \leq A \|f\|_{L^p(X)}

for all {f \in L^p(X)} and some {A>0}. Then for any {\epsilon > 0}, there exists a subset {E} of {X} of measure at most {\epsilon} such that

\displaystyle \|T f \|_{L^{p,\infty}(X \backslash E)} \leq C_{p,q,\epsilon} A \|f\|_{L^p(X)} \ \ \ \ \ (4)


for all {f \in L^p(X)}, with {C_{p,q,\epsilon}} depending only on {p, q, \epsilon}.

One can recover Theorem 2 from Theorem 3 by an averaging argument to eliminate the exceptional set; we omit the details.


In a few weeks, Princeton University will host a conference in Analysis and Applications in honour of the 80th birthday of Elias Stein (though, technically, Eli’s 80th birthday was actually in January). As one of Eli’s students, I was originally scheduled to be one of the speakers at this conference; but unfortunately, for family reasons I will be unable to attend. In lieu of speaking at this conference, I have decided to devote some space on this blog for this month to present some classic results of Eli from his many decades of work in harmonic analysis, ergodic theory, several complex variables, and related topics. My choice of selections here will be a personal and idiosyncratic one; the results I present are not necessarily the “best” or “deepest” of his results, but are ones that I find particularly elegant and appealing. (There will also inevitably be some overlap here with Charlie Fefferman’s article “Selected theorems by Eli Stein”, which not coincidentally was written for Stein’s 60th birthday conference in 1991.)

In this post I would like to describe one of Eli Stein’s very first results that is still used extremely widely today, namely his interpolation theorem from 1956 (and its refinement, the Fefferman-Stein interpolation theorem from 1972). This is a deceptively innocuous, yet remarkably powerful, generalisation of the classic Riesz-Thorin interpolation theorem, which uses methods from complex analysis (and in particular, the Lindelöf theorem or the Phragmén-Lindelöf principle) to show that if a linear operator {T: L^{p_0}(X) + L^{p_1}(X) \rightarrow L^{q_0}(Y) + L^{q_1}(Y)} from one ({\sigma}-finite) measure space {X = (X,{\mathcal X},\mu)} to another {Y = (Y, {\mathcal Y}, \nu)} obeys the estimates

\displaystyle  \| Tf \|_{L^{q_0}(Y)} \leq B_0 \|f\|_{L^{p_0}(X)} \ \ \ \ \ (1)

for all {f \in L^{p_0}(X)} and

\displaystyle  \| Tf \|_{L^{q_1}(Y)} \leq B_1 \|f\|_{L^{p_1}(X)} \ \ \ \ \ (2)

for all {f \in L^{p_1}(X)}, where {1 \leq p_0,p_1,q_0,q_1 \leq \infty} and {B_0,B_1 > 0}, then one automatically also has the interpolated estimates

\displaystyle  \| Tf \|_{L^{q_\theta}(Y)} \leq B_\theta \|f\|_{L^{p_\theta}(X)} \ \ \ \ \ (3)

for all {f \in L^{p_\theta}(X)} and {0 \leq \theta \leq 1}, where the quantities {p_\theta, q_\theta, B_\theta} are defined by the formulae

\displaystyle  \frac{1}{p_\theta} = \frac{1-\theta}{p_0} + \frac{\theta}{p_1}

\displaystyle  \frac{1}{q_\theta} = \frac{1-\theta}{q_0} + \frac{\theta}{q_1}

\displaystyle  B_\theta = B_0^{1-\theta} B_1^\theta.
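
As a quick sanity check on these formulae, the following Python sketch evaluates them with exact fractions. The endpoints used in the example are the standard bounds for the (suitably normalised) Fourier transform, namely {L^1 \rightarrow L^\infty} and {L^2 \rightarrow L^2}, both with constant {1}; the function names are hypothetical and introduced only for this illustration.

```python
from fractions import Fraction
from math import inf

def reciprocal(p):
    """Return 1/p as an exact fraction, with the convention 1/infinity = 0."""
    return Fraction(0) if p == inf else 1 / Fraction(p)

def interpolate(p0, q0, B0, p1, q1, B1, theta):
    """Return (1/p_theta, 1/q_theta, B_theta) from the formulae above."""
    inv_p = (1 - theta) * reciprocal(p0) + theta * reciprocal(p1)
    inv_q = (1 - theta) * reciprocal(q0) + theta * reciprocal(q1)
    bound = B0 ** float(1 - theta) * B1 ** float(theta)
    return inv_p, inv_q, bound

# Endpoint exponents (p0, q0) = (1, infinity) and (p1, q1) = (2, 2), with theta = 1/2.
inv_p, inv_q, bound = interpolate(1, inf, 1.0, 2, 2, 1.0, Fraction(1, 2))
print(inv_p, inv_q, bound)
```

With these endpoints one gets {1/p_\theta = 1 - \theta/2} and {1/q_\theta = \theta/2}, so that {q_\theta} is the dual exponent of {p_\theta} for every {\theta}; at {\theta = 1/2} this is the pair {p_\theta = 4/3}, {q_\theta = 4}.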

The Riesz-Thorin theorem is already quite useful (it gives, for instance, by far the quickest proof of the Hausdorff-Young inequality for the Fourier transform, to name just one application), but it requires the same linear operator {T} to appear in (1), (2), and (3). Eli Stein realised, though, that due to the complex-analytic nature of the proof of the Riesz-Thorin theorem, it was possible to allow different linear operators to appear in (1), (2), (3), so long as the dependence was analytic. A bit more precisely: if one had a family {T_z} of operators which depended in an analytic manner on a complex variable {z} in the strip {\{ z \in {\bf C}: 0 \leq \hbox{Re}(z) \leq 1 \}} (thus, for any test functions {f, g}, the inner product {\langle T_z f, g \rangle} would be analytic in {z}) which obeyed some mild regularity assumptions (which are slightly technical and are omitted here), and one had the estimates

\displaystyle  \| T_{0+it} f \|_{L^{q_0}(Y)} \leq C_t \|f\|_{L^{p_0}(X)}


\displaystyle  \| T_{1+it} f \|_{L^{q_1}(Y)} \leq C_t\|f\|_{L^{p_1}(X)}

for all {t \in {\bf R}} and some quantities {C_t} that grew at most exponentially in {t} (actually, any growth rate significantly slower than the double-exponential {e^{\exp(\pi |t|)}} would suffice here), then one also has the interpolated estimates

\displaystyle  \| T_\theta f \|_{L^{q_\theta}(Y)} \leq C' \|f\|_{L^{p_\theta}(X)}

for all {0 \leq \theta \leq 1} and a constant {C'} depending only on {C, p_0, p_1, q_0, q_1}.


In the third of the Distinguished Lecture Series given by Eli Stein here at UCLA, Eli presented a slightly different topic, which is work in preparation with Alex Nagel, Fulvio Ricci, and Steve Wainger, on algebras of singular integral operators which are sensitive to multiple different geometries in a nilpotent Lie group.


In the second of the Distinguished Lecture Series given by Eli Stein here at UCLA, Eli expanded on the themes in the first lecture, in particular providing more details as to the recent (not yet published) results of Lanzani and Stein on the boundedness of the Cauchy integral on domains in several complex variables.


The first Distinguished Lecture Series at UCLA for this academic year is given by Elias Stein (who, incidentally, was my graduate student advisor), who is lecturing on “Singular Integrals and Several Complex Variables: Some New Perspectives”.  The first lecture was a historical (and non-technical) survey of modern harmonic analysis (which, amazingly, was compressed into half an hour), followed by an introduction as to how this theory is currently in the process of being adapted to handle the basic analytical issues in several complex variables, a topic which in many ways is still only now being developed.  The second and third lectures will focus on these issues in greater depth.

As usual, any errors here are due to my transcription and interpretation of the lecture.

[Update, Oct 27: The slides from the talk are now available here.]
