You are currently browsing the tag archive for the ‘Elias Stein’ tag.
A basic problem in harmonic analysis (as well as in linear algebra, random matrix theory, and high-dimensional geometry) is to estimate the operator norm $\|T\|_{op}$ of a linear map $T: H \to H'$ between two Hilbert spaces, which we will take to be complex for the sake of discussion. Even the finite-dimensional case $T: {\bf C}^m \to {\bf C}^n$ is of interest, as this operator norm is the same as the largest singular value $\sigma_1(A)$ of the $n \times m$ matrix $A$ associated to $T$.
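In general, this operator norm is hard to compute precisely, except in special cases. One such special case is that of a diagonal operator, such as that associated to an $n \times n$ diagonal matrix $D = \hbox{diag}(\lambda_1,\ldots,\lambda_n)$. In this case, the operator norm is simply the supremum norm of the diagonal coefficients:
$$\|D\|_{op} = \sup_{1 \leq i \leq n} |\lambda_i|. \ \ \ \ \ (1)$$
A variant of (1) is Schur's test, which for simplicity we will phrase in the setting of finite-dimensional operators $T: {\bf C}^m \to {\bf C}^n$ given by a matrix $A = (a_{ij})_{1 \leq i \leq n; 1 \leq j \leq m}$. A simple version of this test is as follows: if all the absolute row sums and column sums of $A$ are bounded by some constant $M$, thus
$$\sum_{j=1}^m |a_{ij}| \leq M \ \ \ \ \ (2)$$
for all $1 \leq i \leq n$ and
$$\sum_{i=1}^n |a_{ij}| \leq M \ \ \ \ \ (3)$$
for all $1 \leq j \leq m$, then
$$\|T\|_{op} = \|A\|_{op} \leq M. \ \ \ \ \ (4)$$
Indeed, to see (4), it suffices by duality and homogeneity to show that
$$\Big|\sum_{i=1}^n \sum_{j=1}^m a_{ij} x_j y_i\Big| \leq M$$
whenever $(x_j)_{j=1}^m$ and $(y_i)_{i=1}^n$ are sequences with $\sum_{j=1}^m |x_j|^2 = \sum_{i=1}^n |y_i|^2 = 1$; but this easily follows from the arithmetic mean-geometric mean inequality
$$|a_{ij} x_j y_i| \leq \frac{1}{2} |a_{ij}| |x_j|^2 + \frac{1}{2} |a_{ij}| |y_i|^2$$
and (2), (3).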
Schur’s test (4) (and its many generalisations to weighted situations, or to Lebesgue or Lorentz spaces) is particularly useful for controlling operators in which the role of oscillation (as reflected in the phase of the coefficients $a_{ij}$, as opposed to just their magnitudes $|a_{ij}|$) is not decisive. However, it is of limited use in situations that involve a lot of cancellation. For this, a different test, known as the Cotlar-Stein lemma, is much more flexible and powerful. It can be viewed in a sense as a non-commutative variant of Schur’s test (4) (or of (1)), in which the scalar coefficients $\lambda_i$ or $a_{ij}$ are replaced by operators instead.
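As a numerical sanity check on Schur's test, here is a minimal sketch in Python. It assumes a real matrix for simplicity, and the function names (`operator_norm`, `schur_bound`) and the sample matrix are purely illustrative. The operator norm is estimated by power iteration on $A^T A$, which is a convenient stand-in and not something used in the argument above; since it evaluates $\|Ax\|$ at a unit vector $x$, it can only underestimate the true norm.

```python
import math

def matvec(A, x):
    """Multiply the matrix A (a list of rows) by the vector x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def operator_norm(A, iterations=500):
    """Estimate the largest singular value of a real matrix A
    by power iteration on A^T A (an assumed, illustrative method)."""
    At = transpose(A)
    x = [1.0] * len(A[0])
    for _ in range(iterations):
        y = matvec(At, matvec(A, x))
        norm = math.sqrt(sum(v * v for v in y))
        if norm == 0.0:
            return 0.0
        x = [v / norm for v in y]  # x is now a unit vector
    return math.sqrt(sum(v * v for v in matvec(A, x)))

def schur_bound(A):
    """The constant M: the largest absolute row sum or column sum of A."""
    max_row = max(sum(abs(a) for a in row) for row in A)
    max_col = max(sum(abs(a) for a in col) for col in zip(*A))
    return max(max_row, max_col)

# Hypothetical sample matrix; row sums are 3.5, 2.5, 3.0 and column sums 3.0, 3.5, 2.5.
A = [[1.0, -2.0, 0.5],
     [0.0, 1.5, -1.0],
     [2.0, 0.0, 1.0]]

estimate = operator_norm(A)
M = schur_bound(A)
```

For a diagonal matrix the same routine recovers the supremum of the absolute values of the diagonal entries, in agreement with the bound (1).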
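To illustrate the basic flavour of the result, let us return to the bound (1), and now consider instead a block-diagonal matrix
$$A = \begin{pmatrix} \Lambda_1 & 0 & \ldots & 0 \\ 0 & \Lambda_2 & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & \Lambda_n \end{pmatrix} \ \ \ \ \ (5)$$
where each $\Lambda_i$ is now an $m_i \times m_i$ matrix, and so $A$ is an $m \times m$ matrix with $m := m_1 + \ldots + m_n$. Then we have
$$\|A\|_{op} = \sup_{1 \leq i \leq n} \|\Lambda_i\|_{op}. \ \ \ \ \ (6)$$
Indeed, the lower bound is trivial (as can be seen by testing $A$ on vectors which are supported on the $i^{th}$ block of coordinates), while to establish the upper bound, one can make use of the orthogonal decomposition
$${\bf C}^m \equiv \bigoplus_{i=1}^n {\bf C}^{m_i} \ \ \ \ \ (7)$$
to decompose an arbitrary vector $x \in {\bf C}^m$ as
$$x = (x_1, \ldots, x_n)$$
with $x_i \in {\bf C}^{m_i}$, in which case we have
$$Ax = (\Lambda_1 x_1, \ldots, \Lambda_n x_n)$$
and the upper bound in (6) then follows from a simple computation.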
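For completeness, the simple computation can be sketched as follows. The notation is assumed here: $\Lambda_1,\ldots,\Lambda_n$ denote the diagonal blocks of $A$, and $x = (x_1,\ldots,x_n)$ is the corresponding block decomposition of a vector $x$.

```latex
\[
  \|Ax\|^2 \;=\; \sum_{i=1}^n \|\Lambda_i x_i\|^2
  \;\le\; \sum_{i=1}^n \|\Lambda_i\|_{op}^2 \, \|x_i\|^2
  \;\le\; \Big( \sup_{1 \le i \le n} \|\Lambda_i\|_{op} \Big)^2 \sum_{i=1}^n \|x_i\|^2
  \;=\; \Big( \sup_{1 \le i \le n} \|\Lambda_i\|_{op} \Big)^2 \|x\|^2 .
\]
```

The first and last equalities are Pythagoras' theorem for the orthogonal decomposition into blocks, which is exactly where orthogonality enters.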
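The operator $T$ associated to the matrix $A$ in (5) can be viewed as a sum $T = \sum_{i=1}^n T_i$, where each $T_i$ corresponds to the $\Lambda_i$ block of $A$, in which case (6) can also be written as
$$\|T\|_{op} = \sup_{1 \leq i \leq n} \|T_i\|_{op}. \ \ \ \ \ (8)$$
When $n$ is large, this is a significant improvement over the triangle inequality, which merely gives
$$\|T\|_{op} \leq \sum_{1 \leq i \leq n} \|T_i\|_{op}.$$
The reason for this gain can ultimately be traced back to the “orthogonality” of the $T_i$; that they “occupy different columns” and “different rows” of the range and domain of $T$. This is obvious when viewed in the matrix formalism, but can also be described in the more abstract Hilbert space operator formalism via the identities
$$T_i^* T_j = 0 \ \ \ \ \ (9)$$
and
$$T_i T_j^* = 0 \ \ \ \ \ (10)$$
whenever $i \neq j$. (The first identity asserts that the ranges of the $T_i$ are orthogonal to each other, and the second asserts that the coranges of the $T_i$ (the ranges of the adjoints $T_i^*$) are orthogonal to each other.) By replacing (7) with a more abstract orthogonal decomposition into these ranges and coranges, one can in fact deduce (8) directly from (9) and (10).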
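The Cotlar-Stein lemma is an extension of this observation to the case where the $T_i$ are merely almost orthogonal rather than orthogonal, in a manner somewhat analogous to how Schur’s test (partially) extends (1) to the non-diagonal case. Specifically, we have
Lemma 1 (Cotlar-Stein lemma) Let $T_1,\ldots,T_n: H \to H'$ be a finite sequence of bounded linear operators from one Hilbert space $H$ to another $H'$, obeying the bounds
$$\sum_{j=1}^n \|T_i T_j^*\|_{op}^{1/2} \leq M \ \ \ \ \ (11)$$
and
$$\sum_{j=1}^n \|T_i^* T_j\|_{op}^{1/2} \leq M \ \ \ \ \ (12)$$
for all $i = 1,\ldots,n$ and some $M > 0$. Then one has
$$\Big\|\sum_{i=1}^n T_i\Big\|_{op} \leq M. \ \ \ \ \ (13)$$
Note from the basic $TT^*$ identity
$$\|T\|_{op} = \|TT^*\|_{op}^{1/2} = \|T^* T\|_{op}^{1/2} \ \ \ \ \ (14)$$
that the hypothesis (11) (or (12)) implies the bound
$$\|T_i\|_{op} \leq M \ \ \ \ \ (15)$$
on each component $T_i$ of $T := \sum_{i=1}^n T_i$, which by the triangle inequality gives the inferior bound
$$\Big\|\sum_{i=1}^n T_i\Big\|_{op} \leq nM;$$
the point of the Cotlar-Stein lemma is that the dependence on $n$ in this bound is eliminated in (13).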
The Cotlar-Stein lemma was first established by Cotlar in the special case of commuting self-adjoint operators, and then independently by Cotlar and Stein in full generality, with the proof appearing in a subsequent paper of Knapp and Stein.
The Cotlar-Stein lemma is often useful in controlling operators such as singular integral operators or pseudo-differential operators $T$ which “do not mix scales together too much”, in that operators $T$ map functions “that oscillate at a given scale $2^{-i}$” to functions that still mostly oscillate at the same scale $2^{-i}$. In that case, one can often split $T$ into components $T_i$ which essentially capture the scale $2^{-i}$ behaviour, and understanding $L^2$ boundedness properties of $T$ then reduces to establishing the boundedness of the simpler operators $T_i$ (and of establishing a sufficient decay in products such as $T_i^* T_j$ or $T_i T_j^*$ when $i$ and $j$ are separated from each other). In some cases, one can use Fourier-analytic tools such as Littlewood-Paley projections to generate the $T_i$, but the true power of the Cotlar-Stein lemma comes from situations in which the Fourier transform is not suitable, such as when one has a complicated domain (e.g. a manifold or a non-abelian Lie group), or very rough coefficients (which would then have badly behaved Fourier behaviour). One can then select the decomposition $T = \sum_i T_i$ in a fashion that is tailored to the particular operator $T$, and is not necessarily dictated by Fourier-analytic considerations.
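Once one is in the almost orthogonal setting, as opposed to the genuinely orthogonal setting, the previous arguments based on orthogonal projection seem to fail completely. Instead, the proof of the Cotlar-Stein lemma proceeds via an elegant application of the tensor power trick (or perhaps more accurately, the power method), in which the operator norm of $T := \sum_{i=1}^n T_i$ is understood through the operator norm of a large power of $T$ (or more precisely, of its self-adjoint square $TT^*$ or $T^* T$). Indeed, from an iteration of (14) we see that for any natural number $N$, one has
$$\|T\|_{op}^{2N} = \|(TT^*)^N\|_{op}.$$
To estimate the right-hand side, we expand it out and use the triangle inequality to bound it by
$$\sum_{i_1,j_1,\ldots,i_N,j_N \in \{1,\ldots,n\}} \| T_{i_1} T_{j_1}^* T_{i_2} T_{j_2}^* \ldots T_{i_N} T_{j_N}^* \|_{op}. \ \ \ \ \ (16)$$
Recall that when we applied the triangle inequality directly to $T$, we lost a factor of $n$ in the final estimate; it will turn out that we will lose a similar factor here, but this factor will eventually be attenuated into nothingness by the tensor power trick.
To bound each summand in (16), we can group the product into the pairs $T_{i_k} T_{j_k}^*$ to obtain the bound
$$\|T_{i_1} T_{j_1}^*\|_{op} \ldots \|T_{i_N} T_{j_N}^*\|_{op}.$$
On the other hand, we can group the product by pairs in another way, to obtain the bound of
$$\|T_{i_1}\|_{op} \|T_{j_1}^* T_{i_2}\|_{op} \ldots \|T_{j_{N-1}}^* T_{i_N}\|_{op} \|T_{j_N}^*\|_{op}.$$
We bound $\|T_{i_1}\|_{op}$ and $\|T_{j_N}^*\|_{op}$ crudely by $M$ using (15). Taking the geometric mean of the above two bounds, we can thus bound (16) by
$$M \sum_{i_1,j_1,\ldots,i_N,j_N \in \{1,\ldots,n\}} \|T_{i_1} T_{j_1}^*\|_{op}^{1/2} \|T_{j_1}^* T_{i_2}\|_{op}^{1/2} \ldots \|T_{j_{N-1}}^* T_{i_N}\|_{op}^{1/2} \|T_{i_N} T_{j_N}^*\|_{op}^{1/2}.$$
If we then sum this series first in $j_N$, then in $i_N$, and so on back to $j_1$, using (11) and (12) alternately, and finally sum trivially in $i_1$, we obtain a final bound of
$$n M^{2N}$$
for (16). Taking $2N^{th}$ roots, we obtain
$$\|T\|_{op} \leq n^{1/2N} M.$$
Sending $N \to \infty$, we obtain the claim.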
Remark 1 As observed in a number of places (see e.g. page 318 of Stein’s book, or this paper of Comech), the Cotlar-Stein lemma can be extended to infinite sums $\sum_{i=1}^\infty T_i$ (with the obvious changes to the hypotheses (11), (12)). Indeed, one can show that for any $f \in H$, the sum $\sum_{i=1}^\infty T_i f$ is unconditionally convergent in $H'$ (and furthermore has bounded $2$-variation), and the resulting operator $\sum_{i=1}^\infty T_i$ is a bounded linear operator with an operator norm bound of $M$.
Remark 2 If we specialise to the case where all the $T_i$ are equal, we see that the bound in the Cotlar-Stein lemma is sharp, at least in this case. Thus we see how the tensor power trick can convert an inefficient argument, such as that obtained using the triangle inequality or crude bounds such as (15), into an efficient one.
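The sharpness claim in Remark 2 can be checked directly. In the sketch below, $S$ is an assumed name for the common value of the operators, so that $T_1 = \ldots = T_n = S$, and $M$ is taken to be the smallest constant for which the almost orthogonality hypotheses hold.

```latex
\[
  \sum_{j=1}^n \|T_i T_j^*\|_{op}^{1/2}
  \;=\; \sum_{j=1}^n \|S S^*\|_{op}^{1/2}
  \;=\; n \|S\|_{op},
  \qquad
  \sum_{j=1}^n \|T_i^* T_j\|_{op}^{1/2}
  \;=\; n \|S\|_{op},
\]
\[
  \text{so one may take } M = n \|S\|_{op},
  \quad\text{while}\quad
  \Big\| \sum_{i=1}^n T_i \Big\|_{op} \;=\; \|nS\|_{op} \;=\; n \|S\|_{op} \;=\; M .
\]
```

Here the identity $\|SS^*\|_{op} = \|S^*S\|_{op} = \|S\|_{op}^2$ has been used, so the conclusion of the lemma holds with equality in this case.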
Remark 3 One can prove Schur’s test by a similar method. Indeed, starting from the inequality
$$\|A\|_{op}^{2N} \leq \hbox{tr}( (AA^*)^N )$$
(which follows easily from the singular value decomposition), we can bound $\|A\|_{op}^{2N}$ by
Estimating the other two terms in the summand by , and then repeatedly summing the indices one at a time as before, we obtain
and the claim follows from the tensor power trick as before. On the other hand, in the converse direction, I do not know of any way to prove the Cotlar-Stein lemma that does not basically go through the tensor power argument.
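If $f: {\bf R}^d \to {\bf C}$ is a locally integrable function, we define the Hardy-Littlewood maximal function $Mf: {\bf R}^d \to {\bf R}$ by the formula
$$Mf(x) := \sup_{r>0} \frac{1}{|B(x,r)|} \int_{B(x,r)} |f(y)|\ dy,$$
where $B(x,r)$ is the ball of radius $r$ centred at $x$, and $|E|$ denotes the measure of a set $E$. The Hardy-Littlewood maximal inequality asserts that
$$|\{ x \in {\bf R}^d: Mf(x) > \lambda \}| \leq \frac{C_d}{\lambda} \|f\|_{L^1({\bf R}^d)} \ \ \ \ \ (1)$$
for all $f \in L^1({\bf R}^d)$, all $\lambda > 0$, and some constant $C_d > 0$ depending only on $d$. By a standard density argument, this implies in particular that we have the Lebesgue differentiation theorem
$$\lim_{r \to 0} \frac{1}{|B(x,r)|} \int_{B(x,r)} f(y)\ dy = f(x)$$
for all $f \in L^1({\bf R}^d)$ and almost every $x \in {\bf R}^d$. See for instance my lecture notes on this topic.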
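By combining the Hardy-Littlewood maximal inequality with the Marcinkiewicz interpolation theorem (and the trivial inequality $\|Mf\|_{L^\infty({\bf R}^d)} \leq \|f\|_{L^\infty({\bf R}^d)}$) we see that
$$\|Mf\|_{L^p({\bf R}^d)} \leq C_{d,p} \|f\|_{L^p({\bf R}^d)} \ \ \ \ \ (2)$$
for all $p > 1$ and $f \in L^p({\bf R}^d)$, and some constant $C_{d,p}$ depending on $d$ and $p$.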
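The exact dependence of $C_{d,p}$ on $d$ and $p$ is still not completely understood. The standard Vitali-type covering argument used to establish (1) has an exponential dependence on dimension, giving a constant of the form $C_d = C^d$ for some absolute constant $C > 1$. Inserting this into the Marcinkiewicz theorem, one obtains a constant $C_{d,p}$ of the form $C_{d,p} = \frac{C^d}{p-1}$ for some $C > 1$ (and taking $p$ bounded away from infinity, for simplicity). The dependence on $p$ is about right, but the dependence on $d$ should not be exponential.
In 1982, Stein gave an elegant argument (with full details appearing in a subsequent paper of Stein and Strömberg), based on the Calderón-Zygmund method of rotations, to eliminate the dependence on $d$:
Theorem 1 One can take $C_{d,p} = C_p$ for each $p > 1$, where $C_p$ depends only on $p$.
The argument is based on an earlier bound of Stein from 1976 on the spherical maximal function
$$M_S f(x) := \sup_{r>0} A_r |f|(x)$$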
where $A_r$ are the spherical averaging operators
$$A_r f(x) := \int_{S^{d-1}} f(x + r\omega)\ d\sigma^{d-1}(\omega)$$
and $d\sigma^{d-1}$ is normalised surface measure on the sphere $S^{d-1}$. Because this is an uncountable supremum, and the averaging operators $A_r$ do not have good continuity properties in $r$, it is not a priori obvious that $M_S f$ is even a measurable function for, say, locally integrable $f$; but we can avoid this technical issue, at least initially, by restricting attention to continuous functions $f$. The Stein maximal theorem for the spherical maximal function then asserts that if $d \geq 3$ and $p > \frac{d}{d-1}$, then we have
$$\|M_S f\|_{L^p({\bf R}^d)} \leq C_{d,p} \|f\|_{L^p({\bf R}^d)} \ \ \ \ \ (3)$$
for all (continuous) $f \in L^p({\bf R}^d)$. We will sketch a proof of this theorem below the fold. (Among other things, one can use this bound to show the pointwise convergence $\lim_{r \to 0} A_r f(x) = f(x)$ of the spherical averages for any $f \in L^p({\bf R}^d)$ when $d \geq 3$ and $p > \frac{d}{d-1}$, although we will not focus on this application here.)
The condition $p > \frac{d}{d-1}$ can be seen to be necessary as follows. Take $f$ to be any fixed bump function. A brief calculation then shows that $M_S f(x)$ decays like $|x|^{1-d}$ as $|x| \to \infty$, and hence $M_S f$ does not lie in $L^p({\bf R}^d)$ unless $p > \frac{d}{d-1}$. By taking $f$ to be a rescaled bump function supported on a small ball, one can show that the condition $p > \frac{d}{d-1}$ is necessary even if we replace ${\bf R}^d$ with a compact region (and similarly restrict the radius parameter $r$ to be bounded). The condition $d \geq 3$ however is not quite necessary; the result is also true when $d = 2$, but this turned out to be a more difficult result, obtained first by Bourgain, with a simplified proof (based on the local smoothing properties of the wave equation) later given by Mockenhaupt-Seeger-Sogge.
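The integrability threshold behind the necessity argument can be made explicit with a short polar coordinates computation. This is a sketch which assumes that the spherical maximal function of the bump function is comparable to $|x|^{1-d}$ for $|x| \geq 1$ in dimension $d$, and writes $c_d$ for the surface area of the unit sphere.

```latex
\[
  \int_{|x| \ge 1} \big( |x|^{1-d} \big)^p \, dx
  \;=\; c_d \int_1^\infty \rho^{(1-d)p} \, \rho^{d-1} \, d\rho ,
\]
\[
  \text{which is finite if and only if } (1-d)p + (d-1) < -1,
  \quad\text{that is,}\quad p > \frac{d}{d-1} .
\]
```

The decay rate itself comes from the sphere of radius $|x|$ centred at $x$, which passes through the support of the bump function and meets it in a set of normalised surface measure comparable to $|x|^{1-d}$.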
The Hardy-Littlewood maximal operator $M$, which involves averaging over balls, is clearly related to the spherical maximal operator $M_S$, which averages over spheres. Indeed, by using polar co-ordinates, one easily verifies the pointwise inequality
$$Mf(x) \leq M_S f(x)$$
for any (continuous) $f$, which intuitively reflects the fact that one can think of a ball as an average of spheres. Thus, we see that the spherical maximal inequality (3) implies the Hardy-Littlewood maximal inequality (2) with the same constant $C_{d,p}$. (This implication is initially only valid for continuous functions, but one can then extend the inequality (2) to the rest of $L^p({\bf R}^d)$ by a standard limiting argument.)
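The polar co-ordinates computation can be sketched as follows, with assumed notation: $A_s$ denotes the average over the sphere of radius $s$ centred at $x$ with respect to normalised surface measure, $M_S$ is the supremum of $A_s |f|$ over $s > 0$, and $\omega_d$ is the volume of the unit ball in dimension $d$ (so that the unit sphere has surface area $d\,\omega_d$).

```latex
\[
  \frac{1}{|B(x,r)|} \int_{B(x,r)} |f(y)| \, dy
  \;=\; \frac{1}{\omega_d r^d} \int_0^r d\,\omega_d \, s^{d-1} \, A_s |f|(x) \, ds
  \;=\; \frac{d}{r^d} \int_0^r s^{d-1} A_s |f|(x) \, ds
\]
\[
  \;\le\; M_S f(x) \cdot \frac{d}{r^d} \int_0^r s^{d-1} \, ds
  \;=\; M_S f(x) .
\]
```

Taking the supremum over $r > 0$ on the left-hand side then gives the pointwise bound of the Hardy-Littlewood maximal function by the spherical one.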
At first glance, this observation does not immediately establish Theorem 1 for two reasons. Firstly, Stein’s spherical maximal theorem is restricted to the case when $d \geq 3$ and $p > \frac{d}{d-1}$; and secondly, the constant $C_{d,p}$ in that theorem still depends on dimension $d$. The first objection can be easily disposed of, for if $p > 1$, then the hypotheses $d \geq 3$ and $p > \frac{d}{d-1}$ will automatically be satisfied for $d$ sufficiently large (depending on $p$); note that the case when $d$ is bounded (with a bound depending on $p$) is already handled by the classical maximal inequality (2).
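We still have to deal with the second objection, namely that the constant $C_{d,p}$ in (3) depends on $d$. However, here we can use the method of rotations to show that the constants $C_{d,p}$ can be taken to be non-increasing (and hence bounded) in $d$. The idea is to view high-dimensional spheres as an average of rotated low-dimensional spheres. We illustrate this with a demonstration that $C_{d+1,p} \leq C_{d,p}$, in the sense that any bound of the form
$$\|M_S f\|_{L^p({\bf R}^d)} \leq A \|f\|_{L^p({\bf R}^d)} \ \ \ \ \ (4)$$
for the $d$-dimensional spherical maximal function implies the same bound
$$\|M_S f\|_{L^p({\bf R}^{d+1})} \leq A \|f\|_{L^p({\bf R}^{d+1})} \ \ \ \ \ (5)$$
for the $d+1$-dimensional spherical maximal function, with exactly the same constant $A$. For any direction $\omega_0 \in S^d \subset {\bf R}^{d+1}$, consider the averaging operators
$$M_S^{\omega_0} f(x) := \sup_{r>0} A_r^{\omega_0} |f|(x)$$
for any continuous $f: {\bf R}^{d+1} \to {\bf C}$, where
$$A_r^{\omega_0} f(x) := \int_{S^{d-1}} f(x + r U_{\omega_0} \omega)\ d\sigma^{d-1}(\omega)$$
where $U_{\omega_0}$ is some orthogonal transformation mapping the sphere $S^{d-1}$ to the sphere $S^{d-1,\omega_0} := \{ \omega \in S^d: \omega \perp \omega_0 \}$; the exact choice of orthogonal transformation $U_{\omega_0}$ is irrelevant due to the rotation-invariance of surface measure $d\sigma^{d-1}$ on the sphere $S^{d-1}$. A simple application of Fubini’s theorem (after first rotating $\omega_0$ to be, say, the standard unit vector $e_{d+1}$) using (4) then shows that
$$\|M_S^{\omega_0} f\|_{L^p({\bf R}^{d+1})} \leq A \|f\|_{L^p({\bf R}^{d+1})} \ \ \ \ \ (6)$$
uniformly in $\omega_0$. On the other hand, by viewing the $d$-dimensional sphere $S^d$ as an average of the spheres $S^{d-1,\omega_0}$, we have the identity
$$A_r f(x) = \int_{S^d} A_r^{\omega_0} f(x)\ d\sigma^d(\omega_0);$$
indeed, one can deduce this from the uniqueness of Haar measure by noting that both the left-hand side and right-hand side are invariant means of $f$ on the sphere $\{ y \in {\bf R}^{d+1}: |y-x| = r \}$. This implies that
$$M_S f(x) \leq \int_{S^d} M_S^{\omega_0} f(x)\ d\sigma^d(\omega_0)$$
and thus by Minkowski’s inequality for integrals, we may deduce (5) from (6).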
Remark 1 Unfortunately, the method of rotations does not work to show that the constant $C_d$ for the weak $(1,1)$ inequality (1) is independent of dimension, as the weak $L^1$ quasinorm $\|\cdot\|_{L^{1,\infty}}$ is not a genuine norm and does not obey the Minkowski inequality for integrals. Indeed, the question of whether $C_d$ in (1) can be taken to be independent of dimension remains open. The best known positive result is due to Stein and Strömberg, who showed that one can take $C_d = Cd$ for some absolute constant $C$, by comparing the Hardy-Littlewood maximal function with the heat kernel maximal function
$$\sup_{t>0} e^{t\Delta} |f|(x).$$
The abstract semigroup maximal inequality of Dunford and Schwartz (discussed for instance in these lecture notes of mine) shows that the heat kernel maximal function is of weak-type $(1,1)$ with a constant of $1$, and this can be used, together with a comparison argument, to give the Stein-Strömberg bound. In the converse direction, it is a recent result of Aldaz that if one replaces the balls $B(x,r)$ with cubes, then the weak $(1,1)$ constant $C_d$ must go to infinity as $d \to \infty$.
Suppose one has a measure space $X = (X, {\mathcal B}, \mu)$ and a sequence of operators $T_n: L^p(X) \to L^p(X)$ that are bounded on some $L^p(X)$ space, with $1 \leq p < \infty$. Suppose that on some dense subclass of functions $f$ in $L^p(X)$ (e.g. continuous compactly supported functions, if the space $X$ is reasonable), one already knows that $T_n f$ converges pointwise almost everywhere to some limit $Tf$, for another bounded operator $T: L^p(X) \to L^p(X)$ (e.g. $T$ could be the identity operator). What additional ingredient does one need to pass to the limit and conclude that $T_n f$ converges almost everywhere to $Tf$ for all $f$ in $L^p(X)$ (and not just for $f$ in a dense subclass)?
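One standard way to proceed here is to study the maximal operator
$$T_* f(x) := \sup_n |T_n f(x)|$$
and aim to establish a weak-type maximal inequality
$$\|T_* f\|_{L^{p,\infty}(X)} \leq C \|f\|_{L^p(X)} \ \ \ \ \ (1)$$
for all $f \in L^p(X)$ (or all $f$ in the dense subclass), and some constant $C$, where $L^{p,\infty}$ is the weak $L^p$ norm
$$\|f\|_{L^{p,\infty}(X)} := \sup_{t>0} t \mu( \{ x \in X: |f(x)| \geq t \} )^{1/p}.$$
A standard approximation argument using (1) then shows that $T_n f$ will now indeed converge to $Tf$ pointwise almost everywhere for all $f$ in $L^p(X)$, and not just in the dense subclass. See for instance these lecture notes of mine, in which this method is used to deduce the Lebesgue differentiation theorem from the Hardy-Littlewood maximal inequality. This is by now a very standard approach to establishing pointwise almost everywhere convergence theorems, but it is natural to ask whether it is strictly necessary. In particular, is it possible to have a pointwise convergence result $T_n f \to Tf$ without being able to obtain a weak-type maximal inequality of the form (1)?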
In the case of norm convergence (in which one asks for $T_n f$ to converge to $Tf$ in the $L^p$ norm, rather than in the pointwise almost everywhere sense), the answer is no, thanks to the uniform boundedness principle, which among other things shows that norm convergence is only possible if one has the uniform bound
$$\sup_n \|T_n f\|_{L^p(X)} \leq C \|f\|_{L^p(X)} \ \ \ \ \ (2)$$
for some $C > 0$ and all $f \in L^p(X)$; and conversely, if one has the uniform bound (2), and one has already established norm convergence of $T_n f$ to $Tf$ on a dense subclass of $L^p(X)$, then (2) will extend that norm convergence to all of $L^p(X)$.
Returning to pointwise almost everywhere convergence, the answer in general is “yes”. Consider for instance the rank one operators
$$T_n f(x) := 1_{[n,n+1]}(x) \int_0^1 f(y)\ dy$$
from $L^1({\bf R})$ to $L^1({\bf R})$. It is clear that $T_n f$ converges pointwise almost everywhere to zero as $n \to \infty$ for any $f \in L^1({\bf R})$, and the operators $T_n$ are uniformly bounded on $L^1({\bf R})$, but the maximal function $T_* f$ does not obey (1). One can modify this example in a number of ways to defeat almost any reasonable conjecture that something like (1) should be necessary for pointwise almost everywhere convergence.
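To see the failure of the maximal inequality concretely, here is a short computation. It assumes that the rank one operators take the form $T_n f := 1_{[n,n+1]} \int_0^1 f$ for $n = 1, 2, \ldots$ on the real line, which is one natural reading of the example, and it tests them on the indicator function of the unit interval.

```latex
\[
  f := 1_{[0,1]}
  \quad\Longrightarrow\quad
  T_n f = 1_{[n,n+1]}, \qquad
  \|T_n f\|_{L^1({\bf R})} = 1 = \|f\|_{L^1({\bf R})},
\]
\[
  T_* f(x) := \sup_{n \ge 1} |T_n f(x)| = 1 \ \text{ for all } x \ge 1,
  \qquad\text{so}\qquad
  \big| \{ x : T_* f(x) \ge 1 \} \big| = \infty .
\]
```

Thus each $T_n$ is a contraction on $L^1$ and $T_n f(x) \to 0$ at every point $x$, yet no weak-type $(1,1)$ bound on $T_* f$ is possible, since the level set at height $1$ already has infinite measure.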
In spite of this, a remarkable observation of Stein, now known as Stein’s maximal principle, asserts that the maximal inequality is necessary to prove pointwise almost everywhere convergence, if one is working on a compact group and the operators $T_n$ are translation invariant, and if the exponent $p$ is at most $2$:
Theorem 1 (Stein maximal principle) Let $G$ be a compact group, let $X$ be a homogeneous space of $G$ with a finite Haar measure $\mu$, let $1 \leq p \leq 2$, and let $T_n: L^p(X) \to L^p(X)$ be a sequence of bounded linear operators commuting with translations, such that $T_n f$ converges pointwise almost everywhere for each $f \in L^p(X)$. Then (1) holds.
This is not quite the most general version of the principle; some additional variants and generalisations are given in the original paper of Stein. For instance, one can replace the discrete sequence $T_n$ of operators with a continuous family $T_t$ without much difficulty. As a typical application of this principle, we see that Carleson’s celebrated theorem that the partial Fourier series $\sum_{n=-N}^N \hat f(n) e^{2\pi i n x}$ of an $L^2({\bf R}/{\bf Z})$ function $f$ converge almost everywhere is in fact equivalent to the estimate
$$\Big\| \sup_{N > 0} \Big|\sum_{n=-N}^N \hat f(n) e^{2\pi i n \cdot}\Big| \Big\|_{L^{2,\infty}({\bf R}/{\bf Z})} \leq C \|f\|_{L^2({\bf R}/{\bf Z})}. \ \ \ \ \ (3)$$
Unsurprisingly, most of the proofs of this (difficult) theorem have proceeded by first establishing (3), and Stein’s maximal principle strongly suggests that this is the optimal way to try to prove this theorem.
On the other hand, the theorem does fail for $p > 2$, and almost everywhere convergence results in $L^p$ for $p > 2$ can be proven by other methods than weak $(p,p)$ estimates. For instance, the convergence of Bochner-Riesz multipliers in $L^p({\bf R}^n)$ for any $n$ (and for $p$ in the range predicted by the Bochner-Riesz conjecture) was verified for $p > 2$ by Carbery, Rubio de Francia, and Vega, despite the fact that the weak $(p,p)$ boundedness of even a single Bochner-Riesz multiplier, let alone the maximal function, has still not been completely verified in this range. (Carbery, Rubio de Francia and Vega use weighted $L^2$ estimates for the maximal Bochner-Riesz operator, rather than $L^p$ type estimates.) For $p \leq 2$, though, Stein’s principle (after localising to a torus) does apply, and pointwise almost everywhere convergence of Bochner-Riesz means is equivalent to the weak $(p,p)$ estimate (1).
Stein’s principle is restricted to compact groups (such as the torus $({\bf R}/{\bf Z})^n$ or the rotation group $SO(n)$) and their homogeneous spaces (such as the torus $({\bf R}/{\bf Z})^n$ again, or the sphere $S^{n-1}$). As stated, the principle fails in the noncompact setting; for instance, in ${\bf R}$, the convolution operators $T_n f := f * 1_{[n,n+1]}$ are such that $T_n f$ converges pointwise almost everywhere to zero for every $f \in L^1({\bf R})$, but the maximal function is not of weak-type $(1,1)$. However, in many applications on non-compact domains, the $T_n$ are “localised” enough that one can transfer from a non-compact setting to a compact setting and then apply Stein’s principle. For instance, Carleson’s theorem on the real line ${\bf R}$ is equivalent to Carleson’s theorem on the circle ${\bf R}/{\bf Z}$ (due to the localisation of the Dirichlet kernels), which as discussed before is equivalent to the estimate (3) on the circle ${\bf R}/{\bf Z}$, which by a scaling argument is equivalent to the analogous estimate on the real line ${\bf R}$.
Stein’s argument from his 1961 paper can be viewed nowadays as an application of the probabilistic method; starting with a sequence of increasingly bad counterexamples to the maximal inequality (1), one randomly combines them together to create a single “infinitely bad” counterexample. To make this idea work, Stein employs two basic ideas:
- The random rotations (or random translations) trick. Given a subset $E$ of $X$ of small but positive measure, one can randomly select about $\mu(X)/\mu(E)$ translates $g_i E$ of $E$ that cover most of $X$.
- The random sums trick. Given a collection $f_1,\ldots,f_n: X \to {\bf C}$ of signed functions that may possibly cancel each other in a deterministic sum $\sum_{i=1}^n f_i$, one can perform a random sum $\sum_{i=1}^n \pm f_i$ instead to obtain a random function whose magnitude will usually be comparable to the square function $(\sum_{i=1}^n |f_i|^2)^{1/2}$; this can be made rigorous by concentration of measure results, such as Khintchine’s inequality.
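The random sums trick can be illustrated at a single point by a small computation. The sketch below is in Python, with illustrative names and hypothetical sample values standing for the values of the individual functions at one point. It averages the square of the signed sum over all sign patterns, which is the exact $L^2$ case of Khintchine's inequality, and compares the result with the square function.

```python
import itertools
import math

def mean_square_of_random_sums(values):
    """Average |sum_i eps_i * a_i|^2 over all 2^n sign patterns eps in {-1,+1}^n."""
    n = len(values)
    total = 0.0
    for signs in itertools.product((-1, 1), repeat=n):
        s = sum(e * a for e, a in zip(signs, values))
        total += s * s
    return total / 2 ** n

# Hypothetical values of eight functions at one point, chosen to cancel completely.
values = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

deterministic_sum = abs(sum(values))                      # the deterministic sum vanishes
square_function = math.sqrt(sum(a * a for a in values))   # (sum |a_i|^2)^(1/2)
rms_random_sum = math.sqrt(mean_square_of_random_sums(values))
```

The cross terms average out because distinct signs are uncorrelated, so the root mean square of the random sum equals the square function exactly, even though the deterministic sum cancels completely.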
These ideas have since been used repeatedly in harmonic analysis. For instance, I used the random rotations trick in a recent paper with Jordan Ellenberg and Richard Oberlin on Kakeya-type estimates in finite fields. The random sums trick is by now a standard tool to build various counterexamples to estimates (or to convergence results) in harmonic analysis, for instance being used by Fefferman in his famous paper disproving the boundedness of the ball multiplier on $L^p({\bf R}^n)$ for $p \neq 2$, $n \geq 2$. Another use of the random sum trick is to show that Theorem 1 fails once $p > 2$; see Stein’s original paper for details.
Another use of the random rotations trick, closely related to Theorem 1, is the Nikishin-Stein factorisation theorem. Here is Stein’s formulation of this theorem:
Theorem 2 (Stein factorisation theorem) Let $G$ be a compact group, let $X$ be a homogeneous space of $G$ with a finite Haar measure $\mu$, let $1 \leq p \leq 2$ and $q > 0$, and let $T: L^p(X) \to L^q(X)$ be a bounded linear operator commuting with translations and obeying the estimate
$$\|Tf\|_{L^q(X)} \leq A \|f\|_{L^p(X)}$$
for all $f \in L^p(X)$ and some $A > 0$. Then $T$ also maps $L^p(X)$ to $L^{p,\infty}(X)$, with
$$\|Tf\|_{L^{p,\infty}(X)} \leq C_{p,q} A \|f\|_{L^p(X)}$$
for all $f \in L^p(X)$, with $C_{p,q}$ depending only on $p, q$.
This result is trivial with $q \geq p$, but becomes useful when $q < p$. In this regime, the translation invariance allows one to freely “upgrade” a strong-type $(p,q)$ result to a weak-type $(p,p)$ result. In other words, bounded linear operators from $L^p(X)$ to $L^q(X)$ automatically factor through the inclusion $L^{p,\infty}(X) \subset L^q(X)$, which helps explain the name “factorisation theorem”. Factorisation theory has been developed further by many authors, including Maurey and Pisier.
Stein’s factorisation theorem (or more precisely, a variant of it) is useful in the theory of Kakeya and restriction theorems in Euclidean space, as first observed by Bourgain.
In 1970, Nikishin obtained the following generalisation of Stein’s factorisation theorem in which the translation-invariance hypothesis can be dropped, at the cost of excluding a set of small measure:
In a few weeks, Princeton University will host a conference in Analysis and Applications in honour of the 80th birthday of Elias Stein (though, technically, Eli’s 80th birthday was actually in January). As one of Eli’s students, I was originally scheduled to be one of the speakers at this conference; but unfortunately, for family reasons I will be unable to attend. In lieu of speaking at this conference, I have decided to devote some space on this blog for this month to present some classic results of Eli from his many decades of work in harmonic analysis, ergodic theory, several complex variables, and related topics. My choice of selections here will be a personal and idiosyncratic one; the results I present are not necessarily the “best” or “deepest” of his results, but are ones that I find particularly elegant and appealing. (There will also inevitably be some overlap here with Charlie Fefferman’s article “Selected theorems by Eli Stein“, which not coincidentally was written for Stein’s 60th birthday conference in 1991.)
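In this post I would like to describe one of Eli Stein’s very first results that is still used extremely widely today, namely his interpolation theorem from 1956 (and its refinement, the Fefferman-Stein interpolation theorem from 1972). This is a deceptively innocuous, yet remarkably powerful, generalisation of the classic Riesz-Thorin interpolation theorem which uses methods from complex analysis (and in particular, the Lindelöf theorem or the Phragmén-Lindelöf principle) to show that if a linear operator $T: L^{p_0}(X) + L^{p_1}(X) \to L^{q_0}(Y) + L^{q_1}(Y)$ from one ($\sigma$-finite) measure space $X = (X, {\mathcal X}, \mu)$ to another $Y = (Y, {\mathcal Y}, \nu)$ obeyed the estimates
$$\|Tf\|_{L^{q_0}(Y)} \leq B_0 \|f\|_{L^{p_0}(X)} \ \ \ \ \ (1)$$
for all $f \in L^{p_0}(X)$ and
$$\|Tf\|_{L^{q_1}(Y)} \leq B_1 \|f\|_{L^{p_1}(X)} \ \ \ \ \ (2)$$
for all $f \in L^{p_1}(X)$, where $1 \leq p_0, p_1, q_0, q_1 \leq \infty$ and $B_0, B_1 > 0$, then one automatically also has the interpolated estimates
$$\|Tf\|_{L^{q_\theta}(Y)} \leq B_\theta \|f\|_{L^{p_\theta}(X)} \ \ \ \ \ (3)$$
for all $f \in L^{p_\theta}(X)$ and $0 \leq \theta \leq 1$, where the quantities $p_\theta, q_\theta, B_\theta$ are defined by the formulae
$$\frac{1}{p_\theta} = \frac{1-\theta}{p_0} + \frac{\theta}{p_1}, \qquad \frac{1}{q_\theta} = \frac{1-\theta}{q_0} + \frac{\theta}{q_1}, \qquad B_\theta = B_0^{1-\theta} B_1^\theta.$$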
The Riesz-Thorin theorem is already quite useful (it gives, for instance, by far the quickest proof of the Hausdorff-Young inequality for the Fourier transform, to name just one application), but it requires the same linear operator $T$ to appear in (1), (2), and (3). Eli Stein realised, though, that due to the complex-analytic nature of the proof of the Riesz-Thorin theorem, it was possible to allow different linear operators to appear in (1), (2), (3), so long as the dependence was analytic. A bit more precisely: if one had a family $T_z$ of operators which depended in an analytic manner on a complex variable $z$ in the strip $\{ z \in {\bf C}: 0 \leq \hbox{Re}(z) \leq 1 \}$ (thus, for any test functions $f, g$, the inner product $\langle T_z f, g \rangle$ would be analytic in $z$) which obeyed some mild regularity assumptions (which are slightly technical and are omitted here), and one had the estimates
$$\|T_{0+it} f\|_{L^{q_0}(Y)} \leq C_t \|f\|_{L^{p_0}(X)}$$
and
$$\|T_{1+it} f\|_{L^{q_1}(Y)} \leq C_t \|f\|_{L^{p_1}(X)}$$
for all $t \in {\bf R}$ and some quantities $C_t$ that grew at most exponentially in $t$ (actually, any growth rate significantly slower than the double-exponential $e^{\exp(\pi |t|)}$ would suffice here), then one also has the interpolated estimates
$$\|T_\theta f\|_{L^{q_\theta}(Y)} \leq C' \|f\|_{L^{p_\theta}(X)}$$
for all $0 \leq \theta \leq 1$ and a constant $C'$ depending only on $C, p_0, p_1, q_0, q_1$.
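The exponent bookkeeping in these interpolation theorems is easy to get wrong by hand, so here is a minimal sketch in Python of the standard convex combination of reciprocal exponents. The function name `interpolate_exponent` is illustrative, and the convention that `None` stands for the exponent infinity is an assumption made for this example. It is applied to the Hausdorff-Young setting, interpolating between the trivial bound from $L^1$ to $L^\infty$ and the Plancherel bound from $L^2$ to $L^2$.

```python
from fractions import Fraction

def interpolate_exponent(p0, p1, theta):
    """Return p_theta with 1/p_theta = (1 - theta)/p0 + theta/p1.
    The value None is used (by assumption) to represent the exponent infinity."""
    inv0 = Fraction(0) if p0 is None else 1 / Fraction(p0)
    inv1 = Fraction(0) if p1 is None else 1 / Fraction(p1)
    inv = (1 - theta) * inv0 + theta * inv1
    return None if inv == 0 else 1 / inv

# Endpoints for the Fourier transform: L^1 -> L^infty and L^2 -> L^2.
theta = Fraction(1, 2)
p = interpolate_exponent(1, 2, theta)      # input exponent at the midpoint
q = interpolate_exponent(None, 2, theta)   # output exponent at the midpoint
```

At every intermediate value of `theta` the two exponents produced this way are conjugate, which is the familiar shape of the Hausdorff-Young inequality.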
In the third of the Distinguished Lecture Series given by Eli Stein here at UCLA, Eli presented a slightly different topic, which is work in preparation with Alex Nagel, Fulvio Ricci, and Steve Wainger, on algebras of singular integral operators which are sensitive to multiple different geometries in a nilpotent Lie group.
In the second of the Distinguished Lecture Series given by Eli Stein here at UCLA, Eli expanded on the themes in the first lecture, in particular providing more details as to the recent (not yet published) results of Lanzani and Stein on the boundedness of the Cauchy integral on domains in several complex variables.
The first Distinguished Lecture Series at UCLA for this academic year is given by Elias Stein (who, incidentally, was my graduate student advisor), who is lecturing on “Singular Integrals and Several Complex Variables: Some New Perspectives“. The first lecture was a historical (and non-technical) survey of modern harmonic analysis (which, amazingly, was compressed into half an hour), followed by an introduction as to how this theory is currently in the process of being adapted to handle the basic analytical issues in several complex variables, a topic which in many ways is still only now being developed. The second and third lectures will focus on these issues in greater depth.
As usual, any errors here are due to my transcription and interpretation of the lecture.
[Update, Oct 27: The slides from the talk are now available here.]