
I was deeply saddened to learn that Elias Stein died yesterday, aged 87.

I have talked about some of Eli’s older mathematical work in these blog posts. He continued to be quite active mathematically in recent years, for instance finishing six papers (with various co-authors including Jean Bourgain, Mariusz Mirek, Błażej Wróbel, and Pavel Zorin-Kranich) in just this year alone. I last met him last September in Wrocław, Poland, at a conference in his honour; he was in good health (and good spirits) then. Here is a picture of Eli together with several of his students (including myself) who were at that meeting (taken from the conference web site):

[Photo: Eli Stein with several of his students at the Wrocław conference.]

Eli was an amazingly effective advisor; throughout my graduate studies I think he never had fewer than five graduate students, and there was often a line outside his door when he was meeting with students such as myself. (The Mathematics Genealogy Project lists 52 students of Eli, but if anything this is an under-estimate.) My weekly meetings with Eli would tend to go something like this: I would report on all the many different things I had tried over the past week, without much success, to solve my current research problem; Eli would listen patiently to everything I said, concentrate for a moment, and then go over to his filing cabinet and fish out a preprint to hand to me, saying “I think the authors in this paper encountered similar problems and resolved them using Method X”. I would then go back to my office and read the preprint, and indeed they had faced something similar, and I could often adapt the techniques there to resolve my immediate obstacles (only to encounter further ones for the next week, but that’s the way research tends to go, especially as a graduate student). Amongst other things, these meetings impressed upon me the value of mathematical experience: Eli was able to make more key progress on a problem in a handful of minutes than I was able to accomplish in a whole week. (There is a well known story about the famous engineer Charles Steinmetz fixing a broken piece of machinery by making a chalk mark; my meetings with Eli often had a similar feel to them.)

Eli’s lectures were always masterpieces of clarity. In one hour, he would set up a theorem, motivate it, explain the strategy, and execute it flawlessly; even after twenty years of teaching my own classes, I have yet to figure out his secret of somehow always being able to arrive at the natural finale of a mathematical presentation at the end of each hour without having to improvise at least a little bit halfway through the lecture. The clear and self-contained nature of his lectures (and his many books) was a large reason why I decided to specialise as a graduate student in harmonic analysis (though I would eventually return to other interests, such as analytic number theory, many years after my graduate studies).

Looking back at my time with Eli, I now realise that he was extraordinarily patient and understanding with the brash and naive teenager he had to meet with every week. A key turning point in my own career came after my oral qualifying exams, in which I very nearly failed due to my overconfidence and lack of preparation, particularly in my chosen specialty of harmonic analysis. After the exam, he sat down with me and told me, as gently and diplomatically as possible, that my performance was a disappointment, and that I seriously needed to solidify my mathematical knowledge. This turned out to be exactly what I needed to hear; I got motivated to actually work properly so as not to disappoint my advisor again.

So many of us in the field of harmonic analysis were connected to Eli in one way or another; the field always felt to me like a large extended family, with Eli as one of the patriarchs. He will be greatly missed.

[UPDATE: Here is Princeton’s obituary for Elias Stein.]

A basic problem in harmonic analysis (as well as in linear algebra, random matrix theory, and high-dimensional geometry) is to estimate the operator norm $\|T\|_{op}$ of a linear map $T: H \to H'$ between two Hilbert spaces $H, H'$, which we will take to be complex for sake of discussion. Even the finite-dimensional case $T: \mathbb{C}^m \to \mathbb{C}^n$ is of interest, as this operator norm is the same as the largest singular value $\sigma_1(A)$ of the $n \times m$ matrix $A$ associated to $T$.

In general, this operator norm is hard to compute precisely, except in special cases. One such special case is that of a *diagonal operator*, such as that associated to an $n \times n$ diagonal matrix $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$. In this case, the operator norm is simply the supremum norm of the diagonal coefficients:

$$\|D\|_{op} = \sup_{1 \leq i \leq n} |\lambda_i|. \qquad (1)$$

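A variant of (1) is Schur’s test, which for simplicity we will phrase in the setting of finite-dimensional operators $T: \mathbb{C}^m \to \mathbb{C}^n$ given by a matrix $A = (a_{ij})_{1 \leq i \leq n; 1 \leq j \leq m}$ via the usual formula

$$T (x_j)_{j=1}^m := \Big( \sum_{j=1}^m a_{ij} x_j \Big)_{i=1}^n.$$

A simple version of this test is as follows: if all the absolute row sums and column sums of $A$ are bounded by some constant $M$, thus

$$\sum_{j=1}^m |a_{ij}| \leq M \qquad (2)$$

for all $1 \leq i \leq n$ and

$$\sum_{i=1}^n |a_{ij}| \leq M \qquad (3)$$

for all $1 \leq j \leq m$, then

$$\|T\|_{op} = \|A\|_{op} \leq M \qquad (4)$$

(note that this generalises (the upper bound in) (1).) Indeed, to see (4), it suffices by duality and homogeneity to show that

$$\Big| \sum_{i=1}^n \Big( \sum_{j=1}^m a_{ij} x_j \Big) \overline{y_i} \Big| \leq M$$

whenever $(x_j)_{j=1}^m$ and $(y_i)_{i=1}^n$ are sequences with $\sum_{j=1}^m |x_j|^2 = \sum_{i=1}^n |y_i|^2 = 1$; but this easily follows from the arithmetic mean-geometric mean inequality

$$|a_{ij} x_j \overline{y_i}| \leq \frac{1}{2} |a_{ij}| |x_j|^2 + \frac{1}{2} |a_{ij}| |y_i|^2,$$

after summing in $i$ and $j$ and using (3) for the first term and (2) for the second.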

Schur’s test (4) (and its many generalisations to weighted situations, or to Lebesgue or Lorentz spaces) is particularly useful for controlling operators in which the role of oscillation (as reflected in the *phase* of the coefficients $a_{ij}$, as opposed to just their magnitudes $|a_{ij}|$) is not decisive. However, it is of limited use in situations that involve a lot of cancellation. For this, a different test, known as the Cotlar-Stein lemma, is much more flexible and powerful. It can be viewed in a sense as a non-commutative variant of Schur’s test (4) (or of (1)), in which the scalar coefficients $\lambda_i$ or $a_{ij}$ are replaced by operators instead.
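As a quick numerical sanity check of Schur’s test (4), and of its weakness in the presence of cancellation, here is a minimal Python sketch using only the standard library. The helper names `schur_constant` and `op_norm_lower_bound` are illustrative choices, not taken from any library, and the sketch assumes real matrices stored as lists of rows. The operator norm is only estimated, by power iteration on $A^T A$; since $\|Av\|/\|v\| \leq \|A\|_{op}$ for every nonzero $v$, the estimate can only err on the low side.

```python
import random


def schur_constant(a):
    """Smallest M bounding every absolute row sum (2) and column sum (3)."""
    rows = max(sum(abs(x) for x in row) for row in a)
    cols = max(sum(abs(row[j]) for row in a) for j in range(len(a[0])))
    return max(rows, cols)


def op_norm_lower_bound(a, iters=200, seed=0):
    """Estimate ||A||_op of a real matrix (list of rows) by power iteration.

    Each iterate v is a unit vector, so ||Av|| never exceeds ||A||_op; the
    running maximum therefore approaches ||A||_op from below.
    """
    n, m = len(a), len(a[0])
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(m)]
    best = 0.0
    for _ in range(iters):
        length = sum(x * x for x in v) ** 0.5
        if length == 0.0:  # v fell into the kernel of A^T A; stop early
            break
        v = [x / length for x in v]
        av = [sum(a[i][j] * v[j] for j in range(m)) for i in range(n)]
        best = max(best, sum(x * x for x in av) ** 0.5)
        v = [sum(a[i][j] * av[i] for i in range(n)) for j in range(m)]  # v <- A^T A v
    return best


# Diagonal case (1): both quantities equal max |lambda_i| = 5.
d = [[3, 0, 0], [0, -5, 0], [0, 0, 2]]
print(schur_constant(d), round(op_norm_lower_bound(d), 9))

# A 4x4 Hadamard matrix: Schur's test only gives 4, but the true norm is 2,
# because the signs of the entries cancel.
h = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]
print(schur_constant(h), round(op_norm_lower_bound(h), 9))
```

The Hadamard example is exactly the kind of oscillating matrix for which Schur’s test loses a factor, since it only sees the magnitudes $|a_{ij}|$.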

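To illustrate the basic flavour of the result, let us return to the bound (1), and now consider instead a *block-diagonal* matrix

$$A = \begin{pmatrix} \Lambda_1 & 0 & \ldots & 0 \\ 0 & \Lambda_2 & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & \Lambda_n \end{pmatrix} \qquad (5)$$

where each $\Lambda_i$ is now an $m_i \times m_i$ matrix, and so $A$ is an $m \times m$ matrix with $m := m_1 + \ldots + m_n$. Then we have

$$\|A\|_{op} = \sup_{1 \leq i \leq n} \|\Lambda_i\|_{op}. \qquad (6)$$

Indeed, the lower bound is trivial (as can be seen by testing $A$ on vectors which are supported on the $i^{\text{th}}$ block of coordinates), while to establish the upper bound, one can make use of the orthogonal decomposition

$$\mathbb{C}^m \equiv \bigoplus_{i=1}^n \mathbb{C}^{m_i} \qquad (7)$$

to decompose an arbitrary vector $x \in \mathbb{C}^m$ as

$$x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$$

with $x_i \in \mathbb{C}^{m_i}$, in which case we have

$$A x = \begin{pmatrix} \Lambda_1 x_1 \\ \Lambda_2 x_2 \\ \vdots \\ \Lambda_n x_n \end{pmatrix}$$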

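and the upper bound in (6) then follows from a simple computation: $\|Ax\|^2 = \sum_{i=1}^n \|\Lambda_i x_i\|^2 \leq \big(\sup_{1 \leq i \leq n} \|\Lambda_i\|_{op}\big)^2 \sum_{i=1}^n \|x_i\|^2 = \big(\sup_{1 \leq i \leq n} \|\Lambda_i\|_{op}\big)^2 \|x\|^2$.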

The operator $T$ associated to the matrix $A$ in (5) can be viewed as a sum $T = \sum_{i=1}^n T_i$, where each $T_i$ corresponds to the $\Lambda_i$ block of $A$, in which case (6) can also be written as

$$\|T\|_{op} = \sup_{1 \leq i \leq n} \|T_i\|_{op}. \qquad (8)$$

When $n$ is large, this is a significant improvement over the triangle inequality, which merely gives

$$\|T\|_{op} \leq \sum_{1 \leq i \leq n} \|T_i\|_{op}.$$

The reason for this gain can ultimately be traced back to the “orthogonality” of the $T_i$; that they “occupy different columns” and “different rows” of the range and domain of $T$. This is obvious when viewed in the matrix formalism, but can also be described in the more abstract Hilbert space operator formalism via the identities

$$T_i^* T_j = 0 \qquad (9)$$

and

$$T_i T_j^* = 0 \qquad (10)$$

whenever $i \neq j$. (The first identity asserts that the ranges of the $T_i$ are orthogonal to each other, and the second asserts that the coranges of the $T_i$ (the ranges of the adjoints $T_i^*$) are orthogonal to each other.) By replacing (7) with a more abstract orthogonal decomposition into these ranges and coranges, one can in fact deduce (8) directly from (9) and (10).

The *Cotlar-Stein lemma* is an extension of this observation to the case where the are merely *almost orthogonal* rather than *orthogonal*, in a manner somewhat analogous to how Schur’s test (partially) extends (1) to the non-diagonal case. Specifically, we have

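Lemma 1 (Cotlar-Stein lemma). Let $T_1, \ldots, T_n: H \to H'$ be a finite sequence of bounded linear operators from one Hilbert space $H$ to another $H'$, obeying the bounds

$$\sum_{j=1}^n \|T_i T_j^*\|_{op}^{1/2} \leq M \qquad (11)$$

and

$$\sum_{j=1}^n \|T_i^* T_j\|_{op}^{1/2} \leq M \qquad (12)$$

for all $i = 1, \ldots, n$ and some $M > 0$ (compare with (2), (3)). Then one has

$$\Big\| \sum_{i=1}^n T_i \Big\|_{op} \leq M. \qquad (13)$$

Note from the basic $TT^*$ identity

$$\|T\|_{op} = \|TT^*\|_{op}^{1/2} = \|T^* T\|_{op}^{1/2} \qquad (14)$$

that the hypothesis (11) (or (12)) already gives the bound

$$\|T_i\|_{op} \leq M \qquad (15)$$

on each component $T_i$ of $T := \sum_{i=1}^n T_i$, which by the triangle inequality gives the inferior bound

$$\|T\|_{op} \leq nM;$$

the point of the Cotlar-Stein lemma is that the dependence on $n$ in this bound is eliminated in (13), which in particular makes the bound suitable for extension to the limit $n \to \infty$ (see Remark 1 below).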

The Cotlar-Stein lemma was first established by Cotlar in the special case of commuting self-adjoint operators, and then independently by Cotlar and Stein in full generality, with the proof appearing in a subsequent paper of Knapp and Stein.

The Cotlar-Stein lemma is often useful in controlling operators $T$ such as singular integral operators or pseudo-differential operators which “do not mix scales together too much”, in that such operators map functions “that oscillate at a given scale $2^{-i}$” to functions that still mostly oscillate at the same scale $2^{-i}$. In that case, one can often split $T$ into components $T_i$ which essentially capture the scale $2^{-i}$ behaviour, and understanding $L^2$ boundedness properties of $T$ then reduces to establishing the boundedness of the simpler operators $T_i$ (and of establishing a sufficient decay in products such as $T_i^* T_j$ or $T_i T_j^*$ when $i$ and $j$ are separated from each other). In some cases, one can use Fourier-analytic tools such as Littlewood-Paley projections to generate the $T_i$, but the true power of the Cotlar-Stein lemma comes from situations in which the Fourier transform is not suitable, such as when one has a complicated domain (e.g. a manifold or a non-abelian Lie group), or very rough coefficients (which would then have badly behaved Fourier behaviour). One can then select the decomposition $T = \sum_i T_i$ in a fashion that is tailored to the particular operator $T$, and is not necessarily dictated by Fourier-analytic considerations.

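Once one is in the almost orthogonal setting, as opposed to the genuinely orthogonal setting, the previous arguments based on orthogonal projection seem to fail completely. Instead, the proof of the Cotlar-Stein lemma proceeds via an elegant application of the tensor power trick (or perhaps more accurately, the power method), in which the operator norm of $T$ is understood through the operator norm of a large power of $T$ (or more precisely, of its self-adjoint square $TT^*$ or $T^*T$). Indeed, from an iteration of (14) we see that for any natural number $m$, one has

$$\|T\|_{op}^{2m} = \|(TT^*)^m\|_{op}. \qquad (16)$$

To estimate the right-hand side, we expand out the right-hand side and apply the triangle inequality to bound it by

$$\sum_{i_1, j_1, \ldots, i_m, j_m \in \{1,\ldots,n\}} \|T_{i_1} T_{j_1}^* T_{i_2} T_{j_2}^* \ldots T_{i_m} T_{j_m}^*\|_{op}. \qquad (17)$$

Recall that when we applied the triangle inequality directly to $T$, we lost a factor of $n$ in the final estimate; it will turn out that we will lose a similar factor here, but this factor will eventually be attenuated into nothingness by the tensor power trick.

To bound (17), we use the basic inequality $\|ST\|_{op} \leq \|S\|_{op} \|T\|_{op}$ in two different ways. If we group the product $T_{i_1} T_{j_1}^* T_{i_2} T_{j_2}^* \ldots T_{i_m} T_{j_m}^*$ in pairs, we can bound the summand of (17) by

$$\|T_{i_1} T_{j_1}^*\|_{op} \ldots \|T_{i_m} T_{j_m}^*\|_{op}.$$

On the other hand, we can group the product by pairs in another way, to obtain the bound of

$$\|T_{i_1}\|_{op} \|T_{j_1}^* T_{i_2}\|_{op} \ldots \|T_{j_{m-1}}^* T_{i_m}\|_{op} \|T_{j_m}^*\|_{op}.$$

We bound $\|T_{i_1}\|_{op}$ and $\|T_{j_m}^*\|_{op}$ crudely by $M$ using (15). Taking the geometric mean of the above bounds, we can thus bound (17) by

$$M \sum_{i_1, j_1, \ldots, i_m, j_m \in \{1,\ldots,n\}} \|T_{i_1} T_{j_1}^*\|_{op}^{1/2} \|T_{j_1}^* T_{i_2}\|_{op}^{1/2} \ldots \|T_{j_{m-1}}^* T_{i_m}\|_{op}^{1/2} \|T_{i_m} T_{j_m}^*\|_{op}^{1/2}.$$

If we then sum this series first in $j_m$, then in $i_m$, then moving back all the way to $i_1$, using (11) and (12) alternately, we obtain a final bound of

$$n M^{2m}$$

for (16). Taking $2m^{\text{th}}$ roots, we obtain

$$\|T\|_{op} \leq n^{1/2m} M.$$

Sending $m \to \infty$, we obtain the claim.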
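The following Python sketch checks Lemma 1 numerically in the simplest possible setting, assumed here purely for convenience: real $2 \times 2$ matrices, for which the operator norm (largest singular value) has an exact closed form $\sigma_1^2 = \frac{1}{2}\big(s + \sqrt{s^2 - 4\det^2}\big)$, with $s$ the squared Frobenius norm. It computes the smallest $M$ allowed by (11) and (12), compares it with $\|\sum_i T_i\|_{op}$, and also evaluates both sides of the power-method identity (16) for $m = 2$. All function names are hypothetical helpers for this illustration.

```python
import math
import random


def mul(x, y):
    """Product of two 2x2 matrices stored as [[a, b], [c, d]]."""
    return [[x[0][0] * y[0][0] + x[0][1] * y[1][0], x[0][0] * y[0][1] + x[0][1] * y[1][1]],
            [x[1][0] * y[0][0] + x[1][1] * y[1][0], x[1][0] * y[0][1] + x[1][1] * y[1][1]]]


def adj(x):
    """Adjoint of a real 2x2 matrix, i.e. its transpose."""
    return [[x[0][0], x[1][0]], [x[0][1], x[1][1]]]


def total(ts):
    """Entrywise sum T = T_1 + ... + T_n."""
    return [[sum(t[i][j] for t in ts) for j in range(2)] for i in range(2)]


def op_norm(x):
    """Exact largest singular value of a real 2x2 matrix.

    s = sigma_1^2 + sigma_2^2 (squared Frobenius norm) and |det| = sigma_1 * sigma_2,
    so sigma_1^2 is the larger root of z^2 - s z + det^2 = 0.
    """
    s = sum(v * v for row in x for v in row)
    det = x[0][0] * x[1][1] - x[0][1] * x[1][0]
    return math.sqrt((s + math.sqrt(max(s * s - 4 * det * det, 0.0))) / 2)


def cotlar_stein_constant(ts):
    """Smallest M for which the hypotheses (11) and (12) both hold."""
    m11 = max(sum(math.sqrt(op_norm(mul(ti, adj(tj)))) for tj in ts) for ti in ts)
    m12 = max(sum(math.sqrt(op_norm(mul(adj(ti), tj))) for tj in ts) for ti in ts)
    return max(m11, m12)


# Orthogonal pieces, as in (9) and (10): the bound (13) is attained.
t1, t2 = [[2.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 3.0]]
print(cotlar_stein_constant([t1, t2]), op_norm(total([t1, t2])))

# A random, non-orthogonal family still obeys (13).
rng = random.Random(0)
ts = [[[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(2)] for _ in range(6)]
print(op_norm(total(ts)) <= cotlar_stein_constant(ts) + 1e-9)

# The power-method identity (16) with m = 2: ||T||^4 = ||(T T^*)^2||.
tt = mul(ts[0], adj(ts[0]))
print(op_norm(ts[0]) ** 4, op_norm(mul(tt, tt)))
```

Taking all the $T_i$ equal to a single matrix makes both sides of (13) equal to $n\|T_1\|_{op}$, which is a convenient way to see the sharpness of the bound numerically.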

Remark 1. As observed in a number of places (see e.g. page 318 of Stein’s book, or this paper of Comech), the Cotlar-Stein lemma can be extended to infinite sums $\sum_{i=1}^\infty T_i$ (with the obvious changes to the hypotheses (11), (12)). Indeed, one can show that for any $f \in H$, the sum $\sum_{i=1}^\infty T_i f$ is unconditionally convergent in $H'$ (and furthermore has bounded $2$-variation), and the resulting operator $\sum_{i=1}^\infty T_i$ is a bounded linear operator with an operator norm bound of $M$.

Remark 2. If we specialise to the case where all the $T_i$ are equal to a single operator $S$, we see that the bound in the Cotlar-Stein lemma is sharp, at least in this case: one can take $M = n\|S\|_{op}$ in (11), (12), while $\|\sum_{i=1}^n T_i\|_{op} = n\|S\|_{op}$. Thus we see how the tensor power trick can convert an inefficient argument, such as that obtained using the triangle inequality or crude bounds such as (15), into an efficient one.

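Remark 3. One can prove Schur’s test by a similar method. Indeed, starting from the inequality

$$\|A\|_{op}^{2k} \leq \mathrm{tr}((AA^*)^k)$$

(which follows easily from the singular value decomposition), we can bound $\|A\|_{op}^{2k}$ by

$$\sum_{i_1, \ldots, i_k \in \{1,\ldots,n\}} \sum_{j_1, \ldots, j_k \in \{1,\ldots,m\}} |a_{i_1 j_1}| |a_{i_2 j_1}| |a_{i_2 j_2}| |a_{i_3 j_2}| \ldots |a_{i_k j_k}| |a_{i_1 j_k}|.$$

Estimating the final factor $|a_{i_1 j_k}|$ in the summand crudely by $M$ (each entry is bounded by its row sum), and then repeatedly summing the indices one at a time as before (first $j_k$ using (2), then $i_k$ using (3), and so on back to $j_1$), we obtain

$$\|A\|_{op}^{2k} \leq n M^{2k},$$

and the claim follows from the tensor power trick as before. On the other hand, in the converse direction, I do not know of any way to prove the Cotlar-Stein lemma that does not basically go through the tensor power argument.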

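If $f: \mathbb{R}^n \to \mathbb{C}$ is a locally integrable function, we define the Hardy-Littlewood maximal function $Mf: \mathbb{R}^n \to \mathbb{R}$ by the formula

$$Mf(x) := \sup_{r > 0} \frac{1}{|B(x,r)|} \int_{B(x,r)} |f(y)|\ dy,$$

where $B(x,r)$ is the ball of radius $r$ centred at $x$, and $|E|$ denotes the measure of a set $E$. The *Hardy-Littlewood maximal inequality* asserts that

$$|\{ x \in \mathbb{R}^n: Mf(x) > \lambda \}| \leq \frac{C_n}{\lambda} \|f\|_{L^1(\mathbb{R}^n)} \qquad (1)$$

for all $f \in L^1(\mathbb{R}^n)$, all $\lambda > 0$, and some constant $C_n > 0$ depending only on $n$. By a standard density argument, this implies in particular that we have the *Lebesgue differentiation theorem*

$$\lim_{r \to 0} \frac{1}{|B(x,r)|} \int_{B(x,r)} f(y)\ dy = f(x)$$

for all $f \in L^1(\mathbb{R}^n)$ and almost every $x \in \mathbb{R}^n$. See for instance my lecture notes on this topic.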
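To make the definition of $Mf$ and the weak-type bound (1) concrete, here is a small Python sketch of a discrete, one-dimensional analogue: $f$ is a finitely supported function on the integers, balls are the sets $\{x-r, \ldots, x+r\}$, and $Mf$ is evaluated only at the points of a finite window. This discrete model is an illustrative assumption, not the continuous setting of the text. In this model the Vitali covering argument gives the weak-type bound $\#\{x: Mf(x) > \lambda\} \leq 3\|f\|_{\ell^1}/\lambda$ (any interval meeting an interval at least as long lies in the threefold enlargement of the latter), and the sketch checks that bound directly.

```python
def maximal_function(f):
    """Centred discrete maximal function of f (zero outside 0..len(f)-1).

    Returns Mf(x) for x = 0, ..., len(f) - 1, where
    Mf(x) = sup_r (1 / (2r + 1)) * sum_{|y - x| <= r} |f(y)|.
    Radii r >= len(f) - 1 already contain the whole support, so larger radii
    only lower the average and can be skipped.
    """
    n = len(f)
    prefix = [0.0]  # prefix[k] = |f(0)| + ... + |f(k-1)|
    for value in f:
        prefix.append(prefix[-1] + abs(value))

    def window_sum(lo, hi):
        lo, hi = max(lo, 0), min(hi, n - 1)  # f vanishes outside the window
        return prefix[hi + 1] - prefix[lo] if lo <= hi else 0.0

    return [max(window_sum(x - r, x + r) / (2 * r + 1) for r in range(n))
            for x in range(n)]


def weak_type_holds(f, lam, constant=3.0):
    """Check #{x in window : Mf(x) > lam} <= constant * ||f||_1 / lam."""
    level_set_size = sum(1 for value in maximal_function(f) if value > lam)
    return level_set_size <= constant * sum(abs(v) for v in f) / lam


# A single spike: at distance d from it, Mf equals 1 / (2d + 1).
print(maximal_function([0, 0, 1, 0, 0]))

f = [((7 * i) % 11) - 5 for i in range(40)]
print(all(weak_type_holds(f, lam) for lam in (0.5, 1.0, 2.0, 4.0)))
```

Restricting attention to a finite window can only shrink the level set, so the check is a genuine (if small) instance of the weak-type inequality rather than an approximation to it.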

By combining the Hardy-Littlewood maximal inequality with the Marcinkiewicz interpolation theorem (and the trivial inequality $\|Mf\|_{L^\infty(\mathbb{R}^n)} \leq \|f\|_{L^\infty(\mathbb{R}^n)}$) we see that

$$\|Mf\|_{L^p(\mathbb{R}^n)} \leq C_{n,p} \|f\|_{L^p(\mathbb{R}^n)} \qquad (2)$$

for all $p > 1$ and $f \in L^p(\mathbb{R}^n)$, and some constant $C_{n,p}$ depending on $n$ and $p$.

The exact dependence of $C_{n,p}$ on $n$ and $p$ is still not completely understood. The standard Vitali-type covering argument used to establish (1) has an exponential dependence on dimension, giving a constant of the form $C_n = C^n$ for some absolute constant $C > 1$. Inserting this into the Marcinkiewicz theorem, one obtains a constant $C_{n,p}$ of the form $C_{n,p} = \frac{C^n}{p-1}$ for some $C > 1$ (and taking $p$ bounded away from infinity, for simplicity). The dependence on $p$ is about right, but the dependence on $n$ should not be exponential.

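In 1982, Stein gave an elegant argument (with full details appearing in a subsequent paper of Stein and Strömberg), based on the Calderón-Zygmund method of rotations, to eliminate the dependence of $C_{n,p}$ on $n$:

Theorem 1. For each $1 < p \leq \infty$, there exists a constant $C_p > 0$ depending only on $p$, such that

$$\|Mf\|_{L^p(\mathbb{R}^n)} \leq C_p \|f\|_{L^p(\mathbb{R}^n)}$$

for all $n \geq 1$ and $f \in L^p(\mathbb{R}^n)$.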

The argument is based on an earlier bound of Stein from 1976 on the *spherical maximal function*

$$M_S f(x) := \sup_{r > 0} |A_r f(x)|$$

where $A_r$ are the spherical averaging operators

$$A_r f(x) := \int_{S^{n-1}} f(x + r\omega)\ d\sigma^{n-1}(\omega)$$

and $d\sigma^{n-1}$ is normalised surface measure on the sphere $S^{n-1}$. Because this is an uncountable supremum, and the averaging operators $A_r$ do not have good continuity properties in $r$, it is not *a priori* obvious that $M_S f$ is even a measurable function for, say, locally integrable $f$; but we can avoid this technical issue, at least initially, by restricting attention to continuous functions $f$. The Stein maximal theorem for the spherical maximal function then asserts that if $n \geq 3$ and $\frac{n}{n-1} < p \leq \infty$, then we have

$$\|M_S f\|_{L^p(\mathbb{R}^n)} \leq C_{n,p} \|f\|_{L^p(\mathbb{R}^n)} \qquad (3)$$

for all (continuous) $f \in L^p(\mathbb{R}^n)$. We will sketch a proof of this theorem below the fold. (Among other things, one can use this bound to show the pointwise convergence $\lim_{r \to 0} A_r f(x) = f(x)$ of the spherical averages for any $f \in L^p(\mathbb{R}^n)$ when $n \geq 3$ and $\frac{n}{n-1} < p \leq \infty$, although we will not focus on this application here.)

The condition $p > \frac{n}{n-1}$ can be seen to be necessary as follows. Take $f$ to be any fixed bump function. A brief calculation then shows that $M_S f(x)$ decays like $|x|^{1-n}$ as $|x| \to \infty$, and hence $M_S f$ does not lie in $L^p(\mathbb{R}^n)$ unless $p > \frac{n}{n-1}$. By taking $f$ to be a rescaled bump function supported on a small ball, one can show that the condition $p > \frac{n}{n-1}$ is necessary even if we replace $\mathbb{R}^n$ with a compact region (and similarly restrict the radius parameter $r$ to be bounded). The condition $n \geq 3$ however is not quite necessary; the result is also true when $n = 2$, but this turned out to be a more difficult result, obtained first by Bourgain, with a simplified proof (based on the local smoothing properties of the wave equation) later given by Mockenhaupt-Seeger-Sogge.
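The “brief calculation” can be sketched as follows, under the illustrative assumption that the bump function satisfies $f \geq 0$ everywhere and $f \geq 1$ on the unit ball $B(0,1)$. For $|x| \geq 2$, the sphere of radius $|x|$ around $x$ passes through the origin, and the directions $\omega$ for which $x + |x|\omega$ lies in $B(0,1)$ form a cap of chordal radius $1/|x|$ around $-x/|x|$, whose normalised measure is comparable to $|x|^{1-n}$. Here $c_n > 0$ denotes a constant depending only on $n$, and $|S^{n-1}|$ the (unnormalised) surface area of the sphere.

```latex
M_S f(x) \geq A_{|x|} f(x)
  \geq \sigma^{n-1}\big( \{ \omega \in S^{n-1} : |x + |x|\omega| < 1 \} \big)
  \geq c_n |x|^{1-n} \qquad (|x| \geq 2),

\int_{|x| \geq 2} |x|^{p(1-n)}\, dx
  = |S^{n-1}| \int_2^\infty \rho^{p(1-n) + n - 1}\, d\rho < \infty
  \iff p(1-n) + n - 1 < -1
  \iff p > \frac{n}{n-1}.
```

So $M_S f \notin L^p(\mathbb{R}^n)$ whenever $p \leq \frac{n}{n-1}$.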

The Hardy-Littlewood maximal operator $Mf$, which involves averaging over balls, is clearly related to the spherical maximal operator, which averages over spheres. Indeed, by using polar co-ordinates, one easily verifies the pointwise inequality

$$Mf(x) \leq M_S |f|(x)$$

for any (continuous) $f$, which intuitively reflects the fact that one can think of a ball as an average of spheres. Thus, we see that the spherical maximal inequality (3) implies the Hardy-Littlewood maximal inequality (2) with the same constant $C_{n,p}$. (This implication is initially only valid for continuous functions, but one can then extend the inequality (2) to the rest of $L^p(\mathbb{R}^n)$ by a standard limiting argument.)
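For the record, the polar co-ordinates computation behind this pointwise inequality runs as follows. Writing the ball average in polar co-ordinates and substituting $\rho = rs$, the ball average becomes an average of the sphere averages $A_{rs}|f|(x)$ against the probability density $n s^{n-1}$ on $[0,1]$, which is the precise sense in which a ball is an average of spheres:

```latex
\frac{1}{|B(x,r)|} \int_{B(x,r)} |f(y)|\, dy
  = \frac{n}{r^n} \int_0^r \rho^{n-1} A_\rho |f|(x)\, d\rho
  = \int_0^1 A_{rs} |f|(x)\, n s^{n-1}\, ds
  \leq \sup_{0 < s \leq 1} A_{rs}|f|(x)
  \leq M_S |f|(x).
```

Taking the supremum in $r > 0$ then gives $Mf(x) \leq M_S|f|(x)$.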

At first glance, this observation does not immediately establish Theorem 1 for two reasons. Firstly, Stein’s spherical maximal theorem is restricted to the case when $n \geq 3$ and $p > \frac{n}{n-1}$; and secondly, the constant $C_{n,p}$ in that theorem still depends on dimension $n$. The first objection can be easily disposed of, for if $p > 1$, then the hypotheses $n \geq 3$ and $p > \frac{n}{n-1}$ will automatically be satisfied for $n$ sufficiently large (depending on $p$); note that the case when $n$ is bounded (with a bound depending on $p$) is already handled by the classical maximal inequality (2).

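We still have to deal with the second objection, namely that the constant $C_{n,p}$ in (3) depends on $n$. However, here we can use the method of rotations to show that the constants $C_{n,p}$ can be taken to be non-increasing (and hence bounded) in $n$. The idea is to view high-dimensional spheres as an average of rotated low-dimensional spheres. We illustrate this with a demonstration that $C_{n+1,p} \leq C_{n,p}$, in the sense that any bound of the form

$$\|M_S f\|_{L^p(\mathbb{R}^n)} \leq A \|f\|_{L^p(\mathbb{R}^n)} \qquad (4)$$

for the $n$-dimensional spherical maximal function, implies the same bound

$$\|M_S f\|_{L^p(\mathbb{R}^{n+1})} \leq A \|f\|_{L^p(\mathbb{R}^{n+1})} \qquad (5)$$

for the $(n+1)$-dimensional spherical maximal function, with exactly the same constant $A$. For any direction $\omega_0 \in S^n \subset \mathbb{R}^{n+1}$, consider the averaging operators

$$M_S^{\omega_0} f(x) := \sup_{r > 0} |A_r^{\omega_0} f(x)|$$

for any continuous $f: \mathbb{R}^{n+1} \to \mathbb{C}$, where

$$A_r^{\omega_0} f(x) := \int_{S^{n-1}} f(x + r U_{\omega_0} \omega)\ d\sigma^{n-1}(\omega)$$

where $U_{\omega_0}$ is some orthogonal transformation mapping the sphere $S^{n-1}$ to the sphere $S^{n-1,\omega_0} := \{ \omega \in S^n: \omega \perp \omega_0 \}$; the exact choice of orthogonal transformation $U_{\omega_0}$ is irrelevant due to the rotation-invariance of surface measure $d\sigma^{n-1}$ on the sphere $S^{n-1}$. A simple application of Fubini’s theorem (after first rotating $\omega_0$ to be, say, the standard unit vector $e_{n+1}$) using (4) then shows that

$$\|M_S^{\omega_0} f\|_{L^p(\mathbb{R}^{n+1})} \leq A \|f\|_{L^p(\mathbb{R}^{n+1})} \qquad (6)$$

uniformly in $\omega_0$. On the other hand, by viewing the $n$-dimensional sphere $S^n$ as an average of the spheres $S^{n-1,\omega_0}$, we have the identity

$$A_r f(x) = \int_{S^n} A_r^{\omega_0} f(x)\ d\sigma^n(\omega_0);$$

indeed, one can deduce this from the uniqueness of Haar measure by noting that both the left-hand side and right-hand side are invariant means of $f$ on the sphere $\{ y \in \mathbb{R}^{n+1}: |y - x| = r \}$. This implies that

$$M_S f(x) \leq \int_{S^n} M_S^{\omega_0} f(x)\ d\sigma^n(\omega_0)$$

and thus by Minkowski’s inequality for integrals, we may deduce (5) from (6).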
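Spelled out, this final step is the following chain. The first inequality is the pointwise bound $M_S f(x) \leq \int_{S^n} M_S^{\omega_0} f(x)\, d\sigma^n(\omega_0)$, the second is Minkowski’s inequality for integrals (valid since $p \geq 1$), and the last uses (6) together with the fact that $\sigma^n$ is a probability measure:

```latex
\|M_S f\|_{L^p(\mathbb{R}^{n+1})}
  \leq \Big\| \int_{S^n} M_S^{\omega_0} f\, d\sigma^n(\omega_0) \Big\|_{L^p(\mathbb{R}^{n+1})}
  \leq \int_{S^n} \|M_S^{\omega_0} f\|_{L^p(\mathbb{R}^{n+1})}\, d\sigma^n(\omega_0)
  \leq A \|f\|_{L^p(\mathbb{R}^{n+1})}.
```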

Remark 1. Unfortunately, the method of rotations does not work to show that the constant $C_n$ for the weak $L^1$ inequality (1) is independent of dimension, as the weak $L^1$ quasinorm $\|\cdot\|_{L^{1,\infty}}$ is not a genuine norm and does not obey the Minkowski inequality for integrals. Indeed, the question of whether $C_n$ in (1) can be taken to be independent of dimension remains open. The best known positive result is due to Stein and Strömberg, who showed that one can take $C_n = Cn$ for some absolute constant $C$, by comparing the Hardy-Littlewood maximal function with the heat kernel maximal function

$$\sup_{t > 0} |e^{t\Delta} |f|(x)|.$$

The abstract semigroup maximal inequality of Dunford and Schwartz (discussed for instance in these lecture notes of mine) shows that the heat kernel maximal function is of weak-type $(1,1)$ with a constant of $O(1)$, and this can be used, together with a comparison argument, to give the Stein-Strömberg bound. In the converse direction, it is a recent result of Aldaz that if one replaces the balls $B(x,r)$ with cubes, then the weak $(1,1)$ constant $C_n$ must go to infinity as $n \to \infty$.

Suppose one has a measure space $X = (X, \mathcal{B}, \mu)$ and a sequence of operators $T_n: L^p(X) \to L^p(X)$ that are bounded on some $L^p(X)$ space, with $1 \leq p < \infty$. Suppose that on some dense subclass of functions $f$ in $L^p(X)$ (e.g. continuous compactly supported functions, if the space $X$ is reasonable), one already knows that $T_n f$ converges pointwise almost everywhere to some limit $Tf$, for another bounded operator $T: L^p(X) \to L^p(X)$ (e.g. $T$ could be the identity operator). What additional ingredient does one need to pass to the limit and conclude that $T_n f$ converges almost everywhere to $Tf$ for *all* $f$ in $L^p(X)$ (and not just for $f$ in a dense subclass)?

One standard way to proceed here is to study the *maximal operator*

$$T_* f(x) := \sup_n |T_n f(x)|$$

and aim to establish a *weak-type maximal inequality*

$$\|T_* f\|_{L^{p,\infty}(X)} \leq C \|f\|_{L^p(X)} \qquad (1)$$

for all $f \in L^p(X)$ (or all $f$ in the dense subclass), and some constant $C$, where $L^{p,\infty}$ is the weak $L^p$ norm

$$\|f\|_{L^{p,\infty}(X)} := \sup_{t > 0} t\, \mu(\{ x \in X: |f(x)| \geq t \})^{1/p}.$$

A standard approximation argument using (1) then shows that $T_n f$ will now indeed converge to $Tf$ pointwise almost everywhere for all $f$ in $L^p(X)$, and not just in the dense subclass. See for instance these lecture notes of mine, in which this method is used to deduce the Lebesgue differentiation theorem from the Hardy-Littlewood maximal inequality. This is by now a very standard approach to establishing pointwise almost everywhere convergence theorems, but it is natural to ask whether it is strictly necessary. In particular, is it possible to have a pointwise convergence result without being able to obtain a weak-type maximal inequality of the form (1)?

In the case of *norm* convergence (in which one asks for $T_n f$ to converge to $Tf$ in the $L^p$ norm, rather than in the pointwise almost everywhere sense), the answer is no, thanks to the uniform boundedness principle, which among other things shows that norm convergence is only possible if one has the uniform bound

$$\sup_n \|T_n f\|_{L^p(X)} \leq C \|f\|_{L^p(X)} \qquad (2)$$

for some $C > 0$ and all $f$ in $L^p(X)$; and conversely, if one has the uniform bound, and one has already established norm convergence of $T_n f$ to $Tf$ on a dense subclass of $L^p(X)$, (2) will extend that norm convergence to all of $L^p(X)$.

Returning to pointwise almost everywhere convergence, the answer in general is “yes”: pointwise convergence can hold without (1). Consider for instance the rank one operators

$$T_n f(x) := 1_{[n,n+1]}(x) \int_0^1 f(y)\ dy$$

from $L^1(\mathbb{R})$ to $L^1(\mathbb{R})$. It is clear that $T_n f$ converges pointwise almost everywhere to zero as $n \to \infty$ for any $f \in L^1(\mathbb{R})$, and the operators $T_n$ are uniformly bounded on $L^1(\mathbb{R})$, but the maximal function $T_*$ does not obey (1). One can modify this example in a number of ways to defeat almost any reasonable conjecture that something like (1) should be necessary for pointwise almost everywhere convergence.

In spite of this, a remarkable observation of Stein, now known as *Stein’s maximal principle*, asserts that the maximal inequality *is* necessary to prove pointwise almost everywhere convergence, if one is working on a compact group and the operators are translation invariant, and if the exponent $p$ is at most $2$:

Theorem 1 (Stein maximal principle). Let $G$ be a compact group, let $X$ be a homogeneous space of $G$ with a finite Haar measure $\mu$, let $1 \leq p \leq 2$, and let $T_n: L^p(X) \to L^p(X)$ be a sequence of bounded linear operators commuting with translations, such that $T_n f$ converges pointwise almost everywhere for each $f \in L^p(X)$. Then (1) holds.

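This is not quite the most general version of the principle; some additional variants and generalisations are given in the original paper of Stein. For instance, one can replace the discrete sequence $T_n$ of operators with a continuous family $T_t$ without much difficulty. As a typical application of this principle, we see that Carleson’s celebrated theorem that the partial Fourier series $\sum_{n=-N}^N \hat f(n) e^{2\pi i n x}$ of an $L^2(\mathbb{R}/\mathbb{Z})$ function $f: \mathbb{R}/\mathbb{Z} \to \mathbb{C}$ converge almost everywhere is in fact equivalent to the estimate

$$\Big\| \sup_{N > 0} \Big| \sum_{n=-N}^N \hat f(n) e^{2\pi i n \cdot} \Big| \Big\|_{L^{2,\infty}(\mathbb{R}/\mathbb{Z})} \leq C \|f\|_{L^2(\mathbb{R}/\mathbb{Z})}. \qquad (3)$$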

And unsurprisingly, most of the proofs of this (difficult) theorem have proceeded by first establishing (3), and Stein’s maximal principle strongly suggests that this is the optimal way to try to prove this theorem.

On the other hand, the theorem does fail for $p > 2$, and almost everywhere convergence results in $L^p$ for $p > 2$ can be proven by other methods than weak $(p,p)$ estimates. For instance, the convergence of Bochner-Riesz multipliers in $L^p(\mathbb{R}^n)$ for any $n$ (and for $p$ in the range predicted by the Bochner-Riesz conjecture) was verified for $p > 2$ by Carbery, Rubio de Francia, and Vega, despite the fact that the weak $(p,p)$ boundedness of even a *single* Bochner-Riesz multiplier, let alone the maximal function, has still not been completely verified in this range. (Carbery, Rubio de Francia and Vega use weighted $L^2$ estimates for the maximal Bochner-Riesz operator, rather than $L^p$ type estimates.) For $p \leq 2$, though, Stein’s principle (after localising to a torus) does apply, and pointwise almost everywhere convergence of Bochner-Riesz means is equivalent to the weak $(p,p)$ estimate (1).

Stein’s principle is restricted to compact groups (such as the torus $(\mathbb{R}/\mathbb{Z})^n$ or the rotation group $SO(n)$) and their homogeneous spaces (such as the torus again, or the sphere $S^{n-1}$). As stated, the principle fails in the noncompact setting; for instance, in $\mathbb{R}$, the convolution operators $T_n f := f * 1_{[n,n+1]}$ are such that $T_n f$ converges pointwise almost everywhere to zero for every $f \in L^1(\mathbb{R})$, but the maximal function is not of weak-type $(1,1)$. However, in many applications on non-compact domains, the $T_n$ are “localised” enough that one can transfer from a non-compact setting to a compact setting and then apply Stein’s principle. For instance, Carleson’s theorem on the real line $\mathbb{R}$ is equivalent to Carleson’s theorem on the circle $\mathbb{R}/\mathbb{Z}$ (due to the localisation of the Dirichlet kernels), which as discussed before is equivalent to the estimate (3) on the circle, which by a scaling argument is equivalent to the analogous estimate on the real line $\mathbb{R}$.

Stein’s argument from his 1961 paper can be viewed nowadays as an application of the probabilistic method; starting with a sequence of increasingly bad counterexamples to the maximal inequality (1), one randomly combines them together to create a single “infinitely bad” counterexample. To make this idea work, Stein employs two basic ideas:

- The *random rotations (or random translations) trick*. Given a subset $E$ of $X$ of small but positive measure, one can randomly select about $\mu(X)/\mu(E)$ translates of $E$ that cover most of $X$.
- The *random sums trick*. Given a collection $f_1, \ldots, f_n$ of signed functions that may possibly cancel each other in a deterministic sum $\sum_{i=1}^n f_i$, one can perform a random sum $\sum_{i=1}^n \pm f_i$ instead to obtain a random function whose magnitude will usually be comparable to the square function $(\sum_{i=1}^n |f_i|^2)^{1/2}$; this can be made rigorous by concentration of measure results, such as Khintchine’s inequality.
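A minimal Python sketch of the random sums trick in its simplest case, assumed here for illustration: real numbers $a_1, \ldots, a_k$ in place of the functions $f_i$. Averaging over all $2^k$ sign patterns $\epsilon$, the cross terms $\epsilon_i \epsilon_j a_i a_j$ with $i \neq j$ cancel exactly, so $\mathbb{E}|\sum_i \epsilon_i a_i|^2 = \sum_i |a_i|^2$, even when the deterministic sum vanishes. For real sums one also has $\mathbb{E}|\sum_i \epsilon_i a_i|^4 \leq 3(\sum_i |a_i|^2)^2$, and the Paley-Zygmund inequality then shows that at least $3/16$ of the sign patterns give $|\sum_i \epsilon_i a_i|$ larger than half the square function. The sketch checks both facts by brute-force enumeration; the function names are illustrative.

```python
import math
from itertools import product


def sign_sums(values):
    """All 2^k random sums sum_i eps_i * a_i, one per sign pattern eps."""
    return [sum(e * a for e, a in zip(signs, values))
            for signs in product((1, -1), repeat=len(values))]


def mean_square(values):
    """Exact average of |sum_i eps_i a_i|^2 over all sign patterns."""
    sums = sign_sums(values)
    return sum(s * s for s in sums) / len(sums)


def square_function(values):
    """(sum_i |a_i|^2)^(1/2)."""
    return math.sqrt(sum(a * a for a in values))


def fraction_large(values):
    """Fraction of sign patterns with |sum_i eps_i a_i| > square_function / 2."""
    threshold = square_function(values) / 2
    sums = sign_sums(values)
    return sum(1 for s in sums if abs(s) > threshold) / len(sums)


a = [1, -1, 1, -1]  # the deterministic sum cancels completely...
print(sum(a), mean_square(a), square_function(a))  # ...but random sums do not
print(fraction_large(a) >= 3 / 16)
```

Brute-force enumeration keeps the example exact; for larger $k$ one would sample sign patterns instead, at the cost of only approximate averages.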

These ideas have since been used repeatedly in harmonic analysis. For instance, I used the random rotations trick in a recent paper with Jordan Ellenberg and Richard Oberlin on Kakeya-type estimates in finite fields. The random sums trick is by now a standard tool to build various counterexamples to estimates (or to convergence results) in harmonic analysis, for instance being used by Fefferman in his famous paper disproving the boundedness of the ball multiplier on $L^p(\mathbb{R}^n)$ for $p \neq 2$, $n \geq 2$. Another use of the random sum trick is to show that Theorem 1 fails once $p > 2$; see Stein’s original paper for details.

Another use of the random rotations trick, closely related to Theorem 1, is the *Nikishin-Stein factorisation theorem*. Here is Stein’s formulation of this theorem:

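Theorem 2 (Stein factorisation theorem). Let $G$ be a compact group, let $X$ be a homogeneous space of $G$ with a finite Haar measure $\mu$, let $1 \leq p \leq 2$ and $0 < q \leq \infty$, and let $T: L^p(X) \to L^q(X)$ be a bounded linear operator commuting with translations and obeying the estimate

$$\|Tf\|_{L^q(X)} \leq A \|f\|_{L^p(X)}$$

for all $f \in L^p(X)$ and some $A > 0$. Then $T$ also maps $L^p(X)$ to $L^{p,\infty}(X)$, with

$$\|Tf\|_{L^{p,\infty}(X)} \leq C_{p,q,\mu(X)} A \|f\|_{L^p(X)}$$

for all $f \in L^p(X)$, with $C_{p,q,\mu(X)}$ depending only on $p$, $q$, and $\mu(X)$.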

This result is trivial with $q \geq p$, but becomes useful when $q < p$. In this regime, the translation invariance allows one to freely “upgrade” a strong-type $(p,q)$ result to a weak-type $(p,p)$ result. In other words, bounded linear operators from $L^p(X)$ to $L^q(X)$ automatically factor through the inclusion $L^{p,\infty}(X) \subset L^q(X)$, which helps explain the name “factorisation theorem”. Factorisation theory has been developed further by many authors, including Maurey and Pisier.

Stein’s factorisation theorem (or more precisely, a variant of it) is useful in the theory of Kakeya and restriction theorems in Euclidean space, as first observed by Bourgain.

In 1970, Nikishin obtained the following generalisation of Stein’s factorisation theorem in which the translation-invariance hypothesis can be dropped, at the cost of excluding a set of small measure:

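Theorem 3 (Nikishin-Stein factorisation theorem). Let $(X, \mu)$ be a finite measure space, let $1 \leq p \leq 2$ and $0 < q \leq \infty$, and let $T: L^p(X) \to L^q(X)$ be a bounded linear operator obeying the estimate

$$\|Tf\|_{L^q(X)} \leq A \|f\|_{L^p(X)}$$

for all $f \in L^p(X)$ and some $A > 0$. Then for any $\epsilon > 0$, there exists a subset $X_\epsilon$ of $X$ of measure at most $\epsilon$ such that

$$\|Tf\|_{L^{p,\infty}(X \backslash X_\epsilon)} \leq C_{p,q,\mu(X),\epsilon} A \|f\|_{L^p(X)}$$

for all $f \in L^p(X)$, where $C_{p,q,\mu(X),\epsilon}$ depends only on $p$, $q$, $\mu(X)$, and $\epsilon$.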

One can recover Theorem 2 from Theorem 3 by an averaging argument to eliminate the exceptional set; we omit the details.

In a few weeks, Princeton University will host a conference in Analysis and Applications in honour of the 80th birthday of Elias Stein (though, technically, Eli’s 80th birthday was actually in January). As one of Eli’s students, I was originally scheduled to be one of the speakers at this conference; but unfortunately, for family reasons I will be unable to attend. In lieu of speaking at this conference, I have decided to devote some space on this blog for this month to present some classic results of Eli from his many decades of work in harmonic analysis, ergodic theory, several complex variables, and related topics. My choice of selections here will be a personal and idiosyncratic one; the results I present are not necessarily the “best” or “deepest” of his results, but are ones that I find particularly elegant and appealing. (There will also inevitably be some overlap here with Charlie Fefferman’s article “Selected theorems by Eli Stein“, which not coincidentally was written for Stein’s 60th birthday conference in 1991.)

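In this post I would like to describe one of Eli Stein’s very first results that is still used extremely widely today, namely his interpolation theorem from 1956 (and its refinement, the Fefferman-Stein interpolation theorem from 1972). This is a deceptively innocuous, yet remarkably powerful, generalisation of the classic Riesz-Thorin interpolation theorem which uses methods from complex analysis (and in particular, the Lindelöf theorem or the Phragmén-Lindelöf principle) to show that if a linear operator $T: L^{p_0}(X) + L^{p_1}(X) \to L^{q_0}(Y) + L^{q_1}(Y)$ from one ($\sigma$-finite) measure space $X = (X, \mathcal{B}, \mu)$ to another $Y = (Y, \mathcal{C}, \nu)$ obeyed the estimates

$$\|Tf\|_{L^{q_0}(Y)} \leq B_0 \|f\|_{L^{p_0}(X)} \qquad (1)$$

for all $f \in L^{p_0}(X)$ and

$$\|Tf\|_{L^{q_1}(Y)} \leq B_1 \|f\|_{L^{p_1}(X)} \qquad (2)$$

for all $f \in L^{p_1}(X)$, where $1 \leq p_0, p_1, q_0, q_1 \leq \infty$ and $B_0, B_1 > 0$, then one automatically also has the interpolated estimates

$$\|Tf\|_{L^{q_\theta}(Y)} \leq B_0^{1-\theta} B_1^\theta \|f\|_{L^{p_\theta}(X)} \qquad (3)$$

for all $f \in L^{p_\theta}(X)$ and $0 \leq \theta \leq 1$, where the quantities $p_\theta, q_\theta$ are defined by the formulae

$$\frac{1}{p_\theta} = \frac{1-\theta}{p_0} + \frac{\theta}{p_1}, \qquad \frac{1}{q_\theta} = \frac{1-\theta}{q_0} + \frac{\theta}{q_1}.$$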
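A quick numerical illustration of (3), in the special case (chosen here for simplicity) of a real matrix $A$ acting on finite sequences with counting measure, with $p_0 = q_0 = 1$, $p_1 = q_1 = \infty$ and $\theta = 1/2$, so that $p_\theta = q_\theta = 2$. Then $B_0$ can be taken to be the largest absolute column sum and $B_1$ the largest absolute row sum, and (3) predicts $\|A\|_{2 \to 2} \leq \sqrt{B_0 B_1}$. The sketch below, with illustrative helper names, estimates $\|A\|_{2 \to 2}$ by power iteration, which approaches the true value from below.

```python
import math
import random


def norm_1_to_1(a):
    """||A||_{l^1 -> l^1}: the largest absolute column sum."""
    return max(sum(abs(row[j]) for row in a) for j in range(len(a[0])))


def norm_inf_to_inf(a):
    """||A||_{l^inf -> l^inf}: the largest absolute row sum."""
    return max(sum(abs(x) for x in row) for row in a)


def norm_2_to_2(a, iters=300, seed=0):
    """Power-iteration estimate of ||A||_{l^2 -> l^2} (approaches it from below)."""
    n, m = len(a), len(a[0])
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(m)]
    best = 0.0
    for _ in range(iters):
        length = math.sqrt(sum(x * x for x in v))
        if length == 0.0:  # v fell into the kernel of A^T A; stop early
            break
        v = [x / length for x in v]
        av = [sum(a[i][j] * v[j] for j in range(m)) for i in range(n)]
        best = max(best, math.sqrt(sum(x * x for x in av)))
        v = [sum(a[i][j] * av[i] for i in range(n)) for j in range(m)]  # v <- A^T A v
    return best


# For the 1 x 3 matrix (1 1 1): B0 = 1, B1 = 3, and the interpolated bound
# sqrt(3) is attained exactly.
row = [[1.0, 1.0, 1.0]]
print(norm_2_to_2(row), math.sqrt(norm_1_to_1(row) * norm_inf_to_inf(row)))
```

The geometric mean $\sqrt{B_0 B_1}$ is never larger than $\max(B_0, B_1)$, so in this special case the interpolated estimate refines the simple version of Schur’s test; for the row vector above it gives $\sqrt{3}$ instead of $3$.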

The Riesz-Thorin theorem is already quite useful (it gives, for instance, by far the quickest proof of the Hausdorff-Young inequality for the Fourier transform, to name just one application), but it requires the *same* linear operator $T$ to appear in (1), (2), and (3). Eli Stein realised, though, that due to the complex-analytic nature of the proof of the Riesz-Thorin theorem, it was possible to allow *different* linear operators to appear in (1), (2), (3), so long as the dependence was analytic. A bit more precisely: if one had a family $T_z$ of operators which depended in an analytic manner on a complex variable $z$ in the strip $\{ z \in \mathbb{C}: 0 \leq \mathrm{Re}(z) \leq 1 \}$ (thus, for any test functions $f, g$, the inner product $\langle T_z f, g \rangle$ would be analytic in $z$) which obeyed some mild regularity assumptions (which are slightly technical and are omitted here), and one had the estimates

$$\|T_{0+it} f\|_{L^{q_0}(Y)} \leq C_t \|f\|_{L^{p_0}(X)}$$

and

$$\|T_{1+it} f\|_{L^{q_1}(Y)} \leq C_t \|f\|_{L^{p_1}(X)}$$

for all $t \in \mathbb{R}$ and some quantities $C_t$ that grew at most exponentially in $t$ (actually, any growth rate significantly slower than the double-exponential $e^{e^{\pi |t|}}$ would suffice here), then one also has the interpolated estimates

$$\|T_\theta f\|_{L^{q_\theta}(Y)} \leq C' \|f\|_{L^{p_\theta}(X)}$$

for all $0 \leq \theta \leq 1$ and a constant $C'$ depending only on $\theta$ and the bounds $C_t$.

In the third of the Distinguished Lecture Series given by Eli Stein here at UCLA, Eli presented a slightly different topic, which is work in preparation with Alex Nagel, Fulvio Ricci, and Steve Wainger, on algebras of singular integral operators which are sensitive to multiple different geometries in a nilpotent Lie group.

In the second of the Distinguished Lecture Series given by Eli Stein here at UCLA, Eli expanded on the themes in the first lecture, in particular providing more details as to the recent (not yet published) results of Lanzani and Stein on the boundedness of the Cauchy integral on domains in several complex variables.

The first Distinguished Lecture Series at UCLA for this academic year is given by Elias Stein (who, incidentally, was my graduate student advisor), who is lecturing on “Singular Integrals and Several Complex Variables: Some New Perspectives“. The first lecture was a historical (and non-technical) survey of modern harmonic analysis (which, amazingly, was compressed into half an hour), followed by an introduction as to how this theory is currently in the process of being adapted to handle the basic analytical issues in several complex variables, a topic which in many ways is still only now being developed. The second and third lectures will focus on these issues in greater depth.

As usual, any errors here are due to my transcription and interpretation of the lecture.

[*Update*, Oct 27: The slides from the talk are now available here.]
