
If ${f: {\bf R}^d \rightarrow {\bf C}}$ is a locally integrable function, we define the Hardy-Littlewood maximal function ${Mf: {\bf R}^d \rightarrow [0,+\infty]}$ by the formula

$\displaystyle Mf(x) := \sup_{r>0} \frac{1}{|B(x,r)|} \int_{B(x,r)} |f(y)|\ dy,$

where ${B(x,r)}$ is the ball of radius ${r}$ centred at ${x}$, and ${|E|}$ denotes the measure of a set ${E}$. The Hardy-Littlewood maximal inequality asserts that

$\displaystyle |\{ x \in {\bf R}^d: Mf(x) > \lambda \}| \leq \frac{C_d}{\lambda} \|f\|_{L^1({\bf R}^d)} \ \ \ \ \ (1)$

for all ${f\in L^1({\bf R}^d)}$, all ${\lambda > 0}$, and some constant ${C_d > 0}$ depending only on ${d}$. By a standard density argument, this implies in particular that we have the Lebesgue differentiation theorem

$\displaystyle \lim_{r \rightarrow 0} \frac{1}{|B(x,r)|} \int_{B(x,r)} f(y)\ dy = f(x)$

for all ${f \in L^1({\bf R}^d)}$ and almost every ${x \in {\bf R}^d}$. See for instance my lecture notes on this topic.
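The weak-type inequality (1) is easy to probe numerically in one dimension. The following sketch (a discretisation of my own; the bump ${f(x) = \max(0,1-x^2)}$ and the grid are illustrative choices) computes a discrete analogue of ${Mf}$ and checks the level-set bound against the sharp one-dimensional constant ${C_1 = \frac{11+\sqrt{61}}{12}}$ of Melas mentioned later in these notes:

```python
import numpy as np

# Discretised 1D Hardy-Littlewood maximal function of the bump f(x) = max(0, 1 - x^2),
# evaluated on a grid; a numerical sketch only (grid and bump are illustrative choices).
dx = 0.01
x = np.arange(-5, 5 + dx / 2, dx)
f = np.maximum(0.0, 1 - x**2)
l1 = f.sum() * dx                        # ||f||_1 = 4/3 for this bump

# cumulative integral, with f understood to vanish off the grid
S = np.concatenate([[0.0], np.cumsum(f) * dx])
N = len(x)
Mf = f.copy()                            # r -> 0 recovers |f(x)| for continuous f
for k in range(1, N):
    lo = np.clip(np.arange(N) - k, 0, N)
    hi = np.clip(np.arange(N) + k + 1, 0, N)
    avg = (S[hi] - S[lo]) / ((2 * k + 1) * dx)   # average over B(x, r), r = (k + 1/2) dx
    Mf = np.maximum(Mf, avg)

# weak (1,1) check: lambda * |{Mf > lambda}| <= C_1 * ||f||_1, C_1 = (11 + sqrt(61))/12
C1 = (11 + np.sqrt(61)) / 12
for lam in [0.05, 0.1, 0.3, 0.6]:
    level = dx * np.count_nonzero(Mf > lam)
    assert lam * level <= C1 * l1 + 1e-6
```

In each case the product ${\lambda \cdot |\{Mf > \lambda\}|}$ comes in well under ${C_1 \|f\|_{L^1}}$, as (1) predicts.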

By combining the Hardy-Littlewood maximal inequality with the Marcinkiewicz interpolation theorem (and the trivial inequality ${\|Mf\|_{L^\infty({\bf R}^d)} \leq \|f\|_{L^\infty({\bf R}^d)}}$) we see that

$\displaystyle \|Mf\|_{L^p({\bf R}^d)} \leq C_{d,p} \|f\|_{L^p({\bf R}^d)} \ \ \ \ \ (2)$

for all ${p > 1}$ and ${f \in L^p({\bf R}^d)}$, and some constant ${C_{d,p}}$ depending on ${d}$ and ${p}$.

The exact dependence of ${C_{d,p}}$ on ${d}$ and ${p}$ is still not completely understood. The standard Vitali-type covering argument used to establish (1) has an exponential dependence on dimension, giving a constant of the form ${C_d = C^d}$ for some absolute constant ${C>1}$. Inserting this into the Marcinkiewicz theorem, one obtains a constant ${C_{d,p}}$ of the form ${C_{d,p} = \frac{C^d}{p-1}}$ for some ${C>1}$ (and taking ${p}$ bounded away from infinity, for simplicity). The dependence on ${p}$ is about right, but the dependence on ${d}$ should not be exponential.

In 1982, Stein gave an elegant argument (with full details appearing in a subsequent paper of Stein and Strömberg), based on the Calderón-Zygmund method of rotations, to eliminate the dependence on ${d}$:

Theorem 1 One can take ${C_{d,p} = C_p}$ for each ${p>1}$, where ${C_p}$ depends only on ${p}$.

The argument is based on an earlier bound of Stein from 1976 on the spherical maximal function

$\displaystyle M_S f(x) := \sup_{r>0} A_r |f|(x)$

where ${A_r}$ are the spherical averaging operators

$\displaystyle A_r f(x) := \int_{S^{d-1}} f(x+r\omega) d\sigma^{d-1}(\omega)$

and ${d\sigma^{d-1}}$ is normalised surface measure on the sphere ${S^{d-1}}$. Because this is an uncountable supremum, and the averaging operators ${A_r}$ do not have good continuity properties in ${r}$, it is not a priori obvious that ${M_S f}$ is even a measurable function for, say, locally integrable ${f}$; but we can avoid this technical issue, at least initially, by restricting attention to continuous functions ${f}$. The Stein maximal theorem for the spherical maximal function then asserts that if ${d \geq 3}$ and ${p > \frac{d}{d-1}}$, then we have

$\displaystyle \| M_S f \|_{L^p({\bf R}^d)} \leq C_{d,p} \|f\|_{L^p({\bf R}^d)} \ \ \ \ \ (3)$

for all (continuous) ${f \in L^p({\bf R}^d)}$. We will sketch a proof of this theorem below the fold. (Among other things, one can use this bound to show the pointwise convergence ${\lim_{r \rightarrow 0} A_r f(x) = f(x)}$ of the spherical averages for any ${f \in L^p({\bf R}^d)}$ when ${d \geq 3}$ and ${p > \frac{d}{d-1}}$, although we will not focus on this application here.)

The condition ${p > \frac{d}{d-1}}$ can be seen to be necessary as follows. Take ${f}$ to be any fixed bump function. A brief calculation then shows that ${M_S f(x)}$ decays like ${|x|^{1-d}}$ as ${|x| \rightarrow \infty}$, and hence ${M_S f}$ does not lie in ${L^p({\bf R}^d)}$ unless ${p > \frac{d}{d-1}}$. By taking ${f}$ to be a rescaled bump function supported on a small ball, one can show that the condition ${p > \frac{d}{d-1}}$ is necessary even if we replace ${{\bf R}^d}$ with a compact region (and similarly restrict the radius parameter ${r}$ to be bounded). The condition ${d \geq 3}$, however, is not quite necessary; the result is also true when ${d=2}$, but this turned out to be a more difficult result, obtained first by Bourgain, with a simplified proof (based on the local smoothing properties of the wave equation) later given by Mockenhaupt-Seeger-Sogge.
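One can see the ${|x|^{1-d}}$ decay concretely in ${d=3}$ by taking ${f}$ to be the indicator of the unit ball (a rougher stand-in for a bump function, chosen here because the spherical average then has a closed form as the area fraction of a spherical cap). Maximising over ${r}$ numerically exhibits the ${|x|^{-2}}$ decay; this is my own sanity check, not part of the argument:

```python
import numpy as np

# In R^3, for f = 1_B (the indicator of the unit ball), the spherical average
# A_r f(x) is the area fraction of a spherical cap: writing R = |x|, the fraction
# of directions omega with |x + r omega| <= 1 equals clip((c + 1)/2, 0, 1), where
# c = (1 - R^2 - r^2) / (2 r R) is the cosine of the cap angle (law of cosines).
def spherical_maximal(R, rs):
    c = (1 - R**2 - rs**2) / (2 * rs * R)
    return np.clip((c + 1) / 2, 0.0, 1.0).max()   # sup over the sampled radii

rs = np.linspace(1e-3, 20, 200001)                # dense grid of radii r > 0
m4 = spherical_maximal(4.0, rs)
m8 = spherical_maximal(8.0, rs)

# doubling |x| should quarter M_S f, matching the |x|^{1-d} = |x|^{-2} decay in d = 3
assert abs(4 * (m8 / m4) - 1) < 0.02
```

Since ${|x|^{-2}}$ fails to lie in ${L^p({\bf R}^3)}$ near infinity unless ${2p > 3}$, this is consistent with the threshold ${p > \frac{d}{d-1} = \frac{3}{2}}$.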

The Hardy-Littlewood maximal function ${Mf}$, which involves averaging over balls, is clearly related to the spherical maximal function ${M_S f}$, which averages over spheres. Indeed, by using polar co-ordinates, one easily verifies the pointwise inequality

$\displaystyle Mf(x) \leq M_S f(x)$

for any (continuous) ${f}$, which intuitively reflects the fact that one can think of a ball as an average of spheres. Thus, we see that the spherical maximal inequality (3) implies the Hardy-Littlewood maximal inequality (2) with the same constant ${C_{d,p}}$. (This implication is initially only valid for continuous functions, but one can then extend the inequality (2) to the rest of ${L^p({\bf R}^d)}$ by a standard limiting argument.)

At first glance, this observation does not immediately establish Theorem 1 for two reasons. Firstly, Stein’s spherical maximal theorem is restricted to the case when ${d \geq 3}$ and ${p > \frac{d}{d-1}}$; and secondly, the constant ${C_{d,p}}$ in that theorem still depends on dimension ${d}$. The first objection can be easily disposed of, for if ${p>1}$, then the hypotheses ${d \geq 3}$ and ${p > \frac{d}{d-1}}$ will automatically be satisfied for ${d}$ sufficiently large (depending on ${p}$); note that the case when ${d}$ is bounded (with a bound depending on ${p}$) is already handled by the classical maximal inequality (2).

We still have to deal with the second objection, namely that the constant ${C_{d,p}}$ in (3) depends on ${d}$. However, here we can use the method of rotations to show that the constants ${C_{d,p}}$ can be taken to be non-increasing (and hence bounded) in ${d}$. The idea is to view high-dimensional spheres as an average of rotated low-dimensional spheres. We illustrate this with a demonstration that ${C_{d+1,p} \leq C_{d,p}}$, in the sense that any bound of the form

$\displaystyle \| M_S f \|_{L^p({\bf R}^d)} \leq A \|f\|_{L^p({\bf R}^d)} \ \ \ \ \ (4)$

for the ${d}$-dimensional spherical maximal function, implies the same bound

$\displaystyle \| M_S f \|_{L^p({\bf R}^{d+1})} \leq A \|f\|_{L^p({\bf R}^{d+1})} \ \ \ \ \ (5)$

for the ${d+1}$-dimensional spherical maximal function, with exactly the same constant ${A}$. For any direction ${\omega_0 \in S^d \subset {\bf R}^{d+1}}$, consider the maximal operators

$\displaystyle M_S^{\omega_0} f(x) := \sup_{r>0} A_r^{\omega_0} |f|(x)$

for any continuous ${f: {\bf R}^{d+1} \rightarrow {\bf C}}$, where

$\displaystyle A_r^{\omega_0} f(x) := \int_{S^{d-1}} f( x + r U_{\omega_0} \omega)\ d\sigma^{d-1}(\omega)$

Here, ${U_{\omega_0}}$ is some orthogonal transformation mapping the sphere ${S^{d-1}}$ to the sphere ${S^{d-1,\omega_0} := \{ \omega \in S^d: \omega \perp \omega_0\}}$; the exact choice of orthogonal transformation ${U_{\omega_0}}$ is irrelevant due to the rotation-invariance of surface measure ${d\sigma^{d-1}}$ on the sphere ${S^{d-1}}$. A simple application of Fubini's theorem (after first rotating ${\omega_0}$ to be, say, the standard unit vector ${e_{d+1}}$) using (4) then shows that

$\displaystyle \| M_S^{\omega_0} f \|_{L^p({\bf R}^{d+1})} \leq A \|f\|_{L^p({\bf R}^{d+1})} \ \ \ \ \ (6)$

uniformly in ${\omega_0}$. On the other hand, by viewing the ${d}$-dimensional sphere ${S^d}$ as an average of the spheres ${S^{d-1,\omega_0}}$, we have the identity

$\displaystyle A_r f(x) = \int_{S^d} A_r^{\omega_0} f(x)\ d\sigma^d(\omega_0);$

indeed, one can deduce this from the uniqueness of Haar measure by noting that both the left-hand side and right-hand side are invariant means of ${f}$ on the sphere ${\{ y \in {\bf R}^{d+1}: |y-x|=r\}}$. This implies that

$\displaystyle M_S f(x) \leq \int_{S^d} M_S^{\omega_0} f(x)\ d\sigma^d(\omega_0)$

and thus by Minkowski’s inequality for integrals, we may deduce (5) from (6).
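For the record, the final step can be written out as a chain of inequalities (a routine verification, using the normalisation ${\sigma^d(S^d) = 1}$):

```latex
\| M_S f \|_{L^p({\bf R}^{d+1})}
\leq \Big\| \int_{S^d} M_S^{\omega_0} f\ d\sigma^d(\omega_0) \Big\|_{L^p({\bf R}^{d+1})}
\leq \int_{S^d} \| M_S^{\omega_0} f \|_{L^p({\bf R}^{d+1})}\ d\sigma^d(\omega_0)
\leq A \|f\|_{L^p({\bf R}^{d+1})},
```

where the second inequality is Minkowski's inequality for integrals and the last follows from (6) together with the fact that ${\sigma^d}$ has total mass ${1}$.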

Remark 1 Unfortunately, the method of rotations does not work to show that the constant ${C_d}$ for the weak ${(1,1)}$ inequality (1) is independent of dimension, as the weak ${L^1}$ quasinorm ${\| \cdot \|_{L^{1,\infty}}}$ is not a genuine norm and does not obey Minkowski's inequality for integrals. Indeed, the question of whether ${C_d}$ in (1) can be taken to be independent of dimension remains open. The best known positive result is due to Stein and Strömberg, who showed that one can take ${C_d = Cd}$ for some absolute constant ${C}$, by comparing the Hardy-Littlewood maximal function with the heat kernel maximal function

$\displaystyle \sup_{t > 0} e^{t\Delta} |f|(x).$

The abstract semigroup maximal inequality of Dunford and Schwartz (discussed for instance in these lecture notes of mine) shows that the heat kernel maximal function is of weak-type ${(1,1)}$ with a constant of ${1}$, and this can be used, together with a comparison argument, to give the Stein-Strömberg bound. In the converse direction, it is a recent result of Aldaz that if one replaces the balls ${B(x,r)}$ with cubes, then the weak ${(1,1)}$ constant ${C_d}$ must go to infinity as ${d \rightarrow \infty}$.

Let ${[a,b]}$ be a compact interval of positive length (thus ${-\infty < a < b < +\infty}$). Recall that a function ${F: [a,b] \rightarrow {\bf R}}$ is said to be differentiable at a point ${x \in [a,b]}$ if the limit

$\displaystyle F'(x) := \lim_{y \rightarrow x; y \in [a,b] \backslash \{x\}} \frac{F(y)-F(x)}{y-x} \ \ \ \ \ (1)$

exists. In that case, we call ${F'(x)}$ the strong derivative, classical derivative, or just derivative for short, of ${F}$ at ${x}$. We say that ${F}$ is everywhere differentiable, or differentiable for short, if it is differentiable at all points ${x \in [a,b]}$, and differentiable almost everywhere if it is differentiable at almost every point ${x \in [a,b]}$. If ${F}$ is differentiable everywhere and its derivative ${F'}$ is continuous, then we say that ${F}$ is continuously differentiable.

Remark 1 Much later in this sequence, when we cover the theory of distributions, we will see the notion of a weak derivative or distributional derivative, which can be applied to a much rougher class of functions and is in many ways more suitable than the classical derivative for doing “Lebesgue” type analysis (i.e. analysis centred around the Lebesgue integral, and in particular allowing functions to be uncontrolled, infinite, or even undefined on sets of measure zero). However, for now we will stick with the classical approach to differentiation.

Exercise 2 If ${F: [a,b] \rightarrow {\bf R}}$ is everywhere differentiable, show that ${F}$ is continuous and ${F'}$ is measurable. If ${F}$ is almost everywhere differentiable, show that the (almost everywhere defined) function ${F'}$ is measurable (i.e. it is equal to an everywhere defined measurable function on ${[a,b]}$ outside of a null set), but give an example to demonstrate that ${F}$ need not be continuous.

Exercise 3 Give an example of a function ${F: [a,b] \rightarrow {\bf R}}$ which is everywhere differentiable, but not continuously differentiable. (Hint: choose an ${F}$ that vanishes quickly at some point, say at the origin ${0}$, but which also oscillates rapidly near that point.)

In single-variable calculus, the operations of integration and differentiation are connected by a number of basic theorems, starting with Rolle’s theorem.

Theorem 4 (Rolle’s theorem) Let ${[a,b]}$ be a compact interval of positive length, and let ${F: [a,b] \rightarrow {\bf R}}$ be a differentiable function such that ${F(a)=F(b)}$. Then there exists ${x \in (a,b)}$ such that ${F'(x)=0}$.

Proof: By subtracting a constant from ${F}$ (which does not affect differentiability or the derivative) we may assume that ${F(a)=F(b)=0}$. If ${F}$ is identically zero then the claim is trivial, so assume that ${F}$ is non-zero somewhere. By replacing ${F}$ with ${-F}$ if necessary, we may assume that ${F}$ is positive somewhere, thus ${\sup_{x \in [a,b]} F(x) > 0}$. On the other hand, as ${F}$ is continuous and ${[a,b]}$ is compact, ${F}$ must attain its maximum somewhere, thus there exists ${x \in [a,b]}$ such that ${F(x) \geq F(y)}$ for all ${y \in [a,b]}$. Then ${F(x)}$ must be positive and so ${x}$ cannot equal either ${a}$ or ${b}$, and thus must lie in the interior. From the right limit of (1) we see that ${F'(x) \leq 0}$, while from the left limit we have ${F'(x) \geq 0}$. Thus ${F'(x)=0}$ and the claim follows. $\Box$
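In numerical terms, the critical point promised by Rolle's theorem can be hunted down by bisecting on the sign of the derivative. Here is a sketch for the illustrative choice ${F(x) = \sin(\pi x)}$ on ${[0,1]}$ (so ${F(0)=F(1)=0}$, with critical point ${x = 1/2}$), using a symmetric difference quotient in place of an exact derivative:

```python
import math

# Bisection for a zero of F' on (a, b), where F(x) = sin(pi x) has F(0) = F(1) = 0;
# Rolle's theorem guarantees such a point, here x = 1/2.
def Fprime(x, h=1e-7):
    F = lambda t: math.sin(math.pi * t)
    return (F(x + h) - F(x - h)) / (2 * h)   # symmetric difference quotient

a, b = 0.1, 0.9                              # F' > 0 at 0.1, F' < 0 at 0.9
for _ in range(60):
    m = (a + b) / 2
    if Fprime(m) > 0:
        a = m
    else:
        b = m

assert abs((a + b) / 2 - 0.5) < 1e-6
```

The bisection is justified precisely by the sign analysis in the proof above: the difference quotients are nonnegative to the left of an interior maximum and nonpositive to the right.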

Remark 5 Observe that the same proof also works if ${F}$ is only differentiable in the interior ${(a,b)}$ of the interval ${[a,b]}$, so long as it is continuous all the way up to the boundary of ${[a,b]}$.

Exercise 6 Give an example to show that Rolle's theorem can fail if ${F}$ is merely assumed to be almost everywhere differentiable, even if one adds the additional hypothesis that ${F}$ is continuous. This example illustrates that everywhere differentiability is a significantly stronger property than almost everywhere differentiability. We will see further evidence of this fact later in these notes; there are many theorems that assert in their conclusion that a function is almost everywhere differentiable, but few that manage to conclude everywhere differentiability.

Remark 7 It is important to note that Rolle's theorem only works in the real scalar case when ${F}$ is real-valued, as it relies heavily on the least upper bound property for the domain ${{\bf R}}$. If, for instance, we consider complex-valued scalar functions ${F: [a,b] \rightarrow {\bf C}}$, then the theorem can fail; for instance, the function ${F: [0,1] \rightarrow {\bf C}}$ defined by ${F(x) := e^{2\pi i x} - 1}$ vanishes at both endpoints and is differentiable, but its derivative ${F'(x) = 2\pi i e^{2\pi i x}}$ is never zero. (Rolle's theorem does imply that the real and imaginary parts of the derivative ${F'}$ both vanish somewhere, but the problem is that they don't simultaneously vanish at the same point.) Similar remarks apply to functions taking values in a finite-dimensional vector space, such as ${{\bf R}^n}$.
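To see the non-simultaneous vanishing concretely for ${F(x) = e^{2\pi i x} - 1}$, one can check numerically that ${|F'|}$ is constantly ${2\pi}$ while the real and imaginary parts of ${F'}$ vanish only at different points (a quick sanity check, not part of the argument):

```python
import cmath, math

# F(x) = e^{2 pi i x} - 1 on [0,1]: F(0) = F(1) = 0, yet F'(x) = 2 pi i e^{2 pi i x}
# has |F'(x)| = 2 pi everywhere, so F' never vanishes.
Fprime = lambda x: 2j * math.pi * cmath.exp(2j * math.pi * x)

xs = [k / 1000 for k in range(1001)]
mods = [abs(Fprime(x)) for x in xs]
assert min(mods) > 6.28                      # |F'| = 2 pi > 0: no zero of F'

# but Re F' = -2 pi sin(2 pi x) vanishes at x = 1/2, while
# Im F' = 2 pi cos(2 pi x) vanishes at x = 1/4: different points
assert abs(Fprime(0.5).real) < 1e-12 and abs(Fprime(0.25).imag) < 1e-12
```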

One can easily amplify Rolle’s theorem to the mean value theorem:

Corollary 8 (Mean value theorem) Let ${[a,b]}$ be a compact interval of positive length, and let ${F: [a,b] \rightarrow {\bf R}}$ be a differentiable function. Then there exists ${x \in (a,b)}$ such that ${F'(x)=\frac{F(b)-F(a)}{b-a}}$.

Proof: Apply Rolle’s theorem to the function ${x \mapsto F(x) - \frac{F(b)-F(a)}{b-a} (x-a)}$. $\Box$

Remark 9 As Rolle’s theorem is only applicable to real scalar-valued functions, the more general mean value theorem is also only applicable to such functions.

Exercise 10 (Uniqueness of antiderivatives up to constants) Let ${[a,b]}$ be a compact interval of positive length, and let ${F: [a,b] \rightarrow {\bf R}}$ and ${G: [a,b] \rightarrow {\bf R}}$ be differentiable functions. Show that ${F'(x)=G'(x)}$ for every ${x \in [a,b]}$ if and only if ${F(x)=G(x)+C}$ for some constant ${C \in {\bf R}}$ and all ${x \in [a,b]}$.

We can use the mean value theorem to deduce one of the fundamental theorems of calculus:

Theorem 11 (Second fundamental theorem of calculus) Let ${F: [a,b] \rightarrow {\bf R}}$ be a differentiable function, such that ${F'}$ is Riemann integrable. Then the Riemann integral ${\int_a^b F'(x)\ dx}$ of ${F'}$ is equal to ${F(b) - F(a)}$. In particular, we have ${\int_a^b F'(x)\ dx = F(b)-F(a)}$ whenever ${F}$ is continuously differentiable.

Proof: Let ${\varepsilon > 0}$. By the definition of Riemann integrability, there exists a finite partition ${a = t_0 < t_1 < \ldots < t_k = b}$ such that

$\displaystyle |\sum_{j=1}^k F'(t^*_j) (t_j - t_{j-1}) - \int_a^b F'(x)\ dx| \leq \varepsilon$

for every choice of ${t^*_j \in [t_{j-1},t_j]}$.

Fix this partition. From the mean value theorem, for each ${1 \leq j \leq k}$ one can find ${t^*_j \in [t_{j-1},t_j]}$ such that

$\displaystyle F'(t^*_j) (t_j - t_{j-1}) = F(t_j) - F(t_{j-1})$

and thus by telescoping series

$\displaystyle |(F(b)-F(a)) - \int_a^b F'(x)\ dx| \leq \varepsilon.$

Since ${\varepsilon > 0}$ was arbitrary, the claim follows. $\Box$
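One can test the conclusion numerically: for the illustrative choice ${F(x) = x^3}$ on ${[0,2]}$, a Riemann sum of ${F'}$ over a fine tagged partition should come within ${\varepsilon}$ of ${F(2) - F(0) = 8}$:

```python
# Riemann sums of F' recover F(b) - F(a) (second fundamental theorem of calculus).
# Here F(x) = x^3, F'(x) = 3x^2 on [a, b] = [0, 2], so F(b) - F(a) = 8.
a, b, k = 0.0, 2.0, 100000
dt = (b - a) / k
Fprime = lambda t: 3 * t * t

riemann = sum(Fprime(a + (j + 0.5) * dt) * dt for j in range(k))  # midpoint tags
assert abs(riemann - 8.0) < 1e-8
```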

Remark 12 Even though the mean value theorem only holds for real scalar functions, the fundamental theorem of calculus holds for complex or vector-valued functions, as one can simply apply that theorem to each component of that function separately.

Of course, we also have the other half of the fundamental theorem of calculus:

Theorem 13 (First fundamental theorem of calculus) Let ${[a,b]}$ be a compact interval of positive length. Let ${f: [a,b] \rightarrow {\bf C}}$ be a continuous function, and let ${F: [a,b] \rightarrow {\bf C}}$ be the indefinite integral ${F(x) := \int_a^x f(t)\ dt}$. Then ${F}$ is differentiable on ${[a,b]}$, with derivative ${F'(x) = f(x)}$ for all ${x \in [a,b]}$. In particular, ${F}$ is continuously differentiable.

Proof: It suffices to show that

$\displaystyle \lim_{h \rightarrow 0^+} \frac{F(x+h)-F(x)}{h} = f(x)$

for all ${x \in [a,b)}$, and

$\displaystyle \lim_{h \rightarrow 0^-} \frac{F(x+h)-F(x)}{h} = f(x)$

for all ${x \in (a,b]}$. After a change of variables, we can write

$\displaystyle \frac{F(x+h)-F(x)}{h} = \int_0^1 f(x+ht)\ dt$

for any ${x \in [a,b)}$ and any sufficiently small ${h>0}$, or any ${x \in (a,b]}$ and any sufficiently small ${h<0}$. As ${f}$ is continuous, the function ${t \mapsto f(x+ht)}$ converges uniformly to ${f(x)}$ on ${[0,1]}$ as ${h \rightarrow 0}$ (keeping ${x}$ fixed). As the interval ${[0,1]}$ is bounded, ${\int_0^1 f(x+ht)\ dt}$ thus converges to ${\int_0^1 f(x)\ dt = f(x)}$, and the claim follows. $\Box$
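Numerically, one can watch the difference quotients of an indefinite integral converge to ${f}$. In the sketch below ${f = \cos}$ is an illustrative choice, and ${F}$ is computed by midpoint Riemann sums rather than any closed form, so the convergence genuinely reflects the theorem:

```python
import math

# First fundamental theorem: for continuous f, the indefinite integral
# F(x) = int_0^x f is differentiable with F' = f.  Here f = cos on [0, 2],
# with F computed by midpoint Riemann sums (no closed-form antiderivative used).
f = lambda t: math.cos(t)

def F(x, k=200000):
    dt = x / k
    return sum(f((j + 0.5) * dt) * dt for j in range(k))

x, h = 1.0, 1e-4
diff_quot = (F(x + h) - F(x)) / h
assert abs(diff_quot - f(x)) < 1e-3          # difference quotient is close to f(x)
```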

Corollary 14 (Differentiation theorem for continuous functions) Let ${f: [a,b] \rightarrow {\bf C}}$ be a continuous function on a compact interval. Then we have

$\displaystyle \lim_{h \rightarrow 0^+} \frac{1}{h} \int_{[x,x+h]} f(t)\ dt = f(x)$

for all ${x \in [a,b)}$,

$\displaystyle \lim_{h \rightarrow 0^+} \frac{1}{h} \int_{[x-h,x]} f(t)\ dt = f(x)$

for all ${x \in (a,b]}$, and thus

$\displaystyle \lim_{h \rightarrow 0^+} \frac{1}{2h} \int_{[x-h,x+h]} f(t)\ dt = f(x)$

for all ${x \in (a,b)}$.

In these notes we explore the question of the extent to which these theorems continue to hold when the differentiability or integrability conditions on the various functions ${F, F', f}$ are relaxed. Among the results proven in these notes are

• The Lebesgue differentiation theorem, which roughly speaking asserts that Corollary 14 continues to hold for almost every ${x}$ if ${f}$ is merely absolutely integrable, rather than continuous;
• A number of differentiation theorems, which assert for instance that monotone, Lipschitz, or bounded variation functions in one dimension are almost everywhere differentiable; and
• The second fundamental theorem of calculus for absolutely continuous functions.

The material here is loosely based on Chapter 3 of Stein-Shakarchi.

Assaf Naor and I have just uploaded to the arXiv our paper “Random Martingales and localization of maximal inequalities“, to be submitted shortly. This paper investigates the best constant in generalisations of the classical Hardy-Littlewood maximal inequality

$\displaystyle |\{ x \in {\mathbb R}^n: Mf(x) > \lambda \}| \leq \frac{C_n}{\lambda} \|f\|_{L^1({\mathbb R}^n)}, \quad \lambda > 0,$

for any absolutely integrable ${f: {\mathbb R}^n \rightarrow {\mathbb R}}$, where ${B(x,r)}$ is the Euclidean ball of radius ${r}$ centred at ${x}$, and ${|E|}$ denotes the Lebesgue measure of a subset ${E}$ of ${{\mathbb R}^n}$. This inequality is fundamental to a large part of real-variable harmonic analysis, and in particular to Calderón-Zygmund theory. A similar inequality in fact holds with the Euclidean norm replaced by any other convex norm on ${{\mathbb R}^n}$.

The exact value of the constant ${C_n}$ is only known in ${n=1}$, with a remarkable result of Melas establishing that ${C_1 = \frac{11+\sqrt{61}}{12}}$. Classical covering lemma arguments give the exponential upper bound ${C_n \leq 2^n}$ when properly optimised (a direct application of the Vitali covering lemma gives ${C_n \leq 5^n}$, but one can reduce ${5}$ to ${2}$ by being careful). In an important paper of Stein and Strömberg, the improved bound ${C_n = O( n \log n )}$ was obtained for any convex norm by a more intricate covering argument, and the slight improvement ${C_n = O(n)}$ was obtained in the Euclidean case by another argument, more adapted to the Euclidean setting, that relied on heat kernels. In the other direction, a recent result of Aldaz shows that ${C_n \rightarrow \infty}$ in the case of the ${\ell^\infty}$ norm, and in fact in an even more recent preprint of Aubrun, the lower bound ${C_n \gg_\epsilon \log^{1-\epsilon} n}$ for any ${\epsilon > 0}$ has been obtained in this case. However, these lower bounds do not apply in the Euclidean case, and one may still conjecture that ${C_n}$ is in fact uniformly bounded in this case.

Unfortunately, we do not make direct progress on these problems here. However, we do show that the Stein-Strömberg bound ${C_n = O(n \log n)}$ is extremely general, applying to a wide class of metric measure spaces obeying a certain “microdoubling condition at dimension ${n}$“; and conversely, at this level of generality, it is essentially the best estimate possible, even with additional metric measure hypotheses on the space. Thus, if one wants to improve this bound for a specific maximal inequality, one has to use specific properties of the geometry (such as the connections between Euclidean balls and heat kernels). Furthermore, in the general setting of metric measure spaces, one has a general localisation principle, which roughly speaking asserts that in order to prove a maximal inequality over all scales ${r \in (0,+\infty)}$, it suffices to prove such an inequality in a smaller range ${r \in [R, nR]}$ uniformly in ${R>0}$. It is this localisation which ultimately explains the significance of the ${n \log n}$ growth in the Stein-Strömberg result (there are ${O(n \log n)}$ essentially distinct scales in any range ${[R,nR]}$). It also shows that if one restricts the radii ${r}$ to a lacunary range (such as powers of ${2}$), the best constant improves to ${O(\log n)}$; if one restricts the radii to an even sparser range such as powers of ${n}$, the best constant becomes ${O(1)}$.
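The scale count behind the ${n \log n}$ growth is elementary to verify: radii in ${[R, nR]}$ separated by multiplicative factors of ${1 + 1/n}$ (the natural resolution under ${n}$-microdoubling) number about ${n \log n}$, while lacunary radii (powers of ${2}$) number only about ${\log_2 n}$. A toy computation of my own:

```python
import math

# Number of scales in [R, nR] at multiplicative resolution (1 + 1/n):
# log(n) / log(1 + 1/n) ~ n log n, the count behind the Stein-Stromberg bound.
def num_scales(n):
    return math.log(n) / math.log(1 + 1 / n)

n = 1000
assert abs(num_scales(n) / (n * math.log(n)) - 1) < 0.01

# restricting to lacunary radii (powers of 2) leaves only ~ log_2(n) scales in [R, nR]
assert 9.9 < math.log2(n) < 10
```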