If ${f: {\bf R}^d \rightarrow {\bf C}}$ is a locally integrable function, we define the Hardy-Littlewood maximal function ${Mf: {\bf R}^d \rightarrow [0,+\infty]}$ by the formula

$\displaystyle Mf(x) := \sup_{r>0} \frac{1}{|B(x,r)|} \int_{B(x,r)} |f(y)|\ dy,$

where ${B(x,r)}$ is the ball of radius ${r}$ centred at ${x}$, and ${|E|}$ denotes the measure of a set ${E}$. The Hardy-Littlewood maximal inequality asserts that

$\displaystyle |\{ x \in {\bf R}^d: Mf(x) > \lambda \}| \leq \frac{C_d}{\lambda} \|f\|_{L^1({\bf R}^d)} \ \ \ \ \ (1)$

for all ${f\in L^1({\bf R}^d)}$, all ${\lambda > 0}$, and some constant ${C_d > 0}$ depending only on ${d}$. By a standard density argument, this implies in particular that we have the Lebesgue differentiation theorem

$\displaystyle \lim_{r \rightarrow 0} \frac{1}{|B(x,r)|} \int_{B(x,r)} f(y)\ dy = f(x)$

for all ${f \in L^1({\bf R}^d)}$ and almost every ${x \in {\bf R}^d}$. See for instance my lecture notes on this topic.

By combining the Hardy-Littlewood maximal inequality with the Marcinkiewicz interpolation theorem (and the trivial inequality ${\|Mf\|_{L^\infty({\bf R}^d)} \leq \|f\|_{L^\infty({\bf R}^d)}}$) we see that

$\displaystyle \|Mf\|_{L^p({\bf R}^d)} \leq C_{d,p} \|f\|_{L^p({\bf R}^d)} \ \ \ \ \ (2)$

for all ${p > 1}$ and ${f \in L^p({\bf R}^d)}$, and some constant ${C_{d,p}}$ depending on ${d}$ and ${p}$.

The exact dependence of ${C_{d,p}}$ on ${d}$ and ${p}$ is still not completely understood. The standard Vitali-type covering argument used to establish (1) has an exponential dependence on dimension, giving a constant of the form ${C_d = C^d}$ for some absolute constant ${C>1}$. Inserting this into the Marcinkiewicz theorem, one obtains a constant ${C_{d,p}}$ of the form ${C_{d,p} = \frac{C^d}{p-1}}$ for some ${C>1}$ (and taking ${p}$ bounded away from infinity, for simplicity). The dependence on ${p}$ is about right, but the dependence on ${d}$ should not be exponential.

In 1982, Stein gave an elegant argument (with full details appearing in a subsequent paper of Stein and Strömberg), based on the Calderón-Zygmund method of rotations, to eliminate the dependence on ${d}$:

Theorem 1 One can take ${C_{d,p} = C_p}$ for each ${p>1}$, where ${C_p}$ depends only on ${p}$.

The argument is based on an earlier bound of Stein from 1976 on the spherical maximal function

$\displaystyle M_S f(x) := \sup_{r>0} A_r |f|(x)$

where ${A_r}$ are the spherical averaging operators

$\displaystyle A_r f(x) := \int_{S^{d-1}} f(x+r\omega) d\sigma^{d-1}(\omega)$

and ${d\sigma^{d-1}}$ is normalised surface measure on the sphere ${S^{d-1}}$. Because this is an uncountable supremum, and the averaging operators ${A_r}$ do not have good continuity properties in ${r}$, it is not a priori obvious that ${M_S f}$ is even a measurable function for, say, locally integrable ${f}$; but we can avoid this technical issue, at least initially, by restricting attention to continuous functions ${f}$. The Stein maximal theorem for the spherical maximal function then asserts that if ${d \geq 3}$ and ${p > \frac{d}{d-1}}$, then we have

$\displaystyle \| M_S f \|_{L^p({\bf R}^d)} \leq C_{d,p} \|f\|_{L^p({\bf R}^d)} \ \ \ \ \ (3)$

for all (continuous) ${f \in L^p({\bf R}^d)}$. We will sketch a proof of this theorem below the fold. (Among other things, one can use this bound to show the pointwise convergence ${\lim_{r \rightarrow 0} A_r f(x) = f(x)}$ of the spherical averages for any ${f \in L^p({\bf R}^d)}$ when ${d \geq 3}$ and ${p > \frac{d}{d-1}}$, although we will not focus on this application here.)

The condition ${p > \frac{d}{d-1}}$ can be seen to be necessary as follows. Take ${f}$ to be any fixed bump function. A brief calculation then shows that ${M_S f(x)}$ decays like ${|x|^{1-d}}$ as ${|x| \rightarrow \infty}$, and hence ${M_S f}$ does not lie in ${L^p({\bf R}^d)}$ unless ${p > \frac{d}{d-1}}$. By taking ${f}$ to be a rescaled bump function supported on a small ball, one can show that the condition ${p > \frac{d}{d-1}}$ is necessary even if we replace ${{\bf R}^d}$ with a compact region (and similarly restrict the radius parameter ${r}$ to be bounded). The condition ${d \geq 3}$ however is not quite necessary; the result is also true when ${d=2}$, but this turned out to be a more difficult result, obtained first by Bourgain, with a simplified proof (based on the local smoothing properties of the wave equation) later given by Mockenhaupt-Seeger-Sogge.
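The integrability threshold in the first necessity claim above can be checked in polar coordinates. The following is only a sketch of the tail computation, taking the decay rate ${|x|^{1-d}}$ as given and suppressing all dimensional constants:

$\displaystyle \int_{|x| \geq 1} |x|^{(1-d)p}\ dx \sim \int_1^\infty s^{(1-d)p + d - 1}\ ds < \infty \iff (d-1)p > d \iff p > \frac{d}{d-1}.$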

The Hardy-Littlewood maximal operator ${Mf}$, which involves averaging over balls, is clearly related to the spherical maximal operator, which averages over spheres. Indeed, by using polar co-ordinates, one easily verifies the pointwise inequality

$\displaystyle Mf(x) \leq M_S f(x)$

for any (continuous) ${f}$, which intuitively reflects the fact that one can think of a ball as an average of spheres. Thus, we see that the spherical maximal inequality (3) implies the Hardy-Littlewood maximal inequality (2) with the same constant ${C_{d,p}}$. (This implication is initially only valid for continuous functions, but one can then extend the inequality (2) to the rest of ${L^p({\bf R}^d)}$ by a standard limiting argument.)
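The polar coordinate computation behind the pointwise inequality ${Mf \leq M_S f}$ can be sketched as follows. Here ${\omega_d := |B(0,1)|}$ is notation introduced only for this sketch, and we use the fact that the unnormalised surface area of ${S^{d-1}}$ is ${d \omega_d}$, so that ${|B(x,r)| = \omega_d r^d}$. For continuous ${f}$ and any ${r>0}$, one has

$\displaystyle \frac{1}{|B(x,r)|} \int_{B(x,r)} |f(y)|\ dy = \frac{d}{r^d} \int_0^r A_s |f|(x)\ s^{d-1}\ ds \leq M_S f(x) \cdot \frac{d}{r^d} \int_0^r s^{d-1}\ ds = M_S f(x),$

and taking the supremum over ${r>0}$ gives the claim.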

At first glance, this observation does not immediately establish Theorem 1 for two reasons. Firstly, Stein’s spherical maximal theorem is restricted to the case when ${d \geq 3}$ and ${p > \frac{d}{d-1}}$; and secondly, the constant ${C_{d,p}}$ in that theorem still depends on dimension ${d}$. The first objection can be easily disposed of, for if ${p>1}$, then the hypotheses ${d \geq 3}$ and ${p > \frac{d}{d-1}}$ will automatically be satisfied for ${d}$ sufficiently large (depending on ${p}$); note that the case when ${d}$ is bounded (with a bound depending on ${p}$) is already handled by the classical maximal inequality (2).
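To make the phrase "sufficiently large" quantitative, one can rearrange the condition on ${p}$ as

$\displaystyle p > \frac{d}{d-1} \iff p(d-1) > d \iff d(p-1) > p \iff d > \frac{p}{p-1},$

so both hypotheses of the spherical maximal theorem hold as soon as ${d > \max(2, \frac{p}{p-1})}$. As a quick check, for ${p = 3/2}$ this requires ${d \geq 4}$, while for ${p = 11/10}$ one needs ${d \geq 12}$.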

We still have to deal with the second objection, namely that the constant ${C_{d,p}}$ in (3) depends on ${d}$. However, here we can use the method of rotations to show that the constants ${C_{d,p}}$ can be taken to be non-increasing (and hence bounded) in ${d}$. The idea is to view high-dimensional spheres as an average of rotated low-dimensional spheres. We illustrate this with a demonstration that ${C_{d+1,p} \leq C_{d,p}}$, in the sense that any bound of the form

$\displaystyle \| M_S f \|_{L^p({\bf R}^d)} \leq A \|f\|_{L^p({\bf R}^d)} \ \ \ \ \ (4)$

for the ${d}$-dimensional spherical maximal function, implies the same bound

$\displaystyle \| M_S f \|_{L^p({\bf R}^{d+1})} \leq A \|f\|_{L^p({\bf R}^{d+1})} \ \ \ \ \ (5)$

for the ${(d+1)}$-dimensional spherical maximal function, with exactly the same constant ${A}$. For any direction ${\omega_0 \in S^d \subset {\bf R}^{d+1}}$, consider the maximal operators

$\displaystyle M_S^{\omega_0} f(x) := \sup_{r>0} A_r^{\omega_0} |f|(x)$

for any continuous ${f: {\bf R}^{d+1} \rightarrow {\bf C}}$, where

$\displaystyle A_r^{\omega_0} f(x) := \int_{S^{d-1}} f( x + r U_{\omega_0} \omega)\ d\sigma^{d-1}(\omega)$

where ${U_{\omega_0}}$ is some orthogonal transformation mapping the sphere ${S^{d-1}}$ to the sphere ${S^{d-1,\omega_0} := \{ \omega \in S^d: \omega \perp \omega_0\}}$; the exact choice of orthogonal transformation ${U_{\omega_0}}$ is irrelevant due to the rotation-invariance of surface measure ${d\sigma^{d-1}}$ on the sphere ${S^{d-1}}$. A simple application of Fubini’s theorem (after first rotating ${\omega_0}$ to be, say, the standard unit vector ${e_{d+1}}$) using (4) then shows that

$\displaystyle \| M_S^{\omega_0} f \|_{L^p({\bf R}^{d+1})} \leq A \|f\|_{L^p({\bf R}^{d+1})} \ \ \ \ \ (6)$

uniformly in ${\omega_0}$. On the other hand, by viewing the ${d}$-dimensional sphere ${S^d}$ as an average of the spheres ${S^{d-1,\omega_0}}$, we have the identity

$\displaystyle A_r f(x) = \int_{S^d} A_r^{\omega_0} f(x)\ d\sigma^d(\omega_0);$

indeed, one can deduce this from the uniqueness of Haar measure by noting that both the left-hand side and right-hand side are invariant means of ${f}$ on the sphere ${\{ y \in {\bf R}^{d+1}: |y-x|=r\}}$. This implies that

$\displaystyle M_S f(x) \leq \int_{S^d} M_S^{\omega_0} f(x)\ d\sigma^d(\omega_0)$

and thus by Minkowski’s inequality for integrals, we may deduce (5) from (6).
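In more detail, the Fubini and Minkowski steps used above can be sketched as follows. The notation ${x = (x',t) \in {\bf R}^d \times {\bf R}}$ and ${f_t(x') := f(x',t)}$ is introduced only for this sketch. When ${\omega_0}$ is the last standard basis vector of ${{\bf R}^{d+1}}$, one has ${M_S^{\omega_0} f(x',t) = M_S f_t(x')}$, where the right-hand side is the ${d}$-dimensional spherical maximal function, so that by Fubini’s theorem and (4)

$\displaystyle \| M_S^{\omega_0} f \|_{L^p({\bf R}^{d+1})}^p = \int_{\bf R} \| M_S f_t \|_{L^p({\bf R}^d)}^p\ dt \leq A^p \int_{\bf R} \| f_t \|_{L^p({\bf R}^d)}^p\ dt = A^p \|f\|_{L^p({\bf R}^{d+1})}^p,$

which is (6) for this choice of ${\omega_0}$; the general case follows by rotation. Since ${d\sigma^d}$ is a probability measure, Minkowski’s inequality for integrals then gives

$\displaystyle \| M_S f \|_{L^p({\bf R}^{d+1})} \leq \int_{S^d} \| M_S^{\omega_0} f \|_{L^p({\bf R}^{d+1})}\ d\sigma^d(\omega_0) \leq A \|f\|_{L^p({\bf R}^{d+1})}.$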

Remark 1 Unfortunately, the method of rotations does not work to show that the constant ${C_d}$ for the weak ${(1,1)}$ inequality (1) is independent of dimension, as the weak ${L^1}$ quasinorm ${\| \|_{L^{1,\infty}}}$ is not a genuine norm and does not obey the Minkowski inequality for integrals. Indeed, the question of whether ${C_d}$ in (1) can be taken to be independent of dimension remains open. The best known positive result is due to Stein and Strömberg, who showed that one can take ${C_d = Cd}$ for some absolute constant ${C}$, by comparing the Hardy-Littlewood maximal function with the heat kernel maximal function

$\displaystyle \sup_{t > 0} e^{t\Delta} |f|(x).$

The abstract semigroup maximal inequality of Dunford and Schwartz (discussed for instance in these lecture notes of mine) shows that the heat kernel maximal function is of weak-type ${(1,1)}$ with a constant of ${1}$, and this can be used, together with a comparison argument, to give the Stein-Strömberg bound. In the converse direction, it is a recent result of Aldaz that if one replaces the balls ${B(x,r)}$ with cubes, then the weak ${(1,1)}$ constant ${C_d}$ must go to infinity as ${d \rightarrow \infty}$.

— 1. Proof of spherical maximal inequality —

We now sketch the proof of Stein’s spherical maximal inequality (3) for ${d \geq 3}$, ${p > \frac{d}{d-1}}$, and ${f \in L^p({\bf R}^d)}$ continuous. To motivate the argument, let us first establish the simpler estimate

$\displaystyle \| M_S^1 f \|_{L^p({\bf R}^d)} \leq C_{d,p} \|f\|_{L^p({\bf R}^d)}$

where ${M_S^1}$ is the spherical maximal function restricted to unit scales:

$\displaystyle M_S^1 f(x) := \sup_{1 \leq r \leq 2} A_r |f|(x).$

For the rest of these notes, we suppress the dependence of constants on ${d}$ and ${p}$, using ${X \lesssim Y}$ as short-hand for ${X \leq C_{d,p} Y}$.

It will of course suffice to establish the estimate

$\displaystyle \| \sup_{1 \leq r \leq 2} |A_r f(x)| \|_{L^p({\bf R}^d)} \lesssim \|f\|_{L^p({\bf R}^d)} \ \ \ \ \ (7)$

for all continuous ${f \in L^p({\bf R}^d)}$, as the original claim follows by replacing ${f}$ with ${|f|}$. Also, since the bound is trivially true for ${p=\infty}$, and we crucially have ${\frac{d}{d-1} < 2}$ in three and higher dimensions, we can restrict attention (by interpolation) to the regime ${p<2}$.

We establish this bound using a Littlewood-Paley decomposition

$\displaystyle f = \sum_N P_N f$

where ${N}$ ranges over dyadic numbers ${2^k}$, ${k \in {\bf Z}}$, and ${P_N}$ is a smooth Fourier projection to frequencies ${|\xi| \sim N}$; a bit more formally, we have

$\displaystyle \widehat{P_N f}(\xi) = \psi(\frac{\xi}{N}) \hat f(\xi)$

where ${\psi}$ is a bump function supported on the annulus ${\{ \xi \in {\bf R}^d: 1/2 \leq |\xi| \leq 2\}}$ such that ${\sum_N \psi(\frac{\xi}{N}) = 1}$ for all non-zero ${\xi}$. Actually, for the purposes of proving (7), it is more convenient to use the decomposition

$\displaystyle f = P_{\leq 1} f + \sum_{N>1} P_N f$

where ${P_{\leq 1} = \sum_{N \leq 1} P_N}$ is the projection to frequencies ${|\xi| \lesssim 1}$. By the triangle inequality, it then suffices to show the bounds

$\displaystyle \| \sup_{1 \leq r \leq 2} |A_r P_{\leq 1} f(x)| \|_{L^p({\bf R}^d)} \lesssim \|f\|_{L^p({\bf R}^d)} \ \ \ \ \ (8)$

and

$\displaystyle \| \sup_{1 \leq r \leq 2} |A_r P_N f(x)| \|_{L^p({\bf R}^d)} \lesssim N^{-\epsilon} \|f\|_{L^p({\bf R}^d)} \ \ \ \ \ (9)$

for all ${N \geq 1}$ and some ${\epsilon>0}$ depending only on ${p,d}$.

To prove the low-frequency bound (8), observe that ${P_{\leq 1}}$ is a convolution operator with a Schwartz function, and from this and the radius restriction ${1 \leq r \leq 2}$ we see that ${A_r P_{\leq 1}}$ is a convolution operator with a Schwartz function of uniformly bounded norms. From this we obtain the pointwise bound

$\displaystyle |A_r P_{\leq 1} f(x)| \lesssim Mf(x) \ \ \ \ \ (10)$

and the claim (8) follows from (2).
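One way to see the pointwise bound (10) is sketched below. It relies on the standard fact that convolution with a radially decreasing integrable majorant is controlled by the Hardy-Littlewood maximal function; the kernel name ${K_r}$ is notation introduced only for this sketch. If ${K_r}$ denotes the convolution kernel of ${A_r P_{\leq 1}}$, the uniform Schwartz bounds give, say,

$\displaystyle |K_r(y)| \lesssim (1+|y|)^{-d-1}$

uniformly in ${1 \leq r \leq 2}$, and hence

$\displaystyle |A_r P_{\leq 1} f(x)| \leq \int_{{\bf R}^d} |f(x-y)| |K_r(y)|\ dy \lesssim \int_{{\bf R}^d} |f(x-y)| (1+|y|)^{-d-1}\ dy \lesssim Mf(x).$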

Now we turn to the more interesting high-frequency bound (9). Here, ${P_N}$ is a convolution operator with an approximation to the identity at scale ${\sim 1/N}$, and so ${A_r P_N}$ is a convolution operator with a function of magnitude ${O(N)}$ concentrated on an annulus of thickness ${O(1/N)}$ around the sphere of radius ${r}$. This can be used to give the pointwise bound

$\displaystyle |A_r P_N f(x)| \lesssim N Mf(x), \ \ \ \ \ (11)$

which by (2) gives the bound

$\displaystyle \| \sup_{1 \leq r \leq 2} |A_r P_N f(x)| \|_{L^q({\bf R}^d)} \lesssim_q N \|f\|_{L^q({\bf R}^d)} \ \ \ \ \ (12)$

for any ${q > 1}$. This is not directly strong enough to prove (9), due to the “loss of one derivative” as manifested by the factor ${N}$. On the other hand, this bound (12) holds for all ${q>1}$, and not just in the range ${p > \frac{d}{d-1}}$.

To counterbalance this loss of one derivative, we turn to ${L^2}$ estimates. A standard stationary phase computation (or Bessel function computation) shows that ${A_r}$ is a Fourier multiplier whose symbol decays like ${|\xi|^{-(d-1)/2}}$. As such, Plancherel’s theorem yields the ${L^2}$ bound

$\displaystyle \| A_r P_N f \|_{L^2({\bf R}^d)} \lesssim N^{-(d-1)/2} \|f\|_{L^2({\bf R}^d)}$

uniformly in ${1 \leq r \leq 2}$. But we still have to take the supremum over ${r}$. This is an uncountable supremum, so one cannot just apply a union bound argument. However, from the uncertainty principle, we expect ${P_N f}$ to be “blurred out” at spatial scale ${1/N}$, which suggests that the averages ${A_r P_N f}$ do not vary much when ${r}$ is restricted to an interval of size ${1/N}$. Heuristically, this then suggests that

$\displaystyle \sup_{1 \leq r \leq 2} |A_r P_N f| \sim \sup_{1 \leq r \leq 2: r \in \frac{1}{N} {\bf Z}} |A_r P_N f|.$

Estimating the discrete supremum on the right-hand side somewhat crudely by the square-function,

$\displaystyle \sup_{1 \leq r \leq 2: r \in \frac{1}{N} {\bf Z}} |A_r P_N f| \leq (\sum_{1 \leq r \leq 2: r \in \frac{1}{N} {\bf Z}} |A_r P_N f|^2)^{1/2},$

and taking ${L^2}$ norms, one is then led to the heuristic prediction that

$\displaystyle \| \sup_{1 \leq r \leq 2} |A_r P_N f| \|_{L^2({\bf R}^d)} \lesssim N^{1/2} N^{-(d-1)/2} \|f\|_{L^2({\bf R}^d)}. \ \ \ \ \ (13)$

One can make this heuristic precise using the one-dimensional Sobolev embedding inequality adapted to scale ${1/N}$, namely that

$\displaystyle \sup_{1 \leq r \leq 2} |g(r)| \lesssim N^{1/2} (\int_1^2 |g(r)|^2\ dr)^{1/2} + N^{-1/2} (\int_1^2 |g'(r)|^2\ dr)^{1/2}.$

To prove this inequality, one starts with the local one-dimensional Sobolev inequality

$\displaystyle \sup_{0 \leq r \leq 1} |g(r)| \lesssim \int_0^{1} |g(r)|\ dr + \int_0^{1} |g'(r)|\ dr,$

rescales this inequality to the scale ${1/N}$, and then covers the interval ${[1,2]}$ by boundedly overlapping intervals of length ${1/N}$.
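For concreteness, here is a sketch of how the rescaling and covering steps combine (the interval ${I}$ is notation for this sketch only). Rescaling the local inequality to an interval ${I \subset [1,2]}$ of length ${1/N}$ and then applying Cauchy-Schwarz gives

$\displaystyle \sup_{r \in I} |g(r)| \lesssim N \int_I |g(r)|\ dr + \int_I |g'(r)|\ dr \leq N^{1/2} (\int_I |g(r)|^2\ dr)^{1/2} + N^{-1/2} (\int_I |g'(r)|^2\ dr)^{1/2},$

and the claim follows by enlarging each integral on the right-hand side to ${[1,2]}$ and then taking the supremum over the ${O(N)}$ intervals ${I}$ in the cover.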

A routine computation shows that

$\displaystyle \| \frac{d}{dr} A_r P_N f \|_{L^2({\bf R}^d)} \lesssim N \times N^{-(d-1)/2} \|f\|_{L^2({\bf R}^d)}$

(which formalises the heuristic that ${A_r P_N f}$ is roughly constant at ${r}$-scales ${1/N}$), and this soon leads to a rigorous proof of (13).
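To spell out this last step (as a sketch only): one applies the scale-${1/N}$ Sobolev inequality to ${g(r) := A_r P_N f(x)}$ for each fixed ${x}$, takes ${L^2({\bf R}^d)}$ norms in ${x}$, and uses Fubini’s theorem together with the two ${L^2}$ bounds for ${A_r P_N f}$ and ${\frac{d}{dr} A_r P_N f}$, which hold uniformly in ${1 \leq r \leq 2}$. This gives

$\displaystyle \| \sup_{1 \leq r \leq 2} |A_r P_N f| \|_{L^2({\bf R}^d)} \lesssim N^{1/2} \cdot N^{-(d-1)/2} \|f\|_{L^2({\bf R}^d)} + N^{-1/2} \cdot N \cdot N^{-(d-1)/2} \|f\|_{L^2({\bf R}^d)},$

and both terms on the right-hand side are equal to ${N^{1/2} N^{-(d-1)/2} \|f\|_{L^2({\bf R}^d)}}$, which is (13).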

An interpolation between (12) and (13) (for ${q}$ sufficiently close to ${1}$) then gives (9) for some ${\epsilon > 0}$ (here we crucially use that ${p > \frac{d}{d-1}}$ and ${p<2}$).
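The numerology of this interpolation can be sketched as follows (the interpolation parameter ${\theta}$ is notation for this sketch only). Interpolating between (12) at exponent ${q}$ and (13) at exponent ${2}$, with ${\frac{1}{p} = \frac{1-\theta}{q} + \frac{\theta}{2}}$ and ${0 < \theta < 1}$, gives a bound of the form (9) with ${N^{-\epsilon}}$ replaced by

$\displaystyle N^{1-\theta} \cdot (N^{1/2} N^{-(d-1)/2})^{\theta} = N^{1 - \frac{d\theta}{2}},$

which is a negative power of ${N}$ precisely when ${\theta > \frac{2}{d}}$. In the limiting case ${q = 1}$ one would have ${\theta = 2(1-\frac{1}{p})}$, and

$\displaystyle 2(1-\frac{1}{p}) > \frac{2}{d} \iff \frac{1}{p} < \frac{d-1}{d} \iff p > \frac{d}{d-1},$

so by continuity one can find ${q > 1}$ sufficiently close to ${1}$ for which ${\epsilon := \frac{d\theta}{2} - 1}$ is positive.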

Now we control the full maximal function ${M_S f}$. It suffices to show that

$\displaystyle \| \sup_R \sup_{R \leq r \leq 2R} |A_r f(x)| \|_{L^p({\bf R}^d)} \lesssim \|f\|_{L^p({\bf R}^d)},$

where ${R}$ ranges over dyadic numbers.

For any fixed ${R}$, the natural spatial scale is ${R}$, and the natural frequency scale is thus ${1/R}$. We therefore split

$\displaystyle f = P_{\leq 1/R} f + \sum_{N > 1} P_{N/R} f,$

and aim to establish the bounds

$\displaystyle \| \sup_R \sup_{R \leq r \leq 2R} |A_r P_{\leq 1/R} f(x)| \|_{L^p({\bf R}^d)} \lesssim \|f\|_{L^p({\bf R}^d)} \ \ \ \ \ (14)$

and

$\displaystyle \| \sup_R \sup_{R \leq r \leq 2R} |A_r P_{N/R} f(x)| \|_{L^p({\bf R}^d)} \lesssim N^{-\epsilon} \|f\|_{L^p({\bf R}^d)} \ \ \ \ \ (15)$

for each ${N > 1}$ and some ${\epsilon>0}$ depending only on ${d}$ and ${p}$, similarly to before.

A rescaled version of the derivation of (10) gives

$\displaystyle |A_r P_{\leq 1/R} f(x)| \lesssim Mf(x)$

for all ${R \leq r \leq 2R}$, which already lets us deduce (14). As for (15), a rescaling of (11) gives

$\displaystyle |A_r P_{N/R} f(x)| \lesssim N Mf(x),$

for all ${R \leq r \leq 2R}$, and thus

$\displaystyle \| \sup_R \sup_{R \leq r \leq 2R} |A_r P_{N/R} f(x)| \|_{L^q({\bf R}^d)} \lesssim N \|f\|_{L^q({\bf R}^d)} \ \ \ \ \ (16)$

for all ${q>1}$. Meanwhile, at the ${L^2}$ level, we have

$\displaystyle \| A_r P_{N/R} f \|_{L^2({\bf R}^d)} \lesssim N^{-(d-1)/2} \|f\|_{L^2({\bf R}^d)}$

and

$\displaystyle \| \frac{d}{dr} A_r P_{N/R} f \|_{L^2({\bf R}^d)} \lesssim \frac{N}{R} N^{-(d-1)/2} \|f\|_{L^2({\bf R}^d)}$

and so

$\displaystyle \| (\frac{N}{R} \int_R^{2R} |A_r P_{N/R} f|^2\ dr)^{1/2} + (\frac{R}{N} \int_R^{2R} |\frac{d}{dr} A_r P_{N/R} f|^2\ dr)^{1/2} \|_{L^2({\bf R}^d)} \lesssim N^{1/2} N^{-(d-1)/2} \|f\|_{L^2({\bf R}^d)}$

which implies by rescaled Sobolev embedding that

$\displaystyle \| \sup_{R \leq r \leq 2R} |A_r P_{N/R} f| \|_{L^2({\bf R}^d)} \lesssim N^{1/2} N^{-(d-1)/2} \|f\|_{L^2({\bf R}^d)}.$

In fact, by writing ${P_{N/R} f = P_{N/R} \tilde P_{N/R} f}$, where ${\tilde P_{N/R}}$ is a slight widening of ${P_{N/R}}$, we have

$\displaystyle \| \sup_{R \leq r \leq 2R} |A_r P_{N/R} f| \|_{L^2({\bf R}^d)} \lesssim N^{1/2} N^{-(d-1)/2} \|\tilde P_{N/R} f\|_{L^2({\bf R}^d)};$

square summing this (and bounding a supremum by a square function) and using Plancherel we obtain

$\displaystyle \| \sup_R \sup_{R \leq r \leq 2R} |A_r P_{N/R} f| \|_{L^2({\bf R}^d)} \lesssim N^{1/2} N^{-(d-1)/2} \|f\|_{L^2({\bf R}^d)}.$

Interpolating this against (16) as before we obtain (15) as required.
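For completeness, here is a sketch of the square-summing step used just before this final interpolation. One bounds the supremum over dyadic ${R}$ by an ${\ell^2}$ sum, applies the estimate involving ${\tilde P_{N/R}}$ for each ${R}$, and then uses Plancherel’s theorem together with the bounded overlap of the Fourier supports of the ${\tilde P_{N/R}}$ (for fixed ${N}$ and varying dyadic ${R}$):

$\displaystyle \| \sup_R \sup_{R \leq r \leq 2R} |A_r P_{N/R} f| \|_{L^2({\bf R}^d)}^2 \leq \sum_R \| \sup_{R \leq r \leq 2R} |A_r P_{N/R} f| \|_{L^2({\bf R}^d)}^2 \lesssim N^{1-(d-1)} \sum_R \|\tilde P_{N/R} f\|_{L^2({\bf R}^d)}^2 \lesssim N^{1-(d-1)} \|f\|_{L^2({\bf R}^d)}^2.$

Taking square roots recovers the ${L^2}$ bound with the factor ${N^{1/2} N^{-(d-1)/2}}$.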