If {f: {\bf R}^d \rightarrow {\bf C}} is a locally integrable function, we define the Hardy-Littlewood maximal function {Mf: {\bf R}^d \rightarrow {\bf C}} by the formula

\displaystyle  Mf(x) := \sup_{r>0} \frac{1}{|B(x,r)|} \int_{B(x,r)} |f(y)|\ dy,

where {B(x,r)} is the ball of radius {r} centred at {x}, and {|E|} denotes the measure of a set {E}. The Hardy-Littlewood maximal inequality asserts that

\displaystyle  |\{ x \in {\bf R}^d: Mf(x) > \lambda \}| \leq \frac{C_d}{\lambda} \|f\|_{L^1({\bf R}^d)} \ \ \ \ \ (1)

for all {f\in L^1({\bf R}^d)}, all {\lambda > 0}, and some constant {C_d > 0} depending only on {d}. By a standard density argument, this implies in particular that we have the Lebesgue differentiation theorem

\displaystyle  \lim_{r \rightarrow 0} \frac{1}{|B(x,r)|} \int_{B(x,r)} f(y)\ dy = f(x)

for all {f \in L^1({\bf R}^d)} and almost every {x \in {\bf R}^d}. See for instance my lecture notes on this topic.
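As a toy illustration of (1) in dimension {d=1}, where balls are intervals, the following Python sketch brute-forces the centred maximal function of the hypothetical test function {f = 1_{[0,1]}}. The supremum over {r} is taken over a finite geometric grid of radii, so the computed values sit slightly below the true {Mf}. For this particular {f} one can check by hand that {Mf(x) = 1} for {0 < x < 1} and {Mf(x) = 1/(1+2|x-1/2|)} otherwise, so that {\lambda |\{Mf > \lambda\}| = 1-\lambda \leq \|f\|_{L^1}} for {0 < \lambda \leq 1/2}.

```python
import functools
import math

def centred_average(x: float, r: float) -> float:
    """Average of the test function f = 1_[0,1] over the interval [x - r, x + r]."""
    overlap = max(0.0, min(x + r, 1.0) - max(x - r, 0.0))
    return overlap / (2.0 * r)

# Geometric grid of radii from 1e-3 to about 1e2 with ratio 1.005; the maximum
# over this grid approximates the supremum over all r > 0 from below.
RADII = [1e-3 * 1.005 ** k for k in range(int(math.log(1e5) / math.log(1.005)) + 1)]

@functools.lru_cache(maxsize=None)
def maximal_function(x: float) -> float:
    return max(centred_average(x, r) for r in RADII)

def closed_form(x: float) -> float:
    """Hand-computed Mf for f = 1_[0,1]."""
    return 1.0 if 0.0 < x < 1.0 else 1.0 / (1.0 + 2.0 * abs(x - 0.5))

def level_set_measure(lam: float, lo: float = -3.0, hi: float = 4.0, dx: float = 0.02) -> float:
    """Midpoint Riemann-sum estimate of |{x in [lo, hi] : Mf(x) > lam}|."""
    n = int(round((hi - lo) / dx))
    return dx * sum(1 for k in range(n) if maximal_function(lo + (k + 0.5) * dx) > lam)

for lam in (0.2, 0.5):
    # Weak (1,1) check: lam * |{Mf > lam}| should not exceed ||f||_1 = 1.
    print(lam, lam * level_set_measure(lam))
```

The printed products come out close to {0.8} and {0.5}, matching the hand computation; a single example of course says nothing about the best constant {C_d}.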

By combining the Hardy-Littlewood maximal inequality with the Marcinkiewicz interpolation theorem (and the trivial inequality {\|Mf\|_{L^\infty({\bf R}^d)} \leq \|f\|_{L^\infty({\bf R}^d)}}) we see that

\displaystyle  \|Mf\|_{L^p({\bf R}^d)} \leq C_{d,p} \|f\|_{L^p({\bf R}^d)} \ \ \ \ \ (2)

for all {p > 1} and {f \in L^p({\bf R}^d)}, and some constant {C_{d,p}} depending on {d} and {p}.

The exact dependence of {C_{d,p}} on {d} and {p} is still not completely understood. The standard Vitali-type covering argument used to establish (1) has an exponential dependence on dimension, giving a constant of the form {C_d = C^d} for some absolute constant {C>1}. Inserting this into the Marcinkiewicz theorem, one obtains a constant {C_{d,p}} of the form {C_{d,p} = \frac{C^d}{p-1}} for some {C>1} (and taking {p} bounded away from infinity, for simplicity). The dependence on {p} is about right, but the dependence on {d} should not be exponential.

In 1982, Stein gave an elegant argument (with full details appearing in a subsequent paper of Stein and Strömberg), based on the Calderón-Zygmund method of rotations, to eliminate the dependence on {d}:

Theorem 1 One can take {C_{d,p} = C_p} for each {p>1}, where {C_p} depends only on {p}.

The argument is based on an earlier bound of Stein from 1976 on the spherical maximal function

\displaystyle  M_S f(x) := \sup_{r>0} A_r |f|(x)

where {A_r} are the spherical averaging operators

\displaystyle  A_r f(x) := \int_{S^{d-1}} f(x+r\omega) d\sigma^{d-1}(\omega)

and {d\sigma^{d-1}} is normalised surface measure on the sphere {S^{d-1}}. Because this is an uncountable supremum, and the averaging operators {A_r} do not have good continuity properties in {r}, it is not a priori obvious that {M_S f} is even a measurable function for, say, locally integrable {f}; but we can avoid this technical issue, at least initially, by restricting attention to continuous functions {f}. The Stein maximal theorem for the spherical maximal function then asserts that if {d \geq 3} and {p > \frac{d}{d-1}}, then we have

\displaystyle  \| M_S f \|_{L^p({\bf R}^d)} \leq C_{d,p} \|f\|_{L^p({\bf R}^d)} \ \ \ \ \ (3)

for all (continuous) {f \in L^p({\bf R}^d)}. We will sketch a proof of this theorem below the fold. (Among other things, one can use this bound to show the pointwise convergence {\lim_{r \rightarrow 0} A_r f(x) = f(x)} of the spherical averages for any {f \in L^p({\bf R}^d)} when {d \geq 3} and {p > \frac{d}{d-1}}, although we will not focus on this application here.)

The condition {p > \frac{d}{d-1}} can be seen to be necessary as follows. Take {f} to be any fixed non-negative bump function that is not identically zero. A brief calculation then shows that {M_S f(x)} decays like {|x|^{1-d}} as {|x| \rightarrow \infty}, and hence {M_S f} does not lie in {L^p({\bf R}^d)} unless {p > \frac{d}{d-1}}. By taking {f} to be a rescaled bump function supported on a small ball, one can show that the condition {p > \frac{d}{d-1}} is necessary even if we replace {{\bf R}^d} with a compact region (and similarly restrict the radius parameter {r} to be bounded). The condition {d \geq 3} however is not quite necessary; the result is also true when {d=2}, but this turned out to be a more difficult result, obtained first by Bourgain, with a simplified proof (based on the local smoothing properties of the wave equation) later given by Mockenhaupt-Seeger-Sogge.
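The decay rate {|x|^{1-d}} can be made completely explicit when {d=3} if, purely for this illustration, one replaces the bump function by the indicator {f = 1_{B(0,1)}}. A non-negative bump function that is not identically zero dominates a positive multiple of the indicator of some small ball, so the rate is the same up to translation and rescaling, and {A_r f(x)} still makes sense for every {r}. By Archimedes' theorem, {u = \omega \cdot x/|x|} is uniformly distributed on {[-1,1]} under {d\sigma^2}, and {|x+r\omega|^2 = D^2 + r^2 + 2rDu} with {D = |x| > 1}. Maximising the resulting average in {r} (the optimum is at {r = \sqrt{D^2-1}}) gives {M_S f(x) = \frac{1}{2}(1-\sqrt{1-D^{-2}}) \sim \frac{1}{4D^2}}. The sketch below checks this numerically. Since {\int_{|x|>1} |x|^{-2p}\ dx < \infty} exactly when {2p > 3}, this recovers the threshold {p > \frac{3}{2} = \frac{d}{d-1}}.

```python
import math

def sphere_average_of_ball(D: float, r: float) -> float:
    """A_r 1_{B(0,1)}(x) in R^3 for |x| = D, via Archimedes' theorem.

    Under normalised measure on S^2 the coordinate u = omega . (x/|x|) is uniform
    on [-1, 1], and |x + r*omega|^2 = D^2 + r^2 + 2*r*D*u, so the average is the
    fraction of u in [-1, 1] with D^2 + r^2 + 2*r*D*u < 1."""
    c = (1.0 - D * D - r * r) / (2.0 * r * D)
    return (1.0 + max(-1.0, min(1.0, c))) / 2.0

def spherical_maximal_of_ball(D: float, n_r: int = 20000) -> float:
    """Approximate M_S 1_{B(0,1)}(x) for |x| = D > 1 by maximising over a grid of r."""
    # Spheres with |r - D| >= 1 miss the open unit ball, so only (D - 1, D + 1) matters.
    return max(sphere_average_of_ball(D, D - 1.0 + 2.0 * (j + 0.5) / n_r) for j in range(n_r))

def closed_form(D: float) -> float:
    return (1.0 - math.sqrt(1.0 - 1.0 / D ** 2)) / 2.0

for D in (2.0, 4.0, 8.0, 16.0, 32.0):
    m = spherical_maximal_of_ball(D)
    # D^2 * M_S f should approach 1/4, i.e. M_S f decays like |x|^{-2} = |x|^{1-d}.
    print(D, m, closed_form(D), D * D * m)
```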

The Hardy-Littlewood maximal operator {Mf}, which involves averaging over balls, is clearly related to the spherical maximal operator, which averages over spheres. Indeed, by using polar co-ordinates, one easily verifies the pointwise inequality

\displaystyle  Mf(x) \leq M_S f(x)

for any (continuous) {f}, which intuitively reflects the fact that one can think of a ball as an average of spheres. Thus, we see that the spherical maximal inequality (3) implies the Hardy-Littlewood maximal inequality (2) with the same constant {C_{d,p}}. (This implication is initially only valid for continuous functions, but one can then extend the inequality (2) to the rest of {L^p({\bf R}^d)} by a standard limiting argument.)
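For the record, the polar co-ordinates computation behind the pointwise inequality {Mf(x) \leq M_S f(x)} runs as follows. Write {|S^{d-1}|} for the (unnormalised) surface area of the unit sphere, so that {|B(x,R)| = |S^{d-1}| R^d/d}. Then for every {R>0}

\displaystyle  \frac{1}{|B(x,R)|} \int_{B(x,R)} |f(y)|\ dy = \frac{d}{R^d} \int_0^R r^{d-1} A_r |f|(x)\ dr \leq M_S f(x) \frac{d}{R^d} \int_0^R r^{d-1}\ dr = M_S f(x),

and taking the supremum in {R} gives the claim. In other words, each ball average is a weighted average of the sphere averages {A_r |f|(x)}, {0 < r \leq R}, with weight {\frac{d r^{d-1}}{R^d}\ dr}.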

At first glance, this observation does not immediately establish Theorem 1, for two reasons. Firstly, Stein’s spherical maximal theorem is restricted to the case when {d \geq 3} and {p > \frac{d}{d-1}}; and secondly, the constant {C_{d,p}} in that theorem still depends on dimension {d}. The first objection can be easily disposed of, for if {p>1}, then the hypotheses {d \geq 3} and {p > \frac{d}{d-1}} will automatically be satisfied for {d} sufficiently large (depending on {p}); note that the case when {d} is bounded (with a bound depending on {p}) is already handled by the classical maximal inequality (2).

We still have to deal with the second objection, namely that the constant {C_{d,p}} in (3) depends on {d}. However, here we can use the method of rotations to show that the constants {C_{d,p}} can be taken to be non-increasing (and hence bounded) in {d}. The idea is to view high-dimensional spheres as an average of rotated low-dimensional spheres. We illustrate this with a demonstration that {C_{d+1,p} \leq C_{d,p}}, in the sense that any bound of the form

\displaystyle  \| M_S f \|_{L^p({\bf R}^d)} \leq A \|f\|_{L^p({\bf R}^d)} \ \ \ \ \ (4)

for the {d}-dimensional spherical maximal function, implies the same bound

\displaystyle  \| M_S f \|_{L^p({\bf R}^{d+1})} \leq A \|f\|_{L^p({\bf R}^{d+1})} \ \ \ \ \ (5)

for the {d+1}-dimensional spherical maximal function, with exactly the same constant {A}. For any direction {\omega_0 \in S^d \subset {\bf R}^{d+1}}, consider the maximal operators

\displaystyle  M_S^{\omega_0} f(x) := \sup_{r>0} A_r^{\omega_0} |f|(x)

for any continuous {f: {\bf R}^{d+1} \rightarrow {\bf C}}, where

\displaystyle  A_r^{\omega_0} f(x) := \int_{S^{d-1}} f( x + r U_{\omega_0} \omega)\ d\sigma^{d-1}(\omega)

where {U_{\omega_0}} is some linear isometry from {{\bf R}^d} onto the hyperplane {\omega_0^\perp}, which thus maps the sphere {S^{d-1}} onto the sphere {S^{d-1,\omega_0} := \{ \omega \in S^d: \omega \perp \omega_0\}}; the exact choice of {U_{\omega_0}} is irrelevant due to the rotation-invariance of surface measure {d\sigma^{d-1}} on the sphere {S^{d-1}}. A simple application of Fubini’s theorem and (4) (after first rotating {\omega_0} to be, say, the standard unit vector {e_{d+1}}, so that {A_r^{\omega_0}} only acts in the first {d} variables and (4) can be applied on each slice {{\bf R}^d \times \{t\}}) then shows that

\displaystyle  \| M_S^{\omega_0} f \|_{L^p({\bf R}^{d+1})} \leq A \|f\|_{L^p({\bf R}^{d+1})} \ \ \ \ \ (6)

uniformly in {\omega_0}. On the other hand, by viewing the {d}-dimensional sphere {S^d} as an average of the spheres {S^{d-1,\omega_0}}, we have the identity

\displaystyle  A_r f(x) = \int_{S^d} A_r^{\omega_0} f(x)\ d\sigma^d(\omega_0);

indeed, one can deduce this from the uniqueness of Haar measure by noting that both the left-hand side and right-hand side are invariant means of {f} on the sphere {\{ y \in {\bf R}^{d+1}: |y-x|=r\}}. This implies that

\displaystyle  M_S f(x) \leq \int_{S^d} M_S^{\omega_0} f(x)\ d\sigma^d(\omega_0)

and thus by Minkowski’s inequality for integrals, we may deduce (5) from (6).
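Here is a hedged numerical sanity check of the averaging identity for {A_r} above in the lowest-dimensional case {d=2}, with {r=1} and {x=0}, so that the sphere {S^2 \subset {\bf R}^3} is compared with the average of the great circles {S^{1,\omega_0}}. The test function {f(y) = e^{y_1 - y_3/2} + y_2^2} is an arbitrary choice, not from the text. Its exact spherical average is {\sinh(|a|)/|a| + 1/3} with {a = (1,0,-1/2)}. Both sides are computed by quadrature, using Archimedes' theorem (the height {z} is uniformly distributed on {S^2}) for the averages over the sphere.

```python
import math

def sample_f(y):
    # Arbitrary smooth test function on R^3 (an assumption, not from the text).
    return math.exp(y[0] - 0.5 * y[2]) + y[1] ** 2

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def normalise(a):
    n = math.sqrt(sum(t * t for t in a))
    return tuple(t / n for t in a)

def sphere_average(g, n_z=200, n_phi=200):
    """Average of g over S^2: z = cos(theta) is uniform on [-1, 1] (Archimedes),
    so we use equal weights on a midpoint grid in z times a uniform grid in phi."""
    total = 0.0
    for i in range(n_z):
        z = -1.0 + (i + 0.5) * 2.0 / n_z
        s = math.sqrt(1.0 - z * z)
        for j in range(n_phi):
            phi = 2.0 * math.pi * j / n_phi
            total += g((s * math.cos(phi), s * math.sin(phi), z))
    return total / (n_z * n_phi)

def great_circle_average(g, w, n_t=64):
    """Average of g over the unit circle S^{1,w} in the plane perpendicular to w."""
    helper = (1.0, 0.0, 0.0) if abs(w[0]) < 0.9 else (0.0, 1.0, 0.0)
    u = normalise(cross(helper, w))
    v = cross(w, u)  # (u, v) is an orthonormal basis of the plane perpendicular to w
    total = 0.0
    for k in range(n_t):
        t = 2.0 * math.pi * k / n_t
        total += g(tuple(math.cos(t) * u[i] + math.sin(t) * v[i] for i in range(3)))
    return total / n_t

exact = math.sinh(math.sqrt(1.25)) / math.sqrt(1.25) + 1.0 / 3.0
lhs = sphere_average(sample_f)  # A_1 f(0)
rhs = sphere_average(lambda w: great_circle_average(sample_f, w), n_z=60, n_phi=60)
print(exact, lhs, rhs)
```

Both computed averages agree with the exact value to roughly three decimal places, which is about what the coarse outer quadrature allows.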

Remark 1 Unfortunately, the method of rotations does not work to show that the constant {C_d} for the weak {(1,1)} inequality (1) is independent of dimension, as the weak {L^1} quasinorm {\| \cdot \|_{L^{1,\infty}}} is not a genuine norm and does not obey the Minkowski inequality for integrals. Indeed, the question of whether {C_d} in (1) can be taken to be independent of dimension remains open. The best known positive result is due to Stein and Strömberg, who showed that one can take {C_d = Cd} for some absolute constant {C}, by comparing the Hardy-Littlewood maximal function with the heat kernel maximal function

\displaystyle  \sup_{t > 0} e^{t\Delta} |f|(x).

The abstract semigroup maximal inequality of Dunford and Schwartz (discussed for instance in these lecture notes of mine) shows that the heat kernel maximal function is of weak-type {(1,1)} with a constant of {1}, and this can be used, together with a comparison argument, to give the Stein-Strömberg bound. In the converse direction, it is a recent result of Aldaz that if one replaces the balls {B(x,r)} with cubes, then the weak {(1,1)} constant {C_d} must go to infinity as {d \rightarrow \infty}.

— 1. Proof of spherical maximal inequality —

We now sketch the proof of Stein’s spherical maximal inequality (3) for {d \geq 3}, {p > \frac{d}{d-1}}, and {f \in L^p({\bf R}^d)} continuous. To motivate the argument, let us first establish the simpler estimate

\displaystyle  \| M_S^1 f \|_{L^p({\bf R}^d)} \leq C_{d,p} \|f\|_{L^p({\bf R}^d)}

where {M_S^1} is the spherical maximal function restricted to unit scales:

\displaystyle  M_S^1 f(x) := \sup_{1 \leq r \leq 2} A_r |f|(x).

For the rest of these notes, we suppress the dependence of constants on {d} and {p}, using {X \lesssim Y} as short-hand for {X \leq C_{d,p} Y}.

It will of course suffice to establish the estimate

\displaystyle  \| \sup_{1 \leq r \leq 2} |A_r f(x)| \|_{L^p({\bf R}^d)} \lesssim \|f\|_{L^p({\bf R}^d)} \ \ \ \ \ (7)

for all continuous {f \in L^p({\bf R}^d)}, as the original claim follows by replacing {f} with {|f|}. Also, since the bound is trivially true for {p=\infty}, and we crucially have {\frac{d}{d-1} \leq \frac{3}{2} < 2} in three and higher dimensions, we can (by interpolating with the {p=\infty} bound) restrict attention to the regime {p<2}.

We establish this bound using a Littlewood-Paley decomposition

\displaystyle  f = \sum_N P_N f

where {N} ranges over dyadic numbers {2^k}, {k \in {\bf Z}}, and {P_N} is a smooth Fourier projection to frequencies {|\xi| \sim N}; a bit more formally, we have

\displaystyle  \widehat{P_N f}(\xi) = \psi(\frac{\xi}{N}) \hat f(\xi)

where {\psi} is a bump function supported on the annulus {\{ \xi \in {\bf R}^d: 1/2 \leq |\xi| \leq 2\}} such that {\sum_N \psi(\frac{\xi}{N}) = 1} for all non-zero {\xi}. Actually, for the purposes of proving (7), it is more convenient to use the decomposition

\displaystyle  f = P_{\leq 1} f + \sum_{N>1} P_N f

where {P_{\leq 1} = \sum_{N \leq 1} P_N} is the projection to frequencies {|\xi| \lesssim 1}. By the triangle inequality, it then suffices to show the bounds

\displaystyle  \| \sup_{1 \leq r \leq 2} |A_r P_{\leq 1} f(x)| \|_{L^p({\bf R}^d)} \lesssim \|f\|_{L^p({\bf R}^d)} \ \ \ \ \ (8)


\displaystyle  \| \sup_{1 \leq r \leq 2} |A_r P_N f(x)| \|_{L^p({\bf R}^d)} \lesssim N^{-\epsilon} \|f\|_{L^p({\bf R}^d)} \ \ \ \ \ (9)

for all {N \geq 1} and some {\epsilon>0} depending only on {p,d}.
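One standard way to build a {\psi} with the properties above is the following; it is an illustrative choice, not necessarily the one intended here. Take a smooth radial cutoff {\varphi} equal to {1} for {|\xi| \leq 1} and vanishing for {|\xi| \geq 2}, and set {\psi(\xi) := \varphi(\xi) - \varphi(2\xi)}. The sum {\sum_N \psi(\xi/N)} then telescopes to {1} for {\xi \neq 0}. The partial sum over {N \leq 1} telescopes to {\varphi(\xi)}, which is therefore the symbol of {P_{\leq 1}}. The Python sketch below checks both facts in the radial variable {\rho = |\xi|}, truncating the dyadic sums to {N \geq 2^{-40}} (and {N \leq 2^{40}}).

```python
import math

def _h(t: float) -> float:
    return math.exp(-1.0 / t) if t > 0 else 0.0

def smooth_step(t: float) -> float:
    """C^infinity function equal to 0 for t <= 0 and to 1 for t >= 1."""
    return _h(t) / (_h(t) + _h(1.0 - t))

def phi(rho: float) -> float:
    """Radial cutoff in rho = |xi|: equal to 1 for rho <= 1 and 0 for rho >= 2."""
    return 1.0 - smooth_step(rho - 1.0)

def psi(rho: float) -> float:
    """psi(xi) = phi(xi) - phi(2 xi), supported in the annulus 1/2 <= rho <= 2."""
    return phi(rho) - phi(2.0 * rho)

def lp_sum(rho: float, k_min: int = -40, k_max: int = 40) -> float:
    """Truncated sum of psi(xi / N) over dyadic N = 2^k, k_min <= k <= k_max."""
    return sum(psi(rho / 2.0 ** k) for k in range(k_min, k_max + 1))

def low_part(rho: float, k_min: int = -40) -> float:
    """Truncated symbol of P_{<=1}: the sum of psi(xi / N) over dyadic N <= 1."""
    return sum(psi(rho / 2.0 ** k) for k in range(k_min, 1))

for rho in (1e-6, 0.3, 1.0, 1.7, 1e6):
    # Expect lp_sum = 1 and low_part = phi, up to rounding.
    print(rho, lp_sum(rho), low_part(rho) - phi(rho))
```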

To prove the low-frequency bound (8), observe that {P_{\leq 1}} is a convolution operator with a Schwartz function, and from this and the radius restriction {1 \leq r \leq 2} we see that {A_r P_{\leq 1}} is a convolution operator with a kernel that is rapidly decreasing, uniformly in {1 \leq r \leq 2}, and is therefore dominated by a fixed radially decreasing integrable function. From this we obtain the pointwise bound

\displaystyle  |A_r P_{\leq 1} f(x)| \lesssim Mf(x) \ \ \ \ \ (10)

and the claim (8) follows from (2).

Now we turn to the more interesting high-frequency bound (9). Here, {P_N} is a convolution operator with a Schwartz kernel {N^d \check \psi(N \cdot)} concentrated at scale {\sim 1/N}, and so {A_r P_N} is a convolution operator with a function of magnitude {O(N)} concentrated on an annulus of thickness {O(1/N)} around the sphere of radius {r}. This can be used to give the pointwise bound

\displaystyle  |A_r P_N f(x)| \lesssim N Mf(x), \ \ \ \ \ (11)

which by (2) gives the bound

\displaystyle  \| \sup_{1 \leq r \leq 2} |A_r P_N f(x)| \|_{L^q({\bf R}^d)} \lesssim_q N \|f\|_{L^q({\bf R}^d)} \ \ \ \ \ (12)

for any {q > 1}. This is not directly strong enough to prove (9), due to the “loss of one derivative” as manifested by the factor {N}. On the other hand, this bound (12) holds for all {q>1}, and not just for {q > \frac{d}{d-1}}.
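The heuristic that the kernel of {A_r P_N} has magnitude {O(N)} on an annulus of thickness {O(1/N)} can be seen explicitly in {d=3}. Purely for illustration, replace the kernel of {P_N} by the Gaussian density {g_N(y) := (N^2/2\pi)^{3/2} e^{-N^2|y|^2/2}} at scale {1/N}. This is an assumption: the true Littlewood-Paley kernel has mean zero and is not a Gaussian. Using Archimedes' theorem, for {|x| = s} one finds

\displaystyle  A_r g_N(x) = \frac{N}{2(2\pi)^{3/2} s r} ( e^{-N^2(s-r)^2/2} - e^{-N^2(s+r)^2/2} ),

which is about {N/(2(2\pi)^{3/2} r^2)} when {s = r} and decays like a Gaussian in {N(s-r)}. The sketch compares this formula with direct quadrature.

```python
import math

def smoothed_sphere_kernel(s: float, r: float, N: float, n: int = 20000) -> float:
    """A_r g_N(x) in R^3 for |x| = s, by the midpoint rule in u = omega . (x/|x|).

    g_N(y) = (N^2 / 2 pi)^{3/2} exp(-N^2 |y|^2 / 2) is a hypothetical stand-in for
    the kernel of P_N; u is uniform on [-1, 1] by Archimedes' theorem."""
    norm = (N * N / (2.0 * math.pi)) ** 1.5
    total = 0.0
    for k in range(n):
        u = -1.0 + (k + 0.5) * 2.0 / n
        total += math.exp(-0.5 * N * N * (s * s + r * r + 2.0 * s * r * u))
    return norm * total / n

def closed_form(s: float, r: float, N: float) -> float:
    return N / (2.0 * (2.0 * math.pi) ** 1.5 * s * r) * (
        math.exp(-0.5 * (N * (s - r)) ** 2) - math.exp(-0.5 * (N * (s + r)) ** 2))

for N in (4, 8, 16):
    peak = smoothed_sphere_kernel(1.0, 1.0, N)
    off = smoothed_sphere_kernel(1.0 + 3.0 / N, 1.0, N)
    # peak / N stays essentially constant; at distance 3/N from the sphere the
    # kernel has dropped below 1% of its peak.
    print(N, peak / N, off / peak)
```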

To counterbalance this loss of one derivative, we turn to {L^2} estimates. A standard stationary phase computation (or Bessel function computation) shows that {A_r} is a Fourier multiplier whose symbol decays like {|\xi|^{-(d-1)/2}}. As such, Plancherel’s theorem yields the {L^2} bound

\displaystyle  \| A_r P_N f \|_{L^2({\bf R}^d)} \lesssim N^{-(d-1)/2} \|f\|_{L^2({\bf R}^d)}

uniformly in {1 \leq r \leq 2}. But we still have to take the supremum over {r}. This is an uncountable supremum, so one cannot just apply a union bound argument. However, from the uncertainty principle, we expect {P_N f} to be “blurred out” at spatial scale {1/N}, which suggests that the averages {A_r P_N f} do not vary much when {r} is restricted to an interval of size {1/N}. Heuristically, this then suggests that

\displaystyle  \sup_{1 \leq r \leq 2} |A_r P_N f| \sim \sup_{1 \leq r \leq 2: r \in \frac{1}{N} {\bf Z}} |A_r P_N f|.

Estimating the discrete supremum on the right-hand side somewhat crudely by the square-function,

\displaystyle  \sup_{1 \leq r \leq 2: r \in \frac{1}{N} {\bf Z}} |A_r P_N f| \leq (\sum_{1 \leq r \leq 2: r \in \frac{1}{N} {\bf Z}} |A_r P_N f|^2)^{1/2},

and taking {L^2} norms, one is then led to the heuristic prediction that

\displaystyle  \| \sup_{1 \leq r \leq 2} |A_r P_N f| \|_{L^2({\bf R}^d)} \lesssim N^{1/2} N^{-(d-1)/2} \|f\|_{L^2({\bf R}^d)}. \ \ \ \ \ (13)

One can make this heuristic precise using the one-dimensional Sobolev embedding inequality adapted to scale {1/N}, namely that

\displaystyle  \sup_{1 \leq r \leq 2} |g(r)| \lesssim N^{1/2} (\int_1^2 |g(r)|^2\ dr)^{1/2} + N^{-1/2} (\int_1^2 |g'(r)|^2\ dr)^{1/2}.
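The scale-{1/N} Sobolev inequality just stated in fact holds with constant {1} for {N \geq 1}. For {r \in [1,2]}, pick an interval {I \ni r} of length {1/N} inside {[1,2]}. For {s \in I} we have {|g(r)| \leq |g(s)| + \int_I |g'| \leq |g(s)| + N^{-1/2} (\int_1^2 |g'|^2)^{1/2}} by Cauchy-Schwarz, and averaging in {s \in I} and applying Cauchy-Schwarz once more gives the claim. The following Python sketch uses hypothetical test functions and evaluates the ratio of the two sides on a fine grid. The Gaussian bump of width {1/N} gives a ratio that does not depend on {N}, which is the sense in which the inequality is sharp at this scale.

```python
import math

def sobolev_ratio(g, dg, N: float, n: int = 20000) -> float:
    """sup|g| / (N^{1/2} ||g||_{L^2[1,2]} + N^{-1/2} ||g'||_{L^2[1,2]}) on a midpoint grid."""
    h = 1.0 / n
    rs = [1.0 + (k + 0.5) * h for k in range(n)]
    sup = max(abs(g(r)) for r in rs)
    l2 = math.sqrt(h * sum(g(r) ** 2 for r in rs))
    l2_deriv = math.sqrt(h * sum(dg(r) ** 2 for r in rs))
    return sup / (math.sqrt(N) * l2 + l2_deriv / math.sqrt(N))

def bump(N: float, c: float):
    """Gaussian bump of width 1/N centred at c, together with its derivative."""
    def g(r):
        return math.exp(-(N * (r - c)) ** 2)
    def dg(r):
        return -2.0 * N * N * (r - c) * g(r)
    return g, dg

def wave(N: float):
    """Oscillation at frequency N, together with its derivative."""
    def g(r):
        return math.cos(2.0 * math.pi * N * r)
    def dg(r):
        return -2.0 * math.pi * N * math.sin(2.0 * math.pi * N * r)
    return g, dg

for N in (4, 16, 64):
    cases = {"bump@1.5": bump(N, 1.5), "bump@1": bump(N, 1.0), "wave": wave(N),
             "constant": (lambda r: 1.0, lambda r: 0.0)}
    print(N, {name: round(sobolev_ratio(g, dg, N), 3) for name, (g, dg) in cases.items()})
```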

A routine computation shows that

\displaystyle  \| \frac{d}{dr} A_r P_N f \|_{L^2({\bf R}^d)} \lesssim N \times N^{-(d-1)/2} \|f\|_{L^2({\bf R}^d)}

(which formalises the heuristic that {A_r P_N f} is roughly constant at {r}-scales {1/N}), and this soon leads to a rigorous proof of (13).
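The {L^2} bounds used above rest on the decay {|\widehat{d\sigma^{d-1}}(\xi)| = O(|\xi|^{-(d-1)/2})} of the Fourier transform of surface measure, the symbol of {A_r} being {\widehat{d\sigma^{d-1}}(r\xi)}. For odd {d} this can be checked by hand. Projecting {d\sigma^{d-1}} onto a line gives a density proportional to {(1-t^2)^{(d-3)/2}} on {[-1,1]}. With the convention {\hat f(\xi) = \int f(x) e^{-2\pi i x \cdot \xi}\ dx} one therefore has {\widehat{d\sigma^{2}}(\xi) = \frac{\sin a}{a}} and {\widehat{d\sigma^{4}}(\xi) = \frac{3(\sin a - a \cos a)}{a^3}} with {a = 2\pi|\xi|}. The following sketch evaluates the one-dimensional integral by Simpson's rule, compares it with these formulas, and tracks {|\xi|^{(d-1)/2} |\widehat{d\sigma^{d-1}}(\xi)|}.

```python
import math

def sigma_hat(rho: float, d: int, n: int = 2000) -> float:
    """Fourier transform of normalised surface measure on S^{d-1} at |xi| = rho (d odd).

    The projection of the measure onto a line has density proportional to
    (1 - t^2)^{(d-3)/2} on [-1, 1]; both the integral and its normalising
    constant are computed with composite Simpson's rule (n must be even)."""
    h = 2.0 / n
    num = den = 0.0
    for k in range(n + 1):
        t = -1.0 + k * h
        c = 1.0 if k in (0, n) else (4.0 if k % 2 else 2.0)
        w = c * max(0.0, 1.0 - t * t) ** ((d - 3) / 2)
        num += w * math.cos(2.0 * math.pi * rho * t)
        den += w
    return num / den

for d in (3, 5, 7):
    # rho^{(d-1)/2} |sigma_hat| should stay bounded as rho grows.
    worst = max(r ** ((d - 1) / 2) * abs(sigma_hat(r, d)) for r in [1 + 0.25 * k for k in range(77)])
    print(d, worst)
```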

An interpolation between (12) and (13) (for {q} sufficiently close to {1}) then gives (9) for some {\epsilon > 0} (here we crucially use that {p > \frac{d}{d-1}} and {p<2}).
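To see where {p > \frac{d}{d-1}} enters, one can simply track exponents. Ignore constants, as well as the technicality that the maximal operator is only sublinear (which can be handled, for instance, by linearising the supremum). Interpolating the {L^q} bound {N^1} from (12) with the {L^2} bound {N^{1/2-(d-1)/2}} from (13), with {\frac{1}{p} = \frac{1-\theta}{q} + \frac{\theta}{2}}, gives an {L^p} bound of {N^e} with {e = (1-\theta) + \theta(\frac{1}{2} - \frac{d-1}{2})}. As {q \rightarrow 1} this tends to {1 - d(1-\frac{1}{p})}, which is negative precisely when {p > \frac{d}{d-1}}. By continuity, some {q} slightly larger than {1} still gives {e = -\epsilon < 0}. The sketch below does this bookkeeping in exact rational arithmetic.

```python
from fractions import Fraction as F

def interpolated_exponent(d, p, q):
    """Exponent e such that interpolation gives an L^p bound N^e, from an L^q bound
    N^1 (as in (12)) and an L^2 bound N^{1/2 - (d-1)/2} (as in (13))."""
    d, p, q = F(d), F(p), F(q)
    if not (1 <= q <= p <= 2 and q < 2):
        raise ValueError("need 1 <= q <= p <= 2 with q < 2")
    theta = (1 / q - 1 / p) / (1 / q - F(1, 2))  # solves 1/p = (1 - theta)/q + theta/2
    return (1 - theta) * 1 + theta * (F(1, 2) - (d - 1) / 2)

def critical_exponent_limit(d, p):
    """Limit q -> 1 of the exponent: 1 - d (1 - 1/p), negative iff p > d/(d-1)."""
    return 1 - F(d) * (1 - 1 / F(p))

for p in (F(7, 5), F(3, 2), F(8, 5)):
    # d = 3: the threshold is p = 3/2.
    print(p, critical_exponent_limit(3, p), interpolated_exponent(3, p, F(11, 10)))
```

For instance, with {d=3}, {p = 8/5} and {q = 11/10} the exponent is {-1/24}, so (9) holds with {\epsilon = 1/24} in that case.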

Now we control the full maximal function {M_S f}. It suffices to show that

\displaystyle  \| \sup_R \sup_{R \leq r \leq 2R} |A_r f(x)| \|_{L^p({\bf R}^d)} \lesssim \|f\|_{L^p({\bf R}^d)},

where {R} ranges over dyadic numbers.

For any fixed {R}, the natural spatial scale is {R}, and the natural frequency scale is thus {1/R}. We therefore split

\displaystyle  f = P_{\leq 1/R} f + \sum_{N > 1} P_{N/R} f,

and aim to establish the bounds

\displaystyle  \| \sup_R \sup_{R \leq r \leq 2R} |A_r P_{\leq 1/R} f(x)| \|_{L^p({\bf R}^d)} \lesssim \|f\|_{L^p({\bf R}^d)} \ \ \ \ \ (14)


\displaystyle  \| \sup_R \sup_{R \leq r \leq 2R} |A_r P_{N/R} f(x)| \|_{L^p({\bf R}^d)} \lesssim N^{-\epsilon} \|f\|_{L^p({\bf R}^d)} \ \ \ \ \ (15)

for each {N > 1} and some {\epsilon>0} depending only on {d} and {p}, similarly to before.

A rescaled version of the derivation of (10) gives

\displaystyle  |A_r P_{\leq 1/R} f(x)| \lesssim Mf(x)

for all {R \leq r \leq 2R}, which already lets us deduce (14). As for (15), a rescaling of (11) gives

\displaystyle  |A_r P_{N/R} f(x)| \lesssim N Mf(x),

for all {R \leq r \leq 2R}, and thus

\displaystyle  \| \sup_R \sup_{R \leq r \leq 2R} |A_r P_{N/R} f(x)| \|_{L^q({\bf R}^d)} \lesssim N \|f\|_{L^q({\bf R}^d)} \ \ \ \ \ (16)

for all {q>1}. Meanwhile, at the {L^2} level, we have

\displaystyle  \| A_r P_{N/R} f \|_{L^2({\bf R}^d)} \lesssim N^{-(d-1)/2} \|f\|_{L^2({\bf R}^d)}


\displaystyle  \| \frac{d}{dr} A_r P_{N/R} f \|_{L^2({\bf R}^d)} \lesssim \frac{N}{R} N^{-(d-1)/2} \|f\|_{L^2({\bf R}^d)}

and so

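\displaystyle  \| (\frac{N}{R} \int_R^{2R} |A_r P_{N/R} f|^2\ dr)^{1/2} + (\frac{R}{N} \int_R^{2R} |\frac{d}{dr} A_r P_{N/R} f|^2\ dr)^{1/2} \|_{L^2({\bf R}^d)} \lesssim N^{1/2} N^{-(d-1)/2} \|f\|_{L^2({\bf R}^d)}

which implies, by the rescaled Sobolev embedding

\displaystyle  \sup_{R \leq r \leq 2R} |g(r)| \lesssim (\frac{N}{R} \int_R^{2R} |g(r)|^2\ dr)^{1/2} + (\frac{R}{N} \int_R^{2R} |g'(r)|^2\ dr)^{1/2}

(obtained from the unit-scale inequality by rescaling {r}), that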

\displaystyle  \| \sup_{R \leq r \leq 2R} |A_r P_{N/R} f| \|_{L^2({\bf R}^d)} \lesssim N^{1/2} N^{-(d-1)/2} \|f\|_{L^2({\bf R}^d)}.

In fact, by writing {P_{N/R} f = P_{N/R} \tilde P_{N/R} f}, where {\tilde P_{N/R}} is a slight widening of {P_{N/R}}, we have

\displaystyle  \| \sup_{R \leq r \leq 2R} |A_r P_{N/R} f| \|_{L^2({\bf R}^d)} \lesssim N^{1/2} N^{-(d-1)/2} \|\tilde P_{N/R} f\|_{L^2({\bf R}^d)};

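square summing this over dyadic {R} (bounding the supremum in {R} by a square function), and using Plancherel together with the bounded overlap of the Fourier supports of the {\tilde P_{N/R}}, we obtain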

\displaystyle  \| \sup_R \sup_{R \leq r \leq 2R} |A_r P_{N/R} f| \|_{L^2({\bf R}^d)} \lesssim N^{1/2} N^{-(d-1)/2} \|f\|_{L^2({\bf R}^d)}.

Interpolating this against (16) as before we obtain (15) as required.
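The square-summing step above uses only that the widened symbols {\tilde \psi(R\xi/N)} have boundedly overlapping supports as {R} ranges over dyadic numbers. As an illustration (the specific widening is an assumption), use the construction {\psi(\xi) = \varphi(\xi) - \varphi(2\xi)} with a smooth radial cutoff {\varphi} equal to {1} on {|\xi| \leq 1} and vanishing on {|\xi| \geq 2}. One may then take {\tilde \psi(\xi) := \varphi(\xi/2) - \varphi(4\xi)}, which equals {1} on the support {\{1/2 \leq |\xi| \leq 2\}} of {\psi} (so that {P_{N/R} = P_{N/R} \tilde P_{N/R}}) and vanishes outside {\{1/4 < |\xi| < 4\}}. Any {\xi \neq 0} then lies in at most four of the annuli {\{1/4 < R|\xi|/N < 4\}}, so Plancherel gives {\sum_R \|\tilde P_{N/R} f\|_{L^2({\bf R}^d)}^2 \leq 4 \|f\|_{L^2({\bf R}^d)}^2}. The sketch checks both properties numerically in the radial variable.

```python
import math

def _h(t: float) -> float:
    return math.exp(-1.0 / t) if t > 0 else 0.0

def smooth_step(t: float) -> float:
    """C^infinity function equal to 0 for t <= 0 and to 1 for t >= 1."""
    return _h(t) / (_h(t) + _h(1.0 - t))

def phi(rho: float) -> float:
    """Radial cutoff in rho = |xi|: equal to 1 for rho <= 1 and 0 for rho >= 2."""
    return 1.0 - smooth_step(rho - 1.0)

def psi(rho: float) -> float:
    return phi(rho) - phi(2.0 * rho)  # supported in 1/2 <= rho <= 2

def psi_tilde(rho: float) -> float:
    return phi(rho / 2.0) - phi(4.0 * rho)  # equals 1 on [1/2, 2], vanishes off (1/4, 4)

def max_overlap(samples: int = 10000, j_range=range(-10, 11)) -> float:
    """max over rho of sum_j psi_tilde(rho / 2^j)^2; by dyadic scaling, rho in [1, 2) suffices."""
    best = 0.0
    for k in range(samples):
        rho = 1.0 + k / samples
        best = max(best, sum(psi_tilde(rho / 2.0 ** j) ** 2 for j in j_range))
    return best

print(max_overlap())
```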