Assaf Naor and I have just uploaded to the arXiv our paper “Random Martingales and localization of maximal inequalities”, to be submitted shortly. This paper investigates the best constant {C_n} in generalisations of the classical Hardy-Littlewood maximal inequality

{\displaystyle \left|\left\{ x \in {\mathbb R}^n: \sup_{r > 0} \frac{1}{|B(x,r)|} \int_{B(x,r)} |f(y)|\ dy \geq \lambda \right\}\right| \leq \frac{C_n}{\lambda} \|f\|_{L^1({\mathbb R}^n)}}

for any {\lambda > 0} and any absolutely integrable {f: {\mathbb R}^n \rightarrow {\mathbb R}}, where {B(x,r)} is the Euclidean ball of radius {r} centred at {x}, and {|E|} denotes the Lebesgue measure of a subset {E} of {{\mathbb R}^n}. This inequality is fundamental to a large part of real-variable harmonic analysis, and in particular to Calderón-Zygmund theory. A similar inequality in fact holds with the Euclidean norm replaced by any other convex norm on {{\mathbb R}^n}.

The exact value of the constant {C_n} is only known in the case {n=1}, with a remarkable result of Melas establishing that {C_1 = \frac{11+\sqrt{61}}{12}}. Classical covering lemma arguments give the exponential upper bound {C_n \leq 2^n} when properly optimised (a direct application of the Vitali covering lemma gives {C_n \leq 5^n}, but one can reduce {5} to {2} by being careful). In an important paper of Stein and Strömberg, the improved bound {C_n = O( n \log n )} was obtained for any convex norm by a more intricate covering lemma argument, and the slight improvement {C_n = O(n)} was obtained in the Euclidean case by another argument, more adapted to the Euclidean setting, that relied on heat kernels. In the other direction, a recent result of Aldaz shows that {C_n \rightarrow \infty} in the case of the {\ell^\infty} norm, and in fact, in an even more recent preprint of Aubrun, the lower bound {C_n \gg_\epsilon \log^{1-\epsilon} n} for any {\epsilon > 0} has been obtained in this case. However, these lower bounds do not apply in the Euclidean case, and one may still conjecture that {C_n} is in fact uniformly bounded in that case.

Unfortunately, we do not make direct progress on these problems here. However, we do show that the Stein-Strömberg bound {C_n = O(n \log n)} is extremely general, applying to a wide class of metric measure spaces obeying a certain “microdoubling condition at dimension {n}”; and conversely, at this level of generality, it is essentially the best estimate possible, even with additional metric measure hypotheses on the space. Thus, if one wants to improve this bound for a specific maximal inequality, one has to use specific properties of the geometry (such as the connections between Euclidean balls and heat kernels). Furthermore, in the general setting of metric measure spaces, one has a general localisation principle, which roughly speaking asserts that in order to prove a maximal inequality over all scales {r \in (0,+\infty)}, it suffices to prove such an inequality in a smaller range {r \in [R, nR]}, uniformly in {R>0}. It is this localisation which ultimately explains the significance of the {n \log n} growth in the Stein-Strömberg result (there are {O(n \log n)} essentially distinct scales in any range {[R,nR]}). It also shows that if one restricts the radii {r} to a lacunary range (such as powers of {2}), the best constant improves to {O(\log n)}; if one restricts the radii to an even sparser range such as powers of {n}, the best constant becomes {O(1)}.
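
To see where the scale count comes from, here is a quick back-of-the-envelope computation (mine, not quoted from the paper): if one regards two scales as essentially distinct when they differ by a factor of at least {1+\frac{1}{n}} (the natural granularity here, as discussed further below), then the number of essentially distinct scales in {[R, nR]} is about

{\displaystyle \log_{1+\frac{1}{n}} n = \frac{\log n}{\log(1+\frac{1}{n})} \approx n \log n,}

since {\log(1+\frac{1}{n}) \approx \frac{1}{n}}. By contrast, a lacunary set of radii such as the powers of {2} meets {[R,nR]} in only {O(\log n)} scales, and the powers of {n} meet it in {O(1)} scales.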

The method of proof is a little unusual from a harmonic analysis perspective, though it is now quite standard in computer science and metric space geometry. The main idea is to try to convert the underlying metric to an ultrametric, such as those arising from dyadic models of Euclidean space (e.g. from infinite binary trees). The reason for this is that maximal inequalities are easy to prove for ultrametrics; indeed, the Hardy-Littlewood inequality is always true in such settings with constant {C=1}. A key reason for this is the following nesting property: in an ultrametric space, if two balls {B(x,r)}, {B(x',r')} intersect, then the larger ball in fact contains the smaller ball. Because of this, the Hardy-Littlewood maximal inequality becomes essentially trivial, as can be seen by considering maximal balls {B(x,r)} on which the function {f} has mean at least {\lambda}, which are disjoint by the ultrametric property.
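
To spell the standard argument out (a two-line sketch, not quoted from the paper): let {B_1, B_2, \ldots} be the maximal balls on which {|f|} has mean at least {\lambda} (or near-maximal balls, if maximisers are not available). By the nesting property, two such balls that intersect must be comparable under inclusion, and hence equal by maximality, so distinct balls in this family are pairwise disjoint. Since every point at which the maximal function {Mf(x) := \sup_{r>0} \frac{1}{\mu(B(x,r))} \int_{B(x,r)} |f|\ d\mu} is at least {\lambda} lies in one of these balls, we conclude

{\displaystyle \mu\left( \{ x: Mf(x) \geq \lambda \} \right) \leq \sum_i \mu(B_i) \leq \sum_i \frac{1}{\lambda} \int_{B_i} |f|\ d\mu \leq \frac{1}{\lambda} \|f\|_{L^1(\mu)},}

which is the Hardy-Littlewood inequality with constant {1}.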

Now, a direct application of this idea to a metric such as the Euclidean metric gives losses that are exponential in the dimension, mainly because one has to triple (resp. double) the size of a ball {B(x,r)} before it will capture all the balls {B(x',r')} of lesser radius that intersect it (resp. contain the centre {x} of the original ball). Indeed, this is morally speaking where the Vitali covering lemma bound {C_n \leq 2^n} comes from. The standard way to say this is that the doubling constant {\sup_{x,r} |B(x,2r)|/|B(x,r)|} of the Euclidean metric is exponentially large (indeed, it is {2^n}), and it is easy to construct examples of metric measure spaces where the best constant in the Hardy-Littlewood inequality is as large as the doubling constant. (The canonical example here is the star graph with the graph metric and counting measure, where the doubling constant and maximal constant are both essentially the degree of the star.)
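
For instance (a standard computation, not specific to the paper): in the star graph with {D} leaves, equipped with the graph metric and counting measure, let {f} be the indicator function of the central vertex. For each leaf {x}, the ball {B(x,1)} consists only of {x} and the centre, so {f} has mean {1/2} on it; thus the maximal function is at least {1/2} at all {D} leaves, while {\|f\|_{\ell^1} = 1}, which forces the weak {(1,1)} constant to be at least {D/2}. Meanwhile, {|B(x,2)|/|B(x,1)| = (D+1)/2} at each leaf, so the doubling constant is also comparable to {D}.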

But suppose that, instead of trying to capture all balls of lesser radius that intersect {B(x,r)}, one only wanted to capture those balls {B(x',r')} of much smaller radius that intersect it, say with {r' \leq r/n}. Then one only needs to enlarge the original ball {B(x,r)} by a factor of {1+2/n} to achieve this, and this only increases the volume by a factor of {(1+2/n)^n \approx e^2 = O(1)}. Thus we see that the Euclidean metric behaves “like an ultrametric” so long as one can somehow separate scales by a ratio of {n} or more. (This observation has also turned out to be crucial for efficient estimates in modern additive combinatorics, resulting in a technique of scale separation for balls (or related objects such as Bohr sets) that is sometimes informally referred to as “Bourgainisation”, due to the breakthrough work of Bourgain on Roth’s theorem, which introduced the technique.) This is the intuition behind the localisation result.
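
Indeed, to spell out the elementary arithmetic behind this claim: if {B(x',r')} intersects {B(x,r)} and {r' \leq r/n}, then the triangle inequality gives {B(x',r') \subset B(x, r+2r') \subset B(x, (1+\frac{2}{n})r)}, and

{\displaystyle \frac{|B(x, (1+\frac{2}{n})r)|}{|B(x,r)|} = \left(1 + \frac{2}{n}\right)^n = \exp\left( n \log\left(1 + \frac{2}{n}\right) \right) \leq e^2,}

since {\log(1+t) \leq t}.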

How can one formalise this intuition? To avoid some annoying technical issues it is convenient to work on the unit torus {({\mathbb R}/{\mathbb Z})^n} (with the induced Euclidean metric) rather than the Euclidean space {{\mathbb R}^n}; note that a simple scaling argument shows that the maximal inequality constants for the former control those of the latter.
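
Here is one way such a scaling argument can go (a sketch under my own normalisations, rather than the precise argument of the paper). Given {f \in L^1({\mathbb R}^n)} of compact support and a truncation parameter {R_0}, write {M_{\leq R_0}} for the maximal operator restricted to radii {r \leq R_0}, and choose {\delta > 0} so small that the rescaled function {f_\delta(x) := f(x/\delta)} is supported in a cube of sidelength {1/2} and {\delta R_0 \leq 1/4}. Then none of the balls of radius at most {\delta R_0} relevant to {M_{\leq \delta R_0} f_\delta} wrap around the torus, so the torus maximal inequality applies to them directly. Since {\|f_\delta\|_{L^1} = \delta^n \|f\|_{L^1}} and {|\{ M_{\leq \delta R_0} f_\delta \geq \lambda \}| = \delta^n |\{ M_{\leq R_0} f \geq \lambda \}|}, the factors of {\delta^n} cancel, and one obtains the truncated maximal inequality on {{\mathbb R}^n} with the torus constant; sending {R_0 \rightarrow \infty} and removing the compact support hypothesis by monotone convergence then handles the general case.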

Let us impose a random {n}-ary mesh on the torus by identifying the torus with a randomly translated copy of the unit cube {[0,1]^n}, then partitioning this cube into {n^n} subcubes of sidelength {n^{-1}}, each of which is partitioned further into {n^n} subcubes of sidelength {n^{-2}}, and so forth.

This mesh induces an ultrametric on the torus, with the distance between two distinct points {x, y} defined as {n^{-j}}, where {n^{-j}} is the sidelength of the minimal cube in the mesh that contains both {x} and {y}; this is basically the ultrametric coming from viewing the mesh as an infinite {n^n}-ary tree. However, this ultrametric is not particularly close to the Euclidean metric; in particular, two points close to, but on opposite sides of, a wall of a cube in the mesh will be close in the Euclidean metric but far apart in the ultrametric.
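
For concreteness, here is a small Python sketch of this mesh ultrametric (my own illustration, not code from the paper): given a random translation of the mesh, the distance between two points is read off from the first base-{n} “digit” at which some translated coordinate of the two points disagrees.

    import random

    def mesh_ultrametric(x, y, n, translation=None, max_depth=40):
        # x, y: points of the torus (R/Z)^d, given as sequences of floats in [0, 1).
        # n: branching parameter; each cube is split into n^d subcubes of 1/n the
        #    sidelength (in the setting of this post one takes n = d, the dimension).
        # translation: the random shift of the mesh (drawn uniformly if not supplied).
        # Returns n^(-j), the sidelength of the smallest mesh cube containing both points.
        d = len(x)
        if translation is None:
            translation = [random.random() for _ in range(d)]
        for j in range(max_depth):
            scale = n ** (j + 1)
            # x and y lie in the same cube of sidelength n^(-(j+1)) if and only if every
            # translated coordinate has the same first j+1 digits in base n.
            same_cube = all(
                int(((xi - ti) % 1.0) * scale) == int(((yi - ti) % 1.0) * scale)
                for xi, yi, ti in zip(x, y, translation)
            )
            if not same_cube:
                return float(n) ** (-j)
        return float(n) ** (-max_depth)

For example, two Euclidean-close points such as {(0.49, 0.49)} and {(0.51, 0.51)} will be at ultrametric distance {1} whenever a top-level wall of the random mesh happens to fall between them, which is precisely the defect that the padding trick below is designed to work around.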

But one can avoid this issue by the standard trick of padding the mesh. Suppose one takes every cube in the mesh and shrinks its sidelength by a factor of {1-\frac{1}{n}}, thus decreasing the volume by a factor of {(1-1/n)^n \approx 1/e \sim 1}; call this the padded cube at this scale. Note from the random nature of the mesh construction that every point has a constant probability of lying in the padded cube at a fixed scale; in particular, by Fubini’s theorem (aka the first moment method), given a measurable set {E} in the torus, the padded cubes at a fixed scale will capture a constant fraction of {E} in expectation. It is because of this that we can afford to restrict the maximal function to padded cubes without significantly losing control of the constants (let me be a bit vague here about exactly how one does this).
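
Quantitatively (an elementary computation, included for convenience rather than taken from the paper): since the mesh is uniformly translated, each coordinate of a fixed point {x} lands in the middle {1-\frac{1}{n}} fraction of its cell at a given scale independently with probability {1-\frac{1}{n}}, so the probability that {x} lies in a padded cube at that scale is

{\displaystyle \left(1 - \frac{1}{n}\right)^n \geq \frac{1}{4} \hbox{ for } n \geq 2,}

and hence, by Fubini, the expected measure of {E \cap P} is at least {|E|/4}, where {P} denotes the union of the padded cubes at the given scale.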

The key point is that any ball of radius {R/n} or less that intersects a padded cube of sidelength {R} will not intersect any other padded cube at this scale. Thus, balls at this scale or less behave somewhat as if the Euclidean metric were an ultrametric, and this turns out to be enough to achieve localisation.

This argument turns out to be extremely general, and applies to any metric measure space {(X,d,\mu)} (i.e. a metric space with a Radon measure) for which one has the microdoubling condition {\mu(B(x,(1+\frac{1}{n})r)) = O( \mu(B(x,r)))}. In particular, this applies to Ahlfors-David regular spaces, in which {\mu(B(x,r))} is comparable to {\omega_n r^n} for some constant {\omega_n > 0}. Instead of using cubes to create the mesh, it turns out that one can just use randomly placed balls whose radii are essentially powers of {n} to achieve a much stranger-looking, but functionally comparable, mesh.
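
For instance, here is the one-line check (using only the definitions above) that Ahlfors-David regularity at dimension {n} implies microdoubling at dimension {n}: if {\mu(B(x,r))} is comparable to {\omega_n r^n}, then

{\displaystyle \mu\left(B\left(x, \left(1+\frac{1}{n}\right)r\right)\right) \lesssim \omega_n \left(1+\frac{1}{n}\right)^n r^n \leq e\, \omega_n r^n \lesssim \mu(B(x,r)).}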

Because of microdoubling, one can assume without loss of generality that all radii are powers of {(1+\frac{1}{n})}, since one can round off to the nearest such power without much loss. There are only {O(n \log n)} such powers in any range {[R,nR]}, which thus gives the Stein-Strömberg result in the general microdoubling setting. (We also give an alternate proof of this result, adapting a slightly different probabilistic argument of Lindenstrauss.)

On the other hand, one can concoct microdoubling spaces in which the balls of radius {(1+\frac{1}{n})^j} for {j = O(n \log n)} are genuinely “different”, so much so that the best constant in the maximal inequality is {\gg n \log n}, even if one makes reasonable assumptions such as translation-invariance (assuming an abelian group structure on the space), Ahlfors-David regularity, and {L^p} bounds (the maximal operator is bounded on {L^p} for every {p>1}). The basic counterexample here is constructed by working in a finite field vector space with counting measure and creating a rather unusual (but still Ahlfors-David regular) metric which, for each {j}, distributes a large portion of the mass of the ball of radius {(1+\frac{1}{n})^j} centred at the origin to a different subspace of the vector space. This essentially converts the Hardy-Littlewood maximal function to a maximal function over averages of unrelated subspaces, which one can show to be large by straightforward computations.

Finally, we give an example of a natural space without any microdoubling properties whatsoever for which one still has a maximal inequality, namely the free non-abelian group on a bounded number of generators. Here one can rely instead on the expander (or isoperimetric) properties of the Cayley graph. This suggests to us that one should have a Hardy-Littlewood maximal inequality for measure-preserving actions of the free group on a probability space, but unfortunately the standard transference technology for doing this does not apply because the free group is highly non-amenable. This question therefore remains open (although {L^p} bounds were previously established by Nevo and Stein).