This post is a sequel of sorts to my earlier post on hard and soft analysis, and the finite convergence principle. Here, I want to discuss a well-known theorem in infinitary soft analysis – the Lebesgue differentiation theorem – and whether there is any meaningful finitary version of this result. Along the way, it turns out that we will uncover a simple analogue of the Szemerédi regularity lemma, for subsets of the interval rather than for graphs. (Actually, regularity lemmas seem to appear in just about any context in which fine-scaled objects can be approximated by coarse-scaled ones.) The connection between regularity lemmas and results such as the Lebesgue differentiation theorem was recently highlighted by Elek and Szegedy, while the connection between the finite convergence principle and results such as the pointwise ergodic theorem (which is a close cousin of the Lebesgue differentiation theorem) was recently detailed by Avigad, Gerhardy, and Towsner.

The Lebesgue differentiation theorem has many formulations, but we will avoid the strongest versions and just stick to the following model case for simplicity:

Lebesgue differentiation theorem. If f: [0,1] \to [0,1] is Lebesgue measurable, then for almost every x \in [0,1] we have f(x) = \lim_{r \to 0} \frac{1}{r} \int_x^{x+r} f(y)\ dy. Equivalently, the fundamental theorem of calculus f(x) = \frac{d}{dy} \int_0^y f(z) dz|_{y=x} is true for almost every x in [0,1].

Here we use the oriented definite integral, thus \int_x^y = - \int_y^x. Specialising to the case where f = 1_A is an indicator function, we obtain the Lebesgue density theorem as a corollary:

Lebesgue density theorem. Let A \subset [0,1] be Lebesgue measurable. Then for almost every x \in A, we have \frac{|A \cap [x-r,x+r]|}{2r} \to 1 as r \to 0^+, where |A| denotes the Lebesgue measure of A.

In other words, almost all the points x of A are points of density of A, which roughly speaking means that as one passes to finer and finer scales, the immediate vicinity of x becomes increasingly saturated with A. (Points of density are like robust versions of interior points, thus the Lebesgue density theorem is an assertion that measurable sets are almost like open sets. This is Littlewood’s first principle.) One can also deduce the Lebesgue differentiation theorem back from the Lebesgue density theorem by approximating f by a finite linear combination of indicator functions; we leave this as an exercise.

The Lebesgue differentiation and density theorems are qualitative in nature: they assert that \frac{1}{r} \int_x^{x+r} f(y)\ dy eventually gets close to f(x) for almost every x, or that A will eventually occupy most of [x-r,x+r] for almost every x in A, by taking r small enough, but they do not give a quantitative bound for how small r has to be. The following simple example shows why there is a problem. Let n be a large integer, and partition [0,1] into 2^n dyadic intervals [0,1/2^n], [1/2^n,2/2^n], \ldots, [1-1/2^n,1] (never mind about the overlaps on the boundary, as they have zero measure, and the Lebesgue philosophy in analysis is to always ignore anything which happens on sets of measure zero). Let A_n be the union of every other interval:

A_n = [0,1/2^n] \cup [2/2^n,3/2^n] \cup \ldots \cup [1-2/2^n,1-1/2^n].

One then sees that if x is any element of A_n which is not on the boundary, then it is indeed true that the local density \frac{|A_n \cap [x-r,x+r]|}{2r} of A_n will eventually converge to 1, but one has to wait until r is of size 1/2^n or smaller before one sees this; for scales much larger than this, the local density will remain stubbornly close to 1/2. A similar phenomenon holds for the indicator functions f_n := 1_{A_n}: the local average \frac{1}{r} \int_x^{x+r} f_n(y)\ dy will eventually get close to f_n(x), which is either 0 or 1, but when |r| \gg 1/2^n, these averages will also stay close to 1/2. (Closely related to this is the fact that the functions f_n converge weakly to 1/2, despite only taking values in {0,1}.)
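To make this concrete, here is a quick numerical sketch (in Python; the function names and the particular choice of scales are purely illustrative, and the local densities are estimated by sampling rather than computed exactly) exhibiting the density of A_n near a point of A_n staying close to 1/2 until the radius r drops well below 2^{-n}:

```python
# Sketch: local density of A_n (the union of every other dyadic interval at
# scale 2^-n) around a point x of A_n, at various radii r.  The density stays
# near 1/2 until r drops well below 2^-n, and only then climbs towards 1.

def in_A(x, n):
    """Membership in A_n: x lies in an even-indexed dyadic interval of length 2^-n."""
    return int(x * 2**n) % 2 == 0

def local_density(x, r, n, samples=10000):
    """Estimate the density of A_n in [x-r, x+r] by sampling equally spaced points."""
    return sum(in_A(x - r + 2*r*(k + 0.5)/samples, n) for k in range(samples)) / samples

n = 10
x = (512 + 0.3) / 2**n          # a point of A_n in the interior of an even interval
for r in [0.25, 2**-5, 2**-n, 2**-13]:
    print(f"r = {r:.8f}: density ~ {local_density(x, r, n):.3f}")
# Densities near 0.5 for the first three radii, near 1.0 for the last.
```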

Intuitively, what is going on here is that while each set A_n is certainly Lebesgue measurable, these sets are getting increasingly “less measurable” as n gets large, and the rate of convergence in the Lebesgue differentiation and density theorems depends on how measurable the sets A_n are. One can illustrate this by considering (non-rigorously) the limiting case n=\infty as follows. Suppose we select a random subset A_\infty of [0,1] by requiring each real number x in [0,1] to lie in A_\infty with an independent probability of 1/2 (thus we are flipping an uncountable number of coins to determine this set!). The law of large numbers (applied very non-rigorously!) then suggests that with probability 1, A_\infty should have density 1/2 in every single interval I in [0,1], thus |A_\infty \cap I| = \frac{1}{2} |I|. This would seem to violate the Lebesgue density theorem; but what is going on here is that the set A_\infty is in fact almost surely non-measurable (indeed, the Lebesgue density theorem provides a proof of this fact, modulo the issues of justifying several non-rigorous claims in this paragraph).

So, it seems that to proceed further we need to quantify the notion of measurability, in order to decide which sets or functions are “more measurable” than others. There are several ways to make such a quantification. Here are some typical proposals:

Definition. A set A \subset [0,1] is (\epsilon,n)-measurable if there exists a set B which is the union of dyadic intervals [j/2^n, (j+1)/2^n] at scale 2^{-n}, such that A and B differ only on a set of Lebesgue measure (or outer measure) at most \epsilon.

Definition. A function f: [0,1] \to [0,1] is (\epsilon,n)-measurable if there exists a function g which is constant on the dyadic intervals [j/2^n, (j+1)/2^n], and which differs from f in L^1-norm by at most \epsilon, thus \int_0^1 |f(x)-g(x)|\ dx \leq \epsilon.

(One can phrase these definitions using the \sigma-algebra generated by the dyadic intervals of length 2^{-n}; we will not do so here, but these \sigma-algebras are certainly underlying the ideas here. Their presence is particularly prominent in the “ergodic theory” approach to this circle of ideas, which we are not focusing on here.)
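As a toy illustration of the first definition, one can numerically estimate the least \epsilon for which a given set A is (\epsilon,n)-measurable: the optimal approximant B simply consists of those dyadic intervals on which A has density greater than 1/2. Here is a brief sketch (Python again, with illustrative names, and with densities estimated by sampling):

```python
# Sketch: the least epsilon for which A is (epsilon, n)-measurable.  The best
# union of dyadic intervals B keeps exactly those intervals on which A has
# density > 1/2, so the error contributed by each interval is min(d, 1-d) * 2^-n.

def measurability_error(in_A, n, samples=1000):
    """Estimate the minimum over B of the measure of the symmetric difference
    of A and B, where B ranges over unions of dyadic intervals of length 2^-n."""
    error = 0.0
    for j in range(2**n):
        pts = ((j + (k + 0.5) / samples) / 2**n for k in range(samples))
        d = sum(map(in_A, pts)) / samples       # density of A on the j-th interval
        error += min(d, 1 - d) * 2**-n
    return error

in_A6 = lambda x: int(x * 2**6) % 2 == 0        # the set A_6 from earlier
for n in [3, 6, 9]:
    print(n, round(measurability_error(in_A6, n), 3))
# ~0.5 at the coarse scale n = 3, but ~0.0 once n >= 6.
```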

Then one can obtain the following quantitative results:

Quantitative Lebesgue differentiation theorem. Let f: [0,1] \to [0,1] be (\epsilon,n)-measurable. Then for all x in [0,1] outside of a set of measure O(\sqrt{\epsilon}) we have \frac{1}{r} \int_x^{x+r} f(y)\ dy = f(x) + O(\sqrt{\epsilon}) for all 0 < r < \sqrt{\epsilon} 2^{-n}.

Quantitative Lebesgue density theorem. Let A \subset [0,1] be (\epsilon,n)-measurable. Then for all x in A outside of a set of measure O(\sqrt{\epsilon}) we have |A \cap [x-r,x+r]| = (1 - O(\sqrt{\epsilon})) 2r for all 0 < r < \sqrt{\epsilon} 2^{-n}.

These results follow quickly from the Hardy-Littlewood maximal inequality, and by exploiting the low “complexity” of the structured objects B and g which approximate A and f respectively; we omit the standard arguments.

To connect these quantitative results back to their qualitative counterparts, one needs to connect the quantitative notion of (\epsilon,n)-measurability with the traditional qualitative notion of Lebesgue measurability. The relevant result that provides this connection is

Lebesgue approximation theorem, first version. Let A \subset [0,1] be measurable. Then for every \epsilon > 0 there exists n such that A is (\epsilon, n)-measurable.

Lebesgue approximation theorem, second version. Let f: [0,1] \to [0,1] be measurable. Then for every \epsilon > 0 there exists n such that f is (\epsilon, n)-measurable.

These two results are easily seen to be equivalent. Let us quickly recall the proof of the first version:

Proof. The claim is easily verified when A is a finite union of dyadic intervals, and then by monotone convergence one also verifies the claim when A is compact (or open). One then verifies that the class of sets obeying the claim is closed under countable unions, intersections, and complements, which then gives the claim for all Borel-measurable sets. The claim is also obviously true for null sets, and thus true for Lebesgue-measurable sets. \Box

So, we’re done, right? Well, there is still an unsatisfactory issue: the Lebesgue approximation theorems guarantee, for any given \epsilon, that a measurable set A or a measurable function f will eventually be (\epsilon,n)-measurable by taking n large enough, but don’t give any bound as to what this n will be. In a sense, this is unavoidable, even if we consider “nice” objects such as compact sets A or piecewise constant functions f; the examples of the sets A_n and functions f_n discussed previously show that for fixed \epsilon, one can be forced to take n to be arbitrarily large.

However, we can start looking for substitutes for these theorems which do have quantitative bounds. Let’s focus on the first version of the Lebesgue approximation theorem, and in particular on the case when A is compact. Then we can write A = \bigcap_{n=1}^\infty A^{(n)}, where A^{(n)} is the union of all the (closed) dyadic intervals of length 2^{-n} which intersect A. The measures |A^{(n)}| are a monotone decreasing sequence of numbers between 0 and 1, and thus (by the infinite convergence principle!) they have a limit, which (by continuity of Lebesgue measure from above) is just |A|. Thus, for every \epsilon > 0 we have |A^{(n)}| - |A| < \epsilon for all sufficiently large n, which explains why A is (\epsilon,n)-measurable for all large n.
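One can watch this convergence numerically. The following sketch (Python again; the set A and the boundary conventions are illustrative, and we blithely ignore measure-zero boundary issues in the spirit of the discussion above) computes |A^{(n)}| for a simple compact A given as a finite union of closed intervals:

```python
# Sketch: the measures |A^(n)| of the outer dyadic approximants of a compact
# set A, here a finite union of closed intervals.  The measures decrease
# monotonically to |A|, but with no uniform rate across all sets A.
import math

def outer_measure(intervals, n):
    """|A^(n)|, where A^(n) is the union of the dyadic intervals of length
    2^-n meeting A.  We assume the component intervals of A are far enough
    apart that their dyadic neighbourhoods do not overlap, and we gloss over
    measure-zero boundary cases."""
    count = 0
    for (a, b) in intervals:
        j0 = math.floor(a * 2**n)
        j1 = min(math.floor(b * 2**n), 2**n - 1)
        count += j1 - j0 + 1
    return count * 2.0**-n

A = [(0.1, 0.4), (0.77, 0.87)]       # |A| = 0.4
for n in [2, 4, 6, 8, 10]:
    print(n, outer_measure(A, n))    # decreases towards 0.4
```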

So now we see where the lack of a bound on n is coming from – it is the fact that the infinite convergence principle also does not provide an effective bound on the rate of convergence. But in the previous post on this topic, we saw how the finite convergence principle did provide an effective substitute for the infinite convergence principle. If we apply it directly to this sequence |A^{(n)}|, this is what we get:

Effective Lebesgue approximation theorem. Let F: {\Bbb N} \to {\Bbb N} be any function, and let \epsilon > 0. Then there exists an integer N with the following property: given any subset A of [0,1], there exists 1 \leq n \leq N such that A^{(n+F(n))} is (\epsilon, n)-measurable.

This theorem does give a specific upper bound on the scale n one has to reach in order to get quantitative measurability. The catch, though, is that the measurability is not attained for the original set A, but instead for some discretisation A^{(n+F(n))} of A. However, by choosing F to grow rapidly, we can make the scale at which we are forced to discretise arbitrarily finer than the scale at which we obtain the measurability.
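To see the finite convergence principle in action here, one can run the search it implicitly performs: starting at n = 1, keep jumping from n to n + F(n) until the measure drop |A^{(n)}| - |A^{(n+F(n))}| falls below \epsilon, at which point A^{(n+F(n))} differs from the union A^{(n)} of scale-2^{-n} intervals by measure less than \epsilon, and is thus (\epsilon,n)-measurable. Here is a sketch reusing the outer_measure routine from above (the choices of F and \epsilon are arbitrary illustrations):

```python
# Sketch: the search underlying the effective Lebesgue approximation theorem.
# Each failed test drops the measure by at least eps, and the measures start
# at most 1, so the search terminates after at most 1/eps jumps: this is
# exactly the proof of the finite convergence principle.

def find_metastable_scale(measure_seq, F, eps):
    """First n (along the orbit n -> n + F(n)) at which
    measure_seq(n) - measure_seq(n + F(n)) < eps."""
    n = 1
    while measure_seq(n) - measure_seq(n + F(n)) >= eps:
        n = n + F(n)
    return n

F = lambda n: 2 * n                  # a "look much deeper" function
n = find_metastable_scale(lambda m: outer_measure(A, m), F, eps=0.05)
print(n)                             # returns 9 for the set A above
```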

Nevertheless, this theorem is still a little unsatisfying, because it does not directly say much about the original set A. There is an alternate approach which gives a more interesting result. In the previous results, the goal was to try to approximate an arbitrary object (a set or a function) by a “structured” or “low-complexity” one (a finite union of intervals, or a piecewise constant function), thus trying to move away from “pseudorandom” or “high-complexity” objects (such as the sets A_n and functions f_n discussed earlier). Of course, the fact that these pseudorandom objects actually exist is what makes this goal difficult to achieve satisfactorily. However, one can adopt a different philosophy, namely to embrace both the structured and the pseudorandom aspects of these objects, and focus instead on creating an efficient decomposition of arbitrary objects into structured and pseudorandom components.

To do this, we need to understand what “pseudorandom” means. One clue is to look at the examples A_n and f_n discussed earlier. Observe that if one averages f_n on any reasonably sized interval J, one gets something very close to the global average of f_n, i.e. 1/2. In other words, the integral of f_n on an interval J is close to the global average of f_n times |J|. (This is also true when J is a small interval, since in this case both expressions are small.) This motivates the following definition:

Definition. A function f: [0,1] \to [0,1] is said to be \epsilon-regular on a dyadic interval I if we have |\int_J f(y)\ dy - \frac{|J|}{|I|} \int_I f(y)\ dy| \leq \epsilon |I| for all dyadic subintervals J \subset I.
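Here is a finitary sketch of this definition in code (Python; the sampling resolution and the cutoff depth are artifacts of the discretisation, and in practice one would stop at the scale below which f is known to be essentially constant):

```python
# Sketch: testing epsilon-regularity of f on a dyadic interval I = [a, a+L]
# by comparing the integral of f on each dyadic subinterval J against the
# proportional share (|J|/|I|) int_I f, down to some finite depth.

def dyadic_integral(f, a, L, samples=256):
    """Approximate the integral of f over [a, a+L] by midpoint sampling."""
    return sum(f(a + L * (k + 0.5) / samples) for k in range(samples)) * L / samples

def is_regular(f, a, L, eps, max_depth=10):
    """Check |int_J f - (|J|/|I|) int_I f| <= eps * |I| over dyadic J in I."""
    mean_I = dyadic_integral(f, a, L) / L
    for d in range(1, max_depth + 1):
        Jlen = L / 2**d
        for j in range(2**d):
            if abs(dyadic_integral(f, a + j * Jlen, Jlen) - mean_I * Jlen) > eps * L:
                return False
    return True

f6 = lambda x: float(int(x * 2**6) % 2 == 0)     # f_6 = 1_{A_6}
print(is_regular(f6, 0.0, 1.0, eps=0.1))          # True: f_6 is 2^-6-regular
step = lambda x: float(x < 0.5)
print(is_regular(step, 0.0, 1.0, eps=0.1))        # False: fails at J = [0, 1/2]
```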

Thus, for instance, f_n is 2^{-n}-regular on [0,1]. We then have an analogue of the Szemerédi regularity lemma for subsets of the interval, which I will dub the “Lebesgue regularity lemma”:

Lebesgue regularity lemma. If \epsilon > 0 and f: [0,1] \to [0,1] is measurable, then there exists a positive integer n = O_\epsilon(1) (i.e. n is bounded by a quantity depending only on \epsilon), such that f is \epsilon-regular on all but at most \epsilon 2^n of the 2^n dyadic intervals of length 2^{-n}.

Proof. As with the proof of many other regularity lemmas, we shall rely primarily on the energy increment argument (the energy is also known as the index in some literature). For minor notational reasons we will take \epsilon to be a negative power of 2. For each integer n, let f^{(n)}: [0,1] \to [0,1] be the conditional expectation of f to the dyadic intervals of length 2^{-n}, thus on each such interval I, f^{(n)} is equal to the constant value \frac{1}{|I|} \int_I f (again, we are ignoring sets of measure zero). An easy application of Pythagoras’ theorem (for L^2([0,1])) shows that the energies E_n := \int_0^1 |f^{(n)}(x)|^2\ dx form a non-decreasing sequence in n, bounded between 0 and 1. Applying (a special case of) the finite convergence principle, we can find n = O_\epsilon(1) such that we have the energy metastability

E_{n+\log_2 1/\epsilon} - E_n \leq \epsilon^3.

(Indeed, we obtain a fairly civilised bound of n \leq \epsilon^{-3} \log_2 1/\epsilon.) Applying Pythagoras’ theorem again, we conclude

\int_0^1 |f^{(n+\log_2 1/\epsilon)}(x) - f^{(n)}(x)|^2 dx \leq \epsilon^3

which by Markov’s inequality implies that

\int_I |f^{(n+\log_2 1/\epsilon)}(x) - f^{(n)}(x)|^2 dx \leq \epsilon^2 |I|

for all but at most \epsilon 2^{n} of the 2^n dyadic intervals I of length 2^{-n}. The Cauchy-Schwarz inequality then quickly shows that f is \epsilon-regular on all of these intervals. \Box
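The energy increment argument above is concrete enough to run directly. In the sketch below (Python; f is represented by its cell averages on a fine dyadic grid of 2^M cells, so that the conditional expectations f^{(n)} for n \leq M are exact coarsenings, and the names are illustrative), we compute the energies E_n and locate the metastable scale:

```python
# Sketch of the energy increment argument: compute the conditional
# expectations f^(n), their energies E_n, and the first metastable scale n
# with E_{n + log2(1/eps)} - E_n <= eps^3.
import math

def coarsen(vals):
    """One step of conditional expectation: average adjacent pairs of cells."""
    return [(vals[2*i] + vals[2*i+1]) / 2 for i in range(len(vals) // 2)]

def energy(vals):
    """E_n = int_0^1 |f^(n)(x)|^2 dx for the piecewise-constant f^(n)."""
    return sum(v * v for v in vals) / len(vals)

def metastable_scale(f_vals, eps):
    """Each failed test increases the energy by more than eps^3, and the
    energies are trapped in [0,1], so this returns some
    n <= eps^-3 * log2(1/eps)."""
    jump = int(math.log2(1 / eps))
    levels = [f_vals]
    while len(levels[-1]) > 1:
        levels.append(coarsen(levels[-1]))
    levels.reverse()                          # levels[n] holds f^(n)
    E = [energy(v) for v in levels]
    n = 0
    while E[min(n + jump, len(E) - 1)] - E[n] > eps**3:
        n += jump
    return n

M = 12
f_vals = [float(j % 2 == 0) for j in range(2**M)]  # the function f_M
print(metastable_scale(f_vals, eps=0.25))
# Returns 0: the energies of f_M sit at 1/4 for all n < M and jump to 1/2
# only at the critical scale, exactly the "pseudorandom" behaviour above.
```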

One can of course specialise the above lemma to indicator functions f = 1_A to obtain a regularity lemma for subsets of the interval, whose formulation I will leave to the reader as an exercise.

An inspection of the proof shows that the full power of the finite convergence principle was not tapped here, because we only used it to get a metastability region of \log_2 1/\epsilon. One can get a stronger result (at the cost of worsening the bound on n) by extending this region of stability. To motivate this stronger version, first observe that if a function f is \epsilon-regular on an interval I, then we have a decomposition f = c+h on I, where c = \frac{1}{|I|} \int_I f(y) dy is the mean of f on I (in particular, c is constant), and h has small averages in the sense that |\int_J h(y)\ dy| \leq \epsilon |I| for all dyadic subintervals J of I. We can do better than this:

Definition. A function f: [0,1] \to [0,1] is said to be strongly (\epsilon,m)-regular on a dyadic interval I if there exists a decomposition f=c+e+h on I, where c = \frac{1}{|I|} \int_I f(y) dy is the mean of f on I, e is small in the sense that \frac{1}{|I|} \int_I |e(y)|\ dy \leq \epsilon, and h has vanishing averages in the sense that \int_J h(y) dy = 0 for all dyadic subintervals J \subset I with |J| \geq 2^{-m} |I|.

Note that strong (\epsilon,m)-regularity implies 2\epsilon-regularity as long as 2^{-m} \leq \epsilon. Strong (\epsilon,m)-regularity offers much better control on the fluctuation of f at finer scales, as long as the scale is not too fine (this is where the parameter m comes in).
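The decomposition is also easy to exhibit concretely: take P_m to be the conditional expectation onto the dyadic subintervals of I at relative scale 2^{-m}, and set e := P_m f - c and h := f - P_m f, so that h automatically integrates to zero on every dyadic J \subset I with |J| \geq 2^{-m} |I|, and the content of strong regularity is that e is small in L^1. Here is a minimal sketch (Python again, with I represented by its cell averages; the names are illustrative):

```python
# Sketch: the decomposition f = c + e + h behind strong (eps, m)-regularity,
# with f given by its averages on 2^K equal cells of the interval I (K >= m).

def decompose(vals, m):
    """Return (c, e, h): c the mean of f on I, e = P_m f - c, h = f - P_m f,
    where P_m is conditional expectation onto the 2^m dyadic subcells of I."""
    K = len(vals).bit_length() - 1            # len(vals) == 2**K
    block = 2 ** (K - m)                      # fine cells per dyadic subcell
    c = sum(vals) / len(vals)
    Pm = []
    for b in range(2**m):
        avg = sum(vals[b*block:(b+1)*block]) / block
        Pm.extend([avg] * block)
    e = [p - c for p in Pm]
    h = [v - p for v, p in zip(vals, Pm)]
    return c, e, h

vals = [0.1, 0.9, 0.2, 0.8, 0.7, 0.3, 0.6, 0.4]   # K = 3
c, e, h = decompose(vals, m=2)
print(c, sum(abs(x) for x in e) / len(e))         # mean, and the L^1 size of e
print([round(sum(h[i:i+2]), 10) for i in range(0, 8, 2)])
# The block sums of h all vanish: this f is strongly (0, 2)-regular on I.
```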

Strong Lebesgue regularity lemma. If \epsilon > 0, F: {\Bbb N} \to {\Bbb N}, and f: [0,1] \to [0,1] is measurable, then there exists a positive integer n = O_{\epsilon,F}(1) such that f is strongly (\epsilon,F(n))-regular on all but at most \epsilon 2^n of the 2^n dyadic intervals of length 2^{-n}.

The bound on n is rather poor; it is basically a 1/\epsilon^3-fold iteration of the function n \mapsto n + F(n) \log_2 1/\epsilon applied to 1, so for instance if one wanted F to be exponential in nature, then n might be as large as a tower of exponentials of height 1/\epsilon^3 or so. (A very similar situation occurs for the Szemerédi regularity lemma, which has a variety of such strong versions, including one by Alon-Fischer-Krivelevich-Szegedy, one by Rödl-Schacht, and one by myself.)

We can now return to the Lebesgue differentiation theorem, and use the strong regularity lemma to obtain a more satisfactory quantitative version of that theorem:

Quantitative Lebesgue differentiation theorem. If \epsilon > 0, F: {\Bbb N} \to {\Bbb N}, and f: [0,1] \to [0,1] is measurable, then there exists a positive integer n = O_{\epsilon,F}(1) such that for all x in [0,1] outside of a set of measure O(\epsilon) we have the Cauchy sequence property |\frac{1}{r} \int_x^{x+r} f(y)\ dy - \frac{1}{s} \int_x^{x+s} f(y)\ dy| \leq \epsilon for all 2^{-n-F(n)} < r, s < 2^{-n}.

This theorem can be deduced fairly easily by combining the strong regularity lemma with the Hardy-Littlewood maximal inequality (to deal with the errors e), and by covering (most of) the non-dyadic intervals [x,x+r] or [x,x+s] by dyadic intervals and using the boundedness of f to deal with the remainder. We leave the details as an exercise.
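One can at least probe the conclusion numerically. In the sketch below (Python; the base point, the scales, and the function f_6 are the illustrative choices from earlier), the averages \frac{1}{r} \int_x^{x+r} f have visibly stabilised in a window of scales below the critical scale 2^{-6}, while remaining near 1/2 above it:

```python
# Sketch: the Cauchy-sequence property of A_r(x) = (1/r) int_x^{x+r} f for
# f = f_6, at a point x of A_6: the averages agree to high accuracy once r
# is well below 2^-6, but hover near 1/2 at coarser scales.

def running_average(f, x, r, samples=400):
    """Approximate (1/r) int_x^{x+r} f by midpoint sampling."""
    return sum(f(x + r * (k + 0.5) / samples) for k in range(samples)) / samples

f6 = lambda x: float(int(x * 2**6) % 2 == 0)
x = 0.5 + 0.3 * 2**-6            # interior point of an interval of A_6
for r in [2**-8, 2**-9, 2**-10, 2**-11]:
    print(r, running_average(f6, x, r))   # all ~1.0: the averages have stabilised
for r in [2**-3, 2**-4, 2**-5]:
    print(r, running_average(f6, x, r))   # still ~0.5: above the critical scale
```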

One sign that this is a true finitary analogue of the infinitary differentiation theorem is that this finitary theorem implies most of the infinitary theorem; namely, it shows that for any measurable f, and almost every x, the sequence \frac{1}{r} \int_x^{x+r} f(y)\ dy is a Cauchy sequence as r \to 0, although it does not show that the limit is equal to f(x). (Finitary statements can handle Cauchy sequences – which make sense even in the rationals – but have some trouble actually evaluating the limits of such sequences, which require the (infinite precision) real numbers and are thus not truly finitary.) Conversely, using weak compactness methods one can deduce the quantitative differentiation theorem from the infinitary one, in much the same way that the finite and infinite convergence principles can be deduced from each other.

The strong Lebesgue regularity lemma can also be used to deduce the (one-dimensional case of the) Rademacher differentiation theorem, namely that a Lipschitz continuous function from [0,1] to the reals is almost everywhere differentiable. To see this, suppose for contradiction that we could find a function g which was Lipschitz continuous but failed to be differentiable on a set of positive measure, thus for every x in this set, the (continuous) sequence \frac{g(x+r)-g(x)}{r} is not a Cauchy sequence as r goes to zero. We can normalise the Lipschitz constant of g to equal 1. Then by standard arguments we can find \epsilon > 0 and a function F: {\Bbb N} \to {\Bbb N} such that for every x in a set of measure greater than \epsilon, and every n, the sequence \frac{g(x+r)-g(x)}{r} fluctuates by at least \epsilon in the range 2^{-n-F(n)} < r < 2^{-n}. Now let M be a very large integer (depending on \epsilon and F) and discretise g to scale 2^{-M} to create a piecewise linear approximant g_M, which is the antiderivative of a bounded function f which is constant on dyadic intervals of length 2^{-M}. We apply the strong Lebesgue regularity lemma to f and find a scale n = O_{F,\epsilon}(1) for which the conclusion of that lemma holds; by choosing M large enough we can ensure that M \geq n+F(n) \geq n. It is then not hard to see that the lemma contradicts the previous assertion that \frac{g(x+r)-g(x)}{r} fluctuates for certain ranges of x and r.

In my next post I will talk about a new paper of mine using some of these ideas to establish a quantitative version of the Besicovitch projection theorem.

[Update, June 19: Definition of strong regularity corrected.]