
There are a number of ways to construct the real numbers {{\bf R}}, for instance

  • as the metric completion of {{\bf Q}} (thus, {{\bf R}} is defined as the set of Cauchy sequences of rationals, modulo Cauchy equivalence);
  • as the space of Dedekind cuts on the rationals {{\bf Q}};
  • as the space of quasimorphisms {\phi: {\bf Z} \rightarrow {\bf Z}} on the integers, quotiented by bounded functions. (I believe this construction first appears in this paper of Street, who credits the idea to Schanuel, though the germ of this construction arguably goes all the way back to Eudoxus.)
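To make the third construction concrete, here is a minimal sketch, assuming the illustrative choice {\phi(n) = \lfloor \alpha n \rfloor} with {\alpha = \sqrt{2}} (the helper names `phi` and `max_defect` are hypothetical, not taken from the cited paper). It checks numerically that the additivity defect {\phi(m+n) - \phi(m) - \phi(n)} stays bounded, which is the quasimorphism property, and that the slope {\phi(n)/n} approximates the real number being represented.

```python
import math

def phi(n: int, alpha: float = math.sqrt(2)) -> int:
    """Integer-valued map n -> floor(alpha * n); alpha is an illustrative choice."""
    return math.floor(alpha * n)

def max_defect(bound: int) -> int:
    """Largest |phi(m+n) - phi(m) - phi(n)| over all |m|, |n| <= bound."""
    return max(
        abs(phi(m + n) - phi(m) - phi(n))
        for m in range(-bound, bound + 1)
        for n in range(-bound, bound + 1)
    )

if __name__ == "__main__":
    # A bounded defect is exactly what makes phi a quasimorphism.
    print("maximal defect:", max_defect(200))
    # The slope phi(n)/n approximates the real number that phi represents.
    print("slope estimate:", phi(10**6) / 10**6)
```

Two such maps represent the same real number precisely when they differ by a bounded function, which is why the construction quotients by bounded functions.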

There is also a fourth family of constructions that proceeds via nonstandard analysis, as a special case of what is known as the nonstandard hull construction. (Here I will assume some basic familiarity with nonstandard analysis and ultraproducts, as covered for instance in this previous blog post.) Given an unbounded nonstandard natural number {N \in {}^* {\bf N} \backslash {\bf N}}, one can define two external additive subgroups of the nonstandard integers {{}^* {\bf Z}}:

  • The group {O(N) := \{ n \in {}^* {\bf Z}: |n| \leq CN \hbox{ for some } C \in {\bf N} \}} of all nonstandard integers of magnitude less than or comparable to {N}; and
  • The group {o(N) := \{ n \in {}^* {\bf Z}: |n| \leq C^{-1} N \hbox{ for all } C \in {\bf N} \}} of nonstandard integers of magnitude infinitesimally smaller than {N}.

The group {o(N)} is a subgroup of {O(N)}, so we may form the quotient group {O(N)/o(N)}. This space is isomorphic to the reals {{\bf R}}, and can in fact be used to construct the reals:

Proposition 1 For any coset {n + o(N)} of {O(N)/o(N)}, there is a unique real number {\hbox{st} \frac{n}{N}} with the property that {\frac{n}{N} = \hbox{st} \frac{n}{N} + o(1)}. The map {n + o(N) \mapsto \hbox{st} \frac{n}{N}} is then an isomorphism between the additive groups {O(N)/o(N)} and {{\bf R}}.

Proof: Uniqueness is clear. For existence, observe that the set {\{ x \in {\bf R}: Nx \leq n + o(N) \}} is a Dedekind cut, and its supremum can be verified to have the required properties for {\hbox{st} \frac{n}{N}}. \Box

In a similar vein, we can view the unit interval {[0,1]} in the reals as the quotient

\displaystyle  [0,1] \equiv [N] / o(N) \ \ \ \ \ (1)

where {[N]} is the nonstandard (i.e. internal) set {\{ n \in {}^* {\bf N}: n \leq N \}}; of course, {[N]} is not a group, so one should interpret {[N]/o(N)} as the image of {[N]} under the quotient map {{}^* {\bf Z} \rightarrow {}^* {\bf Z} / o(N)} (or {O(N) \rightarrow O(N)/o(N)}, if one prefers). Or to put it another way, (1) asserts that {[0,1]} is the image of {[N]} with respect to the map {\pi: n \mapsto \hbox{st} \frac{n}{N}}.

In this post I would like to record a nice measure-theoretic version of the equivalence (1), which essentially appears already in standard texts on Loeb measure (see e.g. this text of Cutland). To describe the results, we must first quickly recall the construction of Loeb measure on {[N]}. Given an internal subset {A} of {[N]}, we may define the elementary measure {\mu_0(A)} of {A} by the formula

\displaystyle  \mu_0(A) := \hbox{st} \frac{|A|}{N}.

This is a finitely additive probability measure on the Boolean algebra of internal subsets of {[N]}. We can then construct the Loeb outer measure {\mu^*(A)} of any subset {A \subset [N]} in complete analogy with Lebesgue outer measure by the formula

\displaystyle  \mu^*(A) := \inf \sum_{n=1}^\infty \mu_0(A_n)

where {(A_n)_{n=1}^\infty} ranges over all sequences of internal subsets of {[N]} that cover {A}. We say that a subset {A} of {[N]} is Loeb measurable if, for any (standard) {\epsilon>0}, one can find an internal subset {B} of {[N]} which differs from {A} by a set of Loeb outer measure at most {\epsilon}, and in that case we define the Loeb measure {\mu(A)} of {A} to be {\mu^*(A)}. It is a routine matter to show (e.g. using the Carathéodory extension theorem) that the space {{\mathcal L}} of Loeb measurable sets is a {\sigma}-algebra, and that {\mu} is a countably additive probability measure on this space that extends the elementary measure {\mu_0}. Thus {[N]} now has the structure of a probability space {([N], {\mathcal L}, \mu)}.

Now, the group {o(N)} acts (Loeb-almost everywhere) on the probability space {[N]} by the addition map, thus {T^h n := n+h} for {n \in [N]} and {h \in o(N)} (excluding a set of Loeb measure zero where {n+h} exits {[N]}). This action is clearly seen to be measure-preserving. As such, we can form the invariant factor {Z^0_{o(N)}([N]) = ([N], {\mathcal L}^{o(N)}, \mu\downharpoonright_{{\mathcal L}^{o(N)}})}, defined by restricting attention to those Loeb measurable sets {A \subset [N]} with the property that {T^h A} is equal {\mu}-almost everywhere to {A} for each {h \in o(N)}.

The claim is then that this invariant factor is equivalent (up to almost everywhere equivalence) to the unit interval {[0,1]} with Lebesgue measure {m} (and the trivial action of {o(N)}), by the same factor map {\pi: n \mapsto \hbox{st} \frac{n}{N}} used in (1). More precisely:

Theorem 2 Given a set {A \in {\mathcal L}^{o(N)}}, there exists a Lebesgue measurable set {B \subset [0,1]}, unique up to {m}-a.e. equivalence, such that {A} is {\mu}-a.e. equivalent to the set {\pi^{-1}(B) := \{ n \in [N]: \hbox{st} \frac{n}{N} \in B \}}. Conversely, if {B \subset [0,1]} is Lebesgue measurable, then {\pi^{-1}(B)} is in {{\mathcal L}^{o(N)}}, and {\mu( \pi^{-1}(B) ) = m( B )}.

More informally, we have the measure-theoretic version

\displaystyle  [0,1] \equiv Z^0_{o(N)}( [N] )

of (1).

Proof: We first prove the converse. It is clear that {\pi^{-1}(B)} is {o(N)}-invariant, so it suffices to show that {\pi^{-1}(B)} is Loeb measurable with Loeb measure {m(B)}. This is easily verified when {B} is an elementary set (a finite union of intervals). By countable subadditivity of outer measure, this implies that the Loeb outer measure of {\pi^{-1}(E)} is bounded by the Lebesgue outer measure of {E} for any set {E \subset [0,1]}; since every Lebesgue measurable set differs from an elementary set by a set of arbitrarily small Lebesgue outer measure, the claim follows.
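The elementary-set case can be illustrated by a finite model. The sketch below assumes one replaces the unbounded nonstandard {N} by a large standard integer (so it is only a heuristic analogue, not Loeb measure itself) and takes {[N]} to be {\{1,\dots,N\}}; the function name `preimage_density` is hypothetical. It counts the proportion of {n} whose rescaling {n/N} lands in an interval {[a,b]}, which should agree with the length {b-a} up to an error of order {1/N}.

```python
def preimage_density(a: float, b: float, N: int) -> float:
    """Fraction of n in {1, ..., N} whose rescaling n/N lies in [a, b]."""
    count = sum(1 for n in range(1, N + 1) if a <= n / N <= b)
    return count / N

if __name__ == "__main__":
    # The counting density should approach the interval length b - a as N grows.
    for N in (10**2, 10**4, 10**6):
        print(N, preimage_density(0.25, 0.75, N))
```

In the nonstandard setting the {O(1/N)} discrepancy is infinitesimal and disappears on taking standard parts.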

Now we establish the forward claim. Uniqueness is clear from the converse claim, so it suffices to show existence. Let {A \in {\mathcal L}^{o(N)}}. Let {\epsilon>0} be an arbitrary standard real number; then we can find an internal set {A_\epsilon \subset [N]} which differs from {A} by a set of Loeb measure at most {\epsilon}. As {A} is {o(N)}-invariant, we conclude that for every {h \in o(N)}, {A_\epsilon} and {T^h A_\epsilon} differ by a set of Loeb measure (and hence elementary measure) at most {2\epsilon}. By the (contrapositive of the) underspill principle, there must exist a standard {\delta>0} such that {A_\epsilon} and {T^h A_\epsilon} differ by a set of elementary measure at most {2\epsilon} for all {|h| \leq \delta N}. If we then define the nonstandard function {f: [N] \rightarrow {}^* {\bf R}} by the formula

\displaystyle  f(n) := \hbox{st} \frac{1}{\delta N} \sum_{m \in [N]: m \leq n \leq m+\delta N} 1_{A_\epsilon}(m),

then from the (nonstandard) triangle inequality we have

\displaystyle  \frac{1}{N} \sum_{n \in [N]} |f(n) - 1_{A_\epsilon}(n)| \leq 3\epsilon

(say). On the other hand, {f} has the Lipschitz continuity property

\displaystyle  |f(n)-f(m)| \leq \frac{2|n-m|}{\delta N}

and so in particular we see that

\displaystyle  \hbox{st} f(n) = \tilde f( \hbox{st} \frac{n}{N} )

for some Lipschitz continuous function {\tilde f: [0,1] \rightarrow [0,1]}. If we then let {E_\epsilon} be the set where {\tilde f \geq 1 - \sqrt{\epsilon}}, one can check that {A_\epsilon} differs from {\pi^{-1}(E_\epsilon)} by a set of Loeb outer measure {O(\sqrt{\epsilon})}, and hence {A} does so also. Sending {\epsilon} to zero, we see (from the converse claim) that {1_{E_\epsilon}} is a Cauchy sequence in {L^1} and thus converges in {L^1} to {1_E} for some Lebesgue measurable {E}. The sets {A_\epsilon} then converge in Loeb outer measure to {\pi^{-1}(E)}, giving the claim. \Box

Thanks to the Lebesgue differentiation theorem, the conditional expectation {{\bf E}( f | Z^0_{o(N)}([N]))} of a bounded Loeb-measurable function {f: [N] \rightarrow {\bf R}} can be expressed (as a function on {[0,1]}, defined {m}-a.e.) as

\displaystyle  {\bf E}( f | Z^0_{o(N)}([N]))(x) := \lim_{\epsilon \rightarrow 0} \frac{1}{2\epsilon} \int_{[(x-\epsilon) N,(x+\epsilon) N]} f\ d\mu.

By the abstract ergodic theorem from the previous post, one can also view this conditional expectation as the element in the closed convex hull of the shifts {T^h f}, {h = o(N)} of minimal {L^2} norm. In particular, we obtain a form of the von Neumann ergodic theorem in this context: the averages {\frac{1}{H} \sum_{h=1}^H T^h f} for {H=O(N)} converge (as a net, rather than a sequence) in {L^2} to {{\bf E}( f | Z^0_{o(N)}([N]))}.

If {f: [N] \rightarrow [-1,1]} is (the standard part of) an internal function, that is to say the ultralimit of a sequence {f_n: [N_n] \rightarrow [-1,1]} of finitary bounded functions, one can view the measurable function {F := {\bf E}( f | Z^0_{o(N)}([N]))} as a limit of the {f_n} that is analogous to the “graphons” that emerge as limits of graphs (see e.g. the recent text of Lovasz on graph limits). Indeed, the measurable function {F: [0,1] \rightarrow [-1,1]} is related to the discrete functions {f_n: [N_n] \rightarrow [-1,1]} by the formula

\displaystyle  \int_a^b F(x)\ dx = \hbox{st} \lim_{n \rightarrow p} \frac{1}{N_n} \sum_{a N_n \leq m \leq b N_n} f_n(m)

for all {0 \leq a < b \leq 1}, where {p} is the nonprincipal ultrafilter used to define the nonstandard universe. In particular, from the Arzela-Ascoli diagonalisation argument there is a subsequence {n_j} such that

\displaystyle  \int_a^b F(x)\ dx = \lim_{j \rightarrow \infty} \frac{1}{N_{n_j}} \sum_{a N_{n_j} \leq m \leq b N_{n_j}} f_{n_j}(m),

thus {F} is the asymptotic density function of the {f_n}. For instance, if {f_n} is the indicator function of a randomly chosen subset of {[N_n]}, then the asymptotic density function would equal {1/2} (almost everywhere, at least).
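The random-subset example can be checked with a small simulation. This is a sketch under stated assumptions: a single large standard {N} stands in for the sequence {N_n}, the subset is drawn by independent fair coin flips from a seeded generator, and the helper names `random_indicator` and `window_average` are hypothetical. The windowed average {\frac{1}{N} \sum_{aN \leq m \leq bN} f(m)} should then be close to {\int_a^b \frac{1}{2}\ dx = \frac{b-a}{2}}.

```python
import random

def random_indicator(N: int, seed: int = 0) -> list:
    """Indicator function (as a 0/1 list) of a random subset of {1, ..., N}."""
    rng = random.Random(seed)
    return [rng.randint(0, 1) for _ in range(N)]

def window_average(f: list, a: float, b: float) -> float:
    """(1/N) * sum of f(m) over a*N <= m <= b*N, where f(m) is stored at f[m-1]."""
    N = len(f)
    return sum(f[m - 1] for m in range(1, N + 1) if a * N <= m <= b * N) / N

if __name__ == "__main__":
    f = random_indicator(200_000)
    # Compare the windowed averages with (b - a) / 2 for a few windows.
    for a, b in [(0.0, 1.0), (0.2, 0.7), (0.5, 0.6)]:
        print((a, b), window_average(f, a, b), (b - a) / 2)
```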

I’m continuing to look into understanding the ergodic theory of {o(N)} actions, as I believe this may allow one to apply ergodic theory methods to the “single-scale” or “non-asymptotic” setting (in which one averages only over scales comparable to a large parameter {N}, rather than the traditional asymptotic approach of letting the scale go to infinity). I’m planning some further posts in this direction, though this is still a work in progress.

Hans Lindblad and I have just uploaded to the arXiv our joint paper “Asymptotic decay for a one-dimensional nonlinear wave equation”, submitted to Analysis & PDE.  This paper, to our knowledge, is the first paper to analyse the asymptotic behaviour of the one-dimensional defocusing nonlinear wave equation

-u_{tt}+u_{xx} = |u|^{p-1} u \ \ \ \ \ (1)

where u: {\bf R} \times {\bf R} \to {\bf R} is the solution and p>1 is a fixed exponent.  Nowadays, this type of equation is considered a very simple example of a non-linear wave equation (there is only one spatial dimension, the equation is semilinear, the conserved energy is positive definite and coercive, and there are no derivatives in the nonlinear term), and indeed it is not difficult to show that any solution whose conserved energy

E[u] := \int_{{\bf R}} \frac{1}{2} |u_t|^2 + \frac{1}{2} |u_x|^2 + \frac{1}{p+1} |u|^{p+1}\ dx

is finite, will exist globally for all time (and remain finite energy, of course).  In particular, from the one-dimensional Gagliardo-Nirenberg inequality (a variant of the Sobolev embedding theorem), such solutions will remain uniformly bounded in L^\infty_x({\bf R}) for all time.

However, this leaves open the question of the asymptotic behaviour of such solutions in the limit as t \to \infty.  In higher dimensions, there are a variety of scattering and asymptotic completeness results which show that solutions to nonlinear wave equations such as (1) decay asymptotically in various senses, at least if one is in the perturbative regime in which the solution is assumed small in some sense (e.g. small energy).  For instance, a typical result might be that spatial norms such as \|u(t)\|_{L^q({\bf R})} might go to zero (in an average sense, at least).   In general, such results for nonlinear wave equations are ultimately based on the fact that the linear wave equation in higher dimensions also enjoys an analogous decay as t \to +\infty, as linear waves in higher dimensions spread out and disperse over time.  (This can be formalised by decay estimates on the fundamental solution of the linear wave equation, or by basic estimates such as the (long-time) Strichartz estimates and their relatives.)  The idea is then to view the nonlinear wave equation as a perturbation of the linear one.

On the other hand, the solution to the linear one-dimensional wave equation

-u_{tt} + u_{xx} = 0 \ \ \ \ \ (2)

does not exhibit any decay in time; as one learns in an undergraduate PDE class, the general (finite energy) solution to such an equation is given by the superposition of two travelling waves,

u(t,x) = f(x+t) + g(x-t) \ \ \ \ \ (3)

where f and g also have finite energy, so in particular norms such as \|u(t)\|_{L^\infty_x({\bf R})} cannot decay to zero as t \to \infty unless the solution is completely trivial.
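The lack of decay for (2) is easy to see numerically. The following is a minimal sketch, assuming Gaussian profiles for f and g (an illustrative choice, not one made in the paper): it checks by second differences that the travelling-wave formula (3) solves (2), and that the sup norm in x stays bounded away from zero at large times.

```python
import math

def f(s: float) -> float:
    """Left-moving profile; the Gaussian is an assumed example."""
    return math.exp(-s * s)

def g(s: float) -> float:
    """Right-moving profile; the Gaussian is an assumed example."""
    return math.exp(-s * s)

def u(t: float, x: float) -> float:
    """Superposition of two travelling waves, as in formula (3)."""
    return f(x + t) + g(x - t)

def wave_residual(t: float, x: float, h: float = 1e-3) -> float:
    """Second-difference approximation to -u_tt + u_xx at the point (t, x)."""
    u_tt = (u(t + h, x) - 2 * u(t, x) + u(t - h, x)) / h**2
    u_xx = (u(t, x + h) - 2 * u(t, x) + u(t, x - h)) / h**2
    return -u_tt + u_xx

def sup_norm(t: float, half_width: float = 10.0, steps: int = 4001) -> float:
    """Approximate sup over x of |u(t, x)| on a grid covering both bumps."""
    lo, hi = -abs(t) - half_width, abs(t) + half_width
    return max(abs(u(t, lo + (hi - lo) * k / (steps - 1))) for k in range(steps))

if __name__ == "__main__":
    print("residual at (3.0, 0.5):", wave_residual(3.0, 0.5))
    for t in (0.0, 10.0, 100.0):
        print("t =", t, "sup norm ~", sup_norm(t))
```

Once the two bumps separate, each travels unchanged, so the sup norm settles near the height of a single bump rather than tending to zero.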

Nevertheless, we were able to establish a nonlinear decay effect for equation (1), caused more by the nonlinear right-hand side of (1) than by the linear left-hand side, to obtain L^\infty_x({\bf R}) decay on the average:

Theorem 1. (Average L^\infty_x decay) If u is a finite energy solution to (1), then \frac{1}{2T} \int_{-T}^T \|u(t)\|_{L^\infty_x({\bf R})}\ dt tends to zero as T \to \infty.

Actually we prove a slightly stronger statement than Theorem 1, in that the decay is uniform among all solutions with a given energy bound, but I will stick to the above formulation of the main result for simplicity.

Informally, the reason for the nonlinear decay is as follows.  The linear evolution tries to force waves to move at constant velocity (indeed, from (3) we see that linear waves move at the speed of light c=1).  But the defocusing nature of the nonlinearity will spread out any wave that is propagating along a constant velocity worldline.  This intuition can be formalised by a Morawetz-type energy estimate that shows that the nonlinear potential energy must decay along any rectangular slab of spacetime (that represents the neighbourhood of a constant velocity worldline).

Now, just because the linear wave equation propagates along constant velocity worldlines, this does not mean that the nonlinear wave equation does too; one could imagine that a wave packet could propagate along a more complicated trajectory t \mapsto x(t) in which the velocity x'(t) is not constant.  However, energy methods still force the solution of the nonlinear wave equation to obey finite speed of propagation, which in the wave packet context means (roughly speaking) that the nonlinear trajectory t \mapsto x(t) is a Lipschitz continuous function (with Lipschitz constant at most 1).

And now we deploy a trick which appears to be new to the field of nonlinear wave equations: we invoke the Rademacher differentiation theorem (or Lebesgue differentiation theorem), which asserts that Lipschitz continuous functions are almost everywhere differentiable.  (By coincidence, I am teaching this theorem in my current course, both in one dimension (which is the case of interest here) and in higher dimensions.)  A compactness argument allows one to extract a quantitative estimate from this theorem (cf. this earlier blog post of mine) which, roughly speaking, tells us that there are large portions of the trajectory t \mapsto x(t) which behave approximately linearly at an appropriate scale.  This turns out to be a good enough control on the trajectory that one can apply the Morawetz inequality and rule out the existence of persistent wave packets over long periods of time, which is what leads to Theorem 1.

There is still scope for further work to be done on the asymptotics.  In particular, we still do not have a good understanding of what the asymptotic profile of the solution should be, even in the perturbative regime; standard nonlinear geometric optics methods do not appear to work very well due to the extremely weak decay.

Let {[a,b]} be a compact interval of positive length (thus {-\infty < a < b < +\infty}). Recall that a function {F: [a,b] \rightarrow {\bf R}} is said to be differentiable at a point {x \in [a,b]} if the limit

\displaystyle F'(x) := \lim_{y \rightarrow x; y \in [a,b] \backslash \{x\}} \frac{F(y)-F(x)}{y-x} \ \ \ \ \ (1)

exists. In that case, we call {F'(x)} the strong derivative, classical derivative, or just derivative for short, of {F} at {x}. We say that {F} is everywhere differentiable, or differentiable for short, if it is differentiable at all points {x \in [a,b]}, and differentiable almost everywhere if it is differentiable at almost every point {x \in [a,b]}. If {F} is differentiable everywhere and its derivative {F'} is continuous, then we say that {F} is continuously differentiable.

Remark 1 Much later in this sequence, when we cover the theory of distributions, we will see the notion of a weak derivative or distributional derivative, which can be applied to a much rougher class of functions and is in many ways more suitable than the classical derivative for doing “Lebesgue” type analysis (i.e. analysis centred around the Lebesgue integral, and in particular allowing functions to be uncontrolled, infinite, or even undefined on sets of measure zero). However, for now we will stick with the classical approach to differentiation.

Exercise 2 If {F: [a,b] \rightarrow {\bf R}} is everywhere differentiable, show that {F} is continuous and {F'} is measurable. If {F} is almost everywhere differentiable, show that the (almost everywhere defined) function {F'} is measurable (i.e. it is equal to an everywhere defined measurable function on {[a,b]} outside of a null set), but give an example to demonstrate that {F} need not be continuous.

Exercise 3 Give an example of a function {F: [a,b] \rightarrow {\bf R}} which is everywhere differentiable, but not continuously differentiable. (Hint: choose an {F} that vanishes quickly at some point, say at the origin {0}, but which also oscillates rapidly near that point.)

In single-variable calculus, the operations of integration and differentiation are connected by a number of basic theorems, starting with Rolle’s theorem.

Theorem 4 (Rolle’s theorem) Let {[a,b]} be a compact interval of positive length, and let {F: [a,b] \rightarrow {\bf R}} be a differentiable function such that {F(a)=F(b)}. Then there exists {x \in (a,b)} such that {F'(x)=0}.

Proof: By subtracting a constant from {F} (which does not affect differentiability or the derivative) we may assume that {F(a)=F(b)=0}. If {F} is identically zero then the claim is trivial, so assume that {F} is non-zero somewhere. By replacing {F} with {-F} if necessary, we may assume that {F} is positive somewhere, thus {\sup_{x \in [a,b]} F(x) > 0}. On the other hand, as {F} is continuous and {[a,b]} is compact, {F} must attain its maximum somewhere, thus there exists {x \in [a,b]} such that {F(x) \geq F(y)} for all {y \in [a,b]}. Then {F(x)} must be positive and so {x} cannot equal either {a} or {b}, and thus must lie in the interior. From the right limit of (1) we see that {F'(x) \leq 0}, while from the left limit we have {F'(x) \geq 0}. Thus {F'(x)=0} and the claim follows. \Box

Remark 5 Observe that the same proof also works if {F} is only differentiable in the interior {(a,b)} of the interval {[a,b]}, so long as it is continuous all the way up to the boundary of {[a,b]}.

Exercise 6 Give an example to show that Rolle’s theorem can fail if {F} is merely assumed to be almost everywhere differentiable, even if one adds the additional hypothesis that {F} is continuous. This example illustrates that everywhere differentiability is a significantly stronger property than almost everywhere differentiability. We will see further evidence of this fact later in these notes; there are many theorems that assert in their conclusion that a function is almost everywhere differentiable, but few that manage to conclude everywhere differentiability.

Remark 7 It is important to note that Rolle’s theorem only works in the real scalar case when {F} is real-valued, as it relies heavily on the least upper bound property for the domain {{\bf R}}. If, for instance, we consider complex-valued scalar functions {F: [a,b] \rightarrow {\bf C}}, then the theorem can fail; for instance, the function {F: [0,1] \rightarrow {\bf C}} defined by {F(x) := e^{2\pi i x} - 1} vanishes at both endpoints and is differentiable, but its derivative {F'(x) = 2\pi i e^{2\pi i x}} is never zero. (Rolle’s theorem does imply that the real and imaginary parts of the derivative {F'} both vanish somewhere, but the problem is that they don’t simultaneously vanish at the same point.) Similar remarks apply to functions taking values in a finite-dimensional vector space, such as {{\bf R}^n}.
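The counterexample in Remark 7 can be sanity-checked numerically. This is only a sketch (the names `F` and `dF` mirror the remark; the sampling grid is an arbitrary choice): it confirms that {F} vanishes at both endpoints up to rounding error, while {|F'|} stays equal to {2\pi} at every sample point.

```python
import cmath
import math

def F(x: float) -> complex:
    """The complex-valued function F(x) = exp(2 pi i x) - 1 from Remark 7."""
    return cmath.exp(2j * math.pi * x) - 1

def dF(x: float) -> complex:
    """Its derivative F'(x) = 2 pi i exp(2 pi i x)."""
    return 2j * math.pi * cmath.exp(2j * math.pi * x)

if __name__ == "__main__":
    print("|F(0)|, |F(1)|:", abs(F(0.0)), abs(F(1.0)))
    # The modulus of the derivative is constant, so it has no zeros on [0, 1].
    print("min |F'| on grid:", min(abs(dF(k / 1000)) for k in range(1001)))
```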

One can easily amplify Rolle’s theorem to the mean value theorem:

Corollary 8 (Mean value theorem) Let {[a,b]} be a compact interval of positive length, and let {F: [a,b] \rightarrow {\bf R}} be a differentiable function. Then there exists {x \in (a,b)} such that {F'(x)=\frac{F(b)-F(a)}{b-a}}.

Proof: Apply Rolle’s theorem to the function {x \mapsto F(x) - \frac{F(b)-F(a)}{b-a} (x-a)}. \Box
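For the reader who wants the omitted verification spelled out, here is a short worked computation. The name {H} for the auxiliary function is introduced only for this illustration.

```latex
\begin{aligned}
H(x) &:= F(x) - \frac{F(b)-F(a)}{b-a}\,(x-a), \\
H(a) &= F(a), \\
H(b) &= F(b) - \bigl(F(b)-F(a)\bigr) = F(a), \\
H'(x) &= F'(x) - \frac{F(b)-F(a)}{b-a}.
\end{aligned}
```

Thus {H} is differentiable with {H(a)=H(b)}, and the point {x \in (a,b)} with {H'(x)=0} supplied by Rolle’s theorem is exactly a point where {F'(x)=\frac{F(b)-F(a)}{b-a}}.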

Remark 9 As Rolle’s theorem is only applicable to real scalar-valued functions, the more general mean value theorem is also only applicable to such functions.

Exercise 10 (Uniqueness of antiderivatives up to constants) Let {[a,b]} be a compact interval of positive length, and let {F: [a,b] \rightarrow {\bf R}} and {G: [a,b] \rightarrow {\bf R}} be differentiable functions. Show that {F'(x)=G'(x)} for every {x \in [a,b]} if and only if {F(x)=G(x)+C} for some constant {C \in {\bf R}} and all {x \in [a,b]}.

We can use the mean value theorem to deduce one of the fundamental theorems of calculus:

Theorem 11 (Second fundamental theorem of calculus) Let {F: [a,b] \rightarrow {\bf R}} be a differentiable function, such that {F'} is Riemann integrable. Then the Riemann integral {\int_a^b F'(x)\ dx} of {F'} is equal to {F(b) - F(a)}. In particular, we have {\int_a^b F'(x)\ dx = F(b)-F(a)} whenever {F} is continuously differentiable.

Proof: Let {\varepsilon > 0}. By the definition of Riemann integrability, there exists a finite partition {a = t_0 < t_1 < \ldots < t_k = b} such that

\displaystyle |\sum_{j=1}^k F'(t^*_j) (t_j - t_{j-1}) - \int_a^b F'(x)\ dx| \leq \varepsilon

for every choice of {t^*_j \in [t_{j-1},t_j]}.

Fix this partition. From the mean value theorem, for each {1 \leq j \leq k} one can find {t^*_j \in [t_{j-1},t_j]} such that

\displaystyle F'(t^*_j) (t_j - t_{j-1}) = F(t_j) - F(t_{j-1})

and thus by telescoping series

\displaystyle |(F(b)-F(a)) - \int_a^b F'(x)\ dx| \leq \varepsilon.

Since {\varepsilon > 0} was arbitrary, the claim follows. \Box
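The telescoping step in the above proof can be written out as follows, using only the partition {a = t_0 < t_1 < \ldots < t_k = b} and the points {t^*_j} supplied by the mean value theorem.

```latex
\sum_{j=1}^k F'(t^*_j)\,(t_j - t_{j-1})
  = \sum_{j=1}^k \bigl( F(t_j) - F(t_{j-1}) \bigr)
  = F(t_k) - F(t_0)
  = F(b) - F(a).
```

Substituting this identity into the Riemann sum estimate gives the stated bound on {|(F(b)-F(a)) - \int_a^b F'(x)\ dx|}.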

Remark 12 Even though the mean value theorem only holds for real scalar functions, the fundamental theorem of calculus holds for complex or vector-valued functions, as one can simply apply that theorem to each component of that function separately.

Of course, we also have the other half of the fundamental theorem of calculus:

Theorem 13 (First fundamental theorem of calculus) Let {[a,b]} be a compact interval of positive length. Let {f: [a,b] \rightarrow {\bf C}} be a continuous function, and let {F: [a,b] \rightarrow {\bf C}} be the indefinite integral {F(x) := \int_a^x f(t)\ dt}. Then {F} is differentiable on {[a,b]}, with derivative {F'(x) = f(x)} for all {x \in [a,b]}. In particular, {F} is continuously differentiable.

Proof: It suffices to show that

\displaystyle \lim_{h \rightarrow 0^+} \frac{F(x+h)-F(x)}{h} = f(x)

for all {x \in [a,b)}, and

\displaystyle \lim_{h \rightarrow 0^-} \frac{F(x+h)-F(x)}{h} = f(x)

for all {x \in (a,b]}. After a change of variables, we can write

\displaystyle \frac{F(x+h)-F(x)}{h} = \int_0^1 f(x+ht)\ dt

for any {x \in [a,b)} and any sufficiently small {h>0}, or any {x \in (a,b]} and any sufficiently small {h<0}. As {f} is continuous, the function {t \mapsto f(x+ht)} converges uniformly to {f(x)} on {[0,1]} as {h \rightarrow 0} (keeping {x} fixed). As the interval {[0,1]} is bounded, {\int_0^1 f(x+ht)\ dt} thus converges to {\int_0^1 f(x)\ dt = f(x)}, and the claim follows. \Box
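The change of variables used in this proof is the substitution {s = x + ht}, {ds = h\ dt}; written out, with {s} a dummy variable introduced for this illustration:

```latex
\frac{F(x+h)-F(x)}{h}
  = \frac{1}{h} \int_x^{x+h} f(s)\ ds
  = \int_0^1 f(x+ht)\ dt,
\qquad
\left| \int_0^1 f(x+ht)\ dt - f(x) \right|
  \leq \sup_{t \in [0,1]} |f(x+ht) - f(x)|.
```

The first identity holds for negative {h} as well, provided the integral from {x} to {x+h} is interpreted as an oriented integral; the second bound is what converts uniform convergence of {t \mapsto f(x+ht)} into convergence of the integrals.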

Corollary 14 (Differentiation theorem for continuous functions) Let {f: [a,b] \rightarrow {\bf C}} be a continuous function on a compact interval. Then we have

\displaystyle \lim_{h \rightarrow 0^+} \frac{1}{h} \int_{[x,x+h]} f(t)\ dt = f(x)

for all {x \in [a,b)},

\displaystyle \lim_{h \rightarrow 0^+} \frac{1}{h} \int_{[x-h,x]} f(t)\ dt = f(x)

for all {x \in (a,b]}, and thus

\displaystyle \lim_{h \rightarrow 0^+} \frac{1}{2h} \int_{[x-h,x+h]} f(t)\ dt = f(x)

for all {x \in (a,b)}.
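Corollary 14 can be observed numerically. The sketch below makes several assumptions for illustration: it uses {f = \cos} as a test function, approximates the integrals by a midpoint rule, and the helper names `average` and `two_sided_average` are hypothetical. The averages over {[x-h,x+h]} should approach {f(x)} as {h} shrinks.

```python
import math

def average(f, lo: float, hi: float, steps: int = 10_000) -> float:
    """Midpoint-rule approximation to the mean value of f over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(f(lo + (k + 0.5) * width) for k in range(steps)) / steps

def two_sided_average(f, x: float, h: float) -> float:
    """Approximates (1/(2h)) * integral of f over [x-h, x+h]."""
    return average(f, x - h, x + h)

if __name__ == "__main__":
    x = 0.3
    # The error |average - f(x)| should shrink as h decreases.
    for h in (1e-1, 1e-2, 1e-3):
        print(h, abs(two_sided_average(math.cos, x, h) - math.cos(x)))
```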

In these notes we explore the question of the extent to which these theorems continue to hold when the differentiability or integrability conditions on the various functions {F, F', f} are relaxed. Among the results proven in these notes are

  • The Lebesgue differentiation theorem, which roughly speaking asserts that Corollary 14 continues to hold for almost every {x} if {f} is merely absolutely integrable, rather than continuous;
  • A number of differentiation theorems, which assert for instance that monotone, Lipschitz, or bounded variation functions in one dimension are almost everywhere differentiable; and
  • The second fundamental theorem of calculus for absolutely continuous functions.

The material here is loosely based on Chapter 3 of Stein-Shakarchi.

This post is a sequel of sorts to my earlier post on hard and soft analysis, and the finite convergence principle. Here, I want to discuss a well-known theorem in infinitary soft analysis – the Lebesgue differentiation theorem – and whether there is any meaningful finitary version of this result. Along the way, it turns out that we will uncover a simple analogue of the Szemerédi regularity lemma, for subsets of the interval rather than for graphs. (Actually, regularity lemmas seem to appear in just about any context in which fine-scaled objects can be approximated by coarse-scaled ones.) The connection between regularity lemmas and results such as the Lebesgue differentiation theorem was recently highlighted by Elek and Szegedy, while the connection between the finite convergence principle and results such as the pointwise ergodic theorem (which is a close cousin of the Lebesgue differentiation theorem) was recently detailed by Avigad, Gerhardy, and Towsner.

The Lebesgue differentiation theorem has many formulations, but we will avoid the strongest versions and just stick to the following model case for simplicity:

Lebesgue differentiation theorem. If f: [0,1] \to [0,1] is Lebesgue measurable, then for almost every x \in [0,1] we have f(x) = \lim_{r \to 0} \frac{1}{r} \int_x^{x+r} f(y)\ dy. Equivalently, the fundamental theorem of calculus f(x) = \frac{d}{dy} \int_0^y f(z) dz|_{y=x} is true for almost every x in [0,1].

Here we use the oriented definite integral, thus \int_x^y = - \int_y^x. Specialising to the case where f = 1_A is an indicator function, we obtain the Lebesgue density theorem as a corollary:

Lebesgue density theorem. Let A \subset [0,1] be Lebesgue measurable. Then for almost every x \in A, we have \frac{|A \cap [x-r,x+r]|}{2r} \to 1 as r \to 0^+, where |A| denotes the Lebesgue measure of A.

In other words, almost all the points x of A are points of density of A, which roughly speaking means that as one passes to finer and finer scales, the immediate vicinity of x becomes increasingly saturated with A. (Points of density are like robust versions of interior points, thus the Lebesgue density theorem is an assertion that measurable sets are almost like open sets. This is Littlewood’s first principle.) One can also deduce the Lebesgue differentiation theorem back from the Lebesgue density theorem by approximating f by a finite linear combination of indicator functions; we leave this as an exercise.
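To illustrate the notion of a point of density, here is a small sketch in which A is assumed to be a finite union of disjoint intervals, so that the measure of A \cap [x-r,x+r] can be computed exactly; the set `A` and the helper names are illustrative choices. Interior points of A have density 1, points outside the closure have density 0, and the finitely many endpoints (a null set, consistent with the theorem) have density 1/2.

```python
def measure_of_intersection(intervals, lo: float, hi: float) -> float:
    """Lebesgue measure of (union of disjoint intervals) intersected with [lo, hi]."""
    return sum(max(0.0, min(right, hi) - max(left, lo)) for left, right in intervals)

def density(intervals, x: float, r: float) -> float:
    """The ratio |A cap [x-r, x+r]| / (2r) from the Lebesgue density theorem."""
    return measure_of_intersection(intervals, x - r, x + r) / (2 * r)

# An assumed example set: two disjoint intervals inside [0, 1].
A = [(0.1, 0.4), (0.6, 0.9)]

if __name__ == "__main__":
    for x in (0.25, 0.4, 0.5):
        # Watch the ratio stabilise as the scale r shrinks.
        print(x, [density(A, x, r) for r in (0.2, 0.05, 0.01)])
```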
