There are a number of ways to construct the real numbers $\mathbf{R}$, for instance
- as the metric completion of $\mathbf{Q}$ (thus, $\mathbf{R}$ is defined as the set of Cauchy sequences of rationals, modulo Cauchy equivalence);
- as the space of Dedekind cuts on the rationals $\mathbf{Q}$;
- as the space of quasimorphisms $\phi: \mathbf{Z} \to \mathbf{Z}$ on the integers, quotiented by bounded functions. (I believe this construction first appears in this paper of Street, who credits the idea to Schanuel, though the germ of this construction arguably goes all the way back to Eudoxus.)
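To make the third construction a little more concrete, here is a minimal Python sketch (an illustration of mine, not taken from the cited paper). It assumes the usual definition of a quasimorphism as a map $\phi: \mathbf{Z} \to \mathbf{Z}$ for which $\phi(m+n) - \phi(m) - \phi(n)$ is bounded; the helper names `slope_map`, `sqrt2_map` and `defect` are hypothetical. The map $n \mapsto \lfloor np/q \rfloor$ represents the rational $p/q$, and the limiting slope $\phi(n)/n$ recovers the real number being encoded.

```python
from fractions import Fraction
from math import isqrt

def slope_map(p, q):
    """Hypothetical helper: the quasimorphism n -> floor(n*p/q) representing p/q."""
    return lambda n: (n * p) // q  # Python's // is floor division, also for negatives

def sqrt2_map(n):
    """Odd extension of n -> floor(n*sqrt(2)), computed exactly with integer sqrt."""
    return isqrt(2 * n * n) if n >= 0 else -isqrt(2 * n * n)

def defect(phi, m, n):
    """How far phi is from being additive at the pair (m, n)."""
    return phi(m + n) - phi(m) - phi(n)

phi = slope_map(22, 7)
rng = range(-50, 51)
max_defect_rational = max(abs(defect(phi, m, n)) for m in rng for n in rng)
max_defect_sqrt2 = max(abs(defect(sqrt2_map, m, n)) for m in rng for n in rng)

# The encoded real number is recovered as the limiting slope phi(n)/n.
slope_rational = Fraction(phi(10**6), 10**6)
slope_sqrt2 = sqrt2_map(10**6) / 10**6
```

Two such maps that differ by a bounded function have the same limiting slope, which is why one quotients by bounded functions.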
There is also a fourth family of constructions that proceeds via nonstandard analysis, as a special case of what is known as the nonstandard hull construction. (Here I will assume some basic familiarity with nonstandard analysis and ultraproducts, as covered for instance in this previous blog post.) Given an unbounded nonstandard natural number $N \in {}^*\mathbf{N} \setminus \mathbf{N}$, one can define two external additive subgroups of the nonstandard integers ${}^*\mathbf{Z}$:
- The group $O(N) := \{ n \in {}^*\mathbf{Z}: |n| \leq CN \hbox{ for some } C \in \mathbf{N} \}$ of all nonstandard integers of magnitude less than or comparable to $N$; and
- The group $o(N) := \{ n \in {}^*\mathbf{Z}: |n| \leq C^{-1} N \hbox{ for all } C \in \mathbf{N} \}$ of nonstandard integers of magnitude infinitesimally smaller than $N$.
The group $o(N)$ is a subgroup of $O(N)$, so we may form the quotient group $O(N)/o(N)$. This space is isomorphic to the reals $\mathbf{R}$, and can in fact be used to construct the reals:
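Proposition 1 For any coset $n + o(N)$ of $O(N)/o(N)$, there is a unique real number $\operatorname{st} \frac{n}{N}$ with the property that $\frac{n}{N} = \operatorname{st} \frac{n}{N} + o(1)$. The map $n + o(N) \mapsto \operatorname{st} \frac{n}{N}$ is then an isomorphism between the additive groups $O(N)/o(N)$ and $\mathbf{R}$.
Proof: Uniqueness is clear. For existence, observe that the set $\{ x \in \mathbf{Q}: Nx \leq n + o(N) \}$ is a Dedekind cut, and its supremum can be verified to have the required properties for $\operatorname{st} \frac{n}{N}$.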
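In a similar vein, we can view the unit interval $[0,1]$ in the reals as the quotient
$$[0,1] \equiv [N]/o(N) \qquad (1)$$
where $[N]$ is the nonstandard (i.e. internal) set $\{ n \in {}^*\mathbf{N}: n \leq N \}$; of course, $[N]$ is not a group, so one should interpret $[N]/o(N)$ as the image of $[N]$ under the quotient map ${}^*\mathbf{Z} \to {}^*\mathbf{Z}/o(N)$ (or $O(N) \to O(N)/o(N)$, if one prefers). Or to put it another way, (1) asserts that $[0,1]$ is the image of $[N]$ with respect to the map $\pi: n \mapsto \operatorname{st} \frac{n}{N}$.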
In this post I would like to record a nice measure-theoretic version of the equivalence (1), which essentially appears already in standard texts on Loeb measure (see e.g. this text of Cutland). To describe the results, we must first quickly recall the construction of Loeb measure on $[N]$. Given an internal subset $A$ of $[N]$, we may define the elementary measure $\mu_0(A)$ of $A$ by the formula
$$\mu_0(A) := \operatorname{st} \frac{|A|}{N}.$$
This is a finitely additive probability measure on the Boolean algebra of internal subsets of $[N]$. We can then construct the Loeb outer measure $\mu^*(A)$ of any subset $A \subset [N]$ in complete analogy with Lebesgue outer measure by the formula
$$\mu^*(A) := \inf \sum_{n=1}^\infty \mu_0(A_n)$$
where $(A_n)_{n=1}^\infty$ ranges over all sequences of internal subsets of $[N]$ that cover $A$. We say that a subset $A$ of $[N]$ is Loeb measurable if, for any (standard) $\epsilon > 0$, one can find an internal subset $B$ of $[N]$ which differs from $A$ by a set of Loeb outer measure at most $\epsilon$, and in that case we define the Loeb measure $\mu(A)$ of $A$ to be $\mu^*(A)$. It is a routine matter to show (e.g. using the Carathéodory extension theorem) that the space $\mathcal{L}$ of Loeb measurable sets is a $\sigma$-algebra, and that $\mu$ is a countably additive probability measure on this space that extends the elementary measure $\mu_0$. Thus $[N]$ now has the structure of a probability space $([N], \mathcal{L}, \mu)$.
Now, the group $o(N)$ acts (Loeb-almost everywhere) on the probability space $[N]$ by the addition map, thus $T^h n := n + h$ for $n \in [N]$ and $h \in o(N)$ (excluding a set of Loeb measure zero where $n+h$ exits $[N]$). This action is clearly seen to be measure-preserving. As such, we can form the invariant factor $Z^0_{o(N)}([N]) = ([N], \mathcal{L}^{o(N)}, \mu|_{\mathcal{L}^{o(N)}})$, defined by restricting attention to those Loeb measurable sets $A \subset [N]$ with the property that $T^h A$ is equal $\mu$-almost everywhere to $A$ for each $h \in o(N)$.
The claim is then that this invariant factor is equivalent (up to almost everywhere equivalence) to the unit interval $[0,1]$ with Lebesgue measure $m$ (and the trivial action of $o(N)$), by the same factor map $\pi: n \mapsto \operatorname{st} \frac{n}{N}$ used in (1). More precisely:
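Theorem 2 Given a set $A \in \mathcal{L}^{o(N)}$, there exists a Lebesgue measurable set $B \subset [0,1]$, unique up to $m$-a.e. equivalence, such that $A$ is $\mu$-a.e. equivalent to the set $\pi^{-1}(B) := \{ n \in [N]: \operatorname{st} \frac{n}{N} \in B \}$. Conversely, if $B \subset [0,1]$ is Lebesgue measurable, then $\pi^{-1}(B)$ is in $\mathcal{L}^{o(N)}$, and $\mu(\pi^{-1}(B)) = m(B)$.
More informally, we have the measure-theoretic version
$$[0,1] \equiv Z^0_{o(N)}([N])$$
of (1).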
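The converse half of Theorem 2 (the preimage of a Lebesgue measurable set under the factor map has Loeb measure equal to its Lebesgue measure) has a simple finitary shadow, sketched below in Python. This is only a heuristic illustration under stated assumptions: a large standard integer `N` stands in for the unbounded nonstandard number, the counting density $|A|/N$ stands in for Loeb measure, and the function name `pullback_density` is hypothetical.

```python
def pullback_density(N, intervals):
    """Density in {1,...,N} of the n whose ratio n/N lies in a union of closed intervals.

    `intervals` is a list of (lo, hi) pairs with 0 <= lo <= hi <= 1, assumed disjoint.
    """
    count = 0
    for n in range(1, N + 1):
        x = n / N
        if any(lo <= x <= hi for lo, hi in intervals):
            count += 1
    return count / N

B = [(0.1, 0.25), (0.5, 0.9)]          # an elementary set of Lebesgue measure 0.55
lebesgue_measure = sum(hi - lo for lo, hi in B)
densities = {N: pullback_density(N, B) for N in (10, 1000, 100000)}
```

As `N` grows, the entries of `densities` approach `lebesgue_measure`, with an error of size about $1/N$ per interval endpoint; in the nonstandard setting this error is infinitesimal and is removed by taking standard parts.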
Proof: We first prove the converse. It is clear that $\pi^{-1}(B)$ is $o(N)$-invariant, so it suffices to show that $\pi^{-1}(B)$ is Loeb measurable with Loeb measure $m(B)$. This is easily verified when $B$ is an elementary set (a finite union of intervals). By countable subadditivity of outer measure, this implies that the Loeb outer measure of $\pi^{-1}(E)$ is bounded by the Lebesgue outer measure of $E$ for any set $E \subset [0,1]$; since every Lebesgue measurable set differs from an elementary set by a set of arbitrarily small Lebesgue outer measure, the claim follows.
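Now we establish the forward claim. Uniqueness is clear from the converse claim, so it suffices to show existence. Let $A \in \mathcal{L}^{o(N)}$. Let $\epsilon > 0$ be an arbitrary standard real number; then we can find an internal set $A_\epsilon \subset [N]$ which differs from $A$ by a set of Loeb measure at most $\epsilon$. As $A$ is $o(N)$-invariant, we conclude that for every $h \in o(N)$, $A_\epsilon$ and $T^h A_\epsilon$ differ by a set of Loeb measure (and hence elementary measure) at most $2\epsilon$. By the (contrapositive of the) underspill principle, there must exist a standard $\delta > 0$ such that $A_\epsilon$ and $T^h A_\epsilon$ differ by a set of elementary measure at most $2\epsilon$ for all $|h| \leq \delta N$. If we then define the nonstandard function $f_\epsilon: [N] \to {}^*[0,1]$ by the formula
$$f_\epsilon(n) := \frac{1}{\delta N} \sum_{m \in [N]: m \leq n \leq m + \delta N} 1_{A_\epsilon}(m)$$
then from the (nonstandard) triangle inequality we have
$$\frac{1}{N} \sum_{n \in [N]} |f_\epsilon(n) - 1_{A_\epsilon}(n)| \leq 3\epsilon$$
(say). On the other hand, $f_\epsilon$ has the Lipschitz continuity property
$$|f_\epsilon(n) - f_\epsilon(m)| \leq \frac{2|n-m|}{\delta N}$$
and so in particular we see that
$$f_\epsilon(n) = \tilde f_\epsilon\left(\operatorname{st} \frac{n}{N}\right) + o(1)$$
for some Lipschitz continuous function $\tilde f_\epsilon: [0,1] \to [0,1]$. If we then let $E_\epsilon$ be the set where $\tilde f_\epsilon \geq 1/2$, one can check that $A_\epsilon$ differs from $\pi^{-1}(E_\epsilon)$ by a set of Loeb outer measure $O(\epsilon)$, and hence $A$ does so also. Sending $\epsilon$ to zero, we see (from the converse claim) that $1_{E_\epsilon}$ is a Cauchy sequence in $L^1$ and thus converges in $L^1$ to $1_E$ for some Lebesgue measurable $E$. The sets $A_\epsilon$ then converge in Loeb outer measure to $\pi^{-1}(E)$, giving the claim.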
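Thanks to the Lebesgue differentiation theorem, the conditional expectation $\mathbf{E}(f | Z^0_{o(N)}([N]))$ of a bounded Loeb-measurable function $f: [N] \to \mathbf{R}$ can be expressed (as a function on $[0,1]$, defined $m$-a.e.) as
$$\mathbf{E}(f | Z^0_{o(N)}([N]))(x) := \lim_{\epsilon \to 0} \frac{1}{2\epsilon} \int_{\pi^{-1}([x-\epsilon,x+\epsilon])} f\ d\mu.$$
By the abstract ergodic theorem from the previous post, one can also view this conditional expectation as the element in the closed convex hull of the shifts $T^h f$, $h \in o(N)$, of minimal $L^2$ norm. In particular, we obtain a form of the von Neumann ergodic theorem in this context: the averages $\frac{1}{H} \sum_{h=1}^H T^h f$ for $H = o(N)$ converge (as a net, rather than a sequence) in $L^2$ to $\mathbf{E}(f | Z^0_{o(N)}([N]))$.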
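If $f: [N] \to [-1,1]$ is (the standard part of) an internal function, that is to say the ultralimit of a sequence $f_n: [N_n] \to [-1,1]$ of finitary bounded functions, one can view the measurable function $F := \mathbf{E}(f | Z^0_{o(N)}([N]))$ as a limit of the $f_n$ that is analogous to the “graphons” that emerge as limits of graphs (see e.g. the recent text of Lovasz on graph limits). Indeed, the measurable function $F: [0,1] \to [-1,1]$ is related to the discrete functions $f_n: [N_n] \to [-1,1]$ by the formula
$$\int_a^b F(x)\ dx = \operatorname{st} \lim_{n \to p} \frac{1}{N_n} \sum_{a N_n \leq m \leq b N_n} f_n(m)$$
for all $0 \leq a < b \leq 1$, where $p$ is the nonprincipal ultrafilter used to define the nonstandard universe. In particular, from the Arzela-Ascoli diagonalisation argument there is a subsequence $n_j$ such that
$$\int_a^b F(x)\ dx = \lim_{j \to \infty} \frac{1}{N_{n_j}} \sum_{a N_{n_j} \leq m \leq b N_{n_j}} f_{n_j}(m),$$
thus $F$ is the asymptotic density function of the $f_n$. For instance, if $f_n$ is the indicator function of a randomly chosen subset of $[N_n]$, then the asymptotic density function would equal $1/2$ (almost everywhere, at least).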
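The random-subset example can be checked numerically. The following Python sketch is a finitary stand-in of my own (not part of the nonstandard argument): it assumes each element is included independently with probability $1/2$, uses a fixed seed and a hypothetical helper `local_densities`, and averages the indicator over windows whose length is a fixed fraction of the total length. By the law of large numbers these local averages should concentrate near $1/2$.

```python
import random

def local_densities(indicator, num_windows):
    """Average a 0/1 sequence over `num_windows` consecutive blocks of equal length."""
    block = len(indicator) // num_windows
    return [sum(indicator[i * block:(i + 1) * block]) / block
            for i in range(num_windows)]

random.seed(0)                      # fixed seed so the run is reproducible
N = 200_000
f = [random.randint(0, 1) for _ in range(N)]   # indicator of a uniformly random subset of [N]
densities = local_densities(f, 20)             # windows of length N/20
worst = max(abs(d - 0.5) for d in densities)   # largest deviation from 1/2
```

Each window here has length $N/20$, so the fluctuation of a local average is of order $(N/20)^{-1/2}$, which tends to zero as $N$ grows with the number of windows held fixed.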
I’m continuing to look into understanding the ergodic theory of $o(N)$ actions, as I believe this may allow one to apply ergodic theory methods to the “single-scale” or “non-asymptotic” setting (in which one averages only over scales comparable to a large parameter $N$, rather than the traditional asymptotic approach of letting the scale go to infinity). I’m planning some further posts in this direction, though this is still a work in progress.
Hans Lindblad and I have just uploaded to the arXiv our joint paper “Asymptotic decay for a one-dimensional nonlinear wave equation”, submitted to Analysis & PDE. This paper, to our knowledge, is the first paper to analyse the asymptotic behaviour of the one-dimensional defocusing nonlinear wave equation
$$-\phi_{tt} + \phi_{xx} = |\phi|^{p-1} \phi \qquad (1)$$
where $\phi: \mathbf{R} \times \mathbf{R} \to \mathbf{R}$ is the solution and $p > 1$ is a fixed exponent. Nowadays, this type of equation is considered a very simple example of a non-linear wave equation (there is only one spatial dimension, the equation is semilinear, the conserved energy is positive definite and coercive, and there are no derivatives in the nonlinear term), and indeed it is not difficult to show that any solution whose conserved energy
$$E(\phi) := \int_{\mathbf{R}} \frac{1}{2} |\phi_t(t,x)|^2 + \frac{1}{2} |\phi_x(t,x)|^2 + \frac{1}{p+1} |\phi(t,x)|^{p+1}\ dx$$
is finite will exist globally for all time (and remain finite energy, of course). In particular, from the one-dimensional Gagliardo-Nirenberg inequality (a variant of the Sobolev embedding theorem), such solutions will remain uniformly bounded in $L^\infty_x(\mathbf{R})$ for all time.
However, this leaves open the question of the asymptotic behaviour of such solutions in the limit as $t \to \infty$. In higher dimensions, there are a variety of scattering and asymptotic completeness results which show that solutions to nonlinear wave equations such as (1) decay asymptotically in various senses, at least if one is in the perturbative regime in which the solution is assumed small in some sense (e.g. small energy). For instance, a typical result might be that spatial norms such as $\|\phi(t)\|_{L^\infty_x}$ might go to zero (in an average sense, at least). In general, such results for nonlinear wave equations are ultimately based on the fact that the linear wave equation in higher dimensions also enjoys an analogous decay as $t \to +\infty$, as linear waves in higher dimensions spread out and disperse over time. (This can be formalised by decay estimates on the fundamental solution of the linear wave equation, or by basic estimates such as the (long-time) Strichartz estimates and their relatives.) The idea is then to view the nonlinear wave equation as a perturbation of the linear one.
On the other hand, the solution to the linear one-dimensional wave equation
$$-\phi_{tt} + \phi_{xx} = 0 \qquad (2)$$
does not exhibit any decay in time; as one learns in an undergraduate PDE class, the general (finite energy) solution to such an equation is given by the superposition of two travelling waves,
$$\phi(t,x) = f(x+t) + g(x-t) \qquad (3)$$
where $f$ and $g$ also have finite energy, so in particular norms such as $\|\phi(t)\|_{L^\infty_x(\mathbf{R})}$ cannot decay to zero as $t \to \infty$ unless the solution is completely trivial.
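The lack of decay in the linear case is easy to see numerically. Here is a small Python sketch, assuming hypothetical Gaussian profiles for the two travelling waves (any nontrivial profiles would do) and a hypothetical helper `sup_norm` that samples the solution on a grid wide enough to contain both bumps. It is an illustration only, not part of the paper's argument.

```python
import math

def f(s):
    """Left-moving profile (a hypothetical Gaussian bump)."""
    return math.exp(-s * s)

def g(s):
    """Right-moving profile (same bump, for simplicity)."""
    return math.exp(-s * s)

def phi(t, x):
    """Travelling-wave solution of the linear equation: f(x + t) + g(x - t)."""
    return f(x + t) + g(x - t)

def sup_norm(t, step=0.01, margin=10.0):
    """Approximate sup over x of |phi(t, x)| on a grid covering both bumps."""
    lo = -abs(t) - margin
    count = int(2 * (abs(t) + margin) / step) + 1
    return max(abs(phi(t, lo + i * step)) for i in range(count))

norms = {t: sup_norm(t) for t in (0.0, 5.0, 50.0)}
```

Once the two bumps have separated, the sup norm settles at the height of a single bump and stays there for all later times, so there is nothing for a perturbative argument to exploit.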
Nevertheless, we were able to establish a nonlinear decay effect for equation (1), caused more by the nonlinear right-hand side of (1) than by the linear left-hand side, to obtain decay on the average:
Theorem 1. (Average $L^\infty_x$ decay) If $\phi$ is a finite energy solution to (1), then $\frac{1}{2T} \int_{-T}^T \|\phi(t)\|_{L^\infty_x(\mathbf{R})}\ dt$ tends to zero as $T \to \infty$.
Actually we prove a slightly stronger statement than Theorem 1, in that the decay is uniform among all solutions with a given energy bound, but I will stick to the above formulation of the main result for simplicity.
Informally, the reason for the nonlinear decay is as follows. The linear evolution tries to force waves to move at constant velocity (indeed, from (3) we see that linear waves move at the speed of light $c=1$). But the defocusing nature of the nonlinearity will spread out any wave that is propagating along a constant velocity worldline. This intuition can be formalised by a Morawetz-type energy estimate that shows that the nonlinear potential energy must decay along any rectangular slab of spacetime (that represents the neighbourhood of a constant velocity worldline).
Now, just because the linear wave equation propagates along constant velocity worldlines, this does not mean that the nonlinear wave equation does too; one could imagine that a wave packet could propagate along a more complicated trajectory $t \mapsto x(t)$ in which the velocity $x'(t)$ is not constant. However, energy methods still force the solution of the nonlinear wave equation to obey finite speed of propagation, which in the wave packet context means (roughly speaking) that the nonlinear trajectory $t \mapsto x(t)$ is a Lipschitz continuous function (with Lipschitz constant at most $1$).
And now we deploy a trick which appears to be new to the field of nonlinear wave equations: we invoke the Rademacher differentiation theorem (or Lebesgue differentiation theorem), which asserts that Lipschitz continuous functions are almost everywhere differentiable. (By coincidence, I am teaching this theorem in my current course, both in one dimension (which is the case of interest here) and in higher dimensions.) A compactness argument allows one to extract a quantitative estimate from this theorem (cf. this earlier blog post of mine) which, roughly speaking, tells us that there are large portions of the trajectory which behave approximately linearly at an appropriate scale. This turns out to be a good enough control on the trajectory that one can apply the Morawetz inequality and rule out the existence of persistent wave packets over long periods of time, which is what leads to Theorem 1.
There is still scope for further work to be done on the asymptotics. In particular, we still do not have a good understanding of what the asymptotic profile of the solution should be, even in the perturbative regime; standard nonlinear geometric optics methods do not appear to work very well due to the extremely weak decay.
Let $[a,b]$ be a compact interval of positive length (thus $-\infty < a < b < +\infty$). Recall that a function $F: [a,b] \to \mathbf{R}$ is said to be differentiable at a point $x \in [a,b]$ if the limit
$$F'(x) := \lim_{y \to x; y \in [a,b] \setminus \{x\}} \frac{F(y) - F(x)}{y - x} \qquad (1)$$
exists. In that case, we call $F'(x)$ the strong derivative, classical derivative, or just derivative for short, of $F$ at $x$. We say that $F$ is everywhere differentiable, or differentiable for short, if it is differentiable at all points $x \in [a,b]$, and differentiable almost everywhere if it is differentiable at almost every point $x \in [a,b]$. If $F$ is differentiable everywhere and its derivative $F'$ is continuous, then we say that $F$ is continuously differentiable.
Remark 1 Much later in this sequence, when we cover the theory of distributions, we will see the notion of a weak derivative or distributional derivative, which can be applied to a much rougher class of functions and is in many ways more suitable than the classical derivative for doing “Lebesgue” type analysis (i.e. analysis centred around the Lebesgue integral, and in particular allowing functions to be uncontrolled, infinite, or even undefined on sets of measure zero). However, for now we will stick with the classical approach to differentiation.
Exercise 2 If $F: [a,b] \to \mathbf{R}$ is everywhere differentiable, show that $F$ is continuous and $F'$ is measurable. If $F$ is almost everywhere differentiable, show that the (almost everywhere defined) function $F'$ is measurable (i.e. it is equal to an everywhere defined measurable function on $[a,b]$ outside of a null set), but give an example to demonstrate that $F$ need not be continuous.
Exercise 3 Give an example of a function $F: [a,b] \to \mathbf{R}$ which is everywhere differentiable, but not continuously differentiable. (Hint: choose an $F$ that vanishes quickly at some point, say at the origin $0$, but which also oscillates rapidly near that point.)
In single-variable calculus, the operations of integration and differentiation are connected by a number of basic theorems, starting with Rolle’s theorem.
Theorem 4 (Rolle’s theorem) Let $[a,b]$ be a compact interval of positive length, and let $F: [a,b] \to \mathbf{R}$ be a differentiable function such that $F(a) = F(b)$. Then there exists $x \in (a,b)$ such that $F'(x) = 0$.
Proof: By subtracting a constant from $F$ (which does not affect differentiability or the derivative) we may assume that $F(a) = F(b) = 0$. If $F$ is identically zero then the claim is trivial, so assume that $F$ is non-zero somewhere. By replacing $F$ with $-F$ if necessary, we may assume that $F$ is positive somewhere, thus $\sup_{x \in [a,b]} F(x) > 0$. On the other hand, as $F$ is continuous and $[a,b]$ is compact, $F$ must attain its maximum somewhere, thus there exists $x \in [a,b]$ such that $F(x) \geq F(y)$ for all $y \in [a,b]$. Then $F(x)$ must be positive and so $x$ cannot equal either $a$ or $b$, and thus must lie in the interior. From the right limit of (1) we see that $F'(x) \leq 0$, while from the left limit we have $F'(x) \geq 0$. Thus $F'(x) = 0$ and the claim follows.
Remark 5 Observe that the same proof also works if $F$ is only differentiable in the interior $(a,b)$ of the interval $[a,b]$, so long as it is continuous all the way up to the boundary of $[a,b]$.
Exercise 6 Give an example to show that Rolle’s theorem can fail if $F$ is merely assumed to be almost everywhere differentiable, even if one adds the additional hypothesis that $F$ is continuous. This example illustrates that everywhere differentiability is a significantly stronger property than almost everywhere differentiability. We will see further evidence of this fact later in these notes; there are many theorems that assert in their conclusion that a function is almost everywhere differentiable, but few that manage to conclude everywhere differentiability.
Remark 7 It is important to note that Rolle’s theorem only works in the real scalar case when $F$ is real-valued, as it relies heavily on the least upper bound property for the domain $\mathbf{R}$. If, for instance, we consider complex-valued scalar functions $F: [a,b] \to \mathbf{C}$, then the theorem can fail; for instance, the function $F: [0,1] \to \mathbf{C}$ defined by $F(x) := e^{2\pi i x} - 1$ vanishes at both endpoints and is differentiable, but its derivative $F'(x) = 2\pi i e^{2\pi i x}$ is never zero. (Rolle’s theorem does imply that the real and imaginary parts of the derivative $F'$ both vanish somewhere, but the problem is that they don’t simultaneously vanish at the same point.) Similar remarks apply to functions taking values in a finite-dimensional vector space, such as $\mathbf{R}^n$.
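The failure of Rolle’s theorem for complex-valued functions can be checked numerically. The sketch below assumes the standard example $F(x) = e^{2\pi i x} - 1$ on $[0,1]$ and uses its exact derivative $2\pi i e^{2\pi i x}$; the variable names are illustrative only.

```python
import cmath
import math

def F(x):
    """F(x) = exp(2*pi*i*x) - 1 on [0, 1]."""
    return cmath.exp(2j * math.pi * x) - 1

def F_prime(x):
    """Exact derivative: 2*pi*i*exp(2*pi*i*x)."""
    return 2j * math.pi * cmath.exp(2j * math.pi * x)

samples = [k / 1000 for k in range(1001)]
endpoint_values = (abs(F(0.0)), abs(F(1.0)))        # both vanish up to rounding
min_speed = min(abs(F_prime(x)) for x in samples)   # |F'| is constant, equal to 2*pi

# Real and imaginary parts of F' each vanish somewhere, but at different points.
re_zero = abs(F_prime(0.0).real)     # Re F'(0)   = -2*pi*sin(0)    = 0
im_zero = abs(F_prime(0.25).imag)    # Im F'(1/4) =  2*pi*cos(pi/2) ~ 0
```

Since $|F'(x)| = 2\pi$ for every $x$, the derivative traces out a circle around the origin rather than passing through it, which is exactly the obstruction described in the remark.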
One can easily amplify Rolle’s theorem to the mean value theorem:
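Corollary 8 (Mean value theorem) Let $[a,b]$ be a compact interval of positive length, and let $F: [a,b] \to \mathbf{R}$ be a differentiable function. Then there exists $x \in (a,b)$ such that $F'(x) = \frac{F(b) - F(a)}{b - a}$.
Proof: Apply Rolle’s theorem to the function $x \mapsto F(x) - \frac{F(b) - F(a)}{b - a} (x - a)$.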
Remark 9 As Rolle’s theorem is only applicable to real scalar-valued functions, the more general mean value theorem is also only applicable to such functions.
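Exercise 10 (Uniqueness of antiderivatives up to constants) Let $[a,b]$ be a compact interval of positive length, and let $F: [a,b] \to \mathbf{R}$ and $G: [a,b] \to \mathbf{R}$ be differentiable functions. Show that $F'(x) = G'(x)$ for every $x \in [a,b]$ if and only if $F(x) = G(x) + C$ for some constant $C \in \mathbf{R}$ and all $x \in [a,b]$.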
We can use the mean value theorem to deduce one of the fundamental theorems of calculus:
Theorem 11 (Second fundamental theorem of calculus) Let $F: [a,b] \to \mathbf{R}$ be a differentiable function, such that $F'$ is Riemann integrable. Then the Riemann integral $\int_a^b F'(x)\ dx$ of $F'$ is equal to $F(b) - F(a)$. In particular, we have $\int_a^b F'(x)\ dx = F(b) - F(a)$ whenever $F$ is continuously differentiable.
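Proof: Let $\epsilon > 0$. By the definition of Riemann integrability, there exists a finite partition $a = t_0 < t_1 < \ldots < t_k = b$ such that
$$\left| \sum_{j=1}^k F'(t_j^*) (t_j - t_{j-1}) - \int_a^b F'(x)\ dx \right| \leq \epsilon$$
for every choice of $t_j^* \in [t_{j-1}, t_j]$.
Fix this partition. From the mean value theorem, for each $1 \leq j \leq k$ one can find $t_j^* \in [t_{j-1}, t_j]$ such that
$$F'(t_j^*) (t_j - t_{j-1}) = F(t_j) - F(t_{j-1})$$
and thus by telescoping series
$$\left| (F(b) - F(a)) - \int_a^b F'(x)\ dx \right| \leq \epsilon.$$
Since $\epsilon > 0$ was arbitrary, the claim follows.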
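As a quick numerical sanity check of the second fundamental theorem of calculus, the following Python sketch compares Riemann sums of a derivative with the difference of the antiderivative at the endpoints. It is an illustration under assumptions of my own: the test pair $F = \sin$, $F' = \cos$ and the helper name `riemann_sum` are hypothetical, and the tags are taken at midpoints rather than at the (non-explicit) points supplied by the mean value theorem in the proof.

```python
import math

def riemann_sum(deriv, a, b, k):
    """Riemann sum of `deriv` over k equal subintervals of [a, b], tagged at midpoints."""
    width = (b - a) / k
    return sum(deriv(a + (j + 0.5) * width) * width for j in range(k))

F, F_prime = math.sin, math.cos        # hypothetical test pair with F' = cos continuous
a, b = 0.0, 2.0
exact = F(b) - F(a)
errors = {k: abs(riemann_sum(F_prime, a, b, k) - exact) for k in (10, 100, 1000)}
```

With the tags chosen as in the proof, the Riemann sum would equal $F(b) - F(a)$ exactly for every partition; with midpoint tags the error is merely small and shrinks as the partition is refined.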
Remark 12 Even though the mean value theorem only holds for real scalar functions, the fundamental theorem of calculus holds for complex or vector-valued functions, as one can simply apply that theorem to each component of that function separately.
Of course, we also have the other half of the fundamental theorem of calculus:
Theorem 13 (First fundamental theorem of calculus) Let $[a,b]$ be a compact interval of positive length. Let $f: [a,b] \to \mathbf{C}$ be a continuous function, and let $F: [a,b] \to \mathbf{C}$ be the indefinite integral $F(x) := \int_a^x f(t)\ dt$. Then $F$ is differentiable on $[a,b]$, with derivative $F'(x) = f(x)$ for all $x \in [a,b]$. In particular, $F$ is continuously differentiable.
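Proof: It suffices to show that
$$\lim_{h \to 0^+} \frac{F(x+h) - F(x)}{h} = f(x)$$
for all $x \in [a,b)$, and
$$\lim_{h \to 0^-} \frac{F(x+h) - F(x)}{h} = f(x)$$
for all $x \in (a,b]$. After a change of variables, we can write
$$\frac{F(x+h) - F(x)}{h} = \int_0^1 f(x + ht)\ dt$$
for any $x \in [a,b)$ and any sufficiently small $h > 0$, or any $x \in (a,b]$ and any sufficiently small $h < 0$. As $f$ is continuous, the function $t \mapsto f(x+ht)$ converges uniformly to $f(x)$ on $[0,1]$ as $h \to 0$ (keeping $x$ fixed). As the interval $[0,1]$ is bounded, $\int_0^1 f(x+ht)\ dt$ thus converges to $\int_0^1 f(x)\ dt = f(x)$, and the claim follows.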
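Corollary 14 (Differentiation theorem for continuous functions) Let $f: [a,b] \to \mathbf{C}$ be a continuous function on a compact interval. Then we have
$$\lim_{h \to 0^+} \frac{1}{h} \int_{[x,x+h]} f(t)\ dt = f(x)$$
for all $x \in [a,b)$,
$$\lim_{h \to 0^+} \frac{1}{h} \int_{[x-h,x]} f(t)\ dt = f(x)$$
for all $x \in (a,b]$, and thus
$$\lim_{h \to 0^+} \frac{1}{2h} \int_{[x-h,x+h]} f(t)\ dt = f(x)$$
for all $x \in (a,b)$.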
In these notes we explore the question of the extent to which these theorems continue to hold when the differentiability or integrability conditions on the various functions are relaxed. Among the results proven in these notes are
- The Lebesgue differentiation theorem, which roughly speaking asserts that Corollary 14 continues to hold for almost every $x$ if $f$ is merely absolutely integrable, rather than continuous;
- A number of differentiation theorems, which assert for instance that monotone, Lipschitz, or bounded variation functions in one dimension are almost everywhere differentiable; and
- The second fundamental theorem of calculus for absolutely continuous functions.
The material here is loosely based on Chapter 3 of Stein-Shakarchi.
This post is a sequel of sorts to my earlier post on hard and soft analysis, and the finite convergence principle. Here, I want to discuss a well-known theorem in infinitary soft analysis – the Lebesgue differentiation theorem – and whether there is any meaningful finitary version of this result. Along the way, it turns out that we will uncover a simple analogue of the Szemerédi regularity lemma, for subsets of the interval rather than for graphs. (Actually, regularity lemmas seem to appear in just about any context in which fine-scaled objects can be approximated by coarse-scaled ones.) The connection between regularity lemmas and results such as the Lebesgue differentiation theorem was recently highlighted by Elek and Szegedy, while the connection between the finite convergence principle and results such as the pointwise ergodic theorem (which is a close cousin of the Lebesgue differentiation theorem) was recently detailed by Avigad, Gerhardy, and Towsner.
The Lebesgue differentiation theorem has many formulations, but we will avoid the strongest versions and just stick to the following model case for simplicity:
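Lebesgue differentiation theorem. If $f: [0,1] \to [0,1]$ is Lebesgue measurable, then for almost every $x \in [0,1]$ we have $f(x) = \lim_{r \to 0} \frac{1}{r} \int_x^{x+r} f(y)\ dy$. Equivalently, the fundamental theorem of calculus $f(x) = \frac{d}{dy} \int_0^y f(z)\ dz \big|_{y=x}$ is true for almost every x in [0,1].
Here we use the oriented definite integral, thus $\int_x^y = -\int_y^x$. Specialising to the case where $f = 1_A$ is an indicator function, we obtain the Lebesgue density theorem as a corollary:
Lebesgue density theorem. Let $A \subset [0,1]$ be Lebesgue measurable. Then for almost every $x \in A$, we have $\frac{|A \cap [x-r,x+r]|}{2r} \to 1$ as $r \to 0^+$, where |A| denotes the Lebesgue measure of A.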
In other words, almost all the points x of A are points of density of A, which roughly speaking means that as one passes to finer and finer scales, the immediate vicinity of x becomes increasingly saturated with A. (Points of density are like robust versions of interior points, thus the Lebesgue density theorem is an assertion that measurable sets are almost like open sets. This is Littlewood’s first principle.) One can also deduce the Lebesgue differentiation theorem back from the Lebesgue density theorem by approximating f by a finite linear combination of indicator functions; we leave this as an exercise.
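For a concrete feel for points of density, here is a small Python sketch that computes the relative measure of a set inside shrinking symmetric windows. It assumes, purely for illustration, that the set is a finite union of disjoint closed intervals (so the measure can be computed exactly); the helper name `density` and the sample set are hypothetical.

```python
def density(intervals, x, r):
    """|A ∩ [x-r, x+r]| / (2r) for A a union of disjoint closed intervals."""
    overlap = 0.0
    for lo, hi in intervals:
        overlap += max(0.0, min(hi, x + r) - max(lo, x - r))
    return overlap / (2 * r)

A = [(0.2, 0.5), (0.7, 0.75)]
radii = [10.0 ** -k for k in range(1, 6)]
interior = [density(A, 0.3, r) for r in radii]    # x inside A: tends to 1
boundary = [density(A, 0.5, r) for r in radii]    # x an endpoint: tends to 1/2
outside = [density(A, 0.6, r) for r in radii]     # x outside A: tends to 0
```

The endpoints of the intervals are not points of density, but they form a set of measure zero, which is consistent with the theorem only asserting the limit for almost every point of the set.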