There are a number of ways to construct the real numbers $\mathbb{R}$, for instance
- as the metric completion of $\mathbb{Q}$ (thus, $\mathbb{R}$ is defined as the set of Cauchy sequences of rationals, modulo Cauchy equivalence);
- as the space of Dedekind cuts on the rationals $\mathbb{Q}$;
- as the space of quasimorphisms $\phi: \mathbb{Z} \to \mathbb{Z}$ on the integers, quotiented by bounded functions. (I believe this construction first appears in this paper of Street, who credits the idea to Schanuel, though the germ of this construction arguably goes all the way back to Eudoxus.)
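The quasimorphism construction in the last item is easy to experiment with. The Python sketch below is only an illustration under our own choices (the function names `phi`, `defect` and `slope` are hypothetical and not taken from Street's paper): it encodes $\sqrt{2}$ by the quasimorphism $\phi(n) := \lfloor n\sqrt{2} \rfloor$, computed exactly with integer square roots, checks that the defect $\phi(m+n) - \phi(m) - \phi(n)$ stays bounded, and reads off the real number as the slope $\lim_{n \to \infty} \phi(n)/n$.

```python
from fractions import Fraction
from math import isqrt


def phi(n: int) -> int:
    """Hypothetical example quasimorphism: n -> floor(n * sqrt(2)), computed exactly."""
    if n >= 0:
        return isqrt(2 * n * n)
    # For n < 0 the number n*sqrt(2) is irrational, so its floor is -(floor(|n| sqrt 2) + 1).
    return -isqrt(2 * n * n) - 1


def defect(f, bound: int) -> int:
    """Largest |f(m+n) - f(m) - f(n)| over -bound <= m, n <= bound."""
    return max(abs(f(m + n) - f(m) - f(n))
               for m in range(-bound, bound + 1)
               for n in range(-bound, bound + 1))


def slope(f, n: int) -> Fraction:
    """Rational approximation f(n)/n to the real number that f represents."""
    return Fraction(f(n), n)


print(defect(phi, 50))              # 1, so phi is a quasimorphism
print(float(slope(phi, 10**12)))    # agrees with sqrt(2) to about 12 digits
# Adding a bounded function does not change the real number represented:
print(float(slope(lambda n: phi(n) + 7, 10**12)))
```

Changing $\phi$ by a bounded function (here, adding $7$) leaves the slope unchanged, which is why the construction quotients by bounded functions.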
There is also a fourth family of constructions that proceeds via nonstandard analysis, as a special case of what is known as the nonstandard hull construction. (Here I will assume some basic familiarity with nonstandard analysis and ultraproducts, as covered for instance in this previous blog post.) Given an unbounded nonstandard natural number $N \in {}^*\mathbb{N} \backslash \mathbb{N}$, one can define two external additive subgroups of the nonstandard integers ${}^*\mathbb{Z}$:
- The group $O(N)$ of all nonstandard integers of magnitude less than or comparable to $N$; and
- The group $o(N)$ of nonstandard integers of magnitude infinitesimally smaller than $N$.
The group $o(N)$ is a subgroup of $O(N)$, so we may form the quotient group $O(N)/o(N)$. This space is isomorphic to the reals $\mathbb{R}$, and can in fact be used to construct the reals:
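Proposition 1 For any coset $\alpha$ of $O(N)/o(N)$, there is a unique real number $\hbox{st} \frac{\alpha}{N}$ with the property that $\frac{n}{N} = \hbox{st} \frac{\alpha}{N} + o(1)$ for every representative $n \in \alpha$. The map $\alpha \mapsto \hbox{st} \frac{\alpha}{N}$ is then an isomorphism between the additive groups $O(N)/o(N)$ and $\mathbb{R}$.
Proof: Uniqueness is clear. For existence, observe that for any representative $n \in \alpha$, the set $\{ q \in \mathbb{Q}: qN < n \}$ determines a Dedekind cut, and its supremum can be verified to have the required properties for $\hbox{st} \frac{\alpha}{N}$.
In a similar vein, we can view the unit interval $[0,1]$ in the reals as the quotient
$$[0,1] \equiv [N]/o(N) \qquad (1)$$
where $[N]$ is the nonstandard (i.e. internal) set $\{ n \in {}^*\mathbb{N}: n \leq N \}$; of course, $[N]$ is not a group, so one should interpret $[N]/o(N)$ as the image of $[N]$ under the quotient map ${}^*\mathbb{Z} \to {}^*\mathbb{Z}/o(N)$ (or $O(N) \to O(N)/o(N)$, if one prefers). Or to put it another way, (1) asserts that $[0,1]$ is the image of $[N]$ with respect to the map $\pi: n \mapsto \hbox{st} \frac{n}{N}$.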
In this post I would like to record a nice measure-theoretic version of the equivalence (1), which essentially appears already in standard texts on Loeb measure (see e.g. this text of Cutland). To describe the results, we must first quickly recall the construction of Loeb measure on $[N]$. Given an internal subset $A$ of $[N]$, we may define the elementary measure $\mu_0(A)$ of $A$ by the formula
$$\mu_0(A) := \hbox{st} \frac{|A|}{N}.$$
This is a finitely additive probability measure on the Boolean algebra of internal subsets of $[N]$. We can then construct the Loeb outer measure $\mu^*(A)$ of any subset $A \subset [N]$ in complete analogy with Lebesgue outer measure by the formula
$$\mu^*(A) := \inf \sum_{n=1}^\infty \mu_0(A_n)$$
where $(A_n)_{n=1}^\infty$ ranges over all sequences of internal subsets of $[N]$ that cover $A$. We say that a subset $A$ of $[N]$ is Loeb measurable if, for any (standard) $\varepsilon > 0$, one can find an internal subset $B$ of $[N]$ which differs from $A$ by a set of Loeb outer measure at most $\varepsilon$, and in that case we define the Loeb measure $\mu(A)$ of $A$ to be $\mu^*(A)$. It is a routine matter to show (e.g. using the Carathéodory extension theorem) that the space ${\mathcal L}$ of Loeb measurable sets is a $\sigma$-algebra, and that $\mu$ is a countably additive probability measure on this space that extends the elementary measure $\mu_0$. Thus $[N]$ now has the structure of a probability space $([N], {\mathcal L}, \mu)$.
Now, the group $o(N)$ acts (Loeb-almost everywhere) on the probability space $[N]$ by the addition map, thus $T^h n := n + h$ for $n \in [N]$ and $h \in o(N)$ (excluding a set of Loeb measure zero where $n + h$ exits $[N]$). This action is clearly seen to be measure-preserving. As such, we can form the invariant factor $Z^0_{o(N)}([N])$, defined by restricting attention to those Loeb measurable sets $A \subset [N]$ with the property that $T^h A$ is equal $\mu$-almost everywhere to $A$ for each $h \in o(N)$.
The claim is then that this invariant factor is equivalent (up to almost everywhere equivalence) to the unit interval $[0,1]$ with Lebesgue measure $m$ (and the trivial action of $o(N)$), by the same factor map $\pi: n \mapsto \hbox{st} \frac{n}{N}$ used in (1). More precisely:
Theorem 2 Given a set $A \in Z^0_{o(N)}([N])$, there exists a Lebesgue measurable set $B \subset [0,1]$, unique up to $m$-a.e. equivalence, such that $A$ is $\mu$-a.e. equivalent to the set $\pi^{-1}(B) = \{ n \in [N]: \hbox{st} \frac{n}{N} \in B \}$. Conversely, if $B \subset [0,1]$ is Lebesgue measurable, then $\pi^{-1}(B)$ is in $Z^0_{o(N)}([N])$, and $\mu(\pi^{-1}(B)) = m(B)$.
More informally, we have the measure-theoretic version
$$[0,1] \equiv Z^0_{o(N)}([N])$$
of (1).
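The identity $\mu(\pi^{-1}(B)) = m(B)$ for an elementary set $B$, which is the first step of the proof below, has a simple finitary shadow. In the Python sketch below, a large standard $N$ stands in for the unbounded $N$ (an assumption of the illustration; the helper names are hypothetical): the proportion of $n \in \{1,\ldots,N\}$ with $n/N \in B$ differs from the total length of $B$ by at most $k/N$ when $B$ is a union of $k$ disjoint closed intervals. In the nonstandard setting this error is infinitesimal and disappears on taking standard parts.

```python
from fractions import Fraction as Q


def counting_measure(intervals, N):
    """Exact value of |{1 <= n <= N : n/N lies in one of the closed intervals}| / N."""
    hits = sum(1 for n in range(1, N + 1)
               if any(a <= Q(n, N) <= b for a, b in intervals))
    return Q(hits, N)


def lebesgue_measure(intervals):
    """Total length of a finite union of pairwise disjoint closed intervals."""
    return sum((b - a for a, b in intervals), Q(0))


# A hypothetical elementary set B = [0, 1/3] u [1/2, 3/4] inside [0, 1].
B = [(Q(0), Q(1, 3)), (Q(1, 2), Q(3, 4))]
for N in (10, 100, 1000):
    err = abs(counting_measure(B, N) - lebesgue_measure(B))
    print(N, err, err <= Q(len(B), N))   # the error is at most (number of intervals) / N
```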
Proof: We first prove the converse. It is clear that $\pi^{-1}(B)$ is $o(N)$-invariant, so it suffices to show that $\pi^{-1}(B)$ is Loeb measurable with Loeb measure $m(B)$. This is easily verified when $B$ is an elementary set (a finite union of intervals). By countable subadditivity of outer measure, this implies that the Loeb outer measure of $\pi^{-1}(E)$ is bounded by the Lebesgue outer measure of $E$ for any set $E \subset [0,1]$; since every Lebesgue measurable set differs from an elementary set by a set of arbitrarily small Lebesgue outer measure, the claim follows.
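Now we establish the forward claim. Uniqueness is clear from the converse claim, so it suffices to show existence. Let $A \in Z^0_{o(N)}([N])$. Let $\varepsilon > 0$ be an arbitrary standard real number; then we can find an internal set $A_\varepsilon \subset [N]$ which differs from $A$ by a set of Loeb measure at most $\varepsilon$. As $A$ is $o(N)$-invariant, we conclude that for every $h \in o(N)$, $A_\varepsilon$ and $A_\varepsilon + h$ differ by a set of Loeb measure (and hence elementary measure) at most $2\varepsilon$. By the (contrapositive of the) underspill principle, there must exist a standard $\delta > 0$ such that $A_\varepsilon$ and $A_\varepsilon + h$ differ by a set of elementary measure at most $2\varepsilon$ for all $|h| \leq \delta N$; shrinking $\delta$ if necessary, we may also assume that $\delta \leq \varepsilon$. If we then define the nonstandard function $f_\varepsilon: [N] \to {}^*\mathbb{R}$ by the formula
$$f_\varepsilon(n) := \frac{1}{\delta N} \sum_{m \in [N]: n \leq m \leq n + \delta N} 1_{A_\varepsilon}(m),$$
then from the (nonstandard) triangle inequality we have
$$\frac{1}{N} \sum_{n \in [N]} |f_\varepsilon(n) - 1_{A_\varepsilon}(n)| \leq 4\varepsilon$$
(say). On the other hand, $f_\varepsilon$ has the Lipschitz continuity property
$$|f_\varepsilon(n) - f_\varepsilon(m)| \leq \frac{|n-m|}{\delta N}$$
and so in particular we see that
$$\hbox{st} f_\varepsilon(n) = \tilde f_\varepsilon\left(\hbox{st} \frac{n}{N}\right) \hbox{ for all } n \in [N]$$
for some Lipschitz continuous function $\tilde f_\varepsilon: [0,1] \to [0,1]$. If we then let $E_\varepsilon$ be the set where $\tilde f_\varepsilon \geq 1/2$, one can check (using Markov's inequality) that $A_\varepsilon$ differs from $\pi^{-1}(E_\varepsilon)$ by a set of Loeb outer measure $O(\varepsilon)$, and hence $A$ does so also. Sending $\varepsilon$ to zero, we see (from the converse claim) that $1_{E_\varepsilon}$ is a Cauchy sequence in $L^1([0,1])$ and thus converges in $L^1([0,1])$ to $1_E$ for some Lebesgue measurable $E$. The sets $A_\varepsilon$ then converge in Loeb outer measure to $\pi^{-1}(E)$, giving the claim.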
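To get a feel for the averaged functions in this proof, here is a finitary caricature in Python, with a large standard $N$ in place of the unbounded one and a window length $L$ playing the role of $\delta N$ (both choices, the noisy test set, and the name `window_density` are ours). It computes $f(n) = \frac{1}{L} \#\{m \in A: n \leq m \leq \min(n+L, N)\}$ with prefix sums, checks the Lipschitz bound $|f(n) - f(m)| \leq |n-m|/L$ on consecutive points, and thresholds at $1/2$ to obtain a coarse-scale set approximating a set $A$ that is nearly invariant under small shifts.

```python
import random


def window_density(A, N, L):
    """Return [f(1), ..., f(N)] with f(n) = #{m in A : n <= m <= min(n + L, N)} / L."""
    prefix = [0] * (N + 2)            # prefix[k] = #{m in A : m < k}
    for k in range(1, N + 1):
        prefix[k + 1] = prefix[k] + (k in A)
    return [(prefix[min(n + L, N) + 1] - prefix[n]) / L for n in range(1, N + 1)]


rng = random.Random(0)
N, L = 10_000, 100                    # L plays the role of delta * N, with delta = 0.01
# A coarse set ([0.2, 0.6) at scale N) with about 1% of memberships flipped at random.
A = {n for n in range(1, N + 1) if (0.2 <= n / N < 0.6) != (rng.random() < 0.01)}
f = window_density(A, N, L)

lip = max(abs(f[i + 1] - f[i]) for i in range(N - 1))
coarse = {n for n in range(1, N + 1) if f[n - 1] >= 0.5}   # threshold at 1/2
print(lip <= 1 / L + 1e-12)           # Lipschitz bound |f(n) - f(m)| <= |n - m| / L
print(len(A ^ coarse) / N)            # small: the thresholded set approximates A
```

Checking consecutive points suffices, since the general Lipschitz bound then follows from the triangle inequality.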
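Thanks to the Lebesgue differentiation theorem, the conditional expectation $\mathbf{E}(f | Z^0_{o(N)}([N]))$ of a bounded Loeb-measurable function $f: [N] \to \mathbb{R}$ can be expressed (as a function on $[0,1]$, defined $m$-a.e.) as
$$\mathbf{E}(f | Z^0_{o(N)}([N]))(x) = \lim_{r \to 0^+} \frac{1}{2r} \int_{\pi^{-1}([x-r,x+r])} f\ d\mu.$$
By the abstract ergodic theorem from the previous post, one can also view this conditional expectation as the element in the closed convex hull of the shifts $T^h f$, $h \in o(N)$, of minimal $L^2$ norm. In particular, we obtain a form of the von Neumann ergodic theorem in this context: the averages $\frac{1}{H} \sum_{h=1}^H T^h f$ for $H \in o(N)$ converge (as a net, rather than a sequence) in $L^2$ to $\mathbf{E}(f | Z^0_{o(N)}([N]))$.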
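The von Neumann statement can be mimicked numerically, though only as a heuristic finitary model and not as a proof. In the Python sketch below (the profile `g`, the sizes $N = 200000$ and $H$, and all helper names are our own assumptions), $f$ is the indicator of a random set whose density near $n$ is $g(n/N)$, so that $g$ plays the role of the conditional expectation onto the invariant factor. Shift averages $\frac{1}{H}\sum_{h=1}^H f(n+h)$ with $1 \ll H \ll N$ track $g(n/N)$, whereas for bounded $H$ they remain dominated by noise.

```python
import math
import random
from itertools import accumulate


def shift_average(f, H):
    """[(1/H) * sum_{h=1}^{H} f[n + h] for n in range(len(f) - H)], via prefix sums."""
    s = [0, *accumulate(f)]           # s[k] = f[0] + ... + f[k - 1]
    return [(s[n + H + 1] - s[n + 1]) / H for n in range(len(f) - H)]


def g(x):
    """Hypothetical smooth density profile on [0, 1], standing in for the invariant part."""
    return 0.5 + 0.4 * math.sin(2 * math.pi * x)


rng = random.Random(1)
N = 200_000
f = [1 if rng.random() < g(n / N) else 0 for n in range(N)]   # noisy at unit scale


def mean_error(H):
    """Average of |(shift average at n) - g(n/N)| over the admissible n."""
    avg = shift_average(f, H)
    return sum(abs(a - g(n / N)) for n, a in enumerate(avg)) / len(avg)


print(mean_error(10))      # bounded H: the averages are still dominated by noise
print(mean_error(2_000))   # 1 << H << N: the averages track g(n/N) closely
```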
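If $f$ is (the standard part of) an internal function, that is to say the ultralimit of a sequence $f_n: [N_n] \to \mathbb{R}$ of finitary bounded functions (with $N$ the ultralimit of the $N_n$), one can view the measurable function $F := \mathbf{E}(f | Z^0_{o(N)}([N]))$ as a limit of the $f_n$ that is analogous to the “graphons” that emerge as limits of graphs (see e.g. the recent text of Lovasz on graph limits). Indeed, the measurable function $F: [0,1] \to \mathbb{R}$ is related to the discrete functions $f_n$ by the formula
$$\int_a^b F(x)\ dx = \hbox{st} \lim_{n \to p} \frac{1}{N_n} \sum_{a N_n \leq m \leq b N_n} f_n(m)$$
for all $0 \leq a < b \leq 1$, where $p$ is the nonprincipal ultrafilter used to define the nonstandard universe. In particular, from the Arzela-Ascoli diagonalisation argument there is a subsequence $n_j$ such that
$$\int_a^b F(x)\ dx = \lim_{j \to \infty} \frac{1}{N_{n_j}} \sum_{a N_{n_j} \leq m \leq b N_{n_j}} f_{n_j}(m)$$
for all $0 \leq a < b \leq 1$; thus $F$ is the asymptotic density function of the $f_n$. For instance, if $f_n$ is the indicator function of a randomly chosen subset of $[N_n]$, then the asymptotic density function would equal $1/2$ (almost everywhere, at least).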
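The final example can be checked numerically. The Python sketch below (the helper name and sizes are our own, and a seeded pseudorandom generator stands in for a random subset) computes $\frac{1}{N}\sum_{aN \leq m \leq bN} f_N(m)$ for the indicator $f_N$ of a random subset of $\{1,\ldots,N\}$; as $N$ grows this approaches $\int_a^b \frac{1}{2}\ dx = \frac{b-a}{2}$, consistent with the density function being $1/2$.

```python
import math
import random


def interval_average(f, N, a, b):
    """(1/N) * sum of f[m] over the integers m with aN <= m <= bN and 1 <= m <= N."""
    lo, hi = max(1, math.ceil(a * N)), min(N, math.floor(b * N))
    return sum(f[m] for m in range(lo, hi + 1)) / N


rng = random.Random(2)
for N in (1_000, 10_000, 100_000):
    # f[1..N] is the indicator of a uniformly random subset of {1, ..., N}; f[0] is unused.
    f = [0] + [rng.randint(0, 1) for _ in range(N)]
    print(N, interval_average(f, N, 0.25, 0.75))   # approaches (0.75 - 0.25) / 2 = 0.25
```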
I’m continuing to look into understanding the ergodic theory of $o(N)$ actions, as I believe this may allow one to apply ergodic theory methods to the “single-scale” or “non-asymptotic” setting (in which one averages only over scales comparable to a large parameter $N$, rather than the traditional asymptotic approach of letting the scale go to infinity). I’m planning some further posts in this direction, though this is still a work in progress.
Hans Lindblad and I have just uploaded to the arXiv our joint paper “Asymptotic decay for a one-dimensional nonlinear wave equation”, submitted to Analysis & PDE. This paper, to our knowledge, is the first paper to analyse the asymptotic behaviour of the one-dimensional defocusing nonlinear wave equation
$$-u_{tt} + u_{xx} = |u|^{p-1} u \qquad (1)$$
where $u: \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ is the solution and $p > 1$ is a fixed exponent. Nowadays, this type of equation is considered a very simple example of a non-linear wave equation (there is only one spatial dimension, the equation is semilinear, the conserved energy is positive definite and coercive, and there are no derivatives in the nonlinear term), and indeed it is not difficult to show that any solution whose conserved energy
$$E(u) := \int_{\mathbb{R}} \frac{1}{2} u_t^2 + \frac{1}{2} u_x^2 + \frac{1}{p+1} |u|^{p+1}\ dx$$
is finite will exist globally for all time (and remain finite energy, of course). In particular, from the one-dimensional Gagliardo-Nirenberg inequality (a variant of the Sobolev embedding theorem), such solutions will remain uniformly bounded in $L^\infty_x(\mathbb{R})$ for all time.
However, this leaves open the question of the asymptotic behaviour of such solutions in the limit as $t \to \infty$. In higher dimensions, there are a variety of scattering and asymptotic completeness results which show that solutions to nonlinear wave equations such as (1) decay asymptotically in various senses, at least if one is in the perturbative regime in which the solution is assumed small in some sense (e.g. small energy). For instance, a typical result might be that spatial norms such as $\|u(t)\|_{L^\infty_x}$ might go to zero (in an average sense, at least). In general, such results for nonlinear wave equations are ultimately based on the fact that the linear wave equation in higher dimensions also enjoys an analogous decay as $t \to \infty$, as linear waves in higher dimensions spread out and disperse over time. (This can be formalised by decay estimates on the fundamental solution of the linear wave equation, or by basic estimates such as the (long-time) Strichartz estimates and their relatives.) The idea is then to view the nonlinear wave equation as a perturbation of the linear one.
On the other hand, the solution to the linear one-dimensional wave equation
$$-u_{tt} + u_{xx} = 0 \qquad (2)$$
does not exhibit any decay in time; as one learns in an undergraduate PDE class, the general (finite energy) solution to such an equation is given by the superposition of two travelling waves,
$$u(t,x) = f(x+t) + g(x-t) \qquad (3)$$
where $f$ and $g$ also have finite energy, so in particular norms such as $\|u(t)\|_{L^\infty_x(\mathbb{R})}$ cannot decay to zero as $t \to \infty$ unless the solution is completely trivial.
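The absence of decay for (3) is easy to see numerically. In this Python sketch we choose, as an assumption of the illustration, the same smooth bump $\mathrm{bump}(x) = e^{-1/(1-x^2)}$ (supported on $[-1,1]$) for both profiles $f$ and $g$, and approximate $\sup_x |u(t,x)|$ on a grid: once $|t| \geq 2$ the two travelling bumps have disjoint supports, and the supremum settles at $e^{-1}$ instead of tending to zero.

```python
import math


def bump(x):
    """Hypothetical smooth profile supported on [-1, 1], with maximum exp(-1) at x = 0."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0


def u(t, x, f=bump, g=bump):
    """Travelling-wave solution u(t, x) = f(x + t) + g(x - t) of -u_tt + u_xx = 0."""
    return f(x + t) + g(x - t)


def sup_norm(t, width=0.01):
    """Grid approximation to sup_x |u(t, x)|; u(t, .) is supported in |x| <= |t| + 1."""
    R = abs(t) + 2
    steps = int(2 * R / width)
    return max(abs(u(t, -R + k * width)) for k in range(steps + 1))


for t in (0, 10, 100):
    print(t, sup_norm(t))   # about 2/e at t = 0, then about 1/e for every t >= 2
```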
Nevertheless, we were able to establish a nonlinear decay effect for equation (1), caused more by the nonlinear right-hand side of (1) than by the linear left-hand side, to obtain decay on the average:
Theorem 1. (Average decay) If $u$ is a finite energy solution to (1), then $\frac{1}{2T} \int_{-T}^{T} \|u(t)\|_{L^\infty_x(\mathbb{R})}\ dt$ tends to zero as $T \to \infty$.
Actually we prove a slightly stronger statement than Theorem 1, in that the decay is uniform among all solutions with a given energy bound, but I will stick to the above formulation of the main result for simplicity.
Informally, the reason for the nonlinear decay is as follows. The linear evolution tries to force waves to move at constant velocity (indeed, from (3) we see that linear waves move at the speed of light $c = 1$). But the defocusing nature of the nonlinearity will spread out any wave that is propagating along a constant velocity worldline. This intuition can be formalised by a Morawetz-type energy estimate that shows that the nonlinear potential energy must decay along any rectangular slab of spacetime (that represents the neighbourhood of a constant velocity worldline).
Now, just because the linear wave equation propagates along constant velocity worldlines, this does not mean that the nonlinear wave equation does too; one could imagine that a wave packet could propagate along a more complicated trajectory in which the velocity is not constant. However, energy methods still force the solution of the nonlinear wave equation to obey finite speed of propagation, which in the wave packet context means (roughly speaking) that the nonlinear trajectory is a Lipschitz continuous function (with Lipschitz constant at most $1$).
And now we deploy a trick which appears to be new to the field of nonlinear wave equations: we invoke the Rademacher differentiation theorem (or Lebesgue differentiation theorem), which asserts that Lipschitz continuous functions are almost everywhere differentiable. (By coincidence, I am teaching this theorem in my current course, both in one dimension (which is the case of interest here) and in higher dimensions.) A compactness argument allows one to extract a quantitative estimate from this theorem (cf. this earlier blog post of mine) which, roughly speaking, tells us that there are large portions of the trajectory which behave approximately linearly at an appropriate scale. This turns out to be a good enough control on the trajectory that one can apply the Morawetz inequality and rule out the existence of persistent wave packets over long periods of time, which is what leads to Theorem 1.
There is still scope for further work to be done on the asymptotics. In particular, we still do not have a good understanding of what the asymptotic profile of the solution should be, even in the perturbative regime; standard nonlinear geometric optics methods do not appear to work very well due to the extremely weak decay.
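Let $[a,b]$ be a compact interval of positive length (thus $-\infty < a < b < +\infty$). Recall that a function $F: [a,b] \to \mathbb{R}$ is said to be differentiable at a point $x \in [a,b]$ if the limit
$$F'(x) := \lim_{y \to x; y \in [a,b] \backslash \{x\}} \frac{F(y) - F(x)}{y - x} \qquad (1)$$
exists. In that case, we call $F'(x)$ the strong derivative, classical derivative, or just derivative for short, of $F$ at $x$. We say that $F$ is everywhere differentiable, or differentiable for short, if it is differentiable at all points $x \in [a,b]$, and differentiable almost everywhere if it is differentiable at almost every point $x \in [a,b]$. If $F$ is differentiable everywhere and its derivative $F'$ is continuous, then we say that $F$ is continuously differentiable.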
Remark 1 Much later in this sequence, when we cover the theory of distributions, we will see the notion of a weak derivative or distributional derivative, which can be applied to a much rougher class of functions and is in many ways more suitable than the classical derivative for doing “Lebesgue” type analysis (i.e. analysis centred around the Lebesgue integral, and in particular allowing functions to be uncontrolled, infinite, or even undefined on sets of measure zero). However, for now we will stick with the classical approach to differentiation.
Exercise 2 If $F: [a,b] \to \mathbb{R}$ is everywhere differentiable, show that $F$ is continuous and $F'$ is measurable. If $F$ is almost everywhere differentiable, show that the (almost everywhere defined) function $F'$ is measurable (i.e. it is equal to an everywhere defined measurable function on $[a,b]$ outside of a null set), but give an example to demonstrate that $F$ need not be continuous.
Exercise 3 Give an example of a function $F: [a,b] \to \mathbb{R}$ which is everywhere differentiable, but not continuously differentiable. (Hint: choose an $F$ that vanishes quickly at some point, say at the origin $0$, but which also oscillates rapidly near that point.)
In single-variable calculus, the operations of integration and differentiation are connected by a number of basic theorems, starting with Rolle’s theorem.
Theorem 4 (Rolle’s theorem) Let $[a,b]$ be a compact interval of positive length, and let $F: [a,b] \to \mathbb{R}$ be a differentiable function such that $F(a) = F(b)$. Then there exists $x \in (a,b)$ such that $F'(x) = 0$.
Proof: By subtracting a constant from $F$ (which does not affect differentiability or the derivative) we may assume that $F(a) = F(b) = 0$. If $F$ is identically zero then the claim is trivial, so assume that $F$ is non-zero somewhere. By replacing $F$ with $-F$ if necessary, we may assume that $F$ is positive somewhere, thus $\sup_{x \in [a,b]} F(x) > 0$. On the other hand, as $F$ is continuous and $[a,b]$ is compact, $F$ must attain its maximum somewhere, thus there exists $x \in [a,b]$ such that $F(x) \geq F(y)$ for all $y \in [a,b]$. Then $F(x)$ must be positive and so $x$ cannot equal either $a$ or $b$, and thus must lie in the interior. From the right limit of (1) we see that $F'(x) \leq 0$, while from the left limit we have $F'(x) \geq 0$. Thus $F'(x) = 0$ and the claim follows.
Remark 5 Observe that the same proof also works if $F$ is only differentiable in the interior $(a,b)$ of the interval $[a,b]$, so long as it is continuous all the way up to the boundary of $[a,b]$.
Exercise 6 Give an example to show that Rolle’s theorem can fail if $F$ is merely assumed to be almost everywhere differentiable, even if one adds the additional hypothesis that $F$ is continuous. This example illustrates that everywhere differentiability is a significantly stronger property than almost everywhere differentiability. We will see further evidence of this fact later in these notes; there are many theorems that assert in their conclusion that a function is almost everywhere differentiable, but few that manage to conclude everywhere differentiability.
Remark 7 It is important to note that Rolle’s theorem only works in the real scalar case when $F$ is real-valued, as it relies heavily on the least upper bound property for the range $\mathbb{R}$. If, for instance, we consider complex-valued scalar functions $F: [a,b] \to \mathbb{C}$, then the theorem can fail; for instance, the function $F: [0,2\pi] \to \mathbb{C}$ defined by $F(x) := e^{ix} - 1$ vanishes at both endpoints and is differentiable, but its derivative $F'(x) = ie^{ix}$ is never zero. (Rolle’s theorem does imply that the real and imaginary parts of the derivative both vanish somewhere, but the problem is that they don’t simultaneously vanish at the same point.) Similar remarks apply to functions taking values in a finite-dimensional vector space, such as $\mathbb{R}^n$.
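The counterexample in Remark 7 can be checked directly. This short Python sketch (the grid size and tolerances are arbitrary choices of ours) confirms that $F(x) = e^{ix} - 1$ vanishes at $0$ and $2\pi$ up to rounding, that $|F'(x)| = 1$ on the grid, and that wherever $\mathrm{Re}\, F'(x) = -\sin x$ is nearly zero, $\mathrm{Im}\, F'(x) = \cos x$ is close to $\pm 1$.

```python
import cmath
import math


def F(x):
    """F(x) = e^{ix} - 1 on [0, 2*pi]: complex-valued, vanishing at both endpoints."""
    return cmath.exp(1j * x) - 1


def dF(x):
    """Its derivative F'(x) = i e^{ix}, which has modulus 1 everywhere."""
    return 1j * cmath.exp(1j * x)


grid = [2 * math.pi * k / 1000 for k in range(1001)]
print(abs(F(0.0)), abs(F(2 * math.pi)))        # both zero up to rounding
print(min(abs(dF(x)) for x in grid))           # 1 up to rounding: F' never vanishes
# Re F'(x) = -sin x vanishes near x = 0, pi, 2*pi, but there Im F'(x) = cos x is close to +-1.
re_zeros = [x for x in grid if abs(dF(x).real) < 1e-2]
print(all(abs(dF(x).imag) > 0.99 for x in re_zeros))   # True
```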
One can easily amplify Rolle’s theorem to the mean value theorem:
Corollary 8 (Mean value theorem) Let $[a,b]$ be a compact interval of positive length, and let $F: [a,b] \to \mathbb{R}$ be a differentiable function. Then there exists $x \in (a,b)$ such that $F'(x) = \frac{F(b) - F(a)}{b - a}$.
Proof: Apply Rolle’s theorem to the function $x \mapsto F(x) - \frac{F(b) - F(a)}{b - a} (x - a)$.
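The auxiliary function in this proof also suggests a simple way to locate a mean value point numerically. The Python sketch below (the name `mvt_point` is hypothetical) runs bisection on $G'(x) = F'(x) - \frac{F(b)-F(a)}{b-a}$; it assumes that $G'$ changes sign on $[a,b]$, which the theorem does not guarantee, so it is a heuristic rather than an algorithmic form of Corollary 8.

```python
def mvt_point(F, dF, a, b, tol=1e-12):
    """Locate x in (a, b) with F'(x) = (F(b) - F(a)) / (b - a) by bisection on G'.

    G(x) = F(x) - slope * (x - a) is the auxiliary function from the proof, so G(a) = G(b).
    Bisection needs G' = F' - slope to change sign on [a, b], which the mean value
    theorem does not guarantee, so this is only a sketch for well-behaved F.
    """
    slope = (F(b) - F(a)) / (b - a)

    def dG(x):
        return dF(x) - slope

    lo, hi = a, b
    if dG(lo) * dG(hi) > 0:
        raise ValueError("G' does not change sign on [a, b]; bisection is not applicable")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if dG(lo) * dG(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


# F(x) = x^3 on [0, 2]: the mean slope is 4, attained where 3x^2 = 4, i.e. x = 2/sqrt(3).
xi = mvt_point(lambda x: x**3, lambda x: 3 * x**2, 0.0, 2.0)
print(xi)
```

For instance, with $F(x) = x^3$ on $[-1,1]$ the sketch raises `ValueError`, even though the mean value points $x = \pm 1/\sqrt{3}$ exist, because $G'(x) = 3x^2 - 1$ is positive at both endpoints.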
Remark 9 As Rolle’s theorem is only applicable to real scalar-valued functions, the more general mean value theorem is also only applicable to such functions.
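Exercise 10 (Uniqueness of antiderivatives up to constants) Let $[a,b]$ be a compact interval of positive length, and let $F: [a,b] \to \mathbb{R}$ and $G: [a,b] \to \mathbb{R}$ be differentiable functions. Show that $F'(x) = G'(x)$ for every $x \in [a,b]$ if and only if $F(x) = G(x) + C$ for some constant $C \in \mathbb{R}$ and all $x \in [a,b]$.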
We can use the mean value theorem to deduce one of the fundamental theorems of calculus:
Theorem 11 (Second fundamental theorem of calculus) Let $F: [a,b] \to \mathbb{R}$ be a differentiable function, such that $F'$ is Riemann integrable. Then the Riemann integral $\int_a^b F'(x)\ dx$ of $F'$ is equal to $F(b) - F(a)$. In particular, we have $\int_a^b F'(x)\ dx = F(b) - F(a)$ whenever $F$ is continuously differentiable.
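Proof: Let $\varepsilon > 0$. By the definition of Riemann integrability, there exists a finite partition $a = t_0 < t_1 < \ldots < t_k = b$ such that
$$\left| \sum_{j=1}^k F'(t^*_j) (t_j - t_{j-1}) - \int_a^b F'(x)\ dx \right| \leq \varepsilon$$
for every choice of $t^*_j \in [t_{j-1}, t_j]$.
Fix this partition. From the mean value theorem, for each $1 \leq j \leq k$ one can find $t^*_j \in [t_{j-1}, t_j]$ such that
$$F'(t^*_j) (t_j - t_{j-1}) = F(t_j) - F(t_{j-1})$$
and thus by telescoping series
$$\left| (F(b) - F(a)) - \int_a^b F'(x)\ dx \right| \leq \varepsilon.$$
Since $\varepsilon > 0$ was arbitrary, the claim follows.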
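Both ingredients of this proof can be seen in a small Python experiment with $F = \sin$ on $[0,\pi/2]$ (our choice of example; the helper names are hypothetical). Riemann sums of $F' = \cos$ with left, midpoint, or right tags approach $F(\pi/2) - F(0) = 1$ as the mesh shrinks, and with the tags $t^*_j$ supplied by the mean value theorem (computable here in closed form via $\arccos$, since $\cos$ is injective on $[0,\pi/2]$) the sum telescopes to $1$ up to rounding.

```python
import math


def riemann_sum(f, a, b, k, tag=0.5):
    """Riemann sum of f over k equal pieces of [a, b]; tag in [0, 1] places t_j^* in each piece."""
    h = (b - a) / k
    return h * sum(f(a + (j + tag) * h) for j in range(k))


def mvt_riemann_sum(a, b, k):
    """Riemann sum of F' = cos with tags chosen by the mean value theorem for F = sin.

    On [t_{j-1}, t_j] inside [0, pi/2] the tag t_j^* = acos((sin t_j - sin t_{j-1}) / h)
    satisfies F'(t_j^*) h = F(t_j) - F(t_{j-1}), so the sum telescopes to sin b - sin a.
    """
    h = (b - a) / k
    ts = [a + j * h for j in range(k + 1)]
    tags = [math.acos((math.sin(ts[j + 1]) - math.sin(ts[j])) / h) for j in range(k)]
    return h * sum(math.cos(t) for t in tags), tags, ts


a, b = 0.0, math.pi / 2          # F = sin, F' = cos, and F(b) - F(a) = 1
for k in (10, 100, 1000):
    errors = [abs(riemann_sum(math.cos, a, b, k, tag) - 1.0) for tag in (0.0, 0.5, 1.0)]
    print(k, max(errors))        # shrinks with the mesh, as in the definition of integrability
total, tags, ts = mvt_riemann_sum(a, b, 50)
print(total)                     # 1 up to rounding: the telescoping step of the proof
```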
Remark 12 Even though the mean value theorem only holds for real scalar functions, the fundamental theorem of calculus holds for complex or vector-valued functions, as one can simply apply that theorem to each component of that function separately.
Of course, we also have the other half of the fundamental theorem of calculus:
Theorem 13 (First fundamental theorem of calculus) Let $[a,b]$ be a compact interval of positive length. Let $f: [a,b] \to \mathbb{R}$ be a continuous function, and let $F: [a,b] \to \mathbb{R}$ be the indefinite integral $F(x) := \int_a^x f(t)\ dt$. Then $F$ is differentiable on $[a,b]$, with derivative $F'(x) = f(x)$ for all $x \in [a,b]$. In particular, $F$ is continuously differentiable.
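Proof: It suffices to show that
$$\lim_{h \to 0^+} \frac{F(x+h) - F(x)}{h} = f(x)$$
for all $x \in [a,b)$, and
$$\lim_{h \to 0^-} \frac{F(x+h) - F(x)}{h} = f(x)$$
for all $x \in (a,b]$. After a change of variables, we can write
$$\frac{F(x+h) - F(x)}{h} = \int_0^1 f(x+ht)\ dt$$
for any $x \in [a,b)$ and any sufficiently small $h > 0$, or any $x \in (a,b]$ and any sufficiently small $h < 0$. As $f$ is continuous, the function $t \mapsto f(x+ht)$ converges uniformly to $f(x)$ on $[0,1]$ as $h \to 0$ (keeping $x$ fixed). As the interval $[0,1]$ is bounded, $\int_0^1 f(x+ht)\ dt$ thus converges to $f(x)$, and the claim follows.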
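Corollary 14 Let $f: [a,b] \to \mathbb{R}$ be a continuous function on a compact interval $[a,b]$ of positive length. Then
$$\lim_{h \to 0^+} \frac{1}{h} \int_{[x,x+h]} f(t)\ dt = f(x)$$
for all $x \in [a,b)$,
$$\lim_{h \to 0^+} \frac{1}{h} \int_{[x-h,x]} f(t)\ dt = f(x)$$
for all $x \in (a,b]$, and thus
$$\lim_{h \to 0^+} \frac{1}{2h} \int_{[x-h,x+h]} f(t)\ dt = f(x)$$
for all $x \in (a,b)$.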
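Corollary 14 can be illustrated numerically. The Python sketch below (the test function, the Simpson quadrature, and all names are our own choices) evaluates $\frac{1}{2h}\int_{[x-h,x+h]} f(t)\ dt$ at a point where $f$ is continuous but not differentiable; the error still tends to zero as $h \to 0^+$, roughly like $h/2$ in this example, since only continuity of $f$ is needed.

```python
import math


def simpson(f, a, b, n=200):
    """Composite Simpson rule for the integral of f over [a, b] with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + (2 * i - 1) * h) for i in range(1, n // 2 + 1))
    s += 2 * sum(f(a + 2 * i * h) for i in range(1, n // 2))
    return s * h / 3


def symmetric_average(f, x, h):
    """(1/2h) times the integral of f over [x - h, x + h]."""
    return simpson(f, x - h, x + h) / (2 * h)


def f(t):
    """A continuous example (our choice) that is not differentiable at t = 0.3."""
    return abs(t - 0.3) + math.cos(5 * t)


for h in (0.1, 0.01, 0.001):
    print(h, abs(symmetric_average(f, 0.3, h) - f(0.3)))   # tends to 0 as h -> 0
```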
In these notes we explore the question of the extent to which these theorems continue to hold when the differentiability or integrability conditions on the various functions are relaxed. Among the results proven in these notes are
- The Lebesgue differentiation theorem, which roughly speaking asserts that Corollary 14 continues to hold for almost every $x$ if $f$ is merely absolutely integrable, rather than continuous;
- A number of differentiation theorems, which assert for instance that monotone, Lipschitz, or bounded variation functions in one dimension are almost everywhere differentiable; and
- The second fundamental theorem of calculus for absolutely continuous functions.
The material here is loosely based on Chapter 3 of Stein-Shakarchi.
This post is a sequel of sorts to my earlier post on hard and soft analysis, and the finite convergence principle. Here, I want to discuss a well-known theorem in infinitary soft analysis – the Lebesgue differentiation theorem – and whether there is any meaningful finitary version of this result. Along the way, it turns out that we will uncover a simple analogue of the Szemerédi regularity lemma, for subsets of the interval $[0,1]$ rather than for graphs. (Actually, regularity lemmas seem to appear in just about any context in which fine-scaled objects can be approximated by coarse-scaled ones.) The connection between regularity lemmas and results such as the Lebesgue differentiation theorem was recently highlighted by Elek and Szegedy, while the connection between the finite convergence principle and results such as the pointwise ergodic theorem (which is a close cousin of the Lebesgue differentiation theorem) was recently detailed by Avigad, Gerhardy, and Towsner.
The Lebesgue differentiation theorem has many formulations, but we will avoid the strongest versions and just stick to the following model case for simplicity:
Lebesgue density theorem. Let $A \subset \mathbb{R}$ be Lebesgue measurable. Then for almost every $x \in A$, we have $\frac{|A \cap [x-r,x+r]|}{|[x-r,x+r]|} \to 1$ as $r \to 0^+$, where $|A|$ denotes the Lebesgue measure of $A$.
In other words, almost all the points x of A are points of density of A, which roughly speaking means that as one passes to finer and finer scales, the immediate vicinity of x becomes increasingly saturated with A. (Points of density are like robust versions of interior points, thus the Lebesgue density theorem is an assertion that measurable sets are almost like open sets. This is Littlewood’s first principle.) One can also deduce the Lebesgue differentiation theorem back from the Lebesgue density theorem by approximating f by a finite linear combination of indicator functions; we leave this as an exercise.
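For a finite union of intervals the density theorem can be checked by hand, and the Python sketch below does so with exact rational arithmetic (the set $A$ and the helper `density_ratio` are hypothetical illustrations). The ratio $\frac{|A \cap [x-r,x+r]|}{|[x-r,x+r]|}$ equals $1$ at interior points of $A$ for small $r$, equals $1/2$ at a boundary point, and equals $0$ outside $A$; the exceptional boundary points form a null set, as the theorem allows. The real content of the theorem, of course, lies in handling general measurable sets.

```python
from fractions import Fraction as Q


def density_ratio(intervals, x, r):
    """|A intersect [x-r, x+r]| / (2r) for A a finite union of disjoint closed intervals."""
    overlap = sum((max(Q(0), min(b, x + r) - max(a, x - r)) for a, b in intervals), Q(0))
    return overlap / (2 * r)


# A hypothetical measurable set A = [0, 1/3] u [1/2, 1] in [0, 1].
A = [(Q(0), Q(1, 3)), (Q(1, 2), Q(1))]
for x in (Q(1, 6), Q(1, 3), Q(2, 5)):
    print(x, [density_ratio(A, x, Q(1, 10**k)) for k in (2, 4, 6)])
# x = 1/6 (interior point of A): ratio 1;  x = 1/3 (boundary point): ratio 1/2;
# x = 2/5 (outside A): ratio 0.  The boundary points {0, 1/3, 1/2, 1} form a null set.
```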