245A, Notes 5: Differentiation theorems

16 October, 2010 in 245A - Real analysis, math.CA | Tags: absolute continuity, bounded variation, Hardy-Littlewood maximal inequality, Lebesgue differentiation theorem, rising sun lemma, total variation | by Terence Tao

Let ${[a,b]}$ be a compact interval of positive length (thus ${-\infty < a < b < +\infty}$ ). Recall that a function ${F: [a,b] \rightarrow {\bf R}}$ is said to be differentiable at a point ${x \in [a,b]}$ if the limit

$\displaystyle F'(x) := \lim_{y \rightarrow x; y \in [a,b] \backslash \{x\}} \frac{F(y)-F(x)}{y-x} \ \ \ \ \ (1)$

exists. In that case, we call ${F'(x)}$ the strong derivative, classical derivative, or just derivative for short, of ${F}$ at ${x}$ . We say that ${F}$ is everywhere differentiable, or differentiable for short, if it is differentiable at all points ${x \in [a,b]}$ , and differentiable almost everywhere if it is differentiable at almost every point ${x \in [a,b]}$ . If ${F}$ is differentiable everywhere and its derivative ${F'}$ is continuous, then we say that ${F}$ is continuously differentiable.

Remark 1 Much later in this sequence, when we cover the theory of distributions, we will see the notion of a weak derivative or distributional derivative, which can be applied to a much rougher class of functions and is in many ways more suitable than the classical derivative for doing “Lebesgue” type analysis (i.e. analysis centred around the Lebesgue integral, and in particular allowing functions to be uncontrolled, infinite, or even undefined on sets of measure zero). However, for now we will stick with the classical approach to differentiation.

Exercise 2 If ${F: [a,b] \rightarrow {\bf R}}$ is everywhere differentiable, show that ${F}$ is continuous and ${F'}$ is measurable. If ${F}$ is almost everywhere differentiable, show that the (almost everywhere defined) function ${F'}$ is measurable (i.e. it is equal to an everywhere defined measurable function on ${[a,b]}$ outside of a null set), but give an example to demonstrate that ${F}$ need not be continuous.

Exercise 3 Give an example of a function ${F: [a,b] \rightarrow {\bf R}}$ which is everywhere differentiable, but not continuously differentiable. (Hint: choose an ${F}$ that vanishes quickly at some point, say at the origin ${0}$ , but which also oscillates rapidly near that point.)

In single-variable calculus, the operations of integration and differentiation are connected by a number of basic theorems, starting with Rolle’s theorem.

Theorem 4 (Rolle’s theorem) Let ${[a,b]}$ be a compact interval of positive length, and let ${F: [a,b] \rightarrow {\bf R}}$ be a differentiable function such that ${F(a)=F(b)}$ . Then there exists ${x \in (a,b)}$ such that ${F'(x)=0}$ .

Proof: By subtracting a constant from ${F}$ (which does not affect differentiability or the derivative) we may assume that ${F(a)=F(b)=0}$ . If ${F}$ is identically zero then the claim is trivial, so assume that ${F}$ is non-zero somewhere. By replacing ${F}$ with ${-F}$ if necessary, we may assume that ${F}$ is positive somewhere, thus ${\sup_{x \in [a,b]} F(x) > 0}$ . On the other hand, as ${F}$ is continuous and ${[a,b]}$ is compact, ${F}$ must attain its maximum somewhere, thus there exists ${x \in [a,b]}$ such that ${F(x) \geq F(y)}$ for all ${y \in [a,b]}$ . Then ${F(x)}$ must be positive and so ${x}$ cannot equal either ${a}$ or ${b}$ , and thus must lie in the interior. From the right limit of (1) we see that ${F'(x) \leq 0}$ , while from the left limit we have ${F'(x) \geq 0}$ . Thus ${F'(x)=0}$ and the claim follows. $\Box$

Remark 5 Observe that the same proof also works if ${F}$ is only differentiable in the interior ${(a,b)}$ of the interval ${[a,b]}$ , so long as it is continuous all the way up to the boundary of ${[a,b]}$ .

Exercise 6 Give an example to show that Rolle’s theorem can fail if ${F}$ is merely assumed to be almost everywhere differentiable, even if one adds the additional hypothesis that ${F}$ is continuous. This example illustrates that everywhere differentiability is a significantly stronger property than almost everywhere differentiability. We will see further evidence of this fact later in these notes; there are many theorems that assert in their conclusion that a function is almost everywhere differentiable, but few that manage to conclude everywhere differentiability.

Remark 7 It is important to note that Rolle’s theorem only works in the real scalar case when ${F}$ is real-valued, as it relies heavily on the least upper bound property for the domain ${{\bf R}}$ . If, for instance, we consider complex-valued scalar functions ${F: [a,b] \rightarrow {\bf C}}$ , then the theorem can fail; for instance, the function ${F: [0,1] \rightarrow {\bf C}}$ defined by ${F(x) := e^{2\pi i x} - 1}$ vanishes at both endpoints and is differentiable, but its derivative ${F'(x) = 2\pi i e^{2\pi i x}}$ is never zero. (Rolle’s theorem does imply that the real and imaginary parts of the derivative ${F'}$ both vanish somewhere, but the problem is that they don’t simultaneously vanish at the same point.) Similar remarks apply to functions taking values in a finite-dimensional vector space, such as ${{\bf R}^n}$ .
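
The counterexample in Remark 7 can be checked numerically. The following is a minimal sketch in Python (the sample grid and the variable names are illustrative choices, not part of the notes): it evaluates ${F(x) = e^{2\pi i x} - 1}$ at the endpoints, and confirms on the samples that ${|F'(x)|}$ stays equal to ${2\pi}$ while the real and imaginary parts of ${F'}$ vanish at different points.

```python
import cmath
import math

def F(x):
    """F(x) = e^{2 pi i x} - 1 on [0, 1], as in Remark 7."""
    return cmath.exp(2j * math.pi * x) - 1

def F_prime(x):
    """Derivative F'(x) = 2 pi i e^{2 pi i x}."""
    return 2j * math.pi * cmath.exp(2j * math.pi * x)

# F vanishes (up to rounding) at both endpoints ...
endpoint_values = (abs(F(0.0)), abs(F(1.0)))

# ... yet |F'(x)| = 2 pi at every sample point, so F' has no zero there.
samples = [k / 100 for k in range(101)]
min_abs_derivative = min(abs(F_prime(x)) for x in samples)

# The real and imaginary parts of F' do vanish, but at different points.
real_part_at_0 = F_prime(0.0).real          # Re F'(0) = -2 pi sin(0)
imag_part_at_quarter = F_prime(0.25).imag   # Im F'(1/4) = 2 pi cos(pi/2)
```

Of course a finite sample proves nothing by itself; the identity ${|F'(x)| = 2\pi}$ is what rules out a zero of ${F'}$ .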

One can easily amplify Rolle’s theorem to the mean value theorem:

Corollary 8 (Mean value theorem) Let ${[a,b]}$ be a compact interval of positive length, and let ${F: [a,b] \rightarrow {\bf R}}$ be a differentiable function. Then there exists ${x \in (a,b)}$ such that ${F'(x)=\frac{F(b)-F(a)}{b-a}}$ .

Proof: Apply Rolle’s theorem to the function ${x \mapsto F(x) - \frac{F(b)-F(a)}{b-a} (x-a)}$ . $\Box$

Remark 9 As Rolle’s theorem is only applicable to real scalar-valued functions, the more general mean value theorem is also only applicable to such functions.

Exercise 10 (Uniqueness of antiderivatives up to constants) Let ${[a,b]}$ be a compact interval of positive length, and let ${F: [a,b] \rightarrow {\bf R}}$ and ${G: [a,b] \rightarrow {\bf R}}$ be differentiable functions. Show that ${F'(x)=G'(x)}$ for every ${x \in [a,b]}$ if and only if ${F(x)=G(x)+C}$ for some constant ${C \in {\bf R}}$ and all ${x \in [a,b]}$ .

We can use the mean value theorem to deduce one of the fundamental theorems of calculus:

Theorem 11 (Second fundamental theorem of calculus) Let ${F: [a,b] \rightarrow {\bf R}}$ be a differentiable function, such that ${F'}$ is Riemann integrable. Then the Riemann integral ${\int_a^b F'(x)\ dx}$ of ${F'}$ is equal to ${F(b) - F(a)}$ . In particular, we have ${\int_a^b F'(x)\ dx = F(b)-F(a)}$ whenever ${F}$ is continuously differentiable.

Proof: Let ${\varepsilon > 0}$ . By the definition of Riemann integrability, there exists a finite partition ${a = t_0 < t_1 < \ldots < t_k = b}$ such that

$\displaystyle |\sum_{j=1}^k F'(t^*_j) (t_j - t_{j-1}) - \int_a^b F'(x)\ dx| \leq \varepsilon$

for every choice of ${t^*_j \in [t_{j-1},t_j]}$ .
Fix this partition. From the mean value theorem, for each ${1 \leq j \leq k}$ one can find ${t^*_j \in [t_{j-1},t_j]}$ such that

$\displaystyle F'(t^*_j) (t_j - t_{j-1}) = F(t_j) - F(t_{j-1})$

and thus by telescoping series

$\displaystyle |(F(b)-F(a)) - \int_a^b F'(x)\ dx| \leq \varepsilon.$

Since ${\varepsilon > 0}$ was arbitrary, the claim follows. $\Box$
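
The telescoping step in the above proof can be made concrete. The sketch below is only an illustration under assumed choices: the function ${F(x) = x^3}$ on ${[0,2]}$ , a uniform partition with seven pieces, and the helper name `mean_value_point` are all hypothetical. For this ${F}$ the mean value points can be written down explicitly, and the resulting Riemann sum of ${F'}$ collapses to ${F(b)-F(a)}$ .

```python
import math

def F(x):
    return x ** 3

def F_prime(x):
    return 3 * x ** 2

def mean_value_point(s, t):
    """Point t* in [s, t] with F'(t*) (t - s) = F(t) - F(s), for 0 <= s < t.

    For F(x) = x^3 this means 3 t*^2 = s^2 + s t + t^2.
    """
    return math.sqrt((s * s + s * t + t * t) / 3)

a, b, k = 0.0, 2.0, 7
partition = [a + (b - a) * j / k for j in range(k + 1)]

riemann_sum = 0.0
for s, t in zip(partition, partition[1:]):
    t_star = mean_value_point(s, t)
    assert s <= t_star <= t          # the tag lies in its subinterval
    riemann_sum += F_prime(t_star) * (t - s)

# riemann_sum agrees with F(b) - F(a) = 8 up to rounding error,
# regardless of how coarse the partition is.
```

The point of the proof is that this particular choice of tags makes the Riemann sum exact, while Riemann integrability controls every choice of tags at once.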

Remark 12 Even though the mean value theorem only holds for real scalar functions, the fundamental theorem of calculus holds for complex or vector-valued functions, as one can simply apply that theorem to each component of that function separately.

Of course, we also have the other half of the fundamental theorem of calculus:

Theorem 13 (First fundamental theorem of calculus) Let ${[a,b]}$ be a compact interval of positive length. Let ${f: [a,b] \rightarrow {\bf C}}$ be a continuous function, and let ${F: [a,b] \rightarrow {\bf C}}$ be the indefinite integral ${F(x) := \int_a^x f(t)\ dt}$ . Then ${F}$ is differentiable on ${[a,b]}$ , with derivative ${F'(x) = f(x)}$ for all ${x \in [a,b]}$ . In particular, ${F}$ is continuously differentiable.

Proof: It suffices to show that

$\displaystyle \lim_{h \rightarrow 0^+} \frac{F(x+h)-F(x)}{h} = f(x)$

for all ${x \in [a,b)}$ , and

$\displaystyle \lim_{h \rightarrow 0^-} \frac{F(x+h)-F(x)}{h} = f(x)$

for all ${x \in (a,b]}$ . After a change of variables, we can write

$\displaystyle \frac{F(x+h)-F(x)}{h} = \int_0^1 f(x+ht)\ dt$

for any ${x \in [a,b)}$ and any sufficiently small ${h>0}$ , or any ${x \in (a,b]}$ and any sufficiently small ${h<0}$ . As ${f}$ is continuous, the function ${t \mapsto f(x+ht)}$ converges uniformly to ${f(x)}$ on ${[0,1]}$ as ${h \rightarrow 0}$ (keeping ${x}$ fixed). As the interval ${[0,1]}$ is bounded, ${\int_0^1 f(x+ht)\ dt}$ thus converges to ${\int_0^1 f(x)\ dt = f(x)}$ , and the claim follows. $\Box$

Corollary 14 (Differentiation theorem for continuous functions) Let ${f: [a,b] \rightarrow {\bf C}}$ be a continuous function on a compact interval. Then we have

$\displaystyle \lim_{h \rightarrow 0^+} \frac{1}{h} \int_{[x,x+h]} f(t)\ dt = f(x)$

for all ${x \in [a,b)}$ ,

$\displaystyle \lim_{h \rightarrow 0^+} \frac{1}{h} \int_{[x-h,x]} f(t)\ dt = f(x)$

for all ${x \in (a,b]}$ , and thus

$\displaystyle \lim_{h \rightarrow 0^+} \frac{1}{2h} \int_{[x-h,x+h]} f(t)\ dt = f(x)$

for all ${x \in (a,b)}$ .
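
As a quick numerical illustration of Corollary 14 (a sketch only; the choice ${f = \cos}$ , the base point ${x=1}$ and the values of ${h}$ are assumptions made for the example), one can compute the right averages exactly using the antiderivative ${\sin}$ and watch them approach ${f(x)}$ .

```python
import math

def right_average(x, h):
    """(1/h) * integral of cos over [x, x+h], via the antiderivative sin."""
    return (math.sin(x + h) - math.sin(x)) / h

x = 1.0
step_sizes = (1e-1, 1e-2, 1e-3)

# Distance between the average over [x, x+h] and the point value cos(x).
errors = [abs(right_average(x, h) - math.cos(x)) for h in step_sizes]
```

For this smooth ${f}$ the error decays roughly linearly in ${h}$ ; for a merely continuous ${f}$ the corollary gives convergence but no rate.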

In these notes we explore the question of the extent to which these theorems continue to hold when the differentiability or integrability conditions on the various functions ${F, F', f}$ are relaxed. Among the results proven in these notes are

The Lebesgue differentiation theorem, which roughly speaking asserts that Corollary 14 continues to hold for almost every ${x}$ if ${f}$ is merely absolutely integrable, rather than continuous;
A number of differentiation theorems, which assert for instance that monotone, Lipschitz, or bounded variation functions in one dimension are almost everywhere differentiable; and
The second fundamental theorem of calculus for absolutely continuous functions.

The material here is loosely based on Chapter 3 of Stein-Shakarchi.

— 1. The Lebesgue differentiation theorem in one dimension —

The main objective of this section is to show

Theorem 15 (Lebesgue differentiation theorem, one-dimensional case) Let ${f: {\bf R} \rightarrow {\bf C}}$ be an absolutely integrable function, and let ${F: {\bf R} \rightarrow {\bf C}}$ be the definite integral ${F(x) := \int_{[-\infty,x]} f(t)\ dt}$ . Then ${F}$ is continuous and almost everywhere differentiable, and ${F'(x)= f(x)}$ for almost every ${x \in {\bf R}}$ .

This can be viewed as a variant of Corollary 14; the hypotheses are weaker because ${f}$ is only assumed to be absolutely integrable, rather than continuous (and can live on the entire real line, and not just on a compact interval); but the conclusion is weaker too, because ${F}$ is only found to be almost everywhere differentiable, rather than everywhere differentiable. (But such a relaxation of the conclusion is necessary at this level of generality; consider for instance the example when ${f = 1_{[0,1]}}$ .)
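
To see concretely why everywhere differentiability can fail, here is a small sketch of the example ${f = 1_{[0,1]}}$ (the function names are illustrative). In this case ${F(x)}$ is simply ${x}$ clamped to ${[0,1]}$ , and the one-sided difference quotients at the origin disagree.

```python
def F(x):
    """F(x) = integral of 1_{[0,1]} up to x, i.e. x clamped to [0, 1]."""
    return min(max(x, 0.0), 1.0)

def difference_quotient(x, h):
    return (F(x + h) - F(x)) / h

hs = [10.0 ** (-k) for k in range(1, 6)]

right_at_0 = [difference_quotient(0.0, h) for h in hs]    # all equal to 1
left_at_0 = [difference_quotient(0.0, -h) for h in hs]    # all equal to 0

# Away from the jump points 0 and 1 of f, both sides agree with f(x).
near_half = [difference_quotient(0.5, h) for h in hs] \
          + [difference_quotient(0.5, -h) for h in hs]    # all close to 1
```

Thus ${F}$ fails to be differentiable at ${0}$ (and similarly at ${1}$ ), although ${F'(x) = f(x)}$ at every other point.
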
The continuity is an easy exercise:

Exercise 16 Let ${f: {\bf R} \rightarrow {\bf C}}$ be an absolutely integrable function, and let ${F: {\bf R} \rightarrow {\bf C}}$ be the definite integral ${F(x) := \int_{[-\infty,x]} f(t)\ dt}$ . Show that ${F}$ is continuous.

The main difficulty is to show that ${F'(x)=f(x)}$ for almost every ${x \in {\bf R}}$ . This will follow from

Theorem 17 (Lebesgue differentiation theorem, second formulation) Let ${f: {\bf R} \rightarrow {\bf C}}$ be an absolutely integrable function. Then

$\displaystyle \lim_{h \rightarrow 0^+} \frac{1}{h} \int_{[x,x+h]} f(t)\ dt = f(x) \ \ \ \ \ (2)$

for almost every ${x \in {\bf R}}$ , and

$\displaystyle \lim_{h \rightarrow 0^+} \frac{1}{h} \int_{[x-h,x]} f(t)\ dt = f(x) \ \ \ \ \ (3)$

for almost every ${x \in {\bf R}}$ .

Exercise 18 Show that Theorem 15 follows from Theorem 17.

We will just prove the first fact (2); the second fact (3) is similar (or can be deduced from (2) by replacing ${f}$ with the reflected function ${x \mapsto f(-x)}$ ).
We are taking ${f}$ to be complex valued, but it is clear from taking real and imaginary parts that it suffices to prove the claim when ${f}$ is real-valued, and we shall thus assume this for the rest of the argument.
The conclusion (2) we want to prove is a convergence theorem – an assertion that for all functions ${f}$ in a given class (in this case, the class of absolutely integrable functions ${f: {\bf R} \rightarrow {\bf R}}$ ), a certain sequence of linear expressions ${T_h f}$ (in this case, the right averages ${T_h f(x) = \frac{1}{h} \int_{[x,x+h]} f(t)\ dt}$ ) converges in some sense (in this case, pointwise almost everywhere) to a specified limit (in this case, ${f}$ ). There is a general and very useful argument to prove such convergence theorems, known as the density argument. This argument requires two ingredients, which we state informally as follows:

A verification of the convergence result for some “dense subclass” of “nice” functions ${f}$ , such as continuous functions, smooth functions, simple functions, etc. By “dense”, we mean that a general function ${f}$ in the original class can be approximated to arbitrary accuracy in a suitable sense by a function in the nice subclass.
A quantitative estimate that upper bounds the maximal fluctuation of the linear expressions ${T_h f}$ in terms of the “size” of the function ${f}$ (where the precise definition of “size” depends on the nature of the approximation in the first ingredient).

Once one has these two ingredients, it is usually not too hard to put them together to obtain the desired convergence theorem for general functions ${f}$ (not just those in the dense subclass). We illustrate this with a simple example:

Proposition 19 (Translation is continuous in ${L^1}$ ) Let ${f: {\bf R}^d \rightarrow {\bf C}}$ be an absolutely integrable function, and for each ${h \in {\bf R}^d}$ , let ${f_h: {\bf R}^d \rightarrow {\bf C}}$ be the shifted function

$\displaystyle f_h(x) := f(x-h).$

Then ${f_h}$ converges in ${L^1}$ norm to ${f}$ as ${h \rightarrow 0}$ , thus

$\displaystyle \lim_{h \rightarrow 0} \int_{{\bf R}^d} |f_h(x) - f(x)|\ dx = 0.$

Proof: We first verify this claim for a dense subclass of ${f}$ , namely the functions ${f}$ which are continuous and compactly supported (i.e. they vanish outside of a compact set). Such functions are uniformly continuous, and thus ${f_h}$ converges uniformly to ${f}$ as ${h \rightarrow 0}$ . Furthermore, as ${f}$ is compactly supported, the support of ${f_h-f}$ stays uniformly bounded for ${h}$ in a bounded set. From this we see that ${f_h}$ also converges to ${f}$ in ${L^1}$ norm as required.
Next, we observe the quantitative estimate

$\displaystyle \int_{{\bf R}^d} |f_h(x) - f(x)|\ dx \leq 2 \int_{{\bf R}^d} |f(x)|\ dx \ \ \ \ \ (4)$

for any ${h \in {\bf R}^d}$ . This follows easily from the triangle inequality

$\displaystyle \int_{{\bf R}^d} |f_h(x) - f(x)|\ dx \leq \int_{{\bf R}^d} |f_h(x)|\ dx + \int_{{\bf R}^d} |f(x)|\ dx$

together with the translation invariance of the Lebesgue integral:

$\displaystyle \int_{{\bf R}^d} |f_h(x)|\ dx = \int_{{\bf R}^d} |f(x)|\ dx.$

Now we put the two ingredients together. Let ${f: {\bf R}^d \rightarrow {\bf C}}$ be absolutely integrable, and let ${\varepsilon > 0}$ be arbitrary. Applying Littlewood’s second principle (Theorem 15 from Notes 2) to the absolutely integrable function ${f}$ , we can find a continuous, compactly supported function ${g: {\bf R}^d \rightarrow {\bf C}}$ such that

$\displaystyle \int_{{\bf R}^d} |f(x)-g(x)|\ dx \leq \varepsilon.$

Applying (4), we conclude that

$\displaystyle \int_{{\bf R}^d} |(f-g)_h(x)-(f-g)(x)|\ dx \leq 2\varepsilon,$

which we rearrange as

$\displaystyle \int_{{\bf R}^d} |(f_h-f)(x)-(g_h-g)(x)|\ dx \leq 2\varepsilon.$

By the dense subclass result, we also know that

$\displaystyle \int_{{\bf R}^d} |g_h(x)-g(x)|\ dx \leq \varepsilon$

for all ${h}$ sufficiently close to zero. From the triangle inequality, we conclude that

$\displaystyle \int_{{\bf R}^d} |f_h(x)-f(x)|\ dx \leq 3\varepsilon$

for all ${h}$ sufficiently close to zero, and the claim follows. $\Box$

Remark 20 In the above application of the density argument, we proved the required quantitative estimate directly for all functions ${f}$ in the original class of functions. However, it is also possible to use the density argument a second time and initially verify the quantitative estimate just for functions ${f}$ in a nice subclass (e.g. continuous functions of compact support). In many cases, one can then extend that estimate to the general case by using tools such as Fatou’s lemma, which are particularly suited for showing that upper bound estimates are preserved with respect to limits.

Exercise 21 Let ${f: {\bf R}^d \rightarrow {\bf C}}$ , ${g: {\bf R}^d \rightarrow {\bf C}}$ be Lebesgue measurable functions such that ${f}$ is absolutely integrable and ${g}$ is essentially bounded (i.e. bounded outside of a null set). Show that the convolution ${f*g: {\bf R}^d \rightarrow {\bf C}}$ defined by the formula

$\displaystyle f*g(x) = \int_{{\bf R}^d} f(y) g(x-y)\ dy$

is well-defined (in the sense that the integrand on the right-hand side is absolutely integrable) and that ${f*g}$ is a bounded, continuous function.

The above exercise is illustrative of a more general intuition, which is that convolutions tend to be smoothing in nature; the convolution ${f*g}$ of two functions is usually at least as regular as, and often more regular than, either of the two factors ${f, g}$ .
This smoothing phenomenon gives rise to an important fact, namely the Steinhaus theorem:

Exercise 22 (Steinhaus theorem) Let ${E \subset {\bf R}^d}$ be a Lebesgue measurable set of positive measure. Show that the set ${E-E := \{ x-y: x, y \in E \}}$ contains an open neighbourhood of the origin. (Hint: reduce to the case when ${E}$ is bounded, and then apply the previous exercise to the convolution ${1_E * 1_{-E}}$ , where ${-E := \{ -y: y \in E \}}$ .)

Exercise 23 A homomorphism ${f: {\bf R}^d \rightarrow {\bf C}}$ is a map with the property that ${f(x+y)=f(x)+f(y)}$ for all ${x,y \in {\bf R}^d}$ .

Show that all measurable homomorphisms are continuous. (Hint: for any disk ${D}$ centered at the origin in the complex plane, show that ${f^{-1}(z+D)}$ has positive measure for at least one ${z \in {\bf C}}$ , and then use the Steinhaus theorem from the previous exercise.)

Show that ${f}$ is a measurable homomorphism if and only if it takes the form ${f(x_1,\ldots,x_d) = x_1 z_1 +\ldots + x_d z_d}$ for all ${x_1,\ldots,x_d \in {\bf R}}$ and some complex coefficients ${z_1,\ldots,z_d}$ . (Hint: first establish this for rational ${x_1,\ldots,x_d}$ , and then use the previous part of this exercise.)

(For readers familiar with Zorn’s lemma) Show that there exist homomorphisms ${f: {\bf R}^d \rightarrow {\bf C}}$ which are not of the form in the previous part of this exercise. (Hint: view ${{\bf R}^d}$ (or ${{\bf C}}$ ) as a vector space over the rationals ${{\bf Q}}$ , and use the fact (from Zorn’s lemma) that every vector space – even an infinite-dimensional one – has at least one basis.) This gives an alternate construction of a non-measurable set to that given in previous notes.

Remark 24 One drawback with the density argument is that it gives convergence results which are qualitative rather than quantitative – there is no explicit bound on the rate of convergence. For instance, in Proposition 19, we know that for any ${\varepsilon > 0}$ , there exists ${\delta > 0}$ such that ${\int_{{\bf R}^d} |f_h(x)-f(x)|\ dx \leq \varepsilon}$ whenever ${|h| \leq \delta}$ , but we do not know exactly how ${\delta}$ depends on ${\varepsilon}$ and ${f}$ . Actually, the proof does eventually give such a bound, but it depends on “how measurable” the function ${f}$ is, or more precisely how “easy” it is to approximate ${f}$ by a “nice” function. To illustrate this issue, let’s work in one dimension and consider the function ${f(x) := \sin(Nx) 1_{[0,2\pi]}(x)}$ , where ${N \geq 1}$ is a large integer. On the one hand, ${f}$ is bounded in the ${L^1}$ norm uniformly in ${N}$ : ${\int_{\bf R} |f(x)|\ dx \leq 2\pi}$ (indeed, the left-hand side is equal to ${4}$ ). On the other hand, it is not hard to see that ${\int_{\bf R} |f_{\pi/N}(x) - f(x)|\ dx \geq c}$ for some absolute constant ${c>0}$ . Thus, if one wants to force ${\int_{\bf R} |f_h(x) - f(x)|\ dx}$ to drop below ${c}$ , one has to take ${h}$ to be within ${\pi/N}$ of the origin. Making ${N}$ large, we thus see that the rate of convergence of ${\int_{\bf R} |f_h(x) - f(x)|\ dx}$ to zero can be arbitrarily slow, even though ${f}$ is bounded in ${L^1}$ . The problem is that as ${N}$ gets large, it becomes increasingly difficult to approximate ${f}$ well by a “nice” function, by which we mean a uniformly continuous function with a reasonable modulus of continuity, due to the increasingly oscillatory nature of ${f}$ . See this blog post for some further discussion of this issue, and what quantitative substitutes are available for such qualitative results.
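
The example in Remark 24 is easy to explore numerically. The following is a rough sketch, not a proof: the midpoint rule, the grid size, the integration window and the helper names are all assumptions made for illustration. It approximates ${\int_{\bf R} |f|}$ and ${\int_{\bf R} |f_h - f|}$ for ${f(x) = \sin(Nx) 1_{[0,2\pi]}(x)}$ , both at the shift ${h = \pi/N}$ and at a fixed small shift.

```python
import math

LO, HI = -1.0, 2 * math.pi + 4.0   # window containing the supports of f and f_h

def f(x, N):
    """f(x) = sin(N x) on [0, 2 pi], and zero elsewhere."""
    return math.sin(N * x) if 0.0 <= x <= 2 * math.pi else 0.0

def midpoint_integral(g, lo=LO, hi=HI, n=100_000):
    """Midpoint-rule approximation to the integral of g over [lo, hi]."""
    dx = (hi - lo) / n
    return dx * sum(g(lo + (i + 0.5) * dx) for i in range(n))

def l1_norm(N):
    return midpoint_integral(lambda x: abs(f(x, N)))

def l1_shift_distance(N, h):
    """Approximates the integral of |f_h - f|, where f_h(x) = f(x - h)."""
    return midpoint_integral(lambda x: abs(f(x - h, N) - f(x, N)))

frequencies = (1, 5, 25)
norms = {N: l1_norm(N) for N in frequencies}
half_period_shift = {N: l1_shift_distance(N, math.pi / N) for N in frequencies}
fixed_small_shift = {N: l1_shift_distance(N, 0.01) for N in frequencies}
```

A direct computation shows that for integer ${N}$ one has ${f_{\pi/N} = -f}$ on the common support, and that ${\int_{\bf R} |f_{\pi/N}(x) - f(x)|\ dx}$ is exactly ${8}$ , twice the ${L^1}$ norm; the numerical values should be close to this. The fixed shift ${h = 0.01}$ , by contrast, gives a small distance for ${N=1}$ but a much larger one for ${N=25}$ , which is the non-uniformity that the remark describes.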

Now we return to the Lebesgue differentiation theorem, and apply the density argument. The dense subclass result is already contained in Corollary 14, which asserts that (2) holds for all continuous functions ${f}$ . The quantitative estimate we will need is the following special case of the Hardy-Littlewood maximal inequality:

Lemma 25 (One-sided Hardy-Littlewood maximal inequality) Let ${f: {\bf R} \rightarrow {\bf C}}$ be an absolutely integrable function, and let ${\lambda > 0}$ . Then

$\displaystyle m( \{ x \in {\bf R}: \sup_{h>0} \frac{1}{h} \int_{[x,x+h]} |f(t)|\ dt \geq \lambda \} ) \leq \frac{1}{\lambda} \int_{\bf R} |f(t)|\ dt.$

We will prove this lemma shortly, but let us first see how this, combined with the dense subclass result, will give the Lebesgue differentiation theorem. Let ${f: {\bf R} \rightarrow {\bf C}}$ be absolutely integrable, and let ${\varepsilon, \lambda > 0}$ be arbitrary. Then by Littlewood’s second principle, we can find a function ${g: {\bf R} \rightarrow {\bf C}}$ which is continuous and compactly supported, with

$\displaystyle \int_{\bf R} |f(x)-g(x)|\ dx \leq \varepsilon.$

Applying the one-sided Hardy-Littlewood maximal inequality, we conclude that

$\displaystyle m( \{ x \in {\bf R}: \sup_{h>0} \frac{1}{h} \int_{[x,x+h]} |f(t)-g(t)|\ dt \geq \lambda \} ) \leq \frac{\varepsilon}{\lambda}.$

In a similar spirit, from Markov’s inequality we have

$\displaystyle m( \{ x \in {\bf R}: |f(x)-g(x)| \geq \lambda \} ) \leq \frac{\varepsilon}{\lambda}.$

By subadditivity, we conclude that for all ${x \in {\bf R}}$ outside of a set ${E}$ of measure at most ${2\varepsilon/\lambda}$ , one has both

$\displaystyle \frac{1}{h} \int_{[x,x+h]} |f(t)-g(t)|\ dt < \lambda \ \ \ \ \ (5)$

and

$\displaystyle |f(x)-g(x)| < \lambda \ \ \ \ \ (6)$

for all ${h > 0}$ .
Now let ${x \in {\bf R} \backslash E}$ . From the dense subclass result (Corollary 14) applied to the continuous function ${g}$ , we have

$\displaystyle |\frac{1}{h} \int_{[x,x+h]} g(t)\ dt - g(x)| < \lambda$

whenever ${h}$ is sufficiently close to ${0}$ . Combining this with (5), (6), and the triangle inequality, we conclude that

$\displaystyle |\frac{1}{h} \int_{[x,x+h]} f(t)\ dt - f(x)| < 3\lambda$

for all ${h}$ sufficiently close to zero. In particular we have

$\displaystyle \limsup_{h \rightarrow 0} |\frac{1}{h} \int_{[x,x+h]} f(t)\ dt - f(x)| \leq 3\lambda$

for all ${x}$ outside of a set of measure at most ${2\varepsilon/\lambda}$ . Keeping ${\lambda}$ fixed and sending ${\varepsilon}$ to zero, we conclude that

$\displaystyle \limsup_{h \rightarrow 0} |\frac{1}{h} \int_{[x,x+h]} f(t)\ dt - f(x)| \leq 3\lambda$

for almost every ${x \in {\bf R}}$ . If we then let ${\lambda}$ go to zero along a countable sequence (e.g. ${\lambda := 1/n}$ for ${n=1,2,\ldots}$ ), we conclude that

$\displaystyle \limsup_{h \rightarrow 0} |\frac{1}{h} \int_{[x,x+h]} f(t)\ dt - f(x)| = 0$

for almost every ${x \in {\bf R}}$ , and the claim follows.
The only remaining task is to establish the one-sided Hardy-Littlewood maximal inequality. We will do so by using the rising sun lemma:

Lemma 26 (Rising sun lemma) Let ${[a,b]}$ be a compact interval, and let ${F: [a,b] \rightarrow {\bf R}}$ be a continuous function. Then one can find an at most countable family of disjoint non-empty open intervals ${I_n = (a_n,b_n)}$ in ${[a,b]}$ with the following properties:

For each ${n}$ , either ${F(a_n)=F(b_n)}$ , or else ${a_n=a}$ and ${F(b_n) \geq F(a_n)}$ .

If ${x \in (a,b]}$ does not lie in any of the intervals ${I_n}$ , then one must have ${F(y) \leq F(x)}$ for all ${x \leq y \leq b}$ .

Remark 27 To explain the name “rising sun lemma”, imagine the graph ${\{ (x, F(x)): x \in [a,b] \}}$ of ${F}$ as depicting a hilly landscape, with the sun shining horizontally from the rightward infinity ${(+\infty,0)}$ (or rising from the east, if you will). Those ${x}$ for which ${F(y) \leq F(x)}$ for all ${x \leq y \leq b}$ are the locations on the landscape which are illuminated by the sun. The intervals ${I_n}$ then represent the portions of the landscape that are in shadow.

This lemma is proven using the following basic fact:

Exercise 28 Show that any open subset ${U}$ of ${{\bf R}}$ can be written as the union of at most countably many disjoint non-empty open intervals, whose endpoints lie outside of ${U}$ . (Hint: first show that every ${x}$ in ${U}$ is contained in a maximal open subinterval ${(a,b)}$ of ${U}$ , and that these maximal open subintervals are disjoint, with each such interval containing at least one rational number.)

Proof: (Proof of rising sun lemma) Let ${U}$ be the set of all ${x \in (a,b)}$ such that ${F(y) > F(x)}$ for at least one ${x < y < b}$ . As ${F}$ is continuous, ${U}$ is open, and so ${U}$ is the union of at most countably many disjoint non-empty open intervals ${I_n = (a_n,b_n)}$ , with the endpoints ${a_n, b_n}$ lying outside of ${U}$ .
The second conclusion of the rising sun lemma is clear from construction, so it suffices to establish the first. Suppose first that ${I_n = (a_n,b_n)}$ is such that ${a_n \neq a}$ . As the endpoint ${a_n}$ does not lie in ${U}$ , we must have ${F(y) \leq F(a_n)}$ for all ${a_n \leq y \leq b}$ ; similarly we have ${F(y) \leq F(b_n)}$ for all ${b_n \leq y \leq b}$ . In particular we have ${F(b_n) \leq F(a_n)}$ . By the continuity of ${F}$ , it will then suffice to show that ${F(b_n) \geq F(t)}$ for all ${a_n < t < b_n}$ .
Suppose for contradiction that there was ${a_n < t < b_n}$ with ${F(b_n) < F(t)}$ . Let ${A := \{ s \in [t,b]: F(s) \geq F(t)\}}$ , then ${A}$ is a closed set that contains ${t}$ but is disjoint from ${[b_n,b]}$ , since ${F(s) \leq F(b_n) < F(t)}$ for all ${s \in [b_n,b]}$ . Set ${t_* := \sup(A)}$ , then ${t_* \in [t,b_n) \subset I_n \subset U}$ , and thus there exists ${t_* < y \leq b}$ such that ${F(y) > F(t_*)}$ . Since ${F(t_*) \geq F(t) > F(b_n)}$ , and ${F(b_n) \geq F(z)}$ for all ${b_n \leq z \leq b}$ , we see that ${y}$ cannot exceed ${b_n}$ , and thus lies in ${A}$ , but this contradicts the fact that ${t_*}$ is the supremum of ${A}$ .
The case when ${a_n=a}$ is similar and is left to the reader; the only difference is that we can no longer assert that ${F(y) \leq F(a_n)}$ for all ${a_n \leq y \leq b}$ , and so do not have the upper bound ${F(b_n) \leq F(a_n)}$ . $\Box$
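
The construction of the set ${U}$ in this proof has a simple discrete analogue, which may help in visualising the lemma. The sketch below is an illustration only (the helper `shadow_intervals`, the grid size, and the test function ${F = \sin}$ on ${[0,2\pi]}$ are all hypothetical choices): a grid point is declared to be in shadow if some grid point to its right has a strictly larger value of ${F}$ , and maximal runs of shadowed points are reported as approximate intervals ${(a_n,b_n)}$ .

```python
import math

def shadow_intervals(F, a, b, n=1000):
    """Grid approximation to the shadow intervals (a_n, b_n) of F on [a, b]."""
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    vals = [F(x) for x in xs]

    # later_max[i] = max of F over grid points strictly to the right of xs[i]
    later_max = [-math.inf] * (n + 1)
    for i in range(n - 1, -1, -1):
        later_max[i] = max(later_max[i + 1], vals[i + 1])

    # A grid point is in shadow if something to its right is strictly higher.
    in_shadow = [later_max[i] > vals[i] for i in range(n + 1)]

    intervals = []
    i = 0
    while i <= n:
        if in_shadow[i]:
            start = i
            while in_shadow[i]:      # terminates: the last grid point is lit
                i += 1
            left = xs[start - 1] if start > 0 else a
            intervals.append((left, xs[i]))
        else:
            i += 1
    return intervals

intervals = shadow_intervals(math.sin, 0.0, 2 * math.pi)
```

For ${F = \sin}$ on ${[0,2\pi]}$ one expects two shadow intervals: one starting at the left endpoint ${a = 0}$ and ending near ${\pi/2}$ (the case ${a_n = a}$ , ${F(b_n) \geq F(a_n)}$ of the lemma), and one close to ${(\pi, 2\pi)}$ , on which ${F(a_n) = F(b_n) = 0}$ . The discrete endpoints are only accurate up to the grid spacing.
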
Now we can prove the one-sided Hardy-Littlewood maximal inequality. By upwards monotonicity, it will suffice to show that

$\displaystyle m( \{ x \in [a,b]: \sup_{h>0; [x,x+h] \subset [a,b]} \frac{1}{h} \int_{[x,x+h]} |f(t)|\ dt \geq \lambda \} ) \leq \frac{1}{\lambda} \int_{\bf R} |f(t)|\ dt$

for any compact interval ${[a,b]}$ . By modifying ${\lambda}$ by an epsilon, we may replace the non-strict inequality here with strict inequality:

$\displaystyle m( \{ x \in [a,b]: \sup_{h>0; [x,x+h] \subset [a,b]} \frac{1}{h} \int_{[x,x+h]} |f(t)|\ dt > \lambda \} ) \leq \frac{1}{\lambda} \int_{\bf R} |f(t)|\ dt \ \ \ \ \ (7)$

Fix ${[a,b]}$ . We apply the rising sun lemma to the function ${F: [a,b] \rightarrow {\bf R}}$ defined as

$\displaystyle F(x) := \int_{[a,x]} |f(t)|\ dt - (x-a) \lambda.$

By Exercise 16, ${F}$ is continuous, and so we can find an at most countable sequence of intervals ${I_n = (a_n,b_n)}$ with the properties given by the rising sun lemma. From the second property of that lemma, we observe that

$\displaystyle \{ x \in (a,b]: \sup_{h>0; [x,x+h] \subset [a,b]} \frac{1}{h} \int_{[x,x+h]} |f(t)|\ dt > \lambda \} \subset \bigcup_n I_n,$

since the property ${\frac{1}{h} \int_{[x,x+h]} |f(t)|\ dt > \lambda}$ can be rearranged as ${F(x+h) > F(x)}$ . By countable additivity, we may thus upper bound the left-hand side of (7) by ${\sum_n (b_n-a_n)}$ . On the other hand, since ${F(b_n)-F(a_n) \geq 0}$ , we have

$\displaystyle \int_{I_n} |f(t)|\ dt \geq \lambda (b_n-a_n)$

and thus

$\displaystyle \sum_n (b_n-a_n) \leq \frac{1}{\lambda} \sum_n \int_{I_n} |f(t)|\ dt.$

As the ${I_n}$ are disjoint intervals in ${[a,b]}$ , we may apply monotone convergence and monotonicity to conclude that

$\displaystyle \sum_n \int_{I_n} |f(t)|\ dt \leq \int_{[a,b]} |f(t)|\ dt,$

and the claim follows.
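
As a sanity check on Lemma 25, one can test it on the simplest example ${f = 1_{[0,1]}}$ . The following sketch approximates the one-sided maximal function by a brute-force search over a grid of ${h}$ , and the measure of the level set by counting grid points; the grid spacing, the ranges of ${x}$ and ${h}$ , the threshold ${\lambda = 1/4}$ and the function names are assumptions made for the example.

```python
def right_average(x, h):
    """(1/h) * m([x, x+h] intersected with [0, 1]): right average of 1_{[0,1]}."""
    overlap = max(0.0, min(x + h, 1.0) - max(x, 0.0))
    return overlap / h

def maximal_function(x, hs):
    """Grid approximation to the supremum over h > 0 of the right averages at x."""
    return max(right_average(x, h) for h in hs)

dx = 0.01
hs = [k * dx for k in range(1, 1001)]       # h ranges over (0, 10]
xs = [-5.0 + i * dx for i in range(700)]    # x ranges over [-5, 2)

lam = 0.25
level_set_measure = dx * sum(1 for x in xs if maximal_function(x, hs) >= lam)
bound = 1.0 / lam   # (1/lambda) * integral of |f|, where the integral equals 1
```

In this example the maximal function can be computed by hand: it equals ${1}$ on ${[0,1)}$ , ${1/(1-x)}$ for ${x<0}$ , and ${0}$ for ${x \geq 1}$ , so for ${0 < \lambda \leq 1}$ the level set is ${[1-1/\lambda, 1)}$ , of measure exactly ${1/\lambda}$ . Thus the inequality in Lemma 25 is attained with equality here, and the grid computation should return a value close to ${4}$ .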

Exercise 29 (Two-sided Hardy-Littlewood maximal inequality) Let ${f: {\bf R} \rightarrow {\bf C}}$ be an absolutely integrable function, and let ${\lambda > 0}$ . Show that

$\displaystyle m( \{ x \in {\bf R}: \sup_{x \in I} \frac{1}{|I|} \int_{I} |f(t)|\ dt \geq \lambda \} ) \leq \frac{2}{\lambda} \int_{\bf R} |f(t)|\ dt,$

where the supremum ranges over all intervals ${I}$ of positive length that contain ${x}$ .

Exercise 30 (Rising sun inequality) Let ${f: {\bf R} \rightarrow {\bf R}}$ be an absolutely integrable function, and let ${f^*: {\bf R} \rightarrow {\bf R}}$ be the one-sided signed Hardy-Littlewood maximal function

$\displaystyle f^*(x) := \sup_{h>0} \frac{1}{h} \int_{[x,x+h]} f(t)\ dt.$

Establish the rising sun inequality

$\displaystyle \lambda m( \{ f^*(x) > \lambda \} ) \leq \int_{x: f^*(x) > \lambda} f(x)\ dx$

for all real ${\lambda}$ (note here that we permit ${\lambda}$ to be zero or negative), and show that this inequality implies Lemma 25. (Hint: First do the ${\lambda=0}$ case, by invoking the rising sun lemma.) See these lecture notes for some further discussion of inequalities of this type, and applications to ergodic theory (and in particular the maximal ergodic theorem).

Exercise 31 Show that the left and right-hand sides in Exercise 30 are in fact equal when ${\lambda>0}$ . (Hint: one may first wish to try this in the case when ${f}$ has compact support, in which case one can apply the rising sun lemma to a sufficiently large interval containing the support of ${f}$ .)

— 2. The Lebesgue differentiation theorem in higher dimensions —

Now we extend the Lebesgue differentiation theorem to higher dimensions. Theorem 15 does not have an obvious high-dimensional analogue, but Theorem 17 does:

Theorem 32 (Lebesgue differentiation theorem in high dimensions) Let ${f: {\bf R}^d \rightarrow {\bf C}}$ be an absolutely integrable function. Then for almost every ${x \in {\bf R}^d}$ , one has

$\displaystyle \lim_{r \rightarrow 0} \frac{1}{m(B(x,r))} \int_{B(x,r)} |f(y) - f(x)|\ dy = 0 \ \ \ \ \ (8)$

and

$\displaystyle \lim_{r \rightarrow 0} \frac{1}{m(B(x,r))} \int_{B(x,r)} f(y)\ dy = f(x),$

where ${B(x,r) := \{ y \in {\bf R}^d: |x-y| < r \}}$ is the open ball of radius ${r}$ centred at ${x}$ .

From the triangle inequality we see that

$\displaystyle |\frac{1}{m(B(x,r))} \int_{B(x,r)} f(y)\ dy - f(x)|$

$\displaystyle = |\frac{1}{m(B(x,r))} \int_{B(x,r)} (f(y) - f(x))\ dy|$

$\displaystyle \leq \frac{1}{m(B(x,r))} \int_{B(x,r)} |f(y) - f(x)|\ dy,$

so we see that the first conclusion of Theorem 32 implies the second. A point ${x}$ for which (8) holds is called a Lebesgue point of ${f}$ ; thus, for an absolutely integrable function ${f}$ , almost every point in ${{\bf R}^d}$ will be a Lebesgue point for ${f}$ .

Exercise 33 Call a function ${f: {\bf R}^d \rightarrow {\bf C}}$ locally integrable if, for every ${x \in {\bf R}^d}$ , there exists an open neighbourhood of ${x}$ on which ${f}$ is absolutely integrable.

Show that ${f}$ is locally integrable if and only if ${\int_{B(0,r)} |f(x)|\ dx < \infty}$ for all ${r>0}$ .

Show that Theorem 32 implies a generalisation of itself in which the condition of absolute integrability of ${f}$ is weakened to local integrability.

Exercise 34 For each ${h>0}$ , let ${E_h}$ be a subset of ${B(0,h)}$ with the property that ${m(E_h) \geq c m(B(0,h))}$ for some ${c>0}$ independent of ${h}$ . Show that if ${f: {\bf R}^d \rightarrow {\bf C}}$ is locally integrable, and ${x}$ is a Lebesgue point of ${f}$ , then

$\displaystyle \lim_{h \rightarrow 0} \frac{1}{m(E_h)} \int_{x+E_h} f(y)\ dy = f(x).$

Conclude that Theorem 32 implies Theorem 17.

To prove Theorem 32, we use the density argument. The dense subclass case is easy:

Exercise 35 Show that Theorem 32 holds whenever ${f}$ is continuous.

The quantitative estimate needed is the following:

Theorem 36 (Hardy-Littlewood maximal inequality) Let ${f: {\bf R}^d \rightarrow {\bf C}}$ be an absolutely integrable function, and let ${\lambda > 0}$ . Then

$\displaystyle m( \{ x \in {\bf R}^d: \sup_{r>0} \frac{1}{m(B(x,r))} \int_{B(x,r)} |f(y)|\ dy \geq \lambda \} )$

$\displaystyle \leq \frac{C_d}{\lambda} \int_{{\bf R}^d} |f(t)|\ dt$

for some constant ${C_d>0}$ depending only on ${d}$ .

Remark 37 The expression ${\sup_{r>0} \frac{1}{m(B(x,r))} \int_{B(x,r)} |f(y)|\ dy}$ is known as the Hardy-Littlewood maximal function of ${f}$ , and is often denoted ${Mf(x)}$ . It is an important function in the field of (real-variable) harmonic analysis.
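As a sanity check on the maximal inequality, one can compute the maximal function by hand in a simple case. The following worked example (our own choice of ${f}$ , in dimension ${d=1}$ ) shows that the bound of Theorem 36 holds with room to spare here, and also that ${Mf}$ decays only like ${1/|x|}$ , so it need not be absolutely integrable even when ${f}$ is.

```latex
\begin{align*}
f &:= 1_{[-1,1]}, \qquad d = 1, \\
Mf(x) &= \sup_{r>0} \frac{m( (x-r,x+r) \cap [-1,1] )}{2r}
  = \begin{cases} 1, & |x| < 1, \\ \dfrac{1}{1+|x|}, & |x| \geq 1 \end{cases}
  \qquad \text{(for } |x| \geq 1 \text{ the supremum is attained at } r = 1+|x| \text{)}, \\
m( \{ x \in {\bf R}: Mf(x) \geq \lambda \} ) &= 2\left( \frac{1}{\lambda} - 1 \right)
  \leq \frac{2}{\lambda} = \frac{1}{\lambda} \int_{\bf R} |f(t)|\ dt
  \qquad (0 < \lambda \leq \tfrac{1}{2}).
\end{align*}
```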

Exercise 38 Use the density argument to show that Theorem 36 implies Theorem 32.

In the one-dimensional case, this estimate was established via the rising sun lemma. Unfortunately, that lemma relied heavily on the ordered nature of ${{\bf R}}$ , and does not have an obvious analogue in higher dimensions. Instead, we will use the following covering lemma. Given an open ball ${B = B(x,r)}$ in ${{\bf R}^d}$ and a real number ${c > 0}$ , we write ${cB := B(x,cr)}$ for the ball with the same centre as ${B}$ , but ${c}$ times the radius. (Note that this is slightly different from the set ${c \cdot B := \{ cy: y \in B \}}$ – why?) Note that ${|cB| = c^d |B|}$ for any open ball ${B \subset {\bf R}^d}$ and any ${c>0}$ .

Lemma 39 (Vitali-type covering lemma) Let ${B_1,\ldots,B_n}$ be a finite collection of open balls in ${{\bf R}^d}$ (not necessarily disjoint). Then there exists a subcollection ${B'_1,\ldots,B'_m}$ of disjoint balls in this collection, such that

$\displaystyle \bigcup_{i=1}^n B_i \subset \bigcup_{j=1}^m 3 B'_j. \ \ \ \ \ (9)$

In particular, by finite subadditivity,

$\displaystyle m( \bigcup_{i=1}^n B_i ) \leq 3^d \sum_{j=1}^m m(B'_j).$

Proof: We use a greedy algorithm argument, selecting the balls ${B'_i}$ to be as large as possible while remaining disjoint. More precisely, we run the following algorithm:

Step 0. Initialise ${m=0}$ (so that, initially, there are no balls ${B'_1,\ldots,B'_m}$ in the desired collection).
Step 1. Look at all the balls ${B_j}$ that do not already intersect one of the ${B'_1,\ldots,B'_m}$ (which, initially, will be all the balls ${B_1,\ldots,B_n}$ ). If there are no such balls, STOP. Otherwise, go on to Step 2.
Step 2. Locate the largest ball ${B_j}$ that does not already intersect one of the ${B'_1,\ldots,B'_m}$ . (If there are multiple largest balls with exactly the same radius, break the tie arbitrarily.) Add this ball to the collection ${B'_1,\ldots,B'_m}$ by setting ${B'_{m+1} := B_j}$ and then incrementing ${m}$ to ${m+1}$ . Then return to Step 1.

Note that at each iteration of this algorithm, the number of available balls amongst the ${B_1,\ldots,B_n}$ drops by at least one (since each ball selected certainly intersects itself and so cannot be selected again). So this algorithm terminates in finite time. It is also clear from construction that the ${B'_1,\ldots,B'_m}$ are a subcollection of the ${B_1,\ldots,B_n}$ consisting of disjoint balls. So the only task remaining is to verify that (9) holds at the completion of the algorithm, i.e. to show that each ball ${B_i}$ in the original collection is covered by the triples ${3B'_j}$ of the subcollection.
For this, we argue as follows. Take any ball ${B_i}$ in the original collection. Because the algorithm only halts when there are no more balls that are disjoint from the ${B'_1,\ldots,B'_m}$ , the ball ${B_i}$ must intersect at least one of the balls ${B'_j}$ in the subcollection. Let ${B'_j}$ be the first ball with this property, thus ${B_i}$ is disjoint from ${B'_1,\ldots,B'_{j-1}}$ , but intersects ${B'_j}$ . Because ${B'_j}$ was chosen to be largest amongst all balls that did not intersect ${B'_1,\ldots,B'_{j-1}}$ , we conclude that the radius of ${B_i}$ cannot exceed that of ${B'_j}$ . From the triangle inequality, this implies that ${B_i \subset 3B'_j}$ , and the claim follows. $\Box$
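The greedy selection in the proof can be sketched in a few lines of code. The following is only an illustration under our own conventions (balls are stored as hypothetical `(centre, radius)` pairs, and the function names are ours): processing the balls once in order of decreasing radius selects the same subcollection as Steps 0-2, up to the arbitrary tie-breaking.

```python
from math import dist

def vitali_greedy(balls):
    """balls: list of (centre, radius) with centre a tuple of floats and radius > 0.
    Returns the indices of the selected pairwise disjoint balls, in selection order."""
    # Visit balls from largest to smallest radius; ties are broken by list order.
    order = sorted(range(len(balls)), key=lambda i: -balls[i][1])
    chosen = []
    for i in order:
        c, r = balls[i]
        # Open balls B(c, r) and B(c2, r2) are disjoint iff |c - c2| >= r + r2.
        if all(dist(c, balls[j][0]) >= r + balls[j][1] for j in chosen):
            chosen.append(i)
    return chosen

def covered_by_triples(balls, chosen):
    """Check (9): every B_i lies in some 3B'_j.
    B(c, r) is contained in B(c2, 3*r2) iff |c - c2| + r <= 3*r2."""
    return all(
        any(dist(c, balls[j][0]) + r <= 3 * balls[j][1] for j in chosen)
        for c, r in balls
    )

# A small planar example (hypothetical data).
balls = [((0.0, 0.0), 1.0), ((1.5, 0.0), 1.0), ((3.0, 0.0), 0.5), ((10.0, 0.0), 2.0)]
selected = vitali_greedy(balls)
```

In this example the second ball is discarded because it meets the first, but it is still contained in the tripled first ball, exactly as in the triangle inequality step of the proof.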

Exercise 40 Technically speaking, the above algorithmic argument was not phrased in the standard language of formal mathematical deduction, because in that language, any mathematical object (such as the natural number ${m}$ ) can only be defined once, and not redefined multiple times as is done in most algorithms. Rewrite the above argument in a way that avoids redefining any variable. (Hint: introduce a “time” variable ${t}$ , and recursively construct families ${B'_{1,t},\ldots,B'_{m_t,t}}$ of balls that represent the outcome of the above algorithm after ${t}$ iterations (or ${t_*}$ iterations, if the algorithm halted at some previous time ${t_* < t}$ ). For this particular algorithm, there are also more ad hoc approaches that exploit the relatively simple nature of the algorithm to allow for a less notationally complicated construction.) More generally, it is possible to use this time parameter trick to convert any construction involving a provably terminating algorithm into a construction that does not redefine any variable. (It is however dangerous to work with any algorithm that has an infinite run time, unless one has a suitably strong convergence result for the algorithm that allows one to take limits, either in the classical sense or in the more general sense of jumping to limit ordinals; in the latter case, one needs to use transfinite induction in order to ensure that the use of such algorithms is rigorous.)

Remark 41 The actual Vitali covering lemma is slightly different to this one, as the linked Wikipedia page shows. Actually there is a family of related covering lemmas which are useful for a variety of tasks in harmonic analysis, see for instance this book by de Guzmán for further discussion.

Now we can prove the Hardy-Littlewood inequality, which we will do with the constant ${C_d := 3^d}$ . It suffices to verify the claim with strict inequality,

$\displaystyle m( \{ x \in {\bf R}^d: \sup_{r>0} \frac{1}{m(B(x,r))} \int_{B(x,r)} |f(y)|\ dy > \lambda \} ) \leq \frac{C_d}{\lambda} \int_{{\bf R}^d} |f(t)|\ dt$

as the non-strict case then follows by perturbing ${\lambda}$ slightly and then taking limits.
Fix ${f}$ and ${\lambda}$ . By inner regularity, it suffices to show that

$\displaystyle m( K )\leq \frac{3^d}{\lambda} \int_{{\bf R}^d} |f(t)|\ dt$

whenever ${K}$ is a compact set that is contained in ${\{ x \in {\bf R}^d: \sup_{r>0} \frac{1}{m(B(x,r))} \int_{B(x,r)} |f(y)|\ dy > \lambda \}}$ .
By construction, for every ${x \in K}$ , there exists an open ball ${B(x,r)}$ such that

$\displaystyle \frac{1}{m(B(x,r))} \int_{B(x,r)} |f(y)|\ dy > \lambda. \ \ \ \ \ (10)$

By compactness of ${K}$ , we can cover ${K}$ by a finite number ${B_1,\ldots,B_n}$ of such balls. Applying the Vitali-type covering lemma, we can find a subcollection ${B'_1,\ldots,B'_m}$ of disjoint balls such that

$\displaystyle m( \bigcup_{i=1}^n B_i ) \leq 3^d \sum_{j=1}^m m(B'_j).$

By (10), on each ball ${B'_j}$ we have

$\displaystyle m(B'_j) < \frac{1}{\lambda} \int_{B'_j} |f(y)|\ dy;$

summing in ${j}$ and using the disjointness of the ${B'_j}$ we conclude that

$\displaystyle m( \bigcup_{i=1}^n B_i ) \leq \frac{3^d}{\lambda} \int_{{\bf R}^d} |f(y)|\ dy.$

Since the ${B_1,\ldots,B_n}$ cover ${K}$ , we obtain Theorem 36 as desired.

Exercise 42 Improve the constant ${3^d}$ in the Hardy-Littlewood maximal inequality to ${2^d}$ . (Hint: observe that with the construction used to prove the Vitali covering lemma, the centres of the balls ${B_i}$ are contained in ${\bigcup_{j=1}^m 2B'_j}$ and not just in ${\bigcup_{j=1}^m 3B'_j}$ . To exploit this observation one may need to first create an epsilon of room, as the centers are not by themselves sufficient to cover the required set.)

Remark 43 The optimal value of ${C_d}$ is not known in general, although a fairly recent result of Melas gives the surprising conclusion that the optimal value of ${C_1}$ is ${C_1 = \frac{11+\sqrt{61}}{12} = 1.56\ldots}$ . It is known that ${C_d}$ grows at most linearly in ${d}$ , thanks to a result of Stein and Strömberg, but it is not known if ${C_d}$ is bounded in ${d}$ or grows as ${d \rightarrow \infty}$ . See this blog post for some further discussion.

Exercise 44 (Dyadic maximal inequality) If ${f: {\bf R}^d \rightarrow {\bf C}}$ is an absolutely integrable function, establish the dyadic Hardy-Littlewood maximal inequality

$\displaystyle m( \{ x \in {\bf R}^d: \sup_{x \in Q} \frac{1}{|Q|} \int_{Q} |f(y)|\ dy \geq \lambda \} ) \leq \frac{1}{\lambda} \int_{{\bf R}^d} |f(t)|\ dt$

where the supremum ranges over all dyadic cubes ${Q}$ that contain ${x}$ . (Hint: the nesting property of dyadic cubes will be useful when it comes to the covering lemma stage of the argument, much as it was in Exercise 8 of Notes 1.)
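To get a feel for why the dyadic inequality holds with constant ${1}$ , one can experiment with a discrete analogue. The sketch below is an assumption-laden toy model, not the exercise itself: ${f}$ is a finite list of length ${2^k}$ , Lebesgue measure is replaced by counting measure, and only the dyadic blocks inside the list are used.

```python
from fractions import Fraction

def dyadic_maximal(f):
    """f: list of numbers whose length is a power of two.
    Returns Mf, where Mf[i] is the largest average of |f| over a dyadic block
    [j * 2**s, (j + 1) * 2**s) that contains the index i."""
    n = len(f)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    absf = [Fraction(abs(v)) for v in f]
    best = absf[:]      # blocks of length 1
    sums = absf[:]      # block sums for the current generation of blocks
    size = 1
    while size < n:
        size *= 2
        # Each block of the next generation is the union of two sibling blocks.
        sums = [sums[2 * j] + sums[2 * j + 1] for j in range(len(sums) // 2)]
        for i in range(n):
            best[i] = max(best[i], sums[i // size] / size)
    return best

def weak_type_holds(f, lam):
    """Check lam * #{i : Mf[i] >= lam} <= sum |f| (counting measure, constant 1)."""
    level_set = sum(1 for v in dyadic_maximal(f) if v >= lam)
    return lam * level_set <= sum(abs(v) for v in f)

example = [0, 0, 8, 0, 0, 0, 0, 0]
```

The reason the constant is ${1}$ is visible in the data: the maximal blocks on which the average is at least ${\lambda}$ are automatically disjoint, by the nesting property, so no covering lemma loss occurs.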

Exercise 45 (Besicovitch covering lemma in one dimension) Let ${I_1,\ldots,I_n}$ be a finite family of open intervals in ${{\bf R}}$ (not necessarily disjoint). Show that there exist a subfamily ${I'_1,\ldots,I'_m}$ of intervals such that

${\bigcup_{i=1}^n I_i = \bigcup_{j=1}^m I'_j}$ ; and

Each point ${x \in {\bf R}}$ is contained in at most two of the ${I'_j}$ .

(Hint: First refine the family of intervals so that no interval ${I_i}$ is contained in the union of the other intervals. At that point, show that it is no longer possible for a point to be contained in three of the intervals.) There is a variant of this lemma that holds in higher dimensions, known as the Besicovitch covering lemma.
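The refinement step in the hint is easy to experiment with. The following sketch (function names and the test data are our own) repeatedly discards any open interval that is contained in the union of the remaining ones, and then measures the largest number of surviving intervals through a common point.

```python
def union_components(intervals):
    """Connected components of a finite union of open intervals (a, b) with a < b."""
    comps = []
    for a, b in sorted(intervals):
        # Open intervals merge only if they overlap: (0, 1) and (1, 2) miss the point 1.
        if comps and a < comps[-1][1]:
            comps[-1][1] = max(comps[-1][1], b)
        else:
            comps.append([a, b])
    return comps

def contained_in_union(interval, others):
    """An interval is connected, so it lies in the union iff it lies in one component."""
    a, b = interval
    return any(c <= a and b <= d for c, d in union_components(others))

def besicovitch_refine(intervals):
    """Discard redundant intervals one at a time until none is covered by the others."""
    kept = list(intervals)
    changed = True
    while changed:
        changed = False
        for k in range(len(kept)):
            rest = kept[:k] + kept[k + 1:]
            if contained_in_union(kept[k], rest):
                kept, changed = rest, True
                break
    return kept

def max_overlap(intervals):
    """Largest number of the given open intervals that contain a common point."""
    pts = sorted({p for iv in intervals for p in iv})
    samples = pts + [(p + q) / 2 for p, q in zip(pts, pts[1:])]
    return max((sum(a < x < b for a, b in intervals) for x in samples), default=0)

family = [(0, 4), (1, 3), (2, 6), (5, 8), (3, 7), (0, 1)]
refined = besicovitch_refine(family)
```

Each removal leaves the union unchanged, so the refined family has the same union as the original one; the content of the exercise is to prove that the overlap count can then never exceed two.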

Exercise 46 Let ${\mu}$ be a Borel measure (i.e. a countably additive measure on the Borel ${\sigma}$ -algebra) on ${{\bf R}}$ , such that ${0 < \mu(I) < \infty}$ for every interval ${I}$ of positive length. Assume that ${\mu}$ is inner regular, in the sense that ${\mu(E) = \sup_{K \subset E, \hbox{ compact}} \mu(K)}$ for every Borel measurable set ${E}$ . (As it turns out, from the theory of Radon measures, all locally finite Borel measures have this property, but we will not prove this here; see Exercise 12 of these notes.) Establish the Hardy-Littlewood maximal inequality

$\displaystyle \mu( \{ x \in {\bf R}: \sup_{x \in I} \frac{1}{\mu(I)} \int_{I} |f(y)| \ d\mu(y) \geq \lambda \} ) \leq \frac{2}{\lambda} \int_{\bf R} |f(y)|\ d\mu(y)$

for any absolutely integrable function ${f \in L^1(\mu)}$ , where the supremum ranges over all open intervals ${I}$ that contain ${x}$ . Note that this essentially generalises Exercise 29, in which ${\mu}$ is replaced by Lebesgue measure. (Hint: Repeat the proof of the usual Hardy-Littlewood maximal inequality, but use the Besicovitch covering lemma in place of the Vitali-type covering lemma. Why do we need the former lemma here instead of the latter?)

Exercise 47 (Cousin’s theorem) Prove Cousin’s theorem: given any function ${\delta: [a,b] \rightarrow (0,+\infty)}$ on a compact interval ${[a,b]}$ of positive length, there exists a partition ${a = t_0 < t_1 < \ldots < t_k = b}$ with ${k \geq 1}$ , together with real numbers ${t^*_j \in [t_{j-1},t_j]}$ for each ${1 \leq j \leq k}$ , such that ${t_j - t_{j-1} \leq \delta(t^*_j)}$ for each such ${j}$ . (Hint: use the Heine-Borel theorem, which asserts that any open cover of ${[a,b]}$ has a finite subcover, followed by the Besicovitch covering lemma.) This theorem is useful in a variety of applications related to the second fundamental theorem of calculus, as we shall see below. The positive function ${\delta}$ is known as a gauge function.
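For concrete gauges one can often produce such a tagged partition mechanically. The sketch below uses repeated bisection, which is a different classical route to Cousin's theorem than the one suggested in the hint. It is only a heuristic: as an assumption of the sketch, tags are searched for only among the endpoints and the midpoint of each piece, so the search is not guaranteed to succeed for an arbitrary gauge, and the gauge used in the example is hypothetical.

```python
from fractions import Fraction

def delta_fine_partition(delta, a, b, max_depth=60):
    """Return a list of triples (t_prev, t_next, tag) with tag in [t_prev, t_next]
    and t_next - t_prev <= delta(tag), obtained by repeated bisection of [a, b]."""
    def build(lo, hi, depth):
        # Assumption of this sketch: only endpoints and midpoint are tried as tags.
        for tag in (lo, (lo + hi) / 2, hi):
            if hi - lo <= delta(tag):
                return [(lo, hi, tag)]
        if depth == 0:
            raise RuntimeError("no admissible tag found among the candidates")
        mid = (lo + hi) / 2
        return build(lo, mid, depth - 1) + build(mid, hi, depth - 1)
    return build(a, b, max_depth)

def gauge(t):
    """Hypothetical gauge on [0, 1]: small near 0 but not at 0, so 0 is forced to be a tag."""
    return Fraction(1, 10) if t == 0 else t / 2

pieces = delta_fine_partition(gauge, Fraction(0), Fraction(1))
```

With this gauge the pieces shrink geometrically towards the origin, and the leftmost piece is tagged by ${0}$ itself; exact rational arithmetic is used so that the inequalities are checked without rounding.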

Now we turn to consequences of the Lebesgue differentiation theorem. Given a Lebesgue measurable set ${E \subset {\bf R}^d}$ , call a point ${x \in {\bf R}^d}$ a point of density for ${E}$ if ${\frac{m(E \cap B(x,r))}{m(B(x,r))} \rightarrow 1}$ as ${r \rightarrow 0}$ . Thus, for instance, if ${E = [-1,1] \backslash \{0\}}$ , then every point in ${(-1,1)}$ (including the boundary point ${0}$ ) is a point of density for ${E}$ , but the endpoints ${-1, 1}$ (as well as the exterior of ${E}$ ) are not points of density. One can think of a point of density as being an “almost interior” point of ${E}$ ; it is not necessarily the case that one can fit a small ball ${B(x,r)}$ centred at ${x}$ inside of ${E}$ , but one can fit most of that small ball inside ${E}$ .
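The example ${E = [-1,1] \backslash \{0\}}$ can be checked numerically. In the minimal sketch below (the representation of ${E}$ as a list of intervals and the function name are our own), the density ratio at small radius is close to ${1}$ at the origin, close to ${1/2}$ at the endpoint ${1}$ , and equal to ${0}$ at an exterior point.

```python
def density_ratio(intervals, x, r):
    """m(E ∩ B(x, r)) / m(B(x, r)) in one dimension, where E is a finite union of
    disjoint intervals listed by their endpoints (a, b). Deleting finitely many
    points from E does not change the measure, so they are ignored."""
    lo, hi = x - r, x + r
    overlap = sum(max(0.0, min(b, hi) - max(a, lo)) for a, b in intervals)
    return overlap / (2 * r)

# E = [-1, 1] \ {0}, written as two intervals (equal to E up to a null set).
E = [(-1.0, 0.0), (0.0, 1.0)]
ratios = [density_ratio(E, x, 1e-3) for x in (0.0, 1.0, 2.0)]
```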

Exercise 48 If ${E \subset {\bf R}^d}$ is Lebesgue measurable, show that almost every point in ${E}$ is a point of density for ${E}$ , and almost every point in the complement of ${E}$ is not a point of density for ${E}$ .

Exercise 49 Let ${E \subset {\bf R}^d}$ be a measurable set of positive measure, and let ${\varepsilon > 0}$ .

Using Exercise 34 and Exercise 48, show that there exists a cube ${Q \subset {\bf R}^d}$ of positive sidelength such that ${m(E \cap Q) > (1-\varepsilon) m(Q)}$ .

Give an alternate proof of the above claim that avoids the Lebesgue differentiation theorem. (Hint: reduce to the case when ${E}$ is bounded, then approximate ${E}$ by an almost disjoint union of cubes.)

Use the above result to give an alternate proof of the Steinhaus theorem (Exercise 22).

Of course, one can replace cubes here by other comparable shapes, such as balls. (Indeed, a good principle to adopt in analysis is that cubes and balls are “equivalent up to constants”, in that a cube of some sidelength can be contained in a ball of comparable radius, and vice versa. This type of mental equivalence is analogous to, though not identical with, the famous dictum that a topologist cannot distinguish a doughnut from a coffee cup.)

Exercise 50

Give an example of a compact set ${K \subset {\bf R}}$ of positive measure such that ${m(K \cap I) < |I|}$ for every interval ${I}$ of positive length. (Hint: first construct an open dense subset of ${[0,1]}$ of measure strictly less than ${1}$ .)

Give an example of a measurable set ${E \subset {\bf R}}$ such that ${0 < m(E \cap I) < |I|}$ for every interval ${I}$ of positive length. (Hint: first work in a bounded interval, such as ${(-1,2)}$ . The complement of the set ${K}$ in the first example is the union of at most countably many open intervals, thanks to Exercise 28. Now fill in these open intervals and iterate.)

Exercise 51 (Approximations to the identity) Define a good kernel to be a measurable function ${P: {\bf R}^d \rightarrow {\bf R}^+}$ which is non-negative, radial (which means that there is a function ${\tilde P: [0,+\infty) \rightarrow {\bf R}^+}$ such that ${P(x) = \tilde P(|x|)}$ ), radially non-increasing (so that ${\tilde P}$ is a non-increasing function), and has total mass ${\int_{{\bf R}^d} P(x)\ dx}$ equal to ${1}$ . The functions ${P_t(x) := \frac{1}{t^d} P(\frac{x}{t})}$ for ${t>0}$ are then said to be a good family of approximations to the identity.

Show that the heat kernels ${P_t(x) := \frac{1}{(4\pi t^2)^{d/2}} e^{-|x|^2/4t^2}}$ and Poisson kernels ${P_t(x) := c_d \frac{t}{(t^2+|x|^2)^{(d+1)/2}}}$ are good families of approximations to the identity, if the constant ${c_d > 0}$ is chosen correctly (in fact one has ${c_d = \Gamma((d+1)/2)/\pi^{(d+1)/2}}$ , but you are not required to establish this). (Note that we have modified the usual formulation of the heat kernel by replacing ${t}$ with ${t^2}$ in order to make it conform to the notational conventions used in this exercise.)

Show that if ${P}$ is a good kernel, then
$\displaystyle c_d < \sum_{n=-\infty}^\infty 2^{dn} \tilde P(2^n) \leq C_d$

for some constants ${0 < c_d < C_d}$ depending only on ${d}$ . (Hint: compare ${P}$ with such “horizontal wedding cake” functions as ${\sum_{n=-\infty}^\infty 1_{2^{n-1}< |x| \leq 2^n} \tilde P(2^n)}$ .)

Establish the quantitative upper bound
$\displaystyle |\int_{{\bf R}^d} f(y) P_t(x-y)\ dy| \leq C'_d \sup_{r>0} \frac{1}{|B(x,r)|} \int_{B(x,r)} |f(y)|\ dy$

for any absolutely integrable function ${f}$ and some constant ${C'_d > 0}$ depending only on ${d}$ .

Show that if ${f: {\bf R}^d \rightarrow {\bf C}}$ is absolutely integrable and ${x}$ is a Lebesgue point of ${f}$ , then the convolution
$\displaystyle f*P_t(x) := \int_{{\bf R}^d} f(y) P_t(x-y)\ dy$

converges to ${f(x)}$ as ${t \rightarrow 0}$ . (Hint: split ${f(y)}$ as the sum of ${f(x)}$ and ${f(y)-f(x)}$ .) In particular, ${f*P_t}$ converges pointwise almost everywhere to ${f}$ .
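The last part of this exercise can be illustrated numerically in one dimension with the heat kernel from the first part. The sketch below is only an illustration with our own choice of test function ${f = 1_{[0,1]}}$ , for which the convolution has a closed form in terms of the error function. At the Lebesgue point ${x = 1/2}$ the convolution tends to ${f(1/2) = 1}$ , while at ${x = 0}$ (which is not a Lebesgue point) it tends to ${1/2}$ whatever value is assigned to ${f(0)}$ .

```python
from math import erf, exp, pi, sqrt

def heat_kernel(x, t):
    """One-dimensional kernel P_t(x) = (4 pi t^2)^(-1/2) exp(-x^2 / (4 t^2))."""
    return exp(-x * x / (4 * t * t)) / sqrt(4 * pi * t * t)

def total_mass(t, half_width=20.0, steps=20000):
    """Midpoint-rule approximation of the integral of P_t over
    [-half_width * t, half_width * t]; the tails beyond that are negligible."""
    h = 2 * half_width * t / steps
    return h * sum(heat_kernel(-half_width * t + (k + 0.5) * h, t) for k in range(steps))

def smoothed_indicator(x, t):
    """(1_[0,1] * P_t)(x), i.e. the integral of P_t(x - y) over 0 <= y <= 1.
    P_t is a Gaussian density with variance 2 t^2, which gives the erf formula."""
    return 0.5 * (erf(x / (2 * t)) - erf((x - 1) / (2 * t)))

mass = total_mass(0.1)
at_half = smoothed_indicator(0.5, 0.01)
at_zero = smoothed_indicator(0.0, 0.01)
```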

— 3. Almost everywhere differentiability —

As we see in undergraduate real analysis, not every continuous function ${f: {\bf R} \rightarrow {\bf R}}$ is differentiable, with the standard example being the absolute value function ${f(x) := |x|}$ , which is continuous but not differentiable at the origin ${x=0}$ . Of course, this function is still almost everywhere differentiable. With a bit more effort, one can construct continuous functions that are in fact nowhere differentiable:

Exercise 52 (Weierstrass function) Let ${F: {\bf R} \rightarrow {\bf R}}$ be the function

$\displaystyle F(x) := \sum_{n=1}^\infty 4^{-n} \cos(16^n \pi x).$

Show that ${F}$ is well-defined (in the sense that the series is absolutely convergent) and that ${F}$ is a bounded continuous function.

Show that for every interval ${[\frac{j}{16^m}, \frac{j+1}{16^m}]}$ with ${m \geq 1}$ and ${j}$ integer, one has ${|F(\frac{j+1}{16^m}) - F(\frac{j}{16^m})| \geq c 4^{-m}}$ for some absolute constant ${c>0}$ .

Show that ${F}$ is not differentiable at any point ${x \in {\bf R}}$ . (Hint: argue by contradiction and use the previous part of this exercise.) Note that it is not enough to formally differentiate the series term by term and observe that the resulting series is divergent – why not?
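Before attempting the second part of this exercise, it can be helpful to look at the increments numerically. The sketch below (function names are ours) evaluates ${F}$ at the points ${j/16^m}$ , using the fact that the terms with ${n \geq m}$ are exactly ${\pm 4^{-n}}$ there, and reports the increment across ${[\frac{j}{16^m}, \frac{j+1}{16^m}]}$ rescaled by ${4^m}$ . In the range sampled below the rescaled increments stay above ${0.95}$ , which is consistent with the claimed lower bound but of course does not prove it.

```python
from math import cos, pi

def weierstrass_at_dyadic(j, m, extra_terms=40):
    """F(j / 16**m) for F(x) = sum_{n >= 1} 4**-n cos(16**n pi x),
    truncated at n = m + extra_terms."""
    total = 0.0
    for n in range(1, m + extra_terms + 1):
        if n >= m:
            # 16**n * j / 16**m is an integer here, so the cosine is exactly +1 or -1.
            k = j * 16 ** (n - m)
            c = 1.0 if k % 2 == 0 else -1.0
        else:
            # Reduce j modulo the period 2 * 16**(m - n) before calling cos,
            # to keep the floating-point argument small.
            period = 2 * 16 ** (m - n)
            c = cos(pi * (j % period) / 16 ** (m - n))
        total += 4.0 ** (-n) * c
    return total

def scaled_increment(j, m):
    """4**m * |F((j + 1) / 16**m) - F(j / 16**m)|."""
    return 4.0 ** m * abs(weierstrass_at_dyadic(j + 1, m) - weierstrass_at_dyadic(j, m))

smallest = min(scaled_increment(j, m) for m in range(1, 6) for j in range(-20, 21))
```

Since the interval has length ${16^{-m}}$ , an increment of size comparable to ${4^{-m}}$ corresponds to a difference quotient of size comparable to ${4^m}$ , which is what the third part of the exercise exploits.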

The difficulty here is that a continuous function can still contain a large amount of oscillation, which can lead to breakdown of differentiability. However, if one can somehow limit the amount of oscillation present, then one can often recover a fair bit of differentiability. For instance, we have

Theorem 53 (Monotone differentiation theorem) Any function ${F: {\bf R} \rightarrow {\bf R}}$ which is monotone (either monotone non-decreasing or monotone non-increasing) is differentiable almost everywhere.

Exercise 54 Show that every monotone function is measurable.

To prove this theorem, we just treat the case when ${F}$ is monotone non-decreasing, as the non-increasing case is similar (and can be deduced from the non-decreasing case by replacing ${F}$ with ${-F}$ ).
We also first focus on the case when ${F}$ is continuous, as this allows us to use the rising sun lemma. To understand the differentiability of ${F}$ , we introduce the four Dini derivatives of ${F}$ at ${x}$ :

The upper right derivative ${\overline{D^+} F(x) := \limsup_{h \rightarrow 0^+} \frac{F(x+h)-F(x)}{h}}$ ;
The lower right derivative ${\underline{D^+} F(x) := \liminf_{h \rightarrow 0^+} \frac{F(x+h)-F(x)}{h}}$ ;
The upper left derivative ${\overline{D^-} F(x) := \limsup_{h \rightarrow 0^-} \frac{F(x+h)-F(x)}{h}}$ ;
The lower left derivative ${\underline{D^-} F(x) := \liminf_{h \rightarrow 0^-} \frac{F(x+h)-F(x)}{h}}$ .

Regardless of whether ${F}$ is differentiable or not (or even whether ${F}$ is continuous or not), the four Dini derivatives always exist and take values in the extended real line ${[-\infty,\infty]}$ . (If ${F}$ is only defined on an interval ${[a,b]}$ , rather than on all of ${{\bf R}}$ , then some of the Dini derivatives may not exist at the endpoints, but this is a measure zero set and will not impact our analysis.)
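To see how the four Dini derivatives can differ, here are two standard worked examples at the origin (our own choices of ${F}$ ; neither is monotone near ${0}$ ). In the first, the upper and lower derivatives disagree on each side; in the second, each one-sided pair agrees but the left and right values differ.

```latex
\begin{align*}
F(x) &:= x \sin(1/x) \ (x \neq 0), \quad F(0) := 0: &
\frac{F(h)-F(0)}{h} &= \sin(1/h), \\
&& \overline{D^+} F(0) = \overline{D^-} F(0) &= 1, \qquad
\underline{D^+} F(0) = \underline{D^-} F(0) = -1; \\
F(x) &:= |x|: &
\overline{D^+} F(0) = \underline{D^+} F(0) &= 1, \qquad
\overline{D^-} F(0) = \underline{D^-} F(0) = -1.
\end{align*}
```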

Exercise 55 If ${F}$ is monotone, show that the four Dini derivatives of ${F}$ are measurable. (Hint: the main difficulty is to reformulate the derivatives so that ${h}$ ranges over a countable set rather than an uncountable one.)

A function ${F}$ is differentiable at ${x}$ precisely when the four derivatives are equal and finite:

$\displaystyle \overline{D^+} F(x) = \underline{D^+} F(x) = \overline{D^-} F(x) = \underline{D^-} F(x) \in (-\infty,+\infty). \ \ \ \ \ (11)$

We also have the trivial inequalities

$\displaystyle \underline{D^+} F(x) \leq \overline{D^+} F(x); \quad \underline{D^-} F(x) \leq \overline{D^-} F(x).$

If ${F}$ is non-decreasing, all these quantities are non-negative, thus

$\displaystyle 0 \leq \underline{D^+} F(x) \leq \overline{D^+} F(x); \quad 0 \leq \underline{D^-} F(x) \leq \overline{D^-} F(x).$

The one-sided Hardy-Littlewood maximal inequality has an analogue in this setting:

Lemma 56 (One-sided Hardy-Littlewood inequality) Let ${F: [a,b] \rightarrow {\bf R}}$ be a continuous monotone non-decreasing function, and let ${\lambda > 0}$ . Then we have

$\displaystyle m( \{ x \in [a,b]: \overline{D^+} F(x) \geq \lambda \} ) \leq \frac{F(b)-F(a)}{\lambda}.$

Similarly for the other three Dini derivatives of ${F}$ .
If ${F}$ is not assumed to be continuous, then we have the weaker inequality

$\displaystyle m( \{ x \in [a,b]: \overline{D^+} F(x) \geq \lambda \} ) \leq C\frac{F(b)-F(a)}{\lambda}$

for some absolute constant ${C>0}$ .
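As a quick illustration of the first inequality, consider the following worked example with our own choice of ${F}$ , where everything can be computed explicitly.

```latex
\begin{align*}
F(x) &:= x^2 \text{ on } [a,b] = [0,1], \qquad \overline{D^+} F(x) = 2x, \qquad F(b) - F(a) = 1, \\
m( \{ x \in [0,1]: \overline{D^+} F(x) \geq \lambda \} ) &= \max\left( 1 - \frac{\lambda}{2}, 0 \right), \\
\frac{1}{\lambda} - \left( 1 - \frac{\lambda}{2} \right) &= \frac{(\lambda-1)^2 + 1}{2\lambda} > 0 \qquad (\lambda > 0),
\end{align*}
```

so the measure of the level set is indeed at most ${\frac{F(b)-F(a)}{\lambda}}$ for every ${\lambda > 0}$ .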

Remark 57 Note that if one naively applies the fundamental theorems of calculus, one can formally see that the first part of Lemma 56 is equivalent to Lemma 25. We cannot however use this argument rigorously because we have not established the necessary fundamental theorems of calculus to do this. Nevertheless, we can borrow the proof of Lemma 25 without difficulty to use here, and this is exactly what we will do.

Proof: We just prove the continuous case and leave the discontinuous case as an exercise.
It suffices to prove the claim for ${\overline{D^+} F}$ ; by reflection (replacing ${F(x)}$ with ${-F(-x)}$ , and ${[a,b]}$ with ${[-b,-a]}$ ), the same argument works for ${\overline{D^-} F}$ , and then this trivially implies the same inequalities for ${\underline{D^+} F}$ and ${\underline{D^-} F}$ . By modifying ${\lambda}$ by an epsilon, and dropping the endpoints from ${[a,b]}$ as they have measure zero, it suffices to show that

$\displaystyle m( \{ x \in (a,b): \overline{D^+} F(x) > \lambda \} ) \leq \frac{F(b)-F(a)}{\lambda}$

We may apply the rising sun lemma (Lemma 26) to the continuous function ${G(x) := F(x) - \lambda x}$ . This gives us an at most countable family of disjoint intervals ${I_n = (a_n,b_n)}$ in ${(a,b)}$ , such that ${G(b_n) \geq G(a_n)}$ for each ${n}$ , and such that ${G(y) \leq G(x)}$ whenever ${a \leq x \leq y\leq b}$ and ${x}$ lies outside of all of the ${I_n}$ .
Observe that if ${x \in (a,b)}$ , and ${G(y) \leq G(x)}$ for all ${x \leq y \leq b}$ , then ${\overline{D^+} F(x) \leq \lambda}$ . Thus we see that the set ${\{ x \in (a,b): \overline{D^+} F(x) > \lambda \}}$ is contained in the union of the ${I_n}$ , and so by countable additivity

$\displaystyle m( \{ x \in (a,b): \overline{D^+} F(x) > \lambda \} ) \leq \sum_n b_n - a_n.$

But we can rearrange the inequality ${G(b_n) \geq G(a_n)}$ as ${b_n - a_n \leq \frac{F(b_n)-F(a_n)}{\lambda}}$ . From telescoping series and the monotone nature of ${F}$ we have ${\sum_n F(b_n)-F(a_n) \leq F(b)-F(a)}$ (this is easiest to prove by first working with a finite subcollection of the intervals ${(a_n,b_n)}$ , and then taking suprema), and the claim follows.
The discontinuous case is left as an exercise. $\Box$

Exercise 58 Prove Lemma 56 in the discontinuous case. (Hint: the rising sun lemma is no longer available, but one can use either the Vitali-type covering lemma (which will give ${C=3}$ ) or the Besicovitch lemma (which will give ${C=2}$ ), by modifying the proof of Theorem 36.)

Sending ${\lambda \rightarrow \infty}$ in the above lemma (cf. Exercise 18 from Notes 2), and then sending ${[a,b]}$ to ${{\bf R}}$ , we conclude as a corollary that all four Dini derivatives of a continuous monotone non-decreasing function are finite almost everywhere. So to prove Theorem 53 for continuous monotone non-decreasing functions, it suffices to show that (11) holds for almost every ${x}$ . In view of the trivial inequalities, it suffices to show that ${\overline{D^+} F(x) \leq \underline{D^-} F(x)}$ and ${\overline{D^-} F(x) \leq \underline{D^+} F(x)}$ for almost every ${x}$ . We will just show the first inequality, as the second follows by replacing ${F}$ with its reflection ${x \mapsto -F(-x)}$ . It will suffice to show that for every pair ${0 < r < R}$ of real numbers, the set

$\displaystyle E = E_{r,R} := \{ x \in {\bf R}: \overline{D^+} F(x) > R > r > \underline{D^-} F(x) \}$

is a null set, since by letting ${R, r}$ range over rationals with ${R>r > 0}$ and taking countable unions, we would conclude that the set ${\{ x \in {\bf R}: \overline{D^+} F(x) > \underline{D^-} F(x) \}}$ is a null set (recall that the Dini derivatives are all non-negative when ${F}$ is non-decreasing), and the claim follows.
Clearly ${E}$ is a measurable set. To prove that it is null, we will establish the following estimate:

Lemma 59 ( ${E}$ has density less than one) For any interval ${[a,b]}$ and any ${0 < r < R}$ , one has ${m( E_{r,R} \cap [a,b] ) \leq \frac{r}{R} |b-a|}$ .

Indeed, this lemma implies that ${E}$ has no points of density, which by Exercise 48 forces ${E}$ to be a null set.
Proof: We begin by applying the rising sun lemma to the function ${G(x) := r x + F(-x)}$ on ${[-b,-a]}$ ; the large number of negative signs present here is needed in order to properly deal with the lower left Dini derivative ${\underline{D^-} F}$ . This gives an at most countable family of disjoint intervals ${-I_n = (-b_n,-a_n)}$ in ${(-b,-a)}$ , such that ${G(-a_n) \geq G(-b_n)}$ for all ${n}$ , and such that ${G(-x) \geq G(-y)}$ whenever ${-x \leq -y \leq -a}$ and ${-x \in (-b,-a)}$ lies outside of all of the ${-I_n}$ . Observe that if ${x \in (a,b)}$ , and ${G(-x) \geq G(-y)}$ for all ${-x \leq -y \leq -a}$ , then ${\underline{D^-} F(x) \geq r}$ . Thus we see that ${E_{r,R} \cap (a,b)}$ is contained inside the union of the intervals ${I_n = (a_n,b_n)}$ . On the other hand, from the first part of Lemma 56 we have

$\displaystyle m( E_{r,R} \cap (a_n,b_n) ) \leq \frac{F(b_n)-F(a_n)}{R}.$

But we can rearrange the inequality ${G(-a_n) \geq G(-b_n)}$ as ${F(b_n) - F(a_n) \leq r (b_n-a_n)}$ . From countable additivity, one thus has

$\displaystyle m( E_{r,R} \cap [a,b] ) \leq \frac{r}{R} \sum_n b_n - a_n.$

But the ${(a_n,b_n)}$ are disjoint inside ${(a,b)}$ , so from countable additivity again, we have ${\sum_n b_n - a_n \leq b-a}$ , and the claim follows. $\Box$

Remark 60 Note that if ${F}$ were not assumed to be continuous, then one would lose a factor of ${C}$ here from the second part of Lemma 56, and one would then be unable to prevent ${\overline{D^+} F}$ from being up to ${C}$ times as large as ${\underline{D^-} F}$ . So sometimes, even when all one is seeking is a qualitative result such as differentiability, it is still important to keep track of constants. (But this is the exception rather than the rule: for a large portion of arguments in analysis, the constants are not terribly important.)

This concludes the proof of Theorem 53 in the continuous monotone non-decreasing case. Now we work on removing the continuity hypothesis (which was needed in order to make the rising sun lemma work properly). If we naively try to run the density argument as we did in previous sections, then (for once) the argument does not work very well, as the space of continuous monotone functions is not sufficiently dense in the space of all monotone functions in the relevant sense (which, in this case, is in the total variation sense, which is what is needed to invoke such tools as Lemma 56). To bridge this gap, we have to supplement the continuous monotone functions with another class of monotone functions, known as the jump functions.

Definition 61 (Jump function) A basic jump function ${J}$ is a function of the form

$\displaystyle J(x) := \left\{ \begin{array}{ll} 0 & \hbox{ when } x < x_0 \\ \theta & \hbox{ when } x = x_0 \\ 1 & \hbox{ when } x > x_0 \end{array} \right.$

for some real numbers ${x_0 \in {\bf R}}$ and ${0 \leq \theta \leq 1}$ ; we call ${x_0}$ the point of discontinuity for ${J}$ and ${\theta}$ the fraction. Observe that such functions are monotone non-decreasing, but have a discontinuity at one point. A jump function is any absolutely convergent combination of basic jump functions, i.e. a function of the form ${F = \sum_n c_n J_n}$ , where ${n}$ ranges over an at most countable set, each ${J_n}$ is a basic jump function, and the ${c_n}$ are positive reals with ${\sum_n c_n < \infty}$ . If there are only finitely many ${n}$ involved, we say that ${F}$ is a piecewise constant jump function.

Thus, for instance, if ${q_1, q_2, q_3, \ldots}$ is any enumeration of the rationals, then ${\sum_{n=1}^\infty 2^{-n} 1_{[q_n,+\infty)}}$ is a jump function.
Clearly, all jump functions are monotone non-decreasing. From the absolute convergence of the ${c_n}$ we see that every jump function is the uniform limit of piecewise constant jump functions, for instance ${\sum_{n=1}^\infty c_n J_n}$ is the uniform limit of ${\sum_{n=1}^N c_n J_n}$ . One consequence of this is that the points of discontinuity of a jump function ${\sum_{n=1}^\infty c_n J_n}$ are precisely those of the individual summands ${c_n J_n}$ , i.e. of the points ${x_n}$ where each ${J_n}$ jumps.
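The uniform approximation by piecewise constant jump functions can be seen concretely. The sketch below (our own helper names, and a hypothetical finite list standing in for the start of an enumeration of the rationals) builds the partial sums ${\sum_{n=1}^N 2^{-n} 1_{[q_n,+\infty)}}$ and compares two of them; the difference is controlled by the tail ${\sum_{n>N} 2^{-n} = 2^{-N}}$ , uniformly in ${x}$ .

```python
def basic_jump(x0, theta):
    """The basic jump function with point of discontinuity x0 and fraction theta."""
    def J(x):
        if x < x0:
            return 0.0
        return theta if x == x0 else 1.0
    return J

def jump_partial_sum(points, N):
    """x -> sum_{n=1}^{N} 2**-n * 1_{[q_n, +inf)}(x) for the first N listed points.
    The indicator of [q, +inf) is the basic jump function at q with fraction 1."""
    terms = [(2.0 ** -(n + 1), basic_jump(q, 1.0)) for n, q in enumerate(points[:N])]
    return lambda x: sum(c * J(x) for c, J in terms)

# Hypothetical start of an enumeration of the rationals (any enumeration would do).
qs = [0.0, 1.0, -1.0, 0.5, -0.5, 2.0, -2.0, 1.5]
F3, F8 = jump_partial_sum(qs, 3), jump_partial_sum(qs, 8)
```

For instance, `F3` jumps by ${1/2}$ at ${q_1 = 0}$ , and `F8 - F3` takes values between ${0}$ and ${2^{-3}}$ everywhere.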
The key fact is that these functions, together with the continuous monotone functions, essentially generate all monotone functions, at least in the bounded case:

Lemma 62 (Continuous-singular decomposition for monotone functions) Let ${F: {\bf R} \rightarrow {\bf R}}$ be a monotone non-decreasing function.

The only discontinuities of ${F}$ are jump discontinuities. More precisely, if ${x}$ is a point where ${F}$ is discontinuous, then the limits ${\lim_{y \rightarrow x^-} F(y)}$ and ${\lim_{y \rightarrow x^+} F(y)}$ both exist, but are unequal, with ${\lim_{y \rightarrow x^-} F(y) <\lim_{y \rightarrow x^+} F(y)}$ .

There are at most countably many discontinuities of ${F}$ .

If ${F}$ is bounded, then ${F}$ can be expressed as the sum of a continuous monotone non-decreasing function ${F_c}$ and a jump function ${F_{pp}}$ .

Remark 63 This decomposition is part of the more general Lebesgue decomposition, which we will discuss later in this course.

Proof: By monotonicity, the limits ${F_-(x) := \lim_{y \rightarrow x^-} F(y)}$ and ${F_+(x) := \lim_{y \rightarrow x^+} F(y)}$ always exist, with ${F_-(x) \leq F(x) \leq F_+(x)}$ for all ${x}$ . This gives 1.
By 1., whenever there is a discontinuity ${x}$ of ${F}$ , there is at least one rational number ${q_x}$ strictly between ${F_-(x)}$ and ${F_+(x)}$ , and from monotonicity, each rational number can be assigned to at most one discontinuity. This gives 2.
Now we prove 3. Let ${A}$ be the set of discontinuities of ${F}$ , thus ${A}$ is at most countable. For each ${x \in A}$ , we define the jump ${c_x := F_+(x) - F_-(x) > 0}$ , and the fraction ${\theta_x := \frac{F(x)-F_-(x)}{F_+(x)-F_-(x)} \in [0,1]}$ . Thus

$\displaystyle F_+(x) = F_-(x) + c_x \hbox{ and } F(x) = F_-(x) + \theta_x c_x.$

Note that ${c_x}$ is the measure of the interval ${(F_-(x),F_+(x))}$ . By monotonicity, these intervals are disjoint; by the boundedness of ${F}$ , their union is bounded. By countable additivity, we thus have ${\sum_{x \in A} c_x < \infty}$ , and so if we let ${J_x}$ be the basic jump function with point of discontinuity ${x}$ and fraction ${\theta_x}$ , then the function

$\displaystyle F_{pp} := \sum_{x \in A} c_x J_x$

is a jump function.
As discussed previously, ${F_{pp}}$ is discontinuous only at ${A}$ , and for each ${x \in A}$ one easily checks that

$\displaystyle (F_{pp})_+(x) = (F_{pp})_-(x) + c_x \hbox{ and } F_{pp}(x) = (F_{pp})_-(x) + \theta_x c_x$

where ${(F_{pp})_-(x) := \lim_{y \rightarrow x^-} F_{pp}(y)}$ , and ${(F_{pp})_+(x) := \lim_{y\rightarrow x^+} F_{pp}(y)}$ . We thus see that the difference ${F_c := F-F_{pp}}$ is continuous. The only remaining task is to verify that ${F_c}$ is monotone non-decreasing. By continuity it suffices to verify this away from the (countably many) jump discontinuities, thus we need

$\displaystyle F_{pp}(b)-F_{pp}(a) \leq F(b)-F(a)$

for all ${a < b}$ that are not jump discontinuities. But the left-hand side can be rewritten as ${\sum_{x \in A \cap [a,b]} c_x}$ , while the right-hand side is ${F_-(b) - F_+(a)}$ . As each ${c_x}$ is the measure of the interval ${(F_-(x), F_+(x))}$ , and these intervals for ${x \in A \cap [a,b]}$ are disjoint and lie in ${(F_+(a),F_-(b))}$ , the claim follows from countable additivity. $\Box$
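To make the construction in the proof concrete, here is a minimal numerical sketch of the decomposition ${F = F_c + F_{pp}}$ . The test function `F`, the list `A` of discontinuities (assumed known in advance), and the offset `H` used to approximate the one-sided limits ${F_-(x)}$ , ${F_+(x)}$ are all hypothetical choices made for illustration; the basic jump function is taken to be ${0}$ to the left of the jump, ${\theta}$ at the jump, and ${1}$ to the right.

```python
import math

def F(x):
    """Hypothetical bounded monotone non-decreasing function with jumps at 0 and 1."""
    step0 = 0.0 if x < 0 else (0.3 if x == 0 else 1.0)  # jump 1 at 0, fraction 0.3
    step1 = 0.0 if x < 1 else 0.5                        # jump 0.5 at 1, fraction 1
    return math.atan(x) + step0 + step1

A = [0.0, 1.0]  # discontinuities of F, assumed known in advance
H = 1e-9        # small offset used to approximate one-sided limits (an assumption)

def jump_data(x):
    """Approximate (c_x, theta_x) using F_-(x) ~ F(x - H) and F_+(x) ~ F(x + H)."""
    f_minus, f_plus = F(x - H), F(x + H)
    c = f_plus - f_minus
    theta = (F(x) - f_minus) / c
    return c, theta

def basic_jump(x0, theta, x):
    """Basic jump function: 0 left of x0, theta at x0, 1 right of x0."""
    if x < x0:
        return 0.0
    return theta if x == x0 else 1.0

def F_pp(x):
    """Jump part: sum of c_x * J_x over the discontinuities in A."""
    total = 0.0
    for x0 in A:
        c, theta = jump_data(x0)
        total += c * basic_jump(x0, theta, x)
    return total

def F_c(x):
    """Continuous part F - F_pp; here it should agree with atan up to O(H)."""
    return F(x) - F_pp(x)
```

In this toy example the continuous part recovered by `F_c` agrees with `math.atan` up to an error of order `H`, including at the two jump points, which is the behaviour the lemma predicts.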

Exercise 64 Show that the decomposition of a bounded monotone non-decreasing function ${F}$ into continuous ${F_c}$ and jump components ${F_{pp}}$ given by the above lemma is unique.

Exercise 65 Find a suitable generalisation of the notion of a jump function that allows one to extend the above decomposition to unbounded monotone functions, and then prove this extension. (Hint: the notion to shoot for here is that of a “locally jump function”.)

Now we can finish the proof of Theorem 53. As noted previously, it suffices to prove the claim for monotone non-decreasing functions. As differentiability is a local condition, we can easily reduce to the case of bounded monotone non-decreasing functions, since to test differentiability of a monotone non-decreasing function ${F}$ in any compact interval ${[a,b]}$ we may replace ${F}$ by the bounded monotone non-decreasing function ${\max( \min(F, F(b)), F(a))}$ with no change in the differentiability in ${[a,b]}$ (except perhaps at the endpoints ${a,b}$ , but these form a set of measure zero). As we have already proven the claim for continuous functions, it suffices by Lemma 62 (and linearity of the derivative) to verify the claim for jump functions.
Now, finally, we are able to use the density argument, using the piecewise constant jump functions as the dense subclass, and using the second part of Lemma 56 for the quantitative estimate; fortunately for us, the density argument does not particularly care that there is a loss of a constant factor in this estimate.
For piecewise constant jump functions, the claim is clear (indeed, the derivative exists and is zero outside of finitely many discontinuities). Now we run the density argument. Let ${F}$ be a bounded jump function, and let ${\varepsilon > 0}$ and ${\lambda > 0}$ be arbitrary. As every jump function is the uniform limit of piecewise constant jump functions, we can find a piecewise constant jump function ${F_\varepsilon}$ such that ${|F(x)-F_\varepsilon(x)| \leq \varepsilon}$ for all ${x}$ . Indeed, by taking ${F_\varepsilon}$ to be a partial sum of the basic jump functions that make up ${F}$ , we can ensure that ${F-F_\varepsilon}$ is also a monotone non-decreasing function. Applying the second part of Lemma 56, we have

$\displaystyle m(\{ x \in {\bf R}: \overline{D^+} (F-F_\varepsilon)(x) \geq \lambda \}) \leq \frac{2C\varepsilon}{\lambda}$

for some absolute constant ${C}$ , and similarly for the other three Dini derivatives. Thus, outside of a set of measure at most ${8C\varepsilon/\lambda}$ , all of the Dini derivatives of ${F-F_\varepsilon}$ are less than ${\lambda}$ . Since ${F_\varepsilon}$ is almost everywhere differentiable, we conclude that outside of a set of measure at most ${8C\varepsilon/\lambda}$ , all the Dini derivatives of ${F(x)}$ lie within ${\lambda}$ of ${F'_\varepsilon(x)}$ , and in particular are finite and lie within ${2\lambda}$ of each other. Sending ${\varepsilon}$ to zero (holding ${\lambda}$ fixed), we conclude that for almost every ${x}$ , the Dini derivatives of ${F}$ are finite and lie within ${2\lambda}$ of each other. If we then send ${\lambda}$ to zero, we see that for almost every ${x}$ , the Dini derivatives of ${F}$ agree with each other and are finite, and the claim follows. This concludes the proof of Theorem 53.
Just as the integration theory of unsigned functions can be used to develop the integration theory of the absolutely integrable functions (see Notes 2), the differentiation theory of monotone functions can be used to develop a parallel differentiation theory for the class of functions of bounded variation:

Definition 66 (Bounded variation) Let ${F: {\bf R} \rightarrow {\bf R}}$ be a function. The total variation ${\|F\|_{TV({\bf R})}}$ (or ${\|F\|_{TV}}$ for short) of ${F}$ is defined to be the supremum

$\displaystyle \|F\|_{TV({\bf R})} := \sup_{x_0 < \ldots < x_n} \sum_{i=1}^n |F(x_i) - F(x_{i-1})|$

where the supremum ranges over all finite increasing sequences ${x_0,\ldots,x_n}$ of real numbers with ${n \geq 0}$ ; this is a quantity in ${[0,+\infty]}$ . We say that ${F}$ has bounded variation (on ${{\bf R}}$ ) if ${\|F\|_{TV({\bf R})}}$ is finite. (In this case, ${\|F\|_{TV({\bf R})}}$ is often written as ${\|F\|_{BV({\bf R})}}$ or just ${\|F\|_{BV}}$ .)
Given any interval ${[a,b]}$ , we define the total variation ${\|F\|_{TV([a,b])}}$ of ${F}$ on ${[a,b]}$ as

$\displaystyle \|F\|_{TV([a,b])} := \sup_{a \leq x_0 < \ldots < x_n \leq b} \sum_{i=1}^n |F(x_i) - F(x_{i-1})|;$

thus the definition is the same, but the points ${x_0,\ldots,x_n}$ are restricted to lie in ${[a,b]}$ . Thus for instance ${\|F\|_{TV({\bf R})} = \sup_{N > 0} \|F\|_{TV([-N,N])}}$ . We say that a function ${F}$ has bounded variation on ${[a,b]}$ if ${\|F\|_{TV([a,b])}}$ is finite.
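As a small illustration of the definition, the following sketch evaluates the sum ${\sum_{i=1}^n |F(x_i) - F(x_{i-1})|}$ for a given finite increasing sequence of points. Every such sum is a lower bound for the total variation; the choice of ${\sin}$ on ${[0,2\pi]}$ (whose total variation there is ${4}$ ) and the helper names `variation_sum` and `uniform_partition` are assumptions made purely for the example.

```python
import math

def variation_sum(F, points):
    """Sum of |F(x_i) - F(x_{i-1})| over a finite increasing sequence of points."""
    return sum(abs(F(b) - F(a)) for a, b in zip(points, points[1:]))

def uniform_partition(a, b, n):
    """The n + 1 equally spaced points a = x_0 < ... < x_n = b."""
    return [a + (b - a) * i / n for i in range(n + 1)]

# A coarse partition misses the turning points of sin and undershoots the supremum.
coarse = variation_sum(math.sin, uniform_partition(0.0, 2 * math.pi, 3))

# A finer partition that (up to rounding) contains pi/2 and 3*pi/2 attains it.
fine = variation_sum(math.sin, uniform_partition(0.0, 2 * math.pi, 1000))
```

Here `coarse` equals ${2\sqrt{3}}$ up to rounding, while `fine` equals ${4}$ up to rounding, because the finer partition splits ${[0,2\pi]}$ at the turning points of ${\sin}$ and the sum telescopes on each monotone piece.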

Exercise 67 If ${F: {\bf R} \rightarrow {\bf R}}$ is a monotone function, show that ${\|F\|_{TV([a,b])} = |F(b)-F(a)|}$ for any interval ${[a,b]}$ , and that ${F}$ has bounded variation on ${{\bf R}}$ if and only if it is bounded.

Exercise 68 For any functions ${F, G: {\bf R} \rightarrow {\bf R}}$ , establish the triangle property ${\|F+G\|_{TV({\bf R})} \leq \|F\|_{TV({\bf R})} + \|G\|_{TV({\bf R})}}$ and the homogeneity property ${\|cF\|_{TV({\bf R})} = |c| \|F\|_{TV({\bf R})}}$ for any ${c \in {\bf R}}$ . Also show that ${\|F\|_{TV}=0}$ if and only if ${F}$ is constant.

Exercise 69 If ${F: {\bf R} \rightarrow {\bf R}}$ is a function, show that ${\|F\|_{TV([a,b])} + \|F\|_{TV([b,c])} = \|F\|_{TV([a,c])}}$ whenever ${a \leq b \leq c}$ .

Exercise 70

Show that every function ${f: {\bf R} \rightarrow {\bf R}}$ of bounded variation is bounded, and that the limits ${\lim_{x \rightarrow +\infty} f(x)}$ and ${\lim_{x \rightarrow -\infty} f(x)}$ are well-defined.

Give a counterexample of a bounded, continuous, compactly supported function ${f}$ that is not of bounded variation.

Exercise 71 Let ${f: {\bf R} \rightarrow {\bf R}}$ be an absolutely integrable function, and let ${F: {\bf R} \rightarrow {\bf R}}$ be the indefinite integral ${F(x) := \int_{[-\infty,x]} f(y)\ dy}$ . Show that ${F}$ is of bounded variation, and that ${\|F\|_{TV({\bf R})} = \|f\|_{L^1({\bf R})}}$ . (Hint: the upper bound ${\|F\|_{TV({\bf R})} \leq \|f\|_{L^1({\bf R})}}$ is relatively easy to establish. To obtain the lower bound, use the density argument.)

Much as an absolutely integrable function can be expressed as the difference of its positive and negative parts, a bounded variation function can be expressed as the difference of two bounded monotone functions:

Proposition 72 A function ${F: {\bf R} \rightarrow {\bf R}}$ is of bounded variation if and only if it is the difference of two bounded monotone functions.

Proof: It is clear from Exercises 67, 68 that the difference of two bounded monotone functions is of bounded variation. Now define the positive variation ${F^+: {\bf R} \rightarrow {\bf R}}$ of ${F}$ by the formula

$\displaystyle F^+(x) := \sup_{x_0 < \ldots < x_n \leq x} \sum_{i=1}^n \max(F(x_{i}) - F(x_{i-1}),0). \ \ \ \ \ (12)$

It is clear from construction that this is a monotone non-decreasing function, taking values between ${0}$ and ${\|F\|_{TV({\bf R})}}$ , and is thus bounded. To conclude the proposition, it suffices (by writing ${F = F^+ - (F^+ - F)}$ ) to show that ${F^+ - F}$ is non-decreasing, or in other words to show that

$\displaystyle F^+(b) \geq F^+(a) + F(b)-F(a).$

If ${F(b)-F(a)}$ is negative then this is clear from the monotone non-decreasing nature of ${F^+}$ , so assume that ${F(b)-F(a) \geq 0}$ . But then the claim follows because any sequence of real numbers ${x_0 < \ldots < x_n \leq a}$ can be extended by one or two elements by adding ${a}$ and ${b}$ , thus increasing the sum ${\sum_{i=1}^n \max(F(x_i) - F(x_{i-1}),0)}$ by at least ${F(b)-F(a)}$ . $\Box$

Exercise 73 Let ${F: {\bf R} \rightarrow {\bf R}}$ be of bounded variation. Define the positive variation ${F^+}$ by (12), and the negative variation ${F^-}$ by

$\displaystyle F^-(x) := \sup_{x_0 < \ldots < x_n \leq x} \sum_{i=1}^n \max(-F(x_{i}) + F(x_{i-1}),0).$

Establish the identities

$\displaystyle F(x) = F(-\infty) + F^+(x) - F^-(x),$

$\displaystyle \|F\|_{TV([a,b])} = F^+(b)-F^+(a) + F^-(b)-F^-(a),$

and

$\displaystyle \|F\|_{TV} = F^+(+\infty) + F^-(+\infty)$

for every interval ${[a,b]}$ , where ${F(-\infty) := \lim_{x \rightarrow -\infty} F(x)}$ , ${F^+(+\infty) := \lim_{x \rightarrow +\infty} F^+(x)}$ , and ${F^-(+\infty) := \lim_{x \rightarrow +\infty} F^-(x)}$ . (Hint: The main difficulty comes from the fact that a partition ${x_0 < \ldots < x_n \leq x}$ that is good for ${F^+}$ need not be good for ${F^-}$ , and vice versa. However, this can be fixed by taking a good partition for ${F^+}$ and a good partition for ${F^-}$ and combining them together into a common refinement.)
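The following sketch computes a discrete analogue of the positive and negative variations along one fixed grid, and can be used to sanity-check the shape of the identities above. It is only an illustration under stated assumptions: the function is sampled on a compact interval (so ${F(-\infty)}$ is replaced by the value at the left endpoint), and a single grid is used for both variations, which sidesteps the common-refinement issue mentioned in the hint rather than addressing it. The sample function ${\sin}$ on ${[0,2\pi]}$ and the name `variations_on_grid` are hypothetical choices.

```python
import math
from itertools import accumulate

def variations_on_grid(values):
    """Cumulative positive and negative variation of sampled values along one fixed grid."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    # pos[k] sums the upward increments up to index k, neg[k] the downward ones.
    pos = [0.0] + list(accumulate(max(d, 0.0) for d in diffs))
    neg = [0.0] + list(accumulate(max(-d, 0.0) for d in diffs))
    return pos, neg

# Sample sin on [0, 2*pi]; the grid contains the turning points pi/2 and 3*pi/2.
grid = [2 * math.pi * i / 400 for i in range(401)]
values = [math.sin(x) for x in grid]
pos, neg = variations_on_grid(values)

# Discrete analogue of F(x) = F(-infinity) + F^+(x) - F^-(x).
reconstructed = [values[0] + p - q for p, q in zip(pos, neg)]

# Discrete analogue of ||F||_TV = F^+(+infinity) + F^-(+infinity).
total_variation = pos[-1] + neg[-1]
```

For this sample, `reconstructed` reproduces `values` up to rounding, and `total_variation` comes out as ${4}$ up to rounding, split evenly between the positive and negative variations.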

From Proposition 72 and Theorem 53 we immediately obtain

Corollary 74 (BV differentiation theorem) Every bounded variation function is differentiable almost everywhere.

Exercise 75 Call a function locally of bounded variation if it is of bounded variation on every compact interval ${[a,b]}$ . Show that every function that is locally of bounded variation is differentiable almost everywhere.

Exercise 76 (Lipschitz differentiation theorem, one-dimensional case) A function ${f: {\bf R} \rightarrow {\bf R}}$ is said to be Lipschitz continuous if there exists a constant ${C >0}$ such that ${|f(x)-f(y)| \leq C|x-y|}$ for all ${x,y \in {\bf R}}$ ; the smallest ${C}$ with this property is known as the Lipschitz constant of ${f}$ . Show that every Lipschitz continuous function ${F}$ is locally of bounded variation, and hence differentiable almost everywhere. Furthermore, show that the derivative ${F'}$ , when it exists, is bounded in magnitude by the Lipschitz constant of ${F}$ .

Remark 77 The same result is true in higher dimensions, and is known as the Rademacher differentiation theorem, but we will defer the proof of this theorem to subsequent notes, when we have the powerful tool of the Fubini-Tonelli theorem available, that is particularly useful for deducing higher-dimensional results in analysis from lower-dimensional ones.

Exercise 78 A function ${f: {\bf R} \rightarrow {\bf R}}$ is said to be convex if one has ${f((1-t) x + ty) \leq (1-t) f(x) + t f(y)}$ for all ${x < y}$ and ${0 < t < 1}$ . Show that if ${f}$ is convex, then it is continuous and almost everywhere differentiable, and its derivative ${f'}$ is equal almost everywhere to a monotone non-decreasing function, and so is itself almost everywhere differentiable. (Hint: Drawing the graph of ${f}$ , together with a number of chords and tangent lines, is likely to be very helpful in providing visual intuition.) Thus we see that in some sense, convex functions are “almost everywhere twice differentiable”. Similar claims also hold for concave functions, of course.

— 4. The second fundamental theorem of calculus —

We are now finally ready to attack the second fundamental theorem of calculus in the cases where ${F}$ is not assumed to be continuously differentiable. We begin with the case when ${F: [a,b] \rightarrow {\bf R}}$ is monotone non-decreasing. From Theorem 53 (extending ${F}$ to the rest of the real line if needed), this implies that ${F}$ is differentiable almost everywhere in ${[a,b]}$ , so ${F'}$ is defined a.e.; from monotonicity we see that ${F'}$ is non-negative whenever it is defined. Also, an easy modification of Exercise 2 shows that ${F'}$ is measurable.
One half of the second fundamental theorem is easy:

Proposition 79 (Upper bound for second fundamental theorem) Let ${F: [a,b] \rightarrow {\bf R}}$ be monotone non-decreasing (so that, as discussed above, ${F'}$ is defined almost everywhere, is unsigned, and is measurable). Then

$\displaystyle \int_{[a,b]} F'(x)\ dx \leq F(b)-F(a).$

In particular, ${F'}$ is absolutely integrable.

Proof: It is convenient to extend ${F}$ to all of ${{\bf R}}$ by declaring ${F(x) := F(b)}$ for ${x>b}$ and ${F(x) := F(a)}$ for ${x<a}$ ; then ${F}$ is a bounded monotone function on ${{\bf R}}$ , and ${F'}$ vanishes outside of ${[a,b]}$ . As ${F}$ is almost everywhere differentiable, the Newton quotients

$\displaystyle f_n(x) := \frac{F(x+1/n) - F(x)}{1/n}$

converge pointwise almost everywhere to ${F'}$ . Applying Fatou’s lemma (Corollary 16 of Notes 3), we conclude that

$\displaystyle \int_{[a,b]} F'(x)\ dx \leq \liminf_{n \rightarrow \infty} \int_{[a,b]} \frac{F(x+1/n) - F(x)}{1/n}\ dx.$

The right-hand side can be rearranged as

$\displaystyle \liminf_{n \rightarrow \infty} n (\int_{[a+1/n,b+1/n]} F(y)\ dy - \int_{[a,b]} F(x)\ dx)$

which can be rearranged further as

$\displaystyle \liminf_{n \rightarrow \infty} n (\int_{[b,b+1/n]} F(x)\ dx - \int_{[a,a+1/n]} F(x)\ dx).$

Since ${F}$ is equal to ${F(b)}$ for the first integral and is at least ${F(a)}$ for the second integral, this expression is at most

$\displaystyle \liminf_{n \rightarrow \infty} n (F(b)/n - F(a)/n) = F(b)-F(a)$

and the claim follows. $\Box$
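The inequality in Proposition 79 can be strict. As a rough numerical sketch, take the step function ${F = 1_{[0,+\infty)}}$ on ${[-1,1]}$ : its derivative vanishes almost everywhere, yet the integrals of the Newton quotients ${f_n}$ stay equal to ${F(1)-F(-1) = 1}$ . The midpoint rule and the number of cells below are illustrative assumptions, not part of the argument.

```python
def F(x):
    """Step function 1_{[0, +inf)}; it is already constant outside [-1, 1]."""
    return 1.0 if x >= 0 else 0.0

def newton_quotient_integral(n, a=-1.0, b=1.0, cells=20000):
    """Midpoint-rule approximation of the integral over [a, b] of n * (F(x + 1/n) - F(x))."""
    width = (b - a) / cells
    total = 0.0
    for i in range(cells):
        x = a + (i + 0.5) * width  # midpoint of the i-th cell
        total += n * (F(x + 1.0 / n) - F(x)) * width
    return total

# The quotient f_n equals n on [-1/n, 0) and 0 elsewhere, so each integral is 1,
# while the integral of F' (which is 0 almost everywhere) is 0.
integrals = [newton_quotient_integral(n) for n in (10, 100)]
```

Both entries of `integrals` come out as ${1}$ up to rounding, so Fatou's lemma gives ${0 \leq 1}$ here, with no equality.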

Exercise 80 Show that any function of bounded variation has an (almost everywhere defined) derivative that is absolutely integrable.

In the Lipschitz case, one can do better:

Exercise 81 (Second fundamental theorem for Lipschitz functions) Let ${F: [a,b] \rightarrow {\bf R}}$ be Lipschitz continuous. Show that ${\int_{[a,b]} F'(x)\ dx = F(b)-F(a)}$ . (Hint: Argue as in the proof of Proposition 79, but use the dominated convergence theorem in place of Fatou’s lemma.)

Exercise 82 (Integration by parts formula) Let ${F, G: [a,b] \rightarrow {\bf R}}$ be Lipschitz continuous functions. Show that

$\displaystyle \int_{[a,b]} F'(x) G(x)\ dx = F(b) G(b)-F(a) G(a)$

$\displaystyle - \int_{[a,b]} F(x) G'(x)\ dx.$

(Hint: first show that the product of two Lipschitz continuous functions on ${[a,b]}$ is again Lipschitz continuous.)

Now we return to the monotone case. Inspired by the Lipschitz case, one may hope to recover equality in Proposition 79 for such functions ${F}$ . However, there is an important obstruction to this, which is that all the variation of ${F}$ may be concentrated in a set of measure zero, and thus undetectable by the Lebesgue integral of ${F'}$ . This is most obvious in the case of a discontinuous monotone function, such as the (appropriately named) Heaviside function ${F := 1_{[0,+\infty)}}$ ; it is clear that ${F'}$ vanishes almost everywhere, but ${F(b)-F(a)}$ is not equal to ${\int_{[a,b]} F'(x)\ dx}$ if ${b}$ and ${a}$ lie on opposite sides of the discontinuity at ${0}$ . In fact, the same problem arises for all jump functions:

Exercise 83 Show that if ${F}$ is a jump function, then ${F'}$ vanishes almost everywhere. (Hint: use the density argument, starting from piecewise constant jump functions and using Proposition 79 as the quantitative estimate.)

One may hope that jump functions – in which all the fluctuation is concentrated in a countable set – are the only obstruction to the second fundamental theorem of calculus holding for monotone functions, and that as long as one restricts attention to continuous monotone functions, that one can recover the second fundamental theorem. However, this is still not true, because it is possible for all the fluctuation to now be concentrated, not in a countable collection of jump discontinuities, but instead in an uncountable set of zero measure, such as the middle thirds Cantor set (Exercise 10 from Notes 1). This can be illustrated by the key counterexample of the Cantor function, also known as the Devil’s staircase function. The construction of this function is detailed in the exercise below.

Exercise 84 (Cantor function) Define the functions ${F_0, F_1, F_2, \ldots: [0,1] \rightarrow {\bf R}}$ recursively as follows:

Set ${F_0(x) := x}$ for all ${x \in [0,1]}$ .

For each ${n=1,2,\ldots}$ in turn, define
$\displaystyle F_n(x) := \left\{ \begin{array}{ll} \frac{1}{2} F_{n-1}(3x) & \hbox{ if } x \in [0,1/3]; \\ \frac{1}{2} & \hbox{ if } x \in (1/3,2/3); \\ \frac{1}{2} + \frac{1}{2} F_{n-1}(3x-2) & \hbox{ if } x \in [2/3,1] \end{array} \right.$

Graph ${F_0}$ , ${F_1}$ , ${F_2}$ , and ${F_3}$ (preferably on a single graph).

Show that for each ${n=0,1,\ldots}$ , ${F_n}$ is a continuous monotone non-decreasing function with ${F_n(0)=0}$ and ${F_n(1)=1}$ . (Hint: induct on ${n}$ .)

Show that for each ${n=0,1,\ldots}$ , one has ${|F_{n+1}(x) - F_n(x)| \leq 2^{-n}}$ for each ${x \in [0,1]}$ . Conclude that the ${F_n}$ converge uniformly to a limit ${F: [0,1] \rightarrow {\bf R}}$ . This limit is known as the Cantor function.

Show that the Cantor function ${F}$ is continuous and monotone non-decreasing, with ${F(0)=0}$ and ${F(1)=1}$ .

Show that if ${x \in [0,1]}$ lies outside the middle thirds Cantor set (Exercise 10 from Notes 1), then ${F}$ is constant in a neighbourhood of ${x}$ , and in particular ${F'(x)=0}$ . Conclude that ${\int_{[0,1]} F'(x)\ dx = 0 \neq 1 = F(1)-F(0)}$ , so that the second fundamental theorem of calculus fails for this function.

Show that ${F( \sum_{n=1}^\infty a_n 3^{-n} ) = \sum_{n=1}^\infty \frac{a_n}{2} 2^{-n}}$ for any digits ${a_1,a_2,\ldots \in \{0,2\}}$ . Thus the Cantor function, in some sense, converts base three expansions to base two expansions.

Let ${I = [ \sum_{i=1}^n \frac{a_i}{3^i}, \sum_{i=1}^n \frac{a_i}{3^i} + \frac{1}{3^n}]}$ be one of the intervals used in the ${n^{th}}$ cover ${I_n}$ of ${C}$ (see Exercise 10 from Notes 1), thus ${n \geq 0}$ and ${a_1,\ldots,a_n \in \{0,2\}}$ . Show that ${I}$ is an interval of length ${3^{-n}}$ , but ${F(I)}$ is an interval of length ${2^{-n}}$ .

Show that ${F}$ is not differentiable at any element of the Cantor set ${C}$ .
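For readers who want to experiment before attempting the exercise, here is a minimal sketch of the recursion defining ${F_n}$ , written in exact rational arithmetic. The function name `cantor_approx` and the finite grid used in the check are assumptions for illustration; a check on a grid is of course not a proof of the uniform bound.

```python
from fractions import Fraction

def cantor_approx(n, x):
    """F_n(x) from the recursion in Exercise 84, for a Fraction x in [0, 1]."""
    if n == 0:
        return x
    if x <= Fraction(1, 3):
        return cantor_approx(n - 1, 3 * x) / 2
    if x < Fraction(2, 3):
        return Fraction(1, 2)
    return Fraction(1, 2) + cantor_approx(n - 1, 3 * x - 2) / 2

# Largest gap between consecutive approximants on the grid k / 243, for n = 0..4.
grid = [Fraction(k, 243) for k in range(244)]
gaps = [max(abs(cantor_approx(n + 1, x) - cantor_approx(n, x)) for x in grid)
        for n in range(5)]

# Base three 0.0202 = 20/81 should be sent to base two 0.0101 = 5/16.
x = Fraction(20, 81)
image = cantor_approx(6, x)

# The cover interval [20/81, 21/81] has length 3^-4; its image has length 2^-4.
image_length = cantor_approx(6, x + Fraction(1, 81)) - image
```

Since the endpoints ${20/81}$ and ${21/81}$ are triadic rationals of depth ${4}$ , the value of ${F_n}$ at these points no longer changes once ${n \geq 4}$ , so `image` and `image_length` agree with the corresponding values of the Cantor function itself.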

Remark 85 This example shows that the classical derivative ${F'(x) := \lim_{h \rightarrow 0; h \neq 0} \frac{F(x+h)-F(x)}{h}}$ of a function has some defects; it cannot “see” some of the variation of a continuous monotone function such as the Cantor function. Much later in this series, we will rectify this by introducing the concept of the weak derivative of a function, which despite the name, is more able than the strong derivative to detect this type of singular variation behaviour. (We will also encounter the Riemann-Stieltjes integral in later notes, which is another (closely related) way to capture all of the variation of a monotone function, and which is related to the classical derivative via the Lebesgue-Radon-Nikodym theorem.)

In view of this counterexample, we see that we need to add an additional hypothesis to the continuous monotone non-decreasing function ${F}$ before we can recover the second fundamental theorem. One such hypothesis is absolute continuity. To motivate this definition, let us recall two existing definitions:

A function ${F: {\bf R} \rightarrow {\bf R}}$ is continuous if, for every ${\varepsilon > 0}$ and ${x_0 \in {\bf R}}$ , there exists a ${\delta > 0}$ such that ${|F(b)-F(a)| \leq \varepsilon}$ whenever ${(a,b)}$ is an interval of length at most ${\delta}$ that contains ${x_0}$ .
A function ${F: {\bf R} \rightarrow {\bf R}}$ is uniformly continuous if, for every ${\varepsilon > 0}$ , there exists a ${\delta > 0}$ such that ${|F(b)-F(a)| \leq \varepsilon}$ whenever ${(a,b)}$ is an interval of length at most ${\delta}$ .

Definition 86 A function ${F: {\bf R} \rightarrow {\bf R}}$ is said to be absolutely continuous if, for every ${\varepsilon > 0}$ , there exists a ${\delta > 0}$ such that ${\sum_{j=1}^n |F(b_j)-F(a_j)| \leq \varepsilon}$ whenever ${(a_1,b_1),\ldots,(a_n,b_n)}$ is a finite collection of disjoint intervals of total length ${\sum_{j=1}^n b_j - a_j}$ at most ${\delta}$ .
We define absolute continuity for a function ${F: [a,b] \rightarrow {\bf R}}$ defined on an interval ${[a,b]}$ similarly, with the only difference being that the intervals ${[a_j,b_j]}$ are of course now required to lie in the domain ${[a,b]}$ of ${F}$ .
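To see what the definition rules out, the sketch below evaluates the two quantities in Definition 86 for the Cantor function of Exercise 84 on the ${2^n}$ intervals of the ${n^{th}}$ cover of the Cantor set: the total length ${\sum_j (b_j - a_j)}$ and the total oscillation ${\sum_j |F(b_j)-F(a_j)|}$ . The helper names `cantor` and `cover_intervals`, and the iteration depth, are assumptions; the computation uses the approximant ${F_{30}}$ , which agrees with the Cantor function at the triadic endpoints involved.

```python
from fractions import Fraction
from itertools import product

def cantor(x, depth=30):
    """F_depth(x) from Exercise 84, unrolled into a loop, for a Fraction x in [0, 1]."""
    value, scale = Fraction(0), Fraction(1)
    for _ in range(depth):
        if x <= Fraction(1, 3):
            x = 3 * x
        elif x < Fraction(2, 3):
            return value + scale / 2
        else:
            value += scale / 2
            x = 3 * x - 2
        scale /= 2
    return value + scale * x  # F_0 is the identity

def cover_intervals(n):
    """Intervals [sum a_i 3^-i, sum a_i 3^-i + 3^-n] with digits a_i in {0, 2}."""
    for digits in product((0, 2), repeat=n):
        left = sum(Fraction(a, 3 ** (i + 1)) for i, a in enumerate(digits))
        yield left, left + Fraction(1, 3 ** n)

def length_and_oscillation(n):
    """Total length and total |F(b_j) - F(a_j)| over the n-th cover."""
    intervals = list(cover_intervals(n))
    total_length = sum(b - a for a, b in intervals)
    total_oscillation = sum(abs(cantor(b) - cantor(a)) for a, b in intervals)
    return total_length, total_oscillation
```

For each ${n \geq 1}$ the total length is ${(2/3)^n}$ , which tends to zero, while the total oscillation stays equal to ${1}$ ; so no ${\delta}$ can work for, say, ${\varepsilon = 1/2}$ .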

The following exercise places absolute continuity in relation to other regularity properties:

Exercise 87

Show that every absolutely continuous function is uniformly continuous and therefore continuous.

Show that every absolutely continuous function is of bounded variation on every compact interval ${[a,b]}$ . (Hint: first show this is true for any sufficiently small interval.) In particular (by Exercise 75), absolutely continuous functions are differentiable almost everywhere.

Show that every Lipschitz continuous function is absolutely continuous.

Show that the function ${x \mapsto \sqrt{x}}$ is absolutely continuous, but not Lipschitz continuous, on the interval ${[0,1]}$ .

Show that the Cantor function from Exercise 84 is continuous, monotone, and uniformly continuous, but not absolutely continuous, on ${[0,1]}$ .

If ${f: {\bf R} \rightarrow {\bf R}}$ is absolutely integrable, show that the indefinite integral ${F(x) := \int_{[-\infty,x]} f(y)\ dy}$ is absolutely continuous, and that ${F}$ is differentiable almost everywhere with ${F'(x)=f(x)}$ for almost every ${x}$ .

Show that the sum or product of two absolutely continuous functions on an interval ${[a,b]}$ remains absolutely continuous. What happens if we work on ${{\bf R}}$ instead of on ${[a,b]}$ ?

Exercise 88

Show that absolutely continuous functions map null sets to null sets, i.e. if ${F: {\bf R} \rightarrow{\bf R}}$ is absolutely continuous and ${E}$ is a null set then ${F(E) := \{ F(x): x \in E \}}$ is also a null set.

Show that the Cantor function does not have this property.

For absolutely continuous functions, we can recover the second fundamental theorem of calculus:

Theorem 89 (Second fundamental theorem for absolutely continuous functions) Let ${F: [a,b] \rightarrow {\bf R}}$ be absolutely continuous. Then ${\int_{[a,b]} F'(x)\ dx = F(b)-F(a)}$ .

Proof: Our main tool here will be Cousin’s theorem (Exercise 47).
By Exercise 80, ${F'}$ is absolutely integrable. By Exercise 8 of Notes 4, ${F'}$ is thus uniformly integrable. Now let ${\varepsilon > 0}$ . By Exercise 11 of Notes 4, we can find ${\kappa > 0}$ such that ${\int_U |F'(x)|\ dx \leq \varepsilon}$ whenever ${U \subset [a,b]}$ is a measurable set of measure at most ${\kappa}$ . (Here we adopt the convention that ${F'}$ vanishes outside of ${[a,b]}$ .) By making ${\kappa}$ small enough, we may also assume from absolute continuity that ${\sum_{j=1}^n |F(b_j)-F(a_j)| \leq \varepsilon}$ whenever ${(a_1,b_1),\ldots,(a_n,b_n)}$ is a finite collection of disjoint intervals of total length ${\sum_{j=1}^n b_j - a_j}$ at most ${\kappa}$ .
Let ${E \subset [a,b]}$ be the set of points ${x}$ where ${F}$ is not differentiable, together with the endpoints ${a,b}$ , as well as the points ${x}$ that are not Lebesgue points of ${F'}$ ; thus ${E}$ is a null set. By outer regularity (or the definition of outer measure) we can find an open set ${U}$ containing ${E}$ of measure ${m(U) < \kappa}$ . In particular, ${\int_U |F'(x)|\ dx \leq \varepsilon}$ .
Now define a gauge function ${\delta: [a,b] \rightarrow (0,+\infty)}$ as follows.

If ${x \in E}$ , we define ${\delta(x)>0}$ to be small enough that the open interval ${(x-\delta(x), x+\delta(x))}$ lies in ${U}$ .
If ${x \not \in E}$ , then ${F}$ is differentiable at ${x}$ and ${x}$ is a Lebesgue point of ${F'}$ . We let ${\delta(x)>0}$ be small enough that ${|F(y)-F(x)-(y-x)F'(x)| \leq \varepsilon |y-x|}$ holds whenever ${|y-x| \leq \delta(x)}$ , and such that ${|\frac{1}{|I|} \int_I F'(y)\ dy - F'(x)| \leq \varepsilon}$ whenever ${I}$ is an interval containing ${x}$ of length at most ${\delta(x)}$ ; such a ${\delta(x)}$ exists by the definition of differentiability, and of Lebesgue point. We rewrite these properties using big-O notation as ${F(y) - F(x) = (y-x) F'(x) + O(\varepsilon |y-x|)}$ and ${\int_I F'(y)\ dy = |I| F'(x) + O(\varepsilon |I|)}$ .

Applying Cousin’s theorem, we can find a partition ${a = t_0 < t_1 < \ldots < t_k = b}$ with ${k \geq 1}$ , together with real numbers ${t^*_j \in [t_{j-1},t_j]}$ for each ${1 \leq j \leq k}$ , such that ${t_j - t_{j-1} \leq \delta(t^*_j)}$ for each such ${j}$ .
We can express ${F(b)-F(a)}$ as a telescoping series

$\displaystyle F(b)-F(a) = \sum_{j=1}^k F(t_j) - F(t_{j-1}).$

To estimate the size of this sum, let us first consider those ${j}$ for which ${t^*_j \in E}$ . Then, by construction, the intervals ${(t_{j-1},t_j)}$ are disjoint and lie in ${U}$ , so their total length is at most ${m(U) < \kappa}$ . By construction of ${\kappa}$ , we thus have

$\displaystyle \sum_{j: t^*_j \in E} |F(t_j) - F(t_{j-1})| \leq \varepsilon$

and thus

$\displaystyle \sum_{j: t^*_j \in E} F(t_j) - F(t_{j-1}) = O(\varepsilon).$

Next, we consider those ${j}$ for which ${t^*_j \not \in E}$ . By construction, for those ${j}$ we have

$\displaystyle F(t_j) - F(t_{j}^*) = (t_j - t_j^*) F'(t^*_j) + O(\varepsilon |t_j - t^*_j| )$

and

$\displaystyle F(t_j^*) - F(t_{j-1}) = (t_j^* - t_{j-1}) F'(t^*_j) + O(\varepsilon |t_j^* - t_{j-1}| )$

and thus

$\displaystyle F(t_j) - F(t_{j-1}) = (t_j - t_{j-1}) F'(t^*_j) + O(\varepsilon |t_j - t_{j-1}| ).$

On the other hand, from construction again we have

$\displaystyle \int_{[t_{j-1},t_j]} F'(y)\ dy = (t_j -t_{j-1}) F'(t^*_j) + O(\varepsilon |t_j - t_{j-1}| )$

and thus

$\displaystyle F(t_j) - F(t_{j-1}) = \int_{[t_{j-1},t_j]} F'(y)\ dy + O(\varepsilon |t_j - t_{j-1}| ).$

Summing in ${j}$ , we conclude that

$\displaystyle \sum_{j: t^*_j \not \in E} F(t_j) - F(t_{j-1}) = \int_{S} F'(y)\ dy + O(\varepsilon (b-a) ),$

where ${S}$ is the union of all the ${[t_{j-1},t_j]}$ with ${t^*_j \not \in E}$ . By construction, this set is contained in ${[a,b]}$ and contains ${[a,b] \backslash U}$ . Since ${\int_U |F'(x)|\ dx \leq \varepsilon}$ , we conclude that

$\displaystyle \int_{S} F'(y)\ dy = \int_{[a,b]} F'(y)\ dy + O(\varepsilon).$

Putting everything together, we conclude that

$\displaystyle F(b)-F(a) = \int_{[a,b]} F'(y)\ dy + O(\varepsilon) + O( \varepsilon |b-a| ).$

Since ${\varepsilon > 0}$ was arbitrary, the claim follows. $\Box$
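The tagged partitions supplied by Cousin's theorem can be produced in simple cases by bisection. The sketch below is a heuristic under stated assumptions, not a constructive form of the theorem: it only tries the two endpoints and the midpoint of each interval as candidate tags, and gives up after a fixed depth. The name `cousin_partition` and the sample gauge (which forces short intervals near ${0}$ ) are hypothetical.

```python
def cousin_partition(gauge, a, b, max_depth=60):
    """Return pieces (t_prev, t_next, tag) covering [a, b] with t_next - t_prev <= gauge(tag)."""
    def search(lo, hi, depth):
        # Try a few candidate tags in [lo, hi]; accept the interval if one of them works.
        for tag in (lo, (lo + hi) / 2, hi):
            if hi - lo <= gauge(tag):
                return [(lo, hi, tag)]
        if depth == 0:
            raise RuntimeError("no fine partition found with these candidate tags")
        # Otherwise bisect and handle the two halves separately, left to right.
        mid = (lo + hi) / 2
        return search(lo, mid, depth - 1) + search(mid, hi, depth - 1)
    return search(a, b, max_depth)

def gauge(t):
    """Hypothetical gauge on [0, 1] that shrinks towards 0 but is positive at 0."""
    return 0.1 if t == 0 else t / 2

pieces = cousin_partition(gauge, 0.0, 1.0)
```

For this gauge the search returns five pieces, with the leftmost piece ${[0, 1/16]}$ tagged at ${0}$ and every other piece tagged at its right endpoint. The positivity of the gauge at ${0}$ is what lets the bisection stop, which mirrors the role of the gauge in the proof above.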
Combining this result with Exercise 87, we obtain a satisfactory classification of the absolutely continuous functions:

Exercise 90 Show that a function ${F: [a,b] \rightarrow {\bf R}}$ is absolutely continuous if and only if it takes the form ${F(x) = \int_{[a,x]} f(y)\ dy + C}$ for some absolutely integrable ${f: [a,b] \rightarrow {\bf R}}$ and a constant ${C}$ .

Exercise 91 (Compatibility of the strong and weak derivatives in the absolutely continuous case) Let ${F: [a,b] \rightarrow {\bf R}}$ be an absolutely continuous function, and let ${\phi: [a,b] \rightarrow {\bf R}}$ be a continuously differentiable function supported in a compact subset of ${(a,b)}$ . Show that ${\int_{[a,b]} F'(x) \phi(x) \ dx = - \int_{[a,b]} F(x) \phi'(x)\ dx}$ .

Inspecting the proof of Theorem 89, we see that the absolute continuity was used primarily in two ways: firstly, to ensure the almost everywhere existence of the derivative, and secondly, to control an exceptional null set ${E}$ . It turns out that one can achieve the latter control by making a different hypothesis, namely that the function ${F}$ is everywhere differentiable rather than merely almost everywhere differentiable. More precisely, we have

Proposition 92 (Second fundamental theorem of calculus, again) Let ${[a,b]}$ be a compact interval of positive length, let ${F: [a,b] \rightarrow {\bf R}}$ be a differentiable function, such that ${F'}$ is absolutely integrable. Then the Lebesgue integral ${\int_{[a,b]} F'(x)\ dx}$ of ${F'}$ is equal to ${F(b) - F(a)}$ .

Proof: This will be similar to the proof of Theorem 89, the one main new twist being that we need several open sets ${U}$ instead of just one. Let ${E \subset [a,b]}$ be the set of points ${x}$ which are not Lebesgue points of ${F'}$ , together with the endpoints ${a,b}$ . This is a null set. Let ${\varepsilon > 0}$ , and then let ${\kappa > 0}$ be small enough that ${\int_U |F'(x)|\ dx \leq \varepsilon}$ whenever ${U}$ is measurable with ${m(U) \leq \kappa}$ . We can also ensure that ${\kappa \leq \varepsilon}$ .
For every natural number ${m=1,2,\ldots}$ we can find an open set ${U_m}$ containing ${E}$ of measure ${m(U_m) \leq \kappa/4^m}$ . In particular we see that ${m( \bigcup_{m=1}^{\infty} U_m ) \leq \kappa}$ and thus ${\int_{\bigcup_{m=1}^\infty U_m} |F'(x)|\ dx \leq \varepsilon}$ .
Now define a gauge function ${\delta: [a,b] \rightarrow (0,+\infty)}$ as follows.

If ${x \in E}$ , we define ${\delta(x)>0}$ to be small enough that the open interval ${(x-\delta(x), x+\delta(x))}$ lies in ${U_m}$ , where ${m}$ is the first natural number such that ${|F'(x)| \leq 2^m}$ , and also small enough that ${|F(y)-F(x)-(y-x)F'(x)| \leq \varepsilon |y-x|}$ holds whenever ${|y-x| \leq \delta(x)}$ . (Here we crucially use the everywhere differentiability to ensure that ${F'(x)}$ exists and is finite here.)
If ${x \not \in E}$ , we let ${\delta(x)>0}$ be small enough that ${|F(y)-F(x)-(y-x)F'(x)| \leq \varepsilon |y-x|}$ holds whenever ${|y-x| \leq \delta(x)}$ , and such that ${|\frac{1}{|I|} \int_I F'(y)\ dy - F'(x)| \leq \varepsilon}$ whenever ${I}$ is an interval containing ${x}$ of length at most ${\delta(x)}$ , exactly as in the proof of Theorem 89.

Applying Cousin’s theorem, we can find a partition ${a = t_0 < t_1 < \ldots < t_k = b}$ with ${k \geq 1}$ , together with real numbers ${t^*_j \in [t_{j-1},t_j]}$ for each ${1 \leq j \leq k}$ , such that ${t_j - t_{j-1} \leq \delta(t^*_j)}$ for each such ${j}$ .
As before, we express ${F(b)-F(a)}$ as a telescoping series

$\displaystyle F(b)-F(a) = \sum_{j=1}^k F(t_j) - F(t_{j-1}).$

For the contributions of those ${j}$ with ${t^*_j \not \in E}$ , we argue exactly as in the proof of Theorem 89 to conclude eventually that

$\displaystyle \sum_{j: t^*_j \not \in E} F(t_j) - F(t_{j-1}) = \int_{S} F'(y)\ dy + O(\varepsilon (b-a) ),$

where ${S}$ is the union of all the ${[t_{j-1},t_j]}$ with ${t^*_j \not \in E}$ . Since

$\displaystyle \int_{[a,b] \backslash S} |F'(x)|\ dx \leq \int_{\bigcup_{m=1}^\infty U_m} |F'(x)|\ dx \leq \varepsilon$

we thus have

$\displaystyle \int_{S} F'(y)\ dy = \int_{[a,b]} F'(y)\ dy + O(\varepsilon).$

Now we turn to those ${j}$ with ${t^*_j \in E}$ . By construction, we have

$\displaystyle F(t_j) - F(t_{j-1}) = (t_j - t_{j-1}) F'(t^*_j) + O(\varepsilon |t_j - t_{j-1}| )$

for these intervals, and so

$\displaystyle \sum_{j: t^*_j \in E} F(t_j) - F(t_{j-1}) = (\sum_{j: t^*_j \in E} (t_j - t_{j-1}) F'(t^*_j)) + O(\varepsilon (b-a) ).$

Next, for each ${j}$ with ${t^*_j \in E}$ we have ${|F'(t^*_j)| \leq 2^m}$ and ${[t_{j-1},t_j] \subset U_m}$ for some natural number ${m=1,2,\ldots}$ , by construction. By countable additivity (and recalling that ${m(U_m) \leq \kappa/4^m \leq \varepsilon/4^m}$ ), we conclude that

$\displaystyle |\sum_{j: t^*_j \in E} (t_j - t_{j-1}) F'(t^*_j)| \leq \sum_{m=1}^\infty 2^m m(U_m) \leq \sum_{m=1}^\infty 2^m \varepsilon/4^m = O(\varepsilon).$

Putting all this together, we again have

$\displaystyle F(b)-F(a) = \int_{[a,b]} F'(y)\ dy + O(\varepsilon) + O( \varepsilon |b-a| ).$

Since ${\varepsilon > 0}$ was arbitrary, the claim follows. $\Box$

Remark 93 The above proposition is yet another illustration of how the property of everywhere differentiability is significantly better than that of almost everywhere differentiability. In practice, though, the above proposition is not as useful as one might initially think, because there are very few methods that establish the everywhere differentiability of a function that do not also establish continuous differentiability (or at least Riemann integrability of the derivative), at which point one could just use Theorem 11 instead.

Exercise 94 Let ${F: [-1,1] \rightarrow {\bf R}}$ be the function defined by setting ${F(x) := x^2 \sin(\frac{1}{x^3})}$ when ${x}$ is non-zero, and ${F(0) := 0}$ . Show that ${F}$ is everywhere differentiable, but the derivative ${F'}$ is not absolutely integrable, and so the second fundamental theorem of calculus does not apply in this case (at least if we interpret ${\int_{[a,b]} F'(x)\ dx}$ using the absolutely convergent Lebesgue integral). See however the next exercise.
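One way to see the failure of absolute integrability numerically (a sketch, not a proof): if ${F'}$ were absolutely integrable, then Proposition 92 applied on each subinterval would bound every sum ${\sum_n |F(x_{n+1}) - F(x_n)|}$ by ${\int_{[-1,1]} |F'(x)|\ dx}$ . The Python snippet below evaluates such sums at the sample points ${x_n := ((n+\frac{1}{2})\pi)^{-1/3}}$ , where ${\sin(1/x^3) = \pm 1}$ . The sample points and the names `F`, `sample_point` and `variation_sum` are illustrative assumptions, not part of the notes.

```python
import math


def F(x: float) -> float:
    """The function from Exercise 94: x^2 sin(1/x^3), with F(0) = 0."""
    return 0.0 if x == 0.0 else x * x * math.sin(1.0 / x**3)


def sample_point(n: int) -> float:
    """Point x_n in (0, 1] at which sin(1/x^3) = (-1)^n (illustrative choice)."""
    return ((n + 0.5) * math.pi) ** (-1.0 / 3.0)


def variation_sum(num_terms: int) -> float:
    """Sum of |F(x_{n+1}) - F(x_n)| over n = 0, ..., num_terms - 1."""
    return sum(
        abs(F(sample_point(n + 1)) - F(sample_point(n)))
        for n in range(num_terms)
    )


if __name__ == "__main__":
    # The sums keep growing as more sample points are used.
    for num_terms in (10**3, 8 * 10**3, 64 * 10**3):
        print(num_terms, variation_sum(num_terms))
```

Since ${|F(x_n)| = x_n^2}$ with alternating signs, these sums grow roughly like a constant multiple of ${N^{1/3}}$ in the number ${N}$ of terms, so they cannot be bounded by any finite value of ${\int_{[-1,1]} |F'(x)|\ dx}$ .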

Exercise 95 (Henstock-Kurzweil integral) Let ${[a,b]}$ be a compact interval of positive length. We say that a function ${f: [a,b] \rightarrow {\bf R}}$ is Henstock-Kurzweil integrable with integral ${L \in {\bf R}}$ if for every ${\varepsilon > 0}$ there exists a gauge function ${\delta: [a,b] \rightarrow (0,+\infty)}$ such that one has

$\displaystyle | \sum_{j=1}^k f(t^*_j) (t_j - t_{j-1}) - L | \leq \varepsilon$

whenever ${k \geq 1}$ and ${a = t_0 < t_1 < \ldots < t_k = b}$ and ${t^*_1,\ldots,t^*_k}$ are such that ${t^*_j \in [t_{j-1},t_j]}$ and ${|t_j-t_{j-1}| \leq \delta(t^*_j)}$ for every ${1 \leq j \leq k}$ . When this occurs, we call ${L}$ the Henstock-Kurzweil integral of ${f}$ and write it as ${\int_{[a,b]} f(x)\ dx}$ .

1. Show that if a function is Henstock-Kurzweil integrable, it has a unique Henstock-Kurzweil integral. (Hint: use Cousin’s theorem.)

2. Show that if a function is Riemann integrable, then it is Henstock-Kurzweil integrable, and the Henstock-Kurzweil integral ${\int_{[a,b]} f(x)\ dx}$ is equal to the Riemann integral ${\int_a^b f(x)\ dx}$ .

3. Show that if a function ${f: [a,b] \rightarrow {\bf R}}$ is everywhere defined, everywhere finite, and is absolutely integrable, then it is Henstock-Kurzweil integrable, and the Henstock-Kurzweil integral ${\int_{[a,b]} f(x)\ dx}$ is equal to the Lebesgue integral ${\int_{[a,b]} f(x)\ dx}$ . (Hint: this is a variant of the proof of Theorem 89 or Proposition 92.)

4. Show that if ${F: [a,b] \rightarrow {\bf R}}$ is everywhere differentiable, then ${F'}$ is Henstock-Kurzweil integrable, and the Henstock-Kurzweil integral ${\int_{[a,b]} F'(x)\ dx}$ is equal to ${F(b)-F(a)}$ . (Hint: this is a variant of the proof of Theorem 89 or Proposition 92.)

5. Explain why the above results give an alternate proof of Exercise 10 and of Proposition 92.
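The definition in Exercise 95 can be explored numerically. The sketch below builds a ${\delta}$-fine tagged partition by repeated bisection (one standard route to Cousin’s theorem) and evaluates the corresponding Riemann sum. Everything specific here is an illustrative assumption rather than part of the exercise: the test function ${f(x) = x^{-1/2}}$ for ${x > 0}$ with ${f(0) = 0}$ (whose Lebesgue integral on ${[0,1]}$ is ${2}$ ), the gauge ${\delta(x) = cx}$ for ${x > 0}$ with ${\delta(0) = d_0}$ , and all helper names. Only the endpoints and the midpoint are tried as tags, which suffices for this gauge but not for an arbitrary one.

```python
import math
from typing import Callable, List, Tuple

TaggedInterval = Tuple[float, float, float]  # (left, right, tag)


def fine_partition(a: float, b: float,
                   gauge: Callable[[float], float],
                   max_depth: int = 60) -> List[TaggedInterval]:
    """Bisect [a, b] until every piece [l, r] has a tag t with r - l <= gauge(t)."""
    def split(l: float, r: float, depth: int) -> List[TaggedInterval]:
        # Try a few candidate tags; accept the first one that makes [l, r] fine.
        for tag in (l, r, 0.5 * (l + r)):
            if r - l <= gauge(tag):
                return [(l, r, tag)]
        if depth >= max_depth:
            raise RuntimeError("no fine partition found with these candidate tags")
        mid = 0.5 * (l + r)
        return split(l, mid, depth + 1) + split(mid, r, depth + 1)

    return split(a, b, 0)


def riemann_sum(f: Callable[[float], float],
                partition: List[TaggedInterval]) -> float:
    """Tagged Riemann sum: sum of f(tag) * (right - left) over the pieces."""
    return sum(f(tag) * (r - l) for (l, r, tag) in partition)


def f(x: float) -> float:
    """Illustrative integrand, unbounded near 0 but with integral 2 on [0, 1]."""
    return 0.0 if x == 0.0 else 1.0 / math.sqrt(x)


def make_gauge(c: float, d0: float) -> Callable[[float], float]:
    """Illustrative gauge: delta(0) = d0 and delta(x) = c * x for x > 0."""
    return lambda x: d0 if x == 0.0 else c * x


if __name__ == "__main__":
    partition = fine_partition(0.0, 1.0, make_gauge(0.01, 1e-6))
    print(len(partition), riemann_sum(f, partition))
```

With this gauge and ${c < 1}$ , the piece containing ${0}$ is forced to carry the tag ${0}$ (any other tag ${t}$ would require the piece to have length at most ${ct < t}$ ), which is how a gauge can neutralise a singularity that defeats the ordinary Riemann integral. Shrinking ${c}$ and ${d_0}$ should move the sum closer to ${2}$ .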

Remark 96 As the above exercise indicates, the Henstock-Kurzweil integral (also known as the Denjoy integral or Perron integral) extends the Riemann integral and the absolutely convergent Lebesgue integral, at least as long as one restricts attention to functions that are defined and are finite everywhere (in contrast to the Lebesgue integral, which is willing to tolerate functions being infinite or undefined so long as this only occurs on a null set). It is the notion of integration that is most naturally associated with the fundamental theorem of calculus for everywhere differentiable functions, as seen in part 4 of the above exercise; it can also be used as a unified framework for all the proofs in this section that invoked Cousin’s theorem. The Henstock-Kurzweil integral can also integrate some (highly oscillatory) functions that the Lebesgue integral cannot, such as the derivative ${F'}$ of the function ${F}$ appearing in Exercise 94. This is analogous to how conditional summation ${\lim_{N \rightarrow \infty} \sum_{n=1}^N a_n}$ can sum conditionally convergent series ${\sum_{n=1}^\infty a_n}$ , even if they are not absolutely convergent. However, much as conditional summation is not always well-behaved with respect to rearrangement, the Henstock-Kurzweil integral does not always react well to changes of variable; also, due to its reliance on the order structure of the real line ${{\bf R}}$ , it is difficult to extend the Henstock-Kurzweil integral to more general spaces, such as the Euclidean space ${{\bf R}^d}$ , or to abstract measure spaces.
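The sensitivity of conditional summation to rearrangement can be seen on the alternating harmonic series, a standard example that is not taken from these notes: summed in its natural order it converges to ${\log 2}$ , while the rearrangement that takes two positive terms followed by one negative term converges to ${\frac{3}{2} \log 2}$ . A minimal Python sketch, with illustrative function names:

```python
import math


def natural_order_sum(num_terms: int) -> float:
    """Partial sum of 1 - 1/2 + 1/3 - 1/4 + ... using num_terms terms."""
    return sum((-1) ** (n + 1) / n for n in range(1, num_terms + 1))


def rearranged_sum(num_blocks: int) -> float:
    """Partial sum of (1 + 1/3 - 1/2) + (1/5 + 1/7 - 1/4) + ... using num_blocks blocks."""
    return sum(
        1.0 / (4 * k - 3) + 1.0 / (4 * k - 1) - 1.0 / (2 * k)
        for k in range(1, num_blocks + 1)
    )


if __name__ == "__main__":
    print(natural_order_sum(200_000), math.log(2))
    print(rearranged_sum(100_000), 1.5 * math.log(2))
```

Both sums use exactly the same terms, each once; only the order differs, yet the limits disagree.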

197 comments

4 March, 2016 at 11:56 am

Anonymous

In Theorem 24, do we have that the converse is actually also true, namely, if the identity
$\displaystyle \int_{[a,b]}F'(x)\ dx=F(b)-F(a)$ makes sense, then $F$ must be absolutely continuous on $[a,b]$ ?

If $F:[a,b]\to\mathbb{R}$ is differentiable, can we conclude that $F$ is absolutely continuous on $[a,b]$ so that Proposition 25 is implied by Theorem 24?

4 March, 2016 at 3:41 pm

Terence Tao

Theorem 6 serves as a converse to Theorem 24. Note that just having $F'$ well-defined and absolutely integrable and $\int_{[a,b]} F'(x)\ dx = F(b)-F(a)$ for a single choice of $a,b$ is not sufficient (take for instance $F$ to be the Cantor function from Exercise 47, but with $F(1)$ set equal to 0 rather than 1).

Incidentally, all of these questions are great exercises to work out yourself; see https://terrytao.wordpress.com/career-advice/ask-yourself-dumb-questions-%E2%80%93-and-answer-them/

4 March, 2016 at 8:48 pm

Anonymous

Suppose $F:[a,b]\to\mathbb{R}$ is differentiable. It is true that
$\displaystyle \int_{[a,b]}F'(x)\ dx=F(b)-F(a)$
but not necessarily true that
$\displaystyle \int_{[a,x]}F'(x)\ dx=F(x)-F(a)$ for all $x\in[a,b]$ , which implies that $F$ is absolutely continuous.
I guess this is your point. But I don’t have a counterexample for the “not necessarily true” part. Do you have a hint how to find it?

5 March, 2016 at 7:22 am

Anonymous

Well, $f(x)=\sin(x^2)$ is not even uniformly continuous on $\mathbb{R}$ . And thus it cannot be absolutely continuous. But it is $C^\infty(\mathbb{R})$ .

11 April, 2016 at 9:58 am

Fonction maximale de Hardy-Littlewood, théorème de différentiation de Lebesgue | Matthieu Joseph

[…] constant is not optimal in this inequality. Terence Tao proposes an exercise (Exercise 19) to improve by , and then goes on to discuss the […]

4 May, 2016 at 2:09 pm

Anonymous

In Exercise 7, is $f(x-y)g(y)$ measurable on $\mathbb{R}^{2d}$ ?

4 May, 2016 at 2:33 pm

Anonymous

In Exercise 7 should $f*g$ be well defined for a.e. $x$ instead of every $x$ ?

5 May, 2016 at 8:18 am

Terence Tao

In this case $f*g$ can be shown to be well defined for all $x$ (the integrand is absolutely integrable).

Also, the measurability of $f(x-y)$ and $g(y)$ (and hence of the product) is easily deduced from observing that the composition of a measurable function and a continuous map (or more generally, a measurable map) is again measurable. Here, the maps in question are $(x,y) \mapsto x-y$ and $(x,y) \mapsto y$ .

5 May, 2016 at 4:58 pm

Anonymous

If $f$ is Borel measurable, then $f\circ h$ is Lebesgue measurable or Borel measurable whenever $h$ is. But we only know that $f$ is Lebesgue measurable. How do you conclude that $f\circ h$ is (Lebesgue) measurable where $h(x,y)=x-y$ ?

5 May, 2016 at 5:05 pm

Anonymous

Continuity of $h$ itself is not enough in this issue, I suppose? It seems that one needs to show directly that $h^{-1}(E)$ is Lebesgue measurable whenever $E$ is.

6 May, 2016 at 8:19 am

Terence Tao

Fair enough; the map $h: {\bf R} \to {\bf R}^2$ defined by $h(x) := (x,0)$ is continuous but the preimage of the null set $E \times \{0\}$ is nonmeasurable for any nonmeasurable $E \subset {\bf R}$ . But in this particular case one can check that the preimage of a set of zero outer measure is again of zero outer measure, so null sets pull back to null sets. Since Lebesgue measurable sets differ by a null set from a Borel measurable set, the claim for Lebesgue sets then follows from the corresponding claim for Borel sets.

One can also use Exercise 21, 22 of Notes 1.

9 May, 2016 at 1:49 pm

Anonymous

In Stein and Shakarchi, $g$ is assumed to be integrable on $\mathbb{R}^d$ (so is $f$ ) and the conclusion is that $f*g$ is well defined for a.e. $x$ . Do you have a hint that why one cannot have $f*g$ being well defined for all $x$ in that case? Is there a counterexample showing what could go wrong?

9 May, 2016 at 2:40 pm

Terence Tao

Consider for instance $f*g(0) = \int_{{\bf R}^d} f(x) g(-x)\ dx$ . The Cauchy-Schwarz inequality tells us that this integral is well-defined when $f, g$ are square-integrable, but when $f,g$ are merely integrable then this integral may be divergent (e.g. consider the case $d=1$ and $f(x) = g(x) = \frac{1}{|x|^{1/2}} 1_{|x| \leq 1}$ ).

9 May, 2016 at 6:49 pm

Anonymous2

I read through notes 1 up to this note but I can’t find a useful statement to show that $f*g$ is well-defined for a.e. $x$ if one only assumes that $f,g$ are merely integrable… The comment above seems to suggest that this is unlikely true?

9 May, 2016 at 9:42 pm

Terence Tao

If $f, g: {\bf R}^d \to {\bf C}$ are absolutely integrable, then by the Fubini-Tonelli theorem, $f(y) g(x-y)$ is absolutely integrable on ${\bf R}^d \times {\bf R}^d$ , which by further application of Fubini-Tonelli shows that $f(y) g(x-y)$ is absolutely integrable in $y$ for almost every $x$ .

7 May, 2016 at 6:19 am

Anonymous

“But in this particular case one can check that the preimage of a set of zero outer measure is again of zero outer measure”

Would you elaborate how this is done? (What’s more, why did you use “outer measure” instead of “Lebesgue measure”?)

(Trying to prove by contradiction seems also difficult: Let $h:\mathbb{R}^{2}\to\mathbb{R}$ with $h(x,y)=x-y$ . Suppose $E$ is a subset of $\mathbb{R}^2$ with positive Lebesgue measure. How can I show that $h(E)$ can not have measure zero?)

7 May, 2016 at 7:53 am

Terence Tao

One needs to use outer measure initially because one does not know a priori that the preimage is measurable.

A warmup question would be to show that if $E \subset {\bf R}$ has zero outer measure, then the set $E \times [0,1] \subset {\bf R}^2$ has zero outer measure; using the countable subadditivity of outer measure, this implies that $E \times {\bf R} \subset {\bf R}^2$ also has zero outer measure. This gives the claim for $h(x,y) = x$ ; the case of $h(x,y) = x-y$ is similar.

As I said in my previous post, these claims are special cases of Exercises 21 and 22 of Notes 1.

12 May, 2016 at 6:32 am

Anonymous

Regarding the concept “good kernels” in Exercise 27, Stein and Shakarchi give two sets of conditions.

One is
(1) $\int_{\mathbb{R}^d} K_\delta(x)\ dx=1$
(2) $\int_{\mathbb{R}^d} |K_\delta(x)|\ dx\leq A$
(3) For every $\eta>0$ , $\int_{|x|\geq\eta} |K_\delta(x)|\ dx\to 0$ as $\delta\to 0$

Another one is
(1) $\int_{\mathbb{R}^d} K_\delta(x)\ dx=1$
(2′) $|K_\delta(x)|\leq A\delta^{-d}$
(3′) $|K_\delta(x)|\leq A\delta/|x|^{d+1}$ for all $\delta>0$ and $x$ .

You also give a set of condition in Exercise 27.

(1)and has total mass ${\int_{{\bf R}^d} P(x)\ dx}$ equal to ${1}$ .
(2″)non-negative,
(3″) radial,
(4″) radially non-increasing,

Other than the “total mass equals to 1”, are other conditions in each set related to each other? In practice, how does one choose which set of conditions to use?

12 May, 2016 at 8:32 am

Terence Tao

Basically it depends on the application. In analysis there often is not a “one-size-fits-all” definition for a given concept (such as a “good kernel”), one often has to tailor the precise axioms for the need at hand, strengthening them if this will help prove other theorems, or weakening them if this helps find instances of the concept. For the purpose of an introductory course, one often does not strive for the most general statement one can make in these directions, but instead focuses on a simple illustrative case, even if this means that the axioms are unnecessarily strong or the conclusion unnecessarily weak.

This is in contrast with the more algebraic areas of mathematics, in which there often are natural and canonical definitions for key concepts; cf. Gowers’ “Two cultures of mathematics” essay.

23 May, 2016 at 12:15 pm

Chris

Dear Terry,

in the proof of Lemma 18.2 you use this nice map. You choose for each jump at $x$ a rational number $q_x$ which lies between the left and right limit. Do you use the axiom of choice to do this?

Thanks.

23 May, 2016 at 6:37 pm

Terence Tao

One does not need the axiom of choice for this, as it is possible to place an explicit well-ordering on the rationals which can be used to create a choice function.

19 July, 2016 at 6:32 pm

Sunting

dear prof. tao. i am wondering whether the rising sun lemma could be true when it is defined on R. i.e.,
for any continuous F:R->R;
then we could find an at most countable family of disjoint non-empty open intervals In=(an,bn), such that:
1)for each n, F(an)=F(bn);
2)for any x not belonging to any of In. then we have F(y)<=F(x)for any x<=y;

19 July, 2016 at 6:57 pm

Sunting

sorry , i may be wrong . it seems this lemma dont admit In=(-inf,a) or In=(a,inf);

27 September, 2016 at 10:52 am

246A, Notes 2: complex integration | What's new

[…] variants of the above integrals (e.g. the Henstock-Kurzweil integral, discussed for instance in this previous post), which can handle slightly different classes of functions and have slightly different properties […]

2 December, 2016 at 4:58 pm

Anonymous

In Theorem 6, do we have more information regarding the $x$ such that $F'(x)=f(x)$ ? For instance, do we have a characterization of such $x$ ? Suppose $f$ is continuous at $x=a$ . Can we conclude that $F'(a)=f(a)$ ?

3 December, 2016 at 2:19 am

Anonymous

$F'(x) = f(x)$ holds whenever $x$ is a Lebesgue point of $f$ (see e.g. the Wikipedia articles on Lebesgue points and Lebesgue differentiation theorem). Note that each continuity point of $f$ is a Lebesgue point of $f$ .

22 December, 2016 at 8:01 am

coupon_clipper

Terry, in exercise 9 part 1, you say to show that $f^{-1}(z+D)$ has positive measure for some z. I get the feeling that this is supposed to be simple but I’m stuck here.

I was able to show that if this weren’t the case, then there would be a null set (in $R^d$ ) that maps to the entire complex plane, but I’m not sure that leads to a contradiction. Can you give a hint?

[If $f^{-1}(z + D)$ is null for every $z$ , then one should be able to show that all of ${\bf R}^d$ is null by expressing the latter set as a countable union of sets of the former type. -T.]

23 December, 2016 at 5:05 am

coupon_clipper

That did it. Thanks T!

25 December, 2016 at 8:50 am

Anonymous

There are so many inequalities you labeled as “Hardy-Littlewood”. Did they actually give all of them mentioned here?

Besides, in “Just as the integration theory of unsigned functions can be used to develop the integration theory of the absolutely convergent functions”, I think you mean “absolutely integrable functions”?

26 December, 2016 at 10:35 am

Terence Tao

Thanks for the correction. All the one-dimensional Hardy-Littlewood inequalities stated here are essentially in the original 1930 paper of Hardy and Littlewood, although the notation is somewhat different. The higher-dimensional generalisations of the Hardy-Littlewood inequality first appear I believe in a 1939 paper of Wiener, so strictly speaking it should be referred to as a maximal inequality of Hardy-Littlewood type.

25 December, 2016 at 9:03 am

Anonymous

Is the converse of Corollary 21 also true so that we have a characterization of differentiable almost every where functions?

26 December, 2016 at 10:47 am

Terence Tao

No, for instance $\sin(\frac{1}{x})$ is differentiable almost everywhere on the real line (after assigning some arbitrary value to $x=0$ ) but has infinite variation; another class of examples would be functions that are locally constant outside of a Cantor set, but whose values on each connected component of the complement of the Cantor set are set to arbitrary values unrelated to each other. I would imagine that there is no particularly useful characterisation of the entire class of almost everywhere differentiable functions other than the tautological one of being differentiable outside of a set of measure zero.

25 December, 2016 at 6:10 pm

Anonymous

What is the relation between bounded variation and integrability on a compact interval?

[The fundamental theorems of calculus identify the absolutely continuous functions as the antiderivatives of the integrable functions, and the absolutely continuous functions are a subclass of the bounded variation functions. -T.]

26 December, 2016 at 1:15 pm

Anonymous

Sorry for the confusion. I was going to ask what is the relation between bounded variation and integrability of a function on a compact interval.

Say $f:[a,b]\to\mathbb{R}$ . Is there any relation between the integrability (being absolutely integrable or not) of $f$ on $[a,b]$ and the “bounded variationess” of $f$ ?

26 December, 2016 at 4:45 pm

Anonymous

A “d” is missing in the statement of Theorem 12.

[Corrected, thanks – T.]

26 December, 2016 at 8:35 pm

Terence Tao

Functions of bounded variation are bounded and measurable, hence integrable. On the other hand, the converse statements are very far from being true; functions of bounded variation have roughly one degree of regularity, functions that are merely integrable have none.

3 January, 2017 at 4:21 pm

Anonymous

All the dumb questions I asked here are for doing the exercise in Stein-Shakarchi:

Let $a,b>0$ and $f:[0,1]\to\mathbb{R}$ be defined with $f(x)=x^a\sin(x^{-b})$ for $x\in(0,1]$ and $f(0)=0$ . Show that $f$ is BV if and only if $a>b$.

When $a>b>0$ , all I can find is
$f'\in L^1([0,1])$. I really don’t know if I can conclude that $f$ is BV due to the possible bad behavior at $x=0$ since $f$ does not necessarily have a derivative there.

1 January, 2017 at 9:00 am

Anonymous

This is a related “dumb question” that I don’t know how to answer. Consider $f:[a,b]\to\mathbb{R}$ . (1) Suppose $f$ is BV on $[a,b]$ . Then one has $f$ is differentiable a.e. Can one say further that $f'\in L^1([a,b])$ ? (2) On the other hand, if $f'\in L^1([a,b])$ (assuming $f'$ exists a.e.), can one conclude that $f$ is BV on $[a,b]$?

I can only answer this question when $f$ is absolutely continuous, which can be done by Exercise 38.

[For (1), combine Proposition 20 with Proposition 22. for (2), see my comment from 26 Dec 10:47am. -T.]

2 January, 2017 at 11:44 am

Anonymous

Thanks for the comment. For (2) I think the example $\sin\left(\frac{1}{x}\right)$ on $[0,1]$ does not work since the derivative $\cos\left(\frac{1}{x}\right)$ is not $L^1$ on $[0,1]$ and you are referring to the Cantor-function-like function actually?

One can at least now say that (i)being differentiable a.e. on $[a,b]$ and (ii)the derivative $f'\in L^1([a,b])$ are necessary for $f$ being BV on $[a,b]$ but far less than sufficient due to the existence of counterexamples (the Cantor-function-like function?).

What if one makes the exceptional set for (i) smaller: $f:[a,b]\to\mathbb{R}$ is differentiable everywhere on $[a,b]$ except at one point (or finitely many points even countably many)?

3 January, 2017 at 3:36 pm

Anonymous

I’m trying to use Exercise 36. Does one have

$\displaystyle \|F\|_{TV[a,b]}=\lim_{\epsilon\to0+}\|F\|_{TV[a+\epsilon,b]}?$

3 January, 2017 at 3:44 pm

Anonymous

And of course one should also assume that $F$ is continuous.

26 December, 2016 at 12:41 pm

Anonymous

In exercise 28, Weierstrass function is represented by a lacunary Fourier series. Is it necessary for any such example (periodic, continuous and non-differentiable everywhere) to have a lacunary Fourier series representation?

26 December, 2016 at 8:37 pm

Terence Tao

No. For instance, one can show that non-differentiable everywhere periodic functions are a comeager subset of the class of continuous periodic functions, whereas the lacunary Fourier series are a meager set.

26 December, 2016 at 6:03 pm

Anonymous

Due to my ignorance, I found the proof of the differentiation theorem (between Lemma 9 and Lemma10) hard to follow while the counterpart in Stein-Shakarchi (p105) seems to be very clear. (I like your comment about the “density argument” a lot. But I feel very stupid and frustrated that I didn’t figure out how the density argument is made.) Am I missing something that makes the difficulties of reading this particular proof? It seems that the style of writing here is sort of improvising, which is rather different from your advice on writing(https://terrytao.wordpress.com/advice-on-writing-papers/).

28 December, 2016 at 12:25 pm

Anonymous

I stared at the definition of Dini derivatives and wanted to try some nontrivial examples but I couldn’t figure out a way. Let $f(x)=x\sin\frac{1}{x}$ for $x\neq 0$ and $f(0)=0$ . Would you give an example for how to calculate one of the four Dini derivatives?

In general, is there a systematic way to calculate the Dini derivatives at the point where the given function is not differentiable?

28 December, 2016 at 1:28 pm

Anonymous

Would you elaborate the hint for Exercise 30? What does it mean by $h$ ranges over a countable set and how it would be useful?

[Countable unions or intersections of measurable sets stay measurable, whereas uncountable unions or intersections do not. A naive description of the level set of a Dini derivative, e.g. $\{ x: \overline{{\mathcal D}_+}F(x) > \lambda\}$ , will involve an uncountable union or intersection because $h$ ranges over the set of positive reals, which is an uncountable set. If one can somehow restrict the range of $h$ to a countable set (e.g. the set of positive rationals), then this problem goes away. -T.]

12 January, 2017 at 5:56 am

coupon_clipper

Exercise 28 (proving the Weierstrass function is nowhere differentiable — the continuous part was easy for me) is really giving me trouble. Following your hint, I computed $F(\frac{j+1}{8^k})$ and $F(\frac{j}{8^k})$ (I switched to using k as an index so we’re not using n twice) and found that they can both be expressed as a sum of $k-1$ terms instead of an infinite series.

Then when I subtract them, I can use the “difference of sines” formula, but that’s where I get stuck:

$F(\frac{j+1}{8^k}) - F(\frac{j}{8^k}) = \sum_{n=1}^{k-1} 4^{-n}\cdot 2\cdot sin(8^{n-k} \pi /2) \cdot cos \left(8^{n-k} \pi (j+\frac{1}{2})\right)$

I don’t see any way to bound this from below (in absolute value).

I know this seems like a basic analysis problem and it wasn’t really the point of this whole chapter, but I really want to figure it out.

12 January, 2017 at 9:23 am

Terence Tao

Hmm, actually the hint would fit the problem better if the exercise used cosines rather than sines, so that there is an $n=k$ term that easily dominates the rest of the sum. I’ll change the exercise accordingly.

12 January, 2017 at 11:13 am

coupon_clipper

Thanks, Terry! That’s better. (You swapped an m and n in part 2 though.)

[Corrected, thanks – T.]

12 January, 2017 at 11:14 am

coupon_clipper

Also, it’s exercise 52 in case anyone else is reading along. I’m not sure where I got 28 from.

1 May, 2017 at 2:32 pm

Pierre

Thanks Tao for these lectures.

Maybe i don’t understand the formulation of the Besicovitch covering lemma :
If i take (0,1),(-1,2/3),(1/3,2), there is no subfamily with the same union and 1/2 will always be in 3 intervals.

1 May, 2017 at 2:34 pm

Pierre

All wrong, sorry

23 August, 2017 at 3:57 am

Joe Li

A small typo:

In the proof of Proposition 19:

Applying (4), we conclude that
…
which we rearrange as

the next line there is an extra $h$ before the first $(x)$ .

[Corrected, thanks – T.]

23 August, 2017 at 4:01 am

Joe Li

At the end of the proof of Lebesgue differentiation theorem (just above lemma 26), is it necessary that $\lambda$ go to zero along a COUNTABLE sequence?

[This is because there is a measure zero exceptional set of $x$ for each $\lambda$ . In order to ensure that the union of these exceptional sets remains of measure zero, one needs to use only countably many $\lambda$ . -T]

24 August, 2017 at 2:20 am

Joe Li

Small typo in several places in section 2:

$\int_\mathbf{R}$ should be $\int_{\mathbf{R}^d}$ .

If I’m correct, these places are:

RHS of the line below Theorem 36 (Hardy-Littlewood maximal inequality)
RHS of the line below “It suffices to verify the claim with strict inequality”
RHS of the line below “By inner regularity, it suffices to show that”
RHS of the line below “establish the dyadic Hardy-Littlewood maximal inequality”

[Corrected, thanks – T.]

25 August, 2017 at 3:05 am

Joe Li

Small typos:

3rd lines above Remark 60:
$m(E_{r,R})$ should be $m(E_{r,R} \cap [a,b])$ .

In the proof of Lemma 62:

As discussed previously, $G$ is discontinuous only at $A$

should be

As discussed previously, $F$ is discontinuous only at $A$

since there is no $G$ in the context.

[Corrected, thanks – T.]

20 May, 2018 at 8:00 pm

A short proof of the Hardy-Littlewood maximal inequality | George Shakan

[…] Incidentally his proof gives the better constant , though this is well known, see for instance exercise 42 in these notes of Tao. One cute related geometric question is can one improve the constant in Vitali’s covering lemma […]

16 September, 2018 at 9:31 am

254A, Notes 1: Local well-posedness of the Navier-Stokes equations | What's new

[…] Exercise 2 Relax the hypotheses of continuity on to that of being measurable and bounded on compact intervals. (You will need tools such as the fundamental theorem of calculus for absolutely continuous or Lipschitz functions, covered for instance in this previous set of notes.) […]

18 September, 2018 at 10:44 am

Alan Chang

In Definition 66 (Bounded variation) and a few places afterwards, F(x_{i+1}) should be F(x_{i-1}).

[Corrected, thanks – T.]

1 June, 2019 at 7:45 am

Anonymous

I don’t quite understand the conclusion of Theorem 11:

Then the Riemann integral ${\int_a^b F'(x)\ dx}$ of ${F'}$ is equal to ${F(b) - F(a)}$ . In particular, we have ${\int_a^b F'(x)\ dx = F(b)-F(a)}$ whenever ${F}$ is continuously differentiable.

Aren’t the following talking about the same things?

– the Riemann integral ${\int_a^b F'(x)\ dx}$ of ${F'}$ is equal to ${F(b) - F(a)}$

– ${\int_a^b F'(x)\ dx = F(b)-F(a)}$

1 June, 2019 at 8:40 am

Terence Tao

Yes, the second claim is a particular case of the first (continuously differentiable functions have a derivative that is Riemann integrable).

1 June, 2019 at 8:11 am

Anonymous

Is there a relation between Theorem 89 and Proposition 92? It seems that the hypothesis of Prop 92:

… let ${F: [a,b] \rightarrow {\bf R}}$ be a differentiable function, such that ${F'}$ is absolutely integrable.

implies that of Theorem 89, namely $F$ is absolutely continuous, doesn’t it? If true, then Proposition 92 seems a bit redundant. (I may misunderstand the purpose of Theorem 89 though.)

1 June, 2019 at 8:43 am

Terence Tao

It is true that an everywhere differentiable function with absolutely integrable derivative is absolutely continuous, but to prove this one has to use Proposition 92 (together with Exercise 87.6). Note that Proposition 92 is a little subtle because it is not true if the function is merely assumed to be differentiable almost everywhere instead of differentiable everywhere, even when the function is continuous, as the example of the Devil’s staircase function illustrates.

1 June, 2019 at 10:18 am

Anonymous

Regarding Definition 86, is “absolute continuity” a global concept (like “uniform continuity on an interval”) or a local one (like “continuity” at a point)?

In Exercise 87:

Show that every absolutely continuous function is of bounded variation on every compact interval ${[a,b]}$ .

Should one understand “absolute continuity” as on $\mathbf{R}$ or on $[a,b]$ ?

1 June, 2019 at 2:20 pm

Terence Tao

The property of being absolutely continuous is localisable, in the sense that if a function is absolutely continuous on an interval (or real line) $I$, then its restriction to a subinterval $J$ will also be absolutely continuous, and conversely if $I$ is partitioned into subintervals and a function is absolutely continuous on each subinterval (and continuous at the endpoints of the subintervals) then it will be absolutely continuous on the entire domain. However, it is not a pointwise property: there is no meaningful notion of a function being “absolutely continuous at a point”.

1 June, 2019 at 10:57 am

Anonymous

Lemma 62.3 and the Lebesgue decomposition theorem in the linked notes are both a sort of “decomposition” and they both have the concept of “absolute continuity” in the statement.

But in the former decomposition $F=F_c+F_{pp}$ , “absolute continuity” appears implicitly in the assumption while the later one $\mu = \mu_{ac} + \mu_s$ has “absolute continuity” in the conclusion.

Would you elaborate how Lemma 62.3 fits in the Lebesgue decomposition theorem?

1 June, 2019 at 2:22 pm

Terence Tao

The Lebesgue-Stieltjes measure $dF$ associated to a monotone function $F$ need not be absolutely continuous; it can have both absolutely continuous and singular components. The singular component in turn splits into a singular continuous component and a pure point component. The function $F_c$ is the cumulative distribution function of the sum of the absolutely continuous and singular continuous components, and $F_{pp}$ is the cumulative distribution function of the pure point component.

13 October, 2019 at 4:27 am

Anonymous

Stein said without proofs at the beginning of his singular integral book that it is a “simple” observation that the maximal function satisfies $Mf(x)\ge c|x|^{-n}$ . I searched around the book without finding anything to prove it. I believe it relates to this set of notes. How can one do that?

13 October, 2019 at 5:33 am

Anonymous

If $f(x)$ is identically zero, this inequality is clearly false for any $c > 0$

13 October, 2019 at 8:35 am

Anonymous

The assumption on $f$ is of course that $f$ is not identically zero.

14 October, 2019 at 8:38 am

Terence Tao

For any $R > 0$ and $|x| \geq 1$ one has

$\displaystyle Mf(x) \geq \frac{1}{m(B(x,|x|+R))} \int_{B(x,|x|+R)} |f(y)|\ dy$

$\displaystyle\geq \frac{c}{(|x|+R)^n} \int_{B(0,R)} |f(y)|\ dy \geq \frac{c'}{|x|^n}$

for some $c'>0$ depending on $R, f$ , as long as $R$ is chosen so that $\int_{B(0,R)} |f(y)|\ dy$ is non-zero, which is possible to do for any non-trivial $f$ (which is part of the hypotheses in the remark in Stein’s book on pages 5-6, as is the constraint $|x| \geq 1$ ).

14 October, 2019 at 10:54 am

Anonymous

It seems that it is possible to make $c'$ dependent only on $f$ if the inequality is required to hold only for sufficiently large $|x|$ (where the “sufficiently large” threshold is dependent only on $f$ )

17 October, 2019 at 6:59 am

Anonymous

The existence of $R$ can be derived by looking at the contrapositive of the statement: if $\int_{B(0,R)} |f(y)|\ dy=0$ for all $R$ , then one must have $\int |f|=0$ by the monotone convergence theorem. Is there any “constructive” proof showing what that $R$ should at least be?

If one considers the set $E=\{x:f(x)\neq 0\}$ , then $B(0,R)$ should contain a “large portion” of $E$ . If $E$ is bounded, then this is trivial.

17 October, 2019 at 7:35 am

Terence Tao

Perhaps it is clarifying to look at a simpler discrete analogue of this question. It is tautological that if a sequence $a_1,a_2,\dots$ is not identically zero, then there exists an $n$ such that $a_n \neq 0$ ; this is a discrete analogue of the assertion that if a function $f: {\bf R}^d \to {\bf R}$ is not identically zero a.e., then there is an $R$ such that $\int_{B(0,R)} |f| \neq 0$ . However, in both cases there is no bound on the quantity $n$ or $R$ . One explanation for this is that the key hypothesis of being “not identically zero” is an open condition, rather than a closed condition, and in particular is certainly not a compact condition. So one does not expect to have any uniform bound on conclusions that rely on such an open condition. (Compare for instance the assertion that if a real number $x$ is non-zero, then its reciprocal $1/x$ is finite. This assertion is trivial but one has no uniform bound on the magnitude of $1/x$ because the hypothesis of being non-zero is open. If instead we replaced that hypothesis with a closed condition like $|x| \geq \varepsilon$ then one can now obtain a uniform bound.)

So if one somehow upgrades the hypothesis of being not identically zero to something more quantitative (and closed), then there is a chance of getting a usable bound. For instance if one has some lower bound on $|a_n|$ or $|f(x)|$ that becomes positive when $n$ or $x$ is large enough then this would bound the quantity $n$ or $R$ in the previous conclusions.
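
As a concrete illustration of the lack of a uniform bound (an example added here, not taken from the discussion above; $e_1$ denotes the first standard basis vector of ${\bf R}^d$ and $k \geq 1$ is an integer), take $f_k := 1_{B(k e_1, 1)}$ . Then $\int_{{\bf R}^d} |f_k| = m(B(0,1)) > 0$ for every $k$ , yet

$\displaystyle \int_{B(0,R)} |f_k(y)|\ dy = 0 \hbox{ whenever } R \leq k-1,$

since any $y \in B(k e_1, 1)$ has $|y| > k-1$ , so that $B(0,R)$ and $B(k e_1,1)$ are disjoint. Thus any admissible $R$ must exceed $k-1$ , and no single $R$ works for all of the $f_k$ at once.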

18 October, 2019 at 1:28 am

Anonymous

Is there a precise definition for the concept “open condition” ?

23 October, 2019 at 6:53 am

Anonymous

In Stein-Shakarchi, the maximal function is defined as

$f^*(x)=\sup_{x\in B}\frac{1}{m(B)}\int_B|f(y)|\,dy$

I believe this is equivalent to the one given in this set of notes. Is it simply a matter of taste? Or does the flexibility of balls (not necessarily centered at $x$ ) in the above definition make any argument related to maximal functions easier in practice?

Stein’s book on singular integrals has the same version of maximal functions as the one here. I am wondering if there is any technical reason that he changed the definition to a different one. (Also a different version of the covering lemma was used there with the constant $5^d$ instead of $3^d$ . I’m curious who in history gave this slight improvement.)

23 October, 2019 at 7:50 am

Anonymous

This is the non-centered version of the maximal function; the Hardy-Littlewood maximal function is the centered one. The two versions are clearly equivalent, since the centered maximal function is bounded above by the non-centered one, which in turn is bounded above by $2^d$ times the centered one (by doubling the ball radius).
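
To spell out the doubling step (a sketch; the notation $Mf(x) := \sup_{r>0} \frac{1}{m(B(x,r))} \int_{B(x,r)} |f(y)|\ dy$ for the centered maximal function, and the names $z, r$ for the centre and radius of $B$ , are introduced here for illustration): if $x \in B = B(z,r)$ , then $B \subset B(x,2r)$ and $m(B(x,2r)) = 2^d m(B)$ , so

$\displaystyle \frac{1}{m(B)} \int_B |f(y)|\ dy \leq \frac{2^d}{m(B(x,2r))} \int_{B(x,2r)} |f(y)|\ dy \leq 2^d Mf(x).$

Taking the supremum over all balls $B$ containing $x$ gives $f^*(x) \leq 2^d Mf(x)$ , while $Mf(x) \leq f^*(x)$ is immediate because every ball centered at $x$ contains $x$ .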

14 May, 2020 at 7:26 am

247B, Notes 4: almost everywhere convergence of Fourier series | What's new

[…] established with the assistance of the Hardy-Littlewood maximal inequality; see for instance this previous blog post. A remarkable observation of Stein, known as Stein’s maximal principle, allows one to reverse […]

10 August, 2020 at 11:53 am

Zijin Liu

In Exercise 42, do you mean using the sub-collection constructed in the Vitali covering lemma to cover the compact set $K$ (with two times the radius)? I have been stuck for a couple of days; could you provide some further hints?

Thank you.

10 August, 2020 at 7:42 pm

Terence Tao

Basically yes, except that one has to use $2+\varepsilon$ times the radius rather than twice the radius; also one has to use compactness more efficiently to make $K$ covered by the $\varepsilon B_j$ rather than by $B_j$ .

17 August, 2020 at 10:13 pm

Zijin Liu

Dear Professor Tao,

In Exercise 58, I was trying to use a Vitali-type covering lemma to modify the proof of Theorem 36. But I can only get an $h_x$ for each $x$ such that $\frac{F(x+h_x)-F(x)}{h_x} > \lambda$ , and the intervals $(x,x+h_x)$ may not form an open cover, since they are not centered at $x$ . Could you provide some hint to solve this issue?

Thank you

20 August, 2020 at 2:15 pm

Terence Tao

One can create an epsilon of room here and enlarge each $(x,x+h_x)$ by a tiny amount to generate an open cover without degrading the lower bound on the difference quotient by too much, and then take limits at the end of the argument to remove the loss.

19 February, 2021 at 9:03 am

N is a number

For solving Exercise 16, I think one can use Fubini’s theorem: we have $F(x) = \lim_{ n \to \infty } \int_{-n}^{x} f(t) dt$ and so taking one more limit as $x \to y$ and then interchanging the limits (thanks to Fubini’s theorem, since both of these exist and the integrand is absolutely integrable), the claim follows.

Is there a way to avoid using Fubini’s theorem (something which is easier than the proof of Fubini’s theorem, for instance)?

[Dominated convergence works just fine here – T.]

21 February, 2021 at 9:00 am

N is a number

It can also be proved using a density argument along the lines of the proof of Proposition 19.

22 August, 2022 at 9:34 pm

khaledalekasir

Hi Prof. Tao,
I was trying to pin down the elements of the “density argument” in the proof given for the second fundamental theorem of calculus.
First of all, the theorem is proven for the continuous monotone case, but you’ve mentioned: “space of continuous monotone functions are not sufficiently dense in the space of all monotone functions in the relevant sense (you have mentioned this as total variation sense)”.
I’m still having trouble understanding the meaning of “sufficiently dense” and “relevant sense”.
Thanks.

31 March, 2023 at 12:44 am

Sam

Dear Professor Tao:
For Exercise $29$ (Two-sided Hardy-Littlewood), can we simply use the one-sided Hardy-Littlewood inequality on the reflected function $x \mapsto f(-x)$ ?

[Yes, this (and the triangle inequality) will solve the problem. -T]
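
To expand on this hint (a sketch under assumptions: that the one-sided inequality is available in the form $m(\{x : \sup_{h>0} \frac{1}{h}\int_{[x,x+h]} |f| \geq \lambda\}) \leq \frac{1}{\lambda}\int_{\bf R} |f|$ , and that the two-sided supremum runs over intervals $I$ of positive length containing $x$ ; the names $h_1, h_2, M_+, M_-$ are introduced here for illustration): write $M_+ f(x) := \sup_{h>0} \frac{1}{h}\int_{[x,x+h]} |f|$ and $M_- f(x) := \sup_{h>0} \frac{1}{h}\int_{[x-h,x]} |f|$ . An interval $I$ containing $x$ has endpoints $x-h_1$ and $x+h_2$ with $h_1,h_2 \geq 0$ and $h_1+h_2>0$ , and

$\displaystyle \frac{1}{|I|}\int_I |f| = \frac{h_1}{h_1+h_2} \cdot \frac{1}{h_1}\int_{[x-h_1,x]} |f| + \frac{h_2}{h_1+h_2} \cdot \frac{1}{h_2}\int_{[x,x+h_2]} |f| \leq \max( M_- f(x), M_+ f(x) )$

(dropping whichever term has $h_i = 0$ ). Hence the set where the two-sided supremum is at least $\lambda$ is contained in $\{M_- f \geq \lambda\} \cup \{M_+ f \geq \lambda\}$ . The one-sided inequality bounds $m(\{M_+ f \geq \lambda\})$ by $\frac{1}{\lambda}\int_{\bf R}|f|$ , and applying it to the reflected function $x \mapsto f(-x)$ gives the same bound for $m(\{M_- f \geq \lambda\})$ , for a total of $\frac{2}{\lambda}\int_{\bf R}|f|$ .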

5 April, 2023 at 11:53 am

Sam

Thank you professor. Also, for Exercise $31$ , in applying the rising sun lemma to the function $F(x) = \int_{[a, x]}f(t)\ dt - \lambda(x - a)$ , I was trying to show that the case $a_n = a$ and $F(b_n) > F(a_n)$ is impossible when $\lambda > 0$ (so we can have $\int_{I_n}f(t)\ dt = \lambda(b_n - a_n)$ for all $n$ ), but was stuck in finding the contradiction. Could you provide some further hint on the correct direction to proceed?

[Move the starting point $a$ to be well to the left of the support of $f$ to exclude this possibility. Drawing pictures may help. -T]

14 April, 2023 at 9:38 pm

Sam

Dear professor Tao:
In Exercise 46, do we need the Besicovitch covering lemma instead of the Vitali-type covering lemma, since $x$ is not assumed to be the center of the ball here, while the Vitali-type covering lemma gives the version of the maximal inequality where $x$ is the center? What underlying property of the general locally finite Borel measure $\mu$ that is not shared by the Lebesgue measure $m$ did we use in answering this question? Thank you.

17 April, 2023 at 8:06 am

Terence Tao

For general measures $\mu$ we do not have a clean relationship between $\mu(3B)$ and $\mu(B)$ .

17 April, 2023 at 12:28 pm

Sam

Thank you professor.

20 April, 2023 at 1:15 am

Anonymous

Dear professor Tao:
For part (2) of Exercise 49, it seems that one can reduce to $E$ being a bounded open set, which can thus be written as a countable union of almost disjoint cubes. From here, can you give some further hint as to how to construct the cube $Q$ with the desired property? Also in part (1), do we need Exercise 48 to ensure that $x$ is a Lebesgue point (since it’s a density point), so we can safely apply Exercise 34?

22 April, 2023 at 12:21 am

Anonymous

Dear professor Tao:
I think I got the idea of part (2) of Exercise 49: approximate $E$ from outside by an open set $U$ , which can be written as a countable union of almost disjoint cubes, and then one of those cubes must have the desired property.

27 April, 2023 at 12:11 am

Sam

Dear professor Tao:
Following the hint in part $(2)$ of Exercise 50, for any subinterval $I \subset (-1, 2)$ , $I \subset (-1, 2) \setminus K$ , which is a countable union of disjoint open intervals. From here, can you elaborate on the comment “fill in these open intervals and iterate”? What are we aiming at?

1 May, 2023 at 1:16 am

Sam

My understanding is to take a compact nowhere dense subset $E_1 \subset I$ with measure $> \frac{1}{2}|I|$ (say), then take the union $E_2$ of compact nowhere dense subsets of the open components of $I \setminus E_1$ , each with measure greater than half the length of its open component, and so on. Then define $E := \bigcup_n E_n$ . Since $E$ is well-distributed in $I$ , the translates of $E$ will be well-distributed in $\mathbb{R}$ .

7 May, 2023 at 11:09 pm

Sam

For this construction, I’m able to show that $m(E \cap I') > 0$ for every interval $I' \subset I$ . Yet I’m not able to show that $m(E \cap I') < |I'|$ .

16 May, 2023 at 4:27 pm

Sam

The difficulty for me is to control $m(I' \cap E_n)$ in terms of $|I'|$ for any subinterval $I' \subset I$ . Is this the wrong direction to consider?

21 May, 2023 at 7:55 pm

Sam

Let $E_n \subset I \setminus K \setminus E_1 \setminus \ldots \setminus E_{n-1}$ be the countable union of compact fillings in the $n$ th stage. I am stuck on how to control the size of the filling $m(E_n)$ to ensure $E = \bigcap_n E_n$ has the given property. Could you please provide some further hint, professor?

22 May, 2023 at 1:30 pm

Sam

Dear professor:
$E$ should be $\cup_n E_n$ rather than $\cap_n E_n$ in my last comment. My issue is that the holes are shrinking in size and chained together. Namely, $I' \setminus E_1 \supset I' \setminus E_1 \setminus E_2 \supset \ldots$ . If $n$ is the first natural number such that $m(I' \setminus E_1 \setminus \ldots E_n) < |I'|$ , then $0 < m(I' \cap E_n) < |I'|$ . In fact, $E_n$ partially fills the hole $I' \setminus E_1 \setminus \ldots \setminus E_{n-1}$ by our construction of the $E$ 's. But then it is unclear whether the part of $I'$ not filled by $E_n$ , will then be filled completely in the remaining stage by $E_{n+1}, E_{n+2}, \ldots$ , etc.

5 June, 2023 at 3:03 am

Anonymous

Dear professor Tao: For part (1) of Exercise 50, is there any theorem/proposition in the previous notes, other than the definition itself, that lets us calculate the total mass of $P$ ? Also, can you elaborate further on the hint in part (2) about comparing $P$ with the “horizontal wedding cake function” you mentioned?

5 June, 2023 at 7:42 am

Terence Tao

For the heat kernel, this is a gaussian integral and can be computed by a number of techniques (for instance a reduction to the famous identity $\int_{-\infty}^\infty e^{-\pi x^2}\ dx = 1$ ). For the Poisson kernel, one can rescale $t=1$ , and manipulate the integral by a number of techniques (e.g., polar coordinates and contour shifting, or using the identity $\frac{\Gamma(s)}{a^s} = \int_0^\infty e^{-at} t^{s-1}\ dt$ to express the Poisson kernel in terms of gaussians); alternatively, one can simply just declare $c_d$ by fiat to normalize the mass of the Poisson kernel.

For (2), one can obtain pointwise upper and lower bounds for $P$ in terms of the horizontal wedding cake function I mentioned, as well as a dilate of this function (replacing $x$ by $x/2$ or $2x$ ).
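
The two identities quoted for part (1) can be sanity-checked numerically. The following is a minimal sketch using only the Python standard library and a plain midpoint rule; the function names, the cutoffs, and the sample values $s = 2.5$ , $a = 3$ are illustrative choices made here, not anything from the notes, and a numerical check is of course no substitute for a proof.

```python
import math


def midpoint_integral(g, lo, hi, n=100_000):
    """Approximate the integral of g over [lo, hi] by the composite midpoint rule."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))


def gaussian_mass(cutoff=10.0):
    """Approximate the integral of exp(-pi x^2) over [-cutoff, cutoff].

    With the default cutoff, the discarded tail is smaller than exp(-100 pi),
    which is far below floating-point rounding error.
    """
    return midpoint_integral(lambda x: math.exp(-math.pi * x * x), -cutoff, cutoff)


def gamma_identity_gap(s=2.5, a=3.0, cutoff=40.0):
    """Return |Gamma(s)/a^s - integral_0^cutoff exp(-a t) t^(s-1) dt|.

    Assumes s >= 1 (so the integrand is bounded near t = 0) and a > 0.
    """
    lhs = math.gamma(s) / a ** s
    rhs = midpoint_integral(lambda t: math.exp(-a * t) * t ** (s - 1), 0.0, cutoff)
    return abs(lhs - rhs)


if __name__ == "__main__":
    print("Gaussian mass:", gaussian_mass())
    print("Gamma identity gap:", gamma_identity_gap())
```

Both quantities are truncated integrals, so the cutoffs need to be large enough that the discarded tails are negligible for the chosen parameters; for small $a$ the cutoff in `gamma_identity_gap` would have to be increased accordingly.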

6 June, 2023 at 2:08 am

Anonymous

Thank you so much professor.


197 comments
