Today, Charlie wrapped up several loose ends in his lectures, including the connection with the classical Whitney extension theorem, the role of convex bodies and Whitney convexity, and a glimpse as to how one obtains the remarkably fast (almost linear time) algorithms in which one actually computes interpolation of functions from finite amounts of data.

With all the notational machinery from the preceding lecture, the Whitney extension theorem can be stated rather cleanly in the language of bundles, sections, and Glaeser-stability as follows. Consider a bundle ${\mathcal H} = (H_x)_{x \in E}$ on a compact set $E \subset {\Bbb R}^n$ in which every fibre is simply a point: $H_x = \{ P_x\}$. The Whitney extension problem is then to see whether such bundles have sections. The theorem is then:

Whitney extension theorem. Let ${\mathcal H}$ be a bundle in which every fibre is a point. Then this bundle has sections if and only if it is Glaeser stable.

It is then clear that the main theorem of the previous lecture generalises the Whitney extension theorem (which represents the case of exactly one stratum, consisting entirely of zero-dimensional fibres). It is then unsurprising that the main tools used to prove the latter theorem (Whitney cubes, smooth partitions of unity, etc.) also come up in the proof of the former. It is also worth noting that Whitney’s proof shows that the map from the polynomials $P_x$ to the section F can be chosen to be linear. In particular, if the set E is finite, one can bound the $C^m({\Bbb R}^n)$ norm of F by some linear combination of the magnitudes of the coefficients of the polynomials $P_x$.

In the proof of the Whitney extension theorem, it was necessary to smoothly partition space into cubes, perform some sort of extension on each cube, and then sum everything up. In order to guarantee convergence, it is necessary that the bounds on the extension are (a) uniform in some sense over all cubes, and (b) are stable under multiplication by smooth cutoff functions. It turns out that we can abstract these properties by means of a quantitative generalisation of the notion of a bundle, namely that of a collection of Whitney convex sets. Recall that a bundle is a collection ${\mathcal H} = (H_x)_{x \in E}$, where each $H_x$ is either empty, or takes the form $H_x = P_x + \sigma(x)$ for some $P_x \in {\mathcal P}_x$ and some ideal $\sigma(x) \subset {\mathcal P}_x$. Just to belabor the definitions, an ideal is a vector subspace $\sigma(x)$ with the property that $PQ \in \sigma(x)$ whenever $P \in \sigma(x)$ and $Q \in {\mathcal P}_x$.

We can generalise the notion of an ideal as follows. If $x\in {\Bbb R}^n$ and $A > 0$, we say that a set $\sigma(x) \subset {\mathcal P}_x$ is Whitney convex with Whitney constant A if it is a closed convex symmetric set, and if for every $0 < \delta \leq 1$ we have the ideal-like property $PQ \in A \sigma(x)$ whenever $P \in \sigma(x)$, $Q \in {\mathcal P}_x$, and we have the bounds $|\nabla^j P(x)| \leq \delta^{m-j}$, $|\nabla^j Q(x)| \leq \delta^{-j}$ for $0 \leq j \leq m$.

The point if this rather technical definition is that if a jet is lying in $\sigma(x)$ and it has controlled $C^m$ behaviour at some small length scale $\delta$, then one can smoothly localise to that scale $\delta$ without causing the jet to escape $\sigma$ too much. This is the key property needed for Whitney-type arguments to work. (This is why Whitney-type extension problems are tractable, whereas general linear PDE problems are not; one can glue together local extensions to form a global extension via partitions of unity, whereas one cannot build, say, a globally harmonic function from unrelated locally harmonic pieces.)

There was also a slightly weaker notion of Whitney convexity relating to a modulus of continuity $\omega$, called Whitney $\omega$-convexity, which was the same except that the polynomial P had to now obey the bounds $|\nabla^j P(x)| \leq \omega(\delta) \delta^{m-j}$ instead of $|\nabla^j P(x)| \leq \delta^{m-j}$.

Charlie then stated the main technical theorem that underpinned all of the results he presented in this lecture series:

Finiteness theorem, full version. Given n, m, a collection $(\sigma(x))_{x \in E}$ of Whitney $\omega$-convex sets on a compact set $E \subset {\Bbb R}^n$ with some Whitney constant A and modulus of continuity $\omega$, and some polynomials $P = (P_x)_{x \in E}$, define $\|P\|_{(E,\sigma,\omega)}$ to be the best constant M such that there exists $F \in C^{m,\omega}({\Bbb R}^n)$ with norm at most M such that $J_x(F) \in P_x + M \sigma(x)$ for all $x \in E$. Then there exists a finite subset S of E of cardinality O(1) such that $\|P\|_{(E,\sigma,\omega)} \sim_A \|P\|_{(S,\sigma,\omega)}$.

(As in the previous talks, all implied constants can depend on n and m.) This generalises the finiteness theorem mentioned in the first talk, by replacing the ideals with Whitney-convex bodies. As mentioned in the second talk, this generalisation turns out to be necessary in the proof, in order for the induction on the number of strata to work properly. The presence of the modulus of continuity prevents any compactness failures, and so E can be infinite here without difficulty.

Charlie didn’t want to discuss the proof of the above finiteness theorem in all of its gory detail. But he did identify the classical finiteness theorem which is in fact the key to the whole argument, namely Helly’s theorem. This theorem asserts that in order to verify the simultaneous satisfiability of an arbitrary number of compact convex constraints in ${\Bbb R}^n$, it suffices to verify satisfiability of n+1 of them at a time. Apparently, one can view the above finiteness theorem as a massively iterated version of Helly’s theorem, together with several other ingredients (such as partitions of unity adapted to Whitney cubes).
Charlie then finished up by briefly discussing the computer science aspects of his work, and in particular the nearly linear-time algorithms used to compute the near-optimal extensions. He did not attempt to discuss the full details of these algorithms, but instead focused on a very simplified model case (which had been extensively studied earlier in the computer science literature, in particular by Callahan-Kosaraju and Arya-Mount-Netanyahu-Silverman-Wu), namely that of efficiently computing the Lipschitz norm $\|f\|_{Lip(E)} := \sup_{x,y \in E: x \neq y} \frac{|f(x)-f(y)|}{|x-y|}$

of a function $f: E \to {\Bbb R}$ on a finite set $E \subset {\Bbb R}^n$ of N points. Note that a brute force computation of this Lipschitz norm would take $O(N^2)$ computations. Can one do better?

In one dimension, one can get by in just $O(N \log N)$ computations, by first using a sorting algorithm (such as quicksort) to arrange E in order, and then only compute the Lipschitz quotient for adjacent x, y, which clearly suffices. But what to do in higher dimensions?

If one is willing to give up a multiplicative error of $1 \pm O(\varepsilon)$, then it turns out that there is a technique, called the well-separated pairs decomposition, which can compute the Lipschitz norm up to this error in merely $O_\varepsilon( N \log N )$ computations.

It works like this. By a divide-and-conquer algorithm, one can partition the off-diagonal set $\{ (x,y) \in E \times E: x \neq y \}$ into product sets $E_j \times E'_j$ for $j=1,\ldots,J$, where we have the separation condition $\hbox{diam}(E_j), \hbox{diam}(E'_j) \leq \varepsilon \hbox{dist}(E_j,E'_j)$ and the cardinality condition $J = O_\varepsilon(N)$.

Even with this decomposition in hand, it is not entirely obvious how to quickly compute the Lipschitz norm approximately. But there is a neat trick to pull this off. For each product set $E_j \times E'_j$, pick just one representative $x_j \in E_j$ and one representative $y_j \in E'_j$. Then the quantity $K := \sup_{1 \leq j \leq J} \frac{|f(x_j)-f(y_j)|}{|x_j-y_j|},$

which can be computed in $O_\varepsilon(N)$ time, is certainly a lower bound for $\|f\|_{Lip}$. It turns out that it is a pretty good upper bound too; in fact, $\|f\|_{Lip}$ cannot exceed $(1 + 100 \varepsilon)K$ (say). This can be demonstrated by a clever infinite descent argument. More precisely, suppose for contradiction that the Lipschitz norm was larger than this, then there exist distinct x, y such that $|f(x)-f(y)| > (1 + 100 \varepsilon) K |x-y|$.

On the other hand, we know that there must be some product $E_j \times E'_j$ such that $x \in E_j$ and $y \in E'_j$. Then by definition of K we have $|f(x_j)-f(y_j)| \leq K |x_j-y_j|$

and also $|x-x_j|, |y-y_j| \leq \varepsilon |x-y|$.

Putting all this together using the triangle inequality, we soon see that either $|f(x)-f(x_j)| > (1 + 100 \varepsilon) K |x-x_j|$

or $|f(y)-f(y_j)| > (1 + 100 \varepsilon) K |y-y_j|$,

thus leading to a pair $(x,x_j)$ or $(y,y_j)$ which also exhibits large Lipschitz norm but which has a smaller separation than $(x,y)$. But if we keep iterating this we obtain an infinite descent, which is absurd. This explains how one obtains the nearly linear time computation of the Lipschitz norm; and apparently a similar method also computes higher extension norms.