In my previous post, I briefly discussed the work of the four Fields medalists of 2010 (Lindenstrauss, Ngo, Smirnov, and Villani). In this post I will discuss the work of Dan Spielman (winner of the Nevanlinna prize), Yves Meyer (winner of the Gauss prize), and Louis Nirenberg (winner of the Chern medal). Again by chance, the work of all three of the recipients overlaps to some extent with my own areas of expertise, so I will be able to discuss a sample contribution for each of them. Again, my choice of contribution is somewhat idiosyncratic and is not intended to represent the “best” work of each of the awardees.

— 1. Dan Spielman —

Dan Spielman works in numerical analysis (and in particular, numerical linear algebra) and theoretical computer science. Here I want to talk about one of his key contributions, namely his pioneering work with Teng on smoothed analysis. This is about an idea as much as it is about a collection of rigorous results, though Spielman and Teng certainly did buttress their ideas with serious new theorems.

Prior to this work, there were two basic ways that one analysed the performance (which could mean run-time, accuracy, or some other desirable quality) of a given algorithm. Firstly, one could perform a worst-case analysis, in which one assumed that the input was chosen in such an “adversarial” fashion that the performance was as poor as possible. Such an analysis would be suitable for applications such as certain aspects of cryptography, in which the input really was chosen by an adversary, or in high-stakes situations in which there was zero tolerance for any error whatsoever; it is also useful as a “default” analysis for when no realistic input model is available.

At the other extreme, one could perform an average-case analysis, in which the input was chosen in a completely random fashion (e.g. a random string of zeroes and ones, or a random vector whose entries were all distributed according to a Gaussian distribution). While such input models were usually not too realistic (except in situations where the signal-to-noise ratio was very low), they were usually fairly simple to analyse (using tools such as concentration of measure).

In many situations, the worst-case analysis is too conservative, and the average-case analysis is too optimistic or unrealistic. For instance, when using the popular simplex method to solve linear programming problems, the worst-case run-time can be exponentially large in the size of the problem, whereas the average-case run-time (in which one is fed a randomly chosen linear program as input) is polynomial. However, the typical linear program that one encounters in practice has enough structure to it that it does not resemble a randomly chosen program at all, and so it is not clear that the average-case bound is appropriate for the type of inputs one has in practice. At the other extreme, the exponentially bad worst-case inputs were so rare that they never seemed to come up in practice either.

To obtain a better input model, Spielman and Teng considered a smoothed-case model, in which the input was the sum of a deterministic (and possibly worst-case) input, and a small noise perturbation, which they took to be Gaussian to simplify their analysis. This reflected the presence of measurement error, roundoff error, and similar sources of noise in real-life applications of numerical algorithms. Remarkably, they were able to analyse the run-time of the simplex method for this model, concluding (after a lengthy technical argument) that under reasonable choices of parameters, the expected run time was polynomial in the size of the input, thus explaining the empirically observed phenomenon that the simplex method tended to run a lot better in practice than its worst-case analysis would predict, even if one started with extremely ill-conditioned inputs, provided that there was a bit of noise in the system.

One of the ingredients in their analysis was a quantitative bound on the condition number of an arbitrary matrix when it is perturbed by a random Gaussian perturbation; the point being that random perturbation can often make an ill-conditioned matrix better behaved. (This is perhaps analogous in some ways to the empirical experience that some pieces of machinery work better after being kicked.) Recently, new tools from additive combinatorics (in particular, inverse Littlewood-Offord theory) have enabled Rudelson-Vershynin, Vu and myself, and others to generalise this bound to other noise models, such as random Bernoulli perturbations, which are a simple model for digital roundoff error.
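To see this smoothing effect of noise in the simplest possible terms, here is a toy numerical sketch of my own (it illustrates the phenomenon only, not the Spielman-Teng analysis): a singular matrix has an effectively infinite condition number, but a small Gaussian perturbation typically brings the condition number down to a moderate size.

```python
# Toy illustration (not the Spielman-Teng analysis itself): a singular matrix
# has effectively infinite condition number, but a small Gaussian perturbation
# typically makes it reasonably well-conditioned.
import numpy as np

rng = np.random.default_rng(0)
n = 200
A = np.outer(np.ones(n), np.ones(n)) / n      # rank one, hence exactly singular
sigma = 1e-3                                  # size of the noise

print(np.linalg.cond(A))                      # huge (the matrix is singular)
for _ in range(3):
    E = sigma * rng.standard_normal((n, n))   # Gaussian perturbation
    print(np.linalg.cond(A + E))              # moderate: roughly poly(n)/sigma
```

The quantitative bounds mentioned above make the word "typically" precise: roughly speaking, the probability that the perturbed matrix remains badly conditioned decays as the allowed condition number grows.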

— 2. Yves Meyer —

Yves Meyer has worked in many fields over the years, from number theory to harmonic analysis to PDE to signal processing. As the Gauss prize is concerned with impact on fields outside of mathematics, Meyer’s major contributions to the theoretical foundations of wavelets, which are now a basic tool in signal processing, were undoubtedly a major consideration in awarding this prize. But I would like to focus here on another of Yves’ contributions, namely the Coifman-Meyer theory of paraproducts developed with Raphy Coifman, which is a cornerstone of the para-differential calculus that has turned out to be an indispensable tool in the modern theory of nonlinear PDE.

Nonlinear differential equations, by definition, tend to involve a combination of differential operators and nonlinear operators. The simplest example of the latter is a pointwise product {uv} of two fields {u} and {v}. One is thus often led to study expressions such as {D(uv)} or {D^{-1}(uv)} for various differential operators {D}.

For first order operators {D}, we can handle derivatives using the product rule (or Leibniz rule) from freshman calculus:

\displaystyle  D(uv) = (Du)v + u(Dv).

We can then iterate this to handle higher order derivatives. For instance, we have

\displaystyle  D^2(uv) = (D^2 u) v + 2 (Du) (Dv) + u (D^2 v),

and

\displaystyle  D^3(uv) = (D^3 u) v + 3 (D^2u) (Dv) + 3 (Du) (D^2 v) + u (D^3 v),

and so forth, assuming of course that all functions involved are sufficiently regular so that all expressions make sense. For inverse derivative expressions such as {D^{-1}(uv)}, no such simple formula exists, although one does have the very important integration by parts formula as a substitute. And for fractional derivatives such as {|D|^\alpha(uv)} with {\alpha > 0} a non-integer, there is also no closed formula of the above form.
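As a quick symbolic sanity check of the pattern (a toy verification only, nothing deep), one can ask a computer algebra system to expand {D^3(uv)} and compare it against the binomial-coefficient expansion displayed above:

```python
# Symbolic check of the iterated Leibniz rule
# D^3(uv) = (D^3 u)v + 3(D^2 u)(Dv) + 3(Du)(D^2 v) + u(D^3 v)
# (illustration only).
import sympy as sp

x = sp.symbols('x')
u = sp.Function('u')(x)
v = sp.Function('v')(x)

lhs = sp.diff(u * v, x, 3)
rhs = (sp.diff(u, x, 3) * v + 3 * sp.diff(u, x, 2) * sp.diff(v, x)
       + 3 * sp.diff(u, x) * sp.diff(v, x, 2) + u * sp.diff(v, x, 3))
print(sp.simplify(lhs - rhs))   # 0
```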

Note how the derivatives on the product {uv} get distributed to the individual factors {u,v}, with {u} absorbing all the derivatives in one term, {v} absorbing all the derivatives in another, and the derivatives being shared between {u} and {v} in other terms. Within the usual confines of differential calculus, we cannot pick and choose which of the terms we would like to keep and which ones to discard; we must treat every single one of the terms that arise from the Leibniz expansion. This can cause difficulty when trying to control the product of two functions of unequal regularity – a situation that occurs very frequently in nonlinear PDE. For instance, if {u} is {C^1} (once continuously differentiable), and {v} is {C^3} (three times continuously differentiable), then the product {uv} is merely {C^1} rather than {C^3}; intuitively, we cannot “prevent” the three derivatives in the expression {D^3(uv)} from making their way to the {u} factor, which is not prepared to absorb all of them. However, it turns out that if we split the product {uv} into paraproducts such as the high-low paraproduct {\pi_{hl}(u,v)} and the low-high paraproduct {\pi_{lh}(u,v)}, we can effectively separate these terms from each other, allowing for a much more flexible analysis of the situation.

The concept of a paraproduct can be motivated by using the Fourier transform. For simplicity let us work in one dimension, with {D} being the usual differential operator {D = \frac{d}{dx}}. Using a Fourier expansion (and assuming as much regularity and integrability as is needed to justify the formal manipulations) of {u} into components of different frequencies, we have

\displaystyle  u(x) = \int_{\bf R} \hat u(\xi) e^{i x \xi}\ d\xi

(here we use the usual PDE normalisation in which we try to hide the {2\pi} factor) and similarly

\displaystyle  v(x) = \int_{\bf R} \hat v(\eta) e^{i x \eta}\ d\eta

and thus

\displaystyle  uv(x) = \int_{\bf R} \int_{\bf R} \hat u(\xi) \hat v(\eta) e^{ix(\xi+\eta)}\ d\xi d\eta \ \ \ \ \ (1)

and hence, by differentiating under the integral sign,

\displaystyle  D^k(uv)(x) = i^k \int_{\bf R} \int_{\bf R} (\xi+\eta)^k \hat u(\xi) \hat v(\eta) e^{ix(\xi+\eta)}\ d\xi d\eta.

In contrast, we have

\displaystyle  (D^j u) (D^{k-j} v)(x) = i^k \int_{\bf R} \int_{\bf R} \xi^j \eta^{k-j} \hat u(\xi) \hat v(\eta) e^{ix(\xi+\eta)}\ d\xi d\eta;

thus, the iterated Leibniz rule just becomes the binomial formula

\displaystyle  (\xi+\eta)^2 = \xi^2 + 2\xi\eta + \eta^2, \quad (\xi+\eta)^3 = \xi^3 + 3 \xi^2 \eta + 3 \xi \eta^2 + \eta^3, \ldots

after taking Fourier transforms.
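Here is a small numerical sanity check of this Fourier-side picture, on a periodic domain with the discrete Fourier transform standing in for the continuous one (again a toy illustration of my own): differentiation becomes multiplication by {i\xi} on the Fourier side, and the second-order Leibniz expansion then holds to within rounding error.

```python
# Numerical sanity check (toy illustration): on a periodic grid, the k-th
# derivative is multiplication by (i*xi)^k on the Fourier side, so the
# expansion D^2(uv) = (D^2 u)v + 2(Du)(Dv) + u(D^2 v) holds up to rounding.
import numpy as np

N = 256
x = 2 * np.pi * np.arange(N) / N
xi = np.fft.fftfreq(N, d=1.0 / N)               # integer frequencies on the torus

def D(f, k=1):
    """k-th derivative computed via the Fourier multiplier (i*xi)^k."""
    return np.real(np.fft.ifft((1j * xi) ** k * np.fft.fft(f)))

u = np.exp(np.sin(x))                            # two smooth periodic test functions
v = np.cos(3 * x) + 0.5 * np.sin(7 * x)

lhs = D(u * v, 2)
rhs = D(u, 2) * v + 2 * D(u, 1) * D(v, 1) + u * D(v, 2)
print(np.max(np.abs(lhs - rhs)))                 # tiny (rounding error only)
```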

Now, it is certainly true that when dealing with an expression such as {(\xi+\eta)^2}, all three terms {\xi^2, 2\xi\eta, \eta^2} need to be present. But observe that when dealing with a “high-low” frequency interaction, in which {\xi} is much larger in magnitude than {\eta}, then the first term dominates: {(\xi+\eta)^2 \sim \xi^2}. Conversely, with a “low-high” frequency interaction, in which {\eta} is much larger in magnitude than {\xi}, we have {(\xi+\eta)^2 \sim \eta^2}. (There are also “high-high” interactions, in which {\xi} and {\eta} are comparable in magnitude, and {(\xi+\eta)^2} can be significantly smaller than either {\xi^2} or {\eta^2}, but for simplicity of discussion let us ignore this case.) It then becomes natural to try to decompose the product {uv} into “high-low” and “low-high” pieces (plus a “high-high” error), for instance by inserting suitable cutoff functions {m_{hl}(\xi,\eta)} or {m_{lh}(\xi,\eta)}, supported on the regions {|\xi| \gg |\eta|} or {|\xi| \ll |\eta|} respectively, into (1) to create the paraproducts

\displaystyle  \pi_{hl}(u,v)(x) = \int_{\bf R} \int_{\bf R} m_{hl}(\xi,\eta) \hat u(\xi) \hat v(\eta) e^{ix(\xi+\eta)}\ d\xi d\eta

and

\displaystyle  \pi_{lh}(u,v)(x) = \int_{\bf R} \int_{\bf R} m_{lh}(\xi,\eta) \hat u(\xi) \hat v(\eta) e^{ix(\xi+\eta)}\ d\xi d\eta.

Such paraproducts were first introduced by Calderón, and more explicitly by Bony. Heuristically, {\pi_{hl}(u,v)} is the “high-low” portion of the product {uv}, in which the high frequency components of {u} are “allowed” to interact with the low frequency components of {v}, but no other frequency interactions are permitted, and similarly for {\pi_{lh}(u,v)}. The para-differential calculus of Bony, Coifman, and Meyer then allows one to manipulate these paraproducts in ways that are very similar to ordinary pointwise products, except that they behave better with respect to the Leibniz rule or with more exotic differential or integral operators. For instance, one has

\displaystyle  D^k \pi_{hl}(u,v) \approx \pi_{hl}(D^k u, v)

and

\displaystyle  D^k \pi_{lh}(u,v) \approx \pi_{lh}(u, D^k v)

for differential operators {D^k} (and more generally for pseudodifferential operators such as {|D|^\alpha}, or integral operators such as {D^{-1}}), where we use the {\approx} symbol loosely to denote “up to lower order terms”. Furthermore, many of the basic estimates of the pointwise product, in particular Hölder’s inequality, have analogues for paraproducts; this is a special case of what is now known as the Coifman-Meyer theorem, which is fundamental in this subject, and is proven using Littlewood-Paley theory. The same theory in fact gives some estimates for paraproducts beyond what are available for products. For instance, if {u} is in {C^1} and {v} is in {C^3}, then the paraproduct {\pi_{lh}(u,v)} is “almost” in {C^3} (modulo some technical logarithmic divergences which I will not elaborate on here), in contrast to the full product {uv} which is merely in {C^1}.
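To make the decomposition concrete, here is a toy numerical sketch (my own illustration; the actual theory uses smooth Littlewood-Paley cutoffs rather than the sharp cutoffs below): the product {uv} is split into high-low, low-high, and high-high pieces according to the relative sizes of the interacting frequencies, and the three pieces recombine to give {uv} exactly. Note that on the high-low region one has {|\xi+\eta| \sim |\xi|}, which is the source of the good behaviour of {\pi_{hl}} with respect to differentiation.

```python
# Toy paraproduct decomposition on the torus (illustration only; real
# Littlewood-Paley theory uses smooth dyadic cutoffs, not sharp ones).
# The product uv is split into high-low, low-high and high-high pieces
# according to the relative size of the interacting frequencies, and the
# three pieces sum back to uv.
import numpy as np

N = 64
x = 2 * np.pi * np.arange(N) / N
xi = np.fft.fftfreq(N, d=1.0 / N).astype(int)     # integer frequencies

u = np.exp(np.sin(x)) * np.cos(13 * x)            # has substantial high frequencies
v = np.cos(2 * x) + 0.3 * np.sin(3 * x)           # low-frequency function

u_hat = np.fft.fft(u) / N                         # Fourier coefficients of u
v_hat = np.fft.fft(v) / N                         # Fourier coefficients of v

def paraproduct(region):
    """Sum u_hat(xi) v_hat(eta) e^{ix(xi+eta)} over frequency pairs in region."""
    out = np.zeros(N, dtype=complex)
    for a in range(N):
        for b in range(N):
            if region(abs(xi[a]), abs(xi[b])):
                out += u_hat[a] * v_hat[b] * np.exp(1j * x * (xi[a] + xi[b]))
    return out.real

pi_hl = paraproduct(lambda s, t: s > 2 * t)                      # high-low
pi_lh = paraproduct(lambda s, t: t > 2 * s)                      # low-high
pi_hh = paraproduct(lambda s, t: not (s > 2 * t or t > 2 * s))   # comparable

print(np.max(np.abs(pi_hl + pi_lh + pi_hh - u * v)))   # ~1e-14: pieces recombine
```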

Paraproducts also allow one to extend the classical product and chain rules to fractional derivative operators, leading to the fractional Leibniz rule

\displaystyle  |D|^\alpha(uv) \approx (|D|^\alpha u) v + u (|D|^\alpha v)

and fractional chain rule

\displaystyle  |D|^\alpha(F(u)) \approx (|D|^\alpha u) F'(u)

which are both very useful in nonlinear PDE (see e.g. this book of Taylor for a thorough treatment). See also this brief Notices article on paraproducts by Benyi, Maldonado, and Naibo.
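To get some feeling for why the fractional Leibniz rule should hold, one can test it (purely heuristically, discarding all lower order terms) on a pair of plane waves {u(x) = e^{ix\xi}} and {v(x) = e^{ix\eta}} in the high-low regime {|\xi| \gg |\eta|}. In that case

\displaystyle  |D|^\alpha(uv) = |\xi+\eta|^\alpha e^{ix(\xi+\eta)} \approx |\xi|^\alpha e^{ix(\xi+\eta)} = (|D|^\alpha u) v,

while the remaining term {u (|D|^\alpha v) = |\eta|^\alpha e^{ix(\xi+\eta)}} is of lower order; the low-high regime {|\eta| \gg |\xi|} behaves symmetrically, with the roles of the two terms on the right-hand side reversed.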

— 3. Louis Nirenberg —

Louis Nirenberg has made an amazing number of contributions to analysis, PDE, and geometry (e.g. John-Nirenberg inequality, Nirenberg-Treves conjecture (recently solved by Dencker), Newlander-Nirenberg theorem, Gagliardo-Nirenberg inequality, Caffarelli-Kohn-Nirenberg theorem, etc.), while also being one of the nicest people I know. I will mention only two results of his here, one of them very briefly.

Among other things, Nirenberg and Kohn introduced the pseudo-differential calculus which, like the para-differential calculus mentioned in the previous section, is an extension of differential calculus, but this time focused more on generalisation to variable coefficient or fractional operators, rather than in generalising the Leibniz or chain rules. This calculus sits at the intersection of harmonic analysis, PDE, von Neumann algebras, microlocal analysis, and semiclassical physics, and also happens to be closely related to Meyer’s work on wavelets; it quantifies the positive aspects of the Heisenberg uncertainty principle, in that one can observe position and momentum simultaneously so long as the uncertainty relation is respected. But I will not discuss this topic further today.

Instead, I would like to focus here on a gem of an argument of Gidas, Ni, and Nirenberg, which is a brilliant application of Alexandrov’s method of moving planes, combined with the ubiquitous maximum principle. This concerns solutions to the ground state equation

\displaystyle  \Delta Q + Q^p = Q \ \ \ \ \ (2)

where {Q: {\bf R}^n \rightarrow {\bf R}^+} is a smooth positive function that decays exponentially at infinity, {p > 1} is an exponent, and {\Delta := \sum_{j=1}^n \frac{\partial^2}{\partial x_j^2}} is the Laplacian. This equation shows up in a number of contexts, including the nonlinear Schrödinger equation and also, by coincidence, in connection with the best constants in the Gagliardo-Nirenberg inequality. The existence of ground states {Q} can be proven by the variational principle. But one can say much more:

Lemma 1 (Gidas-Ni-Nirenberg) All ground states {Q} are radially symmetric with respect to some origin.
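As a concrete illustration of this lemma (my own sanity check, which plays no role in the Gidas-Ni-Nirenberg argument), in one dimension with {p = 3} the equation (2) has the explicit solution {Q(x) = \sqrt{2}/\cosh(x)}, which is positive, decays exponentially, and is indeed symmetric about the origin:

```python
# Sanity check (illustration only): in one dimension, Q(x) = sqrt(2) sech(x)
# solves the ground state equation Q'' + Q^3 = Q, and is even (symmetric
# about the origin), consistent with Lemma 1.
import sympy as sp

x = sp.symbols('x', real=True)
Q = sp.sqrt(2) / sp.cosh(x)

residual = sp.diff(Q, x, 2) + Q**3 - Q
print(sp.simplify(residual.rewrite(sp.exp)))   # 0: Q solves (2) with p = 3, n = 1
print(sp.simplify(Q.subs(x, -x) - Q))          # 0: Q(-x) = Q(x)
```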

To show this radial symmetry, a small amount of Euclidean geometry shows that it is enough to show that there is a lot of reflection symmetry:

Lemma 2 (Gidas-Ni-Nirenberg, again) If {Q} is a ground state and {\omega \in S^{n-1}} is a unit vector, then there exists a hyperplane orthogonal to {\omega} with respect to which {Q} is symmetric.

To prove this lemma, we use the moving planes method, sliding in a plane orthogonal to {\omega} from infinity. More precisely, for each {t \in {\bf R}}, let {\Pi_t} be the hyperplane {\{ x: x \cdot \omega = t \}}, let {H_t} be the associated half-space {\{ x: x \cdot \omega \leq t\}}, and let {Q_t: H_t \rightarrow {\bf R}} be the function {Q_t(x) := Q(x) - Q(r_t(x))}, where

\displaystyle  r_t(x) := x + 2 (t - x \cdot \omega) \omega

is the reflection through {\Pi_t}; thus {Q_t} is the difference between {Q} and its reflection in {\Pi_t}. In particular, {Q_t} vanishes on the boundary {\Pi_t} of the half-space {H_t}.

Intuitively, the argument proceeds as follows. It is plausible that {Q_t} is going to be positive in the interior of {H_t} for large positive {t}, but negative in the interior of {H_t} for large negative {t}. Now imagine sliding {t} down from {+\infty} to {-\infty} until one reaches the first point {t = t_0} at which {Q_{t_0}} ceases to be strictly positive in the interior of {H_{t_0}}; at this point, {Q_{t_0}} attains its minimum value of zero at some point in the interior of {H_{t_0}}. But by playing around with (2) (using the Lipschitz nature of the map {Q \mapsto Q^p} when {Q} is bounded) we know that {Q_{t_0}} obeys an elliptic constraint of the form {\Delta Q_{t_0} = O( |Q_{t_0}| )}. Applying the maximum principle, we can then conclude that {Q_{t_0}} vanishes identically in {H_{t_0}}, which gives the desired reflection symmetry.

(Now it turns out that there are some technical issues in making the above sketch precise, mainly because of the non-compact nature of the half-space {H_t}, but these can be fixed with a little bit of fiddling; see for instance Appendix B of my PDE textbook.)
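One can watch the moving plane slide numerically on the explicit one-dimensional solution {Q(x) = \sqrt{2}/\cosh(x)} mentioned above (a toy illustration of the intuition, not of the actual proof; in one dimension the “hyperplanes” {\Pi_t} are just points): as {t} decreases from large positive values, the minimum of {Q_t} over the interior of {H_t} stays positive, first reaches zero at {t_0 = 0} (where {Q_{t_0}} vanishes identically, exhibiting the reflection symmetry), and becomes negative thereafter.

```python
# Toy illustration of the moving plane method in one dimension, using the
# explicit solution Q(x) = sqrt(2) sech(x) of (2) with p = 3.  For each t we
# record the minimum of Q_t(x) = Q(x) - Q(2t - x) over the interior of the
# half-line H_t = {x <= t}: it is positive for t > 0, identically zero at
# t = 0 (the reflection symmetry), and goes negative for t < 0.
import numpy as np

def Q(x):
    return np.sqrt(2) / np.cosh(x)

xs = np.linspace(-8.0, 8.0, 1601)
for t in [2.0, 1.0, 0.5, 0.0, -0.5]:
    interior = xs[xs < t]                    # interior grid points of H_t
    Q_t = Q(interior) - Q(2 * t - interior)  # Q minus its reflection through t
    print(t, Q_t.min())
```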