In my previous post, I briefly discussed the work of the four Fields medalists of 2010 (Lindenstrauss, Ngo, Smirnov, and Villani). In this post I will discuss the work of Dan Spielman (winner of the Nevanlinna prize), Yves Meyer (winner of the Gauss prize), and Louis Nirenberg (winner of the Chern medal). Again by chance, the work of all three of the recipients overlaps to some extent with my own areas of expertise, so I will be able to discuss a sample contribution for each of them. Again, my choice of contribution is somewhat idiosyncratic and is not intended to represent the “best” work of each of the awardees.

** — 1. Dan Spielman — **

Dan Spielman works in numerical analysis (and in particular, numerical linear algebra) and theoretical computer science. Here I want to talk about one of his key contributions, namely his pioneering work with Teng on smoothed analysis. This is about an idea as much as it is about a collection of rigorous results, though Spielman and Teng certainly did buttress their ideas with serious new theorems.

Prior to this work, there were two basic ways that one analysed the performance (which could mean run-time, accuracy, or some other desirable quality) of a given algorithm. Firstly, one could perform a *worst-case* analysis, in which one assumed that the input was chosen in such an “adversarial” fashion that the performance was as poor as possible. Such an analysis would be suitable for applications such as certain aspects of cryptography, in which the input really was chosen by an adversary, or in high-stakes situations in which there was zero tolerance for any error whatsoever; it is also useful as a “default” analysis for when no realistic input model is available.

At the other extreme, one could perform an *average-case* analysis, in which the input was chosen in a completely random fashion (e.g. a random string of zeroes and ones, or a random vector whose entries were all distributed according to a Gaussian distribution). While such input models were usually not too realistic (except in situations where the signal-to-noise ratio was very low), they were usually fairly simple to analyse (using tools such as concentration of measure).

In many situations, the worst-case analysis is too conservative, and the average-case analysis is too optimistic or unrealistic. For instance, when using the popular simplex method to solve linear programming problems, the worst-case run-time can be exponentially large in the size of the problem, whereas the average-case run-time (in which one is fed a randomly chosen linear program as input) is polynomial. However, the typical linear program that one encounters in practice has enough structure to it that it does not resemble a randomly chosen program at all, and so it is not clear that the average-case bound is appropriate for the type of inputs one has in practice. At the other extreme, the exponentially bad worst-case inputs were so rare that they never seemed to come up in practice either.

To obtain a better input model, Spielman and Teng considered a *smoothed-case* model, in which the input was the sum of a deterministic (and possibly worst-case) input and a small noise perturbation, which they took to be Gaussian to simplify their analysis. This reflected the presence of measurement error, roundoff error, and similar sources of noise in real-life applications of numerical algorithms. Remarkably, they were able to analyse the run time of the simplex method for this model, concluding (after a lengthy technical argument) that under reasonable choices of parameters, the expected run time was polynomial in the size of the input. This explains the empirically observed phenomenon that the simplex method tends to run much better in practice than its worst-case analysis would predict, even if one starts with extremely ill-conditioned inputs, provided that there is a bit of noise in the system.

One of the ingredients in their analysis was a quantitative bound on the condition number of an arbitrary matrix when it is perturbed by a random Gaussian perturbation; the point being that random perturbation can often make an ill-conditioned matrix better behaved. (This is perhaps analogous in some ways to the empirical experience that some pieces of machinery work better after being kicked.) Recently, new tools from additive combinatorics (in particular, inverse Littlewood-Offord theory) have enabled Rudelson-Vershynin, Vu and myself, and others to generalise this bound to other noise models, such as random Bernoulli perturbations, which are a simple model for digital roundoff error.
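As a toy numerical illustration of this regularisation effect (this is my own sketch, not Spielman and Teng's actual argument; the matrix size, noise level, and random seed here are chosen purely for demonstration), one can take an exactly singular matrix and watch a small Gaussian perturbation repair its condition number:

```python
import numpy as np

rng = np.random.default_rng(0)

# A rank-one (hence infinitely ill-conditioned) 50 x 50 matrix:
# every row is the same vector, so the smallest singular value is 0.
n = 50
A = np.outer(np.ones(n), rng.standard_normal(n))

# Perturb by a small Gaussian noise matrix, as in the smoothed-analysis model.
sigma = 1e-3
E = sigma * rng.standard_normal((n, n))

print(np.linalg.cond(A))      # astronomically large (limited only by roundoff)
print(np.linalg.cond(A + E))  # modest: the noise pushes the singular values away from 0
```

The perturbed condition number is typically on the order of the largest singular value divided by roughly $\sigma/\sqrt{n}$, which is the quantitative phenomenon exploited in the smoothed analysis.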

** — 2. Yves Meyer — **

Yves Meyer has worked in many fields over the years, from number theory to harmonic analysis to PDE to signal processing. As the Gauss prize is concerned with impact on fields outside of mathematics, Meyer’s major contributions to the theoretical foundations of wavelets, which are now a basic tool in signal processing, were undoubtedly a major consideration in awarding this prize. But I would like to focus here on another of Yves’ contributions, namely the *Coifman-Meyer theory of paraproducts* developed with Raphy Coifman, which is a cornerstone of the *para-differential calculus* that has turned out to be an indispensable tool in the modern theory of nonlinear PDE.

Nonlinear differential equations, by definition, tend to involve a combination of differential operators and nonlinear operators. The simplest example of the latter is the pointwise product $fg$ of two fields $f$ and $g$. One is thus often led to study expressions such as $D(fg)$ or $(Df)(Dg)$ for various differential operators $D$.

For first order operators $D$, we can handle derivatives using the product rule (or *Leibniz rule*) from freshman calculus:

$$D(fg) = (Df) g + f (Dg).$$
We can then iterate this to handle higher order derivatives. For instance, we have

$$D^2(fg) = (D^2 f) g + 2 (Df)(Dg) + f (D^2 g)$$

and

$$D^3(fg) = (D^3 f) g + 3 (D^2 f)(Dg) + 3 (Df)(D^2 g) + f (D^3 g),$$
and so forth, assuming of course that all functions involved are sufficiently regular so that all expressions make sense. For inverse derivative expressions such as $D^{-1}(fg)$, no such simple formula exists, although one does have the very important integration by parts formula as a substitute. And for fractional derivatives such as $D^s(fg)$ with $s$ a non-integer, there is also no closed formula of the above form.
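Although there is no pointwise formula for a fractional derivative, it is easy to define on the frequency side: one common convention takes $|D|^s$ to be the Fourier multiplier $|\xi|^s$. A minimal numerical sketch on a $2\pi$-periodic grid (my own illustrative implementation, not tied to any particular reference):

```python
import numpy as np

def frac_deriv(f, s):
    """Apply |D|^s to samples of a 2*pi-periodic function via the
    Fourier multiplier |xi|^s (the zero frequency is mapped to zero)."""
    N = len(f)
    xi = np.fft.fftfreq(N, d=1.0 / N)  # integer frequencies -N/2 .. N/2-1
    return np.fft.ifft(np.abs(xi) ** s * np.fft.fft(f))

# Sanity check: |D|^s cos(kx) = k^s cos(kx), since cos(kx) lives at frequencies +-k.
N = 64
x = 2 * np.pi * np.arange(N) / N
f = np.cos(3 * x)
print(np.allclose(frac_deriv(f, 0.5).real, 3 ** 0.5 * np.cos(3 * x)))  # True
```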

Note how the derivatives on the product $fg$ get *distributed* to the individual factors $f, g$, with $f$ absorbing all the derivatives in one term, $g$ absorbing all the derivatives in another, and the derivatives being shared between $f$ and $g$ in the other terms. Within the usual confines of differential calculus, we cannot pick and choose which of the terms we would like to keep and which ones to discard; we must treat every single one of the terms that arise from the Leibniz expansion. This can cause difficulty when trying to control the product of two functions of unequal regularity, a situation that occurs very frequently in nonlinear PDE. For instance, if $f$ is $C^1$ (once continuously differentiable), and $g$ is $C^3$ (three times continuously differentiable), then the product $fg$ is merely $C^1$ rather than $C^3$; intuitively, we cannot "prevent" the three derivatives in the expression $D^3(fg)$ from making their way to the $f$ factor, which is not prepared to absorb all of them. However, it turns out that if we split the product $fg$ into *paraproducts* such as the *high-low paraproduct* $\pi_{hl}(f,g)$ and the *low-high paraproduct* $\pi_{lh}(f,g)$, we can effectively separate these terms from each other, allowing for a much more flexible analysis of the situation.

The concept of a paraproduct can be motivated by using the Fourier transform. For simplicity let us work in one dimension, with $D$ being the usual differential operator $\frac{d}{dx}$. Using a Fourier expansion (and assuming as much regularity and integrability as is needed to justify the formal manipulations) of $f$ into components of different frequencies, we have

$$f(x) = \int_{\bf R} \hat f(\xi) e^{ix\xi}\ d\xi$$
(here we use the usual PDE normalisation in which we try to hide the factor of $2\pi$) and similarly

$$g(x) = \int_{\bf R} \hat g(\eta) e^{ix\eta}\ d\eta,$$

so that

$$f(x) g(x) = \int_{\bf R} \int_{\bf R} \hat f(\xi) \hat g(\eta) e^{ix(\xi+\eta)}\ d\xi d\eta \qquad (1)$$

and thus, by differentiating under the integral sign,

$$D^n(fg)(x) = \int_{\bf R} \int_{\bf R} (i(\xi+\eta))^n \hat f(\xi) \hat g(\eta) e^{ix(\xi+\eta)}\ d\xi d\eta.$$
In contrast, we have

$$(D^j f)(x) (D^{n-j} g)(x) = \int_{\bf R} \int_{\bf R} (i\xi)^j (i\eta)^{n-j} \hat f(\xi) \hat g(\eta) e^{ix(\xi+\eta)}\ d\xi d\eta;$$
thus, the iterated Leibniz rule just becomes the binomial formula

$$(\xi+\eta)^n = \sum_{j=0}^n \binom{n}{j} \xi^j \eta^{n-j}$$
after taking Fourier transforms.
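One can check symbolically that the iterated Leibniz expansion carries exactly these binomial coefficients; here is a quick sympy verification for $n = 3$:

```python
from sympy import Function, binomial, diff, simplify, symbols

x = symbols('x')
f, g = Function('f')(x), Function('g')(x)

n = 3
# Left side: D^n applied to the product fg.
lhs = diff(f * g, x, n)
# Right side: the Leibniz expansion with binomial coefficients C(n, j).
rhs = sum(binomial(n, j) * diff(f, x, j) * diff(g, x, n - j) for j in range(n + 1))

print(simplify(lhs - rhs))  # 0
```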

Now, it is certainly true that when dealing with an expression such as $D^2(fg) = (D^2 f) g + 2 (Df)(Dg) + f (D^2 g)$, all three terms need to be present. But observe that when dealing with a "high-low" frequency interaction, in which $\xi$ is much larger in magnitude than $\eta$, then the first term dominates: $D^2(fg) \approx (D^2 f) g$. Conversely, with a "low-high" frequency interaction, in which $\eta$ is much larger in magnitude than $\xi$, we have $D^2(fg) \approx f (D^2 g)$. (There are also "high-high" interactions, in which $\xi$ and $\eta$ are comparable in magnitude, and $\xi+\eta$ can be significantly smaller than either $\xi$ or $\eta$, but for simplicity of discussion let us ignore this case.) It then becomes natural to try to decompose the product $fg$ into "high-low" and "low-high" pieces (plus a "high-high" error), for instance by inserting into (1) suitable cutoff functions restricting to the regions $|\xi| \geq 2|\eta|$ or $|\eta| \geq 2|\xi|$, to create the *paraproducts*
$$\pi_{hl}(f,g)(x) := \int_{\bf R} \int_{\bf R} 1_{|\xi| \geq 2|\eta|} \hat f(\xi) \hat g(\eta) e^{ix(\xi+\eta)}\ d\xi d\eta$$

and

$$\pi_{lh}(f,g)(x) := \int_{\bf R} \int_{\bf R} 1_{|\eta| \geq 2|\xi|} \hat f(\xi) \hat g(\eta) e^{ix(\xi+\eta)}\ d\xi d\eta.$$
Such paraproducts were first introduced by Calderón, and more explicitly by Bony. Heuristically, $\pi_{hl}(f,g)$ is the "high-low" portion of the product $fg$, in which the high frequency components of $f$ are "allowed" to interact with the low frequency components of $g$, but no other frequency interactions are permitted, and similarly for $\pi_{lh}(f,g)$. The *para-differential calculus* of Bony, Coifman, and Meyer then allows one to manipulate these paraproducts in ways that are very similar to ordinary pointwise products, except that they behave better with respect to the Leibniz rule or with more exotic differential or integral operators. For instance, one has
$$D \pi_{hl}(f,g) \approx \pi_{hl}(Df, g)$$

and

$$D \pi_{lh}(f,g) \approx \pi_{lh}(f, Dg)$$
for differential operators $D$ (and more generally for pseudodifferential operators such as $D^s$, or integral operators such as $D^{-1}$), where we use the symbol $\approx$ loosely to denote "up to lower order terms". Furthermore, many of the basic estimates for the pointwise product, in particular Hölder's inequality, have analogues for paraproducts; this is a special case of what is now known as the Coifman-Meyer theorem, which is fundamental in this subject, and is proven using Littlewood-Paley theory. The same theory in fact gives some estimates for paraproducts beyond what are available for products. For instance, if $f$ is in $C^1$ and $g$ is in $C^3$, then the paraproduct $\pi_{lh}(f,g)$ is "almost" in $C^3$ (modulo some technical logarithmic divergences which I will not elaborate on here), in contrast to the full product $fg$, which is merely in $C^1$.
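To make the frequency cutoffs concrete, here is a small discrete sketch (my own toy implementation on a periodic grid; the strict cutoff $|\xi| > 2|\eta|$ is just one conventional choice) which also confirms that the high-low, low-high, and high-high pieces reassemble the full product exactly:

```python
import numpy as np

def paraproduct(f, g, keep):
    """Sum over frequency pairs (xi, eta) with the indicator `keep` inserted,
    mimicking the cutoff inserted into the product formula (1).
    O(N^2) pairs -- a toy implementation for small N only."""
    N = len(f)
    xi = np.fft.fftfreq(N, d=1.0 / N)            # integer frequencies
    F, G = np.fft.fft(f) / N, np.fft.fft(g) / N  # Fourier coefficients
    x = 2 * np.pi * np.arange(N) / N
    out = np.zeros(N, dtype=complex)
    for a in range(N):
        for b in range(N):
            if keep(xi[a], xi[b]):
                out += F[a] * G[b] * np.exp(1j * x * (xi[a] + xi[b]))
    return out

rng = np.random.default_rng(1)
f, g = rng.standard_normal(32), rng.standard_normal(32)

hl = paraproduct(f, g, lambda xi, eta: abs(xi) > 2 * abs(eta))  # high-low
lh = paraproduct(f, g, lambda xi, eta: abs(eta) > 2 * abs(xi))  # low-high
hh = paraproduct(
    f, g,
    lambda xi, eta: abs(xi) <= 2 * abs(eta) and abs(eta) <= 2 * abs(xi),
)  # high-high remainder

# The three regions partition all frequency pairs, so the pieces sum to fg.
print(np.allclose((hl + lh + hh).real, f * g))  # True
```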

Paraproducts also allow one to extend the classical product and chain rules to fractional derivative operators, leading to the *fractional Leibniz rule*

$$D^s(fg) \approx (D^s f) g + f (D^s g)$$

and *fractional chain rule*

$$D^s F(f) \approx F'(f) D^s f,$$
which are both very useful in nonlinear PDE (see e.g. this book of Taylor for a thorough treatment). See also this brief Notices article on paraproducts by Benyi, Maldonado, and Naibo.

** — 3. Louis Nirenberg — **

Louis Nirenberg has made an amazing number of contributions to analysis, PDE, and geometry (e.g. John-Nirenberg inequality, Nirenberg-Treves conjecture (recently solved by Dencker), Newlander-Nirenberg theorem, Gagliardo-Nirenberg inequality, Caffarelli-Kohn-Nirenberg theorem, etc.), while also being one of the nicest people I know. I will mention only two results of his here, one of them very briefly.

Among other things, Nirenberg and Kohn introduced the *pseudo-differential calculus* which, like the para-differential calculus mentioned in the previous section, is an extension of differential calculus, but this time focused more on generalisation to variable coefficient or fractional operators, rather than in generalising the Leibniz or chain rules. This calculus sits at the intersection of harmonic analysis, PDE, von Neumann algebras, microlocal analysis, and semiclassical physics, and also happens to be closely related to Meyer’s work on wavelets; it quantifies the positive aspects of the Heisenberg uncertainty principle, in that one *can* observe position and momentum simultaneously so long as the uncertainty relation is respected. But I will not discuss this topic further today.

Instead, I would like to focus here on a gem of an argument of Gidas, Ni, and Nirenberg, which is a brilliant application of Alexandrov's *method of moving planes*, combined with the ubiquitous maximum principle. This concerns solutions $Q: {\bf R}^d \to {\bf R}$ to the *ground state equation*

$$\Delta Q + Q^p = Q \qquad (2)$$
where $Q$ is a smooth positive function that decays exponentially at infinity, $p > 1$ is an exponent, and $\Delta$ is the Laplacian. This equation shows up in a number of contexts, including the nonlinear Schrödinger equation and also, by coincidence, in connection with the best constants in the Gagliardo-Nirenberg inequality. The existence of ground states can be proven by the variational principle. But one can say much more:

Lemma 1 (Gidas-Ni-Nirenberg). All ground states are radially symmetric with respect to some origin.

To show this radial symmetry, a small amount of Euclidean geometry shows that it is enough to show that there is a lot of reflection symmetry:

Lemma 2 (Gidas-Ni-Nirenberg, again). If $Q$ is a ground state and $\omega$ is a unit vector, then there exists a hyperplane orthogonal to $\omega$ with respect to which $Q$ is symmetric.

To prove this lemma, we use the moving planes method, sliding in a plane orthogonal to $\omega$ from infinity. More precisely, for each $t \in {\bf R}$, let $\Pi_t$ be the hyperplane $\{ x \in {\bf R}^d: x \cdot \omega = t \}$, let $H_t$ be the associated half-space $\{ x \in {\bf R}^d: x \cdot \omega < t \}$, and let $w_t$ be the function $w_t := Q - Q \circ R_t$, where

$$R_t(x) := x + 2 (t - x \cdot \omega) \omega$$

is the reflection through $\Pi_t$; thus $w_t$ is the difference between $Q$ and its reflection in $\Pi_t$. In particular, $w_t$ vanishes on the boundary $\Pi_t$ of the half-space $H_t$.

Intuitively, the argument proceeds as follows. It is plausible that $w_t$ is going to be positive in the interior of $H_t$ for large positive $t$, but negative in the interior of $H_t$ for large negative $t$. Now imagine sliding $t$ down from $+\infty$ to $-\infty$ until one reaches the first time $t_0$ at which $w_{t_0}$ ceases to be positive in the interior of $H_{t_0}$; then $w_{t_0}$ attains its minimum value of zero somewhere in the interior of $H_{t_0}$. But by playing around with (2) (using the Lipschitz nature of the map $Q \mapsto Q^p$ when $Q$ is bounded) we know that $w_t$ obeys an elliptic constraint of the form $\Delta w_t = O(|w_t|)$. Applying the maximum principle, we can then conclude that $w_{t_0}$ vanishes identically in $H_{t_0}$, which gives the desired reflection symmetry.

(Now it turns out that there are some technical issues in making the above sketch precise, mainly because of the non-compact nature of the half-space , but these can be fixed with a little bit of fiddling; see for instance Appendix B of my PDE textbook.)
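As a concrete example (in the easiest setting, and not part of the Gidas-Ni-Nirenberg argument itself): in dimension $d = 1$ with $p = 3$, a ground state is given explicitly by $Q(x) = \sqrt{2}\,\mathrm{sech}(x)$, which is indeed symmetric (even) about the origin; one can verify symbolically that it solves (2):

```python
from sympy import diff, exp, sech, simplify, sqrt, symbols

x = symbols('x')
Q = sqrt(2) * sech(x)  # candidate 1D ground state for p = 3

# Check the ground state equation: Q'' + Q^3 = Q, i.e. the residual vanishes.
residual = diff(Q, x, 2) + Q**3 - Q
print(simplify(residual.rewrite(exp)))  # 0

# Symmetry about the origin (the 1D analogue of radial symmetry in Lemma 1).
print(simplify(Q.subs(x, -x) - Q))  # 0
```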

## 24 comments


20 August, 2010 at 10:50 am

UNSW: Thanks a lot for the summarization.

20 August, 2010 at 11:27 am

oz: A very nice post. There is one minor mistake I believe:

“Again, by chance, the work of all three of the recipients overlaps to some extent with my own areas of expertise ..”

I don't see how it can be that by mere chance the work of 6 out of the 7 prize winners is related to Prof. Tao's areas of expertise. Any reasonable statistical analysis will reject this null hypothesis and suggest that it is not chance but the enormous breadth of Prof. Tao's mathematical knowledge and interests which is in action here.

20 August, 2010 at 12:02 pm

Anon: Thanks! I enjoyed reading this post. My favorite line is "This is perhaps analogous in some ways to the empirical experience that some pieces of machinery work better after being kicked." I laughed out loud reading it.

20 August, 2010 at 12:25 pm

Zaher: Thanks for the summaries.

Small typo: in the definition of $\pi_{lh}$ the arguments (or subscripts) of $m_{hl}(.,.)$ should be permuted.

[Corrected, thanks - T.]

20 August, 2010 at 3:45 pm

Roger: Perhaps Tao's expertise is not so broad, but the selection committee reads this blog, and they think that a subject is not important unless it is written about here! Maybe next we will see a prize for the empirical experiences of kicking machinery.

20 August, 2010 at 11:05 pm

ICM2010 — Ngo laudatio « Gowers's Weblog: [...] for a general audience and Terence Tao has now posted about the work of the Fields medallists and the other prizewinners. And as I have already said, the ICM website has links to the full texts of the laudationes [...]

20 August, 2010 at 11:58 pm

Anonymous: I thought the complexity of the simplex method was figured out by Smale, Lemke, etc. in the 1970s(?) in about the way you describe, establishing that it's polynomial except on a measure-zero set that can be characterized precisely. I'm probably confused.

21 August, 2010 at 7:27 am

Terence Tao: I believe Smale addressed the average case complexity of the simplex method (i.e. that the expected value of the run time was polynomial if the inputs were chosen randomly), but not the smoothed complexity. By applying Markov's inequality, this does show that inputs with very large run time are rare, but unfortunately such bounds are not strong enough to directly say anything about the smoothed model, because the set of data which is very close to a specific input has tiny measure (exponentially small) with respect to a standard reference measure (e.g. Gaussian).

30 August, 2010 at 11:25 am

Gil Kalai: The "pathological" examples for which the simplex algorithm requires exponentially many steps represent a set of problems of positive measure. So we cannot expect a result of the kind described by Anonymous. One remaining mystery both in the average-case analysis and smoothed analysis of the simplex algorithm is to give tail-estimates for the running time which are better than what can simply be deduced by Markov's inequality. As a first step, is there a way to understand the variance of the number of steps? Next, can we expect an exponentially small tail?

21 August, 2010 at 4:03 am

Watch the Math! | Sun Ju's Blog: [...] Terence Tao also gave his personal exposition (here and here) on the remarkable work of Prize winners in [...]

21 August, 2010 at 8:48 pm

AJ: Prof: I do not know why the random perturbation to make an ill-conditioned matrix well-conditioned is an important part of Dan Spielman's work. I would bet against some engineering companies which already do not use that idea. It seems control theoretic in spirit.

21 August, 2010 at 9:06 pm

AJ: I mean I doubt if some engineering companies already do not use the idea.

24 August, 2010 at 6:33 pm

Sujit: A comment from one of my friends in response to your observation "This is perhaps analogous in some ways to the empirical experience that some pieces of machinery work better after being kicked":

“Perhaps this applies to ill-behaved children being ‘perturbed’ every once in a while by a dose of yelling too? Just wondering about the homely implications of abstract Math. :D “

24 August, 2010 at 10:57 pm

Tom: I think you are the best Terry. I have a nostalgic feeling seeing you get the prize in 2006. You are the only one who writes an excellent blog and tediously summarize other people's results. The people who won it this year are nobodys compared to you. T.Tao you are the man.

27 August, 2010 at 5:53 pm

Anonymous: I believe he has a little robot assistant beside him or he has secretly invented the time machine or he has access to cranial ability performance enhancing drugs. Take a look at the number of publications he has had before 30. And this blog. I wonder where he has time. Amazing!

25 August, 2010 at 6:07 am

ICM2010 — Niremberg and Meyer laudationes « Gowers's Weblog: [...] finish this post with a reminder that the work of Nirenberg and Meyer is discussed by Terry Tao on his [...]

25 August, 2010 at 8:03 am

poiuyt: Typo: "the typical linear program that one encounters in PRACTICE"?

[Corrected, thanks - T.]

25 August, 2010 at 11:59 am

John Sidles: The insight that "some pieces of machinery work better after being kicked" has been invoked to explain the generic feasibility of efficient (classical) descriptions of large-dimension quantum dynamical trajectories. See (for example) Wojciech Zurek's review "Decoherence, einselection, and the quantum origins of the classical" (Rev. Mod. Phys., 2003).

26 August, 2010 at 5:39 am

Boaz Barak: A minor correction: actually, cryptography is typically focused on the *average case* analysis of algorithms. The reason is that the algorithmic task at the heart of a cryptosystem is the task of *breaking* it. Hence the input is chosen by the designer of the cryptosystem from a fixed, well defined distribution (i.e., the one obtained by choosing a random key), and the adversary picks a "worst-case choice of the algorithm" to try to break it. (However, it's also true that one needs worst-case analysis of algorithms to make sure that, for example, a decryption algorithm will always finish decoding no matter what input it's provided, and to make sure that the reductions in the security proof of a cryptosystem are efficient.)

The reason worst-case analysis of algorithms is so prevalent is not because of crypto, but rather because in most cases it's very hard to find a clean model distribution of the inputs; hence general algorithms with worst-case guarantees, which one can plug into any application without worrying about the input distribution, are very useful. (The simplex algorithm is perhaps the canonical example of an algorithm that seems to have the latter usefulness property without having the former, and the Spielman-Teng work gives some reasons why.)

Another "justification" for worst-case analysis is that it's also been observed that in many cases, if we have an efficient algorithm for many diverse families of natural inputs, there is also another efficient algorithm that works in the worst case. This is the case for linear programming. Two interesting exceptions are the graph isomorphism problem and approximating so-called "unique games" (i.e., the problem underlying Khot's unique games conjecture), where there are efficient algorithms for many natural input distributions, but there are no known algorithms (yet?) proven to work on all inputs.

[Fair enough, although worst-case analysis is still relevant for some aspects of cryptography, particularly with regards to code design rather than code breaking. I've reworded the paragraph accordingly. - T]

31 August, 2010 at 3:43 am

Gil Kalai: It is worth mentioning also that there are various notions of "average case analysis". The most naive notion is about the uniform distribution on the inputs. A more sophisticated notion is about an arbitrary "computable" distribution of the inputs, which is (I believe) what "Levin's average case complexity" is more or less about. Levin's notion of average case complexity is probably the one more intimately connected with cryptography, but I am sure Boaz can explain it much better than me.

1 September, 2010 at 8:11 am

Boaz Barak: Hi Gil,

You are right that in crypto we have the freedom to choose the distribution over inputs that will make the problem hardest, and that distribution doesn't have to be the uniform distribution. However, in many cases, crypto is concerned with clean distributions such as uniform strings, random matrices, products of two random primes, etc. In fact, crypto may be the only "real world" application where one indeed needs to solve instances generated by such clean distributions (in contrast, instances coming from "nature" will be at best only approximately from a model distribution).

Levin gave a general definition of what it means to have a “polynomial on the average” algorithm for a given problem F and a distribution D (it turns out there are some subtleties in defining these). He then showed some completeness results, although unfortunately the reductions seem to always either make the problem or the distribution “unnatural”.

There is a subtle reason why Levin’s notion is not immediately useful to cryptography. This is explained beautifully in this survey of Russell Impagliazzo who calls it the difference between “Pessiland” and “Minicrypt”. Roughly speaking, to be useful for crypto, we need a way to sample “planted hard instances” of the distribution. For some natural distributions such as random clique and random 3SAT this is easy due to the symmetries of the distribution, but it can conceivably be the case that there is a problem F (in NP) and a distribution D such that we can efficiently sample an input from D, but we have no procedure to sample a pair (x,y) of an input from D and a solution (i.e. “witness”) y to this input. It is the latter procedure that is needed to use F as a basis for cryptography.

2 September, 2010 at 7:55 am

ICM2010 — final post « Gowers's Weblog: [...] Terence Tao on the work of Spielman, Meyer and Nirenberg [...]

24 January, 2011 at 4:10 pm

It's just another talk, to be honest « It's cold outside: [...] https://terrytao.wordpress.com/2010/08/20/spielman-meyer-nirenberg/ [...]

13 December, 2012 at 2:34 am

Matthew Kennedy awarded CMS 2012 Doctoral Prize « Noncommutative Analysis: [...] work. Tim Gowers and Terry Tao have set a fine example in their expositions of the works of Fields Medalists or Abel Prize laureates. These are among the most interesting and important posts out there, I [...]