In my previous post, I briefly discussed the work of the four Fields medalists of 2010 (Lindenstrauss, Ngo, Smirnov, and Villani). In this post I will discuss the work of Dan Spielman (winner of the Nevanlinna prize), Yves Meyer (winner of the Gauss prize), and Louis Nirenberg (winner of the Chern medal). Again by chance, the work of all three of the recipients overlaps to some extent with my own areas of expertise, so I will be able to discuss a sample contribution for each of them. Again, my choice of contribution is somewhat idiosyncratic and is not intended to represent the “best” work of each of the awardees.
— 1. Dan Spielman —
Dan Spielman works in numerical analysis (and in particular, numerical linear algebra) and theoretical computer science. Here I want to talk about one of his key contributions, namely his pioneering work with Teng on smoothed analysis. This is about an idea as much as it is about a collection of rigorous results, though Spielman and Teng certainly did buttress their ideas with serious new theorems.
Prior to this work, there were two basic ways that one analysed the performance (which could mean run-time, accuracy, or some other desirable quality) of a given algorithm. Firstly, one could perform a worst-case analysis, in which one assumed that the input was chosen in such an “adversarial” fashion that the performance was as poor as possible. Such an analysis would be suitable for applications such as certain aspects of cryptography, in which the input really was chosen by an adversary, or in high-stakes situations in which there was zero tolerance for any error whatsoever; it is also useful as a “default” analysis for when no realistic input model is available.
At the other extreme, one could perform an average-case analysis, in which the input was chosen in a completely random fashion (e.g. a random string of zeroes and ones, or a random vector whose entries were all distributed according to a Gaussian distribution). While such input models were usually not too realistic (except in situations where the signal-to-noise ratio was very low), they were usually fairly simple to analyse (using tools such as concentration of measure).
In many situations, the worst-case analysis is too conservative, and the average-case analysis is too optimistic or unrealistic. For instance, when using the popular simplex method to solve linear programming problems, the worst-case run-time can be exponentially large in the size of the problem, whereas the average-case run-time (in which one is fed a randomly chosen linear program as input) is polynomial. However, the typical linear program that one encounters in practice has enough structure to it that it does not resemble a randomly chosen program at all, and so it is not clear that the average-case bound is appropriate for the type of inputs one has in practice. At the other extreme, the exponentially bad worst-case inputs were so rare that they never seemed to come up in practice either.
To obtain a better input model, Spielman and Teng considered a smoothed-case model, in which the input was the sum of a deterministic (and possibly worst-case) input, and a small noise perturbation, which they took to be Gaussian to simplify their analysis. This reflected the presence of measurement error, roundoff error, and similar sources of noise in real-life applications of numerical algorithms. Remarkably, they were able to analyse the run-time of the simplex method for this model, concluding (after a lengthy technical argument) that under reasonable choices of parameters, the expected run time was polynomial in the size of the input, thus explaining the empirically observed phenomenon that the simplex method tended to run a lot better in practice than its worst-case analysis would predict, even if one started with extremely ill-conditioned inputs, provided that there was a bit of noise in the system.
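For concreteness, here is a minimal Python sketch of the smoothed input model (and only of the input model; this is not the Spielman-Teng analysis, and the particular program, dimensions, noise levels, and solver options below are arbitrary choices made for illustration): a fixed linear program has its constraint data perturbed by Gaussian noise of size $\sigma$ before being handed to a simplex-type solver, and one records how many pivots the solver takes.

```python
# A toy version of the smoothed-analysis input model: a fixed linear program
# whose constraint matrix is perturbed by Gaussian noise of size sigma before
# being passed to a simplex-type solver.  (Illustration only; the "worst-case"
# matrix below is just an arbitrary stand-in, not a genuinely adversarial LP.)
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, m = 30, 60                        # minimise c.x subject to A x <= b, 0 <= x <= 1
c = -np.ones(n)
A = rng.standard_normal((m, n))      # stand-in for a deterministic, possibly bad, input
b = np.ones(m)

def pivots_with_noise(sigma):
    """Number of dual-simplex pivots after perturbing A entrywise by N(0, sigma^2)."""
    A_noisy = A + sigma * rng.standard_normal(A.shape)
    res = linprog(c, A_ub=A_noisy, b_ub=b, bounds=(0, 1), method="highs-ds")
    return res.nit

for sigma in [0.0, 1e-3, 1e-1]:
    print(f"sigma = {sigma:g}: {pivots_with_noise(sigma)} pivots")
```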
One of the ingredients in their analysis was a quantitative bound on the condition number of an arbitrary matrix when it is perturbed by a random Gaussian perturbation; the point being that random perturbation can often make an ill-conditioned matrix better behaved. (This is perhaps analogous in some ways to the empirical experience that some pieces of machinery work better after being kicked.) Recently, new tools from additive combinatorics (in particular, inverse Littlewood-Offord theory) have enabled Rudelson-Vershynin, Vu and myself, and others to generalise this bound to other noise models, such as random Bernoulli perturbations, which provide a simple model for digital roundoff error.
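The effect of the perturbation on the condition number is easy to see numerically. The following snippet (again purely an experiment, not the quantitative bound itself; the rank-one matrix is an arbitrary stand-in for an ill-conditioned input) perturbs a singular matrix by Gaussian noise of various sizes and reports the resulting condition numbers.

```python
# Numerically illustrating that a small Gaussian perturbation tames the
# condition number of a singular matrix.
import numpy as np

rng = np.random.default_rng(1)
n = 200
M = np.outer(np.ones(n), np.ones(n))    # rank one, hence singular: infinite condition number

for sigma in [1e-1, 1e-3, 1e-6]:
    N = M + sigma * rng.standard_normal((n, n))   # Gaussian perturbation of size sigma
    print(f"sigma = {sigma:g}: condition number ~ {np.linalg.cond(N):.3e}")
# The perturbed condition number is finite and grows only like a power of
# 1/sigma, rather than being infinite as for the unperturbed matrix.
```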
— 2. Yves Meyer —
Yves Meyer has worked in many fields over the years, from number theory to harmonic analysis to PDE to signal processing. As the Gauss prize is concerned with impact on fields outside of mathematics, Meyer’s major contributions to the theoretical foundations of wavelets, which are now a basic tool in signal processing, were undoubtedly a major consideration in awarding this prize. But I would like to focus here on another of Yves’ contributions, namely the Coifman-Meyer theory of paraproducts developed with Raphy Coifman, which is a cornerstone of the para-differential calculus that has turned out to be an indispensable tool in the modern theory of nonlinear PDE.
Nonlinear differential equations, by definition, tend to involve a combination of differential operators and nonlinear operators. The simplest example of the latter is a pointwise product $fg$ of two fields $f$ and $g$. One is thus often led to study expressions such as $D(fg)$ or $D^{-1}(fg)$ for various differential operators $D$.
For first order operators $D$, we can handle derivatives of products using the product rule (or Leibniz rule) from freshman calculus:

$\displaystyle D(fg) = (Df) g + f (Dg).$

We can then iterate this to handle higher order derivatives. For instance, we have

$\displaystyle D^2(fg) = (D^2 f) g + 2 (Df)(Dg) + f (D^2 g)$

and

$\displaystyle D^3(fg) = (D^3 f) g + 3 (D^2 f)(Dg) + 3 (Df)(D^2 g) + f (D^3 g)$

and so forth, assuming of course that all functions involved are sufficiently regular so that all expressions make sense. For inverse derivative expressions such as $D^{-1}(fg)$, no such simple formula exists, although one does have the very important integration by parts formula as a substitute. And for fractional derivatives such as $D^s(fg)$ with $s$ a non-integer, there is also no closed formula of the above form.
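As a quick sanity check, the third-order Leibniz expansion displayed above can be verified symbolically; the following sympy snippet merely confirms the algebra and is not needed for anything else.

```python
# Symbolic verification of the iterated Leibniz rule
#   D^3(fg) = (D^3 f) g + 3 (D^2 f)(Dg) + 3 (Df)(D^2 g) + f (D^3 g).
import sympy as sp

x = sp.symbols('x')
f = sp.Function('f')(x)
g = sp.Function('g')(x)

lhs = sp.diff(f * g, x, 3)
rhs = (sp.diff(f, x, 3) * g
       + 3 * sp.diff(f, x, 2) * sp.diff(g, x)
       + 3 * sp.diff(f, x) * sp.diff(g, x, 2)
       + f * sp.diff(g, x, 3))

print(sp.simplify(lhs - rhs))   # prints 0
```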
Note how the derivatives $D$ on the product $fg$ get distributed to the individual factors $f, g$, with $f$ absorbing all the derivatives in one term, $g$ absorbing all the derivatives in another, and the derivatives being shared between $f$ and $g$ in the other terms. Within the usual confines of differential calculus, we cannot pick and choose which of the terms we would like to keep and which ones to discard; we must treat every single one of the terms that arise from the Leibniz expansion. This can cause difficulty when trying to control the product of two functions of unequal regularity – a situation that occurs very frequently in nonlinear PDE. For instance, if $f$ is $C^1$ (once continuously differentiable), and $g$ is $C^3$ (three times continuously differentiable), then the product $fg$ is merely $C^1$ rather than $C^3$; intuitively, we cannot “prevent” the three derivatives in the expression $D^3(fg)$ from making their way to the $f$ factor, which is not prepared to absorb all of them. However, it turns out that if we split the product $fg$ into paraproducts such as the high-low paraproduct $\pi_{hl}(f,g)$ and the low-high paraproduct $\pi_{lh}(f,g)$, we can effectively separate these terms from each other, allowing for a much more flexible analysis of the situation.
The concept of a paraproduct can be motivated by using the Fourier transform. For simplicity let us work in one dimension, with $D$ being the usual differential operator, normalised as $D := \frac{1}{2\pi i} \frac{d}{dx}$. Using a Fourier expansion (and assuming as much regularity and integrability as is needed to justify the formal manipulations) of $f$ into components of different frequencies, we have

$\displaystyle f(x) = \int_{\bf R} \hat f(\xi) e^{2\pi i x \xi}\ d\xi$

(here we use the usual PDE normalisation in which we try to hide the $2\pi$ factor) and similarly

$\displaystyle g(x) = \int_{\bf R} \hat g(\eta) e^{2\pi i x \eta}\ d\eta,$

and thus

$\displaystyle fg(x) = \int_{\bf R} \int_{\bf R} \hat f(\xi) \hat g(\eta) e^{2\pi i x (\xi+\eta)}\ d\xi d\eta; \ \ \ \ \ (1)$

by differentiating under the integral sign,

$\displaystyle D^n(fg)(x) = \int_{\bf R} \int_{\bf R} (\xi+\eta)^n \hat f(\xi) \hat g(\eta) e^{2\pi i x (\xi+\eta)}\ d\xi d\eta.$

In contrast, we have

$\displaystyle (D^j f)(D^{n-j} g)(x) = \int_{\bf R} \int_{\bf R} \xi^j \eta^{n-j} \hat f(\xi) \hat g(\eta) e^{2\pi i x (\xi+\eta)}\ d\xi d\eta;$

thus, the iterated Leibniz rule just becomes the binomial formula

$\displaystyle (\xi+\eta)^n = \sum_{j=0}^n \binom{n}{j} \xi^j \eta^{n-j}$

after taking Fourier transforms.
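The discrete analogue of (1) is easy to see on a computer: on the unit circle, the Fourier coefficients of a product $fg$ are obtained by summing $\hat f(\xi) \hat g(\eta)$ over pairs whose frequencies add up to the output frequency. The short numpy check below is a pure FFT sanity check with arbitrary test functions, nothing more.

```python
# Discrete counterpart of (1): the Fourier coefficients of a product fg are
# the convolution of the coefficients of f and g, so each output frequency
# xi + eta collects contributions from pairs (xi, eta).
import numpy as np

n = 256
x = np.arange(n) / n
f = np.exp(np.cos(2 * np.pi * x))
g = np.sin(2 * np.pi * 3 * x) ** 2

fg_hat = np.fft.fft(f * g) / n                      # coefficients of the product
f_hat, g_hat = np.fft.fft(f) / n, np.fft.fft(g) / n
conv = np.zeros(n, dtype=complex)
for xi in range(n):
    for eta in range(n):
        conv[(xi + eta) % n] += f_hat[xi] * g_hat[eta]   # frequency xi + eta (mod n)

print(np.max(np.abs(fg_hat - conv)))   # ~ 1e-15: the two computations agree
```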
Now, it is certainly true that when dealing with an expression such as $(\xi+\eta)^2 = \xi^2 + 2 \xi \eta + \eta^2$, all three terms on the right-hand side need to be present. But observe that when dealing with a “high-low” frequency interaction, in which $\xi$ is much larger in magnitude than $\eta$, then the first term dominates: $(\xi+\eta)^2 \approx \xi^2$. Conversely, with a “low-high” frequency interaction, in which $\eta$ is much larger in magnitude than $\xi$, we have $(\xi+\eta)^2 \approx \eta^2$. (There are also “high-high” interactions, in which $\xi$ and $\eta$ are comparable in magnitude, and $\xi + \eta$ can be significantly smaller than either $\xi$ or $\eta$, but for simplicity of discussion let us ignore this case.) It then becomes natural to try to decompose the product $fg$ into “high-low” and “low-high” pieces (plus a “high-high” error), for instance by inserting suitable cutoff functions $m_{hl}(\xi,\eta)$ or $m_{lh}(\xi,\eta)$ in (1), adapted to the regions $|\xi| \gg |\eta|$ or $|\xi| \ll |\eta|$ respectively, to create the paraproducts

$\displaystyle \pi_{hl}(f,g)(x) := \int_{\bf R} \int_{\bf R} m_{hl}(\xi,\eta) \hat f(\xi) \hat g(\eta) e^{2\pi i x (\xi+\eta)}\ d\xi d\eta$

and

$\displaystyle \pi_{lh}(f,g)(x) := \int_{\bf R} \int_{\bf R} m_{lh}(\xi,\eta) \hat f(\xi) \hat g(\eta) e^{2\pi i x (\xi+\eta)}\ d\xi d\eta.$
Such paraproducts were first introduced by Calderón, and more explicitly by Bony. Heuristically, $\pi_{hl}(f,g)$ is the “high-low” portion of the product $fg$, in which the high frequency components of $f$ are “allowed” to interact with the low frequency components of $g$, but no other frequency interactions are permitted, and similarly for $\pi_{lh}(f,g)$. The para-differential calculus of Bony, Coifman, and Meyer then allows one to manipulate these paraproducts in ways that are very similar to ordinary pointwise products, except that they behave better with respect to the Leibniz rule or with more exotic differential or integral operators. For instance, one has

$\displaystyle D \pi_{hl}(f,g) \approx \pi_{hl}(Df, g)$

and

$\displaystyle D \pi_{lh}(f,g) \approx \pi_{lh}(f, Dg)$

for differential operators $D$ (and more generally for pseudodifferential operators such as $D^s$, or integral operators such as $D^{-1}$), where we use the $\approx$ symbol loosely to denote “up to lower order terms”. Furthermore, many of the basic estimates of the pointwise product, in particular Hölder’s inequality, have analogues for paraproducts; this is a special case of what is now known as the Coifman-Meyer theorem, which is fundamental in this subject, and is proven using Littlewood-Paley theory. The same theory in fact gives some estimates for paraproducts beyond what are available for products. For instance, if $f$ is in $C^1$ and $g$ is in $C^3$, then the paraproduct $\pi_{lh}(f,g)$ is “almost” in $C^3$ (modulo some technical logarithmic divergences which I will not elaborate on here), in contrast to the full product $fg$, which is merely in $C^1$.
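For readers who like to experiment, here is a toy discrete version of the high-low paraproduct on a periodic grid, built from sharp dyadic frequency cutoffs via the FFT. This is only a caricature: the real theory uses smooth Littlewood-Paley projections, and the function names and the particular cutoff choice below ($|\eta|$ below $2^{j-2}$ against $|\xi| \sim 2^j$) are ad hoc choices made here purely for illustration.

```python
# A crude discrete high-low paraproduct: pi_hl(f,g) = sum_j (P_j f)(P_{<j-2} g),
# where P_j keeps the frequencies of magnitude in [2^j, 2^(j+1)).  Toy code only.
import numpy as np

def band(f_hat, freqs, lo, hi):
    """Inverse FFT of the portion of f_hat with lo <= |frequency| < hi."""
    mask = (freqs >= lo) & (freqs < hi)
    return np.fft.ifft(f_hat * mask).real

def paraproduct_hl(f, g, jmax=8):
    """High frequencies of f interacting only with much lower frequencies of g."""
    n = len(f)
    freqs = np.abs(np.fft.fftfreq(n, d=1.0 / n))        # integer frequency magnitudes
    f_hat, g_hat = np.fft.fft(f), np.fft.fft(g)
    out = np.zeros(n)
    for j in range(2, jmax + 1):
        Pj_f = band(f_hat, freqs, 2 ** j, 2 ** (j + 1))  # |xi| ~ 2^j piece of f
        Plow_g = band(g_hat, freqs, 0, 2 ** (j - 2))     # |eta| << 2^j piece of g
        out += Pj_f * Plow_g
    return out

# Example: a rough function times a smooth one on 1024 grid points.
x = np.arange(1024) / 1024.0
f = np.sign(np.sin(2 * np.pi * 5 * x))     # rough: a square wave with many high frequencies
g = 2.0 + np.cos(2 * np.pi * x)            # smooth: essentially a single low frequency
pi_hl = paraproduct_hl(f, g)
print(np.linalg.norm(f * g), np.linalg.norm(pi_hl))   # pi_hl captures one piece of the product
```

The point of the decomposition is visible in the code: a derivative applied to each summand essentially always falls on the rough factor coming from $f$, which is what the heuristic $D \pi_{hl}(f,g) \approx \pi_{hl}(Df,g)$ expresses.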
Paraproducts also allow one to extend the classical product and chain rules to fractional derivative operators, leading to the fractional Leibniz rule

$\displaystyle \| D^s(fg) \|_{L^r} \lesssim \| D^s f \|_{L^p} \| g \|_{L^q} + \| f \|_{L^p} \| D^s g \|_{L^q}$

(for suitable exponents with $\frac{1}{r} = \frac{1}{p} + \frac{1}{q}$ and $s > 0$) and fractional chain rule

$\displaystyle \| D^s F(f) \|_{L^p} \lesssim \| F'(f) \|_{L^q} \| D^s f \|_{L^r}$

(for suitable nonlinearities $F$, exponents with $\frac{1}{p} = \frac{1}{q} + \frac{1}{r}$, and $0 < s < 1$), which are both very useful in nonlinear PDE (see e.g. this book of Taylor for a thorough treatment). See also this brief Notices article on paraproducts by Benyi, Maldonado, and Naibo.
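One can even spot-check the fractional Leibniz rule numerically, realising $D^s$ on the circle as the Fourier multiplier $|k|^s$. The snippet below is an illustration on one arbitrary example, with the particular exponents $p = q = 4$, $r = 2$ and $s = 1/2$ chosen only for convenience; it proves nothing.

```python
# A numerical spot-check of the fractional Leibniz rule on the unit circle,
# with D^s realised as the Fourier multiplier |k|^s.
import numpy as np

n = 2048
x = np.arange(n) / n
k = np.abs(np.fft.fftfreq(n, d=1.0 / n))        # frequency magnitudes |k|

def frac_deriv(f, s):
    """Fractional derivative D^s f via the multiplier |k|^s."""
    return np.fft.ifft((k ** s) * np.fft.fft(f)).real

def Lp(f, p):
    """Discrete L^p norm on the circle (with the uniform probability measure)."""
    return np.mean(np.abs(f) ** p) ** (1.0 / p)

f = np.exp(np.cos(2 * np.pi * x))
g = 1.0 / (2.0 + np.sin(2 * np.pi * 3 * x))
s = 0.5

lhs = Lp(frac_deriv(f * g, s), 2)
rhs = Lp(frac_deriv(f, s), 4) * Lp(g, 4) + Lp(f, 4) * Lp(frac_deriv(g, s), 4)
print(lhs, rhs)   # lhs should be at most a constant multiple of rhs
```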
— 3. Louis Nirenberg —
Louis Nirenberg has made an amazing number of contributions to analysis, PDE, and geometry (e.g. John-Nirenberg inequality, Nirenberg-Treves conjecture (recently solved by Dencker), Newlander-Nirenberg theorem, Gagliardo-Nirenberg inequality, Caffarelli-Kohn-Nirenberg theorem, etc.), while also being one of the nicest people I know. I will mention only two results of his here, one of them very briefly.
Among other things, Nirenberg and Kohn introduced the pseudo-differential calculus which, like the para-differential calculus mentioned in the previous section, is an extension of differential calculus, but this time focused more on generalisation to variable coefficient or fractional operators, rather than in generalising the Leibniz or chain rules. This calculus sits at the intersection of harmonic analysis, PDE, von Neumann algebras, microlocal analysis, and semiclassical physics, and also happens to be closely related to Meyer’s work on wavelets; it quantifies the positive aspects of the Heisenberg uncertainty principle, in that one can observe position and momentum simultaneously so long as the uncertainty relation is respected. But I will not discuss this topic further today.
Instead, I would like to focus here on a gem of an argument of Gidas, Ni, and Nirenberg, which is a brilliant application of Alexandrov’s method of moving planes, combined with the ubiquitous maximum principle. This concerns solutions $u: {\bf R}^d \rightarrow {\bf R}^+$ to the ground state equation

$\displaystyle \Delta u + u^p = u \ \ \ \ \ (2)$

where $u$ is a smooth positive function that decays exponentially at infinity, $p > 1$ is an exponent, and $\Delta$ is the Laplacian. This equation shows up in a number of contexts, including the nonlinear Schrödinger equation and also, by coincidence, in connection with the best constants in the Gagliardo-Nirenberg inequality. The existence of ground states $u$ can be proven by the variational principle. But one can say much more:

Lemma 1 (Gidas-Ni-Nirenberg) All ground states $u$ are radially symmetric with respect to some origin.
To show this radial symmetry, a small amount of Euclidean geometry shows that it is enough to show that there is a lot of reflection symmetry:
Lemma 2 (Gidas-Ni-Nirenberg, again) If $u$ is a ground state and $\omega \in S^{d-1}$ is a unit vector, then there exists a hyperplane orthogonal to $\omega$ with respect to which $u$ is symmetric.
To prove this lemma, we use the moving planes method, sliding in a plane orthogonal to $\omega$ from infinity. More precisely, for each $t \in {\bf R}$, let $P_t$ be the hyperplane $\{ x \in {\bf R}^d: x \cdot \omega = t \}$, let $H_t$ be the associated half-space $\{ x \in {\bf R}^d: x \cdot \omega < t \}$, and let $w_t$ be the function $w_t(x) := u(x) - u(R_t x)$, where $R_t$ is the reflection through $P_t$; thus $w_t$ is the difference between $u$ and its reflection $u \circ R_t$ in the half-space $H_t$. In particular, $w_t$ vanishes on the boundary $P_t$ of the half-space $H_t$.
Intuitively, the argument proceeds as follows. It is plausible that $w_t$ is going to be positive in the interior of $H_t$ for large positive $t$, but negative in the interior of $H_t$ for large negative $t$. Now imagine sliding $P_t$ down from $t = +\infty$ to $t = -\infty$ until one reaches the first point $t_0$ where $w_{t_0}$ ceases to be positive in the interior of $H_{t_0}$; at this point $w_{t_0}$ attains its minimum at some zero in the interior of $H_{t_0}$. But by playing around with (2) (using the Lipschitz nature of the map $u \mapsto u^p$ when $u$ is bounded) we know that $w_t$ obeys an elliptic constraint of the form $\Delta w_t = O(|w_t|)$. Applying the maximum principle, we can then conclude that $w_{t_0}$ vanishes identically in $H_{t_0}$, which gives the desired reflection symmetry.
(Now it turns out that there are some technical issues in making the above sketch precise, mainly because of the non-compact nature of the half-space $H_t$, but these can be fixed with a little bit of fiddling; see for instance Appendix B of my PDE textbook.)
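In the one-dimensional cubic case $p = 3$ of (2) (with the normalisation used above), everything can be checked by hand: the explicit ground state is $u(x) = \sqrt{2}\,\mathrm{sech}(x)$, which is indeed even about its peak, in line with Lemma 1. The following sympy snippet is just a symbolic verification of this special case, not part of the Gidas-Ni-Nirenberg argument; it also confirms that the moving-plane difference $w_t$ vanishes identically at the symmetry plane $t = 0$.

```python
# Symbolic check of the 1D cubic ground state u(x) = sqrt(2) sech(x) for (2):
#   u'' + u^3 = u, with u positive, exponentially decaying, and even.
import sympy as sp

x, t = sp.symbols('x t', real=True)
u = sp.sqrt(2) * sp.sech(x)

residual = sp.diff(u, x, 2) + u**3 - u
print(sp.simplify(residual.rewrite(sp.exp)))   # prints 0: u solves (2) with p = 3, d = 1

print(sp.simplify(u - u.subs(x, -x)))          # prints 0: u is even about x = 0

# The moving-plane difference w_t(x) = u(x) - u(R_t x), with R_t x = 2t - x,
# vanishes identically exactly at the plane of symmetry t = 0:
w = u - u.subs(x, 2 * t - x)
print(sp.simplify(w.subs(t, 0)))               # prints 0
```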
28 comments
20 August, 2010 at 10:50 am
UNSW
Thanks a lot for the summarization.
20 August, 2010 at 11:27 am
oz
A very nice post. There is one minor mistake I believe:
“Again, by chance, the work of all three of the recipients overlaps to some extent with my own areas of expertise ..”
I don’t see how it can be that by mere chance the work of 6 out of the 7 prize winners is related to Prof. Tao’s areas of expertise.
Any reasonable statistical analysis will reject this null hypothesis and suggest that it is not chance but the enormous breadth of Prof. Tao’s mathematical knowledge and interests which is in action here.
20 August, 2010 at 12:02 pm
Anon
Thanks! I enjoyed reading this post. My favorite line is “This is perhaps analogous in some ways to the empirical experience that some pieces of machinery work better after being kicked.” I laughed out loud reading it.
20 August, 2010 at 12:25 pm
Zaher
Thanks for the summaries.
Small typo: in the definition of $\pi_{lh}$ the arguments (or subscripts) of $m_{hl}(.,.)$ should be permuted.
[Corrected, thanks – T.]
20 August, 2010 at 3:45 pm
Roger
Perhaps Tao’s expertise is not so broad, but the selection committee reads this blog, and they think that a subject is not important unless it is written about here! Maybe next we will see a prize for the empirical experiences of kicking machinery.
20 August, 2010 at 11:58 pm
Anonymous
I thought the complexity of the simplex method was figured out by Smale, Lemke, etc in the 1970s(?) in about the way you describe, establishing that it’s polynomial except on a measure-zero set that can be characterized precisely. I’m probably confused.
21 August, 2010 at 7:27 am
Terence Tao
I believe Smale addressed the average case complexity of the simplex method (i.e. that the expected value of the run time was polynomial if the inputs were chosen randomly), but not the smoothed complexity. By applying Markov’s inequality, this does show that inputs with very large run time are rare, but unfortunately such bounds are not strong enough to directly say anything about the smoothed model, because the set of data which is very close to a specific input has tiny measure (exponentially small) with respect to a standard reference measure (e.g. gaussian).
30 August, 2010 at 11:25 am
Gil Kalai
The “pathological” examples for which the simplex algorithm requires exponentially many steps represent a set of problems of positive measure. So we cannot expect a result of the kind described by Anonymous. One remaining mystery both in the average-case analysis and smoothed analysis of the simplex algorithm is to give tail-estimates for the running time which are better than what can simply be deduced by Markov’s inequality. As a first step, is there a way to understand the variance of the number of steps? Next, can we expect an exponentially small tail?
21 August, 2010 at 8:48 pm
AJ
Prof: I do not know why the random perturbation to make ill-conditioned matrix well-conditioned is an important work in Dan Spielman’s work. I would bet against some Engineering companies which already do not use that idea. It seems control theoretic in spirit.
21 August, 2010 at 9:06 pm
AJ
I mean I doubt if some engineering companies already do not use the idea.
24 August, 2010 at 6:33 pm
Sujit
A comment from one of my friends in response to your observation: This is perhaps analogous in some ways to the empirical experience that some pieces of machinery work better after being kicked.
“Perhaps this applies to ill-behaved children being ‘perturbed’ every once in a while by a dose of yelling too? Just wondering about the homely implications of abstract Math. :D “
24 August, 2010 at 10:57 pm
Tom
I think you are the best Terry. I have a nostalgic feeling seeing you get the prize in 2006. You are the only one who writes an excellent blog and tediously summarize other people’s results. The people who won it this year are nobodys compared to you. T.Tao you are the man.
27 August, 2010 at 5:53 pm
Anonymous
I believe he has a little robot assistant beside him or he has secretly invented the time machine or he has access to cranial ability performance enhancing drugs. Take a look at the number of publications he has had before 30. And this blog. I wonder where he has time. Amazing!
25 August, 2010 at 8:03 am
poiuyt
Typo: “the typical linear program that one encounters in PRACTICE”?
[Corrected, thanks – T.]
25 August, 2010 at 11:59 am
John Sidles
The insight that “some pieces of machinery work better after being kicked” has been invoked to explain the generic feasibility of efficient (classical) descriptions of large-dimension quantum dynamical trajectories. See (for example) Wojciech Zurek’s review “Decoherence, einselection, and the quantum origins of the classical” (Rev. Mod. Phys., 2003).
26 August, 2010 at 5:39 am
Boaz Barak
A minor correction: actually, cryptography is typically focused on the *average case* analysis of algorithms. The reason is that the algorithmic task at the heart of a cryptosystem is the task of *breaking* it. Hence the input is chosen by the designer of the cryptosystem from a fixed well defined distribution (i.e., the one obtained by choosing a random key), and the adversary picks a “worst-case choice of the algorithm” to try to break it. (However, it’s also true that one also needs worst-case analysis of algorithms to make sure that, for example, a decryption algorithm will always finish decoding no matter what input it’s provided, and to make sure that the reductions in the security proof of a cryptosystem are efficient.)
The reason worst-case analysis of algorithms is so prevalent is not because of crypto but rather because in most cases it’s very hard to find a clean model distribution of the inputs, and so general algorithms with worst-case guarantees, which one can plug into any application without worrying about the input distributions, are very useful. (The simplex algorithm is perhaps the canonical example of an algorithm that seems to have the latter usefulness property without having the former, and the Spielman-Teng work gives some reasons why.)
Another “justification” for worst-case analysis is that it’s also been observed that in many cases, if we have an efficient algorithm for many diverse families of natural inputs, there is also another efficient algorithm that works in the worst-case. This is the case for linear programming. Two interesting exceptions are the graph isomorphism problem and approximating so-called “unique games” (i.e., the problem underlying Khot’s unique games conjecture), where there are efficient algorithms for many natural input distributions, but there are no known algorithms (yet?) proven to work on all inputs.
[Fair enough, although worst-case analysis is still relevant for some aspects of cryptography, particularly with regards to code design rather than code breaking. I’ve reworded the paragraph accordingly. – T]
31 August, 2010 at 3:43 am
Gil Kalai
It is worth mentioning also that there are various notions of “average case analysis”. The most naive notion is about a uniform distribution on the inputs. A more sophisticated notion is about an arbitrary “computable” distribution of the inputs, which is (I believe) what “Levin’s average case complexity” is more or less about. Levin’s notion of average case complexity is probably the one more intimately connected with cryptography, but I am sure Boaz can explain it much better than me.
1 September, 2010 at 8:11 am
Boaz Barak
Hi Gil,
You are right that in crypto we have the freedom to choose the distribution over inputs that will make the problem hardest, and that distribution doesn’t have to be the uniform distribution. However, in many cases, crypto is concerned with clean distributions such as uniform strings, random matrices, product of two random primes etc.. In fact, crypto may be the only “real world” application where one indeed needs to solve instances generated by such clean distributions (in contrast, instances coming from “nature” will be at best only approximately from a model distribution).
Levin gave a general definition of what it means to have a “polynomial on the average” algorithm for a given problem F and a distribution D (it turns out there are some subtleties in defining these). He then showed some completeness results, although unfortunately the reductions seem to always either make the problem or the distribution “unnatural”.
There is a subtle reason why Levin’s notion is not immediately useful to cryptography. This is explained beautifully in this survey of Russell Impagliazzo who calls it the difference between “Pessiland” and “Minicrypt”. Roughly speaking, to be useful for crypto, we need a way to sample “planted hard instances” of the distribution. For some natural distributions such as random clique and random 3SAT this is easy due to the symmetries of the distribution, but it can conceivably be the case that there is a problem F (in NP) and a distribution D such that we can efficiently sample an input from D, but we have no procedure to sample a pair (x,y) of an input from D and a solution (i.e. “witness”) y to this input. It is the latter procedure that is needed to use F as a basis for cryptography.
18 April, 2017 at 7:36 am
Jorge Losada
Dear Prof. Tao, thanks for your excellent work and for sharing with us such information. I am particularly interested in the section about Yves Meyer.
In usual calculus, exponential function plays a central role and this is due to (I believe) it links in a really beautiful way sums and (usual) products.
I have tried reading the suggested book “Tools for PDE” by M.E. Taylor, but I found it really hard for me. So, what can we say about functions with similar properties than exponential function considering paraproducts instead of usual products?
Please, if someone has any idea I would really appreciate her (or his) help. Thanks!! I have some conjectures…
You can find my e-mail on this website
http://www.usc.es/es/centros/matematicas/profesor.html?Num_Puesto=16662&Num_Persona=198874&ano=66