I’ve just uploaded to the arXiv my paper “An inverse theorem for the bilinear $L^2$ Strichartz estimate for the wave equation”. This paper is another technical component of my “heatwave project”, which aims to establish the global regularity conjecture for energy-critical wave maps into hyperbolic space. I have been in the process of writing the final paper of that project, in which I will show that the only way singularities can form is if a special type of solution, known as an “almost periodic blowup solution”, exists. However, I recently discovered that the existing function space estimates that I was relying on for the large energy perturbation theory were not quite adequate, and in particular I needed a certain “inverse theorem” for a standard bilinear estimate which was not quite in the literature. The purpose of this paper is to establish that inverse theorem, which may also have some application to other nonlinear wave equations.
To explain the inverse theorem, let me first discuss the bilinear estimate that it inverts. Define a wave to be a solution $\phi$ to the free wave equation $-\partial_{tt} \phi + \Delta \phi = 0$. If the wave has a finite amount of energy, then one expects the wave to disperse as time goes to infinity; this is captured by the Strichartz estimates, which establish various spacetime $L^p$ bounds on such waves in terms of the energy (or related quantities, such as Sobolev norms of the initial data). These estimates are fundamental to the local and global theory of nonlinear wave equations, as they can be used to control the effect of the nonlinearity.
In some cases (especially in low dimensions and/or low regularities, and with equations whose nonlinear terms contain derivatives), Strichartz estimates are too weak to control nonlinearities; roughly speaking, this is because waves decay too slowly in low dimensions. (For instance, one-dimensional waves do not decay at all.) However, it has been understood for some time that if the nonlinearity has a special null structure, which roughly means that it consists only of interactions between transverse waves rather than parallel waves, then there is more decay that one can exploit. For instance, while one-dimensional waves do not decay in time, the product between a left-propagating wave $f(x+t)$ and a right-propagating wave $g(x-t)$ does decay in time. In particular, if f and g are bounded in $L^2({\bf R})$, then this product is bounded in spacetime $L^2_{t,x}({\bf R} \times {\bf R})$, thanks to the Fubini-Tonelli theorem.
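The one-dimensional computation can be written out explicitly. The following is a sketch of the change of variables behind the Fubini-Tonelli argument, for a left-propagating wave $f(x+t)$ and a right-propagating wave $g(x-t)$ with $f, g \in L^2({\bf R})$; the variables $u, v$ are introduced here purely for illustration.

```latex
\begin{align*}
\int_{\bf R} \int_{\bf R} |f(x+t)|^2 |g(x-t)|^2 \, dx \, dt
&= \frac{1}{2} \int_{\bf R} \int_{\bf R} |f(u)|^2 |g(v)|^2 \, du \, dv
   \qquad (u = x+t,\ v = x-t) \\
&= \frac{1}{2} \|f\|_{L^2({\bf R})}^2 \|g\|_{L^2({\bf R})}^2 .
\end{align*}
```

The factor $\frac{1}{2}$ is the Jacobian of the map $(x,t) \mapsto (u,v)$, and the second line is Fubini-Tonelli applied to the product $|f(u)|^2 |g(v)|^2$.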
There is a similar “bilinear $L^2$” estimate for products of transverse waves in higher dimensions. This estimate is the basic building block for the bilinear $X^{s,b}$ estimates and their variants as developed by Bourgain, Klainerman-Machedon, Kenig-Ponce-Vega, Tataru, and others, and which are the tool of choice for establishing local and global control on nonlinear wave equations, particularly at low dimensions and at critical regularities. In particular, these estimates (or more precisely, a complicated variant of these estimates in sophisticated function spaces, due to Tataru and myself), are used in the theory of the energy-critical wave map equation. [These bilinear (and trilinear) estimates are not, by themselves, enough to handle this equation; one also needs an additional gauge fixing procedure before the equation is sufficiently close to linear in behaviour that these estimates become effective. But I do not wish to discuss the (significant) gauge fixing issue here.]
To cut a (very) long story short, these estimates, when combined with a suitable perturbative theory, allow one to control energy-critical wave maps as long as the energy is small. However, the whole point of the “heatwave” project is to control the non-perturbative setting when the energy is large (but finite), and one wants to control the solution for long periods of time.
In my previous “heatwave” paper, in which I established large data local well-posedness for this equation, I finessed this issue by localising time to very short intervals, which made certain spacetime norms small enough for the perturbation theory to apply. This sufficed for the local well-posedness theory, but is not good enough for the global perturbative theory, because the number of very short intervals needed to cover the entire time axis becomes unbounded. For that, one needs the ability to make certain norms or estimates “small” by only chopping up time into a bounded number of intervals. I refer to this property as divisibility (I used to refer to it, somewhat incorrectly, as fungibility).
In the case of semilinear wave (or Schrödinger) equations, in which Strichartz estimates are already sufficient to obtain a satisfactory perturbative theory, divisibility is well-understood, and boils down to the following simple observation: if a function $u: {\bf R} \times {\bf R}^d \to {\bf C}$ obeys a global spacetime integrability bound such as $\int_{\bf R} \int_{{\bf R}^d} |u(t,x)|^p\ dx dt \leq M$ for some finite exponent p and some finite bound M, then one can partition ${\bf R}$ into intervals I on which $\int_I \int_{{\bf R}^d} |u(t,x)|^p\ dx dt \leq \epsilon$ for some $\epsilon > 0$ at one’s disposal to select. Indeed the number of such intervals is bounded by $M/\epsilon + 1$, and the intervals can be selected by a simple “greedy algorithm” argument. This divisibility property of $L^p$-type spacetime norms allows one to easily generalise the small-data perturbation theory to the large-data setting, and is relied upon heavily in the modern theory of the critical nonlinear wave and Schrödinger equations; see for instance this survey of Killip and Visan.
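The greedy selection can be illustrated on a discretised time axis. The following is a minimal sketch and not taken from the paper: the function name `greedy_partition`, the time grid, and the sample density $1/(1+t^2)$ are all assumptions made for illustration, with `cell_mass[i]` standing in for the integral of $|u(t,x)|^p$ over all of space and over the i-th time cell.

```python
import numpy as np


def greedy_partition(cell_mass, eps):
    """Greedily split cells 0..n-1 into consecutive blocks [start, stop).

    cell_mass[i] is a stand-in for the integral of |u(t, x)|^p over all x and
    over the i-th time cell. Each returned block has total mass <= eps.
    """
    cell_mass = np.asarray(cell_mass, dtype=float)
    if np.any(cell_mass > eps):
        raise ValueError("a single cell exceeds eps; refine the time grid")
    blocks, start, running = [], 0, 0.0
    for i, mass in enumerate(cell_mass):
        if running + mass > eps:
            # Adding cell i would overshoot: close the current block before it.
            blocks.append((start, i))
            start, running = i, 0.0
        running += mass
    blocks.append((start, len(cell_mass)))
    return blocks


# Hypothetical example: mass density 1/(1+t^2) on a fine grid over [-50, 50].
t = np.linspace(-50.0, 50.0, 20001)
dt = t[1] - t[0]
cell_mass = dt / (1.0 + t**2)
M, eps = cell_mass.sum(), 0.5
blocks = greedy_partition(cell_mass, eps)
print(len(blocks), "blocks; discrete bound:", 2 * M / eps + 1)
```

On a grid the blocks cannot be cut at the exact instant the running mass reaches `eps`, so the bound one gets from this sketch is $2M/\epsilon + 1$ (any two consecutive blocks together carry mass more than `eps`) rather than the continuum count; the point is only that the number of blocks depends on $M/\epsilon$ and not on the length of the time axis.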
Unfortunately, the function spaces used in wave maps are not easily divisible in this manner (very roughly speaking, this is because the function space norms contain too many $L^\infty_t$ type norms within them). So one cannot rely purely on refining the function space; one must also work on refining the bilinear (and trilinear) estimates on these spaces. The standard way to do this is to strengthen the exponents in these estimates, and for the basic bilinear $L^2$ estimate this has indeed been done (in work of Wolff and myself). This suffices for “equal-frequency” interactions, in which one is multiplying two transverse waves of the same frequency, but turns out to be inadequate for “imbalanced-frequency” interactions, when one is multiplying a low-frequency wave by a high-frequency transverse wave. For this, I rely instead on establishing an inverse theorem for the estimate.
Generally speaking, whenever one is faced with an estimate, e.g. a linear estimate $\|Tf\|_Y \leq C \|f\|_X$, one can pose the inverse problem of trying to classify the functions f for which the estimate is tight in the sense that $\|Tf\|_Y \geq \delta \|f\|_X$ for some $\delta > 0$ which is not too small. Such inverse theorems are a current area of study in additive combinatorics, and have recently begun making an appearance in PDE as well. For instance:
- Young’s inequality $\|f * g\|_{L^r} \leq \|f\|_{L^p} \|g\|_{L^q}$, or the Hausdorff-Young inequality $\|\hat f\|_{L^{p'}} \leq \|f\|_{L^p}$, is only tight (for non-endpoint p,q,r) when f, g are concentrated on balls, arithmetic progressions, or Bohr sets (this is a consequence of several basic theorems in additive combinatorics, including Freiman’s theorem and the Balog-Szemeredi-Gowers theorem);
- The trivial inequality $\|f\|_{U^k} \leq \|f\|_{L^\infty}$ for the Gowers uniformity norms is only expected to be tight when f correlates with a highly algebraic object, such as a polynomial phase or nilsequence (this is the inverse conjecture for the Gowers norm, which is partially proven so far);
- The Sobolev embedding $\|f\|_{L^q({\bf R}^d)} \leq C \|f\|_{W^{s,p}({\bf R}^d)}$ is only tight when f is concentrated on a unit ball (for non-endpoint estimates) or a ball of arbitrary radius (for endpoint estimates);
- Strichartz estimates are only tight when f is concentrated on a ball (for non-endpoint estimates) or a tube (for endpoint estimates).
Inverse theorems for such estimates as Sobolev inequalities and Strichartz estimates are also closely related to the theory of concentration compactness and profile decompositions; see this previous blog post of mine for a discussion.
I can now state, informally, the main result of this paper:
Theorem 1 (informal statement). A bilinear $L^2$ estimate between two waves of different frequency is only tight when the waves are concentrated on a small number of light rays. Outside of these rays, the $L^2$ norm is small.
This leads to a corollary which will be used in my final heatwave paper:
Corollary 2 (informal statement). Any large-energy wave $\phi$ can have its time axis subdivided into a bounded number of intervals, such that on each interval the bilinear estimates for that wave (when interacted against any high-frequency transverse wave) behave “as if” $\phi$ was small-energy rather than large energy.
The method of proof relies on a paper of mine from several years ago on bilinear estimates for the wave equation, which in turn is based on a celebrated paper of Wolff. Roughly speaking, the idea is to use wave packet decompositions and the combinatorics of light rays to isolate the regions of spacetime where the waves are concentrating, cover these regions by tubular neighbourhoods of light rays, then remove the light rays to reduce the energy (or mass) of the solution and iterate. The wave packet analysis is moderately complicated, but fortunately I can use a proposition on this topic from my paper as a black box, leaving only the other components of the argument to write out in detail.

13 comments
22 April, 2009 at 8:54 am
ben green
Terry, one small comment: the inverse theorem for the Hausdorff-Young inequality is not just Freiman’s theorem, it is Freiman plus the Balog-Szemeredi-Gowers theorem. And in fact it is slightly more than that, as both of those additive-combinatorial theorems apply to sets and not to arbitrary functions. I don’t believe it is particularly hard to deal with arbitrary functions by foliating into approximate (dyadic) level sets, but I do not actually know of a place in the literature where this is done explicitly. A paper of Bourgain (Multilinear exponential sums in prime fields under optimal entropy condition on the sources) does something very much in this spirit in the course of one of the arguments.
22 April, 2009 at 10:37 am
Terence Tao
Dear Ben, You’re right, of course; I’ve adjusted the text accordingly.
One way to pass from functions back to sets is to use the Marcinkiewicz interpolation theorem, which ties strong L^p estimates to restricted weak-type L^p estimates, which roughly speaking are the specialisation of the strong L^p estimates to the case of indicator functions. If an estimate is tight for an interpolated L^p estimate then it must also be tight for the restricted weak-type variant, which basically means that the functions that were tight for the original inequality must already behave like indicator functions to some extent.
On the other hand, for “endpoint” estimates that cannot be obtained from Marcinkiewicz interpolation, it is quite possible for functions that are tight for the estimate to not behave at all like indicator functions, instead being a superposition of such functions of widely different heights and widths. The inverse theory for endpoints is usually rather subtle (and in many cases, hopeless: for instance, there is no inverse theorem for the p=2 endpoint of Hausdorff-Young, since the inequality is an equality in that case (Plancherel’s theorem))!
The papers of Reingold, Tulsiani, Trevisan, Vadhan on the dense model theorem also have some nice ways to convert functions back into sets by averaging.
Incidentally, there is a stub of a Tricki article on this point at
http://www.tricki.org/article/Control_level_sets
22 April, 2009 at 10:16 am
Shuanglin Shao
An excellent post!
22 April, 2009 at 11:00 am
anonymous
Could you give a reference for the inverse theorem for the Sobolev embedding and Strichartz estimates?
thanks!
22 April, 2009 at 11:44 am
Terence Tao
Dear anonymous,
I’m not sure where the Sobolev inverse theorem first originates; it is implicit in the concentration-compactness work of Lions and others, but I don’t know of a specific reference there. For nonlinear wave equations, the earliest reference I know of is by Bahouri and Gerard in a 1999 AJM paper. It is also to some extent implicit in the large body of work in the ’80s on Besov and Triebel-Lizorkin spaces, connections to wavelets, etc.
For Strichartz, a paper of Begout and Vargas from 2007 (see also Shao’s recent paper at http://arxiv.org/abs/0809.0153 ) contains some explicit inverse theorems of this type, although arguably such theorems are already implicit to some extent in Bourgain’s 1991 paper on the restriction problem (and in many subsequent works also), generally going under such names as “improved Strichartz estimates”.
22 April, 2009 at 1:00 pm
anonymous
Thank you for the response. I guess my question is what do these inverse theorems really look like? Shao’s result appears to be that maximizers to the Strichartz and Sobolev-Strichartz inequalities exist, at least in non-endpoint cases. In establishing the latter he works out a profile decomposition for the Sobolev-Strichartz norms. I see that maximizers to a Strichartz inequality are functions of the form that are identified by the inverse theorem (more specifically, in the cases where the maximizers have been classified, they are in fact Gaussians, which is consistent with the inverse statement of being “localized to a ball”). However, the inverse theorem seems much stronger, that if a function isn’t localized to a ball, then its Strichartz norm should be very small compared to the L^2 norm of the initial data. From your reply, I’m guessing that the inverse theorem is somehow implied by profile decomposition / concentration compactness results, but this doesn’t seem to be clear. Sorry to be dense, and thanks again!
22 April, 2009 at 1:56 pm
Terence Tao
An explicit example of an inverse Sobolev theorem can be found for instance in Lemma 3.1 of my paper
http://arxiv.org/abs/math.AP/0601164
The Begout-Vargas paper also has some statements of inverse Strichartz type.
One can use the profile decomposition to deduce inverse theorems (though usually the order of implications goes the other way). Indeed, if the inverse theorem failed for (say) Sobolev, then one could find a sequence of examples which were tight for the Sobolev inequality but became increasingly dispersed in the sense that they were not localised to any ball. Applying the profile decomposition to such a sequence (normalising the Sobolev norm to be bounded), one would obtain a contradiction.
9 June, 2009 at 11:25 am
anonymous
I admit I am very new to this, but I am having a hard time seeing how one can use Freiman’s theorem and the Balog-Szemeredi-Gowers theorem to prove an inverse Young and Hausdorff-Young theorem.
If this is done anywhere in the literature could you point me to it? If not, would you be willing to provide a very brief sketch of how the argument should work? In view of Ben Green’s comments (and your reply) I am happy to assume that f=1_A and g=1_B. I am guessing that the argument is something like: (a) apply BSG at endpoints, (b) interpolate, and (c) somehow plug the conclusion of BSG into Freiman’s theorem.
9 June, 2009 at 12:33 pm
Terence Tao
Dear anonymous, for Young’s inequality for characteristic functions $f = 1_A$, $g = 1_B$, say on the integers, one can look at the cardinality of the level sets $\{ x: 1_A * 1_B(x) \geq \lambda \}$. On the one hand this quantity is zero if $\lambda > \min(|A|,|B|)$; on the other hand, by Markov’s inequality there is an upper bound of $|A| |B| / \lambda$. If Young’s inequality $\| 1_A * 1_B \|_{\ell^r} \leq |A|^{1/p} |B|^{1/q}$ is close to sharp for some non-endpoint choices of p,q,r, then a computation using the distributional bounds implies that $|A|, |B|$ are comparable, and that $\{ x: 1_A * 1_B(x) \geq \lambda \}$ has size comparable to $|A|$ for some $\lambda$ comparable to $|A|$. This is basically the situation to which BSG, and then Freiman, would apply.
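The two distributional bounds on the level sets of $1_A * 1_B$ can be checked numerically on a small example. The following is only an illustrative sketch: the helper names `representation_counts` and `level_set_size`, and the choice $A = B = \{0, \dots, 99\}$, are assumptions made here and do not come from the literature.

```python
from collections import Counter


def representation_counts(A, B):
    """Return a Counter mapping x to 1_A * 1_B(x) = #{(a, b) in A x B : a + b = x}."""
    return Counter(a + b for a in set(A) for b in set(B))


def level_set_size(counts, lam):
    """Cardinality of {x : 1_A * 1_B(x) >= lam}."""
    return sum(1 for c in counts.values() if c >= lam)


# Hypothetical test case: A = B = {0, ..., 99}, an arithmetic progression.
A = B = range(100)
counts = representation_counts(A, B)
n_A, n_B = len(set(A)), len(set(B))

# The level set is empty once lam exceeds min(|A|, |B|).
empty_above = level_set_size(counts, min(n_A, n_B) + 1)

# Markov-type bound: |{1_A * 1_B >= lam}| <= |A||B| / lam for every lam >= 1.
markov_ok = all(level_set_size(counts, lam) <= n_A * n_B / lam
                for lam in range(1, n_A + 1))

# For a progression, a level lam ~ |A|/2 already has a level set of size ~ |A|.
big_level = level_set_size(counts, n_A // 2)
print(empty_above, markov_ok, big_level)
```

For this progression the level set at $\lambda = 50$ has 101 elements, so both $\lambda$ and the level set are comparable to $|A| = 100$, which is the structured regime described above.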
For Hausdorff-Young, one can use interpolation to reduce to analysing the particular inequality $\| \hat f \|_{L^4} \leq \| f \|_{L^{4/3}}$, which by Plancherel’s theorem is equivalent to the Young inequality $\| f * f \|_{L^2} \leq \| f \|_{L^{4/3}}^2$, which can be handled by the preceding argument.
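To spell out the Plancherel step (a sketch, assuming the particular inequality in question is the $L^{4/3} \to L^4$ case of Hausdorff-Young, with the Fourier transform normalised to be unitary on $L^2$ and to take convolutions to products):

```latex
\[
\|\hat f\|_{L^4}^2 = \|(\hat f)^2\|_{L^2} = \|\widehat{f * f}\|_{L^2} = \|f * f\|_{L^2},
\]
\[
\text{so}\quad \|\hat f\|_{L^4} \leq \|f\|_{L^{4/3}}
\quad\Longleftrightarrow\quad
\|f * f\|_{L^2} \leq \|f\|_{L^{4/3}}^2 .
\]
```

The right-hand inequality is Young’s inequality with exponents $p = q = 4/3$ and $r = 2$, which satisfy $1/p + 1/q = 1 + 1/r$.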
I think these results do not appear explicitly in the literature, though they seem to be folklore (Michael Christ, for instance, has some unpublished computations establishing these sorts of things, as part of his work on quasi-extremisers of Radon transforms).
9 June, 2009 at 2:45 pm
anonymous
Thanks for the reply. When I asked the question, I had in mind the circle, T. In fact, it seems that this is what we need to assume if we plan to deduce from it the inverse theorem for the (usual) Hausdorff-Young inequality.
In the case of T, we still may assume that f=1_A and g=1_B (for A, B \subset T) and deduce similar distributional estimates. However, it seems that the additive combinatorics won’t be the same in this setting (unless at some point we pass to the dual group, Z).
In fact, I’m not now sure what the inverse result should be. Are we trying to show that if Young’s inequality is nearly sharp then f is localized to a small ball? What I (probably mistakenly) assumed/hoped when I read the result is that in the case of near equality for f \in L^p(T), we’d have that \hat{f}(n) is supported on frequencies in nearly an AP.
9 June, 2009 at 7:19 pm
Terence Tao
For a general group, the inverse theorem for Young’s inequality should assert that if $\|f * g\|_{L^r} \leq \|f\|_{L^p} \|g\|_{L^q}$ is close to being satisfied with equality, then f and g should be concentrated on translates of the same set of small doubling. On the torus, the sets of small doubling are essentially the sums of balls and generalised arithmetic progressions; see this paper of mine for a precise statement. Similarly for the Hausdorff-Young inequality.
28 June, 2009 at 2:17 pm
YIQ
Dear Terry,
I have a final question about the final step of applying Balog-Szemeredi-Gowers and Freiman in deriving the Young’s inequality inverse theorem in the thread above.
Let’s stick to the integers, ${\bf Z}$. Using the statements in your and Vu’s combinatorics book, BSG roughly states that if the additive energy $E(A,B) \geq |A|^{3/2} |B|^{3/2} / K$ then there exist subsets $A' \subset A$ and $B' \subset B$ such that $|A'+B'| \leq C K^C |A|^{1/2} |B|^{1/2}$.
I understand that heuristically this shows that $A'$ and $B'$ have small doubling, and thus Freiman’s theorem should imply that these are ‘almost’ generalized AP’s. However, every formulation of Freiman’s theorem that I am familiar with (such as Theorem 5.32 in your book) works with only a single set (i.e. has a hypothesis such as $|A+A| \leq K|A|$). Is there a more general version of this theorem, or can we somehow use Freiman's theorem in this form?
Moreover, even if we didn't care to be as general, and started with $A=B$, then it still seems like we fall a bit short of being able to apply Freiman's theorem, because the subsets $A'$ and $B'$ we get from applying BSG aren't necessarily the same.
Thanks again!
28 June, 2009 at 4:41 pm
Terence Tao
Dear YIQ,
One can use sumset theory to essentially make the A’ and B’ components the same (up to translations), see e.g. Theorem 2.31(v) of my book with Van.