In functional analysis, it is common to endow various (infinite-dimensional) vector spaces with a variety of topologies. For instance, a normed vector space can be given the strong topology as well as the weak topology; if the vector space has a predual, it also has a weak-* topology. Similarly, spaces of operators have a number of useful topologies on them, including the operator norm topology, strong operator topology, and the weak operator topology. For function spaces, one can use topologies associated to various modes of convergence, such as uniform convergence, pointwise convergence, locally uniform convergence, or convergence in the sense of distributions. (A small minority of such modes are not topologisable, though, the most common of which is pointwise almost everywhere convergence; see Exercise 8 of this previous post).
Some of these topologies are much stronger than others (in that they contain many more open sets, or equivalently that they have many fewer convergent sequences and nets). However, even the weakest topologies used in analysis (e.g. convergence in distributions) tend to be Hausdorff, since this at least ensures the uniqueness of limits of sequences and nets, which is a fundamentally useful feature for analysis. On the other hand, some Hausdorff topologies used are “better” than others in that many more analysis tools are available for those topologies. In particular, topologies that come from Banach space norms are particularly valued, as such topologies (and their attendant norm and metric structures) grant access to many convenient additional results such as the Baire category theorem, the uniform boundedness principle, the open mapping theorem, and the closed graph theorem.
Of course, most topologies placed on a vector space will not come from Banach space norms. For instance, if one takes the space of continuous functions on that converge to zero at infinity, the topology of uniform convergence comes from a Banach space norm on this space (namely, the uniform norm ), but the topology of pointwise convergence does not; and indeed all the other usual modes of convergence one could use here (e.g. convergence, locally uniform convergence, convergence in measure, etc.) do not arise from Banach space norms.
I recently realised (while teaching a graduate class in real analysis) that the closed graph theorem provides a quick explanation for why Banach space topologies are so rare:
Proposition 1 Let $V = (V, {\mathcal F})$ be a Hausdorff topological vector space. Then, up to equivalence of norms, there is at most one norm $\|\cdot\|$ one can place on $V$ so that $(V, \|\cdot\|)$ is a Banach space whose topology is at least as strong as ${\mathcal F}$. In particular, there is at most one topology stronger than ${\mathcal F}$ that comes from a Banach space norm.
Proof: Suppose one had two norms $\|\cdot\|_1, \|\cdot\|_2$ on $V$ such that $(V, \|\cdot\|_1)$ and $(V, \|\cdot\|_2)$ were both Banach spaces with topologies at least as strong as ${\mathcal F}$. Now consider the graph of the identity function from the Banach space $(V, \|\cdot\|_1)$ to the Banach space $(V, \|\cdot\|_2)$. This graph is closed; indeed, if $(x_n, x_n)$ is a sequence in this graph that converged in the product topology to $(x, y)$, then $x_n$ converges to $x$ in $\|\cdot\|_1$ norm and hence in ${\mathcal F}$, and similarly $x_n$ converges to $y$ in $\|\cdot\|_2$ norm and hence in ${\mathcal F}$. But limits are unique in the Hausdorff topology ${\mathcal F}$, so $x = y$. Applying the closed graph theorem (see also previous discussions on this theorem), we see that the identity map is continuous from $(V, \|\cdot\|_1)$ to $(V, \|\cdot\|_2)$; similarly for the inverse. Thus the norms are equivalent as claimed.
By using various generalisations of the closed graph theorem, one can generalise the above proposition to Fréchet spaces, or even to F-spaces. The proposition can fail if one drops the requirement that the norms be stronger than a specified Hausdorff topology; indeed, if is infinite dimensional, one can use a Hamel basis of to construct a linear bijection on that is unbounded with respect to a given Banach space norm , and which can then be used to give an inequivalent Banach space structure on .
One can interpret Proposition 1 as follows: once one equips a vector space with some “weak” (but still Hausdorff) topology, there is a canonical choice of “strong” topology one can place on that space that is stronger than the “weak” topology but arises from a Banach space structure (or at least a Fréchet or F-space structure), provided that at least one such structure exists. In the case of function spaces, one can usually use the topology of convergence in distribution as the “weak” Hausdorff topology for this purpose, since this topology is weaker than almost all of the other topologies used in analysis. This helps justify the common practice of describing a Banach or Fréchet function space just by giving the set of functions that belong to that space (e.g. is the space of Schwartz functions on ) without bothering to specify the precise topology to serve as the “strong” topology, since it is usually understood that one is using the canonical such topology (e.g. the Fréchet space structure on given by the usual Schwartz space seminorms).
Of course, there are still some topological vector spaces which have no “strong topology” arising from a Banach space at all. Consider for instance the space $c_c$ of finitely supported sequences. A weak, but still Hausdorff, topology to place on this space is the topology of pointwise convergence. But there is no norm stronger than this topology that makes this space a Banach space. For, if there were such a norm $\|\cdot\|$, then letting $e_1, e_2, \dots$ be the standard basis of $c_c$, the series $\sum_{n=1}^\infty 2^{-n} e_n / \|e_n\|$ would have to converge in $\|\cdot\|$, and hence pointwise, to an element of $c_c$, but the only available pointwise limit for this series lies outside of $c_c$. But I do not know if there is an easily checkable criterion to test whether a given vector space (equipped with a Hausdorff “weak” topology) can be equipped with a stronger Banach space (or Fréchet space or F-space) topology.
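The obstruction in the finitely supported example can be written out as a short computation. This is only a sketch: the symbols $c_c$, $e_n$, $\|\cdot\|$, $x_N$ and the weights $2^{-n}$ are notation chosen here for illustration.

```latex
% Assumption: \|\cdot\| is a complete norm on c_c whose topology is
% at least as strong as the topology of pointwise convergence.
\[
  x_N := \sum_{n=1}^{N} \frac{2^{-n}}{\|e_n\|}\, e_n, \qquad
  \sum_{n=1}^{\infty} \Bigl\| \frac{2^{-n}}{\|e_n\|}\, e_n \Bigr\|
  = \sum_{n=1}^{\infty} 2^{-n} = 1 < \infty .
\]
% Absolute convergence and completeness give x_N \to x in norm for some
% x \in c_c, and hence also pointwise:
\[
  x(n) = \lim_{N \to \infty} x_N(n) = \frac{2^{-n}}{\|e_n\|} \neq 0
  \quad \text{for every } n \geq 1 ,
\]
% so x has infinite support, contradicting x \in c_c.
```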
A basic problem in harmonic analysis (as well as in linear algebra, random matrix theory, and high-dimensional geometry) is to estimate the operator norm of a linear map between two Hilbert spaces, which we will take to be complex for sake of discussion. Even the finite-dimensional case is of interest, as this operator norm is the same as the largest singular value of the matrix associated to .
In general, this operator norm is hard to compute precisely, except in special cases. One such special case is that of a diagonal operator, such as that associated to an diagonal matrix . In this case, the operator norm is simply the supremum norm of the diagonal coefficients:
whenever and are sequences with ; but this easily follows from the arithmetic mean-geometric mean inequality
Schur’s test (4) (and its many generalisations to weighted situations, or to Lebesgue or Lorentz spaces) is particularly useful for controlling operators in which the role of oscillation (as reflected in the phase of the coefficients , as opposed to just their magnitudes ) is not decisive. However, it is of limited use in situations that involve a lot of cancellation. For this, a different test, known as the Cotlar-Stein lemma, is much more flexible and powerful. It can be viewed in a sense as a non-commutative variant of Schur’s test (4) (or of (1)), in which the scalar coefficients or are replaced by operators instead.
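As a rough numerical illustration, the sketch below compares an estimate of the operator norm with the bound from Schur's test in its common form, the geometric mean of the largest absolute row sum and the largest absolute column sum. It assumes real matrices stored as lists of rows; the helper names `matvec`, `operator_norm` and `schur_bound` are hypothetical, and the power iteration only gives an estimate from below.

```python
import math

def matvec(a, x):
    """Multiply the matrix a (a list of rows) by the vector x."""
    return [sum(entry * xj for entry, xj in zip(row, x)) for row in a]

def operator_norm(a, iterations=500):
    """Estimate the largest singular value of a real matrix.

    Runs power iteration on A^T A and returns |Ax| for the final unit
    vector x, so the estimate approaches the true norm from below.
    """
    a_t = [list(col) for col in zip(*a)]
    x = [1.0 + 0.1 * j for j in range(len(a[0]))]  # arbitrary non-zero start
    for _ in range(iterations):
        y = matvec(a_t, matvec(a, x))
        size = math.sqrt(sum(v * v for v in y))
        if size == 0.0:
            return 0.0
        x = [v / size for v in y]
    return math.sqrt(sum(v * v for v in matvec(a, x)))

def schur_bound(a):
    """Geometric mean of the largest absolute row sum and column sum."""
    max_row = max(sum(abs(v) for v in row) for row in a)
    max_col = max(sum(abs(v) for v in col) for col in zip(*a))
    return math.sqrt(max_row * max_col)

example = [[1.0, -2.0, 0.0],
           [0.5, 1.0, 3.0],
           [2.0, 0.0, -1.0]]
print(operator_norm(example), "<=", schur_bound(example))
```

For a diagonal matrix the two quantities coincide, which matches the remark that the test is most effective when cancellation between coefficients plays no role.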
To illustrate the basic flavour of the result, let us return to the bound (1), and now consider instead a block-diagonal matrix
Indeed, the lower bound is trivial (as can be seen by testing on vectors which are supported on the block of coordinates), while to establish the upper bound, one can make use of the orthogonal decomposition
to decompose an arbitrary vector as
with , in which case we have
and the upper bound in (6) then follows from a simple computation.
When is large, this is a significant improvement over the triangle inequality, which merely gives
The reason for this gain can ultimately be traced back to the “orthogonality” of the ; that they “occupy different columns” and “different rows” of the range and domain of . This is obvious when viewed in the matrix formalism, but can also be described in the more abstract Hilbert space operator formalism via the identities
whenever . (The first identity asserts that the ranges of the are orthogonal to each other, and the second asserts that the coranges of the (the ranges of the adjoints ) are orthogonal to each other.) By replacing (7) with a more abstract orthogonal decomposition into these ranges and coranges, one can in fact deduce (8) directly from (9) and (10).
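The gain over the triangle inequality can be checked numerically on a small block-diagonal matrix. This is a minimal sketch, assuming real square blocks stored as lists of rows; the blocks and the helper names (`matvec`, `operator_norm`, `block_diagonal`) are made up for illustration.

```python
import math

def matvec(a, x):
    """Multiply the matrix a (a list of rows) by the vector x."""
    return [sum(entry * xj for entry, xj in zip(row, x)) for row in a]

def operator_norm(a, iterations=500):
    """Estimate the largest singular value by power iteration on A^T A."""
    a_t = [list(col) for col in zip(*a)]
    x = [1.0 + 0.1 * j for j in range(len(a[0]))]  # arbitrary non-zero start
    for _ in range(iterations):
        y = matvec(a_t, matvec(a, x))
        size = math.sqrt(sum(v * v for v in y))
        if size == 0.0:
            return 0.0
        x = [v / size for v in y]
    return math.sqrt(sum(v * v for v in matvec(a, x)))

def block_diagonal(blocks):
    """Assemble square blocks into one block-diagonal matrix."""
    total = sum(len(b) for b in blocks)
    rows, offset = [], 0
    for b in blocks:
        for row in b:
            padding_right = total - offset - len(row)
            rows.append([0.0] * offset + list(row) + [0.0] * padding_right)
        offset += len(b)
    return rows

# Three illustrative blocks with operator norms 2, 3 and 2.
blocks = [
    [[2.0, 0.0], [0.0, 1.0]],
    [[0.0, 3.0], [0.0, 0.0]],
    [[1.0, 1.0], [1.0, 1.0]],
]
block_norms = [operator_norm(b) for b in blocks]
whole_norm = operator_norm(block_diagonal(blocks))

print("largest block norm:", max(block_norms))
print("norm of block-diagonal matrix:", whole_norm)
print("triangle inequality bound:", sum(block_norms))
```

Here the norm of the assembled matrix agrees with the largest block norm, while the triangle inequality only gives the sum of the block norms.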
The Cotlar-Stein lemma is an extension of this observation to the case where the are merely almost orthogonal rather than orthogonal, in a manner somewhat analogous to how Schur’s test (partially) extends (1) to the non-diagonal case. Specifically, we have
on each component of , which by the triangle inequality gives the inferior bound
The Cotlar-Stein lemma was first established by Cotlar in the special case of commuting self-adjoint operators, and then independently by Cotlar and Stein in full generality, with the proof appearing in a subsequent paper of Knapp and Stein.
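For reference, a standard formulation of the lemma for finitely many bounded operators between Hilbert spaces is sketched below. The notation ($T_1, \dots, T_n$ and the constant $M$) is an assumption made here and need not match the labelling of hypotheses used in this post.

```latex
\[
  \sup_{1 \le i \le n} \sum_{j=1}^{n} \| T_i T_j^* \|_{op}^{1/2} \le M
  \quad\text{and}\quad
  \sup_{1 \le i \le n} \sum_{j=1}^{n} \| T_i^* T_j \|_{op}^{1/2} \le M
  \quad\Longrightarrow\quad
  \Bigl\| \sum_{i=1}^{n} T_i \Bigr\|_{op} \le M .
\]
```

In the genuinely orthogonal case the products $T_i T_j^*$ and $T_i^* T_j$ vanish for $i \neq j$, and since $\|T_i T_i^*\|_{op}^{1/2} = \|T_i\|_{op}$, the hypotheses reduce to $\sup_i \|T_i\|_{op} \le M$.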
The Cotlar-Stein lemma is often useful in controlling operators such as singular integral operators or pseudo-differential operators which “do not mix scales together too much”, in that operators map functions “that oscillate at a given scale ” to functions that still mostly oscillate at the same scale . In that case, one can often split into components which essentially capture the scale behaviour, and understanding boundedness properties of then reduces to establishing the boundedness of the simpler operators (and of establishing a sufficient decay in products such as or when and are separated from each other). In some cases, one can use Fourier-analytic tools such as Littlewood-Paley projections to generate the , but the true power of the Cotlar-Stein lemma comes from situations in which the Fourier transform is not suitable, such as when one has a complicated domain (e.g. a manifold or a non-abelian Lie group), or very rough coefficients (which would then have badly behaved Fourier behaviour). One can then select the decomposition in a fashion that is tailored to the particular operator , and is not necessarily dictated by Fourier-analytic considerations.
Once one is in the almost orthogonal setting, as opposed to the genuinely orthogonal setting, the previous arguments based on orthogonal projection seem to fail completely. Instead, the proof of the Cotlar-Stein lemma proceeds via an elegant application of the tensor power trick (or perhaps more accurately, the power method), in which the operator norm of is understood through the operator norm of a large power of (or more precisely, of its self-adjoint square or ). Indeed, from an iteration of (14) we see that for any natural number , one has
Recall that when we applied the triangle inequality directly to , we lost a factor of in the final estimate; it will turn out that we will lose a similar factor here, but this factor will eventually be attenuated into nothingness by the tensor power trick.
On the other hand, we can group the product by pairs in another way, to obtain the bound of
for (16). Taking roots, we obtain
Sending , we obtain the claim.
Remark 1 As observed in a number of places (see e.g. page 318 of Stein’s book, or this paper of Comech), the Cotlar-Stein lemma can be extended to infinite sums (with the obvious changes to the hypotheses (11), (12)). Indeed, one can show that for any , the sum is unconditionally convergent in (and furthermore has bounded -variation), and the resulting operator is a bounded linear operator with an operator norm bound on .
Remark 2 If we specialise to the case where all the are equal, we see that the bound in the Cotlar-Stein lemma is sharp, at least in this case. Thus we see how the tensor power trick can convert an inefficient argument, such as that obtained using the triangle inequality or crude bounds such as (15), into an efficient one.
Remark 3 One can prove Schur’s test by a similar method. Indeed, starting from the inequality
(which follows easily from the singular value decomposition), we can bound by
Estimating the other two terms in the summand by , and then repeatedly summing the indices one at a time as before, we obtain
and the claim follows from the tensor power trick as before. On the other hand, in the converse direction, I do not know of any way to prove the Cotlar-Stein lemma that does not basically go through the tensor power argument.
Recall that a (real) topological vector space is a real vector space equipped with a topology that makes the vector space operations and continuous. One often restricts attention to Hausdorff topological vector spaces; in practice, this is not a severe restriction because it turns out that any topological vector space can be made Hausdorff by quotienting out the closure of the origin . One can also discuss complex topological vector spaces, and the theory is not significantly different; but for sake of exposition we shall restrict attention here to the real case.
An obvious example of a topological vector space is a finite-dimensional vector space such as with the usual topology. Of course, there are plenty of infinite-dimensional topological vector spaces also, such as infinite-dimensional normed vector spaces (with the strong, weak, or weak-* topologies) or Fréchet spaces.
One way to distinguish the finite and infinite dimensional topological vector spaces is via local compactness. Recall that a topological space is locally compact if every point in that space has a compact neighbourhood. From the Heine-Borel theorem, all finite-dimensional vector spaces (with the usual topology) are locally compact. In infinite dimensions, one can trivially make a vector space locally compact by giving it a trivial topology, but once one restricts to the Hausdorff case, it seems impossible to make a space locally compact. For instance, in an infinite-dimensional normed vector space with the strong topology, an iteration of the Riesz lemma shows that the closed unit ball in that space contains an infinite sequence with no convergent subsequence, which (by the Heine-Borel theorem) implies that cannot be locally compact. If one gives the weak-* topology instead, then is now compact by the Banach-Alaoglu theorem, but is no longer a neighbourhood of the identity in this topology. In fact, we have the following result:
Theorem 1 Every locally compact Hausdorff topological vector space is finite-dimensional.
The first proof of this theorem that I am aware of is by André Weil. There is also a related result:
Theorem 2 Every finite-dimensional Hausdorff topological vector space has the usual topology.
As a corollary, every locally compact Hausdorff topological vector space is in fact isomorphic to with the usual topology for some . This can be viewed as a very special case of the theorem of Gleason, which is a key component of the solution to Hilbert’s fifth problem, that a locally compact group with no small subgroups (in the sense that there is a neighbourhood of the identity that contains no non-trivial subgroups) is necessarily isomorphic to a Lie group. Indeed, Theorem 1 is in fact used in the proof of Gleason’s theorem (the rough idea being to first locate a “tangent space” to at the origin, with the tangent vectors described by “one-parameter subgroups” of , and show that this space is a locally compact Hausdorff topological space, and hence finite dimensional by Theorem 1).
Theorem 2 may seem devoid of content, but it does contain some subtleties, as it hinges crucially on the joint continuity of the vector space operations and , and not just on the separate continuity in each coordinate. Consider for instance the one-dimensional vector space with the co-compact topology (a non-empty set is open iff its complement is compact in the usual topology). In this topology, the space is (though not Hausdorff), the scalar multiplication map is jointly continuous as long as the scalar is not zero, and the addition map is continuous in each coordinate (i.e. translations are continuous), but not jointly continuous; for instance, the set does not contain a non-trivial Cartesian product of two sets that are open in the co-compact topology. So this is not a counterexample to Theorem 2. Similarly for the cocountable or cofinite topologies on (the latter topology, incidentally, is the same as the Zariski topology on ).
Another near-counterexample comes from the topology of inherited by pulling back the usual topology on the unit circle . Admittedly, this pullback topology is not quite Hausdorff, but the addition map is jointly continuous. On the other hand, the scalar multiplication map is not continuous at all. A slight variant of this topology comes from pulling back the usual topology on the torus under the map for some irrational ; this restores the Hausdorff property, and addition is still jointly continuous, but multiplication remains discontinuous.
As some final examples, consider with the discrete topology; here, the topology is Hausdorff, addition is jointly continuous, and every dilation is continuous, but multiplication is not jointly continuous. If one instead gives the half-open topology, then again the topology is Hausdorff and addition is jointly continuous, but scalar multiplication is only jointly continuous once one restricts the scalar to be non-negative.
Below the fold, I record the textbook proof of Theorem 2 and Theorem 1. There is nothing particularly original in this presentation, but I wanted to record it here for my own future reference, and perhaps these results will also be of interest to some other readers.
A few days ago, I found myself needing to use the Fredholm alternative in functional analysis:
Theorem 1 (Fredholm alternative) Let be a Banach space, let be a compact operator, and let be non-zero. Then exactly one of the following statements holds:
- (Eigenvalue) There is a non-trivial solution to the equation .
- (Bounded resolvent) The operator has a bounded inverse on .
Among other things, the Fredholm alternative can be used to establish the spectral theorem for compact operators. A hypothesis such as compactness is necessary; the shift operator on , for instance, has no eigenfunctions, but is not invertible for any unit complex number . The claim is also false when ; consider for instance the multiplication operator on , which is compact and has no eigenvalue at zero, but is not invertible.
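The dichotomy, and its failure at zero, can be seen concretely on a diagonal model. The sketch below assumes the compact operator is the diagonal operator sending the n-th basis vector e_n to e_n / n (one natural reading of the multiplication operator example), and works with finite truncations; the function name `resolvent_norm` is hypothetical.

```python
def resolvent_norm(lam, size):
    """Norm of (T - lam)^{-1} on the first `size` coordinates.

    T is the assumed diagonal operator T e_n = e_n / n.  The inverse of a
    diagonal operator has norm 1 / min |1/n - lam|.  Returns None when
    lam coincides (in floating point) with an eigenvalue 1/n, n <= size.
    """
    gaps = [abs(1.0 / n - lam) for n in range(1, size + 1)]
    smallest = min(gaps)
    if smallest == 0.0:
        return None
    return 1.0 / smallest

# lam = 1/2 is an eigenvalue: the first alternative holds.
print(resolvent_norm(0.5, 10))

# lam = 0.3 is not an eigenvalue: the inverse stays bounded as size grows.
print(resolvent_norm(0.3, 10), resolvent_norm(0.3, 1000))

# lam = 0: no eigenvalue, yet the inverse norm grows with the truncation.
print(resolvent_norm(0.0, 10), resolvent_norm(0.0, 1000))
```

For non-zero `lam` the gaps `|1/n - lam|` stay away from zero for large `n`, so only finitely many coordinates matter; at `lam = 0` they shrink to zero, which is why neither alternative holds there.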
It had been a while since I had studied the spectral theory of compact operators, and I found that I could not immediately reconstruct a proof of the Fredholm alternative from first principles. So I set myself the exercise of doing so. I thought that I had managed to establish the alternative in all cases, but as pointed out in comments, my argument is restricted to the case where the compact operator is approximable, which means that it is the limit of finite rank operators in the uniform topology. Many Banach spaces (and in particular, all Hilbert spaces) have the approximation property, which implies (by a result of Grothendieck) that all compact operators on that space are almost finite rank. For instance, if is a Hilbert space, then any compact operator is approximable, because any compact set can be approximated by a finite-dimensional subspace, and in a Hilbert space, the orthogonal projection operator to a subspace is always a contraction. (In more general Banach spaces, finite-dimensional subspaces are still complemented, but the operator norm of the projection can be large.) Unfortunately, there are examples of Banach spaces for which the approximation property fails; the first such examples were discovered by Enflo, and a subsequent paper by Alexander demonstrated the existence of compact operators in certain Banach spaces that are not approximable.
I also found out that this argument was essentially also discovered independently by MacCluer-Hull and by Uuye. Nevertheless, I am recording this argument here, together with two more traditional proofs of the Fredholm alternative (based on the Riesz lemma and a continuity argument respectively).
One of the most notorious open problems in functional analysis is the invariant subspace problem for Hilbert spaces, which I will state here as a conjecture:
Conjecture 1 (Invariant Subspace Problem, ISP0) Let be an infinite dimensional complex Hilbert space, and let be a bounded linear operator. Then contains a proper closed invariant subspace (thus ).
As stated this conjecture is quite infinitary in nature. Just for fun, I set myself the task of trying to find an equivalent reformulation of this conjecture that only involved finite-dimensional spaces and operators. This turned out to be somewhat difficult, but not entirely impossible, if one adopts a sufficiently generous version of “finitary” (cf. my discussion of how to finitise the infinitary pigeonhole principle). Unfortunately, the finitary formulation that I arrived at ended up being rather complicated (in particular, involving the concept of a “barrier”), and did not obviously suggest a path to resolving the conjecture; but it did at least provide some simpler finitary consequences of the conjecture which might be worth focusing on as subproblems.
I should point out that the arguments here are quite “soft” in nature and are not really addressing the heart of the invariant subspace problem; but I think it is still of interest to observe that this problem is not purely an infinitary problem, and does have some non-trivial finitary consequences.
I am indebted to Henry Towsner for many discussions on this topic.
A (complex, semi-definite) inner product space is a complex vector space equipped with a sesquilinear form which is conjugate symmetric, in the sense that for all , and non-negative in the sense that for all . By inspecting the non-negativity of for complex numbers , one obtains the Cauchy-Schwarz inequality
if one then defines , one then quickly concludes the triangle inequality
which then soon implies that is a semi-norm on . If we make the additional assumption that the inner product is positive definite, i.e. that whenever is non-zero, then this semi-norm becomes a norm. If is complete with respect to the metric induced by this norm, then is called a Hilbert space.
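These inequalities are easy to sanity-check numerically on the standard inner product on complex n-space. This is a minimal sketch; the convention that the form is linear in the first argument and conjugate-linear in the second is an assumption made here, and the function names are illustrative.

```python
import math
import random

def inner(x, y):
    """Standard inner product on C^n (conjugate-linear in y, by assumption)."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm(x):
    """Norm induced by the inner product."""
    return math.sqrt(inner(x, x).real)

def check_inequalities(trials=1000, dimension=5, seed=0):
    """Test conjugate symmetry, Cauchy-Schwarz and the triangle inequality."""
    rng = random.Random(seed)
    tolerance = 1e-9
    for _ in range(trials):
        x = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(dimension)]
        y = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(dimension)]
        total = [a + b for a, b in zip(x, y)]
        if abs(inner(x, y) - inner(y, x).conjugate()) > tolerance:
            return False
        if abs(inner(x, y)) > norm(x) * norm(y) + tolerance:
            return False
        if norm(total) > norm(x) + norm(y) + tolerance:
            return False
    return True

print(check_inequalities())
```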
The above material is extremely standard, and can be found in any graduate real analysis course; I myself covered it here. But what is perhaps less well known (except inside the fields of additive combinatorics and ergodic theory) is that the above theory of classical Hilbert spaces is just the first case of a hierarchy of higher order Hilbert spaces, in which the binary inner product is replaced with a -ary inner product that obeys an appropriate generalisation of the conjugate symmetry, sesquilinearity, and positive semi-definiteness axioms. Such inner products then obey a higher order Cauchy-Schwarz inequality, known as the Cauchy-Schwarz-Gowers inequality, and then also obey a triangle inequality and become semi-norms (or norms, if the inner product was non-degenerate). Examples of such norms and spaces include the Gowers uniformity norms , the Gowers box norms , and the Gowers-Host-Kra seminorms ; a more elementary example is the family of Lebesgue spaces when the exponent is a power of two. They play a central role in modern additive combinatorics and in certain aspects of ergodic theory, particularly those relating to Szemerédi’s theorem (or its ergodic counterpart, the Furstenberg multiple recurrence theorem); they also arise in the regularity theory of hypergraphs (which is not unrelated to the other two topics).
A simple example to keep in mind here is the order two Hilbert space on a measure space , where the inner product takes the form
In this brief note I would like to set out the abstract theory of such higher order Hilbert spaces. This is not new material, being already implicit in the breakthrough papers of Gowers and Host-Kra, but I just wanted to emphasise the fact that the material is abstract, and is not particularly tied to any explicit choice of norm so long as a certain axiom is satisfied. (Also, I wanted to write things down so that I would not have to reconstruct this formalism again in the future.) Unfortunately, the notation is quite heavy and the abstract axiom is a little strange; it may be that there is a better way to formulate things. In this particular case it does seem that a concrete approach is significantly clearer, but abstraction is at least possible.
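As a concrete instance of such a higher order structure, the sketch below computes the fourth power of the Gowers uniformity norm of order two on a cyclic group directly from its four-fold average, and compares it with the sum of fourth powers of the Fourier coefficients, a standard identity for this norm. The normalisations (averages over the group, Fourier coefficients defined as averages) and the function names are assumptions made for this illustration.

```python
import cmath

def u2_fourth_power(f):
    """Average of f(x) conj f(x+h) conj f(x+k) f(x+h+k) over x, h, k in Z/N."""
    n = len(f)
    total = 0j
    for x in range(n):
        for h in range(n):
            for k in range(n):
                total += (f[x]
                          * f[(x + h) % n].conjugate()
                          * f[(x + k) % n].conjugate()
                          * f[(x + h + k) % n])
    return (total / n ** 3).real

def fourier_fourth_power_sum(f):
    """Sum over frequencies of |hat f|^4, with hat f defined as an average."""
    n = len(f)
    coefficients = [
        sum(f[x] * cmath.exp(-2j * cmath.pi * x * xi / n) for x in range(n)) / n
        for xi in range(n)
    ]
    return sum(abs(c) ** 4 for c in coefficients)

signs = [complex(v) for v in (1, -1, 1, 1, -1, 1, 1, 1)]
print(u2_fourth_power(signs), fourier_fourth_power_sum(signs))

# A pure character has a single Fourier coefficient of size 1.
character = [cmath.exp(2j * cmath.pi * 3 * x / 8) for x in range(8)]
print(u2_fourth_power(character))
```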
Note: the discussion below is likely to be comprehensible only to readers who already have some exposure to the Gowers norms.
In harmonic analysis and PDE, one often wants to place a function on some domain (let’s take a Euclidean space for simplicity) in one or more function spaces in order to quantify its “size” in some sense. Examples include
- The Lebesgue spaces of functions whose norm is finite, as well as their relatives such as the weak spaces (and more generally the Lorentz spaces ) and Orlicz spaces such as and ;
- The classical regularity spaces , together with their Hölder continuous counterparts ;
- The Sobolev spaces of functions whose norm is finite (other equivalent definitions of this norm exist, and there are technicalities if is negative or ), as well as relatives such as homogeneous Sobolev spaces , Besov spaces , and Triebel-Lizorkin spaces . (The conventions for the superscripts and subscripts here are highly variable.)
- Hardy spaces , the space BMO of functions of bounded mean oscillation (and the subspace VMO of functions of vanishing mean oscillation);
- The Wiener algebra ;
- Morrey spaces ;
- The space of finite measures;
- etc., etc.
As the above partial list indicates, there is an entire zoo of function spaces one could consider, and it can be difficult at first to see how they are organised with respect to each other. However, one can get some clarity in this regard by drawing a type diagram for the function spaces one is trying to study. A type diagram assigns a tuple (usually a pair) of relevant exponents to each function space. For function spaces on Euclidean space, two such exponents are the regularity of the space, and the integrability of the space. These two quantities are somewhat fuzzy in nature (and are not easily defined for all possible function spaces), but can basically be described as follows. We test the function space norm of a modulated rescaled bump function
where is an amplitude, is a radius, is a test function, is a position, and is a frequency of some magnitude . One then studies how the norm depends on the parameters . Typically, one has a relationship of the form
for some exponents , at least in the high-frequency case when is large (in particular, from the uncertainty principle it is natural to require , and when dealing with inhomogeneous norms it is also natural to require ). The exponent measures how sensitive the norm is to oscillation, and thus controls regularity; if is large, then oscillating functions will have large norm, and thus functions in will tend not to oscillate too much and thus be smooth. Similarly, the exponent measures how sensitive the norm is to the function spreading out to large scales; if is small, then slowly decaying functions will have large norm, so that functions in tend to decay quickly; conversely, if is large, then singular functions will tend to have large norm, so that functions in will tend to not have high peaks.
Note that the exponent in (2) could be positive, zero, or negative; however, the exponent should be non-negative, since intuitively enlarging should always lead to a larger (or at least comparable) norm. Finally, the exponent in the parameter should always be , since norms are by definition homogeneous. Note also that the position plays no role in (1); this reflects the fact that most of the popular function spaces in analysis are translation-invariant.
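The scaling behaviour can be tested numerically in the simplest case. The sketch below is an illustration under several assumptions: it works in one dimension, takes the modulated bump to be an amplitude times a fixed bump rescaled by the radius, translated to a position and multiplied by a plane wave, and measures only a Lebesgue norm (so the regularity exponent is zero). All names are hypothetical.

```python
import cmath
import math

def bump(t):
    """A smooth test function supported in (-1, 1)."""
    return math.exp(-1.0 / (1.0 - t * t)) if abs(t) < 1.0 else 0.0

def modulated_bump(x, amplitude, radius, position, frequency):
    """Assumed form: A * phi((x - x0) / R) * exp(i * xi * x)."""
    envelope = bump((x - position) / radius)
    return amplitude * envelope * cmath.exp(1j * frequency * x)

def lp_norm(p, amplitude, radius, position, frequency, steps=4000):
    """Midpoint-rule approximation of the L^p norm over the support."""
    width = 2.0 * radius / steps
    total = 0.0
    for i in range(steps):
        x = position - radius + (i + 0.5) * width
        value = modulated_bump(x, amplitude, radius, position, frequency)
        total += abs(value) ** p * width
    return total ** (1.0 / p)

base = lp_norm(2, 1.0, 1.0, 0.0, 10.0)
print("radius 4 vs 1:", lp_norm(2, 1.0, 4.0, 0.0, 10.0) / base)
print("amplitude 3 vs 1:", lp_norm(2, 3.0, 1.0, 0.0, 10.0) / base)
print("moved and modulated:", lp_norm(2, 1.0, 1.0, 5.0, 200.0) / base)
```

With exponent 2 in one dimension, quadrupling the radius should double the norm, tripling the amplitude should triple it, and changing the position or frequency should leave it unchanged.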
The type diagram below plots the indices of various spaces. The black dots indicate those spaces for which the indices are fixed; the blue dots are those spaces for which at least one of the indices is variable (and so, depending on the value chosen for these parameters, these spaces may end up in a different location on the type diagram than the typical location indicated here).
(There are some minor cheats in this diagram, for instance for the Orlicz spaces and one has to adjust (1) by a logarithmic factor. Also, the norms for the Schwartz space are not translation-invariant and thus not perfectly describable by this formalism. This picture should be viewed as a visual aid only, and not as a genuinely rigorous mathematical statement.)
The type diagram can be used to clarify some of the relationships between function spaces, such as Sobolev embedding. For instance, when working with inhomogeneous spaces (which basically identifies low frequencies with medium frequencies , so that one is effectively always in the regime ), then decreasing the parameter results in decreasing the right-hand side of (1). Thus, one expects the function space norms to get smaller (and the function spaces to get larger) if one decreases while keeping fixed. Thus, for instance, should be contained in , and so forth. Note however that this inclusion is not available for homogeneous function spaces such as , in which the frequency parameter can be either much larger than or much smaller than .
Similarly, if one is working in a compact domain rather than in , then one has effectively capped the radius parameter to be bounded, and so we expect the function space norms to get smaller (and the function spaces to get larger) as one increases , thus for instance will be contained in . Conversely, if one is working in a discrete domain such as , then the radius parameter has now effectively been bounded from below, and the reverse should occur: the function spaces should get larger as one decreases . (If the domain is both compact and discrete, then it is finite, and on a finite-dimensional space all norms are equivalent.)
As mentioned earlier, the uncertainty principle suggests that one has the restriction . From this and (2), we expect to be able to enlarge the function space by trading in the regularity parameter for the integrability parameter , keeping the dimensional quantity fixed. This is indeed how Sobolev embedding works. Note in some cases one runs out of regularity before p goes all the way to infinity (thus ending up at an space), while in other cases p hits infinity first. In the latter case, one can embed the Sobolev space into a Hölder space such as .
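The exponent bookkeeping behind this trade can be sketched in a few lines. The function below is a heuristic calculator only, assuming a Sobolev space with regularity `s` and integrability `p` in dimension `d`, and the usual rule that `s - d/p` is preserved; the name `sobolev_target` and the labels it returns are hypothetical.

```python
from fractions import Fraction

def sobolev_target(s, p, d):
    """Spend all of the regularity s and report where one lands.

    Solves 1/q = 1/p - s/d.  If 1/q > 0, regularity runs out first and
    one ends at a Lebesgue space with exponent q.  If 1/q < 0, the
    integrability exponent reaches infinity first and the leftover
    regularity s - d/p is reported as a Holder-type exponent.  The
    borderline case 1/q = 0 needs separate treatment and is only flagged.
    """
    s, p, d = Fraction(s), Fraction(p), Fraction(d)
    inverse_q = 1 / p - s / d
    if inverse_q > 0:
        return ("lebesgue", 1 / inverse_q)
    if inverse_q == 0:
        return ("endpoint", None)
    return ("holder", s - d / p)

print(sobolev_target(1, 2, 3))  # one derivative in L^2 on R^3
print(sobolev_target(1, 4, 3))  # one derivative in L^4 on R^3
print(sobolev_target(1, 3, 3))  # borderline case
```

The first call recovers the familiar exponent 6 in three dimensions, and the second gives a Hölder exponent of 1/4.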
On continuous domains, one can send the frequency off to infinity, keeping the amplitude and radius fixed. From this and (1) we see that norms with a lower regularity can never hope to control norms with a higher regularity , no matter what one does with the integrability parameter. Note however that in discrete settings this obstruction disappears; when working on, say, , then in fact one can gain as much regularity as one wishes for free, and there is no distinction between a Lebesgue space and its Sobolev counterparts in such a setting.
When interpolating between two spaces (using either the real or complex interpolation method), the interpolated space usually has regularity and integrability exponents on the line segment between the corresponding exponents of the endpoint spaces. (This can be heuristically justified from the formula (2) by thinking about how the real or complex interpolation methods actually work.) Typically, one can control the norm of the interpolated space by the geometric mean of the endpoint norms that is indicated by this line segment; again, this is plausible from looking at (2).
The space is self-dual. More generally, the dual of a function space will generally have type exponents that are the reflection of the original exponents around the origin. Consider for instance the dual spaces or in the above diagram.
Spaces whose integrability exponent is larger than 1 (i.e. which lie to the left of the dotted line) tend to be Banach spaces, while spaces whose integrability exponent is less than 1 are almost never Banach spaces. (This can be justified by covering a large ball by small balls and considering how (1) would interact with the triangle inequality in this case.) The case is borderline; some spaces at this level of integrability, such as , are Banach spaces, while other spaces, such as , are not.
While the regularity and integrability are usually the most important exponents in a function space (because amplitude, width, and frequency are usually the most important features of a function in analysis), they do not tell the entire story. One major reason for this is that the modulated bump functions (1), while an important class of test examples of functions, are by no means the only functions that one would wish to study. For instance, one could also consider sums of bump functions (1) at different scales. The behaviour of the function space norms on such sums is often controlled by secondary exponents, such as the second exponent that arises in Lorentz spaces, Besov spaces, or Triebel-Lizorkin spaces. For instance, consider the function
$$ f_M(x) := \sum_{m=1}^M 2^{-md} \phi(x/2^m) \qquad (3) $$
where $\phi$ is a fixed bump function and $M$ is a large integer, representing the number of distinct scales present in $f_M$. Any function space with regularity $s=0$ and $p=1$ should assign each summand $2^{-md} \phi(x/2^m)$ in (3) a norm of $O(1)$, so the norm of $f_M$ could be as large as $O(M)$ if one assumes the triangle inequality. This is indeed the case for the $L^1$ norm, but for the weak $L^1$ norm, i.e. the $L^{1,\infty}$ norm, $f_M$ only has size $O(1)$. More generally, for the Lorentz spaces $L^{1,q}$, $f_M$ will have a norm of about $O(M^{1/q})$. Thus we see that such secondary exponents can influence the norm of a function by an amount which is polynomial in the number of scales. In many applications, though, the number of scales is a “logarithmic” quantity and thus of lower order interest when compared against the “polynomial” exponents such as $s$ and $p$. So the fine distinctions between, say, strong $L^1$ and weak $L^1$, are only of interest in “critical” situations in which one cannot afford to lose any logarithmic factors (this is for instance the case in much of Calderon-Zygmund theory).
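The gap between the strong and weak norms on a multi-scale sum can be computed exactly in a toy model. The sketch below makes several assumptions that are not taken from the text: dimension one, the indicator of $[0,1]$ in place of a smooth bump, and the sum $f_M(x) = \sum_{m=1}^M 2^{-m} 1_{[0,2^m]}(x)$ of $M$ bumps, each of $L^1$ norm 1. All function names are hypothetical.

```python
from fractions import Fraction

def step_pieces(M):
    """Describe f_M(x) = sum_{m=1}^M 2^{-m} 1_{[0,2^m]}(x) as (value, length) pieces.

    On (2^(k-1), 2^k] (and on [0, 2] for k = 1) only the summands with
    m >= k are nonzero, so f_M is a decreasing step function.
    """
    pieces = []
    left = 0
    for k in range(1, M + 1):
        right = 2 ** k
        value = sum(Fraction(1, 2 ** m) for m in range(k, M + 1))
        pieces.append((value, right - left))
        left = right
    return pieces

def l1_norm(pieces):
    """Integral of the (nonnegative) step function."""
    return sum(value * length for value, length in pieces)

def weak_l1_norm(pieces):
    """sup over t of t * |{f >= t}|, attained at the step values since f decreases."""
    best = Fraction(0)
    measure = 0
    for value, length in pieces:  # values are listed in decreasing order
        measure += length
        best = max(best, value * measure)
    return best

for M in (4, 16, 64):
    pieces = step_pieces(M)
    print(M, l1_norm(pieces), float(weak_l1_norm(pieces)))
```

In this model the $L^1$ norm is exactly $M$, while the weak $L^1$ quantity is $2 - 2^{1-M}$ and so stays below 2 no matter how many scales are present.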
We have cheated somewhat by only working in the high frequency regime. When dealing with inhomogeneous spaces, one often has a different set of exponents for (1) in the low-frequency regime than in the high-frequency regime. In such cases, one sometimes has to use a more complicated type diagram to genuinely model the situation, e.g. by assigning to each space a convex set of type exponents rather than a single exponent, or perhaps having two separate type diagrams, one for the high frequency regime and one for the low frequency regime. Such diagrams can get quite complicated, and will probably not be much use to a beginner in the subject, though in the hands of an expert who knows what he or she is doing, they can still be an effective visual aid.
This is a technical post inspired by separate conversations with Jim Colliander and with Soonsik Kwon on the relationship between two techniques used to control non-radiating solutions to dispersive nonlinear equations, namely the “double Duhamel trick” and the “in/out decomposition”. See for instance these lecture notes of Killip and Visan for a survey of these two techniques and other related methods in the subject. (I should caution that this post is likely to be unintelligible to anyone not already working in this area.)
For the sake of discussion we shall focus on solutions to a nonlinear Schrödinger equation
$$ i u_t + \Delta u = F(u) $$
and we will not concern ourselves with the specific regularity of the solution $u$, or the specific properties of the nonlinearity $F$ here. We will also not address the issue of how to justify the formal computations being performed here.
Solutions to this equation enjoy the forward Duhamel formula
$$ u(t) = e^{i(t-t_0)\Delta} u(t_0) - i \int_{t_0}^t e^{i(t-t')\Delta} F(u(t'))\, dt' $$
for times $t$ to the future of $t_0$ in the lifespan of the solution, as well as the backward Duhamel formula
$$ u(t) = e^{i(t-t_1)\Delta} u(t_1) + i \int_t^{t_1} e^{i(t-t')\Delta} F(u(t'))\, dt' $$
for all times $t$ to the past of $t_1$ in the lifespan of the solution. The first formula asserts that the solution at a given time is determined by the initial state and by the immediate past, while the second formula is the time reversal of the first, asserting that the solution at a given time is determined by the final state and the immediate future. These basic causal formulae are the foundation of the local theory of these equations, and in particular play an instrumental role in establishing local well-posedness for these equations. In this local theory, the main philosophy is to treat the homogeneous (or linear) term $e^{i(t-t_0)\Delta} u(t_0)$ or $e^{i(t-t_1)\Delta} u(t_1)$ as the main term, and the inhomogeneous (or nonlinear, or forcing) integral term as an error term.
The situation is reversed when one turns to the global theory, and looks at the asymptotic behaviour of a solution as one approaches a limiting time $T$ (which can be infinite if one has global existence, or finite if one has finite time blowup). After a suitable rescaling, the linear portion of the solution often disappears from view, leaving one with an asymptotic blowup profile solution which is non-radiating in the sense that the linear components of the Duhamel formulae vanish, thus
$$ u(t) = - i \int_{T_-}^t e^{i(t-t')\Delta} F(u(t'))\, dt' \qquad (1) $$
and
$$ u(t) = i \int_t^{T_+} e^{i(t-t')\Delta} F(u(t'))\, dt' \qquad (2) $$
where $-\infty \leq T_- < T_+ \leq +\infty$ are the endpoint times of existence. (This type of situation comes up for instance in the Kenig-Merle approach to critical regularity problems, by reducing to a minimal blowup solution which is almost periodic modulo symmetries, and hence non-radiating.) These types of non-radiating solutions are propelled solely by their own nonlinear self-interactions from the immediate past or immediate future; they are generalisations of “nonlinear bound states” such as solitons.
A key task is then to somehow combine the forward representation (1) and the backward representation (2) to obtain new information on itself, that cannot be obtained from either representation alone; it seems that the immediate past and immediate future can collectively exert more control on the present than they each do separately. This type of problem can be abstracted as follows. Let be the infimal value of over all forward representations of of the form
Typically, one already has (or is willing to assume as a bootstrap hypothesis) control on in the norm , which gives control of in the norms . The task is then to use the control of both the and norm of to gain control of in a more conventional Hilbert space norm , which is typically a Sobolev space such as or .
for all reasonable ; note that setting and applying the arithmetic-geometric inequality then gives (5). The point is that if has a forward representation (3) and has a backward representation (4), then the inner product can (formally, at least) be expanded as a double integral
The dispersive nature of the linear Schrödinger equation often causes to decay, especially in high dimensions. In high enough dimension (typically one needs five or higher dimensions, unless one already has some spacetime control on the solution), the decay is stronger than , so that the integrand becomes absolutely integrable and one recovers (6).
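The passage from (6) to (5) by the arithmetic mean-geometric mean inequality can be written out in one line. The displayed forms of (5) and (6) are not reproduced in this text, so the following is only a sketch under assumed notation: $H$ denotes the Hilbert space, $X_+$ and $X_-$ are hypothetical names for the norms defined through forward and backward representations, (6) is taken to be the bilinear bound and (5) the additive bound recorded in the comments.

```latex
% Assumed forms (the notation X_+, X_- is hypothetical):
%   (6)  |<u,v>_H|  \lesssim  \|u\|_{X_+} \|v\|_{X_-}
%   (5)  \|u\|_H    \lesssim  \|u\|_{X_+} + \|u\|_{X_-}
\begin{align*}
\|u\|_H^2 = \langle u, u \rangle_H
  &\lesssim \|u\|_{X_+} \, \|u\|_{X_-}
  && \text{(take $v = u$ in (6))} \\
  &\leq \Big( \frac{\|u\|_{X_+} + \|u\|_{X_-}}{2} \Big)^2
  && \text{(since $ab \leq (\tfrac{a+b}{2})^2$ for $a, b \geq 0$)}
\end{align*}
% Taking square roots gives \|u\|_H \lesssim \|u\|_{X_+} + \|u\|_{X_-}, which is (5).
```

Under these assumptions the bilinear estimate is strictly stronger than what is needed, which is consistent with the estimate becoming unavailable in some settings while (5) may still hold.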
Unfortunately it appears that estimates of the form (6) fail in low dimensions (for the type of norms that actually show up in applications); there is just too much interaction between past and future to hope for any reasonable control of this inner product. But one can try to obtain (5) by other means. By the Hahn-Banach theorem (and ignoring various issues related to reflexivity), (5) is equivalent to the assertion that every can be decomposed as , where and . Indeed once one has such a decomposition, one obtains (5) by computing the inner product of with in in two different ways. One can also (morally at least) write as and similarly write as
So one can dualise the task of proving (5) as that of obtaining a decomposition of an arbitrary initial state into two components, where the former disperses into the past and the latter disperses into the future under the linear evolution. We do not know how to achieve this type of task efficiently in general – and doing so would likely lead to a significant advance in the subject (perhaps one of the main areas in this topic where serious harmonic analysis is likely to play a major role). But in the model case of spherically symmetric data, one can perform such a decomposition quite easily: one uses microlocal projections to set the former component to be the “inward” pointing component of the data, which propagates towards the origin in the future and away from the origin in the past, and similarly sets the latter to be the “outward” component. As spherical symmetry significantly dilutes the amplitude of the solution (and hence the strength of the nonlinearity) away from the origin, this decomposition tends to work quite well for applications, and is one of the main reasons (though not the only one) why we have a global theory for low-dimensional nonlinear Schrödinger equations in the radial case, but not in general.
The in/out decomposition is a linear one, but the Hahn-Banach argument gives no reason why the decomposition needs to be linear. (Note that other well-known decompositions in analysis, such as the Fefferman-Stein decomposition of BMO, are necessarily nonlinear, a fact which is ultimately equivalent to the non-complemented nature of a certain subspace of a Banach space; see these lecture notes of mine and this old blog post for some discussion.) So one could imagine a sophisticated nonlinear decomposition as a general substitute for the in/out decomposition. See for instance this paper of Bourgain and Brezis for some of the subtleties of decomposition even in very classical function spaces such as . Alternatively, there may well be a third way to obtain estimates of the form (5) that do not require either decomposition or the double Duhamel trick; such a method may well clarify the relative relationship between past, present, and future for critical nonlinear dispersive equations, which seems to be a key aspect of the theory that is still only partially understood. (In particular, it seems that one needs a fairly strong decoupling of the present from both the past and the future to get the sort of elliptic-like regularity results that allow us to make further progress with such equations.)
In set theory, a function $f: X \to Y$ is defined as an object that evaluates every input $x \in X$ to exactly one output $f(x) \in Y$. However, in various branches of mathematics, it has become convenient to generalise this classical concept of a function to a more abstract one. For instance, in operator algebras, quantum mechanics, or non-commutative geometry, one often replaces commutative algebras of (real or complex-valued) functions on some space $X$, such as $C(X)$ or $L^\infty(X)$, with a more general – and possibly non-commutative – algebra (e.g. a $C^*$-algebra or a von Neumann algebra). Elements in this more abstract algebra are no longer definable as functions in the classical sense of assigning a single value $f(x)$ to every point $x \in X$, but one can still define other operations on these “generalised functions” (e.g. one can multiply or take inner products between two such objects).
Generalisations of functions are also very useful in analysis. In our study of $L^p$ spaces, we have already seen one such generalisation, namely the concept of a function defined up to almost everywhere equivalence. Such a function $f$ (or more precisely, an equivalence class of classical functions) cannot be evaluated at any given point $x$, if that point has measure zero. However, it is still possible to perform algebraic operations on such functions (e.g. multiplying or adding two functions together), and one can also integrate such functions on measurable sets (provided, of course, that the function has some suitable integrability condition). We also know that the $L^p$ spaces can usually be described via duality, as the dual space of $L^{p'}$ (except in some endpoint cases, namely when $p=1$, or when $p=\infty$ and the underlying space is not $\sigma$-finite).
We have also seen (via the Lebesgue-Radon-Nikodym theorem) that locally integrable functions $f$ on, say, the real line $\mathbf{R}$, can be identified with locally finite absolutely continuous measures on the line, by multiplying Lebesgue measure by the function $f$. So another way to generalise the concept of a function is to consider arbitrary locally finite Radon measures $\mu$ (not necessarily absolutely continuous), such as the Dirac measure $\delta_0$. With this concept of “generalised function”, one can still add and subtract two measures $\mu, \nu$, and integrate any measure $\mu$ against a (bounded) measurable set $E$ to obtain a number $\mu(E)$, but one cannot evaluate a measure $\mu$ (or more precisely, the Radon-Nikodym derivative of that measure) at a single point $x$, and one also cannot multiply two measures together to obtain another measure. From the Riesz representation theorem, we also know that the space of (finite) Radon measures can be described via duality, as linear functionals on $C_c(\mathbf{R})$.
There is an even larger class of generalised functions that is very useful, particularly in linear PDE, namely the space of distributions, say on a Euclidean space $\mathbf{R}^d$. In contrast to Radon measures $\mu$, which can be defined by how they “pair up” against continuous, compactly supported test functions $f \in C_c(\mathbf{R}^d)$ to create numbers $\langle f, \mu \rangle := \int_{\mathbf{R}^d} f\, d\mu$, a distribution $\lambda$ is defined by how it pairs up against a smooth compactly supported function $f \in C^\infty_c(\mathbf{R}^d)$ to create a number $\langle f, \lambda \rangle$. As the space $C^\infty_c(\mathbf{R}^d)$ of smooth compactly supported functions is smaller than (but dense in) the space $C_c(\mathbf{R}^d)$ of continuous compactly supported functions (and has a stronger topology), the space of distributions is larger than that of measures. But the space $C^\infty_c(\mathbf{R}^d)$ is closed under more operations than $C_c(\mathbf{R}^d)$, and in particular is closed under differential operators (with smooth coefficients). Because of this, the space of distributions is similarly closed under such operations; in particular, one can differentiate a distribution and get another distribution, which is something that is not always possible with measures or $L^p$ functions. But as measures or functions can be interpreted as distributions, this leads to the notion of a weak derivative for such objects, which makes sense (but only as a distribution) even for functions that are not classically differentiable. Thus the theory of distributions can allow one to rigorously manipulate rough functions “as if” they were smooth, although one must still be careful as some operations on distributions are not well-defined, most notably the operation of multiplying two distributions together. Nevertheless one can use this theory to justify many formal computations involving derivatives, integrals, etc. (including several computations used routinely in physics) that would be difficult to formalise rigorously in a purely classical framework.
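The weak derivative can be made concrete with the Heaviside step function $H = 1_{[0,\infty)}$ on the real line, whose distributional derivative is defined by moving the derivative onto the test function: $\langle \phi, H' \rangle := -\langle \phi', H \rangle = -\int_0^\infty \phi'(x)\, dx = \phi(0)$, so that $H'$ acts as the Dirac mass at the origin. The sketch below checks this numerically for one particular test function; the choice of bump, the trapezoid rule, and the function names are all illustrative assumptions.

```python
import math

def bump(x):
    """Smooth test function supported in [-1, 1]."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0

def bump_prime(x):
    """Classical derivative of bump (chain rule on exp(-1/(1-x^2)))."""
    if abs(x) >= 1:
        return 0.0
    return bump(x) * (-2.0 * x) / (1.0 - x * x) ** 2

def pair_with_heaviside_derivative(n=20000):
    """Approximate -int_0^1 bump'(x) dx by the trapezoid rule with n panels."""
    h = 1.0 / n
    total = 0.5 * (bump_prime(0.0) + bump_prime(1.0))
    total += sum(bump_prime(k * h) for k in range(1, n))
    return -h * total

# The pairing should reproduce the value of the test function at the origin.
print(pair_with_heaviside_derivative(), bump(0.0))
```

Nothing here requires $H$ to be differentiable at the origin; all the differentiation is carried by the smooth test function, which is the point of the definition.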
If one shrinks the space of distributions slightly, to the space of tempered distributions (which is formed by enlarging the dual class $C^\infty_c(\mathbf{R}^d)$ to the Schwartz class $\mathcal{S}(\mathbf{R}^d)$), then one obtains closure under another important operation, namely the Fourier transform. This allows one to define various Fourier-analytic operations (e.g. pseudodifferential operators) on such distributions.
Of course, at the end of the day, one is usually not all that interested in distributions in their own right, but would like to be able to use them as a tool to study more classical objects, such as smooth functions. Fortunately, one can recover facts about smooth functions from facts about the (far rougher) space of distributions in a number of ways. For instance, if one convolves a distribution with a smooth, compactly supported function, one gets back a smooth function. This is a particularly useful fact in the theory of constant-coefficient linear partial differential equations such as $Lu = f$, as it allows one to recover a smooth solution $u$ from smooth, compactly supported data $f$ by convolving $f$ with a specific distribution $K$, known as the fundamental solution of $L$. We will give some examples of this later in these notes.
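The mechanism can be summarised in one formal computation. The symbols below are assumptions made for illustration: $L$ is a constant-coefficient differential operator, $f$ is smooth and compactly supported, and $K$ is a distribution with $LK = \delta$, the Dirac mass at the origin.

```latex
% Assumed notation: L constant-coefficient, f smooth with compact support, LK = \delta.
\begin{align*}
u := f * K
\quad \Longrightarrow \quad
Lu = L(f * K) = f * (LK) = f * \delta = f .
\end{align*}
% Standard example: L = \Delta on R^3 has fundamental solution
% K(x) = -1/(4 \pi |x|), giving the Newtonian potential
\begin{align*}
u(x) = -\frac{1}{4\pi} \int_{\mathbf{R}^3} \frac{f(y)}{|x-y|}\, dy,
\qquad \Delta u = f .
\end{align*}
```

The middle step uses that a constant-coefficient operator commutes with translations and hence with convolution, and $u$ is smooth because it is the convolution of a distribution with a smooth, compactly supported function.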
It is this unusual and useful combination of both being able to pass from classical functions to generalised functions (e.g. by differentiation) and then back from generalised functions to classical functions (e.g. by convolution) that sets the theory of distributions apart from other competing theories of generalised functions, in particular allowing one to justify many formal calculations in PDE and Fourier analysis rigorously with relatively little additional effort. On the other hand, being defined by linear duality, the theory of distributions becomes somewhat less useful when one moves to more nonlinear problems, such as nonlinear PDE. However, distributions still serve an important supporting role in such problems as an “ambient space” of functions, inside of which one carves out more useful function spaces, such as Sobolev spaces, which we will discuss in the next set of notes.