In set theory, a function $f: X \to Y$ is defined as an object that *evaluates* every input $x$ to exactly one output $f(x)$. However, in various branches of mathematics, it has become convenient to generalise this classical concept of a function to a more abstract one. For instance, in operator algebras, quantum mechanics, or non-commutative geometry, one often replaces commutative algebras of (real or complex-valued) functions on some space $X$, such as $C(X)$ or $L^\infty(X)$, with a more general – and possibly non-commutative – algebra (e.g. a $C^*$-algebra or a von Neumann algebra). Elements in this more abstract algebra are no longer definable as functions in the classical sense of assigning a single value $f(x)$ to every point $x \in X$, but one can still define other operations on these “generalised functions” (e.g. one can multiply or take inner products between two such objects).

Generalisations of functions are also very useful in analysis. In our study of $L^p$ spaces, we have already seen one such generalisation, namely the concept of a function defined up to almost everywhere equivalence. Such a function $f$ (or more precisely, an equivalence class of classical functions) cannot be evaluated at any given point $x$, if that point has measure zero. However, it is still possible to perform algebraic operations on such functions (e.g. multiplying or adding two functions together), and one can also integrate such functions on measurable sets (provided, of course, that the function has some suitable integrability condition). We also know that the $L^p$ spaces can usually be described via duality, as the dual space of $L^{p'}$ (except in some endpoint cases, namely when $p=\infty$, or when $p=1$ and the underlying space is not $\sigma$-finite).

We have also seen (via the Lebesgue-Radon-Nikodym theorem) that locally integrable functions $f \in L^1_{\mathrm{loc}}(\mathbf{R})$ on, say, the real line $\mathbf{R}$, can be identified with locally finite absolutely continuous measures $m_f$ on the line, by multiplying Lebesgue measure $m$ by the function $f$. So another way to generalise the concept of a function is to consider arbitrary locally finite Radon measures $\mu$ (not necessarily absolutely continuous), such as the Dirac measure $\delta_0$. With this concept of “generalised function”, one can still add and subtract two measures $\mu, \nu$, and integrate any measure $\mu$ against a (bounded) measurable set $E$ to obtain a number $\mu(E)$, but one cannot evaluate a measure $\mu$ (or more precisely, the Radon-Nikodym derivative $d\mu/dm$ of that measure) at a single point $x$, and one also cannot multiply two measures together to obtain another measure. From the Riesz representation theorem, we also know that the space of (finite) Radon measures can be described via duality, as linear functionals on $C_c(\mathbf{R})$.

There is an even larger class of generalised functions that is very useful, particularly in linear PDE, namely the space of distributions, say on a Euclidean space $\mathbf{R}^d$. In contrast to Radon measures $\mu$, which can be defined by how they “pair up” against continuous, compactly supported test functions $f \in C_c(\mathbf{R}^d)$ to create numbers $\langle f, \mu \rangle := \int_{\mathbf{R}^d} f\, d\overline{\mu}$, a distribution $\lambda$ is defined by how it pairs up against a *smooth* compactly supported function $f \in C^\infty_c(\mathbf{R}^d)$ to create a number $\langle f, \lambda \rangle$. As the space $C^\infty_c(\mathbf{R}^d)$ of smooth compactly supported functions is smaller than (but dense in) the space $C_c(\mathbf{R}^d)$ of continuous compactly supported functions (and has a stronger topology), the space of distributions is larger than that of measures. But the space $C^\infty_c(\mathbf{R}^d)$ is closed under more operations than $C_c(\mathbf{R}^d)$, and in particular is closed under differential operators (with smooth coefficients). Because of this, the space of distributions is similarly closed under such operations; in particular, one can differentiate a distribution and get another distribution, which is something that is not always possible with measures or $L^p$ functions. But as measures or functions can be interpreted as distributions, this leads to the notion of a weak derivative for such objects, which makes sense (but only as a distribution) even for functions that are not classically differentiable. Thus the theory of distributions can allow one to rigorously manipulate rough functions “as if” they were smooth, although one must still be careful as some operations on distributions are not well-defined, most notably the operation of multiplying two distributions together. Nevertheless one can use this theory to justify many formal computations involving derivatives, integrals, etc. (including several computations used routinely in physics) that would be difficult to formalise rigorously in a purely classical framework.

If one shrinks the space of distributions slightly, to the space of *tempered distributions* (which is formed by enlarging the dual class $C^\infty_c(\mathbf{R}^d)$ to the Schwartz class $\mathcal{S}(\mathbf{R}^d)$), then one obtains closure under another important operation, namely the Fourier transform. This allows one to define various Fourier-analytic operations (e.g. pseudodifferential operators) on such distributions.

Of course, at the end of the day, one is usually not all that interested in distributions in their own right, but would like to be able to use them as a tool to study more classical objects, such as smooth functions. Fortunately, one can recover facts about smooth functions from facts about the (far rougher) space of distributions in a number of ways. For instance, if one convolves a distribution with a smooth, compactly supported function, one gets back a smooth function. This is a particularly useful fact in the theory of constant-coefficient linear partial differential equations such as $Lu = f$, as it allows one to recover a smooth solution $u$ from smooth, compactly supported data $f$ by convolving $f$ with a specific distribution $G$, known as the fundamental solution of $L$. We will give some examples of this later in these notes.

It is this unusual and useful combination of both being able to pass from classical functions to generalised functions (e.g. by differentiation) and then back from generalised functions to classical functions (e.g. by convolution) that sets the theory of distributions apart from other competing theories of generalised functions, in particular allowing one to justify many formal calculations in PDE and Fourier analysis rigorously with relatively little additional effort. On the other hand, being defined by linear duality, the theory of distributions becomes somewhat less useful when one moves to more nonlinear problems, such as nonlinear PDE. However, distributions still serve an important supporting role in such problems as an “ambient space” of functions, inside of which one carves out more useful function spaces, such as Sobolev spaces, which we will discuss in the next set of notes.

** — 1. Smooth functions with compact support — **

In the rest of the notes we will work on a fixed Euclidean space $\mathbf{R}^d$. (One can also define distributions on other domains related to $\mathbf{R}^d$, such as open subsets of $\mathbf{R}^d$, or $d$-dimensional manifolds, but for simplicity we shall restrict attention to Euclidean spaces in these notes.)

A test function is any smooth, compactly supported function $f: \mathbf{R}^d \to \mathbf{C}$; the space of such functions is denoted $C^\infty_c(\mathbf{R}^d)$. (In some texts, this space is denoted $C^\infty_0(\mathbf{R}^d)$ instead.)

From analytic continuation one sees that there are no real-analytic test functions other than the zero function. Despite this negative result, test functions actually exist in abundance:

Exercise 1

- (i) Show that there exists at least one test function that is not identically zero. (Hint: it suffices to do this for $d=1$. One starting point is to use the fact that the function $f: \mathbf{R} \to \mathbf{R}$ defined by $f(x) := e^{-1/x}$ for $x > 0$ and $f(x) := 0$ otherwise is smooth, even at the origin $0$.)
- (ii) Show that if $f \in C^\infty_c(\mathbf{R}^d)$ and $g: \mathbf{R}^d \to \mathbf{R}$ is absolutely integrable and compactly supported, then the convolution $f*g$ is also in $C^\infty_c(\mathbf{R}^d)$. (Hint: first show that $f*g$ is continuously differentiable with $\nabla(f*g) = (\nabla f)*g$.)
- (iii) ($C^\infty$ Urysohn lemma) Let $K$ be a compact subset of $\mathbf{R}^d$, and let $U$ be an open neighbourhood of $K$. Show that there exists a function $f \in C^\infty_c(\mathbf{R}^d)$ supported in $U$ which equals $1$ on $K$. (Hint: use the ordinary Urysohn lemma to find a function in $C_c(\mathbf{R}^d)$ that equals $1$ on a neighbourhood of $K$ and is supported in a compact subset of $U$, then convolve this function by a suitable test function.)
- (iv) Show that $C^\infty_c(\mathbf{R}^d)$ is dense in $C_0(\mathbf{R}^d)$ (in the uniform topology), and dense in $L^p(\mathbf{R}^d)$ (with the $L^p$ topology) for all $0 < p < \infty$.
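As a numerical illustration of part (i), the following Python sketch builds a candidate one-dimensional test function from the function in the hint. The names `smooth_step` and `bump`, and the product formula used to cut off both sides, are choices made here for illustration and are not taken from the notes; the code only evaluates the function and does not prove smoothness.

```python
import math

def smooth_step(x: float) -> float:
    """The function from the hint: exp(-1/x) for x > 0, and 0 otherwise."""
    return math.exp(-1.0 / x) if x > 0 else 0.0

def bump(x: float) -> float:
    """Illustrative candidate test function: positive on (-1, 1), zero elsewhere."""
    return smooth_step(1.0 - x) * smooth_step(1.0 + x)

if __name__ == "__main__":
    for x in (-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5):
        print(f"bump({x:+.1f}) = {bump(x):.6f}")
```

The product of two one-sided cutoffs is one of several possible constructions; rescaling and translating `bump` gives non-trivial test functions supported in any prescribed interval.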

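The space $C^\infty_c(\mathbf{R}^d)$ is clearly a vector space. Now we place a (very strong!) topology on it. We first observe that $C^\infty_c(\mathbf{R}^d) = \bigcup_K C^\infty_c(K)$, where $K$ ranges over all compact subsets of $\mathbf{R}^d$ and $C^\infty_c(K)$ consists of those functions $f \in C^\infty_c(\mathbf{R}^d)$ which are supported in $K$. Each $C^\infty_c(K)$ will be given a topology (called the *smooth topology*) generated by the norms

$$ \|f\|_{C^k} := \sup_{x \in \mathbf{R}^d} \sum_{j=0}^k |\nabla^j f(x)| $$

for $k = 0, 1, \ldots$, where we view $\nabla^j f(x)$ as a $d^j$-dimensional vector (or, if one wishes, a $d$-dimensional rank $j$ tensor); thus a sequence $f_n \in C^\infty_c(K)$ converges to a limit $f \in C^\infty_c(K)$ if and only if $\nabla^j f_n$ converges uniformly to $\nabla^j f$ for all $j = 0, 1, \ldots$. (This gives $C^\infty_c(K)$ the structure of a Fréchet space, though we will not use this fact here.)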

We make the trivial remark that if $K \subset K'$ are compact sets, then $C^\infty_c(K)$ is a subspace of $C^\infty_c(K')$, and the topology on the former space is the restriction of the topology of the latter space. Because of this, we are able to give $C^\infty_c(\mathbf{R}^d)$ a (very strong) topology as follows. Call a seminorm $\|\cdot\|$ on $C^\infty_c(\mathbf{R}^d)$ *good* if it is a continuous function on $C^\infty_c(K)$ for each compact $K$ (or equivalently, the ball $\{ f \in C^\infty_c(K): \|f\| < 1 \}$ is open in $C^\infty_c(K)$ for each compact $K$). We then give $C^\infty_c(\mathbf{R}^d)$ the topology defined by all good seminorms. Clearly, this makes $C^\infty_c(\mathbf{R}^d)$ a (locally convex) topological vector space.

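Exercise 2 Let $f_n$ be a sequence in $C^\infty_c(\mathbf{R}^d)$, and let $f$ be another function in $C^\infty_c(\mathbf{R}^d)$. Show that $f_n$ converges in the topology of $C^\infty_c(\mathbf{R}^d)$ to $f$ if and only if there exists a compact set $K$ such that $f_n, f$ are all supported in $K$, and $f_n$ converges to $f$ in the smooth topology of $C^\infty_c(K)$.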

Exercise 3

- (i) Show that the topology of $C^\infty_c(K)$ is first countable for every compact $K$.
- (ii) Show that the topology of $C^\infty_c(\mathbf{R}^d)$ is *not* first countable. (Hint: given any countable sequence of open neighbourhoods of $0$, build a new open neighbourhood that does not contain any of the previous ones, using the $\sigma$-compact nature of $\mathbf{R}^d$.)
- (iii) As an additional challenge, construct a set $E \subset C^\infty_c(\mathbf{R}^d)$ such that $0$ is an adherent point of $E$, but $0$ is not the limit of any sequence in $E$.

There are plenty of continuous operations on :

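Exercise 4

- (i) Let $K$ be a compact set. Show that a linear map $T: C^\infty_c(K) \to X$ into a normed vector space $X$ is continuous if and only if there exist $k \geq 0$ and $C > 0$ such that $\|Tf\|_X \leq C \|f\|_{C^k}$ for all $f \in C^\infty_c(K)$.
- (ii) Let $K, K'$ be compact sets. Show that a linear map $T: C^\infty_c(K) \to C^\infty_c(K')$ is continuous if and only if for every $k \geq 0$ there exist $k' \geq 0$ and a constant $C_k > 0$ such that $\|Tf\|_{C^k} \leq C_k \|f\|_{C^{k'}}$ for all $f \in C^\infty_c(K)$.
- (iii) Show that a linear map $T: C^\infty_c(\mathbf{R}^d) \to X$ from the space of test functions into a topological vector space $X$ generated by some family of seminorms (i.e., a locally convex topological vector space) is continuous if and only if it is sequentially continuous (i.e. whenever $f_n$ converges to $f$ in $C^\infty_c(\mathbf{R}^d)$, $Tf_n$ converges to $Tf$ in $X$), and if and only if $T: C^\infty_c(K) \to X$ is continuous for each compact $K \subset \mathbf{R}^d$. Thus while first countability fails for $C^\infty_c(\mathbf{R}^d)$, we have a serviceable substitute for this property.
- (iv) Show that the inclusion map from $C^\infty_c(\mathbf{R}^d)$ to $L^p(\mathbf{R}^d)$ is continuous for every $0 < p \leq \infty$.
- (v) Show that a map $T: C^\infty_c(\mathbf{R}^d) \to C^\infty_c(\mathbf{R}^d)$ is continuous if and only if for every compact set $K \subset \mathbf{R}^d$ there exists a compact set $K'$ such that $T$ maps $C^\infty_c(K)$ continuously to $C^\infty_c(K')$.
- (vi) Show that every linear differential operator with smooth coefficients is a continuous operation on $C^\infty_c(\mathbf{R}^d)$.
- (vii) Show that convolution with any absolutely integrable, compactly supported function is a continuous operation on $C^\infty_c(\mathbf{R}^d)$.
- (viii) Show that the product operation $f, g \mapsto fg$ is continuous from $C^\infty_c(\mathbf{R}^d) \times C^\infty_c(\mathbf{R}^d)$ to $C^\infty_c(\mathbf{R}^d)$.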

A sequence $\phi_n \in C_c(\mathbf{R}^d)$ of continuous, compactly supported functions is said to be an approximation to the identity if the $\phi_n$ are non-negative, have total mass $\int_{\mathbf{R}^d} \phi_n$ equal to $1$, and have supports that shrink to the origin; thus for any fixed $r > 0$, $\phi_n$ is supported on the ball $B(0,r)$ for $n$ sufficiently large. One can generate such a sequence by starting with a single non-negative continuous compactly supported function $\phi$ of total mass $1$, and then setting $\phi_n(x) := n^d \phi(nx)$; many other constructions are possible also.
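The rescaling construction $n^d \phi(nx)$ can be sanity-checked numerically in dimension $d=1$. In the sketch below, the profile `bump`, the midpoint-rule helper `integrate`, and the step count are all assumptions made for illustration; the check confirms only that the total mass stays close to $1$ while the support shrinks.

```python
import math

def bump(x: float) -> float:
    """Smooth profile exp(-1/(1 - x^2)) on (-1, 1), zero elsewhere."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0

def integrate(g, a: float, b: float, steps: int = 20000) -> float:
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

MASS = integrate(bump, -1.0, 1.0)  # normalising constant so that phi has mass 1

def phi_n(n: int, x: float) -> float:
    """phi_n(x) = n * phi(n x) with phi = bump / MASS (the case d = 1)."""
    return n * bump(n * x) / MASS

if __name__ == "__main__":
    for n in (1, 4, 16, 64):
        mass = integrate(lambda x: phi_n(n, x), -1.0 / n, 1.0 / n)
        print(f"n = {n:3d}: support in [-1/{n}, 1/{n}], total mass ~ {mass:.8f}")
```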

One has the following useful fact:

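Exercise 5 Let $\phi_n \in C^\infty_c(\mathbf{R}^d)$ be a sequence of approximations to the identity.

- (i) If $f \in C(\mathbf{R}^d)$ is continuous, show that $f * \phi_n$ converges uniformly on compact sets to $f$.
- (ii) If $f \in L^p(\mathbf{R}^d)$ for some $1 \leq p < \infty$, show that $f * \phi_n$ converges in $L^p(\mathbf{R}^d)$ to $f$. (Hint: use (i), the density of $C_0(\mathbf{R}^d)$ in $L^p(\mathbf{R}^d)$, and Young’s inequality.)
- (iii) If $f \in C^\infty_c(\mathbf{R}^d)$, show that $f * \phi_n$ converges in $C^\infty_c(\mathbf{R}^d)$ to $f$. (Hint: use the identity $\nabla(f * \phi_n) = (\nabla f) * \phi_n$, cf. Exercise 1(ii).)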

Exercise 6 Show that $C^\infty_c(\mathbf{R}^d)$ is separable. (Hint: it suffices to show that $C^\infty_c(K)$ is separable for each compact $K$. There are several ways to accomplish this. One is to begin with the Stone-Weierstrass theorem, which will give a countable set which is dense in the uniform topology, then use the fundamental theorem of calculus to strengthen the topology. Another is to use Exercise 5 and then discretise the convolution. Another is to embed $K$ into a torus and use Fourier series, noting that the Fourier coefficients $\hat f(n)$ of a smooth function $f: \mathbf{T}^d \to \mathbf{C}$ decay faster than any power of $|n|$.)

** — 2. Distributions — **

Now we can define the concept of a distribution.

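Definition 1 (Distribution) A *distribution* on $\mathbf{R}^d$ is a continuous linear functional $\lambda: f \mapsto \langle f, \lambda \rangle$ from $C^\infty_c(\mathbf{R}^d)$ to $\mathbf{C}$. The space of such distributions is denoted $C^\infty_c(\mathbf{R}^d)^*$, and is given the weak-* topology. In particular, a sequence of distributions $\lambda_n$ converges (in the sense of distributions) to a limit $\lambda$ if one has $\langle f, \lambda_n \rangle \to \langle f, \lambda \rangle$ for all $f \in C^\infty_c(\mathbf{R}^d)$.

A technical point: we endow the space $C^\infty_c(\mathbf{R}^d)^*$ with the *conjugate* complex structure. Thus, if $\lambda \in C^\infty_c(\mathbf{R}^d)^*$, and $c$ is a complex number, then $c\lambda$ is the distribution that maps a test function $f$ to $\overline{c} \langle f, \lambda \rangle$ rather than $c \langle f, \lambda \rangle$; thus $\langle f, c\lambda \rangle = \overline{c} \langle f, \lambda \rangle$. This is to keep the analogy between the evaluation of a distribution against a function, and the usual Hermitian inner product $\langle f, g \rangle = \int_{\mathbf{R}^d} f \overline{g}$ of two test functions.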

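From Exercise 4, we see that a linear functional $\lambda: C^\infty_c(\mathbf{R}^d) \to \mathbf{C}$ is a distribution if, for every compact set $K \subset \mathbf{R}^d$, there exist $k \geq 0$ and $C > 0$ such that

$$ |\langle f, \lambda \rangle| \leq C \|f\|_{C^k} \qquad (1) $$

for all $f \in C^\infty_c(K)$.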

Exercise 7 Show that $C^\infty_c(\mathbf{R}^d)^*$ is a Hausdorff topological vector space.

We note two basic examples of distributions:

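- Any locally integrable function $g \in L^1_{\mathrm{loc}}(\mathbf{R}^d)$ can be viewed as a distribution, by writing $\langle f, g \rangle := \int_{\mathbf{R}^d} f(x) \overline{g(x)}\, dx$ for all test functions $f$.
- Any complex Radon measure $\mu$ can be viewed as a distribution, by writing $\langle f, \mu \rangle := \int_{\mathbf{R}^d} f(x)\, d\overline{\mu}$, where $\overline{\mu}$ is the complex conjugate of $\mu$ (thus $\overline{\mu}(E) := \overline{\mu(E)}$). (Note that this example generalises the preceding one, which corresponds to the case when $\mu$ is absolutely continuous with respect to Lebesgue measure.) Thus, for instance, the Dirac measure $\delta$ at the origin is a distribution, with $\langle f, \delta \rangle = f(0)$ for all test functions $f$.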

Exercise 8 Show that the above identifications of locally integrable functions or complex Radon measures with distributions are injective. (Hint: use Exercise 1(iv).)

From the above exercise, we may view locally integrable functions and locally finite measures as a special type of distribution. In particular, $C^\infty_c(\mathbf{R}^d)$ and $L^p(\mathbf{R}^d)$ are now contained in $C^\infty_c(\mathbf{R}^d)^*$ for all $1 \leq p \leq \infty$.

Exercise 9 Show that if a sequence of locally integrable functions converges in $L^1_{\mathrm{loc}}$ to a limit, then it also converges in the sense of distributions; similarly, if a sequence of complex Radon measures converges in the vague topology to a limit, then it also converges in the sense of distributions.

Thus we see that convergence in the sense of distributions is among the weakest of the notions of convergence used in analysis; however, from the Hausdorff property, distributional limits are still *unique*.

Exercise 10 If $\phi_n$ is a sequence of approximations to the identity, show that $\phi_n$ converges in the sense of distributions to the Dirac distribution $\delta$.
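The convergence in Exercise 10 can be observed numerically (this is an illustration, not a proof). The sketch below works in dimension one with an assumed rescaled bump as approximation to the identity and a hypothetical test function `f`; it compares the pairing of `f` against the rescaled bump with the value of `f` at the origin.

```python
import math

def bump(x: float) -> float:
    """Smooth profile exp(-1/(1 - x^2)) on (-1, 1), zero elsewhere."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0

def integrate(g, a: float, b: float, steps: int = 20000) -> float:
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

MASS = integrate(bump, -1.0, 1.0)

def phi_n(n: int, x: float) -> float:
    """Assumed approximation to the identity: n * bump(n x) / MASS."""
    return n * bump(n * x) / MASS

def f(x: float) -> float:
    """Hypothetical real-valued test function, supported on [-2, 2]."""
    return math.cos(x) * bump(x / 2.0)

def pairing(n: int) -> float:
    """Integral of f * phi_n; no conjugation is needed for real functions."""
    return integrate(lambda x: f(x) * phi_n(n, x), -1.0 / n, 1.0 / n)

if __name__ == "__main__":
    for n in (1, 4, 16, 64):
        print(f"n = {n:3d}: pairing = {pairing(n):.8f}, f(0) = {f(0.0):.8f}")
```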

More exotic examples of distributions can be given:

Exercise 11 (Derivative of the delta function) Let $d=1$. Show that the functional $\delta': f \mapsto -f'(0)$ for all test functions $f$ is a distribution which does not arise from either a locally integrable function or a Radon measure. (Note how it is important here that $f$ is smooth, and in particular differentiable, and not merely continuous.) The presence of the minus sign will be explained shortly.

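Exercise 12 (Principal value of $1/x$) Let $d=1$. Show that the functional $\mathrm{p.v.} \frac{1}{x}$ defined by the formula

$$ \left\langle f, \mathrm{p.v.} \frac{1}{x} \right\rangle := \lim_{\varepsilon \to 0} \int_{|x| > \varepsilon} \frac{f(x)}{x}\, dx $$

is a distribution which does not arise from either a locally integrable function or a Radon measure. (Note that $1/x$ is not a locally integrable function!)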
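To see why the symmetric truncation in the principal value converges, one can compare truncated integrals with the symmetrised integral of $(f(x) - f(-x))/x$ over the positive half-line, whose integrand stays bounded near the origin. The Python sketch below does this for a hypothetical test function `f` (a bump centred at 0.3); the helper names and the quadrature rule are assumptions for illustration.

```python
import math

def bump(x: float) -> float:
    """Smooth profile exp(-1/(1 - x^2)) on (-1, 1), zero elsewhere."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0

def f(x: float) -> float:
    """Hypothetical test function, supported on [-0.7, 1.3]; neither even nor odd."""
    return bump(x - 0.3)

def integrate(g, a: float, b: float, steps: int = 20000) -> float:
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

def pv_truncated(eps: float, R: float = 2.0) -> float:
    """Integral of f(x)/x over eps < |x| < R (f vanishes for |x| >= R)."""
    g = lambda x: f(x) / x
    return integrate(g, eps, R) + integrate(g, -R, -eps)

def pv_symmetrised(R: float = 2.0) -> float:
    """Integral over (0, R) of (f(x) - f(-x))/x, bounded near x = 0."""
    return integrate(lambda x: (f(x) - f(-x)) / x, 0.0, R)

if __name__ == "__main__":
    limit = pv_symmetrised()
    for eps in (0.1, 0.01, 0.001):
        print(f"eps = {eps:5.3f}: truncated = {pv_truncated(eps):.6f}, limit ~ {limit:.6f}")
```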

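Exercise 13 (Distributional interpretations of $1/|x|$) Let $d=1$. For any $r > 0$, show that the functional $\lambda_r$ defined by the formula

$$ \langle f, \lambda_r \rangle := \int_{|x| < r} \frac{f(x) - f(0)}{|x|}\, dx + \int_{|x| \geq r} \frac{f(x)}{|x|}\, dx $$

is a distribution that does not arise from either a locally integrable function or a Radon measure. Note that any two such functionals $\lambda_r, \lambda_{r'}$ differ by a constant multiple of the Dirac delta distribution.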

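Exercise 14 A distribution $\lambda$ is said to be *real* if $\langle f, \lambda \rangle$ is real for every real-valued test function $f$. Show that every distribution $\lambda$ can be uniquely expressed as $\mathrm{Re}(\lambda) + i\, \mathrm{Im}(\lambda)$ for some real distributions $\mathrm{Re}(\lambda), \mathrm{Im}(\lambda)$.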

Exercise 15 A distribution $\lambda$ is said to be *non-negative* if $\langle f, \lambda \rangle$ is non-negative for every non-negative test function $f$. Show that a distribution is non-negative if and only if it is a non-negative Radon measure. (Hint: use the Riesz representation theorem and Exercise 1(iv).) Note that this implies that the analogue of the Jordan decomposition fails for distributions; any distribution which is not a Radon measure will not be the difference of non-negative distributions.

We will now extend various operations on locally integrable functions or Radon measures to distributions by arguing by analogy. (Shortly we will give a more formal approach, based on density.)

We begin with the operation of multiplying a distribution $\lambda$ by a smooth function $h: \mathbf{R}^d \to \mathbf{C}$. Observe that

$$ \langle f, gh \rangle = \langle f \overline{h}, g \rangle $$

for all test functions $f, g, h$. Inspired by this formula, we define the product $\lambda h = h \lambda$ of a distribution with a smooth function by setting

$$ \langle f, \lambda h \rangle := \langle f \overline{h}, \lambda \rangle $$

for all test functions $f$. It is easy to see (e.g. using Exercise 4(vi)) that this defines a distribution $\lambda h$, and that this operation is compatible with existing definitions of products between a locally integrable function (or Radon measure) with a smooth function. It is important that $h$ is smooth (and not merely, say, continuous) because one needs the product of a test function $f$ with $\overline{h}$ to still be a test function.

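Exercise 16 Let $d=1$. Establish the identity

$$ \delta f = f(0) \delta $$

for any smooth function $f$. In particular,

$$ \delta x = 0, $$

where we abuse notation slightly and write $x$ for the identity function $x \mapsto x$. Conversely, if $\lambda$ is a distribution such that

$$ \lambda x = 0, $$

show that $\lambda$ is a constant multiple of $\delta$. (Hint: Use the identity $f(x) = f(0) + x \int_0^1 f'(tx)\, dt$ to write $f(x)$ as the sum of $f(0) \psi$ and $x$ times a test function for any test function $f$, where $\psi$ is a fixed test function equalling $1$ at the origin.)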

Remark 1 Even though distributions are not, strictly speaking, functions, it is often useful heuristically to view them as such, thus for instance one might write a distributional identity such as $\delta x = 0$ suggestively as $\delta(x) x = 0$. Another useful (and rigorous) way to view such identities is to write distributions such as $\delta$ as a limit of approximations to the identity $\phi_n$, and show that the relevant identity becomes true in the limit; thus, for instance, to show that $\delta x = 0$, one can show that $\phi_n x \to 0$ in the sense of distributions as $n \to \infty$. (In fact, $\phi_n x$ converges to zero in the $L^1$ norm.)
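The parenthetical claim about $L^1$ convergence can be checked numerically in one dimension. The sketch below assumes the rescaled-bump approximation to the identity `phi_n` (a choice made here, not fixed by the notes) and computes the $L^1$ norm of $x \phi_n(x)$, which for this particular family scales exactly like $1/n$.

```python
import math

def bump(x: float) -> float:
    """Smooth profile exp(-1/(1 - x^2)) on (-1, 1), zero elsewhere."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0

def integrate(g, a: float, b: float, steps: int = 20000) -> float:
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

MASS = integrate(bump, -1.0, 1.0)

def phi_n(n: int, x: float) -> float:
    """Assumed approximation to the identity: n * bump(n x) / MASS."""
    return n * bump(n * x) / MASS

def l1_norm_of_x_phi_n(n: int) -> float:
    """L^1 norm of the function x * phi_n(x)."""
    return integrate(lambda x: abs(x) * phi_n(n, x), -1.0 / n, 1.0 / n)

if __name__ == "__main__":
    for n in (1, 4, 16, 64):
        norm = l1_norm_of_x_phi_n(n)
        print(f"n = {n:3d}: ||x phi_n||_1 = {norm:.8f}, n * norm = {n * norm:.8f}")
```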

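Exercise 17 Let $d=1$. With the distribution $\mathrm{p.v.} \frac{1}{x}$ from Exercise 12, show that $(\mathrm{p.v.} \frac{1}{x}) x$ is equal to $1$. With the distributions $\lambda_r$ from Exercise 13, show that $\lambda_r x = \mathrm{sgn}$, where $\mathrm{sgn}$ is the signum function.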

A distribution $\lambda$ is said to be *supported* in a closed set $K$ if $\langle f, \lambda \rangle = 0$ for all $f$ that vanish on an open neighbourhood of $K$. The intersection of all $K$ that $\lambda$ is supported on is denoted $\mathrm{supp}(\lambda)$ and is referred to as the *support* of the distribution; this is the smallest closed set that $\lambda$ is supported on. Thus, for instance, the Dirac delta function is supported on $\{0\}$, as are all derivatives of that function. (Note here that it is important that $f$ vanish on a *neighbourhood* of $K$, rather than merely vanishing on $K$ itself; for instance, in one dimension, there certainly exist test functions $f$ that vanish at $0$ but nevertheless have a non-zero inner product with $\delta'$.)

Exercise 18 Show that every distribution is the limit of a sequence of compactly supported distributions (using the weak-* topology, of course). (Hint: Approximate a distribution $\lambda$ by the truncated distributions $\lambda \eta_n$ for some smooth cutoff functions $\eta_n$ constructed using Exercise 1(iii).)

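In a similar spirit, we can convolve a distribution $\lambda$ by an absolutely integrable, compactly supported function $h \in L^1(\mathbf{R}^d)$. From Fubini’s theorem we observe the formula

$$ \langle f, g * h \rangle = \langle f * \tilde h, g \rangle $$

for all test functions $f, g, h$, where $\tilde h(x) := \overline{h(-x)}$. Inspired by this formula, we define the convolution $\lambda * h = h * \lambda$ of a distribution with an absolutely integrable, compactly supported function by the formula

$$ \langle f, \lambda * h \rangle := \langle f * \tilde h, \lambda \rangle \qquad (2) $$

for all test functions $f$. This gives a well-defined distribution $\lambda * h$ (thanks to Exercise 4(vii)) which is compatible with previous notions of convolution.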

Example 1 One has $f * \delta = \delta * f = f$ for all test functions $f$. In one dimension, we have $\delta' * f = f'$ (why?), thus differentiation can be viewed as convolution with a distribution.

A remarkable fact about convolutions $f * g$ of two functions is that they inherit the regularity of the *smoother* of the two factors $f, g$ (in contrast to products $fg$, which tend to inherit the regularity of the *rougher* of the two factors). (This disparity can also be seen by contrasting the identity $\nabla(f * g) = (\nabla f) * g = f * (\nabla g)$ with the identity $\nabla(fg) = (\nabla f) g + f (\nabla g)$.) In the case of convolving distributions with test functions, this phenomenon is manifested as follows:

Lemma 2 Let $\lambda \in C^\infty_c(\mathbf{R}^d)^*$ be a distribution, and let $h \in C^\infty_c(\mathbf{R}^d)$ be a test function. Then $\lambda * h$ is equal to a smooth function.

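*Proof:* If $\lambda$ were itself a smooth function, then one could easily verify the identity

$$ \lambda * h(x) = \overline{\langle h_x, \lambda \rangle} \qquad (3) $$

where $h_x(y) := \overline{h(x-y)}$. As $h$ is a test function, it is easy to see that $h_x$ varies smoothly in $x$ in any $C^k$ norm (indeed, it has Taylor expansions to any order in such norms) and so the right-hand side is a smooth function of $x$. So it suffices to verify the identity (3). As distributions are defined against test functions $f$, it suffices to show that

$$ \langle f, \lambda * h \rangle = \int_{\mathbf{R}^d} f(x) \langle h_x, \lambda \rangle\, dx. $$

On the other hand, we have from (2) that

$$ \langle f, \lambda * h \rangle = \langle f * \tilde h, \lambda \rangle = \left\langle \int_{\mathbf{R}^d} f(x) h_x\, dx, \lambda \right\rangle. $$

So the only issue is to justify the interchange of integral and inner product:

$$ \int_{\mathbf{R}^d} f(x) \langle h_x, \lambda \rangle\, dx = \left\langle \int_{\mathbf{R}^d} f(x) h_x\, dx, \lambda \right\rangle. $$

Certainly, (from the compact support of $f$) any Riemann sum can be interchanged with the inner product:

$$ \sum_n f(x_n) \langle h_{x_n}, \lambda \rangle \Delta x = \left\langle \sum_n f(x_n) h_{x_n} \Delta x, \lambda \right\rangle, $$

where $x_n$ ranges over some lattice and $\Delta x$ is the volume of the fundamental domain. A modification of the argument that shows convergence of the Riemann integral for smooth, compactly supported functions then works here and allows one to take limits; we omit the details.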

This has an important corollary:

Lemma 3 Every distribution is the limit of a sequence of test functions. In particular, $C^\infty_c(\mathbf{R}^d)$ is dense in $C^\infty_c(\mathbf{R}^d)^*$.

*Proof:* By Exercise 18, it suffices to verify this for compactly supported distributions $\lambda$. We let $\phi_n$ be a sequence of approximations to the identity. By Exercise 5(iii) and (2), we see that $\lambda * \phi_n$ converges in the sense of distributions to $\lambda$. By Lemma 2, $\lambda * \phi_n$ is a smooth function; as $\lambda$ and $\phi_n$ are both compactly supported, $\lambda * \phi_n$ is compactly supported also. The claim follows.

Because of this lemma, we can formalise the previous procedure of extending operations that were previously defined on test functions, to distributions, provided that these operations were continuous in distributional topologies. However, we shall continue to proceed by analogy as it requires fewer verifications in order to motivate the definition.

Exercise 19 Another consequence of Lemma 2 is that it allows one to extend the definition (2) of convolution to the case when $h$ is not an integrable function of compact support, but is instead merely a distribution of compact support. Adopting this convention, show that convolution of distributions of compact support is both commutative and associative. (Hint: this can either be done directly, or by carefully taking limits using Lemma 3.)

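The next operation we will introduce is that of differentiation. An integration by parts reveals the identity

$$ \left\langle f, \frac{\partial}{\partial x_j} g \right\rangle = - \left\langle \frac{\partial}{\partial x_j} f, g \right\rangle $$

for any test functions $f, g$ and $j = 1, \ldots, d$. Inspired by this, we define the (distributional) partial derivative $\frac{\partial}{\partial x_j} \lambda$ of a distribution $\lambda$ by the formula

$$ \left\langle f, \frac{\partial}{\partial x_j} \lambda \right\rangle := - \left\langle \frac{\partial}{\partial x_j} f, \lambda \right\rangle. $$

This can be verified to still be a distribution, and by Exercise 4(vi), the operation of differentiation is a continuous one on distributions. More generally, given any linear differential operator $P$ with smooth coefficients, one can define $P\lambda$ for a distribution $\lambda$ by the formula

$$ \langle f, P \lambda \rangle := \langle P^* f, \lambda \rangle $$

where $P^*$ is the adjoint differential operator of $P$, which can be defined implicitly by the formula

$$ \langle f, P g \rangle = \langle P^* f, g \rangle $$

for test functions $f, g$, or more explicitly by replacing all coefficients with complex conjugates, replacing each partial derivative $\frac{\partial}{\partial x_j}$ with its negative, and reversing the order of operations (thus for instance the adjoint of the first-order operator $a(x) \frac{d}{dx}: f \mapsto a f'$ would be $-\frac{d}{dx} \overline{a(x)}: f \mapsto -(\overline{a} f)'$).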

Example 2 The distribution $\delta'$ defined in Exercise 11 is the derivative $\frac{d}{dx} \delta$ of $\delta$, as defined by the above formula.

Many of the identities one is used to in classical calculus extend to the distributional setting (as one would already expect from Lemma 3). For instance:

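Exercise 20 (Product rule) Let $\lambda$ be a distribution, and let $f$ be smooth. Show that

$$ \frac{\partial}{\partial x_j} (\lambda f) = \left( \frac{\partial}{\partial x_j} \lambda \right) f + \lambda \left( \frac{\partial}{\partial x_j} f \right) $$

for all $j = 1, \ldots, d$.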

Exercise 21 Let $d=1$. Show that $\delta' x = -\delta$ in three different ways:

- Directly from the definitions;
- Using the product rule;
- Writing $\delta$ as the limit of approximations to the identity.

Exercise 22 Let $d=1$.

- (i) Show that if $\lambda$ is a distribution and $n \geq 1$ is an integer, then $\lambda x^n = 0$ if and only if $\lambda$ is a linear combination of $\delta$ and its first $n-1$ derivatives $\delta', \delta'', \ldots, \delta^{(n-1)}$.
- (ii) Show that a distribution $\lambda$ is supported on $\{0\}$ if and only if it is a linear combination of $\delta$ and finitely many of its derivatives.
- (iii) Generalise (ii) to the case of general dimension $d$ (where of course one now uses partial derivatives instead of derivatives).

Exercise 23 Let $d=1$.

- Show that the derivative of the Heaviside function $1_{[0,+\infty)}$ is equal to $\delta$.
- Show that the derivative of the signum function $\mathrm{sgn}(x)$ is equal to $2\delta$.
- Show that the derivative of the locally integrable function $\log |x|$ is equal to $\mathrm{p.v.} \frac{1}{x}$.
- Show that the derivative of the locally integrable function $\log |x|\, \mathrm{sgn}(x)$ is equal to the distribution $\lambda_1$ from Exercise 13.
- Show that the derivative of the locally integrable function $|x|$ is the locally integrable function $\mathrm{sgn}(x)$.
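Several of these identities can be tested numerically by pairing against a single test function, using the definition that the derivative of a distribution pairs with a test function as minus the pairing of the original distribution with the derivative of the test function. The sketch below uses a hypothetical real test function `f` with an explicit derivative `df`; all names are illustrative, and a numerical check on one test function is of course no substitute for the proofs requested.

```python
import math

def bump(x: float) -> float:
    """Smooth profile exp(-1/(1 - x^2)) on (-1, 1), zero elsewhere."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0

def dbump(x: float) -> float:
    """Classical derivative of bump, computed by the chain rule."""
    if abs(x) >= 1.0:
        return 0.0
    return bump(x) * (-2.0 * x) / (1.0 - x * x) ** 2

def f(x: float) -> float:
    """Hypothetical test function supported on [-0.7, 1.3]."""
    return bump(x - 0.3)

def df(x: float) -> float:
    return dbump(x - 0.3)

def integrate(g, a: float, b: float, steps: int = 20000) -> float:
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

R = 2.0  # f vanishes outside [-R, R]

# Pairing of f with the derivative of the Heaviside function: expected f(0).
heaviside_case = -integrate(df, 0.0, R)
# Pairing of f with the derivative of sgn: expected 2 f(0).
sgn_case = -integrate(df, 0.0, R) + integrate(df, -R, 0.0)
# Pairing of f with the derivative of |x|: expected the integral of f * sgn.
abs_case = -integrate(lambda x: df(x) * x, 0.0, R) + integrate(lambda x: df(x) * x, -R, 0.0)
sgn_pairing = integrate(f, 0.0, R) - integrate(f, -R, 0.0)

if __name__ == "__main__":
    print(f"Heaviside: {heaviside_case:.8f} vs f(0)   = {f(0.0):.8f}")
    print(f"sgn:       {sgn_case:.8f} vs 2 f(0) = {2 * f(0.0):.8f}")
    print(f"|x|:       {abs_case:.8f} vs <f, sgn> = {sgn_pairing:.8f}")
```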

If a locally integrable function has a distributional derivative which is also a locally integrable function, we refer to the latter as the weak derivative of the former. Thus, for instance, the weak derivative of $|x|$ is $\mathrm{sgn}(x)$ (as one would expect), but $\mathrm{sgn}(x)$ does not have a weak derivative (despite being (classically) differentiable almost everywhere), because the distributional derivative $2\delta$ of this function is not itself a locally integrable function. Thus weak derivatives differ in some respects from their classical counterparts, though of course the two concepts agree for smooth functions.

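Exercise 24 Let $d \geq 1$. Show that for any $1 \leq i, j \leq d$, and any distribution $\lambda \in C^\infty_c(\mathbf{R}^d)^*$, we have $\frac{\partial}{\partial x_i} \frac{\partial}{\partial x_j} \lambda = \frac{\partial}{\partial x_j} \frac{\partial}{\partial x_i} \lambda$, thus weak derivatives commute with each other. (This is in contrast to classical derivatives, which can fail to commute for non-smooth functions; for instance, $\frac{\partial}{\partial x} \frac{\partial}{\partial y} \frac{xy^3}{x^2+y^2} \neq \frac{\partial}{\partial y} \frac{\partial}{\partial x} \frac{xy^3}{x^2+y^2}$ at the origin $(x,y) = 0$, despite both derivatives being defined. More generally, weak derivatives tend to be less pathological than classical derivatives, but of course the downside is that weak derivatives do not always have a classical interpretation as a limit of a Newton quotient.)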
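The failure of classical mixed partial derivatives to commute can be seen with finite differences. The sketch below assumes the standard example $u(x,y) = xy^3/(x^2+y^2)$ (extended by $0$ at the origin), for which a direct computation gives the two mixed partials at the origin as $1$ and $0$. The step sizes are illustrative, with the inner step chosen much smaller than the outer one so that the inner limit is effectively taken first.

```python
def u(x: float, y: float) -> float:
    """Assumed example: x y^3 / (x^2 + y^2), extended by 0 at the origin."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y ** 3 / (x * x + y * y)

def u_x(x: float, y: float, h: float = 1e-8) -> float:
    """Central-difference approximation to the partial derivative in x."""
    return (u(x + h, y) - u(x - h, y)) / (2.0 * h)

def u_y(x: float, y: float, h: float = 1e-8) -> float:
    """Central-difference approximation to the partial derivative in y."""
    return (u(x, y + h) - u(x, y - h)) / (2.0 * h)

def d_y_d_x_at_origin(k: float = 1e-3) -> float:
    """Differentiate in x first, then in y, at the origin."""
    return (u_x(0.0, k) - u_x(0.0, -k)) / (2.0 * k)

def d_x_d_y_at_origin(k: float = 1e-3) -> float:
    """Differentiate in y first, then in x, at the origin."""
    return (u_y(k, 0.0) - u_y(-k, 0.0)) / (2.0 * k)

if __name__ == "__main__":
    print(f"d/dy d/dx u at the origin ~ {d_y_d_x_at_origin():.6f}")
    print(f"d/dx d/dy u at the origin ~ {d_x_d_y_at_origin():.6f}")
```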

Exercise 25 Let $d=1$, and let $k \geq 0$ be an integer. Let us say that a compactly supported distribution $\lambda \in C^\infty_c(\mathbf{R})^*$ has *order at most $k$* if the functional $f \mapsto \langle f, \lambda \rangle$ is continuous in the $C^k$ norm. Thus, for instance, $\delta$ has order at most $0$, and $\delta'$ has order at most $1$, and every compactly supported distribution is of order at most $k$ for some sufficiently large $k$.

- Show that if $\lambda$ is a compactly supported distribution of order at most $0$, then it is a compactly supported Radon measure.
- Show that if $\lambda$ is a compactly supported distribution of order at most $k$, then $\lambda'$ has order at most $k+1$.
- Conversely, if $\lambda$ is a compactly supported distribution of order at most $k+1$, then we can write $\lambda = \rho' + \nu$ for some compactly supported distributions $\rho, \nu$ of order at most $k$. (Hint: one has to “dualise” the fundamental theorem of calculus, and then apply smooth cutoffs to recover compact support.)
- Show that every compactly supported distribution can be expressed as a finite linear combination of (distributional) derivatives of compactly supported Radon measures.
- Show that every compactly supported distribution can be expressed as a finite linear combination of (distributional) derivatives of functions in $C^k_0(\mathbf{R})$, for any fixed $k$.

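We now set out some other operations on distributions. If we define the translation $\tau_x f$ of a test function $f$ by a shift $x \in \mathbf{R}^d$ by the formula $\tau_x f(y) := f(y-x)$, then we have

$$ \langle f, \tau_x g \rangle = \langle \tau_{-x} f, g \rangle $$

for all test functions $f, g$, so it is natural to define the translation $\tau_x \lambda$ of a distribution $\lambda$ by the formula

$$ \langle f, \tau_x \lambda \rangle := \langle \tau_{-x} f, \lambda \rangle. $$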

Next, we consider linear changes of variable.

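Exercise 26 (Linear changes of variable) Let $d \geq 1$, and let $L: \mathbf{R}^d \to \mathbf{R}^d$ be an invertible linear transformation. Given a distribution $\lambda \in C^\infty_c(\mathbf{R}^d)^*$, let $\lambda \circ L$ be the distribution given by the formula

$$ \langle f, \lambda \circ L \rangle := \frac{1}{|\det L|} \langle f \circ L^{-1}, \lambda \rangle $$

for all test functions $f$. (How would one motivate this formula?)

- Show that $\delta \circ L = \frac{1}{|\det L|} \delta$ for all invertible linear transformations $L$.
- If $d=1$, show that $\mathrm{p.v.} \frac{1}{x} \circ L = \frac{1}{\det L}\, \mathrm{p.v.} \frac{1}{x}$ for all invertible linear transformations $L$.
- Conversely, if $d=1$ and $\lambda$ is a distribution such that $\lambda \circ L = \frac{1}{\det L} \lambda$ for all invertible linear transformations $L$, show that $\lambda$ is a constant multiple of $\mathrm{p.v.} \frac{1}{x}$. (Hint: first show that there exists a constant $c$ such that $\langle f, \lambda \rangle = c \int_0^\infty \frac{f(x)}{x}\, dx$ whenever $f$ is a bump function supported in $(0, +\infty)$. To show this, approximate $f$ by the function

$$ \int_{-\infty}^\infty f(e^t x) \psi_n(t)\, dt = 1_{x > 0} \int_0^\infty \frac{f(y)}{y} \psi_n\left( \log \frac{y}{x} \right) dy $$

for $\psi_n$ an approximation to the identity.)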

Remark 2 One can also compose distributions with diffeomorphisms. However, things become much more delicate if the map one is composing with contains stationary points; for instance, in one dimension, one cannot meaningfully make sense of $\delta(x^2)$ (the composition of the Dirac delta distribution with $x \mapsto x^2$); this can be seen by first noting that for an approximation $\phi_n$ to the identity, $\phi_n(x^2)$ does not converge to a limit in the distributional sense.
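The divergence described in Remark 2 can be observed numerically. In the sketch below, the approximation to the identity is an assumed rescaled bump, composed with the map sending $x$ to $x^2$, and `f` is a hypothetical test function that does not vanish at the origin; the pairing then grows roughly like $\sqrt{n}$ rather than converging.

```python
import math

def bump(x: float) -> float:
    """Smooth profile exp(-1/(1 - x^2)) on (-1, 1), zero elsewhere."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0

def integrate(g, a: float, b: float, steps: int = 20000) -> float:
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

MASS = integrate(bump, -1.0, 1.0)

def phi_n_of_x_squared(n: int, x: float) -> float:
    """phi_n(x^2), where phi_n(t) = n * bump(n t) / MASS is the assumed family."""
    return n * bump(n * x * x) / MASS

def f(x: float) -> float:
    """Hypothetical test function with f(0) != 0, supported on [-2, 2]."""
    return math.cos(x) * bump(x / 2.0)

def pairing(n: int) -> float:
    """Integral of f(x) * phi_n(x^2); the integrand vanishes for |x| >= n^(-1/2)."""
    r = 1.0 / math.sqrt(n)
    return integrate(lambda x: f(x) * phi_n_of_x_squared(n, x), -r, r)

if __name__ == "__main__":
    for n in (1, 4, 16, 64, 256):
        p = pairing(n)
        print(f"n = {n:4d}: pairing = {p:.6f}, pairing / sqrt(n) = {p / math.sqrt(n):.6f}")
```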

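Exercise 27 (Tensor product of distributions) Let $d, d' \geq 1$ be integers. If $\lambda \in C^\infty_c(\mathbf{R}^d)^*$ and $\rho \in C^\infty_c(\mathbf{R}^{d'})^*$ are distributions, show that there is a unique distribution $\lambda \otimes \rho \in C^\infty_c(\mathbf{R}^{d+d'})^*$ with the property that

$$ \langle f \otimes g, \lambda \otimes \rho \rangle = \langle f, \lambda \rangle \langle g, \rho \rangle \qquad (4) $$

for all test functions $f \in C^\infty_c(\mathbf{R}^d)$, $g \in C^\infty_c(\mathbf{R}^{d'})$, where $f \otimes g \in C^\infty_c(\mathbf{R}^{d+d'})$ is the tensor product $f \otimes g(x, x') := f(x) g(x')$ of $f$ and $g$. (Hint: like many other constructions of tensor products, this is rather intricate. One way is to start by fixing two cutoff functions $\psi, \psi'$ on $\mathbf{R}^d, \mathbf{R}^{d'}$ respectively, and define $\lambda \otimes \rho$ on modulated test functions $e^{2\pi i (\xi \cdot x + \xi' \cdot x')} \psi(x) \psi'(x')$ for various frequencies $\xi, \xi'$, and then use Fourier series to define $\lambda \otimes \rho$ on $F(x, x') \psi(x) \psi'(x')$ for smooth $F$. Then show that these definitions of $\lambda \otimes \rho$ are compatible for different choices of $\psi, \psi'$ and can be glued together to form a distribution; finally, go back and verify (4).)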

We close this section with one caveat. Despite the many operations that one can perform on distributions, there are two types of operations which cannot, in general, be defined on arbitrary distributions (at least while remaining in the class of distributions):

- Nonlinear operations (e.g. taking the absolute value of a distribution); or
- Multiplying a distribution by anything rougher than a smooth function.

Thus, for instance, there is no meaningful way to interpret the square $\delta^2$ of the Dirac delta function as a distribution. This is perhaps easiest to see using an approximation $\phi_n$ to the identity: $\phi_n$ converges to $\delta$ in the sense of distributions, but $\phi_n^2$ does not converge to anything (the integral against a test function that does not vanish at the origin will go to infinity as $n \to \infty$). For similar reasons, one cannot meaningfully interpret the absolute value $|\delta'|$ of the derivative of the delta function. (One also cannot multiply $\delta$ by $\mathrm{sgn}$ – why?)
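The blow-up of the squared approximations can be seen numerically. The sketch below assumes a rescaled-bump approximation to the identity in one dimension and a hypothetical test function `f` with non-zero value at the origin; for this family the pairing of `f` with the square grows linearly in $n$.

```python
import math

def bump(x: float) -> float:
    """Smooth profile exp(-1/(1 - x^2)) on (-1, 1), zero elsewhere."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0

def integrate(g, a: float, b: float, steps: int = 20000) -> float:
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

MASS = integrate(bump, -1.0, 1.0)

def phi_n(n: int, x: float) -> float:
    """Assumed approximation to the identity: n * bump(n x) / MASS."""
    return n * bump(n * x) / MASS

def f(x: float) -> float:
    """Hypothetical test function with f(0) != 0, supported on [-2, 2]."""
    return math.cos(x) * bump(x / 2.0)

def pairing_with_square(n: int) -> float:
    """Integral of f * phi_n^2, which grows like n for this family."""
    return integrate(lambda x: f(x) * phi_n(n, x) ** 2, -1.0 / n, 1.0 / n)

if __name__ == "__main__":
    for n in (1, 4, 16, 64):
        p = pairing_with_square(n)
        print(f"n = {n:3d}: pairing with phi_n^2 = {p:.6f}, divided by n = {p / n:.6f}")
```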

Exercise 28 Let $X$ be a normed vector space which contains $C^\infty_c(\mathbf{R}^d)$ as a dense subspace (and such that the inclusion of $C^\infty_c(\mathbf{R}^d)$ to $X$ is continuous). The adjoint (or transpose) of this inclusion map is then an injection from $X^*$ to the space of distributions $C^\infty_c(\mathbf{R}^d)^*$; thus $X^*$ can be viewed as a subspace of the space of distributions.

- Show that the closed unit ball in $X^*$ is also closed in the space of distributions.
- Conclude that any distributional limit of a bounded sequence in $L^p(\mathbf{R}^d)$ for $1 < p \leq \infty$ is still in $L^p(\mathbf{R}^d)$.
- Show that the previous claim fails for $L^1(\mathbf{R}^d)$, but holds for the space $M(\mathbf{R}^d)$ of finite measures.

** — 3. Tempered distributions — **

The list of operations one can define on distributions has one major omission – the Fourier transform $\mathcal{F}$. Unfortunately, one cannot easily define the Fourier transform for all distributions. One can see this as follows. From Plancherel’s theorem one has the identity

$$ \langle f, \mathcal{F} g \rangle = \langle \mathcal{F}^* f, g \rangle $$

for test functions $f, g$, so one would like to define the Fourier transform $\mathcal{F} \lambda = \hat \lambda$ of a distribution $\lambda$ by the formula

$$ \langle f, \mathcal{F} \lambda \rangle := \langle \mathcal{F}^* f, \lambda \rangle. \qquad (5) $$

Unfortunately this does not quite work, because the adjoint Fourier transform $\mathcal{F}^*$ of a test function is not a test function, but is instead just a Schwartz function. (Indeed, by Exercise 46 of Notes 2, it is not possible to find a non-trivial test function whose Fourier transform is again a test function.) To address this, we need to work with a slightly smaller space than that of all distributions, namely those of *tempered* distributions:

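Definition 4 (Tempered distributions) A tempered distribution is a continuous linear functional $\lambda: f \mapsto \langle f, \lambda \rangle$ on the Schwartz space $\mathcal{S}(\mathbf{R}^d)$ (with the topology given by Exercise 25 of Notes 2), i.e. an element of $\mathcal{S}(\mathbf{R}^d)^*$.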

Since $C^\infty_c(\mathbf{R}^d)$ embeds continuously into $\mathcal{S}(\mathbf{R}^d)$ (with a dense image), we see that the space of tempered distributions can be embedded into the space of distributions. However, not every distribution is tempered:

Example 3 The distribution $e^x$ is not tempered. Indeed, if $\psi$ is a bump function, observe that the sequence of functions $e^{-n} \psi(x-n)$ converges to zero in the Schwartz space topology, but $\langle e^{-n} \psi(x-n), e^x \rangle$ does not go to zero, and so this distribution does not correspond to a tempered distribution.
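Both halves of this argument can be illustrated numerically in one dimension. The sketch below assumes the exponential function as the distribution and translated, damped bumps as the test sequence; it checks that the pairing stays constant while a polynomially weighted sup norm tends to zero. Only the Schwartz seminorms without derivatives are sampled here, and only on a finite grid, so this is a partial illustration under those assumptions.

```python
import math

def bump(x: float) -> float:
    """Smooth profile exp(-1/(1 - x^2)) on (-1, 1), zero elsewhere."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0

def integrate(g, a: float, b: float, steps: int = 20000) -> float:
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

def g_n(n: int, x: float) -> float:
    """Assumed test sequence: exp(-n) * bump(x - n), supported on [n - 1, n + 1]."""
    return math.exp(-n) * bump(x - n)

def pairing(n: int) -> float:
    """Integral of g_n(x) * exp(x); equals the integral of bump(y) * exp(y)."""
    return integrate(lambda x: g_n(n, x) * math.exp(x), n - 1.0, n + 1.0)

def weighted_sup(n: int, k: int, samples: int = 2001) -> float:
    """Grid approximation to the sup over x of (1 + |x|)^k * |g_n(x)|."""
    xs = (n - 1.0 + 2.0 * i / (samples - 1) for i in range(samples))
    return max((1.0 + abs(x)) ** k * g_n(n, x) for x in xs)

if __name__ == "__main__":
    for n in (5, 10, 20, 40):
        print(f"n = {n:2d}: pairing = {pairing(n):.8f}, weighted sup (k=3) = {weighted_sup(n, 3):.3e}")
```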

On the other hand, distributions which avoid this sort of exponential growth, and instead only grow polynomially, tend to be tempered:

Exercise 29 Show that any Radon measure $\mu$ which is of *polynomial growth* in the sense that $|\mu|(B(0,R)) \leq C R^k$ for all $R \geq 1$ and some constants $C, k > 0$, where $B(0,R)$ is the ball of radius $R$ centred at the origin in ${\bf R}^d$, is tempered.

Remark 3 As a zeroth approximation, one can roughly think of “tempered” as being synonymous with “polynomial growth”. However, this is not strictly true: for instance, the (weak) derivative of a function of polynomial growth will still be tempered, but need not be of polynomial growth (for instance, the derivative $e^x \cos(e^x)$ of $\sin(e^x)$ is a tempered distribution, despite having exponential growth). While one can eventually describe which distributions are tempered by measuring their “growth” in both physical space and in frequency space, we will not do so here.

Most of the operations that preserve the space of distributions, also preserve the space of tempered distributions. For instance:

Exercise 30

- Show that any derivative of a tempered distribution is again a tempered distribution.
- Show that any convolution of a tempered distribution with a compactly supported distribution is again a tempered distribution.
- Show that if $f$ is a measurable function which is *rapidly decreasing* in the sense that $|x|^k f(x)$ is an $L^\infty({\bf R}^d)$ function for each $k = 0, 1, 2, \ldots$, then a convolution of a tempered distribution with $f$ can be defined, and is again a tempered distribution.
- Show that if $f$ is a smooth function such that $f$ and all its derivatives have *at most polynomial growth* (thus for each $j \geq 0$ there exist $C, k \geq 0$ such that $|\nabla^j f(x)| \leq C (1+|x|)^k$ for all $x \in {\bf R}^d$) then the product of a tempered distribution with $f$ is again a tempered distribution. Give a counterexample to show that this statement fails if the polynomial growth hypotheses are dropped.
- Show that the translate of a tempered distribution is again a tempered distribution.

But we can now add a new operation to this list using (5): as the Fourier transform maps Schwartz functions continuously to Schwartz functions, it also continuously maps the space of tempered distributions to itself. One can also define the inverse Fourier transform on tempered distributions in a similar manner.

It is not difficult to extend many of the properties of the Fourier transform from Schwartz functions to distributions. For instance:

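Exercise 31 Let $\lambda$ be a tempered distribution, and let $f$ be a Schwartz function.

- (Inversion formula) Show that ${\mathcal F}^* {\mathcal F} \lambda = {\mathcal F} {\mathcal F}^* \lambda = \lambda$.
- (Multiplication intertwines with convolution) Show that ${\mathcal F}(\lambda f) = ({\mathcal F} \lambda) * ({\mathcal F} f)$ and ${\mathcal F}(\lambda * f) = ({\mathcal F} \lambda) ({\mathcal F} f)$.
- (Translation intertwines with modulation) For any $x_0 \in {\bf R}^d$, show that ${\mathcal F}(\tau_{x_0} \lambda) = e_{-x_0} {\mathcal F} \lambda$, where $e_{-x_0}(\xi) := e^{-2\pi i \xi \cdot x_0}$. Similarly, show that for any $\xi_0 \in {\bf R}^d$, one has ${\mathcal F}(e_{\xi_0} \lambda) = \tau_{\xi_0} {\mathcal F} \lambda$.
- (Linear transformations) For any invertible linear transformation $L: {\bf R}^d \to {\bf R}^d$, show that ${\mathcal F}(\lambda \circ L) = \frac{1}{|\det L|} ({\mathcal F} \lambda) \circ (L^*)^{-1}$.
- (Differentiation intertwines with polynomial multiplication) For any $1 \leq j \leq d$, show that ${\mathcal F}( \frac{\partial}{\partial x_j} \lambda ) = 2\pi i \xi_j {\mathcal F} \lambda$, where $x_j$ and $\xi_j$ are the $j^{\rm th}$ coordinate functions in physical space and frequency space respectively, and similarly ${\mathcal F}( -2\pi i x_j \lambda ) = \frac{\partial}{\partial \xi_j} {\mathcal F} \lambda$.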

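Exercise 32 Let $d \geq 1$.

- (Inversion formula) Show that ${\mathcal F} \delta = 1$ and ${\mathcal F} 1 = \delta$.
- (Orthogonality) Let $V$ be a subspace of ${\bf R}^d$, and let $\mu$ be Lebesgue measure on $V$. Show that ${\mathcal F} \mu$ is Lebesgue measure on the orthogonal complement $V^\perp$ of $V$. (Note that this generalises the previous exercise.)
- (Poisson summation formula) Let $\sum_{k \in {\bf Z}^d} \tau_k \delta$ be the distribution $$\langle f, \sum_{k \in {\bf Z}^d} \tau_k \delta \rangle := \sum_{k \in {\bf Z}^d} f(k).$$ Show that this is a tempered distribution which is equal to its own Fourier transform.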
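As a sanity check of what the Poisson summation formula asserts, one can test the identity $\sum_{k} f(k) = \sum_{k} \hat f(k)$ on a Gaussian in one dimension. The sketch below assumes the convention $\hat f(\xi) = \int f(x) e^{-2\pi i x \xi}\ dx$, under which $f(x) = e^{-\pi t^2 x^2}$ has Fourier transform $t^{-1} e^{-\pi \xi^2/t^2}$; the function names and the truncation `cutoff` are hypothetical choices for illustration, and the check is of course not a proof.

```python
import math

def gaussian(x, t):
    """f(x) = exp(-pi t^2 x^2)."""
    return math.exp(-math.pi * t * t * x * x)

def gaussian_hat(xi, t):
    """Fourier transform of f under the e^{-2 pi i x xi} convention."""
    return math.exp(-math.pi * xi * xi / (t * t)) / t

def lattice_sum(func, t, cutoff=200):
    """Sum of func over the integers -cutoff..cutoff (the tails are negligible here)."""
    return sum(func(k, t) for k in range(-cutoff, cutoff + 1))

for t in (0.5, 1.0, 2.0):
    print(t, lattice_sum(gaussian, t), lattice_sum(gaussian_hat, t))
```

The two columns agree to machine precision for each $t$, as one expects if the Dirac comb is its own Fourier transform.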

One can use these properties of tempered distributions to start solving constant-coefficient PDE. We first illustrate this by an ODE example, showing how the formal symbolic calculus for solving such ODE that you may have seen as an undergraduate, can now be (sometimes) justified using tempered distributions.

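Exercise 33 Let $d = 1$, let $a, b$ be real numbers, and let $D$ be the operator $D = \frac{d}{dx}$.

- If $a \neq b$, use the Fourier transform to show that all tempered distribution solutions to the ODE $(D - ia)(D - ib) \lambda = 0$ are of the form $\lambda = A e^{iax} + B e^{ibx}$ for some constants $A, B$.
- If $a = b$, show that all tempered distribution solutions to the ODE $(D - ia)^2 \lambda = 0$ are of the form $\lambda = A e^{iax} + B x e^{iax}$ for some constants $A, B$.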

Remark 4 More generally, one can solve any homogeneous constant-coefficient ODE using tempered distributions and the Fourier transform so long as the roots of the characteristic polynomial are purely imaginary. In all other cases, solutions can grow exponentially as $x \to +\infty$ or $x \to -\infty$ and so are not tempered. There are other theories of generalised functions that can handle these objects (e.g. hyperfunctions) but we will not discuss them here.

Now we turn to PDE. To illustrate the method, let us focus on solving Poisson’s equation

$$\Delta u = f \ \ \ \ \ (6)$$

in ${\bf R}^d$, where $f$ is a Schwartz function, $u$ is a distribution, and $\Delta := \sum_{j=1}^d \frac{\partial^2}{\partial x_j^2}$ is the Laplacian. (In some texts, particularly those using spectral analysis, the Laplacian is occasionally defined instead as $-\sum_{j=1}^d \frac{\partial^2}{\partial x_j^2}$, to make it positive semi-definite, but we will eschew that sign convention here, though of course the theory is only changed in a trivial fashion if one adopts it.)

We first settle the question of uniqueness:

Exercise 34 Let $d \geq 1$. Using the Fourier transform, show that the only tempered distributions $\lambda$ which are harmonic (by which we mean that $\Delta \lambda = 0$ in the sense of distributions) are the harmonic polynomials. (Hint: use Exercise 22.) Note that this generalises Liouville’s theorem. There are of course many other harmonic functions than the harmonic polynomials, e.g. $e^x \cos(y)$, but such functions are not tempered distributions.

From the above exercise, we know that the solution $u$ to (6), if tempered, is defined up to harmonic polynomials. To find a solution, we observe that it is enough to find a *fundamental solution*, i.e. a tempered distribution $K$ solving the equation

$$\Delta K = \delta.$$

Indeed, if one then convolves this equation with the Schwartz function $f$, and uses the identity $(\Delta K) * f = \Delta (K * f)$ (which can either be seen directly, or by using Exercise 31), we see that $u = K * f$ will be a tempered distribution solution to (6) (and all the other solutions will equal this solution plus a harmonic polynomial). So, it is enough to locate a fundamental solution $K$. We can take Fourier transforms and rewrite this equation as

$$-4\pi^2 |\xi|^2 \hat K(\xi) = 1$$

(here we are treating the tempered distribution $\hat K$ as a function to emphasise that the dependent variable is now $\xi$). It is then natural to propose to solve this equation as

$$\hat K(\xi) = \frac{1}{-4\pi^2 |\xi|^2}, \ \ \ \ \ (7)$$

though this may not be the unique solution (for instance, one is free to modify $\hat K$ by a multiple of the Dirac delta function, cf. Exercise 16).

A short computation in polar coordinates shows that $\frac{1}{-4\pi^2 |\xi|^2}$ is locally integrable in dimensions $d \geq 3$, so the right-hand side of (7) makes sense. To then compute $K$ explicitly, we have from the distributional inversion formula that

$$K = \frac{-1}{4\pi^2} {\mathcal F}^* |\xi|^{-2},$$

so we now need to figure out what the Fourier transform of a negative power of $|x|$ (or the adjoint Fourier transform of a negative power of $|\xi|$) is.

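Let us work formally at first, and consider the problem of computing the Fourier transform of the function $|x|^{-\alpha}$ in ${\bf R}^d$ for some exponent $\alpha$. A direct attack, based on evaluating the (formal) Fourier integral

$$\widehat{|x|^{-\alpha}}(\xi) = \int_{{\bf R}^d} |x|^{-\alpha} e^{-2\pi i \xi \cdot x}\ dx \ \ \ \ \ (8)$$

does not seem to make much sense (the integrand is not absolutely integrable), although a change of variables (or dimensional analysis) heuristic can at least lead to the prediction that the integral (8) should be some multiple of $|\xi|^{\alpha-d}$. But which multiple should it be? To continue the formal calculation, we can write the non-integrable function $|x|^{-\alpha}$ as an average of integrable functions whose Fourier transforms are already known. There are many such functions that one could use here, but it is natural to use Gaussians, as they have a particularly pleasant Fourier transform, namely

$$\widehat{e^{-\pi t^2 |x|^2}}(\xi) = t^{-d} e^{-\pi |\xi|^2/t^2}$$

for $t > 0$ (see Exercise 42 of Notes 2). To get from Gaussians to $|x|^{-\alpha}$, one can observe that $|x|^{-\alpha}$ is invariant under the scaling $f(x) \mapsto t^\alpha f(tx)$ for $t > 0$. Thus, it is natural to average the standard Gaussian $e^{-\pi |x|^2}$ with respect to this scaling, thus producing the function $t^\alpha e^{-\pi t^2 |x|^2}$, then integrate with respect to the multiplicative Haar measure $\frac{dt}{t}$. A straightforward change of variables then gives the identity

$$\int_0^\infty t^\alpha e^{-\pi t^2 |x|^2}\ \frac{dt}{t} = \frac{1}{2} \pi^{-\alpha/2} |x|^{-\alpha} \Gamma(\alpha/2)$$

where

$$\Gamma(s) := \int_0^\infty t^s e^{-t}\ \frac{dt}{t}$$

is the Gamma function. If we formally take Fourier transforms of this identity, we obtain

$$\int_0^\infty t^\alpha t^{-d} e^{-\pi |\xi|^2/t^2}\ \frac{dt}{t} = \frac{1}{2} \pi^{-\alpha/2} \widehat{|x|^{-\alpha}}(\xi) \Gamma(\alpha/2).$$

Another change of variables shows that

$$\int_0^\infty t^\alpha t^{-d} e^{-\pi |\xi|^2/t^2}\ \frac{dt}{t} = \frac{1}{2} \pi^{-(d-\alpha)/2} |\xi|^{-(d-\alpha)} \Gamma((d-\alpha)/2)$$

and so we conclude (formally) that

$$\widehat{|x|^{-\alpha}}(\xi) = \frac{\pi^{-(d-\alpha)/2} \Gamma((d-\alpha)/2)}{\pi^{-\alpha/2} \Gamma(\alpha/2)} |\xi|^{-(d-\alpha)}, \ \ \ \ \ (9)$$

thus solving the problem of what the constant multiple of $|\xi|^{-(d-\alpha)}$ should be.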
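The two changes of variables in this formal computation can be sanity-checked numerically. The sketch below assumes the identities take the form $\int_0^\infty t^\alpha e^{-\pi t^2 r^2}\ \frac{dt}{t} = \frac{1}{2}\pi^{-\alpha/2} r^{-\alpha}\Gamma(\alpha/2)$ and $\int_0^\infty t^{\alpha-d} e^{-\pi\rho^2/t^2}\ \frac{dt}{t} = \frac{1}{2}\pi^{-(d-\alpha)/2}\rho^{-(d-\alpha)}\Gamma((d-\alpha)/2)$, and evaluates the left-hand sides with a plain trapezoid rule after the substitution $t = e^s$ (so that $\frac{dt}{t} = ds$). The function names and truncation ranges are hypothetical.

```python
import math

def log_trapezoid(integrand, lo, hi, steps=20000):
    """Trapezoid rule in the variable s = log t over [lo, hi]."""
    h = (hi - lo) / steps
    vals = [integrand(lo + i * h) for i in range(steps + 1)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

def physical_side(r, alpha):
    """Approximates int_0^inf t^alpha exp(-pi t^2 r^2) dt/t."""
    f = lambda s: math.exp(alpha * s - math.pi * r * r * math.exp(2.0 * s))
    return log_trapezoid(f, -40.0, 6.0)

def frequency_side(rho, alpha, d):
    """Approximates int_0^inf t^(alpha-d) exp(-pi rho^2 / t^2) dt/t."""
    f = lambda s: math.exp((alpha - d) * s - math.pi * rho * rho * math.exp(-2.0 * s))
    return log_trapezoid(f, -6.0, 40.0)

def half_gamma_factor(r, beta):
    """Closed form (1/2) pi^(-beta/2) r^(-beta) Gamma(beta/2)."""
    return 0.5 * math.pi ** (-beta / 2) * r ** (-beta) * math.gamma(beta / 2)

print(physical_side(1.3, 2.0), half_gamma_factor(1.3, 2.0))
print(frequency_side(0.7, 2.0, 3), half_gamma_factor(0.7, 1.0))
```

Each printed pair should agree to many digits, which supports (but does not replace) the change-of-variables argument.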

Exercise 35 Give a rigorous proof of (9) for $0 < \alpha < d$ (when both sides are locally integrable) in the sense of distributions. (Hint: basically, one needs to test the entire formal argument against an arbitrary Schwartz function.) The identity (9) can in fact be continued meromorphically in $\alpha$, but the interpretation of distributions such as $|x|^{-\alpha}$ when $|x|^{-\alpha}$ is not locally integrable is somewhat complicated (cf. Exercise 12) and will not be discussed here.

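Specialising back to the current situation with $d = 3$, $\alpha = 2$, and using the standard identities

$$\Gamma(n) = (n-1)!; \quad \Gamma(\tfrac{1}{2}) = \sqrt{\pi}$$

we see that

$$\widehat{\frac{1}{|x|^2}}(\xi) = \pi |\xi|^{-1}$$

and similarly

$${\mathcal F}^* \frac{1}{|\xi|^2} = \pi |x|^{-1}$$

and so from (7) we see that one choice of the fundamental solution $K$ is the Newton potential

$$K(x) = \frac{-1}{4\pi |x|},$$

leading to an explicit (and rigorously derived) solution

$$u(x) := f * K(x) = -\frac{1}{4\pi} \int_{{\bf R}^3} \frac{f(y)}{|x-y|}\ dy \ \ \ \ \ (10)$$

to the Poisson equation (6) in $d = 3$ for Schwartz functions $f$. (This is not quite the only fundamental solution $K$ available; one can add a harmonic polynomial to $K$, which will end up adding a harmonic polynomial to $u$, since the convolution of a harmonic polynomial with a Schwartz function is easily seen to still be harmonic.)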
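The arithmetic behind the Newton potential constant can be reproduced in a few lines. This sketch assumes that (9) reads $\widehat{|x|^{-\alpha}}(\xi) = c_{d,\alpha} |\xi|^{-(d-\alpha)}$ with $c_{d,\alpha} = \frac{\pi^{-(d-\alpha)/2}\Gamma((d-\alpha)/2)}{\pi^{-\alpha/2}\Gamma(\alpha/2)}$, and that $K = \frac{-1}{4\pi^2} {\mathcal F}^* |\xi|^{-2}$; the names `riesz_constant` and `newton_coefficient` are hypothetical.

```python
import math

def riesz_constant(d, alpha):
    """Constant c with (formally) FT(|x|^-alpha) = c |xi|^-(d-alpha), for 0 < alpha < d."""
    numerator = math.pi ** (-(d - alpha) / 2) * math.gamma((d - alpha) / 2)
    denominator = math.pi ** (-alpha / 2) * math.gamma(alpha / 2)
    return numerator / denominator

def newton_coefficient():
    """Coefficient of 1/|x| in K when d = 3: -(1 / (4 pi^2)) * c_{3,2}."""
    return -riesz_constant(3, 2) / (4 * math.pi ** 2)

print(riesz_constant(3, 2), math.pi)               # both approximately 3.14159...
print(newton_coefficient(), -1 / (4 * math.pi))    # both approximately -0.0795...
```

A useful consistency check is that `riesz_constant(d, alpha) * riesz_constant(d, d - alpha)` equals 1, as it must if applying the transform twice returns the original (even) function.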

Exercise 36 Without using the theory of distributions, give an alternate (and still rigorous) proof that the function $u$ defined in (10) solves (6) in $d = 3$.

Exercise 37

- Show that for any $d \geq 3$, a fundamental solution $K$ to the Poisson equation is given by the locally integrable function $$K(x) = \frac{1}{d(2-d)\omega_d} \frac{1}{|x|^{d-2}},$$ where $\omega_d = \pi^{d/2} / \Gamma(\frac{d}{2}+1)$ is the volume of the unit ball in $d$ dimensions.
- Show that for $d = 1$, a fundamental solution is given by the locally integrable function $K(x) = |x|/2$.
- Show that for $d = 2$, a fundamental solution is given by the locally integrable function $K(x) = \frac{1}{2\pi} \log |x|$.

Thus we see that for the Poisson equation, $d = 2$ is a “critical” dimension, requiring a logarithmic correction to the usual formula.
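A rough numerical check of these formulas is possible away from the origin. The sketch below assumes the candidate kernels $K(x) = |x|/2$ for $d = 1$, $K(x) = \frac{1}{2\pi}\log|x|$ for $d = 2$, and $K(x) = \frac{1}{d(2-d)\omega_d}|x|^{2-d}$ for $d \geq 3$ with $\omega_d = \pi^{d/2}/\Gamma(\frac{d}{2}+1)$. It tests two necessary conditions: the finite-difference Laplacian vanishes away from $0$, and the outward flux of $\nabla K$ through a sphere is $1$ (which by the divergence theorem is what $\Delta K = \delta$ predicts). All function names are hypothetical, and this is not a substitute for the distributional proof.

```python
import math

def unit_ball_volume(d):
    """Volume of the unit ball in d dimensions."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)

def kernel(x):
    """Candidate fundamental solution at a nonzero point x (tuple of d coordinates)."""
    d = len(x)
    r = math.sqrt(sum(c * c for c in x))
    if d == 1:
        return r / 2
    if d == 2:
        return math.log(r) / (2 * math.pi)
    return r ** (2 - d) / (d * (2 - d) * unit_ball_volume(d))

def laplacian(f, x, h=1e-3):
    """Second-order central-difference approximation of the Laplacian of f at x."""
    total = 0.0
    for j in range(len(x)):
        up = tuple(c + h if i == j else c for i, c in enumerate(x))
        down = tuple(c - h if i == j else c for i, c in enumerate(x))
        total += (f(up) - 2 * f(x) + f(down)) / (h * h)
    return total

def flux_through_sphere(d, r=0.8, h=1e-5):
    """Radial derivative of the kernel times the surface area d * omega_d * r^(d-1)."""
    point = lambda s: (s,) + (0.0,) * (d - 1)
    radial_derivative = (kernel(point(r + h)) - kernel(point(r - h))) / (2 * h)
    return radial_derivative * d * unit_ball_volume(d) * r ** (d - 1)

print(laplacian(kernel, (0.5, 0.2, -0.7)))            # close to 0
print([flux_through_sphere(d) for d in range(1, 6)])  # each close to 1
```

In $d = 3$ the general formula reduces to $-1/(4\pi|x|)$, consistent with the Newton potential.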

Similar methods can solve other constant coefficient linear PDE. We give some standard examples in the exercises below.

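Exercise 38 Let $d \geq 1$. Show that a smooth solution $u(t,x)$ to the heat equation $\partial_t u = \Delta u$ with initial data $u(0,x) = f(x)$ for some Schwartz function $f$ is given by $u(t) = f * K_t$ for $t > 0$, where $K_t$ is the *heat kernel* $$K_t(x) = \frac{1}{(4\pi t)^{d/2}} e^{-|x|^2/4t}.$$ (This solution is unique assuming certain smoothness and decay conditions at infinity, but we will not pursue this issue here.)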
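Two basic properties of the heat kernel can be checked numerically in one dimension. The sketch below assumes the normalisation $\partial_t u = \Delta u$ and the kernel $K_t(x) = (4\pi t)^{-1/2} e^{-x^2/4t}$ for $d = 1$; it verifies by finite differences that $\partial_t K_t - \partial_{xx} K_t \approx 0$ and by quadrature that $K_t$ has total mass $1$ (which is what makes $K_t$ an approximation to the identity as $t \to 0$). The function names are hypothetical.

```python
import math

def heat_kernel(t, x):
    """One-dimensional heat kernel (4 pi t)^(-1/2) exp(-x^2 / (4 t)), for t > 0."""
    return math.exp(-x * x / (4 * t)) / math.sqrt(4 * math.pi * t)

def heat_residual(t, x, h=1e-3):
    """Central-difference approximation of (d/dt - d^2/dx^2) K_t(x)."""
    dt = (heat_kernel(t + h, x) - heat_kernel(t - h, x)) / (2 * h)
    dxx = (heat_kernel(t, x + h) - 2 * heat_kernel(t, x) + heat_kernel(t, x - h)) / (h * h)
    return dt - dxx

def total_mass(t, width=40.0, steps=40000):
    """Riemann sum for the integral of K_t over [-width, width]."""
    h = 2 * width / steps
    return h * sum(heat_kernel(t, -width + i * h) for i in range(steps + 1))

print(heat_residual(0.5, 0.3))   # close to 0
print(total_mass(0.1))           # close to 1
```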

Exercise 39 Let $d \geq 1$. Show that a smooth solution $u(t,x)$ to the Schrödinger equation $i \partial_t u + \Delta u = 0$ with initial data $u(0,x) = f(x)$ for some Schwartz function $f$ is given by $u(t) = f * K_t$ for $t \neq 0$, where $K_t$ is the *Schrödinger kernel* $$K_t(x) = \frac{1}{(4\pi i t)^{d/2}} e^{i|x|^2/4t}$$ and we use the standard branch of the complex logarithm (with cut on the negative real axis) to define $(4\pi i t)^{d/2}$. (Hint: You may wish to investigate the Fourier transform of $e^{-z|\xi|^2}$, where $z$ is a complex number with positive real part, and then let $z$ approach the imaginary axis.) (The close similarity with the heat kernel is a manifestation of Wick rotation in action. However, from an analytical viewpoint, the two kernels are very different. For instance, the convergence of $f * K_t$ to $f$ as $t \to 0$ follows in the heat kernel case by the theory of approximations to the identity, whereas the convergence in the Schrödinger case is much more subtle, and is best seen via Fourier analysis.)

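Exercise 40 Let $d = 3$. Show that a smooth solution $u(t,x)$ to the wave equation $-\partial_{tt} u + \Delta u = 0$ with initial data $u(0,x) = f(x)$, $\partial_t u(0,x) = g(x)$ for some Schwartz functions $f, g$ is given by the formula

$$u(t) = f * \partial_t K_t + g * K_t$$

for $t \neq 0$, where $K_t$ is the distribution

$$\langle f, K_t \rangle := \frac{t}{4\pi} \int_{S^2} f(t\omega)\ d\omega,$$

where $d\omega$ is Lebesgue measure on the sphere $S^2$, and the derivative $\partial_t K_t$ is defined in the Newtonian sense $\partial_t K_t := \lim_{dt \to 0} \frac{K_{t+dt} - K_t}{dt}$, with the limit taken in the sense of distributions.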

Remark 5 The theory of (tempered) distributions is also highly effective for studying variable coefficient linear PDE, especially if the coefficients are fairly smooth, and particularly if one is primarily interested in the singularities of solutions to such PDE and how they propagate; here the Fourier transform must be augmented with more general transforms of this type, such as Fourier integral operators. A classic reference for this topic is the four volumes of Hörmander’s “The analysis of linear partial differential operators”. For nonlinear PDE, subspaces of the space of distributions, such as Sobolev spaces, tend to be more useful.

## 95 comments

8 May, 2014 at 10:22 pm

Anonymous

Dear Dr. Tao,

In the process of solving Exercise 40, how do we define the Fourier transform of a tempered distribution? For example, let \lambda be a tempered distribution; is it (F\lambda)(f)=\lambda(Ff) or (F\lambda)(f)=\lambda(F^{-1}f), where F denotes the Fourier transform?

9 May, 2014 at 7:25 am

Terence Tao

See equation (5).

9 May, 2014 at 8:09 pm

Anonymous

Dear Dr. Tao,

Are we supposed to use Kirchhoff’s formula to get the distribution K_{t} in Exercise 40?

Thanks

14 May, 2014 at 8:13 pm

Anonymous

Dear Dr. Tao,

In Exercise 40, what is the meaning of “Newtonian sense”, and how is it used in solving this exercise?

Thanks!

14 May, 2014 at 8:28 pm

Terence Tao

http://en.wikipedia.org/wiki/Newton_quotient

14 May, 2014 at 10:47 pm

Anonymous

Just a minor correction: in Exercise 40, shouldn’t it be “… for some Schwartz functions {f} and {g} …” instead of “… for some Schwartz functions {f} is given by the formula …”?

[Corrected, thanks – T.]

26 August, 2015 at 6:52 am

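Anonymous

In some books, a distribution on ${\bf R}^d$ is defined as a linear functional $\lambda$ on $C_c^\infty({\bf R}^d)$ with the following property: suppose $f_n$ is a sequence in $C_c^\infty({\bf R}^d)$ such that there exists a compact subset $K$ of ${\bf R}^d$ with ${\rm supp}(f_n) \subset K$ for all $n$, and $\partial^\alpha f_n \to 0$ uniformly as $n \to \infty$ for all multi-indices $\alpha$. Then $\lambda(f_n) \to 0$ as $n \to \infty$.

Is it exactly the same as the distributions defined in this note?

[Yes; see Exercise 4(iii). -T.]

27 August, 2015 at 10:52 am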

Anonymous

What is the weak derivative of the distribution defined in Exercise 12? It seems that p.v.(1/x^2) is not well defined…

27 August, 2015 at 11:17 am

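Terence Tao

If one chases the definitions and integrates by parts, one has

$$\langle f, (\hbox{p.v.} \tfrac{1}{x})' \rangle = -\langle f', \hbox{p.v.} \tfrac{1}{x} \rangle = -\lim_{\varepsilon \to 0} \int_{|x| > \varepsilon} \frac{f'(x)}{x}\ dx = -\lim_{\varepsilon \to 0} \left( \int_{|x| > \varepsilon} \frac{f(x)}{x^2}\ dx - \frac{f(\varepsilon) + f(-\varepsilon)}{\varepsilon} \right) = -\lim_{\varepsilon \to 0} \left( \int_{|x| > \varepsilon} \frac{f(x)}{x^2}\ dx - \frac{2 f(0)}{\varepsilon} \right).$$

So the derivative of p.v. 1/x is a renormalised principal value of -1/x^2.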

27 August, 2015 at 12:10 pm

Anonymous

Hmm, how can one get the last line?

[Taylor expansion -T.]

28 August, 2015 at 7:21 am

Anonymous

It seems that this relates to https://en.wikipedia.org/wiki/Hadamard_regularization

But I don’t see why the limit exists…

28 August, 2015 at 7:35 am

Anonymous

Can one say that the limit in the last line exists since it is “equal” to , the existence of which has been proven in Ex. 12? It looks like a circular argument to me since in the very beginning one assumes the existence of .

[Yes, the existence of the earlier limits (which indeed follows from the easily verified fact that the distributional derivative of a distribution is again a distribution, together with Exercise 12) can be used to establish the existence of the later limits. It is also instructive to establish (again, using Taylor expansion) directly that the later limit is of a Cauchy sequence and thus convergent. -T.]

29 August, 2015 at 1:25 pm

Anonymous

I read this somewhere before, but I don’t remember exactly in which book:

Let $(a_n)$ be a strictly decreasing (I don’t remember if it is increasing or decreasing) positive sequence such that . If we define

where , then converges (uniformly) to a function in .

Has anybody here seen a reference for such constructions of a nontrivial test function before? It seems that Exercise 1(i) is more standard.

18 September, 2015 at 1:25 pm

Anonymous

Stop cheating on math M541 homework.

9 November, 2015 at 6:52 am

Anonymous

When constructing an approximation to the identity, is there a particular reason that one chooses a sequence instead of a continuous version ?

9 November, 2015 at 7:00 am

Anonymous

In Exercise 5(ii), does one need to *assume* that first in order to argue that “ converges in to “?

9 November, 2015 at 10:25 am

Terence Tao

No, this is implicitly part of the conclusion (but this follows easily from Young’s inequality or Minkowski’s inequality).

If one only assumes local integrability or on , then one needs compact support of the approximating sequence , as otherwise the tail of could interact with an arbitrarily large growth of and it is not even immediate that is anywhere finite, let alone convergent to anything interesting.

In most applications there is not much difference between using a discrete sequence of approximations rather than a continuous sequence, but using a countable discrete sequence makes it essentially trivial to establish measurability of any limit objects obtained, and also there are some sequential compactness results one can exploit when working with a discrete sequence that are not easily available in the continuous setting. (Of course in many situations one can use topological compactness as a substitute for sequential compactness, e.g. use topological Banach-Alaoglu in place of sequential Banach-Alaoglu.)

9 November, 2015 at 7:12 am

Anonymous

If is locally integrable or more generally , can one have the similar fact in Exercise 5?

9 November, 2015 at 11:55 am

Anonymous

In Exercise 1(iv), what is ? I’ve looked it up in your textbook and I didn’t find it there.

[See Exercise 1.10.7 (or Exercise 7 of 245B Notes 12), or also Exercise 1.10.17, Remark 1.10.16, etc.]

9 November, 2015 at 12:03 pm

Anonymous

In Exercise 9

if a sequence of locally integrable functions converge in to a limit…

What is the topology defined for ?

[The topology of is the topology generated by the seminorms for compact , as in Example 1.9.5. -T]

30 November, 2015 at 5:49 pm

Anonymous

Can one safely replace with any open set in this note to get the theory of distribution on ?

30 November, 2015 at 6:53 pm

Terence Tao

The theory of distributions localises fairly easily (basically just replace with ). But the theory of *tempered* distributions is significantly more difficult to work with on domains, because it is not obvious how to define the Schwartz class on a domain (and because the main tool that makes the tempered distribution concept useful, namely the distributional Fourier transform, is not obviously available).

31 December, 2015 at 2:39 pm

Anonymous

Given a distribution , can we in general find a distribution with as its distributional derivative?

[Yes; this is a good exercise for you to establish. The key point is that the test functions of mean zero form a hyperplane in the space of all test functions, in that any test function can be made mean zero by subtraction of a scalar multiple of a fixed reference test function. -T.]

20 January, 2016 at 11:50 am

Anonymous

How do we see that “ embeds ‘continuously’ into ”?

[Easiest way is via nets – show that every net that converges in also converges in the Schwartz topology. Or one can explicitly show that pullbacks of basic open sets are open. -T.]