In set theory, a function {f: X \rightarrow Y} is defined as an object that evaluates every input {x} to exactly one output {f(x)}. However, in various branches of mathematics, it has become convenient to generalise this classical concept of a function to a more abstract one. For instance, in operator algebras, quantum mechanics, or non-commutative geometry, one often replaces commutative algebras of (real or complex-valued) functions on some space {X}, such as {C(X)} or {L^\infty(X)}, with a more general – and possibly non-commutative – algebra (e.g. a {C^*}-algebra or a von Neumann algebra). Elements in this more abstract algebra are no longer definable as functions in the classical sense of assigning a single value {f(x)} to every point {x \in X}, but one can still define other operations on these “generalised functions” (e.g. one can multiply or take inner products between two such objects).

Generalisations of functions are also very useful in analysis. In our study of {L^p} spaces, we have already seen one such generalisation, namely the concept of a function defined up to almost everywhere equivalence. Such a function {f} (or more precisely, an equivalence class of classical functions) cannot be evaluated at any given point {x}, if that point has measure zero. However, it is still possible to perform algebraic operations on such functions (e.g. multiplying or adding two functions together), and one can also integrate such functions on measurable sets (provided, of course, that the function satisfies some suitable integrability condition). We also know that the {L^p} spaces can usually be described via duality, as the dual space of {L^{p'}} (except in some endpoint cases, namely when {p=1}, or when {p=\infty} and the underlying space is not {\sigma}-finite).

We have also seen (via the Lebesgue-Radon-Nikodym theorem) that locally integrable functions {f \in L^1_{\hbox{loc}}({\bf R})} on, say, the real line {{\bf R}}, can be identified with locally finite absolutely continuous measures {m_f} on the line, by multiplying Lebesgue measure {m} by the function {f}. So another way to generalise the concept of a function is to consider arbitrary locally finite Radon measures {\mu} (not necessarily absolutely continuous), such as the Dirac measure {\delta_0}. With this concept of “generalised function”, one can still add and subtract two measures {\mu, \nu}, and integrate any measure {\mu} against a (bounded) measurable set {E} to obtain a number {\mu(E)}, but one cannot evaluate a measure {\mu} (or more precisely, the Radon-Nikodym derivative {d\mu/dm} of that measure) at a single point {x}, and one also cannot multiply two measures together to obtain another measure. From the Riesz representation theorem, we also know that the space of (finite) Radon measures can be described via duality, as the bounded linear functionals on {C_c({\bf R})} (with the uniform norm).

There is an even larger class of generalised functions that is very useful, particularly in linear PDE, namely the space of distributions, say on a Euclidean space {{\bf R}^d}. In contrast to Radon measures {\mu}, which can be defined by how they “pair up” against continuous, compactly supported test functions {f \in C_c({\bf R}^d)} to create numbers {\langle f, \mu \rangle := \int_{{\bf R}^d} f\ d\overline{\mu}}, a distribution {\lambda} is defined by how it pairs up against a smooth compactly supported function {f \in C^\infty_c({\bf R}^d)} to create a number {\langle f, \lambda \rangle}. As the space {C^\infty_c({\bf R}^d)} of smooth compactly supported functions is smaller than (but dense in) the space {C_c({\bf R}^d)} of continuous compactly supported functions (and has a stronger topology), the space of distributions is larger than that of measures. But the space {C^\infty_c({\bf R}^d)} is closed under more operations than {C_c({\bf R}^d)}, and in particular is closed under differential operators (with smooth coefficients). Because of this, the space of distributions is similarly closed under such operations; in particular, one can differentiate a distribution and get another distribution, which is something that is not always possible with measures or {L^p} functions. But as measures or functions can be interpreted as distributions, this leads to the notion of a weak derivative for such objects, which makes sense (but only as a distribution) even for functions that are not classically differentiable. Thus the theory of distributions can allow one to rigorously manipulate rough functions “as if” they were smooth, although one must still be careful as some operations on distributions are not well-defined, most notably the operation of multiplying two distributions together. Nevertheless one can use this theory to justify many formal computations involving derivatives, integrals, etc. (including several computations used routinely in physics) that would be difficult to formalise rigorously in a purely classical framework.

If one shrinks the space of distributions slightly, to the space of tempered distributions (which is formed by enlarging the class {C^\infty_c({\bf R}^d)} of test functions, against which distributions are paired, to the Schwartz class {{\mathcal S}({\bf R}^d)}), then one obtains closure under another important operation, namely the Fourier transform. This allows one to define various Fourier-analytic operations (e.g. pseudodifferential operators) on such distributions.

Of course, at the end of the day, one is usually not all that interested in distributions in their own right, but would like to be able to use them as a tool to study more classical objects, such as smooth functions. Fortunately, one can recover facts about smooth functions from facts about the (far rougher) space of distributions in a number of ways. For instance, if one convolves a distribution with a smooth, compactly supported function, one gets back a smooth function. This is a particularly useful fact in the theory of constant-coefficient linear partial differential equations such as {Lu=f}, as it allows one to recover a smooth solution {u} from smooth, compactly supported data {f} by convolving {f} with a specific distribution {G}, known as the fundamental solution of {L}. We will give some examples of this later in these notes.

It is this unusual and useful combination of both being able to pass from classical functions to generalised functions (e.g. by differentiation) and then back from generalised functions to classical functions (e.g. by convolution) that sets the theory of distributions apart from other competing theories of generalised functions, in particular allowing one to justify many formal calculations in PDE and Fourier analysis rigorously with relatively little additional effort. On the other hand, being defined by linear duality, the theory of distributions becomes somewhat less useful when one moves to more nonlinear problems, such as nonlinear PDE. However, distributions still serve an important supporting role in such problems as an “ambient space” of functions, inside of which one carves out more useful function spaces, such as Sobolev spaces, which we will discuss in the next set of notes.

— 1. Smooth functions with compact support —

In the rest of the notes we will work on a fixed Euclidean space {{\bf R}^d}. (One can also define distributions on other domains related to {{\bf R}^d}, such as open subsets of {{\bf R}^d}, or {d}-dimensional manifolds, but for simplicity we shall restrict attention to Euclidean spaces in these notes.)

A test function is any smooth, compactly supported function {f: {\bf R}^d \rightarrow {\bf C}}; the space of such functions is denoted {C^\infty_c({\bf R}^d)}. (In some texts, this space is denoted {C^\infty_0({\bf R}^d)} instead.)

From analytic continuation one sees that there are no real-analytic test functions other than the zero function. Despite this negative result, test functions actually exist in abundance:

Exercise 1

  • (i) Show that there exists at least one test function that is not identically zero. (Hint: it suffices to do this for {d=1}. One starting point is to use the fact that the function {f: {\bf R} \rightarrow {\bf R}} defined by {f(x) := e^{-1/x}} for {x > 0} and {f(x) := 0} otherwise is smooth, even at the origin {0}.)
  • (ii) Show that if {f \in C^\infty_c({\bf R}^d)} and {g: {\bf R}^d \rightarrow {\bf R}} is absolutely integrable and compactly supported, then the convolution {f*g} is also in {C^\infty_c({\bf R}^d)}. (Hint: first show that {f*g} is continuously differentiable with {\nabla(f*g) = (\nabla f)*g}.)
  • (iii) ({C^\infty} Urysohn lemma) Let {K} be a compact subset of {{\bf R}^d}, and let {U} be an open neighbourhood of {K}. Show that there exists a function {f \in C^\infty_c({\bf R}^d)} supported in {U} which equals {1} on {K}. (Hint: use the ordinary Urysohn lemma to find a function in {C_c({\bf R}^d)} that equals {1} on a neighbourhood of {K} and is supported in a compact subset of {U}, then convolve this function by a suitable test function.)
  • (iv) Show that {C^\infty_c({\bf R}^d)} is dense in {C_0({\bf R}^d)} (in the uniform topology), and dense in {L^p({\bf R}^d)} (with the {L^p} topology) for all {0 < p < \infty}.
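For readers who like to experiment, here is a minimal Python sketch of parts (i) and (iii) in the case {d=1}. It uses an explicit formula for the smooth cutoff (a ratio of two copies of {e^{-1/x}}) rather than the convolution argument suggested in the hint; the helper names `F`, `bump`, `smooth_step` and `cutoff` are illustrative choices, not notation from these notes.

```python
import math

def F(t):
    """e^{-1/t} for t > 0 and 0 otherwise; smooth on all of R, even at t = 0."""
    return math.exp(-1.0 / t) if t > 0 else 0.0

def bump(x):
    """A test function on R that is not identically zero: positive on (-1, 1), zero elsewhere."""
    return F(1.0 - x * x)

def smooth_step(t):
    """Smooth function equal to 0 for t <= 0 and to 1 for t >= 1.

    The denominator never vanishes, since t > 0 or 1 - t > 0 for every real t.
    """
    return F(t) / (F(t) + F(1.0 - t))

def cutoff(x, a, b):
    """Smooth function equal to 1 on [-a, a] and to 0 outside (-b, b), for 0 < a < b.

    Near x = 0 the argument of smooth_step exceeds 1, so the kink of |x| at the
    origin is invisible: the function is locally constant there.
    """
    return smooth_step((b - abs(x)) / (b - a))
```

Here `cutoff(x, a, b)` plays the role of the function in part (iii) with {K = [-a,a]} and {U = (-b,b)}.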

The space {C^\infty_c({\bf R}^d)} is clearly a vector space. Now we place a (very strong!) topology on it. We first observe that {C^\infty_c({\bf R}^d) = \bigcup_K C^\infty_c(K)}, where {K} ranges over all compact subsets of {{\bf R}^d} and {C^\infty_c(K)} consists of those functions {f \in C^\infty_c({\bf R}^d)} which are supported in {K}. Each {C^\infty_c(K)} will be given a topology (called the smooth topology) generated by the norms

\displaystyle  \| f \|_{C^k} := \sup_{x \in {\bf R}^d} \sum_{j=0}^k|\nabla^j f(x)|

for {k=0,1,\ldots}, where we view {\nabla^j f(x)} as a {d^j}-dimensional vector (or, if one wishes, a {d}-dimensional rank {j} tensor); thus a sequence {f_n \in C^\infty_c(K)} converges to a limit {f \in C^\infty_c(K)} if and only if {\nabla^j f_n} converges uniformly to {\nabla^j f} for all {j=0,1,\ldots}. (This gives {C^\infty_c(K)} the structure of a Fréchet space, though we will not use this fact here.)

We make the trivial remark that if {K \subset K'} are compact sets, then {C^\infty_c(K)} is a subspace of {C^\infty_c(K')}, and the topology on the former space is the restriction of the topology of the latter space. Because of this, we are able to give {C^\infty_c({\bf R}^d)} a (very strong) topology as follows. Call a seminorm {\| \cdot \|} on {C^\infty_c({\bf R}^d)} good if it is a continuous function on {C^\infty_c(K)} for each compact {K} (or equivalently, the ball {\{ f \in C^\infty_c(K): \|f\| < 1 \}} is open in {C^\infty_c(K)} for each compact {K}). We then give {C^\infty_c({\bf R}^d)} the topology defined by all good seminorms. Clearly, this makes {C^\infty_c({\bf R}^d)} a (locally convex) topological vector space.

Exercise 2 Let {f_n} be a sequence in {C^\infty_c({\bf R}^d)}, and let {f} be another function in {C^\infty_c({\bf R}^d)}. Show that {f_n} converges in the topology of {C^\infty_c({\bf R}^d)} to {f} if and only if there exists a compact set {K} such that {f_n, f} are all supported in {K}, and {f_n} converges to {f} in the smooth topology of {C^\infty_c(K)}.

Exercise 3

  • (i) Show that the topology of {C^\infty_c(K)} is first countable for every compact {K}.
  • (ii) Show that the topology of {C^\infty_c({\bf R}^d)} is not first countable. (Hint: given any countable sequence of open neighbourhoods of {0}, build a new open neighbourhood that does not contain any of the previous ones, using the {\sigma}-compact nature of {{\bf R}^d}.)
  • (iii) As an additional challenge, construct a set {E \subset C^\infty_c({\bf R}^d)} such that {0} is an adherent point of {E}, but {0} is not the limit of any sequence in {E}.

There are plenty of continuous operations on {C^\infty_c({\bf R}^d)}:

Exercise 4

  • (i) Let {K} be a compact set. Show that a linear map {T: C^\infty_c(K) \rightarrow X} into a normed vector space {X} is continuous if and only if there exists {k \geq 0} and {C > 0} such that {\| Tf \|_X \leq C \|f\|_{C^{k}}} for all {f \in C^\infty_c(K)}.
  • (ii) Let {K, K'} be compact sets. Show that a linear map {T: C^\infty_c(K) \rightarrow C^\infty_c(K')} is continuous if and only if for every {k \geq 0} there exists {k' \geq 0} and a constant {C_k > 0} such that {\| Tf \|_{C^k} \leq C_k \|f\|_{C^{k'}}} for all {f \in C^\infty_c(K)}.
  • (iii) Show that a linear map {T: C^\infty_c({\bf R}^d) \rightarrow X} from the space of test functions into a topological vector space generated by some family of seminorms (i.e., a locally convex topological vector space) is continuous if and only if it is sequentially continuous (i.e. whenever {f_n} converges to {f} in {C^\infty_c({\bf R}^d)}, {Tf_n} converges to {Tf} in {X}), and if and only if {T: C^\infty_c(K) \rightarrow X} is continuous for each compact {K \subset {\bf R}^d}. Thus while first countability fails for {C^\infty_c({\bf R}^d)}, we have a serviceable substitute for this property.
  • (iv) Show that the inclusion map from {C^\infty_c({\bf R}^d)} to {L^p({\bf R}^d)} is continuous for every {0 < p \leq \infty}.
  • (v) Show that a linear map {T: C^\infty_c({\bf R}^d) \rightarrow C^\infty_c({\bf R}^d)} is continuous if and only if for every compact set {K \subset {\bf R}^d} there exists a compact set {K'} such that {T} maps {C^\infty_c(K)} continuously to {C^\infty_c(K')}.
  • (vi) Show that every linear differential operator with smooth coefficients is a continuous operation on {C^\infty_c({\bf R}^d)}.
  • (vii) Show that convolution with any absolutely integrable, compactly supported function is a continuous operation on {C^\infty_c({\bf R}^d)}.
  • (viii) Show that the product operation {f, g \mapsto fg} is continuous from {C^\infty_c({\bf R}^d) \times C^\infty_c({\bf R}^d)} to {C^\infty_c({\bf R}^d)}.

A sequence {\phi_n \in C_c({\bf R}^d)} of continuous, compactly supported functions is said to be an approximation to the identity if the {\phi_n} are non-negative, have total mass {\int_{{\bf R}^d} \phi_n} equal to {1}, and have supports shrinking to the origin, in the sense that for any fixed {r > 0}, {\phi_n} is supported on the ball {B(0,r)} for {n} sufficiently large. One can generate such a sequence by starting with a single non-negative continuous compactly supported function {\phi} of total mass {1}, and then setting {\phi_n(x) := n^d \phi(nx)}; many other constructions are possible also.

One has the following useful fact:

Exercise 5 Let {\phi_n \in C^\infty_c({\bf R}^d)} be a sequence of approximations to the identity.

  • (i) If {f \in C({\bf R}^d)} is continuous, show that {f*\phi_n} converges uniformly on compact sets to {f}.
  • (ii) If {f \in L^p({\bf R}^d)} for some {1 \leq p < \infty}, show that {f*\phi_n} converges in {L^p({\bf R}^d)} to {f}. (Hint: use (i), the density of {C_c({\bf R}^d)} in {L^p({\bf R}^d)}, and Young’s inequality.)
  • (iii) If {f \in C^\infty_c({\bf R}^d)}, show that {f*\phi_n} converges in {C^\infty_c({\bf R}^d)} to {f}. (Hint: use the identity {\nabla( f * \phi_n ) = (\nabla f ) * \phi_n}, cf. Exercise 1(ii).)
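Part (i) can be watched numerically. The sketch below (Python, {d=1}, standard library only) mollifies the continuous function {f(x) = |x|} with {\phi_n(x) := n \phi(nx)}, where {\phi} is the bump {e^{-1/(1-x^2)}} normalised to have total mass {1}; the midpoint-rule quadrature and the evaluation grid are illustrative assumptions, not part of the exercise. Since {\phi_n} is even, {f*\phi_n} agrees with {|x|} once {|x| \geq 1/n}, so the worst error sits at the kink at the origin and decays like {1/n}.

```python
import math

def F(t):
    return math.exp(-1.0 / t) if t > 0 else 0.0

def bump(x):
    return F(1.0 - x * x)

def integrate(g, a, b, m=4000):
    """Composite midpoint rule for int_a^b g with m subintervals."""
    h = (b - a) / m
    return h * sum(g(a + (k + 0.5) * h) for k in range(m))

MASS = integrate(bump, -1.0, 1.0)   # so that phi = bump / MASS has total mass 1

def phi_n(x, n):
    """Approximation to the identity phi_n(x) = n phi(n x) in d = 1, supported in [-1/n, 1/n]."""
    return n * bump(n * x) / MASS

def mollify(f, x, n):
    """(f * phi_n)(x) = int f(x - y) phi_n(y) dy, integrating only over supp phi_n."""
    return integrate(lambda y: f(x - y) * phi_n(y, n), -1.0 / n, 1.0 / n)

def sup_error(f, n, grid):
    """Maximum over the grid of |f * phi_n - f|, a proxy for the uniform error on a compact set."""
    return max(abs(mollify(f, x, n) - f(x)) for x in grid)

grid = [k / 10 for k in range(-10, 11)]   # points of [-1, 1], including the kink at 0
errors = {n: sup_error(abs, n, grid) for n in (4, 16)}
```

Quadrupling {n} should cut the recorded error by a factor of about four.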

Exercise 6 Show that {C^\infty_c({\bf R}^d)} is separable. (Hint: it suffices to show that {C^\infty_c(K)} is separable for each compact {K}. There are several ways to accomplish this. One is to begin with the Stone-Weierstrass theorem, which will give a countable set which is dense in the uniform topology, then use the fundamental theorem of calculus to strengthen the topology. Another is to use Exercise 5 and then discretise the convolution. Another is to embed {K} into a torus and use Fourier series, noting that the Fourier coefficients {\hat f} of a smooth function {f: {\Bbb T}^d \rightarrow {\bf C}} decay faster than any power of {|n|}.)

— 2. Distributions —

Now we can define the concept of a distribution.

Definition 1 (Distribution) A distribution on {{\bf R}^d} is a continuous linear functional {\lambda: f \mapsto \langle f,\lambda\rangle} from {C^\infty_c({\bf R}^d)} to {{\bf C}}. The space of such distributions is denoted {C^\infty_c({\bf R}^d)^*}, and is given the weak-* topology. In particular, a sequence of distributions {\lambda_n} converges (in the sense of distributions) to a limit {\lambda} if one has {\langle f, \lambda_n \rangle \rightarrow \langle f,\lambda \rangle} for all {f \in C^\infty_c({\bf R}^d)}.

A technical point: we endow the space {C^\infty_c({\bf R}^d)^*} with the conjugate complex structure. Thus, if {\lambda \in C^\infty_c({\bf R}^d)^*}, and {c} is a complex number, then {c\lambda} is the distribution that maps a test function {f} to {\overline{c} \langle f, \lambda \rangle} rather than {c \langle f, \lambda \rangle}; thus {\langle f, c\lambda \rangle = \overline{c} \langle f, \lambda \rangle}. This is to keep the analogy between the evaluation of a distribution against a function, and the usual Hermitian inner product {\langle f, g \rangle = \int_{{\bf R}^d} f \overline{g}} of two test functions.

From Exercise 4, we see that a linear functional {\lambda: C^\infty_c({\bf R}^d)\rightarrow {\bf C}} is a distribution if and only if, for every compact set {K \subset {\bf R}^d}, there exist {k \geq 0} and {C > 0} such that

\displaystyle  |\langle f,\lambda\rangle| \leq C \|f\|_{C^k} \ \ \ \ \ (1)

for all {f \in C^\infty_c(K)}.

Exercise 7 Show that {C^\infty_c({\bf R}^d)^*} is a Hausdorff topological vector space.

We note two basic examples of distributions:

  • Any locally integrable function {g \in L^1_{\hbox{loc}}({\bf R}^d)} can be viewed as a distribution, by writing {\langle f, g \rangle := \int_{{\bf R}^d} f(x) \overline{g(x)}\ dx} for all test functions {f}.
  • Any complex Radon measure {\mu} can be viewed as a distribution, by writing {\langle f, \mu \rangle := \int_{{\bf R}^d} f(x)\ d\overline{\mu}}, where {\overline{\mu}} is the complex conjugate of {\mu} (thus {\overline{\mu}(E) := \overline{\mu(E)}}). (Note that this example generalises the preceding one, which corresponds to the case when {\mu} is absolutely continuous with respect to Lebesgue measure.) Thus, for instance, the Dirac measure {\delta} at the origin is a distribution, with {\langle f, \delta \rangle = f(0)} for all test functions {f}.

Exercise 8 Show that the above identifications of locally integrable functions or complex Radon measures with distributions are injective. (Hint: use Exercise 1(iv).)

From the above exercise, we may view locally integrable functions and locally finite measures as a special type of distribution. In particular, {C^\infty_c({\bf R}^d)} and {L^p({\bf R}^d)} are now contained in {C^\infty_c({\bf R}^d)^*} for all {1 \leq p \leq \infty}.

Exercise 9 Show that if a sequence of locally integrable functions converges in {L^1_{\hbox{loc}}} to a limit, then it also converges in the sense of distributions; similarly, if a sequence of complex Radon measures converges in the vague topology to a limit, then it also converges in the sense of distributions.

Thus we see that convergence in the sense of distributions is among the weakest of the notions of convergence used in analysis; however, from the Hausdorff property, distributional limits are still unique.

Exercise 10 If {\phi_n} is a sequence of approximations to the identity, show that {\phi_n} converges in the sense of distributions to the Dirac distribution {\delta}.
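A quick numerical illustration of Exercise 10, again with {d=1} and the Python standard library only. The test function {f(x) := e^{-1/(1-((x-0.3)/2)^2)}} (a rescaled, shifted bump) and the quadrature are illustrative assumptions. Since {f} and {\phi_n} are real, {\langle f, \phi_n \rangle = \int f \phi_n}, and this should approach {\langle f, \delta \rangle = f(0)}.

```python
import math

def F(t):
    return math.exp(-1.0 / t) if t > 0 else 0.0

def bump(x):
    return F(1.0 - x * x)

def integrate(g, a, b, m=4000):
    """Composite midpoint rule for int_a^b g with m subintervals."""
    h = (b - a) / m
    return h * sum(g(a + (k + 0.5) * h) for k in range(m))

MASS = integrate(bump, -1.0, 1.0)

def phi_n(x, n):
    """phi_n(x) = n phi(n x) with phi = bump / MASS; supported in [-1/n, 1/n]."""
    return n * bump(n * x) / MASS

def f(x):
    """A real-valued test function with f(0) != 0 and f'(0) != 0 (illustrative choice)."""
    return bump((x - 0.3) / 2.0)

def pairing_with_phi_n(n):
    """<f, phi_n> = int f(x) phi_n(x) dx; no conjugate is needed as phi_n is real."""
    return integrate(lambda x: f(x) * phi_n(x, n), -1.0 / n, 1.0 / n)

gaps = {n: abs(pairing_with_phi_n(n) - f(0.0)) for n in (10, 50)}
```

Because {\phi_n} is even, the first-order Taylor term of {f} cancels, so the gap shrinks roughly like {1/n^2} rather than {1/n}.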

More exotic examples of distributions can be given:

Exercise 11 (Derivative of the delta function) Let {d=1}. Show that the functional {\delta': f \mapsto - f'(0)}, defined for all test functions {f}, is a distribution which does not arise from either a locally integrable function or a Radon measure. (Note how it is important here that {f} is smooth (and in particular differentiable), and not merely continuous.) The presence of the minus sign will be explained shortly.

Exercise 12 (Principal value of {1/x}) Let {d=1}. Show that the functional {\hbox{p.v.} 1/x} defined by the formula

\displaystyle  \langle f, \hbox{p.v.} \frac{1}{x} \rangle := \lim_{\epsilon \rightarrow 0} \int_{|x| > \epsilon} \frac{f(x)}{x}\ dx

is a distribution which does not arise from either a locally integrable function or a Radon measure. (Note that {1/x} is not a locally integrable function!)
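To see the limit in Exercise 12 converge, one can symmetrise: for real {f}, {\int_{|x|>\epsilon} f(x)/x\ dx = \int_\epsilon^\infty (f(x)-f(-x))/x\ dx}, and the new integrand stays bounded near {0} because {f(x)-f(-x) = O(|x|)}. The Python sketch below (standard library only; the test function and the cutoff radius {R} beyond its support are illustrative assumptions) evaluates the truncated integrals for shrinking {\epsilon}.

```python
import math

def F(t):
    return math.exp(-1.0 / t) if t > 0 else 0.0

def bump(x):
    return F(1.0 - x * x)

def integrate(g, a, b, m=20000):
    """Composite midpoint rule for int_a^b g with m subintervals."""
    h = (b - a) / m
    return h * sum(g(a + (k + 0.5) * h) for k in range(m))

def pv_truncated(f, eps, R=2.0):
    """int_{eps < |x|} f(x)/x dx for real f supported in [-R, R], written as
    int_eps^R (f(x) - f(-x))/x dx so that the integrand is bounded near 0."""
    return integrate(lambda x: (f(x) - f(-x)) / x, eps, R)

f = lambda x: bump(x - 0.5)   # supported in [-0.5, 1.5]; not even, so the answer is nonzero
values = [pv_truncated(f, eps) for eps in (1e-2, 1e-3, 1e-4)]
```

The successive differences shrink roughly in proportion to {\epsilon}, reflecting the bounded integrand. For an even test function, such as the bump itself, every truncation vanishes, consistent with {\hbox{p.v.} \frac{1}{x}} being odd.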

Exercise 13 (Distributional interpretations of {1/|x|}) Let {d=1}. For any {r > 0}, show that the functional {\lambda_r} defined by the formula

\displaystyle  \langle f, \lambda_r \rangle := \int_{|x| < r} \frac{f(x)-f(0)}{|x|}\ dx + \int_{|x| \geq r} \frac{f(x)}{|x|}\ dx

is a distribution that does not arise from either a locally integrable function or a Radon measure. Note that any two such functionals {\lambda_r, \lambda_{r'}} differ by a constant multiple of the Dirac delta distribution.
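The last claim of Exercise 13 can be made quantitative: for {0 < r < r'}, the two functionals differ only in how they treat the region {r \leq |x| < r'}, which gives {\lambda_r - \lambda_{r'} = 2 \log(r'/r) \delta}. Here is a small Python check of this (standard library only; the test function, the radius {R = 3} beyond its support, and the quadrature are illustrative assumptions), using the symmetrised form {\langle f, \lambda_r \rangle = \int_0^r \frac{f(x)+f(-x)-2f(0)}{x}\ dx + \int_r^\infty \frac{f(x)+f(-x)}{x}\ dx} for real {f}.

```python
import math

def F(t):
    return math.exp(-1.0 / t) if t > 0 else 0.0

def bump(x):
    return F(1.0 - x * x)

def integrate(g, a, b, m=20000):
    """Composite midpoint rule for int_a^b g with m subintervals."""
    h = (b - a) / m
    return h * sum(g(a + (k + 0.5) * h) for k in range(m))

def lambda_r(f, r, R=3.0):
    """<f, lambda_r> for real f supported in [-R, R] and 0 < r <= R (symmetrised form)."""
    f0 = f(0.0)
    near = integrate(lambda x: (f(x) + f(-x) - 2.0 * f0) / x, 0.0, r)
    far = integrate(lambda x: (f(x) + f(-x)) / x, r, R)
    return near + far

f = lambda x: bump(x - 0.3)   # supported in [-0.7, 1.3] (illustrative choice)
difference = lambda_r(f, 0.5) - lambda_r(f, 2.0)
predicted = 2.0 * math.log(2.0 / 0.5) * f(0.0)
```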

Exercise 14 A distribution {\lambda} is said to be real if {\langle f,\lambda\rangle} is real for every real-valued test function {f}. Show that every distribution {\lambda} can be uniquely expressed as {\hbox{Re}(\lambda) + i \hbox{Im}(\lambda)} for some real distributions {\hbox{Re}(\lambda), \hbox{Im}(\lambda)}.

Exercise 15 A distribution {\lambda} is said to be non-negative if {\langle f,\lambda\rangle} is non-negative for every non-negative test function {f}. Show that a distribution is non-negative if and only if it is a non-negative Radon measure. (Hint: use the Riesz representation theorem and Exercise 1(iv).) Note that this implies that the analogue of the Jordan decomposition fails for distributions: any distribution which is not a Radon measure cannot be written as the difference of two non-negative distributions.

We will now extend various operations on locally integrable functions or Radon measures to distributions by arguing by analogy. (Shortly we will give a more formal approach, based on density.)

We begin with the operation of multiplying a distribution {\lambda} by a smooth function {h: {\bf R}^d \rightarrow {\bf C}}. Observe that

\displaystyle  \langle f, gh \rangle = \langle f \overline{h}, g \rangle

for all test functions {f,g,h}. Inspired by this formula, we define the product {\lambda h = h \lambda} of a distribution with a smooth function by setting

\displaystyle  \langle f, \lambda h \rangle := \langle f \overline{h}, \lambda \rangle

for all test functions {f}. It is easy to see (e.g. using Exercise 4(vi)) that this defines a distribution {\lambda h}, and that this operation is compatible with the existing definitions of the product of a locally integrable function (or Radon measure) with a smooth function. It is important that {h} is smooth (and not merely, say, continuous) because one needs the product of a test function {f} with {\overline{h}} to still be a test function.

Exercise 16 Let {d=1}. Establish the identity

\displaystyle  \delta f = f(0) \delta

for any smooth function {f}. In particular,

\displaystyle  \delta x = 0

where we abuse notation slightly and write {x} for the identity function {x \mapsto x}. Conversely, if {\lambda} is a distribution such that

\displaystyle  \lambda x = 0,

show that {\lambda} is a constant multiple of {\delta}. (Hint: Use the identity {f(x) = f(0) + x \int_0^1 f'(tx)\ dt} to write {f(x)} as the sum of {f(0) \psi} and {x} times a test function for any test function {f}, where {\psi} is a fixed test function equalling {1} at the origin.)

Remark 1 Even though distributions are not, strictly speaking, functions, it is often useful heuristically to view them as such, thus for instance one might write a distributional identity such as {\delta x = 0} suggestively as {\delta(x) x = 0}. Another useful (and rigorous) way to view such identities is to write distributions such as {\delta} as a limit of approximations to the identity {\psi_n}, and show that the relevant identity becomes true in the limit; thus, for instance, to show that {\delta x = 0}, one can show that {\psi_n x \rightarrow 0} in the sense of distributions as {n \rightarrow \infty}. (In fact, {\psi_n x} converges to zero in the {L^1} norm.)
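The parenthetical claim at the end of Remark 1 follows from the bound {|x \psi_n(x)| \leq r_n \psi_n(x)}, valid whenever {\psi_n} is supported in {B(0,r_n)}. For the concrete choice {\psi_n(x) := n \phi(nx)}, with {\phi} the bump {e^{-1/(1-x^2)}} normalised to have total mass {1} (an assumption made only for this illustration), one has {\|\psi_n x\|_{L^1} = \frac{1}{n} \int |u| \phi(u)\ du}, which the following Python sketch (standard library only) confirms numerically.

```python
import math

def F(t):
    return math.exp(-1.0 / t) if t > 0 else 0.0

def bump(x):
    return F(1.0 - x * x)

def integrate(g, a, b, m=4000):
    """Composite midpoint rule for int_a^b g with m subintervals."""
    h = (b - a) / m
    return h * sum(g(a + (k + 0.5) * h) for k in range(m))

MASS = integrate(bump, -1.0, 1.0)

def psi_n(x, n):
    """psi_n(x) = n phi(n x), phi = bump / MASS; supported in [-1/n, 1/n], total mass 1."""
    return n * bump(n * x) / MASS

def l1_norm_x_psi_n(n):
    """||x psi_n||_{L^1} = int |x| psi_n(x) dx, which is at most 1/n."""
    return integrate(lambda x: abs(x) * psi_n(x, n), -1.0 / n, 1.0 / n)

norms = {n: l1_norm_x_psi_n(n) for n in (1, 10, 100)}
```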

Exercise 17 Let {d=1}. With the distribution {\hbox{p.v.} \frac{1}{x}} from Exercise 12, show that {(\hbox{p.v.} \frac{1}{x}) x} is equal to {1}. With the distributions {\lambda_r} from Exercise 13, show that {\lambda_r x = \hbox{sgn}}, where {\hbox{sgn}} is the signum function.

A distribution {\lambda} is said to be supported in a closed set {K} if {\langle f, \lambda \rangle = 0} for all {f} that vanish on an open neighbourhood of {K}. The intersection of all {K} that {\lambda} is supported on is denoted {\hbox{supp}(\lambda)} and is referred to as the support of the distribution; this is the smallest closed set that {\lambda} is supported on. Thus, for instance, the Dirac delta function is supported on {\{0\}}, as are all derivatives of that function. (Note here that it is important that {f} vanish on a neighbourhood of {K}, rather than merely vanishing on {K} itself; for instance, in one dimension, there certainly exist test functions {f} that vanish at {0} but nevertheless have a non-zero inner product with {\delta'}.)

Exercise 18 Show that every distribution is the limit of a sequence of compactly supported distributions (using the weak-* topology, of course). (Hint: Approximate a distribution {\lambda} by the truncated distributions {\lambda \eta_n} for some smooth cutoff functions {\eta_n} constructed using Exercise 1(iii).)

In a similar spirit, we can convolve a distribution {\lambda} by an absolutely integrable, compactly supported function {h \in L^1({\bf R}^d)}. From Fubini’s theorem we observe the formula

\displaystyle  \langle f, g*h \rangle = \langle f * \tilde h, g \rangle

for all test functions {f,g,h}, where {\tilde h(x) := \overline{h(-x)}}. Inspired by this formula, we define the convolution {\lambda * h = h*\lambda} of a distribution with an absolutely integrable, compactly supported function by the formula

\displaystyle  \langle f, \lambda*h \rangle := \langle f * \tilde h, \lambda \rangle \ \ \ \ \ (2)

for all test functions {f}. This gives a well-defined distribution {\lambda * h} (thanks to Exercise 4(vii)) which is compatible with previous notions of convolution.
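Formula (2) rests on the identity {\langle f, g*h \rangle = \langle f * \tilde h, g \rangle}, which is just Fubini's theorem together with the change of variables built into {\tilde h}. Its discrete analogue on {{\bf Z}}, with sums in place of integrals, can be checked exactly (up to rounding) in a few lines of Python; the random finitely supported complex sequences below are an illustrative assumption standing in for test functions.

```python
import random

def conv(a, b):
    """(a * b)(n) = sum_m a(m) b(n - m) for finitely supported sequences stored as {index: value}."""
    out = {}
    for i, x in a.items():
        for j, y in b.items():
            out[i + j] = out.get(i + j, 0) + x * y
    return out

def pair(f, g):
    """<f, g> = sum_n f(n) conj(g(n)), the discrete analogue of int f conj(g)."""
    return sum(v * complex(g.get(n, 0)).conjugate() for n, v in f.items())

def tilde(h):
    """tilde h(n) = conj(h(-n))."""
    return {-n: complex(v).conjugate() for n, v in h.items()}

def random_sequence(lo, hi, rng):
    return {n: complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for n in range(lo, hi)}

rng = random.Random(0)
f = random_sequence(-3, 5, rng)
g = random_sequence(0, 7, rng)
h = random_sequence(-2, 3, rng)

lhs = pair(f, conv(g, h))
rhs = pair(conv(f, tilde(h)), g)
```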

Example 1 One has {\delta*f=f*\delta=f} for all test functions {f}. In one dimension, we have {\delta' * f = f'} (why?), thus differentiation can be viewed as convolution with a distribution.

A remarkable fact about convolutions of two functions {f*g} is that they inherit the regularity of the smoother of the two factors {f, g} (in contrast to products {fg}, which tend to inherit the regularity of the rougher of the two factors). (This disparity can also be seen by contrasting the identity {\nabla (f*g) = (\nabla f)*g = f * (\nabla g)} with the identity {\nabla (fg) = (\nabla f) g + f (\nabla g)}.) In the case of convolving distributions with test functions, this phenomenon is manifested as follows:

Lemma 2 Let {\lambda \in C^\infty_c({\bf R}^d)^*} be a distribution, and let {h \in C^\infty_c({\bf R}^d)} be a test function. Then {\lambda*h} is equal to a smooth function.

Proof: If {\lambda} were itself a smooth function, then one could easily verify the identity

\displaystyle  \lambda * h(x) = \overline{\langle h_x, \lambda\rangle} \ \ \ \ \ (3)

where {h_x(y) := \overline{h}(x-y)}. As {h} is a test function, it is easy to see that {h_x} varies smoothly in {x} in any {C^k} norm (indeed, it has Taylor expansions to any order in such norms) and so the right-hand side is a smooth function of {x}. So it suffices to verify the identity (3). As a distribution is determined by its pairings with test functions {f}, it suffices to show that

\displaystyle  \langle f, \lambda*h \rangle = \int_{{\bf R}^d} f(x) \langle h_x, \lambda \rangle\ dx.

On the other hand, we have from (2) that

\displaystyle  \langle f, \lambda*h \rangle = \langle f * \tilde h, \lambda \rangle = \langle \int_{{\bf R}^d} f(x) h_x\ dx, \lambda \rangle.

So the only issue is to justify the interchange of integral and inner product:

\displaystyle  \int_{{\bf R}^d} f(x) \langle h_x, \lambda \rangle\ dx = \langle \int_{{\bf R}^d} f(x) h_x\ dx, \lambda \rangle.

Certainly, since {f} has compact support, any Riemann sum is a finite sum, and so by the linearity of {\lambda} it can be interchanged with the inner product:

\displaystyle  \sum_n f(x_n) \langle h_{x_n}, \lambda \rangle \Delta x = \langle \sum_n f(x_n) h_{x_n} \Delta x, \lambda \rangle,

where {x_n} ranges over some lattice and {\Delta x} is the volume of the fundamental domain. A modification of the argument that shows convergence of the Riemann integral for smooth, compactly supported functions then works here and allows one to take limits; we omit the details. \Box
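Formula (3) also gives a practical recipe for computing {\lambda * h} whenever the distribution {\lambda} is specified by an explicit rule. The Python sketch below (standard library only) takes {\lambda = \delta'} in one dimension, approximates {\langle \phi, \delta' \rangle = -\phi'(0)} by a central difference (a numerical assumption; the step size is arbitrary), and recovers the identity {\delta' * h = h'} from Example 1 for a complex-valued bump {h}. The function names are hypothetical.

```python
import math

def F(t):
    return math.exp(-1.0 / t) if t > 0 else 0.0

def bump(x):
    return F(1.0 - x * x)

def bump_prime(x):
    """Exact derivative of bump, used only to check the answer."""
    if abs(x) >= 1.0:
        return 0.0
    s = 1.0 - x * x
    return bump(x) * (-2.0 * x / (s * s))

def delta_prime(phi, step=1e-5):
    """The distribution delta': phi -> -phi'(0), via a central difference."""
    return -(phi(step) - phi(-step)) / (2.0 * step)

def convolve_distribution(lam, h, x):
    """(lam * h)(x) = conj(<h_x, lam>) with h_x(y) = conj(h(x - y)), as in (3)."""
    h_x = lambda y: complex(h(x - y)).conjugate()
    return complex(lam(h_x)).conjugate()

h = lambda x: (1 + 2j) * bump(x)   # a complex-valued test function (illustrative)
samples = {x: convolve_distribution(delta_prime, h, x) for x in (-0.5, 0.0, 0.3)}
```

Note that the two complex conjugations in (3) cancel against the one in the definition of {h_x}, so no stray conjugate survives in the output.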

This has an important corollary:

Lemma 3 Every distribution is the limit of a sequence of test functions. In particular, {C^\infty_c({\bf R}^d)} is dense in {C^\infty_c({\bf R}^d)^*}.

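Proof: By Exercise 18, it suffices to verify this for compactly supported distributions {\lambda}. We let {\phi_n \in C^\infty_c({\bf R}^d)} be a sequence of approximations to the identity. By (2), {\langle f, \lambda * \phi_n \rangle = \langle f * \tilde \phi_n, \lambda \rangle} for every test function {f}; since {\tilde \phi_n} is also a sequence of approximations to the identity, Exercise 5(iii) gives {f * \tilde \phi_n \rightarrow f} in {C^\infty_c({\bf R}^d)}, and so {\lambda * \phi_n} converges in the sense of distributions to {\lambda}. By Lemma 2, {\lambda * \phi_n} is a smooth function; as {\lambda} and {\phi_n} are both compactly supported, {\lambda*\phi_n} is compactly supported also. The claim follows. \Box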

Because of this lemma, we can formalise the previous procedure of extending to distributions those operations that were previously defined only on test functions, provided that these operations are continuous in the distributional topology. However, we shall continue to proceed by analogy, as this requires fewer verifications in order to motivate the definitions.

Exercise 19 Another consequence of Lemma 2 is that it allows one to extend the definition (2) of convolution to the case when {h} is not an integrable function of compact support, but is instead merely a distribution of compact support. Adopting this convention, show that convolution of distributions of compact support is both commutative and associative. (Hint: this can either be done directly, or by carefully taking limits using Lemma 3.)

The next operation we will introduce is that of differentiation. An integration by parts reveals the identity

\displaystyle  \langle f, \frac{\partial}{\partial x_j} g \rangle = - \langle \frac{\partial}{\partial x_j} f, g \rangle

for any test functions {f, g} and {j=1,\ldots,d}. Inspired by this, we define the (distributional) partial derivative {\frac{\partial}{\partial x_j} \lambda} of a distribution {\lambda} by the formula

\displaystyle  \langle f, \frac{\partial}{\partial x_j} \lambda \rangle := - \langle \frac{\partial}{\partial x_j} f, \lambda \rangle.

This can be verified to still be a distribution, and by Exercise 4(vi), the operation of differentiation is a continuous one on distributions. More generally, given any linear differential operator {P} with smooth coefficients, one can define {P\lambda} for a distribution {\lambda} by the formula

\displaystyle  \langle f, P\lambda \rangle := \langle P^* f, \lambda \rangle

where {P^*} is the adjoint of the differential operator {P}, which can be defined implicitly by the formula

\displaystyle  \langle f, P g \rangle = \langle P^* f, g \rangle

for test functions {f,g}, or more explicitly by replacing all coefficients with complex conjugates, replacing each partial derivative {\frac{\partial}{\partial x_j}} with its negative, and reversing the order of operations (thus for instance the adjoint of the first-order operator {a(x) \frac{d}{dx}: f \mapsto af'} would be {-\frac{d}{dx} \overline{a(x)}: f \mapsto - (\overline{a}f)'}).
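A numerical sanity check of the adjoint formula, including the role of the complex conjugate on the coefficient, may help. The Python sketch below (standard library only) takes {P = a(x) \frac{d}{dx}} with the illustrative complex coefficient {a(x) = 1 + ix}, approximates derivatives by central differences and integrals by the midpoint rule (both numerical assumptions), and compares {\langle f, Pg \rangle} with {\langle P^* f, g \rangle}; it also shows that dropping the conjugate on {a} breaks the identity.

```python
import math

def F(t):
    return math.exp(-1.0 / t) if t > 0 else 0.0

def bump(x):
    return F(1.0 - x * x)

def deriv(u, x, step=1e-5):
    """Central-difference approximation to u'(x)."""
    return (u(x + step) - u(x - step)) / (2.0 * step)

def integrate(g, a, b, m=20000):
    """Composite midpoint rule for int_a^b g with m subintervals."""
    h = (b - a) / m
    return h * sum(g(a + (k + 0.5) * h) for k in range(m))

def pairing(u, v):
    """<u, v> = int u conj(v), for u supported in [-1, 1]."""
    return integrate(lambda x: u(x) * complex(v(x)).conjugate(), -1.0, 1.0)

def a(x):
    return 1 + 1j * x                # smooth complex coefficient (illustrative)

f = bump                             # test function supported in [-1, 1]
g = lambda x: bump(x / 2.0)          # test function supported in [-2, 2]

P_g = lambda x: a(x) * deriv(g, x)                                   # P g = a g'
P_star_f = lambda x: -deriv(lambda y: a(y).conjugate() * f(y), x)    # P* f = -(conj(a) f)'
no_conj_f = lambda x: -deriv(lambda y: a(y) * f(y), x)               # coefficient left unconjugated

lhs = pairing(f, P_g)
rhs = pairing(P_star_f, g)
```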

Example 2 The distribution {\delta'} defined in Exercise 11 is the derivative {\frac{d}{dx} \delta} of {\delta}, as defined by the above formula.

Many of the identities one is used to in classical calculus extend to the distributional setting (as one would already expect from Lemma 3). For instance:

Exercise 20 (Product rule) Let {\lambda \in C^\infty_c({\bf R}^d)^*} be a distribution, and let {f: {\bf R}^d \rightarrow {\bf C}} be smooth. Show that

\displaystyle \frac{\partial}{\partial x_j}(\lambda f) = (\frac{\partial}{\partial x_j} \lambda) f + \lambda (\frac{\partial}{\partial x_j} f)

for all {j=1,\ldots,d}.

Exercise 21 Let {d=1}. Show that {\delta' x = - \delta} in three different ways:

  • Directly from the definitions;
  • Using the product rule;
  • By writing {\delta} as the limit of approximations {\psi_n} to the identity.
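For the third approach, a numerical sketch can build intuition. The code below assumes the approximation to the identity {\psi_n(x) := n\psi(nx)/\int \psi} built from the standard bump function {\psi(x) = e^{-1/(1-x^2)}} (a choice made here for illustration), approximates {\langle f, \delta' x \rangle} by {\int f(x)\, x\, \psi_n'(x)\ dx}, and checks that this tends to {-f(0)} for a sample real smooth function with {f(0)=1}.

```python
import math

def bump(x):
    """Standard bump function exp(-1/(1 - x^2)) on (-1, 1), extended by zero."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0

def bump_prime(x):
    """Derivative of bump: bump(x) * (-2x / (1 - x^2)^2)."""
    return bump(x) * (-2.0 * x / (1.0 - x * x) ** 2) if abs(x) < 1 else 0.0

def midpoint(func, a, b, n=20000):
    h = (b - a) / n
    return h * sum(func(a + (k + 0.5) * h) for k in range(n))

C = midpoint(bump, -1.0, 1.0)          # normalising constant, so psi_n has integral 1

def pair_dprime_times_x(f, n):
    """Approximate <f, delta' x> by the integral of f(x) * x * psi_n'(x) over supp psi_n."""
    def dpsi_n(x):                     # psi_n(x) = n * bump(n x) / C, so psi_n' = n^2 * bump'(n x) / C
        return n * n * bump_prime(n * x) / C
    return midpoint(lambda x: f(x) * x * dpsi_n(x), -1.0 / n, 1.0 / n)

def f(x):                              # sample smooth function with f(0) = 1
    return math.cos(x) + x

for n in (10, 100, 1000):
    print(n, pair_dprime_times_x(f, n))   # approaches -f(0) = -1 as n grows
```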

Exercise 22 Let {d=1}.

  • (i) Show that if {\lambda} is a distribution and {n \geq 1} is an integer, then {\lambda x^n = 0} if and only if {\lambda} is a linear combination of {\delta} and its first {n-1} derivatives {\delta', \delta'', \ldots, \delta^{(n-1)}}.
  • (ii) Show that a distribution {\lambda} is supported on {\{0\}} if and only if it is a linear combination of {\delta} and finitely many of its derivatives.
  • (iii) Generalise (ii) to the case of general dimension {d} (where of course one now uses partial derivatives instead of derivatives).

Exercise 23 Let {d=1}.

  • Show that the derivative of the Heaviside function {1_{[0,+\infty)}} is equal to {\delta}.
  • Show that the derivative of the signum function {\hbox{sgn}(x)} is equal to {2\delta}.
  • Show that the derivative of the locally integrable function {\log |x|} is equal to {\hbox{p.v.} \frac{1}{x}}.
  • Show that the derivative of the locally integrable function {\log |x| \hbox{sgn}(x)} is equal to the distribution {\lambda_1} from Exercise 13.
  • Show that the derivative of the locally integrable function {|x|} is the locally integrable function {\hbox{sgn}(x)}.
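The first, second and last parts can be checked numerically against the definition {\langle f, F' \rangle := -\langle f', F \rangle}. The sketch below assumes a real sample function {f(x) = e^{-(x-1)^2}} (chosen so that {f(0) \neq 0}; it is Schwartz rather than compactly supported, which is harmless for this illustration) and splits each integral at the origin, where {F} has its jump or kink.

```python
import math

def midpoint(func, a, b, n=40000):
    h = (b - a) / n
    return h * sum(func(a + (k + 0.5) * h) for k in range(n))

L = 12.0                                   # the sample function is negligible beyond |x| = 12

def f(x):
    return math.exp(-(x - 1.0) ** 2)

def df(x):
    return -2.0 * (x - 1.0) * f(x)

def split_integral(func):
    """Integrate over [-L, L], splitting at 0 so each piece has a smooth integrand."""
    return midpoint(func, -L, 0.0) + midpoint(func, 0.0, L)

def weak_derivative_pairing(F):
    """<f, F'> := -<f', F> for a real locally integrable F."""
    return -split_integral(lambda x: df(x) * F(x))

def sgn(x):
    return (x > 0) - (x < 0)

def heaviside(x):
    return 1.0 if x >= 0 else 0.0

heaviside_check = weak_derivative_pairing(heaviside)   # should equal f(0)
sgn_check = weak_derivative_pairing(sgn)               # should equal 2 f(0)
abs_check = weak_derivative_pairing(abs)               # should equal <f, sgn>
f_against_sgn = split_integral(lambda x: f(x) * sgn(x))
print(heaviside_check, f(0))
print(sgn_check, 2 * f(0))
print(abs_check, f_against_sgn)
```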

If a locally integrable function has a distributional derivative which is also a locally integrable function, we refer to the latter as the weak derivative of the former. Thus, for instance, the weak derivative of {|x|} is {\hbox{sgn}(x)} (as one would expect), but {\hbox{sgn}(x)} does not have a weak derivative (despite being (classically) differentiable almost everywhere), because the distributional derivative {2\delta} of this function is not itself a locally integrable function. Thus weak derivatives differ in some respects from their classical counterparts, though of course the two concepts agree for smooth functions.

Exercise 24 Let {d \geq 1}. Show that for any {1 \leq i,j \leq d}, and any distribution {\lambda \in C^\infty_c({\bf R}^d)^*}, we have {\frac{\partial}{\partial x_i} \frac{\partial}{\partial x_j} \lambda = \frac{\partial}{\partial x_j} \frac{\partial}{\partial x_i} \lambda}, thus weak derivatives commute with each other. (This is in contrast to classical derivatives, which can fail to commute for non-smooth functions; for instance, {\frac{\partial}{\partial x} \frac{\partial}{\partial y} \frac{xy^3}{x^2 + y^2} \neq \frac{\partial}{\partial y} \frac{\partial}{\partial x} \frac{xy^3}{x^2 + y^2}} at the origin {(x,y)=0}, despite both derivatives being defined. More generally, weak derivatives tend to be less pathological than classical derivatives, but of course the downside is that weak derivatives do not always have a classical interpretation as a limit of a Newton quotient.)
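The classical counterexample can be checked directly. Away from the origin, the first partials of {F(x,y) = \frac{xy^3}{x^2+y^2}} (with {F(0,0) := 0}) are {F_x = \frac{y^3(y^2-x^2)}{(x^2+y^2)^2}} and {F_y = \frac{xy^2(3x^2+y^2)}{(x^2+y^2)^2}}, while both vanish at the origin because {F} vanishes on the coordinate axes. The small illustration below forms difference quotients of these partials at the origin.

```python
def F_x(x, y):
    """Classical partial derivative dF/dx of F(x, y) = x y^3 / (x^2 + y^2), F(0, 0) = 0."""
    if x == 0 and y == 0:
        return 0.0                     # F(x, 0) = 0 for all x
    return y ** 3 * (y * y - x * x) / (x * x + y * y) ** 2

def F_y(x, y):
    """Classical partial derivative dF/dy."""
    if x == 0 and y == 0:
        return 0.0                     # F(0, y) = 0 for all y
    return x * y * y * (3 * x * x + y * y) / (x * x + y * y) ** 2

h = 1e-6
d_y_d_x = (F_x(0.0, h) - F_x(0.0, 0.0)) / h    # d/dy (dF/dx) at the origin
d_x_d_y = (F_y(h, 0.0) - F_y(0.0, 0.0)) / h    # d/dx (dF/dy) at the origin
print(d_y_d_x, d_x_d_y)                         # close to 1, and 0: the mixed partials differ
```

Since {F} is continuous and hence locally integrable, Exercise 24 nevertheless says that its two distributional mixed partials agree.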

Exercise 25 Let {d=1}, and let {k \geq 0} be an integer. Let us say that a compactly supported distribution {\lambda \in C^\infty_c({\bf R})^*} has order at most {k} if the functional {f \mapsto \langle f,\lambda \rangle} is continuous in the {C^k} norm. Thus, for instance, {\delta} has order at most {0}, and {\delta'} has order at most {1}, and every compactly supported distribution is of order at most {k} for some sufficiently large {k}.

  • Show that if {\lambda} is a compactly supported distribution of order at most {0}, then it is a compactly supported Radon measure.
  • Show that if {\lambda} is a compactly supported distribution of order at most {k}, then {\lambda'} has order at most {k+1}.
  • Conversely, show that if {\lambda} is a compactly supported distribution of order at most {k+1}, then we can write {\lambda = \rho' + \nu} for some compactly supported distributions {\rho, \nu} of order at most {k}. (Hint: one has to “dualise” the fundamental theorem of calculus, and then apply smooth cutoffs to recover compact support.)
  • Show that every compactly supported distribution can be expressed as a finite linear combination of (distributional) derivatives of compactly supported Radon measures.
  • Show that every compactly supported distribution can be expressed as a finite linear combination of (distributional) derivatives of functions in {C^k_0({\bf R})}, for any fixed {k}.

We now set out some other operations on distributions. If we define the translation {\tau_x f} of a test function {f} by a shift {x \in {\bf R}^d} by the formula {\tau_x f(y) := f(y-x)}, then we have

\displaystyle  \langle f, \tau_x g \rangle = \langle \tau_{-x} f, g \rangle

for all test functions {f,g}, so it is natural to define the translation {\tau_x \lambda} of a distribution {\lambda} by the formula

\displaystyle  \langle f, \tau_x \lambda \rangle := \langle \tau_{-x} f, \lambda \rangle.

Next, we consider linear changes of variable.

Exercise 26 (Linear changes of variable) Let {d \geq 1}, and let {L: {\bf R}^d \rightarrow {\bf R}^d} be an invertible linear transformation. Given a distribution {\lambda \in C^\infty_c({\bf R}^d)^*}, let {\lambda \circ L} be the distribution given by the formula

\displaystyle  \langle f, \lambda \circ L \rangle := \frac{1}{|\det L|} \langle f \circ L^{-1}, \lambda \rangle

for all test functions {f}. (How would one motivate this formula?)

  • Show that {\delta \circ L = \frac{1}{|\det L|} \delta} for all invertible linear transformations {L}.
  • If {d=1}, show that {\hbox{p.v.} \frac{1}{x} \circ L = \frac{1}{\det L} \hbox{p.v.} \frac{1}{x}} for all invertible linear transformations {L}.
  • Conversely, if {d=1} and {\lambda} is a distribution such that {\lambda \circ L = \frac{1}{\det L} \lambda} for all invertible linear transformations {L}, show that {\lambda} is a constant multiple of {\hbox{p.v.} \frac{1}{x}}. (Hint: first show that there exists a constant {c} such that {\langle f, \lambda \rangle = c \int_0^\infty \frac{f(x)}{x}\ dx} whenever {f} is a bump function supported in {(0,+\infty)}. To show this, approximate {f} by the function

    \displaystyle \int_{-\infty}^\infty f(e^t x) \psi_n(t)\ dt = \int_0^\infty \frac{f(y)}{y} \psi_n( \log \frac{y}{x} ) 1_{x>0}\ dy

    for {\psi_n} an approximation to the identity.)
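The scaling behaviour of {\hbox{p.v.} \frac{1}{x}} under a dilation {Lx = cx} can be tested numerically; note that the sign of {c} matters here, since {\hbox{p.v.} \frac{1}{x}} is odd. The sketch below assumes the standard formula {\langle f, \hbox{p.v.} \frac{1}{x} \rangle = \int_0^\infty \frac{f(x)-f(-x)}{x}\ dx}, a non-even sample function {f}, and a truncation of the integral at {x = 40}; the helper names are illustrative.

```python
import math

def midpoint(func, a, b, n=40000):
    h = (b - a) / n
    return h * sum(func(a + (k + 0.5) * h) for k in range(n))

def pv_one_over_x(f, cutoff=40.0):
    """<f, p.v. 1/x> = integral over x > 0 of (f(x) - f(-x)) / x; smooth integrand near 0."""
    return midpoint(lambda x: (f(x) - f(-x)) / x, 0.0, cutoff)

def compose_with_dilation(pairing, c):
    """Pairing of lambda o L for L x = c x, using <f, lambda o L> := <f o L^{-1}, lambda> / |det L|."""
    return lambda f: pairing(lambda x: f(x / c)) / abs(c)

def f(x):                              # sample test function, deliberately not even
    return math.exp(-(x - 1.0) ** 2)

base = pv_one_over_x(f)
for c in (2.0, -2.0):
    scaled = compose_with_dilation(pv_one_over_x, c)(f)
    print(c, scaled, base / c)         # the two columns agree, sign included
```

For {c<0} the pairing picks up the factor {\frac{1}{c} = \frac{1}{\det L}}, sign included, in contrast with the even distribution {\delta}.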

Remark 2 One can also compose distributions with diffeomorphisms. However, things become much more delicate if the map one is composing with contains stationary points; for instance, in one dimension, one cannot meaningfully make sense of {\delta(x^2)} (the composition of the Dirac delta distribution with {x \mapsto x^2}); this can be seen by first noting that for an approximation {\psi_n} to the identity, {\psi_n(x^2)} does not converge to a limit in the distributional sense.
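The divergence is easy to see numerically. With the illustrative choice {\psi_n(y) := n \psi(ny)/\int \psi} for the standard bump function {\psi}, the substitution {x = u/\sqrt{n}} gives {\int f(x) \psi_n(x^2)\ dx \approx \sqrt{n} f(0) \int \psi(u^2)\ du / \int \psi}, and the sketch below confirms the {\sqrt{n}} growth for a sample {f} with {f(0)=1}.

```python
import math

def bump(x):
    """Standard bump function exp(-1/(1 - x^2)) on (-1, 1), extended by zero."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0

def midpoint(func, a, b, n=20000):
    h = (b - a) / n
    return h * sum(func(a + (k + 0.5) * h) for k in range(n))

C = midpoint(bump, -1.0, 1.0)          # so that psi_n(y) = n * bump(n y) / C has integral 1

def pair_with_psi_n_of_x_squared(f, n):
    """Integral of f(x) * psi_n(x^2); psi_n(x^2) is supported in |x| < 1 / sqrt(n)."""
    r = 1.0 / math.sqrt(n)
    return midpoint(lambda x: f(x) * n * bump(n * x * x) / C, -r, r)

def f(x):                              # sample smooth function with f(0) = 1
    return math.exp(-x * x)

values = {n: pair_with_psi_n_of_x_squared(f, n) for n in (100, 10000)}
ratio = values[10000] / values[100]
print(values, ratio)                   # ratio close to sqrt(10000 / 100) = 10
```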

Exercise 27 (Tensor product of distributions) Let {d, d' \geq 1} be integers. If {\lambda \in C^\infty_c({\bf R}^d)^*} and {\rho \in C^\infty_c({\bf R}^{d'})^*} are distributions, show that there is a unique distribution {\lambda \otimes \rho \in C^\infty_c({\bf R}^{d+d'})^*} with the property that

\displaystyle  \langle f \otimes g, \lambda \otimes \rho \rangle = \langle f, \lambda \rangle \langle g, \rho \rangle \ \ \ \ \ (4)

for all test functions {f \in C^\infty_c({\bf R}^d)}, {g \in C^\infty_c({\bf R}^{d'})}, where {f \otimes g \in C^\infty_c({\bf R}^{d+d'})} is the tensor product {f \otimes g(x,x') := f(x) g(x')} of {f} and {g}. (Hint: like many other constructions of tensor products, this is rather intricate. One way is to start by fixing two cutoff functions {\psi, \psi'} on {{\bf R}^d, {\bf R}^{d'}} respectively, and define {\lambda \otimes \rho} on modulated test functions {e^{2\pi i \xi \cdot x} e^{2\pi i \xi' \cdot x'} \psi(x) \psi'(x')} for various frequencies {\xi, \xi'}, and then use Fourier series to define {\lambda \otimes \rho} on {F(x,x') \psi(x) \psi'(x')} for smooth {F}. Then show that these definitions of {\lambda \otimes \rho} are compatible for different choices of {\psi, \psi'} and can be glued together to form a distribution; finally, go back and verify (4).)

We close this section with one caveat. Despite the many operations that one can perform on distributions, there are two types of operations which cannot, in general, be defined on arbitrary distributions (at least while remaining in the class of distributions):

  • Nonlinear operations (e.g. taking the absolute value of a distribution); or
  • Multiplying a distribution by anything rougher than a smooth function.

Thus, for instance, there is no meaningful way to interpret the square {\delta^2} of the Dirac delta function as a distribution. This is perhaps easiest to see using an approximation {\psi_n} to the identity: {\psi_n} converges to {\delta} in the sense of distributions, but {\psi_n^2} does not converge to anything (the integral against a test function that does not vanish at the origin will go to infinity as {n \rightarrow \infty}). For similar reasons, one cannot meaningfully interpret the absolute value {|\delta'|} of the derivative of the delta function. (One also cannot multiply {\delta} by {\hbox{sgn}(x)} – why?)
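A numerical sketch with the same kind of illustrative approximation to the identity, {\psi_n(x) := n\psi(nx)/\int\psi} for the standard bump function {\psi}, shows {\langle f, \psi_n \rangle \rightarrow f(0)} while {\langle f, \psi_n^2 \rangle} grows linearly in {n}; indeed {\langle f, \psi_n^2 \rangle \approx n f(0) \int \psi^2 / (\int \psi)^2}.

```python
import math

def bump(x):
    """Standard bump function exp(-1/(1 - x^2)) on (-1, 1), extended by zero."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0

def midpoint(func, a, b, n=20000):
    h = (b - a) / n
    return h * sum(func(a + (k + 0.5) * h) for k in range(n))

C = midpoint(bump, -1.0, 1.0)

def psi_n(n):
    """Approximation to the identity supported in [-1/n, 1/n]."""
    return lambda x: n * bump(n * x) / C

def f(x):                                  # sample smooth function with f(0) = 1
    return 1.0 + math.sin(x)

results = {}
for n in (10, 100, 1000):
    p = psi_n(n)
    linear = midpoint(lambda x: f(x) * p(x), -1.0 / n, 1.0 / n)        # tends to f(0)
    square = midpoint(lambda x: f(x) * p(x) ** 2, -1.0 / n, 1.0 / n)   # grows like n
    results[n] = (linear, square)
    print(n, linear, square / n)
```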

Exercise 28 Let {X} be a normed vector space which contains {C^\infty_c({\bf R}^d)} as a dense subspace (and such that the inclusion of {C^\infty_c({\bf R}^d)} to {X} is continuous). The adjoint (or transpose) of this inclusion map is then an injection from {X^*} to the space of distributions {C^\infty_c({\bf R}^d)^*}; thus {X^*} can be viewed as a subspace of the space of distributions.

  • Show that the closed unit ball in {X^*} is also closed in the space of distributions.
  • Conclude that any distributional limit of a bounded sequence in {L^p({\bf R}^d)} for {1 < p \leq \infty} is still in {L^p({\bf R}^d)}.
  • Show that the previous claim fails for {L^1({\bf R}^d)}, but holds for the space {M({\bf R}^d)} of finite measures.

— 3. Tempered distributions —

The list of operations one can define on distributions has one major omission – the Fourier transform {{\mathcal F}}. Unfortunately, one cannot easily define the Fourier transform for all distributions. One can see this as follows. From Plancherel’s theorem one has the identity

\displaystyle  \langle f, {\mathcal F} g \rangle = \langle {\mathcal F}^* f, g \rangle

for test functions {f, g}, so one would like to define the Fourier transform {{\mathcal F} \lambda = \hat \lambda} of a distribution {\lambda} by the formula

\displaystyle  \langle f, {\mathcal F} \lambda \rangle := \langle {\mathcal F}^* f, \lambda \rangle. \ \ \ \ \ (5)

Unfortunately this does not quite work, because the adjoint Fourier transform {{\mathcal F}^*} of a test function is not a test function, but is instead just a Schwartz function. (Indeed, by Exercise 46 of Notes 2, it is not possible to find a non-trivial test function whose Fourier transform is again a test function.) To address this, we need to work with a slightly smaller space than that of all distributions, namely the space of tempered distributions:

Definition 4 (Tempered distributions) A tempered distribution is a continuous linear functional {\lambda: f \mapsto \langle f, \lambda \rangle} on the Schwartz space {{\mathcal S}({\bf R}^d)} (with the topology given by Exercise 25 of Notes 2), i.e. an element of {{\mathcal S}({\bf R}^d)^*}.

Since {C^\infty_c({\bf R}^d)} embeds continuously into {{\mathcal S}({\bf R}^d)} (with a dense image), we see that the space of tempered distributions can be embedded into the space of distributions. However, not every distribution is tempered:

Example 3 The distribution {e^x} is not tempered. Indeed, if {\psi} is a bump function, observe that the sequence of functions {e^{-n} \psi(x-n)} converges to zero in the Schwartz space topology, but {\langle e^{-n} \psi(x-n), e^x \rangle} does not go to zero, and so this distribution does not correspond to a tempered distribution.
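Both halves of this argument can be illustrated numerically for the standard bump function {\psi} (our choice). The sketch below tracks a grid approximation of the weighted sup-norm {\sup_x |x|^k |\phi_n(x)|}, one of the seminorms defining the Schwartz topology (derivatives behave the same way and are omitted), for {\phi_n(x) := e^{-n}\psi(x-n)}, together with the pairing {\int \phi_n(x) e^x\ dx}, which equals {\int \psi(y) e^y\ dy} for every {n}.

```python
import math

def bump(x):
    """Standard bump function exp(-1/(1 - x^2)) on (-1, 1), extended by zero."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0

def midpoint(func, a, b, n=20000):
    h = (b - a) / n
    return h * sum(func(a + (k + 0.5) * h) for k in range(n))

def phi(n):
    """phi_n(x) = e^{-n} psi(x - n), supported in [n - 1, n + 1]."""
    return lambda x: math.exp(-n) * bump(x - n)

def weighted_sup(n, k, samples=2001):
    """Grid approximation of sup_x |x|^k |phi_n(x)| (phi_n vanishes off [n - 1, n + 1])."""
    p = phi(n)
    xs = (n - 1 + 2.0 * j / (samples - 1) for j in range(samples))
    return max(abs(x) ** k * abs(p(x)) for x in xs)

def pairing_with_exp(n):
    p = phi(n)
    return midpoint(lambda x: p(x) * math.exp(x), n - 1.0, n + 1.0)

for n in (0, 10, 30):
    print(n, weighted_sup(n, 3), pairing_with_exp(n))   # seminorm -> 0, pairing stays constant
```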

On the other hand, distributions which avoid this sort of exponential growth, and instead only grow polynomially, tend to be tempered:

Exercise 29 Show that any Radon measure {\mu} which is of polynomial growth in the sense that {|\mu|(B(0,R)) \leq C R^k} for all {R \geq 1} and some constants {C, k > 0}, where {B(0,R)} is the ball of radius {R} centred at the origin in {{\bf R}^d}, is tempered.

Remark 3 As a zeroth approximation, one can roughly think of “tempered” as being synonymous with “polynomial growth”. However, this is not strictly true: for instance, the (weak) derivative of a function of polynomial growth will still be tempered, but need not be of polynomial growth (for instance, the derivative {e^x \cos(e^x)} of {\sin(e^x)} is a tempered distribution, despite having exponential growth). While one can eventually describe which distributions are tempered by measuring their “growth” in both physical space and in frequency space, we will not do so here.

Most of the operations that preserve the space of distributions also preserve the space of tempered distributions. For instance:

Exercise 30

  • Show that any derivative of a tempered distribution is again a tempered distribution.
  • Show that any convolution of a tempered distribution with a compactly supported distribution is again a tempered distribution.
  • Show that if {f} is a measurable function which is rapidly decreasing in the sense that {|x|^k f(x)} is an {L^\infty({\bf R}^d)} function for each {k=0,1,2,\ldots}, then a convolution of a tempered distribution with {f} can be defined, and is again a tempered distribution.
  • Show that if {f} is a smooth function such that {f} and all its derivatives have at most polynomial growth (thus for each {j \geq 0} there exists {C, k \geq 0} such that {|\nabla^j f(x)| \leq C (1 + |x|)^k} for all {x \in {\bf R}^d}) then the product of a tempered distribution with {f} is again a tempered distribution. Give a counterexample to show that this statement fails if the polynomial growth hypotheses are dropped.
  • Show that the translate of a tempered distribution is again a tempered distribution.

But we can now add a new operation to this list using (5): as the Fourier transform {{\mathcal F}} maps Schwartz functions continuously to Schwartz functions, it also continuously maps the space of tempered distributions to itself. One can also define the inverse Fourier transform {{\mathcal F}^* = {\mathcal F}^{-1}} on tempered distributions in a similar manner.

It is not difficult to extend many of the properties of the Fourier transform from Schwartz functions to tempered distributions. For instance:

Exercise 31 Let {\lambda \in {\mathcal S}({\bf R}^d)^*} be a tempered distribution, and let {f \in {\mathcal S}({\bf R}^d)} be a Schwartz function.

  • (Inversion formula) Show that {{\mathcal F}^* {\mathcal F} \lambda = {\mathcal F} {\mathcal F}^* \lambda = \lambda}.
  • (Multiplication intertwines with convolution) Show that {{\mathcal F}(\lambda f) = ({\mathcal F} \lambda) * ({\mathcal F} f)} and {{\mathcal F}(\lambda * f) = ({\mathcal F} \lambda) ({\mathcal F} f)}.
  • (Translation intertwines with modulation) For any {x_0 \in {\bf R}^d}, show that {{\mathcal F}(\tau_{x_0} \lambda) = e_{-x_0} {\mathcal F} \lambda}, where {e_{-x_0}(\xi) := e^{-2\pi i \xi \cdot x_0}}. Similarly, show that for any {\xi_0 \in {\bf R}^d}, one has {{\mathcal F}( e_{\xi_0} \lambda ) = \tau_{\xi_0} {\mathcal F} \lambda}.
  • (Linear transformations) For any invertible linear transformation {L: {\bf R}^d \rightarrow {\bf R}^d}, show that {{\mathcal F}(\lambda \circ L) = \frac{1}{|\det L|} ({\mathcal F} \lambda) \circ (L^*)^{-1}}.
  • (Differentiation intertwines with polynomial multiplication) For any {1 \leq j \leq d}, show that {{\mathcal F}( \frac{\partial}{\partial x_j} \lambda ) = 2\pi i \xi_j {\mathcal F} \lambda}, where {x_j} and {\xi_j} are the {j^{th}} coordinate functions in physical space and frequency space respectively, and similarly {{\mathcal F}( - 2\pi i x_j \lambda ) = \frac{\partial}{\partial \xi_j} {\mathcal F} \lambda}.
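For Schwartz functions these identities can be spot-checked with a crude quadrature of {\hat f(\xi) = \int f(x) e^{-2\pi i x \xi}\ dx} (the sign convention implied by {e_{-x_0}} above). The sketch takes {d=1} and the Gaussian {f(x) = e^{-\pi x^2}}, whose transform is {e^{-\pi \xi^2}}, and checks the differentiation and translation rules at a few sample frequencies; the truncation to {[-10,10]} is an assumption that is harmless for this rapidly decaying {f}.

```python
import cmath
import math

def fourier(f, xi, L=10.0, n=20000):
    """Midpoint approximation of hat f(xi) = integral of f(x) exp(-2 pi i x xi) over [-L, L]."""
    h = 2 * L / n
    total = 0j
    for k in range(n):
        x = -L + (k + 0.5) * h
        total += f(x) * cmath.exp(-2j * math.pi * x * xi)
    return h * total

def f(x):
    return math.exp(-math.pi * x * x)

def df(x):                         # f'(x)
    return -2 * math.pi * x * f(x)

x0 = 0.7

def shifted(x):                    # tau_{x0} f(x) = f(x - x0)
    return f(x - x0)

for xi in (0.0, 0.3, 1.1):
    fhat = fourier(f, xi)
    print(abs(fhat - math.exp(-math.pi * xi ** 2)),                              # hat f is Gaussian
          abs(fourier(df, xi) - 2j * math.pi * xi * fhat),                        # derivative rule
          abs(fourier(shifted, xi) - cmath.exp(-2j * math.pi * xi * x0) * fhat))  # translation rule
```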

Exercise 32 Let {d \geq 1}.

  • (Inversion formula) Show that {{\mathcal F} \delta = 1} and {{\mathcal F} 1 = \delta}.
  • (Orthogonality) Let {V} be a subspace of {{\bf R}^d}, and let {\mu} be Lebesgue measure on {V}. Show that {{\mathcal F} \mu} is Lebesgue measure on the orthogonal complement {V^\perp} of {V}. (Note that this generalises the previous part of this exercise.)
  • (Poisson summation formula) Let {\sum_{k \in {\bf Z}^d} \tau_k \delta} be the distribution

    \displaystyle  \langle f, \sum_{k \in {\bf Z}^d} \tau_k \delta\rangle := \sum_{k \in {\bf Z}^d} f(k).

    Show that this is a tempered distribution which is equal to its own Fourier transform.
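A classical special case makes a convenient numerical check. Pairing with the Gaussian {f(x) = e^{-\pi t x^2}} in {d=1}, whose Fourier transform is {t^{-1/2} e^{-\pi \xi^2/t}} for {t>0}, the self-duality reduces to the theta-function identity {\sum_k e^{-\pi t k^2} = t^{-1/2} \sum_k e^{-\pi k^2/t}}. The sketch below truncates both sums to {|k| \leq 60}, which is far more than enough for the sample values of {t}.

```python
import math

def theta(t, K=60):
    """Truncated sum over |k| <= K of exp(-pi t k^2)."""
    return sum(math.exp(-math.pi * t * k * k) for k in range(-K, K + 1))

for t in (0.1, 0.5, 1.0, 3.0):
    lhs = theta(t)                         # sum of f(k) for f(x) = exp(-pi t x^2)
    rhs = theta(1.0 / t) / math.sqrt(t)    # sum of hat f(k) = t^{-1/2} exp(-pi k^2 / t)
    print(t, lhs, rhs)
```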

One can use these properties of tempered distributions to start solving constant-coefficient PDE. We first illustrate this by an ODE example, showing how the formal symbolic calculus for solving such ODE, which you may have seen as an undergraduate, can now (sometimes) be justified using tempered distributions.

Exercise 33 Let {d=1}, let {a,b} be real numbers, and let {D} be the operator {D = \frac{d}{dx}}.

  • If {a \neq b}, use the Fourier transform to show that all tempered distribution solutions to the ODE {(D-ia)(D-ib) \lambda = 0} are of the form {\lambda = A e^{iax} + B e^{ibx}} for some constants {A, B}.
  • If {a = b}, show that all tempered distribution solutions to the ODE {(D-ia)(D-ib) \lambda = 0} are of the form {\lambda = A e^{iax} + B x e^{iax}} for some constants {A, B}.

Remark 4 More generally, one can solve any homogeneous constant-coefficient ODE using tempered distributions and the Fourier transform so long as the roots of the characteristic polynomial are purely imaginary. In all other cases, solutions can grow exponentially as {x \rightarrow +\infty} or {x \rightarrow -\infty} and so are not tempered. There are other theories of generalised functions that can handle these objects (e.g. hyperfunctions) but we will not discuss them here.

Now we turn to PDE. To illustrate the method, let us focus on solving Poisson’s equation

\displaystyle  \Delta u = f \ \ \ \ \ (6)

in {{\bf R}^d}, where {f} is a Schwartz function, {u} is a distribution, and {\Delta = \sum_{j=1}^d \frac{\partial^2}{\partial x_j^2}} is the Laplacian. (In some texts, particularly those using spectral analysis, the Laplacian is occasionally defined instead as {-\sum_{j=1}^d \frac{\partial^2}{\partial x_j^2}}, to make it positive semi-definite, but we will eschew that sign convention here, though of course the theory is only changed in a trivial fashion if one adopts it.)

We first settle the question of uniqueness:

Exercise 34 Let {d \geq 1}. Using the Fourier transform, show that the only tempered distributions {\lambda \in {\mathcal S}({\bf R}^d)^*} which are harmonic (by which we mean that {\Delta \lambda = 0} in the sense of distributions) are the harmonic polynomials. (Hint: use Exercise 22.) Note that this generalises Liouville’s theorem. There are of course many harmonic functions other than the harmonic polynomials, e.g. {e^x \cos(y)}, but such functions are not tempered distributions.

From the above exercise, we know that the solution {u} to (6), if tempered, is determined only up to the addition of harmonic polynomials. To find a solution, we observe that it is enough to find a fundamental solution, i.e. a tempered distribution {K} solving the equation

\displaystyle  \Delta K = \delta.

Indeed, if one then convolves this equation with the Schwartz function {f}, and uses the identity {(\Delta K) * f = \Delta(K*f)} (which can either be seen directly, or by using Exercise 31), we see that {u=K*f} will be a tempered distribution solution to (6) (and all the other solutions will equal this solution plus a harmonic polynomial). So, it is enough to locate a fundamental solution {K}. We can take Fourier transforms and rewrite this equation as

\displaystyle  -4\pi^2 |\xi|^2 \hat K(\xi) = 1

(here we are treating the tempered distribution {\hat K} as a function to emphasise that the dependent variable is now {\xi}). It is then natural to propose to solve this equation as

\displaystyle  \hat K(\xi) = \frac{1}{-4\pi^2 |\xi|^2}, \ \ \ \ \ (7)

though this may not be the unique solution (for instance, one is free to modify {\hat K} by a multiple of the Dirac delta function, cf. Exercise 16).

A short computation in polar coordinates shows that {\frac{1}{-4\pi^2 |\xi|^2}} is locally integrable in dimensions {d \geq 3}, so the right-hand side of (7) makes sense. To then compute {K} explicitly, we have from the distributional inversion formula that

\displaystyle  K = \frac{-1}{4\pi^2} {\mathcal F}^* |\xi|^{-2}

so we now need to figure out what the Fourier transform of a negative power of {|x|} (or the adjoint Fourier transform of a negative power of {|\xi|}) is.

Let us work formally at first, and consider the problem of computing the Fourier transform of the function {|x|^{-\alpha}} in {{\bf R}^d} for some exponent {\alpha}. A direct attack, based on evaluating the (formal) Fourier integral

\displaystyle  \widehat{|x|^{-\alpha}}(\xi) = \int_{{\bf R}^d} |x|^{-\alpha} e^{-2\pi i \xi \cdot x}\ dx \ \ \ \ \ (8)

does not seem to make much sense (the integral is not absolutely convergent), although a change of variables (or dimensional analysis) heuristic can at least lead to the prediction that the integral (8) should be some multiple of {|\xi|^{\alpha-d}}. But which multiple should it be? To continue the formal calculation, we can write the non-integrable function {|x|^{-\alpha}} as an average of integrable functions whose Fourier transforms are already known. There are many such functions that one could use here, but it is natural to use Gaussians, as they have a particularly pleasant Fourier transform, namely

\displaystyle  \widehat{ e^{-\pi t^2 |x|^2} }(\xi) = t^{-d} e^{-\pi |\xi|^2 / t^2}

for {t>0} (see Exercise 42 of Notes 2). To get from Gaussians to {|x|^{-\alpha}}, one can observe that {|x|^{-\alpha}} is invariant under the scaling {f(x) \mapsto t^{\alpha} f(tx)} for {t>0}. Thus, it is natural to average the standard Gaussian {e^{-\pi |x|^2}} with respect to this scaling, thus producing the function {t^{\alpha} e^{-\pi t^2 |x|^2}}, then integrate with respect to the multiplicative Haar measure {\frac{dt}{t}}. A straightforward change of variables then gives the identity

\displaystyle  \int_0^\infty t^{\alpha} e^{-\pi t^2 |x|^2} \frac{dt}{t} = \frac{1}{2} \pi^{-\alpha/2} |x|^{-\alpha} \Gamma(\alpha/2)

where

\displaystyle  \Gamma(s) := \int_0^\infty t^s e^{-t} \frac{dt}{t}

is the Gamma function. If we formally take Fourier transforms of this identity, we obtain

\displaystyle  \int_0^\infty t^{\alpha} t^{-d} e^{-\pi |\xi|^2/t^2} \frac{dt}{t} = \frac{1}{2} \pi^{-\alpha/2} \widehat{|x|^{-\alpha}}(\xi) \Gamma(\alpha/2).

Another change of variables shows that

\displaystyle  \int_0^\infty t^{\alpha} t^{-d} e^{-\pi |\xi|^2/t^2} \frac{dt}{t} = \frac{1}{2} \pi^{-(d-\alpha)/2} |\xi|^{-(d-\alpha)} \Gamma((d-\alpha)/2)

and so we conclude (formally) that

\displaystyle  \widehat{|x|^{-\alpha}}(\xi) = \frac{\pi^{-(d-\alpha)/2} \Gamma((d-\alpha)/2)}{\pi^{-\alpha/2} \Gamma(\alpha/2)} |\xi|^{-(d-\alpha)} \ \ \ \ \ (9)

thus solving the problem of what the constant multiple of {|\xi|^{-(d-\alpha)}} should be.
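Both changes of variables in this formal computation are instances of the single identity {\int_0^\infty t^{\beta} e^{-\pi t^2 r^2} \frac{dt}{t} = \frac{1}{2} \pi^{-\beta/2} r^{-\beta} \Gamma(\beta/2)} for {\beta, r > 0}: take {\beta = \alpha}, {r = |x|} for the first, and substitute {t \mapsto 1/t} with {\beta = d-\alpha}, {r = |\xi|} for the second. The Python sketch below checks this identity numerically, writing {t = e^s} so that {\frac{dt}{t} = ds}; the truncation window for {s} is an assumption chosen to suit the sample parameters.

```python
import math

def mellin_gaussian(beta, r, s_min=-80.0, s_max=10.0, n=45000):
    """Integral over t > 0 of t^beta exp(-pi t^2 r^2) dt / t, written as an integral in s = log t."""
    h = (s_max - s_min) / n
    total = 0.0
    for k in range(n):
        s = s_min + (k + 0.5) * h
        total += math.exp(beta * s - math.pi * r * r * math.exp(2 * s))
    return h * total

def closed_form(beta, r):
    return 0.5 * math.pi ** (-beta / 2) * r ** (-beta) * math.gamma(beta / 2)

for beta in (0.5, 1.0, 2.0):
    for r in (0.7, 1.3):
        print(beta, r, mellin_gaussian(beta, r), closed_form(beta, r))
```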

Exercise 35 Give a rigorous proof of (9) for {0 < \alpha < d} (when both sides are locally integrable) in the sense of distributions. (Hint: basically, one needs to test the entire formal argument against an arbitrary Schwartz function.) The identity (9) can in fact be continued meromorphically in {\alpha}, but the interpretation of distributions such as {|x|^{-\alpha}} when {|x|^{-\alpha}} is not locally integrable is somewhat complicated (cf. Exercise 12) and will not be discussed here.

Specialising back to the current situation with {d=3, \alpha=2}, and using the standard identities

\displaystyle  \Gamma(n) = (n-1)!; \quad \Gamma(\frac{1}{2}) = \sqrt{\pi}

we see that

\displaystyle  \widehat{\frac{1}{|x|^2}}(\xi) = \pi |\xi|^{-1}

and similarly

\displaystyle  {\mathcal F}^* \frac{1}{|\xi|^2} = \pi |x|^{-1}

and so from (7) we see that one choice of the fundamental solution {K} is the Newton potential

\displaystyle  K = \frac{-1}{4\pi |x|},

leading to an explicit (and rigorously derived) solution

\displaystyle  u(x) := f*K(x) = -\frac{1}{4\pi} \int_{{\bf R}^3} \frac{f(y)}{|x-y|}\ dy \ \ \ \ \ (10)

to the Poisson equation (6) in {d=3} for Schwartz functions {f}. (This is not quite the only fundamental solution {K} available; one can add a harmonic polynomial to {K}, which will end up adding a harmonic polynomial to {u}, since the convolution of a harmonic polynomial with a Schwartz function is easily seen to again be a harmonic polynomial.)
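The constants in this computation can be double-checked numerically. The sketch below (an illustration using math.gamma from Python's standard library) evaluates the constant in (9), recovers the value {\pi} for {d=3, \alpha=2} and the resulting coefficient {-\frac{1}{4\pi}} of the Newton potential, and checks the consistency relation {c(d,\alpha)c(d,d-\alpha) = 1}, which (9) must satisfy because two applications of the Fourier transform return the even function {|x|^{-\alpha}} to itself.

```python
import math

def power_transform_constant(d, alpha):
    """Constant c in (9): the Fourier transform of |x|^-alpha is c |xi|^-(d - alpha)."""
    num = math.pi ** (-(d - alpha) / 2) * math.gamma((d - alpha) / 2)
    den = math.pi ** (-alpha / 2) * math.gamma(alpha / 2)
    return num / den

c = power_transform_constant(3, 2)
print(c, math.pi)                                     # d = 3, alpha = 2 gives pi
print(-c / (4 * math.pi ** 2), -1 / (4 * math.pi))    # coefficient of 1/|x| in K, via (7)
for d, alpha in ((3, 1.0), (4, 1.5), (5, 2.5)):
    print(power_transform_constant(d, alpha) * power_transform_constant(d, d - alpha))  # each close to 1
```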

Exercise 36 Without using the theory of distributions, give an alternate (and still rigorous) proof that the function {u} defined in (10) solves (6) in {d=3}.

Exercise 37

  • Show that for any {d \geq 3}, a fundamental solution {K} to the Poisson equation is given by the locally integrable function

    \displaystyle  K(x) = -\frac{1}{d(d-2) \omega_d} \frac{1}{|x|^{d-2}},

    where {\omega_d = \pi^{d/2} / \Gamma(\frac{d}{2}+1)} is the volume of the unit ball in {d} dimensions.

  • Show that for {d=1}, a fundamental solution is given by the locally integrable function {K(x)=|x|/2}.
  • Show that for {d=2}, a fundamental solution is given by the locally integrable function {K(x)= \frac{1}{2\pi} \log |x|}.

Thus we see that for the Poisson equation, {d=2} is a “critical” dimension, requiring a logarithmic correction to the usual formula.
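The signs and constants in Exercise 37 can be sanity-checked without distributions: by the divergence theorem, a radial {K} with {\Delta K = \delta} should be harmonic away from the origin and have unit flux {|S^{d-1}| r^{d-1} K'(r) = 1} through every sphere {|x| = r}, where {|S^{d-1}| = d \omega_d}. The sketch below applies these two criteria to {K = |x|/2}, {\frac{1}{2\pi}\log|x|}, and {-\frac{1}{d(d-2)\omega_d} |x|^{2-d}} for {d \geq 3} (note the minus sign, which matches the Newton potential {-\frac{1}{4\pi |x|}} when {d=3}), using the radial Laplacian {K'' + \frac{d-1}{r} K'} and central differences.

```python
import math

def ball_volume(d):
    """omega_d = pi^(d/2) / Gamma(d/2 + 1), the volume of the unit ball in R^d."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)

def K(d, r):
    """Radial profile of the fundamental solution of Delta K = delta (r = |x| > 0)."""
    if d == 1:
        return r / 2
    if d == 2:
        return math.log(r) / (2 * math.pi)
    return -1.0 / (d * (d - 2) * ball_volume(d)) * r ** (2 - d)

def radial_laplacian(d, r, h=1e-4):
    """K'' + (d - 1)/r K' by central differences."""
    k0, kp, km = K(d, r), K(d, r + h), K(d, r - h)
    return (kp - 2 * k0 + km) / h ** 2 + (d - 1) / r * (kp - km) / (2 * h)

def flux(d, r, h=1e-5):
    """|S^{d-1}| r^{d-1} K'(r), with |S^{d-1}| = d * omega_d."""
    dK = (K(d, r + h) - K(d, r - h)) / (2 * h)
    return d * ball_volume(d) * r ** (d - 1) * dK

print(3 * (3 - 2) * ball_volume(3), 4 * math.pi)   # d = 3 recovers the 4 pi of the Newton potential
for d in range(1, 7):
    print(d, radial_laplacian(d, 1.3), flux(d, 0.5), flux(d, 2.0))   # roughly 0, 1, 1
```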

Similar methods can solve other constant coefficient linear PDE. We give some standard examples in the exercises below.

Exercise 38 Let {d \geq 1}. Show that a smooth solution {u: {\bf R}^+ \times {\bf R}^d \rightarrow {\bf C}} to the heat equation {\partial_t u = \Delta u} with initial data {u(0,x) = f(x)} for some Schwartz function {f} is given by {u(t) = f * K_t} for {t>0}, where {K_t} is the heat kernel

\displaystyle  K_t(x) = \frac{1}{(4\pi t)^{d/2}} e^{-|x|^2/4t}.

(This solution is unique assuming certain smoothness and decay conditions at infinity, but we will not pursue this issue here.)
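In {d=1} one can at least confirm numerically that the heat kernel has unit mass and satisfies {\partial_t K_t = \Delta K_t} for {t > 0}. The sketch below does this with a midpoint rule and central differences at a sample point {(t,x) = (0.3, 0.4)}; it is an illustration only and says nothing about the uniqueness question.

```python
import math

def heat_kernel(t, x):
    """K_t(x) = (4 pi t)^(-1/2) exp(-x^2 / (4t)) in one dimension, t > 0."""
    return math.exp(-x * x / (4 * t)) / math.sqrt(4 * math.pi * t)

def midpoint(func, a, b, n=40000):
    h = (b - a) / n
    return h * sum(func(a + (k + 0.5) * h) for k in range(n))

t, x, h = 0.3, 0.4, 1e-4
mass = midpoint(lambda y: heat_kernel(t, y), -20.0, 20.0)
dt = (heat_kernel(t + h, x) - heat_kernel(t - h, x)) / (2 * h)
dxx = (heat_kernel(t, x + h) - 2 * heat_kernel(t, x) + heat_kernel(t, x - h)) / h ** 2
print(mass, dt - dxx)                  # close to 1, and close to 0
```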

Exercise 39 Let {d \geq 1}. Show that a smooth solution {u: {\bf R} \times {\bf R}^d \rightarrow {\bf C}} to the Schrödinger equation {\partial_t u = i \Delta u} with initial data {u(0,x) = f(x)} for some Schwartz function {f} is given by {u(t) = f * K_t} for {t \neq 0}, where {K_t} is the Schrödinger kernel

\displaystyle  K_t(x) = \frac{1}{(4\pi i t)^{d/2}} e^{i|x|^2/4t}

and we use the standard branch of the complex logarithm (with cut on the negative real axis) to define {(4\pi i t)^{d/2}}. (Hint: You may wish to investigate the Fourier transform of {e^{-z|\xi|^2}}, where {z} is a complex number with positive real part, and then let {z} approach the imaginary axis.) (The close similarity with the heat kernel is a manifestation of Wick rotation in action. However, from an analytical viewpoint, the two kernels are very different. For instance, the convergence of {f*K_t} to {f} as {t \rightarrow 0} follows in the heat kernel case by the theory of approximations to the identity, whereas the convergence in the Schrödinger case is much more subtle, and is best seen via Fourier analysis.)
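Following the hint, one can test both the equation and the branch choice numerically in {d=1}. The sketch below checks {\partial_t K_t = i \Delta K_t} by central differences for {t = \pm 0.3}, and checks that the damped kernel {(4\pi z)^{-1/2} e^{-x^2/4z}} with {z = \varepsilon + it} and {\varepsilon = 0.05} (an arbitrary sample value) has unit mass when the square root is the principal branch, as computed by Python's cmath.sqrt. Setting {z = it} in the damped kernel recovers {K_t}. This illustrates the hint and is not a proof.

```python
import cmath
import math

def schrodinger_kernel(t, x):
    """K_t(x) = (4 pi i t)^(-1/2) exp(i x^2 / (4t)), principal branch of the square root."""
    return cmath.exp(1j * x * x / (4 * t)) / cmath.sqrt(4j * math.pi * t)

def damped_kernel(z, x):
    """Heat-type kernel (4 pi z)^(-1/2) exp(-x^2 / (4z)) for Re z > 0."""
    return cmath.exp(-x * x / (4 * z)) / cmath.sqrt(4 * math.pi * z)

def midpoint(func, a, b, n=40000):
    h = (b - a) / n
    return h * sum(func(a + (k + 0.5) * h) for k in range(n))

def pde_residual(t, x=0.4, h=1e-4):
    """|d/dt K - i d^2/dx^2 K| at (t, x), by central differences."""
    dt = (schrodinger_kernel(t + h, x) - schrodinger_kernel(t - h, x)) / (2 * h)
    dxx = (schrodinger_kernel(t, x + h) - 2 * schrodinger_kernel(t, x)
           + schrodinger_kernel(t, x - h)) / h ** 2
    return abs(dt - 1j * dxx)

def damped_mass(t, eps=0.05):
    """Integral of the damped kernel with z = eps + i t, truncated to [-30, 30]."""
    return midpoint(lambda y: damped_kernel(eps + 1j * t, y), -30.0, 30.0)

for t in (0.3, -0.3):
    print(t, pde_residual(t), damped_mass(t))   # residual close to 0, mass close to 1
```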

Exercise 40 Let {d=3}. Show that a smooth solution {u: {\bf R} \times {\bf R}^3 \rightarrow {\bf C}} to the wave equation {-\partial_{tt} u + \Delta u = 0} with initial data {u(0,x) = f(x), \partial_t u(0,x) = g(x)} for some Schwartz functions {f, g} is given by the formula

\displaystyle  u(t) = f * \partial_t K_t + g * K_t

for {t \neq 0}, where {K_t} is the distribution

\displaystyle  \langle f, K_t \rangle := \frac{t}{4\pi} \int_{S^2} f(t\omega)\ d\omega

where {d\omega} is surface measure on the sphere {S^2}, and the derivative {\partial_t K_t} is defined in the Newtonian sense {\lim_{dt \rightarrow 0} \frac{K_{t+dt}-K_t}{dt}}, with the limit taken in the sense of distributions.
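One way to see why this {K_t} is the right kernel is to compute its Fourier transform. Testing against {e^{-2\pi i x \cdot \xi}} and integrating over the sphere in polar coordinates about the direction of {\xi} gives {\hat K_t(\xi) = \frac{\sin(2\pi t|\xi|)}{2\pi |\xi|}}. This function solves {\partial_{tt} \hat K_t = -4\pi^2 |\xi|^2 \hat K_t} with {\hat K_0 = 0} and {\partial_t \hat K_t|_{t=0} = 1}, which is the Fourier-side form of the wave equation with these initial data. The Python sketch below checks the sphere integral numerically, assuming the reduction {\int_{S^2} F(\omega \cdot e)\ d\omega = 2\pi \int_{-1}^1 F(u)\ du} for a unit vector {e} (valid for the standard surface measure, of total mass {4\pi}).

```python
import cmath
import math

def kirchhoff_hat(t, rho, n=20000):
    """hat K_t(xi) for |xi| = rho: (t / 4 pi) * integral over S^2 of exp(-2 pi i t omega . xi),
    reduced to (t / 4 pi) * 2 pi * integral over u in [-1, 1] of exp(-2 pi i t rho u) du."""
    h = 2.0 / n
    total = 0j
    for k in range(n):
        u = -1.0 + (k + 0.5) * h
        total += cmath.exp(-2j * math.pi * t * rho * u)
    return t / (4 * math.pi) * 2 * math.pi * h * total

def closed_form(t, rho):
    return math.sin(2 * math.pi * t * rho) / (2 * math.pi * rho)

for t, rho in ((0.5, 1.0), (1.3, 0.4), (-0.7, 2.0)):
    print(t, rho, kirchhoff_hat(t, rho), closed_form(t, rho))   # agree up to quadrature error
```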

Remark 5 The theory of (tempered) distributions is also highly effective for studying variable coefficient linear PDE, especially if the coefficients are fairly smooth, and particularly if one is primarily interested in the singularities of solutions to such PDE and how they propagate; here the Fourier transform must be augmented with more general transforms of this type, such as Fourier integral operators. A classic reference for this topic is the four volumes of Hörmander’s “The analysis of linear partial differential operators”. For nonlinear PDE, subspaces of the space of distributions, such as Sobolev spaces, tend to be more useful.