
[These are notes intended mostly for myself, as these topics are useful in random matrix theory, but may be of interest to some readers also. -T.]

One of the most fundamental partial differential equations in mathematics is the heat equation

$\displaystyle \partial_t f = L f \ \ \ \ \ (1)$

where ${f: [0,+\infty) \times {\bf R}^n \rightarrow {\bf R}}$ is a scalar function ${(t,x) \mapsto f(t,x)}$ of both time and space, and ${L}$ is the Laplacian ${L := \frac{1}{2} \Delta = \frac{1}{2} \sum_{i=1}^n \frac{\partial^2}{\partial x_i^2}}$. For the purposes of this post, we will ignore all technical issues of regularity and decay, and always assume that the solutions to equations such as (1) have all the regularity and decay needed to justify formal operations such as the chain rule, integration by parts, or differentiation under the integral sign. The factor of ${\frac{1}{2}}$ in the definition of the operator ${L}$ is of course an arbitrary normalisation, chosen for some minor technical reasons; one can certainly continue the discussion below with other choices of normalisation if desired.

In probability theory, this equation takes on particular significance when ${f}$ is restricted to be non-negative, and furthermore to be a probability measure at each time, in the sense that

$\displaystyle \int_{{\bf R}^n} f(t,x)\ dx = 1$

for all ${t}$. (Actually, it suffices to verify this constraint at time ${t=0}$, as the heat equation (1) will then preserve this constraint.) Indeed, in this case, one can interpret ${f(t,x)\ dx}$ as the probability distribution of a Brownian motion

$\displaystyle dx = dB(t) \ \ \ \ \ (2)$

where ${x = x(t) \in {\bf R}^n}$ is a stochastic process with initial probability distribution ${f(0,x)\ dx}$; see for instance this previous blog post for more discussion.

A model example of a solution to the heat equation to keep in mind is that of the fundamental solution

$\displaystyle G(t,x) = \frac{1}{(2\pi t)^{n/2}} e^{-|x|^2/2t} \ \ \ \ \ (3)$

defined for any ${t>0}$, which represents the distribution of Brownian motion of a particle starting at the origin ${x=0}$ at time ${t=0}$. At time ${t}$, ${G(t,x)}$ represents an ${{\bf R}^n}$-valued random variable, each coefficient of which is an independent random variable of mean zero and variance ${t}$. (As ${t \rightarrow 0^+}$, ${G(t)}$ converges in the sense of distributions to a Dirac mass at the origin.)
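As a quick sanity check on (3), here is a minimal numerical sketch in dimension ${n=1}$. It checks by quadrature that ${G(t,\cdot)}$ has total mass one, and by finite differences that ${\partial_t G - \frac{1}{2} \partial_{xx} G}$ is close to zero. The function names, step sizes and truncation width are illustrative choices, not part of the discussion above.

```python
import math

def heat_kernel(t, x):
    """Fundamental solution (3) in dimension n = 1."""
    return math.exp(-x * x / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)

def total_mass(t, half_width=12.0, steps=20000):
    """Midpoint-rule integral of G(t, .) over [-a, a] with a = half_width * sqrt(t)."""
    a = half_width * math.sqrt(t)
    h = 2.0 * a / steps
    return h * sum(heat_kernel(t, -a + (k + 0.5) * h) for k in range(steps))

def heat_residual(t, x, h=1e-3):
    """Central-difference estimate of dG/dt - (1/2) d^2G/dx^2 at the point (t, x)."""
    dt = (heat_kernel(t + h, x) - heat_kernel(t - h, x)) / (2.0 * h)
    dxx = (heat_kernel(t, x + h) - 2.0 * heat_kernel(t, x)
           + heat_kernel(t, x - h)) / (h * h)
    return dt - 0.5 * dxx

if __name__ == "__main__":
    print(total_mass(0.5))          # should be very close to 1
    print(heat_residual(1.0, 0.7))  # should be very close to 0
```

The residual is only zero up to the ${O(h^2)}$ error of the difference quotients, so it should be read as a consistency check rather than a proof.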

The heat equation can also be viewed as the gradient flow for the Dirichlet form

$\displaystyle D(f,g) := \frac{1}{2} \int_{{\bf R}^n} \nabla f \cdot \nabla g\ dx \ \ \ \ \ (4)$

since one has the integration by parts identity

$\displaystyle \int_{{\bf R}^n} Lf(x) g(x)\ dx = \int_{{\bf R}^n} f(x) Lg(x)\ dx = - D(f,g) \ \ \ \ \ (5)$

for all smooth, rapidly decreasing ${f,g}$, which formally implies that ${L f}$ is (half of) the negative gradient of the Dirichlet energy ${D(f,f) = \frac{1}{2} \int_{{\bf R}^n} |\nabla f|^2\ dx}$ with respect to the ${L^2({\bf R}^n,dx)}$ inner product. Among other things, this implies that the Dirichlet energy decreases in time:

$\displaystyle \partial_t D(f,f) = - 2 \int_{{\bf R}^n} |Lf|^2\ dx. \ \ \ \ \ (6)$

For instance, for the fundamental solution (3), one can verify for any time ${t>0}$ that

$\displaystyle D(G,G) = \frac{n}{2^{n+2} \pi^{n/2}} \frac{1}{t^{(n+2)/2}} \ \ \ \ \ (7)$

(assuming I have not made a mistake in the calculation). In a similar spirit we have

$\displaystyle \partial_t \int_{{\bf R}^n} |f|^2\ dx = - 2 D(f,f). \ \ \ \ \ (8)$

Since ${D(f,f)}$ is non-negative, the formula (6) implies that ${\int_{{\bf R}^n} |Lf|^2\ dx}$ is integrable in time, and in particular we see that ${Lf}$ converges to zero as ${t \rightarrow \infty}$, in some averaged ${L^2}$ sense at least; similarly, (8) suggests that ${D(f,f)}$ also converges to zero. This suggests that ${f}$ converges to a constant function; but as ${f}$ is also supposed to decay to zero at spatial infinity, we thus expect solutions to the heat equation in ${{\bf R}^n}$ to decay to zero in some sense as ${t \rightarrow \infty}$. However, the decay is only expected to be polynomial in nature rather than exponential; for instance, the solution (3) decays in the ${L^\infty}$ norm like ${O(t^{-n/2})}$.
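The closed form (7) is easy to test numerically. The following sketch, restricted to ${n=1}$, computes ${D(G,G) = \frac{1}{2} \int |\partial_x G|^2\ dx}$ by a midpoint rule and compares it with the right-hand side of (7); the quadrature parameters and function names are assumptions made for illustration.

```python
import math

def dirichlet_energy_numeric(t, half_width=12.0, steps=20000):
    """Midpoint-rule approximation of D(G, G) = (1/2) * integral of |dG/dx|^2, n = 1."""
    a = half_width * math.sqrt(t)
    h = 2.0 * a / steps
    total = 0.0
    for k in range(steps):
        x = -a + (k + 0.5) * h
        kernel = math.exp(-x * x / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)
        gradient = -x / t * kernel          # exact derivative of the Gaussian in x
        total += gradient * gradient
    return 0.5 * h * total

def dirichlet_energy_formula(t, n=1):
    """Right-hand side of (7)."""
    return n / (2.0 ** (n + 2) * math.pi ** (n / 2.0) * t ** ((n + 2) / 2.0))

if __name__ == "__main__":
    for t in (0.5, 1.0, 2.0):
        print(t, dirichlet_energy_numeric(t), dirichlet_energy_formula(t))
```

In one dimension both columns should agree, and the values decay like ${t^{-3/2}}$, which illustrates the polynomial (rather than exponential) decay mentioned above.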

Since ${L1=0}$, we also observe the basic cancellation property

$\displaystyle \int_{{\bf R}^n} Lf(x) \ dx = 0 \ \ \ \ \ (9)$

for any function ${f}$.

There are other quantities relating to ${f}$ that also decrease in time under heat flow, particularly in the important case when ${f}$ is a probability measure. In this case, it is natural to introduce the entropy

$\displaystyle S(f) := \int_{{\bf R}^n} f(x) \log f(x)\ dx.$

Thus, for instance, if ${f(x)\ dx}$ is the uniform distribution on some measurable subset ${E}$ of ${{\bf R}^n}$ of finite measure ${|E|}$, the entropy would be ${-\log |E|}$. Intuitively, as the entropy decreases, the probability distribution gets wider and flatter. For instance, in the case of the fundamental solution (3), one has ${S(G) = -\frac{n}{2} \log( 2 \pi e t )}$ for any ${t>0}$, reflecting the fact that ${G(t)}$ is approximately uniformly distributed on a ball of radius ${O(\sqrt{t})}$ (and thus of measure ${O(t^{n/2})}$).

A short formal computation shows (if one assumes for simplicity that ${f}$ is strictly positive, which is not an unreasonable hypothesis, particularly in view of the strong maximum principle) using (9), (5) that

$\displaystyle \partial_t S(f) = \int_{{\bf R}^n} (Lf) \log f + f \frac{Lf}{f}\ dx$

$\displaystyle = \int_{{\bf R}^n} (Lf) \log f\ dx$

$\displaystyle = - D( f, \log f )$

$\displaystyle = - \frac{1}{2} \int_{{\bf R}^n} \frac{|\nabla f|^2}{f}\ dx$

$\displaystyle = - 4D( g, g )$

where ${g := \sqrt{f}}$ is the square root of ${f}$. For instance, if ${f}$ is the fundamental solution (3), one can check that ${D(g,g) = \frac{n}{8t}}$ (note that this is a significantly cleaner formula than (7)!).

In particular, the entropy is decreasing, which corresponds well to one’s intuition that the heat equation (or Brownian motion) should serve to spread out a probability distribution over time.
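For the fundamental solution, the entropy dissipation identity ${\partial_t S(f) = -4 D(g,g)}$ can be checked numerically. The sketch below works in dimension ${n=1}$: it computes ${D(g,g)}$ for ${g = \sqrt{G}}$ by quadrature, compares it with ${\frac{n}{8t}}$, and compares ${-4D(g,g)}$ with a finite-difference derivative of the closed form ${S(G) = -\frac{n}{2} \log(2\pi e t)}$ quoted earlier. All names and numerical parameters are illustrative assumptions.

```python
import math

def sqrt_kernel_energy(t, half_width=12.0, steps=20000):
    """Midpoint-rule approximation of D(g, g) for g = sqrt(G(t, .)), with n = 1."""
    a = half_width * math.sqrt(t)
    h = 2.0 * a / steps
    total = 0.0
    for k in range(steps):
        x = -a + (k + 0.5) * h
        g = math.exp(-x * x / (4.0 * t)) / (2.0 * math.pi * t) ** 0.25
        dg = -x / (2.0 * t) * g             # exact derivative of g in x
        total += dg * dg
    return 0.5 * h * total

def entropy_closed_form(t, n=1):
    """S(G) = -(n/2) log(2 pi e t), as stated in the text."""
    return -0.5 * n * math.log(2.0 * math.pi * math.e * t)

def entropy_rate(t, h=1e-5):
    """Central-difference estimate of dS/dt for the fundamental solution."""
    return (entropy_closed_form(t + h) - entropy_closed_form(t - h)) / (2.0 * h)

if __name__ == "__main__":
    for t in (0.5, 1.0, 2.0):
        # The three printed quantities should agree up to sign conventions:
        # D(g,g), n/(8t), and -(1/4) dS/dt.
        print(sqrt_kernel_energy(t), 1.0 / (8.0 * t), -0.25 * entropy_rate(t))
```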

Actually, one can say more: the rate of decrease ${4D(g,g)}$ of the entropy is itself decreasing, or in other words the entropy is convex. I do not have a satisfactorily intuitive reason for this phenomenon, but it can be proved by straightforward application of basic several variable calculus tools (such as the chain rule, product rule, quotient rule, and integration by parts), and completing the square. Namely, by using the chain rule we have

$\displaystyle L \phi(f) = \phi'(f) Lf + \frac{1}{2} \phi''(f) |\nabla f|^2, \ \ \ \ \ (10)$

valid for any smooth function ${\phi: {\bf R} \rightarrow {\bf R}}$; applying this with ${\phi(g) := g^2 = f}$, we see from (1) that

$\displaystyle 2 g \partial_t g = 2 g L g + |\nabla g|^2$

and thus (again assuming that ${f}$, and hence ${g}$, is strictly positive to avoid technicalities)

$\displaystyle \partial_t g = Lg + \frac{|\nabla g|^2}{2g}.$

We thus have

$\displaystyle \partial_t D(g,g) = 2 D(g,Lg) + D(g, \frac{|\nabla g|^2}{g} ).$

It is now convenient to compute using the Einstein summation convention to hide the summation over indices ${i,j = 1,\ldots,n}$. We have

$\displaystyle 2 D(g,Lg) = \frac{1}{2} \int_{{\bf R}^n} (\partial_i g) (\partial_i \partial_j \partial_j g)\ dx$

and

$\displaystyle D(g, \frac{|\nabla g|^2}{g} ) = \frac{1}{2} \int_{{\bf R}^n} (\partial_i g) \partial_i \frac{\partial_j g \partial_j g}{g}\ dx.$

By integration by parts and interchanging partial derivatives, we may write the first integral as

$\displaystyle 2 D(g,Lg) = - \frac{1}{2} \int_{{\bf R}^n} (\partial_i \partial_j g) (\partial_i \partial_j g)\ dx,$

and from the quotient and product rules, we may write the second integral as

$\displaystyle D(g, \frac{|\nabla g|^2}{g} ) = \int_{{\bf R}^n} \frac{(\partial_i g) (\partial_j g) (\partial_i \partial_j g)}{g} - \frac{(\partial_i g) (\partial_j g) (\partial_i g) (\partial_j g)}{2g^2}\ dx.$

Gathering terms, completing the square, and making the summations explicit again, we see that

$\displaystyle \partial_t D(g,g) =- \frac{1}{2} \int_{{\bf R}^n} \frac{\sum_{i=1}^n \sum_{j=1}^n |g \partial_i \partial_j g - (\partial_i g) (\partial_j g)|^2}{g^2}\ dx$

and so in particular ${D(g,g)}$ is always decreasing.

The above identity can also be written as

$\displaystyle \partial_t D(g,g) = - \frac{1}{2} \int_{{\bf R}^n} |\nabla^2 \log g|^2 g^2\ dx.$

Exercise 1 Give an alternate proof of the above identity by writing ${f = e^{2u}}$, ${g = e^u}$ and deriving the equation ${\partial_t u = Lu + |\nabla u|^2}$ for ${u}$.

It was observed in a well known paper of Bakry and Emery that the above monotonicity properties hold for a much larger class of heat flow-type equations, and lead to a number of important relations between energy and entropy, such as the log-Sobolev inequality of Gross and of Federbush, and the hypercontractivity inequality of Nelson; we will discuss one such family of generalisations (or more precisely, variants) below the fold.

Let ${L: H \rightarrow H}$ be a self-adjoint operator on a finite-dimensional Hilbert space ${H}$. The behaviour of this operator can be completely described by the spectral theorem for finite-dimensional self-adjoint operators (i.e. Hermitian matrices, when viewed in coordinates), which provides a sequence ${\lambda_1,\ldots,\lambda_n \in {\bf R}}$ of eigenvalues and an orthonormal basis ${e_1,\ldots,e_n}$ of eigenfunctions such that ${L e_i = \lambda_i e_i}$ for all ${i=1,\ldots,n}$. In particular, given any function ${m: \sigma(L) \rightarrow {\bf C}}$ on the spectrum ${\sigma(L) := \{ \lambda_1,\ldots,\lambda_n\}}$ of ${L}$, one can then define the linear operator ${m(L): H \rightarrow H}$ by the formula

$\displaystyle m(L) e_i := m(\lambda_i) e_i,$

which then gives a functional calculus, in the sense that the map ${m \mapsto m(L)}$ is a ${C^*}$-algebra isometric homomorphism from the algebra ${BC(\sigma(L) \rightarrow {\bf C})}$ of bounded continuous functions from ${\sigma(L)}$ to ${{\bf C}}$, to the algebra ${B(H \rightarrow H)}$ of bounded linear operators on ${H}$. Thus, for instance, one can define heat operators ${e^{-tL}}$ for ${t>0}$, Schrödinger operators ${e^{itL}}$ for ${t \in {\bf R}}$, resolvents ${\frac{1}{L-z}}$ for ${z \not \in \sigma(L)}$, and (if ${L}$ is positive) wave operators ${e^{it\sqrt{L}}}$ for ${t \in {\bf R}}$. These will be bounded operators (and, in the case of the Schrödinger and wave operators, unitary operators, and in the case of the heat operators with ${L}$ positive, they will be contractions). Among other things, this functional calculus can then be used to solve differential equations such as the heat equation

$\displaystyle u_t + Lu = 0; \quad u(0) = f \ \ \ \ \ (1)$

the Schrödinger equation

$\displaystyle u_t + iLu = 0; \quad u(0) = f \ \ \ \ \ (2)$

the wave equation

$\displaystyle u_{tt} + Lu = 0; \quad u(0) = f; \quad u_t(0) = g \ \ \ \ \ (3)$

or the Helmholtz equation

$\displaystyle (L-z) u = f. \ \ \ \ \ (4)$
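To make the finite-dimensional functional calculus concrete, here is a minimal sketch for a hypothetical ${2 \times 2}$ example, the symmetric matrix with rows ${(2,-1)}$ and ${(-1,2)}$, whose eigenpairs are written down by hand. The helper names are assumptions for illustration. The solution maps are ${u(t) = e^{-tL} f}$ for the heat equation (1), ${u(t) = e^{-itL} f}$ for the Schrödinger equation (2), and ${u = (L-z)^{-1} f}$ for the Helmholtz equation (4).

```python
import cmath
import math

# Hypothetical example operator and its hand-computed eigenpairs (eigenvalues 1 and 3).
L = [[2.0, -1.0], [-1.0, 2.0]]
S = 1.0 / math.sqrt(2.0)
EIGENPAIRS = [(1.0, [S, S]), (3.0, [S, -S])]

def matvec(A, v):
    """Apply a 2x2 matrix to a vector."""
    return [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]

def apply_function(m, f):
    """Return m(L) f = sum_i m(lambda_i) <f, e_i> e_i."""
    out = [0.0, 0.0]
    for lam, e in EIGENPAIRS:
        # The eigenvectors are real, so <f, e> needs no complex conjugation.
        coeff = m(lam) * (f[0] * e[0] + f[1] * e[1])
        out = [out[0] + coeff * e[0], out[1] + coeff * e[1]]
    return out

def heat(t, f):
    """u(t) = exp(-tL) f, solving u_t + L u = 0 with u(0) = f."""
    return apply_function(lambda lam: math.exp(-t * lam), f)

def schrodinger(t, f):
    """u(t) = exp(-itL) f, solving u_t + i L u = 0 with u(0) = f."""
    return apply_function(lambda lam: cmath.exp(-1j * t * lam), f)

def helmholtz(z, f):
    """u = (L - z)^{-1} f, for z outside the spectrum {1, 3}."""
    return apply_function(lambda lam: 1.0 / (lam - z), f)

if __name__ == "__main__":
    f = [1.0, 2.0]
    print(heat(0.3, f))
    print(schrodinger(1.7, f))
    print(helmholtz(0.5 + 1.0j, f))
```

Since this ${L}$ is positive, the heat operators contract the norm, while the Schrödinger operators preserve it exactly.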

The functional calculus can also be associated to a spectral measure. Indeed, for any vectors ${f, g \in H}$, there is a complex measure ${\mu_{f,g}}$ on ${\sigma(L)}$ with the property that

$\displaystyle \langle m(L) f, g \rangle_H = \int_{\sigma(L)} m(x) d\mu_{f,g}(x);$

indeed, one can set ${\mu_{f,g}}$ to be the discrete measure on ${\sigma(L)}$ defined by the formula

$\displaystyle \mu_{f,g}(E) := \sum_{i: \lambda_i \in E} \langle f, e_i \rangle_H \langle e_i, g \rangle_H.$

One can also view this complex measure as a coefficient

$\displaystyle \mu_{f,g} = \langle \mu f, g \rangle_H$

of a projection-valued measure ${\mu}$ on ${\sigma(L)}$, defined by setting

$\displaystyle \mu(E) f := \sum_{i: \lambda_i \in E} \langle f, e_i \rangle_H e_i.$

Finally, one can view ${L}$ as unitarily equivalent to a multiplication operator ${M: f \mapsto g f}$ on ${\ell^2(\{1,\ldots,n\})}$, where ${g}$ is the real-valued function ${g(i) := \lambda_i}$, and the intertwining map ${U: \ell^2(\{1,\ldots,n\}) \rightarrow H}$ is given by

$\displaystyle U ( (c_i)_{i=1}^n ) := \sum_{i=1}^n c_i e_i,$

so that ${L = U M U^{-1}}$.
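The spectral measure and the multiplication-operator picture can be illustrated in the same spirit. The sketch below uses a hypothetical ${2 \times 2}$ symmetric matrix with hand-computed eigenpairs (an assumption for illustration, as are all the helper names) and checks that ${\langle L f, g \rangle_H = \int x\ d\mu_{f,g}(x)}$ and that ${L = U M U^{-1}}$.

```python
import math

# Hypothetical example: L = [[2, -1], [-1, 2]] with eigenvalues 1 and 3.
L = [[2.0, -1.0], [-1.0, 2.0]]
S = 1.0 / math.sqrt(2.0)
EIGENVALUES = [1.0, 3.0]
EIGENVECTORS = [[S, S], [S, -S]]

def inner(f, g):
    """<f, g>_H, linear in f and conjugate-linear in g."""
    return sum(a * complex(b).conjugate() for a, b in zip(f, g))

def apply_L(f):
    """Apply the matrix L to a vector f."""
    return [L[0][0] * f[0] + L[0][1] * f[1], L[1][0] * f[0] + L[1][1] * f[1]]

def spectral_measure(f, g, E):
    """mu_{f,g}(E): sum of <f, e_i> <e_i, g> over those i with lambda_i in E."""
    return sum(inner(f, e) * inner(e, g)
               for lam, e in zip(EIGENVALUES, EIGENVECTORS) if lam in E)

def U(c):
    """Intertwining map: coefficients (c_i) are sent to sum_i c_i e_i."""
    return [sum(ci * e[k] for ci, e in zip(c, EIGENVECTORS)) for k in range(2)]

def U_inverse(f):
    """U^{-1} f = (<f, e_i>)_i, valid because the e_i are orthonormal."""
    return [inner(f, e) for e in EIGENVECTORS]

def M(c):
    """Multiplication by g(i) = lambda_i on l^2({1, 2})."""
    return [lam * ci for lam, ci in zip(EIGENVALUES, c)]

if __name__ == "__main__":
    f, g = [1.0, 2.0], [3.0, -1.0]
    integral = sum(lam * spectral_measure(f, g, {lam}) for lam in EIGENVALUES)
    print(inner(apply_L(f), g), integral)   # the two numbers should agree
    print(U(M(U_inverse(f))), apply_L(f))   # the two vectors should agree
```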

It is an important fact in analysis that many of these above assertions extend to operators on an infinite-dimensional Hilbert space ${H}$, so long as one is careful about what “self-adjoint operator” means; these facts are collectively referred to as the spectral theorem. For instance, it turns out that most of the above claims have analogues for bounded self-adjoint operators ${L: H \rightarrow H}$. However, in the theory of partial differential equations, one often needs to apply the spectral theorem to unbounded, densely defined linear operators ${L: D \rightarrow H}$, which (initially, at least) are only defined on a dense subspace ${D}$ of the Hilbert space ${H}$. A very typical situation arises when ${H = L^2(\Omega)}$ is the space of square-integrable functions on some domain or manifold ${\Omega}$ (which may have a boundary or be otherwise “incomplete”), ${D = C^\infty_c(\Omega)}$ is the space of smooth compactly supported functions on ${\Omega}$, and ${L}$ is some linear differential operator. It is then of interest to obtain the spectral theorem for such operators, so that one can build operators such as ${e^{-tL}, e^{itL}, \frac{1}{L-z}, e^{it\sqrt{L}}}$ or solve equations such as (1), (2), (3), (4).

In order to do this, some necessary conditions on the densely defined operator ${L: D \rightarrow H}$ must be imposed. The most obvious is that of symmetry, which asserts that

$\displaystyle \langle Lf, g \rangle_H = \langle f, Lg \rangle_H \ \ \ \ \ (5)$

for all ${f, g \in D}$. In some applications, one also wants to impose positive definiteness, which asserts that

$\displaystyle \langle Lf, f \rangle_H \geq 0 \ \ \ \ \ (6)$

for all ${f \in D}$. These hypotheses are sufficient in the case when ${L}$ is bounded, and in particular when ${H}$ is finite dimensional. However, as it turns out, for unbounded operators these conditions are not, by themselves, enough to obtain a good spectral theory. For instance, one consequence of the spectral theorem should be that the resolvents ${(L-z)^{-1}}$ are well-defined for any strictly complex ${z}$, which by duality implies that the image of ${L-z}$ should be dense in ${H}$. However, this can fail if one just assumes symmetry, or symmetry and positive definiteness. A well-known example occurs when ${H}$ is the Hilbert space ${H := L^2((0,1))}$, ${D := C^\infty_c((0,1))}$ is the space of test functions, and ${L}$ is the one-dimensional Laplacian ${L := -\frac{d^2}{dx^2}}$. Then ${L}$ is symmetric and positive, but the operator ${L-k^2}$ does not have dense image for any complex ${k}$, since

$\displaystyle \langle (L-k^2) f, e^{i\overline{k}x} \rangle_H = 0$

for all test functions ${f \in C^\infty_c((0,1))}$, as can be seen from a routine integration by parts. As such, the resolvent map is not everywhere uniquely defined. There is also a lack of uniqueness for the wave, heat, and Schrödinger equations for this operator (note that there are no spatial boundary conditions specified in these equations).

Another example occurs when ${H := L^2((0,+\infty))}$, ${D := C^\infty_c((0,+\infty))}$, and ${L}$ is the momentum operator ${L := -i \frac{d}{dx}}$. Then the resolvent ${(L-z)^{-1}}$ can be uniquely defined for ${z}$ in the upper half-plane, but not in the lower half-plane, due to the obstruction

$\displaystyle \langle (L-z) f, e^{i \bar{z} x} \rangle_H = 0$

for all test functions ${f}$ (note that the function ${e^{i\bar{z} x}}$ lies in ${L^2((0,+\infty))}$ when ${z}$ is in the lower half-plane). For related reasons, the translation operators ${e^{itL}}$ have a problem with either uniqueness or existence (depending on whether ${t}$ is positive or negative), due to the unspecified boundary behaviour at the origin.

The key property that lets one avoid this bad behaviour is that of essential self-adjointness. Once ${L}$ is essentially self-adjoint, the spectral theorem becomes applicable again, leading to all the expected behaviour (e.g. existence and uniqueness for the various PDE given above).

Unfortunately, the concept of essential self-adjointness is defined rather abstractly, and is difficult to verify directly; unlike the symmetry condition (5) or the positivity condition (6), it is not a “local” condition that can be easily verified just by testing ${L}$ on various inputs, but is instead a more “global” condition. In practice, to verify this property, one needs to invoke one of a number of partial converses to the spectral theorem, which roughly speaking assert that if at least one of the expected consequences of the spectral theorem is true for some symmetric densely defined operator ${L}$, then ${L}$ is essentially self-adjoint. Examples of “expected consequences” include:

• Existence of resolvents ${(L-z)^{-1}}$ (or equivalently, dense image for ${L-z}$);
• Existence of a contractive heat propagator semigroup ${e^{-tL}}$ (in the positive case);
• Existence of a unitary Schrödinger propagator group ${e^{itL}}$;
• Existence of a unitary wave propagator group ${e^{it\sqrt{L}}}$ (in the positive case);
• Existence of a “reasonable” functional calculus;
• Unitary equivalence with a multiplication operator.

Thus, to actually verify essential self-adjointness of a differential operator, one typically has to first solve a PDE (such as the wave, Schrödinger, heat, or Helmholtz equation) by some non-spectral method (e.g. by a contraction mapping argument, or a perturbation argument based on an operator already known to be essentially self-adjoint). Once one can solve one of the PDEs, then one can apply one of the known converse spectral theorems to obtain essential self-adjointness, and then by the forward spectral theorem one can then solve all the other PDEs as well. But there is no getting out of that first step, which requires some input (typically of an ODE, PDE, or geometric nature) that is external to what abstract spectral theory can provide. For instance, if one wants to establish essential self-adjointness of the Laplace-Beltrami operator ${L = -\Delta_g}$ on a smooth Riemannian manifold ${(M,g)}$ (using ${C^\infty_c(M)}$ as the domain space), it turns out (under reasonable regularity hypotheses) that essential self-adjointness is equivalent to geodesic completeness of the manifold, which is a global ODE condition rather than a local one: one needs geodesics to continue indefinitely in order to be able to (unitarily) solve PDEs such as the wave equation, which in turn leads to essential self-adjointness. (Note that the domains ${(0,1)}$ and ${(0,+\infty)}$ in the previous examples were not geodesically complete.) For this reason, essential self-adjointness of a differential operator is sometimes referred to as quantum completeness (with the completeness of the associated Hamilton-Jacobi flow then being the analogous classical completeness).

In these notes, I wanted to record (mostly for my own benefit) the forward and converse spectral theorems, and to verify essential self-adjointness of the Laplace-Beltrami operator on geodesically complete manifolds. This is extremely standard analysis (covered, for instance, in the texts of Reed and Simon), but I wanted to write it down myself to make sure that I really understood this foundational material properly.

One theme in this course will be the central role played by the gaussian random variables ${X \equiv N(\mu,\sigma^2)}$. Gaussians have an incredibly rich algebraic structure, and many results about general random variables can be established by first using this structure to verify the result for gaussians, and then using universality techniques (such as the Lindeberg exchange strategy) to extend the results to more general variables.

One way to exploit this algebraic structure is to continuously deform the variance ${t := \sigma^2}$ from an initial variance of zero (so that the random variable is deterministic) to some final level ${T}$. We would like to use this to give a continuous family ${t \mapsto X_t}$ of random variables ${X_t \equiv N(\mu, t)}$ as ${t}$ (viewed as a “time” parameter) runs from ${0}$ to ${T}$.

At present, we have not completely specified what ${X_t}$ should be, because we have only described the individual distribution ${X_t \equiv N(\mu,t)}$ of each ${X_t}$, and not the joint distribution. However, there is a very natural way to specify a joint distribution of this type, known as Brownian motion. In these notes we lay the necessary probability theory foundations to set up this motion, and indicate its connection with the heat equation, the central limit theorem, and the Ornstein-Uhlenbeck process. This is the beginning of stochastic calculus, which we will not develop fully here.
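To get a feel for such a joint distribution, here is a minimal simulation sketch. It assumes the standard discretisation of Brownian motion by independent gaussian increments of variance equal to the time step; the function names, the number of steps and trials, and the seed are illustrative choices. It estimates the mean and variance of ${X_T}$ and the covariance of ${X_{T/2}}$ and ${X_T}$, which for Brownian motion started at ${\mu}$ should be close to ${\mu}$, ${T}$ and ${T/2}$ respectively.

```python
import math
import random

def brownian_path(mu, T, steps, rng):
    """Sample (X_0, X_dt, ..., X_T) with X_0 = mu and independent N(0, dt) increments."""
    dt = T / steps
    path = [mu]
    for _ in range(steps):
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return path

def empirical_statistics(mu=0.0, T=1.0, steps=8, trials=20000, seed=0):
    """Return (mean of X_T, variance of X_T, covariance of X_{T/2} and X_T)."""
    rng = random.Random(seed)
    mids, ends = [], []
    for _ in range(trials):
        path = brownian_path(mu, T, steps, rng)
        mids.append(path[steps // 2])   # value at time T/2 (steps is even)
        ends.append(path[-1])           # value at time T
    mean_mid = sum(mids) / trials
    mean_end = sum(ends) / trials
    var_end = sum((x - mean_end) ** 2 for x in ends) / trials
    cov = sum((a - mean_mid) * (b - mean_end) for a, b in zip(mids, ends)) / trials
    return mean_end, var_end, cov

if __name__ == "__main__":
    print(empirical_statistics())
```

The estimates fluctuate on the order of ${1/\sqrt{\hbox{trials}}}$, so agreement is only expected to about two decimal places with the defaults above.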

We will begin with one-dimensional Brownian motion, but it is a simple matter to extend the process to higher dimensions. In particular, we can define Brownian motion on vector spaces of matrices, such as the space of ${n \times n}$ Hermitian matrices. This process is equivariant with respect to conjugation by unitary matrices, and so we can quotient out by this conjugation and obtain a new process on the quotient space, or in other words on the spectrum of ${n \times n}$ Hermitian matrices. This process is called Dyson Brownian motion, and turns out to have a simple description in terms of ordinary Brownian motion; it will play a key role in several of the subsequent notes in this course.