
[These are notes intended mostly for myself, as these topics are useful in random matrix theory, but may be of interest to some readers also. -T.]

One of the most fundamental partial differential equations in mathematics is the heat equation

$\displaystyle \partial_t f = L f \ \ \ \ \ (1)$

where ${f: [0,+\infty) \times {\bf R}^n \rightarrow {\bf R}}$ is a scalar function ${(t,x) \mapsto f(t,x)}$ of both time and space, and ${L}$ is the Laplacian ${L := \frac{1}{2} \Delta = \frac{1}{2} \sum_{i=1}^n \frac{\partial^2}{\partial x_i^2}}$. For the purposes of this post, we will ignore all technical issues of regularity and decay, and always assume that the solutions to equations such as (1) have enough regularity and decay to justify all formal operations such as the chain rule, integration by parts, or differentiation under the integral sign. The factor of ${\frac{1}{2}}$ in the definition of the heat propagator ${L}$ is of course an arbitrary normalisation, chosen for some minor technical reasons; one can certainly continue the discussion below with other choices of normalisations if desired.

In probability theory, this equation takes on particular significance when ${f}$ is restricted to be non-negative, and furthermore to be a probability measure at each time, in the sense that

$\displaystyle \int_{{\bf R}^n} f(t,x)\ dx = 1$

for all ${t}$. (Actually, it suffices to verify this constraint at time ${t=0}$, as the heat equation (1) will then preserve this constraint.) Indeed, in this case, one can interpret ${f(t,x)\ dx}$ as the probability distribution of a Brownian motion

$\displaystyle dx = dB(t) \ \ \ \ \ (2)$

where ${x = x(t) \in {\bf R}^n}$ is a stochastic process with initial probability distribution ${f(0,x)\ dx}$; see for instance this previous blog post for more discussion.
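As a rough numerical illustration of (2) (not part of the argument), one can simulate the process in dimension ${n=1}$ by summing independent Gaussian increments, and check that ${x(t)}$ has mean close to zero and variance close to ${t}$ when ${x(0)=0}$. The function name `simulate_brownian_endpoints`, the number of paths and steps, and the seed are all arbitrary choices made for this sketch.

```python
import random

def simulate_brownian_endpoints(t, num_paths=20000, num_steps=20, seed=0):
    """Sample x(t) for one-dimensional Brownian motion started at x(0) = 0.

    Each path is a sum of num_steps independent N(0, t/num_steps) increments,
    a simple discretisation of dx = dB(t).
    """
    rng = random.Random(seed)
    step_std = (t / num_steps) ** 0.5
    endpoints = []
    for _ in range(num_paths):
        x = 0.0
        for _ in range(num_steps):
            x += rng.gauss(0.0, step_std)
        endpoints.append(x)
    return endpoints

t = 2.0
samples = simulate_brownian_endpoints(t)
sample_mean = sum(samples) / len(samples)
sample_var = sum((x - sample_mean) ** 2 for x in samples) / (len(samples) - 1)
print(f"mean ~ {sample_mean:.3f} (expected 0), variance ~ {sample_var:.3f} (expected {t})")
```

The agreement is only statistical, of course; with this many paths one should expect errors of a few percent at most.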

A model example of a solution to the heat equation to keep in mind is that of the fundamental solution

$\displaystyle G(t,x) = \frac{1}{(2\pi t)^{n/2}} e^{-|x|^2/2t} \ \ \ \ \ (3)$

defined for any ${t>0}$, which represents the distribution of Brownian motion of a particle starting at the origin ${x=0}$ at time ${t=0}$. At time ${t}$, ${G(t,x)\ dx}$ is the distribution of an ${{\bf R}^n}$-valued random variable, each coordinate of which is an independent Gaussian random variable of mean zero and variance ${t}$. (As ${t \rightarrow 0^+}$, ${G(t)}$ converges in the sense of distributions to a Dirac mass at the origin.)
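One can sanity-check numerically that (3) solves (1), with the normalisation ${L = \frac{1}{2} \Delta}$. The sketch below works in dimension ${n=1}$ and uses central finite differences; the helper names `heat_kernel` and `heat_residual`, the step size, and the sample points are assumptions made purely for illustration.

```python
import math

def heat_kernel(t, x):
    """Fundamental solution (3) in dimension n = 1."""
    return math.exp(-x * x / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)

def heat_residual(t, x, h=1e-3):
    """Central-difference estimate of d/dt G - (1/2) d^2/dx^2 G at (t, x)."""
    dt_G = (heat_kernel(t + h, x) - heat_kernel(t - h, x)) / (2.0 * h)
    dxx_G = (heat_kernel(t, x + h) - 2.0 * heat_kernel(t, x)
             + heat_kernel(t, x - h)) / (h * h)
    return dt_G - 0.5 * dxx_G

# The residual should vanish up to discretisation error of order h^2.
residuals = [abs(heat_residual(t, x))
             for t in (0.5, 1.0, 2.0)
             for x in (-1.0, 0.0, 0.7, 2.0)]
max_residual = max(residuals)
print("largest residual:", max_residual)
```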

The heat equation can also be viewed as the gradient flow for the Dirichlet form

$\displaystyle D(f,g) := \frac{1}{2} \int_{{\bf R}^n} \nabla f \cdot \nabla g\ dx \ \ \ \ \ (4)$

since one has the integration by parts identity

$\displaystyle \int_{{\bf R}^n} Lf(x) g(x)\ dx = \int_{{\bf R}^n} f(x) Lg(x)\ dx = - D(f,g) \ \ \ \ \ (5)$

for all smooth, rapidly decreasing ${f,g}$, which formally implies that ${L f}$ is (half of) the negative gradient of the Dirichlet energy ${D(f,f) = \frac{1}{2} \int_{{\bf R}^n} |\nabla f|^2\ dx}$ with respect to the ${L^2({\bf R}^n,dx)}$ inner product. Among other things, this implies that the Dirichlet energy decreases in time:

$\displaystyle \partial_t D(f,f) = - 2 \int_{{\bf R}^n} |Lf|^2\ dx. \ \ \ \ \ (6)$

For instance, for the fundamental solution (3), one can verify for any time ${t>0}$ that

$\displaystyle D(G,G) = \frac{n}{2^{n+2} \pi^{n/2}} \frac{1}{t^{(n+2)/2}} \ \ \ \ \ (7)$

(assuming I have not made a mistake in the calculation). In a similar spirit we have

$\displaystyle \partial_t \int_{{\bf R}^n} |f|^2\ dx = - 2 D(f,f). \ \ \ \ \ (8)$

Since ${D(f,f)}$ is non-negative, the formula (6) implies that ${\int_{{\bf R}^n} |Lf|^2\ dx}$ is integrable in time, and in particular we see that ${Lf}$ converges to zero as ${t \rightarrow \infty}$, in some averaged ${L^2}$ sense at least; similarly, (8) suggests that ${D(f,f)}$ also converges to zero. This suggests that ${f}$ converges to a constant function; but as ${f}$ is also supposed to decay to zero at spatial infinity, we thus expect solutions to the heat equation in ${{\bf R}^n}$ to decay to zero in some sense as ${t \rightarrow \infty}$. However, the decay is only expected to be polynomial in nature rather than exponential; for instance, the solution (3) decays in the ${L^\infty}$ norm like ${O(t^{-n/2})}$.
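The formulas (6) and (7) can be cross-checked against each other for the fundamental solution, at least in dimension ${n=1}$. The sketch below approximates ${D(G,G)}$ and ${-2\int |LG|^2\ dx}$ by Riemann sums, using the explicit derivatives of (3), and compares them with the right-hand side of (7) and its time derivative respectively. All function names and quadrature parameters are hypothetical choices for this illustration.

```python
import math

def heat_kernel(t, x):
    """Fundamental solution (3) in dimension n = 1."""
    return math.exp(-x * x / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)

def energy_and_dissipation(t, half_width=12.0, steps=4000):
    """Riemann sums for D(G,G) and -2 int |LG|^2 dx at time t (n = 1)."""
    radius = half_width * math.sqrt(t)
    dx = 2.0 * radius / steps
    grad_sq = lap_sq = 0.0
    for k in range(steps + 1):
        x = -radius + k * dx
        G = heat_kernel(t, x)
        dG = -(x / t) * G                            # first derivative in x
        LG = 0.5 * (x * x / (t * t) - 1.0 / t) * G   # L G = (1/2) G''
        grad_sq += dG * dG
        lap_sq += LG * LG
    return 0.5 * grad_sq * dx, -2.0 * lap_sq * dx

def formula_7(t, n=1):
    """Right-hand side of (7)."""
    return n / (2.0 ** (n + 2) * math.pi ** (n / 2.0) * t ** ((n + 2) / 2.0))

def formula_7_rate(t, n=1):
    """Time derivative of the right-hand side of (7)."""
    return -(n + 2) / (2.0 * t) * formula_7(t, n)

for t in (0.5, 1.0, 3.0):
    energy, rate = energy_and_dissipation(t)
    print(t, energy, formula_7(t), rate, formula_7_rate(t))
```

If (7) is correct, the two pairs of numbers printed on each line should agree, with the second pair being an instance of (6).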

Since ${L1=0}$, we also observe the basic cancellation property

$\displaystyle \int_{{\bf R}^n} Lf(x) \ dx = 0 \ \ \ \ \ (9)$

for any function ${f}$.
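The conservation of mass coming from (9), and the monotonicity statements (6) and (8), all have exact discrete analogues that are easy to observe numerically. The following is a minimal sketch in dimension ${n=1}$, using an explicit finite-difference scheme on a periodic grid as a stand-in for ${{\bf R}}$; the grid, the time step, and the helper names are assumptions of this sketch rather than anything canonical.

```python
import math

def heat_step(f, dx, dt):
    """One explicit Euler step for d/dt f = (1/2) d^2/dx^2 f on a periodic grid."""
    n = len(f)
    r = 0.5 * dt / (dx * dx)  # the scheme is stable when r <= 1/2
    return [f[i] + r * (f[(i + 1) % n] - 2.0 * f[i] + f[i - 1]) for i in range(n)]

def mass(f, dx):
    return sum(f) * dx

def l2_norm_squared(f, dx):
    return sum(v * v for v in f) * dx

def dirichlet_energy(f, dx):
    """Discrete version of D(f,f) = (1/2) int |f'|^2 dx using forward differences."""
    n = len(f)
    return 0.5 * sum(((f[(i + 1) % n] - f[i]) / dx) ** 2 for i in range(n)) * dx

dx, dt, num_steps = 0.1, 0.004, 500
grid = [-10.0 + i * dx for i in range(200)]
f = [math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi) for x in grid]  # G(1, x)

mass_history = [mass(f, dx)]
l2_history = [l2_norm_squared(f, dx)]
energy_history = [dirichlet_energy(f, dx)]
for _ in range(num_steps):
    f = heat_step(f, dx, dt)
    mass_history.append(mass(f, dx))
    l2_history.append(l2_norm_squared(f, dx))
    energy_history.append(dirichlet_energy(f, dx))

# Discrete analogue of (8): the L^2 norm initially drops at rate about 2 D(f,f).
initial_rate = (l2_history[1] - l2_history[0]) / dt
print("mass:", mass_history[0], "->", mass_history[-1])
print("d/dt |f|^2 ~", initial_rate, " versus -2 D(f,f) =", -2.0 * energy_history[0])
```

The explicit scheme was chosen only because it is short; it requires the time step to be small compared to the square of the grid spacing.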

There are other quantities relating to ${f}$ that also decrease in time under heat flow, particularly in the important case when ${f}$ is a probability measure. In this case, it is natural to introduce the entropy

$\displaystyle S(f) := \int_{{\bf R}^n} f(x) \log f(x)\ dx.$

Thus, for instance, if ${f(x)\ dx}$ is the uniform distribution on some measurable subset ${E}$ of ${{\bf R}^n}$ of finite measure ${|E|}$, the entropy would be ${-\log |E|}$. Intuitively, as the entropy decreases, the probability distribution gets wider and flatter. For instance, in the case of the fundamental solution (3), one has ${S(G) = -\frac{n}{2} \log( 2 \pi e t )}$ for any ${t>0}$, reflecting the fact that ${G(t)}$ is approximately uniformly distributed on a ball of radius ${O(\sqrt{t})}$ (and thus of measure ${O(t^{n/2})}$).
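Both entropy computations above are easy to confirm numerically in dimension ${n=1}$. In the sketch below, the helper names and the quadrature parameters are hypothetical, and the Gaussian entropy is approximated by a plain Riemann sum.

```python
import math

def gaussian_entropy_numeric(t, half_width=12.0, steps=4000):
    """Riemann-sum approximation of S(G(t)) = int G log G dx in dimension n = 1."""
    radius = half_width * math.sqrt(t)
    dx = 2.0 * radius / steps
    total = 0.0
    for k in range(steps + 1):
        x = -radius + k * dx
        log_G = -x * x / (2.0 * t) - 0.5 * math.log(2.0 * math.pi * t)
        total += math.exp(log_G) * log_G
    return total * dx

def gaussian_entropy_formula(t, n=1):
    """The closed form -(n/2) log(2 pi e t) quoted in the text."""
    return -0.5 * n * math.log(2.0 * math.pi * math.e * t)

def uniform_entropy(length):
    """Entropy of the uniform distribution on an interval E with |E| = length."""
    density = 1.0 / length
    return length * density * math.log(density)  # integral of f log f over E

for t in (0.5, 1.0, 4.0):
    print(t, gaussian_entropy_numeric(t), gaussian_entropy_formula(t))
print(uniform_entropy(3.0), -math.log(3.0))
```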

A short formal computation using (9) and (5) shows (if one assumes for simplicity that ${f}$ is strictly positive, which is not an unreasonable hypothesis, particularly in view of the strong maximum principle) that

$\displaystyle \partial_t S(f) = \int_{{\bf R}^n} (Lf) \log f + f \frac{Lf}{f}\ dx$

$\displaystyle = \int_{{\bf R}^n} (Lf) \log f\ dx$

$\displaystyle = - D( f, \log f )$

$\displaystyle = - \frac{1}{2} \int_{{\bf R}^n} \frac{|\nabla f|^2}{f}\ dx$

$\displaystyle = - 4D( g, g )$

where ${g := \sqrt{f}}$ is the square root of ${f}$. For instance, if ${f}$ is the fundamental solution (3), one can check that ${D(g,g) = \frac{n}{8t}}$ (note that this is a significantly cleaner formula than (7)!).
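The value ${D(g,g) = \frac{n}{8t}}$ for the fundamental solution, and its consistency with the derivative of the entropy ${-\frac{n}{2} \log(2\pi e t)}$, can be confirmed numerically when ${n=1}$. The names `sqrt_kernel_energy` and `entropy_rate` below are hypothetical, and the quadrature is a plain Riemann sum.

```python
import math

def sqrt_kernel_energy(t, half_width=12.0, steps=4000):
    """Riemann sum for D(g,g) = (1/2) int |g'|^2 dx with g = sqrt(G(t)), n = 1."""
    radius = half_width * math.sqrt(t)
    dx = 2.0 * radius / steps
    total = 0.0
    for k in range(steps + 1):
        x = -radius + k * dx
        G = math.exp(-x * x / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)
        g = math.sqrt(G)
        dg = -(x / (2.0 * t)) * g  # derivative of sqrt(G) in x
        total += dg * dg
    return 0.5 * total * dx

def entropy_rate(t, h=1e-4):
    """Central difference of the closed-form entropy S(G(t)) = -(1/2) log(2 pi e t)."""
    S = lambda s: -0.5 * math.log(2.0 * math.pi * math.e * s)
    return (S(t + h) - S(t - h)) / (2.0 * h)

for t in (0.5, 1.0, 3.0):
    print(t, sqrt_kernel_energy(t), 1.0 / (8.0 * t), entropy_rate(t))
```

On each line, the second and third numbers should agree, and the last should be ${-4}$ times either of them.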

In particular, the entropy is decreasing, which corresponds well to one’s intuition that the heat equation (or Brownian motion) should serve to spread out a probability distribution over time.
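To see the entropy decrease for a less symmetric example, one can take a mixture of translated and rescaled copies of (3), which is again an exact solution of (1) by linearity and translation invariance. The sketch below (in dimension ${n=1}$, with made-up weights, centres, and initial variances) checks that ${S(f)}$ decreases in time and that its rate of change matches ${-4D(g,g)}$; all names and parameters are assumptions for illustration.

```python
import math

# Hypothetical test data: (weight, centre, initial variance) of each bump.
COMPONENTS = [(0.3, -1.5, 0.5), (0.7, 1.0, 1.0)]

def mixture(t, x):
    """Return (f, f') for f(t,x) = sum_k w_k G(t + s_k, x - m_k), n = 1."""
    f = df = 0.0
    for w, m, s in COMPONENTS:
        tau = t + s
        G = w * math.exp(-(x - m) ** 2 / (2.0 * tau)) / math.sqrt(2.0 * math.pi * tau)
        f += G
        df += -(x - m) / tau * G
    return f, df

def integrate(func, lo=-20.0, hi=20.0, steps=8000):
    """Plain Riemann sum; the integrands used here are negligible at the endpoints."""
    dx = (hi - lo) / steps
    return sum(func(lo + k * dx) for k in range(steps + 1)) * dx

def entropy(t):
    def integrand(x):
        f, _ = mixture(t, x)
        return f * math.log(f)
    return integrate(integrand)

def sqrt_energy(t):
    """D(g,g) for g = sqrt(f), using |g'|^2 = |f'|^2 / (4 f)."""
    def integrand(x):
        f, df = mixture(t, x)
        return df * df / (4.0 * f)
    return 0.5 * integrate(integrand)

h = 1e-3
rate = (entropy(1.0 + h) - entropy(1.0 - h)) / (2.0 * h)
print("dS/dt at t=1:", rate, " versus -4 D(g,g):", -4.0 * sqrt_energy(1.0))
print("entropies:", [entropy(t) for t in (0.25, 0.5, 1.0, 2.0)])
```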

Actually, one can say more: the rate of decrease ${4D(g,g)}$ of the entropy is itself decreasing, or in other words the entropy is convex. I do not have a satisfactorily intuitive reason for this phenomenon, but it can be proved by straightforward application of basic several variable calculus tools (such as the chain rule, product rule, quotient rule, and integration by parts), and completing the square. Namely, by using the chain rule we have

$\displaystyle L \phi(f) = \phi'(f) Lf + \frac{1}{2} \phi''(f) |\nabla f|^2, \ \ \ \ \ (10)$

valid for any smooth function ${\phi: {\bf R} \rightarrow {\bf R}}$. Applying this with ${f}$ replaced by ${g}$ and ${\phi(s) := s^2}$ (so that ${\phi(g) = f}$), we see from (1) that

$\displaystyle 2 g \partial_t g = 2 g L g + |\nabla g|^2$

and thus (again assuming that ${f}$, and hence ${g}$, is strictly positive to avoid technicalities)

$\displaystyle \partial_t g = Lg + \frac{|\nabla g|^2}{2g}.$
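As a quick check of this equation, one can take ${f}$ to be the fundamental solution (3) in dimension ${n=1}$, so that ${g(t,x) = (2\pi t)^{-1/4} e^{-x^2/4t}}$, and evaluate both sides by finite differences. The helper names, step size, and sample points below are arbitrary choices for this sketch.

```python
import math

def g(t, x):
    """Square root of the fundamental solution (3) in dimension n = 1."""
    return math.exp(-x * x / (4.0 * t)) / (2.0 * math.pi * t) ** 0.25

def sqrt_equation_residual(t, x, h=1e-3):
    """Finite-difference estimate of d/dt g - (L g + |g'|^2 / (2 g)) at (t, x)."""
    dt_g = (g(t + h, x) - g(t - h, x)) / (2.0 * h)
    dx_g = (g(t, x + h) - g(t, x - h)) / (2.0 * h)
    dxx_g = (g(t, x + h) - 2.0 * g(t, x) + g(t, x - h)) / (h * h)
    return dt_g - (0.5 * dxx_g + dx_g * dx_g / (2.0 * g(t, x)))

# The residual should vanish up to discretisation error of order h^2.
max_sqrt_residual = max(abs(sqrt_equation_residual(t, x))
                        for t in (0.5, 1.0, 2.0)
                        for x in (-1.0, 0.0, 0.7, 2.0))
print("largest residual:", max_sqrt_residual)
```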

We thus have

$\displaystyle \partial_t D(g,g) = 2 D(g,Lg) + D(g, \frac{|\nabla g|^2}{g} ).$

It is now convenient to compute using the Einstein summation convention to hide the summation over indices ${i,j = 1,\ldots,n}$. We have

$\displaystyle 2 D(g,Lg) = \frac{1}{2} \int_{{\bf R}^n} (\partial_i g) (\partial_i \partial_j \partial_j g)\ dx$

and

$\displaystyle D(g, \frac{|\nabla g|^2}{g} ) = \frac{1}{2} \int_{{\bf R}^n} (\partial_i g) \partial_i \frac{\partial_j g \partial_j g}{g}\ dx.$

By integration by parts and interchanging partial derivatives, we may write the first integral as

$\displaystyle 2 D(g,Lg) = - \frac{1}{2} \int_{{\bf R}^n} (\partial_i \partial_j g) (\partial_i \partial_j g)\ dx,$

and from the quotient and product rules, we may write the second integral as

$\displaystyle D(g, \frac{|\nabla g|^2}{g} ) = \int_{{\bf R}^n} \frac{(\partial_i g) (\partial_j g) (\partial_i \partial_j g)}{g} - \frac{(\partial_i g) (\partial_j g) (\partial_i g) (\partial_j g)}{2g^2}\ dx.$

Gathering terms, completing the square, and making the summations explicit again, we see that

$\displaystyle \partial_t D(g,g) =- \frac{1}{2} \int_{{\bf R}^n} \frac{\sum_{i=1}^n \sum_{j=1}^n |g \partial_i \partial_j g - (\partial_i g) (\partial_j g)|^2}{g^2}\ dx$

and so in particular ${D(g,g)}$ is always decreasing.

The above identity can also be written as

$\displaystyle \partial_t D(g,g) = - \frac{1}{2} \int_{{\bf R}^n} |\nabla^2 \log g|^2 g^2\ dx.$
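This identity can be tested numerically on an explicit solution. The sketch below works in dimension ${n=1}$ with a mixture of two translated and rescaled copies of (3) (an exact solution of (1), with made-up parameters), and compares a finite difference in time of ${D(g,g)}$ with the right-hand side, using ${\log g = \frac{1}{2} \log f}$ and ${g^2 = f}$. All names and numerical parameters are assumptions for illustration only.

```python
import math

# Hypothetical test data: (weight, centre, initial variance) of each bump.
COMPONENTS = [(0.3, -1.5, 0.5), (0.7, 1.0, 1.0)]

def mixture_derivatives(t, x):
    """Return (f, f', f'') for f(t,x) = sum_k w_k G(t + s_k, x - m_k), n = 1."""
    f = d1 = d2 = 0.0
    for w, m, s in COMPONENTS:
        tau = t + s
        y = x - m
        G = w * math.exp(-y * y / (2.0 * tau)) / math.sqrt(2.0 * math.pi * tau)
        f += G
        d1 += -(y / tau) * G
        d2 += (y * y / (tau * tau) - 1.0 / tau) * G
    return f, d1, d2

def integrate(func, lo=-20.0, hi=20.0, steps=8000):
    """Plain Riemann sum; the integrands used here are negligible at the endpoints."""
    dx = (hi - lo) / steps
    return sum(func(lo + k * dx) for k in range(steps + 1)) * dx

def sqrt_energy(t):
    """D(g,g) = (1/8) int |f'|^2 / f dx for g = sqrt(f)."""
    def integrand(x):
        f, d1, _ = mixture_derivatives(t, x)
        return d1 * d1 / (8.0 * f)
    return integrate(integrand)

def hessian_dissipation(t):
    """-(1/2) int |(log g)''|^2 g^2 dx, with log g = (1/2) log f and g^2 = f."""
    def integrand(x):
        f, d1, d2 = mixture_derivatives(t, x)
        log_g_xx = 0.5 * (d2 / f - (d1 / f) ** 2)
        return log_g_xx * log_g_xx * f
    return -0.5 * integrate(integrand)

h = 1e-3
energy_rate = (sqrt_energy(1.0 + h) - sqrt_energy(1.0 - h)) / (2.0 * h)
print("d/dt D(g,g) at t=1:", energy_rate, " right-hand side:", hessian_dissipation(1.0))
```

For a single Gaussian the two sides can also be compared by hand: in that case ${\nabla^2 \log g = -\frac{1}{2t} I}$, so the right-hand side is ${-\frac{n}{8t^2}}$, which is indeed the derivative of ${\frac{n}{8t}}$.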

Exercise 1 Give an alternate proof of the above identity by writing ${f = e^{2u}}$, ${g = e^u}$ and deriving the equation ${\partial_t u = Lu + |\nabla u|^2}$ for ${u}$.

It was observed in a well known paper of Bakry and Emery that the above monotonicity properties hold for a much larger class of heat flow-type equations, and lead to a number of important relations between energy and entropy, such as the log-Sobolev inequality of Gross and of Federbush, and the hypercontractivity inequality of Nelson; we will discuss one such family of generalisations (or more precisely, variants) below the fold.