
*[These are notes intended mostly for myself, as these topics are useful in random matrix theory, but may be of interest to some readers also. -T.]*

One of the most fundamental partial differential equations in mathematics is the heat equation

$$\partial_t f = L f \qquad (1)$$

where $f: [0,+\infty) \times \mathbf{R}^n \to \mathbf{R}$ is a scalar function $(t,x) \mapsto f(t,x)$ of both time and space, and $L$ is the Laplacian $L := \frac{1}{2} \Delta = \frac{1}{2} \sum_{i=1}^n \frac{\partial^2}{\partial x_i^2}$. For the purposes of this post, we will ignore all technical issues of regularity and decay, and always assume that the solutions to equations such as (1) have all the regularity and decay needed to justify all formal operations such as the chain rule, integration by parts, or differentiation under the integral sign. The factor of $\frac{1}{2}$ in the definition of the heat propagator $L$ is of course an arbitrary normalisation, chosen for some minor technical reasons; one can certainly continue the discussion below with other choices of normalisations if desired.

In probability theory, this equation takes on particular significance when $f$ is restricted to be non-negative, and furthermore to be a probability measure at each time, in the sense that

$$\int_{\mathbf{R}^n} f(t,x)\ dx = 1 \qquad (2)$$

for all $t$. (Actually, it suffices to verify this constraint at time $t=0$, as the heat equation (1) will then preserve this constraint.) Indeed, in this case, one can interpret $f(t,x)\ dx$ as the probability distribution of a Brownian motion

$$dx = dB(t)$$

where $x = x(t) \in \mathbf{R}^n$ is a stochastic process with initial probability distribution $f(0,x)\ dx$; see for instance this previous blog post for more discussion.

A model example of a solution to the heat equation to keep in mind is that of the fundamental solution

$$G(t,x) = \frac{1}{(2\pi t)^{n/2}} e^{-|x|^2/2t} \qquad (3)$$

defined for any $t > 0$, which represents the distribution of Brownian motion of a particle starting at the origin $x=0$ at time $t=0$. At time $t$, $G(t,x)$ represents an $\mathbf{R}^n$-valued random variable, each coefficient of which is an independent random variable of mean zero and variance $t$. (As $t \to 0^+$, $G(t)$ converges in the sense of distributions to a Dirac mass at the origin.)
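As a quick numerical sanity check of these claims in dimension $n=1$, the following sketch verifies that the fundamental solution (3) has total mass one, has variance $t$, and satisfies $\partial_t G = \frac{1}{2} \partial_{xx} G$ up to finite-difference error. The helper names, the sample time, and the sample point are illustrative choices rather than anything from the discussion above.

```python
import math

def heat_kernel(t, x):
    """Fundamental solution G(t, x) of the heat equation in dimension n = 1."""
    return math.exp(-x * x / (2 * t)) / math.sqrt(2 * math.pi * t)

def midpoint_integral(func, lo=-30.0, hi=30.0, steps=60000):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(func(lo + (k + 0.5) * h) for k in range(steps))

t = 0.7  # arbitrary sample time (an assumption made for illustration)

# Total mass and second moment of G(t, .): expected to be close to 1 and t.
total_mass = midpoint_integral(lambda x: heat_kernel(t, x))
variance = midpoint_integral(lambda x: x * x * heat_kernel(t, x))

# Central finite differences for dG/dt and d^2G/dx^2 at a sample point.
x0, h = 0.4, 1e-3
dG_dt = (heat_kernel(t + h, x0) - heat_kernel(t - h, x0)) / (2 * h)
d2G_dx2 = (heat_kernel(t, x0 + h) - 2 * heat_kernel(t, x0)
           + heat_kernel(t, x0 - h)) / h ** 2
residual = dG_dt - 0.5 * d2G_dx2  # should be close to zero

print(total_mass, variance, residual)
```

The residual is not exactly zero because of the finite-difference discretisation; it should shrink as `h` decreases (until rounding error takes over).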

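The heat equation can also be viewed as the gradient flow for the Dirichlet form

$$D(f,g) := \frac{1}{2} \int_{\mathbf{R}^n} \nabla f \cdot \nabla g\ dx \qquad (4)$$

since one has the integration by parts identity

$$\int_{\mathbf{R}^n} Lf(x) g(x)\ dx = \int_{\mathbf{R}^n} f(x) Lg(x)\ dx = - D(f,g) \qquad (5)$$

for all smooth, rapidly decreasing $f, g$, which formally implies that $Lf$ is (half of) the negative gradient of the *Dirichlet energy* $D(f,f) = \frac{1}{2} \int_{\mathbf{R}^n} |\nabla f|^2\ dx$ with respect to the $L^2(\mathbf{R}^n, dx)$ inner product. Among other things, this implies that the Dirichlet energy decreases in time:

$$\partial_t D(f,f) = -2 \int_{\mathbf{R}^n} |Lf|^2\ dx. \qquad (6)$$

For instance, for the fundamental solution (3), one can verify for any time $t > 0$ that

$$D(G,G) = \frac{n}{2^{n+2} \pi^{n/2}} \frac{1}{t^{(n+2)/2}} \qquad (7)$$

(assuming I have not made a mistake in the calculation). In a similar spirit we have

$$\partial_t \int_{\mathbf{R}^n} |f|^2\ dx = -2 D(f,f). \qquad (8)$$

Since $D(f,f)$ is non-negative, the formula (6) implies that $\int_{\mathbf{R}^n} |Lf|^2\ dx$ is integrable in time, and in particular we see that $Lf$ converges to zero as $t \to \infty$, in some averaged $L^2$ sense at least; similarly, (8) suggests that $D(f,f)$ also converges to zero. This suggests that $f$ converges to a constant function; but as $f$ is also supposed to decay to zero at spatial infinity, we thus expect solutions to the heat equation in $\mathbf{R}^n$ to decay to zero in some sense as $t \to \infty$. However, the decay is only expected to be polynomial in nature rather than exponential; for instance, the solution (3) decays in the $L^\infty$ norm like $O(t^{-n/2})$.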
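The closed form for the Dirichlet energy of the fundamental solution can be checked numerically. The sketch below assumes the formula $D(G,G) = \frac{n}{2^{n+2} \pi^{n/2}} t^{-(n+2)/2}$ and compares it with a radial quadrature of $\frac{1}{2} \int |\nabla G|^2\ dx$, using $|\nabla G| = \frac{|x|}{t} G$ and the surface area $2\pi^{n/2}/\Gamma(n/2)$ of the unit sphere. The function names and quadrature parameters are hypothetical choices for illustration.

```python
import math

def dirichlet_energy_numeric(t, n, r_max=40.0, steps=40000):
    """Approximate (1/2) * integral over R^n of |grad G|^2 by a radial midpoint rule."""
    sphere_area = 2 * math.pi ** (n / 2) / math.gamma(n / 2)  # area of S^{n-1}
    h = r_max / steps
    total = 0.0
    for k in range(steps):
        r = (k + 0.5) * h
        G = (2 * math.pi * t) ** (-n / 2) * math.exp(-r * r / (2 * t))
        grad_sq = (r / t * G) ** 2  # |grad G|^2, since grad G = -(x / t) G
        total += r ** (n - 1) * grad_sq * h
    return 0.5 * sphere_area * total

def dirichlet_energy_claimed(t, n):
    """Closed form n / (2^(n+2) pi^(n/2) t^((n+2)/2)) being tested."""
    return n / (2 ** (n + 2) * math.pi ** (n / 2) * t ** ((n + 2) / 2))

for n in (1, 2, 3):
    for t in (0.5, 2.0):
        print(n, t, dirichlet_energy_numeric(t, n), dirichlet_energy_claimed(t, n))
```

Agreement for a few values of $n$ and $t$ is of course only evidence for the formula, not a proof of it.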

Since $L1 = 0$, we also observe the basic cancellation property

$$\int_{\mathbf{R}^n} Lf(x)\ dx = 0. \qquad (9)$$

There are other quantities relating to $f$ that also decrease in time under heat flow, particularly in the important case when $f$ is a probability measure. In this case, it is natural to introduce the *entropy*

$$S(f) := \int_{\mathbf{R}^n} f(x) \log f(x)\ dx. \qquad (10)$$

Thus, for instance, if $f(x)\ dx$ is the uniform distribution on some measurable subset $E$ of $\mathbf{R}^n$ of finite measure $|E|$, the entropy would be $-\log |E|$. Intuitively, as the entropy decreases, the probability distribution gets wider and flatter. For instance, in the case of the fundamental solution (3), one has $S(G) = -\frac{n}{2} \log( 2 \pi e t )$ for any $t > 0$, reflecting the fact that $G(t)$ is approximately uniformly distributed on a ball of radius $O(\sqrt{t})$ (and thus of measure $O(t^{n/2})$).

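A short formal computation shows (if one assumes for simplicity that $f$ is strictly positive, which is not an unreasonable hypothesis, particularly in view of the strong maximum principle) using (9), (5) that

$$\partial_t S(f) = \int_{\mathbf{R}^n} (Lf) \log f + Lf\ dx = - D(f, \log f) = -\frac{1}{2} \int_{\mathbf{R}^n} \frac{|\nabla f|^2}{f}\ dx = -4 D(g,g) \qquad (11)$$

where $g := \sqrt{f}$ is the square root of $f$. For instance, if $f$ is the fundamental solution (3), one can check that $D(g,g) = \frac{n}{8t}$ (note that this is a significantly cleaner formula than (7)!).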

In particular, the entropy is decreasing, which corresponds well to one’s intuition that the heat equation (or Brownian motion) should serve to spread out a probability distribution over time.
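For the fundamental solution in dimension $n=1$, the entropy formula and its rate of decrease can be checked numerically. The sketch below assumes the values $S(G) = -\frac{1}{2} \log(2\pi e t)$ and $D(g,g) = \frac{1}{8t}$ with $g = \sqrt{G}$, and tests the relation $\partial_t S = -4 D(g,g)$ by a central difference in time. All function names and numerical parameters are illustrative.

```python
import math

def kernel(t, x):
    """Fundamental solution G(t, x) in dimension n = 1."""
    return math.exp(-x * x / (2 * t)) / math.sqrt(2 * math.pi * t)

def integrate(func, lo=-12.0, hi=12.0, steps=24000):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(func(lo + (k + 0.5) * h) for k in range(steps))

def entropy(t):
    """S(G) = integral of G log G, with the convention 0 log 0 = 0."""
    def integrand(x):
        G = kernel(t, x)
        return G * math.log(G) if G > 0 else 0.0
    return integrate(integrand)

def root_energy(t):
    """D(g, g) for g = sqrt(G): (1/2) * integral of |g'|^2, where g' = -(x / 2t) g."""
    return 0.5 * integrate(lambda x: (x / (2 * t)) ** 2 * kernel(t, x))

t, h = 1.0, 1e-3
entropy_rate = (entropy(t + h) - entropy(t - h)) / (2 * h)  # approximates dS/dt

print(entropy(t), -0.5 * math.log(2 * math.pi * math.e * t))
print(root_energy(t), 1 / (8 * t))
print(entropy_rate, -4 * root_energy(t))
```

Each printed pair should agree to several decimal places; the third pair agrees only up to the $O(h^2)$ error of the central difference.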

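Actually, one can say more: the rate of decrease $4D(g,g)$ of the entropy is itself decreasing, or in other words the entropy is convex. I do not have a satisfactorily intuitive reason for this phenomenon, but it can be proved by straightforward application of basic several variable calculus tools (such as the chain rule, product rule, quotient rule, and integration by parts), and completing the square. Namely, by using the chain rule we have

$$L \phi(f) = \phi'(f) Lf + \frac{1}{2} \phi''(f) |\nabla f|^2,$$

valid for any smooth function $\phi: \mathbf{R} \to \mathbf{R}$. Applying this with $\phi(g) = g^2$ and $f = g^2$, we see from (1) that

$$2 g \partial_t g = 2 g Lg + |\nabla g|^2$$

and thus (again assuming that $f$, and hence $g$, is strictly positive to avoid technicalities)

$$\partial_t g = Lg + \frac{|\nabla g|^2}{2g}.$$

We thus have

$$\partial_t D(g,g) = 2 D(g, Lg) + D\left(g, \frac{|\nabla g|^2}{g}\right).$$

It is now convenient to compute using the Einstein summation convention to hide the summation over indices $i,j = 1,\ldots,n$. We have

$$2 D(g,Lg) = \frac{1}{2} \int_{\mathbf{R}^n} (\partial_i g) (\partial_i \partial_j \partial_j g)\ dx$$

and

$$D\left(g, \frac{|\nabla g|^2}{g}\right) = \frac{1}{2} \int_{\mathbf{R}^n} (\partial_i g)\, \partial_i \frac{(\partial_j g)(\partial_j g)}{g}\ dx.$$

By integration by parts and interchanging partial derivatives, we may write the first integral as

$$2 D(g,Lg) = -\frac{1}{2} \int_{\mathbf{R}^n} (\partial_j \partial_i g) (\partial_j \partial_i g)\ dx,$$

and from the quotient and product rules, we may write the second integral as

$$D\left(g, \frac{|\nabla g|^2}{g}\right) = \int_{\mathbf{R}^n} \frac{(\partial_i g)(\partial_j g)(\partial_i \partial_j g)}{g}\ dx - \frac{1}{2} \int_{\mathbf{R}^n} \frac{(\partial_i g)(\partial_j g)(\partial_i g)(\partial_j g)}{g^2}\ dx.$$

Gathering terms, completing the square, and making the summations explicit again, we see that

$$\partial_t D(g,g) = -\frac{1}{2} \int_{\mathbf{R}^n} \sum_{i=1}^n \sum_{j=1}^n \left| \partial_i \partial_j g - \frac{(\partial_i g)(\partial_j g)}{g} \right|^2\ dx$$

and so in particular $D(g,g)$ is always decreasing.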
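As a consistency check (a worked example added here, not a substitute for the derivation), one can test the completed-square identity $\partial_t D(g,g) = -\frac{1}{2} \int \sum_{i,j} |\partial_i \partial_j g - (\partial_i g)(\partial_j g)/g|^2\ dx$ on the fundamental solution (3) with $g = \sqrt{G}$. Here $\log g = -\frac{n}{4} \log(2\pi t) - \frac{|x|^2}{4t}$, and the quotient rule gives $\partial_i \partial_j g - (\partial_i g)(\partial_j g)/g = g\, \partial_i \partial_j \log g$, so

$$\partial_i \partial_j \log g = -\frac{\delta_{ij}}{2t}, \qquad \sum_{i=1}^n \sum_{j=1}^n \left| \partial_i \partial_j g - \frac{(\partial_i g)(\partial_j g)}{g} \right|^2 = \frac{n}{4t^2}\, g^2 = \frac{n}{4t^2}\, G.$$

Since $\int_{\mathbf{R}^n} G\ dx = 1$, the right-hand side of the identity equals $-\frac{n}{8t^2}$. On the other hand, a direct Gaussian integral gives $D(g,g) = \frac{1}{8} \int_{\mathbf{R}^n} \frac{|\nabla G|^2}{G}\ dx = \frac{1}{8} \int_{\mathbf{R}^n} \frac{|x|^2}{t^2} G\ dx = \frac{n}{8t}$, whose time derivative is also

$$\partial_t \frac{n}{8t} = -\frac{n}{8t^2},$$

so the two sides agree in this case.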

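The above identity can also be written as

$$\partial_t D(g,g) = -\frac{1}{2} \int_{\mathbf{R}^n} |\nabla^2 \log g|^2 g^2\ dx,$$

where $|\nabla^2 \log g|^2 := \sum_{i=1}^n \sum_{j=1}^n |\partial_i \partial_j \log g|^2$.

Exercise 1 Give an alternate proof of the above identity by writing $f = e^{2u}$, $g = e^u$ and deriving the equation $\partial_t u = Lu + |\nabla u|^2$ for $u$.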

It was observed in a well known paper of Bakry and Emery that the above monotonicity properties hold for a much larger class of heat flow-type equations, and lead to a number of important relations between energy and entropy, such as the log-Sobolev inequality of Gross and of Federbush, and the hypercontractivity inequality of Nelson; we will discuss one such family of generalisations (or more precisely, variants) below the fold.

One theme in this course will be the central nature played by the *gaussian random variables* $N(\mu,\sigma^2)$. Gaussians have an incredibly rich algebraic structure, and many results about general random variables can be established by first using this structure to verify the result for gaussians, and then using universality techniques (such as the Lindeberg exchange strategy) to extend the results to more general variables.

One way to exploit this algebraic structure is to continuously deform the variance $t := \sigma^2$ from an initial variance of zero (so that the random variable is deterministic) to some final level $T$. We would like to use this to give a continuous family $t \mapsto X_t$ of random variables $X_t \equiv N(\mu, t)$ as $t$ (viewed as a “time” parameter) runs from $0$ to $T$.

At present, we have not completely specified what $(X_t)_{0 \leq t \leq T}$ should be, because we have only described the individual distribution $X_t \equiv N(\mu,t)$ of each $X_t$, and not the joint distribution. However, there is a very natural way to specify a joint distribution of this type, known as Brownian motion. In these notes we lay the necessary probability theory foundations to set up this motion, and indicate its connection with the heat equation, the central limit theorem, and the Ornstein-Uhlenbeck process. This is the beginning of stochastic calculus, which we will not develop fully here.
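To make the distinction between individual and joint distributions concrete, here is a minimal simulation sketch of discretised one-dimensional Brownian motion started at $\mu = 0$, built from independent gaussian increments of variance $dt$. It estimates the variance at the final time $T$ (which should be close to $T$) and the covariance between two times $s < t$ (which for Brownian motion is $\min(s,t) = s$). The function names, the grid of ten steps, the path count, and the seed are all assumptions made for illustration.

```python
import math
import random

def simulate_paths(num_paths=20000, num_steps=10, final_time=1.0, mu=0.0, seed=0):
    """Sample discretised Brownian paths started at mu, on a uniform time grid."""
    rng = random.Random(seed)
    step_sd = math.sqrt(final_time / num_steps)  # each increment is N(0, dt)
    paths = []
    for _ in range(num_paths):
        x = mu
        path = [x]
        for _ in range(num_steps):
            x += rng.gauss(0.0, step_sd)
            path.append(x)
        paths.append(path)
    return paths

def covariance(paths, i, j):
    """Empirical covariance between the path values at grid indices i and j."""
    count = len(paths)
    mean_i = sum(p[i] for p in paths) / count
    mean_j = sum(p[j] for p in paths) / count
    return sum((p[i] - mean_i) * (p[j] - mean_j) for p in paths) / count

paths = simulate_paths()
var_T = covariance(paths, 10, 10)   # variance at time T = 1.0
cov_st = covariance(paths, 3, 8)    # covariance between times s = 0.3 and t = 0.8

print(var_T, cov_st)
```

Each marginal $X_t$ here is gaussian with variance $t$, but the nontrivial covariance between different times is exactly the extra information that a joint distribution carries.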

We will begin with one-dimensional Brownian motion, but it is a simple matter to extend the process to higher dimensions. In particular, we can define Brownian motion on vector spaces of matrices, such as the space of Hermitian matrices. This process is equivariant with respect to conjugation by unitary matrices, and so we can quotient out by this conjugation and obtain a new process on the quotient space, or in other words on the *spectrum* of Hermitian matrices. This process is called *Dyson Brownian motion*, and turns out to have a simple description in terms of ordinary Brownian motion; it will play a key role in several of the subsequent notes in this course.

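Consider the sum $S_n := X_1 + \ldots + X_n$ of iid real random variables $X_1,\ldots,X_n \equiv X$ of finite mean $\mu$ and variance $\sigma^2$ for some $\sigma > 0$. Then the sum $S_n$ has mean $n\mu$ and variance $n\sigma^2$, and so (by Chebyshev’s inequality) we expect $S_n$ to usually have size $n\mu + O(\sqrt{n} \sigma)$. To put it another way, if we consider the *normalised sum*

$$Z_n := \frac{S_n - n \mu}{\sqrt{n} \sigma} \qquad (1)$$

then $Z_n$ has been normalised to have mean zero and variance $1$, and is thus usually of size $O(1)$.

In the previous set of notes, we were able to establish various tail bounds on $Z_n$. For instance, from Chebyshev’s inequality one has

$$\mathbf{P}(|Z_n| > \lambda) \leq \lambda^{-2} \qquad (2)$$

and if the original distribution $X$ was bounded or subgaussian, we had the much stronger Chernoff bound

$$\mathbf{P}(|Z_n| > \lambda) \leq C \exp( - c \lambda^2 ) \qquad (3)$$

for some absolute constants $C, c > 0$; in other words, the $Z_n$ are uniformly subgaussian.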
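The normalisation and the Chebyshev bound can be illustrated with a small Monte Carlo experiment. The sketch below takes $X$ uniform on $[0,1]$ (so $\mu = 1/2$ and $\sigma^2 = 1/12$), samples the normalised sum $Z_n$ many times, and compares the empirical tail probability with the Chebyshev bound $\lambda^{-2}$. The choice of distribution, of $n = 50$, of $\lambda = 2$, and of the seed are illustrative assumptions.

```python
import math
import random

def normalised_sum(n, rng):
    """One sample of Z_n = (S_n - n*mu) / (sqrt(n)*sigma) for X uniform on [0, 1]."""
    mu, sigma = 0.5, math.sqrt(1.0 / 12.0)
    s = sum(rng.random() for _ in range(n))
    return (s - n * mu) / (math.sqrt(n) * sigma)

rng = random.Random(1)
samples = [normalised_sum(50, rng) for _ in range(20000)]

mean = sum(samples) / len(samples)
var = sum((z - mean) ** 2 for z in samples) / len(samples)

lam = 2.0
tail = sum(1 for z in samples if abs(z) > lam) / len(samples)

print(mean, var)            # should be close to 0 and 1
print(tail, lam ** -2)      # empirical tail versus the Chebyshev bound
```

In experiments of this kind the empirical tail is typically much smaller than $\lambda^{-2}$, which is consistent with the stronger subgaussian bound available for bounded $X$.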

Now we look at the distribution of $Z_n$. The fundamental central limit theorem tells us the asymptotic behaviour of this distribution:

Theorem 1 (Central limit theorem) Let $X_1,\ldots,X_n \equiv X$ be iid real random variables of finite mean $\mu$ and variance $\sigma^2$ for some $\sigma > 0$, and let $Z_n$ be the normalised sum (1). Then as $n \to \infty$, $Z_n$ converges in distribution to the standard normal distribution $N(0,1)_{\mathbf{R}}$.

Exercise 2 Show that $Z_n$ does not converge in probability or in the almost sure sense. (Hint: the intuition here is that for two very different values $n_1 \ll n_2$ of $n$, the quantities $Z_{n_1}$ and $Z_{n_2}$ are almost independent of each other, since the bulk of the sum $S_{n_2}$ is determined by those $X_n$ with $n > n_1$. Now make this intuition precise.)

Exercise 3 Use Stirling’s formula from Notes 0a to verify the central limit theorem in the case when $X$ is a Bernoulli distribution, taking the values $0$ and $1$ only. (This is a variant of Exercise 2 from those notes, or Exercise 2 from Notes 1. It is easy to see that once one does this, one can rescale and handle any other two-valued distribution also.)

Exercise 4 Use Exercise 9 from Notes 1 to verify the central limit theorem in the case when $X$ is gaussian.
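The Bernoulli case of the central limit theorem can be explored numerically without any sampling, since the law of $S_n$ is then binomial. The following sketch computes $\mathbf{P}(Z_n \leq z)$ exactly from the binomial probabilities (via log-gamma to avoid overflow) and compares it with the standard normal distribution function over a grid of $z$ values. This is a numerical illustration, not a solution to the exercises; the function names, the fair-coin choice $p = 1/2$, and the grid are assumptions.

```python
import math

def normalised_sum_cdf(n, z, p=0.5):
    """Exact P(Z_n <= z) when X is Bernoulli(p), so that S_n is Binomial(n, p)."""
    mu, sigma = p, math.sqrt(p * (1 - p))
    cutoff = n * mu + z * math.sqrt(n) * sigma  # Z_n <= z  iff  S_n <= cutoff
    total = 0.0
    for k in range(n + 1):
        if k > cutoff:
            break
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * math.log(p) + (n - k) * math.log(1 - p))
        total += math.exp(log_pmf)
    return total

def normal_cdf(z):
    """Distribution function of the standard normal N(0, 1)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def max_gap(n, grid=(-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0)):
    """Largest discrepancy between the two distribution functions on the grid."""
    return max(abs(normalised_sum_cdf(n, z) - normal_cdf(z)) for z in grid)

for n in (100, 1000):
    print(n, max_gap(n))
```

The discrepancy does not vanish for any finite $n$, because $Z_n$ is supported on a lattice; the central limit theorem only asserts that it tends to zero as $n \to \infty$.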

Note we are only discussing the case of real iid random variables. The case of complex random variables (or more generally, vector-valued random variables) is a little bit more complicated, and will be discussed later in this post.

The central limit theorem (and its variants, which we discuss below) are extremely useful tools in random matrix theory, in particular through the control they give on random walks (which arise naturally from linear functionals of random matrices). But the central limit theorem can also be viewed as a “commutative” analogue of various spectral results in random matrix theory (in particular, we shall see in later lectures that the *Wigner semicircle law* can be viewed in some sense as a “noncommutative” or “free” version of the central limit theorem). Because of this, the *techniques* used to prove the central limit theorem can often be adapted to be useful in random matrix theory. For this reason, we shall use these notes to dwell on several different proofs of the central limit theorem, as this provides a convenient way to showcase some of the basic methods that we will encounter again (in a more sophisticated form) when dealing with random matrices.
