One theme in this course will be the central nature played by the gaussian random variables {X \equiv N(\mu,\sigma^2)}. Gaussians have an incredibly rich algebraic structure, and many results about general random variables can be established by first using this structure to verify the result for gaussians, and then using universality techniques (such as the Lindeberg exchange strategy) to extend the results to more general variables.

One way to exploit this algebraic structure is to continuously deform the variance {t := \sigma^2} from an initial variance of zero (so that the random variable is deterministic) to some final level {T}. We would like to use this to give a continuous family {t \mapsto X_t} of random variables {X_t \equiv N(\mu, t)} as {t} (viewed as a “time” parameter) runs from {0} to {T}.

At present, we have not completely specified what {X_t} should be, because we have only described the individual distribution {X_t \equiv N(\mu,t)} of each {X_t}, and not the joint distribution. However, there is a very natural way to specify a joint distribution of this type, known as Brownian motion. In these notes we lay the necessary probability theory foundations to set up this motion, and indicate its connection with the heat equation, the central limit theorem, and the Ornstein-Uhlenbeck process. This is the beginning of stochastic calculus, which we will not develop fully here.

We will begin with one-dimensional Brownian motion, but it is a simple matter to extend the process to higher dimensions. In particular, we can define Brownian motion on vector spaces of matrices, such as the space of {n \times n} Hermitian matrices. This process is equivariant with respect to conjugation by unitary matrices, and so we can quotient out by this conjugation and obtain a new process on the quotient space, or in other words on the spectrum of {n \times n} Hermitian matrices. This process is called Dyson Brownian motion, and turns out to have a simple description in terms of ordinary Brownian motion; it will play a key role in several of the subsequent notes in this course.

— 1. Formal construction of Brownian motion —

We begin with constructing one-dimensional Brownian motion. We shall model this motion using the machinery of Wiener processes:

Definition 1 (Wiener process) Let {\mu \in {\bf R}}, and let {\Sigma \subset [0,+\infty)} be a set of times containing {0}. A (one-dimensional) Wiener process on {\Sigma} with initial position {\mu} is a collection {(X_t)_{t \in \Sigma}} of real random variables {X_t} for each time {t \in \Sigma}, with the following properties:

  • (i) {X_0 = \mu}.
  • (ii) Almost surely, the map {t \mapsto X_t} is a continuous function on {\Sigma}.
  • (iii) For every {0 \leq t_- < t_+} in {\Sigma}, the increment {X_{t_+}-X_{t_-}} has the distribution of {N( 0, t_+ - t_- )_{\bf R}}. (In particular, {X_t \equiv N(\mu,t)_{\bf R}} for every {t > 0}.)
  • (iv) For every {t_0 \leq t_1 \leq \ldots \leq t_n} in {\Sigma}, the increments {X_{t_{i}} - X_{t_{i-1}}} for {i=1,\ldots,n} are jointly independent.

If {\Sigma} is discrete, we say that {(X_t)_{t \in \Sigma}} is a discrete Wiener process; if {\Sigma = [0,+\infty)} then we say that {(X_t)_{t \in \Sigma}} is a continuous Wiener process.

Remark 1 Collections of random variables {(X_t)_{t \in \Sigma}}, where {\Sigma} is a set of times, will be referred to as stochastic processes, thus Wiener processes are a (very) special type of stochastic process.

Remark 2 In the case of discrete Wiener processes, the continuity requirement (ii) is automatic. For continuous Wiener processes, there is a minor technical issue: the event that {t \mapsto X_t} is continuous need not be a measurable event (one has to take uncountable intersections to define this event). Because of this, we interpret (ii) by saying that there exists a measurable event of probability {1}, such that {t \mapsto X_t} is continuous on all of this event, while also allowing for the possibility that {t \mapsto X_t} could also sometimes be continuous outside of this event also. One can view the collection {(X_t)_{t \in \Sigma}} as a single random variable, taking values in the product space {{\bf R}^\Sigma} (with the product {\sigma}-algebra, of course).

Remark 3 One can clearly normalise the initial position {\mu} of a Wiener process to be zero by replacing {X_t} with {X_t-\mu} for each {t}.

We shall abuse notation somewhat and identify continuous Wiener processes with Brownian motion in our informal discussion, although technically the former is merely a model for the latter. To emphasise this link with Brownian motion, we shall often denote continuous Wiener processes as {(B_t)_{t \in [0,+\infty)}} rather than {(X_t)_{t \in [0,+\infty)}}.

It is not yet obvious that Wiener processes exist, and to what extent they are unique. The situation is easily clarified though for discrete processes:

Proposition 2 (Discrete Brownian motion) Let {\Sigma} be a discrete subset of {[0,+\infty)} containing {0}, and let {\mu\in {\bf R}}. Then (after extending the sample space if necessary) there exists a Wiener process {(X_t)_{t \in \Sigma}} with base point {\mu}. Furthermore, any other Wiener process {(X'_t)_{t \in \Sigma}} with base point {\mu} has the same distribution as {\mu}.

Proof: As {\Sigma} is discrete and contains {0}, we can write it as {\{t_0,t_1,t_2,\ldots\}} for some

\displaystyle  0 = t_0 < t_1 < t_2 < \ldots.

Let {(dX_i)_{i=1}^\infty} be a collection of jointly independent random variables with {dX_i \equiv N(0,t_i-t_{i-1})_{\bf R}} (the existence of such a collection, after extending the sample space, is guaranteed by Exercise 18 of Notes 0.) If we then set

\displaystyle  X_{t_i} := \mu + dX_1 + \ldots + dX_i

for all {i=0,1,2,\ldots}, then one easily verifies (using Exercise 9 of Notes 1) that {(X_t)_{t \in \Sigma}} is a Wiener process.

Conversely, if {(X'_t)_{t \in \Sigma}} is a Wiener process, and we define {dX'_i := X'_i - X'_{i-1}} for {i=1,2,\ldots}, then from the definition of a Wiener process we see that the {dX'_i} have distribution {N(0,t_i-t_{i-1})_{\bf R}} and are jointly independent (i.e. any finite subcollection of the {dX'_i} are jointly independent). This implies for any finite {n} that the random variables {(dX_i)_{i=1}^n} and {(dX'_i)_{i=1}^n} have the same distribution, and thus {(X_t)_{t \in \Sigma'}} and {(X'_t)_{t \in \Sigma'}} have the same distribution for any finite subset {\Sigma'} of {\Sigma}. From the construction of the product {\sigma}-algebra we conclude that {(X_t)_{t \in \Sigma}} and {(X'_t)_{t \in \Sigma}} have the same distribution, as required. \Box

Now we pass from the discrete case to the continuous case.

Proposition 3 (Continuous Brownian motion) Let {\mu\in {\bf R}}. Then (after extending the sample space if necessary) there exists a Wiener process {(X_t)_{t \in [0,+\infty)}} with base point {\mu}. Furthermore, any other Wiener process {(X'_t)_{t \in [0,+\infty)}} with base point {\mu} has the same distribution as {\mu}.

Proof: The uniqueness claim follows by the same argument used to prove the uniqueness component of Proposition 2, so we just prove existence here. The iterative construction we give here is somewhat analogous to that used to create self-similar fractals, such as the Koch snowflake. (Indeed, Brownian motion can be viewed as a probabilistic analogue of a self-similar fractal.)

The idea is to create a sequence of increasingly fine discrete Brownian motions, and then to take a limit. Proposition 2 allows one to create each individual discrete Brownian motion, but the key is to couple these discrete processes together in a consistent manner.

Here’s how. We start with a discrete Wiener process {(X_t)_{t \in {\bf N}}} on the natural numbers {{\bf N} = \{0,1,2\ldots\}} with initial position {\mu}, which exists by Proposition 2. We now extend this process to the denser set of times {\frac{1}{2} {\bf N} := \{ \frac{1}{2} n: n \in {\bf N} \}} by setting

\displaystyle  X_{t+\frac{1}{2}} := \frac{X_t + X_{t+1}}{2} + Y_{t,0}

for {t=0,1,2,\ldots}, where {(Y_{t,0})_{t \in {\bf N}}} are iid copies of {N(0,1/4)_{\bf R}}, which are jointly independent of the {(X_t)_{t \in {\bf N}}}. It is a routine matter to use Exercise 9 of Notes 1 to show that this creates a discrete Wiener process {(X_t)_{t \in \frac{1}{2} {\bf N}}} on {\frac{1}{2}{\bf N}} which extends the previous process.

Next, we extend the process further to the denser set of times {\frac{1}{4} {\bf N}} by defining

\displaystyle  X_{t+\frac{1}{4}} := \frac{X_{t} + X_{t+1/2}}{2} + Y_{t,1}

where {(Y_{t,1})_{t \in \frac{1}{2}{\bf N}}} are iid copies of {N(0,1/8)_{\bf R}}, jointly independent of {(X_t)_{t \in \frac{1}{2} {\bf N}}}. Again, it is a routine matter to show that this creates a discrete Wiener process {(X_t)_{t \in \frac{1}{4} {\bf N}}} on {\frac{1}{4}{\bf N}}.

Iterating this procedure a countable number of times, we obtain a collection of discrete Wiener processes {(X_t)_{t \in \frac{1}{2^k} {\bf N}}} for {k=0,1,2,\ldots} which are consistent with each other, in the sense that the earlier processes in this collection are restrictions of later ones. (This requires a countable number of extensions of the underlying sample space, but one can capture all of these extensions into a single extension via the machinery of inverse limits of probability spaces; it is also not difficult to manually build a single extension sufficient for performing all the above constructions.)

Now we establish a Hölder continuity property. Let {\theta} be any exponent between {0} and {1/2}, and let {T > 0} be finite. Observe that for any {k = 0,1,\ldots} and any {j \in {\bf N}}, we have {X_{(j+1)/2^k} - X_{j/2^k} \equiv N(0,1/2^k)_{\bf R}} and hence (by the subgaussian nature of the normal distribution)

\displaystyle  {\bf P}( |X_{(j+1)/2^k} - X_{j/2^k}| \geq 2^{-k \theta} ) \leq C \exp( - c 2^{k(1-2\theta)} )

for some absolute constants {C, c}. The right-hand side is summable as {j,k} run over {{\bf N}} subject to the constraint {j/2^k \leq T}. Thus, by the Borel-Cantelli lemma, for each fixed {T}, we almost surely have that

\displaystyle  |X_{(j+1)/2^k} - X_{j/2^k}| \leq 2^{-k \theta}

for all but finitely many {j,k \in {\bf N}} with {j/2^k \leq T}. In particular, this implies that for each fixed {T}, the function {t \mapsto X_t} is almost surely Hölder continuous of exponent {\theta} on the dyadic rationals {j/2^k} in {[0,T]}, and thus (by the countable union bound) is almost surely locally Hölder continuous of exponent {\theta} on the dyadic rationals in {[0,+\infty)}. In particular, they are almost surely locally uniformly continuous on this domain.

As the dyadic rationals are dense in {[0,+\infty)}, we can thus almost surely extend {t \mapsto X_t} uniquely to a continuous function on all of {[0,+\infty)}. (On the remaining probability zero event, we extend {t \mapsto X_t} in some arbitrary measurable fashion.) Note that if {t_n} is any sequence in {[0,+\infty)} converging to {t}, then {X_{t_n}} converges almost surely to {X_t}, and thus also converges in probability and in distribution. Similarly for differences such as {X_{t_{+,n}} - X_{t_{-,n}}}. Using this, we easily verify that {(X_t)_{t \in [0,+\infty)}} is a continuous Wiener process, as required. \Box

Remark 4 One could also have used the Kolmogorov extension theorem to establish the limit.

Exercise 1 Let {(X_t)_{t \in [0,+\infty)}} be a continuous Wiener process. We have already seen that if {0 < \theta < 1/2}, that the map {t \mapsto X_t} is almost surely Hölder continuous of order {\theta}. Show that if {1/2 \leq \theta \leq 1}, then the map {t \mapsto X_t} is almost surely not Hölder continuous of order {\theta}.

Show also that the map {t \mapsto X_t} is almost surely nowhere differentiable. Thus, Brownian motion provides a (probabilistic) example of a continuous function which is nowhere differentiable.

Remark 5 In the above constructions, the initial position {\mu} of the Wiener process was deterministic. However, one can easily construct Wiener processes in which the initial position {X_0} is itself a random variable. Indeed, one can simply set

\displaystyle  X_t := X_0 + B_t

where {(B_t)_{t \in [0,+\infty)}} is a continuous Wiener process with initial position {0} which is independent of {X_0}. Then we see that {X_t} obeys properties (ii), (iii), (iv) of Definition 1, but the distribution of {X_t} is no longer {N(\mu,t)_{\bf R}}, but is instead the convolution of the law of {X_0}, and the law of {N(0,t)_{\bf R}}.

— 2. Connection with random walks —

We saw how to construct Brownian motion as a limit of discrete Wiener processes, which were partial sums of independent gaussian random variables. The central limit theorem (see Notes 2) allows one to interpret Brownian motion in terms of limits of partial sums of more general independent random variables, otherwise known as (independent) random walks.

Definition 4 (Random walk) Let {\Delta X} be a real random variable, let {\mu \in {\bf R}} be an initial position, and let {\Delta t > 0} be a time step. We define a discrete random walk with initial position {\mu}, time step {\Delta t} and step distribution {\Delta X} (or {\mu_{\Delta X}}) to be a process {(X_t)_{t \in \Delta t \cdot {\bf N}}} defined by

\displaystyle  X_{n \Delta t} := \mu + \sum_{i=1}^n \Delta X_{i \Delta t}

where {(\Delta X_{i \Delta t})_{i=1}^\infty} are iid copies of {\Delta X}.

Example 1 From the proof of Proposition 2, we see that a discrete Wiener process on {\Delta t \cdot {\bf N}} with initial position {\mu} is nothing more than a discrete random walk with step distribution of {N(0,\Delta t)_{\bf R}}. Another basic example is simple random walk, in which {\Delta X} is equal to {(\Delta t)^{1/2}} times a signed Bernoulli variable, thus we have {X_{(n+1) \Delta t} = X_{n\Delta t} \pm (\Delta t)^{1/2}}, where the signs {\pm} are unbiased and are jointly independent in {n}.

Exercise 2 (Central limit theorem) Let {X} be a real random variable with mean zero and variance {1}, and let {\mu \in {\bf R}}. For each {\Delta t > 0}, let {(X^{(\Delta t)}_t)_{t \in [0,+\infty)}} be a process formed by starting with a random walk {(X^{(\Delta t)}_t)_{t \in \Delta t \cdot {\bf N}}} with initial position {\mu}, time step {\Delta t}, and step distribution {(\Delta t)^{1/2} X}, and then extending to other times in {[0,+\infty)}, in a piecewise linear fashion, thus

\displaystyle  X^{(\Delta t)}_{(n+\theta) \Delta t} := (1-\theta) X^{(\Delta t)}_{n \Delta t} + \theta X^{(\Delta t)}_{(n+1) \Delta t}

for all {n \in {\bf N}} and {0 < \theta < 1}. Show that as {\Delta t \rightarrow 0}, the process {(X^{(\Delta t)}_t)_{t \in [0,+\infty)}} converges in distribution to a continuous Wiener process with initial position {\mu}. (Hint: from the Riesz representation theorem (or the Kolmogorov extension theorem), it suffices to establish this convergence for every finite set of times in {[0,+\infty)}. Now use the central limit theorem; treating the piecewise linear modifications to the process as an error term.)

— 3. Connection with the heat equation —

Let {(B_t)_{t \in [0,+\infty)}} be a Wiener process with base point {\mu}, and let {F: {\bf R} \rightarrow {\bf R}} be a smooth function with all derivatives bounded. Then, for each time {t}, the random variable {F(B_t)} is bounded and thus has an expectation {{\bf E} F(B_t)}. From the almost sure continuity of {B_t} and the dominated convergence theorem we see that the map {t \mapsto {\bf E} F(B_t)} is continuous. In fact it is differentiable, and obeys the following differential equation:

Lemma 5 (Equation of motion) For all times {t \geq 0}, we have

\displaystyle  \frac{d}{dt} {\bf E} F(B_t) = \frac{1}{2} {\bf E} F_{xx}(B_t)

where {F_{xx}} is the second derivative of {F}. In particular, {t \mapsto {\bf E} F(B_t)} is continuously differentiable (because the right-hand side is continuous).

Proof: We work from first principles. It suffices to show for fixed {t \geq 0}, that

\displaystyle  {\bf E} F(B_{t+dt}) = {\bf E} F(B_t) + \frac{1}{2} dt {\bf E} F_{xx}(B_t) + o(dt)

as {dt \rightarrow 0}. We shall establish this just for non-negative {dt}; the claim for negative {dt} (which only needs to be considered for {t>0}) is similar and is left as an exercise.

Write {dB_t := B_{t+dt} - B_t}. From Taylor expansion and the bounded third derivative of {F}, we have

\displaystyle  F(B_{t+dt}) = F(B_t) + F_x(B_t) dB_t + \frac{1}{2} F_{xx}(B_t) |dB_t|^2 + O( |dB_t|^3 ). \ \ \ \ \ (1)

We take expectations. Since {dB_t \equiv N(0,dt)_{\bf R}}, we have {{\bf E} |dB_t|^3 = O( (dt)^{3/2} )}, so in particular

\displaystyle  {\bf E} F(B_{t+dt}) = {\bf E} F(B_t) + {\bf E} F_x(B_t) dB_t + \frac{1}{2} {\bf E} F_{xx}(B_t) |dB_t|^2 + o(dt).

Now observe that {dB_t} is independent of {B_t}, and has mean zero and variance {dt}. The claim follows. \Box

Exercise 3 Complete the proof of the lemma by considering negative values of {dt}. (Hint: one has to exercise caution because {dB_t} is not independent of {B_t} in this case. However, it will be independent of {B_{t+dt}}. Also, use the fact that {{\bf E} F_x(B_t)} and {{\bf E} F_{xx}(B_t)} are continuous in {t}. Alternatively, one can deduce the formula for the left-derivative from that of the right-derivative via a careful application of the fundamental theorem of calculus, paying close attention to the hypotheses of that theorem.)

Remark 6 In the language of Ito calculus, we can write (1) as

\displaystyle  d F(B_t) = F_x(B_t) dB_t + \frac{1}{2} F_{xx}(B_t) dt. \ \ \ \ \ (2)

Here, {dF(B_t) := F(B_{t+dt}) - F(B_t)}, and {dt} should either be thought of as being infinitesimal, or being very small, though in the latter case the equation (2) should not be viewed as being exact, but instead only being true up to errors of mean {o(dt)} and third moment {O(dt^3)}. This is a special case of Ito’s formula. It should be compared against the chain rule

\displaystyle  dF(X_t) = F_x(X_t) dX_t

when {t \mapsto X_t} is a smooth process. The non-smooth nature of Brownian motion causes the quadratic term in the Taylor expansion to be non-negligible, which explains the additional term in (2), although the Hölder continuity of this motion is sufficient to still be able to ignore terms that are of cubic order or higher. (In this spirit, one can summarise (the differential side of) Ito calculus informally by the heuristic equations {dB_t = O( (dt)^{1/2} )} and {|dB_t|^2 = dt}, with the understanding that all terms that are {o(dt)} are discarded.)

Let {\rho(t,x)\ dx} be the probability density function of {B_t}; by inspection of the normal distribution, this is a smooth function for {t>0}, but is a Dirac mass at {\mu} at time {t=0}. By definition of density function,

\displaystyle  {\bf E} F(B_t) = \int_{\bf R} F(x) \rho(t,x)\ dx

for any Schwartz function {F}. Applying Lemma 5 and integrating by parts, we see that

\displaystyle  \partial_t \rho = \frac{1}{2} \partial_{xx} \rho \ \ \ \ \ (3)

in the sense of (tempered) distributions (see e.g. my earlier notes on this topic). In other words, {\rho} is a (tempered distributional) solution to the heat equation (3). Indeed, since {\rho} is the Dirac mass at {\mu} at time {t=0}, {\rho} for later times {t} is the fundamental solution of that equation from initial position {\mu}.

From the theory of PDE (e.g. from Fourier analysis, see Exercise 38 of these notes) one can solve the (distributional) heat equation with this initial data to obtain the unique solution

\displaystyle  \rho(t,x) = \frac{1}{\sqrt{2\pi t}} e^{-|x-\mu|^2/2t}.

Of course, this is also the density function of {N(\mu,t)_{\bf R}}, which is (unsurprisingly) consistent with the fact that {B_t \equiv N(\mu,t)}. Thus we see why the normal distribution of the central limit theorem involves the same type of functions (i.e. gaussians) as the fundamental solution of the heat equation. Indeed, one can use this argument to heuristically derive the central limit theorem from the fundamental solution of the heat equation (cf. Section 7 of Notes 2), although the derivation is only heuristic because one first needs to know that some limiting distribution already exists (in the spirit of Exercise 2).

Remark 7 Because we considered a Wiener process with a deterministic initial position {\mu}, the density function {\rho} was a Dirac mass at time {t=0}. However, one can run exactly the same arguments for Wiener processes with stochastic initial position (see Remark 5), and one will still obtain the same heat equation (5), but now with a more general initial condition.

We have related one-dimensional Brownian motion to the one-dimensional heat equation, but there is no difficulty establishing a similar relationship in higher dimensions. In a vector space {{\bf R}^n}, define a (continuous) Wiener process {(X_t)_{t \in [0,+\infty)}} in {{\bf R}^n} with an initial position {\mu = (\mu_1,\ldots,\mu_n) \in {\bf R}^n} to be a process whose components {(X_{t,i})_{t \in [0,+\infty)}} for {i=1,\ldots,n} are independent Wiener processes with initial position {\mu_i}. It is easy to see that such processes exist, are unique in distribution, and obey the same sort of properties as in Definition 1, but with the one-dimensional gaussian distribution {N(\mu,\sigma^2)_{\bf R}} replaced by the {n}-dimensional analogue {N(\mu,\sigma^2 I)_{{\bf R}^n}}, which is given by the density function

\displaystyle  \frac{1}{(2\pi \sigma)^{n/2}} e^{-|x-\mu|^2/\sigma^2}\ dx

where {dx} is now Lebesgue measure on {{\bf R}^n}.

Exercise 4 If {(B_t)_{t \in [0,+\infty)}} is an {n}-dimensional continuous Wiener process, show that

\displaystyle  \frac{d}{dt} {\bf E} F(B_t) = \frac{1}{2} {\bf E} (\Delta F)(B_t)

whenever {F: {\bf R}^n \rightarrow {\bf R}} is smooth with all derivatives bounded, where

\displaystyle  \Delta F := \sum_{i=1}^n \frac{\partial^2}{\partial x_i^2} F

is the Laplacian of {F}. Conclude in particular that the density function {\rho(t,x)\ dx} of {B_t} obeys the (distributional) heat equation

\displaystyle  \partial_t \rho = \frac{1}{2} \Delta \rho.

A simple but fundamental observation is that {n}-dimensional Brownian motion is rotation-invariant: more precisely, if {(X_t)_{t \in [0,+\infty)}} is an {n}-dimensional Wiener process with initial position {0}, and {U \in O(n)} is any orthogonal transformation on {{\bf R}^n}, then {(UX_t)_{t \in [0,+\infty)}} is another Wiener process with initial position {0}, and thus has the same distribution:

\displaystyle  (UX_t)_{t \in [0,+\infty)} \equiv (X_t)_{t \in [0,+\infty)}. \ \ \ \ \ (4)

This is ultimately because the {n}-dimensional normal distributions {N(0,\sigma^2 I)_{{\bf R}^n}} are manifestly rotation-invariant (see Exercise 10 of Notes 1).

Remark 8 One can also relate variable-coefficient heat equations to variable-coefficient Brownian motion {(X_t)_{t \in [0,+\infty)}}, in which the variance of an increment {dX_t} is now only proportional to {dt} for infinitesimal {dt} rather than being equal to {dt}, with the constant of proportionality allowed to depend on the time {t} and on the position {X_t}. One can also add drift terms by allowing the increment {dX_t} to have a non-zero mean (which is also proportional to {dt}). This can be accomplished through the machinery of stochastic calculus, which we will not discuss in detail in these notes. In a similar fashion, one can construct Brownian motion (and heat equations) on manifolds or on domains with boundary, though we will not discuss this topic here.

Exercise 5 Let {X_1} be a real random variable of mean zero and variance {1}. Define a stochastic process {(X_t)_{t \in [0,+\infty)}} by the formula

\displaystyle  X_t := e^{-t} ( X_1 + B_{e^{2t}-1} )

where {(B_t)_{t \in [0,+\infty)}} is a Wiener process with initial position zero that is independent of {X_1}. This process is known as an Ornstein-Uhlenbeck process.

  • Show that each {X_t} has mean zero and variance {1}.
  • Show that {X_t} converges in distribution to {N(0,1)_{\bf R}} as {t \rightarrow \infty}.
  • If {F: {\bf R} \rightarrow {\bf R}} is smooth with all derivatives bounded, show that

    \displaystyle  \frac{d}{dt} {\bf E} F(X_t) = {\bf E} LF(X_t)

    where {L} is the Ornstein-Uhlenbeck operator

    \displaystyle  LF := F_{xx} - x F_x.

    Conclude that the density function {\rho(t,x)} of {X_t} obeys (in a distributional sense, at least) the Ornstein-Uhlenbeck equation

    \displaystyle  \partial_t \rho = L^* \rho

    where the adjoint operator {L^*} is given by

    \displaystyle  L^* \rho := \rho_{xx} + \partial_x(x \rho).

  • Show that the only probability density function {\rho} for which {L^*\rho=0} is the Gaussian {\frac{1}{\sqrt{2\pi}} e^{-x^2/2}\ dx}; furthermore, show that {\hbox{Re} \langle \rho, L^* \rho \rangle_{L^2({\bf R})} \leq 0} for all probability density functions {\rho} in the Schwartz space with mean zero and variance {1}. Discuss how this fact relates to the preceding two parts of this exercise.

Remark 9 The heat kernel {\frac{1}{(\sqrt{2\pi t})^d} e^{-|x-\mu|^2/2t}} in {d} dimensions is absolutely integrable in time away from the initial time {t=0} for dimensions {d \geq 3}, but becomes divergent in dimension {1} and (just barely) divergent for {d=2}. This causes the qualitative behaviour of Brownian motion {B_t} in {{\bf R}^d} to be rather different in the two regimes. For instance, in dimensions {d \geq 3} Brownian motion is transient; almost surely one has {B_t \rightarrow \infty} as {t \rightarrow \infty}. But in dimension {d=1} Brownian motion is recurrent: for each {x_0 \in {\bf R}}, one almost surely has {B_t=x_0} for infinitely many {t}. In the critical dimension {d=2}, Brownian motion turns out to not be recurrent, but is instead neighbourhood recurrent: almost surely, {B_t} revisits every neighbourhood of {x_0} at arbitrarily large times, but does not visit {x_0} itself for any positive time {t}. The study of Brownian motion and its relatives is in fact a huge and active area of study in modern probability theory, but will not be discussed in this course.

— 4. Dyson Brownian motion —

The space {V} of {n \times n} Hermitian matrices can be viewed as a real vector space of dimension {n^2} using the Frobenius norm

\displaystyle A \mapsto \hbox{tr}(A^2)^{1/2} = \sum_{i=1}^n a_{ii}^2 + 2 \sum_{1 \leq i < j \leq n} \hbox{Re}(a_{ij})^2 + \hbox{Im}(a_{ij})^2

where {a_{ij}} are the coefficients of {A}. One can then identify {V} explicitly with {{\bf R}^{n^2}} via the identification

\displaystyle  (a_{ij})_{1 \leq i,j \leq n} \equiv ( (a_{ii})_{i=1}^n, (\sqrt{2} \hbox{Re}(a_{ij}), \sqrt{2} \hbox{Im}(a_{ij}))_{1 \leq i<j \leq n} ).

Now that one has this indentification, for each Hermitian matrix {A_0 \in V} (deterministic or stochastic) we can define a Wiener process {(A_t)_{t \in [0,+\infty)}} on {V} with initial position {A_0}. By construction, we see that {t \mapsto A_t} is almost surely continuous, and each increment {A_{t_+} - A_{t_-}} is equal to {(t_+-t_-)^{1/2}} times a matrix drawn from the gaussian unitary ensemble (GUE), with disjoint increments being jointly independent. In particular, the diagonal entries of {A_{t_+}-A_{t_-}} have distribution {N(0,t_+-t_-)_{\bf R}}, and the off-diagonal entries have distribution {N(0,t_+-t_-)_{\bf C}}.

Given any Hermitian matrix {A}, one can form the spectrum {(\lambda_1(A),\ldots,\lambda_n(A))}, which lies in the Weyl chamber {{\bf R}^n_\geq := \{ (\lambda_1,\ldots,\lambda_n) \in {\bf R}^n: \lambda_1 \geq \ldots \geq \lambda_n \}}. Taking the spectrum of the Wiener process {(A_t)_{t \in [0,+\infty)}}, we obtain a process

\displaystyle (\lambda_1(A_t),\ldots,\lambda_n(A_t))_{t \in [0,+\infty)}

in the Weyl cone. We abbreviate {\lambda_i(A_t)} as {\lambda_i}.

For {t>0}, we see that {A_t} is absolutely continuously distributed in {V}. In particular, since almost every Hermitian matrix has simple spectrum, we see that {A_t} has almost surely simple spectrum for {t>0}. (The same is true for {t=0} if we assume that {A_0} also has an absolutely continuous distribution.)

The stochastic dynamics of this evolution can be described by Dyson Brownian motion:

Theorem 6 (Dyson Brownian motion) Let {t > 0}, and let {dt > 0}, and let {\lambda_1,\ldots,\lambda_n} be as above. Then we have

\displaystyle  d\lambda_i = dB_i + \sum_{1 \leq j \leq n: j \neq i} \frac{dt}{\lambda_i - \lambda_j} + \ldots \ \ \ \ \ (5)

for all {1 \leq i \leq n}, where {d\lambda_i := \lambda_i(A_{t+dt}) - \lambda_i(A_t)}, and {dB_1,\ldots,dB_n} are iid copies of {N(0,dt)_{\bf R}} which are jointly independent of {(A_{t'})_{t' \in [0,t]}}, and the error term {\ldots} has mean {o(dt)} and third moment {O(dt^3)} in the limit {dt \rightarrow 0} (holding {t} and {n} fixed).

Using the language of Ito calculus, one usually views {dt} as infinitesimal and drops the {\ldots} error, thus giving the elegant formula

\displaystyle  d\lambda_i = dB_i + \sum_{1 \leq j \leq n: j \neq i} \frac{dt}{\lambda_i - \lambda_j}

that shows that the eigenvalues {\lambda_i} evolve by Brownian motion, combined with a deterministic repulsion force that repels nearby eigenvalues from each other with a strength inversely proportional to the separation. One can extend the theorem to the {t=0} case by a limiting argument provided that {A_0} has an absolutely continuous distribution. Note that the decay rate of the error {\ldots} can depend on {n}, so it is not safe to let {n} go off to infinity while holding {dt} fixed. However, it is safe to let {dt} go to zero first, and then send {n} off to infinity. (It is also possible, by being more explicit with the error terms, to work with {dt} being a specific negative power of {n}. We will see this sort of analysis later in this course.)

Proof: Fix {t}. We can write {A_{t+dt} = A_t + (dt)^{1/2} G}, where {G} is independent of {A_t} and has the GUE distribution. (Strictly speaking, {G} depends on {dt}, but this dependence will not concern us.) We now condition {A_t} to be fixed, and establish (5) for almost every fixed choice of {A_t}; the general claim then follows upon undoing the conditioning (and applying the dominated convergence theorem). Due to independence, observe that {G} continues to have the GUE distribution even after conditioning {A_t} to be fixed.

Almost surely, {A_t} has simple spectrum; so we may assume that the fixed choice of {A_t} has simple spectrum also. The eigenvalues {\lambda_i} now vary smoothly near {t}, so we may Taylor expand

\displaystyle  \lambda_i(A_{t+dt}) = \lambda_i + (dt)^{1/2} \nabla_G \lambda_i + \frac{1}{2} dt \nabla_G^2 \lambda_i + O( (dt)^{3/2} \|G\|^3 )

for sufficiently small {dt}, where {\nabla_G} is directional differentiation in the {G} direction, and the implied constants in the {O()} notation can depend on {A_t} and {n}. In particular, we do not care what norm is used to measure {G} in.

As {G} has the GUE distribution, the expectation and variance of {\|G\|^3} is bounded (possibly with constant depending on {n}), so the error here has mean {o(dt)} and third moment {O(dt^3)}. We thus have

\displaystyle  d\lambda_i = (dt)^{1/2} \nabla_G \lambda_i + \frac{1}{2} dt \nabla_G^2 \lambda_i + \ldots.

Next, from the first and second Hadamard variation formulae (see Section 4 of Notes 3a) we have

\displaystyle  \nabla_G \lambda_i = u_i^* G u_i

and

\displaystyle  \nabla^2_G \lambda_i = 2 \sum_{i \neq j} \frac{|u_j^* G u_i|^2}{\lambda_i - \lambda_j}

where {u_1,\ldots,u_n} are an orthonormal eigenbasis for {A_t}, and thus

\displaystyle  d\lambda_i = (dt)^{1/2} u_i^* G u_i + dt \sum_{i \neq j} \frac{|u_j^* G u_i|^2}{\lambda_i - \lambda_j} + \ldots.

Now we take advantage of the unitary invariance of the Gaussian unitary ensemble (that is, that {UGU^* \equiv G} for all unitary matrices {G}; this is easiest to see by noting that the probability density function of {G} is proportional to {\exp( - \|G\|_F^2 / 2 )}). From this invariance, we can assume without loss of generality that {u_1,\ldots,u_n} is the standard orthonormal basis of {{\bf C}^n}, so that we now have

\displaystyle  d\lambda_i = (dt)^{1/2} \xi_{ii} + dt \sum_{i \neq j} \frac{|\xi_{ij}|^2}{\lambda_i - \lambda_j} + \ldots

where {\xi_{ij}} are the coefficients of {G}. But the {\xi_{ii}} are iid copies of {N(0,1)_{\bf R}}, and the {\xi_{ij}} are iid copies of {N(0,1)_{\bf C}}, and the claim follows (note that {dt \sum_{i \neq j} \frac{|\xi_{ij}|^2-1}{\lambda_i - \lambda_j}} has mean zero and third moment {O(dt^3)}.) \Box

Remark 10 Interestingly, one can interpret Dyson Brownian motion in a different way, namely as the motion of {n} independent Wiener processes {\lambda_i(t)} after one conditions the {\lambda_i} to be non-intersecting for all time; see this paper of Grabiner. It is intuitively reasonable that this conditioning would cause a repulsion effect, though I do not know of a simple heuristic reason why this conditioning should end up giving the specific repulsion force present in (5).

In the previous section, we saw how a Wiener process led to a PDE (the heat flow equation) that could be used to derive the probability density function for each component {X_t} of that process. We can do the same thing here:

Exercise 6 Let {\lambda_1,\ldots,\lambda_n} be as above. Let {F: {\bf R}^n \rightarrow {\bf R}} be a smooth function with bounded derivatives. Show that for any {t \geq 0}, one has

\displaystyle  \partial_t {\bf E} F(\lambda_1,\ldots,\lambda_n) = {\bf E} D^* F(\lambda_1,\ldots,\lambda_n)

where {D^*} is the adjoint Dyson operator

\displaystyle  D^* F := \frac{1}{2} \sum_{i=1}^n \partial_{\lambda_i}^2 F + \sum_{1 \leq i,j \leq n: i \neq j} \frac{ \partial_{\lambda_i} F }{\lambda_i - \lambda_j }.

If we let {\rho: [0,+\infty) \times {\bf R}^n_{\geq} \rightarrow {\bf R}} denote the density function {\rho(t,\cdot): {\bf R}^n_{\geq} \rightarrow {\bf R}} of {(\lambda_1(t),\ldots,\lambda_n(t))} at time {t \in [0,+\infty)}, deduce the Dyson partial differential equation

\displaystyle  \partial_t \rho = D \rho \ \ \ \ \ (6)

(in the sense of distributions, at least, and on the interior of {{\bf R}^n_\geq}), where {D} is the Dyson operator

\displaystyle  D \rho := \frac{1}{2} \sum_{i=1}^n \partial_{\lambda_i}^2 \rho - \sum_{1 \leq i,j \leq n: i \neq j} \partial_{\lambda_i} ( \frac{\rho}{\lambda_i - \lambda_j} ). \ \ \ \ \ (7)

The Dyson partial differential equation (6) looks a bit complicated, but it can be simplified (formally, at least) by introducing the Vandermonde determinant

\displaystyle  \Delta_n(\lambda_1,\ldots,\lambda_n) := \prod_{1 \leq i < j \leq n} (\lambda_i - \lambda_j). \ \ \ \ \ (8)

Exercise 7 Show that (8) is the determinant of the matrix {(\lambda_i^{j-1})_{1 \leq i,j \leq n}}, and is also the sum {\sum_{\sigma \in S_n} \hbox{sgn}(\sigma) \prod_{i=1}^n \lambda_{\sigma(i)}^{i-1}}.

Note that this determinant is non-zero on the interior of the Weyl chamber {{\bf R}^n_\geq}. The significance of this determinant for us lies in the identity

\displaystyle  \partial_{\lambda_i} \Delta_n = \sum_{1 \leq j \leq n: i \neq j} \frac{\Delta_n}{\lambda_i - \lambda_j} \ \ \ \ \ (9)

which can be used to cancel off the second term in (7). Indeed, we have

Exercise 8 Let {\rho} be a smooth solution to (6) in the interior of {{\bf R}^n_\geq}, and write

\displaystyle  \rho = \Delta_n u \ \ \ \ \ (10)

in this interior. Show that {u} obeys the linear heat equation

\displaystyle  \partial_t u = \frac{1}{2} \sum_{i=1}^n \partial_{\lambda_i}^2 u

in the interior of {{\bf R}^n_\geq}. (Hint: You may need to exploit the identity {\frac{1}{(a-b)(a-c)} + \frac{1}{(b-a)(b-c)} + \frac{1}{(c-a)(c-b)} = 0} for distinct {a,b,c}. Equivalently, you may need to first establish that the Vandermonde determinant is a harmonic function.)

Let {\rho} be the density function of the {(\lambda_1,\ldots,\lambda_n)}, as in (6). Recall that the Wiener random matrix {A_t} has a smooth distribution in the space {V} of Hermitian matrices, while the space of matrices in {V} with non-simple spectrum has codimension {3} by Exercise 10 of Notes 3a. On the other hand, the non-simple spectrum only has codimension {1} in the Weyl chamber (being the boundary of this cone). Because of this, we see that {\rho} vanishes to at least second order on the boundary of this cone (with correspondingly higher vanishing on higher codimension facets of this boundary). Thus, the function {u} in Exercise 8 vanishes to first order on this boundary (again with correspondingly higher vanishing on higher codimension facets). Thus, if we extend {\rho} symmetrically across the cone to all of {{\bf R}^n}, and extend the function {u} antisymmetrically, then the equation (6) and the factorisation (10) extend (in the distributional sense) to all of {{\bf R}^n}. Extending (8) to this domain (and being somewhat careful with various issues involving distributions), we now see that {u} obeys the linear heat equation on all of {{\bf R}^n}.

Now suppose that the initial matrix {A_0} had a deterministic spectrum {\nu = (\nu_1,\ldots,\nu_n)}, which to avoid technicalities we will assume to be in the interior of the Weyl chamber (the boundary case then being obtainable by a limiting argument). Then {\rho} is initially the Dirac delta function at {\nu}, extended symmetrically. Hence, {u} is initially {\frac{1}{\Delta_n(\nu)}} times the Dirac delta function at {\nu}, extended antisymmetrically:

\displaystyle  u(0,\lambda) = \frac{1}{\Delta_n(\nu)} \sum_{\sigma \in S_n} \hbox{sgn}(\sigma) \delta_{\lambda - \sigma(\nu)}.

Using the fundamental solution for the heat equation in {n} dimensions, we conclude that

\displaystyle  u(t,\lambda) = \frac{1}{(2\pi t)^{n/2}} \sum_{\sigma \in S_n} \hbox{sgn}(\sigma) e^{-|\lambda - \sigma(\nu)|^2/2t}.

By the Leibniz formula for determinants, we can express the sum here as a determinant of the matrix

\displaystyle  (e^{-(\lambda_i - \nu_j)^2 / 2t})_{1 \leq i,j \leq n}.

Applying (10), we conclude

Theorem 7 (Johansson formula) Let {A_0} be a Hermitian matrix with simple spectrum {\nu = (\nu_1,\ldots,\nu_n)}, let {t > 0}, and let {A_t = A_0 + t^{1/2} G} where {G} is drawn from GUE. Then the spectrum {\lambda = (\lambda_1,\ldots,\lambda_n)} of {A_t} has probability density function

\displaystyle  \rho(t,\lambda) = \frac{1}{(2\pi t)^{n/2}} \frac{\Delta_n(\lambda)}{\Delta_n(\nu)} \det(e^{-(\lambda_i - \nu_j)^2 / 2t})_{1 \leq i,j \leq n} \ \ \ \ \ (11)

on {{\bf R}^n_\geq}.

This formula is given explicitly in this paper of Johansson, who cites this paper of Brézin and Hikami as inspiration. This formula can also be proven by a variety of other means, for instance via the Harish-Chandra formula. (One can also check by hand that (11) satisfies the Dyson equation (6).)

We will be particularly interested in the case when {A_0=0} and {t=1}, so that we are studying the probability density function of the eigenvalues {(\lambda_1(G),\ldots,\lambda_n(G))} of a GUE matrix {G}. The Johansson formula does not directly apply here, because {\nu} is vanishing. However, we can investigate the limit of (11) in the limit as {\nu \rightarrow 0} inside the Weyl chamber; the Lipschitz nature of the eigenvalue operations {A \mapsto \lambda_i(A)} (from the Weyl inequalities) tell us that if (11) converges locally uniformly as {\nu \rightarrow 0} for {\lambda} in the interior of {{\bf R}^n_\geq}, then the limit will indeed be the probability density function for {\nu=0}. (Note from continuity that the density function cannot assign any mass to the boundary of the Weyl chamber, and in fact must vanish to at least second order by the previous discussion.)

Exercise 9 Show that as {\nu \rightarrow 0}, we have the identities

\displaystyle  \det(e^{-(\lambda_i - \nu_j)^2 / 2})_{1 \leq i,j \leq n} = e^{-|\lambda|^2/2} e^{-|\nu|^2/2} \det(e^{\lambda_i \nu_j})_{1 \leq i,j \leq n}

and

\displaystyle  \det(e^{\lambda_i \nu_j})_{1 \leq i,j \leq n} = \frac{1}{1! \ldots (n-1)!} \Delta_n(\lambda) \Delta_n(\nu) + o( \Delta_n(\nu) )

locally uniformly in {\lambda}. (Hint: for the second identity, use Taylor expansion and the Leibniz formula for determinants, noting the left-hand side vanishes whenever {\Delta_n(\nu)} vanishes and so can be treated by the (smooth) factor theorem.)

From the above exercise, we conclude the fundamental Ginibre formula

\displaystyle  \rho(\lambda) = \frac{1}{(2\pi)^{n/2} 1! \ldots (n-1)!} e^{-|\lambda|^2/2} |\Delta_n(\lambda)|^2 \ \ \ \ \ (12)

for the density function for the spectrum {(\lambda_1(G),\ldots,\lambda_n(G))} of a GUE matrix {G}.

This formula can be derived by a variety of other means; we sketch one such way below.

Exercise 10 For this exercise, assume that it is known that (12) is indeed a probability distribution on the Weyl chamber {{\bf R}^n_\geq} (if not, one would have to replace the constant {(2\pi)^{n/2}} by an unspecified normalisation factor depending only on {n}). Let {D = \hbox{diag}( \lambda_1, \ldots, \lambda_n )} be drawn at random using the distribution (12), and let {U} be drawn at random from Haar measure on {U(n)}. Show that the probability density function of {UDU^*} at a matrix {A} with simple spectrum is equal to {c_n e^{-\|A\|_F^2/2}} for some constant {c_n>0}. (Hint: use unitary invariance to reduce to the case when {A} is diagonal. Now take a small {\epsilon} and consider what {U} and {D} must be in order for {UDU^*} to lie within {\epsilon} of {A} in the Frobenius norm, performing first order calculations only (i.e. linearising and ignoring all terms of order {o(\epsilon)}).)

Conclude that (12) must be the probability density function of the spectrum of a GUE matrix.

Exercise 11 Verify by hand that the self-similar extension

\displaystyle  \rho(t,x) := t^{-n/2} \rho(x/\sqrt{t})

of the function (12) obeys the Dyson PDE (6). Why is this consistent with (12) being the density function for the spectrum of GUE?

Remark 11 Similar explicit formulae exist for other invariant ensembles, such as the gaussian orthogonal ensemble GOE and the gaussian symplectic ensemble GSE. One can also replace the exponent in density functions such as {e^{-\|A\|_F^2/2}} with more general expressions than quadratic expressions of {A}. We will however not detail these formulae in this course (with the exception of the spectral distribution law for random iid gaussian matrices, which we will discuss in a later set of notes).