254A, Notes 3b: Brownian motion and Dyson Brownian motion

18 January, 2010 in 254A - random matrices, math.AP, math.PR, math.SP | Tags: Brownian motion, Dyson Brownian motion, GUE, heat equation, Johansson formula, Ornstein-Uhlenbeck process | by Terence Tao

One theme in this course will be the central nature played by the gaussian random variables ${X \equiv N(\mu,\sigma^2)}$ . Gaussians have an incredibly rich algebraic structure, and many results about general random variables can be established by first using this structure to verify the result for gaussians, and then using universality techniques (such as the Lindeberg exchange strategy) to extend the results to more general variables.

One way to exploit this algebraic structure is to continuously deform the variance ${t := \sigma^2}$ from an initial variance of zero (so that the random variable is deterministic) to some final level ${T}$ . We would like to use this to give a continuous family ${t \mapsto X_t}$ of random variables ${X_t \equiv N(\mu, t)}$ as ${t}$ (viewed as a “time” parameter) runs from ${0}$ to ${T}$ .

At present, we have not completely specified what ${X_t}$ should be, because we have only described the individual distribution ${X_t \equiv N(\mu,t)}$ of each ${X_t}$ , and not the joint distribution. However, there is a very natural way to specify a joint distribution of this type, known as Brownian motion. In these notes we lay the necessary probability theory foundations to set up this motion, and indicate its connection with the heat equation, the central limit theorem, and the Ornstein-Uhlenbeck process. This is the beginning of stochastic calculus, which we will not develop fully here.

We will begin with one-dimensional Brownian motion, but it is a simple matter to extend the process to higher dimensions. In particular, we can define Brownian motion on vector spaces of matrices, such as the space of ${n \times n}$ Hermitian matrices. This process is equivariant with respect to conjugation by unitary matrices, and so we can quotient out by this conjugation and obtain a new process on the quotient space, or in other words on the spectrum of ${n \times n}$ Hermitian matrices. This process is called Dyson Brownian motion, and turns out to have a simple description in terms of ordinary Brownian motion; it will play a key role in several of the subsequent notes in this course.

— 1. Formal construction of Brownian motion —

We begin with constructing one-dimensional Brownian motion. We shall model this motion using the machinery of Wiener processes:

Definition 1 (Wiener process) Let ${\mu \in {\bf R}}$ , and let ${\Sigma \subset [0,+\infty)}$ be a set of times containing ${0}$ . A (one-dimensional) Wiener process on ${\Sigma}$ with initial position ${\mu}$ is a collection ${(X_t)_{t \in \Sigma}}$ of real random variables ${X_t}$ for each time ${t \in \Sigma}$ , with the following properties:

(i) ${X_0 = \mu}$ .

(ii) Almost surely, the map ${t \mapsto X_t}$ is a continuous function on ${\Sigma}$ .

(iii) For every ${0 \leq t_- < t_+}$ in ${\Sigma}$ , the increment ${X_{t_+}-X_{t_-}}$ has the distribution of ${N( 0, t_+ - t_- )_{\bf R}}$ . (In particular, ${X_t \equiv N(\mu,t)_{\bf R}}$ for every ${t > 0}$ .)

(iv) For every ${t_0 \leq t_1 \leq \ldots \leq t_n}$ in ${\Sigma}$ , the increments ${X_{t_{i}} - X_{t_{i-1}}}$ for ${i=1,\ldots,n}$ are jointly independent.

If ${\Sigma}$ is discrete, we say that ${(X_t)_{t \in \Sigma}}$ is a discrete Wiener process; if ${\Sigma = [0,+\infty)}$ then we say that ${(X_t)_{t \in \Sigma}}$ is a continuous Wiener process.

Remark 2 Collections of random variables ${(X_t)_{t \in \Sigma}}$ , where ${\Sigma}$ is a set of times, will be referred to as stochastic processes, thus Wiener processes are a (very) special type of stochastic process.

Remark 3 In the case of discrete Wiener processes, the continuity requirement (ii) is automatic. For continuous Wiener processes, there is a minor technical issue: the event that ${t \mapsto X_t}$ is continuous need not be a measurable event (one has to take uncountable intersections to define this event). Because of this, we interpret (ii) by saying that there exists a measurable event of probability ${1}$ , such that ${t \mapsto X_t}$ is continuous on all of this event, while also allowing for the possibility that ${t \mapsto X_t}$ could also sometimes be continuous outside of this event also. One can view the collection ${(X_t)_{t \in \Sigma}}$ as a single random variable, taking values in the product space ${{\bf R}^\Sigma}$ (with the product ${\sigma}$ -algebra, of course).

Remark 4 One can clearly normalise the initial position ${\mu}$ of a Wiener process to be zero by replacing ${X_t}$ with ${X_t-\mu}$ for each ${t}$ .

We shall abuse notation somewhat and identify continuous Wiener processes with Brownian motion in our informal discussion, although technically the former is merely a model for the latter. To emphasise this link with Brownian motion, we shall often denote continuous Wiener processes as ${(B_t)_{t \in [0,+\infty)}}$ rather than ${(X_t)_{t \in [0,+\infty)}}$ .

It is not yet obvious that Wiener processes exist, and to what extent they are unique. The situation is easily clarified though for discrete processes:

Proposition 5 (Discrete Brownian motion) Let ${\Sigma}$ be a discrete subset of ${[0,+\infty)}$ containing ${0}$ , and let ${\mu\in {\bf R}}$ . Then (after extending the sample space if necessary) there exists a Wiener process ${(X_t)_{t \in \Sigma}}$ with base point ${\mu}$ . Furthermore, any other Wiener process ${(X'_t)_{t \in \Sigma}}$ with base point ${\mu}$ has the same distribution as ${(X_t)_{t \in \Sigma}}$ .

Proof: As ${\Sigma}$ is discrete and contains ${0}$ , we can write it as ${\{t_0,t_1,t_2,\ldots\}}$ for some

$\displaystyle 0 = t_0 < t_1 < t_2 < \ldots.$

Let ${(dX_i)_{i=1}^\infty}$ be a collection of jointly independent random variables with ${dX_i \equiv N(0,t_i-t_{i-1})_{\bf R}}$ (the existence of such a collection, after extending the sample space, is guaranteed by Exercise 18 of Notes 0.) If we then set

$\displaystyle X_{t_i} := \mu + dX_1 + \ldots + dX_i$

for all ${i=0,1,2,\ldots}$ , then one easily verifies (using Exercise 9 of Notes 1) that ${(X_t)_{t \in \Sigma}}$ is a Wiener process.

Conversely, if ${(X'_t)_{t \in \Sigma}}$ is a Wiener process, and we define ${dX'_i := X'_i - X'_{i-1}}$ for ${i=1,2,\ldots}$ , then from the definition of a Wiener process we see that the ${dX'_i}$ have distribution ${N(0,t_i-t_{i-1})_{\bf R}}$ and are jointly independent (i.e. any finite subcollection of the ${dX'_i}$ are jointly independent). This implies for any finite ${n}$ that the random variables ${(dX_i)_{i=1}^n}$ and ${(dX'_i)_{i=1}^n}$ have the same distribution, and thus ${(X_t)_{t \in \Sigma'}}$ and ${(X'_t)_{t \in \Sigma'}}$ have the same distribution for any finite subset ${\Sigma'}$ of ${\Sigma}$ . From the construction of the product ${\sigma}$ -algebra we conclude that ${(X_t)_{t \in \Sigma}}$ and ${(X'_t)_{t \in \Sigma}}$ have the same distribution, as required. $\Box$

Now we pass from the discrete case to the continuous case.

Proposition 6 (Continuous Brownian motion) Let ${\mu\in {\bf R}}$ . Then (after extending the sample space if necessary) there exists a Wiener process ${(X_t)_{t \in [0,+\infty)}}$ with base point ${\mu}$ . Furthermore, any other Wiener process ${(X'_t)_{t \in [0,+\infty)}}$ with base point ${\mu}$ has the same distribution as ${(X_t)_{t \in [0,+\infty)}}$ .

Proof: The uniqueness claim follows by the same argument used to prove the uniqueness component of Proposition 5, so we just prove existence here. The iterative construction we give here is somewhat analogous to that used to create self-similar fractals, such as the Koch snowflake. (Indeed, Brownian motion can be viewed as a probabilistic analogue of a self-similar fractal.)

The idea is to create a sequence of increasingly fine discrete Brownian motions, and then to take a limit. Proposition 5 allows one to create each individual discrete Brownian motion, but the key is to couple these discrete processes together in a consistent manner.

Here’s how. We start with a discrete Wiener process ${(X_t)_{t \in {\bf N}}}$ on the natural numbers ${{\bf N} = \{0,1,2\ldots\}}$ with initial position ${\mu}$ , which exists by Proposition 5. We now extend this process to the denser set of times ${\frac{1}{2} {\bf N} := \{ \frac{1}{2} n: n \in {\bf N} \}}$ by setting

$\displaystyle X_{t+\frac{1}{2}} := \frac{X_t + X_{t+1}}{2} + Y_{t,0}$

for ${t=0,1,2,\ldots}$ , where ${(Y_{t,0})_{t \in {\bf N}}}$ are iid copies of ${N(0,1/4)_{\bf R}}$ , which are jointly independent of the ${(X_t)_{t \in {\bf N}}}$ . It is a routine matter to use Exercise 9 of Notes 1 to show that this creates a discrete Wiener process ${(X_t)_{t \in \frac{1}{2} {\bf N}}}$ on ${\frac{1}{2}{\bf N}}$ which extends the previous process.

Next, we extend the process further to the denser set of times ${\frac{1}{4} {\bf N}}$ by defining

$\displaystyle X_{t+\frac{1}{4}} := \frac{X_{t} + X_{t+1/2}}{2} + Y_{t,1}$

where ${(Y_{t,1})_{t \in \frac{1}{2}{\bf N}}}$ are iid copies of ${N(0,1/8)_{\bf R}}$ , jointly independent of ${(X_t)_{t \in \frac{1}{2} {\bf N}}}$ . Again, it is a routine matter to show that this creates a discrete Wiener process ${(X_t)_{t \in \frac{1}{4} {\bf N}}}$ on ${\frac{1}{4}{\bf N}}$ .

Iterating this procedure a countable number of times, we obtain a collection of discrete Wiener processes ${(X_t)_{t \in \frac{1}{2^k} {\bf N}}}$ for ${k=0,1,2,\ldots}$ which are consistent with each other, in the sense that the earlier processes in this collection are restrictions of later ones. (This requires a countable number of extensions of the underlying sample space, but one can capture all of these extensions into a single extension via the machinery of inverse limits of probability spaces; it is also not difficult to manually build a single extension sufficient for performing all the above constructions.)

Now we establish a Hölder continuity property. Let ${\theta}$ be any exponent between ${0}$ and ${1/2}$ , and let ${T > 0}$ be finite. Observe that for any ${k = 0,1,\ldots}$ and any ${j \in {\bf N}}$ , we have ${X_{(j+1)/2^k} - X_{j/2^k} \equiv N(0,1/2^k)_{\bf R}}$ and hence (by the subgaussian nature of the normal distribution)

$\displaystyle {\bf P}( |X_{(j+1)/2^k} - X_{j/2^k}| \geq 2^{-k \theta} ) \leq C \exp( - c 2^{k(1-2\theta)} )$

for some absolute constants ${C, c}$ . The right-hand side is summable as ${j,k}$ run over ${{\bf N}}$ subject to the constraint ${j/2^k \leq T}$ . Thus, by the Borel-Cantelli lemma, for each fixed ${T}$ , we almost surely have that

$\displaystyle |X_{(j+1)/2^k} - X_{j/2^k}| \leq 2^{-k \theta}$

for all but finitely many ${j,k \in {\bf N}}$ with ${j/2^k \leq T}$ . In particular, this implies that for each fixed ${T}$ , the function ${t \mapsto X_t}$ is almost surely Hölder continuous of exponent ${\theta}$ on the dyadic rationals ${j/2^k}$ in ${[0,T]}$ , and thus (by the countable union bound) is almost surely locally Hölder continuous of exponent ${\theta}$ on the dyadic rationals in ${[0,+\infty)}$ . In particular, they are almost surely locally uniformly continuous on this domain.

As the dyadic rationals are dense in ${[0,+\infty)}$ , we can thus almost surely extend ${t \mapsto X_t}$ uniquely to a continuous function on all of ${[0,+\infty)}$ . (On the remaining probability zero event, we extend ${t \mapsto X_t}$ in some arbitrary measurable fashion.) Note that if ${t_n}$ is any sequence in ${[0,+\infty)}$ converging to ${t}$ , then ${X_{t_n}}$ converges almost surely to ${X_t}$ , and thus also converges in probability and in distribution. Similarly for differences such as ${X_{t_{+,n}} - X_{t_{-,n}}}$ . Using this, we easily verify that ${(X_t)_{t \in [0,+\infty)}}$ is a continuous Wiener process, as required. $\Box$

Remark 7 One could also have used the Kolmogorov extension theorem to establish the limit.

Exercise 8 Let ${(X_t)_{t \in [0,+\infty)}}$ be a continuous Wiener process. We have already seen that if ${0 < \theta < 1/2}$ , that the map ${t \mapsto X_t}$ is almost surely Hölder continuous of order ${\theta}$ . Show that if ${1/2 \leq \theta \leq 1}$ , then the map ${t \mapsto X_t}$ is almost surely not Hölder continuous of order ${\theta}$ .

Show also that the map ${t \mapsto X_t}$ is almost surely nowhere differentiable. Thus, Brownian motion provides a (probabilistic) example of a continuous function which is nowhere differentiable.

Remark 9 In the above constructions, the initial position ${\mu}$ of the Wiener process was deterministic. However, one can easily construct Wiener processes in which the initial position ${X_0}$ is itself a random variable. Indeed, one can simply set

$\displaystyle X_t := X_0 + B_t$

where ${(B_t)_{t \in [0,+\infty)}}$ is a continuous Wiener process with initial position ${0}$ which is independent of ${X_0}$ . Then we see that ${X_t}$ obeys properties (ii), (iii), (iv) of Definition 1, but the distribution of ${X_t}$ is no longer ${N(\mu,t)_{\bf R}}$ , but is instead the convolution of the law of ${X_0}$ , and the law of ${N(0,t)_{\bf R}}$ .

— 2. Connection with random walks —

We saw how to construct Brownian motion as a limit of discrete Wiener processes, which were partial sums of independent gaussian random variables. The central limit theorem (see Notes 2) allows one to interpret Brownian motion in terms of limits of partial sums of more general independent random variables, otherwise known as (independent) random walks.

Definition 10 (Random walk) Let ${\Delta X}$ be a real random variable, let ${\mu \in {\bf R}}$ be an initial position, and let ${\Delta t > 0}$ be a time step. We define a discrete random walk with initial position ${\mu}$ , time step ${\Delta t}$ and step distribution ${\Delta X}$ (or ${\mu_{\Delta X}}$ ) to be a process ${(X_t)_{t \in \Delta t \cdot {\bf N}}}$ defined by

$\displaystyle X_{n \Delta t} := \mu + \sum_{i=1}^n \Delta X_{i \Delta t}$

where ${(\Delta X_{i \Delta t})_{i=1}^\infty}$ are iid copies of ${\Delta X}$ .

Example 11 From the proof of Proposition 5, we see that a discrete Wiener process on ${\Delta t \cdot {\bf N}}$ with initial position ${\mu}$ is nothing more than a discrete random walk with step distribution of ${N(0,\Delta t)_{\bf R}}$ . Another basic example is simple random walk, in which ${\Delta X}$ is equal to ${(\Delta t)^{1/2}}$ times a signed Bernoulli variable, thus we have ${X_{(n+1) \Delta t} = X_{n\Delta t} \pm (\Delta t)^{1/2}}$ , where the signs ${\pm}$ are unbiased and are jointly independent in ${n}$ .

Exercise 12 (Central limit theorem) Let ${X}$ be a real random variable with mean zero and variance ${1}$ , and let ${\mu \in {\bf R}}$ . For each ${\Delta t > 0}$ , let ${(X^{(\Delta t)}_t)_{t \in [0,+\infty)}}$ be a process formed by starting with a random walk ${(X^{(\Delta t)}_t)_{t \in \Delta t \cdot {\bf N}}}$ with initial position ${\mu}$ , time step ${\Delta t}$ , and step distribution ${(\Delta t)^{1/2} X}$ , and then extending to other times in ${[0,+\infty)}$ , in a piecewise linear fashion, thus

$\displaystyle X^{(\Delta t)}_{(n+\theta) \Delta t} := (1-\theta) X^{(\Delta t)}_{n \Delta t} + \theta X^{(\Delta t)}_{(n+1) \Delta t}$

for all ${n \in {\bf N}}$ and ${0 < \theta < 1}$ . Show that as ${\Delta t \rightarrow 0}$ , the process ${(X^{(\Delta t)}_t)_{t \in [0,+\infty)}}$ converges in distribution to a continuous Wiener process with initial position ${\mu}$ . (Hint: from the Riesz representation theorem (or the Kolmogorov extension theorem), it suffices to establish this convergence for every finite set of times in ${[0,+\infty)}$ . Now use the central limit theorem; treating the piecewise linear modifications to the process as an error term.)

— 3. Connection with the heat equation —

Let ${(B_t)_{t \in [0,+\infty)}}$ be a Wiener process with base point ${\mu}$ , and let ${F: {\bf R} \rightarrow {\bf R}}$ be a smooth function with all derivatives bounded. Then, for each time ${t}$ , the random variable ${F(B_t)}$ is bounded and thus has an expectation ${{\bf E} F(B_t)}$ . From the almost sure continuity of ${B_t}$ and the dominated convergence theorem we see that the map ${t \mapsto {\bf E} F(B_t)}$ is continuous. In fact it is differentiable, and obeys the following differential equation:

Lemma 13 (Equation of motion) For all times ${t \geq 0}$ , we have

$\displaystyle \frac{d}{dt} {\bf E} F(B_t) = \frac{1}{2} {\bf E} F_{xx}(B_t)$

where ${F_{xx}}$ is the second derivative of ${F}$ . In particular, ${t \mapsto {\bf E} F(B_t)}$ is continuously differentiable (because the right-hand side is continuous).

Proof: We work from first principles. It suffices to show for fixed ${t \geq 0}$ , that

$\displaystyle {\bf E} F(B_{t+dt}) = {\bf E} F(B_t) + \frac{1}{2} dt {\bf E} F_{xx}(B_t) + o(dt)$

as ${dt \rightarrow 0}$ . We shall establish this just for non-negative ${dt}$ ; the claim for negative ${dt}$ (which only needs to be considered for ${t>0}$ ) is similar and is left as an exercise.

Write ${dB_t := B_{t+dt} - B_t}$ . From Taylor expansion and the bounded third derivative of ${F}$ , we have

$\displaystyle F(B_{t+dt}) = F(B_t) + F_x(B_t) dB_t + \frac{1}{2} F_{xx}(B_t) |dB_t|^2 + O( |dB_t|^3 ). \ \ \ \ \ (1)$

We take expectations. Since ${dB_t \equiv N(0,dt)_{\bf R}}$ , we have ${{\bf E} |dB_t|^3 = O( (dt)^{3/2} )}$ , so in particular

$\displaystyle {\bf E} F(B_{t+dt}) = {\bf E} F(B_t) + {\bf E} F_x(B_t) dB_t + \frac{1}{2} {\bf E} F_{xx}(B_t) |dB_t|^2 + o(dt).$

Now observe that ${dB_t}$ is independent of ${B_t}$ , and has mean zero and variance ${dt}$ . The claim follows. $\Box$

Exercise 14 Complete the proof of the lemma by considering negative values of ${dt}$ . (Hint: one has to exercise caution because ${dB_t}$ is not independent of ${B_t}$ in this case. However, it will be independent of ${B_{t+dt}}$ . Also, use the fact that ${{\bf E} F_x(B_t)}$ and ${{\bf E} F_{xx}(B_t)}$ are continuous in ${t}$ . Alternatively, one can deduce the formula for the left-derivative from that of the right-derivative via a careful application of the fundamental theorem of calculus, paying close attention to the hypotheses of that theorem.)

Remark 15 In the language of Ito calculus, we can write (1) as

$\displaystyle d F(B_t) = F_x(B_t) dB_t + \frac{1}{2} F_{xx}(B_t) dt. \ \ \ \ \ (2)$

Here, ${dF(B_t) := F(B_{t+dt}) - F(B_t)}$ , and ${dt}$ should either be thought of as being infinitesimal, or being very small, though in the latter case the equation (2) should not be viewed as being exact, but instead only being true up to errors of mean ${o(dt)}$ and third moment ${O(dt^3)}$ . This is a special case of Ito’s formula. It should be compared against the chain rule

$\displaystyle dF(X_t) = F_x(X_t) dX_t$

when ${t \mapsto X_t}$ is a smooth process. The non-smooth nature of Brownian motion causes the quadratic term in the Taylor expansion to be non-negligible, which explains the additional term in (2), although the Hölder continuity of this motion is sufficient to still be able to ignore terms that are of cubic order or higher. (In this spirit, one can summarise (the differential side of) Ito calculus informally by the heuristic equations ${dB_t = O( (dt)^{1/2} )}$ and ${|dB_t|^2 = dt}$ , with the understanding that all terms that are ${o(dt)}$ are discarded.)

Let ${\rho(t,x)\ dx}$ be the probability density function of ${B_t}$ ; by inspection of the normal distribution, this is a smooth function for ${t>0}$ , but is a Dirac mass at ${\mu}$ at time ${t=0}$ . By definition of density function,

$\displaystyle {\bf E} F(B_t) = \int_{\bf R} F(x) \rho(t,x)\ dx$

for any Schwartz function ${F}$ . Applying Lemma 13 and integrating by parts, we see that

$\displaystyle \partial_t \rho = \frac{1}{2} \partial_{xx} \rho \ \ \ \ \ (3)$

in the sense of (tempered) distributions (see e.g. my earlier notes on this topic). In other words, ${\rho}$ is a (tempered distributional) solution to the heat equation (3). Indeed, since ${\rho}$ is the Dirac mass at ${\mu}$ at time ${t=0}$ , ${\rho}$ for later times ${t}$ is the fundamental solution of that equation from initial position ${\mu}$ .

From the theory of PDE (e.g. from Fourier analysis, see Exercise 38 of these notes) one can solve the (distributional) heat equation with this initial data to obtain the unique solution

$\displaystyle \rho(t,x) = \frac{1}{\sqrt{2\pi t}} e^{-|x-\mu|^2/2t}.$

Of course, this is also the density function of ${N(\mu,t)_{\bf R}}$ , which is (unsurprisingly) consistent with the fact that ${B_t \equiv N(\mu,t)}$ . Thus we see why the normal distribution of the central limit theorem involves the same type of functions (i.e. gaussians) as the fundamental solution of the heat equation. Indeed, one can use this argument to heuristically derive the central limit theorem from the fundamental solution of the heat equation (cf. Section 7 of Notes 2), although the derivation is only heuristic because one first needs to know that some limiting distribution already exists (in the spirit of Exercise 12).

Remark 16 Because we considered a Wiener process with a deterministic initial position ${\mu}$ , the density function ${\rho}$ was a Dirac mass at time ${t=0}$ . However, one can run exactly the same arguments for Wiener processes with stochastic initial position (see Remark 9), and one will still obtain the same heat equation (9), but now with a more general initial condition.

We have related one-dimensional Brownian motion to the one-dimensional heat equation, but there is no difficulty establishing a similar relationship in higher dimensions. In a vector space ${{\bf R}^n}$ , define a (continuous) Wiener process ${(X_t)_{t \in [0,+\infty)}}$ in ${{\bf R}^n}$ with an initial position ${\mu = (\mu_1,\ldots,\mu_n) \in {\bf R}^n}$ to be a process whose components ${(X_{t,i})_{t \in [0,+\infty)}}$ for ${i=1,\ldots,n}$ are independent Wiener processes with initial position ${\mu_i}$ . It is easy to see that such processes exist, are unique in distribution, and obey the same sort of properties as in Definition 1, but with the one-dimensional gaussian distribution ${N(\mu,\sigma^2)_{\bf R}}$ replaced by the ${n}$ -dimensional analogue ${N(\mu,\sigma^2 I)_{{\bf R}^n}}$ , which is given by the density function

$\displaystyle \frac{1}{(2\pi \sigma)^{n/2}} e^{-|x-\mu|^2/\sigma^2}\ dx$

where ${dx}$ is now Lebesgue measure on ${{\bf R}^n}$ .

Exercise 17 If ${(B_t)_{t \in [0,+\infty)}}$ is an ${n}$ -dimensional continuous Wiener process, show that

$\displaystyle \frac{d}{dt} {\bf E} F(B_t) = \frac{1}{2} {\bf E} (\Delta F)(B_t)$

whenever ${F: {\bf R}^n \rightarrow {\bf R}}$ is smooth with all derivatives bounded, where

$\displaystyle \Delta F := \sum_{i=1}^n \frac{\partial^2}{\partial x_i^2} F$

is the Laplacian of ${F}$ . Conclude in particular that the density function ${\rho(t,x)\ dx}$ of ${B_t}$ obeys the (distributional) heat equation

$\displaystyle \partial_t \rho = \frac{1}{2} \Delta \rho.$

A simple but fundamental observation is that ${n}$ -dimensional Brownian motion is rotation-invariant: more precisely, if ${(X_t)_{t \in [0,+\infty)}}$ is an ${n}$ -dimensional Wiener process with initial position ${0}$ , and ${U \in O(n)}$ is any orthogonal transformation on ${{\bf R}^n}$ , then ${(UX_t)_{t \in [0,+\infty)}}$ is another Wiener process with initial position ${0}$ , and thus has the same distribution:

$\displaystyle (UX_t)_{t \in [0,+\infty)} \equiv (X_t)_{t \in [0,+\infty)}. \ \ \ \ \ (4)$

This is ultimately because the ${n}$ -dimensional normal distributions ${N(0,\sigma^2 I)_{{\bf R}^n}}$ are manifestly rotation-invariant (see Exercise 10 of Notes 1).

Remark 18 One can also relate variable-coefficient heat equations to variable-coefficient Brownian motion ${(X_t)_{t \in [0,+\infty)}}$ , in which the variance of an increment ${dX_t}$ is now only proportional to ${dt}$ for infinitesimal ${dt}$ rather than being equal to ${dt}$ , with the constant of proportionality allowed to depend on the time ${t}$ and on the position ${X_t}$ . One can also add drift terms by allowing the increment ${dX_t}$ to have a non-zero mean (which is also proportional to ${dt}$ ). This can be accomplished through the machinery of stochastic calculus, which we will not discuss in detail in these notes. In a similar fashion, one can construct Brownian motion (and heat equations) on manifolds or on domains with boundary, though we will not discuss this topic here.

Exercise 19 Let ${X_1}$ be a real random variable of mean zero and variance ${1}$ . Define a stochastic process ${(X_t)_{t \in [0,+\infty)}}$ by the formula

$\displaystyle X_t := e^{-t} ( X_1 + B_{e^{2t}-1} )$

where ${(B_t)_{t \in [0,+\infty)}}$ is a Wiener process with initial position zero that is independent of ${X_1}$ . This process is known as an Ornstein-Uhlenbeck process.

Show that each ${X_t}$ has mean zero and variance ${1}$ .

Show that ${X_t}$ converges in distribution to ${N(0,1)_{\bf R}}$ as ${t \rightarrow \infty}$ .

If ${F: {\bf R} \rightarrow {\bf R}}$ is smooth with all derivatives bounded, show that
$\displaystyle \frac{d}{dt} {\bf E} F(X_t) = {\bf E} LF(X_t)$

where ${L}$ is the Ornstein-Uhlenbeck operator

$\displaystyle LF := F_{xx} - x F_x.$

Conclude that the density function ${\rho(t,x)}$ of ${X_t}$ obeys (in a distributional sense, at least) the Ornstein-Uhlenbeck equation

$\displaystyle \partial_t \rho = L^* \rho$

where the adjoint operator ${L^*}$ is given by

$\displaystyle L^* \rho := \rho_{xx} + \partial_x(x \rho).$

Show that the only probability density function ${\rho}$ for which ${L^*\rho=0}$ is the Gaussian ${\frac{1}{\sqrt{2\pi}} e^{-x^2/2}\ dx}$ ; furthermore, show that ${\hbox{Re} \langle \rho, L^* \rho \rangle_{L^2({\bf R})} \leq 0}$ for all probability density functions ${\rho}$ in the Schwartz space with mean zero and variance ${1}$ . Discuss how this fact relates to the preceding two parts of this exercise.

Remark 20 The heat kernel ${\frac{1}{(\sqrt{2\pi t})^d} e^{-|x-\mu|^2/2t}}$ in ${d}$ dimensions is absolutely integrable in time away from the initial time ${t=0}$ for dimensions ${d \geq 3}$ , but becomes divergent in dimension ${1}$ and (just barely) divergent for ${d=2}$ . This causes the qualitative behaviour of Brownian motion ${B_t}$ in ${{\bf R}^d}$ to be rather different in the two regimes. For instance, in dimensions ${d \geq 3}$ Brownian motion is transient; almost surely one has ${B_t \rightarrow \infty}$ as ${t \rightarrow \infty}$ . But in dimension ${d=1}$ Brownian motion is recurrent: for each ${x_0 \in {\bf R}}$ , one almost surely has ${B_t=x_0}$ for infinitely many ${t}$ . In the critical dimension ${d=2}$ , Brownian motion turns out to not be recurrent, but is instead neighbourhood recurrent: almost surely, ${B_t}$ revisits every neighbourhood of ${x_0}$ at arbitrarily large times, but does not visit ${x_0}$ itself for any positive time ${t}$ . The study of Brownian motion and its relatives is in fact a huge and active area of study in modern probability theory, but will not be discussed in this course.

— 4. Dyson Brownian motion —

The space ${V}$ of ${n \times n}$ Hermitian matrices can be viewed as a real vector space of dimension ${n^2}$ using the Frobenius norm

$\displaystyle A \mapsto \hbox{tr}(A^2)^{1/2} = \sum_{i=1}^n a_{ii}^2 + 2 \sum_{1 \leq i < j \leq n} \hbox{Re}(a_{ij})^2 + \hbox{Im}(a_{ij})^2$

where ${a_{ij}}$ are the coefficients of ${A}$ . One can then identify ${V}$ explicitly with ${{\bf R}^{n^2}}$ via the identification

$\displaystyle (a_{ij})_{1 \leq i,j \leq n} \equiv ( (a_{ii})_{i=1}^n, (\sqrt{2} \hbox{Re}(a_{ij}), \sqrt{2} \hbox{Im}(a_{ij}))_{1 \leq i<j \leq n} ).$

Now that one has this indentification, for each Hermitian matrix ${A_0 \in V}$ (deterministic or stochastic) we can define a Wiener process ${(A_t)_{t \in [0,+\infty)}}$ on ${V}$ with initial position ${A_0}$ . By construction, we see that ${t \mapsto A_t}$ is almost surely continuous, and each increment ${A_{t_+} - A_{t_-}}$ is equal to ${(t_+-t_-)^{1/2}}$ times a matrix drawn from the gaussian unitary ensemble (GUE), with disjoint increments being jointly independent. In particular, the diagonal entries of ${A_{t_+}-A_{t_-}}$ have distribution ${N(0,t_+-t_-)_{\bf R}}$ , and the off-diagonal entries have distribution ${N(0,t_+-t_-)_{\bf C}}$ .

Given any Hermitian matrix ${A}$ , one can form the spectrum ${(\lambda_1(A),\ldots,\lambda_n(A))}$ , which lies in the Weyl chamber ${{\bf R}^n_\geq := \{ (\lambda_1,\ldots,\lambda_n) \in {\bf R}^n: \lambda_1 \geq \ldots \geq \lambda_n \}}$ . Taking the spectrum of the Wiener process ${(A_t)_{t \in [0,+\infty)}}$ , we obtain a process

$\displaystyle (\lambda_1(A_t),\ldots,\lambda_n(A_t))_{t \in [0,+\infty)}$

in the Weyl cone. We abbreviate ${\lambda_i(A_t)}$ as ${\lambda_i}$ .

For ${t>0}$ , we see that ${A_t}$ is absolutely continuously distributed in ${V}$ . In particular, since almost every Hermitian matrix has simple spectrum, we see that ${A_t}$ has almost surely simple spectrum for ${t>0}$ . (The same is true for ${t=0}$ if we assume that ${A_0}$ also has an absolutely continuous distribution.)

The stochastic dynamics of this evolution can be described by Dyson Brownian motion:

Theorem 21 (Dyson Brownian motion) Let ${t > 0}$ , and let ${dt > 0}$ , and let ${\lambda_1,\ldots,\lambda_n}$ be as above. Then we have

$\displaystyle d\lambda_i = dB_i + \sum_{1 \leq j \leq n: j \neq i} \frac{dt}{\lambda_i - \lambda_j} + \ldots \ \ \ \ \ (5)$

for all ${1 \leq i \leq n}$ , where ${d\lambda_i := \lambda_i(A_{t+dt}) - \lambda_i(A_t)}$ , and ${dB_1,\ldots,dB_n}$ are iid copies of ${N(0,dt)_{\bf R}}$ which are jointly independent of ${(A_{t'})_{t' \in [0,t]}}$ , and the error term ${\ldots}$ is the sum of two terms, one of which has ${L^1}$ norm ${o(dt)}$ , and the other has mean zero and second moment ${O(dt)}$ in the limit ${dt \rightarrow 0}$ (holding ${t}$ and ${n}$ fixed).

Using the language of Ito calculus, one usually views ${dt}$ as infinitesimal and drops the ${\ldots}$ error, thus giving the elegant formula

$\displaystyle d\lambda_i = dB_i + \sum_{1 \leq j \leq n: j \neq i} \frac{dt}{\lambda_i - \lambda_j}$

that shows that the eigenvalues ${\lambda_i}$ evolve by Brownian motion, combined with a deterministic repulsion force that repels nearby eigenvalues from each other with a strength inversely proportional to the separation. One can extend the theorem to the ${t=0}$ case by a limiting argument provided that ${A_0}$ has an absolutely continuous distribution. Note that the decay rate of the error ${\ldots}$ can depend on ${n}$ , so it is not safe to let ${n}$ go off to infinity while holding ${dt}$ fixed. However, it is safe to let ${dt}$ go to zero first, and then send ${n}$ off to infinity. (It is also possible, by being more explicit with the error terms, to work with ${dt}$ being a specific negative power of ${n}$ . We will see this sort of analysis later in this course.)

Proof: Fix ${t}$ . We can write ${A_{t+dt} = A_t + (dt)^{1/2} G}$ , where ${G}$ is independent of ${A_t}$ and has the GUE distribution. (Strictly speaking, ${G}$ depends on ${dt}$ , but this dependence will not concern us.) We now condition ${A_t}$ to be fixed, and establish (5) for almost every fixed choice of ${A_t}$ ; the general claim then follows upon undoing the conditioning (and applying the dominated convergence theorem). Due to independence, observe that ${G}$ continues to have the GUE distribution even after conditioning ${A_t}$ to be fixed.

Almost surely, ${A_t}$ has simple spectrum; so we may assume that the fixed choice of ${A_t}$ has simple spectrum also. Actually, we can say more: since the Wiener random matrix ${A_t}$ has a smooth distribution in the space ${V}$ of Hermitian matrices, while the space ${S}$ of matrices in ${V}$ with non-simple spectrum has codimension ${3}$ by Exercise 10 of Notes 3a, we see that for any ${\delta>0}$ the probability that ${A_t}$ lies within ${\delta}$ of ${S}$ is ${O(\delta^3)}$ , where we allow implied constants to depend on ${n}$ . In particular, if we let ${\Delta := \min_{i \neq j} |\lambda_i - \lambda_j|}$ denote the minimal eigenvalue gap, we have

$\displaystyle \mathop{\bf P}( \Delta \leq \delta ) = O(\delta^3)$

which by dyadic decomposition implies the finite negative second moment

$\displaystyle \mathop{\bf E} \Delta^{-2} = O(1). \ \ \ \ \ (6)$

Let ${\nabla_G \lambda_i}$ denote the derivative of the eigenvalue ${\lambda_i(A_t)}$ in the ${G}$ direction. From the first and second Hadamard variation formulae (see Section 4 of Notes 3a) we have

$\displaystyle \nabla_G \lambda_i = u_i^* G u_i = O( \|G\| )$

and

$\displaystyle \nabla^2_G \lambda_i = 2 \sum_{i \neq j} \frac{|u_j^* G u_i|^2}{\lambda_i - \lambda_j} = O( \|G\|^2 \Delta^{-1} )$

where ${u_1,\ldots,u_n}$ are an orthonormal eigenbasis for ${A_t}$ , and also ${\nabla_G u_i = O( \|G\| \Delta^{-1})}$ . A further differentiation then yields

$\displaystyle \nabla^3_G \lambda_i = O( \|G\|^3 \Delta^{-2} ).$

(One can obtain a more exact third Hadamard variation formula if desired, but it is messy and will not be needed here.) A Taylor expansion now gives the bound

$\displaystyle \lambda_i(A_{t+dt}) = \lambda_i + (dt)^{1/2} \nabla_G \lambda_i + \frac{1}{2} dt \nabla_G^2 \lambda_i + O( (dt)^{3/2} \|G\|^3 \Delta^{-2} ) \ \ \ \ \ (7)$

provided that one is in the regime

$\displaystyle dt^{1/2} \|G\| \leq \frac{1}{2} \Delta$

(in order to keep the eigenvalue gap at least ${\Delta/2}$ as one travels from ${A_t}$ to ${A_{t+dt} = A_t + dt^{1/2} G}$ ). In the opposite regime

$\displaystyle dt^{1/2} \|G\| > \frac{1}{2} \Delta$

we can instead use the Weyl inequalities to bound ${\lambda_i(A_{t+dt}) = \lambda_i(A_t) + O( dt^{1/2} \|G\| ) = \lambda_i(A_t) + O( (dt)^{3/2} \|G\|^3 \Delta^{-2} )}$ ; the other two terms ${(dt)^{1/2} \nabla_G \lambda_i, \frac{1}{2} dt \nabla_G^2 \lambda_i}$ in the Taylor expansion are also ${O( (dt)^{3/2} \|G\|^3 \Delta^{-2} )}$ in this case. Thus (7) holds in all cases.

Now we take advantage of the unitary invariance of the Gaussian unitary ensemble (that is, that ${UGU^* \equiv G}$ for all unitary matrices ${G}$ ; this is easiest to see by noting that the probability density function of ${G}$ is proportional to ${\exp( - \|G\|_F^2 / 2 )}$ ). From this invariance, we can assume without loss of generality that ${u_1,\ldots,u_n}$ is the standard orthonormal basis of ${{\bf C}^n}$ , so that we now have

$\displaystyle d\lambda_i = (dt)^{1/2} \xi_{ii} + dt \sum_{i \neq j} \frac{|\xi_{ij}|^2}{\lambda_i - \lambda_j} + O( (dt)^{3/2} \|G\|^3 \Delta^{-2} )$

where ${\xi_{ij}}$ are the coefficients of ${G}$ . But the ${\xi_{ii}}$ are iid copies of ${N(0,1)_{\bf R}}$ , so we will be done as soon as we show that the terms

$\displaystyle dt \sum_{i \neq j} \frac{|\xi_{ij}|^2-1}{\lambda_i - \lambda_j} + O( (dt)^{3/2} \|G\|^3 \Delta^{-2} )$

are acceptable error terms. But the first term has mean zero and is bounded by ${O( dt \Delta^{-1})}$ and thus has second moment ${O(dt^2)}$ by (6), while the second term has ${L^1}$ norm of ${O((dt)^{3/2}) = o(dt)}$ , again by (6). $\Box$

Remark 22 Interestingly, one can interpret Dyson Brownian motion in a different way, namely as the motion of ${n}$ independent Wiener processes ${\lambda_i(t)}$ after one conditions the ${\lambda_i}$ to be non-intersecting for all time; see this paper of Grabiner. It is intuitively reasonable that this conditioning would cause a repulsion effect, though I do not know of a simple heuristic reason why this conditioning should end up giving the specific repulsion force present in (5).

In the previous section, we saw how a Wiener process led to a PDE (the heat flow equation) that could be used to derive the probability density function for each component ${X_t}$ of that process. We can do the same thing here:

Exercise 23 Let ${\lambda_1,\ldots,\lambda_n}$ be as above. Let ${F: {\bf R}^n \rightarrow {\bf R}}$ be a smooth function with bounded derivatives. Show that for any ${t \geq 0}$ , one has

$\displaystyle \partial_t {\bf E} F(\lambda_1,\ldots,\lambda_n) = {\bf E} D^* F(\lambda_1,\ldots,\lambda_n)$

where ${D^*}$ is the adjoint Dyson operator

$\displaystyle D^* F := \frac{1}{2} \sum_{i=1}^n \partial_{\lambda_i}^2 F + \sum_{1 \leq i,j \leq n: i \neq j} \frac{ \partial_{\lambda_i} F }{\lambda_i - \lambda_j }.$

If we let ${\rho: [0,+\infty) \times {\bf R}^n_{\geq} \rightarrow {\bf R}}$ denote the density function ${\rho(t,\cdot): {\bf R}^n_{\geq} \rightarrow {\bf R}}$ of ${(\lambda_1(t),\ldots,\lambda_n(t))}$ at time ${t \in [0,+\infty)}$ , deduce the Dyson partial differential equation

$\displaystyle \partial_t \rho = D \rho \ \ \ \ \ (8)$

(in the sense of distributions, at least, and on the interior of ${{\bf R}^n_\geq}$ ), where ${D}$ is the Dyson operator

$\displaystyle D \rho := \frac{1}{2} \sum_{i=1}^n \partial_{\lambda_i}^2 \rho - \sum_{1 \leq i,j \leq n: i \neq j} \partial_{\lambda_i} ( \frac{\rho}{\lambda_i - \lambda_j} ). \ \ \ \ \ (9)$

The Dyson partial differential equation (8) looks a bit complicated, but it can be simplified (formally, at least) by introducing the Vandermonde determinant

$\displaystyle \Delta_n(\lambda_1,\ldots,\lambda_n) := \prod_{1 \leq i < j \leq n} (\lambda_i - \lambda_j). \ \ \ \ \ (10)$

Exercise 24 Show that (10) is the determinant of the matrix ${(\lambda_i^{j-1})_{1 \leq i,j \leq n}}$ , and is also the sum ${\sum_{\sigma \in S_n} \hbox{sgn}(\sigma) \prod_{i=1}^n \lambda_{\sigma(i)}^{i-1}}$ .

Note that this determinant is non-zero on the interior of the Weyl chamber ${{\bf R}^n_\geq}$ . The significance of this determinant for us lies in the identity

$\displaystyle \partial_{\lambda_i} \Delta_n = \sum_{1 \leq j \leq n: i \neq j} \frac{\Delta_n}{\lambda_i - \lambda_j} \ \ \ \ \ (11)$

which can be used to cancel off the second term in (9). Indeed, we have

Exercise 25 Let ${\rho}$ be a smooth solution to (8) in the interior of ${{\bf R}^n_\geq}$ , and write

$\displaystyle \rho = \Delta_n u \ \ \ \ \ (12)$

in this interior. Show that ${u}$ obeys the linear heat equation

$\displaystyle \partial_t u = \frac{1}{2} \sum_{i=1}^n \partial_{\lambda_i}^2 u$

in the interior of ${{\bf R}^n_\geq}$ . (Hint: You may need to exploit the identity ${\frac{1}{(a-b)(a-c)} + \frac{1}{(b-a)(b-c)} + \frac{1}{(c-a)(c-b)} = 0}$ for distinct ${a,b,c}$ . Equivalently, you may need to first establish that the Vandermonde determinant is a harmonic function.)

Let ${\rho}$ be the density function of the ${(\lambda_1,\ldots,\lambda_n)}$ , as in (23). As previously remarked, the Wiener random matrix ${A_t}$ has a smooth distribution in the space ${V}$ of Hermitian matrices, while the space of matrices in ${V}$ with non-simple spectrum has codimension ${3}$ by Exercise 10 of Notes 3a. On the other hand, the non-simple spectrum only has codimension ${1}$ in the Weyl chamber (being the boundary of this cone). Because of this, we see that ${\rho}$ vanishes to at least second order on the boundary of this cone (with correspondingly higher vanishing on higher codimension facets of this boundary). Thus, the function ${u}$ in Exercise 25 vanishes to first order on this boundary (again with correspondingly higher vanishing on higher codimension facets). Thus, if we extend ${\rho}$ symmetrically across the cone to all of ${{\bf R}^n}$ , and extend the function ${u}$ antisymmetrically, then the equation (8) and the factorisation (12) extend (in the distributional sense) to all of ${{\bf R}^n}$ . Extending (25) to this domain (and being somewhat careful with various issues involving distributions), we now see that ${u}$ obeys the linear heat equation on all of ${{\bf R}^n}$ .

Now suppose that the initial matrix ${A_0}$ had a deterministic spectrum ${\nu = (\nu_1,\ldots,\nu_n)}$ , which to avoid technicalities we will assume to be in the interior of the Weyl chamber (the boundary case then being obtainable by a limiting argument). Then ${\rho}$ is initially the Dirac delta function at ${\nu}$ , extended symmetrically. Hence, ${u}$ is initially ${\frac{1}{\Delta_n(\nu)}}$ times the Dirac delta function at ${\nu}$ , extended antisymmetrically:

$\displaystyle u(0,\lambda) = \frac{1}{\Delta_n(\nu)} \sum_{\sigma \in S_n} \hbox{sgn}(\sigma) \delta_{\lambda - \sigma(\nu)}.$

Using the fundamental solution for the heat equation in ${n}$ dimensions, we conclude that

$\displaystyle u(t,\lambda) = \frac{1}{(2\pi t)^{n/2}} \sum_{\sigma \in S_n} \hbox{sgn}(\sigma) e^{-|\lambda - \sigma(\nu)|^2/2t}.$

By the Leibniz formula for determinants, we can express the sum here as a determinant of the matrix

$\displaystyle (e^{-(\lambda_i - \nu_j)^2 / 2t})_{1 \leq i,j \leq n}.$

Applying (12), we conclude

Theorem 26 (Johansson formula) Let ${A_0}$ be a Hermitian matrix with simple spectrum ${\nu = (\nu_1,\ldots,\nu_n)}$ , let ${t > 0}$ , and let ${A_t = A_0 + t^{1/2} G}$ where ${G}$ is drawn from GUE. Then the spectrum ${\lambda = (\lambda_1,\ldots,\lambda_n)}$ of ${A_t}$ has probability density function

$\displaystyle \rho(t,\lambda) = \frac{1}{(2\pi t)^{n/2}} \frac{\Delta_n(\lambda)}{\Delta_n(\nu)} \det(e^{-(\lambda_i - \nu_j)^2 / 2t})_{1 \leq i,j \leq n} \ \ \ \ \ (13)$

on ${{\bf R}^n_\geq}$ .

This formula is given explicitly in this paper of Johansson, who cites this paper of Brézin and Hikami as inspiration. This formula can also be proven by a variety of other means, for instance via the Harish-Chandra formula. (One can also check by hand that (13) satisfies the Dyson equation (8).)

We will be particularly interested in the case when ${A_0=0}$ and ${t=1}$ , so that we are studying the probability density function of the eigenvalues ${(\lambda_1(G),\ldots,\lambda_n(G))}$ of a GUE matrix ${G}$ . The Johansson formula does not directly apply here, because ${\nu}$ is vanishing. However, we can investigate the limit of (13) in the limit as ${\nu \rightarrow 0}$ inside the Weyl chamber; the Lipschitz nature of the eigenvalue operations ${A \mapsto \lambda_i(A)}$ (from the Weyl inequalities) tell us that if (13) converges locally uniformly as ${\nu \rightarrow 0}$ for ${\lambda}$ in the interior of ${{\bf R}^n_\geq}$ , then the limit will indeed be the probability density function for ${\nu=0}$ . (Note from continuity that the density function cannot assign any mass to the boundary of the Weyl chamber, and in fact must vanish to at least second order by the previous discussion.)

Exercise 27 Show that as ${\nu \rightarrow 0}$ , we have the identities

$\displaystyle \det(e^{-(\lambda_i - \nu_j)^2 / 2})_{1 \leq i,j \leq n} = e^{-|\lambda|^2/2} e^{-|\nu|^2/2} \det(e^{\lambda_i \nu_j})_{1 \leq i,j \leq n}$

and

$\displaystyle \det(e^{\lambda_i \nu_j})_{1 \leq i,j \leq n} = \frac{1}{1! \ldots (n-1)!} \Delta_n(\lambda) \Delta_n(\nu) + o( \Delta_n(\nu) )$

locally uniformly in ${\lambda}$ . (Hint: for the second identity, use Taylor expansion and the Leibniz formula for determinants, noting the left-hand side vanishes whenever ${\Delta_n(\nu)}$ vanishes and so can be treated by the (smooth) factor theorem.)

From the above exercise, we conclude the fundamental Ginibre formula

$\displaystyle \rho(\lambda) = \frac{1}{(2\pi)^{n/2} 1! \ldots (n-1)!} e^{-|\lambda|^2/2} |\Delta_n(\lambda)|^2 \ \ \ \ \ (14)$

for the density function for the spectrum ${(\lambda_1(G),\ldots,\lambda_n(G))}$ of a GUE matrix ${G}$ .

This formula can be derived by a variety of other means; we sketch one such way below.

Exercise 28 For this exercise, assume that it is known that (14) is indeed a probability distribution on the Weyl chamber ${{\bf R}^n_\geq}$ (if not, one would have to replace the constant ${(2\pi)^{n/2}}$ by an unspecified normalisation factor depending only on ${n}$ ). Let ${D = \hbox{diag}( \lambda_1, \ldots, \lambda_n )}$ be drawn at random using the distribution (14), and let ${U}$ be drawn at random from Haar measure on ${U(n)}$ . Show that the probability density function of ${UDU^*}$ at a matrix ${A}$ with simple spectrum is equal to ${c_n e^{-\|A\|_F^2/2}}$ for some constant ${c_n>0}$ . (Hint: use unitary invariance to reduce to the case when ${A}$ is diagonal. Now take a small ${\varepsilon}$ and consider what ${U}$ and ${D}$ must be in order for ${UDU^*}$ to lie within ${\varepsilon}$ of ${A}$ in the Frobenius norm, performing first order calculations only (i.e. linearising and ignoring all terms of order ${o(\varepsilon)}$ ).)

Conclude that (14) must be the probability density function of the spectrum of a GUE matrix.

Exercise 29 Verify by hand that the self-similar extension

$\displaystyle \rho(t,x) := t^{-n/2} \rho(x/\sqrt{t})$

of the function (14) obeys the Dyson PDE (8). Why is this consistent with (14) being the density function for the spectrum of GUE?

Remark 30 Similar explicit formulae exist for other invariant ensembles, such as the gaussian orthogonal ensemble GOE and the gaussian symplectic ensemble GSE. One can also replace the exponent in density functions such as ${e^{-\|A\|_F^2/2}}$ with more general expressions than quadratic expressions of ${A}$ . We will however not detail these formulae in this course (with the exception of the spectral distribution law for random iid gaussian matrices, which we will discuss in a later set of notes).

55 comments

Comments feed for this article

18 January, 2010 at 7:02 pm

Joshua Batson

I think last display in the proof of prop. 3 should be $\leq$ not $\geq$ .

[Corrected, thanks – T.]

18 January, 2010 at 7:36 pm

Anonymous

Wonderful post. Everything looks different in your hands.
Thank you Prof. Tao.

19 January, 2010 at 8:32 am

Tim vB

Dear Terry,

marvelous lecture notes, thanks for making them public!

On a side note I would like to mention that it can be fun to code a discretization scheme for stochastic differential equations aka a simulation of a stochastic process, in order to solve the associated partial differential equation. It’s one thing to prove that it can be done, but it can be fun too to look at some specific examples and compare the results of the simulation with numerical or analytical results that you get from the partial differential equation.
That’s off topic with regard to your class of course, but maybe some of your students would like to takle that in their spare time, if they have any :-)
(I think that is still an active research topic, used e.g. for PDE with complicated domains/boundaries and boundary conditions, but I’ve been out of the field for years).
BTW: Do you use computer simulations in the class?

19 January, 2010 at 4:13 pm

David Speyer

Possible typo: In the paragraph beginning “Now we establish a Hölder continuity property. Let theta be any exponent between 0 and 2”, I think that 2 should be 1/2.

These are great notes; I am starting to understand what a random walk is.

19 January, 2010 at 5:43 pm

Anonymous

There are eleven “Formula does not parse” errors. I’m using Firefox 3.5.7. The first one is in the def of Wiener Process:

A (one-dimensional) Wiener process on {\Sigma} with initial position {\mu} is a collection {(X_t)_{t \in \Sigma}} of real random …

Here, {(X_t)_{t \in \Sigma}} does not parse.

19 January, 2010 at 7:22 pm

Jonathan Vos Post

I also get eleven “Formula does not parse” errors, using Firefox 3.5.7.

20 January, 2010 at 5:53 am

Tim vB

That does not seem to be a problem of the browser, I’m using Firefox 3.5.7 as well and all formulas are rendered without any problems, but I see them as embedded png graphics.

20 January, 2010 at 12:09 pm

Américo Tavares

A few hours ago I also got the same errors but now all formulas are OK. (I’m using IE).

25 January, 2010 at 7:45 pm

solrize

This post and and the post of a few days ago reviewing probability theory are great. It’s important material that I’ve been wanting to understand for quite a while. I’ve looked at some elementary books that were muddled and left out all the interesting stuff, and I’ve looked at some more advanced books that had way too much assumed background for me (ex-undergrad math major) to understand. These posts of yours are at a level that I think I can make my way through if I work at it, and which appear to demystify the subject. That’s hard to find. Thanks!!!!

28 January, 2010 at 11:30 am

Doug

Hi Terrence,

I have found a discussion of the LOUIS BACHELIER 1900 thesis “THEORIE DE LA SPECULATION” with Brownian motion applied to the stock market [Mathematical Finance, Vol.10, No.3 (July 2000), 341–353].

The appendix of this paper includes the faculty report by Appell, Poincare, J. Boussinesq.
This report is somewhat critical of the institution used, but praises the originality of this type of probability theory [p 349]:

“The manner in which M. Bachelier deduces Gauss’s law is very original, and all the more interesting in that his reasoning can be extended with a few changes to the theory of errors. … In fact, the author makes a comparison with the analytic theory of heat flow. A bit of thought shows that the analogy is real and the comparison is legitimate. The reasoning of Fourier, almost without change, is applicable to this problem so different from the one for which it was originally created. It is regrettable that M. Bachelier did not develop this part of his thesis further. He could have entered into the details of Fourier’s analysis. He did, however, say enough about it to justify Gauss’s law and to foresee cases where it would no longer hold.
Once Gauss’s law is established, one can easily deduce certain consequences susceptible to experimental verification. Such an example is the relation between the value of an option and the deviation from the underlying. One should not expect a very exact verification. The principle of the mathematical expectation holds in the sense that, if it were violated, there would always be people who would act so as to re-establish it and they would eventually notice this. But they would only notice it if the deviations were considerable. The verification, then, can only be gross. The author of the thesis gives statistics where this happens in a very satisfactory manner.
…
In summary, we are of the opinion that there is reason to authorize M. Bachelier tohave his thesis printed and to submit it.”

1 February, 2010 at 4:53 pm

Anonymous

Dear Prof. Tao, there seem to be two typos
between Exercise 4 and Remark 8: “dmiensional”, “ebcause”.

[Corrected, thanks – T.]

2 February, 2010 at 1:34 pm

254A, Notes 4: The semi-circular law « What’s new

[…] transform method, together with a third (heuristic) method based on Dyson Brownian motion (Notes 3b). In the next set of notes we shall also study the free probability method, and in the set of notes […]

18 February, 2010 at 11:04 am

Ben

Hi Terry,
I think there’s a link problem. These notes, Notes 3b: Brownian motion and Dyson Brownian motion seem to be the the same as the notes on eigenvalues, Notes 3b (and don’t mention Dyson’s BM at all). I was able to access the notes yesterday but not today. I’m not sure if this problem is specific to me.

[Whoops, this was caused by a bad edit on my part. Corrected, thanks. -T.]

23 February, 2010 at 12:24 am

Mio

In fractal construction, in order for $X_{t+1/2}$ to have variance $t+1/2$ , shouldn’t $Y_{t,0}$ have variance 1/2 and not 1/4 (same for $Y_{t,1}$ below, 1/4 instead of 1/8)?

23 February, 2010 at 8:03 am

Terence Tao

Well, $(X_t+X_{t+1})/2 = X_t + (X_{t+1}-X_t)/2$ already has a variance of $t+1/4$ , so one only needs an additional variance of $1/4$ from $Y_t$ to balance it.

23 February, 2010 at 8:14 am

Mio

Oops, sorry. Thanks for a great lecture.

23 February, 2010 at 8:15 am

Ben

$\sigma \cdot N(0,1) = N(0, \sigma^2)$ . So eg $X_{t+1} - X_{t+\frac{1}{2}}$ has the right distribution.

23 February, 2010 at 3:43 pm

Alex Bloemendal

Great lecture. The proof of the Ginibre formula (12) for the eigenvalue density of a GUE matrix given in Dyson’s original paper is perhaps even simpler than the one you present. Dyson adds the Ornstein-Uhlenbeck restoring term to the matrix entry processes, which just adds the same term to the induced eigenvalue processes. The matrix evolution then has GUE as its unique stationary distribution; on the other hand, we can obtain the corresponding stationary distribution for the eigenvalues as the unique probability density that is a stationary solution of the resulting modified Dyson PDE (6), which is easily seen to be (12).

I do realize this comment boils down to Exercise 11, but thought it was worth mentioning anyway — because the argument is physically compelling, and because you did introduce the Ornstein-Uhlenbeck process.

By the way, while Exercise 5 is internally consistent, your definition of Ornstein-Uhlenbeck is off from the usual one by a time-change — it has the wrong quadratic variation! I guess this is the source of the extra $1/t$ factor that appears in the definition of the generator $L$ (which, incidentally, is not self-adjoint!).

23 February, 2010 at 10:50 pm

Terence Tao

Thanks for the corrections and comments! It’s true that the Ornstein-Uhlenbeck approach is slightly more efficient to get the Ginibre formula, but I like the heat equation approach as it also gives the Johansson formula quite easily. Though of course, as you say, the two approaches are simply rescalings of each other, so the differences are minor.

23 February, 2010 at 5:20 pm

Alex Bloemendal

A comment regarding Remark 9:

Two-dimensional Brownian motion $B_t$ almost surely never hits a given point $x_0$ after time 0. My favourite proof uses the interpretation of certain harmonic functions as hitting probabilities, together with the form of the Green’s function of the 2-d Laplacian, to establish the following fact. Surround $x_0$ with concentric circles $C_n$ of radii $2^n$ for all $n\in{\mathbb Z}$ ; then the sequence of indices of the successive circles hit by $B_t$ (ignoring immediate repetition) forms a simple random walk on ${\mathbb Z}$ ! In particular, $B_t$ must return to $C_0$ at arbitrarily large times, in between any two of which it can visit only finitely many $C_n$ ‘s. (This argument actually establishes neighbourhood recurrence as well, since $B_t$ must also hit each small circle at arbitrarily large times.)

Like everyone here, I seriously appreciate your wonderful notes and all the time you spend interacting with your readers.

23 February, 2010 at 10:02 pm

254A, Notes 6: Gaussian ensembles « What’s new

[…] have already shown using Dyson Brownian motion in Notes 3b that we have the Ginibre […]

5 March, 2010 at 1:47 pm

254A, Notes 7: The least singular value « What’s new

[…] so we expect each to have magnitude about . This, together with the Hoeffman-Wielandt inequality (Notes 3b) means that we expect to differ by from . In principle, this gives us asymptotic universality on […]

12 March, 2010 at 6:37 pm

Anonymous

Dear Prof. Tao,

Can we define Brownian motion on an infinite dimensional vector space?

a second question :we know that $dB_t=B_{t+dt}-B_t$ has mean zero and variance is $dt.$ therefore $\frac{dB_t}{(dt)^{1/2}}$ has variance 1. but why do we have $\frac{dB_t}{(dt)^{1/2}}$ be of $O(1)$ ?

thanks

14 March, 2010 at 12:17 am

Terence Tao

Well, there are no non-trivial unitarily invariant probability distributions on a Hilbert space (each coordinate would almost surely be zero), so one has to give up either isotropy or finite norm in order to have a meaningful infinite-dimensional Brownian motion. Of course, once one gives up either, one can certainly define Brownian-like motions (e.g. by picking an orthonormal basis and placing an independent weighted copy of Brownian motion in each coordinate).

$dB_t/(dt)^{1/2}$ has the standard normal distribution; it is of size O(1) on the average (and all moments are bounded), but will occasionally be large, and so is not a bounded random variable per se. Nevertheless it is a useful heuristic to think of this quantity as being of size O(1).

13 March, 2010 at 9:42 pm

student

Dear Prof. Tao,

We can consider Brownian motion as a random variable from a probability space to $C([0,T]\rightarrow {\mathbb R})$

Do we have a Markov process which can be considered as a random variable from a probability space to $C^{\infty}([0,T]\rightarrow {\mathbb R})$

thanks

14 March, 2010 at 12:22 am

Terence Tao

It depends a little on how one defines the “present state” of the system, but using a naive notion of the Markov property, the right-derivative would need to be independent of the left-derivative at time t when the position at t is fixed. But for a $C^\infty$ function, these two derivatives are equal, and thus must be determined completely by the position at t. This suggests to me (from the Picard uniqueness theorem) that the only processes of this type are the deterministic ones, though there is the loophole that the derivative is not required to depend in a Lipschitz manner on the position.

14 March, 2010 at 6:22 am

beginner

but the same argument should apply the continuous case as well. on the other hand we know that Brownian motion is a Markov process and has continuous paths. Maybe I am missing a point.

thanks

14 March, 2010 at 10:15 am

Terence Tao

Continuous paths need not have left or right derivatives.

30 January, 2011 at 1:23 pm

Carl Mueller

Usually the sigma-fields are assumed to be right continuous, meaning that the current state contains information about the infinitesimal future. So the right hand derivative (if it existed) would be part of the current state.

4 April, 2010 at 7:57 am

Dyson’s Brownian Motion « Rochester Probability Blog

[…] Dyson’s Brownian Motion Filed under: Uncategorized — carl0mueller @ 3:57 pm Tags: background See also: Tao’s blog on Dyson’s Brownian motion […]

27 January, 2011 at 2:15 pm

Minyu

Hi, Professor Tao:

There is a constant missing in the Ginibre formula and Exercise 9. It’s the reciprocal of a product of some factorials…

[Corrected, thanks – T.]

30 January, 2011 at 12:52 pm

yucao

Is it possible to extend the Brownian motion $(W_t)_{t\in[0,\infity)}$ on the whole ${\Bbb R}$ , i.e. $(W_t)_{t\in(-\infty,\infity)}$ ?

15 July, 2011 at 9:59 am

Dyson Brownian Motion | Research Notebook

[…] main source for the material in this post is Terry Tao‘s set of lecture notes on Random Matrix Theory, though I also used Mehta (2004) and Anderson, Guinnet and Zeitouni (2009) as […]

21 August, 2011 at 9:36 am

On Understanding Probability Puzzles | Nair Research Notes

[…] by , from an experimental point of view. The formal construction for the Brownian motion can be found here for example. And you will get a good historical perspective of the Brownian motion here. One of the […]

18 November, 2011 at 2:38 am

Diffusion in Ehrenfest wind-tree model « Disquisitiones Mathematicae

[…] the “justification” of the word “abnormal” comes by comparison with Brownian motion and/or central limit theorem: once we know that the diffusion is “sublinear” (maybe […]

18 January, 2012 at 12:12 pm

Anonymous

Hi prof. Tau,

Thanks for these great notes!

Maybe this question is much too late, but I’d be very grateful for a reply. I’m used to think about the eigenvalues of the GUE as being subjected to a quadratic confining potential and mutual log repulsion: is there an easy way to understand the absence of the effect of the confining potential (which would give rise to a simple harmonic restoring force for each eigenvalue) in the formula for the Dyson Brownian motion, equation (5)?

Thanks for a great blog!

18 January, 2012 at 6:09 pm

Terence Tao

One can restore the confining potential by adding a $- \lambda_i dt$ term to the equation for $d\lambda_i$ , turning the Dyson Brownian motion to a Dyson Ornstein-Uhlenbeck process. This has the effect of keeping the variance of the matrix entries in the process constant, instead of growing linearly in time as is done here.

The two processes (normalised variance and non-normalised variance) can be easily rescaled to each other, so the choice of which one to use is basically a matter of taste.

1 May, 2012 at 10:52 am

alabair

I am amazed by the surprising number of knowledge that is difficult to me to digest. And by the way, I would like to hear about a white noise process. Good luck.

11 November, 2012 at 1:09 pm

A direct proof of the stationarity of the Dyson sine process under Dyson Brownian motion « What’s new

[…] of a Hermitian matrix under independent Brownian motion of its entries, and is discussed in this previous blog post. To cut a long story short, this stationarity tells us that the self-similar -point correlation […]

4 February, 2013 at 2:10 am

hera

Great lecture! Prof. Tao, do you have ideas to simulate the Dyson’s brownian motion ? I’d like to do a computer simulation with MATLAB, can you give me some ideas?

5 February, 2013 at 2:47 pm

Some notes on Bakry-Emery theory « What’s new

[…] is a stochastic process with initial probability distribution ; see for instance this previous blog post for more […]

8 February, 2013 at 5:47 pm

The Harish-Chandra-Itzykson-Zuber integral formula « What’s new

[…] motion (as well as the closely related formulae for the GUE ensemble), which were derived in this previous blog post. Both of these approaches can be found in several places in the literature (and I do not actually […]

24 April, 2014 at 12:39 am

Sticky Brownian Motion | Eventually Almost Everywhere

[…] 254A, Notes 3b: Brownian motion and Dyson Brownian motion (terrytao.wordpress.com) […]

25 December, 2014 at 3:31 pm

Ramis Movassagh

Dear Terry,
Thank you for the wonderful post (your posts are always a joy to read and learn from).

Shouldn’t Prop. 2 end with “…same distribution at X_t” instead of \mu?
Best, – Ramis

[Corrected, thanks – T.]

25 December, 2014 at 6:37 pm

Ramis Movassagh

Hi Terry, Same with Prop 3. Thanks for the great post.
cheers, -R

[Corrected, thanks – T.]

7 January, 2016 at 12:45 pm

Ito Calculus by peatar | wanikanidailyscience

[…] Warning: I don’t really know much about financial mathematics so if there are any economists around: Let me know about eventual mistakes. Sources: Links in the text and Terry Tao’s blog. […]

8 November, 2017 at 12:24 am

Student

Dear Professor Tao,
In the last part Theorem 6, $\sum_{i \neq j} \frac{|\xi_{ij}|-1}{\lambda_i - \lambda_j}$ has O( $dt^3$ ) third moment, but the constant of the big order should be O( $(\sum_{i \neq j} \frac{1}{|\lambda_i-\lambda_j|})^3$ ). Upon doing the conditioning I think one should check $\sum_{i \neq j} \frac{1}{|\lambda_i - \lambda_j|}$ has finite third moment, but how can I do that?

thanks

8 November, 2017 at 3:27 pm

Terence Tao

Actually, the third derivative scales like $O( \sum_{i \neq j} \frac{1}{|\lambda_i-\lambda_j|^2} )$ , so one only needs to control the second moment, not the third. The probability that two eigenvalues are within $\varepsilon$ of each other is bounded by $O( \varepsilon^3 )$ (this is because the Vandermonde determinant is $O(\varepsilon)$ ), where we allow implied constants to depend on $n$ . So the second moment is finite.

11 November, 2017 at 12:19 am

Student

By the independence of $\{|\xi_{ij}|^2 -1\}$ , product of two or three random variables will have zero expectation, so
$E(dt \sum_{j\neq i} \frac{|\xi_{ij}| -1}{\lambda_i-\lambda_j})^3 = E(dt^3 (\frac{|\xi_{ij}| -1}{\lambda_i-\lambda_j})^3) = C^3 dt^3 \sum_{j\neq i}( \frac{1}{\lambda_i-\lambda_j})^3$ .
Can the term $O(\sum_{j\neq i}( \frac{1}{\lambda_i-\lambda_j})^2)$ be derived by direct computation? or there is some relation I need to use?

The of Vandermonde determinant and the process is derived in Exercise 6 and Theorem 6.
Will Using it to discuss the gap between the eigenvalues lead to circular argument(because it is a part of proof of Theorem 6)?

11 November, 2017 at 9:50 am

Terence Tao

You are right, the third moment bound is a bit too strong. I have rewritten the argument to only claim the second moment bound (for one error term) and an L^1 bound (for the other error term), which still suffices.

29 May, 2018 at 9:30 am

246C notes 4: Brownian motion, conformal invariance, and SLE | What's new

[…] and complex Brownian motions exist from any base point or ; see e.g. this previous blog post for a construction. We have the following simple […]

24 August, 2020 at 4:37 pm

Yixuan Zhou

Dear Professor Tao, I want to ask whether we can take the derivatives of the eigenvalues when we are trying to prove the Dyson Brownian Motion’s equation. I know from a previous note you justified the smoothness when the matrix process that we obtained is smooth with respect to time. However, it is not clear to me that Hermitian matrix valued BM is smooth.

31 August, 2020 at 5:00 pm

Terence Tao

The eigenvalues of a DBM will not be classically differentiable, but one can still manipulate their time derivative by the usual tools of stochastic differential calculus (in particular Ito calculus).

14 September, 2020 at 3:13 pm

Student

Dear Professor Tao,

I have a question about the last part where we derive the Johansson formula. I don’t think I am understanding the argument of extending the probability density function symmetrically from the Weyl chamber to the whole space. The questions that lingers are, first what do you mean by extending a function symmetrically. And second, why would we sum up the different initial starting location if the eigenvalues should be non-crossing.

Thanks in advance!

[See https://en.wikipedia.org/wiki/Symmetric_function ]

6 February, 2021 at 4:43 am

Anonymous

Don’t the eigenvalues of a GUE sample lie on a disk of unit radius, i.e. Girko’s law?

[The circular law of Girko is for non-Hermitian random matrices such as the Ginibre ensemble, whereas the GUE ensemble (which shoudl not be confused with the Ginibre ensemble) is Hermitian. -T]

	Anonymous on On product representations of…
	Alex Gunning on A symmetric formulation of the…
	Terence Tao on On product representations of…
	domotorp on On product representations of…
	Terence Tao on 275A, Notes 3: The weak and st…
	Terence Tao on A symmetric formulation of the…
	Anonymous on On product representations of…
	Anonymous on 275A, Notes 3: The weak and st…
	Anonymous on 275A, Notes 3: The weak and st…
	Alex Gunning on A symmetric formulation of the…
	Anonymous on Erratum for “An inverse…
	Anonymous on Erratum for “An inverse…
	Anonymous on Erratum for “An inverse…
	Anonymous on 275A, Notes 3: The weak and st…
	Anonymous on It ought to be common knowledg…

254A, Notes 3b: Brownian motion and Dyson Brownian motion

Recent Comments

Articles by others

Diversions

Mathematics

Selected articles

Software

The sciences

Top Posts

Archives

Categories

The Polymath Blog

55 comments

Leave a comment Cancel reply

For commenters

254A, Notes 3b: Brownian motion and Dyson Brownian motion

Share this:

Recent Comments

Articles by others

Diversions

Mathematics

Selected articles

Software

The sciences

Top Posts

Archives

Categories

The Polymath Blog

55 comments

Leave a comment Cancel reply

For commenters