One theme in this course will be the central role played by gaussian random variables. Gaussians have an incredibly rich algebraic structure, and many results about general random variables can be established by first using this structure to verify the result for gaussians, and then using universality techniques (such as the Lindeberg exchange strategy) to extend the results to more general variables.
One way to exploit this algebraic structure is to continuously deform the variance from an initial value of zero (so that the random variable is deterministic) to some final level $t$. This gives a continuous family of random variables $X_t$ as $t$ (viewed as a "time" parameter) runs from $0$ to its final value.
At present, we have not completely specified what this family should be, because we have only described the individual distribution of each $X_t$, and not the joint distribution. However, there is a very natural way to specify a joint distribution of this type, known as Brownian motion. In these notes we lay the necessary probability theory foundations to set up this motion, and indicate its connection with the heat equation, the central limit theorem, and the Ornstein-Uhlenbeck process. This is the beginning of stochastic calculus, which we will not develop fully here.
We will begin with one-dimensional Brownian motion, but it is a simple matter to extend the process to higher dimensions. In particular, we can define Brownian motion on vector spaces of matrices, such as the space of Hermitian matrices. This process is equivariant with respect to conjugation by unitary matrices, and so we can quotient out by this conjugation and obtain a new process on the quotient space, or in other words on the spectrum of Hermitian matrices. This process is called Dyson Brownian motion, and turns out to have a simple description in terms of ordinary Brownian motion; it will play a key role in several of the subsequent notes in this course.
— 1. Formal construction of Brownian motion —
We begin by constructing one-dimensional Brownian motion. We shall model this motion using the machinery of Wiener processes:
Definition 1 (Wiener process) Let $\mu \in {\bf R}$, and let $T \subset [0,+\infty)$ be a set of times containing $0$. A (one-dimensional) Wiener process on $T$ with initial position $\mu$ is a collection $(X_t)_{t \in T}$ of real random variables, one for each time $t \in T$, with the following properties:
- (i) $X_0 = \mu$.
- (ii) Almost surely, the map $t \mapsto X_t$ is a continuous function on $T$.
- (iii) For every $t_- < t_+$ in $T$, the increment $X_{t_+} - X_{t_-}$ has the distribution of $N(0, t_+ - t_-)$. (In particular, $X_t \equiv N(\mu, t)$ for every $t \in T$.)
- (iv) For every $t_0 \leq t_1 \leq \ldots \leq t_n$ in $T$, the increments $X_{t_i} - X_{t_{i-1}}$ for $i = 1, \ldots, n$ are jointly independent.
If $T$ is discrete, we say that $(X_t)_{t \in T}$ is a discrete Wiener process; if $T = [0,+\infty)$ then we say that it is a continuous Wiener process.
Remark 1 Collections of random variables $(X_t)_{t \in T}$, where $T$ is a set of times, will be referred to as stochastic processes; thus Wiener processes are a (very) special type of stochastic process.
Remark 2 In the case of discrete Wiener processes, the continuity requirement (ii) is automatic. For continuous Wiener processes, there is a minor technical issue: the event that $t \mapsto X_t$ is continuous need not be a measurable event (one has to take uncountable intersections to define this event). Because of this, we interpret (ii) by saying that there exists a measurable event of probability $1$, such that $t \mapsto X_t$ is continuous on all of this event, while also allowing for the possibility that $t \mapsto X_t$ is sometimes continuous outside of this event as well. One can view the collection $(X_t)_{t \in T}$ as a single random variable, taking values in the product space ${\bf R}^T$ (with the product $\sigma$-algebra, of course).
Remark 3 One can clearly normalise the initial position $\mu$ of a Wiener process to be zero by replacing $X_t$ with $X_t - \mu$ for each $t$.
We shall abuse notation somewhat and identify continuous Wiener processes with Brownian motion in our informal discussion, although technically the former is merely a model for the latter. To emphasise this link with Brownian motion, we shall often denote continuous Wiener processes as $(B_t)_{t \geq 0}$ rather than $(X_t)_{t \geq 0}$.
It is not yet obvious that Wiener processes exist, and to what extent they are unique. The situation is easily clarified though for discrete processes:
Proposition 2 (Discrete Brownian motion) Let $T$ be a discrete subset of $[0,+\infty)$ containing $0$, and let $\mu \in {\bf R}$. Then (after extending the sample space if necessary) there exists a Wiener process $(X_t)_{t \in T}$ with base point $\mu$. Furthermore, any other Wiener process $(X'_t)_{t \in T}$ with base point $\mu$ has the same distribution as $(X_t)_{t \in T}$.
Proof: As $T$ is discrete and contains $0$, we can write it as $\{t_0, t_1, t_2, \ldots\}$ for some increasing sequence $0 = t_0 < t_1 < t_2 < \ldots$.

Let $(dX_i)_{i=1}^\infty$ be a collection of jointly independent random variables with $dX_i \equiv N(0, t_i - t_{i-1})$ (the existence of such a collection, after extending the sample space, is guaranteed by Exercise 18 of Notes 0). If we then set $$X_{t_i} := \mu + dX_1 + \ldots + dX_i$$ for all $i \geq 0$, then one easily verifies (using Exercise 9 of Notes 1) that $(X_t)_{t \in T}$ is a Wiener process.
Conversely, if $(X'_t)_{t \in T}$ is a Wiener process, and we define $dX'_i := X'_{t_i} - X'_{t_{i-1}}$ for $i \geq 1$, then from the definition of a Wiener process we see that the $dX'_i$ have distribution $N(0, t_i - t_{i-1})$ and are jointly independent (i.e. any finite subcollection of the $dX'_i$ is jointly independent). This implies for any finite $n$ that the random variables $(dX_i)_{i=1}^n$ and $(dX'_i)_{i=1}^n$ have the same distribution, and thus $(X_t)_{t \in T'}$ and $(X'_t)_{t \in T'}$ have the same distribution for any finite subset $T'$ of $T$. From the construction of the product $\sigma$-algebra we conclude that $(X_t)_{t \in T}$ and $(X'_t)_{t \in T}$ have the same distribution, as required. $\Box$
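The construction in this proof can be illustrated numerically: a discrete Wiener process is just the cumulative sum of jointly independent gaussian increments with variances $t_i - t_{i-1}$. The following sketch (Python/NumPy, not part of the original notes) samples such a process and checks that the variance at the final time matches that time.

```python
import numpy as np

def discrete_wiener(times, mu=0.0, rng=None):
    """Sample a discrete Wiener process X_{t_i} = mu + dX_1 + ... + dX_i,
    where dX_i ~ N(0, t_i - t_{i-1}) are jointly independent."""
    rng = np.random.default_rng() if rng is None else rng
    times = np.asarray(times, dtype=float)
    dt = np.diff(times, prepend=times[0])       # t_0 contributes a zero increment
    increments = rng.normal(0.0, np.sqrt(dt))   # dX_i ~ N(0, t_i - t_{i-1})
    increments[0] = 0.0                         # X_{t_0} = mu exactly
    return mu + np.cumsum(increments)

# Sanity check: Var(X_t) should be close to t for the process started at 0.
rng = np.random.default_rng(0)
samples = np.array([discrete_wiener([0.0, 0.5, 1.0, 2.0], rng=rng)
                    for _ in range(20000)])
print(samples[:, -1].var())  # ≈ 2.0, the final time
```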
Now we pass from the discrete case to the continuous case.
Proposition 3 (Continuous Brownian motion) Let $\mu \in {\bf R}$. Then (after extending the sample space if necessary) there exists a continuous Wiener process $(X_t)_{t \in [0,+\infty)}$ with base point $\mu$. Furthermore, any other continuous Wiener process $(X'_t)_{t \in [0,+\infty)}$ with base point $\mu$ has the same distribution as $(X_t)_{t \in [0,+\infty)}$.
Proof: The uniqueness claim follows by the same argument used to prove the uniqueness component of Proposition 2, so we just prove existence here. The iterative construction we give here is somewhat analogous to that used to create self-similar fractals, such as the Koch snowflake. (Indeed, Brownian motion can be viewed as a probabilistic analogue of a self-similar fractal.)
The idea is to create a sequence of increasingly fine discrete Brownian motions, and then to take a limit. Proposition 2 allows one to create each individual discrete Brownian motion, but the key is to couple these discrete processes together in a consistent manner.
Here’s how. We start with a discrete Wiener process $(X_t)_{t \in {\bf N}}$ on the natural numbers with initial position $\mu$, which exists by Proposition 2. We now extend this process to the denser set of times $\frac{1}{2} {\bf N}$ by setting $$X_{t+1/2} := \frac{X_t + X_{t+1}}{2} + \frac{1}{2} Y_{t,1}$$ for $t \in {\bf N}$, where $(Y_{t,1})_{t \in {\bf N}}$ are iid copies of $N(0,1)$, which are jointly independent of the $(X_t)_{t \in {\bf N}}$. It is a routine matter to use Exercise 9 of Notes 1 to show that this creates a discrete Wiener process on $\frac{1}{2}{\bf N}$ which extends the previous process.

Next, we extend the process further to the denser set of times $\frac{1}{4} {\bf N}$ by defining $$X_{t+1/4} := \frac{X_t + X_{t+1/2}}{2} + \frac{1}{2\sqrt{2}} Y_{t,2}$$ for $t \in \frac{1}{2}{\bf N}$, where $(Y_{t,2})_{t \in \frac{1}{2}{\bf N}}$ are iid copies of $N(0,1)$, jointly independent of $(X_t)_{t \in \frac{1}{2}{\bf N}}$. Again, it is a routine matter to show that this creates a discrete Wiener process on $\frac{1}{4}{\bf N}$.
Iterating this procedure a countable number of times, we obtain a collection of discrete Wiener processes $(X_t)_{t \in \frac{1}{2^k}{\bf N}}$ for $k = 0, 1, 2, \ldots$ which are consistent with each other, in the sense that the earlier processes in this collection are restrictions of later ones. (This requires a countable number of extensions of the underlying sample space, but one can capture all of these extensions into a single extension via the machinery of inverse limits of probability spaces; it is also not difficult to manually build a single extension sufficient for performing all the above constructions.)
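The midpoint-refinement step above can be sketched numerically (an illustrative Python snippet, not from the original notes): given samples at spacing $h$, each inserted midpoint is the average of its two neighbours plus an independent $N(0, h/4)$ correction, which is the conditional law of the Brownian midpoint given its endpoints.

```python
import numpy as np

def refine(X, h, rng):
    """One dyadic refinement step: given samples of X at spacing h, insert
    midpoints (X_t + X_{t+h})/2 + N(0, h/4)."""
    mid = 0.5 * (X[:-1] + X[1:]) + rng.normal(0.0, np.sqrt(h) / 2, size=len(X) - 1)
    out = np.empty(2 * len(X) - 1)
    out[0::2] = X    # keep the coarse samples
    out[1::2] = mid  # interleave the new midpoints
    return out

rng = np.random.default_rng(1)
# Start from a discrete Wiener process on {0, 1, ..., 8} ...
X = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, 1.0, 8))])
h = 1.0
for _ in range(3):  # ... and refine three times, down to spacing 1/8
    X = refine(X, h, rng)
    h /= 2
print(len(X))  # 65 samples: the times k/8 for k = 0, ..., 64
```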
Now we establish a Hölder continuity property. Let $\theta$ be any exponent between $0$ and $1/2$, and let $R > 0$ be finite. Observe that for any $k \geq 0$ and any $j \in {\bf N}$, the increment $X_{(j+1)/2^k} - X_{j/2^k}$ has the distribution of $N(0, 1/2^k)$, and hence (by the subgaussian nature of the normal distribution) $${\bf P}\left( |X_{(j+1)/2^k} - X_{j/2^k}| \geq 2^{-\theta k} \right) \leq C e^{-c 2^{(1-2\theta)k}}$$ for some absolute constants $C, c > 0$. The right-hand side is summable as $j, k$ run over the natural numbers subject to the constraint $j/2^k \leq R$. Thus, by the Borel-Cantelli lemma, for each fixed $R$, we almost surely have that $$|X_{(j+1)/2^k} - X_{j/2^k}| \leq 2^{-\theta k}$$ for all but finitely many pairs $(j,k)$ with $j/2^k \leq R$. In particular, this implies that for each fixed $R$, the function $t \mapsto X_t$ is almost surely Hölder continuous of exponent $\theta$ on the dyadic rationals in $[0,R]$, and thus (by the countable union bound) is almost surely locally Hölder continuous of exponent $\theta$ on the dyadic rationals in $[0,+\infty)$. In particular, it is almost surely locally uniformly continuous on this domain.
As the dyadic rationals are dense in $[0,+\infty)$, we can thus almost surely extend $t \mapsto X_t$ uniquely to a continuous function on all of $[0,+\infty)$. (On the remaining probability zero event, we extend $t \mapsto X_t$ in some arbitrary measurable fashion.) Note that if $t_n$ is any sequence in $[0,+\infty)$ converging to $t$, then $X_{t_n}$ converges almost surely to $X_t$, and thus also converges in probability and in distribution. Similarly for differences such as $X_{t_n} - X_{s_n}$. Using this, we easily verify that $(X_t)_{t \in [0,+\infty)}$ is a continuous Wiener process, as required. $\Box$
Remark 4 One could also have used the Kolmogorov extension theorem to establish the limit.
Exercise 1 Let $(B_t)_{t \geq 0}$ be a continuous Wiener process. We have already seen that if $0 < \theta < 1/2$, then the map $t \mapsto B_t$ is almost surely Hölder continuous of order $\theta$. Show that if $1/2 \leq \theta \leq 1$, then the map $t \mapsto B_t$ is almost surely not Hölder continuous of order $\theta$.

Show also that the map $t \mapsto B_t$ is almost surely nowhere differentiable. Thus, Brownian motion provides a (probabilistic) example of a continuous function which is nowhere differentiable.
Remark 5 In the above constructions, the initial position $\mu$ of the Wiener process was deterministic. However, one can easily construct Wiener processes in which the initial position $X_0$ is itself a random variable. Indeed, one can simply set $$X_t := X_0 + B_t$$ where $(B_t)_{t \geq 0}$ is a continuous Wiener process with initial position $0$ which is independent of $X_0$. Then we see that $(X_t)_{t \geq 0}$ obeys properties (ii), (iii), (iv) of Definition 1, but the distribution of $X_t$ is no longer $N(\mu, t)$, but is instead the convolution of the law of $X_0$ and the law of $N(0, t)$.
— 2. Connection with random walks —
We saw how to construct Brownian motion as a limit of discrete Wiener processes, which were partial sums of independent gaussian random variables. The central limit theorem (see Notes 2) allows one to interpret Brownian motion in terms of limits of partial sums of more general independent random variables, otherwise known as (independent) random walks.
Definition 4 (Random walk) Let $\Delta X$ be a real random variable, let $\mu \in {\bf R}$ be an initial position, and let $\Delta t > 0$ be a time step. We define a discrete random walk with initial position $\mu$, time step $\Delta t$ and step distribution $\Delta X$ to be a process $(X_t)_{t \in \Delta t \cdot {\bf N}}$ defined by $$X_{n \Delta t} := \mu + \sum_{i=1}^n \Delta X_i$$ where $(\Delta X_i)_{i=1}^\infty$ are iid copies of $\Delta X$.
Example 1 From the proof of Proposition 2, we see that a discrete Wiener process on $\Delta t \cdot {\bf N}$ with initial position $\mu$ is nothing more than a discrete random walk with step distribution $N(0, \Delta t)$. Another basic example is simple random walk, in which the step distribution is equal to $\sqrt{\Delta t}$ times a signed Bernoulli variable; thus we have $X_{(n+1)\Delta t} = X_{n \Delta t} \pm \sqrt{\Delta t}$, where the signs $\pm$ are unbiased and jointly independent in $n$.
Exercise 2 (Central limit theorem) Let $\Delta X$ be a real random variable with mean zero and variance $1$, and let $\mu \in {\bf R}$. For each $n$, let $(X^{(n)}_t)_{t \geq 0}$ be a process formed by starting with a random walk with initial position $\mu$, time step $1/n$, and step distribution $\Delta X / \sqrt{n}$, and then extending to other times in $[0,+\infty)$ in a piecewise linear fashion; thus $$X^{(n)}_{(j+\theta)/n} := (1 - \theta) X^{(n)}_{j/n} + \theta X^{(n)}_{(j+1)/n}$$ for all $j \in {\bf N}$ and $0 \leq \theta \leq 1$. Show that as $n \to \infty$, the process $(X^{(n)}_t)_{t \geq 0}$ converges in distribution to a continuous Wiener process with initial position $\mu$. (Hint: from the Riesz representation theorem (or the Kolmogorov extension theorem), it suffices to establish this convergence for every finite set of times in $[0,+\infty)$. Now use the central limit theorem, treating the piecewise linear modifications to the process as an error term.)
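The convergence in this exercise can also be observed numerically. The following sketch (illustration only, not from the original notes) checks that the time-one value of a rescaled simple random walk is approximately $N(0,1)$ distributed, as the central limit theorem predicts.

```python
import numpy as np

# Rescaled random walk: X^{(n)}_1 = (1/sqrt(n)) * (step_1 + ... + step_n),
# with mean-zero, variance-one steps. By the central limit theorem this
# converges in distribution to N(0,1), regardless of the step law
# (here: signed Bernoulli steps, i.e. simple random walk).
rng = np.random.default_rng(2)
n, trials = 400, 20000
steps = rng.choice([-1.0, 1.0], size=(trials, n))
B1 = steps.sum(axis=1) / np.sqrt(n)  # value of the rescaled walk at t = 1
print(B1.mean(), B1.var())  # ≈ 0 and ≈ 1
```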
— 3. Connection with the heat equation —
Let $(B_t)_{t \geq 0}$ be a continuous Wiener process with base point $\mu$, and let $F: {\bf R} \to {\bf R}$ be a smooth function with all derivatives bounded. Then, for each time $t$, the random variable $F(B_t)$ is bounded and thus has an expectation ${\bf E} F(B_t)$. From the almost sure continuity of $t \mapsto B_t$ and the dominated convergence theorem we see that the map $t \mapsto {\bf E} F(B_t)$ is continuous. In fact it is differentiable, and obeys the following differential equation:

Lemma (Equation of motion) For all times $t \geq 0$, one has $$\frac{d}{dt} {\bf E} F(B_t) = \frac{1}{2} {\bf E} F_{xx}(B_t),$$ where $F_{xx}$ is the second derivative of $F$. In particular, $t \mapsto {\bf E} F(B_t)$ is continuously differentiable (because the right-hand side is continuous).
Proof: We work from first principles. It suffices to show for fixed $t \geq 0$, that $${\bf E} F(B_{t+dt}) = {\bf E} F(B_t) + \frac{1}{2} dt\, {\bf E} F_{xx}(B_t) + o(dt)$$ as $dt \to 0$. We shall establish this just for non-negative $dt$; the claim for negative $dt$ (which only needs to be considered for $t > 0$) is similar and is left as an exercise.

Write $dB_t := B_{t+dt} - B_t$. From Taylor expansion with remainder (using the boundedness of the third derivative of $F$) we have $$F(B_{t+dt}) = F(B_t) + F_x(B_t)\, dB_t + \frac{1}{2} F_{xx}(B_t)\, |dB_t|^2 + O(|dB_t|^3).$$ We take expectations; since $dB_t \equiv N(0, dt)$, we have ${\bf E} |dB_t|^3 = O(dt^{3/2}) = o(dt)$. Now observe that $dB_t$ is independent of $B_t$, and has mean zero and variance $dt$. The claim follows. $\Box$
Exercise 3 Complete the proof of the lemma by considering negative values of $dt$. (Hint: one has to exercise caution because $dB_t$ is not independent of $B_t$ in this case. However, it will be independent of $B_{t+dt}$. Also, use the fact that ${\bf E} F(B_t)$ and ${\bf E} F_{xx}(B_t)$ are continuous in $t$. Alternatively, one can deduce the formula for the left-derivative from that of the right-derivative via a careful application of the fundamental theorem of calculus, paying close attention to the hypotheses of that theorem.)
Remark 6 The above lemma is a special case of Ito’s formula, which in this setting takes the heuristic form $$dF(B_t) = F_x(B_t)\, dB_t + \frac{1}{2} F_{xx}(B_t)\, dt. \qquad (2)$$ Here, $dt$, $dB_t$ and $dF(B_t)$ should either be thought of as being infinitesimal, or as being very small, though in the latter case the equation (2) should not be viewed as being exact, but instead only being true up to errors of mean $o(dt)$ and third moment $O(dt^{3/2})$. It should be compared against the chain rule $$dF(X_t) = F_x(X_t)\, dX_t$$ when $t \mapsto X_t$ is a smooth process. The non-smooth nature of Brownian motion causes the quadratic term in the Taylor expansion to be non-negligible, which explains the additional term in (2), although the Hölder continuity of this motion is sufficient to still be able to ignore terms that are of cubic order or higher. (In this spirit, one can summarise (the differential side of) Ito calculus informally by the heuristic equations $dB_t = O(dt^{1/2})$ and $|dB_t|^2 = dt + o(dt)$, with the understanding that all terms that are $o(dt)$ are discarded.)
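The heuristic $|dB_t|^2 \approx dt$ can be tested numerically: the quadratic variation of a sampled Brownian path on $[0,T]$ concentrates around $T$ as the mesh goes to zero, while the corresponding sum of cubes (the higher-order terms) is negligible. A rough numerical sketch, not part of the original notes:

```python
import numpy as np

# Quadratic variation check: sum_i (B_{t_{i+1}} - B_{t_i})^2 -> T as the
# partition of [0, T] is refined, while sum_i |dB|^3 -> 0.
rng = np.random.default_rng(3)
T, n = 2.0, 200000
dB = rng.normal(0.0, np.sqrt(T / n), size=n)  # Brownian increments on a uniform mesh
print(np.sum(dB**2))          # ≈ T = 2 (the "dB^2 = dt" heuristic)
print(np.sum(np.abs(dB)**3))  # ≈ 0 (cubic terms are negligible)
```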
Let $\rho(t, x)$ be the probability density function of $B_t$; by inspection of the normal distribution, this is a smooth function of $x$ for $t > 0$, but is a Dirac mass at $\mu$ at time $t = 0$. By definition of density function, the equation of motion can be rewritten as $$\frac{d}{dt} \int_{\bf R} F(x)\, \rho(t,x)\, dx = \frac{1}{2} \int_{\bf R} F_{xx}(x)\, \rho(t,x)\, dx,$$ and hence, after two integrations by parts, $$\partial_t \rho = \frac{1}{2} \partial_{xx} \rho \qquad (3)$$ in the sense of (tempered) distributions (see e.g. my earlier notes on this topic). In other words, $\rho$ is a (tempered distributional) solution to the heat equation (3). Indeed, since $\rho$ is the Dirac mass at $\mu$ at time $t = 0$, for later times $\rho$ is the fundamental solution of that equation from initial position $\mu$.
From the theory of PDE (e.g. from Fourier analysis, see Exercise 38 of these notes) one can solve the (distributional) heat equation (3) with this initial data to obtain the unique solution $$\rho(t,x) = \frac{1}{\sqrt{2\pi t}} e^{-(x-\mu)^2/2t}.$$ Of course, this is also the density function of $N(\mu, t)$, which is (unsurprisingly) consistent with the fact that $B_t \equiv N(\mu, t)$. Thus we see why the normal distribution of the central limit theorem involves the same type of functions (i.e. gaussians) as the fundamental solution of the heat equation. Indeed, one can use this argument to heuristically derive the central limit theorem from the fundamental solution of the heat equation (cf. Section 7 of Notes 2), although the derivation is only heuristic because one first needs to know that some limiting distribution already exists (in the spirit of Exercise 2).
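One can verify the heat equation satisfied by the gaussian kernel directly by finite differences. The sketch below (illustrative only; it assumes the $\frac{1}{2}$-normalised heat equation that matches Brownian increments of variance $t$) checks $\partial_t \rho = \frac{1}{2} \partial_{xx} \rho$ at a sample point for the kernel with $\mu = 0$.

```python
import numpy as np

def rho(t, x):
    """Gaussian heat kernel exp(-x^2/(2t)) / sqrt(2*pi*t), initial position 0."""
    return np.exp(-x**2 / (2 * t)) / np.sqrt(2 * np.pi * t)

# Central finite differences for d_t rho and d_xx rho at (t, x) = (1.0, 0.7).
t, x, h = 1.0, 0.7, 1e-3
dt_rho = (rho(t + h, x) - rho(t - h, x)) / (2 * h)
dxx_rho = (rho(t, x + h) - 2 * rho(t, x) + rho(t, x - h)) / h**2
print(abs(dt_rho - 0.5 * dxx_rho))  # ≈ 0, up to O(h^2) discretisation error
```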
Remark 7 Because we considered a Wiener process with a deterministic initial position $\mu$, the density function $\rho$ was a Dirac mass at time $t = 0$. However, one can run exactly the same arguments for Wiener processes with stochastic initial position (see Remark 5), and one will still obtain the same heat equation (3), but now with a more general initial condition.
We have related one-dimensional Brownian motion to the one-dimensional heat equation, but there is no difficulty establishing a similar relationship in higher dimensions. In the vector space ${\bf R}^n$, define a (continuous) Wiener process $(X_t)_{t \geq 0}$ in ${\bf R}^n$ with an initial position $\mu = (\mu_1, \ldots, \mu_n) \in {\bf R}^n$ to be a process whose components $(X_{t,i})_{t \geq 0}$ for $i = 1, \ldots, n$ are independent Wiener processes with initial positions $\mu_i$. It is easy to see that such processes exist, are unique in distribution, and obey the same sort of properties as in Definition 1, but with the one-dimensional gaussian distribution $N(\mu, \sigma^2)$ replaced by the $n$-dimensional analogue $N(\mu, \sigma^2)_{{\bf R}^n}$, which is given by the density function $$\frac{1}{(2\pi \sigma^2)^{n/2}} e^{-|x-\mu|^2/2\sigma^2}\, dx,$$ where $dx$ is now Lebesgue measure on ${\bf R}^n$.
Exercise 4 If $(B_t)_{t \geq 0}$ is an $n$-dimensional continuous Wiener process, show that $$\frac{d}{dt} {\bf E} F(B_t) = \frac{1}{2} {\bf E} (\Delta F)(B_t)$$ whenever $F: {\bf R}^n \to {\bf R}$ is smooth with all derivatives bounded, where $$\Delta F := \sum_{i=1}^n \frac{\partial^2 F}{\partial x_i^2}$$ is the Laplacian of $F$. Conclude in particular that the density function $\rho(t,x)$ of $B_t$ obeys the (distributional) heat equation $$\partial_t \rho = \frac{1}{2} \Delta \rho.$$
A simple but fundamental observation is that $n$-dimensional Brownian motion is rotation-invariant: more precisely, if $(B_t)_{t \geq 0}$ is an $n$-dimensional Wiener process with initial position $0$, and $U: {\bf R}^n \to {\bf R}^n$ is any orthogonal transformation, then $(UB_t)_{t \geq 0}$ is another Wiener process with initial position $0$, and thus has the same distribution: $$(UB_t)_{t \geq 0} \equiv (B_t)_{t \geq 0}.$$ This is ultimately because the $n$-dimensional normal distributions $N(0, \sigma^2)_{{\bf R}^n}$ are manifestly rotation-invariant (see Exercise 10 of Notes 1).
Remark 8 One can also relate variable-coefficient heat equations to variable-coefficient Brownian motion, in which the variance of an increment $dB_t$ is now only proportional to $dt$ for infinitesimal $dt$ rather than being equal to $dt$, with the constant of proportionality allowed to depend on the time $t$ and on the position $B_t$. One can also add drift terms by allowing the increment $dB_t$ to have a non-zero mean (which is also proportional to $dt$). This can be accomplished through the machinery of stochastic calculus, which we will not discuss in detail in these notes. In a similar fashion, one can construct Brownian motion (and heat equations) on manifolds or on domains with boundary, though we will not discuss this topic here.
Exercise 5 Let $X_0$ be a real random variable of mean zero and variance $1$. Define a stochastic process $(X_t)_{t \geq 0}$ by the formula $$X_t := e^{-t}(X_0 + B_{e^{2t}-1}),$$ where $(B_t)_{t \geq 0}$ is a Wiener process with initial position zero that is independent of $X_0$. This process is known as an Ornstein-Uhlenbeck process.
- Show that each $X_t$ has mean zero and variance $1$.
- Show that $X_t$ converges in distribution to $N(0,1)$ as $t \to \infty$.
- If $F: {\bf R} \to {\bf R}$ is smooth with all derivatives bounded, show that $$\frac{d}{dt} {\bf E} F(X_t) = {\bf E} LF(X_t),$$ where $L$ is the Ornstein-Uhlenbeck operator $$LF := F_{xx} - x F_x.$$ Conclude that the density function $\rho(t,x)$ of $X_t$ obeys (in a distributional sense, at least) the Ornstein-Uhlenbeck equation $$\partial_t \rho = L^* \rho,$$ where the adjoint operator $L^*$ is given by $$L^* \rho := \rho_{xx} + \partial_x(x \rho).$$
- Show that the only probability density function $\rho$ for which $L^* \rho = 0$ is the Gaussian $\frac{1}{\sqrt{2\pi}} e^{-x^2/2}$; furthermore, show that for all probability density functions in the Schwartz space with mean zero and variance $1$. Discuss how this fact relates to the preceding two parts of this exercise.
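One common realisation of the Ornstein-Uhlenbeck process (an assumption in this sketch, chosen to match the stationarity properties in the exercise) is $X_t = e^{-t}(X_0 + B_{e^{2t}-1})$, with $\mathrm{Var}(X_0) = 1$ and $B$ an independent standard Brownian motion started at zero. The snippet below checks numerically that $X_t$ then has mean zero and variance exactly $1$ at every time.

```python
import numpy as np

# Assumed construction: X_t = e^{-t} (X_0 + B_{e^{2t}-1}); then
# Var(X_t) = e^{-2t} (1 + (e^{2t} - 1)) = 1 for all t >= 0.
rng = np.random.default_rng(4)
trials, t = 100000, 1.5
X0 = rng.choice([-1.0, 1.0], size=trials)       # any mean-0, variance-1 start
B = rng.normal(0.0, np.sqrt(np.expm1(2 * t)), size=trials)  # B_{e^{2t}-1}
Xt = np.exp(-t) * (X0 + B)
print(Xt.mean(), Xt.var())  # ≈ 0 and ≈ 1
```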
Remark 9 The heat kernel $\frac{1}{(2\pi t)^{n/2}} e^{-|x-\mu|^2/2t}$ in $n$ dimensions is absolutely integrable in time away from the initial time $t = 0$ for dimensions $n \geq 3$, but becomes divergent in dimension $n = 1$ and (just barely) divergent for $n = 2$. This causes the qualitative behaviour of Brownian motion $B_t$ in ${\bf R}^n$ to be rather different in the two regimes. For instance, in dimensions $n \geq 3$ Brownian motion is transient: almost surely one has $B_t \to \infty$ as $t \to \infty$. But in dimension $n = 1$ Brownian motion is recurrent: for each $x_0 \in {\bf R}$, one almost surely has $B_t = x_0$ for infinitely many $t$. In the critical dimension $n = 2$, Brownian motion turns out to not be recurrent, but is instead neighbourhood recurrent: almost surely, $B_t$ revisits every neighbourhood of $x_0$ at arbitrarily large times, but does not visit $x_0$ itself for any positive time $t$. The study of Brownian motion and its relatives is in fact a huge and active area of study in modern probability theory, but will not be discussed in this course.
— 4. Dyson Brownian motion —
The space of $n \times n$ Hermitian matrices can be viewed as a real vector space of dimension $n^2$ using the Frobenius norm $$\|A\|_F^2 := \hbox{tr}(A^2) = \sum_{i=1}^n a_{ii}^2 + 2 \sum_{1 \leq i < j \leq n} |a_{ij}|^2,$$ where $a_{ij}$ are the coefficients of $A$. One can then identify this space explicitly with ${\bf R}^{n^2}$ via the identification $$A \mapsto \left( (a_{ii})_{1 \leq i \leq n}, (\sqrt{2}\, \hbox{Re}(a_{ij}), \sqrt{2}\, \hbox{Im}(a_{ij}))_{1 \leq i < j \leq n} \right).$$
Now that one has this identification, given any Hermitian matrix $A_0$ (deterministic or stochastic) we can define a Wiener process $(A_t)_{t \geq 0}$ on the space of Hermitian matrices with initial position $A_0$. By construction, we see that $t \mapsto A_t$ is almost surely continuous, and each increment $A_{t_+} - A_{t_-}$ is equal to $(t_+ - t_-)^{1/2}$ times a matrix drawn from the gaussian unitary ensemble (GUE), with disjoint increments being jointly independent. In particular, the diagonal entries of $A_{t_+} - A_{t_-}$ have distribution $N(0, t_+ - t_-)_{\bf R}$, and the off-diagonal entries have distribution $N(0, t_+ - t_-)_{\bf C}$.
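This matrix-valued Wiener process can be sampled directly: each increment is $\sqrt{dt}$ times an independent GUE matrix. A minimal sketch (illustrative Python, not from the original notes; the GUE normalisation used — diagonal entries $N(0,1)$, off-diagonal complex entries of total variance $1$ — matches the increment law described above):

```python
import numpy as np

def gue(n, rng):
    """Sample a GUE matrix: symmetrising A + A* / 2 of an iid complex gaussian
    matrix gives real N(0,1) diagonal entries and off-diagonal complex entries
    whose real and imaginary parts are independent N(0, 1/2)."""
    A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (A + A.conj().T) / 2

def hermitian_wiener(n, times, rng):
    """Matrix Wiener process started at 0: independent increments sqrt(dt) * GUE."""
    H = np.zeros((n, n), dtype=complex)
    path = [H]
    for dt in np.diff(np.asarray(times, dtype=float)):
        H = H + np.sqrt(dt) * gue(n, rng)
        path.append(H)
    return path

rng = np.random.default_rng(5)
path = hermitian_wiener(4, [0.0, 0.5, 1.0], rng)
H = path[-1]
print(np.allclose(H, H.conj().T))  # the process stays Hermitian at every time
```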
Given any Hermitian matrix $A$, one can form the spectrum $\lambda_1(A) \geq \ldots \geq \lambda_n(A)$, which lies in the Weyl chamber $\{(\lambda_1, \ldots, \lambda_n) \in {\bf R}^n : \lambda_1 \geq \ldots \geq \lambda_n\}$. Taking the spectrum of the Wiener process $(A_t)_{t \geq 0}$, we obtain a process $$(\lambda_1(A_t), \ldots, \lambda_n(A_t))_{t \geq 0}$$ in the Weyl chamber. We abbreviate $\lambda_i(A_t)$ as $\lambda_i(t)$.
For $t > 0$, we see that $A_t$ is absolutely continuously distributed in the space of Hermitian matrices. In particular, since almost every Hermitian matrix has simple spectrum, we see that $A_t$ has almost surely simple spectrum for $t > 0$. (The same is true for $t = 0$ if we assume that $A_0$ also has an absolutely continuous distribution.)
The stochastic dynamics of this evolution can be described by Dyson Brownian motion: for $t > 0$ and sufficiently small $dt > 0$, the increments $d\lambda_i := \lambda_i(t+dt) - \lambda_i(t)$ obey $$d\lambda_i = dB_i + \sum_{1 \leq j \leq n: j \neq i} \frac{dt}{\lambda_i - \lambda_j} + \ldots \qquad (5)$$ where $dB_1, \ldots, dB_n$ are the increments of $n$ independent Wiener processes, and the $\ldots$ denotes error terms that are negligible in the limit $dt \to 0$.

Using the language of Ito calculus, one usually views $dt$ as infinitesimal and drops the error term, thus giving the elegant formula $$d\lambda_i = dB_i + \sum_{1 \leq j \leq n: j \neq i} \frac{dt}{\lambda_i - \lambda_j}$$
that shows that the eigenvalues evolve by Brownian motion, combined with a deterministic repulsion force that repels nearby eigenvalues from each other with a strength inversely proportional to their separation. One can extend this result to the $t = 0$ case by a limiting argument provided that $A_0$ has an absolutely continuous distribution. Note that the decay rate of the error can depend on $n$, so it is not safe to let $n$ go off to infinity while holding $dt$ fixed. However, it is safe to let $dt$ go to zero first, and then send $n$ off to infinity. (It is also possible, by being more explicit with the error terms, to work with $dt$ being a specific negative power of $n$. We will see this sort of analysis later in this course.)
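A crude Euler discretisation of this dynamics can be sketched as follows (an illustrative simulation under the $\beta = 2$ repulsion term $\sum_{j \neq i} dt/(\lambda_i - \lambda_j)$; the step size and starting spectrum are arbitrary choices, not from the original notes):

```python
import numpy as np

def dyson_step(lam, dt, rng):
    """One Euler step of Dyson Brownian motion:
    lambda_i += sum_{j != i} dt / (lambda_i - lambda_j) + N(0, dt)."""
    drift = np.array([np.sum(1.0 / (lam[i] - np.delete(lam, i)))
                      for i in range(len(lam))])
    return lam + drift * dt + rng.normal(0.0, np.sqrt(dt), size=len(lam))

rng = np.random.default_rng(6)
lam = np.linspace(-2.0, 2.0, 5)   # a starting spectrum with simple eigenvalues
for _ in range(1000):             # evolve for total time 0.1
    lam = np.sort(dyson_step(lam, 1e-4, rng))
print(np.all(np.diff(lam) > 0))   # the spectrum stays simple (strictly ordered)
```

The repulsion term pushes nearby eigenvalues apart, so in practice the discretised eigenvalues rarely come close to colliding even for moderate step sizes.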
Proof: Fix $t > 0$. We can write $A_{t+dt} = A_t + (dt)^{1/2} G$, where $G$ is independent of $A_t$ and has the GUE distribution. (Strictly speaking, $G$ depends on $dt$, but this dependence will not concern us.) We now condition $A_t$ to be fixed, and establish (5) for almost every fixed choice of $A_t$; the general claim then follows upon undoing the conditioning (and applying the dominated convergence theorem). Due to independence, observe that $G$ continues to have the GUE distribution even after conditioning $A_t$ to be fixed.
Almost surely, $A_t$ has simple spectrum; so we may assume that the fixed choice of $A_t$ has simple spectrum also. The eigenvalues $\lambda_i$ now vary smoothly near $A_t$, so we may Taylor expand $$\lambda_i(A_t + (dt)^{1/2} G) = \lambda_i(A_t) + (dt)^{1/2} \nabla_G \lambda_i(A_t) + \frac{1}{2} dt\, \nabla_G^2 \lambda_i(A_t) + O( (dt)^{3/2} \|G\|^3 )$$ for sufficiently small $dt$, where $\nabla_G$ is directional differentiation in the direction $G$, and the implied constants in the $O()$ notation can depend on $A_t$ and $n$. In particular, we do not care what norm is used to measure $G$ in.
As $G$ has the GUE distribution, all moments of $\|G\|$ are bounded (possibly with constants depending on $n$), so the error here has mean $O((dt)^{3/2})$ and third moment $O((dt)^{9/2})$, which are negligible for our purposes. We thus have $$d\lambda_i = (dt)^{1/2} \nabla_G \lambda_i(A_t) + \frac{1}{2} dt\, \nabla_G^2 \lambda_i(A_t) + \ldots$$
Next, from the first and second Hadamard variation formulae (see Section 4 of Notes 3a) we have $$\nabla_G \lambda_i = u_i^* G u_i$$ and $$\nabla_G^2 \lambda_i = 2 \sum_{j \neq i} \frac{|u_j^* G u_i|^2}{\lambda_i - \lambda_j},$$ where $u_1, \ldots, u_n$ are an orthonormal eigenbasis for $A_t$, and thus $$d\lambda_i = (dt)^{1/2} u_i^* G u_i + dt \sum_{j \neq i} \frac{|u_j^* G u_i|^2}{\lambda_i - \lambda_j} + \ldots$$
Now we take advantage of the unitary invariance of the gaussian unitary ensemble (that is, that $UGU^* \equiv G$ for all unitary matrices $U$; this is easiest to see by noting that the probability density function of $G$ is proportional to $e^{-\hbox{tr}(G^2)/2}$). From this invariance, we can assume without loss of generality that $u_1, \ldots, u_n$ is the standard orthonormal basis of ${\bf C}^n$, so that we now have $$d\lambda_i = (dt)^{1/2} \xi_{ii} + dt \sum_{j \neq i} \frac{|\xi_{ij}|^2}{\lambda_i - \lambda_j} + \ldots$$ where $\xi_{ij}$ are the coefficients of $G$. But the $\xi_{ii}$ are iid copies of $N(0,1)_{\bf R}$, and the $\xi_{ij}$ for $i \neq j$ are iid copies of $N(0,1)_{\bf C}$, and the claim follows (note that $dt(|\xi_{ij}|^2 - 1)$ has mean zero and third moment $O(dt^3)$). $\Box$
Remark 10 Interestingly, one can interpret Dyson Brownian motion in a different way, namely as the motion of $n$ independent Wiener processes after one conditions the processes to be non-intersecting for all time; see this paper of Grabiner. It is intuitively reasonable that this conditioning would cause a repulsion effect, though I do not know of a simple heuristic reason why this conditioning should end up giving the specific repulsion force present in (5).
In the previous section, we saw how a Wiener process led to a PDE (the heat flow equation) that could be used to derive the probability density function of each component of that process. We can do the same thing here: the density function $\rho_n(t, \lambda)$ of the spectrum $(\lambda_1(t), \ldots, \lambda_n(t))$ obeys (in a distributional sense, at least) the Dyson equation $$\partial_t \rho_n = D^* \rho_n \qquad (6)$$ where $D^*$ is the adjoint Dyson operator $$D^* \rho_n := \frac{1}{2} \sum_{i=1}^n \partial_{\lambda_i}^2 \rho_n - \sum_{i=1}^n \partial_{\lambda_i} \Big( \rho_n \sum_{j \neq i} \frac{1}{\lambda_i - \lambda_j} \Big).$$
Exercise 7 Show that the Vandermonde determinant $$\Delta_n(\lambda) := \prod_{1 \leq i < j \leq n} (\lambda_j - \lambda_i) \qquad (8)$$ is the determinant of the matrix $(\lambda_j^{i-1})_{1 \leq i, j \leq n}$, and is also the sum $\sum_{\sigma \in S_n} \hbox{sgn}(\sigma) \prod_{i=1}^n \lambda_i^{\sigma(i)-1}$.
In the region of simple spectrum, the Vandermonde determinant satisfies the identity $$\partial_{\lambda_i} \Delta_n(\lambda) = \Delta_n(\lambda) \sum_{j \neq i} \frac{1}{\lambda_i - \lambda_j},$$ which can be used to cancel off the second term in (7).
Exercise 8 Let $\rho_n$ be a smooth solution to (6) in the interior of the Weyl chamber, and write $$\rho_n = \Delta_n(\lambda)\, u \qquad (10)$$ for some smooth function $u$. Show that $u$ obeys the linear heat equation $\partial_t u = \frac{1}{2} \Delta u$ in the interior of the Weyl chamber. (Hint: You may need to exploit the identity $$\frac{1}{(a-b)(a-c)} + \frac{1}{(b-a)(b-c)} + \frac{1}{(c-a)(c-b)} = 0$$ for distinct $a, b, c$. Equivalently, you may need to first establish that the Vandermonde determinant $\Delta_n(\lambda)$ is a harmonic function.)
Let $\rho_n$ be the density function of the spectrum, as in (6). Recall that the Wiener random matrix $A_t$ has a smooth distribution in the space of Hermitian matrices for $t > 0$, while the space of Hermitian matrices with non-simple spectrum has codimension $3$ by Exercise 10 of Notes 3a. On the other hand, the non-simple spectrum only has codimension $1$ in the Weyl chamber (being the boundary of this chamber). Because of this, we see that $\rho_n$ vanishes to at least second order on the boundary of this chamber (with correspondingly higher vanishing on higher codimension facets of this boundary). Thus, the function $u$ in Exercise 8 vanishes to first order on this boundary (again with correspondingly higher vanishing on higher codimension facets). Thus, if we extend $\rho_n$ symmetrically across the chamber to all of ${\bf R}^n$, and extend the function $u$ antisymmetrically, then the equation (6) and the factorisation (10) extend (in the distributional sense) to all of ${\bf R}^n$. Extending (8) to this domain (and being somewhat careful with various issues involving distributions), we now see that $u$ obeys the linear heat equation on all of ${\bf R}^n$.
Now suppose that the initial matrix $A_0$ had a deterministic spectrum $\lambda(0) = (\lambda_1(0), \ldots, \lambda_n(0))$, which to avoid technicalities we will assume to be in the interior of the Weyl chamber (the boundary case then being obtainable by a limiting argument). Then $\rho_n$ is initially the Dirac delta function at $\lambda(0)$, extended symmetrically. Hence, $u$ is initially $\frac{1}{\Delta_n(\lambda(0))}$ times the Dirac delta function at $\lambda(0)$, extended antisymmetrically.

Using the fundamental solution for the heat equation in $n$ dimensions, we conclude that $$u(t, \lambda) = \frac{1}{(2\pi t)^{n/2} \Delta_n(\lambda(0))} \sum_{\sigma \in S_n} \hbox{sgn}(\sigma)\, e^{-|\lambda - \sigma(\lambda(0))|^2 / 2t}.$$

By the Leibniz formula for determinants, we can express the sum here as a determinant of the matrix $$\left( e^{-(\lambda_i - \lambda_j(0))^2 / 2t} \right)_{1 \leq i, j \leq n}.$$

Applying (10), we conclude $$\rho_n(t, \lambda) = \frac{\Delta_n(\lambda)}{(2\pi t)^{n/2} \Delta_n(\lambda(0))} \det\left( e^{-(\lambda_i - \lambda_j(0))^2 / 2t} \right)_{1 \leq i, j \leq n}. \qquad (11)$$
This formula is given explicitly in this paper of Johansson, who cites this paper of Brézin and Hikami as inspiration. This formula can also be proven by a variety of other means, for instance via the Harish-Chandra formula. (One can also check by hand that (11) satisfies the Dyson equation (6).)
We will be particularly interested in the case when $A_0 = 0$ and $t = 1$, so that we are studying the probability density function of the eigenvalues of a GUE matrix $G$. The Johansson formula does not directly apply here, because $\Delta_n(\lambda(0))$ vanishes. However, we can investigate the limit of (11) as $\lambda(0) \to 0$ inside the Weyl chamber; the Lipschitz nature of the eigenvalue maps (from the Weyl inequalities) tells us that if (11) converges locally uniformly as $\lambda(0) \to 0$ for $\lambda$ in the interior of the Weyl chamber, then the limit will indeed be the probability density function for the spectrum of $G$. (Note from continuity that the density function cannot assign any mass to the boundary of the Weyl chamber, and in fact must vanish to at least second order by the previous discussion.)
Exercise 9 Show that as , we have the identities
locally uniformly in . (Hint: for the second identity, use Taylor expansion and the Leibniz formula for determinants, noting the left-hand side vanishes whenever vanishes and so can be treated by the (smooth) factor theorem.)
From the above exercise, we conclude the fundamental Ginibre formula $$\rho_n(\lambda) = \frac{1}{(2\pi)^{n/2} \prod_{j=1}^{n-1} j!}\, e^{-|\lambda|^2/2} \prod_{1 \leq i < j \leq n} (\lambda_j - \lambda_i)^2 \qquad (12)$$ for the density function of the spectrum of a GUE matrix on the Weyl chamber.
This formula can be derived by a variety of other means; we sketch one such way below.
Exercise 10 For this exercise, assume that it is known that (12) is indeed a probability distribution on the Weyl chamber (if not, one would have to replace the constant in (12) by an unspecified normalisation factor depending only on $n$). Let $\lambda = (\lambda_1, \ldots, \lambda_n)$ be drawn at random using the distribution (12), and let $U$ be drawn at random from Haar measure on the unitary group $U(n)$, independently of $\lambda$. Show that the probability density function of $U \hbox{diag}(\lambda_1, \ldots, \lambda_n) U^*$ at a matrix $A$ with simple spectrum is equal to $c_n e^{-\|A\|_F^2/2}$ for some constant $c_n > 0$ depending only on $n$. (Hint: use unitary invariance to reduce to the case when $A$ is diagonal. Now take a small $\varepsilon > 0$ and consider what $\lambda$ and $U$ must be in order for $U \hbox{diag}(\lambda_1, \ldots, \lambda_n) U^*$ to lie within $\varepsilon$ of $A$ in the Frobenius norm, performing first order calculations only (i.e. linearising and ignoring all terms of order $\varepsilon^2$ or higher).)
Conclude that (12) must be the probability density function of the spectrum of a GUE matrix.
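For $n = 2$ the Ginibre density can be checked against direct sampling: under a density proportional to $e^{-|\lambda|^2/2} (\lambda_1 - \lambda_2)^2$, a gaussian moment computation gives ${\bf E} (\lambda_1 - \lambda_2)^2 = 6$, and this should match the eigenvalue gap of a sampled $2 \times 2$ GUE matrix. The sketch below (illustration only) uses the exact closed-form gap $\lambda_1 - \lambda_2 = \sqrt{(a-b)^2 + 4|c|^2}$ for a $2 \times 2$ Hermitian matrix with diagonal $a, b$ and off-diagonal $c$.

```python
import numpy as np

# Sample 2x2 GUE matrices entrywise: diagonal a, b ~ N(0,1) real, off-diagonal
# c complex with total variance 1 (real and imaginary parts N(0, 1/2)).
rng = np.random.default_rng(7)
trials = 200000
a, b = rng.normal(size=trials), rng.normal(size=trials)
c = (rng.normal(size=trials) + 1j * rng.normal(size=trials)) / np.sqrt(2)

# Exact eigenvalue gap squared of [[a, c], [conj(c), b]]:
gap_sq = (a - b) ** 2 + 4 * np.abs(c) ** 2
print(gap_sq.mean())  # ≈ 6, matching the Ginibre-density computation
```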
Exercise 11 Verify by hand that the self-similar extension $\rho_n(t, \lambda) := t^{-n/2} \rho_n(\lambda/\sqrt{t})$ of the Ginibre density (12) obeys the Dyson equation (6).
Remark 11 Similar explicit formulae exist for other invariant ensembles, such as the gaussian orthogonal ensemble (GOE) and the gaussian symplectic ensemble (GSE). One can also replace the exponent $-|\lambda|^2/2$ in density functions such as (12) with more general expressions than quadratic expressions of $\lambda$. We will however not detail these formulae in this course (with the exception of the spectral distribution law for random iid gaussian matrices, which we will discuss in a later set of notes).