Igor Rodnianski and I have just uploaded to the arXiv our paper “Effective limiting absorption principles, and applications”, submitted to Communications in Mathematical Physics. In this paper we derive limiting absorption principles (of the type discussed in this recent post) for a general class of Schrödinger operators $H = -\Delta + V$ on a wide class of manifolds, namely the asymptotically conic manifolds. The precise definition of such manifolds is somewhat technical, but they include as a special case the asymptotically flat manifolds, which in turn include as a further special case the smooth compact perturbations of Euclidean space $\mathbb{R}^d$ (i.e. the smooth Riemannian manifolds that are identical to $\mathbb{R}^d$ outside of a compact set). The potential $V$ is assumed to be a short range potential, which roughly speaking means that it decays faster than $1/|x|$ as $x \to \infty$; for several of the applications (particularly at very low energies) we in fact need to assume that $V$ is a strongly short range potential, which roughly speaking means that it decays faster than $1/|x|^2$.
To begin with, we make no hypotheses about the topology or geodesic geometry of the manifold $M$; in particular, we allow $M$ to be trapping in the sense that it contains geodesic flows that do not escape to infinity, but instead remain trapped in a bounded subset of $M$. We also allow the potential $V$ to be signed, which in particular allows bound states (eigenfunctions of negative energy) to be created. For standard technical reasons we restrict attention to dimensions three and higher: $d \geq 3$.
It is well known that such Schrödinger operators $H$ are essentially self-adjoint, and their spectrum consists of purely absolutely continuous spectrum on $(0,+\infty)$, together with possibly some eigenvalues at zero and negative energy (and at zero energy and in dimensions three and four, there is also the possibility of resonances which, while not strictly eigenvalues, have a somewhat analogous effect on the dynamics of the Laplacian and related objects, such as resolvents). In particular, the resolvents $R(\lambda \pm i\epsilon) := (H - (\lambda \pm i\epsilon))^{-1}$ make sense as bounded operators on $L^2(M)$ for any $\lambda \in \mathbb{R}$ and $\epsilon > 0$. As discussed in the previous blog post, it is of interest to obtain bounds for the behaviour of these resolvents, as this can then be used via some functional calculus manipulations to obtain control on many other operators and PDE relating to the Schrödinger operator $H$, such as the Helmholtz equation, the time-dependent Schrödinger equation, and the wave equation. In particular, it is of interest to obtain limiting absorption estimates such as

$$\| R(\lambda \pm i\epsilon) f \|_{H^{0,-1/2-\sigma}(M)} \leq C(M,V,\lambda,\sigma) \| f \|_{H^{0,1/2+\sigma}(M)} \qquad (1)$$

(and particularly in the positive energy regime $\lambda > 0$), where $\sigma > 0$ and $f$ is an arbitrary test function. The constant $C(M,V,\lambda,\sigma)$ needs to be independent of $\epsilon$ for such estimates to be truly useful, but it is also of interest to determine the extent to which these constants depend on $M$, $V$, and $\lambda$. The dependence on $\sigma$ is relatively uninteresting and henceforth we will suppress it. In particular, our paper focused to a large extent on quantitative methods that could give effective bounds on $C(M,V,\lambda)$ in terms of quantities such as the magnitude $A$ of the potential $V$ in a suitable norm.
It turns out to be convenient to distinguish between three regimes:
- The high-energy regime $\lambda \gg 1$;
- The medium-energy regime $\lambda \sim 1$; and
- The low-energy regime $0 < \lambda \ll 1$.
Our methods actually apply more or less uniformly to all three regimes, but the nature of the conclusions is quite different in each of the three regimes.
The high-energy regime was essentially worked out by Burq, although we give an independent treatment of Burq’s results here. In this regime it turns out that we have an unconditional estimate of the form (1) with a constant of the shape

$$C(M,V,\lambda) = C(M,A) e^{C(M,A) \sqrt{\lambda}}$$

where $C(M,A)$ is a constant that depends only on $M$ and on a parameter $A$ that controls the size of the potential $V$. This constant, while exponentially growing, is still finite, which among other things is enough to rule out the possibility that $H$ contains eigenfunctions (i.e. point spectrum) embedded in the high-energy portion of the spectrum. As is well known, if $M$ contains a certain type of trapped geodesic (in particular those arising from positively curved portions of the manifold, such as the equator of a sphere), then it is possible to construct pseudomodes $f$ that show that this sort of exponential growth is necessary. On the other hand, if we make the non-trapping hypothesis that all geodesics in $M$ escape to infinity, then we can obtain a much stronger high-energy limiting absorption estimate, namely

$$C(M,V,\lambda) = C(M,A) \lambda^{-1/2}.$$

The exponent $-1/2$ here is closely related to the standard fact that on non-trapping manifolds, there is a local smoothing effect for the time-dependent Schrödinger equation that gains half a derivative of regularity (cf. previous blog post). In the high-energy regime, the dynamics are well-approximated by semi-classical methods, and in particular one can use tools such as the positive commutator method and pseudo-differential calculus to obtain the desired estimates. In the case of trapping one also needs the standard technique of Carleman inequalities to control the compact (and possibly trapping) core of the manifold, and in particular the delicate two-weight Carleman inequalities of Burq.
In the medium and low energy regimes one needs to work harder. In the medium energy regime $\lambda \sim 1$, we were able to obtain a uniform bound

$$C(M,V,\lambda) \leq C(M,A)$$

for all asymptotically conic manifolds (trapping or not) and all short-range potentials. To establish this bound, we have to supplement the existing tools of the positive commutator method and Carleman inequalities with an additional ODE-type analysis of various energies of the solution to a Helmholtz equation on large spheres, as will be discussed in more detail below the fold.
The methods also extend to the low-energy regime $0 < \lambda \ll 1$. Here, the bounds become somewhat interesting, with a subtle distinction between effective estimates that are uniform over all potentials $V$ which are bounded in a suitable sense by a parameter $A$
(e.g. obeying
for all $x$), and ineffective estimates that exploit qualitative properties of $V$ (such as the absence of eigenfunctions or resonances at zero) and are thus not uniform over $V$. On the effective side, and for potentials that are strongly short range (at least at local scales
; one can tolerate merely short-range behaviour at more global scales, but this is a technicality that we will not discuss further here) we were able to obtain a polynomial bound of the form

$$C(M,V,\lambda) \leq C(M,A) \lambda^{-C(M,A)}$$

that blew up at a large polynomial rate at the origin. Furthermore, by carefully designing a sequence of potentials that induce near-eigenfunctions that resemble two different Bessel functions of the radial variable glued together, we are able to show that this type of polynomial bound is sharp in the following sense: given any constant $C$, there exists a sequence $V_n$ of potentials on Euclidean space $\mathbb{R}^d$ uniformly bounded by $A$, and a sequence $\lambda_n$ of energies going to zero, such that

$$C(\mathbb{R}^d, V_n, \lambda_n) \geq \lambda_n^{-C}.$$

This shows that if one wants bounds that are uniform in the potential $V$, then arbitrary polynomial blowup is necessary.
Interestingly, though, if we fix the potential $V$, and then ask for bounds that are not necessarily uniform in $V$, then one can do better, as was already observed in a classic paper of Jensen and Kato concerning power series expansions of the resolvent near the origin. In particular, if we make the spectral assumption that $H$ has no eigenfunctions or resonances at zero, then an argument (based on (a variant of) the Fredholm alternative, which as discussed in this recent blog post gives ineffective bounds) gives a bound of the form

$$C(M,V,\lambda) \leq C(M,V) \lambda^{-1/2}$$

in the low-energy regime (but note carefully here that the constant $C(M,V)$ on the right-hand side depends on the potential $V$ itself, and not merely on the parameter $A$ that upper bounds it). Even if there are eigenvalues or resonances, it turns out that one can still obtain a similar bound but with an exponent of $-3/2$ instead of $-1/2$. This limited blowup at the origin is in sharp contrast to the arbitrarily large polynomial blowup rate that can occur if one demands uniform bounds. (This particular subtlety between uniform and non-uniform estimates confused us, by the way, for several weeks; for a long time we thought that we had somehow found a contradiction between our results and the results of Jensen and Kato.)
As applications of our limiting absorption estimates, we give local smoothing and dispersive estimates for solutions (as well as the closely related RAGE type theorems) to the time-dependent Schrödinger and wave equations, and also reprove standard facts about the spectrum of Schrödinger operators in this setting.
Perhaps the most fundamental differential operator on Euclidean space $\mathbb{R}^d$ is the Laplacian

$$\Delta := \sum_{j=1}^d \frac{\partial^2}{\partial x_j^2}.$$

The Laplacian is a linear translation-invariant operator, and as such is necessarily diagonalised by the Fourier transform

$$\hat f(\xi) := \int_{\mathbb{R}^d} f(x) e^{-2\pi i x \cdot \xi}\, dx.$$

Indeed, we have

$$\widehat{\Delta f}(\xi) = -4\pi^2 |\xi|^2 \hat f(\xi)$$

for any suitably nice function $f$ (e.g. in the Schwartz class; alternatively, one can work in very rough classes, such as the space of tempered distributions, provided of course that one is willing to interpret all operators in a distributional or weak sense).
Because of this explicit diagonalisation, it is a straightforward matter to define spectral multipliers $m(-\Delta)$ of the Laplacian for any (measurable, polynomial growth) function $m: [0,+\infty) \to \mathbb{C}$, by the formula

$$\widehat{m(-\Delta) f}(\xi) := m(4\pi^2 |\xi|^2) \hat f(\xi).$$

(The presence of the minus sign in front of the Laplacian has some minor technical advantages, as it makes $-\Delta$ positive semi-definite. One can also define spectral multipliers more abstractly from general functional calculus, after establishing that the Laplacian is essentially self-adjoint.) Many of these multipliers are of importance in PDE and analysis, such as the fractional derivative operators $(-\Delta)^{s/2}$, the heat propagators $e^{t\Delta}$, the (free) Schrödinger propagators $e^{it\Delta}$, the wave propagators $e^{\pm it\sqrt{-\Delta}}$ (or $\cos(t\sqrt{-\Delta})$ and $\frac{\sin(t\sqrt{-\Delta})}{\sqrt{-\Delta}}$, depending on one’s conventions), the spectral projections $1_I(\sqrt{-\Delta})$, the Bochner-Riesz summation operators $(1 + \frac{\Delta}{4\pi^2 R^2})_+^\delta$, or the resolvents $R(z) := (-\Delta - z)^{-1}$.
Each of these families of multipliers is related to the others, by means of various integral transforms (and also, in some cases, by analytic continuation). For instance:
- Using the Laplace transform, one can express (sufficiently smooth) multipliers in terms of heat operators. For instance, using the identity
$$x^{s/2} = \frac{1}{\Gamma(-s/2)} \int_0^\infty t^{-1-s/2} e^{-tx}\, dt$$
(using analytic continuation if necessary to make the right-hand side well-defined), with $\Gamma$ being the Gamma function, we can write the fractional derivative operators in terms of heat kernels:
$$(-\Delta)^{s/2} = \frac{1}{\Gamma(-s/2)} \int_0^\infty t^{-1-s/2} e^{t\Delta}\, dt. \qquad (1)$$
- Using analytic continuation, one can connect heat operators $e^{t\Delta}$ to Schrödinger operators $e^{it\Delta}$, a process also known as Wick rotation. Analytic continuation is a notoriously unstable process, and so it is difficult to use analytic continuation to obtain any quantitative estimates on (say) Schrödinger operators from their heat counterparts; however, this procedure can be useful for propagating identities from one family to another. For instance, one can derive the fundamental solution for the Schrödinger equation from the fundamental solution for the heat equation by this method.
- Using the Fourier inversion formula, one can write general multipliers as integral combinations of Schrödinger or wave propagators; for instance, if $z$ lies in the upper half plane $\mathbb{H} := \{ z \in \mathbb{C}: \mathrm{Im}\, z > 0 \}$, one has
$$\frac{1}{x - z} = i \int_0^\infty e^{itz} e^{-itx}\, dt$$
for any real number $x$, and thus we can write resolvents in terms of Schrödinger propagators:
$$R(z) = i \int_0^\infty e^{itz} e^{it\Delta}\, dt. \qquad (2)$$
In a similar vein, if $k \in \mathbb{H}$, then
$$\frac{1}{x^2 - k^2} = \frac{i}{k} \int_0^\infty \cos(tx) e^{ikt}\, dt$$
for any $x \in \mathbb{R}$, so one can also write resolvents in terms of wave propagators:
$$R(k^2) = \frac{i}{k} \int_0^\infty \cos(t\sqrt{-\Delta}) e^{ikt}\, dt. \qquad (3)$$
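- Using the Cauchy integral formula, one can express (sufficiently holomorphic) multipliers in terms of resolvents (or limits of resolvents). For instance, if $t > 0$, then from the Cauchy integral formula (and Jordan’s lemma) one has
$$e^{-itx} = \frac{1}{2\pi i} \int_{\mathbb{R}} \frac{e^{-it(y+i\epsilon)}}{x - (y+i\epsilon)}\, dy$$
for any $x \in \mathbb{R}$ and $\epsilon > 0$, and so one can (formally, at least) write Schrödinger propagators in terms of resolvents:
$$e^{it\Delta} = \frac{1}{2\pi i} \int_{\mathbb{R}} e^{-it(y+i\epsilon)} R(y+i\epsilon)\, dy. \qquad (4)$$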
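- The imaginary part of $\frac{1}{\pi} \frac{1}{x - (y+i\epsilon)}$ is the Poisson kernel $\frac{\epsilon}{\pi} \frac{1}{(y-x)^2 + \epsilon^2}$, which is an approximation to the identity. As a consequence, for any reasonable function $m(x)$, one has (formally, at least)
$$m(x) = \lim_{\epsilon \to 0^+} \frac{1}{\pi} \int_{\mathbb{R}} \left( \mathrm{Im}\, \frac{1}{x - (y+i\epsilon)} \right) m(y)\, dy$$
which leads (again formally) to the ability to express arbitrary multipliers in terms of imaginary (or skew-adjoint) parts of resolvents:
$$m(-\Delta) = \lim_{\epsilon \to 0^+} \frac{1}{\pi} \int_{\mathbb{R}} \left( \mathrm{Im}\, R(y+i\epsilon) \right) m(y)\, dy. \qquad (5)$$
Among other things, this type of formula (with $-\Delta$ replaced by a more general self-adjoint operator) is used in the resolvent-based approach to the spectral theorem (by using the limiting imaginary part of resolvents to build spectral measure). Note that one can also express $\mathrm{Im}\, R(y+i\epsilon)$ as $\frac{1}{2i} (R(y+i\epsilon) - R(y-i\epsilon))$.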
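As a quick numerical sanity check of the Laplace-transform and Fourier-inversion items in the list above, one can test their scalar versions, in which the operator $-\Delta$ is replaced by a single number $x > 0$. The sketch below assumes the standard forms $x^{s/2} = \frac{1}{\Gamma(-s/2)} \int_0^\infty t^{-1-s/2} e^{-tx}\, dt$ (tested only at $s = -1$) and $\frac{1}{x - z} = i \int_0^\infty e^{itz} e^{-itx}\, dt$ for $\mathrm{Im}\, z > 0$. The helper names, the trapezoid rule and the truncation points are illustrative choices and not part of the original discussion.

```python
import cmath
import math

def integrate(f, a, b, n=100_000):
    """Composite trapezoid rule on [a, b] with n panels (f may be complex-valued)."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for j in range(1, n):
        total += f(a + j * h)
    return total * h

def resolvent_symbol_via_schrodinger(x, z, T=80.0):
    """Approximate i * int_0^T e^{itz} e^{-itx} dt, which tends to 1/(x - z) when Im z > 0."""
    return 1j * integrate(lambda t: cmath.exp(1j * t * (z - x)), 0.0, T)

def inverse_sqrt_via_heat(x, U=12.0):
    """Approximate (1/Gamma(1/2)) int_0^inf t^{-1/2} e^{-tx} dt after substituting t = u^2."""
    integral = integrate(lambda u: 2.0 * math.exp(-u * u * x), 0.0, U)
    return integral / math.gamma(0.5)

x, z = 2.0, 1.0 + 0.5j
print(resolvent_symbol_via_schrodinger(x, z), 1 / (x - z))
print(inverse_sqrt_via_heat(x), x ** -0.5)
```

The truncation at `T` is harmless only because $\mathrm{Im}\, z > 0$ makes the integrand decay exponentially; as $z$ approaches the real axis the required `T` grows without bound, which is one elementary way to see why the limits $\epsilon \to 0$ of the resolvents are delicate.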
Remark 1 The ability of heat operators, Schrödinger propagators, wave propagators, or resolvents to generate other spectral multipliers can be viewed as a sort of manifestation of the Stone-Weierstrass theorem (though with the caveat that the spectrum of the Laplacian is non-compact and so the Stone-Weierstrass theorem does not directly apply). Indeed, observe the *-algebra type properties
Because of these relationships, it is possible (in principle, at least), to leverage one’s understanding of one family of spectral multipliers to gain control on another family of multipliers. For instance, the fact that the heat operators $e^{t\Delta}$ have non-negative kernel (a fact which can be seen from the maximum principle, or from the Brownian motion interpretation of the heat kernels) implies (by (1)) that the fractional integral operators $(-\Delta)^{s/2}$ for $s < 0$ also have non-negative kernel. Or, the fact that the wave equation enjoys finite speed of propagation (and hence that the wave propagators $\cos(t\sqrt{-\Delta})$ have distributional convolution kernel localised to the ball of radius $|t|$ centred at the origin), can be used (by (3)) to show that the resolvents $R(k^2)$ have a convolution kernel that is essentially localised to the ball of radius $O(1/|\mathrm{Im}(k)|)$ around the origin.
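In this post, I would like to continue this theme by using the resolvents $R(z) = (-\Delta - z)^{-1}$ to control other spectral multipliers. These resolvents are well-defined whenever $z$ lies outside of the spectrum $\sigma(-\Delta) = [0,+\infty)$ of the operator $-\Delta$. In the model three-dimensional case $d = 3$, they can be defined explicitly by the formula

$$R(k^2) f(x) = \int_{\mathbb{R}^3} \frac{e^{ik|x-y|}}{4\pi |x-y|} f(y)\, dy$$

whenever $k$ lives in the upper half-plane $\{ k \in \mathbb{C}: \mathrm{Im}(k) > 0 \}$, ensuring the absolute convergence of the integral for test functions $f$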
. (In general dimension, explicit formulas are still available, but involve Bessel functions. But asymptotically at least, and ignoring higher order terms, one simply replaces
by
for some explicit constant
.) It is an instructive exercise to verify that this resolvent indeed inverts the operator $-\Delta - k^2$, either by using Fourier analysis or by Green’s theorem.
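Part of that exercise can be checked numerically. Assuming the three-dimensional kernel has the standard radial profile $e^{ikr}/(4\pi r)$ with $r = |x - y|$, the sketch below verifies by finite differences that this profile is annihilated by $-\Delta - k^2$ away from $r = 0$, using the radial form $u'' + \frac{2}{r} u'$ of the Laplacian in three dimensions. It does not test the delta function at the origin, and the function names and step size are illustrative choices.

```python
import cmath
import math

def kernel(r, k):
    """Radial profile e^{ikr} / (4 pi r) of the three-dimensional resolvent kernel."""
    return cmath.exp(1j * k * r) / (4.0 * math.pi * r)

def helmholtz_residual(r, k, h=1e-3):
    """Finite-difference value of (-Laplacian - k^2) applied to the kernel at radius r > h."""
    u_minus, u0, u_plus = kernel(r - h, k), kernel(r, k), kernel(r + h, k)
    second = (u_plus - 2.0 * u0 + u_minus) / (h * h)
    first = (u_plus - u_minus) / (2.0 * h)
    radial_laplacian = second + (2.0 / r) * first  # Laplacian of a radial function in 3D
    return -radial_laplacian - k * k * u0

k = 1.5 + 0.3j  # a sample point in the upper half-plane
for r in (0.5, 1.0, 4.0):
    print(r, abs(helmholtz_residual(r, k)), abs(kernel(r, k)))
```

The last printed column decays like $e^{-\mathrm{Im}(k) r}/(4\pi r)$, which is the exponential decay that makes the integral absolutely convergent for test functions when $k$ is in the upper half-plane.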
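Henceforth we restrict attention to three dimensions $d = 3$ for simplicity. One consequence of the above explicit formula is that for positive real $\lambda > 0$, the resolvents $R(\lambda + i\epsilon)$ and $R(\lambda - i\epsilon)$ tend to different limits as $\epsilon \to 0^+$, reflecting the jump discontinuity in the resolvent function at the spectrum; as one can guess from formulae such as (4) or (5), such limits are of interest for understanding many other spectral multipliers. Indeed, for any test function $f$, we see that

$$\lim_{\epsilon \to 0^+} R(\lambda + i\epsilon) f(x) = \int_{\mathbb{R}^3} \frac{e^{i\sqrt{\lambda}|x-y|}}{4\pi |x-y|} f(y)\, dy$$

and

$$\lim_{\epsilon \to 0^+} R(\lambda - i\epsilon) f(x) = \int_{\mathbb{R}^3} \frac{e^{-i\sqrt{\lambda}|x-y|}}{4\pi |x-y|} f(y)\, dy.$$

Both of these functions $u_\pm := \lim_{\epsilon \to 0^+} R(\lambda \pm i\epsilon) f$ solve the Helmholtz equation

$$(-\Delta - \lambda) u_\pm = f \qquad (6)$$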
, then we have the asymptotic
, leading also to the Sommerfeld radiation condition
where $\partial_r := \frac{x}{|x|} \cdot \nabla_x$ is the outgoing radial derivative. Indeed, one can show using an integration by parts argument that $u_\pm$ is the unique solution of the Helmholtz equation (6) obeying (8) (see below). $u_+$ is known as the outward radiating solution of the Helmholtz equation (6), and $u_-$ is known as the inward radiating solution. Indeed, if one views the function $u_\pm(t,x) := e^{-i\lambda t} u_\pm(x)$ as a solution to the inhomogeneous Schrödinger equation

$$(i\partial_t + \Delta) u_\pm = -e^{-i\lambda t} f$$

and uses the de Broglie law that a solution to such an equation with wave number $k \in \mathbb{R}^3$ (i.e. resembling $A e^{ik \cdot x}$ for some amplitude $A$) should propagate at (group) velocity $2k$, we see (heuristically, at least) that the outward radiating solution will indeed propagate radially away from the origin at speed $2\sqrt{\lambda}$, while the inward radiating solution propagates inward at the same speed.
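There is a useful quantitative version of the convergence

$$R(\lambda \pm i\epsilon) f \to u_\pm \qquad (9)$$

Theorem 1 (Limiting absorption principle) Let $f$ be a test function on $\mathbb{R}^3$, let $\lambda > 0$, and let $\sigma > 0$. Then one has

$$\| R(\lambda \pm i\epsilon) f \|_{H^{0,-1/2-\sigma}(\mathbb{R}^3)} \leq C_\sigma \lambda^{-1/2} \| f \|_{H^{0,1/2+\sigma}(\mathbb{R}^3)}$$

for all $\epsilon > 0$, where $C_\sigma > 0$ depends only on $\sigma$, and $H^{0,s}(\mathbb{R}^3)$ is the weighted norm

$$\| f \|_{H^{0,s}(\mathbb{R}^3)} := \| \langle x \rangle^s f \|_{L^2(\mathbb{R}^3)}$$

and $\langle x \rangle := (1 + |x|^2)^{1/2}$.
This principle allows one to extend the convergence (9) from test functions to all functions in the weighted space $H^{0,1/2+\sigma}(\mathbb{R}^3)$ by a density argument (though the radiation condition (8) has to be adapted suitably for this scale of spaces when doing so). The weighted space $H^{0,-1/2-\sigma}(\mathbb{R}^3)$ on the left-hand side is optimal, as can be seen from the asymptotic (7); a duality argument similarly shows that the weighted space $H^{0,1/2+\sigma}(\mathbb{R}^3)$ on the right-hand side is also optimal.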
We prove this theorem below the fold. As observed long ago by Kato (and also reproduced below), this estimate is equivalent (via a Fourier transform in the spectral variable $\lambda$) to a useful estimate for the free Schrödinger equation known as the local smoothing estimate, which in particular implies the well-known RAGE theorem for that equation; it also has similar consequences for the free wave equation. As we shall see, it also encodes some spectral information about the Laplacian; for instance, it can be used to show that the Laplacian has no eigenvalues, resonances, or singular continuous spectrum. These spectral facts are already obvious from the Fourier transform representation of the Laplacian, but the point is that the limiting absorption principle also applies to more general operators for which the explicit diagonalisation afforded by the Fourier transform is not available. (Igor Rodnianski and I are working on a paper regarding this topic, about which I hope to say more soon.)
In order to illustrate the main ideas and suppress technical details, I will be a little loose with some of the rigorous details of the arguments, and in particular will be manipulating limits and integrals at a somewhat formal level.
A few days ago, I found myself needing to use the Fredholm alternative in functional analysis:
Theorem 1 (Fredholm alternative) Let $X$ be a Banach space, let $T: X \to X$ be a compact operator, and let $\lambda \in \mathbb{C}$ be non-zero. Then exactly one of the following statements holds:
- (Eigenvalue) There is a non-trivial solution $x \in X$ to the equation $Tx = \lambda x$.
- (Bounded resolvent) The operator $T - \lambda$ has a bounded inverse $(T - \lambda)^{-1}$ on $X$.
Among other things, the Fredholm alternative can be used to establish the spectral theorem for compact operators. A hypothesis such as compactness is necessary; the shift operator $U$ on $\ell^2(\mathbb{Z})$, for instance, has no eigenfunctions, but $U - z$ is not invertible for any unit complex number $z$. The claim is also false when $\lambda = 0$; consider for instance the multiplication operator $Tf(n) := \frac{1}{n} f(n)$ on $\ell^2(\mathbb{N})$, which is compact and has no eigenvalue at zero, but is not invertible.
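The failure at zero can be seen concretely on finite truncations. The sketch below assumes the multiplication operator is $Tf(n) = f(n)/n$ on $\ell^2(\mathbb{N})$ and restricts it to the first $N$ coordinates, where it is a diagonal matrix; for a diagonal matrix the norm of $(D - \lambda)^{-1}$ on $\ell^2$ is the reciprocal of the smallest gap $|d_n - \lambda|$. The function names are hypothetical and chosen only for this illustration.

```python
def truncated_multiplier(N):
    """Diagonal entries of the truncation of T f(n) = f(n)/n to the first N coordinates."""
    return [1.0 / n for n in range(1, N + 1)]

def inverse_norm(diagonal, lam):
    """Operator norm of (D - lam)^{-1} on l^2 for a diagonal matrix D (inf if not invertible)."""
    smallest_gap = min(abs(d - lam) for d in diagonal)
    return float("inf") if smallest_gap == 0.0 else 1.0 / smallest_gap

for N in (10, 100, 1000):
    diag = truncated_multiplier(N)
    # lam = 0: no eigenvalue, yet the inverse norm grows like N.
    # lam = 0.3: not an eigenvalue, and the inverse norm stays bounded uniformly in N.
    # lam = 0.5: an eigenvalue (the entry 1/2), so there is no inverse at all.
    print(N, inverse_norm(diag, 0.0), inverse_norm(diag, 0.3), inverse_norm(diag, 0.5))
```

For non-zero $\lambda$ the two alternatives are visible directly (either $\lambda$ equals some $1/n$, or the inverse is bounded independently of $N$), whereas at $\lambda = 0$ neither alternative survives in the limit $N \to \infty$.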
It had been a while since I had studied the spectral theory of compact operators, and I found that I could not immediately reconstruct a proof of the Fredholm alternative from first principles. So I set myself the exercise of doing so. I thought that I had managed to establish the alternative in all cases, but as pointed out in comments, my argument is restricted to the case where the compact operator is approximable, which means that it is the limit of finite rank operators in the uniform topology. Many Banach spaces (and in particular, all Hilbert spaces) have the approximation property that implies (by a result of Grothendieck) that all compact operators on that space are almost finite rank. For instance, if $H$ is a Hilbert space, then any compact operator is approximable, because any compact set can be approximated by a finite-dimensional subspace, and in a Hilbert space, the orthogonal projection operator to a subspace is always a contraction. (In more general Banach spaces, finite-dimensional subspaces are still complemented, but the operator norm of the projection can be large.) Unfortunately, there are examples of Banach spaces for which the approximation property fails; the first such examples were discovered by Enflo, and a subsequent paper of Alexander demonstrated the existence of compact operators in certain Banach spaces that are not approximable.
I also found out that this argument was essentially also discovered independently by MacCluer-Hull and by Uuye. Nevertheless, I am recording this argument here, together with two more traditional proofs of the Fredholm alternative (based on the Riesz lemma and a continuity argument respectively).
Van Vu and I have just uploaded to the arXiv our paper “Random matrices: Universality of eigenvectors”, submitted to Random Matrices: Theory and Applications. This paper concerns an extension of our four moment theorem for eigenvalues. Roughly speaking, that four moment theorem asserts (under mild decay conditions on the coefficients of the random matrix) that the fine-scale structure of individual eigenvalues of a Wigner random matrix depends only on the first four moments of each of the entries.
In this paper, we extend this result from eigenvalues to eigenvectors, and specifically to the coefficients of, say, the $i^{th}$ eigenvector $u_i(M_n)$ of a Wigner random matrix $M_n$. Roughly speaking, the main result is that the distribution of these coefficients also only depends on the first four moments of each of the entries. In particular, as the distribution of coefficients of eigenvectors of invariant ensembles such as GOE or GUE is known to be asymptotically gaussian real (in the GOE case) or gaussian complex (in the GUE case), the same asymptotic automatically holds for Wigner matrices whose coefficients match GOE or GUE to fourth order.
(A technical point here: strictly speaking, the eigenvectors are only determined up to a phase, even when the eigenvalues are simple. So, to phrase the question properly, one has to perform some sort of normalisation, for instance by working with the coefficients of the spectral projection operators $u_i(M_n) u_i(M_n)^*$ instead of the eigenvectors, or rotating each eigenvector by a random phase, or by fixing the first component of each eigenvector to be positive real. This is a fairly minor technical issue here, though, and will not be discussed further.)
This theorem strengthens a four moment theorem for eigenvectors recently established by Knowles and Yin (by a somewhat different method), in that the hypotheses are weaker (no level repulsion assumption is required, and the matrix entries only need to obey a finite moment condition rather than an exponential decay condition), and a slightly stronger conclusion (less regularity is needed on the test function, and one can handle the joint distribution of polynomially many coefficients, rather than boundedly many coefficients). On the other hand, the Knowles-Yin paper can also handle generalised Wigner ensembles in which the variances of the entries are allowed to fluctuate somewhat.
The method used here is a variation of that in our original paper (incorporating the subsequent improvements to extend the four moment theorem from the bulk to the edge, and to replace exponential decay by a finite moment condition). That method was ultimately based on the observation that if one swapped a single entry (and its adjoint) in a Wigner random matrix, then an individual eigenvalue would not fluctuate much as a consequence (as long as one had already truncated away the event of an unexpectedly small eigenvalue gap). The same analysis shows that the projection matrices $u_i(M_n) u_i(M_n)^*$ obey the same stability property.
As an application of the eigenvalue four moment theorem, we establish a four moment theorem for the coefficients of resolvent matrices , even when
$z$ is on the real axis (though in that case we need to make a level repulsion hypothesis, which has been already verified in many important special cases and is likely to be true in general). This improves on an earlier four moment theorem for resolvents of Erdos, Yau, and Yin, which required $z$ to stay some distance away from the real axis (specifically, that
for some small
).
Van Vu and I have just uploaded to the arXiv our paper “Random matrices: Localization of the eigenvalues and the necessity of four moments”, submitted to Probability Theory and Related Fields. This paper concerns the distribution of the eigenvalues $\lambda_1(M_n) \leq \ldots \leq \lambda_n(M_n)$ of a Wigner random matrix $M_n$. More specifically, we consider $n \times n$ Hermitian random matrices whose entries have mean zero and variance one, with the upper-triangular portion of the matrix independent, with the diagonal elements iid, and the real and imaginary parts of the strictly upper-triangular portion of the matrix iid. For technical reasons we also assume that the distribution of the coefficients decays exponentially or better. Examples of Wigner matrices include the Gaussian Unitary Ensemble (GUE) and random symmetric complex Bernoulli matrices (which equal
on the diagonal, and
off the diagonal). The Gaussian Orthogonal Ensemble (GOE) is also an example once one makes the minor change of setting the diagonal entries to have variance two instead of one.
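The most fundamental theorem about the distribution of these eigenvalues is the Wigner semi-circular law, which asserts that (almost surely) one has

$$\frac{1}{n} \sum_{i=1}^n \delta_{\lambda_i(M_n)/\sqrt{n}} \to \rho_{sc}(x)\, dx$$

(in the vague topology) where $\rho_{sc}(x) := \frac{1}{2\pi} (4 - x^2)_+^{1/2}$ is the semicircular distribution. (See these lecture notes on this blog for more discussion of this law.)
One can phrase this law in a number of equivalent ways. For instance, in the bulk region $\epsilon n \leq i \leq (1-\epsilon) n$, one almost surely has

$$\lambda_i(M_n) = \gamma_i \sqrt{n} + o(\sqrt{n}) \qquad (1)$$

where the classical location $\gamma_i \in [-2,2]$ of the (normalised) $i^{th}$ eigenvalue $\frac{1}{\sqrt{n}} \lambda_i(M_n)$ is defined by the formula

$$\int_{-2}^{\gamma_i} \rho_{sc}(x)\, dx := \frac{i}{n}.$$

The bound (1) also holds in the edge case (by using the operator norm bound $\| M_n \|_{op} = (2 + o(1)) \sqrt{n}$, due to Bai and Yin), but for the sake of exposition we shall restrict attention here only to the bulk case.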
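The classical locations are easy to compute in practice. The sketch below assumes the standard normalisation in which the semicircular density is $\rho_{sc}(x) = \frac{1}{2\pi}(4 - x^2)_+^{1/2}$ on $[-2,2]$ and $\gamma_i$ solves $\int_{-2}^{\gamma_i} \rho_{sc}(x)\, dx = i/n$. It uses the closed form of the cumulative distribution function and plain bisection; the function names and tolerance are illustrative.

```python
import math

def semicircle_cdf(x):
    """Integral of rho_sc(t) = sqrt(4 - t^2) / (2 pi) over [-2, x], for -2 <= x <= 2."""
    return 0.5 + x * math.sqrt(4.0 - x * x) / (4.0 * math.pi) + math.asin(x / 2.0) / math.pi

def classical_location(i, n, tol=1e-12):
    """Solve semicircle_cdf(gamma) = i / n for gamma in [-2, 2] by bisection."""
    target = i / n
    lo, hi = -2.0, 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if semicircle_cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

n = 10
print([round(classical_location(i, n), 4) for i in range(1, n + 1)])
```

In the bulk the density is bounded away from zero, so consecutive classical locations differ by an amount comparable to $1/n$; after multiplying by $\sqrt{n}$ this recovers the mean eigenvalue spacing of order $n^{-1/2}$.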
From (1) we see that the semicircular law controls the eigenvalues at the coarse scale of $\sqrt{n}$. There has been a significant amount of work in the literature in obtaining control at finer scales, and in particular at the scale of the average eigenvalue spacing, which is of the order of $n^{-1/2}$. For instance, we now have a universal limit theorem for the normalised eigenvalue spacing $\sqrt{n} (\lambda_{i+1}(M_n) - \lambda_i(M_n))$ in the bulk for all Wigner matrices, a result of Erdos, Ramirez, Schlein, Vu, Yau, and myself. One tool for this is the four moment theorem of Van and myself, which roughly speaking shows that the behaviour of the eigenvalues at the scale $n^{-1/2}$ (and even at the slightly finer scale of $n^{-1/2-c}$ for some absolute constant $c > 0$) depends only on the first four moments of the matrix entries. There is also a slight variant, the three moment theorem, which asserts that the behaviour of the eigenvalues at the slightly coarser scale of
depends only on the first three moments of the matrix entries.
It is natural to ask whether these moment conditions are necessary. From the result of Erdos, Ramirez, Schlein, Vu, Yau, and myself, it is known that to control the eigenvalue spacing at the critical scale $n^{-1/2}$, no knowledge of any moments beyond the second (i.e. beyond the mean and variance) is needed. So it is natural to conjecture that the same is true for the eigenvalues themselves.
The main result of this paper is to show that this is not the case; that at the critical scale $n^{-1/2}$, the distribution of eigenvalues $\lambda_i(M_n)$ is sensitive to the fourth moment, and so the hypothesis of the four moment theorem cannot be relaxed.
Heuristically, the reason for this is easy to explain. One begins with an inspection of the expected fourth moment
A standard moment method computation shows that the right hand side is equal to
where is the fourth moment of the real part of the off-diagonal coefficients of
. In particular, a change in the fourth moment
by
leads to a change in the expression
by
. Thus, for a typical
, one expects
to shift by
; since
on the average, we thus expect
itself to shift by about
by the mean-value theorem.
To make this rigorous, one needs a sufficiently strong concentration of measure result for that keeps it close to its mean value. There are already a number of such results in the literature. For instance, Guionnet and Zeitouni showed that
was sharply concentrated around an interval of size
around
for any
(in the sense that the probability that one was outside this interval was exponentially small). In one of my papers with Van, we showed that
was also weakly concentrated around an interval of size
around
, in the sense that the probability that one was outside this interval was
for some absolute constant
. Finally, if one made an additional log-Sobolev hypothesis on the entries, it was shown by Erdos, Yau, and Yin that the average variance of
as
varied from
to
was of the size of
for some absolute
.
As it turns out, the first two concentration results are not sufficient to justify the previous heuristic argument. The Erdos-Yau-Yin argument suffices, but requires a log-Sobolev hypothesis. In our paper, we argue differently, using the three moment theorem (together with the theory of the eigenvalues of GUE, which is extremely well developed) to show that the variance of each individual is
(without averaging in
). No log-Sobolev hypothesis is required, but instead we need to assume that the third moment of the coefficients vanishes (because we want to use the three moment theorem to compare the Wigner matrix to GUE, and the coefficients of the latter have a vanishing third moment). From this we are able to make the previous arguments rigorous, and show that the mean
is indeed sensitive to the fourth moment of the entries at the critical scale
.
One curious feature of the analysis is how differently the median and the mean of the eigenvalue react to the available technology. To control the global behaviour of the eigenvalues (after averaging in
), it is much more convenient to use the mean, and we have very precise control on global averages of these means thanks to the moment method. But to control local behaviour, it is the median which is much better controlled. For instance, we can localise the median of
to an interval of size
, but can only localise the mean to a much larger interval of size
. Ultimately, this is because with our current technology there is a possible exceptional event of probability as large as
for which all eigenvalues could deviate as far as
from their expected location, instead of their typical deviation of
. The reason for this is technical, coming from the fact that the four moment theorem method breaks down when two eigenvalues are very close together (less than
times the average eigenvalue spacing), and so one has to cut out this event, which occurs with a probability of the shape
. It may be possible to improve the four moment theorem proof to be less sensitive to eigenvalue near-collisions, in which case the above bounds are likely to improve.
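Now we turn attention to another important spectral statistic, the least singular value $\sigma_n(M)$ of an $n \times n$ matrix $M$ (or, more generally, the least non-trivial singular value $\sigma_p(M)$ of an $n \times p$ matrix with $p \leq n$). This quantity controls the invertibility of $M$. Indeed, $M$ is invertible precisely when $\sigma_n(M)$ is non-zero, and the operator norm $\| M^{-1} \|_{op}$ of $M^{-1}$ is given by $1/\sigma_n(M)$. This quantity is also related to the condition number $\sigma_1(M)/\sigma_n(M) = \| M \|_{op} \| M^{-1} \|_{op}$ of $M$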
, which is of importance in numerical linear algebra. As we shall see in the next set of notes, the least singular value of
(and more generally, of the shifts
for complex
) will be of importance in rigorously establishing the circular law for iid random matrices
, as it plays a key role in computing the Stieltjes transform
of such matrices, which as we have already seen is a powerful tool in understanding the spectra of random matrices.
The least singular value $\sigma_n(M) = \inf_{\|x\|=1} \|Mx\|$, which sits at the “hard edge” of the spectrum, bears a superficial similarity to the operator norm $\|M\|_{op} = \sigma_1(M) = \sup_{\|x\|=1} \|Mx\|$ at the “soft edge” of the spectrum, that was discussed back in Notes 3, so one may at first think that the methods that were effective in controlling the latter, namely the epsilon-net argument and the moment method, would also work to control the former. The epsilon-net method does indeed have some effectiveness when dealing with rectangular matrices (in which the spectrum stays well away from zero), but the situation becomes more delicate for square matrices; it can control some “low entropy” portions of the infimum that arise from “structured” or “compressible” choices of $x$, but is not able to control the “generic” or “incompressible” choices of $x$, for which new arguments will be needed. As for the moment method, this can give the coarse order of magnitude (for instance, for rectangular matrices with $p = yn$ for $0 < y < 1$, it gives an upper bound of $(1 - \sqrt{y} + o(1)) \sqrt{n}$ for the least singular value with high probability, thanks to the Marchenko-Pastur law), but again this method begins to break down for square matrices, although one can make some partial headway by considering negative moments such as $\operatorname{tr} M^{-2}$, though these are more difficult to compute than positive moments $\operatorname{tr} M^k$.
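As a rough numerical illustration of the rectangular case, the sketch below samples an $n \times p$ random sign matrix with $p = yn$ and compares its extreme singular values, normalised by $\sqrt{n}$, with the Marchenko-Pastur edges $1 \pm \sqrt{y}$. The sizes and the seed are arbitrary choices made here, not values taken from the notes.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 1200, 300            # aspect ratio y = p/n = 0.25
y = p / n

# n x p random sign matrix
M = rng.choice([-1.0, 1.0], size=(n, p))

# All p singular values, in descending order.
s = np.linalg.svd(M, compute_uv=False)
least = s[-1] / np.sqrt(n)
largest = s[0] / np.sqrt(n)

predicted_least = 1.0 - np.sqrt(y)
predicted_largest = 1.0 + np.sqrt(y)

print("least   singular value / sqrt(n):", least, "vs", predicted_least)
print("largest singular value / sqrt(n):", largest, "vs", predicted_largest)
```

Note that the least singular value here stays well away from zero, in contrast to the square case $y = 1$, where the predicted edge $1 - \sqrt{y}$ degenerates to zero.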
So one needs to supplement these existing methods with additional tools. It turns out that the key issue is to understand the distance between one of the $n$ rows $X_1, \ldots, X_n \in {\bf C}^n$ of the matrix $M$, and the hyperplane spanned by the other $n-1$ rows. The reason for this is as follows. First suppose that $\sigma_n(M) = 0$, so that $M$ is non-invertible, and there is a linear dependence between the rows $X_1, \ldots, X_n$. Thus, one of the $X_i$ will lie in the hyperplane spanned by the other rows, and so one of the distances mentioned above will vanish; in fact, one expects many of the $n$ distances to vanish. Conversely, whenever one of these distances vanishes, one has a linear dependence, and so $\sigma_n(M) = 0$.
More generally, if the least singular value is small, one generically expects many of these $n$ distances to be small also, and conversely. Thus, control of the least singular value is morally equivalent to control of the distance between a row $X_i$ and the hyperplane spanned by the other rows. This latter quantity is basically the dot product of $X_i$ with a unit normal $n_i$ of this hyperplane.
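The “moral equivalence” between the least singular value and the row-to-hyperplane distances can be made quantitative by some standard linear algebra facts that are not stated in the text above and are used here as assumptions of the sketch: for invertible $M$, the distance from $X_i$ to the span of the other rows equals the reciprocal of the norm of the $i^{th}$ column of $M^{-1}$, and consequently $\sum_j \sigma_j(M)^{-2} = \sum_i \hbox{dist}_i^{-2}$. The helper name `row_distances` is hypothetical.

```python
import numpy as np

def row_distances(M):
    """Distance from each row of M to the span of the remaining rows."""
    n = M.shape[0]
    dists = np.empty(n)
    for i in range(n):
        others = np.delete(M, i, axis=0)              # the other n-1 rows
        # Least-squares projection of row i onto the span of the other rows.
        coeffs, *_ = np.linalg.lstsq(others.T, M[i], rcond=None)
        dists[i] = np.linalg.norm(M[i] - others.T @ coeffs)
    return dists

rng = np.random.default_rng(2)
n = 30
M = rng.choice([-1.0, 1.0], size=(n, n))   # random sign matrix

dists = row_distances(M)
sigma = np.linalg.svd(M, compute_uv=False)
sigma_n = sigma[-1]
col_norms = np.linalg.norm(np.linalg.inv(M), axis=0)   # norms of columns of M^{-1}

print("sigma_n          :", sigma_n)
print("min distance     :", dists.min())
print("sum sigma_j^{-2} :", np.sum(sigma ** -2.0))
print("sum dist_i^{-2}  :", np.sum(dists ** -2.0))
```

In particular one has the two-sided comparison $n^{-1/2} \min_i \hbox{dist}_i \leq \sigma_n(M) \leq \min_i \hbox{dist}_i$, so the two quantities agree up to a factor of $\sqrt{n}$.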
When working with random matrices with jointly independent coefficients, we have the crucial property that the unit normal $n_i$ (which depends on all the rows other than $X_i$) is independent of $X_i$, so even after conditioning $n_i$ to be fixed, the entries of $X_i$ remain independent. As such, the dot product $X_i \cdot n_i$ is a familiar scalar random walk, and can be controlled by a number of tools, most notably Littlewood-Offord theorems and the Berry-Esséen central limit theorem. As it turns out, this type of control works well except in some rare cases in which the normal $n_i$ is “compressible” or otherwise highly structured; but epsilon-net arguments can be used to dispose of these cases. (This general strategy was first developed for the technically simpler singularity problem by Komlós, and then extended to the least singular value problem by Rudelson.)
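To see why structured normals are the problematic case, consider (as a toy model chosen here, not taken from the notes) a unit normal with exactly $k$ non-zero entries, all equal to $1/\sqrt{k}$. For a row $X$ of independent signs, the dot product vanishes exactly when a sum of $k$ independent signs is zero, and this probability can be computed in closed form. The function name `prob_sign_sum_zero` is hypothetical.

```python
from math import comb

def prob_sign_sum_zero(k):
    """P(xi_1 + ... + xi_k = 0) for k independent uniform signs xi_i = +1 or -1."""
    if k % 2 == 1:
        return 0.0                      # an odd number of signs cannot cancel
    return comb(k, k // 2) / 2 ** k     # need exactly k/2 plus signs

# Highly compressible normal (two non-zero entries): large concentration at zero.
print("k = 2  :", prob_sign_sum_zero(2))
# Spread-out normal (many non-zero entries): concentration decays like k^{-1/2}.
print("k = 100:", prob_sign_sum_zero(100))
```

The decay of roughly $k^{-1/2}$ in the second case is the type of bound that Littlewood-Offord theorems provide, while the first case shows that no such decay can hold for sparse normals, which therefore have to be handled separately.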
These methods rely quite strongly on the joint independence of all the entries; it remains a challenge to extend them to more general settings. Even for Wigner matrices, the methods run into difficulty because of the non-independence of some of the entries (although it turns out one can understand the least singular value in such cases by rather different methods).
To simplify the exposition, we shall focus primarily on just one specific ensemble of random matrices, the Bernoulli ensemble $M = (\xi_{ij})_{1 \leq i,j \leq n}$ of random sign matrices, where $\xi_{ij} = \pm 1$ are independent Bernoulli signs. However, the results can extend to more general classes of random matrices, with the main requirement being that the coefficients are jointly independent.
Our study of random matrices, to date, has focused on somewhat general ensembles, such as iid random matrices or Wigner random matrices, in which the distribution of the individual entries of the matrices was essentially arbitrary (as long as certain moments, such as the mean and variance, were normalised). In these notes, we now focus on two much more special, and much more symmetric, ensembles:
- The Gaussian Unitary Ensemble (GUE), which is an ensemble of random $n \times n$ Hermitian matrices $M_n$ in which the upper-triangular entries are iid with distribution $N(0,1)_{\bf C}$, and the diagonal entries are iid with distribution $N(0,1)_{\bf R}$, and independent of the upper-triangular ones; and
- The Gaussian random matrix ensemble, which is an ensemble of random $n \times n$ (non-Hermitian) matrices $M_n$ whose entries are iid with distribution $N(0,1)_{\bf C}$.
The symmetric nature of these ensembles will allow us to compute the spectral distribution by exact algebraic means, revealing a surprising connection with orthogonal polynomials and with determinantal processes. This will, for instance, recover the semi-circular law for GUE, but will also reveal fine spacing information, such as the distribution of the gap between adjacent eigenvalues, which is largely out of reach of tools such as the Stieltjes transform method and the moment method (although the moment method, with some effort, is able to control the extreme edges of the spectrum).
Similarly, we will see for the first time the circular law for eigenvalues of non-Hermitian matrices.
There are a number of other highly symmetric ensembles which can also be treated by the same methods, most notably the Gaussian Orthogonal Ensemble (GOE) and the Gaussian Symplectic Ensemble (GSE). However, for simplicity we shall focus just on the above two ensembles. For a systematic treatment of these ensembles, see the text by Deift.
In the foundations of modern probability, as laid out by Kolmogorov, the basic objects of study are constructed in the following order:
- Firstly, one selects a sample space $\Omega$, whose elements $\omega$ represent all the possible states that one’s stochastic system could be in.
- Then, one selects a $\sigma$-algebra ${\mathcal B}$ of events $E$ (modeled by subsets of $\Omega$), and assigns each of these events a probability ${\bf P}(E) \in [0,1]$ in a countably additive manner, so that the entire sample space has probability $1$.
- Finally, one builds (commutative) algebras of random variables $X$ (such as complex-valued random variables, modeled by measurable functions from $\Omega$ to ${\bf C}$), and (assuming suitable integrability or moment conditions) one can assign expectations ${\bf E} X$ to each such random variable.
In measure theory, the underlying measure space plays a prominent foundational role, with the measurable sets and measurable functions (the analogues of the events and the random variables) always being viewed as somehow being attached to that space. In probability theory, in contrast, it is the events and their probabilities that are viewed as being fundamental, with the sample space $\Omega$ being abstracted away as much as possible, and with the random variables and expectations being viewed as derived concepts. See Notes 0 for further discussion of this philosophy.
However, it is possible to take the abstraction process one step further, and view the algebra of random variables and their expectations as being the foundational concept, ignoring the presence of the original sample space, the algebra of events, and the probability measure.
There are two reasons for wanting to shed (or abstract away) these previously foundational structures. Firstly, it allows one to more easily take certain types of limits, such as the large $n$ limit $n \to \infty$ when considering $n \times n$ random matrices, because quantities built from the algebra of random variables and their expectations, such as the normalised moments of random matrices, tend to be quite stable in the large $n$ limit (as we have seen in previous notes), even as the sample space and event space vary with $n$. (This theme of using abstraction to facilitate the taking of the large $n$ limit also shows up in the application of ergodic theory to combinatorics via the correspondence principle; see this previous blog post for further discussion.)
Secondly, this abstract formalism allows one to generalise the classical, commutative theory of probability to the more general theory of non-commutative probability theory, which does not have a classical underlying sample space or event space, but is instead built upon a (possibly) non-commutative algebra of random variables (or “observables”) and their expectations (or “traces”). This more general formalism not only encompasses classical probability, but also spectral theory (with matrices or operators taking the role of random variables, and the trace taking the role of expectation), random matrix theory (which can be viewed as a natural blend of classical probability and spectral theory), and quantum mechanics (with physical observables taking the role of random variables, and their expected value on a given quantum state being the expectation). It is also part of a more general “non-commutative way of thinking” (of which non-commutative geometry is the most prominent example), in which a space is understood primarily in terms of the ring or algebra of functions (or function-like objects, such as sections of bundles) placed on top of that space, and then the space itself is largely abstracted away in order to allow the algebraic structures to become less commutative. In short, the idea is to make algebra the foundation of the theory, as opposed to other possible choices of foundations such as sets, measures, categories, etc.
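As a concrete toy model of this formalism, one can take the algebra of $n \times n$ complex matrices with the normalised trace $\tau(A) = \frac{1}{n} \operatorname{tr} A$ in the role of the expectation. The sketch below, in which the names `tau`, `A` and `B` are chosen here for illustration, checks that the “moments” $\tau(A^k)$ of a Hermitian matrix are the moments of its spectrum, that $\tau$ is tracial, and that the algebra is genuinely non-commutative.

```python
import numpy as np

def tau(A):
    """Normalised trace (1/n) tr A, playing the role of the expectation."""
    return np.trace(A) / A.shape[0]

def random_hermitian(n, rng):
    """A Hermitian matrix, viewed as a (non-commutative) bounded observable."""
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (G + G.conj().T) / 2

rng = np.random.default_rng(3)
n = 6
A = random_hermitian(n, rng)
B = random_hermitian(n, rng)

# Fourth moment computed algebraically and from the spectrum of A.
moment_from_trace = tau(np.linalg.matrix_power(A, 4)).real
moment_from_spectrum = np.mean(np.linalg.eigvalsh(A) ** 4)

# tau is tracial even though A and B do not commute.
commutator_norm = np.linalg.norm(A @ B - B @ A)

print("tau(A^4)              :", moment_from_trace)
print("mean of eigenvalues^4 :", moment_from_spectrum)
print("tau(AB) - tau(BA)     :", abs(tau(A @ B) - tau(B @ A)))
print("||AB - BA||           :", commutator_norm)
```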
[Note that this foundational preference is to some extent a metamathematical one rather than a mathematical one; in many cases it is possible to rewrite the theory in a mathematically equivalent form so that some other mathematical structure becomes designated as the foundational one, much as probability theory can be equivalently formulated as the measure theory of probability measures. However, this does not negate the fact that a different choice of foundations can lead to a different way of thinking about the subject, and thus to ask a different set of questions and to discover a different set of proofs and solutions. Thus it is often of value to understand multiple foundational perspectives at once, to get a truly stereoscopic view of the subject.]
It turns out that non-commutative probability can be modeled using operator algebras such as $C^*$-algebras, von Neumann algebras, or algebras of bounded operators on a Hilbert space, with the latter being accomplished via the Gelfand-Naimark-Segal construction. We will discuss some of these models here, but just as probability theory seeks to abstract away its measure-theoretic models, the philosophy of non-commutative probability is also to downplay these operator algebraic models once some foundational issues are settled.
When one generalises the set of structures in one’s theory, for instance from the commutative setting to the non-commutative setting, the notion of what it means for a structure to be “universal”, “free”, or “independent” can change. The most familiar example of this comes from group theory. If one restricts attention to the category of abelian groups, then the “freest” object one can generate from two generators $x, y$ is the free abelian group of commutative words $x^n y^m$ with $n, m \in {\bf Z}$, which is isomorphic to the group ${\bf Z}^2$. If however one generalises to the non-commutative setting of arbitrary groups, then the “freest” object that can now be generated from two generators $x, y$ is the free group ${\bf F}_2$ of non-commutative words $x^{n_1} y^{m_1} \ldots x^{n_k} y^{m_k}$ with $n_1, m_1, \ldots, n_k, m_k \in {\bf Z}$, which is a significantly larger extension of the free abelian group ${\bf Z}^2$.
Similarly, when generalising classical probability theory to non-commutative probability theory, the notion of what it means for two or more random variables to be independent changes. In the classical (commutative) setting, two (bounded, real-valued) random variables $X, Y$ are independent if one has ${\bf E} f(X) g(Y) = 0$ whenever $f, g: {\bf R} \to {\bf R}$ are well-behaved functions (such as polynomials) such that both of ${\bf E} f(X)$, ${\bf E} g(Y)$ vanish. In the non-commutative setting, one can generalise the above definition to two commuting bounded self-adjoint variables; this concept is useful for instance in quantum probability, which is an abstraction of the theory of observables in quantum mechanics. But for two (bounded, self-adjoint) non-commutative random variables $X, Y$, the notion of classical independence no longer applies. As a substitute, one can instead consider the notion of being freely independent (or free for short), which means that ${\bf E} f_1(X) g_1(Y) \ldots f_k(X) g_k(Y) = 0$ whenever $f_1, g_1, \ldots, f_k, g_k: {\bf R} \to {\bf R}$ are well-behaved functions such that all of ${\bf E} f_1(X), {\bf E} g_1(Y), \ldots, {\bf E} f_k(X), {\bf E} g_k(Y)$ vanish.
The concept of free independence was introduced by Voiculescu, and its study is now known as the subject of free probability. We will not attempt a systematic survey of this subject here; for this, we refer the reader to the surveys of Speicher and of Biane. Instead, we shall just discuss a small number of topics in this area to give the flavour of the subject only.
The significance of free probability to random matrix theory lies in the fundamental observation that random matrices which are independent in the classical sense, also tend to be independent in the free probability sense, in the large $n$ limit $n \to \infty$. (This is only possible because of the highly non-commutative nature of these matrices; as we shall see, it is not possible for non-trivial commuting independent random variables to be freely independent.) Because of this, many tedious computations in random matrix theory, particularly those of an algebraic or enumerative combinatorial nature, can be done more quickly and systematically by using the framework of free probability, which by design is optimised for algebraic tasks rather than analytical ones.
Much as free groups are in some sense “maximally non-commutative”, freely independent random variables are about as far from being commuting as possible. For instance, if $X, Y$ are freely independent and of expectation zero, then ${\bf E} XYXY$ vanishes, but ${\bf E} XXYY$ instead factors as $({\bf E} X^2) ({\bf E} Y^2)$. As a consequence, the behaviour of freely independent random variables can be quite different from the behaviour of their classically independent commuting counterparts. Nevertheless there is a remarkably strong analogy between the two types of independence, in that results which are true in the classically independent case often have an interesting analogue in the freely independent setting. For instance, the central limit theorem (Notes 2) for averages of classically independent random variables, which roughly speaking asserts that such averages become gaussian in the large $n$ limit, has an analogue for averages of freely independent variables, the free central limit theorem, which roughly speaking asserts that such averages become semicircular in the large $n$ limit. One can then use this theorem to provide yet another proof of Wigner’s semicircle law (Notes 4).
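The contrast between ${\bf E} XYXY$ and ${\bf E} XXYY$ can be observed numerically, using the observation that classically independent random matrices are approximately free for large $n$. The following is a minimal sketch, assuming two independent real symmetric gaussian matrices normalised to have entries of variance about $1/n$, and using a single sample of the normalised trace in place of the expectation; the helper names `wigner` and `tau` are hypothetical.

```python
import numpy as np

def wigner(n, rng):
    """Real symmetric gaussian matrix with off-diagonal entries of variance 1/n."""
    G = rng.standard_normal((n, n))
    return (G + G.T) / np.sqrt(2 * n)

def tau(A):
    """Normalised trace, standing in for the expectation."""
    return np.trace(A) / A.shape[0]

rng = np.random.default_rng(4)
n = 400
X, Y = wigner(n, rng), wigner(n, rng)   # classically independent samples

alternating = tau(X @ Y @ X @ Y)        # expected to be close to 0
grouped = tau(X @ X @ Y @ Y)            # expected to factor
product = tau(X @ X) * tau(Y @ Y)       # close to 1 with this normalisation

print("tau(XYXY)         :", alternating)
print("tau(XXYY)         :", grouped)
print("tau(X^2) tau(Y^2) :", product)
```

For commuting classically independent variables the two mixed moments would coincide, so the gap between the first two printed values is a signature of free, rather than classical, independence.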
Another important (and closely related) analogy is that while the distribution of sums of independent commutative random variables can be quickly computed via the characteristic function (i.e. the Fourier transform of the distribution), the distribution of sums of freely independent non-commutative random variables can be quickly computed using the Stieltjes transform instead (or with closely related objects, such as the $R$-transform of Voiculescu). This is strongly reminiscent of the appearance of the Stieltjes transform in random matrix theory, and indeed we will see many parallels between the use of the Stieltjes transform here and in Notes 4.
As mentioned earlier, free probability is an excellent tool for computing various expressions of interest in random matrix theory, such as asymptotic values of normalised moments in the large $n$ limit $n \to \infty$. Nevertheless, as it only covers the asymptotic regime in which $n$ is sent to infinity while holding all other parameters fixed, there are some aspects of random matrix theory which the tools of free probability are not sufficient by themselves to resolve (although it can be possible to combine free probability theory with other tools to then answer these questions). For instance, questions regarding the rate of convergence of normalised moments as $n \to \infty$ are not directly answered by free probability, though if free probability is combined with tools such as concentration of measure (Notes 1) then such rate information can often be recovered. For similar reasons, free probability lets one understand the behaviour of $k^{th}$ moments as $n \to \infty$ for fixed $k$, but has more difficulty dealing with the situation in which $k$ is allowed to grow slowly in $n$ (e.g. $k = O(\log n)$). Because of this, free probability methods are effective at controlling the bulk of the spectrum of a random matrix, but have more difficulty with the edges of that spectrum (as well as with related concepts such as the operator norm, Notes 3) and with the fine-scale structure of the spectrum. Finally, free probability methods are most effective when dealing with matrices that are Hermitian with bounded operator norm, largely because the spectral theory of bounded self-adjoint operators in the infinite-dimensional setting of the large $n$ limit is non-pathological. (This is ultimately due to the stable nature of eigenvalues in the self-adjoint setting; see this previous blog post for discussion.) For non-self-adjoint operators, free probability needs to be augmented with additional tools, most notably by bounds on least singular values, in order to recover the required stability for the various spectral data of random matrices to behave continuously with respect to the large $n$ limit. We will discuss this latter point in a later set of notes.
We can now turn attention to one of the centerpiece universality results in random matrix theory, namely the Wigner semi-circle law for Wigner matrices. Recall from previous notes that a Wigner Hermitian matrix ensemble is a random matrix ensemble $M_n = (\xi_{ij})_{1 \leq i,j \leq n}$ of Hermitian matrices (thus $\xi_{ij} = \overline{\xi_{ji}}$; this includes real symmetric matrices as an important special case), in which the upper-triangular entries $\xi_{ij}$, $i < j$ are iid complex random variables with mean zero and unit variance, and the diagonal entries $\xi_{ii}$ are iid real variables, independent of the upper-triangular entries, with bounded mean and variance. Particular special cases of interest include the Gaussian Orthogonal Ensemble (GOE), the symmetric random sign matrices (aka symmetric Bernoulli ensemble), and the Gaussian Unitary Ensemble (GUE).
In previous notes we saw that the operator norm of $M_n$ was typically of size $O(\sqrt{n})$, so it is natural to work with the normalised matrix $\frac{1}{\sqrt{n}} M_n$. Accordingly, given any $n \times n$ Hermitian matrix $M_n$, we can form the (normalised) empirical spectral distribution (or ESD for short) $\mu_{\frac{1}{\sqrt{n}} M_n} := \frac{1}{n} \sum_{j=1}^n \delta_{\lambda_j(M_n)/\sqrt{n}}$ of $M_n$, where $\lambda_1(M_n) \leq \ldots \leq \lambda_n(M_n)$ are the (necessarily real) eigenvalues of $M_n$, counting multiplicity. The ESD is a probability measure, which can be viewed as a distribution of the normalised eigenvalues of $M_n$.
When $M_n$ is a random matrix ensemble, then the ESD $\mu_{\frac{1}{\sqrt{n}} M_n}$ is now a random measure, i.e. a random variable taking values in the space $\hbox{Pr}({\bf R})$ of probability measures on the real line. (Thus, the distribution of $\mu_{\frac{1}{\sqrt{n}} M_n}$ is a probability measure on probability measures!)
Now we consider the behaviour of the ESD of a sequence of Hermitian matrix ensembles $M_n$ as $n \to \infty$. Recall from Notes 0 that for any sequence of random variables in a $\sigma$-compact metrisable space, one can define notions of convergence in probability and convergence almost surely. Specialising these definitions to the case of random probability measures on ${\bf R}$, and to deterministic limits, we see that a sequence of random ESDs $\mu_{\frac{1}{\sqrt{n}} M_n}$ converge in probability (resp. converge almost surely) to a deterministic limit $\mu \in \hbox{Pr}({\bf R})$ (which, confusingly enough, is a deterministic probability measure!) if, for every test function $\varphi \in C_c({\bf R})$, the quantities $\int_{\bf R} \varphi\ d\mu_{\frac{1}{\sqrt{n}} M_n}$ converge in probability (resp. converge almost surely) to $\int_{\bf R} \varphi\ d\mu$.
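Remark 1 As usual, convergence almost surely implies convergence in probability, but not vice versa. In the special case of random probability measures, there is an even weaker notion of convergence, namely convergence in expectation, defined as follows. Given a random ESD $\mu_{\frac{1}{\sqrt{n}} M_n}$, one can form its expectation ${\bf E} \mu_{\frac{1}{\sqrt{n}} M_n} \in \hbox{Pr}({\bf R})$, defined via duality (the Riesz representation theorem) as $\int_{\bf R} \varphi\ d{\bf E} \mu_{\frac{1}{\sqrt{n}} M_n} := {\bf E} \int_{\bf R} \varphi\ d\mu_{\frac{1}{\sqrt{n}} M_n}$; this probability measure can be viewed as the law of a random eigenvalue $\frac{1}{\sqrt{n}} \lambda_i(M_n)$ drawn from a random matrix $M_n$ from the ensemble. We then say that the ESDs converge in expectation to a limit $\mu \in \hbox{Pr}({\bf R})$ if ${\bf E} \mu_{\frac{1}{\sqrt{n}} M_n}$ converges in the vague topology to $\mu$, thus ${\bf E} \int_{\bf R} \varphi\ d\mu_{\frac{1}{\sqrt{n}} M_n} \to \int_{\bf R} \varphi\ d\mu$ for all $\varphi \in C_c({\bf R})$.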
In general, these notions of convergence are distinct from each other; but in practice, one often finds in random matrix theory that these notions are effectively equivalent to each other, thanks to the concentration of measure phenomenon.
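Exercise 1 Let $M_n$ be a sequence of $n \times n$ Hermitian matrix ensembles, and let $\mu$ be a continuous probability measure on ${\bf R}$.
- Show that $\mu_{\frac{1}{\sqrt{n}} M_n}$ converges almost surely to $\mu$ if and only if $\mu_{\frac{1}{\sqrt{n}} M_n}(-\infty, \lambda)$ converges almost surely to $\mu(-\infty, \lambda)$ for all $\lambda \in {\bf R}$.
- Show that $\mu_{\frac{1}{\sqrt{n}} M_n}$ converges in probability to $\mu$ if and only if $\mu_{\frac{1}{\sqrt{n}} M_n}(-\infty, \lambda)$ converges in probability to $\mu(-\infty, \lambda)$ for all $\lambda \in {\bf R}$.
- Show that $\mu_{\frac{1}{\sqrt{n}} M_n}$ converges in expectation to $\mu$ if and only if ${\bf E} \mu_{\frac{1}{\sqrt{n}} M_n}(-\infty, \lambda)$ converges to $\mu(-\infty, \lambda)$ for all $\lambda \in {\bf R}$.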
We can now state the Wigner semi-circular law.
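Theorem 1 (Semicircular law) Let $M_n$ be the top left $n \times n$ minors of an infinite Wigner matrix $(\xi_{ij})_{i,j \geq 1}$. Then the ESDs $\mu_{\frac{1}{\sqrt{n}} M_n}$ converge almost surely (and hence also in probability and in expectation) to the Wigner semi-circular distribution

$\displaystyle \mu_{sc} := \frac{1}{2\pi} (4 - |x|^2)^{1/2}_+\ dx. \ \ \ \ \ (1)$
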
A numerical example of this theorem in action can be seen at the MathWorld entry for this law.
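A small numerical experiment along these lines can be sketched as follows, assuming NumPy and the symmetric Bernoulli ensemble as the Wigner matrix; the matrix size and seed are arbitrary choices made here. The second and fourth moments of the semi-circular distribution are $1$ and $2$, and its mass on $[-1,1]$ is $\frac{1}{3} + \frac{\sqrt{3}}{2\pi}$, so these serve as reference values for the ESD.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500

# Symmetric random sign matrix: independent signs on and above the diagonal.
S = rng.choice([-1.0, 1.0], size=(n, n))
M = np.triu(S) + np.triu(S, 1).T

# Eigenvalues of the normalised matrix M / sqrt(n); the ESD is their empirical law.
eigs = np.linalg.eigvalsh(M) / np.sqrt(n)

second_moment = np.mean(eigs ** 2)          # semicircle value: 1
fourth_moment = np.mean(eigs ** 4)          # semicircle value: 2
frac_in_unit = np.mean(np.abs(eigs) <= 1)   # ESD mass of [-1, 1]
semicircle_mass = 1.0 / 3.0 + np.sqrt(3.0) / (2.0 * np.pi)
spectral_radius = np.max(np.abs(eigs))      # expected to be close to 2

print("second moment :", second_moment)
print("fourth moment :", fourth_moment)
print("mass of [-1,1]:", frac_in_unit, "vs", semicircle_mass)
print("largest |eig| :", spectral_radius)
```

A histogram of `eigs` (for instance with `np.histogram(eigs, bins=40, density=True)`) can be compared against the density in (1) in the same way.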
The semi-circular law nicely complements the upper Bai-Yin theorem from Notes 3, which asserts that (in the case when the entries have finite fourth moment, at least), the matrices $\frac{1}{\sqrt{n}} M_n$ almost surely have operator norm at most $2 + o(1)$. Note that the operator norm is the same thing as the largest magnitude of the eigenvalues. Because the semi-circular distribution (1) is supported on the interval $[-2,2]$ with positive density on the interior of this interval, Theorem 1 easily supplies the lower Bai-Yin theorem, that the operator norm of $\frac{1}{\sqrt{n}} M_n$ is almost surely at least $2 - o(1)$, and thus (in the finite fourth moment case) the norm is in fact equal to $2 + o(1)$. Indeed, we have just shown that the semi-circular law provides an alternate proof of the lower Bai-Yin bound (Proposition 11 of Notes 3).
As will hopefully become clearer in the next set of notes, the semi-circular law is the noncommutative (or free probability) analogue of the central limit theorem, with the semi-circular distribution (1) taking on the role of the normal distribution. Of course, there is a striking difference between the two distributions, in that the former is compactly supported while the latter is merely subgaussian. One reason for this is that the concentration of measure phenomenon is more powerful in the case of ESDs of Wigner matrices than it is for averages of iid variables; compare the concentration of measure results in Notes 3 with those in Notes 1.
There are several ways to prove (or at least to heuristically justify) the semi-circular law. In this set of notes we shall focus on the two most popular methods, the moment method and the Stieltjes transform method, together with a third (heuristic) method based on Dyson Brownian motion (Notes 3b). In the next set of notes we shall also study the free probability method, and in the set of notes after that we use the determinantal processes method (although this method is initially only restricted to highly symmetric ensembles, such as GUE).
One theme in this course will be the central role played by the gaussian random variables $N(\mu, \sigma^2)$. Gaussians have an incredibly rich algebraic structure, and many results about general random variables can be established by first using this structure to verify the result for gaussians, and then using universality techniques (such as the Lindeberg exchange strategy) to extend the results to more general variables.
One way to exploit this algebraic structure is to continuously deform the variance $t := \sigma^2$ from an initial variance of zero (so that the random variable is deterministic) to some final level $T$. We would like to use this to give a continuous family $t \mapsto X_t$ of random variables $X_t \equiv N(\mu, t)$ as $t$ (viewed as a “time” parameter) runs from $0$ to $T$.
At present, we have not completely specified what $X_t$ should be, because we have only described the individual distribution $X_t \equiv N(\mu, t)$ of each $X_t$, and not the joint distribution. However, there is a very natural way to specify a joint distribution of this type, known as Brownian motion. In these notes we lay the necessary probability theory foundations to set up this motion, and indicate its connection with the heat equation, the central limit theorem, and the Ornstein-Uhlenbeck process. This is the beginning of stochastic calculus, which we will not develop fully here.
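As a rough discrete illustration of such a joint distribution, one can approximate Brownian motion by summing independent gaussian increments of variance equal to the time step, so that the value at time $t$ has distribution $N(0, t)$. This is only a sketch under assumptions made here (initial value $\mu = 0$, a uniform time grid, and the hypothetical helper name `brownian_paths`), not the construction carried out in the notes.

```python
import numpy as np

def brownian_paths(num_paths, num_steps, T, rng):
    """Discretised Brownian paths on [0, T], started at 0.

    Each step adds an independent N(0, dt) increment, so the value at
    time t = k * dt is a sum of k increments with total variance t.
    """
    dt = T / num_steps
    increments = rng.normal(0.0, np.sqrt(dt), size=(num_paths, num_steps))
    start = np.zeros((num_paths, 1))
    return np.concatenate([start, np.cumsum(increments, axis=1)], axis=1)

rng = np.random.default_rng(6)
T = 2.0
num_paths, num_steps = 20000, 200
paths = brownian_paths(num_paths, num_steps, T, rng)
times = np.linspace(0.0, T, num_steps + 1)

# Empirical variance across paths at each time; should track t.
variances = paths.var(axis=0)
print("variance at time T  :", variances[-1], "vs", T)
print("variance at time T/2:", variances[num_steps // 2], "vs", T / 2)
```

The same recipe applied entrywise to the independent coordinates of a Hermitian matrix gives a discretised Brownian motion on the space of Hermitian matrices.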
We will begin with one-dimensional Brownian motion, but it is a simple matter to extend the process to higher dimensions. In particular, we can define Brownian motion on vector spaces of matrices, such as the space of $n \times n$ Hermitian matrices. This process is equivariant with respect to conjugation by unitary matrices, and so we can quotient out by this conjugation and obtain a new process on the quotient space, or in other words on the spectrum of $n \times n$ Hermitian matrices. This process is called Dyson Brownian motion, and turns out to have a simple description in terms of ordinary Brownian motion; it will play a key role in several of the subsequent notes in this course.
