
In the previous set of notes we introduced the notion of expansion in arbitrary {k}-regular graphs. For the rest of the course, we will focus attention primarily on a special type of {k}-regular graph, namely a Cayley graph.

Definition 1 (Cayley graph) Let {G = (G,\cdot)} be a group, and let {S} be a finite subset of {G}. We assume that {S} is symmetric (thus {s^{-1} \in S} whenever {s \in S}) and does not contain the identity {1} (this is to avoid loops). Then the (right-invariant) Cayley graph {Cay(G,S)} is defined to be the graph with vertex set {G} and edge set {\{ \{sx,x\}: x \in G, s \in S \}}, thus each vertex {x \in G} is connected to the {|S|} elements {sx} for {s \in S}, and so {Cay(G,S)} is a {|S|}-regular graph.

Example 1 The graph in Exercise 3 of Notes 1 is the Cayley graph on {{\bf Z}/N{\bf Z}} with generators {S = \{-1,+1\}}.
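For readers who want to experiment, here is a minimal numerical sketch of Example 1 (Python with numpy; the helper name cayley_adjacency is ours, not standard). It builds the adjacency matrix of {Cay({\bf Z}/N{\bf Z}, \{-1,+1\})}, checks that it is {|S|}-regular, and confirms the classical eigenvalues {2\cos(2\pi j/N)} of the cycle:

```python
import numpy as np

def cayley_adjacency(N, S):
    # Adjacency matrix of Cay(Z/NZ, S): each vertex x is joined to s + x for s in S.
    A = np.zeros((N, N), dtype=float)
    for x in range(N):
        for s in S:
            A[(s + x) % N, x] += 1
    return A

N = 12
A = cayley_adjacency(N, [1, -1])       # the cycle graph of Example 1
assert (A.sum(axis=0) == 2).all()      # |S| = 2, so the graph is 2-regular
assert (A == A.T).all()                # S is symmetric, so the graph is undirected

# The eigenvalues of the N-cycle are 2 cos(2 pi j / N), j = 0, ..., N-1.
eig = np.sort(np.linalg.eigvalsh(A))
expected = np.sort(2 * np.cos(2 * np.pi * np.arange(N) / N))
print(np.allclose(eig, expected))      # True
```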

Remark 1 We call the above Cayley graphs right-invariant because every right translation {x\mapsto xg} on {G} is a graph automorphism of {Cay(G,S)}. This group of automorphisms acts transitively on the vertex set of the Cayley graph. One can thus view a Cayley graph as a homogeneous space of {G}, as it “looks the same” from every vertex. One could of course also consider left-invariant Cayley graphs, in which {x} is connected to {xs} rather than {sx}. However, the two such graphs are isomorphic using the inverse map {x \mapsto x^{-1}}, so we may without loss of generality restrict our attention throughout to the right-invariant Cayley graphs defined above.

Remark 2 For minor technical reasons, it will be convenient later on to allow {S} to contain the identity and to come with multiplicity (i.e. it will be a multiset rather than a set). If one does so, of course, the resulting Cayley graph will now contain some loops and multiple edges.

For the purposes of building expander families, we would of course want the underlying group {G} to be finite. However, it will be convenient at various times to “lift” a finite Cayley graph up to an infinite one, and so we permit {G} to be infinite in our definition of a Cayley graph.

We will also sometimes consider a generalisation of a Cayley graph, known as a Schreier graph:

Definition 2 (Schreier graph) Let {G} be a finite group that acts (on the left) on a space {X}, thus there is a map {(g,x) \mapsto gx} from {G \times X} to {X} such that {1x = x} and {(gh)x = g(hx)} for all {g,h \in G} and {x \in X}. Let {S} be a symmetric subset of {G} which acts freely on {X} in the sense that {sx \neq x} for all {s \in S} and {x \in X}, and {sx \neq s'x} for all distinct {s,s' \in S} and {x \in X}. Then the Schreier graph {Sch(X,S)} is defined to be the graph with vertex set {X} and edge set {\{ \{sx,x\}: x \in X, s \in S \}}.

Example 2 Every Cayley graph {Cay(G,S)} is also a Schreier graph {Sch(G,S)}, using the obvious left-action of {G} on itself. The {k}-regular graphs formed from {l} permutations {\pi_1,\ldots,\pi_l \in S_n} that were studied in the previous set of notes are also Schreier graphs, provided that {\pi_i(v) \neq v, \pi_i^{-1}(v), \pi_j(v)} for all distinct {1 \leq i,j \leq l}, with the underlying group being the permutation group {S_n} (which acts on the vertex set {X = \{1,\ldots,n\}} in the obvious manner), and {S := \{\pi_1,\ldots,\pi_l,\pi_1^{-1},\ldots,\pi_l^{-1}\}}.

Exercise 1 If {k} is an even integer, show that every {k}-regular graph is a Schreier graph involving a set {S} of generators of cardinality {k/2}. (Hint: first show that every {k}-regular graph can be decomposed into {k/2} unions of cycles, each of which partitions the vertex set, then use the previous example.)

We return now to Cayley graphs. It is easy to characterise qualitative expansion properties of Cayley graphs:

Exercise 2 (Qualitative expansion) Let {Cay(G,S)} be a finite Cayley graph.

  • (i) Show that {Cay(G,S)} is a one-sided {\epsilon}-expander for {G} for some {\epsilon>0} if and only if {S} generates {G}.
  • (ii) Show that {Cay(G,S)} is a two-sided {\epsilon}-expander for {G} for some {\epsilon>0} if and only if {S} generates {G}, and furthermore {S} intersects each index {2} subgroup of {G}.

We will however be interested in more quantitative expansion properties, in which the expansion constant {\epsilon} is independent of the size of the Cayley graph, so that one can construct non-trivial expander families {Cay(G_n,S_n)} of Cayley graphs.

One can analyse the expansion of Cayley graphs in a number of ways. For instance, by taking the edge expansion viewpoint, one can study Cayley graphs combinatorially, using the product set operation

\displaystyle  A \cdot B := \{ab: a \in A, b \in B \}

of subsets of {G}.

Exercise 3 (Combinatorial description of expansion) Let {Cay(G_n,S_n)} be a family of finite {k}-regular Cayley graphs. Show that {Cay(G_n,S_n)} is a one-sided expander family if and only if there is a constant {c>0} independent of {n} such that {|E_n \cup S_n E_n| \geq (1+c) |E_n|} for all sufficiently large {n} and all subsets {E_n} of {G_n} with {|E_n| \leq |G_n|/2}.

One can also give a combinatorial description of two-sided expansion, but it is more complicated and we will not use it here.

Exercise 4 (Abelian groups do not expand) Let {Cay(G_n,S_n)} be a family of finite {k}-regular Cayley graphs, with the {G_n} all abelian, and the {S_n} generating {G_n}. Show that the {Cay(G_n,S_n)} form a one-sided expander family if and only if the Cayley graphs have bounded cardinality (i.e. {\sup_n |G_n| < \infty}). (Hint: assume for contradiction that {Cay(G_n,S_n)} is a one-sided expander family with {|G_n| \rightarrow \infty}, and show by two different arguments that {\sup_n |S_n^m|} grows at least exponentially in {m} and also at most polynomially in {m}, giving the desired contradiction.)
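The failure of expansion for abelian groups can be seen concretely in the simplest example, the cycles of Example 1. The following sketch (numpy again; we take the spectral viewpoint, measuring one-sided expansion by the gap between {1} and the second-largest eigenvalue of the normalised adjacency operator {\frac{1}{|S|} A}) shows the gap decaying like {1/N^2}, consistent with Exercise 4:

```python
import numpy as np

def cycle_spectral_gap(N):
    # One-sided spectral gap 1 - lambda_2 of Cay(Z/NZ, {+1,-1}), where
    # lambda_2 is the second-largest eigenvalue of (1/|S|) A.
    A = np.zeros((N, N))
    for x in range(N):
        A[(x + 1) % N, x] = A[(x - 1) % N, x] = 1.0
    eig = np.sort(np.linalg.eigvalsh(A / 2.0))
    return 1.0 - eig[-2]               # eig[-1] = 1 is the trivial eigenvalue

for N in [10, 100, 1000]:
    # Exact value: 1 - cos(2 pi / N) ~ 2 pi^2 / N^2, so the gap degenerates.
    print(N, cycle_spectral_gap(N))
```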

The right-invariant nature of Cayley graphs also suggests that such graphs can be profitably analysed using some sort of Fourier analysis; as the underlying symmetry group is not necessarily abelian, one should use the Fourier analysis of non-abelian groups, which is better known as (unitary) representation theory. The Fourier-analytic nature of Cayley graphs can be highlighted by recalling the operation of convolution of two functions {f, g \in \ell^2(G)}, defined by the formula

\displaystyle  f * g(x) := \sum_{y \in G} f(y) g(y^{-1} x) = \sum_{y \in G} f(x y^{-1}) g(y).

This convolution operation is bilinear and associative (at least when one imposes a suitable decay condition on the functions, such as compact support), but is not commutative unless {G} is abelian. (If one is more algebraically minded, one can also identify {\ell^2(G)} (when {G} is finite, at least) with the group algebra {{\bf C} G}, in which case convolution is simply the multiplication operation in this algebra.) The adjacency operator {A} on a Cayley graph {Cay(G,S)} can then be viewed as a convolution

\displaystyle  Af = |S| \mu * f,

where {\mu} is the probability density

\displaystyle  \mu := \frac{1}{|S|} \sum_{s \in S} \delta_s \ \ \ \ \ (1)

where {\delta_s} is the Kronecker delta function on {s}. Using the spectral definition of expansion, we thus see that {Cay(G,S)} is a one-sided expander if and only if

\displaystyle  \langle f, \mu*f \rangle \leq (1-\epsilon) \|f\|_{\ell^2(G)}^2 \ \ \ \ \ (2)

whenever {f \in \ell^2(G)} is orthogonal to the constant function {1}, and is a two-sided expander if and only if

\displaystyle  \| \mu*f \|_{\ell^2(G)} \leq (1-\epsilon) \|f\|_{\ell^2(G)} \ \ \ \ \ (3)

whenever {f \in \ell^2(G)} is orthogonal to the constant function {1}.

We remark that the above spectral definition of expansion can be easily extended to symmetric sets {S} which contain the identity or have multiplicity (i.e. are multisets). (We retain symmetry, though, in order to keep the operation of convolution by {\mu} self-adjoint.) In particular, one can say (with some slight abuse of notation) that a set of elements {s_1,\ldots,s_l} of {G} (possibly with repetition, and possibly with some elements equalling the identity) generates a one-sided or two-sided {\epsilon}-expander if the associated symmetric probability density

\displaystyle  \mu := \frac{1}{2l} \sum_{i=1}^l (\delta_{s_i} + \delta_{s_i^{-1}})

obeys either (2) or (3).
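As a quick sanity check on the convolution identity {Af = |S| \mu * f} from above, here is a sketch on the small nonabelian group {S_3} (pure Python plus numpy; the encoding of permutations as tuples and all helper names are ad hoc):

```python
import itertools
import numpy as np

# Elements of S_3 as tuples; compose(p, q) is the permutation "p after q".
G = list(itertools.permutations(range(3)))
idx = {g: i for i, g in enumerate(G)}
compose = lambda p, q: tuple(p[q[i]] for i in range(3))
inverse = lambda p: tuple(p.index(j) for j in range(3))

S = [(1, 0, 2), (0, 2, 1)]     # two transpositions: symmetric, no identity
n = len(G)

# Adjacency operator Af(x) = sum_{s in S} f(sx).
A = np.zeros((n, n))
for x in G:
    for s in S:
        A[idx[x], idx[compose(s, x)]] += 1

# mu = (1/|S|) sum_s delta_s, and (f*g)(x) = sum_y f(y) g(y^{-1} x).
mu = np.zeros(n)
for s in S:
    mu[idx[s]] += 1.0 / len(S)

def convolve(f, g):
    h = np.zeros(n)
    for y in G:
        for x in G:
            h[idx[x]] += f[idx[y]] * g[idx[compose(inverse(y), x)]]
    return h

f = np.random.default_rng(0).standard_normal(n)
print(np.allclose(A @ f, len(S) * convolve(mu, f)))   # True
```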

We saw in the last set of notes that expansion can be characterised in terms of random walks. One can of course specialise this characterisation to the Cayley graph case:

Exercise 5 (Random walk description of expansion) Let {Cay(G_n,S_n)} be a family of finite {k}-regular Cayley graphs, and let {\mu_n} be the associated probability density functions. Let {A > 1/2} be a constant. (A toy numerical illustration of the first criterion below is sketched after this exercise.)

  • Show that the {Cay(G_n,S_n)} are a two-sided expander family if and only if there exists a {C>0} such that for all sufficiently large {n}, one has {\| \mu_n^{*m} - \frac{1}{|G_n|} \|_{\ell^2(G_n)} \leq \frac{1}{|G_n|^A}} for some {m \leq C \log |G_n|}, where {\mu_n^{*m} := \mu_n * \ldots * \mu_n} denotes the convolution of {m} copies of {\mu_n}.
  • Show that the {Cay(G_n,S_n)} are a one-sided expander family if and only if there exists a {C>0} such that for all sufficiently large {n}, one has {\| (\frac{1}{2} \delta_1 + \frac{1}{2} \mu_n)^{*m} - \frac{1}{|G_n|} \|_{\ell^2(G_n)} \leq \frac{1}{|G_n|^A}} for some {m \leq C \log |G_n|}.
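As a toy illustration of the first criterion (for a single small group, which of course says nothing by itself about an expander family), one can take {G = SL_2({\bf F}_5)} with the four elementary generators and watch {\mu^{*m}} equidistribute; the {\ell^2} distance to the uniform distribution decays geometrically in this example. This is only a rough sketch (pure Python plus numpy, with ad hoc encodings):

```python
import numpy as np

p = 5
S = [((1, 1), (0, 1)), ((1, p - 1), (0, 1)),
     ((1, 0), (1, 1)), ((1, 0), (p - 1, 1))]   # symmetric generating set

def mul(a, b):   # 2x2 matrix product over F_p
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2)) % p
                       for j in range(2)) for i in range(2))

# Generate SL_2(F_p) by breadth-first closure from the generators.
I = ((1, 0), (0, 1))
G, frontier = {I}, [I]
while frontier:
    new = []
    for g in frontier:
        for s in S:
            h = mul(s, g)
            if h not in G:
                G.add(h)
                new.append(h)
    frontier = new

G = sorted(G)
idx = {g: i for i, g in enumerate(G)}
n = len(G)                                 # p(p^2 - 1) = 120 for p = 5

# One step of the walk sends mass at g to s g for each s, with weight 1/|S|.
step = np.zeros((n, n))
for g in G:
    for s in S:
        step[idx[mul(s, g)], idx[g]] += 1.0 / len(S)

dist = np.zeros(n)
dist[idx[I]] = 1.0                         # delta at the identity
for m in range(1, 13):
    dist = step @ dist                     # dist is now mu^{*m}
    print(m, np.linalg.norm(dist - 1.0 / n))
```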

In this set of notes, we will connect expansion of Cayley graphs to an important property of certain infinite groups, known as Kazhdan’s property (T) (or property (T) for short). In 1973, Margulis exploited this property to create the first known explicit and deterministic examples of expanding Cayley graphs. As it turns out, property (T) is somewhat overpowered for this purpose; in particular, we now know that there are many families of Cayley graphs for which the associated infinite group does not obey property (T) (or weaker variants of this property, such as property {\tau}). In later notes we will therefore turn to other methods of creating Cayley graphs that do not rely on property (T). Nevertheless, property (T) is of substantial intrinsic interest, and also has many connections to parts of mathematics other than the theory of expander graphs, so it is worth spending some time to discuss it here.

The material here is based in part on this recent text on property (T) by Bekka, de la Harpe, and Valette (available online here).


The objective of this course is to present a number of recent constructions of expander graphs, which are a type of sparse but “pseudorandom” graph of importance in computer science, the theory of random walks, geometric group theory, and in number theory. The subject of expander graphs and their applications is an immense one, and we will not be able to cover it in full in this course. In particular, we will say almost nothing about the important applications of expander graphs to computer science, for instance in constructing good pseudorandom number generators, derandomising a probabilistic algorithm, constructing error correcting codes, or in building probabilistically checkable proofs. For such topics, I recommend the survey of Hoory-Linial-Wigderson. We will also only pass very lightly over the other applications of expander graphs, though if time permits I may discuss at the end of the course the application of expander graphs in finite groups such as {SL_2(F_p)} to certain sieving problems in analytic number theory, following the work of Bourgain, Gamburd, and Sarnak.

Instead of focusing on applications, this course will concern itself much more with the task of constructing expander graphs. This is a surprisingly non-trivial problem. On one hand, an easy application of the probabilistic method shows that a randomly chosen (large, regular, bounded-degree) graph will be an expander graph with very high probability, so expander graphs are extremely abundant. On the other hand, in many applications, one wants an expander graph that is more deterministic in nature (requiring either no or very few random choices to build), and of a more specialised form. For the applications to number theory or geometric group theory, it is of particular interest to determine the expansion properties of a very symmetric type of graph, namely a Cayley graph; we will also occasionally work with the more general concept of a Schreier graph. It turns out that such questions are related to deep properties of various groups {G} of Lie type (such as {SL_2({\bf R})} or {SL_2({\bf Z})}), such as Kazhdan’s property (T), the first nontrivial eigenvalue of a Laplacian on a symmetric space {G/\Gamma} associated to {G}, the quasirandomness of {G} (as measured by the size of irreducible representations), and the product theory of subsets of {G}. These properties are of intrinsic interest to many other fields of mathematics (e.g. ergodic theory, operator algebras, additive combinatorics, representation theory, finite group theory, number theory, etc.), and it is quite remarkable that a single problem – namely the construction of expander graphs – is so deeply connected with such a rich and diverse array of mathematical topics. (Perhaps this is because so many of these fields are all grappling with aspects of a single general problem in mathematics, namely when to determine whether a given mathematical object or process of interest “behaves pseudorandomly” or not, and how this is connected with the symmetry group of that object or process.)

(There are also other important constructions of expander graphs that are not related to Cayley or Schreier graphs, such as those graphs constructed by the zigzag product construction, but we will not discuss those types of graphs in this course, again referring the reader to the survey of Hoory, Linial, and Wigderson.)


Van Vu and I have just uploaded to the arXiv our paper A central limit theorem for the determinant of a Wigner matrix, submitted to Adv. Math. It studies the asymptotic distribution of the determinant {\det M_n} of a random Wigner matrix (such as a matrix drawn from the Gaussian Unitary Ensemble (GUE) or Gaussian Orthogonal Ensemble (GOE)).

Before we get to these results, let us first discuss the simpler problem of studying the determinant {\det A_n} of a random iid matrix {A_n = (\zeta_{ij})_{1 \leq i,j \leq n}}, such as a real gaussian matrix (where all entries are independently and identically distributed using the standard real normal distribution {\zeta_{ij} \equiv N(0,1)_{\bf R}}), a complex gaussian matrix (where all entries are independently and identically distributed using the standard complex normal distribution {\zeta_{ij} \equiv N(0,1)_{\bf C}}, thus the real and imaginary parts are independent with law {N(0,1/2)_{\bf R}}), or the random sign matrix (in which all entries are independently and identically distributed according to the Bernoulli distribution {\zeta_{ij} \equiv \pm 1}, with a {1/2} chance of either sign). More generally, one can consider a matrix {A_n} in which all the entries {\zeta_{ij}} are independently and identically distributed with mean zero and variance {1}.

We can expand {\det A_n} using the Leibniz expansion

\displaystyle  \det A_n = \sum_{\sigma \in S_n} I_\sigma, \ \ \ \ \ (1)

where {\sigma: \{1,\ldots,n\} \rightarrow \{1,\ldots,n\}} ranges over the permutations of {\{1,\ldots,n\}}, and {I_\sigma} is the product

\displaystyle  I_\sigma := \hbox{sgn}(\sigma) \prod_{i=1}^n \zeta_{i\sigma(i)}.

From the iid nature of the {\zeta_{ij}}, we easily see that each {I_\sigma} has mean zero and variance one, and that the {I_\sigma} are pairwise uncorrelated as {\sigma} varies. We conclude that {\det A_n} has mean zero and variance {n!} (an observation first made by Turán). In particular, from Chebyshev’s inequality we see that {\det A_n} is typically of size {O(\sqrt{n!})}.
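Turán’s mean and variance computation is easy to test empirically; a quick Monte Carlo sketch (numpy, random sign matrices, small {n}; all parameter choices are arbitrary):

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(0)
n, trials = 6, 200_000

# det A_n for random sign matrices: mean zero, variance n!.
dets = np.array([np.linalg.det(rng.choice([-1.0, 1.0], size=(n, n)))
                 for _ in range(trials)])
print(dets.mean())                   # ~ 0 (up to Monte Carlo error)
print(dets.var() / factorial(n))     # ~ 1
```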

It turns out, though, that this is not quite best possible. This is easiest to explain in the real gaussian case, by performing a computation first made by Goodman. In this case, the distribution of {\det A_n} is clearly symmetric, so we can focus attention on the magnitude {|\det A_n|}. We can interpret this quantity geometrically as the volume of an {n}-dimensional parallelepiped whose generating vectors {X_1,\ldots,X_n} are independent real gaussian vectors in {{\bf R}^n} (i.e. their coefficients are iid with law {N(0,1)_{\bf R}}). Using the classical base-times-height formula, we thus have

\displaystyle  |\det A_n| = \prod_{i=1}^n \hbox{dist}(X_i, V_i) \ \ \ \ \ (2)

where {V_i} is the {i-1}-dimensional linear subspace of {{\bf R}^n} spanned by {X_1,\ldots,X_{i-1}} (note that {X_1,\ldots,X_n}, having an absolutely continuous joint distribution, are almost surely linearly independent). Taking logarithms, we conclude

\displaystyle  \log |\det A_n| = \sum_{i=1}^n \log \hbox{dist}(X_i, V_i).

Now, we take advantage of a fundamental symmetry property of the Gaussian vector distribution, namely its invariance with respect to the orthogonal group {O(n)}. Because of this, we see that if we fix {X_1,\ldots,X_{i-1}} (and thus {V_i}), the random variable {\hbox{dist}(X_i,V_i)} has the same distribution as {\hbox{dist}(X_i,{\bf R}^{i-1})}, or equivalently the {\chi} distribution

\displaystyle  \chi_{n-i+1} := (\sum_{j=1}^{n-i+1} \xi_{n-i+1,j}^2)^{1/2}

where {\xi_{n-i+1,1},\ldots,\xi_{n-i+1,n-i+1}} are iid copies of {N(0,1)_{\bf R}}. As this distribution does not depend on {X_1,\ldots,X_{i-1}}, we conclude that the law of {\log |\det A_n|} is that of a sum of logarithms of {n} independent {\chi}-variables:

\displaystyle  \log |\det A_n| \equiv \sum_{j=1}^{n} \log \chi_j.

A standard computation shows that each {\chi_j^2} has mean {j} and variance {2j}, and then a Taylor series (or Ito calculus) computation (using concentration of measure tools to control tails) shows that {\log \chi_j} has mean {\frac{1}{2} \log j - \frac{1}{2j} + O(1/j^{3/2})} and variance {\frac{1}{2j}+O(1/j^{3/2})}. As such, {\log |\det A_n|} has mean {\frac{1}{2} \log n! - \frac{1}{2} \log n + O(1)} and variance {\frac{1}{2} \log n + O(1)}. Applying a suitable version of the central limit theorem, one obtains the asymptotic law

\displaystyle  \frac{\log |\det A_n| - \frac{1}{2} \log n! + \frac{1}{2} \log n}{\sqrt{\frac{1}{2}\log n}} \rightarrow N(0,1)_{\bf R}, \ \ \ \ \ (3)

where {\rightarrow} denotes convergence in distribution. A bit more informally, we have

\displaystyle  |\det A_n| \approx n^{-1/2} \sqrt{n!} \exp( N( 0, \log n / 2 )_{\bf R} ) \ \ \ \ \ (4)

when {A_n} is a real gaussian matrix; thus, for instance, the median value of {|\det A_n|} is {n^{-1/2+o(1)} \sqrt{n!}}. At first glance, this appears to conflict with the second moment bound {\mathop{\bf E} |\det A_n|^2 = n!} of Turán mentioned earlier, but once one recalls that {\exp(N(0,t)_{\bf R})} has a second moment of {\exp(2t)}, we see that the two facts are in fact perfectly consistent; the upper tail of the normal distribution in the exponent in (4) ends up dominating the second moment.
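Here is a short Monte Carlo sketch of (3) (numpy only; parameter choices arbitrary). Since the variance in (3) is only logarithmic in {n}, the convergence is quite slow, and only rough agreement with the standard normal should be expected at moderate {n}:

```python
import numpy as np
from math import lgamma, log

rng = np.random.default_rng(0)
n, trials = 400, 2000

logdets = np.empty(trials)
for t in range(trials):
    # slogdet returns (sign, log|det|), avoiding overflow for large n
    logdets[t] = np.linalg.slogdet(rng.standard_normal((n, n)))[1]

# Standardise according to (3); lgamma(n+1) = log n!.
z = (logdets - 0.5 * lgamma(n + 1.0) + 0.5 * log(n)) / np.sqrt(0.5 * log(n))
print(z.mean(), z.std())   # roughly 0 and 1, with log-slow convergence
```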

It turns out that the central limit theorem (3) is valid for any real iid matrix with mean zero, variance one, and an exponential decay condition on the entries; this was first claimed by Girko, though the arguments in that paper appear to be incomplete. Another proof of this result, with more quantitative bounds on the convergence rate, has recently been obtained by Hoi Nguyen and Van Vu. The basic idea in these arguments is to express the logarithm of the product in (2) in terms of a martingale and apply the martingale central limit theorem.

If one works with complex gaussian random matrices instead of real gaussian random matrices, the above computations change slightly (one has to replace the real {\chi} distribution with the complex {\chi} distribution, in which the {\xi_{i,j}} are distributed according to the complex gaussian {N(0,1)_{\bf C}} instead of the real one). At the end of the day, one ends up with the law

\displaystyle  \frac{\log |\det A_n| - \frac{1}{2} \log n! + \frac{1}{4} \log n}{\sqrt{\frac{1}{4}\log n}} \rightarrow N(0,1)_{\bf R}, \ \ \ \ \ (5)

or more informally

\displaystyle  |\det A_n| \approx n^{-1/4} \sqrt{n!} \exp( N( 0, \log n / 4 )_{\bf R} ) \ \ \ \ \ (6)

(but note that this new asymptotic is still consistent with Turán’s second moment calculation).

We can now turn to the results of our paper. Here, we replace the iid matrices {A_n} by Wigner matrices {M_n = (\zeta_{ij})_{1 \leq i,j \leq n}}, which are defined similarly but are constrained to be Hermitian (or real symmetric), thus {\zeta_{ij} = \overline{\zeta_{ji}}} for all {i,j}. Model examples here include the Gaussian Unitary Ensemble (GUE), in which {\zeta_{ij} \equiv N(0,1)_{\bf C}} for {1 \leq i < j \leq n} and {\zeta_{ij} \equiv N(0,1)_{\bf R}} for {1 \leq i=j \leq n}, the Gaussian Orthogonal Ensemble (GOE), in which {\zeta_{ij} \equiv N(0,1)_{\bf R}} for {1 \leq i < j \leq n} and {\zeta_{ij} \equiv N(0,2)_{\bf R}} for {1 \leq i=j \leq n}, and the symmetric Bernoulli ensemble, in which {\zeta_{ij} \equiv \pm 1} for {1 \leq i \leq j \leq n} (with probability {1/2} of either sign). In all cases, the upper triangular entries of the matrix are assumed to be jointly independent. For a more precise definition of the Wigner matrix ensembles we are considering, see the introduction to our paper.

The determinants {\det M_n} of these matrices still have a Leibniz expansion. However, in the Wigner case, the mean and variance of the {I_\sigma} are slightly different, and what is worse, they are not all pairwise uncorrelated any more. For instance, the mean of {I_\sigma} is still usually zero, but equals {(-1)^{n/2}} in the exceptional case when {\sigma} is a perfect matching (i.e. the union of exactly {n/2} {2}-cycles, a possibility that can of course only happen when {n} is even). As such, the mean {\mathop{\bf E} \det M_n} still vanishes when {n} is odd, but for even {n} it is equal to

\displaystyle  (-1)^{n/2} \frac{n!}{(n/2)!2^{n/2}}

(the fraction here simply being the number of perfect matchings on {n} vertices). Using Stirling’s formula, one then computes that {|\mathop{\bf E} \det M_n|} is comparable to {n^{-1/4} \sqrt{n!}} when {n} is large and even. The second moment calculation is more complicated (and uses facts about the distribution of cycles in random permutations, mentioned in this previous post), but one can compute that {\mathop{\bf E} |\det M_n|^2} is comparable to {n^{1/2} n!} for GUE and {n^{3/2} n!} for GOE. (The discrepancy here comes from the fact that in the GOE case, {I_\sigma} and {I_\rho} can correlate when {\rho} contains reversals of {k}-cycles of {\sigma} for {k \geq 3}, but this does not happen in the GUE case.) For GUE, much more precise asymptotics for the moments of the determinant are known, starting from the work of Brezin and Hikami, though we do not need these more sophisticated computations here.
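The mean formula is easy to probe numerically for small even {n}; for instance, for {n=4} it predicts {\mathop{\bf E} \det M_4 = 4!/(2! \cdot 2^2) = 3}. A Monte Carlo sketch (numpy; sampling GUE from a matrix {Z} of iid {N(0,1)_{\bf C}} entries via {(Z+Z^*)/\sqrt{2}} is one standard recipe):

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(0)
n, trials = 4, 200_000

def gue(n):
    # Off-diagonal entries N(0,1)_C, diagonal entries N(0,1)_R.
    Z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    return (Z + Z.conj().T) / np.sqrt(2)

# det of a Hermitian matrix is real; take .real to drop roundoff.
dets = np.array([np.linalg.det(gue(n)).real for _ in range(trials)])
target = (-1) ** (n // 2) * factorial(n) / (factorial(n // 2) * 2 ** (n // 2))
print(dets.mean(), target)     # both ~ 3 for n = 4
```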

Our main results are then as follows.

Theorem 1 Let {M_n} be a Wigner matrix.

  • If {M_n} is drawn from GUE, then

    \displaystyle  \frac{\log |\det M_n| - \frac{1}{2} \log n! + \frac{1}{4} \log n}{\sqrt{\frac{1}{2}\log n}} \rightarrow N(0,1)_{\bf R}.

  • If {M_n} is drawn from GOE, then

    \displaystyle  \frac{\log |\det M_n| - \frac{1}{2} \log n! + \frac{1}{4} \log n}{\sqrt{\log n}} \rightarrow N(0,1)_{\bf R}.

  • The previous two results also hold for more general Wigner matrices, assuming that the real and imaginary parts are independent, a finite moment condition is satisfied, and the entries match moments with those of GOE or GUE to fourth order. (See the paper for a more precise formulation of the result.)

Thus, we informally have

\displaystyle  |\det M_n| \approx n^{-1/4} \sqrt{n!} \exp( N( 0, \log n / 2 )_{\bf R} )

when {M_n} is drawn from GUE, or from another Wigner ensemble matching GUE to fourth order (and obeying some additional minor technical hypotheses); and

\displaystyle  |\det M_n| \approx n^{-1/4} \sqrt{n!} \exp( N( 0, \log n )_{\bf R} )

when {M_n} is drawn from GOE, or from another Wigner ensemble matching GOE to fourth order. Again, these asymptotic limiting distributions are consistent with the asymptotic behaviour for the second moments.

The extension from the GUE or GOE case to more general Wigner ensembles is a fairly routine application of the four moment theorem for Wigner matrices, although for various technical reasons we do not quite use the existing four moment theorems in the literature, but adapt them to the log determinant. The main idea is to express the log-determinant as an integral

\displaystyle  \log|\det M_n| = \frac{1}{2} n \log n - n \hbox{Im} \int_0^\infty s(\sqrt{-1}\eta)\ d\eta \ \ \ \ \ (7)

of the Stieltjes transform

\displaystyle  s(z) := \frac{1}{n} \hbox{tr} \left( (\frac{1}{\sqrt{n}} M_n - z)^{-1} \right)

of {M_n}. Strictly speaking, the integral in (7) is divergent at infinity (and also can be ill-behaved near zero), but this can be addressed by standard truncation and renormalisation arguments (combined with known facts about the least singular value of Wigner matrices), which we omit here. We then use a variant of the four moment theorem for the Stieltjes transform, as used by Erdos, Yau, and Yin (based on a previous four moment theorem for individual eigenvalues introduced by Van Vu and myself). The four moment theorem is proven by the now-standard Lindeberg exchange method, combined with the usual resolvent identities to control the behaviour of the resolvent (and hence the Stieltjes transform) with respect to modifying one or two entries, together with the delocalisation of eigenvector property (which in turn arises from local semicircle laws) to control the error terms.

Somewhat surprisingly (to us, at least), it turned out that it was the first part of the theorem (namely, the verification of the limiting law for the invariant ensembles GUE and GOE) that was more difficult than the extension to the Wigner case. Even in an ensemble as highly symmetric as GUE, the rows are no longer independent, and the formula (2) is basically useless for getting any non-trivial control on the log determinant. There is an explicit formula for the joint distribution of the eigenvalues of GUE (or GOE), which does eventually give the cumulants of the log-determinant and hence the required central limit theorem; but this is a lengthy computation, first performed by Delannay and Le Caer.

Following a suggestion of my colleague, Rowan Killip, we give an alternate proof of this central limit theorem in the GUE and GOE cases, by using a beautiful observation of Trotter, namely that the GUE or GOE ensemble can be conjugated into a tractable tridiagonal form. Let me state it just for GUE:

Proposition 2 (Tridiagonal form of GUE; due to Trotter) Let {M'_n} be the random tridiagonal real symmetric matrix

\displaystyle  M'_n = \begin{pmatrix} a_1 & b_1 & 0 & \ldots & 0 & 0 \\ b_1 & a_2 & b_2 & \ldots & 0 & 0 \\ 0 & b_2 & a_3 & \ldots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \ldots & a_{n-1} & b_{n-1} \\ 0 & 0 & 0 & \ldots & b_{n-1} & a_n \end{pmatrix}

where the {a_1,\ldots,a_n, b_1,\ldots,b_{n-1}} are jointly independent real random variables, with {a_1,\ldots,a_n \equiv N(0,1)_{\bf R}} being standard real Gaussians, and each {b_i} having a {\chi}-distribution:

\displaystyle  b_i = (\sum_{j=1}^i |z_{i,j}|^2)^{1/2}

where {z_{i,j} \equiv N(0,1)_{\bf C}} are iid complex gaussians. Let {M_n} be drawn from GUE. Then the joint eigenvalue distribution of {M_n} is identical to the joint eigenvalue distribution of {M'_n}.

Proof: Let {M_n} be drawn from GUE. We can write

\displaystyle  M_n = \begin{pmatrix} M_{n-1} & X_n \\ X_n^* & a_n \end{pmatrix}

where {M_{n-1}} is drawn from the {n-1\times n-1} GUE, {a_n \equiv N(0,1)_{\bf R}}, and {X_n \in {\bf C}^{n-1}} is a random gaussian vector with all entries iid with distribution {N(0,1)_{\bf C}}. Furthermore, {M_{n-1}, X_n, a_n} are jointly independent.

We now apply the tridiagonal matrix algorithm. Let {b_{n-1} := |X_n|}; then {b_{n-1}} has the {\chi}-distribution indicated in the proposition. We then conjugate {M_n} by a unitary matrix {U} that preserves the final basis vector {e_n}, and maps {X_n} to {b_{n-1} e_{n-1}}. Then we have

\displaystyle  U M_n U^* = \begin{pmatrix} \tilde M_{n-1} & b_{n-1} e_{n-1} \\ b_{n-1} e_{n-1}^* & a_n \end{pmatrix}

where {\tilde M_{n-1}} is conjugate to {M_{n-1}}. Now we make the crucial observation: because {M_{n-1}} is distributed according to GUE (which is a unitarily invariant ensemble), and {U} is a unitary matrix independent of {M_{n-1}}, {\tilde M_{n-1}} is also distributed according to GUE, and remains independent of both {b_{n-1}} and {a_n}.

We continue this process, expanding {U M_n U^*} as

\displaystyle \begin{pmatrix} M_{n-2} & X_{n-1} & 0 \\ X_{n-1}^* & a_{n-1} & b_{n-1} \\ 0 & b_{n-1} & a_n \end{pmatrix}.

Applying a further unitary conjugation that fixes {e_{n-1}, e_n} but maps {X_{n-1}} to {b_{n-2} e_{n-2}}, we may replace {X_{n-1}} by {b_{n-2} e_{n-2}} while transforming {M_{n-2}} to another GUE matrix {\tilde M_{n-2}} independent of {a_n, b_{n-1}, a_{n-1}, b_{n-2}}. Iterating this process, we eventually obtain a coupling of {M_n} to {M'_n} by unitary conjugations, and the claim follows. \Box

The determinant of a tridiagonal matrix is not quite as simple as the determinant of a triangular matrix (in which it is simply the product of the diagonal entries), but it is pretty close: the determinant {D_n} of the above matrix is given by solving the recursion

\displaystyle  D_i = a_i D_{i-1} - b_{i-1}^2 D_{i-2}

with {D_0=1} and {D_{-1} = 0}. Thus, instead of the product of a sequence of independent scalar {\chi} distributions as in the gaussian matrix case, the determinant of GUE ends up being controlled by the product of a sequence of independent {2\times 2} matrices whose entries are given by gaussians and {\chi} distributions. In this case, one cannot immediately take logarithms and hope to get something for which the martingale central limit theorem can be applied, but some ad hoc manipulation of these {2 \times 2} matrix products eventually does make this strategy work. (Roughly speaking, one has to work with the logarithm of the Frobenius norm of the matrix first.)

Igor Rodnianski and I have just uploaded to the arXiv our paper “Effective limiting absorption principles, and applications“, submitted to Communications in Mathematical Physics. In this paper we derive limiting absorption principles (of the type discussed in this recent post) for a general class of Schrödinger operators {H = -\Delta + V} on a wide class of manifolds, namely the asymptotically conic manifolds. The precise definition of such manifolds is somewhat technical, but they include as a special case the asymptotically flat manifolds, which in turn include as a further special case the smooth compact perturbations of Euclidean space {{\bf R}^n} (i.e. the smooth Riemannian manifolds that are identical to {{\bf R}^n} outside of a compact set). The potential {V} is assumed to be a short range potential, which roughly speaking means that it decays faster than {1/|x|} as {x \rightarrow \infty}; for several of the applications (particularly at very low energies) we in fact need to assume that {V} is a strongly short range potential, which roughly speaking means that it decays faster than {1/|x|^2}.

To begin with, we make no hypotheses about the topology or geodesic geometry of the manifold {M}; in particular, we allow {M} to be trapping in the sense that it contains geodesic flows that do not escape to infinity, but instead remain trapped in a bounded subset of {M}. We also allow the potential {V} to be signed, which in particular allows bound states (eigenfunctions of negative energy) to be created. For standard technical reasons we restrict attention to dimensions three and higher: {d \geq 3}.

It is well known that such Schrödinger operators {H} are essentially self-adjoint, and their spectrum consists of purely absolutely continuous spectrum on {(0,+\infty)}, together with possibly some eigenvalues at zero and negative energy (and at zero energy and in dimensions three and four, there is also the possibility of resonances, which, while not strictly eigenvalues, have a somewhat analogous effect on the dynamics of the Laplacian and related objects, such as resolvents). In particular, the resolvents {R(\lambda \pm i\epsilon) := (H - \lambda \mp i\epsilon)^{-1}} make sense as bounded operators on {L^2(M)} for any {\lambda \in {\bf R}} and {\epsilon > 0}. As discussed in the previous blog post, it is of interest to obtain bounds for the behaviour of these resolvents, as this can then be used via some functional calculus manipulations to obtain control on many other operators and PDE relating to the Schrödinger operator {H}, such as the Helmholtz equation, the time-dependent Schrödinger equation, and the wave equation. In particular, it is of interest to obtain limiting absorption estimates such as

\displaystyle  \| R(\lambda \pm i\epsilon) f \|_{H^{0,-1/2-\sigma}(M)} \leq C(M,V,\lambda,\sigma) \| f \|_{H^{0,1/2+\sigma}(M)} \ \ \ \ \ (1)

for {\lambda \in {\bf R}} (and particularly in the positive energy regime {\lambda>0}), where {\sigma,\epsilon > 0} and {f} is an arbitrary test function. The constant {C(M,V,\lambda,\sigma)} needs to be independent of {\epsilon} for such estimates to be truly useful, but it is also of interest to determine the extent to which these constants depend on {M}, {V}, and {\lambda}. The dependence on {\sigma} is relatively uninteresting and henceforth we will suppress it. In particular, our paper focused to a large extent on quantitative methods that could give effective bounds on {C(M,V,\lambda)} in terms of quantities such as the magnitude {A} of the potential {V} in a suitable norm.

It turns out to be convenient to distinguish between three regimes:

  • The high-energy regime {\lambda \gg 1};
  • The medium-energy regime {\lambda \sim 1}; and
  • The low-energy regime {0 < \lambda \ll 1}.

Our methods actually apply more or less uniformly to all three regimes, but the nature of the conclusions is quite different in each of the three regimes.

The high-energy regime {\lambda \gg 1} was essentially worked out by Burq, although we give an independent treatment of Burq’s results here. In this regime it turns out that we have an unconditional estimate of the form (1) with a constant of the shape

\displaystyle  C(M,V,\lambda) = C(M,A) e^{C(M,A) \sqrt{\lambda}}

where {C(M,A)} is a constant that depends only on {M} and on a parameter {A} that controls the size of the potential {V}. This constant, while exponentially growing, is still finite, which among other things is enough to rule out the possibility that {H} contains eigenfunctions (i.e. point spectrum) embedded in the high-energy portion of the spectrum. As is well known, if {M} contains a certain type of trapped geodesic (in particular those arising from positively curved portions of the manifold, such as the equator of a sphere), then it is possible to construct pseudomodes {f} that show that this sort of exponential growth is necessary. On the other hand, if we make the non-trapping hypothesis that all geodesics in {M} escape to infinity, then we can obtain a much stronger high-energy limiting absorption estimate, namely

\displaystyle  C(M,V,\lambda) = C(M,A) \lambda^{-1/2}.

The exponent {1/2} here is closely related to the standard fact that on non-trapping manifolds, there is a local smoothing effect for the time-dependent Schrödinger equation that gains half a derivative of regularity (cf. previous blog post). In the high-energy regime, the dynamics are well-approximated by semi-classical methods, and in particular one can use tools such as the positive commutator method and pseudo-differential calculus to obtain the desired estimates. In the case of trapping, one also needs the standard technique of Carleman inequalities to control the compact (and possibly trapping) core of the manifold; in particular, one needs the delicate two-weight Carleman inequalities of Burq.

In the medium and low energy regimes one needs to work harder. In the medium energy regime {\lambda \sim 1}, we were able to obtain a uniform bound

\displaystyle  C(M,V,\lambda) \leq C(M,A)

for all asymptotically conic manifolds (trapping or not) and all short-range potentials. To establish this bound, we have to supplement the existing tools of the positive commutator method and Carleman inequalities with an additional ODE-type analysis of various energies of the solution {u = R(\lambda \pm i\epsilon) f} to a Helmholtz equation on large spheres, as will be discussed in more detail below the fold.

The methods also extend to the low-energy regime {0 < \lambda \ll 1}. Here, the bounds become somewhat interesting, with a subtle distinction between effective estimates that are uniform over all potentials {V} which are bounded in a suitable sense by a parameter {A} (e.g. obeying {|V(x)| \leq A \langle x \rangle^{-2-2\sigma}} for all {x}), and ineffective estimates that exploit qualitative properties of {V} (such as the absence of eigenfunctions or resonances at zero) and are thus not uniform over {V}. On the effective side, and for potentials that are strongly short range (at least at local scales {|x| = O(\lambda^{-1/2})}; one can tolerate merely short-range behaviour at more global scales, but this is a technicality that we will not discuss further here) we were able to obtain a polynomial bound of the form

\displaystyle  C(M,V,\lambda) \leq C(M,A) \lambda^{-C(M,A)}

that blew up at a large polynomial rate at the origin. Furthermore, by carefully designing a sequence of potentials {V} that induce near-eigenfunctions that resemble two different Bessel functions of the radial variable glued together, we are able to show that this type of polynomial bound is sharp in the following sense: given any constant {C > 0}, there exists a sequence {V_n} of potentials on Euclidean space {{\bf R}^d} uniformly bounded by {A}, and a sequence {\lambda_n} of energies going to zero, such that

\displaystyle  C({\bf R}^d,V_n,\lambda_n) \geq \lambda_n^{-C}.

This shows that if one wants bounds that are uniform in the potential {V}, then arbitrary polynomial blowup is necessary.

Interestingly, though, if we fix the potential {V}, and then ask for bounds that are not necessarily uniform in {V}, then one can do better, as was already observed in a classic paper of Jensen and Kato concerning power series expansions of the resolvent near the origin. In particular, if we make the spectral assumption that {H} has no eigenfunctions or resonances at zero, then an argument (based on (a variant of) the Fredholm alternative, which as discussed in this recent blog post gives ineffective bounds) gives a bound of the form

\displaystyle  C(M,V,\lambda) \leq C(M,V) \lambda^{-1/2}

in the low-energy regime (but note carefully here that the constant {C(M,V)} on the right-hand side depends on the potential {V} itself, and not merely on the parameter {A} that upper bounds it). Even if there are eigenvalues or resonances at zero, it turns out that one can still obtain a similar bound, but with {\lambda^{-3/2}} in place of {\lambda^{-1/2}}. This limited rate of blowup as {\lambda \rightarrow 0} is in sharp contrast to the arbitrarily large polynomial blowup rate that can occur if one demands uniform bounds. (This particular subtlety between uniform and non-uniform estimates confused us, by the way, for several weeks; for a long time we thought that we had somehow found a contradiction between our results and the results of Jensen and Kato.)

As applications of our limiting absorption estimates, we give local smoothing and dispersive estimates (as well as the closely related RAGE type theorems) for solutions to the time-dependent Schrödinger and wave equations, and also reprove standard facts about the spectrum of Schrödinger operators in this setting.


Perhaps the most fundamental differential operator on Euclidean space {{\bf R}^d} is the Laplacian

\displaystyle  \Delta := \sum_{j=1}^d \frac{\partial^2}{\partial x_j^2}.

The Laplacian is a linear translation-invariant operator, and as such is necessarily diagonalised by the Fourier transform

\displaystyle  \hat f(\xi) := \int_{{\bf R}^d} f(x) e^{-2\pi i x \cdot \xi}\ dx.

Indeed, we have

\displaystyle  \widehat{\Delta f}(\xi) = - 4 \pi^2 |\xi|^2 \hat f(\xi)

for any suitably nice function {f} (e.g. in the Schwartz class; alternatively, one can work in very rough classes, such as the space of tempered distributions, provided of course that one is willing to interpret all operators in a distributional or weak sense).

Because of this explicit diagonalisation, it is a straightforward matter to define spectral multipliers {m(-\Delta)} of the Laplacian for any (measurable, polynomial growth) function {m: [0,+\infty) \rightarrow {\bf C}}, by the formula

\displaystyle  \widehat{m(-\Delta) f}(\xi) := m( 4\pi^2 |\xi|^2 ) \hat f(\xi).

(The presence of the minus sign in front of the Laplacian has some minor technical advantages, as it makes {-\Delta} positive semi-definite. One can also define spectral multipliers more abstractly from general functional calculus, after establishing that the Laplacian is essentially self-adjoint.) Many of these multipliers are of importance in PDE and analysis, such as the fractional derivative operators {(-\Delta)^{s/2}}, the heat propagators {e^{t\Delta}}, the (free) Schrödinger propagators {e^{it\Delta}}, the wave propagators {e^{\pm i t \sqrt{-\Delta}}} (or {\cos(t \sqrt{-\Delta})} and {\frac{\sin(t\sqrt{-\Delta})}{\sqrt{-\Delta}}}, depending on one’s conventions), the spectral projections {1_I(\sqrt{-\Delta})}, the Bochner-Riesz summation operators {(1 + \frac{\Delta}{4\pi^2 R^2})_+^\delta}, or the resolvents {R(z) := (-\Delta-z)^{-1}}.
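Most of these multipliers are easy to experiment with numerically via the discrete Fourier transform. Here is a one-dimensional periodic sketch (numpy; the torus stands in for {{\bf R}}, and the helper name multiplier is ours) that applies {m(-\Delta)} on the Fourier side and verifies the heat semigroup law {e^{s\Delta} e^{t\Delta} = e^{(s+t)\Delta}} as a sanity check:

```python
import numpy as np

# With the convention f_hat(xi) = int f(x) e^{-2 pi i x xi} dx, the operator
# -Delta has Fourier symbol 4 pi^2 |xi|^2.
N, L = 256, 20.0
x = np.arange(N) * (L / N)
xi = np.fft.fftfreq(N, d=L / N)        # frequencies in cycles per unit length

def multiplier(m, f):
    # Apply m(-Delta) to samples f of a smooth L-periodic function.
    return np.fft.ifft(m(4 * np.pi ** 2 * xi ** 2) * np.fft.fft(f))

f = np.exp(-(x - L / 2) ** 2)
heat = lambda t: (lambda lam: np.exp(-t * lam))   # symbol of e^{t Delta}

u1 = multiplier(heat(0.3), multiplier(heat(0.2), f))
u2 = multiplier(heat(0.5), f)
print(np.allclose(u1, u2))             # True: the semigroup law holds
```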

Each of these families of multipliers are related to the others, by means of various integral transforms (and also, in some cases, by analytic continuation). For instance:

  1. Using the Laplace transform, one can express (sufficiently smooth) multipliers in terms of heat operators. For instance, using the identity

    \displaystyle  \lambda^{s/2} = \frac{1}{\Gamma(-s/2)} \int_0^\infty t^{-1-s/2} e^{-t\lambda}\ dt

    (using analytic continuation if necessary to make the right-hand side well-defined), with {\Gamma} being the Gamma function, we can write the fractional derivative operators in terms of heat kernels:

    \displaystyle  (-\Delta)^{s/2} = \frac{1}{\Gamma(-s/2)} \int_0^\infty t^{-1-s/2} e^{t\Delta}\ dt. \ \ \ \ \ (1)

  2. Using analytic continuation, one can connect heat operators {e^{t\Delta}} to Schrödinger operators {e^{it\Delta}}, a process also known as Wick rotation. Analytic continuation is a notoriously unstable process, and so it is difficult to use analytic continuation to obtain any quantitative estimates on (say) Schrödinger operators from their heat counterparts; however, this procedure can be useful for propagating identities from one family to another. For instance, one can derive the fundamental solution for the Schrödinger equation from the fundamental solution for the heat equation by this method.
  3. Using the Fourier inversion formula, one can write general multipliers as integral combinations of Schrödinger or wave propagators; for instance, if {z} lies in the upper half plane {{\bf H} := \{ z \in {\bf C}: \hbox{Im} z > 0 \}}, one has

    \displaystyle  \frac{1}{x-z} = i\int_0^\infty e^{-itx} e^{itz}\ dt

    for any real number {x}, and thus we can write resolvents in terms of Schrödinger propagators:

    \displaystyle  R(z) = i\int_0^\infty e^{it\Delta} e^{itz}\ dt. \ \ \ \ \ (2)

    In a similar vein, if {k \in {\bf H}}, then

    \displaystyle  \frac{1}{x^2-k^2} = \frac{i}{k} \int_0^\infty \cos(tx) e^{ikt}\ dt

for any {x>0}, so one can also write resolvents in terms of wave propagators (a numerical sanity check of this scalar identity is sketched after this list):

    \displaystyle  R(k^2) = \frac{i}{k} \int_0^\infty \cos(t\sqrt{-\Delta}) e^{ikt}\ dt. \ \ \ \ \ (3)

  4. Using the Cauchy integral formula, one can express (sufficiently holomorphic) multipliers in terms of resolvents (or limits of resolvents). For instance, if {t > 0}, then from the Cauchy integral formula (and Jordan’s lemma) one has

    \displaystyle  e^{itx} = \frac{1}{2\pi i} \lim_{\epsilon \rightarrow 0^+} \int_{\bf R} \frac{e^{ity}}{y-x-i\epsilon}\ dy

    for any {x \in {\bf R}}, and so one can (formally, at least) write Schrödinger propagators in terms of resolvents:

    \displaystyle  e^{-it\Delta} = - \frac{1}{2\pi i} \lim_{\epsilon \rightarrow 0^+} \int_{\bf R} e^{ity} R(y-i\epsilon)\ dy. \ \ \ \ \ (4)

  5. The imaginary part of {\frac{1}{\pi} \frac{1}{x-(y+i\epsilon)}} is the Poisson kernel {\frac{\epsilon}{\pi} \frac{1}{(y-x)^2+\epsilon^2}}, which is an approximation to the identity. As a consequence, for any reasonable function {m(x)}, one has (formally, at least)

    \displaystyle  m(x) = \lim_{\epsilon \rightarrow 0^+} \frac{1}{\pi} \int_{\bf R} (\hbox{Im} \frac{1}{x-(y+i\epsilon)}) m(y)\ dy

    which leads (again formally) to the ability to express arbitrary multipliers in terms of imaginary (or skew-adjoint) parts of resolvents:

    \displaystyle  m(-\Delta) = \lim_{\epsilon \rightarrow 0^+} \frac{1}{\pi} \int_{\bf R} (\hbox{Im} R(y+i\epsilon)) m(y)\ dy. \ \ \ \ \ (5)

    Among other things, this type of formula (with {-\Delta} replaced by a more general self-adjoint operator) is used in the resolvent-based approach to the spectral theorem (by using the limiting imaginary part of resolvents to build spectral measure). Note that one can also express {\hbox{Im} R(y+i\epsilon)} as {\frac{1}{2i} (R(y+i\epsilon) - R(y-i\epsilon))}.
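As promised in item 3, here is a quick numerical sanity check (numpy, crude Riemann sum; all parameter values are arbitrary) of the scalar identity {\frac{1}{x^2-k^2} = \frac{i}{k} \int_0^\infty \cos(tx) e^{ikt}\ dt} underlying (3); the exponential decay coming from {\hbox{Im}(k) > 0} makes the truncation of the integral harmless:

```python
import numpy as np

x, k = 1.7, 0.9 + 0.6j                 # arbitrary test values, Im(k) > 0
t = np.linspace(0.0, 80.0, 400_001)    # e^{ikt} has decayed to ~e^{-48} by t = 80
integrand = np.cos(t * x) * np.exp(1j * k * t)
integral = np.sum(integrand) * (t[1] - t[0])   # crude Riemann sum

print((1j / k) * integral)             # should match ...
print(1.0 / (x ** 2 - k ** 2))         # ... this, to several digits
```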

Remark 1 The ability of heat operators, Schrödinger propagators, wave propagators, or resolvents to generate other spectral multipliers can be viewed as a sort of manifestation of the Stone-Weierstrass theorem (though with the caveat that the spectrum of the Laplacian is non-compact and so the Stone-Weierstrass theorem does not directly apply). Indeed, observe the *-algebra type properties

\displaystyle  e^{s\Delta} e^{t\Delta} = e^{(s+t)\Delta}; \quad (e^{s\Delta})^* = e^{s\Delta}

\displaystyle  e^{is\Delta} e^{it\Delta} = e^{i(s+t)\Delta}; \quad (e^{is\Delta})^* = e^{-is\Delta}

\displaystyle  e^{is\sqrt{-\Delta}} e^{it\sqrt{-\Delta}} = e^{i(s+t)\sqrt{-\Delta}}; \quad (e^{is\sqrt{-\Delta}})^* = e^{-is\sqrt{-\Delta}}

\displaystyle  R(z) R(w) = \frac{R(z)-R(w)}{z-w}; \quad R(z)^* = R(\overline{z}).

Because of these relationships, it is possible (in principle, at least) to leverage one’s understanding of one family of spectral multipliers to gain control on another family of multipliers. For instance, the fact that the heat operators {e^{t\Delta}} have non-negative kernel (a fact which can be seen from the maximum principle, or from the Brownian motion interpretation of the heat kernels) implies (by (1)) that the fractional integral operators {(-\Delta)^{-s/2}} for {s>0} also have non-negative kernel. Or, the fact that the wave equation enjoys finite speed of propagation (and hence that the wave propagators {\cos(t\sqrt{-\Delta})} have distributional convolution kernel localised to the ball of radius {|t|} centred at the origin), can be used (by (3)) to show that the resolvents {R(k^2)} have a convolution kernel that is essentially localised to the ball of radius {O( 1 / |\hbox{Im}(k)| )} around the origin.

In this post, I would like to continue this theme by using the resolvents {R(z) = (-\Delta-z)^{-1}} to control other spectral multipliers. These resolvents are well-defined whenever {z} lies outside of the spectrum {[0,+\infty)} of the operator {-\Delta}. In the model three-dimensional case {d=3}, they can be defined explicitly by the formula

\displaystyle  R(k^2) f(x) = \int_{{\bf R}^3} \frac{e^{ik|x-y|}}{4\pi |x-y|} f(y)\ dy

whenever {k} lives in the upper half-plane {\{ k \in {\bf C}: \hbox{Im}(k) > 0 \}}, ensuring the absolute convergence of the integral for test functions {f}. (In general dimension, explicit formulas are still available, but involve Bessel functions. But asymptotically at least, and ignoring higher order terms, one simply replaces {\frac{e^{ik|x-y|}}{4\pi |x-y|}} by {\frac{e^{ik|x-y|}}{c_d |x-y|^{d-2}}} for some explicit constant {c_d}.) It is an instructive exercise to verify that this resolvent indeed inverts the operator {-\Delta-k^2}, either by using Fourier analysis or by Green’s theorem.
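One can also probe this formula numerically instead. The rough sketch below (numpy; the grid size, the cutoff of the {1/|x-y|} singularity at the origin, and all other parameter choices are ad hoc) compares the kernel convolution against the Fourier multiplier {(4\pi^2 |\xi|^2 - k^2)^{-1}} on a large periodic box; since {\hbox{Im}(k) > 0}, the kernel decays exponentially and periodisation errors are small, while the crude handling of the singularity limits the agreement to a few percent:

```python
import numpy as np

N, L = 64, 24.0
h = L / N
g = np.arange(N) * h
g = np.where(g > L / 2, g - L, g)       # periodic coordinates centred at 0
X, Y, Z = np.meshgrid(g, g, g, indexing='ij')
r = np.sqrt(X ** 2 + Y ** 2 + Z ** 2)
r[0, 0, 0] = h / 2                      # crude regularisation of 1/r at x = y

k = 1.0 + 1.0j                          # Im(k) > 0
K = np.exp(1j * k * r) / (4 * np.pi * r)
f = np.exp(-(X ** 2 + Y ** 2 + Z ** 2))

# Method 1: convolution with the explicit kernel (circular convolution).
u1 = np.fft.ifftn(np.fft.fftn(K) * np.fft.fftn(f)) * h ** 3

# Method 2: division by the Fourier symbol of -Delta - k^2.
xi = np.fft.fftfreq(N, d=h)
XI, ETA, ZETA = np.meshgrid(xi, xi, xi, indexing='ij')
symbol = 4 * np.pi ** 2 * (XI ** 2 + ETA ** 2 + ZETA ** 2) - k ** 2
u2 = np.fft.ifftn(np.fft.fftn(f) / symbol)

print(np.abs(u1 - u2).max() / np.abs(u2).max())   # small relative error
```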

Henceforth we restrict attention to three dimensions {d=3} for simplicity. One consequence of the above explicit formula is that for positive real {\lambda > 0}, the resolvents {R(\lambda+i\epsilon)} and {R(\lambda-i\epsilon)} tend to different limits as {\epsilon \rightarrow 0}, reflecting the jump discontinuity in the resolvent function at the spectrum; as one can guess from formulae such as (4) or (5), such limits are of interest for understanding many other spectral multipliers. Indeed, for any test function {f}, we see that

\displaystyle  \lim_{\epsilon \rightarrow 0^+} R(\lambda+i\epsilon) f(x) = \int_{{\bf R}^3} \frac{e^{i\sqrt{\lambda}|x-y|}}{4\pi |x-y|} f(y)\ dy

and

\displaystyle  \lim_{\epsilon \rightarrow 0^+} R(\lambda-i\epsilon) f(x) = \int_{{\bf R}^3} \frac{e^{-i\sqrt{\lambda}|x-y|}}{4\pi |x-y|} f(y)\ dy.

Both of these functions

\displaystyle  u_\pm(x) := \int_{{\bf R}^3} \frac{e^{\pm i\sqrt{\lambda}|x-y|}}{4\pi |x-y|} f(y)\ dy

solve the Helmholtz equation

\displaystyle  (-\Delta-\lambda) u_\pm = f, \ \ \ \ \ (6)

but have different asymptotics at infinity. Indeed, if {\int_{{\bf R}^3} f(y)\ dy = A}, then we have the asymptotic

\displaystyle  u_\pm(x) = \frac{A e^{\pm i \sqrt{\lambda}|x|}}{4\pi|x|} + O( \frac{1}{|x|^2}) \ \ \ \ \ (7)

as {|x| \rightarrow \infty}, leading also to the Sommerfeld radiation condition

\displaystyle  u_\pm(x) = O(\frac{1}{|x|}); \quad (\partial_r \mp i\sqrt{\lambda}) u_\pm(x) = O( \frac{1}{|x|^2}) \ \ \ \ \ (8)

where {\partial_r := \frac{x}{|x|} \cdot \nabla_x} is the outgoing radial derivative. Indeed, one can show using an integration by parts argument that {u_\pm} is the unique solution of the Helmholtz equation (6) obeying (8) (see below). {u_+} is known as the outward radiating solution of the Helmholtz equation (6), and {u_-} is known as the inward radiating solution. Indeed, if one views the function {u_\pm(t,x) := e^{-i\lambda t} u_\pm(x)} as a solution to the inhomogeneous Schrödinger equation

\displaystyle  (i\partial_t + \Delta) u_\pm = - e^{-i\lambda t} f

and using the de Broglie law that a solution to such an equation with wave number {k \in {\bf R}^3} (i.e. resembling {A e^{i k \cdot x}} for some amplitude {A}) should propagate at (group) velocity {2k}, we see (heuristically, at least) that the outward radiating solution will indeed propagate radially away from the origin at speed {2\sqrt{\lambda}}, while the inward radiating solution propagates inward at the same speed.

There is a useful quantitative version of the convergence

\displaystyle  R(\lambda \pm i\epsilon) f \rightarrow u_\pm, \ \ \ \ \ (9)

known as the limiting absorption principle:

Theorem 1 (Limiting absorption principle) Let {f} be a test function on {{\bf R}^3}, let {\lambda > 0}, and let {\sigma > 0}. Then one has

\displaystyle  \| R(\lambda \pm i\epsilon) f \|_{H^{0,-1/2-\sigma}({\bf R}^3)} \leq C_\sigma \lambda^{-1/2} \|f\|_{H^{0,1/2+\sigma}({\bf R}^3)}

for all {\epsilon > 0}, where {C_\sigma > 0} depends only on {\sigma}, and {H^{0,s}({\bf R}^3)} is the weighted norm

\displaystyle  \|f\|_{H^{0,s}({\bf R}^3)} := \| \langle x \rangle^s f \|_{L^2_x({\bf R}^3)}

and {\langle x \rangle := (1+|x|^2)^{1/2}}.

This principle allows one to extend the convergence (9) from test functions {f} to all functions in the weighted space {H^{0,1/2+\sigma}} by a density argument (though the radiation condition (8) has to be adapted suitably for this scale of spaces when doing so). The weighted space {H^{0,-1/2-\sigma}} on the left-hand side is optimal, as can be seen from the asymptotic (7); a duality argument similarly shows that the weighted space {H^{0,1/2+\sigma}} on the right-hand side is also optimal.

We prove this theorem below the fold. As observed long ago by Kato (and also reproduced below), this estimate is equivalent (via a Fourier transform in the spectral variable {\lambda}) to a useful estimate for the free Schrödinger equation known as the local smoothing estimate, which in particular implies the well-known RAGE theorem for that equation; it also has similar consequences for the free wave equation. As we shall see, it also encodes some spectral information about the Laplacian; for instance, it can be used to show that the Laplacian has no eigenvalues, resonances, or singular continuous spectrum. These spectral facts are already obvious from the Fourier transform representation of the Laplacian, but the point is that the limiting absorption principle also applies to more general operators for which the explicit diagonalisation afforded by the Fourier transform is not available. (Igor Rodnianski and I are working on a paper regarding this topic, of which I hope to say more about soon.)

In order to illustrate the main ideas and suppress technical details, I will be a little loose with some of the rigorous details of the arguments, and in particular will be manipulating limits and integrals at a somewhat formal level.


A few days ago, I found myself needing to use the Fredholm alternative in functional analysis:

Theorem 1 (Fredholm alternative) Let {X} be a Banach space, let {T: X \rightarrow X} be a compact operator, and let {\lambda \in {\bf C}} be non-zero. Then exactly one of the following statements holds:

  • (Eigenvalue) There is a non-trivial solution {x \in X} to the equation {Tx = \lambda x}.
  • (Bounded resolvent) The operator {T-\lambda} has a bounded inverse {(T-\lambda)^{-1}} on {X}.

Among other things, the Fredholm alternative can be used to establish the spectral theorem for compact operators. A hypothesis such as compactness is necessary; the shift operator {U} on {\ell^2({\bf Z})}, for instance, has no eigenfunctions, but {U-z} is not invertible for any unit complex number {z}. The claim is also false when {\lambda=0}; consider for instance the multiplication operator {Tf(n) := \frac{1}{n} f(n)} on {\ell^2({\bf N})}, which is compact and has no eigenvalue at zero, but is not invertible.

It had been a while since I had studied the spectral theory of compact operators, and I found that I could not immediately reconstruct a proof of the Fredholm alternative from first principles. So I set myself the exercise of doing so. I thought that I had managed to establish the alternative in all cases, but as pointed out in comments, my argument is restricted to the case where the compact operator {T} is approximable, which means that it is the limit of finite rank operators in the uniform topology. Many Banach spaces (and in particular, all Hilbert spaces) have the approximation property, which implies (by a result of Grothendieck) that all compact operators on such a space are approximable. For instance, if {X} is a Hilbert space, then any compact operator is approximable, because any compact set can be approximated by a finite-dimensional subspace, and in a Hilbert space, the orthogonal projection operator to a subspace is always a contraction. (In more general Banach spaces, finite-dimensional subspaces are still complemented, but the operator norm of the projection can be large.) Unfortunately, there are examples of Banach spaces for which the approximation property fails; the first such examples were discovered by Enflo, and a subsequent paper of Alexander demonstrated the existence of compact operators in certain Banach spaces that are not approximable.

I also found out that this argument was essentially also discovered independently by MacCluer-Hull and by Uuye. Nevertheless, I am recording this argument here, together with two more traditional proofs of the Fredholm alternative (based on the Riesz lemma and a continuity argument respectively).


Van Vu and I have just uploaded to the arXiv our paper “Random matrices: Universality of eigenvectors“, submitted to Random Matrices: Theory and Applications. This paper concerns an extension of our four moment theorem for eigenvalues. Roughly speaking, that four moment theorem asserts (under mild decay conditions on the coefficients of the random matrix) that the fine-scale structure of individual eigenvalues of a Wigner random matrix depends only on the first four moments of each of the entries.

In this paper, we extend this result from eigenvalues to eigenvectors, and specifically to the coefficients of, say, the {i^{th}} eigenvector {u_i(M_n)} of a Wigner random matrix {M_n}. Roughly speaking, the main result is that the distribution of these coefficients also depends only on the first four moments of each of the entries. In particular, as the distribution of the coefficients of eigenvectors of invariant ensembles such as GOE or GUE is known to be asymptotically gaussian real (in the GOE case) or gaussian complex (in the GUE case), the same asymptotic automatically holds for Wigner matrices whose coefficients match GOE or GUE to fourth order.
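The gaussian limit in the GUE case is easy to observe numerically. Here is a rough sketch (numpy; we normalise the phase by rotating the first coordinate to be positive real, one of the normalisations mentioned in the next paragraph, and pool coefficients across samples); the rescaled real parts should have mean {0}, variance {1/2}, and gaussian fourth moment:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 64, 400
coeffs = []
for _ in range(trials):
    Z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    M = (Z + Z.conj().T) / np.sqrt(2)        # GUE
    _, V = np.linalg.eigh(M)
    v = V[:, n // 2]                         # a bulk eigenvector (unit norm)
    v = v * np.exp(-1j * np.angle(v[0]))     # fix the phase ambiguity
    coeffs.extend(np.sqrt(n) * v[1:])        # drop the phase-fixed coordinate
coeffs = np.array(coeffs)

re = coeffs.real                             # should look like N(0, 1/2)
print(re.mean(), re.var(), (re ** 4).mean() / (3 * re.var() ** 2))  # ~0, ~0.5, ~1
```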

(A technical point here: strictly speaking, the eigenvectors {u_i(M_n)} are only determined up to a phase, even when the eigenvalues are simple. So, to phrase the question properly, one has to perform some sort of normalisation, for instance by working with the coefficients of the spectral projection operators {P_i(M_n) := u_i(M_n) u_i(M_n)^*} instead of the eigenvectors, or rotating each eigenvector by a random phase, or by fixing the first component of each eigenvector to be positive real. This is a fairly minor technical issue here, though, and will not be discussed further.)
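
As a small sanity check of the projector normalisation (my own illustration, not from the paper), one can verify numerically that {P_i(M_n)} is unchanged when the eigenvector is rotated by an arbitrary phase:

    import numpy as np

    # The spectral projector u u^* is invariant under u -> e^{i theta} u,
    # which is why it is a convenient phase-free normalisation.
    rng = np.random.default_rng(1)
    n = 50
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    M = (A + A.conj().T) / 2                  # a Hermitian test matrix
    _, U = np.linalg.eigh(M)
    u = U[:, 0]
    v = np.exp(1j * 0.7) * u                  # same eigenvector, new phase
    P1 = np.outer(u, u.conj())
    P2 = np.outer(v, v.conj())
    print(np.linalg.norm(P1 - P2))            # essentially zero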

This theorem strengthens a four moment theorem for eigenvectors recently established by Knowles and Yin (by a somewhat different method), in that the hypotheses are weaker (no level repulsion assumption is required, and the matrix entries only need to obey a finite moment condition rather than an exponential decay condition), and the conclusion is slightly stronger (less regularity is needed on the test function, and one can handle the joint distribution of polynomially many coefficients, rather than boundedly many coefficients). On the other hand, the Knowles-Yin paper can also handle generalised Wigner ensembles in which the variances of the entries are allowed to fluctuate somewhat.

The method used here is a variation of that in our original paper (incorporating the subsequent improvements to extend the four moment theorem from the bulk to the edge, and to replace exponential decay by a finite moment condition). That method was ultimately based on the observation that if one swapped a single entry (and its adjoint) in a Wigner random matrix, then an individual eigenvalue {\lambda_i(M_n)} would not fluctuate much as a consequence (as long as one had already truncated away the event of an unexpectedly small eigenvalue gap). The same analysis shows that the projection matrices {P_i(M_n)} obey the same stability property.

As an application of the eigenvector four moment theorem, we establish a four moment theorem for the coefficients of resolvent matrices {(\frac{1}{\sqrt{n}} M_n - zI)^{-1}}, even when {z} is on the real axis (though in that case we need to make a level repulsion hypothesis, which has already been verified in many important special cases and is likely to be true in general). This improves on an earlier four moment theorem for resolvents of Erdos, Yau, and Yin, which required {z} to stay some distance away from the real axis (specifically, that {\hbox{Im}(z) \geq n^{-1+c}} for some small {c>0}).
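
As a crude illustration of such resolvent coefficients (a sketch of my own with arbitrarily chosen parameters, not an implementation of anything in the paper; the {z} here is kept at a fixed distance from the real axis, which is the easy regime), a diagonal coefficient for GUE is already close to the Stieltjes transform {m_{sc}(z) = \frac{-z + \sqrt{z^2-4}}{2}} of the semicircular law:

    import numpy as np

    # Diagonal resolvent coefficient of a GUE matrix at a point z in the
    # upper half plane, compared with the semicircular Stieltjes
    # transform (the principal square root branch is correct for this z).
    rng = np.random.default_rng(2)
    n, z = 1000, 0.5 + 0.1j
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    M = (A + A.conj().T) / 2
    R = np.linalg.inv(M / np.sqrt(n) - z * np.eye(n))
    m_sc = (-z + np.sqrt(z ** 2 - 4)) / 2
    print(R[0, 0], m_sc)                      # the two should be close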

Van Vu and I have just uploaded to the arXiv our paper “Random matrices: Localization of the eigenvalues and the necessity of four moments“, submitted to Probability Theory and Related Fields. This paper concerns the distribution of the eigenvalues

\displaystyle  \lambda_1(M_n) \leq \ldots \leq \lambda_n(M_n)

of a Wigner random matrix {M_n}. More specifically, we consider {n \times n} Hermitian random matrices whose entries have mean zero and variance one, with the upper-triangular portion of the matrix independent, with the diagonal elements iid, and the real and imaginary parts of the strictly upper-triangular portion of the matrix iid. For technical reasons we also assume that the distribution of the coefficients decays exponentially or better. Examples of Wigner matrices include the Gaussian Unitary Ensemble (GUE) and random symmetric complex Bernoulli matrices (which equal {\pm 1} on the diagonal, and {\pm \frac{1}{\sqrt{2}} \pm \frac{i}{\sqrt{2}}} off the diagonal, so that the off-diagonal entries have variance one). The Gaussian Orthogonal Ensemble (GOE) is also an example once one makes the minor change of setting the diagonal entries to have variance two instead of one.
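
For concreteness, here is a minimal sampler for the complex Bernoulli ensemble just described (code of my own; the {1/\sqrt{2}} normalisation is what gives the off-diagonal entries unit variance):

    import numpy as np

    # Random symmetric complex Bernoulli matrix: diagonal entries +-1,
    # off-diagonal entries (+-1 +- i)/sqrt(2), Hermitian overall.
    rng = np.random.default_rng(3)
    n = 5
    re = rng.choice([-1.0, 1.0], (n, n))
    im = rng.choice([-1.0, 1.0], (n, n))
    upper = np.triu(re + 1j * im, 1) / np.sqrt(2)
    M = upper + upper.conj().T + np.diag(rng.choice([-1.0, 1.0], n))
    print(np.allclose(M, M.conj().T))         # True: M is Hermitian
    print(abs(M[0, 1]) ** 2)                  # |entry|^2 = 1 identically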

The most fundamental theorem about the distribution of these eigenvalues is the Wigner semi-circular law, which asserts that (almost surely) one has

\displaystyle  \frac{1}{n} \sum_{i=1}^n \delta_{\lambda_i(M_n)/\sqrt{n}} \rightarrow \rho_{sc}(x)\ dx

(in the vague topology) where {\rho_{sc}(x) := \frac{1}{2\pi} (4-x^2)_+^{1/2}} is the semicircular distribution. (See these lecture notes on this blog for more discussion of this law.)
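
This convergence is easy to see numerically; the following quick check (my own, with an arbitrary choice of {n}) compares a histogram of rescaled GUE eigenvalues against {\rho_{sc}}:

    import numpy as np

    # Eigenvalues of a GUE matrix, rescaled by sqrt(n), versus the
    # semicircular density; the discrepancy shrinks as n grows.
    rng = np.random.default_rng(4)
    n = 1000
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    M = (A + A.conj().T) / 2
    lam = np.linalg.eigvalsh(M) / np.sqrt(n)
    hist, edges = np.histogram(lam, bins=20, range=(-2, 2), density=True)
    mids = (edges[:-1] + edges[1:]) / 2
    rho = np.sqrt(np.maximum(4 - mids ** 2, 0)) / (2 * np.pi)
    print(np.max(np.abs(hist - rho)))         # modest for this n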

One can phrase this law in a number of equivalent ways. For instance, in the bulk region {\epsilon n \leq i \leq (1-\epsilon) n}, one almost surely has

\displaystyle  \lambda_i(M_n) = \gamma_i \sqrt{n} + o(\sqrt{n}) \ \ \ \ \ (1)

uniformly in {i}, where the classical location {\gamma_i \in [-2,2]} of the (normalised) {i^{th}} eigenvalue {\frac{1}{\sqrt{n}} \lambda_i} is defined by the formula

\displaystyle  \int_{-2}^{\gamma_i} \rho_{sc}(x)\ dx := \frac{i}{n}.
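
The classical locations have no closed form in general, but are easy to compute numerically; for instance (an illustrative sketch of my own, using scipy for the quadrature and root-finding):

    import numpy as np
    from scipy.integrate import quad
    from scipy.optimize import brentq

    # Solve int_{-2}^{gamma_i} rho_sc(x) dx = i/n for gamma_i.
    rho_sc = lambda x: np.sqrt(max(4 - x * x, 0.0)) / (2 * np.pi)
    cdf = lambda t: quad(rho_sc, -2, t)[0]
    n = 100
    for i in [25, 50, 75]:
        gamma_i = brentq(lambda t: cdf(t) - i / n, -2, 2)
        print(i, gamma_i)                     # gamma_{n/2} = 0 by symmetry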

The bound (1) also holds in the edge case (by using the operator norm bound {\| M_n\|_{op} = (2+o(1)) \sqrt{n}}, due to Bai and Yin), but for sake of exposition we shall restrict attention here to the bulk case.

From (1) we see that the semicircular law controls the eigenvalues at the coarse scale of {\sqrt{n}}. There has been a significant amount of work in the literature in obtaining control at finer scales, and in particular at the scale of the average eigenvalue spacing, which is of the order of {\sqrt{n}/n = n^{-1/2}}. For instance, we now have a universal limit theorem for the normalised eigenvalue spacing {\sqrt{n}(\lambda_{i+1}(M_n) - \lambda_i(M_n))} in the bulk for all Wigner matrices, a result of Erdos, Ramirez, Schlein, Vu, Yau, and myself. One tool for this is the four moment theorem of Van and myself, which roughly speaking shows that the behaviour of the eigenvalues at the scale {n^{-1/2}} (and even at the slightly finer scale of {n^{-1/2-c}} for some absolute constant {c>0}) depends only on the first four moments of the matrix entries. There is also a slight variant, the three moment theorem, which asserts that the behaviour of the eigenvalues at the slightly coarser scale of {n^{-1/2+\epsilon}} depends only on the first three moments of the matrix entries.

It is natural to ask whether these moment conditions are necessary. From the result of Erdos, Ramirez, Schlein, Vu, Yau, and myself, it is known that to control the eigenvalue spacing {\lambda_{i+1}(M_n) - \lambda_i(M_n)} at the critical scale {n^{-1/2}}, no knowledge of any moments beyond the second (i.e. beyond the mean and variance) is needed. So it is natural to conjecture that the same is true for the eigenvalues themselves.

The main result of this paper is to show that this is not the case; that at the critical scale {n^{-1/2}}, the distribution of eigenvalues {\lambda_i(M_n)} is sensitive to the fourth moment, and so the hypothesis of the four moment theorem cannot be relaxed.

Heuristically, the reason for this is easy to explain. One begins with an inspection of the expected fourth moment

\displaystyle  \sum_{i=1}^n {\bf E}(\lambda_i(M_n)^4) = {\bf E} \hbox{tr} M_n^4.

A standard moment method computation shows that the right hand side is equal to

\displaystyle  2n^3 + 2a n^2 + \ldots

where {a} is the fourth moment of the real part of the off-diagonal coefficients of {M_n}. In particular, a change in the fourth moment {a} by {O(1)} leads to a change in the expression {\sum_{i=1}^n {\bf E}(\lambda_i(M_n)^4)} by {O(n^2)}. Thus, for a typical {i}, one expects {{\bf E}(\lambda_i(M_n)^4)} to shift by {O(n)}; since {\lambda_i(M_n) = O(\sqrt{n})} on the average, we thus expect {\lambda_i(M_n)} itself to shift by about {O(n^{-1/2})} by the mean-value theorem.
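
One can see this sensitivity in a crude Monte Carlo experiment (my own illustration with arbitrary parameters, using real symmetric rather than Hermitian matrices for simplicity, so the exact lower order coefficients differ from the formula above): two ensembles with matching first two moments but different fourth moments agree on the leading {2n^3} term of {{\bf E} \hbox{tr} M_n^4} but visibly disagree at the {n^2} scale:

    import numpy as np

    # E tr M^4 for two real symmetric Wigner ensembles: gaussian entries
    # (fourth moment 3) versus Bernoulli signs (fourth moment 1).  The
    # leading coefficient 2 agrees; the n^2 correction does not.
    rng = np.random.default_rng(5)
    n, trials = 200, 200

    def sym(sampler):
        U = np.triu(sampler((n, n)), 1)
        return U + U.T + np.diag(sampler(n))

    for name, sampler in [("gaussian", rng.standard_normal),
                          ("bernoulli", lambda s: rng.choice([-1.0, 1.0], s))]:
        t4 = np.mean([np.trace(np.linalg.matrix_power(sym(sampler), 4))
                      for _ in range(trials)])
        print(name, t4 / n ** 3, (t4 - 2 * n ** 3) / n ** 2)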

To make this rigorous, one needs a sufficiently strong concentration of measure result for {\lambda_i(M_n)} that keeps it close to its mean value. There are already a number of such results in the literature. For instance, Guionnet and Zeitouni showed that {\lambda_i(M_n)} was sharply concentrated in an interval of size {O(n^\epsilon)} around {\sqrt{n} \gamma_i} for any {\epsilon > 0} (in the sense that the probability that one was outside this interval was exponentially small). In one of my papers with Van, we showed that {\lambda_i(M_n)} was also weakly concentrated in an interval of size {O(n^{-1/2+\epsilon})} around {\sqrt{n} \gamma_i}, in the sense that the probability that one was outside this interval was {O(n^{-c})} for some absolute constant {c>0}. Finally, if one made an additional log-Sobolev hypothesis on the entries, it was shown by Erdos, Yau, and Yin that the average variance of {\lambda_i(M_n)} as {i} varied from {1} to {n} was of size {O(n^{-c})} for some absolute {c>0}.

As it turns out, the first two concentration results are not sufficient to justify the previous heuristic argument. The Erdos-Yau-Yin argument suffices, but requires a log-Sobolev hypothesis. In our paper, we argue differently, using the three moment theorem (together with the theory of the eigenvalues of GUE, which is extremely well developed) to show that the variance of each individual {\lambda_i(M_n)} is {O(n^{-c})} (without averaging in {i}). No log-Sobolev hypothesis is required, but instead we need to assume that the third moment of the coefficients vanishes (because we want to use the three moment theorem to compare the Wigner matrix to GUE, and the coefficients of the latter have a vanishing third moment). From this we are able to make the previous arguments rigorous, and show that the mean {{\bf E} \lambda_i(M_n)} is indeed sensitive to the fourth moment of the entries at the critical scale {n^{-1/2}}.

One curious feature of the analysis is how differently the median and the mean of the eigenvalue {\lambda_i(M_n)} react to the available technology. To control the global behaviour of the eigenvalues (after averaging in {i}), it is much more convenient to use the mean, and we have very precise control on global averages of these means thanks to the moment method. But to control local behaviour, it is the median which is much better controlled. For instance, we can localise the median of {\lambda_i(M_n)} to an interval of size {O(n^{-1/2+\epsilon})}, but can only localise the mean to a much larger interval of size {O(n^{-c})}. Ultimately, this is because with our current technology there is a possible exceptional event of probability as large as {O(n^{-c})} for which all eigenvalues could deviate as far as {O(n^\epsilon)} from their expected location, instead of their typical deviation of {O(n^{-1/2})}. The reason for this is technical, coming from the fact that the four moment theorem method breaks down when two eigenvalues are very close together (less than {n^{-c}} times the average eigenvalue spacing), and so one has to cut out this event, which occurs with a probability of the shape {O(n^{-c})}. It may be possible to improve the four moment theorem proof to be less sensitive to eigenvalue near-collisions, in which case the above bounds are likely to improve.

Now we turn attention to another important spectral statistic, the least singular value {\sigma_n(M)} of an {n \times n} matrix {M} (or, more generally, the least non-trivial singular value {\sigma_p(M)} of an {n \times p} matrix with {p \leq n}). This quantity controls the invertibility of {M}. Indeed, {M} is invertible precisely when {\sigma_n(M)} is non-zero, and the operator norm {\|M^{-1}\|_{op}} of {M^{-1}} is given by {1/\sigma_n(M)}. This quantity is also related to the condition number {\sigma_1(M)/\sigma_n(M) = \|M\|_{op} \|M^{-1}\|_{op}} of {M}, which is of importance in numerical linear algebra. As we shall see in the next set of notes, the least singular value of {M} (and more generally, of the shifts {\frac{1}{\sqrt{n}} M - zI} for complex {z}) will be of importance in rigorously establishing the circular law for iid random matrices {M}, as it plays a key role in computing the Stieltjes transform {\frac{1}{n} \hbox{tr} (\frac{1}{\sqrt{n}} M - zI)^{-1}} of such matrices, which as we have already seen is a powerful tool in understanding the spectra of random matrices.

The least singular value

\displaystyle  \sigma_n(M) = \inf_{\|x\|=1} \|Mx\|,

which sits at the “hard edge” of the spectrum, bears a superficial similarity to the operator norm

\displaystyle  \|M\|_{op} = \sigma_1(M) = \sup_{\|x\|=1} \|Mx\|

at the “soft edge” of the spectrum, that was discussed back in Notes 3, so one may at first think that the methods that were effective in controlling the latter, namely the epsilon-net argument and the moment method, would also work to control the former. The epsilon-net method does indeed have some effectiveness when dealing with rectangular matrices (in which the spectrum stays well away from zero), but the situation becomes more delicate for square matrices; it can control some “low entropy” portions of the infimum that arise from “structured” or “compressible” choices of {x}, but is not able to control the “generic” or “incompressible” choices of {x}, for which new arguments will be needed. As for the moment method, this can give the coarse order of magnitude (for instance, for rectangular matrices with {p=yn} for {0 < y < 1}, it gives an upper bound of {(1-\sqrt{y}+o(1))\sqrt{n}} for the least singular value with high probability, thanks to the Marchenko-Pastur law), but again this method begins to break down for square matrices, although one can make some partial headway by considering negative moments such as {\hbox{tr} M^{-2}}, though these are more difficult to compute than positive moments {\hbox{tr} M^k}.
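
Here is a quick numerical look at the rectangular case (an illustrative sketch of my own, with {y=1/4}), comparing the least singular value of a random sign matrix against the Marchenko-Pastur hard edge prediction:

    import numpy as np

    # Least singular value of an n x p random sign matrix with p = yn,
    # versus the prediction (1 - sqrt(y)) sqrt(n) from Marchenko-Pastur.
    rng = np.random.default_rng(6)
    n, y = 1000, 0.25
    p = int(y * n)
    M = rng.choice([-1.0, 1.0], (n, p))
    sigma_min = np.linalg.svd(M, compute_uv=False)[-1]
    print(sigma_min / np.sqrt(n), 1 - np.sqrt(y))   # both close to 0.5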

So one needs to supplement these existing methods with additional tools. It turns out that the key issue is to understand the distance between one of the {n} rows {X_1,\ldots,X_n \in {\bf C}^n} of the matrix {M}, and the hyperplane spanned by the other {n-1} rows. The reason for this is as follows. First suppose that {\sigma_n(M)=0}, so that {M} is non-invertible, and there is a linear dependence between the rows {X_1,\ldots,X_n}. Thus, one of the {X_i} will lie in the hyperplane spanned by the other rows, and so one of the distances mentioned above will vanish; in fact, one expects many of the {n} distances to vanish. Conversely, whenever one of these distances vanishes, one has a linear dependence, and so {\sigma_n(M)=0}.

More generally, if the least singular value {\sigma_n(M)} is small, one generically expects many of these {n} distances to be small also, and conversely. Thus, control of the least singular value is morally equivalent to control of the distance between a row {X_i} and the hyperplane spanned by the other rows. This latter quantity is basically the dot product of {X_i} with a unit normal {n_i} of this hyperplane.
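
In fact there is a clean identity behind this equivalence: for an invertible real matrix, the distance from {X_i} to the span of the other rows is {1/\|M^{-1} e_i\|}, which in particular always dominates {\sigma_n(M)}. A quick numerical check (my own sketch):

    import numpy as np

    # dist(X_i, span of the other rows) = 1/||i-th column of M^{-1}||,
    # so every such distance is at least sigma_n(M).
    rng = np.random.default_rng(7)
    n = 200
    M = rng.choice([-1.0, 1.0], (n, n))
    Minv = np.linalg.inv(M)
    sigma_n = np.linalg.svd(M, compute_uv=False)[-1]
    dists = 1.0 / np.linalg.norm(Minv, axis=0)   # column norms of M^{-1}
    print(dists.min(), sigma_n)                  # dists.min() >= sigma_n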

When working with random matrices with jointly independent coefficients, we have the crucial property that the unit normal {n_i} (which depends on all the rows other than {X_i}) is independent of {X_i}, so even after conditioning {n_i} to be fixed, the entries of {X_i} remain independent. As such, the dot product {X_i \cdot n_i} is a familiar scalar random walk, and can be controlled by a number of tools, most notably Littlewood-Offord theorems and the Berry-Esséen central limit theorem. As it turns out, this type of control works well except in some rare cases in which the normal {n_i} is “compressible” or otherwise highly structured; but epsilon-net arguments can be used to dispose of these cases. (This general strategy was first developed for the technically simpler singularity problem by Komlós, and then extended to the least singular value problem by Rudelson.)
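
The gaussian behaviour of this dot product is easy to observe empirically (my own sketch; the unit normal is computed from the SVD of the remaining rows):

    import numpy as np

    # Dot product of a fresh random sign row with the unit normal of the
    # hyperplane spanned by n-1 independent rows: approximately N(0,1),
    # since the normal is a unit vector independent of the fresh row.
    rng = np.random.default_rng(8)
    n = 100
    samples = []
    for _ in range(500):
        A = rng.choice([-1.0, 1.0], (n - 1, n))
        normal = np.linalg.svd(A)[2][-1]      # unit normal of the span
        X = rng.choice([-1.0, 1.0], n)
        samples.append(X @ normal)
    print(np.mean(samples), np.std(samples))  # approximately 0 and 1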

These methods rely quite strongly on the joint independence of all the entries; it remains a challenge to extend them to more general settings. Even for Wigner matrices, the methods run into difficulty because of the non-independence of some of the entries (although it turns out one can understand the least singular value in such cases by rather different methods).

To simplify the exposition, we shall focus primarily on just one specific ensemble of random matrices, the Bernoulli ensemble {M = (\xi_{ij})_{1 \leq i,j \leq n}} of random sign matrices, where {\xi_{ij} = \pm 1} are independent Bernoulli signs. However, the results can extend to more general classes of random matrices, with the main requirement being that the coefficients are jointly independent.

Read the rest of this entry »

Our study of random matrices, to date, has focused on somewhat general ensembles, such as iid random matrices or Wigner random matrices, in which the distribution of the individual entries of the matrices was essentially arbitrary (as long as certain moments, such as the mean and variance, were normalised). In these notes, we now focus on two much more special, and much more symmetric, ensembles:

  • The Gaussian Unitary Ensemble (GUE), which is an ensemble of random {n \times n} Hermitian matrices {M_n} in which the strictly upper-triangular entries are iid with distribution {N(0,1)_{\bf C}}, and the diagonal entries are iid with distribution {N(0,1)_{\bf R}} and independent of the upper-triangular ones (a quick sampling sketch follows this list); and
  • The Gaussian random matrix ensemble, which is an ensemble of random {n \times n} (non-Hermitian) matrices {M_n} whose entries are iid with distribution {N(0,1)_{\bf C}}.
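
Before proceeding, here is a minimal sampler for GUE as just defined (code of my own, not from the notes; here {N(0,1)_{\bf C}} means independent real and imaginary parts that are each {N(0,1/2)_{\bf R}}):

    import numpy as np

    # GUE sampler: strictly upper-triangular entries iid complex gaussian
    # with unit variance, diagonal entries iid real N(0,1).
    rng = np.random.default_rng(9)

    def sample_gue(n):
        Z = (rng.standard_normal((n, n)) +
             1j * rng.standard_normal((n, n))) / np.sqrt(2)
        upper = np.triu(Z, 1)
        return upper + upper.conj().T + np.diag(rng.standard_normal(n))

    M = sample_gue(4)
    print(np.allclose(M, M.conj().T))         # True: Hermitian by construction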

The symmetric nature of these ensembles will allow us to compute the spectral distribution by exact algebraic means, revealing a surprising connection with orthogonal polynomials and with determinantal processes. This will, for instance, recover the semi-circular law for GUE, but will also reveal fine spacing information, such as the distribution of the gap between adjacent eigenvalues, which is largely out of reach of tools such as the Stieltjes transform method and the moment method (although the moment method, with some effort, is able to control the extreme edges of the spectrum).

Similarly, we will see for the first time the circular law for eigenvalues of non-Hermitian matrices.

There are a number of other highly symmetric ensembles which can also be treated by the same methods, most notably the Gaussian Orthogonal Ensemble (GOE) and the Gaussian Symplectic Ensemble (GSE). However, for simplicity we shall focus just on the above two ensembles. For a systematic treatment of these ensembles, see the text by Deift.

Read the rest of this entry »
