You are currently browsing the tag archive for the ‘Van Vu’ tag.

Van Vu and I have just uploaded to the arXiv our paper “Random covariance matrices: Universality of local statistics of eigenvalues“, to be submitted shortly. This paper draws heavily on the technology of our previous paper, in which we established a Four Moment Theorem for the local spacing statistics of eigenvalues of Wigner matrices. This theorem says, roughly speaking, that these statistics are completely determined by the first four moments of the coefficients of such matrices, at least in the bulk of the spectrum. (In a subsequent paper we extended the Four Moment Theorem to the edge of the spectrum.)

In this paper, we establish the analogous result for the singular values of rectangular iid matrices ${M = M_{n,p}}$, or (equivalently) the eigenvalues of the associated covariance matrix ${\frac{1}{n} M M^*}$. As is well-known, there is a parallel theory between the spectral theory of random Wigner matrices and those of covariance matrices; for instance, just as the former has asymptotic spectral distribution governed by the semi-circular law, the latter has asymptotic spectral distribution governed by the Marcenko-Pastur law. One reason for the connection can be seen by noting that the singular values of a rectangular matrix ${M}$ are essentially the same thing as the eigenvalues of the augmented matrix

$\displaystyle \begin{pmatrix} 0 & M \\ M^* & 0\end{pmatrix}$

after eliminating sign ambiguities and degeneracies. So one can view singular values of a rectangular iid matrix as the eigenvalues of a matrix which resembles a Wigner matrix, except that two diagonal blocks of that matrix have been zeroed out.

The zeroing out of these elements prevents one from applying the entire Wigner universality theory directly to the covariance matrix setting (in particular, the crucial Talagrand concentration inequality for the magnitude of a projection of a random vector to a subspace does not work perfectly once there are many zero coefficients). Nevertheless, a large part of the theory (particularly the deterministic components of the theory, such as eigenvalue variation formulae) carry through without much difficulty. The one place where one has to spend a bit of time to check details is to ensure that the Erdos-Schlein-Yau delocalisation result (that asserts, roughly speaking, that the eigenvectors of a Wigner matrix are about as small in ${\ell^\infty}$ norm as one could hope to get) is also true for in the covariance matrix setting, but this is a straightforward (though somewhat tedious) adaptation of the method (which is based on the Stieltjes transform).

As an application, we extend the sine kernel distribution of local covariance matrix statistics, first established in the case of Wishart ensembles (when the underlying variables are gaussian) by Nagao and Wadati, and later extended to gaussian-divisible matrices by Ben Arous and Peche, to any distributions which matches one of these distributions to up to four moments, which covers virtually all complex distributions with independent iid real and imaginary parts, with basically the lone exception of the complex Bernoulli ensemble.

Recently, Erdos, Schlein, Yau, and Yin generalised their local relaxation flow method to also obtain similar universality results for distributions which have a large amount of smoothness, but without any matching moment conditions. By combining their techniques with ours as in our joint paper, one should probably be able to remove both smoothness and moment conditions, in particular now covering the complex Bernoulli ensemble.

In this paper we also record a new observation that the exponential decay hypothesis in our earlier paper can be relaxed to a finite moment condition, for a sufficiently high (but fixed) moment. This is done by rearranging the order of steps of the original argument carefully.

Van Vu and I have just uploaded to the arXiv our paper “Random matrices: Universality of local eigenvalue statistics up to the edge“, submitted to Comm. Math. Phys..  This is a sequel to our previous paper, in which we studied universality of local eigenvalue statistics (such as normalised eigenvalue spacings $\sqrt{n} ( \lambda_{i+1}(M_n) - \lambda_i(M_n) )$) for random matrices $M_n$ of Wigner type, i.e. Hermitian (or symmetric) random matrices in which the upper-triangular entries are independent with mean zero and variance one (for technical reasons we also have to assume an exponential decay condition on the distribution of the entries).   The results in the previous paper were almost entirely focused on the bulk region, in which the index i of the eigenvalues involved was in the range $\varepsilon n \leq i \leq (1-\varepsilon) n$.  The main purpose of this paper is to extend the main results of the previous paper all the way up to the edge, thus allowing one to control all indices $1 \leq i \leq n$.  As an application, we obtain a variant of Soshnikov’s well-known result that the largest eigenvalue of Wigner matrices is distributed (after suitable normalisation) according to the Tracy-Widom law when the coefficients are symmetric, by assuming instead that the coefficients have vanishing third moment.

As one transitions from the bulk to the edge, the density of the eigenvalues decreases to zero (in accordance to the Wigner semicircular law), and so the average spacing between eigenvalues increases.  (For instance, the spacing between eigenvalues in the bulk is of size $n^{-1/2}$, but at the extreme edge it increases to $n^{-1/6}$.)  On the one hand, the increase in average spacing should make life easier, because one does not have to work at such a fine spatial scale in order to see the eigenvalue distribution.  On the other hand, a certain key technical step in the previous paper (in which we adapted an argument of Erdos, Schlein, and Yau to show that eigenvectors of Wigner matrices were delocalised) seemed to require eigenvalue spacings to be of size $O(n^{-1/2})$, which was the main technical obstacle to extending the preceding results from the bulk to the edge.

The main new observation in the paper is that it was not the eigenvalue spacings $\lambda_{i+1}(M_n) - \lambda_i(M_n)$ which were of importance to eigenvalue delocalisation, but rather the somewhat smaller interlaced eigenvalue spacings $\lambda_{i+1}(M_n) - \lambda_i(M_{n-1})$, where $M_{n-1}$ is a $n-1 \times n-1$ minor of $M_n$.  The Cauchy interlacing law asserts that the latter is smaller than the former.  But the interesting thing is that at the edge (when i is close to n), the interlaced spacings are much smaller than the former, and in particular remain of size about $O(n^{-1/2})$ (up to log factors) even though the non-interlaced spacings increase to be as large as $O(n^{-1/6})$.  This is ultimately due to a sort of “attractive force” on eigenvalues that draws them towards the origin, and counteracts the usual “eigenvalue repulsion effect”, that pushes eigenvalues away from each other.  This induces “bias” for eigenvalues to move in towards the bulk rescues the delocalization result, and the remainder of the arguments in our previous paper then continue with few changes.

Below the fold I wish to give some heuristic justification of the interlacing bias phenomenon, sketch why this is relevant for eigenvector delocalisation, and finally to recall why eigenvalue delocalisation in turn is relevant for universality.

[Update, Aug 16: sign error corrected.]

One further paper in this stream: László Erdős, José Ramírez, Benjamin Schlein, Van Vu, Horng-Tzer Yau, and myself have just uploaded to the arXiv the paper “Bulk universality for Wigner hermitian matrices with subexponential decay“, submitted to Mathematical Research Letters.  (Incidentally, this is my first six-author paper I have been involved in, not counting the polymath projects of course, though I have had a number of five-author papers.)

This short paper (9 pages) combines the machinery from two recent papers on the universality conjecture for the eigenvalue spacings in the bulk for Wigner random matrices (see my earlier blog post for more discussion).  On the one hand, the paper of Erdős-Ramírez-Schlein-Yau established this conjecture under the additional hypothesis that the distribution of the individual entries obeyed some smoothness and exponential decay conditions.  Meanwhile, the paper of Van Vu and myself (which I discussed in my earlier blog post) established the conjecture under a somewhat different set of hypotheses, namely that the distribution of the individual entries obeyed some moment conditions (in particular, the third moment had to vanish), a support condition (the entries had to have real part supported in at least three points), and an exponential decay condition.

After comparing our results, the six of us realised that our methods could in fact be combined rather easily to obtain a stronger result, establishing the universality conjecture assuming only a exponential decay (or more precisely, sub-exponential decay) bound ${\Bbb P}(|x_{\ell k}| > t ) \ll \exp( - t^c )$ on the coefficients; thus all regularity, moment, and support conditions have been eliminated.  (There is one catch, namely that we can no longer control a single spacing $\lambda_{i+1}-\lambda_i$ for a single fixed i, but must now average over all $1 \leq i \leq n$ before recovering the universality.  This is an annoying technical issue but it may be resolvable in the future with further refinements to the method.)

I can describe the main idea behind the unified approach here.  One can arrange the Wigner matrices in a hierarchy, from most structured to least structured:

• The most structured (or special) ensemble is the Gaussian Unitary Ensemble (GUE), in which the coefficients are gaussian. Here, one has very explicit and tractable formulae for the eigenvalue distributions, gap spacing, etc.
• The next most structured ensemble of Wigner matrices are the Gaussian-divisible or Johansson matrices, which are matrices H of the form $H = e^{-t/2} \hat H + (1-e^{-t})^{1/2} V$, where $\hat H$ is another Wigner matrix, V is a GUE matrix independent of $\hat H$, and $0 < t < 1$ is a fixed parameter independent of n.  Here, one still has quite explicit (though not quite as tractable) formulae for the joint eigenvalue distribution and related statistics.  Note that the limiting case t=1 is GUE.
• After this, one has the Ornstein-Uhlenbeck-evolved matrices, which are also of the form $H = e^{-t/2} \hat H + (1-e^{-t})^{1/2} V$, but now $t = n^{-1+\delta}$ decays at a power rate with n, rather than being comparable to 1.  Explicit formulae still exist for these matrices, but extracting universality out of this is hard work (and occupies the bulk of the paper of Erdős-Ramírez-Schlein-Yau).
• Finally, one has arbitrary Wigner matrices, which can be viewed as the t=0 limit of the above Ornstein-Uhlenbeck process.

The arguments in the paper of Erdős-Ramírez-Schlein-Yau can be summarised as follows (I assume subexponential decay throughout this discussion):

1. (Structured case) The universality conjecture is true for Ornstein-Uhlenbeck-evolved matrices with $t = n^{-1+\delta}$ for any $0 < \delta \leq 1$.  (The case $1/4 < \delta \leq 1$ was treated in an earlier paper of Erdős-Ramírez-Schlein-Yau, while the case where t is comparable to 1 was treated by Johansson.)
2. (Matching) Every Wigner matrix with suitable smoothness conditions can be “matched” with an Ornstein-Uhlenbeck-evolved matrix, in the sense that the eigenvalue statistics for the two matrices are asymptotically identical.  (This is relatively easy due to the fact that $\delta$ can be taken arbitrarily close to zero.)
3. Combining 1. and 2. one obtains universality for all Wigner matrices obeying suitable smoothness conditions.

The arguments in the paper of Van and myself can be summarised as follows:

1. (Structured case) The universality conjecture is true for Johansson matrices, by the paper of Johansson.
2. (Matching) Every Wigner matrix with some moment and support conditions can be “matched” with a Johansson matrix, in the sense that the first four moments of the entries agree, and hence (by the Lindeberg strategy in our paper) have asymptotically identical statistics.
3. Combining 1. and 2. one obtains universality for all Wigner matrices obtaining suitable moment and support conditions.

What we realised is by combining the hard part 1. of the paper of Erdős-Ramírez-Schlein-Yau with the hard part 2. of the paper of Van and myself, we can remove all regularity, moment, and support conditions.  Roughly speaking, the unified argument proceeds as follows:

1. (Structured case) By the arguments of Erdős-Ramírez-Schlein-Yau, the universality conjecture is true for Ornstein-Uhlenbeck-evolved matrices with $t = n^{-1+\delta}$ for any $0 < \delta \leq 1$.
2. (Matching) Every Wigner matrix $H$ can be “matched” with an Ornstein-Uhlenbeck-evolved matrix $e^{-t/2} H + (1-e^{-t})^{1/2} V$ for $t= n^{-1+0.01}$ (say), in the sense that the first four moments of the entries almost agree, which is enough (by the arguments of Van and myself) to show that these two matrices have asymptotically identical statistics on the average.
3. Combining 1. and 2. one obtains universality for the averaged statistics for all Wigner matrices.

The averaging should be removable, but this would require better convergence results to the semicircular law than are currently known (except with additional hypotheses, such as vanishing third moment).  The subexponential decay should also be relaxed to a condition of finiteness for some fixed moment ${\Bbb E} |x|^C$, but we did not pursue this direction in order to keep the paper short.

Van Vu and I have just uploaded to the arXiv our paper “Random matrices: universality of local eigenvalue statistics“, submitted to Acta Math..  This paper concerns the eigenvalues $\lambda_1(M_n) \leq \ldots \leq \lambda_n(M_n)$ of a Wigner matrix $M_n = (\zeta_{ij})_{1 \leq i,j \leq n}$, which we define to be a random Hermitian $n \times n$ matrix whose upper-triangular entries $\zeta_{ij}, 1 \leq i \leq j \leq n$ are independent (and whose strictly upper-triangular entries $\zeta_{ij}, 1 \leq i < j \leq n$ are also identically distributed).  [The lower-triangular entries are of course determined from the upper-triangular ones by the Hermitian property.]  We normalise the matrices so that all the entries have mean zero and variance 1.  Basic examples of Wigner Hermitian matrices include

1. The Gaussian Unitary Ensemble (GUE), in which the upper-triangular entries $\zeta_{ij}, i are complex gaussian, and the diagonal entries $\zeta_{ii}$ are real gaussians;
2. The Gaussian Orthogonal Ensemble (GOE), in which all entries are real gaussian;
3. The Bernoulli Ensemble, in which all entries take values $\pm 1$ (with equal probability of each).

We will make a further distinction into Wigner real symmetric matrices (which are Wigner matrices with real coefficients, such as GOE and the Bernoulli ensemble) and Wigner Hermitian matrices (which are Wigner matrices whose upper-triangular coefficients have real and imaginary parts iid, such as GUE).

The GUE and GOE ensembles have a rich algebraic structure (for instance, the GUE distribution is invariant under conjugation by unitary matrices, while the GOE distribution is similarly invariant under conjugation by orthogonal matrices, hence the terminology), and as a consequence their eigenvalue distribution can be computed explicitly.  For instance, the joint distribution of the eigenvalues $\lambda_1(M_n),\ldots,\lambda_n(M_n)$ for GUE is given by the explicit formula

$\displaystyle C_n \prod_{1 \leq i (0)

for some explicitly computable constant $C_n$ on the orthant $\{ \lambda_1 \leq \ldots \leq \lambda_n\}$ (a result first established by Ginibre).  (A similar formula exists for GOE, but for simplicity we will just discuss GUE here.)  Using this explicit formula one can compute a wide variety of asymptotic eigenvalue statistics.  For instance, the (bulk) empirical spectral distribution (ESD) measure $\frac{1}{n} \sum_{i=1}^n \delta_{\lambda_i(M_n)/\sqrt{n}}$ for GUE (and indeed for all Wigner matrices, see below) is known to converge (in the vague sense) to the Wigner semicircular law

$\displaystyle \frac{1}{2\pi} (4-x^2)_+^{1/2}\ dx =: \rho_{sc}(x)\ dx$ (1)

as $n \to \infty$.  Actually, more precise statements are known for GUE; for instance, for $1 \leq i \leq n$, the $i^{th}$ eigenvalue $\lambda_i(M_n)$ is known to equal

$\displaystyle \lambda_i(M_n) = \sqrt{n} t(\frac{i}{n}) + O( \frac{\log n}{n} )$ (2)

with probability $1-o(1)$, where $t(a) \in [-2,2]$ is the inverse cumulative distribution function of the semicircular law, thus

$\displaystyle a = \int_{-2}^{t(a)} \rho_{sc}(x)\ dx$.

Furthermore, the distribution of the normalised eigenvalue spacing $\sqrt{n} \rho_{sc}(\frac{i}{n}) (\lambda_{i+1}(M_n) - \lambda_i(M_n))$ is known; in the bulk region $\varepsilon n \leq i \leq 1-\varepsilon n$ for fixed $\varepsilon > 0$, it converges as $n \to \infty$ to the Gaudin distribution, which can be described explicitly in terms of determinants of the Dyson sine kernel $K(x,y) := \frac{\sin \pi(x-y)}{\pi(x-y)}$.  Many further local statistics of the eigenvalues of GUE are in fact governed by this sine kernel, a result usually proven using the asymptotics of orthogonal polynomials (and specifically, the Hermite polynomials).  (At the edge of the spectrum, say $i = n-O(1)$, the asymptotic distribution is a bit different, being governed instead by the  Tracy-Widom law.)

It has been widely believed that these GUE facts enjoy a universality property, in the sense that they should also hold for wide classes of other matrix models. In particular, Wigner matrices should enjoy the same bulk distribution (1), the same asymptotic law (2) for individual eigenvalues, and the same sine kernel statistics as GUE. (The statistics for Wigner symmetric matrices are slightly different, and should obey GOE statistics rather than GUE ones.)

There has been a fair body of evidence to support this belief.  The bulk distribution (1) is in fact valid for all Wigner matrices (a result of Pastur, building on the original work of Wigner of course).  The Tracy-Widom statistics on the edge were established for all Wigner Hermitian matrices (assuming that the coefficients had a distribution which was symmetric and decayed exponentially) by Soshnikov (with some further refinements by Soshnikov and Peche).  Soshnikov’s arguments were based on an advanced version of the moment method.

The sine kernel statistics were established by Johansson for Wigner Hermitian matrices which were gaussian divisible, which means that they could be expressed as a non-trivial linear combination of another Wigner Hermitian matrix and an independent GUE.  (Basically, this means that distribution of the coefficients is a convolution of some other distribution with a gaussian.  There were some additional technical decay conditions in Johansson’s work which were removed in subsequent work of Ben Arous and Peche.)   Johansson’s work was based on an explicit formula for the joint distribution for gauss divisible matrices that generalises (0) (but is significantly more complicated).

Just last week, Erdos, Ramirez, Schlein, and Yau established sine kernel statistics for Wigner Hermitian matrices with exponential decay and a high degree of smoothness (roughly speaking, they require  control of up to six derivatives of the Radon-Nikodym derivative of the distribution).  Their method is based on an analysis of the dynamics of the eigenvalues under a smooth transition from a general Wigner Hermitian matrix to GUE, essentially a matrix version of the Ornstein-Uhlenbeck process, whose eigenvalue dynamics are governed by Dyson Brownian motion.

In my paper with Van, we establish similar results to that of Erdos et al. under slightly different hypotheses, and by a somewhat different method.  Informally, our main result is as follows:

Theorem 1. (Informal version)  Suppose $M_n$ is a Wigner Hermitian matrix whose coefficients have an exponentially decaying distribution, and whose real and imaginary parts are supported on at least three points (basically, this excludes Bernoulli-type distributions only) and have vanishing third moment (which is for instance the case for symmetric distributions).  Then one has the local statistics (2) (but with an error term of $O(n^{-1+\delta})$ for any $\delta>0$ rather than $O(\log n/n)$) and the sine kernel statistics for individual eigenvalue spacings $\sqrt{n} \rho_{sc}(\frac{i}{n}) (\lambda_{i+1}(M_n) - \lambda_i(M_n))$ (as well as higher order correlations) in the bulk.

If one removes the vanishing third moment hypothesis, one still has the sine kernel statistics provided one averages over all i.

There are analogous results for Wigner real symmetric matrices (see paper for details).  There are also some related results, such as a universal distribution for the least singular value of matrices of the form in Theorem 1, and a crude asymptotic for the determinant (in particular, $\log |\det M_n| = (1+o(1)) \log \sqrt{n!}$ with probability $1-o(1)$).

The arguments are based primarily on the Lindeberg replacement strategy, which Van and I also used to obtain universality for the circular law for iid matrices, and for the least singular value for iid matrices, but also rely on other tools, such as some recent arguments of Erdos, Schlein, and Yau, as well as a very useful concentration inequality of Talagrand which lets us tackle both discrete and continuous matrix ensembles in a unified manner.  (I plan to talk about Talagrand’s inequality in my next blog post.)

Van Vu and I have just uploaded to the arXiv our paper “Random matrices: The distribution of the smallest singular values“, submitted to Geom. Func. Anal..   This paper concerns the least singular value $\sigma_n(M)$ of a random $n \times n$ matrix $M_n = (a_{ij})_{1 \leq i,j \leq n}$ with iid entries, which for simplicity we will take to be real (we also have analogues for complex random matrices), with mean zero and variance one.  A typical model to keep in mind here is the Bernoulli model, when each $a_{ij}$ is equal to +1 or -1 with an equal probability of each, while a privileged role will be planed by the gaussian model, when each $a_{ij} \equiv N(0,1)$ is given the standard gaussian distribution.

The distribution of the least singular value $\sigma_n(M_n)$, which is of importance in smoothed analysis and also has intrinsic interest within the field of random matrices, has been intensively studied in recent years.  For instance, in the Bernoulli case, there have been several recent papers on the singularity probability ${\Bbb P}( \sigma_n(M_n) = 0 )$; it is not hard to obtain a lower bound of $(\frac{1}{2}+o(1))^n$, and this is conjectured to be the correct asymptotic.  The best upper bound so far is by Bourgain, Vu, and Wood, who obtain $(\frac{1}{\sqrt{2}}+o(1))^n$.

Upper and lower tail bounds have also been obtained, starting with the breakthrough paper of Rudelson (building upon some earlier work on rectangular matrices by Litvak, Pajor, Rudelson, and Tomczak-Jaegermann), with subsequent papers by Van and myself, by Rudelson, and also by Rudelson and Vershynin.  To oversimplify somewhat, the conclusion of this work is that the least singular value $\sigma_n(M_n)$ has size comparable to $1/\sqrt{n}$ with high probability.  The techniques are based in part on inverse Littlewood-Offord theorems.

However, in the case of the gaussian ensemble, we know more than just the expected size of the least singular value; we know its asymptotic distribution.  Indeed, it was shown by Edelman in this case that one has

${\Bbb P}( \sigma_n(M_n) \leq t/\sqrt{n} ) = 1 - e^{-t^2/2 - t} + o(1)$ (1)

for any fixed $t > 0$.  This computation was highly algebraic in nature, relying on special identities that are available only for extremely symmetric random matrix ensembles, such as the gaussian random matrix model; in particular, it is not obvious at all that the Bernoulli ensemble necessarily obeys the same distribution as the gaussian one.  Nevertheless, motivated in part by this computation, Spielman and Teng conjectured that the bound

${\Bbb P}( \sigma_n(M_n) \leq t/\sqrt{n} ) \leq t + c^n$

should hold for some $c < 1$ for, say, the Bernoulli ensemble.    This conjecture was verified up to losses of a multiplicative constant by Rudelson and Vershynin.

The main result of our paper is to show that the distribution of the least singular value is in fact universal, being asymptotically the same for all iid (real) random matrix models with the same mean and variance, and with a sufficiently high number of moment conditions.  In particular, the asymptotic (1) for the gaussian ensemble is also true for the Bernoulli ensemble. Furthermore the error term o(1) can be shown to be of the shape $O(n^{-c})$ for some c > 0, which in turn confirms the Spielman-Teng conjecture (without a loss of constant) in the polynomial size regime $t \geq n^{-c'}$ for some $c' > 0$.  We also have some further results for other low singular values (e.g. $\sigma_{n-k}(M_n)$ for fixed k) but they are harder to state, and we will not do so here.

To our knowledge, this is the first universality result for the “hard edge” of the spectrum (i.e. the least few singular values) for iid square matrix models.  [For rectangular matrices, where the hard edge is bounded away from zero, universality was recently established by Feldheim and Sodin.] The bulk distribution for the singular values of such matrices has been known for some time (it is governed by the famous Marchenko-Pastur law), while the distribution at the “soft edge” of the spectrum (i.e. the largest few singular values) was established to be universal by Soshnikov (here the distribution is governed by the Tracy-Widom law for the top singular value, and by the Airy kernel for the next few singular values).  Both of these results are basically obtained by the moment method (or close relatives of this method, such as the resolvent method).  However, the moment method is not effective for discerning the hard edge of the spectrum, since the singular values here are very small compared with the bulk and so have a negligible influence on the moments.  [In the rectangular case, where the hard edge is bounded away from zero, the moment method becomes available again, though the application of it is quite delicate; see the Feldheim-Sodin paper for details.] Instead, we proceed by observing a certain central limit theorem type behaviour for the geometry of the columns of $M_n$, which is enough to give us the desired universality; more details on the proof lie below the fold.

Van Vu and I have just uploaded to the arXiv our preprint “A sharp inverse Littlewood-Offord theorem“, which we have submitted to Random Structures and Algorithms.  This paper gives a solution to the (inverse) Littlewood-Offord problem of understanding when random walks are concentrated in the case when the concentration is of polynomial size in the length $n$ of the walk; our description is sharp up to epsilon powers of $n$.  The theory of inverse Littlewood-Offord problems and related topics has been of importance in recent developments in the spectral theory of discrete random matrices (e.g. a “robust” variant of these theorems was crucial in our work on the circular law).

For simplicity I will restrict attention to the Bernoulli random walk.  Given $n$ real numbers $v_1,\ldots,v_n$, one can form the random variable

$S := \epsilon_1 v_1 + \ldots + \epsilon_n v_n$

where $\epsilon_1,\ldots,\epsilon_n \in \{-1,+1\}$ are iid random signs (with either sign +1, -1 chosen with probability 1/2).  This is a discrete random variable which typically takes $2^n$ values.  However, if there are various arithmetic relations between the step sizes $v_1,\ldots,v_n$, then many of the $2^n$ possible sums collide, and certain values may then arise with much higher probability.  To measure this, define the concentration probability $p(v_1,\ldots,v_n)$ by the formula

$p(v_1,\ldots,v_n) = \sup_x {\Bbb P}(S=x)$.

Intuitively, this probability measures the amount of additive structure present between the $v_1,\ldots,v_n$.  There are two (opposing) problems in the subject:

• (Forward Littlewood-Offord problem) Given some structural assumptions on $v_1,\ldots,v_n$, what bounds can one place on $p(v_1,\ldots,v_n)$?
• (Inverse Littlewood-Offord problem) Given some bounds on $p(v_1,\ldots,v_n)$, what structural assumptions can one then conclude about $v_1,\ldots,v_n$?

Ideally one would like answers to both of these problems which come close to inverting each other, and this is the guiding motivation for our paper.

Van Vu and I have just uploaded to the arXiv our new paper, “Random matrices: Universality of ESDs and the circular law“, with an appendix by Manjunath Krishnapur (and some numerical data and graphs by Philip Wood).  One of the things we do in this paper (which was our original motivation for this project) was to finally establish the endpoint case of the circular law (in both strong and weak forms) for random iid matrices $A_n = (a_{ij})_{1 \leq i,j \leq n}$, where the coefficients $a_{ij}$ are iid random variables with mean zero and unit variance.  (The strong circular law says that with probability 1, the empirical spectral distribution (ESD) of the normalised eigenvalues $\frac{1}{\sqrt{n}} \lambda_1, \ldots, \frac{1}{\sqrt{n}} \lambda_n$ of the matrix $A_n$ converges to the uniform distribution on the unit circle as $n \to \infty$.  The weak circular law asserts the same thing, but with convergence in probability rather than almost sure convergence; this is in complete analogy with the weak and strong law of large numbers, and in fact this law is used in the proof.)  In a previous paper, we had established the same claim but under the additional assumption that the $(2+\eta)^{th}$ moment ${\Bbb E} |a_{ij}|^{2+\eta}$ was finite for some $\eta > 0$; this builds upon a significant body of earlier work by Mehta, Girko, Bai, Bai-Silverstein, Gotze-Tikhomirov, and Pan-Zhou, as discussed in the blog article for the previous paper.

As it turned out, though, in the course of this project we found a more general universality principle (or invariance principle) which implied our results about the circular law, but is perhaps more interesting in its own right.  Observe that the statement of the circular law can be split into two sub-statements:

1. (Universality for iid ensembles) In the asymptotic limit $n \to \infty$, the ESD of the random matrix $A_n$ is independent of the choice of distribution of the coefficients, so long as they are normalised in mean and variance.  In particular, the ESD of such a matrix is asymptotically the same as that of a (real or complex) gaussian matrix $G_n$ with the same mean and variance.
2. (Circular law for gaussian matrices) In the asymptotic limit $n \to \infty$, the ESD of a gaussian matrix $G_n$ converges to the circular law.

The reason we single out the gaussian matrix ensemble $G_n$ is that it has a much richer algebraic structure (for instance, the real (resp. complex) gaussian ensemble is invariant under right and left multiplication by the orthogonal group O(n) (resp. the unitary group U(n))).  Because of this, it is possible to compute the eigenvalue distribution very explicitly by algebraic means (for instance, using the machinery of orthogonal polynomials).  In particular, the circular law for complex gaussian matrices (Statement 2 above) was established all the way back in 1967 by Mehta, using an explicit formula for the distribution of the ESD in this case due to Ginibre.

These highly algebraic techniques completely break down for more general iid ensembles, such as the Bernoulli ensemble of matrices whose entries are +1 or -1 with an equal probability of each.  Nevertheless, it is a remarkable phenomenon – which has been referred to as universality in the literature, for instance in this survey by Deift – that the spectral properties of random matrices for non-algebraic ensembles are in many cases asymptotically indistinguishable in the limit $n \to \infty$ from that of algebraic ensembles with the same mean and variance (i.e. Statement 1 above).  One might view this as a sort of “non-Hermitian, non-commutative” analogue of the universality phenomenon represented by the central limit theorem, in which the limiting distribution of a normalised average

$\displaystyle \overline{X}_n := \frac{1}{\sqrt{n}} (X_1 + \ldots + X_n )$ (1)

of an iid sequence depends only on the mean and variance of the elements of that sequence (assuming of course that these quantities are finite), and not on the underlying distribution.  (The Hermitian non-commutative analogue of the CLT is known as Wigner’s semicircular law.)

Previous approaches to the circular law did not build upon the gaussian case, but instead proceeded directly, in particular controlling the ESD of a random matrix $A_n$ via estimates on the Stieltjes transform

$\displaystyle \frac{1}{n} \log |\det( \frac{1}{\sqrt{n}} A_n - zI )|$ (2)

of that matrix for complex numbers z.  This method required a combination of delicate analysis (in particular, a bound on the least singular values of $\frac{1}{\sqrt{n}} A_n - zI$), and algebra (in order to compute and then invert the Stieltjes transform).  [As a general rule, and oversimplifying somewhat, algebra tends to be used to control main terms in a computation, while analysis is used to control error terms.]

What we discovered while working on our paper was that the algebra and analysis could be largely decoupled from each other: that one could establish a universality principle (Statement 1 above) by relying primarily on tools from analysis (most notably the bound on least singular values mentioned earlier, but also Talagrand’s concentration of measure inequality, and a universality principle for the singular value distribution of random matrices due to Dozier and Silverstein), so that the algebraic heavy lifting only needs to be done in the gaussian case (Statement 2 above) where the task is greatly simplified by all the additional algebraic structure available in that setting.   This suggests a possible strategy to proving other conjectures in random matrices (for instance concerning the eigenvalue spacing distribution of random iid matrices), by first establishing universality to swap the general random matrix ensemble with an algebraic ensemble (without fully understanding the limiting behaviour of either), and then using highly algebraic tools to understand the latter ensemble.  (There is now a sophisticated theory in place to deal with the latter task, but the former task – understanding universality – is still only poorly understood in many cases.)