You are currently browsing the tag archive for the ‘least singular value’ tag.

Now we turn attention to another important spectral statistic, the least singular value ${\sigma_n(M)}$ of an ${n \times n}$ matrix ${M}$ (or, more generally, the least non-trivial singular value ${\sigma_p(M)}$ of a ${n \times p}$ matrix with ${p \leq n}$). This quantity controls the invertibility of ${M}$. Indeed, ${M}$ is invertible precisely when ${\sigma_n(M)}$ is non-zero, and the operator norm ${\|M^{-1}\|_{op}}$ of ${M^{-1}}$ is given by ${1/\sigma_n(M)}$. This quantity is also related to the condition number ${\sigma_1(M)/\sigma_n(M) = \|M\|_{op} \|M^{-1}\|_{op}}$ of ${M}$, which is of importance in numerical linear algebra. As we shall see in the next set of notes, the least singular value of ${M}$ (and more generally, of the shifts ${\frac{1}{\sqrt{n}} M - zI}$ for complex ${z}$) will be of importance in rigorously establishing the circular law for iid random matrices ${M}$, as it plays a key role in computing the Stieltjes transform ${\frac{1}{n} \hbox{tr} (\frac{1}{\sqrt{n}} M - zI)^{-1}}$ of such matrices, which as we have already seen is a powerful tool in understanding the spectra of random matrices.

The least singular value

$\displaystyle \sigma_n(M) = \inf_{\|x\|=1} \|Mx\|,$

which sits at the “hard edge” of the spectrum, bears a superficial similarity to the operator norm

$\displaystyle \|M\|_{op} = \sigma_1(M) = \sup_{\|x\|=1} \|Mx\|$

at the “soft edge” of the spectrum, that was discussed back in Notes 3, so one may at first think that the methods that were effective in controlling the latter, namely the epsilon-net argument and the moment method, would also work to control the former. The epsilon-net method does indeed have some effectiveness when dealing with rectangular matrices (in which the spectrum stays well away from zero), but the situation becomes more delicate for square matrices; it can control some “low entropy” portions of the infimum that arise from “structured” or “compressible” choices of ${x}$, but are not able to control the “generic” or “incompressible” choices of ${x}$, for which new arguments will be needed. As for the moment method, this can give the coarse order of magnitude (for instance, for rectangular matrices with ${p=yn}$ for ${0 < y < 1}$, it gives an upper bound of ${(1-\sqrt{y}+o(1))n}$ for the singular value with high probability, thanks to the Marchenko-Pastur law), but again this method begins to break down for square matrices, although one can make some partial headway by considering negative moments such as ${\hbox{tr} M^{-2}}$, though these are more difficult to compute than positive moments ${\hbox{tr} M^k}$.

So one needs to supplement these existing methods with additional tools. It turns out that the key issue is to understand the distance between one of the ${n}$ rows ${X_1,\ldots,X_n \in {\bf C}^n}$ of the matrix ${M}$, and the hyperplane spanned by the other ${n-1}$ rows. The reason for this is as follows. First suppose that ${\sigma_n(M)=0}$, so that ${M}$ is non-invertible, and there is a linear dependence between the rows ${X_1,\ldots,X_n}$. Thus, one of the ${X_i}$ will lie in the hyperplane spanned by the other rows, and so one of the distances mentioned above will vanish; in fact, one expects many of the ${n}$ distances to vanish. Conversely, whenever one of these distances vanishes, one has a linear dependence, and so ${\sigma_n(M)=0}$.

More generally, if the least singular value ${\sigma_n(M)}$ is small, one generically expects many of these ${n}$ distances to be small also, and conversely. Thus, control of the least singular value is morally equivalent to control of the distance between a row ${X_i}$ and the hyperplane spanned by the other rows. This latter quantity is basically the dot product of ${X_i}$ with a unit normal ${n_i}$ of this hyperplane.

When working with random matrices with jointly independent coefficients, we have the crucial property that the unit normal ${n_i}$ (which depends on all the rows other than ${X_i}$) is independent of ${X_i}$, so even after conditioning ${n_i}$ to be fixed, the entries of ${X_i}$ remain independent. As such, the dot product ${X_i \cdot n_i}$ is a familiar scalar random walk, and can be controlled by a number of tools, most notably Littlewood-Offord theorems and the Berry-Esséen central limit theorem. As it turns out, this type of control works well except in some rare cases in which the normal ${n_i}$ is “compressible” or otherwise highly structured; but epsilon-net arguments can be used to dispose of these cases. (This general strategy was first developed for the technically simpler singularity problem by Komlós, and then extended to the least singular value problem by Rudelson.)

These methods rely quite strongly on the joint independence on all the entries; it remains a challenge to extend them to more general settings. Even for Wigner matrices, the methods run into difficulty because of the non-independence of some of the entries (although it turns out one can understand the least singular value in such cases by rather different methods).

To simplify the exposition, we shall focus primarily on just one specific ensemble of random matrices, the Bernoulli ensemble ${M = (\xi_{ij})_{1 \leq i,j \leq n}}$ of random sign matrices, where ${\xi_{ij} = \pm 1}$ are independent Bernoulli signs. However, the results can extend to more general classes of random matrices, with the main requirement being that the coefficients are jointly independent.

Van Vu and I have just uploaded to the arXiv our paper “Random matrices: The distribution of the smallest singular values“, submitted to Geom. Func. Anal..   This paper concerns the least singular value $\sigma_n(M)$ of a random $n \times n$ matrix $M_n = (a_{ij})_{1 \leq i,j \leq n}$ with iid entries, which for simplicity we will take to be real (we also have analogues for complex random matrices), with mean zero and variance one.  A typical model to keep in mind here is the Bernoulli model, when each $a_{ij}$ is equal to +1 or -1 with an equal probability of each, while a privileged role will be planed by the gaussian model, when each $a_{ij} \equiv N(0,1)$ is given the standard gaussian distribution.

The distribution of the least singular value $\sigma_n(M_n)$, which is of importance in smoothed analysis and also has intrinsic interest within the field of random matrices, has been intensively studied in recent years.  For instance, in the Bernoulli case, there have been several recent papers on the singularity probability ${\Bbb P}( \sigma_n(M_n) = 0 )$; it is not hard to obtain a lower bound of $(\frac{1}{2}+o(1))^n$, and this is conjectured to be the correct asymptotic.  The best upper bound so far is by Bourgain, Vu, and Wood, who obtain $(\frac{1}{\sqrt{2}}+o(1))^n$.

Upper and lower tail bounds have also been obtained, starting with the breakthrough paper of Rudelson (building upon some earlier work on rectangular matrices by Litvak, Pajor, Rudelson, and Tomczak-Jaegermann), with subsequent papers by Van and myself, by Rudelson, and also by Rudelson and Vershynin.  To oversimplify somewhat, the conclusion of this work is that the least singular value $\sigma_n(M_n)$ has size comparable to $1/\sqrt{n}$ with high probability.  The techniques are based in part on inverse Littlewood-Offord theorems.

However, in the case of the gaussian ensemble, we know more than just the expected size of the least singular value; we know its asymptotic distribution.  Indeed, it was shown by Edelman in this case that one has

${\Bbb P}( \sigma_n(M_n) \leq t/\sqrt{n} ) = 1 - e^{-t^2/2 - t} + o(1)$ (1)

for any fixed $t > 0$.  This computation was highly algebraic in nature, relying on special identities that are available only for extremely symmetric random matrix ensembles, such as the gaussian random matrix model; in particular, it is not obvious at all that the Bernoulli ensemble necessarily obeys the same distribution as the gaussian one.  Nevertheless, motivated in part by this computation, Spielman and Teng conjectured that the bound

${\Bbb P}( \sigma_n(M_n) \leq t/\sqrt{n} ) \leq t + c^n$

should hold for some $c < 1$ for, say, the Bernoulli ensemble.    This conjecture was verified up to losses of a multiplicative constant by Rudelson and Vershynin.

The main result of our paper is to show that the distribution of the least singular value is in fact universal, being asymptotically the same for all iid (real) random matrix models with the same mean and variance, and with a sufficiently high number of moment conditions.  In particular, the asymptotic (1) for the gaussian ensemble is also true for the Bernoulli ensemble. Furthermore the error term o(1) can be shown to be of the shape $O(n^{-c})$ for some c > 0, which in turn confirms the Spielman-Teng conjecture (without a loss of constant) in the polynomial size regime $t \geq n^{-c'}$ for some $c' > 0$.  We also have some further results for other low singular values (e.g. $\sigma_{n-k}(M_n)$ for fixed k) but they are harder to state, and we will not do so here.

To our knowledge, this is the first universality result for the “hard edge” of the spectrum (i.e. the least few singular values) for iid square matrix models.  [For rectangular matrices, where the hard edge is bounded away from zero, universality was recently established by Feldheim and Sodin.] The bulk distribution for the singular values of such matrices has been known for some time (it is governed by the famous Marchenko-Pastur law), while the distribution at the “soft edge” of the spectrum (i.e. the largest few singular values) was established to be universal by Soshnikov (here the distribution is governed by the Tracy-Widom law for the top singular value, and by the Airy kernel for the next few singular values).  Both of these results are basically obtained by the moment method (or close relatives of this method, such as the resolvent method).  However, the moment method is not effective for discerning the hard edge of the spectrum, since the singular values here are very small compared with the bulk and so have a negligible influence on the moments.  [In the rectangular case, where the hard edge is bounded away from zero, the moment method becomes available again, though the application of it is quite delicate; see the Feldheim-Sodin paper for details.] Instead, we proceed by observing a certain central limit theorem type behaviour for the geometry of the columns of $M_n$, which is enough to give us the desired universality; more details on the proof lie below the fold.

Van Vu and I have just uploaded to the arXiv our paper “Random matrices: A general approach for the least singular value problem“, submitted to Israel J. Math.. This paper continues a recent series of papers by ourselves and also by Rudelson and by RudelsonVershynin on understanding the least singular value $\sigma_n(A) := \inf_{\|v\|=1} \|Av\|$ of a large random $n \times n$ random complex matrix A. There are many random matrix models that one can consider, but here we consider models of the form $A = M_n + N_n$, where $M_n = {\Bbb E}(A)$ is a deterministic matrix depending on n, and $N_n$ is a random matrix whose entries are iid with some complex distribution x of mean zero and unit variance. (In particular, this model is useful for studying the normalised resolvents $(\frac{1}{\sqrt{n}} N_n - zI)^{-1}$ of random iid matrices $N_n$, which are of importance in the spectral theory of these matrices; understanding the least singular value of random perturbations of deterministic matrices is also important in numerical analysis, and particularly in smoothed analysis of algorithms such as the simplex method.)

In the model mean zero case $M_n = 0$, the normalised singular values $\frac{1}{\sqrt{n}} \sigma_1(A) \geq \ldots \geq \frac{1}{\sqrt{n}} \sigma_n(A) \geq 0$ of $A = N_n$ are known to be asymptotically distributed according to the Marchenko-Pastur distribution $\frac{1}{\pi} (4-x^2)_+^{1/2} dx$, which in particular implies that most of the singular values are continuously distributed (via a semicircular distribution) in the interval ${}[0, 2\sqrt{n}]$. (Assuming only second moment hypotheses on the underlying distribution x, this result is due to Yin; there are many earlier results assuming stronger hypotheses on x.) This strongly suggests, but does not formally prove, that the least singular value $\sigma_n(A)$ should be of size $\sim 1/\sqrt{n}$ on the average. (To get such a sharp bound on the least singular value via the Marchenko-Pastur law would require an incredibly strong bound on the convergence rate to this law, which seems out of reach at present, especially when one does not assume strong moment conditions on x; current results such as those of Götze-Tikhomirov or Chatterjee-Bose give some upper bound on $\sigma_n(A)$ which improves upon the trivial bound of $O(n^{1/2})$ by a polynomial factor assuming certain moment conditions on x, but as far as I am aware these bounds do not get close to the optimal value of $O(n^{-1/2})$, except perhaps in the special case when x is Gaussian.) The statement that $\sigma_n(A) \sim 1/\sqrt{n}$ with high probability has been conjectured (in various forms) in a number of places, for instance by von Neumann, by Smale, and by Spielman-Teng.

This week I am at Rutgers University, giving the Lewis Memorial Lectures for this year, which are also concurrently part of a workshop in random matrices. I gave four lectures, three of which were on random matrices, and one of which was on the Szemerédi regularity lemma.

The titles, abstracts, and slides of these talks are as follows.

1. Szemerédi’s lemma revisited. In this general-audience talk, I discuss the Szemerédi regularity lemma (which, roughly speaking, shows that an arbitrary large dense graph can always be viewed as the disjoint union of a bounded number of pseudorandom components), and how it has recently been reinterpreted in a more analytical (and infinitary) language using the theory of graph limits or of exchangeable measures. I also discuss arithmetic analogues of this lemma, including one which (implicitly) underlies my result with Ben Green that the primes contain arbitrarily long arithmetic progressions.
2. Singularity and determinant of random matrices. Here, I present recent progress in understanding the question of how likely a random matrix (e.g. one whose entries are all +1 or -1 with equal probability) is to be invertible, as well as the related question of how large the determinant should be. The case of continuous matrix ensembles (such as the Gaussian ensemble) is well understood, but the discrete case contains some combinatorial difficulties and took longer to understand properly. In particular I present the results of Kahn-Komlós-Szemerédi and later authors showing that discrete random matrices are invertible with exponentially high probability, and also give some results for the distribution of the determinant.
3. The least singular value of random matrices. A more quantitative version of the question “when is a matrix invertible?” is “what is the least singular value of that matrix”? I present here the recent results of Litvak-Pajor-Rudelson-Tomczak-Jaegermann, Rudelson, myself and Vu, and RudelsonVershynin on addressing this question in the discrete case. A central role is played by the inverse Littlewood-Offord theorems of additive combinatorics, which give reasonably sharp necessary conditions for a discrete random walk to concentrate in a small ball.
4. The circular law. One interesting application of the above theory is to extend the circular law for the spectrum of random matrices from the continuous case to the discrete case. Previous arguments of Girko and Bai for the continuous case can be transplanted to the discrete case, but the key new ingredient needed is a least singular value bound for shifted matrices $M-\lambda I$ in order to avoid the spectrum being overwhelmed by pseudospectrum. It turns out that the results of the preceding lecture are almost precisely what are needed to accomplish this.

[Update, Mar 31: first lecture slides corrected.  Thanks to Yoshiyasu Ishigami for pointing out a slight inaccuracy in the text.]