Random matrices: A general approach for the least singular value problem

22 May, 2008 in math.PR, paper | Tags: least singular value, random matrices, universality | by Terence Tao

Van Vu and I have just uploaded to the arXiv our paper “Random matrices: A general approach for the least singular value problem“, submitted to Israel J. Math.. This paper continues a recent series of papers by ourselves and also by Rudelson and by Rudelson–Vershynin on understanding the least singular value $\sigma_n(A) := \inf_{\|v\|=1} \|Av\|$ of a large random $n \times n$ random complex matrix A. There are many random matrix models that one can consider, but here we consider models of the form $A = M_n + N_n$ , where $M_n = {\Bbb E}(A)$ is a deterministic matrix depending on n, and $N_n$ is a random matrix whose entries are iid with some complex distribution x of mean zero and unit variance. (In particular, this model is useful for studying the normalised resolvents $(\frac{1}{\sqrt{n}} N_n - zI)^{-1}$ of random iid matrices $N_n$ , which are of importance in the spectral theory of these matrices; understanding the least singular value of random perturbations of deterministic matrices is also important in numerical analysis, and particularly in smoothed analysis of algorithms such as the simplex method.)

In the model mean zero case $M_n = 0$ , the normalised singular values $\frac{1}{\sqrt{n}} \sigma_1(A) \geq \ldots \geq \frac{1}{\sqrt{n}} \sigma_n(A) \geq 0$ of $A = N_n$ are known to be asymptotically distributed according to the Marchenko-Pastur distribution $\frac{1}{\pi} (4-x^2)_+^{1/2} dx$ , which in particular implies that most of the singular values are continuously distributed (via a semicircular distribution) in the interval ${}[0, 2\sqrt{n}]$ . (Assuming only second moment hypotheses on the underlying distribution x, this result is due to Yin; there are many earlier results assuming stronger hypotheses on x.) This strongly suggests, but does not formally prove, that the least singular value $\sigma_n(A)$ should be of size $\sim 1/\sqrt{n}$ on the average. (To get such a sharp bound on the least singular value via the Marchenko-Pastur law would require an incredibly strong bound on the convergence rate to this law, which seems out of reach at present, especially when one does not assume strong moment conditions on x; current results such as those of Götze-Tikhomirov or Chatterjee-Bose give some upper bound on $\sigma_n(A)$ which improves upon the trivial bound of $O(n^{1/2})$ by a polynomial factor assuming certain moment conditions on x, but as far as I am aware these bounds do not get close to the optimal value of $O(n^{-1/2})$ , except perhaps in the special case when x is Gaussian.) The statement that $\sigma_n(A) \sim 1/\sqrt{n}$ with high probability has been conjectured (in various forms) in a number of places, for instance by von Neumann, by Smale, and by Spielman-Teng.

There have been several papers establishing lower tail bounds on the least singular value consistent with the above conjecture of varying degrees of strength, under various hypotheses on the matrix $M_n$ and distribution x. For instance:

In 1988, Edelman showed that ${\Bbb P}( \sigma_n(A) \leq \varepsilon/\sqrt{n}) = O( \varepsilon )$ for any $\varepsilon > 0$ when $M_n=0$ and x is Gaussian (there is also a matching lower bound when $\varepsilon = O(1)$ ).
Some non-trivial bounds on ${\Bbb P}( \sigma_n(A) \leq \varepsilon/\sqrt{n})$ for very small $\varepsilon$ (polynomially small in size) are established implicitly in the case when $M_n$ is a multiple of the identity in the papers of Girko and of Bai assuming some moment and continuity assumptions on x, although these bounds are not specified explicitly.
In 2005, Rudelson showed that ${\Bbb P}( \sigma_n(A) \leq \varepsilon/\sqrt{n}) = O( n \varepsilon + n^{-1/2})$ when $M_n=0$ and x is subgaussian.
In 2005, Van Vu and I showed that for every $C_1 > 0$ there existed $C_2 > 0$ such that ${\Bbb P}( \sigma_n(A) \leq n^{-C_2}) = O( n^{-C_1} )$ , when $M_n=0$ and x is the Bernoulli distribution (equal to +1 or -1 with a equal probability of each).
In 2006, Senkar, Teng, and Spielman extended Edelman’s result to the case when $M_n$ is nonzero (but x is still Gaussian).
In 2007, Rudelson-Vershynin showed that ${\Bbb P}( \sigma_n(A) \leq \varepsilon/\sqrt{n})$ is bounded by $O( \varepsilon^c )$ for $\varepsilon$ independent of n when $M_n=0$ and x has bounded fourth moment, and is bounded by $O( \varepsilon + c^n)$ for all $\varepsilon > 0$ if x is subgaussian.
In 2007, Van Vu and I showed that our earlier result extends to the case when $M_n$ is non-zero, as long as it has polynomial size in n and is also integer-valued; we also allow x to be more general than the Bernoulli distribution, but still integer-valued. (Our restriction to the integer case was due to a certain rounding trick that converted finite precision arithmetic to exact arithmetic, but only when the expressions involved ultimately come from integers.)
In 2007, Van Vu and I removed the integrality assumptions on our previous results, thus establishing the bound ${\Bbb P}( \sigma_n(A) \leq n^{-C_2}) = O( n^{-C_1} )$ whenever M has polynomial size and with no assumptions on x other than zero mean and unit variance. (See also my earlier blog post on this paper.)
In 2008, Rudelson and Vershynin obtained generalisations of their previous results to rectangular matrices (thus also generalising some previous work of Litvak, Pajor, Rudelson, and Tomczak-Jaegermann), but still for the case $M_n=0$ and with subgaussian hypotheses on x.

These recent results are largely based on entropy (or “epsilon-net”) arguments, combined with conditioning arguments, with the entropy bounds in turn originating from inverse Littlewood-Offord theorems; see my Lewis lectures for further discussion.

The current paper is a partial unification of the results of Rudelson and Vershynin (which give very sharp tail estimates, but under strong hypotheses on M and x) and ourselves (which have very general assumptions on M and x, but a weak tail estimate). A little more precisely, we obtain an estimate of the form

${\Bbb P}( \sigma_n(A) \leq \varepsilon/\sqrt{n}) \leq O( n^{o(1)} \varepsilon + n^{-C} + {\Bbb P}( \|N_n\| \geq C \sqrt{n} ) )$

for any fixed $C > 0$ , assuming that $\|M\| = O(\sqrt{n})$ . This result almost recovers those of Rudelson and Vershynin under subgaussian hypotheses on x (which are known to imply exponentially good bounds on ${\Bbb P}( \|N_n\| \geq C \sqrt{n} )$ ) in the case when $\varepsilon$ has polynomial size, except that we lose a factor of $n^{o(1)}$ . One has analogous results here in which $\|M\|$ and $\|N_n\|$ are bounded by some weaker power of n than $n^{1/2}$ ; roughly speaking, if $\|M\|$ and $\|N_n\|$ have size $O(n^\gamma)$ , then we can show that $\sigma_n(A)$ is at least $n^{-(2C+1) \gamma}$ with probability $1 - O(n^{-C+o(1)})$ for any given C.

We also give a simple example that show that the deterioration of these bounds when $\|M\|$ gets large is necessary; in particular, we show that the universality of the bounds of Edelman type can break down once $\|M\|$ exceeds $n^2$ , because one can design M to force the existence of an unusually small value of $Av = (M +N_n)v$ for a certain type of unit vector v.

Our methods here are based on those of earlier papers, particularly our circular law paper; the key innovation is to run a certain scale pigeonholing argument as efficiently as possible, so that one only loses factors of $n^{o(1)}$ in the final bound.

[Update, May 22: some typos corrected.]

5 comments

Comments feed for this article

22 May, 2008 at 11:47 pm

Robert

I know this is quite a different question but maybe you know the answer: What happens over a finite field F, with N elements, say. If I pick the matrix elements independently with a flat distribution, what is the probability that the matrix A is not invertible? My guess would be 1/N, since one could imagine all elements but the last (a11 say) have been drawn already and det(A)= c1 + c2 a11 with some constants c1 and c2. If c2 does not vanish the determinant takes all N different values when a11 is varied but the case of c2=0 (some minor) it depends on the distribution of c1’s…

This problem came up in an encoding problem: There (imagine for example a RAID system) you want to spread information between a number of different people. For example the data are many (say L) vectors in F^k and everybody in a room has access to those vectors. However they cannot remember kL numbers. But if everybody picks one random vector and projects all the L vectors on that random vector and just remebers the random vector and the L scalar products. Then the full information can likely be reconstructed by any choice of k people (namely if their random vectors are linearly independent). So I am asking what is the probability that this fails?

23 May, 2008 at 1:07 am

enric

wow, you can also prove theorems in the future!

(e.g., item 9)

23 May, 2008 at 7:01 am

Emmanuel Kowalski

For the question in Comment 1 (if I understand it right) for matrrices of size $n$ , the probability you want seems to be simply one minus the ratio of the order of the group of invertible matrices divided by $N^{n^2}$ . The order of the group of invertible matrices is well-known, and one finds this probability is

$1-\prod_{k=1}^n{(1-N^{-k})}$

which has a limit (for fixed $N$ ) as $n$ goes to infinity (one minus the inverse of the generating series of the partition function, evaluated at $z=1/N$ ). This is indeed asymptotic to $1/N$ by Taylor expansion, when $N$ is large.

Because the answer is explicit, one can also understand the behavior in many mixed situations when both $N$ and the size $n$ of the matrices are fixed.

23 May, 2008 at 9:40 am

Terence Tao

Dear Enric: Oops, thanks for the correction!

Dear Robert: Emmanuel is correct. One way to see the fact that the probability of being invertible is $\prod_{k=1}^n (1 - N^{-k})$ is to expose one row at a time. After k-1 rows have been exposed, and conditioning on the event that they are already linearly independent, the probability that the k^th row is going to be linearly independent of all previous rows is $(1 - N^{-k})$ . Multiplying all these conditional probabilities together gives the claim.

There is also a paper of Kahn and Komlos that investigates this problem for slightly different matrix models (e.g. random +1/-1 matrices over finite fields):

http://www.ams.org/mathscinet-getitem?mr=1833067

26 May, 2008 at 1:33 am

Robert

Thanks for the help!

I guess, the question I really wanted to ask is: What is the best strategy to come up with many vectors in F^k such that the the probability that a random selection of k of those is linearly independent? Again my guess would be that you cannot do better than picking (non-zero) random vectors and then the probability is what you write.

	Alex Gunning on A symmetric formulation of the…
	Anonymous on Erratum for “An inverse…
	Anonymous on Erratum for “An inverse…
	Anonymous on Erratum for “An inverse…
	Anonymous on 275A, Notes 3: The weak and st…
	Anonymous on It ought to be common knowledg…
	Ring Theory Intervie… on Reading seminar: “Stable…
	Anonymous on Work hard
	Anonymous on Infinite partial sumsets in th…
	Anonymous on Erratum for “An inverse…
	Anonymous on Erratum for “An inverse…
	Anonymous on Infinite partial sumsets in th…
	Anonymous on Infinite partial sumsets in th…
	Anonymous on Infinite partial sumsets in th…
	Anonymous on Infinite partial sumsets in th…

Random matrices: A general approach for the least singular value problem

Recent Comments

Articles by others

Diversions

Mathematics

Selected articles

Software

The sciences

Top Posts

Archives

Categories

The Polymath Blog

5 comments

Leave a comment Cancel reply

For commenters

Random matrices: A general approach for the least singular value problem

Share this:

Recent Comments

Articles by others

Diversions

Mathematics

Selected articles

Software

The sciences

Top Posts

Archives

Categories

The Polymath Blog

5 comments

Leave a comment Cancel reply

For commenters