Van Vu and I have just uploaded to the arXiv our paper “Random matrices: A general approach for the least singular value problem“, submitted to Israel J. Math.. This paper continues a recent series of papers by ourselves and also by Rudelson and by Rudelson–Vershynin on understanding the least singular value of a large random random complex matrix A. There are many random matrix models that one can consider, but here we consider models of the form , where is a deterministic matrix depending on n, and is a random matrix whose entries are iid with some complex distribution x of mean zero and unit variance. (In particular, this model is useful for studying the normalised resolvents of random iid matrices , which are of importance in the spectral theory of these matrices; understanding the least singular value of random perturbations of deterministic matrices is also important in numerical analysis, and particularly in smoothed analysis of algorithms such as the simplex method.)
In the model mean zero case , the normalised singular values of are known to be asymptotically distributed according to the Marchenko-Pastur distribution , which in particular implies that most of the singular values are continuously distributed (via a semicircular distribution) in the interval . (Assuming only second moment hypotheses on the underlying distribution x, this result is due to Yin; there are many earlier results assuming stronger hypotheses on x.) This strongly suggests, but does not formally prove, that the least singular value should be of size on the average. (To get such a sharp bound on the least singular value via the Marchenko-Pastur law would require an incredibly strong bound on the convergence rate to this law, which seems out of reach at present, especially when one does not assume strong moment conditions on x; current results such as those of Götze-Tikhomirov or Chatterjee-Bose give some upper bound on which improves upon the trivial bound of by a polynomial factor assuming certain moment conditions on x, but as far as I am aware these bounds do not get close to the optimal value of , except perhaps in the special case when x is Gaussian.) The statement that with high probability has been conjectured (in various forms) in a number of places, for instance by von Neumann, by Smale, and by Spielman-Teng.
There have been several papers establishing lower tail bounds on the least singular value consistent with the above conjecture of varying degrees of strength, under various hypotheses on the matrix and distribution x. For instance:
- In 1988, Edelman showed that for any when and x is Gaussian (there is also a matching lower bound when ).
- Some non-trivial bounds on for very small (polynomially small in size) are established implicitly in the case when is a multiple of the identity in the papers of Girko and of Bai assuming some moment and continuity assumptions on x, although these bounds are not specified explicitly.
- In 2005, Rudelson showed that when and x is subgaussian.
- In 2005, Van Vu and I showed that for every there existed such that , when and x is the Bernoulli distribution (equal to +1 or -1 with a equal probability of each).
- In 2006, Senkar, Teng, and Spielman extended Edelman’s result to the case when is nonzero (but x is still Gaussian).
- In 2007, Rudelson-Vershynin showed that is bounded by for independent of n when and x has bounded fourth moment, and is bounded by for all if x is subgaussian.
- In 2007, Van Vu and I showed that our earlier result extends to the case when is non-zero, as long as it has polynomial size in n and is also integer-valued; we also allow x to be more general than the Bernoulli distribution, but still integer-valued. (Our restriction to the integer case was due to a certain rounding trick that converted finite precision arithmetic to exact arithmetic, but only when the expressions involved ultimately come from integers.)
- In 2007, Van Vu and I removed the integrality assumptions on our previous results, thus establishing the bound whenever M has polynomial size and with no assumptions on x other than zero mean and unit variance. (See also my earlier blog post on this paper.)
- In 2008, Rudelson and Vershynin obtained generalisations of their previous results to rectangular matrices (thus also generalising some previous work of Litvak, Pajor, Rudelson, and Tomczak-Jaegermann), but still for the case and with subgaussian hypotheses on x.
These recent results are largely based on entropy (or “epsilon-net”) arguments, combined with conditioning arguments, with the entropy bounds in turn originating from inverse Littlewood-Offord theorems; see my Lewis lectures for further discussion.
The current paper is a partial unification of the results of Rudelson and Vershynin (which give very sharp tail estimates, but under strong hypotheses on M and x) and ourselves (which have very general assumptions on M and x, but a weak tail estimate). A little more precisely, we obtain an estimate of the form
for any fixed , assuming that . This result almost recovers those of Rudelson and Vershynin under subgaussian hypotheses on x (which are known to imply exponentially good bounds on ) in the case when has polynomial size, except that we lose a factor of . One has analogous results here in which and are bounded by some weaker power of n than ; roughly speaking, if and have size , then we can show that is at least with probability for any given C.
We also give a simple example that show that the deterioration of these bounds when gets large is necessary; in particular, we show that the universality of the bounds of Edelman type can break down once exceeds , because one can design M to force the existence of an unusually small value of for a certain type of unit vector v.
Our methods here are based on those of earlier papers, particularly our circular law paper; the key innovation is to run a certain scale pigeonholing argument as efficiently as possible, so that one only loses factors of in the final bound.
[Update, May 22: some typos corrected.]
5 comments
Comments feed for this article
22 May, 2008 at 11:47 pm
Robert
I know this is quite a different question but maybe you know the answer: What happens over a finite field F, with N elements, say. If I pick the matrix elements independently with a flat distribution, what is the probability that the matrix A is not invertible? My guess would be 1/N, since one could imagine all elements but the last (a11 say) have been drawn already and det(A)= c1 + c2 a11 with some constants c1 and c2. If c2 does not vanish the determinant takes all N different values when a11 is varied but the case of c2=0 (some minor) it depends on the distribution of c1’s…
This problem came up in an encoding problem: There (imagine for example a RAID system) you want to spread information between a number of different people. For example the data are many (say L) vectors in F^k and everybody in a room has access to those vectors. However they cannot remember kL numbers. But if everybody picks one random vector and projects all the L vectors on that random vector and just remebers the random vector and the L scalar products. Then the full information can likely be reconstructed by any choice of k people (namely if their random vectors are linearly independent). So I am asking what is the probability that this fails?
23 May, 2008 at 1:07 am
enric
wow, you can also prove theorems in the future!
(e.g., item 9)
23 May, 2008 at 7:01 am
Emmanuel Kowalski
For the question in Comment 1 (if I understand it right) for matrrices of size , the probability you want seems to be simply one minus the ratio of the order of the group of invertible matrices divided by . The order of the group of invertible matrices is well-known, and one finds this probability is
which has a limit (for fixed ) as goes to infinity (one minus the inverse of the generating series of the partition function, evaluated at ). This is indeed asymptotic to by Taylor expansion, when is large.
Because the answer is explicit, one can also understand the behavior in many mixed situations when both and the size of the matrices are fixed.
23 May, 2008 at 9:40 am
Terence Tao
Dear Enric: Oops, thanks for the correction!
Dear Robert: Emmanuel is correct. One way to see the fact that the probability of being invertible is is to expose one row at a time. After k-1 rows have been exposed, and conditioning on the event that they are already linearly independent, the probability that the k^th row is going to be linearly independent of all previous rows is . Multiplying all these conditional probabilities together gives the claim.
There is also a paper of Kahn and Komlos that investigates this problem for slightly different matrix models (e.g. random +1/-1 matrices over finite fields):
http://www.ams.org/mathscinet-getitem?mr=1833067
26 May, 2008 at 1:33 am
Robert
Thanks for the help!
I guess, the question I really wanted to ask is: What is the best strategy to come up with many vectors in F^k such that the the probability that a random selection of k of those is linearly independent? Again my guess would be that you cannot do better than picking (non-zero) random vectors and then the probability is what you write.