Van Vu and I have just uploaded to the arXiv our survey paper “From the Littlewood-Offord problem to the Circular Law: universality of the spectral distribution of random matrices”, submitted to Bull. Amer. Math. Soc. This survey recaps (avoiding most of the technical details) the recent work of ourselves and others that exploits the inverse theory for the Littlewood-Offord problem (which, roughly speaking, amounts to figuring out what types of random walks exhibit concentration at any given point), and shows how this leads to bounds on condition numbers, least singular values, and resolvents of random matrices; and then how the latter leads to universality of the empirical spectral distributions (ESDs) of random matrices, and in particular to the circular law for the ESDs of iid random matrices with zero mean and unit variance (see my previous blog post on this topic, or my Lewis lectures). We conclude by mentioning a few open problems in the subject.

While this subject does unfortunately contain a large amount of technical theory and detail, every so often we find a very elementary observation that simplifies the work required significantly. One such observation is an identity which we call the *negative second moment identity*, which I would like to discuss here. Let $A$ be an $n \times n$ matrix; for simplicity we assume that the entries are real-valued. Denote the $n$ rows of $A$ by $X_1,\dots,X_n$, which we view as vectors in ${\bf R}^n$. Let $\sigma_1(A) \geq \dots \geq \sigma_n(A) \geq 0$ be the singular values of $A$. In our applications, the vectors $X_i$ are easily described (e.g. they might be randomly distributed on the discrete cube $\{-1,+1\}^n$), but the distribution of the singular values $\sigma_i(A)$ is much more mysterious, and understanding this distribution is a key objective in this entire theory.

In general, the relationship between the singular values (which encode spectral information about $A$) and the rows (which encode geometric information about $A$) is rather complicated. However, there are some simple identities (or “trace formulae”, if you will) that link the two. For instance, by computing the second moment $\sigma_1(A)^2 + \dots + \sigma_n(A)^2 = \hbox{tr}(A A^T)$ in two different ways, one obtains the *second moment identity*

$$\sigma_1(A)^2 + \dots + \sigma_n(A)^2 = |X_1|^2 + \dots + |X_n|^2 \qquad (1)$$

where $|X_i|$ denotes the length of $X_i$. This simple identity is already enough to get some crude upper bounds on the “average” value of $\sigma_i(A)$, although it does not preclude the possibility that a lot of singular values are very close to zero, or that a few singular values are extremely large.
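As a quick sanity check, (1) can be verified numerically. The following is a minimal sketch in Python/NumPy; the matrix size and the random sign ensemble are arbitrary illustrative choices, not anything from the survey itself:

```python
import numpy as np

# Illustrative choice: an 8x8 random sign matrix, rows uniform on {-1,+1}^n
rng = np.random.default_rng(0)
n = 8
A = rng.choice([-1.0, 1.0], size=(n, n))

sigma = np.linalg.svd(A, compute_uv=False)  # sigma_1(A) >= ... >= sigma_n(A)

lhs = np.sum(sigma ** 2)  # sum of sigma_i(A)^2
rhs = np.sum(A ** 2)      # sum of |X_i|^2 over the rows X_i

assert np.isclose(lhs, rhs)                # the second moment identity (1)
assert np.isclose(lhs, np.trace(A @ A.T))  # both sides equal tr(A A^T)
```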

The latter scenario (a few very large singular values) can be controlled by higher moment identities, for instance based on the fourth moment $\sigma_1(A)^4 + \dots + \sigma_n(A)^4 = \hbox{tr}((A A^T)^2)$. But these moments are not good at controlling the former scenario – when one or more singular values comes close to zero, so that $A$ becomes close to singular (or more precisely, ill-conditioned). For this, we need a different set of identities. One such identity comes from computing the unsigned determinant $|\det A| = \sigma_1(A) \cdots \sigma_n(A)$ in two different ways, one using the singular values and one using the elementary base-times-height formula for the volume of a parallelepiped. This gives the useful identity

$$\sigma_1(A) \cdots \sigma_n(A) = \hbox{dist}(X_1,V_1) \cdots \hbox{dist}(X_n,V_n)$$

where $V_i$ is the subspace spanned by the rows $X_1,\dots,X_{i-1}$, and $\hbox{dist}(X,V)$ denotes the distance from $X$ to $V$; or equivalently (assuming $A$ is non-singular)

$$\log \sigma_1(A) + \dots + \log \sigma_n(A) = \log \hbox{dist}(X_1,V_1) + \dots + \log \hbox{dist}(X_n,V_n). \qquad (2)$$

This identity has some ability to control concentration of singular values near the origin (as the logarithm is large in that region), once one understands the distance between a random vector and a subspace spanned by other random vectors. This is the philosophy used for instance in this paper of mine with Van Vu; related ideas also appear in these papers by Rudelson and Vershynin.
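The determinant identity and its logarithmic form (2) can likewise be checked numerically. Here is a minimal sketch (assuming NumPy; each distance is computed as a least-squares residual, and the Gaussian ensemble is just an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
A = rng.standard_normal((n, n))  # almost surely non-singular
X = A                            # the rows X_1, ..., X_n

sigma = np.linalg.svd(A, compute_uv=False)

# dist(X_i, V_i): distance from X_i to the span V_i of the preceding rows,
# computed as the residual of a least-squares projection onto those rows
dists = [np.linalg.norm(X[0])]   # V_1 is the zero subspace
for i in range(1, n):
    V = X[:i].T                  # columns span V_i
    coeffs, *_ = np.linalg.lstsq(V, X[i], rcond=None)
    dists.append(np.linalg.norm(X[i] - V @ coeffs))

# |det A| = sigma_1 ... sigma_n = dist(X_1,V_1) ... dist(X_n,V_n)
assert np.isclose(np.prod(sigma), abs(np.linalg.det(A)))
assert np.isclose(np.prod(sigma), np.prod(dists))
# the equivalent logarithmic form (2)
assert np.isclose(np.sum(np.log(sigma)), np.sum(np.log(dists)))
```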

The identity (2) is particularly useful for controlling the least singular value $\sigma_n(A)$, but is not so useful for controlling other low singular values (e.g. $\sigma_{n-k}(A)$ for some moderately small $k$). For this, we found an alternate identity, based on the *negative* second moment $\sigma_1(A)^{-2} + \dots + \sigma_n(A)^{-2}$. Observe that the $j^{th}$ column $Y_j$ of $A^{-1}$ has an inner product of 1 with $X_j$ and is orthogonal to all the other rows of $A$. A little bit of high school geometry then tells us that the length of this column is equal to $1/\hbox{dist}(X_j,W_j)$, where $W_j$ is the subspace spanned by the rows $X_1,\dots,X_{j-1},X_{j+1},\dots,X_n$. Since $\sigma_1(A)^{-2} + \dots + \sigma_n(A)^{-2}$ is equal to both $\hbox{tr}((A A^T)^{-1})$ and $|Y_1|^2 + \dots + |Y_n|^2$, we conclude that

$$\sigma_1(A)^{-2} + \dots + \sigma_n(A)^{-2} = \hbox{dist}(X_1,W_1)^{-2} + \dots + \hbox{dist}(X_n,W_n)^{-2} \qquad (3)$$

(compare with (1) and (2)). We found the identity (3) to be useful for preventing too many singular values of $A$ from clustering near the origin, much as (1) prevents too many singular values of $A$ from becoming extremely large, thus saving us from having to deploy more sophisticated and lengthier methods to control the singular values $\sigma_i(A)$. It may well be that further identities or inequalities of this form may simplify these sorts of arguments further.
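Here, similarly, is a minimal numerical sketch of the negative second moment identity (3) and of the geometric facts behind it (assuming NumPy; the Gaussian ensemble and matrix size are again just illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
A = rng.standard_normal((n, n))  # almost surely non-singular

sigma = np.linalg.svd(A, compute_uv=False)
Y = np.linalg.inv(A)  # the columns Y_j of the inverse

# the rows X_i of A are dual to the columns Y_j: <X_i, Y_j> = delta_{ij}
assert np.allclose(A @ Y, np.eye(n), atol=1e-8)

# dist(X_j, W_j): distance from the row X_j to the span W_j of the other rows,
# computed as the residual of a least-squares projection
dists = []
for j in range(n):
    W = np.delete(A, j, axis=0).T  # columns span W_j
    coeffs, *_ = np.linalg.lstsq(W, A[j], rcond=None)
    dists.append(np.linalg.norm(A[j] - W @ coeffs))
dists = np.array(dists)

# the high school geometry fact: |Y_j| = 1 / dist(X_j, W_j), column by column
assert np.allclose(np.linalg.norm(Y, axis=0), 1.0 / dists)

# the negative second moment identity (3)
lhs = np.sum(sigma ** -2.0)
rhs = np.sum(dists ** -2.0)
assert np.isclose(lhs, rhs)
```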

## 8 comments


18 October, 2008 at 1:09 pm

Anonymous: It is not evident that inner product of Y_j with X_j is one. The expectation of the inner product is one, that is for sure. Then it is not clear whether the equalities hold with expectations or not. But if the goal is to make the reader interested in the field, it is achieved perfectly :)

18 October, 2008 at 4:08 pm

Terence Tao: Oops, there was a typo: $Y_j$ is supposed to be the $j^{th}$ column of $A^{-1}$, not the $j^{th}$ column of $A$.

20 October, 2008 at 12:05 pm

Anonymous: A power of 2 is missing in the $|Y_i|$ terms in the sentence immediately above equation (3).

[Corrected - T.]

22 October, 2008 at 12:23 am

Anonymous: Four weeks ago you blogged about “The Princeton Companion to Mathematics” and gave a link to its home page. I enjoyed reading some of the sample chapters given there (I guess I’m gonna buy the book – thanks for bringing it to my attention).

In the sample chapter about “Numerical Analysis” (http://press.princeton.edu/chapters/gowers/gowers_IV_21.pdf), section 4, the author Lloyd N. Trefethen mentions the still missing theoretical analysis of Gaussian elimination with pivoting. Any chance you can tackle this problem using the techniques from your article?

22 October, 2008 at 9:08 am

Terence Tao: Dear anonymous,

This is a very nice problem. The recent progress on understanding issues such as the effect of random noise on the invertibility of a matrix does support, at a heuristic level, the empirically verified observation that Gaussian elimination works very well in the presence of random noise, and may indeed help in giving a rigorous explanation of the latter in the future, but there are still significant technical issues to overcome before this is the case. The basic problem is that even if the original matrix A is described by an additive noise model, e.g. A=M+N where M is deterministic and N is a gaussian random matrix, then after a few rounds of Gaussian elimination the matrix that one obtains from A is given by a much more nonlinear random model, and it is not clear how to use the existing technology to continue to guarantee that this new matrix reacts well to future pivoting operations with high probability. One possibility would be to construct some sort of “invariant measure” for the Gaussian elimination algorithm, but it is not obvious to me how one would build such a measure and be able to ensure that it is not singular, unless some algebraic miracle intervenes.

24 October, 2008 at 11:46 am

anonymous: Dear Prof. Tao,

At the bottom of page 9 of the article it is written: {(x_1,…,x_d) \in Z^d | M_i <= m_i <= M’_i for all 1<=i<=d}

You should replace the m_i with x_i.

24 October, 2008 at 5:50 pm

Terence Tao: Dear anonymous: thanks for the correction! We’ll include it in the next major revision of the article.

4 March, 2009 at 8:53 am

Random matrices: The distribution of the smallest singular values « What’s new: [...] with our earlier paper on the universality of the circular law, we do not attempt to prove (1) for general ensembles such as the Bernoulli ensemble (say) [...]