Van Vu and I have just uploaded to the arXiv our survey paper “From the Littlewood-Offord problem to the Circular Law: universality of the spectral distribution of random matrices”, submitted to Bull. Amer. Math. Soc.  This survey recaps (avoiding most of the technical details) the recent work of ourselves and others that exploits the inverse theory for the Littlewood-Offord problem (which, roughly speaking, amounts to figuring out what types of random walks exhibit concentration at any given point), and how this leads to bounds on condition numbers, least singular values, and resolvents of random matrices; and then how the latter leads to universality of the empirical spectral distributions (ESDs) of random matrices, and in particular to the circular law for the ESDs of iid random matrices with zero mean and unit variance (see my previous blog post on this topic, or my Lewis lectures).  We conclude by mentioning a few open problems in the subject.

While this subject does unfortunately contain a large amount of technical theory and detail, every so often we find a very elementary observation that simplifies the work required significantly.  One such observation is an identity which we call the negative second moment identity, which I would like to discuss here.    Let A be an n \times n matrix; for simplicity we assume that the entries are real-valued.  Denote the n rows of A by X_1,\ldots,X_n, which we view as vectors in {\Bbb R}^n.  Let \sigma_1(A) \geq \ldots \geq \sigma_n(A) \geq 0 be the singular values of A. In our applications, the vectors X_j are easily described (e.g. they might be randomly distributed on the discrete cube \{-1,1\}^n), but the distribution of the singular values \sigma_j(A) is much more mysterious, and understanding this distribution is a key objective in this entire theory.

In general, the relationship between the singular values \sigma_j(A) (which encode spectral information about A) and the rows X_j (which encode geometric information about A) is rather complicated.  However, there are some simple identities (or “trace formulae”, if you will) that link the two.  For instance, by computing the second moment \hbox{tr}(A A^*) in two different ways, one obtains the second moment identity

\sum_{j=1}^n \sigma_j(A)^2 = \sum_{j=1}^n |X_j|^2 (1)

where |X_j| denotes the length of X_j.  This simple identity is already enough to get some crude upper bounds on the “average” value of \sigma_j, although it does not preclude the possibility that a lot of singular values are very close to zero, or that a few singular values are extremely large.
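As a quick numerical sanity check (purely for illustration; this sketch is not from the paper), one can verify (1) in a few lines of numpy for a random sign matrix, whose rows are distributed on the discrete cube \{-1,1\}^n; the choice of matrix, seed, and variable names below are all just assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
A = rng.choice([-1.0, 1.0], size=(n, n))     # rows X_j drawn from the discrete cube {-1,1}^n

sigma = np.linalg.svd(A, compute_uv=False)   # singular values sigma_1(A) >= ... >= sigma_n(A)
lhs = np.sum(sigma**2)                       # sum_j sigma_j(A)^2  =  tr(A A^*)
rhs = np.sum(np.linalg.norm(A, axis=1)**2)   # sum_j |X_j|^2 over the rows X_j

assert np.isclose(lhs, rhs)                  # both sides equal n^2 here, as every entry is +-1
```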

The latter scenario (a few very large singular values) can be controlled by higher moment identities, for instance based on the fourth moment \hbox{tr}(AA^* AA^*).  But these moments are not good at controlling the former scenario – when one or more singular values come close to zero, so that A becomes close to singular (or more precisely, ill-conditioned).  For this, we need a different set of identities.  One such identity comes from computing the unsigned determinant |\det(A)| = \det(AA^*)^{1/2} in two different ways, one using the singular values and one using the elementary base-times-height formula for the volume of a parallelepiped.  This gives the useful identity

\prod_{j=1}^n \sigma_j(A) = \prod_{j=1}^n \hbox{dist}(X_j, \hbox{span}(X_1,\ldots,X_{j-1}))

or equivalently (assuming A is non-singular)

\sum_{j=1}^n \log \sigma_j(A) = \sum_{j=1}^n \log \hbox{dist}(X_j, \hbox{span}(X_1,\ldots,X_{j-1})). (2)

This identity has some ability to control concentration of singular values near the origin (as the logarithm is large in that region), once one understands the distance between a random vector and a subspace spanned by other random vectors.  This is the philosophy used for instance in this paper of mine with Van Vu; related ideas also appear in these papers by Rudelson and Vershynin.
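In the same illustrative spirit (again, this sketch is mine and not taken from the paper), one can check (2) numerically for a random Gaussian matrix, computing each distance by projecting X_j onto the span of the previous rows via least squares:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30
A = rng.standard_normal((n, n))              # a Gaussian matrix, almost surely non-singular

sigma = np.linalg.svd(A, compute_uv=False)
lhs = np.sum(np.log(sigma))                  # sum_j log sigma_j(A) = log |det(A)|

rhs = 0.0
for j in range(n):
    if j == 0:
        d = np.linalg.norm(A[0])             # the empty span is {0}, so the distance is |X_1|
    else:
        # project X_j onto span(X_1, ..., X_{j-1}) by least squares and keep the residual
        coeffs, *_ = np.linalg.lstsq(A[:j].T, A[j], rcond=None)
        d = np.linalg.norm(A[j] - A[:j].T @ coeffs)
    rhs += np.log(d)

assert np.isclose(lhs, rhs)                  # the two sides agree up to rounding error
```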

The identity (2) is particularly useful for controlling the least singular value \sigma_n(A), but is not so useful for controlling other low singular values (e.g. \sigma_{n-k}(A) for some moderately small k).  For this, we found an alternate identity, based on the negative second moment \hbox{tr}( (A^{-1})^* A^{-1} ).   Observe that the j^{th} column Y_j of A^{-1} has an inner product of 1 with X_j and is orthogonal to all the other rows of A.  A little bit of high school geometry then tells us that the length |Y_j| of this column is equal to 1 / \hbox{dist}( X_j, \hbox{span}(X_1,\ldots,X_{j-1},X_{j+1},\ldots,X_n) ).  Since \hbox{tr}( (A^{-1})^* A^{-1} ) is equal to both \sum_{j=1}^n \sigma_j(A)^{-2} and \sum_{j=1}^n |Y_j|^2, we conclude that

\sum_{j=1}^n \sigma_j(A)^{-2} = \sum_{j=1}^n \hbox{dist}( X_j, \hbox{span}(X_1,\ldots,X_{j-1},X_{j+1},\ldots,X_n) )^{-2} (3)

(compare with (1) and (2)).  We found the identity (3) to be useful for preventing too many singular values of A from clustering near the origin, much as (1) prevents too many singular values of A from becoming extremely large, thus saving us from having to deploy more sophisticated and lengthier methods to control the singular values \sigma_j(A).  It may well be that additional identities or inequalities of this form could simplify these sorts of arguments even further.
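As a final illustration (again not from the paper; the Gaussian matrix, seed, and names are just assumptions for the example), the following numpy sketch checks both the geometric claim about the columns Y_j of A^{-1} and the identity (3):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20
A = rng.standard_normal((n, n))                    # almost surely non-singular
Ainv = np.linalg.inv(A)

sigma = np.linalg.svd(A, compute_uv=False)
lhs = np.sum(sigma**(-2.0))                        # sum_j sigma_j(A)^{-2} = tr((A^{-1})^* A^{-1})

rhs = 0.0
for j in range(n):
    others = np.delete(A, j, axis=0)               # the rows X_i with i != j
    coeffs, *_ = np.linalg.lstsq(others.T, A[j], rcond=None)
    d = np.linalg.norm(A[j] - others.T @ coeffs)   # dist(X_j, span of the other rows)
    rhs += d**(-2.0)
    # the j-th column Y_j of A^{-1} has length 1/d, as claimed above
    assert np.isclose(np.linalg.norm(Ainv[:, j]), 1.0 / d)

assert np.isclose(lhs, rhs)
```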