You are currently browsing the tag archive for the ‘matrix completion’ tag.

Emmanuel Candés and I have just uploaded to the arXiv our paper “The power of convex relaxation: near-optimal matrix completion“, submitted to IEEE Inf. Theory.  In this paper we study the matrix completion problem, which one can view as a sort of “non-commutative” analogue of the sparse recovery problem studied in the field of compressed sensing, although there are also some other significant differences between the two problems.   The sparse recovery problem seeks to recover a sparse vector x \in {\Bbb R}^n from some linear measurements Ax = b \in {\Bbb R}^m, where A is a known m \times n matrix.  For general x, classical linear algebra tells us that if m < n, then the problem here is underdetermined and has multiple solutions; but under the additional assumption that x is sparse (most of the entries are zero), it turns out (under various hypotheses on the measurement matrix A, and in particular if A contains a sufficient amount of “randomness” or “incoherence”) that exact recovery becomes possible in the underdetermined case.  Furthermore, recovery is not only theoretically possible, but is also computationally practical in many cases; in particular, under some assumptions on A, one can recover x by minimising the convex norm \| x \|_{\ell^1} over all solutions to Ax=b.

Now we turn to the matrix completion problem.  Instead of an unknown vector x \in {\Bbb R}^n, we now have an unknown matrix M = (m_{ij})_{i \in [n_1], j \in [n_2]} \in {\Bbb R}^{n_1 \times n_2} (we use the shorthand [n] := \{1,\ldots,n\} here). We will take a specific type of underdetermined linear measurement of M, namely we pick a random subset \Omega \subset [n_1] \times [n_2] of the matrix array [n_1] \times [n_2] of some cardinality 1 \leq m \leq n_1 n_2, and form the random sample P_\Omega(M) := (m_{ij})_{(i,j) \in \Omega} \in {\Bbb R}^{\Omega} of M.

Of course, with no further information on M, it is impossible to complete the matrix M from the partial information P_\Omega(M) – we only have m pieces of information and need n_1 n_2.  But suppose we also know that M is low-rank, e.g. has rank less than r; this is an analogue of sparsity, but for matrices rather than vectors.  Then, in principle, we have reduced the number of degrees of freedom for M from n_1 n_2 to something more like O( r \max(n_1,n_2) ), and so (in analogy with compressed sensing) one may now hope to perform matrix completion with a much smaller fraction of samples, and in particular with m close to r \max(n_1,n_2).

This type of problem comes up in several real-world applications, most famously in the Netflix prize.  The Netflix prize problem is to be able to predict a very large ratings matrix M, whose rows are the customers, whose columns are the movies, and the entries are the rating that each customer would hypothetically assign to each movie.  Of course, not every customer has rented every movie from Netflix, and so only a small fraction P_\Omega(M) of this matrix is actually known.  However, if one makes the assumption that most customers’ rating preference is determined by only a small number of characteristics of the movie (e.g. genre, lead actor/actresses, director, year, etc.), then the matrix should be (approximately) low rank, and so the above type of analysis should be useful (though of course it is not going to be the only tool of use in this messy, real-world problem).

Actually, one expects to need to oversample the matrix by a logarithm or two in order to have a good chance of exact recovery, if one is sampling randomly.  This can be seen even in the rank one case r=1, in which M=uv^* is the product of a column vector and a row vector; let’s consider square matrices n_1=n_2=n for simplicity.  Observe that if the sampled coordinates \Omega completely miss one of the rows of the matrix, then the corresponding element of u has gone completely unmeasured, and one cannot hope to complete this row of the matrix.   Thus one needs to sample every row (and also every column) of the n \times n matrix.  The solution to the coupon collector’s problem then tells us that one needs about O(n \log n) samples to achieve this goal.  In fact, the theory of Erdős-Rényi random graphs tells us that the bipartite graph induced by \Omega becomes almost surely connected beyond this threshold, which turns out to be exactly what is needed to perform matrix completion for rank 1 matrices.

On the other hand, one cannot hope to complete the matrix if some of the singular vectors of the matrix are extremely sparse.  For instance, in the Netflix problem, a singularly idiosyncratic customer (or dually, a singularly unclassifiable movie) may give rise to a row or column of M that has no relation to the rest of the matrix, occupying its own separate component of the singular value decomposition of M; such a row or column is then impossible to complete exactly without sampling the entirety of that row or column.  Thus, to get exact matrix completion from a small fraction of entries, one needs some sort of incoherence assumption on the singular vectors, which spreads them out across all coordinates in a roughly even manner, as opposed to being concentrated on just a few values.

In a recent paper, Candés and Recht proposed solving the matrix completion problem by minimising the nuclear norm (or trace norm)

\|M\|_* = \sum_{i=1}^{\min(n_1,n_2)} \sigma_i(M) = \hbox{tr}( M M^*)^{1/2}

amongst all matrices consistent with the observed data P_\Omega(M).  This nuclear norm is the non-commutative counterpart to the \ell^1 norm for vectors, and so this algorithm is analogous to the \ell^1 minimisation (or basis pursuit) algorithm which is effective for compressed sensing (though not the only such algorithm for this task).  They showed, roughly speaking, that exact matrix completion (for, say, square matrices n_1=n_2=n for simplicity) is ensured with high probability so long as the singular vectors obey a certain incoherence property (basically, their \ell^\infty norm should be close to the minimal possible value, namely O(1/\sqrt{n})), so long as one had the condition

m \gg n^{1.2} r \log n.

This differs from the presumably optimal threshold of nr \log n by a factor of about n^{0.2}.

The main result of our paper is to mostly eliminate this gap, at the cost of a stronger hypothesis on the matrix being measured:

Main theorem. (Informal statement)  Suppose the n_1 \times n_2 matrix M has rank r and obeys a certain “strong incoherence property”.  Then with high probability, nuclear norm minimisation will recover M from a random sample P_\Omega(M) provided that m \gg n r \log^{O(1)} n, where n := \max(n_1,n_2).

A result of a broadly similar nature, but with a rather different recovery algorithm and with a somewhat different range of applicability, was recently established by Keshavan, Oh, and Montanari.  The strong incoherence property is somewhat technical, but is related to the Candés-Recht incoherence property and is satisfied by a number of reasonable random matrix models.  The exponent O(1) here is reasonably civilised (ranging between 2 and 9, depending on the specific model and parameters being used).

Read the rest of this entry »

Archives

RSS Google+ feed

  • An error has occurred; the feed is probably down. Try again later.
Follow

Get every new post delivered to your Inbox.

Join 3,868 other followers