
Over two years ago, Emmanuel Candès and I submitted the paper "The Dantzig selector: Statistical estimation when $p$ is much larger than $n$" to the Annals of Statistics. This paper, which appeared last year, proposed a new type of selector (which we called the Dantzig selector, due to its reliance on the linear programming methods to which George Dantzig, who had died as we were finishing our paper, had contributed so much) for statistical estimation in the case when the number $p$ of unknown parameters is much larger than the number $n$ of observations. More precisely, we considered the problem of obtaining a reasonable estimate $\beta^*$ for an unknown vector $\beta \in {\Bbb R}^p$ of parameters, given a vector $y = X \beta + z \in {\Bbb R}^n$ of measurements, where $X$ is a known $n \times p$ predictor matrix and $z$ is a (Gaussian) noise vector with variance $\sigma^2$. We assumed that the predictor matrix $X$ obeyed the restricted isometry property (RIP, also known as the UUP), which roughly speaking asserts that $X\beta$ has norm comparable to that of $\beta$ whenever the vector $\beta$ is sparse. The RIP is known to hold for various ensembles of random matrices of interest; see my earlier blog post on this topic.
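As a quick numerical illustration (not a proof, and not from the paper), one can check that a random Gaussian matrix with entries scaled by $1/\sqrt{n}$ approximately preserves the norm of sparse vectors; the dimensions and sparsity level below are arbitrary choices for the demonstration:

```python
import numpy as np

# Arbitrary illustrative dimensions: n observations, p parameters, sparsity s.
rng = np.random.default_rng(0)
n, p, s = 200, 1000, 5

# Gaussian predictor matrix, scaled so that E ||X beta||^2 = ||beta||^2.
X = rng.standard_normal((n, p)) / np.sqrt(n)

# For many random s-sparse vectors beta, the ratio ||X beta|| / ||beta||
# should concentrate near 1, in the spirit of the RIP.
ratios = []
for _ in range(100):
    beta = np.zeros(p)
    support = rng.choice(p, size=s, replace=False)
    beta[support] = rng.standard_normal(s)
    ratios.append(np.linalg.norm(X @ beta) / np.linalg.norm(beta))
```

Of course, the RIP requires such norm preservation uniformly over all sparse vectors at once, which is far stronger than the pointwise check above.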

Our selection algorithm, inspired by our previous work on compressed sensing, chooses the estimated parameters $\beta^*$ to have minimal $l^1$ norm amongst all vectors which are consistent with the data in the sense that the residual vector $r := y - X \beta^*$ obeys the condition

$\| X^* r \|_\infty \leq \lambda$, where $\lambda := C \sqrt{\log p} \sigma$ (1)

(one can check that such a condition holds with high probability when $\beta^* = \beta$, so that the true vector of parameters is feasible for this selection algorithm). This selector is similar, though not identical, to the better-studied lasso selector in the literature, which minimises the $l^1$ norm of $\beta^*$ penalised by the $l^2$ norm of the residual.
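For the curious reader, here is a minimal sketch (not the code from the paper) of how this minimisation can be posed as a linear program: splitting $\beta = u - v$ with $u, v \geq 0$ linearises the $l^1$ objective, and the constraint (1) becomes a pair of linear inequalities. The solver and variable names are illustrative choices:

```python
import numpy as np
from scipy.optimize import linprog

def dantzig_selector(X, y, lam):
    """Minimise ||beta||_1 subject to ||X^T (y - X beta)||_inf <= lam."""
    n, p = X.shape
    G = X.T @ X
    Xty = X.T @ y
    # Variables are (u, v) with beta = u - v, u >= 0, v >= 0, so that
    # ||beta||_1 <= sum(u) + sum(v), with equality at the optimum.
    c = np.ones(2 * p)
    # Constraint -lam <= X^T y - G (u - v) <= lam, written as A_ub x <= b_ub:
    #    G u - G v <= X^T y + lam      (lower bound)
    #   -G u + G v <= lam - X^T y      (upper bound)
    A_ub = np.vstack([np.hstack([G, -G]), np.hstack([-G, G])])
    b_ub = np.concatenate([Xty + lam, lam - Xty])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, None)] * (2 * p), method="highs")
    u, v = res.x[:p], res.x[p:]
    return u - v
```

Since the true $\beta$ is feasible (with high probability), the selected $\beta^*$ always has $l^1$ norm at most that of $\beta$.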

A simple model case arises when $n=p$ and $X$ is the identity matrix, so that the observations are given by the simple additive noise model $y_i = \beta_i + z_i$. In this case, the Dantzig selector $\beta^*$ is given by the soft thresholding formula

$\beta^*_i = \max(|y_i| - \lambda, 0 ) \hbox{sgn}(y_i).$
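In code, this thresholding rule is a one-liner (an illustrative sketch, vectorised over the coordinates):

```python
import numpy as np

def soft_threshold(y, lam):
    """Coordinatewise soft thresholding: max(|y_i| - lam, 0) * sgn(y_i)."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)
```

Observations smaller than $\lambda$ in magnitude are set to zero, while larger observations are shrunk towards zero by $\lambda$.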

The mean square error ${\Bbb E}( \| \beta - \beta^* \|^2 )$ for this selector can be computed to be roughly

$\lambda^2 + \sum_{i=1}^n \min( |\beta_i|^2, \lambda^2)$ (2)

and one can show that this is basically best possible (except for constants and logarithmic factors) amongst all selectors in this model. More generally, the main result of our paper was that under the assumption that the predictor matrix obeys the RIP, the mean square error of the Dantzig selector is essentially equal to (2) and thus close to best possible.
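One can test this benchmark empirically in the identity-design model. The following Monte Carlo sketch (with arbitrary parameter choices) compares the averaged squared error of soft thresholding against the quantity $\lambda^2 + \sum_i \min( |\beta_i|^2, \lambda^2 )$, and one finds the two comparable up to a modest constant:

```python
import numpy as np

# Arbitrary illustrative parameters for the identity-design model y = beta + z.
rng = np.random.default_rng(1)
n, sigma = 1000, 1.0
lam = np.sqrt(2 * np.log(n)) * sigma  # threshold of size C sqrt(log n) sigma

beta = np.zeros(n)
beta[:10] = 10.0  # a few large coefficients; the rest are exactly zero

# The benchmark lambda^2 + sum_i min(|beta_i|^2, lambda^2).
benchmark = lam**2 + np.sum(np.minimum(beta**2, lam**2))

# Average the squared error of the soft thresholding selector over trials.
errs = []
for _ in range(50):
    y = beta + sigma * rng.standard_normal(n)
    beta_star = np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)
    errs.append(np.sum((beta_star - beta) ** 2))
mse = np.mean(errs)
```

Here the ten large coefficients each contribute on the order of $\lambda^2$ to both quantities, while the zero coefficients contribute almost nothing.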

After accepting our paper, the Annals of Statistics took the (somewhat uncommon) step of soliciting responses to the paper from various experts in the field, and then soliciting a rejoinder to these responses from Emmanuel and me. Recently, the Annals posted these responses and rejoinder on the arXiv: