In the theory of discrete random matrices (e.g. matrices whose entries are random signs ${\pm 1}$), one often encounters the problem of understanding the distribution of the random variable ${\hbox{dist}(X,V)}$, where ${X = (x_1,\ldots,x_n) \in \{-1,+1\}^n}$ is an ${n}$-dimensional random sign vector (so ${X}$ is uniformly distributed in the discrete cube ${\{-1,+1\}^n}$), and ${V}$ is some ${d}$-dimensional subspace of ${{\bf R}^n}$ for some ${0 \leq d \leq n}$.

It is not hard to compute the second moment of this random variable. Indeed, if ${P = (p_{ij})_{1 \leq i,j \leq n}}$ denotes the orthogonal projection matrix from ${{\bf R}^n}$ to the orthogonal complement ${V^\perp}$ of ${V}$, then one observes that

$\displaystyle \hbox{dist}(X,V)^2 = X \cdot P X = \sum_{i=1}^n \sum_{j=1}^n x_i x_j p_{ij}$

and so upon taking expectations we see that

$\displaystyle {\Bbb E} \hbox{dist}(X,V)^2 = \sum_{i=1}^n p_{ii} = \hbox{tr} P = n-d \ \ \ \ \ (1)$

since ${P}$ is a rank ${n-d}$ orthogonal projection. So we expect ${\hbox{dist}(X,V)}$ to be about ${\sqrt{n-d}}$ on the average.
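As an illustrative aside (not part of the original argument), one can check (1) numerically; the dimensions and seed below are arbitrary choices of mine:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 5

# A random d-dimensional subspace V of R^n, via an orthonormal basis Q.
Q, _ = np.linalg.qr(rng.standard_normal((n, d)))
P = np.eye(n) - Q @ Q.T          # orthogonal projection onto V^perp

# Sample random sign vectors X and average dist(X, V)^2 = X . P X.
trials = 20000
X = rng.choice([-1.0, 1.0], size=(trials, n))
dist2 = ((X @ P) * X).sum(axis=1)

print(np.trace(P))    # exactly n - d = 15 (up to rounding)
print(dist2.mean())   # fluctuates around n - d = 15
```

Here ${Q Q^T}$ projects onto ${V}$ and ${P = I - QQ^T}$ onto ${V^\perp}$, so ${\hbox{tr} P = n-d}$ exactly, while the Monte Carlo average of ${X \cdot PX}$ fluctuates around ${n-d}$ as predicted by (1).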

In fact, one has sharp concentration around this value, in the sense that ${\hbox{dist}(X,V) = \sqrt{n-d}+O(1)}$ with high probability. More precisely, we have

Proposition 1 (Large deviation inequality) For any ${t>0}$, one has

$\displaystyle {\Bbb P}( |\hbox{dist}(X,V) - \sqrt{n-d}| \geq t ) \leq C \exp(- c t^2 )$

for some absolute constants ${C, c > 0}$.

In fact the constants ${C, c}$ are very civilised; for large ${t}$ one can basically take ${C=4}$ and ${c=1/16}$, for instance. This type of concentration, particularly for subspaces ${V}$ of moderately large codimension ${n-d}$, is fundamental to much of my work on random matrices with Van Vu, starting with our first paper (in which this proposition first appears). (For subspaces of small codimension (such as hyperplanes) one has to use other tools to get good results, such as inverse Littlewood-Offord theory or the Berry-Esséen central limit theorem, but that is another story.)
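To see the proposition "in action", here is a quick Monte Carlo sketch (parameters again arbitrary): even for a subspace whose codimension ${n-d}$ is large, the deviations of ${\hbox{dist}(X,V)}$ from ${\sqrt{n-d}}$ stay at scale ${O(1)}$, with large deviations exceedingly rare.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 400, 100                  # codimension n - d = 300
Q, _ = np.linalg.qr(rng.standard_normal((n, d)))
P = np.eye(n) - Q @ Q.T          # projection onto V^perp

trials = 5000
X = rng.choice([-1.0, 1.0], size=(trials, n))
dist = np.sqrt(((X @ P) * X).sum(axis=1))
dev = np.abs(dist - np.sqrt(n - d))

# The deviations stay at scale O(1) even though n is large,
# and deviations of size >= 3 are already very rare.
print(dev.std())
print((dev > 3).mean())
```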

Proposition 1 is an easy consequence of the second moment computation and Talagrand’s inequality, which among other things provides a sharp concentration result for convex Lipschitz functions on the cube ${\{-1,+1\}^n}$; since ${\hbox{dist}(x,V)}$ is indeed a convex Lipschitz function, this inequality can be applied immediately. The proof of Talagrand’s inequality is short and can be found in several textbooks (e.g. Alon and Spencer), but I thought I would reproduce the argument here (specialised to the convex case), mostly to force myself to learn the proof properly.

Note that the concentration at scale ${O(1)}$ obtained from Talagrand’s inequality is much stronger than what one would get from more elementary tools such as Azuma’s inequality or McDiarmid’s inequality, which would only give concentration at scale about ${O(\sqrt{n})}$ or so (which is in fact trivial, since the cube ${\{-1,+1\}^n}$ has diameter ${2\sqrt{n}}$); the point is that Talagrand’s inequality is very effective at exploiting the convexity of the problem, as well as the Lipschitz nature of the function in all directions, whereas Azuma’s inequality can only easily take advantage of the Lipschitz nature of the function in coordinate directions.

On the other hand, Azuma’s inequality works just as well if the ${\ell^2}$ metric is replaced with the larger ${\ell^1}$ metric, and one can then conclude that the ${\ell^1}$ distance between ${X}$ and ${V}$ concentrates around its median to a width ${O(\sqrt{n})}$; unlike the ${\ell^2}$ bound of the same width, this is a non-trivial fact. (The computation of the median of the ${\ell^1}$ distance is more complicated than for the ${\ell^2}$ distance, though, and depends on the orientation of ${V}$.)
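For the record, the convexity and ${1}$-Lipschitz property of ${x \mapsto \hbox{dist}(x,V)}$ are immediate from the triangle inequality, since ${\hbox{dist}(x,V) = |Px|}$ and ${P}$ has operator norm ${1}$; the following sketch (my own, with arbitrary parameters) just spot-checks both properties at random points:

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 15, 6
Q, _ = np.linalg.qr(rng.standard_normal((n, d)))
P = np.eye(n) - Q @ Q.T          # projection onto V^perp

def dist(x):
    # dist(x, V) = |P x|, since P is the orthogonal projection onto V^perp
    return np.linalg.norm(P @ x)

# Spot-check 1-Lipschitz: |dist(x) - dist(y)| <= |x - y|,
# and convexity: dist(t x + (1-t) y) <= t dist(x) + (1-t) dist(y).
for _ in range(1000):
    x, y = rng.standard_normal(n), rng.standard_normal(n)
    t = rng.uniform()
    assert abs(dist(x) - dist(y)) <= np.linalg.norm(x - y) + 1e-9
    assert dist(t * x + (1 - t) * y) <= t * dist(x) + (1 - t) * dist(y) + 1e-9
print("ok")
```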

Remark 1 If one makes the coordinates of ${X}$ iid Gaussian variables ${x_i \equiv N(0,1)}$ rather than random signs, then Proposition 1 is much easier to prove; the probability distribution of a Gaussian vector is rotation-invariant, so one can rotate ${V}$ to be, say, ${{\bf R}^d}$, at which point ${\hbox{dist}(X,V)^2}$ is clearly the sum of ${n-d}$ independent squares of Gaussians (i.e. a chi-square distribution with ${n-d}$ degrees of freedom), and the claim follows from direct computation (or one can use the Chernoff inequality). The Gaussian counterpart of Talagrand’s inequality is more classical, being essentially due to Lévy, and will also be discussed later in this post.
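To illustrate the remark numerically (again, just a sketch with arbitrary parameters of my own choosing): with Gaussian coordinates, ${\hbox{dist}(X,V)^2}$ should match the chi-square distribution with ${n-d}$ degrees of freedom, whose mean is ${n-d}$ and whose variance is ${2(n-d)}$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 30, 10
Q, _ = np.linalg.qr(rng.standard_normal((n, d)))
P = np.eye(n) - Q @ Q.T          # projection onto V^perp

# With iid N(0,1) coordinates, dist(X, V)^2 is chi-square with n - d
# degrees of freedom: mean n - d, variance 2(n - d), for any orientation of V.
trials = 50000
X = rng.standard_normal((trials, n))
dist2 = ((X @ P) * X).sum(axis=1)

print(dist2.mean())   # close to n - d = 20
print(dist2.var())    # close to 2(n - d) = 40
```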