You are currently browsing the tag archive for the ‘operator norm’ tag.

Now that we have developed the basic probabilistic tools that we will need, we now turn to the main subject of this course, namely the study of random matrices. There are many random matrix models (aka matrix ensembles) of interest – far too many to all be discussed in a single course. We will thus focus on just a few simple models. First of all, we shall restrict attention to square matrices ${M = (\xi_{ij})_{1 \leq i,j \leq n}}$, where ${n}$ is a (large) integer and the ${\xi_{ij}}$ are real or complex random variables. (One can certainly study rectangular matrices as well, but for simplicity we will only look at the square case.) Then, we shall restrict to three main models:

• Iid matrix ensembles, in which the coefficients ${\xi_{ij}}$ are iid random variables with a single distribution ${\xi_{ij} \equiv \xi}$. We will often normalise ${\xi}$ to have mean zero and unit variance. Examples of iid models include the Bernouli ensemble (aka random sign matrices) in which the ${\xi_{ij}}$ are signed Bernoulli variables, the real gaussian matrix ensemble in which ${\xi_{ij} \equiv N(0,1)_{\bf R}}$, and the complex gaussian matrix ensemble in which ${\xi_{ij} \equiv N(0,1)_{\bf C}}$.
• Symmetric Wigner matrix ensembles, in which the upper triangular coefficients ${\xi_{ij}}$, ${j \geq i}$ are jointly independent and real, but the lower triangular coefficients ${\xi_{ij}}$, ${j are constrained to equal their transposes: ${\xi_{ij}=\xi_{ji}}$. Thus ${M}$ by construction is always a real symmetric matrix. Typically, the strictly upper triangular coefficients will be iid, as will the diagonal coefficients, but the two classes of coefficients may have a different distribution. One example here is the symmetric Bernoulli ensemble, in which both the strictly upper triangular and the diagonal entries are signed Bernoulli variables; another important example is the Gaussian Orthogonal Ensemble (GOE), in which the upper triangular entries have distribution ${N(0,1)_{\bf R}}$ and the diagonal entries have distribution ${N(0,2)_{\bf R}}$. (We will explain the reason for this discrepancy later.)
• Hermitian Wigner matrix ensembles, in which the upper triangular coefficients are jointly independent, with the diagonal entries being real and the strictly upper triangular entries complex, and the lower triangular coefficients ${\xi_{ij}}$, ${j are constrained to equal their adjoints: ${\xi_{ij} = \overline{\xi_{ji}}}$. Thus ${M}$ by construction is always a Hermitian matrix. This class of ensembles contains the symmetric Wigner ensembles as a subclass. Another very important example is the Gaussian Unitary Ensemble (GUE), in which all off-diagional entries have distribution ${N(0,1)_{\bf C}}$, but the diagonal entries have distribution ${N(0,1)_{\bf R}}$.

Given a matrix ensemble ${M}$, there are many statistics of ${M}$ that one may wish to consider, e.g. the eigenvalues or singular values of ${M}$, the trace and determinant, etc. In these notes we will focus on a basic statistic, namely the operator norm

$\displaystyle \| M \|_{op} := \sup_{x \in {\bf C}^n: |x|=1} |Mx| \ \ \ \ \ (1)$

of the matrix ${M}$. This is an interesting quantity in its own right, but also serves as a basic upper bound on many other quantities. (For instance, ${\|M\|_{op}}$ is also the largest singular value ${\sigma_1(M)}$ of ${M}$ and thus dominates the other singular values; similarly, all eigenvalues ${\lambda_i(M)}$ of ${M}$ clearly have magnitude at most ${\|M\|_{op}}$.) Because of this, it is particularly important to get good upper tail bounds

$\displaystyle {\bf P}( \|M\|_{op} \geq \lambda ) \leq \ldots$

on this quantity, for various thresholds ${\lambda}$. (Lower tail bounds are also of interest, of course; for instance, they give us confidence that the upper tail bounds are sharp.) Also, as we shall see, the problem of upper bounding ${\|M\|_{op}}$ can be viewed as a non-commutative analogue of upper bounding the quantity ${|S_n|}$ studied in Notes 1. (The analogue of the central limit theorem in Notes 2 is the Wigner semi-circular law, which will be studied in the next set of notes.)

An ${n \times n}$ matrix consisting entirely of ${1}$s has an operator norm of exactly ${n}$, as can for instance be seen from the Cauchy-Schwarz inequality. More generally, any matrix whose entries are all uniformly ${O(1)}$ will have an operator norm of ${O(n)}$ (which can again be seen from Cauchy-Schwarz, or alternatively from Schur’s test, or from a computation of the Frobenius norm). However, this argument does not take advantage of possible cancellations in ${M}$. Indeed, from analogy with concentration of measure, when the entries of the matrix ${M}$ are independent, bounded and have mean zero, we expect the operator norm to be of size ${O(\sqrt{n})}$ rather than ${O(n)}$. We shall see shortly that this intuition is indeed correct. (One can see, though, that the mean zero hypothesis is important; from the triangle inequality we see that if we add the all-ones matrix (for instance) to a random matrix with mean zero, to obtain a random matrix whose coefficients all have mean ${1}$, then at least one of the two random matrices necessarily has operator norm at least ${n/2}$.)

As mentioned before, there is an analogy here with the concentration of measure phenomenon, and many of the tools used in the latter (e.g. the moment method) will also appear here. (Indeed, we will be able to use some of the concentration inequalities from Notes 1 directly to help control ${\|M\|_{op}}$ and related quantities.) Similarly, just as many of the tools from concentration of measure could be adapted to help prove the central limit theorem, several the tools seen here will be of use in deriving the semi-circular law.

The most advanced knowledge we have on the operator norm is given by the Tracy-Widom law, which not only tells us where the operator norm is concentrated in (it turns out, for instance, that for a Wigner matrix (with some additional technical assumptions), it is concentrated in the range ${[2\sqrt{n} - O(n^{-1/6}), 2\sqrt{n} + O(n^{-1/6})]}$), but what its distribution in that range is. While the methods in this set of notes can eventually be pushed to establish this result, this is far from trivial, and will only be briefly discussed here. (We may return to the Tracy-Widom law later in this course, though.)