One of the basic tools in modern combinatorics is the probabilistic method, introduced by Erdos, in which a deterministic solution to a given problem is shown to exist by constructing a random candidate for a solution, and showing that this candidate satisfies all the requirements of the problem with positive probability. When the problem requires a real-valued statistic {X} to be suitably large or suitably small, the following trivial observation is often employed:

Proposition 1 (Comparison with mean) Let {X} be a real-valued random variable, whose mean (or first moment) {\mathop{\bf E} X} is finite. Then

\displaystyle  X \leq \mathop{\bf E} X

with positive probability, and

\displaystyle  X \geq \mathop{\bf E} X

with positive probability.

This proposition is usually applied in conjunction with a computation of the first moment {\mathop{\bf E} X}, in which case this version of the probabilistic method becomes an instance of the first moment method. (For comparison with other moment methods, such as the second moment method, exponential moment method, and zeroth moment method, see Chapter 1 of my book with Van Vu. For a general discussion of the probabilistic method, see the book by Alon and Spencer of the same name.)

As a typical example in random matrix theory, if one wanted to understand how small or how large the operator norm {\|A\|_{op}} of a random matrix {A} could be, one might first try to compute the expected operator norm {\mathop{\bf E} \|A\|_{op}} and then apply Proposition 1; see this previous blog post for examples of this strategy (and related strategies, based on comparing {\|A\|_{op}} with more tractable expressions such as the moments {\hbox{tr} A^k}). (In this blog post, all matrices are complex-valued.)
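
To make the first moment strategy concrete, here is a small numerical sketch (purely illustrative; the ensemble and parameters are not from the original discussion) that estimates {\mathop{\bf E} \|A\|_{op}} by Monte Carlo for a Gaussian Hermitian ensemble, and compares a single sample of {\|A\|_{op}} against both this mean and the moment proxy {(\hbox{tr} A^{2k})^{1/2k}}:

```python
# Sketch only: estimate E ||A||_op by Monte Carlo for a GUE-type Hermitian
# ensemble, and compare one sample against that mean and against the moment
# proxy (tr A^{2k})^{1/(2k)}, which lies between ||A||_op and d^{1/(2k)} ||A||_op.
import numpy as np

rng = np.random.default_rng(0)
d, k, trials = 50, 10, 200

def sample_hermitian(d):
    G = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    return (G + G.conj().T) / 2                      # Hermitian matrix with Gaussian entries

samples = [sample_hermitian(d) for _ in range(trials)]
op_norms = [np.linalg.norm(A, 2) for A in samples]   # operator norms ||A||_op

A = samples[0]
proxy = np.trace(np.linalg.matrix_power(A, 2 * k)).real ** (1 / (2 * k))

print("empirical E ||A||_op  :", np.mean(op_norms))
print("||A||_op of one sample:", op_norms[0])
print("(tr A^{2k})^(1/(2k))  :", proxy)
```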

Recently, in their proof of the Kadison-Singer conjecture (and also in their earlier paper on Ramanujan graphs), Marcus, Spielman, and Srivastava introduced a striking new variant of the first moment method, suited in particular for controlling the operator norm {\|A\|_{op}} of a Hermitian positive semi-definite matrix {A}. Such matrices have non-negative real eigenvalues, and so {\|A\|_{op}} in this case is just the largest eigenvalue {\lambda_1(A)} of {A}. Traditionally, one tries to control the eigenvalues through averaged statistics such as moments {\hbox{tr} A^k = \sum_i \lambda_i(A)^k} or Stieltjes transforms {\hbox{tr} (A-z)^{-1} = \sum_i (\lambda_i(A)-z)^{-1}}; again, see this previous blog post. Here we use {z} as short-hand for {zI_d}, where {I_d} is the {d \times d} identity matrix. Marcus, Spielman, and Srivastava instead rely on the interpretation of the eigenvalues {\lambda_i(A)} of {A} as the roots of the characteristic polynomial {p_A(z) := \hbox{det}(z-A)} of {A}, thus

\displaystyle  \|A\|_{op} = \hbox{maxroot}( p_A ) \ \ \ \ \ (1)

where {\hbox{maxroot}(p)} is the largest real root of a non-zero polynomial {p}. (In our applications, we will only ever apply {\hbox{maxroot}} to polynomials that have at least one real root, but for sake of completeness let us set {\hbox{maxroot}(p)=-\infty} if {p} has no real roots.)
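
As a quick sanity check of (1) (a numerical sketch, not part of the argument), one can compare the largest real root of the characteristic polynomial with the operator norm of a small random Hermitian positive semi-definite matrix:

```python
# Sketch: for a Hermitian positive semi-definite A (here a sum of rank one
# matrices v v^*), the largest real root of det(z - A) coincides with ||A||_op.
import numpy as np

rng = np.random.default_rng(1)
d, m = 4, 6
V = rng.standard_normal((m, d)) + 1j * rng.standard_normal((m, d))
A = sum(np.outer(v, v.conj()) for v in V)        # Hermitian positive semi-definite

char_poly = np.poly(A)                           # coefficients of p_A(z) = det(z - A)
roots = np.roots(char_poly)
maxroot = max(r.real for r in roots if abs(r.imag) < 1e-6)

print("maxroot(p_A):", maxroot)
print("||A||_op    :", np.linalg.norm(A, 2))     # agrees with maxroot up to rounding
```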

Prior to the work of Marcus, Spielman, and Srivastava, I think it is safe to say that the conventional wisdom in random matrix theory was that the representation (1) of the operator norm {\|A\|_{op}} was not particularly useful, due to the highly non-linear nature of both the characteristic polynomial map {A \mapsto p_A} and the maximum root map {p \mapsto \hbox{maxroot}(p)}. (Although, as pointed out to me by Adam Marcus, some related ideas have occurred in graph theory rather than random matrix theory, for instance in the theory of the matching polynomial of a graph.) For instance, a fact as basic as the triangle inequality {\|A+B\|_{op} \leq \|A\|_{op} + \|B\|_{op}} is extremely difficult to establish through (1). Nevertheless, it turns out that for certain special types of random matrices {A} (particularly those in which a typical instance {A} of this ensemble has a simple relationship to “adjacent” matrices in this ensemble), the polynomials {p_A} enjoy an extremely rich structure (in particular, they lie in families of real stable polynomials, and hence enjoy good combinatorial interlacing properties) that can be surprisingly useful. In particular, Marcus, Spielman, and Srivastava established the following nonlinear variant of Proposition 1:

Proposition 2 (Comparison with mean) Let {m,d \geq 1}. Let {A} be a random matrix, which is the sum {A = \sum_{i=1}^m A_i} of independent Hermitian rank one {d \times d} matrices {A_i}, each taking a finite number of values. Then

\displaystyle  \hbox{maxroot}(p_A) \leq \hbox{maxroot}( \mathop{\bf E} p_A )

with positive probability, and

\displaystyle  \hbox{maxroot}(p_A) \geq \hbox{maxroot}( \mathop{\bf E} p_A )

with positive probability.
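
Before turning to the proof, here is a toy numerical illustration of Proposition 2 (a sketch only; the two-valued distributions below are chosen purely for illustration): enumerating all outcomes of a small ensemble, one checks that {\hbox{maxroot}(p_A)} does indeed fall on both sides of {\hbox{maxroot}(\mathop{\bf E} p_A)}.

```python
# Toy check of Proposition 2: each A_i = v_i v_i^* with v_i drawn uniformly from
# two fixed vectors, independently.  All 2^m outcomes are enumerated, and
# maxroot(p_A) is compared with maxroot(E p_A).
import itertools
import numpy as np

rng = np.random.default_rng(2)
d, m = 3, 4
choices = [[rng.standard_normal(d), rng.standard_normal(d)] for _ in range(m)]

def maxroot(coeffs):
    roots = np.roots(coeffs)
    return max(r.real for r in roots if abs(r.imag) < 1e-6)

polys = []
for picks in itertools.product(range(2), repeat=m):        # the 2^m equally likely outcomes
    A = sum(np.outer(choices[i][p], choices[i][p]) for i, p in enumerate(picks))
    polys.append(np.poly(A))                               # coefficients of p_A for this outcome

mean_poly = np.mean(polys, axis=0)                         # coefficients of E p_A
outcome_maxroots = [maxroot(c) for c in polys]

print("maxroot(E p_A)     :", maxroot(mean_poly))
print("range over outcomes:", min(outcome_maxroots), max(outcome_maxroots))
# As Proposition 2 predicts, maxroot(E p_A) lies between the min and the max.
```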

We prove this proposition below the fold. The hypothesis that each {A_i} only takes finitely many values is technical and can likely be relaxed substantially, but we will not need to do so here. Despite the superficial similarity with Proposition 1, the proof of Proposition 2 is quite nonlinear; in particular, one needs the interlacing properties of real stable polynomials to proceed. Another key ingredient in the proof is the observation that while the determinant {\hbox{det}(A)} of a matrix {A} generally behaves in a nonlinear fashion on the underlying matrix {A}, it becomes (affine-)linear when one considers rank one perturbations, and so {p_A} depends in an affine-multilinear fashion on the {A_1,\ldots,A_m}. More precisely, we have the following deterministic formula, also proven below the fold:

Proposition 3 (Deterministic multilinearisation formula) Let {A} be the sum of deterministic rank one {d \times d} matrices {A_1,\ldots,A_m}. Then we have

\displaystyle  p_A(z) = \mu[A_1,\ldots,A_m](z) \ \ \ \ \ (2)

for all {z \in {\bf C}}, where the mixed characteristic polynomial {\mu[A_1,\ldots,A_m](z)} of any {d \times d} matrices {A_1,\ldots,A_m} (not necessarily rank one) is given by the formula

\displaystyle  \mu[A_1,\ldots,A_m](z) \ \ \ \ \ (3)

\displaystyle  = (\prod_{i=1}^m (1 - \frac{\partial}{\partial z_i})) \hbox{det}( z + \sum_{i=1}^m z_i A_i ) |_{z_1=\ldots=z_m=0}.
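
For concreteness, here is a small symbolic verification of this formula (a sketch using sympy; the specific vectors are illustrative and not from the paper):

```python
# Symbolic check of (2)-(3): for rank one A_1, A_2, A_3, the mixed characteristic
# polynomial mu[A_1,A_2,A_3](z) agrees with the characteristic polynomial of
# A = A_1 + A_2 + A_3.
import sympy as sp

d, m = 2, 3
vs = [sp.Matrix([1, 2]), sp.Matrix([0, 1]), sp.Matrix([3, -1])]   # illustrative vectors
As = [v * v.T for v in vs]                                        # rank one matrices A_i = v_i v_i^*

z = sp.symbols('z')
zs = sp.symbols('z1:%d' % (m + 1))                                # auxiliary variables z_1,...,z_m

M = z * sp.eye(d)                                                 # z is short-hand for z*I_d
for k in range(m):
    M = M + zs[k] * As[k]
f = M.det()                                                       # det(z + sum_i z_i A_i)

mu = f
for zi in zs:
    mu = mu - sp.diff(mu, zi)                                     # apply prod_i (1 - d/dz_i)
mu = sp.expand(mu.subs({zi: 0 for zi in zs}))                     # then set z_1 = ... = z_m = 0

p_A = sp.expand((z * sp.eye(d) - (As[0] + As[1] + As[2])).det())

print(mu)    # z**2 - 16*z + 59
print(p_A)   # z**2 - 16*z + 59, confirming (2)
```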

Among other things, this formula gives a useful representation of the mean characteristic polynomial {\mathop{\bf E} p_A}:

Corollary 4 (Random multilinearisation formula) Let {A} be the sum of jointly independent random rank one {d \times d} matrices {A_1,\ldots,A_m}. Then we have

\displaystyle  \mathop{\bf E} p_A(z) = \mu[ \mathop{\bf E} A_1, \ldots, \mathop{\bf E} A_m ](z) \ \ \ \ \ (4)

for all {z \in {\bf C}}.

Proof: For fixed {z}, the expression {\hbox{det}( z + \sum_{i=1}^m z_i A_i )} is a polynomial combination of the {z_i A_i}, while the differential operator {(\prod_{i=1}^m (1 - \frac{\partial}{\partial z_i}))} is a linear combination of differential operators {\frac{\partial^j}{\partial z_{i_1} \ldots \partial z_{i_j}}} for {1 \leq i_1 < \ldots < i_j \leq m}. As a consequence, we may expand (3) as a linear combination of terms, each of which is a multilinear combination of {A_{i_1},\ldots,A_{i_j}} for some {1 \leq i_1 < \ldots < i_j \leq m}. Taking expectations of both sides of (2) and using the joint independence of the {A_i}, we obtain the claim. \Box
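
As with Proposition 3, one can verify Corollary 4 symbolically on a toy example (again just a sketch; the two-valued distributions are illustrative only), computing {\mathop{\bf E} p_A} exactly by enumerating all outcomes and comparing it with the mixed characteristic polynomial of the means:

```python
# Toy check of (4): each v_i takes one of two values with probability 1/2,
# independently.  E p_A is computed exactly by enumerating all outcomes, and
# compared with mu[E A_1, E A_2].
import itertools
import sympy as sp

d, m = 2, 2
z = sp.symbols('z')
zs = sp.symbols('z1:%d' % (m + 1))

def mixed_char_poly(Bs):
    """mu[B_1,...,B_m](z), computed via formula (3)."""
    M = z * sp.eye(d)
    for k in range(m):
        M = M + zs[k] * Bs[k]
    mu = M.det()
    for zi in zs:
        mu = mu - sp.diff(mu, zi)
    return sp.expand(mu.subs({zi: 0 for zi in zs}))

choices = [[sp.Matrix([1, 0]), sp.Matrix([1, 1])],       # the two values of v_1
           [sp.Matrix([0, 2]), sp.Matrix([1, -1])]]      # the two values of v_2

E_pA = 0
for picks in itertools.product(range(2), repeat=m):      # the 2^m equally likely outcomes
    A = sum((choices[i][p] * choices[i][p].T for i, p in enumerate(picks)), sp.zeros(d))
    E_pA = E_pA + sp.Rational(1, 2 ** m) * (z * sp.eye(d) - A).det()

EA = [sum((sp.Rational(1, 2) * v * v.T for v in pair), sp.zeros(d)) for pair in choices]

print(sp.expand(E_pA))       # z**2 - 9*z/2 + 13/4
print(mixed_char_poly(EA))   # z**2 - 9*z/2 + 13/4, confirming (4)
```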

In view of Proposition 2, we can now hope to control the operator norm {\|A\|_{op}} of certain special types of random matrices {A} (and specifically, the sum of independent Hermitian positive semi-definite rank one matrices) by first controlling the mean {\mathop{\bf E} p_A} of the random characteristic polynomial {p_A}. Pursuing this philosophy, Marcus, Spielman, and Srivastava establish the following result, which they then use to prove the Kadison-Singer conjecture:

Theorem 5 (Marcus-Spielman-Srivastava theorem) Let {m,d \geq 1}. Let {v_1,\ldots,v_m \in {\bf C}^d} be jointly independent random vectors in {{\bf C}^d}, with each {v_i} taking a finite number of values. Suppose that we have the normalisation

\displaystyle  \mathop{\bf E} \sum_{i=1}^m v_i v_i^* = 1

where we are using the convention that {1} is the {d \times d} identity matrix {I_d} whenever necessary. Suppose also that we have the smallness condition

\displaystyle  \mathop{\bf E} \|v_i\|^2 \leq \varepsilon

for some {\varepsilon>0} and all {i=1,\ldots,m}. Then one has

\displaystyle  \| \sum_{i=1}^m v_i v_i^* \|_{op} \leq (1+\sqrt{\varepsilon})^2 \ \ \ \ \ (5)

with positive probability.

Note that the upper bound in (5) must be at least {1} (by taking {v_i} to be deterministic) and also must be at least {\varepsilon} (by taking the {v_i} to always have magnitude at least {\sqrt{\varepsilon}}). Thus the bound in (5) is asymptotically tight both in the regime {\varepsilon\rightarrow 0} and in the regime {\varepsilon \rightarrow \infty}; the latter regime will be particularly useful for applications to Kadison-Singer. It should also be noted that if one uses more traditional random matrix theory methods (based on tools such as Proposition 1, as well as more sophisticated variants of these tools, such as the concentration of measure results of Rudelson and Ahlswede-Winter), one obtains a bound of {\| \sum_{i=1}^m v_i v_i^* \|_{op} \ll_\varepsilon \log d} with high probability, which is insufficient for the application to the Kadison-Singer problem; see this article of Tropp. Thus, Theorem 5 obtains a sharper bound, at the cost of trading in “high probability” for “positive probability”.
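
To see Theorem 5 in action on a toy example (this construction is only an illustration, not from the paper), take each {v_i} to be {\sqrt{d/m}} times a uniformly random standard basis vector, so that {\mathop{\bf E} \sum_{i=1}^m v_i v_i^* = I_d} and {\mathop{\bf E} \|v_i\|^2 = d/m =: \varepsilon}; the sum {\sum_{i=1}^m v_i v_i^*} is then diagonal, and the theorem guarantees the existence of an outcome whose operator norm is at most {(1+\sqrt{\varepsilon})^2}:

```python
# Toy illustration of Theorem 5: v_i = sqrt(d/m) e_{j_i} with j_i uniform in
# {1,...,d}.  The sum of the v_i v_i^* is diagonal with entries (d/m) times the
# bin counts, so the theorem predicts an outcome whose largest bin count is at
# most (1 + sqrt(eps))^2 * m/d.  We sample outcomes until we find one.
import numpy as np

rng = np.random.default_rng(5)
d, m = 20, 100
eps = d / m
bound = (1 + np.sqrt(eps)) ** 2

found = False
for _ in range(1000):
    counts = np.bincount(rng.integers(0, d, size=m), minlength=d)
    op_norm = eps * counts.max()              # = || sum_i v_i v_i^* ||_op for this outcome
    if op_norm <= bound:
        found = True
        break

print("(1 + sqrt(eps))^2:", bound)
print("found outcome with operator norm", op_norm, "<= bound:", found)
```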

In the paper of Marcus, Spielman and Srivastava, Theorem 5 is used to deduce a conjecture {KS_2} of Weaver, which was already known to imply the Kadison-Singer conjecture; actually, a slight modification of their argument gives the paving conjecture of Kadison and Singer, from which the original Kadison-Singer conjecture may be readily deduced. We give these implications below the fold. (See also this survey article for some background on the Kadison-Singer problem.)

Let us now summarise how Theorem 5 is proven. In the spirit of semi-definite programming, we rephrase the above theorem in terms of the rank one Hermitian positive semi-definite matrices {A_i := v_iv_i^*}:

Theorem 6 (Marcus-Spielman-Srivastava theorem again) Let {A_1,\ldots,A_m} be jointly independent random rank one Hermitian positive semi-definite {d \times d} matrices such that the sum {A :=\sum_{i=1}^m A_i} has mean

\displaystyle  \mathop{\bf E} A = I_d

and such that

\displaystyle  \mathop{\bf E} \hbox{tr} A_i \leq \varepsilon

for some {\varepsilon>0} and all {i=1,\ldots,m}. Then one has

\displaystyle  \| A \|_{op} \leq (1+\sqrt{\varepsilon})^2

with positive probability.

In view of (1) and Proposition 2, this theorem follows from the following control on the mean characteristic polynomial:

Theorem 7 (Control of mean characteristic polynomial) Let {A_1,\ldots,A_m} be jointly independent random rank one Hermitian positive semi-definite {d \times d} matrices such that the sum {A :=\sum_{i=1}^m A_i} has mean

\displaystyle  \mathop{\bf E} A = 1

and such that

\displaystyle  \mathop{\bf E} \hbox{tr} A_i \leq \varepsilon

for some {\varepsilon>0} and all {i=1,\ldots,m}. Then one has

\displaystyle  \hbox{maxroot}(\mathop{\bf E} p_A) \leq (1 +\sqrt{\varepsilon})^2.

This result is proven using the multilinearisation formula (Corollary 4) and some convexity properties of real stable polynomials; we give the proof below the fold.

Thanks to Adam Marcus, Assaf Naor and Sorin Popa for many useful explanations on various aspects of the Kadison-Singer problem.
