One of the basic tools in modern combinatorics is the probabilistic method, introduced by Erdos, in which a deterministic solution to a given problem is shown to exist by constructing a random candidate for a solution, and showing that this candidate solves all the requirements of the problem with positive probability. When the problem requires a real-valued statistic {X} to be suitably large or suitably small, the following trivial observation is often employed:

Proposition 1 (Comparison with mean) Let {X} be a real-valued random variable whose mean (or first moment) {\mathop{\bf E} X} is finite. Then

\displaystyle  X \leq \mathop{\bf E} X

with positive probability, and

\displaystyle  X \geq \mathop{\bf E} X

with positive probability.

This proposition is usually applied in conjunction with a computation of the first moment {\mathop{\bf E} X}, in which case this version of the probabilistic method becomes an instance of the first moment method. (For comparison with other moment methods, such as the second moment method, exponential moment method, and zeroth moment method, see Chapter 1 of my book with Van Vu. For a general discussion of the probabilistic method, see the book by Alon and Spencer of the same name.)

As a typical example in random matrix theory, if one wanted to understand how small or how large the operator norm {\|A\|_{op}} of a random matrix {A} could be, one might first try to compute the expected operator norm {\mathop{\bf E} \|A\|_{op}} and then apply Proposition 1; see this previous blog post for examples of this strategy (and related strategies, based on comparing {\|A\|_{op}} with more tractable expressions such as the moments {\hbox{tr} A^k}). (In this blog post, all matrices are complex-valued.)

Recently, in their proof of the Kadison-Singer conjecture (and also in their earlier paper on Ramanujan graphs), Marcus, Spielman, and Srivastava introduced a striking new variant of the first moment method, suited in particular for controlling the operator norm {\|A\|_{op}} of a Hermitian positive semi-definite matrix {A}. Such matrices have non-negative real eigenvalues, and so {\|A\|_{op}} in this case is just the largest eigenvalue {\lambda_1(A)} of {A}. Traditionally, one tries to control the eigenvalues through averaged statistics such as moments {\hbox{tr} A^k = \sum_i \lambda_i(A)^k} or Stieltjes transforms {\hbox{tr} (A-z)^{-1} = \sum_i (\lambda_i(A)-z)^{-1}}; again, see this previous blog post. Here we use {z} as short-hand for {zI_d}, where {I_d} is the {d \times d} identity matrix. Marcus, Spielman, and Srivastava instead rely on the interpretation of the eigenvalues {\lambda_i(A)} of {A} as the roots of the characteristic polynomial {p_A(z) := \hbox{det}(z-A)} of {A}, thus

\displaystyle  \|A\|_{op} = \hbox{maxroot}( p_A ) \ \ \ \ \ (1)

where {\hbox{maxroot}(p)} is the largest real root of a non-zero polynomial {p}. (In our applications, we will only ever apply {\hbox{maxroot}} to polynomials that have at least one real root, but for sake of completeness let us set {\hbox{maxroot}(p)=-\infty} if {p} has no real roots.)
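(For readers who like to experiment, here is a quick numerical sanity check of (1), as an illustrative sketch in Python with a randomly generated positive semi-definite matrix as test data.)

# Numerical check of (1): for a Hermitian positive semi-definite matrix A,
# the operator norm equals the largest real root of the characteristic polynomial.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 5)) + 1j * rng.normal(size=(5, 5))
A = X @ X.conj().T                         # Hermitian positive semi-definite

op_norm = np.linalg.norm(A, 2)             # largest eigenvalue of A
char_poly = np.poly(A)                     # coefficients of det(z - A)
max_root = np.roots(char_poly).real.max()  # all roots are real here

print(op_norm, max_root)                   # the two values agree up to rounding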

Prior to the work of Marcus, Spielman, and Srivastava, I think it is safe to say that the conventional wisdom in random matrix theory was that the representation (1) of the operator norm {\|A\|_{op}} was not particularly useful, due to the highly non-linear nature of both the characteristic polynomial map {A \mapsto p_A} and the maximum root map {p \mapsto \hbox{maxroot}(p)}. (Although, as pointed out to me by Adam Marcus, some related ideas have occurred in graph theory rather than random matrix theory, for instance in the theory of the matching polynomial of a graph.) For instance, a fact as basic as the triangle inequality {\|A+B\|_{op} \leq \|A\|_{op} + \|B\|_{op}} is extremely difficult to establish through (1). Nevertheless, it turns out that for certain special types of random matrices {A} (particularly those in which a typical instance {A} of this ensemble has a simple relationship to “adjacent” matrices in this ensemble), the polynomials {p_A} enjoy an extremely rich structure (in particular, they lie in families of real stable polynomials, and hence enjoy good combinatorial interlacing properties) that can be surprisingly useful. In particular, Marcus, Spielman, and Srivastava established the following nonlinear variant of Proposition 1:

Proposition 2 (Comparison with mean) Let {m,d \geq 1}. Let {A} be a random matrix, which is the sum {A = \sum_{i=1}^m A_i} of independent rank one Hermitian positive semi-definite {d \times d} matrices {A_i}, each taking a finite number of values. Then

\displaystyle  \hbox{maxroot}(p_A) \leq \hbox{maxroot}( \mathop{\bf E} p_A )

with positive probability, and

\displaystyle  \hbox{maxroot}(p_A) \geq \hbox{maxroot}( \mathop{\bf E} p_A )

with positive probability.

We prove this proposition below the fold. The hypothesis that each {A_i} only takes finitely many values is technical and can likely be relaxed substantially, but we will not need to do so here. Despite the superficial similarity with Proposition 1, the proof of Proposition 2 is quite nonlinear; in particular, one needs the interlacing properties of real stable polynomials to proceed. Another key ingredient in the proof is the observation that while the determinant {\hbox{det}(A)} of a matrix {A} generally behaves in a nonlinear fashion on the underlying matrix {A}, it becomes (affine-)linear when one considers rank one perturbations, and so {p_A} depends in an affine-multilinear fashion on the {A_1,\ldots,A_m}. More precisely, we have the following deterministic formula, also proven below the fold:

Proposition 3 (Deterministic multilinearisation formula) Let {A} be the sum of deterministic rank one {d \times d} matrices {A_1,\ldots,A_m}. Then we have

\displaystyle  p_A(z) = \mu[A_1,\ldots,A_m](z) \ \ \ \ \ (2)

for all {z \in {\bf C}}, where the mixed characteristic polynomial {\mu[A_1,\ldots,A_m](z)} of any {d \times d} matrices {A_1,\ldots,A_m} (not necessarily rank one) is given by the formula

\displaystyle  \mu[A_1,\ldots,A_m](z) \ \ \ \ \ (3)

\displaystyle  = (\prod_{i=1}^m (1 - \frac{\partial}{\partial z_i})) \hbox{det}( z + \sum_{i=1}^m z_i A_i ) |_{z_1=\ldots=z_m=0}.
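(As an illustrative sketch, one can also verify (2), (3) numerically for small {m,d}; the snippet below uses sympy for the symbolic differentiation, with randomly generated integer rank one matrices as test data.)

# Sanity check of the multilinearisation formula (2)-(3) for random rank one
# matrices (exact integer arithmetic via sympy; illustrative sketch only).
import numpy as np
import sympy as sp

d, m = 3, 4
rng = np.random.default_rng(0)
vs = [sp.Matrix(rng.integers(-3, 4, size=(d, 1)).tolist()) for _ in range(m)]
As = [v * v.T for v in vs]                 # rank one positive semi-definite

z = sp.symbols('z')
zs = sp.symbols(f'z1:{m + 1}')

# left-hand side of (2): p_A(z) = det(z - A) with A = A_1 + ... + A_m
A = sum(As, sp.zeros(d, d))
pA = sp.expand((z * sp.eye(d) - A).det())

# right-hand side of (2): the mixed characteristic polynomial (3)
mu = (z * sp.eye(d) + sum((zi * Ai for zi, Ai in zip(zs, As)), sp.zeros(d, d))).det()
for zi in zs:
    mu = mu - sp.diff(mu, zi)              # apply the operator (1 - d/dz_i)
mu = sp.expand(mu.subs({zi: 0 for zi in zs}))

print(sp.simplify(pA - mu))                # prints 0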

Among other things, this formula gives a useful representation of the mean characteristic polynomial {\mathop{\bf E} p_A}:

Corollary 4 (Random multilinearisation formula) Let {A} be the sum of jointly independent rank one {d \times d} matrices {A_1,\ldots,A_m}. Then we have

\displaystyle  \mathop{\bf E} p_A(z) = \mu[ \mathop{\bf E} A_1, \ldots, \mathop{\bf E} A_m ](z) \ \ \ \ \ (4)

for all {z \in {\bf C}}.

Proof: For fixed {z}, the expression {\hbox{det}( z + \sum_{i=1}^m z_i A_i )} is a polynomial combination of the {z_i A_i}, while the differential operator {(\prod_{i=1}^m (1 - \frac{\partial}{\partial z_i}))} is a linear combination of differential operators {\frac{\partial^j}{\partial z_{i_1} \ldots \partial z_{i_j}}} for {1 \leq i_1 < \ldots < i_j \leq m}. As a consequence, we may expand (3) as a linear combination of terms, each of which is a multilinear combination of {A_{i_1},\ldots,A_{i_j}} for some {1 \leq i_1 < \ldots < i_j \leq m}. Taking expectations of both sides of (2) and using the joint independence of the {A_i}, we obtain the claim. \Box
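(Here is a matching numerical check of (4), again as an illustrative sketch; the ensemble below, in which each {A_i} independently takes one of two rank one values with equal probability, is made up purely for testing purposes.)

# Sanity check of the random multilinearisation formula (4): for independent
# two-valued rank one A_i, E p_A agrees with mu[E A_1, ..., E A_m].
import itertools
import numpy as np
import sympy as sp

d, m = 2, 3
rng = np.random.default_rng(1)
values = [[sp.Matrix(v.tolist()) * sp.Matrix(v.tolist()).T
           for v in rng.integers(-2, 3, size=(2, d, 1))] for _ in range(m)]

z = sp.symbols('z')

def mixed_char_poly(mats):
    ws = sp.symbols(f'w1:{len(mats) + 1}')
    q = (z * sp.eye(d) + sum((wi * M for wi, M in zip(ws, mats)), sp.zeros(d, d))).det()
    for wi in ws:
        q = q - sp.diff(q, wi)
    return sp.expand(q.subs({wi: 0 for wi in ws}))

# E p_A(z): average of det(z - A) over the 2^m equally likely outcomes
EpA = sp.Integer(0)
for choice in itertools.product(range(2), repeat=m):
    A = sum((values[i][c] for i, c in enumerate(choice)), sp.zeros(d, d))
    EpA += (z * sp.eye(d) - A).det()
EpA = sp.expand(EpA / 2**m)

means = [(values[i][0] + values[i][1]) / 2 for i in range(m)]
print(sp.simplify(EpA - mixed_char_poly(means)))   # prints 0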

In view of Proposition 2, we can now hope to control the operator norm {\|A\|_{op}} of certain special types of random matrices {A} (and specifically, the sum of independent Hermitian positive semi-definite rank one matrices) by first controlling the mean {\mathop{\bf E} p_A} of the random characteristic polynomial {p_A}. Pursuing this philosophy, Marcus, Spielman, and Srivastava establish the following result, which they then use to prove the Kadison-Singer conjecture:

Theorem 5 (Marcus-Spielman-Srivastava theorem) Let {m,d \geq 1}. Let {v_1,\ldots,v_m \in {\bf C}^d} be jointly independent random vectors in {{\bf C}^d}, with each {v_i} taking a finite number of values. Suppose that we have the normalisation

\displaystyle  \mathop{\bf E} \sum_{i=1}^m v_i v_i^* = 1

where we are using the convention that {1} is the {d \times d} identity matrix {I_d} whenever necessary. Suppose also that we have the smallness condition

\displaystyle  \mathop{\bf E} \|v_i\|^2 \leq \varepsilon

for some {\varepsilon>0} and all {i=1,\ldots,m}. Then one has

\displaystyle  \| \sum_{i=1}^m v_i v_i^* \|_{op} \leq (1+\sqrt{\varepsilon})^2 \ \ \ \ \ (5)

with positive probability.

Note that the upper bound in (5) must be at least {1} (by taking {v_i} to be deterministic) and also must be at least {\varepsilon} (by taking the {v_i} to always have magnitude at least {\sqrt{\varepsilon}}). Thus the bound in (5) is asymptotically tight both in the regime {\varepsilon\rightarrow 0} and in the regime {\varepsilon \rightarrow \infty}; the latter regime will be particularly useful for applications to Kadison-Singer. It should also be noted that if one uses more traditional random matrix theory methods (based on tools such as Proposition 1, as well as more sophisticated variants of these tools, such as the concentration of measure results of Rudelson and Ahlswede-Winter), one obtains a bound of {\| \sum_{i=1}^m v_i v_i^* \|_{op} \ll_\varepsilon \log d} with high probability, which is insufficient for the application to the Kadison-Singer problem; see this article of Tropp. Thus, Theorem 5 obtains a sharper bound, at the cost of trading in “high probability” for “positive probability”.

In the paper of Marcus, Spielman and Srivastava, Theorem 5 is used to deduce a conjecture {KS_2} of Weaver, which was already known to imply the Kadison-Singer conjecture; actually, a slight modification of their argument gives the paving conjecture of Kadison and Singer, from which the original Kadison-Singer conjecture may be readily deduced. We give these implications below the fold. (See also this survey article for some background on the Kadison-Singer problem.)

Let us now summarise how Theorem 5 is proven. In the spirit of semi-definite programming, we rephrase the above theorem in terms of the rank one Hermitian positive semi-definite matrices {A_i := v_iv_i^*}:

Theorem 6 (Marcus-Spielman-Srivastava theorem again) Let {A_1,\ldots,A_m} be jointly independent random rank one Hermitian positive semi-definite {d \times d} matrices such that the sum {A :=\sum_{i=1}^m A_i} has mean

\displaystyle  \mathop{\bf E} A = I_d

and such that

\displaystyle  \mathop{\bf E} \hbox{tr} A_i \leq \varepsilon

for some {\varepsilon>0} and all {i=1,\ldots,m}. Then one has

\displaystyle  \| A \|_{op} \leq (1+\sqrt{\varepsilon})^2

with positive probability.

In view of (1) and Proposition 2, this theorem follows from the following control on the mean characteristic polynomial:

Theorem 7 (Control of mean characteristic polynomial) Let {A_1,\ldots,A_m} be jointly independent random rank one Hermitian positive semi-definite {d \times d} matrices such that the sum {A :=\sum_{i=1}^m A_i} has mean

\displaystyle  \mathop{\bf E} A = 1

and such that

\displaystyle  \mathop{\bf E} \hbox{tr} A_i \leq \varepsilon

for some {\varepsilon>0} and all {i=1,\ldots,m}. Then one has

\displaystyle  \hbox{maxroot}(\mathop{\bf E} p_A) \leq (1 +\sqrt{\varepsilon})^2.

This result is proven using the multilinearisation formula (Corollary 4) and some convexity properties of real stable polynomials; we give the proof below the fold.

Thanks to Adam Marcus, Assaf Naor and Sorin Popa for many useful explanations on various aspects of the Kadison-Singer problem.

— 1. Multilinearisation formula —

In this section we prove Proposition 3. The key observation is a standard one:

Lemma 8 (Determinant is linear with respect to rank one perturbations) Let {A, B} be {d \times d} matrices, with {A} rank one. Then the polynomial {t \mapsto \hbox{det}(B+tA)} is affine-linear (i.e. it is of the form {t \mapsto a_0 + a_1 t} for some coefficients {a_0,a_1}).

Proof: When {B} is the identity matrix, this follows from the Weinstein–Aronszajn identity {\hbox{det}(1 + XY) = \hbox{det}(1+YX)}, which shows in particular that {\hbox{det}(1 + t u v^*) = 1 + t v^* u} for any rank one matrix {uv^*}. As the determinant is multiplicative, the claim then follows when {B} is any invertible matrix; as the invertible matrices are dense in the space of all matrices, we conclude the general case of the lemma. (One can also prove this lemma via cofactor expansion, after changing variables to make {u} or {v} a basis vector.) \Box
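(A quick numerical check of the identities used in this proof, as a sketch with random complex test data:)

# Check that det(1 + t u v^*) = 1 + t v^* u, and that t -> det(B + t u v^*) is
# affine-linear, for random rank one u v^* (illustrative sketch).
import numpy as np

rng = np.random.default_rng(2)
d = 4
u = rng.normal(size=(d, 1)) + 1j * rng.normal(size=(d, 1))
v = rng.normal(size=(d, 1)) + 1j * rng.normal(size=(d, 1))
B = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))

t = 0.7
lhs = np.linalg.det(np.eye(d) + t * u @ v.conj().T)
rhs = 1 + t * (v.conj().T @ u).item()
print(np.isclose(lhs, rhs))                      # Weinstein-Aronszajn identity

# affine-linearity: the second difference of t -> det(B + t u v^*) vanishes
f = lambda s: np.linalg.det(B + s * u @ v.conj().T)
print(np.isclose(f(0) - 2 * f(1) + f(2), 0))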

Iterating this, we obtain

Corollary 9 (Determinant is multilinear with respect to rank one perturbations) Let {A_1,\ldots,A_m,B} be {d \times d} matrices, with {A_1,\ldots,A_m} rank one. Then the polynomial {(t_1,\ldots,t_m) \mapsto \hbox{det}(B + t_1 A_1 + \ldots + t_m A_m)} is affine-multilinear, that is to say it is of the form

\displaystyle  (t_1,\ldots,t_m) \mapsto \sum_{1 \leq i_1 < \ldots < i_j \leq m} a_{i_1,\ldots,i_j} t_{i_1} \ldots t_{i_j}

for some coefficients {a_{i_1,\ldots,i_j}}.

Proof: By Lemma 8, the polynomial {(t_1,\ldots,t_m) \mapsto \hbox{det}(B + t_1 A_1 + \ldots + t_m A_m)} is affine-linear in each of the variables {t_i} separately (if one freezes the other {m-1} variables {t_j}). The claim follows. \Box

Now observe that if a one-variable polynomial {t \mapsto P(t)} is affine-linear, then it can be recovered from its order {1} Taylor expansion at the origin:

\displaystyle  P(t) = P(0) + t P'(0)

\displaystyle  = (1 + t \frac{\partial}{\partial z}) P(z)|_{z=0}.

More generally, if a polynomial {(t_1,\ldots,t_m) \mapsto P(t_1,\ldots,t_m)} is affine-multilinear in the sense of Corollary 9, then it can be recovered from an “order {(1,\ldots,1)} Taylor expansion” at the origin by the formula

\displaystyle  P(t_1,\ldots,t_m) = (\prod_{i=1}^m (1 + t_i \frac{\partial}{\partial z_i})) P(z_1,\ldots,z_m) |_{z_1=\ldots=z_m=0}.

Applying this to the polynomial in Corollary 9, we conclude that

\displaystyle  \hbox{det}(B + t_1 A_1 + \ldots + t_m A_m)

\displaystyle = (\prod_{i=1}^m (1 + t_i \frac{\partial}{\partial z_i})) \hbox{det}(B + z_1 A_1 + \ldots + z_m A_m) |_{z_1=\ldots=z_m=0}.

Setting {B = z} and {t_1=\ldots=t_m=-1}, we obtain Proposition 3 as required.

— 2. Real stable polynomials and interlacing —

Now we prove Proposition 2. The key observation is that almost all of the polynomials one needs to work with here are of a special class, namely the real stable polynomials.

Definition 10 A polynomial {p: {\bf C}^m \rightarrow {\bf C}} is stable if it has no zeroes in the region {\{ (z_1,\ldots,z_m) \in {\bf C}^m: \hbox{Im}(z_1),\ldots,\hbox{Im}(z_m) > 0 \}} (in particular, {p} is non-zero). A polynomial {p: {\bf C}^m \rightarrow {\bf C}} is real stable if it is stable and all of its coefficients are real.

Observe from conjugation symmetry that a real stable polynomial has no zeroes in either of the two regions {\{ (z_1,\ldots,z_m) \in {\bf C}^m: \hbox{Im}(z_1),\ldots,\hbox{Im}(z_m) > 0 \}} and {\{ (z_1,\ldots,z_m) \in {\bf C}^m: \hbox{Im}(z_1),\ldots,\hbox{Im}(z_m) < 0 \}}. In particular, a one-variable real polynomial is stable if and only if it only has real zeroes. The notion of stability is however more complicated in higher dimensions. See this survey of Wagner for a more thorough discussion of the property of real stability.

To me, the property of being real stable is somewhat similar to properties such as log-convexity or total positivity; these are properties that are only obeyed by a minority of objects being studied, but in many key cases, these properties are available and can be very profitable to exploit. In practice, real stable polynomials enjoy the combined benefits of real analysis methods (e.g. the intermediate value theorem) and complex analysis methods (e.g. Rouche’s theorem), because of the guarantee that zeroes will be real.

A basic example of a real stable polynomial is the characteristic polynomial of a Hermitian matrix; this is a rephrasing of the fundamental fact that the eigenvalues of a Hermitian matrix are all real. There is a multidimensional version of this example, which will be the primary source of real stability for us:

Lemma 11 If {A_1,\ldots,A_m} are positive semi-definite Hermitian matrices, then the polynomial {p( z, z_1,\ldots,z_m) := \hbox{det}(z + z_1 A_1 + \ldots+ z_m A_m)} is real stable.

Proof: It is clear that this polynomial is real (note that it takes real values for real inputs). To show that it is stable, observe that if {z, z_1,\ldots,z_m} have positive imaginary part, then the imaginary part {\hbox{Im}(z) I_d + \sum_{i=1}^m \hbox{Im}(z_i) A_i} of {z + z_1 A_1 + \ldots+ z_m A_m} is strictly positive definite, and so {z + z_1 A_1 + \ldots+ z_m A_m} is non-singular (since the quadratic form {v \mapsto \hbox{Im} \langle (z+z_1 A_1+\ldots+z_m A_m)v, v \rangle} is strictly positive for non-zero {v}). \Box

Real stable polynomials are also closed under restriction and under differential operators such as {(1 - \frac{\partial}{\partial z_j})}:

Lemma 12 Let {p( z_1,\ldots,z_m )} be a real stable polynomial of {m} variables.

  • (Restriction) If {m > 1} and {t} is real, then the polynomial {p( z_1,\ldots,z_{m-1},t)} is a real stable polynomial of {m-1} variables, unless it is identically zero.
  • (Differentiation) For any real {t}, the polynomial {(1 + t \frac{\partial}{\partial z_m}) p(z_1,\ldots,z_m)} is a real stable polynomial of {m} variables.

Similarly for permutations of the {m} variables.

Proof: The final claim is clear, as the property of real stability is unchanged under permutation of the variables.

We first prove the restriction claim. The polynomial {p( z_1,\ldots,z_{m-1},t)} is clearly real. If we have a zero {p( z_1,\ldots,z_{m-1},t)=0} with {z_1,\ldots,z_{m-1}} having positive imaginary part, but {p(\cdot,\ldots,\cdot,t)} does not vanish identically, then by the stable nature of zeroes (see, e.g., Rouche’s theorem) we will also have a nearby zero {p(z'_1,\ldots,z'_{m-1}, z_m)=0} for any {z_m} sufficiently close to {t}, with {z'_i} arbitrarily close to {z_i}. In particular, by choosing {z_m} to have positive imaginary part, we contradict the stability of {p(z_1,\ldots,z_m)}.

Now we turn to the differentiation claim. Again the real nature of {(1 + t \frac{\partial}{\partial z_m}) p(z_1,\ldots,z_m)} is clear. To prove stability, it suffices after freezing the first {m-1} variables to show that {p(z) +t p'(z)} is stable whenever {p(z)} is stable. We may of course take {t} non-zero. But a stable polynomial of one variable may be factored as {p(z) = c \prod_{i=1}^d (z-w_i)} for some roots {w_1,\ldots,w_d} with non-positive imaginary part. Since

\displaystyle  p(z) + t p'(z) = p(z) ( 1 + \sum_{i=1}^d \frac{t}{z-w_i} )

we see that if {z} has positive imaginary part, then the imaginary part of {1 + \sum_{i=1}^d \frac{t}{z-w_i}} has the opposite sign to {t}, and is in particular non-zero; since {p(z)} has no zeroes in the upper half-plane, neither does {p(z) +t p'(z)}. \Box

Combining these facts with (3), we immediately obtain:

Corollary 13 (Mixed characteristic polynomials are real stable) If {A_1,\ldots,A_m} are positive semi-definite matrices, then {\mu[A_1,\ldots,A_m]} is real stable. In other words, all the zeroes of {\mu[A_1,\ldots,A_m]} are real.
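(As an illustrative numerical check of Corollary 13, the following sketch builds the mixed characteristic polynomial of a few randomly generated positive semi-definite matrices, not necessarily of rank one, and verifies that its roots are real.)

# Numerical check that mu[A_1,...,A_m] has only real roots for positive
# semi-definite A_i (illustrative sketch; random test matrices).
import numpy as np
import sympy as sp

d, m = 3, 3
rng = np.random.default_rng(3)
As = []
for _ in range(m):
    X = rng.normal(size=(d, d))
    As.append(sp.Matrix((X @ X.T).round(3).tolist()))   # positive semi-definite

z = sp.symbols('z')
zs = sp.symbols(f'z1:{m + 1}')
mu = (z * sp.eye(d) + sum((zi * Ai for zi, Ai in zip(zs, As)), sp.zeros(d, d))).det()
for zi in zs:
    mu = mu - sp.diff(mu, zi)
mu = sp.Poly(mu.subs({zi: 0 for zi in zs}), z)

roots = np.roots([float(c) for c in mu.all_coeffs()])
print(np.abs(roots.imag).max() < 1e-6)              # all roots are (numerically) real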

Now, we relate stability with “convexity” properties of the largest root. Note that in general, if {(1-\lambda)p +\lambda q} is a convex combination of two polynomials {p,q}, there is no requirement that {\hbox{maxroot}((1-\lambda)p +\lambda q)} lie between {\hbox{maxroot}(p)} and {\hbox{maxroot}(q)}. For instance, if {p(z) = (z-1)^2 (z+2)} and {q(z) = (z+1)^2 (z+2)}, then {\hbox{maxroot}(p)=1} and {\hbox{maxroot}(q)=-1}, but {\hbox{maxroot}(\frac{1}{2}p+\frac{1}{2}q) = -2}. Note also that {p,q} are stable in this case. However the situation improves when one also assumes that all convex combinations of {p,q} are stable:

Lemma 14 (Comparison with mean) Let {p(z), q(z)} be real stable polynomials of one variable that are monic of the same degree. Suppose that {(1-t)p +tq} is also real stable for every {0 \leq t \leq 1}. Then for every {0 \leq \lambda \leq 1}, {\hbox{maxroot}((1-\lambda)p + \lambda q)} lies between {\hbox{maxroot}(p)} and {\hbox{maxroot}(q)}.

Proof: We may of course assume {0 <\lambda <1}. Without loss of generality let us assume {\hbox{maxroot}(p) \leq \hbox{maxroot}(q)}, so our task is to show that

\displaystyle \hbox{maxroot}(p) \leq \hbox{maxroot}( (1-\lambda)p + \lambda q) \leq \hbox{maxroot}(q).

The second inequality is easy: as {p,q} are monic, {p(x)} is positive for {x >\hbox{maxroot}(p)} and {q(x)} is positive for {x > \hbox{maxroot}(q)}, and so {(1-\lambda)p(x) + \lambda q(x)} is positive for {x> \hbox{maxroot}(q)}, giving the claim.

The first inequality is trickier. Suppose it fails; then {(1-\lambda)p + \lambda q} has no zero in {[\hbox{maxroot}(p), \hbox{maxroot}(q)]}. We have already seen that this polynomial is positive at {\hbox{maxroot}(q)}, thus it must also be positive at {\hbox{maxroot}(p)}. In other words, {q} is positive at {\hbox{maxroot}(p)}. Since {q} already had a zero at {\hbox{maxroot}(q)} and was positive to the right of this zero, we conclude that {q} has at least two zeroes (counting multiplicity) to the right of {\hbox{maxroot}(p)}. On the other hand, for {t} sufficiently close to {0}, {(1-t)p+tq} has no zeroes to the right of {\hbox{maxroot}(p)}, and all intermediate convex combinations {(1-t) p + t q} cannot vanish at {\hbox{maxroot}(p)}. Thus, at some intermediate value of {t}, the number of real zeroes of {(1-t) p + t q} to the right of {\hbox{maxroot}(p)} must acquire a jump discontinuity (e.g. two real zeroes colliding and then disappearing, or vice versa), but this contradicts Rouche’s theorem and real stability. \Box

One can control other roots than the maximal roots of {p,q,(1-t)p+tq} by these arguments, and show that these polynomials have a common interlacing polynomial of one lower degree, but we will not need these stronger results here.

Specialising this to mixed characteristic polynomials {\mu[A_1,\ldots,A_m]} (which are always monic of degree {d}), we conclude

Corollary 15 (Comparison with mean) Let {A_1,\ldots,A_m} be jointly independent random rank one positive semi-definite Hermitian matrices, with each {A_i} taking finitely many values. For any {1 \leq j \leq m} and any fixed choice of {A_1,\ldots,A_{j-1}}, we have

\displaystyle  \hbox{maxroot}( \mu[ A_1,\ldots,A_{j-1},A_j,\mathop{\bf E} A_{j+1},\ldots,\mathop{\bf E} A_m] )

\displaystyle \leq \hbox{maxroot}( \mu[ A_1,\ldots,A_{j-1}, \mathop{\bf E} A_j,\mathop{\bf E} A_{j+1},\ldots,\mathop{\bf E} A_m] )

with positive probability, and

\displaystyle  \hbox{maxroot}( \mu[ A_1,\ldots,A_{j-1},A_j,\mathop{\bf E} A_{j+1},\ldots,\mathop{\bf E} A_m] )

\displaystyle \geq \hbox{maxroot}( \mu[ A_1,\ldots,A_{j-1}, \mathop{\bf E} A_j,\mathop{\bf E} A_{j+1},\ldots,\mathop{\bf E} A_m] )

with positive probability.

Proof: For fixed {A_1,\ldots,A_{j-1}}, {\mathop{\bf E} A_j} is a convex combination of all the values of {A_j} that occur with positive probability, and hence (by the affine-multilinear nature of {\mu}) {\mu[ A_1,\ldots,A_{j-1}, \mathop{\bf E} A_j,\mathop{\bf E} A_{j+1},\ldots,\mathop{\bf E} A_m]} is a convex combination of the {\mu[ A_1,\ldots,A_{j-1}, A_j,\mathop{\bf E} A_{j+1},\ldots,\mathop{\bf E} A_m]} over the same range of {A_j}. On the other hand, by Corollary 13, all such convex combinations are real stable. Iterating Lemma 14, we see that {\hbox{maxroot}(\mu[ A_1,\ldots,A_{j-1}, \mathop{\bf E} A_j,\mathop{\bf E} A_{j+1},\ldots,\mathop{\bf E} A_m])} must lie in the convex hull of the {\hbox{maxroot}(\mu[ A_1,\ldots,A_{j-1}, A_j,\mathop{\bf E} A_{j+1},\ldots,\mathop{\bf E} A_m])}, and the claim follows. \Box

Iterating the above corollary for {j=1,\ldots,m} and using Proposition 3, we obtain Proposition 2 as desired.

— 3. Convexity —

In order to establish Theorem 7, we will need some convexity properties of real stable polynomials {p}, or more precisely of the log-derivatives {\Phi_p^j := \frac{\frac{\partial}{\partial z_j} p}{p}} of such polynomials, when one is “above” all the zeroes. We begin with the one-dimensional case:

Lemma 16 (One-dimensional convexity) Let {p(z)} be a real stable polynomial of one variable, and let {\Phi_p:= \frac{p'}{p}} be the logarithmic derivative (which is defined away from the zeroes of {p}). Then {(-1)^k \Phi_p^{(k)}(x) > 0} whenever {k=0,1,2,\ldots} and {x > \hbox{maxroot}(p)}. In particular, {\Phi_p} is positive, decreasing, and strictly convex to the right of {\hbox{maxroot}(p)}.

Proof: If we factor {p(z) = c \prod_{i=1}^d (z-y_i)}, where {y_1,\ldots,y_d} are the real roots, then {\Phi_p(z) = \sum_{i=1}^d \frac{1}{z-y_i}}. From direct computation we have

\displaystyle  (-1)^k \Phi_p^{(k)}(x) = k! \sum_{i=1}^d \frac{1}{(x-y_i)^{k+1}}.

Since {x-y_i} is positive when {x > \hbox{maxroot}(p)}, the claim follows. \Box
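(A quick symbolic check of this sign pattern for a concrete cubic, as an illustrative sketch:)

# The derivatives of Phi_p = p'/p alternate in sign above the largest root of p.
import sympy as sp

z = sp.symbols('z')
p = (z - 1) * (z - 3) * (z + 2)                    # real-rooted, maxroot = 3
Phi = sp.diff(p, z) / p
x0 = 4                                             # a point above maxroot(p)
print([sp.sign(((-1)**k * sp.diff(Phi, z, k)).subs(z, x0)) for k in range(5)])
# prints [1, 1, 1, 1, 1]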

Now we obtain a higher dimensional version of this lemma. If {p(z_1,\ldots,z_m)} is a real stable polynomial, we say that a point {x = (x_1,\ldots,x_m) \in {\bf R}^m} lies above the roots of {p} if {p} has no zeroes on the orthant {\{ (y_1,\ldots,y_m): y_i \geq x_i \hbox{ for all } i=1,\ldots,m\}}.

Lemma 17 (Multi-dimensional convexity) Let {p(z_1,\ldots,z_m)} be a real stable polynomial of {m} variables, let {1 \leq i,j \leq m}, and let {\Phi^i_p := \frac{\partial_{z_i} p}{p}} be the logarithmic derivative in the {e_i} direction. Then {(-1)^k \frac{\partial^k}{\partial z_j^k} \Phi_p^i(x) \geq 0} whenever {k=0,1,2,\ldots} and {x} lies above the roots of {p}. In particular, the function {t \mapsto \Phi_p^i(x+te_j)} is non-negative, non-increasing, and convex for non-negative {t}.

Proof: For {i=j} or {k=0}, the claim follows from Lemma 16 after freezing all the other variables, so we may assume that {i \neq j} and {k \geq 1}. By freezing all the other variables and relabeling, we may assume that {m=2}, {i=1}, and {j=2}, thus {p(z_1,z_2)} is a real stable polynomial, {x = (x_1,x_2)} lies above the roots of {p}, and the task is to show that

\displaystyle  (-1)^k \frac{\partial^k}{\partial z_2^k} \Phi_p^1(x) \geq 0

for all {k=1,2,\ldots}. Using the variables {x_1,x_2} to denote differentiation instead of {z_1,z_2}, the left-hand side is equal to

\displaystyle  (-1)^k \frac{\partial^k}{\partial x_2^k} \frac{\partial}{\partial x_1} \log p(x)

\displaystyle  = \frac{\partial}{\partial x_1}( (-1)^k \frac{\partial^k}{\partial x_2^k} \log p(x) )

so it suffices to show that {(-1)^k \frac{\partial^k}{\partial x_2^k} \log p(x)} is non-decreasing in the {x_1} direction. By continuity, it suffices to do this for generic {x_1} (thus, we may exclude a finite number of exceptional {x_1} if we wish).

For fixed {x_1}, the polynomial {p_{x_1}: x_2 \mapsto p(x_1,x_2)} is real stable, and thus has real roots, which we denote as {y_1(x_1),\ldots,y_d(x_1)} (counting multiplicity). For generic {x_1}, the number {d} of roots does not depend on {x_1}, the {y_i(x_1)} can be chosen to vary smoothly in {x_1}, and the multiplicity of each root {y_i(x_1)} is locally constant in {x_1} (the latter claim can be seen by noting that the roots of {p_{x_1}} of multiplicity at least {k} are the same as the roots of the resultant of the polynomial {p_{x_1}} and its first {k-1} derivatives {p_{x_1}', p_{x_1}'',\dots,p_{x_1}^{(k-1)}}, and the set of the latter roots is generically smoothly varying in {x_1}). We then have

\displaystyle (-1)^k \frac{\partial^k}{\partial x_2^k} \log p(x) = -(k-1)! \sum_{i=1}^d \frac{1}{(x_2-y_i(x_1))^k}

so it suffices to show that each of the {\frac{1}{(x_2-y_i(x_1))^k}} is a non-increasing function of {x_1}. But if {(x_1,x_2)} lies above the roots of {p}, then the {y_i(x_1)} all lie below {x_2}; so it suffices to show that the {y_i(x_1)} are all generically non-increasing. But if this were not the case, then (as {y_i(x_1)} is generically smooth), {y_i(x_1)} would have a positive derivative for all {x_1} in an open interval. In particular, there is an {x^0} such that the root {y_i(x_1)} has positive derivative and a constant multiplicity {m} for all {x_1} sufficiently close to {x^0}. This root {y_i(x_1)} depends analytically on {x_1} near {x^0} (this can be seen by noting that {y_i(x_1)} is the unique and simple root near {y_i(x^0)} of the {(m-1)^{th}} derivative of {p_{x_1}} and using the inverse function theorem), so by analytic continuation we conclude that for complex {z_1} near {x^0}, the polynomial {z_2 \mapsto p(z_1,z_2)} has a complex root {y_i(z_1)} near {y_i(x^0)} that depends complex analytically on {z_1}. Since the derivative of {y_i} at {x^0} is a positive real, we conclude that there are zeroes {(z_1,z_2)} of {p} near {(x^0,y_i(x^0))} with {\hbox{Im}(z_1), \hbox{Im}(z_2)} both strictly positive, contradicting stability, and the claim follows. \Box

Now we begin the proof of Theorem 7. We abbreviate {\frac{\partial}{\partial z_i}} as {\partial_i}. We start with Proposition 3 to write

\displaystyle  \mathop{\bf E} p_A(z) = (\prod_{i=1}^m (1 - \partial_i)) \hbox{det}( z + \sum_{i=1}^m z_i \mathop{\bf E} A_i ) |_{z_1=\ldots=z_m=0}.

From the hypothesis {\mathop{\bf E} A = 1}, we may rewrite

\displaystyle  z + \sum_{i=1}^m z_i \mathop{\bf E} A_i = \sum_{i=1}^m (z_i+z) \mathop{\bf E} A_i

and so after translating the {z_i} by {z}, we have

\displaystyle  \mathop{\bf E} p_A(z) = (\prod_{i=1}^m (1 - \partial_i)) \hbox{det}( \sum_{i=1}^m z_i \mathop{\bf E} A_i ) |_{z_1=\ldots=z_m=z}.

Replacing each {A_i} by its mean, we reduce to showing the following deterministic estimate:

Theorem 18 Let {A_1,\ldots,A_m} be Hermitian positive semi-definite {d \times d} matrices such that

\displaystyle  \sum_{i=1}^m A_i = I_d \ \ \ \ \ (6)

and such that

\displaystyle  \hbox{tr} A_i \leq \varepsilon \ \ \ \ \ (7)

for some {\varepsilon>0} and all {i=1,\ldots,m}. Let {p(z_1,\ldots,z_m)} be the polynomial

\displaystyle  p(z_1,\ldots,z_m) :=\hbox{det}( \sum_{i=1}^m z_i A_i ).

Then {((1+\sqrt{\varepsilon})^2, \ldots, (1+\sqrt{\varepsilon})^2)} lies above the roots of {(\prod_{i=1}^m (1 - \partial_i)) p}.

We will prove this theorem as follows. First observe that {(t,\ldots,t)} lies above the roots of {p} for any {t>0}, since {\sum_{i=1}^m x_i A_i - t} is positive semi-definite whenever {x_i \geq t}, and so {\sum_{i=1}^m x_i A_i} is non-singular in this region. We now wish to pass from {p} to {(\prod_{i=1}^m (1 - \partial_i)) p} by successive application of the derivative operators {1 - \partial_i}. An initial step in this direction is the following elementary consequence of the monotonicity properties from Lemma 17.

Lemma 19 Let {q(z_1,\ldots,z_m)} be a real stable polynomial, and suppose that {x = (x_1,\ldots,x_m)} lies above the roots of {q}. Suppose that {1 \leq j \leq m}, with the stability bound

\displaystyle  \Phi_q^j(x) < 1.

Then {x} also lies above the roots of {q-\partial_j q}.

Proof: By the monotonicity properties in Lemma 17, {\Phi_q^j(y) < 1} for all {y} above {x} (by which we mean that {y_k \geq x_k} for all {k}). On the other hand, as {q(y)} is non-zero in this region, {q(y) - \partial_j q(y)} can only vanish when {\Phi_q^j = \frac{\partial_j q}{q}} is equal to {1}, and so {x} lies above the roots of {q-\partial_j q} as claimed. \Box
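(Here is a one-variable numerical illustration of Lemma 19, as a sketch with an arbitrarily chosen real-rooted polynomial and barrier point:)

# If x lies above the roots of q and Phi_q(x) < 1, then x also lies above the
# roots of q - q' (illustrative sketch).
import numpy as np

roots = np.array([-2.0, 0.5, 1.0, 3.0])
q = np.poly(roots)                            # q(z) = prod (z - y_i)
dq = np.polyder(q)

x = 8.0
Phi = np.polyval(dq, x) / np.polyval(q, x)    # Phi_q(x) = q'(x)/q(x)
print(Phi < 1)                                # the barrier hypothesis holds

new_roots = np.roots(np.polysub(q, dq))       # roots of q - q'
print(new_roots.real.max() < x)               # x still lies above the roots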

Unfortunately, this lemma does not directly lend itself to iteration, because the lemma provides no control on {\Phi_{q -\partial_j q}}. To fix this problem, we need the following more complicated variant of Lemma 19:

Lemma 20 Let {q(z_1,\ldots,z_m)} be a real stable polynomial, and suppose that {x = (x_1,\ldots,x_m)} lies above the roots of {q}. Suppose that {1 \leq j \leq m}, with the stability bound

\displaystyle  \Phi_q^j(x) + \frac{1}{\delta}\leq 1 \ \ \ \ \ (8)

for some {\delta> 0}. Then {x+\delta e_j} lies above the roots of {q - \partial_j q}, and furthermore one has the monotonicity

\displaystyle  \Phi_{q-\partial_j q}^i( x + \delta e_j) \leq \Phi_q^i(x) \ \ \ \ \ (9)

for all {1 \leq i \leq m}.

Proof: From Lemma 19 we already know that {x} lies above the roots of {q-\partial_j q}, and so {x+\delta e_j} does also. Now we prove (9). Above the roots of {q}, we factorise

\displaystyle  q - \partial_j q = q (1 - \Phi_q^j)

and thus

\displaystyle  \log(q - \partial_j q) = \log q + \log(1 - \Phi_q^j).

Applying {\partial_i}, we conclude that

\displaystyle  \Phi_{q-\partial_j q}^i = \Phi_q^i - \frac{\partial_i \Phi_q^j}{1 - \Phi_q^j}.

The claim (9) is then equivalent to

\displaystyle  - \frac{\partial_i \Phi_q^j(x+\delta e_j)}{1 - \Phi_q^j(x+\delta e_j)} \leq \Phi_q^i(x) - \Phi_q^i(x+\delta e_j).

On the other hand, by Lemma 17, {t \mapsto \Phi_q^i(x+te_j)} is convex for {t \geq 0}, and thus

\displaystyle  \Phi_q^i(x) - \Phi_q^i(x+\delta e_j) \geq - \delta \partial_j \Phi_q^i(x + \delta e_j)

But

\displaystyle \partial_j \Phi_q^i = \partial_j \partial_i \log q = \partial_i \partial_j \log q = \partial_i \Phi_q^j

and this quantity is non-positive at {x+\delta e_j} thanks to Lemma 17. Thus it suffices to show that

\displaystyle  \frac{1}{1 - \Phi_q^j(x+\delta e_j)} \leq \delta.

But by Lemma 17 and (8) one has

\displaystyle  \Phi_q^j(x+\delta e_j) \leq \Phi_q^j(x) \leq 1- \frac{1}{\delta}

and the claim follows. \Box

We can iterate this to obtain

Corollary 21 Let {q(z_1,\ldots,z_m)} be a real stable polynomial, and suppose that {x = (x_1,\ldots,x_m)} lies above the roots of {q}. Suppose that

\displaystyle  \Phi_q^j(x) + \frac{1}{\delta}\leq 1 \ \ \ \ \ (10)

for some {\delta> 0} and all {1 \leq j \leq m}. Then {x+(\delta,\ldots,\delta)} lies above the roots of {(\prod_{i=1}^m (1-\partial_i)) q}.

Proof: For each {0 \leq k \leq m}, set {x_k := x+\delta(e_1+\ldots+e_k)} and {q_k := (\prod_{i=1}^k (1-\partial_i)) q}. By Lemma 20 and induction we see that for each {0 \leq k \leq m}, {x_k} lies above the roots of {q_k} with

\displaystyle  \Phi_{q_k}^j(x_k) + \frac{1}{\delta}\leq 1

for all {j=1,\ldots,m}. Setting {k=m}, we obtain the claim. \Box

Finally, we prove Theorem 18. By definition, we have

\displaystyle  \log p( x_1,\ldots,x_m ) = \log \hbox{det} ( \sum_{i=1}^m x_i A_i )

and hence on taking {\partial_j} and using the Jacobi formula we conclude that

\displaystyle  \Phi_p^j( x_1,\ldots,x_m) = \hbox{tr}( A_j (\sum_{i=1}^m x_i A_i)^{-1} ).

In particular, for any {t>0}, we may use (6), (7) to conclude that

\displaystyle  \Phi_p^j(t,\ldots,t) \leq \varepsilon/t.

As remarked before, {(t,\ldots,t)} lies above the zeroes of {p}. Thus, if we choose {t,\delta>0} such that

\displaystyle  \frac{\varepsilon}{t} + \frac{1}{\delta} \leq 1 \ \ \ \ \ (11)

then by Corollary 21, we conclude that {(t+\delta,\ldots,t+\delta)} lies above the zeroes of {(\prod_{i=1}^m (1 - \partial_i)) p}. To optimise {t+\delta} subject to the constraint (11), we set {t := \sqrt{\varepsilon}+\varepsilon} and {\delta := 1+\sqrt{\varepsilon}}, and Theorem 18 follows.
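Indeed, with this choice of parameters one has

\displaystyle  \frac{\varepsilon}{t} + \frac{1}{\delta} = \frac{\varepsilon}{\sqrt{\varepsilon}+\varepsilon} + \frac{1}{1+\sqrt{\varepsilon}} = \frac{\sqrt{\varepsilon}}{1+\sqrt{\varepsilon}} + \frac{1}{1+\sqrt{\varepsilon}} = 1

and

\displaystyle  t + \delta = \varepsilon + 2\sqrt{\varepsilon} + 1 = (1+\sqrt{\varepsilon})^2.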

— 4. Weaver, Akemann-Anderson, and paving —

Theorem 5 is probabilistic in nature, but by specialising to a specific type of random vector (and by taking advantage of the freedom to enlarge the vector space {{\bf C}^d}, since the bounds are independent of the dimension {d}), we obtain the following deterministic consequence:

Theorem 22 (Weaver-type bound) Let {m,r,d \geq 1} be integers and {A,B>0} be real numbers. Suppose that {w_1,\ldots,w_m \in {\bf C}^d} are such that {\|w_i\|^2 \leq A} for all {i} and

\displaystyle  \sum_{i=1}^m |\langle u, w_i \rangle|^2 = B \ \ \ \ \ (12)

for every unit vector {u \in{\bf C}^d} (i.e., {w_1,\ldots,w_m} is a {B}-tight frame). Then there exists a partition {\{1,\ldots,m\} = S_1 \cup \ldots \cup S_r} into {r} components such that

\displaystyle  \sum_{i \in S_j} |\langle u, w_i \rangle|^2 \leq A + 2 r^{-1/2} A^{1/2} B^{1/2} + r^{-1} B \ \ \ \ \ (13)

for {j=1,\ldots,r} and every unit vector {u \in {\bf C}^d}.

Proof: Let notation and the hypotheses be as in the theorem. Define the independent random vectors {v_1,\ldots,v_m \in {\bf C}^{d} \otimes {\bf C}^r} in the {dr}-dimensional tensor product {{\bf C}^d \otimes {\bf C}^r} of {{\bf C}^d} and {{\bf C}^r} by setting

\displaystyle  v_i := \frac{\sqrt{r}}{\sqrt{B}} w_i \otimes e_{j_i} \ \ \ \ \ (14)

for each {i=1,\ldots,m}, where {j_1,\ldots,j_m} are sampled uniformly and independently from {\{1,\ldots,r\}}. From (12) one has

\displaystyle  \sum_{i=1}^m w_i w_i^* = B

so a brief calculation shows that

\displaystyle  \mathop{\bf E} \sum_{i=1}^m v_i v_i^* = 1.

(Here {1} is the identity transformation on {{\bf C}^d \otimes {\bf C}^r}.) Also, we have
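Indeed, since each {j_i} is uniformly distributed on {\{1,\ldots,r\}}, one has {\mathop{\bf E} e_{j_i} e_{j_i}^* = \frac{1}{r} I_r}, and hence

\displaystyle  \mathop{\bf E} v_i v_i^* = \frac{r}{B} w_i w_i^* \otimes \mathop{\bf E} e_{j_i} e_{j_i}^* = \frac{1}{B} w_i w_i^* \otimes I_r;

summing over {i} and using {\sum_{i=1}^m w_i w_i^* = B} gives the claim.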

\displaystyle  \mathop{\bf E} \|v_i\|^2 \leq \frac{rA}{B}

for all {i}. Applying Theorem 5, we conclude that

\displaystyle  \| \sum_{i=1}^m v_i v_i^* \|_{op} \leq (1 + \sqrt{\frac{rA}{B}})^2

with positive probability. In particular, this occurs for at least one realisation of the {v_i}. Setting {S_j} to be those {i} for which {j_i=j} for each {j=1,\ldots,r}, we conclude that

\displaystyle  \| \sum_{i \in S_j} \frac{r}{B} w_i w_i^* \|_{op} \leq (1 + \sqrt{\frac{rA}{B}})^2

for {j=1,\dots,r}; testing this against a unit vector {u} we conclude that

\displaystyle  \sum_{i \in S_j} |\langle w_i,u\rangle|^2 \leq \frac{B}{r} (1 + \sqrt{\frac{rA}{B}})^2,

and the claim follows after a little arithmetic. \Box

For a given {r \geq 2}, the Weaver conjecture {KS_r} asserts, for {A=1}, the existence of a {B} such that the above theorem holds with the upper bound in (13) replaced by {B-\delta} for some absolute constant {\delta>0}. This is easily seen to follow from (13) by taking {B} large enough. For instance (as noted by Marcus, Spielman, and Srivastava), if one takes {r=2, A=1, B=18} then the upper bound in (13) is {16}, giving the case {KS_2} of Weaver’s conjecture, which was already known to imply the Kadison-Singer conjecture (see e.g. this paper of Bice for an ultrafilter-based proof of this fact). Note also that in the large {r} limit (keeping {A} fixed), the bound in (13) approaches {A}, which is the optimal value (as can be seen by considering the case where each {w_i} has the maximal magnitude of {A^{1/2}}); this will be important later.
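For instance, in the case {r=2, A=1, B=18} mentioned above, the right-hand side of (13) evaluates to

\displaystyle  1 + 2 \cdot 2^{-1/2} \cdot \sqrt{18} + \frac{18}{2} = 1 + 6 + 9 = 16.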

We can dualise Theorem 22 as follows, giving a bound of Akemann-Anderson type:

Corollary 23 (Paving conjecture for projections) Let {m \geq 1}, and let {P = (p_{ij})_{1\leq i,j \leq m}} be an {m \times m} orthogonal projection matrix, that is to say an idempotent Hermitian matrix. Suppose that we have the bound {p_{ii} \leq \varepsilon} for all {1 \leq i\leq m} and some {\varepsilon>0}. Then, for any {r \geq 1}, there exists a partition {1 = Q_1 + \ldots + Q_r} of the identity into diagonal projections {Q_1,\ldots,Q_r} (i.e. the {Q_1,\ldots,Q_r} are diagonal matrices with {0}s and {1}s on the diagonal, that sum to the identity matrix) such that

\displaystyle  \| Q_j PQ_j\|_{op} \leq \varepsilon + 2r^{-1/2} \sqrt{\varepsilon} + r^{-1}

for all {j=1,\ldots,r}.

Proof: Let {V} be the range of {P}, then {P e_1,\ldots, P e_m} are vectors in {V} with

\displaystyle  \| P e_i \|^2 = \langle P e_i, e_i \rangle = p_{ii} \leq \varepsilon.

Also, for any unit vector {u \in V}, we have

\displaystyle  \sum_{i=1}^m |\langle u, Pe_i \rangle|^2 = \sum_{i=1}^m |\langle Pu, e_i \rangle|^2

\displaystyle  = \|Pu \|^2

\displaystyle  = \|u\|^2

\displaystyle  = 1.

Applying Theorem 22 with {w_i := Pe_i}, {A := \varepsilon}, {B := 1}, we can find a partition {\{1,\ldots,m\} = S_1 \cup \ldots \cup S_r} such that

\displaystyle  \sum_{i \in S_j} |\langle u, Pe_i \rangle|^2 \leq \varepsilon + 2r^{-1/2} \sqrt{\varepsilon} + r^{-1}

for all unit vectors {u \in V} and {j=1,\ldots,r}. If we let {Q_j} be the diagonal projection {Q_j = \sum_{i \in S_j} e_ie_i^*}, then

\displaystyle  \sum_{i \in S_j} |\langle u, Pe_i \rangle|^2 = \sum_{i \in S_j} |\langle Pu,e_i \rangle|^2

\displaystyle  = \sum_{i \in S_j} |\langle u, e_i \rangle|^2

\displaystyle  = \| Q_ju \|^2

and so

\displaystyle  \|Q_ju \| \leq (\varepsilon + 2r^{-1/2} \sqrt{\varepsilon} + r^{-1})^{1/2} \|u\|

for all {u} in {V}. Thus

\displaystyle  \|Q_jP \|_{op} \leq (\varepsilon + 2r^{-1/2} \sqrt{\varepsilon} + r^{-1})^{1/2}

and thus on squaring

\displaystyle  \|Q_j P Q_j\|_{op} \leq \varepsilon + 2r^{-1/2} \sqrt{\varepsilon} + r^{-1}

as desired. \Box

This is a special case of the Kadison-Singer paving conjecture, but we may use a number of standard maneuvers to amplify this to the full paving conjecture, loosely following the papers of Popa and of Akemann-Weaver. First, by using the “dilation trick”, we may relax the hypothesis of being a projection:

Corollary 24 (Paving conjecture for positive semi-definite matrices) Let {m \geq 1}, and let {P = (p_{ij})_{1\leq i,j \leq m}} be a Hermitian positive semi-definite {m \times m} matrix with the bound {\|P\|_{op} \leq A} for some {A>0}. Suppose that we have the bound {p_{ii} \leq \varepsilon} for all {1 \leq i\leq m} and some {\varepsilon>0}. Then, for any {r \geq 1}, there exists a partition {1 = Q_1 + \ldots + Q_r} of the identity into diagonal projections {Q_1,\ldots,Q_r} such that

\displaystyle  \| Q_j PQ_j\|_{op} \leq \varepsilon + 2r^{-1/2} \sqrt{\varepsilon A} + r^{-1} A

for all {j=1,\ldots,r}.

Proof: By scaling we may set {A=1}. The matrix {P} is no longer a projection matrix, but by the spectral theorem and the hypotheses on {P}, we still have a decomposition of the form

\displaystyle  P = \sum_{i=1}^m \lambda_i u_i u_i^*

where {u_1,\ldots,u_m} are an orthonormal basis for {{\bf C}^m}, and {0 \leq \lambda_i \leq 1} for all {i}.

Now we embed {{\bf C}^m} in {{\bf C}^m \oplus {\bf C}^M} for some large {M}. If {M} is large enough (depending on {m,\varepsilon}), we may find orthonormal vectors {v_1,\ldots,v_m \in {\bf C}^M} with disjoint supports, such that each of the {v_j} has coefficients of magnitude at most {\varepsilon^{1/2}}. We may then form the dilated vectors

\displaystyle  \tilde u_i := \sqrt{\lambda_i} u_i \oplus \sqrt{1-\lambda_i} v_i

which are an orthonormal system in {{\bf C}^m \oplus {\bf C}^M}, and the dilated operator

\displaystyle  P' := \sum_{i=1}^m \tilde u_i \tilde u_i^*

which is an orthogonal projection on {{\bf C}^m \oplus {\bf C}^M}. This can be viewed as an {(m+M) \times (m+M)} matrix, whose top left {m \times m} minor is {P}. By construction, all the diagonal entries of this matrix are at most {\varepsilon}. Applying Corollary 23, we may find a partition {1 = Q'_1 + \ldots + Q'_r} of the identity matrix on {{\bf C}^m \oplus {\bf C}^M} into diagonal projections {Q'_1,\ldots,Q'_r} such that

\displaystyle  \| Q'_j P' Q'_j \|_{op} \leq \varepsilon + 2r^{-1/2} \sqrt{\varepsilon} + r^{-1}

for all {j=1,\ldots,r}. If we let {Q_j} be the component of {Q'_j} that maps {{\bf C}^m} to {{\bf C}^m} (i.e. the top left {m \times m} minor), then the {Q_1,\ldots,Q_r} are diagonal {m \times m} matrices summing to the identity, and we have

\displaystyle  \| Q_j P Q_j \|_{op} \leq \varepsilon + 2r^{-1/2} \sqrt{\varepsilon} + r^{-1}

for all {j=1,\ldots,r}, as required. \Box
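(The dilation used in this proof is easy to experiment with numerically. The sketch below makes one concrete, non-canonical choice of the auxiliary vectors {v_i}, namely normalised indicator vectors of disjoint blocks of length {\lceil 1/\varepsilon \rceil}, and uses a random matrix {P} as test data.)

# Dilating a positive semi-definite contraction P with small diagonal into an
# orthogonal projection P' with the same properties (illustrative sketch).
import numpy as np

rng = np.random.default_rng(1)
m, eps = 6, 0.4

# random Hermitian positive semi-definite test matrix with ||P||_op <= 1,
# rescaled so that its diagonal entries are at most eps
X = rng.normal(size=(m, m)) + 1j * rng.normal(size=(m, m))
P = X @ X.conj().T
P = P / np.linalg.norm(P, 2)
P = P * min(1.0, eps / P.diagonal().real.max())

lam, U = np.linalg.eigh(P)                 # P = sum_i lam_i u_i u_i^*
lam = np.clip(lam, 0.0, 1.0)               # guard against rounding outside [0,1]
block = int(np.ceil(1 / eps))              # each v_i lives on its own block
M = m * block
V = np.zeros((M, m))
for i in range(m):
    V[i * block:(i + 1) * block, i] = 1 / np.sqrt(block)  # coefficients <= sqrt(eps)

# dilated orthonormal system and dilated projection P'
Util = np.vstack([U * np.sqrt(lam), V * np.sqrt(1 - lam)])
Pp = Util @ Util.conj().T

print(np.allclose(Pp @ Pp, Pp))                  # P' is an orthogonal projection
print(np.allclose(Pp[:m, :m], P))                # top left m x m block is P
print(Pp.diagonal().real.max() <= eps + 1e-9)    # diagonal entries at most eps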

Now we shift by a multiple of the identity:

Corollary 25 (One-sided paving for Hermitian matrices) Let {m \geq 1}, and let {P = (p_{ij})_{1\leq i,j \leq m}} be a Hermitian {m \times m} matrix with vanishing diagonal, whose eigenvalues lie between {-A} and {B} for some {A,B>0}. Then, for any {r \geq 1}, there exists a partition {1 = Q_1 + \ldots + Q_r} of the identity into diagonal projections {Q_1,\ldots,Q_r} such that for each {1 \leq j \leq r}, the eigenvalues of {Q_j P Q_j} lie between {-A} and {2 r^{-1/2} \sqrt{A(A+B)} + r^{-1} (A+B)}.

Proof: Apply Corollary 24 with {P} replaced by {P+A}, {A} replaced by {A+B}, and {\varepsilon} set equal to {A}. \Box

We combine this result with its reflection:

Corollary 26 (Paving conjecture for Hermitian matrices) Let {m \geq 1}, and let {P = (p_{ij})_{1\leq i,j \leq m}} be a Hermitian {m \times m} matrix with vanishing diagonal, with {\|P\|_{op} \leq A} for some {A>0}. Then, for any {r \geq 1}, there exists a partition {1 = Q_1 + \ldots + Q_{r^2}} of the identity into diagonal projections {Q_1,\ldots,Q_{r^2}} such that

\displaystyle  \|Q_j P Q_j \|_{op} \leq 2 \sqrt{2} r^{-1/2} A + 2 r^{-1} A

for all {j=1,\ldots,r^2}.

Proof: The eigenvalues of {P} lie between {-A} and {A}. By Corollary 25, we may find a partition {1 = Q_1 + \ldots + Q_r} into diagonal projections such that each {Q_j P Q_j} has eigenvalues between {-A} and {2 \sqrt{2} r^{-1/2} A + 2 r^{-1} A}; of course these matrices also have eigenvalues between {-A} and {A} and have zero diagonal. Thus, for each {j}, we may apply Corollary 25 to {-Q_jPQ_j} to find a further splitting {1 = Q_{j,1} + \ldots + Q_{j,r}} into diagonal projections such that the eigenvalues of {-Q_{j,j'} Q_j P Q_j Q_{j,j'}} lie between {-A} and {2 \sqrt{2} r^{-1/2} A + 2 r^{-1} A}. The eigenvalues of {Q_{j,j'} Q_j P Q_j Q_{j,j'}} are then bounded in magnitude by {2 \sqrt{2} r^{-1/2} A + 2 r^{-1} A}, and the claim follows by using the partition {1 = \sum_{j=1}^r \sum_{j'=1}^r Q_{j,j'} Q_j}. \Box

Finally, we remove the Hermitian condition:

Corollary 27 (Paving conjecture for finite matrices) Let {m \geq 1}, and let {P = (p_{ij})_{1\leq i,j \leq m}} be a matrix with vanishing diagonal, with {\|P\|_{op} \leq A} for some {A>0}. Then, for any {r \geq 1}, there exists a partition {1 = Q_1 + \ldots + Q_{r^4}} of the identity into diagonal projections {Q_1,\ldots,Q_{r^4}} such that

\displaystyle  \|Q_j P Q_j \|_{op} \leq 4 \sqrt{2} r^{-1/2} A + 4 r^{-1} A

for all {j=1,\ldots,r^4}.

Proof: By applying the preceding corollary to the Hermitian matrices {P+P^*} and {i(P-P^*)}, which both have operator norm at most {2A}, we may find partitions {1 = Q_1 + \ldots + Q_{r^2}} and {1= Q'_1 + \ldots + Q'_{r^2}} into diagonal projections such that

\displaystyle  \|Q_j (P+P^*) Q_j \|_{op} \leq 4 \sqrt{2} r^{-1/2} A + 4 r^{-1} A

and

\displaystyle  \|Q'_{j'} i(P-P^*) Q'_{j'} \|_{op} \leq 4 \sqrt{2} r^{-1/2} A + 4 r^{-1} A

for all {1 \leq j,j' \leq r^2}. In particular,

\displaystyle  \|Q'_{j'} Q_j (P \pm P^*) Q_j Q'_{j'} \|_{op} \leq 4 \sqrt{2} r^{-1/2} A + 4 r^{-1} A

(note that {Q_j, Q'_{j'}} commute). Since {P = \frac{1}{2}(P+P^*) + \frac{1}{2}(P-P^*)}, the claim then follows from the triangle inequality. \Box

Because the bound here is independent of the dimension {m}, we may use a standard compactness argument to handle infinite matrices:

Corollary 28 (Paving conjecture for infinite matrices) Let {P = (p_{ij})_{i,j > 0}} be an infinite matrix with vanishing diagonal, with {\|P\|_{op} \leq A} for some {A>0}. Then, for any {r \geq 1}, there exists a partition {1 = Q_1 + \ldots + Q_{r^4}} of the identity into diagonal projections {Q_1,\ldots,Q_{r^4}} such that

\displaystyle  \|Q_j P Q_j \|_{op} \leq 4 \sqrt{2} r^{-1/2} A + 4 r^{-1} A

for all {j=1,\ldots,r^4}.

Proof: For each {m\geq 1}, let {P^{(m)} := (p_{ij} 1_{i \leq m} 1_{j \leq m})_{i,j>0}} be the top {m \times m} minor of {P}, extended by zero to an infinite matrix. This matrix has zeroes on the diagonal and has operator norm at most {A}. Applying Corollary 27 to the {m\times m} minor of {P} and then extending, we may find a partition {1 = \sum_{j=1}^{r^4} Q^{(m)}_j} into diagonal projections {Q^{(m)}_1,\ldots,Q^{(m)}_{r^4}} such that

\displaystyle  \|Q_j^{(m)} P^{(m)} Q_j^{(m)} \|_{op} \leq 4 \sqrt{2} r^{-1/2} A + 4 r^{-1} A \ \ \ \ \ (15)

for all {j=1,\ldots,r^4}.

By the Arzela-Ascoli theorem (or the sequential Banach-Alaoglu theorem), we may pass to a subsequence {m_l} such that each of the {Q^{(m_l)}_j} converges pointwise to a limit {Q_j} (i.e. each coefficient of the matrix {Q^{(m_l)}_j} converges as {l \rightarrow \infty} to the corresponding coefficient of {Q_j}). As the {Q^{(m_l)}_j} are diagonal projections, the {Q_j} are as well; since {1 = \sum_{j=1}^{r^4} Q^{(m_l)}_j}, we see on taking pointwise limits that {1 = \sum_{j=1}^{r^4} Q_j}.

Now let {u, v} be unit vectors in {\ell^2({\bf N})} with finite support. Then {Q_j^{(m_l)} u = Q_j u} for sufficiently large {l}, and similarly {Q_j^{(m_l)} v = Q_j v}. As {Q_j u} and {Q_j v} have finite support, we also have {P^{(m_l)} Q_j u = P Q_j u} on the support of {Q_j v} for sufficiently large {l}, and hence

\displaystyle  \langle P^{(m_l)} Q_j^{(m_l)} u, Q_j^{(m_l)} v \rangle = \langle P Q_j u, Q_j v \rangle = \langle Q_j P Q_j u, v \rangle

for sufficiently large {l}. By (15) and the Cauchy-Schwarz inequality, we conclude that

\displaystyle  |\langle Q_j P Q_j u, v \rangle| \leq 4 \sqrt{2} r^{-1/2} A + 4 r^{-1} A;

taking suprema over all unit vectors {u,v} we obtain

\displaystyle  \|Q_j P Q_j \|_{op} \leq 4 \sqrt{2} r^{-1/2} A + 4 r^{-1} A

for all {1 \leq j \leq r^4} as required. \Box

— 5. The Kadison-Singer problem —

To turn to the original formulation of the Kadison-Singer problem, we need some terminology from {C^*}-algebras. We will work with two specific {C^*} algebras: the space {B( \ell^2( {\bf N} ) )} of bounded linear operators on {\ell^2({\bf N})} (or equivalently, infinite matrices with bounded operator norm); and the subspace {A \subset B( \ell^2( {\bf N} ) )} of diagonal matrices in {B(\ell^2({\bf N}))}. These are both examples of {C^*}-algebras, but we will not need the specific definition of a {C^*}-algebra in what follows.

A state {\phi: B(\ell^2({\bf N})) \rightarrow {\bf C}} on {B(\ell^2({\bf N}))} is a bounded linear functional on {B(\ell^2({\bf N}))} with {\phi(1)=1}, such that {\phi(Q) \geq 0} for all positive semi-definite {Q \in B(\ell^2({\bf N}))}. A state {\phi: B(\ell^2({\bf N})) \rightarrow {\bf C}} is pure if it cannot be represented as a convex combination of two other states. Define a state {\phi: A \rightarrow {\bf C}} (and a pure state {\phi: A \rightarrow {\bf C}}) similarly.

The Kadison-Singer problem, now solved in the affirmative by Marcus, Spielman, and Srivastava, states the following:

Theorem 29 (Kadison-Singer problem) Let {\phi: A \rightarrow {\bf C}} be a pure state. Then there is a unique state {\psi: B(\ell^2({\bf N})) \rightarrow {\bf C}} that extends {\phi}. (In particular, this state is automatically pure).

Existence of {\psi} is easy: set {\psi(Q):= \phi(\hbox{diag}(Q))}, where {\hbox{diag}(Q)} is the matrix {Q} with all off-diagonal entries set to zero. The hard part is uniqueness. To solve this problem, let us first understand the structure of pure states on {A}.

Let {\phi: A \rightarrow {\bf C}} be a pure state, and let {P \in A} be a diagonal projection matrix. Since {P} and {1-P} are both positive semi-definite, we have {\phi(P), \phi(1-P) \geq 0}, and hence {0 \leq \phi(P) \leq 1}. We in fact claim that either {\phi(P)=0} or {\phi(P)=1}. Indeed, suppose that {\phi(P) = \theta} for some {0< \theta < 1}. Then we can write {\phi = (1-\theta) \phi_0 + \theta \phi_1}, where {\phi_1(Q) := \frac{1}{\theta} \phi(PQ)} and {\phi_0(Q) := \frac{1}{1-\theta} \phi((1-P)Q)}. It is easy to see that {\phi_0,\phi_1} are both states on {A}; since {\phi_1(P) = 1 \neq \theta = \phi(P)}, these states are not both equal to {\phi}, contradicting the hypothesis that {\phi} is pure.

Remark 30 With only a little more effort, this argument shows that the pure states of {A} are in one-to-one correspondence with the ultrafilters of {{\bf N}}, but we will not need to use the finer properties of ultrafilters here.

Now we show uniqueness. Let {\psi: B(\ell^2({\bf N})) \rightarrow {\bf C}} be a state extending {\phi}. Let {Q \in B(\ell^2({\bf N}))}; our task is to show that {\psi(Q) = \phi( \hbox{diag}(Q))}. By subtracting off {\hbox{diag}(Q)}, we may assume that {Q} has zero diagonal. Our task is now to show that {\psi(Q)=0}.

Let {\varepsilon > 0}. By Corollary 28 (with {r} sufficiently large), we may find a partition {1 = P_1 +\ldots + P_s} into diagonal projections {P_1,\ldots,P_s} such that

\displaystyle  \|P_j Q P_j \|_{op} \leq \varepsilon \ \ \ \ \ (16)

for all {j=1,\ldots,s}.

Recall that {\psi(P_j) = \phi(P_j)} is either {0} or {1}; since {1 = \phi(1) = \sum_j \phi(P_j)}, we have {\phi(P_j)=1} for exactly one {j}, say {j=j_0}, and {\phi(P_j)=0} for all other {j}.

Suppose that {1 \leq j \leq s} is such that {j \neq j_0}. Since {\langle A, B \rangle:= \psi(AB^*)} is a Hermitian semi-definite inner product on {B(\ell^2({\bf N}))}, we have the Cauchy-Schwarz inequality

\displaystyle  |\psi(P_j^* R)| \leq \psi(P_j^* P_j)^{1/2} \psi(R^* R)^{1/2};

since {\psi(P_j^* P_j)=\psi(P_j) = 0}, we conclude that {\psi(P_j R)=0} for all {j \neq j_0} and {R \in B(\ell^2({\bf N}))}; similarly {\psi(R P_j)=0}. It follows that {\psi(Q) = \psi(P_{j_0} Q P_{j_0})}, and hence by (16) we have

\displaystyle  |\psi(Q)| \leq \|\psi\| \varepsilon

where {\|\psi\|} is the operator norm of {\psi}. Since {\varepsilon} may be arbitrarily small, we conclude that {\psi(Q)=0} as required.