
I have just uploaded to the arXiv my paper “Commutators close to the identity”, submitted to the Journal of Operator Theory. This paper resulted from some progress I made on the problem discussed in this previous post. Recall from that post the following result of Popa: if ${D,X \in B(H)}$ are bounded operators on a Hilbert space ${H}$ whose commutator ${[D,X] := DX-XD}$ is close to the identity in the sense that

$\displaystyle \| [D,X] - I \|_{op} \leq \varepsilon \ \ \ \ \ (1)$

for some ${\varepsilon > 0}$, then one has the lower bound

$\displaystyle \| X \|_{op} \|D \|_{op} \geq \frac{1}{2} \log \frac{1}{\varepsilon}. \ \ \ \ \ (2)$

In the other direction, for any ${0 < \varepsilon < 1}$, there are examples of operators ${D,X \in B(H)}$ obeying (1) such that

$\displaystyle \| X \|_{op} \|D \|_{op} \ll \varepsilon^{-2}. \ \ \ \ \ (3)$

In this paper we improve the upper bound to come closer to the lower bound:

Theorem 1 For any ${0 < \varepsilon < 1/2}$, and any infinite-dimensional ${H}$, there exist operators ${D,X \in B(H)}$ obeying (1) such that

$\displaystyle \| X \|_{op} \|D \|_{op} \ll \log^{16} \frac{1}{\varepsilon}. \ \ \ \ \ (4)$

One can probably improve the exponent ${16}$ somewhat by a modification of the methods, though it does not seem likely that one can lower it all the way to ${1}$ without a substantially new idea. Nevertheless I believe it plausible that the lower bound (2) is close to optimal.

We now sketch the methods of proof. The construction giving (3) proceeded by first identifying ${B(H)}$ with the algebra ${M_2(B(H))}$ of ${2 \times 2}$ matrices that have entries in ${B(H)}$. It is then possible to find two matrices ${D, X \in M_2(B(H))}$ whose commutator takes the form

$\displaystyle [D,X] = \begin{pmatrix} I & u \\ 0 & I \end{pmatrix}$

for some bounded operator ${u \in B(H)}$ (for instance one can take ${u}$ to be an isometry). If one then conjugates ${D, X}$ by the diagonal operator ${\mathrm{diag}(\varepsilon,1)}$, one can ensure that (1) and (3) both hold.

It is natural to adapt this strategy to ${n \times n}$ matrices ${D,X \in M_n(B(H))}$ rather than ${2 \times 2}$ matrices, where ${n}$ is a parameter at one’s disposal. If one can find matrices ${D,X \in M_n(B(H))}$ that are almost upper triangular (in that only the entries on or above the lower diagonal are non-zero), whose commutator ${[D,X]}$ only differs from the identity in the top right corner, thus

$\displaystyle [D, X] = \begin{pmatrix} I & 0 & 0 & \dots & 0 & S \\ 0 & I & 0 & \dots & 0 & 0 \\ 0 & 0 & I & \dots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \dots & I & 0 \\ 0 & 0 & 0 & \dots & 0 & I \end{pmatrix}$

for some ${S}$, then by conjugating by a diagonal matrix such as ${\mathrm{diag}( \mu^{n-1}, \mu^{n-2}, \dots, 1)}$ for some ${\mu}$ and optimising in ${\mu}$, one can improve the bound ${\varepsilon^{-2}}$ in (3) to ${O_n( \varepsilon^{-\frac{2}{n-1}} )}$; if the bounds in the implied constant in the ${O_n(1)}$ are polynomial in ${n}$, one can then optimise in ${n}$ to obtain a bound of the form (4) (perhaps with the exponent ${16}$ replaced by a different constant).
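To see how the final optimisation in ${n}$ might play out, here is a minimal numerical sketch. It assumes, purely for illustration, that the implied constant is exactly ${n^C}$ for a hypothetical exponent `C` (the default value below is not taken from the paper), so that the bound reads ${n^C \varepsilon^{-2/(n-1)}}$; the helper name `best_n` is likewise invented here.

```python
import math

def best_n(eps, C=8.0, n_max=2000):
    """Minimise log(n**C * eps**(-2/(n-1))) over integers 2 <= n < n_max.

    C is a hypothetical exponent standing in for the polynomial loss in n.
    """
    def log_bound(n):
        return C * math.log(n) + (2.0 / (n - 1)) * math.log(1.0 / eps)
    n = min(range(2, n_max), key=log_bound)
    return n, log_bound(n)

for eps in (1e-6, 1e-12, 1e-30):
    n, log_b = best_n(eps)
    log_two_by_two = 2.0 * math.log(1.0 / eps)  # log of the eps**-2 bound in (3)
    print(f"eps={eps:g}: best n={n}, log(bound)={log_b:.1f}, "
          f"log(eps^-2)={log_two_by_two:.1f}")
```

Under this assumed model the optimal ${n}$ grows roughly linearly in ${\log \frac{1}{\varepsilon}}$, and the optimised bound is polylogarithmic in ${\frac{1}{\varepsilon}}$, which only beats ${\varepsilon^{-2}}$ once ${\varepsilon}$ is sufficiently small.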

The task is then to find almost upper triangular matrices ${D, X}$ whose commutator takes the required form. The lower diagonals of ${D,X}$ must then commute; it took me a while to realise then that one could (usually) conjugate one of the matrices, say ${X}$ by a suitable diagonal matrix, so that the lower diagonal consisted entirely of the identity operator, which would make the other lower diagonal consist of a single operator, say ${u}$. After a lot of further lengthy experimentation, I eventually realised that one could conjugate ${X}$ further by unipotent upper triangular matrices so that all remaining entries other than those on the far right column vanished. Thus, without too much loss of generality, one can assume that ${X}$ takes the normal form

$\displaystyle X := \begin{pmatrix} 0 & 0 & 0 & \dots & 0 & b_1 \\ I & 0 & 0 & \dots & 0 & b_2 \\ 0 & I & 0 & \dots & 0 & b_3 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \dots & 0 & b_{n-1} \\ 0 & 0 & 0 & \dots & I & b_n \end{pmatrix}.$

$\displaystyle D := \begin{pmatrix} v & I & 0 & \dots & 0 & b_1 u \\ u & v & 2 I & \dots & 0 & b_2 u \\ 0 & u & v & \dots & 0 & b_3 u \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \dots & v & (n-1) I + b_{n-1} u \\ 0 & 0 & 0 & \dots & u & v + b_n u \end{pmatrix}$

for some ${u,v \in B(H)}$, solving the system of equations

$\displaystyle [v, b_i] + [u, b_{i-1}] + i b_{i+1} + b_i [u, b_n] = 0 \ \ \ \ \ (5)$

for ${i=2,\dots,n-1}$, and also

$\displaystyle [v, b_n] + [u, b_{n-1}] + b_n [u, b_n] = n \cdot 1_{B(H)}. \ \ \ \ \ (6)$

It turns out to be possible to solve this system of equations by a contraction mapping argument if one takes ${u,v}$ to be a “Hilbert’s hotel” pair of isometries as in the previous post, though the contraction is very slight, leading to polynomial losses in ${n}$ in the implied constant.

There is a further question raised in Popa’s paper which I was unable to resolve. As a special case of one of the main theorems (Theorem 2.1) of that paper, the following result was shown: if ${A \in B(H)}$ obeys the bounds

$\displaystyle \|A \| = O(1)$

and

$\displaystyle \| A \| = O( \mathrm{dist}( A, {\bf C} + K(H) )^{2/3} ) \ \ \ \ \ (7)$

(where ${{\bf C} + K(H)}$ denotes the space of all operators of the form ${\lambda I + T}$ with ${\lambda \in {\bf C}}$ and ${T}$ compact), then there exist operators ${D,X \in B(H)}$ with ${\|D\|, \|X\| = O(1)}$ such that ${A = [D,X]}$. (In fact, Popa’s result covers a more general situation in which one is working in a properly infinite ${W^*}$ algebra with non-trivial centre.) We sketch a proof of this result as follows. Suppose that ${\mathrm{dist}(A, {\bf C} + K(H)) = \varepsilon}$ and ${\|A\| = O( \varepsilon^{2/3})}$ for some ${0 < \varepsilon \ll 1}$. A standard greedy algorithm argument (see this paper of Brown and Pearcy) allows one to find orthonormal vectors ${e_n, f_n, g_n}$ for ${n=1,2,\dots}$ such that for each ${n}$, one has ${A e_n = \varepsilon_n f_n + v_n}$ for some ${\varepsilon_n}$ comparable to ${\varepsilon}$, and some ${v_n}$ orthogonal to all of the ${e_n,f_n,g_n}$. After some conjugation (and a suitable identification of ${B(H)}$ with ${M_2(B(H))}$), one can thus place ${A}$ in a normal form

$\displaystyle A = \begin{pmatrix} \varepsilon^{2/3} x & \varepsilon v^* \\ \varepsilon^{2/3} y & \varepsilon^{2/3} z \end{pmatrix}$

where ${v \in B(H)}$ is an isometry with infinite deficiency, and ${x,y,z \in B(H)}$ have norm ${O(1)}$. Setting ${\varepsilon' := \varepsilon^{1/3}}$, it then suffices to solve the commutator equation

$\displaystyle [D,X] = \begin{pmatrix} x & \varepsilon' v^* \\ y & z \end{pmatrix}$

with ${\|D\|_{op} \|X\|_{op} \ll (\varepsilon')^{-2}}$; note the similarity with (3).

By the usual Hilbert’s hotel construction, one can complement ${v}$ with another isometry ${u}$ obeying the “Hilbert’s hotel” identity

$\displaystyle uu^* + vv^* = I$

and also ${u^* u = v^* v = I}$, ${u^* v = v^* u = 0}$. Proceeding as in the previous post, we can try the ansatz

$\displaystyle D = \begin{pmatrix} \frac{1}{2} u^* & 0 \\ a & \frac{1}{2} u^* - v^* \end{pmatrix}, X = \begin{pmatrix} b & \varepsilon' I \\ c & d \end{pmatrix}$

for some operators ${a,b,c,d \in B(H)}$, leading to the system of equations

$\displaystyle [\frac{1}{2} u^*, b] + [\frac{1}{2} u^* - v^*, d] = x+z$

$\displaystyle \varepsilon' a = [\frac{1}{2} u^*, b] - x$

$\displaystyle (\frac{1}{2} u^* - v^*) c - \frac{1}{2} c u^* + ab-da = y.$

Using the first equation to solve for ${b,d}$, the second to then solve for ${a}$, and the third to then solve for ${c}$, one can obtain matrices ${D,X}$ with the required properties.

Thus far, my attempts to extend this construction to larger matrices with good bounds on ${D,X}$ have been unsuccessful. A model problem would be to express

$\displaystyle \begin{pmatrix} I & 0 & \varepsilon v^* \\ 0 & I & 0 \\ 0 & 0 & I \end{pmatrix}$

as a commutator ${[D,X]}$ with ${\|D\| \|X\|}$ significantly smaller than ${O(\varepsilon^{-2})}$. The construction in my paper achieves something like this, but with ${v^*}$ replaced by a more complicated operator. One would also need variants of this result in which one is allowed to perturb the above operator by an arbitrary finite rank operator of bounded operator norm.

I am recording here some notes on a nice problem that Sorin Popa shared with me recently. To motivate the question, we begin with the basic observation that the differentiation operator ${Df(x) := \frac{d}{dx} f(x)}$ and the position operator ${Xf(x) := xf(x)}$ in one dimension formally obey the commutator equation

$\displaystyle [D,X] = 1 \ \ \ \ \ (1)$

where ${1}$ is the identity operator and ${[D,X] := DX-XD}$ is the commutator. Among other things, this equation is fundamental in quantum mechanics, leading for instance to the Heisenberg uncertainty principle.

The operators ${D,X}$ are unbounded on spaces such as ${L^2({\bf R})}$. One can ask whether the commutator equation (1) can be solved using bounded operators ${D,X \in B(H)}$ on a Hilbert space ${H}$ rather than unbounded ones. In the finite dimensional case when ${D, X}$ are just ${n \times n}$ matrices for some ${n \geq 1}$, the answer is clearly negative, since the left-hand side of (1) has trace zero and the right-hand side does not. What about in infinite dimensions, when the trace is not available? As it turns out, the answer is still negative, as was first worked out by Wintner and Wielandt. A short proof can be given as follows. Suppose for contradiction that we can find bounded operators ${D, X}$ obeying (1). From (1) and an easy induction argument, we obtain the identity

$\displaystyle [D,X^n] = n X^{n-1} \ \ \ \ \ (2)$

for all natural numbers ${n}$. From the triangle inequality, this implies that

$\displaystyle n \| X^{n-1} \|_{op} \leq 2 \|D\|_{op} \| X^n \|_{op}.$

Iterating this, we conclude that

$\displaystyle \| X \|_{op} \leq \frac{(2 \|D\|_{op})^{n-1}}{n!} \|X^n \|_{op}$

for any ${n}$. Bounding ${\|X^n\|_{op} \leq \|X\|_{op}^n}$ and then sending ${n \rightarrow \infty}$, we conclude that ${\|X\|_{op}=0}$, which clearly contradicts (1). (Note the argument can be generalised without difficulty to the case when ${D,X}$ lie in a Banach algebra, rather than be bounded operators on a Hilbert space.)

It was observed by Popa that there is a quantitative version of this result:

Theorem 1 Let ${D, X \in B(H)}$ such that

$\displaystyle \| [D,X] - I \|_{op} \leq \varepsilon$

for some ${\varepsilon > 0}$. Then we have

$\displaystyle \| X \|_{op} \|D \|_{op} \geq \frac{1}{2} \log \frac{1}{\varepsilon}. \ \ \ \ \ (3)$

Proof: By multiplying ${D}$ by a suitable constant and dividing ${X}$ by the same constant, we may normalise ${\|D\|_{op}=1/2}$. Write ${DX - XD = 1 + E}$ with ${\|E\|_{op} \leq \varepsilon}$. Then the same induction that established (2) now shows that

$\displaystyle [D,X^n]= n X^{n-1} + X^{n-1} E + X^{n-2} E X + \dots + E X^{n-1}$

and hence by the triangle inequality

$\displaystyle n \| X^{n-1} \|_{op} \leq \| X^n \|_{op} + n \varepsilon \|X\|_{op}^{n-1}.$

We divide by ${n!}$ and sum to conclude that

$\displaystyle \sum_{n=0}^\infty \frac{\|X^n\|_{op}}{n!} \leq \sum_{n=1}^\infty \frac{\|X^n\|_{op}}{n!} + \varepsilon \exp( \|X\|_{op} )$

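Cancelling the terms common to both sides leaves ${1 \leq \varepsilon \exp( \|X\|_{op} )}$, that is, ${\|X\|_{op} \geq \log \frac{1}{\varepsilon}}$; since ${\|D\|_{op} = 1/2}$, this gives the claim.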
$\Box$

Again, the argument generalises easily to any Banach algebra. Popa then posed the question of whether the quantity ${\frac{1}{2} \log \frac{1}{\varepsilon}}$ can be replaced by any substantially larger function of ${\varepsilon}$, such as a polynomial in ${\frac{1}{\varepsilon}}$. As far as I know, the above simple bound has not been substantially improved.
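The induction used in the proof above is purely algebraic: for any ${D, X}$ one has ${[D,X^n] = \sum_{k=0}^{n-1} X^k [D,X] X^{n-1-k}}$, and substituting ${[D,X] = 1 + E}$ recovers the displayed formula. As a sanity check, the sketch below verifies this identity exactly on small integer matrices; the helper names and the sample matrices are illustrative choices, not part of the argument.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def power(X, k):
    R = identity(len(X))
    for _ in range(k):
        R = matmul(R, X)
    return R

def comm(A, B):
    """The commutator [A, B] = AB - BA."""
    return sub(matmul(A, B), matmul(B, A))

def leibniz_holds(D, X, n):
    """Check [D, X^n] == sum_k X^k [D, X] X^(n-1-k) in exact arithmetic."""
    C = comm(D, X)
    size = len(X)
    rhs = [[0] * size for _ in range(size)]
    for k in range(n):
        rhs = add(rhs, matmul(matmul(power(X, k), C), power(X, n - 1 - k)))
    return comm(D, power(X, n)) == rhs

# Arbitrary sample matrices with integer entries.
D = [[1, 2, 0], [0, -1, 3], [4, 0, 2]]
X = [[0, 1, 1], [2, 0, -1], [1, 3, 0]]
print(all(leibniz_holds(D, X, n) for n in range(1, 7)))
```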

In the opposite direction, one can ask for constructions of operators ${X,D}$ that are not too large in operator norm, such that ${[D,X]}$ is close to the identity. Again, one cannot do this in finite dimensions: ${[D,X]}$ has trace zero, so at least one of its eigenvalues must lie outside the disk ${\{ z: |z-1| < 1\}}$, and therefore ${\|[D,X]-1\|_{op} \geq 1}$ for any finite-dimensional ${n \times n}$ matrices ${X,D}$.
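The finite-dimensional obstruction is easy to test. The sketch below takes a slightly different route from the eigenvalue argument: since ${[D,X]}$ has trace zero, the diagonal entries of ${[D,X]-1}$ sum to ${-n}$, so some diagonal entry has magnitude at least ${1}$, and the operator norm dominates every diagonal entry. The random matrices and helper names are arbitrary illustrative choices, and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction
import random

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def commutator(D, X):
    DX, XD = matmul(D, X), matmul(X, D)
    n = len(D)
    return [[DX[i][j] - XD[i][j] for j in range(n)] for i in range(n)]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def max_diagonal_defect(D, X):
    """Largest |([D,X] - I)_ii|, a lower bound for the operator norm of [D,X] - I."""
    C = commutator(D, X)
    return max(abs(C[i][i] - 1) for i in range(len(C)))

def random_matrix(n, rng):
    return [[Fraction(rng.randint(-5, 5), rng.randint(1, 4)) for _ in range(n)]
            for _ in range(n)]

rng = random.Random(0)
D, X = random_matrix(4, rng), random_matrix(4, rng)
print(trace(commutator(D, X)), max_diagonal_defect(D, X) >= 1)
```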

However, it was shown in 1965 by Brown and Pearcy that in infinite dimensions, one can construct operators ${D,X}$ with ${[D,X]}$ arbitrarily close to ${1}$ in operator norm (in fact one can prescribe any operator for ${[D,X]}$ as long as it is not equal to a non-zero multiple of the identity plus a compact operator). In the above paper of Popa, a quantitative version of the argument (based in part on some earlier work of Apostol and Zsido) was given as follows. The first step is to observe the following Hilbert space version of Hilbert’s hotel: in an infinite dimensional Hilbert space ${H}$, one can locate isometries ${u, v \in B(H)}$ obeying the equation

$\displaystyle uu^* + vv^* = 1, \ \ \ \ \ (4)$

where ${u^*}$ denotes the adjoint of ${u}$. For instance, if ${H}$ has a countable orthonormal basis ${e_1, e_2, \dots}$, one could set

$\displaystyle u := \sum_{n=1}^\infty e_{2n-1} e_n^*$

and

$\displaystyle v := \sum_{n=1}^\infty e_{2n} e_n^*,$

where ${e_n^*}$ denotes the linear functional ${x \mapsto \langle x, e_n \rangle}$ on ${H}$. Observe that (4) is again impossible to satisfy in finite dimension ${n}$, as the left-hand side must have trace ${2n}$ while the right-hand side has trace ${n}$.

As ${u,v}$ are isometries, we have

$\displaystyle v^* v = u^* u = 1; \ \ \ \ \ (5)$

Multiplying (4) on the left by ${v^*}$ and right by ${u}$, or on the left by ${u^*}$ and right by ${v}$, then gives

$\displaystyle v^* u = u^* v = 0. \ \ \ \ \ (6)$
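The relations (4), (5), (6) can be checked mechanically on finitely supported vectors. The following is a minimal sketch, assuming a vector ${\sum_n c_n e_n}$ is stored as a Python dictionary `{n: c_n}` of its non-zero coefficients; this representation and the function names are illustrative choices.

```python
def u(x):
    """u sends e_n to e_{2n-1}."""
    return {2 * n - 1: c for n, c in x.items()}

def v(x):
    """v sends e_n to e_{2n}."""
    return {2 * n: c for n, c in x.items()}

def u_star(x):
    """Adjoint of u: sends e_{2n-1} to e_n and kills every e_{2n}."""
    return {(m + 1) // 2: c for m, c in x.items() if m % 2 == 1}

def v_star(x):
    """Adjoint of v: sends e_{2n} to e_n and kills every e_{2n-1}."""
    return {m // 2: c for m, c in x.items() if m % 2 == 0}

def add(x, y):
    """Sum of two finitely supported vectors, dropping zero coefficients."""
    out = dict(x)
    for k, c in y.items():
        out[k] = out.get(k, 0) + c
    return {k: c for k, c in out.items() if c != 0}

x = {1: 2, 2: -1, 5: 3, 8: 7}  # an arbitrary test vector

hotel_identity = add(u(u_star(x)), v(v_star(x))) == x      # (4)
isometries = u_star(u(x)) == x and v_star(v(x)) == x       # (5)
orthogonal = u_star(v(x)) == {} and v_star(u(x)) == {}     # (6)
print(hotel_identity, isometries, orthogonal)
```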

From (4), (5) we see in particular that, while we cannot express ${1}$ as a commutator of bounded operators, we can at least express it as the sum of two commutators:

$\displaystyle [u^*, u] + [v^*, v] =1.$

We can rewrite this somewhat strangely as

$\displaystyle [\frac{1}{2} u^*, 4u+2v] + [\frac{1}{2} u^* - v^*, -2v] = 2$

and hence there exists a bounded operator ${a}$ such that

$\displaystyle [\frac{1}{2} u^*, 4u+2v] = 1+a; \quad [\frac{1}{2} u^* - v^*, -2v] = 1-a.$

Moving now to the Banach algebra of ${2 \times 2}$ matrices with entries in ${B(H)}$ (which can be equivalently viewed as ${B(H \oplus H)}$), a short computation then gives the identity

$\displaystyle \left[ \begin{pmatrix} \frac{1}{2} u^* & 0 \\ a & \frac{1}{2} u^* - v^* \end{pmatrix}, \begin{pmatrix} 4u+2v & 1 \\ 0 & -2v \end{pmatrix} \right] = \begin{pmatrix} 1 & v^* \\ b & 1 \end{pmatrix}$

for some bounded operator ${b}$ whose exact form will not be relevant for the argument. Now, by Neumann series (and the fact that ${u,v}$ have unit operator norm), we can find another bounded operator ${c}$ such that

$\displaystyle c + \frac{1}{2} v c u^* = b,$

and then another brief computation shows that

$\displaystyle \left[ \begin{pmatrix} \frac{1}{2} u^* & 0 \\ a & \frac{1}{2} u^* - v^* \end{pmatrix}, \begin{pmatrix} 4u+2v & 1 \\ vc & -2v \end{pmatrix} \right] = \begin{pmatrix} 1 & v^* \\ 0 & 1 \end{pmatrix}.$

Thus we can express the operator ${\begin{pmatrix} 1 & v^* \\ 0 & 1 \end{pmatrix}}$ as the commutator of two operators of norm ${O(1)}$. Conjugating by ${\begin{pmatrix} \varepsilon^{1/2} & 0 \\ 0 & \varepsilon^{-1/2} \end{pmatrix}}$ for any ${0 < \varepsilon \leq 1}$, we may then express ${\begin{pmatrix} 1 & \varepsilon v^* \\ 0 & 1 \end{pmatrix}}$ as the commutator of two operators of norm ${O(\varepsilon^{-1})}$. This shows that the right-hand side of (3) cannot be replaced with anything that blows up faster than ${\varepsilon^{-2}}$ as ${\varepsilon \rightarrow 0}$. Can one improve this bound further?
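The effect of the diagonal conjugation can be seen in a scalar toy model, in which the operator entries are replaced by rational numbers. This replacement is an assumption made only so that the arithmetic can be checked exactly, and the sample matrices are arbitrary.

```python
from fractions import Fraction

def matmul2(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def comm2(A, B):
    AB, BA = matmul2(A, B), matmul2(B, A)
    return [[AB[i][j] - BA[i][j] for j in range(2)] for i in range(2)]

def conj(M, s):
    """Conjugate M by diag(s, 1/s), i.e. return S M S^{-1}."""
    S = [[s, 0], [0, 1 / s]]
    S_inv = [[1 / s, 0], [0, s]]
    return matmul2(matmul2(S, M), S_inv)

s = Fraction(1, 10)   # s plays the role of eps**(1/2), so eps = 1/100
eps = s * s
D = [[Fraction(1), Fraction(2)], [Fraction(3), Fraction(4)]]
X = [[Fraction(0), Fraction(1)], [Fraction(5), Fraction(-2)]]
C = comm2(D, X)

# Conjugation commutes with taking commutators ...
print(comm2(conj(D, s), conj(X, s)) == conj(C, s))
# ... multiplies the upper right entry by eps and the lower left by 1/eps.
print(conj(C, s)[0][1] == eps * C[0][1], conj(D, s)[1][0] == D[1][0] / eps)
```

The diagonal entries are unchanged, so the only growth comes from the lower left entries, which is the source of the ${O(\varepsilon^{-1})}$ norms.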

The wave equation is usually expressed in the form

$\displaystyle \partial_{tt} u - \Delta u = 0$

where ${u \colon {\bf R} \times {\bf R}^d \rightarrow {\bf C}}$ is a function of both time ${t \in {\bf R}}$ and space ${x \in {\bf R}^d}$, with ${\Delta}$ being the Laplacian operator. One can generalise this equation in a number of ways, for instance by replacing the spatial domain ${{\bf R}^d}$ with some other manifold and replacing the Laplacian ${\Delta}$ with the Laplace-Beltrami operator or adding lower order terms (such as a potential, or a coupling with a magnetic field). But for the sake of discussion let us work with the classical wave equation on ${{\bf R}^d}$. We will work formally in this post, being unconcerned with issues of convergence, justifying interchange of integrals, derivatives, or limits, etc. One then has a conserved energy

$\displaystyle \int_{{\bf R}^d} \frac{1}{2} |\nabla u(t,x)|^2 + \frac{1}{2} |\partial_t u(t,x)|^2\ dx$

which we can rewrite using integration by parts and the ${L^2}$ inner product ${\langle, \rangle}$ on ${{\bf R}^d}$ as

$\displaystyle \frac{1}{2} \langle -\Delta u(t), u(t) \rangle + \frac{1}{2} \langle \partial_t u(t), \partial_t u(t) \rangle.$

A key feature of the wave equation is finite speed of propagation: if, at time ${t=0}$ (say), the initial position ${u(0)}$ and initial velocity ${\partial_t u(0)}$ are both supported in a ball ${B(x_0,R) := \{ x \in {\bf R}^d: |x-x_0| \leq R \}}$, then at any later time ${t>0}$, the position ${u(t)}$ and velocity ${\partial_t u(t)}$ are supported in the larger ball ${B(x_0,R+t)}$. This can be seen for instance (formally, at least) by inspecting the exterior energy

$\displaystyle \int_{|x-x_0| > R+t} \frac{1}{2} |\nabla u(t,x)|^2 + \frac{1}{2} |\partial_t u(t,x)|^2\ dx$

and observing (after some integration by parts and differentiation under the integral sign) that it is non-increasing in time, non-negative, and vanishing at time ${t=0}$.

The wave equation is second order in time, but one can turn it into a first order system by working with the pair ${(u(t),v(t))}$ rather than just the single field ${u(t)}$, where ${v(t) := \partial_t u(t)}$ is the velocity field. The system is then

$\displaystyle \partial_t u(t) = v(t)$

$\displaystyle \partial_t v(t) = \Delta u(t)$

and the conserved energy is now

$\displaystyle \frac{1}{2} \langle -\Delta u(t), u(t) \rangle + \frac{1}{2} \langle v(t), v(t) \rangle. \ \ \ \ \ (1)$

Finite speed of propagation then tells us that if ${u(0),v(0)}$ are both supported on ${B(x_0,R)}$, then ${u(t),v(t)}$ are supported on ${B(x_0,R+t)}$ for all ${t>0}$. One also has time reversal symmetry: if ${t \mapsto (u(t),v(t))}$ is a solution, then ${t \mapsto (u(-t), -v(-t))}$ is a solution also, thus for instance one can establish an analogue of finite speed of propagation for negative times ${t<0}$ using this symmetry.

If one has an eigenfunction

$\displaystyle -\Delta \phi = \lambda^2 \phi$

of the Laplacian, then we have the explicit solutions

$\displaystyle u(t) = e^{\pm it \lambda} \phi$

$\displaystyle v(t) = \pm i \lambda e^{\pm it \lambda} \phi$

of the wave equation, which formally can be used to construct all other solutions via the principle of superposition.

When one has vanishing initial velocity ${v(0)=0}$, the solution ${u(t)}$ is given via functional calculus by

$\displaystyle u(t) = \cos(t \sqrt{-\Delta}) u(0)$

and the propagator ${\cos(t \sqrt{-\Delta})}$ can be expressed as the average of half-wave operators:

$\displaystyle \cos(t \sqrt{-\Delta}) = \frac{1}{2} ( e^{it\sqrt{-\Delta}} + e^{-it\sqrt{-\Delta}} ).$

One can view ${\cos(t \sqrt{-\Delta} )}$ as a minor of the full wave propagator

$\displaystyle U(t) := \exp \begin{pmatrix} 0 & t \\ t\Delta & 0 \end{pmatrix}$

$\displaystyle = \begin{pmatrix} \cos(t \sqrt{-\Delta}) & \frac{\sin(t\sqrt{-\Delta})}{\sqrt{-\Delta}} \\ -\sin(t\sqrt{-\Delta}) \sqrt{-\Delta} & \cos(t \sqrt{-\Delta} ) \end{pmatrix}$

which is unitary with respect to the energy form (1), and is the fundamental solution to the wave equation in the sense that

$\displaystyle \begin{pmatrix} u(t) \\ v(t) \end{pmatrix} = U(t) \begin{pmatrix} u(0) \\ v(0) \end{pmatrix}. \ \ \ \ \ (2)$

Viewing the contraction ${\cos(t\sqrt{-\Delta})}$ as a minor of a unitary operator is an instance of the “dilation trick”.

It turns out (as I learned from Yuval Peres) that there is a useful discrete analogue of the wave equation (and of all of the above facts), in which the time variable ${t}$ now lives on the integers ${{\bf Z}}$ rather than on ${{\bf R}}$, and the spatial domain can be replaced by discrete domains also (such as graphs). Formally, the system is now of the form

$\displaystyle u(t+1) = P u(t) + v(t) \ \ \ \ \ (3)$

$\displaystyle v(t+1) = P v(t) - (1-P^2) u(t)$

where ${t}$ is now an integer, ${u(t), v(t)}$ take values in some Hilbert space (e.g. ${\ell^2}$ functions on a graph ${G}$), and ${P}$ is some operator on that Hilbert space (which in applications will usually be a self-adjoint contraction). To connect this with the classical wave equation, let us first consider a rescaling of this system

$\displaystyle u(t+\varepsilon) = P_\varepsilon u(t) + \varepsilon v(t)$

$\displaystyle v(t+\varepsilon) = P_\varepsilon v(t) - \frac{1}{\varepsilon} (1-P_\varepsilon^2) u(t)$

where ${\varepsilon>0}$ is a small parameter (representing the discretised time step), ${t}$ now takes values in the integer multiples ${\varepsilon {\bf Z}}$ of ${\varepsilon}$, and ${P_\varepsilon}$ is the wave propagator operator ${P_\varepsilon := \cos( \varepsilon \sqrt{-\Delta} )}$ or the heat propagator ${P_\varepsilon := \exp( \varepsilon^2 \Delta/2 )}$ (the two operators are different, but agree up to errors of fourth order in ${\varepsilon}$). One can then formally verify that the wave equation emerges from this rescaled system in the limit ${\varepsilon \rightarrow 0}$. (Thus, ${P}$ is not exactly the direct analogue of the Laplacian ${\Delta}$, but can be viewed as something like ${P_\varepsilon = 1 + \frac{\varepsilon^2}{2} \Delta + O( \varepsilon^4 )}$ in the case of small ${\varepsilon}$, or ${P = 1 + \frac{1}{2}\Delta + O(\Delta^2)}$ if we are not rescaling to the small ${\varepsilon}$ case. The operator ${P}$ is sometimes known as the diffusion operator.)

Assuming ${P}$ is self-adjoint, solutions to the system (3) formally conserve the energy

$\displaystyle \frac{1}{2} \langle (1-P^2) u(t), u(t) \rangle + \frac{1}{2} \langle v(t), v(t) \rangle. \ \ \ \ \ (4)$

This energy is positive semi-definite if ${P}$ is a contraction. We have the same time reversal symmetry as before: if ${t \mapsto (u(t),v(t))}$ solves the system (3), then so does ${t \mapsto (u(-t), -v(-t))}$. If one has an eigenfunction

$\displaystyle P \phi = \cos(\lambda) \phi$

to the operator ${P}$, then one has an explicit solution

$\displaystyle u(t) = e^{\pm it \lambda} \phi$

$\displaystyle v(t) = \pm i \sin(\lambda) e^{\pm it \lambda} \phi$

to (3), and (in principle at least) this generates all other solutions via the principle of superposition.

Finite speed of propagation is a lot easier in the discrete setting, though one has to offset the support of the “velocity” field ${v}$ by one unit. Suppose we know that ${P}$ has unit speed in the sense that whenever ${f}$ is supported in a ball ${B(x,R)}$, then ${Pf}$ is supported in the ball ${B(x,R+1)}$. Then an easy induction shows that if ${u(0), v(0)}$ are supported in ${B(x_0,R), B(x_0,R+1)}$ respectively, then ${u(t), v(t)}$ are supported in ${B(x_0,R+t), B(x_0, R+t+1)}$.

The fundamental solution ${U(t) = U^t}$ to the discretised wave equation (3), in the sense of (2), is given by the formula

$\displaystyle U(t) = U^t = \begin{pmatrix} P & 1 \\ P^2-1 & P \end{pmatrix}^t$

$\displaystyle = \begin{pmatrix} T_t(P) & U_{t-1}(P) \\ (P^2-1) U_{t-1}(P) & T_t(P) \end{pmatrix}$

where ${T_t}$ and ${U_t}$ are the Chebyshev polynomials of the first and second kind, thus

$\displaystyle T_t( \cos \theta ) = \cos(t\theta)$

and

$\displaystyle U_t( \cos \theta ) = \frac{\sin((t+1)\theta)}{\sin \theta}.$

In particular, ${P}$ is now a minor of ${U(1) = U}$, and can also be viewed as an average of ${U}$ with its inverse ${U^{-1}}$:

$\displaystyle \begin{pmatrix} P & 0 \\ 0 & P \end{pmatrix} = \frac{1}{2} (U + U^{-1}). \ \ \ \ \ (5)$

As before, ${U}$ is unitary with respect to the energy form (4), so this is another instance of the dilation trick in action. The powers ${P^n}$ and ${U^n}$ are discrete analogues of the heat propagators ${e^{t\Delta/2}}$ and wave propagators ${U(t)}$ respectively.
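Here is a minimal sketch of the discrete system (3), under the assumption that the graph is the cycle ${{\bf Z}/N{\bf Z}}$ and ${P}$ is the nearest-neighbour averaging operator (both illustrative choices, as are the function names). It checks, in exact rational arithmetic, that the energy (4) is conserved, that ${u(t+1) + u(t-1) = 2Pu(t)}$ as (5) predicts, and that supports spread at unit speed.

```python
from fractions import Fraction

N = 16  # number of vertices in the cycle graph (illustrative)

def P(f):
    """Average f over the two neighbours of each vertex of the cycle."""
    return [(f[(x - 1) % N] + f[(x + 1) % N]) / 2 for x in range(N)]

def step(u, v):
    """One time step of (3): u' = P u + v, v' = P v - (1 - P^2) u."""
    Pu, Pv, PPu = P(u), P(v), P(P(u))
    u_next = [a + b for a, b in zip(Pu, v)]
    v_next = [b - (c - d) for b, c, d in zip(Pv, u, PPu)]
    return u_next, v_next

def energy(u, v):
    """The energy form (4) evaluated on the state (u, v)."""
    PPu = P(P(u))
    potential = sum((a - b) * a for a, b in zip(u, PPu))
    kinetic = sum(b * b for b in v)
    return (potential + kinetic) / 2

def support_radius(f):
    """Largest cycle distance from vertex 0 at which f is non-zero."""
    return max((min(x, N - x) for x in range(N) if f[x] != 0), default=0)

u = [Fraction(int(x == 0)) for x in range(N)]  # u(0) is the delta at vertex 0
v = [Fraction(0)] * N                          # v(0) = 0
history = [(u, v)]
for t in range(5):
    u, v = step(u, v)
    history.append((u, v))

print([energy(a, b) for a, b in history])
print([(support_radius(a), support_radius(b)) for a, b in history])
```

The number of steps is kept below ${N/2}$ so that the wave does not wrap around the cycle, which keeps the support check meaningful.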

One nice application of all this formalism, which I learned from Yuval Peres, is the Varopoulos-Carne inequality:

Theorem 1 (Varopoulos-Carne inequality) Let ${G}$ be a (possibly infinite) regular graph, let ${n \geq 1}$, and let ${x, y}$ be vertices in ${G}$. Then the probability that the simple random walk at ${x}$ lands at ${y}$ at time ${n}$ is at most ${2 \exp( - d(x,y)^2 / 2n )}$, where ${d}$ is the graph distance.

This general inequality is quite sharp, as one can see using the standard Cayley graph on the integers ${{\bf Z}}$. Very roughly speaking, it asserts that on a regular graph of reasonably controlled growth (e.g. polynomial growth), random walks of length ${n}$ concentrate on the ball of radius ${O(\sqrt{n})}$ or so centred at the origin of the random walk.

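Proof: Let ${P \colon \ell^2(G) \rightarrow \ell^2(G)}$ be the averaging operator associated to the simple random walk (that is, the identity minus the normalised graph Laplacian), thus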

$\displaystyle Pf(x) = \frac{1}{D} \sum_{y \sim x} f(y)$

for any ${f \in \ell^2(G)}$, where ${D}$ is the degree of the regular graph and the sum is over the ${D}$ vertices ${y}$ that are adjacent to ${x}$. This is a contraction of unit speed, and the probability that the random walk at ${x}$ lands at ${y}$ at time ${n}$ is

$\displaystyle \langle P^n \delta_x, \delta_y \rangle$

where ${\delta_x, \delta_y}$ are the Dirac deltas at ${x,y}$. Using (5), we can rewrite this as

$\displaystyle \langle (\frac{1}{2} (U + U^{-1}))^n \begin{pmatrix} 0 \\ \delta_x\end{pmatrix}, \begin{pmatrix} 0 \\ \delta_y\end{pmatrix} \rangle$

where we are now using the energy form (4). We can write

$\displaystyle (\frac{1}{2} (U + U^{-1}))^n = {\bf E} U^{S_n}$

where ${S_n}$ is the simple random walk of length ${n}$ on the integers, that is to say ${S_n = \xi_1 + \dots + \xi_n}$ where ${\xi_1,\dots,\xi_n = \pm 1}$ are independent uniform Bernoulli signs. Thus we wish to show that

$\displaystyle {\bf E} \langle U^{S_n} \begin{pmatrix} 0 \\ \delta_x\end{pmatrix}, \begin{pmatrix} 0 \\ \delta_y\end{pmatrix} \rangle \leq 2 \exp(-d(x,y)^2 / 2n ).$

By finite speed of propagation, the inner product here vanishes if ${|S_n| < d(x,y)}$. For ${|S_n| \geq d(x,y)}$ we can use Cauchy-Schwarz and the unitary nature of ${U}$ to bound the inner product by ${1}$. Thus the left-hand side may be upper bounded by

$\displaystyle {\bf P}( |S_n| \geq d(x,y) )$

and the claim now follows from the Chernoff inequality. $\Box$

This inequality has many applications, particularly with regards to relating the entropy, mixing time, and concentration of random walks with volume growth of balls; see this text of Lyons and Peres for some examples.
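As a quick sanity check of the Varopoulos-Carne inequality on the standard Cayley graph of ${{\bf Z}}$ (a 2-regular graph), the transition probabilities are explicit binomial coefficients, and the sketch below compares them with the bound ${2 \exp( - d(x,y)^2 / 2n )}$. The function names and the range of ${n}$ tested are illustrative.

```python
import math

def walk_probability(n, y):
    """P(S_n = y) for the simple random walk on Z started at 0."""
    if abs(y) > n or (n + y) % 2 != 0:
        return 0.0
    return math.comb(n, (n + y) // 2) / 2 ** n

def varopoulos_carne_bound(n, y):
    """The upper bound 2 exp(-d^2 / 2n) with d = |y|."""
    return 2.0 * math.exp(-y * y / (2.0 * n))

# Largest ratio probability / bound over all walks of length below 60.
worst_ratio = max(
    walk_probability(n, y) / varopoulos_carne_bound(n, y)
    for n in range(1, 60)
    for y in range(-n, n + 1)
)
print(worst_ratio)
```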

For sake of comparison, here is a continuous counterpart to the Varopoulos-Carne inequality:

Theorem 2 (Continuous Varopoulos-Carne inequality) Let ${t > 0}$, and let ${f,g \in L^2({\bf R}^d)}$ be supported on compact sets ${F,G}$ respectively. Then

$\displaystyle |\langle e^{t\Delta/2} f, g \rangle| \leq \sqrt{\frac{2t}{\pi d(F,G)^2}} \exp( - d(F,G)^2 / 2t ) \|f\|_{L^2} \|g\|_{L^2}$

where ${d(F,G)}$ is the Euclidean distance between ${F}$ and ${G}$.

Proof: By Fourier inversion one has

$\displaystyle e^{-t\xi^2/2} = \frac{1}{\sqrt{2\pi t}} \int_{\bf R} e^{-s^2/2t} e^{is\xi}\ ds$

$\displaystyle = \sqrt{\frac{2}{\pi t}} \int_0^\infty e^{-s^2/2t} \cos(s \xi )\ ds$

for any real ${\xi}$, and thus

$\displaystyle \langle e^{t\Delta/2} f, g\rangle = \sqrt{\frac{2}{\pi t}} \int_0^\infty e^{-s^2/2t} \langle \cos(s \sqrt{-\Delta} ) f, g \rangle\ ds.$

By finite speed of propagation, the inner product ${\langle \cos(s \sqrt{-\Delta} ) f, g \rangle}$ vanishes when ${s < d(F,G)}$; otherwise, we can use Cauchy-Schwarz and the contractive nature of ${\cos(s \sqrt{-\Delta} )}$ to bound this inner product by ${\|f\|_{L^2} \|g\|_{L^2}}$. Thus

$\displaystyle |\langle e^{t\Delta/2} f, g\rangle| \leq \sqrt{\frac{2}{\pi t}} \|f\|_{L^2} \|g\|_{L^2} \int_{d(F,G)}^\infty e^{-s^2/2t}\ ds.$

Bounding ${e^{-s^2/2t}}$ by ${e^{-d(F,G)^2/2t} e^{-d(F,G) (s-d(F,G))/t}}$, we obtain the claim. $\Box$

Observe that the argument is quite general and can be applied for instance to other Riemannian manifolds than ${{\bf R}^d}$.
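The final step in the proof of Theorem 2 amounts to the Gaussian tail estimate ${\int_d^\infty e^{-s^2/2t}\ ds \leq \frac{t}{d} e^{-d^2/2t}}$ with ${d = d(F,G)}$. The sketch below compares the two sides numerically, using the closed form of the tail in terms of the complementary error function; the sample values of ${(d,t)}$ and the function names are arbitrary.

```python
import math

def gaussian_tail(d, t):
    """Integral of exp(-s^2 / (2t)) over s in [d, infinity), via erfc."""
    return math.sqrt(math.pi * t / 2.0) * math.erfc(d / math.sqrt(2.0 * t))

def elementary_bound(d, t):
    """The bound (t/d) exp(-d^2 / (2t)) obtained from the pointwise estimate."""
    return (t / d) * math.exp(-d * d / (2.0 * t))

samples = [(0.5, 1.0), (1.0, 1.0), (2.0, 0.5), (3.0, 2.0), (5.0, 1.0)]
for d, t in samples:
    print(d, t, gaussian_tail(d, t), elementary_bound(d, t))
```

Multiplying the elementary bound by the prefactor ${\sqrt{\frac{2}{\pi t}}}$ gives the constant ${\sqrt{\frac{2t}{\pi d(F,G)^2}}}$ appearing in Theorem 2.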

The prime number theorem can be expressed as the assertion

$\displaystyle \sum_{n \leq x} \Lambda(n) = x + o(x) \ \ \ \ \ (1)$

as ${x \rightarrow \infty}$, where

$\displaystyle \Lambda(n) := \sum_{d|n} \mu(d) \log \frac{n}{d}$

is the von Mangoldt function. It is a basic result in analytic number theory, but requires a bit of effort to prove. One “elementary” proof of this theorem proceeds through the Selberg symmetry formula

$\displaystyle \sum_{n \leq x} \Lambda_2(n) = 2 x \log x + O(x) \ \ \ \ \ (2)$

where the second von Mangoldt function ${\Lambda_2}$ is defined by the formula

$\displaystyle \Lambda_2(n) := \sum_{d|n} \mu(d) \log^2 \frac{n}{d} \ \ \ \ \ (3)$

or equivalently

$\displaystyle \Lambda_2(n) = \Lambda(n) \log n + \sum_{d|n} \Lambda(d) \Lambda(\frac{n}{d}). \ \ \ \ \ (4)$

(We are avoiding the use of the ${*}$ symbol here to denote Dirichlet convolution, as we will need this symbol to denote ordinary convolution shortly.) For the convenience of the reader, we give a proof of the Selberg symmetry formula below the fold. Actually, for the purposes of proving the prime number theorem, the weaker estimate

$\displaystyle \sum_{n \leq x} \Lambda_2(n) = 2 x \log x + o(x \log x) \ \ \ \ \ (5)$

suffices.
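For readers who want to experiment, here is a small sketch that computes ${\Lambda}$ and ${\Lambda_2}$ directly from the Möbius sums above, checks that (3) and (4) agree for small ${n}$, and evaluates the sum in (1) at a modest value of ${x}$. The brute-force divisor enumeration and the function names are illustrative choices; nothing here is optimised.

```python
import math

def mobius(n):
    """The Moebius function mu(n), computed by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0        # n has a square factor
            result = -result
        p += 1
    if n > 1:
        result = -result        # one remaining prime factor
    return result

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def mangoldt(n, k=1):
    """Lambda_k(n) = sum over d | n of mu(d) log^k(n/d); k=1 is von Mangoldt."""
    return sum(mobius(d) * math.log(n // d) ** k for d in divisors(n))

def mangoldt2_via_convolution(n):
    """The right-hand side of (4)."""
    return mangoldt(n) * math.log(n) + sum(
        mangoldt(d) * mangoldt(n // d) for d in divisors(n))

def psi(x):
    """The sum of Lambda(n) over n <= x."""
    return sum(mangoldt(n) for n in range(1, x + 1))

print(max(abs(mangoldt(n, 2) - mangoldt2_via_convolution(n)) for n in range(1, 80)))
print(psi(1000) / 1000)
```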

In this post I would like to record a somewhat “soft analysis” reformulation of the elementary proof of the prime number theorem in terms of Banach algebras, and specifically in Banach algebra structures on (completions of) the space ${C_c({\bf R})}$ of compactly supported continuous functions ${f: {\bf R} \rightarrow {\bf C}}$ equipped with the convolution operation

$\displaystyle f*g(t) := \int_{\bf R} f(u) g(t-u)\ du.$

This soft argument does not easily give any quantitative decay rate in the prime number theorem, but by the same token it avoids many of the quantitative calculations in the traditional proofs of this theorem. Ultimately, the key “soft analysis” fact used is the spectral radius formula

$\displaystyle \lim_{n \rightarrow \infty} \|f^n\|^{1/n} = \sup_{\lambda \in \hat B} |\lambda(f)| \ \ \ \ \ (6)$

for any element ${f}$ of a unital commutative Banach algebra ${B}$, where ${\hat B}$ is the space of characters (i.e., continuous unital algebra homomorphisms from ${B}$ to ${{\bf C}}$) of ${B}$. This formula is due to Gelfand and may be found in any text on Banach algebras; for sake of completeness we prove it below the fold.
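The spectral radius formula (6) can be observed numerically in a finite model. The sketch below assumes, as a stand-in for the convolution algebras on ${{\bf R}}$ considered here, the algebra of functions on the cyclic group ${{\bf Z}/N{\bf Z}}$ with cyclic convolution and the ${\ell^1}$ norm; its characters are the discrete Fourier coefficients. The choice of ${N}$, the test function and the helper names are all illustrative.

```python
import cmath

N = 5  # order of the cyclic group (illustrative)

def convolve(f, g):
    """Cyclic convolution on Z/N."""
    return [sum(f[u] * g[(t - u) % N] for u in range(N)) for t in range(N)]

def l1_norm(f):
    return sum(abs(c) for c in f)

def character(f, k):
    """The k-th character: the discrete Fourier coefficient of f at frequency k."""
    return sum(f[x] * cmath.exp(-2j * cmath.pi * k * x / N) for x in range(N))

f = [0.5, -0.3, 0.2, 0.0, 0.1]
spectral_radius = max(abs(character(f, k)) for k in range(N))

n = 200
power = [1.0] + [0.0] * (N - 1)  # the unit of the algebra (delta at 0)
for _ in range(n):
    power = convolve(power, f)   # after the loop, power is the n-fold convolution of f
root = l1_norm(power) ** (1.0 / n)

print(spectral_radius, root)
```

In this model one has the two-sided bound ${\rho^n \leq \|f^n\| \leq N \rho^n}$, where ${\rho}$ is the right-hand side of (6), so the ${n}$-th roots converge to ${\rho}$ at rate ${N^{1/n}}$.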

The connection between prime numbers and Banach algebras is given by the following consequence of the Selberg symmetry formula.

Theorem 1 (Construction of a Banach algebra norm) For any ${G \in C_c({\bf R})}$, let ${\|G\|}$ denote the quantity

$\displaystyle \|G\| := \limsup_{x \rightarrow \infty} |\sum_n \frac{\Lambda(n)}{n} G( \log \frac{x}{n} ) - \int_{\bf R} G(t)\ dt|.$

Then ${\| \|}$ is a seminorm on ${C_c({\bf R})}$ with the bound

$\displaystyle \|G\| \leq \|G\|_{L^1({\bf R})} := \int_{\bf R} |G(t)|\ dt \ \ \ \ \ (7)$

for all ${G \in C_c({\bf R})}$. Furthermore, we have the Banach algebra bound

$\displaystyle \| G * H \| \leq \|G\| \|H\| \ \ \ \ \ (8)$

for all ${G,H \in C_c({\bf R})}$.

We prove this theorem below the fold. The prime number theorem then follows from Theorem 1 and the following two assertions. The first is an application of the spectral radius formula (6) and some basic Fourier analysis (in particular, the observation that ${C_c({\bf R})}$ contains a plentiful supply of local units):

Theorem 2 (Non-trivial Banach algebras with many local units have non-trivial spectrum) Let ${\| \|}$ be a seminorm on ${C_c({\bf R})}$ obeying (7), (8). Suppose that ${\| \|}$ is not identically zero. Then there exists ${\xi \in {\bf R}}$ such that

$\displaystyle |\int_{\bf R} G(t) e^{-it\xi}\ dt| \leq \|G\|$

for all ${G \in C_c({\bf R})}$. In particular, by (7), one has

$\displaystyle \|G\| = \| G \|_{L^1({\bf R})}$

whenever ${G(t) e^{-it\xi}}$ is a non-negative function.

The second is a consequence of the Selberg symmetry formula and the fact that ${\Lambda}$ is real (as well as Mertens’ theorem, in the ${\xi=0}$ case), and is closely related to the non-vanishing of the Riemann zeta function ${\zeta}$ on the line ${\{ 1+i\xi: \xi \in {\bf R}\}}$:

Theorem 3 (Breaking the parity barrier) Let ${\xi \in {\bf R}}$. Then there exists ${G \in C_c({\bf R})}$ such that ${G(t) e^{-it\xi}}$ is non-negative, and

$\displaystyle \|G\| < \|G\|_{L^1({\bf R})}.$

Assuming Theorems 1, 2, 3, we may now quickly establish the prime number theorem as follows. Theorem 2 and Theorem 3 imply that the seminorm ${\| \|}$ constructed in Theorem 1 is trivial, and thus

$\displaystyle \sum_n \frac{\Lambda(n)}{n} G( \log \frac{x}{n} ) = \int_{\bf R} G(t)\ dt + o(1)$

as ${x \rightarrow \infty}$ for any Schwartz function ${G}$ (the decay rate in ${o(1)}$ may depend on ${G}$). Specialising to functions of the form ${G(t) = e^{-t} \eta( e^{-t} )}$ for some smooth compactly supported ${\eta}$ on ${(0,+\infty)}$, we conclude that

$\displaystyle \sum_n \Lambda(n) \eta(\frac{n}{x}) = x \int_{\bf R} \eta(u)\ du + o(x)$

as ${x \rightarrow \infty}$; by the smooth Urysohn lemma this implies that

$\displaystyle \sum_{\varepsilon x \leq n \leq x} \Lambda(n) = x - \varepsilon x + o(x)$

as ${x \rightarrow \infty}$ for any fixed ${\varepsilon>0}$, and the prime number theorem then follows by a telescoping series argument.
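The windowed asymptotic can be tested numerically. The following sketch uses the standard identity ${\sum_{n \leq x} \Lambda(n) = \log \hbox{lcm}(1,\ldots,x)}$ to avoid sieving; the values of `x` and `eps` are arbitrary choices for illustration, and the check is purely empirical.

```python
import math

def psi(x):
    """Chebyshev function psi(x) = sum_{n <= x} Lambda(n) = log lcm(1, ..., x)."""
    lcm = 1
    for n in range(2, x + 1):
        lcm = lcm * n // math.gcd(lcm, n)
    return math.log(lcm)

x = 10000
eps = 0.5
lower = int(eps * x)

# sum of Lambda(n) over eps*x <= n <= x
window_sum = psi(x) - psi(lower - 1)
ratio = window_sum / (x - eps * x)
print(window_sum, ratio)  # ratio should be close to 1
```

Summing such windows over ${x, \varepsilon x, \varepsilon^2 x, \ldots}$ is the telescoping step that recovers ${\sum_{n \leq x} \Lambda(n) = x + o(x)}$.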

The same argument also yields the prime number theorem in arithmetic progressions, or equivalently that

$\displaystyle \sum_{n \leq x} \Lambda(n) \chi(n) = o(x)$

for any fixed Dirichlet character ${\chi}$; the one difference is that the use of Mertens’ theorem is replaced by the basic fact that the quantity ${L(1,\chi) = \sum_n \frac{\chi(n)}{n}}$ is non-vanishing.

One of the basic tools in modern combinatorics is the probabilistic method, introduced by Erdős, in which a deterministic solution to a given problem is shown to exist by constructing a random candidate for a solution, and showing that this candidate solves all the requirements of the problem with positive probability. When the problem requires a real-valued statistic ${X}$ to be suitably large or suitably small, the following trivial observation is often employed:

Proposition 1 (Comparison with mean) Let ${X}$ be a random real-valued variable, whose mean (or first moment) ${\mathop{\bf E} X}$ is finite. Then

$\displaystyle X \leq \mathop{\bf E} X$

with positive probability, and

$\displaystyle X \geq \mathop{\bf E} X$

with positive probability.

This proposition is usually applied in conjunction with a computation of the first moment ${\mathop{\bf E} X}$, in which case this version of the probabilistic method becomes an instance of the first moment method. (For comparison with other moment methods, such as the second moment method, exponential moment method, and zeroth moment method, see Chapter 1 of my book with Van Vu. For a general discussion of the probabilistic method, see the book by Alon and Spencer of the same name.)

As a typical example in random matrix theory, if one wanted to understand how small or how large the operator norm ${\|A\|_{op}}$ of a random matrix ${A}$ could be, one might first try to compute the expected operator norm ${\mathop{\bf E} \|A\|_{op}}$ and then apply Proposition 1; see this previous blog post for examples of this strategy (and related strategies, based on comparing ${\|A\|_{op}}$ with more tractable expressions such as the moments ${\hbox{tr} A^k}$). (In this blog post, all matrices are complex-valued.)

Recently, in their proof of the Kadison-Singer conjecture (and also in their earlier paper on Ramanujan graphs), Marcus, Spielman, and Srivastava introduced a striking new variant of the first moment method, suited in particular for controlling the operator norm ${\|A\|_{op}}$ of a Hermitian positive semi-definite matrix ${A}$. Such matrices have non-negative real eigenvalues, and so ${\|A\|_{op}}$ in this case is just the largest eigenvalue ${\lambda_1(A)}$ of ${A}$. Traditionally, one tries to control the eigenvalues through averaged statistics such as moments ${\hbox{tr} A^k = \sum_i \lambda_i(A)^k}$ or Stieltjes transforms ${\hbox{tr} (A-z)^{-1} = \sum_i (\lambda_i(A)-z)^{-1}}$; again, see this previous blog post. Here we use ${z}$ as short-hand for ${zI_d}$, where ${I_d}$ is the ${d \times d}$ identity matrix. Marcus, Spielman, and Srivastava instead rely on the interpretation of the eigenvalues ${\lambda_i(A)}$ of ${A}$ as the roots of the characteristic polynomial ${p_A(z) := \hbox{det}(z-A)}$ of ${A}$, thus

$\displaystyle \|A\|_{op} = \hbox{maxroot}( p_A ) \ \ \ \ \ (1)$

where ${\hbox{maxroot}(p)}$ is the largest real root of a non-zero polynomial ${p}$. (In our applications, we will only ever apply ${\hbox{maxroot}}$ to polynomials that have at least one real root, but for sake of completeness let us set ${\hbox{maxroot}(p)=-\infty}$ if ${p}$ has no real roots.)

Prior to the work of Marcus, Spielman, and Srivastava, I think it is safe to say that the conventional wisdom in random matrix theory was that the representation (1) of the operator norm ${\|A\|_{op}}$ was not particularly useful, due to the highly non-linear nature of both the characteristic polynomial map ${A \mapsto p_A}$ and the maximum root map ${p \mapsto \hbox{maxroot}(p)}$. (Although, as pointed out to me by Adam Marcus, some related ideas have occurred in graph theory rather than random matrix theory, for instance in the theory of the matching polynomial of a graph.) For instance, a fact as basic as the triangle inequality ${\|A+B\|_{op} \leq \|A\|_{op} + \|B\|_{op}}$ is extremely difficult to establish through (1). Nevertheless, it turns out that for certain special types of random matrices ${A}$ (particularly those in which a typical instance ${A}$ of this ensemble has a simple relationship to “adjacent” matrices in this ensemble), the polynomials ${p_A}$ enjoy an extremely rich structure (in particular, they lie in families of real stable polynomials, and hence enjoy good combinatorial interlacing properties) that can be surprisingly useful. In particular, Marcus, Spielman, and Srivastava established the following nonlinear variant of Proposition 1:

Proposition 2 (Comparison with mean) Let ${m,d \geq 1}$. Let ${A}$ be a random matrix, which is the sum ${A = \sum_{i=1}^m A_i}$ of independent Hermitian rank one ${d \times d}$ matrices ${A_i}$, each taking a finite number of values. Then

$\displaystyle \hbox{maxroot}(p_A) \leq \hbox{maxroot}( \mathop{\bf E} p_A )$

with positive probability, and

$\displaystyle \hbox{maxroot}(p_A) \geq \hbox{maxroot}( \mathop{\bf E} p_A )$

with positive probability.
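To see Proposition 2 in action, here is a small brute-force sketch in the simplest non-trivial setting ${d=2}$, with real vectors so that each ${A_i = v_i v_i^T}$ is real symmetric. The three two-point distributions in `choices` are made up for illustration; each ${v_i}$ is drawn uniformly and independently from its list.

```python
import itertools
import math
from fractions import Fraction

def outer(v):
    """Rank one symmetric matrix v v^T, stored as (a, b, c) for [[a, b], [b, c]]."""
    return (v[0] * v[0], v[0] * v[1], v[1] * v[1])

def char_poly(mats):
    """Coefficients (t, s) of p_A(z) = z^2 - t z + s, where A is the sum of mats."""
    a = sum(m[0] for m in mats)
    b = sum(m[1] for m in mats)
    c = sum(m[2] for m in mats)
    return (a + c, a * c - b * b)  # (trace, determinant)

def maxroot(t, s):
    """Largest root of z^2 - t z + s (assumes the roots are real)."""
    return (t + math.sqrt(t * t - 4 * s)) / 2

# Hypothetical distributions: v_i is uniform on choices[i], independently in i.
choices = [[(1, 0), (0, 1)], [(1, 1), (1, -1)], [(2, 1), (0, 1)]]

outcomes = [char_poly([outer(v) for v in vs]) for vs in itertools.product(*choices)]
roots = [maxroot(t, s) for t, s in outcomes]

# Mean characteristic polynomial: average the coefficients over all outcomes.
mean_t = Fraction(sum(t for t, _ in outcomes), len(outcomes))
mean_s = Fraction(sum(s for _, s in outcomes), len(outcomes))
mean_root = maxroot(mean_t, mean_s)

print(min(roots), mean_root, max(roots))
```

Here the mean polynomial is ${z^2 - 6z + \frac{11}{2}}$, with largest root ${(6+\sqrt{14})/2 \approx 4.87}$, and the eight outcomes have largest roots ranging from ${3}$ to about ${7.61}$, so both inequalities hold for some outcome, as the proposition predicts.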

We prove this proposition below the fold. The hypothesis that each ${A_i}$ only takes finitely many values is technical and can likely be relaxed substantially, but we will not need to do so here. Despite the superficial similarity with Proposition 1, the proof of Proposition 2 is quite nonlinear; in particular, one needs the interlacing properties of real stable polynomials to proceed. Another key ingredient in the proof is the observation that while the determinant ${\hbox{det}(A)}$ of a matrix ${A}$ generally behaves in a nonlinear fashion on the underlying matrix ${A}$, it becomes (affine-)linear when one considers rank one perturbations, and so ${p_A}$ depends in an affine-multilinear fashion on the ${A_1,\ldots,A_m}$. More precisely, we have the following deterministic formula, also proven below the fold:

Proposition 3 (Deterministic multilinearisation formula) Let ${A}$ be the sum of deterministic rank one ${d \times d}$ matrices ${A_1,\ldots,A_m}$. Then we have

$\displaystyle p_A(z) = \mu[A_1,\ldots,A_m](z) \ \ \ \ \ (2)$

for all ${z \in {\bf C}}$, where the mixed characteristic polynomial ${\mu[A_1,\ldots,A_m](z)}$ of any ${d \times d}$ matrices ${A_1,\ldots,A_m}$ (not necessarily rank one) is given by the formula

$\displaystyle \mu[A_1,\ldots,A_m](z) \ \ \ \ \ (3)$

$\displaystyle = (\prod_{i=1}^m (1 - \frac{\partial}{\partial z_i})) \hbox{det}( z + \sum_{i=1}^m z_i A_i ) |_{z_1=\ldots=z_m=0}.$
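Formula (3) can be checked by hand in small cases. The sketch below evaluates it exactly for ${2 \times 2}$ integer matrices; the trick (an implementation choice for this example only) is that ${\hbox{det}(z + \sum_i z_i A_i)}$ has degree at most ${2}$ in each ${z_i}$ when ${d=2}$, so the derivative at ${0}$ is given exactly by the central difference ${(f(1)-f(-1))/2}$. The particular vectors are arbitrary.

```python
from fractions import Fraction
from itertools import product

# Weights of f(0), f(1), f(-1) in (1 - d/dz_i) f at z_i = 0, using the
# central difference f'(0) = (f(1) - f(-1)) / 2 (exact for degree <= 2).
WEIGHTS = {0: Fraction(1), 1: Fraction(-1, 2), -1: Fraction(1, 2)}

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def shifted_det(z, coeffs, mats):
    """det(z I + sum_i coeffs[i] * mats[i]) for 2x2 matrices."""
    total = [[z, 0], [0, z]]
    for c, a in zip(coeffs, mats):
        for r in range(2):
            for s in range(2):
                total[r][s] += c * a[r][s]
    return det2(total)

def mixed_char_poly(z, mats):
    """Evaluate mu[A_1, ..., A_m](z) from formula (3), for 2x2 matrices only."""
    value = Fraction(0)
    for eps in product((-1, 0, 1), repeat=len(mats)):
        weight = Fraction(1)
        for e in eps:
            weight *= WEIGHTS[e]
        value += weight * shifted_det(z, eps, mats)
    return value

def char_poly(z, a):
    """p_A(z) = det(z - A)."""
    return det2([[z - a[0][0], -a[0][1]], [-a[1][0], z - a[1][1]]])

def rank_one(v):
    return [[v[0] * v[0], v[0] * v[1]], [v[1] * v[0], v[1] * v[1]]]

mats = [rank_one(v) for v in [(1, 2), (3, -1), (0, 1)]]
total = [[sum(a[r][s] for a in mats) for s in range(2)] for r in range(2)]
checks = [mixed_char_poly(z, mats) == char_poly(z, total) for z in range(-2, 6)]
print(all(checks))

# The rank one hypothesis matters: for A_1 = I the two polynomials differ.
identity = [[1, 0], [0, 1]]
print(mixed_char_poly(0, [identity]), char_poly(0, identity))
```

Since both sides are polynomials of degree ${2}$ in ${z}$, agreement at the eight sample points already forces agreement everywhere in this example.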

Among other things, this formula gives a useful representation of the mean characteristic polynomial ${\mathop{\bf E} p_A}$:

Corollary 4 (Random multilinearisation formula) Let ${A}$ be the sum of jointly independent rank one ${d \times d}$ matrices ${A_1,\ldots,A_m}$. Then we have

$\displaystyle \mathop{\bf E} p_A(z) = \mu[ \mathop{\bf E} A_1, \ldots, \mathop{\bf E} A_m ](z) \ \ \ \ \ (4)$

for all ${z \in {\bf C}}$.

Proof: For fixed ${z}$, the expression ${\hbox{det}( z + \sum_{i=1}^m z_i A_i )}$ is a polynomial combination of the ${z_i A_i}$, while the differential operator ${(\prod_{i=1}^m (1 - \frac{\partial}{\partial z_i}))}$ is a linear combination of differential operators ${\frac{\partial^j}{\partial z_{i_1} \ldots \partial z_{i_j}}}$ for ${1 \leq i_1 < \ldots < i_j \leq m}$. As a consequence, we may expand (3) as a linear combination of terms, each of which is a multilinear combination of ${A_{i_1},\ldots,A_{i_j}}$ for some ${1 \leq i_1 < \ldots < i_j \leq m}$. Taking expectations of both sides of (2) and using the joint independence of the ${A_i}$, we obtain the claim. $\Box$

In view of Proposition 2, we can now hope to control the operator norm ${\|A\|_{op}}$ of certain special types of random matrices ${A}$ (and specifically, the sum of independent Hermitian positive semi-definite rank one matrices) by first controlling the mean ${\mathop{\bf E} p_A}$ of the random characteristic polynomial ${p_A}$. Pursuing this philosophy, Marcus, Spielman, and Srivastava establish the following result, which they then use to prove the Kadison-Singer conjecture:

Theorem 5 (Marcus-Spielman-Srivastava theorem) Let ${m,d \geq 1}$. Let ${v_1,\ldots,v_m \in {\bf C}^d}$ be jointly independent random vectors in ${{\bf C}^d}$, with each ${v_i}$ taking a finite number of values. Suppose that we have the normalisation

$\displaystyle \mathop{\bf E} \sum_{i=1}^m v_i v_i^* = 1$

where we are using the convention that ${1}$ is the ${d \times d}$ identity matrix ${I_d}$ whenever necessary. Suppose also that we have the smallness condition

$\displaystyle \mathop{\bf E} \|v_i\|^2 \leq \varepsilon$

for some ${\varepsilon>0}$ and all ${i=1,\ldots,m}$. Then one has

$\displaystyle \| \sum_{i=1}^m v_i v_i^* \|_{op} \leq (1+\sqrt{\varepsilon})^2 \ \ \ \ \ (5)$

with positive probability.

Note that the upper bound in (5) must be at least ${1}$ (by taking ${v_i}$ to be deterministic) and also must be at least ${\varepsilon}$ (by taking the ${v_i}$ to always have magnitude at least ${\sqrt{\varepsilon}}$). Thus the bound in (5) is asymptotically tight both in the regime ${\varepsilon\rightarrow 0}$ and in the regime ${\varepsilon \rightarrow \infty}$; the latter regime will be particularly useful for applications to Kadison-Singer. It should also be noted that if one uses more traditional random matrix theory methods (based on tools such as Proposition 1, as well as more sophisticated variants of these tools, such as the concentration of measure results of Rudelson and Ahlswede-Winter), one obtains a bound of ${\| \sum_{i=1}^m v_i v_i^* \|_{op} \ll_\varepsilon \log d}$ with high probability, which is insufficient for the application to the Kadison-Singer problem; see this article of Tropp. Thus, Theorem 5 obtains a sharper bound, at the cost of trading in “high probability” for “positive probability”.
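For a concrete (and deliberately simple) instance of Theorem 5, take ${d=2}$ and let each ${v_i}$ be ${c e_1}$ or ${c e_2}$ with probability ${1/2}$ each, where ${c^2 = 2/m}$. This toy distribution and the value ${m = 12}$ are assumptions made for the example; they are chosen so that the normalisation holds and the sum is always diagonal, which lets us compute everything exactly.

```python
import math
from fractions import Fraction

m = 12                   # number of random vectors (illustrative choice)
c2 = Fraction(2, m)      # squared length c^2 of every v_i
eps = c2                 # E |v_i|^2 = c^2
bound = (1 + math.sqrt(eps)) ** 2

# E v_i v_i^T = (c^2 / 2) I, so the expected sum is (m c^2 / 2) I.
mean_diag = m * c2 / 2

prob_within_bound = Fraction(0)
norms = set()
for k in range(m + 1):               # k = number of vectors pointing along e_1
    norm = c2 * max(k, m - k)        # the sum equals diag(k c^2, (m - k) c^2)
    norms.add(norm)
    if norm <= bound:
        prob_within_bound += Fraction(math.comb(m, k), 2 ** m)

print(bound, sorted(norms), prob_within_bound)
```

In this example the bound ${(1+\sqrt{\varepsilon})^2 \approx 1.98}$ fails only in the two outcomes where all twelve vectors point the same way (operator norm ${2}$), so (5) holds with probability ${2047/2048}$, while the best outcome attains operator norm exactly ${1}$.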

In the paper of Marcus, Spielman and Srivastava, Theorem 5 is used to deduce a conjecture ${KS_2}$ of Weaver, which was already known to imply the Kadison-Singer conjecture; actually, a slight modification of their argument gives the paving conjecture of Kadison and Singer, from which the original Kadison-Singer conjecture may be readily deduced. We give these implications below the fold. (See also this survey article for some background on the Kadison-Singer problem.)

Let us now summarise how Theorem 5 is proven. In the spirit of semi-definite programming, we rephrase the above theorem in terms of the rank one Hermitian positive semi-definite matrices ${A_i := v_iv_i^*}$:

Theorem 6 (Marcus-Spielman-Srivastava theorem again) Let ${A_1,\ldots,A_m}$ be jointly independent random rank one Hermitian positive semi-definite ${d \times d}$ matrices such that the sum ${A :=\sum_{i=1}^m A_i}$ has mean

$\displaystyle \mathop{\bf E} A = I_d$

and such that

$\displaystyle \mathop{\bf E} \hbox{tr} A_i \leq \varepsilon$

for some ${\varepsilon>0}$ and all ${i=1,\ldots,m}$. Then one has

$\displaystyle \| A \|_{op} \leq (1+\sqrt{\varepsilon})^2$

with positive probability.

In view of (1) and Proposition 2, this theorem follows from the following control on the mean characteristic polynomial:

Theorem 7 (Control of mean characteristic polynomial) Let ${A_1,\ldots,A_m}$ be jointly independent random rank one Hermitian positive semi-definite ${d \times d}$ matrices such that the sum ${A :=\sum_{i=1}^m A_i}$ has mean

$\displaystyle \mathop{\bf E} A = 1$

and such that

$\displaystyle \mathop{\bf E} \hbox{tr} A_i \leq \varepsilon$

for some ${\varepsilon>0}$ and all ${i=1,\ldots,m}$. Then one has

$\displaystyle \hbox{maxroot}(\mathop{\bf E} p_A) \leq (1 +\sqrt{\varepsilon})^2.$

This result is proven using the multilinearisation formula (Corollary 4) and some convexity properties of real stable polynomials; we give the proof below the fold.

Thanks to Adam Marcus, Assaf Naor and Sorin Popa for many useful explanations on various aspects of the Kadison-Singer problem.

A finite group ${G=(G,\cdot)}$ is said to be a Frobenius group if there is a non-trivial subgroup ${H}$ of ${G}$ (known as the Frobenius complement of ${G}$) such that the conjugates ${gHg^{-1}}$ of ${H}$ are “as disjoint as possible” in the sense that ${H \cap gHg^{-1} = \{1\}}$ whenever ${g \not \in H}$. This gives a decomposition

$\displaystyle G = \bigcup_{gH \in G/H} (gHg^{-1} \backslash \{1\}) \cup K \ \ \ \ \ (1)$

where the Frobenius kernel ${K}$ of ${G}$ is defined as the identity element ${1}$ together with all the non-identity elements that are not conjugate to any element of ${H}$. Taking cardinalities, we conclude that

$\displaystyle |G| = \frac{|G|}{|H|} (|H| - 1) + |K|$

and hence

$\displaystyle |H| |K| = |G|. \ \ \ \ \ (2)$
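The smallest example is worth keeping in mind: the symmetric group ${S_3}$, with ${H}$ generated by a transposition. The following brute-force sketch (permutations are encoded as tuples, and the helper names are ad hoc) verifies the Frobenius condition, computes the kernel from its definition, and checks (2).

```python
from itertools import permutations

def compose(p, q):
    """Composition of permutations: (p q)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def conjugate(g, h):
    """g h g^{-1}."""
    return compose(compose(g, h), inverse(g))

G = list(permutations(range(3)))   # the symmetric group S_3
identity = (0, 1, 2)
H = [identity, (1, 0, 2)]          # subgroup generated by a transposition

# Frobenius condition: H meets g H g^{-1} only in the identity when g is not in H.
is_frobenius = all(
    set(H) & {conjugate(g, h) for h in H} == {identity}
    for g in G if g not in H
)

# Kernel: the identity, plus everything not conjugate to a non-identity element of H.
conjugates_of_H = {conjugate(g, h) for g in G for h in H if h != identity}
K = [g for g in G if g == identity or g not in conjugates_of_H]

K_is_normal = all(conjugate(g, k) in K for g in G for k in K)
print(is_frobenius, sorted(K), len(H) * len(K) == len(G), K_is_normal)
```

Here ${K}$ comes out as the alternating group ${A_3}$ (the identity and the two ${3}$-cycles), which is visibly normal; the content of Frobenius' theorem is that this normality persists in general.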

A remarkable theorem of Frobenius gives an unexpected amount of structure on ${K}$ and hence on ${G}$:

Theorem 1 (Frobenius’ theorem) Let ${G}$ be a Frobenius group with Frobenius complement ${H}$ and Frobenius kernel ${K}$. Then ${K}$ is a normal subgroup of ${G}$, and hence (by (2) and the disjointness of ${H}$ and ${K}$ outside the identity) ${G}$ is the semidirect product ${K \rtimes H}$ of ${H}$ and ${K}$.

I discussed Frobenius’ theorem and its proof in this recent blog post. This proof uses the theory of characters on a finite group ${G}$, in particular relying on the fact that a character on a subgroup ${H}$ can induce a character on ${G}$, which can then be decomposed into irreducible characters with natural number coefficients. Remarkably, even though a century has passed since Frobenius’ original argument, there is no proof known of this theorem which avoids character theory entirely; there are elementary proofs known when the complement ${H}$ has even order or when ${H}$ is solvable (we review both of these cases below the fold), which by the Feit-Thompson theorem does cover all the cases, but the proof of the Feit-Thompson theorem involves plenty of character theory (and also relies on Theorem 1). (The answers to this MathOverflow question give a good overview of the current state of affairs.)

I have been playing around recently with the problem of finding a character-free proof of Frobenius’ theorem. I didn’t succeed in obtaining a completely elementary proof, but I did find an argument which replaces character theory (which can be viewed as coming from the representation theory of the non-commutative group algebra ${{\bf C} G \equiv L^2(G)}$) with the Fourier analysis of class functions (i.e. the representation theory of the centre ${Z({\bf C} G) \equiv L^2(G)^G}$ of the group algebra), thus replacing non-commutative representation theory by commutative representation theory. This is not a particularly radical departure from the existing proofs of Frobenius’ theorem, but it did seem to be a new proof which was technically “character-free” (even if it was not all that far from character-based in spirit), so I thought I would record it here.

The main ideas are as follows. The space ${L^2(G)^G}$ of class functions can be viewed as a commutative algebra with respect to the convolution operation ${*}$; as the regular representation is unitary and faithful, this algebra contains no nilpotent elements. As such, (Gelfand-style) Fourier analysis suggests that one can analyse this algebra through the idempotents: class functions ${\phi}$ such that ${\phi*\phi = \phi}$. In terms of characters, idempotents are nothing more than sums of the form ${\sum_{\chi \in \Sigma} \chi(1) \chi}$ for various collections ${\Sigma}$ of characters, but we can perform a fair amount of analysis on idempotents directly without recourse to characters. In particular, it turns out that idempotents enjoy some important integrality properties that can be established without invoking characters: for instance, by taking traces one can check that ${\phi(1)}$ is a natural number, and more generally we will show that ${{\bf E}_{(a,b) \in S} {\bf E}_{x \in G} \phi( a x b^{-1} x^{-1} )}$ is a natural number whenever ${S}$ is a subgroup of ${G \times G}$ (see Corollary 4 below). For instance, the quantity

$\displaystyle \hbox{rank}(\phi) := {\bf E}_{a \in G} {\bf E}_{x \in G} \phi(a xa^{-1} x^{-1})$

is a natural number which we will call the rank of ${\phi}$ (as it is also the linear rank of the transformation ${f \mapsto f*\phi}$ on ${L^2(G)^G}$).
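These integrality claims are easy to test in a small case. The sketch below works in ${S_3}$ and assumes the normalised convolution ${f*g(x) := {\bf E}_{y \in G} f(y) g(y^{-1} x)}$ (the normalisation is not spelled out above, so this is an assumption of the example). The test idempotent is ${\phi = \chi(1) \chi}$ with ${\chi(g)}$ equal to the number of fixed points of ${g}$ minus one; characters are used here only to produce a test case.

```python
from fractions import Fraction
from itertools import permutations

G = list(permutations(range(3)))   # the symmetric group S_3

def mul(p, q):
    """Composition of permutations: (p q)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(3))

def inv(p):
    """Inverse permutation: position j holds the i with p[i] = j."""
    return tuple(sorted(range(3), key=lambda i: p[i]))

def phi(g):
    """Class function 2 * chi, where chi(g) = (number of fixed points) - 1."""
    fixed = sum(1 for i in range(3) if g[i] == i)
    return 2 * (fixed - 1)

def convolve(f1, f2):
    """Normalised convolution (f1 * f2)(x) = E_y f1(y) f2(y^{-1} x)."""
    return lambda x: sum(Fraction(f1(y) * f2(mul(inv(y), x))) for y in G) / len(G)

def commutator(a, x):
    """a x a^{-1} x^{-1}."""
    return mul(mul(a, x), mul(inv(a), inv(x)))

is_idempotent = all(convolve(phi, phi)(x) == phi(x) for x in G)
rank = sum(Fraction(phi(commutator(a, x))) for a in G for x in G) / len(G) ** 2
print(is_idempotent, phi((0, 1, 2)), rank)
```

For this ${\phi}$ one finds ${\phi(1) = 4}$ and ${\hbox{rank}(\phi) = 1}$, both natural numbers as claimed; the value ${1}$ reflects the fact that ${\phi}$ comes from a single irreducible character.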

In the case that ${G}$ is a Frobenius group with kernel ${K}$, the above integrality properties can be used after some elementary manipulations to establish that for any idempotent ${\phi}$, the quantity

$\displaystyle \frac{1}{|G|} \sum_{a \in K} {\bf E}_{x \in G} \phi( axa^{-1}x^{-1} ) - \frac{1}{|G| |K|} \sum_{a,b \in K} \phi(ab^{-1}) \ \ \ \ \ (3)$

is an integer. On the other hand, one can also show by elementary means that this quantity lies between ${0}$ and ${\hbox{rank}(\phi)}$. These two facts are not strong enough on their own to impose much further structure on ${\phi}$, unless one restricts attention to minimal idempotents ${\phi}$. In this case spectral theory (or Gelfand theory, or the fundamental theorem of algebra) tells us that ${\phi}$ has rank one, and then the integrality gap comes into play and forces the quantity (3) to always be either zero or one. This can be used to imply that the convolution action of every minimal idempotent ${\phi}$ either preserves ${\frac{|G|}{|K|} 1_K}$ or annihilates it, which makes ${\frac{|G|}{|K|} 1_K}$ itself an idempotent, which makes ${K}$ normal.

Two weeks ago I was at Oberwolfach, for the Arbeitsgemeinschaft in Ergodic Theory and Combinatorial Number Theory that I was one of the organisers for. At this workshop, I learned the details of a very nice recent convergence result of Miguel Walsh (who, incidentally, is an informal grandstudent of mine, as his advisor, Roman Sasyk, was my informal student), which considerably strengthens and generalises a number of previous convergence results in ergodic theory (including one of my own), with a remarkably simple proof. Walsh’s argument is phrased in a finitary language (somewhat similar, in fact, to the approach used in my paper mentioned previously), and (among other things) relies on the concept of metastability of sequences, a variant of the notion of convergence which is useful in situations in which one does not expect a uniform convergence rate; see this previous blog post for some discussion of metastability. When interpreted in a finitary setting, this concept requires a fair amount of “epsilon management” to manipulate; also, Walsh’s argument uses some other epsilon-intensive finitary arguments, such as a decomposition lemma of Gowers based on the Hahn-Banach theorem. As such, I was tempted to try to rewrite Walsh’s argument in the language of nonstandard analysis to see the extent to which these sorts of issues could be managed. As it turns out, the argument gets cleaned up rather nicely, with the notion of metastability being replaced with the simpler notion of external Cauchy convergence (which we will define below the fold).

Let’s first state Walsh’s theorem. This theorem is a norm convergence theorem in ergodic theory, and can be viewed as a substantial generalisation of one of the most fundamental theorems of this type, namely the mean ergodic theorem:

Theorem 1 (Mean ergodic theorem) Let ${(X,\mu,T)}$ be a measure-preserving system (a probability space ${(X,\mu)}$ with an invertible measure-preserving transformation ${T}$). Then for any ${f \in L^2(X,\mu)}$, the averages ${\frac{1}{N} \sum_{n=1}^N T^n f}$ converge in ${L^2(X,\mu)}$ norm as ${N \rightarrow \infty}$, where ${T^n f(x) := f(T^{-n} x)}$.

In this post, all functions in ${L^2(X,\mu)}$ and similar spaces will be taken to be real instead of complex-valued for simplicity, though the extension to the complex setting is routine.

Actually, we have a precise description of the limit of these averages, namely the orthogonal projection of ${f}$ to the ${T}$-invariant factors. (See for instance my lecture notes on this theorem.) While this theorem ostensibly involves measure theory, it can be abstracted to the more general setting of unitary operators on a Hilbert space:

Theorem 2 (von Neumann mean ergodic theorem) Let ${H}$ be a Hilbert space, and let ${U: H \rightarrow H}$ be a unitary operator on ${H}$. Then for any ${f \in H}$, the averages ${\frac{1}{N} \sum_{n=1}^N U^n f}$ converge strongly in ${H}$ as ${N \rightarrow \infty}$.

Again, see my lecture notes (or just about any text in ergodic theory) for a proof.
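The simplest illustration of Theorem 2 is a diagonal unitary on ${{\bf C}^2}$ with one invariant direction and one rotating direction. In the sketch below the angle `theta` and the vector `f` are arbitrary choices; the averages converge to the projection of ${f}$ onto the invariant vectors, with an error of size ${O(1/N)}$ coming from a geometric series.

```python
import cmath

theta = 1.0
eig = cmath.exp(1j * theta)   # U = diag(1, e^{i theta}) acting on C^2

def ergodic_average(f, N):
    """(1/N) sum_{n=1}^N U^n f for the diagonal unitary U."""
    first = f[0]  # the first coordinate is U-invariant
    second = f[1] * sum(eig ** n for n in range(1, N + 1)) / N
    return (first, second)

f = (1.0, 1.0)
# Distance from the limit (1, 0), which is the projection onto invariant vectors.
errors = {N: abs(ergodic_average(f, N)[1]) for N in (10, 100, 1000, 10000)}
for N, err in errors.items():
    print(N, err, 2 / (N * abs(1 - eig)))  # error and geometric series bound
```

The same computation on each eigenspace is essentially how the theorem is proven for unitaries with pure point spectrum; the general case needs the spectral theorem or an orthogonality argument.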

Now we turn to Walsh’s theorem.

Theorem 3 (Walsh’s convergence theorem) Let ${(X,\mu)}$ be a measure space with a measure-preserving action of a nilpotent group ${G}$. Let ${g_1,\ldots,g_k: {\bf Z} \rightarrow G}$ be polynomial sequences in ${G}$ (i.e. each ${g_i}$ takes the form ${g_i(n) = a_{i,1}^{p_{i,1}(n)} \ldots a_{i,j}^{p_{i,j}(n)}}$ for some ${a_{i,1},\ldots,a_{i,j} \in G}$ and polynomials ${p_{i,1},\ldots,p_{i,j}: {\bf Z} \rightarrow {\bf Z}}$). Then for any ${f_1,\ldots,f_k \in L^\infty(X,\mu)}$, the averages ${\frac{1}{N} \sum_{n=1}^N (g_1(n) f_1) \ldots (g_k(n) f_k)}$ converge in ${L^2(X,\mu)}$ norm as ${N \rightarrow \infty}$, where ${g(n) f(x) := f(g(n)^{-1} x)}$.

It turns out that this theorem can also be abstracted to some extent, although due to the multiplication in the summand ${(g_1(n) f_1) \ldots (g_k(n) f_k)}$, one cannot work purely with Hilbert spaces as in the von Neumann mean ergodic theorem, but must also work with something like the Banach algebra ${L^\infty(X,\mu)}$. There are a number of ways to formulate this abstraction (which will be of some minor convenience to us, as it will allow us to reduce the need to invoke the nonstandard measure theory of Loeb, discussed for instance in this blog post); we will use the notion of a (real) commutative probability space ${({\mathcal A},\tau)}$, which for us will be a commutative unital algebra ${{\mathcal A}}$ over the reals together with a linear functional ${\tau: {\mathcal A} \rightarrow {\bf R}}$ which maps ${1}$ to ${1}$ and obeys the non-negativity axiom ${\tau(f^2) \ge 0}$ for all ${f}$. The key example to keep in mind here is the algebra ${{\mathcal A} = L^\infty(X,\mu)}$ of essentially bounded real-valued measurable functions with the supremum norm, and with the trace ${\tau(f) := \int_X f\ d\mu}$. We will also assume in our definition of commutative probability spaces that all elements ${f}$ of ${{\mathcal A}}$ are bounded in the sense that the spectral radius ${\rho(f) := \lim_{k \rightarrow \infty} \tau(f^{2k})^{1/2k}}$ is finite. (In the concrete case of ${L^\infty(X,\mu)}$, the spectral radius is just the ${L^\infty}$ norm.)

Given a commutative probability space, we can form an inner product ${\langle, \rangle_{L^2(\tau)}}$ on it by the formula

$\displaystyle \langle f, g \rangle_{L^2(\tau)} := \tau(fg).$

This is a positive semi-definite form, and gives a (possibly degenerate) inner product structure on ${{\mathcal A}}$. We could complete this structure into a Hilbert space ${L^2(\tau)}$ (after quotienting out the elements of zero norm), but we will not do so here, instead just viewing ${L^2(\tau)}$ as providing a semi-metric on ${{\mathcal A}}$. For future reference we record the inequalities

$\displaystyle \rho(fg) \leq \rho(f) \rho(g)$

$\displaystyle \rho(f+g) \leq \rho(f) + \rho(g)$

$\displaystyle \| fg\|_{L^2(\tau)} \leq \|f\|_{L^2(\tau)} \rho(g)$

for any ${f,g}$, which we will use in the sequel without further comment; see e.g. these previous blog notes for proofs. (Actually, for the purposes of proving Theorem 3, one can specialise to the ${L^\infty(X,\mu)}$ case (and ultraproducts thereof), in which case these inequalities are just the triangle and Hölder inequalities.)
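In the most concrete case, ${{\mathcal A}}$ is the algebra of real functions on a finite set with the uniform probability measure, and everything can be computed directly. The sketch below (with arbitrary sample vectors `f`, `g`) checks that ${\tau(f^{2k})^{1/2k}}$ approaches ${\max |f|}$ and that the three inequalities hold in this example.

```python
import math

def tau(f):
    """Trace: the average of f over a finite set with uniform measure."""
    return sum(f) / len(f)

def mul(f, g):
    return [a * b for a, b in zip(f, g)]

def add(f, g):
    return [a + b for a, b in zip(f, g)]

def l2_norm(f):
    return math.sqrt(tau(mul(f, f)))

def rho(f):
    """Spectral radius, which here is just the sup norm."""
    return max(abs(a) for a in f)

def rho_approx(f, k):
    """tau(f^{2k})^{1/(2k)}, which tends to max|f| as k grows."""
    return tau([a ** (2 * k) for a in f]) ** (1.0 / (2 * k))

f = [3.0, -1.0, 0.5, 2.0]
g = [1.0, 4.0, -2.0, 0.0]

print(rho(f), [rho_approx(f, k) for k in (1, 10, 200)])
print(rho(mul(f, g)) <= rho(f) * rho(g))
print(rho(add(f, g)) <= rho(f) + rho(g))
print(l2_norm(mul(f, g)) <= l2_norm(f) * rho(g))
```

Note that the convergence of ${\tau(f^{2k})^{1/2k}}$ to ${\rho(f)}$ is fairly slow: on a four-point space the ratio is ${4^{-1/2k}}$ up to lower order terms.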

The abstract version of Theorem 3 is then

Theorem 4 (Walsh’s theorem, abstract version) Let ${({\mathcal A},\tau)}$ be a commutative probability space, and let ${G}$ be a nilpotent group acting on ${{\mathcal A}}$ by isomorphisms (preserving the algebra, conjugation, and trace structure, and thus also preserving the spectral radius and ${L^2(\tau)}$ norm). Let ${g_1,\ldots,g_k: {\bf Z} \rightarrow G}$ be polynomial sequences. Then for any ${f_1,\ldots,f_k \in {\mathcal A}}$, the averages ${\frac{1}{N} \sum_{n=1}^N (g_1(n) f_1) \ldots (g_k(n) f_k)}$ form a Cauchy sequence in ${L^2(\tau)}$ (semi-)norm as ${N \rightarrow \infty}$.

It is easy to see that this theorem generalises Theorem 3. Conversely, one can use the commutative Gelfand-Naimark theorem to deduce Theorem 4 from Theorem 3, although we will not need this implication. Note how we are abandoning all attempts to discern what the limit of the sequence actually is, instead contenting ourselves with demonstrating that it is merely a Cauchy sequence. With this phrasing, it is tempting to ask whether there is any analogue of Walsh’s theorem for noncommutative probability spaces, but unfortunately the answer to that question is negative for all but the simplest of averages, as was worked out in this paper of Austin, Eisner, and myself.

Our proof of Theorem 4 will proceed as follows. Firstly, in order to avoid the epsilon management alluded to earlier, we will take an ultraproduct to rephrase the theorem in the language of nonstandard analysis; for reasons that will be clearer later, we will also convert the convergence problem to a problem of obtaining metastability (external Cauchy convergence). Then, we observe that (the nonstandard counterpart of) the expression ${\|\frac{1}{N} \sum_{n=1}^N (g_1(n) f_1) \ldots (g_k(n) f_k)\|_{L^2(\tau)}^2}$ can be viewed as the inner product of (say) ${f_k}$ with a certain type of expression, which we call a dual function. By performing an orthogonal projection to the span of the dual functions, we can split ${f_k}$ into the sum of an expression orthogonal to all dual functions (the “pseudorandom” component), and a function that can be well approximated by finite linear combinations of dual functions (the “structured” component). The contribution of the pseudorandom component is asymptotically negligible, so we can reduce to consideration of the structured component. But by a little bit of rearrangement, this can be viewed as an average of expressions similar to the initial average ${\frac{1}{N} \sum_{n=1}^N (g_1(n) f_1) \ldots (g_k(n) f_k)}$, except with the polynomials ${g_1,\ldots,g_k}$ replaced by a “lower complexity” set of such polynomials, which can be greater in number, but which have slightly lower degrees in some sense. One can iterate this (using “PET induction”) until all the polynomials become trivial, at which point the claim follows.

One of the basic problems in the field of operator algebras is to develop a functional calculus for either a single operator ${A}$, or a collection ${A_1, A_2, \ldots, A_k}$ of operators. These operators could in principle act on any function space, but typically one either considers complex matrices (which act on a complex finite dimensional space), or operators (either bounded or unbounded) on a complex Hilbert space. (One can of course also obtain analogous results for real operators, but we will work throughout with complex operators in this post.)

Roughly speaking, a functional calculus is a way to assign an operator ${F(A)}$ or ${F(A_1,\ldots,A_k)}$ to any function ${F}$ in a suitable function space, which is linear over the complex numbers, preserves the scalars (i.e. ${c(A) = c}$ when ${c \in {\bf C}}$), and should be either an exact or approximate homomorphism in the sense that

$\displaystyle FG(A_1,\ldots,A_k) = F(A_1,\ldots,A_k) G(A_1,\ldots,A_k), \ \ \ \ \ (1)$

should hold either exactly or approximately. In the case when the ${A_i}$ are self-adjoint operators acting on a Hilbert space (or Hermitian matrices), one often also desires the identity

$\displaystyle \overline{F}(A_1,\ldots,A_k) = F(A_1,\ldots,A_k)^* \ \ \ \ \ (2)$

to also hold either exactly or approximately. (Note that one cannot reasonably expect (1) and (2) to hold exactly for all ${F,G}$ if the ${A_1,\ldots,A_k}$ and their adjoints ${A_1^*,\ldots,A_k^*}$ do not commute with each other, so in those cases one has to be willing to allow some error terms in the above wish list of properties of the calculus.) Ideally, one should also be able to relate the operator norm of ${F(A)}$ or ${F(A_1,\ldots,A_k)}$ with something like the uniform norm on ${F}$. In principle, the existence of a good functional calculus allows one to manipulate operators as if they were scalars (or at least approximately as if they were scalars), which is very helpful for a number of applications, such as partial differential equations, spectral theory, noncommutative probability, and semiclassical mechanics. A functional calculus for multiple operators ${A_1,\ldots,A_k}$ can be particularly valuable as it allows one to treat ${A_1,\ldots,A_k}$ as being exact or approximate scalars simultaneously. For instance, if one is trying to solve a linear differential equation that can (formally at least) be expressed in the form

$\displaystyle F(X,D) u = f$

for some data ${f}$, unknown function ${u}$, some differential operators ${X,D}$, and some nice function ${F}$, then if one’s functional calculus is good enough (and ${F}$ is suitably “elliptic” in the sense that it does not vanish or otherwise degenerate too often), one should be able to solve this equation either exactly or approximately by the formula

$\displaystyle u = F^{-1}(X,D) f,$

which is of course how one would solve this equation if one pretended that the operators ${X,D}$ were in fact scalars. Formalising this calculus rigorously leads to the theory of pseudodifferential operators, which allows one to (approximately) solve or at least simplify a much wider range of differential equations than what one can achieve with more elementary algebraic transformations (e.g. integrating factors, change of variables, variation of parameters, etc.). In quantum mechanics, a functional calculus that allows one to treat operators as if they were approximately scalar can be used to rigorously justify the correspondence principle in physics, namely that the predictions of quantum mechanics approximate those of classical mechanics in the semiclassical limit ${\hbar \rightarrow 0}$.

There is no universal functional calculus that works in all situations; the strongest functional calculi, which are close to being exact *-homomorphisms on a very large class of functions, tend to only work under very restrictive hypotheses on ${A}$ or ${A_1,\ldots,A_k}$ (in particular, when ${k > 1}$, one needs the ${A_1,\ldots,A_k}$ to commute either exactly, or very close to exactly), while there are weaker functional calculi which have fewer nice properties and only work for a very small class of functions, but can be applied to quite general operators ${A}$ or ${A_1,\ldots,A_k}$. In some cases the functional calculus is only formal, in the sense that ${f(A)}$ or ${f(A_1,\ldots,A_k)}$ has to be interpreted as an infinite formal series that does not converge in a traditional sense. Also, when one wishes to select a functional calculus on non-commuting operators ${A_1,\ldots,A_k}$, there is a certain amount of non-uniqueness: one generally has a number of slightly different functional calculi to choose from, which generally have the same properties but differ in some minor technical details (particularly with regard to the behaviour of “lower order” components of the calculus). This is similar to how one has a variety of slightly different coordinate systems available to parameterise a Riemannian manifold or Lie group. This is in contrast to the ${k=1}$ case when the underlying operator ${A = A_1}$ is (essentially) normal (so that ${A}$ commutes with ${A^*}$); in this special case (which includes the important subcases when ${A}$ is unitary or (essentially) self-adjoint), spectral theory gives us a canonical and very powerful functional calculus which can be used without further modification in applications.
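As a small illustration of the canonical ${k=1}$ calculus mentioned at the end of the previous paragraph, here is a sketch for a finite-dimensional Hermitian matrix, where ${f(A)}$ is defined by applying ${f}$ to the eigenvalues. The helper name `apply_function` and the choice of test functions ${\cos}$ and ${\exp}$ are assumptions made purely for this example. The sketch checks numerically that ${f(A) g(A) = (fg)(A)}$ and that the operator norm of ${f(A)}$ equals the maximum of ${|f|}$ on the spectrum.

```python
import numpy as np

def apply_function(f, a):
    """Return f(a) for a Hermitian matrix a via its eigendecomposition."""
    eigenvalues, u = np.linalg.eigh(a)
    # Scale each eigenvector column by f(eigenvalue), then change basis back.
    return (u * f(eigenvalues)) @ u.conj().T

rng = np.random.default_rng(0)
m = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
a = (m + m.conj().T) / 2  # a random Hermitian matrix

f, g = np.cos, np.exp
product_of_images = apply_function(f, a) @ apply_function(g, a)
image_of_product = apply_function(lambda t: f(t) * g(t), a)
homomorphism_gap = np.linalg.norm(product_of_images - image_of_product)

# The operator norm of f(A) equals the maximum of |f| over the spectrum.
op_norm = np.linalg.norm(apply_function(f, a), 2)
sup_on_spectrum = np.max(np.abs(f(np.linalg.eigvalsh(a))))
```

Both `homomorphism_gap` and the difference `op_norm - sup_on_spectrum` should vanish up to rounding error; it is exactly these identities that one has to relax once several non-commuting operators are involved.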

Despite this lack of uniqueness, there is one standard choice for a functional calculus available for general operators ${A_1,\ldots,A_k}$, namely the Weyl functional calculus; it is analogous in some ways to normal coordinates for Riemannian manifolds, or exponential coordinates of the first kind for Lie groups, in that it treats lower order terms in a reasonably nice fashion. (But it is important to keep in mind that, like its analogues in Riemannian geometry or Lie theory, there will be some instances in which the Weyl calculus is not the optimal calculus to use for the application at hand.)

I decided to write some notes on the Weyl functional calculus (also known as Weyl quantisation), and to sketch the applications of this calculus to the theory of pseudodifferential operators. They are mostly for my own benefit (so that I won’t have to redo these particular calculations again), but perhaps they will also be of interest to some readers here. (Of course, this material is also covered in many other places, e.g. Folland’s “Harmonic Analysis in Phase Space”.)

Let ${A, B}$ be two Hermitian ${n \times n}$ matrices. When ${A}$ and ${B}$ commute, we have the identity

$\displaystyle e^{A+B} = e^A e^B.$

When ${A}$ and ${B}$ do not commute, the situation is more complicated; we have the Baker-Campbell-Hausdorff formula

$\displaystyle e^{A+B} = e^A e^B e^{-\frac{1}{2}[A,B]} \ldots$

where the infinite product here is explicit but very messy. On the other hand, taking determinants we still have the identity

$\displaystyle \hbox{det}(e^{A+B}) = \hbox{det}(e^A e^B).$

Recently I learned (from Emmanuel Candes, who in turn learned it from David Gross) that there is another very nice relationship between ${e^{A+B}}$ and ${e^A e^B}$, namely the Golden-Thompson inequality

$\displaystyle \hbox{tr}(e^{A+B}) \leq \hbox{tr}(e^A e^B). \ \ \ \ \ (1)$

The remarkable thing about this inequality is that no commutativity hypotheses whatsoever on the matrices ${A, B}$ are required. Note that the right-hand side can be rearranged using the cyclic property of trace as ${\hbox{tr}( e^{B/2} e^A e^{B/2} )}$; the expression inside the trace is positive definite so the right-hand side is positive. (On the other hand, there is no reason why expressions such as ${\hbox{tr}(e^A e^B e^C)}$ need to be positive or even real, so the obvious extension of the Golden-Thompson inequality to three or more Hermitian matrices fails.) I am told that this inequality is quite useful in statistical mechanics, although I do not know the details of this.
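Both the determinant identity and the Golden-Thompson inequality (1) are easy to test numerically. The following is a minimal sketch, assuming random ${4 \times 4}$ Hermitian matrices with Gaussian entries as the test ensemble (a choice made only for this illustration); the helper names `random_hermitian` and `exp_hermitian` are hypothetical.

```python
import numpy as np

def random_hermitian(n, rng):
    """Draw a random n x n Hermitian matrix (hypothetical test ensemble)."""
    m = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (m + m.conj().T) / 2

def exp_hermitian(h):
    """Matrix exponential of a Hermitian matrix via its eigendecomposition."""
    eigenvalues, u = np.linalg.eigh(h)
    return (u * np.exp(eigenvalues)) @ u.conj().T

rng = np.random.default_rng(1)
relative_gaps = []
det_errors = []
for _ in range(200):
    a, b = random_hermitian(4, rng), random_hermitian(4, rng)
    exp_sum = exp_hermitian(a + b)
    exp_product = exp_hermitian(a) @ exp_hermitian(b)
    lhs = np.trace(exp_sum).real      # tr(e^{A+B})
    rhs = np.trace(exp_product).real  # tr(e^A e^B), real and positive
    relative_gaps.append((rhs - lhs) / rhs)
    det_errors.append(abs(np.linalg.det(exp_sum) / np.linalg.det(exp_product) - 1))

smallest_gap = min(relative_gaps)      # Golden-Thompson predicts this is >= 0
largest_det_error = max(det_errors)    # the determinants should agree
```

A numerical experiment of this kind is of course no substitute for a proof, but it is a quick sanity check that the inequality goes in the stated direction.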

To get a sense of how delicate the Golden-Thompson inequality is, let us expand both sides to fourth order in ${A, B}$. The left-hand side expands as

$\displaystyle \hbox{tr} 1 + \hbox{tr} (A+B) + \frac{1}{2} \hbox{tr} (A^2 + AB + BA + B^2) + \frac{1}{6} \hbox{tr} (A+B)^3$

$\displaystyle + \frac{1}{24} \hbox{tr} (A+B)^4 + \ldots$

while the right-hand side expands as

$\displaystyle \hbox{tr} 1 + \hbox{tr} (A+B) + \frac{1}{2} \hbox{tr} (A^2 + 2AB + B^2)$

$\displaystyle + \frac{1}{6} \hbox{tr} (A^3 + 3A^2 B + 3 A B^2+B^3) +$

$\displaystyle \frac{1}{24} \hbox{tr} (A^4 + 4 A^3 B + 6 A^2 B^2 + 4 A B^3 +B^4) + \ldots$

Using the cyclic property of trace ${\hbox{tr}(AB) = \hbox{tr}(BA)}$, one can verify that all terms up to third order agree. Turning to the fourth order terms, one sees after expanding out ${(A+B)^4}$ and using the cyclic property of trace as much as possible that the fourth order terms almost agree, but the left-hand side contains a term ${\frac{1}{12} \hbox{tr}(ABAB)}$ whose counterpart on the right-hand side is ${\frac{1}{12} \hbox{tr}(ABBA)}$. The difference between the two can be factorised (again using the cyclic property of trace) as ${-\frac{1}{24} \hbox{tr} [A,B]^2}$. Since ${[A,B] := AB-BA}$ is skew-Hermitian, ${-[A,B]^2}$ is positive semi-definite, and so we have proven the Golden-Thompson inequality to fourth order. (One could also have used the Cauchy-Schwarz inequality for the Frobenius norm to establish this; see below.)
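For readers who want the fourth order bookkeeping spelled out, here is one way to organise the computation just described, using only the cyclic property of trace. Of the six words in ${(A+B)^4}$ with two factors of ${A}$ and two of ${B}$, four are cyclic rearrangements of ${A^2 B^2}$ and two are cyclic rearrangements of ${ABAB}$, so these mixed terms contribute

$\displaystyle \frac{1}{24} \hbox{tr}( 4 A^2 B^2 + 2 ABAB ) = \frac{1}{6} \hbox{tr}(A^2 B^2) + \frac{1}{12} \hbox{tr}(ABAB)$

to the left-hand side, and

$\displaystyle \frac{6}{24} \hbox{tr}(A^2 B^2) = \frac{1}{6} \hbox{tr}(A^2 B^2) + \frac{1}{12} \hbox{tr}(ABBA)$

to the right-hand side. On the other hand, expanding the commutator gives

$\displaystyle \hbox{tr} [A,B]^2 = 2 \hbox{tr}(ABAB) - 2 \hbox{tr}(ABBA),$

so that the right-hand side exceeds the left-hand side at fourth order by

$\displaystyle \frac{1}{12} \hbox{tr}(ABBA) - \frac{1}{12} \hbox{tr}(ABAB) = -\frac{1}{24} \hbox{tr} [A,B]^2 = \frac{1}{24} \| [A,B] \|_F^2 \geq 0,$

where ${\| \cdot \|_F}$ denotes the Frobenius norm and the last identity uses ${[A,B]^* = -[A,B]}$.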

Intuitively, the Golden-Thompson inequality is asserting that interactions between a pair ${A, B}$ of non-commuting Hermitian matrices are strongest when cross-interactions are kept to a minimum, so that all the ${A}$ factors lie on one side of a product and all the ${B}$ factors lie on the other. Indeed, this theme will be running through the proof of this inequality, to which we now turn.

In this final set of lecture notes for this course, we leave the realm of self-adjoint matrix ensembles, such as Wigner random matrices, and consider instead the simplest examples of non-self-adjoint ensembles, namely the iid matrix ensembles. (I had also hoped to discuss recent progress in eigenvalue spacing distributions of Wigner matrices, but have run out of time. For readers interested in this topic, I can recommend the recent Bourbaki exposé of Alice Guionnet.)

The basic result in this area is

Theorem 1 (Circular law) Let ${M_n}$ be an ${n \times n}$ iid matrix, whose entries ${\xi_{ij}}$, ${1 \leq i,j \leq n}$ are iid with a fixed (complex) distribution ${\xi_{ij} \equiv \xi}$ of mean zero and variance one. Then the spectral measure ${\mu_{\frac{1}{\sqrt{n}} M_n}}$ converges both in probability and almost surely to the circular law ${\mu_{circ} := \frac{1}{\pi} 1_{|x|^2+|y|^2 \leq 1}\ dx dy}$, where ${x, y}$ are the real and imaginary coordinates of the complex plane.
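One can get a feel for the circular law with a quick simulation. The sketch below assumes a specific entry distribution (independent random signs ${\pm 1}$, which have mean zero and variance one) and a specific size ${n = 400}$; both are choices made for this illustration only. Since the circular law is the uniform measure on the unit disk, it predicts that the proportion of eigenvalues of ${\frac{1}{\sqrt{n}} M_n}$ with modulus at most ${r}$ should be close to ${r^2}$ for ${0 \leq r \leq 1}$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400
# iid entries with mean zero and variance one (random signs).
m = rng.choice([-1.0, 1.0], size=(n, n))
eigenvalues = np.linalg.eigvals(m / np.sqrt(n))
radii = np.abs(eigenvalues)

# The circular law predicts a fraction of about r^2 inside the disk of radius r.
fraction_half = np.mean(radii <= 0.5)   # prediction: about 0.25
fraction_unit = np.mean(radii <= 1.05)  # prediction: about 1
```

At this finite value of ${n}$ the agreement is only approximate, and a single sample says nothing about the mode of convergence; the simulation is meant as a sanity check on the statement, not as evidence for the theorem.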

This theorem has a long history; it is analogous to the semi-circular law, but the non-Hermitian nature of the matrices makes the spectrum so unstable that key techniques that are used in the semi-circular case, such as truncation and the moment method, no longer work; significant new ideas are required. In the case of random gaussian matrices, this result was established by Mehta (in the complex case) and by Edelman (in the real case), as was sketched out in Notes. In 1984, Girko laid out a general strategy for establishing the result for non-gaussian matrices, which formed the base of all future work on the subject; however, a key ingredient in the argument, namely a bound on the least singular value of shifts ${\frac{1}{\sqrt{n}} M_n - zI}$, was not fully justified at the time. A rigorous proof of the circular law was then established by Bai, assuming additional moment and boundedness conditions on the individual entries. These additional conditions were then slowly removed in a sequence of papers by Götze-Tikhomirov, Girko, Pan-Zhou, and Tao-Vu, with the last moment condition being removed in a paper of myself, Van Vu, and Manjunath Krishnapur.

At present, the known methods used to establish the circular law for general ensembles rely very heavily on the joint independence of all the entries. It is a key challenge to see how to weaken this joint independence assumption.