You are currently browsing the tag archive for the ‘determinants’ tag.

The determinant {\det(A)} of a square matrix {A} obeys a large number of important identities, the most basic of which is the multiplicativity property

\displaystyle  \det(AB) = \det(A) \det(B) \ \ \ \ \ (1)

whenever {A,B} are square matrices of the same dimension. This identity then generates many other important identities. For instance, if {A} is an {n \times m} matrix and {B} is an {m \times n} matrix, then by applying the previous identity to equate the determinants of {\begin{pmatrix} 1 & -A \\ B & 1 \end{pmatrix} \begin{pmatrix} 1 & A \\ 0 & 1 \end{pmatrix}} and {\begin{pmatrix} 1 & A \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & -A \\ B & 1 \end{pmatrix}} (where we will adopt the convention that {1} denotes an identity matrix of whatever dimension is needed to make sense of the expressions being computed, and similarly for {0}) we obtain the Sylvester determinant identity

\displaystyle  \det( 1 + AB ) = \det( 1 + BA ). \ \ \ \ \ (2)

This identity, which relates an {n \times n} determinant with an {m \times m} determinant, is very useful in random matrix theory (a point emphasised in particular by Deift), particularly in regimes in which {m} is much smaller than {n}.

Another identity generated from (1) arises when trying to compute the determinant of a {(n+m) \times (n+m)} block matrix

\displaystyle  \begin{pmatrix} A & B \\ C & D \end{pmatrix}

where {A} is an {n \times n} matrix, {B} is an {n \times m} matrix, {C} is an {m \times n} matrix, and {D} is an {m \times m} matrix. If {A} is invertible, then we can manipulate this matrix via block Gaussian elimination as

\displaystyle  \begin{pmatrix} A & B \\ C & D \end{pmatrix} = \begin{pmatrix} A & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & A^{-1} B \\ C & D \end{pmatrix}

\displaystyle  = \begin{pmatrix} A & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ C & 1 \end{pmatrix} \begin{pmatrix} 1 & A^{-1} B \\ 0 & D - C A^{-1} B \end{pmatrix}

and on taking determinants using (1) we obtain the Schur determinant identity

\displaystyle  \det \begin{pmatrix} A & B \\ C & D \end{pmatrix} = \det(A) \det( D - C A^{-1} B ) \ \ \ \ \ (3)

relating the determinant of a block-diagonal matrix with the determinant of the Schur complement {D-C A^{-1} B} of the upper left block {A}. This identity can be viewed as the correct way to generalise the {2 \times 2} determinant formula

\displaystyle  \det \begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad-bc = a ( d - c a^{-1} b).

It is also possible to use determinant identities to deduce other matrix identities that do not involve the determinant, by the technique of matrix differentiation (or equivalently, matrix linearisation). The key observation is that near the identity, the determinant behaves like the trace, or more precisely one has

\displaystyle  \det( 1 + \epsilon A ) = 1 + \epsilon \hbox{tr}(A) + O(\epsilon^2) \ \ \ \ \ (4)

for any bounded square matrix {A} and infinitesimal {\epsilon}. (If one is uncomfortable with infinitesimals, one can interpret this sort of identity as an asymptotic as {\epsilon\rightarrow 0}.) Combining this with (1) we see that for square matrices {A,B} of the same dimension with {A} invertible and {A^{-1}, B} invertible, one has

\displaystyle  \det( A + \epsilon B ) = \det(A) \det(1 + \epsilon A^{-1} B )

\displaystyle = \det(A) (1 + \epsilon \hbox{tr}( A^{-1} B ) + O(\epsilon^2) )

for infinitesimal {\epsilon}. To put it another way, if {A(t)} is a square matrix that depends in a differentiable fashion on a real parameter {t}, then

\displaystyle  \frac{d}{dt} \det(A(t)) = \det(A(t)) \hbox{tr}( A(t)^{-1} \frac{d}{dt} A(t) )

whenever {A(t)} is invertible. (Note that if one combines this identity with cofactor expansion, one recovers Cramer’s rule.)

Let us see some examples of this differentiation method. If we take the Sylvester identity (2) and multiply one of the rectangular matrices {A} by an infinitesimal {\epsilon}, we obtain

\displaystyle  \det( 1 + \epsilon A B ) = \det( 1 + \epsilon B A);

applying (4) and extracting the linear term in {\epsilon} (or equivalently, differentiating at {\epsilon} and then setting {\epsilon=0}) we conclude the cyclic property of trace:

\displaystyle  \hbox{tr}(AB) = \hbox{tr}(BA).

To manipulate derivatives and inverses, we begin with the Neumann series approximation

\displaystyle  (1 + \epsilon A)^{-1} = 1 - \epsilon A + O(\epsilon^2)

for bounded square {A} and infinitesimal {\epsilon}, which then leads to the more general approximation

\displaystyle  (A + \epsilon B)^{-1} = (1 + \epsilon A^{-1} B)^{-1} A^{-1}

\displaystyle  = A^{-1} - \epsilon A^{-1} B A^{-1} + O(\epsilon^2) \ \ \ \ \ (5)

for square matrices {A,B} of the same dimension with {B, A^{-1}} bounded. To put it another way, we have

\displaystyle  \frac{d}{dt} A(t)^{-1} = -A(t)^{-1} (\frac{d}{dt} A(t)) A(t)^{-1}

whenever {A(t)} depends in a differentiable manner on {t} and {A(t)} is invertible.

We can then differentiate (or linearise) the Schur identity (3) in a number of ways. For instance, if we replace the lower block {D} by {D + \epsilon H} for some test {m \times m} matrix {H}, then by (4), the left-hand side of (3) becomes (assuming the invertibility of the block matrix)

\displaystyle  (\det \begin{pmatrix} A & B \\ C & D \end{pmatrix}) (1 + \epsilon \hbox{tr} \begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1} \begin{pmatrix} 0 & 0 \\ 0 & H \end{pmatrix} + O(\epsilon^2) )

while the right-hand side becomes

\displaystyle  \det(A) \det(D-CA^{-1}B) (1 + \epsilon \hbox{tr}( (D-CA^{-1}B)^{-1} H ) + O(\epsilon^2) );

extracting the linear term in {\epsilon} (after dividing through by (3)), we conclude that

\displaystyle  \hbox{tr} (\begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1} \begin{pmatrix} 0 & 0 \\ 0 & H \end{pmatrix}) = \hbox{tr}( (D-CA^{-1}B)^{-1} H ).

As {H} was an arbitrary {m \times m} matrix, we conclude from duality that the lower right {m \times m} block of {\begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1}} is given by the inverse {(D-CA^{-1}B)^{-1}} of the Schur complement:

\displaystyle  \begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1} = \begin{pmatrix} ?? & ?? \\ ?? & (D-CA^{-1}B)^{-1} \end{pmatrix}.

One can also compute the other components of this inverse in terms of the Schur complement {D-CA^{-1} B} by a similar method (although the formulae become more complicated). As a variant of this method, we can perturb the block matrix in (3) by an infinitesimal multiple of the identity matrix giving

\displaystyle  \det \begin{pmatrix} A+\epsilon & B \\ C & D+\epsilon \end{pmatrix} = \det(A+\epsilon) \det( D +\epsilon - C (A+\epsilon)^{-1} B ). \ \ \ \ \ (6)

By (4), the left-hand side is

\displaystyle  (\det \begin{pmatrix} A & B \\ C & D \end{pmatrix}) (1 + \epsilon \hbox{tr} \begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1} + O(\epsilon^2) ).

From (5), we have

\displaystyle  D + \epsilon - C (A+ \epsilon)^{-1} B = D - C A^{-1} B + \epsilon(1 + C A^{-2} B) + O(\epsilon^2)

and so from (4) the right-hand side of (6) is

\displaystyle  \det(A) \det(D-CA^{-1} B) \times

\displaystyle  \times ( 1 + \epsilon (\hbox{tr}(A^{-1}) + \hbox{tr}( (D-CA^{-1} B)^{-1} (1 + C A^{-2} B)) ) + O(\epsilon^2) );

extracting the linear component in {\epsilon}, we conclude the identity

\displaystyle  \hbox{tr} \begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1} = \hbox{tr}(A^{-1}) + \hbox{tr}( (D-CA^{-1} B)^{-1} (1 + C A^{-2} B)) \ \ \ \ \ (7)

which relates the trace of the inverse of a block matrix, with the trace of the inverse of one of its blocks. This particular identity turns out to be useful in random matrix theory; I hope to elaborate on this in a later post.

As a final example of this method, we can analyse low rank perturbations {A+BC} of a large ({n \times n}) matrix {A}, where {B} is an {n \times m} matrix and {C} is an {m \times n} matrix for some {m<n}. (This type of situation is also common in random matrix theory, for instance it arose in this previous paper of mine on outliers to the circular law.) If {A} is invertible, then from (1) and (2) one has the matrix determinant lemma

\displaystyle  \det( A + BC ) = \det(A) \det( 1 + A^{-1} BC) = \det(A) \det(1 + CA^{-1} B);

if one then perturbs {A} by an infinitesimal matrix {\epsilon H}, we have

\displaystyle  \det( A + BC + \epsilon H ) = \det(A + \epsilon H ) \det(1 + C(A+\epsilon H)^{-1} B).

Extracting the linear component in {\epsilon} as before, one soon arrives at

\displaystyle  \hbox{tr}( (A+BC)^{-1} H ) = \hbox{tr}( A^{-1} H ) - \hbox{tr}( (1 + C A^{-1} B)^{-1} C A^{-1} H A^{-1} B )

assuming that {A} and {A+BC} are both invertible; as {H} is arbitrary, we conclude (after using the cyclic property of trace) the Sherman-Morrison formula

\displaystyle  (A+BC)^{-1} = A^{-1} - A^{-1} B (1 + C A^{-1} B)^{-1} C A^{-1}

for the inverse of a low rank perturbation {A+BC} of a matrix {A}. While this identity can be easily verified by direct algebraic computation, it is somewhat difficult to discover this identity by such algebraic manipulation; thus we see that the “determinant first” approach to matrix identities can make it easier to find appropriate matrix identities (particularly those involving traces and/or inverses), even if the identities one is ultimately interested in do not involve determinants. (As differentiation typically makes an identity lengthier, but also more “linear” or “additive”, the determinant identity tends to be shorter (albeit more nonlinear and more multiplicative) than the differentiated identity, and can thus be slightly easier to derive.)

Exercise 1 Use the “determinant first” approach to derive the Woodbury matrix identity (also known as the binomial inverse theorem)

\displaystyle  (A+BDC)^{-1} = A^{-1} - A^{-1} B (D^{-1} + CA^{-1} B)^{-1} C A^{-1}

where {A} is an {n \times n} matrix, {B} is an {n \times m} matrix, {C} is an {m \times n} matrix, and {D} is an {m \times m} matrix, assuming that {A}, {D} and {A+BDC} are all invertible.

Exercise 2 Let {A,B} be invertible {n \times n} matrices. Establish the identity

\displaystyle  \det(A + B) \det(A - B) = \det(B) \det( AB^{-1} A - B)

and differentiate this in {A} to deduce the identity

\displaystyle  (A+B)^{-1} + (A-B)^{-1} = 2 (A - BA^{-1} B)^{-1}

(assuming that all inverses exist) and hence

\displaystyle  (A+B)^{-1} = (A - BA^{-1} B)^{-1} - (B - AB^{-1} A)^{-1}.

Rotating {B} by {i} then gives

\displaystyle  (A+iB)^{-1} = (A + BA^{-1} B)^{-1} - i (B + AB^{-1} A)^{-1},

which is useful for inverting a matrix {A+iB} that has been split into a self-adjoint component {A} and a skew-adjoint component {iB}.

Van Vu and I have just uploaded to the arXiv our paper “Random matrices: Universality of local spectral statistics of non-Hermitian matrices“. The main result of this paper is a “Four Moment Theorem” that establishes universality for local spectral statistics of non-Hermitian matrices with independent entries, under the additional hypotheses that the entries of the matrix decay exponentially, and match moments with either the real or complex gaussian ensemble to fourth order. This is the non-Hermitian analogue of a long string of recent results establishing universality of local statistics in the Hermitian case (as discussed for instance in this recent survey of Van and myself, and also in several other places).

The complex case is somewhat easier to describe. Given a (non-Hermitian) random matrix ensemble {M_n} of {n \times n} matrices, one can arbitrarily enumerate the (geometric) eigenvalues as {\lambda_1(M_n),\ldots,\lambda_n(M_n) \in {\bf C}}, and one can then define the {k}-point correlation functions {\rho^{(k)}_n: {\bf C}^k \rightarrow {\bf R}^+} to be the symmetric functions such that

\displaystyle  \int_{{\bf C}^k} F(z_1,\ldots,z_k) \rho^{(k)}_n(z_1,\ldots,z_k)\ dz_1 \ldots dz_k

\displaystyle  = {\bf E} \sum_{1 \leq i_1 < \ldots < i_k \leq n} F(\lambda_1(M_n),\ldots,\lambda_k(M_n)).

In the case when {M_n} is drawn from the complex gaussian ensemble, so that all the entries are independent complex gaussians of mean zero and variance one, it is a classical result of Ginibre that the asymptotics of {\rho^{(k)}_n} near some point {z \sqrt{n}} as {n \rightarrow \infty} and {z \in {\bf C}} is fixed are given by the determinantal rule

\displaystyle  \rho^{(k)}_n(z\sqrt{n} + w_1,\ldots,z\sqrt{n}+w_k) \rightarrow \hbox{det}( K(w_i,w_j) )_{1 \leq i,j \leq k} \ \ \ \ \ (1)

for {|z| < 1} and

\displaystyle  \rho^{(k)}_n(z\sqrt{n} + w_1,\ldots,z\sqrt{n}+w_k) \rightarrow 0

for {|z| > 1}, where {K} is the reproducing kernel

\displaystyle  K(z,w) := \frac{1}{\pi} e^{-|z|^2/2 - |w|^2/2 + z \overline{w}}.

(There is also an asymptotic for the boundary case {|z|=1}, but it is more complicated to state.) In particular, we see that {\rho^{(k)}_n(z \sqrt{n}) \rightarrow \frac{1}{\pi} 1_{|z| \leq 1}} for almost every {z}, which is a manifestation of the well-known circular law for these matrices; but the circular law only captures the macroscopic structure of the spectrum, whereas the asymptotic (1) describes the microscopic structure.

Our first main result is that the asymptotic (1) for {|z|<1} also holds (in the sense of vague convergence) when {M_n} is a matrix whose entries are independent with mean zero, variance one, exponentially decaying tails, and which all match moments with the complex gaussian to fourth order. (Actually we prove a stronger result than this which is valid for all bounded {z} and has more uniform bounds, but is a bit more technical to state.) An analogous result is also established for real gaussians (but now one has to separate the correlation function into components depending on how many eigenvalues are real and how many are strictly complex; also, the limiting distribution is more complicated, being described by Pfaffians rather than determinants). Among other things, this allows us to partially extend some known results on complex or real gaussian ensembles to more general ensembles. For instance, there is a central limit theorem of Rider which establishes a central limit theorem for the number of eigenvalues of a complex gaussian matrix in a mesoscopic disk; from our results, we can extend this central limit theorem to matrices that match the complex gaussian ensemble to fourth order, provided that the disk is small enough (for technical reasons, our error bounds are not strong enough to handle large disks). Similarly, extending some results of Edelman-Kostlan-Shub and of Forrester-Nagao, we can show that for a matrix matching the real gaussian ensemble to fourth order, the number of real eigenvalues is {\sqrt{\frac{2n}{\pi}} + O(n^{1/2-c})} with probability {1-O(n^{-c})} for some absolute constant {c>0}.

There are several steps involved in the proof. The first step is to apply the Girko Hermitisation trick to replace the problem of understanding the spectrum of a non-Hermitian matrix, with that of understanding the spectrum of various Hermitian matrices. The two identities that realise this trick are, firstly, Jensen’s formula

\displaystyle  \log |\det(M_n-z_0)| = - \sum_{1 \leq i \leq n: \lambda_i(M_n) \in B(z_0,r)} \log \frac{r}{|\lambda_i(M_n)-z_0|}

\displaystyle + \frac{1}{2\pi} \int_0^{2\pi} \log |\det(M_n-z_0-re^{i\theta})|\ d\theta

that relates the local distribution of eigenvalues to the log-determinants {\log |\det(M_n-z_0)|}, and secondly the elementary identity

\displaystyle  \log |\det(M_n - z)| = \frac{1}{2} \log|\det W_{n,z}| + \frac{1}{2} n \log n

that relates the log-determinants of {M_n-z} to the log-determinants of the Hermitian matrices

\displaystyle  W_{n,z} := \frac{1}{\sqrt{n}} \begin{pmatrix} 0 & M_n -z \\ (M_n-z)^* & 0 \end{pmatrix}.

The main difficulty is then to obtain concentration and universality results for the Hermitian log-determinants {\log|\det W_{n,z}|}. This turns out to be a task that is analogous to the task of obtaining concentration for Wigner matrices (as we did in this recent paper), as well as central limit theorems for log-determinants of Wigner matrices (as we did in this other recent paper). In both of these papers, the main idea was to use the Four Moment Theorem for Wigner matrices (which can now be proven relatively easily by a combination of the local semi-circular law and resolvent swapping methods), combined with (in the latter paper) a central limit theorem for the gaussian unitary ensemble (GUE). This latter task was achieved by using the convenient Trotter normal form to tridiagonalise a GUE matrix, which has the effect of revealing the determinant of that matrix as the solution to a certain linear stochastic difference equation, and one can analyse the distribution of that solution via such tools as the martingale central limit theorem.

The matrices {W_{n,z}} are somewhat more complicated than Wigner matrices (for instance, the semi-circular law must be replaced by a distorted Marchenko-Pastur law), but the same general strategy works to obtain concentration and universality for their log-determinants. The main new difficulty that arises is that the analogue of the Trotter norm for gaussian random matrices is not tridiagonal, but rather Hessenberg (i.e. upper-triangular except for the lower diagonal). This ultimately has the effect of expressing the relevant determinant as the solution to a nonlinear stochastic difference equation, which is a bit trickier to solve for. Fortunately, it turns out that one only needs good lower bounds on the solution, as one can use the second moment method to upper bound the determinant and hence the log-determinant (following a classical computation of Turan). This simplifies the analysis on the equation somewhat.

While this result is the first local universality result in the category of random matrices with independent entries, there are still two limitations to the result which one would like to remove. The first is the moment matching hypotheses on the matrix. Very recently, one of the ingredients of our paper, namely the local circular law, was proved without moment matching hypotheses by Bourgade, Yau, and Yin (provided one stays away from the edge of the spectrum); however, as of this time of writing the other main ingredient – the universality of the log-determinant – still requires moment matching. (The standard tool for obtaining universality without moment matching hypotheses is the heat flow method (and more specifically, the local relaxation flow method), but the analogue of Dyson Brownian motion in the non-Hermitian setting appears to be somewhat intractible, being a coupled flow on both the eigenvalues and eigenvectors rather than just on the eigenvalues alone.)

My colleague Ricardo Pérez-Marco showed me a very cute proof of Pythagoras’ theorem, which I thought I would share here; it’s not particularly earth-shattering, but it is perhaps the most intuitive proof of the theorem that I have seen yet.

In the above diagram, a, b, c are the lengths BC, CA, and AB of the right-angled triangle ACB, while x and y are the areas of the right-angled triangles CDB and ADC respectively. Thus the whole triangle ACB has area x+y.

Now observe that the right-angled triangles CDB, ADC, and ACB are all similar (because of all the common angles), and thus their areas are proportional to the square of their respective hypotenuses. In other words, (x,y,x+y) is proportional to (a^2, b^2, c^2). Pythagoras’ theorem follows.

Read the rest of this entry »

Archives

RSS Google+ feed

  • An error has occurred; the feed is probably down. Try again later.
Follow

Get every new post delivered to your Inbox.

Join 3,868 other followers