While talking mathematics with a postdoc here at UCLA (March Boedihardjo) we came across the following matrix problem which we managed to solve, but the proof was cute and the process of discovering it was fun, so I thought I would present the problem here as a puzzle without revealing the solution for now.

The problem involves word maps on a matrix group, which for sake of discussion we will take to be the special orthogonal group $SO(3)$ of real $3 \times 3$ matrices (one of the smallest matrix groups that contains a copy of the free group, which incidentally is the key observation powering the Banach-Tarski paradox).  Given any abstract word $w$ of two generators $x,y$ and their inverses (i.e., an element of the free group ${\bf F}_2$), one can define the word map $w: SO(3) \times SO(3) \to SO(3)$ simply by substituting a pair of matrices in $SO(3)$ into these generators.  For instance, if one has the word $w = x y x^{-2} y^2 x$, then the corresponding word map $w: SO(3) \times SO(3) \to SO(3)$ is given by

$\displaystyle w(A,B) := ABA^{-2} B^2 A$

for $A,B \in SO(3)$.  Because $SO(3)$ contains a copy of the free group, we see the word map is non-trivial (not equal to the identity) if and only if the word itself is nontrivial.
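As a quick numerical illustration (the two rotations below are arbitrary choices of mine, not taken from any particular construction), one can evaluate the word map $w(A,B) = ABA^{-2}B^2A$ at a pair of coordinate-axis rotations and confirm that the output is again a rotation, but not the identity:

```python
import math

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(M):
    return [[M[j][i] for j in range(3)] for i in range(3)]

def rot_x(t):  # rotation by angle t about the x-axis
    c, s = math.cos(t), math.sin(t)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_z(t):  # rotation by angle t about the z-axis
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

# The word w = x y x^{-2} y^2 x applied to an (arbitrarily chosen) pair
# of rotations; for a rotation A the inverse is just the transpose.
A, B = rot_x(0.7), rot_z(1.3)
Ai = transpose(A)

w = A
for M in [B, Ai, Ai, B, B, A]:   # w = A B A^{-1} A^{-1} B B A
    w = matmul(w, M)

# w(A,B) is again orthogonal: w^T w = 1 up to rounding error ...
wTw = matmul(transpose(w), w)
err = max(abs(wTw[i][j] - (i == j)) for i in range(3) for j in range(3))
# ... but for this generic pair it is far from the identity matrix.
dist = max(abs(w[i][j] - (i == j)) for i in range(3) for j in range(3))
```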

Anyway, here is the problem:

Problem. Does there exist a sequence $w_1, w_2, \dots$ of non-trivial word maps $w_n: SO(3) \times SO(3) \to SO(3)$ that converge uniformly to the identity map?

To put it another way, given any $\varepsilon > 0$, does there exist a non-trivial word $w$ such that $\|w(A,B) - 1 \| \leq \varepsilon$ for all $A,B \in SO(3)$, where $\| \|$ denotes (say) the operator norm, and $1$ denotes the identity matrix in $SO(3)$?

As I said, I don’t want to spoil the fun of working out this problem, so I will leave it as a challenge. Readers are welcome to share their thoughts, partial solutions, or full solutions in the comments below.

The determinant ${\det(A)}$ of a square matrix ${A}$ obeys a large number of important identities, the most basic of which is the multiplicativity property

$\displaystyle \det(AB) = \det(A) \det(B) \ \ \ \ \ (1)$

whenever ${A,B}$ are square matrices of the same dimension. This identity then generates many other important identities. For instance, if ${A}$ is an ${n \times m}$ matrix and ${B}$ is an ${m \times n}$ matrix, then by applying the previous identity to equate the determinants of ${\begin{pmatrix} 1 & -A \\ B & 1 \end{pmatrix} \begin{pmatrix} 1 & A \\ 0 & 1 \end{pmatrix}}$ and ${\begin{pmatrix} 1 & A \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & -A \\ B & 1 \end{pmatrix}}$ (where we will adopt the convention that ${1}$ denotes an identity matrix of whatever dimension is needed to make sense of the expressions being computed, and similarly for ${0}$) we obtain the Weinstein-Aronszajn determinant identity

$\displaystyle \det( 1 + AB ) = \det( 1 + BA ). \ \ \ \ \ (2)$

This identity, which converts an ${n \times n}$ determinant into an ${m \times m}$ determinant, is very useful in random matrix theory (a point emphasised in particular by Deift), particularly in regimes in which ${m}$ is much smaller than ${n}$.
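Identity (2) is easy to sanity-check in exact integer arithmetic; here is a minimal pure-Python check on an arbitrarily chosen $2 \times 3$ matrix and $3 \times 2$ matrix (the matrices and helper routines are ad hoc choices for illustration):

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def add(X, Y):
    return [[x + y for x, y in zip(r, s)] for r, s in zip(X, Y)]

def eye(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def det(M):  # Laplace expansion along the first row; fine for tiny matrices
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j]
               * det([r[:j] + r[j + 1:] for r in M[1:]]) for j in range(len(M)))

# arbitrary rectangular test matrices: A is 2 x 3, B is 3 x 2
A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 1], [0, 2], [3, 5]]

lhs = det(add(eye(2), matmul(A, B)))  # a 2 x 2 determinant ...
rhs = det(add(eye(3), matmul(B, A)))  # ... agrees with this 3 x 3 one
assert lhs == rhs
```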

Another identity generated from (1) arises when trying to compute the determinant of an ${(n+m) \times (n+m)}$ block matrix

$\displaystyle \begin{pmatrix} A & B \\ C & D \end{pmatrix}$

where ${A}$ is an ${n \times n}$ matrix, ${B}$ is an ${n \times m}$ matrix, ${C}$ is an ${m \times n}$ matrix, and ${D}$ is an ${m \times m}$ matrix. If ${A}$ is invertible, then we can manipulate this matrix via block Gaussian elimination as

$\displaystyle \begin{pmatrix} A & B \\ C & D \end{pmatrix} = \begin{pmatrix} A & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & A^{-1} B \\ C & D \end{pmatrix}$

$\displaystyle = \begin{pmatrix} A & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ C & 1 \end{pmatrix} \begin{pmatrix} 1 & A^{-1} B \\ 0 & D - C A^{-1} B \end{pmatrix}$

and on taking determinants using (1) we obtain the Schur determinant identity

$\displaystyle \det \begin{pmatrix} A & B \\ C & D \end{pmatrix} = \det(A) \det( D - C A^{-1} B ) \ \ \ \ \ (3)$

relating the determinant of a block matrix with the determinant of the Schur complement ${D-C A^{-1} B}$ of the upper left block ${A}$. This identity can be viewed as the correct way to generalise the ${2 \times 2}$ determinant formula

$\displaystyle \det \begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad-bc = a ( d - c a^{-1} b).$
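The Schur identity (3) can likewise be verified exactly over the rationals; the following sketch (with arbitrarily chosen $2 \times 2$ blocks and ad hoc helper routines) assembles the $4 \times 4$ block matrix and compares both sides:

```python
from fractions import Fraction as F

def mat(rows):  # exact-rational matrix from integer rows
    return [[F(x) for x in row] for row in rows]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def sub(X, Y):
    return [[x - y for x, y in zip(r, s)] for r, s in zip(X, Y)]

def det(M):  # Laplace expansion; fine for tiny matrices
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j]
               * det([r[:j] + r[j + 1:] for r in M[1:]]) for j in range(len(M)))

def inv(M):  # inverse via the adjugate, exact over the rationals
    n, d = len(M), det(M)
    if n == 1:
        return [[1 / d]]
    def minor(i, j):
        return det([r[:j] + r[j + 1:] for k, r in enumerate(M) if k != i])
    return [[(-1) ** (i + j) * minor(j, i) / d for j in range(n)]
            for i in range(n)]

A = mat([[2, 1], [1, 3]])   # invertible upper left block
B = mat([[1, 0], [2, 1]])
C = mat([[0, 1], [1, 1]])
D = mat([[4, 2], [1, 5]])

# assemble the 4 x 4 block matrix [[A, B], [C, D]] row by row
M = [A[0] + B[0], A[1] + B[1], C[0] + D[0], C[1] + D[1]]

schur = sub(D, matmul(matmul(C, inv(A)), B))   # D - C A^{-1} B
assert det(M) == det(A) * det(schur)
```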

It is also possible to use determinant identities to deduce other matrix identities that do not involve the determinant, by the technique of matrix differentiation (or equivalently, matrix linearisation). The key observation is that near the identity, the determinant behaves like the trace, or more precisely one has

$\displaystyle \det( 1 + \varepsilon A ) = 1 + \varepsilon \hbox{tr}(A) + O(\varepsilon^2) \ \ \ \ \ (4)$

for any bounded square matrix ${A}$ and infinitesimal ${\varepsilon}$. (If one is uncomfortable with infinitesimals, one can interpret this sort of identity as an asymptotic as ${\varepsilon\rightarrow 0}$.) Combining this with (1) we see that for square matrices ${A,B}$ of the same dimension with ${A}$ invertible and ${A^{-1}, B}$ bounded, one has

$\displaystyle \det( A + \varepsilon B ) = \det(A) \det(1 + \varepsilon A^{-1} B )$

$\displaystyle = \det(A) (1 + \varepsilon \hbox{tr}( A^{-1} B ) + O(\varepsilon^2) )$

for infinitesimal ${\varepsilon}$. To put it another way, if ${A(t)}$ is a square matrix that depends in a differentiable fashion on a real parameter ${t}$, then we have the Jacobi formula

$\displaystyle \frac{d}{dt} \det(A(t)) = \det(A(t)) \hbox{tr}( A(t)^{-1} \frac{d}{dt} A(t) )$

whenever ${A(t)}$ is invertible. (Note that if one combines this identity with cofactor expansion, one recovers Cramer’s rule.)
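The Jacobi formula is easy to test against a finite-difference derivative; in the sketch below, the matrix path $A(t)$ is a hypothetical example of my own choosing:

```python
def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def inv2(M):
    d = det2(M)
    return [[M[1][1] / d, -M[0][1] / d], [-M[1][0] / d, M[0][0] / d]]

def A(t):       # an arbitrary differentiable matrix path, for illustration
    return [[1 + t, 2 * t], [t * t, 2]]

def Aprime(t):  # its entrywise derivative
    return [[1, 2], [2 * t, 0]]

t, h = 0.5, 1e-6

# left-hand side: central finite difference of det(A(t))
fd = (det2(A(t + h)) - det2(A(t - h))) / (2 * h)

# right-hand side: det(A(t)) tr(A(t)^{-1} A'(t))
M, Mp, Mi = A(t), Aprime(t), inv2(A(t))
jacobi = det2(M) * sum(sum(Mi[i][k] * Mp[k][i] for k in range(2))
                       for i in range(2))
assert abs(fd - jacobi) < 1e-6
```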

Let us see some examples of this differentiation method. If we take the Weinstein-Aronszajn identity (2) and multiply one of the rectangular matrices ${A}$ by an infinitesimal ${\varepsilon}$, we obtain

$\displaystyle \det( 1 + \varepsilon A B ) = \det( 1 + \varepsilon B A);$

applying (4) and extracting the linear term in ${\varepsilon}$ (or equivalently, differentiating with respect to ${\varepsilon}$ and then setting ${\varepsilon=0}$) we conclude the cyclic property of trace:

$\displaystyle \hbox{tr}(AB) = \hbox{tr}(BA).$

To manipulate derivatives and inverses, we begin with the Neumann series approximation

$\displaystyle (1 + \varepsilon A)^{-1} = 1 - \varepsilon A + O(\varepsilon^2)$

for bounded square ${A}$ and infinitesimal ${\varepsilon}$, which then leads to the more general approximation

$\displaystyle (A + \varepsilon B)^{-1} = (1 + \varepsilon A^{-1} B)^{-1} A^{-1}$

$\displaystyle = A^{-1} - \varepsilon A^{-1} B A^{-1} + O(\varepsilon^2) \ \ \ \ \ (5)$

for square matrices ${A,B}$ of the same dimension with ${B, A^{-1}}$ bounded. To put it another way, we have

$\displaystyle \frac{d}{dt} A(t)^{-1} = -A(t)^{-1} (\frac{d}{dt} A(t)) A(t)^{-1}$

whenever ${A(t)}$ depends in a differentiable manner on ${t}$ and ${A(t)}$ is invertible.
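This derivative-of-inverse formula can be checked entrywise against finite differences in the same way (again with an arbitrary matrix path of my own choosing):

```python
def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def inv2(M):
    d = det2(M)
    return [[M[1][1] / d, -M[0][1] / d], [-M[1][0] / d, M[0][0] / d]]

def matmul2(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def A(t):       # an arbitrary differentiable matrix path, for illustration
    return [[1 + t, t], [t * t, 2]]

def Aprime(t):  # its entrywise derivative
    return [[1, 1], [2 * t, 0]]

t, h = 0.5, 1e-6

# finite-difference derivative of each entry of A(t)^{-1}
Ip, Im = inv2(A(t + h)), inv2(A(t - h))
fd = [[(Ip[i][j] - Im[i][j]) / (2 * h) for j in range(2)] for i in range(2)]

# the formula -A^{-1} A' A^{-1}
Ai = inv2(A(t))
formula = matmul2(matmul2(Ai, Aprime(t)), Ai)
formula = [[-x for x in row] for row in formula]

err = max(abs(fd[i][j] - formula[i][j]) for i in range(2) for j in range(2))
assert err < 1e-6
```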

We can then differentiate (or linearise) the Schur identity (3) in a number of ways. For instance, if we replace the lower block ${D}$ by ${D + \varepsilon H}$ for some test ${m \times m}$ matrix ${H}$, then by (4), the left-hand side of (3) becomes (assuming the invertibility of the block matrix)

$\displaystyle (\det \begin{pmatrix} A & B \\ C & D \end{pmatrix}) (1 + \varepsilon \hbox{tr} \begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1} \begin{pmatrix} 0 & 0 \\ 0 & H \end{pmatrix} + O(\varepsilon^2) )$

while the right-hand side becomes

$\displaystyle \det(A) \det(D-CA^{-1}B) (1 + \varepsilon \hbox{tr}( (D-CA^{-1}B)^{-1} H ) + O(\varepsilon^2) );$

extracting the linear term in ${\varepsilon}$, we conclude that

$\displaystyle \hbox{tr} (\begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1} \begin{pmatrix} 0 & 0 \\ 0 & H \end{pmatrix}) = \hbox{tr}( (D-CA^{-1}B)^{-1} H ).$

As ${H}$ was an arbitrary ${m \times m}$ matrix, we conclude from duality that the lower right ${m \times m}$ block of ${\begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1}}$ is given by the inverse ${(D-CA^{-1}B)^{-1}}$ of the Schur complement:

$\displaystyle \begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1} = \begin{pmatrix} ?? & ?? \\ ?? & (D-CA^{-1}B)^{-1} \end{pmatrix}.$
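This block-inverse fact can be confirmed exactly for a small example; the sketch below (with arbitrarily chosen blocks and ad hoc helpers) inverts the $4 \times 4$ block matrix directly and compares its bottom-right block to the inverse of the Schur complement ${D - CA^{-1}B}$:

```python
from fractions import Fraction as F

def mat(rows):  # exact-rational matrix from integer rows
    return [[F(x) for x in row] for row in rows]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def sub(X, Y):
    return [[x - y for x, y in zip(r, s)] for r, s in zip(X, Y)]

def det(M):  # Laplace expansion; fine for tiny matrices
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j]
               * det([r[:j] + r[j + 1:] for r in M[1:]]) for j in range(len(M)))

def inv(M):  # inverse via the adjugate, exact over the rationals
    n, d = len(M), det(M)
    if n == 1:
        return [[1 / d]]
    def minor(i, j):
        return det([r[:j] + r[j + 1:] for k, r in enumerate(M) if k != i])
    return [[(-1) ** (i + j) * minor(j, i) / d for j in range(n)]
            for i in range(n)]

A = mat([[2, 1], [1, 3]]); B = mat([[1, 0], [2, 1]])
C = mat([[0, 1], [1, 1]]); D = mat([[4, 2], [1, 5]])
M = [A[0] + B[0], A[1] + B[1], C[0] + D[0], C[1] + D[1]]

Minv = inv(M)                                   # exact 4 x 4 inverse
lower_right = [row[2:] for row in Minv[2:]]     # its bottom-right 2 x 2 block
schur_inv = inv(sub(D, matmul(matmul(C, inv(A)), B)))
assert lower_right == schur_inv
```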

One can also compute the other components of this inverse in terms of the Schur complement ${D-CA^{-1} B}$ by a similar method (although the formulae become more complicated). As a variant of this method, we can perturb the block matrix in (3) by an infinitesimal multiple of the identity matrix giving

$\displaystyle \det \begin{pmatrix} A+\varepsilon & B \\ C & D+\varepsilon \end{pmatrix} = \det(A+\varepsilon) \det( D +\varepsilon - C (A+\varepsilon)^{-1} B ). \ \ \ \ \ (6)$

By (4), the left-hand side is

$\displaystyle (\det \begin{pmatrix} A & B \\ C & D \end{pmatrix}) (1 + \varepsilon \hbox{tr} \begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1} + O(\varepsilon^2) ).$

From (5), we have

$\displaystyle D + \varepsilon - C (A+ \varepsilon)^{-1} B = D - C A^{-1} B + \varepsilon(1 + C A^{-2} B) + O(\varepsilon^2)$

and so from (4) the right-hand side of (6) is

$\displaystyle \det(A) \det(D-CA^{-1} B) \times$

$\displaystyle \times ( 1 + \varepsilon (\hbox{tr}(A^{-1}) + \hbox{tr}( (D-CA^{-1} B)^{-1} (1 + C A^{-2} B)) ) + O(\varepsilon^2) );$

extracting the linear component in ${\varepsilon}$, we conclude the identity

$\displaystyle \hbox{tr} \begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1} = \hbox{tr}(A^{-1}) + \hbox{tr}( (D-CA^{-1} B)^{-1} (1 + C A^{-2} B)) \ \ \ \ \ (7)$

which relates the trace of the inverse of a block matrix to the trace of the inverse of one of its blocks. This particular identity turns out to be useful in random matrix theory; I hope to elaborate on this in a later post.
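Identity (7) can also be verified exactly on a small example; as before, the blocks below are arbitrary choices and the helpers are ad hoc:

```python
from fractions import Fraction as F

def mat(rows):  # exact-rational matrix from integer rows
    return [[F(x) for x in row] for row in rows]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def add(X, Y):
    return [[x + y for x, y in zip(r, s)] for r, s in zip(X, Y)]

def sub(X, Y):
    return [[x - y for x, y in zip(r, s)] for r, s in zip(X, Y)]

def eye(n):
    return [[F(int(i == j)) for j in range(n)] for i in range(n)]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def det(M):  # Laplace expansion; fine for tiny matrices
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j]
               * det([r[:j] + r[j + 1:] for r in M[1:]]) for j in range(len(M)))

def inv(M):  # inverse via the adjugate, exact over the rationals
    n, d = len(M), det(M)
    if n == 1:
        return [[1 / d]]
    def minor(i, j):
        return det([r[:j] + r[j + 1:] for k, r in enumerate(M) if k != i])
    return [[(-1) ** (i + j) * minor(j, i) / d for j in range(n)]
            for i in range(n)]

A = mat([[2, 1], [1, 3]]); B = mat([[1, 0], [2, 1]])
C = mat([[0, 1], [1, 1]]); D = mat([[4, 2], [1, 5]])
M = [A[0] + B[0], A[1] + B[1], C[0] + D[0], C[1] + D[1]]

Ai = inv(A)
S = sub(D, matmul(matmul(C, Ai), B))                       # D - C A^{-1} B
corr = add(eye(2), matmul(matmul(C, matmul(Ai, Ai)), B))   # 1 + C A^{-2} B
lhs = trace(inv(M))
rhs = trace(Ai) + trace(matmul(inv(S), corr))
assert lhs == rhs
```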

As a final example of this method, we can analyse low rank perturbations ${A+BC}$ of a large (${n \times n}$) matrix ${A}$, where ${B}$ is an ${n \times m}$ matrix and ${C}$ is an ${m \times n}$ matrix for some ${m \leq n}$. (This type of situation is also common in random matrix theory, for instance it arose in this previous paper of mine on outliers to the circular law.) If ${A}$ is invertible, then from (1) and (2) one has the matrix determinant lemma

$\displaystyle \det( A + BC ) = \det(A) \det( 1 + A^{-1} BC) = \det(A) \det(1 + CA^{-1} B);$

if one then perturbs ${A}$ by an infinitesimal matrix ${\varepsilon H}$, we have

$\displaystyle \det( A + BC + \varepsilon H ) = \det(A + \varepsilon H ) \det(1 + C(A+\varepsilon H)^{-1} B).$

Extracting the linear component in ${\varepsilon}$ as before, one soon arrives at

$\displaystyle \hbox{tr}( (A+BC)^{-1} H ) = \hbox{tr}( A^{-1} H ) - \hbox{tr}( (1 + C A^{-1} B)^{-1} C A^{-1} H A^{-1} B )$

assuming that ${A}$ and ${A+BC}$ are both invertible; as ${H}$ is arbitrary, we conclude (after using the cyclic property of trace) the Sherman-Morrison formula

$\displaystyle (A+BC)^{-1} = A^{-1} - A^{-1} B (1 + C A^{-1} B)^{-1} C A^{-1}$

for the inverse of a low rank perturbation ${A+BC}$ of a matrix ${A}$. While this identity can be easily verified by direct algebraic computation, it is somewhat difficult to discover this identity by such algebraic manipulation; thus we see that the “determinant first” approach to matrix identities can make it easier to find appropriate matrix identities (particularly those involving traces and/or inverses), even if the identities one is ultimately interested in do not involve determinants. (As differentiation typically makes an identity lengthier, but also more “linear” or “additive”, the determinant identity tends to be shorter (albeit more nonlinear and more multiplicative) than the differentiated identity, and can thus be slightly easier to derive.)
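Since the Sherman-Morrison formula is exact, one can confirm it directly on a small rational example (the matrices below are arbitrary choices, with $m = 1$ so that the middle inverse is just a scalar):

```python
from fractions import Fraction as F

def mat(rows):  # exact-rational matrix from integer rows
    return [[F(x) for x in row] for row in rows]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def add(X, Y):
    return [[x + y for x, y in zip(r, s)] for r, s in zip(X, Y)]

def sub(X, Y):
    return [[x - y for x, y in zip(r, s)] for r, s in zip(X, Y)]

def eye(n):
    return [[F(int(i == j)) for j in range(n)] for i in range(n)]

def det(M):  # Laplace expansion; fine for tiny matrices
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j]
               * det([r[:j] + r[j + 1:] for r in M[1:]]) for j in range(len(M)))

def inv(M):  # inverse via the adjugate, exact over the rationals
    n, d = len(M), det(M)
    if n == 1:
        return [[1 / d]]
    def minor(i, j):
        return det([r[:j] + r[j + 1:] for k, r in enumerate(M) if k != i])
    return [[(-1) ** (i + j) * minor(j, i) / d for j in range(n)]
            for i in range(n)]

A = mat([[3, 1, 0], [1, 2, 1], [0, 1, 4]])  # an invertible 3 x 3 example
B = mat([[1], [0], [2]])                     # 3 x 1
C = mat([[2, 1, 1]])                         # 1 x 3

Ai = inv(A)
left = inv(add(A, matmul(B, C)))             # (A + BC)^{-1} computed directly
s = add(eye(1), matmul(matmul(C, Ai), B))    # 1 + C A^{-1} B, here 1 x 1
right = sub(Ai, matmul(matmul(matmul(matmul(Ai, B), inv(s)), C), Ai))
assert left == right
```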

Exercise 1 Use the “determinant first” approach to derive the Woodbury matrix identity (also known as the binomial inverse theorem)

$\displaystyle (A+BDC)^{-1} = A^{-1} - A^{-1} B (D^{-1} + CA^{-1} B)^{-1} C A^{-1}$

where ${A}$ is an ${n \times n}$ matrix, ${B}$ is an ${n \times m}$ matrix, ${C}$ is an ${m \times n}$ matrix, and ${D}$ is an ${m \times m}$ matrix, assuming that ${A}$, ${D}$ and ${A+BDC}$ are all invertible.
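Without spoiling the determinant-based derivation, one can at least sanity-check the Woodbury formula numerically; the sketch below uses $n = m = 2$ with arbitrarily chosen invertible matrices:

```python
from fractions import Fraction as F

def mat(rows):  # exact-rational matrix from integer rows
    return [[F(x) for x in row] for row in rows]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def add(X, Y):
    return [[x + y for x, y in zip(r, s)] for r, s in zip(X, Y)]

def sub(X, Y):
    return [[x - y for x, y in zip(r, s)] for r, s in zip(X, Y)]

def det(M):  # Laplace expansion; fine for tiny matrices
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j]
               * det([r[:j] + r[j + 1:] for r in M[1:]]) for j in range(len(M)))

def inv(M):  # inverse via the adjugate, exact over the rationals
    n, d = len(M), det(M)
    if n == 1:
        return [[1 / d]]
    def minor(i, j):
        return det([r[:j] + r[j + 1:] for k, r in enumerate(M) if k != i])
    return [[(-1) ** (i + j) * minor(j, i) / d for j in range(n)]
            for i in range(n)]

A = mat([[3, 1], [1, 2]])   # invertible n x n
B = mat([[1, 0], [2, 1]])   # n x m
D = mat([[2, 1], [0, 1]])   # invertible m x m
C = mat([[1, 1], [0, 2]])   # m x n

Ai = inv(A)
left = inv(add(A, matmul(matmul(B, D), C)))        # (A + BDC)^{-1} directly
K = inv(add(inv(D), matmul(matmul(C, Ai), B)))     # (D^{-1} + C A^{-1} B)^{-1}
right = sub(Ai, matmul(matmul(matmul(matmul(Ai, B), K), C), Ai))
assert left == right
```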

Exercise 2 Let ${A,B}$ be invertible ${n \times n}$ matrices. Establish the identity

$\displaystyle \det(A + B) \det(A - B) = \det(B) \det( AB^{-1} A - B)$

and differentiate this in ${A}$ to deduce the identity

$\displaystyle (A+B)^{-1} + (A-B)^{-1} = 2 (A - BA^{-1} B)^{-1}$

(assuming that all inverses exist) and thence

$\displaystyle (A+B)^{-1} = (A - BA^{-1} B)^{-1} + (B - AB^{-1} A)^{-1}.$

Rotating ${B}$ by ${i}$ then gives

$\displaystyle (A+iB)^{-1} = (A + BA^{-1} B)^{-1} - i (B + AB^{-1} A)^{-1},$

which is useful for inverting a matrix ${A+iB}$ that has been split into a self-adjoint component ${A}$ and a skew-adjoint component ${iB}$.
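The first two displayed identities of Exercise 2 can be verified exactly on a small example (the matrices below are arbitrary invertible choices; the helper routines are ad hoc):

```python
from fractions import Fraction as F

def mat(rows):  # exact-rational matrix from integer rows
    return [[F(x) for x in row] for row in rows]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def add(X, Y):
    return [[x + y for x, y in zip(r, s)] for r, s in zip(X, Y)]

def sub(X, Y):
    return [[x - y for x, y in zip(r, s)] for r, s in zip(X, Y)]

def det(M):  # Laplace expansion; fine for tiny matrices
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j]
               * det([r[:j] + r[j + 1:] for r in M[1:]]) for j in range(len(M)))

def inv(M):  # inverse via the adjugate, exact over the rationals
    n, d = len(M), det(M)
    if n == 1:
        return [[1 / d]]
    def minor(i, j):
        return det([r[:j] + r[j + 1:] for k, r in enumerate(M) if k != i])
    return [[(-1) ** (i + j) * minor(j, i) / d for j in range(n)]
            for i in range(n)]

A = mat([[3, 1], [0, 2]])
B = mat([[1, 0], [1, 1]])

# determinant identity: det(A+B) det(A-B) = det(B) det(AB^{-1}A - B)
lhs = det(add(A, B)) * det(sub(A, B))
rhs = det(B) * det(sub(matmul(matmul(A, inv(B)), A), B))
assert lhs == rhs

# differentiated identity: (A+B)^{-1} + (A-B)^{-1} = 2 (A - BA^{-1}B)^{-1}
left = add(inv(add(A, B)), inv(sub(A, B)))
right = [[2 * x for x in row]
         for row in inv(sub(A, matmul(matmul(B, inv(A)), B)))]
assert left == right
```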

My colleague Ricardo Pérez-Marco showed me a very cute proof of Pythagoras’ theorem, which I thought I would share here; it’s not particularly earth-shattering, but it is perhaps the most intuitive proof of the theorem that I have seen yet.

In the above diagram, a, b, c are the lengths BC, CA, and AB of the right-angled triangle ACB, while x and y are the areas of the right-angled triangles CDB and ADC respectively, where D is the foot of the altitude from the right angle C to the hypotenuse AB. Thus the whole triangle ACB has area x+y.

Now observe that the right-angled triangles CDB, ADC, and ACB are all similar (because of all the common angles), and thus their areas are proportional to the square of their respective hypotenuses. In other words, (x,y,x+y) is proportional to $(a^2, b^2, c^2)$. Pythagoras’ theorem follows.
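In symbols, the proportionality argument runs as follows (writing $k$ for the common constant of proportionality between area and squared hypotenuse among the three similar triangles):

```latex
% Similar right-angled triangles have areas proportional to the
% squares of their hypotenuses, for a common constant k > 0:
\[
x = k a^2, \qquad y = k b^2, \qquad x + y = k c^2.
\]
% Substituting the first two relations into the third and
% cancelling k yields Pythagoras' theorem:
\[
k a^2 + k b^2 = k c^2 \quad \Longrightarrow \quad a^2 + b^2 = c^2.
\]
```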

This problem lies in the highly interconnected interface between algebraic combinatorics (esp. the combinatorics of Young tableaux and related objects, including honeycombs and puzzles), algebraic geometry (particularly classical and quantum intersection theory and geometric invariant theory), linear algebra (additive and multiplicative, real and tropical), and the representation theory (classical, quantum, crystal, etc.) of classical groups. (Another open problem in this subject is to find a succinct and descriptive name for the field.) I myself haven’t actively worked in this area for several years, but I still find it a fascinating and beautiful subject. (With respect to the dichotomy between structure and randomness, this subject lies deep within the “structure” end of the spectrum.)

As mentioned above, the problems in this area can be approached from a variety of quite diverse perspectives, but here I will focus on the linear algebra perspective, which is perhaps the most accessible. About nine years ago, Allen Knutson and I introduced a combinatorial gadget, called a honeycomb, which among other things controlled the relationship between the eigenvalues of two arbitrary Hermitian matrices A, B, and the eigenvalues of their sum A+B; this was not the first such gadget that achieved this purpose, but it was a particularly convenient one for studying this problem, in particular it was used to resolve two conjectures in the subject, the saturation conjecture and the Horn conjecture. (These conjectures have since been proven by a variety of other methods.) There is a natural multiplicative version of these problems, which now relates the eigenvalues of two arbitrary unitary matrices U, V and the eigenvalues of their product UV; this led to the “quantum saturation” and “quantum Horn” conjectures, which were proven a couple years ago. However, the quantum analogue of a “honeycomb” remains a mystery; this is the main topic of the current post.