You are currently browsing the tag archive for the ‘sweeping a matrix’ tag.

I recently learned about a curious operation on square matrices known as sweeping, which is used in numerical linear algebra (particularly in applications to statistics), as a useful and more robust variant of the usual Gaussian elimination operations seen in undergraduate linear algebra courses. Given an {n \times n} matrix {A := (a_{ij})_{1 \leq i,j \leq n}} (with, say, complex entries) and an index {1 \leq k \leq n}, with the entry {a_{kk}} non-zero, the sweep {\hbox{Sweep}_k[A] = (\hat a_{ij})_{1 \leq i,j \leq n}} of {A} at {k} is the matrix given by the formulae

\displaystyle  \hat a_{ij} := a_{ij} - \frac{a_{ik} a_{kj}}{a_{kk}}

\displaystyle  \hat a_{ik} := \frac{a_{ik}}{a_{kk}}

\displaystyle  \hat a_{kj} := \frac{a_{kj}}{a_{kk}}

\displaystyle  \hat a_{kk} := \frac{-1}{a_{kk}}

for all {i,j \in \{1,\dots,n\} \backslash \{k\}}. Thus for instance if {k=1}, and {A} is written in block form as

\displaystyle  A = \begin{pmatrix} a_{11} & X \\ Y & B \end{pmatrix} \ \ \ \ \ (1)

for some {1 \times n-1} row vector {X}, {n-1 \times 1} column vector {Y}, and {n-1 \times n-1} minor {B}, one has

\displaystyle  \hbox{Sweep}_1[A] = \begin{pmatrix} -1/a_{11} & X / a_{11} \\ Y/a_{11} & B - a_{11}^{-1} YX \end{pmatrix}. \ \ \ \ \ (2)

The inverse sweep operation {\hbox{Sweep}_k^{-1}[A] = (\check a_{ij})_{1 \leq i,j \leq n}} is given by a nearly identical set of formulae:

\displaystyle  \check a_{ij} := a_{ij} - \frac{a_{ik} a_{kj}}{a_{kk}}

\displaystyle  \check a_{ik} := -\frac{a_{ik}}{a_{kk}}

\displaystyle  \check a_{kj} := -\frac{a_{kj}}{a_{kk}}

\displaystyle  \check a_{kk} := \frac{-1}{a_{kk}}

for all {i,j \in \{1,\dots,n\} \backslash \{k\}}. One can check that these operations invert each other. Actually, each sweep turns out to have order {4}, so that {\hbox{Sweep}_k^{-1} = \hbox{Sweep}_k^3}: an inverse sweep performs the same operation as three forward sweeps. Sweeps also preserve the space of symmetric matrices (allowing one to cut down computational run time in that case by a factor of two), and behave well with respect to principal minors; a sweep of a principal minor is a principal minor of a sweep, after adjusting indices appropriately.

Remarkably, the sweep operators all commute with each other: {\hbox{Sweep}_k \hbox{Sweep}_l = \hbox{Sweep}_l \hbox{Sweep}_k}. If {1 \leq k \leq n} and we perform the first {k} sweeps (in any order) to a matrix

\displaystyle  A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}

with {A_{11}} a {k \times k} minor, {A_{12}} a {k \times n-k} matrix, {A_{12}} a {n-k \times k} matrix, and {A_{22}} a {n-k \times n-k} matrix, one obtains the new matrix

\displaystyle  \hbox{Sweep}_1 \dots \hbox{Sweep}_k[A] = \begin{pmatrix} -A_{11}^{-1} & A_{11}^{-1} A_{12} \\ A_{21} A_{11}^{-1} & A_{22} - A_{21} A_{11}^{-1} A_{12} \end{pmatrix}.

Note the appearance of the Schur complement in the bottom right block. Thus, for instance, one can essentially invert a matrix {A} by performing all {n} sweeps:

\displaystyle  \hbox{Sweep}_1 \dots \hbox{Sweep}_n[A] = -A^{-1}.

If a matrix has the form

\displaystyle  A = \begin{pmatrix} B & X \\ Y & a \end{pmatrix}

for a {n-1 \times n-1} minor {B}, {n-1 \times 1} column vector {X}, {1 \times n-1} row vector {Y}, and scalar {a}, then performing the first {n-1} sweeps gives

\displaystyle  \hbox{Sweep}_1 \dots \hbox{Sweep}_{n-1}[A] = \begin{pmatrix} -B^{-1} & B^{-1} X \\ Y B^{-1} & a - Y B^{-1} X \end{pmatrix}

and all the components of this matrix are usable for various numerical linear algebra applications in statistics (e.g. in least squares regression). Given that sweeps behave well with inverses, it is perhaps not surprising that sweeps also behave well under determinants: the determinant of {A} can be factored as the product of the entry {a_{kk}} and the determinant of the {n-1 \times n-1} matrix formed from {\hbox{Sweep}_k[A]} by removing the {k^{th}} row and column. As a consequence, one can compute the determinant of {A} fairly efficiently (so long as the sweep operations don’t come close to dividing by zero) by sweeping the matrix for {k=1,\dots,n} in turn, and multiplying together the {kk^{th}} entry of the matrix just before the {k^{th}} sweep for {k=1,\dots,n} to obtain the determinant.

It turns out that there is a simple geometric explanation for these seemingly magical properties of the sweep operation. Any {n \times n} matrix {A} creates a graph {\hbox{Graph}[A] := \{ (X, AX): X \in {\bf R}^n \}} (where we think of {{\bf R}^n} as the space of column vectors). This graph is an {n}-dimensional subspace of {{\bf R}^n \times {\bf R}^n}. Conversely, most subspaces of {{\bf R}^n \times {\bf R}^n} arises as graphs; there are some that fail the vertical line test, but these are a positive codimension set of counterexamples.

We use {e_1,\dots,e_n,f_1,\dots,f_n} to denote the standard basis of {{\bf R}^n \times {\bf R}^n}, with {e_1,\dots,e_n} the standard basis for the first factor of {{\bf R}^n} and {f_1,\dots,f_n} the standard basis for the second factor. The operation of sweeping the {k^{th}} entry then corresponds to a ninety degree rotation {\hbox{Rot}_k: {\bf R}^n \times {\bf R}^n \rightarrow {\bf R}^n \times {\bf R}^n} in the {e_k,f_k} plane, that sends {f_k} to {e_k} (and {e_k} to {-f_k}), keeping all other basis vectors fixed: thus we have

\displaystyle  \hbox{Graph}[ \hbox{Sweep}_k[A] ] = \hbox{Rot}_k \hbox{Graph}[A]

for generic {n \times n} {A} (more precisely, those {A} with non-vanishing entry {a_{kk}}). For instance, if {k=1} and {A} is of the form (1), then {\hbox{Graph}[A]} is the set of tuples {(r,R,s,S) \in {\bf R} \times {\bf R}^{n-1} \times {\bf R} \times {\bf R}^{n-1}} obeying the equations

\displaystyle  a_{11} r + X R = s

\displaystyle  Y r + B R = S.

The image of {(r,R,s,S)} under {\hbox{Rot}_1} is {(s, R, -r, S)}. Since we can write the above system of equations (for {a_{11} \neq 0}) as

\displaystyle  \frac{-1}{a_{11}} s + \frac{X}{a_{11}} R = -r

\displaystyle  \frac{Y}{a_{11}} s + (B - a_{11}^{-1} YX) R = S

we see from (2) that {\hbox{Rot}_1 \hbox{Graph}[A]} is the graph of {\hbox{Sweep}_1[A]}. Thus the sweep operation is a multidimensional generalisation of the high school geometry fact that the line {y = mx} in the plane becomes {y = \frac{-1}{m} x} after applying a ninety degree rotation.

It is then an instructive exercise to use this geometric interpretation of the sweep operator to recover all the remarkable properties about these operations listed above. It is also useful to compare the geometric interpretation of sweeping as rotation of the graph to that of Gaussian elimination, which instead shears and reflects the graph by various elementary transformations (this is what is going on geometrically when one performs Gaussian elimination on an augmented matrix). Rotations are less distorting than shears, so one can see geometrically why sweeping can produce fewer numerical artefacts than Gaussian elimination.

Archives