0x023 Matrix Analysis

This page summarizes topics in matrix analysis and related algorithms

Foundation

Subspace

Definition (fundamental subspace) There are four fundamental subspaces in linear algebra associated with a matrix $A \in M_{m,n}$

  • $N(A)$: null space of A
  • $C(A)$: column space of A
  • $N(A^T)$: null space of $A^T$
  • $C(A^T)$: row space of $A$
Reference: The Big Picture of Linear Algebra, Gilbert Strang

Lemma

  • $N(A) \perp C(A^T)$
  • $\dim N(A) + \dim C(A^T) = n$
  • $N(A^T) \perp C(A)$
  • $\dim N(A^T) + \dim C(A) = m$

The important point is that the null space $N(A)$ is orthogonal to the row space $C(A^T)$: for any vector $x$ with $Ax=0$, $x$ is orthogonal to every row of $A$, and therefore to any linear combination of the rows. In other words, $Ax = 0$ implies $(y^T A)x = 0$ for all $y$, so $y^T A$ and $x$ are orthogonal.
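As a quick numerical illustration (a minimal numpy sketch; the rank-deficient test matrix and the SVD-based null-space construction are my own choices, not from the text), we can check that null-space vectors are orthogonal to every row of $A$:

```python
import numpy as np

# A rank-deficient 3x4 matrix: row 3 = row 1 + row 2, so rank = 2 and dim N(A) = 4 - 2 = 2
A = np.array([[1., 2., 3., 4.],
              [0., 1., 1., 0.],
              [1., 3., 4., 4.]])

# Null space N(A): right singular vectors whose singular values are (numerically) zero
U, s, Vt = np.linalg.svd(A)
rank = np.sum(s > 1e-10)
null_basis = Vt[rank:].T                    # columns span N(A)

# Every null-space vector is orthogonal to every row of A, hence to the row space C(A^T)
print(np.allclose(A @ null_basis, 0))             # True
print(rank + null_basis.shape[1] == A.shape[1])   # True: rank + dim N(A) = n
```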

Definition (rank) The rank is the dimension of the column space

$$rank(A) = \dim range(A)$$

Lemma (properties of rank)

  • $rank(A^T) = rank(A)$
  • $|rank(A) - rank(B)| \leq rank(A+B) \leq rank(A) + rank(B)$
  • If $A$ and $C$ are nonsingular, then $rank(AB) = rank(B) = rank(BC)$

Trace

The trace has a cyclic permutation property that is useful for computing derivatives

$$tr(\mathbf{ABC}) = tr(\mathbf{CAB}) = tr(\mathbf{BCA})$$
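A quick numerical sanity check of the cyclic property (a minimal numpy sketch; the rectangular shapes are arbitrary, chosen so that all three products are square):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))
C = rng.standard_normal((5, 3))   # ABC, CAB, BCA are all square

t1 = np.trace(A @ B @ C)   # 3x3 product
t2 = np.trace(C @ A @ B)   # 5x5 product
t3 = np.trace(B @ C @ A)   # 4x4 product
print(np.allclose(t1, t2), np.allclose(t2, t3))   # True True
```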

Derivatives

Summary of some important derivatives with respect to matrices and vectors

$$\big( \frac{\partial \mathbf{a}}{\partial x} \big)_i = \frac{\partial a_i}{\partial x}$$

$$\big( \frac{\partial x}{\partial \mathbf{a}} \big)_i = \frac{\partial x}{\partial a_i}$$

$$\big( \frac{\partial \mathbf{a}}{\partial \mathbf{b}} \big)_{i,j} = \frac{\partial a_i}{\partial b_j}$$

Some important conclusions are

$$\frac{\partial (\mathbf{x^T a}) } {\partial \mathbf{x}} = \frac{\partial (\mathbf{a^T x}) } {\partial \mathbf{x}} = \mathbf{a}$$

$$\frac{\partial (\mathbf{a^T A a})}{\partial \mathbf{a}} = \mathbf{(A+A^T)a}$$

$$\frac{\partial}{\partial \mathbf{A}} tr(\mathbf{BA}) = \mathbf{B^T}$$

$$\frac{\partial \log \det(A)}{\partial A} = A^{-T}$$
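These identities can be spot-checked numerically. Below is a minimal sketch (numpy finite differences; the symmetric positive definite test matrix is an arbitrary choice) verifying the last identity, $\partial \log \det(A) / \partial A = A^{-T}$:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4))
A = X @ X.T + 4 * np.eye(4)        # well-conditioned, det > 0

def logdet(M):
    sign, val = np.linalg.slogdet(M)
    return val

# Finite-difference gradient, entry by entry
eps = 1e-6
grad = np.zeros_like(A)
for i in range(4):
    for j in range(4):
        E = np.zeros_like(A)
        E[i, j] = eps
        grad[i, j] = (logdet(A + E) - logdet(A - E)) / (2 * eps)

print(np.allclose(grad, np.linalg.inv(A).T, atol=1e-5))   # True
```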

Submatrices

The Schur complement of the block $D$ in the block matrix

$$\begin{pmatrix} A & B \\ C & D \\ \end{pmatrix}$$

is defined to be $A - BD^{-1}C$. Writing $M = (A - BD^{-1}C)^{-1}$ for its inverse, the inverse of the original matrix is

$$\begin{pmatrix} M & -MBD^{-1} \\ -D^{-1}CM & D^{-1} + D^{-1}CMBD^{-1} \\ \end{pmatrix}$$
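As a sanity check, the block-inverse formula can be verified numerically (a minimal numpy sketch with an arbitrary random block matrix; it assumes $D$ and the Schur complement are invertible):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 2
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
C = rng.standard_normal((m, n))
D = rng.standard_normal((m, m)) + 3 * np.eye(m)   # keep D invertible

Dinv = np.linalg.inv(D)
M = np.linalg.inv(A - B @ Dinv @ C)               # inverse of the Schur complement

block = np.block([[A, B], [C, D]])
block_inv = np.block([[M, -M @ B @ Dinv],
                      [-Dinv @ C @ M, Dinv + Dinv @ C @ M @ B @ Dinv]])

print(np.allclose(block @ block_inv, np.eye(n + m)))   # True
```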


Norm

Norms arise naturally in the study of power series of matrices and in the analysis of numerical computations

For example, the following Neumann series expansion is valid whenever $||A|| < 1$ for some submultiplicative matrix norm

$$(I-A)^{-1} = I+A+A^2+A^3+\cdots$$

Definition (spectral norm)

$$||A||_2 = \max_{x \neq 0} \frac{||Ax||}{||x||} = \sigma_1$$

Definition (Frobenius norm) The Frobenius norm treats the matrix as a long vector and takes the Euclidean vector norm.

$$||A||_{F} = \sqrt{\sigma^2_1 + \cdots + \sigma^2_r}$$

Proof: $||A||_F^2 = \sum_{i,j} |a_{ij}|^2 = tr(A^T A) = \sum_i \lambda_i(A^T A) = \sum_i \sigma_i^2$

Definition (nuclear norm)

$$||A||_{N} = \sigma_1 + \sigma_2 + \cdots + \sigma_r$$
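All three norms can be read off the singular values; a minimal numpy sketch comparing the built-in norms with the formulas above:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))
s = np.linalg.svd(A, compute_uv=False)      # singular values, in descending order

print(np.allclose(np.linalg.norm(A, 2), s[0]))                        # spectral norm = sigma_1
print(np.allclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(s**2))))   # Frobenius norm
print(np.allclose(np.linalg.norm(A, 'nuc'), np.sum(s)))               # nuclear norm
```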

Triangular Matrix

Decomposition (LU decomposition) A square matrix can be factorized into a lower triangular matrix $L$ and an upper triangular matrix $U$ (possibly after a row permutation): $A = LU$

Diagonalizable Matrix

This section studies the similarity of $A \in M_n$ via a general nonsingular matrix $S$: the transformation $A \to S^{-1} A S$

Diagonalizable matrices have $n$ linearly independent eigenvectors, but these eigenvectors are not necessarily orthogonal.

Eigenvalue and Eigenvector

Definition (spectrum) The spectrum of $A \in M_n$ is the set of all $\lambda \in C$ that are eigenvalues of $A$; denoted by $\sigma(A)$

For a diagonalizable matrix, the eigenvalue equations $Ax_i = \lambda_i x_i$ can be collected in the following matrix form

$$AX = X \Lambda$$

where $\Lambda$ is a diagonal matrix $diag(\lambda_1, …, \lambda_n)$, $X$ is a matrix whose columns are eigenvectors $x_i$

Because the columns of $X$ are linearly independent and span the whole space, $X$ is invertible, and we obtain the following decomposition.

Decomposition (eigen-decomposition) A diagonalizable matrix $A$ can be factorized into the following form

$$ A = X \Lambda X^{-1}$$

This decomposition can be interpreted clearly by tracking the components of a vector.

If $v=\sum_i c_i x_i$, multiplying by $X^{-1}$ extracts the coefficients of $v$ in the basis formed by the columns of $X$

$$X^{-1} v = (c_1, …, c_n)$$

Multiplying by $\Lambda$ scales each coefficient by its eigenvalue.

$$\Lambda X^{-1} v = (c_1 \lambda_1, \dots, c_n \lambda_n)$$

Then multiplying by $X$ expresses the result in the original basis.

$$X \Lambda X^{-1} v = c_1 \lambda_1 x_1 + \dots + c_n \lambda_n x_n$$

With this decomposition, we can compute $A^k$ efficiently by

$$A^k = X \Lambda^k X^{-1}$$

In particular, terms with $|\lambda_i| < 1$ can be dropped when $k$ is large enough.
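A minimal numpy sketch of this idea (with an arbitrary diagonalizable test matrix), computing $A^k$ through the eigendecomposition:

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 2.]])            # eigenvalues 3 and 1, independent eigenvectors

lam, X = np.linalg.eig(A)           # columns of X are eigenvectors
k = 5
Ak = X @ np.diag(lam**k) @ np.linalg.inv(X)   # A^k = X Lambda^k X^{-1}

print(np.allclose(Ak, np.linalg.matrix_power(A, k)))   # True
```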

Similarity

Similar matrices are different basis representations of the same linear transformation; in particular, similar matrices have the same set of eigenvalues (spectrum).

Definition (similarity) $A, B \in M_n$. We say that $B$ is similar to $A$ if there exists a nonsingular $S \in M_n$ such that

$$B = S^{-1} A S$$

Similarity is an equivalence relation on $M_n$; we sometimes write $A \sim B$ to express similarity.

Corollary (invariance under similarity) Several properties are preserved by the similarity relation: for example, the eigenvalues (spectrum), the characteristic polynomial, and the rank.

However, note that the same set of eigenvalues does not imply similarity; for example, $\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$ and the zero matrix both have spectrum $\{0, 0\}$ but are not similar.

Definition (diagonalizable) If $A \in M_n$ is similar to a diagonal matrix, then $A$ is said to be diagonalizable

The purpose of eigendecomposition is to find a similar diagonal matrix.

Note that diagonalizability does not require the eigenvectors to form an orthonormal basis.

Lemma (sum of rank-1 matrices) A diagonalizable $A \in M_n$ can be decomposed into a sum of $n$ rank-1 matrices

$$A = \sum_i \lambda_i x_i y_i^T$$

where $x_i$ are the (right) eigenvectors and $y_i$ are the left eigenvectors of $A$ (i.e. $y_i^T A = \lambda_i y_i^T$), normalized so that $y_i^T x_i = 1$

Unitary, Normal Matrix

analogous to a number $r$ where $|r|=1$

Definition (unitary matrix, orthogonal matrix) A matrix $U \in M_n$ is unitary if $U U^* = I$. A matrix $U \in M_n(R)$ is real orthogonal if $U^T U = I$

The set of unitary matrices in $M_n$ forms a group $U(n)$, which is a subgroup of $GL(n)$.

Decomposition (QR decomposition) Decomposition of a matrix into an orthogonal matrix and an upper triangular matrix

Note that it can be viewed as an enhanced version of the CR decomposition obtained via Gram-Schmidt orthogonalization

A thin version of QR is also available
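A minimal numpy sketch of the thin (reduced) QR decomposition of an arbitrary tall matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 3))

Q, R = np.linalg.qr(A)              # thin/reduced QR: Q is 6x3, R is 3x3 upper triangular

print(np.allclose(Q.T @ Q, np.eye(3)))     # Q has orthonormal columns
print(np.allclose(A, Q @ R))               # A = QR
print(np.allclose(R, np.triu(R)))          # R is upper triangular
```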

Unitary Similarity

Unitary similarity is a special case of similarity

Definition (unitary similarity) $A$ is unitarily similar to $B$ if there exists a unitary matrix $U$ such that

$$A = UBU^*$$

We say that $A$ is unitarily diagonalizable if it is unitarily similar to a diagonal matrix

Normal Matrices

Definition (normal matrix) A matrix $A \in M_n$ is called normal if $A$ commutes with its conjugate transpose

$$A A^{*} = A^{*} A$$

Example: a diagonalizable matrix need not be a normal matrix. The matrix below has distinct eigenvalues $1, 2$ (hence is diagonalizable), but $MM^T \neq M^T M$

$$M = \begin{pmatrix}
1 & 1 \\
0 & 2 \\
\end{pmatrix}$$

Theorem (statements of normal matrix) Let $A= [ a_{ij} ] \in M_n$ have eigenvalues $\lambda_1, …, \lambda_n$. The following statements are equivalent

  • $A$ is normal
  • A is unitarily diagonalizable
  • $\sum_{i,j=1}^n |a_{ij}|^2 = \sum_{i=1}^n |\lambda_i|^2$
  • $A$ has $n$ orthonormal eigenvectors.

Unitary Equivalence and Singular Value Decomposition

Definition (equivalence) Two matrices are said to be equivalent iff there exist invertible matrices $P,Q$ such that

$$A = PBQ$$

Definition (unitary equivalence) For $A \in M_{n,m}$, the transformation $A \to UAV$ in which $U \in M_n,V \in M_m$ are both unitary, is called unitary equivalence

Decomposition (Singular Value Decomposition) Let $A \in M_{n,m}$ be a matrix with rank $r$. There exist a matrix $\Sigma \in M_{n,m}$ whose only nonzero entries are the diagonal entries $\sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_r > 0$, and orthogonal matrices $U \in M_{n,n}$, $V \in M_{m,m}$, such that

$$A = U\Sigma V^T$$

(orthogonal) x (diagonal) x (orthogonal)

The reduced form of SVD using rank $r$ can be written as

$$A = U_{n \times r} \Sigma_{r \times r} V_{m \times r}^T $$

This decomposition can also be written in the rank 1 sum form

$$A = \sum_{i=1}^r \sigma_i u_i v_i^T$$

where $u_i, v_i$ are the left and right singular vectors; they are connected with $A$ by

$$A v_i = \sigma_i u_i$$

In matrix form, this is

$$AV = U \Sigma$$

Theorem (Eckart-Young) The sum of the first $k$ rank-1 terms, $A_k = \sum_{i=1}^k \sigma_i u_i v_i^T$, is the best rank-$k$ approximation of $A$: for any matrix $B$ of rank at most $k$,

$$|| A - A_k || \leq ||A-B||$$
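A minimal numpy sketch of the truncated SVD approximation; the competing matrix $B$ below is just an arbitrary random rank-$k$ matrix used to illustrate the inequality, not a proof:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((8, 6))
k = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]      # best rank-k approximation

# Compare against an arbitrary rank-k matrix B, in spectral and Frobenius norm
B = rng.standard_normal((8, k)) @ rng.standard_normal((k, 6))
for ord_ in (2, 'fro'):
    print(np.linalg.norm(A - A_k, ord_) <= np.linalg.norm(A - B, ord_))   # True
```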

Note (singular value VS eigenvalue)

Note that singular values do not equal eigenvalues in general; the largest singular value is an upper bound on the magnitude of every eigenvalue.

$$\sigma_1 \geq |\lambda_i|$$

However, in some cases they coincide. For example:

If $S=Q\Lambda Q^T$ is a symmetric positive definite matrix, then $U=V=Q$ and $\Sigma=\Lambda$

If $S$ is symmetric but has a negative eigenvalue $\lambda_i$, then the corresponding singular value is $|\lambda_i| = -\lambda_i$, and the sign flip is absorbed into one of the corresponding singular vectors (e.g. $u_i = -q_i$ while $v_i = q_i$).

Decomposition (Polar Decomposition) The polar decomposition is analogous to $re^{i\theta}$: an arbitrary square matrix $A$ can be decomposed into the product of an orthogonal matrix $Q$ and a symmetric positive semidefinite matrix $S$

$$A = U \Sigma V^T = (UV^T)(V \Sigma V^T) = QS$$
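A minimal numpy sketch constructing the polar decomposition from the SVD of an arbitrary square test matrix:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 4))

U, s, Vt = np.linalg.svd(A)
Q = U @ Vt                       # orthogonal factor
S = Vt.T @ np.diag(s) @ Vt       # symmetric positive semidefinite factor

print(np.allclose(A, Q @ S))                      # A = QS
print(np.allclose(Q @ Q.T, np.eye(4)))            # Q is orthogonal
print(np.allclose(S, S.T), np.all(np.linalg.eigvalsh(S) >= -1e-12))  # S is symmetric PSD
```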

Hermitian, Symmetric Matrix

analogous to a real number

Decomposition (eigen-decomposition) The eigen-decomposition of a symmetric matrix is as follows:

$$A = Q\Lambda Q^T$$

In an equivalent form, it can be represented by a rank 1 symmetric matrix sum

$$A = \sum_i \lambda_i q_i q_i^T$$

Eigenvectors of a symmetric matrix corresponding to distinct eigenvalues are orthogonal. Suppose $Sx = \lambda x$ and $Sy = \alpha y$; then $y^T S x = \lambda y^T x$, $x^T S y = \alpha x^T y$, and $y^T S x = x^T S y$. Therefore $(\lambda - \alpha) x^T y = 0$, so $x^T y = 0$ when $\lambda \neq \alpha$.
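A minimal numpy sketch using numpy.linalg.eigh, which returns an orthonormal eigenbasis of a symmetric matrix:

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.standard_normal((4, 4))
S = X + X.T                                   # arbitrary symmetric test matrix

lam, Q = np.linalg.eigh(S)                    # eigenvalues ascending, Q orthogonal

print(np.allclose(Q.T @ Q, np.eye(4)))        # orthonormal eigenvectors
print(np.allclose(S, Q @ np.diag(lam) @ Q.T)) # S = Q Lambda Q^T

# Equivalent rank-1 sum: S = sum_i lam_i q_i q_i^T
S_sum = sum(lam[i] * np.outer(Q[:, i], Q[:, i]) for i in range(4))
print(np.allclose(S, S_sum))
```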

Congruences and Diagonalizations

Definition (*congruence, T-congruence) Let $A,B \in M_n$, if there exists a nonsingular matrix $S$ such that

$$B=SAS^*$$

Then $B$ is said to be *congruent or conjunctive to $A$. If

$$B=SAS^T$$

Then $B$ is said to be T-congruent to $A$

Note that both types of congruence are equivalence relations.

Theorem (Sylvester’s law of inertia) *Congruent Hermitian matrices have the same number of positive, negative, and zero eigenvalues

Positive Semidefinite, Positive Definite Matrix

analogous to a nonnegative or a positive number; it corresponds to a positive operator in linear algebra

Definition (positive definite, positive semidefinite) A Hermitian matrix $A \in M_n$ is positive definite if for all nonzero $x \in C^n$

$$x^* A x > 0$$

It is positive semidefinite if for all $x \in C^n$

$$x^* A x \geq 0$$

Characterizations and Properties

Property All diagonal entries of a Hermitian positive semidefinite matrix are real and nonnegative.

Theorem (eigenvalue criterion) A Hermitian matrix is positive semidefinite iff all of its eigenvalues are nonnegative; it is positive definite iff all of its eigenvalues are positive.

Theorem (Sylvester’s criterion) Let $A \in M_n$ be Hermitian

  • If every principal minor of $A$ is nonnegative, then $A$ is positive semidefinite
  • If every leading principal minor of $A$ is positive, then $A$ is positive definite.

Decomposition (Cholesky decomposition) For a Hermitian positive definite matrix $A$, the Cholesky decomposition has the following form, where $L$ is a lower triangular matrix with real and positive diagonal entries

$$A = L L^*$$

Note that this can be viewed as computing a "square root" of $A$
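A minimal numpy sketch of the Cholesky factorization of an arbitrary symmetric positive definite test matrix:

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.standard_normal((4, 4))
A = X @ X.T + 4 * np.eye(4)        # symmetric positive definite

L = np.linalg.cholesky(A)          # lower triangular with positive diagonal

print(np.allclose(A, L @ L.T))             # A = L L^T ("square root" flavor)
print(np.allclose(L, np.tril(L)))          # L is lower triangular
print(np.all(np.diag(L) > 0))              # positive diagonal entries
```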

Reference

[1] Horn, Roger A., and Charles R. Johnson. Matrix analysis. Cambridge university press, 2012.

[2] Golub, Gene H., and Charles F. Van Loan. Matrix Computations. Johns Hopkins University Press, 1996.

[3] Petersen, K. B., and M. S. Pedersen. “The Matrix Cookbook, vol. 7.” Technical University of Denmark 15 (2008).

[4] Strang, Gilbert. Linear algebra and learning from data. Wellesley-Cambridge Press, 2019.