# 0x023 Matrix Analysis

Contents Index

## Foundation

### Subspace

Definition (fundamental subspace) There are four fundamental subspaces in linear algebra

• $N(A)$: null space of A
• $C(A)$: column space of A
• $N(A^T)$: null space of $A^T$
• $C(A^T)$: row space of $A$

Lemma

• $N(A) \perp C(A^T)$
• $N(A) + C(A) = n$
• $N(A^T) \perp C(A)$
• $N(A^T) + C(A^T) = m$

The important thing here is that null space $N(A)$ is orthogonal to row space $C(A^T)$, because for any vector $x$ such that $Ax=0$, $x$ is orthogonal to every row in $A$, therefore any linear combination from row space is also orthogonal to $x$. In other words, $Ax = 0$ implies for all $y$, $(y^T A)x = 0$, therefore, $y^TA$ and $x$ are orthogonal.

Definition (rank) rank is the dimension size of the column space

$$rank(A) = \dim range(A)$$

Lemma (properties of rank)

• $rank(A^T) = rank(A)$
• $|rank(A) – rank(B)| \leq rank(A+B) \leq rank(A) + rank(B)$
• If $A,C$ is nonsingular, then $rank(AB) = rank(B) = rank(BC)$

### Trace

Trace has a cyclic permutation property useful to compute derivatives

$$tr(\mathbf{ABC}) = tr(\mathbf{CAB}) = tr(\mathbf{BCA})$$

### Derivatives

Summary of some important derivatives with respect to matrix and vector

$$\big( \frac{\partial \mathbf{a}}{\partial x} \big)_i = \frac{\partial a_i}{\partial x}$$

$$\big( \frac{\partial x}{\partial \mathbf{a}} \big)_i = \frac{\partial x}{\partial a_i}$$

$$\big( \frac{\partial \mathbf{a}}{\partial \mathbf{b}} \big)_{i,j} = \frac{\partial a_i}{\partial b_j}$$

Some important conclusions are

$$\frac{\partial (\mathbf{x^T a}) } {\partial x} = \frac{\partial (\mathbf{a^T x}) } {\partial x} = \mathbf{a}$$

$$\frac{\partial (\mathbf{a^T A a})}{\partial \mathbf{a}} = \mathbf{(A+A^T)a}$$

$$\frac{\partial}{\partial \mathbf{A}} tr(\mathbf{BA}) = \mathbf{B^T}$$

$$\frac{\partial \log \det(A)}{\partial A} = A^{-T}$$

### Submatrices

schur’s complement of the block matric

$$\begin{pmatrix} A & B \\ C&D \\ \end{pmatrix}$$

is defined to be

$$M = (A – BD^{-1}C)^{-1}$$

The inverse of the original matrix

$$\begin{pmatrix} M & -MBD^{-1} \\ -D^{-1}CM & D^{-1} + D^{-1}CMBD^{-1} \\ \end{pmatrix}$$

A good reference is here

### Norm

Norms arise naturally in the study of power series of matrices and in the analysis of numerical computations

For example, it is sufficient to say the following formula is valid when any matrix norm of $A$ is less than 1

$$(I-A)^{-1} = I+A+A^2+A^3…$$

Definition (spectral norm)

$$||A||_2 = \max \frac{||Ax||}{||x||} = \sigma_1$$

Definition (Frobenius norm) Frobenius norm is to think of matrix as a long vector and to take vector norm.

$$||A||_{F} = \sqrt{\sigma^2_1 + … + \sigma^2_r}$$

Proof $||A||_F = \sum_{i,j} (a_{i,j})^2 = tr(A^T A) = \sum_i \lambda_i = \sum_i \sigma_i^2$

Definition (nuclear norm)

$$||A||_{N} = \sigma_1 + \sigma_2 + … + \sigma_r$$

## Triangular Matrix

Decomposition (LU decomposition)

## Diagonal matrix

This section is to study of similarity of $A \in M_n$ via a general nonsingular matrix $S$: the transformation $A \to S^{-1} A S$

Diagonalizable matrices have independent eigenvectors, but not necessarily orthogonal.

### Eigenvalue and Eigenvector

Definition (spectrum) The spectrum of $A \in M_n$ is the set of all $\lambda \in C$ that are eigenvalues of $A$; denoted by $\sigma(A)$

A diagonal matrix can be represented in the following form

$$AX = X \Lambda$$

where $\Lambda$ is a diagonal matrix $diag(\lambda_1, …, \lambda_n)$, $X$ is a matrix whose columns are eigenvectors $x_i$

Because columns of $X$ are linear independent and $X$ spans the whole space, $X$ is invertible, and we obtain the following decomposition.

Decomposition (eigen-decomposition) An diagonalizable matrix $A$ can be factorized into the following form

$$A = X \Lambda X^{-1}$$

This decomposition can be interpreted clearly by considering component.

If $v=\sum_i c_i x_i$, Multiplying $X^{-1}$ is to extract the coefficient component under the basis of $X$

$$X^{-1} v = (c_1, …, c_n)$$

Multiplying $A$ is to extend linearly each eigenvector by eigenvalue.

$$AX^{-1} v = (c_1 \lambda_1, …, c_n \lambda_n)$$

Then using the original basis to represent the vector by multiplying $X$

1 & 1 \\
0 & 2 \\
\end{pmatrix}$$Theorem (statements of normal matrix) Let A= [ a_{ij} ] \in M_n have eigenvalues \lambda_1, …, \lambda_n. The following statements are equivalent • A is normal • A is unitarily diagonalizable • \sum_{i,j=1}^n |a_{ij}|^2 = \sum_i^n |\lambda_i|^2 • A has n orthonormal eigenvectors. ### Unitary Equivalence and Singular Value Decomposition Definition (equivalence) Two matrices are said to be equivalent iff there exist invertible matrices P,Q such that$$A = PBQ$$Definition (unitary equivalence) For A \in M_{n,m}, the transformation A \to UAV in which U \in M_n,V \in M_m are both unitary, is called unitary equivalence Decomposition (Singular Value Decomposition) Let A \in M_{n,m} matrix with rank r, there exists a matrix \Sigma \in M_{n,m} where the diagonal entries are \sigma_1 \geq \sigma_2 … \geq \sigma_r \geq 0, and orthogonal matrix U \in M_{n,n}, V \in M_{m,m} such that$$A = U\Sigma V^T$$(orthogonal) x (diagonal) x (orthogonal) The reduced form of SVD using rank r can be written as$$A = U_{n \times r} \Sigma_{r \times r} V_{m \times r}^T $$This decomposition can also be written in the rank 1 sum form$$A = \sum_{i=1}^r u_i v_i^T$$u_i, v_i are left and right eigenvectors, they are connected with A by$$A v_i = \sigma_i u_i$$In the matrix form, it is$$AV = \Sigma U$$Theorem (Eckart-Young) The first k rank 1 sum is the best rank k approximation of A, suppose that B has rank k, then$$|| A – A_k || \leq ||A-B||$$Note (singular value VS eigenvalue) Note that singular value does not match eigenvalue in general, actually the first singular value is the upper bound.$$\sigma_1 \geq \lambda$$However, in some cases, they could be identical, for example. If S=Q\Lambda Q^T is symmetric positive definite matrix, then U=V=Q and \Sigma=\Lambda If S is symmetric but has negative eigenvalue, then the corresponding singular value is its reverse, and one set of the corresponding singular vector is its reverse. Decomposition (Polar Decomposition) polar decomposition is analogous to re^{i\theta}, where an arbitrary matrix A can be decomposed into an orthogonal matrix Q and symmetric positive semidefinite matrix S$$A = U \Sigma V^T = (UV^T)(V^T \Sigma V) = QS$$## Hermitian, Symmetric Matrix analogous to a real number Decomposition (eigen-decomposition) The eigen-decomposition of a symmetric matrix is as follows:$$A = Q\Lambda Q^T$$In an equivalent form, it can be represented by a rank 1 symmetric matrix sum$$A = \sum_i \lambda_i q_i q_i^T$$Eigenvector of symmetric matrix are orthogonal. Suppose Sx = \lambda x, Sy = \alpha y, theny^T S x = \lambda y^T x, x^T S y = \alpha x^T y and y^T S x = x^T S y. therefore \lambda – \alpha x^T y = 0 and x^T y = 0 when \lambda \neq \alpha ### Congruences and Diagonalizations Definition (*congruence, T-congruence) Let A,B \in M_n, if there exists a nonsingular matrix S such that$$B=SAS^*$$Then B is said to be *congruent or conjunctive to A. If$$B=SAS^T$$Then B is said to be T-congruent to A Note that both types of congruence are equivalence relations. Theorem (Sylvester’s law of inertia) Congruent matrices have the same number of positive, negative, zero eigenvalues ## Positive Semidefinite, Positive Definite Matrix analogous to a non-negative, positive number. It is corresponding to the positive operator in linear algebra. Definition (positive definite, positive semidefinite) A Hermitian matrix A \in M_n is positive definite if for all nonzero x \in C^n$$x^* A x > 0$$It is positive semidefinite if for all nonzero x \in C^n$$x^* A x \geq 0$$### Characterizations and Properties Property All entries on diagonal of Hermitian positive semidefinite matrix are positive. Theorem (eigenvalue criterion) A Hermitian matrix is positive semidefinite iff all of is eigenvalues are nonnegative, it is positive definite iff all of its eigenvalues are positive. Theorem (Sylvester’s criterion) Let A \in M_n be Hermitian • If every principal minor of A is nonnegative, then A is positive semidefinite • If every leading principal minor of A is positive, then A is positive definite. Decomposition (Cholesky decomposition) Cholesky decomposition is the decomposition of the following form where L is a lower triangular matrix with real and positive diagonal entries$$A = L L^*

Note that this is to find the square root

## Reference

[1] Horn, Roger A., and Charles R. Johnson. Matrix analysis. Cambridge university press, 2012.

[2] Golub, G. H. (1996). CF vanLoan, Matrix Computations. The Johns Hopkins.

[3] Petersen, K. B., and M. S. Pedersen. “The Matrix Cookbook, vol. 7.” Technical University of Denmark 15 (2008).

[4] Strang, Gilbert. Linear algebra and learning from data. Wellesley-Cambridge Press, 2019.