- Steven J. Leon
- sixth edition
- row equivalent
- null space
- basis vectors are linearly independent by definition
- N(A)
- null space of A
- R(A)
- range of A which is actually the column space
- rank
- dimension of row space
- nullity
- dimension of null space
- the rank nullity theorem
- rank(A) + nullity(A) = n
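A quick numerical check of the theorem, as a minimal sketch assuming SymPy is available; the matrix below is an arbitrary example.

```python
import sympy as sp

# rank(A) + nullity(A) = n, where n is the number of columns of A
A = sp.Matrix([[1, 2, 3],
               [2, 4, 6],
               [1, 0, 1]])

rank = A.rank()
nullity = len(A.nullspace())   # dimension of N(A)
n = A.cols

print(rank, nullity, n)        # 2 1 3
assert rank + nullity == n
```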
- linear map
$L: V \rightarrow W$ - a map from a vector space $(V, +, \cdot)$ to a vector space $(W, +, \cdot)$
- additivity
$L(u + v) = L(u) + L(v)$
- homogeneity
$L(kv) = kL(v)$ $k \in F$
- kernel
$\operatorname{ker}(L) = L^{-1}(0) = \{v \in V \mid L(v) = 0\}$
- image
$\operatorname{img}(L) = L(V) = \{L(v) \in W \mid v \in V\}$
- endomorphism
$V = W$ - ex) $R^3 \rightarrow R^3$
- monomorphism (or injective)
$L(w) = L(v) \Rightarrow w = v$
- epimorphism (or surjective)
$L(V) = W$ - range = codomain
- isomorphism
- monomorphism + epimorphism
- automorphism
- isomorphism + endomorphism
- inverse map
- 2 sided inverse map (left inverse map + right inverse map)
- identity map
$L(v) = v$ $L = I_V$
- inverse map
- left inverse map + right inverse map
- We can see that $A$ and $B$ represent the same linear transform $L$, i.e. $B = S^{-1}AS$
- where
$A$ is with respect to the basis $E$
$B$ is with respect to the basis $F$
$S$ translates a vector represented by the basis $F$ into the same vector represented by the basis $E$
- columns of $S$ can be seen as the basis $F$ represented in terms of the basis $E$
https://en.wikipedia.org/wiki/Matrix_similarity
properties of similar matrices
- same trace
- same determinant
- same rank
- same eigenvalues
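A small numerical check of these properties, as a sketch assuming NumPy; $A$ and the change-of-basis matrix $S$ are arbitrary examples, with $B = S^{-1}AS$.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
S = np.array([[1.0, 2.0],
              [1.0, 1.0]])                  # any invertible matrix
B = np.linalg.inv(S) @ A @ S                # B is similar to A

print(np.trace(A), np.trace(B))                             # same trace
print(np.linalg.det(A), np.linalg.det(B))                   # same determinant
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(B))   # same rank
print(np.sort(np.linalg.eigvals(A)), np.sort(np.linalg.eigvals(B)))  # same eigenvalues
```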
- represent a dataset of documents as a normalized bag of words matrix
- calculate cosine similarity between each document and the search text
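A minimal sketch of this search idea, assuming NumPy; the documents, query, and whitespace tokenization are toy assumptions, not part of the original notes.

```python
import numpy as np

docs = ["linear algebra and matrices",
        "cats and dogs",
        "matrices and eigenvalues"]
query = "eigenvalues of matrices"

vocab = sorted({w for text in docs + [query] for w in text.split()})

def bow(text):
    """Bag-of-words vector normalized to unit Euclidean length."""
    v = np.array([text.split().count(w) for w in vocab], dtype=float)
    return v / np.linalg.norm(v)

X = np.array([bow(d) for d in docs])   # one unit row vector per document
q = bow(query)

similarity = X @ q                     # cosine similarity = dot product of unit vectors
print(similarity)
print(docs[int(np.argmax(similarity))])  # best-matching document
```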
- where
- $X$ is an $n \times p$ matrix centered to the mean of each feature
- $U$ is an $n \times p$ matrix normalized for each feature
- correlation matrix
$C = U^TU$
- covariance matrix
$S = {1 \over n - 1} X^TX$
(Refer to 6.5 - application 4)
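A short sketch of both matrices, assuming NumPy; the data matrix is a toy example, and the results are compared against NumPy's built-in corrcoef/cov.

```python
import numpy as np

data = np.array([[1.0, 2.0, 0.5],
                 [2.0, 1.0, 1.5],
                 [3.0, 4.0, 2.5],
                 [4.0, 3.0, 3.5]])
n = data.shape[0]

X = data - data.mean(axis=0)            # center each feature (column)
U = X / np.linalg.norm(X, axis=0)       # scale each centered feature to unit length

C = U.T @ U                             # correlation matrix
S = X.T @ X / (n - 1)                   # covariance matrix

print(np.allclose(C, np.corrcoef(data, rowvar=False)))  # True
print(np.allclose(S, np.cov(data, rowvar=False)))       # True
```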
- orthogonal subspaces
- orthogonal complement
- fundamental subspaces theorem
$N(A) = R(A^T)^\perp$ and $N(A^T) = R(A)^\perp$
- where
$R(A)$ is the range of $A$, which is the column space of $A$
- Frobenius norm
- $||A||_F = (\langle A, A\rangle)^{1/2} = (\sum\limits_{i=1}^{m} \sum\limits_{j=1}^{n} a_{ij}^2)^{1/2} = (\sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2)^{1/2}$
- where
$A \in R^{m \times n}$
$\sigma_i$ is a singular value - Refer to the lemma 6.5.2
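A quick numerical check of this identity, assuming NumPy; $A$ is an arbitrary example.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])

entries   = np.sqrt((A ** 2).sum())                                   # sum of squared entries
singulars = np.sqrt((np.linalg.svd(A, compute_uv=False) ** 2).sum())  # sum of squared singular values
builtin   = np.linalg.norm(A, 'fro')

print(entries, singulars, builtin)   # all three agree
```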
Inner product may be defined as
where
(Theorem 5.7.2)
Let
where
- if $n = 0$, $\alpha_0 = \gamma_0 = 1$
- if $n > 0$
(Legendre polynomials)
if we choose the inner product:
$\langle p, q \rangle = \int_{-1}^{1} p(x) q(x)\, dx$
recursion relation:
$(n + 1) p_{n+1}(x) = (2n + 1) x\, p_n(x) - n\, p_{n-1}(x)$
the first five polynomials of the sequence:
$p_0(x) = 1$ $p_1(x) = x$ $p_2(x) = {1 \over 2}(3x^2 - 1)$ $p_3(x) = {1 \over 2}(5x^3 - 3x)$ $p_4(x) = {1 \over 8}(35x^4 - 30x^2 + 3)$
(Chebyshev polynomials)
inner product:
$\langle p, q \rangle = \int_{-1}^{1} p(x) q(x) (1 - x^2)^{-1/2}\, dx$
properties:
$T_n(\cos\theta) = \cos n\theta$
recursion relation:
$T_{n+1}(x) = 2x T_n(x) - T_{n-1}(x)$ for $n \ge 1$
the first four polynomials of the sequence:
$T_0(x) = 1$ $T_1(x) = x$ $T_2(x) = 2x^2 - 1$ $T_3(x) = 4x^3 - 3x$
(Jacobi polynomials - generalization of Legendre and Chebyshev polynomials)
(Hermite polynomials)
inner product:
$\langle p, q \rangle = \int_{-\infty}^{\infty} p(x) q(x) e^{-x^2}\, dx$
recursion relation:
$H_{n+1}(x) = 2x H_n(x) - 2n H_{n-1}(x)$
the first four polynomials of the sequence:
$H_0(x) = 1$ $H_1(x) = 2x$ $H_2(x) = 4x^2 - 2$ $H_3(x) = 8x^3 - 12x$
(Laguerre polynomials)
inner product:
$\langle p, q \rangle = \int_{0}^{\infty} p(x) q(x) e^{-x}\, dx$
recursion relation:
$(n + 1) L_{n+1}^{(0)}(x) = (2n + 1 - x) L_n^{(0)}(x) - n L_{n-1}^{(0)}(x)$
the first four polynomials of the sequence:
$L_0^{(0)}(x) = 1$ $L_1^{(0)}(x) = 1 - x$ $L_2^{(0)}(x) = {1 \over 2}(x^2 - 4x + 2)$ $L_3^{(0)}(x) = {1 \over 6}(-x^3 + 9x^2 - 18x + 6)$
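A sketch that generates the Legendre sequence above from its three-term recursion, assuming SymPy is available; the same pattern works for the other families with their own recursion coefficients.

```python
import sympy as sp

x = sp.symbols('x')

def legendre_seq(count):
    """p_0, ..., p_{count-1} via (n+1) p_{n+1} = (2n+1) x p_n - n p_{n-1}."""
    p = [sp.Integer(1), x]
    for n in range(1, count - 1):
        p.append(sp.expand(((2 * n + 1) * x * p[n] - n * p[n - 1]) / (n + 1)))
    return p[:count]

for n, pn in enumerate(legendre_seq(5)):
    print(f"p_{n}(x) =", pn)   # matches the five polynomials listed above
```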
characteristic polynomial
They have the same eigenvalues.
(Theorem 6.3.1)
- If $\lambda_1, \lambda_2, ..., \lambda_k$ are distinct eigenvalues of an $n \times n$ matrix $A$ with corresponding eigenvectors $\mathbf{x}_1, \mathbf{x}_2, ..., \mathbf{x}_k$,
- then $\mathbf{x}_1, \mathbf{x}_2, ..., \mathbf{x}_k$ are linearly independent.
(Theorem 6.3.2)
- An $n \times n$ matrix $A$ is diagonalizable if and only if $A$ has $n$ linearly independent eigenvectors.
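A minimal diagonalization check, assuming NumPy; $A$ is an arbitrary example with distinct eigenvalues, hence 2 linearly independent eigenvectors.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

eigvals, X = np.linalg.eig(A)     # columns of X are eigenvectors
D = np.diag(eigvals)

print(np.allclose(A, X @ D @ np.linalg.inv(X)))   # True: A = X D X^{-1}
print(np.allclose(np.linalg.inv(X) @ A @ X, D))   # equivalently D = X^{-1} A X
```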
For any
If
- diagonalizable
- unitarily diagonalizable
- orthogonally diagonalizable
- unitarily diagonalizable
- defective over real and complex entries
- non diagonalizable for those entries
- Schur form
$T = U^HAU$
- $T$ : upper triangular
- Jordan normal form or Jordan canonical form
$J = P^{-1}AP$ - whether it's defective or not (over real and complex entries)
- diagonalizable
$D = P^{-1}AP$
- diagonalizable by a unitary matrix
$D = U^HAU$
- orthogonally diagonalizable
$D = P^{T}AP$
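A sketch of computing the Schur form $T = U^HAU$, assuming SciPy and NumPy are available; scipy.linalg.schur returns $T$ and a unitary $Z$ with $A = ZTZ^H$, and the example matrix is arbitrary.

```python
import numpy as np
from scipy.linalg import schur

A = np.array([[0.0, -1.0],
              [1.0,  0.0]])          # real matrix with complex eigenvalues +-i

T, Z = schur(A, output='complex')    # T upper triangular, Z unitary

print(np.allclose(A, Z @ T @ Z.conj().T))   # A = Z T Z^H
print(np.allclose(T, Z.conj().T @ A @ Z))   # equivalently T = Z^H A Z
print(np.allclose(np.tril(T, -1), 0))       # T is upper triangular
```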
(conjugate transpose or Hermitian transpose)
$\mathbf{z}^H = \overline{\mathbf{z}}^T$
(complex inner products)
$\langle\mathbf{z}, \mathbf{z}\rangle \ge 0$ $\langle\mathbf{z}, \mathbf{w}\rangle = \overline{\langle\mathbf{w}, \mathbf{z}\rangle}$ $\langle\alpha\mathbf{z} + \beta\mathbf{w}, \mathbf{u}\rangle = \alpha\langle \mathbf{z}, \mathbf{u}\rangle + \beta\langle \mathbf{w}, \mathbf{u}\rangle$
(standard inner product on $C^n$)
$\langle \mathbf{z},\mathbf{w} \rangle = \mathbf{w}^H \mathbf{z}$ $||\mathbf{z}|| = (\mathbf{z}^H\mathbf{z})^{1/2}$
(Hermitian matrices)
A matrix $M$ is said to be Hermitian if $M = M^H$.
(Some rules)
$(A^H)^H = A$ $(\alpha A + \beta B)^H = \overline\alpha A^H + \overline\beta B^H$ $(AC)^H = C^H A^H$
(Theorem 6.4.1)
The eigenvalues of a Hermitian matrix are all real. Furthermore, eigenvectors belonging to distinct eigenvalues are orthogonal.
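A numerical illustration of this theorem, assuming NumPy; $A$ below is an arbitrary Hermitian example.

```python
import numpy as np

A = np.array([[2.0, 1 - 1j],
              [1 + 1j, 3.0]])
assert np.allclose(A, A.conj().T)              # A is Hermitian

eigvals, V = np.linalg.eigh(A)                 # eigh is meant for Hermitian matrices

print(eigvals)                                 # eigenvalues are real
print(np.allclose(V.conj().T @ V, np.eye(2)))  # eigenvectors are orthonormal
```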
(definition)
An $n \times n$ matrix $U$ is said to be unitary if its column vectors form an orthonormal set in $C^n$.
- Note that $U^H = U^{-1}$.
(Corollary 6.4.2)
If the eigenvalues of a Hermitian matrix $A$ are distinct, then there exists a unitary matrix $U$ that diagonalizes $A$.
- Note that this actually holds even if the eigenvalues are not distinct. See 6.4.4 spectral theorem.
(Theorem 6.4.3 - Schur's Theorem)
For each $n \times n$ matrix $A$, there exists a unitary matrix $U$ such that $U^HAU$ is upper triangular.
(Theorem 6.4.4 - Spectral Theorem)
If $A$ is Hermitian, then there exists a unitary matrix $U$ that diagonalizes $A$.
- Note that eigenvectors corresponding to the same eigenvalue need not be orthogonal to each other. However, since every subspace has an orthonormal basis, orthonormal bases can be found for each eigenspace, so an orthonormal basis of eigenvectors can be found.
(Corollary 6.4.5)
If $A$ is a real symmetric matrix, then there exists an orthogonal matrix $U$ that diagonalizes $A$, that is, $U^TAU = D$ is diagonal.
(Definition)
A matrix $A$ is said to be normal if $AA^H = A^HA$.
(Theorem 6.4.6)
A matrix $A$ is normal if and only if $A$ possesses a complete orthonormal set of eigenvectors.
(Theorem 6.5.1 - The SVD Theorem)
If $A$ is an $m \times n$ matrix, then $A$ has a singular value decomposition $A = U\Sigma V^T$, where $U$ is an $m \times m$ orthogonal matrix, $V$ is an $n \times n$ orthogonal matrix, and $\Sigma$ is an $m \times n$ matrix whose off-diagonal entries are all 0 and whose diagonal entries satisfy $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n \ge 0$.
(observations)
- The singular values $\sigma_1, ..., \sigma_n$ are unique.
- Note that the matrices $U$ and $V$ are not unique.
- $V$ diagonalizes $A^T A$
- $U$ diagonalizes $AA^T$
- $\mathbf{v}_j$'s are called right singular vectors
- $\mathbf{u}_j$'s are called left singular vectors
- $A \mathbf{v}_j = \sigma_j \mathbf{u}_j$
- where $j = 1, ..., n$
- If $A$ has rank $r$, then
- (i) $\mathbf{v}_1, ..., \mathbf{v}_r$ form an orthonormal basis for $R(A^T)$.
- (ii) $\mathbf{v}_{r+1}, ..., \mathbf{v}_n$ form an orthonormal basis for $N(A)$.
- (iii) $\mathbf{u}_1, ..., \mathbf{u}_r$ form an orthonormal basis for $R(A)$.
- (iv) $\mathbf{u}_{r+1}, ..., \mathbf{u}_m$ form an orthonormal basis for $N(A^T)$.
- r = (the number of its nonzero singular values)
- Note that the same does not hold for the number of nonzero eigenvalues.
- $A = U_1 \Sigma_1 V_1^T$
- where $U_1 = (\mathbf{u}_1, \mathbf{u}_2, ..., \mathbf{u}_r)$, $V_1 = (\mathbf{v}_1, \mathbf{v}_2, ..., \mathbf{v}_r)$, and $\Sigma_1$ is the $r \times r$ diagonal matrix of the nonzero singular values
- called the compact form of the singular value decomposition of $A$
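A compact-SVD sketch, assuming NumPy; $A$ is an arbitrary rank-2 example, and the rank is read off as the number of nonzero singular values.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = int(np.sum(s > 1e-12))                  # rank = number of nonzero singular values

U1, S1, V1t = U[:, :r], np.diag(s[:r]), Vt[:r, :]
print(r)                                    # 2
print(np.allclose(A, U1 @ S1 @ V1t))        # compact form reproduces A
print(np.allclose(A @ Vt[r:].T, 0))         # v_{r+1}, ..., v_n lie in N(A)
```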
(Lemma 6.5.2)
If $A$ is an $m \times n$ matrix and $Q$ is an $m \times m$ orthogonal matrix, then $||QA||_F = ||A||_F$.
(Theorem 6.5.3)
Let $A$ be an $m \times n$ matrix with singular value decomposition $U \Sigma V^T$, and let $\mathcal{M}$ denote the set of all $m \times n$ matrices of rank $k$ or less, where $k < \operatorname{rank}(A)$. If $X$ is a matrix in $\mathcal{M}$ satisfying
$$||A - X||_F = \min\limits_{S \in \mathcal{M}} ||A - S||_F$$
then
$$||A - X||_F = (\sigma_{k+1}^2 + \sigma_{k+2}^2 + \cdots + \sigma_n^2)^{1/2}$$
In particular, if $A' = U \Sigma' V^T$, where
$$\Sigma' = \left[ \begin{array}{ccc|c} \sigma_1 & & & \\ & \ddots & & O \\ & & \sigma_k & \\ \hline & O & & O \end{array} \right] = \left[ \begin{matrix} \Sigma_k & O \\ O & O \end{matrix} \right]$$
then
$$||A - A'||_F = (\sigma_{k+1}^2 + \sigma_{k+2}^2 + \cdots + \sigma_n^2)^{1/2} = \min\limits_{S \in \mathcal{M}} ||A - S||_F$$
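A numerical check of this best rank-$k$ approximation result, assuming NumPy; $A$ and $k$ are arbitrary.

```python
import numpy as np

A = np.random.default_rng(0).normal(size=(6, 4))
k = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # keep only the k largest singular values

err = np.linalg.norm(A - A_k, 'fro')
print(err, np.sqrt(np.sum(s[k:] ** 2)))       # equal: (sigma_{k+1}^2 + ... + sigma_n^2)^{1/2}
```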
(definition)
The numerical rank of an $m \times n$ matrix is the number of its singular values that exceed a small tolerance chosen to reflect machine precision and the accuracy of the data.
- (The total storage for $A_k$) = $k(2n + 1)$ for an $n \times n$ matrix: $k$ vectors $\mathbf{u}_i$ and $k$ vectors $\mathbf{v}_i$ ($n$ entries each) plus $k$ singular values.
- $\operatorname{argmin}\limits_{i} [Q^T x]_i$
- the number of scalar multiplications: $mn$
- $\operatorname{argmin}\limits_{i} [Q_1^T x]_i = \operatorname{argmin}\limits_{i} [V_1 \Sigma_1 U_1^T x]_i$
- where $\sigma_{r+1} = \cdots = \sigma_{n} = 0$
- the number of scalar multiplications: $r(m + n + 1)$
- Given an $n \times p$ data matrix $X$ centered to each column mean
- finding an orthogonal basis $\mathbf{y}_1, ..., \mathbf{y}_r$ that spans $R(X)$, where $r \le p$, maximizing the variance of $\mathbf{y}_i$ one by one starting from $\mathbf{y}_1$
- $\mathbf{y}_i = X\mathbf{v}_i = \sigma_i \mathbf{u}_i$
- note that $\operatorname{var}(\mathbf{y}_1) = {1 \over n - 1}(X \mathbf{v}_1)^T X\mathbf{v}_1 = \mathbf{v}_1^T S \mathbf{v}_1$
- $S = {1 \over n - 1} X^T X$ - covariance matrix
- $X = U_1 \Sigma_1 V_1^T = U_1W$
- $U_1$ : the principal hidden features ($n \times r$)
- $W$ : representation of observed features as linear combinations of the principal hidden features ($r \times p$)
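A PCA-via-SVD sketch following the notes above, assuming NumPy; the data matrix is a synthetic toy example.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(50, 3)) * np.array([2.0, 1.0, 0.1])   # 3 features with different spreads
n = data.shape[0]

X = data - data.mean(axis=0)                 # center each column
U, s, Vt = np.linalg.svd(X, full_matrices=False)

Y = X @ Vt.T                                 # principal components y_i = X v_i = sigma_i u_i
print(np.allclose(Y, U * s))                 # same thing, column by column
print(s ** 2 / (n - 1))                      # component variances, in decreasing order
print(np.var(Y, axis=0, ddof=1))             # matches
```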
- quadratic equation
$ax^2 + 2bxy + cy^2 + dx + ey + f = 0$
- quadratic form
$\mathbf{x}^TA\mathbf{x}$
- imaginary conic
- degenerate conic
- non degenerate conic
- standard position
- circle
$x^2 + y^2 = r^2$
- ellipse
${x^2 \over \alpha^2} + {y^2 \over \beta^2} = 1$
- hyperbola
${x^2 \over \alpha^2} - {y^2 \over \beta^2} = 1$ ${y^2 \over \alpha^2} - {x^2 \over \beta^2} = 1$
- parabola
$x^2 = \alpha y$ $y^2 = \beta x$
- non standard position
- standard position
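A sketch of moving a conic into standard position by diagonalizing its quadratic form, assuming NumPy; the coefficients below are an arbitrary example of $ax^2 + 2bxy + cy^2 + f = 0$ with no linear terms.

```python
import numpy as np

a, b, c, f = 3.0, 1.0, 3.0, -8.0
A = np.array([[a, b],
              [b, c]])                 # x^T A x = a x^2 + 2b xy + c y^2

eigvals, Q = np.linalg.eigh(A)         # A = Q diag(eigvals) Q^T, Q orthogonal (a rotation)

# Substituting x = Q x' removes the cross term:
#   eigvals[0] * x'^2 + eigvals[1] * y'^2 + f = 0
print(eigvals)                         # [2. 4.]  ->  2x'^2 + 4y'^2 = 8, an ellipse
```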
TODO