Singular Value Decomposition

The Singular Value Decomposition, or SVD for short, is a matrix decomposition method that reduces a matrix to its constituent parts in order to make certain subsequent matrix calculations simpler. It decomposes a matrix into the product of three other matrices:

$A=U\Sigma V^{T}$

Where:

• A is an m$\times$n matrix

• U is an m$\times$n matrix with orthonormal columns

• $\Sigma$ is an n$\times$n diagonal matrix of singular values

• V is an n$\times$n orthogonal matrix
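The decomposition can be computed directly; a minimal sketch using NumPy's `np.linalg.svd`, with `full_matrices=False` to obtain the m$\times$n form of U described above:

```python
import numpy as np

# Example matrix (the same 4x2 matrix used in the worked example below)
A = np.array([[ 2.0, 4.0],
              [ 1.0, 3.0],
              [ 0.0, 1.0],
              [-1.0, 0.5]])

# full_matrices=False returns the "thin" SVD: U is m x n,
# S holds the n singular values, and Vt is V transposed (n x n)
U, S, Vt = np.linalg.svd(A, full_matrices=False)

# The three factors reconstruct A up to floating-point error
A_rebuilt = U @ np.diag(S) @ Vt
print(np.allclose(A, A_rebuilt))  # True
```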

SVD of real symmetric matrix:

• A is real symmetric of size n$\times$n

• $A=A^{T}$

• U=V since $A^{T}A=AA^{T}=A^{2}$ (this holds exactly when A is positive semidefinite; otherwise corresponding columns of U and V may differ in sign)

$A=Q\Sigma Q^{T}$

• Q is of size $n \times n$ and contains the eigenvectors of $A^{2}$

• This is called the spectral decomposition of A

• $\Sigma$ contains the n singular values

• Eigenvectors of A = eigenvectors of $A^{2}$

• Singular values of A = square roots of the eigenvalues of $A^{2}$

• Singular values of A = absolute values of the eigenvalues of A (they coincide with the eigenvalues when A is positive semidefinite)
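These relationships can be checked numerically; a small sketch (assuming NumPy) using a symmetric matrix with a negative eigenvalue, which illustrates that singular values match eigenvalues only in absolute value:

```python
import numpy as np

# A symmetric 2x2 example with eigenvalues 3 and -1
A = np.array([[1.0, 2.0],
              [2.0, 1.0]])

# Spectral decomposition: A = Q diag(lam) Q^T
lam, Q = np.linalg.eigh(A)
assert np.allclose(A, Q @ np.diag(lam) @ Q.T)

# Singular values are the absolute values of the eigenvalues
S = np.linalg.svd(A, compute_uv=False)
print(np.sort(np.abs(lam))[::-1], S)  # both are [3. 1.]
```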

Transformation using SVD

• Transformed data

$T=AV=U\Sigma$

• V is called the SVD transform matrix

• Essentially, T is just a rotation of A

• Dimensionality of T is n

• T is expressed in a set of n basis vectors different from those of the original space

• columns of V give the basis vectors in rotated space

• V shows how each dimension can be represented as a linear combination of other dimensions

• Columns are input basis vectors

• U shows how each object can be represented as a linear combination of other objects

• Columns are output basis vectors

• Lengths of the row vectors are preserved

$\Vert\overrightarrow a_{i}\Vert_{2} =\Vert\overrightarrow t_{i}\Vert_{2}$
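Both identities, $T=AV=U\Sigma$ and the preservation of row lengths, are easy to verify in code; a short sketch assuming NumPy:

```python
import numpy as np

A = np.array([[ 2.0, 4.0],
              [ 1.0, 3.0],
              [ 0.0, 1.0],
              [-1.0, 0.5]])

U, S, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt.T

# Transformed data: T = A V = U Sigma
T = A @ V
assert np.allclose(T, U @ np.diag(S))

# V is orthogonal, so the rotation preserves each row's length
print(np.allclose(np.linalg.norm(A, axis=1), np.linalg.norm(T, axis=1)))  # True
```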

Example

$A=\begin{bmatrix} 2 & 4 \\ 1 & 3 \\ 0 & 1 \\ -1 & 0.5 \end{bmatrix} =U\Sigma V^{T}=\begin{bmatrix} -0.80 & 0.22 & 0.05 & 0.54 \\ -0.56 & -0.20 & -0.34 & -0.71 \\ -0.16 & -0.31 & 0.90 & -0.21 \\ -0.01 & -0.89 & -0.22 & 0.37 \end{bmatrix}\times \begin{bmatrix} 5.54 & 0 \\ 0 & 1.24 \\ 0 & 0 \\ 0 & 0 \end{bmatrix} \times \begin{bmatrix} -0.39 & -0.92 \\ 0.92 & -0.39 \end{bmatrix}$

$T=AV=U\Sigma=\begin{bmatrix} 2 & 4\\ 1 & 3\\ 0 & 1\\ -1 & 0.5 \end{bmatrix} \times\begin{bmatrix} -0.39 & 0.92\\ -0.92 & -0.39 \end{bmatrix} =\begin{bmatrix} -4.46 & 0.27\\ -3.15 & -0.25\\ -0.92 & -0.39\\ -0.06 & -1.15 \end{bmatrix}$

Example compact form

$A=\begin{bmatrix} 2 & 4\\ 1 & 3\\ 0 & 1\\ -1 & 0.5 \end{bmatrix}=U\Sigma V^{T}=\begin{bmatrix} -0.80 & 0.22\\ -0.56 & -0.20\\ -0.16 & -0.31\\ -0.01 & -0.89 \end{bmatrix}\times \begin{bmatrix} 5.54 & 0\\ 0 & 1.24 \end{bmatrix}\times \begin{bmatrix} -0.39 & -0.92\\ 0.92 & -0.39 \end{bmatrix}$

• If A is of size m$\times$n, then in this compact form U is m$\times$n, V is n$\times$n, and $\Sigma$ is an n$\times$n matrix

• This works because there are at most n nonzero singular values in $\Sigma$
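The shape difference between the full and the compact form can be seen directly; a sketch assuming NumPy, where `full_matrices` toggles between the two:

```python
import numpy as np

# Any m x n matrix with m > n; random values are just a placeholder
A = np.random.default_rng(0).normal(size=(4, 2))

# Full SVD: U is m x m (Sigma would be m x n, padded with zero rows)
U_full, S, Vt = np.linalg.svd(A, full_matrices=True)
print(U_full.shape)  # (4, 4)

# Compact SVD: U is m x n -- sufficient, since only n singular values are nonzero
U_thin, S, Vt = np.linalg.svd(A, full_matrices=False)
print(U_thin.shape)  # (4, 2)
```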

Dimensionality reduction using SVD

$A=U\Sigma V^{T}=\sum_{i=1}^{n}\sigma_{ii}\,u_{i}v_{i}^{T}$

• Use only k dimensions

• Retain first k columns for U and V and first k values for $\sum$

• first k columns of V give the basis vectors in reduced space

• Best rank k approximation in terms of sum squared error

$A\approx \sum_{i=1}^{k}\sigma_{ii}\,u_{i}v_{i}^{T}=U_{1...k}\Sigma_{1...k}V^{T}_{1...k}$

$T\approx A V_{1...k}$
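The truncation above can be wrapped in a small helper; a sketch assuming NumPy, with `rank_k_approx` as a hypothetical name:

```python
import numpy as np

def rank_k_approx(A, k):
    """Best rank-k approximation of A in the sum-squared-error sense."""
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep the first k columns of U and V and the first k singular values
    A_k = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]  # U_{1..k} Sigma_{1..k} V^T_{1..k}
    T_k = A @ Vt[:k, :].T                        # T ~ A V_{1..k}
    return A_k, T_k
```

For k equal to the full rank of A, `A_k` reproduces A exactly; smaller k trades reconstruction error for fewer dimensions.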

Example of dimensionality reduction (k=1)

$A\approx A_{k}=U_{k}\Sigma_{k}V_{k}^{T}=\begin{bmatrix} -0.80 & 0 & 0 & 0\\ -0.56 & 0 & 0 & 0\\ -0.16 & 0 & 0 & 0\\ -0.01 & 0 & 0 & 0 \end{bmatrix} \times \begin{bmatrix} 5.54 & 0\\ 0 & 0\\ 0 & 0\\ 0 & 0 \end{bmatrix}\times \begin{bmatrix} -0.39 & -0.92\\ 0 & 0 \end{bmatrix}=\begin{bmatrix} 1.74 & 4.10\\ 1.23 & 2.90\\ 0.35 & 0.84\\ 0.02 & 0.06 \end{bmatrix}$

$T\approx T_{k}=AV_{k}=U_{k}\Sigma_{k}=\begin{bmatrix} 2 & 4\\ 1 & 3\\ 0 & 1\\ -1 & 0.5 \end{bmatrix}\times\begin{bmatrix} -0.39 & 0\\ -0.92 & 0 \end{bmatrix}=\begin{bmatrix} -4.46 & 0\\ -3.15 & 0\\ -0.92 & 0\\ -0.06 & 0 \end{bmatrix}$

Compact way of dimensionality reduction (k=1)

$A\approx A_{k}=U_{k}\Sigma_{k}V_{k}^{T}=\begin{bmatrix} -0.80\\ -0.56\\ -0.16\\ -0.01 \end{bmatrix}\times [5.54]\times \begin{bmatrix} -0.39 & -0.92 \end{bmatrix}=\begin{bmatrix} 1.74 & 4.10\\ 1.23 & 2.90\\ 0.35 & 0.84\\ 0.02 & 0.06 \end{bmatrix}$

$T\approx T_{k}=AV_{k}=U_{k}\Sigma_{k}=\begin{bmatrix} 2 & 4\\ 1 & 3\\ 0 & 1\\ -1 & 0.5 \end{bmatrix}\times\begin{bmatrix} -0.39 \\ -0.92 \end{bmatrix}=\begin{bmatrix} -4.46\\ -3.15\\ -0.92\\ -0.06 \end{bmatrix}$
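The rank-1 numbers above can be reproduced numerically; since the rank-1 term $\sigma_{11}u_{1}v_{1}^{T}$ is invariant to the sign ambiguity in $u_{1}$ and $v_{1}$, the result matches the stated matrix up to rounding:

```python
import numpy as np

A = np.array([[ 2.0, 4.0],
              [ 1.0, 3.0],
              [ 0.0, 1.0],
              [-1.0, 0.5]])

U, S, Vt = np.linalg.svd(A, full_matrices=False)
print(np.round(S, 2))  # [5.54 1.24]

# Rank-1 approximation: sigma_1 * u_1 * v_1^T
A1 = S[0] * np.outer(U[:, 0], Vt[0, :])
print(np.round(A1, 2))
```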

Dimensions to retain:

• Based on the concept of the energy of a dataset
• Total energy is the sum of squares of the singular values

$E=\sum_{i=1}^{n}\sigma_{ii}^{2}$

• Retain k dimensions such that p% of the energy is retained

$E_{k}=\sum_{i=1}^{k}\sigma_{ii}^{2}$

$E_{k}/E\geq p$

• Generally, p is between 80% and 95%
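Choosing k by the energy criterion takes only a few lines of code; a sketch assuming NumPy, with `choose_k` as a hypothetical helper name (here p is a fraction rather than a percentage):

```python
import numpy as np

def choose_k(singular_values, p=0.9):
    """Smallest k such that the retained energy E_k / E is at least p."""
    energy = np.asarray(singular_values, dtype=float) ** 2
    ratio = np.cumsum(energy) / energy.sum()  # E_k / E for k = 1..n
    return int(np.searchsorted(ratio, p) + 1)

# With the example's singular values: E_1 / E = 5.54^2 / (5.54^2 + 1.24^2) ~ 0.95
print(choose_k([5.54, 1.24], p=0.9))   # 1
print(choose_k([5.54, 1.24], p=0.99))  # 2
```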

Advantages of dimensionality reduction:

• It helps in data compression, and hence reduces storage space.
• It reduces computation time.
• It helps remove redundant features, if any.
• It also helps in noise removal, which can improve the performance of models.

Limitations:

• It may lead to some amount of data loss.
• PCA tends to find linear correlations between variables, which is sometimes undesirable.
• PCA fails in cases where mean and covariance are not enough to characterize the dataset.
• We may not know how many principal components to keep; in practice, some rules of thumb are applied.