Intuitive Understanding of Single Value Decomposition (SVD)
Single Value Decomposition (SVD) is a method to factorize or decompose a matrix. Let’s say we view $A$, a matrix whose dimension is $n\times{m}$, as a transform or mapping from $x$ which belongs to $R^n$ to $y$ which belongs to $R^m$, i.e.:
$$ y = Ax $$SVD states:
$$ A = U\Sigma{V}^T $$$U$ and $V^T$ orthogonal, unitary matrices can be viewed as two rotational operands, and the $\Sigma$ can be viewed as scaling factors along each eigenvector in $V^T$. But why two rotations and one scaling? In practice, $x,y$ are usually associated with some meaningful properties—for example, features and items. We are really interested in the relationship between $x$ and $y$, defined by $A$. SVD is used to simplify/clarify the relationship. For example, $V^T$ can be viewed as the projections to $x$ space, sorted by eigenvalues given in $\Sigma$. More specifically, if we examine one of the eigenvector-value-vector trio in SVD:
$$ U\Sigma{V}^T = \begin{bmatrix} u_1 & \dots & u_n \\ \end{bmatrix} \begin{bmatrix} \lambda_{1} & & \\ & \ddots & \\ & & \lambda_{m} \\ 0 & \dots & 0 \\ 0 & \dots & 0 \end{bmatrix} \begin{bmatrix} v_1^T \\ \vdots \\ v_m^T \\ \end{bmatrix} $$ $$ = \begin{bmatrix} \lambda_{1}u_{1} & \dots & \lambda_{m}u_{m} \end{bmatrix} \begin{bmatrix} v_1^T \\ \dots \\ v_m^T \\ \end{bmatrix} =\Sigma(u_{i}\lambda_{i}v_i^T) $$$u_{i}\lambda_{i}v_i^T$ here can be view as a mode or pattern in $A$, with $\lambda_1$ describes how important it is. $u_{i}$ describes its distribution along $R^n$ and $v_i$ for $R^m$. The concept of mode is borrowed from vibration analysis. If you hit a metal bar with a beam, it will vibrate. It may look messy, but the vibrations can be decomposed into a superposition of different modes. The eigenvalues and eigenvectors for each mode are associated with how fast it vibrates and how the displacement looks.
Now let’s check a concrete/everyday example; lets say we have a $A$ that records your $m$ friends’{Tim, Jeff,Steve} preferences for $n${1,…,5} resturants, on the scale of $[0,100]$:
$$ A = \begin{bmatrix} 90 & 10 & 10\\ 90 & 10 & 5\\ 20 & 20 & 5\\ 10 & 20 & 5\\ 10 & 30 & 5 \end{bmatrix} $$It may look like an extreme example, but let’s bear it for a moment and look at the SVD results, particularly $u_{1}\lambda_{1}v_1^T$. We can see that $u_{1}[0,1]$ has a larger weight than the other restaurants, and $v_1[0]$ is more significant in $v_1$. So the mode/pattern we can conclude here is Tim likes the first two restaurants . It’s evident, of course; we can even tell by just staring at the original table $A$.
$$ \sigma_1 = 131.7288, u_{1} = \begin{bmatrix} -0.6870 \\ -0.6870 \\ -0.1781 \\ -0.1035 \\ -0.1167 \\ \end{bmatrix}, v_{1} = \begin{bmatrix} -0.9825 \\ -0.1736 \\ -0.0673 \\ \end{bmatrix} $$However, it is more complicated in real-life data, and we can use SVD to help us draw a similar conclusion. For example, a specific group of people likes a particular group of restaurants, and it turns out they are vegans, salad bars. Now lets check the rest of the eigenvalues are 38 and 1.4, that means $\lambda_1$ accounts for 77% of the eigenvalues. If we just calculate $u_{1}\lambda_{1}v_1^T$, it turns out to be the following, which is already very close to the original table/dataset. As you can imagine, SVD can be used for data compression.
$$ u_{i}\lambda_{i}v_i^T = A(\lambda_1(77\%)) = \begin{bmatrix} 88.9 & 15 & 6\\ 88.9 & 15 & 6\\ 23 & 4 & 1.5\\ 23 & 4 & 0\\ 15 & 2 & 1.5 \end{bmatrix} $$Matlab Code
A = [90,10,5;
90,10,5;
20,20,5;
10,20,5;
10,30,5];
[U,S,V] = svd(A)
k = 1;
U1 = U(:,k)
V1 = V(:,k)
U(:,k)*S(k,k)*V(:,k)'