For the first time, we turn our attention to the factorization of a non-square matrix. The technique we describe is widely used in regression and data analysis.
We assume $A$ is an $m \times n$ matrix with $m \geq n$. Everything we describe can be applied to $A^T$ in the $n \geq m$ case, giving full generality.
Let $A$ be an $m \times n$ matrix.
The rank of $A$, denoted $\mathrm{rank}(A)$, is the number of linearly independent columns of $A$ (equivalently, the number of linearly independent rows).
The nullity of $A$, denoted $\mathrm{nullity}(A)$, is given by $n - \mathrm{rank}(A)$. It is the dimension of the null space of $A$, i.e., the size of the largest linearly independent set of vectors $v \in \mathbb{R}^n$ such that $Av = 0$.
Note that $\mathrm{rank}(A) \leq n$.
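As a quick numerical illustration (a minimal sketch using NumPy; the matrix below is an arbitrary example, not one from the text):

```python
import numpy as np

# Arbitrary 4x3 example: the third column equals the sum of the first
# two, so rank(A) = 2 and nullity(A) = 3 - 2 = 1.
A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [2., 1., 3.],
              [1., 2., 3.]])

m, n = A.shape
rank = np.linalg.matrix_rank(A)  # number of linearly independent columns
nullity = n - rank               # dimension of the null space of A
print(rank, nullity)             # 2 1
```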
The SVD will be based, first, on a factorization of the $n \times n$ matrix $A^TA$, so the following facts are of use.
First, $A^TA$ is symmetric: $(A^TA)^T = A^T(A^T)^T = A^TA$. A similar calculation shows the same for $AA^T$.
Next, $\mathrm{nullity}(A^TA) = \mathrm{nullity}(A)$. Let $v \in \mathbb{R}^n$ be such that $Av = 0$. Then it is clear that $A^TAv = 0$. Conversely, if $v \in \mathbb{R}^n$ is such that $A^TAv = 0$, we have $0 = v^TA^TAv = \|Av\|_2^2$, which implies $Av = 0$. Since the two matrices share the same null vectors (i.e., the same null space), their nullities agree.
It then follows that $\mathrm{rank}(A^TA) = \mathrm{rank}(A)$. This is direct from the definition of rank because the two matrices have the same number of columns: $\mathrm{rank}(A) = n - \mathrm{nullity}(A) = n - \mathrm{nullity}(A^TA) = \mathrm{rank}(A^TA)$.
See the proof that $\|A\|_2 = [\rho(A^TA)]^{1/2}$ in Lecture 17.
The non-zero eigenvalues of $A^TA$ and $AA^T$ coincide. Assume, for $v \neq 0$ and $\lambda \neq 0$, that $A^TAv = \lambda v$. Then $AA^TAv = \lambda Av$. Let $w = Av$ and we have $AA^Tw = \lambda w$. So, if $w \neq 0$ then $\lambda$ is an eigenvalue of $AA^T$, and $w$ cannot be zero because $A^TAv \neq 0 \Rightarrow Av \neq 0$. Now assume $AA^Tw = \lambda w$ for $\lambda \neq 0$ and $w \neq 0$. Then $A^TAA^Tw = \lambda A^Tw$. Let $v = A^Tw$ and we have $A^TAv = \lambda v$, so we can conclude that $\lambda$ is an eigenvalue of $A^TA$ if $v \neq 0$. Again, if $v = 0$ then $AA^Tw = Av = 0$, which contradicts $\lambda, w \neq 0$. So, $\lambda$ is an eigenvalue of $A^TA$.
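This fact is easy to check numerically; here is a minimal sketch with NumPy, using the $3 \times 2$ matrix from the example at the end of this section:

```python
import numpy as np

A = np.array([[1., -1.],
              [2.,  0.],
              [1.,  1.]])

# Both products are symmetric, so eigvalsh applies.
eig_AtA = np.linalg.eigvalsh(A.T @ A)  # n = 2 eigenvalues
eig_AAt = np.linalg.eigvalsh(A @ A.T)  # m = 3 eigenvalues

# The non-zero eigenvalues agree; AA^T carries m - n extra zeros.
print(eig_AtA)  # [2. 6.]
print(eig_AAt)  # [0. 2. 6.] (up to rounding)
```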
The singular values $\sigma_1, \sigma_2, \ldots, \sigma_n$ of an $m \times n$ matrix $A$ are the square roots of the eigenvalues of $A^TA$. These eigenvalues are non-negative (if $A^TAv = \lambda v$ with $\|v\|_2 = 1$, then $\lambda = v^TA^TAv = \|Av\|_2^2 \geq 0$), so the singular values are real, and by convention we order them so that $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_n \geq 0$.
Note that this means that the rank of $A$ is equal to the number of non-zero singular values.
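The definition can likewise be checked against a library SVD (a sketch; numpy.linalg.svd returns the singular values in descending order):

```python
import numpy as np

A = np.array([[1., -1.],
              [2.,  0.],
              [1.,  1.]])

# Square roots of the eigenvalues of A^T A, sorted descending ...
sv_from_eigs = np.sqrt(np.linalg.eigvalsh(A.T @ A))[::-1]

# ... agree with the singular values reported by the library.
sv_from_svd = np.linalg.svd(A, compute_uv=False)
print(sv_from_eigs)                 # [2.449... 1.414...] = [sqrt(6) sqrt(2)]
print(sv_from_svd)                  # same values
print(np.sum(sv_from_svd > 1e-12))  # 2, the rank of A
```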
A singular value decomposition of an $m \times n$ real matrix $A$ is a factorization of the form
\[
\underbrace{A}_{m \times n} = \underbrace{U}_{m\times m, ~\text{orthogonal}}~~ \underbrace{\Sigma}_{m \times n, ~\text{diagonal}} ~~\underbrace{V^T}_{n \times n, ~\text{orthogonal}},
\]
where the diagonal matrix $\Sigma$ contains the singular values
\[
\Sigma = \begin{bmatrix} \sigma_1 & 0 & \cdots & 0\\ 0 & \sigma_2 & \ddots & \vdots\\ \vdots & \ddots & \ddots & 0\\ 0 & \cdots & 0 & \sigma_n \\ 0 & \cdots & \cdots & 0\\ \vdots & & & \vdots\\ 0 & \cdots & \cdots & 0 \end{bmatrix}.
\]
Now we must discuss the construction of the orthogonal matrices $U$ and $V$. Consider
\[
A = U \Sigma V^T, \qquad A^TA = V \Sigma^T U^T U \Sigma V^T = V \Sigma^T \Sigma V^T,
\]
using $U^TU = I$ because $U$ is orthogonal.
To compute the product $\Sigma^T \Sigma$, let $D$ be the (diagonal) matrix of eigenvalues of $A^TA$ and write
\[
\Sigma = \begin{bmatrix} D^{1/2} \\ 0 \end{bmatrix},
\]
where the $0$ stands for an $(m-n) \times n$ block of zeros. Then
\[
\Sigma^T \Sigma = \begin{bmatrix} D^{1/2} & 0 \end{bmatrix} \begin{bmatrix} D^{1/2} \\ 0 \end{bmatrix} = D.
\]
Therefore $A^TA = VDV^T$, and we conclude that $V$ is the matrix of orthonormal eigenvectors of the symmetric matrix $A^TA$ (such a $V$ exists by the spectral theorem).
Perhaps the most difficult step in computing the SVD is finding the matrix $U$. We write the equation for the SVD, assuming we know $V$ and $\Sigma$:
\[
AV = U \Sigma, \qquad A\begin{bmatrix} v_1 & v_2 & \ldots & v_n \end{bmatrix} = \begin{bmatrix} u_1 & u_2 & \ldots & u_n & u_{n+1} & \ldots & u_m \end{bmatrix} \begin{bmatrix} D^{1/2} \\ 0 \end{bmatrix}.
\]
The first $n$ columns give the equations
\[
Av_j = \sigma_j u_j, \quad j = 1, 2, \ldots, n.
\]
We assume the first $k$ singular values are non-zero, so that
\[
u_j = \sigma_j^{-1} Av_j, \quad j = 1, 2, \ldots, k.
\]
The vectors $u_{k+1}, \ldots, u_m$ are arbitrary, except that we want $U$ to be an orthogonal matrix. So we can find $u_{k+1}, \ldots, u_m$ via the Gram–Schmidt process.
For $1 \leq i \neq j \leq k$ we check
\[
u_i^Tu_j = \frac{1}{\sigma_i\sigma_j} v_i^T A^TA v_j = \frac{\sigma_j^2}{\sigma_i\sigma_j} v_i^T v_j = 0,
\]
because $v_i$ and $v_j$ are orthonormal eigenvectors ($v_j$ with eigenvalue $\sigma_j^2$). Normality, $\|u_i\|_2 = 1$, also follows if one takes $i = j$ in the above calculation, assuming, of course, that $v_i$ is normalized.
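Putting the whole construction together, here is a minimal sketch in NumPy. The function name, the tolerance, and the choice to complete $u_{k+1}, \ldots, u_m$ by Gram–Schmidt against the standard basis are my own; the steps follow the text.

```python
import numpy as np

def svd_via_eig(A, tol=1e-12):
    """Sketch of the construction above: eigen-decompose A^T A to get V
    and the singular values, set u_j = A v_j / sigma_j for the non-zero
    singular values, and complete u_{k+1}, ..., u_m by Gram-Schmidt."""
    m, n = A.shape
    assert m >= n, "apply the function to A.T when m < n"

    # Step 1: V and D from the symmetric matrix A^T A.  eigh returns
    # ascending eigenvalues, so reorder to sigma_1 >= ... >= sigma_n.
    eigvals, V = np.linalg.eigh(A.T @ A)
    order = np.argsort(eigvals)[::-1]
    eigvals, V = eigvals[order], V[:, order]
    sigma = np.sqrt(np.clip(eigvals, 0.0, None))  # clip rounding negatives

    # Step 2: u_j = sigma_j^{-1} A v_j for the k non-zero singular values.
    k = int(np.sum(sigma > tol))
    U = np.zeros((m, m))
    U[:, :k] = A @ V[:, :k] / sigma[:k]

    # Step 3: complete the remaining columns by Gram-Schmidt against the
    # standard basis, keeping only directions not already spanned.
    col = k
    for e in np.eye(m):
        if col == m:
            break
        w = e - U[:, :col] @ (U[:, :col].T @ e)  # orthogonalize
        norm_w = np.linalg.norm(w)
        if norm_w > tol:
            U[:, col] = w / norm_w               # normalize
            col += 1

    # Assemble the m x n factor Sigma = [D^{1/2}; 0].
    Sigma = np.zeros((m, n))
    Sigma[:n, :n] = np.diag(sigma)
    return U, Sigma, V
```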
Find a singular value decomposition of
\[
A = \begin{bmatrix} 1 & -1 \\ 2 & 0 \\ 1 & 1 \end{bmatrix}.
\]
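As a numerical check, the sketch above can be applied to this matrix:

```python
import numpy as np

A = np.array([[1., -1.],
              [2.,  0.],
              [1.,  1.]])

U, Sigma, V = svd_via_eig(A)

print(np.round(Sigma, 4))               # diag(sqrt(6), sqrt(2)) over a zero row
print(np.allclose(U @ Sigma @ V.T, A))  # True: the factorization holds
print(np.allclose(U.T @ U, np.eye(3)))  # True: U is orthogonal
```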