
Singular Value Decomposition: SVD

Explore the concept of Singular Value Decomposition (SVD), its role in matrix factorization, and how to compute it using Python. Understand orthogonal diagonalization, the significance of singular values and vectors, and the economic SVD approach for handling zeros among the singular values. This lesson helps you grasp the mathematical foundations and practical implementations essential for data science applications.

Definition

The Singular Value Decomposition (SVD) of an $n\times d$ matrix, $A$, is the factorization of $A$ into the product of three matrices: $A = U\Sigma V^T$, where the matrices $U_{n\times n}$ and $V^T_{d\times d}$ are orthogonal and the matrix $\Sigma_{n\times d}$ is a generalized diagonal with non-negative real entries.

SVD in numpy

We can compute the SVD of a matrix, A, using U, S, Vt = numpy.linalg.svd(A), where S is an array containing the diagonal entries of $\Sigma$ and Vt is $V^T$. The diag function in the implementation below converts the S array into the generalized diagonal matrix, $\Sigma$. We’ve used the D variable to represent $\Sigma$ in the code.

Python 3.8
import numpy as np
from numpy.linalg import svd

# Function to construct the generalized diagonal matrix
def diag(S, m, n):
    D = np.zeros((m, n))
    k = len(S)
    D[:k, :k] = np.diag(S)
    return D

m, n = 2, 3
A = np.random.rand(m, n)
# Compute SVD using numpy
U, S, Vt = svd(A)
# Compute the Frobenius norm of the reconstruction error
frobNorm = np.linalg.norm(U.dot(diag(S, m, n)).dot(Vt) - A)
R = [0, frobNorm]
print(f'The Frobenius norm of UDV^T - A is {R[frobNorm > 1e-10]}')
print(f'U^TU\n{abs(np.round(U.T.dot(U), 1))}')
print(f'V^TV\n{abs(np.round(Vt.dot(Vt.T), 1))}')

Singular values and vectors

  • The diagonal values of $\Sigma$ are the singular values of $A$.
  • The columns of $U$ are the left singular vectors of $A$.
  • The rows of $V^T$ are the right singular vectors of $A$.

We assume that the singular values in $\Sigma$ are in descending order; that’s how numpy.linalg.svd returns the values in S. We can arrange the values in any order, provided we apply the corresponding rearrangement to the columns of $U$ and the rows of $V^T$.
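For instance, here’s a small sketch (using a random matrix) showing that reversing the order of the singular values, together with the matching columns of U and rows of Vt, still reconstructs A:

```python
import numpy as np
from numpy.linalg import svd

A = np.random.rand(3, 4)
U, S, Vt = svd(A)  # S has min(3, 4) = 3 entries, in descending order

# Reverse the order of the singular values and the matching singular vectors
perm = np.arange(len(S))[::-1]
U_p, S_p, Vt_p = U[:, perm], S[perm], Vt[perm, :]

# The product is unchanged (the dropped fourth row of Vt pairs with a zero in Sigma)
err = np.linalg.norm(U_p.dot(np.diag(S_p)).dot(Vt_p) - A)
print(err < 1e-10)  # prints: True
```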

Economic SVD

When $\Sigma$ has some zeros on its diagonal, we can drop them, along with the corresponding columns of $U$ and rows of $V^T$. If $\hat\Sigma$ is the diagonal matrix with only the non-zero singular values, and $\hat U$ and $\hat V$ are the matrices of the corresponding columns and rows of $U$ and $V$, respectively, then

$$A = U\Sigma V^T = \hat U \hat\Sigma \hat V^T$$

$A = \hat U \hat\Sigma \hat V^T$ is the economic SVD of $A$, where the matrices $\hat U$ and $\hat V$ may not be square and the matrix $\hat\Sigma$ is diagonal. The figure below illustrates the relationship between SVD and economic SVD.

Implementation of economic SVD

The economicSVD function below implements the economic SVD for a matrix, A. It’s worth noting that the number of non-zero singular values of A equals the rank of A.

Python 3.8
import numpy as np
from numpy.linalg import svd, matrix_rank as rank

# Function to construct the generalized diagonal matrix
def diag(S, m, n):
    D = np.zeros((m, n))
    k = len(S)
    D[:k, :k] = np.diag(S)
    return D

def economicSVD(A):
    U, S, Vt = svd(A)
    # Keep only the non-zero singular values (up to numerical tolerance)
    idx = np.where(S > 1e-10)[0]
    U_hat, S_hat, Vt_hat = U[:, idx], S[idx], Vt[idx, :]
    return U_hat, np.diag(S_hat), Vt_hat

m, n = 8, 5
A = np.random.rand(2, n)
A = np.vstack((A, A, A, A))  # repeat rows to keep the rank small
U, S, Vt = svd(A)
D = diag(S, m, n)
U_hat, D_hat, Vt_hat = economicSVD(A)
print('Shapes of U, D and Vt are:', U.shape, D.shape, Vt.shape)
print('Shapes of U_hat, D_hat and Vt_hat are:', U_hat.shape, D_hat.shape, Vt_hat.shape)
frobNorm = np.linalg.norm(U.dot(D).dot(Vt) - U_hat.dot(D_hat).dot(Vt_hat))
R = [0, frobNorm]
print(f'The Frobenius norm of UDV^T - hats(UDV^T) is {R[frobNorm > 1e-10]}\nRank(A)={rank(A)}')

SVD: the heart of linear algebra!

Consider that the rank of an $n\times d$ matrix, $A$, is $r$, as shown in the figure above.

  1. The first $r$ columns of $U$ form an orthonormal basis of the column space of $A$, that is, $C(\hat U)=C(A)$.
Python 3.8
import numpy as np
from numpy.linalg import svd, matrix_rank as rank
A = np.random.rand(2, 3)
# Apply SVD
U, S, Vt = svd(A)
r = rank(A)
U_hat = U[:, :r]
# Augment A with the columns of U_hat; the rank must not change
AU = np.hstack((A, U_hat))
if rank(A) == rank(AU):
    print('C(A)=C(U_hat)')
  2. The first $r$ rows of $V^T$ form an orthonormal basis of the row space of $A$, that is, $R(\hat V^T)=R(A)$.
Python 3.8
import numpy as np
from numpy.linalg import svd, matrix_rank as rank
A = np.random.rand(2, 3)
# Apply SVD
U, S, Vt = svd(A)
r = rank(A)
Vt_hat = Vt[:r, :]
# Stack the rows of Vt_hat below A; the rank must not change
AVt = np.vstack((A, Vt_hat))
if rank(A) == rank(AVt):
    print('R(A)=R(Vt_hat)')
  3. $rank(A)=rank(\Sigma)=rank(\hat{\Sigma})=r$
  4. The last $n-r$ columns of $U$ form an orthonormal basis of the null space of $A^T$, also known as the left null space of $A$.
Python 3.8
import numpy as np
from numpy.linalg import svd, matrix_rank as rank
A = np.random.rand(2, 3)
A = np.vstack((A, A))  # repeat rows so that r < n and the left null space is non-trivial
# Apply SVD
U, S, Vt = svd(A)
r = rank(A)
# The last n-r columns of U span the left null space: A^T U_n is (close to) zero
U_n = U[:, r:]
zeroMatrix = A.T.dot(U_n)
print(f'{np.round(zeroMatrix, 2)}')
  5. The last $d-r$ rows of $V^T$ form an orthonormal basis of the null space of $A$.
Python 3.8
import numpy as np
from numpy.linalg import svd, matrix_rank as rank
A = np.random.rand(2, 3)
# Apply SVD
U, S, Vt = svd(A)
r = rank(A)
# The last d-r rows of Vt span the null space: A Vt_n^T is (close to) zero
Vt_n = Vt[r:, :]
zeroMatrix = A.dot(Vt_n.T)
print(f'{np.round(zeroMatrix, 2)}')

Note: The SVD of a matrix tells us a lot about the matrix.

Derivation of SVD

To derive the matrices $U, \Sigma$, and $V$ for an $n \times d$ matrix, $A$, we first note that the matrices $A^TA$ and $AA^T$ are both symmetric. This follows from $(AB)^T=B^TA^T$ and $(A^T)^T=A$. Furthermore, the eigenvalues of $A^TA$ and $AA^T$ are non-negative. When $\lambda$ is an eigenvalue of $A^TA$, corresponding to an eigenvector $\bold x$, we have:

$$A^TA\bold{x}=\lambda \bold{x}\implies \bold{x}^TA^TA\bold{x}=\lambda \bold{x}^T\bold{x} \implies \lambda=\frac{\|A\bold{x}\|^2}{\|\bold{x}\|^2}\ge0$$

The proof for $AA^T$ is similar.

Note: A symmetric matrix with non-negative eigenvalues is called a positive semi-definite matrix.
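As a quick numerical check (a sketch with a random matrix), the eigenvalues of both products come out non-negative, up to floating-point error:

```python
import numpy as np

A = np.random.rand(3, 4)
# eigvalsh is designed for symmetric matrices and returns real eigenvalues
eigs_AtA = np.linalg.eigvalsh(A.T.dot(A))
eigs_AAt = np.linalg.eigvalsh(A.dot(A.T))
print(np.all(eigs_AtA > -1e-10), np.all(eigs_AAt > -1e-10))  # prints: True True
```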

We need to find matrices $U, \Sigma$, and $V$, such that:

  1. $U^TU=UU^T=I$.
  2. $V^TV=VV^T=I$.
  3. $\Sigma$ is a generalized diagonal matrix; hence, $\Sigma^T\Sigma$ and $\Sigma\Sigma^T$ are diagonal matrices with the same non-zero diagonal entries.
  4. $A=U\Sigma V^T$.

If $A=U\Sigma V^T$, then:

$$\begin{align*} A^TA&=(U\Sigma V^T)^T(U\Sigma V^T) \\ &=V\Sigma^TU^TU\Sigma V^T \\ &=V\Sigma^TI\Sigma V^T \\ &=V\Sigma^T\Sigma V^T \\ &=VD_1V^T \end{align*}$$

In the derivation above, $VD_1V^T$ is an orthogonal diagonalization of $A^TA$. This means that $V$ is a matrix of orthonormal eigenvectors of $A^TA$ and $D_1$ is the diagonal matrix of corresponding eigenvalues. Similarly, we can derive that $AA^T=UD_2U^T$, where $U$ is a matrix of orthonormal eigenvectors of $AA^T$ and $D_2$ is the diagonal matrix of corresponding eigenvalues. Because $D_1$ and $D_2$ have the same non-zero diagonal values, it’s convenient to omit the subscript and call either one $D$. Finally, $\Sigma$ is a generalized diagonal containing the square roots of the diagonal entries of $D$. Because both $A^TA$ and $AA^T$ are positive semi-definite, all diagonal entries of $D$ are non-negative, thus giving real square roots.
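We can check this numerically (a sketch with a random matrix): the Vt returned by svd diagonalizes A^T A, leaving the squared singular values on the diagonal:

```python
import numpy as np
from numpy.linalg import svd

A = np.random.rand(2, 3)
U, S, Vt = svd(A)

# V^T (A^T A) V should be diagonal, with S^2 (padded with zeros) on the diagonal
D1 = Vt.dot(A.T.dot(A)).dot(Vt.T)
offDiag = D1 - np.diag(np.diag(D1))
print(np.allclose(np.diag(D1)[:len(S)], S**2), np.allclose(offDiag, 0))  # prints: True True
```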

SVD algorithm

Given a matrix $A$:

  1. To compute the eigenvalues, $D$, and eigenvectors, $U$, of $AA^T$, we can use the code below:
D, U = np.linalg.eig(A.dot(A.T))
  2. We can then compute the eigenvectors, $V$, of $A^TA$ by using the following code:
D, V = np.linalg.eig(A.T.dot(A))
  3. Finally, we make a generalized diagonal matrix, $\Sigma$, with the square roots of $D$, by using this code:
Sigma = diag(D, A.shape[0], A.shape[1])**0.5
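One caveat: the two eig calls return eigenvectors in no guaranteed order and with independently chosen signs, so the U, Sigma, and V obtained this way need not multiply back to A. A more careful sketch (the name mySVD and the tolerance are our own choices) sorts the eigenvalues, computes V first, and derives each non-zero column of U as u_i = A v_i / σ_i, which gives an economic SVD directly:

```python
import numpy as np

def mySVD(A, tol=1e-10):
    # Eigendecompose the symmetric matrix A^T A (eigh returns ascending eigenvalues)
    eigvals, V = np.linalg.eigh(A.T.dot(A))
    # Reorder into descending order, as numpy.linalg.svd does
    order = np.argsort(eigvals)[::-1]
    eigvals, V = eigvals[order], V[:, order]
    # Keep only the strictly positive eigenvalues; their square roots are the singular values
    keep = eigvals > tol
    sigma, V = np.sqrt(eigvals[keep]), V[:, keep]
    # Each left singular vector is u_i = A v_i / sigma_i
    U = A.dot(V) / sigma
    return U, np.diag(sigma), V.T

A = np.random.rand(4, 3)
U_hat, D_hat, Vt_hat = mySVD(A)
print(np.linalg.norm(U_hat.dot(D_hat).dot(Vt_hat) - A) < 1e-8)  # prints: True
```

Up to the signs of individual singular vectors, the result agrees with the output of numpy.linalg.svd.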