The singular value decomposition (SVD) factorizes a matrix where are unitary and is diagonal with nonnegative entries. Unlike the eigenvalue decomposition, every (real or complex) matrix is guaranteed to have a SVD. Geometrically, the SVD implies that every linear transformation is equivalent to a rotation, a scaling, then another rotation.
In data analysis, the SVD can be computed to perform principal component analysis (PCA), where an ellipsoid is used to “fit” a data matrix. In other words, once the SVD is computed, a partial SVD, can be used to approximate the matrix, where and are the left and right singular vectors, and are the singular values in decreasing order. Equivalently, a partial SVD is obtained by replacing the smaller singular values with zero in the factorization. Intuitively, with the length of each ellipsoid’s component equal to its corresponding singular value, keeping only the components with non-trivial length makes the ellipsoid a good fit to the data. Formally, the Eckart-Young-Mirsky Theorem states that a partial SVD provides the best approximation to among all low-rank matrices.
The theorem has a short proof for either norm, but here we describe a slightly longer proof using spectral method. Before proceeding, since we will work with singular values of different matrices, we will write to denote the -th singular value of a matrix. We first state the following lemma, which provides an estimate on how much a low-rank perturbation to a matrix can affect its singular values.
The theorem is then an easy application of the above lemma.
which by the lemma (setting ) is at most .
For Frobenius norm, recall that the square of the Frobenius norm of a matrix is equal to the sum of the squares of its singular values. Applying the lemma above, we have
finishing the proof.
The lemma is almost a re-statement of one of Weyl’s inequalities:
To see this, let , , and , and note that is zero. Instead of invoking this inequality directly, we will prove the lemma from first principle.
Let be the vector space spanned by the top right singular vectors , and be the null space of . By dimension argument, there must exist a non-zero vector in . Then bounding from above, we have
Bounding from below and using the facts that is orthogonal to when , Pythagorean Theorem, decreasing magnitude of singular values, and lies in the orthonormal set , respectively, we obtain
Combining both sides yields .
Now we prove the general case. Using the case, we know that for any matrix with rank at most ,
Writing , we can write Since (expressing as the inner product and applying Cauchy-Schwarz, where and are the top left and right singular vectors of , respectively) , we have that
Since the above inequality holds for arbitrary with rank at most , we can choose and . Then we can conclude that
completing the proof since is zero.