The singular value decomposition (SVD) factorizes a matrix as $A = U \Sigma V^*$, where $U$ and $V$ are unitary and $\Sigma$ is diagonal with nonnegative entries. Unlike the eigenvalue decomposition, every (real or complex) matrix is guaranteed to have an SVD. Geometrically, the SVD implies that every linear transformation is equivalent to a rotation, followed by a scaling, followed by another rotation.
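As a concrete illustration (a minimal sketch using NumPy on an arbitrary random matrix, not part of the original exposition), the following computes an SVD and checks the defining properties of the factorization:

```python
import numpy as np

# Every matrix, square or not, has an SVD.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

# full_matrices=False gives the "thin" SVD: U is 5x3, s has 3 entries, Vt is 3x3.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# U and V have orthonormal columns; s is nonnegative and sorted in decreasing order.
assert np.allclose(U.T @ U, np.eye(3))
assert np.allclose(Vt @ Vt.T, np.eye(3))
assert np.all(s >= 0) and np.all(s[:-1] >= s[1:])

# Reconstruction: A = U diag(s) V^*.
assert np.allclose(A, U @ np.diag(s) @ Vt)
```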
In data analysis, the SVD can be computed to perform principal component analysis (PCA), where an ellipsoid is used to “fit” a data matrix. In other words, once the SVD is computed, a partial SVD
$$A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^*$$
can be used to approximate the matrix, where $u_1, u_2, \ldots$ and $v_1, v_2, \ldots$ are the left and right singular vectors, and $\sigma_1 \geq \sigma_2 \geq \cdots \geq 0$ are the singular values in decreasing order. Equivalently, a partial SVD is obtained by replacing the smaller singular values with zero in the factorization. Intuitively, with the length of each of the ellipsoid’s axes equal to its corresponding singular value, keeping only the components with non-trivial length makes the ellipsoid a good fit to the data. Formally, the Eckart-Young-Mirsky Theorem states that a partial SVD provides the best approximation to $A$ among all low-rank matrices.

Theorem (Eckart-Young-Mirsky). Let $A_k$ be the rank-$k$ partial SVD of $A$. Then for every matrix $B$ with rank at most $k$,
$$\|A - A_k\| \leq \|A - B\|,$$
where $\|\cdot\|$ denotes either the spectral norm or the Frobenius norm.
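The theorem can be sanity-checked numerically. Here is a small sketch, assuming an arbitrary random data matrix and a random competing rank-$k$ matrix, that compares the partial SVD against the competitor in both norms:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 6))
k = 2

# Rank-k partial SVD: keep the k largest singular values, zero out the rest.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]

# A competing rank-k matrix (here: a random one).
B = rng.standard_normal((10, k)) @ rng.standard_normal((k, 6))

for ord_ in (2, "fro"):  # spectral norm and Frobenius norm
    err_svd = np.linalg.norm(A - A_k, ord=ord_)
    err_other = np.linalg.norm(A - B, ord=ord_)
    assert err_svd <= err_other  # the partial SVD is never worse
```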
The theorem has a short proof for either norm, but here we describe a slightly longer proof using the spectral method. Before proceeding, since we will work with singular values of different matrices, we will write $\sigma_i(M)$ to denote the $i$-th singular value of a matrix $M$. We first state the following lemma, which provides an estimate on how much a low-rank perturbation to a matrix can affect its singular values.

Lemma. Let $A$ and $B$ be matrices of the same dimensions, where $B$ has rank at most $k$. Then for every $i \geq 1$,
$$\sigma_{i+k}(A) \leq \sigma_i(A - B).$$
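Before the proof, the lemma can also be spot-checked numerically (a sketch on a random instance, not part of the argument):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 8, 2
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, k)) @ rng.standard_normal((k, n))  # rank at most k

sA = np.linalg.svd(A, compute_uv=False)       # singular values of A
sAB = np.linalg.svd(A - B, compute_uv=False)  # singular values of A - B

# sigma_{i+k}(A) <= sigma_i(A - B); with 0-indexed arrays: sA[i+k] <= sAB[i].
for i in range(n - k):
    assert sA[i + k] <= sAB[i] + 1e-12
```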
The theorem is then an easy application of the above lemma. For the spectral norm, note that $A - A_k = \sum_{i > k} \sigma_i(A)\, u_i v_i^*$, so for any matrix $B$ with rank at most $k$,
$$\|A - A_k\| = \sigma_{1+k}(A),$$
which by the lemma (setting $i = 1$) is at most $\sigma_1(A - B) = \|A - B\|$.
For the Frobenius norm, recall that the square of the Frobenius norm of a matrix is equal to the sum of the squares of its singular values. Applying the lemma above, we have
$$\|A - A_k\|_F^2 = \sum_{i \geq 1} \sigma_{i+k}(A)^2 \leq \sum_{i \geq 1} \sigma_i(A - B)^2 = \|A - B\|_F^2,$$
finishing the proof.
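Both norm computations in this proof rest on identities that are easy to confirm numerically; the following sketch (again on an arbitrary random matrix) checks that $\|A - A_k\| = \sigma_{k+1}(A)$ and that the squared Frobenius norm of the residual is the tail sum of squared singular values:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((7, 5))
k = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]

# Spectral norm of the residual is the (k+1)-st singular value of A.
assert np.isclose(np.linalg.norm(A - A_k, ord=2), s[k])

# ||M||_F^2 equals the sum of squared singular values, so the residual's
# squared Frobenius norm is the tail sum over the discarded singular values.
assert np.isclose(np.linalg.norm(A - A_k, ord="fro") ** 2, np.sum(s[k:] ** 2))
```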
The lemma is almost a re-statement of one of Weyl’s inequalities:
$$\sigma_{i+j-1}(X + Y) \leq \sigma_i(X) + \sigma_j(Y).$$
To see this, let $X = A - B$, $Y = B$, and $j = k + 1$, and note that $\sigma_{k+1}(B)$ is zero because $B$ has rank at most $k$. Instead of invoking this inequality directly, we will prove the lemma from first principles.
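Weyl's inequality itself is just as easy to test on random matrices (a sketch, not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
X = rng.standard_normal((n, n))
Y = rng.standard_normal((n, n))

sX = np.linalg.svd(X, compute_uv=False)
sY = np.linalg.svd(Y, compute_uv=False)
sXY = np.linalg.svd(X + Y, compute_uv=False)

# sigma_{i+j-1}(X + Y) <= sigma_i(X) + sigma_j(Y);
# with 0-indexed arrays this reads sXY[i + j] <= sX[i] + sY[j].
for i in range(n):
    for j in range(n - i):
        assert sXY[i + j] <= sX[i] + sY[j] + 1e-12
```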


We first prove the case $i = 1$: for any matrix $B$ with rank at most $k$,
$$\sigma_{1+k}(A) \leq \sigma_1(A - B).$$

Let $u_1, u_2, \ldots$ and $v_1, v_2, \ldots$ be the left and right singular vectors of $A$. The span of $v_1, \ldots, v_{k+1}$ has dimension $k + 1$, while the kernel of $B$ has codimension at most $k$, so the two subspaces intersect nontrivially; let $w = \sum_{i=1}^{k+1} c_i v_i$ be a unit vector in their intersection. Since $Bw = 0$, we have $\sigma_1(A - B) \geq \|(A - B)w\| = \|Aw\|$.

Bounding $\|Aw\|^2$ from below and using the facts that $w$ is orthogonal to $v_i$ when $i > k + 1$, the Pythagorean Theorem, the decreasing magnitude of singular values, and that $w$ is a unit vector lying in the span of the orthonormal set $\{v_1, \ldots, v_{k+1}\}$, respectively, we obtain
$$\|Aw\|^2 = \Big\| \sum_{i=1}^{k+1} c_i \sigma_i(A)\, u_i \Big\|^2 = \sum_{i=1}^{k+1} c_i^2\, \sigma_i(A)^2 \geq \sigma_{k+1}(A)^2 \sum_{i=1}^{k+1} c_i^2 = \sigma_{k+1}(A)^2.$$
Combining both sides yields $\sigma_{1+k}(A) \leq \sigma_1(A - B)$.
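The construction of the witness vector $w$ in this argument can be carried out numerically. Below is a sketch, assuming random instances, that builds $w$ from the intersection of the kernel of $B$ with the span of the top $k+1$ right singular vectors and then verifies the chain of inequalities:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 8, 3
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, k)) @ rng.standard_normal((k, n))  # rank at most k

# Top k+1 right singular vectors of A, as columns of an n x (k+1) matrix.
_, sA, VtA = np.linalg.svd(A)
Vk1 = VtA[:k + 1].T

# B @ Vk1 has rank at most k < k+1, so it has a nontrivial kernel; the right
# singular vector for its zero singular value gives coefficients c of the
# witness w = sum_i c_i v_i with Bw = 0.
_, _, Vt = np.linalg.svd(B @ Vk1)
c = Vt[-1]
w = Vk1 @ c

assert np.allclose(B @ w, 0, atol=1e-8)    # w lies in the kernel of B
assert np.isclose(np.linalg.norm(w), 1.0)  # w is a unit vector

# The chain from the proof: sigma_1(A-B) >= ||(A-B)w|| = ||Aw|| >= sigma_{k+1}(A).
sAB = np.linalg.svd(A - B, compute_uv=False)
assert sAB[0] >= np.linalg.norm((A - B) @ w) - 1e-12
assert np.linalg.norm(A @ w) >= sA[k] - 1e-8
```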
Now we prove the general case. Using the $i = 1$ case, we know that for any matrix $C$ with rank at most $i + k - 1$,
$$\sigma_{i+k}(A) \leq \sigma_1(A - C).$$
Writing $C = C_1 + C_2$, we can write
$$\sigma_1(A - C) = \sigma_1\big((A - B - C_1) + (B - C_2)\big).$$
Since $\sigma_1(X + Y) \leq \sigma_1(X) + \sigma_1(Y)$ (expressing $\sigma_1(X + Y)$ as the inner product $u^*(X + Y)v$ and applying Cauchy-Schwarz, where $u$ and $v$ are the top left and right singular vectors of $X + Y$, respectively), we have that
$$\sigma_{i+k}(A) \leq \sigma_1(A - B - C_1) + \sigma_1(B - C_2).$$
Since the above inequality holds for arbitrary $C_1$ and $C_2$ with rank at most $i - 1$ and $k$ respectively (so that $C = C_1 + C_2$ has rank at most $i + k - 1$), we can choose $C_1 = (A - B)_{i-1}$ and $C_2 = B_k$, the partial SVDs of $A - B$ and $B$. Noting that $\sigma_1(M - M_j) = \sigma_{j+1}(M)$ for any matrix $M$ and its rank-$j$ partial SVD $M_j$, we can conclude that
$$\sigma_{i+k}(A) \leq \sigma_i(A - B) + \sigma_{k+1}(B),$$
completing the proof since $\sigma_{k+1}(B)$ is zero.
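To close, the choices $C_1 = (A - B)_{i-1}$ and $C_2 = B_k$ can be checked numerically on a random instance. In the sketch below, partial_svd and sigma are hypothetical helper names introduced for illustration:

```python
import numpy as np

def partial_svd(M, r):
    """Rank-r partial SVD of M: keep only the r largest singular values."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :r] @ np.diag(s[:r]) @ Vt[:r]

def sigma(M):
    """Singular values of M in decreasing order."""
    return np.linalg.svd(M, compute_uv=False)

rng = np.random.default_rng(2)
n, k, i = 9, 2, 3
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, k)) @ rng.standard_normal((k, n))  # rank at most k

C1 = partial_svd(A - B, i - 1)  # rank at most i - 1
C2 = partial_svd(B, k)          # equals B, since rank(B) <= k
C = C1 + C2                     # rank at most i + k - 1

# sigma_{i+k}(A) <= sigma_1(A - C) <= sigma_1(A - B - C1) + sigma_1(B - C2)
#                =  sigma_i(A - B) + sigma_{k+1}(B)  =  sigma_i(A - B).
assert sigma(A)[i + k - 1] <= sigma(A - C)[0] + 1e-12
assert sigma(A - C)[0] <= sigma(A - B - C1)[0] + sigma(B - C2)[0] + 1e-12
assert np.isclose(sigma(A - B - C1)[0], sigma(A - B)[i - 1])
assert np.isclose(sigma(B - C2)[0], 0.0, atol=1e-8)
```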