- The NNMF library is found in `code/nnmf.py`.
- A Jupyter notebook comparing the NNMF implementation to `sklearn.decomposition.NMF` is found in `code/experiment.ipynb`.
- Unit tests are contained in `code/tests.py`.
- The TeX files for the math questions are in the `math/` folder.
- Continuous integration is in `.github/workflows/`. On each push and pull request we perform linting with `flake8` and run the unit tests in `code/tests.py`.
Let
Our implementation follows three steps:
- Initialization - we initialize $W$ and $H$ via $k$-means clustering. $W$ is initialized to the matrix consisting of the $k$ centroids, and $H$ is initialized to the indicator matrix which assigns each vector to a cluster.
- Update - we update $W$ and $H$ by applying non-negative least squares (NLS) in an alternating manner. That is, we update the rows of $W$ by applying NLS to $H$ and $A$, then update the columns of $H$ by applying NLS to $W$ and $A$.
- Evaluation - we evaluate the solution with the Frobenius norm $||A - WH||_F$.
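
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the code in `code/nnmf.py`: the helper names `init_kmeans`, `als_step`, and `frob_loss` are hypothetical, the sketch assumes the data vectors are the columns of $A$ and uses the $A \approx WH$ ordering from the list above, and it assumes `scipy.optimize.nnls` as the NLS solver. The library itself may orient the factors differently.

```python
import numpy as np
from scipy.optimize import nnls
from sklearn.cluster import KMeans


def init_kmeans(A, k, seed=0):
    """Hypothetical helper: k-means initialization for A ~ W @ H.

    Assumes the columns of A (shape m x n) are the data vectors.
    """
    n = A.shape[1]
    # KMeans expects one sample per row, so cluster the columns of A via A.T.
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(A.T)
    W = km.cluster_centers_.T          # m x k, one centroid per column
    H = np.zeros((k, n))
    H[km.labels_, np.arange(n)] = 1.0  # k x n indicator of cluster membership
    return W, H


def als_step(A, W, H):
    """Hypothetical helper: one alternating NLS sweep."""
    # Row i of W solves min_{w >= 0} ||H.T @ w - A[i, :]||_2.
    W = np.vstack([nnls(H.T, A[i, :])[0] for i in range(A.shape[0])])
    # Column j of H solves min_{h >= 0} ||W @ h - A[:, j]||_2.
    H = np.column_stack([nnls(W, A[:, j])[0] for j in range(A.shape[1])])
    return W, H


def frob_loss(A, W, H):
    """Frobenius norm of the residual A - W @ H."""
    return float(np.linalg.norm(A - W @ H, ord="fro"))
```

Because each half of the sweep solves its subproblem exactly with the other factor held fixed, the loss should not increase from one sweep to the next, which gives a simple sanity check on any implementation of the update.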
In this implementation we rely on `sklearn.cluster.KMeans` for our initialization. This package assumes the

The library is contained in `code/nnmf.py` and consists of the five functions defined here.
Function | Input | Output | Description |
---|---|---|---|
`initialize` | `A: numpy.ndarray`<br>`k: int` | `(numpy.ndarray, numpy.ndarray)` | Returns the initial factorization. |
`update` | `A: numpy.ndarray`<br>`H: numpy.ndarray`<br>`W: numpy.ndarray` | `(numpy.ndarray, numpy.ndarray)` | Performs one step of the alternating NLS update and returns `(H, W)`. |
`fnorm` | `m: numpy.ndarray` | `float` | Returns the Frobenius norm of a matrix. |
`loss` | `A: numpy.ndarray`<br>`H: numpy.ndarray`<br>`W: numpy.ndarray` | `float` | Returns `fnorm(A - H@W)`. |
`nnmf` | `A: numpy.ndarray`<br>`k: int`<br>`max_iter: Optional[int] = 1000`<br>`tol: Optional[float] = 0.001` | `(numpy.ndarray, numpy.ndarray)` | Performs the NNMF algorithm. Terminates after `max_iter` iterations or after achieving an error tolerance of `tol`. Returns `(H, W)`. |
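
The stopping rule described for `nnmf` could look something like the loop below. This is only a sketch under assumptions: the driver name `iterate` is hypothetical, `update` and `loss` are passed in as callables with the argument order shown in the table, and `tol` is read here as an absolute threshold on the loss. The library may instead measure the change in loss between iterations.

```python
def iterate(A, H, W, update, loss, max_iter=1000, tol=0.001):
    """Hypothetical driver: apply `update` until the loss is at most `tol`
    or `max_iter` steps have run. Returns (H, W, steps_taken)."""
    steps = 0
    while steps < max_iter and loss(A, H, W) > tol:
        H, W = update(A, H, W)
        steps += 1
    return H, W, steps
```

Returning the step count alongside the factors is a choice made for this sketch; it makes it easy to tell whether the loop stopped on the tolerance or on the iteration cap.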