Implementation of non-negative matrix factorization algorithms

The NNMF library is found in code/nnmf.py
A Jupyter notebook containing a comparison of the NNMF implementation to sklearn.decomposition.NMF is found in code/experiment.ipynb
Unit tests are contained in code/tests.py
The TeX files for the math questions are in the math/ folder
Continuous integration is in .github/workflows/. On each push and pull request we perform linting with flake8 and run the unit tests in code/tests.py.

The NNMF library

Let $A$ be a nonnegative $m \times n$ matrix where the $n$ column vectors are viewed at data points with $m$ features. The goal is to factor $A$ into two nonnegative matrices $A \approx WH$ where the dimensions of $W$ and $H$ are $m \times k$ and $k \times n$. $W$ is the matrix of centroids, an $H$ is the coefficient matrix.

Our implementation follows three steps:

Initialization - we initialize $W$ and $H$ via $k$-means clustering. $W$ is initialized to the matrix consisting of the $k$ centroids, and $H$ is initialized to the indicator matrix which assigns each vector to a cluster
Update - we update $W$ and $H$ by applying non-negative least squares (NLS) in an alternating manner. That is, we update the rows of $W$ by applying NLS to $H$ and $A$, then update the columns of $H$ by applying NLS to $W$ and $A$.
Evalutation - we evaluate the solution with the Frobenius norm $||A - WH||_F$.

A note on notation

In this implementation we rely on sklearn.cluster.KMeans for our initialization. This package assumes the $m \times n$ matrix consists of $m$ data points with $n$ features. This is the transpose of the setup given in the pre-interview task. As a result, our implementation formulates the problem as $A^T \approx H^T W^T$.

Documentation

The library is contained in code/nnmf.py and consists of the five functions defined here.

Function	Input	Output	Description
initialize	`A: numpy.ndarray` `k: int`	`(numpy.ndarray, numpy.ndarray)`	Returns the initial factorization.
update	`A: numpy.ndarray` `H: numpy.ndarray` `W: numpy.ndarray`	`(numpy.ndarray, numpy.ndarray)`	Performs one step of the alternating NLS update and return `(H, W)`
fnorm	`m: numpy.ndarray`	`float`	Returns the Frobenius norm of a matrix.
loss	`A: numpy.ndarray` `H: numpy.ndarray` `W: numpy.ndarray`	`float`	Returns `fnorm(A - H@W)`.
nnmf	`A: numpy.ndarray` `k: int` `max_iter: Optional[int] = 1000` `tol: Optional[float] = 0.001`	`(numpy.ndarray, numpy.ndarray)`	Performs the NNMS algorithm. Terminate after `max_iter` iterations or after achieving an error tolerance of `tol`. Returns `(H, W)`

Name		Name	Last commit message	Last commit date
Latest commit History 33 Commits
.github/workflows		.github/workflows
code		code
math		math
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Implementation of non-negative matrix factorization algorithms

The NNMF library

A note on notation

Documentation

About

Releases

Packages

Languages

willjmax/nonnegative-matrix-factorization

Folders and files

Latest commit

History

Repository files navigation

Implementation of non-negative matrix factorization algorithms

The NNMF library

A note on notation

Documentation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages