Commit
NMF optimization & documentation (#2361)
* Implement first version of the algorithm
* Fix variable names
* Add support for streaming corpora
* Add benchmark
* Fix bugs, introduce batches, add images to the benchmark notebook
* Update notebook
* Improve model
  1. Improved performance ~4x
  2. LDA-like API
  3. BOW compatibility
* Add show topics, change API
* Add more LDA-like API
* Fix logger name
* Add more LDA API
* Remove redundant method
* Remove commented out lines
* Fix flakes
* Cythonize
* Dramatically improve performance
* Add parameters, improve accuracy and speed
* Remove redundant W copying
* Fix random seed again
* Optimize E/M step
* Add an eval_every option, use softmax for normalization
* Fixes
* Improve notebook examples a bit
* Fix eval_every
* Return outliers
* Optimizations
* Experimenting with loss
* Fix PEP8
* Return nmf import
* Revert "Return nmf import"
  This reverts commit 1c3a064
* Fix
* Fix minimum_probability & info -> debug logs
* Compute metrics
* Count error on-the-fly
* Speed optimizations, changed error functions
* Beat LDA
* Outperform sklearn in speed (WTF)
* Remove redundant arg
* Add Olivetti faces
* Remove redundant code
* Add Topics
* Make it pretty
* Fix wrapper
* Save corpus & dict, minor fixes
* Add RandomCorpus
* Dense -> sparse
* First doc2dense
* Fix csc again
* Fix len
* Experimenting
* Revert "Experimenting"
  This reverts commit 7a3ef47.
* Fix evaluation
* Sparse speedup
* Improve performance
* Divide A and B again
* Fix A and B computation bug
* Sparsify W init
* Experimenting
* New norm
* Sparse threshold -> sparse coefficient
* Optimize residuals computation
* Fix residuals bug
* W speedup
* Experiment
* Revert changes a bit
* Fix corpus
* Fix init error
* Resolve conflict
* Fix corpus iteration issue
* Switch to numpy algos
* Train on wikipedia
* Sparse coef -> density.
  More stable way to sparsify W matrix
* Return old sparse algo
* Max
* Optimizations
* Fix A and B computation
* Fix A and B normalization
* Add random_state
* Infer id2word
* Fix tests
* Document __init__
* Document whole nmf
* Remove unnecessary comments
* Add tutorial notebook
* Document __init__
* Fix flake version
* Fix flake warning
* Remove comments, reverse parallelization order
* Add NMF's cython extension to setup.py
* Fix imports, add solve_r function
* Remove comments
* Add docstrings
* Common corpus and common dictionary
* Remove redundant test
* Add signature flag
* Add files to manifest
* Fix flake8
* Fix atol value
* Implement top topics
* Add rst files
* Fix appveyor issue
* Fix cython error
* Fix fmax/fmin not being on win-python27
* Add word transformation test
* Improve readability of residuals computation
* Fix tests
* A few fixes
* Blank line at the end of each docstring
* Add blank line
* Add the paper reference
* Fix long line
* Add log_perplexity
* Add NMF and LDA comparison table
* Change the sign of log perplexity
* Add Sklearn NMF comparison
* Merge sklearn and tm tables
* Add F1
* Remove _solve_r
* Merge tutorial and benchmark
* Indentation's back
* Optimize optimizers
* Remove unnecessary pic
* Optimize memory consumption
* Add docstring
* Optimize get_topic_words
* Fix tests
* Fix flake8
* Add missing test
* Code review fixes
* n_tokens -> num_tokens
* [skip ci] Add explicit normalize parameter
* [skip ci] Add explicit normalize parameter[2]
* [skip ci] Update tutorial notebook
* [skip ci] [WIP] Update wikipedia notebook
* Add more description and metrics
* [skip ci] Fix log_probabiliy
* Multiple format fixes in notebook, outputs cleared til tomorrow
* Train on full corpus
* [skip ci] Remove disclaimer
* Add RAM usage stats
* Native 20-newsgroups and additional text
* Truncate outputs
* Fix last cell formatting
* [skip ci] Change model hyperparameters back
* [skip ci] Add module docstring
* [skip ci] Massive speedups
  Replaced some sparse matrices with dense.
* Checkout nmf_wikipedia from develop
* Fix tests
* Fix corpus description
* Add components permutation to coordinate descent
* Fix tests
* Fix dictionary highlight
* Fix tests again
* Remove r, it's not used for the time
* Deprecate use_r
* [skip ci] Rearrange params
* [skip ci] Add disclaimer about `r`
* Fix `normalize` and `minimum_probability` docstring
* Remove unused params
* Add csc support
* Add examples to the docstring
* Update tutorial notebook
* [skip ci] Update tutorial again
* [skip ci] fix PEP
* cast explicitly permutations to int32
* [skip ci] Fix a typo
* [skip ci] Remove clip and fix error count in update
* [skip ci] Fix error computation
* [skip ci] Fix error counting again
* [skip ci] Remove redundant imports
* Fix grouper for csc matrices
* Fix module docstring
* Fix training corpus description
* Fix pep8
* Fix flake8 for real
* Normalize, sparsity and dictionary fixes
* Updated module docstring in the notebook