NMF optimization & documentation (#2361) · piskvorky/gensim@366d8ae

NMF optimization & documentation (#2361)

* Implement first version of the algorithm

* Fix variable names

* Add support for streaming corpora

* Add benchmark

* Fix bugs, introduce batches, add images to the benchmark notebook

* Update notebook

* Improve model

1. Improved performance ~4x
2. LDA-like API
3. BOW compatibility

* Add show topics, change API

* Add more LDA-like API

* Fix logger name

* Add more LDA API

* Remove redundant method

* Remove commented out lines

* Fix flakes

* Cythonize

* Dramatically improve performance

* Add parameters, improve accuracy and speed

* Remove redundant W copying

* Fix random seed again

* Optimize E/M step

* Add an eval_every option, use softmax for normalization

* Fixes

* Improve notebook examples a bit

* Fix eval_every

* Return outliers

* Optimizations

* Experimenting with loss

* Fix PEP8

* Return nmf import

* Revert "Return nmf import"

This reverts commit 1c3a064.

* Fix

* Fix minimum_probability & info -> debug logs

* Compute metrics

* Count error on-the-fly

* Speed optimizations, changed error functions

* Beat LDA

* Outperform sklearn in speed (WTF)

* Remove redundant arg

* Add Olivetti faces

* Remove redundant code

* Add Topics

* Make it pretty

* Fix wrapper

* Save corpus & dict, minor fixes

* Add RandomCorpus

* Dense -> sparse

* First doc2dense

* Fix csc again

* Fix len

* Experimenting

* Revert "Experimenting"

This reverts commit 7a3ef47.

* Fix evaluation

* Sparse speedup

* Improve performance

* Divide A and B again

* Fix A and B computation bug

* Sparsify W init

* Experimenting

* New norm

* Sparse threshold -> sparse coefficient

* Optimize residuals computation

* Fix residuals bug

* W speedup

* Experiment

* Revert changes a bit

* Fix corpus

* Fix init error

* Resolve conflict

* Fix corpus iteration issue

* Switch to numpy algos

* Train on wikipedia

* Sparse coef -> density. More stable way to sparsify W matrix

* Return old sparse algo

* Max

* Optimizations

* Fix A and B computation

* Fix A and B normalization

* Add random_state

* Infer id2word

* Fix tests

* Document __init__

* Document whole nmf

* Remove unnecessary comments

* Add tutorial notebook

* Document __init__

* Fix flake version

* Fix flake warning

* Remove comments, reverse parallelization order

* Add NMF's cython extension to setup.py

* Fix imports, add solve_r function

* Remove comments

* Add docstrings

* Common corpus and common dictionary

* Remove redundant test

* Add signature flag

* Add files to manifest

* Fix flake8

* Fix atol value

* Implement top topics

* Add rst files

* Fix appveyor issue

* Fix cython error

* Fix fmax/fmin not being available on win-python27

* Add word transformation test

* Improve readability of residuals computation

* Fix tests

* A few fixes

* Blank line at the end of each docstring

* Add blank line

* Add the paper reference

* Fix long line

* Add log_perplexity

* Add NMF and LDA comparison table

* Change the sign of log perplexity

* Add Sklearn NMF comparison

* Merge sklearn and tm tables

* Add F1

* Remove _solve_r

* Merge tutorial and benchmark

* Indentation's back

* Optimize optimizers

* Remove unnecessary pic

* Optimize memory consumption

* Add docstring

* Optimize get_topic_words

* Fix tests

* Fix flake8

* Add missing test

* Code review fixes

* n_tokens -> num_tokens

* [skip ci] Add explicit normalize parameter

* [skip ci] Add explicit normalize parameter[2]

* [skip ci] Update tutorial notebook

* [skip ci] [WIP] Update wikipedia notebook

* Add more description and metrics

* [skip ci] Fix log_probability

* Multiple format fixes in notebook, outputs cleared until tomorrow

* Train on full corpus

* [skip ci] Remove disclaimer

* Add RAM usage stats

* Native 20-newsgroups and additional text

* Truncate outputs

* Fix last cell formatting

* [skip ci] Change model hyperparameters back

* [skip ci] Add module docstring

* [skip ci] Massive speedups

Replaced some sparse matrices with dense.

* Checkout nmf_wikipedia from develop

* Fix tests

* Fix corpus description

* Add components permutation to coordinate descent

* Fix tests

* Fix dictionary highlight

* Fix tests again

* Remove r, it's not used for the time being

* Deprecate use_r

* [skip ci] Rearrange params

* [skip ci] Add disclaimer about `r`

* Fix `normalize` and `minimum_probability` docstring

* Remove unused params

* Add csc support

* Add examples to the docstring

* Update tutorial notebook

* [skip ci] Update tutorial again

* [skip ci] Fix PEP8

* Explicitly cast permutations to int32

* [skip ci] Fix a typo

* [skip ci] Remove clip and fix error count in update

* [skip ci] Fix error computation

* [skip ci] Fix error counting again

* [skip ci] Remove redundant imports

* Fix grouper for csc matrices

* Fix module docstring

* Fix training corpus description

* Fix pep8

* Fix flake8 for real

* Normalize, sparsity and dictionary fixes

* Update module docstring in the notebook
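
The "Add components permutation to coordinate descent" item refers to visiting the factor components in a shuffled order during coordinate descent. The following is a minimal pure-Python sketch of that idea, assuming a projected coordinate-descent solver for the nonnegative least-squares subproblem min_{h >= 0} 0.5 * ||v - W h||^2. The function name `nnls_coordinate_descent` and its parameters are hypothetical illustrations, not gensim's API or the code this commit adds.

```python
import random


def nnls_coordinate_descent(W, v, n_sweeps=100, tol=1e-10, seed=0):
    """Solve min_{h >= 0} 0.5 * ||v - W h||^2 by projected coordinate descent.

    Hypothetical helper for illustration. W is an m x k matrix given as a
    list of rows, v is a length-m list. Components are visited in a fresh
    random order on every sweep.
    """
    m, k = len(W), len(W[0])
    # Precompute the Gram matrix G = W^T W and the vector b = W^T v.
    G = [[sum(W[r][i] * W[r][j] for r in range(m)) for j in range(k)]
         for i in range(k)]
    b = [sum(W[r][i] * v[r] for r in range(m)) for i in range(k)]

    h = [0.0] * k
    rng = random.Random(seed)
    order = list(range(k))
    for _ in range(n_sweeps):
        rng.shuffle(order)  # random permutation of the components
        max_step = 0.0
        for j in order:
            if G[j][j] == 0.0:
                continue  # an all-zero column of W leaves h[j] unchanged
            # Partial derivative of the objective with respect to h[j].
            grad = sum(G[j][i] * h[i] for i in range(k)) - b[j]
            # Exact minimization along coordinate j, clipped at zero.
            new_hj = max(0.0, h[j] - grad / G[j][j])
            max_step = max(max_step, abs(new_hj - h[j]))
            h[j] = new_hj
        if max_step < tol:
            break  # no coordinate moved noticeably in this sweep
    return h
```

For example, `nnls_coordinate_descent([[1.0, 1.0], [1.0, -1.0]], [0.0, 2.0])` returns `[1.0, 0.0]`: the unconstrained solution would make the second component negative, so the projection clips it to zero. Shuffling the order each sweep avoids tying the result of a truncated run to one fixed cyclic order of components.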

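The "Add support for streaming corpora" and "Count error on-the-fly" items suggest tracking reconstruction error without materializing the whole corpus. One way to do that relies on the squared Frobenius norm being a sum over columns, so each chunk of documents contributes independently. This sketch assumes the error metric is ||V - W H||_F; the commit notes that the error functions changed during development, so gensim's final metric may differ. `streaming_frobenius_error` and `matmul` are hypothetical helpers, not gensim functions.

```python
import math


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def streaming_frobenius_error(W, chunk_pairs):
    """Return ||V - W H||_F where V and H arrive as aligned column chunks.

    Hypothetical helper for illustration. chunk_pairs yields (V_c, H_c),
    where V_c is m x c and H_c is k x c, both stored as lists of rows.
    """
    total = 0.0
    for V_c, H_c in chunk_pairs:
        approx = matmul(W, H_c)  # reconstruction of this chunk only
        # Accumulate the squared residual of every entry in the chunk.
        total += sum((v - a) ** 2
                     for v_row, a_row in zip(V_c, approx)
                     for v, a in zip(v_row, a_row))
    return math.sqrt(total)
```

Only one chunk of V, the matching chunk of H, and their reconstruction are held at a time, so memory grows with the chunk size rather than with the corpus size.
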
anotherbugmaster authored and menshikh-iv committed Jan 31, 2019

1 parent 949213a commit 366d8ae