NMF metrics and wikipedia #2371

Merged
merged 269 commits into from
Mar 20, 2019
Changes from 1 commit
a154a6e
Fix random seed again
anotherbugmaster Jun 5, 2018
e82628d
Optimize E/M step
anotherbugmaster Jun 12, 2018
1ca33f8
Add an eval_every option, use softmax for normalization
anotherbugmaster Jun 13, 2018
f19e6ce
Fixes
anotherbugmaster Jun 13, 2018
583cb15
Improve notebook examples a bit
anotherbugmaster Jun 13, 2018
fe0ab0a
Fix eval_every
anotherbugmaster Jun 13, 2018
8e647a1
Return outliers
anotherbugmaster Jun 16, 2018
89cc803
Optimizations
anotherbugmaster Jun 16, 2018
bbd3099
Experimenting with loss
anotherbugmaster Jun 16, 2018
f71ad89
Merge remote-tracking branch 'upstream/develop' into online_nmf
anotherbugmaster Aug 14, 2018
936e629
Fix PEP8
anotherbugmaster Aug 14, 2018
1c3a064
Return nmf import
anotherbugmaster Aug 14, 2018
ce4b7ee
Revert "Return nmf import"
anotherbugmaster Aug 20, 2018
f8de1d9
Fix
anotherbugmaster Aug 27, 2018
df9b8c7
Merge remote-tracking branch 'upstream/develop' into online_nmf
anotherbugmaster Aug 27, 2018
d159779
Fix minimum_probability & info -> debug logs
anotherbugmaster Aug 27, 2018
3dcdedc
Compute metrics
anotherbugmaster Aug 27, 2018
f11f2e2
Count error on-the-fly
anotherbugmaster Aug 28, 2018
8216541
Speed optimizations, changed error functions
anotherbugmaster Aug 28, 2018
ee3a7c7
Beat LDA
anotherbugmaster Aug 28, 2018
a3315f2
Outperform sklearn in speed (WTF)
anotherbugmaster Aug 28, 2018
3a03ff9
Remove redundant arg
anotherbugmaster Aug 28, 2018
70619e1
Add Olivietti faces
anotherbugmaster Aug 28, 2018
8c47ce0
Remove redundant code
anotherbugmaster Aug 28, 2018
e291664
Add Topics
anotherbugmaster Aug 28, 2018
3302b92
Make it pretty
anotherbugmaster Aug 28, 2018
5616bd6
Fix wrapper
anotherbugmaster Aug 28, 2018
ed8f29f
Save corpus & dict, minor fixes
anotherbugmaster Aug 30, 2018
2117c90
Add RandomCorpus
anotherbugmaster Aug 31, 2018
950115d
Dense -> sparse
anotherbugmaster Aug 31, 2018
54993c6
First doc2dense
anotherbugmaster Aug 31, 2018
572dc6c
Fix csc again
anotherbugmaster Aug 31, 2018
d40d89f
Fix len
anotherbugmaster Aug 31, 2018
7a3ef47
Experimenting
anotherbugmaster Sep 12, 2018
f94de09
Revert "Experimenting"
anotherbugmaster Sep 12, 2018
9ed2167
Fix evaluation
anotherbugmaster Sep 12, 2018
ad9443f
Sparse speedup
anotherbugmaster Sep 23, 2018
1a04660
Improve performance
anotherbugmaster Sep 25, 2018
87981bf
Divide A and B again
anotherbugmaster Sep 25, 2018
0b314c7
Fix A and B computation bug
anotherbugmaster Sep 25, 2018
b024dd6
Sparsify W init
anotherbugmaster Sep 25, 2018
35d5406
Experimenting
anotherbugmaster Sep 25, 2018
74acb37
New norm
anotherbugmaster Sep 25, 2018
8b28675
Sparse threshold -> sparse coefficient
anotherbugmaster Sep 25, 2018
588ef6a
Optimize residuals computation
anotherbugmaster Sep 26, 2018
8f84758
Fix residuals bug
anotherbugmaster Sep 26, 2018
8a67c44
W speedup
anotherbugmaster Sep 26, 2018
560f2bf
Experiment
anotherbugmaster Sep 26, 2018
cac2590
Revert changes a bit
anotherbugmaster Sep 26, 2018
060ab28
Fix corpus
anotherbugmaster Sep 26, 2018
cde937f
Fix init error|
anotherbugmaster Sep 26, 2018
66b753f
Merge branch 'online_nmf' of github.com:anotherbugmaster/gensim into …
anotherbugmaster Sep 26, 2018
18dbb6b
Resolve conflict
anotherbugmaster Sep 26, 2018
4b49d26
Fix corpus iteration issue
anotherbugmaster Sep 26, 2018
9c6cbc6
Switch to numpy algos
anotherbugmaster Oct 7, 2018
b23d016
Merge upstream
anotherbugmaster Oct 7, 2018
74ba37d
Train on wikipedia
anotherbugmaster Oct 7, 2018
c943264
Sparse coef -> density. More stable way to sparsify W matrix
anotherbugmaster Oct 9, 2018
a489807
Merge branch 'online_nmf' of github.com:anotherbugmaster/gensim into …
anotherbugmaster Oct 9, 2018
a95e345
Return old sparse algo
anotherbugmaster Oct 9, 2018
0f90484
Max
anotherbugmaster Oct 9, 2018
6ae43e4
Optimizations
anotherbugmaster Oct 10, 2018
335170b
Fix A and B computation
anotherbugmaster Oct 10, 2018
4cc8f1b
Fix A and B normalization
anotherbugmaster Oct 10, 2018
5c6fe60
Add random_state
anotherbugmaster Oct 23, 2018
dd459a2
Infer id2word
anotherbugmaster Oct 23, 2018
5121d85
Fix tests
anotherbugmaster Nov 6, 2018
5f4018a
Document __init__
anotherbugmaster Nov 14, 2018
dbd8474
Document whole nmf
anotherbugmaster Nov 14, 2018
5904f10
Merge remote-tracking branch 'upstream/develop' into online_nmf
anotherbugmaster Nov 14, 2018
cd4b9b0
Remove unnecessary comments
anotherbugmaster Nov 14, 2018
53a02a9
Add tutorial notebook
anotherbugmaster Nov 14, 2018
937e340
Document __init__
anotherbugmaster Nov 20, 2018
26a87bd
Fix flake version
anotherbugmaster Nov 28, 2018
261c13a
Fix flake warning
anotherbugmaster Nov 28, 2018
0147afc
Remove comments, reverse parallelization order
anotherbugmaster Nov 28, 2018
1ece3c1
Add NMF's cython extension to setup.py
anotherbugmaster Nov 28, 2018
e6409fa
Fix imports, add solve_r function
anotherbugmaster Nov 28, 2018
0743624
Remove comments
anotherbugmaster Nov 28, 2018
fd8088b
Add docstrings
anotherbugmaster Nov 28, 2018
e4ba0de
Common corpus and common dictionary
anotherbugmaster Nov 28, 2018
8537eef
Remove redundant test
anotherbugmaster Nov 28, 2018
d2e8385
Add signature flag
anotherbugmaster Nov 28, 2018
b72bf39
Add files to manifest
anotherbugmaster Nov 28, 2018
ed080a3
Fix flake8
anotherbugmaster Nov 29, 2018
67f6e75
Fix atol value
anotherbugmaster Nov 29, 2018
ee4373d
Implement top topics
anotherbugmaster Nov 29, 2018
d01c88c
Add rst files
anotherbugmaster Dec 10, 2018
8111080
Merge remote-tracking branch 'upstream/develop' into online_nmf
anotherbugmaster Dec 11, 2018
3de3646
Fix appveyor issue
anotherbugmaster Dec 11, 2018
183ea2d
Fix cython error
anotherbugmaster Dec 11, 2018
d2ac199
Merge remote-tracking branch 'upstream/develop' into online_nmf
anotherbugmaster Dec 12, 2018
2d664c6
Fix fmax/fmin not being on win-python27
anotherbugmaster Dec 12, 2018
c9a3577
Add word transformation test
anotherbugmaster Dec 12, 2018
fd0de20
Improve readability of residuals computation
anotherbugmaster Dec 21, 2018
fa384f2
Merge remote-tracking branch 'upstream/develop' into online_nmf
anotherbugmaster Dec 21, 2018
a811c67
Fix tests
anotherbugmaster Dec 21, 2018
d063a4f
A few fixes
anotherbugmaster Dec 21, 2018
b8f5d79
Blank line at the end of each docstring
anotherbugmaster Dec 21, 2018
361d160
Add blank line
anotherbugmaster Dec 21, 2018
e214582
Add the paper reference
anotherbugmaster Dec 21, 2018
9527f39
Fix long line
anotherbugmaster Dec 21, 2018
e1e1168
Add log_perplexity
anotherbugmaster Dec 30, 2018
3bf5be3
Merge remote-tracking branch 'remotes/upstream/develop' into online_nmf
anotherbugmaster Jan 7, 2019
d1c6e3e
Add NMF and LDA comparison table
anotherbugmaster Jan 9, 2019
7927b6b
Change the sign of log perplexity
anotherbugmaster Jan 9, 2019
1c6517e
Add Sklearn NMF comparison
anotherbugmaster Jan 9, 2019
278fb05
Merge sklearn and tm tables
anotherbugmaster Jan 9, 2019
a330327
Add F1
anotherbugmaster Jan 10, 2019
7ba9b84
Remove _solve_r
anotherbugmaster Jan 10, 2019
a14bfd3
Merge tutorial and benchmark
anotherbugmaster Jan 10, 2019
d28aef3
Identation's back
anotherbugmaster Jan 10, 2019
83ec0f6
Optimize optimizers
anotherbugmaster Jan 10, 2019
d25332f
Remove unnecessary pic
anotherbugmaster Jan 10, 2019
0e711d9
Optimize memory consumption
anotherbugmaster Jan 10, 2019
cc3085c
Add docstring
anotherbugmaster Jan 10, 2019
b090b6b
Optimize get_topic_words
anotherbugmaster Jan 10, 2019
e05a1c6
Merge remote-tracking branch 'upstream/develop' into online_nmf
anotherbugmaster Jan 10, 2019
ba8ce1c
Fix tests
anotherbugmaster Jan 10, 2019
6d78f83
Fix flake8
anotherbugmaster Jan 10, 2019
b16c1dd
Add missing test
anotherbugmaster Jan 11, 2019
7c1e240
Code review fixes
anotherbugmaster Jan 11, 2019
667ae99
n_tokens -> num_tokens
anotherbugmaster Jan 11, 2019
251d5f9
[skip ci] Add explicit normalize parameter
anotherbugmaster Jan 11, 2019
7a3f358
[skip ci] Add explicit normalize parameter[2]
anotherbugmaster Jan 11, 2019
c663f33
[skip ci] Update tutorial notebook
anotherbugmaster Jan 11, 2019
8e15cd4
[skip ci] [WIP] Update wikipedia notebook
anotherbugmaster Jan 11, 2019
b16e108
Merge remote-tracking branch 'upstream/develop' into online_nmf
anotherbugmaster Jan 11, 2019
3c76171
Merge branch 'online_nmf' of github.com:anotherbugmaster/gensim into …
anotherbugmaster Jan 15, 2019
4941745
Merge remote-tracking branch 'upstream/develop' into online_nmf
anotherbugmaster Jan 15, 2019
c4d6ebd
Add more description and metrics
anotherbugmaster Jan 15, 2019
3b1195d
[skip ci] Fix log_probabiliy
anotherbugmaster Jan 15, 2019
5edec1b
Multiple format fixes in notebook, outputs cleared til tomorrow
anotherbugmaster Jan 15, 2019
33ce1a3
Merge remote-tracking branch 'upstream/develop' into online_nmf
menshikh-iv Jan 16, 2019
1806bf6
Train on full corpus
anotherbugmaster Jan 16, 2019
3b9b8ea
Merge branch 'online_nmf' of github.com:anotherbugmaster/gensim into …
anotherbugmaster Jan 16, 2019
3f1af1d
[skip ci] Remove disclaimer
anotherbugmaster Jan 16, 2019
38143a9
Add RAM usage stats
anotherbugmaster Jan 16, 2019
72a02db
Native 20-newsgroups and additional text
anotherbugmaster Jan 16, 2019
7cf80e1
Truncate outputs
anotherbugmaster Jan 17, 2019
72178c0
Merge remote-tracking branch 'upstream/develop' into online_nmf
anotherbugmaster Jan 17, 2019
467a2ad
Fix last cell formatting
anotherbugmaster Jan 17, 2019
e34b939
[skip ci] Change model hyperparameters back
anotherbugmaster Jan 17, 2019
08b74c4
Merge from upstream
anotherbugmaster Jan 19, 2019
a270557
[skip ci] Add module docstring
anotherbugmaster Jan 28, 2019
6e5b288
Merge remote-tracking branch 'upstream/develop' into nmf_docs
anotherbugmaster Jan 28, 2019
96bb9c9
Merge branch 'online_nmf' of github.com:anotherbugmaster/gensim into …
anotherbugmaster Jan 28, 2019
24c999f
[skip ci] Massive speedups
anotherbugmaster Jan 29, 2019
28b6fa5
Merge branch 'nmf_docs' into nmf_speedups
anotherbugmaster Jan 29, 2019
69aba02
Checkout nmf_wikipedia from develop
anotherbugmaster Jan 29, 2019
d7b29b0
Fix tests
anotherbugmaster Jan 29, 2019
76d4483
Fix corpus description
anotherbugmaster Jan 29, 2019
effa895
Add components permutation to coordinate descent
anotherbugmaster Jan 29, 2019
bfbdc78
Fix tests
anotherbugmaster Jan 29, 2019
f6a0a28
Fix dictionary highlight
anotherbugmaster Jan 29, 2019
377d3b6
Fix tests again
anotherbugmaster Jan 29, 2019
83ea5ad
Remove r, it's not used for the time
anotherbugmaster Jan 30, 2019
f07cafc
Deprecate use_r
anotherbugmaster Jan 30, 2019
273bd04
[skip ci] Rearrange params
anotherbugmaster Jan 30, 2019
d7a94d5
[skip ci] Add disclaimer about `r`
anotherbugmaster Jan 30, 2019
83bc01b
Fix `normalize` and `minimum_probability` docstring
anotherbugmaster Jan 30, 2019
be65d34
Remove unused params
anotherbugmaster Jan 30, 2019
001cc1b
Add csc support
anotherbugmaster Jan 30, 2019
b3ea8ba
Add examples to the docstring
anotherbugmaster Jan 30, 2019
4ae9626
Update tutorial notebook
anotherbugmaster Jan 30, 2019
7458fa5
[skip ci] Update tutorial again
anotherbugmaster Jan 30, 2019
e477cab
[skip ci] Merge remote-tracking branch 'upstream/develop' into nmf_sp…
menshikh-iv Jan 30, 2019
e873434
[skip ci] fix PEP
menshikh-iv Jan 30, 2019
fd16b38
cast explicitly permutations to int32
menshikh-iv Jan 30, 2019
f116b69
[skip ci] Fix a typo
anotherbugmaster Jan 30, 2019
6095a96
Merge remote-tracking branch 'origin/nmf_speedups' into nmf_speedups
anotherbugmaster Jan 30, 2019
52f80fc
[skip ci] Remove clip and fix error count in update
anotherbugmaster Jan 30, 2019
b144de7
[skip ci] Fix error computation
anotherbugmaster Jan 30, 2019
abf3239
[skip ci] Fix error counting again
anotherbugmaster Jan 30, 2019
0948c85
[skip ci] Remove redundant imports
anotherbugmaster Jan 30, 2019
7e2782e
Fix grouper for csc matrices
anotherbugmaster Jan 30, 2019
e213d7e
Fix module docstring
anotherbugmaster Jan 30, 2019
4bb6f9f
Fix training corpus description
anotherbugmaster Jan 30, 2019
700bc36
Fix pep8
anotherbugmaster Jan 30, 2019
0fc38c4
Fix flake8 for real
anotherbugmaster Jan 30, 2019
f77873e
Normalize, sparsity and dictionary fixes
anotherbugmaster Jan 30, 2019
db62b49
Updated module docstring in the notebook
anotherbugmaster Jan 31, 2019
e8b448e
Update wiki
anotherbugmaster Jan 31, 2019
15c245c
Merge branch 'nmf_speedups' of github.com:anotherbugmaster/gensim int…
anotherbugmaster Jan 31, 2019
94cd929
Changed the loss shown in logs
anotherbugmaster Feb 1, 2019
7603007
Merge remote-tracking branch 'origin/nmf_speedups' into nmf_speedups
anotherbugmaster Feb 1, 2019
e741d02
Fix wikipedia metrics
anotherbugmaster Feb 4, 2019
40258b9
Merge upstream
anotherbugmaster Feb 4, 2019
586ffbb
Merge remote-tracking branch 'origin/nmf_speedups' into nmf_speedups
anotherbugmaster Feb 4, 2019
d291a26
Set chunksize to 1000
anotherbugmaster Feb 4, 2019
a4f8acc
Sklearn topics
anotherbugmaster Feb 4, 2019
0cd1811
Fix the issues in PR
anotherbugmaster Feb 4, 2019
5d6721b
Re-order metrics, add more explanations
anotherbugmaster Feb 4, 2019
0875b54
Merge branch 'nmf_speedups' of github.com:anotherbugmaster/gensim int…
anotherbugmaster Feb 4, 2019
72c6bc7
Fix the compatibility test
anotherbugmaster Feb 4, 2019
e006142
Fix flake8
anotherbugmaster Feb 4, 2019
8b97172
Add comment about sparsity
anotherbugmaster Feb 4, 2019
7ecbc70
Add nmf_model
anotherbugmaster Feb 4, 2019
4912bdb
Fix indent and corpus type
anotherbugmaster Feb 4, 2019
fc57b0f
Fix the flake8
anotherbugmaster Feb 4, 2019
225b0b6
Fix smart_open import
anotherbugmaster Feb 5, 2019
f657708
Merge branch 'nmf_speedups' of github.com:anotherbugmaster/gensim int…
anotherbugmaster Feb 5, 2019
c9b209e
Merge branch 'nmf_speedups' of github.com:anotherbugmaster/gensim int…
anotherbugmaster Feb 5, 2019
ee9c7c9
Fix flake8, docstring and comments
anotherbugmaster Feb 5, 2019
a89b81f
Truncate wikipedia
anotherbugmaster Feb 5, 2019
f0d9002
[skip ci] Fix indent
anotherbugmaster Feb 5, 2019
f9dd1c5
Fix CI
anotherbugmaster Feb 5, 2019
c74f782
Fix initialization
anotherbugmaster Feb 6, 2019
447970f
Fix flake8 and sklearn topics in the tutorial
anotherbugmaster Feb 6, 2019
c696186
[skip ci] Update wiki notebook
anotherbugmaster Feb 7, 2019
c1c3156
Merge branch 'nmf_speedups' of github.com:anotherbugmaster/gensim int…
anotherbugmaster Feb 7, 2019
967f4c2
Truncate wiki, remove autoreload
anotherbugmaster Feb 7, 2019
f35bd83
Remove autoreload and line_profiler
anotherbugmaster Feb 7, 2019
62610ef
Fix type checks
anotherbugmaster Feb 7, 2019
d712ad4
[skip ci] Add the comment
anotherbugmaster Feb 7, 2019
f011f2c
update language in the NMF tutorial
piskvorky Feb 11, 2019
b97b395
WIP: NMF tutorial fixes + add Wikipedia section
piskvorky Feb 14, 2019
149bf71
more NMF tutorial fixes
piskvorky Feb 14, 2019
b3e9423
more NMF tutorial fixes
piskvorky Feb 15, 2019
3db06bb
NMF notebook fixes
piskvorky Feb 16, 2019
a9ec07a
more NMF tutorial fixes
piskvorky Feb 16, 2019
cdc84d8
more NMF fixes
piskvorky Feb 17, 2019
e67fce4
NMF tutorial fixes
piskvorky Feb 18, 2019
c2cae87
NMF tutorial fixes
piskvorky Feb 18, 2019
551fe07
Merge remote-tracking branch 'upstream/develop' into nmf_speedups
anotherbugmaster Feb 20, 2019
431289a
Merge remote-tracking branch 'upstream/develop' into nmf_speedups
anotherbugmaster Feb 26, 2019
6161d90
Merge remote-tracking branch 'upstream/nmf_speedups' into nmf_speedups
anotherbugmaster Feb 26, 2019
250f788
[skip ci] O(n_topics^2) -> O(n_topics)
anotherbugmaster Mar 1, 2019
7ef176f
[skip ci] Improve training logs
anotherbugmaster Mar 1, 2019
4d1cce7
[skip ci] Turn on BLAS
anotherbugmaster Mar 1, 2019
5079be4
[skip ci] Optimize conversion to csc
anotherbugmaster Mar 3, 2019
cfa3790
[skip ci] Optimize conversion to csc [2]
anotherbugmaster Mar 3, 2019
adfcd01
[skip ci] Speed up corpus2csc
anotherbugmaster Mar 3, 2019
bd81329
[skip ci] Fix sparsity normalization
anotherbugmaster Mar 3, 2019
38950fb
[skip ci] Fix identation
anotherbugmaster Mar 3, 2019
c918f48
[skip ci] Remove optimize option (numpy version issues)
anotherbugmaster Mar 3, 2019
eeac487
[skip ci] Re-serialize the model
anotherbugmaster Mar 3, 2019
d28c17e
[skip ci] Fix sparse matrix length computation
anotherbugmaster Mar 3, 2019
6105b1e
[skip ci] Print topics only after eval_every batches
anotherbugmaster Mar 3, 2019
982c475
[skip ci] Cosmetic fix
anotherbugmaster Mar 3, 2019
07e12a3
[skip ci] Fix length method for csc corpus
anotherbugmaster Mar 3, 2019
e465460
Lot of changes to the notebook
anotherbugmaster Mar 4, 2019
241754d
Generator test
anotherbugmaster Mar 4, 2019
123d948
Merge remote-tracking branch 'origin/nmf_speedups' into nmf_speedups
anotherbugmaster Mar 4, 2019
c1b60ec
[skip ci] Flake fix
anotherbugmaster Mar 4, 2019
063caa7
Notebook execution is finished
anotherbugmaster Mar 4, 2019
404ee9d
Merge branch 'nmf_speedups' of github.com:anotherbugmaster/gensim int…
anotherbugmaster Mar 4, 2019
39e1ae1
[skip ci] Fix objections
anotherbugmaster Mar 4, 2019
a8b744f
Merge remote-tracking branch 'origin/nmf_speedups' into nmf_speedups
anotherbugmaster Mar 4, 2019
ff0c49e
Trimmed the notebook and added more info about the metrics
anotherbugmaster Mar 4, 2019
Optimize E/M step
anotherbugmaster committed Jun 12, 2018
commit e82628de5ece8d113f5c085a695fd5c122b5303e
8 changes: 7 additions & 1 deletion gensim/models/nmf.py
@@ -276,10 +276,13 @@ def _solve_w(self, v, h, r):
         eta = self._kappa / np.linalg.norm(self.A, "fro")
         error = None

-        for n in range(self._w_max_iter):
+        for iter_number in range(self._w_max_iter):
             self._W -= eta * (np.dot(self._W, self.A) - self.B)
             self.__transform()

+            if iter_number == self._w_max_iter - 1:
+                break
+
             error_ = self.__w_error()

             if error and np.abs(error_ - error) < np.abs(
@@ -344,6 +347,9 @@ def _solveproj(self, v, W, h=None, r=None, v_max=None):

             solve_r(r, r_actual, self._lambda_, self.v_max)

+            if iter_number == self._h_r_max_iter - 1:
+                break
+
             error_ = self.__h_r_error(v, h, r)

             if error and np.abs(error - error_) < np.abs(
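
Both hunks apply the same idea: the error is computed only to decide whether to stop early, so evaluating it after the final update is wasted work. A minimal standalone sketch of that pattern follows. It is not gensim's implementation: `solve_w_sketch`, the `np.clip` projection, the objective, and the `tol` stopping rule are assumptions made for illustration (the stopping condition in the diff is truncated and is not reproduced here).

```python
import numpy as np


def solve_w_sketch(W, A, B, kappa=1.0, max_iter=100, tol=1e-4):
    """Projected gradient steps on W, skipping the error check on the last pass.

    Hypothetical stand-in for the loop in the diff, not gensim's code.
    Returns the updated W and the number of error evaluations performed.
    """
    eta = kappa / np.linalg.norm(A, "fro")  # step size, as in the diff
    error = None
    error_evals = 0

    for iter_number in range(max_iter):
        W = W - eta * (W @ A - B)
        W = np.clip(W, 0.0, None)  # assumed projection: keep W non-negative

        # The error only drives early stopping, so it is not needed once
        # the last allowed update has been applied.
        if iter_number == max_iter - 1:
            break

        # Assumed objective whose gradient is W @ A - B (A symmetric).
        error_ = 0.5 * np.trace(W.T @ W @ A) - np.trace(W.T @ B)
        error_evals += 1

        # Assumed relative-change stopping rule.
        if error is not None and abs(error_ - error) < tol * abs(error):
            break
        error = error_

    return W, error_evals
```

With `max_iter` updates and no early convergence, the error is evaluated `max_iter - 1` times instead of `max_iter`. The saving is one evaluation per call, which adds up when the solver runs once per training batch.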