-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Sklearn wrapper for RandomProjections Model #1395
[WIP] Sklearn wrapper for RandomProjections Model #1395
Conversation
Base RP module | ||
""" | ||
|
||
def __init__(self, corpus, id2word=None, num_topics=300): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove corpus
argument, you should pass corpus only to fit
method
def fit(self, X, y=None): | ||
""" | ||
For fitting corpus into class object. | ||
Calls gensim.models.RpModel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replace doc-string to Fit the model according to the given training data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor code style nitpicks :)
@@ -58,7 +58,7 @@ def set_params(self, **parameters): | |||
def fit(self, X, y=None): | |||
""" | |||
For fitting corpus into the class object. | |||
Calls gensim.model.LsiModel: | |||
Calls gensim.models.LsiModel: | |||
>>>gensim.models.LsiModel(corpus=corpus, num_topics=num_topics, id2word=id2word, chunksize=chunksize, decay=decay, onepass=onepass, power_iters=power_iters, extra_samples=extra_samples) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This documentation line doesn't seem to help -- what are these undefined variables like id2word
, chunksize
etc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These params (id2word
, chunksize
etc) are associated with the LSI model. This change is in the file sklearn_wrapper_gensim_lsimodel.py
. Since this change was so small (literally one word in a docstring), I added this change in this PR (PR concerning RP model wrapper) itself.
There is also a similar change for LDA model here. Should I remove these changes from this PR?
# | ||
# Copyright (C) 2011 Radim Rehurek <[email protected]> | ||
# Licensed under the GNU LGPL v2.1 - http://www.gnu.org/licenses/lgpl.html | ||
# |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code style: remove #
, insert blank line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Updated now.
Scikit learn interface for gensim for easy use of gensim with scikit-learn | ||
Follows scikit-learn API conventions | ||
""" | ||
from gensim import models |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Blank line before imports.
Also, block the imports: built-in first, 3rd party second, local package imports last.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Updated now.
Same comments as in #1405 |
score = text_rp.score(corpus, data.target) | ||
self.assertGreater(score, 0.40) | ||
|
||
def testPersistence(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as LdaSeq
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Done.
text_rp = Pipeline((('features', model,), ('classifier', clf))) | ||
text_rp.fit(corpus, data.target) | ||
score = text_rp.score(corpus, data.target) | ||
self.assertGreater(score, 0.40) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as LdaSeq
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Done.
This PR creates an sklearn wrapper for Random Projections model.