[WIP] HDP changes #996
@tmylk, I've also added a method for the HDP to LDA conversion, which returns an LDA model object.
Please add changelog and tests for the new method.
        Returns closest corresponding ldamodel object corresponding to current hdp model.
        """
        alpha, beta = self.hdp_to_lda()
        ldam = ldamodel.LdaModel(num_topics=150, alpha=alpha, id2word=self.id2word)
why this number of topics?
It's 150 because the default number of topics in HDP before truncation is 150. This means all the internal matrices are of that size, and to give the `expElogbeta` matrix the same shape so that it can accept the beta, we have to initialise `ldam` with that many topics.
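The shape constraint described above can be illustrated with plain NumPy. This is only a sketch: the vocabulary size is made up, `beta` is a random stand-in for the matrix returned by `hdp_to_lda()`, and `exp_elog_beta` is a stand-in for the LDA model's `expElogbeta`, assumed here to have shape `(num_topics, vocab_size)`.

```python
import numpy as np

# Hypothetical sizes: 150 mirrors the HDP truncation level mentioned above,
# the vocabulary size is invented for this illustration.
num_topics, vocab_size = 150, 12

rng = np.random.RandomState(0)
# Stand-in for the (topics x words) beta matrix coming out of the HDP model.
beta = rng.dirichlet(np.ones(vocab_size), size=num_topics)

# Stand-in for the LDA topic-word matrix, sized by the LDA model's num_topics.
exp_elog_beta = np.zeros((num_topics, vocab_size))

# Copying beta in only works because both arrays have the same shape.
assert beta.shape == exp_elog_beta.shape
exp_elog_beta[:] = beta
```

With any other `num_topics` on the LDA side the two shapes would differ, which is why the value is tied to the HDP truncation level rather than chosen freely.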
Please add text to the doc-string about this choice
@@ -34,11 +34,11 @@
from __future__ import with_statement

import logging, time
import numpy as np
This is the standard abbreviation. Please keep it.
LSI, LDA, DTM all use `numpy` instead of `np` in the code, so I was making it uniform.
Should I make it `np` in all?
It would make reviewing this PR much easier if this decorative change weren't here.
Yeah, I'll change it back to `np`.
But I still think it should be uniform throughout the package - will open another PR for the other 3.
@tmylk could you have a look?
An example test: train HDP, create default LDA using the new method, check they are similar.
@tmylk, added tests, and a
Looks good, just minor changes
    return result.astype(alpha.dtype)  # keep the same precision as input


def get_random_state(seed):
Should I call it from the `ldamodel` class instead? How should this be done?
Better to move it to some shared module like `utils`.
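As a rough sketch of what such a shared helper might look like (not necessarily what the PR or gensim actually implements), the function below normalises the accepted seed types into a `np.random.RandomState`. Only the name `get_random_state` comes from the diff; the accepted types and the handling of `None` are assumptions made for illustration.

```python
import numbers

import numpy as np


def get_random_state(seed):
    """Turn `seed` into a np.random.RandomState (assumed behaviour, illustration only)."""
    if seed is None:
        # No seed given: return a freshly, non-deterministically seeded generator.
        return np.random.RandomState()
    if isinstance(seed, (numbers.Integral, np.integer)):
        # Integer seed: build a reproducible generator from it.
        return np.random.RandomState(int(seed))
    if isinstance(seed, np.random.RandomState):
        # Already a generator: hand it back unchanged.
        return seed
    raise ValueError("%r cannot be used to seed a np.random.RandomState" % (seed,))


# Two generators built from the same integer seed produce the same draws.
first = get_random_state(0).rand(3)
second = get_random_state(0).rand(3)
```

Keeping the helper in one shared module means every model resolves its `random_state` argument the same way. Note that `np.random.seed(0)` returns `None`, so a helper like this one would receive `None` if that call's result were passed as `random_state`; passing the integer `0` or a `RandomState` instance avoids relying on NumPy's global generator.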
Should I also move `dirichlet_expectation` to `matutils`?
The same code is being used in `ldamodel` and `hdpmodel`.
sure
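For context, a shared `dirichlet_expectation` could look roughly like the sketch below. It computes E[log θ] for θ ~ Dirichlet(α) as ψ(α_k) - ψ(Σ_j α_j), using SciPy's digamma function `psi`. The final `astype` line follows the diff excerpt quoted in this thread; the rest of the body, including the handling of 1-D versus 2-D input, is an assumption and may differ from the code in the PR.

```python
import numpy as np
from scipy.special import psi  # digamma function


def dirichlet_expectation(alpha):
    """E[log(theta)] for theta ~ Dirichlet(alpha).

    `alpha` is either a 1-D parameter vector or a 2-D array with one
    parameter vector per row (assumed layout, for illustration).
    """
    if alpha.ndim == 1:
        result = psi(alpha) - psi(np.sum(alpha))
    else:
        # Subtract each row's digamma-of-sum from that row's entries.
        result = psi(alpha) - psi(np.sum(alpha, axis=1))[:, np.newaxis]
    return result.astype(alpha.dtype)  # keep the same precision as input


# For alpha = (1, 1): psi(1) - psi(2) = -1 for every component.
expected_log_theta = dirichlet_expectation(np.ones(2, dtype=np.float32))
```

Having a single copy in `matutils` would let both `ldamodel` and `hdpmodel` import it instead of each carrying its own definition.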
@@ -51,12 +52,29 @@ class TestHdpModel(unittest.TestCase, basetests.TestBaseTopicModel):
    def setUp(self):
        self.corpus = mmcorpus.MmCorpus(datapath('testcorpus.mm'))
        self.class_ = hdpmodel.HdpModel
        self.model = self.class_(corpus, id2word=dictionary)
        self.model = self.class_(corpus, id2word=dictionary, random_state=np.random.seed(0))

    def testShowTopic(self):
Yeah, but that only tests the types of the results; I thought it would be important to also test for particular values. The nature of my test is different. Should I maybe name it something else?
Also, how do I make this test file run the base class tests?
Yes, give these tests a different name.
It should run the base class tests automatically. Can you see them in the log?
Will do, and yup, the `basetest` tests show up in HDP.
Changed the test name, moved functions to utils and matutils, and changed all the numpy to np there as well (I had forgotten to last time and had only changed the models).
@tmylk, could you please review? Travis fails on all the Python versions because of I've added
Ping @tmylk! Any clue what to do with the tests?
Did keyed vectors get merged in correctly from the develop branch?
How do I check if it has been? My guess is it hasn't, because the test is failing again. But when I do
@tmylk, I checked, and
@tmylk how do I check if keyedvectors is merged in correctly?
@bhargavvader may I ask for a new PR with just the HDP changes and nothing else? I would like to merge it into the new release this week.
Sure, will close this and open 2 separate PRs this weekend.
This is to address issues #262, #901, #937, #945 - basically an attempt to clean up HDP as much as possible.
For starters, I've made HDP resemble `ldamodel` as much as possible with regard to the imports, and the `dirichlet_expectation` method mirrors the `ldamodel` one.
Will keep making changes in this PR.
Edit: Would want to fix #952 with this as well.