-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use CoherenceModel
for LdaModel.top_topics
#1427
Use CoherenceModel
for LdaModel.top_topics
#1427
Conversation
…ions for the probabilistic topic models.
…this, and update the `CoherenceModel` to use this for getting topics from models.
@chinmayapancholi13 this PR fix problem with coherence and SklWrappers? |
gensim/models/ldamodel.py
Outdated
topic = topic / topic.sum() # normalize to probability distribution | ||
bestn = matutils.argsort(topic, topn, reverse=True) | ||
return [(id, topic[id]) for id in bestn] | ||
|
||
def top_topics(self, corpus, num_words=20): | ||
def top_topics(self, corpus=None, texts=None, dictionary=None, window_size=None, | ||
coherence='u_mass', topn=20, processes=-1): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that we should use all coherence types for this method, u_mass
will be enough for it (for this reason please remove coherence parameters from arguments)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Several papers have shown that sliding-window based coherence metrics have higher correlation with human judgements, and u_mass
has lower correlations with human judgements than these methods. Why limit to u_mass
when it is not the best technique for coherence calculations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with you, but for "sliding windows" approach user should pass corpus
, texts
and dictionary
, it's slightly complicated.
Although perhaps you are right, if the default method is UMass
I see no difference for users.
gensim/models/ldamodel.py
Outdated
""" | ||
Calculate the Umass topic coherence for each topic. Algorithm from | ||
**Mimno, Wallach, Talley, Leenders, McCallum: Optimizing Semantic Coherence in Topic Models, CEMNLP 2011.** | ||
Calculate the coherence for each topic; default is Umass coherence. See the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Googe-style docstring (here and everywhere)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what you're looking for here, though I've noticed this comment in several of my PRs. To clarify, are you asking for the beginning line to start on the same line as the start quotes (""")? Or are you asking that I include the Args
specification for this method and the others? Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the confusion, I mean Args
and Returns
sections.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed throughout
@menshikh-iv when you have a moment, I'd love your thoughts on my comments. Based on your comments, I think this may still need some changes, but I need your guidance on how to make them. Thanks! |
Sorry for waiting, please fix doc-strings and I'll merge this PR 👍 |
…LdaModel methods.
@menshikh-iv I'm not sure how this error relates to my PR; maybe something with bad scikit version in build server? The current build is only failing due to this error in
|
@macks22 this problem isn't related to your changes. This problem connected with new sklearn release, we will fix it soon. |
Nice work @macks22, you are very productive:+1: |
…iskvorky#1427) * Add a `get_topics` method to all topic models, add test coverage for this, and update the `CoherenceModel` to use this for getting topics from models. * Require topics returned from `get_topics` to be probability distributions for the probabilistic topic models. * Replace code in `LdaModel.top_topics` with use of `CoherenceModel`. * Fix docstrings to use Google style throughout PR changes and various LdaModel methods.
This will resolve #1128.