Use `CoherenceModel` for `LdaModel.top_topics` #1427

macks22 · 2017-06-18T01:42:50Z

This will resolve #1128.

…ions for the probabilistic topic models.

…this, and update the `CoherenceModel` to use this for getting topics from models.

menshikh-iv · 2017-06-22T07:38:02Z

@chinmayapancholi13 does this PR fix the problem with coherence and the SklWrappers?

menshikh-iv · 2017-07-18T09:53:37Z

gensim/models/ldamodel.py

        topic = topic / topic.sum()  # normalize to probability distribution
        bestn = matutils.argsort(topic, topn, reverse=True)
        return [(id, topic[id]) for id in bestn]

-    def top_topics(self, corpus, num_words=20):
+    def top_topics(self, corpus=None, texts=None, dictionary=None, window_size=None,
+                   coherence='u_mass', topn=20, processes=-1):


I don't think we should support every coherence type in this method; u_mass is enough for it. For this reason, please remove the coherence parameters from the arguments.

Several papers have shown that sliding-window based coherence metrics have higher correlation with human judgements, and u_mass has lower correlations with human judgements than these methods. Why limit to u_mass when it is not the best technique for coherence calculations?

I agree with you, but for the "sliding window" approach the user has to pass corpus, texts and dictionary, which is slightly complicated.

Although perhaps you are right: if the default method is UMass, I see no difference for users.
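
To make the trade-off in this thread concrete, here is a minimal sketch of UMass coherence for a single topic, following the formula in the Mimno et al. paper cited in the docstring (with +1 smoothing). The function name `umass_coherence` and the toy documents are assumptions for illustration only; this is not gensim's implementation, which may differ in smoothing and aggregation. UMass needs only document-level co-occurrence counts, which a bag-of-words corpus already provides, whereas sliding-window measures need the token order from `texts`.

```python
import math


def umass_coherence(top_words, documents):
    """UMass coherence of one topic (hypothetical helper, +1 smoothing).

    top_words: topic words ordered from most to least probable.
    documents: iterable of tokenized documents (lists of strings).
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain all of the given words.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            denom = doc_freq(top_words[l])
            if denom == 0:
                continue  # assumption: skip pairs whose higher-ranked word never occurs
            joint = doc_freq(top_words[m], top_words[l])
            score += math.log((joint + 1) / denom)
    return score


docs = [["a", "b"], ["a", "b"], ["a", "c"], ["d"]]
coherent = umass_coherence(["a", "b"], docs)    # words that co-occur often
incoherent = umass_coherence(["a", "d"], docs)  # words that never co-occur
```

Higher (closer to zero) scores indicate words that tend to appear in the same documents.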

menshikh-iv · 2017-07-18T09:53:59Z

gensim/models/ldamodel.py

        """
-        Calculate the Umass topic coherence for each topic. Algorithm from
-        **Mimno, Wallach, Talley, Leenders, McCallum: Optimizing Semantic Coherence in Topic Models, CEMNLP 2011.**
+        Calculate the coherence for each topic; default is Umass coherence. See the


Google-style docstring (here and everywhere)

Not sure what you're looking for here, though I've noticed this comment in several of my PRs. To clarify, are you asking for the beginning line to start on the same line as the start quotes (""")? Or are you asking that I include the Args specification for this method and the others? Thanks!

Sorry for the confusion, I mean Args and Returns sections.

fixed throughout
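
For reference, a minimal sketch of the Args and Returns layout requested above. The function `top_words` and its parameters are hypothetical and are not taken from the PR; only the docstring structure is the point.

```python
def top_words(weights, topn=20):
    """Return the highest-weighted words of a topic.

    Args:
        weights (dict of str to float): Mapping from word to its weight in the topic.
        topn (int): Maximum number of words to return.

    Returns:
        list of (str, float): Up to `topn` (word, weight) pairs, sorted by
        descending weight.
    """
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return ranked[:topn]


best = top_words({"a": 0.1, "b": 0.7, "c": 0.2}, topn=2)
```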

macks22 · 2017-07-30T17:06:44Z

@menshikh-iv when you have a moment, I'd love your thoughts on my comments. Based on your comments, I think this may still need some changes, but I need your guidance on how to make them. Thanks!

menshikh-iv · 2017-08-10T11:17:29Z

Sorry for the wait. Please fix the docstrings and I'll merge this PR 👍

…LdaModel methods.

macks22 · 2017-08-13T20:25:25Z

@menshikh-iv I'm not sure how this error relates to my PR; maybe a bad scikit-learn version on the build server? The current build is failing only because of this error in test_sklearn_api:

======================================================================
ERROR: testPipeline (gensim.test.test_sklearn_api.TestLdaSeqWrapper)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/RaRe-Technologies/gensim/gensim/test/test_sklearn_api.py", line 340, in testPipeline
    text_ldaseq.fit(corpus, test_target)
  File "/home/travis/miniconda2/envs/gensim-test/lib/python3.5/site-packages/sklearn/pipeline.py", line 257, in fit
    Xt, fit_params = self._fit(X, y, **fit_params)
  File "/home/travis/miniconda2/envs/gensim-test/lib/python3.5/site-packages/sklearn/pipeline.py", line 226, in _fit
    self.steps[step_idx] = (name, fitted_transformer)
TypeError: 'tuple' object does not support item assignment

menshikh-iv · 2017-08-15T13:09:07Z

@macks22 this problem isn't related to your changes. It is connected with the new sklearn release; we will fix it soon.
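
The last frame of the traceback assigns into `self.steps[step_idx]`, so `steps` must have been an immutable tuple at that point. The sketch below reproduces that failure mode in plain Python without sklearn. The helper `replace_step` and the placeholder step values are hypothetical, and converting the steps to a list is only one plausible fix, not necessarily the one applied in gensim.

```python
def replace_step(steps, index, name, fitted):
    """Mimic the failing assignment from the traceback on a generic sequence."""
    steps[index] = (name, fitted)
    return steps


# Placeholder strings stand in for real transformer and estimator objects.
steps_as_tuple = (("features", "raw_transformer"), ("classifier", "raw_estimator"))

try:
    replace_step(steps_as_tuple, 0, "features", "fitted_transformer")
    tuple_failed = False
except TypeError:
    # Tuples do not support item assignment, matching the CI error message.
    tuple_failed = True

# A list of the same steps accepts the assignment.
steps_as_list = replace_step(list(steps_as_tuple), 0, "features", "fitted_transformer")
```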

menshikh-iv · 2017-08-18T07:36:48Z

Nice work @macks22, you are very productive 👍

…iskvorky#1427)

* Add a `get_topics` method to all topic models, add test coverage for this, and update the `CoherenceModel` to use this for getting topics from models.
* Require topics returned from `get_topics` to be probability distributions for the probabilistic topic models.
* Replace code in `LdaModel.top_topics` with use of `CoherenceModel`.
* Fix docstrings to use Google style throughout PR changes and various LdaModel methods.
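
As an illustration of the "probability distributions" requirement in the commit message, here is a minimal sketch that rescales each topic's term weights to sum to 1, mirroring the `topic / topic.sum()` line in the diff. The function name `normalize_topics` and the sample weights are assumptions; the actual `get_topics` implementations in the PR are not shown here.

```python
def normalize_topics(topic_term_weights):
    """Scale each topic's term weights so that they sum to 1 (hypothetical helper)."""
    normalized = []
    for row in topic_term_weights:
        total = float(sum(row))
        if total == 0:
            raise ValueError("topic has zero total weight")
        normalized.append([weight / total for weight in row])
    return normalized


# Two toy topics over a three-word vocabulary, given as unnormalized weights.
topics = normalize_topics([[2.0, 1.0, 1.0], [0.5, 0.5, 4.0]])
```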

Sweeney, Mack added 3 commits June 17, 2017 21:26

Require topics returned from get_topics to be probability distribut…

41b038a

…ions for the probabilistic topic models.

Add a get_topics method to all topic models, add test coverage for …

63bd3f9

…this, and update the `CoherenceModel` to use this for getting topics from models.

Replace code in LdaModel.top_topics with use of CoherenceModel.

f98a7bb

menshikh-iv suggested changes Jul 18, 2017

Fix docstrings to use Google style throughout PR changes and various …

8ef0e9e

…LdaModel methods.

menshikh-iv merged commit f24ec78 into piskvorky:develop Aug 18, 2017


