`infer` method topic distribution of doc mostly zeros #49

ecoronado92 · 2020-05-19T19:07:13Z

Hi -

I fitted an HDP model and tried to obtain the topic distribution for an unseen document. I do get a list back, but most of the entries are zeros, so I suspect there might be a rounding issue in the code.

Here's an example of what it looks like:

token_list = ['strong', 'organization', 'rusnews', 'line',  'misery', 'write', 'faq', 'ever', 'get', 
'modify', 'define', 'strong', 'atheist', 'believe', 'word']

doc_inst = hdp_model.make_doc(token_list)
topic_dist, ll = hdp_model.infer(doc_inst)

topic_dist
[0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 1.0,  # <--- the only non-zero element; the topic is correct, but I'd like to get proportions
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0]

Here's some other info on my OS and environment:

Darwin-18.5.0-x86_64-i386-64bit
Python 3.7.6 (default, Dec 30 2019, 19:38:28) 
[Clang 11.0.0 (clang-1100.0.33.16)]
NumPy 1.18.1
SciPy 1.4.1
tomotopy 0.7.1

valmirselmani · 2020-05-20T23:17:16Z

Hi,

I can confirm this behaviour. For most documents the topic distribution is almost all zeros, with only one or two topics above 0. The non-zero value is usually exactly 1, but it can also slightly exceed 1 (for example, 1.0000157356262207). It seems that HDP is very confident in its topic assignments.
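
A quick way to quantify the symptom described here is to check each inferred vector for how many entries are non-zero, whether it sums to 1, and whether any entry exceeds 1. The following is only a minimal sketch in plain Python: `summarize_topic_dist` and its `tol` parameter are hypothetical helpers invented for illustration, not part of tomotopy, and the input is assumed to be a list of floats like the `topic_dist` shown in the report above.

```python
import math


def summarize_topic_dist(topic_dist, tol=1e-4):
    """Return simple diagnostics for one inferred topic distribution.

    Hypothetical helper for illustration; `tol` is an assumed tolerance
    for floating-point error, not a value taken from the library.
    """
    total = math.fsum(topic_dist)
    return {
        # Sum of all entries; a proper distribution sums to about 1.
        "sum": total,
        "sums_to_one": abs(total - 1.0) <= tol,
        # (topic index, weight) pairs for every topic with non-zero weight.
        "nonzero_topics": [(k, p) for k, p in enumerate(topic_dist) if p > 0.0],
        # Indices of entries strictly greater than 1, e.g. 1.0000157...
        "entries_above_one": [k for k, p in enumerate(topic_dist) if p > 1.0],
    }


# The 23-entry vector from the original report: all mass on topic 8.
reported = [0.0] * 23
reported[8] = 1.0
print(summarize_topic_dist(reported))

# A made-up two-topic vector using the overshoot value quoted above.
print(summarize_topic_dist([0.0, 1.0000157356262207]))
```

Running this over a batch of inferred documents and counting how many have a single non-zero topic would give a rough measure of how degenerate the output is before and after a fix.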

I am currently writing a bachelor's thesis, where we are creating a topic model to propose similar documents. It's important that not many documents have the same topic distribution, so that we can sort them and thus improve the recommendation. The results of HDP are quite good, though.
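
To illustrate why identical topic distributions hurt this kind of recommendation, here is a small sketch that ranks candidate documents by cosine similarity to a query document. It is an assumption about how such a ranking might be done, not the method actually used in the thesis; `cosine_similarity`, `rank_by_similarity`, and all the three-topic vectors are made up for the example.

```python
import math


def cosine_similarity(p, q):
    """Cosine similarity between two topic vectors of equal length."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_p * norm_q)


def rank_by_similarity(query, candidates):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query, dist))
              for name, dist in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Degenerate case: every document puts all its mass on a single topic.
one_hot_docs = {
    "doc_a": [0.0, 1.0, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [1.0, 0.0, 0.0],
}
one_hot_ranking = rank_by_similarity([0.0, 1.0, 0.0], one_hot_docs)
print(one_hot_ranking)  # doc_a and doc_b receive the same score

# Graded case: mass is spread over topics, so scores can differ.
graded_docs = {
    "doc_a": [0.1, 0.8, 0.1],
    "doc_b": [0.3, 0.6, 0.1],
    "doc_c": [0.7, 0.2, 0.1],
}
graded_ranking = rank_by_similarity([0.1, 0.8, 0.1], graded_docs)
print(graded_ranking)
```

With one-hot vectors, every document sharing the query's topic ties at the same score, so there is nothing to sort on within that group; graded distributions break those ties.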

I also use tomotopy 0.7.1 and have seen this behavior in several versions.

bab2min · 2020-05-21T13:52:30Z

Thank you for reporting the bug. I'll look into it.

valmirselmani · 2020-05-22T19:58:35Z

Any plans on when you're going to make a release so I can test?

valmirselmani · 2020-05-24T17:45:37Z

I saw that you released a test version. I installed it and ran it through a small number of documents. I was afraid that the topics would change, but that is not the case. The results look much better now. Thanks for the quick fix.

bab2min · 2020-05-25T04:25:44Z

Oh, did you see the test version I released? It actually has a bug that causes segmentation faults, not always but often. So I will investigate a little more, fix the problem, and then include the fix in the next update.
Thanks for reporting it!

valmirselmani · 2020-05-25T11:36:19Z

Yes, I installed it from test.pypi.org. However, I only tested inference on a small number of documents, so I did not notice the error at all. Keep up the good work!

I'll wait for the release to infer my 575k documents. 😅

bab2min added the bug label May 21, 2020

bab2min added a commit that referenced this issue Jun 4, 2020

preparing 0.8.0

f72d8f6

fixed HDP inference bug (#49)
implemented converting HDP to LDA (#50)
added used_vocabs (#54)
added g-DMR model

bab2min closed this as completed in 039e09d Jun 6, 2020
