infer method: topic distribution of doc mostly zeros #49
Hi, I can confirm this behaviour. For most documents, the topic distribution is mostly zeros, and only one or two topics have values greater than 0. That value is usually 1, but it can also be, for example, 1.0000157356262207. It seems that HDP is very confident in its topic assignments. I am currently writing a bachelor's thesis in which we are building a topic model to recommend similar documents. It's important that not many documents share the same topic distribution, so that we can sort them and thus improve the recommendations. That said, the results of HDP are quite good. I also use tomotopy 0.7.1 and have seen this behaviour in several versions.
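The recommendation use case above amounts to ranking documents by the distance between their topic distributions. The sketch below is a minimal illustration under stated assumptions: each distribution is a plain list of floats (such as the first element tomotopy's `infer` returns for a single document), and the Hellinger distance and the helper names `normalize`, `hellinger` and `rank_similar` are my own choices, not anything from tomotopy or this thread. It renormalizes each vector first, since a component like 1.0000157 shows the values need not sum exactly to 1. The example data is made up to show that near-one-hot vectors collapse into exact ties, which is what makes sorting hard.

```python
import math
from typing import Dict, List, Sequence, Tuple


def normalize(dist: Sequence[float]) -> List[float]:
    # Clip negatives and rescale so the vector sums to 1; this absorbs
    # rounding artefacts such as a component of 1.0000157.
    clipped = [max(p, 0.0) for p in dist]
    total = math.fsum(clipped)
    if total == 0.0:
        raise ValueError("topic distribution has no mass")
    return [p / total for p in clipped]


def hellinger(p: Sequence[float], q: Sequence[float]) -> float:
    # Hellinger distance between two discrete distributions, in [0, 1].
    s = math.fsum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(s / 2.0)


def rank_similar(query: Sequence[float],
                 corpus: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    # Return (doc_id, distance) pairs, closest first. Python's sort is
    # stable, so exact ties keep the corpus insertion order.
    q = normalize(query)
    scored = [(doc_id, hellinger(q, normalize(d))) for doc_id, d in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical distributions over 3 topics, for illustration only.
    corpus = {
        "a": [1.0, 0.0, 0.0],
        "b": [1.0000157, 0.0, 0.0],  # same topic, slightly off due to rounding
        "c": [0.6, 0.4, 0.0],
    }
    for doc_id, dist in rank_similar([0.9, 0.1, 0.0], corpus):
        print(doc_id, round(dist, 4))
```

Documents "a" and "b" end up at exactly the same distance from the query, so no distance measure can order them; only distributions with mass spread over more than one topic (like "c") give the ranking something to work with.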
Thank you for reporting this bug. I'll look into it.
Are there any plans for when you're going to make a release, so I can test it?
I saw that you released a test version. I installed it and ran it on a small number of documents. I was afraid that the topics would change, but they did not. The results look much better now. Thanks for the quick fix.
Oh, did you see the test version I released? Actually, it has a segmentation-fault bug. It doesn't happen every time, but it happens often. So I will check a little more, fix the problem, and then include the fix in the next update.
Yes, I installed it from test.pypi.org. But I only tested inference on a small number of documents, so I did not notice the error at all. Keep up the good work! I'll wait for the release before running inference on my 575k documents. 😅
Hi -
I fitted an HDP model and tried to obtain the topic distribution for an unseen document. I do get a list, but most of the entries are zeros, so I suspect there might be a rounding issue in the code.
Here's an example of what it looks like:
Here's some other info about my OS:
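When reporting or checking this kind of output, it can help to summarize the inferred vector rather than eyeball it. The sketch below is a minimal, hypothetical helper (`summarize_topic_dist` and its `eps` threshold are my own names, not part of tomotopy). It assumes `dist` is the plain list of floats that, as I understand the tomotopy API, `HDPModel.infer` returns as the first element of its `(topic_dist, log_likelihood)` tuple for a single document, and that an optional live-topic mask can be built from `mdl.is_live_topic(k)`. The example values are made up, not taken from this issue.

```python
import math
from typing import Optional, Sequence


def summarize_topic_dist(dist: Sequence[float],
                         live: Optional[Sequence[bool]] = None,
                         eps: float = 1e-6) -> dict:
    # Consider only topics flagged as live, if a mask is given.
    idx = [i for i in range(len(dist)) if live is None or live[i]]
    # Count entries that are zero up to the tolerance eps.
    near_zero = sum(1 for i in idx if abs(dist[i]) <= eps)
    # Total probability mass over the considered topics (should be ~1).
    total = math.fsum(dist[i] for i in idx)
    # Topics carrying non-negligible weight, heaviest first.
    active = sorted((i for i in idx if dist[i] > eps),
                    key=lambda i: dist[i], reverse=True)
    return {"topics": len(idx), "near_zero": near_zero,
            "total": total, "active": active}


if __name__ == "__main__":
    # Made-up values for illustration only.
    dist = [0.0, 0.0, 0.97, 0.0, 0.03, 0.0]
    print(summarize_topic_dist(dist))
    print(summarize_topic_dist(dist, live=[True, False, True, True, True, False]))
```

With tomotopy this might be called as `dist, ll = mdl.infer(mdl.make_doc(words))` followed by `summarize_topic_dist(dist, [mdl.is_live_topic(k) for k in range(mdl.k)])`; please check those calls against the documentation for your installed version. Masking out non-live topics separates zeros that come from unused topic slots from zeros among the topics the model actually uses.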