'model.vocabs' gives full list of tokens even if it was filtered #54

Closed
Benja1972 opened this issue May 26, 2020 · 3 comments
Labels
enhancement New feature or request

Comments

@Benja1972

'model.vocabs' gives the full list of tokens even if the vocabulary was filtered. At the same time, 'num_vocabs' gives the number of tokens after filtering. This is confusing, because the model returns each topic-word distribution as a list of numbers of length 'num_vocabs' but does not provide an id2token dictionary. Something like 'used_vocabs' is needed to do any downstream analysis on the produced topics.

@bab2min bab2min added the enhancement New feature or request label May 27, 2020
@bab2min
Owner

bab2min commented May 27, 2020

Your point makes sense. To avoid confusion, I'll add the used_vocabs and num_used_vocabs properties into models. Thank you for your good suggestion.

@Benja1972
Author

Thank you for the response and the proposal!

bab2min added a commit that referenced this issue Jun 4, 2020
fixed HDP inference bug (#49)
implemented converting HDP to LDA (#50)
added used_vocabs (#54)
added g-DMR model
@bab2min
Owner

bab2min commented Jun 6, 2020

A new property named used_vocabs is available as of version 0.8.0, and the property num_vocabs has been deprecated. You can use len(used_vocabs) to get the number of vocabs used.

Thank you for your suggestion again!
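
For readers who want the id2token mapping requested in this issue, here is a minimal sketch in plain Python. It assumes the topic-word distribution and the filtered vocabulary list are index-aligned, which is what the thread implies. The helper names build_id2token and top_words are hypothetical and are not part of the library.

```python
from typing import Dict, List, Sequence, Tuple


def build_id2token(used_vocabs: Sequence[str]) -> Dict[int, str]:
    """Map each position in the filtered vocabulary to its token."""
    return dict(enumerate(used_vocabs))


def top_words(
    topic_word_dist: Sequence[float],
    used_vocabs: Sequence[str],
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_n (token, weight) pairs for one topic.

    Assumption: topic_word_dist[i] is the weight of used_vocabs[i].
    """
    if len(topic_word_dist) != len(used_vocabs):
        # A length mismatch is the symptom described in this issue:
        # the distribution covers the filtered vocabulary only.
        raise ValueError("distribution and vocabulary lengths differ")
    id2token = build_id2token(used_vocabs)
    # Rank word ids by weight, highest first.
    ranked = sorted(
        range(len(topic_word_dist)),
        key=lambda i: topic_word_dist[i],
        reverse=True,
    )
    return [(id2token[i], topic_word_dist[i]) for i in ranked[:top_n]]


# Toy data standing in for a trained model's output.
example_vocab = ["apple", "banana", "cherry"]
example_dist = [0.1, 0.6, 0.3]
example_top = top_words(example_dist, example_vocab, top_n=2)
```

With version 0.8.0 or later, the inputs would presumably be list(model.used_vocabs) and one topic's word distribution from the model. Passing the unfiltered model.vocabs instead would trigger the length check above whenever filtering removed any tokens.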

@bab2min bab2min closed this as completed Jun 6, 2020