Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix assumption that gensim dictionary and corpus are built from same … #17

Merged
merged 1 commit into from
Jul 24, 2015

Conversation

paul-english
Copy link
Contributor

…set of documents

The dictionary may span a larger set of documents than any particular corpus, and the ids aren't necessarily congrouos or in order. Iterating the keys of the term_freqs_dict solves this. Additionally I've simplified term_freqs.

…set of documents

The dictionary may span a larger set of documents than any particular corpus, and the ids aren't necessarily congrouos or in order. Iterating the keys of the term_freqs_dict solves this. Additionally I've simplified term_freqs.
@paul-english
Copy link
Contributor Author

It looks like this may require additional updates in _prepare.py

@paul-english
Copy link
Contributor Author

@bmabey Is this validation strictly required for later computation?

https://github.com/bmabey/pyLDAvis/blob/master/pyLDAvis/_prepare.py#L41-L42

bmabey added a commit that referenced this pull request Jul 24, 2015
Fix assumption that gensim dictionary and corpus are built from same …
@bmabey bmabey merged commit 69719cd into bmabey:master Jul 24, 2015
@bmabey bmabey mentioned this pull request Jul 24, 2015
@bmabey
Copy link
Owner

bmabey commented Jul 24, 2015

@log0ymxm that check is needed because vectorized math is performed under that assumption. Can you replace the missing elements with zeros?

@paul-english
Copy link
Contributor Author

Locally I've commented that check out, and I'm still able to prepare the data for visualization.

@bmabey
Copy link
Owner

bmabey commented Jul 24, 2015

You may be attributing counts to the wrong term. I'd have to walk through the code to be sure though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants