Inconsistency in displaying top terms #93
@lenhhoxung86 I ran into the same issue; the problem here is with the documentation. The sort_topics parameter is set to True by default, and it orders the topics by token proportion, which alters the original numbering.
If you set that parameter to False, you should be able to see the original topic numbers.
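To make the renumbering concrete, here is a minimal pure-Python sketch of what sorting by token proportion does to the topic order. It is not pyLDAvis's actual code: `display_order` is a hypothetical helper, the document-topic values and document lengths are made up, and it assumes the figure numbers topics starting from 1.

```python
def display_order(doc_topic_dists, doc_lengths, sort_topics=True):
    """Return the original topic indices in the order they would be displayed.

    Hypothetical helper for illustration only, not part of pyLDAvis.
    """
    n_topics = len(doc_topic_dists[0])
    # Token-weighted frequency of each topic across the whole corpus.
    topic_freq = [
        sum(dist[k] * length for dist, length in zip(doc_topic_dists, doc_lengths))
        for k in range(n_topics)
    ]
    total = sum(topic_freq)
    proportions = [freq / total for freq in topic_freq]

    order = list(range(n_topics))
    if sort_topics:
        # Largest topic first; this is what changes the numbering.
        order.sort(key=lambda k: proportions[k], reverse=True)
    return order


# Made-up data: 3 documents, 3 topics.
doc_topic_dists = [
    [0.1, 0.2, 0.7],
    [0.2, 0.5, 0.3],
    [0.1, 0.3, 0.6],
]
doc_lengths = [100, 50, 150]

order = display_order(doc_topic_dists, doc_lengths)
for shown, original in enumerate(order, start=1):
    print(f"displayed topic {shown} -> model.components_[{original}]")
```

With this toy data the largest topic is the model's index 2, so it is drawn as topic 1 in the figure. Even without sorting, a 1-based figure would still be off by one from the 0-based "Topic #%d" labels printed from model.components_.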
@betolink: I added this parameter (sort_topics = False), but the issue remains. Is there any other way to solve it? I am working in Anaconda with a Jupyter notebook. Thanks!
I am observing the same issue. It looks like the sort_topics parameter is not being applied properly.
I haven't looked at this in a long time, but I'll try to check what's going on ASAP.
Hello,
I use your package to visualize a topic model trained with sklearn. I can see the circles representing the topics and the corresponding salient terms.
Now I use the following code to print the top terms:
    def print_top_words(model, feature_names, n_top_words):
        # import pdb;pdb.set_trace()
        for topic_idx, topic in enumerate(model.components_):
            print("Topic #%d:" % topic_idx)
            print(" ".join([feature_names[i]
                            for i in topic.argsort()[:-n_top_words - 1:-1]]))
However, the terms printed for each topic differ from the terms shown for the same topic in the figure.
For example, here are the printed terms:
Topic #0:
open really make source yeah version grand mail libre nouvelle team never luxembourg projet sure trouve let idea saturday actually sorry full photos pays questions hack pretty issue monday mort
Topic #1:
après site vidéo france wikipage nouveau enfin cours aller firefox mozilla passe petit vois service vers train future prendre digital system puis accord lien conférence prix sujet tellement cause list
Topic #2:
rien amateur beurre leurre internet sans moins place bonne monde right please maintenant page possible party looks idée things security like français gros machine take belge mettre toutes seems hanoi
Topic #3:
like peut paris pourquoi entre look find viens start test apple public android online surtout mois arduino life lire guess soon dont beau soirée coming badge chaque wifi tête allez
Topic #4:
know time still thanks work twitter good linux free part need well contre back wednesday support think nice home blog post reste aujourd workshop looking hui friday working project privacy
Topic #5:
très faut google also belgique avant using sous news autres deux quel point vrai times going cloud assez long plein tech dernier network festival phone première solution show donne normal
Topic #6:
today great would thursday hsbxl next moment hasselt mieux limburg systemd drinking tuesday question ouais plutôt last much help favorite year come something belgium fosdem already brussels debian vote fail
Topic #7:
plus fait bien tout faire quand merci comme quoi aussi être encore trop donc cette déjà voir alors juste ulb gens toujours sais tous dire peux chez veux coup autre
Topic #8:
data avoir comment played temps video tweet talk article facebook chose check live mobile besoin used meetup sint quelqu update without file apps science microsoft travail give savoir nuit tant
Topic #9:
ubuntu people vraiment first python week fais code bruxelles parce photo aime love windows cool jours veut europe world hein air parler personnes read doit mean trump trucs always sunday.
In fact, the terms shown in the IPython notebook for a given topic are spread across several of the printed topics.
This confuses me; maybe the display is wrong?
Please give me an explanation.
Many thanks.