Inconsistency in displaying top terms #93

Open
lenhhoxung86 opened this issue May 30, 2017 · 4 comments

@lenhhoxung86

Hello,
I use your package to visualize a topic model trained with sklearn. I can see the circles representing the topics and the corresponding salient terms.
I then use the following code to print the top terms:

def print_top_words(model, feature_names, n_top_words):
    # model.components_ holds one row of term weights per topic
    for topic_idx, topic in enumerate(model.components_):
        print("Topic #%d:" % topic_idx)
        # argsort is ascending, so this slice takes the n_top_words
        # largest weights, highest first
        print(" ".join([feature_names[i]
                        for i in topic.argsort()[:-n_top_words - 1:-1]]))

However, the terms printed for each topic are different from the terms shown for the same topic in the figure.
For example, here are the printed terms:
Topic #0:
open really make source yeah version grand mail libre nouvelle team never luxembourg projet sure trouve let idea saturday actually sorry full photos pays questions hack pretty issue monday mort
Topic #1:
après site vidéo france wikipage nouveau enfin cours aller firefox mozilla passe petit vois service vers train future prendre digital system puis accord lien conférence prix sujet tellement cause list
Topic #2:
rien amateur beurre leurre internet sans moins place bonne monde right please maintenant page possible party looks idée things security like français gros machine take belge mettre toutes seems hanoi
Topic #3:
like peut paris pourquoi entre look find viens start test apple public android online surtout mois arduino life lire guess soon dont beau soirée coming badge chaque wifi tête allez
Topic #4:
know time still thanks work twitter good linux free part need well contre back wednesday support think nice home blog post reste aujourd workshop looking hui friday working project privacy
Topic #5:
très faut google also belgique avant using sous news autres deux quel point vrai times going cloud assez long plein tech dernier network festival phone première solution show donne normal
Topic #6:
today great would thursday hsbxl next moment hasselt mieux limburg systemd drinking tuesday question ouais plutôt last much help favorite year come something belgium fosdem already brussels debian vote fail
Topic #7:
plus fait bien tout faire quand merci comme quoi aussi être encore trop donc cette déjà voir alors juste ulb gens toujours sais tous dire peux chez veux coup autre
Topic #8:
data avoir comment played temps video tweet talk article facebook chose check live mobile besoin used meetup sint quelqu update without file apps science microsoft travail give savoir nuit tant
Topic #9:
ubuntu people vraiment first python week fais code bruxelles parce photo aime love windows cool jours veut europe world hein air parler personnes read doit mean trump trucs always sunday.

In fact, the terms shown in the IPython notebook for a given topic are spread across several of the printed topics.
[Screenshot: screen shot 2017-05-30 at 18 55 58]

This confuses me. Could the display be wrong?
Please give me an explanation.
Many thanks.

@betolink
Contributor

@lenhhoxung86 I ran into the same issue; the problem here is with the documentation. The sort_topics parameter is set to true by default, and it orders the topics by token proportion, which changes the original numbering.

sort_topics : sort topics by topic proportion (percentage of tokens covered). Set to false to keep original topic order.

If you set that parameter to false you should be able to see the original topic numbers.
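
To make the renumbering concrete, here is a minimal sketch in plain Python of what sorting by topic proportion does to the topic numbers. It is not pyLDAvis's actual code: the helper name topic_order_by_proportion and the toy data are hypothetical, and it assumes the token share of a topic is the sum over documents of (document-topic weight x document length), and that the figure numbers topics starting from 1.

```python
def topic_order_by_proportion(doc_topic_dists, doc_lengths):
    """Return original topic indices, largest token share first.

    Hypothetical helper for illustration; assumes token share is
    sum over documents of (topic weight * document length).
    """
    n_topics = len(doc_topic_dists[0])
    # Expected number of tokens assigned to each topic over the corpus.
    topic_freq = [
        sum(dist[k] * length
            for dist, length in zip(doc_topic_dists, doc_lengths))
        for k in range(n_topics)
    ]
    # sorted() is stable, so tied topics keep their original order.
    return sorted(range(n_topics), key=lambda k: -topic_freq[k])


# Toy data (made up): 3 documents, 3 topics.
doc_topic_dists = [
    [0.1, 0.7, 0.2],
    [0.2, 0.2, 0.6],
    [0.1, 0.8, 0.1],
]
doc_lengths = [100, 50, 200]

order = topic_order_by_proportion(doc_topic_dists, doc_lengths)

# Assumed 1-based number in the figure -> row index in model.components_
displayed_to_original = {rank + 1: k for rank, k in enumerate(order)}
print(displayed_to_original)

# With the real library, the equivalent switch would be passed to prepare,
# e.g. (assuming the pyLDAvis.sklearn helper forwards keyword arguments):
# vis = pyLDAvis.sklearn.prepare(model, dtm, vectorizer, sort_topics=False)
```

With this toy data, the topic printed as "Topic #1" holds the most tokens, so under the assumptions above it would be shown first in the figure, while "Topic #0" would be shown last. A mapping like this can be used to line up the printed topics with the displayed ones when sorting is left on.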

@ghost

ghost commented Jul 1, 2020

@betolink: I added this parameter (sort_topics = False), but the issue still remains. Is there any other way to solve it? I am working in Anaconda with a Jupyter notebook.

Thanks!

@ilektram

> @betolink: I added this parameter (sort_topics = False), but the issue still remains. Is there any other option to solve it? I worked in anaconda + jupyter notebook
>
> Thanks!

I am observing the same issue. It looks like the sort_topics parameter is not being applied properly.

@betolink
Contributor

I haven't looked at this in a long time but I'll try to check what's going on ASAP.
