KeyError in gensim.prepare #5
Thanks for reporting this. The problem, it seems, is that pyLDAvis assumes a compacted dictionary with a contiguous list of IDs. This will not be the case, however, in some dictionaries if you have removed tokens and have not called I'll look into removing this assumption made by pyLDAvis. In the meantime I would suggest that you call
Hi there, thanks for the quick response. Unfortunately, calling I'll investigate if something else goes wrong further upstream. I'm also somewhat unimpressed with the topics I get, so maybe there is some other problem. I will report back if I come across anything that seems relevant.
I usually delete low-frequency tokens before constructing the dictionary. I have gotten a similar error before when deleting low-frequency tokens from the dictionary manually, but re-compactifying resolved the problem. I'm not sure about your bug. Perhaps further text preprocessing could help your topics?
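To make the contiguous-ID assumption discussed above concrete, here is a minimal sketch in plain Python. It uses an ordinary dict as a stand-in for a token-to-ID mapping, and the helper names `has_contiguous_ids` and `compact_ids` are hypothetical; this only illustrates the idea of re-compacting and is not gensim's implementation.

```python
def has_contiguous_ids(token2id):
    """Return True if the IDs are exactly 0..len(token2id)-1."""
    return sorted(token2id.values()) == list(range(len(token2id)))


def compact_ids(token2id):
    """Reassign IDs so they run from 0 upward with no gaps (hypothetical helper).

    Tokens keep their relative order by old ID.
    """
    ordered = sorted(token2id.items(), key=lambda item: item[1])
    return {token: new_id for new_id, (token, _old_id) in enumerate(ordered)}


# Stand-in mapping after some tokens (old IDs 1 and 3) were removed.
token2id = {"whale": 0, "ship": 2, "captain": 4}

print(has_contiguous_ids(token2id))   # False: IDs 1 and 3 are missing
compacted = compact_ids(token2id)
print(compacted)                      # {'whale': 0, 'ship': 1, 'captain': 2}
print(has_contiguous_ids(compacted))  # True
```

Any bag-of-words corpus built with the old IDs would need to be rebuilt after such a remapping, since the vectors refer to tokens by ID.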
Could you provide me with example code of how you are creating your dictionary?
Sure, here you go:
Basically, I have each document in one file and read them from there in a "one document per line" manner. I keep track of file identifiers for later usage. For modeling, I load the saved dictionary so I don't have to build it every time I change something in the modeling step.
There is something else wrong. With 400 novels, 5000 iterations and 50 passes, I still get the following virtually identical topics.
Seems like I'm doing something wrong upstream, so this whole thing may be unrelated to pyLDAvis. Sorry about that.
Yeah, I certainly can't speak to that problem. The gensim mailing list would be a better place to ask about that issue.
Hey, I am also getting a KeyError when I use an external dictionary to convert docs to BoW. What I am doing is the following: I get a model, which I can explore. But vis = pyLDAvis.gensim.prepare(ldamodel,corpus,id2word) produces key errors, and I can't understand why. I even tried to compactify() id2word before transforming the corpus and learning the model, but that doesn't help either. Kindly look into it.
Please provide me with the code and corpus that created the dictionary. Without a way to reproduce this bug locally it will be hard for me to fix this.
I just merged in #17 which may address this bug. If someone running into this error can clone and run
I 'pip installed' pyLDAvis today and ran into a key error problem as well. The key error could be related to a specific Python 3.x issue. The line where the problem occurred for me is here (in gensim.py): But the real culprit is here: From my understanding, in Python 3, the 'dict_keys' object created by the above call is an iterable view, whereas in Python 2 the object created is a list. A pandas DataFrame appears to be unable to subset using such an iterable, so a key error occurs. Changing the vocab assignment to:
@dpatschke Thanks. So, did you make this change locally and have this work for you in Python 3 then? If so, want to submit a PR for it?
@bmabey I did change it locally and got it to work in Python 3 successfully. Let me work on the pull request for it.
@bmabey I had a problem cloning with the 'lfs' error (similar to another issue previously listed). If you care to make the fix yourself, feel free. Otherwise, give me a little time to try and get everything set up on my end so as not to delete necessary files.
dpatschke suggested on Sept 17 that modifying a line in gensim.py would help with the KeyError problem. I'm running in a Windows 10, WinPython 2.7 environment and installed pyLDAvis today. I tried the change, and no dice. I really do need help with this, since I'm terribly new to Python and terribly late with my work, so any moral and technical support will be greatly appreciated. Traceback (most recent call last):
The problem I was experiencing was not the same problem you are mentioning. Mine had to do with an object being an "iterable" in Python 3 and a list in Python 2. I honestly have no idea what could be causing your problem, but I would double check that you are, in fact, passing a corpus in as your second parameter to 'prepare'. From comment above: If you are doing this and still getting the error ... I am sorry ... I will not be of much further help. Good luck :-)!
dpatschke thank you for your reply. The problem you were experiencing was the only reference to something remotely resembling the problem I had, so I tried both: the compactify() and the change you have suggested.
Apologies. I have a bit of a different issue: my corpus is empty to begin with. Granted, this is not the most descriptive error message, but the problem appears to be with the pyLDAvis user, not the package itself.
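Since an empty corpus surfaces only as an unhelpful error, a small guard before the visualization step can catch it early. This is a sketch under the assumption that the corpus is a list of bag-of-words documents, each a list of `(token_id, count)` pairs; `check_corpus` is a hypothetical helper, not part of pyLDAvis or gensim.

```python
def check_corpus(corpus):
    """Raise ValueError if the corpus has no documents or no tokens at all.

    Hypothetical helper for illustration only.
    """
    docs = list(corpus)
    if not docs:
        raise ValueError("corpus is empty: no documents")
    if not any(docs):
        raise ValueError("corpus is empty: every document has zero tokens")
    return docs


# Stand-in bag-of-words corpus: two documents of (token_id, count) pairs.
good_corpus = [[(0, 2), (1, 1)], [(1, 3)]]
checked = check_corpus(good_corpus)
print(len(checked))  # 2

# An empty corpus is rejected with a descriptive message.
try:
    check_corpus([])
except ValueError as exc:
    print(exc)
```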
list(dictionary.token2id.keys()) works for me :)
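The Python 3 behavior behind this fix can be sketched without gensim or pandas. The dict below is only a stand-in for `dictionary.token2id`; the point is that `keys()` returns a view object in Python 3, which `list(...)` turns into a real list.

```python
# Stand-in for dictionary.token2id (an assumption for illustration).
token2id = {"whale": 0, "ship": 1, "captain": 2}

keys_view = token2id.keys()        # a dict_keys view in Python 3
keys_list = list(token2id.keys())  # an actual list

print(type(keys_view).__name__)     # dict_keys
print(isinstance(keys_view, list))  # False
print(keys_list)                    # ['whale', 'ship', 'captain']

# A view cannot be indexed by position, while a list can.
try:
    keys_view[0]
except TypeError as exc:
    print("TypeError:", exc)

print(keys_list[0])  # whale
```

Code that expects list semantics (indexing, or libraries that treat lists specially) may behave differently when handed the view, which is why the explicit `list(...)` wrapper is the safer choice across Python 2 and 3.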
This should be fixed with the
Hi there, I'm using gensim to do LDA on a collection of novels (using just 40 for testing, I have several hundreds). Building the corpus and dictionary seems to work fine, as does the modeling process itself. I can also inspect the resulting model (topics in documents and words in topics, for example). However, when attempting to use pyLDAvis, I run into a KeyError.
I'm on Linux (Ubuntu 14.04) and using Python 3.4 and the following versions of relevant modules:
pyLDAvis 1.2.0
numpy 1.9.2
gensim 0.11.1-1
This is my code (loading corpus, dictionary and model from previous step):
This is the output I get:
Not sure whether this is a bug or bad usage of the module. Any help would be very much appreciated.