Doc2Vec .save_word2vec_format() doesn't save everything. #1110
Comments
As a follow up, I verified that all that is saved is the word model.
And then
The vocab size is 10543, and the saved model file has the corresponding number of lines (plus the header).
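For anyone who wants to repeat that line-count check, here is a minimal sketch. It assumes the file was written in the plain-text (non-binary) word2vec C format, where the first line holds the vector count and the dimensionality and every following line holds one token and its vector. The helper name `summarize_word2vec_text_file` and the toy file are hypothetical stand-ins, not part of gensim.

```python
import os
import tempfile


def summarize_word2vec_text_file(path):
    """Return (declared_count, declared_dim, actual_rows) for a text-format file."""
    with open(path, encoding="utf-8") as fh:
        # Header line: "<number of vectors> <vector size>"
        header = fh.readline().split()
        declared_count, declared_dim = int(header[0]), int(header[1])
        # Every remaining non-empty line is one token followed by its vector.
        actual_rows = sum(1 for line in fh if line.strip())
    return declared_count, declared_dim, actual_rows


# Tiny hand-written stand-in for a real saved file (hypothetical data).
sample = "3 2\nthe 0.1 0.2\ncat 0.3 0.4\nsat 0.5 0.6\n"
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "toy_vectors.txt")
    with open(sample_path, "w", encoding="utf-8") as fh:
        fh.write(sample)
    summary = summarize_word2vec_text_file(sample_path)

print(summary)
```

If the declared count equals the vocabulary size and matches the number of rows, the file holds only one vector per word, which is consistent with the document vectors not being written.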
For Do you think that changing the docstring to "The word vectors of the model can also be instantiated from an existing file on disk in the word2vec C format. NOTE that it excludes the document vectors::" would make it clearer? Have you tried the
I, too, believe
Hello tmylk and gojomo, Also, sorry I didn't see that this was a known issue. I guess this thread can be closed then! I do like the idea of changing the docstring, and I would change your note to say: "NOTE document vectors are not saved with .save_word2vec_format(). Use .save() instead", because this clearly states the functionality and a solution.
Thanks for the comment idea. Fixed in ae04cda
When I call model.save_word2vec_format() or model.save(), it seems that only the word vector information is saved. The following code is almost identical to the Wikipedia code in the repo.
I can get most_similar() documents in the same script that trained the model, as above. However, I get this error:
if I reload the model in a different script. I.e.
I can, however, do
In the current directory I see only one file called d2v_model, and when I open it I see the word vectors. I'm thinking there should be another one called d2v_model.doctag_syn0 or something. Help?
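A quick way to check which files were actually written next to the model is to list everything that shares its path prefix. This is only a sketch: as far as I understand, gensim's `.save()` moves arrays into separate `.npy` files only when they are large enough, so a single file is not by itself proof that something is missing. The helper `list_model_files` and the file name `d2v_model.example_array.npy` are hypothetical placeholders, not names gensim is known to produce.

```python
import glob
import os
import tempfile


def list_model_files(prefix):
    """Return sorted basenames of all files whose path starts with `prefix`."""
    pattern = glob.escape(prefix) + "*"
    return sorted(os.path.basename(p) for p in glob.glob(pattern))


# Simulate a directory with a model file, one companion file, and an unrelated file.
with tempfile.TemporaryDirectory() as tmp:
    prefix = os.path.join(tmp, "d2v_model")
    for name in ("d2v_model", "d2v_model.example_array.npy", "unrelated.txt"):
        open(os.path.join(tmp, name), "w").close()
    found = list_model_files(prefix)

print(found)
```

Running the same helper with the real path passed to `.save()` shows at a glance whether any companion files exist.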