KeyedVectors
Deprecation warning
After upgrading to this release you might see deprecation warnings like this:
WARNING:gensim.models.word2vec:direct access to syn0norm will not be supported in future gensim releases, please use model.wv.syn0norm
These warnings are expected. You are encouraged to update your Word2Vec/Doc2Vec code to use the new model.wv.syn0norm and model.wv.vocab fields instead of the old direct access via model.syn0norm and model.vocab. Direct access will be deprecated in Feb 2017.
Specifically, you should use:
- model.wv.syn0norm instead of model.syn0norm
- model.wv.syn0 instead of model.syn0
- model.wv.vocab instead of model.vocab
- model.wv.index2word instead of model.index2word
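If you have many call sites to migrate, these renames are mechanical enough to script. The following is a minimal sketch, not a gensim tool: the helper `rewrite_direct_access` and its `var` parameter (the name of your model variable, assumed here to be `model`) are hypothetical, and it uses a plain regular expression rather than parsing Python, so review its output before committing it.

```python
import re

# Attributes that moved from the Word2Vec/Doc2Vec model onto `model.wv`,
# per the deprecation list above.
MOVED_ATTRIBUTES = ("syn0norm", "syn0", "vocab", "index2word")


def rewrite_direct_access(source, var="model"):
    """Rewrite `<var>.<attr>` to `<var>.wv.<attr>` for each moved attribute.

    Hypothetical helper: a plain regex rewrite, not a Python parser, so it can
    also touch matching text inside strings or comments.
    """
    # The trailing \b stops `syn0` from matching the start of a longer name such
    # as `syn0norm`; code that already reads `<var>.wv.<attr>` is left alone
    # because `wv` is not one of the listed attribute names.
    names = "|".join(MOVED_ATTRIBUTES)
    pattern = re.compile(r"\b%s\.(%s)\b" % (re.escape(var), names))
    return pattern.sub(lambda m: "%s.wv.%s" % (var, m.group(1)), source)


print(rewrite_direct_access("v = model.syn0norm[model.vocab['king'].index]"))
# v = model.wv.syn0norm[model.wv.vocab['king'].index]
```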
This deprecation separates word vectors from word2vec training. Word vectors can now be obtained without training word2vec: we are adding support for word vectors trained with GloVe, FastText, WordRank, TensorFlow and Deeplearning4j word2vec. To keep the code clean and provide a standard API for all word embeddings, we extracted a KeyedVectors class and added a word-vectors attribute, wv, to the models.
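To illustrate the separation, here is a toy, pure-Python sketch of a container that holds only the lookup structures named above (index2word, vocab, syn0, syn0norm) and answers similarity queries, with no training code at all. It is not gensim's KeyedVectors implementation: the class name `ToyKeyedVectors` and its methods are hypothetical, and its `vocab` maps each word directly to a row index, whereas gensim's `vocab` maps each word to an object that carries the index and count.

```python
import math


class ToyKeyedVectors:
    """Toy container for pre-trained word vectors (hypothetical, not gensim's class)."""

    def __init__(self, words, vectors):
        if len(words) != len(vectors):
            raise ValueError("need exactly one vector per word")
        self.index2word = list(words)                                # row index -> word
        self.vocab = {w: i for i, w in enumerate(self.index2word)}   # word -> row index (simplified)
        self.syn0 = [[float(x) for x in v] for v in vectors]         # raw vectors, one row per word
        self.syn0norm = None                                         # unit-length rows, filled lazily

    def _ensure_normalized(self):
        # Compute unit-length copies once; an all-zero row is kept as zeros.
        if self.syn0norm is None:
            self.syn0norm = []
            for row in self.syn0:
                norm = math.sqrt(sum(x * x for x in row))
                self.syn0norm.append([x / norm for x in row] if norm else list(row))

    def __getitem__(self, word):
        return self.syn0[self.vocab[word]]

    def similarity(self, w1, w2):
        """Cosine similarity between two words."""
        self._ensure_normalized()
        a, b = self.syn0norm[self.vocab[w1]], self.syn0norm[self.vocab[w2]]
        return sum(x * y for x, y in zip(a, b))

    def most_similar(self, word, topn=1):
        """The `topn` other words with the highest cosine similarity to `word`."""
        scores = [(other, self.similarity(word, other))
                  for other in self.index2word if other != word]
        scores.sort(key=lambda pair: pair[1], reverse=True)
        return scores[:topn]


kv = ToyKeyedVectors(["king", "queen", "apple"], [[3, 4], [4, 3], [0, 2]])
print(kv.most_similar("king", topn=2))  # 'queen' ranks above 'apple' (cosine about 0.96 vs 0.8)
```

Because the container needs only words and vectors, the same query code works whether the vectors came from word2vec training or were read from another tool's output.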
0.13.4, 2016-12-22
Changelog:
- Evaluation of word2vec models against semantic similarity datasets like SimLex-999 (@akutuzov, #1047)
- TensorBoard word embedding visualisation of Gensim Word2vec format (@loretoparisi, #1051)
- Throw an exception if load() is called on an instance rather than on the class in word2vec and doc2vec (@dus0x, #889)
- Loading and Saving LDA Models across Python 2 and 3. Fix #853 (@anmolgulati, #913, #1093)
- Fix automatic learning of eta (prior over words) in LDA (@olavurmortensen, #1024).
  - eta should have dimensionality V (vocabulary size), not K (number of topics). An eta with shape K x V is still allowed, since the user may want to impose specific prior information on each topic.
  - eta no longer accepts the "asymmetric" option. Asymmetric priors over words are still fine in general (learned or user-defined).
  - As a result, the eta update (update_eta) was simplified somewhat. It also no longer logs eta when updated, because eta is too large to log.
  - Unit tests were updated accordingly: they expect a different shape than before, some became redundant after the change, and eta='asymmetric' should now raise an error.
- Optimise show_topics to only call get_lambda once. Fix #1006. (@bhargavvader, #1028)
- HdpModel documentation improvements: inference and print_topics (@dsquareindia, #1029)
- Remove Doc2Vec defaults so that they don't override Word2Vec defaults. Fix #795 (@markroxor, #929)
- Remove warning on gensim import "pattern not installed". Fix #1009 (@shashankg7, #1018)
- Add delete_temporary_training_data() function to word2vec and doc2vec models. (@deepmipt-VladZhukov, #987)
- New class KeyedVectors to store embeddings separately from training code (@anmol01gulati and @droudy, #980)
- Documentation improvements (@IrinaGoloshchapova, #1010, #1011)
- LDA tutorial by Olavur, tips and tricks (@olavurmortensen, #779)
- Add double quotes in command line to run on Windows (@akarazeev, #1005)
- Fix directory names in notebooks to be OS-independent (@mamamot, #1004)
- Respect clip_start, clip_end in most_similar. Fix #601. (@parulsethi, #994)
- Replace Python sigmoid function with scipy in word2vec & doc2vec (@markroxor, #989)
- WMD to return 0 instead of inf for sentences that contain a single word (@rbahumi, #986)
- Pass all the params through the apply call in lda.get_document_topics(); add a test case in test_ldamodel that uses per_word_topics through the corpus (@parthoiiitm, #978)
- Pyro annotations for lsi_worker (@markroxor, #968)
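The LDA eta fix (#1024) changes which shapes the word prior may take. The sketch below encodes those rules as a standalone check: a length-V vector or a K x V matrix is accepted, a length-K vector is rejected, and eta="asymmetric" raises an error. It is a hypothetical helper, not gensim's LdaModel code; how scalars, "symmetric" and "auto" are expanded (a uniform value over all V words, 1/K for the string options) is an assumption made for illustration.

```python
from numbers import Real


def normalize_eta(eta, num_topics, num_terms):
    """Check an LDA word prior `eta` against the shape rules in this release.

    Hypothetical helper (not gensim's LdaModel code). Returns either a
    length-V list (one prior per word) or a K x V list of lists (one row per topic).
    """
    if isinstance(eta, str):
        if eta == "asymmetric":
            raise ValueError("eta='asymmetric' is not supported: eta is a prior over words")
        if eta in ("symmetric", "auto"):
            # Assumption: both start from a uniform 1/K prior over all V words;
            # with 'auto' the values would then be learned during training.
            return [1.0 / num_topics] * num_terms
        raise ValueError("unknown eta option: %r" % eta)
    if isinstance(eta, Real):
        # Assumption: a scalar is broadcast to every word.
        return [float(eta)] * num_terms
    rows = list(eta)
    if rows and all(isinstance(x, Real) for x in rows):
        # 1-D prior: one value per vocabulary word (V), not per topic (K).
        if len(rows) != num_terms:
            raise ValueError("1-D eta must have length V=%d (vocabulary size), got %d"
                             % (num_terms, len(rows)))
        return [float(x) for x in rows]
    # 2-D prior: one row of V word priors per topic.
    if len(rows) != num_topics or any(
            not isinstance(r, (list, tuple)) or len(r) != num_terms for r in rows):
        raise ValueError("2-D eta must have shape K x V = %d x %d" % (num_topics, num_terms))
    return [[float(x) for x in row] for row in rows]


print(normalize_eta([[0.1] * 3] * 2, num_topics=2, num_terms=3))
```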
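The load() change (#889) guards against a common mistake: load() is a class method that returns a new object, so calling it on an existing instance, as in model.load(fname), leaves model as it was if the return value is discarded. One way to make that mistake fail loudly is to shadow the class method with an instance attribute that raises. The sketch below shows that pattern on a hypothetical pickle-based `Model` class with a hypothetical `_call_on_class_only` stub; it illustrates the idea and is not gensim's code.

```python
import os
import pickle
import tempfile


def _call_on_class_only(*args, **kwargs):
    # Installed as an instance attribute; runs whenever load() is looked up on an instance.
    raise AttributeError("load() must be called on the class, e.g. Model.load(fname)")


class Model:
    """Hypothetical model class used only to show the pattern."""

    def __init__(self, size=100):
        self.size = size
        # An instance attribute shadows the classmethod below (classmethod is a
        # non-data descriptor), so instance.load(...) hits the raising stub.
        self.load = _call_on_class_only

    def save(self, fname):
        with open(fname, "wb") as fout:
            pickle.dump(self, fout)

    @classmethod
    def load(cls, fname):
        with open(fname, "rb") as fin:
            return pickle.load(fin)


if __name__ == "__main__":
    fname = os.path.join(tempfile.mkdtemp(), "model.pkl")
    Model(size=50).save(fname)
    model = Model.load(fname)     # correct: call load() on the class
    print(model.size)             # 50
    try:
        model.load(fname)         # mistake: load() called on an instance
    except AttributeError as err:
        print("AttributeError:", err)
```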
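The SimLex-999 evaluation (#1047) compares a model's similarity scores for word pairs with human judgements. A common summary statistic for such benchmarks is Spearman's rank correlation, i.e. the Pearson correlation of the two score lists after replacing values by their ranks. The sketch below computes it in pure Python, skipping pairs that contain out-of-vocabulary words and reporting their share. The names `evaluate_pairs`, `average_ranks` and `pearson` are hypothetical, this is not the API added in #1047, and reading the SimLex-999 file itself is left out.

```python
import math


def average_ranks(values):
    """1-based ranks of `values`; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def evaluate_pairs(similarity, pairs):
    """Spearman correlation between model and human similarity scores.

    `similarity(w1, w2)` returns a float and raises KeyError for unknown words;
    `pairs` yields (word1, word2, human_score) rows. Returns (rho, oov_ratio).
    """
    model_scores, human_scores, skipped = [], [], 0
    for w1, w2, human in pairs:
        try:
            score = similarity(w1, w2)
        except KeyError:
            skipped += 1  # out-of-vocabulary pair: leave it out of the correlation
            continue
        model_scores.append(score)
        human_scores.append(float(human))
    if len(model_scores) < 2:
        raise ValueError("need at least two in-vocabulary pairs")
    rho = pearson(average_ranks(model_scores), average_ranks(human_scores))
    return rho, skipped / (skipped + len(model_scores))
```

Any callable that raises KeyError for unknown words fits the `similarity` argument, for example a lookup in a dictionary of precomputed scores.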