Repo for exploring Word2Vec and Doc2Vec
We will be using python 3.5 with Anaconda. Download the installer from
https://www.continuum.io/downloads
and follow installation instructions
conda create -n word2vec_env pip
git clone https://github.com/abtpst/WordVectors.git
Navigate to the cloned location and activate the environment created above
activate word2vec_env
Now install the packages as
pip install -r requirements.txt
If you experience trouble installing scipy
or numpy
, follow these steps. Note that the environment must be activated and you should run these commands from the cloned location.
conda install scipy
pip install -r requirements.txt
In the versions of gensim, post 0.10.1, you need to make the following change
Modify the file site-packages/gensim/models/word2vec.py
by changing line 1013
to
self.syn0[i] = self.seeded_vector(str(self.index2word[i]) + str(self.seed))
This fix is important as otherwise the word2vec training would fail.
The python interpreter for this project should point to the conda virtual environment we created above. Note that we can use this environment for doc2vec
as well.
The src
folder contains the packages for exploring Word2Vec
and Doc2Vec
. The respective Readme files with instructions are also provided.