add wordrank in dockerfile #1460
Merged
menshikh-iv merged 8 commits into piskvorky:develop from parulsethi:add_wordrank_in_docker on Jul 19, 2017
Changes from 4 commits
Commits
0faf08a add wordrank in docker (parulsethi)
0158e0b add spacy also (parulsethi)
2c4459a fix RUN syntax (parulsethi)
914b59a Merge branch 'develop' of https://github.com/RaRe-Technologies/gensim… (parulsethi)
99099fd add np param (parulsethi)
02d88db made requested changes (parulsethi)
aa573c4 use os.path.join for wordrank binary (parulsethi)
3a51301 change to original repo (parulsethi)
@@ -2,8 +2,8 @@ FROM ubuntu:16.04
 
 MAINTAINER Parul Sethi <[email protected]>
 
-ENV GENSIM_REPOSITORY https://github.com/RaRe-Technologies/gensim.git
-ENV GENSIM_VERSION bd6db9a41baf219ecc4a1770cc21b01c8ff122e5
+ENV GENSIM_REPOSITORY https://github.com/parulsethi/gensim.git
+ENV GENSIM_VERSION add_wordrank_in_docker
 
 # Installs python, pip and setup tools (with fixed versions)
 RUN apt-get update \
@@ -47,6 +47,7 @@ RUN pip2 install \
     matplotlib==2.0.0 \
     nltk==3.2.2 \
     pandas==0.19.2 \
+    spacy==1.8.1 \
     git+https://github.com/mila-udem/blocks.git@7beb788f1fcfc78d56c59a5edf9b4e8d98f8d7d9 \
     -r https://raw.githubusercontent.com/mila-udem/blocks/stable/requirements.txt
 
@@ -56,13 +57,18 @@ RUN pip3 install \
     matplotlib==2.0.0 \
     nltk==3.2.2 \
     pandas==0.19.2 \
+    spacy==1.8.1 \
     git+https://github.com/mila-udem/blocks.git@7beb788f1fcfc78d56c59a5edf9b4e8d98f8d7d9 \
     -r https://raw.githubusercontent.com/mila-udem/blocks/stable/requirements.txt
 
 # avoid using old numpy version installed by blocks requirements
 RUN pip2 install -U numpy
 RUN pip3 install -U numpy
 
+# Download english model of Spacy
+RUN python2 -m spacy download en
+RUN python3 -m spacy download en
+
 # Download gensim from Github
 RUN git clone $GENSIM_REPOSITORY \
     && cd /gensim \
@@ -76,12 +82,14 @@ RUN git clone $GENSIM_REPOSITORY \
 RUN mkdir /gensim/gensim_dependencies
 
 # Set ENV variables for wrappers
+ENV WR_HOME /gensim/gensim_dependencies/wordrank
 ENV FT_HOME /gensim/gensim_dependencies/fastText
 ENV MALLET_HOME /gensim/gensim_dependencies/mallet
 ENV DTM_PATH /gensim/gensim_dependencies/dtm/dtm/main
 ENV VOWPAL_WABBIT_PATH /gensim/gensim_dependencies/vowpal_wabbit/vowpalwabbit/vw
 
-# For fixed version downloads of gensim wrappers dependencies
+# For fixed version downloads of gensim wrappers dependencies
+ENV WORDRANK_VERSION 44f3f7786f76c79c083dfad9d64e20bacfb4a0b0
 ENV FASTTEXT_VERSION f24a781021862f0e475a5fb9c55b7c1cec3b6e2e
 ENV MORPHOLOGICALPRIORSFORWORDEMBEDDINGS_VERSION ec2e37a3bcb8bd7b56b75b043c47076bc5decf22
 ENV DTM_VERSION 67139e6f526b2bc33aef56dc36176a1b8b210056
@@ -90,7 +98,17 @@ ENV VOWPAL_WABBIT_VERSION 69ecc2847fa0c876c6e0557af409f386f0ced59a
 
 # Install custom dependencies
 
-# TODO: Install wordrank (need to install mpich/openmpi with multithreading enabled)
+# Install mpich (a wordrank dependency) and remove openmpi to avoid mpirun conflict
+RUN apt-get purge -y openmpi-common openmpi-bin libopenmpi1.10
+RUN apt-get install -y mpich
+
+# Install wordrank
+RUN cd /gensim/gensim_dependencies \
+    && git clone https://bitbucket.org/shihaoji/wordrank \
+    && cp /gensim/docker/wordrank_install.sh /gensim/gensim_dependencies/wordrank/install.sh \
+    && cd /gensim/gensim_dependencies/wordrank \
+    && git checkout $WORDRANK_VERSION \
+    && sh ./install.sh
 
 # Install fastText
 RUN cd /gensim/gensim_dependencies \
@@ -0,0 +1,20 @@
+#!/bin/bash
+
+printf "1. clean up workspace\n"
+./clean.sh
+
+printf "\n2. install glove to construct cooccurrence matrix\n"
+wget http://nlp.stanford.edu/software/GloVe-1.0.tar.gz # if failed, check http://nlp.stanford.edu/projects/glove/ for the original version
+tar -xvzf GloVe-1.0.tar.gz; rm GloVe-1.0.tar.gz
+patch -p0 -i glove.patch
+cd glove; make clean all; cd ..
+
+printf "\n3. install hyperwords for evaluation\n"
+hg clone -r 56 https://bitbucket.org/omerlevy/hyperwords
+patch -p0 -i hyperwords.patch
+
+printf "\n4. build wordrank\n"
+#export CC=icc CXX=icpc
+export CC=gcc CXX=g++ # uncomment this line if you don't have an Intel compiler, but with gcc all #pragma simd are ignored as of now
+cmake .
+make clean all
Please remove this script from the PR. It is already downloaded with the wordrank repo. After cloning, use `awk` or `sed` to replace `#export CC=icc CXX=icc` with `export CC=gcc CXX=g++`.
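A minimal sketch of what the reviewer is asking for, using `sed` on the `install.sh` that ships with the cloned wordrank repo instead of copying a patched script into it. It assumes the upstream script has the Intel line active (`export CC=icc CXX=icpc`) and the gcc line commented out, which is the reverse of the copy added in this PR; that layout has not been checked against the upstream repo. The helper name `use_gcc` and the demo file are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: switch a wordrank-style install.sh from the Intel
# compilers to gcc/g++ by rewriting its two "export CC=..." lines.
# Assumption: the Intel line is active and the gcc line is commented out.
use_gcc() {
    script="$1"
    # 1st expression: comment out the Intel compiler line ("&" is the match).
    # 2nd expression: uncomment the gcc line, keeping any trailing comment.
    sed -e 's/^export CC=icc CXX=icpc/#&/' \
        -e 's/^#export CC=gcc CXX=g++/export CC=gcc CXX=g++/' \
        "$script" > "$script.tmp" && mv "$script.tmp" "$script"
}

# Demo on a throwaway file that mimics the assumed upstream layout.
demo=$(mktemp)
printf '%s\n' \
    'export CC=icc CXX=icpc' \
    '#export CC=gcc CXX=g++ # gcc fallback' \
    'cmake .' > "$demo"
use_gcc "$demo"
cat "$demo"
rm -f "$demo"
```

In the Dockerfile this would presumably replace the `cp /gensim/docker/wordrank_install.sh ...` step with a single `sed` call that runs after `git checkout $WORDRANK_VERSION` and before `sh ./install.sh`. The output is written to a temporary file and moved into place rather than edited with `sed -i`, because `-i` is not part of POSIX `sed`.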