Instructions to install and run PDI Docker
This is the Dockerized version of the insights portion of the Polar Deep Insights system. Two parts of the project, the insight-generator (a Python library used to extract information) and the insight-visualizer (a JavaScript application used for data visualization), can be installed and run in Docker containers using the instructions below.
-
Install Docker if it isn't already installed.
-
If you normally log into a Docker registry on your machine, do so now.
-
At a terminal window type
git clone https://github.com/USCDataScience/polar-deep-insights.git
-
Then type
cd polar-deep-insights/Docker
-
Install npm if it isn't installed already, else skip this step.
-
Install the Elasticsearch tools. Depending on your permissions, type
-
npm install -g elasticsearch-tools
or
-
sudo npm install -g elasticsearch-tools
and enter your password at the prompt.
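Before moving on, it can help to confirm that the tools from the steps so far are actually on your PATH. The following is a minimal sketch, not part of the project: the helper names `have_cmd` and `check_prereqs` are hypothetical, and the tool list simply mirrors the steps above.

```shell
# Succeeds if the named command can be found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Print one "found"/"missing" line per tool; return 1 if any is missing.
check_prereqs() {
    missing=0
    for tool in "$@"; do
        if have_cmd "$tool"; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# docker, git and npm come from the earlier steps; es-export-mappings and
# es-export-bulk are the commands used in the export steps further down.
check_prereqs docker git npm es-export-mappings es-export-bulk \
    || echo "Install the missing tools before continuing."
```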
-
-
Make the script setup.sh executable by typing
chmod +x setup.sh
- Note that running
./setup.sh
creates a data folder and populates it with a variety of other required files and empty folders.
- If you are planning on analyzing your own files, please put them in the
data/files
folder. Any format is acceptable, though the parsers may not extract all possible data if the format is very unusual.
-
Export the Elasticsearch index mappings.
-
If using polar.usc.edu's Elasticsearch data, type
es-export-mappings --url http://polar.usc.edu/elasticsearch --file data/polar/polar-data-mappings.json
-
If using your own database, replace
http://polar.usc.edu/elasticsearch
in the above command with the URL of your remote or localhost Elasticsearch index, then run the command.
-
-
Export the Elasticsearch index data.
-
If using polar.usc.edu's Elasticsearch data, type
es-export-bulk --url http://polar.usc.edu/elasticsearch --file data/polar/polar-data.json
-
If using your own database, replace
http://polar.usc.edu/elasticsearch
in the above command with the URL of your remote or localhost Elasticsearch index, then run the command.
PS: This step may take a while depending on the size of your Elasticsearch database. The Polar data set contains 100k documents and takes quite a long time (go get coffee).
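Since both export steps differ only in the Elasticsearch URL, one way to avoid editing the commands by hand is to keep the URL in a variable. This is a rough sketch under assumptions: `ES_URL`, `OUT_DIR`, and `export_commands` are illustrative names that the project does not define, and the function only prints the two commands from the steps above rather than running them.

```shell
# ES_URL and OUT_DIR are illustrative variable names, not project settings.
# The defaults reproduce the polar.usc.edu commands from the steps above.
ES_URL="${ES_URL:-http://polar.usc.edu/elasticsearch}"
OUT_DIR="${OUT_DIR:-data/polar}"

# Print the mappings export and the bulk data export for a given URL.
export_commands() {
    url="$1"
    dir="$2"
    echo "es-export-mappings --url $url --file $dir/polar-data-mappings.json"
    echo "es-export-bulk --url $url --file $dir/polar-data.json"
}

# Show what would be run; pipe the output to `sh` to execute it.
export_commands "$ES_URL" "$OUT_DIR"
```

To target your own index, set the variable first, for example `ES_URL=http://localhost:9200` if your Elasticsearch listens there (that address is an assumption, so substitute your own).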
-
-
Install some necessary files - description can be found here.
- For Unix-like OS (Ubuntu, macOS, etc.):
chmod +x pre_installation.sh
-
./pre_installation.sh
This step uses the wget command to download the necessary sh files from the web. If you encounter the error "wget: not found", either:
- Install wget (e.g. for macOS:
brew install wget
), OR
- Open pre_installation.sh and replace
wget
with
curl -O filename
where filename is the name of the file in each command, OR
- Refer to point 1.ii.b
- For Windows OS:
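On Unix-like systems, the wget-or-curl choice described above for pre_installation.sh can also be made automatically. The sketch below is only an illustration: `fetch` is a hypothetical helper that pre_installation.sh does not define, and the URL in the usage line is a placeholder.

```shell
# Download a URL into the current directory, preferring wget and
# falling back to curl. `fetch` is a hypothetical helper name.
fetch() {
    url="$1"
    if command -v wget >/dev/null 2>&1; then
        wget "$url"
    elif command -v curl >/dev/null 2>&1; then
        # -O saves under the remote file name, -L follows redirects
        curl -L -O "$url"
    else
        echo "neither wget nor curl is installed" >&2
        return 1
    fi
}

# Usage (placeholder URL):
# fetch https://example.com/some-script.sh
```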
-
Add files to the following folders according to these instructions:
-
data/files
: Add your data files, of any file type, to generate insights from
-
data/polar
: Contains mappings and data from the Elasticsearch URL
-
data/ingest
: Output from the PDI insight generator will be saved here under the filename ingest_data.json
-
data/sparkler/raw
: Add Sparkler crawled data from the SOLR index into the sparkler_rawdata.json file in this folder
-
data/sparkler/parsed
: Sparkler data (in data/sparkler/raw/sparkler_rawdata.json) is parsed using parse.py and saved in sparkler_data.json
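To double-check the folder layout listed above before building the containers, a small script can report any sub-folder that is missing. This is a sketch, not part of the project: `missing_data_dirs` is a hypothetical helper, and the folder names are taken from the list above.

```shell
# Print every expected sub-folder that does not exist under the given root
# (default: ./data). Prints nothing when the layout is complete.
missing_data_dirs() {
    root="${1:-data}"
    for sub in files polar ingest sparkler/raw sparkler/parsed; do
        [ -d "$root/$sub" ] || echo "$root/$sub"
    done
}

# Run from the directory that contains the data folder.
missing_data_dirs data
```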
-
-
Build Insight Generator
-
git clone https://github.com/USCDataScience/polar-deep-insights.git && cd polar-deep-insights/Docker/insight-generator
-
Build from local
docker build -t uscdatascience/pdi-generator -f InsightGenDockerfile .
OR pull from docker hub
docker pull uscdatascience/pdi-generator
-
PDI_JSON_PATH=/data/polar docker-compose up -d
-
-
This container exposes the following ports:
8765 - Geo Topic Parser
9998 - Apache Tika Server
8060 - Grobid Quantities REST API
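Once the container is up, you may want to confirm that the three services are reachable. The following sketch assumes that the ports listed above are published on the host and that each service answers HTTP on its port. `check_ports` is a hypothetical helper, not part of the project.

```shell
# Probe each port on a host with curl. Any completed HTTP exchange counts
# as "up"; a refused connection or timeout counts as "down".
check_ports() {
    host="$1"
    shift
    for port in "$@"; do
        if curl -s -o /dev/null --max-time 3 "http://$host:$port/"; then
            echo "$port up"
        else
            echo "$port down"
        fi
    done
}

# Ports from the list above: Geo Topic Parser, Apache Tika Server,
# Grobid Quantities REST API.
# check_ports localhost 8765 9998 8060
```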
-
Build Insight Visualizer
-
git clone https://github.com/USCDataScience/polar-deep-insights.git && cd polar-deep-insights/Docker/insight-visualizer
-
docker build -t uscdatascience/polar-deep-insights -f PolarDeepInsightsDockerfile .
OR
docker pull uscdatascience/polar-deep-insights
-
PDI_JSON_PATH=data/polar docker-compose up -d
-
Access application at http://localhost/pdi/
-
Access elasticsearch at http://localhost/elasticsearch/
-
This container exposes the following ports:
80 - Apache2/HTTPD server
9000 - Grunt server serving up the PDI application
9200 - Elasticsearch 2.4.6 server
35729 - Auto refresh port for AngularJS apps
PS: You need to install a CORS extension in your browser and enable it in order to download the concept ontology and additional precomputed information from http://polar.usc.edu/elasticsearch/ and elsewhere.
To follow a container's logs, type
docker logs -f container_id
- use your docker container's id
To open a shell inside a running container, type
docker exec -it container_id bash
- use your docker container's id
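If you don't know the container id, `docker ps` can look it up by image name. The sketch below assumes the running containers were started from the images named earlier, such as uscdatascience/pdi-generator. `container_ids` and `follow_logs` are hypothetical helper names.

```shell
# Print the ids of running containers started from the given image.
container_ids() {
    docker ps --filter "ancestor=$1" --format '{{.ID}}'
}

# Follow the logs of the first running container for the given image.
follow_logs() {
    id=$(container_ids "$1" | head -n 1)
    if [ -z "$id" ]; then
        echo "no running container for image $1" >&2
        return 1
    fi
    docker logs -f "$id"
}

# Usage:
# follow_logs uscdatascience/pdi-generator
```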
Information Retrieval and Data Science (IRDS) research group, University of Southern California.