Instructions to install and run PDI Docker
This is the Dockerized version of the insights portion of the Polar Deep Insights system. Two parts of the project can be installed and run in Docker containers using PDI Docker: the insight-generator, a Python library used to extract information, and the insight-visualizer, a JavaScript application used for data visualization.
-
Log into Docker on your machine
-
git clone https://github.com/USCDataScience/polar-deep-insights.git
-
cd polar-deep-insights/Docker
-
Install npm if you do not have it; otherwise skip this step.
-
Install elasticsearch-tools:
npm install -g elasticsearch-tools
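To confirm the package installed, you can check the global npm listing (standard npm command; the version shown will vary):
npm list -g elasticsearch-tools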
chmod +x setup.sh
-
./setup.sh
This creates a data folder and populates it with the other required empty folders.
Put your files (PDFs) in the data/files folder.
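For example, to copy a set of local PDFs into that folder (the source path below is hypothetical; use wherever your documents live):
cp ~/Documents/polar-papers/*.pdf data/files/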
Export the Elasticsearch index mappings.
-
If using polar.usc.edu's Elasticsearch data:
es-export-mappings --url http://polar.usc.edu/elasticsearch --file data/polar/polar-data-mappings.json
-
If using your own database, replace http://polar.usc.edu/elasticsearch in the above command with your remote Elasticsearch URL or your localhost Elasticsearch index's URL, then run the command.
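For example, with your own Elasticsearch instance running locally on the default port (the URL below is an assumption; substitute your own):
es-export-mappings --url http://localhost:9200 --file data/polar/polar-data-mappings.json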
-
-
Export the Elasticsearch index data.
-
If using polar.usc.edu's Elasticsearch data:
es-export-bulk --url http://polar.usc.edu/elasticsearch --file data/polar/polar-data.json
-
If using your own database, replace http://polar.usc.edu/elasticsearch in the above command with your remote Elasticsearch URL or your localhost Elasticsearch index's URL, then run the command.
PS: This step may take a while, as it exports a large database of around 100k documents (for the polar data).
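Once the export finishes, a quick sanity check is to confirm the output files exist and are non-empty:
ls -lh data/polar/polar-data-mappings.json data/polar/polar-data.json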
-
-
Install some necessary files - description can be found here.
- For Linux-based or Unix-like OS (Ubuntu, macOS, etc.):
chmod +x pre_installation.sh
-
./pre_installation.sh
This step downloads the necessary .sh files from the web and uses the wget command. If you encounter a "wget not found" error, do one of the following:
- Install wget (e.g., for macOS: brew install wget), OR
- Open pre_installation.sh and replace wget with curl -o filename, where filename is the name of the file in each command (see the example after this list), OR
- Refer to point 1.ii.b
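For example, a download line in pre_installation.sh of the form below (the URL and file name are purely illustrative) would change like this:
# original line using wget
wget http://example.com/setup_models.sh
# equivalent line using curl
curl -o setup_models.sh http://example.com/setup_models.sh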
-
Add files to the following folders according to these instructions (a quick way to verify the layout is shown after this list):
-
data/files: Add your data files, of any filetype, to generate insights from.
-
data/polar: Contains the mappings and data exported from the Elasticsearch URL.
-
data/ingest: Output from the PDI insight generator will be saved here under the filename ingest_data.json.
-
data/sparkler/raw: Add Sparkler-crawled data from the SOLR index into the sparkler_rawdata.json file in this folder.
-
data/sparkler/parsed: Sparkler data (in data/sparkler/raw/sparkler_rawdata.json) is parsed using parse.py and saved in sparkler_data.json.
-
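A quick check that the layout matches the descriptions above (the exact set of folders created by setup.sh may differ slightly; this reflects the folders listed in this section):
# list every directory under data/
find data -type d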
-
Build Insight Generator
-
git clone https://github.com/USCDataScience/polar-deep-insights.git && cd polar-deep-insights/Docker/insight-generator
-
Build locally:
docker build -t uscdatascience/pdi-generator -f InsightGenDockerfile .
OR pull from Docker Hub:
docker pull uscdatascience/pdi-generator
-
PDI_JSON_PATH=/data/polar docker-compose up -d
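To confirm the generator services started, check the compose status from the same directory (standard docker-compose command):
docker-compose ps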
-
-
This container exposes the following ports:
8765 - Geo Topic Parser
9998 - Apache Tika Server
8060 - Grobid Quantities REST API
-
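Once the container is up, you can probe the Apache Tika Server with a plain GET on its root resource; the Geo Topic Parser (8765) and Grobid Quantities (8060) services can be checked similarly, though their endpoint paths differ:
curl http://localhost:9998/tika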
Build Insight Visualizer
-
git clone https://github.com/USCDataScience/polar-deep-insights.git && cd polar-deep-insights/Docker/insight-visualizer
-
docker build -t uscdatascience/polar-deep-insights -f PolarDeepInsightsDockerfile .
OR pull from Docker Hub:
docker pull uscdatascience/polar-deep-insights
-
PDI_JSON_PATH=data/polar docker-compose up -d
-
Access application at http://localhost/pdi/
-
Access elasticsearch at http://localhost/elasticsearch/
-
This container exposes the following ports:
80 - Apache2/HTTPD server
9000 - Grunt server serving up the PDI application
9200 - Elasticsearch 2.4.6 server
35729 - Auto refresh port for AngularJS apps
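A quick way to confirm Elasticsearch is reachable once the container is up is to request its root endpoint, which returns basic cluster and version JSON (either URL should work if the ports are mapped as listed above):
curl http://localhost/elasticsearch/
curl http://localhost:9200/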
PS: You need to add a CORS extension to your browser and enable it in order to download the concept ontology and additional precomputed information from http://polar.usc.edu/elasticsearch/ and elsewhere.
docker logs -f container_id
- follow a container's logs; use your Docker container's ID
docker exec -it container_id bash
- open an interactive shell inside a container; use your Docker container's ID
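To find the container IDs for the commands above, list the running containers:
docker ps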
Information Retrieval and Data Science (IRDS) research group, University of Southern California.