Instructions to install and run PDI Docker

rduerr edited this page Nov 20, 2018 · 6 revisions

This is the Dockerized version of the insights portion of the Polar Deep Insights (PDI) system. PDI Docker lets you install and run two parts of the project in Docker containers: the insight-generator, a Python library used to extract information, and the insight-visualizer, a JavaScript application used for data visualization.

Prerequisites

  1. Log in to Docker on your machine (docker login)

  2. git clone https://github.com/USCDataScience/polar-deep-insights.git

  3. cd polar-deep-insights/Docker

  4. Install npm if you do not already have it; otherwise, skip this step.

  5. npm install -g elasticsearch-tools - installs the Elasticsearch tools (including es-export-mappings and es-export-bulk, used below)

  6. chmod +x setup.sh

  7. ./setup.sh - creates a data folder and populates it with the other required (empty) folders

  8. Put your files (PDFs) in the data/files folder

  9. Export the Elasticsearch index mappings.

    1. If using the Elasticsearch data hosted at polar.usc.edu:

      es-export-mappings --url http://polar.usc.edu/elasticsearch --file data/polar/polar-data-mappings.json

    2. If using your own database, replace http://polar.usc.edu/elasticsearch in the above command with the URL of your remote Elasticsearch index (or your localhost Elasticsearch index) and run the command.

  10. Export the Elasticsearch index data.

    1. If using the Elasticsearch data hosted at polar.usc.edu:

      es-export-bulk --url http://polar.usc.edu/elasticsearch --file data/polar/polar-data.json

    2. If using your own database, replace http://polar.usc.edu/elasticsearch in the above command with the URL of your remote Elasticsearch index (or your localhost Elasticsearch index) and run the command.

    Note: This step may take a while, as it exports a large database (around 100k documents for the Polar data).
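The two export steps above can be wrapped in a few shell functions, so that switching between the polar.usc.edu data and your own index only means changing one argument. This is a minimal sketch, assuming bash, that it is run from polar-deep-insights/Docker after setup.sh, and that es-export-mappings and es-export-bulk were installed by npm install -g elasticsearch-tools. The function names (require_cmd, pdi_es_url, pdi_export) are hypothetical and are not part of the PDI repository.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the prerequisite export steps (not part of the PDI repo).
# Assumes they are run from polar-deep-insights/Docker after setup.sh has created data/.

# Succeed if a command is on PATH; otherwise print a hint to stderr and fail.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "missing required command: $1" >&2
  return 1
}

# Print the Elasticsearch URL to export from: the first argument, else the Polar default.
pdi_es_url() {
  echo "${1:-http://polar.usc.edu/elasticsearch}"
}

# Export the index mappings, then the bulk data, into data/polar.
pdi_export() {
  local url
  url="$(pdi_es_url "$1")"
  require_cmd es-export-mappings || return 1
  require_cmd es-export-bulk || return 1
  mkdir -p data/polar
  es-export-mappings --url "$url" --file data/polar/polar-data-mappings.json &&
    es-export-bulk --url "$url" --file data/polar/polar-data.json
}
```

For example, after sourcing these functions, pdi_export exports from polar.usc.edu, while pdi_export http://localhost:9200 would export from a local index (that localhost URL is only an illustration; use your own index's URL).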

Insight Generator Installation

  1. Install some necessary files (a description of them can be found here).

    1. For Unix-like operating systems (e.g., Ubuntu, macOS):
      1. chmod +x pre_installation.sh
      2. ./pre_installation.sh - this script downloads the necessary files from the web using the wget command. If you encounter an error saying wget was not found, do one of the following:
        1. Install wget (e.g., on macOS: brew install wget), OR
        2. Open pre_installation.sh and replace each wget with curl -L -o filename, where filename is the name of the file downloaded by that command (-L makes curl follow redirects), OR
        3. Download the files manually, as described in step 1.2.2 below (the Windows instructions).
    2. For Windows:
      1. If you have wget for Windows (as mentioned here), replace wget in the pre_installation.sh file with wget for Windows.
      2. A more hassle-free solution is to manually download the files from their source web pages as mentioned here.
  2. Add files to the following folders according to these instructions:

    1. data/files: Add the data files (of any file type) to generate insights from
    2. data/polar: Contains the mappings and data exported from the Elasticsearch URL
    3. data/ingest: Output from the PDI insight generator is saved here as ingest_data.json
    4. data/sparkler/raw: Save Sparkler-crawled data from the Solr index as sparkler_rawdata.json in this folder
    5. data/sparkler/parsed: Sparkler data (in data/sparkler/raw/sparkler_rawdata.json) is parsed using parse.py and saved here as sparkler_data.json
  3. Build the Insight Generator

    1. git clone https://github.com/USCDataScience/polar-deep-insights.git && cd polar-deep-insights/Docker/insight-generator

    2. Build locally:

      docker build -t uscdatascience/pdi-generator -f InsightGenDockerfile .

      OR pull the image from Docker Hub:

      docker pull uscdatascience/pdi-generator

    3. PDI_JSON_PATH=/data/polar docker-compose up -d

  4. This container exposes the following ports:

    8765 - Geo Topic Parser

    9998 - Apache Tika Server

    8060 - Grobid Quantities REST API
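The wget fallback and the port list above can be turned into two small helpers: one downloads a file with wget when it is available and falls back to curl otherwise, and one waits for the generator's ports to accept connections after docker-compose up -d. This is a sketch, assuming bash (the port check relies on bash's /dev/tcp redirection). The function names are hypothetical, and fetch is not a drop-in replacement for pre_installation.sh, whose download URLs are not listed on this page.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the Insight Generator steps (not part of the PDI repo).

# Print the file name at the end of a URL's path, ignoring any ?query string.
url_filename() {
  basename "${1%%\?*}"
}

# Download a URL into the current directory using wget if present, else curl.
fetch() {
  local url="$1" out
  out="$(url_filename "$url")"
  if command -v wget >/dev/null 2>&1; then
    wget -O "$out" "$url"
  elif command -v curl >/dev/null 2>&1; then
    # -f: fail on HTTP errors, -L: follow redirects, -o: write to the named file
    curl -fL -o "$out" "$url"
  else
    echo "neither wget nor curl is installed" >&2
    return 1
  fi
}

# Wait up to about TIMEOUT seconds for HOST:PORT to accept TCP connections.
wait_for_port() {
  local host="$1" port="$2" timeout="${3:-60}" i=0
  while [ "$i" -lt "$timeout" ]; do
    # bash opens a TCP connection when redirecting to /dev/tcp/HOST/PORT
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      echo "$host:$port is accepting connections"
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  echo "$host:$port did not open within ${timeout}s" >&2
  return 1
}

# Check the ports exposed by the pdi-generator container (Geo Topic Parser, Tika, Grobid Quantities).
pdi_wait_generator() {
  local timeout="${1:-120}" rc=0 port
  for port in 8765 9998 8060; do
    wait_for_port localhost "$port" "$timeout" || rc=1
  done
  return "$rc"
}
```

For example, running pdi_wait_generator 180 right after docker-compose up -d waits up to roughly 180 seconds for each port in turn and returns non-zero if any of them never opens.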

Insight Visualizer Installation

  1. git clone https://github.com/USCDataScience/polar-deep-insights.git && cd polar-deep-insights/Docker/insight-visualizer

  2. Build locally:

      docker build -t uscdatascience/polar-deep-insights -f PolarDeepInsightsDockerfile .

      OR pull the image from Docker Hub:

      docker pull uscdatascience/polar-deep-insights

  3. PDI_JSON_PATH=data/polar docker-compose up -d

  4. Access the application at http://localhost/pdi/

  5. Access Elasticsearch at http://localhost/elasticsearch/

  6. This container exposes the following ports:

    80 - Apache2/HTTPD server

    9000 - Grunt server serving the PDI application

    9200 - Elasticsearch 2.4.6 server

    35729 - Auto-refresh (LiveReload) port for AngularJS apps
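Once the visualizer container is up, one quick way to confirm that the application and Elasticsearch URLs respond is to request them with curl and inspect the HTTP status code. This is a minimal sketch; the function names are hypothetical, and treating any 2xx or 3xx status as healthy is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical smoke test for the insight-visualizer container (not part of the PDI repo).

# Succeed if URL answers with a 2xx or 3xx status within 5 seconds.
check_url() {
  local url="$1" code
  # -s: silent, -o /dev/null: discard the body, -w: print only the status code
  code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url")"
  case "$code" in
    2??|3??) echo "OK   $code $url" ;;
    *) echo "FAIL ${code:-none} $url" >&2; return 1 ;;
  esac
}

# Check both endpoints; BASE defaults to http://localhost.
pdi_visualizer_smoke_test() {
  local base="${1:-http://localhost}" rc=0
  check_url "$base/pdi/" || rc=1
  check_url "$base/elasticsearch/" || rc=1
  return "$rc"
}
```

Both URLs are always checked, so a single run reports every failing endpoint instead of stopping at the first one.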

Note: To download the concept ontology and additional precomputed information from http://polar.usc.edu/elasticsearch/ and elsewhere, you need to install and enable a CORS extension in your browser.

Monitoring the Container

docker logs -f container_id - replace container_id with your container's ID (docker ps lists running containers and their IDs)
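If you prefer not to copy the container ID by hand, you can look it up from the image name using docker ps with its ancestor filter. This sketch assumes the container was started from one of the images named above (uscdatascience/polar-deep-insights by default) and that only one such container is running. pdi_container_id and pdi_logs are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for following a PDI container's logs (not part of the PDI repo).

# Print the ID of the first running container created from IMAGE, or fail.
pdi_container_id() {
  local image="$1" id
  id="$(docker ps -q --filter "ancestor=$image" 2>/dev/null | head -n 1)"
  if [ -z "$id" ]; then
    echo "no running container found for image: $image" >&2
    return 1
  fi
  echo "$id"
}

# Follow the last 100 log lines of the container for IMAGE (default: the visualizer).
pdi_logs() {
  local id
  id="$(pdi_container_id "${1:-uscdatascience/polar-deep-insights}")" || return 1
  docker logs --tail 100 -f "$id"
}
```

For the generator, pass its image name instead, e.g. pdi_logs uscdatascience/pdi-generator.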

Logging onto the Container with a Bash Shell

docker exec -it container_id bash - replace container_id with your container's ID (docker ps lists running containers and their IDs)
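Some images ship without bash, in which case docker exec -it container_id bash fails. The helper below is a sketch that first checks whether bash exists inside the container and falls back to sh otherwise. The name pdi_shell is hypothetical, and the fallback is only a precaution; the PDI images may well include bash.

```shell
#!/usr/bin/env bash
# Hypothetical helper for opening an interactive shell in a container (not part of the PDI repo).

# Open bash in CONTAINER_ID if it is installed there, otherwise open sh.
pdi_shell() {
  local id="$1"
  if [ -z "$id" ]; then
    echo "usage: pdi_shell CONTAINER_ID" >&2
    return 2
  fi
  # Non-interactive probe: does the container have bash on its PATH?
  if docker exec "$id" sh -c 'command -v bash' >/dev/null 2>&1; then
    docker exec -it "$id" bash
  else
    docker exec -it "$id" sh
  fi
}
```

Usage: pdi_shell container_id, with the same ID you would pass to docker exec.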