Documentation for the project is available in the project wiki.
- Download the Stanford CoreNLP model from here.
- Put the downloaded model file into the lib folder.
- Run "mvn clean package" in the project directory.
- Once the build process finishes, the deployment package will be available in the ie-dist folder.
- Copy the ie-dist folder to your Spark cluster.
- Running RunSparkBatchDriver.sh starts the batch processing, where you can input sentences or (HDFS) file paths; see the sketch after this list.
- To run the relation evaluation, refer to RunRelationEvaluation.sh; you may need to change the file locations to match your cluster settings. Copy the data folder to your cluster and upload it to HDFS before running the evaluation (this only needs to be performed once).
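The actual batch driver is whatever class RunSparkBatchDriver.sh submits to the cluster; purely as a minimal sketch, assuming the Spark 2.x Java API and an HDFS text-file input, this is roughly how batch annotation with Stanford CoreNLP can be wired up (the class name, paths, and annotator list are illustrative, not the project's real driver):

```java
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Properties;

// Hypothetical driver class for illustration only; the real entry point is
// whatever class RunSparkBatchDriver.sh submits to the cluster.
public class BatchAnnotationSketch {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("ie-batch-sketch"));

        // Input lines, e.g. read from an (HDFS) file path passed on the command line.
        JavaRDD<String> lines = sc.textFile(args[0]);

        // Build one CoreNLP pipeline per partition rather than per record:
        // StanfordCoreNLP is expensive to construct and is not serializable.
        JavaRDD<String> tagged = lines.mapPartitions(
            (FlatMapFunction<Iterator<String>, String>) it -> {
                Properties props = new Properties();
                props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner");
                StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

                List<String> out = new ArrayList<>();
                while (it.hasNext()) {
                    Annotation doc = new Annotation(it.next());
                    pipeline.annotate(doc);
                    StringBuilder sb = new StringBuilder();
                    for (CoreLabel token : doc.get(CoreAnnotations.TokensAnnotation.class)) {
                        sb.append(token.word()).append('/').append(token.ner()).append(' ');
                    }
                    out.add(sb.toString().trim());
                }
                return out.iterator();
            });

        tagged.saveAsTextFile(args[1]);
        sc.stop();
    }
}
```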
- Download the latest Stanford CoreNLP model from here.
- Put the downloaded model file into the lib folder.
- Open/import the project as a Maven project.
- Add the lib folder to the project libraries, then click Build.
- Refer to config.properties for configuration changes, such as pipeline components, NER models, dictionaries, and regex rules (see the pipeline configuration sketch below).
- Customized training for NER and the Relation Extractor is supported by the com.intel.ie.training package (see the training sketch below).
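For orientation, the sketch below shows how such properties typically drive a Stanford CoreNLP pipeline in Java. The keys shown (annotators, ner.model, regexner.mapping) are standard CoreNLP property names and the file paths are illustrative, so the exact keys in config.properties may differ:

```java
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

public class PipelineConfigSketch {
    public static void main(String[] args) throws IOException {
        // Load the project's configuration file.
        Properties props = new Properties();
        try (FileInputStream in = new FileInputStream("config.properties")) {
            props.load(in);
        }

        // Standard CoreNLP keys; the exact keys in config.properties may differ.
        props.putIfAbsent("annotators", "tokenize,ssplit,pos,lemma,ner,regexner"); // pipeline components
        props.putIfAbsent("ner.model", "lib/custom-ner-model.ser.gz");             // NER model (path illustrative)
        props.putIfAbsent("regexner.mapping", "data/dictionary-rules.txt");        // dictionary/regex rules (path illustrative)

        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        Annotation doc = new Annotation("Intel acquired Nervana Systems in 2016.");
        pipeline.annotate(doc);
        pipeline.prettyPrint(doc, System.out);
    }
}
```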
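Likewise, as a rough illustration of what customized NER training looks like when built on Stanford CoreNLP's CRFClassifier (which com.intel.ie.training presumably wraps or resembles), here is a minimal training sketch; all file names and feature flags are assumptions, not the project's actual training setup:

```java
import edu.stanford.nlp.ie.crf.CRFClassifier;
import edu.stanford.nlp.ling.CoreLabel;

import java.util.Properties;

public class NerTrainingSketch {
    public static void main(String[] args) {
        // Standard CoreNLP CRF training properties; file names are illustrative.
        Properties props = new Properties();
        props.setProperty("trainFile", "data/ner-train.tsv");    // TSV of token<TAB>label lines
        props.setProperty("serializeTo", "lib/custom-ner-model.ser.gz");
        props.setProperty("map", "word=0,answer=1");             // column layout of the TSV
        props.setProperty("useClassFeature", "true");
        props.setProperty("useWord", "true");
        props.setProperty("useNGrams", "true");
        props.setProperty("maxNGramLeng", "6");
        props.setProperty("usePrev", "true");
        props.setProperty("useNext", "true");

        // Train a CRF sequence model and serialize it for later use as ner.model.
        CRFClassifier<CoreLabel> crf = new CRFClassifier<>(props);
        crf.train();
        crf.serializeClassifier(props.getProperty("serializeTo"));
    }
}
```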