Run the NLP Pipeline

jdchoi77 edited this page Nov 7, 2014 · 2 revisions

NLP Components

ClearNLP supports NLP components for the following tasks.

Command-Line Tools

  • The command below runs the NLP pipeline. The pos, morph, dep, and srl modes run the pipeline for part-of-speech tagging, morphological analysis, dependency parsing, and semantic role labeling, respectively.

    java com.clearnlp.nlp.engine.NLPDecode -z <mode> -c <filename> -i <filepath> [-ie <regex> -oe <string>]
    
    -z <mode>     : pos|morph|dep|srl
    -c <filename> : configuration file (required)
    -i <filepath> : input path (required)
    -ie <regex>   : input file extension (default: .*)
    -oe <string>  : output file extension (default: labeled)
    
  • Download the following files and put them under clearnlp: config_decode.xml, log4j.properties, clearnlp.txt.

  • If you are using the medical models, replace <model>general-en</model> with <model>medical-en</model> in config_decode.xml.

  • The following command takes the input file clearnlp.txt, performs semantic role labeling (srl) using the configuration file config_decode.xml, and generates the output file clearnlp.txt.cnlp.

    $ java -Xmx4g -XX:+UseConcMarkSweepGC -Dlog4j.configuration=file:log4j.properties com.clearnlp.nlp.engine.NLPDecode -z srl -c config_decode.xml -i clearnlp.txt
    Loading feature templates.
    Loading models.
    Loading lexica.
    ...
    Decoding:
    /Users/jdchoi/Desktop/clearnlp.txt

  • Use our visualization tool to view the output.
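
The steps above can be wrapped in a few shell helpers. This is only a sketch: the function names (is_valid_mode, decode_command, use_medical_models) are hypothetical and not part of ClearNLP; only the java command line and the <model> values are taken from the text above. decode_command prints the command rather than running it, so it can be checked before anything is executed.

```shell
#!/bin/sh
# Hypothetical helpers; only the java command line and the <model> values
# come from the instructions above.

# Succeeds only for the four modes listed for -z.
is_valid_mode() {
  case "$1" in
    pos|morph|dep|srl) return 0 ;;
    *) return 1 ;;
  esac
}

# Prints the decode command for a mode, a configuration file, and an input path.
# Nothing is executed here.
decode_command() {
  mode=$1
  config=$2
  input=$3
  is_valid_mode "$mode" || { echo "unknown mode: $mode" >&2; return 1; }
  echo "java -Xmx4g -XX:+UseConcMarkSweepGC -Dlog4j.configuration=file:log4j.properties com.clearnlp.nlp.engine.NLPDecode -z $mode -c $config -i $input"
}

# Writes a copy of a configuration file ($1) to a new file ($2) with the
# general model entries replaced by the medical ones. The original is untouched.
use_medical_models() {
  sed 's#<model>general-en</model>#<model>medical-en</model>#' "$1" > "$2"
}
```

For example, decode_command srl config_decode.xml clearnlp.txt prints the same command line as the srl example above, and use_medical_models config_decode.xml config_medical.xml produces a second configuration file (config_medical.xml is an assumed name) that can be passed with -c.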

Using APIs with Maven and Eclipse

  • Install Maven (guidelines).

  • Create a Maven project using the following command (use your own VERSION, GROUP_ID, and ARTIFACT_ID).

    VERSION=3.0
    GROUP_ID=edu.emory.clir
    ARTIFACT_ID=clearnlp-demo
    mvn archetype:generate -DarchetypeGroupId=org.apache.maven.archetypes -DgroupId=$GROUP_ID -DartifactId=$ARTIFACT_ID -Dversion=$VERSION-SNAPSHOT

  • Add the following dependency to your pom.xml (here, we use the snapshot version).

    <dependency>
      <groupId>edu.emory.clir</groupId>
      <artifactId>clearnlp</artifactId>
      <version>3.0.0-SNAPSHOT</version>
    </dependency>
    
  • Go to the clearnlp-demo directory and enter the following command.

    mvn eclipse:eclipse
    
  • In Eclipse, create a Java project using the clearnlp-demo directory.
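
Before running mvn eclipse:eclipse, it can help to confirm that the dependency actually made it into pom.xml. The helper below is a rough sketch under that assumption: has_clearnlp_dependency is a hypothetical name, and it is a plain line-based text check with grep, not an XML parse, so it only looks for the groupId and artifactId lines shown above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given pom.xml contains both the
# groupId and the artifactId of the clearnlp dependency shown above.
# This is a text match only; it does not verify where in the file they appear.
has_clearnlp_dependency() {
  grep -q '<groupId>edu.emory.clir</groupId>' "$1" &&
  grep -q '<artifactId>clearnlp</artifactId>' "$1"
}
```

From the clearnlp-demo directory, one possible use is: has_clearnlp_dependency pom.xml && mvn eclipse:eclipse. The project's own <artifactId>clearnlp-demo</artifactId> line does not satisfy the check, because the pattern includes the closing tag.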