clear-nlp-packaged

A packaged client and server for Clear NLP 2.0.2. This software defines the interface for a client/server system that takes raw text as input and produces analyses consisting of tokenization, lemmatization, part-of-speech tagging, and dependency parsing.

The software uses the Thrift framework to define and run a service that labels strings and files. The NLP functionality comes from Jinho D. Choi's Clear NLP, which is Apache licensed and available at

http://clearnlp.com

The first step is to download and build the software. The software is kept, for now, at

https://github.com/cbrew/clear-nlp-packaged

This version contains no ETS specific material, other than the use of org.ets.nlp as the Java package name.

To build the server, first run:

mvn package

which downloads the dependencies and builds a jar file:

target/uber-clearserver-1.0.jar

This jar contains the code and all the statistical models needed to run the server. Next, start the server with:

java -Xmx3g -jar target/uber-clearserver-1.0.jar

I do this in a screen session. The server takes a while to load its models, then waits for calls from clients. Thrift handles the client-server interaction nicely, and it works across machines, since that is the point of Thrift. My colleague Daehee Lee, at Nuance, has similar code for a large number of different parsers.

Once the server is running, run the Python demo client. The command and its output are shown below:

python demo.py
0	_R_	_R_	_R_	_	_	_	_
1	Please	please	UH	_	4	intj	4:AM-DIS
2	do	do	VB	p2=VBP	4	aux	_
3	not	not	RB	_	4	neg	4:AM-NEG
4	lean	lean	VB	p2=JJ|pb=lean.01	0	root	_
5	out	out	IN	_	4	prep	4:A2-LOC
6	of	of	IN	_	5	prep	_
7	the	the	DT	_	8	det	_
8	window	window	NN	_	6	pobj	_
9	when	when	WRB	_	12	advmod	12:R-AM-TMP
10	the	the	DT	_	11	det	_
11	train	train	NN	_	12	nsubj	12:A1-PPT
12	is	be	VBZ	pb=be.01	4	advcl	4:AM-TMP
13	in	in	IN	_	12	prep	12:A2-PRD
14	motion	motion	NN	_	13	pobj	_
15	.	.	.	_	4	punct	_

0	_R_	_R_	_R_	_	_	_	_
1	deoxyribonucleic	deoxyribonucleic	JJ	p2=NN	2	amod	_
2	acid	acid	NN	_	0	root	_
3	with	with	IN	_	2	prep	_
4	Chinese	chinese	JJ	_	5	amod	_
5	tendencies	tendency	NNS	_	3	pobj	_
6	.	.	.	_	2	punct	_

Lasix 40-mg p.o. q.d.
Lasix forty-milligrams po qdpropranolol 50-mg.
0	_R_	_R_	_R_	_	_	_	_
1	Lasix	lasix	VB	p2=RB|pb=lasix.01	0	root	_
2	forty	#crd#	CD	p2=NN	4	hmod	_
3	-	-	HYPH	_	4	hyph	_
4	milligrams	milligram	NNS	p2=CD	5	nn	_
5	po	po	NN	p2=VBP	6	nn	_
6	qdpropranolol	qdpropranolol	NN	p2=VBP	9	nsubj	_
7	50	0	CD	_	9	hmod	_
8	-	-	HYPH	p2=SYM	9	hyph	_
9	mg	mg	NN	p2=NNS	1	dobj	1:A1
10	.	.	.	_	1	punct	_

these doses of antihypertensives.
0	_R_	_R_	_R_	_	_	_	_
1	these	these	DT	_	2	det	_
2	doses	dose	NNS	_	0	root	_
3	of	of	IN	_	2	prep	_
4	antihypertensives	antihypertensive	NNS	_	3	pobj	_
5	.	.	.	_	2	punct	_

the beta blocker.
0	_R_	_R_	_R_	_	_	_	_
1	the	the	DT	_	3	det	_
2	beta	beta	NN	p2=JJ	3	nn	_
3	blocker	blocker	NN	_	0	root	_
4	.	.	.	_	3	punct	_

red belt and shirt and suspenders.
0	_R_	_R_	_R_	_	_	_	_
1	red	red	JJ	_	2	amod	_
2	belt	belt	NN	_	0	root	_
3	and	and	CC	_	2	cc	_
4	shirt	shirt	NN	_	2	conj	_
5	and	and	CC	_	4	cc	_
6	suspenders	suspender	NNS	p2=NN	4	conj	_
7	.	.	.	_	1	punct	_
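
The demo output above can be post-processed with a few lines of Python. The following is a minimal sketch, not part of this repository: it assumes the output is tab-separated with eight columns per token, and the column names (`id`, `form`, `lemma`, `pos`, `feats`, `head`, `deprel`, `srl`) are labels inferred from the sample output rather than an official schema. The helpers `parse_sentences` and `root_of` are hypothetical names introduced for illustration.

```python
from collections import namedtuple

# Column names are assumptions read off the demo output, not an official schema.
Token = namedtuple("Token", "id form lemma pos feats head deprel srl")


def parse_sentences(text):
    """Group tab-separated token lines into sentences (lists of Token)."""
    sentences, current = [], []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) == 8 and fields[0].isdigit():
            # The artificial root line (id 0) has "_" as its head.
            head = int(fields[5]) if fields[5].isdigit() else None
            current.append(Token(int(fields[0]), fields[1], fields[2],
                                 fields[3], fields[4], head,
                                 fields[6], fields[7]))
        elif current:
            # A blank line or an echoed input line ends the current sentence.
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences


def root_of(sentence):
    """Return the token attached to the artificial root (id 0), or None."""
    return next((t for t in sentence if t.head == 0), None)


# One sentence copied from the demo output, rebuilt with explicit tabs.
rows = [
    ("0", "_R_", "_R_", "_R_", "_", "_", "_", "_"),
    ("1", "the", "the", "DT", "_", "3", "det", "_"),
    ("2", "beta", "beta", "NN", "p2=JJ", "3", "nn", "_"),
    ("3", "blocker", "blocker", "NN", "_", "0", "root", "_"),
    ("4", ".", ".", ".", "_", "3", "punct", "_"),
]
sample = "the beta blocker.\n" + "\n".join("\t".join(r) for r in rows) + "\n"

sentences = parse_sentences(sample)
print(root_of(sentences[0]).form)  # blocker
```

Lines that do not have eight tab-separated fields with a numeric first field, such as the echoed input text, are treated as sentence separators, so the same function can be run over the full output of `demo.py` if the columns really are tab-delimited.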

The Java code also includes a client, and Thrift supports many other languages.
