Skip to content

ltrc/ilparser-api

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

52 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ilparser-api

An API for Parsing Indic Languages

Note: This API runs the full dependency parser which can take, on an average, up to 2.6 seconds on a sentence of length 20 words.

Installing Dependencies (Ubuntu):

$ sudo -E apt-get install autoconf cpanminus gcc git libgdbm-dev libglib2.0-dev make python-numpy \
  python-pydot python-urllib3 python-pip

Installing Dependencies (Fedora):

$ sudo -E dnf install autoconf git gcc gdbm-devel glib2-devel make numpy pydot perl-App-cpanminus \
  python-urllib3 python-pip

Installing Perl Dependencies:

$ cpanm Data::Dumper Dir::Self IPC::Run List::Util Config::IniFiles Mojolicious::Lite

Install CRF++

$ curl -L https://github.com/ltrc/ilparser-api/releases/download/0.1/CRF.-0.58.tar.gz | tar -xz
$ cd CRF++-0.58
$ ./configure ; sudo make install
$ echo "/usr/local" | sudo tee /etc/ld.so.conf.d/crfpp.conf
$ sudo ldconfig

Setting up the repo:

$ git clone https://github.com/ltrc/ilparser-api.git
$ cd ilparser-api
$ ./setup.sh

To run the api server:

$ perl api.pl prefork

Tu run from pre-built tocker container, type:

$ docker pull ltrc/ilparser-api:v1 
$ docker run --name ilparser-api -dit ltrc/ilparser-api:v1
#To find out the IP address of the container:
$ docker inspect --format '{{ .NetworkSettings.IPAddress }}' ilparser-api 
# Replace 172.17.0.2 below with the IP obtained from the above command.
$ curl -s 172.17.0.2/parse --data lang=hin --data data=" देश के टूरिजम में राजस्थान एक अहम जगह रखता है।"

Example use of the API:

$ curl -s localhost:3000/parse --data lang=hin \
    --data data="माना जाता है कि अमृतमंथन के बाद अमृत की कुछ बूँदें यहाँ गिरी थीं , इसलिए इसे ब्रह्मकुंड कहा जाता है ." | \
    jq '.["dependencyparse-11"]' | \
    sed -e 's/\\t/\t/g' -e 's/\\n/\n/g'  -e 's/\\"/\"/g' -e 's/^"//' -e 's/"$//'
1	माना	मान	v	VM	case-|vib-या_जा+ता_है|psd-|chunkId-VGF|pers-2|num-sg|tam-yA|sem-|cp-|gen-m	0	root	_	_
2	जाता	जा	v	VAUX	case-|vib-ता|psd-|chunkId-VGF|pers-any|num-sg|tam-wA|sem-|cp-|gen-m	1	lwg__vaux	_	_
3	है	है	v	VAUX	case-|vib-है|psd-|chunkId-VGF|pers-2|num-sg|tam-hE|sem-|cp-|gen-any	1	lwg__vaux	_	_
4	कि	कि	avy	CC	case-|vib-|psd-|chunkId-CCP|pers-|num-|tam-|sem-|cp-|gen-	1	k2	_	_
5	अमृतमंथन	अमृतमंथन	n	NNP	case-o|vib-०_का_बाद|psd-|chunkId-NP|pers-3|num-sg|tam-0|sem-|cp-|gen-m	14	k7t	_	_
6	के	का	psp	PSP	case-o|vib-का|psd-|chunkId-NP|pers-|num-sg|tam-kA|sem-|cp-|gen-m	5	lwg__psp	_	_
7	बाद	बाद	adv	NST	case-|vib-|psd-|chunkId-NP|pers-|num-|tam-|sem-|cp-|gen-	5	lwg__psp	_	_
8	अमृत	अमृत	n	NNP	case-o|vib-०_का|psd-|chunkId-NP2|pers-3|num-sg|tam-0|sem-|cp-|gen-m	11	r6	_	_
9	की	का	psp	PSP	case-d|vib-का|psd-|chunkId-NP2|pers-|num-sg|tam-kA|sem-|cp-|gen-f	8	lwg__psp	_	_
10	कुछ	कुछ	adj	QF	case-any|vib-|psd-|chunkId-NP3|pers-|num-any|tam-|sem-|cp-|gen-any	11	nmod__adj	_	_
11	बूँदें	बूँद	n	NN	case-d|vib-0|psd-|chunkId-NP3|pers-3|num-pl|tam-0|sem-|cp-|gen-f	14	k1	_	_
12	यहाँ	यहाँ	adv	PRP	case-|vib-|psd-|chunkId-NP4|pers-|num-|tam-|sem-|cp-|gen-	14	k7p	_	_
13	गिरी	गिरी	adj	JJ	case-any|vib-|psd-|chunkId-JJP|pers-|num-any|tam-|sem-|cp-|gen-f	14	k1s	_	_
14	थीं	था	v	VM	case-|vib-था|psd-|chunkId-VGF2|pers-any|num-pl|tam-WA|sem-|cp-|gen-f	4	ccof	_	_
15	,	&चोम्म	punc	SYM	case-|vib-|psd-|chunkId-VGF2|pers-|num-|tam-|sem-|cp-|gen-	14	rsym	_	_
16	इसलिए	इसलिए	adv	PRP	case-|vib-|psd-|chunkId-NP5|pers-|num-|tam-|sem-|cp-|gen-	19	rh	_	_
17	इसे	यह	pn	PRP	case-o|vib-को|psd-|chunkId-NP6|pers-3|num-sg|tam-ko|sem-|cp-|gen-any	19	k2	_	_
18	ब्रह्मकुंड	ब्रह्मकुंड	n	NNP	case-d|vib-0|psd-|chunkId-NP7|pers-3|num-pl|tam-0|sem-|cp-|gen-m	19	k1	_	_
19	कहा	कहा	v	VM	case-|vib-०_जा+ता_है|psd-|chunkId-VGF3|pers-2|num-sg|tam-0|sem-|cp-|gen-m	4	ccof	_	_
20	जाता	जा	v	VAUX	case-|vib-ता|psd-|chunkId-VGF3|pers-any|num-sg|tam-wA|sem-|cp-|gen-m	19	lwg__vaux	_	_
21	है	है	v	VAUX	case-|vib-है|psd-|chunkId-VGF3|pers-2|num-sg|tam-hE|sem-|cp-|gen-any	19	lwg__vaux	_	_
22	.	.	punc	SYM	case-|vib-|psd-|chunkId-BLK|pers-|num-|tam-|sem-|cp-|gen-	1	rsym	_	_

Querying the API through browser (output rendered using a JSON plugin):

GoogleChromeILParserAPI

Querying the API through browser (♥ pretty print!):

Enter into the browser: http://localhost:3000/parse?lang=hin&data=देश के टूरिजम में राजस्थान&pretty

PrettyPrint

In case of port conflicts, edit the file(s): ./lib/${lang}/daemons.ini

Note: Please make sure no proxy environment variables are present while contacting the server, if deployed locally.