LexTex

This repository contains the sources of LexTex, the main outcome of the work "LexTex: A Novel Framework to Automatically Assign Scores to Word Senses in Domain Specific Categories".

LexTex is a Python tool that builds new lexicons from a set of annotated resources. It relies on functionality provided by Stanford CoreNLP and UKB.

Authors: Danilo Dessi' and Diego Reforgiato Recupero

Installation

Requirements

Python version >= 3.6

Java version >= 1.8

Install other requirements with: pip3 install -r src/requirements.txt

Finally, download the NLTK stopwords corpus with: python3 -m nltk.downloader stopwords

Other software

LexTex uses corenlp-server and ukb-3.1 in its pipeline. Please use these versions.

ukb-3.1/ can be downloaded from http://ixa2.si.ehu.es/ukb/. Extract the archive under the LexTex directory. Compile its KB following point 1.2 of its README (see the README under the ./script directory). Our experiments used WordNet 3.0.
stanford-corenlp-full-2018-10-05 can be downloaded from https://stanfordnlp.github.io/CoreNLP/. Extract the archive under the LexTex directory.
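
Before running the pipeline it can help to confirm that the requirements above are in place. The following is a minimal sketch, not part of LexTex: the function name missing_prerequisites is hypothetical, and it assumes that the extracted archives keep the directory names ukb-3.1 and stanford-corenlp-full-2018-10-05 and that Java is reachable as java on the PATH. It does not check the Java version.

```python
import shutil
import sys
from pathlib import Path

# Assumption: the extracted archives keep these directory names.
REQUIRED_DIRS = ("ukb-3.1", "stanford-corenlp-full-2018-10-05")


def missing_prerequisites(root="."):
    """Return a list of human-readable problems found under `root`."""
    problems = []
    if sys.version_info < (3, 6):
        problems.append("Python >= 3.6 is required")
    # Only checks that a java executable exists, not its version.
    if shutil.which("java") is None:
        problems.append("java executable not found on PATH")
    for name in REQUIRED_DIRS:
        if not (Path(root) / name).is_dir():
            problems.append("missing directory: " + name)
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```

Run from the LexTex directory, the script prints one line per missing prerequisite and nothing when all checks pass.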

How to use

To use LexTex, run python3 LexTex.py. The accepted parameters are:

-d, --directory : the directory that contains all the resources used to generate the new lexicon (mandatory)
-m, --mode : the mode in which UKB is used by the Word Sense Disambiguation Module. The default mode is ppr_w2w.
-c, --categories : the number of categories (mandatory)
-lc, --label-categories : the list of categories separated by commas (e.g. joy,sadness,fear)
-ln, --lexicon-name : the name that will be assigned to the lexicon
-nn, --no-norm : flag indicating that averages over categories must not be normalized
-nc, --no-coeff : flag indicating that averages over categories must not be weighted with "cf"
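
To make the option list above concrete, here is a hedged sketch of how the same interface could be declared with argparse. It is not taken from the LexTex sources: the function names, the types (for example, parsing --categories as an integer), and the None defaults are assumptions made for illustration.

```python
import argparse


def split_labels(value):
    """Turn 'joy,sadness,fear' into ['joy', 'sadness', 'fear']."""
    return [label.strip() for label in value.split(",")]


def build_parser():
    """Hypothetical parser mirroring the documented options."""
    parser = argparse.ArgumentParser(prog="LexTex.py")
    parser.add_argument("-d", "--directory", required=True,
                        help="directory with the annotated resources")
    parser.add_argument("-m", "--mode", default="ppr_w2w",
                        help="UKB mode used for word sense disambiguation")
    parser.add_argument("-c", "--categories", type=int, required=True,
                        help="number of categories")
    parser.add_argument("-lc", "--label-categories", type=split_labels,
                        default=None, help="comma-separated category labels")
    parser.add_argument("-ln", "--lexicon-name", default=None,
                        help="name assigned to the generated lexicon")
    parser.add_argument("-nn", "--no-norm", action="store_true",
                        help="do not normalize averages over categories")
    parser.add_argument("-nc", "--no-coeff", action="store_true",
                        help='do not weight averages with "cf"')
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

With this sketch, the arguments -d training -c 5 --no-coeff parse to directory "training", 5 categories, the default mode ppr_w2w, and no_coeff set to True.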

Examples

python3 LexTex.py -d training -c 5

python3 LexTex.py -d training -c 5 -ln my_new_super_lexicon

python3 LexTex.py -d training -c 5 -m ppr_w2w -ln my_new_super_lexicon++

python3 LexTex.py -d training -c 5 -m ppr_w2w -ln my_new_super_lexicon++ --no-coeff

Input

Input files must be placed in a single directory. Each row of a file must contain a text followed by one score for each category, separated by tab characters ('\t').
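
The row format described above can be checked with a few lines of Python before launching LexTex. This is a sketch under assumptions: parse_input_row is a hypothetical helper, not a LexTex function, and it assumes that scores are numeric and that the text itself contains no tab characters.

```python
def parse_input_row(line, n_categories):
    """Split one input row into its text and its per-category scores.

    Assumes the layout: text <TAB> score_1 <TAB> ... <TAB> score_n.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != n_categories + 1:
        raise ValueError(
            "expected 1 text field and %d scores, found %d fields"
            % (n_categories, len(fields))
        )
    text, *scores = fields
    # Assumption: every score can be read as a floating point number.
    return text, [float(score) for score in scores]


if __name__ == "__main__":
    print(parse_input_row("I am happy\t0.9\t0.1\n", 2))
```

A row with the wrong number of fields raises ValueError, which makes it easy to spot a mismatch with the value passed to -c.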

Output

The output is a lexicon in which each row contains a WordNet synset and a value for each input category. See the lexicons directory for examples.
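
Since generated lexicons currently have no column headers, a reader has to supply the category labels when loading one. The sketch below shows one way to do that. It is not part of LexTex: load_lexicon is a hypothetical helper, the synset identifier in the example is made up, and the code assumes that fields are separated by whitespace and that values are numeric. Check the files in the lexicons directory for the actual layout.

```python
def load_lexicon(lines, labels):
    """Map each synset to a {label: value} dictionary.

    Assumes rows of the form: synset value_1 ... value_n,
    with whitespace (tab or space) between fields.
    """
    lexicon = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        synset, values = fields[0], fields[1:]
        if len(values) != len(labels):
            raise ValueError("row for %s has %d values, expected %d"
                             % (synset, len(values), len(labels)))
        lexicon[synset] = dict(zip(labels, (float(v) for v in values)))
    return lexicon


if __name__ == "__main__":
    # Made-up row used only to illustrate the assumed layout.
    rows = ["00001740-n\t0.5\t0.25\n"]
    print(load_lexicon(rows, ["joy", "fear"]))
```

The labels passed in must follow the same order as the categories used when the lexicon was generated.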

Planned developments and fixes:

adding headers to the columns of generated lexicons
