GitHub - sarnoult/opinion_miner_deluxePP: Opinion miner (opinion expressions, targets and holders) based on machine learning

#Opinion miner deluxe ++#

##Introduction##

The opinion miner deluxe "plus plus" is an improved version of the opinion miner based on machine learning that can be trained using a list of KAF/NAF files. It is important to notice that the opinion miner module will not call to any external module to obtain features. It will read all the features from the input KAF/NAF file, so you have to make sure that your input file contains all the required information in advance (tokens, terms, polarities, constituents, entitiess, dependencies...). This is an example that the opinion miner would extract from the sentence I said that the hotel is nice, but the staff is the best!!:

    <opinion oid="o2">
      <opinion_holder>
        <!-- I -->
        <span>
          <target id="t1"/>
        </span>
      </opinion_holder>
      <opinion_target>
        <!-- staff -->
        <span>
          <target id="t10"/>
        </span>
      </opinion_target>
      <opinion_expression polarity="DSE" strength="1">
        <!-- the best !! -->
        <span>
          <target id="t12"/>
          <target id="t13"/>
          <target id="t14"/>
        </span>
      </opinion_expression>
    </opinion>

The task is general divided into 2 steps

Detection of opinion entities (holder, target and expression): using Conditional Random Fields
Opinion entity linking (expression<-target and expression-<holder): using a simple heuristic (in the near future it will implemented as a binary Support Vector Machines)

##Quick Installation##

To install this software, just follow these steps:

git clone https://github.com/rubenIzquierdo/opinion_miner_deluxePP
cd opinion_miner_deluxePP
. install_me.sh

The install_me.sh will download and compile the required dependencies, and it will also download the trained models. You will be asked for a password during the download process. To obtain the password please mail me (you will find my contact details at the end of this documentation) To check if the installation was correct you can run this command:

cat example_en.naf | tag_file.py -d hotel

You should get a NAF file with opinions in the output. The script will assume to find the models in a predefined path, you can also specify the path to the folder where you have the models with the option -f:

cat example_en.naf | tag_file.py -f path/to/my_model/

##Usage of the opinion tagger##

The main script for tagging opinions is tag_file.py. It takes a KAF/NAF file as input stream and it has some parameters. You can see the parameters by running:

tag_file.py -h
usage: tag_file.py [-h] [-v] (-d DOMAIN | -f PATH_TO_FOLDER) [-log]

Detects opinions in KAF/NAF files

optional arguments:
  -h, --help         show this help message and exit
  -v, --version      show program's version number and exit
  -d DOMAIN          Domain for the model (hotel,news)
  -f PATH_TO_FOLDER  Path to a folder containing the model
  -log               Show log information
  -polarity          Run the polarity (positive/negative) classifier too

Example of use: cat example.naf | tag_file.py -d hotel -polarity

##Description of the internal process##

In next subsections, a brief explanation of the 2 steps is given.

###Opinion Entity detection###

The first step when extracting opinions from text is to determine which portions of text represent the different opinion entities:

Opinion expressions: very nice, really ugly ...
Opinion targets: the hotel, the rooms, the staff ...
Opinion holders: I, our family, the manager ...

In order to do this, three different Conditional Random Fields (CRF) classifiers have been trained using by default this set of features: tokens, lemmas, part-of-speech tags, constituent labels and polarity of words and entities. These classifiers detect portions of text representeing differnet opinion entities.

###Opinion Entity linking###

This step takes as input the opinion entities detected in the previous step, and links them to create the final opinions <expression/target/holder>. In this case we have trained two binary Support Vector Machines (SVM), one that indicates the degree of association between a given target and a given expression, and another one that gives the degree of linkage between a holder and an opinion expression. So given a list of expressions, a list of targets and holders detected by the CRF classifiers, the SVM models try to select the best candidate from the target list for each expressions, and the best holder from the holder list, to create the final opinion triple.

Considering a certain opinion expression and a target, these are the features by default used to represent this data for the SVM engine:

Textual features: tokens and lemmas of the expression and the target
Distance features: features representing the relative distance of both elements in the text (normalized to a discrete list of possible values: far/medium/close for instance), and if both elements are in the same sentence or not
Dependency features: to indicate the dependency relations between the two elements in the text (dependency path, and dependencies relations with the root of the sentence)

##Training## To be completed...

##Contact##

Ruben Izquierdo
Vrije University of Amsterdam
[email protected] [email protected]
http://rubenizquierdobevia.com/

Name		Name	Last commit message	Last commit date
Latest commit History 24 Commits
crf_lib		crf_lib
resources		resources
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
example_en.naf		example_en.naf
extract_features_expression.py		extract_features_expression.py
extract_features_holder.py		extract_features_holder.py
extract_features_target.py		extract_features_target.py
extract_sequences.py		extract_sequences.py
install_me.sh		install_me.sh
match_entities_by_distance.py		match_entities_by_distance.py
mpqa_lexicon.py		mpqa_lexicon.py
path_crf.py		path_crf.py
polarity_classifier.py		polarity_classifier.py
tag_file.py		tag_file.py
tag_polarity_classifier.py		tag_polarity_classifier.py
train_polarity_classifier.py		train_polarity_classifier.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Releases

Packages

Languages

License

sarnoult/opinion_miner_deluxePP

Folders and files

Latest commit

History

Repository files navigation

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages