UPDATE:

Please check out our new paper on accurately predict real-valued sentence specificity with no domain restrictions, presented at AAAI 2019:

Wei-Jen Ko, Greg Durrett and Junyi Jessy Li, Domain Agnostic Real-Valued Specificity Prediction, AAAI 2019

The new system is located in this repo: https://github.com/wjko2/Domain-Agnostic-Sentence-Specificity-Prediction

This system works well on non-news domains, such as Twitter, online reviews, and conversational data.

SPECITELLER is a tool to predict sentence specificity.

The models in this package are obtained using co-training as described in Li and Nenkova, Fast and Accurate Prediction of Sentence Specificity, AAAI 2015.

Dependencies

Speciteller is implemented using Python 2.7. It depends on the following packages:

numpy
liblinear (in particular, liblinearutil.py; be sure you have a liblinear.so.<x> file in its python/ directory. If not, type make in python/)

Data and resources

Word lexicons for the models are available for download here. Please note that these resources come with license(s). Decompress the tar ball under this (i.e., the speciteller) directory.

Running Speciteller

Call:

$ python speciteller.py --inputfile inputfile --outputfile predfile

<inputfile> should consists of word-tokenized sentences, one sentence per line;
<predfile> will be the destination file which Speciteller will write the specificity scores to, one score per line in the same order as sentences in <inputfile>.
An optional argument is --write_all_preds. When flagged this will generate two addtional files: <predfile>.s (prediction from the shallow model) and <predfile>.w (prediction from the word representation model).

For example:

$ python speciteller.py --inputfile sents_test --outputfile test.probs

This will give you specificity scores for the two sentences in sents_test in test.probs.

The scores range from 0 to 1, with 0 being most general and 1 being most specific.

Practical notes

It is best that you word-tokenize your sentences. If you don't, you will still get a score, but less good (~4% less accurate if you translate them into labels with a cutoff at 0.5).
Note that the word embedding file is a compressed ~190mb .gz file. Each run of speciteller.py will load the file to generate features. Thus it is best to avoid loading it multiple times, or modify predict.py and tailor it for your data loading purpose.

Citation and contact

Please cite the following paper:

Junyi Jessy Li and Ani Nenkova. 2015. Fast and Accurate Prediction of Sentence Specificity. Twenty-Ninth Conference on Artificial Intelligence (AAAI). [bibtex]

Please send comments and feedback to Jessy Li.

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
cotraining_models		cotraining_models
python3_code		python3_code
README.md		README.md
__init__.py		__init__.py
features.py		features.py
generatefeatures.py		generatefeatures.py
multisent.py		multisent.py
perwordfeatures.py		perwordfeatures.py
sents_test		sents_test
speciteller.py		speciteller.py
utils.py		utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

UPDATE:

SPECITELLER is a tool to predict sentence specificity.

Dependencies

Data and resources

Running Speciteller

Practical notes

Citation and contact

About

Releases

Packages

Contributors 2

Languages

jjessyli/speciteller

Folders and files

Latest commit

History

Repository files navigation

UPDATE:

SPECITELLER is a tool to predict sentence specificity.

Dependencies

Data and resources

Running Speciteller

Practical notes

Citation and contact

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages