TwitterSentimentBenchmarkDataAnalysis

Analysis on twitter sentiment analysis benchmark datasets as described in the paper Shubhanshu Mishra and Jana Diesner. 2018. Detecting the Correlation between Sentiment and User-level as well as Text-Level Meta-data from Benchmark Corpora. In Proceedings of the 29th on Hypertext and Social Media (HT '18). ACM, New York, NY, USA, 2-10. DOI: https://doi.org/10.1145/3209542.3209562

If you plan to use this analysis please cite the following items:

@inproceedings{Mishra2018,
  doi = {10.1145/3209542.3209562},
  url = {https://doi.org/10.1145/3209542.3209562},
  year  = {2018},
  publisher = {{ACM} Press},
  author = {Shubhanshu Mishra and Jana Diesner},
  title = {Detecting the Correlation between Sentiment and User-level as well as Text-Level Meta-data from Benchmark Corpora},
  booktitle = {Proceedings of the 29th on Hypertext and Social Media  - {HT} {\textquotesingle}18}
}

@misc{shubhanshu_mishra_2018_1308462,
  author       = {Shubhanshu Mishra},
  title        = {Twitter sentiment benchmark data analysis},
  month        = jul,
  year         = 2018,
  doi          = {10.5281/zenodo.1308462},
  url          = {https://doi.org/10.5281/zenodo.1308462}
}

Download the data with training, validation, and test splits

You can use the training, validation, and test splits data_with_train_dev_test_split.txt.gz as used in the paper by downloading the data in the data folder:

$ ls -ltrh data/
total 11M
-rw-rw-r-- 1 smishra8 is-sailgroup 5.1M May 16 04:26 joined_data_all.txt.gz
-rw-rw-r-- 1 smishra8 is-sailgroup 5.1M May 16 04:48 data_with_train_dev_test_split.txt.gz

The file was created as follows:

cd data && gunzip joined_data_all.txt.gz
python create_data_splits.py

Data sources:

SemEval - http://alt.qcri.org/semeval2017/task4/
Airline - https://www.kaggle.com/crowdflower/twitter-airline-sentiment
GOP Debate - https://www.kaggle.com/crowdflower/first-gop-debate-twitter-sentiment
Clarin - https://www.clarin.si/repository/xmlui/handle/11356/1054
HCR - https://bitbucket.org/speriosu/updown/wiki/Getting%20Started
Obama - https://bitbucket.org/speriosu/updown/wiki/Getting%20Started

Detecting the correlation between sentiment and user-level as well as text-level meta-data from benchmark corpora

Code for this analysis will can be seen in following files:

Code released under GNU General Public License v3.0

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
data		data
figures		figures
.gitignore		.gitignore
Empirical_Analysis.ipynb		Empirical_Analysis.ipynb
Explore_tweet_data.ipynb		Explore_tweet_data.ipynb
Explore_user_timelines.ipynb		Explore_user_timelines.ipynb
Get_user_timelines.ipynb		Get_user_timelines.ipynb
Join_all_data.ipynb		Join_all_data.ipynb
LICENSE		LICENSE
Meta data models.ipynb		Meta data models.ipynb
README.md		README.md
Text_models.ipynb		Text_models.ipynb
Time_line_aggregates.ipynb		Time_line_aggregates.ipynb
Vader_sentiment_prediction.ipynb		Vader_sentiment_prediction.ipynb
create_data_splits.py		create_data_splits.py
emoticons.py		emoticons.py
twarc_get_timelines.py		twarc_get_timelines.py
twokenize.py		twokenize.py
twokenize_wrapper.py		twokenize_wrapper.py
utils.py		utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

TwitterSentimentBenchmarkDataAnalysis

Download the data with training, validation, and test splits

Data sources:

Detecting the correlation between sentiment and user-level as well as text-level meta-data from benchmark corpora

About

Releases 2

Packages

Languages

License

napsternxg/TwitterSentimentBenchmarks

Folders and files

Latest commit

History

Repository files navigation

TwitterSentimentBenchmarkDataAnalysis

Download the data with training, validation, and test splits

Data sources:

Detecting the correlation between sentiment and user-level as well as text-level meta-data from benchmark corpora

About

Topics

Resources

License

Stars

Watchers

Forks

Releases 2

Packages 0

Languages

Packages