
Analysis of benchmark datasets for Twitter sentiment analysis, as described in the paper: Shubhanshu Mishra and Jana Diesner. 2018. Detecting the Correlation between Sentiment and User-level as well as Text-Level Meta-data from Benchmark Corpora. In Proceedings of the 29th on Hypertext and Social Media (HT '18). ACM, New York, NY, USA, 2-10. DOI: 10.1145/3209542.3209562

If you use this analysis, please cite the following:

@inproceedings{Mishra_2018,
  doi = {10.1145/3209542.3209562},
  year = {2018},
  publisher = {{ACM} Press},
  author = {Shubhanshu Mishra and Jana Diesner},
  title = {Detecting the Correlation between Sentiment and User-level as well as Text-Level Meta-data from Benchmark Corpora},
  booktitle = {Proceedings of the 29th on Hypertext and Social Media - {HT} {\textquotesingle}18}
}

@misc{shubhanshu_mishra_2018_1308462,
  author       = {Shubhanshu Mishra},
  title        = {Twitter sentiment benchmark data analysis},
  month        = jul,
  year         = 2018,
  doi          = {10.5281/zenodo.1308462}
}

Download the data with training, validation, and test splits

The training, validation, and test splits used in the paper are in data_with_train_dev_test_split.txt.gz, which you can download from the data folder:

$ ls -ltrh data/
total 11M
-rw-rw-r-- 1 smishra8 is-sailgroup 5.1M May 16 04:26 joined_data_all.txt.gz
-rw-rw-r-- 1 smishra8 is-sailgroup 5.1M May 16 04:48 data_with_train_dev_test_split.txt.gz
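A quick way to check the split sizes without unpacking the archive on disk is sketched below as a shell function. It rests on assumptions this README does not confirm: that the file is tab-separated, that its first line is a header, and that one column holds the split label. The function name `count_by_split` and the column name you pass to it are hypothetical, so inspect the real header first with `gzip -dc data/data_with_train_dev_test_split.txt.gz | head -n 1`.

```shell
# Hypothetical helper: count rows per split in a gzip-compressed file.
# ASSUMPTIONS (not confirmed by this README): tab-separated, header on the
# first line, and one column (named by $2) holds the split label.
count_by_split() {
  local path="$1" column="$2"
  # Stream the decompressed text; the .gz file on disk is left untouched.
  gzip -dc -- "$path" | awk -v col="$column" '
    BEGIN { FS = "\t" }
    NR == 1 {
      # Find the split column by its header name.
      for (i = 1; i <= NF; i++) if ($i == col) c = i
      if (!c) { print "column not found: " col > "/dev/stderr"; bad = 1; exit 1 }
      next
    }
    { n[$c]++ }  # tally data rows per split label
    END {
      if (bad) exit 1
      for (k in n) print k "\t" n[k]
    }' | LC_ALL=C sort
}
```

For example, if the header names the split column `split` (an assumption), `count_by_split data/data_with_train_dev_test_split.txt.gz split` prints one `label<TAB>count` line per split, sorted by label.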

The file was created as follows:

cd data && gunzip joined_data_all.txt.gz
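Note that `gunzip` deletes `joined_data_all.txt.gz` once it has written `joined_data_all.txt`. If you want to keep the archive, one option is to stream the decompressed bytes with `gzip -dc`, as in the hypothetical helper `decompress_keep` below (recent GNU gzip releases also offer `gunzip -k`). This is a sketch, not part of the original workflow.

```shell
# Hypothetical helper: decompress FILE.gz to FILE while keeping FILE.gz.
decompress_keep() {
  local src="$1"
  local dest="${src%.gz}"
  # Refuse names without a .gz suffix so the output never overwrites the input.
  [ "$dest" != "$src" ] || { echo "expected a .gz file: $src" >&2; return 1; }
  gzip -t -- "$src" || return 1              # fail early on a corrupt archive
  gzip -dc -- "$src" > "$dest" || return 1   # write the decompressed copy
  printf '%s\n' "$dest"                      # report where the output went
}
```

Running `cd data && decompress_keep joined_data_all.txt.gz` would write `joined_data_all.txt`, print its path, and leave the `.gz` file in place.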

Data sources:

Detecting the correlation between sentiment and user-level as well as text-level meta-data from benchmark corpora

The code for this analysis can be found in the following files:

The code is released under the GNU General Public License v3.0.