Commit

Add yelp and yahoo data
JamesHujy committed Aug 20, 2022
1 parent f1668b5 commit 9d06cde
Showing 8 changed files with 260,021 additions and 0 deletions.
19,997 changes: 19,997 additions & 0 deletions data/yahoo/vocab.txt

10,000 changes: 10,000 additions & 0 deletions data/yahoo/yahoo.test.txt

100,000 changes: 100,000 additions & 0 deletions data/yahoo/yahoo.train.txt

10,000 changes: 10,000 additions & 0 deletions data/yahoo/yahoo.valid.txt

24 changes: 24 additions & 0 deletions data/yelp/readme.txt
@@ -0,0 +1,24 @@
Yelp15 data set used in the following paper:

@article{yang2017improved,
title={Improved Variational Autoencoders for Text Modeling using Dilated
Convolutions},
author={Yang, Zichao and Hu, Zhiting and Salakhutdinov, Ruslan and
Berg-Kirkpatrick, Taylor},
journal={arXiv preprint arXiv:1702.08139},
year={2017}
}

DESCRIPTION

This data set is constructed from the Yelp15 data set released with:
Duyu Tang, Bing Qin, Ting Liu.
Document Modeling with Gated Recurrent Neural Network
for Sentiment Classification. EMNLP 2015.
http://ir.hit.edu.cn/~dytang/paper/emnlp2015/emnlp-2015-data.7z

The original dataset contains 1.2 million training samples and 156k validation
and test samples. We sample 100k training, 10k validation and 10k test samples
from the respective sets.
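The subsampling step described above can be sketched as follows. This is a minimal illustration, not the script that produced these files: it assumes each split is a list of raw lines and that examples were drawn uniformly at random without replacement. The function name `subsample_splits` and the fixed `seed` are hypothetical choices made here for reproducibility.

```python
import random
from typing import Dict, List, Sequence

# Target sizes stated in the readme: 100k train, 10k valid, 10k test.
SPLIT_SIZES = {"train": 100_000, "valid": 10_000, "test": 10_000}


def subsample_splits(
    splits: Dict[str, Sequence[str]],
    sizes: Dict[str, int] = SPLIT_SIZES,
    seed: int = 0,  # hypothetical: the actual seed (if any) is not documented
) -> Dict[str, List[str]]:
    """Draw `sizes[name]` examples from each original split, without replacement.

    Each subset is sampled only from its own original split, so no example
    moves between train, validation and test.
    """
    rng = random.Random(seed)
    out: Dict[str, List[str]] = {}
    for name, k in sizes.items():
        pool = list(splits[name])
        if k > len(pool):
            raise ValueError(f"split {name!r} has {len(pool)} examples, need {k}")
        out[name] = rng.sample(pool, k)
    return out
```

Because each subset comes only from its own original split, the original train/validation/test separation is preserved.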

We use a vocabulary size of 20k and replace out-of-vocabulary tokens with _UNK.
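A minimal sketch of the vocabulary step follows. The readme states only the 20k size and the `_UNK` replacement token, so three choices here are assumptions: whitespace tokenization, keeping the most frequent tokens in the training data, and not counting `_UNK` toward the 20k. The helper names `tokenize`, `build_vocab` and `replace_oov` are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set

UNK = "_UNK"  # replacement token named in the readme


def tokenize(line: str) -> List[str]:
    # Assumption: documents are already tokenized and space-separated.
    return line.split()


def build_vocab(docs: Iterable[List[str]], size: int = 20_000) -> Set[str]:
    """Keep the `size` most frequent tokens (assumed frequency-based cutoff)."""
    counts = Counter(tok for doc in docs for tok in doc)
    return {tok for tok, _ in counts.most_common(size)}


def replace_oov(doc: List[str], vocab: Set[str]) -> List[str]:
    """Map every token outside the vocabulary to UNK."""
    return [tok if tok in vocab else UNK for tok in doc]
```

In practice the vocabulary would be built on the training split only and then applied unchanged to the validation and test splits, so that held-out text cannot influence which words are kept.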
10,000 changes: 10,000 additions & 0 deletions data/yelp/yelp.test.txt

100,000 changes: 100,000 additions & 0 deletions data/yelp/yelp.train.txt

10,000 changes: 10,000 additions & 0 deletions data/yelp/yelp.valid.txt