Automatic Paraphrase Dataset Augmentation

This repository contains the data and code for the paper "Finding Friends and Flipping Frenemies: Automatic Paraphrase Dataset Augmentation Using Graph Theory" (Findings of EMNLP 2020).

Dependencies

You can install all the required packages by running the following command:
python -m pip install -r requirements.txt

Datasets

Quora Question Pairs
We used the train/dev splits from the GLUE benchmark, which can be downloaded from the GLUE website.
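
For orientation, here is a minimal sketch of reading labeled question pairs from a GLUE-style QQP file. It is not part of this repository. It assumes the tab-separated layout with a header row containing question1, question2, and is_duplicate columns; the function name read_qqp_pairs and the example path are hypothetical.

```python
import csv
import io


def read_qqp_pairs(tsv_file):
    """Return (question1, question2, label) tuples from an open QQP TSV file.

    Assumes a header row with `question1`, `question2`, and `is_duplicate`
    columns (an assumption about the GLUE file layout).
    """
    # QUOTE_NONE keeps stray quote characters in questions from confusing the parser.
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    pairs = []
    for row in reader:
        label = row.get("is_duplicate")
        if label not in ("0", "1"):
            continue  # skip malformed or truncated rows
        pairs.append((row["question1"], row["question2"], int(label)))
    return pairs


# Tiny in-memory sample standing in for a real file such as QQP/train.tsv
# (hypothetical path).
sample = io.StringIO(
    "id\tqid1\tqid2\tquestion1\tquestion2\tis_duplicate\n"
    "0\t1\t2\tHow do I learn Python?\tWhat is the best way to learn Python?\t1\n"
    "1\t3\t4\tWhat is GLUE?\tHow tall is Everest?\t0\n"
)
sample_pairs = read_qqp_pairs(sample)
```

With a real file, the same function can be called on a handle opened with open(path, encoding="utf-8", newline="").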

Generating Augmented QQP Dataset

Run the script with an output directory (-o) and one of the three dataset variants (-d):
python generate_qqp_datasets.py -o OUTPUT_DIR -d [original_flipped | augmented | augmented_flipped]
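
To give a feel for the graph view of a paraphrase dataset, the sketch below infers new labels by transitivity: sentences linked by paraphrase edges form a cluster, every unlabeled pair inside a cluster is inferred to be a paraphrase, and a non-paraphrase edge between two clusters is extended to all cross-cluster pairs. This is an illustrative sketch only, not the code in generate_qqp_datasets.py; the function name infer_pairs and the choice to skip conflicting labels (rather than flip them) are assumptions made here.

```python
from collections import defaultdict
from itertools import combinations


def infer_pairs(labeled_pairs):
    """Infer labels for unlabeled pairs from (a, b, label) triples.

    label 1 = paraphrase, 0 = non-paraphrase. Returns a dict mapping
    sorted (a, b) tuples to the inferred label. Hypothetical helper.
    """
    labeled_pairs = list(labeled_pairs)
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Merge sentences connected by paraphrase edges into clusters.
    for a, b, label in labeled_pairs:
        ra, rb = find(a), find(b)
        if label == 1 and ra != rb:
            parent[ra] = rb

    clusters = defaultdict(set)
    for node in list(parent):
        clusters[find(node)].add(node)

    known = {frozenset((a, b)) for a, b, _ in labeled_pairs}
    inferred = {}

    # Transitivity: any two sentences in the same cluster are paraphrases.
    for members in clusters.values():
        for a, b in combinations(sorted(members), 2):
            if frozenset((a, b)) not in known:
                inferred[frozenset((a, b))] = 1

    # A non-paraphrase edge between two clusters applies to all cross pairs.
    for a, b, label in labeled_pairs:
        if label != 0:
            continue
        ra, rb = find(a), find(b)
        if ra == rb:
            continue  # label conflicts with the cluster; skipped in this sketch
        for x in clusters[ra]:
            for y in clusters[rb]:
                if frozenset((x, y)) not in known:
                    inferred[frozenset((x, y))] = 0

    return {tuple(sorted(pair)): label for pair, label in inferred.items()}


example = infer_pairs([("A", "B", 1), ("B", "C", 1), ("C", "D", 0)])
```

In this example, A and C are inferred to be paraphrases, while A and D, and B and D, are inferred to be non-paraphrases.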

BibTeX

@inproceedings{chen-etal-2020-finding,
    title = "Finding {F}riends and Flipping Frenemies: Automatic Paraphrase Dataset Augmentation Using Graph Theory",
    author = "Chen, Hannah  and
      Ji, Yangfeng  and
      Evans, David",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.findings-emnlp.426",
    doi = "10.18653/v1/2020.findings-emnlp.426",
    pages = "4741--4751"
}
