TransGEC: Improving Grammatical Error Correction with Translationese

This repository contains the code for "TransGEC: Improving Grammatical Error Correction with Translationese". Our models were trained on NVIDIA Tesla V100 (32 GB) and A100 (40 GB) GPUs.

Citation

@inproceedings{fang-etal-2023-transgec,
    title = "{T}rans{GEC}: Improving Grammatical Error Correction with Translationese",
    author = "Fang, Tao  and
      Liu, Xuebo  and
      Wong, Derek F.  and
      Zhan, Runzhe  and
      Ding, Liang  and
      Chao, Lidia S.  and
      Tao, Dacheng  and
      Zhang, Min",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2023",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.findings-acl.223",
    pages = "3614--3633",
}

Simplified Instruction

We have released the translationese GEC models (TransGEC), which are fine-tuned from the (m)T5-large pre-trained language models. If you want to explore our work quickly, the following instructions may be useful.

Step 1: Requirements and Installation

This implementation is based on huggingface/transformers (v4.13.0).
- PyTorch version >= 1.3.1
- Python version >= 3.6
```
git clone https://github.com/NLP2CT/Trans4GEC.git
cd Trans4GEC/transformers
pip install .
pip install -r requirements.txt
```
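
Before moving on, it can help to confirm that the environment meets the requirements stated above. The following is a minimal sketch and not part of the repository: the helper names `parse_version`, `meets_minimum`, and `check_environment` are hypothetical, and the script only reports mismatches.

```python
import re
import sys


def parse_version(version):
    """Turn a string such as '1.13.1+cu117' into a tuple like (1, 13, 1)."""
    parts = []
    # Drop local build tags ('+cu117') and stop at the first non-numeric piece.
    for piece in version.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)


def check_environment():
    """Collect human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < (3, 6):
        problems.append("Python >= 3.6 is required")
    try:
        import torch
        if not meets_minimum(torch.__version__, "1.3.1"):
            problems.append("PyTorch >= 1.3.1 is required, found " + torch.__version__)
    except ImportError:
        problems.append("PyTorch is not installed")
    try:
        import transformers
        # The README bases the code on v4.13.0, so flag any other minor version.
        if parse_version(transformers.__version__)[:2] != (4, 13):
            problems.append("transformers 4.13.x expected, found " + transformers.__version__)
    except ImportError:
        problems.append("transformers is not installed")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

If the script prints nothing, the installed versions are consistent with the requirements listed in this step.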

Step 2: Download Translationese (m)T5-GEC Models and Data

| Lang. | Model | Description | Model-Download | Data-Download |
|-------|-------|-------------|----------------|---------------|
| En | `TransGEC` | Fine-tuned with cLang8-en and translationese | TransGEC.en.model | data.en |
| De | `TransGEC` | Fine-tuned with cLang8-de and translationese | TransGEC.de.model | data.de |
| Ru | `TransGEC` | Fine-tuned with cLang8-ru and translationese | TransGEC.ru.model | data.ru |
| Zh | `TransGEC` | Fine-tuned with Lang8-zh and translationese | TransGEC.zh.model | data.zh |

The downloaded data is organized as follows:

data_xx/
 |--train
   |--translationese.tsv
   |--train-translationese.json
 |--dev
   |--dev.xx.json
 |--test
   |--test.xx.json
   |--test.xx.M2
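
The layout above can be checked programmatically after downloading. This is a minimal sketch under the assumption that `xx` stands for the language code (en, de, ru, zh); the function names `expected_files` and `missing_files` are hypothetical and do not come from the repository.

```python
import os
import sys


def expected_files(lang):
    """Relative paths from the layout above, with 'xx' replaced by a language code."""
    return [
        os.path.join("train", "translationese.tsv"),
        os.path.join("train", "train-translationese.json"),
        os.path.join("dev", "dev.{}.json".format(lang)),
        os.path.join("test", "test.{}.json".format(lang)),
        os.path.join("test", "test.{}.M2".format(lang)),
    ]


def missing_files(data_dir, lang):
    """Return the expected files that are not present under data_dir."""
    return [
        rel for rel in expected_files(lang)
        if not os.path.isfile(os.path.join(data_dir, rel))
    ]


if __name__ == "__main__":
    # Hypothetical usage: python check_data.py data_en en
    if len(sys.argv) == 3:
        for rel in missing_files(sys.argv[1], sys.argv[2]):
            print("missing:", rel)
```

An empty result means every file named in the layout was found under the given data directory.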

Step 3: Generation and Evaluation

To generate and evaluate with the downloaded TransGEC models, please refer to the script `transgec_generate.sh` for details.
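
For a quick interactive check outside the provided script, the sketch below loads a model through the standard transformers seq2seq API. It rests on several assumptions that this README does not confirm: that the downloaded model is a directory in the Hugging Face format, that no task prefix is required on the input, and that the beam size and maximum length shown are reasonable. The function names `batched` and `correct_sentences` are hypothetical, and `transgec_generate.sh` remains the authoritative reference for the settings used in the paper.

```python
import sys


def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def correct_sentences(model_dir, sentences, batch_size=8, num_beams=5, max_length=128):
    """Run beam-search generation over `sentences` and return the decoded outputs."""
    # Imported lazily so the batching helper works without these packages installed.
    import torch
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_dir)
    model.eval()

    outputs = []
    for batch in batched(sentences, batch_size):
        # Pad to the longest sentence in the batch and truncate overly long inputs.
        inputs = tokenizer(batch, return_tensors="pt", padding=True,
                           truncation=True, max_length=max_length)
        with torch.no_grad():
            generated = model.generate(**inputs, num_beams=num_beams,
                                       max_length=max_length)
        outputs.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
    return outputs


if __name__ == "__main__":
    # Hypothetical usage: python correct.py /path/to/model "She go to school ."
    if len(sys.argv) >= 3:
        for line in correct_sentences(sys.argv[1], sys.argv[2:]):
            print(line)
```

This only produces corrected text; scoring against the `.M2` reference files is left to the evaluation steps in the provided script.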

Usage

If you want to fine-tune the (m)T5-large pre-trained language model yourself using translationese, please follow the instructions below.

Fine-tuning

sh shell_finetune-T5/train_en.sh
sh shell_finetune-T5/train_de.sh
sh shell_finetune-T5/train_ru.sh
sh shell_finetune-T5/train_zh.sh

Generation and Evaluation

sh shell_finetune-T5/Generate_evaluate_en.sh
sh shell_finetune-T5/Generate_evaluate_de.sh
sh shell_finetune-T5/Generate_evaluate_ru.sh
sh shell_finetune-T5/Generate_evaluate_zh.sh
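
To run fine-tuning followed by generation and evaluation for several languages in one go, the scripts listed above can be chained. The following is a minimal sketch, assuming the commands are executed from the repository root and that each script needs no extra arguments; `build_command` and `run_pipeline` are hypothetical helper names, and the default is a dry run that only prints the commands.

```python
import subprocess

LANGUAGES = ("en", "de", "ru", "zh")
SCRIPT_DIR = "shell_finetune-T5"
# Script name patterns taken from the commands listed above.
STAGES = {
    "train": "train_{lang}.sh",
    "evaluate": "Generate_evaluate_{lang}.sh",
}


def build_command(stage, lang):
    """Return the argv list for one stage and one language."""
    if stage not in STAGES:
        raise ValueError("unknown stage: " + stage)
    if lang not in LANGUAGES:
        raise ValueError("unknown language: " + lang)
    return ["sh", SCRIPT_DIR + "/" + STAGES[stage].format(lang=lang)]


def run_pipeline(langs, repo_root=".", dry_run=True):
    """Fine-tune and then evaluate each language; stop at the first failing script."""
    for lang in langs:
        for stage in ("train", "evaluate"):
            command = build_command(stage, lang)
            print(" ".join(command))
            if not dry_run:
                # check=True raises CalledProcessError on a non-zero exit status.
                subprocess.run(command, cwd=repo_root, check=True)


if __name__ == "__main__":
    run_pipeline(["en"])  # dry run: prints the two commands for English
```

Passing `dry_run=False` executes the scripts in order, so evaluation for a language starts only after its fine-tuning script has exited successfully.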

Quick Links

The repository contains the following folders and scripts; please refer to them for more information on our work:

- bert-tf
- fairseq
- script
- shell_finetune-T5
- transformers
- README.md
- bpe.sh
- detokenization.sh
- processing.sh
- tokenization.sh
- transgec_generate.sh
- tsv2json.sh
