CodeTransOcean is a large-scale, comprehensive benchmark for code translation that supports the largest variety of programming languages. It consists of three novel multilingual datasets: MultilingualTrans, which supports translation between multiple popular programming languages; NicheTrans, for translating between niche programming languages and popular ones; and LLMTrans, for evaluating the executability of code translated by large language models (LLMs). CodeTransOcean also includes a novel cross-framework dataset, DLTrans, for translating deep learning code across different frameworks.

Datasets

The datasets are available on 🤗 Hugging Face or Google Drive.

Code

Experiments on the MultilingualTrans, NicheTrans, and DLTrans datasets were run with CodeT5+; the code is in the CodeT5+ file.

Experiments on the LLMTrans dataset were run with GPT-3.5; the code is in the ChatGPT file.
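Since LLMTrans targets the executability of translated code, the following is a minimal sketch of what such a check could look like for a Python target. It is not the evaluation code from this repository: the function name `runs_successfully`, its parameters, and the pass criterion (exit status 0 and, optionally, matching standard output) are assumptions made for illustration.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Optional


def runs_successfully(
    code: str,
    expected_stdout: Optional[str] = None,
    timeout: float = 10.0,
) -> bool:
    """Hypothetical helper: run `code` as a Python script and report success.

    Success means the process exits with status 0 within `timeout` seconds
    and, if `expected_stdout` is given, prints exactly that text
    (ignoring leading and trailing whitespace).
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "translated.py"
        script.write_text(code, encoding="utf-8")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            # Treat a hang as a failed execution.
            return False

    if result.returncode != 0:
        return False
    if expected_stdout is not None:
        return result.stdout.strip() == expected_stdout.strip()
    return True
```

For example, `runs_successfully("print(1 + 1)", "2")` passes, while a script that raises an exception or prints a different value does not. Running untrusted model output this way should be done in a sandboxed environment.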

Citation

Please cite the paper if you use the data or code from CodeTransOcean.

@article{yan2023codetransocean,
  title={CodeTransOcean: A Comprehensive Multilingual Benchmark for Code Translation},
  author={Yan, Weixiang and Tian, Yuchen and Li, Yunzhe and Chen, Qian and Wang, Wen},
  journal={arXiv preprint arXiv:2310.04951},
  year={2023}
}

Contact

For questions, please feel free to reach out via email at [email protected].