CodeTransOcean, a large-scale comprehensive benchmark that supports the largest variety of programming languages for code translation. CodeTransOcean consists of three novel multilingual datasets, namely, MultilingualTrans supporting translations between multiple popular programming languages, NicheTrans for translating between niche programming languages and popular ones, and LLMTrans for evaluating executability of translated code by large language models (LLMs). CodeTransOcean also includes a novel cross-framework dataset, DLTrans, for translating deep learning code across different frameworks.
The MultilingualTrans, NicheTrans, and DLTrans datasets were experimented with on CodeT5+, and the code is in the CodeT5+ file.
The LLMTrans dataset was experimented with on GPT-3.5, and the code is in the ChatGPT file.
Please cite the paper if you use the data or code from CodeTransOcean.
@article{yan2023codetransocean,
title={CodeTransOcean: A Comprehensive Multilingual Benchmark for Code Translation},
author={Yan, Weixiang and Tian, Yuchen and Li, Yunzhe and Chen, Qian and Wang, Wen},
journal={arXiv preprint arXiv:2310.04951},
year={2023}
}
For questions, please feel free to reach out via email at [email protected]
.