The impressive performance of large language models (LLMs) on code-related tasks has shown the potential of fully automated software development. In light of this, we introduce a new software engineering task, namely Natural Language to code Repository (NL2Repo), which aims to generate an entire code repository from its natural language requirements. To address this task, we propose a simple yet effective framework, CodeS, which decomposes NL2Repo into multiple sub-tasks using a multi-layer sketch. Specifically, CodeS includes three modules: RepoSketcher, FileSketcher, and SketchFiller. RepoSketcher first generates a repository's directory structure for the given requirements; FileSketcher then generates a file sketch for each file in the generated structure; finally, SketchFiller fills in the details of each function in the generated file sketches. To rigorously assess CodeS on the NL2Repo task, we carry out evaluations through both automated benchmarking and manual feedback analysis. For benchmark-based evaluation, we craft a repository-oriented benchmark, SketchEval, and design an evaluation metric, SketchBLEU. For feedback-based evaluation, we develop a VSCode plugin for CodeS and engage 30 participants in empirical studies. Extensive experiments demonstrate the effectiveness and practicality of CodeS on the NL2Repo task.
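To make the three-stage decomposition concrete, here is a minimal sketch of how RepoSketcher, FileSketcher, and SketchFiller could be chained. It is not the CodeS implementation: the prompt strings, the function names, and the `generate` callable (standing in for any LLM call) are hypothetical, and for brevity SketchFiller completes a whole file in one call rather than function by function.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for an LLM call: takes a prompt, returns generated text.
Generate = Callable[[str], str]


def repo_sketcher(requirements: str, generate: Generate) -> List[str]:
    """Stage 1: generate the directory structure, one file path per line."""
    structure = generate(f"Directory structure for:\n{requirements}")
    return [line.strip() for line in structure.splitlines() if line.strip()]


def file_sketcher(requirements: str, path: str, structure: List[str],
                  generate: Generate) -> str:
    """Stage 2: generate a sketch (imports, signatures) for one file."""
    layout = "\n".join(structure)
    return generate(f"File sketch for {path}\nLayout:\n{layout}\n"
                    f"Requirements:\n{requirements}")


def sketch_filler(path: str, sketch: str, generate: Generate) -> str:
    """Stage 3: fill in the function bodies of a file sketch."""
    return generate(f"Fill in the functions of {path}:\n{sketch}")


def nl2repo(requirements: str, generate: Generate) -> Dict[str, str]:
    """Chain the three stages and return {file path: file content}."""
    structure = repo_sketcher(requirements, generate)
    repo: Dict[str, str] = {}
    for path in structure:
        sketch = file_sketcher(requirements, path, structure, generate)
        repo[path] = sketch_filler(path, sketch, generate)
    return repo


# Usage with a canned fake model instead of a real LLM.
def fake_generate(prompt: str) -> str:
    if prompt.startswith("Directory structure"):
        return "main.py\nutils.py"
    if prompt.startswith("File sketch"):
        return "def run():\n    ..."
    return "def run():\n    return 0"


repo = nl2repo("A tiny command-line tool", fake_generate)
print(sorted(repo))  # ['main.py', 'utils.py']
```

Passing the full layout to every FileSketcher call lets each file sketch refer to sibling modules; a real system would also pass previously generated sketches.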
.
├── assets
├── clean_repo.py # ./repos/ -> ./cleaned_repos/
├── cleaned_repos
├── craft_train_data.py # ./output -> ./training_data
├── extract_sketch.py # ./cleaned_repos/ -> ./output
├── outputs
├── projects # two projects
├── prompt_construction_utils.py
├── repos
├── requirements.txt
├── run_step1_clean.sh # runs ./clean_repo.py
├── run_step2_extract_sketch.sh # runs ./extract_sketch.py
├── run_step3_make_data.sh # runs ./craft_train_data.py
├── scripts
├── train # scripts for training the code models
├── training_data
└── validation # evaluation scripts
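The layout above uses the output format of the `tree` command. If a generated directory structure follows the same format (an assumption; this README does not specify RepoSketcher's output format), it has to be turned into paths before FileSketcher can iterate over files. Below is a minimal sketch with a hypothetical `parse_tree` helper. A bare `tree` listing cannot distinguish files from empty directories, so every entry is returned.

```python
import re

# One tree entry: indentation units ("│   " or four spaces), then "├── " or "└── ".
# \s also matches the non-breaking spaces some `tree` builds emit.
ENTRY = re.compile(r"^((?:│\s{3}|\s{4})*)(?:├──|└──)\s(.*)$")


def parse_tree(text: str) -> list[str]:
    """Convert `tree`-style output to relative paths, dropping '# ...' comments."""
    paths: list[str] = []
    stack: list[str] = []  # names of the current entry's ancestors
    for line in text.splitlines():
        match = ENTRY.match(line)
        if not match:
            continue  # skips the root "." line and blank lines
        depth = len(match.group(1)) // 4
        name = match.group(2).split("#", 1)[0].strip()
        del stack[depth:]  # pop back up to this entry's parent
        stack.append(name)
        paths.append("/".join(stack))
    return paths


example = """.
├── train # scripts for training the code models
│   └── run_train_multi_gpu.sh
└── validation
    └── evaluation-scripts
        └── get_metric.py
"""
paths = parse_tree(example)
print(paths)
```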
- Download the selected repositories to the `./repos` directory and unzip them.
- Preprocess the repositories:
bash run_step1_clean.sh
- Extract instruction training data for RepoSketcher, FileSketcher, and SketchFiller:
bash run_step2_extract_sketch.sh
bash run_step3_make_data.sh
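The README does not define what a file sketch contains. Assuming it keeps imports, class skeletons, function signatures, and docstrings while eliding function bodies, a sketch can be derived from a source file with Python's standard `ast` module. The helper below (`extract_sketch` is a hypothetical name, not necessarily what `extract_sketch.py` does) requires Python 3.9+ for `ast.unparse`, which also discards comments and the original formatting.

```python
import ast


class _BodyStripper(ast.NodeTransformer):
    """Replace each function body with its docstring (if any) plus `...`."""

    def _strip(self, node):
        # Nested functions disappear along with the body, so no generic_visit here.
        kept = []
        if ast.get_docstring(node) is not None:
            kept.append(node.body[0])  # the docstring expression
        kept.append(ast.Expr(value=ast.Constant(value=...)))
        node.body = kept
        return node

    visit_FunctionDef = _strip
    visit_AsyncFunctionDef = _strip


def extract_sketch(source: str) -> str:
    """Return a file sketch: imports, classes and signatures with bodies elided."""
    tree = _BodyStripper().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


src = '''
import os

class Repo:
    def path(self, name):
        """Join name onto the repository root."""
        return os.path.join(self.root, name)

def main():
    print(Repo().path("README.md"))
'''
sketch = extract_sketch(src)
print(sketch)
```

Methods are stripped because the default `generic_visit` on the class node visits its body; module-level code outside functions is kept as-is.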
- Place the created instruction data into `./train/data` and configure `dataset_info.json` according to the structure described at https://github.com/hiyouga/LLaMA-Factory/tree/main/data.
- Edit the training configuration, then start the training process:
vim ./train/run_train_multi_gpu.sh
bash ./train/run_train_multi_gpu.sh
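For the data-registration step above, the README defers to the LLaMA-Factory data documentation. A minimal sketch, assuming the crafted records use the alpaca-style `instruction`/`input`/`output` fields (one of the formats LLaMA-Factory accepts); the helper name, the dataset name, and the example record are hypothetical, so check the column mapping against the linked documentation before training.

```python
import json
import tempfile
from pathlib import Path


def write_dataset(records: list[dict], data_dir: Path, name: str) -> None:
    """Save alpaca-style records and register them in dataset_info.json."""
    data_dir.mkdir(parents=True, exist_ok=True)
    file_name = f"{name}.json"
    (data_dir / file_name).write_text(
        json.dumps(records, ensure_ascii=False, indent=2), encoding="utf-8")

    # Merge into an existing dataset_info.json instead of overwriting it.
    info_path = data_dir / "dataset_info.json"
    info = json.loads(info_path.read_text(encoding="utf-8")) if info_path.exists() else {}
    info[name] = {
        "file_name": file_name,
        "columns": {"prompt": "instruction", "query": "input", "response": "output"},
    }
    info_path.write_text(json.dumps(info, indent=2), encoding="utf-8")


# Hypothetical SketchFiller-style record, written to a temporary ./train/data stand-in.
records = [{
    "instruction": "Fill in the function bodies of this file sketch.",
    "input": "def add(a, b):\n    ...",
    "output": "def add(a, b):\n    return a + b",
}]
with tempfile.TemporaryDirectory() as tmp:
    write_dataset(records, Path(tmp), "sketchfiller_demo")
    info = json.loads((Path(tmp) / "dataset_info.json").read_text(encoding="utf-8"))
    saved = json.loads((Path(tmp) / "sketchfiller_demo.json").read_text(encoding="utf-8"))
print(info["sketchfiller_demo"]["file_name"])  # sketchfiller_demo.json
```

Merging rather than overwriting keeps the RepoSketcher, FileSketcher, and SketchFiller datasets registered side by side in one `dataset_info.json`.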
- Install SketchBLEU in the same way as CodeBLEU.
- Perform inference on SketchEval:
python ./codes/validation/evaluation-scripts/from_scratch_inference.py
- Convert the inference results into complete repositories:
python ./codes/validation/evaluation-scripts/transfer_output_to_repo.py
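The conversion step turns per-file inference results into a directory tree on disk. A minimal sketch, assuming the results have already been loaded as a mapping from relative file paths to file contents (the actual format produced by `from_scratch_inference.py` is not described here); `write_repo` is a hypothetical helper that also refuses paths escaping the output root.

```python
import tempfile
from pathlib import Path


def write_repo(files: dict[str, str], root: Path) -> list[Path]:
    """Write {relative path: content} under root, creating parent directories."""
    root = root.resolve()
    written: list[Path] = []
    for rel_path, content in files.items():
        target = (root / rel_path).resolve()
        # Reject absolute paths and "../" tricks that would land outside root.
        if root not in target.parents:
            raise ValueError(f"path escapes repository root: {rel_path}")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content, encoding="utf-8")
        written.append(target)
    return written


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    paths = write_repo({"main.py": "print('hi')\n", "pkg/utils.py": "X = 1\n"}, root)
    names = sorted(p.relative_to(root.resolve()).as_posix() for p in paths)
    try:
        write_repo({"../evil.py": ""}, root)
        rejected = False
    except ValueError:
        rejected = True
print(names)  # ['main.py', 'pkg/utils.py']
```

The path check matters because generated paths come from model output and should not be trusted to stay inside the target directory.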
- Evaluate the generated repositories as with CodeBLEU:
python ./codes/validation/evaluation-scripts/batch_eval/get_metric.py