Code-Structure-Aware-Transformer

This is the replication package for CSA-Trans. With this repository, you can run all experiments from "CSA-Trans: Code Structure Aware Transformer for AST". To replicate the results, follow the steps below.

1. Prepare dataset

To build the dataset yourself, first download the Python and Java datasets from dataset link and place them in the /py and /java directories. Also download the tree-sitter parsers for Python and Java into a directory named tree_sitter, which should be located outside the CSA-Trans directory. The tree_sitter_parse.ipynb notebook in each of /py and /java walks through AST parsing for that language and generates the tree_sitter_python and tree_sitter_java directories.
We also provide the parsed ASTs in anonymous link.
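Before opening the notebooks, it can help to confirm that the directories are where the instructions expect them. The following is a minimal sketch, not part of the repository: the helper name missing_paths is hypothetical, and it assumes that "outside the CSA-Trans directory" means tree_sitter sits next to the repository directory as a sibling.

```python
from pathlib import Path


def missing_paths(repo_root):
    """Return the expected directories that do not exist yet.

    Assumption: tree_sitter is a sibling of the repository directory.
    """
    repo_root = Path(repo_root).resolve()
    expected = [
        repo_root / "py",                   # Python dataset and notebook
        repo_root / "java",                 # Java dataset and notebook
        repo_root.parent / "tree_sitter",   # tree-sitter parsers (assumed location)
    ]
    return [path for path in expected if not path.is_dir()]


if __name__ == "__main__":
    # Run from the repository root; prints nothing if everything is in place.
    for path in missing_paths("."):
        print(f"missing: {path}")
```

If the parsers live somewhere else on your machine, adjust the third entry to match the path used in tree_sitter_parse.ipynb.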

2. Preprocess

To preprocess the Java or Python dataset, set work_dir in process.py to either 'tree_sitter_java' or 'tree_sitter_python', then run

python process.py -data_dir ./ -max_ast_len 150 -process -make_vocab

3. Running experiments

For a single GPU, run

python main.py --config=./config/python.py --exp_type summary --g 0

For multiple GPUs (4 GPUs were used in our experiments), run

python -u -m torch.distributed.launch --nproc_per_node 4 --use_env main.py --config=./config/python.py --exp_type summary --g 0,1,2,3
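The two commands above differ only in the launcher prefix and the GPU list. As a convenience, a small wrapper can build either one from a list of GPU ids. This is a sketch only: build_command is a hypothetical helper, not part of the repository, and it simply reproduces the two command lines shown above.

```python
def build_command(config, exp_type, gpus, python="python"):
    """Build the argument list for a single- or multi-GPU run of main.py."""
    if not gpus:
        raise ValueError("at least one GPU id is required")
    gpu_arg = ",".join(str(g) for g in gpus)
    tail = ["main.py", f"--config={config}", "--exp_type", exp_type, "--g", gpu_arg]
    if len(gpus) == 1:
        # Single GPU: call main.py directly.
        return [python, *tail]
    # Multiple GPUs: one process per GPU via torch.distributed.launch.
    return [
        python, "-u", "-m", "torch.distributed.launch",
        "--nproc_per_node", str(len(gpus)), "--use_env", *tail,
    ]


if __name__ == "__main__":
    print(" ".join(build_command("./config/python.py", "summary", [0, 1, 2, 3])))
```

The returned list can be passed to subprocess.run to start training from a script.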

4. Comparing with AST-Trans and CodeScribe

For comparison with AST-Trans on the Python dataset:

Uncomment ignore_idx in process.py.
Set processed_path to ./processed_ast_trans_data/.
Run process.py.
Run python_compare_asttrans.py.

For comparison with CodeScribe:

Copy each ast.original in train/test/dev to the matching train/test/dev directory under compare_codescribe_{language}.
Run process.py with languages = ["compare_codescribe_java/"] or ["compare_codescribe_python/"].
Run python_compare_codescribe.py or java_compare_codescribe.py.
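The copy step for the CodeScribe comparison can be scripted. The sketch below is not part of the repository: copy_original_asts is a hypothetical helper, and the source directory passed to it (for example tree_sitter_python) is an assumption, since the steps above do not state where the train/test/dev directories live.

```python
import shutil
from pathlib import Path

SPLITS = ("train", "test", "dev")


def copy_original_asts(src_root, dst_root, filename="ast.original"):
    """Copy <src_root>/<split>/ast.original into <dst_root>/<split>/."""
    copied = []
    for split in SPLITS:
        src = Path(src_root) / split / filename
        if not src.is_file():
            continue  # skip splits that have not been generated
        dst_dir = Path(dst_root) / split
        dst_dir.mkdir(parents=True, exist_ok=True)
        # copy2 also preserves file metadata and returns the destination path.
        copied.append(Path(shutil.copy2(src, dst_dir / filename)))
    return copied
```

For example, copy_original_asts("tree_sitter_python", "compare_codescribe_python") would fill the Python comparison directory under that assumption; the returned list shows which splits were actually copied.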

