GitHub - AISE-TUDelft/curating-code-completions: Replication Package for "A Transformer-Based Approach for Smart Invocation of Automatic Code Completion"

A Transformer-Based Approach for
Smart Invocation of Automatic Code Completion

Aral de Moor, Arie van Deursen, and Maliheh Izadi

Delft University of Technology
AISE Lab @ Software Engineering Research Group

Full Paper (with Appendices)
Workshop Paper (AIWARE @ FSE'24)
HuggingFace Model Collection

Abstract

Transformer-based language models are highly effective for code completion, with much research dedicated to enhancing the content of these completions. Despite their effectiveness, these models come with high operational costs and can be intrusive, especially when they suggest too often and interrupt developers who are concentrating on their work. Current research largely overlooks how these models interact with developers in practice and neglects to address when a developer should receive completion suggestions. To tackle this issue, we developed a machine learning model that can accurately predict when to invoke a code completion tool given the code context and available telemetry data.

To do so, we collect a dataset of 200k developer interactions with our cross-IDE code completion plugin and train several invocation filtering models. Our results indicate that our small-scale transformer model significantly outperforms the baseline while maintaining low enough latency. We further explore the search space for integrating additional telemetry data into a pre-trained transformer directly and obtain promising results. To further demonstrate our approach's practical potential, we deployed the model in an online environment with 34 developers and provided real-world insights based on 74k actual invocations.

Appendix to the Paper

Analysis of the features used in Copilot's Filter.
Comparison of our novel tokenisation strategy against baselines.
The effect of class-distribution on CodeBERTa and Logistic Regression performance.
Performance of JonBERTa-head architecture variations.
Performance of JonBERTa-attn architecture variations.
Exhaustive comparison of alternative logistic-regression approaches for integrating (tokenised) code context with scalar telemetry features.

Offline Experiments

Our training & inference scripts are prefixed with a 1.

11_logres_classifier.py, 12_codeberta_classifier.py, and 13_jonberta_classifier.py are the training scripts for the Logistic Regression, CodeBERTa, and JonBERTa models, respectively.
modeling_jonberta.py is a PyTorch implementation of JonBERTa.
14_eval_models.ipynb contains our test setup.
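
To illustrate the idea behind an invocation filter, here is a minimal sketch of a logistic-regression filter over scalar telemetry features. It is not the code from 11_logres_classifier.py: the two features, the toy data, the function names (train_logres, should_invoke) and the 0.5 threshold are all assumptions made for this example.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logres(X, y, lr=0.5, epochs=500):
    """Fit weights and a bias with batch gradient descent on the log loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this example (predicted probability minus label).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def should_invoke(features, w, b, threshold=0.5):
    """Return True when the predicted acceptance probability reaches the threshold."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, features)) + b) >= threshold


# Hypothetical telemetry features (assumed for illustration only):
#   [normalised time since the last completion, cursor at end of line (0 or 1)]
X = [[0.1, 1.0], [0.9, 1.0], [0.2, 0.0], [0.8, 0.0], [0.5, 1.0], [0.4, 0.0]]
# Toy labels: 1 = the completion was useful, 0 = it was not.
y = [1, 1, 0, 0, 1, 0]

w, b = train_logres(X, y)
print(should_invoke([0.5, 1.0], w, b))
print(should_invoke([0.5, 0.0], w, b))
```

A filter like this sits in front of the completion model and suppresses requests that are unlikely to be useful. The CodeBERTa and JonBERTa classifiers play the same role but use the code context as input.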

Our evaluation scripts are prefixed with a 2.

20_statistics.ipynb bootstraps the results as described in the Evaluation Metrics sub-section of the Experimental Setup.
21_user_study.ipynb tracks usage data of the deployed filters.
22_codebertscore.ipynb performs CodeBERTScore computation on the accepted-completion / ground-truth pairs.
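
As a rough illustration of bootstrapping an evaluation metric, the sketch below computes a percentile-bootstrap confidence interval for a mean score. The exact procedure in 20_statistics.ipynb may differ: the percentile method, the function name bootstrap_ci, the resample count and the toy scores are assumptions for this example.

```python
import random
import statistics


def bootstrap_ci(values, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical per-example correctness of a filter (1 = correct decision).
scores = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
lower, upper = bootstrap_ci(scores)
print(f"mean={statistics.fmean(scores):.2f}, 95% CI=({lower:.2f}, {upper:.2f})")
```

Reporting an interval rather than a single number makes it easier to judge whether a difference between two models exceeds the variation expected from resampling.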

Online Experiments

Implementation of the filters for the Code4Me user study can be found in the code4me public repository on GitHub.

Models

For every hyperparameter combination, the median-performing model across dataset splits is published in our AISE Lab's Hugging Face collection.

Cite

To cite the paper, you may use:

@misc{de_moor_smart_invocation_2024,
	title = {A {Transformer}-{Based} {Approach} for {Smart} {Invocation} of {Automatic} {Code} {Completion}},
	url = {http://arxiv.org/abs/2405.14753},
	doi = {10.1145/3664646.3664760},
	author = {de Moor, Aral and van Deursen, Arie and Izadi, Maliheh},
	month = may,
	year = {2024},
}

Folders and files

results/bootstrap
11_logres_classifier.py
12_codeberta_classifier.py
13_jonberta_classifier.py
14_eval_models.ipynb
20_statistics.ipynb
21_user_study.ipynb
22_codebertscore.ipynb
99_upload_models.ipynb
README.md
a_tokenisation_strategies.py
appendix.pdf
modeling_jonberta.py
util.py
