This repository contains our exploration of machine learning methods for automatically recommending topics (tags) for software repositories.
It is currently organized as follows:
data-preparation
: Our scripts for preprocessing the data, together with the data files that contain the rules for preprocessing tags and generating sub-topics. This directory also includes the final dataset of preprocessed sub-topics along with their featured GitHub topics and sets of aliases.
machine-learning
: Our Python scripts for training various machine-learning-based algorithms for recommending topics.
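
To make the idea of topics with alias sets concrete, here is a minimal sketch of alias-based tag normalization. It is only an illustration and not the repository's actual preprocessing code: the `ALIASES` table, its entries, and the function names `build_lookup` and `normalize_tags` are all hypothetical.

```python
# Hypothetical alias table: canonical topic -> set of aliases.
# The entries are illustrative and are not taken from the repository's dataset.
ALIASES = {
    "machine-learning": {"ml", "machinelearning", "machine_learning"},
    "javascript": {"js"},
}


def build_lookup(aliases):
    """Invert the alias table into a flat alias -> canonical topic mapping."""
    lookup = {}
    for topic, names in aliases.items():
        lookup[topic] = topic  # a canonical topic maps to itself
        for name in names:
            lookup[name] = topic
    return lookup


def normalize_tags(tags, lookup):
    """Lowercase and trim each tag, map aliases to their canonical topic,
    and drop duplicates and empty tags while preserving first-seen order."""
    seen = []
    for tag in tags:
        key = tag.strip().lower()
        canonical = lookup.get(key, key)  # unknown tags pass through unchanged
        if canonical and canonical not in seen:
            seen.append(canonical)
    return seen


if __name__ == "__main__":
    lookup = build_lookup(ALIASES)
    print(normalize_tags(["ML", " js ", "machine-learning", "rust"], lookup))
```

Unknown tags are passed through unchanged in this sketch; a real pipeline might instead filter them out or apply further rules.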
The paper can be found here.