This project introduces CodeTracker, a refactoring-aware tool that generates the commit change history of method and variable declarations in a Java project with very high accuracy.
- How to cite CodeTracker
- Requirements
- How to Build and Run
- How to add as a Maven dependency
- How to Track Blocks
- How to Track Methods
- How to Track Variables
- How to Track Attributes
- How to Run the REST API
- REST API Endpoints
- Oracle
- Experiments
If you are using CodeTracker in your research, please cite the following papers:
Mehran Jodavi and Nikolaos Tsantalis, "Accurate Method and Variable Tracking in Commit History," pp. 183-195, 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE'2022), Singapore, Singapore, November 14–18, 2022.
@inproceedings{10.1145/3540250.3549079,
author = {Jodavi, Mehran and Tsantalis, Nikolaos},
title = {Accurate Method and Variable Tracking in Commit History},
year = {2022},
isbn = {9781450394130},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3540250.3549079},
doi = {10.1145/3540250.3549079},
booktitle = {Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering},
pages = {183–195},
numpages = {13},
keywords = {commit change history, refactoring-aware source code tracking},
location = {Singapore, Singapore},
series = {ESEC/FSE 2022}
}
Mohammed Tayeeb Hasan, Nikolaos Tsantalis, and Pouria Alikhanifard, "Refactoring-aware Block Tracking in Commit History," IEEE Transactions on Software Engineering, 2024.
@article{Hasan:TSE:2024:CodeTracker2.0,
author = {Hasan, Mohammed Tayeeb and Tsantalis, Nikolaos and Alikhanifard, Pouria},
journal = {IEEE Transactions on Software Engineering},
title = {Refactoring-aware Block Tracking in Commit History},
year = {2024},
pages = {1-20},
doi = {10.1109/TSE.2024.3484586}
}
Java 11.0.15 or newer
Apache Maven 3.6.3 or newer
From the command line
- Clone repository
git clone https://github.com/jodavimehran/code-tracker.git
- Change into the locally cloned repository folder
cd code-tracker
- Build code-tracker
mvn install
- Run the API usage examples shown in README
mvn compile exec:java -Dexec.mainClass="org.codetracker.Main"
Note: by default, the repository https://github.com/checkstyle/checkstyle.git is cloned into the folder "code-tracker/tmp". To change the folder where the repository is cloned, edit the field FOLDER_TO_CLONE in class org.codetracker.Main and execute mvn install again.
- Run the method tracking experiment (takes around 20 minutes for 200 tracked methods)
mvn compile exec:java -Dexec.mainClass="org.codetracker.experiment.MethodExperimentStarter"
- Run the variable tracking experiment (takes around 2 hours for 1345 tracked variables)
mvn compile exec:java -Dexec.mainClass="org.codetracker.experiment.VariableExperimentStarter"
- Run the block tracking experiment (takes around 2 hours for 1280 tracked blocks)
mvn compile exec:java -Dexec.mainClass="org.codetracker.experiment.BlockExperimentStarter"
Note: by default, the analyzed repositories are cloned into the folder "code-tracker/tmp". To change the folder where the repositories are cloned, edit the field FOLDER_TO_CLONE in class org.codetracker.experiment.AbstractExperimentStarter and execute mvn install again.
From Eclipse IDE
- Clone repository
git clone https://github.com/jodavimehran/code-tracker.git
- Import project
Go to File -> Import... -> Maven -> Existing Maven Projects
Browse to the root directory of project code-tracker
Click Finish
The project will be built automatically.
- Run the API usage examples shown in README
From the Package Explorer navigate to org.codetracker.Main
Right-click on the file and select Run as -> Java Application
- Run the method tracking experiment (takes around 20 minutes for 200 tracked methods)
From the Package Explorer navigate to org.codetracker.experiment.MethodExperimentStarter
Right-click on the file and select Run as -> Java Application
- Run the variable tracking experiment (takes around 2 hours for 1345 tracked variables)
From the Package Explorer navigate to org.codetracker.experiment.VariableExperimentStarter
Right-click on the file and select Run as -> Java Application
- Run the block tracking experiment (takes around 2 hours for 1280 tracked blocks)
From the Package Explorer navigate to org.codetracker.experiment.BlockExperimentStarter
Right-click on the file and select Run as -> Java Application
From IntelliJ IDEA
- Clone repository
git clone https://github.com/jodavimehran/code-tracker.git
- Import project
Go to File -> Open...
Browse to the root directory of project code-tracker
Click OK
The project will be built automatically.
- Run the API usage examples shown in README
From the Project tab navigate to org.codetracker.Main
Right-click on the file and select Run Main.main()
- Run the method tracking experiment (takes around 20 minutes for 200 tracked methods)
From the Project tab navigate to org.codetracker.experiment.MethodExperimentStarter
Right-click on the file and select Run MethodExperimentStarter.main()
- Run the variable tracking experiment (takes around 2 hours for 1345 tracked variables)
From the Project tab navigate to org.codetracker.experiment.VariableExperimentStarter
Right-click on the file and select Run VariableExperimentStarter.main()
- Run the block tracking experiment (takes around 2 hours for 1280 tracked blocks)
From the Project tab navigate to org.codetracker.experiment.BlockExperimentStarter
Right-click on the file and select Run BlockExperimentStarter.main()
Since version 1.0, CodeTracker is available in the Maven Central Repository. To use CodeTracker as a Maven dependency in your project, add the following snippet to your project's build configuration file:
pom.xml
<dependency>
    <groupId>io.github.jodavimehran</groupId>
    <artifactId>code-tracker</artifactId>
    <version>2.6</version>
</dependency>
build.gradle
implementation 'io.github.jodavimehran:code-tracker:2.6'
CodeTracker can track the history of code blocks in git repositories.
The code snippet below demonstrates how to print all changes performed in the history of the block for (final AuditListener listener : listeners).
The .codeElementType() call can take the following values:
CodeElementType.FOR_STATEMENT
CodeElementType.ENHANCED_FOR_STATEMENT
CodeElementType.WHILE_STATEMENT
CodeElementType.IF_STATEMENT
CodeElementType.DO_STATEMENT
CodeElementType.SWITCH_STATEMENT
CodeElementType.SYNCHRONIZED_STATEMENT
CodeElementType.TRY_STATEMENT
CodeElementType.CATCH_CLAUSE
CodeElementType.FINALLY_BLOCK
GitService gitService = new GitServiceImpl();
// Clone the repository into tmp/checkstyle unless a local copy already exists
try (Repository repository = gitService.cloneIfNotExists("tmp/checkstyle",
        "https://github.com/checkstyle/checkstyle.git")) {
    // Identify the block by its enclosing method and its line range in the start commit
    BlockTracker blockTracker = CodeTracker.blockTracker()
            .repository(repository)
            .filePath("src/main/java/com/puppycrawl/tools/checkstyle/Checker.java")
            .startCommitId("119fd4fb33bef9f5c66fc950396669af842c21a3")
            .methodName("fireErrors")
            .methodDeclarationLineNumber(384)
            .codeElementType(CodeElementType.ENHANCED_FOR_STATEMENT)
            .blockStartLineNumber(391)
            .blockEndLineNumber(393)
            .build();
    History<Block> blockHistory = blockTracker.track();
    // Print every commit in the history together with the changes it made
    for (History.HistoryInfo<Block> historyInfo : blockHistory.getHistoryInfoList()) {
        System.out.println("======================================================");
        System.out.println("Commit ID: " + historyInfo.getCommitId());
        System.out.println("Date: " +
                LocalDateTime.ofEpochSecond(historyInfo.getCommitTime(), 0, ZoneOffset.UTC));
        System.out.println("Before: " + historyInfo.getElementBefore().getName());
        System.out.println("After: " + historyInfo.getElementAfter().getName());
        for (Change change : historyInfo.getChangeList()) {
            System.out.println(change.getType().getTitle() + ": " + change);
        }
    }
    System.out.println("======================================================");
}
CodeTracker can track the history of methods in git repositories.
The code snippet below demonstrates how to print all changes performed in the history of the method public void fireErrors(String fileName, SortedSet<LocalizedMessage> errors).
GitService gitService = new GitServiceImpl();
// Clone the repository into tmp/checkstyle unless a local copy already exists
try (Repository repository = gitService.cloneIfNotExists("tmp/checkstyle",
        "https://github.com/checkstyle/checkstyle.git")) {
    // Identify the method by its name and declaration line in the start commit
    MethodTracker methodTracker = CodeTracker.methodTracker()
            .repository(repository)
            .filePath("src/main/java/com/puppycrawl/tools/checkstyle/Checker.java")
            .startCommitId("119fd4fb33bef9f5c66fc950396669af842c21a3")
            .methodName("fireErrors")
            .methodDeclarationLineNumber(384)
            .build();
    History<Method> methodHistory = methodTracker.track();
    // Print every commit in the history together with the changes it made
    for (History.HistoryInfo<Method> historyInfo : methodHistory.getHistoryInfoList()) {
        System.out.println("======================================================");
        System.out.println("Commit ID: " + historyInfo.getCommitId());
        System.out.println("Date: " +
                LocalDateTime.ofEpochSecond(historyInfo.getCommitTime(), 0, ZoneOffset.UTC));
        System.out.println("Before: " + historyInfo.getElementBefore().getName());
        System.out.println("After: " + historyInfo.getElementAfter().getName());
        for (Change change : historyInfo.getChangeList()) {
            System.out.println(change.getType().getTitle() + ": " + change);
        }
    }
    System.out.println("======================================================");
}
CodeTracker can track the history of variables in git repositories.
The code snippet below demonstrates how to print all changes performed in the history of the variable final String stripped.
GitService gitService = new GitServiceImpl();
// Clone the repository into tmp/checkstyle unless a local copy already exists
try (Repository repository = gitService.cloneIfNotExists("tmp/checkstyle",
        "https://github.com/checkstyle/checkstyle.git")) {
    // Identify the variable by its enclosing method and declaration line in the start commit
    VariableTracker variableTracker = CodeTracker.variableTracker()
            .repository(repository)
            .filePath("src/main/java/com/puppycrawl/tools/checkstyle/Checker.java")
            .startCommitId("119fd4fb33bef9f5c66fc950396669af842c21a3")
            .methodName("fireErrors")
            .methodDeclarationLineNumber(384)
            .variableName("stripped")
            .variableDeclarationLineNumber(385)
            .build();
    History<Variable> variableHistory = variableTracker.track();
    // Print every commit in the history together with the changes it made
    for (History.HistoryInfo<Variable> historyInfo : variableHistory.getHistoryInfoList()) {
        System.out.println("======================================================");
        System.out.println("Commit ID: " + historyInfo.getCommitId());
        System.out.println("Date: " +
                LocalDateTime.ofEpochSecond(historyInfo.getCommitTime(), 0, ZoneOffset.UTC));
        System.out.println("Before: " + historyInfo.getElementBefore().getName());
        System.out.println("After: " + historyInfo.getElementAfter().getName());
        for (Change change : historyInfo.getChangeList()) {
            System.out.println(change.getType().getTitle() + ": " + change);
        }
    }
    System.out.println("======================================================");
}
CodeTracker can track the history of attributes in git repositories.
The code snippet below demonstrates how to print all changes performed in the history of the attribute private PropertyCacheFile cacheFile.
GitService gitService = new GitServiceImpl();
// Clone the repository into tmp/checkstyle unless a local copy already exists
try (Repository repository = gitService.cloneIfNotExists("tmp/checkstyle",
        "https://github.com/checkstyle/checkstyle.git")) {
    // Identify the attribute by its name and declaration line in the start commit
    AttributeTracker attributeTracker = CodeTracker.attributeTracker()
            .repository(repository)
            .filePath("src/main/java/com/puppycrawl/tools/checkstyle/Checker.java")
            .startCommitId("119fd4fb33bef9f5c66fc950396669af842c21a3")
            .attributeName("cacheFile")
            .attributeDeclarationLineNumber(132)
            .build();
    History<Attribute> attributeHistory = attributeTracker.track();
    // Print every commit in the history together with the changes it made
    for (History.HistoryInfo<Attribute> historyInfo : attributeHistory.getHistoryInfoList()) {
        System.out.println("======================================================");
        System.out.println("Commit ID: " + historyInfo.getCommitId());
        System.out.println("Date: " +
                LocalDateTime.ofEpochSecond(historyInfo.getCommitTime(), 0, ZoneOffset.UTC));
        System.out.println("Before: " + historyInfo.getElementBefore().getName());
        System.out.println("After: " + historyInfo.getElementAfter().getName());
        for (Change change : historyInfo.getChangeList()) {
            System.out.println(change.getType().getTitle() + ": " + change);
        }
    }
    System.out.println("======================================================");
}
You can serve CodeTracker as a REST API.
From the command line, run:
mvn compile exec:java -Dexec.mainClass="org.codetracker.rest.REST"
To provide GitHub credentials for tracking private repositories, set the environment variables GITHUB_USERNAME and GITHUB_KEY before running the API.
set GITHUB_USERNAME=<your_username>
set GITHUB_KEY=<your_github_key>
HTTP Method: GET
Endpoint URL: /api/track
Initiates one of the four supported trackers on a given code element and returns the change history of the selected element as a JSON array. Works for all supported code element types (methods, attributes, variables, blocks).
Parameter | Type | Description |
---|---|---|
owner | String | The owner of the repository. |
repoName | String | The name of the repository. |
commitId | String | The commit Id to start tracking from. |
filePath | String | The path of the file the code element is defined in. |
selection | String | The code element to be tracked. |
lineNumber | String | The line the code element is defined on. |
gitHubToken | String | [Optional] The GitHub access token for private repositories. |
{
  "owner": "checkstyle",
  "repoName": "checkstyle",
  "filePath": "src/main/java/com/puppycrawl/tools/checkstyle/JavadocDetailNodeParser.java",
  "commitId": "119fd4fb33bef9f5c66fc950396669af842c21a3",
  "selection": "stack",
  "lineNumber": "486"
}
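As a rough illustration of how a client might call this endpoint, the sketch below builds the GET request URI from the sample parameters using only the Java standard library. It assumes the parameters are passed as URL query parameters, and the base URL http://localhost:8080 is a placeholder: the actual host and port depend on how the REST server is started. The class and method names (TrackRequestExample, buildUri, sampleParams) are hypothetical and not part of CodeTracker.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class TrackRequestExample {

    // Assumption: placeholder base URL; replace with the address of your running server.
    static final String BASE_URL = "http://localhost:8080";

    /** Builds a GET URI for the given endpoint, URL-encoding every parameter name and value. */
    static URI buildUri(String endpoint, Map<String, String> params) {
        String query = params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
        return URI.create(BASE_URL + endpoint + "?" + query);
    }

    /** The sample parameters from this section, kept in insertion order. */
    static Map<String, String> sampleParams() {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("owner", "checkstyle");
        params.put("repoName", "checkstyle");
        params.put("commitId", "119fd4fb33bef9f5c66fc950396669af842c21a3");
        params.put("filePath",
                "src/main/java/com/puppycrawl/tools/checkstyle/JavadocDetailNodeParser.java");
        params.put("selection", "stack");
        params.put("lineNumber", "486");
        return params;
    }

    public static void main(String[] args) {
        // Prints the full request URI; it can be passed to any HTTP client.
        System.out.println(buildUri("/api/track", sampleParams()));
    }
}
```

The resulting URI can be sent with any HTTP client, for example java.net.http.HttpClient, which ships with Java 11. The same helper should work for the /api/codeElementType endpoint, since it takes the same parameters.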
HTTP Method: GET
Endpoint URL: /api/codeElementType
Detects the type of the selected code element using the CodeElementLocator API and returns that type. Works for all supported code element types (methods, attributes, variables, blocks).
Parameter | Type | Description |
---|---|---|
owner | String | The owner of the repository. |
repoName | String | The name of the repository. |
commitId | String | The commit Id to start tracking from. |
filePath | String | The path of the file the code element is defined in. |
selection | String | The code element to be tracked. |
lineNumber | String | The line the code element is defined on. |
gitHubToken | String | [Optional] The GitHub access token for private repositories. |
{
  "owner": "checkstyle",
  "repoName": "checkstyle",
  "filePath": "src/main/java/com/puppycrawl/tools/checkstyle/JavadocDetailNodeParser.java",
  "commitId": "119fd4fb33bef9f5c66fc950396669af842c21a3",
  "selection": "stack",
  "lineNumber": "486"
}
The oracle we used to evaluate CodeTracker is an extension of the CodeShovel oracle. It includes the evolution history of 200 methods, as well as the evolution history of 1345 variables and 1280 blocks declared in these methods, and is available in the following links:
repositoryName: folder in which the repository is cloned
repositoryWebURL: Git repository URL
filePath: file path in the start commit
functionName: method declaration name in the start commit
functionKey: unique string key of the method declaration in the start commit
functionStartLine: method declaration start line in the start commit
variableName: variable declaration name in the start commit
variableKey: unique string key of the variable declaration in the start commit
variableStartLine: variable declaration start line in the start commit
startCommitId: start commit SHA-1
expectedChanges: list of changes on the tracked program element in the commit history of the project
parentCommitId: parent commit SHA-1
commitId: child commit SHA-1
commitTime: commit time in Unix epoch (or Unix time or POSIX time or Unix timestamp) format
changeType: type of change
elementFileBefore: file path in the parent commit
elementNameBefore: unique string key of the program element in the parent commit
elementFileAfter: file path in the child commit
elementNameAfter: unique string key of the program element in the child commit
comment: Refactoring or change description
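To make the shape of one expectedChanges entry more concrete, here is a minimal sketch of a plain Java holder for it. The field names come from the list above, but the class name ExpectedChange, the field types, and the helper commitDateUtc() are assumptions made for illustration; they are not classes shipped with CodeTracker. The helper shows how the commitTime value (seconds since the Unix epoch) converts to a readable UTC date.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

/** Hypothetical holder for one entry of the expectedChanges list; field types are assumed. */
class ExpectedChange {
    final String parentCommitId;
    final String commitId;
    final long commitTime;          // seconds since the Unix epoch
    final String changeType;
    final String elementFileBefore;
    final String elementNameBefore;
    final String elementFileAfter;
    final String elementNameAfter;
    final String comment;

    ExpectedChange(String parentCommitId, String commitId, long commitTime, String changeType,
                   String elementFileBefore, String elementNameBefore,
                   String elementFileAfter, String elementNameAfter, String comment) {
        this.parentCommitId = parentCommitId;
        this.commitId = commitId;
        this.commitTime = commitTime;
        this.changeType = changeType;
        this.elementFileBefore = elementFileBefore;
        this.elementNameBefore = elementNameBefore;
        this.elementFileAfter = elementFileAfter;
        this.elementNameAfter = elementNameAfter;
        this.comment = comment;
    }

    /** Converts the Unix epoch seconds in commitTime to a date-time in UTC. */
    LocalDateTime commitDateUtc() {
        return LocalDateTime.ofEpochSecond(commitTime, 0, ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        // All values below are made-up placeholders, not data from the oracle.
        ExpectedChange change = new ExpectedChange(
                "parent-sha", "child-sha", 1_000_000_000L, "example-change-type",
                "src/A.java", "A#before()", "src/A.java", "A#after()", "illustrative entry");
        System.out.println(change.commitId + " committed at " + change.commitDateUtc());
    }
}
```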
In the extended oracle we fixed all inaccuracies that we found in the original oracle. For example, the following methods in the original oracle are erroneously matched with another method which is extracted from their body. In fact, these methods are introduced as a result of an Extract Method refactoring.
- Training
  - checkstyle-CommonUtils-createPattern
  - checkstyle-WhitespaceAroundCheck-shouldCheckSeparationFromNextToken
  - checkstyle-WhitespaceAroundCheck-isNotRelevantSituation
  - commons-lang-EqualsBuilder-reflectionAppend
  - commons-lang-RandomStringUtils-random
  - commons-lang-NumberUtils-isCreatable
  - flink-FileSystem-getUnguardedFileSystem
  - flink-RemoteStreamEnvironment-executeRemotely
  - hibernate-orm-SimpleValue-buildAttributeConverterTypeAdapter
  - javaparser-MethodResolutionLogic-isApplicable
  - javaparser-Difference-applyRemovedDiffElement
  - javaparser-JavaParserFacade-getTypeConcrete
  - jgit-IndexDiff-diff
  - jgit-UploadPack-sendPack
  - junit4-ParentRunner-applyValidators
  - junit5-TestMethodTestDescriptor-invokeTestMethod
  - junit5-DefaultLauncher-discoverRoot
  - okhttp-Http2Connection-newStream
- Test
  - commons-io-IOUtils-toInputStream
  - commons-io-FilenameUtils-wildcardMatch
  - hadoop-SchedulerApplicationAttempt-resetSchedulingOpportunities
  - hibernate-search-ClassLoaderHelper-instanceFromName
  - spring-boot-DefaultErrorAttributes-addErrorMessage
  - lucene-solr-QueryParserBase-addClause
  - intellij-community-ModuleCompileScope-isUrlUnderRoot
  - intellij-community-TranslatingCompilerFilesMonitor-isInContentOfOpenedProject
  - mockito-MatchersBinder-bindMatchers
To avoid unnecessary processing and speed up tracking, CodeTracker excludes some files from the source code model. This exclusion may cause the change type to be misreported in some special scenarios. Although CodeTracker supports three scenarios in which additional files need to be included in the source code model, it may misreport MoveMethod changes as FileMove when the child commit model does not include the origin file of the method. In the test oracle, there are three such cases: case 1, case 2 and case 3.
As part of our experiments, we measured the execution time of CodeTracker and CodeShovel to track each method's change history in the training and testing sets. All data we recorded for this experiment and the script for generating the execution time plots are available here.
All data we collected to compute the precision and recall of CodeTracker and CodeShovel at commit level and change level are available in the following links:
detailed-tracker-training.csv
detailed-tracker-test.csv
file_name: corresponding JSON file name in the oracle
repository: Git repository URL
element_key: unique string key of the program element in the start commit
parent_commit_id: parent commit SHA-1
commit_id: child commit SHA-1
commit_time: commit time in Unix epoch (or Unix time or POSIX time or Unix timestamp) format
change_type: type of change
element_file_before: file path in the parent commit
element_file_after: file path in the child commit
element_name_before: unique string key of the program element in the parent commit
element_name_after: unique string key of the program element in the child commit
result: True Positive (TP), False Positive (FP) or False Negative (FN)
comment: Refactoring or change description
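The result column is what the per-tool totals are derived from. As a small sketch of that aggregation, the code below tallies a list of result labels into TP, FP and FN counts. It assumes the column stores the abbreviations TP, FP and FN, and it works on values that have already been extracted from the CSV, since the exact delimiter and quoting of the files are not described here. The class name ResultTally is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class ResultTally {

    /** Counts how often each of the labels TP, FP and FN occurs in the given result values. */
    static Map<String, Integer> tally(List<String> results) {
        Map<String, Integer> counts = new TreeMap<>();
        // Start every known label at zero so absent labels still appear in the output.
        for (String label : Arrays.asList("TP", "FP", "FN")) {
            counts.put(label, 0);
        }
        for (String result : results) {
            String label = result.trim().toUpperCase(Locale.ROOT);
            if (!counts.containsKey(label)) {
                throw new IllegalArgumentException("Unexpected result label: " + result);
            }
            counts.merge(label, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample values, not data from the experiment.
        System.out.println(tally(Arrays.asList("TP", "TP", "FP", "FN", "TP")));
    }
}
```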
summary-tracker-training.csv
summary-tracker-test.csv
instance: unique string key of the program element in the start commit
processing_time: total execution time in milliseconds
analysed_commits: total number of processed commits
git_log_command_calls: number of times git log command was executed (step 1 of our approach)
step2: number of times step 2 of our approach was executed
step3: number of times step 3 of our approach was executed
step4: number of times step 4 of our approach was executed
step5: number of times step 5 of our approach was executed
tp_change_type: number of True Positives (TP) for this specific change_type
fp_change_type: number of False Positives (FP) for this specific change_type
fn_change_type: number of False Negatives (FN) for this specific change_type
tp_all: total number of True Positives (TP)
fp_all: total number of False Positives (FP)
fn_all: total number of False Negatives (FN)
final.csv
tool: tool name (tracker or shovel)
oracle: oracle name (training or test)
level: change report level (commit or change)
processing_time_avg: average processing time
processing_time_median: median processing time
tp: total number of True Positives (TP)
fp: total number of False Positives (FP)
fn: total number of False Negatives (FN)
precision: precision percentage
recall: recall percentage
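For reference, the precision and recall columns can be related to the tp, fp and fn columns through the standard definitions: precision = TP / (TP + FP) and recall = TP / (TP + FN), expressed as percentages. The sketch below assumes these standard formulas; it is not taken from the scripts used in the experiments, and the class name PrecisionRecall is hypothetical.

```java
class PrecisionRecall {

    /** Precision as a percentage: TP / (TP + FP). Returns 0 when nothing was reported. */
    static double precision(long tp, long fp) {
        return (tp + fp == 0) ? 0.0 : 100.0 * tp / (tp + fp);
    }

    /** Recall as a percentage: TP / (TP + FN). Returns 0 when there was nothing to find. */
    static double recall(long tp, long fn) {
        return (tp + fn == 0) ? 0.0 : 100.0 * tp / (tp + fn);
    }

    public static void main(String[] args) {
        // Made-up counts for illustration, not results from the experiment.
        long tp = 90, fp = 10, fn = 30;
        System.out.println("precision = " + precision(tp, fp) + "%");
        System.out.println("recall    = " + recall(tp, fn) + "%");
    }
}
```

Returning 0 for an empty denominator is a design choice of this sketch; other conventions (such as reporting the value as undefined) are equally reasonable.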