SLE-900: Speed up indexing by excluding unrelated files #729
Merged
An extension point is implemented that narrows down the set of files that are actually part of a project and relevant to SonarLint. This will speed up SonarLint and lower the memory footprint inside the IDE and for SLCORE. To speed up operations on the IDE side, especially importing projects and/or opening a workspace, caches were put in place for both the new extension point and the files of a project, as calculating either is very costly. This also includes the exclusion of VCS files and narrowing the focus for Node.js-related files, as it wouldn't make sense to put this into sub-plugins.
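The VCS and Node.js exclusion described in this commit comes down to checking the segments of a path. The following is only a minimal sketch under assumptions: the class and method names are hypothetical, and the directory names (.git, .hg, .svn, node_modules) are assumed rather than taken from the plug-in.

```java
import java.nio.file.Path;
import java.util.Set;

/** Hypothetical helper; names and directory lists are assumptions for illustration. */
final class UnrelatedPathSketch {
  // Metadata directories of Git, Mercurial and Subversion (assumed list).
  private static final Set<String> VCS_DIRS = Set.of(".git", ".hg", ".svn");
  // Assumed Node.js-related directory that holds installed dependencies.
  private static final Set<String> NODE_DIRS = Set.of("node_modules");

  private UnrelatedPathSketch() {}

  /** True if any segment of the project-relative path is a VCS metadata directory. */
  static boolean insideVcsFolder(Path projectRelativePath) {
    return hasSegmentIn(projectRelativePath, VCS_DIRS);
  }

  /** True if any segment of the project-relative path is a Node.js dependency directory. */
  static boolean insideNodeJsFolder(Path projectRelativePath) {
    return hasSegmentIn(projectRelativePath, NODE_DIRS);
  }

  // A Path iterates over its name elements, so every segment is compared by name.
  private static boolean hasSegmentIn(Path path, Set<String> names) {
    for (Path segment : path) {
      if (names.contains(segment.toString())) {
        return true;
      }
    }
    return false;
  }
}
```

Matching on whole segments rather than on substrings avoids false positives for names such as my.gitignore-notes.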
All JVM related projects are touching JDT (even Maven / Gradle ones). The focus is narrowed here by removing the output folders of all source entries in the classpath and the default one that is always present for Eclipse.
All Python related projects can have virtual environments, most of them use the PyDev plug-in. The focus is narrowed here by removing the possible virtual environments that are created via Python or other tools.
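One way to spot a virtual environment is the pyvenv.cfg file that Python's venv module writes into the environment's root directory. The sketch below assumes this detection strategy and only looks at direct children of the project root; the class and method names are hypothetical, and the PyDev sub-plugin may well use a different heuristic.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper; detection via pyvenv.cfg is an assumption for illustration. */
final class VirtualEnvSketch {
  private VirtualEnvSketch() {}

  /** Returns the direct child directories of the project root that contain a pyvenv.cfg file. */
  static List<Path> findVirtualEnvironments(Path projectRoot) throws IOException {
    // Files.list must be closed, hence the try-with-resources.
    try (Stream<Path> children = Files.list(projectRoot)) {
      return children
          .filter(Files::isDirectory)
          .filter(dir -> Files.isRegularFile(dir.resolve("pyvenv.cfg")))
          .sorted()
          .collect(Collectors.toList());
    }
  }
}
```

The returned directories could then be handed to the exclusion mechanism so that nothing below them is indexed.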
As Maven projects are hierarchical in nature but flat in Eclipse we have to exclude all the content of sub-modules from being indexed in a parent project. The focus is narrowed down here as well for the output directories that are coming from Maven directly and not JDT!
As Gradle projects are hierarchical in nature but flat in Eclipse we have to exclude all the content of sub-projects from being indexed in a parent project. The focus is narrowed down here as well for the Gradle wrapper storage and the output directories coming from Gradle itself and not JDT!
Correctly react to changes done on importing a project by not stopping at the "root" of the resources. Also invalidate the cache when there are actual changes in order to not yield potentially incorrect results on a manual analysis or project import.
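The caching and invalidation described in these commits can be pictured as a per-project map that is filled lazily and dropped when a change is detected. This is a sketch under assumptions: the class name, the use of the project name as key, and the loader function are hypothetical and not taken from the plug-in.

```java
import java.nio.file.Path;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical per-project cache of excluded paths; not the plug-in's actual class. */
final class ExclusionCacheSketch {
  private final Map<String, Set<Path>> exclusionsByProject = new ConcurrentHashMap<>();
  // The costly computation (e.g. asking all contributors for their exclusions).
  private final Function<String, Set<Path>> loader;

  ExclusionCacheSketch(Function<String, Set<Path>> loader) {
    this.loader = loader;
  }

  /** Computes the exclusions on first access and serves later calls from the cache. */
  Set<Path> exclusionsFor(String projectName) {
    return exclusionsByProject.computeIfAbsent(projectName, loader);
  }

  /** Drops the cached entry so the next access recomputes it, e.g. after a resource change. */
  void invalidate(String projectName) {
    exclusionsByProject.remove(projectName);
  }
}
```

Invalidating only on actual changes keeps the cache useful while avoiding stale results on a manual analysis or project import.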
The Maven integration into Eclipse (m2e) changed the signatures of methods we use. Therefore we have to use reflection. In FileSystemSynchronizer, fix a possible array-index-out-of-bounds error that can happen when only files are removed.
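Calling a method whose signature differs between library versions can be done by trying each known signature through reflection. The sketch below uses two stand-in classes and a made-up method name (outputLocation); none of these are m2e types or methods, they only illustrate the fallback pattern.

```java
import java.lang.reflect.Method;

/** Hypothetical example of a reflective call that tolerates two method signatures. */
final class ReflectiveCallSketch {
  /** Stand-in for an older library version (not an m2e class). */
  public static final class OldFacade {
    public String outputLocation() {
      return "target/classes";
    }
  }

  /** Stand-in for a newer version where the method gained a parameter (not an m2e class). */
  public static final class NewFacade {
    public String outputLocation(boolean forTests) {
      return forTests ? "target/test-classes" : "target/classes";
    }
  }

  private ReflectiveCallSketch() {}

  static String outputLocation(Object facade) {
    Class<?> type = facade.getClass();
    try {
      try {
        // First try the parameterless signature of the older version.
        Method oldSignature = type.getMethod("outputLocation");
        return (String) oldSignature.invoke(facade);
      } catch (NoSuchMethodException oldSignatureMissing) {
        // Fall back to the signature of the newer version.
        Method newSignature = type.getMethod("outputLocation", boolean.class);
        return (String) newSignature.invoke(facade, false);
      }
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("No supported signature found", e);
    }
  }
}
```

The price of this approach is that signature mismatches surface at runtime instead of at compile time, so a clear failure message helps.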
eray-felek-sonarsource
requested changes
Sep 2, 2024
"that is their own "fault"" we can remove this comment or change it to make it sound more friendly
Based on PR feedback, the comment was overhauled. The flaky ITs were overhauled as well: they suffered from a timing issue between the cache being cleared and being accessed when new files are added during a project import.
Quality Gate passed
eray-felek-sonarsource
approved these changes
Sep 2, 2024
Summary
Currently, when a project is imported or a workspace is opened, all its files are indexed (with very light filtering) and sent to SLCORE for housekeeping (SonarLintEclipseHeadlessRpcClient#listFiles() and FileSystemSynchronizer#getSonarLintJsonFiles()).
When one or more projects are analyzed, this is done as well per project to get all relevant files to be sent to SLCORE for analysis (SelectionUtils#collectFiles()).
When any file is removed, changed, or added we listen for that and calculate that delta on every file in the project no matter whether it is relevant or not (FileSystemSynchronizer#resourceChanged() that invokes FileSystemSynchronizer#visitDeltaPostChange()).
This is very slow and inefficient, both in the performance and the high memory consumption it carries. This is especially the case for hierarchical projects as in Eclipse parent projects will contain all the sub-modules/-projects content as well but keep it hidden - even if the sub-modules/-projects are also present in the workspace.
Compilation output (e.g. bytecode)
We want to narrow down the set of files that are "present" and available to SonarLint even though they are not relevant. One example is compiled code: for the Java analysis, the bytecode is necessary to yield better results, but we don't need to index all those files; SonarLint just needs to know where the bytecode is stored to populate the analysis properties correctly, since the bytecode itself cannot be analyzed.
For this, the JDT / Maven and Gradle sub-plugins are enhanced with a new extension point to aggregate the "output" directories, which are then excluded from the indexing.
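Conceptually, each sub-plugin contributes its output directories and the core excludes everything below them. The following sketch assumes a simplified, hypothetical interface (OutputFolderProvider) in place of the real extension point and uses project-relative paths; it is not the plug-in's API.

```java
import java.nio.file.Path;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical stand-in for the new extension point; not the plug-in's actual interface. */
interface OutputFolderProvider {
  Set<Path> outputFolders();
}

final class OutputFolderExclusionSketch {
  private OutputFolderExclusionSketch() {}

  /** Collects the output folders contributed by all providers (e.g. JDT, Maven, Gradle). */
  static Set<Path> aggregate(Collection<OutputFolderProvider> providers) {
    Set<Path> all = new LinkedHashSet<>();
    for (OutputFolderProvider provider : providers) {
      all.addAll(provider.outputFolders());
    }
    return all;
  }

  /** A file is indexed only if it does not live below any aggregated output folder. */
  static boolean shouldIndex(Path projectRelativeFile, Set<Path> outputFolders) {
    for (Path folder : outputFolders) {
      if (projectRelativeFile.startsWith(folder)) {
        return false;
      }
    }
    return true;
  }
}
```

Path#startsWith compares whole name elements, so a folder named bin does not accidentally exclude a sibling named binaries.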
Sub-modules/-projects content in the parent project
We also want to narrow down the focus for projects that have sub-modules/-projects: the files of a sub-module/-project are already indexed for that sub-module/-project itself and shouldn't additionally be present and taken into account in the parent project.
For this, the Maven and Gradle sub-plugins are enhanced by excluding all the files that are not actually part of their project scope.
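The scoping rule can be illustrated with plain paths: a file belongs to the parent only if it is below the parent's root and not below any nested module root. This sketch assumes the locations of all projects are already known; how the Maven and Gradle sub-plugins actually determine sub-modules is not shown here, and all names are hypothetical.

```java
import java.nio.file.Path;
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper illustrating parent vs. sub-module scope; not the plug-in's code. */
final class SubModuleScopeSketch {
  private SubModuleScopeSketch() {}

  /** Returns the project locations that are nested inside the parent's root. */
  static List<Path> nestedModuleRoots(Path parentRoot, Collection<Path> projectLocations) {
    return projectLocations.stream()
        .filter(location -> !location.equals(parentRoot) && location.startsWith(parentRoot))
        .collect(Collectors.toList());
  }

  /** True if the file is below the parent root but not below any nested module root. */
  static boolean ownedByParent(Path file, Path parentRoot, Collection<Path> nestedRoots) {
    return file.startsWith(parentRoot)
        && nestedRoots.stream().noneMatch(root -> file.startsWith(root));
  }
}
```

With this rule, a file such as parent/module-a/src/A.java is left to module-a, while parent/pom.xml stays with the parent.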
Additional, unrelated content
In addition, we want to exclude files that might be used by a build tool or version control system. For example, in a Git repository we don't want to index the .git folder. Python, on the other hand, brings libraries and tools to have a virtual Python environment directly saved in the project that we shouldn't analyze.
For this, the Python sub-plugin is enhanced by excluding common locations in a project where virtual Python environments are saved.
For version control systems we check for the most common systems like Git, Mercurial, and Subversion and ignore their "special" directories. This is done in SonarLintUtils#insideVCSFolder(...).
For Node.js-related files, there is no sub-plugin as compared to the other sub-plugins and their changes; there is no linkable Eclipse plug-in (there is WTP, but having another optional dependency for this would be overkill). This is done in SonarLintUtils#inNodeJsRelated(...).
Testing
Exemplary results based on hierarchical Maven projects. The check is with the SonarLintEclipseRpcClient#listFiles() method called when a project is indexed. This is both for speed and number of files that are indexed and stored in memory.
SonarLint CORE w. 45.700 / 28 modules
When the project is clean (git clean -dfx), it is first imported (1), and then the workspace is re-opened (2).
When the project is dirty (mvn clean verify -DskipTests), it is first imported (1), and then the workspace is re-opened (2).
Orchestrator w. 6.800 LOC / 4 modules
When the project is clean (git clean -dfx), it is first imported (1), and then the workspace is re-opened (2).
When the project is dirty (mvn clean verify -DskipTests), it is first imported (1), and then the workspace is re-opened (2).
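A measurement of this kind needs two numbers per run: how many files a listing call returns and how long it takes. The sketch below shows one possible way to capture both around an arbitrary listing function; it is not the procedure used for the results of this PR, and the class and method names are hypothetical.

```java
import java.util.Collection;
import java.util.function.Supplier;

/** Hypothetical measurement helper; not the procedure used for the results above. */
final class IndexingMeasurementSketch {
  /** File count and elapsed wall-clock time of one listing call. */
  static final class Result {
    final int fileCount;
    final long elapsedNanos;

    Result(int fileCount, long elapsedNanos) {
      this.fileCount = fileCount;
      this.elapsedNanos = elapsedNanos;
    }
  }

  private IndexingMeasurementSketch() {}

  /** Runs the listing once and records how many entries it returned and how long it took. */
  static Result measure(Supplier<Collection<?>> listFiles) {
    long start = System.nanoTime();
    int count = listFiles.get().size();
    return new Result(count, System.nanoTime() - start);
  }
}
```

A single run is noisy, so comparing import and re-open scenarios would normally rely on several repetitions.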