Add API for retrieving the list of files that will be accessed #1659
…inting or formatting a given path Closes pinterest#1446
…lter results to files in the tempdir only as (on WindowsOS) it seems that the result may also contain files created for other tests
…ference of other tests
/**
 * Get the list of files which will be accessed by KtLint when linting or formatting the given file or directory.
 */
public fun getInputPaths(path: Path): List<Path> =
From the Gradle plugin's perspective, we want to start watching input files so that we can reload ktlint caches when their content changes. The preferred way is to call KtLint#reloadEditorConfigFile, whose name suggests it's meant to be used with editorconfig files only.
So I started wondering whether these two shouldn't be symmetrical 🤔 With getInputPaths() and reloadEditorConfigFile, each library consumer who also experiences ktlint config being cached across re-runs will have to filter out unknown files. If we had a separate method per type (getEditorConfigPaths + reloadEditorConfigFile) or more generic methods (getInputPaths + reloadInput), it would be clear how they interact with each other.
I know that currently mostly .editorconfig files will be returned here; I'm just trying to future-proof the API.
For me it works to rename getInputPaths to getEditorConfigFilePaths and leave out the source file itself, which I added solely because getInputPaths sounded more generic ;-)
I see no need for the generic methods (getInputPaths + reloadInput) until we actually have other sources of input.
if (path.isDirectory()) {
    result += findEditorConfigsInSubDirectories(normalizedPath)
}
I started wondering if there shouldn't be a way to exclude searching in subdirectories 🤔 I initially thought of passing the project root directory as the path, but that would result in traversing all files and folders, including the build, docs or resources directories, which would be wasteful.
Gradle plugins would have to be smarter, and we'd have to figure out a way to identify the root folder for all directories containing Kotlin sources, which unfortunately I failed to do :/ For the Kotlin plugin we get SourceDirectorySet#getSrcDirTrees, which would allow us to identify multiple root directories, but I'm unaware of a similar way to obtain such information for other plugins (for example, for the Android Gradle Plugin).
Given all of the above, I started leaning towards ignoring subdirectories for performance reasons (and I believe sharing common editorconfig settings for a compilation unit is a fairly common situation). With the subdirectory traversal being optional, ktlint consumers could switch to a more precise input calculation if they are able to identify a proper top-most path.
Do you have any thoughts, ideas or advice here?
I can link my comment on the same topic: jeremymailen/kotlinter-gradle#265 (comment)
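To make the proposal concrete, here is a minimal sketch of what an optional subdirectory traversal could look like. The function name `findEditorConfigs` and the `includeSubdirectories` flag are hypothetical and not part of this PR; the sketch also skips the parent-directory lookup and the hidden-directory filtering that the real implementation performs.

```kotlin
import java.nio.file.Files
import java.nio.file.Path

// Hypothetical signature for illustration only; the PR does not define this flag.
fun findEditorConfigs(directory: Path, includeSubdirectories: Boolean = true): List<Path> {
    // Depth 1 visits only the direct children of `directory`;
    // Int.MAX_VALUE walks the whole tree below it.
    val maxDepth = if (includeSubdirectories) Int.MAX_VALUE else 1
    val result = mutableListOf<Path>()
    // Files.walk returns a lazily populated stream that must be closed.
    Files.walk(directory, maxDepth).use { paths ->
        paths
            .filter { it.fileName?.toString() == ".editorconfig" && Files.isRegularFile(it) }
            .forEach { result.add(it) }
    }
    return result.sorted()
}
```

A consumer that cannot identify a precise top-most path could then call `findEditorConfigs(projectDir, includeSubdirectories = false)` and accept that nested `.editorconfig` files are not watched.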
Why do you think the performance impact of scanning all subdirectories is too big? This call should be made only once. For testing KtLint I always run KtLint on a collection of open source projects which in total contain 2,800+ directories and 5,000+ files (both excluding files in .git and build dirs), and the scanning is really fast.
I have not encountered many use cases in which a project contains multiple .editorconfig files. But I think it would be really confusing for developers if the .editorconfig file in the root of the project and its parents were treated differently from .editorconfig files in subdirectories.
As with all performance issues, my advice would be first to measure before deciding to optimize.
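As a rough illustration of such a measurement, the sketch below times a plain directory walk that counts `.editorconfig` files. It is only a stand-in for the real lookup: `countEditorConfigs` and `averageScanMicros` are hypothetical helpers, and the numbers depend heavily on the file system cache and JVM warm-up.

```kotlin
import java.nio.file.Files
import java.nio.file.Path
import kotlin.system.measureNanoTime

// Counts the `.editorconfig` files below `root`; stands in for the real lookup.
fun countEditorConfigs(root: Path): Long =
    Files.walk(root).use { paths ->
        paths.filter { it.fileName?.toString() == ".editorconfig" }.count()
    }

// Runs the scan `repetitions` times and returns the average duration in microseconds.
fun averageScanMicros(root: Path, repetitions: Int = 10): Long {
    require(repetitions > 0) { "repetitions must be positive" }
    var totalNanos = 0L
    repeat(repetitions) {
        totalNanos += measureNanoTime { countEditorConfigs(root) }
    }
    return totalNanos / repetitions / 1_000
}
```

Calling `averageScanMicros(projectDir)` on a representative project gives a first impression of whether the traversal cost is worth optimizing at all.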
> Why do you think the performance impact of scanning all subdirectories is too big?
I don't know. I haven't ever done any sort of file performance work. My assumption was that doing things takes more time than completely avoiding them 😛
> This call should be made only once
From the Gradle plugin's perspective, it's once per task invocation, so every time someone calls ./gradlew ktlintTask.
> As with all performance issues, my advice would be first to measure before deciding to optimize.
Sure, sounds reasonable 👍 I'll make sure to do extra profiling when switching to the new API 👍 Thanks for answering my doubts (and for finding time to work on the new API) 🙏
@mateuszkwiecinski The change has been merged to master. Can you please test the API against the latest snapshot?
@paul-dingemans Thanks! So yeah, the conclusion I draw is that I'd have to do some research and see if I can identify, in an idiomatic way, the top-most folder I should pass, so that looking for nested editorconfig files doesn't come at such a significant cost :/ If possible, I'd still propose adding the ability to parameterize the search, so consumers could aim to use the default implementation but always have the option of a more performant and less correct alternative. If that's not something you'd be open to introducing, I'll have to dig into the Gradle, AGP and Kotlin Plugin APIs and will be back with similar, more precise benchmarks based on a proper implementation (assuming I manage to come up with one).
I have very limited knowledge and expertise with Gradle. But having hundreds of executions of the Is there any chance that you can check that the number of invocations of
Scanning
500ms does not sound bad, except when you do it hundreds of times of course ;-)
Would it be of any help in case you can pass a list of paths to
I am still not convinced that parameterization is the solution. At least it seems a solution for your problem based on the assumption that most projects do not use multiple
…accessed Closes pinterest#1446 Closes pinterest#1659
Caching has been added. Please check out the new snapshot. There should be some log lines:
I'm fairly confident I know when things are called by Gradle. I aim to have the new API called once per source set, so it finds only the related editorconfig files. Here's the branch: link.
You mean the plugin would have to scan all subdirectories, find
I tested the latest snapshot and there is an observable improvement: you managed to reduce the time to scan directories from 200μs to 15-17μs 🚀 The current state of my WIP branch is as close as I could get while trying to find a balance between correctness and performance. Ideally, I'd be able to replace the TODO
Description
The list of files which will be accessed by KtLint when linting or formatting a given path can now be retrieved with the new API KtLint.getInputPaths(path: Path): List<Path>. Currently, the .editorconfig files are the only files which will be accessed during linting or formatting in addition to the source files themselves.
This API can be called with either a file or a directory. Its intended usage is that it is called once with a directory path before actually linting or formatting files. When called with a directory path, all .editorconfig files in the directory or any of its subdirectories (except hidden directories) are returned. In case the given directory does not contain an .editorconfig file, or if that file does not contain the root=true setting, the parent directories are scanned as well until a root .editorconfig file is found.
Calling this API with a file path results in the .editorconfig files that will be accessed when processing that specific file. In case the directory in which the file resides does not contain an .editorconfig file, or if that file does not contain the root=true setting, the parent directories are scanned until a root .editorconfig file is found.
Closes #1446
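The parent-directory lookup described above can be sketched as follows. This is only an illustration under simplifying assumptions, not KtLint's implementation: `editorConfigsUpToRoot` and `declaresRoot` are hypothetical names, and the root check matches any `root = true` line instead of parsing the EditorConfig preamble properly.

```kotlin
import java.nio.file.Files
import java.nio.file.Path

// Hypothetical helper: collects the `.editorconfig` files that apply to `path`
// by walking from its directory towards the file system root, stopping after
// the first file that declares `root = true`.
fun editorConfigsUpToRoot(path: Path): List<Path> {
    val start = path.toAbsolutePath().normalize()
    var dir: Path? = if (Files.isDirectory(start)) start else start.parent
    val result = mutableListOf<Path>()
    while (dir != null) {
        val candidate = dir.resolve(".editorconfig")
        if (Files.isRegularFile(candidate)) {
            result += candidate
            if (declaresRoot(candidate)) break
        }
        dir = dir.parent
    }
    return result
}

// Simplified assumption: any line equal to `root = true` (ignoring whitespace
// and case) counts, regardless of where it appears in the file.
fun declaresRoot(editorConfig: Path): Boolean =
    Files.readAllLines(editorConfig).any { line ->
        line.replace(" ", "").replace("\t", "").equals("root=true", ignoreCase = true)
    }
```

The result is ordered from the closest `.editorconfig` file to the root one; if no file declares `root = true`, the walk continues up to the file system root.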
Checklist
CHANGELOG.md is updated
In case of adding a new rule: