scm: implement gitignore #5243
Merged
skshetry added the enhancement (Enhances DVC), performance (improvement over resource / time consuming tasks), and skip-changelog (Skips changelog) labels on Jan 11, 2021
skshetry commented on Jan 11, 2021
dvc/scm/git/backend/dulwich.py (outdated), comment on lines 228 to 235
+    @cached_property
+    def ignore_manager(self):
+        from dulwich.ignore import IgnoreFilterManager
+
+        return IgnoreFilterManager.from_repo(self.repo)
+
-        manager = ignore.IgnoreFilterManager.from_repo(self.repo)
-        return manager.is_ignored(relpath(path, self.root_dir))
+    def is_ignored(self, path: str) -> bool:
+        return self.ignore_manager.is_ignored(relpath(path, self.root_dir))
Looks like I have to drop this change and make it start using pygit2
by default.
skshetry changed the title from "scm: implement gitignore; optimize on dulwich" to "scm: implement gitignore" on Jan 11, 2021
Okay, decided to implement
skshetry force-pushed the gitignore branch 2 times, most recently from 9c66e11 to 283fed8 (January 11, 2021 08:40)
* Add is_ignored implementation on `pygit2` and `gitpython` backends.
* Also implement `._reset()` on `repo.scm`, which is most of the changes.
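The pairing of a cached ignore manager with a `._reset()` hook can be sketched roughly as follows. This is only an illustration, not the code from this PR: `PatternFilter` is a hypothetical stand-in for dulwich's `IgnoreFilterManager`, `Backend` and `load_patterns` are made-up names, and `functools.cached_property` (Python 3.8+) is assumed in place of whatever `cached_property` helper DVC uses.

```python
import fnmatch
from functools import cached_property


class PatternFilter:
    """Hypothetical stand-in for an ignore manager built from .gitignore rules."""

    def __init__(self, patterns):
        self.patterns = list(patterns)

    def is_ignored(self, path):
        # Simplified matching; real gitignore semantics are richer than fnmatch.
        return any(fnmatch.fnmatch(path, pat) for pat in self.patterns)


class Backend:
    """Hypothetical SCM backend that builds its ignore manager lazily, once."""

    def __init__(self, load_patterns):
        self._load_patterns = load_patterns  # callable returning current patterns
        self.builds = 0  # counts how often the manager was (re)built

    @cached_property
    def ignore_manager(self):
        self.builds += 1
        return PatternFilter(self._load_patterns())

    def is_ignored(self, path):
        return self.ignore_manager.is_ignored(path)

    def _reset(self):
        # cached_property stores its value in the instance __dict__;
        # removing the entry forces a rebuild on the next access.
        self.__dict__.pop("ignore_manager", None)
```

Until `_reset()` is called, later changes to the ignore rules are not seen, which is the trade-off the cached approach accepts in exchange for cheaper repeated `is_ignored` checks.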
pmrowla approved these changes on Jan 11, 2021
Great stuff, @skshetry, thank you! 🙏
[UPDATE]: Went on to implement the `scm._reset()` function and use `dulwich` with its `IgnoreFilterManager` cached. Regarding `pygit2`, leaving it as-is till the migrations.

If we start making a lot of `is_ignored` checks, `dulwich` is a lot slower than `pygit2`. I'm toying with the idea of not walking through `.gitignore`-d files/directories for `repo.stage` and `repo.graph` collection.

On the master version, the performance of `dvc status` is as follows:

[timing output not included]

On a modified version that does not traverse git-ignored directories, the performance with `dulwich`/`gitpython` and `pygit2`, respectively, is:

[timing output not included]

Note that `dulwich`'s performance can be made comparable to `pygit2` if we cache the `IgnoreFilterManager`, but by doing so, we won't be able to use it easily in the API (we need to reset it somewhere/somehow).

Just to compare, this is the performance with `IgnoreFilterManager` cached:

[timing output not included]

There is some deviation in `dulwich` for some reason, but it is comparable with `pygit2`. Another reason to choose `dulwich` is that we can clear the state of the `.gitignore`, whereas `pygit2` does not provide an API to do so (the only way is clearing the backends, I think).

❗ I have followed the Contributing to DVC checklist.
📖 If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.
Thank you for the contribution - we'll try to review it as soon as possible. 🙏
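The idea from the description of not walking into git-ignored directories could look roughly like the sketch below. It is not DVC's actual collection code: `walk_unignored` is a hypothetical helper, and `is_ignored` is assumed to be any callable that takes a path and returns a bool (for example, a backend's `is_ignored` method).

```python
import os


def walk_unignored(root, is_ignored):
    """Yield file paths under root, skipping ignored files and directories."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from descending into
        # ignored directories, so their contents are never visited.
        dirnames[:] = [
            d for d in dirnames if not is_ignored(os.path.join(dirpath, d))
        ]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not is_ignored(path):
                yield path
```

Pruning at the directory level means one `is_ignored` check replaces a check per file inside an ignored tree, which is where most of the saving would be expected to come from.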