dvc: get rid of CleanTree #4221
CleanTree is a very awkward wrapper that spreads context across the codebase and causes unexpected behaviour wherever it is used. This PR starts moving dvcignore-related logic into the trees themselves (which makes a lot of sense, much like `state`) so that each tree can deal with it however it likes.

There are at least two temporary ugly parts in this PR:
1) dvcignore is used by individual trees and is not packed into tree/base.py yet;
2) the `dvcignore_root` argument. It exists because dvcignore currently tries to collect everything top-down, starting from a certain root dir. What it should do instead is, for each path being checked, walk up through the parent directories until it finds the repo root (the dir containing .dvc) and then stop. That would handle subrepos as well. At the same time, we need to leverage the existing dvcignore trie structure to cache those results.
root = self.dvcignore_root or self.tree_root
if not self.use_dvcignore:
    return DvcIgnoreFilterNoop(self, root)
self.use_dvcignore = False
Avoiding recursion. Could wrap this hack in `try: finally`, but it will be replaced in the following patch anyway.
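For reference, the try/finally variant mentioned here might look something like the sketch below. The `Tree` class and `_build_filter` are hypothetical stand-ins, not the actual tree code; the only point is that the flag is restored even if building the filter raises.

```python
class Tree:
    """Hypothetical stand-in for a tree that builds a dvcignore filter."""

    def __init__(self):
        self.use_dvcignore = True

    def _build_filter(self, fail=False):
        # While the filter is being built the tree must not consult
        # dvcignore again, otherwise the build would recurse into itself.
        assert not self.use_dvcignore
        if fail:
            raise RuntimeError("walk failed")
        return "filter"

    def dvcignore(self, fail=False):
        if not self.use_dvcignore:
            return "noop"
        self.use_dvcignore = False
        try:
            return self._build_filter(fail)
        finally:
            # Restore the flag on both the success and the error path.
            self.use_dvcignore = True
```

Without the `finally`, an exception during the build would leave `use_dvcignore` switched off for every later call on the same tree.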
if self._git_object_by_path(path) is None:
    return False

return not self.dvcignore.is_ignored_file(
    path
) and not self.dvcignore.is_ignored_dir(path)
Just mimicking the old CleanTree. We could reconsider whether we really need to deny direct access here. E.g. you can force `git add` for a gitignored file.
Related to #4050
❗ I have followed the Contributing to DVC checklist.
📖 If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.
❌ I will check DeepSource, CodeClimate, and other sanity checks below. (We consider them recommendatory and don't expect everything to be addressed. Please fix things that actually improve code or fix bugs.)
Thank you for the contribution - we'll try to review it as soon as possible. 🙏