-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dvc: get rid of can_be_skipped logic #5385
Conversation
if stage.can_be_skipped: | ||
stage = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, I find this helpful though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@skshetry But it was only used for this semi-run-cache, while it wasn't the behavior in real run-cache. So inconsistent and confusing 🙁 Thinking if maybe we could introduce some flag that will signal that the stage was actually ran. E.g. if stage.ran: ...
or something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or stage.restored
. Seems to be more useful than returning None
, as you have more information and more control.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but, for dvc add
-ed files, it would use changed_outs()
result to determine this, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@skshetry You mean ran/restored
? Those are close to meaningless for dvc add
, as they are more of a pipeline thing. We were calling commit
in is_cached
anyway, so without it we'll just verify that outputs and cache are in place. If anyone needs to know whether or not there have been any workspace changes, they could call status
before running dvc add
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, good point! Need to remove that filtering too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant the following:
$ dvc add foo
Stage is cached, skipping
0% Add| |0/0 [00:00, ?file/s]
vs the current PR:
$ dvc add foo
100% Add|███████████████████████████████████████████████████████████████|1/1 [00:00, 4.34file/s]
To track the changes with git, run:
git add foo.dvc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@skshetry Previous one was confusing #4529 .
The current one is related to the way we have the progress bar in add()
. Definitely could consider running the targets through dvc status
first to filter-out unchanged ones and not reflect them in the progress bar. Or to somehow pass the pbar into the saving logic and let it handle the progress. To me 0% seems confusing, and the current PR behavior makes more sense, as it is a corner case of you already having the data in the cache (e.g. some colleague already added it to the shared cache, or you've already added it in some other branch). No strong opinion though, this is murky ui water. 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My concern was with "is cached" part, which I find helpful, and there's no need for a hint in that case as well. But, fine for now.
I'll create a separate issue with dvcfile.remove()
which I find unsafe here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That issue already exists 😄 #3362
stage.outs[0].desc = desc | ||
Dvcfile(repo, stage.path).remove() | ||
if desc: | ||
stage.outs[0].desc = desc | ||
|
||
repo._reset() # pylint: disable=protected-access |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated to this PR, we can check the graph's correctness first without doing other stuff and fail. I don't think we need to keep resetting it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. 👍🏽
Leftover from iterative#5385
This is a legacy logic that we used to have before run-cache. At this point it is not needed and is broken in some scenarios.
Part of #4841
Related to #5368
Fixes #4529
❗ I have followed the Contributing to DVC checklist.
📖 If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.
Thank you for the contribution - we'll try to review it as soon as possible. 🙏