Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dvc: get rid of can_be_skipped logic #5385

Merged
merged 1 commit into from
Feb 2, 2021
Merged

Conversation

efiop
Copy link
Contributor

@efiop efiop commented Feb 2, 2021

This is a legacy logic that we used to have before run-cache. At this point it is not needed and is broken in some scenarios.

Part of #4841
Related to #5368
Fixes #4529

Thank you for the contribution - we'll try to review it as soon as possible. 🙏

@efiop efiop requested a review from skshetry February 2, 2021 09:58
Comment on lines -258 to -259
if stage.can_be_skipped:
stage = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, I find this helpful though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@skshetry But it was only used for this semi-run-cache, while it wasn't the behavior in real run-cache. So inconsistent and confusing 🙁 Thinking if maybe we could introduce some flag that will signal that the stage was actually ran. E.g. if stage.ran: ... or something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or stage.restored. Seems to be more useful than returning None, as you have more information and more control.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but, for dvc add-ed files, it would use changed_outs() result to determine this, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@skshetry You mean ran/restored? Those are close to meaningless for dvc add, as they are more of a pipeline thing. We were calling commit in is_cached anyway, so without it we'll just verify that outputs and cache are in place. If anyone needs to know whether or not there have been any workspace changes, they could call status before running dvc add.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, good point! Need to remove that filtering too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant the following:

$ dvc add foo
Stage is cached, skipping                                                                        
  0% Add|                                                               |0/0 [00:00,     ?file/s]

vs the current PR:

$ dvc add foo
100% Add|███████████████████████████████████████████████████████████████|1/1 [00:00,  4.34file/s]

To track the changes with git, run:

	git add foo.dvc

Copy link
Contributor Author

@efiop efiop Feb 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@skshetry Previous one was confusing #4529 .

The current one is related to the way we have the progress bar in add(). Definitely could consider running the targets through dvc status first to filter-out unchanged ones and not reflect them in the progress bar. Or to somehow pass the pbar into the saving logic and let it handle the progress. To me 0% seems confusing, and the current PR behavior makes more sense, as it is a corner case of you already having the data in the cache (e.g. some colleague already added it to the shared cache, or you've already added it in some other branch). No strong opinion though, this is murky ui water. 🙂

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern was with "is cached" part, which I find helpful, and there's no need for a hint in that case as well. But, fine for now.
I'll create a separate issue with dvcfile.remove() which I find unsafe here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That issue already exists 😄 #3362

stage.outs[0].desc = desc
Dvcfile(repo, stage.path).remove()
if desc:
stage.outs[0].desc = desc

repo._reset() # pylint: disable=protected-access
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to this PR, we can check the graph's correctness first without doing other stuff and fail. I don't think we need to keep resetting it.

Copy link
Member

@skshetry skshetry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. 👍🏽

@efiop efiop merged commit 8932e0b into iterative:master Feb 2, 2021
@efiop efiop deleted the fix-5368 branch February 2, 2021 11:13
efiop added a commit to efiop/dvc that referenced this pull request Feb 2, 2021
@efiop efiop mentioned this pull request Feb 2, 2021
2 tasks
efiop added a commit that referenced this pull request Feb 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

add: improve message for overlapping outputs
2 participants