Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make comparisons in is_cached independent of order #3731

Merged
merged 1 commit into from
May 4, 2020

Conversation

skshetry
Copy link
Member

@skshetry skshetry commented May 4, 2020

On is_cached, we used to compare two dicts: one of current stage in memory
created by run and one that's already written to the file.
As outs and dicts are lists (which is generate by dumpd), the
comparison was dependent on the order of outs and deps.

So, the run before and now, can be in different order and would fail the comparisons.

Also, given that we can compare without order, we can make pipeline files sorted by outs and deps.

Plus, in the post-pipeline-file world, we don't even need is_cached. We could just throw
DuplicateStage or OutputDuplicationError. It's convenient of course, to be able to dvc run
same command many times but, I'm not sure if it's worth the hassle of having is_cached for just that.

  • ❗ I have followed the Contributing to DVC checklist.

  • 📖 If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here. If the CLI API is changed, I have updated tab completion scripts.

  • ❌ I will check DeepSource, CodeClimate, and other sanity checks below. (We consider them recommendatory and don't expect everything to be addressed. Please fix things that actually improve code or fix bugs.)

Thank you for the contribution - we'll try to review it as soon as possible. 🙏

On is_cached, we used to compare two dicts: one of current stage in memory
created by `run` and one that's already written to the file.
As `outs` and `dicts` are lists (which is generate by `dumpd`), the
comparison was dependent on the order of outs and dicts.

So, the `run` before and now, can be in different order and would fail
the comparisons.
@skshetry skshetry added the enhancement Enhances DVC label May 4, 2020
@skshetry skshetry requested review from pmrowla, efiop and pared May 4, 2020 06:31
@skshetry skshetry self-assigned this May 4, 2020
Copy link
Contributor

@efiop efiop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a test for this?

@skshetry
Copy link
Member Author

skshetry commented May 4, 2020

@efiop, we already have a test for pipeline stage that should cover this (tests.func.test_run_multistage.test_multi_stage_run_cached).

The external behavior that needs to be tested is following:

dvc run  -d foo0 -d foo1 -o bar0 -o bar1 "cp foo0 bar0 && cp foo1 bar1"
dvc run  -d foo1 -d foo0 -o bar0 -o bar1 "cp foo0 bar0 && cp foo1 bar1"

Previous behavior considered these to be different, and would ask for overwriting the file bar0.dvc it would output. Now, it'll be considered as cached.

@efiop efiop merged commit 2dda540 into iterative:master May 4, 2020
@skshetry skshetry deleted the unordered_compare_is_cached branch May 4, 2020 08:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhances DVC
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants