Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fill_from_lock called when stage.cmd_chaged == True #3918

Closed
tall-josh opened this issue May 31, 2020 · 5 comments · Fixed by #3922
Closed

fill_from_lock called when stage.cmd_chaged == True #3918

tall-josh opened this issue May 31, 2020 · 5 comments · Fixed by #3922
Labels
bug Did we break something? p0-critical Critical issue. Needs to be fixed ASAP.

Comments

@tall-josh
Copy link

Bug Report

Using mutlistage dvc file. dvc repro for the fist time on dvc.yaml works no issue. However, if I change the cmd in dvc.yaml the new command is not run, instead dvc says Stage 'st2' is cached (interestingly it omits skipping). After doing some digging I noticed this

https://github.com/iterative/dvc/blob/master/dvc/stage/loader.py#L178

Is using the lock_data correct? If the cmd has changed, shouldn't we use the stage_data?

Please provide information about your setup

dvc.yaml

stages:
  st1:
    deps:
    - data.txt
    outs:
    - base_dir/stage1_output.txt
    cmd: >-
        mkdir base_dir;
        cat data.txt > base_dir/stage1_output.txt
  st2:
    deps:
    - base_dir/stage1_output.txt
    outs:
    - base_dir/stage2_output.txt
    cmd: >-
        echo SOMETHING
        > base_dir/stage2_output.txt;
        cat base_dir/stage1_output.txt >>
        base_dir/stage2_output.txt

$ dvc repro dvc.yaml

Correctly runs both stages

Modify cmd in dvc.yaml ie: set st2.cmd to

    cmd: >-
        echo SOMEOTHERTHING
        > base_dir/stage2_output.txt;
        cat base_dir/stage1_output.txt >>
        base_dir/stage2_output.txt

$ dvc repro dvc.yaml

Expected Output

Does not pickup change in st2.cmd


$ dvc version
DVC version: 1.0.0a6+68bea2.mod
Python version: 3.7.6
Platform: Linux-4.15.0-99-generic-x86_64-with-debian-buster-sid
Binary: False
Package: None
Supported remotes: http, https
Cache: reflink - not supported, hardlink - supported, symlink - supported
Repo: dvc, git

Additional Information (if any):

If applicable, please also provide a --verbose output of the command, eg: dvc add --verbose.

@tall-josh tall-josh added bug Did we break something? triage Needs to be triaged labels May 31, 2020
@triage-new-issues triage-new-issues bot removed the triage Needs to be triaged label May 31, 2020
@skshetry skshetry added the p0-critical Critical issue. Needs to be fixed ASAP. label May 31, 2020
@skshetry
Copy link
Member

Hi, @tall-josh, thanks a lot for a great bug report/repro.

Looks like _is_cached() here only cares if deps/outs are cached already.

dvc/dvc/stage/run.py

Lines 81 to 86 in 0bb49d2

def _is_cached(stage):
cached = (
not stage.is_callback
and not stage.always_changed
and stage.already_cached()
)

A quick-fix (untested) might be something like:

diff --git a/dvc/stage/run.py b/dvc/stage/run.py
index 53bab38d..59086365 100644
--- a/dvc/stage/run.py
+++ b/dvc/stage/run.py
@@ -82,6 +82,7 @@ def _is_cached(stage):
     cached = (
         not stage.is_callback
         and not stage.always_changed
+        and not stage.changed_stage()
         and stage.already_cached()
     )
     if cached:

Would you like to give it a try? 🙂

@skshetry
Copy link
Member

Is using the lock_data correct? If the cmd has changed, shouldn't we use the stage_data?

The reason why it uses lock_data is because we might need to do checkouts with stale lock data even if cmd has changed.

@tall-josh
Copy link
Author

Hi @skshetry. I tried commenting out cls.fill_from_lock(stage, lock_data) and it appeared to do what I wanted, but I'm unsure if it would bite me in the arse in the future. I'll give your suggestion a go now.

@tall-josh
Copy link
Author

tall-josh commented May 31, 2020

@skshetry Looks like it worked :-) I have been playing with adding templating for a vars section in the multistage dvc.yaml. The parsing function is a bit raw bit it looks like it's doing what I want it to.

I've chucked it on a fork if you're interested

I'll link this on the #3633 thread

skshetry added a commit to skshetry/dvc that referenced this issue Jun 1, 2020
Previously, the stage would not rerun if the deps/outs are cached
even if the cmd has changed.

Versions affected: >0.93.0

Fixes iterative#3918
skshetry added a commit that referenced this issue Jun 1, 2020
Previously, the stage would not rerun if the deps/outs are cached
even if the cmd has changed.

Versions affected: >0.93.0

Fixes #3918
@tall-josh
Copy link
Author

Loving how responsive you guys are. Amazing! Definitely owe y'all 🍻

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Did we break something? p0-critical Critical issue. Needs to be fixed ASAP.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants