Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

exp run: pass recursive args to checkout #6458

Merged
merged 1 commit into from
Aug 18, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions dvc/repo/experiments/executor/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -348,6 +348,7 @@ def filter_pipeline(stages):
quiet=True,
allow_missing=True,
checkpoint_reset=checkpoint_reset,
recursive=kwargs.get("recursive", False),
Copy link
Member Author

@skshetry skshetry Aug 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pmrowla, Is it not possible to checkout as we go run-ing? This way, it may miss a lot of things. foreach based stages come to mind, as only repro supports them. Also, glob is not supported by checkout.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah we could do that. I think we'd just need to pass a flag into repro (and subsequently stage.run) that indicates we need to checkout(allow_missing=True) dependencies prior to reproducing a stage, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The one thing I'm not sure on is how this affects the run cache lookup - do deps need to be checked out before that lookup or can we just skip the dep checkout entirely in the event we have a cached stage?

Copy link
Member Author

@skshetry skshetry Aug 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Run-cache lookup does get affected, as if there's no dependency, there's no cached stage. This will work for workspace based run, but we may need to checkout dependencies into the tmpdir repo for the other one.

)

checkpoint_func = partial(
Expand Down
11 changes: 10 additions & 1 deletion tests/func/experiments/test_experiments.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@

from dvc.dvcfile import PIPELINE_FILE
from dvc.repo.experiments.utils import exp_refs_by_rev
from dvc.utils.serialize import PythonFileCorruptedError
from dvc.utils.serialize import PythonFileCorruptedError, load_yaml
from tests.func.test_repro_multistage import COPY_SCRIPT


Expand Down Expand Up @@ -657,3 +657,12 @@ def test_modified_data_dep(tmp_dir, scm, dvc, workspace, params, target):
if workspace:
assert (tmp_dir / "metrics.yaml").read_text().strip() == params
assert (tmp_dir / "data").read_text().strip() == "modified"


def test_exp_run_recursive(tmp_dir, scm, dvc, run_copy_metrics):
tmp_dir.dvc_gen("metric_t.json", "foo: 1")
run_copy_metrics(
"metric_t.json", "metric.json", metrics=["metric.json"], no_exec=True
)
assert dvc.experiments.run(".", recursive=True)
assert load_yaml(tmp_dir / "metric.json") == {"foo": 1}