repro: glob-style selection of stages #4912

Closed
mdekstrand opened this issue Nov 19, 2020 · 2 comments · Fixed by #4976
Labels: feature request (requesting a new feature), p2-medium (medium priority, should be done, but less important)

Comments

@mdekstrand

It would be useful to be able to select stages by glob, either at the top level or within a specific dvc.yaml file.

For example, if I have eval/dvc.yaml that evaluates several models over several data sets, I would like to be able to reproduce the evaluations over one data set with something like:

dvc repro eval/dvc.yaml:AZ-*

Or:

dvc repro --stages='AZ-*' eval/dvc.yaml

With the much-improved support for executing unrelated jobs in parallel in recent versions of DVC, this would let me submit each data set as a separate job to the cluster's batch scheduler. I imagine it would enable a host of other use cases as well.
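To make the request concrete, here is a minimal sketch of glob-style stage selection using Python's standard `fnmatch` module. The stage names and the `select_stages` helper are hypothetical and only illustrate the matching; this is not DVC's implementation.

```python
from fnmatch import fnmatchcase

# Hypothetical stage names, as they might be defined in eval/dvc.yaml.
STAGES = ["AZ-als", "AZ-knn", "ML-als", "ML-knn"]


def select_stages(pattern, names):
    """Return the names that match a shell-style glob, preserving order."""
    # fnmatchcase is case-sensitive on every platform, unlike fnmatch.
    return [name for name in names if fnmatchcase(name, pattern)]


# Select every stage for one data set.
print(select_stages("AZ-*", STAGES))
```

Each data set's pattern could then be handed to a separate batch job, which is the use case described above.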

@efiop added the feature request and p2-medium labels on Nov 19, 2020
@shcheklein changed the title from "Glob-style selection of stages" to "repro: glob-style selection of stages" on Nov 19, 2020
@efiop
Contributor

efiop commented Nov 19, 2020

We added --glob to dvc add very recently in #4864, so it makes sense to continue that work with this ticket as well. I would say dvc repro eval/dvc.yaml:AZ-* works well, but I would probably make it opt-in behavior behind a --glob flag for now, same as we currently have in dvc add, just to be safe. We'll turn this on everywhere by default in the near future.
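The opt-in behavior can be sketched roughly as follows. The `parse_target` and `resolve` helpers and the `stages_by_file` mapping are hypothetical names made up for illustration, and the target parsing is simplified (it does not handle paths that themselves contain a colon). It is a sketch of the idea, not the code from the eventual pull request.

```python
from fnmatch import fnmatchcase


def parse_target(target, default_file="dvc.yaml"):
    """Split 'path/dvc.yaml:name' into (file, name). Hypothetical helper."""
    path, sep, name = target.rpartition(":")
    if not sep:
        # No colon: treat the whole target as a stage name in the default file.
        return default_file, target
    return path or default_file, name


def resolve(target, stages_by_file, glob=False):
    """Return matching stage names; globbing applies only when glob=True."""
    file, name = parse_target(target)
    names = stages_by_file.get(file, [])
    if glob:
        return [n for n in names if fnmatchcase(n, name)]
    # Default: the name is taken literally, so 'AZ-*' matches nothing
    # unless a stage is really called 'AZ-*'.
    return [n for n in names if n == name]


# Hypothetical pipeline contents.
stages = {"eval/dvc.yaml": ["AZ-als", "AZ-knn", "ML-als"]}
print(resolve("eval/dvc.yaml:AZ-*", stages, glob=True))
print(resolve("eval/dvc.yaml:AZ-*", stages))
```

Keeping literal matching as the default means existing targets keep their current meaning until globbing is enabled explicitly.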

@skshetry
Member

@mdekstrand, I have created #4976 to support this feature. Please take a look.

@skshetry self-assigned this on Nov 26, 2020
skshetry added a commit that referenced this issue Nov 27, 2020
* repro: support regex/foreach-group to run at once

Fixes #4912
Fixes #4886
Fixes #4958

* Use `tree.isdir` rather than `os.path.isdir`

* Use glob rather than regex

* Update dvc/command/repro.py

* s/regex/glob

* disable glob on `collect_granular`

There's no need for a glob here

* add tests for `collect` and `collect_granular`
3 participants