params: bulk params from file #6605

Closed
dberenbaum opened this issue Sep 13, 2021 · 4 comments
Assignees
Labels
A: params Related to dvc params A: pipelines Related to the pipelines feature

Comments

@dberenbaum
Collaborator

Params need to be specified individually, which can be repetitive and inconvenient when users simply want to include all parameters from a file.

For background, see:

Instead of specifying every parameter individually, it should be possible to run dvc stage add -n foo -p my_params.yaml python foo.py or dvc stage add -n foo -p "my_params.yaml:*" python foo.py, which could generate this dvc.yaml:

stages:
  foo:
    cmd: python foo.py
    params:
    - my_params.yaml

or

stages:
  foo:
    cmd: python foo.py
    params:
    - my_params.yaml:*
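
To make the two proposed spellings concrete, here is a minimal Python sketch of how a -p value might be split into a file and a list of keys. It is not DVC's implementation: parse_param_spec, PARAMS_FILE_SUFFIXES, and the suffix-based rule for bare entries are all hypothetical, and treating an empty key list after the colon as "whole file" is likewise an assumption.

```python
from typing import List, Optional, Tuple

DEFAULT_PARAMS_FILE = "params.yaml"
# Assumption: a bare entry with one of these suffixes names a params file.
PARAMS_FILE_SUFFIXES = (".yaml", ".yml", ".json", ".toml", ".py")


def parse_param_spec(spec: str) -> Tuple[str, Optional[List[str]]]:
    """Split a -p value into (file, keys); keys=None means "the whole file"."""
    path, sep, keys = spec.partition(":")
    if sep:
        # "file:*" (and, by assumption, "file:") select every parameter.
        if keys in ("", "*"):
            return path, None
        # "file:a,b" selects the listed parameters only.
        return path, [k.strip() for k in keys.split(",") if k.strip()]
    if spec.endswith(PARAMS_FILE_SUFFIXES):
        # Bare file name: the proposed shorthand for the whole file.
        return spec, None
    # Otherwise keep today's meaning: a single key in the default params file.
    return DEFAULT_PARAMS_FILE, [spec]


print(parse_param_spec("my_params.yaml"))
print(parse_param_spec("my_params.yaml:*"))
print(parse_param_spec("foo"))
```

Under these assumptions both proposed forms resolve to the same result, which is why either could produce an equivalent dvc.yaml entry.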

After running dvc repro, dvc.lock could treat my_params.yaml like a regular file dependency:

stages:
  foo:
    cmd: python foo.py
    deps:
    - path: my_params.yaml
      md5: abc123
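
As a rough illustration of what "treat it like a regular file dependency" would mean, the sketch below compares a file's current hash against the value recorded in the lock file. The helper names file_md5 and params_file_changed are hypothetical, and this hashes the raw bytes only; DVC's own hashing may differ in its details.

```python
import hashlib


def file_md5(path: str) -> str:
    """Return the hex MD5 digest of the file's raw bytes."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def params_file_changed(path: str, locked_md5: str) -> bool:
    """True if the file no longer matches the hash stored in the lock file."""
    return file_md5(path) != locked_md5
```

With this approach any edit to my_params.yaml, including a comment or whitespace change, would mark the stage as changed, which is one of the trade-offs of hashing the whole file rather than comparing parsed values.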

However, commands that collect parameters to show info about them, like dvc params diff and dvc exp show, would parse my_params.yaml and show each individual parameter.
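
A small sketch of that last idea, assuming the params file has already been parsed into nested dictionaries: flatten each version into dotted keys and report the keys whose values differ. The functions flatten and diff_params are hypothetical names, not DVC internals.

```python
def flatten(params: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dotted keys, e.g. {"train": {"lr": 1}} -> {"train.lr": 1}."""
    flat = {}
    for key, value in params.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def diff_params(old: dict, new: dict) -> dict:
    """Map each changed parameter to its (old, new) pair; None marks a missing value."""
    old_flat, new_flat = flatten(old), flatten(new)
    return {
        key: (old_flat.get(key), new_flat.get(key))
        for key in sorted(old_flat.keys() | new_flat.keys())
        if old_flat.get(key) != new_flat.get(key)
    }


old = {"train": {"lr": 0.1, "epochs": 5}}
new = {"train": {"lr": 0.01, "epochs": 5}, "seed": 1}
print(diff_params(old, new))
```

This keeps the per-parameter view for reporting even if the lock file only stores a single hash for the file.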

@daavoo daavoo added the A: templating Related to the templating feature label Sep 16, 2021
@skshetry skshetry added A: pipelines Related to the pipelines feature and removed A: templating Related to the templating feature labels Oct 5, 2021
@skshetry skshetry mentioned this issue Oct 5, 2021
16 tasks
@skshetry
Member

skshetry commented Oct 5, 2021

Some questions that I have:

  1. Does this mean tracking all the params in the file as of a given point in time, or tracking all the params in the file at all times (including any added later)? Do we need support for both?
    See run: allow tracking of all params in a file with -p #4112.
  2. Do we want to save the contents of the params file or the complete file?
    We have never saved the params file before. We have only used the data from dvc.lock to see if the params have changed.
  3. What should be the order of resolving params in the following case: the param foo from params.yaml, or a params file named foo?
    params:
    - foo
Regarding the structure, I would prefer that dvc.lock mirror dvc.yaml. Having a params section in dvc.yaml that does not correspond to a params section in dvc.lock may be confusing.

This change will also affect the run-cache. It's not a big issue, as the run-cache is just a dvc.lock-like file.

@dberenbaum
Collaborator Author

@skshetry Returning to this. Thanks for the reference to #4112. I would probably go so far as to mark this as a duplicate, close it, and continue the conversation there. WDYT?

@skshetry
Member

Duplicate of #4112

@skshetry skshetry marked this as a duplicate of #4112 Jan 24, 2022
@skshetry
Member

Let's close this then @dberenbaum.

@daavoo daavoo added the A: params Related to dvc params label Mar 1, 2022