Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

flatten deps to facilitate more flexible templating #7151

Open
itcarroll opened this issue Dec 14, 2021 · 13 comments
Open

flatten deps to facilitate more flexible templating #7151

itcarroll opened this issue Dec 14, 2021 · 13 comments
Labels
A: templating Related to the templating feature enhancement Enhances DVC feature request Requesting a new feature

Comments

@itcarroll
Copy link

The templating feature currently admits little flexibility for stage keys (e.g. deps) that expect lists. As a result, foreach stages cannot be given a variable number of dependencies. Causing lists to be flattened would make templating more powerful. For example, you could give extra dependencies to certain templated stages:

stages:
  build:
    foreach:
      us:
        deps:
      uk:
        deps:
        - uk_data
    do:
      cmd: myscript ${key}
      deps:
      - myscript
      - ${item.deps}

I have implemented the change at https://github.com/itcarroll/dvc/blob/feature/flattened-deps/dvc/dependency/__init__.py#L52, by (non-recursive) list flattening in dvc.dependency.loads_from. I am sure it's not the right way to do it, and probably has unintended consequences, but it demonstrates the concept (and no tests failed).

I don't know how templating responds to missing keys, but not having to include the empty list for us would be even more better.

@karajan1001 karajan1001 added enhancement Enhances DVC feature request Requesting a new feature labels Dec 15, 2021
@itcarroll
Copy link
Author

In ancient discussions on foreach I found a suggestion (by @jorgeorpinel):

I wonder actually if you can just do deps: ${list} (a list from params.yaml)

This would add the same flexibility I'm interested in with the above, but it fails YAML validation, which appears to require deps to get a list before ${item} is expanded. So, some change in the validation process might be an alternate route to this feature.

@daavoo daavoo added the A: templating Related to the templating feature label Feb 22, 2022
@alealv
Copy link

alealv commented Feb 25, 2022

Hi, I'm also interested in this feature https://discord.com/channels/485586884165107732/485596304961962003/946817686220984350

   data:
    foreach:
      librispeech:
        script: libri.sh
        links:
          - https://www.openslr.org/resources/12/dev-clean.tar.gz
          - https://www.openslr.org/resources/12/dev-other.tar.gz
          - https://www.openslr.org/resources/12/test-clean.tar.gz
          - https://www.openslr.org/resources/12/test-other.tar.gz
          - https://www.openslr.org/resources/12/train-clean-100.tar.gz
          - https://www.openslr.org/resources/12/train-clean-360.tar.gz
          - https://www.openslr.org/resources/12/train-other-500.tar.gz
      tedlium:
        script: tedlium.sh
        links:
          - https://www.openslr.org/resources/51/TEDLIUM_release-3.tgz
    do:
      cmd:
        - ./proc/${item.script} --output ${nas_data_dir}/${key} --tmp ${nas_data_dir} ${item.links}
      deps:
        - ${item.script}
        - ${item.links}
      outs:
        - data/${key}/${key}_text
        - data/${key}/${key}_utt2spk
        - data/${key}/${key}_wav.scp
        - data/${key}/${key}_waves

I have stages that have different list lengths

@daavoo
Copy link
Contributor

daavoo commented Aug 23, 2022

So, some change in the validation process might be an alternate route to this feature.

This relates to other requests where templating is being limited by the YAML validation which occurs before the interpolation, for example #6989

@dberenbaum
Copy link
Collaborator

@daavoo Do you know if these scenarios work fine if not for YAML validation failures?

@daavoo
Copy link
Contributor

daavoo commented Aug 24, 2022

@daavoo Do you know if these scenarios work fine if not for YAML validation failures?

For #6989 yes, for this issue additional work would be needed

@dberenbaum
Copy link
Collaborator

If all the deps are a single list (deps: ${those}, as suggested in #8171), is additional work needed?

@m-gris
Copy link

m-gris commented Mar 3, 2023

Big up-vote for this one 👍

It would be really, really helpful.

@bezbahen0
Copy link

Is there any progress on this issue?
It really is a much-needed feature.

@dberenbaum
Copy link
Collaborator

@bezbahen0 We have not made any progress yet.

@skshetry @daavoo Can we resolve the templating before doing validation? If that's not straightforward, can we skip validation errors triggered by unresolved interpolation?

@daavoo
Copy link
Contributor

daavoo commented Mar 7, 2023

@skshetry @daavoo Can we resolve the templating before doing validation? If that's not straightforward, can we skip validation errors triggered by unresolved interpolation?

Will take a look, it has been a while 😅

@JulianoLagana
Copy link

This feature would greatly simplify our current pipeline specification!

@skshetry
Copy link
Member

skshetry commented Mar 7, 2023

We don't have a good way to validate after templating, say, for example, to determine that deps has to be a list of strings/file-paths after the interpolation.

That's why we limited templated dvc.yaml and the resolved dvc.yaml to follow the same schema.

@dberenbaum
Copy link
Collaborator

Will take a look, it has been a while 😅

I already ended up taking a look and might open a PR for discussion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A: templating Related to the templating feature enhancement Enhances DVC feature request Requesting a new feature
Projects
None yet
Development

No branches or pull requests

9 participants