Improve dvc.yaml generation #4194

Closed
3 tasks done
julieg18 opened this issue Jun 29, 2023 · 5 comments
Labels: A: onboarding (Improving and simplifying users happy path. How do we get them have value asap?), priority-p1 (Regular product backlog)

Comments

@julieg18
Contributor

julieg18 commented Jun 29, 2023

Current

UI:

image

Generated dvc.yaml or update to already created file:


# Read about DVC pipeline configuration (https://dvc.org/doc/user-guide/project-structure/dvcyaml-files#stages)
# to customize your stages even more
stages:
  train:
    cmd: python requirements.txt
    deps:
      - requirements.txt

Improvements

  • Try alternative phrasing in place of "pipeline" for more clarity, for example referring to it as dvc.yaml.
  • Expand the UI text a bit, giving a fuller explanation of what pressing the button will do for the user
  • Add more comments to dvc.yaml, giving a more in-depth explanation of dvc.yaml with some configuration examples
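
As a rough illustration of the third point, a more heavily commented generated file might look something like the sketch below. This is a hypothetical example, not the extension's actual output: the script name train.py, the data directory, and model.pkl are assumed purely for illustration.

```yaml
# dvc.yaml describes the stages of your project and how they connect.
# Docs: https://dvc.org/doc/user-guide/project-structure/dvcyaml-files#stages
stages:
  train:                  # stage name (hypothetical)
    cmd: python train.py  # command DVC runs for this stage
    deps:                 # files or directories the command reads
      - train.py
      - data
    outs:                 # files or directories the command writes
      - model.pkl
```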

Related #3331

@julieg18 julieg18 added the A: onboarding Improving and simplifying users happy path. How do we get them have value asap? label Jun 29, 2023
@SoyGema

SoyGema commented Jun 29, 2023

Hey @julieg18 ! 👋
What an impactful feature you are working on!
If you allow me, I would like to offer some food for thought about the roadblocks users hit when designing pipelines, which might affect how this feature is designed or how it guides users in a multi-stage scenario.

Edited for clarity: the UI for DVC pipelines might take dependency management and linearity into account to give users better control over pipeline design. The main mental model behind this comment is to pinpoint the consequences of, and differences between, WHAT a stage is and HOW a stage is designed from the user's perspective. The principle is taken from the engineering section of the 100 mental models book.

This might influence two of the three points under Improvements.

  1. Random execution of pipeline stages. This question has come up on Discord, appeared in some past CG, and is one of your common questions. To ensure a linear order in the pipeline, the user should chain all pipeline stages so that each stage's output is the next stage's dependency, from the beginning to the end of the pipeline.
    Please make sure that you specify dependencies and outputs for each stage: that will introduce the order to provide an end result
[Image: screenshot, 2023-06-29 at 20:23:08]

Would you think about how this might look in UI terms, or build this with this common roadblock in mind?

Thinking about this, there might be 3 cases:

  1. First stage. You don't have a dependency but you have an output (e.g. load data).
  2. In-between stages (image above). You have an output (to the next stages) and a dependency (from the previous stages) (e.g. the whole feature engineering process: normalization, feature selection, filtering, etc.).
  3. Final stage. You might not have an output, but you have the metrics section (this could change).
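
The three cases above can be sketched as a single dvc.yaml in which each stage's output is the next stage's dependency, which is what gives the pipeline its order. This is a minimal sketch under assumptions: the stage names, script names, and file paths are all hypothetical.

```yaml
stages:
  load_data:               # case 1: no data dependency, only an output
    cmd: python load_data.py
    outs:
      - data/raw.csv
  featurize:               # case 2: depends on the previous output, produces its own
    cmd: python featurize.py
    deps:
      - featurize.py
      - data/raw.csv       # output of load_data
    outs:
      - data/features.csv
  evaluate:                # case 3: depends on the previous output, reports metrics
    cmd: python evaluate.py
    deps:
      - evaluate.py
      - data/features.csv  # output of featurize
    metrics:
      - metrics.json:
          cache: false
```

The order of the stages in the file does not matter here; the order comes from matching each deps entry to the outs entry of an earlier stage.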

I wonder whether the final design should lead the user to create the pipeline linearly, to ensure that it is constructed correctly.

At some point I had the idea of adding two buttons, one for the dependency and another for the output, and selecting them from a folder, but that would require them to have been created beforehand...
[Image: screenshot, 2023-06-29 at 20:37:10]

Maybe some ideas can come from this? Anyway, these thoughts may apply more to complex pipelines, but keeping this in mind might avoid some frustration and churn among DS.

Hope you are having as much fun these days as I am in the Open!
Take care 🧠 ♥ 👾👾 ╭(◔ ◡ ◔)/

@mattseddon
Member

mattseddon commented Jul 7, 2023

In #4233 I have added some helper snippets that can be used for building pipelines.

Demo

Screen.Recording.2023-07-07.at.11.30.40.am.mov

We can extend these to include other parts of the pipeline. E.g. plots.

cc @dberenbaum, @daavoo WDYT? Please LMK if there is anything that can be done to improve the copy in the snippets 🙏🏻.

edit: PTAL at #4234 & #4235 as well.

I also intend to add snippets for each of the frameworks supported by DVCLive but that work will be done under a separate issue.

@dberenbaum
Contributor

Thanks @mattseddon! Some thoughts:

  • metrics: I would drop it. Since we are encouraging metrics through dvclive these days, I don't see a lot of reasons why anyone really needs the metrics type in their stage. They can either be regular outs or excluded from the stage altogether. I'd like to encourage a simplification of pipelines to being only about commands, dependencies, and outputs (with params being the only "special" type since it's a more granular dependency).
  • params: It's a bit more complicated than a list of params files, as explained in https://dvc.org/doc/user-guide/project-structure/dvcyaml-files#parameters.
  • Is it possible to show the message from the top of dvc.yaml in a popup or somehow make it more prominent?
  • I'm not that familiar with many other snippets, but is it typical to have descriptions instead of working example text? For example, instead of "command for the stage, e.g. python train.py", should it just be "python train.py", or maybe "python train.py # command for the stage"?
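
To illustrate the params point and the "working example text" format together, a stage limited to a command, dependencies, params, and outputs could look roughly like the sketch below. The parameter names, file names, and paths are assumptions made for illustration; the entry shapes follow the parameters section of the dvc.yaml documentation linked above.

```yaml
stages:
  train:
    cmd: python train.py   # command for the stage
    deps:
      - train.py
      - data/features.csv
    params:
      - learning_rate      # a single key read from the default params.yaml
      - train.epochs       # a nested key from params.yaml
      - myparams.yaml:     # specific keys from a custom params file
          - batch_size
      - config.json:       # all parameters in this file
    outs:
      - model.pkl
```

Because YAML treats " #" after a plain value as the start of a comment, the value of cmd here is still just python train.py.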

@mattseddon
Member

Thanks for the feedback @dberenbaum. All good points.

  • metrics: I would drop it. Since we are encouraging metrics through dvclive these days, I don't see a lot of reasons why anyone really needs the metrics type in their stage. They can either be regular outs or excluded from the stage altogether. I'd like to encourage a simplification of pipelines to being only about commands, dependencies, and outputs (with params being the only "special" type since it's a more granular dependency).

No worries, I will drop metrics.

I will update the params snippet too.

  • Is it possible to show the message from the top of dvc.yaml in a popup or somehow make it more prominent?

If we show this message as a popup it will be easily dismissed and the information could be missed/lost (and there is no undo button). Happy to update if you want but it was a conscious decision to put the information into the file.

  • I'm not that familiar with many other snippets, but is it typical to have descriptions instead of working example text? For example, instead of "command for the stage, e.g. python train.py", should it just be "python train.py", or maybe "python train.py # command for the stage"?

Generally, if I were making snippets for a group/team I would not put any suggestions into the template, but I see these snippets as more of an educational piece to get people familiar with the layout before they switch over to completions. My first instinct was to use the format "python train.py" for each example. I think this will work well if we can get iterative/dvcyaml-schema#40 sorted out so that the correct on-hover information is always provided. Happy to update to that format.

The only outstanding question is whether or not to switch to a popup for the message written at the top of the dvc.yaml. LMK what you think.

Again, it would be good to get iterative/dvcyaml-schema#40 sorted out sooner rather than later 🙏🏻.

@dberenbaum
Contributor

Sounds good, @mattseddon. I was thinking a pop-up in addition to having it in the file, but I think it's fine as is. Your call on that.
