consolidate repro and exp run #7866
I would argue that using the CLI to define stages gets very cumbersome for all but the simplest stages: I'm assuming most people just use For example, comparing
--no-exec Only create dvc.yaml without actually running it.
--no-commit Don't put files/directories into cache.
--no-run-cache Execute the command even if this stage has already been run with the same command/dependencies/outputs/etc before.
My ideal workflow would be To summarize:
|
I think this is mostly a duplicate of #5846 |
In the docs,
Mostly, although the suggested syntax is slightly different and means getting rid of We could also keep it open or move it to a discussion to keep track of ideas for renaming subcommands generally. We are already planning to split I don't think we are ready to promise this yet, though, and it's unclear whether we would actually deprecate the existing commands, which could be painful for many existing users.
The ideal workflow is probably to generate stages in VS Code with auto-completion (and maybe other helpers). I think it's already on the roadmap but not sure when it will happen. For now, |
If we migrate all of the existing commands to an |
Is there an issue or story about this? |
Currently not. |
|
Why can't we have them both, |
Why do we need them both? |
It's not clear to me that #5846 is a solution. I'm asking about use cases rather than implementation/adding-even-more-confusingly-named-options. A) What are the underlying concepts? afaik it's:
B) does the commandline interface map 1:1 to the above concepts? Am I missing something? Need to sort out (A) before there's any hope of addressing (B) & (C). |
The use case for The one thing that To fill this use case,
This seems pretty clear-cut to me.
IMO this is the same as Likewise |
So for clarification we want to support: A)
If that is correct, then CLI suggestion: B) # TL;DR:
## 1. pipelines
dvc stage {add,list,rm,verify,repro} [stage_names... (default: all)]
## 2. experiments
dvc exp {add,list,rm,repro,stage} [exp_names... (default: all)]
|
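For anyone who wants to try out the two-group layout from the TL;DR above, here is a minimal sketch using Python's standard `argparse`. It only models the command shape from that comment: the `GROUPS` table, `build_parser`, and `parse` are hypothetical names for illustration, not DVC's actual parser, and an empty list of names is assumed to mean "all".

```python
import argparse

# Hypothetical layout copied from the TL;DR above; this is not DVC's real CLI.
GROUPS = {
    "stage": ["add", "list", "rm", "verify", "repro"],
    "exp": ["add", "list", "rm", "repro", "stage"],
}


def build_parser() -> argparse.ArgumentParser:
    """Build a two-level parser: dvc <group> <action> [names...]."""
    parser = argparse.ArgumentParser(prog="dvc")
    groups = parser.add_subparsers(dest="group", required=True)
    for group, actions in GROUPS.items():
        group_parser = groups.add_parser(group)
        subs = group_parser.add_subparsers(dest="action", required=True)
        for action in actions:
            action_parser = subs.add_parser(action)
            # No names given is treated as "all" by the caller (assumption).
            action_parser.add_argument("names", nargs="*")
    return parser


def parse(argv):
    """Return (group, action, names); an empty names list stands for 'all'."""
    args = build_parser().parse_args(argv)
    return args.group, args.action, args.names
```

With this shape, `parse(["stage", "repro"])` targets every stage, while `parse(["exp", "repro", "a", "b"])` targets only the named experiments. Both groups share the same verbs wherever the concepts overlap, which is the point of the proposal.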
I do agree extracting the stage-related functions from
|
After the discussions in iterative/dvc.org#4460 about whether pipelines should be part of data management, experiments, or neither, I think we need to revisit this not only from a docs perspective, but also a product perspective (thanks to @casperdcl for repeatedly trying to move this forward here and in iterative/dvc.org#3630). DVC experiments were initially an extension of DVC pipelines, which likely led to a lot of this confusion. We have done a lot of work to separate experiments from pipelines and can now better reposition them.
In this context, Why it matters:
|
I'll suggest naming it simply |
The first step could be adding a flag (
I like it and agree with having a top-level command, although the obvious downside is that it could be confusing to replace an existing command with a new one that has different functionality. WDYT about using some synonym like |
Reminder that we also need to update the landing page if we decide to add a top-level command because it currently shows |
My 2cs on this - I would still prefer |
🤔 I prefer a top-level command to
@dacbd You should also be aware that we are considering this change to rename. |
Yep, but what is the primary user scenario, the high-level case that we'll be explaining? E.g., can we start by writing a summary for this command, ideally in a way that people can understand? (BTW, we might realize that it's better to keep two commands still.) |
The high-level use case is running a data workflow. Maybe I want to include ML training and compare metrics at the end, but maybe not. Maybe I want to version my data so I can go back to any previous iteration, but maybe not. This is how I used it in the past, and even though the end result was an ML model, I only really cared about executing my pipeline steps in a make-like way. Maybe I misunderstand you because I don't see how it is only about experiments from a high-level user scenario. I think at that point you could argue all of dvc belongs under exp because the entire product targets ml training scenarios. How is data management in DVC any more of a high-level case? |
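To make the "make-like" behaviour above concrete, here is a toy sketch of the idea: each stage gets a fingerprint covering its command and everything upstream, and only stages whose fingerprint differs from the last recorded run are executed again. This is an illustration under simplifying assumptions (stages depend only on other stages, no files are hashed), not how DVC implements it; `fingerprints`, `plan`, and the `lock` dict are hypothetical names.

```python
import hashlib
from graphlib import TopologicalSorter  # Python 3.9+


def fingerprints(stages):
    """Return ({stage: hash}, order).

    Each hash covers the stage's command plus the hashes of its upstream
    stages, so a change anywhere upstream changes every downstream hash.
    """
    graph = {name: set(spec.get("deps", ())) for name, spec in stages.items()}
    order = list(TopologicalSorter(graph).static_order())
    hashes = {}
    for name in order:
        h = hashlib.sha256(stages[name]["cmd"].encode())
        for dep in sorted(graph[name]):
            h.update(hashes[dep].encode())
        hashes[name] = h.hexdigest()
    return hashes, order


def plan(stages, lock):
    """List the stages that need to run again, in dependency order."""
    hashes, order = fingerprints(stages)
    return [name for name in order if lock.get(name) != hashes[name]]
```

For a three-stage chain (prepare, train, evaluate), an empty `lock` schedules all three; after recording the fingerprints nothing is scheduled; and editing only the train command schedules train and evaluate but not prepare. None of this requires metrics or experiment tracking, which is the pipelines-first scenario described above.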
Is that the way you would write an intro for this new command? Can we actually try to draft the summary and description? I feel it can be complicated. I think I understand where you are coming from. My concern is that it can become too abstract for people. It's easier for me to think like this: we have a high-level scenario, e.g. experiment tracking (versioning). If people come to DVC because of it, we should make it simple to understand. In this case, I see that trying to nicely generalize and squeeze everything in can be quite a hard task. In the case of DVC we could have done something like:
and it should be clear that maybe we still keep data commands top level for historic reasons and since it's similar to Git and I don't see people being confused a lot with them. wdyt? |
Stepping back, here were my goals for prioritizing this:
I think the first point is the biggest pain point for users, and without it, I wouldn't prioritize this for the 3.0 release. However, replacing the existing commands with a 3rd new command feels a bit like https://xkcd.com/927. WDYT about adding a flag to @shcheklein Responding to your comments below.
I would say it like this:
Do we want to emphasize a single scenario? Some people come to dvc to run a multi-step data process first and experiment tracking is secondary or never needed. For people who come for experiment tracking first, they can get far with the other exp commands before needing to run a pipeline. |
Yep, that already sounds a bit too complicated (you have to understand something about pipelines?). It contradicts a bit with
Yep. And that's why we introduced trails, etc. Still not sure that was the best decision though. It complicates everything a lot. You are right that people come for different things. But I feel it would be a mistake to try to make a single command that is so general in its description that encompasses all the scenarios at once.
I'm fine with that. I would try to keep the description simple: "Runs an experiment". (pipelines - it's an implementation detail - e.g. "In order for DVC to know exactly which command to run, you need to specify an entry point (stage) in dvc.yaml. It can be as simple as: ---> single stage But DVC also supports multiple stages, etc ... with such and such benefits ..." All of this ^^ should be part of the Experiments "trail"/set of commands/use case. If people care only about pipelines, why won't we introduce (again, not a blocker from my end - just sharing my thoughts. The priority for me would be to keep the happy path around each scenario as simple as possible. I would be looking carefully into how the docs would look - are they easy to read, etc, etc). |
Okay, let's start with this. We can leave the rest for a product/docs discussion; it doesn't need to block this.
Not sure I follow how it contradicts. From what I can tell, the difference is that you think the target user for the command should be someone coming for experiment tracking and I think the target user should be someone coming for pipelines. Am I misunderstanding? |
Maybe. Yes, I would be optimizing this command for a single audience (and yes, probably experiments). If your idea was optimized for pipelines, then yes, it sounds good. |
@dberenbaum - wdyt about 2? In scope for 3.0? |
Thanks @omesser! I don't think 2 is critical, especially since it doesn't sound like a breaking change. I was thinking of limiting 3.0 scope to 1 and making If we started from scratch, I agree |
@dberenbaum - to give context, I'm bringing this up because I'm working on the get-started docs at the moment, and |
A possible reason why some features might be underused is naming inconsistency.
dvc stage {add,list}
dvc repro
dvc run
surely should be unified as
dvc stage {add,list,run}
or
dvc stage {add,list,repro}
? Could sanitising these CLI subcommands be part of the next major release?