Streamline experiments commands #5413

Closed
dberenbaum opened this issue Feb 4, 2021 · 13 comments
Labels
A: experiments Related to dvc exp discussion requires active participation to reach a conclusion

Comments

@dberenbaum
Collaborator

Experiments already has ten different commands before even being released:

$ dvc exp --help
usage: dvc experiments [-h] [-q | -v] {show,apply,diff,run,reset,gc,branch,list,push,pull} ...

Commands to run and compare experiments.
Documentation: <https://man.dvc.org/exp>

positional arguments:
  {show,apply,diff,run,reset,gc,branch,list,push,pull}
                        Use `dvc experiments CMD --help` to display command-specific help.
    show                Print experiments.
    apply               Apply the changes from an experiment to your workspace.
    diff                Show changes between experiments in the DVC repository.
    run                 Reproduce complete or partial experiment pipelines.
    reset               Reset and restart checkpoint experiments.
    gc                  Garbage collect unneeded experiments.
    branch              Promote an experiment to a Git branch.
    list                List local and remote experiments.
    push                Push a local experiment to a Git remote.
    pull                Pull an experiment from a Git remote.

optional arguments:
  -h, --help            show this help message and exit
  -q, --quiet           Be quiet.
  -v, --verbose         Be verbose.

This may be confusing for new users, and more commands are likely over time. Once experiments is released, it will be harder to get rid of commands. Are there any commands that can be dropped or streamlined?

Some specific commands to possibly streamline:

  • dvc exp run: Given that dvc run is likely to be deprecated and dvc exp is likely to keep growing, what is the role of this command? What about dvc stage add and dvc repro, which have similar functionality? Should dvc exp run merely add/edit a stage by default rather than execute, or should this at least be an option?
  • dvc exp branch: This behaves similarly to dvc exp apply followed by git checkout -b. One major difference is that dvc exp branch will promote all experiments in the workspace to git commits in the new branch. Does this make sense as a separate command, or could it be captured as an option? Is there any other way to promote experiments to commits without creating a new branch? Do the command name and description adequately describe its behavior?
  • dvc exp reset: Does this need its own command or could it be an option? Does it make sense that the command both deletes checkpoints and then duplicates dvc exp run functionality?
@dberenbaum dberenbaum added the A: experiments Related to dvc exp label Feb 4, 2021
@shcheklein
Member

One random suggestion/thought I had about DVC in general - we can at least/also group them? (if it's impossible to get rid of some). Basic commands, commands to share experiments (pull/push/list), checkpoint? Help should be structured with these groups in mind then (similar to git I think).

@dberenbaum
Collaborator Author

Yes, grouping and ordering is likely key to help guide users. I think this fits under general UI guidelines that @skshetry is researching.

@jorgeorpinel
Contributor

jorgeorpinel commented Feb 5, 2021

Given that dvc run is likely to be deprecated and dvc exp to keep growing, what is the role of this command?

It also seems to overlap (conceptually) with repro. In fact it was dvc repro --experiment originally, I think. From the wiki: "This is essentially now dvc repro with the results (and intermediate checkpoints) saved in an experiment branch."

@pmrowla
Contributor

pmrowla commented Feb 5, 2021

In the long term I think the plan is to eventually replace repro with exp run entirely, but for now I would not consider experiments stable enough to really have a discussion about potentially deprecating repro yet.

@pmrowla
Contributor

pmrowla commented Feb 9, 2021

  • dvc exp reset: Does this need its own command or could it be an option? Does it make sense that the command both deletes checkpoints and then duplicates dvc exp run functionality?

Regarding this question, I wasn't sure whether or not exp reset should run an experiment automatically or if we should require a workflow like:

$ dvc exp reset
$ dvc exp run

But if we are going to keep the behavior to run automatically we could also just make it a flag like dvc exp run --reset.

@pmrowla pmrowla added the discussion requires active participation to reach a conclusion label Feb 9, 2021
@dberenbaum
Collaborator Author

I think there still needs to be a way for users to easily delete specified experiments or checkpoints (possibly after the 2.0 release). dvc exp gc doesn't seem quite right for this, so I think it would make sense to either use dvc exp reset for this or to add a dvc exp delete command (in which case, it probably makes sense to do dvc exp run --reset rather than have a separate dvc exp reset command just for this limited usage).

I prefer that reset doesn't run the experiment automatically because:

  • It provides more flexibility to expand that command to do things like delete specified experiments/checkpoints in the future.
  • Although it's an extra command for the user to type, I think it's a better UX since users can examine the changes made by reset before kicking off a potentially long-running new job.
  • Other dvc commands borrow from git, and having reset only delete experiments is probably closer to its git equivalent.

@pmrowla
Contributor

pmrowla commented Feb 9, 2021

Implementing something like exp rm for removing specific experiments is simple enough, and the functionality for doing it already exists internally; it's just not exposed via the CLI right now. We will also probably want exp rm to support removing pushed experiments from the git remote, since there is currently no way to do that via the CLI either.

@pmrowla pmrowla self-assigned this Feb 9, 2021
@dberenbaum
Collaborator Author

Moved the discussion of dropping experiments to #5437.

One major difference is that `dvc exp branch` will promote all experiments in the workspace to git commits in the new branch.

Sorry, this was incorrectly stated. dvc exp branch will promote all checkpoints from an experiment to individual git commits in the new branch. Still, is there any other way to promote checkpoints? This is interesting and potentially useful, but it doesn't seem like it has much to do with branches.

Overall, it looks like dvc exp branch does the following:

  • Creates a new branch.
  • Creates a commit for each checkpoint/run in that experiment in the new branch.
  • Renames the experiment using the branch name.

It's doing a lot, and it wasn't clear to me from the help text what it would do. I think at minimum, the docs likely need to have an example showing all this functionality, like:

$ dvc exp show --no-pager
┏━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ Experiment    ┃ Created      ┃ epoch ┃ start ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ workspace     │ -            │     1 │ 0     │
│ main          │ Jan 21, 2021 │     - │ 0     │
│ │ ╓ exp-f3c4a │ 12:59 PM     │     1 │ 0     │
│ ├─╨ bb6cf9d   │ 12:59 PM     │     0 │ 0     │
└───────────────┴──────────────┴───────┴───────┘
$ dvc exp branch exp-f3c4a exp
$ git checkout -f exp
$ dvc exp show --no-pager
┏━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ Experiment ┃ Created  ┃ epoch ┃ start ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ workspace  │ -        │     1 │ 0     │
│ exp        │ 12:59 PM │     1 │ 0     │
└────────────┴──────────┴───────┴───────┘
$ git log
commit 60fa72a9061ff91023aef9b6a16889545e3cd238 (HEAD -> exp, refs/exps/exec/EXEC_CHECKPOINT, refs/exps/exec/EXEC_BRANCH, refs/exps/exec/EXEC_APPLY, refs/exps/01/4ea37f98c0221bdbfba8528e92920bf1dc35ba/exp-f3c4a)
Author: David Berenbaum <[email protected]>
Date:   Tue Feb 9 17:59:58 2021 +0000

    dvc: commit experiment c4b8912ed1b9a83602114b2d09431069a5dc97ee892b001b7721d91a694ceb58

commit bb6cf9d63c7d75e47dc8209695d02d4f5ec9e28a
Author: David Berenbaum <[email protected]>
Date:   Tue Feb 9 17:59:57 2021 +0000

    dvc: commit experiment f3c4a8986bbca3e6732d6a594e10b6b0794f34d1ed291b223837c53f94eff32c

Alternatively, there could be a command that does less, like dvc exp promote that creates commits in the current branch from checkpoints in the specified experiment using the existing experiment name.

Thoughts?

One random suggestion/thought I had about DVC in general - we can at least/also group them? (if it's impossible to get rid of some). Basic commands, commands to share experiments (pull/push/list), checkpoint? Help should be structured with these groups in mind then (similar to git I think).

For the exp subcommands, grouping might be overkill, although I love the idea for the high-level dvc commands. For exp, is there any particular order now? I would put them in order based roughly on some combination of usage frequency and shared functionality, which I would guess to be something like {run,show,list,diff,apply,branch,reset,gc,push,pull}.

@pmrowla
Contributor

pmrowla commented Feb 10, 2021

Sorry, this was incorrectly stated. dvc exp branch will promote all checkpoints from an experiment to individual git commits in the new branch. Still, is there any other way to promote checkpoints? This is interesting and potentially useful, but it doesn't seem like it has much to do with branches.

Overall, it looks like dvc exp branch does the following:

So internally, an experiment is already equivalent to a git branch, and individual git commits for each checkpoint already exist. The only difference between an experiment and a regular git branch is that experiments go in refs/exps/... and git branches go in refs/heads. Regular git commands like git merge/git rebase/etc. only "see" branches inside refs/heads, though. So if you wanted to merge an experiment into your current branch, you can't directly do git merge exp-1234; you would have to do git merge refs/exps/.../exp-1234 with the full ref path.

dvc exp branch creates an entry in refs/heads that points to the tip of the experiment. exp branch just makes it so that the experiment ref now also appears as a regular git branch to regular git commands - meaning that the shorthand experiment names like exp-1234 can be used in git commands.

Essentially it's just a wrapper for

git branch exp-1234 refs/exps/.../exp-1234

the same way you can create a new branch from any other existing git branch/tag/sha/ref
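
To illustrate the point about ref namespaces, here is a minimal sketch using only plain git in a throwaway repository. The ref is created by hand with git update-ref rather than by dvc, and the path segment `demo` is a made-up placeholder for the elided `...` part of the real experiment ref path, so treat this as an approximation of the mechanism rather than what dvc does internally.

```shell
set -e

# Throwaway repository with a single baseline commit.
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "baseline"

# Simulate an experiment ref that lives outside refs/heads.
# "demo" is a hypothetical stand-in for the real (elided) path segment.
git update-ref refs/exps/demo/exp-1234 HEAD

# The short name is not resolvable yet, because git does not search
# refs/exps/ when expanding a short ref name.
if git rev-parse -q --verify exp-1234 >/dev/null 2>&1; then
    short_before=found
else
    short_before=missing
fi

# Create a regular branch in refs/heads that points at the experiment ref.
git branch exp-1234 refs/exps/demo/exp-1234

# Now the short name resolves like any other branch.
if git rev-parse -q --verify exp-1234 >/dev/null 2>&1; then
    short_after=found
else
    short_after=missing
fi

echo "before: $short_before, after: $short_after"
```

After the git branch step, the short name exp-1234 and the full experiment ref point at the same commit.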

Alternatively, there could be a command that does less, like dvc exp promote that creates commits in the current branch from checkpoints in the specified experiment using the existing experiment name.

Thoughts?

This is the same thing as just doing a git merge from an experiment into master (or whatever your current branch is). So dvc exp promote would really just be an alias for

git merge refs/exps/.../exp-1234  # into current branch
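
As a rough sketch of that merge, again with plain git only: the two "checkpoint" commits and the `demo` path segment below are invented for illustration (the real ref path is elided above), and the experiment ref is written by hand instead of by dvc.

```shell
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q

# Helper for empty commits with a fixed identity (hypothetical demo data).
commit() {
    git -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "$1"
}

commit "baseline"
main_branch=$(git symbolic-ref --short HEAD)

# Build two checkpoint-like commits off the baseline on a detached HEAD,
# then record the tip under a hand-made experiment ref.
git checkout -q --detach
commit "checkpoint 0"
commit "checkpoint 1"
git update-ref refs/exps/demo/exp-1234 HEAD

# Return to the original branch and merge using the full ref path.
git checkout -q "$main_branch"
git merge -q --ff-only refs/exps/demo/exp-1234

git log --oneline
```

Because the current branch has no commits of its own beyond the baseline, the merge here is a fast-forward and both checkpoint commits end up in the branch history.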

The idea behind dvc exp branch was that if users want to get the full set of checkpoint commits in their current branch, they would do something like

dvc exp branch exp-1234  # git branch exp-1234 containing checkpoint commits now exists
# user can now do whatever they want with that branch (merge, rebase, add commits, etc)
# using whatever normal workflow they are used to doing with git branches
git merge exp-1234  # into current branch

Basically we can add wrappers for whatever git functionality we want (like merge/rebase) but originally we thought it might just be easier to have a command that lets the user see a "normal" git branch and then let the user do whatever they want with it.

@dberenbaum
Collaborator Author

Thanks, @pmrowla! I know this is more in-depth than what we will want in the docs, but hopefully we can use similar language there since this explanation made the command clear to me. Closing this out.

@jorgeorpinel
Contributor

jorgeorpinel commented Feb 24, 2021

Hm. I see this is closed, but I wanted to mention that I've noticed some possible consistency issues in short flag naming, e.g. exp run -n stands for name, while exp show -n stands for number, I think. There are other flags like these that may lack consistency among themselves and with existing command flags.

Sorry it took me this long to bring this up, but it's been a concern in the past with other commands (see #3422), so it's probably still worth mentioning. Is there still time to rename now, or should I just add to the list in #3422?

@dberenbaum
Collaborator Author

@jorgeorpinel Can you document the issues in here or #3422 and then we can decide what to do?

@jorgeorpinel
Contributor

OK, as soon as I finish writing all the refs I should be able to do that comprehensively.
