clarify pipeline stages vs experiments #3630

Closed
1 task
casperdcl opened this issue Jun 9, 2022 · 8 comments
Labels
A: docs Area: user documentation (gatsby-theme-iterative) type: discussion Requires active participation to reach a conclusion.

Comments

@casperdcl
Contributor

casperdcl commented Jun 9, 2022

Some features are often underused, misunderstood, or unknown; clearer docs, messaging, and onboarding could help.

  • Should there be a page clearly describing the difference between stages and experiments?

Nothing in use-cases/experiment-tracking or user-guide/experiment-management seems to tell existing dvc repro users why they should bother with dvc exp or what its use cases are.

It doesn't seem clear to users what the difference is between stage/repro (i.e. pipelines) and exp (i.e. experiments).

  • A feature comparison table would be epic.
@casperdcl casperdcl added type: discussion Requires active participation to reach a conclusion. C: guide Content of /doc/user-guide C: start Content of /doc/start C: cases Content of /doc/use-cases labels Jun 9, 2022
@jorgeorpinel jorgeorpinel added A: docs Area: user documentation (gatsby-theme-iterative) and removed C: guide Content of /doc/user-guide C: start Content of /doc/start C: cases Content of /doc/use-cases labels Jun 11, 2022
@jorgeorpinel
Contributor

jorgeorpinel commented Jun 11, 2022

I think we're still waiting to see if repro is going to be deprecated in an upcoming release.

Rel iterative/dvc#7866 (comment)

@jorgeorpinel
Contributor

Nothing in use-cases/experiment-tracking or user-guide/experiment-management seems to tell existing dvc repro users why they should bother with dvc exp or what its use cases are.

We do mention exp run vs. repro specifically in several places like https://dvc.org/doc/user-guide/experiment-management/experiments-overview#basic-workflow, https://dvc.org/doc/user-guide/experiment-management/running-experiments#running-the-pipelines, and https://dvc.org/doc/command-reference/exp/run.

@casperdcl
Contributor Author

casperdcl commented Jun 16, 2022

None of those links make it remotely clear what the difference is.

The closest near-miss to being potentially helpful is:

📖 dvc exp run is an experiment-specific alternative to dvc repro.

What are the use cases? When would you use one over another? Are there any examples? Does the description meaningfully reduce a confused user's frustration?

Related to https://stackoverflow.blog/2022/04/25/empathy-for-the-dev-avoiding-common-pitfalls-when-communicating-with-developers/

TL;DR:

  • don't forget the purpose
  • keep in mind the users
    • what do they already know?
    • what problem do they want to solve?
  • focus on how, not what: "A common mistake [...] is to describe the what of the interface, instead of the how of a user’s workflow [e.g.] “Click the Confirm Button to confirm” [lol]"
  • have a quick-start guide
  • don’t ship your org chart, ship a solution (instead of categorising into products/features, categorise into use-cases/solutions)

very few users want to be using software. Instead, they want to do the things that software enables. [...] Users don’t want to buy your software, and they don’t want to read your documentation—they just want to have their problems solved

and http://mkremins.github.io/blog/doors-headaches-intellectual-need/

TL;DR:

A hammer (numerous dvc subcommands) seems pointless if you’ve never seen a nail (what are the different problems?)

  • solutions seem pointless if the corresponding problem/purpose isn’t clear… even if the problem is encountered later
  • it’s better to first demonstrate the problem before introducing a solution
  • examples
    • video gamers who find a locked door before finding a key make the logical connection (use key to unlock door) more often than those who find the key first
    • children often hate the (advanced) mathematics taught in school because it often seems pointless
    • functional programming monads are arguably simple, yet newcomers find them difficult… because they try to learn what they are rather than what they’re for

@shcheklein
Member

I think I'm missing the point of the question, or maybe I'm biased.

exp is captured repro. exp enables a higher-level use case of "experiments" on top of low-level building blocks like pipelines (including repro), etc. Do we need a separate command like dvc repro? I don't know. Personally, I don't like it "aesthetically" (it's disconnected from dvc stage, it overlaps with exp, etc.). I also don't like dvc run, which will hopefully be replaced by dvc stage add for good. But it feels like some low level "make"-file like interface has its place.

Can I come up with a use case where dvc exp run won't solve the problem? I don't know, tbh; it feels like no, so again it would come down to aesthetics or some edge cases. Maybe some automation, where it's clear that you don't want to deal with the overhead (however small) of dvc exp run. Maybe we could rename it to dvc stage run --all to make it cleaner.

Nothing in use-cases/experiment-tracking or user-guide/experiment-management seems to tell existing dvc repro users why they should bother with dvc exp or what its use cases are.

The whole point was not to complicate this and not to bother users of dvc exp with low-level details like dvc repro; why should they care? Why do you think it's important for people who come to experiments to know about some strange alternative?

It doesn't seem clear to users what the difference is between stage/repro (i.e. pipelines) and exp (i.e. experiments).

As I mentioned, what you call pipelines is just one of the building blocks for experiments.

Should there be a page clearly describing the difference between stages and experiments?

I can only see it from the perspective of a single command (repro vs. exp run); what else is there? stage add does not compete with experiments at all.

@jorgeorpinel
Contributor

jorgeorpinel commented Jun 19, 2022

In case I wasn't clear earlier: I also wish this topic were clearer, but there's ambiguity in the product itself, and the docs reflect that. Deprecating repro or even exp is constantly discussed, for example. @casperdcl do you have a suggestion on how to clarify this?

exp is captured repro
low level "make"-file like interface

I like this. exp builds on top of repro, and the latter becomes more of a "helper" (similar to how we expose fetch even though it's part of pull). Good notes for the cmd ref, as @shcheklein points out.

why do you think it's important for people who come to experiments to know about some strange alternative?

Yes, we consciously decided not to do this. In fact we have a pending task to remove all or most "pipeline" info from https://dvc.org/doc/user-guide/experiment-management/running-experiments (see #2768).

@casperdcl
Contributor Author

CLI discussion at iterative/dvc#7866 is a prerequisite to docs.

@drozzy

drozzy commented Sep 1, 2022

These two clarifications, which I found in various places (the latter from @SoyGema), have been very useful to me as a user:

  1. Experiment commands (exp) produce a Git ref; that is how they store their state.
  2. "If you use dvc repro, each time you execute it will overwrite everything without going back unless you commit in between each execution." "dvc exp run allows to run different experiments, for example hyper parameter changes without having to create a commit for each one"

@dberenbaum
Contributor

Some additional feedback.

From @mvshmakov:

We’ve recently discovered that dvc repro is not really suitable for CI if the user wants live experiments in Studio to be enabled. As dvc repro does not create a new experiment, we don’t log params to the Studio, thus the experiment will be displayed only partially.

From https://discord.com/channels/485586884165107732/1065577177007018015/1065630078668648458:

I guess I was confused because when I checked the difference in docu, dvc exp run has the comment "Provides a way to execute and track experiments in your project without polluting it with unnecessary commits, branches, directories, etc." so I thought dvc exp is only "experimental" mode for stuff I don't want to have tracked (which I wanted). A remark about legacy in dvc run docs could be preventing further newbies like me asking stupid questions 🙂


5 participants