-
Notifications
You must be signed in to change notification settings - Fork 394
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ref: exp init
improvements
#3071
Conversation
This comment has been minimized.
This comment has been minimized.
└── src/ | ||
``` | ||
|
||
> Note that `params.yaml` is the only required file (see `dvc params`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
side note ... for the sake of my education: why is it required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To parse the parameters and put them into dvc.yaml
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah but even if you don't have params an empty file is still required.
Related to iterative/dvc#6446 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's also related to iterative/dvc#6605. The params
section of dvc.yaml
currently expects keys from a file (see below), not a filename itself, so all the parameters need to be defined before creating the stage. If we want to stop requiring this file at the time of dvc exp init
, we need to change params
to accept filenames.
stages:
train:
cmd: python train.py
deps:
- data
- src
params:
- params.py:
- batch_size
- data_path
- epochs
- latent_dim
- num_samples
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about generating a dvc.yaml without params
if there's no parmas.yaml file? This is a core discussion though! Moved to iterative/dvc#6446 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I misunderstood you, and now I think it's actually more of a docs issue 🤔 . It is possible to use dvc exp init
without params
. You can either use --explicit
and not provide --params
, or you can use dvc exp init -i
and type n
at the params prompt.
The note here is to indicate that if you do use either the default or some other params
path, then that path must exist and be parseable to extract the parameters for the stage. So the current note probably needs to be reworded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UPDATE: Looks like the discussion about empty params.yaml files is in a few core repo tickets now (e.g. iterative/dvc#7138) so I won't keep discussing here (still a bit confused though).
It is possible to use dvc exp init without params...
So the current note probably needs to be reworded.
Clarified in 2273431.
### The `command` argument | ||
|
||
The `command` argument is optional, if you are using `--interactive` mode. The |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW a question I have is whether to keep this section or move command
under Options like we did for targets
in https://dvc.org/doc/command-reference/repro#options. I personally like the section more but I remember we discussed it (cc @shcheklein) this and using Options was picked, so for consistency I'd move this under Options as well.
From #3015 (review)
mention --explicit avoids a params.yaml file too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! A couple small notes to help clarify for users what the expectations are for the different paths.
@@ -19,43 +18,57 @@ usage: dvc exp init [-h] [-q | -v] [--run] [--interactive] [-f] | |||
|
|||
## Description |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somewhere in the description, it might be useful to explain that dvc exp init
by default expects that input data, parameters, and source code paths exist before running an experiment, and that the command is expected to generate models, metrics, and plots.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's more or a problem for exp run
though, @dberenbaum (already linked form the --run
option). But should we state that exp run
(and repro
for that matter) expect that the stage definition and code are good? Hopefully its evident.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's more or a problem for
exp run
though
Well that's the point of exp init
, right? Better to have users understand what's needed up front than to have them run exp init
only to fail on exp run
.
Doesn't need to be part of this PR. It could also be handled by some of the suggested changes to the core command rather than the docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to have users understand what's needed up front than to have them run exp init only to fail on exp run.
I'm not sure. If users need to read that on the cmd ref., is exp init
serving as a self-explanatory command? I think those notes should be in the command's output fist (and added to this doc as a result of that product update).
looks good to me (I didn't check the details though, since Dave've done this already) Do we want to add an example? (may be even animation?) |
Yep @shcheklein it's a check box I didn't get to yet ⌛ (could be a separate PR)
Good idea in general to consider gifs but in this case I think a static console sample with the entire |
UPDATE: I added a simple example just to give a sense of the interactive-mode UI. Not sure which features we want to highlight with examples otherwise. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Adding an example is great, but why not focus on a model training example?
@@ -19,43 +18,57 @@ usage: dvc exp init [-h] [-q | -v] [--run] [--interactive] [-f] | |||
|
|||
## Description |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's more or a problem for
exp run
though
Well that's the point of exp init
, right? Better to have users understand what's needed up front than to have them run exp init
only to fail on exp run
.
Doesn't need to be part of this PR. It could also be handled by some of the suggested changes to the core command rather than the docs.
|
||
Let's prepare a raw data ingestion script to start running experiments on it. | ||
|
||
```dvc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not a model training example? Data ingestion is a nice application to show how this can be useful even for non-training stages, but model training is the primary use case. It's unlikely for someone to run experiments on a data ingestion script.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The data ingestion example produced a very small dvc.yaml file and then I had that note about using stage add
to produce the same. Idk, I thought it was interesting. Anyway, I agree the training scenario is more applicable so I replaced all that now.
by our code in question, so the default is used. Models, metrics, and plots are | ||
omitted as there are none for an ingestion process. | ||
|
||
The resulting `dvc.yaml` file is simply: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth mentioning pipelines briefly to explain what's in dvc.yaml
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm actively avoiding mixing in that concept here but it is mentioned strategically once in the Description. I did elaborate on this sentence a bit though.
OK @dberenbaum . Done in 56f4a46. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
Thanks! |
Looks like the failing link check is a false positive. I'll see if this is already reported to the link checker repo. Merging 🔀 |
# exp init | ||
|
||
Codify project using [DVC metafiles](/doc/user-guide/project-structure) to run | ||
[experiments](/doc/user-guide/experiment-management). | ||
Quickly setup any project to use [DVC Experiments]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant this (per last check box in this PR desc.) @dberenbaum @skshetry . Currently dvc exp init -h
prints Initialize DVC in the current directory. Expects directory to be a Git repository...
Do we want to update that text?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's the dvc init -h
text 😄 . dvc exp init -h
says Initialize experiments
. However, I think we agreed this new language is better, so I'm happy to put in a PR to update in the core repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's the dvc init -h text 😄
yep 🤦 sry. OK, thanks for the PR
* ref: first copy edits to exp init * ref: clarify exp init explanations * ref: clarify `exp init` option descriptions * ref: re-describe `exp init` + reorder in nav and `exp` help per #3071 (review) * ref: clarify params.yaml is needed only with defaults in params.yaml per #3071 (comment) * ref: clarify what --interactive prompts user for per #3071 (review) * ref: link from exp init to config section and mention --explicit avoids a params.yaml file too. * ref: simplify exp init --explicit explanation * ref: explain why params.yaml are required (by default) in exp init per #3071 (comment) * ref: copy edits to exp init * ref: add simple example to exp init rel. #3071 (comment) * ref: use model training example in exp init per #3071 (review) * ref: shorten sample block
exp run
in nav[WIP] cmd-ref: document exp init #3015 (review) if still neededref:exp init
improvements #3071 (review).Full Configuration sectionLinked to https://dvc.org/doc/command-reference/config#exp + config: reorder and summarize sections (in ref.) #3098command
section or move to Options?See ref:
exp init
improvements #3071 (review)exp (init)
help output decided here?See ref:
exp init
improvements #3071 (review) below.