Using `dvc.yaml` instead of `artifacts.yaml` #4516
Changes look good to me and they technically document all parts.
I am missing some sort of narrative tying things together, though.
I'm not sure where it would fit*, but even after these changes it still feels hard for someone entering the DVC docs to find out about or understand the model registry.
*According to the metrics in Plausible, people usually flow from root -> get started -> user guide/cmd ref, with the number of views decreasing at each step.
    default_root_dir="mymodel"
)
trainer.fit(model)
live.log_artifact("mymodel", type="model")
Not sure this pattern is that helpful for model registry since it will save the full directory of checkpoints. WDYT about something like this:
with Live(save_dvc_exp=True) as live:
    checkpoint = pl.callbacks.ModelCheckpoint(dirpath="mymodel")
    trainer = Trainer(
        logger=DVCLiveLogger(save_dvc_exp=True), callbacks=checkpoint
    )
    trainer.fit(model)
    live.log_artifact(checkpoint.best_model_path, type="model")
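A minimal, hedged sketch of the idea behind the suggestion above: choose a single checkpoint file and log only that file, rather than the whole checkpoint directory. `pick_best_checkpoint`, `log_best_checkpoint`, and the score mapping are hypothetical names introduced for illustration; with Lightning, `checkpoint.best_model_path` already provides this path. The `Live(save_dvc_exp=True)` and `log_artifact(..., type="model")` calls are the ones used in this thread, and the DVCLive import is deferred so the selection helper can be tried without DVCLive installed.

```python
from typing import Mapping


def pick_best_checkpoint(scores: Mapping[str, float], lower_is_better: bool = True) -> str:
    """Return the checkpoint path with the best validation score.

    `scores` maps checkpoint paths to a metric value. This mapping is a
    hypothetical input; Lightning tracks the same information internally
    through its ModelCheckpoint callback.
    """
    if not scores:
        raise ValueError("no checkpoints to choose from")
    chooser = min if lower_is_better else max
    return chooser(scores, key=scores.get)


def log_best_checkpoint(scores: Mapping[str, float]) -> str:
    """Log only the best checkpoint file as a model artifact."""
    # Deferred import: this function needs DVCLive to be installed.
    from dvclive import Live

    best = pick_best_checkpoint(scores)
    with Live(save_dvc_exp=True) as live:
        # Log one file, not the full directory of checkpoints.
        live.log_artifact(best, type="model")
    return best
```

The point of the design is that the registry entry refers to one model file, so later consumers do not need to know which checkpoint in a directory was the intended one.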
If we do this for Lightning, we should probably also update the setup for the other tabs so that the best model is the one logged instead of the latest.
"we should probably also update the setup for the other tabs"
I don't quite get which tabs exactly. For HF we save the best iteration explicitly IIUC, for the General Python API this is irrelevant, and for Keras I'm also not sure we're saving each iteration. Can you please explain, @daavoo?
trainer = Trainer(logger=DVCLiveLogger(save_dvc_exp=True))
trainer.fit(model)
with Live(save_dvc_exp=True) as live:
Let's also add `log_artifact` to the examples in https://dvc.org/doc/dvclive/ml-frameworks.
This will take a while since I'm not that familiar with each of them. Adding this to your comment, and let's do this in a follow-up PR.
using Iterative Studio, watch this tutorial video or read on below:
`dvc.yaml` file in your Git repository. If you are using the [GTO] command line
tool, you can also add models [from the CLI][gto annotate]. To add models using
Iterative Studio, watch this tutorial video or read on below:

https://www.youtube.com/watch?v=szzv4ZXmYAs
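For orientation, here is a small sketch of what a model entry in `dvc.yaml` might look like when written by hand. It assumes the `artifacts` section maps an artifact name to `path` and `type` keys; `render_artifact_entry` and the example name and path are hypothetical and not part of DVC or GTO.

```python
def render_artifact_entry(name: str, path: str, artifact_type: str = "model") -> str:
    """Render a minimal `artifacts` section for dvc.yaml as text.

    Assumed layout: artifacts -> <name> -> path/type. Check the DVC
    documentation for the full set of supported keys.
    """
    return (
        "artifacts:\n"
        f"  {name}:\n"
        f"    path: {path}\n"
        f"    type: {artifact_type}\n"
    )


if __name__ == "__main__":
    # Hypothetical model name and path, for illustration only.
    print(render_artifact_entry("mymodel", "models/model.pkl"))
```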
Do we need to update the video?
Yes, we should. Added that to your comment.
- If you use [MLEM] to save your model, use the path to the binary file that
  MLEM generates. After you have run
  [`mlem init`](https://mlem.ai/doc/command-reference/init), Iterative Studio
  will be able to parse the `.mlem` file to extract model metadata.
- If you use [MLEM] to save your model, use the path to the binary file or
  folder that MLEM generates.
Do we need this much text about what path to use? Seems like the options are either:
- If the model file is in the Git repository (including if it is saved with DVC and/or MLEM), enter the relative path of the model (from the repository root).
- Otherwise, enter the URL to the model file in the cloud.
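The two-option rule above can be sketched as a small helper. This is only an illustration of the decision, not how Iterative Studio validates input; `model_path_for_registry` and the example repository root are hypothetical, and POSIX-style paths are assumed.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def model_path_for_registry(location: str, repo_root: str) -> str:
    """Return the value to enter for a model, following the two-option rule.

    - Cloud URL (has a scheme and a host/bucket): returned unchanged.
    - Path inside the repository: returned relative to the repository root.
    """
    parsed = urlparse(location)
    if parsed.scheme and parsed.netloc:
        return location

    path = PurePosixPath(location)
    if path.is_absolute():
        # relative_to raises ValueError if the file is outside the repository.
        path = path.relative_to(PurePosixPath(repo_root))
    return str(path)
```

A path that is already relative is assumed to be relative to the repository root and is returned as given.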
@aguschin Thoughts on this?
Looks right to me. Fixing!
Thanks @aguschin!
In addition to the inline comments, there's a mention of `artifacts.yaml` in https://dvc.org/doc/studio/get-started#manage-models.
I think we can address those comments and merge, but I wouldn't close #4423, because I think we need to make bigger changes to how we explain the model registry throughout the docs. Let's discuss in #4423.
Co-authored-by: Dave Berenbaum <[email protected]>
@aguschin Could you update this page also?
@dberenbaum, looks like all requested changes are addressed. Let's merge, and I'll start working on the next chunk of changes.
@dberenbaum Please note I enabled auto-merge to merge this once you approve. Thanks to you and @daavoo for the review!
implements part of #4423
https://dvc-org-dvc-artifacts-8epucpc9.herokuapp.com/doc