Using `dvc.yaml` instead of `artifacts.yaml` #4516

aguschin · 2023-05-05T11:53:12Z

implements part of #4423

https://dvc-org-dvc-artifacts-8epucpc9.herokuapp.com/doc

github-actions · 2023-05-05T12:06:41Z

Link Check Report

There were no links to check!

daavoo

Changes look good to me and they technically document all parts.

I am missing some sort of narrative tying things together, though.

Not sure where it would fit*, but it feels that even after these changes it would be hard for someone entering the DVC docs to find out about / understand the model registry.

*According to the metrics in Plausible, people usually flow from root -> get started -> user guide / command reference, with fewer views at each step.

content/docs/dvclive/index.md

content/docs/start/experiments/experiment-tracking.md

dberenbaum · 2023-05-05T17:44:52Z

content/docs/start/experiments/experiment-tracking.md

+        default_root_dir="mymodel"
+    )
+    trainer.fit(model)
+    live.log_artifact("mymodel", type="model")


Not sure this pattern is that helpful for model registry since it will save the full directory of checkpoints. WDYT about something like this:

with Live(save_dvc_exp=True) as live:
    checkpoint = pl.callbacks.ModelCheckpoint(dirpath="mymodel")
    trainer = Trainer(
        logger=DVCLiveLogger(save_dvc_exp=True),
        callbacks=checkpoint
    )
    trainer.fit(model)
    live.log_artifact(checkpoint.best_model_path, type="model")

If we do this for lightning, we should probably also update the setup for the other tabs so the best model is the one logged instead of the latest
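To make the "best versus latest" distinction concrete, here is a minimal, library-free sketch. The `Checkpoint` class, the `best_checkpoint_path` helper, the file names, and the loss values are all hypothetical and exist only for illustration; Lightning's `ModelCheckpoint` does its own tracking behind `best_model_path`, and this is not its implementation.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Checkpoint:
    """Hypothetical record of one saved checkpoint and its validation loss."""
    path: str
    val_loss: float


def best_checkpoint_path(checkpoints: Sequence[Checkpoint]) -> str:
    """Return the path of the checkpoint with the lowest validation loss."""
    if not checkpoints:
        raise ValueError("no checkpoints were saved")
    return min(checkpoints, key=lambda ckpt: ckpt.val_loss).path


# Made-up training history: the last epoch is not the best one.
saved = [
    Checkpoint("mymodel/epoch=0.ckpt", val_loss=0.91),
    Checkpoint("mymodel/epoch=1.ckpt", val_loss=0.42),
    Checkpoint("mymodel/epoch=2.ckpt", val_loss=0.57),
]

best = best_checkpoint_path(saved)  # single file to log as the model artifact
latest = saved[-1].path             # what "log the latest" would pick instead
```

Logging `best` registers a single file, whereas logging the `mymodel` directory would register every checkpoint in it.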

we should probably also update the setup for the other tabs

I don't quite get which tabs exactly. For HF we save the best iteration explicitly IIUC, for the General Python API this is irrelevant, and for Keras I'm also not sure we're saving each iteration. Can you please explain, @daavoo?

dberenbaum · 2023-05-05T17:46:12Z

content/docs/start/experiments/experiment-tracking.md

-
-trainer = Trainer(logger=DVCLiveLogger(save_dvc_exp=True))
-trainer.fit(model)
+with Live(save_dvc_exp=True) as live:


Let's also add log_artifact to the examples in https://dvc.org/doc/dvclive/ml-frameworks.

This will take a while since I'm not that familiar with each of them. Adding this to your comment, and let's do this in a follow-up PR.

content/docs/user-guide/experiment-management/index.md

content/docs/user-guide/project-structure/dvcyaml-files.md

dberenbaum · 2023-05-05T18:17:14Z

content/docs/studio/user-guide/model-registry/add-a-model.md

-using Iterative Studio, watch this tutorial video or read on below:
+`dvc.yaml` file in your Git repository. If you are using the [GTO] command line
+tool, you can also add models [from the CLI][gto annotate]. To add models using
+Iterative Studio, watch this tutorial video or read on below:

 https://www.youtube.com/watch?v=szzv4ZXmYAs


Do we need to update the video?

Yes, we should. Added that to your comment.

dberenbaum · 2023-05-05T18:21:09Z

content/docs/studio/user-guide/model-registry/add-a-model.md

-   - If you use [MLEM] to save your model, use the path to the binary file that
-     MLEM generates. After you have run
-     [`mlem init`](https://mlem.ai/doc/command-reference/init), Iterative Studio
-     will be able to parse the `.mlem` file to extract model metadata.
+   - If you use [MLEM] to save your model, use the path to the binary file or
+     folder that MLEM generates.


Do we need this much text about what path to use? Seems like the options are either:

- If the model file is in the Git repository (including if it is saved with DVC and/or MLEM), enter the relative path of the model (from the repository root).
- Otherwise, enter the URL to the model file in the cloud.
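The two options above could be sketched as a small check. This is only an illustration of the rule under stated assumptions: `classify_model_location` is a hypothetical helper, not part of Iterative Studio, DVC, or GTO, and treating "scheme plus host/bucket" as the marker of a cloud URL is an assumption made here.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def classify_model_location(location: str) -> str:
    """Return "url" for a cloud URL, or "repo_path" for a path relative
    to the repository root. Hypothetical helper for illustration only."""
    if not location:
        raise ValueError("model location must not be empty")

    parsed = urlparse(location)
    # Assumption: a scheme plus a host/bucket (s3://bucket/..., https://host/...)
    # marks a model stored in the cloud.
    if parsed.scheme and parsed.netloc:
        return "url"

    path = PurePosixPath(location)
    # A path inside the Git repository is given relative to the repository
    # root, so absolute paths and paths that step outside it are rejected.
    if path.is_absolute() or ".." in path.parts:
        raise ValueError(f"expected a path relative to the repository root: {location}")
    return "repo_path"
```

For example, `classify_model_location("models/model.pkl")` falls under the first option and `classify_model_location("s3://mybucket/model.pkl")` under the second; both locations are made up.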

@aguschin Thoughts on this?

Looks right to me. Fixing!

dberenbaum

Thanks @aguschin!

In addition to the inline comments, there's a mention of artifacts.yaml in https://dvc.org/doc/studio/get-started#manage-models.

I think we can address those comments and merge, but I wouldn't close #4423 because I think we need to make bigger changes to how we explain the model registry throughout the docs, so let's discuss in #4423.

…-artifacts

Co-authored-by: Dave Berenbaum <[email protected]>

dberenbaum · 2023-05-12T17:26:35Z

In addition to the inline comments, there's a mention of artifacts.yaml in https://dvc.org/doc/studio/get-started#manage-models.

@aguschin Could you update this page also?

aguschin · 2023-05-15T10:03:53Z

@dberenbaum, it looks like all requested changes have been addressed. Let's merge, and I'll start working on the next chunk of changes.

aguschin · 2023-05-15T10:05:21Z

@dberenbaum Please note that I enabled auto-merge, so this will merge once you approve. Thanks to you and @daavoo for the review!

Changes were addressed.

initial updates

15d6827

aguschin self-assigned this May 5, 2023

shcheklein temporarily deployed to dvc-org-dvc-artifacts-8epucpc9 May 5, 2023 11:56 Inactive

aguschin requested review from dberenbaum and daavoo May 5, 2023 11:57

daavoo reviewed May 5, 2023
