Using dvc.yaml instead of artifacts.yaml #4516

Merged 11 commits on May 16, 2023
9 changes: 9 additions & 0 deletions content/docs/dvclive/index.md
@@ -42,6 +42,15 @@ live.log_artifact("model.pt")

See `Live.log_artifact()`.

</tab>
<tab title="Models">

```python
live.log_artifact("model.pt", type="model")
```

See `Live.log_artifact()`.

</tab>
<tab title="Images">

2 changes: 1 addition & 1 deletion content/docs/dvclive/live/log_artifact.md
@@ -43,7 +43,7 @@ If `Live` was initialized with `dvcyaml=True` (which is the default), it will
add an [artifact](/doc/user-guide/project-structure/dvcyaml-files#artifacts) and
all the metadata passed as arguments to the corresponding `dvc.yaml`. Passing
`type="model"` will mark it as a `model` for DVC and will make it appear in
[Studio Model Registry](/doc/studio) (coming soon).
[Studio Model Registry](/doc/studio).

If `name` is not provided, the path stem (last part of the path without the file
extension) will be used as the artifact name.
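
As a rough illustration of the default naming rule described above, the path stem can be computed with `pathlib`. This is only a sketch of the rule as worded in the docs: `default_artifact_name` is a hypothetical helper, not part of the DVCLive API, and DVCLive's internal implementation may differ.

```python
from pathlib import Path


def default_artifact_name(path: str) -> str:
    """Return the path stem: the last path component without its final extension."""
    # Hypothetical helper for illustration only; not a DVCLive function.
    return Path(path).stem


print(default_artifact_name("model.pt"))
print(default_artifact_name("models/classifier.pkl"))
print(default_artifact_name("mymodel"))
```

Under this rule, calling `live.log_artifact("model.pt", type="model")` without `name` would presumably register the artifact under the name `model`.
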
32 changes: 22 additions & 10 deletions content/docs/start/experiments/experiment-tracking.md
@@ -38,9 +38,13 @@ There are some examples below
from dvclive.lightning import DVCLiveLogger

...

trainer = Trainer(logger=DVCLiveLogger(save_dvc_exp=True))
trainer.fit(model)
with Live(save_dvc_exp=True) as live:
Contributor:

Let's also add `log_artifact` to the examples in https://dvc.org/doc/dvclive/ml-frameworks.

Contributor Author:

This will take a while since I'm not that familiar with each of them. I'm adding this to your comment; let's do it in a follow-up PR.

    trainer = Trainer(
        logger=DVCLiveLogger(save_dvc_exp=True),
        default_root_dir="mymodel"
    )
    trainer.fit(model)
    live.log_artifact("mymodel", type="model")
Contributor:

I'm not sure this pattern is that helpful for the model registry, since it will save the full directory of checkpoints. What do you think about something like this:

with Live(save_dvc_exp=True) as live:
    checkpoint = pl.callbacks.ModelCheckpoint(dirpath="mymodel")
    trainer = Trainer(
        logger=DVCLiveLogger(save_dvc_exp=True), callbacks=checkpoint
    )
    trainer.fit(model)
    live.log_artifact(checkpoint.best_model_path, type="model")

Contributor:

If we do this for Lightning, we should probably also update the setup for the other tabs so that the best model is the one logged instead of the latest.

Contributor Author:

> we should probably also update the setup for the other tabs

I don't quite get which tabs you mean. For HF we save the best iteration explicitly, if I understand correctly; for the general Python API this is irrelevant; and for Keras I'm also not sure we're saving each iteration. Can you please explain, @daavoo?

```

</tab>
@@ -51,9 +55,11 @@ trainer.fit(model)
from dvclive.huggingface import DVCLiveCallback

...

trainer.add_callback(DVCLiveCallback(save_dvc_exp=True))
trainer.train()
with Live(save_dvc_exp=True) as live:
    trainer.add_callback(DVCLiveCallback(save_dvc_exp=True))
    trainer.train()
    trainer.save_model("mymodel")
    live.log_artifact("mymodel", type="model")
```

</tab>
@@ -64,10 +70,14 @@ trainer.train()
from dvclive.keras import DVCLiveCallback

...

model.fit(
    train_dataset, validation_data=validation_dataset,
    callbacks=[DVCLiveCallback(save_dvc_exp=True)])
with Live(save_dvc_exp=True) as live:
    model.fit(
        train_dataset,
        validation_data=validation_dataset,
        callbacks=[DVCLiveCallback(save_dvc_exp=True)]
    )
    model.save("mymodel")
    live.log_artifact("mymodel", type="model")
```

</tab>
@@ -86,6 +96,8 @@ with Live(save_dvc_exp=True) as live:
for metric_name, value in metrics.items():
    live.log_metric(metric_name, value)
live.next_step()

live.log_artifact("model.pkl", type="model")
```

</tab>
20 changes: 9 additions & 11 deletions content/docs/studio/user-guide/model-registry/add-a-model.md
@@ -2,9 +2,9 @@

You can add models from any ML project to the model registry. To add a model to
your model registry, Iterative Studio creates an annotation for it in an
`artifacts.yaml` file in your Git repository. If you are using the [GTO] command
line tool, you can also add models [from the CLI][gto annotate]. To add models
using Iterative Studio, watch this tutorial video or read on below:
`dvc.yaml` file in your Git repository. If you are using the [GTO] command line
tool, you can also add models [from the CLI][gto annotate]. To add models using
Iterative Studio, watch this tutorial video or read on below:

https://www.youtube.com/watch?v=szzv4ZXmYAs
Contributor:

Do we need to update the video?

Contributor Author (@aguschin, May 12, 2023):

Yes, we should. Added that to your comment.


@@ -28,10 +28,8 @@ https://www.youtube.com/watch?v=szzv4ZXmYAs
project path of the corresponding `.dvc` file.
- If the model file is in remote storage and is not DVC-tracked, enter the
absolute path of the model file.
- If you use [MLEM] to save your model, use the path to the binary file that
MLEM generates. After you have run
[`mlem init`](https://mlem.ai/doc/command-reference/init), Iterative Studio
will be able to parse the `.mlem` file to extract model metadata.
- If you use [MLEM] to save your model, use the path to the binary file or
folder that MLEM generates.
Contributor:

Do we need this much text about what path to use? It seems like there are only two options:

  1. If the model file is in the Git repository (including if it is saved with DVC and/or MLEM), enter the relative path of the model (from the repository root).
  2. Otherwise, enter the URL to the model file in the cloud.

Contributor:

@aguschin Thoughts on this?

Contributor Author:

Looks right to me. Fixing!


If the path you entered is a cloud path, Iterative Studio will ask you for
the repository path where the dvc reference to the model should be saved.
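
The two cases proposed in the review thread above (a path inside the repository versus a URL in the cloud) can be sketched as a small check. This is illustrative only: `classify_model_path` and the list of URL schemes are assumptions made for this example, not a description of how Iterative Studio decides.

```python
from urllib.parse import urlparse

# Assumed set of cloud URL schemes; adjust it to the storage providers you use.
CLOUD_SCHEMES = {"s3", "gs", "azure", "http", "https"}


def classify_model_path(path: str) -> str:
    """Return "cloud" for a URL with a known scheme, otherwise "repository"."""
    # Hypothetical helper: a plain relative path has an empty scheme.
    scheme = urlparse(path).scheme.lower()
    return "cloud" if scheme in CLOUD_SCHEMES else "repository"


print(classify_model_path("models/model.pkl"))
print(classify_model_path("s3://my-bucket/models/model.pkl"))
```

A path classified as "cloud" corresponds to the case where Studio additionally asks for the repository path in which to save the DVC reference.
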
@@ -54,10 +52,10 @@ https://www.youtube.com/watch?v=szzv4ZXmYAs
8. At this point, the new model appears in the models dashboard.

9. In your Git repository, you will find that an entry for the new model has
been created in the `artifacts.yaml` file in the repository's root. If you
had committed to a new branch, a new pull request (or merge request in the
case of GitLab) will also have been created to merge the new branch into the
base branch.
been created in the `dvc.yaml` file in the repository's root. If you had
committed to a new branch, a new pull request (or merge request in the case
of GitLab) will also have been created to merge the new branch into the base
branch.
10. If you had added a model from a cloud storage, the following will also
happen before the commit is created:

@@ -54,7 +54,7 @@ Note that while you can get the basic Model Registry functionality within
Iterative Studio, there are more things you can do using the [MLEM] and [GTO]
command line interface (CLI). For example, to save and deploy models, you will
need to use MLEM, although future iterations of the Model Registry may
incorporate these tasks also. Similarly, you can use GTO in your CI/CD actions
incorporate these tasks also. Similarly, you can use [GTO] in your CI/CD actions
to interpret Git tags for deploying the models to the desired environment.

[semantic versioning]: https://semver.org/
12 changes: 11 additions & 1 deletion content/docs/user-guide/experiment-management/index.md
@@ -46,7 +46,14 @@ To save an experiment, you can follow one of these roads:
Experiments are saved locally by default but you can [share] them so that anyone
can reproduce your work.

## Metrics, plots, and parameters
## Datasets and models

DVC can track datasets or models as part of your repo. One way to let DVC know
the specific artifact is a model or a dataset is to use [DVCLive]. You can also
manually add them to `dvc.yaml`. For models, you'll see them appear in [Studio
Model Registry].
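
A manually added entry might look roughly like the output of the sketch below. It assumes the `artifacts` section of `dvc.yaml` maps an artifact name to at least `path` and `type` keys; the helper `render_artifact_entry`, the artifact name, and the file path are hypothetical, and plain string formatting is used only to keep the example free of dependencies.

```python
def render_artifact_entry(name: str, path: str, artifact_type: str) -> str:
    """Render a minimal `artifacts` section as YAML text."""
    # Hypothetical helper: builds the text by hand instead of using a YAML library.
    lines = [
        "artifacts:",
        f"  {name}:",
        f"    path: {path}",
        f"    type: {artifact_type}",
    ]
    return "\n".join(lines) + "\n"


print(render_artifact_entry("mymodel", "models/model.pkl", "model"), end="")
```

Using `type: model` is what marks the entry as a model, so this is the value to use if the artifact should show up in the model registry.
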

## Metrics, plots, parameters

DVC can track and compare <abbr>parameters</abbr>, <abbr>metrics</abbr>, and
<abbr>plots</abbr> data saved in standard structured files like YAML, JSON, and
@@ -74,9 +81,12 @@ https://www.youtube.com/watch?v=LHi3SWGD9nc
[pipeline]: /doc/user-guide/pipelines
[run]: /doc/user-guide/experiment-management/running-experiments
[share]: /doc/user-guide/experiment-management/sharing-experiments
[artifacts]: /doc/user-guide/project-structure/dvcyaml-files#artifacts
[parameters]: /doc/user-guide/project-structure/dvcyaml-files#params
[metrics]: /doc/user-guide/project-structure/dvcyaml-files#metrics
[plots]: /doc/user-guide/project-structure/dvcyaml-files#plots
[visualize plots]: /doc/user-guide/experiment-management/visualizing-plots
[from the vs code ide]: /doc/vs-code-extension
[iterative studio]: /doc/studio
[studio model registry]:
/doc/studio/user-guide/model-registry/what-is-a-model-registry
5 changes: 2 additions & 3 deletions content/docs/user-guide/project-structure/dvcyaml-files.md
@@ -20,8 +20,7 @@ Although you can specify artifacts of any `type`, we are in the process of
building a DVC-based [model registry](/doc/use-cases/model-registry) that will
pick up any artifacts with type `model`. Additionally, they will be picked up
and supported by
[Studio Model Registry](/doc/studio/user-guide/model-registry/what-is-a-model-registry)
(coming soon).
[Studio Model Registry](/doc/studio/user-guide/model-registry/what-is-a-model-registry).

```yaml
artifacts:
@@ -37,7 +36,7 @@ artifacts:
```

Artifact IDs must consist of letters and numbers, and use '-' as separator (but
not at the start or end). The first character must be a letter.
not at the start or end).
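
The updated rule (without the first-letter requirement) could be checked with a regular expression along these lines. This is a sketch under stated assumptions: the text does not say whether consecutive hyphens or non-ASCII letters are allowed, so the pattern below accepts ASCII letters and digits only and rejects doubled hyphens; `is_valid_artifact_id` is a hypothetical helper, not DVC's own validator.

```python
import re

# Assumption: one or more alphanumeric groups joined by single hyphens.
ARTIFACT_ID = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")


def is_valid_artifact_id(artifact_id: str) -> bool:
    """Check an ID against the rule: letters and digits, '-' only as an inner separator."""
    return ARTIFACT_ID.fullmatch(artifact_id) is not None


for candidate in ["my-model", "2nd-model", "-model", "model-", "my_model"]:
    print(candidate, is_valid_artifact_id(candidate))
```

Note that `2nd-model` passes here only because the sentence requiring a leading letter was dropped in this change.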

## Metrics
