Using dvc.yaml instead of artifacts.yaml #4516

Merged
merged 11 commits into from
May 16, 2023
2 changes: 1 addition & 1 deletion content/docs/dvclive/index.md
Expand Up @@ -37,7 +37,7 @@ Including `save_dvc_exp=True` will automatically
<tab title="Artifacts">

```python
live.log_artifact("model.pt")
live.log_artifact("model.pt", type="model", name="gpt")
```

See `Live.log_artifact()`.
Expand Down
11 changes: 10 additions & 1 deletion content/docs/dvclive/live/log_artifact.md
Expand Up @@ -43,16 +43,25 @@ If `Live` was initialized with `dvcyaml=True` (which is the default), it will
add an [artifact](/doc/user-guide/project-structure/dvcyaml-files#artifacts) and
all the metadata passed as arguments to the corresponding `dvc.yaml`. Passing
`type="model"` will mark it as a `model` for DVC and will make it appear in
[Studio Model Registry](/doc/studio) (coming soon).
[Studio Model Registry](/doc/studio).

## Parameters

- `path` - an existing directory or file.

- `type` - an optional type of the artifact. Common types are `model` or
`dataset`.

- `name` - an optional custom name of an artifact. If not provided, the path
stem (last part of the path without the file extension) will be used as the
artifact name.

- `desc` - an optional description of an artifact.

- `labels` - optional labels describing the artifact.

- `meta` - optional metainformation in `key: value` format.

- `copy` - copy a directory or file at `path` into the `dvclive/artifacts`
location ([default](/doc/dvclive/how-it-works#directory-structure)) before
tracking it. The new path is used instead of the original one to track the
Expand Down
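The parameter list above can be pictured with a minimal sketch, using only the standard library, of how the arguments might map to an entry under `artifacts` in `dvc.yaml`. The helper `artifact_entry` is hypothetical and is not part of DVCLive; the key names simply mirror the parameter names, the default name follows the path-stem rule described for `name`, and `copy` and any path rewriting are not modeled.

```python
from pathlib import PurePosixPath


def artifact_entry(path, type=None, name=None, desc=None, labels=None, meta=None):
    """Hypothetical helper: build an `artifacts` mapping from log_artifact-style arguments."""
    # Default name: the path stem (last path component without its extension).
    artifact_name = name if name is not None else PurePosixPath(path).stem
    entry = {"path": path}
    optional = {"type": type, "desc": desc, "labels": labels, "meta": meta}
    # Only metadata that was actually passed is recorded.
    entry.update({key: value for key, value in optional.items() if value is not None})
    return {artifact_name: entry}


print(artifact_entry("model.pt", type="model", name="gpt"))
print(artifact_entry("models/model.pkl", type="model"))
```

With an explicit `name` the entry is keyed by that name; without one, `models/model.pkl` is keyed by `model`.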
49 changes: 38 additions & 11 deletions content/docs/start/experiments/experiment-tracking.md
Expand Up @@ -38,9 +38,21 @@ There are some examples below
from dvclive.lightning import DVCLiveLogger

...

trainer = Trainer(logger=DVCLiveLogger(save_dvc_exp=True))
trainer.fit(model)
with Live(save_dvc_exp=True) as live:
Contributor

Let's also add log_artifact to the examples in https://dvc.org/doc/dvclive/ml-frameworks.

Contributor Author

This will take a while since I'm not that familiar with each of them. Adding this to your comment, and let's do this in a follow-up PR.

    checkpoint = ModelCheckpoint(dirpath="mymodel")
    trainer = Trainer(
        logger=DVCLiveLogger(
            save_dvc_exp=True,
            experiment=live
        ),
        callbacks=checkpoint
    )
    trainer.fit(model)
    live.log_artifact(
        checkpoint.best_model_path,
        type="model",
        name="lightning-model"
    )
```

</tab>
Expand All @@ -51,9 +63,13 @@ trainer.fit(model)
from dvclive.huggingface import DVCLiveCallback

...

trainer.add_callback(DVCLiveCallback(save_dvc_exp=True))
trainer.train()
with Live(save_dvc_exp=True) as live:
    trainer.add_callback(
        DVCLiveCallback(save_dvc_exp=True, live=live)
    )
    trainer.train()
    trainer.save_model("mymodel")
    live.log_artifact("mymodel", type="model")
```

</tab>
Expand All @@ -64,10 +80,16 @@ trainer.train()
from dvclive.keras import DVCLiveCallback

...

model.fit(
train_dataset, validation_data=validation_dataset,
callbacks=[DVCLiveCallback(save_dvc_exp=True)])
with Live(save_dvc_exp=True) as live:
    model.fit(
        train_dataset,
        validation_data=validation_dataset,
        callbacks=[
            DVCLiveCallback(save_dvc_exp=True, live=live)
        ]
    )
    model.save("mymodel")
    live.log_artifact("mymodel", type="model")
```

</tab>
Expand All @@ -86,6 +108,8 @@ with Live(save_dvc_exp=True) as live:
        for metric_name, value in metrics.items():
            live.log_metric(metric_name, value)
        live.next_step()

    live.log_artifact("model.pkl", type="model")
```
dberenbaum marked this conversation as resolved.

</tab>
Expand All @@ -99,7 +123,10 @@ containing the results and the changes needed to reproduce it.
Framework and any
[data tracked by DVC](/doc/start/data-management/data-versioning) but you can
also [log additional info](/doc/dvclive#log-data) to be included in the
experiment.
experiment. `live.log_artifact("mymodel", type="model")` will
[track your model with DVC](/doc/dvclive/live/log_artifact) and enable managing
it with
[Studio Model Registry](/doc/studio/user-guide/model-registry/what-is-a-model-registry).

<admon type="info">

Expand Down
2 changes: 1 addition & 1 deletion content/docs/studio/get-started.md
Expand Up @@ -74,7 +74,7 @@ details]).
## Manage models

1. Click on the `Models` tab to open the central [Models dashboard]. Iterative
Studio uses your project's `artifacts.yaml` file to identify ML models and
Studio uses your project's `dvc.yaml` files to identify ML models and
specially formatted Git tags to identify model versions and stage
assignments.

Expand Down
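The change above says that models are identified from `dvc.yaml` files. As a rough illustration only, the sketch below filters a parsed `dvc.yaml` mapping for artifacts declared with `type: model`. The function name and the sample data are hypothetical, the mapping stands in for the output of a YAML parser, and Studio's actual discovery logic (including the Git tags used for versions and stages) is not shown in this PR and is not modeled here.

```python
def model_artifacts(dvc_yaml):
    """Return the artifacts declared with type `model` in a parsed dvc.yaml mapping."""
    # A dvc.yaml without an `artifacts` section yields no models.
    artifacts = dvc_yaml.get("artifacts") or {}
    return {
        name: spec
        for name, spec in artifacts.items()
        if spec.get("type") == "model"
    }


# Hypothetical parsed content of a dvc.yaml file.
parsed = {
    "artifacts": {
        "gpt": {"path": "model.pt", "type": "model"},
        "raw": {"path": "data/raw", "type": "dataset"},
    }
}
print(sorted(model_artifacts(parsed)))
```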
48 changes: 21 additions & 27 deletions content/docs/studio/user-guide/model-registry/add-a-model.md
@@ -1,10 +1,10 @@
# Add a model

You can add models from any ML project to the model registry. To add a model to
your model registry, Iterative Studio creates an annotation for it in an
`artifacts.yaml` file in your Git repository. If you are using the [GTO] command
line tool, you can also add models [from the CLI][gto annotate]. To add models
using Iterative Studio, watch this tutorial video or read on below:
your model registry, Iterative Studio creates an annotation for it in a
`dvc.yaml` file in your Git repository. If you are using the [GTO] command line
tool, you can also add models [from the CLI][gto annotate]. To add models using
Iterative Studio, watch this tutorial video or read on below:

https://www.youtube.com/watch?v=szzv4ZXmYAs
Contributor

Do we need to update the video?

Contributor Author

@aguschin aguschin May 12, 2023

Yes, we should. Added that to your comment.


Expand All @@ -22,16 +22,10 @@ https://www.youtube.com/watch?v=szzv4ZXmYAs

3. Enter the path of the model file as follows:

- If the model file is in the Git repository, enter the relative path of the
model (from the repository root).
- If the model file is in remote storage but is DVC-tracked, enter the
project path of the corresponding `.dvc` file.
- If the model file is in remote storage and is not DVC-tracked, enter the
absolute path of the model file.
- If you use [MLEM] to save your model, use the path to the binary file that
MLEM generates. After you have run
[`mlem init`](https://mlem.ai/doc/command-reference/init), Iterative Studio
will be able to parse the `.mlem` file to extract model metadata.
- If the model file is in the Git repository (including if it is saved with
DVC and/or [MLEM]), enter the relative path of the model (from the
repository root).
- Otherwise, enter the URL to the model file in the cloud.

If the path you entered is a cloud path, Iterative Studio will ask you for
the repository path where the DVC reference to the model should be saved.
Expand All @@ -54,22 +48,22 @@ https://www.youtube.com/watch?v=szzv4ZXmYAs
8. At this point, the new model appears in the models dashboard.

9. In your Git repository, you will find that an entry for the new model has
been created in the `artifacts.yaml` file in the repository's root. If you
had committed to a new branch, a new pull request (or merge request in the
case of GitLab) will also have been created to merge the new branch into the
base branch.
been created in the `dvc.yaml` file in the repository's root. If you had
committed to a new branch, a new pull request (or merge request in the case
of GitLab) will also have been created to merge the new branch into the base
branch.
10. If you added a model from cloud storage, the following will also happen
before the commit is created:

- If the repository does not contain DVC, Iterative Studio will run `dvc init`.
It is needed to version the model in the git repository.
[Learn more](/doc/command-reference/init).
- If the specified directory does not exist yet, it will be created.
- Iterative Studio will import the model to the repository by executing
`dvc import-url <remote_path> <directory_path>/<filename from remote_path> --no-exec`.
- Iterative Studio annotate the model by executing
`gto annotate <model_name> --path <directory_path>/<filename from remote_path> --type model`.
[Learn more][gto annotate].
- If the repository does not contain DVC, Iterative Studio will run
`dvc init`. This is needed to version the model in the Git repository.
[Learn more](/doc/command-reference/init).
- If the specified directory does not exist yet, it will be created.
- Iterative Studio will import the model to the repository by executing
`dvc import-url <remote_path> <directory_path>/<filename from remote_path> --no-exec`.
- Iterative Studio will annotate the model by executing
`gto annotate <model_name> --path <directory_path>/<filename from remote_path> --type model`.
[Learn more][gto annotate].
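
The two command templates in the list above can be filled in mechanically. The sketch below is a hypothetical helper, not Studio's implementation: it derives `<filename from remote_path>` from the last component of the URL path and returns both commands as argument lists. The bucket, directory, and model name are made-up values for illustration.

```python
import posixpath
from urllib.parse import urlparse


def studio_import_commands(remote_path, directory_path, model_name):
    """Hypothetical helper: fill in the two command templates listed above."""
    # <filename from remote_path>: the last component of the URL path.
    filename = posixpath.basename(urlparse(remote_path).path)
    target = posixpath.join(directory_path, filename)
    return [
        ["dvc", "import-url", remote_path, target, "--no-exec"],
        ["gto", "annotate", model_name, "--path", target, "--type", "model"],
    ]


for command in studio_import_commands(
    "s3://mybucket/models/model.pkl", "models", "my-model"
):
    print(" ".join(command))
```

Returning argument lists rather than shell strings keeps the sketch safe to pass to `subprocess.run` without quoting concerns.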

[connected repository]:
/doc/studio/user-guide/projects-and-experiments/create-a-project
Expand Down
Expand Up @@ -54,7 +54,7 @@ Note that while you can get the basic Model Registry functionality within
Iterative Studio, there are more things you can do using the [MLEM] and [GTO]
command line interface (CLI). For example, to save and deploy models, you will
need to use MLEM, although future iterations of the Model Registry may
incorporate these tasks also. Similarly, you can use GTO in your CI/CD actions
incorporate these tasks also. Similarly, you can use [GTO] in your CI/CD actions
to interpret Git tags for deploying the models to the desired environment.

[semantic versioning]: https://semver.org/
Expand Down
12 changes: 11 additions & 1 deletion content/docs/user-guide/experiment-management/index.md
Expand Up @@ -46,7 +46,7 @@ To save an experiment, you can follow one of these roads:
Experiments are saved locally by default but you can [share] them so that anyone
can reproduce your work.

## Metrics, plots, and parameters
## Metrics, plots, parameters

DVC can track and compare <abbr>parameters</abbr>, <abbr>metrics</abbr>, and
<abbr>plots</abbr> data saved in standard structured files like YAML, JSON, and
Expand All @@ -56,6 +56,13 @@ parameters, metrics, and plots (and to automatically configure them) is with
metafiles to specify which files are [parameters], [metrics], or [plots] (and to
specify how to [visualize plots]).

## Models and datasets

DVC can track models or datasets as part of your repo, and you can manage those
models with [Studio Model Registry]. One way to log models or other artifacts is
with [DVCLive]. You can also track them with `dvc add` and declare metadata for
the [Studio Model Registry] in [`dvc.yaml`][artifacts].

## Work with DVC Experiments from a GUI

DVC Experiments can be used directly [from the VS Code IDE] or online with
Expand All @@ -74,9 +81,12 @@ https://www.youtube.com/watch?v=LHi3SWGD9nc
[pipeline]: /doc/user-guide/pipelines
[run]: /doc/user-guide/experiment-management/running-experiments
[share]: /doc/user-guide/experiment-management/sharing-experiments
[artifacts]: /doc/user-guide/project-structure/dvcyaml-files#artifacts
[parameters]: /doc/user-guide/project-structure/dvcyaml-files#params
[metrics]: /doc/user-guide/project-structure/dvcyaml-files#metrics
[plots]: /doc/user-guide/project-structure/dvcyaml-files#plots
[visualize plots]: /doc/user-guide/experiment-management/visualizing-plots
[from the vs code ide]: /doc/vs-code-extension
[iterative studio]: /doc/studio
[studio model registry]:
/doc/studio/user-guide/model-registry/what-is-a-model-registry
5 changes: 2 additions & 3 deletions content/docs/user-guide/project-structure/dvcyaml-files.md
Expand Up @@ -20,8 +20,7 @@ Although you can specify artifacts of any `type`, we are in the process of
building a DVC-based [model registry](/doc/use-cases/model-registry) that will
pick up any artifacts with type `model`. Additionally, they will be picked up
and supported by
[Studio Model Registry](/doc/studio/user-guide/model-registry/what-is-a-model-registry)
(coming soon).
[Studio Model Registry](/doc/studio/user-guide/model-registry/what-is-a-model-registry).

```yaml
artifacts:
Expand All @@ -37,7 +36,7 @@ artifacts:
```

Artifact IDs must consist of letters and numbers, and use '-' as separator (but
not at the start or end). The first character must be a letter.
not at the start or end).
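
The artifact ID rule above can be sketched as a regular expression. This is an assumption about how the rule might be checked, not DVC's actual validation: it treats "letters and numbers" as ASCII alphanumerics, allows only single hyphens between groups, and does not enforce the first-character rule that this change removes.

```python
import re

# Assumed pattern: ASCII alphanumeric groups joined by single hyphens,
# so a hyphen can never appear at the start or end.
ARTIFACT_ID = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")


def is_valid_artifact_id(artifact_id):
    """Return True if the ID matches the assumed pattern in full."""
    return ARTIFACT_ID.fullmatch(artifact_id) is not None


for candidate in ["lightning-model", "gpt", "-model", "model-", "my_model"]:
    print(candidate, is_valid_artifact_id(candidate))
```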

## Metrics

Expand Down