Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using dvc.yaml instead of artifacts.yaml #4516

Merged
merged 11 commits into from
May 16, 2023
Merged

Using dvc.yaml instead of artifacts.yaml #4516

merged 11 commits into from
May 16, 2023

Conversation

aguschin
Copy link
Contributor

@aguschin aguschin commented May 5, 2023

@aguschin aguschin self-assigned this May 5, 2023
@shcheklein shcheklein temporarily deployed to dvc-org-dvc-artifacts-8epucpc9 May 5, 2023 11:56 Inactive
@aguschin aguschin requested review from dberenbaum and daavoo May 5, 2023 11:57
@github-actions
Copy link
Contributor

github-actions bot commented May 5, 2023

Link Check Report

There were no links to check!

Copy link
Contributor

@daavoo daavoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good to me and they technically document all parts.

I am missing some sort of narrative tying things together, though.

Not sure where it would fit*, but it feels that even after these changes it would be hard for someone entering the DVC docs to find out about / understand the model registry.


*According to the metrics in plausible, people usually flow from root->get started->user guide/cmd ref in a decreasing number of views.

content/docs/dvclive/index.md Outdated Show resolved Hide resolved
Comment on lines 44 to 47
default_root_dir="mymodel"
)
trainer.fit(model)
live.log_artifact("mymodel", type="model")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure this pattern is that helpful for model registry since it will save the full directory of checkpoints. WDYT about something like this:

with Live(save_dvc_exp=True) as live:
    checkpoint = pl.callbacks.ModelCheckpoint(dirpath="mymodel")
    trainer = Trainer(
        logger=DVCLiveLogger(save_dvc_exp=True), callbacks=checkpoint
    )
    trainer.fit(model)
    live.log_artifact(checkpoint.best_model_path, type="model")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do this for lightning, we should probably also update the setup for the other tabs so the best model is the one logged instead of the latest

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should probably also update the setup for the other tabs

Not quite get for which tabs exactly. For HF we save best iteration explicitly IIUC, for General Python API this is irrelevant, and for keras I'm also not sure we're saving each iteration. Can you please explain @daavoo ?


trainer = Trainer(logger=DVCLiveLogger(save_dvc_exp=True))
trainer.fit(model)
with Live(save_dvc_exp=True) as live:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also add log_artifact to the examples in https://dvc.org/doc/dvclive/ml-frameworks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will take a while since I'm not that familiar with each of them. Adding this to your comment, and let's do this in a follow-up PR.

using Iterative Studio, watch this tutorial video or read on below:
`dvc.yaml` file in your Git repository. If you are using the [GTO] command line
tool, you can also add models [from the CLI][gto annotate]. To add models using
Iterative Studio, watch this tutorial video or read on below:

https://www.youtube.com/watch?v=szzv4ZXmYAs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to update the video?

Copy link
Contributor Author

@aguschin aguschin May 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should. Added that to your comment.

Comment on lines 31 to 32
- If you use [MLEM] to save your model, use the path to the binary file that
MLEM generates. After you have run
[`mlem init`](https://mlem.ai/doc/command-reference/init), Iterative Studio
will be able to parse the `.mlem` file to extract model metadata.
- If you use [MLEM] to save your model, use the path to the binary file or
folder that MLEM generates.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this much text about what path to use? Seems like the options are either:

  1. If the model file is in the Git repository (including if it is saved with DVC and/or MLEM), enter the relative path of the model (from the repository root).
  2. Otherwise, enter the URL to the model file in the cloud.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aguschin Thoughts on this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks right to me. Fixing!

Copy link
Contributor

@dberenbaum dberenbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @aguschin!

In addition to the inline comments, there's a mention of artifacts.yaml in https://dvc.org/doc/studio/get-started#manage-models.

I think we can address those comments and merge, but I wouldn't close #4423 because I think we need to make bigger changes how we explain the model registry throughout the docs, so let's discuss in #4423.

@dberenbaum dberenbaum mentioned this pull request May 5, 2023
13 tasks
@shcheklein shcheklein temporarily deployed to dvc-org-dvc-artifacts-8epucpc9 May 12, 2023 08:17 Inactive
@shcheklein shcheklein temporarily deployed to dvc-org-dvc-artifacts-8epucpc9 May 12, 2023 08:31 Inactive
@shcheklein shcheklein temporarily deployed to dvc-org-dvc-artifacts-8epucpc9 May 12, 2023 11:28 Inactive
@dberenbaum
Copy link
Contributor

In addition to the inline comments, there's a mention of artifacts.yaml in https://dvc.org/doc/studio/get-started#manage-models.

@aguschin Could you update this page also?

@shcheklein shcheklein temporarily deployed to dvc-org-dvc-artifacts-8epucpc9 May 15, 2023 08:08 Inactive
@shcheklein shcheklein temporarily deployed to dvc-org-dvc-artifacts-8epucpc9 May 15, 2023 10:02 Inactive
@aguschin
Copy link
Contributor Author

@dberenbaum, looks like all requested changes are addressed. Let's merge and I'm starting to work on the next chunk of changes.

@aguschin aguschin enabled auto-merge (squash) May 15, 2023 10:04
@aguschin
Copy link
Contributor Author

@dberenbaum Please note I enabled auto-merge to merge this once you approve. Thank you and @daavoo for the review!

@shcheklein shcheklein temporarily deployed to dvc-org-dvc-artifacts-8epucpc9 May 15, 2023 12:50 Inactive
@shcheklein shcheklein temporarily deployed to dvc-org-dvc-artifacts-8epucpc9 May 16, 2023 09:38 Inactive
@daavoo daavoo requested a review from dberenbaum May 16, 2023 09:38
@aguschin aguschin dismissed dberenbaum’s stale review May 16, 2023 09:39

Changes were addressed.

@aguschin aguschin disabled auto-merge May 16, 2023 09:39
@aguschin aguschin enabled auto-merge (squash) May 16, 2023 10:00
@aguschin aguschin merged commit a3303c8 into main May 16, 2023
@aguschin aguschin deleted the dvc-artifacts branch May 16, 2023 16:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants