Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

log_artifact #378

Closed
dberenbaum opened this issue Nov 30, 2022 · 9 comments · Fixed by #464
Closed

log_artifact #378

dberenbaum opened this issue Nov 30, 2022 · 9 comments · Fixed by #464
Labels
A: log_artifact Area: `live.log_artifact` feature request p1-important Include in the next sprint

Comments

@dberenbaum
Copy link
Collaborator

Similar to https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.log_artifact, dvclive can have a log_artifact method, which can do dvc add to track an artifact.

@shcheklein
Copy link
Member

@tapadipti @dmpetrov @aguschin I mentioned this in the conversation on the model registry. I think this workflow should be supported. It's a way for people to dump their models to Studio, or to DVC first and connect in Studio, etc. I think this is the basic approach that MLFlow, W&B, etc are taking and it makes sense to me:

  • it's aligned with DVCLive efforts and Live metrics, etc.
  • It's aligned with simplifying MR experience - we can show a single line people would need to copy into their Notebook to start
  • it's even aligned with an effort of disconnecting it from Git to the same extent as Live experiments

Related - #305 (but I would not be making this about MLEM - I think it's an optional advanced feature for now there). log_model can exist by itself.

@daavoo
Copy link
Contributor

daavoo commented Dec 1, 2022

do dvc add to track an artifact.

🤔 top-level outputs so that they can be added to the dvc.yaml of #366 ?

@tapadipti
Copy link

So a dependency for this would be that the user has dvc set up, right? Should it also be possible to log_artifact without dvc tracking (like log_metric)?

For registering new versions of an existing artifact, we'll need an option to provide an existing artifact identifier (and possibly a new version number).

log_model can exist by itself

It could also be a type=model option to log_artifact, right? We will need the artifact type for the model registry anyway.

@shcheklein some thoughts about this simplifying the MR experience:

It's aligned with simplifying MR experience - we can show a single line people would need to copy into their Notebook to start

The users would have to leave the Studio UI and edit their ML code/training to start registering models. And they cannot register models after they have been created. Currently, there's dependency on GTO; with this there will be dependency on DVCLive. Also, they'll still need to add the Git repo in Studio (assuming we don't provide a Studio server endpoint to log artifacts). So, it'll probably not be simple.

it's even aligned with an effort of disconnecting it from Git to the same extent as Live experiments

Are you suggesting that we will provide a Studio server endpoint for all users to log their artifacts to? And Studio will maintain a database of all artifacts instead of users saving it to their Git repos? (This is how the live metrics are saved).

@shcheklein
Copy link
Member

Are you suggesting that we will provide a Studio server endpoint for all users to log their artifacts to? And Studio will maintain a database of all artifacts instead of users saving it to their Git repos? (This is how the live metrics are saved).

Yes, it should be an available alternative for this.

@dberenbaum
Copy link
Collaborator Author

@shcheklein @tapadipti I was thinking the starting point here would be to still rely on DVC and Git here and save it locally by starting to track the artifact with DVC.

Some follow-up steps could be:

  1. Providing some Studio storage so a user could dvc push the artifact without setting up their own cloud remote.
  2. Auto-push the artifact along with metrics, parameters, and plots. Does it make sense to use dvc exp push functionality at that point, or do we want two different ways to push experiment results?

What do you think?

🤔 top-level outputs so that they can be added to the dvc.yaml of #366 ?

@daavoo I'm fine to start with saving in .dvc files, especially since the artifacts themselves aren't likely to be inside the dvclive directory and it should be easier than adding a new dvc.yaml section and dvclive-based dvc.lock.

@dberenbaum
Copy link
Collaborator Author

dberenbaum commented Feb 6, 2023

Lowering to p2 since I think we still need to better scope what's needed here, and it doesn't break any current workflows

@daavoo
Copy link
Contributor

daavoo commented Feb 7, 2023

Lowering to p2 since I think we still need to better scope what's needed here, and it doesn't break any current workflows

The save_dvc_exp feels broken/incomplete to me without something like log_artifact (even just with a very limited scope based on dvc add + ensure .dvc is in include_untracked), mostly because of the scenario described in iterative/example-repos-dev#171

@tapadipti
Copy link

This will also be needed at some point to simplify the model registry onboarding experience (although we've not scoped it out yet).

@dberenbaum
Copy link
Collaborator Author

even just with a very limited scope based on dvc add + ensure .dvc is in include_untracked

Okay, let's add it back if it's needed there with the limited scope you mention for now.

I'd also like to address iterative/dvc#8986 if we do this to improve the transition to pipelines.

@dberenbaum dberenbaum added p1-important Include in the next sprint and removed p2-medium labels Feb 8, 2023
@daavoo daavoo linked a pull request Feb 13, 2023 that will close this issue
karajan1001 added a commit to karajan1001/dvclive that referenced this issue Feb 15, 2023
@daavoo daavoo added the A: log_artifact Area: `live.log_artifact` label Feb 15, 2023
This was referenced Feb 23, 2023
daavoo added a commit that referenced this issue Mar 9, 2023
* Early draft of log_artifact

* Implement log_artifact method for live

fix: #378

* tests: Create `mocked_dvc_repo` and `dvc_repo` fixtures.

* log_artifact: Changes from review.

- Don't use `commit`
- Don't raise error
- Add `test_log_artifact`

* pylint fixes

* mypy fixes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: David de la Iglesia Castro <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A: log_artifact Area: `live.log_artifact` feature request p1-important Include in the next sprint
Projects
None yet
4 participants