log_artifact #378

dberenbaum · 2022-11-30T20:28:28Z

Similar to https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.log_artifact, dvclive can have a log_artifact method, which can do dvc add to track an artifact.

The text was updated successfully, but these errors were encountered:

shcheklein · 2022-11-30T21:15:05Z

@tapadipti @dmpetrov @aguschin I mentioned this in the conversation on the model registry. I think this workflow should be supported. It's a way for people to dump their models to Studio, or to DVC first and connect in Studio, etc. I think this is the basic approach that MLFlow, W&B, etc are taking and it makes sense to me:

it's aligned with DVCLive efforts and Live metrics, etc.
It's aligned with simplifying MR experience - we can show a single line people would need to copy into their Notebook to start
it's even aligned with an effort of disconnecting it from Git to the same extent as Live experiments

Related - #305 (but I would not be making this about MLEM - I think it's an optional advanced feature for now there). log_model can exist by itself.

daavoo · 2022-12-01T13:58:08Z

do dvc add to track an artifact.

🤔 top-level outputs so that they can be added to the dvc.yaml of #366 ?

tapadipti · 2022-12-07T11:07:56Z

So a dependency for this would be that the user has dvc set up, right? Should it also be possible to log_artifact without dvc tracking (like log_metric)?

For registering new versions of an existing artifact, we'll need an option to provide an existing artifact identifier (and possibly a new version number).

log_model can exist by itself

It could also be a type=model option to log_artifact, right? We will need the artifact type for the model registry anyway.

@shcheklein some thoughts about this simplifying the MR experience:

It's aligned with simplifying MR experience - we can show a single line people would need to copy into their Notebook to start

The users would have to leave the Studio UI and edit their ML code/training to start registering models. And they cannot register models after they have been created. Currently, there's dependency on GTO; with this there will be dependency on DVCLive. Also, they'll still need to add the Git repo in Studio (assuming we don't provide a Studio server endpoint to log artifacts). So, it'll probably not be simple.

it's even aligned with an effort of disconnecting it from Git to the same extent as Live experiments

Are you suggesting that we will provide a Studio server endpoint for all users to log their artifacts to? And Studio will maintain a database of all artifacts instead of users saving it to their Git repos? (This is how the live metrics are saved).

shcheklein · 2022-12-07T16:47:13Z

Are you suggesting that we will provide a Studio server endpoint for all users to log their artifacts to? And Studio will maintain a database of all artifacts instead of users saving it to their Git repos? (This is how the live metrics are saved).

Yes, it should be an available alternative for this.

dberenbaum · 2023-01-05T13:32:22Z

@shcheklein @tapadipti I was thinking the starting point here would be to still rely on DVC and Git here and save it locally by starting to track the artifact with DVC.

Some follow-up steps could be:

Providing some Studio storage so a user could dvc push the artifact without setting up their own cloud remote.
Auto-push the artifact along with metrics, parameters, and plots. Does it make sense to use dvc exp push functionality at that point, or do we want two different ways to push experiment results?

What do you think?

🤔 top-level outputs so that they can be added to the dvc.yaml of #366 ?

@daavoo I'm fine to start with saving in .dvc files, especially since the artifacts themselves aren't likely to be inside the dvclive directory and it should be easier than adding a new dvc.yaml section and dvclive-based dvc.lock.

dberenbaum · 2023-02-06T21:23:14Z

Lowering to p2 since I think we still need to better scope what's needed here, and it doesn't break any current workflows

daavoo · 2023-02-07T17:35:14Z

Lowering to p2 since I think we still need to better scope what's needed here, and it doesn't break any current workflows

The save_dvc_exp feels broken/incomplete to me without something like log_artifact (even just with a very limited scope based on dvc add + ensure .dvc is in include_untracked), mostly because of the scenario described in iterative/example-repos-dev#171

tapadipti · 2023-02-08T08:10:24Z

This will also be needed at some point to simplify the model registry onboarding experience (although we've not scoped it out yet).

dberenbaum · 2023-02-08T17:24:16Z

even just with a very limited scope based on dvc add + ensure .dvc is in include_untracked

Okay, let's add it back if it's needed there with the limited scope you mention for now.

I'd also like to address iterative/dvc#8986 if we do this to improve the transition to pipelines.

fix: iterative#378

* Early draft of log_artifact * Implement log_artifact method for live fix: #378 * tests: Create `mocked_dvc_repo` and `dvc_repo` fixtures. * log_artifact: Changes from review. - Don't use `commit` - Don't raise error - Add `test_log_artifact` * pylint fixes * mypy fixes * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: David de la Iglesia Castro <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

dberenbaum added the p2-medium label Nov 30, 2022

daavoo mentioned this issue Dec 7, 2022

Make checkpoints part of the logger setup? #300

Closed

daavoo mentioned this issue Dec 19, 2022

Added Optuna callback. #356

Merged

dberenbaum added p1-important Include in the next sprint and removed p2-medium labels Jan 5, 2023

dberenbaum mentioned this issue Jan 5, 2023

Cache plots, models, etc. #312

Closed

daavoo mentioned this issue Feb 1, 2023

experiments: Include model in the experimetns inside notebook iterative/example-repos-dev#171

Closed

dberenbaum added p2-medium and removed p1-important Include in the next sprint labels Feb 6, 2023

dberenbaum added p1-important Include in the next sprint and removed p2-medium labels Feb 8, 2023

daavoo linked a pull request Feb 13, 2023 that will close this issue

Early draft of log_artifact #462

Closed

karajan1001 added a commit to karajan1001/dvclive that referenced this issue Feb 15, 2023

Implement log_artifact method for live

f3b4e0f

fix: iterative#378

karajan1001 mentioned this issue Feb 15, 2023

Implement log_artifact method for live #464

Merged

daavoo added the feature request label Feb 15, 2023

daavoo mentioned this issue Feb 15, 2023

frameworks: Use log_artifact #465

Closed

daavoo added the A: log_artifact Area: `live.log_artifact` label Feb 15, 2023

This was referenced Feb 23, 2023

generate pipeline #468

Closed

log_model #472

Closed

daavoo closed this as completed in #464 Mar 9, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

log_artifact #378

log_artifact #378

dberenbaum commented Nov 30, 2022

shcheklein commented Nov 30, 2022

daavoo commented Dec 1, 2022

tapadipti commented Dec 7, 2022

shcheklein commented Dec 7, 2022

dberenbaum commented Jan 5, 2023

dberenbaum commented Feb 6, 2023 •

edited

Loading

daavoo commented Feb 7, 2023 •

edited

Loading

tapadipti commented Feb 8, 2023

dberenbaum commented Feb 8, 2023

log_artifact #378

log_artifact #378

Comments

dberenbaum commented Nov 30, 2022

shcheklein commented Nov 30, 2022

daavoo commented Dec 1, 2022

tapadipti commented Dec 7, 2022

shcheklein commented Dec 7, 2022

dberenbaum commented Jan 5, 2023

dberenbaum commented Feb 6, 2023 • edited Loading

daavoo commented Feb 7, 2023 • edited Loading

tapadipti commented Feb 8, 2023

dberenbaum commented Feb 8, 2023

dberenbaum commented Feb 6, 2023 •

edited

Loading

daavoo commented Feb 7, 2023 •

edited

Loading