live: Revisit output names and structure. #322

daavoo · 2022-10-04T17:38:44Z

Closes #246

Note this targets branch 1.0

Method	Output
live.log	`dvclive/plots/metrics` & `dvclive/metrics.json`
live.log_image	`dvclive/plots/images`
live.log_param	`dvclive/params.yaml`
live.log_sklearn_plot	`dvclive/plots/sklearn`

Example:

import random

from dvclive import Live
from PIL import Image
from PIL import ImageDraw 

live = Live()

live.log_param("goo", 1)

for i in range(2):
    live.log("foobar", i + random.random())
    live.log("foo/bar", i + random.random())

    img = Image.new("RGB", (50, 50), (255, 255, 255))
    draw = ImageDraw.Draw(img)
    draw.text((20, 20), f"{i  + random.random():.1f}", (0,0,0))
    live.log_image("img.png", img)

    live.next_step()

live.set_step(None)
live.log_sklearn_plot("confusion_matrix", [0, 0, 1, 1], [0, 1, 0, 1])

Output:

dvclive
├── metrics.json
├── params.yaml
├── plots
│   ├── images
│   │   ├── 0
│   │   │   └── img.png
│   │   └── 1
│   │       └── img.png
│   ├── metrics
│   │   ├── foo
│   │   │   └── bar.tsv
│   │   └── foobar.tsv
│   └── sklearn
│       └── confusion_matrix.json
└── report.html

daavoo · 2022-10-04T17:39:51Z

src/dvclive/data/plot.py

@@ -6,7 +6,7 @@

 class Plot(Data):
    suffixes = [".json"]
-    subfolder = "plots"
+    subfolder = "data_series"


I believe this is the only part I don't like. It doesn't really match the meaning of VSCode extension as their data series is our metrics and this folder can contain things like confusion matrix, so it doesn't really apply

sklearn seems too implementation-focused, but it's mentioned in the docs, and it's more descriptive than data_series.

json is another that's too implementation-focused but at least descriptive.

Or maybe something like builtin or presets?

@shcheklein @jorgeorpinel
Any suggestions regarding alternatives to the data_series name (and/or the rest of the outputs)?

data_series is meant to store pre-defined plots based on scikit-learn visualizations .

The current state is:

Method Output

live.log dvclive/plots/metrics & dvclive/metrics.json

live.log_image dvclive/plots/images

live.log_param dvclive/params.yaml

live.log_plot dvclive/plots/data_series

(btw, by saying "I don't like it" I don't mean that we should not go with this - I'm sharing my thoughts. If we don't come with something better we should proceed with this)

With that in mind, I think live.log_sklearn makes sense since the inputs closely follow the underlying sklearn methods. The output could then be plots/sklearn.

hmm, that's interesting ... maybe I missed something, but why sklearn specifically?

From https://dvc.org/doc/dvclive/api-reference/live/log_plot and the link at the top (which may need to be updated because it doesn't link to the expected methods), the method arguments come from sklearn. Most sklearn evaluation methods take similar arguments like metrics.method(y_true, y_score).

If we start to log any other plot types that take different arguments, we are likely to need a separate method.

For users who are familiar, it's the easiest way to explain the inputs, and sklearn is used for evaluation even when it's not used for modeling (the top result for pytorch roc curve is https://discuss.pytorch.org/t/how-to-plot-roc-curve-using-pytorch-model/122033).

From https://dvc.org/doc/dvclive/api-reference/live/log_plot and the link at the top ...

Got it, thanks! I forgot (not the first time :) ) that interface there was coupled with the sklearn types of plots. Yes, it makes more sense to rename into sklearn_plot. On the other hand, we already do things like this:

https://github.com/iterative/example-get-started/blob/main/src/evaluate.py#L49-L59

should it be taken care of by the some kind of `log_plot("") ?

feels like even if rename the existing one to log_sklearn_plot or something, we'll need the second log_plot pretty soon (now?)

the method arguments come from sklearn. Most sklearn evaluation methods take similar arguments like

Not only that. We use sklearn internally to compute the results to be plotted.

If we start to log any other plot types that take different arguments, we are likely to need a separate method.

Indeed, we would need to define a different interface for plots beyond the predefined sklearn set.

What wandb does is provide secondary methods that are passed as arguments to wandb.log:

data = [[x, y] for (x, y) in zip(x_values, y_values)] table = wandb.Table(data=data, columns = ["x", "y"]) wandb.log({"my_custom_plot_id" : wandb.plot.line(table, "x", "y", title="Custom Y vs X Line Plot")})

should it be taken care of by the some kind of `log_plot("") ?

It could already use log_sklearn_plot("precision_recall", ...) but afaik (per the comment above) it was kept to showcase an alternative approach of manually generating a plot file.

Sounds like we can move forward with log_sklearn_plot?

should it be taken care of by the some kind of `log_plot("") ?

It could already use log_sklearn_plot("precision_recall", ...) but afaik (per the comment above) it was kept to showcase an alternative approach of manually generating a plot file.

Even if we don't need it for the example get started repo, it would probably make sense after #311 to introduce a more generic method that logs any dvc plot (log_custom_plot?). Should we discuss in #271?

codecov-commenter · 2022-10-04T18:14:31Z

Codecov Report

Base: 94.85% // Head: 94.92% // Increases project coverage by +0.06% 🎉

Coverage data is based on head (ac07e9e) compared to base (e670f38).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files

@@            Coverage Diff             @@
##              1.0     #322      +/-   ##
==========================================
+ Coverage   94.85%   94.92%   +0.06%     
==========================================
  Files          36       36              
  Lines        1770     1774       +4     
  Branches      165      165              
==========================================
+ Hits         1679     1684       +5     
+ Misses         66       65       -1     
  Partials       25       25

Impacted Files	Coverage Δ
src/dvclive/data/__init__.py	`100.00% <100.00%> (ø)`
src/dvclive/data/metric.py	`95.55% <100.00%> (ø)`
src/dvclive/data/sklearn_plot.py	`89.87% <100.00%> (ø)`
src/dvclive/live.py	`95.28% <100.00%> (+0.52%)`	⬆️
src/dvclive/report.py	`94.59% <100.00%> (ø)`
src/dvclive/studio.py	`95.55% <100.00%> (ø)`
src/dvclive/utils.py	`97.05% <100.00%> (ø)`
tests/test_catalyst.py	`95.12% <100.00%> (ø)`
tests/test_data/test_image.py	`100.00% <100.00%> (ø)`
tests/test_data/test_plot.py	`100.00% <100.00%> (ø)`
... and 10 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

Applied #246 (comment) Closes #246

shcheklein · 2022-10-10T18:15:50Z

src/dvclive/data/sklearn.py

@@ -4,7 +4,7 @@
 from .base import Data


-class Plot(Data):
+class SKLearn(Data):


should we name it SKLeanrPlot (sklearn alone can be misleading?)

* live: Revisit output names and structure. Applied #246 (comment) Closes #246 * Rename `log_plot` to `log_sklearn_plot`. * Rename `plot` -> `sklearn` * Rename `sklearn` -> `sklearn_plot`

Per iterative/dvclive#322

* live: Revisit output names and structure. Applied #246 (comment) Closes #246 * Rename `log_plot` to `log_sklearn_plot`. * Rename `plot` -> `sklearn` * Rename `sklearn` -> `sklearn_plot`

* initial refactor * Mention API Reference * Fix link * Add redirect * Update content/docs/dvclive/how-to/checkpoints.md Co-authored-by: Dave Berenbaum <[email protected]> * Remove how-to folder * Add outputs content * Reorder sidebar * Add tabs with popular frameworks * Remove outdated `dvclive with DVC` * clean * Add params * Clarify value without DVC * Update outputs to current status * Update content/docs/dvclive/get-started.md Co-authored-by: Dave Berenbaum <[email protected]> * Reduce wording * Add Python API tab * More details * Updates from review * Capitalize pages * Mention studio for sharing experiments * New section for Studio * Add sample output * Move ml-frameworks to api-refeference * Drop resume page * Apply suggestions from code review Co-authored-by: Dave Berenbaum <[email protected]> * Restyled by prettier (#4050) Co-authored-by: Restyled.io <[email protected]> * Suggestions from code review * caps * Add sections * Add example output * Add admon * Share experiment -> Share Results * Update output structure. Per iterative/dvclive#322 * Revisit output names * path -> dir * Add make_report call * Rename log -> log_metric * Update log_sklearn_plot. (#4074) Per iterative/dvclive#339 * Apply suggestions from code review Co-authored-by: Jorge Orpinel <[email protected]> * Replace dvc with cli * Remove tree * Update output dir in frameworks * grammarly * Revisit frameworks naming. * Remplaze `path` with `dir` * Rename `outputs` to `how-it-works` * Rename `Output Folder Structure` to `How it Works` * Rename Scalars to Metrics * Simplify `Live` Attributes description * Renamed folder to directory * Revisit `step` definition * Fix tree * Remove cat * Replace vscode plots with data series * Updates with new summary behavior and step property * Fix links * Consistent framework tabs * Update content/docs/dvclive/api-reference/live/log_metric.md * start: dvclive 1.0 updates to exp viz page * Update content/docs/start/experiment-management/visualization.md Co-authored-by: David de la Iglesia Castro <[email protected]> * Revisit dvclive usage outside doc/dvclive * Focus on context manager usage. * hint about next_step * Update content/docs/dvclive/api-reference/live/next_step.md Co-authored-by: Dave Berenbaum <[email protected]> * Link for `Live.metrics_file` * Restyled by prettier (#4097) Co-authored-by: Restyled.io <[email protected]> * Updates per iterative/example-repos-dev#143 Co-authored-by: Dave Berenbaum <[email protected]> Co-authored-by: restyled-io[bot] <32688539+restyled-io[bot]@users.noreply.github.com> Co-authored-by: Restyled.io <[email protected]> Co-authored-by: Jorge Orpinel <[email protected]> Co-authored-by: rogermparent <[email protected]>

daavoo requested a review from dberenbaum October 4, 2022 17:38

daavoo commented Oct 4, 2022

View reviewed changes

daavoo force-pushed the revisit-output-names branch from 948b76a to f607e4e Compare October 5, 2022 14:27

daavoo closed this Oct 5, 2022

daavoo force-pushed the revisit-output-names branch from f607e4e to e670f38 Compare October 5, 2022 14:27

live: Revisit output names and structure.

e15f3b6

Applied #246 (comment) Closes #246

daavoo reopened this Oct 5, 2022

daavoo mentioned this pull request Oct 5, 2022

html: How to track/ignore it in DVC #242

Closed

Rename log_plot to log_sklearn_plot.

1ed4609

dberenbaum mentioned this pull request Oct 7, 2022

log_plot(): add custom line plot #271

Closed

daavoo marked this pull request as ready for review October 10, 2022 09:43

daavoo requested a review from dtrifiro October 10, 2022 09:44

Rename plot -> sklearn

9047d97

daavoo added the refactoring label Oct 10, 2022

shcheklein reviewed Oct 10, 2022

View reviewed changes

Rename sklearn -> sklearn_plot

ac07e9e

dberenbaum approved these changes Oct 14, 2022

View reviewed changes

dberenbaum mentioned this pull request Oct 14, 2022

plots: make output structure consistent between plot types #326

Closed

daavoo merged this pull request into 1.0 Oct 17, 2022

daavoo deleted the revisit-output-names branch October 17, 2022 09:07

daavoo added a commit to iterative/dvc.org that referenced this pull request Oct 20, 2022

Update output structure.

e1644d9

Per iterative/dvclive#322

daavoo mentioned this pull request Oct 21, 2022

Introduce public summary. Remove "no step" / "step" logic from plots. #331

Merged

daavoo added a commit to iterative/dvc.org that referenced this pull request Oct 27, 2022

Update output structure.

5cc4b57

Per iterative/dvclive#322

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

live: Revisit output names and structure. #322

live: Revisit output names and structure. #322

daavoo commented Oct 4, 2022 •

edited

Loading

daavoo Oct 4, 2022 •

edited

Loading

dberenbaum Oct 4, 2022

dberenbaum Oct 4, 2022

dberenbaum Oct 4, 2022

daavoo Oct 5, 2022

shcheklein Oct 6, 2022

dberenbaum Oct 6, 2022

shcheklein Oct 6, 2022

daavoo Oct 7, 2022

dberenbaum Oct 7, 2022

codecov-commenter commented Oct 4, 2022 •

edited

Loading

shcheklein Oct 10, 2022

live: Revisit output names and structure. #322

live: Revisit output names and structure. #322

Conversation

daavoo commented Oct 4, 2022 • edited Loading

daavoo Oct 4, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

codecov-commenter commented Oct 4, 2022 • edited Loading

Codecov Report

Choose a reason for hiding this comment

daavoo commented Oct 4, 2022 •

edited

Loading

daavoo Oct 4, 2022 •

edited

Loading

codecov-commenter commented Oct 4, 2022 •

edited

Loading