Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

live: Revisit output names and structure. #322

Merged
merged 4 commits into from
Oct 17, 2022
Merged

live: Revisit output names and structure. #322

merged 4 commits into from
Oct 17, 2022

Conversation

daavoo
Copy link
Contributor

@daavoo daavoo commented Oct 4, 2022

Applied #246 (comment)

Closes #246

Note this targets branch 1.0


Method Output
live.log dvclive/plots/metrics & dvclive/metrics.json
live.log_image dvclive/plots/images
live.log_param dvclive/params.yaml
live.log_sklearn_plot dvclive/plots/sklearn

Example:

import random

from dvclive import Live
from PIL import Image
from PIL import ImageDraw 

live = Live()

live.log_param("goo", 1)

for i in range(2):
    live.log("foobar", i + random.random())
    live.log("foo/bar", i + random.random())

    img = Image.new("RGB", (50, 50), (255, 255, 255))
    draw = ImageDraw.Draw(img)
    draw.text((20, 20), f"{i  + random.random():.1f}", (0,0,0))
    live.log_image("img.png", img)

    live.next_step()

live.set_step(None)
live.log_sklearn_plot("confusion_matrix", [0, 0, 1, 1], [0, 1, 0, 1])

Output:

dvclive
├── metrics.json
├── params.yaml
├── plots
│   ├── images
│   │   ├── 0
│   │   │   └── img.png
│   │   └── 1
│   │       └── img.png
│   ├── metrics
│   │   ├── foo
│   │   │   └── bar.tsv
│   │   └── foobar.tsv
│   └── sklearn
│       └── confusion_matrix.json
└── report.html

@daavoo daavoo requested a review from dberenbaum October 4, 2022 17:38
@@ -6,7 +6,7 @@

class Plot(Data):
suffixes = [".json"]
subfolder = "plots"
subfolder = "data_series"
Copy link
Contributor Author

@daavoo daavoo Oct 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is the only part I don't like. It doesn't really match the meaning of VSCode extension as their data series is our metrics and this folder can contain things like confusion matrix, so it doesn't really apply

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sklearn seems too implementation-focused, but it's mentioned in the docs, and it's more descriptive than data_series.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

json is another that's too implementation-focused but at least descriptive.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe something like builtin or presets?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shcheklein @jorgeorpinel
Any suggestions regarding alternatives to the data_series name (and/or the rest of the outputs)?

data_series is meant to store pre-defined plots based on scikit-learn visualizations .

The current state is:

Method Output
live.log dvclive/plots/metrics & dvclive/metrics.json
live.log_image dvclive/plots/images
live.log_param dvclive/params.yaml
live.log_plot dvclive/plots/data_series

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(btw, by saying "I don't like it" I don't mean that we should not go with this - I'm sharing my thoughts. If we don't come with something better we should proceed with this)

With that in mind, I think live.log_sklearn makes sense since the inputs closely follow the underlying sklearn methods. The output could then be plots/sklearn.

hmm, that's interesting ... maybe I missed something, but why sklearn specifically?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From https://dvc.org/doc/dvclive/api-reference/live/log_plot and the link at the top (which may need to be updated because it doesn't link to the expected methods), the method arguments come from sklearn. Most sklearn evaluation methods take similar arguments like metrics.method(y_true, y_score).

If we start to log any other plot types that take different arguments, we are likely to need a separate method.

For users who are familiar, it's the easiest way to explain the inputs, and sklearn is used for evaluation even when it's not used for modeling (the top result for pytorch roc curve is https://discuss.pytorch.org/t/how-to-plot-roc-curve-using-pytorch-model/122033).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From https://dvc.org/doc/dvclive/api-reference/live/log_plot and the link at the top ...

Got it, thanks! I forgot (not the first time :) ) that interface there was coupled with the sklearn types of plots. Yes, it makes more sense to rename into sklearn_plot. On the other hand, we already do things like this:

https://github.com/iterative/example-get-started/blob/main/src/evaluate.py#L49-L59

should it be taken care of by the some kind of `log_plot("") ?

feels like even if rename the existing one to log_sklearn_plot or something, we'll need the second log_plot pretty soon (now?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the method arguments come from sklearn. Most sklearn evaluation methods take similar arguments like

Not only that. We use sklearn internally to compute the results to be plotted.

If we start to log any other plot types that take different arguments, we are likely to need a separate method.

Indeed, we would need to define a different interface for plots beyond the predefined sklearn set.

What wandb does is provide secondary methods that are passed as arguments to wandb.log:

data = [[x, y] for (x, y) in zip(x_values, y_values)]
table = wandb.Table(data=data, columns = ["x", "y"])
wandb.log({"my_custom_plot_id" : wandb.plot.line(table, "x", "y",
           title="Custom Y vs X Line Plot")})

should it be taken care of by the some kind of `log_plot("") ?

It could already use log_sklearn_plot("precision_recall", ...) but afaik (per the comment above) it was kept to showcase an alternative approach of manually generating a plot file.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like we can move forward with log_sklearn_plot?

should it be taken care of by the some kind of `log_plot("") ?

It could already use log_sklearn_plot("precision_recall", ...) but afaik (per the comment above) it was kept to showcase an alternative approach of manually generating a plot file.

Even if we don't need it for the example get started repo, it would probably make sense after #311 to introduce a more generic method that logs any dvc plot (log_custom_plot?). Should we discuss in #271?

@codecov-commenter
Copy link

codecov-commenter commented Oct 4, 2022

Codecov Report

Base: 94.85% // Head: 94.92% // Increases project coverage by +0.06% 🎉

Coverage data is based on head (ac07e9e) compared to base (e670f38).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##              1.0     #322      +/-   ##
==========================================
+ Coverage   94.85%   94.92%   +0.06%     
==========================================
  Files          36       36              
  Lines        1770     1774       +4     
  Branches      165      165              
==========================================
+ Hits         1679     1684       +5     
+ Misses         66       65       -1     
  Partials       25       25              
Impacted Files Coverage Δ
src/dvclive/data/__init__.py 100.00% <100.00%> (ø)
src/dvclive/data/metric.py 95.55% <100.00%> (ø)
src/dvclive/data/sklearn_plot.py 89.87% <100.00%> (ø)
src/dvclive/live.py 95.28% <100.00%> (+0.52%) ⬆️
src/dvclive/report.py 94.59% <100.00%> (ø)
src/dvclive/studio.py 95.55% <100.00%> (ø)
src/dvclive/utils.py 97.05% <100.00%> (ø)
tests/test_catalyst.py 95.12% <100.00%> (ø)
tests/test_data/test_image.py 100.00% <100.00%> (ø)
tests/test_data/test_plot.py 100.00% <100.00%> (ø)
... and 10 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@daavoo daavoo force-pushed the revisit-output-names branch from 948b76a to f607e4e Compare October 5, 2022 14:27
@daavoo daavoo closed this Oct 5, 2022
@daavoo daavoo force-pushed the revisit-output-names branch from f607e4e to e670f38 Compare October 5, 2022 14:27
@daavoo daavoo marked this pull request as ready for review October 10, 2022 09:43
@daavoo daavoo requested a review from dtrifiro October 10, 2022 09:44
@@ -4,7 +4,7 @@
from .base import Data


class Plot(Data):
class SKLearn(Data):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we name it SKLeanrPlot (sklearn alone can be misleading?)

@daavoo daavoo merged this pull request into 1.0 Oct 17, 2022
@daavoo daavoo deleted the revisit-output-names branch October 17, 2022 09:07
daavoo added a commit that referenced this pull request Oct 18, 2022
* live: Revisit output names and structure.

Applied #246 (comment)

Closes #246

* Rename `log_plot` to `log_sklearn_plot`.

* Rename `plot` -> `sklearn`

* Rename `sklearn` -> `sklearn_plot`
daavoo added a commit to iterative/dvc.org that referenced this pull request Oct 20, 2022
daavoo added a commit to iterative/dvc.org that referenced this pull request Oct 27, 2022
daavoo added a commit that referenced this pull request Oct 31, 2022
* live: Revisit output names and structure.

Applied #246 (comment)

Closes #246

* Rename `log_plot` to `log_sklearn_plot`.

* Rename `plot` -> `sklearn`

* Rename `sklearn` -> `sklearn_plot`
daavoo added a commit that referenced this pull request Nov 4, 2022
* live: Revisit output names and structure.

Applied #246 (comment)

Closes #246

* Rename `log_plot` to `log_sklearn_plot`.

* Rename `plot` -> `sklearn`

* Rename `sklearn` -> `sklearn_plot`
daavoo added a commit that referenced this pull request Nov 4, 2022
* live: Revisit output names and structure.

Applied #246 (comment)

Closes #246

* Rename `log_plot` to `log_sklearn_plot`.

* Rename `plot` -> `sklearn`

* Rename `sklearn` -> `sklearn_plot`
daavoo added a commit to iterative/dvc.org that referenced this pull request Nov 5, 2022
* initial refactor

* Mention API Reference

* Fix link

* Add redirect

* Update content/docs/dvclive/how-to/checkpoints.md

Co-authored-by: Dave Berenbaum <[email protected]>

* Remove how-to folder

* Add outputs content

* Reorder sidebar

* Add tabs with popular frameworks

* Remove outdated `dvclive with DVC`

* clean

* Add params

* Clarify value without  DVC

* Update outputs to current status

* Update content/docs/dvclive/get-started.md

Co-authored-by: Dave Berenbaum <[email protected]>

* Reduce wording

* Add Python API tab

* More details

* Updates from review

* Capitalize pages

* Mention studio for sharing experiments

* New section for Studio

* Add sample output

* Move ml-frameworks to api-refeference

* Drop resume page

* Apply suggestions from code review

Co-authored-by: Dave Berenbaum <[email protected]>

* Restyled by prettier (#4050)

Co-authored-by: Restyled.io <[email protected]>

* Suggestions from code review

* caps

* Add sections

* Add example output

* Add admon

* Share experiment -> Share Results

* Update output structure.

Per iterative/dvclive#322

* Revisit output names

* path -> dir

* Add make_report call

* Rename log -> log_metric

* Update log_sklearn_plot. (#4074)

Per iterative/dvclive#339

* Apply suggestions from code review

Co-authored-by: Jorge Orpinel <[email protected]>

* Replace dvc with cli

* Remove tree

* Update output dir in frameworks

* grammarly

* Revisit frameworks naming.

* Remplaze `path` with `dir`

* Rename `outputs` to `how-it-works`

* Rename `Output Folder Structure` to `How it Works`

* Rename Scalars to Metrics

* Simplify `Live` Attributes description

* Renamed folder to directory

* Revisit `step` definition

* Fix tree

* Remove cat

* Replace vscode plots with data series

* Updates with new summary behavior and step property

* Fix links

* Consistent framework tabs

* Update content/docs/dvclive/api-reference/live/log_metric.md

* start: dvclive 1.0 updates to exp viz page

* Update content/docs/start/experiment-management/visualization.md

Co-authored-by: David de la Iglesia Castro <[email protected]>

* Revisit dvclive usage outside doc/dvclive

* Focus on context manager usage.

* hint about next_step

* Update content/docs/dvclive/api-reference/live/next_step.md

Co-authored-by: Dave Berenbaum <[email protected]>

* Link for `Live.metrics_file`

* Restyled by prettier (#4097)

Co-authored-by: Restyled.io <[email protected]>

* Updates per iterative/example-repos-dev#143

Co-authored-by: Dave Berenbaum <[email protected]>
Co-authored-by: restyled-io[bot] <32688539+restyled-io[bot]@users.noreply.github.com>
Co-authored-by: Restyled.io <[email protected]>
Co-authored-by: Jorge Orpinel <[email protected]>
Co-authored-by: rogermparent <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants