
live: Add log_image and log_plot. #189

Merged
merged 1 commit into main on Feb 2, 2022

Conversation

daavoo
Contributor

@daavoo daavoo commented Nov 8, 2021


Adds a new plots data type.
Refactors each data type into separate methods:

  • Scalars -> Live.log -> Saves to dvclive.dir / scalars
  • Images -> Live.log_image -> Saves to dvclive.dir / images
  • Plots -> Live.log_plot -> Saves to dvclive.dir / plots

live.log_plot("roc", y_true, y_score)

Supported plot_type (first argument):

  • calibration
  • confusion_matrix
  • det
  • precission_recall
  • roc
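
For reference, each of these plot types presumably maps onto a scikit-learn computation; a hypothetical sketch of that mapping (the `SUPPORTED_PLOTS` name is made up, and the `precission_recall` spelling mirrors this PR):

```python
from sklearn import calibration, metrics

# Hypothetical mapping (dict name assumed): each supported plot_type
# roughly corresponds to a scikit-learn function.
SUPPORTED_PLOTS = {
    "calibration": calibration.calibration_curve,
    "confusion_matrix": metrics.confusion_matrix,
    "det": metrics.det_curve,
    "precission_recall": metrics.precision_recall_curve,  # spelling as in this PR
    "roc": metrics.roc_curve,
}
```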

Full example:

```python
from dvclive import Live

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(random_state=0)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
y_score = clf.predict_proba(X_test)[:, 1]

live = Live()

live.log_plot("calibration", y_test, y_score)
live.log_plot("confusion_matrix", y_test, y_pred)
live.log_plot("det", y_test, y_score)
live.log_plot("precission_recall", y_test, y_score)
live.log_plot("roc", y_test, y_score)
```
The corresponding `dvc.yaml`:

```yaml
stages:
  foo:
    cmd: python foo.py
    metrics:
    - dvclive.json
    plots:
    - dvclive/plots/calibration.json:
        cache: false
        x: prob_pred
        y: prob_true
        x_label: Mean Predicted Probability
        y_label: Fraction of Positives
        title: Calibration Curve
    - dvclive/plots/confusion_matrix.json:
        cache: false
        template: confusion
        x: actual
        y: predicted
        title: Confusion Matrix
    - dvclive/plots/precision_recall.json:
        cache: false
        x: recall
        y: precision
        title: Precision Recall Curve
    - dvclive/plots/det.json:
        cache: false
        x: fpr
        y: fnr
        title: DET curve
    - dvclive/plots/roc.json:
        cache: false
        x: fpr
        y: tpr
        title: ROC curve
```

```shell
dvc plots show
```

(Screenshot: "DVC Plot", 2022-01-03)

@daavoo daavoo requested review from dberenbaum and pared November 8, 2021 20:15
@codecov-commenter

codecov-commenter commented Nov 8, 2021

Codecov Report

Merging #189 (fb7e19f) into main (4aa81dd) will decrease coverage by 0.97%.
The diff coverage is 92.85%.

❗ Current head fb7e19f differs from pull request most recent head 200ce5d. Consider uploading reports for the commit 200ce5d to get more accurate results
Impacted file tree graph

```diff
@@            Coverage Diff             @@
##             main     #189      +/-   ##
==========================================
- Coverage   92.81%   91.83%   -0.98%
==========================================
  Files          18       19       +1
  Lines         487      588     +101
==========================================
+ Hits          452      540      +88
- Misses         35       48      +13
```
Impacted Files Coverage Ξ”
dvclive/error.py 84.00% <40.00%> (-11.00%) ⬇️
dvclive/live.py 96.37% <91.42%> (-1.90%) ⬇️
dvclive/data/base.py 91.80% <93.54%> (+0.13%) ⬆️
dvclive/data/plot.py 94.64% <94.64%> (ΓΈ)
dvclive/data/__init__.py 100.00% <100.00%> (ΓΈ)
dvclive/data/image.py 90.90% <100.00%> (-9.10%) ⬇️
dvclive/data/scalar.py 100.00% <100.00%> (ΓΈ)
dvclive/version.py 72.22% <100.00%> (ΓΈ)

Continue to review full report at Codecov.

Legend - Click here to learn more
Ξ” = absolute <relative> (impact), ΓΈ = not affected, ? = missing data
Powered by Codecov. Last update 4aa81dd...200ce5d. Read the comment docs.

@pared
Contributor

pared commented Nov 9, 2021

The integration looks alright. One question that I have is whether we want to promote logging the confusion matrix as an image file. While we do support images on the DVC side, I am not sure we should promote that.
Alternatives:

  • just dump the matrix data, but that will only work with the dvc/visualization lib that we might extract from DVC
  • dump the confusion matrix as a vega spec and start supporting that on the DVC side; that is probably worse than the previous option, since we are thinking about supporting other libraries

@daavoo
Contributor Author

daavoo commented Nov 9, 2021

The integration looks alright. One question that I have is whether we want to promote logging the confusion matrix as an image file. While we do support images on the DVC side, I am not sure we should promote that. Alternatives:

* just dump the matrix data, but that will only work with the dvc/visualization lib that we might extract from DVC

I think I implemented it as an image mainly to save users from having to set custom plot properties (i.e. `template: confusion`) and for some additional features provided by the sklearn plot (i.e. setting custom `display_labels`).

However, the other functions save the data in the "DVC plots format", requiring custom plot properties as well, so dumping the matrix data would be consistent.

@pared
Contributor

pared commented Nov 9, 2021

Well, maybe it's another reason to reconsider iterative/dvc#6944 - maybe we should discuss providing a config for each image that would define how to plot it?

dvclive/sklearn.py (outdated review thread, resolved)
@dberenbaum
Collaborator

Agree with @pared about the confusion matrix and plots in general. Seems better to save the data and work on making it easier to specify plots properties.

For the other plots, it seems awkward to have to call them in dvclive and configure them in dvc.

Also, I'm not sure how much value there is in these functions that lightly wrap existing sklearn functions. Seems more useful to do one or both of:

  • `log_data()` to log any of these results. We can work on making it easy to show those results as plots, and it gives users flexibility to keep the raw results to process themselves.
  • Some sort of auto-logging at a higher level for sklearn (similar to https://www.mlflow.org/docs/latest/python_api/mlflow.sklearn.html#mlflow.sklearn.autolog) so users have an easy default logger.

What do you think?

@daavoo
Contributor Author

daavoo commented Dec 20, 2021

Revisiting this.

Also, I'm not sure how much value there is in these functions that lightly wrap existing sklearn functions. Seems more useful to do one or both of:

* `log_data()` to log any of these results. We can work on making it easy to show those results as plots, and it gives users flexibility to keep the raw results to process themselves.

I don't fully get log_data (perhaps we discussed offline and I forgot 😓). Could you describe how it would work (on a high level)?

* Some sort of auto-logging at a higher level for sklearn (similar to https://www.mlflow.org/docs/latest/python_api/mlflow.sklearn.html#mlflow.sklearn.autolog) so users have an easy default logger.

What do you think?

About the value of these light wrappers, I don't really know.

My view is that there are many stages between these light wrappers and automagic logging "a la mlflow".

I would say that a good intermediate would be log_classifier ("a la wandb": https://docs.wandb.ai/guides/integrations/scikit).

However, it seems that it's up to user taste, and there is no real need to avoid supporting any stage. Automagic methods are going to use something similar to (if not exactly) these light wrappers, so why not expose all the different levels of granularity?

We could start with this low-level integration and incrementally add magic.

@daavoo daavoo force-pushed the sklearn branch 2 times, most recently from 04ba8a0 to c511533 Compare December 28, 2021 16:05
@daavoo daavoo requested a review from dberenbaum December 28, 2021 16:05
@daavoo daavoo force-pushed the sklearn branch 2 times, most recently from 4ead8a4 to 5845af9 Compare December 29, 2021 19:49
@daavoo daavoo self-assigned this Dec 29, 2021
@dberenbaum
Collaborator

I'm still unsure about this. My hesitations are:

  1. The included methods are a pretty arbitrary selection. How do we decide what to include, and do we want to potentially support wrappers for every kind of sklearn metric/plot?
  2. There's no obvious way to include this (or any sklearn integration) as a sort of integrated callback like other frameworks that automatically logs info as the model trains.
  3. If we eventually want something like log_classifier, the closest thing AFAIK is https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html#sklearn.metrics.classification_report. I don't think this would even utilize these methods.

Another option is to add support solely via docs (once #203 is merged), similar to https://dvc.org/doc/dvclive/ml-frameworks/tensorflow.

How much value do you think the current PR adds, and how much added value do you see in further integration?

@daavoo
Contributor Author

daavoo commented Dec 30, 2021

1. The included methods are a pretty arbitrary selection. How do we decide what to include, and do we want to potentially support wrappers for every kind of sklearn metric/plot?

I would not say it's arbitrary. It's not complete, for sure, but it includes 3 out of the 6 visualizations supported in scikit-learn (https://scikit-learn.org/stable/visualizations.html).

I would support any scikit-learn visualization that can be seamlessly integrated with dvc plots usage. Under that condition, the only one missing from the list would be det_curve (it could be added in this PR; I just forgot).

It's a similar selection to the integrations in https://docs.wandb.ai/guides/integrations/scikit. The difference is that some of those plot types are not directly integrable with dvc plots, and thus I don't see much sense in adding support for them.

2. There's no obvious way to include this (or any sklearn integration) as a sort of integrated callback like other frameworks that automatically logs info as the model trains.

Indeed, but it's a common limitation for any logger that doesn't perform "magic" patching. This is addressable in the docs, similar to the existing barebones PyTorch and TensorFlow guides.

3. If we eventually want something like `log_classifier`, the closest thing AFAIK is https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html#sklearn.metrics.classification_report. This wouldn't even utilize these methods I don't think.

With log_classifier/log_regressor I was thinking of something more along the lines of https://docs.wandb.ai/guides/integrations/scikit#or-visualize-all-plots-at-once, which would internally call these log_{X} functions.

Potentially, after #203, this could be extended to include live.log for multiple classification/regression metrics, but I don't see how these are mutually exclusive.

Another option is to add support solely via docs (once #203 is merged), similar to https://dvc.org/doc/dvclive/ml-frameworks/tensorflow.

How much value do you think the current PR adds, and how much added value do you see in further integration?

I haven't really used scikit-learn beyond toy projects, so I lack context on what is valuable in real scenarios 😅

I see immediate value in using it in https://github.com/iterative/example-get-started to reduce code and promote dvclive.
I think it prevents users from having to be aware of the format dvc plots expects.

I don't see how different these functions are from the basic usage of dvclive, where live.log is just saving lines of code and hiding the expected dvc format from the user.

Added value in further integration should move towards "magic" patching and/or log_classifier. These functions can still have value there, both for internal usage and for users preferring more fine-grained control over what to log.


Given that the inputs are arrays of y_true/y_pred, my main doubt is whether we should explicitly save this under sklearn or consider this a new plots module that has sklearn as an optional dependency.
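
As a side note on the format-hiding point above: without log_plot, producing the list-of-dicts JSON that dvc plots consumes has to be done by hand, roughly like this (the helper name and exact layout are illustrative assumptions):

```python
import json

def dump_roc_points(fpr, tpr, path):
    # Hypothetical helper: serialize curve points as the list-of-dicts
    # JSON layout that `dvc plots show` can render with a linear template.
    points = [{"fpr": f, "tpr": t} for f, t in zip(fpr, tpr)]
    with open(path, "w") as fobj:
        json.dump(points, fobj)

dump_roc_points([0.0, 0.5, 1.0], [0.0, 0.8, 1.0], "roc.json")
```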

@dberenbaum
Collaborator

I would not say it's arbitrary. It's not complete, for sure, but it includes 3 out of the 6 visualizations supported in scikit-learn (https://scikit-learn.org/stable/visualizations.html).

I would support any scikit-learn visualization that can be seamlessly integrated with dvc plots usage. Under that condition, the only one missing from the list would be det_curve (it could be added in this PR; I just forgot).

It's a similar selection to the integrations in https://docs.wandb.ai/guides/integrations/scikit. The difference is that some of those plot types are not directly integrable with dvc plots, and thus I don't see much sense in adding support for them.

Thanks for the explanation. That makes sense.

Why can't calibration curves and partial dependence plots also be supported in DVC? They are all linear unless I'm missing something.

With log_classifier/log_regressor I was thinking something more in the lines of https://docs.wandb.ai/guides/integrations/scikit#or-visualize-all-plots-at-once which would call internally this log_{X} functions.

I think this is where we had different ideas for sklearn integration. I thought we would first need to capture the scalar metrics before worrying about plots. I guess you are thinking that the plots are harder for the user to log, and they can easily log scalar metrics already? Still, it would be nice to automatically capture all of the relevant scalar metrics (in addition to plots) if there is some higher-level integration.

Given that the inputs are arrays of y_true/y_pred, my main doubt is whether we should explicitly save this under sklearn or consider this a new plots module that has sklearn as an optional dependency.

Are you suggesting that these could be generic functions rather than specific to sklearn? I have previously kept a mini-library of classification metrics/plots to have a consistent, lightweight way to evaluate models across ML frameworks. Maybe that's more useful here than any sklearn-specific integration. It's definitely a different direction for dvclive, but it might be worth considering.

@daavoo
Contributor Author

daavoo commented Jan 3, 2022

Why can't calibration curves and partial dependence plots also be supported in DVC? They are all linear unless I'm missing something.

You are right, calibration curves can be supported. And also partial dependence, but only in "average" mode.
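
A quick sketch of why calibration curves fit a linear template: calibration_curve reduces to plain (x, y) pairs (the data values here are made up for illustration):

```python
from sklearn.calibration import calibration_curve

y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_prob = [0.1, 0.3, 0.7, 0.9, 0.6, 0.2, 0.8, 0.4]

# calibration_curve returns two 1-D arrays, so the result is just a list
# of (prob_pred, prob_true) points that a linear plot template can render.
prob_true, prob_pred = calibration_curve(y_true, y_prob, n_bins=2)
points = [
    {"prob_pred": float(x), "prob_true": float(y)}
    for x, y in zip(prob_pred, prob_true)
]
```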

Still, it would be nice to automatically capture all of the relevant scalar metrics (in addition to plots) if there is some higher-level integration.

I agree, after #203 , a higher level log_classifier would also log scalar metrics.

Are you suggesting that these could be generic functions rather than specific to sklearn? I have previously kept a mini-library of classification metrics/plots to have a consistent, lightweight way to evaluate models across ML frameworks. Maybe that's more useful here than any sklearn-specific integration. It's definitely a different direction for dvclive, but it might be worth considering.

We can start by documenting this as part of the sklearn integration, but I can see the plots being useful for basically any other framework that supports classification tasks (so, pretty much, all of them).

@daavoo daavoo force-pushed the sklearn branch 3 times, most recently from 8fdd0f8 to a6aab0f Compare January 11, 2022 08:57
@daavoo daavoo changed the title sklearn: Add basic logging methods. Add plots module. Jan 11, 2022
@dberenbaum
Collaborator

Should these be logged inside the dvclive dir?

dvclive/plots.py (outdated review thread, resolved)
@dberenbaum
Collaborator

Ultimately, I think the workflow for this should depend on #82 and #203. Right now, it's impossible to log from within a Live() instance, which makes it a bit awkward to use. I would expect a workflow like:

```python
live = dvclive.Live()
live.log_calibration(y_test, y_score, "calibration.json")
```

Do you think it's worth merging as is and then modifying it after? I think it will require breaking changes to address those issues.

@dberenbaum
Collaborator

Do you mention #82 because you consider that these plots should be loggable at every step?

I mention it because this PR seems like a specific application of #82. Similar to #166, it seems that we will need to decide on the canonical workflow and directory structure for this type of data in both non-step and step-based workflows. So, we can try to decide that now, or we can merge knowing we will need to break it later.

@daavoo daavoo left a comment

Moved all the logic to Live.log and made plots a new "data type".

So the idea is that Live.log can log any of the 3 supported data types: scalars, images, and plots.

  • Scalar

```python
live.log("accuracy", 0.9)
```

  • Image

```python
img = np.ones((500, 500, 3), np.uint8)
live.log("image.png", img)
```

  • Plot

```python
live.log("roc_curve.json", (y_true, y_score, "roc"))
live.log("cm.json", (y_true, y_pred, "confusion_matrix"))
```

The internals make sense to me, and the no-step/step logic is resolved the same way for images and plots.

However, I'm not sure how intuitive the API for plots is.

It could be a matter of documentation, or it could be better to have explicitly separated methods for each data type: log_metric, log_image, log_plot.

I kind of prefer separate methods.
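
For context, the tuple-based dispatch described above could look roughly like this (a minimal sketch with assumed could_log predicates, not the actual implementation):

```python
class Scalar:
    @staticmethod
    def could_log(val):
        return isinstance(val, (int, float))

class Plot:
    @staticmethod
    def could_log(val):
        # A (labels, predictions, plot_type) tuple, as in the examples above.
        return isinstance(val, tuple) and len(val) == 3

def infer_data_type(val):
    # Check the most specific type first so plot tuples are not misclassified.
    for cls in (Plot, Scalar):
        if cls.could_log(val):
            return cls.__name__
    raise ValueError(f"Unsupported data type: {type(val)}")
```

The downside is visible at the call site: the plot tuple is not self-explanatory, which is an argument for separate methods per data type.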

@daavoo daavoo force-pushed the sklearn branch 2 times, most recently from 6f9b868 to 2ac8394 Compare January 17, 2022 22:02
@pared
Contributor

pared commented Jan 18, 2022

I think that separate methods make sense too, as the distinction in usage will be visible at first sight. I am not sure we should rename log to log_metric.

@dberenbaum
Collaborator

Nice! I agree with @pared that it seems best to have separate methods but leave log for metrics.

@dberenbaum
Collaborator

Do you mention #82 because you consider that these plots should be loggable at every step?

Returning to this, yes, I think it should be possible to see how the ROC curve changes at each step, for example. Right now, saving the data works, but it's almost impossible to configure dvc.yaml to show the plots, and in default mode they make it hard to even see the regular metrics history plots.

(Screenshot: plots view, 2022-01-28)

@daavoo
Contributor Author

daavoo commented Jan 31, 2022

Returning to this, yes, I think it should be possible to see how the ROC curve changes at each step, for example. Right now, saving the data works, but it's almost impossible to configure dvc.yaml to show the plots, and in default mode they make it hard to even see the regular metrics history plots.

The initial scope for this kind of plot was non-step usage.

Properly supporting multi-step curves would require a new template on the DVC side, using facets to compare revisions.

@dberenbaum
Collaborator

The initial scope for this kind of plot was non-step usage.

Properly supporting multi-step curves would require a new template on the DVC side, using facets to compare revisions.

Okay, let's discuss separately in #82 or elsewhere the per-step scenario. Why support logging at each step then in this PR? Should an error be thrown if using log_plot per step?

@daavoo
Contributor Author

daavoo commented Feb 1, 2022

Why support logging at each step then in this PR?

No strong reason, it just reuses the logic already present for images.

Should an error be thrown if using log_plot per step?

Will do

Decouple data type logging into separated methods.

Use subfolders for each data type.

Raise NotImplementedError in `log_plot` when using steps.
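
The NotImplementedError guard mentioned in the commit message could look roughly like this (a sketch; the class body, attribute, and method names are assumptions, not the merged code):

```python
class Live:
    def __init__(self):
        self._step = None

    def set_step(self, step):
        self._step = step

    def log_plot(self, name, labels, predictions):
        # Per-step plot logging is out of scope for this PR,
        # so fail loudly instead of writing ambiguous output.
        if self._step is not None:
            raise NotImplementedError(
                "`log_plot` is not supported when using steps"
            )
        return (name, labels, predictions)  # placeholder for the real logging
```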
@dberenbaum
Collaborator

Sorry for the confusion over step/no-step scenarios. Limiting to no-step makes sense.

@daavoo daavoo merged commit e4bc27a into main Feb 2, 2022
@daavoo daavoo deleted the sklearn branch February 2, 2022 18:47
Comment on lines +161 to +163
```python
def log_image(self, name: str, val):
    if not Image.could_log(val):
        raise InvalidDataTypeError(name, type(val))
```
@jorgeorpinel jorgeorpinel commented Feb 22, 2022

Maybe `val` should be renamed. `val` made sense for scalars (`log()`), but `image_data` or something more descriptive would be better here IMO.

Comment on lines +173 to +174
```python
def log_plot(self, name, labels, predictions, **kwargs):
    val = (labels, predictions)
```


Same here. plot_data?

Comment on lines +165 to +169
```python
if name in self._images:
    data = self._images[name]
else:
    data = Image(name, self.dir)
    self._images[name] = data
```


As for name, should it be filename or out instead? Again for clarity

Comment on lines +176 to +182
```python
if name in self._plots:
    data = self._plots[name]
elif name in PLOTS and PLOTS[name].could_log(val):
    data = PLOTS[name](name, self.dir)
    self._plots[name] = data
else:
    raise InvalidPlotTypeError(name)
```
@jorgeorpinel jorgeorpinel commented Feb 22, 2022

Same. filename? plot_fname?

@jorgeorpinel jorgeorpinel commented Feb 22, 2022

Or I guess (plot_)type would be most accurate here.
