Separation plot #1359

agustinaarroyuelo · 2020-08-21T14:29:47Z

Description

The separation plot is a simple and informative way to assess a model's fit when the outcome is binary. After fitting, the model predictions are sorted and drawn as vertical lines. With a color scheme that distinguishes the positive and negative classes, a good fit shows most instances of the positive class on the right-hand side of the plot, that is, where the highest-valued predictions are located.

idata = az.load_arviz_data('classification10d')
ax = az.plot_separation(
    idata=idata, 
    y='outcome',
    y_hat_line=True,
    expected_events=True,
    figsize=(10, 1),
)
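
The idea described above can be sketched without any plotting library. This is a minimal illustration, not the code in this PR: the function names `separation_order` and `expected_events` are hypothetical, and the definition of expected events as the rounded sum of predicted probabilities is an assumption about what the `expected_events` marker represents.

```python
def separation_order(y, y_hat):
    """Reorder observed classes by ascending predicted probability."""
    if len(y) != len(y_hat):
        raise ValueError("y and y_hat must have the same length")
    order = sorted(range(len(y_hat)), key=lambda i: y_hat[i])
    return [y[i] for i in order], [y_hat[i] for i in order]


def expected_events(y_hat):
    """Assumed definition: the sum of predicted probabilities, rounded."""
    return round(sum(y_hat))


# Toy data: observed binary outcomes and predicted probabilities.
y = [0, 1, 0, 1, 1, 0]
y_hat = [0.1, 0.9, 0.3, 0.7, 0.6, 0.4]

sorted_y, sorted_y_hat = separation_order(y, y_hat)

# One character per observation, from lowest prediction (left) to highest
# (right); "#" marks the positive class and "." the negative class.
strip = "".join("#" if obs else "." for obs in sorted_y)
print(strip)
print(expected_events(y_hat))
```

For a well-fitting model the `#` characters cluster at the right end of the strip, which is the text analogue of the colored vertical lines in the plot.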

~~It would be great if you could suggest how to choose default colors for this plot. Finally, I would like to know your overall comments on this code.~~

Checklist

Follows official PR format
Includes a sample plot to visually illustrate the changes (only for plot-related functions)
New features are properly documented (with an example if appropriate)?
Includes new or updated tests to cover the new feature
Code style correct (follows pylint and black guidelines)
Changes are listed in changelog

arviz/plots/backends/bokeh/separationplot.py

arviz/plots/backends/matplotlib/separationplot.py

arviz/plots/separationplot.py

aloctavodia

Remember to also update arviz/doc/api.rst

arviz/plots/separationplot.py

arviz/plots/backends/bokeh/separationplot.py

OriolAbril · 2020-08-27T05:45:17Z

arviz/plots/backends/bokeh/separationplot.py

+    backend_kwargs,
+    show,
+):
+    """Matplotlib separation plot."""


Suggested change

-    """Matplotlib separation plot."""
+    """Bokeh separation plot."""

arviz/plots/backends/bokeh/separationplot.py

OriolAbril · 2020-08-27T05:57:47Z

arviz/plots/backends/bokeh/separationplot.py

+    if idata is not None and not isinstance(idata, InferenceData):
+        raise ValueError("idata must be of type InferenceData or None")
+
+    if idata is None:
+        if not all(isinstance(arg, (np.ndarray, xr.DataArray)) for arg in (y, y_hat)):
+            raise ValueError(
+                "y and y_hat must be array or DataArray when idata is None "
+                "but they are of types {}".format([type(arg) for arg in (y, y_hat)])
+            )
+    else:
+
+        if y_hat is None and isinstance(y, str):
+            label_y_hat = y
+            y_hat = y
+        elif y_hat is None:
+            raise ValueError("y_hat cannot be None if y is not a str")
+
+        if isinstance(y, str):
+            y = idata.observed_data[y].values
+        elif not isinstance(y, (np.ndarray, xr.DataArray)):
+            raise ValueError("y must be of types array, DataArray or str, not {}".format(type(y)))
+
+        if isinstance(y_hat, str):
+            label_y_hat = y_hat
+            y_hat = idata.posterior_predictive[y_hat].mean(axis=(1, 0)).values
+        elif not isinstance(y_hat, (np.ndarray, xr.DataArray)):
+            raise ValueError(
+                "y_hat must be of types array, DataArray or str, not {}".format(type(y_hat))
+            )


I think this can be done in the general function and then the backend specific ones only get the arrays (plus maybe labels too?)
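
One way the suggestion above could look is sketched here. It is an illustration under stated assumptions, not the refactor that was merged: `resolve_inputs`, `plot_separation_sketch`, and the two stand-in backend functions are hypothetical names, a plain dict stands in for `InferenceData`, and plain lists stand in for arrays so the sketch runs with the standard library alone.

```python
from statistics import fmean


def resolve_inputs(y, y_hat=None, idata=None):
    """Hypothetical helper: resolve names or sequences into plain lists once."""
    label_y_hat = "y_hat"
    if idata is None:
        if y_hat is None or isinstance(y, str) or isinstance(y_hat, str):
            raise ValueError("y and y_hat must be sequences when idata is None")
        return list(y), list(y_hat), label_y_hat

    if y_hat is None:
        if not isinstance(y, str):
            raise ValueError("y_hat cannot be None if y is not a str")
        y_hat = y  # the same variable name is looked up in both groups
    if isinstance(y_hat, str):
        label_y_hat = y_hat
        draws = idata["posterior_predictive"][y_hat]
        # Average over draws to get one predicted probability per observation.
        y_hat = [fmean(column) for column in zip(*draws)]
    if isinstance(y, str):
        y = idata["observed_data"][y]
    return list(y), list(y_hat), label_y_hat


def matplotlib_backend(y, y_hat, label_y_hat):
    # Stand-in for a backend function: it only ever sees resolved values.
    return ("matplotlib", y, y_hat, label_y_hat)


def bokeh_backend(y, y_hat, label_y_hat):
    # Stand-in for the second backend, with the same signature.
    return ("bokeh", y, y_hat, label_y_hat)


def plot_separation_sketch(y, y_hat=None, idata=None, backend="matplotlib"):
    """General entry point: validate once, then dispatch to a backend."""
    resolved = resolve_inputs(y, y_hat=y_hat, idata=idata)
    backends = {"matplotlib": matplotlib_backend, "bokeh": bokeh_backend}
    return backends[backend](*resolved)


# A dict standing in for InferenceData: two draws, two observations.
fake_idata = {
    "observed_data": {"outcome": [0, 1]},
    "posterior_predictive": {"outcome": [[0.0, 1.0], [0.5, 0.5]]},
}
result = plot_separation_sketch("outcome", idata=fake_idata, backend="bokeh")
```

With this layout the type checks and the posterior predictive averaging live in one place, and each backend receives the same arrays plus the label.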

OriolAbril · 2020-08-27T06:01:00Z

arviz/plots/separationplot.py

+    expected_events=False,
+    figsize=None,
+    textsize=None,
+    color=None,


should we set color="C0" here and then convert to hex before passing it to the backend specific function? This is more a philosophical question about whether we want this behaviour than anything else.

Yeah, I think we want this everywhere, not only here.

OriolAbril · 2020-08-27T06:04:26Z

arviz/plots/backends/matplotlib/separationplot.py

+        ax.scatter(y_hat[idx][expected_events - 1], 0, label="Expected events", **exp_events_kwargs)
+
+    if legend and expected_events or y_hat_line:
+        handles, labels = plt.gca().get_legend_handles_labels()


why gca instead of using ax? not sure how this is different from ax.legend() 😅
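
Separately from the `gca` question, the condition in the quoted hunk is worth a second look because `and` binds more tightly than `or` in Python. The sketch below only demonstrates the operator precedence; the two function names are hypothetical, and whether `legend=False` was meant to suppress the legend in every case is an assumption about intent.

```python
def shows_legend_as_written(legend, expected_events, y_hat_line):
    # Python parses this as (legend and expected_events) or y_hat_line.
    return bool(legend and expected_events or y_hat_line)


def shows_legend_grouped(legend, expected_events, y_hat_line):
    # Explicit grouping: the legend flag gates both optional artists.
    return bool(legend and (expected_events or y_hat_line))


# With legend=False and y_hat_line=True the two versions disagree.
as_written = shows_legend_as_written(False, False, True)
grouped = shows_legend_grouped(False, False, True)
print(as_written, grouped)
```

If the grouped behaviour is the intended one, adding the parentheses makes the condition match it.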

arviz/plots/separationplot.py

doc/api.rst

codecov · 2020-08-27T12:58:04Z

Codecov Report

Merging #1359 into master will decrease coverage by 0.01%.
The diff coverage is 90.69%.

@@            Coverage Diff             @@
##           master    #1359      +/-   ##
==========================================
- Coverage   91.74%   91.73%   -0.02%     
==========================================
  Files         102      105       +3     
  Lines       10778    10907     +129     
==========================================
+ Hits         9888    10005     +117     
- Misses        890      902      +12

| Impacted Files | Coverage Δ |
|---|---|
| arviz/plots/separationplot.py | `72.72% <72.72%> (ø)` |
| arviz/plots/backends/matplotlib/separationplot.py | `96.07% <96.07%> (ø)` |
| arviz/plots/backends/bokeh/separationplot.py | `97.72% <97.72%> (ø)` |
| arviz/plots/__init__.py | `100.00% <100.00%> (ø)` |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b7221d7...cb91fa1.

pydocstyle

Co-authored-by: Oriol Abril-Pla

OriolAbril

Added a small comment to try to minimize duplicated code, but it's not necessary to do, can be merged as is.

I love the PR and having the separation plot in ArviZ. I have been asked a couple of times on Discourse for posterior predictive plots for binary outcomes. If you are up for it, it could be interesting to write a blog post about it (how to use it and how to interpret it) and share it on the PyMC and Stan Discourses.

OriolAbril · 2020-08-31T23:56:14Z

arviz/plots/backends/bokeh/separationplot.py

+    if len(y) != len(y_hat):
+        warnings.warn(
+            "y and y_hat must be the same lenght",
+            UserWarning,
+        )
+
+    locs = np.linspace(0, 1, len(y_hat))
+    width = np.diff(locs).mean()


maybe this could also go in the general function as it's not backend specific
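
The two quoted lines depend only on the number of observations, which is why they can be computed once before dispatching to a backend. Below is a pure-Python sketch of the same computation; `bar_layout` is a hypothetical name, and raising for fewer than two observations is a choice made for this sketch rather than the behaviour of the quoted code.

```python
def bar_layout(n):
    """Pure-Python equivalent of np.linspace(0, 1, n) and np.diff(locs).mean()."""
    if n < 2:
        raise ValueError("need at least two observations to define a width")
    # n evenly spaced positions from 0 to 1 inclusive.
    locs = [i / (n - 1) for i in range(n)]
    # Mean gap between neighbouring positions, which is 1 / (n - 1).
    diffs = [b - a for a, b in zip(locs, locs[1:])]
    width = sum(diffs) / len(diffs)
    return locs, width


locs, width = bar_layout(5)
print(locs, width)
```

Both backends can then receive `locs` and `width` as ready-made arguments instead of recomputing them.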

aloctavodia · 2020-09-02T11:25:22Z

thanks @agustinaarroyuelo!

aloctavodia reviewed Aug 21, 2020

arviz/plots/backends/bokeh/separationplot.py (outdated)

arviz/plots/backends/matplotlib/separationplot.py (outdated)

arviz/plots/separationplot.py

aloctavodia reviewed Aug 26, 2020

arviz/plots/separationplot.py

arviz/plots/backends/bokeh/separationplot.py (outdated)

agustinaarroyuelo changed the title ~~[WIP] Separation plot~~ Separation plot Aug 26, 2020

OriolAbril reviewed Aug 27, 2020

agustinaarroyuelo and others added 13 commits August 27, 2020 17:54

separation plot

b8b2576

add squeeze argument

8470952

remove cmap argument

c47cb43

add example and tests

63fb226

pydocstyle

add gallery examples

9825389

update changelog

c4cc963

run black

64b1158

fix legend

0d6ac2d

use labeled dimensions

0d3194f

Co-authored-by: Oriol Abril-Pla

update docstring

6e652af

Co-authored-by: Oriol Abril-Pla

update doc/api.rst

ca54234

Co-authored-by: Oriol Abril-Pla

change expected events plot

e5a3c33

fix label

5b4f80f

aloctavodia approved these changes Aug 31, 2020

OriolAbril approved these changes Sep 1, 2020

move locs and width to general function

cb91fa1

aloctavodia merged commit 5a6bbee into arviz-devs:master Sep 2, 2020

agustinaarroyuelo deleted the separationplot branch September 2, 2020 12:18

sethaxen mentioned this pull request Oct 3, 2020

Support new arviz v0.10.0 features arviz-devs/ArviZ.jl#92

Closed
