Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix credible_interval=None and plot_dist with ndim>2 #1115

Merged
merged 9 commits into from
Mar 15, 2020

Conversation

smit-s
Copy link
Contributor

@smit-s smit-s commented Mar 13, 2020

Description

This pull request fixes #1104 wherein changes have been made to distplot.py to flatten the array if it is a Dataarray and changes have been made to posteriorplot.py to change credible_interval to "auto" and added a condition for handling None for credible_interval.
Plots
In case of credible_interval=None
none

In case of credible_interval is not set, it will become auto-

auto

Checklist

  • Follows official PR format
  • Includes a sample plot to visually illustrate the changes (only for plot-related functions)
  • Code style correct (follows pylint and black guidelines)
  • Include changes to changelog

Comment on lines 94 to 97
if type(values).__name__ == "DataArray":
ax.hist(values.values.flatten(), bins=bins, **hist_kwargs)
else:
ax.hist(values, bins=bins, **hist_kwargs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I'd use only values = np.asarray(values).faltten() (and probably the same in bokeh file, haven't checked though). This will accept DataArrays stright away.

Moreover, it is probably a good idea to add some errors in the general file plots/distplot.py in a similar way to plot_kde, so users get some informative error if they try to use the function on an inference data or a dataset.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this solution is also working fine.

For, the second suggestion, Are following error appropriate?

"Cannot directly convert xarray.Dataset to numpy array. Instead, create an xarray.DataArray first or Use plot_posterior, plot_density, plot_joint" "or plot_pair instead of plot_dist"
and
" Inference Data object detected. Use plot_posterior instead of plot_dist"

@OriolAbril
Copy link
Member

Thanks for the PR, I love having the plots here in the description, if I may, some minor comments.

Try to make the title as short and descriptive as possible (i.e. Fix credible_interval=None and plot_dist with ndim>2).

Avoid using the issue it fixes/is related in the title, we generally don't remember issues by name and GitHub does not render links in titles, only in description, moreover, if you use some keywords right before the issue number GitHub will do some magic and close the issue automatically once the PR is merged!

And finally but not least, these are relevant fixes! They should be included in the changelog so users can see it has been fixed.

@smit-s smit-s changed the title Addresses #1104:Making changes to address the error generated due to Dataarray of more than 2 dimension for histogram and changing behavior of credible_interval in posteriorplot.py Addresses #1104:Fix credible_interval=None and plot_dist with ndim>2 Mar 14, 2020
@smit-s
Copy link
Contributor Author

smit-s commented Mar 14, 2020

Thanks for review @OriolAbril . I will make relevant changes.

@ahartikainen ahartikainen changed the title Addresses #1104:Fix credible_interval=None and plot_dist with ndim>2 Fix credible_interval=None and plot_dist with ndim>2 Mar 14, 2020
Copy link
Member

@OriolAbril OriolAbril left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are three files relevant to any ArviZ plot, and plot_dist is no exception:

  • plots/xxxplot.py: The function exposed to users is the one defined here. It takes care of all preprocessing tasks common between backends.
    • plots/backends/bokeh/xxxplot.py: This function is called by the one above if using bokeh backend and does the actual plotting in bokeh
    • plots/backends/matplotlib/xxxplot.py: Same as above but when using matplotlib bokeh

arviz/plots/backends/matplotlib/distplot.py Outdated Show resolved Hide resolved
arviz/plots/backends/matplotlib/distplot.py Show resolved Hide resolved
arviz/plots/backends/matplotlib/distplot.py Show resolved Hide resolved
@@ -115,7 +115,7 @@ def _histplot_bokeh_op(values, values2, rotated, ax, hist_kwargs):
if bins is None:
bins = get_bins(values)
density = hist_kwargs.pop("density", True)
hist, edges = np.histogram(values, density=density, bins=bins)
hist, edges = np.histogram(np.asarray(values).flatten(), density=density, bins=bins)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should move this before using values for bins and do values = np.asarray(values).flatten() to use always flattened array.

@@ -91,8 +91,10 @@ def _histplot_mpl_op(values, values2, rotated, ax, hist_kwargs):
raise NotImplementedError("Insert hexbin plot here")

bins = hist_kwargs.pop("bins")
if bins is None:
bins = get_bins(values)
ax.hist(np.asarray(values).flatten(), bins=bins, **hist_kwargs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

Comment on lines 147 to 155
if isinstance(values, xr.Dataset):
raise ValueError(
"Cannot directly convert xarray.Dataset to numpy array."
" Instead, create an xarray.DataArray first "
"or Use plot_posterior, plot_density, plot_joint"
"or plot_pair instead of plot_dist"
)
if isinstance(values, InferenceData):
raise ValueError(" Inference Data object detected. Use plot_posterior instead of plot_dist")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though everything it says is correct, I would change it slightly to guide users towards our intended workflow, moreover, I figured we can have a single message for both cases:

    if isinstance(values, xr.Dataset) or isinstance(values, InferenceData):
        raise ValueError(
            "InferenceData or xarray.Dateset object detected,"
            " use plot_posterior, plot_density or plot_pair"
            " instead of plot_dist"
        )

They can always create a DataArray manually, but we have the var_names argument so users should not create dataarrays manually. We also recently deprecated plot_joint in favour of plot_pair so error messages should not continue recommending its usage.

@smit-s
Copy link
Contributor Author

smit-s commented Mar 14, 2020

@OriolAbril As the pull request is approved can I add it to changelog under "Maintenance and fixes"?

@OriolAbril
Copy link
Member

There is no need to wait for the PR to be approved to add to changelog, we often merge PRs without any approval too.

We have no convention on using approved tag on reviews either, I generally use it to indicate that bulk or PR contribution is finished and only nits are pending.

CHANGELOG.md Outdated
@@ -5,7 +5,7 @@
### New features
* Integrate jointplot into pairplot, add point-estimate and overlay of plot kinds #1079
### Maintenance and fixes

* Fixed parameter `credible_interval` and fix for Dataarray input with dimensions greater than 2(#1115)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better use one bullet point per fix as they are actually unrelated:

* Fixed behaviour of `credible_interval=None` in `plot_posterior` (#1115)
* Fixed hist kind of `plot_dist` with multidimensional input (#1115)

@OriolAbril
Copy link
Member

LGTM

Copy link
Contributor

@ahartikainen ahartikainen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ahartikainen ahartikainen merged commit e5208d1 into arviz-devs:master Mar 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Histogram distribution plot fails on hist, succeeds on KDE
3 participants