-
Notifications
You must be signed in to change notification settings - Fork 90
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improved pair plots for scenario discovery #288
Conversation
Adds new plotting options for upper and lower triangles.
It ain't much but it's honest work.
for more information, see https://pre-commit.ci
Thanks for this effort! I really dig the bivariate histograms. The contour plots look great for float and int inputs, but for categorical and bool inputs I find them confusing. My suggestion for the default would be scatter plot on the upper triangle (like now), bivariate histograms on the lower one (new), and kernel density estimates on the diagonal (like now). But maybe we can just add three arguments to the function, like def show_pairs_scatter(upper=“scatter”, lower=“bivariate”, diag=“kde”): And then we also allow Also really curious what @quaquel thinks! Edit: Just saw you already added those arguments, great idea! |
I agree that bivariate histograms are clearer across different variable types. Because of the binning, the resulting plots are offset a bit against the subspace boxes. If it becomes the default plot, that should probably be addressed. |
for more information, see https://pre-commit.ci
I have updated the default behavior as suggested by @EwoutH, and fiddled a bit with padding to make the boxes look better on top of categorical variables. Please note that bivariate histograms are considered experimental according to Seaborn docs. EDIT: To clarify, from my perspective, this branch is ready for merging. |
Looks very good. I'll check the code asap and merge if everything is fine now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for this effort! I will let the formal review to @quaquel, but I have a few minor suggestions.
Co-authored-by: Ewout ter Hoeven <[email protected]>
Co-authored-by: Ewout ter Hoeven <[email protected]>
for more information, see https://pre-commit.ci
Co-authored-by: Ewout ter Hoeven <[email protected]>
for more information, see https://pre-commit.ci
Thanks, @EwoutH for these suggestions. I agree with all of them and have added the easy ones already as commits. Only the enum/string thing still needs to be done. |
Co-authored-by: Ewout ter Hoeven <[email protected]>
for more information, see https://pre-commit.ci
I added the enum to the code. I guess this is ready to be merged, but @EwoutH if you have a chance to give it a last quick check, that would be great. |
Personally I find ENUM a bit overkill for such a construct, but it can't hurt either. Did some manual testing happen? If nothing major has changed since my last review, it's good to go. |
I agree on the enum comment. I decided to go down the enum route for parallelism with density plots. I would even argue that the density plots might also be overkill (4 options only). I have tested the code with the prim example. No further changes took place. |
Thanks @steipatr |
Thanks a lot Patrick! |
Thank you both for your efforts! Especially @quaquel for finishing the last bits. I think it's a great enhancement. Now to generalize it to multiple subspaces... |
These changes add some interesting plotting options to the pair plots for scenario discovery. New options include contour plots and bivariate histograms, as well as the option to leave the upper triangle empty to reduce visual clutter. There are no new required arguments, so existing code should still run, but will produce different figures. These changes are possible by moving the pair plot code from Seaborn's PairPlot to the more flexible PairGrid.
If accepted, this closes #98 .
Current EMA code.
Proposed new default behavior, with kernel density estimates on diagonal, contour plot and scatter plot. Less whitespace between plots and axes.
Some alternative options: cumulative density functions on diagonal, bivariate histogram in lower triangle, and empty upper triangle. Same whitespace between plots and axes as current implementation.