Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved pair plots for scenario discovery #288

Merged
merged 21 commits into from
Nov 15, 2023

Conversation

steipatr
Copy link
Contributor

These changes add some interesting plotting options to the pair plots for scenario discovery. New options include contour plots and bivariate histograms, as well as the option to leave the upper triangle empty to reduce visual clutter. There are no new required arguments, so existing code should still run, but will produce different figures. These changes are possible by moving the pair plot code from Seaborn's PairPlot to the more flexible PairGrid.

If accepted, this closes #98 .

Current EMA code.
download

Proposed new default behavior, with kernel density estimates on diagonal, contour plot and scatter plot. Less whitespace between plots and axes.
download

Some alternative options: cumulative density functions on diagonal, bivariate histogram in lower triangle, and empty upper triangle. Same whitespace between plots and axes as current implementation.
download

steipatr and others added 4 commits July 13, 2023 14:40
Adds new plotting options for upper and lower triangles.
It ain't much but it's honest work.
@coveralls
Copy link

coveralls commented Jul 13, 2023

Coverage Status

coverage: 80.764% (-0.03%) from 80.798%
when pulling 58b84c4 on steipatr:better-pairplots
into 9090851 on quaquel:master.

@EwoutH
Copy link
Collaborator

EwoutH commented Jul 14, 2023

Thanks for this effort! I really dig the bivariate histograms. The contour plots look great for float and int inputs, but for categorical and bool inputs I find them confusing.

My suggestion for the default would be scatter plot on the upper triangle (like now), bivariate histograms on the lower one (new), and kernel density estimates on the diagonal (like now).

But maybe we can just add three arguments to the function, like

def show_pairs_scatter(upper=scatter”, lower=bivariate”, diag=kde”):

And then we also allow “contour” for upper and lower and “cumulative” for diag.

Also really curious what @quaquel thinks!

Edit: Just saw you already added those arguments, great idea!

@steipatr
Copy link
Contributor Author

steipatr commented Jul 16, 2023

I agree that bivariate histograms are clearer across different variable types. Because of the binning, the resulting plots are offset a bit against the subspace boxes. If it becomes the default plot, that should probably be addressed.

@quaquel
Copy link
Owner

quaquel commented Aug 31, 2023

I agree with the defaults suggested by @EwoutH and the offset suggested by @steipatr. Once those things are added, this is ready to be merged. Very good stuff!

@steipatr
Copy link
Contributor Author

steipatr commented Nov 6, 2023

I have updated the default behavior as suggested by @EwoutH, and fiddled a bit with padding to make the boxes look better on top of categorical variables. Please note that bivariate histograms are considered experimental according to Seaborn docs.

EDIT: To clarify, from my perspective, this branch is ready for merging.

Current look:
image

@quaquel
Copy link
Owner

quaquel commented Nov 6, 2023

Looks very good. I'll check the code asap and merge if everything is fine now.

Copy link
Collaborator

@EwoutH EwoutH left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for this effort! I will let the formal review to @quaquel, but I have a few minor suggestions.

ema_workbench/analysis/prim.py Outdated Show resolved Hide resolved
ema_workbench/analysis/scenario_discovery_util.py Outdated Show resolved Hide resolved
ema_workbench/analysis/scenario_discovery_util.py Outdated Show resolved Hide resolved
ema_workbench/analysis/scenario_discovery_util.py Outdated Show resolved Hide resolved
ema_workbench/analysis/scenario_discovery_util.py Outdated Show resolved Hide resolved
@quaquel
Copy link
Owner

quaquel commented Nov 6, 2023

Thanks, @EwoutH for these suggestions. I agree with all of them and have added the easy ones already as commits. Only the enum/string thing still needs to be done.

@quaquel quaquel requested a review from EwoutH November 15, 2023 15:43
@quaquel
Copy link
Owner

quaquel commented Nov 15, 2023

I added the enum to the code. I guess this is ready to be merged, but @EwoutH if you have a chance to give it a last quick check, that would be great.

@EwoutH
Copy link
Collaborator

EwoutH commented Nov 15, 2023

Personally I find ENUM a bit overkill for such a construct, but it can't hurt either.

Did some manual testing happen?

If nothing major has changed since my last review, it's good to go.

@quaquel
Copy link
Owner

quaquel commented Nov 15, 2023

I agree on the enum comment. I decided to go down the enum route for parallelism with density plots. I would even argue that the density plots might also be overkill (4 options only). I have tested the code with the prim example. No further changes took place.

@quaquel quaquel merged commit c9049bb into quaquel:master Nov 15, 2023
11 checks passed
@quaquel quaquel added this to the 2.5.0 milestone Nov 15, 2023
@quaquel
Copy link
Owner

quaquel commented Nov 15, 2023

Thanks @steipatr

@EwoutH
Copy link
Collaborator

EwoutH commented Nov 15, 2023

Thanks a lot Patrick!

@steipatr
Copy link
Contributor Author

steipatr commented Nov 16, 2023

Thank you both for your efforts! Especially @quaquel for finishing the last bits. I think it's a great enhancement. Now to generalize it to multiple subspaces...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

KDE/Contour plot option in PRIM show_pairs
4 participants