Using weighted mean estimator for bootstrapped confidence intervals in seaborn plots #3563
Hm, I do remember that complex dtype trick: very clever, but still ultimately a hack. I don't know exactly what broke it in v0.13 (the relevant code was more or less completely rewritten) and it's pretty unlikely that it's going to come back as a supported use case. That said, seaborn has support for weights in a few other places (e.g. the distribution plots) and it probably makes some sense to have them in the categorical plots too. In fact, with the v0.13 rewrite, it'll be a lot easier to add. But there are still a few challenges:
FWIW, while there isn't currently a weighted-average stat in the objects interface, I'd be on board with adding one, and doing so would skirt most of these questions.
I guess the other part of what makes this challenging in the function interface is that the bootstrap logic would need to work a little differently, so that the observations and weights are resampled together.
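To illustrate what resampling the observations and weights together means, here is a minimal sketch of a paired percentile bootstrap for a weighted mean. It is not seaborn's implementation: the function names `weighted_mean` and `bootstrap_weighted_mean_ci`, the sample data, and the choice of a simple percentile interval are all assumptions made for this example, and it uses only the Python standard library.

```python
import random


def weighted_mean(values, weights):
    """Weighted average: sum(w * x) / sum(w). Assumes the weights sum to a nonzero value."""
    total = sum(weights)
    return sum(x * w for x, w in zip(values, weights)) / total


def bootstrap_weighted_mean_ci(values, weights, n_boot=1000, ci=95, seed=0):
    """Percentile bootstrap interval for the weighted mean (hypothetical helper).

    Each resample draws row indices with replacement, so every observation
    keeps its own weight instead of being shuffled independently of it.
    """
    rng = random.Random(seed)
    n = len(values)
    stats = []
    for _ in range(n_boot):
        idx = rng.choices(range(n), k=n)  # one set of indices for both vectors
        stats.append(
            weighted_mean([values[i] for i in idx], [weights[i] for i in idx])
        )
    stats.sort()
    alpha = (100 - ci) / 200  # tail probability on each side
    low = stats[int(alpha * (n_boot - 1))]
    high = stats[int((1 - alpha) * (n_boot - 1))]
    return low, high


# Illustrative data only.
values = [1.0, 2.0, 3.0, 10.0]
weights = [4.0, 3.0, 2.0, 1.0]

estimate = weighted_mean(values, weights)
low, high = bootstrap_weighted_mean_ci(values, weights)
print(estimate, (low, high))
```

The key point is the single `idx` list: an ordinary bootstrap that resamples one vector cannot preserve the pairing between an observation and its weight.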
Hey Michael, thanks for the reply. Yeah, I figured that restoring this behaviour would be a long shot. I only asked in case there was a simple one- or two-line fix which would do the job. But I guess if this area of the code base has been rewritten then that will be a non-starter. Agreed that this would be useful on other functions beside When you say a weighted stat, do you mean something like statsmodels Anyway, I see this is not a straightforward feature so wouldn't expect something quick. However, I'd be more than happy to continue discussing potential implementations if/when this is added.
Well, that's the thing: there's no "list of supported estimators". From a seaborn perspective the estimator just needs to be a function that takes a vector and returns a scalar (or the name of a method on a pandas series that operates that way). There are no other operational constraints. Which is why adding hard-to-explain nuances like "if you pass
OK, I understand the complexity this adds to the current API. And it is possible that Then perhaps an alternative is to allow the user to pass an arbitrary data type to the plotters (but only as a single vector), and it is their responsibility to ensure that the estimator that is passed is compatible with the data type they've supplied. Similar to the original complex number system above, but perhaps slightly more formal. Perhaps this is what you meant in your original reply?
This would allow something like this This currently throws as below, but perhaps it's simple to relax the restriction that the data type must be numeric?
See #3580 for what this looks like in the objects interface.
Hi Mark, thanks for working on this! Much appreciated. Is the plan for this to be included in the next seaborn release? Also, as a more general question about seaborn: are you gradually moving towards developing new features for the objects interface only? Or will support for the function interface continue?
Seaborn doesn't have an explicit roadmap or release schedule, but once the PR is merged it would be in master and would be part of the next release (0.13.1).
I'd say these are two different things. "Support" for the function interface in the form of bug fixes and core functionality will continue. And even in terms of major feature development, you'll note that the 0.13.0 release was focused on the function interface, with lots of new features. But one motivation for the development of the objects interface is that the function interface's design can impose API-usability cost to new features, and also it can be annoying to need to add some functionality in lots of places, as I've sort of elaborated above. So there will be some kinds of features where we could add them to the function interface, but it would be a lot easier and cleaner to just do it in the objects interface. I'm not totally sure which class this issue falls into though; I think that basic support for a weighted mean + CI would be pretty straightforward (since the computational code is already shared by the two interfaces) and then it's just a matter of coming to terms with the additional complexity where
Thanks Michael, understood.
Would weighted mean + CI be sufficient for you? Do you have a use case for other estimators / errorbars?
Personally, I just use the weighted mean at the moment. So that would be sufficient. In terms of plotting, I tend to use a mixture of
Please take a look at #3586 and see if that would work for you.
That looks great! I've pulled the branch and tested locally with a couple of examples and it works as I would expect. Thanks a lot for implementing this, I really appreciate it!
Hi, I would like to use a weighted mean estimator for calculating confidence intervals on various seaborn plots. In the past I have done this via a 'hack' suggested here which uses complex numbers to encode the data and its weights before passing to a seaborn plotting function.
Unfortunately, since upgrading to seaborn v0.13.0, this approach no longer works: it seems that the complex numbers are cast to reals at some point in the plotting process (and hence lose part of the data). This had worked up to and including v0.12.2.
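For readers unfamiliar with the trick, the sketch below shows the general idea of packing each observation and its weight into one complex number, and why a cast to real breaks it. It is a reconstruction for illustration, not the original example code from this issue: the helper names `pack` and `weighted_mean_from_complex` are hypothetical, and it uses plain Python complex numbers where real code would typically use NumPy arrays.

```python
def pack(values, weights):
    """Encode each (value, weight) pair as value + weight*j (hypothetical helper)."""
    return [complex(x, w) for x, w in zip(values, weights)]


def weighted_mean_from_complex(packed):
    """Estimator that takes one vector and returns a scalar, as seaborn expects.

    The real part carries the observation, the imaginary part its weight.
    """
    total = sum(z.imag for z in packed)
    return sum(z.real * z.imag for z in packed) / total


# Illustrative data only.
packed = pack([1.0, 2.0, 3.0, 10.0], [4.0, 3.0, 2.0, 1.0])
result = weighted_mean_from_complex(packed)

# If the plotting code casts the vector to real numbers, only the
# observations survive and the weights are silently dropped.
stripped = [z.real for z in packed]
print(result, stripped)
```

Because the weights live only in the imaginary part, any step that coerces the data to a real dtype leaves the estimator with nothing to weight by, which matches the behaviour described above.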
I appreciate this was always a bit of a hack, but would either of the following be possible:
a) Add native support for weighted mean estimators to the seaborn plotting functions or,
b) Restore this hacky behaviour for now in a future release
I have tried alternatives such as storing the data and its weights in tuples or dataclasses; however, neither of these approaches works, as the data types are not numeric.
Language and package versions:
Example code:
Output using seaborn v0.12.2
Output using seaborn v0.13.0