Use scipy.stats.bootstrap in pingouin.compute_bootci? #189
I think the BCa method being available only for univariate (single-sample) statistics is not a deal breaker, but it's kinda annoying; I've hit that issue a few times. Maybe I'll file a feature request with SciPy. EDIT: scipy/scipy#16433. Thank you for all the improvements.
Thanks for opening the issue on SciPy! I think that SciPy's bootstrap module is still under development, so I'll stick to Pingouin's internal implementation for now. I'm not sure I understand your method for getting the bootstrapped distribution; could you please provide some example code? Thanks,
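Sure. I did this recently with SciPy, using a `BirdPecks.csv` data set. Code:

```python
import pandas as pd
import numpy as np
from scipy import stats

BirdPecks = pd.read_csv("BirdPecks.csv")
Nboot = 10000
pg1 = BirdPecks[BirdPecks["group"] == 1]["pecks"].to_numpy()
pg2 = BirdPecks[BirdPecks["group"] == 2]["pecks"].to_numpy()

# Collect every value the statistic returns. Appending to a module-level
# list needs no `global` statement, because the name is never rebound.
bdist = []

def boot_mean(x, y):
    m = np.mean(x) - np.mean(y)
    bdist.append(m)
    return m

bs_out = stats.bootstrap(
    data=(pg1, pg2),
    statistic=boot_mean,
    vectorized=False,  # boot_mean is called once per resample
    paired=False,
    confidence_level=0.9,
    n_resamples=Nboot,
    random_state=123,
    method="basic",
)
# bdist may also contain the statistic evaluated on the original samples,
# so it can hold slightly more than Nboot values.
print(bdist)
```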
@raphaelvallat There are proposals for significant improvements to [...]. They're asking for extra comments, but right now I'm under a big time crunch, so I'm not sure I can help.
I'd be happy to add other enhancements to [...]. As @FlorinAndrei mentioned, domain-knowledge review of PRs would help; the main reason I haven't made enhancements already is a shortage of reviewer time. We can definitely return the bootstrap distribution in 1.10, and I now see no technical impediment to extending BCa to the multi-sample case. I decided against returning a normal-approximation confidence interval because Efron does not seem to recommend them. Of course, that would be super easy to add, either in a Pingouin wrapper or in SciPy itself (if there's a good case made for it). I'm currently writing a SciPy tutorial for all the resampling methods we offer (...)
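For reference, the normal-approximation interval mentioned above is simple to compute from a bootstrap distribution: the observed statistic plus or minus a standard-normal quantile times the bootstrap standard error. A minimal sketch, assuming SciPy 1.10 or later (for `res.bootstrap_distribution`); `normal_bootstrap_ci` is a hypothetical helper, not a SciPy or Pingouin function, and the data are made up:

```python
import numpy as np
from scipy import stats


def normal_bootstrap_ci(theta_hat, boot_dist, confidence_level=0.95):
    """Hypothetical helper: normal-approximation bootstrap CI.

    Returns theta_hat +/- z * SE, where SE is the standard deviation
    of the bootstrap distribution.
    """
    alpha = 1 - confidence_level
    z = stats.norm.ppf(1 - alpha / 2)   # about 1.96 for a 95% interval
    se = np.std(boot_dist, ddof=1)      # bootstrap standard error
    return theta_hat - z * se, theta_hat + z * se


rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=50)  # made-up sample
res = stats.bootstrap((x,), np.mean, n_resamples=2000)
low, high = normal_bootstrap_ci(np.mean(x), res.bootstrap_distribution, 0.95)
print(low, high)
```

Recent SciPy versions also report `res.standard_error`, which should be the same quantity as `np.std(res.bootstrap_distribution, ddof=1)`.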
Hi @mdhaber, thanks for all your work on the bootstrap module. I'd be happy to review a PR and/or tutorial 👍 I think that returning the bootstrapped distribution in SciPy would be great. If it does, the current pingouin.compute_bootci would essentially become a wrapper around scipy.stats.bootstrap. Two other potential ideas:
Great! If you'd like to look at the doc additions in scipy/scipy#16454, which returns the bootstrap distribution, I'd appreciate it! If you approve, be sure to select the "Approve" option when you submit your review. As a maintainer, it certainly improves my confidence in merging when the PR has the approval of others.
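Once the bootstrap distribution is returned on the result object, the `BirdPecks` example no longer needs a global list. A sketch assuming SciPy 1.10+, where `bootstrap` returns `bootstrap_distribution`; the Poisson-distributed groups below are made-up stand-ins for the `pecks` columns, and `mean_diff` is a hypothetical statistic that takes an `axis` argument so SciPy can evaluate it vectorized:

```python
import numpy as np
from scipy import stats

# Made-up stand-ins for the two "pecks" groups.
rng = np.random.default_rng(123)
pg1 = rng.poisson(lam=10, size=30)
pg2 = rng.poisson(lam=8, size=25)


def mean_diff(x, y, axis=-1):
    # Difference of group means; the `axis` argument lets SciPy vectorize.
    return np.mean(x, axis=axis) - np.mean(y, axis=axis)


res = stats.bootstrap(
    (pg1, pg2),
    statistic=mean_diff,
    paired=False,
    confidence_level=0.9,
    n_resamples=10000,
    method="basic",
)

# The resampled statistics come back with the result (last axis = resamples).
bdist = res.bootstrap_distribution
print(bdist.shape)
print(res.confidence_interval)
```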
OK, I'll be thinking about that.
Yup, I can do that. Good idea. Update: see scipy/scipy#16651.
@raphaelvallat Now that scipy/scipy#16651 addresses your idea 2 above, here are some thoughts about your idea 1. In retrospect, I would have made [...] But perhaps it's for the best: a really flexible way of doing this is to add [...] Another possibility is to make the new [...]
Awesome @mdhaber, I just reviewed scipy/scipy#16651!
I think that having a dedicated [...]
I agree.
Most CI methods need to know the observed value of the statistic to compute the confidence interval. I suppose I could allow [...]
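To illustrate why the observed value matters: the percentile interval uses only quantiles of the bootstrap distribution, whereas the basic (reverse-percentile) interval reflects those quantiles around the observed statistic, giving [2*theta_hat - q(1 - alpha/2), 2*theta_hat - q(alpha/2)]. A plain-NumPy sketch with a made-up sample; `percentile_and_basic_ci` is a hypothetical helper, and SciPy's own quantile conventions may differ slightly:

```python
import numpy as np


def percentile_and_basic_ci(theta_hat, boot_dist, confidence_level=0.95):
    """Hypothetical helper: percentile and basic bootstrap CIs by hand.

    The percentile CI needs only the bootstrap quantiles; the basic CI
    reflects them around theta_hat, so it requires the observed value.
    """
    alpha = (1 - confidence_level) / 2
    q_low, q_high = np.percentile(boot_dist, [100 * alpha, 100 * (1 - alpha)])
    percentile_ci = (q_low, q_high)
    basic_ci = (2 * theta_hat - q_high, 2 * theta_hat - q_low)
    return percentile_ci, basic_ci


rng = np.random.default_rng(1)
x = rng.exponential(scale=3.0, size=40)   # made-up skewed sample
theta_hat = np.mean(x)                    # observed statistic

# Resample with replacement: 5000 rows of indices into x.
idx = rng.integers(0, x.size, size=(5000, x.size))
boot_dist = x[idx].mean(axis=1)

perc, basic = percentile_and_basic_ci(theta_hat, boot_dist, 0.95)
print("percentile:", perc)
print("basic:     ", basic)
```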
Just thought I'd mention that scipy/scipy#16455 would add BCa bootstrap for multi-sample statistics. I think it's correct now, and I'm working on unit tests for it. Reviews appreciated!
And scipy/scipy#16714 adds the functionality suggested in #189 (comment) to address @raphaelvallat's idea 1. With that, one can change the confidence level and CI type, and add resamples (or not), like:

```python
import numpy as np
from scipy import stats

x = np.random.default_rng(0).normal(size=100)  # example data
res = stats.bootstrap((x,), np.mean, confidence_level=0.95, method='percentile')
# without any new resampling, change the confidence level and method
res2 = stats.bootstrap((x,), np.mean, n_resamples=0, bootstrap_result=res, confidence_level=0.9, method='BCa')
```
@mdhaber I reviewed the second PR. Unfortunately, I don't think I have either the stats skills or the time to do a full in-depth review of scipy/scipy#16455; @FlorinAndrei might be better able to help here. But please let me know how I can help. Thanks!
No problem. Thanks for taking a look at scipy/scipy#16714!
The multi-sample BCa PR merged. Hope it helps!
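With that merged, a two-sample statistic can be passed together with `method='BCa'`. A sketch assuming a SciPy release that includes scipy/scipy#16455; the samples are made up and `mean_diff` is a hypothetical statistic:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.normal(loc=1.0, scale=1.0, size=40)   # made-up group 1
y = rng.normal(loc=0.0, scale=1.5, size=35)   # made-up group 2


def mean_diff(x, y, axis=-1):
    # Difference of means between two independent samples.
    return np.mean(x, axis=axis) - np.mean(y, axis=axis)


# Per this thread, BCa used to be limited to single-sample statistics;
# here it is applied to two independent (unpaired) samples.
res = stats.bootstrap((x, y), mean_diff, method="BCa",
                      confidence_level=0.95, n_resamples=5000)
print(res.confidence_interval)
```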
pingouin.compute_bootci could leverage the newly added scipy.stats.bootstrap function (SciPy 1.7.0+) to calculate the bootstrapped confidence intervals.
Pros
Cons
return_dist argument.
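Following up on the idea that compute_bootci could wrap scipy.stats.bootstrap, here is a rough sketch of what such a wrapper might look like. It is hypothetical and not Pingouin's implementation: the parameter names only loosely mirror compute_bootci, `method` takes SciPy's names ('percentile', 'basic', 'BCa') rather than Pingouin's, and `return_dist` relies on `bootstrap_distribution` (SciPy 1.10+):

```python
import numpy as np
from scipy import stats


def bootci_wrapper(x, y=None, func=np.mean, confidence=0.95, method="BCa",
                   n_boot=2000, paired=False, return_dist=False):
    """Hypothetical compute_bootci-style wrapper around scipy.stats.bootstrap.

    Not Pingouin's actual implementation. When `y` is given, `func` must
    accept two 1-D arrays (e.g. a correlation, with paired=True).
    """
    data = (np.asarray(x),) if y is None else (np.asarray(x), np.asarray(y))
    res = stats.bootstrap(
        data,
        func,
        vectorized=False,          # call func once per resample
        paired=paired,
        confidence_level=confidence,
        n_resamples=n_boot,
        method=method,             # SciPy names: 'percentile', 'basic', 'BCa'
    )
    ci = np.array([res.confidence_interval.low, res.confidence_interval.high])
    if return_dist:
        # One statistic value per resample (SciPy 1.10+).
        return ci, res.bootstrap_distribution
    return ci


# Example usage with made-up data.
rng = np.random.default_rng(0)
a = rng.normal(size=30)
b = a + rng.normal(scale=0.5, size=30)

# Paired two-sample statistic: Pearson correlation.
ci = bootci_wrapper(a, b, func=lambda u, v: np.corrcoef(u, v)[0, 1], paired=True)
# One-sample statistic, also returning the bootstrap distribution.
ci_mean, dist = bootci_wrapper(a, return_dist=True)
print(ci, ci_mean, dist.shape)
```

Using `vectorized=False` keeps the wrapper simple for arbitrary user functions at the cost of speed; a vectorized `func` with an `axis` argument would be much faster.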