Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: use studentized range over statsmodels for pairwise tests #229

Conversation

swallan
Copy link
Contributor

@swallan swallan commented Feb 3, 2022

This PR replaces use of statsmodels' studentized range distribution functions (statsmodels.stats.libqsturng.qsturng, statsmodels.stats.libqsturng.psturng) with those from SciPy now that scipy.stats.studentized_range is available (SciPy 1.7.0+).

Speed is maintained, and results are more accurate because SciPy's implementation evaluates the integral for the provided arguments rather than interpolating from tabulated results. statsmodels has adopted this change internally as well.

For example, this box and whiskey plot (from here) compares the relative accuracy of the studentized range CDF in SciPy, statsmodels, and R (ptukey) across a wide range of inputs.

Screen Shot 2022-01-24 at 9 27 21 PM

@raphaelvallat raphaelvallat self-requested a review February 3, 2022 23:50
@raphaelvallat raphaelvallat added the feature request 🚧 New feature or request label Feb 3, 2022
@raphaelvallat
Copy link
Owner

This is fantastic, thanks @swallan! I somehow missed that addition in SciPy. The unit tests are failing because of an unrelated issue (#227), but have you tried to run the test and doctest locally? Are there any modifications required? I am assuming that the overall impact on the resulting p-values is going to be so small that it might not even require updating the unit tests and examples in the docstring, but could you please check?

Also, could you edit the changelog.rst file, under the 0.6.0.dev section?

PS: read more about the contributing guidelines here.

Thank you!

@swallan
Copy link
Contributor Author

swallan commented Feb 4, 2022

@raphaelvallat I did run the pytest suite locally after switching pandas versions and it was all successful.

Output of locally running tests: (all passing)

❯ pytest pingouin
================================================= test session starts =================================================
platform darwin -- Python 3.8.9, pytest-6.2.5, py-1.11.0, pluggy-1.0.0
rootdir: /Users/swallan/Documents/SciPy/scipy_meson/pingouin, configfile: setup.cfg
collected 86 items                                                                                                    

pingouin/tests/test_bayesian.py ... [ 3%]
pingouin/tests/test_circular.py ........ [ 12%]
pingouin/tests/test_config.py . [ 13%]
pingouin/tests/test_contingency.py .... [ 18%]
pingouin/tests/test_correlation.py .... [ 23%]
pingouin/tests/test_distribution.py ...... [ 30%]
pingouin/tests/test_effsize.py ..... [ 36%]
pingouin/tests/test_equivalence.py . [ 37%]
pingouin/tests/test_multicomp.py ..... [ 43%]
pingouin/tests/test_multivariate.py ... [ 46%]
pingouin/tests/test_nonparametric.py ........ [ 55%]
pingouin/tests/test_pairwise.py .... [ 60%]
pingouin/tests/test_pandas.py . [ 61%]
pingouin/tests/test_parametric.py ....... [ 69%]
pingouin/tests/test_plotting.py ....... [ 77%]
pingouin/tests/test_power.py ...... [ 84%]
pingouin/tests/test_regression.py ... [ 88%]
pingouin/tests/test_reliability.py .. [ 90%]
pingouin/tests/test_utils.py ........ [100%]

================================================== warnings summary ===================================================
venv/lib/python3.8/site-packages/seaborn/rcmod.py:82
/Users/swallan/Documents/SciPy/scipy_meson/pingouin/venv/lib/python3.8/site-packages/seaborn/rcmod.py:82: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
if LooseVersion(mpl.version) >= "3.0":

venv/lib/python3.8/site-packages/setuptools/_distutils/version.py:351
/Users/swallan/Documents/SciPy/scipy_meson/pingouin/venv/lib/python3.8/site-packages/setuptools/_distutils/version.py:351: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
other = LooseVersion(other)

venv/lib/python3.8/site-packages/pandas/core/arrays/categorical.py:2631
pingouin/tests/test_parametric.py::TestParametric::test_anova
pingouin/tests/test_parametric.py::TestParametric::test_rm_anova2
/Users/swallan/Documents/SciPy/scipy_meson/pingouin/venv/lib/python3.8/site-packages/pandas/core/arrays/categorical.py:2631: FutureWarning: The inplace parameter in pandas.Categorical.add_categories is deprecated and will be removed in a future version. Removing unused categories will always return a new Categorical object.
res = method(*args, **kwargs)

venv/lib/python3.8/site-packages/statsmodels/compat/pandas.py:35
/Users/swallan/Documents/SciPy/scipy_meson/pingouin/venv/lib/python3.8/site-packages/statsmodels/compat/pandas.py:35: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
version = LooseVersion(pd.version)

venv/lib/python3.8/site-packages/statsmodels/compat/pandas.py:37
/Users/swallan/Documents/SciPy/scipy_meson/pingouin/venv/lib/python3.8/site-packages/statsmodels/compat/pandas.py:37: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
pandas_lt_1_0_0 = version < LooseVersion("1.0.0")

venv/lib/python3.8/site-packages/statsmodels/compat/numpy.py:46
venv/lib/python3.8/site-packages/statsmodels/compat/numpy.py:46
/Users/swallan/Documents/SciPy/scipy_meson/pingouin/venv/lib/python3.8/site-packages/statsmodels/compat/numpy.py:46: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
NP_LT_114 = LooseVersion(np.version) < LooseVersion("1.14")

venv/lib/python3.8/site-packages/statsmodels/compat/scipy.py:6
/Users/swallan/Documents/SciPy/scipy_meson/pingouin/venv/lib/python3.8/site-packages/statsmodels/compat/scipy.py:6: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
SP_VERSION = LooseVersion(scipy.version)

venv/lib/python3.8/site-packages/statsmodels/compat/scipy.py:7
/Users/swallan/Documents/SciPy/scipy_meson/pingouin/venv/lib/python3.8/site-packages/statsmodels/compat/scipy.py:7: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
SCIPY_GT_14 = SP_VERSION >= LooseVersion("1.5")

venv/lib/python3.8/site-packages/statsmodels/compat/scipy.py:8
/Users/swallan/Documents/SciPy/scipy_meson/pingouin/venv/lib/python3.8/site-packages/statsmodels/compat/scipy.py:8: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
SP_LT_15 = SP_VERSION < LooseVersion("1.5")

venv/lib/python3.8/site-packages/statsmodels/compat/scipy.py:9
/Users/swallan/Documents/SciPy/scipy_meson/pingouin/venv/lib/python3.8/site-packages/statsmodels/compat/scipy.py:9: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
SP_LT_16 = SP_VERSION < LooseVersion("1.6")

pingouin/tests/test_correlation.py::TestCorrelation::test_corr
/Users/swallan/Documents/SciPy/scipy_meson/pingouin/venv/lib/python3.8/site-packages/scipy/stats/stats.py:4023: PearsonRConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
warnings.warn(PearsonRConstantInputWarning())

pingouin/tests/test_equivalence.py::TestEquivalence::test_tost
/Users/swallan/Documents/SciPy/scipy_meson/pingouin/pingouin/bayesian.py:146: RuntimeWarning: invalid value encountered in double_scalars
bf10 = 1 / ((1 + t2 / df)(-(df + 1) / 2) / integr)

pingouin/tests/test_equivalence.py::TestEquivalence::test_tost
/Users/swallan/Documents/SciPy/scipy_meson/pingouin/pingouin/bayesian.py:146: RuntimeWarning: divide by zero encountered in double_scalars
bf10 = 1 / ((1 + t2 / df)(-(df + 1) / 2) / integr)

pingouin/tests/test_equivalence.py::TestEquivalence::test_tost
/Users/swallan/Documents/SciPy/scipy_meson/pingouin/pingouin/parametric.py:263: RuntimeWarning: invalid value encountered in multiply
ci = np.array([tval - tcrit, tval + tcrit]) * se

-- Docs: https://docs.pytest.org/en/stable/warnings.html
================================================ slowest 10 durations =================================================
9.29s call pingouin/tests/test_effsize.py::TestEffsize::test_compute_boot_esci
6.51s call pingouin/tests/test_regression.py::TestRegression::test_mediation_analysis
1.48s call pingouin/tests/test_pairwise.py::TestPairwise::test_pairwise_corr
1.43s call pingouin/tests/test_pairwise.py::TestPairwise::test_pairwise_ttests
1.25s call pingouin/tests/test_plotting.py::TestPlotting::test_plot_paired
0.57s call pingouin/tests/test_pandas.py::TestParametric::test_pandas
0.50s call pingouin/tests/test_plotting.py::TestPlotting::test_plot_rm_corr
0.42s call pingouin/tests/test_parametric.py::TestParametric::test_mixed_anova
0.33s call pingouin/tests/test_plotting.py::TestPlotting::test_plot_blandaltman
0.30s call pingouin/tests/test_plotting.py::TestPlotting::test_plot_shift
========================================== 86 passed, 17 warnings in 30.41s ===========================================

impact on the resulting p-values is going to be so small

I didn't realize running the doctests before, but I have now updated the p-values. The changes needed were small. Here are the tests passing with that change (to be pushed). (all passing)

❯ pytest --doctest-modules
============================================================== test session starts ==============================================================
platform darwin -- Python 3.8.9, pytest-6.2.5, py-1.11.0, pluggy-1.0.0
rootdir: /Users/swallan/Documents/SciPy/scipy_meson/pingouin, configfile: setup.cfg
collected 165 items                                                                                                                             

bayesian.py ... [ 1%]
circular.py ........ [ 6%]
contingency.py ... [ 8%]
correlation.py ...... [ 12%]
distribution.py ...... [ 15%]
effsize.py ..... [ 18%]
equivalence.py . [ 19%]
multicomp.py ..... [ 22%]
multivariate.py ... [ 24%]
nonparametric.py ........ [ 29%]
pairwise.py .... [ 31%]
parametric.py ...... [ 35%]
plotting.py ...... [ 38%]
power.py ...... [ 42%]
regression.py ... [ 44%]
reliability.py .. [ 45%]
utils.py .. [ 46%]
datasets/init.py .. [ 47%]
tests/test_bayesian.py ... [ 49%]
tests/test_circular.py ........ [ 54%]
tests/test_config.py . [ 55%]
tests/test_contingency.py .... [ 57%]
tests/test_correlation.py .... [ 60%]
tests/test_distribution.py ...... [ 63%]
tests/test_effsize.py ..... [ 66%]
tests/test_equivalence.py . [ 67%]
tests/test_multicomp.py ..... [ 70%]
tests/test_multivariate.py ... [ 72%]
tests/test_nonparametric.py ........ [ 76%]
tests/test_pairwise.py .... [ 79%]
tests/test_pandas.py . [ 80%]
tests/test_parametric.py ....... [ 84%]
tests/test_plotting.py ....... [ 88%]
tests/test_power.py ...... [ 92%]
tests/test_regression.py ... [ 93%]
tests/test_reliability.py .. [ 95%]
tests/test_utils.py ........ [100%]

=============================================================== warnings summary ================================================================
../venv/lib/python3.8/site-packages/seaborn/rcmod.py:82
/Users/swallan/Documents/SciPy/scipy_meson/pingouin/venv/lib/python3.8/site-packages/seaborn/rcmod.py:82: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
if LooseVersion(mpl.version) >= "3.0":

../venv/lib/python3.8/site-packages/setuptools/_distutils/version.py:351
pingouin/plotting.py::pingouin.plotting.plot_circmean
pingouin/plotting.py::pingouin.plotting.plot_rm_corr
/Users/swallan/Documents/SciPy/scipy_meson/pingouin/venv/lib/python3.8/site-packages/setuptools/_distutils/version.py:351: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
other = LooseVersion(other)

../venv/lib/python3.8/site-packages/pandas/core/arrays/categorical.py:2631
pingouin/tests/test_parametric.py::TestParametric::test_anova
pingouin/tests/test_parametric.py::TestParametric::test_rm_anova2
/Users/swallan/Documents/SciPy/scipy_meson/pingouin/venv/lib/python3.8/site-packages/pandas/core/arrays/categorical.py:2631: FutureWarning: The inplace parameter in pandas.Categorical.add_categories is deprecated and will be removed in a future version. Removing unused categories will always return a new Categorical object.
res = method(*args, **kwargs)

../venv/lib/python3.8/site-packages/statsmodels/compat/pandas.py:35
/Users/swallan/Documents/SciPy/scipy_meson/pingouin/venv/lib/python3.8/site-packages/statsmodels/compat/pandas.py:35: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
version = LooseVersion(pd.version)

../venv/lib/python3.8/site-packages/statsmodels/compat/pandas.py:37
/Users/swallan/Documents/SciPy/scipy_meson/pingouin/venv/lib/python3.8/site-packages/statsmodels/compat/pandas.py:37: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
pandas_lt_1_0_0 = version < LooseVersion("1.0.0")

../venv/lib/python3.8/site-packages/statsmodels/compat/numpy.py:46
../venv/lib/python3.8/site-packages/statsmodels/compat/numpy.py:46
/Users/swallan/Documents/SciPy/scipy_meson/pingouin/venv/lib/python3.8/site-packages/statsmodels/compat/numpy.py:46: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
NP_LT_114 = LooseVersion(np.version) < LooseVersion("1.14")

../venv/lib/python3.8/site-packages/statsmodels/compat/scipy.py:6
/Users/swallan/Documents/SciPy/scipy_meson/pingouin/venv/lib/python3.8/site-packages/statsmodels/compat/scipy.py:6: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
SP_VERSION = LooseVersion(scipy.version)

../venv/lib/python3.8/site-packages/statsmodels/compat/scipy.py:7
/Users/swallan/Documents/SciPy/scipy_meson/pingouin/venv/lib/python3.8/site-packages/statsmodels/compat/scipy.py:7: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
SCIPY_GT_14 = SP_VERSION >= LooseVersion("1.5")

../venv/lib/python3.8/site-packages/statsmodels/compat/scipy.py:8
/Users/swallan/Documents/SciPy/scipy_meson/pingouin/venv/lib/python3.8/site-packages/statsmodels/compat/scipy.py:8: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
SP_LT_15 = SP_VERSION < LooseVersion("1.5")

../venv/lib/python3.8/site-packages/statsmodels/compat/scipy.py:9
/Users/swallan/Documents/SciPy/scipy_meson/pingouin/venv/lib/python3.8/site-packages/statsmodels/compat/scipy.py:9: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
SP_LT_16 = SP_VERSION < LooseVersion("1.6")

pingouin/plotting.py::pingouin.plotting.plot_circmean
pingouin/plotting.py::pingouin.plotting.plot_rm_corr
/Users/swallan/Documents/SciPy/scipy_meson/pingouin/venv/lib/python3.8/site-packages/seaborn/rcmod.py:400: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
if LooseVersion(mpl.version) >= "3.0":

pingouin/plotting.py::pingouin.plotting.plot_rm_corr
pingouin/plotting.py::pingouin.plotting.plot_rm_corr
/Users/swallan/Documents/SciPy/scipy_meson/pingouin/venv/lib/python3.8/site-packages/seaborn/axisgrid.py:130: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
if LooseVersion(mpl.version) < LooseVersion("3.0"):

pingouin/tests/test_correlation.py::TestCorrelation::test_corr
/Users/swallan/Documents/SciPy/scipy_meson/pingouin/venv/lib/python3.8/site-packages/scipy/stats/stats.py:4023: PearsonRConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
warnings.warn(PearsonRConstantInputWarning())

pingouin/tests/test_equivalence.py::TestEquivalence::test_tost
/Users/swallan/Documents/SciPy/scipy_meson/pingouin/pingouin/bayesian.py:146: RuntimeWarning: invalid value encountered in double_scalars
bf10 = 1 / ((1 + t2 / df)(-(df + 1) / 2) / integr)

pingouin/tests/test_equivalence.py::TestEquivalence::test_tost
/Users/swallan/Documents/SciPy/scipy_meson/pingouin/pingouin/bayesian.py:146: RuntimeWarning: divide by zero encountered in double_scalars
bf10 = 1 / ((1 + t2 / df)(-(df + 1) / 2) / integr)

pingouin/tests/test_equivalence.py::TestEquivalence::test_tost
/Users/swallan/Documents/SciPy/scipy_meson/pingouin/pingouin/parametric.py:263: RuntimeWarning: invalid value encountered in multiply
ci = np.array([tval - tcrit, tval + tcrit]) * se

-- Docs: https://docs.pytest.org/en/stable/warnings.html
============================================================= slowest 10 durations ==============================================================
10.04s call pingouin/tests/test_effsize.py::TestEffsize::test_compute_boot_esci
6.90s call pingouin/tests/test_regression.py::TestRegression::test_mediation_analysis
2.76s call pingouin/tests/test_plotting.py::TestPlotting::test_plot_paired
2.06s call pingouin/regression.py::pingouin.regression.mediation_analysis
1.67s call pingouin/tests/test_pairwise.py::TestPairwise::test_pairwise_ttests
1.64s call pingouin/tests/test_pairwise.py::TestPairwise::test_pairwise_corr
1.58s call pingouin/effsize.py::pingouin.effsize.compute_bootci
0.96s call pingouin/tests/test_plotting.py::TestPlotting::test_plot_rm_corr
0.69s call pingouin/plotting.py::pingouin.plotting.plot_rm_corr
0.57s call pingouin/tests/test_pandas.py::TestParametric::test_pandas
======================================================= 165 passed, 23 warnings in 39.72s =======================================================

I also updated the changelog.

pingouin/pairwise.py Outdated Show resolved Hide resolved
@raphaelvallat
Copy link
Owner

Merging now, thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request 🚧 New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants