[BUG] Pandas and SciPy dependencies requires building #89

Open
sqr00t opened this issue Nov 7, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@sqr00t

sqr00t commented Nov 7, 2024

Describe the bug

The version restrictions on Pandas and SciPy are outdated and cause dependency conflicts. With the SciPy <= 1.9.2 restriction, my project resolved to SciPy==1.6.3, which does not have a cp310 wheel. This triggers a from-source build of SciPy for Python 3.10.14, and a few of the build dependencies cannot be installed due to admin restrictions.

Also note that Pandas <2 is heading for deprecation, and PyPI-hosted wheels for ARM builds are already mostly unavailable (again triggering a source build).

The support window for both SciPy 1.9.x and Pandas <2 has already passed.
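
For anyone trying to reproduce the missing-wheel part, here is a minimal sketch of how one might check whether a release ships a wheel for a given interpreter tag before installing. It only parses wheel filenames using the standard `name-version[-build]-python-abi-platform.whl` layout. The helper names (`wheel_python_tags`, `has_wheel_for`) and the file list are made up for illustration and are not copied from PyPI. It ignores the platform tag and `abi3` wheels, so treat it as a rough first check rather than what the installer actually does.

```python
def wheel_python_tags(filename: str) -> set[str]:
    """Return the Python tags encoded in a wheel filename."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    # Layout: name-version[-build]-python-abi-platform,
    # so the Python tag is always third from the end.
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename}")
    # Compressed tag sets such as "py2.py3" are separated by dots.
    return set(parts[-3].split("."))


def has_wheel_for(filenames: list[str], python_tag: str) -> bool:
    """True if any wheel in the list carries exactly this Python tag."""
    return any(
        name.endswith(".whl") and python_tag in wheel_python_tags(name)
        for name in filenames
    )


# Hypothetical file list for illustration only.
files = [
    "scipy-1.6.3-cp37-cp37m-macosx_10_9_x86_64.whl",
    "scipy-1.6.3-cp39-cp39-macosx_10_9_x86_64.whl",
    "scipy-1.6.3.tar.gz",
]

print(has_wheel_for(files, "cp310"))  # no cp310 wheel, so a source build is needed
print(has_wheel_for(files, "cp39"))
```

If the first check comes back false, the only remaining artifact is the source tarball, which matches the rebuild behaviour described above.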

Additional context

Machine Info:

  • M1 Pro 2021, macOS Sonoma 14.7
  • Python: 3.10.14

Edit:

  • It looks like the maximum SciPy version that glmnet_python can still support is SciPy==1.11.4, because it relies on the scipy.empty attribute. This is an open issue for SciPy>=1.12.0.
  • The changes implied by this issue result in 17 tests failing, all related to Pandas. It would be best to support Pandas 2.x.

Failing tests

Failed tests for pandas==1.5.3, pandas<2.0.0, 3 tests

FAILED tests/test_sample.py::TestSample_metrics_methods::test_Sample_keep_only_some_rows_columns - AttributeError: module 'pandas.core.computation.ops' has no attribute 'UndefinedVariableError'
FAILED tests/test_sample.py::TestSample_NA_behavior::test_can_handle_various_NAs - AssertionError: Exception not raised
FAILED tests/test_util.py::TestUtil::test_fct_lump_by - AssertionError: Series are different
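
The `UndefinedVariableError` failure looks like a class that moved between modules rather than a behaviour change. One possible way to stay compatible across versions is a small fallback lookup, sketched below. `first_available` is a hypothetical helper, not part of balance or pandas, and the candidate locations in the commented usage are my assumption (I believe the class is exposed under `pandas.errors` in recent releases), so please verify them against the versions you target. The runnable demo uses a stdlib name that moved in the same way.

```python
import importlib
from typing import Any, Iterable, Tuple


def first_available(candidates: Iterable[Tuple[str, str]]) -> Any:
    """Return the first (module, attribute) pair that can be imported."""
    tried = []
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr)
        except (ImportError, AttributeError):
            # Remember what failed so the final error is informative.
            tried.append(f"{module_name}.{attr}")
    raise ImportError("none of these could be imported: " + ", ".join(tried))


# Stdlib demo: Mapping moved from `collections` to `collections.abc`.
Mapping = first_available([
    ("collections", "Mapping"),
    ("collections.abc", "Mapping"),
])
print(Mapping)

# Assumed usage for the failure above (locations not verified here):
# UndefinedVariableError = first_available([
#     ("pandas.errors", "UndefinedVariableError"),
#     ("pandas.core.computation.ops", "UndefinedVariableError"),
# ])
```

Listing the newer location first means the private module path is only touched on older installs.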

Failed tests for pandas==2.0.3, pandas<2.1.0, >2.0.0, 16 tests

FAILED tests/test_balancedf.py::TestBalanceOutcomesDF::test_BalanceOutcomesDF_df - AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="o") are different
FAILED tests/test_balancedf.py::TestBalanceCovarsDF::test_BalanceCovarsDF_df - AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="a") are different
FAILED tests/test_balancedf.py::TestBalanceDF__str__::test_BalanceOutcomesDF___str__ - AssertionError: False is not true
FAILED tests/test_cbps.py::Testcbps::test_cbps_in_balance_vs_r - TypeError: DataFrame.drop() takes from 1 to 2 positional arguments but 3 were given
FAILED tests/test_cli.py::TestCli::test_cli_return_df_with_original_dtypes - AssertionError: {'id'[48 chars]ype('int64'), 'is_respondent': dtype('int64'),[23 chars]64')} != {'id'[48 char...
FAILED tests/test_cli.py::TestCli::test_cli_standardize_types - AssertionError: {'id'[48 chars]ype('int64'), 'is_respondent': dtype('int64'),[23 chars]64')} != {'id'[48 char...
FAILED tests/test_datasets.py::TestDatasets::test_load_data_cbps - TypeError: DataFrame.drop() takes from 1 to 2 positional arguments but 3 were given
FAILED tests/test_sample.py::TestSample::test_Sample_from_frame - AssertionError: Attributes of Series are different
FAILED tests/test_sample.py::TestSample_base_and_adjust_methods::test_Sample_covars - AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="a") are different
FAILED tests/test_sample.py::TestSample_base_and_adjust_methods::test_Sample_df - AssertionError: Attributes of DataFrame.iloc[:, 1] (column name="a") are different
FAILED tests/test_sample.py::TestSample_base_and_adjust_methods::test_Sample_outcomes - AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="o") are different
FAILED tests/test_sample.py::TestSample_base_and_adjust_methods::test_Sample_set_weights - AssertionError: Attributes of Series are different
FAILED tests/test_sample.py::TestSample_metrics_methods::test_Sample_keep_only_some_rows_columns - AttributeError: module 'pandas.core.computation.ops' has no attribute 'UndefinedVariableError'
FAILED tests/test_sample.py::TestSamplePrivateAPI::test__covar_columns - AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="a") are different
FAILED tests/test_sample.py::TestSample_NA_behavior::test_can_handle_various_NAs - AssertionError: Exception not raised
FAILED tests/test_util.py::TestUtil::test_fct_lump_by - AssertionError: Series are different

Failed tests for pandas==2.2.3 (latest as of November 2024), 17 tests

FAILED tests/test_balancedf.py::TestBalanceOutcomesDF::test_BalanceOutcomesDF_df - AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="o") are different
FAILED tests/test_balancedf.py::TestBalanceCovarsDF::test_BalanceCovarsDF_df - AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="a") are different
FAILED tests/test_balancedf.py::TestBalanceDF__str__::test_BalanceOutcomesDF___str__ - AssertionError: False is not true
FAILED tests/test_cbps.py::Testcbps::test_cbps_in_balance_vs_r - TypeError: DataFrame.drop() takes from 1 to 2 positional arguments but 3 were given
FAILED tests/test_cli.py::TestCli::test_cli_return_df_with_original_dtypes - AssertionError: {'id'[48 chars]ype('int64'), 'is_respondent': dtype('int64'),[23 chars]64')} != {'id'[48 chars]ype('float64'), 'is_...
FAILED tests/test_cli.py::TestCli::test_cli_standardize_types - AssertionError: {'id'[48 chars]ype('int64'), 'is_respondent': dtype('int64'),[23 chars]64')} != {'id'[48 chars]ype('float64'), 'is_...
FAILED tests/test_datasets.py::TestDatasets::test_load_data_cbps - TypeError: DataFrame.drop() takes from 1 to 2 positional arguments but 3 were given
FAILED tests/test_sample.py::TestSample::test_Sample_from_frame - AssertionError: Attributes of Series are different
FAILED tests/test_sample.py::TestSample_base_and_adjust_methods::test_Sample_covars - AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="a") are different
FAILED tests/test_sample.py::TestSample_base_and_adjust_methods::test_Sample_df - AssertionError: Attributes of DataFrame.iloc[:, 1] (column name="a") are different
FAILED tests/test_sample.py::TestSample_base_and_adjust_methods::test_Sample_outcomes - AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="o") are different
FAILED tests/test_sample.py::TestSample_base_and_adjust_methods::test_Sample_set_weights - AssertionError: Attributes of Series are different
FAILED tests/test_sample.py::TestSample_metrics_methods::test_Sample_keep_only_some_rows_columns - AttributeError: module 'pandas.core.computation.ops' has no attribute 'UndefinedVariableError'
FAILED tests/test_sample.py::TestSamplePrivateAPI::test__covar_columns - AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="a") are different
FAILED tests/test_sample.py::TestSample_NA_behavior::test_can_handle_various_NAs - AssertionError: Exception not raised
FAILED tests/test_util.py::TestUtil::test_fct_lump_by - AssertionError: Series are different
FAILED tests/test_util.py::TestUtil::test_rm_mutual_nas - AttributeError: module 'pandas.core.arrays.numpy_' has no attribute 'PandasArray'
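
To make the list above easier to triage, here is a small sketch that groups pytest `FAILED` summary lines by exception type. The regex and the `count_by_error` name are mine, and the sample is just three lines taken from the pandas==2.2.3 run. Counting the full 17-line list by hand gives 13 AssertionError, 2 TypeError and 2 AttributeError.

```python
import re
from collections import Counter

# Matches "FAILED <test id> - <ExceptionName>: ..." from a pytest short summary.
FAILED_LINE = re.compile(r"^FAILED\s+(?P<test>\S+)\s+-\s+(?P<error>\w+)")


def count_by_error(summary: str) -> Counter:
    """Count failing tests per exception type in a pytest summary."""
    counts = Counter()
    for line in summary.splitlines():
        match = FAILED_LINE.match(line.strip())
        if match:
            counts[match.group("error")] += 1
    return counts


sample = """\
FAILED tests/test_cbps.py::Testcbps::test_cbps_in_balance_vs_r - TypeError: DataFrame.drop() takes from 1 to 2 positional arguments but 3 were given
FAILED tests/test_util.py::TestUtil::test_fct_lump_by - AssertionError: Series are different
FAILED tests/test_util.py::TestUtil::test_rm_mutual_nas - AttributeError: module 'pandas.core.arrays.numpy_' has no attribute 'PandasArray'
"""

for error, n in count_by_error(sample).most_common():
    print(f"{error}: {n}")
```

The TypeError and AttributeError groups look like API changes, while the AssertionError group probably needs the expected values in the tests reviewed one by one.
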
sqr00t added the bug label Nov 7, 2024
@talgalili
Contributor

Hi @sqr00t
Thank you very much for reporting this.
We are currently blocked from relaxing the dependency version constraints as long as we depend on python_glmnet.
We plan to move to sklearn for the backend by January 2025 (see: #30).
Once that's done, I believe the issues you mentioned would be resolved.
It would also allow users to install balance on Windows (#26), and allow the license to move from GPL to MIT (#16).
So many good things are expected to land in 2025.

:)

@sqr00t
Author

sqr00t commented Nov 7, 2024

Thanks for the quick response @talgalili . This sounds great!

Is there a dev roadmap that you'd be free to link me to? Keen to contribute/ test.

@talgalili
Contributor

Thanks for the offer @sqr00t !
AFAIK, the only change that is planned is what I wrote:
"We plan to move to sklearn for the backend by January 2025"

Other than that, no major changes are planned. If you have a PR you'd like to propose, please do so and I'll gladly take a look.

Also, if you'd be willing to share how/what you use balance for, I'd be happy to know about it.

Cheers,
T
