Add pandas and numpy >= 2 compatibility #287
mkopec87 commented Dec 6, 2024 • edited
- align requirements.txt with pyproject.toml
- remove calls to np.string_ not existing in numpy >= 2.0.0
- remove calls to pd._testing.makeMixedDataFrame not existing in new pandas versions
- fix install and test commands in documentation for developers
- replace np.mean with column-wise version
- drop pandas dependency constraint <2
- require Python 3.9 in pyproject.toml
- add PySpark 3.5.3 to test pipeline matrix
- update test pipeline matrix: exclude Python 3.8, include Python 3.12
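For readers unfamiliar with the two removed APIs in the list above, here is a minimal sketch of the kind of replacement involved. It is not the code from this PR: `make_mixed_dataframe` is a hypothetical local helper, and its column contents are only an assumption about what a small mixed-dtype test frame might look like. `np.bytes_` is the name that `np.string_` was an alias for; it is available in both NumPy 1.x and 2.x.

```python
import numpy as np
import pandas as pd

# np.string_ was removed in NumPy 2.0; np.bytes_ works on both 1.x and 2.x.
bytes_value = np.bytes_(b"abc")


def make_mixed_dataframe() -> pd.DataFrame:
    """Hypothetical stand-in for the private pd._testing.makeMixedDataFrame.

    Builds a small frame with float, string and datetime columns so tests
    no longer depend on a private pandas helper.
    """
    return pd.DataFrame(
        {
            "A": [0.0, 1.0, 2.0, 3.0, 4.0],
            "B": [0.0, 1.0, 0.0, 1.0, 0.0],
            "C": ["foo1", "foo2", "foo3", "foo4", "foo5"],
            "D": pd.bdate_range("2009-01-01", periods=5),
        }
    )


df = make_mixed_dataframe()
```

Defining the frame in the test suite itself keeps the tests independent of pandas internals, which can be removed between releases without a deprecation cycle.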
c32e4ad to b5f7f11
Still 5 failing tests :(
BTW, do we need 'requirements.txt' for anything?
6e0e1e6 to cfc85d1
@@ -233,7 +238,7 @@ def __init__(
         :param kwargs: (dict, optional): residual kwargs passed on to mean and std functions
         """
         super().__init__(
-            np.mean,
+            ReferencePullCalculator.mean,
functools.partial could be used here instead of staticmethod
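To illustrate the suggestion, a rough sketch of the two options side by side. All class names here are hypothetical stand-ins rather than popmon's actual classes, and the example assumes the goal is simply a mean taken along axis 0 (column-wise).

```python
from functools import partial

import numpy as np


class BaseCalculator:
    """Hypothetical parent class: it only stores the callables it is given."""

    def __init__(self, mean_func, std_func):
        self.mean_func = mean_func
        self.std_func = std_func


class WithStaticMethod(BaseCalculator):
    """Option 1: wrap the column-wise mean in a staticmethod."""

    @staticmethod
    def mean(values):
        return np.mean(values, axis=0)

    def __init__(self):
        super().__init__(WithStaticMethod.mean, partial(np.std, axis=0))


class WithPartial(BaseCalculator):
    """Option 2: bind axis=0 with functools.partial, no extra method needed."""

    def __init__(self):
        super().__init__(partial(np.mean, axis=0), partial(np.std, axis=0))


data = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
static_means = WithStaticMethod().mean_func(data)
partial_means = WithPartial().mean_func(data)
```

Both variants give the same per-column means; `partial` just avoids adding a method whose only job is to fix one keyword argument.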
Done
4f1b459 to 0c424a1
Managed to fix 3 tests, two more to go:
Something seems wrong with the 'external' reference type.
Hi @mkopec87, thanks for following up. Happy to have a look at these tests then - I'll give it a go.
Hi @mkopec87,
@mkopec87 With the fixes below, the two tests work for me. In popmon/analysis/profiling/pull_calculator.py, in class ReferencePullCalculator, lines 236-237, set instead:
That should fix the crash, and return the same values as before.
0c424a1 to 7762ae8
Thanks @mbaak, I added the changes you suggested :)
3df42f5 to 7fc8ffb
- align requirements.txt with pyproject.toml
- remove calls to np.string_ not existing in numpy >= 2.0.0
- remove calls to pd._testing.makeMixedDataFrame not existing in new pandas versions
- fix install and test commands in documentation for developers
- replace np.mean with column-wise version
- drop pandas dependency constraint <2
- require histogrammar>=1.0.34
- require Python 3.9 in pyproject.toml
- add PySpark 3.5.3 to test pipeline matrix
- update test pipeline matrix: exclude Python 3.8, include Python 3.12
- add test notebook output to .gitignore
- switch to importlib from pkg_resources
- install project dependencies after pyspark in spark build tests
- run mean and std calculations only on numeric columns
- add dependency versions constraints to Spark tests
- update version of actions/upload-artifact task in build pipeline
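Two of the items above ("run mean and std calculations only on numeric columns" and "switch to importlib from pkg_resources") can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: `numeric_mean_std` and `installed_version` are hypothetical helper names, and the PR may use other parts of `importlib` than `importlib.metadata`.

```python
from importlib import metadata
from typing import Optional, Tuple

import pandas as pd


def numeric_mean_std(df: pd.DataFrame) -> Tuple[pd.Series, pd.Series]:
    """Hypothetical helper: per-column mean and std over numeric columns only."""
    numeric = df.select_dtypes(include="number")
    # pandas' std uses ddof=1 (sample standard deviation) by default.
    return numeric.mean(axis=0), numeric.std(axis=0)


def installed_version(distribution: str) -> Optional[str]:
    """Hypothetical helper: look up a version without importing pkg_resources."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


frame = pd.DataFrame({"x": [1.0, 2.0, 3.0], "label": ["a", "b", "c"]})
means, stds = numeric_mean_std(frame)
```

Selecting numeric columns first means string or datetime columns are skipped explicitly instead of relying on how a given pandas version treats non-numeric data in reductions.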
7fc8ffb to 150710a
I've set up a build in my forked repo, seems it's passing now :)
Let's give @sbrugman a day or two still for a quick, final review.
@sbrugman bump ;)
Hi guys, any chance you'd find some time for a final review? :) I'm eager to start using this new feature as soon as possible :)
@mkopec87 merged. Thanks!