-
Notifications
You must be signed in to change notification settings - Fork 668
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MAINT: shims for array copy semantics #4482
Conversation
* Fixes MDAnalysis#4481 * this gets rid of segfaults/many test failurse against latest NumPy `main` while still passing the full suite against NumPy `1.26.4`; it is pretty hard to test since needs special branch of SciPy (for the same reason, upstream hasn't fully resolved copy semantics) * I still see 2-3 test failures against `main` locally, but likely still a large improvement; may want to let this sit a little bit while the upstream storm settles, but I suspect this is the gist of the shims we'd want
Linter Bot Results:Hi @tylerjereddy! Thanks for making this PR. We linted your code and found the following: There are currently no issues detected! 🎉 |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #4482 +/- ##
===========================================
- Coverage 93.38% 89.52% -3.86%
===========================================
Files 171 180 +9
Lines 21744 22333 +589
Branches 4014 3619 -395
===========================================
- Hits 20305 19993 -312
- Misses 952 1891 +939
+ Partials 487 449 -38 ☔ View full report in Codecov by Sentry. |
if np.__version__[0] == "2": | ||
copy = None | ||
else: | ||
copy = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A more concise/idiomatic upstream approach was applied in the revised version of my patch in case we care about that: https://github.com/scipy/scipy/pull/20172/files#diff-d28fbb0be287769c763ce61b0695feb0a6ca0e67b28bafee20526b46be7aaa84R59
Basically, abstracting the check to avoid duplication and using np.lib.NumpyVersion
instead of pep440 or a raw string check like here. We probably don't need to worry about 2.0.0dev0
versions though, so we may care less about that, though the version check duplication I've added more broadly is indeed a bit much.
@@ -71,7 +71,7 @@ cdef class ArrayWrapper: | |||
self.data_type = data_type | |||
self.ndim = ndim | |||
|
|||
def __array__(self): | |||
def __array__(self, copy=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here and elswhere we should probably also add dtype=None
before copy=None
, similar to the revisions added in my PR upstream at: scipy/scipy#20172
It doesn't seem to matter for testing purposes at the moment, but adding the dtype=None
is more formally correct now.
@hmacdope can you please have a look? Keeping on top of scipy/numpy changes is important. Thanks! |
I should update this soon to reduce code duplication and test it against the beta release if it comes out today or tomorrow. |
I think Sebastian also told me to use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Query about perhaps using a function to avoid some repetition.
@@ -970,8 +978,12 @@ def superimposition_matrix(v0, v1, scaling=False, usesvd=True): | |||
True | |||
|
|||
""" | |||
v0 = np.array(v0, dtype=np.float64, copy=False)[:3] | |||
v1 = np.array(v1, dtype=np.float64, copy=False)[:3] | |||
if np.__version__[0] == "2": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to wrap this in a helper routine to avoid repetition eg:
v0 = np.array(v0, dtype=np.float64, copy=no_copy_shim())[:3]
def no_copy_shim():
if np.__version__[0] == "2":
copy=None
else:
copy=False
...
Ill leave it up to you @tylerjereddy?
* abstract the array copy semantics (NumPy 1.x vs. 2.x) handling to a utility function to reduce code duplication * add a missing `dtype=None` to an `__array__` signature, for more formal correctness with NumPy 2.x
Hello @tylerjereddy! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2024-04-07 20:50:51 UTC |
I pushed in an update to reduce code duplication and use a more formally correct I checked the status of the updated PR vs. latest
From a quick scan of the diff again, the only other place where I think I'm misbehaving a bit is in using a raw string version check instead of using |
@tylerjereddy is the plan to move ahead here before addressing ^ test failures? |
Possibly yeah, I didn't have immediate plans to tackle those, and may be quite different if they're real failures we need to worry about vs. some third-party issues (I didn't check). |
Note that NumPy |
Ecosystem support summary table is here: numpy/numpy#26191 I added us at the bottom below OpenCV. I need to circle back here soon to check against RC1, ping me if I don't. |
Thanks @tylerjereddy, much appreciated. |
* Update `vector_of_best_fit()` function in the helix analysis code to avoid use of `np.linalg.linalg` namespace, which is deprecated/private in NumPy `2.x`. Instead, just access `svd` via the public `np.linalg` namespace. * This fixes the last two remaining test failures for MDAnalysis against NumPy `2.0.0rc1`. * Switch the array `copy` shim to use the more appropriate `NumpyVersion` rather than a raw string check for version `2`.
Update here:
I believe I've addressed both my own comments and the reviewer comments so far, and we appear to be in good shape for NumPy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one blocking question from me, otherwise this looks good. Thanks @tylerjereddy 🙏
@@ -121,7 +121,7 @@ def vector_of_best_fit(coordinates): | |||
""" | |||
centered = coordinates - coordinates.mean(axis=0) | |||
Mt_M = np.matmul(centered.T, centered) | |||
u, s, vh = np.linalg.linalg.svd(Mt_M) | |||
u, s, vh = np.linalg.svd(Mt_M) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just checking - how backwards compatible is this? Do we need to try except an import or will this work with older versions of numpy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure we're safe since I tested the runtime downgrade to our oldest supported version (1.23.2
) without issue.
If you don't believe me, you could use the version switcher in the NumPy docs and it is there in version 1.13
: https://numpy.org/doc/1.13/reference/generated/numpy.linalg.svd.html
That's pretty ancient now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So sorry @tylerjereddy - I completely missed this from your previous comment message (it's the second time in a row over two PRs really sorry about this).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, I cycled CI - looks like all the failures are codecov related.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM @tylerjereddy, thanks for taking this on. I will cycle CI to see if we can get a few more greens.
I think codecov was removed from a number of upstream projects because of flakiness/security concerns that kept cropping up. MDA might want to consider doing the same (as much as I hate to see removal of a method of measuring test coverage). |
Yeah it's been a bit annoying lately - issue has been opened and temporary fix PRed. I'm going to go ahead and merge this, CI passed fine outside of report uploading. Thanks @tylerjereddy ! |
NumPy 2.0 is out now, and it looks like there is no new release of MDAnalysis with this PR included yet. Will the next release be 2.7.1 or 2.8.0? |
@rgommers the plan is to do a v2.8.0 release of MDAnalysis that is numpy 2.0 compatible somewhere towards the end of July. Due to limitations in available developers, it's the earliest we can do this. |
Fixes MAINT: test failures/segfaults against NumPy
main
#4481this gets rid of segfaults/many test failurse against latest NumPy
main
while still passing the full suite against NumPy1.26.4
; it is pretty hard to test since needs special branch of SciPy (for the same reason, upstream hasn't fully resolved copy semantics)I still see 2-3 test failures against
main
locally, but likely still a large improvement; may want to let this sit a little bit while the upstream storm settles, but I suspect this is the gist of the shims we'd wantOne thing I didn't try was building against NumPy
main
and testing against1.26.4
, just did each completely separately (that's a good thing to try though). Upstream pre-release wheels will probably be delayed reflecting the related issues for a bit.Getting late, so short explanation--
copy=False
got stricter,copy=None
is the "try your best not to copy, but you can if you need to" old NumPy way.📚 Documentation preview 📚: https://mdanalysis--4482.org.readthedocs.build/en/4482/