[MNT] Small NumPy 2 related fixes #5954

seberg · 2024-07-03T15:30:44Z

This applies some smaller NumPy 2 related fixes. With (in progress) cupy 13.2 fixups, the single gpu test suite seems to be doing mostly fine. There is a single test remaining:

test_simpl_set.py::test_simplicial_set_embedding

is failing with:

(Pdb) cp.asarray(cu_embedding)
array([[23067.518, 23067.518],
       [17334.559, 17334.559],
       [22713.598, 22713.598],
       ...,
       [23238.438, 23238.438],
       [25416.912, 25416.912],
       [19748.943, 19748.943]], dtype=float32)

being completely different from the reference:

array([[5.330462 , 4.3419437],
       [4.1822557, 5.6225405],
       [5.200859 , 4.530094 ],
       ...,
       [4.852359 , 5.0026293],
       [5.361374 , 4.1475334],
       [4.0259256, 5.7187223]], dtype=float32)

I am not sure why that might be. I will prod it a bit more, but it may need someone who knows the methods to have a look.

One wrinkle is that hdbscan is not yet released for NumPy 2, but I guess it is still required even though sklearn has a version?
(Probably not a big issue, but my fixups in scikit-learn-contrib/hdbscan#644 run into some issue even though it doesn't seem NumPy 2 related.)

xref: rapidsai/build-planning#38

KyleFromNVIDIA

Approved pre-commit-config changes

dantegd · 2024-07-03T17:08:23Z

Thanks for the PR @seberg, I think someone like @divyegala can have a look at the remaining failure, but it'll probably need to wait until Monday.

seberg · 2024-07-11T15:56:04Z

Thanks to @divyegala for realizing the remaining error is related to umap. umap uses np.unique with return_inverse which changed (although I will probably revert that for 2.0.1).

Fixing that fixes the remaining failure (single gpu tests run locally). If umap doesn't do a release we might need to wait for NumPy 2.0.1 for 100% test passing (also hdbscan is fixed but not released yet).
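To illustrate the np.unique change mentioned above: in NumPy 2, the inverse returned for axis=None has the shape of the input, while NumPy 1.x returns it flattened. The following is only a minimal sketch of a version-independent workaround, not the patch applied in this PR or in umap; the helper name `unique_with_flat_inverse` is hypothetical.

```python
import numpy as np


def unique_with_flat_inverse(values):
    """Return (uniques, inverse) with a 1-D inverse on any NumPy version."""
    uniques, inverse = np.unique(values, return_inverse=True)
    # NumPy 2 shapes the inverse like the input when axis=None, while
    # NumPy 1.x returns it flattened. reshape(-1) normalizes both cases.
    return uniques, inverse.reshape(-1)


x = np.array([[3, 1], [1, 2]])
uniques, inverse = unique_with_flat_inverse(x)

# The flat inverse still reconstructs the flattened input.
assert np.array_equal(uniques[inverse], x.ravel())
```

Code that indexes with the inverse and expects a 1-D result keeps working with this normalization regardless of which NumPy release is installed.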

divyegala · 2024-07-11T20:19:22Z

@seberg as far as I can tell from a quick search through the codebase, the reference UMAP package is only used for Python tests. Can we get away with patching umap and building it from source? This will let us push out numpy 2.0.0 compatible cuML with the rest of RAPIDS, although I do not know how quickly we want this and neither do I know when numpy 2.0.1 is releasing.

cc @dantegd @cjnolet if that's acceptable

seberg · 2024-07-11T20:23:40Z

The main thing right now is that CuPy still needs a release, I think. So I suspect it might be easier to sit out NumPy 2.0.1. (But yeah, I think monkey patching will become plausible once CuPy is there.)

python/cuml/tests/test_make_classification.py

jakirkham · 2024-07-16T21:11:22Z

python/cuml/internals/array.py

@@ -1172,12 +1172,16 @@ def from_input(
        if (
            not fail_on_order and order != arr.order and order != "K"
        ) or make_copy:


Since we are checking this within the if, we could drop this check

Suggested change

-        ) or make_copy:
+        ):

I suspect you are right and this can be simplified. But deleting it would not make a copy at all.
We could maybe delete the whole if (always creating a new arr) or assume a copy is always made (I am not certain that is currently true for e.g. 1-D arrays).
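On the 1-D question, plain NumPy gives a hint: a contiguous 1-D array is both C- and F-contiguous, so requesting a different order through asarray does not copy it. This is a sketch using NumPy only; whether cuML's `arr.mem_type.xpy` path and `arr.order` behave the same way for 1-D input is an assumption not confirmed in this thread.

```python
import numpy as np

one_d = np.arange(6)                # contiguous in both C and F order
two_d = np.arange(6).reshape(2, 3)  # C-contiguous only

same_1d = np.asarray(one_d, order="F")
copied_2d = np.asarray(two_d, order="F")

print(np.shares_memory(one_d, same_1d))    # True: no copy was needed
print(np.shares_memory(two_d, copied_2d))  # False: the layout change forced a copy
```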

We do not always make a copy, and we do not always want to. If you see above in the function, the make_copy variable is inferred from other conditions like

make_copy = force_contiguous and not arr.is_contiguous

which would make the complex conditional worse if we wanted to pack everything in. So we do not want to delete this.

Ok maybe there's a better way to write the code below then: #5954 (comment)

Edit: Included a suggestion for clarity: #5954 (review)

jakirkham · 2024-07-17T18:52:44Z

python/cuml/internals/array.py

+            if make_copy:
+                data = arr.mem_type.xpy.array(
+                    arr.to_output("array"), order=order
+                )
+            else:
+                data = arr.mem_type.xpy.asarray(
+                    arr.to_output("array"), order=order
+                )


It seems odd to check this above and here. Is there a better way to write this so we only check make_copy once? Perhaps with an elif?

Probably. I was working on adding polars support (which needs to happen here), so I'm reviewing all of the conditionals... would you mind opening an issue to clean this up so it doesn't block/slow down this PR?

For clarity tried to include some suggestions below. Though have no strong feelings on whether they are included

ref: #5954 (review)
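The reason for the two branches can be sketched with NumPy alone: array copies by default, whereas asarray copies only when the requested order requires it. This is an illustration, not the cuML implementation; `convert` is a hypothetical helper and NumPy stands in for `arr.mem_type.xpy`.

```python
import numpy as np


def convert(data, order, make_copy):
    if make_copy:
        # np.array copies by default, so the result never aliases the input.
        return np.array(data, order=order)
    # np.asarray copies only if that is needed to satisfy `order`.
    return np.asarray(data, order=order)


src = np.zeros((2, 3))  # C-contiguous

no_copy = convert(src, order="C", make_copy=False)
forced = convert(src, order="C", make_copy=True)

assert np.shares_memory(src, no_copy)     # same buffer is reused
assert not np.shares_memory(src, forced)  # an independent copy
```

Spelling the choice as array versus asarray avoids passing a copy keyword at all, which matters because a commit message in this PR notes that copy=None cannot be used on some older NumPy versions.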

This applies some smaller NumPy 2 related fixes. With (in progress) cupy 13.2 fixups, the single gpu test suite seems to be doing fine (not quite finished, I may push more commits, but can also open a new PR). The one thing I noticed that is a bit annoying is that hdbscan is not yet released for NumPy 2. Is that actually still required, since I think sklearn has a version? (I don't expect this to be a problem for long, but there is at least one odd test failure trying to make hdbscan work in scikit-learn-contrib/hdbscan#644)

Even if NumPy reverts, this is not a problem.

I am not actually sure what changed here, but deepcopy seems sensible?
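On the deepcopy remark: a minimal sketch of why deepcopy is a reasonable way to snapshot a random state. It assumes the object is a np.random.RandomState, which this thread does not confirm; the variable names are illustrative only.

```python
import copy

import numpy as np

rs = np.random.RandomState(42)
snapshot = copy.deepcopy(rs)  # independent generator with identical state

first = rs.randint(0, 100, size=5)
replay = snapshot.randint(0, 100, size=5)

# Both generators started from the same state, so the draws match.
assert np.array_equal(first, replay)
assert snapshot is not rs
```

Because the copy is independent, drawing from one generator does not advance the other, which is what a test needs when it wants to feed the same random stream to two implementations.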

seberg · 2024-07-18T07:15:42Z

Rebased, since there were conflicts. I think the remaining failures were unrelated, but maybe/hopefully the rebase resolves them anyway.

.pre-commit-config.yaml

jakirkham

Including a proposal of how the array copying logic could be updated. Meant mostly to be illustrative. Defer to others on whether it is used

jakirkham · 2024-07-18T21:12:54Z

python/cuml/internals/array.py

+            if make_copy:
+                data = arr.mem_type.xpy.array(
+                    arr.to_output("array"), order=order
+                )
+            else:
+                data = arr.mem_type.xpy.asarray(
+                    arr.to_output("array"), order=order
+                )


For clarity tried to include some suggestions below. Though have no strong feelings on whether they are included

ref: #5954 (review)

jakirkham · 2024-07-18T21:16:36Z

python/cuml/cuml/internals/array.py

        if (
            not fail_on_order and order != arr.order and order != "K"
        ) or make_copy:


Suggested change

-        if (
-            not fail_on_order and order != arr.order and order != "K"
-        ) or make_copy:
+        if not fail_on_order and order != arr.order and order != "K":

jakirkham · 2024-07-18T21:16:52Z

python/cuml/cuml/internals/array.py

+            if make_copy:
+                data = arr.mem_type.xpy.array(
+                    arr.to_output("array"), order=order
+                )
+            else:
+                data = arr.mem_type.xpy.asarray(
+                    arr.to_output("array"), order=order
+                )
+
+            arr = cls(data, index=index)


Suggested change

-            if make_copy:
-                data = arr.mem_type.xpy.array(
-                    arr.to_output("array"), order=order
-                )
-            else:
-                data = arr.mem_type.xpy.asarray(
-                    arr.to_output("array"), order=order
-                )
-
-            arr = cls(data, index=index)
+            arr = cls(
+                arr.mem_type.xpy.asarray(arr.to_output("array"), order=order),
+                index=index,
+            )
+        elif make_copy:
+            arr = cls(
+                arr.mem_type.xpy.array(arr.to_output("array"), order=order), index=index
+            )

dantegd · 2024-07-28T16:34:24Z

Merging due to closeness to code-freeze. @jakirkham, I am capturing a task to improve and simplify the data processing code in #5995

dantegd · 2024-07-28T16:34:29Z

/merge

seberg requested a review from a team as a code owner July 3, 2024 15:30

github-actions bot added the Cython / Python label Jul 3, 2024

seberg requested a review from a team as a code owner July 3, 2024 16:50

seberg requested a review from KyleFromNVIDIA July 3, 2024 16:50

KyleFromNVIDIA approved these changes Jul 3, 2024

tfeher added the bug and non-breaking labels Jul 4, 2024

seberg commented Jul 11, 2024

python/cuml/tests/test_make_classification.py (outdated)

jakirkham reviewed Jul 16, 2024

jakirkham reviewed Jul 17, 2024

seberg and others added 8 commits July 18, 2024 00:11

TST: asfarray is removed, it is the same as asarray here

3565fef

TST: Avoid behavior change in return_inverse of unique

449466c

Even if NumPy reverts, this is not a problem.

TST: Use deepcopy for copying the random state

bc7f7c1

I am not actually sure what changed here, but deepcopy seems sensible?

STY: Fixup copyright/pre-commit

3064e56

Ignore python/_thirdparty for style pre-commit check

e81b283

Simplify return_inverse fixup

f266d0b

Can't use copy=None on some older NumPy versions...

30c706c
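For context on the asfarray commit listed above: np.asfarray was removed in NumPy 2.0, and for input that is already floating point a plain asarray call gives the same result. Where the float conversion itself matters, something like the following works on both NumPy 1.x and 2.x. This is a sketch and not the change made in this PR; `as_float_array` is a hypothetical helper.

```python
import numpy as np


def as_float_array(a, dtype=np.float64):
    """Convert `a` to a floating-point array, mirroring the removed np.asfarray."""
    dtype = np.dtype(dtype)
    # Like the old function, fall back to float64 for non-inexact dtypes.
    if not np.issubdtype(dtype, np.inexact):
        dtype = np.dtype(np.float64)
    return np.asarray(a, dtype=dtype)


print(as_float_array([1, 2, 3]).dtype)                  # float64
print(as_float_array([1, 2, 3], dtype=np.int32).dtype)  # float64
```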

seberg force-pushed the numpy2 branch from 72f456f to 30c706c Compare July 18, 2024 07:14

jakirkham reviewed Jul 18, 2024

.pre-commit-config.yaml (outdated)

Fix copyright exclusion on thirdparty directory

f399da2

jakirkham reviewed Jul 18, 2024

Merge branch 'branch-24.08' into numpy2

30cba15

dantegd mentioned this pull request Jul 28, 2024

[TASK] Simplification of input processing code #5995

Open

dantegd approved these changes Jul 28, 2024

rapids-bot bot merged commit 4338268 into rapidsai:branch-24.08 Jul 28, 2024
60 of 62 checks passed

seberg deleted the numpy2 branch July 29, 2024 06:51
