[BUG] Critical: Force cudf.concat when passing in a cudf Series to MG Uniform Neighbor Sample #3416

alexbarghi-nv · 2023-04-04T18:38:18Z

Currently, cudf does not merge series properly when they already share an index. I'm not sure whether this is a bug in cudf or intentional behavior. This issue does not occur with dask_cudf. The resolution is to use cudf.concat when a cudf.Series is passed for the start vertices and batch ids, and df.to_frame().merge when a dask_cudf.Series is passed for the start vertices and batch ids.

This PR also adds a test that covers both cudf and dask_cudf inputs to catch this sort of problem in the future.
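As a rough illustration of the two code paths described above, here is a minimal sketch that uses pandas as a CPU stand-in (an assumption: cudf mirrors much of the pandas API, but the cudf behavior reported in this PR may differ). The helper names `combine_with_concat` and `combine_with_merge` are hypothetical and do not appear in cugraph.

```python
import pandas as pd  # CPU stand-in; the PR itself operates on cudf / dask_cudf


def combine_with_concat(start_list, batch_id_list=None):
    """Hypothetical helper mirroring the single-GPU (cudf.concat) path."""
    if batch_id_list is None:
        # No batch ids: just promote the Series to a one-column frame.
        return start_list.to_frame()
    # Column-wise concat aligns the two Series on their shared index.
    return pd.concat([start_list, batch_id_list], axis=1)


def combine_with_merge(start_list, batch_id_list):
    """Hypothetical helper mirroring the to_frame().merge path."""
    return start_list.to_frame().merge(
        batch_id_list.to_frame(),
        how="left",
        left_index=True,
        right_index=True,
    )


start = pd.Series([10, 11, 12], name="start")
batch = pd.Series([0, 0, 1], name="batch")

by_concat = combine_with_concat(start, batch)
by_merge = combine_with_merge(start, batch)
start_only = combine_with_concat(start)
```

In pandas both paths yield the same two-column frame when the inputs share an index, which is the result the PR wants from either input type.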

rlratzel

Thanks for adding a test. I have one minor change request related to the docstring, otherwise it LGTM.

python/cugraph/cugraph/dask/sampling/uniform_neighbor_sample.py

VibhuJawa · 2023-04-04T19:42:12Z

python/cugraph/cugraph/dask/sampling/uniform_neighbor_sample.py

+            start_list = start_list.to_frame()
+            batch_id_list = batch_id_list.to_frame()
+            ddf = start_list.merge(
+                batch_id_list,
+                how="left",
+                left_index=True,
+                right_index=True,
+            )
+        else:
+            # sg input
+            ddf = cudf.concat(
+                [
+                    start_list,
+                    batch_id_list,
+                ],
+                axis=1,
+            )
    else:
-        ddf = start_list
+        ddf = start_list.to_frame()


Do we really care about the index here? I think not. Does the below work?

    start_list = start_list.reset_index(drop=True)
    batch_id_list = batch_id_list.reset_index(drop=True)
    if isinstance(start_list, dask_cudf.Series):
        ddf = dd.concat([start_list, batch_id_list], ignore_unknown_divisions=True, axis=1)
    else:
        ddf = cudf.concat([start_list, batch_id_list], axis=1, ignore_index=True)

If we reset the index, can we still join batch id and start list correctly?

And also, I ran into an issue with dask_cudf.concat where the name of the series was dropped in one of my first attempts at a solution. dask_cudf.merge doesn't have that problem.

I think we should be able to. In the logic you shared, we are merging on the index (left_index=True, right_index=True) in dask, which is the same thing but less efficient.

Edit: Also added ignore_index=True to make it more concrete in cuDF.
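To make the reset-index idea concrete, here is a small sketch in pandas (again an assumption: pandas stands in for cudf, and the cudf and dask_cudf behavior may not match exactly). It shows that a column-wise concat of two Series with different indexes pads with missing values, while resetting both indexes first pairs the rows by position.

```python
import pandas as pd  # CPU stand-in for cudf; behavior there may differ

# Two equal-length Series whose indexes do not overlap.
start = pd.Series([10, 11, 12], index=[5, 6, 7], name="start")
batch = pd.Series([0, 0, 1], index=[0, 1, 2], name="batch")

# Concat aligns on the index, so non-matching labels produce padded rows.
misaligned = pd.concat([start, batch], axis=1)

# Dropping the old indexes first makes the pairing purely positional.
aligned = pd.concat(
    [start.reset_index(drop=True), batch.reset_index(drop=True)],
    axis=1,
)
```

Note that in pandas, passing ignore_index=True together with axis=1 replaces the column labels with 0..n-1, so the sketch omits it; whether cudf behaves identically is not confirmed here.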

ok, let me try this

@VibhuJawa I just confirmed this is not an issue with dask-cudf, it's an issue with our get_distributed_data function. I will make an issue for cugraph instead.

I'm not sure why calling merge instead of concat before get_distributed_data works, but for some reason the bug completely disappears with merge.

I can take a look too

Thanks for creating an issue.

I should link it here, sorry: #3420

VibhuJawa

LGTM

alexbarghi-nv · 2023-04-05T14:18:41Z

/merge

alexbarghi-nv added 3 commits April 4, 2023 17:21

testing

a4a6aff

switch to merge

558f7aa

x

5353a82

alexbarghi-nv changed the title ~~Force cudf.concat when passing in a cudf Series to MG Uniform Neighbor Sample~~ [BUG] Critical: Force cudf.concat when passing in a cudf Series to MG Uniform Neighbor Sample Apr 4, 2023

alexbarghi-nv self-assigned this Apr 4, 2023

alexbarghi-nv added the bug (Something isn't working) and non-breaking (Non-breaking change) labels Apr 4, 2023

alexbarghi-nv added 4 commits April 4, 2023 18:40

fix typos

79b8148

style

25bf763

revert array split

c960b1c

style

3aef064

alexbarghi-nv marked this pull request as ready for review April 4, 2023 18:58

alexbarghi-nv requested a review from a team as a code owner April 4, 2023 18:58

alexbarghi-nv requested review from VibhuJawa, jnke2016 and rlratzel April 4, 2023 18:58

remove print statements

25723df

BradReesWork added this to the 23.04 milestone Apr 4, 2023

rlratzel requested changes Apr 4, 2023

python/cugraph/cugraph/dask/sampling/uniform_neighbor_sample.py

alexbarghi-nv added 2 commits April 4, 2023 19:34

update docstrings

3961c3c

Merge branch 'sampling-fix-concat' of https://github.com/alexbarghi-n…

07d8eac

…v/cugraph into sampling-fix-concat

rlratzel approved these changes Apr 4, 2023

alexbarghi-nv added 2 commits April 4, 2023 19:41

fix style

8c91003

reformat

fa87949

VibhuJawa suggested changes Apr 4, 2023

VibhuJawa approved these changes Apr 4, 2023

jnke2016 approved these changes Apr 5, 2023

rapids-bot bot merged commit e76406d into rapidsai:branch-23.04 Apr 5, 2023

alexbarghi-nv deleted the sampling-fix-concat branch April 5, 2023 14:18

alexbarghi-nv commented Apr 4, 2023

rlratzel left a comment

VibhuJawa Apr 4, 2023 • edited

alexbarghi-nv Apr 4, 2023

alexbarghi-nv Apr 4, 2023

VibhuJawa Apr 4, 2023 • edited

alexbarghi-nv Apr 4, 2023

alexbarghi-nv Apr 5, 2023

alexbarghi-nv Apr 5, 2023

jnke2016 Apr 5, 2023

VibhuJawa Apr 5, 2023

alexbarghi-nv Apr 5, 2023

VibhuJawa left a comment

alexbarghi-nv commented Apr 5, 2023
