[BUG] Reindex Start Vertices and Batch Ids Prior to Sampling Call #3393

alexbarghi-nv · 2023-03-29T22:25:10Z

This PR fixes a bug where the batch ids in the bulk sampler's output do not match the expected ones, which produces subgraphs that are incorrect and larger than expected. Without reindexing, the wrong batch ids are assigned to the start vertices. Reindexing ensures that batch ids and start vertices keep the same order.
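To illustrate the kind of mismatch described here, the following is a minimal sketch, not the cuGraph implementation. It assumes the mismatch comes from index-label alignment when two columns are combined, and it uses pandas as a stand-in for cuDF. The helper `pair_seeds` and the sample values are hypothetical.

```python
import pandas as pd


def pair_seeds(start, batch, reindex):
    """Pair each start vertex with a batch id (hypothetical helper).

    Assumption: the columns are combined by index label, as pandas does
    when a Series is assigned into a DataFrame.
    """
    if reindex:
        # Give both columns the same fresh 0..n-1 index so that
        # label alignment and positional order agree.
        start = start.reset_index(drop=True)
        batch = batch.reset_index(drop=True)
    df = pd.DataFrame({"start": start})
    df["batch"] = batch  # aligned on index labels, not on position
    return dict(zip(df["start"].tolist(), df["batch"].tolist()))


# Start vertices carrying a non-default index (for example after a shuffle),
# and batch ids with the default index.
start = pd.Series([10, 11, 12, 13], index=[2, 0, 3, 1])
batch = pd.Series([0, 0, 1, 1])

without_reindex = pair_seeds(start, batch, reindex=False)
with_reindex = pair_seeds(start, batch, reindex=True)
```

In this sketch, `with_reindex` pairs vertices 10 and 11 with batch 0 and vertices 12 and 13 with batch 1, which is the positional order. `without_reindex` pairs vertex 10 with batch 1 and vertex 13 with batch 0, because the labels rather than the positions decide the match.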

This PR also changes the empty dataframe passed to dask in `uniform_neighbor_sample` so that its `batch_id` and `hop_id` columns are in the correct order. This ensures that the columns are named correctly and are not inadvertently renamed because they were created in a different order.
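The effect of column order in the empty dataframe can be sketched as follows. This is an illustration under stated assumptions, not the code changed in this PR. It assumes column names from the empty frame are applied to the result by position, and it uses pandas in place of cuDF and dask. The helper `apply_meta_names` and the sample values are hypothetical.

```python
import pandas as pd


def apply_meta_names(result, meta):
    """Relabel result columns by position using the empty frame's names.

    Assumption: this mimics a consumer that trusts the empty frame's
    column order rather than the result's own labels.
    """
    out = result.copy()
    out.columns = meta.columns
    return out


# A hypothetical sampling result: batch_id is created first, hop_id second.
result = pd.DataFrame({"batch_id": [0, 0, 1], "hop_id": [0, 1, 0]})

# Empty frames describing the expected output, in two different orders.
meta_wrong_order = pd.DataFrame(
    {"hop_id": pd.Series(dtype="int32"), "batch_id": pd.Series(dtype="int32")}
)
meta_right_order = pd.DataFrame(
    {"batch_id": pd.Series(dtype="int32"), "hop_id": pd.Series(dtype="int32")}
)

swapped = apply_meta_names(result, meta_wrong_order)
correct = apply_meta_names(result, meta_right_order)
```

With the mismatched order, the column labelled `batch_id` in `swapped` actually holds the hop values `[0, 1, 0]`. With the matching order, `correct` keeps `batch_id` as `[0, 0, 1]`.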

This PR is non-breaking because it restores the original behavior of bulk sampling and reverses a bug that was inadvertently introduced with the dask updates.

Resolves #3390

VibhuJawa

PR looks good. Thanks for debugging, but we should add a test to catch it, please.

@rlratzel

This PR adds a working Multi-GPU Graph (on 2 dask workers) being trained/loaded on multiple pytorch trainers. (3)

Todo:
- [x] Verify works on multiple trainers and multiple dask workers
- [x] Show scaling as you increase training GPUs

At 1 second we become bottlenecked by the sampling dask cluster, but we see a perf improvement by going from `1 GPU`->`2GPU`.

**On OGBN-Products**

```md
| Number of Training GPUs | Time per epoch |
|-------------------------|----------------|
| 1                       | 2.3 s          |
| 2                       | 0.582 s        |
| 4                       | 0.792 s        |
```

This PR depends upon: #3393

CC: @rlratzel , @alexbarghi-nv , @BradReesWork

Authors:
- Vibhu Jawa (https://github.com/VibhuJawa)
- Alex Barghi (https://github.com/alexbarghi-nv)

Approvers:
- Alex Barghi (https://github.com/alexbarghi-nv)

URL: #3212

alexbarghi-nv · 2023-04-03T16:43:39Z

/merge

bug fix for bulk sampler

9edf6f7

alexbarghi-nv self-assigned this Mar 29, 2023

alexbarghi-nv added bug Something isn't working non-breaking Non-breaking change labels Mar 29, 2023

alexbarghi-nv added this to the 23.04 milestone Mar 29, 2023

alexbarghi-nv and others added 3 commits March 29, 2023 15:26

Merge branch 'branch-23.04' into cugraph-gnn-fix-sample-index

d145a92

change fix for index issue, add fix for dask df

b78feeb

Merge branch 'cugraph-gnn-fix-sample-index' of https://github.com/al…

1ee589f

…exbarghi-nv/cugraph into cugraph-gnn-fix-sample-index

alexbarghi-nv marked this pull request as ready for review March 29, 2023 23:21

alexbarghi-nv requested a review from a team as a code owner March 29, 2023 23:21

style

e78e15a

alexbarghi-nv requested review from rlratzel, jnke2016 and VibhuJawa March 29, 2023 23:41

VibhuJawa suggested changes Mar 30, 2023

VibhuJawa mentioned this pull request Mar 30, 2023

[REVIEW]Multi-trainers cugraph-DGL examples #3212

Merged

2 tasks

update tests

95c2749

rlratzel approved these changes Mar 30, 2023

fix style

264f3f9

VibhuJawa approved these changes Mar 30, 2023

jnke2016 approved these changes Mar 31, 2023

Merge branch 'branch-23.04' into cugraph-gnn-fix-sample-index

bdb931f

Merge branch 'branch-23.04' into cugraph-gnn-fix-sample-index

72819c8

rapids-bot bot merged commit 1281bb8 into rapidsai:branch-23.04 Apr 3, 2023
