Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix logic for shuffling results #3619

Merged

Conversation

ChuckHastings
Copy link
Collaborator

@alexbarghi-nv found a bug in sampling during python testing. The process of shuffling sampled edges for output formatting was corrupting the label associated with each sampled edge. This PR fixes this bug.

@ChuckHastings ChuckHastings self-assigned this May 26, 2023
@ChuckHastings ChuckHastings added bug Something isn't working non-breaking Non-breaking change labels May 26, 2023
@ChuckHastings ChuckHastings added this to the 23.06 milestone May 26, 2023
@ChuckHastings ChuckHastings marked this pull request as ready for review May 26, 2023 05:20
@ChuckHastings ChuckHastings requested a review from a team as a code owner May 26, 2023 05:20
Copy link
Contributor

@naimnv naimnv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

Comment on lines +847 to +868
auto d_tx_value_counts = cugraph::groupby_and_count(
labels->begin(),
labels->end(),
thrust::make_zip_iterator(majors.begin(),
minors.begin(),
weights->begin(),
edge_ids->begin(),
edge_types->begin(),
hops->begin()),
shuffle_to_output_comm_rank_t<label_t>{std::get<0>(*label_to_output_comm_rank),
std::get<1>(*label_to_output_comm_rank)},
comm_size,
mem_frugal_threshold,
handle.get_stream());

std::vector<size_t> h_tx_value_counts(d_tx_value_counts.size());
raft::update_host(h_tx_value_counts.data(),
d_tx_value_counts.data(),
d_tx_value_counts.size(),
handle.get_stream());

handle.sync_stream();
Copy link
Contributor

@naimnv naimnv May 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can call cugraph::groupby_and_count only once if we use an index array and then use thrust::gather or thrust::permutation_iterator to rearrange majors minors weights edge_ids edge_types hops before exponential if/else block

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or isn't it possible to make the body of each if-else cases as a template function. If I am not mistaken, the only differences between the cases is the iterators we are zipping using thrust::make_zip_iterator(...)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the only differences between the cases is the iterators we are zipping using thrust::make_zip_iterator(...) - that's correct.
It can be merged as it and can be changed at later point as Chuck might be away.

Copy link
Contributor

@seunghwak seunghwak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may consider refactoring this code to reduce code duplication in the next release.

@BradReesWork
Copy link
Member

/merge

@rapids-bot rapids-bot bot merged commit 483d36b into rapidsai:branch-23.06 May 26, 2023
@ChuckHastings ChuckHastings deleted the fix_sampling_batch_hop_bug branch September 27, 2023 21:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants