Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define API for MG random walk #2407

Merged

Conversation

ChuckHastings
Copy link
Collaborator

@ChuckHastings ChuckHastings commented Jul 13, 2022

This PR defines the API for MG random walk in the C API and the C++ API.

C and C++ tests are defined, although some of the code is ifdef'ed out since there is not a working implementation here.

@ChuckHastings ChuckHastings added 2 - In Progress improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Jul 13, 2022
@ChuckHastings ChuckHastings added this to the 22.08 milestone Jul 13, 2022
@ChuckHastings ChuckHastings self-assigned this Jul 13, 2022
@codecov-commenter
Copy link

codecov-commenter commented Jul 13, 2022

Codecov Report

Merging #2407 (6fb67a1) into branch-22.08 (2aad5f2) will increase coverage by 0.27%.
The diff coverage is n/a.

@@               Coverage Diff                @@
##           branch-22.08    #2407      +/-   ##
================================================
+ Coverage         60.11%   60.39%   +0.27%     
================================================
  Files               102      102              
  Lines              5155     5244      +89     
================================================
+ Hits               3099     3167      +68     
- Misses             2056     2077      +21     
Impacted Files Coverage Δ
python/cugraph/cugraph/gnn/graph_store.py 76.66% <0.00%> (-3.34%) ⬇️
python/cugraph/cugraph/structure/property_graph.py 96.41% <0.00%> (-0.02%) ⬇️
python/cugraph/cugraph/__init__.py 100.00% <0.00%> (ø)
python/cugraph/cugraph/dask/comms/comms.py 34.06% <0.00%> (ø)
python/cugraph/cugraph/dask/common/mg_utils.py 30.43% <0.00%> (ø)
python/cugraph/cugraph/dask/common/input_utils.py 22.13% <0.00%> (ø)
...pylibcugraph/pylibcugraph/experimental/__init__.py 100.00% <0.00%> (ø)
...ugraph/cugraph/dask/structure/mg_property_graph.py 18.56% <0.00%> (+0.07%) ⬆️
...ython/cugraph/cugraph/community/ktruss_subgraph.py 88.23% <0.00%> (+2.94%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2aad5f2...6fb67a1. Read the comment docs.

* @param handle RAFT handle object to encapsulate resources (e.g. CUDA stream, communicator, and
* handles to various CUDA libraries) to run graph algorithms.
* @param graph_view graph view to operate on
* @param start_span Device span defining the starting vertices
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think span in the variable name is redundant (as the type is device_span). start_vertices might be more informative.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, start_vertices would be more clear to developers higher up the stack

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had talked about that and I forgot. Updated in next push.

* handles to various CUDA libraries) to run graph algorithms.
* @param graph_view graph view to operate on
* @param start_span Device span defining the starting vertices
* @param max_depth maximum length of random walk
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better be max_length? (depth is more relevant to sampling a tree but I think length makes more sense for random walks).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in next push.

cpp/include/cugraph/algorithms.hpp Outdated Show resolved Hide resolved
* vertices in the random walk. If a path terminates before max_depth,
* the vertices will be populated with invalid_vertex_id
* (-1 for signed vertex_t, std::numeric_limits<vertex_t>::max() for an
* unsigned vertex_t * type)<br>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

an unsigned vertex_t * type => unsigned vertex_t

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in next push

graph_view_t<vertex_t, edge_t, weight_t, false, multi_gpu> const& graph_view,
raft::device_span<vertex_t const> start_span,
size_t max_depth,
uint64_t seed = 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this function work for unweighted graphs? Still return rmm::device_uvector<weight_t>?

Copy link
Collaborator Author

@ChuckHastings ChuckHastings Jul 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know that we've explored support for an unweighted graph very much, especially as it relates to return values.

I could wrap the return in std::optional and return std::nullopt if the graph is unweighted. biased_* would fail on an unweighted graph. node2vec_* could assume a weight of 1 on an unweighted graph.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, and I think it is better to specify how we handle unweighted graphs in the documentation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in next push (check all 3 APIs)

raft::handle_t const& handle,
graph_view_t<vertex_t, edge_t, weight_t, false, multi_gpu> const& graph_view,
raft::device_span<vertex_t const> start_span,
size_t max_depth,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

max_length?

cpp/src/sampling/random_walks_mg.cu Show resolved Hide resolved
cpp/src/sampling/random_walks_sg.cu Show resolved Hide resolved
###################################################################################################
# - RANDOM_WALKS tests ----------------------------------------------------------------------------
ConfigureTest(RANDOM_WALKS_TEST sampling/random_walks_test.cu)
ConfigureTest(RANDOM_WALKS_TEST sampling/sg_random_walks_test.cu)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we name SG tests with sg_? Our convention has been omitting sg_ for SG tests. If we decide to use sg_ for SG tests, we need to apply this for all SG tests.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also temporary. There is an existing random walks test for the original implementation (I hesitate to use the word legacy, since we usually refer to implementations that use the legacy graph objects). I can't delete the original test until I have a replacement working.

I could rename the existing as legacy if you think that would be cleaner.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, no complaint if temporary, just don't forget to fix this later.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a FIXME to remind me

@@ -631,6 +639,7 @@ if(BUILD_CUGRAPH_MG_TESTS)
ConfigureCTestMG(MG_CAPI_EIGENVECTOR_CENTRALITY c_api/mg_eigenvector_centrality_test.c c_api/mg_test_utils.cpp)
ConfigureCTestMG(MG_CAPI_HITS c_api/mg_hits_test.c c_api/mg_test_utils.cpp)
ConfigureCTestMG(MG_CAPI_UNIFORM_NEIGHBOR_SAMPLE c_api/mg_uniform_neighbor_sample_test.c c_api/mg_test_utils.cpp)
ConfigureCTestMG(MG_CAPI_RANDOM_WALKS c_api/mg_random_walks_test.c c_api/mg_test_utils.cpp)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better align indentation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in next push

@alexbarghi-nv
Copy link
Member

Looks good, just the one change to use "start_vertices" or "source_vertices" to be consistent with the rest of the C API.

@ChuckHastings
Copy link
Collaborator Author

rerun tests

* set to weight_t{0}.
*/
template <typename vertex_t, typename edge_t, typename weight_t, bool multi_gpu>
std::tuple<rmm::device_uvector<vertex_t>, std::optional<rmm::device_uvector<weight_t>>>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not accept unweighted graphs, and should the returned weight vector here be std::optional? Can this ever be std::nullopt?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was keeping the signature the same for consistency. But you are correct, this function would never return std::null opt.

cugraph_error_code_t cugraph_uniform_random_walks(
const cugraph_resource_handle_t* handle,
cugraph_graph_t* graph,
const cugraph_type_erased_device_array_view_t* sources,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sources == start_vertices in the code above? If yes, better use the same variable name?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for the functions below.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just pushed an update for this. Also updated in implementation and test files.

* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file should be renamed to random_walks_test.cu in the future. Just a reminder.

std::tuple<rmm::device_uvector<vertex_t>, std::optional<rmm::device_uvector<weight_t>>>
uniform_random_walks(raft::handle_t const& handle,
graph_view_t<vertex_t, edge_t, weight_t, false, multi_gpu> const& graph_view,
raft::device_span<vertex_t const> start_span,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

start_span here should be start_vertices, better search for start_span and replace them.

std::tuple<rmm::device_uvector<vertex_t>, std::optional<rmm::device_uvector<weight_t>>>
uniform_random_walks(raft::handle_t const& handle,
graph_view_t<vertex_t, edge_t, weight_t, false, multi_gpu> const& graph_view,
raft::device_span<vertex_t const> start_span,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

start_span here as well.

Copy link
Contributor

@jnke2016 jnke2016 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

Copy link
Member

@alexbarghi-nv alexbarghi-nv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@ChuckHastings
Copy link
Collaborator Author

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 5bf07fb into rapidsai:branch-22.08 Jul 22, 2022
@ChuckHastings ChuckHastings deleted the mg_random_walk_definition branch August 4, 2022 18:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants