
Implement eigenvector centrality #2287

Conversation

ChuckHastings (Collaborator) commented May 19, 2022:

This PR implements eigenvector centrality in C++ using the graph primitives. It also provides the C API implementation.

There are unit tests for both C++ and C, covering both SG (single-GPU) and MG (multi-GPU) execution.

Partially addresses #2146
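For background, eigenvector centrality is usually computed by power iteration: repeatedly apply the adjacency matrix to the centrality vector and renormalize until the vector stops changing. A minimal host-side sketch in plain C++ (illustrative only; the PR itself uses the cuGraph graph primitives on the GPU; the (A + I) shift below, which NetworkX's implementation also uses, avoids period-2 oscillation on bipartite graphs):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Power iteration for eigenvector centrality on an undirected, unweighted
// graph. adj[v] lists the neighbors of vertex v. Iterating with (A + I)
// instead of A keeps the same dominant eigenvector but avoids oscillation
// on bipartite graphs.
std::vector<double> eigenvector_centrality(
  std::vector<std::vector<std::size_t>> const& adj,
  std::size_t max_iter = 100,
  double tol           = 1e-9)
{
  std::size_t n = adj.size();
  std::vector<double> x(n, 1.0 / static_cast<double>(n));  // uniform start
  for (std::size_t it = 0; it < max_iter; ++it) {
    std::vector<double> next(x);  // the "+ I" part: start from x itself
    for (std::size_t v = 0; v < n; ++v) {
      for (std::size_t u : adj[v]) { next[v] += x[u]; }  // next += A * x
    }
    double norm = 0.0;
    for (double val : next) { norm += val * val; }
    norm = std::sqrt(norm);
    for (double& val : next) { val /= norm; }  // L2-normalize
    double diff = 0.0;
    for (std::size_t v = 0; v < n; ++v) { diff += std::fabs(next[v] - x[v]); }
    x = std::move(next);
    if (diff < tol) { break; }
  }
  return x;
}
```

On the path graph 0-1-2 this converges to (1, √2, 1)/2, ranking the middle vertex highest.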

@ChuckHastings ChuckHastings requested review from a team as code owners May 19, 2022 03:05
@ChuckHastings ChuckHastings self-assigned this May 19, 2022
@ChuckHastings ChuckHastings added the 3 - Ready for Review, improvement (Improvement / enhancement to an existing function), and non-breaking (Non-breaking change) labels May 19, 2022
@ChuckHastings ChuckHastings added this to the 22.06 milestone May 19, 2022
codecov-commenter commented May 19, 2022:

Codecov Report

Merging #2287 (081134f) into branch-22.06 (d9ec8f7) will decrease coverage by 0.13%.
The diff coverage is 80.00%.

❗ Current head 081134f differs from the pull request's most recent head f86f15e. Consider uploading reports for commit f86f15e to get more accurate results.

@@               Coverage Diff                @@
##           branch-22.06    #2287      +/-   ##
================================================
- Coverage         63.82%   63.69%   -0.14%     
================================================
  Files               100      100              
  Lines              4484     4481       -3     
================================================
- Hits               2862     2854       -8     
- Misses             1622     1627       +5     
Impacted Files Coverage Δ
python/cugraph/cugraph/sampling/node2vec.py 81.81% <33.33%> (ø)
python/cugraph/cugraph/gnn/graph_store.py 80.00% <100.00%> (-2.61%) ⬇️
python/cugraph/cugraph/utilities/utils.py 73.79% <100.00%> (+0.86%) ⬆️
...n/pylibcugraph/pylibcugraph/utilities/api_tools.py 88.05% <0.00%> (-7.47%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d9ec8f7...f86f15e.

void eigenvector_centrality(
raft::handle_t const& handle,
graph_view_t<vertex_t, edge_t, weight_t, true, multi_gpu> const& graph_view,
raft::device_span<weight_t> centralities,
seunghwak (Contributor):

Just for the sake of discussion: what do you think about passing raft::device_span<weight_t> centralities as an input argument vs. returning an rmm::device_uvector<weight_t> holding the centrality values?

The former might be more natural when we are passing initial values, and it may let us reduce memory allocations (e.g. when running PageRank with different personalization vectors; though with the rmm pool allocator, allocation overhead might be insignificant). The latter might be more functional.

ChuckHastings (Collaborator, Author):

I got the idea of using the span from looking at your new triangle_count implementation. The [in/out] of centralities is more consistent with what we have been doing. Our paradigm thus far has been to specify the output storage a priori if we can know it, and to allocate it dynamically if we can't know it.

What you are suggesting would be a paradigm shift for the API. I'm not opposed to changing the paradigm.

It seems to me the current paradigm has the following advantages:

  • Less memory allocation. The new strategy would require temporarily having an extra vector of length V.
  • The caller can use any memory allocator that they choose to allocate the device memory

The new paradigm would have the following advantages:

  • More functional in nature
  • More consistency (all algorithms would return results the same way, whether the size is predictable or not)

In the grand scheme of things, I'm not all that concerned about temporarily allocating an extra result array. The functional feel of the proposed paradigm is useful, and consistency in how algorithms behave across the interface is always better.

ChuckHastings (Collaborator, Author):

In this case I can certainly change raft::device_span<weight_t> centralities to std::optional<raft::device_span<weight_t>> centralities to support an optional input, and make the return value an rmm::device_uvector<weight_t>.
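The shape of that revised interface can be sketched with host-side standard-library types (a hypothetical illustration only: std::optional<std::vector<double>> stands in for std::optional<raft::device_span<weight_t>>, and the returned std::vector stands in for rmm::device_uvector<weight_t>):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical sketch of the proposed API shape: optional initial values
// in, an owning result vector out. Not the actual cuGraph signature.
std::vector<double> compute_centrality(
  std::size_t num_vertices,
  std::optional<std::vector<double>> const& initial_centralities)
{
  std::vector<double> centralities;
  if (initial_centralities) {
    // Start from the caller-supplied values.
    centralities = *initial_centralities;
  } else {
    // Default start: uniform 1/V, as in the thrust::fill used in this PR.
    centralities.assign(num_vertices, 1.0 / static_cast<double>(num_vertices));
  }
  // ... the power-iteration loop would update `centralities` here ...
  return centralities;  // ownership moves to the caller
}
```

With this shape the caller no longer pre-allocates the output; passing std::nullopt falls back to the uniform start.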

* @param handle RAFT handle object to encapsulate resources (e.g. CUDA stream, communicator, and
* handles to various CUDA libraries) to run graph algorithms.
* @param graph_view Graph view object.
* @param centralities Device span where we should store the eigenvector centralities
seunghwak (Contributor):

Can we pass initial values?

ChuckHastings (Collaborator, Author):

I will add that support. Missed that.

#include <rmm/exec_policy.hpp>

#include <thrust/fill.h>
#include <thrust/for_each.h>
seunghwak (Contributor):

Is this necessary?

ChuckHastings (Collaborator, Author):

Probably not, copy/paste. I'll check all the headers.

seunghwak (Contributor):

Don't forget to delete this.

ChuckHastings (Collaborator, Author):

Done

thrust::fill(handle.get_thrust_policy(),
centralities.begin(),
centralities.end(),
weight_t{1.0} / static_cast<weight_t>(num_vertices));
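A host-side equivalent of that initialization: each of the V entries starts at 1/V, so the starting vector sums to one (a sketch with std::vector, assuming weight_t is double):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Host-side equivalent of the thrust::fill above: give every vertex the
// same starting centrality 1/V. num_vertices = 8 is an arbitrary example.
std::size_t num_vertices = 8;
std::vector<double> centralities(num_vertices,
                                 1.0 / static_cast<double>(num_vertices));

// The starting values sum to one, i.e. the vector is L1-normalized.
double sum = std::accumulate(centralities.begin(), centralities.end(), 0.0);
```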
seunghwak (Contributor):

NetworkX supports passing initial values (https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.centrality.eigenvector_centrality.html). Shouldn't we support the same? (We already support initial values for PageRank.)

ChuckHastings (Collaborator, Author):

Will add, missed that.

ChuckHastings (Collaborator, Author):

Pushed an update to address @seunghwak comments

ChuckHastings (Collaborator, Author):

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 2e23132 into rapidsai:branch-22.06 May 20, 2022
@ChuckHastings ChuckHastings deleted the fea_implement_eigenvector_centrality branch August 4, 2022 18:26