-
Notifications
You must be signed in to change notification settings - Fork 304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update Louvain to use new graph primitives and pattern accelerators #1423
Update Louvain to use new graph primitives and pattern accelerators #1423
Conversation
1) Fix Dendrogram methods to be const consistent 2) Update experimental update_by_delta_modularity to use new graph primitives. Clean up unused local code
Did you merge with the latest cuGraph branch? I am seeing some updates I made in "Files Changed" |
Just double checked. I did merge them. I had to make a couple of additional changes from my debugging. There was a problem in I modified |
cpp/src/community/dendrogram.cuh
Outdated
@@ -26,12 +26,14 @@ template <typename vertex_t> | |||
class Dendrogram { | |||
public: | |||
void add_level(vertex_t num_verts, | |||
vertex_t first_index, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think add_level(vertex_t first_index, vertex_t num_(local_)verts, ...)
is more idiomatic than add_level(vertex_t num_verts, vertex_t first_index, ...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
cpp/src/community/dendrogram.cuh
Outdated
@@ -26,12 +26,14 @@ template <typename vertex_t> | |||
class Dendrogram { | |||
public: | |||
void add_level(vertex_t num_verts, | |||
vertex_t first_index, | |||
cudaStream_t stream = 0, | |||
rmm::mr::device_memory_resource *mr = rmm::mr::get_current_device_resource()) | |||
{ | |||
level_ptr_.push_back( | |||
std::make_unique<rmm::device_buffer>(num_verts * sizeof(vertex_t), stream, mr)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason for using rmm::device_buffer
instead of rmm::device_uvector
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code I copied this from used device_buffer...
Can we type erase and pass a device_uvector back to python? This is currently C++ only, but I was intending that eventually the python interface would be able to fetch these values and store them in cudf columns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Iroy30 may be able to confirm, but I guess this is possible.
https://github.com/rapidsai/cugraph/blob/branch-0.19/cpp/include/utilities/cython.hpp#L21
rmm::device_uvector is widely used in cython.hpp.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like I can convert a device_uvector into a device_buffer when it's needed in python:
cugraph/cpp/include/utilities/cython.hpp
Line 200 in 7256f32
return std::make_pair(std::make_unique<rmm::device_buffer>(dv_.release()), sizeof(vertex_t)); |
I'll pursue this change.
#include <experimental/graph_functions.hpp> | ||
|
||
#include <patterns/copy_to_adj_matrix_row_col.cuh> | ||
#include <patterns/copy_v_transform_reduce_in_out_nbr.cuh> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you still need this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The blank line? Perhaps not. It prevents clang-format from reordering includes across blocks, I felt like the patterns should be after the graph and functions definitions... but I haven't verified that it's required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, #include <patterns/copy_v_transform_reduce_in_out_nbr.cuh>
AFAIK, this was used to compute weight_sums which has been replaced by a graph_view_t class member function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes. You are correct. I have deleted that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spoke too soon. I actually still need it for a different purpose.
cugraph/cpp/src/experimental/louvain.cuh
Line 316 in 9e4421e
experimental::copy_v_transform_reduce_out_nbr( |
cpp/src/experimental/louvain.cuh
Outdated
vertex_weights_v_(graph_view.get_number_of_local_vertices(), handle.get_stream()), | ||
src_vertex_weights_cache_v_(0, handle.get_stream()), | ||
src_cluster_cache_v_(0, handle.get_stream()), | ||
dst_cluster_cache_v_(0, handle.get_stream()), | ||
stream_(handle.get_stream()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the purpose of having both handle_
and stream_
as stream
is just handle_.get_stream()
? This is like maintaining two copies of the same variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code was adapted from a previous version before we had handle. I can replace the stream_ references with handle_.get_stream()
cpp/src/experimental/louvain.cuh
Outdated
auto dst = cache_dst_vertex_properties(input, dst_cache_v); | ||
|
||
return std::make_tuple(src, dst); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need this function anymore? This may cut one line in the caller site but just add an additional layer of indirection to track.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deleted this. In my original MNMG implementation this was called dozens of times.
cpp/src/experimental/louvain.cuh
Outdated
resolution * (a_new * k_k - a_old * k_k + k_k * k_k) / | ||
(total_edge_weight * total_edge_weight)); | ||
|
||
#if 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better delete temporary print code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Hadn't promoted this from draft yet, that was on my todo list.
cpp/src/experimental/louvain.cuh
Outdated
vertex_t new_cluster = thrust::get<0>(p); | ||
weight_t delta_modularity = thrust::get<1>(p); | ||
|
||
#if 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better delete temporary print code.
@@ -0,0 +1,189 @@ | |||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend renaming this to mg_louvain_test.cpp as the executable name is MG_LOUVAIN_TEST.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
// will be instantiated as the parameter to the tests defined below using | ||
// INSTANTIATE_TEST_CASE_P() | ||
// | ||
struct Louvain_Testparams { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're mostly using AnalyticsName_Usecase_t, what's the purpose of introducing a new naming convention?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed Testparams to Usecase
You mean the outcome is dependent on vertex ordering? |
Yes. The algorithm breaks ties by choosing the vertex with the smaller vertex id making it deterministic with consistent vertex ordering. |
This sounds against graph invariants (https://en.wikipedia.org/wiki/Graph_property) we may try to preserve. Isn't there a better way to break a tie without violating graph invariants? (yeah... this sounds impossible; if there are two structurally equivalent communities to move in... no way to distinguish the two without relying on labeling...) What's the purpose of providing determinism? Just for testing or there is a bigger reason? |
Ease of testing. Request from cuML for it to be deterministic. Note that Louvain approximates optimizing modularity, exact solution is NP hard. The serial implementation of Louvain randomly orders the vertices before each pass. Consequently it randomly picks one of clusters when it encounters a tie. The consequence of this is that every time you run Louvain you get a different approximation of the answer. |
Let me think more but I am little worried about the current testing approach as this bypasses all the optimizations relying on renumbering; and leaves those optimizations untested. The current best approach I can think of is to 1) create a renumbered multi_gpu graph, 2) and using the renumber_map from here, renumber edge list to be used in creating a single_gpu graph, 3) and create a single_gpu graph with renumbering disabled. The results will be deterministic and SG & MG results will coincide, and we can compare optimization disabled path of SG implementation with optimization enabled path of MG implementation, so this will have a better coverage. |
Each renumbering pass in the SG and MG code would diverge. The results would be deterministic within a dendrogram level. But as soon as we coarsen the graph, the SG coarsen and the MG coarsen (with num GPUs > 1) will create different graphs and the results will diverge. I had thought a bit about the fact that we're not testing the renumbering phase. We can certainly create some tests to specifically test the graph coarsening logic so that we're sure that's tested independently of the Louvain testing. But I can't come up with a good way to test that. |
So, there are two issues.
In case of 2), neither the proposed approach or the current approach will allow directly comparing SG & MG results. 1-1) completely skipping renumbering will work and allow directly comparing SG & MG results. Are these correct? Assuming this, I think 1-2) has a better coverage than 1-1). |
1. Exposed flatten_dendrogram and a version of louvain that returns dendrogram in algorithms 2. Added methods to the legacy graph implementations to make it easier to template between old and new graph objects 3. Removed some class member data that can be derived 4. Added code to suppress gtest output from rank > 0
Codecov Report
@@ Coverage Diff @@
## branch-0.19 #1423 +/- ##
===============================================
+ Coverage 58.24% 58.26% +0.02%
===============================================
Files 71 71
Lines 3281 3283 +2
===============================================
+ Hits 1911 1913 +2
Misses 1370 1370
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lots of great stuff, I posted some nonblocking comments on API and testing.
cpp/include/algorithms.hpp
Outdated
* | ||
* @tparam graph_t Type of graph | ||
* | ||
* @param[in] handle Library handle (RAFT). If a communicator is set in the handle, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"If a communicator is set in the handle, ..." missing the end of the sentence?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed extraneous text from copy/paste.
cpp/include/algorithms.hpp
Outdated
* @tparam graph_t Type of graph | ||
* | ||
* @param[in] handle Library handle (RAFT). If a communicator is set in the handle, | ||
* @param[in] graph input graph object (CSR) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should start clarifying supported graphs (directed, distributed, self loops, multi edges)
weight_t resolution) | ||
std::pair<std::unique_ptr<Dendrogram<vertex_t>>, weight_t> louvain( | ||
raft::handle_t const &handle, | ||
GraphCSRView<vertex_t, edge_t, weight_t> const &graph_view, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this the legacy one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. The legacy version is still faster on SG than the new version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, are we using the legacy graph for this? or are we using the legacy version for SG and the new version for MG?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The C++ supports:
- Legacy graph for SG
- new graph for SG
- new graph for MG
The python is currently still using legacy graph for SG and new graph for MG. I'm doing some performance analysis now of the new graph for SG vice the legacy graph for SG. It might be necessary to get some optimizations in place in order for the new graph to be fast enough. My last run today (haven't pushed the latest mods yet, but will shortly) had web-Google.mtx
running in 14s on the legacy implementation and 56s on the new implementation. ljournal-2008.mtx
runs in 250s on legacy and crashes (OOM) on the new implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, ljournal is not large at all and if you see OOM with ljournal, I guess something really shouldn't happen is happening in this case; something we should take a look (maybe not in this PR).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may dig deeper to both performance & memory footprint when I tune primitives for Louvain. The execution time and memory consumption sound very bad.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be able to process 1B-ish edge graph using one GPU. IIRC, ljournal is much smaller than that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, that should probably be outside of this PR. But that's why I've left the legacy implementation in place and the default for python. When you start tuning these we can evaluate when to delete the legacy implementation.
INSTANTIATE_TEST_CASE_P( | ||
simple_test, | ||
Louvain_MG_Testfixture, | ||
::testing::Values(Louvain_Usecase("test/datasets/karate.mtx", true, 100, 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
karate won't capture all cases. Other MG algos started to use RMAT #1438
That raises the question of comparing the output but we could at least make sure it runs successfully and returns a valid result by looking at the modularity score and range of values in the dendrogram on a larger graph.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comparing the output should work for more than Karate. I didn't have enough time to verify other test cases and adding RMAT. I was hoping to expand later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, yeah this can go in the next PR along with python stuff etc.
@@ -14,10 +14,14 @@ | |||
* limitations under the License. | |||
*/ | |||
#pragma once | |||
|
|||
#include <dendrogram.hpp> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(question) for the python binding, are you planning on having a Louvain feature (maybe in cython.cu) that encapsulates the dendrogram aspects? Or will we add bindings for this class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was hoping to add bindings for a louvain that returns a Dendrogram, eventually. I had to make the C++ change to support MNMG testing which needed the Dendrogram exposed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like returning a dendrogram too (probably optionally).
Did you take a look at SLHC btw? It should return a dendrogram (might be in RAFT already). Better be consistent as we consider adding SLHC to cuGraph also.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Corey and I talked about the dendrogram and potentially moving it to raft. We thought waiting until the Louvain work was done in cugraph would be prudent.
weight_t resolution) | ||
std::pair<std::unique_ptr<Dendrogram<vertex_t>>, weight_t> louvain( | ||
raft::handle_t const &handle, | ||
GraphCSRView<vertex_t, edge_t, weight_t> const &graph_view, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, are we using the legacy graph for this? or are we using the legacy version for SG and the new version for MG?
auto const col_comm_size = col_comm.get_size(); | ||
|
||
vertex_t number_of_vertices = static_cast<vertex_t>(vertices.size()); | ||
template <typename vertex_t> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you still need the changes in this file after the test updates?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessarily, I will verify this. It does successfully allow supporting of generating a multi-gpu graph without renumbering - in case we find another need for that.
That can be useful for debugging algorithms (knowing the numbering a priori and that it remains consistent across different shape MG tests).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed this file significantly in PR #1443, and this is not necessary for testing, we may better undo this change for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will revert tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One minor doc question and a couple of comments, but nothing to hold up my approval now.
* @tparam graph_t Type of graph | ||
* | ||
* @param[in] handle Library handle (RAFT). If a communicator is set in the handle, | ||
* @param[in] graph input graph object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like the actual param name is graph_view
, does that need to match (or should the param name be graph
since it's not a view obj)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit more pervasive. They are all graph views. In fact all of our algorithms are using graph views and calling them graphs.
I'm testing a change that changes louvain
calls to consistently call it a graph_view. I believe we should do this with all algorithms (I'll try and make this change as I rework things to use the new graph primitives).
@@ -0,0 +1,53 @@ | |||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are mg_louvain_helper.[cu|hpp]
really specific to louvain? I'm just wondering if they're generally useful, should they be renamed and placed in the cugraphtestutil
lib?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are currently specific to louvain. I think they might be more generally useful. But I wanted to have at least one more unit test that used it before moving into a more general location.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And FYI,
renumber_ext_vertices
https://github.com/rapidsai/cugraph/pull/1443/files#diff-4106792213a4762c10634a44ac8c92f194994ec0b595dba4eccaa822e8353978R215
unrenumber_local_int_vertices
https://github.com/rapidsai/cugraph/pull/1443/files#diff-4106792213a4762c10634a44ac8c92f194994ec0b595dba4eccaa822e8353978R245
and unrenumber_local_int_vertices
https://github.com/rapidsai/cugraph/pull/1443/files#diff-4106792213a4762c10634a44ac8c92f194994ec0b595dba4eccaa822e8353978R279
will be added once PR 1443 gets merged.
There will be some overlaps and some functions here may become obsolete.
size_t max_level; | ||
double resolution; | ||
|
||
// FIXME: We really should have a Graph_Testparms_Base class or something |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good FIXME
, I agree.
@gpucibot merged |
@gpucibot merge |
Implement the
update_by_delta_modularity
method using the new graph primitives and pattern accelerators.This eliminates all of the custom MNMG implementation originally created for MNMG Louvain a few releases ago and replaces it with the new pattern accelerator and graph primitives that have been added in the last couple of releases.
This depends on the following PRs and should not be merged until after them:
closes #1220