Adds fail_on_nonconvergence option to pagerank to provide pagerank results even on non-convergence #3639
…agerank call to not converge yet still return a result with an additional flag indicating if the results converged or not.
Title changed from "Adds error_on_nonconvergence option to pagerank to provide pagerank results even on non-convergence" to "Adds fail_on_nonconvergence option to pagerank to provide pagerank results even on non-convergence".
…edToConvergeError exception type, adds tests for MG pagerank and personalization options.
…hub.com:rlratzel/cugraph into branch-23.08-python_pagerank_convergence_option
…_algorithms.pxd, adds exceptions module to PLC, remaining updates to PLC and cugraph code for initial passing tests.
…converged bool separately.
…8-python_pagerank_convergence_option
Approving from the Python/Dask cugraph layer.
…hub.com:rlratzel/cugraph into branch-23.08-python_pagerank_convergence_option
LGTM except for one additional complaint.
cpp/include/cugraph/algorithms.hpp
Outdated
raft::handle_t const& handle,
graph_view_t<vertex_t, edge_t, true, multi_gpu> const& graph_view,
std::optional<edge_property_view_t<edge_t, weight_t const*>> edge_weight_view,
std::optional<weight_t const*> precomputed_vertex_out_weight_sums,
Shouldn't this better be std::optional<device_span<>>?
Added #3659 to address this in a separate PR.
Never mind. Addressed in this PR since we had to make other changes.
For the user-facing API, I wonder whether fail_on_nonconvergence is the clearest and most convenient option:
pagerank(..., max_iter=3, fail_on_nonconvergence=False)
I think I would prefer a more direct, affirmative argument, such as:
pagerank(..., num_iter=3)
result_tuples = [
    client.submit(convert_to_return_tuple, cp_arrays) for cp_arrays in result
]

wait(cudf_result)
# Convert the futures to dask delayed objects so the tuples can be
# split. nout=2 is passed since each tuple/iterable is a fixed length of 2.
result_tuples = [dask.delayed(r, nout=2) for r in result_tuples]

# Create the ddf and get the converged bool from the delayed objs. Use a
# meta DataFrame to pass the expected dtypes for the DataFrame to prevent
# another compute to determine them automatically.
meta = cudf.DataFrame(columns=["vertex", "pagerank"])
meta = meta.astype({"pagerank": "float64", "vertex": vertex_dtype})
ddf = dask_cudf.from_delayed([t[0] for t in result_tuples], meta=meta).persist()
converged = all(dask.compute(*[t[1] for t in result_tuples]))
An alternative implementation to this could be something like:
import operator as op
...
result_tuples = client.map(convert_to_return_tuple, cp_arrays)
meta = cudf.DataFrame(columns=["vertex", "pagerank"])
meta = meta.astype({"pagerank": "float64", "vertex": vertex_dtype})
ddf = dask_cudf.from_delayed(client.map(op.itemgetter(0), result_tuples), meta=meta).persist()
converged = client.submit(all, client.map(op.itemgetter(1), result_tuples)).result()
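The splitting pattern in the suggestion above can be tried without a Dask cluster. The following is a minimal sketch using plain Python lists in place of futures; the sample tuples are made up for illustration and are not cugraph output.

```python
import operator as op

# Hypothetical per-worker results: (pagerank rows, converged flag).
# In the suggestion above these would be Dask futures, not plain tuples.
result_tuples = [
    ([(0, 0.4), (1, 0.6)], True),
    ([(2, 0.5), (3, 0.5)], False),
]

# itemgetter(0) picks the data part of each tuple, itemgetter(1) the flag.
parts = list(map(op.itemgetter(0), result_tuples))
flags = list(map(op.itemgetter(1), result_tuples))

# The overall result only counts as converged if every partition converged.
converged = all(flags)
```

With real futures, `client.map` applies the same `itemgetter` callables remotely, so the tuples are split without first gathering them on the client.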
Oh nice, I did not know we could use op.itemgetter like this. Very cool to learn, thanks.
… exceptions using proper exception chaining.
…8-python_pagerank_convergence_option
…ps://github.com/rlratzel/cugraph into branch-23.08-python_pagerank_convergence_option
…hub.com:rlratzel/cugraph into branch-23.08-python_pagerank_convergence_option
/merge
closes #3613
Prior to this PR, pagerank would raise a RuntimeError if it failed to converge, often because the max_iter param was set too small (intentionally or otherwise). This PR adds the optional parameter fail_on_nonconvergence, which defaults to True (i.e. the current behavior, to ensure backwards compatibility). Setting it to False allows a caller to run pagerank and get results even if it did not converge: in that case, pagerank returns a tuple containing the pagerank results and a bool indicating whether the results converged.