Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds fail_on_nonconvergence option to pagerank to provide pagerank results even on non-convergence #3639

Conversation

rlratzel
Copy link
Contributor

@rlratzel rlratzel commented Jun 6, 2023

closes #3613

Prior to this PR, pagerank will raise a RuntimeError if it fails to converge, often because the max_iter param is set too small (intentionally or otherwise). This PR adds the optional paramter fail_on_nonconvergence which defaults to True (ie. the current behavior to ensure backwards-compatibility) that allows a caller to run pagerank and get results even if it did not converge. When fail_on_nonconvergence is False, pagerank will return a tuple containing the pagerank results and a bool indicating if the results converged or not).

…agerank call to not converge yet still return a result with an additional flag indicating if the results converged or not.
@rlratzel rlratzel added feature request New feature or request non-breaking Non-breaking change labels Jun 6, 2023
@rlratzel rlratzel added this to the 23.08 milestone Jun 6, 2023
@rlratzel rlratzel self-assigned this Jun 6, 2023
@rlratzel rlratzel changed the title Adds error_on_nonconvergence option to pagerank to provide pagerank results even on non-convergence Adds fail_on_nonconvergence option to pagerank to provide pagerank results even on non-convergence Jun 6, 2023
@rlratzel rlratzel marked this pull request as ready for review June 12, 2023 21:23
@rlratzel rlratzel requested review from a team as code owners June 12, 2023 21:23
cpp/include/cugraph/algorithms.hpp Outdated Show resolved Hide resolved
cpp/tests/link_analysis/pagerank_test.cpp Outdated Show resolved Hide resolved
cpp/include/cugraph/algorithms.hpp Outdated Show resolved Hide resolved
@BradReesWork BradReesWork requested a review from eriknw June 13, 2023 14:40
Copy link
Member

@VibhuJawa VibhuJawa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving from the Python/Dask cugraph layer.

Copy link
Contributor

@seunghwak seunghwak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except for one additional complaint.

raft::handle_t const& handle,
graph_view_t<vertex_t, edge_t, true, multi_gpu> const& graph_view,
std::optional<edge_property_view_t<edge_t, weight_t const*>> edge_weight_view,
std::optional<weight_t const*> precomputed_vertex_out_weight_sums,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this better be std::optional<device_span<>>?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added #3659 to address this in a separate PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind. Addressed in this PR since we had to make other changes.

Copy link
Contributor

@eriknw eriknw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For user-facing API, I wonder whether fail_on_nonconvergence is the clearest and most convenient:

pagerank(..., max_iter=3, fail_on_nonconvergence=False)

I think I would prefer a more direct, affirmative argument, such as:

pagerank(..., num_iter=3)

python/cugraph/cugraph/dask/link_analysis/pagerank.py Outdated Show resolved Hide resolved
python/cugraph/cugraph/dask/link_analysis/pagerank.py Outdated Show resolved Hide resolved
Comment on lines +404 to +418
result_tuples = [
client.submit(convert_to_return_tuple, cp_arrays) for cp_arrays in result
]

wait(cudf_result)
# Convert the futures to dask delayed objects so the tuples can be
# split. nout=2 is passed since each tuple/iterable is a fixed length of 2.
result_tuples = [dask.delayed(r, nout=2) for r in result_tuples]

# Create the ddf and get the converged bool from the delayed objs. Use a
# meta DataFrame to pass the expected dtypes for the DataFrame to prevent
# another compute to determine them automatically.
meta = cudf.DataFrame(columns=["vertex", "pagerank"])
meta = meta.astype({"pagerank": "float64", "vertex": vertex_dtype})
ddf = dask_cudf.from_delayed([t[0] for t in result_tuples], meta=meta).persist()
converged = all(dask.compute(*[t[1] for t in result_tuples]))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative implementation to this could be something like:

import operator as op

...

result_tuples = client.map(convert_to_return_tuple, cp_arrays)
meta = cudf.DataFrame(columns=["vertex", "pagerank"])
meta = meta.astype({"pagerank": "float64", "vertex": vertex_dtype})
ddf = dask_cudf.from_delayed(client.map(op.itemgetter(0), result_tuples), meta=meta).persist()
converged = client.submit(all, client.map(op.itemgetter(1), result_tuples)).result()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh Nice, Did not know we could do op.itemgetter like this. Very cool to learn. Thanks

python/cugraph/cugraph/link_analysis/pagerank.py Outdated Show resolved Hide resolved
python/cugraph/cugraph/dask/link_analysis/pagerank.py Outdated Show resolved Hide resolved
@ChuckHastings
Copy link
Collaborator

/merge

@rapids-bot rapids-bot bot merged commit 5a18cde into rapidsai:branch-23.08 Jun 14, 2023
@rlratzel rlratzel deleted the branch-23.08-python_pagerank_convergence_option branch September 28, 2023 20:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Choose small max_iter parameter like 2
7 participants