Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REVIEW] add multi-column support in algorithms - part 2 #1571

Merged
merged 10 commits into from
Jun 2, 2021

Conversation

Iroy30
Copy link
Contributor

@Iroy30 Iroy30 commented May 3, 2021

Adds multicolumn support for:
jaccard
wjaccard
overlap
woverlap
pagerank
spectral clustering
forceatlas

@Iroy30 Iroy30 requested a review from a team as a code owner May 3, 2021 15:29
@Iroy30 Iroy30 changed the title [WIP] updates algorithms and add tests [WIP] multi-column updates to algorithms May 3, 2021
@Iroy30 Iroy30 changed the title [WIP] multi-column updates to algorithms [WIP] add multi-column support in algorithms - part 2 May 3, 2021
@Iroy30 Iroy30 changed the title [WIP] add multi-column support in algorithms - part 2 [REVIEW] add multi-column support in algorithms - part 2 May 4, 2021
@Iroy30 Iroy30 self-assigned this May 4, 2021
@Iroy30 Iroy30 added the improvement Improvement / enhancement to an existing function label May 4, 2021
@codecov-commenter
Copy link

codecov-commenter commented May 4, 2021

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.06@0859228). Click here to learn what that means.
The diff coverage is n/a.

❗ Current head 619117f differs from pull request most recent head b89e083. Consider uploading reports for the commit b89e083 to get more accurate results
Impacted file tree graph

@@              Coverage Diff               @@
##             branch-21.06   #1571   +/-   ##
==============================================
  Coverage                ?   0.22%           
==============================================
  Files                   ?      79           
  Lines                   ?    3527           
  Branches                ?       0           
==============================================
  Hits                    ?       8           
  Misses                  ?    3519           
  Partials                ?       0           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0859228...b89e083. Read the comment docs.

@Iroy30 Iroy30 requested a review from a team as a code owner May 4, 2021 14:00
@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@BradReesWork BradReesWork added this to the 21.06 milestone May 5, 2021
@BradReesWork
Copy link
Member

rerun tests

Copy link
Collaborator

@ChuckHastings ChuckHastings left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad we added multi-column tests.

if input_graph.renumbered is True:
if len(input_graph.renumber_map.implementation.col_names) > 1:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reference: input_graph.renumber_map.implementation.col_names exposes internal implementation wherever we use it. I'd suggest adding a data member or method (either on the NumberMap or Graph class -- I'm inclined to think the latter) to indicate that it's a multi-column situation. Otherwise if we eventually change how we implement multi-column (i.e. when we rework it to use an indirect hash table as we have discussed) then all references to the implementation of renumber_map will need to be changed throughout the python code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

@BradReesWork
Copy link
Member

rerun tests

1 similar comment
@Iroy30
Copy link
Contributor Author

Iroy30 commented May 20, 2021

rerun tests

Copy link
Contributor

@rlratzel rlratzel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a question and a suggestion. I also agree with @ChuckHastings request to hide accessing the renumber_map implementation detail directly.

@@ -451,8 +451,9 @@
"metadata": {},
"outputs": [],
"source": [
"pr_df.rename(columns={'pagerank': 'weight'}, inplace=True)",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a notebook, does this need a comment explaining why it's needed to the readers?

Comment on lines 112 to 127
vertex_size = len(input_graph.renumber_map.implementation.col_names)
columns = vertex_pair.columns.to_list()
if vertex_size == 1:
for col in vertex_pair.columns:
null_check(vertex_pair[col])
if input_graph.renumbered:
vertex_pair = input_graph.add_internal_vertex_id(
vertex_pair, col, col
)
else:
if input_graph.renumbered:
vertex_pair = input_graph.add_internal_vertex_id(
vertex_pair, col, col
vertex_pair, "src", columns[:vertex_size]
)
vertex_pair = input_graph.add_internal_vertex_id(
vertex_pair, "dst", columns[vertex_size:]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pattern is repeated in several algorithms - can the entire if-elif-else block be replaced by a utility that can be reused everywhere? This would remove a lot of code and make what's happening more self-documenting IMO, and could also minimize the amount of places that would need to be updated when implementation changes. Maybe something that can be used like:

if type(input_graph) is not Graph:
    raise Exception("input graph must be undirected")

vertex_pairs = prepare_vertex_pairs(input_graph, vertex_pairs)

df = jaccard_wrapper.jaccard(input_graph, None, vertex_pairs)

side note: it seems like an algo call shouldn't modify the graph like this (eg. adding internal vertex IDs to the graph), but that's probably a bigger problem for a different PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added utility function.

side note: it seems like an algo call shouldn't modify the graph like this (eg. adding internal vertex IDs to the graph), but that's probably a bigger problem for a different PR.

The algo renumbers the vertex_pairs, the graph stays the same

Copy link
Contributor

@jnke2016 jnke2016 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good to me.

python/cugraph/tests/test_jaccard.py Show resolved Hide resolved
Copy link
Collaborator

@ChuckHastings ChuckHastings left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update.

@Iroy30
Copy link
Contributor Author

Iroy30 commented May 26, 2021

rerun tests

1 similar comment
@BradReesWork
Copy link
Member

rerun tests

@BradReesWork
Copy link
Member

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 575677f into rapidsai:branch-21.06 Jun 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants