[REVIEW] add multi-column support in algorithms - part 2 #1571

Iroy30 · 2021-05-03T15:29:19Z

Adds multicolumn support for:
jaccard
wjaccard
overlap
woverlap
pagerank
spectral clustering
forceatlas

codecov-commenter · 2021-05-04T05:50:15Z

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.06@0859228).
The diff coverage is n/a.

❗ Current head 619117f differs from pull request most recent head b89e083. Consider uploading reports for commit b89e083 to get more accurate results.

@@              Coverage Diff               @@
##             branch-21.06   #1571   +/-   ##
==============================================
  Coverage                ?   0.22%           
==============================================
  Files                   ?      79           
  Lines                   ?    3527           
  Branches                ?       0           
==============================================
  Hits                    ?       8           
  Misses                  ?    3519           
  Partials                ?       0

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0859228...b89e083.

review-notebook-app · 2021-05-04T14:00:48Z

Check out this pull request on ReviewNB.

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

BradReesWork · 2021-05-06T14:54:27Z

rerun tests

ChuckHastings

Glad we added multi-column tests.

ChuckHastings · 2021-05-06T15:38:58Z

python/cugraph/layout/force_atlas2.py

        if input_graph.renumbered is True:
+            if len(input_graph.renumber_map.implementation.col_names) > 1:


The reference input_graph.renumber_map.implementation.col_names exposes an internal implementation detail everywhere we use it. I'd suggest adding a data member or method (on either the NumberMap or the Graph class; I'm inclined toward the latter) that indicates whether this is a multi-column situation. Otherwise, if we eventually change how we implement multi-column (i.e., when we rework it to use an indirect hash table, as we have discussed), every reference to the renumber_map implementation will need to be changed throughout the Python code.
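A minimal sketch of the accessor suggested above, to show how callers could stop reaching into the renumber map's internals. The classes here are self-contained stand-ins, not the real cuGraph `Graph` or `NumberMap`, and the method name `is_multi_column` is a hypothetical choice for illustration only.

```python
class NumberMap:
    """Stand-in for the renumbering map (not the real cuGraph class)."""

    def __init__(self, col_names):
        # Internal detail: one entry per external vertex column.
        self._col_names = list(col_names)

    def is_multi_column(self):
        # Hypothetical method: the only place that knows how the
        # multi-column case is detected.
        return len(self._col_names) > 1


class Graph:
    """Stand-in for a graph object (not the real cuGraph class)."""

    def __init__(self, renumber_map=None):
        self.renumber_map = renumber_map
        self.renumbered = renumber_map is not None

    def is_multi_column(self):
        # Hypothetical accessor: algorithms ask the graph instead of
        # inspecting renumber_map.implementation.col_names directly.
        return self.renumbered and self.renumber_map.is_multi_column()
```

With an accessor like this, a check such as the one in force_atlas2.py could read `if input_graph.is_multi_column():`, and a later change to the multi-column implementation would only touch the map class.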

BradReesWork · 2021-05-13T14:20:12Z

rerun tests

Iroy30 · 2021-05-20T05:37:01Z

rerun tests

rlratzel

Just a question and a suggestion. I also agree with @ChuckHastings's request to avoid accessing the renumber_map implementation detail directly.

rlratzel · 2021-05-25T13:45:25Z

notebooks/link_prediction/Jaccard-Similarity.ipynb

@@ -451,8 +451,9 @@
   "metadata": {},
   "outputs": [],
   "source": [
+    "pr_df.rename(columns={'pagerank': 'weight'}, inplace=True)",


Since this is a notebook, does this need a comment explaining to readers why it's needed?

rlratzel · 2021-05-25T14:30:36Z

python/cugraph/link_prediction/jaccard.py

+        vertex_size = len(input_graph.renumber_map.implementation.col_names)
+        columns = vertex_pair.columns.to_list()
+        if vertex_size == 1:
+            for col in vertex_pair.columns:
+                null_check(vertex_pair[col])
+                if input_graph.renumbered:
+                    vertex_pair = input_graph.add_internal_vertex_id(
+                        vertex_pair, col, col
+                    )
+        else:
            if input_graph.renumbered:
                vertex_pair = input_graph.add_internal_vertex_id(
-                    vertex_pair, col, col
+                    vertex_pair, "src", columns[:vertex_size]
+                )
+                vertex_pair = input_graph.add_internal_vertex_id(
+                    vertex_pair, "dst", columns[vertex_size:]


This pattern is repeated in several algorithms. Can the entire if-elif-else block be replaced by a utility that can be reused everywhere? This would remove a lot of code and make what's happening more self-documenting IMO, and could also minimize the number of places that would need to be updated when the implementation changes. Maybe something that can be used like:

    if type(input_graph) is not Graph:
        raise Exception("input graph must be undirected")
    vertex_pairs = prepare_vertex_pairs(input_graph, vertex_pairs)
    df = jaccard_wrapper.jaccard(input_graph, None, vertex_pairs)

side note: it seems like an algo call shouldn't modify the graph like this (e.g., adding internal vertex IDs to the graph), but that's probably a bigger problem for a different PR.

added utility function.

> side note: it seems like an algo call shouldn't modify the graph like this (e.g., adding internal vertex IDs to the graph), but that's probably a bigger problem for a different PR.

The algo renumbers the vertex_pairs; the graph stays the same.
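The utility discussed in this thread might look roughly like the sketch below. The function actually added in the PR is not shown in this conversation, so everything here is an assumption: `ToyGraph`, its `vertex_width` attribute, and the dict-of-lists table are stand-ins for the real graph and cuDF DataFrame, and `prepare_vertex_pairs` only borrows the name from the suggestion above. The branching mirrors the diff: one internal ID per column in the single-column case, and "src"/"dst" IDs built from the first and second half of the columns in the multi-column case.

```python
class ToyGraph:
    """Stand-in graph holding an external-to-internal vertex ID map."""

    def __init__(self, vertices):
        # vertices: list of tuples, one tuple of external IDs per vertex.
        self.vertex_width = len(vertices[0])
        self._ids = {tuple(v): i for i, v in enumerate(vertices)}
        self.renumbered = True

    def add_internal_vertex_id(self, table, id_name, ext_names):
        # Returns a NEW table; neither the graph nor the input changes.
        if isinstance(ext_names, str):
            ext_names = [ext_names]
        keys = zip(*(table[name] for name in ext_names))
        ids = [self._ids[key] for key in keys]
        out = {k: v for k, v in table.items() if k not in ext_names}
        out[id_name] = ids
        return out


def prepare_vertex_pairs(graph, vertex_pairs):
    """Hypothetical helper: map external vertex columns to internal IDs."""
    if not graph.renumbered:
        return vertex_pairs
    columns = list(vertex_pairs)
    width = graph.vertex_width
    if width == 1:
        # Single-column vertices: renumber each column in place.
        for col in columns:
            vertex_pairs = graph.add_internal_vertex_id(vertex_pairs, col, col)
    else:
        # Multi-column vertices: first half is the source, second half
        # is the destination.
        vertex_pairs = graph.add_internal_vertex_id(
            vertex_pairs, "src", columns[:width]
        )
        vertex_pairs = graph.add_internal_vertex_id(
            vertex_pairs, "dst", columns[width:]
        )
    return vertex_pairs
```

Because the helper rebinds a local name and the stand-in returns a fresh table on each call, the caller's table and the graph are left untouched, which matches the reply above that only the vertex pairs are renumbered.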

jnke2016

Overall looks good to me.

python/cugraph/tests/test_jaccard.py

ChuckHastings

Thanks for the update.

Iroy30 · 2021-05-26T19:59:52Z

rerun tests

BradReesWork · 2021-06-01T20:37:30Z

rerun tests

BradReesWork · 2021-06-02T14:07:01Z

@gpucibot merge

updates algorithms and add tests

2a7bdb5

Iroy30 requested a review from a team as a code owner May 3, 2021 15:29

Iroy30 changed the title ~~[WIP] updates algorithms and add tests~~ [WIP] multi-column updates to algorithms May 3, 2021

Iroy30 changed the title ~~[WIP] multi-column updates to algorithms~~ [WIP] add multi-column support in algorithms - part 2 May 3, 2021

update wjaccard, woverlap, flake8

efbc03d

Iroy30 changed the title ~~[WIP] add multi-column support in algorithms - part 2~~ [REVIEW] add multi-column support in algorithms - part 2 May 4, 2021

Iroy30 added the "3 - Ready for Review" and "non-breaking" (Non-breaking change) labels May 4, 2021

Iroy30 self-assigned this May 4, 2021

Iroy30 added the "improvement" (Improvement / enhancement to an existing function) label May 4, 2021

update notebook

49051bb

Iroy30 requested a review from a team as a code owner May 4, 2021 14:00

Remove null check

1a5b6f0

BradReesWork requested review from rlratzel and jnke2016 May 5, 2021 18:54

BradReesWork added this to the 21.06 milestone May 5, 2021

BradReesWork approved these changes May 6, 2021

ChuckHastings reviewed May 6, 2021

Update Jaccard-Similarity.ipynb

7892109

rlratzel reviewed May 25, 2021

jnke2016 reviewed May 25, 2021

python/cugraph/tests/test_jaccard.py

Iroy30 and others added 2 commits May 26, 2021 10:27

review updates

4aafda9

Merge branch 'branch-21.06' into multi-column-updates

f668572

ChuckHastings approved these changes May 26, 2021

jnke2016 approved these changes May 26, 2021

Iroy30 added 3 commits May 26, 2021 11:22

review comments

152d1f0

Merge branch 'multi-column-updates' of https://github.com/Iroy30/cugraph into branch-0.20

60f10e4

flake8

b89e083

rapids-bot bot merged commit 575677f into rapidsai:branch-21.06 Jun 2, 2021
