[REVIEW] add multi-column support in algorithms - part 2 #1571
Codecov Report
@@ Coverage Diff @@
## branch-21.06 #1571 +/- ##
==============================================
Coverage ? 0.22%
==============================================
Files ? 79
Lines ? 3527
Branches ? 0
==============================================
Hits ? 8
Misses ? 3519
Partials ? 0
rerun tests
Glad we added multi-column tests.
    if input_graph.renumbered is True:
        if len(input_graph.renumber_map.implementation.col_names) > 1:
This reference, input_graph.renumber_map.implementation.col_names, exposes the internal implementation wherever we use it. I'd suggest adding a data member or method (on either the NumberMap or the Graph class; I'm inclined toward the latter) to indicate that it's a multi-column situation. Otherwise, if we eventually change how we implement multi-column (e.g. when we rework it to use an indirect hash table, as we have discussed), every reference to the implementation of renumber_map will need to be changed throughout the Python code.
updated
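For illustration only, the suggestion in this thread (hiding the renumber_map internals behind a method) could look roughly like the sketch below. The classes are simplified stand-ins, and the names `vertex_column_size` and `is_multi_column` are hypothetical; they are not taken from this PR or from the library.

```python
class NumberMap:
    """Stand-in for the renumbering map; only what the sketch needs."""

    def __init__(self, col_names):
        # Internal detail that callers should not reach into directly.
        self._col_names = list(col_names)

    def vertex_column_size(self):
        # Hypothetical accessor: number of columns that identify one vertex.
        return len(self._col_names)


class Graph:
    """Stand-in for the graph class, reduced to its renumbering state."""

    def __init__(self, renumber_map=None):
        self.renumber_map = renumber_map
        self.renumbered = renumber_map is not None

    def is_multi_column(self):
        # Hypothetical helper: True when each vertex is identified by
        # more than one column. Algorithms call this instead of reading
        # renumber_map internals.
        return self.renumbered and self.renumber_map.vertex_column_size() > 1


G = Graph(NumberMap(["src_0", "src_1"]))
if G.is_multi_column():
    print("multi-column vertices")
```

With a method like this, a later change to how multi-column renumbering is stored would only touch the stand-in for NumberMap, not every algorithm that needs to branch on it.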
rerun tests
Just a question and a suggestion. I also agree with @ChuckHastings' request to avoid accessing the renumber_map implementation detail directly.
@@ -451,8 +451,9 @@
    "metadata": {},
    "outputs": [],
    "source": [
        "pr_df.rename(columns={'pagerank': 'weight'}, inplace=True)",
Since this is a notebook, does this line need a comment explaining to readers why it's needed?
    vertex_size = len(input_graph.renumber_map.implementation.col_names)
    columns = vertex_pair.columns.to_list()
    if vertex_size == 1:
        for col in vertex_pair.columns:
            null_check(vertex_pair[col])
            if input_graph.renumbered:
                vertex_pair = input_graph.add_internal_vertex_id(
                    vertex_pair, col, col
                )
    else:
        if input_graph.renumbered:
            vertex_pair = input_graph.add_internal_vertex_id(
                vertex_pair, "src", columns[:vertex_size]
            )
            vertex_pair = input_graph.add_internal_vertex_id(
                vertex_pair, "dst", columns[vertex_size:]
This pattern is repeated in several algorithms. Can the entire if-elif-else block be replaced by a utility that can be reused everywhere? That would remove a lot of code, make what's happening more self-documenting IMO, and reduce the number of places that need updating when the implementation changes. Maybe something that can be used like:
    if type(input_graph) is not Graph:
        raise Exception("input graph must be undirected")
    vertex_pairs = prepare_vertex_pairs(input_graph, vertex_pairs)
    df = jaccard_wrapper.jaccard(input_graph, None, vertex_pairs)
Side note: it seems like an algo call shouldn't modify the graph like this (e.g. adding internal vertex IDs to the graph), but that's probably a bigger problem for a different PR.
Added a utility function.
> side note: it seems like an algo call shouldn't modify the graph like this (eg. adding internal vertex IDs to the graph), but that's probably a bigger problem for a different PR.
The algo renumbers the vertex_pairs; the graph stays the same.
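As a rough sketch of what such a reusable utility might look like (not the function actually added in this PR), the repeated single-column/multi-column branching can be folded into one helper. `prepare_vertex_pairs` is the name proposed in the review; `vertex_column_size`, `Frame`, and `StubGraph` are hypothetical stand-ins so the example runs without a GPU DataFrame library.

```python
def null_check(column):
    # Stand-in for the project's null check; here it only rejects None entries.
    if any(value is None for value in column):
        raise ValueError("vertex pair columns must not contain nulls")


def prepare_vertex_pairs(input_graph, vertex_pair):
    """Map external vertex IDs in vertex_pair to internal IDs if needed."""
    columns = list(vertex_pair.columns)
    vertex_size = input_graph.vertex_column_size()  # hypothetical accessor
    if vertex_size == 1:
        for col in columns:
            null_check(vertex_pair[col])
            if input_graph.renumbered:
                vertex_pair = input_graph.add_internal_vertex_id(
                    vertex_pair, col, col
                )
    elif input_graph.renumbered:
        # First vertex_size columns form the source, the rest the destination.
        vertex_pair = input_graph.add_internal_vertex_id(
            vertex_pair, "src", columns[:vertex_size]
        )
        vertex_pair = input_graph.add_internal_vertex_id(
            vertex_pair, "dst", columns[vertex_size:]
        )
    return vertex_pair


class Frame(dict):
    """Tiny stand-in for a DataFrame: column name -> list of values."""

    @property
    def columns(self):
        return list(self.keys())


class StubGraph:
    """Stand-in graph that records renumbering requests instead of doing them."""

    def __init__(self, vertex_size, renumbered=True):
        self.renumbered = renumbered
        self._vertex_size = vertex_size
        self.calls = []

    def vertex_column_size(self):
        return self._vertex_size

    def add_internal_vertex_id(self, df, id_column_name, col_names):
        self.calls.append((id_column_name, col_names))
        return df  # the input frame is returned unchanged in this stub


graph = StubGraph(vertex_size=2)
pairs = Frame({"a0": [1], "a1": [2], "b0": [3], "b1": [4]})
prepare_vertex_pairs(graph, pairs)
print(graph.calls)
```

The helper only transforms the vertex_pair frame it is given and returns the result, which matches the point made above that the graph itself stays the same.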
Overall looks good to me.
Thanks for the update.
rerun tests
@gpucibot merge
Adds multicolumn support for:
jaccard
wjaccard
overlap
woverlap
pagerank
spectral clustering
forceatlas