Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Added Sorensen algorithm to Python API #1820

Merged
merged 15 commits into from
Sep 24, 2021

Conversation

jnke2016
Copy link
Contributor

Add a python implementation of the Sorensen and the wSorensen from a prior Jaccard implementation

Add tests for both algorithms

Since there is no current implementation of networkX Sorensen, the tests convert networkX Jaccard to Sorensen and compare it to cugraph Sorensen

@jnke2016 jnke2016 requested a review from a team as a code owner September 14, 2021 18:28
@rlratzel rlratzel changed the title Fea Sorensen [FEA] Added Sorensen algorithm to Python API Sep 14, 2021
@rlratzel rlratzel added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Sep 14, 2021
@codecov-commenter
Copy link

codecov-commenter commented Sep 14, 2021

Codecov Report

Merging #1820 (3984e76) into branch-21.10 (bf64c2c) will increase coverage by 9.74%.
The diff coverage is n/a.

❗ Current head 3984e76 differs from pull request most recent head dcd9123. Consider uploading reports for the commit dcd9123 to get more accurate results
Impacted file tree graph

@@               Coverage Diff                @@
##           branch-21.10    #1820      +/-   ##
================================================
+ Coverage         59.85%   69.60%   +9.74%     
================================================
  Files                77      143      +66     
  Lines              3547     8672    +5125     
================================================
+ Hits               2123     6036    +3913     
- Misses             1424     2636    +1212     
Impacted Files Coverage Δ
python/cugraph/community/egonet.py
python/cugraph/community/leiden.py
python/cugraph/traversal/ms_bfs.py
python/cugraph/utilities/grmat.py
python/cugraph/structure/hypergraph.py
python/cugraph/proto/components/scc.py
python/cugraph/centrality/katz_centrality.py
python/cugraph/linear_assignment/__init__.py
python/cugraph/dask/common/mg_utils.py
python/cugraph/structure/shuffle.py
... and 210 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update df2d81d...dcd9123. Read the comment docs.

Copy link
Contributor

@rlratzel rlratzel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks ok, the only real issue is the pandas import which I mentioned below. I'm also wondering if there's a better way to test since there's a lot of repetition and calling Nx may not be necessary (seems like it's mostly testing Jaccard that way).

python/cugraph/cugraph/link_prediction/sorensen.py Outdated Show resolved Hide resolved
python/cugraph/cugraph/link_prediction/sorensen.py Outdated Show resolved Hide resolved
python/cugraph/cugraph/link_prediction/sorensen.py Outdated Show resolved Hide resolved
python/cugraph/cugraph/link_prediction/wsorensen.py Outdated Show resolved Hide resolved
python/cugraph/cugraph/tests/test_sorensen.py Outdated Show resolved Hide resolved
python/cugraph/cugraph/tests/test_sorensen.py Outdated Show resolved Hide resolved
python/cugraph/cugraph/tests/test_sorensen.py Outdated Show resolved Hide resolved
python/cugraph/cugraph/tests/test_wsorensen.py Outdated Show resolved Hide resolved
python/cugraph/cugraph/link_prediction/wsorensen.py Outdated Show resolved Hide resolved
@BradReesWork BradReesWork added this to the 21.10 milestone Sep 14, 2021
python/cugraph/cugraph/link_prediction/sorensen.py Outdated Show resolved Hide resolved
G, isNx = check_nx_graph(G)

if isNx is True and ebunch is not None:
vertex_pair = cudf.from_pandas(pd.DataFrame(ebunch))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This conversion seems odd. Can you just pass the ebunch directly to cudf?

Copy link
Contributor Author

@jnke2016 jnke2016 Sep 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That conversion is indeed odd. My Cugraph Sorensen implementation mirrors the already existing Jaccard one which has a few more dubious calls and I didn't pay attention to. I tried to make Sorensen consistent with Jaccard. I am refactoring Jaccard, overlap as well.

@BradReesWork
Copy link
Member

rerun tests

python/cugraph/cugraph/link_prediction/jaccard.py Outdated Show resolved Hide resolved
python/cugraph/cugraph/link_prediction/sorensen.py Outdated Show resolved Hide resolved
python/cugraph/cugraph/link_prediction/sorensen.py Outdated Show resolved Hide resolved
python/cugraph/cugraph/link_prediction/wjaccard.py Outdated Show resolved Hide resolved
python/cugraph/cugraph/link_prediction/wjaccard.py Outdated Show resolved Hide resolved
python/cugraph/cugraph/link_prediction/wsorensen.py Outdated Show resolved Hide resolved
python/cugraph/cugraph/link_prediction/wsorensen.py Outdated Show resolved Hide resolved
python/cugraph/cugraph/tests/test_jaccard.py Outdated Show resolved Hide resolved
python/cugraph/cugraph/tests/test_sorensen.py Outdated Show resolved Hide resolved
@rlratzel
Copy link
Contributor

rerun tests

@rlratzel
Copy link
Contributor

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 9191711 into rapidsai:branch-21.10 Sep 24, 2021
@BradReesWork BradReesWork linked an issue Oct 4, 2021 that may be closed by this pull request
@jnke2016 jnke2016 deleted the fea_sorensen branch September 24, 2022 23:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Sorenson-Dice Coefficent
4 participants