Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REVIEW] Optimize pg.get_x_data APIs #3086

Merged

Conversation

VibhuJawa
Copy link
Member

@VibhuJawa VibhuJawa commented Dec 15, 2022

This PR closes #3026 .
The speed up on cupy arrays is 83x.

mainline:
----------------------- benchmark: 1 tests -----------------------
Name (time in s, mem in bytes)                        Mean  Rounds
------------------------------------------------------------------
bench_get_vector_features_cp_array[128-1000000]     1.0880       1
------------------------------------------------------------------


branch: 
------------------------ benchmark: 1 tests -----------------------
Name (time in ms, mem in bytes)                        Mean  Rounds
-------------------------------------------------------------------
bench_get_vector_features_cp_array[128-1000000]     12.7905       1
-------------------------------------------------------------------
branch: 
------------------------- benchmark: 1 tests -------------------------
Name (time in ms, mem in bytes)                           Mean  Rounds
----------------------------------------------------------------------
bench_get_vector_features_cudf_series[128-1000000]     12.7068       1
----------------------------------------------------------------------


Mainline: 
------------------------- benchmark: 1 tests -------------------------
Name (time in ms, mem in bytes)                           Mean  Rounds
----------------------------------------------------------------------
bench_get_vector_features_cudf_series[128-1000000]     12.6223       1
----------------------------------------------------------------------

Comment on lines +869 to +871
assert (
df["_VERTEX_"] == df["_VERTEX_"]._constructor(range(start, stop + 1))
).all()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes are similar to https://github.com/rapidsai/cugraph/pull/3036/files#r1048542911 .

Unsure how CI passed without them

Comment on lines +875 to +876
expected = cur._constructor(sorted(x for x, *args in data[key][1]))
assert (cur.values == expected.values).all()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made similar changes in my PR. We have to be more careful running MG tests until they are included in CI.

Comment on lines +911 to +913
actual = df[pG.edge_id_col_name]
expected = actual._constructor(range(start, stop + 1))
assert (actual == expected).all()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@VibhuJawa VibhuJawa added non-breaking Non-breaking change Fix labels Dec 15, 2022
@codecov-commenter
Copy link

codecov-commenter commented Dec 16, 2022

Codecov Report

Base: 58.50% // Head: 57.17% // Decreases project coverage by -1.33% ⚠️

Coverage data is based on head (b7814d3) compared to base (fd8d764).
Patch coverage: 25.00% of modified lines in pull request are covered.

Additional details and impacted files
@@               Coverage Diff                @@
##           branch-23.02    #3086      +/-   ##
================================================
- Coverage         58.50%   57.17%   -1.34%     
================================================
  Files               133      148      +15     
  Lines              7895     9295    +1400     
================================================
+ Hits               4619     5314     +695     
- Misses             3276     3981     +705     
Impacted Files Coverage Δ
...ugraph/cugraph/dask/structure/mg_property_graph.py 11.97% <0.00%> (-0.18%) ⬇️
python/cugraph/cugraph/structure/property_graph.py 94.67% <50.00%> (-0.29%) ⬇️
...-service/server/cugraph_service_server/__init__.py 25.00% <0.00%> (ø)
...aph-service/client/cugraph_service_client/types.py 74.69% <0.00%> (ø)
...t/cugraph_service_client/cugraph_service_thrift.py 68.18% <0.00%> (ø)
...-service/server/cugraph_service_server/__main__.py 0.00% <0.00%> (ø)
...rvice_server/testing/benchmark_server_extension.py 40.38% <0.00%> (ø)
...-service/client/cugraph_service_client/__init__.py 100.00% <0.00%> (ø)
...-service/client/cugraph_service_client/defaults.py 100.00% <0.00%> (ø)
...lient/cugraph_service_client/remote_graph_utils.py 20.00% <0.00%> (ø)
... and 10 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@VibhuJawa VibhuJawa added the improvement Improvement / enhancement to an existing function label Dec 16, 2022
@VibhuJawa VibhuJawa marked this pull request as ready for review December 16, 2022 14:17
@VibhuJawa VibhuJawa requested a review from a team as a code owner December 16, 2022 14:17
@VibhuJawa VibhuJawa changed the title [WIP] Optimize pg.get_x_data APIs [REVIEW] Optimize pg.get_x_data APIs Dec 16, 2022
@VibhuJawa VibhuJawa changed the title [REVIEW] Optimize pg.get_x_data APIs [WIP] Optimize pg.get_x_data APIs Dec 16, 2022
@VibhuJawa VibhuJawa changed the title [WIP] Optimize pg.get_x_data APIs [REVIEW] Optimize pg.get_x_data APIs Dec 16, 2022
Copy link
Contributor

@rlratzel rlratzel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but I did have one suggestion.

python/cugraph/cugraph/dask/structure/mg_property_graph.py Outdated Show resolved Hide resolved
@BradReesWork BradReesWork added this to the 23.02 milestone Dec 16, 2022
@alexbarghi-nv
Copy link
Member

@VibhuJawa could you clarify - is cudf.Series the preferred datatype to pass in here? So cupy arrays are discouraged?

@VibhuJawa
Copy link
Member Author

@VibhuJawa could you clarify - is cudf.Series the preferred datatype to pass in here? So cupy arrays are discouraged?

Nope, You can use anything you want after this PR . Before this PR cupy arrays on SG bonked on SG and MG . I think even cudf.series was a problem for MG previously but i might be wrong

@VibhuJawa
Copy link
Member Author

rerun tests

Reason:
The previous failure was because of cuDF issues.

12:20:19 Encountered problems while solving:
12:20:19   - nothing provides libcudf 23.02.00a221216.* needed by cudf-23.02.00a221216-cuda_11_py39_g5e357b4f49_188
12:20:19   - nothing provides libcudf 23.02.00a221216.* needed by cudf-23.02.00a221216-cuda_11_py39_g5e357b4f49_188

@VibhuJawa
Copy link
Member Author

@rlratzel / @alexbarghi-nv , Can we merge this in now ? I think all changes have been addressed and CI passes.

@alexbarghi-nv
Copy link
Member

@rlratzel / @alexbarghi-nv , Can we merge this in now ? I think all changes have been addressed and CI passes.

I've approved, but don't have merge privileges.

@alexbarghi-nv
Copy link
Member

@gpucibot merge

@rapids-bot rapids-bot bot merged commit bc2e130 into rapidsai:branch-23.02 Jan 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG]: PG. get_vertex_data with cupy arrays as vertex ids is slow
5 participants