
[REVIEW] Optimize dask.uniform_neighbor_sample #2887

Merged

Conversation

@VibhuJawa VibhuJawa commented Nov 3, 2022

This PR closes #2872 and is part of https://github.com/rapidsai/graph_dl/issues/74.

In a preliminary benchmark I am seeing a 3.5x-6x improvement, and I expect more improvement on bigger clusters.

Before PR :

```
%%timeit
df = cugraph.dask.uniform_neighbor_sample(dask_g, start_list[:10_000], [20])

187 ms ± 40.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

After PR

```
%%timeit
df = cugraph.dask.uniform_neighbor_sample(dask_g, start_list[:10_000], [20])

50.1 ms ± 1.39 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

Updated benchmark

Before PR:

```
---------------------------------------------------------------------- benchmark: 3 tests ----------------------------------------------------------------------
Name (time in ms, mem in bytes)                            Mean              GPU mem            GPU Leaked mem            Rounds            GPU Rounds
----------------------------------------------------------------------------------------------------------------------------------------------------------------
bench_uniform_neigbour_sample_email_eu_core[1000]      170.2833 (1.0)        300,224 (1.0)                   0 (1.0)           1           1
bench_uniform_neigbour_sample_email_eu_core[5000]      191.7691 (1.13)     1,200,944 (4.00)                  0 (1.0)           1           1
bench_uniform_neigbour_sample_email_eu_core[10000]     194.8658 (1.14)     1,200,944 (4.00)                  0 (1.0)           1           1
----------------------------------------------------------------------------------------------------------------------------------------------------------------
```

After PR:

```
--------------------------------------------------------------------- benchmark: 3 tests --------------------------------------------------------------------
Name (time in ms, mem in bytes)                           Mean            GPU mem            GPU Leaked mem            Rounds            GPU Rounds
-------------------------------------------------------------------------------------------------------------------------------------------------------------
bench_uniform_neigbour_sample_email_eu_core[1000]      54.0802 (1.0)            0 (1.0)                   0 (1.0)           1           1
bench_uniform_neigbour_sample_email_eu_core[5000]      63.0271 (1.17)           0 (1.0)                   0 (1.0)           1           1
bench_uniform_neigbour_sample_email_eu_core[10000]     70.1445 (1.30)           0 (1.0)                   0 (1.0)           1           1
-------------------------------------------------------------------------------------------------------------------------------------------------------------
```

We should probably apply a similar optimization to other algorithms too.
CC: @jnke2016 , @rlratzel , @alexbarghi-nv .

@VibhuJawa VibhuJawa requested a review from a team as a code owner November 3, 2022 22:24
@VibhuJawa VibhuJawa added this to the 22.12 milestone Nov 3, 2022
@VibhuJawa VibhuJawa self-assigned this Nov 3, 2022
@VibhuJawa VibhuJawa added the "2 - In Progress", "improvement" (Improvement / enhancement to an existing function), and "non-breaking" (Non-breaking change) labels Nov 3, 2022
@VibhuJawa VibhuJawa changed the title [WIP] Optimize dask.uniform_neighbor_sample [REVIEW] Optimize dask.uniform_neighbor_sample Nov 4, 2022

codecov-commenter commented Nov 4, 2022

Codecov Report

Base: 60.43% // Head: 61.71% // Increases project coverage by +1.27% 🎉

Coverage data is based on head (632d584) compared to base (d86c933).
Patch has no changes to coverable lines.

Additional details and impacted files
```
@@               Coverage Diff                @@
##           branch-22.12    #2887      +/-   ##
================================================
+ Coverage         60.43%   61.71%   +1.27%
================================================
  Files               114      123       +9
  Lines              6547     7543     +996
================================================
+ Hits               3957     4655     +698
- Misses             2590     2888     +298
```

| Impacted Files | Coverage Δ |
| --- | --- |
| python/cugraph/cugraph/components/connectivity.py | 94.11% <ø> (-0.89%) ⬇️ |
| python/cugraph/cugraph/dask/__init__.py | 100.00% <ø> (ø) |
| ...on/cugraph/cugraph/dask/components/connectivity.py | 31.03% <ø> (+5.10%) ⬆️ |
| ...thon/cugraph/cugraph/dask/sampling/random_walks.py | 22.91% <ø> (ø) |
| ...h/cugraph/dask/sampling/uniform_neighbor_sample.py | 25.42% <ø> (+4.26%) ⬆️ |
| ...ugraph/cugraph/dask/structure/mg_property_graph.py | 14.12% <ø> (+0.46%) ⬆️ |
| python/cugraph/cugraph/dask/traversal/bfs.py | 25.39% <ø> (-0.84%) ⬇️ |
| ...ph/cugraph/experimental/pyg_extensions/__init__.py | 100.00% <ø> (ø) |
| python/cugraph/cugraph/gnn/__init__.py | 100.00% <ø> (ø) |
| ...h/cugraph/gnn/dgl_extensions/base_cugraph_store.py | 84.21% <ø> (ø) |
| ... and 47 more | |


☔ View full report at Codecov.

```python
wait(cudf_result)

ddf = dask_cudf.from_delayed(cudf_result).persist()
# Send tasks all at once
```
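The idea behind the lines above (submit every per-worker task up front, then block once on the whole batch rather than waiting task by task) can be illustrated with the standard library alone. This is a minimal sketch of the pattern, not cugraph code; `sample_partition` is a made-up stand-in for the per-worker sampling call:

```python
from concurrent.futures import ThreadPoolExecutor, wait

def sample_partition(part):
    # Stand-in for the per-worker sampling task; illustrative only.
    return [x * 2 for x in part]

partitions = [[1, 2], [3, 4], [5, 6]]

with ThreadPoolExecutor(max_workers=3) as pool:
    # Submit every task up front so all workers start immediately...
    futures = [pool.submit(sample_partition, p) for p in partitions]
    # ...then block once on the whole batch instead of per task.
    wait(futures)
    results = [f.result() for f in futures]

print(results)  # [[2, 4], [6, 8], [10, 12]]
```

Waiting once per submitted task serializes the round trips; waiting once on the batch lets the overhead overlap, which is where the per-worker savings come from.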
@alexbarghi-nv (Member) commented Nov 9, 2022
Just one comment; it would be nice to have a helper function for this so we can apply it to more algorithms. But I suggest we leave that to a future PR given how much work we have left before burndown.

To clarify, this was referencing lines 174-180
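A reusable helper along those lines might look like the sketch below. The name `run_per_worker` and the wiring are hypothetical (this is not code from the PR), and actually running it assumes a RAPIDS environment with `distributed` and `dask_cudf` installed:

```python
def run_per_worker(client, func, worker_to_args):
    """Hypothetical helper: submit `func` once per worker all at once,
    wait on every future, and assemble the per-worker cudf results
    into a single persisted dask_cudf DataFrame."""
    from dask.distributed import wait  # lazy import; needs `distributed`
    import dask_cudf                   # needs a RAPIDS environment

    # Send all tasks up front rather than submitting and waiting one by one.
    futures = [
        client.submit(func, *args, workers=[w], pure=False)
        for w, args in worker_to_args.items()
    ]
    wait(futures)

    ddf = dask_cudf.from_delayed(futures).persist()
    wait(ddf)
    return ddf
```

With a helper like this, each algorithm would only need to supply its per-worker function and argument mapping.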

@alexbarghi-nv (Member) left a comment
👍

@jnke2016 (Contributor) left a comment

This implementation experiences a hang on 2, 3, and 4 GPUs, similar to this issue that was closed a few months ago by this PR. To reproduce, just run this implementation of uniform_neighbor_sample several times (fewer than 10) on 2 GPUs.

@VibhuJawa (Member, Author)

> This implementation experiences a hang on 2, 3, and 4 GPUs, similar to this issue that was closed a few months ago by this PR. To reproduce, just run this implementation of uniform_neighbor_sample several times (fewer than 10) on 2 GPUs.

Thanks for testing this thoroughly @jnke2016, I really appreciate it. Can you post a link to the code you ran to produce the hang, as I could not get it to hang in my testing on 8 GPUs.

We can probably add those statements back and give up 10 ms of perf on 8 GPUs, but that is not a real solution, as it will be much worse on a larger cluster (say 128 GPUs).

I am going to try looking into a real fix for the problem today.


jnke2016 commented Nov 9, 2022

You can run this on 2 GPUs.

```python
import cugraph
from dask.distributed import Client
from dask_cuda import LocalCUDACluster
import dask
from cugraph.dask.comms import comms as Comms
import cudf
import dask_cudf
from cugraph.dask.common.mg_utils import get_visible_devices
import cugraph.dask as dcg
from cugraph.generators import rmat

import time

def setup():
    cluster = LocalCUDACluster()
    client = Client(cluster)
    client.wait_for_workers(len(get_visible_devices()))

    Comms.initialize(p2p=True)
    return (client, cluster)

def teardown(client, cluster=None):
    Comms.destroy()
    client.close()
    if cluster:
        cluster.close()

def generate_edgelist(scale,
                      edgefactor,
                      seed=None,
                      unweighted=False,
                     ):

    df = rmat(
        scale,
        (2**scale)*edgefactor,
        0.57,  # from Graph500
        0.19,  # from Graph500
        0.19,  # from Graph500
        seed or 42,
        clip_and_flip=False,
        scramble_vertex_ids=True,
        create_using=None,  # return edgelist instead of Graph instance
        mg=True
    )

    return df

if __name__ == "__main__":

    setup_objs = setup()
    client = setup_objs[0]
    num_workers = len(client.scheduler_info()['workers'])

    df = generate_edgelist(scale=10, edgefactor=16, seed=5)

    m_G = cugraph.Graph()
    m_G.from_dask_cudf_edgelist(
        df, source='src', destination='dst', renumber=True, legacy_renum_only=True)

    vertices = m_G.nodes().persist()
    print("number of nodes is ", len(vertices))
    print("number of edges is ", len(m_G.input_df))
    vertices = vertices.compute().reset_index(drop=True).iloc[:10000]

    time_list = []
    for i in range(10):
        print("*****iteration ", i, "*****")
        t0 = time.time()
        results = cugraph.dask.uniform_neighbor_sample(m_G, vertices, [2])
        t1 = time.time()
        total = t1 - t0
        time_list.append(total)

    import statistics
    print("the mean is ", statistics.mean(time_list))
    print("the standard deviation is ", statistics.stdev(time_list))
    print("number of GPUs is ", num_workers)

    teardown(*setup_objs)
```

@VibhuJawa
Copy link
Member Author

@jnke2016, thanks for this script. Let me investigate more :-)

@VibhuJawa VibhuJawa changed the title [REVIEW] Optimize dask.uniform_neighbor_sample [WIP] Optimize dask.uniform_neighbor_sample Nov 10, 2022

VibhuJawa commented Nov 17, 2022

@jnke2016 ,

Swapping client.submit to client.map runs into issues with work stealing (see internal thread). Swapping it back gets rid of the hang.
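For reference, one way to keep per-worker tasks from being moved by the scheduler when using `client.submit` is to pin each task to its worker. This is a hedged sketch; the helper name `submit_pinned` is made up for illustration (it is not code from this PR), but `workers=` and `allow_other_workers=` are real `distributed.Client.submit` parameters:

```python
def submit_pinned(client, func, per_worker_args):
    """Hypothetical helper: submit one task per Dask worker, pinned so the
    scheduler cannot steal it onto another worker."""
    futures = []
    for worker_addr, args in per_worker_args.items():
        futures.append(
            client.submit(
                func,
                *args,
                workers=[worker_addr],      # run only on this worker...
                allow_other_workers=False,  # ...and forbid work stealing
                pure=False,                 # do not deduplicate identical calls
            )
        )
    return futures
```

Pinning trades load balancing for predictable placement, which matters when each task must run next to the data (and comms handle) already resident on a specific GPU.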

We still see speedups (see the updated benchmarks in the PR description, from an 8 GPU cluster), but we will not be as fast as we should be on large clusters. (We currently lose 5 ms per worker.)

Anyways, I have verified that your script works on 1,2,3,4,5,6,7,8 GPUs with 50 iterations each without hanging.

I will make the cluster-agnostic changes in 23.02, as they are more involved.

Please feel free to review again.

@VibhuJawa VibhuJawa changed the title [WIP] Optimize dask.uniform_neighbor_sample [REVIEW] Optimize dask.uniform_neighbor_sample Nov 17, 2022
@VibhuJawa VibhuJawa requested review from a team as code owners November 17, 2022 03:59

@VibhuJawa VibhuJawa changed the base branch from branch-22.12 to branch-22.10 November 17, 2022 04:00
@VibhuJawa VibhuJawa changed the base branch from branch-22.10 to branch-22.12 November 17, 2022 04:01
@VibhuJawa

@rerun tests

@wangxiaoyunNV (Contributor) left a comment
It looks good.

@wangxiaoyunNV wangxiaoyunNV reopened this Nov 17, 2022
@jnke2016 (Contributor) left a comment
This looks good to me.

@BradReesWork (Member)

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 34e3dd7 into rapidsai:branch-22.12 Nov 18, 2022
Labels
improvement (Improvement / enhancement to an existing function), non-breaking (Non-breaking change)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA]: Reduce the dask overhead in uniform_neighbor_sample
6 participants