On a system with more visible devices than are needed to hold the distributed `start_vertices` list, a `KeyError` is raised when the `get_two_hop_neighbors()` implementation attempts to access the per-worker data for a worker that received no partition:
File "/home/user/miniconda3/envs/cugraph_dev-23.08/lib/python3.10/site-packages/cugraph/structure/graph_implementation/simpleDistributedGraph.py", line 781, in <listcomp>
start_vertices[w][0],
KeyError: 'tcp://127.0.0.1:46347'
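For illustration, here is a minimal sketch of the failure mode (hypothetical names and values, not the actual cugraph code): when the per-worker map contains entries only for the workers that received data, indexing it with every worker address in the cluster raises the `KeyError` shown above.

```python
# Minimal sketch of the failure mode (hypothetical names, not the cugraph code):
# a worker -> partitions map is built only for the workers that received data,
# but the loop below indexes it with every worker address in the cluster.
from dask.distributed import Client, LocalCluster

if __name__ == "__main__":
    client = Client(LocalCluster(n_workers=4, threads_per_worker=1))
    all_workers = list(client.scheduler_info()["workers"])  # 4 addresses

    # Suppose the (small) start_vertices data landed on only 2 of the 4 workers.
    worker_to_parts = {w: ["start_vertices part"] for w in all_workers[:2]}

    # Indexing the map by every visible worker reproduces the KeyError:
    for w in all_workers:
        part = worker_to_parts[w][0]  # KeyError: 'tcp://127.0.0.1:...' on the 3rd worker
```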
In this case, the system had 4 GPUs, and the workaround was to restrict the run to 2 GPUs:
… start vertices list (#3778)
Closes #3745
This PR replaces the `get_distributed_data()` call with `persist_dask_df_equal_parts_per_worker()` and `get_persisted_df_worker_map()` to avoid a problem where `get_distributed_data()` does not distribute data properly across all workers, which resulted in a `KeyError` when data was accessed for a worker that was not a key in the map.
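A rough sketch of the replacement pattern described above; the import path and signatures are assumptions inferred from the PR description rather than a verified API reference:

```python
# Sketch only: the import path and signatures below are assumptions inferred
# from the PR description; consult the cugraph source for the authoritative API.
from cugraph.dask.common.part_utils import (
    get_persisted_df_worker_map,
    persist_dask_df_equal_parts_per_worker,
)


def build_worker_map(start_vertices_ddf, client):
    """Return a {worker_address: [futures]} map covering only the workers that
    actually hold a part of start_vertices_ddf (hypothetical helper for illustration)."""
    # Persist roughly equal-sized parts of the DataFrame across the workers ...
    persisted = persist_dask_df_equal_parts_per_worker(start_vertices_ddf, client)
    # ... then map each participating worker to the parts it holds, so callers
    # never index the map with a worker that received no data.
    return get_persisted_df_worker_map(persisted, client)
```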
More details are in the [linked issue](#3745).
This PR also does minor refactoring in `get_two_hop_neighbors()` and reorganizes the imports according to [PEP 8](https://peps.python.org/pep-0008/#imports).
Tested manually on a 4-GPU system: the problem described in #3745 was reproduced, the change in this PR was applied, the run was repeated, and the error no longer occurred.
Authors:
- Rick Ratzel (https://github.com/rlratzel)
Approvers:
- Vibhu Jawa (https://github.com/VibhuJawa)
- Brad Rees (https://github.com/BradReesWork)
URL: #3778
The fix is to not assume the `start_vertices` list is always distributed across every worker in the cluster.
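In code terms, the per-worker task submission should iterate over the entries of the worker map rather than over every worker in the cluster; a hedged sketch of the principle, with hypothetical names such as `task_fn` and `worker_map`:

```python
# Hedged sketch of the principle (hypothetical names, not the cugraph source):
# submit work only to the workers that actually hold a piece of start_vertices.
def submit_per_worker(client, task_fn, worker_map, *task_args):
    # worker_map: {worker_address: [futures of start_vertices parts]}
    return [
        client.submit(task_fn, parts[0], *task_args, workers=[w])
        for w, parts in worker_map.items()  # only workers that hold data
    ]
```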