Test failures present in MG bulk sampler tests #3390

Closed · rlratzel opened this issue Mar 29, 2023 · 0 comments · Fixed by #3393
Labels: bug (Something isn't working)

rlratzel (Contributor) commented:
The test run was on a 2-node 16-GPU (total) configuration:

=========================== short test summary info ============================
FAILED tests/sampling/test_bulk_sampler_mg.py::test_bulk_sampler_simple - Run...
FAILED tests/sampling/test_bulk_sampler_mg.py::test_bulk_sampler_mg_graph_sg_input
================== 2 failed, 1 skipped, 3 warnings in 51.13s ===================

Both tests had the same problem:

        bs.flush()

>       recovered_samples = cudf.read_parquet(os.path.join(tempdir_object.name, "rank=0"))

tests/sampling/test_bulk_sampler_mg.py:63:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
    result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:550: in read_parquet
    return _parquet_to_frame(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
    result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:611: in _parquet_to_frame
    _read_parquet(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
    result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:688: in _read_parquet
    return libparquet.read_parquet(
parquet.pyx:123: in cudf._lib.parquet.read_parquet
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   RuntimeError: CUDF failure at: /project/cpp/src/io/parquet/reader_impl_helpers.cpp:262: All sources must have the same number of columns

parquet.pyx:182: RuntimeError
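
The error indicates that the parquet files under the `rank=0` output directory do not all have the same columns. A minimal diagnostic sketch (the path below is a placeholder, not taken from the test) that reads each source individually to locate the mismatch:

```python
# Hypothetical diagnostic, assuming the bulk sampler wrote several parquet
# files into a rank=0 directory. Read each file on its own and compare the
# column sets; reading the whole directory at once requires every source
# to have identical columns.
import glob
import os

import cudf

rank_dir = "/tmp/bulk_sampler_output/rank=0"  # placeholder path

for path in sorted(glob.glob(os.path.join(rank_dir, "*.parquet"))):
    cols = tuple(cudf.read_parquet(path).columns)
    print(os.path.basename(path), cols)

# If any file reports a different set of columns (e.g. a missing batch_id
# or hop_id), cudf.read_parquet(rank_dir) raises the
# "All sources must have the same number of columns" error shown above.
```
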
rlratzel added the bug label on Mar 29, 2023
BradReesWork added this to the 23.04 milestone on Mar 30, 2023
kingmesal removed this from the 23.04 milestone on Mar 30, 2023
rapids-bot closed this as completed in #3393 on Apr 3, 2023
rapids-bot pushed a commit that referenced this issue on Apr 3, 2023 (description below):
This PR fixes a bug where the output sample batch ids do not match the expected ones when using the bulk sampler, resulting in subgraphs that are larger than expected and incorrect. Without reindexing, the wrong batch ids are assigned to the start vertices; reindexing ensures that batch ids and start vertices keep the same order.
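
As an illustration of the alignment issue (not part of the PR description; names and values are placeholders, not the cugraph code), a minimal sketch of why reindexing matters:

```python
# Minimal sketch: assigning a Series to a DataFrame whose index is not the
# default aligns by index, not by position, so batch ids end up attached to
# the wrong start vertices unless the frame is reindexed first.
import cudf

start_vertices = cudf.DataFrame(
    {"start": [10, 11, 12, 13]}, index=[3, 1, 0, 2]  # shuffled index
)
batch_ids = cudf.Series([0, 0, 1, 1])  # intended positional pairing

misaligned = start_vertices.copy()
misaligned["batch_id"] = batch_ids             # aligns on index: [1, 0, 0, 1]

aligned = start_vertices.reset_index(drop=True)
aligned["batch_id"] = batch_ids                # positional: [0, 0, 1, 1]
```
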

This PR also changes the empty dataframe passed to dask in `uniform_neighbor_sample` to match the correct ordering of `batch_id` and `hop_id`. This ensures that the columns are named correctly and are not inadvertently renamed because they were created in a different order.
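
Again as an illustration only (placeholder names, not the actual `uniform_neighbor_sample` code), a sketch of how a column-order mismatch turns into a silent rename:

```python
# Minimal sketch: if the computed output creates hop_id before batch_id while
# the expected schema lists batch_id first, renaming columns positionally
# swaps the two columns' contents. Selecting by name keeps values under the
# right labels.
import cudf

result = cudf.DataFrame(
    {"sources": [0, 1], "destinations": [1, 2], "hop_id": [0, 1], "batch_id": [7, 7]}
)
expected_order = ["sources", "destinations", "batch_id", "hop_id"]

mislabeled = result.copy()
mislabeled.columns = expected_order  # positional rename: hop values now sit under batch_id

correct = result[expected_order]     # reorder by name: values stay with their labels
```
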

This PR is non-breaking because it restores the original bulk sampling behavior and fixes a regression that was inadvertently introduced with the dask updates.

Resolves #3390

Authors:
  - Alex Barghi (https://github.com/alexbarghi-nv)

Approvers:
  - Rick Ratzel (https://github.com/rlratzel)
  - Vibhu Jawa (https://github.com/VibhuJawa)
  - Joseph Nke (https://github.com/jnke2016)

URL: #3393