Incorrect Shuffle results with dask-cuda 24.06 & above #134

Closed
ayushdg opened this issue Jul 1, 2024 · 3 comments
Labels
bug Something isn't working

Comments

ayushdg (Collaborator) commented Jul 1, 2024

Describe the bug

Seeing incorrect shuffle results (fewer rows written) when using dask-cuda 24.06 and above.
Narrowed it down to the explicit-comms changes in rapidsai/dask-cuda#1323.

Can also confirm that with explicit comms disabled, the incorrect results do not occur.

Steps/Code to reproduce bug

Nothing minimal yet.

Expected behavior

Correct number of resulting rows.
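As a hedged illustration of the invariant being violated (this is not the dask-cuda or explicit-comms implementation), a hash-partition shuffle must preserve every input row. The helpers `hash_shuffle` and `check_shuffle` below are hypothetical and use plain Python lists instead of dataframes. They show the kind of row-count assertion a minimal reproducer could make.

```python
from collections import Counter


def hash_shuffle(rows, key, npartitions):
    """Hypothetical hash-partition shuffle: route each row to partition hash(key) % n."""
    partitions = [[] for _ in range(npartitions)]
    for row in rows:
        partitions[hash(row[key]) % npartitions].append(row)
    return partitions


def _row_multiset(rows):
    """Count rows as sorted item tuples so lost or duplicated rows are both detected."""
    return Counter(tuple(sorted(row.items())) for row in rows)


def check_shuffle(rows, partitions, key):
    """Return True if the shuffle kept every row exactly once and each key sits in one partition."""
    shuffled = [row for part in partitions for row in part]
    # The symptom in this issue: fewer rows come out than went in.
    if len(shuffled) != len(rows):
        return False
    if _row_multiset(shuffled) != _row_multiset(rows):
        return False
    # After a correct shuffle on `key`, all rows with the same key share a partition.
    owners = {}
    for idx, part in enumerate(partitions):
        for row in part:
            if owners.setdefault(row[key], idx) != idx:
                return False
    return True


rows = [{"id": i, "key": i % 7} for i in range(1000)]
parts = hash_shuffle(rows, "key", npartitions=4)
print(sum(len(p) for p in parts))  # 1000
print(check_shuffle(rows, parts, "key"))  # True
```

Dropping a partition (for example `check_shuffle(rows, parts[:-1], "key")`) makes the check return `False`, which mirrors the "fewer rows written" symptom reported above.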

Environment overview

  • Environment location: bare-metal
  • Method of NeMo-Curator install: from source

Environment details


  • OS version: Ubuntu 22.04
  • Dask version: 2025.5.1, dask-cuda 24.06
  • Python version: 3.10


ayushdg added the bug label Jul 1, 2024
VibhuJawa (Collaborator) commented Jul 29, 2024

Is this closed by #147?

ayushdg (Collaborator, Author) commented Jul 30, 2024

#147 skips explicit comms for 24.06. I haven't had a chance to test rapidsai/dask-cuda#1356 with the newer 24.08 versions to see whether explicit comms works as expected there. I'd like to keep this open until that's verified.

ayushdg (Collaborator, Author) commented Sep 10, 2024

Fixed via rapidsai/dask-cuda#1356

ayushdg closed this as completed Sep 10, 2024