Remove the hash-merge utility and switch it to use mainline dask.merge #158

Open
VibhuJawa opened this issue Dec 11, 2020 · 0 comments
VibhuJawa commented Dec 11, 2020

We added the hash-merge utility because dask_cudf's repartition was implemented differently from the shuffle function used by dask's merge.

The earlier dask.dataframe shuffle implementation used different code from the dask_cudf one and was more memory-hungry.

We have since upstreamed our repartition function to mainline dask, so its merge should now have the same performance characteristics as the hash-merge utility. We should switch to mainline merge once scale testing confirms similar results.
