
[C++][Compute] Use generic hash-aggregate for DictionaryArrays #28104

Open
asfimport opened this issue Apr 8, 2021 · 3 comments


asfimport (Collaborator) commented Apr 8, 2021

When computing unique on chunked DictionaryArrays, we currently iterate over all chunks, unify their dictionaries, and then collect the chunk indices. We could avoid the dictionary unification step by using a generic hash-aggregate instead.

See discussion here and here
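To make the trade-off concrete, here is a minimal sketch of the two strategies using only standard-library containers. It is not Arrow code: `DictChunk`, `UniqueViaUnification`, and `UniqueViaGenericHash` are hypothetical names, and a chunk is modeled simply as a dictionary of strings plus int32 indices into it. The first function mimics the current approach (unify dictionaries, remap indices, collect distinct unified indices); the second hashes the referenced values directly, so no unified dictionary or per-chunk transpose map is ever built.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for one chunk of a dictionary-encoded array:
// indices[i] refers to dictionary[indices[i]]. NOT Arrow's DictionaryArray.
struct DictChunk {
  std::vector<std::string> dictionary;
  std::vector<int32_t> indices;
};

// Current strategy (simplified): unify every chunk's dictionary into one,
// translate each chunk's local indices into unified indices, then keep the
// distinct unified indices in first-occurrence order.
std::vector<std::string> UniqueViaUnification(const std::vector<DictChunk>& chunks) {
  std::vector<std::string> unified;                     // unified dictionary
  std::unordered_map<std::string, int32_t> unified_pos;  // value -> unified index
  std::unordered_set<int32_t> seen;
  std::vector<int32_t> seen_order;
  for (const auto& chunk : chunks) {
    // Per-chunk transpose map: local dictionary index -> unified index.
    std::vector<int32_t> transpose(chunk.dictionary.size());
    for (std::size_t i = 0; i < chunk.dictionary.size(); ++i) {
      auto [it, inserted] = unified_pos.try_emplace(
          chunk.dictionary[i], static_cast<int32_t>(unified.size()));
      if (inserted) unified.push_back(chunk.dictionary[i]);
      transpose[i] = it->second;
    }
    for (int32_t idx : chunk.indices) {
      assert(idx >= 0 && static_cast<std::size_t>(idx) < transpose.size());
      int32_t u = transpose[idx];
      if (seen.insert(u).second) seen_order.push_back(u);
    }
  }
  std::vector<std::string> out;
  out.reserve(seen_order.size());
  for (int32_t u : seen_order) out.push_back(unified[u]);
  return out;
}

// Generic-hash strategy: look up each referenced value in a single hash set
// shared across chunks. No unified dictionary or transpose map is built.
std::vector<std::string> UniqueViaGenericHash(const std::vector<DictChunk>& chunks) {
  std::unordered_set<std::string> seen;
  std::vector<std::string> out;
  for (const auto& chunk : chunks) {
    for (int32_t idx : chunk.indices) {
      assert(idx >= 0 && static_cast<std::size_t>(idx) < chunk.dictionary.size());
      const std::string& value = chunk.dictionary[idx];
      if (seen.insert(value).second) out.push_back(value);
    }
  }
  return out;
}
```

Both functions return the distinct values in first-occurrence order, so they agree on any input. For chunks `{{"a", "b", "c"}, {0, 2, 2, 1}}` and `{{"c", "d"}, {1, 0, 1}}`, each yields `a, c, b, d`. The sketch hashes a value once per row for simplicity. A real kernel would more likely mark which local indices each chunk uses and hash only those dictionary entries once per chunk, which is still cheaper than a full unification pass.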

Reporter: Rok Mihevc / @rok

Related issues:

Note: This issue was originally created as ARROW-12301. Please see the migration documentation for further details.

@asfimport · Collaborator · Author

Niranda Perera / @nirandaperera:
@rok do you think this is similar to ARROW-9773?

@asfimport · Collaborator · Author

Neal Richardson / @nealrichardson:
Anything relating to hashing should be coordinated with the ongoing query engine work (ARROW-12633); cc @michalursa

@asfimport · Collaborator · Author

Rok Mihevc / @rok:
@nirandaperera Sounds like it could be. Adding both to ARROW-12633.
