GH-37055: [C++] Optimize hash kernels for Dictionary ChunkedArrays #38394
Conversation
Benchmark result: https://gist.github.com/js8544/d64f289313c814a2df6a87a945a0b382#file-pr38394-txt. There is also a significant improvement in the user's Python case: #37055 (comment)
It was mentioned in #9683 (comment) that we can compute the result of each chunk and then merge them. This PR saves calls to the dictionary unifier; we can further optimize the unifying process itself, which will be done once we have a faster hash table: #38372.
@felipecrv Do you mind having a look at this?
Soon!
```diff
- ARROW_CHECK_OK(unifier->Unify(*arr_dict, &transpose_map));
- ARROW_CHECK_OK(unifier->GetResult(&out_dict_type, &out_dict));
+ RETURN_NOT_OK(dictionary_unifier_->Unify(*arr_dict, &transpose_map));
+ RETURN_NOT_OK(dictionary_unifier_->GetResult(&out_dict_type, &out_dict));

  dictionary_ = out_dict;
```
Do we even need `dictionary_` to be a member variable now? Wouldn't it suffice to perform a single `DictionaryUnifier::GetResult` call at the end?
Documentation for `DictionaryUnifier::GetResult` says the unifier can't be re-used after a call to `GetResult` [1].

My suggestion (that I think will work well): rename `dictionary_` to `first_dictionary_` and change `Append(arr)` to transition through this state machine on each call:
```
----------------------------------------------------------------------------------------------------
Current State                                  Next State
----------------------------------------------------------------------------------------------------
!first_dictionary_ && !dictionary_unifier_ --> first_dictionary_ = arr_dict
                                               UNCHANGED dictionary_unifier_
----------------------------------------------------------------------------------------------------
first_dictionary_ && !dictionary_unifier_  --> if !first_dictionary_.Equals(arr_dict) then
                                                 dictionary_unifier_ = unify(first_dictionary_, arr_dict)
                                                 first_dictionary_ = nullptr
                                               else
                                                 UNCHANGED first_dictionary_, dictionary_unifier_
                                               end
----------------------------------------------------------------------------------------------------
dictionary_unifier_                        --> dictionary_unifier_ = unify(dictionary_unifier_, arr_dict)
----------------------------------------------------------------------------------------------------
```
You will then have to re-think how `dictionary_value_type` and `dictionary` work below.

[1] https://github.com/apache/arrow/blob/main/cpp/src/arrow/array/array_dict.h#L169
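A self-contained sketch of this state machine (illustrative only: `Unifier` here is a toy stand-in for Arrow's `DictionaryUnifier`, dictionaries are plain string vectors, and all names are hypothetical, not Arrow's actual API):

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

using Dict = std::vector<std::string>;

// Toy stand-in for arrow::DictionaryUnifier: accumulates distinct
// values across chunks, preserving first-seen order.
class Unifier {
 public:
  void Unify(const Dict& d) {
    for (const auto& v : d) {
      if (index_.emplace(v, result_.size()).second) result_.push_back(v);
    }
  }
  Dict GetResult() const { return result_; }

 private:
  std::unordered_map<std::string, std::size_t> index_;
  Dict result_;
};

// Models the suggested Append() state machine: keep the first chunk's
// dictionary as-is, and only create (and then reuse) a unifier once a
// chunk with a different dictionary shows up.
class Accumulator {
 public:
  void Append(const Dict& arr_dict) {
    if (!first_dictionary_ && !unifier_) {
      first_dictionary_ = std::make_unique<Dict>(arr_dict);
    } else if (first_dictionary_ && !unifier_) {
      if (*first_dictionary_ != arr_dict) {
        unifier_ = std::make_unique<Unifier>();
        unifier_->Unify(*first_dictionary_);
        unifier_->Unify(arr_dict);
        first_dictionary_.reset();
      }  // else: all chunks share one dictionary, nothing to do
    } else {
      unifier_->Unify(arr_dict);
    }
  }

  // Single GetResult-style call at the end.
  Dict Result() const {
    if (unifier_) return unifier_->GetResult();
    return first_dictionary_ ? *first_dictionary_ : Dict{};
  }

 private:
  std::unique_ptr<Dict> first_dictionary_;
  std::unique_ptr<Unifier> unifier_;
};
```

With identical dictionaries across all chunks, no unifier is ever constructed; on the first mismatch the unifier is created lazily and then reused for every later chunk, which is the saving this PR is after.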
@js8544 to clarify, `dictionary_unifier_ = unify(dictionary_unifier_, arr_dict)` means something like `unifier->unify(arr_dict)` in the actual code :)
Force-pushed from 001a846 to f09ab5c
I don't see any changes other than the rebase.
Still working on it :)
@felipecrv Updated as you suggested.
Thank you!
@js8544 I won't merge my own suggestions before you have a chance to reply and apply them yourself. If you're OK with them and CI passes with them applied I will merge.
Co-authored-by: Felipe Oliveira Carvalho <[email protected]>
The macOS build is failing for some unknown reason and this has been rebased very recently, so I'm merging.
After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit ec41209. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 1 possible false positive for unstable benchmarks that are known to sometimes produce them.
…ays (apache#38394)

* Closes: apache#37055

Lead-authored-by: Jin Shang <[email protected]>
Co-authored-by: Felipe Oliveira Carvalho <[email protected]>
Signed-off-by: Felipe Oliveira Carvalho <[email protected]>
### Rationale for this change

When merging dictionaries across chunks, the hash kernels unnecessarily unify the existing dictionary, dragging down the performance.

### What changes are included in this PR?

Reuse the dictionary unifier across chunks.

### Are these changes tested?

Yes, with a new benchmark for dictionary chunked arrays.

### Are there any user-facing changes?

No.
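The rationale can be illustrated with a toy model (not Arrow code: dictionaries are plain string vectors and both helpers are hypothetical). Re-unifying the whole merged dictionary on every chunk re-hashes the already-merged values once per chunk; a reused unifier folds each chunk's values in exactly once.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

using Dict = std::vector<std::string>;

// Toy pairwise unification: append values of `b` not already in `a`.
static Dict UnifyPair(const Dict& a, const Dict& b) {
  Dict out = a;
  std::unordered_set<std::string> seen(a.begin(), a.end());
  for (const auto& v : b) {
    if (seen.insert(v).second) out.push_back(v);
  }
  return out;
}

// Old approach: re-unify the entire merged dictionary on every chunk,
// so each already-merged value is re-hashed once per chunk.
static Dict MergePerChunk(const std::vector<Dict>& chunks) {
  Dict merged;
  for (const auto& chunk : chunks) merged = UnifyPair(merged, chunk);
  return merged;
}

// New approach: one persistent "unifier" (here just the seen-set) that
// each chunk's values are folded into exactly once.
static Dict MergeWithReusedUnifier(const std::vector<Dict>& chunks) {
  Dict merged;
  std::unordered_set<std::string> seen;
  for (const auto& chunk : chunks) {
    for (const auto& v : chunk) {
      if (seen.insert(v).second) merged.push_back(v);
    }
  }
  return merged;
}
```

Both variants produce the same unified dictionary; the second does an amount of hashing proportional to the total input size rather than growing with the number of chunks times the merged dictionary size.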
`value_counts` extremely slow for chunked `DictionaryArray` #37055