GH-39565: [C++] Do not concatenate chunked values of fixed-width types to run "array_take" #41700
cpp/src/arrow/compute/kernels/vector_selection_take_internal.cc
May I ask an unrelated question: when should we call `assert` and when `DCHECK`? They seem to do much the same thing.
We call `assert` in headers because we don't want to pay the cost of including …
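A minimal sketch of the distinction discussed above, assuming the include being avoided is a logging header. `SKETCH_DCHECK` is a hypothetical stand-in written for illustration and is not Arrow's `DCHECK` definition; both function names are also invented here.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <iostream>

// Header-style helper: only <cassert> is needed, so every translation unit
// that includes this "header" pays very little for the check.
inline int64_t CheckedIndexInHeader(int64_t index, int64_t length) {
  assert(index >= 0 && index < length);
  return index;
}

// Hypothetical DCHECK-like macro (an assumption, NOT Arrow's definition).
// A real one is typically defined in a logging header that pulls in stream
// machinery, which is the kind of include cost a widely used header avoids.
#ifdef NDEBUG
#define SKETCH_DCHECK(cond) ((void)0)
#else
#define SKETCH_DCHECK(cond)                                      \
  do {                                                           \
    if (!(cond)) {                                               \
      std::cerr << "Check failed: " #cond << " at " << __FILE__  \
                << ":" << __LINE__ << std::endl;                 \
      std::abort();                                              \
    }                                                            \
  } while (false)
#endif

// Source-file-style helper: a .cc file can afford the heavier include.
inline int64_t CheckedIndexInSource(int64_t index, int64_t length) {
  SKETCH_DCHECK(index >= 0 && index < length);
  return index;
}
```

In this sketch both checks compile to nothing when `NDEBUG` is defined, so the practical difference is which headers each one drags in and how the failure is reported.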
fbd97a3 to f4b4e12
… make them private (#42127)

### Rationale for this change
Move TakeXXX free functions into `TakeMetaFunction` and make them private

### What changes are included in this PR?
Code move and some small refactorings in preparation for #41700.

### Are these changes tested?
By existing tests.

* GitHub Issue: #42126

Authored-by: Felipe Oliveira Carvalho <[email protected]>
Signed-off-by: Felipe Oliveira Carvalho <[email protected]>
f4b4e12 to df7de46
@@ -60,6 +60,7 @@ void RegisterSelectionFunction(const std::string& name, FunctionDoc doc,
{std::move(kernel_data.value_type), std::move(kernel_data.selection_type)},
OutputType(FirstType));
base_kernel.exec = kernel_data.exec;
base_kernel.exec_chunked = kernel_data.chunked_exec;
The member variable is called `exec_chunked` but the type is called `ChunkedExec` (so confusing). In this PR I ended up sticking to `chunked_exec`. Once everything is reviewed and merged I could try to unify things in the direction people prefer.
467f0f8 to d92da9f
28da5e6 to 2ff6789
cbf1ddb to 72101ab
@pitrou wouldn't it make sense to keep the responsibility for concatenation to a layer above the kernels? Like a query optimizer? They are in a better position to make memory/time trade-offs than the context-less kernel. The worst regression (-81%) has the kernel still at 4G items/sec.
I find it very inelegant to put these heuristics at the compute kernel level. Imagine a pipeline trying to save on memory allocations by keeping the array chunked as much as possible and then a simple filter operation requires allocating enough memory to keep it all in memory. Another case would be a pipeline where the caller is consolidating a big contiguous array for more operations than just …
Ideally perhaps. In practice this assumes that 1) there is a query optimizer 2) it has enough information about implementation details to make an informed decision. In practice, arrow/python/pyarrow/table.pxi Lines 1139 to 1143 in e62fbaa
I might be misreading, but this is the worst regression on the new benchmarks, right (those with a small selection factor)? On the benchmarks with a 1.0 selection factor (such as when sorting), the worst absolute results are around 25 Mitems/sec AFAICT. Or are those results obsolete?
Well, currently "take" would always concatenate array chunks, so at least there is no regression in that regard. Still, I understand the concern. We might want to expose an additional option to entirely disable concatenation when possible. But that might be overkill as well.
We will ensure "array_take" returns a ChunkedArray if at least one input is chunked, just like "take" does. Even when the output fits in a single chunk.
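The output rule in the commit message above can be written as a one-line predicate. This is only an illustrative sketch: `Kind` and `TakeOutputKind` are hypothetical names invented here, not identifiers from the Arrow codebase.

```cpp
#include <cassert>

// Hypothetical tag for the shape of an input or output datum.
enum class Kind { kArray, kChunkedArray };

// Rule stated above: the result is chunked as soon as either input is
// chunked, even if the result would fit in a single chunk.
constexpr Kind TakeOutputKind(Kind values, Kind indices) {
  return (values == Kind::kChunkedArray || indices == Kind::kChunkedArray)
             ? Kind::kChunkedArray
             : Kind::kArray;
}
```

Making the output kind depend only on the input kinds, and never on the output size, keeps the return type predictable for callers.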
…::exec_chunked Before this commit, only the "take" meta function could handle CA parameters.
This is not a time-saver yet because in TakeCC kernels, every call to TakeCA will create a new ValuesSpan instance, but this will change in the next commits.
72101ab to 018320d
@pitrou what conditional checks should I add here to avoid regressions? I'm giving up on making the non-concatenation versions work well for integer arrays and want to merge this PR sooner rather than later and then start working on the string array implementation which is what will unlock most user value in the first place.
By building on this (arguably simplified) analysis:
and assuming the following known values:
Then a simple heuristic could be to concatenate iff (btw, a moderate improvement could probably be achieved by using …
### Rationale for this change
Concatenating a chunked array into a single array before running the `array_take` kernels is very inefficient and can lead to out-of-memory crashes. See also #25822.

### What changes are included in this PR?
* … `"array_take"` that can receive a `ChunkedArray` as `values` and produce an output without concatenating these chunks
* … `TakeMetaFunction` (`"take"`) to make `"array_take"` able to have a `chunked_exec` kernel for all types (some specialized and some based on concatenation)

### Are these changes tested?
By existing tests. Some tests were added in previous PRs that introduced some of the infrastructure to support this.
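To illustrate what taking from chunked values without concatenation can look like for a fixed-width type, here is a minimal self-contained sketch. It is not the implementation in this PR: `Chunks`, `ChunkOffsets` and `TakeFromChunks` are hypothetical names, plain `std::vector`s stand in for Arrow arrays, and a per-index binary search over chunk offsets is assumed as the lookup strategy.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a chunked array of int32 values.
using Chunks = std::vector<std::vector<int32_t>>;

// offsets[i] is the logical index of the first element of chunk i;
// offsets.back() is the total logical length.
std::vector<int64_t> ChunkOffsets(const Chunks& chunks) {
  std::vector<int64_t> offsets;
  offsets.reserve(chunks.size() + 1);
  int64_t total = 0;
  offsets.push_back(total);
  for (const auto& chunk : chunks) {
    total += static_cast<int64_t>(chunk.size());
    offsets.push_back(total);
  }
  return offsets;
}

// Gathers the value at each logical index without first copying all chunks
// into one contiguous buffer. Only the output is allocated.
std::vector<int32_t> TakeFromChunks(const Chunks& chunks,
                                    const std::vector<int64_t>& indices) {
  const std::vector<int64_t> offsets = ChunkOffsets(chunks);
  std::vector<int32_t> out;
  out.reserve(indices.size());
  for (int64_t index : indices) {
    assert(index >= 0 && index < offsets.back());
    // The first offset strictly greater than index belongs to the chunk
    // after the one holding index; this also skips empty chunks.
    const auto it = std::upper_bound(offsets.begin(), offsets.end(), index);
    const auto chunk_idx = static_cast<std::size_t>(it - offsets.begin()) - 1;
    const auto in_chunk = static_cast<std::size_t>(index - offsets[chunk_idx]);
    out.push_back(chunks[chunk_idx][in_chunk]);
  }
  return out;
}
```

For example, with chunks `{10, 11}`, `{}` and `{12, 13, 14}`, the indices `4, 0, 2` resolve to the values `14, 10, 12`. The extra work per index is the chunk lookup, which is the time cost weighed against the memory saved by not concatenating in the discussion above.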