-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[C++] Wrong and low inefficient expression execution for [if/else, case/when ... etc] expression #41094
Comments
should not be executed? I think a |
Corresponding to the case of a = 0, b / a should not be executed, but in fact, the existing logic has already executed |
We have talked about adding support for selection vectors but never got around to adding support for them: https://github.com/bkietz/arrow/blob/c25866bc59e30e43e8dc3b05f3973d48074c3594/cpp/src/arrow/compute/exec.h#L199-L204 If you're interested in adding support for them, that would be greatly appreciated |
Thanks for your suggestion, we'll take a look. |
The solution should not be specific to if-then-else. Other constructs pose similar challenges (e.g. Velox (a vectorized query engine) calls these constructs "special forms". Lisp uses the same terminology. Although (if (> den 0) (/ num den) 0) Other special forms are
This can't be done properly when They would look like - using Impl = std::variant<Datum, Parameter, Call>;
+ using Impl = std::variant<Datum, Parameter, Call, Special>; Vectorized evaluation of special formsHere I will try to expand on what @bkietz meant by "support for selection vectors". Let's say we want to eval the following special form:
The naive algorithm to evaluate the whole expression would then, be: for (int64_t i = 0; i < cond.length; i++) {
if (GetBit(cond.buffers[1].data, i)) {
out[i] = list_value_length(a, i);
} else {
out[i] = list_value_length(b, i);
}
} this would be very inefficient since we're dealing with columnar data layouts here. The vectorization friendly solution is to make every compute function "maskable"/"selectable". This can be made efficient if auto cond = Evaluate(bool_expr);
auto then_selection_vec = BoolToSelVec(cond);
auto else_selection_vec = BoolToSelVec(not(cond));
auto out = PreallocateResultArray();
call list_value_length::ExecWithSel(then_selection_vec, a, &out);
call list_value_lenght::ExecWithSel(else_selection_vec, b, &out); A "selection vector" is an array of indices. Incremental improvementIt's unrealistic to expect we will specialize all kernel functions to handle the optional selection vector parameter, but we should at least push them all the way down to It's not a trivial project. [1] https://courses.cs.northwestern.edu/325/readings/special-forms.html arrow/cpp/src/arrow/compute/expression.h Line 136 in 0d7fac0
[4] https://github.com/apache/arrow/blob/main/cpp/src/arrow/compute/kernels/vector_selection_take_internal.cc#L275 |
It would be very grateful if you move this forward.
The implementation method you mentioned is more elegant, and it basically does not affect the scheduling of the expression system of other functions. And implement a scheduling framework so that relevant special functions can be Incremental pushed into this framework later. |
Besides, Velox supports filter reordering, so error might be handled in a kleene logic way. I think introducing a special form firstly is ok |
The starting point is making special forms representable in |
Hi @felipecrv , I'm interested in this and doing some POC in my local (thanks for the instructive proposal!). One thing that I'm having trouble with is that, to allow the kernels to become "selection vector aware" incrementally:
We not only [1] arrow/cpp/src/arrow/array/concatenate.h Lines 34 to 35 in 4a2df66
|
You're right. It's not just taking/gathering the values, we have to stitch the output array from many different outputs. And you're right — we don't have a scatter facility — but note that for some array types (e.g. list and binary/string), you can't scatter based on destination indices. You will have to append in order from the arrays resulting from each if/else or case/when branch. This is where the recent additions (list-view and string-view) can shine because you can scatter by index to them in any order as the variable-length data doesn't have to be packed in the same order as the offsets and sizes. |
Aha, seems that dictionary, string/binary view, ree, list-view should have some unified place for them to execute fastly :-) |
It's still preferable that the kernels take selection/mask as an auxiliary parameter. It's only the generic scatter phase that's optimal. REE's and dictionaries can be handled generically for every scalar kernel without problems though. Just pass the values array and re-use the integer buffer in the REE/DICTIONARY array. |
In the implementation, maybe we can first implement gather-scatter? Since Arrow doesn't gurantee a data type supports un-ordered write currently ( which is different from velox ) |
I think both 2 and 3 could potentially benefit from leveraging special attributes of specific data types such as list/string-view, ree and dict, though I'm not exactly sure how. I'm now working on an overall framework, maybe things will become clearer when I get there. I can use some help/comment from you guys then :) |
The leverage is always a function of type+kernel. The first kernels that deserve good specialization are For types that support out-of-order writing (list-view, dict, string-view...) you can scatter incrementally: scatter(branch0, sel0, &output)
scatter(branch1, sel1, &output);
...
scatter(branchn, seln, &output); (this assumes all the selection vectors are disjoint) For types that need in-order appending, you will need all selection vectors and merge them: selections = MinHeapOfSelections{{branch0, sel0, 0}, {branch1, sel1, 0}, ..., {branchn, seln, 0}};
while (!selections.empty()) {
i = min_selection(selections);
output_builder.AppendFrom(selections[i].branch, selections[i].pos);
selections.ExtractMin(/*hint=*/i);
} These branches and selection vectors are described in this comment about evaluation of case-when #41453 (comment)
The details will become clear only when you try to draft a big change for sure, but for the sake of review an interesting first step would be the representation of the |
Sure, I'll send out a draft PR soon sketching the whole idea after making the code more readable. |
PR #42106 drafted to show the main sketch. I've also put some notes in PR description. @felipecrv would you please to take a look? Thanks. |
Describe the bug, including details regarding any error messages, version, and platform.
A bug is found here. When we execute a similar case when a != 0 then b / a, div 0 exception will be reported. But logically, such a statement should be executed.
The reason why the above execution will report an error is because of the existing ExecuteScalarExpression execution system logic, as shown in the following code:
For the above statement, the expression arguments[0] corresponding to the judgment condition a != 0 will be executed first, and then the arguments[1] corresponding to b / a will be executed. Obviously, logically speaking, b / a should not be fully executed, it should be partially executed based on a != 0
Because if-else related expressions have a special execution order, should we extend ExecuteScalarIfElseExpression to handle such expressions separately? The execution order is as follows:
To do this, we obviously need to extend Expression to support if-else related logic and callbacks
laotan332 [email protected]
Component(s)
C++
The text was updated successfully, but these errors were encountered: