[C++] Wrong and low inefficient expression execution for [if/else, case/when ... etc] expression #41094

ZhangHuiGui · 2024-04-09T07:42:58Z

Describe the bug, including details regarding any error messages, version, and platform.

A bug is found here. When we execute a similar case when a != 0 then b / a, div 0 exception will be reported. But logically, such a statement should be executed.
The reason why the above execution will report an error is because of the existing ExecuteScalarExpression execution system logic, as shown in the following code:

for (size_t i = 0; i < arguments.size(); ++i) {
     ARROW_ASSIGN_OR_RAISE(
         arguments[i], ExecuteScalarExpression(call->arguments[i], input, exec_context));
     if (arguments[i].is_array()) {
       all_scalar = false;
     }
   }

For the above statement, the expression arguments[0] corresponding to the judgment condition a != 0 will be executed first, and then the arguments[1] corresponding to b / a will be executed. Obviously, logically speaking, b / a should not be fully executed, it should be partially executed based on a != 0

Because if-else related expressions have a special execution order, should we extend ExecuteScalarIfElseExpression to handle such expressions separately? The execution order is as follows:

First execute the condition corresponding to the if statement
Filter the corresponding data based on the execution result of the if statement
Perform the remaining operations

To do this, we obviously need to extend Expression to support if-else related logic and callbacks

laotan332 [email protected]

Component(s)

C++

ZhangHuiGui · 2024-04-09T07:47:04Z

@kou @bkietz @pitrou @lidavidm Any suggestion on this issue? Seems we need to refactor the if/else (and other similar kernel)'s execute logic to solve the problem and also improve it's performance?

mapleFU · 2024-04-10T04:55:45Z

A bug is found here. When we execute a similar case when a != 0 then b / a, div 0 exception will be reported. But logically, such a statement should be executed.

should not be executed?

I think a Selector would be help here. But I don't know whether it breaks the current design. I'll dive into it later

qinpengxiang001 · 2024-04-10T06:00:06Z

A bug is found here. When we execute a similar case when a != 0 then b / a, div 0 exception will be reported. But logically, such a statement should be executed.

should not be executed?

I think a Selector would be help here. But I don't know whether it breaks the current design. I'll dive into it later

Corresponding to the case of a = 0, b / a should not be executed, but in fact, the existing logic has already executed

bkietz · 2024-04-23T19:54:10Z

We have talked about adding support for selection vectors but never got around to adding support for them: https://github.com/bkietz/arrow/blob/c25866bc59e30e43e8dc3b05f3973d48074c3594/cpp/src/arrow/compute/exec.h#L199-L204

If you're interested in adding support for them, that would be greatly appreciated

ZhangHuiGui · 2024-04-24T13:32:15Z

If you're interested in adding support for them, that would be greatly appreciated

Thanks for your suggestion, we'll take a look.

felipecrv · 2024-04-30T23:33:25Z

Because if-else related expressions have a special execution order, should we extend ExecuteScalarIfElseExpression to handle such expressions separately?

The solution should not be specific to if-then-else. Other constructs pose similar challenges (e.g. CASE WHEN):

Velox (a vectorized query engine) calls these constructs "special forms". Lisp uses the same terminology. Although if and cond look like function calls in Lisp [1][2], they are special in that not all arguments are eagerly evaluated:

(if (> den 0) (/ num den) 0)

Other special forms are and, or, and cond:

(and expr ...)
(or expr ...)

(cond [test-expr body ...+]
      ...)

and is special because after the first false you get, you don't have to evaluate more input expressions. or is special because after the first true you get, you... cond is like CASE .. WHEN in SQL — a chain of if-then-else's.

This can't be done properly when if_then_else, case_when, and, or are implemented as functions like in Arrow currently. We need to create special expression fragments that can receive unevaluated expressions as input parameters.

They would look like compute::Expression::Call that can receive compute::Expression instances as parameters [3].

-  using Impl = std::variant<Datum, Parameter, Call>;
+  using Impl = std::variant<Datum, Parameter, Call, Special>;

Vectorized evaluation of special forms

Here I will try to expand on what @bkietz meant by "support for selection vectors".

Let's say we want to eval the following special form:

if bool_expr then list_value_length(a) else list_value_length(b)

bool_expr is always evaluated. When it's evaluated, a Type::BOOL array is produced.

The naive algorithm to evaluate the whole expression would then, be:

for (int64_t i = 0; i < cond.length; i++) {
  if (GetBit(cond.buffers[1].data, i)) {
    out[i] = list_value_length(a, i);
  } else {
    out[i] = list_value_length(b, i);
  }
}

this would be very inefficient since we're dealing with columnar data layouts here.

The vectorization friendly solution is to make every compute function "maskable"/"selectable". This can be made efficient if list_value_length can receive an extra "filter" parameter that selects the positions that list_value_length should read read from input and write to output:

auto cond = Evaluate(bool_expr);
auto then_selection_vec = BoolToSelVec(cond);
auto else_selection_vec = BoolToSelVec(not(cond));
auto out = PreallocateResultArray();
call list_value_length::ExecWithSel(then_selection_vec, a, &out);
call list_value_lenght::ExecWithSel(else_selection_vec, b, &out);

A "selection vector" is an array of indices. GetTakeIndicesFromBitmap [4] produces one from boolean arrays/bitmaps. By passing the selection vectors to the kernels, you allow each kernel to vectorize as is most appropriate to what it's doing: for instance, list_value_length can gather the offsets of the indices in the selection vector, perform all the subtractions to calculate the sizes, and scatter the values into the value buffer in out.

Incremental improvement

It's unrealistic to expect we will specialize all kernel functions to handle the optional selection vector parameter, but we should at least push them all the way down to Function::Execute. If a Function doesn't handle selection vectors, then a call is dispatched for all the values (as it is today) and code similar to Take is used to gather the values that the selection vector selects. The more functions become aware of selection vectors, the less we have to rely on this slow code.

It's not a trivial project.

[1] https://courses.cs.northwestern.edu/325/readings/special-forms.html
[2] https://docs.racket-lang.org/guide/conditionals.html
[3]

arrow/cpp/src/arrow/compute/expression.h

Line 136 in 0d7fac0

using Impl = std::variant<Datum, Parameter, Call>;

[4] https://github.com/apache/arrow/blob/main/cpp/src/arrow/compute/kernels/vector_selection_take_internal.cc#L275

ZhangHuiGui · 2024-05-01T05:31:37Z

It would be very grateful if you move this forward.

This can't be done properly when if_then_else, case_when, and, or are implemented as functions like in Arrow currently. We need to create special expression fragments that can receive unevaluated expressions as input parameters.

They would look like compute::Expression::Call that can receive compute::Expression instances as parameters [3].

The implementation method you mentioned is more elegant, and it basically does not affect the scheduling of the expression system of other functions.
We could focus on the special cases in expression logic.

And implement a scheduling framework so that relevant special functions can be Incremental pushed into this framework later.

mapleFU · 2024-05-01T08:50:32Z

Besides, Velox supports filter reordering, so error might be handled in a kleene logic way. I think introducing a special form firstly is ok

felipecrv · 2024-05-01T15:00:29Z

The implementation method you mentioned is more elegant, and it basically does not affect the scheduling of the expression system of other functions.
We could focus on the special cases in expression logic.

The starting point is making special forms representable in Expression::Impl, then adding placeholder implementations that don't prevent the full vectorization.

zanmato1984 · 2024-05-29T14:38:04Z

Hi @felipecrv , I'm interested in this and doing some POC in my local (thanks for the instructive proposal!). One thing that I'm having trouble with is that, to allow the kernels to become "selection vector aware" incrementally:

Incremental improvement

It's unrealistic to expect we will specialize all kernel functions to handle the optional selection vector parameter, but we should at least push them all the way down to Function::Execute. If a Function doesn't handle selection vectors, then a call is dispatched for all the values (as it is today) and code similar to Take is used to gather the values that the selection vector selects. The more functions become aware of selection vectors, the less we have to rely on this slow code.

We not only Take the selected rows to pass to a kernel that has not yet been selection vector aware, but also "scatter" the "partial" output by putting the rows back to the corresponding position to the original input rows. However I couldn't find any existing facility for this scatter operation as handy as Take/Filter for gather. Do you know any? Or do we need a Scatter function, or something more light-weight like Concatenate [1], to be implemented first? Thanks.

[1]

arrow/cpp/src/arrow/array/concatenate.h

Lines 34 to 35 in 4a2df66

    
           Result<std::shared_ptr<Array>> Concatenate(const ArrayVector& arrays, 
        
                                                      MemoryPool* pool = default_memory_pool());

felipecrv · 2024-06-01T01:09:29Z

You're right. It's not just taking/gathering the values, we have to stitch the output array from many different outputs. And you're right — we don't have a scatter facility — but note that for some array types (e.g. list and binary/string), you can't scatter based on destination indices. You will have to append in order from the arrays resulting from each if/else or case/when branch.

This is where the recent additions (list-view and string-view) can shine because you can scatter by index to them in any order as the variable-length data doesn't have to be packed in the same order as the offsets and sizes.

mapleFU · 2024-06-02T09:14:20Z

This is where the recent additions (list-view and string-view) can shine because you can scatter by index to them in any order as the variable-length data doesn't have to be packed in the same order as the offsets and sizes.

Aha, seems that dictionary, string/binary view, ree, list-view should have some unified place for them to execute fastly :-)

felipecrv · 2024-06-02T21:23:01Z

This is where the recent additions (list-view and string-view) can shine because you can scatter by index to them in any order as the variable-length data doesn't have to be packed in the same order as the offsets and sizes.

Aha, seems that dictionary, string/binary view, ree, list-view should have some unified place for them to execute fastly :-)

It's still preferable that the kernels take selection/mask as an auxiliary parameter. It's only the generic scatter phase that's optimal.

REE's and dictionaries can be handled generically for every scalar kernel without problems though. Just pass the values array and re-use the integer buffer in the REE/DICTIONARY array.

mapleFU · 2024-06-03T02:49:34Z

It's still preferable that the kernels take selection/mask as an auxiliary parameter. It's only the generic scatter phase that's optimal.

In the implementation, maybe we can first implement gather-scatter? Since Arrow doesn't gurantee a data type supports un-ordered write currently ( which is different from velox )

zanmato1984 · 2024-06-03T04:11:50Z

In the implementation, maybe we can first implement gather-scatter? Since Arrow doesn't gurantee a data type supports un-ordered write currently ( which is different from velox )

According to @felipecrv , we don't have an existing "scatter" function, so yes we'll have to implement that first.
When this "scatter" function is available, we can build a "naive" evaluation of special form on top of it (as well as "gather") - by doing a centralized "gather the input by the selection vector, pass the selected rows to the dumb kernel, and scatter the kernel's output back to the actual output". This requires NO kernel's awareness of selection vector, making available of an incremental approach of 3.
Gradually make kernel support selection vector as an optional parameter, i.e., selection-vector-aware. Once all done, we don't need 2 any more.

I think both 2 and 3 could potentially benefit from leveraging special attributes of specific data types such as list/string-view, ree and dict, though I'm not exactly sure how. I'm now working on an overall framework, maybe things will become clearer when I get there. I can use some help/comment from you guys then :)

felipecrv · 2024-06-05T19:19:20Z

I think both 2 and 3 could potentially benefit from leveraging special attributes of specific data types such as list/string-view, ree and dict, though I'm not exactly sure how.

The leverage is always a function of type+kernel. The first kernels that deserve good specialization are array_take (the "gather") and the scatter.

For types that support out-of-order writing (list-view, dict, string-view...) you can scatter incrementally:

scatter(branch0, sel0, &output)
scatter(branch1, sel1, &output);
...
scatter(branchn, seln, &output);

(this assumes all the selection vectors are disjoint)

For types that need in-order appending, you will need all selection vectors and merge them:

selections = MinHeapOfSelections{{branch0, sel0, 0}, {branch1, sel1, 0}, ..., {branchn, seln, 0}};
while (!selections.empty()) {
  i = min_selection(selections);
  output_builder.AppendFrom(selections[i].branch, selections[i].pos);
  selections.ExtractMin(/*hint=*/i);
}

These branches and selection vectors are described in this comment about evaluation of case-when #41453 (comment)

I'm now working on an overall framework, maybe things will become clearer when I get there. I can use some help/comment from you guys then :)

The details will become clear only when you try to draft a big change for sure, but for the sake of review an interesting first step would be the representation of the cond special form in compute::Expression with strict/eager evaluation that you later replace.

zanmato1984 · 2024-06-11T11:29:16Z

The details will become clear only when you try to draft a big change for sure, but for the sake of review an interesting first step would be the representation of the cond special form in compute::Expression with strict/eager evaluation that you later replace.

Sure, I'll send out a draft PR soon sketching the whole idea after making the code more readable.

zanmato1984 · 2024-06-11T18:34:20Z

PR #42106 drafted to show the main sketch. I've also put some notes in PR description. @felipecrv would you please to take a look? Thanks.

ZhangHuiGui added the Type: bug label Apr 9, 2024

github-actions bot added the Component: C++ label Apr 9, 2024

felipecrv mentioned this issue Jun 5, 2024

[C++] A more efficient "case_when" specialization for list-view types #41453

Open

github-actions bot mentioned this issue Jun 11, 2024

GH-41094: [C++] Support scalar expression special form #42106

Draft

github-actions bot assigned zanmato1984 Jun 11, 2024

This was referenced Oct 13, 2024

[C++][Compute] "Scatter" vector functions #44393

Open

GH-44393: [C++][Compute] Swizzle vector functions #44394

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[C++] Wrong and low inefficient expression execution for [if/else, case/when ... etc] expression #41094

[C++] Wrong and low inefficient expression execution for [if/else, case/when ... etc] expression #41094

ZhangHuiGui commented Apr 9, 2024 •

edited

Loading

ZhangHuiGui commented Apr 9, 2024 •

edited

Loading

mapleFU commented Apr 10, 2024

qinpengxiang001 commented Apr 10, 2024

bkietz commented Apr 23, 2024

ZhangHuiGui commented Apr 24, 2024

felipecrv commented Apr 30, 2024 •

edited

Loading

ZhangHuiGui commented May 1, 2024

mapleFU commented May 1, 2024

felipecrv commented May 1, 2024

zanmato1984 commented May 29, 2024

Incremental improvement

felipecrv commented Jun 1, 2024 •

edited

Loading

mapleFU commented Jun 2, 2024

felipecrv commented Jun 2, 2024 •

edited

Loading

mapleFU commented Jun 3, 2024

zanmato1984 commented Jun 3, 2024

felipecrv commented Jun 5, 2024

zanmato1984 commented Jun 11, 2024

zanmato1984 commented Jun 11, 2024

[C++] Wrong and low inefficient expression execution for [if/else, case/when ... etc] expression #41094

[C++] Wrong and low inefficient expression execution for [if/else, case/when ... etc] expression #41094

Comments

ZhangHuiGui commented Apr 9, 2024 • edited Loading

Describe the bug, including details regarding any error messages, version, and platform.

Component(s)

ZhangHuiGui commented Apr 9, 2024 • edited Loading

mapleFU commented Apr 10, 2024

qinpengxiang001 commented Apr 10, 2024

bkietz commented Apr 23, 2024

ZhangHuiGui commented Apr 24, 2024

felipecrv commented Apr 30, 2024 • edited Loading

Vectorized evaluation of special forms

Incremental improvement

ZhangHuiGui commented May 1, 2024

mapleFU commented May 1, 2024

felipecrv commented May 1, 2024

zanmato1984 commented May 29, 2024

Incremental improvement

felipecrv commented Jun 1, 2024 • edited Loading

mapleFU commented Jun 2, 2024

felipecrv commented Jun 2, 2024 • edited Loading

mapleFU commented Jun 3, 2024

zanmato1984 commented Jun 3, 2024

felipecrv commented Jun 5, 2024

zanmato1984 commented Jun 11, 2024

zanmato1984 commented Jun 11, 2024

ZhangHuiGui commented Apr 9, 2024 •

edited

Loading

ZhangHuiGui commented Apr 9, 2024 •

edited

Loading

felipecrv commented Apr 30, 2024 •

edited

Loading

felipecrv commented Jun 1, 2024 •

edited

Loading

felipecrv commented Jun 2, 2024 •

edited

Loading