[C++][Gandiva] A question about Gandiva's GetResultType for DecimalType #40308

ZhangHuiGui · 2024-03-01T14:14:46Z

Describe the bug, including details regarding any error messages, version, and platform.

Below codes calculate the decimal type's precision and scale.

arrow/cpp/src/gandiva/decimal_type_util.cc

Line 62 in 30e6d72

result_scale = std::max(kMinAdjustedScale, s1 + p2 + 1);

It seems different from our Redshift’s decimal promotion rules, is there any relevant background here?

scale = max(4, s1 + p2 - s2 + 1)
precision = p1 - s1 + s2 + scale

It may cause the decimal divide results different from normal expression calculate.

Component(s)

C++ - Gandiva

kou · 2024-03-03T08:17:16Z

@pravindra Could you take a look at this?

ZhangHuiGui · 2024-03-09T09:49:45Z

I think it's a problem and we should fix it.
Below codes could prove this problem:

// Normal expression
TEST(DecimalTest, Divide) {
  auto expr = arrow::compute::call("divide", {arrow::compute::field_ref("f1"),
                                              arrow::compute::field_ref("f2")});
  auto s = arrow::schema({arrow::field("f1", arrow::decimal128(11, 3)),
                          arrow::field("f2", arrow::decimal128(20, 9))});
  ASSERT_OK_AND_ASSIGN(expr, expr.Bind(*s));
  arrow::AssertTypeEqual(*expr.type(), *arrow::decimal128(32, 15));

  auto b = arrow::RecordBatchFromJSON(s, R"([
    ["1.982", "1.000000001"]
   ])");
  ASSERT_OK_AND_ASSIGN(auto input, arrow::compute::MakeExecBatch(*s, b));
  ASSERT_OK_AND_ASSIGN(auto res,
                       arrow::compute::ExecuteScalarExpression(expr, input));
  arrow::AssertArraysEqual(
      *res.make_array(),
      *arrow::ArrayFromJSON(arrow::decimal128(32, 15), R"(["1.981999998018000"])"));
}

// gandiva test
TEST_F(TestDecimalOps, TestDivide) {
  DivideAndVerify(decimal_literal("1982", 11, 3),
                  decimal_literal("1000000001", 20, 9),
                  decimal_literal("1981999998018000001982", 38, 21));
}

We will have different precision and scale for the same data.

kou · 2024-03-14T03:03:05Z

Below codes could prove this problem:

I'm not sure what is the important part in the codes. Could you explain it?

our Redshift’s decimal promotion rules

Could you show the URL that describes the rule?

is there any relevant background here?

Based on the kMinAdjustedScale comment, it seems that the current precision doesn't have strong reason but it's compatible with SQLServer and Impala:

arrow/cpp/src/gandiva/decimal_type_util.h

Lines 54 to 58 in 6b1e254

    
           // When operating on decimal inputs, the integer part of the output can exceed the 
        
           // max precision. In such cases, the scale can be reduced, up to a minimum of 
        
           // kMinAdjustedScale. 
        
           // * There is no strong reason for 6, but both SQLServer and Impala use 6 too. 
        
           static constexpr int32_t kMinAdjustedScale = 6;

ZhangHuiGui · 2024-03-14T03:25:03Z

I'm not sure what is the important part in the codes. Could you explain it?

The test codes are the decimal divide operation in compute expression and gandiva expression.
We input the save decimal type (11, 3), (20, 9) with same input arr-data, but the results are different.

For compute expression, divide result's decimal type is (32,15); but gandiva expression's result is (38, 21).

Could you show the URL that describes the rule?

Below rules is our compute expression's rules:

arrow/cpp/src/arrow/compute/kernels/codegen_internal.cc

Line 421 in 9f0a28f

    
           // https://docs.aws.amazon.com/redshift/latest/dg/r_numeric_computations201.html

And the compute doc:

arrow/docs/source/cpp/compute.rst

Lines 501 to 520 in 4fe364e

    
           * \(1) Precision and scale of computed DECIMAL results 
        
             +------------+---------------------------------------------+ 
        
             | Operation  | Result precision and scale                  | 
        
             +============+=============================================+ 
        
             | | add      | | scale = max(s1, s2)                       | 
        
             | | subtract | | precision = max(p1-s1, p2-s2) + 1 + scale | 
        
             +------------+---------------------------------------------+ 
        
             | multiply   | | scale = s1 + s2                           | 
        
             |            | | precision = p1 + p2 + 1                   | 
        
             +------------+---------------------------------------------+ 
        
             | divide     | | scale = max(4, s1 + p2 - s2 + 1)          | 
        
             |            | | precision = p1 - s1 + s2 + scale          | 
        
             +------------+---------------------------------------------+ 
        
             It's compatible with Redshift's decimal promotion rules. All decimal digits 
        
             are preserved for `add`, `subtract` and `multiply` operations. The result 
        
             precision of `divide` is at least the sum of precisions of both operands with 
        
             enough scale kept. Error is returned if the result precision is beyond the 
        
             decimal value range.

.

Based on the kMinAdjustedScale comment, it seems that the current precision doesn't have strong reason but it's compatible with SQLServer and Impala:

Yes, but maybe we should unify these rules between compute expression and gandiva expression. Or the results will be different when user do optimize expression with gandiva?

kou · 2024-03-14T04:58:42Z

Ah, I understand. You referred our compute module's behavior by "our Redshift’s decimal promotion rules". I thought that you refer Redshift's rules not our compute module's one.

I don't object that Gandiva can also use the rules but it breaks backward compatibility. Can we support both rules and use the current rule by default for compatibility?

ZhangHuiGui · 2024-03-14T06:01:25Z

I thought that you refer Redshift's rules not our compute module's one.

Actually our compute module's decimal rules is the same with Redshift's rules. Compute module implement the rules by two step:

Implicit cast. The logic is in CastBinaryDecimalArgs.
Output resolve. The logic is in ResolveDecimalDivisionOutput.

All these two steps are in expression Bind, and form the rules same with Redshift's numeric rules.

I don't object that Gandiva can also use the rules but it breaks backward compatibility. Can we support both rules and use the current rule by default for compatibility?

Agree.

ZhangHuiGui · 2024-03-14T14:57:19Z

@kou Added the compatibility in the pr. PTAL?

…motion rules (#40434) ### Rationale for this change Gandiva decimal divide rules are different with our compute module's rules. Some systems such as Redshift use the same rules as our compute module's rules. So it's useful that Gandiva support our compute module's rules too. ### What changes are included in this PR? Support an option argument in GetResultType for compatibilty with **compute module's decimal promotion rules**. ### Are these changes tested? Yes ### Are there any user-facing changes? No * GitHub Issue: #40308 Authored-by: ZhangHuiGui <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>

kou · 2024-03-25T05:02:30Z

Issue resolved by pull request 40434
#40434

ZhangHuiGui added the Type: bug label Mar 1, 2024

github-actions bot added the Component: Gandiva label Mar 1, 2024

github-actions bot mentioned this issue Mar 9, 2024

GH-40308: [C++][Gandiva] Add support for compute module's decimal promotion rules #40434

Merged

github-actions bot assigned ZhangHuiGui Mar 9, 2024

kou changed the title ~~[C++][Gandiva] A question about gandiva's GetResultType for DecimalType~~ [C++][Gandiva] A question about Gandiva's GetResultType for DecimalType Mar 12, 2024

kou added this to the 16.0.0 milestone Mar 25, 2024

kou closed this as completed Mar 25, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[C++][Gandiva] A question about Gandiva's GetResultType for DecimalType #40308

[C++][Gandiva] A question about Gandiva's GetResultType for DecimalType #40308

ZhangHuiGui commented Mar 1, 2024 •

edited by kou

Loading

kou commented Mar 3, 2024

ZhangHuiGui commented Mar 9, 2024 •

edited

Loading

kou commented Mar 14, 2024

ZhangHuiGui commented Mar 14, 2024 •

edited

Loading

kou commented Mar 14, 2024

ZhangHuiGui commented Mar 14, 2024

ZhangHuiGui commented Mar 14, 2024

kou commented Mar 25, 2024

[C++][Gandiva] A question about Gandiva's GetResultType for DecimalType #40308

[C++][Gandiva] A question about Gandiva's GetResultType for DecimalType #40308

Comments

ZhangHuiGui commented Mar 1, 2024 • edited by kou Loading

Describe the bug, including details regarding any error messages, version, and platform.

Component(s)

kou commented Mar 3, 2024

ZhangHuiGui commented Mar 9, 2024 • edited Loading

kou commented Mar 14, 2024

ZhangHuiGui commented Mar 14, 2024 • edited Loading

kou commented Mar 14, 2024

ZhangHuiGui commented Mar 14, 2024

ZhangHuiGui commented Mar 14, 2024

kou commented Mar 25, 2024

ZhangHuiGui commented Mar 1, 2024 •

edited by kou

Loading

ZhangHuiGui commented Mar 9, 2024 •

edited

Loading

ZhangHuiGui commented Mar 14, 2024 •

edited

Loading