Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++][Gandiva] A question about Gandiva's GetResultType for DecimalType #40308

Closed
ZhangHuiGui opened this issue Mar 1, 2024 · 8 comments
Closed

Comments

@ZhangHuiGui
Copy link
Collaborator

ZhangHuiGui commented Mar 1, 2024

Describe the bug, including details regarding any error messages, version, and platform.

Below codes calculate the decimal type's precision and scale.

result_scale = std::max(kMinAdjustedScale, s1 + p2 + 1);

It seems different from our Redshift’s decimal promotion rules, is there any relevant background here?

scale = max(4, s1 + p2 - s2 + 1)
precision = p1 - s1 + s2 + scale

It may cause the decimal divide results different from normal expression calculate.

Component(s)

C++ - Gandiva

@kou
Copy link
Member

kou commented Mar 3, 2024

@pravindra Could you take a look at this?

@ZhangHuiGui
Copy link
Collaborator Author

ZhangHuiGui commented Mar 9, 2024

I think it's a problem and we should fix it.
Below codes could prove this problem:

// Normal expression
TEST(DecimalTest, Divide) {
  auto expr = arrow::compute::call("divide", {arrow::compute::field_ref("f1"),
                                              arrow::compute::field_ref("f2")});
  auto s = arrow::schema({arrow::field("f1", arrow::decimal128(11, 3)),
                          arrow::field("f2", arrow::decimal128(20, 9))});
  ASSERT_OK_AND_ASSIGN(expr, expr.Bind(*s));
  arrow::AssertTypeEqual(*expr.type(), *arrow::decimal128(32, 15));

  auto b = arrow::RecordBatchFromJSON(s, R"([
    ["1.982", "1.000000001"]
   ])");
  ASSERT_OK_AND_ASSIGN(auto input, arrow::compute::MakeExecBatch(*s, b));
  ASSERT_OK_AND_ASSIGN(auto res,
                       arrow::compute::ExecuteScalarExpression(expr, input));
  arrow::AssertArraysEqual(
      *res.make_array(),
      *arrow::ArrayFromJSON(arrow::decimal128(32, 15), R"(["1.981999998018000"])"));
}

// gandiva test
TEST_F(TestDecimalOps, TestDivide) {
  DivideAndVerify(decimal_literal("1982", 11, 3),
                  decimal_literal("1000000001", 20, 9),
                  decimal_literal("1981999998018000001982", 38, 21));
}

We will have different precision and scale for the same data.

@kou kou changed the title [C++][Gandiva] A question about gandiva's GetResultType for DecimalType [C++][Gandiva] A question about Gandiva's GetResultType for DecimalType Mar 12, 2024
@kou
Copy link
Member

kou commented Mar 14, 2024

Below codes could prove this problem:

I'm not sure what is the important part in the codes. Could you explain it?

our Redshift’s decimal promotion rules

Could you show the URL that describes the rule?

is there any relevant background here?

Based on the kMinAdjustedScale comment, it seems that the current precision doesn't have strong reason but it's compatible with SQLServer and Impala:

// When operating on decimal inputs, the integer part of the output can exceed the
// max precision. In such cases, the scale can be reduced, up to a minimum of
// kMinAdjustedScale.
// * There is no strong reason for 6, but both SQLServer and Impala use 6 too.
static constexpr int32_t kMinAdjustedScale = 6;

@ZhangHuiGui
Copy link
Collaborator Author

ZhangHuiGui commented Mar 14, 2024

I'm not sure what is the important part in the codes. Could you explain it?

The test codes are the decimal divide operation in compute expression and gandiva expression.
We input the save decimal type (11, 3), (20, 9) with same input arr-data, but the results are different.

For compute expression, divide result's decimal type is (32,15); but gandiva expression's result is (38, 21).

Could you show the URL that describes the rule?

Below rules is our compute expression's rules:

// https://docs.aws.amazon.com/redshift/latest/dg/r_numeric_computations201.html

And the compute doc:

* \(1) Precision and scale of computed DECIMAL results
+------------+---------------------------------------------+
| Operation | Result precision and scale |
+============+=============================================+
| | add | | scale = max(s1, s2) |
| | subtract | | precision = max(p1-s1, p2-s2) + 1 + scale |
+------------+---------------------------------------------+
| multiply | | scale = s1 + s2 |
| | | precision = p1 + p2 + 1 |
+------------+---------------------------------------------+
| divide | | scale = max(4, s1 + p2 - s2 + 1) |
| | | precision = p1 - s1 + s2 + scale |
+------------+---------------------------------------------+
It's compatible with Redshift's decimal promotion rules. All decimal digits
are preserved for `add`, `subtract` and `multiply` operations. The result
precision of `divide` is at least the sum of precisions of both operands with
enough scale kept. Error is returned if the result precision is beyond the
decimal value range.
.

Based on the kMinAdjustedScale comment, it seems that the current precision doesn't have strong reason but it's compatible with SQLServer and Impala:

Yes, but maybe we should unify these rules between compute expression and gandiva expression. Or the results will be different when user do optimize expression with gandiva?

@kou
Copy link
Member

kou commented Mar 14, 2024

Ah, I understand. You referred our compute module's behavior by "our Redshift’s decimal promotion rules". I thought that you refer Redshift's rules not our compute module's one.

I don't object that Gandiva can also use the rules but it breaks backward compatibility. Can we support both rules and use the current rule by default for compatibility?

@ZhangHuiGui
Copy link
Collaborator Author

I thought that you refer Redshift's rules not our compute module's one.

Actually our compute module's decimal rules is the same with Redshift's rules. Compute module implement the rules by two step:

  1. Implicit cast. The logic is in CastBinaryDecimalArgs.
  2. Output resolve. The logic is in ResolveDecimalDivisionOutput.

All these two steps are in expression Bind, and form the rules same with Redshift's numeric rules.

I don't object that Gandiva can also use the rules but it breaks backward compatibility. Can we support both rules and use the current rule by default for compatibility?

Agree.

@ZhangHuiGui
Copy link
Collaborator Author

@kou Added the compatibility in the pr. PTAL?

kou pushed a commit that referenced this issue Mar 25, 2024
…motion rules (#40434)

### Rationale for this change

Gandiva decimal divide rules are different with our compute module's rules. Some systems such as Redshift use the same rules as our compute module's rules. So it's useful that Gandiva support our compute module's rules too. 

### What changes are included in this PR?
Support an option argument in GetResultType for compatibilty with  **compute module's decimal promotion rules**.

### Are these changes tested?
Yes

### Are there any user-facing changes?
No

* GitHub Issue: #40308

Authored-by: ZhangHuiGui <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
@kou
Copy link
Member

kou commented Mar 25, 2024

Issue resolved by pull request 40434
#40434

@kou kou added this to the 16.0.0 milestone Mar 25, 2024
@kou kou closed this as completed Mar 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants