[SPARK-15557][SQL] cast the string into DoubleType when it's used together with decimal #13368
Conversation
```scala
a.makeCopy(Array(Cast(left, DecimalType.SYSTEM_DEFAULT), right))
case a @ BinaryArithmetic(left @ DecimalType.Expression(_, _), right @ StringType()) =>
  a.makeCopy(Array(left, Cast(right, DecimalType.SYSTEM_DEFAULT)))
```
Hi, @dilipbiswal.
IMHO, the root cause seems to be decimal multiplication between two decimal(38,18) values:
```scala
scala> sql("select cast(10 as decimal(38,18)) * cast(10 as decimal(38,18))").head
res0: org.apache.spark.sql.Row = [null]
```
What do you think about this?
Hi, @dongjoon-hyun
Thanks!! You are right. We are multiplying two decimals with SYSTEM_DEFAULT precision. I looked at the Hive and Impala code, and the rule for multiplying two decimals d1 and d2 seems to be pretty standard: the resulting decimal gets a precision of p1 + p2 + 1 and a scale of s1 + s2.
So I looked at how Hive promotes a string in an expression involving a decimal: it uses double, and so does MySQL. I also checked DB2's behaviour. DB2 has a floating-point decimal type, DECFLOAT, and it promotes the string to a DECFLOAT in this case.
Please let me know your thoughts..
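The rule above can be sketched in a few lines. This is a simplified model of the precision/scale arithmetic described in the comment (the `MAX_PRECISION` cap of 38 and the clamping behavior are assumptions based on the `describe t2` output, not Spark's actual implementation), showing why multiplying two decimal(38,18) values leaves only 2 integer digits:

```python
# Simplified sketch of the decimal-multiplication typing rule discussed above:
# precision = p1 + p2 + 1, scale = s1 + s2, with precision clamped to a
# system maximum of 38 (assumed from the observed decimal(38,36) result).
MAX_PRECISION = 38

def decimal_multiply_type(p1, s1, p2, s2):
    precision = min(p1 + p2 + 1, MAX_PRECISION)
    scale = s1 + s2
    return precision, scale

p, s = decimal_multiply_type(38, 18, 38, 18)
# (38, 36): only 38 - 36 = 2 digits remain left of the decimal point,
# so even 10 * 10 = 100 (3 integer digits) overflows and becomes NULL.
print(p, s)
```

Under this model, the scale (36) is preserved while the precision is capped, which is what starves the integer part of digits.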
Hi @dongjoon-hyun
Here are some more data points to help us decide. I did the following in Hive to verify the decimal multiplication semantics.
```sql
hive> create table t1(c1 decimal(38, 18), c2 decimal(38, 18));
OK
hive> insert into t1 values (123456.123, 1234.123);
OK
hive> create table t2 as select c1 * c2 from t1;
OK
hive> describe t2;
OK
_c0    decimal(38,36)
Time taken: 0.064 seconds, Fetched: 1 row(s)
hive> select * from t2;
OK
NULL
```
For the decimal literal case, Hive seems to do better than we do. We can look at it as a follow-up.
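The NULL in the Hive transcript above can be checked directly: a decimal(38,36) leaves only 2 digits before the decimal point, but the product of the inserted values needs far more. A quick sketch (assuming the inserted values were meant to be the two column values `123456.123` and `1234.123`):

```python
from decimal import Decimal

c1, c2 = Decimal("123456.123"), Decimal("1234.123")
product = c1 * c2

# decimal(38, 36) leaves 38 - 36 = 2 digits before the decimal point,
# but this product's integer part needs 9 digits, so Hive returns NULL.
int_digits = len(str(int(product)))
print(product, int_digits)
```

Nine required integer digits against two available is why the overflow is reported as NULL rather than a rounded value.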
Test build #59537 has finished for PR 13368 at commit
It's hard to pick a reasonable precision and scale for a string, so DoubleType sounds better. LGTM
@dilipbiswal Could you update the title of this PR to say what the actual change in this PR is (something like `cast the string into DoubleType when it's used together with decimal`)?
@davies Hi Davies, thank you very much for your review. I have updated the PR title as per your suggestion.
Merging this into master and 2.0, thanks!
…gether with decimal

In this case, the result type of the expression becomes DECIMAL(38, 36) as we promote the individual string literals to DECIMAL(38, 18) when we handle string promotions for `BinaryArthmaticExpression`. I think we need to cast the string literals to Double type instead. I looked at the history and found that this was changed to use decimal instead of double to avoid potential loss of precision when we cast decimal to double.

To double-check, I ran the query against Hive and MySQL. This query returns a non-NULL result for both databases, and both promote the expression to use double. Here is the output.

- Hive

```SQL
hive> create table l2 as select (cast(99 as decimal(19,6)) + '2') from l1;
OK
hive> describe l2;
OK
_c0    double
```

- MySQL

```SQL
mysql> create table foo2 as select (cast(99 as decimal(19,6)) + '2') from test;
Query OK, 1 row affected (0.01 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> describe foo2;
+-----------------------------------+--------+------+-----+---------+-------+
| Field                             | Type   | Null | Key | Default | Extra |
+-----------------------------------+--------+------+-----+---------+-------+
| (cast(99 as decimal(19,6)) + '2') | double | NO   |     | 0       |       |
+-----------------------------------+--------+------+-----+---------+-------+
```

## How was this patch tested?

Added a new test in SQLQuerySuite.

Author: Dilip Biswal <[email protected]>

Closes #13368 from dilipbiswal/spark-15557.

(cherry picked from commit dfe2cbe)
Signed-off-by: Davies Liu <[email protected]>
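The behavioral change the commit message describes can be sketched in miniature. This is a hypothetical simulation, not Spark's actual coercion code (the function name and structure are illustrative): when a string meets a decimal in binary arithmetic, the string is cast to double rather than to decimal(38,18), so the expression no longer overflows to NULL.

```python
# Hypothetical sketch of the coercion choice this PR makes (names are
# illustrative, not Spark's actual classes or rules).
def coerce_string_with_decimal(string_operand, decimal_operand):
    # Before the patch: the string was promoted to DECIMAL(38, 18), and the
    # widened result type could overflow, yielding NULL.
    # After the patch: the string is cast to double, matching Hive and MySQL,
    # so the arithmetic is done in floating point.
    return float(string_operand) + float(decimal_operand)

# Mirrors the commit message's test query: cast(99 as decimal(19,6)) + '2'
print(coerce_string_with_decimal("2", 99.0))  # 101.0 instead of NULL
```

The trade-off the history mentions still applies: casting decimal to double can lose precision, but it avoids the harder problem of choosing a precision and scale for an arbitrary string.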