
[SPARK-15557][SQL] cast the string into DoubleType when it's used together with decimal #13368

Closed

Conversation

dilipbiswal
Contributor

For the reported expression `(cast(99 as decimal) + '3') * '2.3'`, the result type becomes DECIMAL(38, 36), because we promote the individual string literals to DECIMAL(38, 18) when handling string promotion for `BinaryArithmetic` expressions.

I think we need to cast the string literals to Double type instead. I looked at the history and found that this was changed to use decimal instead of double to avoid potential loss of precision when we cast decimal to double.

To double-check, I ran the query against Hive and MySQL. It returns a non-NULL result in both databases, and both promote the expression to double.
Here is the output.

  • Hive
hive> create table l2 as select (cast(99 as decimal(19,6)) + '2') from l1;
OK
hive> describe l2;
OK
_c0                     double 
  • MySQL
mysql> create table foo2 as select (cast(99 as decimal(19,6)) + '2') from test;
Query OK, 1 row affected (0.01 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> describe foo2;
+-----------------------------------+--------+------+-----+---------+-------+
| Field                             | Type   | Null | Key | Default | Extra |
+-----------------------------------+--------+------+-----+---------+-------+
| (cast(99 as decimal(19,6)) + '2') | double | NO   |     | 0       |       |
+-----------------------------------+--------+------+-----+---------+-------+

How was this patch tested?

Added a new test in SQLQuerySuite
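A regression test along these lines would exercise the reported expression. This is a hedged sketch only: the test name, setup, and assertions are illustrative assumptions, not necessarily the exact test added in this PR.

```scala
// Hedged sketch of a SQLQuerySuite-style regression test. Assumes the usual suite
// setup (ScalaTest's `test` and a SparkSession-backed `sql` helper); the name and
// assertions are illustrative, not the exact test added by this PR.
import org.apache.spark.sql.types.DoubleType

test("SPARK-15557: string promoted to double when mixed with decimal") {
  val df = sql("select (cast(99 as decimal(19, 6)) + '3') * '2.3'")
  // Before the fix, the strings were promoted to DECIMAL(38, 18), the result type
  // overflowed to DECIMAL(38, 36), and the value came back as NULL.
  assert(df.schema.head.dataType == DoubleType)
  assert(!df.head.isNullAt(0))
}
```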

@dilipbiswal dilipbiswal changed the title [SPARK-15557] expression ((cast(99 as decimal) + '3') * '2.3' ) return NULL [SPARK-15557][SQL] expression ((cast(99 as decimal) + '3') * '2.3' ) return NULL May 27, 2016
  a.makeCopy(Array(Cast(left, DecimalType.SYSTEM_DEFAULT), right))
case a @ BinaryArithmetic(left @ DecimalType.Expression(_, _), right @ StringType()) =>
  a.makeCopy(Array(left, Cast(right, DecimalType.SYSTEM_DEFAULT)))
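The change named in the PR title amounts to dropping the decimal-specific cases above so that a string operand paired with a decimal falls into the generic string-to-double promotion. A minimal sketch of that rule shape, following the pattern style of the snippet above (not necessarily the exact diff merged here):

```scala
// Hedged sketch: promote a string operand of a BinaryArithmetic to DoubleType,
// whether or not the other operand is a decimal. These are case clauses inside a
// type-coercion rule's transform, mirroring the pattern style of the snippet above.
case a @ BinaryArithmetic(left @ StringType(), right) =>
  a.makeCopy(Array(Cast(left, DoubleType), right))
case a @ BinaryArithmetic(left, right @ StringType()) =>
  a.makeCopy(Array(left, Cast(right, DoubleType)))
```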

Member


Hi, @dilipbiswal .
IMHO, the root cause seems to be decimal multiplication between decimal(38,18)s.

scala> sql("select cast(10 as decimal(38,18)) * cast(10 as decimal(38,18))").head
res0: org.apache.spark.sql.Row = [null]

What do you think about this?

Contributor Author

@dilipbiswal dilipbiswal May 28, 2016


Hi, @dongjoon-hyun
Thanks! You are right, we are multiplying two decimals with SYSTEM_DEFAULT precision and scale. I looked at the Hive and Impala code, and the rule for multiplying two decimals d1 and d2 seems pretty standard: the resulting decimal gets a precision of p1 + p2 + 1 and a scale of s1 + s2.
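To make the overflow concrete, here is the arithmetic for `decimal(38,18) * decimal(38,18)` under that rule; a sketch assuming the result type is capped at the maximum precision of 38 while the scale is kept, which matches the DECIMAL(38, 36) discussed in this PR.

```scala
// Worked example of the p1 + p2 + 1 / s1 + s2 rule for decimal(38,18) * decimal(38,18).
// Assumption: the result type is capped at a maximum precision of 38 with the scale kept.
val (p1, s1) = (38, 18)
val (p2, s2) = (38, 18)
val resultPrecision = math.min(p1 + p2 + 1, 38)     // 77, capped to 38
val resultScale     = s1 + s2                       // 36
val integerDigits   = resultPrecision - resultScale // only 2 digits left of the decimal point
// Any product >= 100 needs at least 3 integer digits, so it cannot be represented and
// comes back as NULL -- which is why cast(10 as decimal(38,18)) * cast(10 as decimal(38,18))
// returns null in the example above.
```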

So I looked at how Hive promotes a string in an expression involving a decimal: it uses double, and so does MySQL. I also checked DB2's behaviour; DB2 has a floating-point decimal type, DECFLOAT, and promotes the string to DECFLOAT in this case.

Please let me know your thoughts.

Contributor Author


Hi @dongjoon-hyun
Some more data points to help us decide. I did the following in Hive to verify the decimal multiplication semantics.

hive> create table t1(c1 decimal(38, 18), c2 decimal(38, 18));
OK
hive> insert into t1 values (123456.123, 1234.123);
OK
hive> create table t2 as select c1 * c2 from t1;
OK
hive> describe t2;
OK
_c0                     decimal(38,36)                              
Time taken: 0.064 seconds, Fetched: 1 row(s)
hive> select * from t2;
OK
NULL

For the decimal literal case, Hive seems to do better than us. We can look at it as a follow-up.

@SparkQA

SparkQA commented May 28, 2016

Test build #59537 has finished for PR 13368 at commit 46f3495.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

cc @yhuai @davies, do you still remember why we promote string to decimal(38, 18) instead of double?

@davies
Contributor

davies commented May 31, 2016

In MySQL

mysql> select cast(99 as decimal(19, 6)) + '3.000000000000001';
+--------------------------------------------------+
| cast(99 as decimal(19, 6)) + '3.000000000000001' |
+--------------------------------------------------+
|                                              102 |
+--------------------------------------------------+
1 row in set (0.00 sec)

mysql> select cast(99 as decimal(19, 6)) + 3.000000000000001;
+------------------------------------------------+
| cast(99 as decimal(19, 6)) + 3.000000000000001 |
+------------------------------------------------+
|                            102.000000000000001 |
+------------------------------------------------+
1 row in set (0.00 sec)

It's hard to pick a reasonable precision and scale for a string, so DoubleType sounds better.
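To see that trade-off in Spark terms, here is a hedged spark-shell sketch; the commented values mirror the MySQL output above and are not verified Spark output.

```scala
// With the string promoted to DOUBLE (this PR), the result is non-null but may lose
// precision; an explicit decimal literal keeps full precision. Values in the comments
// mirror the MySQL output above.
sql("select cast(99 as decimal(19, 6)) + '3.000000000000001'").head // double, ~102.0
sql("select cast(99 as decimal(19, 6)) + 3.000000000000001").head   // decimal, 102.000000000000001
```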

LGTM

@davies
Contributor

davies commented May 31, 2016

@dilipbiswal Could you update the title of this PR to describe the actual change (something like `cast the string into DoubleType when it's used together with decimal`)?

@dilipbiswal dilipbiswal changed the title [SPARK-15557][SQL] expression ((cast(99 as decimal) + '3') * '2.3' ) return NULL [SPARK-15557][SQL] cast the string into DoubleType when it's used together with decimal May 31, 2016
@dilipbiswal
Contributor Author

@davies Hi Davies, thank you very much for your review. I have updated the PR title as per your suggestion.

@davies
Contributor

davies commented May 31, 2016

Merging this into master and 2.0, thanks!

asfgit pushed a commit that referenced this pull request May 31, 2016
[SPARK-15557][SQL] cast the string into DoubleType when it's used together with decimal

Author: Dilip Biswal <[email protected]>

Closes #13368 from dilipbiswal/spark-15557.

(cherry picked from commit dfe2cbe)
Signed-off-by: Davies Liu <[email protected]>
@asfgit asfgit closed this in dfe2cbe May 31, 2016
dongjoon-hyun pushed a commit that referenced this pull request Jun 28, 2023
### What changes were proposed in this pull request?
This PR aims to upgrade Netty from 4.1.92 to 4.1.93.

### Why are the changes needed?
1. v4.1.92 vs. v4.1.93
netty/netty@netty-4.1.92.Final...netty-4.1.93.Final

2. The new version brings some bug fixes, e.g.:
- Reset byte buffer in loop for AbstractDiskHttpData.setContent ([#13320](netty/netty#13320))
- OpenSSL MAX_CERTIFICATE_LIST_BYTES option supported ([#13365](netty/netty#13365))
- Adapt to DirectByteBuffer constructor in Java 21 ([#13366](netty/netty#13366))
- HTTP/2 encoder: allow HEADER_TABLE_SIZE greater than Integer.MAX_VALUE ([#13368](netty/netty#13368))
- Upgrade to latest netty-tcnative to fix memory leak ([#13375](netty/netty#13375))
- H2/H2C server stream channels deactivated while write still in progress ([#13388](netty/netty#13388))
- Channel#bytesBefore(un)writable off by 1 ([#13389](netty/netty#13389))
- HTTP/2 should forward shutdown user events to active streams ([#13394](netty/netty#13394))
- Respect the number of bytes read per datagram when using recvmmsg ([#13399](netty/netty#13399))

3. The release notes are as follows:
- https://netty.io/news/2023/05/25/4-1-93-Final.html

4. Why not upgrade to the `4.1.94.Final` version?
Because the return type of the `threadCache()` method of the Netty inner class used by `arrow-memory-netty` 12.0.1 has changed (from `PoolThreadCache` to `PoolArenasCache`), which is a breaking change, we should wait for `arrow-memory-netty` to be upgraded before moving to `4.1.94.Final`.

The reference is as follows:
https://github.com/apache/arrow/blob/6af660f48472b8b45a5e01b7136b9b040b185eb1/java/memory/memory-netty/src/main/java/io/netty/buffer/PooledByteBufAllocatorL.java#L164
https://github.com/netty/netty/blob/da1a448d5bc4f36cc1744db93fcaf64e198db2bd/buffer/src/main/java/io/netty/buffer/PooledByteBufAllocator.java#L732-L736

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Passes GitHub Actions (GA).

Closes #41681 from panbingkun/upgrade_netty.

Authored-by: panbingkun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>