[VL] Results mismatch with vanilla Spark when cast(sum(decimal(20,4)), float) #4891

Open
kecookier opened this issue Mar 8, 2024 · 6 comments
Labels
bug Something isn't working triage

Comments

@kecookier
Contributor

Backend

VL (Velox)

Bug description

The column ind_order_amount in SrcTable has type decimal(20,4), and the column week_ind_order_amount in TableTarget has type float.

For the following SQL, the value that Gluten computes for the column week_ind_order_amount is inconsistent with vanilla Spark, as detailed in the screenshot.

Gluten branch: https://github.com/apache/incubator-gluten/tree/branch-1.1

INSERT OVERWRITE TABLE `TableTarget` PARTITION (dt='2024-02-29')
SELECT
  a.deal_id,
  SUM(a.quantity) AS week_quantity,
  SUM(a.ind_quantity) AS week_ind_quantity,
  SUM(a.ind_order_amount) AS week_ind_order_amount,
  SUM(a.user_cnt) AS week_user_cnt
FROM
(
  SELECT deal_id,
    quantity,
    ind_quantity,
    ind_order_amount,
    user_cnt
  FROM SrcTable
  WHERE partition_date BETWEEN '2024-01-31' AND '2024-02-29'
    AND deal_id IS NOT NULL
) a
GROUP BY deal_id;

image

Spark version

None

Spark configurations

No response

System information

No response

Relevant logs

No response

@kecookier kecookier added bug Something isn't working triage labels Mar 8, 2024
@kecookier
Contributor Author

cc @zhouyuan @rui-mo Could you please help me take a look at this issue?

@kecookier kecookier changed the title Results is inconsistent with the vanilla Spark when cast(sum(decmial(20,4)), float) Results are mismatch with the vanilla Spark when cast(sum(decmial(20,4)), float) Mar 8, 2024
@kecookier kecookier changed the title Results are mismatch with the vanilla Spark when cast(sum(decmial(20,4)), float) [VL]Results are mismatch with the vanilla Spark when cast(sum(decmial(20,4)), float) Mar 8, 2024
@kecookier kecookier changed the title [VL]Results are mismatch with the vanilla Spark when cast(sum(decmial(20,4)), float) [VL] Results are mismatch with the vanilla Spark when cast(sum(decmial(20,4)), float) Mar 8, 2024
@kecookier
Contributor Author

SrcTable test data

deal_id | ind_order_amount
------------+--------------------------
  952732041 |                2599.0000
  952732041 |                2599.0000
  952732041 |                   0.0000
  952732041 |               10396.0000
  952732041 |                5198.0000
  952732041 |                2599.0000
  952732041 |                   0.0000
  952732041 |                   0.0000
  952732041 |                2599.0000
  952732041 |                2599.0000

Expected: 28589.0
Actual: 28588.998
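
To gauge how small this discrepancy is, the two values can be compared as 32-bit floats. The following is a minimal Python sketch and not part of the original report; the helper name f32_bits is hypothetical, and it assumes that the printed value 28588.998 is the shortest decimal rendering of a float32.

```python
import struct

def f32_bits(x: float) -> int:
    """Return the IEEE 754 binary32 bit pattern of x, rounded to float32."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

expected = 28589.0    # vanilla Spark result from the report
actual = 28588.998    # Gluten (Velox) result from the report

# For positive floats, adjacent float32 values have adjacent bit patterns,
# so the difference of the bit patterns is the distance in ULPs.
ulp_distance = f32_bits(expected) - f32_bits(actual)
print(ulp_distance)
```

Under that assumption the two results are neighbouring float32 values, which points to a rounding difference in the conversion to float rather than an error in the decimal sum itself.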

@rui-mo
Contributor

rui-mo commented Mar 8, 2024

@kecookier Do you find this issue is caused by cast(decimal as float)? If so, could you provide a decimal value before cast so I can reproduce the result difference? Thanks.

@kecookier
Contributor Author

kecookier commented Mar 11, 2024

> @kecookier Do you find this issue is caused by cast(decimal as float)? If so, could you provide a decimal value before cast so I can reproduce the result difference? Thanks.

Hi @rui-mo, you can use the following SQL to reproduce the issue.

SELECT cast(sum(cast(col as decimal(20, 4))) as float) FROM VALUES (2599.0000), (2599.0000), (2599.0000), (2599.0000), (2599.0000), (10396.0000), (5198.0000) AS tab(col);
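
The thread does not state where the difference comes from. Purely as an assumption, the sketch below shows one conversion path that would yield exactly the reported value: narrowing the unscaled decimal integer to float32 before dividing by 10^4, instead of converting through double precision first. The helper to_f32 and the labels "Path A" and "Path B" are hypothetical and are not taken from Spark, Gluten, or Velox code.

```python
import struct
from decimal import Decimal

def to_f32(x: float) -> float:
    """Round a Python float (a double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Values from the reproduction query, summed exactly as decimals with scale 4.
cols = ["2599.0000"] * 5 + ["10396.0000", "5198.0000"]
total = sum(Decimal(c) for c in cols)   # exact decimal sum
unscaled = int(total.scaleb(4))         # integer representation at scale 4

# Path A (assumed): convert through double precision, then narrow to float32.
via_double = to_f32(float(total))

# Path B (assumed): narrow the unscaled integer to float32 first,
# then divide by 10^4 in float32 arithmetic.
via_f32 = to_f32(to_f32(float(unscaled)) / to_f32(10000.0))

print(via_double)
print(via_f32)
```

In this sketch Path A gives 28589.0, while Path B gives 28588.998046875, which matches the reported 28588.998, because the unscaled value 285890000 is not exactly representable in float32. Whether either engine actually follows one of these paths is not confirmed in this thread.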

@kecookier
Contributor Author

I've identified the code causing the error, and I will submit a Velox PR as soon as possible.

@kecookier
Contributor Author

> I've identified the code causing the error, and I will submit a Velox PR as soon as possible.

cc @rui-mo
