This repository has been archived by the owner on Sep 18, 2023. It is now read-only.

Query result not match Vanilla #928

Open
FelixYBW opened this issue May 20, 2022 · 3 comments
Labels
bug Something isn't working

Comments

@FelixYBW
Collaborator

SQL:

```sql
select c_last_name, c_first_name,
       max(ss_customer_sk) ss_customer_sk_max, min(ss_customer_sk) ss_customer_sk_min,
       max(cast(c_customer_sk as string)) c_customer_sk_max, min(cast(c_customer_sk as string)) c_customer_sk_min,
       s_store_name,
       max(c_birth_country) c_birth_country_max, min(c_birth_country) c_birth_country_min,
       max(ca_country) ca_country_max, min(ca_country) ca_country_min,
       max(upper(ca_country)) ca_country_u_max, min(upper(ca_country)) ca_country_u_min,
       s_zip, ca_zip
from store_sales, store_returns, store, item, customer, customer_address
where ss_ticket_number = sr_ticket_number
  and ss_item_sk = sr_item_sk
  and cast(ss_customer_sk as string) = cast(c_customer_sk as string)
  and ss_item_sk = i_item_sk
  and ss_store_sk = s_store_sk
  and c_birth_country = upper(ca_country)
  and s_zip = ca_zip
  and s_market_id = 8
group by c_last_name, c_first_name, c_customer_sk, s_store_name, s_zip, ca_zip
```

Tested on the sr606 Jenkins server in the NativeSQL test environment.

Vanilla Spark returns 86,788: http://sr606:18080/history/application_1652807458381_0093/SQL/execution/?id=54
Main branch (as of 5/20) returns 6,157: http://sr606:18080/history/application_1652807458381_0099/SQL/execution/?id=3

It looks like the sort aggregate is wrong:
[Screenshots: Gazelle vs. Vanilla]

@FelixYBW FelixYBW added the bug Something isn't working label May 20, 2022
@FelixYBW
Collaborator Author

@zhixingheyi-tian
Collaborator

This issue is caused by the 16-bit fields in ArrayItemIndexS:

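```cpp
#include <cstdint>

struct ArrayItemIndexS {
  uint16_t id = 0;        // row position within a record batch (at most 65,535)
  uint16_t array_id = 0;  // index of the record batch the row belongs to
};
```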

ColumnarShuffledHashJoin (CSHJ) produces 287 record batches with more than 64K rows each, which exceeds the uint16_t range in the downstream ColumnarSort operator.
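
To make the failure mode concrete, here is a minimal C++ sketch (not Gazelle's actual code) that mirrors the two quoted fields. MakeIndex and kMaxRowsPerBatch are hypothetical names used only for illustration. Converting a larger unsigned value to uint16_t is well defined but keeps only the low 16 bits, so any row position at or above 65,536 silently aliases an earlier row in the same batch:

```cpp
#include <cstdint>
#include <limits>

// Mirrors the two 16-bit fields quoted above; any other members are omitted.
struct ArrayItemIndexS {
  uint16_t id = 0;        // row position inside a record batch
  uint16_t array_id = 0;  // which record batch the row belongs to
};

// Hypothetical helper: records a (batch, row) position in the 16-bit index.
// Unsigned narrowing keeps only the low 16 bits (value mod 65,536), so a row
// position of 65,536 or more silently wraps around to an earlier row.
ArrayItemIndexS MakeIndex(uint32_t batch, uint32_t row) {
  ArrayItemIndexS idx;
  idx.array_id = static_cast<uint16_t>(batch);
  idx.id = static_cast<uint16_t>(row);
  return idx;
}

// Largest number of rows one batch can hold before `id` wraps: 65,536.
constexpr uint32_t kMaxRowsPerBatch =
    static_cast<uint32_t>(std::numeric_limits<uint16_t>::max()) + 1;
```

For example, MakeIndex(0, 69999).id evaluates to 4,463 (69,999 - 65,536), so a sort that relies on this index would read the wrong row.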

This issue is temporarily worked around by patch #941, which replaces ColumnarSort + SortAggregate with ColumnarHashAggregate, so the faulty code path is skipped.

We will implement batch_size control in the following operators to solve these problems thoroughly:

ColumnarBroadcastHashJoinExec
ColumnarShuffledHashJoinExec
ColumnarSortMergeJoinExec
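
One way to bound batch sizes, sketched here under the assumption that the join already knows how many output rows it has produced, is to slice the output into chunks of at most batch_size rows and emit one batch per chunk. SplitIntoBatches is a hypothetical helper, not an existing Gazelle API, and the 65,536 cap simply reflects the 16-bit row index above:

```cpp
#include <algorithm>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical helper: given the number of rows a join has produced, return
// (offset, length) slices of at most `batch_size` rows each. A join operator
// could emit one output batch per slice instead of one oversized batch.
std::vector<std::pair<int64_t, int64_t>> SplitIntoBatches(int64_t total_rows,
                                                          int64_t batch_size) {
  // 65,536 rows is the most a 16-bit row index (0..65,535) can address.
  if (batch_size <= 0 || batch_size > 65536) {
    throw std::invalid_argument("batch_size must be in [1, 65536]");
  }
  std::vector<std::pair<int64_t, int64_t>> slices;
  for (int64_t offset = 0; offset < total_rows; offset += batch_size) {
    // The final slice may be shorter than batch_size.
    slices.emplace_back(offset, std::min(batch_size, total_rows - offset));
  }
  return slices;
}
```

With an illustrative batch_size of 4,096, 70,000 output rows yield 18 slices, the last one starting at row 69,632 and covering 368 rows, so every slice stays within the uint16_t range.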

CC @FelixYBW @zhouyuan @PHILO-HE

@FelixYBW
Collaborator Author

This has the same root cause as #906.

We should add an ARROW_CHECK everywhere a 16-bit integer is used to hold a record batch size or a row index within a batch.
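
A check of this kind could look like the following sketch. ARROW_CHECK is a macro from Arrow's logging utilities that logs a fatal error and aborts on failure, so the example substitutes a hypothetical CheckOrThrow helper that throws instead; that keeps it compilable without Arrow and testable. CheckedRowIndex is likewise a hypothetical name:

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>

// Stand-in for ARROW_CHECK: the Arrow macro logs a fatal error and aborts,
// while this hypothetical helper throws so the sketch runs without Arrow.
inline void CheckOrThrow(bool condition, const std::string& message) {
  if (!condition) {
    throw std::runtime_error(message);
  }
}

// Hypothetical guard: narrow a row position to uint16_t only when it fits,
// instead of letting static_cast wrap it around modulo 65,536.
uint16_t CheckedRowIndex(int64_t row) {
  CheckOrThrow(row >= 0 && row <= std::numeric_limits<uint16_t>::max(),
               "row index " + std::to_string(row) + " does not fit in uint16_t");
  return static_cast<uint16_t>(row);
}
```

The point of the design is to fail loudly at the narrowing site: CheckedRowIndex(65535) succeeds, while CheckedRowIndex(65536) is rejected instead of silently becoming row 0.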

2 participants