[VL] Results mismatch when scanning low-version ORC files #6673
Comments
cc @kecookier
Could you log the RowVector in TableScan#getOutput to check whether this issue is caused by the ORC scan?
Could you check the ORC file schema of the old Hive table? Maybe the problem is in the ORC files themselves. ( hive --orcfiledump <orc_file_path> )
@Yohahaha This table contains 10365356 rows, so logging the RowVector is tricky.
@Z1Wu It looks like the table schemas are the same (DESCRIBE FORMATTED <table_name>).
New table:
You can set
A Hive ORC table has a table schema, and its ORC data files should contain a schema too. However, ORC data files written by some old engines (like Hive 1.x) contain an incomplete schema: the column names are missing. For a Hive ORC table created by:
You can get the ORC data file schema using this command:
Malformed ORC schema output looks like the following. An ORC file with a schema like
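As a rough illustration of the malformed-schema check described above, here is a minimal Python sketch. It assumes the file schema has already been obtained as an ORC type string (for example, copied from the orcfiledump output) and that old writers emit placeholder names of the form `_col0`, `_col1`, and so on. The function names are hypothetical and are not part of Hive, ORC, or Gluten.

```python
import re
from typing import List

# Assumption: old (e.g. Hive 1.x) writers use placeholder names _col0, _col1, ...
_PLACEHOLDER = re.compile(r"_col\d+")


def top_level_field_names(orc_type: str) -> List[str]:
    """Extract top-level field names from a type string like 'struct<a:int,b:string>'."""
    if not (orc_type.startswith("struct<") and orc_type.endswith(">")):
        raise ValueError("expected a top-level struct type")
    body = orc_type[len("struct<"):-1]
    names, depth, start = [], 0, 0
    for i, ch in enumerate(body):
        # Track nesting so commas inside map<...>, struct<...> or decimal(p,s) are skipped.
        if ch in "<(":
            depth += 1
        elif ch in ">)":
            depth -= 1
        elif ch == "," and depth == 0:
            names.append(body[start:i].split(":", 1)[0])
            start = i + 1
    if body:
        names.append(body[start:].split(":", 1)[0])
    return names


def lacks_column_names(orc_type: str) -> bool:
    """True when every top-level field name is a positional placeholder."""
    names = top_level_field_names(orc_type)
    return bool(names) and all(_PLACEHOLDER.fullmatch(n) for n in names)


if __name__ == "__main__":
    print(lacks_column_names("struct<_col0:int,_col1:string>"))  # True
    print(lacks_column_names("struct<id:int,name:string>"))      # False
```

The parser only looks at top-level fields, which is enough to tell a placeholder-only schema apart from one that carries real column names.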
@Z1Wu Thanks for your clarification. It looks like the old table's ORC files lack column names.
The new table's ORC file schema is:
If the old table's ORC files lack column names, this may be the same problem as #5638. You can set
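To illustrate why missing column names can change scan results, the hypothetical sketch below contrasts resolving table columns against file columns by name versus by position. It is not Gluten's or Velox's code. It only assumes that old files carry placeholder names like `_col0`, and that for such files the i-th file column holds the i-th table column. To our understanding, vanilla Spark's ORC reader applies a similar positional fallback for these files, which would explain why only the Gluten result differs.

```python
import re
from typing import Dict, List, Optional

# Assumption: old Hive writers emit placeholder names _col0, _col1, ...
_PLACEHOLDER = re.compile(r"_col\d+")


def resolve_by_name(table_cols: List[str], file_cols: List[str]) -> Dict[str, Optional[int]]:
    """Map each table column to the index of the same-named file column, or None."""
    index = {name: i for i, name in enumerate(file_cols)}
    return {col: index.get(col) for col in table_cols}


def resolve_by_position(table_cols: List[str], file_cols: List[str]) -> Dict[str, Optional[int]]:
    """Map the i-th table column to the i-th file column, or None if the file is shorter."""
    return {col: (i if i < len(file_cols) else None) for i, col in enumerate(table_cols)}


def resolve_columns(table_cols: List[str], file_cols: List[str]) -> Dict[str, Optional[int]]:
    """Fall back to positional mapping when every file column name is a placeholder."""
    if file_cols and all(_PLACEHOLDER.fullmatch(c) for c in file_cols):
        return resolve_by_position(table_cols, file_cols)
    return resolve_by_name(table_cols, file_cols)


if __name__ == "__main__":
    table = ["id", "name"]
    old_file = ["_col0", "_col1"]
    print(resolve_by_name(table, old_file))  # {'id': None, 'name': None}
    print(resolve_columns(table, old_file))  # {'id': 0, 'name': 1}
```

With a purely name-based lookup, none of the table columns are found in a `_colN` file, so a reader that relies on names alone can return nulls or mismatched columns. Checking for placeholder names first and then mapping by position avoids that, while files with real names still resolve by name.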
Backend
VL (Velox)
Bug description
SparkSQL:
Gluten Result:
Vanilla Result:
Physical Plan:
Unfortunately, I can't reproduce it with a new Hive table. I created a new table containing the rows of the original table and submitted the same SQL to Spark. Even the physical plan was the same as before, but this time the Gluten result matched vanilla Spark.
Spark version
None
Spark configurations
No response
System information
v1.2.0 rc1
Relevant logs
No response