[V] Remove complex type fallback for parquet #6712
Thanks for opening a pull request! Could you open an issue for this pull request on GitHub Issues? https://github.com/apache/incubator-gluten/issues Then could you also rename the commit message and pull request title to the following format?
} else {
  validateTypes(orcTypeValidatorWithComplexTypeFallback)
}
ValidationResult.succeeded
@kecookier, have you used complex data types with the ORC format?
This was just a trial to see how the change affects the ORC-related UTs. It turns out that timestamp support has a result mismatch issue. We may still keep this option for ORC, since ORC is not as thoroughly verified as Parquet. For Parquet, we have problems with nested struct type support in Gluten, and I am looking into that.
> @kecookier, have you used complex data types with the ORC format?
Yes, we disabled the option forceComplexTypeScanFallbackEnabled.
Does the timestamp result mismatch issue involve Velox? Could we create a new issue in Velox to track it?
Issue 6831 has been created to track it.
Thanks~
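To make the ORC discussion above concrete, here is a minimal, self-contained Scala sketch of how a scan type validator with a complex-type fallback could be structured. It assumes, as the diff hunk suggests but does not show, that a validator is a partial function from a column to a fallback reason and that validateTypes collects those reasons. DType, Field, complexTypeFallback, noFallback and validate are hypothetical stand-ins (not Spark's or Gluten's API), and the Boolean parameter only mirrors the role the thread attributes to forceComplexTypeScanFallbackEnabled.

```scala
// Hypothetical stand-ins for Spark's DataType / StructField so the sketch runs without Spark.
sealed trait DType
case object IntT extends DType
case object StringT extends DType
final case class ArrayT(element: DType) extends DType
final case class MapT(key: DType, value: DType) extends DType
final case class StructT(fields: Seq[Field]) extends DType
final case class Field(name: String, dataType: DType)

object ScanValidationSketch {
  // A validator maps an unsupported column to a fallback reason; unmatched columns pass.
  type TypeValidator = PartialFunction[Field, String]

  // Hypothetical analogue of orcTypeValidatorWithComplexTypeFallback:
  // every top-level complex column is reported as a fallback reason.
  val complexTypeFallback: TypeValidator = {
    case Field(name, _: ArrayT)  => s"$name: ArrayType is forced to fallback"
    case Field(name, _: MapT)    => s"$name: MapType is forced to fallback"
    case Field(name, _: StructT) => s"$name: StructType is forced to fallback"
  }

  // Validator used when complex types are allowed: it rejects nothing.
  val noFallback: TypeValidator = PartialFunction.empty[Field, String]

  // Applies the validator to each top-level field and collects the reasons.
  // An empty result means this check does not block offloading the scan.
  def validateTypes(schema: StructT, validator: TypeValidator): Seq[String] =
    schema.fields.flatMap(validator.lift)

  // forceComplexTypeScanFallback plays the role of forceComplexTypeScanFallbackEnabled.
  def validate(schema: StructT, forceComplexTypeScanFallback: Boolean): Seq[String] =
    if (forceComplexTypeScanFallback) validateTypes(schema, complexTypeFallback)
    else validateTypes(schema, noFallback)
}
```

Disabling the flag, as described above for ORC, lets complex columns pass this particular check; a real validator may still reject them for other reasons.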
Run Gluten Clickhouse CI
    if mapType.valueType.isInstanceOf[ArrayType] =>
  "ArrayType as Value in MapType"
case StructField(_, TimestampType, _, _)
    if GlutenConfig.getConf.forceParquetTimestampTypeScanFallbackEnabled =>
Reading int64 timestamps from Parquet files does not seem to be supported yet; see facebookincubator/velox#8325.
Yes. I kept this check; we can try setting forceParquetTimestampTypeScanFallbackEnabled to false once the related support is merged.
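The hunk above appears to keep two Parquet checks after this change: a map whose value type is an array still falls back, and timestamp columns fall back only while forceParquetTimestampTypeScanFallbackEnabled is set. The following Scala sketch models that, assuming both cases belong to the Parquet validator as the hunk suggests. DType, Field, parquetValidator and fallbackReasons are hypothetical stand-ins that run without Spark, and the Boolean parameter only plays the role of the real config flag.

```scala
// Hypothetical stand-ins for Spark's DataType / StructField so the sketch runs without Spark.
sealed trait DType
case object IntT extends DType
case object TimestampT extends DType
final case class ArrayT(element: DType) extends DType
final case class MapT(key: DType, value: DType) extends DType
final case class Field(name: String, dataType: DType)

object ParquetValidatorSketch {
  // Builds a validator shaped like the hunk above: each matched column yields a fallback reason.
  // forceTimestampFallback stands in for forceParquetTimestampTypeScanFallbackEnabled.
  def parquetValidator(forceTimestampFallback: Boolean): PartialFunction[Field, String] = {
    // A map with an array value type is still rejected.
    case Field(_, MapT(_, _: ArrayT)) => "ArrayType as Value in MapType"
    // Timestamp columns are rejected only while the flag is on.
    case Field(_, TimestampT) if forceTimestampFallback => "TimestampType"
  }

  // Collects fallback reasons for all columns, in column order.
  def fallbackReasons(fields: Seq[Field], forceTimestampFallback: Boolean): Seq[String] =
    fields.flatMap(parquetValidator(forceTimestampFallback).lift)
}
```

With the flag turned off, only the map-of-array column is reported, which matches the experiment proposed in the thread once int64 timestamp reads are supported in Velox.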
@FelixYBW, could you help approve this PR? Thanks.
All Spark UTs with complex data types in Parquet have passed.
* disable complex type fallback for parquet
* disable parquet files reading as velox not supported yet
* fallback timestamp scan for parquet if necessary
This reverts commit 4533c72.
What changes were proposed in this pull request?
Remove complex type fallback for parquet
How was this patch tested?
Leveraging the existing UT GlutenParquetV2SchemaPruningSuite.