Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[V] Remove complex type fallback for parquet #6712

Merged
merged 3 commits into from
Sep 5, 2024
Merged

Conversation

yma11
Copy link
Contributor

@yma11 yma11 commented Aug 5, 2024

What changes were proposed in this pull request?

Remove complex type fallback for parquet

How was this patch tested?

leveraging existing UT GlutenParquetV2SchemaPruningSuite.

Copy link

github-actions bot commented Aug 5, 2024

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/apache/incubator-gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

@yma11 yma11 changed the title disable complex fallback [WIP]disable complex fallback Aug 5, 2024
} else {
validateTypes(orcTypeValidatorWithComplexTypeFallback)
}
ValidationResult.succeeded
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kecookier have you use complex datatype in ORC format?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for just a try how the change affect ORC related UTs. It turns out timestamp support has result mismatch issue. We may still keep this option for ORC as it's not fully verified as Parquet. For parquet, we have problems in nested struct type support in Gluten and I am looking into that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kecookier have you use complex datatype in ORC format?

Yes, we disable option forceComplexTypeScanFallbackEnabled.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the timestamp support result match issue involves Velox? Could we create a new issue in Velox to track it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue 6831 is created for track.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks~

@github-actions github-actions bot added the CORE works for Gluten Core label Aug 12, 2024
Copy link

Run Gluten Clickhouse CI

@yma11 yma11 changed the title [WIP]disable complex fallback [V] Remove complex type fallback for parquet Aug 12, 2024
@github-actions github-actions bot removed the BUILD label Aug 12, 2024
Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

1 similar comment
Copy link

Run Gluten Clickhouse CI

if mapType.valueType.isInstanceOf[ArrayType] =>
"ArrayType as Value in MapType"
case StructField(_, TimestampType, _, _)
if GlutenConfig.getConf.forceParquetTimestampTypeScanFallbackEnabled =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading the int64 timestamp in the parquet file does not seem to be supported yet.
facebookincubator/velox#8325

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I keep this check and we may have a try by setting forceParquetTimestampTypeScanFallbackEnabled to false once the related support merged.

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Copy link

github-actions bot commented Sep 2, 2024

Run Gluten Clickhouse CI

Copy link

github-actions bot commented Sep 2, 2024

Run Gluten Clickhouse CI

Copy link

github-actions bot commented Sep 2, 2024

Run Gluten Clickhouse CI

Copy link

github-actions bot commented Sep 2, 2024

Run Gluten Clickhouse CI

@yma11
Copy link
Contributor Author

yma11 commented Sep 3, 2024

@FelixYBW can you help approve this PR? Thanks.

@yma11 yma11 merged commit 4533c72 into apache:main Sep 5, 2024
43 checks passed
@FelixYBW
Copy link
Contributor

FelixYBW commented Sep 5, 2024

We have passed all Spark UTs with complex datatype in parquet.

dcoliversun pushed a commit to dcoliversun/gluten that referenced this pull request Sep 11, 2024
* disable complex type fallback for parquet

* disable parquet files reading as velox not supported yet

* fallback timestamp scan for parquet if necessary
zml1206 added a commit to zml1206/incubator-gluten that referenced this pull request Sep 11, 2024
zml1206 added a commit to zml1206/incubator-gluten that referenced this pull request Sep 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CORE works for Gluten Core VELOX
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants