[BugFix] overflow in parsing parquet int96 timestamp (backport #22356) (branch-2.4) #23158

Merged

rickif commented May 10, 2023

This is a backport of pull request #22356 to branch-2.4.

wanpengfei-git enabled auto-merge (rebase) May 10, 2023 09:02
mergify bot assigned rickif May 10, 2023
Parquet files may store large datetime values such as `9999-12-31 23:59:59` as INT96 timestamps, which overflow when converted to an int64 count of nanoseconds (the Arrow Parquet reader's default unit for INT96).
apache/arrow#10461 added an option to the Arrow Parquet reader for setting the unit used when reading INT96 timestamps.
This PR adds a BE config, `parquet_coerce_int96_timestamp_unit`, that sets the unit used when reading Parquet INT96 timestamps.
With the default value `MICRO`, the maximum MySQL datetime value, `9999-12-31 23:59:59.999999`, is handled correctly.
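To make the overflow concrete, here is a minimal Python sketch of the arithmetic. It assumes the common INT96 layout written by Impala, Hive and Spark: 8 bytes of little-endian nanoseconds within the day, followed by a 4-byte little-endian Julian day number. The helpers `encode_int96` and `int96_to_int64` and the `UNIT_DIVISOR` table are hypothetical illustrations, not StarRocks or Arrow code. Only `MICRO` comes from this PR; the other unit names are assumed to mirror Arrow's `TimeUnit` values. Python integers never overflow, so the sketch raises `OverflowError` wherever a C++ int64 would wrap.

```python
import struct
from datetime import date, datetime, timedelta

INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1
JULIAN_DAY_OF_UNIX_EPOCH = 2440588  # Julian day number of 1970-01-01
NANOS_PER_DAY = 86_400 * 10**9

# Hypothetical table: divisor that turns nanoseconds into each target unit.
# Only MICRO is taken from the PR; the other names are assumptions.
UNIT_DIVISOR = {"NANO": 1, "MICRO": 1_000, "MILLI": 1_000_000, "SECOND": 1_000_000_000}


def encode_int96(dt: datetime) -> bytes:
    """Hypothetical helper: encode a naive UTC datetime as INT96
    (int64 nanoseconds-of-day, then int32 Julian day, both little-endian)."""
    days = (dt.date() - date(1970, 1, 1)).days
    seconds_of_day = (dt.hour * 60 + dt.minute) * 60 + dt.second
    nanos_of_day = seconds_of_day * 10**9 + dt.microsecond * 1_000
    return struct.pack("<qi", nanos_of_day, days + JULIAN_DAY_OF_UNIX_EPOCH)


def int96_to_int64(raw: bytes, unit: str) -> int:
    """Hypothetical helper: decode INT96 into an int64 count of `unit` since
    the Unix epoch, raising OverflowError where a C++ int64 would wrap."""
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    div = UNIT_DIVISOR[unit]
    # Scale days and the time of day separately, then add them.
    value = days * (NANOS_PER_DAY // div) + nanos_of_day // div
    if not INT64_MIN <= value <= INT64_MAX:
        raise OverflowError(f"{value} does not fit in an int64 {unit} timestamp")
    return value


if __name__ == "__main__":
    raw = encode_int96(datetime(9999, 12, 31, 23, 59, 59, 999_999))
    micros = int96_to_int64(raw, "MICRO")
    # prints: 253402300799999999 9999-12-31 23:59:59.999999
    print(micros, datetime(1970, 1, 1) + timedelta(microseconds=micros))
    try:
        int96_to_int64(raw, "NANO")
    except OverflowError as exc:
        print("NANO:", exc)
```

In nanoseconds the instant needs about 2.5 × 10^20, far beyond the int64 maximum of roughly 9.2 × 10^18. A signed int64 nanosecond timestamp only reaches 2262-04-11. In microseconds the same instant is 253402300799999999, which fits easily.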
auto-merge was automatically disabled May 10, 2023 09:35

Head branch was pushed to by a user without write access

rickif force-pushed the cherry-pick/branch-2.4-1d3ea49d branch from 658ecf2 to 5b2ba49 May 10, 2023 09:35
chaoyli merged commit a31f246 into StarRocks:branch-2.4 May 10, 2023
rickif deleted the cherry-pick/branch-2.4-1d3ea49d branch May 12, 2023 11:38
rickif added a commit to rickif/starrocks that referenced this pull request May 22, 2023
wanpengfei-git pushed a commit that referenced this pull request May 22, 2023