[SPARK-45484][SQL][3.5] Deprecated the incorrect parquet compression codec lz4raw #43330
So this change is only needed in 3.5, and we already fixed it differently in 4.0?
Yes, this work is about the value returned by the Parquet code. IIUC, there is no Spark user-facing issue here. He just wants to be consistent with the Apache Parquet library. For Apache Spark 4.0.0, I suggested deleting the old name, `lz4raw`, which was added in 3.5.0 and is deprecated in 3.5.1 (by this PR) with a migration guide entry. Since `lz4raw` is relatively new, I believe the deletion is fine.
footer.getParquetMetadata.getBlocks(0).column(0).getCodec.name()
cc @wangyum
@@ -94,18 +108,22 @@ class ParquetCompressionCodecPrecedenceSuite extends ParquetTest with SharedSpar
withTempDir { tmpDir =>
val tempTableName = "TempParquetTable"
withTable(tempTableName) {

nit. This looks like a mistake. Let's remove this empty line.
02a8068 to 6149ebb
Yes. This PR is only for 3.5.1, and #43310 fixes it in 4.0.0.
The GA failure is unrelated to this PR.
cc @zhengruifeng Could we possibly backport
+1, LGTM
backport SPARK-44619 to branch-3.5 to avoid
6149ebb to fa233c2
…codec lz4raw

### What changes were proposed in this pull request?
According to the discussion at #43310 (comment), this PR deprecates the incorrect parquet compression codec `lz4raw` in Spark 3.5.1 and adds a warning log. The warning log tells users that `lz4raw` will be removed in Apache Spark 4.0.0.

### Why are the changes needed?
To deprecate the incorrect parquet compression codec `lz4raw`.

### Does this PR introduce _any_ user-facing change?
'Yes'. Users will see the warning log below.
`Parquet compression codec 'lz4raw' is deprecated, please use 'lz4_raw'`

### How was this patch tested?
Existing test cases and new test cases.

### Was this patch authored or co-authored using generative AI tooling?
'No'.

Closes #43330 from beliefer/SPARK-45484_3.5.

Authored-by: Jiaan Geng <[email protected]>
Signed-off-by: Jiaan Geng <[email protected]>

@srowen @dongjoon-hyun @LuciferYang Merged! Thank you all!
### What changes were proposed in this pull request?
According to the discussion at #43310 (comment), this PR deprecates the incorrect parquet compression codec `lz4raw` in Spark 3.5.1 and adds a warning log. The warning log tells users that `lz4raw` will be removed in Apache Spark 4.0.0.

### Why are the changes needed?
To deprecate the incorrect parquet compression codec `lz4raw`.

### Does this PR introduce any user-facing change?
'Yes'.
Users will see the warning log below.
`Parquet compression codec 'lz4raw' is deprecated, please use 'lz4_raw'`

### How was this patch tested?
Existing test cases and new test cases.

### Was this patch authored or co-authored using generative AI tooling?
'No'.
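
For readers who want to see the deprecation behavior in miniature, here is a minimal sketch of how a deprecated alias can keep resolving to the same codec while emitting the warning quoted in the description. It is not the code from this PR: `CodecNameSketch`, `resolve`, the `warn` callback, and the partial name table are all hypothetical and exist only for illustration. Only the warning text and the two spellings `lz4raw` / `lz4_raw` come from the PR.

```scala
// Hypothetical sketch, NOT Spark's actual implementation.
object CodecNameSketch {
  // Warning text quoted from the PR description.
  val DeprecationMessage: String =
    "Parquet compression codec 'lz4raw' is deprecated, please use 'lz4_raw'"

  // Partial, illustrative table from user-facing short names to
  // Parquet codec names. Both spellings resolve to the same codec.
  private val shortNames: Map[String, String] = Map(
    "snappy"  -> "SNAPPY",
    "gzip"    -> "GZIP",
    "lz4"     -> "LZ4",
    "lz4raw"  -> "LZ4_RAW", // deprecated alias
    "lz4_raw" -> "LZ4_RAW"  // spelling consistent with Parquet
  )

  /** Resolves a codec name case-insensitively. The deprecated alias still
    * resolves, but a warning is passed to `warn` (stderr by default).
    * Unknown names yield None.
    */
  def resolve(
      name: String,
      warn: String => Unit = msg => Console.err.println(msg)): Option[String] = {
    val key = name.toLowerCase(java.util.Locale.ROOT)
    if (key == "lz4raw") warn(DeprecationMessage)
    shortNames.get(key)
  }
}
```

With this sketch, `CodecNameSketch.resolve("lz4raw")` and `CodecNameSketch.resolve("lz4_raw")` return the same codec name, and only the first call emits the warning. Keeping the old alias working for one release line while warning is what lets the later removal in 4.0.0 be documented in a migration guide rather than arrive as a surprise.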