GH-3026: ParquetRewriter fails when you try to nullify and encrypt 2 different columns #3027
GitHub issue: ParquetRewriter fails when you try to nullify and encrypt 2 different columns #3026
This issue was previously reported in PR: PARQUET-2430: Add parquet joiner v2 #1335
Issue description
When you try to nullify one column and encrypt a different column using ParquetRewriter, it fails. There is a related test, but it nullifies and encrypts the same column, so it does not reproduce the bug. The bug can be reproduced by changing a single line in the testNullifyAndEncryptColumn() method of ParquetRewriterTest
from: maskColumns.put("DocId", MaskMode.NULLIFY);
to: maskColumns.put("Links.Forward", MaskMode.NULLIFY);
With that change, the test starts to fail with the exception below:
Issue root cause
The failure occurs because, during nullification, we create a single-column schema:
MessageType newSchema = newSchema(schema, descriptor)
This is needed because we must nullify only the specified column, so we create a custom schema for that purpose. However, we cannot reuse the default encryptor created during ParquetRewriter construction with that new custom schema: the default encryptor internally checks the metadata of the encrypted columns, and that check fails because of the schema discrepancy.
Close #3026
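The root cause described above can be illustrated with a small toy model. This is only a sketch, not Parquet code: `SchemaMismatchSketch`, `ToyEncryptor`, `validate` and `nullifySucceeds` are hypothetical names, and a schema is reduced to a plain list of column paths. It assumes the encryptor's internal check amounts to "every encrypted column must exist in the schema I am given", which is a simplification of the behavior described in this PR.

```java
import java.util.List;
import java.util.Set;

// Toy model only: none of these types are Parquet classes.
class SchemaMismatchSketch {

    /** Hypothetical stand-in for an encryptor that validates its encrypted columns against a schema. */
    static final class ToyEncryptor {
        private final Set<String> encryptedColumns;

        ToyEncryptor(Set<String> encryptedColumns) {
            this.encryptedColumns = encryptedColumns;
        }

        /** Throws if any encrypted column is missing from the schema it is asked to work with. */
        void validate(List<String> schemaColumns) {
            for (String column : encryptedColumns) {
                if (!schemaColumns.contains(column)) {
                    throw new IllegalStateException("Encrypted column not in schema: " + column);
                }
            }
        }
    }

    /** Models nullification: builds a single-column schema and reuses the given encryptor with it. */
    static boolean nullifySucceeds(String nullifiedColumn, ToyEncryptor encryptor) {
        List<String> singleColumnSchema = List.of(nullifiedColumn);
        try {
            encryptor.validate(singleColumnSchema);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The "default" encryptor is configured once, for the encrypted column DocId.
        ToyEncryptor encryptor = new ToyEncryptor(Set.of("DocId"));
        // Same column nullified and encrypted: the single-column schema still contains DocId.
        System.out.println(nullifySucceeds("DocId", encryptor));         // prints "true"
        // Different column nullified: DocId is absent from the single-column schema.
        System.out.println(nullifySucceeds("Links.Forward", encryptor)); // prints "false"
    }
}
```

In this model the existing test passes only because the nullified column and the encrypted column coincide, which matches why the original test did not reproduce the bug.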