-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AWS: Retain Glue Catalog column comment after updating Iceberg table #10276
Conversation
I built the jars and manually tested on Amazon EMR as a Spark runtime. |
List<Column> columns = toColumns(metadata); | ||
if (existingTable != null) { | ||
List<Column> existingColumns = existingTable.storageDescriptor().columns(); | ||
Map<String, Column> existingColumnMap = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about making this a map of column name to comment here. Since we aren't using any additional information from the existing comments
@@ -501,6 +501,97 @@ public void testColumnCommentsAndParameters() { | |||
assertThat(actualColumns).isEqualTo(expectedColumns); | |||
} | |||
|
|||
@Test | |||
public void testGlueTableColumnCommentsPreserved() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about the case where we drop a column from a table. I believe today we keep those columns and descriptions. Should we add a test for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, @lawofcycles! I've taken an initial pass. Thanks for the solid work!
@geruh Thank you for your review! |
Optional.ofNullable(properties.get(GLUE_DESCRIPTION_KEY)) | ||
.ifPresent(tableInputBuilder::description); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @lawofcycles thank you for your patch!
Now this method has the existing table information, so I think it would be better to move the function to keep the table description https://github.com/aajisaka/iceberg/blob/58c61241bd61de7a4062dbf664691f05eeb2ea53/aws/src/main/java/org/apache/iceberg/aws/glue/GlueTableOperations.java#L319-L321 into this method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your review!
I moved keeping the table description logic to IcebergToGlueConverter.
c022769
to
d094968
Compare
a4c481a
to
aa12a67
Compare
Hi @geruh would you review the latest change? |
Optional.ofNullable(existingTable.description()).ifPresent(tableInputBuilder::description); | ||
} | ||
|
||
List<Column> columns = toColumns(metadata); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking we can simplify the implementation by creating existingColumnMap
first and use it inside addColumnWithDedupe
method. In this way we don't have to replace the new columns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I merged existing column comment processing into creating column set on to toColumns
and addColumnWithDedupe
method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @lawofcycles
Hi @amogh-jahagirdar would you review the latest change? |
I hope this message finds you well. I wanted to kindly follow up on this PR, which addresses an important issue with preserving Glue Catalog column comments when updating Iceberg tables (#10220). Could you please take a look when you have a moment? Thank you for your time and consideration. |
Really sorry for the delayed review @lawofcycles , I'll take a look today. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lawofcycles I think this is pretty good, just had some questions and nits. I'll wait a bit if @geruh @rahil-c have any concerns and if not I'll go ahead and merge
aws/src/main/java/org/apache/iceberg/aws/glue/IcebergToGlueConverter.java
Show resolved
Hide resolved
aws/src/test/java/org/apache/iceberg/aws/glue/TestIcebergToGlueConverter.java
Outdated
Show resolved
Hide resolved
aws/src/main/java/org/apache/iceberg/aws/glue/IcebergToGlueConverter.java
Show resolved
Hide resolved
6f32211
to
8df1fc2
Compare
aws/src/integration/java/org/apache/iceberg/aws/glue/TestGlueCatalogTable.java
Show resolved
Hide resolved
Let's get this in, thanks @lawofcycles for working on this, and @amogh-jahagirdar and @rahil-c for the review! |
Thanks for reviewing @amogh-jahagirdar , @aajisaka , @geruh and @rahil-c !! |
Thank you for merging @Fokko ! |
* Retain Glue Catalog column comment * merge handling existing column comment to creating column set * apply spotless * Add newline after else block for improved readability
What this PR does
This PR addresses the issue where column comments in the Glue catalog are not retained when updating an existing Iceberg table schema. It fixes the problem by modifying the IcebergToGlueConverter.setTableInputInformation method to preserve existing column comments from the Glue table when creating a new TableInput for an update operation.
When an existing Glue table is provided, the modified method retrieves the comments from its columns and applies them to the corresponding columns in the new TableInput, but only if the Iceberg table's column does not already have a comment.
This change ensures that any user-defined column comments in the Glue catalog are carried over during table updates, preventing unintentional loss of documentation.
Testing
Added a new unit test method testSetTableInputInformationWithExistingTable to verify that column comments from an existing Glue table are correctly preserved when creating a new TableInput.
I'll also add Integration Tests same as #10199.
Fixes #10220