AWS: provide option to hide old fields in Glue table #7584
Comments
I also don't quite understand the current behavior in Athena/Glue when a column is dropped. I can see that a new schema without the column is created in the metadata file, and that in Glue the column moves to the end of the table and gets an "iceberg.field.current": "false" setting. However, the column still shows up for consumers in the Athena web console (but not in a DESCRIBE of the table), which has led to some confusion in our business. I couldn't check whether the column appears via JDBC (because of some errors), but I guess it won't be listed, because I see in Athena that a DESCRIBE query is used to retrieve that information. Can someone confirm that? I personally think that Athena should not show the deleted columns (neither in the web console nor via JDBC). Is there perhaps a way to keep track of the dropped column(s) without showing them in Athena? If not, it would be great if one could be created.
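As a possible client-side workaround until such an option exists, the flag described above can be used to filter the column list returned by Glue. This is only a minimal sketch: it assumes the table is read with boto3's `get_table`, that the flag appears in each column's `Parameters` under the name reported in this comment, and that a column without the flag counts as current. The helper names `current_columns` and `fetch_current_columns` are hypothetical.

```python
from typing import Any, Dict, List

# Parameter name as reported in this thread; its exact semantics are an assumption.
CURRENT_FLAG = "iceberg.field.current"


def current_columns(glue_table: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return only the columns that are not flagged as non-current."""
    columns = glue_table.get("StorageDescriptor", {}).get("Columns", [])
    return [
        col
        for col in columns
        # Assumption: a column without the flag is treated as current.
        if str((col.get("Parameters") or {}).get(CURRENT_FLAG, "true")).lower() != "false"
    ]


def fetch_current_columns(database: str, table: str) -> List[Dict[str, Any]]:
    """Hypothetical helper: load the table definition from Glue and filter it."""
    import boto3  # imported lazily so the filter above can be used without boto3

    response = boto3.client("glue").get_table(DatabaseName=database, Name=table)
    return current_columns(response["Table"])
```

Something like `[c["Name"] for c in fetch_current_columns("my_db", "my_table")]` (placeholder names) would then list only the fields of the latest schema. It does not change what the Athena web console displays.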
The same issue also occurs when renaming columns.
Same issue. Curious to read what @jackye1995 and @yyanyy think about it.
Hello! Our organization is facing the same problem. In particular, the Glue API returns columns that cannot be resolved in the source data, causing queries to fail. We've been using dynamically created Presto views, and they break every time a column is dropped. Technically, schema versioning is meant to solve this challenge:
The latest schema of a table should be aligned with the data, and previous versions will keep track of historical modifications.
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
Hi there! Thanks a lot!
Feature Request / Improvement
In #3888 the Glue schema generation was adjusted so that all old fields are included in the schema. The original reasoning was
In my organization, many users of these tables query them via Athena and are not the data engineers who own the schema. They have no idea about the old schema, they are not editing the schema, and their default use case is querying the current data. They report it as confusing that the schema shows a field that does not exist and that produces errors if they attempt to use it.
Neither Athena nor Glue seems to have any support for displaying these old fields as non-active or deprecated, or for hiding them. Therefore, it would be nice to have a configuration option to disable including non-current fields in the schema.
Query engine
Athena