Preserve original field order during schema evolution
This overcomes a limitation in how Hudi syncs schemas to the Glue catalog.

Previously, if version `1-0-0` of a schema had fields `a` and `b`, and version `1-0-1` added a field `c`, the new field might be added _before_ the original fields in the Hudi schema. The new field would get synced to Glue, but only for new partitions; it was not back-filled to existing partitions.

After this change, the new field `c` is added _after_ the original fields `a` and `b` in the Hudi schema, so there is no need to sync the new field to existing partitions in Glue.

The problem manifested in AWS Athena with a message like:

> HIVE_PARTITION_SCHEMA_MISMATCH: There is a mismatch between the table and partition schemas.

This fix was implemented in snowplow/schema-ddl#213 and snowplow-incubator/common-streams#98, and is imported here via a new version of common-streams.

This change does not impact Delta or Iceberg, where nothing was broken.
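The ordering rule described above can be illustrated with a minimal sketch. This is not the schema-ddl implementation; the function name `merge_fields` and the representation of a schema as a flat list of field names are assumptions made purely for illustration.

```python
def merge_fields(existing, incoming):
    """Merge two lists of field names, preserving the original order.

    Hypothetical helper: fields already present keep their positions,
    and any field that is new in `incoming` is appended at the end.
    """
    seen = set(existing)
    merged = list(existing)
    for name in incoming:
        if name not in seen:
            merged.append(name)
            seen.add(name)
    return merged


# Version 1-0-0 has fields a and b; version 1-0-1 adds field c.
v1_0_0 = ["a", "b"]
v1_0_1 = ["c", "a", "b"]  # the new field may be listed first in the newer version

# The new field lands after the original fields, regardless of where
# it appears in the newer schema version.
result = merge_fields(v1_0_0, v1_0_1)
```

Because the original fields keep their positions, the column prefix seen by existing partitions is unchanged, and the new field only extends the end of the schema.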