0.5.1-rc5
istreeter
tagged this
19 Nov 19:11
This overcomes a limitation with how Hudi syncs schemas to the Glue catalog.

Previously, if version `1-0-0` of a schema had fields `a` and `b`, and version `1-0-1` then added a field `c`, the new field might be added _before_ the original fields in the Hudi schema. The new field would get synced to Glue, but only for new partitions; it was not back-filled to existing partitions.

After this change, the new field `c` is added _after_ the original fields `a` and `b` in the Hudi schema, so there is no need to sync the new field to existing partitions in Glue.

The problem manifested in AWS Athena with a message like:

> HIVE_PARTITION_SCHEMA_MISMATCH: There is a mismatch between the table and partition schemas.

This fix was implemented in snowplow/schema-ddl#213 and snowplow-incubator/common-streams#98, and imported via a new version of common-streams.

This change does not impact Delta or Iceberg, where nothing was broken.
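
The field-ordering rule described above (original fields keep their positions, new fields go at the end) can be illustrated with a minimal Python sketch. This is not the code from schema-ddl or common-streams; `merge_fields` and the example field lists are hypothetical and only show the intended ordering.

```python
def merge_fields(existing, incoming):
    """Keep `existing` fields in their original order and append any
    fields from `incoming` that are not already present.

    Hypothetical helper for illustration only; it is not part of
    schema-ddl, common-streams, or Hudi.
    """
    seen = set(existing)
    merged = list(existing)
    for name in incoming:
        if name not in seen:
            merged.append(name)
            seen.add(name)
    return merged


# Fields of schema version 1-0-0.
v1_0_0 = ["a", "b"]
# Fields of schema version 1-0-1, with the new field `c` listed first
# to mimic the case where it could end up before the original fields.
v1_0_1 = ["c", "a", "b"]

merged = merge_fields(v1_0_0, v1_0_1)
print(merged)
```

With this ordering, the leading columns of the merged schema are identical to the `1-0-0` columns, which is the property the release notes describe: the new field comes after `a` and `b`.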