Allow disregarding Iglu field's nullability when creating output columns #66
Merged
istreeter force-pushed the respect-nullabilities branch from 6d1cd92 to bf98abb on June 20, 2024 11:19
Some query engines dislike Iceberg tables in which a STRUCT field is nullable and a nested field of the STRUCT is non-nullable. For example, in Snowflake we have seen errors like "SQL execution internal error" when the nested field contains a null. This PR adds a config option `respectNullability`. When set to `false`, the Lake Loader will make all nested fields nullable. This is slightly less ideal for data storage, but it brings back compatibility with query engines like Snowflake. The default value of the new config option is `true` which maintains the behaviour of previous versions of the loader.
istreeter force-pushed the respect-nullabilities branch from bf98abb to 65fa669 on July 15, 2024 11:53
istreeter commented Jul 15, 2024
pondzix reviewed Jul 15, 2024
modules/core/src/main/scala/com.snowplowanalytics.snowplow.lakes/processing/SparkSchema.scala (outdated)
pondzix approved these changes Jul 15, 2024
zhaow-de added a commit to alloy-ch/rcplus-alloy-snowplow-lake-loader that referenced this pull request Oct 4, 2024
…patch-for-alloy
* commit '7ab2edc3fd4d81ffb4d5f3285d02330def7672b1':
  Upgrade common-streams to 0.8.0-M5
  Delete files asynchronously (snowplow-incubator#82)
  Upgrade common-streams 0.8.0-M4 (snowplow-incubator#81)
  Avoid error on duplicate view name (snowplow-incubator#80)
  Add option to exit on missing Iglu schemas (snowplow-incubator#79)
  common-streams 0.8.x with refactored health monitoring (snowplow-incubator#78)
  Create table concurrently with subscribing to stream of events (snowplow-incubator#77)
  Iceberg fail fast if missing permissions on the catalog (snowplow-incubator#76)
  Make alert messages more human-readable (snowplow-incubator#75)
  Hudi loader should fail early if missing permissions on Glue catalog (snowplow-incubator#72)
  Add alert & retry for delta/s3 initialization (snowplow-incubator#74)
  Implement alerting and retrying mechanisms
  Bump aws-hudi to 1.0.0-beta2 (snowplow-incubator#71)
  Bump hudi to 0.15.0 (snowplow-incubator#70)
  Allow disregarding Iglu field's nullability when creating output columns (snowplow-incubator#66)
  Extend health probe to report unhealthy on more error scenarios (snowplow-incubator#69)
  Fix bad rows resizing (snowplow-incubator#68)
oguzhanunlu pushed a commit that referenced this pull request Nov 1, 2024
…mns (#66)
Some query engines dislike Iceberg tables in which a STRUCT field is nullable and a nested field of the STRUCT is non-nullable. For example, in Snowflake we have seen errors like "SQL execution internal error" when the nested field contains a null. This PR adds a config option `respectIgluNullability`. When set to `false`, the Lake Loader will make all nested fields nullable. This is slightly less ideal for data storage, but it brings back compatibility with query engines like Snowflake. The default value of the new config option is `true` which maintains the behaviour of previous versions of the loader.
Some query engines dislike Iceberg tables in which a STRUCT field is nullable and a nested field of the STRUCT is non-nullable. For example, in Snowflake we have seen errors like "SQL execution internal error" when the nested field contains a null.
This PR adds a config option `respectIgluNullability`. When set to `false`, the Lake Loader will make all nested fields nullable. This is slightly less ideal for data storage, but it brings back compatibility with query engines like Snowflake.
The default value of the new config option is `true` which maintains the behaviour of previous versions of the loader.
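
To illustrate the behaviour described above, here is a minimal sketch in plain Scala. It is not the loader's actual code: `FieldType`, `Field`, `Nullability.relaxType` and `Nullability.toColumn` are hypothetical names invented for this example, and the real change lives in `SparkSchema.scala` against Spark's own schema types. The sketch also assumes that the nullability of the top-level column itself is left unchanged and only fields nested inside a STRUCT are relaxed.

```scala
// Hypothetical, simplified model of a column schema (not the loader's real types).
sealed trait FieldType
case object StringType extends FieldType
case object IntType extends FieldType
final case class StructType(fields: List[Field]) extends FieldType

final case class Field(name: String, fieldType: FieldType, nullable: Boolean)

object Nullability {

  // Recursively mark every field nested inside a struct as nullable.
  def relaxType(fieldType: FieldType): FieldType = fieldType match {
    case StructType(fields) =>
      StructType(fields.map(f => f.copy(fieldType = relaxType(f.fieldType), nullable = true)))
    case other => other
  }

  // With respectIgluNullability = true the field is returned unchanged
  // (the behaviour of previous versions). With false, nested fields are
  // relaxed. Assumption: the top-level column keeps its own nullability.
  def toColumn(field: Field, respectIgluNullability: Boolean): Field =
    if (respectIgluNullability) field
    else field.copy(fieldType = relaxType(field.fieldType))
}
```

With this sketch, a nullable STRUCT column holding a non-nullable nested `id` field keeps that shape when `respectIgluNullability` is `true`, and comes out with a nullable `id` when it is `false`, which is the combination the description says engines like Snowflake tolerate.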