Allow disregarding Iglu field's nullability when creating output columns #66

Merged
istreeter merged 2 commits into develop from respect-nullabilities on Jul 15, 2024

Conversation

istreeter
Collaborator

@istreeter istreeter commented Jun 20, 2024

Some query engines cannot handle Iceberg tables in which a STRUCT field is nullable but one of its nested fields is non-nullable. For example, in Snowflake we have seen errors like "SQL execution internal error" when the nested field contains a null.

This PR adds a config option `respectIgluNullability`. When set to `false`, the Lake Loader makes all nested fields nullable. This is slightly less ideal for data storage, but it restores compatibility with query engines like Snowflake.

The default value of the new config option is `true`, which maintains the behaviour of previous versions of the loader.
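
To illustrate the intended effect, here is a minimal Python sketch of the nullability relaxation. It is not the loader's actual code: the `Field` type and the `relax_nested` and `_make_nullable` helpers are hypothetical names introduced for this example, and it assumes the top-level column keeps its own nullability while only the fields nested beneath it are relaxed.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a column or nested field in the output schema."""
    name: str
    type_name: str
    nullable: bool
    children: Tuple["Field", ...] = ()


def _make_nullable(f: Field) -> Field:
    # Mark this field nullable and recurse into any fields nested below it.
    return replace(
        f,
        nullable=True,
        children=tuple(_make_nullable(c) for c in f.children),
    )


def relax_nested(column: Field, respect_iglu_nullability: bool) -> Field:
    """Return the column unchanged when nullability is respected; otherwise
    return a copy in which every nested field is nullable.

    Assumption: the top-level column's own nullability is left as-is.
    """
    if respect_iglu_nullability:
        return column
    return replace(
        column,
        children=tuple(_make_nullable(c) for c in column.children),
    )


# A nullable STRUCT column with non-nullable nested fields: the shape
# described above as problematic for some query engines.
column = Field(
    "my_context", "struct", True,
    (
        Field("id", "string", False),
        Field("geo", "struct", False, (Field("lat", "double", False),)),
    ),
)

relaxed = relax_nested(column, respect_iglu_nullability=False)
```

With the flag left at `true`, the sketch returns the schema untouched, which mirrors the default described above.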

@istreeter istreeter force-pushed the respect-nullabilities branch from 6d1cd92 to bf98abb Compare June 20, 2024 11:19
Some query engines dislike Iceberg tables in which a STRUCT field is
nullable and a nested field of the STRUCT is non-nullable. For example,
in Snowflake we have seen errors like "SQL execution internal error"
when the nested field contains a null.

This PR adds a config option `respectNullability`.  When set to `false`,
the Lake Loader will make all nested fields nullable. This is slightly
less ideal for data storage, but it brings back compatibility with query
engines like Snowflake.

The default value of the new config option is `true` which maintains the
behaviour of previous versions of the loader.
@istreeter istreeter force-pushed the respect-nullabilities branch from bf98abb to 65fa669 Compare July 15, 2024 11:53
@istreeter istreeter changed the base branch from main to develop July 15, 2024 11:56
@istreeter istreeter marked this pull request as ready for review July 15, 2024 11:56
@istreeter istreeter requested a review from pondzix July 15, 2024 15:41
@istreeter istreeter merged commit ba90853 into develop Jul 15, 2024
2 checks passed
@istreeter istreeter deleted the respect-nullabilities branch July 15, 2024 16:07
zhaow-de added a commit to alloy-ch/rcplus-alloy-snowplow-lake-loader that referenced this pull request Oct 4, 2024
…patch-for-alloy

* commit '7ab2edc3fd4d81ffb4d5f3285d02330def7672b1':
  Upgrade common-streams to 0.8.0-M5
  Delete files asynchronously (snowplow-incubator#82)
  Upgrade common-streams 0.8.0-M4 (snowplow-incubator#81)
  Avoid error on duplicate view name (snowplow-incubator#80)
  Add option to exit on missing Iglu schemas (snowplow-incubator#79)
  common-streams 0.8.x with refactored health monitoring (snowplow-incubator#78)
  Create table concurrently with subscribing to stream of events (snowplow-incubator#77)
  Iceberg fail fast if missing permissions on the catalog (snowplow-incubator#76)
  Make alert messages more human-readable (snowplow-incubator#75)
  Hudi loader should fail early if missing permissions on Glue catalog (snowplow-incubator#72)
  Add alert & retry for delta/s3 initialization (snowplow-incubator#74)
  Implement alerting and retrying mechanisms
  Bump aws-hudi to 1.0.0-beta2 (snowplow-incubator#71)
  Bump hudi to 0.15.0 (snowplow-incubator#70)
  Allow disregarding Iglu field's nullability when creating output columns (snowplow-incubator#66)
  Extend health probe to report unhealthy on more error scenarios (snowplow-incubator#69)
  Fix bad rows resizing (snowplow-incubator#68)
oguzhanunlu pushed a commit that referenced this pull request Nov 1, 2024
…mns (#66)