Hudi loader should fail early if missing permissions on Glue catalog #72

Merged 1 commit on Aug 2, 2024

istreeter
Collaborator

It is possible to run the Hudi Lake Loader with the Hudi option `"hoodie.datasource.hive_sync.enable": "true"` to register/sync the table to a Hive Metastore or Glue.

However, with that setting enabled, Hudi delays syncing until the first time events are committed. For our use case, it is more helpful if the loader connects to Glue/Hive during startup, so that we get an alert more quickly if the loader is missing permissions.

This PR works by making the loader add an empty commit during startup. The empty commit does not add any parquet file, but it triggers the loader to sync the table to Glue/Hive.
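
To illustrate the idea only, here is a minimal Python sketch of the fail-early pattern. It is not the loader's actual code (which is not shown in this conversation) and it does not use any Hudi, Glue, or Hive API. `FakeCatalog`, `Loader`, `PermissionDenied`, and every method name are hypothetical stand-ins. The sketch assumes that the catalog sync only happens as a side effect of a commit, which is the behaviour described above.

```python
class PermissionDenied(Exception):
    """Raised by the stand-in catalog when the caller lacks access."""


class FakeCatalog:
    """Hypothetical stand-in for Glue/Hive; not a real client."""

    def __init__(self, allowed):
        self.allowed = allowed
        self.synced = False

    def sync_table(self, table):
        # A real catalog would reject the call if permissions were missing.
        if not self.allowed:
            raise PermissionDenied(f"cannot sync table {table!r}")
        self.synced = True


class Loader:
    """Toy loader in which the catalog sync is triggered only by a commit."""

    def __init__(self, catalog, table="events"):
        self.catalog = catalog
        self.table = table
        self.data_files = []

    def commit(self, rows):
        # An empty commit writes no data file but still triggers the sync.
        if rows:
            self.data_files.append(list(rows))
        self.catalog.sync_table(self.table)

    def start(self):
        # Fail early: surface a permissions problem before any events arrive.
        self.commit([])
```

In this sketch, calling `start()` on a loader whose catalog denies access raises `PermissionDenied` immediately, rather than at the first real commit, and a successful `start()` leaves `data_files` empty.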

Contributor

@spenes spenes left a comment

LGTM 👍

@istreeter istreeter merged commit cd920c4 into develop Aug 2, 2024
2 checks passed
@istreeter istreeter deleted the hudi-loader-fail-earlier branch August 2, 2024 12:29
zhaow-de added a commit to alloy-ch/rcplus-alloy-snowplow-lake-loader that referenced this pull request Oct 4, 2024
…patch-for-alloy

* commit '7ab2edc3fd4d81ffb4d5f3285d02330def7672b1':
  Upgrade common-streams to 0.8.0-M5
  Delete files asynchronously (snowplow-incubator#82)
  Upgrade common-streams 0.8.0-M4 (snowplow-incubator#81)
  Avoid error on duplicate view name (snowplow-incubator#80)
  Add option to exit on missing Iglu schemas (snowplow-incubator#79)
  common-streams 0.8.x with refactored health monitoring (snowplow-incubator#78)
  Create table concurrently with subscribing to stream of events (snowplow-incubator#77)
  Iceberg fail fast if missing permissions on the catalog (snowplow-incubator#76)
  Make alert messages more human-readable (snowplow-incubator#75)
  Hudi loader should fail early if missing permissions on Glue catalog (snowplow-incubator#72)
  Add alert & retry for delta/s3 initialization (snowplow-incubator#74)
  Implement alerting and retrying mechanisms
  Bump aws-hudi to 1.0.0-beta2 (snowplow-incubator#71)
  Bump hudi to 0.15.0 (snowplow-incubator#70)
  Allow disregarding Iglu field's nullability when creating output columns (snowplow-incubator#66)
  Extend health probe to report unhealthy on more error scenarios (snowplow-incubator#69)
  Fix bad rows resizing (snowplow-incubator#68)
oguzhanunlu pushed a commit that referenced this pull request Nov 1, 2024
…72)