[#4364] feat(iceberg): Support GCS storage for Iceberg REST server #4627

Merged
merged 4 commits into from
Sep 9, 2024

@FANNG1 FANNG1 commented Aug 22, 2024

What changes were proposed in this pull request?

Support GCS storage for Iceberg REST server

Why are the changes needed?

Fix: #4364

Does this PR introduce any user-facing change?

No

How was this patch tested?

  1. Start the Iceberg REST server with the following config:
gravitino.iceberg-rest.warehouse = gs://xxx/test
gravitino.iceberg-rest.io-impl = org.apache.iceberg.gcp.gcs.GCSFileIO
  2. Run Spark SQL statements to create an Iceberg table.
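
The second test step can be sketched as follows. This is only an illustration, not the commands actually used for this PR: the catalog name `rest`, the endpoint `http://127.0.0.1:9001/iceberg/`, and the database and table names are assumptions, and Spark would likely also need the Iceberg runtime and GCS client jars on its classpath.

```python
import shlex

# Assumed values for illustration only; none of them come from the PR.
CATALOG = "rest"
REST_URI = "http://127.0.0.1:9001/iceberg/"


def spark_sql_command(catalog: str = CATALOG, uri: str = REST_URI) -> list[str]:
    """Build a spark-sql invocation whose catalog points at an Iceberg REST server."""
    conf = {
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        f"spark.sql.catalog.{catalog}": "org.apache.iceberg.spark.SparkCatalog",
        f"spark.sql.catalog.{catalog}.type": "rest",
        f"spark.sql.catalog.{catalog}.uri": uri,
    }
    args = ["spark-sql"]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


def smoke_test_statements(catalog: str = CATALOG) -> list[str]:
    """Return SQL statements that create, fill, and read a small Iceberg table."""
    table = f"{catalog}.gcs_db.t"  # hypothetical database and table names
    return [
        f"CREATE DATABASE IF NOT EXISTS {catalog}.gcs_db",
        f"CREATE TABLE IF NOT EXISTS {table} (id BIGINT, name STRING) USING iceberg",
        f"INSERT INTO {table} VALUES (1, 'a')",
        f"SELECT * FROM {table}",
    ]


if __name__ == "__main__":
    # Print the command line and the statements instead of running Spark.
    print(shlex.join(spark_sql_command()))
    for statement in smoke_test_statements():
        print(statement + ";")
```

If the table is created successfully, its data and metadata files should appear under the `gs://xxx/test` warehouse configured in step 1.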

@FANNG1 FANNG1 marked this pull request as draft August 22, 2024 11:40
@FANNG1 FANNG1 marked this pull request as ready for review September 9, 2024 02:38
FANNG1 commented Sep 9, 2024

@jerryshao @yuqi1129 @diqiu50 please help to review when you are free, thanks

@@ -161,6 +161,7 @@ iceberg-aws = { group = "org.apache.iceberg", name = "iceberg-aws", version.ref
iceberg-core = { group = "org.apache.iceberg", name = "iceberg-core", version.ref = "iceberg" }
iceberg-api = { group = "org.apache.iceberg", name = "iceberg-api", version.ref = "iceberg" }
iceberg-hive-metastore = { group = "org.apache.iceberg", name = "iceberg-hive-metastore", version.ref = "iceberg" }
iceberg-gcp = { group = "org.apache.iceberg", name = "iceberg-gcp", version.ref = "iceberg" }
Contributor

Should this be added? If users want to use it, they can add the jars to the classpath themselves.

Contributor Author

GCS is a commonly used storage service, and the Iceberg GCP package jar is lightweight because it does not include the GCP dependencies, so I prefer to keep it in Gravitino.

@jerryshao
Contributor

I'm OK with the change here, but adding support for cloud storage services one by one is not a good design. We should figure out a framework that lets users add them.


For other Iceberg GCS properties not managed by Gravitino like `gcs.project-id`, you could config it directly by `gravitino.bypass.gcs.project-id`.

Please make sure the credential file is accessible by Gravitino, like using `export GOOGLE_APPLICATION_CREDENTIALS=/xx/application_default_credentials.json` before Gravitino server is started.
Contributor

Does GCS need configurations such as an access key?

Contributor Author

The GCS access key and secret key are encoded in application_default_credentials.json.
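
Since the credentials come from a file rather than from Gravitino properties, a quick check before starting the server can catch a missing or unreadable file early. The sketch below is a hypothetical helper, not part of Gravitino; it assumes the file is a JSON object and only reports its `type` field if one is present.

```python
import json
import os
from pathlib import Path


def check_gcp_credentials(env=os.environ) -> str:
    """Return the credential type if the configured credential file is usable.

    Raises RuntimeError when GOOGLE_APPLICATION_CREDENTIALS is unset or the
    file it names is missing, unreadable, or not a JSON object.
    """
    path_value = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path_value:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")

    path = Path(path_value)
    if not path.is_file():
        raise RuntimeError(f"credential file not found: {path}")
    if not os.access(path, os.R_OK):
        raise RuntimeError(f"credential file is not readable: {path}")

    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        raise RuntimeError(f"credential file is not valid JSON: {path}") from exc
    if not isinstance(data, dict):
        raise RuntimeError(f"credential file is not a JSON object: {path}")

    # Report the "type" field when present; fall back to "unknown" otherwise.
    return str(data.get("type", "unknown"))


if __name__ == "__main__":
    try:
        print("credential type:", check_gcp_credentials())
    except RuntimeError as err:
        print("credential check failed:", err)
```

Running this as the same OS user that starts the Gravitino server gives the most useful result, because file permissions are checked for the current user.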

FANNG1 commented Sep 9, 2024

I'm OK with the change here, but adding support for cloud storage services one by one is not a good design. We should figure out a framework that lets users add them.

For the more common storage services, we could define properties explicitly; for the others, we can use #4896 to support them.

@jerryshao jerryshao merged commit 738cb6c into apache:main Sep 9, 2024
27 checks passed
Successfully merging this pull request may close these issues.

[Subtask] add Google Cloud Storage for IcebergRESTService
3 participants