
[#4895] docs(iceberg): add document for support not managed storages for Iceberg #4896

Merged
merged 5 commits into apache:main on Sep 12, 2024

Conversation

Contributor

@FANNG1 FANNG1 commented Sep 9, 2024

What changes were proposed in this pull request?

For other storages not built in with Gravitino, we should add a document about how to use them.

Why are the changes needed?

Fix: #4895

Does this PR introduce any user-facing change?

no

How was this patch tested?

existing tests

@FANNG1 FANNG1 changed the title [#4895] docs(iceberg): add document for support other storages for Iceberg [#4895] docs(iceberg): add document for support not managed storages for Iceberg Sep 9, 2024
| Configuration item               | Description                                                                         | Default value | Required | Since Version |
|----------------------------------|-------------------------------------------------------------------------------------|---------------|----------|---------------|
| `gravitino.iceberg-rest.io-impl` | The IO implementation for `FileIO` in Iceberg; use the fully qualified class name.  | (none)        | No       | 0.6.0         |

For other custom properties, such as a `security-token` to pass to `FileIO`, you can configure them directly under the same prefix, for example `gravitino.iceberg-rest.security-token`.
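For instance, a minimal sketch of the server configuration could look like the following; the `S3FileIO` class and the `security-token` key are only illustrative placeholders, not required values:

```
# Hypothetical fragment of the Gravitino Iceberg REST server configuration.
# Point io-impl at the fully qualified class name of the FileIO you need.
gravitino.iceberg-rest.io-impl = org.apache.iceberg.aws.s3.S3FileIO
# Custom keys under the gravitino.iceberg-rest. prefix are passed on to FileIO,
# here an illustrative security-token value.
gravitino.iceberg-rest.security-token = <your-security-token>
```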
Contributor

Can you explain more here? For example, if user has a custom FileIO implementation called "A", then set configuration like "gravitino.iceberg-rest.xxx", how does this "A" know this configuration "xxx"?

Contributor Author

updated

@@ -321,7 +332,7 @@ For example, we can configure Spark catalog options to use Gravitino Iceberg RES
--conf spark.sql.catalog.rest.uri=http://127.0.0.1:9001/iceberg/
```

You may need to adjust the Iceberg Spark runtime jar file name according to the real version number in your environment. If you want to access the data stored in S3, you need to download [Iceberg AWS bundle](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-aws-bundle) jar and place it in the classpath of Spark, no extra config is needed because S3 related properties is transferred from Iceberg REST server to Iceberg REST client automaticly.
You may need to adjust the Iceberg Spark runtime jar file name according to the real version number in your environment. If you want to access data stored in S3, you need to download the [Iceberg AWS bundle](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-aws-bundle) jar and place it in the classpath of Spark; no extra config is needed because S3-related properties are transferred from the Iceberg REST server to the Iceberg REST client automatically. For other storages not managed by Gravitino, you can specify the configuration explicitly to initialize the `FileIO` implementation, like `spark.sql.catalog.${catalog_name}.${configuration_key}`.
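As a hedged sketch, the extra options for the `rest` catalog above could look like this; the `S3FileIO` class and the `security-token` key are placeholders for whatever your `FileIO` implementation actually expects:

```shell
# Hypothetical extra Spark options for the catalog named `rest`
--conf spark.sql.catalog.rest.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
--conf spark.sql.catalog.rest.security-token=<your-security-token>
```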
Contributor

Is it only for S3? How about other storages supported by Gravitino, and others not directly supported by us?

Can you please describe more, and write them down as a user? I don't think users can handle this with such simple words, at least for me.

Contributor Author

updated


#### Other storages

For storages that are not inherently integrated into Gravitino Iceberg REST service, you can manage them effectively through custom catalog properties.
Contributor

What's the meaning of "inherently integrated"?

Contributor Author

updated

- HDFS
- S3
- OSS
- Supports diverse storage like `S3`, `HDFS`, `OSS`, and provides the capability to support other storages.
Contributor
@jerryshao jerryshao Sep 11, 2024

I believe there's gcs, right? Please list all the supported cloud storage.

Contributor

"Supports different cloud storages...: "

Contributor Author

Using "Supports different storages", because HDFS is not cloud storage.

@@ -337,7 +348,7 @@ For example, we can configure Spark catalog options to use Gravitino Iceberg RES
--conf spark.sql.catalog.rest.uri=http://127.0.0.1:9001/iceberg/
```

You may need to adjust the Iceberg Spark runtime jar file name according to the real version number in your environment. If you want to access the data stored in cloud, you need to download corresponding jars (please refer to the cloud storage part) and place it in the classpath of Spark, no extra config is needed because related properties is transferred from Iceberg REST server to Iceberg REST client automatically.
You may need to adjust the Iceberg Spark runtime jar file name according to the real version number in your environment. If you want to access data stored in the cloud, you need to download the corresponding jars (please refer to the cloud storage part) and place them in the classpath of Spark; no extra config is needed because related properties are transferred from the Iceberg REST server to the Iceberg REST client automatically.

For other storages not managed by Gravitino, the properties won't be transferred from the server to the client automatically. If you want to pass custom properties to initialize `FileIO`, you can add them as `spark.sql.catalog.${iceberg_catalog_name}.${configuration_key}` = `{property_value}`.
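For example, a hedged sketch using `spark-defaults.conf` style entries for a catalog named `rest`, with `security-token` standing in for whatever custom key your `FileIO` implementation expects:

```
# Hypothetical spark-defaults.conf entries for the catalog named `rest`
spark.sql.catalog.rest.io-impl          org.apache.iceberg.aws.s3.S3FileIO
spark.sql.catalog.rest.security-token   <your-security-token>
```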
Contributor

Can you please split this long sentence into several paragraphs to make it clearer?

Contributor Author

done

@@ -119,6 +119,20 @@ Please make sure the credential file is accessible by Gravitino, like using `exp
Please set `warehouse` to `gs://{bucket_name}/${prefix_name}`, and download [Iceberg gcp bundle jar](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-gcp-bundle) and place it to `catalogs/lakehouse-iceberg/libs/`.
:::

#### Other storages

For other storages that are managed by Gravitino directly, you can manage them through custom catalog properties.
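As a hedged sketch, such catalog properties might look like the following; `io-impl` is assumed here to mirror the REST server property described earlier (check the catalog property table for the exact key), and `security-token` is only an illustrative custom key:

```
# Hypothetical custom catalog properties for a lakehouse-iceberg catalog
io-impl        = org.apache.iceberg.aws.s3.S3FileIO
security-token = <your-security-token>
```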
Contributor

“that are not managed...”

Contributor Author

done

@@ -162,6 +159,20 @@ You should place HDFS configuration file to the classpath of the Iceberg REST se
Builds with Hadoop 2.10.x. There may be compatibility issues when accessing Hadoop 3.x clusters.
:::

#### Other storages

For other storages that are managed by Gravitino directly, you can manage them through custom catalog properties.
Contributor

"are not managed by..."

Contributor Author

done

@jerryshao jerryshao merged commit 55ad5fd into apache:main Sep 12, 2024
19 checks passed
Successfully merging this pull request may close these issues.

[Subtask] add document for support other storages for Iceberg