[#4895] docs(iceberg): add document for support not managed storages for Iceberg #4896
Conversation
docs/iceberg-rest-service.md
Outdated
|----------------------------------|--------------------------------------------------------------------------------------------|---------------|----------|---------------|
| `gravitino.iceberg-rest.io-impl` | The IO implementation for `FileIO` in Iceberg, please use the fully qualified class name. | (none) | No | 0.6.0 |

For other custom properties like `security-token` to pass to `FileIO`, you could configure it directly by `gravitino.iceberg-rest.security-token`.
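For illustration, here is a minimal sketch of what the server-side settings described above could look like in the Iceberg REST server configuration; the `S3FileIO` class and the token value are placeholders for this example, not values taken from the PR:

```
# Fully qualified FileIO implementation class (placeholder example)
gravitino.iceberg-rest.io-impl = org.apache.iceberg.aws.s3.S3FileIO
# A custom property: per the snippet above, setting it with the
# gravitino.iceberg-rest. prefix passes security-token on to the FileIO
gravitino.iceberg-rest.security-token = <your-token-value>
```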
Can you explain more here? For example, if a user has a custom `FileIO` implementation called "A" and sets a configuration like "gravitino.iceberg-rest.xxx", how does this "A" know about this configuration "xxx"?
updated
docs/iceberg-rest-service.md
Outdated
@@ -321,7 +332,7 @@ For example, we can configure Spark catalog options to use Gravitino Iceberg RES
--conf spark.sql.catalog.rest.uri=http://127.0.0.1:9001/iceberg/
```

You may need to adjust the Iceberg Spark runtime jar file name according to the real version number in your environment. If you want to access the data stored in S3, you need to download the [Iceberg AWS bundle](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-aws-bundle) jar and place it in the classpath of Spark; no extra config is needed because S3-related properties are transferred from the Iceberg REST server to the Iceberg REST client automatically.
You may need to adjust the Iceberg Spark runtime jar file name according to the real version number in your environment. If you want to access the data stored in S3, you need to download the [Iceberg AWS bundle](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-aws-bundle) jar and place it in the classpath of Spark; no extra config is needed because S3-related properties are transferred from the Iceberg REST server to the Iceberg REST client automatically. For other storages not managed by Gravitino, you could specify the configuration explicitly to initialize the `FileIO` implementation, like `spark.sql.catalog.${catalog_name}.${configuration_key}`.
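As a quick sketch of the `spark.sql.catalog.${catalog_name}.${configuration_key}` pattern mentioned above, assuming a catalog registered as `rest` and a hypothetical custom `FileIO` class:

```
# spark.sql.catalog.<catalog_name>.<configuration_key>=<property_value>
--conf spark.sql.catalog.rest.io-impl=org.example.CustomFileIO
```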
Is it only for S3? How about other storages supported by Gravitino, and others not directly supported by us?
Can you please describe this in more detail, and write it down from a user's perspective? I don't think users can handle this with such brief wording, at least I couldn't.
updated
docs/lakehouse-iceberg-catalog.md
Outdated
#### Other storages

For storages that are not inherently integrated into Gravitino Iceberg REST service, you can manage them effectively through custom catalog properties.
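A hedged sketch of what such custom catalog properties could look like; the property names below (apart from Iceberg's standard `io-impl`) are hypothetical and depend on the `FileIO` implementation in use:

```
# Custom catalog properties passed through to the FileIO (illustrative values)
io-impl = org.example.CustomFileIO
custom-endpoint = https://storage.example.com
custom-token = <token-value>
```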
What's the meaning of "inherently integrated"?
updated
4c7241e to abfa7f5
docs/iceberg-rest-service.md
Outdated
- HDFS
- S3
- OSS
- Supports diverse storage like `S3`, `HDFS`, `OSS`, and provides the capability to support other storages.
I believe there's GCS, right? Please list all the supported cloud storages.
"Supports different cloud storages...: "
Using "Supports different storages", because HDFS is not a cloud storage.
docs/iceberg-rest-service.md
Outdated
@@ -337,7 +348,7 @@ For example, we can configure Spark catalog options to use Gravitino Iceberg RES
--conf spark.sql.catalog.rest.uri=http://127.0.0.1:9001/iceberg/
```

You may need to adjust the Iceberg Spark runtime jar file name according to the real version number in your environment. If you want to access the data stored in the cloud, you need to download the corresponding jars (please refer to the cloud storage part) and place them in the classpath of Spark; no extra config is needed because the related properties are transferred from the Iceberg REST server to the Iceberg REST client automatically.
You may need to adjust the Iceberg Spark runtime jar file name according to the real version number in your environment. If you want to access the data stored in the cloud, you need to download the corresponding jars (please refer to the cloud storage part) and place them in the classpath of Spark; no extra config is needed because the related properties are transferred from the Iceberg REST server to the Iceberg REST client automatically. For other storages not managed by Gravitino, the properties won't be transferred from the server to the client automatically; if you want to pass custom properties to initialize `FileIO`, you could add them via `spark.sql.catalog.${iceberg_catalog_name}.${configuration_key}` = `{property_value}`.
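To make the split between managed and non-managed storages concrete, here is a sketch for a hypothetical storage whose `FileIO` reads a `custom-token` property. Because the storage is not managed by Gravitino, the property is not forwarded automatically, so it typically has to be set explicitly on the Spark side as well; all class and property names below are illustrative:

```
# Server side: Gravitino Iceberg REST service configuration
gravitino.iceberg-rest.io-impl = org.example.CustomFileIO
gravitino.iceberg-rest.custom-token = <token-value>

# Client side: Spark catalog named `rest` (not transferred automatically)
--conf spark.sql.catalog.rest.io-impl=org.example.CustomFileIO \
--conf spark.sql.catalog.rest.custom-token=<token-value>
```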
Can you please split this long sentence into several paragraphs to make it clearer?
done
docs/lakehouse-iceberg-catalog.md
Outdated
@@ -119,6 +119,20 @@ Please make sure the credential file is accessible by Gravitino, like using `exp
Please set `warehouse` to `gs://{bucket_name}/${prefix_name}`, and download the [Iceberg gcp bundle jar](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-gcp-bundle) and place it in `catalogs/lakehouse-iceberg/libs/`.
:::

#### Other storages

For other storages that are managed by Gravitino directly, you can manage them through custom catalog properties.
“that are not managed...”
done
docs/iceberg-rest-service.md
Outdated
@@ -162,6 +159,20 @@ You should place HDFS configuration file to the classpath of the Iceberg REST se
Builds with Hadoop 2.10.x. There may be compatibility issues when accessing Hadoop 3.x clusters.
:::

#### Other storages

For other storages that are managed by Gravitino directly, you can manage them through custom catalog properties.
"are not managed by..."
done
What changes were proposed in this pull request?
For other storages not built into Gravitino, we should add a document about how to use them.
Why are the changes needed?
Fix: #4895
Does this PR introduce any user-facing change?
no
How was this patch tested?
existing tests