[#4938]feat(lakehouse-paimon): Support S3 filesystem for Paimon catalog. #4939

Merged (24 commits) on Oct 10, 2024

yuqi1129
Contributor

What changes were proposed in this pull request?

Add support for Paimon S3 filesystem.

Note: related documents will be added in another PR.

Why are the changes needed?

For a better user experience: a Paimon catalog can then use S3 as its filesystem.

Fix: #4938

Does this PR introduce any user-facing change?

N/A

How was this patch tested?

Tested locally and with integration tests (IT).

@yuqi1129 yuqi1129 self-assigned this Sep 13, 2024
@yuqi1129 yuqi1129 requested review from caican00 and FANNG1 September 14, 2024 02:48
@@ -61,6 +62,12 @@ public class PaimonCatalogPropertiesMetadata extends BaseCatalogPropertiesMetada
AuthenticationConfig.AUTH_TYPE_KEY,
AuthenticationConfig.AUTH_TYPE_KEY);

private static final Map<String, String> S3_CONFIGURATION =
ImmutableMap.of(
PaimonS3FileSystemConfig.S3_ACCESS_KEY, PaimonS3FileSystemConfig.S3_ACCESS_KEY,
Contributor

I would prefer to use the properties defined in S3Properties, to unify the storage configuration for Gravitino.

Contributor Author

Please see #4939 (comment)

@Override
protected void startNecessaryContainers() {
localStackContainer =
new LocalStackContainer(DockerImageName.parse("localstack/localstack")).withServices(S3);
Contributor

Is there any reason to use LocalStack for the test?

Contributor Author

Do you have a suggestion for which S3 simulator to use instead?

LOCAL_FILE,
HDFS,
S3,
OSS;
Collaborator

If we do not plan to support OSS in this PR, it would be better to remove it for now to avoid confusing other users.

Contributor Author

@FANNG1 Do you have any other suggestions?

Contributor

It seems reasonable to remove OSS, since it's not supported.

import org.apache.gravitino.config.ConfigEntry;
import org.apache.gravitino.connector.PropertyEntry;

public class PaimonS3FileSystemConfig extends Config {
Collaborator

I wonder whether it is possible to merge these with the equivalent Iceberg configurations to provide a common configuration.

Contributor Author

The keys for S3 in Iceberg are not the same as those here. Please see #4939 (comment)

public enum FileSystemType {
LOCAL_FILE,
HDFS,
S3,
Collaborator

Is rename in S3 an atomic operation?
FilesystemCatalog depends on atomic rename to avoid commit conflicts.

Contributor Author

Is rename in S3 an atomic operation? FilesystemCatalog depends on atomic rename to avoid commit conflicts.

I don't know much about this. Let me verify it.

@yuqi1129 yuqi1129 requested a review from FANNG1 September 24, 2024 13:46
@yuqi1129
Contributor Author

@FANNG1
Please take another look.

}

public static FileSystemType fromStoragePath(String storagePath) {
if (storagePath.startsWith("s3://")) {
Contributor

s3a?

Contributor Author

@yuqi1129 yuqi1129 Oct 9, 2024

Let me check this for Paimon. The value is s3:// in the example shown at https://paimon.apache.org/docs/0.8/filesystems/s3/

Contributor Author

Is the value s3a for Iceberg S3 storage?

}

public static FileSystemType fromStoragePath(String storagePath) {
if (storagePath.startsWith("s3://")) {
Contributor

Should we consider uppercase schemes like S3://?

Contributor Author

HDFS only supports the lowercase scheme 'hdfs'. If we use HDFS://xxx, it fails with:

Caused by: MetaException(message:Got exception: java.io.IOException No FileSystem for scheme: HDFS)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_database_result$create_database_resultStandardScheme.read(ThriftHiveMetastore.java:26660)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_database_result$create_database_resultStandardScheme.read(ThriftHiveMetastore.java:26628)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_database_result.read(ThriftHiveMetastore.java:26562)
	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:88)
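
The two review questions on `fromStoragePath` (whether to accept `s3a://` and whether to accept uppercase schemes) can be compared with a minimal sketch. Only the `startsWith("s3://")` check comes from the diff. The enum name `StorageKind`, the `strict` and `lenient` method names, the `hdfs://` branch, and the fallback to `LOCAL_FILE` for unrecognized paths are assumptions made for illustration, not the PR's implementation.

```java
// Hypothetical stand-in for the FileSystemType enum under review (OSS omitted).
enum StorageKind {
  LOCAL_FILE,
  HDFS,
  S3;

  // Strict variant: only the lowercase "s3://" prefix counts as S3, as in the diff.
  // Assumption of this sketch: "hdfs://" maps to HDFS, anything else to LOCAL_FILE.
  static StorageKind strict(String storagePath) {
    if (storagePath.startsWith("s3://")) {
      return S3;
    }
    if (storagePath.startsWith("hdfs://")) {
      return HDFS;
    }
    return LOCAL_FILE;
  }

  // Lenient variant: ignores case and also accepts the "s3a://" scheme.
  static StorageKind lenient(String storagePath) {
    if (hasPrefix(storagePath, "s3://") || hasPrefix(storagePath, "s3a://")) {
      return S3;
    }
    if (hasPrefix(storagePath, "hdfs://")) {
      return HDFS;
    }
    return LOCAL_FILE;
  }

  // Case-insensitive equivalent of String.startsWith.
  private static boolean hasPrefix(String value, String prefix) {
    return value.regionMatches(true, 0, prefix, 0, prefix.length());
  }

  public static void main(String[] args) {
    String[] paths = {"s3://bucket/wh", "S3://bucket/wh", "s3a://bucket/wh", "/tmp/wh"};
    for (String path : paths) {
      System.out.println(path + " strict=" + strict(path) + " lenient=" + lenient(path));
    }
  }
}
```

Lenient detection alone would not make an uppercase path usable: as the stack trace above shows, the underlying Hadoop filesystem lookup rejects the scheme `HDFS`, so a path classified leniently here could still fail further down.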

ImmutableMap.of(
S3Properties.GRAVITINO_S3_ACCESS_KEY_ID, S3_ACCESS_KEY,
S3Properties.GRAVITINO_S3_SECRET_ACCESS_KEY, S3_SECRET_KEY,
S3Properties.GRAVITINO_S3_ENDPOINT, S3_ENDPOINT);
Contributor

Should we consider the S3 region?

Contributor Author

According to the Paimon documentation, the region is not needed.
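
The mapping in the diff above translates Gravitino-side S3 property names into the keys Paimon expects. A minimal sketch of that translation follows. The literal key strings, the class name `S3PropertyMappingSketch`, the method `toPaimonProperties`, and the choice to drop unmapped keys are all assumptions for illustration; the diff shows only constant names, not their values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of a Gravitino-to-Paimon S3 property mapping.
final class S3PropertyMappingSketch {

  // Assumed literal values for the constants referenced in the diff.
  // No region entry, following the review thread above.
  static final Map<String, String> GRAVITINO_TO_PAIMON =
      Map.of(
          "s3-access-key-id", "s3.access-key",
          "s3-secret-access-key", "s3.secret-key",
          "s3-endpoint", "s3.endpoint");

  // Renames the known S3 keys; keys without a mapping are dropped in this sketch.
  static Map<String, String> toPaimonProperties(Map<String, String> catalogProperties) {
    Map<String, String> result = new LinkedHashMap<>();
    catalogProperties.forEach(
        (key, value) -> {
          String paimonKey = GRAVITINO_TO_PAIMON.get(key);
          if (paimonKey != null) {
            result.put(paimonKey, value);
          }
        });
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> catalogProperties =
        Map.of("s3-endpoint", "http://localhost:4566", "s3-access-key-id", "example-key");
    System.out.println(toPaimonProperties(catalogProperties));
  }
}
```

Keeping the translation in a single map means the user-facing names can follow Gravitino's shared conventions while the Paimon-specific keys stay an internal detail, which is the unification the first review comment asks for.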

| `authentication.kerberos.keytab-uri` | The URI of The keytab for the Kerberos authentication. | (none) | required if the value of `authentication.type` is Kerberos. | 0.6.0 |
| `authentication.kerberos.check-interval-sec` | The check interval of Kerberos credential for Paimon catalog. | 60 | No | 0.6.0 |
| `authentication.kerberos.keytab-fetch-timeout-sec` | The fetch timeout of retrieving Kerberos keytab from `authentication.kerberos.keytab-uri`. | 60 | No | 0.6.0 |
| `s3.endpoint` | The endpoint of the AWS s3. | (none) | required if the value of `warehouse` is a S3 path | 0.7.0 |
Contributor

Is the S3 property name correct?

Contributor Author

Fixed

@yuqi1129 yuqi1129 requested a review from FANNG1 October 9, 2024 06:37
@yuqi1129 yuqi1129 closed this Oct 9, 2024
@yuqi1129 yuqi1129 reopened this Oct 9, 2024
*/
package org.apache.gravitino.catalog.lakehouse.paimon.filesystem;

public enum FileSystemType {
Contributor

Please remove this class if it is not used.

@FANNG1 FANNG1 merged commit f753afa into apache:main Oct 10, 2024
26 checks passed
Development

Successfully merging this pull request may close these issues.

[FEATURE] Support S3 filesystem in Paimon catalog
4 participants