[#5585] improvement(bundles): Refactor bundle jars and provide core jars that do not contain hadoop-{aws,gcp,aliyun,azure} #5806

Merged
merged 39 commits into from
Dec 27, 2024
Changes from 25 commits
6be4e69
Refactor bundle jars and provide mini jars that does not contains had…
yuqi1129 Dec 9, 2024
e854f70
Fix build error.
yuqi1129 Dec 10, 2024
32be67a
fix
yuqi1129 Dec 10, 2024
b69a793
fix
yuqi1129 Dec 10, 2024
86e8c2e
fix
yuqi1129 Dec 11, 2024
4c04317
fix
yuqi1129 Dec 11, 2024
3abc788
Merge branch 'main' of github.com:apache/gravitino into fix_bundles_j…
yuqi1129 Dec 11, 2024
9a87b87
Merge branch 'main' of github.com:datastrato/graviton into fix_bundle…
yuqi1129 Dec 16, 2024
94179fa
Fix
yuqi1129 Dec 16, 2024
96b69e2
Merge branch 'main' of github.com:datastrato/graviton into fix_bundle…
yuqi1129 Dec 16, 2024
9d527cb
Fix
yuqi1129 Dec 16, 2024
cc65f17
Fix
yuqi1129 Dec 16, 2024
92b4539
Fix
yuqi1129 Dec 17, 2024
b876977
Fix
yuqi1129 Dec 17, 2024
0585df5
Merge branch 'main' of github.com:datastrato/graviton into fix_bundle…
yuqi1129 Dec 17, 2024
dc085ae
Fix
yuqi1129 Dec 17, 2024
6a0e87b
Fix
yuqi1129 Dec 17, 2024
b63a84e
Merge branch 'main' of github.com:datastrato/graviton into fix_bundle…
yuqi1129 Dec 19, 2024
e5a2083
fix
yuqi1129 Dec 19, 2024
df4884e
Merge branch 'main' of github.com:datastrato/graviton into fix_bundle…
yuqi1129 Dec 23, 2024
a78e4e8
optimize
yuqi1129 Dec 24, 2024
b7ca512
Rename bundle modules
yuqi1129 Dec 25, 2024
1f58642
fix
yuqi1129 Dec 25, 2024
742982c
polish again
yuqi1129 Dec 25, 2024
8c4386e
fix a minor mistake in docs
yuqi1129 Dec 25, 2024
9d0f57b
Merge branch 'main' of github.com:apache/gravitino into fix_bundles_j…
yuqi1129 Dec 25, 2024
258a328
Fix CI error.
yuqi1129 Dec 25, 2024
44e5807
Fix again.
yuqi1129 Dec 25, 2024
c4eac38
Fix again.
yuqi1129 Dec 25, 2024
bbefd00
Remove the newly added document as suggested.
yuqi1129 Dec 26, 2024
a9c1a29
Merge branch 'main' of github.com:datastrato/graviton into fix_bundle…
yuqi1129 Dec 26, 2024
1ee1dd1
fix docs
yuqi1129 Dec 26, 2024
1923707
fix
yuqi1129 Dec 26, 2024
3c6e20c
fix
yuqi1129 Dec 26, 2024
ac4b815
Fix again.
yuqi1129 Dec 27, 2024
d62f4c5
fix
yuqi1129 Dec 27, 2024
1e7abe4
fix
yuqi1129 Dec 27, 2024
af4bdd4
fix docs again
yuqi1129 Dec 27, 2024
0d7baf5
Fix a silly mistake in doc description.
yuqi1129 Dec 27, 2024
9 changes: 4 additions & 5 deletions authorizations/authorization-ranger/build.gradle.kts
Original file line number Diff line number Diff line change
Expand Up @@ -90,11 +90,10 @@ dependencies {
testImplementation("org.apache.kyuubi:kyuubi-spark-authz-shaded_$scalaVersion:$kyuubiVersion") {
exclude("com.sun.jersey")
}
testImplementation(libs.hadoop3.client)
testImplementation(libs.hadoop3.common) {
exclude("com.sun.jersey")
exclude("javax.servlet", "servlet-api")
}

testImplementation(libs.hadoop3.client.api)
testImplementation(libs.hadoop3.client.runtime)

testImplementation(libs.hadoop3.hdfs) {
exclude("com.sun.jersey")
exclude("javax.servlet", "servlet-api")
Expand Down
7 changes: 5 additions & 2 deletions build.gradle.kts
Original file line number Diff line number Diff line change
Expand Up @@ -779,7 +779,9 @@ tasks {
!it.name.startsWith("client") && !it.name.startsWith("filesystem") && !it.name.startsWith("spark") && !it.name.startsWith("iceberg") && it.name != "trino-connector" &&
it.name != "integration-test" && it.name != "bundled-catalog" && !it.name.startsWith("flink") &&
it.name != "integration-test" && it.name != "hive-metastore-common" && !it.name.startsWith("flink") &&
it.name != "gcp-bundle" && it.name != "aliyun-bundle" && it.name != "aws-bundle" && it.name != "azure-bundle" && it.name != "hadoop-common"
it.name != "gcp-bundle" && it.name != "aliyun-bundle" && it.name != "aws-bundle" && it.name != "azure-bundle" &&
it.name != "aws-hadoop-bundle" && it.name != "gcp-hadoop-bundle" && it.name != "azure-hadoop-bundle" && it.name != "aliyun-hadoop-bundle" &&
Contributor

In theory, no module under bundles needs its dependencies copied.

Can we generalize this condition so that sub-modules added under bundles later do not require changing it?

Contributor Author

Should we use endsWith("bundle") or contains("bundle")?

Contributor

How about it.parent?.name != "bundles"?

Contributor Author

Okay, I will use that.

it.name != "hadoop-common"
) {
from(it.configurations.runtimeClasspath)
into("distribution/package/libs")
Expand All @@ -802,7 +804,8 @@ tasks {
it.name != "bundled-catalog" &&
it.name != "hive-metastore-common" && it.name != "gcp-bundle" &&
it.name != "aliyun-bundle" && it.name != "aws-bundle" && it.name != "azure-bundle" &&
it.name != "hadoop-common" && it.name != "docs"
it.name != "docs" && it.name != "aws-hadoop-bundle" && it.name != "gcp-hadoop-bundle" && it.name != "azure-hadoop-bundle" && it.name != "aliyun-hadoop-bundle" &&
it.name != "hadoop-common"
) {
dependsOn("${it.name}:build")
from("${it.name}/build/libs")
Expand Down
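The review thread above compares listing every bundle module by name with excluding anything whose parent project is named "bundles". The following is a minimal plain-Java sketch of that difference, not the Gradle build logic itself: `Module`, `copiesDepsByName`, and `copiesDepsByParent` are hypothetical names introduced only for illustration, and the parent name "bundles" is assumed from project paths such as `:bundles:aws-bundle`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class BundleFilterSketch {

  // Hypothetical stand-in for a Gradle subproject (not a Gradle type):
  // the module's own name plus the name of its parent, or null at the top level.
  static final class Module {
    final String name;
    final String parentName;

    Module(String name, String parentName) {
      this.name = name;
      this.parentName = parentName;
    }
  }

  // Name-based rule: every bundle has to be listed explicitly.
  static final List<String> EXCLUDED_NAMES =
      Arrays.asList("aws-bundle", "gcp-bundle", "azure-bundle", "aliyun-bundle");

  static boolean copiesDepsByName(Module m) {
    return !EXCLUDED_NAMES.contains(m.name);
  }

  // Parent-based rule: anything directly under "bundles" is skipped.
  static boolean copiesDepsByParent(Module m) {
    return !"bundles".equals(m.parentName);
  }

  // Returns the names of the modules the given rule would copy dependencies for.
  static List<String> selected(List<Module> modules, Predicate<Module> rule) {
    List<String> out = new ArrayList<>();
    for (Module m : modules) {
      if (rule.test(m)) {
        out.add(m.name);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Module> modules =
        Arrays.asList(
            new Module("core", null),
            new Module("aws-bundle", "bundles"),
            new Module("aws-hadoop-bundle", "bundles"));

    // The name list was written before aws-hadoop-bundle existed, so it slips through.
    System.out.println(selected(modules, BundleFilterSketch::copiesDepsByName));
    // prints [core, aws-hadoop-bundle]

    // The parent check covers the new module without any change.
    System.out.println(selected(modules, BundleFilterSketch::copiesDepsByParent));
    // prints [core]
  }
}
```

The parent-based rule trades an explicit list for a structural convention: it stays correct only as long as every bundle module lives directly under the bundles project.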
39 changes: 26 additions & 13 deletions bundles/aliyun-bundle/build.gradle.kts
Original file line number Diff line number Diff line change
Expand Up @@ -26,31 +26,40 @@ plugins {

dependencies {
compileOnly(project(":api"))
compileOnly(project(":core"))
compileOnly(project(":catalogs:catalog-common"))
compileOnly(project(":catalogs:catalog-hadoop"))
compileOnly(project(":catalogs:hadoop-common")) {
compileOnly(project(":core"))
compileOnly(libs.hadoop3.client.api)
compileOnly(libs.hadoop3.client.runtime)
compileOnly(libs.hadoop3.oss)

implementation(project(":catalogs:catalog-common")) {
exclude("*")
}
implementation(project(":catalogs:hadoop-common")) {
exclude("*")
}
compileOnly(libs.hadoop3.common)

implementation(libs.aliyun.credentials.sdk)
implementation(libs.hadoop3.oss)

// Aliyun oss SDK depends on this package, and JDK >= 9 requires manual add
// https://www.alibabacloud.com/help/en/oss/developer-reference/java-installation?spm=a2c63.p38356.0.i1
implementation(libs.sun.activation)
implementation(libs.commons.collections3)

// oss needs StringUtils from commons-lang3 or the following error will occur in 3.3.0
// java.lang.NoClassDefFoundError: org/apache/commons/lang3/StringUtils
// org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystemStore.initialize(AliyunOSSFileSystemStore.java:111)
// org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem.initialize(AliyunOSSFileSystem.java:323)
// org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3611)
implementation(libs.commons.lang3)
implementation(libs.guava)

implementation(project(":catalogs:catalog-common")) {
exclude("*")
}
implementation(libs.httpclient)
implementation(libs.jackson.databind)
implementation(libs.jackson.annotations)
implementation(libs.jackson.datatype.jdk8)
implementation(libs.jackson.datatype.jsr310)

// Aliyun oss SDK depends on this package, and JDK >= 9 requires manual add
// https://www.alibabacloud.com/help/en/oss/developer-reference/java-installation?spm=a2c63.p38356.0.i1
implementation(libs.sun.activation)
}

tasks.withType(ShadowJar::class.java) {
Expand All @@ -60,8 +69,12 @@ tasks.withType(ShadowJar::class.java) {
mergeServiceFiles()

// Relocate dependencies to avoid conflicts
relocate("org.jdom", "org.apache.gravitino.shaded.org.jdom")
relocate("org.apache.commons.lang3", "org.apache.gravitino.shaded.org.apache.commons.lang3")
relocate("org.jdom", "org.apache.gravitino.aliyun.shaded.org.jdom")
relocate("org.apache.commons.lang3", "org.apache.gravitino.aliyun.shaded.org.apache.commons.lang3")
relocate("com.fasterxml.jackson", "org.apache.gravitino.aliyun.shaded.com.fasterxml.jackson")
relocate("com.google.common", "org.apache.gravitino.aliyun.shaded.com.google.common")
relocate("org.apache.http", "org.apache.gravitino.aliyun.shaded.org.apache.http")
relocate("org.apache.commons.collections", "org.apache.gravitino.aliyun.shaded.org.apache.commons.collections")
}

tasks.jar {
Expand Down
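The relocate rules in this diff move third-party packages under a bundle-specific prefix (org.apache.gravitino.aliyun.shaded instead of the shared org.apache.gravitino.shaded). As a rough illustration of what a prefix relocation does to a class name, here is a simplified plain-Java sketch. It is not the Shadow plugin's implementation, which rewrites bytecode and supports patterns; `relocate` and `rulesFor` are hypothetical helpers, and the reading that per-bundle prefixes keep each bundle's shaded copies apart is an assumption, not something the PR states.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RelocationSketch {

  // Rewrites a fully qualified class name when it falls under a relocated package.
  // Simplified model: first matching prefix wins, everything else is left untouched.
  static String relocate(String className, Map<String, String> rules) {
    for (Map.Entry<String, String> rule : rules.entrySet()) {
      String from = rule.getKey();
      if (className.equals(from) || className.startsWith(from + ".")) {
        return rule.getValue() + className.substring(from.length());
      }
    }
    return className;
  }

  // Builds rules shaped like the ones in the diff for a given bundle name.
  static Map<String, String> rulesFor(String bundle) {
    String prefix = "org.apache.gravitino." + bundle + ".shaded.";
    Map<String, String> rules = new LinkedHashMap<>();
    rules.put("com.google.common", prefix + "com.google.common");
    rules.put("org.apache.commons.lang3", prefix + "org.apache.commons.lang3");
    return rules;
  }

  public static void main(String[] args) {
    String joiner = "com.google.common.base.Joiner";

    System.out.println(relocate(joiner, rulesFor("aliyun")));
    // prints org.apache.gravitino.aliyun.shaded.com.google.common.base.Joiner

    System.out.println(relocate(joiner, rulesFor("aws")));
    // prints org.apache.gravitino.aws.shaded.com.google.common.base.Joiner

    // Classes outside the relocated packages keep their names.
    System.out.println(relocate("org.apache.hadoop.fs.Path", rulesFor("aws")));
    // prints org.apache.hadoop.fs.Path
  }
}
```

With one prefix per bundle, the same Guava class ends up under two different names, so two bundle jars on one classpath would not supply competing copies of an identically named class.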
58 changes: 58 additions & 0 deletions bundles/aliyun-hadoop-bundle/build.gradle.kts
Original file line number Diff line number Diff line change
@@ -0,0 +1,58 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
import com.github.jengelman.gradle.plugins.shadow.tasks.ShadowJar

plugins {
`maven-publish`
id("java")
alias(libs.plugins.shadow)
}

dependencies {
implementation(project(":bundles:aliyun-bundle"))
implementation(libs.commons.collections3)
implementation(libs.hadoop3.client.api)
implementation(libs.hadoop3.client.runtime)
implementation(libs.hadoop3.oss)
implementation(libs.httpclient)
}

tasks.withType(ShadowJar::class.java) {
isZip64 = true
configurations = listOf(project.configurations.runtimeClasspath.get())
archiveClassifier.set("")
mergeServiceFiles()

// Relocate dependencies to avoid conflicts
relocate("org.jdom", "org.apache.gravitino.aliyun.shaded.org.jdom")
relocate("org.apache.commons.lang3", "org.apache.gravitino.aliyun.shaded.org.apache.commons.lang3")
relocate("com.fasterxml.jackson", "org.apache.gravitino.aliyun.shaded.com.fasterxml.jackson")
relocate("com.google.common", "org.apache.gravitino.aliyun.shaded.com.google.common")
relocate("org.apache.http", "org.apache.gravitino.aliyun.shaded.org.apache.http")
relocate("org.apache.commons.collections", "org.apache.gravitino.aliyun.shaded.org.apache.commons.collections")
}

tasks.jar {
dependsOn(tasks.named("shadowJar"))
archiveClassifier.set("empty")
}

tasks.compileJava {
dependsOn(":catalogs:catalog-hadoop:runtimeJars")
}
23 changes: 15 additions & 8 deletions bundles/aws-bundle/build.gradle.kts
Original file line number Diff line number Diff line change
Expand Up @@ -26,27 +26,34 @@ plugins {

dependencies {
compileOnly(project(":api"))
compileOnly(project(":core"))
compileOnly(project(":catalogs:catalog-common"))
compileOnly(project(":catalogs:catalog-hadoop"))
compileOnly(project(":catalogs:hadoop-common")) {
compileOnly(project(":core"))
compileOnly(libs.hadoop3.aws)
compileOnly(libs.hadoop3.client.api)
compileOnly(libs.hadoop3.client.runtime)

implementation(project(":catalogs:catalog-common")) {
exclude("*")
}
implementation(project(":catalogs:hadoop-common")) {
exclude("*")
}
compileOnly(libs.hadoop3.common)

implementation(libs.aws.iam)
implementation(libs.aws.policy)
implementation(libs.aws.sts)
implementation(libs.hadoop3.aws)
implementation(project(":catalogs:catalog-common")) {
exclude("*")
}
implementation(libs.commons.lang3)
implementation(libs.guava)
}

tasks.withType(ShadowJar::class.java) {
isZip64 = true
configurations = listOf(project.configurations.runtimeClasspath.get())
archiveClassifier.set("")

relocate("org.apache.commons.lang3", "org.apache.gravitino.aws.shaded.org.apache.commons.lang3")
relocate("com.google.common", "org.apache.gravitino.aws.shaded.com.google.common")
relocate("com.fasterxml.jackson", "org.apache.gravitino.aws.shaded.com.fasterxml.jackson")
}

tasks.jar {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -19,9 +19,14 @@

package org.apache.gravitino.s3.fs;

import com.amazonaws.auth.AWSCredentialsProvider;
import com.google.common.annotations.VisibleForTesting;
import com.google.common.base.Joiner;
import com.google.common.base.Splitter;
import com.google.common.collect.ImmutableMap;
import com.google.common.collect.Lists;
import java.io.IOException;
import java.util.List;
import java.util.Map;
import org.apache.gravitino.catalog.hadoop.fs.FileSystemProvider;
import org.apache.gravitino.catalog.hadoop.fs.FileSystemUtils;
Expand All @@ -31,30 +36,81 @@
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.s3a.Constants;
import org.apache.hadoop.fs.s3a.S3AFileSystem;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class S3FileSystemProvider implements FileSystemProvider {

private static final Logger LOGGER = LoggerFactory.getLogger(S3FileSystemProvider.class);

@VisibleForTesting
public static final Map<String, String> GRAVITINO_KEY_TO_S3_HADOOP_KEY =
ImmutableMap.of(
S3Properties.GRAVITINO_S3_ENDPOINT, Constants.ENDPOINT,
S3Properties.GRAVITINO_S3_ACCESS_KEY_ID, Constants.ACCESS_KEY,
S3Properties.GRAVITINO_S3_SECRET_ACCESS_KEY, Constants.SECRET_KEY);

// We can't use Constants.AWS_CREDENTIALS_PROVIDER directly, as in 2.7, this key does not exist.
private static final String S3_CREDENTIAL_KEY = "fs.s3a.aws.credentials.provider";
private static final String S3_SIMPLE_CREDENTIAL =
"org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider";

@Override
public FileSystem getFileSystem(Path path, Map<String, String> config) throws IOException {
Configuration configuration = new Configuration();
Map<String, String> hadoopConfMap =
FileSystemUtils.toHadoopConfigMap(config, GRAVITINO_KEY_TO_S3_HADOOP_KEY);

if (!hadoopConfMap.containsKey(Constants.AWS_CREDENTIALS_PROVIDER)) {
configuration.set(
Constants.AWS_CREDENTIALS_PROVIDER, Constants.ASSUMED_ROLE_CREDENTIALS_DEFAULT);
if (!hadoopConfMap.containsKey(S3_CREDENTIAL_KEY)) {
hadoopConfMap.put(S3_CREDENTIAL_KEY, S3_SIMPLE_CREDENTIAL);
}

hadoopConfMap.forEach(configuration::set);

// Hadoop-aws 2 does not support IAMInstanceCredentialsProvider
checkAndSetCredentialProvider(configuration);

return S3AFileSystem.newInstance(path.toUri(), configuration);
}

private void checkAndSetCredentialProvider(Configuration configuration) {
Contributor

Is this change related to this PR?

Contributor Author

Yes, this change is needed to make it work with Hadoop 2.x.

String provides = configuration.get(S3_CREDENTIAL_KEY);
if (provides == null) {
return;
}

Splitter splitter = Splitter.on(',').trimResults().omitEmptyStrings();
Joiner joiner = Joiner.on(",").skipNulls();
// Split the list of providers
List<String> providers = splitter.splitToList(provides);
List<String> validProviders = Lists.newArrayList();

for (String provider : providers) {
try {
Class<?> c = Class.forName(provider);
if (AWSCredentialsProvider.class.isAssignableFrom(c)) {
validProviders.add(provider);
} else {
LOGGER.warn(
"Credential provider {} is not a subclass of AWSCredentialsProvider, skipping",
provider);
}
} catch (Exception e) {
LOGGER.warn(
"Credential provider {} not found in the Hadoop runtime, falling back to default",
provider);
configuration.set(S3_CREDENTIAL_KEY, S3_SIMPLE_CREDENTIAL);
return;
}
}

if (validProviders.isEmpty()) {
configuration.set(S3_CREDENTIAL_KEY, S3_SIMPLE_CREDENTIAL);
} else {
configuration.set(S3_CREDENTIAL_KEY, joiner.join(validProviders));
}
}

/**
* Get the scheme of the FileSystem. Attention, for S3 the schema is "s3a", not "s3". Users should
* use "s3a://..." to access S3.
Expand Down
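The checkAndSetCredentialProvider method in the hunk above keeps only the configured providers that can be loaded and that implement AWSCredentialsProvider, and falls back to SimpleAWSCredentialsProvider otherwise. The sketch below reproduces that filtering with JDK classes only so it can run without Hadoop or the AWS SDK. `filterProviders` is a hypothetical helper, `CharSequence` stands in for AWSCredentialsProvider, and `java.lang.StringBuilder` stands in for the default provider; none of these substitutions come from the PR.

```java
import java.util.ArrayList;
import java.util.List;

public class ProviderFilterSketch {

  // Returns the comma-separated providers that load and extend requiredType.
  // Falls back to the default when a class is missing or nothing valid remains,
  // mirroring the control flow of the method in the diff.
  static String filterProviders(String configured, Class<?> requiredType, String fallback) {
    if (configured == null) {
      return null; // nothing configured, leave the setting untouched
    }

    List<String> valid = new ArrayList<>();
    for (String raw : configured.split(",")) {
      String name = raw.trim();
      if (name.isEmpty()) {
        continue;
      }
      try {
        Class<?> c = Class.forName(name);
        if (requiredType.isAssignableFrom(c)) {
          valid.add(name);
        }
        // Loadable but of the wrong type: skipped (the original logs a warning here).
      } catch (ClassNotFoundException e) {
        // Provider class absent from this runtime, e.g. an older Hadoop line.
        return fallback;
      }
    }
    return valid.isEmpty() ? fallback : String.join(",", valid);
  }

  public static void main(String[] args) {
    String fallback = "java.lang.StringBuilder"; // stand-in for the default provider

    System.out.println(
        filterProviders("java.lang.String, java.lang.Integer", CharSequence.class, fallback));
    // prints java.lang.String

    System.out.println(
        filterProviders("java.lang.String,no.such.Provider", CharSequence.class, fallback));
    // prints java.lang.StringBuilder

    System.out.println(filterProviders("java.lang.Integer", CharSequence.class, fallback));
    // prints java.lang.StringBuilder
  }
}
```

Note the asymmetry carried over from the original: a provider of the wrong type is dropped and the rest are kept, while a single missing class discards the whole list in favor of the default.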
51 changes: 51 additions & 0 deletions bundles/aws-hadoop-bundle/build.gradle.kts
Original file line number Diff line number Diff line change
@@ -0,0 +1,51 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
import com.github.jengelman.gradle.plugins.shadow.tasks.ShadowJar

plugins {
`maven-publish`
id("java")
alias(libs.plugins.shadow)
}

dependencies {
implementation(project(":bundles:aws-bundle"))
implementation(libs.hadoop3.aws)
implementation(libs.hadoop3.client.api)
implementation(libs.hadoop3.client.runtime)
}

tasks.withType(ShadowJar::class.java) {
isZip64 = true
configurations = listOf(project.configurations.runtimeClasspath.get())
archiveClassifier.set("")

relocate("org.apache.commons.lang3", "org.apache.gravitino.aws.shaded.org.apache.commons.lang3")
relocate("com.google.common", "org.apache.gravitino.aws.shaded.com.google.common")
relocate("com.fasterxml.jackson", "org.apache.gravitino.aws.shaded.com.fasterxml.jackson")
}

tasks.jar {
dependsOn(tasks.named("shadowJar"))
archiveClassifier.set("empty")
}

tasks.compileJava {
dependsOn(":catalogs:catalog-hadoop:runtimeJars")
}