[#5074] feat(hadoop-catalog): Support GCS fileset. #5079
Conversation
The related documents will be in a separate PR.
@Tag("gravitino-docker-test")
@TestInstance(TestInstance.Lifecycle.PER_CLASS)
@Disabled(
I think if ITs are not possible without a valid account, then we can add some unit tests for GCS. As far as I know, Iceberg has already implemented this; you can refer to https://github.com/apache/iceberg/blob/main/gcp/src/test/java/org/apache/iceberg/gcp/gcs/GCSFileIOTest.java. But I haven't verified the feasibility, so please help confirm this.
Let me check it.
build.gradle.kts
Outdated
@@ -764,7 +764,7 @@ tasks {
      !it.name.startsWith("integration-test") &&
      !it.name.startsWith("flink") &&
      !it.name.startsWith("trino-connector") &&
-     it.name != "hive-metastore-common"
+     it.name != "hive-metastore-common" && it.name != "gcs-bundle"
It should be gcp-bundle, not gcs-bundle.
@Disabled(
    "Disabled due to as we don't have a real GCP account to test. If you have a GCP account,"
        + "please change the configuration(YOUR_KEY_FILE, YOUR_BUCKET) and enable this test.")
public class HadoopGCPCatalogIT extends HadoopCatalogIT {
You need to clarify the difference between gcp and gcs.
bundles/gcs-bundle/build.gradle.kts
Outdated
}

dependencies {
  compileOnly(project(":catalogs:catalog-hadoop"))
I guess compileOnly is not enough for gvfs.
I have added this dependency as implementation in module filesystem-hadoop3, so I believe it's unnecessary for the gcs-bundle jar.
OK.
public static final String SERVICE_ACCOUNT_FILE = "YOUR_KEY_FILE";

@BeforeAll
public void setup() throws IOException {
It would be better if you could also add a gvfs test for GCS.
Added.
How does a user use gcp-bundle with gvfs? Can you please give me an example?
I have verified the code. From the client side, users should include the following dependencies if they want to use a GCS fileset:

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>3.1.0</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-hdfs-client</artifactId>
  <version>3.1.0</version>
</dependency>
<dependency>
  <groupId>org.apache.gravitino</groupId>
  <artifactId>gcp-bundle</artifactId>
  <version>0.7.0-incubating-SNAPSHOT</version>
</dependency>
<dependency>
  <groupId>org.apache.gravitino</groupId>
  <artifactId>filesystem-hadoop3-runtime</artifactId>
  <version>0.7.0-incubating-SNAPSHOT</version>
</dependency>

As suggested by @xloya, there may be conflicts if we include … The reason why we need to include …
You can run GravitinoVirtualFileSystemIT.testCreate first in deploy mode and block it there to keep the Gravitino server started.
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>3.1.0</version>
</dependency>
<dependency>
  <groupId>org.apache.gravitino</groupId>
  <artifactId>gcp-bundle</artifactId>
  <version>0.7.0-incubating-SNAPSHOT</version>
</dependency>
<dependency>
  <groupId>org.apache.gravitino</groupId>
  <artifactId>filesystem-hadoop3-runtime</artifactId>
  <version>0.7.0-incubating-SNAPSHOT</version>
</dependency>
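Beyond the dependencies, the client also needs Hadoop configuration that routes the gvfs:// scheme to Gravitino's virtual file system. The sketch below is a hedged illustration only: the property names follow the Gravitino gvfs documentation, while the server URI and metalake name are placeholders, not values taken from this PR.

```xml
<configuration>
  <!-- Map the gvfs:// scheme to Gravitino's virtual file system implementations -->
  <property>
    <name>fs.AbstractFileSystem.gvfs.impl</name>
    <value>org.apache.gravitino.filesystem.hadoop.Gvfs</value>
  </property>
  <property>
    <name>fs.gvfs.impl</name>
    <value>org.apache.gravitino.filesystem.hadoop.GravitinoVirtualFileSystem</value>
  </property>
  <!-- Placeholder endpoint and metalake; replace with your deployment's values -->
  <property>
    <name>fs.gravitino.server.uri</name>
    <value>http://localhost:8090</value>
  </property>
  <property>
    <name>fs.gravitino.client.metalake</name>
    <value>your_metalake</value>
  </property>
</configuration>
```

With this in place, a path like gvfs://fileset/gcs_catalog/schema/fileset_name/file.txt should be accessible through the normal Hadoop FileSystem API, provided gcp-bundle and filesystem-hadoop3-runtime are on the classpath.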
@jerryshao
What changes were proposed in this pull request?
1. Add a bundled jar for Hadoop GCS.
2. Support GCS in Hadoop catalog.
Why are the changes needed?
Users highly demand Fileset for GCS storage.
Fix: #5074
Does this PR introduce any user-facing change?
N/A
How was this patch tested?
Manually, please see: HadoopGCPCatalogIT