REST: Docker file for REST Fixture #11283

Merged 4 commits on Nov 28, 2024

ajantha-bhat
Member

depends on #11279

build.gradle
@@ -985,6 +985,15 @@ project(':iceberg-open-api') {
exclude group: 'org.apache.commons', module: 'commons-configuration2'
exclude group: 'org.apache.hadoop.thirdparty', module: 'hadoop-shaded-protobuf_3_7'
exclude group: 'org.eclipse.jetty'
exclude group: 'com.google.re2j', module: 're2j'
Member Author

Excluded some more dependencies that are unrelated to the 'Configuration' class.

These were not included in the license either.

@ajantha-bhat
Member Author

ajantha-bhat commented Nov 11, 2024

@Fokko: Thanks for the review. When I excluded some dependencies from hadoop-common (like hadoop-auth), it failed at runtime.

I have fixed that and double-checked; everything works fine now.
I also enabled a logging framework by default for this runtime jar (it will be helpful for users).

Steps to verify.

1. java -jar open-api/build/libs/iceberg-open-api-test-fixtures-runtime-1.8.0-SNAPSHOT.jar

2. /Users/ajantha/Downloads/spark-3.5.0-bin-hadoop3/bin/spark-sql \
--packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.7.0 \
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
--conf spark.sql.catalog.tck=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.defaultCatalog=tck \
--conf spark.sql.catalog.tck.uri=http://localhost:8181 \
--conf spark.sql.catalog.tck.type=rest \
--conf spark.sql.catalog.tck.warehouse=/Users/ajantha/Downloads/temp/wh

3. CREATE TABLE tck.nyc.taxis
(
  vendor_id bigint,
  trip_id bigint,
  trip_distance float,
  fare_amount double,
  store_and_fwd_flag string
)
PARTITIONED BY (vendor_id);


INSERT INTO tck.nyc.taxis
VALUES (1, 1000371, 1.8, 15.32, 'N'), (2, 1000372, 2.5, 22.15, 'N'), (2, 1000373, 0.9, 9.01, 'N'), (1, 1000374, 8.4, 42.13, 'Y');

SELECT * FROM tck.nyc.taxis;
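
For anyone repeating step 2 on another machine, here is a minimal sketch that assembles the same spark-sql invocation from parameters instead of hard-coded local paths. The function name `spark_sql_command` and the paths `/opt/spark` and `/tmp/wh` are assumptions made for illustration; the configuration keys and the package coordinates are the ones listed above.

```python
import shlex


def spark_sql_command(spark_home, catalog, uri, warehouse, iceberg_version="1.7.0"):
    """Assemble the spark-sql invocation from step 2 as an argument list."""
    prefix = f"spark.sql.catalog.{catalog}"
    conf = {
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        "spark.sql.defaultCatalog": catalog,
        f"{prefix}.uri": uri,
        f"{prefix}.type": "rest",
        f"{prefix}.warehouse": warehouse,
    }
    args = [
        f"{spark_home}/bin/spark-sql",
        "--packages",
        f"org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{iceberg_version}",
    ]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Hypothetical install and warehouse locations; adjust to the local setup.
    cmd = spark_sql_command("/opt/spark", "tck", "http://localhost:8181", "/tmp/wh")
    print(shlex.join(cmd))
```

The printed line can be pasted into a shell once the fixture jar from step 1 is listening on port 8181.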

@Fokko
Contributor

Fokko commented Nov 12, 2024

Hey @ajantha-bhat, thanks for taking a stab at this again. My main goal is to use this Docker image in PyIceberg, Iceberg-Rust, and Iceberg-Go. These repositories still rely on a third-party container that we want to get rid of (I believe you also raised this earlier). I tried this, but it failed because the image didn't come with the AWS runtime:

pyiceberg-rest   | [main] INFO org.apache.iceberg.rest.RESTCatalogServer - Creating catalog with properties: {jdbc.password=password, s3.endpoint=http://minio:9000, jdbc.user=user, io-impl=org.apache.iceberg.aws.s3.S3FileIO, catalog-impl=org.apache.iceberg.jdbc.JdbcCatalog, jdbc.schema-version=V1, warehouse=s3://warehouse/, uri=jdbc:sqlite::memory:}
pyiceberg-rest   | [main] INFO org.apache.iceberg.CatalogUtil - Loading custom FileIO implementation: org.apache.iceberg.aws.s3.S3FileIO
pyiceberg-rest   | Exception in thread "main" java.lang.IllegalArgumentException: Cannot initialize FileIO implementation org.apache.iceberg.aws.s3.S3FileIO: Cannot find constructor for interface org.apache.iceberg.io.FileIO
pyiceberg-rest   | 	Missing org.apache.iceberg.aws.s3.S3FileIO [java.lang.NoClassDefFoundError: software/amazon/awssdk/core/exception/SdkException]
pyiceberg-rest   | 	at org.apache.iceberg.CatalogUtil.loadFileIO(CatalogUtil.java:356)
pyiceberg-rest   | 	at org.apache.iceberg.jdbc.JdbcCatalog.initialize(JdbcCatalog.java:132)
pyiceberg-rest   | 	at org.apache.iceberg.CatalogUtil.loadCatalog(CatalogUtil.java:274)
pyiceberg-rest   | 	at org.apache.iceberg.CatalogUtil.buildIcebergCatalog(CatalogUtil.java:328)
pyiceberg-rest   | 	at org.apache.iceberg.rest.RESTCatalogServer.initializeBackendCatalog(RESTCatalogServer.java:88)

This is because we store the metadata in Minio, so the metadata is easily accessible outside of the container as well. How do you feel about adding the S3 runtime?
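
When checking whether an image is missing a runtime, it can help to pull the missing class names out of the container logs rather than reading the stack trace by eye. Below is a rough sketch, assuming the log lines look like the trace above; `missing_classes` is a hypothetical helper, not part of Iceberg or PyIceberg.

```python
import re

# Matches the JVM's "java.lang.NoClassDefFoundError: some/pkg/Class" form.
_MISSING = re.compile(r"java\.lang\.NoClassDefFoundError: ([\w/$.]+)")


def missing_classes(log_text):
    """Return the distinct class names reported as missing, in order of appearance."""
    seen = []
    for match in _MISSING.finditer(log_text):
        name = match.group(1).replace("/", ".")
        if name not in seen:
            seen.append(name)
    return seen


if __name__ == "__main__":
    line = (
        "pyiceberg-rest   | \tMissing org.apache.iceberg.aws.s3.S3FileIO "
        "[java.lang.NoClassDefFoundError: software/amazon/awssdk/core/exception/SdkException]"
    )
    print(missing_classes(line))
```

A class under `software.amazon.awssdk` points at the AWS SDK being absent from the image, which is the case reported here.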

Steps to run the tests:

git clone git@github.com:apache/iceberg-python.git
cd iceberg-python

Apply patch as described in #11283 (review):

➜  iceberg-python git:(main) ✗ git diff
diff --git a/dev/docker-compose-integration.yml b/dev/docker-compose-integration.yml
index fccdcdc75..9a807fca3 100644
--- a/dev/docker-compose-integration.yml
+++ b/dev/docker-compose-integration.yml
@@ -41,7 +41,7 @@ services:
       - hive:hive
       - minio:minio
   rest:
-    image: tabulario/iceberg-rest
+    image: apache/iceberg-rest-adapter
     container_name: pyiceberg-rest
     networks:
       iceberg_net:

And run the tests:

make install
make test-integration

Tail the logs using:

docker compose -f dev/docker-compose-integration.yml logs -f
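
The image swap in the patch above can also be scripted, which is handy when trying the same change in several repositories. This is a minimal sketch that works on the compose file as plain text, so it needs no YAML library; `swap_image` is a hypothetical helper, and the image names are the ones from the diff.

```python
def swap_image(compose_text, old_image, new_image):
    """Replace `image: old_image` lines, leaving indentation and other lines untouched."""
    out = []
    replaced = 0
    for line in compose_text.splitlines(keepends=True):
        if line.strip() == f"image: {old_image}":
            line = line.replace(old_image, new_image, 1)
            replaced += 1
        out.append(line)
    if replaced == 0:
        # Fail loudly so a silently unpatched file is not mistaken for success.
        raise ValueError(f"no service uses image {old_image!r}")
    return "".join(out)


if __name__ == "__main__":
    path = "dev/docker-compose-integration.yml"
    with open(path) as handle:
        patched = swap_image(handle.read(), "tabulario/iceberg-rest", "apache/iceberg-rest-adapter")
    with open(path, "w") as handle:
        handle.write(patched)
```

Matching on the whole stripped line keeps the helper from touching images that merely share a prefix with the old name.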

@ajantha-bhat
Member Author

@Fokko: I have also added the GCP and Azure runtime dependencies.
Most of the tests passed; there are 2 failures. I think @sungwy is looking into them.

================================================================================== short test summary info ===================================================================================
FAILED tests/integration/test_writes/test_writes.py::test_create_table_transaction[session_catalog-2] - pydantic_core._pydantic_core.ValidationError: 1 validation error for TableResponse
FAILED tests/integration/test_writes/test_writes.py::test_create_table_with_non_default_values[session_catalog-2] - pydantic_core._pydantic_core.ValidationError: 1 validation error for TableResponse
==================================================== 2 failed, 814 passed, 8 skipped, 2853 deselected, 1451 warnings in 524.78s (0:08:44) ====================================================
make: *** [test-integration] Error 1
ajantha@Ajantha-Bhat-MacBook-Pro-16-inch-2023- iceberg-python % 

@Fokko
Contributor

Fokko commented Nov 14, 2024

@ajantha-bhat Thanks for adding the runtime dependencies! 🙌

Yes, that looks like it will be fixed in apache/iceberg-python#1321 (review). I'll do some final checks, but I think this is ready 👍

@Fokko
Contributor

Fokko commented Nov 14, 2024

The patch fixes it indeed 👍

==================================================================================== 816 passed, 8 skipped, 2853 deselected, 1455 warnings in 182.65s (0:03:02) =====================================================================================
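
To compare runs like the two reported in this thread without eyeballing the banner, the pytest summary line can be parsed into counts. A small sketch, assuming the summary format shown in the pasted output; `parse_pytest_summary` is a hypothetical helper.

```python
import re

_COUNT = re.compile(r"(\d+) (failed|passed|skipped|deselected|warnings?|errors?)")


def parse_pytest_summary(line):
    """Extract counts such as {'failed': 2, 'passed': 814} from a pytest summary line."""
    return {label: int(number) for number, label in _COUNT.findall(line)}


if __name__ == "__main__":
    before = "== 2 failed, 814 passed, 8 skipped, 2853 deselected, 1451 warnings in 524.78s (0:08:44) =="
    after = "== 816 passed, 8 skipped, 2853 deselected, 1455 warnings in 182.65s (0:03:02) =="
    print(parse_pytest_summary(before))
    print(parse_pytest_summary(after))
```

For the two runs in this thread, the failed plus passed total before the patch equals the passed count after it, so the same set of tests ran in both cases.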

Contributor

@Fokko Fokko left a comment

Did some final checks on the licenses, looks all good 👍 Thanks @ajantha-bhat for fixing this, @kevinjqliu, @findepi, @mrcnc for the review and thanks @bryanck for the help around the licenses generation.

@jbonofre
Member

I'm doing a new pass on license and content (Cat A deps).

Contributor

@kevinjqliu kevinjqliu left a comment

LGTM! Built locally and ran the image against iceberg-python, iceberg-go, and iceberg-rust integration tests, all passed.

Here are some other references to tabulario/iceberg-rest
https://grep.app/search?q=tabulario/iceberg-rest&filter[repo.pattern][0]=apache

@ajantha-bhat
Member Author

@jbonofre: Do you have any more comments for this?

@Fokko Fokko requested a review from jbonofre November 20, 2024 20:22
docker/iceberg-rest-adapter-image/Dockerfile
docker/iceberg-rest-adapter-image/README.md
@@ -81,6 +81,9 @@ Copyright 2002-2024 The Apache Software Foundation
Apache Commons Lang
Copyright 2001-2023 The Apache Software Foundation

Apache Commons Configuration
Member

Are you sure about this?
I checked the NOTICE file in commons-configuration, and I see:

Apache Commons Configuration
Copyright 2001-2013 The Apache Software Foundation

This product includes software developed at
The Apache Software Foundation (http://www.apache.org/).

As a reminder, the purpose of ALv2 4.d is to include the NOTICE content (as it is) in the NOTICE file.

Member Author

I am sure.

https://github.com/apache/commons-configuration/blob/master/NOTICE.txt

This product includes software developed at
The Apache Software Foundation (http://www.apache.org/).

This line is there at the beginning of the section.

Member Author

I added this line for each entry instead of once per section, so the NOTICE content appears as it is.

Member

OK, that's the NOTICE from the main branch. I checked the commons-configuration distribution of the version used in Iceberg.
That's why I was not sure.
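
The "as it is" requirement discussed in this thread can be checked mechanically. The sketch below tests whether an upstream NOTICE block appears verbatim inside an aggregate NOTICE, ignoring only line-ending and trailing-whitespace differences. `includes_notice` is a hypothetical helper, and which upstream version's NOTICE to compare against is left to the caller, since that was exactly the open question here.

```python
def _normalize(text):
    """Unify line endings and drop trailing whitespace so only real content is compared."""
    lines = [line.rstrip() for line in text.replace("\r\n", "\n").split("\n")]
    return "\n".join(lines).strip("\n")


def includes_notice(aggregate, upstream):
    """True if the upstream NOTICE text appears as one contiguous block in the aggregate."""
    return _normalize(upstream) in _normalize(aggregate)


if __name__ == "__main__":
    upstream = (
        "Apache Commons Configuration\n"
        "Copyright 2001-2013 The Apache Software Foundation\n"
        "\n"
        "This product includes software developed at\n"
        "The Apache Software Foundation (http://www.apache.org/).\n"
    )
    aggregate = "Apache Commons Lang\nCopyright 2001-2023 The Apache Software Foundation\n\n" + upstream
    print(includes_notice(aggregate, upstream))
```

A substring check is deliberately strict: a changed copyright year or a dropped attribution line makes it return False, which is the kind of difference raised in this review.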

@jbonofre
Member

@ajantha-bhat @Fokko quick question for you guys. We can use a single DockerHub repo (apache/iceberg) with different image tag names (iceberg-rest-fixture-x.x, iceberg-other-y.y, ...). Otherwise, I can create several DockerHub repos (one per usage, similar to what I did for ActiveMQ, for instance, where we have apache/activemq-classic and apache/activemq-artemis). What's your preference? Personally, I think it's "cleaner" to use different repositories.

@Fokko
Contributor

Fokko commented Nov 25, 2024

@jbonofre I'm strongly in favor of having separate repositories so we can separate the different containers nicely.

@ajantha-bhat
Member Author

@jbonofre: Separate repos are fine with me.
Regarding this PR, do you have any suggestions? I didn't see any changes required based on your comment yesterday.

@Fokko Fokko changed the title from "REST: Docker file for Rest catalog adapter image" to "REST: Docker file for REST Fixture" Nov 28, 2024
@Fokko Fokko merged commit 163e206 into apache:main Nov 28, 2024
51 checks passed
@Fokko
Contributor

Fokko commented Nov 28, 2024

Thanks @ajantha-bhat for working on this, and thanks @bryanck, @mrcnc, @kevinjqliu, @jbonofre and @findepi for reviewing 🎉

7 participants