Use EMR serverless bundled iceberg JAR. #2632

Merged: 1 commit, Apr 24, 2024

asuresh8
Contributor

Description

Instead of downloading the Iceberg JAR from Maven, use the JAR that is already present in the EMR Serverless root file system.

EMR Serverless bundles the Iceberg JARs in the default root file system of its Spark image:

$ docker run --rm -it --entrypoint bash public.ecr.aws/emr-serverless/spark/emr-6.13.0:latest
bash-4.2$ ls -la /usr/share/aws/iceberg/lib
total 157236
drwxr-xr-x 2 root root     4096 Jan  4 14:57 .
drwxr-xr-x 3 root root     4096 Jan  4 14:57 ..
lrwxrwxrwx 1 root root       24 Jan  4 14:57 iceberg-emr-common.jar -> iceberg-emr-common-*.jar
-rw-r--r-- 1 root root 54783862 Jul 30  2023 iceberg-flink-runtime-1.17-1.3.0-amzn-1.jar
lrwxrwxrwx 1 root root       43 Jan  4 14:57 iceberg-flink-runtime.jar -> iceberg-flink-runtime-1.17-1.3.0-amzn-1.jar
-rw-r--r-- 1 root root 54672313 Jul 30  2023 iceberg-hive-runtime-1.3.0-amzn-1.jar
lrwxrwxrwx 1 root root       37 Jan  4 14:57 iceberg-hive3-runtime.jar -> iceberg-hive-runtime-1.3.0-amzn-1.jar
-rw-r--r-- 1 root root 51530165 Jul 30  2023 iceberg-spark-runtime-3.4_2.12-1.3.0-amzn-1.jar
lrwxrwxrwx 1 root root       47 Jan  4 14:57 iceberg-spark3-runtime.jar -> iceberg-spark-runtime-3.4_2.12-1.3.0-amzn-1.jar
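
As a rough illustration of how the bundled JAR could be referenced, the sketch below builds a Spark `--jars` argument from the stable `iceberg-spark3-runtime.jar` symlink shown in the listing above. The function name `iceberg_jars_arg` and the `ICEBERG_LIB_DIR` variable are hypothetical names chosen for this example; this is not necessarily how the plugin itself assembles its Spark submit parameters.

```shell
# Directory taken from the listing above; ICEBERG_LIB_DIR is a hypothetical override.
ICEBERG_LIB_DIR="${ICEBERG_LIB_DIR:-/usr/share/aws/iceberg/lib}"

# Hypothetical helper: print a --jars argument that points at the bundled
# Iceberg Spark runtime, or fail if the JAR is not present in the image.
iceberg_jars_arg() {
  jar="$ICEBERG_LIB_DIR/iceberg-spark3-runtime.jar"
  if [ ! -e "$jar" ]; then
    echo "bundled Iceberg JAR not found: $jar" >&2
    return 1
  fi
  printf '%s %s\n' "--jars" "$jar"
}
```

Using the unversioned symlink rather than the versioned file name means the argument should not need to change when the image ships a newer Iceberg build.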

Testing

  1. Set up Security Lake, which uses Iceberg tables.
  2. Set up an EC2 instance running OpenSearch 2.13 with an S3 data source.
  3. Built the SQL plugin with the local changes and uploaded it to the EC2 instance running OpenSearch 2.13.
  4. Submitted queries against an Iceberg table:
$ curl --request POST --url http://localhost:9200/_plugins/_async_query --header 'content-type: application/x-ndjson' --data '{"datasource": "mygdc2","lang": "sql","query": "SELECT * FROM mygdc2.amazon_security_lake_glue_db_us_east_1.amazon_security_lake_table_us_east_1_vpc_flow_2_0 LIMIT 1"}'
$ curl --request GET --url http://localhost:9200/_plugins/_async_query/cW13OThCMWFTSG15Z2RjMg==
{
  "status": "SUCCESS",
  ...
}
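
Because the query runs asynchronously, the GET request usually has to be repeated until the status settles. The following is a minimal polling sketch built on the same endpoint; the helper names `query_status` and `wait_for_query`, the retry count, the sleep interval, and the assumption that `FAILED` and `CANCELLED` are the failure states are illustrative and not taken from this PR.

```shell
# Extract the first "status" string field from a JSON response read on stdin.
# This is a simple sed match, not a JSON parser; it assumes a plain string value.
query_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Poll GET /_plugins/_async_query/<id> until the query finishes or attempts run out.
# $1 = query id, $2 = base URL (default localhost), $3 = max attempts (default 30).
wait_for_query() {
  query_id="$1"
  base_url="${2:-http://localhost:9200}"
  attempts="${3:-30}"
  while [ "$attempts" -gt 0 ]; do
    status=$(curl --silent --request GET \
      --url "$base_url/_plugins/_async_query/$query_id" | query_status)
    case "$status" in
      SUCCESS) echo "$status"; return 0 ;;
      FAILED|CANCELLED) echo "$status" >&2; return 1 ;;  # assumed failure states
    esac
    attempts=$((attempts - 1))
    sleep 2
  done
  echo "timed out waiting for query $query_id" >&2
  return 1
}
```

With these helpers, the check above could be run as `wait_for_query cW13OThCMWFTSG15Z2RjMg==`, which returns once the status reaches `SUCCESS`.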

The EMR Serverless logs confirm that the bundled JAR was loaded:

24/04/22 01:46:35 INFO SparkContext: Added JAR file:/usr/share/aws/iceberg/lib/iceberg-spark-runtime-3.4_2.12-1.3.0-amzn-1.jar at 
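
The same check can be scripted against a saved driver log. This sketch only looks for the `Added JAR` line with the bundled path prefix shown above; the function name `bundled_jar_loaded` is hypothetical, and the exact log wording may vary between EMR releases.

```shell
# Succeeds if the log on stdin shows Spark adding the Iceberg runtime JAR
# from the bundled location rather than from a downloaded artifact.
bundled_jar_loaded() {
  grep -q 'SparkContext: Added JAR file:/usr/share/aws/iceberg/lib/iceberg-spark-runtime-'
}
```

For example, `bundled_jar_loaded < driver.log && echo "bundled JAR in use"`, where `driver.log` stands in for wherever the job's driver output was saved.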

Check List

  • New functionality includes testing.
    • [] All tests pass, including unit test, integration test and doctest
  • New functionality has been documented.
    • New functionality has javadoc added
    • New functionality has user manual doc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Instead of downloading the JAR from Maven, the JAR in the EMR serverless
root file system can be used.

Signed-off-by: Adi Suresh <[email protected]>
@vamsimanohar added the enhancement (New feature or request), backport 2.x, and maintenance (Improves code quality, but not the product) labels and removed the enhancement label on Apr 22, 2024
Collaborator

@dai-chen dai-chen left a comment

Thanks for the changes!

@vamsimanohar vamsimanohar merged commit e578a57 into opensearch-project:main Apr 24, 2024
26 of 29 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Apr 24, 2024
Instead of downloading the JAR from Maven, the JAR in the EMR serverless
root file system can be used.

Signed-off-by: Adi Suresh <[email protected]>
(cherry picked from commit e578a57)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
vamsimanohar pushed a commit that referenced this pull request May 1, 2024
Instead of downloading the JAR from Maven, the JAR in the EMR serverless
root file system can be used.


(cherry picked from commit e578a57)

Signed-off-by: Adi Suresh <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
opensearch-trigger-bot bot pushed a commit that referenced this pull request May 1, 2024
Labels
backport 2.x, backport 2.14, maintenance (Improves code quality, but not the product)
4 participants