Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-16584: [Java] Java JNI with S3 support #13157

Merged
merged 12 commits into from
Jul 29, 2022

Conversation

REASY
Copy link
Contributor

@REASY REASY commented May 14, 2022

macOS development target is changed to 10.13 from 10.11.
See also the discussion on mailing list:
https://lists.apache.org/thread/pjgjrl716gvqzql586cnnoxb38nb0j5w

@github-actions
Copy link

Thanks for opening a pull request!

If this is not a minor PR. Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW

Opening JIRAs ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename pull request title in the following format?

ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

See also:

@kou
Copy link
Member

kou commented May 14, 2022

@github-actions crossbow submit java-jars

@github-actions
Copy link

Revision: 777e1ef

Submitted crossbow builds: ursacomputing/crossbow @ actions-2091

Task Status
java-jars Github Actions

Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also update ci/scripts/java_jni_macos_build.sh?

Could you also follow #13157 (comment) ?

ci/docker/linux-apt-jni.dockerfile Outdated Show resolved Hide resolved
ci/docker/linux-apt-jni.dockerfile Show resolved Hide resolved
ci/scripts/java_jni_manylinux_build.sh Outdated Show resolved Hide resolved
@REASY REASY changed the title feat: Java JNI with S3 support ARROW-16584: [Java] Java JNI with S3 support May 16, 2022
@github-actions
Copy link

@github-actions
Copy link

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

@REASY REASY marked this pull request as ready for review May 17, 2022 01:36
@REASY REASY requested a review from kou May 18, 2022 08:03
@kou
Copy link
Member

kou commented May 18, 2022

@github-actions crossbow submit java-jars

@github-actions
Copy link

Revision: df01c4b

Submitted crossbow builds: ursacomputing/crossbow @ actions-2110

Task Status
java-jars Github Actions

@REASY
Copy link
Contributor Author

REASY commented May 18, 2022

@kou do you think this can be backported to 8.x branch?
Could you point to release cycle, maybe 9.x is coming soo though...

@kou
Copy link
Member

kou commented May 18, 2022

It seems that we need to link AWS SDK C++ statically:

https://github.com/ursacomputing/crossbow/runs/6485814040?check_suite_focus=true#step:7:4639

Error: Unexpected shared dependency found in libgandiva_jni.dylib: `libaws-cpp-sdk-config`

Could you add the following build option?

diff --git a/ci/scripts/java_jni_macos_build.sh b/ci/scripts/java_jni_macos_build.sh
index e9572c334b..71a1730fce 100755
--- a/ci/scripts/java_jni_macos_build.sh
+++ b/ci/scripts/java_jni_macos_build.sh
@@ -76,6 +76,7 @@ cmake \
   -DARROW_THRIFT_USE_SHARED=OFF \
   -DARROW_UTF8PROC_USE_SHARED=OFF \
   -DARROW_ZSTD_USE_SHARED=OFF \
+  -DAWSSDK_SOURCE=BUNDLE \
   -DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE} \
   -DCMAKE_INSTALL_LIBDIR=lib \
   -DCMAKE_INSTALL_PREFIX=${build_dir} \

@kou
Copy link
Member

kou commented May 18, 2022

@kou do you think this can be backported to 8.x branch?

No. Generally, we backport only bug fixes. But we can discuss this on the [email protected] mailing list.

Could you point to release cycle, maybe 9.x is coming soo though...

Generally, we release a new version each 3-4 months. 8.0.0 was released in 2022-05. So 9.0.0 will be released in 2022-08/09.

FYI: We discussed release cycle recently: https://lists.apache.org/thread/g6mqpyq2hc11xbgrq2pf653njzy53plt
We may release more frequently in future.

@REASY REASY force-pushed the feature/java-with-s3 branch from df01c4b to ad62c6f Compare May 19, 2022 00:50
@raulcd
Copy link
Member

raulcd commented May 20, 2022

@kou @kszucs as a follow up from this issue, I can see we generate the jars on our nightly tests via the java-jars task but we don't seem to test them https://github.com/apache/arrow/blob/master/dev/tasks/java-jars/github.yml#L109-L116
We also seem to build them on java-jni-manylinux-2014 but not test the ones built either and only with JDK 1.8.
Am I missing us testing them somewhere else? Should we add new nightly jobs that builds on several JDKs and runs tests over the generated binaries? I can work on that if it would make sense

@kou
Copy link
Member

kou commented May 21, 2022

@github-actions crossbow submit java-jars

@github-actions
Copy link

Revision: ad62c6f

Submitted crossbow builds: ursacomputing/crossbow @ actions-f6bd02cb91

Task Status
java-jars Github Actions

@kou
Copy link
Member

kou commented May 21, 2022

@raulcd Right. We don't test the built jars for now. We should test them. We can add extra (GitHub Actions) jobs to https://github.com/apache/arrow/blob/master/dev/tasks/java-jars/github.yml to test the built jars on Linux and macOS. (We don't need to add extra Crossbow jobs.)

I don't know much about JDK versions. Can we assume that the built jars with JDK 1.8 can work with all JDK >= 1.8? Or should we build jars for JDK X with JDK X? If we can assume that the built jars with JDK 1.8 can work with all JDK >= 1.8, we can just build jars only with JDK 1.8. (It may be better that we test the built jars with multiple JDK versions.)

@kou
Copy link
Member

kou commented May 22, 2022

Could you rebase on master? It may resolve the java-jars build failure.

@REASY REASY force-pushed the feature/java-with-s3 branch from ad62c6f to c2733b0 Compare May 22, 2022 13:44
@REASY
Copy link
Contributor Author

REASY commented May 22, 2022

@kou I think it makes sense to test separately for Java 8 and Java 11. Java 11 has new features, so the bytecode could look different. I'm getting the following warning on Java 11 when running parquet reader locally:

WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.arrow.memory.util.MemoryUtil (file:/home/%USER%/.cache/coursier/v1/https/repo1.maven.org/maven2/org/apache/arrow/arrow-memory-core/8.0.0/arrow-memory-core-8.0.0.jar) to field java.nio.Buffer.address
WARNING: Please consider reporting this to the maintainers of org.apache.arrow.memory.util.MemoryUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release

If arrow java supports Java 17, it has to be tested separately as well because of the following change in JDK 17: https://openjdk.java.net/jeps/403

Strongly encapsulate all internal elements of the JDK, except for critical internal APIs such as sun.misc.Unsafe. It will no longer be possible to relax the strong encapsulation of internal elements via a single command-line option, as was possible in JDK 9 through JDK 16.

@kou
Copy link
Member

kou commented May 22, 2022

@github-actions crossbow submit java-jars

@github-actions
Copy link

Revision: c2733b0

Submitted crossbow builds: ursacomputing/crossbow @ actions-04af25fddc

Task Status
java-jars Github Actions

@kou
Copy link
Member

kou commented May 23, 2022

Thanks for the warning information! Then we should test with multiple JDK versions.

@raulcd
Copy link
Member

raulcd commented May 24, 2022

I have created https://issues.apache.org/jira/browse/ARROW-16640 to track what we discussed about testing the jars and extend to other JDKs. Thanks!

@REASY
Copy link
Contributor Author

REASY commented May 28, 2022

@kou what are the next steps to make this PR ready?

@kou
Copy link
Member

kou commented May 28, 2022

@REASY
Copy link
Contributor Author

REASY commented Jul 26, 2022

@kou the error is the same as in previous java-jars:

Undefined symbols for architecture x86_64:
  "Aws::S3::S3Client::GetObjectAttributes(Aws::S3::Model::GetObjectAttributesRequest const&) const", referenced from:
      vtable for arrow::fs::(anonymous namespace)::S3Client in libarrow.a(s3fs.cc.o)
  "Aws::S3::S3Client::WriteGetObjectResponse(Aws::S3::Model::WriteGetObjectResponseRequest const&) const", referenced from:
      vtable for arrow::fs::(anonymous namespace)::S3Client in libarrow.a(s3fs.cc.o)
  "Aws::S3::S3Client::GetObjectAttributesAsync(Aws::S3::Model::GetObjectAttributesRequest const&, std::__1::function<void (Aws::S3::S3Client const*, Aws::S3::Model::GetObjectAttributesRequest const&, Aws::Utils::Outcome<Aws::S3::Model::GetObjectAttributesResult, Aws::S3::S3Error> const&, std::__1::shared_ptr<Aws::Client::AsyncCallerContext const> const&)> const&, std::__1::shared_ptr<Aws::Client::AsyncCallerContext const> const&) const", referenced from:
      vtable for arrow::fs::(anonymous namespace)::S3Client in libarrow.a(s3fs.cc.o)
  "Aws::S3::S3Client::GetObjectAttributesCallable(Aws::S3::Model::GetObjectAttributesRequest const&) const", referenced from:
      vtable for arrow::fs::(anonymous namespace)::S3Client in libarrow.a(s3fs.cc.o)
  "Aws::S3::S3Client::WriteGetObjectResponseAsync(Aws::S3::Model::WriteGetObjectResponseRequest const&, std::__1::function<void (Aws::S3::S3Client const*, Aws::S3::Model::WriteGetObjectResponseRequest const&, Aws::Utils::Outcome<Aws::NoResult, Aws::S3::S3Error> const&, std::__1::shared_ptr<Aws::Client::AsyncCallerContext const> const&)> const&, std::__1::shared_ptr<Aws::Client::AsyncCallerContext const> const&) const", referenced from:
      vtable for arrow::fs::(anonymous namespace)::S3Client in libarrow.a(s3fs.cc.o)
  "Aws::S3::S3Client::WriteGetObjectResponseCallable(Aws::S3::Model::WriteGetObjectResponseRequest const&) const", referenced from:
      vtable for arrow::fs::(anonymous namespace)::S3Client in libarrow.a(s3fs.cc.o)
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)

@REASY REASY force-pushed the feature/java-with-s3 branch from 53a8006 to d482759 Compare July 26, 2022 03:45
@REASY
Copy link
Contributor Author

REASY commented Jul 27, 2022

I've tried to add BUILD_SHARED_LIBS (related to aws/aws-sdk-cpp#1334 (comment))

diff --git a/cpp/cmake_modules/ThirdpartyToolchain.cmake b/cpp/cmake_modules/ThirdpartyToolchain.cmake
index 5d1da18b7..f8679b04f 100644
--- a/cpp/cmake_modules/ThirdpartyToolchain.cmake
+++ b/cpp/cmake_modules/ThirdpartyToolchain.cmake
@@ -4552,6 +4552,7 @@ macro(build_awssdk)
   set(AWSSDK_PREFIX "${CMAKE_CURRENT_BINARY_DIR}/awssdk_ep-install")
   set(AWSSDK_INCLUDE_DIR "${AWSSDK_PREFIX}/include")
   set(AWSSDK_LIB_DIR "lib")
+  option(BUILD_SHARED_LIBS "Build shared libraries" ON)

   if(WIN32)
     # On Windows, need to match build types
@@ -4781,6 +4782,7 @@ if(ARROW_S3)
     set_target_properties(AWS::aws-c-common
                           PROPERTIES INTERFACE_LINK_LIBRARIES
                                      "-pthread;pthread;-framework CoreFoundation")
+    set(BUILD_SHARED_LIBS "ON")
   endif()
 endif()

but getting the same error:

Undefined symbols for architecture x86_64:
  "Aws::S3::S3Client::GetObjectAttributes(Aws::S3::Model::GetObjectAttributesRequest const&) const", referenced from:
      vtable for arrow::fs::(anonymous namespace)::S3Client in libarrow.a(s3fs.cc.o)
  "Aws::S3::S3Client::WriteGetObjectResponse(Aws::S3::Model::WriteGetObjectResponseRequest const&) const", referenced from:
      vtable for arrow::fs::(anonymous namespace)::S3Client in libarrow.a(s3fs.cc.o)
  "Aws::S3::S3Client::GetObjectAttributesAsync(Aws::S3::Model::GetObjectAttributesRequest const&, std::__1::function<void (Aws::S3::S3Client const*, Aws::S3::Model::GetObjectAttributesRequest const&, Aws::Utils::Outcome<Aws::S3::Model::GetObjectAttributesResult, Aws::S3::S3Error> const&, std::__1::shared_ptr<Aws::Client::AsyncCallerContext const> const&)> const&, std::__1::shared_ptr<Aws::Client::AsyncCallerContext const> const&) const", referenced from:
      vtable for arrow::fs::(anonymous namespace)::S3Client in libarrow.a(s3fs.cc.o)
  "Aws::S3::S3Client::GetObjectAttributesCallable(Aws::S3::Model::GetObjectAttributesRequest const&) const", referenced from:
      vtable for arrow::fs::(anonymous namespace)::S3Client in libarrow.a(s3fs.cc.o)
  "Aws::S3::S3Client::WriteGetObjectResponseAsync(Aws::S3::Model::WriteGetObjectResponseRequest const&, std::__1::function<void (Aws::S3::S3Client const*, Aws::S3::Model::WriteGetObjectResponseRequest const&, Aws::Utils::Outcome<Aws::NoResult, Aws::S3::S3Error> const&, std::__1::shared_ptr<Aws::Client::AsyncCallerContext const> const&)> const&, std::__1::shared_ptr<Aws::Client::AsyncCallerContext const> const&) const", referenced from:
      vtable for arrow::fs::(anonymous namespace)::S3Client in libarrow.a(s3fs.cc.o)
  "Aws::S3::S3Client::WriteGetObjectResponseCallable(Aws::S3::Model::WriteGetObjectResponseRequest const&) const", referenced from:
      vtable for arrow::fs::(anonymous namespace)::S3Client in libarrow.a(s3fs.cc.o)
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
[179/185] Linking CXX shared library release/libgandiva_jni.900.0.0.dylib
ninja: build stopped: subcommand failed.
ec2-user@ip-x-x-x-x Desktop %

I can see those simbols in libarrow.a, for example, GetObjectAttributes:

ec2-user@ip-x-x-x-x arrow % nm ./cpp-build/cpp/release/libarrow.a | grep GetObjectAttributes
no symbols
no symbols
no symbols
no symbols
no symbols
no symbols
no symbols
                 U __ZNK3Aws2S38S3Client19GetObjectAttributesERKNS0_5Model26GetObjectAttributesRequestE
                 U __ZNK3Aws2S38S3Client24GetObjectAttributesAsyncERKNS0_5Model26GetObjectAttributesRequestERKNSt3__18functionIFvPKS1_S5_RKNS_5Utils7OutcomeINS2_25GetObjectAttributesResultENS0_7S3ErrorEEERKNS6_10shared_ptrIKNS_6Client18AsyncCallerContextEEEEEESN_
                 U __ZNK3Aws2S38S3Client27GetObjectAttributesCallableERKNS0_5Model26GetObjectAttributesRequestE

It is in the source code: S3Client.h#L2563 and S3Client.cpp#L1917 but cannot be found in the symbols:

ec2-user@ip-x-x-x-x arrow % nm ./cpp-build/cpp/awssdk_ep-prefix/src/awssdk_ep-build/aws-cpp-sdk-s3/libaws-cpp-sdk-s3.a | grep GetObjectAttributes
ec2-user@ip-x-x-x-x arrow %

@kou any thoughts?

brew's version for aws-sdk-cpp is 1.9.300

@kou
Copy link
Member

kou commented Jul 28, 2022

@github-actions crossbow submit java-jars

@github-actions
Copy link

Revision: 106c816

Submitted crossbow builds: ursacomputing/crossbow @ actions-cfd8fcd94f

Task Status
java-jars Github Actions

@kou
Copy link
Member

kou commented Jul 28, 2022

@github-actions crossbow submit java-jars

@github-actions
Copy link

Revision: 1ac0510

Submitted crossbow builds: ursacomputing/crossbow @ actions-683460ca62

Task Status
java-jars Github Actions

REASY and others added 11 commits July 28, 2022 20:03
…d shared dependency found in libgandiva_jni.dylib: `libaws-cpp-sdk-config`

`
…crossbow/arrow/cpp-build/awssdk_ep-prefix/src/awssdk_ep/aws-cpp-sdk-core/source/utils/crypto/commoncrypto/CryptoImpl.cpp:505:27: error: 'CCCryptorGCMSetIV' is only available on macOS 10.13 or newer [-Werror,-Wunguarded-availability-new]`
@REASY REASY force-pushed the feature/java-with-s3 branch from f3c2263 to 3c7255a Compare July 28, 2022 12:04
@REASY
Copy link
Contributor Author

REASY commented Jul 28, 2022

Revision: 1ac0510

Submitted crossbow builds: ursacomputing/crossbow @ actions-683460ca62

Task Status
java-jars Github Actions

This should be fixed by #13712

@kou
Copy link
Member

kou commented Jul 29, 2022

@github-actions crossbow submit java-jars

@github-actions
Copy link

Revision: 3c7255a

Submitted crossbow builds: ursacomputing/crossbow @ actions-ea49ddbb96

Task Status
java-jars Github Actions

Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Thanks!

@kou kou merged commit 7fe71f5 into apache:master Jul 29, 2022
@REASY
Copy link
Contributor Author

REASY commented Jul 29, 2022

Thanks, @kou!

Could you, please, explain why it could not find the symbols? I saw you removed brew's version of aws-sdk-cpp. In ThirpartyToolChain.cmake I can see

  elseif(AWSSDK_SOURCE STREQUAL "BUNDLED")
    build_awssdk()

Wasn't it enough for cmake to use built version?

@kou
Copy link
Member

kou commented Jul 29, 2022

Sure.

Our build used header files provided by aws-sdk-cpp installed by Homebrew but used library files provided by our bundled aws-sdk-cpp. It causes the problem.

Our build used packages installed by Homebrew. Our build adds -I$(brew --prefix)/include for them. aws-sdk-cpp installed by Homebrew installs it's header files to $(brew --prefix)/include. Our build used bundled aws-sdk-cpp but header files provided by aws-sdk-cpp installed by Homebrew are used because header file search path includes $(brew --prefix)/include before header file search path for bundled aws-sdk-cpp.

@ursabot
Copy link

ursabot commented Jul 29, 2022

Benchmark runs are scheduled for baseline = fa7e7a3 and contender = 7fe71f5. 7fe71f5 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Failed ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️0.51% ⬆️0.0%] test-mac-arm
[Finished ⬇️0.0% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.78% ⬆️0.11%] ursa-thinkcentre-m75q
Buildkite builds:
[Failed] 7fe71f5c ec2-t3-xlarge-us-east-2
[Finished] 7fe71f5c test-mac-arm
[Finished] 7fe71f5c ursa-i9-9960x
[Finished] 7fe71f5c ursa-thinkcentre-m75q
[Failed] fa7e7a3f ec2-t3-xlarge-us-east-2
[Failed] fa7e7a3f test-mac-arm
[Finished] fa7e7a3f ursa-i9-9960x
[Finished] fa7e7a3f ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants