
[Java] Test failure in Dataset regarding TestAllTypes #40568

Closed
vibhatha opened this issue Mar 15, 2024 · 19 comments

@vibhatha
Collaborator

Describe the bug, including details regarding any error messages, version, and platform.

There is a test failure in the Parquet-based test cases of the Dataset module. This issue was identified here.

Error:  Failures: 
Error:    TestAllTypes.testAllTypesParquet:261->TestDataset.assertParquetFileEquals:124 expected:<...length=2, nullCount=[2], ArrowFieldNode [length=0, nullCount=0], ArrowFieldNode [length=2, nullCount=2], ArrowFieldNode [length=0, nullCount=0], ArrowFieldNode [length=2, nullCount=2], ArrowFieldNode [length=0, nullCount=0], ArrowFieldNode [length=2, nullCount=2], ArrowFieldNode [length=2, nullCount=0]], #buffers=71, buffersLayout=[ArrowBuffer [offset=0, size=1], ArrowBuffer [offset=8, size=1], ArrowBuffer [offset=16, size=1], ArrowBuffer [offset=24, size=2], ArrowBuffer [offset=32, size=1], ArrowBuffer [offset=40, size=4], ArrowBuffer [offset=48, size=1], ArrowBuffer [offset=56, size=8], ArrowBuffer [offset=64, size=1], ArrowBuffer [offset=72, size=16], ArrowBuffer [offset=88, size=1], ArrowBuffer [offset=96, size=2], ArrowBuffer [offset=104, size=1], ArrowBuffer [offset=112, size=4], ArrowBuffer [offset=120, size=1], ArrowBuffer [offset=128, size=8], ArrowBuffer [offset=136, size=1], ArrowBuffer [offset=144, size=16], ArrowBuffer [offset=160, size=1], ArrowBuffer [offset=168, size=8], ArrowBuffer [offset=176, size=1], ArrowBuffer [offset=184, size=16], ArrowBuffer [offset=200, size=1], ArrowBuffer [offset=208, size=12], ArrowBuffer [offset=224, size=1], ArrowBuffer [offset=232, size=1], ArrowBuffer [offset=240, size=12], ArrowBuffer [offset=256, size=1], ArrowBuffer [offset=264, size=1], ArrowBuffer [offset=272, size=12], ArrowBuffer [offset=288, size=1], ArrowBuffer [offset=296, size=1], ArrowBuffer [offset=304, size=12], ArrowBuffer [offset=320, size=1], ArrowBuffer [offset=328, size=1], ArrowBuffer [offset=336, size=2], ArrowBuffer [offset=344, size=1], ArrowBuffer [offset=352, size=8], ArrowBuffer [offset=360, size=1], ArrowBuffer [offset=368, size=8], ArrowBuffer [offset=376, size=1], ArrowBuffer [offset=384, size=16], ArrowBuffer [offset=400, size=1], ArrowBuffer [offset=408, size=16], ArrowBuffer [offset=424, size=1], ArrowBuffer [offset=432, size=16], 
ArrowBuffer [offset=448, size=1], ArrowBuffer [offset=456, size=16], ArrowBuffer [offset=472, size=1], ArrowBuffer [offset=480, size=16], ArrowBuffer [offset=496, size=1], ArrowBuffer [offset=504, size=16], ArrowBuffer [offset=520, size=1], ArrowBuffer [offset=528, size=32], ArrowBuffer [offset=560, size=1], ArrowBuffer [offset=568, size=32], ArrowBuffer [offset=600, size=1], ArrowBuffer [offset=608, size=12], ArrowBuffer [offset=624, size=0], ArrowBuffer [offset=624, size=0], ArrowBuffer [offset=624, size=1], ArrowBuffer [offset=632, size=12], ArrowBuffer [offset=648, size=0], ArrowBuffer [offset=648, size=0], ArrowBuffer [offset=648, size=1], ArrowBuffer [offset=656, size=12], ArrowBuffer [offset=672, size=0], ArrowBuffer [offset=672, size=0], ArrowBuffer [offset=672, size=1], ArrowBuffer [offset=680, size=1], ArrowBuffer [offset=688], size=8]], closed=f...> but was:<...length=2, nullCount=[1], ArrowFieldNode [length=2, nullCount=2], ArrowFieldNode [length=0, nullCount=0], ArrowFieldNode [length=2, nullCount=2], ArrowFieldNode [length=0, nullCount=0], ArrowFieldNode [length=2, nullCount=2], ArrowFieldNode [length=0, nullCount=0], ArrowFieldNode [length=2, nullCount=2], ArrowFieldNode [length=2, nullCount=0]], #buffers=73, buffersLayout=[ArrowBuffer [offset=0, size=1], ArrowBuffer [offset=8, size=1], ArrowBuffer [offset=16, size=1], ArrowBuffer [offset=24, size=2], ArrowBuffer [offset=32, size=1], ArrowBuffer [offset=40, size=4], ArrowBuffer [offset=48, size=1], ArrowBuffer [offset=56, size=8], ArrowBuffer [offset=64, size=1], ArrowBuffer [offset=72, size=16], ArrowBuffer [offset=88, size=1], ArrowBuffer [offset=96, size=2], ArrowBuffer [offset=104, size=1], ArrowBuffer [offset=112, size=4], ArrowBuffer [offset=120, size=1], ArrowBuffer [offset=128, size=8], ArrowBuffer [offset=136, size=1], ArrowBuffer [offset=144, size=16], ArrowBuffer [offset=160, size=1], ArrowBuffer [offset=168, size=4], ArrowBuffer [offset=176, size=1], ArrowBuffer [offset=184, size=8], 
ArrowBuffer [offset=192, size=1], ArrowBuffer [offset=200, size=16], ArrowBuffer [offset=216, size=1], ArrowBuffer [offset=224, size=12], ArrowBuffer [offset=240, size=1], ArrowBuffer [offset=248, size=1], ArrowBuffer [offset=256, size=12], ArrowBuffer [offset=272, size=1], ArrowBuffer [offset=280, size=1], ArrowBuffer [offset=288, size=12], ArrowBuffer [offset=304, size=1], ArrowBuffer [offset=312, size=1], ArrowBuffer [offset=320, size=12], ArrowBuffer [offset=336, size=1], ArrowBuffer [offset=344, size=1], ArrowBuffer [offset=352, size=2], ArrowBuffer [offset=360, size=1], ArrowBuffer [offset=368, size=8], ArrowBuffer [offset=376, size=1], ArrowBuffer [offset=384, size=8], ArrowBuffer [offset=392, size=1], ArrowBuffer [offset=400, size=16], ArrowBuffer [offset=416, size=1], ArrowBuffer [offset=424, size=16], ArrowBuffer [offset=440, size=1], ArrowBuffer [offset=448, size=16], ArrowBuffer [offset=464, size=1], ArrowBuffer [offset=472, size=16], ArrowBuffer [offset=488, size=1], ArrowBuffer [offset=496, size=16], ArrowBuffer [offset=512, size=1], ArrowBuffer [offset=520, size=16], ArrowBuffer [offset=536, size=1], ArrowBuffer [offset=544, size=32], ArrowBuffer [offset=576, size=1], ArrowBuffer [offset=584, size=32], ArrowBuffer [offset=616, size=1], ArrowBuffer [offset=624, size=12], ArrowBuffer [offset=640, size=0], ArrowBuffer [offset=640, size=0], ArrowBuffer [offset=640, size=1], ArrowBuffer [offset=648, size=12], ArrowBuffer [offset=664, size=0], ArrowBuffer [offset=664, size=0], ArrowBuffer [offset=664, size=1], ArrowBuffer [offset=672, size=12], ArrowBuffer [offset=688, size=0], ArrowBuffer [offset=688, size=0], ArrowBuffer [offset=688, size=1], ArrowBuffer [offset=696, size=1], ArrowBuffer [offset=704], size=8]], closed=f...>
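The expected and actual dumps are long and JUnit elides their common parts, so lining them up by eye is error-prone. A small hypothetical helper (not part of Arrow or JUnit; `DumpDiff` and its methods are illustrative names) that reports the first index where two strings diverge, plus some surrounding context, makes the first real difference easy to find. Here it is run on two short excerpts copied from the failure above.

```java
/** Hypothetical diagnostic helper (not part of Arrow): locates where two long dumps diverge. */
public final class DumpDiff {
  private DumpDiff() {}

  /** Returns the first index at which a and b differ, or -1 if they are identical. */
  public static int firstMismatch(String a, String b) {
    int n = Math.min(a.length(), b.length());
    for (int i = 0; i < n; i++) {
      if (a.charAt(i) != b.charAt(i)) {
        return i;
      }
    }
    // No differing character in the common prefix: equal, or one is a prefix of the other.
    return a.length() == b.length() ? -1 : n;
  }

  /** Returns the characters of s within `radius` positions of `index`, clamped to bounds. */
  public static String context(String s, int index, int radius) {
    int from = Math.max(0, index - radius);
    int to = Math.min(s.length(), index + radius);
    return s.substring(from, to);
  }

  public static void main(String[] args) {
    // Short excerpts of the expected and actual buffer layouts from the failure above.
    String expected = "ArrowBuffer [offset=160, size=1], ArrowBuffer [offset=168, size=8]";
    String actual = "ArrowBuffer [offset=160, size=1], ArrowBuffer [offset=168, size=4]";
    int i = firstMismatch(expected, actual);
    System.out.println("first mismatch at index " + i);
    System.out.println("expected: ..." + context(expected, i, 20) + "...");
    System.out.println("actual:   ..." + context(actual, i, 20) + "...");
  }
}
```

On these excerpts the first difference is the size of the buffer at offset 168: 8 bytes in the expected layout, 4 bytes in the actual one.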

Component(s)

Java

@kou
Member

kou commented Mar 15, 2024

Can you reproduce this locally?

@vibhatha
Collaborator Author

I am working on it at the moment 🙂

@vibhatha
Collaborator Author

@kou I cannot build the JNI libraries on either Ubuntu or Mac because of an unrelated problem. I am working on that...

@vibhatha
Collaborator Author

-- Building using CMake version: 3.28.3
-- The C compiler identification is GNU 13.2.0
-- The CXX compiler identification is GNU 13.2.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /bin/c++ - skipped

This is part of the trace.

@kou
Member

kou commented Mar 15, 2024

Could you also show dpkg -l | grep libstd?

@vibhatha
Collaborator Author

vibhatha commented Mar 15, 2024

$ dpkg -l | grep libstd
ii  libstdc++-11-dev:amd64                     11.4.0-1ubuntu1~22.04                                 amd64        GNU Standard C++ Library v3 (development files)
ii  libstdc++6:amd64                           12.3.0-1ubuntu1~22.04                                 amd64        GNU Standard C++ Library v3
ii  libstdc++6:i386                            12.3.0-1ubuntu1~22.04                                 i386         GNU Standard C++ Library v3

@kou
Member

kou commented Mar 15, 2024

Could you remove the LLVM installed by conda?

@kou
Member

kou commented Mar 15, 2024

Or could you also install clang with conda?

@kou
Member

kou commented Mar 15, 2024

It seems that you mixed LLVM from conda with LLVM from apt:

CMake Warning at cmake_modules/FindLLVMAlt.cmake:58 (find_package):
  Could not find a configuration file for package "LLVM" that is compatible
  with requested version "18.1".

  The following configuration files were considered but not accepted:

    /home/asus/miniforge3/envs/pyarrow-dev/lib/cmake/llvm/LLVMConfig.cmake, version: 17.0.6
    /usr/lib/llvm-15/cmake/LLVMConfig.cmake, version: 15.0.7
    /usr/lib/llvm-13/cmake/LLVMConfig.cmake, version: 13.0.1
    /usr/lib/llvm-15/lib/cmake/llvm/LLVMConfig.cmake, version: 15.0.7
    /usr/lib/llvm-13/lib/cmake/llvm/LLVMConfig.cmake, version: 13.0.1
    /usr/lib/llvm-15/share/llvm/cmake/LLVMConfig.cmake, version: 15.0.7
    /usr/lib/llvm-13/share/llvm/cmake/LLVMConfig.cmake, version: 13.0.1
    /lib/llvm-15/cmake/LLVMConfig.cmake, version: 15.0.7
    /lib/llvm-13/cmake/LLVMConfig.cmake, version: 13.0.1

Call Stack (most recent call first):
  src/gandiva/CMakeLists.txt:30 (find_package)


-- Using LLVMConfig.cmake in: /home/asus/miniforge3/envs/pyarrow-dev/lib/cmake/llvm
-- Found llvm-link /usr/lib/llvm-15/bin/llvm-link
-- Found clang /usr/lib/llvm-15/bin/clang-15
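The mix can be read off the log: the LLVMConfig.cmake that CMake ended up using lives in the conda environment and reports version 17.0.6, while llvm-link and clang were found under /usr/lib/llvm-15 (apt). As a hypothetical sketch (the class name and regular expressions are illustrative and assume the exact log format shown above; this is not an Arrow or CMake tool), the same check can be automated by extracting the LLVM major version each component came from.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical checker: reports which LLVM major version each toolchain component came from. */
public final class LlvmMixCheck {
  private LlvmMixCheck() {}

  // "    <dir>/LLVMConfig.cmake, version: 17.0.6" (the "considered" list in the CMake warning)
  private static final Pattern CONSIDERED =
      Pattern.compile("^\\s*(\\S+)/LLVMConfig\\.cmake, version: (\\d+)\\.");
  // "-- Using LLVMConfig.cmake in: <dir>"
  private static final Pattern USING = Pattern.compile("^-- Using LLVMConfig\\.cmake in: (\\S+)");
  // "-- Found clang /usr/lib/llvm-15/bin/clang-15"
  private static final Pattern FOUND_TOOL = Pattern.compile("^-- Found (\\S+) \\S*/llvm-(\\d+)/");

  /** Maps each component ("LLVMConfig", "clang", ...) to the LLVM major version it came from. */
  public static Map<String, Integer> majorVersions(List<String> log) {
    Map<String, Integer> configVersionByDir = new LinkedHashMap<>();
    Map<String, Integer> result = new LinkedHashMap<>();
    for (String line : log) {
      Matcher m = CONSIDERED.matcher(line);
      if (m.find()) {
        configVersionByDir.put(m.group(1), Integer.parseInt(m.group(2)));
        continue;
      }
      m = USING.matcher(line);
      if (m.find() && configVersionByDir.containsKey(m.group(1))) {
        // The selected config's version comes from the earlier "considered" entry for that dir.
        result.put("LLVMConfig", configVersionByDir.get(m.group(1)));
        continue;
      }
      m = FOUND_TOOL.matcher(line);
      if (m.find()) {
        result.put(m.group(1), Integer.parseInt(m.group(2)));
      }
    }
    return result;
  }

  /** True when all detected components agree on a single LLVM major version. */
  public static boolean isConsistent(Map<String, Integer> versions) {
    return versions.values().stream().distinct().count() <= 1;
  }

  public static void main(String[] args) {
    List<String> log = List.of(
        "    /home/asus/miniforge3/envs/pyarrow-dev/lib/cmake/llvm/LLVMConfig.cmake, version: 17.0.6",
        "    /usr/lib/llvm-15/cmake/LLVMConfig.cmake, version: 15.0.7",
        "-- Using LLVMConfig.cmake in: /home/asus/miniforge3/envs/pyarrow-dev/lib/cmake/llvm",
        "-- Found llvm-link /usr/lib/llvm-15/bin/llvm-link",
        "-- Found clang /usr/lib/llvm-15/bin/clang-15");
    Map<String, Integer> versions = majorVersions(log);
    System.out.println(versions + (isConsistent(versions) ? " -> consistent" : " -> MIXED"));
  }
}
```

On this excerpt the sketch reports LLVMConfig from LLVM 17 but llvm-link and clang from LLVM 15, i.e. a mixed toolchain.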

@vibhatha
Collaborator Author

@kou I removed LLVM from apt and also installed clang, but now I get a different error.

CMake Error at cmake_modules/ThirdpartyToolchain.cmake:4988 (set_property):
  The link interface of target "AWS::aws-c-cal" contains:

    OpenSSL::Crypto

  but the target was not found.  Possible reasons include:

    * There is a typo in the target name.
    * A find_package call is missing for an IMPORTED target.
    * An ALIAS target is missing.

Call Stack (most recent call first):
  cmake_modules/ThirdpartyToolchain.cmake:176 (build_awssdk)
  cmake_modules/ThirdpartyToolchain.cmake:308 (build_dependency)
  cmake_modules/ThirdpartyToolchain.cmake:5030 (resolve_dependency)
  CMakeLists.txt:543 (include)


CMake Error at cmake_modules/BuildUtils.cmake:301 (target_link_libraries):
  Target "gandiva_objlib" links to:

    OpenSSL::Crypto

  but the target was not found.  Possible reasons include:

    * There is a typo in the target name.
    * A find_package call is missing for an IMPORTED target.
    * An ALIAS target is missing.

Call Stack (most recent call first):
  src/gandiva/CMakeLists.txt:140 (add_arrow_lib)


CMake Error at cmake_modules/BuildUtils.cmake:475 (target_link_libraries):
  Target "gandiva_static" links to:

    OpenSSL::Crypto

  but the target was not found.  Possible reasons include:

    * There is a typo in the target name.
    * A find_package call is missing for an IMPORTED target.
    * An ALIAS target is missing.

Call Stack (most recent call first):
  src/gandiva/CMakeLists.txt:140 (add_arrow_lib)

Should I explicitly install the OpenSSL libraries?

@vibhatha
Collaborator Author

clang was installed by conda.

@kou
Member

kou commented Mar 15, 2024

Yes. Could you install OpenSSL with ... conda?

@vibhatha
Collaborator Author

@kou I was able to fix my dev environment. Thanks for the tips.

I can also confirm that the issue reported here reproduces on Ubuntu 22.04.4 LTS.

@vibhatha
Collaborator Author

@kou Looking into the issue: at a high level, the test compares Arrow Java data that is generated in memory and then written to disk against a predefined test file, alltypes-java.parquet, which was introduced here: https://github.com/apache/arrow-testing/blob/ad82a736c170e97b7c8c035ebd8a801c17eec170/data/parquet/README.md. Its description says it is a Parquet file written by using the Java DatasetWriter class in Arrow 14.0... Thinking out loud: could some change made in Arrow 15.0 be causing this?

@kou
Member

kou commented Mar 19, 2024

Could you try git bisect to identify which commit is related to this?

@vibhatha
Collaborator Author

@kou The main source change involved is #39681.

That PR introduces Float16. If we add it there, don't we also have to update alltypes-java.parquet accordingly?
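The buffer layouts in the failure output appear consistent with this. The actual batch has 73 buffers instead of 71: it contains an extra validity/data pair of sizes 1 and 4 at offsets 160 and 168, and every buffer after it matches the expected layout shifted by 16 bytes (the last one moves from offset 688 to 704; the 8-byte and 16-byte buffers that look like the Float32 and Float64 data move from 168 and 184 to 184 and 200). Assuming the extra column is a nullable Float16 column with 2 rows, and that buffers are padded to 8 bytes as the offsets in the dump suggest, the arithmetic works out as in this hypothetical sketch (`FixedWidthLayout` is an illustrative name, not Arrow's own layout code).

```java
/** Hypothetical sketch (not Arrow's code): 8-byte-aligned footprint of a nullable fixed-width column. */
public final class FixedWidthLayout {
  private FixedWidthLayout() {}

  /** Assumed buffer alignment; every offset in the dumps above is a multiple of 8. */
  public static final int ALIGNMENT = 8;

  /** A nullable fixed-width column contributes a validity buffer and a data buffer. */
  public static final int BUFFERS_PER_COLUMN = 2;

  /** Rounds n up to the next multiple of ALIGNMENT. */
  public static long pad(long n) {
    return (n + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
  }

  /** Validity bitmap size: one bit per row, rounded up to whole bytes. */
  public static long validityBytes(int rows) {
    return (rows + 7) / 8;
  }

  /** Unpadded data buffer size for `rows` values of `byteWidth` bytes each. */
  public static long dataBytes(int rows, int byteWidth) {
    return (long) rows * byteWidth;
  }

  /** Bytes the column adds to the record batch body: padded validity plus padded data. */
  public static long footprint(int rows, int byteWidth) {
    return pad(validityBytes(rows)) + pad(dataBytes(rows, byteWidth));
  }

  public static void main(String[] args) {
    int rows = 2;         // ArrowFieldNode [length=2, ...] in the dumps
    int float16Width = 2; // Float16 values are 2 bytes wide
    System.out.println("validity buffer size: " + validityBytes(rows));
    System.out.println("data buffer size:     " + dataBytes(rows, float16Width));
    System.out.println("buffer count:         " + (71 + BUFFERS_PER_COLUMN));
    System.out.println("last offset:          " + (688 + footprint(rows, float16Width)));
  }
}
```

This prints buffer sizes 1 and 4, a buffer count of 73 and a last offset of 704, matching the actual layout in the failure, which supports the idea that the expected file simply predates the Float16 column.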

@vibhatha
Collaborator Author

If we check the testing submodule commit referenced by main: https://github.com/apache/arrow-testing/tree/ad82a736c170e97b7c8c035ebd8a801c17eec170

we don't find the changes introduced by this PR: apache/arrow-testing#99

@vibhatha
Collaborator Author

It seems we have to update the submodule?

lidavidm pushed a commit that referenced this issue Mar 19, 2024
### Rationale for this change

A recurring CI failure was observed as recorded in #40568. 

### What changes are included in this PR?

Updating the testing submodule to the changes reflected in apache/arrow-testing#99

### Are these changes tested?

Tested by existing test cases. 

### Are there any user-facing changes?

No
* GitHub Issue: #40568

Authored-by: Vibhatha Abeykoon <[email protected]>
Signed-off-by: David Li <[email protected]>
@lidavidm lidavidm added this to the 16.0.0 milestone Mar 19, 2024
@lidavidm
Member

Issue resolved by pull request 40662
#40662
