GH-38074: [C++] Fix Offset Size Calculation for Slicing Large String and Binary Types in Hash Join #38147
Conversation
@llama90 I don't understand why you've changed your fix, while I was asking you to explain the underlying bug.
@pitrou I thought you were pointing out a part of the code where a bug could occur due to implicit type conversion, which is why I made the changes; I didn't realize that such review comments called for a discussion first. I apologize for the confusion. Are you asking, fundamentally, why the code was changed? The initial issue raised was about incorrect return values from the inner join. Upon analyzing the code, it was found that during the execution of the …
In the issue, a …
Ahah, ok, thanks for the explanation.
Here are some comments.
Also, can you add join tests with large_binary or large_utf8?
@pitrou You are right. I will incorporate your feedback and push a commit as soon as possible.
I added a unit test for inner join with …
@llama90 could you please run the linter? Instructions at https://arrow.apache.org/docs/developers/cpp/development.html#code-style-linting-and-ci
Did I apply the lint correctly as you intended?
Yes, the "Dev / Lint C++, Python, R, Docker, RAT" test is passing now.
@llama90 could you please merge/rebase this with the latest changes on the main branch? That should fix the remaining CI failure.
FTR, I still need to take a look at the fix and see if we can make things more maintainable and more understandable in the future.
(force-pushed from 4df23e9 to dd5bce5)
If possible, could you provide specific guidelines?
I rebased the main branch code onto my working branch and encountered the following error.
…types in slice function
…e conversion in uint32_t * int64_t
…e_utf8 and large_binary
…r the slice function with binary type
(force-pushed from dd5bce5 to 222de88)
@westonpace I've refined the code, removing the dictionary type since it doesn't seem to be exercised by any test. Also, I truly appreciate all the reviews. As a beginner, I feel both overwhelmed and excited to handle an issue that requires such a complex understanding. While I am aware of my limitations, I am committed to giving my best. I humbly ask for your advice and guidance. Thank you.
Thanks for your contribution! This is an improvement over what was there before. I think that, with this PR, slicing a KeyColumnArray with large strings works.

I'm not quite convinced yet that large strings work consistently in the hash join. I see you added some testing of large string / hash join in hash_join_node_test, but these tests don't cover values greater than 2^32 (which is hard to do in any kind of performant test, sadly). So maybe all this shows is that we support "large strings that could be stored as small strings". If the values in buffers_[1] of the key column array are ever cast to int32_t in the hash join code (which I suspect they are), then this type of failure wouldn't show up until actual large strings start appearing.

However, this is an improvement, and we don't need to solve every problem all at once, so I don't see any real concern with proceeding with this PR if @pitrou is satisfied.

The hash join code is complex and quite different from the rest of the Arrow code base. It unfortunately reinvents things that we have elsewhere. Don't worry about feeling overwhelmed here; I think many of us are. Do you have any long-term goals for this feature?
Yes, I agree the PR, as is, is a good improvement now. What I would suggest is to update the PR title and description to better explain the problem. Specifically, it is about slicing large string and large binary types, with the problem being that the offset size was not correctly computed, IIUC. ("uint64_t Types" in the title is really confusing, as this PR has nothing to do with 64-bit integer columns.)
Also, a big +1 to what @westonpace said above. You definitely didn't choose the easiest part of Arrow to contribute to :-)
@pitrou Hello, I have revised and updated the PR title and description. @westonpace It seems the issues you mentioned include the following items:
All are related to joins and seem to be interesting areas. I am also interested in the issues you've highlighted and would like to attempt improvements when I have some spare time. I feel proud to have made a meaningful contribution. @pitrou @westonpace @ianmcook Thank you again for your review, and I hope to engage with you more frequently through new contributions.
…and Binary Types in Hash Join (#38147)

### Rationale for this change
We found that wrong results in inner joins during hash join operations were caused by how large string and binary types were handled: the `Slice` function was not calculating their sizes correctly. To fix this, I changed the `Slice` function to calculate the sizes correctly, based on the data type, for large string and binary.
* Issue raised: #37729

### What changes are included in this PR?
* The `Slice` function has been updated to correctly calculate the offset for large string and large binary types, and assertion statements have been added to improve maintainability.
* Unit tests (`TEST(KeyColumnArray, SliceBinaryTest)`) for the `Slice` function have been added.
* The random tests for hash join (`TEST(HashJoin, Random)`) were modified to allow creating large strings as key column values.

### Are these changes tested?
Yes.

### Are there any user-facing changes?
Acero might not have a large user base as it is an experimental feature, but I deemed the issue of incorrect join results critical and have addressed the bug.
* Closes: #38074

Authored-by: Hyunseok Seo <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit fb26178. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 8 possible false positives for unstable benchmarks that are known to sometimes produce them.