Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-13643: [C++][Compute] Implement outer join with support for residual predicates #11579

Closed
wants to merge 1 commit into from

Conversation

save-buffer
Copy link
Contributor

Implements residual predicates on hash join

@github-actions
Copy link

@github-actions
Copy link

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

@save-buffer save-buffer force-pushed the sasha_join_filter branch 2 times, most recently from 0754d39 to 718f2a3 Compare November 4, 2021 04:36
@save-buffer
Copy link
Contributor Author

The CI failure seems to be a failure with failing to download python 3.1 (https://discuss.python.org/t/disappearing-macos-packages-on-python-org/11737), which is unrelated to this PR.

Copy link
Member

@bkietz bkietz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this! I'm still reading this PR, but I've got a few minor comments now

cpp/src/arrow/compute/exec/expression.h Outdated Show resolved Hide resolved
cpp/src/arrow/compute/exec/options.h Outdated Show resolved Hide resolved
cpp/src/arrow/compute/exec/hash_join.cc Outdated Show resolved Hide resolved
cpp/src/arrow/compute/exec/hash_join_node.cc Outdated Show resolved Hide resolved
cpp/src/arrow/compute/exec/hash_join.h Outdated Show resolved Hide resolved
@save-buffer save-buffer force-pushed the sasha_join_filter branch 6 times, most recently from 71f5bd6 to 8e01ed9 Compare November 6, 2021 18:35
Copy link
Member

@westonpace westonpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good. I'm surprised this only required one additional test as there is a fair amount of nuance here but it is possible that some of these situations are being covered by the existing tests.

cpp/src/arrow/compute/exec/hash_join.cc Outdated Show resolved Hide resolved
cpp/src/arrow/compute/exec/hash_join.cc Outdated Show resolved Hide resolved
cpp/src/arrow/compute/exec/hash_join.cc Outdated Show resolved Hide resolved
cpp/src/arrow/compute/exec/hash_join_node.cc Outdated Show resolved Hide resolved
cpp/src/arrow/compute/exec/hash_join_node.cc Show resolved Hide resolved
cpp/src/arrow/compute/exec/hash_join_node.cc Outdated Show resolved Hide resolved
cpp/src/arrow/compute/exec/hash_join_node.cc Outdated Show resolved Hide resolved
cpp/src/arrow/compute/exec/hash_join_node_test.cc Outdated Show resolved Hide resolved
cpp/src/arrow/compute/exec/hash_join.cc Show resolved Hide resolved
cpp/src/arrow/compute/exec/hash_join.cc Show resolved Hide resolved
@save-buffer
Copy link
Contributor Author

The failures seem to be due to conda having a bad time:

UserAgent: aws-sdk-cpp/1.8.74/S3/Linux/5.11.0-1020-azure x86_64 GCC/7.5.0
Error: mkdir /tmp/s3fs-test-4m9eos07/bucket/somefile/: file exists (*fs.PathError)
       4: cmd/api-errors.go:2082:cmd.toAPIErrorCode()
       3: cmd/api-errors.go:2107:cmd.toAPIError()
       2: cmd/object-handlers.go:1801:cmd.objectAPIHandlers.PutObjectHandler()
       1: net/http/server.go:2046:http.HandlerFunc.ServeHTTP()

Don't think it's related to me. I'll try rebasing in a couple days and maybe the checks will pass.

Copy link
Member

@westonpace westonpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just a few nits from me.

cpp/src/arrow/compute/exec/hash_join.cc Outdated Show resolved Hide resolved
cpp/src/arrow/compute/exec/options.h Outdated Show resolved Hide resolved
@westonpace
Copy link
Member

@bkietz did you want to look over this one more time? Otherwise I think we can merge.

@save-buffer save-buffer force-pushed the sasha_join_filter branch 2 times, most recently from 76613e3 to 762695a Compare November 25, 2021 06:44
@save-buffer
Copy link
Contributor Author

[ RUN      ] TestArrowReadDeltaEncoding.DeltaByteArray
unknown file: Failure
C++ exception with description "IOError: Failed to open local file '/arrow/cpp/submodules/parquet-testing/data/delta_byte_array.parquet'. Detail: [errno 2] No such file or directory" thrown in the test body.

These failures don't appear to be me.

@lidavidm
Copy link
Member

[ RUN      ] TestArrowReadDeltaEncoding.DeltaByteArray
unknown file: Failure
C++ exception with description "IOError: Failed to open local file '/arrow/cpp/submodules/parquet-testing/data/delta_byte_array.parquet'. Detail: [errno 2] No such file or directory" thrown in the test body.

These failures don't appear to be me.

Hmm, it seems this PR modifies the parquet-testing submodule - perhaps it needs to be rebased/something happened during the last rebase?

@save-buffer
Copy link
Contributor Author

Yeah that seems like what happened. I guess I still need some practice with rebasing 😅 I went with the nuclear option and squashed my commits and removed the change to the testing directory, so hopefully it'll be good now.

@lidavidm
Copy link
Member

Ok, looks like these last two failures/timeouts are not related. Thanks for the updates!

@lidavidm lidavidm closed this in 4913352 Nov 29, 2021
@ursabot
Copy link

ursabot commented Nov 29, 2021

Benchmark runs are scheduled for baseline = 72f7fbe and contender = 4913352. 4913352 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed] ursa-i9-9960x
[Finished ⬇️0.53% ⬆️0.22%] ursa-thinkcentre-m75q
Supported benchmarks:
ursa-i9-9960x: langs = Python, R, JavaScript
ursa-thinkcentre-m75q: langs = C++, Java
ec2-t3-xlarge-us-east-2: cloud = True

@save-buffer save-buffer deleted the sasha_join_filter branch November 29, 2021 22:14
kou pushed a commit to kou/arrow that referenced this pull request Dec 1, 2021
…idual predicates

Implements residual predicates on hash join

Closes apache#11579 from save-buffer/sasha_join_filter

Authored-by: Sasha Krassovsky <[email protected]>
Signed-off-by: David Li <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants