Overflow in repeat_arrs_from_indices
#13237
Labels
bug
Something isn't working
Comments
Exception raised here
Yeah, this doesn't seem like a problem with join but rather a problem with
Dandandan changed the title from "Overflow in Join Processing for Large Ranges" to "Overflow in repeat_arrs_from_indices" on Nov 8, 2024
Failed with:
It works, but it is slow, taking 224.24 seconds. cc @alamb
Sorry, can you point out the difference between the code snippets?
Ah, LargeListBuilder vs ListBuilder.
Sounds like a fun optimization exercise to figure out what is going on.
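For readers following the LargeListBuilder vs ListBuilder remark: Arrow's List type stores 32-bit signed offsets into its child array, while LargeList stores 64-bit ones. The sketch below only illustrates that size limit in plain Rust. It does not use the arrow crate, the helper names are hypothetical, and it assumes (without confirmation from this thread) that the reported overflow is related to the 32-bit offset limit.

```rust
// Hypothetical helpers, not part of arrow-rs or DataFusion.

/// True if `total_values` child elements can be addressed with the
/// 32-bit signed offsets used by Arrow's List layout.
fn fits_list_offsets(total_values: u64) -> bool {
    i32::try_from(total_values).is_ok()
}

/// True if `total_values` child elements can be addressed with the
/// 64-bit signed offsets used by Arrow's LargeList layout.
fn fits_large_list_offsets(total_values: u64) -> bool {
    i64::try_from(total_values).is_ok()
}
```

With these helpers, a child array of i32::MAX elements still fits in List offsets, while one more element fits only in LargeList offsets.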
Describe the bug
An error occurs when performing an inner join on two large unnest(range(...)) datasets in DataFusion version 42.2.0. The issue persists regardless of join type preference, as setting datafusion.optimizer.prefer_hash_join = false does not affect the outcome. Notably, this query executes successfully on version 41.0 without issues.
To Reproduce
Expected behavior
When sufficient memory is allocated, the join operation should complete successfully without errors. If memory is insufficient, the system should handle the situation gracefully, either by spilling to disk or by returning an appropriate error message indicating a memory limit issue, rather than encountering a panic or overflow error.
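As a rough illustration of "returning an appropriate error" rather than panicking, the sketch below accumulates list offsets with checked arithmetic and reports the first row at which a 32-bit offset would overflow. It is plain Rust under stated assumptions: OffsetOverflow and build_offsets are hypothetical names, and this is not how DataFusion or arrow-rs implements offset handling.

```rust
/// Hypothetical error type: the row whose length pushed the running
/// offset past what an i32 can represent.
#[derive(Debug, PartialEq)]
struct OffsetOverflow {
    row: usize,
}

/// Builds List-style offsets (one leading 0, then a running total) from
/// per-row lengths, returning an error instead of panicking or wrapping.
fn build_offsets(lengths: &[usize]) -> Result<Vec<i32>, OffsetOverflow> {
    let mut offsets = Vec::with_capacity(lengths.len() + 1);
    let mut current: i32 = 0;
    offsets.push(current);
    for (row, &len) in lengths.iter().enumerate() {
        // A single row longer than i32::MAX cannot be represented either.
        let len = i32::try_from(len).map_err(|_| OffsetOverflow { row })?;
        // checked_add yields None on overflow, which is mapped to an error.
        current = current.checked_add(len).ok_or(OffsetOverflow { row })?;
        offsets.push(current);
    }
    Ok(offsets)
}
```

A caller can then surface the error to the user, or fall back to 64-bit offsets, instead of hitting an arithmetic overflow at runtime.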