-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Regularly Segfaulting in NOT EXIST query #7394
Comments
Just a check, is your project running in release mode? |
Yes it is running in release. I do I can't share the exact example, maybe I can share the queries if I obfuscate them a bit. |
I have distilled the query down specifically to a SELECT
lhs.uuid,
lhs.thing,
lhs.other_thing,
lhs.other_other_thing
FROM lhs
WHERE NOT EXISTS (
SELECT * FROM rhs
WHERE lhs.uuid = rhs.uuid
AND lhs.thing = rhs.thing
AND lhs.other_thing = rhs.other_thing
AND lhs.other_other_thing = rhs.other_other_thing
) where all types are nullable strings, or int64s. At first I thought it had to do with nesting, now I am wondering if it has to do with nullable comparisons inside of a not exist? The same happens with LEFT OUTER JOIN for what it's worth. |
if I replace the AND's with checks for IS NOT NULL first on the optional fields, I now OOM. The OOM is surprising, the data isn't that large (<500 MB on disk uncompressed) and the query itself isn't that demanding but I'm blowing through 50GB SWAP and 16GB RAM. So it seems like I am possibly blowing out the stack or there is a memory bug associated with NOT EXISTS with WHERE clauses that compare NULL's. But in general the memory consumption here seems quite high for a query returning zero records. I might do better JOINing the two tables and looking for NULLs in a for loop post serialization? |
I was able to work around these issues by sorting the data frames and single thread walking the OUTER JOIN/NOT EXISTS in rust without getting anything near an OOM, nor a SO. This is at least 15x faster than the failed runs despite costs to serialization, etc. I recommend this approach as a workaround should anyone else run into this. I do hope to see the performance and correctness of these queries improved because I really do like this project. |
I think we should understand what is going on here. Can you post some dummy data and an accompanying query to reproduce on our end? |
Describe the bug
The data I am using is in a normalized parquet format and each "table" is in its own file, each registered to the SessionContext instance, and can be queried with a variety of other queries. The problematic query has maybe 4
JOIN
s (outer, inner, etc) and it ends with aNOT EXISTS
clause on a subquery with another set of the roughly the same 4JOIN
s on different tables. The outer join occurs inside of a view which is shared by the other 3 inner joins. The query planning succeeds. The query runs for a while, maybe 15 minutes, and then appears as though it has completed (CPU cores spin down, RAM consumption goes down to baseline). The SegFault happens during the collection of the DataFrame itself(it'sawait
ed on) before the RecordBatches are collected from the Dataframe. For what its worth, the dataframe should be empty at the end of this query as its serving as a control for a unit test.Then segfault currently occurs on an intel Mac. I saw an open issue about seg faulting in unit tests #5693 and don't know whether or not this could be the same issue.
I see a few blocks of unsafe code in the project, most of which look benign, but I haven't ruled out a stack overflow scenario. Not sure where to poke at. May try adjusting
RUST_MIN_STACK
to see if that helps? Or memoizing the subquery results before theNOT EXIST
call?Any suggestions appreciated.
To Reproduce
I can't share the data to reproduce this or the code unfortunately but something tells me I could make a MRE as I doubt this behavior is exclusive to this type of query.
Expected behavior
This may sound terse but I mean this in the most polite way possible. Ideally queries do not segfault.
Additional context
No response
The text was updated successfully, but these errors were encountered: