Add example of using PruningPredicate to datafusion-examples #9183
// File 2: `x = 5 AND y = 10` can never evaluate to true because y
// has only the value of 7. Thus this file can be skipped.
false,
// File 3: `x = 5 AND y = 10` can never evaluate to true because x
FYI @appletreeisyellow here is an actual example showing that the pruning predicate does the right thing with unknown column values
File 3 example makes sense to me 👍
I'm curious what the result will be for a file 4 like:
File 4: x has values between 4 and 6; nothing is known about the value of y
For the same predicate x = 5 AND y = 10, my understanding is that it will evaluate to true:
x = 5 AND y = 10
--> true AND null
--> null
Since y is unknown, there is a possibility that y is 10 in this file / partition / row group of data. Thus this file cannot be skipped and the result is true.
> Same the predicate x = 5 AND y = 10, my understanding is that it will evaluate to true.
Yes, this is my understanding too (that the PruningPredicate will return true for this container).
> Since y is unknown, so there is a possibility that y is 10 in this file / partition / row group of data. Thus this file can not be skipped and the result is true
Yes, that is my understanding as well.
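The reasoning in this thread (`true AND null --> null`, and a null result still means "keep the file") can be sketched in a few lines of plain Rust. This is only a simplified model of the logic being discussed, not DataFusion's implementation: `kleene_and` and `must_keep` are hypothetical helper names, and `Option<bool>` stands in for a nullable boolean.

```rust
/// SQL-style three-valued AND; `None` stands in for NULL / unknown.
/// (Hypothetical helper for illustration, not a DataFusion API.)
fn kleene_and(a: Option<bool>, b: Option<bool>) -> Option<bool> {
    match (a, b) {
        // One side being definitely false makes the whole conjunction false.
        (Some(false), _) | (_, Some(false)) => Some(false),
        // Both sides definitely true.
        (Some(true), Some(true)) => Some(true),
        // Anything else involves an unknown that could go either way.
        _ => None,
    }
}

/// A container can be skipped only when the predicate is provably false.
/// An unknown (`None`) result is treated like `true`: the container is kept.
/// (Hypothetical helper for illustration, not a DataFusion API.)
fn must_keep(result: Option<bool>) -> bool {
    result != Some(false)
}
```

For the File 4 case above, `kleene_and(Some(true), None)` is `None`, and `must_keep(None)` is `true`, so the file is read. For File 2, the `y = 10` side is `Some(false)`, so the conjunction is `Some(false)` and the file can be skipped.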
Nice @alamb, I love reviewing such docs as they give more understanding.
There is likely a typo
Thank you for adding examples @alamb. Super helpful! I left a question for a new example and a suggestion
// Note, returning null means the value isn't known, NOT
// that we know the entire column is null.
(None, None),
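To make the "null means unknown" point concrete, here is a small standalone sketch of how `col = literal` might be checked against optional min/max statistics. It assumes the equality is evaluated as `min <= literal AND literal <= max`, with a missing bound yielding an unknown result. `ColumnStats` and `could_equal` are hypothetical names for illustration, not part of the DataFusion API.

```rust
/// Min/max statistics for one column in one container (file, row group, ...).
/// `None` means the bound is not known, NOT that the column is entirely null.
/// (Hypothetical type for illustration only.)
#[derive(Clone, Copy)]
struct ColumnStats {
    min: Option<i64>,
    max: Option<i64>,
}

/// Evaluate `col = literal` against the statistics, modeled as
/// `min <= literal AND literal <= max` under three-valued logic.
/// Returns Some(false) if no row can match, Some(true) if the literal is
/// within the known range, and None if the statistics cannot decide.
fn could_equal(stats: ColumnStats, literal: i64) -> Option<bool> {
    let lower = stats.min.map(|min| min <= literal);
    let upper = stats.max.map(|max| literal <= max);
    match (lower, upper) {
        // A known bound that excludes the literal rules the container out.
        (Some(false), _) | (_, Some(false)) => Some(false),
        // Both bounds known and the literal lies inside them.
        (Some(true), Some(true)) => Some(true),
        // At least one bound is unknown and nothing excludes the literal.
        _ => None,
    }
}
```

With `(None, None)` statistics, `could_equal` returns `None` for any literal, so the container cannot be ruled out on that column alone.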
Nice!
That probably looks familiar :)
Co-authored-by: Chunchun Ye <[email protected]>
lgtm thanks @alamb
Which issue does this PR close?
Part of #7013
Related to #7869 and #9171
Rationale for this change
What changes are included in this PR?
Add a `pruning.rs` example to datafusion-examples with an annotated guide to using `PruningPredicate`
`PruningPredicate` API docs
Are these changes tested?
Yes, as part of CI
Are there any user-facing changes?
A new example, no code changes