-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix some issues with filters on numeric columns with nulls #9251
fix some issues with filters on numeric columns with nulls #9251
Conversation
.dataSource(CalciteTests.DATASOURCE3) | ||
.intervals(querySegmentSpec(Filtration.eternity())) | ||
.granularity(Granularities.ALL) | ||
.filters(bound("l1", "3", null, true, false, null, StringComparators.NUMERIC)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about moving these three tests to BoundFilterTest
? I suggest this for two reasons:
- This test isn't adding much to coverage of the SQL layer.
- BoundFilterTest extends from BaseFilterTest which tests a whole big matrix of configurations, so coverage of the lower level stuff will be better vs. using CalciteQueryTest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm working on adding some tests here to get the extra coverage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rad.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did some refactoring around BaseFilterTest
to add in numeric columns with null values, and bumped into some additional issues. First one in the JavascriptDimFilter
, whose numeric predicates did not implement applyNull
, meaning you couldn't specifically filter for the null value. The 'in' filter I believe also has this problem, assuming that something like .. x IN (1, 2, NULL)
is valid, but is more involved to fix so I didn't do it in this PR (assuming it needs fixed?).
I also ran into some issues with vectorized value and predicate matchers on numeric null columns. The initial issue is that the matchers were not checking the null bitmap, however once that was in place I bumped into another issue that the null vector could be incorrectly polluted with the wrong null information in the case where the null bitmap ran out of values before the end of the column (likely) because it was breaking from the loop instead of writing false values until the end vector offset.
I added tests for all of these cases.
I did not dig into the expression filter for fear of what I might run into, but will investigate this as a follow-up to this PR, along with any remaining filters which I did not address.
@@ -57,7 +57,8 @@ | |||
final int row = i + startOffset; | |||
nullIterator.advanceIfNeeded(row); | |||
if (!nullIterator.hasNext()) { | |||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using Arrays.fill
to fill the rest with falses, and then break? Should be somewhat more efficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 will change
@@ -71,7 +72,8 @@ | |||
final int row = currentOffsets[i]; | |||
nullIterator.advanceIfNeeded(row); | |||
if (!nullIterator.hasNext()) { | |||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment.
@@ -24,8 +24,20 @@ | |||
import com.google.common.collect.ImmutableList; | |||
import com.google.common.collect.ImmutableMap; | |||
import com.google.common.collect.Iterables; | |||
import junitparams.converters.Nullable; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean javax.annotation.Nullable
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heh, oops, gg intellij
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another comment: did you consider removing the default impl of applyNull
from the DruidXPredicate interfaces? It seems like it being missing is just asking for null handling bugs.
Yes, and I agree, though since I was planning to look at expression filters and the others as a follow-up I was considering making this change later. Should I just do it all now? |
I think doing those as a follow-up is OK. |
* fix issue with long column predicate filters and nulls * dang * uncomment a thing * styles * oops * allcaps * review stuff
Description
This PR fixes an issue with predicate filters, such as a bound filter, on long columns with numeric nulls, where it will incorrectly compare
0
instead of callingpredicate.applyNull()
. The correct code is actually in place already inLongValueMatcherColumnSelectorStrategy
, but this was getting pre-empted byFilters.makeValueMatcher
which had it's own predicate defined specifically handling long columns, along with a code comment:Unfortunately, it just hadn't been done yet.
Prior to this PR, the added
testLongPredicateFilterNulls
inCalciteQueryTest
would fail with an error of the form:While adding tests for this fix, I also encountered an issue with the
JavascriptDimFilter
, which did not implementapplyNull
for it's predicate, causing it to be unable to feed null values into javascript filter functions.I also ran into some issues with vectorized value and predicate matchers on numeric null columns. The initial issue is that the matchers were not checking the null vector, however once that was in place I bumped into another issue that the null vector could be incorrectly polluted with the wrong null information in the case where the null bitmap ran out of values before the end of the column (likely) because it was breaking from the loop instead of writing false values until the end vector offset.
This PR has: