Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix some issues with filters on numeric columns with nulls #9251

Merged
merged 7 commits into from
Jan 28, 2020

Conversation

clintropolis
Copy link
Member

@clintropolis clintropolis commented Jan 24, 2020

Description

This PR fixes an issue with predicate filters, such as a bound filter, on long columns with numeric nulls, where it will incorrectly compare 0 instead of calling predicate.applyNull(). The correct code is actually in place already in LongValueMatcherColumnSelectorStrategy, but this was getting pre-empted by Filters.makeValueMatcher which had it's own predicate defined specifically handling long columns, along with a code comment:

// This should be folded into the ValueMatcherColumnSelectorStrategy once that can handle LONG typed columns.

Unfortunately, it just hadn't been done yet.

Prior to this PR, the added testLongPredicateFilterNulls in CalciteQueryTest would fail with an error of the form:

java.lang.AssertionError
	at org.apache.druid.segment.data.ColumnarLongs$1HistoricalLongColumnSelectorWithNulls.getLong(ColumnarLongs.java:124)
	at org.apache.druid.segment.filter.Filters$4.matches(Filters.java:467)
	at org.apache.druid.segment.FilteredOffset.increment(FilteredOffset.java:75)
	at org.apache.druid.segment.QueryableIndexCursorSequenceBuilder$QueryableIndexCursor.advance(QueryableIndexCursorSequenceBuilder.java:417)
	at org.apache.druid.query.timeseries.TimeseriesQueryEngine.lambda$processNonVectorized$2(TimeseriesQueryEngine.java:288)

While adding tests for this fix, I also encountered an issue with the JavascriptDimFilter, which did not implement applyNull for it's predicate, causing it to be unable to feed null values into javascript filter functions.

I also ran into some issues with vectorized value and predicate matchers on numeric null columns. The initial issue is that the matchers were not checking the null vector, however once that was in place I bumped into another issue that the null vector could be incorrectly polluted with the wrong null information in the case where the null bitmap ran out of values before the end of the column (likely) because it was breaking from the loop instead of writing false values until the end vector offset.


This PR has:

  • been self-reviewed.
  • added unit tests or modified existing tests to cover new code paths.
  • added integration tests.
  • been tested in a test Druid cluster.

.dataSource(CalciteTests.DATASOURCE3)
.intervals(querySegmentSpec(Filtration.eternity()))
.granularity(Granularities.ALL)
.filters(bound("l1", "3", null, true, false, null, StringComparators.NUMERIC))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about moving these three tests to BoundFilterTest? I suggest this for two reasons:

  1. This test isn't adding much to coverage of the SQL layer.
  2. BoundFilterTest extends from BaseFilterTest which tests a whole big matrix of configurations, so coverage of the lower level stuff will be better vs. using CalciteQueryTest.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm working on adding some tests here to get the extra coverage

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rad.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did some refactoring around BaseFilterTest to add in numeric columns with null values, and bumped into some additional issues. First one in the JavascriptDimFilter, whose numeric predicates did not implement applyNull, meaning you couldn't specifically filter for the null value. The 'in' filter I believe also has this problem, assuming that something like .. x IN (1, 2, NULL) is valid, but is more involved to fix so I didn't do it in this PR (assuming it needs fixed?).

I also ran into some issues with vectorized value and predicate matchers on numeric null columns. The initial issue is that the matchers were not checking the null bitmap, however once that was in place I bumped into another issue that the null vector could be incorrectly polluted with the wrong null information in the case where the null bitmap ran out of values before the end of the column (likely) because it was breaking from the loop instead of writing false values until the end vector offset.

I added tests for all of these cases.

I did not dig into the expression filter for fear of what I might run into, but will investigate this as a follow-up to this PR, along with any remaining filters which I did not address.

@clintropolis clintropolis changed the title fix issue with long column predicate matcher based filters and nulls fix some issues with filters on numeric columns with nulls Jan 27, 2020
@@ -57,7 +57,8 @@
final int row = i + startOffset;
nullIterator.advanceIfNeeded(row);
if (!nullIterator.hasNext()) {
break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about using Arrays.fill to fill the rest with falses, and then break? Should be somewhat more efficient.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 will change

@@ -71,7 +72,8 @@
final int row = currentOffsets[i];
nullIterator.advanceIfNeeded(row);
if (!nullIterator.hasNext()) {
break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment.

@@ -24,8 +24,20 @@
import com.google.common.collect.ImmutableList;
import com.google.common.collect.ImmutableMap;
import com.google.common.collect.Iterables;
import junitparams.converters.Nullable;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean javax.annotation.Nullable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh, oops, gg intellij

Copy link
Contributor

@gianm gianm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another comment: did you consider removing the default impl of applyNull from the DruidXPredicate interfaces? It seems like it being missing is just asking for null handling bugs.

@clintropolis
Copy link
Member Author

Another comment: did you consider removing the default impl of applyNull from the DruidXPredicate interfaces? It seems like it being missing is just asking for null handling bugs.

Yes, and I agree, though since I was planning to look at expression filters and the others as a follow-up I was considering making this change later. Should I just do it all now?

@gianm
Copy link
Contributor

gianm commented Jan 27, 2020

Yes, and I agree, though since I was planning to look at expression filters and the others as a follow-up I was considering making this change later. Should I just do it all now?

I think doing those as a follow-up is OK.

@gianm gianm merged commit 36c5efe into apache:master Jan 28, 2020
clintropolis added a commit to implydata/druid-public that referenced this pull request Jan 28, 2020
* fix issue with long column predicate filters and nulls

* dang

* uncomment a thing

* styles

* oops

* allcaps

* review stuff
@clintropolis clintropolis deleted the predicate-filter-long-nulls branch January 28, 2020 02:54
@jihoonson jihoonson added this to the 0.18.0 milestone Mar 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants