Delegate wildcard query creation to MappedFieldType. #34062

jtibshirani · 2018-09-25T21:07:03Z

This is consistent with how we handle fuzzy, prefix, and regexp queries.

As part of this change, disallow wildcard queries on non-string fields, in addition to collation fields.

elasticmachine · 2018-09-25T21:07:04Z

Pinging @elastic/es-search-aggs

jimczi

Good catch @jtibshirani. I left some comments.

jimczi · 2018-09-25T21:30:10Z

server/src/main/java/org/elasticsearch/index/mapper/MappedFieldType.java

@@ -345,6 +348,15 @@ public Query prefixQuery(String value, @Nullable MultiTermQuery.RewriteMethod me
        throw new QueryShardException(context, "Can only use prefix queries on keyword and text fields - not on [" + name + "] which is of type [" + typeName() + "]");
    }

+    public Query wildcardQuery(String value, QueryShardContext context) {


I think the same reasoning applies to wildcard, prefix, and regex queries, so the default impl should throw a QueryShardException? Only StringFieldType fields should be able to build a wildcard query.
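A minimal sketch of the delegation pattern proposed here, using simplified stand-in classes rather than the real Elasticsearch types. `SketchFieldType`, `SketchStringFieldType`, and `WildcardDelegationSketch` are hypothetical names, the query is represented as a plain string, and the error text is only modeled on the prefix-query message shown in the diff above.

```java
import java.util.Objects;

// Simplified stand-ins for illustration only; these are NOT the real Elasticsearch classes.
class SketchFieldType {
    final String name;
    final String typeName;

    SketchFieldType(String name, String typeName) {
        this.name = Objects.requireNonNull(name);
        this.typeName = Objects.requireNonNull(typeName);
    }

    // Default behaviour: a field type rejects wildcard queries unless it opts in.
    String wildcardQuery(String pattern) {
        throw new IllegalArgumentException("Can only use wildcard queries on keyword and text fields - not on ["
            + name + "] which is of type [" + typeName + "]");
    }
}

// String-like field types override the default and build the query.
class SketchStringFieldType extends SketchFieldType {
    SketchStringFieldType(String name) {
        super(name, "keyword");
    }

    @Override
    String wildcardQuery(String pattern) {
        // A real implementation would build a Lucene query; a description string is enough here.
        return "WildcardQuery(" + name + ":" + pattern + ")";
    }
}

class WildcardDelegationSketch {
    // The query builder no longer constructs the query itself; it asks the field type.
    static String build(SketchFieldType fieldType, String pattern) {
        return fieldType.wildcardQuery(pattern);
    }

    public static void main(String[] args) {
        System.out.println(build(new SketchStringFieldType("user"), "ki*y"));
        try {
            build(new SketchFieldType("age", "long"), "4*");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With this shape, a field type that should keep wildcard support (such as `_index` in the discussion below) would override the method explicitly instead of inheriting it from the base class.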

The keyword field applies the normalizer on termQuery. Depending on the normalizer, the wildcard and escaped characters could be removed or replaced, so I wonder if we should apply the same logic as QueryParserBase#analyzeWildcard for keyword fields. This is out of scope for this PR, but it made me realize that we might have a bug here.
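To make the concern concrete, here is a rough sketch of normalizing only the literal chunks of a wildcard pattern while leaving `*`, `?`, and backslash escapes untouched. It is written in the spirit of the chunk-wise approach mentioned above, not copied from it: `WildcardNormalizeSketch` and its `normalize` method are hypothetical, and the normalizer is modeled as a plain `UnaryOperator<String>`.

```java
import java.util.Locale;
import java.util.function.UnaryOperator;

class WildcardNormalizeSketch {
    // Applies the normalizer to literal text only; wildcards and escape pairs are copied verbatim.
    static String normalize(String pattern, UnaryOperator<String> normalizer) {
        StringBuilder out = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '\\' && i + 1 < pattern.length()) {
                // Escaped character: flush pending literal text, then keep the pair as-is.
                flush(out, literal, normalizer);
                out.append(c).append(pattern.charAt(++i));
            } else if (c == '*' || c == '?') {
                // Wildcard: flush pending literal text, then keep the wildcard as-is.
                flush(out, literal, normalizer);
                out.append(c);
            } else {
                literal.append(c);
            }
        }
        flush(out, literal, normalizer);
        return out.toString();
    }

    private static void flush(StringBuilder out, StringBuilder literal, UnaryOperator<String> normalizer) {
        if (literal.length() > 0) {
            out.append(normalizer.apply(literal.toString()));
            literal.setLength(0);
        }
    }

    public static void main(String[] args) {
        // An assumed normalizer that lowercases and then drops everything except a-z.
        UnaryOperator<String> lettersOnly = s -> s.toLowerCase(Locale.ROOT).replaceAll("[^a-z]", "");
        System.out.println(lettersOnly.apply("Fo*Ba?R"));        // whole-pattern normalization loses the wildcards
        System.out.println(normalize("Fo*Ba?R", lettersOnly));   // chunk-wise normalization keeps them
    }
}
```

The design choice is simply to treat wildcard and escape tokens as separators, so a normalizer that strips punctuation cannot silently change the meaning of the pattern.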

> I didn't go further and disallow wildcard queries for all non-keyword or text fields, as some other field types like _index explicitly support wildcard queries.

I missed this part, sorry. I think we should explicitly add the support in the _index field type rather than supporting this query on all fields. Currently the support for prefix queries is also broken, so we don't really use this ability.

Okay, I'll make sure only string fields support wildcards by default. Maybe I'll add an upgrade note too in case this breaks some types we don't have test coverage for (will make it easier for users to debug + file issues)?

> This is out of scope for this PR, but it made me realize that we might have a bug here.

Makes sense, I'll make a note to follow up on this.

Looking through the non-string field types, what do you think should be done with metadata types like IdFieldType, IgnoredFieldType, and RoutingFieldType? My intuition is we should switch them to being string fields to avoid breaking any queries.

This change would only break wildcard queries on these fields, right? +1 to making them string fields; prefix and regex queries do not currently work because of this, so it would be a bug fix. I am also OK with doing that in a follow-up, since the changes in this PR have a different scope.

jimczi · 2018-09-25T21:31:25Z

server/src/main/java/org/elasticsearch/index/query/WildcardQueryBuilder.java

        if (fieldType == null) {
-            term = new Term(fieldName, BytesRefs.toBytesRef(value));
+            Term term = new Term(fieldName, BytesRefs.toBytesRef(value));
+            query = new WildcardQuery(term);


nit: if the field does not exist, we could return a MatchNoDocsQuery?

This is one thing that has confused me in the past: if a field type doesn't exist, we typically still create a query of the same form (see TermsQueryBuilder#doToQuery amongst other examples).

In any case, maybe I could make this change in a follow-up PR, as I was just hoping for a straight refactor here?

> This is one thing that has confused me in the past: if a field type doesn't exist, we typically still create a query of the same form (see TermsQueryBuilder#doToQuery amongst other examples).

Yes, we don't have a clear policy for this. The reason I prefer the MatchNoDocsQuery is that we can fold the reason into the message, and if users check the Lucene query through the _validate API they can see that this field is not present in the mapping. If we build the same form, there is no easy way for the user to understand why a specific query matches no documents. Anyway, we can discuss this in a follow-up; no need to make that change in this PR.
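The two options being weighed can be sketched side by side. This is an illustration under stated assumptions, not the code from either PR: `MissingFieldSketch` and both method names are hypothetical, queries are represented by their description strings, and the reason text is invented for the example.

```java
import java.util.Set;

class MissingFieldSketch {
    // Option discussed first: keep the same query form even when the field is unmapped.
    static String sameForm(Set<String> mappedFields, String fieldName, String pattern) {
        return "WildcardQuery(" + fieldName + ":" + pattern + ")";
    }

    // Option preferred in the comment above: a match-no-docs query that carries the reason.
    static String matchNoDocsWithReason(Set<String> mappedFields, String fieldName, String pattern) {
        if (!mappedFields.contains(fieldName)) {
            // Hypothetical reason text; the point is that it names the missing field.
            return "MatchNoDocsQuery(\"unmapped field [" + fieldName + "]\")";
        }
        return "WildcardQuery(" + fieldName + ":" + pattern + ")";
    }

    public static void main(String[] args) {
        Set<String> mapped = Set.of("user", "message");
        System.out.println(sameForm(mapped, "usr", "ki*y"));
        System.out.println(matchNoDocsWithReason(mapped, "usr", "ki*y"));
    }
}
```

In the first form a typo in the field name is invisible when inspecting the rewritten query; in the second, the description itself explains why nothing matches.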

For backwards compatibility, we maintain support on the `_index` field.

jimczi

Thanks @jtibshirani, let's fix prefix and regex queries for metadata fields in a follow-up.


…fallback

* elastic/master:
  TEST: Add engine is closed as expected failure msg
  Adjust bwc version for max_seq_no_of_updates
  Build DocStats from SegmentInfos in ReadOnlyEngine (elastic#34079)
  When creating wildcard queries, use MatchNoDocsQuery when the field type doesn't exist. (elastic#34093)
  [DOCS] Moves graph to docs folder (elastic#33472)
  Mute MovAvgIT#testHoltWintersNotEnoughData
  Security: use default scroll keepalive (elastic#33639)
  Calculate changed roles on roles.yml reload (elastic#33525)
  Scripting: Reflect factory signatures in painless classloader (elastic#34088)
  XContentBuilder to handle BigInteger and BigDecimal (elastic#32888)
  Delegate wildcard query creation to MappedFieldType. (elastic#34062)
  Painless: Cleanup Cache (elastic#33963)

* Delegate wildcard query creation to MappedFieldType.
* Disallow wildcard queries on collation fields.
* Disallow wildcard queries on non-string fields.

jtibshirani added 2 commits September 25, 2018 13:56

Delegate wildcard query creation to MappedFieldType.

cf40cd4

Disallow wildcard queries on collation fields.

f2d621f

jtibshirani added the >bug, :Search/Search, v7.0.0, and >refactoring labels Sep 25, 2018

jimczi reviewed Sep 25, 2018


Disallow wildcard queries on non-string fields.

dcd8502

For backwards compatibility, we maintain support on the `_index` field.

jtibshirani force-pushed the wildcard-query-builder branch from a3f5623 to dcd8502 Compare September 26, 2018 05:41

jimczi approved these changes Sep 26, 2018


jtibshirani changed the title ~~Disallow wildcard queries on collation fields.~~ Delegate wildcard query creation to MappedFieldType. Sep 26, 2018

jtibshirani merged commit de8bfb9 into elastic:master Sep 26, 2018

jtibshirani deleted the wildcard-query-builder branch September 26, 2018 16:36

This was referenced Sep 26, 2018

Support 'string'-style queries on metadata fields when reasonable. #34089

Merged

When creating wildcard queries, use MatchNoDocsQuery when the field type doesn't exist. #34093

Merged

Add a simple JSON field mapper. #33923

Merged

jtibshirani added the v6.5.0 label Sep 28, 2018

kcm pushed a commit that referenced this pull request Oct 30, 2018

Delegate wildcard query creation to MappedFieldType. (#34062)

e2c64d9

* Delegate wildcard query creation to MappedFieldType.
* Disallow wildcard queries on collation fields.
* Disallow wildcard queries on non-string fields.

colings86 added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
