-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Delegate wildcard query creation to MappedFieldType. #34062
Delegate wildcard query creation to MappedFieldType. #34062
Conversation
Pinging @elastic/es-search-aggs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch @jtibshirani. I left some comments.
@@ -345,6 +348,15 @@ public Query prefixQuery(String value, @Nullable MultiTermQuery.RewriteMethod me | |||
throw new QueryShardException(context, "Can only use prefix queries on keyword and text fields - not on [" + name + "] which is of type [" + typeName() + "]"); | |||
} | |||
|
|||
public Query wildcardQuery(String value, QueryShardContext context) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the same reasoning can apply for wildcard
, prefix
and regex
queries so the default impl should throw a QueryShardException
? Only StringFieldType
fields should be able to build a wildcard
query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The keyword
field applies the normalizer on termQuery
. Depending on the normalizer the wildcard and escaped characters could be removed/replaced so I wonder if we should apply the same logic than QueryParserBase#analyzeWildcard
for keyword
fields. This is out of scope for this pr but it made me realize that we might have a bug here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't go further and disallow wildcard queries for all non-keyword or text fields, as some other field types like _index explicitly support wildcard queries.
I missed this part sorry. I think we should explicitly add the support in the _index
field type rather than supporting this query on all fields. Currently the support for prefix
queries is also broken so we don't really use this ability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I'll make sure only string fields support wildcards by default. Maybe I'll add an upgrade note too in case this breaks some types we don't have test coverage for (will make it easier for users to debug + file issues)?
This is out of scope for this pr but it made me realize that we might have a bug here.
Makes sense, I'll make a note to follow-up on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking through the non-string field types, what do you think should be done with metadata types like IdFieldType
, IgnoredFieldType
, and RoutingFieldType
? My intuition is we should switch them to being string fields to avoid breaking any queries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change would only break wildcard
query on these fields, right ? +1 to make them string fields, prefix
and regex
query do not work currently because of this so it would be a bug fix. I am also ok to do that in a follow up, the changes in this pr have a different scope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
if (fieldType == null) { | ||
term = new Term(fieldName, BytesRefs.toBytesRef(value)); | ||
Term term = new Term(fieldName, BytesRefs.toBytesRef(value)); | ||
query = new WildcardQuery(term); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: if the field does not exist we could return a MatchNoDocsQuery
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is one thing that has confused me in the past: if a field type doesn't exist, we typically still create a query of the same form (see TermsQueryBuilder#doToQuery
amongst other examples).
In any case, maybe I could make this change in a follow-up PR, as I was just hoping for a straight refactor here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is one thing that has confused me in the past: if a field type doesn't exist, we typically still create a query of the same form (see TermsQueryBuilder#doToQuery amongst other examples).
Yes we don't have a clear policy for this. The reason I prefer the MatchNoDocsQuery
is that we can fold the reason in the message and if users check the Lucene query through the _validate
API they can see that this field is not present in the mapping. If we build the same form there is no easy way for the user to understand why a specific query matches no document. Anyway we can discuss this in a follow up, no need to make that change in this pr.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
For backwards compatibility, we maintain support on the `_index` field.
a3f5623
to
dcd8502
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jtibshirani , let's fix prefix
and regex
queries for metadata fields in a follow up.
@@ -345,6 +348,15 @@ public Query prefixQuery(String value, @Nullable MultiTermQuery.RewriteMethod me | |||
throw new QueryShardException(context, "Can only use prefix queries on keyword and text fields - not on [" + name + "] which is of type [" + typeName() + "]"); | |||
} | |||
|
|||
public Query wildcardQuery(String value, QueryShardContext context) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change would only break wildcard
query on these fields, right ? +1 to make them string fields, prefix
and regex
query do not work currently because of this so it would be a bug fix. I am also ok to do that in a follow up, the changes in this pr have a different scope.
if (fieldType == null) { | ||
term = new Term(fieldName, BytesRefs.toBytesRef(value)); | ||
Term term = new Term(fieldName, BytesRefs.toBytesRef(value)); | ||
query = new WildcardQuery(term); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is one thing that has confused me in the past: if a field type doesn't exist, we typically still create a query of the same form (see TermsQueryBuilder#doToQuery amongst other examples).
Yes we don't have a clear policy for this. The reason I prefer the MatchNoDocsQuery
is that we can fold the reason in the message and if users check the Lucene query through the _validate
API they can see that this field is not present in the mapping. If we build the same form there is no easy way for the user to understand why a specific query matches no document. Anyway we can discuss this in a follow up, no need to make that change in this pr.
…fallback * elastic/master: TEST: Add engine is closed as expected failure msg Adjust bwc version for max_seq_no_of_updates Build DocStats from SegmentInfos in ReadOnlyEngine (elastic#34079) When creating wildcard queries, use MatchNoDocsQuery when the field type doesn't exist. (elastic#34093) [DOCS] Moves graph to docs folder (elastic#33472) Mute MovAvgIT#testHoltWintersNotEnoughData Security: use default scroll keepalive (elastic#33639) Calculate changed roles on roles.yml reload (elastic#33525) Scripting: Reflect factory signatures in painless classloader (elastic#34088) XContentBuilder to handle BigInteger and BigDecimal (elastic#32888) Delegate wildcard query creation to MappedFieldType. (elastic#34062) Painless: Cleanup Cache (elastic#33963)
* Delegate wildcard query creation to MappedFieldType. * Disallow wildcard queries on collation fields. * Disallow wildcard queries on non-string fields.
* Delegate wildcard query creation to MappedFieldType. * Disallow wildcard queries on collation fields. * Disallow wildcard queries on non-string fields.
This is consistent with how we handle
fuzzy
,prefix
, andregexp
queries.As part of this change, disallow wildcard queries on non-string fields, in addition to
collation
fields.