-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Query parsers to throw exception when multiple field names are provided #19791
Conversation
} | ||
} | ||
|
||
if (text == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed these checks at the end of parsing as they are not needed. The query builder throws error anyways when required fields are missing as we validate its input and that is also what we test already (testIllegalArguments test method).
I left some comments, this looks great! |
throw new ParsingException(parser.getTokenLocation(), "[match] query doesn't support multiple fields, found [" | ||
+ fieldName + "] and [" + currentFieldName + "]"); | ||
} | ||
fieldName = currentFieldName; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Match query is tricky because it can be written as:
{
"match" : {
"message1" : {
"query" : "this is a test"
}
}
}
but also:
{
"match" : {
"message1" : "this is a test"
}
}
and in case of multiple fields we just catch the first form, not the second which is very used too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
many queries actually have a short syntax like the match one. I'm afraid what you bring up is a problem that's common to them all. I will dig.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I know, sorry for that :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect that this situation is already at least partially caught by making QueryParseContext#parseInnerQueryBuilder stricter as in checking what is the current token after the query gets parsed. I have ideas on how to test it too but I'd prefer to do this in a followup PR if you don' mind. We already test these short syntaxes but we never inject bogus objects in them like we do with the "standard" json output, we can totally introduce that for more coverage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.
We already test these short syntaxes but we never inject bogus objects in them like we do with the "standard" json output, we can totally introduce that for more coverage.
Since query builders now implements ToXContent I think we could introduce bogus object using a XContentBuilder that randomly duplicates fields. Just an idea.
It's a nice change, thanks for doing that. I left comments concerning the queries that can be written differently like |
Range Query, like many other queries, used to parse when the query refers to multiple fields and the last one would win. We rather throw an exception now instead. Closes elastic#19547
Prefix Query, like many other queries, used to parse when the query refers to multiple fields and the last one would win. We rather throw an exception now instead. Also added tests for short prefix quer variant.
Regexp Query, like many other queries, used to parse even when the query referred to multiple fields and the last one would win. We rather throw an exception now instead. Also added test for short prefix query variant.
Wildcard Query, like many other queries, used to parse even when the query referred to multiple fields and the first one would win. We rather throw an exception now instead. Also added test for short prefix query variant and modified the parsing code to consume the whole query object.
Match phrase Query, like many other queries, used to parse even when the query referred to multiple fields and the first one would win. We rather throw an exception now instead. Also added test for short prefix query variant and modified the parsing code to consume the whole query object.
Geo distance Query, like many other queries, used to parse even when the query referred to multiple fields and the last one would win. We rather throw an exception now instead.
…elds Match phrase prefix Query, like many other queries, used to parse even when the query referred to multiple fields and the first one would win. We rather throw an exception now instead. Also added test for short prefix query variant and modified the parsing code to consume the whole query object.
Match Query, like many other queries, used to parse even when the query referred to multiple fields and the first one would win. We rather throw an exception now instead. Also added test for short prefix query variant and modified the parsing code to consume the whole query object.
Common Terms Query, like many other queries, used to parse even when the query referred to multiple fields and the first one would win. We rather throw an exception now instead. Also added test for short prefix query variant and modified the parsing code to consume the whole query object.
Span term Query, like many other queries, used to parse even when the query referred to multiple fields and the first one would win. We rather throw an exception now instead. Also modified the parsing code to consume the whole query object.
Fuzzy Query, like many other queries, used to parse even when the query referred to multiple fields and the first one would win. We rather throw an exception now instead. Also added test for short prefix query variant and modified the parsing code to consume the whole query object.
…BJECT token Instead of being lenient in QueryParseContext#parseInnerQueryBuilder we check that the token where the parser stopped reading was END_OBJECT, and throw error otherwise. This is a best effort to verify that the parsers read a whole object rather than stepping out in the middle of it due to malformed queries.
a4c5df3
to
7f0bd56
Compare
Most of our queries support a single field name, the field that gets queried. That is the key of the object which contains the query options. For such queries, in case multiple fields are presented, in most of the cases the query goes through and either the last or the first field only will be read and queried.
This PR changes the behaviour of all those parsers to throw exception in case multiple field names are provided.
Closes #19547