Wildcard field regex query fix. #78520
Pinging @elastic/es-search (Team:Search)
Don’t revert to match_all when the query consists only of required clauses that can’t be expressed as queries on the ngram index. Closes elastic#78391
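To make the failure mode concrete, here is a minimal, hypothetical model in Java. It is not the actual Elasticsearch code: the ApproxAnd class, the Approx enum and its three values, and the and() method are invented for illustration. Each required clause of the ngram approximation is either an ngram query, a match-all that still needs regex verification, or a match-all that is known to be exact. The point of the fix is that a conjunction whose clauses all fail to become ngram queries should collapse to a match-all that keeps verification, never to an unverified match_all.

```java
import java.util.List;

final class ApproxAnd {
    private ApproxAnd() {}

    // Hypothetical approximation kinds (not Elasticsearch's real classes).
    enum Approx {
        NGRAM,            // expressible as a query on the ngram index; hits are still verified
        MATCH_ALL_VERIFY, // not expressible: visit all docs and run the regex on each
        MATCH_ALL_EXACT   // the regex provably matches everything: verification can be skipped
    }

    // Combine required (ANDed) clauses into a single approximation.
    static Approx and(List<Approx> required) {
        boolean anyNgram = false;
        boolean anyVerify = false;
        for (Approx a : required) {
            if (a == Approx.NGRAM) {
                anyNgram = true;
            } else if (a == Approx.MATCH_ALL_VERIFY) {
                anyVerify = true;
            }
            // MATCH_ALL_EXACT is neutral in a conjunction: it restricts nothing.
        }
        if (anyNgram) {
            // At least one clause narrows the candidates via the ngram index.
            return Approx.NGRAM;
        }
        if (anyVerify) {
            // Returning MATCH_ALL_EXACT here would skip verification and
            // produce false positives; keep the verification requirement.
            return Approx.MATCH_ALL_VERIFY;
        }
        // Every clause (or no clause at all) is a literal match-all.
        return Approx.MATCH_ALL_EXACT;
    }
}
```

For example, two unexpressible required clauses combine to MATCH_ALL_VERIFY rather than MATCH_ALL_EXACT, which is the behaviour the fix is after.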
I find the logic difficult to read, to be honest. Is it really worth the complexity?
I am afraid that we could shoot ourselves in the foot if we change the term queries and miss the usage of MatchAllButRequireVerificationQuery.
We already need this level of complexity to faithfully represent all of the logic in the regex when it comes to the construction of the ngram query. We cannot create an approximation query that under-matches or we get false negative bugs which often go undetected (unlike the false positive we spotted here). Rolling back this optimisation won't reduce much of the complexity - we already need to know the difference between:
If we find a 2) we must drop all other ORed 3) clauses to avoid false negatives. The revert-to-match-all-with-no-verification optimisation is a comparatively small addition to the necessarily complex task of translating the regex to an ngram query.
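The rule that a single unexpressible ORed branch forces the other ORed clauses to be dropped can be sketched with a small hypothetical model in Java. The ApproxOr class, its Kind and Branch types, and the or() method are invented for illustration and are not Elasticsearch classes. Keeping the ngram branches alongside an unexpressible branch would under-match: a document that matches only the unexpressible branch may contain none of the other branches' ngrams, so it would never reach verification.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ApproxOr {
    private ApproxOr() {}

    // Hypothetical approximation kinds (not Elasticsearch's real classes).
    enum Kind { NGRAM, MATCH_ALL_VERIFY, MATCH_ALL_EXACT }

    static final class Branch {
        final Kind kind;
        final List<String> ngrams; // only meaningful for NGRAM branches

        Branch(Kind kind, List<String> ngrams) {
            this.kind = kind;
            this.ngrams = ngrams;
        }

        static Branch ofNgrams(String... grams) {
            return new Branch(Kind.NGRAM, Arrays.asList(grams));
        }

        static Branch matchAll(boolean needsVerification) {
            Kind k = needsVerification ? Kind.MATCH_ALL_VERIFY : Kind.MATCH_ALL_EXACT;
            return new Branch(k, new ArrayList<String>());
        }
    }

    // Combine ORed branches; the result must never under-match.
    static List<Branch> or(List<Branch> branches) {
        boolean anyVerify = false;
        for (Branch b : branches) {
            if (b.kind == Kind.MATCH_ALL_EXACT) {
                // One branch matches everything, so the whole disjunction does too.
                return Arrays.asList(Branch.matchAll(false));
            }
            if (b.kind == Kind.MATCH_ALL_VERIFY) {
                anyVerify = true;
            }
        }
        if (anyVerify) {
            // Drop the ngram branches to avoid false negatives; the regex
            // verification step then filters every document.
            return Arrays.asList(Branch.matchAll(true));
        }
        // All branches are ngram queries: keep them all as alternatives.
        return branches;
    }
}
```

The design choice here is to prefer a slower but complete candidate set: an approximation that over-matches only costs verification time, while one that under-matches silently loses hits.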
I don't get that part. We can remove
The simplification here would be to add the match_all as an additional should clause and let the boolean query simplify this for us during rewrite?
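The reviewer's suggestion can be sketched as a generic, bottom-up rewrite over a toy query tree: an unexpressible branch simply becomes a match_all SHOULD clause, and a disjunction that contains a match_all collapses to match_all, so no special-case "drop the other ORed clauses" logic is needed at construction time. This is a hypothetical model (the ShouldRewrite class and its Node, Term, MatchAll and Or types are invented); it is not Lucene's BooleanQuery rewrite code, and whether Lucene performs exactly this simplification depends on the version in use. Verification is assumed to be applied separately to whatever the rewritten query returns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ShouldRewrite {
    private ShouldRewrite() {}

    // Hypothetical query tree (not Lucene's classes).
    interface Node {}

    static final class Term implements Node {
        final String ngram;
        Term(String ngram) { this.ngram = ngram; }
        @Override public String toString() { return "term(" + ngram + ")"; }
    }

    static final class MatchAll implements Node {
        @Override public String toString() { return "match_all"; }
    }

    static final class Or implements Node {
        final List<Node> should;
        Or(Node... should) { this.should = Arrays.asList(should); }
        @Override public String toString() { return "or" + should; }
    }

    // Simplify disjunctions bottom-up.
    static Node rewrite(Node node) {
        if (!(node instanceof Or)) {
            return node;
        }
        List<Node> kept = new ArrayList<>();
        for (Node child : ((Or) node).should) {
            Node r = rewrite(child);
            if (r instanceof MatchAll) {
                // A SHOULD clause matching everything makes the other SHOULD
                // clauses irrelevant (this model has no MUST clauses).
                return r;
            }
            kept.add(r);
        }
        // A single remaining clause does not need the wrapper.
        return kept.size() == 1 ? kept.get(0) : new Or(kept.toArray(new Node[0]));
    }
}
```

For instance, rewriting or[term(abc), match_all] yields match_all, while or[term(abc), term(xyz)] is left as a disjunction of the two ngram terms.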
OK, I see what you mean.
Closed in favour of new approach in #78839
Reverting to a match_all is an important optimisation, e.g. if users type a*, we don't want to decompress all docs and run a regex on the contents when we can know this is literally a match all expression. Closes #78391
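A minimal sketch of why the unverified match_all is both valuable and risky, assuming a toy index where each document is a plain string in a list and an approximation step has already produced candidate doc ids. The Verify class, the search() method and the exactMatchAll flag are hypothetical, and java.util.regex stands in for whatever regex engine the wildcard field actually uses. When the approximation knows the expression matches everything, the per-document decompression and regex check can be skipped entirely; setting that flag when it isn't true is exactly the false-positive bug this PR addresses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class Verify {
    private Verify() {}

    // Filter candidate doc ids with the regex unless the (hypothetical)
    // approximation step has proven that every value matches.
    static List<Integer> search(List<String> docs, List<Integer> candidates,
                                String regex, boolean exactMatchAll) {
        if (exactMatchAll) {
            // Fast path: no decompression, no regex evaluation.
            return candidates;
        }
        Pattern p = Pattern.compile(regex);
        List<Integer> hits = new ArrayList<>();
        for (int id : candidates) {
            // In the real field this is where the stored value would be
            // decompressed; here it is just a list lookup.
            if (p.matcher(docs.get(id)).matches()) {
                hits.add(id);
            }
        }
        return hits;
    }
}
```

With docs ["abc", "xyz", "aaa"] and all three ids as candidates, searching for a+ with verification returns only id 2, whereas wrongly passing exactMatchAll=true returns all three ids.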