-
Notifications
You must be signed in to change notification settings - Fork 24.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Search - add case insensitive support for regex queries. #59441
Conversation
Pinging @elastic/es-search (:Search/Search) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we can't make the request case-sensitive if the analyzer/normalizer lowercased the content, I wonder if we should reject setting case_sensitive: true
. (I think this question was initially raised by @jimczi.)
server/src/main/java/org/elasticsearch/index/query/RegexpQueryBuilder.java
Outdated
Show resolved
Hide resolved
We do state in the docs that case sensitivity (or insensitivity) is in relation to the indexed content so we have a justification for the logic. I don't have a strong opinion either way on whether to error or not. The bit where I come unstuck is thinking about how to describe the default value (something we normally document). |
server/src/main/java/org/elasticsearch/index/query/RegexpQueryBuilder.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/query/RegexpQueryBuilder.java
Outdated
Show resolved
Hide resolved
e0f9244
to
98b0e04
Compare
Following discussion with @jimczi we may want to reconsider the implementation here. Rather than an explicit flag in the JSON we could consider adding passing the case insensitivity option as part of the regular expression string. There are a number of ways regex engines do this - either globally using The advantage of expressing the option as part of the regex string is that query-string and KQL based inputs could be extended to make these choices (if the users know to use this syntax). |
I made this comment for the |
Ah - I thought we'd agreed we would want KQL, query_string and RegExpQuery to all support the same string syntax if we were to expand it. If there's two ways to express the option (in-string and in JSON boolean) there's the potential for conflicting arguments. |
870371b
to
6111094
Compare
run elasticsearch-ci/docs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change looks good to me. I left some comments regarding the testing of the new options.
server/src/main/java/org/elasticsearch/index/query/RegexpQueryBuilder.java
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/index/query/RegexpQueryBuilderTests.java
Show resolved
Hide resolved
Thanks for the review, @jimczi Otherwise is this good to go? |
Can we be more lenient ? It's a breaking change imo so we could apply 0xff as a mask to clear out the extra bits ? |
Good to go @jimczi ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left one question for cleanup, LGTM otherwise
server/src/main/java/org/elasticsearch/index/query/RegexpQueryBuilder.java
Outdated
Show resolved
Hide resolved
Forks a copy of Lucene’s RegexpQuery and RegExp from Lucene master. This can be removed when 8.7 Lucene is released.
(Backport) Added case insensitive support for regex queries. Forks a copy of Lucene’s RegexpQuery and RegExp from Lucene master. This can be removed when 8.7 Lucene is released. Closes elastic#59235
Hi, not sure if caused by this PR. We use regex in query_string and want the search to be case insensitive. It has been working until recently we upgraded our AWS ES cluster from 7.7 to 7.10. The same regex query became case sensitive now (it only works with lower case as in the standard analyzer). I'm unable to set Changing regex to wildcard makes it case insensitive again. |
While Lucene has more complex bitmask flags for regex match settings like case insensitivity the JSON representation we add here is a simple
case_sensitive
boolean parameter which defaults to true. Currently this case logic only works with ASCII characters.To implement this feature I have forked a copy of Lucene’s RegexpQuery and RegExp classes from Lucene master.
These classes can be removed when 8.7 Lucene is released which is the first release with the case insensitivity option.