-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ESQL: add =~ operator (case insensitive equality) #103656
ESQL: add =~ operator (case insensitive equality) #103656
Conversation
Pinging @elastic/es-ql (Team:QL) |
Pinging @elastic/elasticsearch-esql (:Query Languages/ES|QL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to touch the QL side - see my comments.
Also the operator need to work on multi-value fields - please add unwrapping and tests.
|
||
evalEquals#[skip:-8.12.99, reason:case insensitive operators implemented in v 8.13] | ||
from employees | where emp_no == 10001 | ||
| eval a = first_name =~ "georgi", b = first_name == "georgi", c = first_name =~ "GEORGI", d = first_name =~ "Geor" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
try "GeoRgI" or something similar just in cae.
@@ -130,6 +130,7 @@ RP : ')'; | |||
TRUE : 'true'; | |||
|
|||
EQ : '=='; | |||
EQ_IGNORE_CASE : '=~'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I propose we use SEQ for "string equality" (just like in EQL) or IEQ for "Insensitive Equality".
Same for the backing class.
|
||
import static org.elasticsearch.xpack.ql.expression.TypeResolutions.ParamOrdinal.DEFAULT; | ||
|
||
public class EqualsIgnoreCase extends BinaryComparison { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StringEquals or InsensitiveEquals
@@ -25,6 +25,10 @@ public static Boolean eq(Object l, Object r) { | |||
return i == null ? null : i.intValue() == 0; | |||
} | |||
|
|||
public static Boolean eqIgnoreCase(Object l, Object r) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Who's using this method?
If needed see StringComparisons inside EQL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed (together with all the other changes in QL)
* 2.0; you may not use this file except in compliance with the Elastic License | ||
* 2.0. | ||
*/ | ||
package org.elasticsearch.xpack.ql.expression.predicate.operator.comparison; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be in the ESQL package.
@@ -336,15 +337,15 @@ static Query translate(BinaryComparison bc, TranslatorHandler handler) { | |||
if (bc instanceof LessThanOrEqual) { | |||
return new RangeQuery(source, name, null, false, value, true, format, zoneId); | |||
} | |||
if (bc instanceof Equals || bc instanceof NullEquals || bc instanceof NotEquals) { | |||
if (bc instanceof Equals || bc instanceof NullEquals || bc instanceof NotEquals || bc instanceof EqualsIgnoreCase) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to touch QL - see InsensitiveBinaryComparisons inside EQL on how to extend the method (add a dedicated TranslatorHandler).
Thanks for the feedback @costin, I pushed some changes and moved all the logic in ESQL module (apart from one small fix in QL).
I think this aspect needs some clarifications, let's move the discussion off-line |
Pinging @elastic/es-analytics-geo (Team:Analytics) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the expected behavior when doing something like first_name =~ last_name
?
The original issue this PR covers says
Introduce =~ operator for performing case insensitive equality between a string field and a string literal
The field can be both single and multi value. The literal has to be single value.
Should I see an error, or warning or did the requirements change in the meantime and the original issue and/or the PR description needs an update?
On a similar note, if I do where last_name =~ 123
I get no results and no relevant warning. What is the expected behavior in this case?
Another similar behavior is with where last_name =~ languages
where I get a CCE:
class org.elasticsearch.compute.data.IntArrayBlock cannot be cast to class org.elasticsearch.compute.data.BytesRefBlock
but if I do where last_name == languages
I am getting a more meaningful error message:
"reason": "Found 1 problem\nline 1:24: first argument of [last_name == languages] is [keyword] so second argument must also be [keyword] but was [integer]",
"stack_trace": "org.elasticsearch.xpack.esql.analysis.VerificationException: Found 1 problem\nline 1:24: first argument of [last_name == languages] is [keyword] so second argument must also be [keyword] but was [integer]
This test also fails
whereas |
I don't have a strong opinion on this. For sure having a string literal there will reduce validation risks, solving possible problems at parsing time.
I think it's a bug (lack of validation), it should throw an exception as == does.
same as above. I'll check and fix it.
this is interesting, I'll investigate Thanks! |
If this is supported on purpose, then there should be tests covering it. |
Thanks for you feedback @astefan |
@elasticmachine update branch |
Pinging @elastic/es-analytical-engine (Team:Analytics) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Luigi - this looks pretty complete.
I've added a comment regarding testing and whether field attributes are allowed or not both sides.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not clear from the tests if the right hand side allows only literals (the original requirement), folded expressions or generic expressions (which you added).
I think it's the first variant but I don't see any tests validating this.
So please add more test to either validate that literals/folded expressions are required (and fields are not allowed) or vice-versa - queries that have fields on both sides and more over expressions:
where concat(field, "constant") =~ concat(field, concat("con", "stant))
etc...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @costin
Yes, the implementation supports any kind of expression, both on the left and on the right.
I added a few more tests for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add some tests between multi-values and single values.
And until we figure out #103599 fully when it comes to mv and wildcard behavior, please disable/prevent usage of this operator between fields in the Verifier.
|
||
|
||
expressionsLeftRight#[skip:-8.12.99, reason:case insensitive operators implemented in v 8.13] | ||
from employees | where substring(first_name, 1, 2) =~ substring(last_name, -2) | keep emp_no, first_name, last_name | sort emp_no; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've thought about this some more and we need to conclude how to address the missing requirements (multi-value and wildcard matching) from
#103599 before enabling this functionality.
If =~
is asymmetric like EQL :
than comparing two fields can't happen. in ESQL we might do thing differently but until we reach a conclusion, I prefer we don't ship these features to begin with; it's better to enable them later than have them disabled.
In other words, please keep the comparison only between a literal/folded expression and a field but not between two fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in ESQL we might do thing differently but until we reach a conclusion, I prefer we don't ship these features to begin with
👍 I think it's a wise decision, I'll disable field vs field comparison at validation time (I'll allow only foldable expressions on the right) until we have a final decision on all the aspects.
@elasticmachine update branch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing the feedback - let's get this in!
## Summary Keep in sync with new ES|QL builtin function addition: elastic/elasticsearch#103656 Do not merge until the related ER PR is still in review. ### Checklist - [x] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
## Summary Keep in sync with new ES|QL builtin function addition: elastic/elasticsearch#103656 Do not merge until the related ER PR is still in review. ### Checklist - [x] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
## Summary Keep in sync with new ES|QL builtin function addition: elastic/elasticsearch#103656 Do not merge until the related ER PR is still in review. ### Checklist - [x] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
Add an operator to perform case insensitive equality comparison of strings.
The operator symbol is
=~
, eg.The operator acts as a normal==
for all non-string typesThe operator is defined only for string values
For multi-values the behavior is consistent with
==
, ie.multi_value =~ "string"
returnsnull
Related to #103599