Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESQL: add =~ operator (case insensitive equality) #103656

Merged
merged 28 commits into from
Jan 26, 2024

Conversation

luigidellaquila
Copy link
Contributor

@luigidellaquila luigidellaquila commented Dec 21, 2023

Add an operator to perform case insensitive equality comparison of strings.
The operator symbol is =~, eg.

row a = "FooBar" | where a =~ "foobar"

The operator acts as a normal == for all non-string types
The operator is defined only for string values

For multi-values the behavior is consistent with ==, ie. multi_value =~ "string" returns null

Related to #103599

@luigidellaquila luigidellaquila marked this pull request as ready for review December 27, 2023 11:46
@elasticsearchmachine elasticsearchmachine added the Team:QL (Deprecated) Meta label for query languages team label Dec 27, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-ql (Team:QL)

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/elasticsearch-esql (:Query Languages/ES|QL)

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to touch the QL side - see my comments.
Also the operator need to work on multi-value fields - please add unwrapping and tests.


evalEquals#[skip:-8.12.99, reason:case insensitive operators implemented in v 8.13]
from employees | where emp_no == 10001
| eval a = first_name =~ "georgi", b = first_name == "georgi", c = first_name =~ "GEORGI", d = first_name =~ "Geor"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try "GeoRgI" or something similar just in cae.

@@ -130,6 +130,7 @@ RP : ')';
TRUE : 'true';

EQ : '==';
EQ_IGNORE_CASE : '=~';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I propose we use SEQ for "string equality" (just like in EQL) or IEQ for "Insensitive Equality".
Same for the backing class.


import static org.elasticsearch.xpack.ql.expression.TypeResolutions.ParamOrdinal.DEFAULT;

public class EqualsIgnoreCase extends BinaryComparison {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StringEquals or InsensitiveEquals

@@ -25,6 +25,10 @@ public static Boolean eq(Object l, Object r) {
return i == null ? null : i.intValue() == 0;
}

public static Boolean eqIgnoreCase(Object l, Object r) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who's using this method?
If needed see StringComparisons inside EQL

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed (together with all the other changes in QL)

* 2.0; you may not use this file except in compliance with the Elastic License
* 2.0.
*/
package org.elasticsearch.xpack.ql.expression.predicate.operator.comparison;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be in the ESQL package.

@@ -336,15 +337,15 @@ static Query translate(BinaryComparison bc, TranslatorHandler handler) {
if (bc instanceof LessThanOrEqual) {
return new RangeQuery(source, name, null, false, value, true, format, zoneId);
}
if (bc instanceof Equals || bc instanceof NullEquals || bc instanceof NotEquals) {
if (bc instanceof Equals || bc instanceof NullEquals || bc instanceof NotEquals || bc instanceof EqualsIgnoreCase) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to touch QL - see InsensitiveBinaryComparisons inside EQL on how to extend the method (add a dedicated TranslatorHandler).

@luigidellaquila
Copy link
Contributor Author

luigidellaquila commented Dec 28, 2023

Thanks for the feedback @costin, I pushed some changes and moved all the logic in ESQL module (apart from one small fix in QL).

Also the operator need to work on multi-value fields

I think this aspect needs some clarifications, let's move the discussion off-line

@wchaparro wchaparro added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jan 2, 2024
@elasticsearchmachine elasticsearchmachine removed the Team:QL (Deprecated) Meta label for query languages team label Jan 2, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the expected behavior when doing something like first_name =~ last_name?
The original issue this PR covers says

Introduce =~ operator for performing case insensitive equality between a string field and a string literal
The field can be both single and multi value. The literal has to be single value.

Should I see an error, or warning or did the requirements change in the meantime and the original issue and/or the PR description needs an update?


On a similar note, if I do where last_name =~ 123 I get no results and no relevant warning. What is the expected behavior in this case?


Another similar behavior is with where last_name =~ languages where I get a CCE:

class org.elasticsearch.compute.data.IntArrayBlock cannot be cast to class org.elasticsearch.compute.data.BytesRefBlock

but if I do where last_name == languages I am getting a more meaningful error message:

"reason": "Found 1 problem\nline 1:24: first argument of [last_name == languages] is [keyword] so second argument must also be [keyword] but was [integer]",
                "stack_trace": "org.elasticsearch.xpack.esql.analysis.VerificationException: Found 1 problem\nline 1:24: first argument of [last_name == languages] is [keyword] so second argument must also be [keyword] but was [integer]

@astefan
Copy link
Contributor

astefan commented Jan 4, 2024

This test also fails row last_name = "lortz" | where "lOrTz" =~ last_name:

org.elasticsearch.xpack.ql.QlIllegalArgumentException: Should not fold expression\r\n\tat org.elasticsearch.xpack.ql.expression.Expression.fold(Expression.java:82)\r\n\tat org.elasticsearch.xpack.ql.expression.Literal.of(Literal.java:110)\r\n\tat org.elasticsearch.xpack.ql.optimizer.OptimizerRules$ConstantFolding.rule(OptimizerRules.java:101)

whereas row last_name = "lortz" | where "lortz" == last_name doesn't.

@luigidellaquila
Copy link
Contributor Author

What's the expected behavior when doing something like first_name =~ last_name?

I don't have a strong opinion on this.
I supported it in the implementation just because also == supports it, but there is an optimization in case the right-hand side is foldable.
On the other hand, LIKE and RLIKE accept only a literal, so =~ could be the same.

For sure having a string literal there will reduce validation risks, solving possible problems at parsing time.

On a similar note, if I do where last_name =~ 123 I get no results and no relevant warning

I think it's a bug (lack of validation), it should throw an exception as == does.

Another similar behavior is with where last_name =~ languages where I get a CCE:

same as above. I'll check and fix it.

This test also fails row last_name = "lortz" | where "lOrTz" =~ last_name:

this is interesting, I'll investigate

Thanks!

@astefan
Copy link
Contributor

astefan commented Jan 4, 2024

What's the expected behavior when doing something like first_name =~ last_name?

I don't have a strong opinion on this. I supported it in the implementation just because also == supports it, but there is an optimization in case the right-hand side is foldable. On the other hand, LIKE and RLIKE accept only a literal, so =~ could be the same.

If this is supported on purpose, then there should be tests covering it.
But, at the same time, the original issue mentions a literal. My questions spawned from this confusion between the issue description and the implementation itself which should be clarified.

@luigidellaquila luigidellaquila requested review from a team as code owners January 5, 2024 16:57
@luigidellaquila
Copy link
Contributor Author

Thanks for you feedback @astefan
I pushed two fixes and more tests to cover validation and field-to-field comparison.

@luigidellaquila
Copy link
Contributor Author

@elasticmachine update branch

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Luigi - this looks pretty complete.
I've added a comment regarding testing and whether field attributes are allowed or not both sides.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear from the tests if the right hand side allows only literals (the original requirement), folded expressions or generic expressions (which you added).
I think it's the first variant but I don't see any tests validating this.
So please add more test to either validate that literals/folded expressions are required (and fields are not allowed) or vice-versa - queries that have fields on both sides and more over expressions:
where concat(field, "constant") =~ concat(field, concat("con", "stant)) etc...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @costin
Yes, the implementation supports any kind of expression, both on the left and on the right.
I added a few more tests for this.

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add some tests between multi-values and single values.

And until we figure out #103599 fully when it comes to mv and wildcard behavior, please disable/prevent usage of this operator between fields in the Verifier.



expressionsLeftRight#[skip:-8.12.99, reason:case insensitive operators implemented in v 8.13]
from employees | where substring(first_name, 1, 2) =~ substring(last_name, -2) | keep emp_no, first_name, last_name | sort emp_no;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've thought about this some more and we need to conclude how to address the missing requirements (multi-value and wildcard matching) from
#103599 before enabling this functionality.
If =~ is asymmetric like EQL : than comparing two fields can't happen. in ESQL we might do thing differently but until we reach a conclusion, I prefer we don't ship these features to begin with; it's better to enable them later than have them disabled.

In other words, please keep the comparison only between a literal/folded expression and a field but not between two fields.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in ESQL we might do thing differently but until we reach a conclusion, I prefer we don't ship these features to begin with

👍 I think it's a wise decision, I'll disable field vs field comparison at validation time (I'll allow only foldable expressions on the right) until we have a final decision on all the aspects.

@luigidellaquila
Copy link
Contributor Author

@elasticmachine update branch

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing the feedback - let's get this in!

@luigidellaquila luigidellaquila merged commit 79b7dbb into elastic:main Jan 26, 2024
15 checks passed
dej611 added a commit to elastic/kibana that referenced this pull request Jan 30, 2024
## Summary

Keep in sync with new ES|QL builtin function addition:
elastic/elasticsearch#103656

Do not merge until the related ER PR is still in review.

### Checklist

- [x]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
CoenWarmer pushed a commit to CoenWarmer/kibana that referenced this pull request Feb 15, 2024
## Summary

Keep in sync with new ES|QL builtin function addition:
elastic/elasticsearch#103656

Do not merge until the related ER PR is still in review.

### Checklist

- [x]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
fkanout pushed a commit to fkanout/kibana that referenced this pull request Mar 4, 2024
## Summary

Keep in sync with new ES|QL builtin function addition:
elastic/elasticsearch#103656

Do not merge until the related ER PR is still in review.

### Checklist

- [x]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL >feature Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.13.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants