ESQL: add =~ operator (case insensitive equality) #103656

Merged
merged 28 commits into from
Jan 26, 2024
Changes from 3 commits
Commits
87f2ec8
Introduce =~ (case insensitive equals) operator
luigidellaquila Dec 21, 2023
4c1e89c
Lucene pushdown
luigidellaquila Dec 21, 2023
67ac517
Fix pushdown for non-strings
luigidellaquila Dec 21, 2023
b268716
Implement constant evaluator using Automaton
luigidellaquila Dec 28, 2023
c54f444
Implement review suggestions
luigidellaquila Dec 28, 2023
e002329
Fix folding and verification
luigidellaquila Jan 5, 2024
1f16348
Merge branch 'main' into esql/equals_tilde
luigidellaquila Jan 5, 2024
952f4c3
Merge branch 'feature/esql_case_insensitive' into esql/equals_tilde
elasticmachine Jan 8, 2024
53c0eec
Limit =~ to string fields
luigidellaquila Jan 9, 2024
c5be3ee
Merge remote-tracking branch 'luigidellaquila/esql/equals_tilde' into…
luigidellaquila Jan 9, 2024
5417b2f
Add support for wildcards
luigidellaquila Jan 10, 2024
d53727b
Update docs/changelog/103656.yaml
luigidellaquila Jan 11, 2024
f1b423d
Merge branch 'main' into esql/equals_tilde
luigidellaquila Jan 11, 2024
a79fd5d
Add tests and code cleanup
luigidellaquila Jan 11, 2024
c50ba9b
Remove dead code
luigidellaquila Jan 11, 2024
2dc49b0
Merge branch 'main' into esql/equals_tilde
luigidellaquila Jan 12, 2024
9ec7fbf
Remove ZoneID
luigidellaquila Jan 12, 2024
52718cc
Optimize using term queries when no wildcards in the pattern
luigidellaquila Jan 15, 2024
929b34f
Merge branch 'main' into esql/equals_tilde
luigidellaquila Jan 16, 2024
5bca81c
Simplify validation
luigidellaquila Jan 16, 2024
1999e3e
Reduce the scope to exact match (no wildcards)
luigidellaquila Jan 17, 2024
aac2340
More tests
luigidellaquila Jan 17, 2024
65941c9
Merge branch 'main' into esql/equals_tilde
elasticmachine Jan 17, 2024
4051636
Merge branch 'main' into esql/equals_tilde
luigidellaquila Jan 18, 2024
84e58d7
Merge remote-tracking branch 'luigidellaquila/esql/equals_tilde' into…
luigidellaquila Jan 18, 2024
c13ab85
More tests
luigidellaquila Jan 22, 2024
e8023eb
Implement review suggestions
luigidellaquila Jan 25, 2024
453a77f
Merge branch 'main' into esql/equals_tilde
elasticmachine Jan 25, 2024
Member

It's not clear from the tests whether the right-hand side allows only literals (the original requirement), folded expressions, or generic expressions (which you added).
I think it's the first variant, but I don't see any tests validating this.
So please add more tests to validate either that literals/folded expressions are required (and fields are not allowed) or vice versa: queries that have fields on both sides and, moreover, expressions:
where concat(field, "constant") =~ concat(field, concat("con", "stant")) etc...

Contributor Author

Thanks @costin
Yes, the implementation supports any kind of expression, both on the left and on the right.
I added a few more tests for this.
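
As a rough illustration of what "any kind of expression on both sides" means, here is a minimal Java sketch. Everything in it (`BothSidesExpressionSketch`, `Expr`, `field`, `literal`, `concat`) is hypothetical and not taken from the PR; it only assumes that each side is evaluated to a string per row and that the two results are then compared ignoring case.

```java
import java.util.Map;
import java.util.function.Function;

public class BothSidesExpressionSketch {
    // Hypothetical stand-in for an ESQL expression: computes a string from a row.
    interface Expr extends Function<Map<String, String>, String> {}

    static Expr field(String name) { return row -> row.get(name); }
    static Expr literal(String value) { return row -> value; }
    static Expr concat(Expr a, Expr b) { return row -> a.apply(row) + b.apply(row); }

    // Case-insensitive equality over two arbitrary expressions, evaluated per row.
    // Assumption: both sides produce non-null strings; null handling is left out.
    static boolean equalsIgnoreCase(Expr lhs, Expr rhs, Map<String, String> row) {
        return lhs.apply(row).equalsIgnoreCase(rhs.apply(row));
    }

    public static void main(String[] args) {
        Map<String, String> row = Map.of("first_name", "Mary");
        // Mirrors the shape of: concat(field, "constant") =~ concat(field, concat("CON", "stant"))
        Expr lhs = concat(field("first_name"), literal("constant"));
        Expr rhs = concat(field("first_name"), concat(literal("CON"), literal("stant")));
        System.out.println(equalsIgnoreCase(lhs, rhs, row)); // prints "true"
    }
}
```

Neither side is treated specially here, which is the property the review comment asks the tests to pin down.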

Original file line number Diff line number Diff line change
@@ -0,0 +1,55 @@

simpleFilter#[skip:-8.12.99, reason:case insensitive operators implemented in v 8.13]
Member

I wonder if in a follow up we can remove the skip from the name of the test that we print. And maybe put it on the next line. It's kind of a lot.

Contributor Author

It hurts me as well, especially when I see it in the logs. But does it work if I move it to the next line? I'll check it separately

Member

Yeah. Separately.

from employees | where first_name =~ "mary" | keep emp_no, first_name, last_name;

emp_no:integer | first_name:keyword | last_name:keyword
10011 | Mary | Sluis
;


simpleFilterUpper#[skip:-8.12.99, reason:case insensitive operators implemented in v 8.13]
from employees | where first_name =~ "MARY" | keep emp_no, first_name, last_name;

emp_no:integer | first_name:keyword | last_name:keyword
10011 | Mary | Sluis
;

simpleFilterPartial#[skip:-8.12.99, reason:case insensitive operators implemented in v 8.13]
from employees | where first_name =~ "mar" | keep emp_no, first_name, last_name;

emp_no:integer | first_name:keyword | last_name:keyword
;

mixedConditionsAnd#[skip:-8.12.99, reason:case insensitive operators implemented in v 8.13]
from employees | where first_name =~ "mary" AND emp_no == 10011 | keep emp_no, first_name, last_name;

emp_no:integer | first_name:keyword | last_name:keyword
10011 | Mary | Sluis
;


mixedConditionsOr#[skip:-8.12.99, reason:case insensitive operators implemented in v 8.13]
from employees | where first_name =~ "mary" OR emp_no == 10001 | keep emp_no, first_name, last_name |sort emp_no;

emp_no:integer | first_name:keyword | last_name:keyword
10001 | Georgi | Facello
10011 | Mary | Sluis
;


evalEquals#[skip:-8.12.99, reason:case insensitive operators implemented in v 8.13]
from employees | where emp_no == 10001
| eval a = first_name =~ "georgi", b = first_name == "georgi", c = first_name =~ "GEORGI", d = first_name =~ "Geor"
Member

Try "GeoRgI" or something similar, just in case.

| keep emp_no, first_name, a, b, c, d;

emp_no:integer | first_name:keyword | a:boolean | b:boolean | c:boolean | d:boolean
10001 | Georgi | true | false | true | false
;


filterNumeric#[skip:-8.12.99, reason:case insensitive operators implemented in v 8.13]
from employees | where emp_no =~ 10001 | keep emp_no, first_name, last_name;

emp_no:integer | first_name:keyword | last_name:keyword
10001 | Georgi | Facello
;
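
The string expectations in the spec above (a full-value match regardless of case, and no match on a prefix such as "mar" or "Geor") can be sketched in plain Java. This is only an illustration: `CaseInsensitiveEqualsSketch` and its helper are hypothetical names, and the sketch uses `String.equalsIgnoreCase` rather than whatever the PR's evaluator actually does. Returning null for a null input is an assumption modeled on the `appendNull()` calls in the generated evaluators.

```java
public class CaseInsensitiveEqualsSketch {
    // Hypothetical helper, not part of the PR: null-propagating,
    // case-insensitive equality on whole strings (no partial matching).
    static Boolean equalsIgnoreCase(String lhs, String rhs) {
        if (lhs == null || rhs == null) {
            return null; // assumption: a null operand yields a null result
        }
        return lhs.equalsIgnoreCase(rhs);
    }

    public static void main(String[] args) {
        // Same inputs as the evalEquals test: columns a, b, c, d.
        System.out.println(equalsIgnoreCase("Georgi", "georgi")); // a: true
        System.out.println("Georgi".equals("georgi"));            // b: false (plain ==)
        System.out.println(equalsIgnoreCase("Georgi", "GEORGI")); // c: true
        System.out.println(equalsIgnoreCase("Georgi", "Geor"));   // d: false
    }
}
```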
1 change: 1 addition & 0 deletions x-pack/plugin/esql/src/main/antlr/EsqlBaseLexer.g4
@@ -130,6 +130,7 @@ RP : ')';
TRUE : 'true';

EQ : '==';
EQ_IGNORE_CASE : '=~';
Member

I propose we use SEQ for "string equality" (just like in EQL) or IEQ for "Insensitive Equality".
Same for the backing class.

NEQ : '!=';
LT : '<';
LTE : '<=';
2 changes: 1 addition & 1 deletion x-pack/plugin/esql/src/main/antlr/EsqlBaseParser.g4
@@ -229,7 +229,7 @@ string
;

comparisonOperator
-    : EQ | NEQ | LT | LTE | GT | GTE
+    : EQ | EQ_IGNORE_CASE | NEQ | LT | LTE | GT | GTE
;

explainCommand
@@ -0,0 +1,132 @@
// Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
// or more contributor license agreements. Licensed under the Elastic License
// 2.0; you may not use this file except in compliance with the Elastic License
// 2.0.
package org.elasticsearch.xpack.esql.evaluator.predicate.operator.comparison;

import java.lang.IllegalArgumentException;
import java.lang.Override;
import java.lang.String;
import org.elasticsearch.compute.data.Block;
import org.elasticsearch.compute.data.BooleanBlock;
import org.elasticsearch.compute.data.BooleanVector;
import org.elasticsearch.compute.data.Page;
import org.elasticsearch.compute.operator.DriverContext;
import org.elasticsearch.compute.operator.EvalOperator;
import org.elasticsearch.core.Releasables;
import org.elasticsearch.xpack.esql.expression.function.Warnings;
import org.elasticsearch.xpack.ql.tree.Source;

/**
 * {@link EvalOperator.ExpressionEvaluator} implementation for {@link EqualsIgnoreCase}.
 * This class is generated. Do not edit it.
 */
public final class EqualsIgnoreCaseBoolsEvaluator implements EvalOperator.ExpressionEvaluator {
  private final Warnings warnings;

  private final EvalOperator.ExpressionEvaluator lhs;

  private final EvalOperator.ExpressionEvaluator rhs;

  private final DriverContext driverContext;

  public EqualsIgnoreCaseBoolsEvaluator(Source source, EvalOperator.ExpressionEvaluator lhs,
      EvalOperator.ExpressionEvaluator rhs, DriverContext driverContext) {
    this.warnings = new Warnings(source);
    this.lhs = lhs;
    this.rhs = rhs;
    this.driverContext = driverContext;
  }

  @Override
  public Block eval(Page page) {
    try (BooleanBlock lhsBlock = (BooleanBlock) lhs.eval(page)) {
      try (BooleanBlock rhsBlock = (BooleanBlock) rhs.eval(page)) {
        BooleanVector lhsVector = lhsBlock.asVector();
        if (lhsVector == null) {
          return eval(page.getPositionCount(), lhsBlock, rhsBlock);
        }
        BooleanVector rhsVector = rhsBlock.asVector();
        if (rhsVector == null) {
          return eval(page.getPositionCount(), lhsBlock, rhsBlock);
        }
        return eval(page.getPositionCount(), lhsVector, rhsVector).asBlock();
      }
    }
  }

  public BooleanBlock eval(int positionCount, BooleanBlock lhsBlock, BooleanBlock rhsBlock) {
    try(BooleanBlock.Builder result = driverContext.blockFactory().newBooleanBlockBuilder(positionCount)) {
      position: for (int p = 0; p < positionCount; p++) {
        if (lhsBlock.isNull(p)) {
          result.appendNull();
          continue position;
        }
        if (lhsBlock.getValueCount(p) != 1) {
          if (lhsBlock.getValueCount(p) > 1) {
            warnings.registerException(new IllegalArgumentException("single-value function encountered multi-value"));
          }
          result.appendNull();
          continue position;
        }
        if (rhsBlock.isNull(p)) {
          result.appendNull();
          continue position;
        }
        if (rhsBlock.getValueCount(p) != 1) {
          if (rhsBlock.getValueCount(p) > 1) {
            warnings.registerException(new IllegalArgumentException("single-value function encountered multi-value"));
          }
          result.appendNull();
          continue position;
        }
        result.appendBoolean(EqualsIgnoreCase.processBools(lhsBlock.getBoolean(lhsBlock.getFirstValueIndex(p)), rhsBlock.getBoolean(rhsBlock.getFirstValueIndex(p))));
      }
      return result.build();
    }
  }

  public BooleanVector eval(int positionCount, BooleanVector lhsVector, BooleanVector rhsVector) {
    try(BooleanVector.Builder result = driverContext.blockFactory().newBooleanVectorBuilder(positionCount)) {
      position: for (int p = 0; p < positionCount; p++) {
        result.appendBoolean(EqualsIgnoreCase.processBools(lhsVector.getBoolean(p), rhsVector.getBoolean(p)));
      }
      return result.build();
    }
  }

  @Override
  public String toString() {
    return "EqualsIgnoreCaseBoolsEvaluator[" + "lhs=" + lhs + ", rhs=" + rhs + "]";
  }

  @Override
  public void close() {
    Releasables.closeExpectNoException(lhs, rhs);
  }

  static class Factory implements EvalOperator.ExpressionEvaluator.Factory {
    private final Source source;

    private final EvalOperator.ExpressionEvaluator.Factory lhs;

    private final EvalOperator.ExpressionEvaluator.Factory rhs;

    public Factory(Source source, EvalOperator.ExpressionEvaluator.Factory lhs,
        EvalOperator.ExpressionEvaluator.Factory rhs) {
      this.source = source;
      this.lhs = lhs;
      this.rhs = rhs;
    }

    @Override
    public EqualsIgnoreCaseBoolsEvaluator get(DriverContext context) {
      return new EqualsIgnoreCaseBoolsEvaluator(source, lhs.get(context), rhs.get(context), context);
    }

    @Override
    public String toString() {
      return "EqualsIgnoreCaseBoolsEvaluator[" + "lhs=" + lhs + ", rhs=" + rhs + "]";
    }
  }
}
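
The per-position handling in the block-based `eval` above (null in, null out; a multi-valued position registers a warning and yields null; only single values are compared) can be modeled without the compute engine. The following is a simplified sketch, not the generated code: `SingleValueEvalSketch` is a hypothetical class, positions are modeled as plain lists, warnings are collected as strings instead of going through `Warnings`, and the comparison is assumed to be plain boolean equality since `processBools` is not shown in this diff.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SingleValueEvalSketch {
    // Stand-in for the Warnings helper: collects messages instead of response headers.
    final List<String> warnings = new ArrayList<>();

    // Returns the single value at a position, or null if the position is
    // null/empty or multi-valued (the latter also records a warning).
    private Boolean singleValue(List<Boolean> values) {
        if (values == null || values.isEmpty()) {
            return null;
        }
        if (values.size() > 1) {
            warnings.add("single-value function encountered multi-value");
            return null;
        }
        return values.get(0);
    }

    // Each position is either null or a list of values, loosely mimicking a Block.
    List<Boolean> eval(List<List<Boolean>> lhs, List<List<Boolean>> rhs) {
        List<Boolean> result = new ArrayList<>();
        for (int p = 0; p < lhs.size(); p++) {
            Boolean l = singleValue(lhs.get(p));
            if (l == null) {
                result.add(null); // rhs is not inspected, as in the generated loop
                continue;
            }
            Boolean r = singleValue(rhs.get(p));
            if (r == null) {
                result.add(null);
                continue;
            }
            // Assumption: the comparison is plain equality of the two booleans.
            result.add(l.booleanValue() == r.booleanValue());
        }
        return result;
    }

    public static void main(String[] args) {
        SingleValueEvalSketch sketch = new SingleValueEvalSketch();
        List<List<Boolean>> lhs = Arrays.asList(List.of(true), null, List.of(true, false), List.of(false));
        List<List<Boolean>> rhs = Arrays.asList(List.of(true), List.of(true), List.of(true), List.of(true));
        System.out.println(sketch.eval(lhs, rhs)); // prints "[true, null, null, false]"
        System.out.println(sketch.warnings.size()); // prints "1"
    }
}
```

The vector overload in the generated class skips all of these checks because a vector is dense: every position holds exactly one non-null value.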
@@ -0,0 +1,134 @@
// Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
// or more contributor license agreements. Licensed under the Elastic License
// 2.0; you may not use this file except in compliance with the Elastic License
// 2.0.
package org.elasticsearch.xpack.esql.evaluator.predicate.operator.comparison;

import java.lang.IllegalArgumentException;
import java.lang.Override;
import java.lang.String;
import org.elasticsearch.compute.data.Block;
import org.elasticsearch.compute.data.BooleanBlock;
import org.elasticsearch.compute.data.BooleanVector;
import org.elasticsearch.compute.data.DoubleBlock;
import org.elasticsearch.compute.data.DoubleVector;
import org.elasticsearch.compute.data.Page;
import org.elasticsearch.compute.operator.DriverContext;
import org.elasticsearch.compute.operator.EvalOperator;
import org.elasticsearch.core.Releasables;
import org.elasticsearch.xpack.esql.expression.function.Warnings;
import org.elasticsearch.xpack.ql.tree.Source;

/**
 * {@link EvalOperator.ExpressionEvaluator} implementation for {@link EqualsIgnoreCase}.
 * This class is generated. Do not edit it.
 */
public final class EqualsIgnoreCaseDoublesEvaluator implements EvalOperator.ExpressionEvaluator {
  private final Warnings warnings;

  private final EvalOperator.ExpressionEvaluator lhs;

  private final EvalOperator.ExpressionEvaluator rhs;

  private final DriverContext driverContext;

  public EqualsIgnoreCaseDoublesEvaluator(Source source, EvalOperator.ExpressionEvaluator lhs,
      EvalOperator.ExpressionEvaluator rhs, DriverContext driverContext) {
    this.warnings = new Warnings(source);
    this.lhs = lhs;
    this.rhs = rhs;
    this.driverContext = driverContext;
  }

  @Override
  public Block eval(Page page) {
    try (DoubleBlock lhsBlock = (DoubleBlock) lhs.eval(page)) {
      try (DoubleBlock rhsBlock = (DoubleBlock) rhs.eval(page)) {
        DoubleVector lhsVector = lhsBlock.asVector();
        if (lhsVector == null) {
          return eval(page.getPositionCount(), lhsBlock, rhsBlock);
        }
        DoubleVector rhsVector = rhsBlock.asVector();
        if (rhsVector == null) {
          return eval(page.getPositionCount(), lhsBlock, rhsBlock);
        }
        return eval(page.getPositionCount(), lhsVector, rhsVector).asBlock();
      }
    }
  }

  public BooleanBlock eval(int positionCount, DoubleBlock lhsBlock, DoubleBlock rhsBlock) {
    try(BooleanBlock.Builder result = driverContext.blockFactory().newBooleanBlockBuilder(positionCount)) {
      position: for (int p = 0; p < positionCount; p++) {
        if (lhsBlock.isNull(p)) {
          result.appendNull();
          continue position;
        }
        if (lhsBlock.getValueCount(p) != 1) {
          if (lhsBlock.getValueCount(p) > 1) {
            warnings.registerException(new IllegalArgumentException("single-value function encountered multi-value"));
          }
          result.appendNull();
          continue position;
        }
        if (rhsBlock.isNull(p)) {
          result.appendNull();
          continue position;
        }
        if (rhsBlock.getValueCount(p) != 1) {
          if (rhsBlock.getValueCount(p) > 1) {
            warnings.registerException(new IllegalArgumentException("single-value function encountered multi-value"));
          }
          result.appendNull();
          continue position;
        }
        result.appendBoolean(EqualsIgnoreCase.processDoubles(lhsBlock.getDouble(lhsBlock.getFirstValueIndex(p)), rhsBlock.getDouble(rhsBlock.getFirstValueIndex(p))));
      }
      return result.build();
    }
  }

  public BooleanVector eval(int positionCount, DoubleVector lhsVector, DoubleVector rhsVector) {
    try(BooleanVector.Builder result = driverContext.blockFactory().newBooleanVectorBuilder(positionCount)) {
      position: for (int p = 0; p < positionCount; p++) {
        result.appendBoolean(EqualsIgnoreCase.processDoubles(lhsVector.getDouble(p), rhsVector.getDouble(p)));
      }
      return result.build();
    }
  }

  @Override
  public String toString() {
    return "EqualsIgnoreCaseDoublesEvaluator[" + "lhs=" + lhs + ", rhs=" + rhs + "]";
  }

  @Override
  public void close() {
    Releasables.closeExpectNoException(lhs, rhs);
  }

  static class Factory implements EvalOperator.ExpressionEvaluator.Factory {
    private final Source source;

    private final EvalOperator.ExpressionEvaluator.Factory lhs;

    private final EvalOperator.ExpressionEvaluator.Factory rhs;

    public Factory(Source source, EvalOperator.ExpressionEvaluator.Factory lhs,
        EvalOperator.ExpressionEvaluator.Factory rhs) {
      this.source = source;
      this.lhs = lhs;
      this.rhs = rhs;
    }

    @Override
    public EqualsIgnoreCaseDoublesEvaluator get(DriverContext context) {
      return new EqualsIgnoreCaseDoublesEvaluator(source, lhs.get(context), rhs.get(context), context);
    }

    @Override
    public String toString() {
      return "EqualsIgnoreCaseDoublesEvaluator[" + "lhs=" + lhs + ", rhs=" + rhs + "]";
    }
  }
}