ESQL: add =~ operator (case insensitive equality) #103656

Merged
merged 28 commits into from
Jan 26, 2024
87f2ec8
Introduce =~ (case insensitive equals) operator
luigidellaquila Dec 21, 2023
4c1e89c
Lucene pushdown
luigidellaquila Dec 21, 2023
67ac517
Fix pushdown for non-strings
luigidellaquila Dec 21, 2023
b268716
Implement constant evaluator using Automaton
luigidellaquila Dec 28, 2023
c54f444
Implement review suggestions
luigidellaquila Dec 28, 2023
e002329
Fix folding and verification
luigidellaquila Jan 5, 2024
1f16348
Merge branch 'main' into esql/equals_tilde
luigidellaquila Jan 5, 2024
952f4c3
Merge branch 'feature/esql_case_insensitive' into esql/equals_tilde
elasticmachine Jan 8, 2024
53c0eec
Limit =~ to string fields
luigidellaquila Jan 9, 2024
c5be3ee
Merge remote-tracking branch 'luigidellaquila/esql/equals_tilde' into…
luigidellaquila Jan 9, 2024
5417b2f
Add support for wildcards
luigidellaquila Jan 10, 2024
d53727b
Update docs/changelog/103656.yaml
luigidellaquila Jan 11, 2024
f1b423d
Merge branch 'main' into esql/equals_tilde
luigidellaquila Jan 11, 2024
a79fd5d
Add tests and code cleanup
luigidellaquila Jan 11, 2024
c50ba9b
Remove dead code
luigidellaquila Jan 11, 2024
2dc49b0
Merge branch 'main' into esql/equals_tilde
luigidellaquila Jan 12, 2024
9ec7fbf
Remove ZoneID
luigidellaquila Jan 12, 2024
52718cc
Optimize using term queries when no wildcards in the pattern
luigidellaquila Jan 15, 2024
929b34f
Merge branch 'main' into esql/equals_tilde
luigidellaquila Jan 16, 2024
5bca81c
Simplify validation
luigidellaquila Jan 16, 2024
1999e3e
Reduce the scope to exact match (no wildcards)
luigidellaquila Jan 17, 2024
aac2340
More tests
luigidellaquila Jan 17, 2024
65941c9
Merge branch 'main' into esql/equals_tilde
elasticmachine Jan 17, 2024
4051636
Merge branch 'main' into esql/equals_tilde
luigidellaquila Jan 18, 2024
84e58d7
Merge remote-tracking branch 'luigidellaquila/esql/equals_tilde' into…
luigidellaquila Jan 18, 2024
c13ab85
More tests
luigidellaquila Jan 22, 2024
e8023eb
Implement review suggestions
luigidellaquila Jan 25, 2024
453a77f
Merge branch 'main' into esql/equals_tilde
elasticmachine Jan 25, 2024
5 changes: 5 additions & 0 deletions docs/changelog/103656.yaml
@@ -0,0 +1,5 @@
pr: 103656
summary: "ESQL: add =~ operator (case insensitive equality)"
area: ES|QL
type: feature
issues: []
Member

It's not clear from the tests whether the right-hand side allows only literals (the original requirement), folded expressions, or generic expressions (which you added).
I think it's the first variant, but I don't see any tests validating this.
So please add more tests, either validating that literals/folded expressions are required (and fields are not allowed) or, vice versa, queries that have fields on both sides and, moreover, expressions:
where concat(field, "constant") =~ concat(field, concat("con", "stant")) etc.

Contributor Author

Thanks @costin
Yes, the implementation supports any kind of expression, both on the left and on the right.
I added a few more tests for this.

@@ -0,0 +1,111 @@

simpleFilter#[skip:-8.12.99, reason:case insensitive operators implemented in v 8.13]
Member

I wonder if, in a follow-up, we could remove the skip from the test name that we print, and maybe put it on the next line. It's kind of a lot.

Contributor Author

It hurts me as well, especially when I see it in the logs. But will it still work if I move it to the next line? I'll check separately.

Member

Yeah. Separately.

from employees | where first_name =~ "mary" | keep emp_no, first_name, last_name;

emp_no:integer | first_name:keyword | last_name:keyword
10011 | Mary | Sluis
;
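The commit history mentions a Lucene pushdown for `=~` and, later, "Optimize using term queries when no wildcards in the pattern". The translation code itself is not part of the diff shown here, so the following is only a sketch under that assumption: it builds an Elasticsearch `term` query using that query's `case_insensitive` option for a filter such as `first_name =~ "mary"`. The helper name `ci_term_query` is hypothetical.

```python
import json


def ci_term_query(field: str, value: str) -> dict:
    """Hypothetical pushdown of `field =~ value` for a keyword field.

    Assumes the literal is matched exactly (no wildcard expansion) and
    maps to a `term` query with `case_insensitive` enabled.
    """
    return {"term": {field: {"value": value, "case_insensitive": True}}}


query = {"query": ci_term_query("first_name", "mary")}
print(json.dumps(query))
```

A term query suits this case because `=~` compares the whole value, so no analysis or pattern expansion is needed on the Lucene side.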


simpleFilterUpper#[skip:-8.12.99, reason:case insensitive operators implemented in v 8.13]
from employees | where first_name =~ "MARY" | keep emp_no, first_name, last_name;

emp_no:integer | first_name:keyword | last_name:keyword
10011 | Mary | Sluis
;


simpleFilterPartial#[skip:-8.12.99, reason:case insensitive operators implemented in v 8.13]
from employees | where first_name =~ "mar" | keep emp_no, first_name, last_name;

emp_no:integer | first_name:keyword | last_name:keyword
;


mixedConditionsAnd#[skip:-8.12.99, reason:case insensitive operators implemented in v 8.13]
from employees | where first_name =~ "mary" AND emp_no == 10011 | keep emp_no, first_name, last_name;

emp_no:integer | first_name:keyword | last_name:keyword
10011 | Mary | Sluis
;


mixedConditionsOr#[skip:-8.12.99, reason:case insensitive operators implemented in v 8.13]
from employees | where first_name =~ "mary" OR emp_no == 10001 | keep emp_no, first_name, last_name | sort emp_no;

emp_no:integer | first_name:keyword | last_name:keyword
10001 | Georgi | Facello
10011 | Mary | Sluis
;


evalEquals#[skip:-8.12.99, reason:case insensitive operators implemented in v 8.13]
from employees | where emp_no == 10001
| eval a = first_name =~ "georgi", b = first_name == "georgi", c = first_name =~ "GEORGI", d = first_name =~ "Geor", e = first_name =~ "GeoRgI"
| keep emp_no, first_name, a, b, c, d, e;

emp_no:integer | first_name:keyword | a:boolean | b:boolean | c:boolean | d:boolean | e:boolean
10001 | Georgi | true | false | true | false | true
;
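The `evalEquals` row above pins down the intended semantics: `=~` is whole-value equality that ignores case, `==` stays case-sensitive, and a prefix such as `"Geor"` does not match. A minimal Python model of those expectations follows. `ci_equals` is a hypothetical name, `str.casefold()` is an assumed stand-in for whatever case folding the real implementation applies (the two may disagree on some non-ASCII text), and returning `None` for null operands is an assumption about null propagation.

```python
from typing import Optional


def ci_equals(left: Optional[str], right: Optional[str]) -> Optional[bool]:
    """Hypothetical model of `left =~ right`: whole-value, case-insensitive equality."""
    if left is None or right is None:
        return None  # assumed null propagation, as for other comparisons
    # casefold() is Python's caseless-matching normalization; the real
    # implementation's folding rules may differ for some non-ASCII text.
    return left.casefold() == right.casefold()


# Mirrors the evalEquals test row for emp_no 10001.
first_name = "Georgi"
row = {
    "a": ci_equals(first_name, "georgi"),
    "b": first_name == "georgi",          # plain == stays case-sensitive
    "c": ci_equals(first_name, "GEORGI"),
    "d": ci_equals(first_name, "Geor"),   # a prefix is not a match
    "e": ci_equals(first_name, "GeoRgI"),
}
print(row)
```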


constantsAndFolding#[skip:-8.12.99, reason:case insensitive operators implemented in v 8.13]
row name = "foobar" | where "FoObAr" =~ name;

name:keyword
foobar
;


noWildcardSimple#[skip:-8.12.99, reason:case insensitive operators implemented in v 8.13]
row name = "foobar" | where name =~ "FoOb*";

name:keyword
;


noWildcard#[skip:-8.12.99, reason:case insensitive operators implemented in v 8.13]
from employees | where first_name =~ "Georg*" | sort emp_no | keep emp_no, first_name;

emp_no:integer | first_name:keyword
;


noWildcardSingle#[skip:-8.12.99, reason:case insensitive operators implemented in v 8.13]
from employees | where first_name =~ "Georg?" | sort emp_no | keep emp_no, first_name;

emp_no:integer | first_name:keyword
;
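The `noWildcard*` cases show that, after the commit "Reduce the scope to exact match (no wildcards)", `*` and `?` on the right-hand side are ordinary characters. One commit builds the constant evaluator on a Lucene `Automaton`; the sketch below swaps in Python's `re` module instead, purely to illustrate the same literal, whole-value matching: the pattern is escaped with `re.escape` and a full match is required. `compile_ci_literal` and `matches` are hypothetical helpers, and case handling here is Python's `re.IGNORECASE`, not Lucene's.

```python
import re


def compile_ci_literal(pattern: str) -> "re.Pattern[str]":
    """Compile the right-hand side of `=~` as a case-insensitive literal.

    re.escape() neutralizes `*`, `?` and every other regex metacharacter,
    so they can only match themselves.
    """
    return re.compile(re.escape(pattern), re.IGNORECASE)


def matches(compiled: "re.Pattern[str]", value: str) -> bool:
    # fullmatch(): the whole value must match, not just a prefix.
    return compiled.fullmatch(value) is not None


georg_star = compile_ci_literal("Georg*")
print(matches(georg_star, "Georgi"))                    # False: '*' is not a wildcard
print(matches(georg_star, "GEORG*"))                    # True: literal '*', case ignored
print(matches(compile_ci_literal("Georg?"), "Georgy"))  # False: '?' is not a wildcard
```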


fieldRight#[skip:-8.12.99, reason:case insensitive operators implemented in v 8.13]
from employees | where "Guoxiang" =~ first_name | keep emp_no, first_name;

emp_no:integer | first_name:keyword
10015 | Guoxiang
;


expressionsRight#[skip:-8.12.99, reason:case insensitive operators implemented in v 8.13]
from employees | where first_name =~ concat("Tzv","ETAN") | keep emp_no, first_name;

emp_no:integer | first_name:keyword
10007 | Tzvetan
;


expressionsLeft#[skip:-8.12.99, reason:case insensitive operators implemented in v 8.13]
from employees | where concat(first_name, "_foo") =~ "TzvETAN_fOo" | keep emp_no, first_name;

emp_no:integer | first_name:keyword
10007 | Tzvetan
;


expressionsLeftRight#[skip:-8.12.99, reason:case insensitive operators implemented in v 8.13]
from employees | where substring(first_name, 1, 2) =~ substring(last_name, -2) | keep emp_no, first_name, last_name | sort emp_no;
Member

I've thought about this some more: we need to decide how to address the missing requirements (multi-value and wildcard matching) from #103599 before enabling this functionality.
If =~ is asymmetric, as in EQL, then comparing two fields can't happen. In ESQL we might do things differently, but until we reach a conclusion I prefer we don't ship these features to begin with; it's better to enable them later than to have to disable them.

In other words, please keep the comparison only between a literal/folded expression and a field but not between two fields.

Contributor Author

in ESQL we might do thing differently but until we reach a conclusion, I prefer we don't ship these features to begin with

👍 I think it's a wise decision. I'll disable field vs. field comparison at validation time (allowing only foldable expressions on the right-hand side) until we have a final decision on all aspects.


emp_no:integer | first_name:keyword | last_name:keyword
10055 | Georgy | Dredge
10091 | Amabile | Gomatam
;
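Per the thread above, the author agreed to reject field-versus-field comparisons at validation time and accept only foldable expressions on the right-hand side, which would rule out a query like `expressionsLeftRight`. A sketch of such a check over a toy expression tree follows. The classes `Literal`, `Field` and `Call`, the function `validate_cieq`, and the error message are all hypothetical, and treating a function call as foldable exactly when all its arguments are is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Literal:
    value: str

    def foldable(self) -> bool:
        return True


@dataclass(frozen=True)
class Field:
    name: str

    def foldable(self) -> bool:
        return False  # a column's value is only known per row


@dataclass(frozen=True)
class Call:
    fn: str
    args: Tuple["Expr", ...]

    def foldable(self) -> bool:
        # Simplification: a call folds to a constant iff all arguments do.
        return all(arg.foldable() for arg in self.args)


Expr = Union[Literal, Field, Call]


def validate_cieq(left: Expr, right: Expr) -> None:
    """Reject `left =~ right` unless the right-hand side is foldable."""
    if not right.foldable():
        raise ValueError(f"right-hand side of [=~] must be a constant, found {right!r}")


# Accepted: first_name =~ concat("Tzv", "ETAN")
validate_cieq(Field("first_name"), Call("concat", (Literal("Tzv"), Literal("ETAN"))))
```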
1 change: 1 addition & 0 deletions x-pack/plugin/esql/src/main/antlr/EsqlBaseLexer.g4
@@ -130,6 +130,7 @@ RP : ')';
TRUE : 'true';

EQ : '==';
+CIEQ : '=~';
NEQ : '!=';
LT : '<';
LTE : '<=';
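Adding `CIEQ : '=~';` as its own lexer rule matters because `=` (`ASSIGN`) and `==` (`EQ`) share its first character. ANTLR lexers prefer the longest matching token, so `=~` comes out as a single `CIEQ` token. The Python sketch below imitates that with a regex tokenizer that tries longer operators first; the token subset and the `tokenize` function are illustrative assumptions, not the generated lexer.

```python
import re
from typing import List

# Illustrative subset of the operator tokens defined in EsqlBaseLexer.g4.
OPERATORS = {
    "==": "EQ",
    "=~": "CIEQ",
    "!=": "NEQ",
    "<=": "LTE",
    ">=": "GTE",
    "<": "LT",
    ">": "GT",
    "=": "ASSIGN",
}

# Python's regex alternation is first-match, so list longer operators first
# to mimic the lexer's longest-match rule.
_OPS = "|".join(re.escape(op) for op in sorted(OPERATORS, key=len, reverse=True))
_TOKEN = re.compile(
    r'\s*(?:(?P<STRING>"[^"]*")|(?P<ID>[A-Za-z_][A-Za-z0-9_]*)|(?P<OP>' + _OPS + r"))"
)


def tokenize(text: str) -> List[str]:
    """Return token type names for a tiny ES|QL-like expression."""
    tokens: List[str] = []
    pos = 0
    text = text.rstrip()
    while pos < len(text):
        m = _TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at offset {pos}: {text[pos:]!r}")
        kind = m.lastgroup
        tokens.append(OPERATORS[m.group(kind)] if kind == "OP" else kind)
        pos = m.end()
    return tokens


print(tokenize('first_name =~ "mary"'))  # ['ID', 'CIEQ', 'STRING']
print(tokenize("a = b"))                 # ['ID', 'ASSIGN', 'ID']
```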
166 changes: 166 additions & 0 deletions x-pack/plugin/esql/src/main/antlr/EsqlBaseLexer.tokens
@@ -0,0 +1,166 @@
DISSECT=1
DROP=2
ENRICH=3
EVAL=4
EXPLAIN=5
FROM=6
GROK=7
INLINESTATS=8
KEEP=9
LIMIT=10
MV_EXPAND=11
PROJECT=12
RENAME=13
ROW=14
SHOW=15
SORT=16
STATS=17
WHERE=18
UNKNOWN_CMD=19
LINE_COMMENT=20
MULTILINE_COMMENT=21
WS=22
EXPLAIN_WS=23
EXPLAIN_LINE_COMMENT=24
EXPLAIN_MULTILINE_COMMENT=25
PIPE=26
STRING=27
INTEGER_LITERAL=28
DECIMAL_LITERAL=29
BY=30
AND=31
ASC=32
ASSIGN=33
COMMA=34
DESC=35
DOT=36
FALSE=37
FIRST=38
LAST=39
LP=40
IN=41
IS=42
LIKE=43
NOT=44
NULL=45
NULLS=46
OR=47
PARAM=48
RLIKE=49
RP=50
TRUE=51
EQ=52
CIEQ=53
NEQ=54
LT=55
LTE=56
GT=57
GTE=58
PLUS=59
MINUS=60
ASTERISK=61
SLASH=62
PERCENT=63
OPENING_BRACKET=64
CLOSING_BRACKET=65
UNQUOTED_IDENTIFIER=66
QUOTED_IDENTIFIER=67
EXPR_LINE_COMMENT=68
EXPR_MULTILINE_COMMENT=69
EXPR_WS=70
METADATA=71
FROM_UNQUOTED_IDENTIFIER=72
FROM_LINE_COMMENT=73
FROM_MULTILINE_COMMENT=74
FROM_WS=75
UNQUOTED_ID_PATTERN=76
PROJECT_LINE_COMMENT=77
PROJECT_MULTILINE_COMMENT=78
PROJECT_WS=79
AS=80
RENAME_LINE_COMMENT=81
RENAME_MULTILINE_COMMENT=82
RENAME_WS=83
ON=84
WITH=85
ENRICH_POLICY_NAME=86
ENRICH_LINE_COMMENT=87
ENRICH_MULTILINE_COMMENT=88
ENRICH_WS=89
ENRICH_FIELD_LINE_COMMENT=90
ENRICH_FIELD_MULTILINE_COMMENT=91
ENRICH_FIELD_WS=92
MVEXPAND_LINE_COMMENT=93
MVEXPAND_MULTILINE_COMMENT=94
MVEXPAND_WS=95
INFO=96
FUNCTIONS=97
SHOW_LINE_COMMENT=98
SHOW_MULTILINE_COMMENT=99
SHOW_WS=100
COLON=101
SETTING=102
SETTING_LINE_COMMENT=103
SETTTING_MULTILINE_COMMENT=104
SETTING_WS=105
'dissect'=1
'drop'=2
'enrich'=3
'eval'=4
'explain'=5
'from'=6
'grok'=7
'inlinestats'=8
'keep'=9
'limit'=10
'mv_expand'=11
'project'=12
'rename'=13
'row'=14
'show'=15
'sort'=16
'stats'=17
'where'=18
'|'=26
'by'=30
'and'=31
'asc'=32
'='=33
','=34
'desc'=35
'.'=36
'false'=37
'first'=38
'last'=39
'('=40
'in'=41
'is'=42
'like'=43
'not'=44
'null'=45
'nulls'=46
'or'=47
'?'=48
'rlike'=49
')'=50
'true'=51
'=='=52
'=~'=53
'!='=54
'<'=55
'<='=56
'>'=57
'>='=58
'+'=59
'-'=60
'*'=61
'/'=62
'%'=63
']'=65
'metadata'=71
'as'=80
'on'=84
'with'=85
'info'=96
'functions'=97
':'=101
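In the generated `EsqlBaseLexer.tokens` file each line is either `NAME=type` or `'literal'=type`, so the entries `CIEQ=53` and `'=~'=53` tie the `=~` spelling to the `CIEQ` token type. The short Python sketch below reads that format (assuming literals contain no escaped quotes) and resolves a literal to its symbolic name; `parse_tokens` is a hypothetical helper.

```python
from typing import Dict, Tuple


def parse_tokens(text: str) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Split ANTLR .tokens content into symbolic names and literal spellings."""
    names: Dict[str, int] = {}
    literals: Dict[str, int] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # rpartition: literals such as '==' themselves contain '='.
        key, _, value = line.rpartition("=")
        if len(key) >= 2 and key.startswith("'") and key.endswith("'"):
            literals[key[1:-1]] = int(value)
        else:
            names[key] = int(value)
    return names, literals


# Excerpt copied from the file above.
SAMPLE = """
ASSIGN=33
EQ=52
CIEQ=53
NEQ=54
'='=33
'=='=52
'=~'=53
'!='=54
"""

names, literals = parse_tokens(SAMPLE)
type_to_name = {token_type: name for name, token_type in names.items()}
print(type_to_name[literals["=~"]])  # CIEQ
```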
2 changes: 1 addition & 1 deletion x-pack/plugin/esql/src/main/antlr/EsqlBaseParser.g4
@@ -225,7 +225,7 @@ string
;

comparisonOperator
- : EQ | NEQ | LT | LTE | GT | GTE
+ : EQ | CIEQ | NEQ | LT | LTE | GT | GTE
;

explainCommand
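Because `CIEQ` is added as one more alternative of `comparisonOperator`, `=~` is accepted wherever `==` is and groups the same way, for example inside `first_name =~ "mary" AND emp_no == 10011` from the `mixedConditionsAnd` test. The hypothetical Python sketch below shows this on pre-tokenized input with a deliberately tiny grammar (single-token operands, `AND` only); it is not the generated ANTLR parser.

```python
from typing import List, Tuple, Union

# Operators accepted by the comparisonOperator rule after this change.
COMPARISON_OPERATORS = {"==", "=~", "!=", "<", "<=", ">", ">="}

Node = Union[str, Tuple]


def parse_conjunction(tokens: List[str]) -> Node:
    """Parse `operand op operand (AND operand op operand)*` into a tree.

    Hypothetical and heavily simplified: operands are single tokens and only
    AND is supported, which is enough to show that `=~` groups like `==`.
    """

    def comparison(i: int) -> Tuple[Node, int]:
        if i + 3 > len(tokens):
            raise SyntaxError("incomplete comparison")
        left, op, right = tokens[i:i + 3]
        if op not in COMPARISON_OPERATORS:
            raise SyntaxError(f"expected a comparison operator, got {op!r}")
        return (op, left, right), i + 3

    tree, i = comparison(0)
    while i < len(tokens):
        if tokens[i].upper() != "AND":
            raise SyntaxError(f"unexpected token {tokens[i]!r}")
        right, i = comparison(i + 1)
        tree = ("AND", tree, right)
    return tree


tokens = ["first_name", "=~", '"mary"', "AND", "emp_no", "==", "10011"]
print(parse_conjunction(tokens))
```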