ESQL: add =~ operator (case insensitive equality) #103656

Merged
merged 28 commits into from
Jan 26, 2024
Changes from 8 commits
87f2ec8
Introduce =~ (case insensitive equals) operator
luigidellaquila Dec 21, 2023
4c1e89c
Lucene pushdown
luigidellaquila Dec 21, 2023
67ac517
Fix pushdown for non-strings
luigidellaquila Dec 21, 2023
b268716
Implement constant evaluator using Automaton
luigidellaquila Dec 28, 2023
c54f444
Implement review suggestions
luigidellaquila Dec 28, 2023
e002329
Fix folding and verification
luigidellaquila Jan 5, 2024
1f16348
Merge branch 'main' into esql/equals_tilde
luigidellaquila Jan 5, 2024
952f4c3
Merge branch 'feature/esql_case_insensitive' into esql/equals_tilde
elasticmachine Jan 8, 2024
53c0eec
Limit =~ to string fields
luigidellaquila Jan 9, 2024
c5be3ee
Merge remote-tracking branch 'luigidellaquila/esql/equals_tilde' into…
luigidellaquila Jan 9, 2024
5417b2f
Add support for wildcards
luigidellaquila Jan 10, 2024
d53727b
Update docs/changelog/103656.yaml
luigidellaquila Jan 11, 2024
f1b423d
Merge branch 'main' into esql/equals_tilde
luigidellaquila Jan 11, 2024
a79fd5d
Add tests and code cleanup
luigidellaquila Jan 11, 2024
c50ba9b
Remove dead code
luigidellaquila Jan 11, 2024
2dc49b0
Merge branch 'main' into esql/equals_tilde
luigidellaquila Jan 12, 2024
9ec7fbf
Remove ZoneID
luigidellaquila Jan 12, 2024
52718cc
Optimize using term queries when no wildcards in the pattern
luigidellaquila Jan 15, 2024
929b34f
Merge branch 'main' into esql/equals_tilde
luigidellaquila Jan 16, 2024
5bca81c
Simplify validation
luigidellaquila Jan 16, 2024
1999e3e
Reduce the scope to exact match (no wildcards)
luigidellaquila Jan 17, 2024
aac2340
More tests
luigidellaquila Jan 17, 2024
65941c9
Merge branch 'main' into esql/equals_tilde
elasticmachine Jan 17, 2024
4051636
Merge branch 'main' into esql/equals_tilde
luigidellaquila Jan 18, 2024
84e58d7
Merge remote-tracking branch 'luigidellaquila/esql/equals_tilde' into…
luigidellaquila Jan 18, 2024
c13ab85
More tests
luigidellaquila Jan 22, 2024
e8023eb
Implement review suggestions
luigidellaquila Jan 25, 2024
453a77f
Merge branch 'main' into esql/equals_tilde
elasticmachine Jan 25, 2024
Member

It's not clear from the tests whether the right-hand side allows only literals (the original requirement), folded expressions, or generic expressions (which you added).
I think it's the first variant, but I don't see any tests validating this.
So please add more tests to either validate that literals/folded expressions are required (and fields are not allowed) or vice versa: queries that have fields on both sides and, moreover, expressions:
where concat(field, "constant") =~ concat(field, concat("con", "stant")) etc.

Contributor Author

Thanks @costin
Yes, the implementation supports any kind of expression, both on the left and on the right.
I added a few more tests for this.
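
To make the "any expression on both sides" point concrete, here is a minimal Java sketch of the semantics being discussed. It is not the code in this PR: `Expr`, `field`, `literal`, `concat` and `insensitiveEquals` are hypothetical names invented for illustration, and `String.equalsIgnoreCase` stands in for whatever case folding the real evaluator uses. Null handling is deliberately left out.

```java
import java.util.function.Function;

// Hypothetical mini-model: an expression maps one field value to a string.
final class InsensitiveEqualsSketch {
    interface Expr extends Function<String, String> {}

    // The field itself, e.g. first_name.
    static Expr field() { return value -> value; }

    // A constant such as "constant".
    static Expr literal(String s) { return value -> s; }

    // concat(a, b), evaluated per row.
    static Expr concat(Expr a, Expr b) { return value -> a.apply(value) + b.apply(value); }

    // left =~ right: evaluate both sides for the row, then compare ignoring case.
    static boolean insensitiveEquals(Expr left, Expr right, String fieldValue) {
        return left.apply(fieldValue).equalsIgnoreCase(right.apply(fieldValue));
    }

    public static void main(String[] args) {
        // concat(field, "constant") =~ concat(field, concat("CON", "stant"))
        Expr left = concat(field(), literal("constant"));
        Expr right = concat(field(), concat(literal("CON"), literal("stant")));
        System.out.println(insensitiveEquals(left, right, "Mary")); // prints "true"
    }
}
```

Neither side is special here: both are evaluated per row before the comparison, which is the behaviour the reply above describes.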

@@ -0,0 +1,63 @@

simpleFilter#[skip:-8.12.99, reason:case insensitive operators implemented in v 8.13]
Member

I wonder if in a follow up we can remove the skip from the name of the test that we print. And maybe put it on the next line. It's kind of a lot.

Contributor Author

It hurts me as well, especially when I see it in the logs. But does it work if I move it to the next line? I'll check it separately

Member

Yeah. Separately.

from employees | where first_name =~ "mary" | keep emp_no, first_name, last_name;

emp_no:integer | first_name:keyword | last_name:keyword
10011 | Mary | Sluis
;


simpleFilterUpper#[skip:-8.12.99, reason:case insensitive operators implemented in v 8.13]
from employees | where first_name =~ "MARY" | keep emp_no, first_name, last_name;

emp_no:integer | first_name:keyword | last_name:keyword
10011 | Mary | Sluis
;

simpleFilterPartial#[skip:-8.12.99, reason:case insensitive operators implemented in v 8.13]
from employees | where first_name =~ "mar" | keep emp_no, first_name, last_name;

emp_no:integer | first_name:keyword | last_name:keyword
;

mixedConditionsAnd#[skip:-8.12.99, reason:case insensitive operators implemented in v 8.13]
from employees | where first_name =~ "mary" AND emp_no == 10011 | keep emp_no, first_name, last_name;

emp_no:integer | first_name:keyword | last_name:keyword
10011 | Mary | Sluis
;


mixedConditionsOr#[skip:-8.12.99, reason:case insensitive operators implemented in v 8.13]
from employees | where first_name =~ "mary" OR emp_no == 10001 | keep emp_no, first_name, last_name |sort emp_no;

emp_no:integer | first_name:keyword | last_name:keyword
10001 | Georgi | Facello
10011 | Mary | Sluis
;


evalEquals#[skip:-8.12.99, reason:case insensitive operators implemented in v 8.13]
from employees | where emp_no == 10001
| eval a = first_name =~ "georgi", b = first_name == "georgi", c = first_name =~ "GEORGI", d = first_name =~ "Geor", e = first_name =~ "GeoRgI"
| keep emp_no, first_name, a, b, c, d, e;

emp_no:integer | first_name:keyword | a:boolean | b:boolean | c:boolean | d:boolean | e:boolean
10001 | Georgi | true | false | true | false | true
;


filterNumeric#[skip:-8.12.99, reason:case insensitive operators implemented in v 8.13]
from employees | where emp_no =~ 10001 | keep emp_no, first_name, last_name;

emp_no:integer | first_name:keyword | last_name:keyword
10001 | Georgi | Facello
;


constantsAndFolding#[skip:-8.12.99, reason:case insensitive operators implemented in v 8.13]
row name = "foobar" | where "FoObAr" =~ name;

name:keyword
foobar
;
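
The `evalEquals` case in the spec above is essentially a small truth table, and it can be reproduced outside ESQL. The following Java sketch is an illustration only: `eq` and `insensitiveEq` are hypothetical helpers, and `String.equalsIgnoreCase` is assumed to be a close enough approximation of `=~` for plain ASCII names like these. It shows that `=~` is an exact match that ignores case, not a prefix or partial match.

```java
// Hypothetical helpers mirroring == and =~ for the evalEquals row (first_name = "Georgi").
final class EvalEqualsSketch {
    // Models first_name == "..."
    static boolean eq(String a, String b) { return a.equals(b); }

    // Models first_name =~ "..." (assumption: equalsIgnoreCase approximates the operator)
    static boolean insensitiveEq(String a, String b) { return a.equalsIgnoreCase(b); }

    public static void main(String[] args) {
        String firstName = "Georgi";
        System.out.println("a = " + insensitiveEq(firstName, "georgi")); // a = true
        System.out.println("b = " + eq(firstName, "georgi"));            // b = false
        System.out.println("c = " + insensitiveEq(firstName, "GEORGI")); // c = true
        System.out.println("d = " + insensitiveEq(firstName, "Geor"));   // d = false
        System.out.println("e = " + insensitiveEq(firstName, "GeoRgI")); // e = true
    }
}
```

The five values match the `a` to `e` columns of the expected row, and the `d` case corresponds to `simpleFilterPartial` returning no rows.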
1 change: 1 addition & 0 deletions x-pack/plugin/esql/src/main/antlr/EsqlBaseLexer.g4
@@ -130,6 +130,7 @@ RP : ')';
TRUE : 'true';

EQ : '==';
SEQ : '=~';
Member

SEQ sounds like "sequence". Should it be CIEQ or something?

Contributor Author

My first proposal was for EQ_IGNORE_CASE, but it was a bit verbose.
We went for SEQ only because it's the same we have in EQL

Member

I can see how SEQ can be confusing. CIEQ or EQS (Equals String) both work for me.
I prefer string equality since it's more generic and it might go beyond just case insensitive comparisons.

Contributor Author

Ok, I see a majority for CIEQ, I'll use it

NEQ : '!=';
LT : '<';
LTE : '<=';
159 changes: 159 additions & 0 deletions x-pack/plugin/esql/src/main/antlr/EsqlBaseLexer.tokens
@@ -0,0 +1,159 @@
DISSECT=1
DROP=2
ENRICH=3
EVAL=4
EXPLAIN=5
FROM=6
GROK=7
INLINESTATS=8
KEEP=9
LIMIT=10
MV_EXPAND=11
PROJECT=12
RENAME=13
ROW=14
SHOW=15
SORT=16
STATS=17
WHERE=18
UNKNOWN_CMD=19
LINE_COMMENT=20
MULTILINE_COMMENT=21
WS=22
EXPLAIN_WS=23
EXPLAIN_LINE_COMMENT=24
EXPLAIN_MULTILINE_COMMENT=25
PIPE=26
STRING=27
INTEGER_LITERAL=28
DECIMAL_LITERAL=29
BY=30
AND=31
ASC=32
ASSIGN=33
COMMA=34
DESC=35
DOT=36
FALSE=37
FIRST=38
LAST=39
LP=40
IN=41
IS=42
LIKE=43
NOT=44
NULL=45
NULLS=46
OR=47
PARAM=48
RLIKE=49
RP=50
TRUE=51
EQ=52
SEQ=53
NEQ=54
LT=55
LTE=56
GT=57
GTE=58
PLUS=59
MINUS=60
ASTERISK=61
SLASH=62
PERCENT=63
OPENING_BRACKET=64
CLOSING_BRACKET=65
UNQUOTED_IDENTIFIER=66
QUOTED_IDENTIFIER=67
EXPR_LINE_COMMENT=68
EXPR_MULTILINE_COMMENT=69
EXPR_WS=70
METADATA=71
FROM_UNQUOTED_IDENTIFIER=72
FROM_LINE_COMMENT=73
FROM_MULTILINE_COMMENT=74
FROM_WS=75
PROJECT_UNQUOTED_IDENTIFIER=76
PROJECT_LINE_COMMENT=77
PROJECT_MULTILINE_COMMENT=78
PROJECT_WS=79
AS=80
RENAME_LINE_COMMENT=81
RENAME_MULTILINE_COMMENT=82
RENAME_WS=83
ON=84
WITH=85
ENRICH_LINE_COMMENT=86
ENRICH_MULTILINE_COMMENT=87
ENRICH_WS=88
ENRICH_FIELD_LINE_COMMENT=89
ENRICH_FIELD_MULTILINE_COMMENT=90
ENRICH_FIELD_WS=91
MVEXPAND_LINE_COMMENT=92
MVEXPAND_MULTILINE_COMMENT=93
MVEXPAND_WS=94
INFO=95
FUNCTIONS=96
SHOW_LINE_COMMENT=97
SHOW_MULTILINE_COMMENT=98
SHOW_WS=99
'dissect'=1
'drop'=2
'enrich'=3
'eval'=4
'explain'=5
'from'=6
'grok'=7
'inlinestats'=8
'keep'=9
'limit'=10
'mv_expand'=11
'project'=12
'rename'=13
'row'=14
'show'=15
'sort'=16
'stats'=17
'where'=18
'|'=26
'by'=30
'and'=31
'asc'=32
'='=33
','=34
'desc'=35
'.'=36
'false'=37
'first'=38
'last'=39
'('=40
'in'=41
'is'=42
'like'=43
'not'=44
'null'=45
'nulls'=46
'or'=47
'?'=48
'rlike'=49
')'=50
'true'=51
'=='=52
'=~'=53
'!='=54
'<'=55
'<='=56
'>'=57
'>='=58
'+'=59
'-'=60
'*'=61
'/'=62
'%'=63
']'=65
'metadata'=71
'as'=80
'on'=84
'with'=85
'info'=95
'functions'=96
2 changes: 1 addition & 1 deletion x-pack/plugin/esql/src/main/antlr/EsqlBaseParser.g4
@@ -229,7 +229,7 @@ string
;

comparisonOperator
-    : EQ | NEQ | LT | LTE | GT | GTE
+    : EQ | SEQ | NEQ | LT | LTE | GT | GTE
;

explainCommand
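
Because the new token is added to `comparisonOperator`, `=~` parses like any other binary comparison, and the difference only shows up when the operator is evaluated. The Java sketch below illustrates that dispatch for string operands. It is an assumption-laden illustration, not the PR's planner or evaluator code: `ComparisonOperatorSketch` and `compare` are hypothetical names, only three of the operators are modelled, and `equalsIgnoreCase` stands in for the real case-insensitive comparison.

```java
// Hypothetical dispatch from an operator symbol (as defined in the lexer) to string semantics.
final class ComparisonOperatorSketch {
    static boolean compare(String op, String left, String right) {
        switch (op) {
            case "==": // EQ
                return left.equals(right);
            case "=~": // SEQ at this point in the PR; the thread above settles on CIEQ
                return left.equalsIgnoreCase(right);
            case "!=": // NEQ
                return !left.equals(right);
            default:
                throw new IllegalArgumentException("unsupported operator in this sketch: " + op);
        }
    }

    public static void main(String[] args) {
        System.out.println(compare("==", "Mary", "mary")); // prints "false"
        System.out.println(compare("=~", "Mary", "mary")); // prints "true"
        System.out.println(compare("!=", "Mary", "mary")); // prints "true"
    }
}
```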