-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
EQL grammar updates and tests #49658
Changes from 5 commits
ccc864d
7299bd1
796033a
37fdda6
8ec422e
b2ee44a
c8fc383
988215e
c0deb01
9f5eed4
937e317
98d06ec
66aa579
e33236f
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -6,9 +6,6 @@ | |
|
||
grammar EqlBase; | ||
|
||
tokens { | ||
DELIMITER | ||
} | ||
|
||
singleStatement | ||
: statement EOF | ||
|
@@ -19,45 +16,54 @@ singleExpression | |
; | ||
|
||
statement | ||
: query (PIPE pipe)* | ||
: query pipe* | ||
; | ||
|
||
query | ||
: sequence | ||
| join | ||
| condition | ||
| eventQuery | ||
; | ||
|
||
sequenceParams | ||
: WITH (MAXSPAN EQ timeUnit) | ||
; | ||
|
||
sequence | ||
: SEQUENCE (by=joinKeys)? (span)? | ||
match+ | ||
(UNTIL match)? | ||
: SEQUENCE (by=joinKeys sequenceParams? | sequenceParams by=joinKeys?)? | ||
sequenceTerm sequenceTerm+ | ||
(UNTIL sequenceTerm)? | ||
; | ||
|
||
join | ||
: JOIN (by=joinKeys)? | ||
match+ | ||
(UNTIL match)? | ||
joinTerm joinTerm+ | ||
(UNTIL joinTerm)? | ||
; | ||
|
||
pipe | ||
: kind=IDENTIFIER (booleanExpression (COMMA booleanExpression)*)? | ||
: PIPE kind=IDENTIFIER (booleanExpression (COMMA booleanExpression)*)? | ||
; | ||
|
||
|
||
joinKeys | ||
: BY qualifiedNames | ||
; | ||
|
||
span | ||
: WITH MAXSPAN EQ DIGIT_IDENTIFIER | ||
: BY expression (COMMA expression)* | ||
; | ||
|
||
match | ||
: LB condition RB (by=joinKeys)? | ||
joinTerm | ||
: subquery (by=joinKeys)? | ||
; | ||
|
||
sequenceTerm | ||
: subquery (FORK (EQ booleanValue)?)? (by=joinKeys)? | ||
; | ||
|
||
subquery | ||
: LB eventQuery RB | ||
; | ||
|
||
condition | ||
: event=qualifiedName WHERE expression | ||
eventQuery | ||
: event=identifier WHERE expression | ||
; | ||
|
||
expression | ||
|
@@ -66,6 +72,7 @@ expression | |
|
||
booleanExpression | ||
: NOT booleanExpression #logicalNot | ||
| relationship=identifier OF subquery #processCheck | ||
| predicated #booleanDefault | ||
| left=booleanExpression operator=AND right=booleanExpression #logicalBinary | ||
| left=booleanExpression operator=OR right=booleanExpression #logicalBinary | ||
|
@@ -81,9 +88,7 @@ predicated | |
// dedicated calls for each branch are not used to reuse the NOT handling across them | ||
// instead the property kind is used for differentiation | ||
predicate | ||
: NOT? kind=BETWEEN lower=valueExpression AND upper=valueExpression | ||
| NOT? kind=IN LP valueExpression (COMMA valueExpression)* RP | ||
| NOT? kind=IN LP query RP | ||
: NOT? kind=IN LP valueExpression (COMMA valueExpression)* RP | ||
; | ||
|
||
valueExpression | ||
|
@@ -98,18 +103,19 @@ primaryExpression | |
: constant #constantDefault | ||
| functionExpression #function | ||
| qualifiedName #dereference | ||
| ESCAPED_IDENTIFIER #identifierEscape | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why not declare both There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, function names and pipe names should be the only exceptions to that. # ticks around the full path. dots interpreted as normal
`some-escaped-field.one-more.escaped-!-field`
# ticks around each key
`some-escaped-field`.`one-more`.`escaped-!-field` There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should the |
||
| LP expression RP #parenthesizedExpression | ||
; | ||
|
||
functionExpression | ||
: identifier LP (expression (COMMA expression)*)? RP | ||
: name=identifier LP (expression (COMMA expression)*)? RP | ||
; | ||
|
||
constant | ||
: NULL #nullLiteral | ||
| number #numericLiteral | ||
| booleanValue #booleanLiteral | ||
| STRING+ #stringLiteral | ||
| string #stringLiteral | ||
; | ||
|
||
comparisonOperator | ||
|
@@ -120,26 +126,16 @@ booleanValue | |
: TRUE | FALSE | ||
; | ||
|
||
qualifiedNames | ||
: qualifiedName (COMMA qualifiedName)* | ||
; | ||
|
||
qualifiedName | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why add the square brackets and integers? Not sure how that qualifies a name if there's no There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This lets us type sequence by pid
[process where process.name == "winword.exe"]
[file where file.extension == "exe"]
| unique events[0].command_line There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't follow - if we don't support arrays how would that work with pipes? As for array, grammar-wise the way to go is through a dedicated rule:
This supports both literal expressions for the index and clearly differentiates between a name and the array syntax. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is more flexible than the existing EQL, which has its pros and cons. some_function(...)["some field"] There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Forgot to respond to this - to avoid having a function as an argument use qualifiedName or identifier instead in the array rule: which would prevent the function rule from applying. |
||
: (identifier DOT)* identifier | ||
: identifier (DOT identifier | LB INTEGER_VALUE+ RB)* | ||
; | ||
|
||
identifier | ||
: quoteIdentifier | ||
| unquoteIdentifier | ||
: IDENTIFIER | ||
; | ||
|
||
quoteIdentifier | ||
: QUOTED_IDENTIFIER #quotedIdentifier | ||
; | ||
|
||
unquoteIdentifier | ||
: IDENTIFIER #unquotedIdentifier | ||
| DIGIT_IDENTIFIER #digitIdentifier | ||
timeUnit | ||
: number unit=IDENTIFIER? | ||
; | ||
|
||
number | ||
|
@@ -151,31 +147,26 @@ string | |
: STRING | ||
; | ||
|
||
AND: 'AND'; | ||
ANY: 'ANY'; | ||
ASC: 'ASC'; | ||
BETWEEN: 'BETWEEN'; | ||
BY: 'BY'; | ||
CHILD: 'CHILD'; | ||
DESCENDANT: 'DESCENDANT'; | ||
EVENT: 'EVENT'; | ||
FALSE: 'FALSE'; | ||
IN: 'IN'; | ||
JOIN: 'JOIN'; | ||
MAXSPAN: 'MAXSPAN'; | ||
NOT: 'NOT'; | ||
NULL: 'NULL'; | ||
OF: 'OF'; | ||
OR: 'OR'; | ||
SEQUENCE: 'SEQUENCE'; | ||
TRUE: 'TRUE'; | ||
UNTIL: 'UNTIL'; | ||
WHERE: 'WHERE'; | ||
WITH: 'WITH'; | ||
AND: 'and'; | ||
rw-access marked this conversation as resolved.
Show resolved
Hide resolved
|
||
BY: 'by'; | ||
rw-access marked this conversation as resolved.
Show resolved
Hide resolved
|
||
FALSE: 'false'; | ||
FORK: 'fork'; | ||
IN: 'in'; | ||
JOIN: 'join'; | ||
MAXSPAN: 'maxspan'; | ||
NOT: 'not'; | ||
NULL: 'null'; | ||
OF: 'of'; | ||
OR: 'or'; | ||
SEQUENCE: 'sequence'; | ||
TRUE: 'true'; | ||
UNTIL: 'until'; | ||
WHERE: 'where'; | ||
WITH: 'with'; | ||
|
||
// Operators | ||
EQ : '=' | '=='; | ||
NEQ : '<>' | '!='; | ||
NEQ : '!='; | ||
LT : '<'; | ||
LTE : '<='; | ||
GT : '>'; | ||
|
@@ -194,9 +185,16 @@ LP: '('; | |
RP: ')'; | ||
PIPE: '|'; | ||
|
||
|
||
ESCAPED_IDENTIFIER | ||
: '`' (~'`')* '`' | ||
; | ||
|
||
STRING | ||
: '\'' ( ~'\'')* '\'' | ||
| '"' ( ~'"' )* '"' | ||
: '\'' ('\\' [btnfr"'\\] | ~[\r\n'\\])* '\'' | ||
| '"' ('\\' [btnfr"'\\] | ~[\r\n"\\])* '"' | ||
| '?"' ('\\"' |~["\r\n])* '"' | ||
| '?\'' ('\\\'' |~['\r\n])* '\'' | ||
; | ||
|
||
INTEGER_VALUE | ||
|
@@ -210,31 +208,24 @@ DECIMAL_VALUE | |
| DOT DIGIT+ EXPONENT | ||
; | ||
|
||
// make @timestamp not require escaping, since @ has no other meaning | ||
IDENTIFIER | ||
: (LETTER | '_') (LETTER | DIGIT | '_' | '@' )* | ||
; | ||
|
||
DIGIT_IDENTIFIER | ||
: DIGIT (LETTER | DIGIT | '_' | '@')+ | ||
: (LETTER | '_' | '@') (LETTER | DIGIT | '_')* | ||
; | ||
|
||
QUOTED_IDENTIFIER | ||
rw-access marked this conversation as resolved.
Show resolved
Hide resolved
|
||
: '"' ( ~'"' | '""' )* '"' | ||
; | ||
|
||
fragment EXPONENT | ||
: 'E' [+-]? DIGIT+ | ||
: [Ee] [+-]? DIGIT+ | ||
rw-access marked this conversation as resolved.
Show resolved
Hide resolved
|
||
; | ||
|
||
fragment DIGIT | ||
: [0-9] | ||
; | ||
|
||
fragment LETTER | ||
: [A-Z] | ||
: [A-Za-z] | ||
; | ||
|
||
SIMPLE_COMMENT | ||
LINE_COMMENT | ||
: '//' ~[\r\n]* '\r'? '\n'? -> channel(HIDDEN) | ||
; | ||
|
||
|
@@ -246,9 +237,12 @@ WS | |
: [ \r\n\t]+ -> channel(HIDDEN) | ||
; | ||
|
||
|
||
// Catch-all for anything we can't recognize. | ||
// We use this to be able to ignore and recover all the text | ||
// when splitting statements with DelimiterLexer | ||
/* | ||
UNRECOGNIZED | ||
: . | ||
; | ||
; | ||
*/ |
This file was deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
expression
is quite loose vs identifier as it considersfoo +1
,1/2
etc as valid grammar and it's not.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is intentional, and does mirror the existing grammar.
It'll likely have performant implications on the stack, but existing implementations of EQL do support this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Languages are defined by their syntax - how parsing is done, whether it's grammar that leads to generation, done by hand, etc.. it's an implementation detail.
Considering we're looking at reimplementing the parsing in a different language, with a different type of parser (LL vs PEG until v0.7, LALR now in 0.8) I don't think fully mirroring the grammar is attainable nor should be a goal.
Further more in this case, the grammar is incorrect so why keep that ?
Between correctness and compatibility, we should always chose the former.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I agree. Even more precisely, I would say that languages are defined by the inputs they accept. I'm not advocating for mirroring a PEG grammar in ANTLR, but I do think that we should strive the same inputs and provide the same semantics as before. We can make exceptions as needed, and fix things along the way, but they should be intentional and we should be aware.
I think this is where we disagree. The previous grammar and implementations actually used this functionality. It wasn't too loose on accident. There were a few cases where the underlying schema wasn't enough, for
by
to match the underlying events together and we have had to use something akin to this, for example:I think that if we ever have to choose between correctness and compatibility, choosing correctness makes sense. But in this case, the design was intentional so I don't think these are in conflict. If we need to restrict this because it's not scalable within ES, we can do that. But that doesn't mean the previous version was incorrect, just because there weren't many examples that exercised arbitrary expressions for a join key.